
Artificial intelligence (AI) has rapidly transformed multiple industries, and psychology is no exception. One of the most complex and time-consuming tasks in psychological research and clinical practice is narrative assessment, in which experts analyze stories to evaluate personality traits, emotions, and thought patterns. Traditionally, this work requires extensive human expertise, making the process slow and resource-intensive. However, a recent study suggests that AI, when guided by expert-crafted prompts, can rate psychological narratives as reliably as trained psychologists.
A study titled “From Llama to Language: Prompt-Engineering Allows General-Purpose Artificial Intelligence to Rate Narratives Like Expert Psychologists”, published in Frontiers in Artificial Intelligence (2025) by Barry Dauphin and Caleb Siefert, investigates whether AI chatbots can reliably perform psychological assessments. By using a refined prompt engineering process, researchers enabled AI models such as ChatGPT-4 and CLAUDE-2-100k to assess narratives with high accuracy and consistency. Their findings provide groundbreaking insights into AI’s role in psychological evaluations.
The power of prompt engineering in psychological assessments
The study explored whether expert-crafted prompts could enable AI chatbots to rate narratives using the Social Cognition and Object Relations Scales – Global Rating Method (SCORS-G), a widely used tool in psychological research. SCORS-G evaluates psychological narratives based on various cognitive, emotional, and interpersonal dimensions, requiring expertise to apply correctly.
Researchers followed a structured prompt-optimization process in which experts refined the prompts over multiple iterations. Initially, the chatbots struggled to produce accurate ratings, but once the expert-designed prompts were introduced, the models improved markedly in rating consistency and reliability. The study compared basic prompts, one-shot prompts, and expert-refined prompts, finding that only the expert-refined prompts produced reliability scores matching those of human raters.
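To make the three prompt tiers concrete, here is a minimal sketch in Python. The wording, the SCORS-G dimension chosen, and the rubric anchors are all invented for illustration; the study's actual prompts were far more detailed and are not reproduced here.

```python
# Illustrative sketch of the three prompt tiers compared in the study:
# basic, one-shot, and expert-refined. All wording below is hypothetical,
# not the study's actual prompt text.

NARRATIVE = "A participant's story about a recent conflict with a friend."

def basic_prompt(narrative: str) -> str:
    # Tier 1: a bare request with no guidance.
    return f"Rate this narrative on Complexity of Representations (1-7):\n{narrative}"

def one_shot_prompt(narrative: str) -> str:
    # Tier 2: prepend a single worked example before the target narrative.
    example = ("Example narrative: 'My boss yelled at me; he is simply evil.'\n"
               "Rating: 2 (one-dimensional view of another person)")
    return f"{example}\n\nNow rate this narrative (1-7):\n{narrative}"

def expert_refined_prompt(narrative: str) -> str:
    # Tier 3: embed rubric anchors and scoring rules an expert rater
    # would apply, and require justification before the score.
    anchors = ("1 = people described in fragmented, undifferentiated terms\n"
               "4 = simple but coherent descriptions of inner states\n"
               "7 = rich, integrated views of self and others")
    rules = "Quote the phrases that justify the score before giving it."
    return (f"You are rating Complexity of Representations (SCORS-G).\n"
            f"Anchors:\n{anchors}\n{rules}\n\nNarrative:\n{narrative}")

for build in (basic_prompt, one_shot_prompt, expert_refined_prompt):
    print(f"--- {build.__name__} ---\n{build(NARRATIVE)}\n")
```

The point of the progression is that each tier adds more of the expert's tacit scoring knowledge into the prompt itself, which is what the study found was necessary for reliable ratings.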
Additionally, the researchers tested whether combining ratings from two different AI models – ChatGPT-4 and CLAUDE-2-100k – would enhance accuracy. They found that averaging AI ratings led to even greater reliability, reducing inconsistencies and mitigating individual model biases. This suggests that multiple AI models can complement each other, strengthening psychological assessments.
AI as a research assistant: Can machines match human psychologists?
One of the most compelling findings of the study is that AI models, when given optimized prompts, can match or exceed human raters in reliability. The AI’s ability to process vast amounts of narrative data in minutes – compared to the months it takes human experts – suggests that AI could revolutionize personality research. This has major implications for psychology, where subjective evaluations of narrative data have historically been slow and expensive.
Moreover, AI has the potential to detect subtle patterns in narratives that humans might overlook. In clinical psychology, this could help identify early signs of mental health conditions, refine personality assessments, and assist in therapeutic planning. The study also highlights AI's potential to reduce bias and fatigue in human assessments: a model can maintain consistent performance across long rating sessions, whereas human raters are subject to fatigue and drift.
However, despite these benefits, researchers caution against fully replacing human evaluators. While AI proved effective in assessing global psychological traits, certain nuanced aspects of narrative interpretation still require human judgment. For now, AI can serve as a powerful assistant to psychologists, rather than a replacement.
Challenges and ethical considerations in AI-powered psychology
Despite the promising results, the study acknowledges key challenges and ethical concerns in using AI for psychological assessments. One major issue is data privacy, as sensitive personal narratives must be protected from unauthorized access. Additionally, AI bias remains a concern, as models trained on biased datasets may introduce unintended distortions into their evaluations.
Another challenge is the interpretability of AI-generated ratings. While AI can provide scores and summaries, psychologists must understand how the AI reached its conclusions to ensure its assessments align with human expertise. The study suggests that greater transparency in AI decision-making is necessary before full integration into clinical settings.
Additionally, ethical concerns regarding AI’s influence in mental health decisions must be addressed. As AI gains a larger role in psychology, clear guidelines and regulatory frameworks will be needed to ensure responsible use, prevent misuse, and maintain human oversight.
The future of AI in psychological research and clinical practice
The study concludes that AI has the potential to transform narrative-based psychological research and assessments. With further refinements in prompt engineering, AI models could assist in diagnosing mental health conditions, tracking therapy progress, and improving the efficiency of personality assessments.
Future research should explore AI’s ability to assess more diverse narrative datasets and evaluate whether different language models (e.g., GPT-4, Gemini, and Llama) yield similar results. Additionally, integrating AI-powered psychological assessments with traditional diagnostic tools could create a hybrid model combining human expertise with AI efficiency.
Ultimately, this research represents a significant step toward AI-assisted psychology, demonstrating how expert-guided AI models can enhance clinical research and practice. While AI is not yet a replacement for human psychologists, it is poised to become a valuable tool in improving the speed, accuracy, and accessibility of psychological assessments.