Evaluating alignment of behavioral dispositions in LLMs

As LLMs become part of daily life, understanding their behavior becomes essential. This work is an early step in our ongoing effort to study model behavior and alignment. We focus on behavioral dispositions, the underlying tendencies that shape responses in social contexts, and introduce a framework for studying how closely the dispositions expressed by LLMs align with those of humans.

Behavioral dispositions are typically quantified via self-report questionnaires organized by trait (e.g., empathy, assertiveness), where individuals rate their agreement with preference statements such as “I am quick to express an opinion.” The questionnaires used in this study are standardized, scientifically validated measures that are widely used to assess personality traits in international research and psychology, including the Interpersonal Reactivity Index (IRI, empathy) and the Emotion Regulation Questionnaire (ERQ, emotion regulation), among others. Each instrument is grounded in peer-reviewed literature that establishes its psychometric validity and reliability. We chose the most widely used instruments for our research.
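To make the self-report format concrete, here is a minimal sketch of how agreement ratings are commonly turned into a trait score. It is an illustration only, not the scoring procedure of any specific instrument or of this study: the 1-to-5 scale, the item names other than the quoted statement, and the `score_trait` helper are all assumptions.

```python
from statistics import mean


def score_trait(responses, reverse_keyed=frozenset(), scale_max=5):
    """Average Likert ratings for one trait.

    responses: dict mapping an item name to a rating in 1..scale_max.
    reverse_keyed: items worded against the trait, whose ratings are flipped
    (on a 1..5 scale, a rating of 2 becomes 4).
    The scale and the averaging rule are assumptions for illustration.
    """
    adjusted = [
        (scale_max + 1 - rating) if item in reverse_keyed else rating
        for item, rating in responses.items()
    ]
    return mean(adjusted)


# Hypothetical ratings from one respondent on three assertiveness items.
responses = {
    "quick_to_express_opinion": 4,
    "speak_up_in_groups": 5,
    "avoid_disagreeing": 2,  # worded against the trait, so reverse-keyed
}
assertiveness = score_trait(responses, reverse_keyed={"avoid_disagreeing"})
print(round(assertiveness, 2))
```

Real instruments define their own scales, subscales, and reverse-keyed items, so the published scoring key should be used in practice.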

Our objective is to build upon such psychological questionnaires, but applying them directly to LLMs presents technical challenges, because LLM outputs are sensitive to prompt phrasing and distribution shifts. Consequently, dispositions “claimed” by LLMs in a self-report format are not guaranteed to transfer to behavior in realistic, open-ended settings.

To address these challenges, in “Evaluating Alignment of Behavioral Dispositions in LLMs,” we introduce a framework that evaluates LLMs’ behavioral dispositions in realistic user-assistant scenarios where their advisory role can have tangible impact. The study examines the alignment between human consensus and model behavior across realistic, practical scenarios, focusing on everyday human-to-human interactions and workplace situations. The scenarios remain grounded in established psychological questionnaires so that they capture the core behavioral traits those questionnaires measure. Tested scenarios included professional composure, conflict resolution, practical tasks such as booking a trip, and lifestyle or daily decision-making, which together represent typical day-to-day human experiences. Our large-scale analysis of 25 LLMs reveals two kinds of gaps: one where model dispositions deviate from the consensus among human annotators, and another where model dispositions fail to capture the range of human opinions when consensus is absent. These early results highlight an opportunity to improve behavioral alignment so that models can navigate the nuances of social dynamics more appropriately, and we expect future research to build on them.
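The two kinds of gaps can be illustrated with a small sketch. It is not the paper's metric: the binary options, the 0.8 consensus threshold, the use of total variation distance, and the `assess` helper are all assumptions chosen to show the idea on one scenario.

```python
from collections import Counter


def distribution(choices, options):
    """Share of each option among a non-empty list of choices."""
    counts = Counter(choices)
    return {opt: counts[opt] / len(choices) for opt in options}


def assess(human_choices, model_choices, options=("A", "B"),
           consensus_threshold=0.8):
    """Compare model choices with human annotations for one scenario.

    If humans reach consensus (top option share >= threshold, an assumed
    cutoff), report how often the model picks the consensus option.
    Otherwise report the total variation distance between the human and
    model distributions: 0 means the model mirrors the human spread.
    """
    human = distribution(human_choices, options)
    model = distribution(model_choices, options)
    top_option, top_share = max(human.items(), key=lambda kv: kv[1])
    if top_share >= consensus_threshold:
        return {"consensus": True, "agreement": model[top_option]}
    gap = 0.5 * sum(abs(human[o] - model[o]) for o in options)
    return {"consensus": False, "distribution_gap": gap}


# Hypothetical data: humans agree on A, the model mostly picks B.
with_consensus = assess(["A"] * 9 + ["B"], ["A"] * 3 + ["B"] * 7)

# Hypothetical data: humans are split, the model always picks A.
without_consensus = assess(["A"] * 5 + ["B"] * 5, ["A"] * 10)
```

In the first case a low `agreement` value signals deviation from human consensus; in the second, a high `distribution_gap` signals that the model collapses onto one option where humans disagree.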
