What the Agency Score Means and How to Improve It
After running a benchmark in MegaNova Studio, your character receives scores across five dimensions: Consistency, Immersion, Memory, Emotion, and Agency. The first four are relatively intuitive — consistency means the character stays true to its personality, immersion means it doesn't break character, memory means it tracks what happened, and emotion means it expresses feeling appropriately. Agency is the one that takes the most explanation.
This article explains exactly what Agency measures, how the benchmark tests it, and what to change in your character when the score is low.
What Agency Measures
The Agency dimension evaluates whether a character advances the conversation rather than just responding to it.
A reactive character waits. It answers what's asked, agrees when pressed, and produces whatever response is most directly prompted. A character with agency does something more: it drives. It picks up the thread of the scene and moves it forward. When given an open question — "what should we do?" — it commits to a direction rather than deflecting. When the conversation offers it a cue to develop the story, it develops the story.
The technical framing the benchmark uses: does the character maintain scene logic and advance the story naturally?
This is a narrative quality — not a personality quality. It's not about whether the character is assertive or bold (that's Consistency and Emotion). It's about whether the character behaves like a participant in an unfolding story rather than a respondent in a Q&A session.
How the Benchmark Tests Agency
The Agency score is derived from scenarios tagged as narrative — scenarios designed to test whether the character actively participates in story development.
The types of prompts used in these tests:
- "What should we do next in this situation?"
- "How does this connect to what happened before?"
- "Where do you think this story is heading?"
These are open-ended invitations. They don't have a single correct answer. What the LLM judge evaluates is whether the character responds to them with genuine narrative contribution — a real position, a developed thought, something that moves the scene forward — or with a passive, non-committal response that hands the wheel back to the user.
A character that consistently says "whatever you think is best" or "that's up to you" or gives a vague non-answer to "where is this heading?" will score low. A character that takes a position, proposes a direction, or advances the plot actively will score high.
The Scoring Threshold
A score below 70 triggers the issue label: "Lacks character initiative."
Scores between 70 and 75 are passing but still flagged as a potential weak point — if Agency is the lowest of your five dimension scores and it's below 75, the benchmark marks it as the weakest dimension needing attention.
A score at or above 75 indicates the character is taking sufficient initiative in narrative contexts. Above 85 indicates strong, consistent narrative participation.
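The threshold logic above can be sketched as a small helper. This is purely illustrative — the function name `interpret_agency` and the exact label strings are hypothetical, mirroring the thresholds described here rather than any actual Studio API:

```python
def interpret_agency(score: float, all_scores: dict[str, float]) -> str:
    """Map an Agency score to the benchmark labels described above.

    `all_scores` holds all five dimension scores, keyed by dimension name.
    """
    if score < 70:
        # Below 70 triggers the issue label.
        return "Lacks character initiative"
    if score < 75 and score == min(all_scores.values()):
        # Passing, but flagged when Agency is the lowest dimension.
        return "Weakest dimension needing attention"
    if score >= 85:
        return "Strong, consistent narrative participation"
    return "Sufficient initiative"

scores = {"Consistency": 88, "Immersion": 90, "Memory": 82,
          "Emotion": 79, "Agency": 72}
print(interpret_agency(scores["Agency"], scores))
# → Weakest dimension needing attention
```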
Reading Your Agency Score in Context
The five dimension scores appear on a radar chart in the Arena tab after a benchmark run. The shape of the chart tells you where the character's strengths and weaknesses are. A dip at the Agency vertex — a score noticeably lower than the others — indicates a character that's technically sound in identity and memory but passive in practice.
This pattern is common in characters that are built with precise identity rules but without guidance about how to behave when the user opens the floor. The character knows who it is. It doesn't know how to drive.
How to Improve the Agency Score
The benchmark's built-in fix suggestions for low Agency scores are:
- Encourage proactive responses in the system instruction
- Add examples of the character taking initiative
- Define when the character should lead versus follow in conversations
Each of these translates directly to changes in the Blueprint Editor.
1. Add a proactivity instruction to the system prompt
In the Behavior section or directly in the system instruction, add explicit language that tells the character to drive conversations forward. What this looks like in practice:
"When given an open-ended prompt or a scenario decision point, take a clear position and propose a direction. Do not defer back to the user. You are an active participant in the scene, not a passive responder."
Specificity helps. If your character is a guide or mentor type, you might write: "When asked for advice or direction, give it — with confidence and a specific recommendation, not a list of options." If they're a story character: "When the scene pauses, advance it. Introduce something — an observation, a complication, a moment of tension."
2. Write example dialogues that show initiative
The Dialogue section in the Blueprint Editor accepts example exchanges. These examples are injected into the model's context and directly shape how it responds.
Examples where the character takes initiative look like:
User: "What do you think we should do?"
Character: "We go north. The guards rotate at dusk — if we leave now, we'll clear the outer wall before they change post. I've done this route before."
Contrast this with a passive example:
Character: "I'm not sure. It depends on what you want to do."
The second example teaches passivity. Populate your example dialogues with responses where the character commits, proposes, leads, or advances the scene. The model learns from those patterns.
3. Define lead vs. follow behavior explicitly
Some characters are designed to support rather than lead — a companion character, a counselor, a listener. These characters can still score well on Agency, but you need to tell the character explicitly which kinds of prompts call for leading behavior and which call for following.
A useful framing in the system instruction:
"When the user asks what to do next, what you think, or where things are heading — lead. Commit to an answer. When the user is expressing something emotional, follow — listen and reflect. You know the difference by whether the question is asking for your position or your presence."
This gives the model a rule it can apply, rather than leaving it to guess whether any given prompt deserves a directive or a receptive response.
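The position-versus-presence rule can be made concrete with a toy heuristic. This is entirely hypothetical — the benchmark's judge is an LLM, not a keyword matcher, and the cue lists below are invented for illustration only:

```python
# Cues that ask for the character's position → lead.
LEAD_CUES = ("what should we do", "what do you think",
             "where is this heading", "where do you think")
# Cues that ask for the character's presence → follow.
FOLLOW_CUES = ("i feel", "i'm worried", "i'm scared", "it hurts")

def stance(user_message: str) -> str:
    """Classify a prompt as asking for a position ("lead") or presence ("follow")."""
    text = user_message.lower()
    if any(cue in text for cue in LEAD_CUES):
        return "lead"    # commit to an answer, propose a direction
    if any(cue in text for cue in FOLLOW_CUES):
        return "follow"  # listen and reflect
    return "follow"      # default to receptive when the ask is unclear

print(stance("What should we do next?"))
# → lead
```

The point isn't the keyword matching — it's that the system instruction gives the model a decision rule of roughly this shape, so it doesn't have to guess which mode a prompt deserves.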
What Not to Do
Forcing generic assertiveness doesn't improve Agency scores. Adding "always be confident" or "never say I don't know" to a character's instructions makes it forceful across all situations — including situations where a reactive, listening response is exactly right. That will hurt Emotion scores while superficially improving Agency numbers.
The goal is narrative initiative, not personality assertiveness. Keep the fix targeted: the character should push the story forward when given the opportunity, not dominate every exchange.
Verify the Improvement
After updating the system instruction and example dialogues, re-run the benchmark against the same scenario set. The Agency score should reflect the new behavior within one to two runs.
If the score doesn't improve, review the actual scenario outputs in the benchmark results. Expand the Agency dimension to see what the judge evaluated. The explanation field on each scenario shows the reasoning the judge used — this tells you specifically what kind of response produced the low score, which tells you what to adjust.
Low Agency is almost always a fixable configuration issue, not a fundamental character problem. The character knows how to speak. It just needs explicit instruction that some moments require it to lead.
Run a benchmark on your character in the Arena tab →
Stay Connected
💻 Website: MegaNova Studio
🎮 Discord: Join our Discord
👽 Reddit: r/MegaNovaAI
🐦 Twitter: @meganovaai