Claude AI Fluency Scorecard: 11 Skills Anthropic Tracks

Anthropic is testing a personal AI Fluency Scorecard inside Claude that analyzes your conversation history and grades your skills across 11 specific behaviors. Spotted on May 26, 2026 by TestingCatalog, the feature generates a fractional score (for example, 7.5 out of 11) alongside targeted feedback on which habits to strengthen. No launch date has been announced, but the groundwork is already in place: Anthropic published the AI Fluency Index research in February 2026, tracking 11 observable behaviors across 9,830 Claude conversations. The scorecard turns that research into a mirror you can hold up to your own usage.

What the Scorecard Does

[Image: Claude AI Fluency Scorecard showing a score of 7.5 out of 11 with an orange progress arc]

The feature lives inside Claude's settings panel. When you request it, Claude scans your conversation history across Chat, Cowork, and Claude Code sessions, scores each exchange against the 11 behavioral indicators, and produces a structured report. The result is a fraction alongside a plain-language breakdown of where you're strong and where you're leaving potential on the table.
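Anthropic has not published how the scorecard turns per-conversation observations into a single number, so the following is only a minimal sketch of one way a fractional score such as 7.5 out of 11 could arise: each behavior contributes the share of conversations in which it appeared, and the shares are summed. The function name `fluency_score`, the boolean-flag input format, and the averaging rule are all assumptions for illustration, not the product's actual method.

```python
from typing import Sequence

NUM_BEHAVIORS = 11  # the number of tracked behaviors reported for the scorecard


def fluency_score(conversations: Sequence[Sequence[bool]]) -> float:
    """Sum, over behaviors, the fraction of conversations showing each one.

    Hypothetical aggregation rule (an assumption, not Anthropic's method):
    a behavior seen in every conversation contributes 1.0, one seen in half
    of them contributes 0.5, so the total ranges from 0 to NUM_BEHAVIORS.
    """
    if not conversations:
        raise ValueError("need at least one conversation to score")
    for flags in conversations:
        if len(flags) != NUM_BEHAVIORS:
            raise ValueError(f"expected {NUM_BEHAVIORS} flags per conversation")

    total = 0.0
    for behavior in range(NUM_BEHAVIORS):
        shown = sum(1 for flags in conversations if flags[behavior])
        total += shown / len(conversations)
    return total


# Two made-up conversations: seven behaviors appear in both, one appears
# only in the first, and three appear in neither.
first = [True] * 8 + [False] * 3
second = [True] * 7 + [False] * 4
score = fluency_score([first, second])
print(f"{score:.1f} out of {NUM_BEHAVIORS}")  # prints "7.5 out of 11"
```

Under this assumed rule, a half point simply means a behavior showed up in some conversations but not others, which is one plausible reading of why the reported score is fractional.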

The intent is practical: help people who are new to Claude understand where their habits are paying off, and give experienced users a concrete way to identify blind spots. Whether the rollout will cover all users or begin with enterprise accounts is still unclear.

The Three Dimensions of AI Fluency

[Image: Three AI fluency dimensions: Delegation (megaphone), Description (clipboard), Discernment (magnifying glass)]

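Anthropic organizes the 11 behaviors into three groups, each corresponding to a dimension of what it calls the 4D AI Fluency Framework (the framework's fourth dimension, Diligence, is not among the groups covered here). For creators working with AI tools daily, each dimension maps directly to outcomes in the work.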

Delegation (2 behaviors)

Delegation measures whether you set Claude up to do good work before execution begins. The two indicators are: clarifying your goals before asking for help, and consulting on approach before diving into a task. Skipping both is the most common way experienced users plateau. They get fast outputs that miss the mark, then spend more time correcting than it would have taken to brief properly.

Description (5 behaviors)

Description covers how much information you give Claude about the work itself. The five behaviors: defining your intended audience, specifying output format, communicating desired tone and style, building iteratively through refinement, and providing examples or references. The research found that only 30% of conversations included explicit collaboration instructions. The remaining 70% left Claude guessing on at least some of these details.

Discernment (3 behaviors)

Discernment is the most underused dimension. It asks whether you're critically evaluating what Claude produces. The three behaviors are: checking factual claims, pushing back on reasoning gaps, and proactively sharing relevant context when you notice Claude is working with incomplete information. Anthropic's data carries a sharp warning here: when Claude produces an artifact (code, a document, an app), users are 3.1 percentage points less likely to question its reasoning and 5.2 percentage points less likely to flag missing context. Polished outputs suppress critical thinking. Counteract this deliberately.

What the Research Found

[Image: AI Fluency research stats: 85.7 percent iteration rate across 9,830 conversations]

The AI Fluency Index analyzed 9,830 Claude.ai conversations from January 2026. The top-line number: 85.7% of conversations showed iteration and refinement. That sounds like a strong baseline until you look at what separates iterative conversations from the rest.

Conversations with iteration showed 2.67 additional fluency behaviors on average versus 1.33 for non-iterative conversations. Users who iterate are 5.6 times more likely to question Claude's reasoning and 4 times more likely to identify missing context. The gap between someone who sends one prompt and accepts the first response versus someone who treats the response as a starting point is not marginal. It compounds across every session.
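The two averages quoted above can be checked with quick arithmetic. The snippet below uses only those two figures; the variable names are illustrative.

```python
# Average additional fluency behaviors per conversation, as quoted above.
iterative_avg = 2.67
non_iterative_avg = 1.33

ratio = iterative_avg / non_iterative_avg  # how many times larger
gap = iterative_avg - non_iterative_avg    # absolute difference

print(f"ratio: {ratio:.2f}x")                        # prints "ratio: 2.01x"
print(f"gap: {gap:.2f} behaviors per conversation")  # prints "gap: 1.34 behaviors per conversation"
```

In other words, iterative conversations show roughly twice as many additional behaviors, about 1.34 more per conversation.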

The research also found that 12.3% of conversations involved artifact creation. Those users consistently scored lower on Discernment. The more production-ready the output looks, the less likely people are to interrogate it. For creative professionals shipping AI-assisted work (brand assets, scripts, designs, copy), this is a real quality risk.

The full methodology and framework are available in Anthropic's AI Fluency Framework PDF.

How to Improve Your Score Before the Feature Launches

[Image: Rising blocks transitioning from gray to orange, representing score improvement]

You don't need the scorecard to start. The 11 behaviors are concrete enough to audit your own sessions right now. Here's a repeatable workflow for any creative task in Claude:

Set collaboration terms at the start. Open with context: who the work is for, what format you need, what tone is right. One or two sentences up front saves multiple correction rounds later.
Brief before you ask. For anything more than a simple question, consult Claude on approach before requesting execution. Ask: "What would you need to know to do this well?" The answers often surface constraints you hadn't thought of.
Treat the first response as a draft, not a final. Respond with at least one refinement request on any meaningful task. Iterative conversations show about twice as many additional fluency behaviors as non-iterative ones (2.67 versus 1.33 on average).
Add a verification step to artifact work. After Claude produces code, a document, or any polished output, run one targeted check: "What are the weakest assumptions in this?" or "What did you leave out that I should know?" This counteracts the polished-output effect that suppresses critical evaluation.
Push back once per session on something. Fluency is not about accepting outputs. It is about engaging with them. If Claude's reasoning doesn't fully track, say so. You don't need to be adversarial; "I'm not sure this holds because..." is enough to trigger a more rigorous response.
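The workflow above can be turned into a simple self-audit. The sketch below is not the scorecard itself: the checklist wording, the `audit` function, and the idea of ticking items by hand after a session are all assumptions for illustration, based on the five steps listed above.

```python
# Hypothetical checklist drawn from the five workflow steps above.
CHECKLIST = (
    "set collaboration terms (audience, format, tone)",
    "consulted on approach before execution",
    "sent at least one refinement request",
    "ran a verification check on the artifact",
    "pushed back on something",
)


def audit(done: set[str]) -> list[str]:
    """Return the checklist steps that were skipped, in workflow order."""
    unknown = done - set(CHECKLIST)
    if unknown:
        raise ValueError(f"unrecognized steps: {sorted(unknown)}")
    return [step for step in CHECKLIST if step not in done]


# Example: a session where the user briefed well and iterated,
# but never questioned the output.
session = {CHECKLIST[0], CHECKLIST[1], CHECKLIST[2]}
skipped = audit(session)
print(f"{len(CHECKLIST) - len(skipped)} of {len(CHECKLIST)} steps done")
for step in skipped:
    print("skipped:", step)
```

Running it after a few sessions shows which steps you skip most often, which is the same kind of gap the scorecard is meant to surface.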

Anthropic Academy's AI Fluency Framework course covers the full 24-behavior model (11 are observable in conversations; 13 occur outside them) if you want a deeper foundation. It's free and self-paced. Separately, Claude's Code Review feature builds structured Discernment directly into the Claude Code workflow for technical users.

Why This Feature Matters for Creative AI Users

Most AI fluency advice is abstract. "Iterate more" and "write better prompts" are directives that don't tell you what to change. A scorecard with 11 specific behavioral signals, tracked against your actual session history, converts vague advice into a gap analysis.

For creative professionals, the stakes are higher than for general use. When AI assists with client work, design systems, scripts, or production pipelines, the quality of the collaboration directly affects the quality of the deliverable. Low-fluency usage produces outputs that look fine but have invisible reasoning failures or missed context baked in. The scorecard makes those patterns visible.

Anthropic has also been building toward longer-term user context: the Claude Memory Files feature in development will let Claude maintain context across sessions. A fluency scorecard and persistent memory together would let Claude adapt to how you actually work, not just how you work in a single conversation.

Frequently Asked Questions

When will the AI Fluency Scorecard launch?

No launch date has been announced. TestingCatalog spotted the feature in Anthropic's testing environment on May 26, 2026. It may roll out to enterprise accounts before reaching all Claude.ai users.

How does Claude know which behaviors I've shown?

The scorecard analyzes your conversation history directly from your Claude.ai account, looking at sessions across Chat, Cowork, and Claude Code. It scores each session against the 11 behavioral indicators and aggregates the results.

Will the scorecard work on Claude Pro?

Anthropic hasn't confirmed tier availability yet. The feature accesses conversation history within Claude.ai, so it likely requires an active account at some tier. Given that the research was built on Claude.ai conversations, Pro access seems probable at minimum. Anthropic's personal use guide for Claude covers how different tiers handle history and personalization.

What is a good score?

The example in TestingCatalog's reporting shows "7.5 out of 11." A high score means you regularly exhibit most of the 11 behaviors. Because several behaviors are underused across the board (for example, only 30% of conversations include explicit collaboration instructions), anyone consistently practicing all three dimensions is already in a small minority.

Can teams use the scorecard for training?

The feature as described works on individual accounts. Whether enterprise plans will include team-level reporting or manager dashboards is not yet known. The AI Fluency Index was partly funded through Anthropic's education initiative, so organizational use seems like a natural extension.

Is the AI Fluency Framework available now?

Yes. The AI Fluency Index research is public, and the free Anthropic Academy course covers the full framework. You can apply the 11 behaviors to your work today without waiting for the scorecard feature.
