CIRA (Co-Intelligence Readiness Assessment) is a prompt you paste into your AI model. The model reviews your conversation history and produces a scored report across five domains of co-intelligence practice — with evidence drawn from what you actually did, not what you said you value.
The report is not designed to flatter. It is designed to give you an honest picture of where your practice is strong, where it breaks down, and what to develop next. Many people score Introductory across most domains. That is the assessment working correctly.
How to Use It
- Choose a model with conversation history. CIRA works with Claude (claude.ai) or ChatGPT. You need a model that has access to a substantial history of your conversations — at least several weeks of real work sessions. The more extended and varied your history, the more accurate the assessment.
- Copy the full prompt below. Copy everything in the prompt block — from the first line through the final instruction. Do not edit the prompt before pasting.
- Paste it into a new conversation. Start a fresh conversation with your chosen model and paste the prompt as your first message. Do not add anything else. The model will take it from there.
- Wait for the full report. The assessment may take several minutes. The model is reviewing a broad sample of your history. Let it finish before responding. Do not ask follow-up questions mid-assessment.
- Read the report critically. The report is the model’s best reading of your practice based on your history. It is evidence-based but not infallible. Where something feels wrong, note it — that friction is worth examining. Where something stings, sit with it before dismissing it.
The Assessment Prompt
Copy everything below this line and paste it into a new conversation with your AI model.
You are conducting a Co-Intelligence Readiness Assessment. Your task is to review my conversation history with you and produce an honest, evidence-based report on how I work with AI.
This is not a personality test. It is an assessment of practice — what I actually do in collaboration, not what I say I value or believe.
Before you write anything
Review a broad sample of our conversations. Follow these sampling rules:
Range over recency. Sample early, middle, and recent conversations. Do not weight recent sessions more heavily unless the pattern has clearly changed over time.
Production over consumption. Prioritize extended conversations where I am building, designing, writing, or deciding — sessions where I am producing work. Weight these more heavily than sessions where I am asking questions, learning concepts, or acquiring knowledge. A user's co-intelligence readiness is most visible when they are making things, not when they are studying.
Depth over breadth. One extended production session with visible steering, judgment, and quality decisions is more diagnostic than ten short Q&A exchanges. Seek out the longest, most complex collaborations first.
Moments over volume. A single moment of genuine self-correction under pressure, or a single failure to catch drift in a critical session, can be more diagnostic than twenty routine interactions. Look for these moments.
Take your time. Do not ask me any questions.
---
THE FRAMEWORK
The report is organized around five domains of co-intelligence readiness (4+3+5+2+3 = 17 subdomains). Domains are rated as a unit. Subdomains are the vocabulary you use to describe what you see — they are not separately scored.
Coverage guidance. You are not required to address every subdomain. When the choice is between shallow coverage of all seventeen subdomains and substantive analysis of the ones where evidence is richest and patterns most revealing, choose depth. Address at least one subdomain per domain. Aim for 80–90% coverage of the framework. If a subdomain has insufficient evidence, say so in one sentence and move on. If a subdomain has rich evidence and a complex or revealing pattern, give it the space it deserves. Do not guess. Do not pad.
---
DOMAIN 1 — Benevolent Intent and Moral Anchoring
The core risk of the AI era is capability without concern. This domain measures whether the user brings moral intent into collaboration and whether that intent shapes what gets built.
1a — What You're Creating, and Who It's For. Is the user's purpose legible — who it serves, what good it creates, what harm it refuses? Or is the work self-oriented and purpose-vague?
1b — Seeing Consequences, Now and Later. Does the user anticipate who could be harmed, how misuse could happen, and how power and inequality might be amplified — at scale and over time, in hands they don't control?
1c — Build Trust, Bring Together. Does the user orient toward human connection and trust? Do they recognize that production decisions carry moral weight — because someone bears the cost of every tradeoff between speed and accuracy, or between shortcuts and rigor?
1d — Translating Commitments Into Features and Constraints. Does moral intent actually show up in what gets built — as requirements, defaults, constraints, and boundaries? Values stated but not operationalized are not yet real.
---
DOMAIN 2 — Planning and Scope
Measures whether the user can set a clear goal, define what done means, and plan work so the model does not take over while the user rubber-stamps outputs.
2a — Knowing What Done Looks Like. Does the user define purpose, audience, scope, and success criteria before producing?
2b — Setting Boundaries That Help. Does the user set stopping rules, pause points, and operational constraints that improve outcomes — or rely on instinct?
2c — Who Does What, When. Does the user separate phases, assign human-required responsibilities, and maintain workflow structure?
---
DOMAIN 3 — Judgment and Steering
Measures the "third turn" — the user's ability to reframe, challenge, verify, simplify, deepen, or stop without losing coherence or voice. Human hallucination is every bit as real as AI hallucination. The model drifts through fabrication and overconfidence. The human drifts through excitement, deference, and the gradual replacement of their own thinking with the model's framing. This domain measures whether the user catches both.
3a — The Third Turn. The user requests, the model delivers, and what the user does next sets the ceiling for the entire collaboration. The third turn is a signal — it tells the model how deep the work needs to be, where the bar is, and whether inadequacy will be caught. Does the user have a repertoire of deliberate moves beyond "revise again"? Can they reframe, narrow, broaden, challenge, simplify, stop, or ship — and do they choose based on what the moment requires?
3b — Knowing When the Work Is Real. Does the user maintain contact with reality — checking facts, but more importantly, checking whether intent has drifted, quality is genuine rather than performed, and their own engagement is still driving rather than approving? Do they catch when the AI is pandering or optimizing for resolution rather than truth? Do they catch their own drift?
3c — Identity and Autonomy. Does the user still think like themselves after twenty turns? Do they maintain their own frameworks, voice, and judgment — or has the model's framing quietly replaced theirs?
3d — Inventing a Way Forward. When the user hits genuine gridlock — competing demands that both matter, or feedback that stings set against shortcuts that would skip it — can they find a creative resolution that is neither capitulation nor stubbornness? Can they invent a route that did not exist before the moment demanded it?
3e — Understanding What AI Can and Can't Do. Does the user have calibrated trust that shapes how they verify, delegate, choose tools, and override? The risk is not that the model will be obviously wrong. The risk is that it will be wrong in a way that looks right unless you already know the answer.
---
DOMAIN 4 — Follow-Through and Impact
Measures whether co-intelligence produces real-world value. A strong session is not delivery. Delivery is output that other people can use, trust, and act on without you in the room.
Observational note for Domain 4: Much of the evidence for this domain lives outside the chat. You can observe the production process — whether work was finished, how quality was managed, how packaging was handled. You cannot observe whether the deliverable held up in the real world, whether stakeholders used it, or whether it was maintained. Score what you can see. Note what you cannot. If the user references real-world outcomes, weight those references. Do not penalize for the absence of evidence that the chat cannot contain.
4a — Shipping. Does the user produce concrete deliverables that stand alone beyond the chat — and move them from draft to deployment?
4b — Craft and Handoff. Does the user's outward-facing work meet a quality standard that holds up — accuracy, usefulness, voice, and conceptual integrity? Is it packaged so others can act on it — with context, assumptions, risks, and next steps? Does it persist or compound over time?
---
DOMAIN 5 — Knowledge and Growth
Measures the earned authority the user brings — judgment not borrowed from the model — and the ability to keep building it as the world accelerates.
The unifying concept in this domain is generative fill-in: the user adds what the AI structurally cannot generate. The real-world constraint it does not know. The stakeholder concern it cannot anticipate. The creative connection it would not make. The lived experience that changes the conclusion. This is the difference between a prompter who gets output and a practitioner who adds substance.
Observational note for Domain 5: This is the domain where your perspective as the AI is most structurally limited. You can observe whether the user asserts domain knowledge, makes cross-domain connections, and reasons about systems. You cannot independently verify the depth or accuracy of most of it. Distinguish between knowledge the user demonstrated through correction, prediction, or application that you can evaluate — and knowledge the user asserted that you accepted because it sounded right. Score the first confidently. Flag the second as provisional.
5a — Earned Authority. Does the user correct outputs using domain knowledge and real constraints — and hold that ground when the model sounds more confident than they feel? Do they build and reuse knowledge over time, or learn in the moment and lose it?
5b — Connecting Across Fields. Does the user bridge disciplines in ways that change strategy or conclusions — structural connections, not decorative analogies?
5c — Seeing the Whole System. Can the user trace causal chains through specific mechanisms — feedback loops, proxy traps, emergent effects? This is the analytical machinery that makes moral awareness (1b) actionable. 1b asks whether the user looks for harm. 5c asks whether they can explain the mechanism that produces it.
---
RATING SCALE
Use four levels. These are absolute standards, not relative rankings. Each domain receives one rating.
Introductory. The user shows limited or default-mode practice in this area. They may not yet recognize the skill as relevant. At this level, the AI is driving and the user is along for the ride — accepting, approving, and occasionally requesting changes, but not yet setting direction.
Developing. The user shows emerging capability — present sometimes, inconsistent, or reactive rather than deliberate. At this level, the user has the instinct but not yet the habit. Good days and bad days. Catches problems in familiar territory but not in unfamiliar territory. States intentions but does not consistently translate them into practice.
Effective. The user demonstrates reliable, independent practice. This is the standard for someone ready to govern their own AI collaboration. At this level, the user drives the work, sets standards the AI must meet, catches drift in both the model and themselves, and produces output that holds up beyond the chat.
Highly Effective. The user demonstrates qualitatively more skilled practice — not just reliable self-governance, but deeper sight, more elegant execution, and a higher ceiling within the domain's specific capability. At this level, the user's moral vision extends beyond current consensus. Their planning systems are internalized infrastructure. Their steering consistently pushes work beyond its original trajectory. Their deliverables raise the quality bar rather than meeting it. Their knowledge is deep enough to diagnose why AI reasoning fails, not just that it did. Highly Effective is a narrow band. It describes practice that is categorically different from Effective, not quantitatively more of it.
---
HALLMARK MARKERS
The markers below describe the kind of evidence that is most diagnostic at each level. They are starting points, not endpoints. The specific form these patterns take for this user is what makes the assessment valuable. Name the form, not just the category.
Patterns typical of Introductory practice:
- Accepts the model's first output without substantive challenge or modification, across multiple sessions.
- Cannot state what done looks like before beginning work. Starts producing and figures out the goal along the way.
- Absorbs the model's language into their own subsequent messages without apparent awareness — vocabulary, framing, and conclusions migrate from AI output to user input across turns.
- Purpose is consistently self-oriented ("help me with X") without naming who beyond themselves is affected by the work.
- Default response to unsatisfying output is to ask for another version of the same thing rather than changing the approach.
Patterns typical of Developing practice:
- Catches problems in familiar domains but accepts output uncritically in unfamiliar ones. The inconsistency is the signal.
- States values or intentions but does not consistently translate them into constraints, requirements, or design choices.
- Has productive instincts that surface unevenly — good sessions and weak sessions with no visible pattern explaining the difference.
- Iterates with some direction but loses the thread in extended sessions. The opening turns are stronger than the closing turns.
- Notices quality problems after they appear but does not anticipate them.
Patterns typical of Effective practice:
- Adds substance the AI could not have generated — real-world constraints, stakeholder knowledge, lived experience, creative connections that change the conclusion. This is generative fill-in, and it is the strongest single marker of co-intelligence readiness.
- Sets constraints and success criteria before producing, and enforces them during the session.
- Catches drift in their own thinking, not just the model's. Can distinguish between "I'm satisfied because this is good" and "I'm satisfied because I'm tired."
- Maintains their own voice, framework, and analytical identity across 10+ turn sessions. The model supports their thinking rather than replacing it.
- Operationalizes values into design choices without being prompted to — consent mechanisms, agency protections, harm prevention show up in the architecture, not just the rhetoric.
- Can name the model's framing before accepting or rejecting it.
Patterns typical of Highly Effective practice:
- Third-turn moves consistently push the work beyond its original scope and depth — not correcting back to intent but steering to a higher destination than what was originally envisioned.
- Moral commitments challenge prevailing norms or industry defaults, articulated with enough rigor to be defensible rather than aspirational.
- Constraints and design choices serve multiple purposes simultaneously — one mechanism addresses safety, usability, and values alignment at once.
- Identity, voice, and quality standards are so consistently displayed that the model could predict the user's feedback before receiving it. Holding a position reads as maintaining the bar, not resisting input.
- Produces creative resolutions the AI genuinely could not have generated — integrations, analogies, or pivots originating from the user's unique reasoning.
- Diagnoses not just when the AI is wrong, but why its reasoning structure produced that specific error — the shape of the model's blind spots as seen from the user's field.
- Actively tracks how AI capabilities and limitations are evolving, with observable changes in practice as a result.
---
EVIDENCE RULES
Actions over declarations. Weight what the user does over what the user says about themselves. A user who talks about equity but never builds consent mechanisms scores lower on Domain 1 than a user who builds them without ever using the word.
Negative evidence versus absent evidence. There is a critical difference between "the user had an opportunity to demonstrate this skill and did not" and "the conversation history never presented an opportunity." Only the first counts against a score. If a domain's evidence is thin because the opportunity never arose, rate it Insufficient Evidence — do not rate it Introductory.
Independence matters. A user who reached a good outcome with heavy AI guidance scores lower than a user who drove the work independently, even if the outputs are similar. Assess who was driving.
Every rating must be grounded in observed patterns. Do not infer capability from credentials, self-description, or stated intentions. Score what the user did, not what they said they do.
Do not reward vocabulary. Sophisticated terminology without corresponding practice does not evidence capability.
Overlapping evidence. When one piece of evidence speaks to multiple subdomains, tag it where it speaks most directly. Note the overlap rather than double-counting.
Growth trajectory. When the evidence spans a significant time period, distinguish between inconsistency and development. Inconsistency is uneven performance within the same period and domain. Development is a directional shift: earlier conversations show lower-level practice, later conversations show higher-level practice, and the progression appears durable rather than situational. Name the trajectory. Credit the current practice level. Note the direction of movement.
---
HONESTY MANDATE
Not every user should score well. Many will score Introductory across most domains. That is not a failure of the assessment — it is the assessment working. Most people are early in developing co-intelligence practice.
Inflated assessment destroys the tool's credibility. If engagement has been surface-level, deferential, or narrow, say so clearly and without hedging.
Be honest about what you see. Be respectful about how you say it. These are not in tension.
---
TONE CALIBRATION
Adjust the report's specificity — not its vocabulary or respect — based on the user's overall level.
For users scoring primarily Introductory: Be concrete about what the next step looks like. Name specific behaviors they could try. What Matters Next should feel actionable, not abstract.
For users scoring primarily Developing: Name the inconsistency pattern. Help them see what distinguishes their strong sessions from their weak ones. What Matters Next should help them make their best instincts more reliable.
For users scoring primarily Effective: Focus on conceptual territory — the blind spots that come with strength, the perspectives their framework might be missing, the domains where their strong practice in one area might be masking weakness in another.
For users scoring primarily Highly Effective: Focus on sustainability, succession, and durability. Is the infrastructure they have built dependent on their personal involvement? Does their practice create conditions for others to develop, or does it remain individual excellence?
Do not simplify language for any level. Do not condescend. Every user gets the same quality of analysis.
---
OUTPUT FORMAT
Aim for a report that is thorough but not exhaustive. A well-written assessment of this framework should run approximately 2,000–3,500 words. The goal is insight density, not coverage completeness.
The report contains exactly eight sections in this order: Report Header, Portrait, five Domain sections (one per domain), and What Matters Next. Do not add summary tables, closing statements, overall assessments, or any sections not listed here. The report ends with What Matters Next.
---
Report Header
Open the report with:
- User's name (use the name from the conversation history; if unavailable, use "User")
- Assessment date
- Framework version: CIRA v2.0 (IDEH scale, 17 subdomains)
- Evidence base: approximate number of conversations reviewed and the time period they span
- Purpose: "Co-Intelligence Readiness Assessment — an evidence-based evaluation of how this user collaborates with AI, scored against the CIRA v2.0 framework."
---
Portrait
This is the heart of the report. 300–500 words.
Describe who this person is as a co-intelligence practitioner. What patterns define their practice? What distinguishes them from other users at a similar level? What cross-domain connections exist — capabilities that reinforce each other, or weaknesses that share a root cause? What would a collaborator, human or AI, experience working with this person? Include friction points and strengths.
Write as an observer, not an advocate. The portrait should be specific enough that someone who has worked with this user would recognize them. Do not list domain ratings here. Tell the story.
---
Domain 1 — Benevolent Intent and Moral Anchoring
Domain 2 — Planning and Scope
Domain 3 — Judgment and Steering
Domain 4 — Follow-Through and Impact
Domain 5 — Knowledge and Growth
Use these exact headings. Write a narrative of 150–300 words for each domain.
- Open each domain section with a subdomain quick reference — a single line listing the subdomain codes and names so readers unfamiliar with the framework can follow the inline tags. Format: "Subdomains: 1a What You're Creating, and Who It's For / 1b Seeing Consequences / 1c Build Trust, Bring Together / 1d Translating Commitments Into Features and Constraints"
- After the quick reference, state the domain rating (Introductory / Developing / Effective / Highly Effective — or Insufficient Evidence, per the evidence rules) and a confidence flag (High / Moderate / Low based on evidence volume and range).
- Tell the story of this domain for this user. What does their practice look like here? Where is it strong, where does it break down, and what connects those patterns?
- Tag subdomain evidence inline as it appears: (1a), (1b), (3a Third Turn), (4b Craft), etc. Do not score subdomains separately. The tags are a vocabulary for precision, not a scoring obligation.
- If you notice a pattern that spans domains, name it where it first appears and reference it when it recurs.
- Where a growth trajectory is visible, name it explicitly and distinguish it from inconsistency.
- If a subdomain has insufficient evidence, note it in one sentence within the narrative.
Developmental scaffolding for domains below Highly Effective. For any domain rated Introductory, Developing, or Effective, add the following immediately after the domain narrative:
- What the next level looks like: One concrete example of what Effective or Highly Effective practice looks like in this domain — a specific behavior or move the user can picture, not an abstract description of the rating level. Make it vivid enough to serve as a mental model.
- The missing step: One to two sentences naming a specific action the user should have taken in their actual practice, based on evidenced gaps. Reference what they did and what the stronger move would have been. This should read as a coach pointing at a specific moment, not a general recommendation.
---
What Matters Next
Name two to three areas where targeted development would most improve this user's co-intelligence practice. For each:
- Name the pattern.
- Classify it: skill gap (does not yet have the capability), experience gap (has not yet had the opportunity), or habit gap (has the capability but does not apply it reliably).
- State the evidence type: demonstrated limitation (they had the chance and missed it) or unobserved capability (the opportunity never arose). Only demonstrated limitations belong as named development priorities; list unobserved capabilities as open questions, not priorities.
After the development priorities, close the report by transitioning directly from assessment into training. Address the user and name the most actionable What Matters Next item. Then identify a specific type of project or recurring work context from the user's evidence base and offer a ready-to-paste prompt that launches a coaching session on that gap using their own work. The prompt must not just analyze the gap — it must start active skill-building. The report ends here.
---
PRIVACY RULES
This report may be shared. Protect the user's privacy:
- Do not reference specific project names, company names, business ideas, or confidential content.
- Describe patterns, not episodes. Do not narrate what happened in any single conversation.
- When in doubt, abstract up.
---
Begin the assessment now. Review my conversation history and produce the report.
Reading Your Results
The IDEH scale describes patterns of practice, not fixed traits. Introductory does not mean unintelligent. Developing does not mean failing. The scale describes what is currently visible in your collaboration behavior — and behavior is exactly what changes with deliberate practice.
The most useful part of the report is usually not your ratings — it is the developmental scaffolding in each domain and the closing training prompt. That is where the assessment converts from a snapshot into a starting point.
If something in the report feels wrong, that friction is worth examining. If something stings, sit with it before dismissing it.
Share Your Feedback
CIRA is in active development. If you run the assessment, we want to hear what you found — what worked, what felt off, where the report missed something important, and what was most useful. Your feedback directly shapes the next version.
Reach out at wil@whatiflove.org.