
AI-Native Operating

Capabilities, world models, honest signal, and operating with intelligence in the system.

# Course 12 Content Spec: AI-Native Operating

## 1. Title and source files used

Course title: Project Agni Course 12: AI-Native Operating
Subtitle: The Organization as an Intelligence
Tagline: Operating as if intelligence lives in the system

Source files used

  • 12-ai-native-operating/curriculum.md
  • 12-ai-native-operating/website-prompt.md

Purpose of this file

  • Expand the approved curriculum into a delivery-ready course-content specification.
  • Preserve the curriculum as the source of truth while turning it into a teachable, assessable, LLM-gradeable program.
  • Define the exact learning flow, artifact requirements, facilitation moves, and evaluation logic needed to run the course without additional design work.

## 2. Design decisions at the top

  1. The course teaches operating behavior, not AI enthusiasm. The content avoids generic AI transformation talk and centers on concrete operating disciplines: capability mapping, signal quality, artifact design, and failure-signal capture.
  2. The organizing metaphor is a living system architecture. This follows the website prompt and helps students reason in layers: capabilities, world model, intelligence layer, interfaces. Every module returns to these four blocks.
  3. Artifacts are the curriculum. Students are assessed on durable operating artifacts, not recall. Every session produces portfolio material that can be reviewed by humans and LLMs.
  4. Capability-first thinking is enforced. Students must separate primitives from products, and composable capabilities from one-off features. This is a core grading distinction.
  5. Signal honesty is treated as a design constraint. The course repeatedly asks: what data represents actual consequential behavior, what is merely performative, and what compounds over time?
  6. The course is implementation-ready for mixed modality. Each session includes async preparation, live facilitation, and post-session artifact work so it can run as a cohort intensive or a team enablement program.
  7. Assessment is evidence-bound. LLM grading is only allowed to score what is present in the student artifact. Missing evidence lowers confidence and score; the evaluator must not infer competence from tone or jargon.
  8. Feedback must be behavior-changing. Rubrics and evaluator prompts are designed to produce revision guidance that improves organizational practice, not just a grade.
  9. The website aesthetic informs tone, not content structure. The site may feel like a mission briefing or system diagram, but the course file stays operational, explicit, and classroom-usable.

## 3. Delivery model assumptions

Program format

  • 4-session intensive
  • Each session includes:
    • 45-60 min async prep
    • 150 min live workshop
    • 60-90 min post-session artifact work
  • Total learner time: approximately 17-20 hours

Recommended cadence

  • Option A: 2 weeks, 2 sessions per week
  • Option B: 4 weeks, 1 session per week
  • Option C: internal team sprint, 2 consecutive days with combined async work between blocks

Audience

  • Operators, founders, product leaders, functional leads, chiefs of staff, strategy leads, transformation leads
  • Works best for students who can examine a real organization they belong to or know well

Cohort size

  • Ideal: 12-30
  • Minimum viable: 6
  • Maximum without extra facilitation: 36

Facilitation staffing

  • 1 lead facilitator can run up to 20 students
  • 1 lead facilitator + 1 teaching assistant recommended for 21-36 students

Delivery environment

  • Video platform with breakout rooms
  • Shared whiteboard or diagramming tool
  • Shared artifact repository or LMS
  • Structured submission template in markdown, doc, or form fields

Submission assumptions

  • All student artifacts should use standardized headings so human and LLM evaluation can read them consistently
  • Preferred submission format: one portfolio document with clearly labeled sections for all five capstone artifacts
  • Students should cite evidence from their organization using anonymized but concrete examples

Prerequisite assumption

  • Course 11 is recommended but not required
  • Students unfamiliar with DRI concepts should receive a 10-minute primer in prework

Facilitator stance

  • Push students away from buzzwords and toward operational specifics
  • Challenge category errors, especially when students confuse interfaces with capabilities or surveys with honest signal
  • Reward clarity, specificity, and evidence over polish

## 4. Detailed course content by module, session, lesson, activity, timing, facilitator moves, and student outputs

Course-wide learning outcomes

By the end of the course, students will be able to:

  • Map an organization using the four-block model rather than an org-chart model
  • Distinguish capability primitives from product bundles and delivery interfaces
  • Audit signal quality and identify honest, consequential, compounding data sources
  • Design model-ready artifacts that preserve decisions, reasoning, outcomes, and systemic signal
  • Reframe roadmap decisions as compositions, failed compositions, and capability gaps
  • Produce a portfolio that can guide actual changes in organizational operating practice

Standard artifact template requirements used in every session

Every student artifact should include:

  • Context
  • Observed reality
  • Interpretation
  • Implications
  • Evidence or examples
  • Open questions

This structure makes the work teachable, revisable, and LLM-assessable.
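
A minimal sketch of this template in use; the six headings come from the course spec, while the bracketed inline notes are illustrative assumptions:

```text
## Context
[One paragraph: the organization, team, and situation being examined]

## Observed reality
[What actually happens today, with concrete anonymized examples]

## Interpretation
[What the observations mean in four-block terms]

## Implications
[What should change in operating practice as a result]

## Evidence or examples
[Specific decisions, signals, or artifacts that support the interpretation]

## Open questions
[What remains unknown or still needs validation]
```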


### Module 1 / Session 1: The Four Building Blocks: What Your Organization Actually Has

Module purpose

  • Establish the core architecture of AI-native operating
  • Force students to distinguish between capabilities, world model, intelligence layer, and interfaces
  • Surface where organizational intelligence currently resides

Session outcome

  • Students complete a first-pass 4-block organizational map with at least 3 critical gaps identified

Prework (45-60 min)

  • Read the session brief on Block's four building blocks
  • Review two sample company descriptions and classify elements into the four blocks
  • Write a short memo: "Where does intelligence currently live in my organization?"

Live workshop (150 min)

| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Framing the architecture | Opening provocation and model walkthrough | 20 min | Open with the course tagline and ask students to define what their organization genuinely understands that would be hard for an outsider to understand. Teach the four blocks with 2-3 concrete examples. Reiterate that the model is functional, not hierarchical. | Notes on the four blocks and a working definition of organizational intelligence |
| 2. Capability vs. product distinction | Sorting exercise | 20 min | Present examples such as payments, payroll, dashboard, merchant app, underwriting model, account opening, customer portal. Push students to justify each placement. Correct category errors publicly. | Annotated sort showing which items are primitives, bundles, or interfaces |
| 3. Where intelligence lives now | Individual mapping draft | 25 min | Instruct students to map a real organization using the four-block framework. Require at least one concrete example per block or an explicit "missing." Circulate and press for specificity. | Draft 4-block map |
| 4. Diagnosis and critique | Pair review | 20 min | Give peers a critique protocol: "What is mislabeled? Where is a person substituting for a system? Where is intelligence trapped in a layer?" | Peer critique notes and 2 revisions to map |
| 5. Roadmap as limiting factor | Mini-lecture and discussion | 20 min | Contrast roadmap-first operating with capability-and-composition operating. Use the curriculum line that the roadmap is often the bottleneck between reality and response. | Reflection on current roadmap bottlenecks |
| 6. Gap identification | Guided refinement | 25 min | Require students to mark at least 3 gaps: missing capability, missing world model visibility, missing intelligence layer behavior, or interface overinvestment. Ask which gap is most systemically constraining. | Revised map with highlighted gaps |
| 7. Share-out and synthesis | Debrief | 20 min | Select 3-4 students with different org types. Summarize recurring errors: treating an interface as value creation, treating a team as a world model, calling a feature a capability. | Debrief notes and prioritized gap list |

Facilitator emphasis points

  • A capability has no necessary UI and can support multiple future compositions.
  • A world model can be partial, fragmented, stale, person-dependent, or absent.
  • An intelligence layer is not a dashboard, workflow, or PM ritual; it is the logic that recognizes moments and composes responses.
  • Interfaces are important, but they are not where the deepest defensibility lives.

Post-session assignment (60-75 min)

  • Finalize the 4-Block Organizational Map (a structural sketch follows below)
  • Add:
    • 3 examples of true capabilities
    • 3 examples of product bundles or interfaces often mistaken for capabilities
    • 3 highest-cost gaps in the current system
    • 1 paragraph on where intelligence currently lives and why that is a problem
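
A minimal sketch of how the finalized map might be laid out; the block names come from the course, and every bracketed entry is a hypothetical placeholder:

```text
## Capabilities (reusable primitives)
- [e.g., payments processing: supports multiple current and future compositions]
- [mark "missing" explicitly if a block has no real entry]

## World model
- [the information the organization maintains about its environment, and how fresh it is]

## Intelligence layer
- [the logic, human or automated, that recognizes moments and composes responses]

## Interfaces
- [apps, dashboards, portals: delivery surfaces, not value creation]

## Where intelligence currently lives
- [one paragraph, including human bottlenecks]

## Highest-cost gaps
- [gap 1: which block, and why it is systemically constraining]
```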

Required student output

  • Capstone Artifact 1 draft: 4-Block Organizational Map

### Module 2 / Session 2: Honest Signal: What Money Knows That Surveys Don't

Module purpose

  • Teach signal quality as an operating advantage
  • Help students identify which signals in their context are honest, consequential, and compounding
  • Move teams away from self-report and vanity data as primary truth

Session outcome

  • Students produce a signal inventory with explicit honesty and compounding ratings, plus one newly designed signal pathway

Prework (45-60 min)

  • Read the signal quality spectrum
  • Gather 5-10 data sources their organization uses to make decisions
  • Bring one example of a decision made from weak or performative signal

Live workshop (150 min)

| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Signal honesty frame | Concept lecture | 20 min | Teach the difference between stated preference and revealed preference. Use the website prompt spectrum language. Clarify that "honest" means consequential, not morally pure. | Personal notes and reclassification of current signals |
| 2. Signal spectrum calibration | Group placement exercise | 20 min | Place sample signals on a spectrum: NPS, survey intent, login frequency, repeat purchase, refund rate, repayment behavior, time-to-resolution. Ask why each belongs where it does. | Signal spectrum worksheet |
| 3. Inventory build | Individual work | 25 min | Students list all current organizational signals, the decision each informs, its honesty rating, compounding rating, and known blind spots. Press them to include ignored signals, not just official metrics. | Draft signal inventory table |
| 4. Compounding flywheel | Mini-lecture and application | 20 min | Explain richer signal -> better model -> more relevant action -> richer signal. Ask what would have to happen for their best signal to compound rather than remain static. | Flywheel sketch for one signal loop |
| 5. Pathway design | New signal design lab | 25 min | Students design one new honest signal pathway their organization does not currently capture. Require: trigger, collection method, storage format, ethical boundary, operating use. | New signal pathway draft |
| 6. Red team review | Small-group challenge | 20 min | Ask peers to find weak assumptions: Is the signal really consequential? Can it be gamed? Does it accumulate? Does it change behavior or just describe behavior? | Critique notes and revised ratings |
| 7. Synthesis | Debrief and scoring norms | 20 min | Summarize common mistakes: overrating surveys, mistaking activity for value, confusing observability with honesty. Preview how the artifact will be graded. | Final priorities for top 3 signals |

Facilitator emphasis points

  • Financial signal is exemplary because it carries consequences, but other domains can have honest signal if actions are costly or committed.
  • A compounding signal gets more useful as history accumulates.
  • A signal pathway is not just a metric definition; it includes capture, structure, reuse, and actionability.

Post-session assignment (75-90 min)

  • Complete the Signal Inventory
  • Required fields per signal:
    • Signal name
    • Source
    • Example event
    • Decision currently influenced
    • Honesty score 1-5
    • Compounding score 1-5
    • Risk of gaming
    • Current owner
    • Underused opportunity
  • Design one new signal pathway (example sketch below) with:
    • trigger condition
    • collection mechanism
    • storage fields
    • review cadence
    • decision(s) it should influence
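
A minimal sketch of one completed pathway entry; every specific value below is hypothetical:

```text
Signal pathway: post-onboarding feature abandonment
Trigger condition: a new account completes setup but takes no core action within 14 days
Collection mechanism: event logged automatically from product analytics, no self-report
Storage fields: account_id (anonymized), setup_date, last_action_date, segment, owner
Review cadence: weekly, reviewed alongside retention decisions
Decisions influenced: onboarding redesign priority, success-team outreach triggers
Gaming risk: low, because the signal is behavioral and costly to fake
```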

Required student output

  • Capstone Artifact 2 draft: Signal Inventory

### Module 3 / Session 3: Artifact Discipline: Feeding the Model

Module purpose

  • Reframe documentation as model fuel
  • Build decision logging habits that preserve reasoning and downstream signal
  • Improve artifact quality so organizational intelligence compounds instead of evaporates

Session outcome

  • Students adopt a decision-log format, audit recent artifacts, and convert weak documentation into model-ready entries

Prework (45-60 min)

  • Collect 3-5 recent work artifacts: meeting notes, project docs, tickets, email threads, Slack threads, commits, or strategy memos
  • Mark which artifact led to a consequential decision
  • Read the course decision-log template

Live workshop (150 min)

| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Artifact philosophy | Opening lecture | 20 min | Teach the Block principle that remote-first artifacts are not archive residue; they are the operating substrate. Ask what work disappears today because it is not captured well. | Artifact loss list |
| 2. Model-ready artifact criteria | Criteria walkthrough | 20 min | Teach the four criteria: decision recorded, context preserved, outcome linked, machine-readable structure. Show one strong and one weak example. | Criteria checklist |
| 3. Artifact audit | Individual audit | 25 min | Students examine their collected artifacts and score each against the four criteria. Push them to identify what is missing, not whether the artifact is "good enough." | Artifact audit table |
| 4. Decision log conversion | Rewrite lab | 30 min | Students convert one weak artifact into the decision-log template. Require alternatives considered and explicit reasoning, not post-hoc rationalization. | One rewritten decision log |
| 5. Failure postmortems as signal | Mini-lecture and applied practice | 20 min | Teach postmortem structure: what happened, why, what it signals, what capability is missing. Reject blame-focused narratives. | Postmortem skeleton for one failure |
| 6. Peer review | Structured critique | 15 min | Peers review whether the artifact is actually machine-readable, whether outcome and signal are distinct, and whether context is concrete enough for future inference. | Revision notes |
| 7. Operationalization | Habit design | 20 min | Ask students to specify where these logs will live, who must write them, when they are updated, and how they feed future decision-making. | Operating plan for artifact discipline |

Facilitator emphasis points

  • Documentation is weak if it records that a meeting occurred but not what changed.
  • "Outcome" and "signal" are different; an outcome is what happened, a signal is what the system should learn.
  • Machine-readable does not require a database; it requires stable structure and retrievable fields.

Post-session assignment (60-90 min)

  • Submit:
    • Artifact audit of at least 5 recent artifacts
    • At least 6 decision-log entries covering the last 2 weeks or the nearest equivalent working period
    • 1 structured postmortem entry for a failure, miss, escalation, or reversal
  • Use the following required decision-log fields (see the example entry below):
    • Date
    • Decision
    • Context
    • Alternatives considered
    • Reasoning
    • Outcome
    • Signal
    • Follow-up owner
    • Review date
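
A minimal sketch of a single decision-log entry using the required fields; the scenario and all values are hypothetical, and the stable field structure is what makes the entry machine-readable:

```text
Date: 2025-03-04
Decision: Pause the self-serve pricing experiment for mid-market accounts
Context: Conversion held steady but refund requests doubled in the experiment cohort
Alternatives considered: (1) continue and monitor, (2) narrow to small accounts only
Reasoning: Refund behavior is a consequential signal; survey feedback was positive but
  contradicted by revealed behavior, so the refunds were weighted more heavily
Outcome: Experiment paused; refund rate returned to baseline within two weeks
Signal: Stated preference diverged from revealed preference for this segment
Follow-up owner: Pricing lead
Review date: 2025-04-01
```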

Required student output

  • Capstone Artifact 3 draft: Artifact Discipline Log

### Module 4 / Session 4: The Intelligence Layer: Composing Without Product Managers

Module purpose

  • Help students think in compositions, moments, and capability gaps
  • Teach failure-signal roadmap generation
  • Conclude with an integrated operating-principles document

Session outcome

  • Students complete a composition challenge analysis and a set of operating principles for AI-native operating in their context

Prework (45-60 min)

  • Identify one real customer, user, stakeholder, or operational moment that was underserved
  • Bring any known constraints, failed workarounds, or roadmap debates related to that moment
  • Review prior session artifacts

Live workshop (150 min)

| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Intelligence layer explained | Scenario walkthrough | 20 min | Teach the restaurant cash-flow example from the curriculum. Stress that no PM would have had to pre-spec the full solution if the capabilities and model existed. | Notes on composition logic |
| 2. Moment recognition | Pattern-identification exercise | 20 min | Students define the moment they selected: what changed, what signal would indicate it, why it matters now, what the desired intervention is. | Moment definition |
| 3. Capability composition | Composition mapping | 25 min | Students list the capabilities required to respond effectively. Challenge anything that is actually a UI element, policy, or team. | Composition map |
| 4. Failure-signal roadmap | Gap analysis | 25 min | Ask: if the intelligence layer tried to compose this solution today, where would it fail? What missing capability or missing model element would cause the failure? | Failure-signal map |
| 5. Portfolio integration | Operating principles drafting | 20 min | Students synthesize all prior artifacts into 5-7 operating principles. Each principle must link to one observed reality and one operating behavior. | Draft operating principles |
| 6. Executive readout practice | 3-minute briefing | 20 min | Students practice presenting the composition challenge and the highest-leverage capability gap as if briefing an exec team. Facilitate blunt feedback on clarity. | Executive briefing notes |
| 7. Closing critique | Whole-group synthesis | 20 min | Summarize recurring weak patterns: feature-thinking, local optimization, unclear moment definition, no compounding logic. End with revision priorities for final submission. | Final revision checklist |

Facilitator emphasis points

  • A strong composition challenge produces a reusable capability insight, not a one-off feature request.
  • Failure signals are productive. They show what the organization cannot yet compose.
  • The final principles should describe how the organization should operate, not generic values it claims to hold.

Post-session assignment (75-90 min)

  • Finalize:
    • Composition Challenge Analysis
    • Operating Principles Document
    • Integrated capstone portfolio

Required student output

  • Capstone Artifact 4: Composition Challenge Analysis
  • Capstone Artifact 5: Operating Principles Document

Final capstone integration session guidance

If instructors want a formal portfolio review block, add an optional 60-minute capstone clinic after Session 4:

  • 15 min self-check against rubric
  • 20 min peer review
  • 15 min facilitator office hours
  • 10 min submission QA for formatting and completeness

## 5. Assignments and artifacts

Required capstone portfolio components

  1. 4-Block Organizational Map
     • Required elements:
       • one labeled section for each of the four blocks
       • at least 3 examples of true capabilities
       • at least 3 identified gaps
       • explanation of where intelligence currently lives
     • Evidence expectation:
       • examples must come from actual operating reality, not aspiration
  2. Signal Inventory
     • Required elements:
       • minimum 8 signal sources
       • honesty and compounding scores
       • one underutilized signal called out
       • one new signal pathway designed
     • Evidence expectation:
       • each signal tied to a real decision or missed decision
  3. Artifact Discipline Log
     • Required elements:
       • artifact audit for at least 5 recent artifacts
       • at least 6 decision logs
       • at least 1 structured postmortem
     • Evidence expectation:
       • decision logs must include reasoning and outcome, not just summary
  4. Composition Challenge Analysis
     • Required elements:
       • customer or operational moment description
       • signals that should identify the moment
       • capabilities required
       • missing capability or world-model gap
       • explicit failure signal and resulting roadmap implication
     • Evidence expectation:
       • the missing capability should plausibly enable more than one future composition
  5. Operating Principles Document
     • Required elements:
       • 5-7 principles
       • rationale for each principle
       • one operating behavior per principle
       • one artifact or signal implication per principle
     • Evidence expectation:
       • principles should emerge from portfolio findings, not generic management language

Submission packaging requirements

  • Preferred format: one markdown or document file with these exact top-level headings (a skeleton appears below):
    • Artifact 1 - 4-Block Organizational Map
    • Artifact 2 - Signal Inventory
    • Artifact 3 - Artifact Discipline Log
    • Artifact 4 - Composition Challenge Analysis
    • Artifact 5 - Operating Principles Document
  • Students should anonymize confidential names but preserve the logic of the example
  • Tables are encouraged for signal inventory and artifact audit
  • Diagrams may be embedded as images, but each must have a short text explanation
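
A minimal skeleton of the portfolio file using the exact required headings; the bracketed notes indicate what goes in each section:

```text
# Artifact 1 - 4-Block Organizational Map
[map, capability examples, gaps, intelligence-location paragraph]

# Artifact 2 - Signal Inventory
[signal table with ratings, underutilized signal, new pathway design]

# Artifact 3 - Artifact Discipline Log
[artifact audit, decision-log entries, structured postmortem]

# Artifact 4 - Composition Challenge Analysis
[moment, identifying signals, required capabilities, failure signal, roadmap implication]

# Artifact 5 - Operating Principles Document
[5-7 principles, each with rationale, behavior, and artifact/signal implication]
```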

Instructor review checkpoints

  • After Session 1: review whether students are distinguishing capabilities from products
  • After Session 2: review whether students are overvaluing self-report data
  • After Session 3: review whether decision logs contain actual reasoning and outcome linkage
  • After Session 4: review whether the composition challenge identifies a reusable capability gap

## 6. AI/LLM grading and assessment framework

Assessment philosophy

The LLM should act as a disciplined evidence evaluator, not an inspirational coach and not a mind reader. It should score only what the portfolio demonstrates.

Grading model

Total score: 100 points

Artifact weights

  • Artifact 1: 20 points
  • Artifact 2: 20 points
  • Artifact 3: 20 points
  • Artifact 4: 20 points
  • Artifact 5: 20 points

Portfolio-level evaluation dimensions

Across all artifacts, the evaluator should score these dimensions implicitly and reference them in feedback:

  • Analytical rigor
  • Capability-first thinking
  • Signal honesty
  • Artifact quality
  • Composition thinking
  • Operating specificity

Recommended evaluation workflow

  1. Completeness pass
     • Confirm all five artifacts are present
     • Confirm minimum required fields are present
     • Mark missing sections before scoring quality
  2. Artifact scoring pass
     • Score each artifact using its rubric
     • Extract 2-4 quoted evidence snippets or paraphrased evidence points from the submission
     • Note confidence level: high, medium, low
  3. Cross-artifact coherence pass
     • Check whether the signal inventory supports the composition challenge
     • Check whether operating principles are grounded in earlier artifacts
     • Check whether claimed capabilities remain consistent across artifacts
  4. Feedback generation pass
     • Produce strengths, weaknesses, and revision priorities
     • Give next-step advice tied to the weakest artifact

Concrete LLM evaluation heuristics

Artifact 1 heuristics: 4-Block Organizational Map

  • Score high when:
    • capabilities are framed as reusable primitives
    • interfaces are explicitly separated from deeper value-creation layers
    • the world model is described as an information system, not merely a team or meeting
    • intelligence location is diagnosed clearly, including human bottlenecks
  • Score down when:
    • product features, dashboards, or apps are mislabeled as capabilities
    • the map looks like an org chart rather than a functional architecture
    • gaps are generic, such as "need better AI"
    • there is no explanation of where intelligence currently lives
  • Hard cap rule:
    • if the artifact never distinguishes capabilities from products, maximum score is 10/20

Artifact 2 heuristics: Signal Inventory

  • Score high when:
    • consequential signals are prioritized over opinion-based signals
    • compounding potential is explained, not just numerically rated
    • the new pathway includes capture, storage, and decision use
    • gaming risk is acknowledged realistically
  • Score down when:
    • surveys, NPS, or intent statements are rated as the highest-quality signals without justification
    • there are fewer than 8 signals
    • the new pathway is just "collect more feedback"
    • no decision linkage is shown
  • Hard cap rule:
    • if most top-rated signals are self-report or vanity metrics, maximum score is 9/20

Artifact 3 heuristics: Artifact Discipline Log

  • Score high when:
    • decision logs include real alternatives and explicit reasoning
    • outcomes are linked to subsequent signal or review points
    • the postmortem identifies systemic learning, not individual blame
    • the student proposes a sustainable operating habit, not a one-time cleanup
  • Score down when:
    • logs are retrospective summaries with no decision tension
    • outcome fields say "TBD" for most entries without review dates
    • postmortem language is blame-centric or vague
    • there is no machine-readable structure
  • Hard cap rule:
    • if fewer than 4 decision logs are present, maximum score is 8/20

Artifact 4 heuristics: Composition Challenge Analysis

  • Score high when:
    • the moment is concrete and triggered by observable signal
    • the required capabilities are reusable and clearly named
    • the failure signal is plausible and roadmap-generating
    • the student distinguishes missing capability from missing interface polish
  • Score down when:
    • the artifact is basically a feature pitch
    • the moment is vague or hypothetical without signal evidence
    • the "missing capability" is actually a person, project, or UI request
    • no failure signal is defined
  • Hard cap rule:
    • if the submission does not identify a reusable capability gap, maximum score is 10/20

Artifact 5 heuristics: Operating Principles Document

  • Score high when:
    • each principle is specific enough to change behavior
    • each principle links back to evidence from earlier artifacts
    • principles describe how the organization should operate, not what it hopes to value
    • each principle has an artifact or signal implication
  • Score down when:
    • principles read like generic culture statements
    • the rationale is detached from earlier analysis
    • no behavior or instrumentation change is specified
    • multiple principles are redundant
  • Hard cap rule:
    • if principles are not grounded in the preceding portfolio, maximum score is 11/20
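
The hard cap rules above compose mechanically with the rubric scores. A minimal sketch of that aggregation in Python, assuming the evaluator has already produced per-artifact rubric scores and detected each cap condition as a boolean flag; the function and flag names are illustrative, not part of the spec:

```python
# Minimal sketch: apply the hard cap rules to per-artifact rubric scores,
# then sum to the 100-point overall score. Cap values and reasons come from
# the heuristics above; everything else is an illustrative assumption.

HARD_CAPS = {
    "organizational_map": (10, "never distinguishes capabilities from products"),
    "signal_inventory": (9, "top-rated signals are self-report or vanity metrics"),
    "artifact_discipline_log": (8, "fewer than 4 decision logs present"),
    "composition_challenge": (10, "no reusable capability gap identified"),
    "operating_principles": (11, "principles not grounded in the portfolio"),
}

def aggregate(rubric_scores: dict[str, int], cap_triggered: dict[str, bool]) -> dict:
    """Cap each artifact score where its hard cap rule fired, then total."""
    capped = {}
    notes = []
    for artifact, score in rubric_scores.items():
        cap, reason = HARD_CAPS[artifact]
        if cap_triggered.get(artifact, False) and score > cap:
            notes.append(f"{artifact}: capped at {cap}/20 ({reason})")
            score = cap
        capped[artifact] = score
    return {
        "overall_score": sum(capped.values()),  # five artifacts x 20 points = 100
        "artifact_scores": capped,
        "notes_for_human_reviewer": notes,
    }
```

The returned keys mirror the structured grading output shape shown below.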

Confidence calibration guidance for the LLM

  • High confidence: detailed evidence, multiple concrete examples, consistent terminology across artifacts
  • Medium confidence: mostly complete submission with some vague or inconsistent sections
  • Low confidence: sparse evidence, high abstraction, or obvious template-filling with little operational specificity

Suggested structured grading output format

```json
{
  "overall_score": 0,
  "artifact_scores": {
    "organizational_map": 0,
    "signal_inventory": 0,
    "artifact_discipline_log": 0,
    "composition_challenge": 0,
    "operating_principles": 0
  },
  "confidence": "high|medium|low",
  "strengths": [],
  "weaknesses": [],
  "revision_priorities": [],
  "evidence_used": [],
  "notes_for_human_reviewer": []
}
```

## 7. Rubrics, scoring criteria, and evaluator prompt guidance

### Master scoring scale

Use the following shared scale for each artifact:
- `18-20` Excellent: precise, evidence-rich, operationally credible, and aligned with course concepts
- `14-17` Strong: mostly rigorous, some gaps in depth or precision, but materially useful
- `10-13` Adequate: demonstrates partial understanding with notable category errors or thin evidence
- `6-9` Weak: superficial, generic, or structurally incomplete
- `0-5` Insufficient: missing major elements or fundamentally misunderstands the concept

### Artifact-specific rubric criteria

#### Artifact 1 rubric: 4-Block Organizational Map

| Criterion | Points | What earns full credit |
|---|---:|---|
| Correct use of four-block framework | 5 | All four blocks are clearly defined and populated or explicitly marked as missing |
| Capability identification quality | 5 | Capabilities are true primitives and separated from products/interfaces |
| Diagnosis of where intelligence lives | 5 | Current intelligence location is concrete, credible, and explains bottlenecks |
| Gap identification and prioritization | 5 | At least 3 meaningful gaps with clear systemic implications |

#### Artifact 2 rubric: Signal Inventory

| Criterion | Points | What earns full credit |
|---|---:|---|
| Honest signal identification | 6 | High-value signals are consequential and clearly justified |
| Compounding analysis | 4 | Student explains which signals grow in value over time and why |
| Decision linkage | 5 | Each signal is tied to a real decision or operating use |
| New signal pathway design | 5 | Pathway is specific, feasible, and operationally meaningful |

#### Artifact 3 rubric: Artifact Discipline Log

| Criterion | Points | What earns full credit |
|---|---:|---|
| Artifact audit quality | 4 | Audit diagnoses strengths and missing model-ready elements accurately |
| Decision log rigor | 8 | Entries include context, alternatives, reasoning, outcome, and signal |
| Postmortem quality | 4 | Failure analysis is systemic and learning-oriented |
| Operationalization plan | 4 | Student specifies how this discipline will persist in real work |

#### Artifact 4 rubric: Composition Challenge Analysis

| Criterion | Points | What earns full credit |
|---|---:|---|
| Moment definition | 4 | The moment is real, consequential, and signal-detectable |
| Capability composition logic | 6 | Required capabilities are reusable, well-scoped, and coherent |
| Failure-signal definition | 5 | Missing element and resulting failure signal are plausible and clear |
| Roadmap implication | 5 | The analysis shows how failure should generate future capability priorities |

#### Artifact 5 rubric: Operating Principles Document

| Criterion | Points | What earns full credit |
|---|---:|---|
| Specificity of principles | 5 | Principles are concrete and behavior-shaping |
| Grounding in portfolio evidence | 5 | Principles clearly emerge from prior artifacts |
| Operational implications | 5 | Each principle changes artifacts, signals, review habits, or decision rights |
| Coherence and non-redundancy | 5 | Principles fit together as a usable operating model |

### Portfolio evaluator prompt

Use this prompt when grading the full portfolio:

```text
You are evaluating a student portfolio for Project Agni Course 12: AI-Native Operating.

Your job is to grade only the evidence present in the submission. Do not assume competence, context, or implementation details that are not stated. Penalize category errors such as confusing products with capabilities, dashboards with intelligence layers, or self-report with honest signal.

Score each of the five artifacts out of 20 using the supplied rubric criteria:
1. 4-Block Organizational Map
2. Signal Inventory
3. Artifact Discipline Log
4. Composition Challenge Analysis
5. Operating Principles Document

Then produce:
- overall score out of 100
- confidence level: high, medium, or low
- 3 strengths grounded in evidence
- 3 weaknesses grounded in evidence
- 3 revision priorities ordered by leverage

Important evaluation rules:
- If an artifact is incomplete, score the artifact based on what is present and note missing sections.
- If products or interfaces are mislabeled as capabilities, reduce the Organizational Map score materially.
- If self-report or vanity metrics are rated as the strongest signals without strong justification, reduce the Signal Inventory score materially.
- If decision logs lack reasoning or outcome linkage, reduce the Artifact Discipline Log score materially.
- If the composition challenge is effectively a feature request, reduce the Composition Challenge score materially.
- If the operating principles read like generic values rather than operating rules grounded in the portfolio, reduce the Operating Principles score materially.

Return your answer in structured JSON using this shape:
{
  "overall_score": 0,
  "artifact_scores": {
    "organizational_map": 0,
    "signal_inventory": 0,
    "artifact_discipline_log": 0,
    "composition_challenge": 0,
    "operating_principles": 0
  },
  "confidence": "",
  "strengths": [],
  "weaknesses": [],
  "revision_priorities": [],
  "evidence_used": [],
  "notes_for_human_reviewer": []
}
```

### Single-artifact evaluator prompts

**Organizational Map**

```text
Evaluate this 4-block organizational map.
Score out of 20 using these criteria:
- correct use of the four-block framework
- capability identification quality
- diagnosis of where intelligence lives
- gap identification and prioritization

Flag any case where products, interfaces, teams, or dashboards are mislabeled as capabilities or intelligence layers.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```

**Signal Inventory**

```text
Evaluate this signal inventory.
Score out of 20 using these criteria:
- honest signal identification
- compounding analysis
- decision linkage
- new signal pathway design

Specifically identify whether the student overrates surveys, self-report, or vanity metrics.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```

**Artifact Discipline Log**

```text
Evaluate this artifact discipline submission.
Score out of 20 using these criteria:
- artifact audit quality
- decision log rigor
- postmortem quality
- operationalization plan

Flag whether decision logs preserve reasoning, whether outcomes are linked, and whether postmortems are systemic rather than blame-centric.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```

**Composition Challenge Analysis**

```text
Evaluate this composition challenge analysis.
Score out of 20 using these criteria:
- moment definition
- capability composition logic
- failure-signal definition
- roadmap implication

Flag whether the student has actually identified a reusable capability gap or has simply written a feature request.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```

**Operating Principles Document**

```text
Evaluate this operating principles document.
Score out of 20 using these criteria:
- specificity of principles
- grounding in portfolio evidence
- operational implications
- coherence and non-redundancy

Flag any principles that read like generic values statements rather than operating rules.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```

### Human-in-the-loop moderation guidance

- Send low-confidence evaluations to a human reviewer
- Require human review if the portfolio includes highly domain-specific regulated capabilities
- Require human review if the grader detects contradictions across artifacts but cannot resolve them
- Allow the human reviewer to override scores when real organizational context is clearly present but lightly documented

## 8. Feedback strategy: what strong/average/weak responses look like and how an LLM should respond

### Feedback design goals

- Make feedback diagnostic, not decorative
- Tie every critique to observable evidence or missing evidence
- Distinguish conceptual misunderstanding from incomplete execution
- Prioritize revisions that improve operating clarity and future decision quality

### What strong responses look like

**Characteristics**
- Uses the four-block framework correctly and consistently
- Separates capabilities from products and interfaces
- Identifies at least one overlooked, compounding, honest signal
- Writes decision logs with genuine alternatives and reasoning
- Frames the composition challenge around a real moment and a reusable capability gap
- Produces principles that clearly change how the organization should document, decide, and learn

**How the LLM should respond**
- Acknowledge the strongest analytical moves specifically
- Preserve the student's strongest distinctions so they are not revised away
- Suggest one higher-order improvement, such as tighter signal instrumentation or cross-artifact alignment
- Avoid generic praise; quote or paraphrase the evidence that justifies the score

**Example feedback posture**
- "Your strongest move is distinguishing merchant underwriting as a capability while correctly placing the merchant dashboard as an interface. That distinction holds across the portfolio and improves the quality of your composition analysis."

### What average responses look like

**Characteristics**
- Understands the course concepts in broad terms but slips into generic language
- Has some real examples but not enough evidence depth
- Mixes honest signals with weak proxies without fully defending the ratings
- Writes decision logs that capture context but only partially capture reasoning or outcome
- Identifies a plausible gap but not yet a clearly reusable capability

**How the LLM should respond**
- Name the partial success first
- Then isolate 2-3 concrete category errors or thin sections
- Give explicit revision instructions tied to the rubric
- Recommend the highest-leverage next move rather than many small edits

**Example feedback posture**
- "You are directionally using the four-block model well, but several items in your capability list still read as product bundles. Rework that section first; until the primitive layer is clean, the rest of the portfolio remains harder to evaluate."

### What weak responses look like

**Characteristics**
- Mostly abstract or slogan-heavy
- Treats the map like an org chart or tool list
- Overrates surveys, NPS, or qualitative opinion as primary truth without justification
- Logs decisions as summaries with no tension, alternatives, or learning
- Writes a feature request instead of a composition challenge
- Produces principles that sound like values posters

**How the LLM should respond**
- Be direct and specific
- State that the current artifact does not yet demonstrate the required course understanding
- Point to the exact missing elements
- Give a short revision sequence in the order that will unlock improvement
- Avoid shaming language and avoid pretending the work is stronger than it is

**Example feedback posture**
- "This submission does not yet show capability-first thinking. Most of the listed capabilities are interfaces or features. Revise the map before revising the composition challenge, because the current challenge inherits the same category error."

### LLM feedback rules

- Do not praise polish if analytical quality is weak
- Do not invent hidden strengths
- Do not soften essential corrections with vague positivity
- Do identify one thing worth preserving in any submission that is at least partially correct
- Do distinguish between `missing`, `unclear`, and `incorrect`

### Recommended feedback template for all evaluations

```text
Overall judgment:
[1-3 sentence summary of demonstrated understanding]

What is working:
- [evidence-based strength]
- [evidence-based strength]

What needs revision:
- [most important weakness]
- [second weakness]
- [third weakness]

Highest-leverage next step:
- [the one revision that most improves the portfolio]

Why this matters:
- [connect the revision to AI-native operating, not just the grade]
```

### Revision guidance by failure mode

**If the student confuses capabilities and products**
- Ask them to rewrite the map using the test: "Could this support multiple future experiences without being tied to one interface?"

**If the student overvalues weak signals**
- Ask them to rerank signals by consequence, cost of action, and resistance to gaming

**If decision logs are thin**
- Ask them to add one rejected alternative and one downstream signal for every log entry

**If the composition challenge is too feature-like**
- Ask them to remove UI language and restate the missing reusable capability

**If operating principles are generic**
- Ask them to tie each principle to one failure, one signal, and one artifact behavior from the portfolio

## Implementation notes for instructors

- Run this course with real organizational examples whenever possible; hypothetical examples should be allowed only when clearly marked
- Encourage anonymization but forbid total abstraction
- Use the website prompt's layered, system-architecture framing in slides and worksheets to reinforce transfer
- Keep the critique standard high; the course only works if category errors are corrected early