
AI-Native Operating

Capabilities, world models, honest signal, and operating with intelligence in the system.

# Course 12 Content Spec: AI-Native Operating

## 1. Title and source files used

Course title: Project Agni Course 12: AI-Native Operating
Subtitle: The Organization as an Intelligence
Tagline: Operating as if intelligence lives in the system

Source files used

  • 12-ai-native-operating/curriculum.md
  • 12-ai-native-operating/website-prompt.md

Purpose of this file

  • Expand the approved curriculum into a delivery-ready course-content specification.
  • Preserve the curriculum as the source of truth while turning it into a teachable, assessable, LLM-gradeable program.
  • Define the exact learning flow, artifact requirements, facilitation moves, and evaluation logic needed to run the course without additional design work.

## 2. Design decisions at the top

  1. The course teaches operating behavior, not AI enthusiasm. The content avoids generic AI transformation talk and centers on concrete operating disciplines: capability mapping, signal quality, artifact design, and failure-signal capture.
  2. The organizing metaphor is a living system architecture. This follows the website prompt and helps students reason in layers: capabilities, world model, intelligence layer, interfaces. Every module returns to these four blocks.
  3. Artifacts are the curriculum. Students are assessed on durable operating artifacts, not recall. Every session produces portfolio material that can be reviewed by humans and LLMs.
  4. Capability-first thinking is enforced. Students must separate primitives from products, and composable capabilities from one-off features. This is a core grading distinction.
  5. Signal honesty is treated as a design constraint. The course repeatedly asks: what data represents actual consequential behavior, what is merely performative, and what compounds over time?
  6. The course is implementation-ready for mixed modality. Each session includes async preparation, live facilitation, and post-session artifact work so it can run as a cohort intensive or a team enablement program.
  7. Assessment is evidence-bound. LLM grading is only allowed to score what is present in the student artifact. Missing evidence lowers confidence and score; the evaluator must not infer competence from tone or jargon.
  8. Feedback must be behavior-changing. Rubrics and evaluator prompts are designed to produce revision guidance that improves organizational practice, not just a grade.
  9. The website aesthetic informs tone, not content structure. The site may feel like a mission briefing or system diagram, but the course file stays operational, explicit, and classroom-usable.

## 3. Delivery model assumptions

Program format

  • 4-session intensive
  • Each session includes:
    • 45-60 min async prep
    • 150 min live workshop
    • 60-90 min post-session artifact work
  • Total learner time: approximately 17-20 hours

Recommended cadence

  • Option A: 2 weeks, 2 sessions per week
  • Option B: 4 weeks, 1 session per week
  • Option C: internal team sprint, 2 consecutive days with combined async work between blocks

Audience

  • Operators, founders, product leaders, functional leads, chiefs of staff, strategy leads, transformation leads
  • Works best for students who can examine a real organization they belong to or know well

Cohort size

  • Ideal: 12-30
  • Minimum viable: 6
  • Maximum without extra facilitation: 36

Facilitation staffing

  • 1 lead facilitator can run up to 20 students
  • 1 lead facilitator + 1 teaching assistant recommended for 21-36 students

Delivery environment

  • Video platform with breakout rooms
  • Shared whiteboard or diagramming tool
  • Shared artifact repository or LMS
  • Structured submission template in markdown, doc, or form fields

Submission assumptions

  • All student artifacts should use standardized headings so human and LLM evaluation can read them consistently
  • Preferred submission format: one portfolio document with clearly labeled sections for all five capstone artifacts
  • Students should cite evidence from their organization using anonymized but concrete examples

Prerequisite assumption

  • Course 11 is recommended but not required
  • Students unfamiliar with DRI concepts should receive a 10-minute primer in prework

Facilitator stance

  • Push students away from buzzwords and toward operational specifics
  • Challenge category errors, especially when students confuse interfaces with capabilities or surveys with honest signal
  • Reward clarity, specificity, and evidence over polish

## 4. Detailed course content by module, session, lesson, activity, timing, facilitator moves, and student outputs

Course-wide learning outcomes

By the end of the course, students will be able to:

  • Map an organization using the four-block model rather than an org-chart model
  • Distinguish capability primitives from product bundles and delivery interfaces
  • Audit signal quality and identify honest, consequential, compounding data sources
  • Design model-ready artifacts that preserve decisions, reasoning, outcomes, and systemic signal
  • Reframe roadmap decisions as compositions, failed compositions, and capability gaps
  • Produce a portfolio that can guide actual changes in organizational operating practice

Standard artifact template requirements used in every session

Every student artifact should include:

  • Context
  • Observed reality
  • Interpretation
  • Implications
  • Evidence or examples
  • Open questions

This structure makes the work teachable, revisable, and LLM-assessable.
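
A minimal sketch of this template in use; the six headings come from the course spec, while the bracketed inline notes are illustrative assumptions:

```text
## Context
[One paragraph: the organization, team, and situation being examined]

## Observed reality
[What actually happens today, with concrete anonymized examples]

## Interpretation
[What the observations mean in four-block terms]

## Implications
[What should change in operating practice as a result]

## Evidence or examples
[Specific decisions, signals, or artifacts that support the interpretation]

## Open questions
[What remains unknown or still needs validation]
```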


### Module 1 / Session 1: The Four Building Blocks: What Your Organization Actually Has

Module purpose

  • Establish the core architecture of AI-native operating
  • Force students to distinguish between capabilities, world model, intelligence layer, and interfaces
  • Surface where organizational intelligence currently resides

Session outcome

  • Students complete a first-pass 4-block organizational map with at least 3 critical gaps identified

Prework (45-60 min)

  • Read the session brief on Block's four building blocks
  • Review two sample company descriptions and classify elements into the four blocks
  • Write a short memo: "Where does intelligence currently live in my organization?"

Live workshop (150 min)

| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Framing the architecture | Opening provocation and model walkthrough | 20 min | Open with the course tagline and ask students to define what their organization genuinely understands that would be hard for an outsider to understand. Teach the four blocks with 2-3 concrete examples. Reiterate that the model is functional, not hierarchical. | Notes on the four blocks and a working definition of organizational intelligence |
| 2. Capability vs. product distinction | Sorting exercise | 20 min | Present examples such as payments, payroll, dashboard, merchant app, underwriting model, account opening, customer portal. Push students to justify each placement. Correct category errors publicly. | Annotated sort showing which items are primitives, bundles, or interfaces |
| 3. Where intelligence lives now | Individual mapping draft | 25 min | Instruct students to map a real organization using the four-block framework. Require at least one concrete example per block or an explicit "missing." Circulate and press for specificity. | Draft 4-block map |
| 4. Diagnosis and critique | Pair review | 20 min | Give peers a critique protocol: "What is mislabeled? Where is a person substituting for a system? Where is intelligence trapped in a layer?" | Peer critique notes and 2 revisions to map |
| 5. Roadmap as limiting factor | Mini-lecture and discussion | 20 min | Contrast roadmap-first operating with capability-and-composition operating. Use the curriculum line that the roadmap is often the bottleneck between reality and response. | Reflection on current roadmap bottlenecks |
| 6. Gap identification | Guided refinement | 25 min | Require students to mark at least 3 gaps: missing capability, missing world model visibility, missing intelligence layer behavior, or interface overinvestment. Ask which gap is most systemically constraining. | Revised map with highlighted gaps |
| 7. Share-out and synthesis | Debrief | 20 min | Select 3-4 students with different org types. Summarize recurring errors: treating an interface as value creation, treating a team as a world model, calling a feature a capability. | Debrief notes and prioritized gap list |

Facilitator emphasis points

  • A capability has no necessary UI and can support multiple future compositions.
  • A world model can be partial, fragmented, stale, person-dependent, or absent.
  • An intelligence layer is not a dashboard, workflow, or PM ritual; it is the logic that recognizes moments and composes responses.
  • Interfaces are important, but they are not where the deepest defensibility lives.

Post-session assignment (60-75 min)

  • Finalize the 4-Block Organizational Map (a structural sketch follows below)
  • Add:
    • 3 examples of true capabilities
    • 3 examples of product bundles or interfaces often mistaken for capabilities
    • 3 highest-cost gaps in the current system
    • 1 paragraph on where intelligence currently lives and why that is a problem
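
A minimal sketch of how the finalized map might be laid out; the block names come from the course, and every bracketed entry is a hypothetical placeholder:

```text
## Capabilities (reusable primitives)
- [e.g., payments processing: supports multiple current and future compositions]
- [mark "missing" explicitly if a block has no real entry]

## World model
- [the information the organization maintains about its environment, and how fresh it is]

## Intelligence layer
- [the logic, human or automated, that recognizes moments and composes responses]

## Interfaces
- [apps, dashboards, portals: delivery surfaces, not value creation]

## Where intelligence currently lives
- [one paragraph, including human bottlenecks]

## Highest-cost gaps
- [gap 1: which block, and why it is systemically constraining]
```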

Required student output

  • Capstone Artifact 1 draft: 4-Block Organizational Map

### Module 2 / Session 2: Honest Signal: What Money Knows That Surveys Don't

Module purpose

  • Teach signal quality as an operating advantage
  • Help students identify which signals in their context are honest, consequential, and compounding
  • Move teams away from self-report and vanity data as primary truth

Session outcome

  • Students produce a signal inventory with explicit honesty and compounding ratings, plus one newly designed signal pathway

Prework (45-60 min)

  • Read the signal quality spectrum
  • Gather 5-10 data sources their organization uses to make decisions
  • Bring one example of a decision made from weak or performative signal

Live workshop (150 min)

| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Signal honesty frame | Concept lecture | 20 min | Teach the difference between stated preference and revealed preference. Use the website prompt spectrum language. Clarify that "honest" means consequential, not morally pure. | Personal notes and reclassification of current signals |
| 2. Signal spectrum calibration | Group placement exercise | 20 min | Place sample signals on a spectrum: NPS, survey intent, login frequency, repeat purchase, refund rate, repayment behavior, time-to-resolution. Ask why each belongs where it does. | Signal spectrum worksheet |
| 3. Inventory build | Individual work | 25 min | Students list all current organizational signals, the decision each informs, its honesty rating, compounding rating, and known blind spots. Press them to include ignored signals, not just official metrics. | Draft signal inventory table |
| 4. Compounding flywheel | Mini-lecture and application | 20 min | Explain richer signal -> better model -> more relevant action -> richer signal. Ask what would have to happen for their best signal to compound rather than remain static. | Flywheel sketch for one signal loop |
| 5. Pathway design | New signal design lab | 25 min | Students design one new honest signal pathway their organization does not currently capture. Require: trigger, collection method, storage format, ethical boundary, operating use. | New signal pathway draft |
| 6. Red team review | Small-group challenge | 20 min | Ask peers to find weak assumptions: Is the signal really consequential? Can it be gamed? Does it accumulate? Does it change behavior or just describe behavior? | Critique notes and revised ratings |
| 7. Synthesis | Debrief and scoring norms | 20 min | Summarize common mistakes: overrating surveys, mistaking activity for value, confusing observability with honesty. Preview how the artifact will be graded. | Final priorities for top 3 signals |

Facilitator emphasis points

  • Financial signal is exemplary because it carries consequences, but other domains can have honest signal if actions are costly or committed.
  • A compounding signal gets more useful as history accumulates.
  • A signal pathway is not just a metric definition; it includes capture, structure, reuse, and actionability.

Post-session assignment (75-90 min)

  • Complete the Signal Inventory
  • Required fields per signal:
    • Signal name
    • Source
    • Example event
    • Decision currently influenced
    • Honesty score 1-5
    • Compounding score 1-5
    • Risk of gaming
    • Current owner
    • Underused opportunity
  • Design one new signal pathway (example sketch below) with:
    • trigger condition
    • collection mechanism
    • storage fields
    • review cadence
    • decision(s) it should influence
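
A minimal sketch of one completed pathway entry; every specific value below is hypothetical:

```text
Signal pathway: post-onboarding feature abandonment
Trigger condition: a new account completes setup but takes no core action within 14 days
Collection mechanism: event logged automatically from product analytics, no self-report
Storage fields: account_id (anonymized), setup_date, last_action_date, segment, owner
Review cadence: weekly, reviewed alongside retention decisions
Decisions influenced: onboarding redesign priority, success-team outreach triggers
Gaming risk: low, because the signal is behavioral and costly to fake
```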

Required student output

  • Capstone Artifact 2 draft: Signal Inventory

### Module 3 / Session 3: Artifact Discipline: Feeding the Model

Module purpose

  • Reframe documentation as model fuel
  • Build decision logging habits that preserve reasoning and downstream signal
  • Improve artifact quality so organizational intelligence compounds instead of evaporates

Session outcome

  • Students adopt a decision-log format, audit recent artifacts, and convert weak documentation into model-ready entries

Prework (45-60 min)

  • Collect 3-5 recent work artifacts: meeting notes, project docs, tickets, email threads, Slack threads, commits, or strategy memos
  • Mark which artifact led to a consequential decision
  • Read the course decision-log template

Live workshop (150 min)

| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Artifact philosophy | Opening lecture | 20 min | Teach the Block principle that remote-first artifacts are not archive residue; they are the operating substrate. Ask what work disappears today because it is not captured well. | Artifact loss list |
| 2. Model-ready artifact criteria | Criteria walkthrough | 20 min | Teach the four criteria: decision recorded, context preserved, outcome linked, machine-readable structure. Show one strong and one weak example. | Criteria checklist |
| 3. Artifact audit | Individual audit | 25 min | Students examine their collected artifacts and score each against the four criteria. Push them to identify what is missing, not whether the artifact is "good enough." | Artifact audit table |
| 4. Decision log conversion | Rewrite lab | 30 min | Students convert one weak artifact into the decision-log template. Require alternatives considered and explicit reasoning, not post-hoc rationalization. | One rewritten decision log |
| 5. Failure postmortems as signal | Mini-lecture and applied practice | 20 min | Teach postmortem structure: what happened, why, what it signals, what capability is missing. Reject blame-focused narratives. | Postmortem skeleton for one failure |
| 6. Peer review | Structured critique | 15 min | Peers review whether the artifact is actually machine-readable, whether outcome and signal are distinct, and whether context is concrete enough for future inference. | Revision notes |
| 7. Operationalization | Habit design | 20 min | Ask students to specify where these logs will live, who must write them, when they are updated, and how they feed future decision-making. | Operating plan for artifact discipline |

Facilitator emphasis points

  • Documentation is weak if it records that a meeting occurred but not what changed.
  • "Outcome" and "signal" are different; an outcome is what happened, a signal is what the system should learn.
  • Machine-readable does not require a database; it requires stable structure and retrievable fields.

Post-session assignment (60-90 min)

  • Submit:
    • Artifact audit of at least 5 recent artifacts
    • At least 6 decision-log entries covering the last 2 weeks or the nearest equivalent working period
    • 1 structured postmortem entry for a failure, miss, escalation, or reversal
  • Use the following required decision-log fields (see the example entry below):
    • Date
    • Decision
    • Context
    • Alternatives considered
    • Reasoning
    • Outcome
    • Signal
    • Follow-up owner
    • Review date
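
A minimal sketch of a single decision-log entry using the required fields; the scenario and all values are hypothetical, and the stable field structure is what makes the entry machine-readable:

```text
Date: 2025-03-04
Decision: Pause the self-serve pricing experiment for mid-market accounts
Context: Conversion held steady but refund requests doubled in the experiment cohort
Alternatives considered: (1) continue and monitor, (2) narrow to small accounts only
Reasoning: Refund behavior is a consequential signal; survey feedback was positive but
  contradicted by revealed behavior, so the refunds were weighted more heavily
Outcome: Experiment paused; refund rate returned to baseline within two weeks
Signal: Stated preference diverged from revealed preference for this segment
Follow-up owner: Pricing lead
Review date: 2025-04-01
```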

Required student output

  • Capstone Artifact 3 draft: Artifact Discipline Log

### Module 4 / Session 4: The Intelligence Layer: Composing Without Product Managers

Module purpose

  • Help students think in compositions, moments, and capability gaps
  • Teach failure-signal roadmap generation
  • Conclude with an integrated operating-principles document

Session outcome

  • Students complete a composition challenge analysis and a set of operating principles for AI-native operating in their context

Prework (45-60 min)

  • Identify one real customer, user, stakeholder, or operational moment that was underserved
  • Bring any known constraints, failed workarounds, or roadmap debates related to that moment
  • Review prior session artifacts

Live workshop (150 min)

| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Intelligence layer explained | Scenario walkthrough | 20 min | Teach the restaurant cash-flow example from the curriculum. Stress that no PM would have had to pre-spec the full solution if the capabilities and model existed. | Notes on composition logic |
| 2. Moment recognition | Pattern-identification exercise | 20 min | Students define the moment they selected: what changed, what signal would indicate it, why it matters now, what the desired intervention is. | Moment definition |
| 3. Capability composition | Composition mapping | 25 min | Students list the capabilities required to respond effectively. Challenge anything that is actually a UI element, policy, or team. | Composition map |
| 4. Failure-signal roadmap | Gap analysis | 25 min | Ask: if the intelligence layer tried to compose this solution today, where would it fail? What missing capability or missing model element would cause the failure? | Failure-signal map |
| 5. Portfolio integration | Operating principles drafting | 20 min | Students synthesize all prior artifacts into 5-7 operating principles. Each principle must link to one observed reality and one operating behavior. | Draft operating principles |
| 6. Executive readout practice | 3-minute briefing | 20 min | Students practice presenting the composition challenge and the highest-leverage capability gap as if briefing an exec team. Facilitate blunt feedback on clarity. | Executive briefing notes |
| 7. Closing critique | Whole-group synthesis | 20 min | Summarize recurring weak patterns: feature-thinking, local optimization, unclear moment definition, no compounding logic. End with revision priorities for final submission. | Final revision checklist |

Facilitator emphasis points

  • A strong composition challenge produces a reusable capability insight, not a one-off feature request.
  • Failure signals are productive. They show what the organization cannot yet compose.
  • The final principles should describe how the organization should operate, not generic values it claims to hold.

Post-session assignment (75-90 min)

  • Finalize:
    • Composition Challenge Analysis
    • Operating Principles Document
    • Integrated capstone portfolio

Required student output

  • Capstone Artifact 4: Composition Challenge Analysis
  • Capstone Artifact 5: Operating Principles Document

Final capstone integration session guidance

If instructors want a formal portfolio review block, add an optional 60-minute capstone clinic after Session 4:

  • 15 min self-check against rubric
  • 20 min peer review
  • 15 min facilitator office hours
  • 10 min submission QA for formatting and completeness

## 5. Assignments and artifacts

Required capstone portfolio components

  1. 4-Block Organizational Map
     • Required elements:
       • one labeled section for each of the four blocks
       • at least 3 examples of true capabilities
       • at least 3 identified gaps
       • explanation of where intelligence currently lives
     • Evidence expectation:
       • examples must come from actual operating reality, not aspiration
  2. Signal Inventory
     • Required elements:
       • minimum 8 signal sources
       • honesty and compounding scores
       • one underutilized signal called out
       • one new signal pathway designed
     • Evidence expectation:
       • each signal tied to a real decision or missed decision
  3. Artifact Discipline Log
     • Required elements:
       • artifact audit for at least 5 recent artifacts
       • at least 6 decision logs
       • at least 1 structured postmortem
     • Evidence expectation:
       • decision logs must include reasoning and outcome, not just summary
  4. Composition Challenge Analysis
     • Required elements:
       • customer or operational moment description
       • signals that should identify the moment
       • capabilities required
       • missing capability or world-model gap
       • explicit failure signal and resulting roadmap implication
     • Evidence expectation:
       • the missing capability should plausibly enable more than one future composition
  5. Operating Principles Document
     • Required elements:
       • 5-7 principles
       • rationale for each principle
       • one operating behavior per principle
       • one artifact or signal implication per principle
     • Evidence expectation:
       • principles should emerge from portfolio findings, not generic management language

Submission packaging requirements

  • Preferred format: one markdown or document file with these exact top-level headings (a skeleton appears below):
    • Artifact 1 - 4-Block Organizational Map
    • Artifact 2 - Signal Inventory
    • Artifact 3 - Artifact Discipline Log
    • Artifact 4 - Composition Challenge Analysis
    • Artifact 5 - Operating Principles Document
  • Students should anonymize confidential names but preserve the logic of the example
  • Tables are encouraged for signal inventory and artifact audit
  • Diagrams may be embedded as images, but each must have a short text explanation
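
A minimal skeleton of the portfolio file using the exact required headings; the bracketed notes indicate what goes in each section:

```text
# Artifact 1 - 4-Block Organizational Map
[map, capability examples, gaps, intelligence-location paragraph]

# Artifact 2 - Signal Inventory
[signal table with ratings, underutilized signal, new pathway design]

# Artifact 3 - Artifact Discipline Log
[artifact audit, decision-log entries, structured postmortem]

# Artifact 4 - Composition Challenge Analysis
[moment, identifying signals, required capabilities, failure signal, roadmap implication]

# Artifact 5 - Operating Principles Document
[5-7 principles, each with rationale, behavior, and artifact/signal implication]
```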

Instructor review checkpoints

  • After Session 1: review whether students are distinguishing capabilities from products
  • After Session 2: review whether students are overvaluing self-report data
  • After Session 3: review whether decision logs contain actual reasoning and outcome linkage
  • After Session 4: review whether the composition challenge identifies a reusable capability gap

## 6. AI/LLM grading and assessment framework

Assessment philosophy

The LLM should act as a disciplined evidence evaluator, not an inspirational coach and not a mind reader. It should score only what the portfolio demonstrates.

Grading model

Total score: 100 points

Artifact weights

  • Artifact 1: 20 points
  • Artifact 2: 20 points
  • Artifact 3: 20 points
  • Artifact 4: 20 points
  • Artifact 5: 20 points

Portfolio-level evaluation dimensions

Across all artifacts, the evaluator should score these dimensions implicitly and reference them in feedback:

  • Analytical rigor
  • Capability-first thinking
  • Signal honesty
  • Artifact quality
  • Composition thinking
  • Operating specificity

Recommended evaluation workflow

  1. Completeness pass
     • Confirm all five artifacts are present
     • Confirm minimum required fields are present
     • Mark missing sections before scoring quality
  2. Artifact scoring pass
     • Score each artifact using its rubric
     • Extract 2-4 quoted evidence snippets or paraphrased evidence points from the submission
     • Note confidence level: high, medium, low
  3. Cross-artifact coherence pass
     • Check whether the signal inventory supports the composition challenge
     • Check whether operating principles are grounded in earlier artifacts
     • Check whether claimed capabilities remain consistent across artifacts
  4. Feedback generation pass
     • Produce strengths, weaknesses, and revision priorities
     • Give next-step advice tied to the weakest artifact

Concrete LLM evaluation heuristics

Artifact 1 heuristics: 4-Block Organizational Map

  • Score high when:
    • capabilities are framed as reusable primitives
    • interfaces are explicitly separated from deeper value-creation layers
    • the world model is described as an information system, not merely a team or meeting
    • intelligence location is diagnosed clearly, including human bottlenecks
  • Score down when:
    • product features, dashboards, or apps are mislabeled as capabilities
    • the map looks like an org chart rather than a functional architecture
    • gaps are generic, such as "need better AI"
    • there is no explanation of where intelligence currently lives
  • Hard cap rule:
    • if the artifact never distinguishes capabilities from products, maximum score is 10/20

Artifact 2 heuristics: Signal Inventory

  • Score high when:
    • consequential signals are prioritized over opinion-based signals
    • compounding potential is explained, not just numerically rated
    • the new pathway includes capture, storage, and decision use
    • gaming risk is acknowledged realistically
  • Score down when:
    • surveys, NPS, or intent statements are rated as the highest-quality signals without justification
    • there are fewer than 8 signals
    • the new pathway is just "collect more feedback"
    • no decision linkage is shown
  • Hard cap rule:
    • if most top-rated signals are self-report or vanity metrics, maximum score is 9/20

Artifact 3 heuristics: Artifact Discipline Log

  • Score high when:
    • decision logs include real alternatives and explicit reasoning
    • outcomes are linked to subsequent signal or review points
    • the postmortem identifies systemic learning, not individual blame
    • the student proposes a sustainable operating habit, not a one-time cleanup
  • Score down when:
    • logs are retrospective summaries with no decision tension
    • outcome fields say "TBD" for most entries without review dates
    • postmortem language is blame-centric or vague
    • there is no machine-readable structure
  • Hard cap rule:
    • if fewer than 4 decision logs are present, maximum score is 8/20

Artifact 4 heuristics: Composition Challenge Analysis

  • Score high when:
    • the moment is concrete and triggered by observable signal
    • the required capabilities are reusable and clearly named
    • the failure signal is plausible and roadmap-generating
    • the student distinguishes missing capability from missing interface polish
  • Score down when:
    • the artifact is basically a feature pitch
    • the moment is vague or hypothetical without signal evidence
    • the "missing capability" is actually a person, project, or UI request
    • no failure signal is defined
  • Hard cap rule:
    • if the submission does not identify a reusable capability gap, maximum score is 10/20

Artifact 5 heuristics: Operating Principles Document

  • Score high when:
    • each principle is specific enough to change behavior
    • each principle links back to evidence from earlier artifacts
    • principles describe how the organization should operate, not what it hopes to value
    • each principle has an artifact or signal implication
  • Score down when:
    • principles read like generic culture statements
    • the rationale is detached from earlier analysis
    • no behavior or instrumentation change is specified
    • multiple principles are redundant
  • Hard cap rule:
    • if principles are not grounded in the preceding portfolio, maximum score is 11/20
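
The hard cap rules above compose mechanically with the rubric scores. A minimal sketch of that aggregation in Python, assuming the evaluator has already produced per-artifact rubric scores and detected each cap condition as a boolean flag; the function and flag names are illustrative, not part of the spec:

```python
# Minimal sketch: apply the hard cap rules to per-artifact rubric scores,
# then sum to the 100-point overall score. Cap values and reasons come from
# the heuristics above; everything else is an illustrative assumption.

HARD_CAPS = {
    "organizational_map": (10, "never distinguishes capabilities from products"),
    "signal_inventory": (9, "top-rated signals are self-report or vanity metrics"),
    "artifact_discipline_log": (8, "fewer than 4 decision logs present"),
    "composition_challenge": (10, "no reusable capability gap identified"),
    "operating_principles": (11, "principles not grounded in the portfolio"),
}

def aggregate(rubric_scores: dict[str, int], cap_triggered: dict[str, bool]) -> dict:
    """Cap each artifact score where its hard cap rule fired, then total."""
    capped = {}
    notes = []
    for artifact, score in rubric_scores.items():
        cap, reason = HARD_CAPS[artifact]
        if cap_triggered.get(artifact, False) and score > cap:
            notes.append(f"{artifact}: capped at {cap}/20 ({reason})")
            score = cap
        capped[artifact] = score
    return {
        "overall_score": sum(capped.values()),  # five artifacts x 20 points = 100
        "artifact_scores": capped,
        "notes_for_human_reviewer": notes,
    }
```

The returned keys mirror the structured grading output shape shown below.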

Confidence calibration guidance for the LLM

  • High confidence: detailed evidence, multiple concrete examples, consistent terminology across artifacts
  • Medium confidence: mostly complete submission with some vague or inconsistent sections
  • Low confidence: sparse evidence, high abstraction, or obvious template-filling with little operational specificity

Suggested structured grading output format

```json
{
  "overall_score": 0,
  "artifact_scores": {
    "organizational_map": 0,
    "signal_inventory": 0,
    "artifact_discipline_log": 0,
    "composition_challenge": 0,
    "operating_principles": 0
  },
  "confidence": "high|medium|low",
  "strengths": [],
  "weaknesses": [],
  "revision_priorities": [],
  "evidence_used": [],
  "notes_for_human_reviewer": []
}
```

## 7. Rubrics, scoring criteria, and evaluator prompt guidance

### Master scoring scale

Use the following shared scale for each artifact:
- `18-20` Excellent: precise, evidence-rich, operationally credible, and aligned with course concepts
- `14-17` Strong: mostly rigorous, some gaps in depth or precision, but materially useful
- `10-13` Adequate: demonstrates partial understanding with notable category errors or thin evidence
- `6-9` Weak: superficial, generic, or structurally incomplete
- `0-5` Insufficient: missing major elements or fundamentally misunderstands the concept

### Artifact-specific rubric criteria

#### Artifact 1 rubric: 4-Block Organizational Map

| Criterion | Points | What earns full credit |
|---|---:|---|
| Correct use of four-block framework | 5 | All four blocks are clearly defined and populated or explicitly marked as missing |
| Capability identification quality | 5 | Capabilities are true primitives and separated from products/interfaces |
| Diagnosis of where intelligence lives | 5 | Current intelligence location is concrete, credible, and explains bottlenecks |
| Gap identification and prioritization | 5 | At least 3 meaningful gaps with clear systemic implications |

#### Artifact 2 rubric: Signal Inventory

| Criterion | Points | What earns full credit |
|---|---:|---|
| Honest signal identification | 6 | High-value signals are consequential and clearly justified |
| Compounding analysis | 4 | Student explains which signals grow in value over time and why |
| Decision linkage | 5 | Each signal is tied to a real decision or operating use |
| New signal pathway design | 5 | Pathway is specific, feasible, and operationally meaningful |

#### Artifact 3 rubric: Artifact Discipline Log

| Criterion | Points | What earns full credit |
|---|---:|---|
| Artifact audit quality | 4 | Audit diagnoses strengths and missing model-ready elements accurately |
| Decision log rigor | 8 | Entries include context, alternatives, reasoning, outcome, and signal |
| Postmortem quality | 4 | Failure analysis is systemic and learning-oriented |
| Operationalization plan | 4 | Student specifies how this discipline will persist in real work |

#### Artifact 4 rubric: Composition Challenge Analysis

| Criterion | Points | What earns full credit |
|---|---:|---|
| Moment definition | 4 | The moment is real, consequential, and signal-detectable |
| Capability composition logic | 6 | Required capabilities are reusable, well-scoped, and coherent |
| Failure-signal definition | 5 | Missing element and resulting failure signal are plausible and clear |
| Roadmap implication | 5 | The analysis shows how failure should generate future capability priorities |

#### Artifact 5 rubric: Operating Principles Document

| Criterion | Points | What earns full credit |
|---|---:|---|
| Specificity of principles | 5 | Principles are concrete and behavior-shaping |
| Grounding in portfolio evidence | 5 | Principles clearly emerge from prior artifacts |
| Operational implications | 5 | Each principle changes artifacts, signals, review habits, or decision rights |
| Coherence and non-redundancy | 5 | Principles fit together as a usable operating model |

### Portfolio evaluator prompt

Use this prompt when grading the full portfolio:

```text
You are evaluating a student portfolio for Project Agni Course 12: AI-Native Operating.

Your job is to grade only the evidence present in the submission. Do not assume competence, context, or implementation details that are not stated. Penalize category errors such as confusing products with capabilities, dashboards with intelligence layers, or self-report with honest signal.

Score each of the five artifacts out of 20 using the supplied rubric criteria:
1. 4-Block Organizational Map
2. Signal Inventory
3. Artifact Discipline Log
4. Composition Challenge Analysis
5. Operating Principles Document

Then produce:
- overall score out of 100
- confidence level: high, medium, or low
- 3 strengths grounded in evidence
- 3 weaknesses grounded in evidence
- 3 revision priorities ordered by leverage

Important evaluation rules:
- If an artifact is incomplete, score the artifact based on what is present and note missing sections.
- If products or interfaces are mislabeled as capabilities, reduce the Organizational Map score materially.
- If self-report or vanity metrics are rated as the strongest signals without strong justification, reduce the Signal Inventory score materially.
- If decision logs lack reasoning or outcome linkage, reduce the Artifact Discipline Log score materially.
- If the composition challenge is effectively a feature request, reduce the Composition Challenge score materially.
- If the operating principles read like generic values rather than operating rules grounded in the portfolio, reduce the Operating Principles score materially.

Return your answer in structured JSON using this shape:
{
  "overall_score": 0,
  "artifact_scores": {
    "organizational_map": 0,
    "signal_inventory": 0,
    "artifact_discipline_log": 0,
    "composition_challenge": 0,
    "operating_principles": 0
  },
  "confidence": "",
  "strengths": [],
  "weaknesses": [],
  "revision_priorities": [],
  "evidence_used": [],
  "notes_for_human_reviewer": []
}
```

### Single-artifact evaluator prompts

**Organizational Map**

```text
Evaluate this 4-block organizational map.
Score out of 20 using these criteria:
- correct use of the four-block framework
- capability identification quality
- diagnosis of where intelligence lives
- gap identification and prioritization

Flag any case where products, interfaces, teams, or dashboards are mislabeled as capabilities or intelligence layers.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```

**Signal Inventory**

```text
Evaluate this signal inventory.
Score out of 20 using these criteria:
- honest signal identification
- compounding analysis
- decision linkage
- new signal pathway design

Specifically identify whether the student overrates surveys, self-report, or vanity metrics.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```

**Artifact Discipline Log**

```text
Evaluate this artifact discipline submission.
Score out of 20 using these criteria:
- artifact audit quality
- decision log rigor
- postmortem quality
- operationalization plan

Flag whether decision logs preserve reasoning, whether outcomes are linked, and whether postmortems are systemic rather than blame-centric.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```

**Composition Challenge Analysis**

```text
Evaluate this composition challenge analysis.
Score out of 20 using these criteria:
- moment definition
- capability composition logic
- failure-signal definition
- roadmap implication

Flag whether the student has actually identified a reusable capability gap or has simply written a feature request.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```

**Operating Principles Document**

```text
Evaluate this operating principles document.
Score out of 20 using these criteria:
- specificity of principles
- grounding in portfolio evidence
- operational implications
- coherence and non-redundancy

Flag any principles that read like generic values statements rather than operating rules.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```

### Human-in-the-loop moderation guidance

- Send low-confidence evaluations to a human reviewer
- Require human review if the portfolio includes highly domain-specific regulated capabilities
- Require human review if the grader detects contradictions across artifacts but cannot resolve them
- Allow the human reviewer to override scores when real organizational context is clearly present but lightly documented

## 8. Feedback strategy: what strong/average/weak responses look like and how an LLM should respond

### Feedback design goals

- Make feedback diagnostic, not decorative
- Tie every critique to observable evidence or missing evidence
- Distinguish conceptual misunderstanding from incomplete execution
- Prioritize revisions that improve operating clarity and future decision quality

### What strong responses look like

**Characteristics**
- Uses the four-block framework correctly and consistently
- Separates capabilities from products and interfaces
- Identifies at least one overlooked, compounding, honest signal
- Writes decision logs with genuine alternatives and reasoning
- Frames the composition challenge around a real moment and a reusable capability gap
- Produces principles that clearly change how the organization should document, decide, and learn

**How the LLM should respond**
- Acknowledge the strongest analytical moves specifically
- Preserve the student's strongest distinctions so they are not revised away
- Suggest one higher-order improvement, such as tighter signal instrumentation or cross-artifact alignment
- Avoid generic praise; quote or paraphrase the evidence that justifies the score

**Example feedback posture**
- "Your strongest move is distinguishing merchant underwriting as a capability while correctly placing the merchant dashboard as an interface. That distinction holds across the portfolio and improves the quality of your composition analysis."

### What average responses look like

**Characteristics**
- Understands the course concepts in broad terms but slips into generic language
- Has some real examples but not enough evidence depth
- Mixes honest signals with weak proxies without fully defending the ratings
- Writes decision logs that capture context but only partially capture reasoning or outcome
- Identifies a plausible gap but not yet a clearly reusable capability

**How the LLM should respond**
- Name the partial success first
- Then isolate 2-3 concrete category errors or thin sections
- Give explicit revision instructions tied to the rubric
- Recommend the highest-leverage next move rather than many small edits

**Example feedback posture**
- "You are directionally using the four-block model well, but several items in your capability list still read as product bundles. Rework that section first; until the primitive layer is clean, the rest of the portfolio remains harder to evaluate."

### What weak responses look like

**Characteristics**
- Mostly abstract or slogan-heavy
- Treats the map like an org chart or tool list
- Overrates surveys, NPS, or qualitative opinion as primary truth without justification
- Logs decisions as summaries with no tension, alternatives, or learning
- Writes a feature request instead of a composition challenge
- Produces principles that sound like values posters

**How the LLM should respond**
- Be direct and specific
- State that the current artifact does not yet demonstrate the required course understanding
- Point to the exact missing elements
- Give a short revision sequence in the order that will unlock improvement
- Avoid shaming language and avoid pretending the work is stronger than it is

**Example feedback posture**
- "This submission does not yet show capability-first thinking. Most of the listed capabilities are interfaces or features. Revise the map before revising the composition challenge, because the current challenge inherits the same category error."

### LLM feedback rules

- Do not praise polish if analytical quality is weak
- Do not invent hidden strengths
- Do not soften essential corrections with vague positivity
- Do identify one thing worth preserving in any submission that is at least partially correct
- Do distinguish between `missing`, `unclear`, and `incorrect`

### Recommended feedback template for all evaluations

```text
Overall judgment:
[1-3 sentence summary of demonstrated understanding]

What is working:
- [evidence-based strength]
- [evidence-based strength]

What needs revision:
- [most important weakness]
- [second weakness]
- [third weakness]

Highest-leverage next step:
- [the one revision that most improves the portfolio]

Why this matters:
- [connect the revision to AI-native operating, not just the grade]
```

### Revision guidance by failure mode

**If the student confuses capabilities and products**
- Ask them to rewrite the map using the test: "Could this support multiple future experiences without being tied to one interface?"

**If the student overvalues weak signals**
- Ask them to rerank signals by consequence, cost of action, and resistance to gaming

**If decision logs are thin**
- Ask them to add one rejected alternative and one downstream signal for every log entry

**If the composition challenge is too feature-like**
- Ask them to remove UI language and restate the missing reusable capability

**If operating principles are generic**
- Ask them to tie each principle to one failure, one signal, and one artifact behavior from the portfolio

## Implementation notes for instructors

- Run this course with real organizational examples whenever possible; hypothetical examples should be allowed only when clearly marked
- Encourage anonymization but forbid total abstraction
- Use the website prompt's layered, system-architecture framing in slides and worksheets to reinforce transfer
- Keep the critique standard high; the course only works if category errors are corrected early