# Course 12 Content Spec: AI-Native Operating
## 1. Title and source files used
Course title: Project Agni Course 12: AI-Native Operating
Subtitle: The Organization as an Intelligence
Tagline: Operating as if intelligence lives in the system
Source files used
- 12-ai-native-operating/curriculum.md
- 12-ai-native-operating/website-prompt.md
Purpose of this file
- Expand the approved curriculum into a delivery-ready course-content specification.
- Preserve the curriculum as the source of truth while turning it into a teachable, assessable, LLM-gradeable program.
- Define the exact learning flow, artifact requirements, facilitation moves, and evaluation logic needed to run the course without additional design work.
## 2. Design decisions at the top
- The course teaches operating behavior, not AI enthusiasm. The content avoids generic AI transformation talk and centers on concrete operating disciplines: capability mapping, signal quality, artifact design, and failure-signal capture.
- The organizing metaphor is a living system architecture. This follows the website prompt and helps students reason in layers: capabilities, world model, intelligence layer, interfaces. Every module returns to these four blocks.
- Artifacts are the curriculum. Students are assessed on durable operating artifacts, not recall. Every session produces portfolio material that can be reviewed by humans and LLMs.
- Capability-first thinking is enforced. Students must separate primitives from products, and composable capabilities from one-off features. This is a core grading distinction.
- Signal honesty is treated as a design constraint. The course repeatedly asks: what data represents actual consequential behavior, what is merely performative, and what compounds over time?
- The course is implementation-ready for mixed modality. Each session includes async preparation, live facilitation, and post-session artifact work so it can run as a cohort intensive or a team enablement program.
- Assessment is evidence-bound. LLM grading is only allowed to score what is present in the student artifact. Missing evidence lowers confidence and score; the evaluator must not infer competence from tone or jargon.
- Feedback must be behavior-changing. Rubrics and evaluator prompts are designed to produce revision guidance that improves organizational practice, not just a grade.
- The website aesthetic informs tone, not content structure. The site may feel like a mission briefing or system diagram, but the course file stays operational, explicit, and classroom-usable.
## 3. Delivery model assumptions
Program format
- 4-session intensive
- Each session includes:
  - 45-60 min async prep
  - 150 min live workshop
  - 60-90 min post-session artifact work
- Total learner time: approximately 16-18 hours
Recommended cadence
- Option A: 2 weeks, 2 sessions per week
- Option B: 4 weeks, 1 session per week
- Option C: internal team sprint, 2 consecutive days with combined async work between blocks
Audience
- Operators, founders, product leaders, functional leads, chiefs of staff, strategy leads, transformation leads
- Works best for students who can examine a real organization they belong to or know well
Cohort size
- Ideal: 12-30
- Minimum viable: 6
- Maximum without extra facilitation: 36
Facilitation staffing
- 1 lead facilitator can run up to 20 students
- 1 lead facilitator + 1 teaching assistant recommended for 21-36 students
Delivery environment
- Video platform with breakout rooms
- Shared whiteboard or diagramming tool
- Shared artifact repository or LMS
- Structured submission template in markdown, doc, or form fields
Submission assumptions
- All student artifacts should use standardized headings so human and LLM evaluation can read them consistently
- Preferred submission format: one portfolio document with clearly labeled sections for all five capstone artifacts
- Students should cite evidence from their organization using anonymized but concrete examples
Prerequisite assumption
- Course 11 is recommended but not required
- Students unfamiliar with DRI concepts should receive a 10-minute primer in prework
Facilitator stance
- Push students away from buzzwords and toward operational specifics
- Challenge category errors, especially when students confuse interfaces with capabilities or surveys with honest signal
- Reward clarity, specificity, and evidence over polish
## 4. Detailed course content broken down by module, session, lesson, activity, timing, facilitator moves, and student outputs
Course-wide learning outcomes
By the end of the course, students will be able to:
- Map an organization using the four-block model rather than an org-chart model
- Distinguish capability primitives from product bundles and delivery interfaces
- Audit signal quality and identify honest, consequential, compounding data sources
- Design model-ready artifacts that preserve decisions, reasoning, outcomes, and systemic signal
- Reframe roadmap decisions as compositions, failed compositions, and capability gaps
- Produce a portfolio that can guide actual changes in organizational operating practice
Standard artifact template requirements used in every session
Every student artifact should include:
- Context
- Observed reality
- Interpretation
- Implications
- Evidence or examples
- Open questions
This structure makes the work teachable, revisable, and LLM-assessable.
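As an illustration of "LLM-assessable," the six-part structure can be checked mechanically before any quality grading happens. This is a hypothetical sketch: it assumes artifacts are submitted as markdown with each required section as a `#`-style heading, which is one reasonable convention rather than a course mandate.

```python
# Hypothetical pre-grading check: which of the six required sections
# are missing from an artifact? Assumes markdown "#"-style headings.
REQUIRED_SECTIONS = [
    "Context",
    "Observed reality",
    "Interpretation",
    "Implications",
    "Evidence or examples",
    "Open questions",
]

def missing_sections(artifact_markdown: str) -> list[str]:
    """Return the required section names absent from the artifact text."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in artifact_markdown.splitlines()
        if line.startswith("#")
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]
```

A completeness pass like this lets the evaluator mark structural gaps explicitly instead of silently inferring content that is not there.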
Module 1 / Session 1
The Four Building Blocks: What Your Organization Actually Has
Module purpose
- Establish the core architecture of AI-native operating
- Force students to distinguish between capabilities, world model, intelligence layer, and interfaces
- Surface where organizational intelligence currently resides
Session outcome
- Students complete a first-pass 4-block organizational map with at least 3 critical gaps identified
Prework (45-60 min)
- Read the session brief on Block's four building blocks
- Review two sample company descriptions and classify elements into the four blocks
- Write a short memo: "Where does intelligence currently live in my organization?"
Live workshop (150 min)
| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Framing the architecture | Opening provocation and model walkthrough | 20 min | Open with the course tagline and ask students to define what their organization genuinely understands that is hard to understand. Teach the four blocks with 2-3 concrete examples. Reiterate that the model is functional, not hierarchical. | Notes on the four blocks and a working definition of organizational intelligence |
| 2. Capability vs. product distinction | Sorting exercise | 20 min | Present examples such as payments, payroll, dashboard, merchant app, underwriting model, account opening, customer portal. Push students to justify each placement. Correct category errors publicly. | Annotated sort showing which items are primitives, bundles, or interfaces |
| 3. Where intelligence lives now | Individual mapping draft | 25 min | Instruct students to map a real organization using the four-block framework. Require at least one concrete example per block or an explicit "missing." Circulate and press for specificity. | Draft 4-block map |
| 4. Diagnosis and critique | Pair review | 20 min | Give peers a critique protocol: "What is mislabeled? Where is a person substituting for a system? Where is intelligence trapped in a layer?" | Peer critique notes and 2 revisions to map |
| 5. Roadmap as limiting factor | Mini-lecture and discussion | 20 min | Contrast roadmap-first operating with capability-and-composition operating. Use the curriculum line that the roadmap is often the bottleneck between reality and response. | Reflection on current roadmap bottlenecks |
| 6. Gap identification | Guided refinement | 25 min | Require students to mark at least 3 gaps: missing capability, missing world model visibility, missing intelligence layer behavior, or interface overinvestment. Ask which gap is most systemically constraining. | Revised map with highlighted gaps |
| 7. Share-out and synthesis | Debrief | 20 min | Select 3-4 students with different org types. Summarize recurring errors: treating an interface as value creation, treating a team as a world model, calling a feature a capability. | Debrief notes and prioritized gap list |
Facilitator emphasis points
- A capability has no necessary UI and can support multiple future compositions.
- A world model can be partial, fragmented, stale, person-dependent, or absent.
- An intelligence layer is not a dashboard, workflow, or PM ritual; it is the logic that recognizes moments and composes responses.
- Interfaces are important, but they are not where the deepest defensibility lives.
Post-session assignment (60-75 min)
- Finalize the 4-Block Organizational Map
- Add:
- 3 examples of true capabilities
- 3 examples of product bundles or interfaces often mistaken for capabilities
- 3 highest-cost gaps in the current system
- 1 paragraph on where intelligence currently lives and why that is a problem
Required student output
- Capstone Artifact 1 draft: 4-Block Organizational Map
Module 2 / Session 2
Honest Signal: What Money Knows That Surveys Don't
Module purpose
- Teach signal quality as an operating advantage
- Help students identify which signals in their context are honest, consequential, and compounding
- Move teams away from self-report and vanity data as primary truth
Session outcome
- Students produce a signal inventory with explicit honesty and compounding ratings, plus one newly designed signal pathway
Prework (45-60 min)
- Read the signal quality spectrum
- Gather 5-10 data sources their organization uses to make decisions
- Bring one example of a decision made from weak or performative signal
Live workshop (150 min)
| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Signal honesty frame | Concept lecture | 20 min | Teach the difference between stated preference and revealed preference. Use the website prompt spectrum language. Clarify that "honest" means consequential, not morally pure. | Personal notes and reclassification of current signals |
| 2. Signal spectrum calibration | Group placement exercise | 20 min | Place sample signals on a spectrum: NPS, survey intent, login frequency, repeat purchase, refund rate, repayment behavior, time-to-resolution. Ask why each belongs where it does. | Signal spectrum worksheet |
| 3. Inventory build | Individual work | 25 min | Students list all current organizational signals, the decision each informs, its honesty rating, compounding rating, and known blind spots. Press them to include ignored signals, not just official metrics. | Draft signal inventory table |
| 4. Compounding flywheel | Mini-lecture and application | 20 min | Explain richer signal -> better model -> more relevant action -> richer signal. Ask what would have to happen for their best signal to compound rather than remain static. | Flywheel sketch for one signal loop |
| 5. Pathway design | New signal design lab | 25 min | Students design one new honest signal pathway their organization does not currently capture. Require: trigger, collection method, storage format, ethical boundary, operating use. | New signal pathway draft |
| 6. Red team review | Small-group challenge | 20 min | Ask peers to find weak assumptions: Is the signal really consequential? Can it be gamed? Does it accumulate? Does it change behavior or just describe behavior? | Critique notes and revised ratings |
| 7. Synthesis | Debrief and scoring norms | 20 min | Summarize common mistakes: overrating surveys, mistaking activity for value, confusing observability with honesty. Preview how the artifact will be graded. | Final priorities for top 3 signals |
Facilitator emphasis points
- Financial signal is exemplary because it carries consequences, but other domains can have honest signal if actions are costly or committed.
- A compounding signal gets more useful as history accumulates.
- A signal pathway is not just a metric definition; it includes capture, structure, reuse, and actionability.
Post-session assignment (75-90 min)
- Complete the Signal Inventory
- Required fields per signal:
- Signal name
- Source
- Example event
- Decision currently influenced
- Honesty score (1-5)
- Compounding score (1-5)
- Risk of gaming
- Current owner
- Underused opportunity
- Design one new signal pathway with:
- trigger condition
- collection mechanism
- storage fields
- review cadence
- decision(s) it should influence
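One way to make the per-signal fields concrete is a small record schema that enforces the 1-5 score ranges. This is a sketch under assumptions: the field names are illustrative mappings of the list above, not prescribed identifiers.

```python
from dataclasses import dataclass

# Hypothetical schema for one signal-inventory row. Field names mirror
# the required fields listed above; the 1-5 bounds come from the spec.
@dataclass
class SignalRecord:
    name: str
    source: str
    example_event: str
    decision_influenced: str
    honesty: int           # 1-5 per the course scale
    compounding: int       # 1-5 per the course scale
    gaming_risk: str
    owner: str
    underused_opportunity: str

    def __post_init__(self):
        for label, score in (("honesty", self.honesty),
                             ("compounding", self.compounding)):
            if not 1 <= score <= 5:
                raise ValueError(f"{label} score must be 1-5, got {score}")
```

Structuring the inventory this way also makes the later cross-artifact coherence pass easier, because signal names can be matched against the composition challenge.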
Required student output
- Capstone Artifact 2 draft: Signal Inventory
Module 3 / Session 3
Artifact Discipline: Feeding the Model
Module purpose
- Reframe documentation as model fuel
- Build decision logging habits that preserve reasoning and downstream signal
- Improve artifact quality so organizational intelligence compounds instead of evaporates
Session outcome
- Students adopt a decision-log format, audit recent artifacts, and convert weak documentation into model-ready entries
Prework (45-60 min)
- Collect 3-5 recent work artifacts: meeting notes, project docs, tickets, email threads, Slack threads, commits, or strategy memos
- Mark which artifact led to a consequential decision
- Read the course decision-log template
Live workshop (150 min)
| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Artifact philosophy | Opening lecture | 20 min | Teach the Block principle that remote-first artifacts are not archive residue; they are the operating substrate. Ask what work disappears today because it is not captured well. | Artifact loss list |
| 2. Model-ready artifact criteria | Criteria walkthrough | 20 min | Teach the four criteria: decision recorded, context preserved, outcome linked, machine-readable structure. Show one strong and one weak example. | Criteria checklist |
| 3. Artifact audit | Individual audit | 25 min | Students examine their collected artifacts and score each against the four criteria. Push them to identify what is missing, not whether the artifact is "good enough." | Artifact audit table |
| 4. Decision log conversion | Rewrite lab | 30 min | Students convert one weak artifact into the decision-log template. Require alternatives considered and explicit reasoning, not post-hoc rationalization. | One rewritten decision log |
| 5. Failure postmortems as signal | Mini-lecture and applied practice | 20 min | Teach postmortem structure: what happened, why, what it signals, what capability is missing. Reject blame-focused narratives. | Postmortem skeleton for one failure |
| 6. Peer review | Structured critique | 15 min | Peers review whether the artifact is actually machine-readable, whether outcome and signal are distinct, and whether context is concrete enough for future inference. | Revision notes |
| 7. Operationalization | Habit design | 20 min | Ask students to specify where these logs will live, who must write them, when they are updated, and how they feed future decision-making. | Operating plan for artifact discipline |
Facilitator emphasis points
- Documentation is weak if it records that a meeting occurred but not what changed.
- "Outcome" and "signal" are different; an outcome is what happened, a signal is what the system should learn.
- Machine-readable does not require a database; it requires stable structure and retrievable fields.
Post-session assignment (60-90 min)
- Submit:
- Artifact audit of at least 5 recent artifacts
- At least 6 decision-log entries covering the last 2 weeks or the nearest equivalent working period
- 1 structured postmortem entry for a failure, miss, escalation, or reversal
- Use the following required decision-log fields:
- Date
- Decision
- Context
- Alternatives considered
- Reasoning
- Outcome
- Signal
- Follow-up owner
- Review date
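Since the grading heuristics later penalize entries whose outcome fields say "TBD" without review dates, a simple completeness check over the nine required fields can flag weak logs before submission. The dictionary keys here are an illustrative naming choice, not a mandated format.

```python
# Hypothetical completeness check for one decision-log entry, keyed to
# the nine required fields above. "TBD" values count as incomplete,
# echoing the Artifact 3 score-down heuristic.
DECISION_LOG_FIELDS = (
    "date", "decision", "context", "alternatives_considered",
    "reasoning", "outcome", "signal", "follow_up_owner", "review_date",
)

def incomplete_fields(entry: dict) -> list[str]:
    """Return required fields that are missing, empty, or placeholder."""
    flagged = []
    for field in DECISION_LOG_FIELDS:
        value = str(entry.get(field, "")).strip()
        if not value or value.upper() == "TBD":
            flagged.append(field)
    return flagged
```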
Required student output
- Capstone Artifact 3 draft: Artifact Discipline Log
Module 4 / Session 4
The Intelligence Layer: Composing Without Product Managers
Module purpose
- Help students think in compositions, moments, and capability gaps
- Teach failure-signal roadmap generation
- Conclude with an integrated operating-principles document
Session outcome
- Students complete a composition challenge analysis and a set of operating principles for AI-native operating in their context
Prework (45-60 min)
- Identify one real customer, user, stakeholder, or operational moment that was underserved
- Bring any known constraints, failed workarounds, or roadmap debates related to that moment
- Review prior session artifacts
Live workshop (150 min)
| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1. Intelligence layer explained | Scenario walkthrough | 20 min | Teach the restaurant cash-flow example from the curriculum. Stress that no PM had to pre-spec the full solution if the capabilities and model existed. | Notes on composition logic |
| 2. Moment recognition | Pattern-identification exercise | 20 min | Students define the moment they selected: what changed, what signal would indicate it, why it matters now, what the desired intervention is. | Moment definition |
| 3. Capability composition | Composition mapping | 25 min | Students list the capabilities required to respond effectively. Challenge anything that is actually a UI element, policy, or team. | Composition map |
| 4. Failure-signal roadmap | Gap analysis | 25 min | Ask: if the intelligence layer tried to compose this solution today, where would it fail? What missing capability or missing model element would cause the failure? | Failure-signal map |
| 5. Portfolio integration | Operating principles drafting | 20 min | Students synthesize all prior artifacts into 5-7 operating principles. Each principle must link to one observed reality and one operating behavior. | Draft operating principles |
| 6. Executive readout practice | 3-minute briefing | 20 min | Students practice presenting the composition challenge and the highest-leverage capability gap as if briefing an exec team. Facilitate blunt feedback on clarity. | Executive briefing notes |
| 7. Closing critique | Whole-group synthesis | 20 min | Summarize recurring weak patterns: feature-thinking, local optimization, unclear moment definition, no compounding logic. End with revision priorities for final submission. | Final revision checklist |
Facilitator emphasis points
- A strong composition challenge produces a reusable capability insight, not a one-off feature request.
- Failure signals are productive. They show what the organization cannot yet compose.
- The final principles should describe how the organization should operate, not generic values it claims to hold.
Post-session assignment (75-90 min)
- Finalize:
- Composition Challenge Analysis
- Operating Principles Document
- Integrated capstone portfolio
Required student output
- Capstone Artifact 4: Composition Challenge Analysis
- Capstone Artifact 5: Operating Principles Document
Final capstone integration session guidance
If instructors want a formal portfolio review block, add an optional 60-minute capstone clinic after Session 4:
- 15 min self-check against rubric
- 20 min peer review
- 15 min facilitator office hours
- 10 min submission QA for formatting and completeness
## 5. Assignments and artifacts
Required capstone portfolio components
- 4-Block Organizational Map
- Required elements:
- one labeled section for each of the four blocks
- at least 3 examples of true capabilities
- at least 3 identified gaps
- explanation of where intelligence currently lives
- Evidence expectation:
- examples must come from actual operating reality, not aspiration
- Signal Inventory
- Required elements:
- minimum 8 signal sources
- honesty and compounding scores
- one underutilized signal called out
- one new signal pathway designed
- Evidence expectation:
- each signal tied to a real decision or missed decision
- Artifact Discipline Log
- Required elements:
- artifact audit for at least 5 recent artifacts
- at least 6 decision logs
- at least 1 structured postmortem
- Evidence expectation:
- decision logs must include reasoning and outcome, not just summary
- Composition Challenge Analysis
- Required elements:
- customer or operational moment description
- signals that should identify the moment
- capabilities required
- missing capability or world-model gap
- explicit failure signal and resulting roadmap implication
- Evidence expectation:
- the missing capability should plausibly enable more than one future composition
- Operating Principles Document
- Required elements:
- 5-7 principles
- rationale for each principle
- one operating behavior per principle
- one artifact or signal implication per principle
- Evidence expectation:
- principles should emerge from portfolio findings, not generic management language
Submission packaging requirements
- Preferred format: one markdown or document file with these exact top-level headings:
- Artifact 1 - 4-Block Organizational Map
- Artifact 2 - Signal Inventory
- Artifact 3 - Artifact Discipline Log
- Artifact 4 - Composition Challenge Analysis
- Artifact 5 - Operating Principles Document
- Students should anonymize confidential names but preserve the logic of the example
- Tables are encouraged for signal inventory and artifact audit
- Diagrams may be embedded as images, but each must have a short text explanation
Instructor review checkpoints
- After Session 1: review whether students are distinguishing capabilities from products
- After Session 2: review whether students are overvaluing self-report data
- After Session 3: review whether decision logs contain actual reasoning and outcome linkage
- After Session 4: review whether the composition challenge identifies a reusable capability gap
## 6. AI/LLM grading and assessment framework
Assessment philosophy
The LLM should act as a disciplined evidence evaluator, not an inspirational coach and not a mind reader. It should score only what the portfolio demonstrates.
Grading model
Total score: 100 points
Artifact weights
- Artifact 1: 20 points
- Artifact 2: 20 points
- Artifact 3: 20 points
- Artifact 4: 20 points
- Artifact 5: 20 points
Portfolio-level evaluation dimensions
Across all artifacts, the evaluator should score these dimensions implicitly and reference them in feedback:
- Analytical rigor
- Capability-first thinking
- Signal honesty
- Artifact quality
- Composition thinking
- Operating specificity
Recommended evaluation workflow
- Completeness pass
- Confirm all five artifacts are present
- Confirm minimum required fields are present
- Mark missing sections before scoring quality
- Artifact scoring pass
- Score each artifact using its rubric
- Extract 2-4 quoted evidence snippets or paraphrased evidence points from the submission
- Note confidence level: high, medium, or low
- Cross-artifact coherence pass
- Check whether the signal inventory supports the composition challenge
- Check whether operating principles are grounded in earlier artifacts
- Check whether claimed capabilities remain consistent across artifacts
- Feedback generation pass
- Produce strengths, weaknesses, and revision priorities
- Give next-step advice tied to the weakest artifact
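The four-pass workflow above can be sketched as an orchestration loop. This is a hedged sketch, not a reference implementation: `score_artifact` stands in for whatever LLM call the delivery team wires up, and the coherence and feedback passes are summarized rather than implemented.

```python
# Hypothetical orchestration of the evaluation workflow. The actual
# LLM invocation is supplied by the caller as `score_artifact`.
ARTIFACT_KEYS = [
    "organizational_map", "signal_inventory", "artifact_discipline_log",
    "composition_challenge", "operating_principles",
]

def evaluate_portfolio(portfolio: dict, score_artifact) -> dict:
    """Completeness pass, then per-artifact scoring; `score_artifact`
    is a caller-supplied function returning {"score": int, "evidence": [...]}."""
    # Pass 1: completeness - missing artifacts are recorded, never inferred.
    missing = [k for k in ARTIFACT_KEYS if not portfolio.get(k)]
    # Pass 2: per-artifact scoring; absent artifacts score 0.
    results = {
        k: (score_artifact(k, portfolio[k]) if k not in missing
            else {"score": 0, "evidence": []})
        for k in ARTIFACT_KEYS
    }
    # Passes 3-4 (cross-artifact coherence, feedback generation) would
    # compare results and draft revision priorities; only the shell of
    # the structured output is shown here.
    return {
        "overall_score": sum(r["score"] for r in results.values()),
        "artifact_scores": {k: r["score"] for k, r in results.items()},
        "confidence": "low" if missing else "medium",
        "missing_artifacts": missing,
    }
```

Separating the completeness pass from scoring keeps the evaluator evidence-bound: an empty section produces a recorded gap and a lower confidence level rather than a charitable guess.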
Concrete LLM evaluation heuristics
Artifact 1 heuristics: 4-Block Organizational Map
- Score high when:
- capabilities are framed as reusable primitives
- interfaces are explicitly separated from deeper value-creation layers
- the world model is described as an information system, not merely a team or meeting
- intelligence location is diagnosed clearly, including human bottlenecks
- Score down when:
- product features, dashboards, or apps are mislabeled as capabilities
- the map looks like an org chart rather than a functional architecture
- gaps are generic, such as "need better AI"
- there is no explanation of where intelligence currently lives
- Hard cap rule:
- if the artifact never distinguishes capabilities from products, maximum score is 10/20
Artifact 2 heuristics: Signal Inventory
- Score high when:
- consequential signals are prioritized over opinion-based signals
- compounding potential is explained, not just numerically rated
- the new pathway includes capture, storage, and decision use
- gaming risk is acknowledged realistically
- Score down when:
- surveys, NPS, or intent statements are rated as the highest-quality signals without justification
- there are fewer than 8 signals
- the new pathway is just "collect more feedback"
- no decision linkage is shown
- Hard cap rule:
- if most top-rated signals are self-report or vanity metrics, maximum score is 9/20
Artifact 3 heuristics: Artifact Discipline Log
- Score high when:
- decision logs include real alternatives and explicit reasoning
- outcomes are linked to subsequent signal or review points
- the postmortem identifies systemic learning, not individual blame
- the student proposes a sustainable operating habit, not a one-time cleanup
- Score down when:
- logs are retrospective summaries with no decision tension
- outcome fields say "TBD" for most entries without review dates
- postmortem language is blame-centric or vague
- there is no machine-readable structure
- Hard cap rule:
- if fewer than 4 decision logs are present, maximum score is 8/20
Artifact 4 heuristics: Composition Challenge Analysis
- Score high when:
- the moment is concrete and triggered by observable signal
- the required capabilities are reusable and clearly named
- the failure signal is plausible and roadmap-generating
- the student distinguishes missing capability from missing interface polish
- Score down when:
- the artifact is basically a feature pitch
- the moment is vague or hypothetical without signal evidence
- the "missing capability" is actually a person, project, or UI request
- no failure signal is defined
- Hard cap rule:
- if the submission does not identify a reusable capability gap, maximum score is 10/20
Artifact 5 heuristics: Operating Principles Document
- Score high when:
- each principle is specific enough to change behavior
- each principle links back to evidence from earlier artifacts
- principles describe how the organization should operate, not what it hopes to value
- each principle has an artifact or signal implication
- Score down when:
- principles read like generic culture statements
- the rationale is detached from earlier analysis
- no behavior or instrumentation change is specified
- multiple principles are redundant
- Hard cap rule:
- if principles are not grounded in the preceding portfolio, maximum score is 11/20
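The five hard cap rules can be applied deterministically after scoring, which keeps them out of the LLM's discretion. A minimal sketch, assuming the evaluator emits a boolean flag per detected condition (the flag names here are illustrative):

```python
# Hypothetical post-scoring application of the hard cap rules above.
# Flags would be produced by the evaluator during the scoring pass.
HARD_CAPS = {
    "organizational_map":      ("no_capability_product_distinction", 10),
    "signal_inventory":        ("top_signals_are_self_report", 9),
    "artifact_discipline_log": ("fewer_than_4_decision_logs", 8),
    "composition_challenge":   ("no_reusable_capability_gap", 10),
    "operating_principles":    ("ungrounded_in_portfolio", 11),
}

def apply_hard_caps(scores: dict, flags: set) -> dict:
    """Cap any artifact score whose hard-cap condition was flagged."""
    capped = dict(scores)
    for artifact, (flag, cap) in HARD_CAPS.items():
        if flag in flags and capped.get(artifact, 0) > cap:
            capped[artifact] = cap
    return capped
```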
Confidence calibration guidance for the LLM
- High confidence: detailed evidence, multiple concrete examples, consistent terminology across artifacts
- Medium confidence: mostly complete submission with some vague or inconsistent sections
- Low confidence: sparse evidence, high abstraction, or obvious template-filling with little operational specificity
Suggested structured grading output format
```json
{
"overall_score": 0,
"artifact_scores": {
"organizational_map": 0,
"signal_inventory": 0,
"artifact_discipline_log": 0,
"composition_challenge": 0,
"operating_principles": 0
},
"confidence": "high|medium|low",
"strengths": [],
"weaknesses": [],
"revision_priorities": [],
"evidence_used": [],
"notes_for_human_reviewer": []
}
```
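Because LLM output can drift from a requested schema, it is worth validating the grading JSON before it reaches a gradebook or a student. A sketch of such a validator, assuming the shape above:

```python
import json

# Minimal validator for the structured grading output shape shown above.
EXPECTED_ARTIFACTS = {
    "organizational_map", "signal_inventory", "artifact_discipline_log",
    "composition_challenge", "operating_principles",
}
LIST_FIELDS = {"strengths", "weaknesses", "revision_priorities",
               "evidence_used", "notes_for_human_reviewer"}

def validate_grading_output(raw: str) -> list[str]:
    """Return a list of shape problems; an empty list means it conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    problems = []
    score = data.get("overall_score")
    if not isinstance(score, int) or not 0 <= score <= 100:
        problems.append("overall_score must be an integer in 0-100")
    if set(data.get("artifact_scores", {})) != EXPECTED_ARTIFACTS:
        problems.append("artifact_scores keys do not match the five artifacts")
    if data.get("confidence") not in {"high", "medium", "low"}:
        problems.append("confidence must be high, medium, or low")
    for field in LIST_FIELDS:
        if not isinstance(data.get(field), list):
            problems.append(f"{field} must be a list")
    return problems
```

A failed validation can trigger one automatic re-prompt before escalating to the human reviewer.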
## 7. Rubrics, scoring criteria, and evaluator prompt guidance
### Master scoring scale
Use the following shared scale for each artifact:
- `18-20` Excellent: precise, evidence-rich, operationally credible, and aligned with course concepts
- `14-17` Strong: mostly rigorous, some gaps in depth or precision, but materially useful
- `10-13` Adequate: demonstrates partial understanding with notable category errors or thin evidence
- `6-9` Weak: superficial, generic, or structurally incomplete
- `0-5` Insufficient: missing major elements or fundamentally misunderstands the concept
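For consistency between graders, the band boundaries can be encoded once and reused everywhere a label is reported. A small sketch of that mapping:

```python
# Master scoring scale bands from the spec: (floor, label), checked in
# descending order so the first floor at or below the score wins.
BANDS = [
    (18, "Excellent"),
    (14, "Strong"),
    (10, "Adequate"),
    (6, "Weak"),
    (0, "Insufficient"),
]

def score_band(score: int) -> str:
    """Return the master-scale label for a 0-20 artifact score."""
    if not 0 <= score <= 20:
        raise ValueError(f"artifact scores are 0-20, got {score}")
    for floor, label in BANDS:
        if score >= floor:
            return label
    return "Insufficient"  # unreachable given the (0, ...) entry
```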
### Artifact-specific rubric criteria
#### Artifact 1 rubric: 4-Block Organizational Map
| Criterion | Points | What earns full credit |
|---|---:|---|
| Correct use of four-block framework | 5 | All four blocks are clearly defined and populated or explicitly marked as missing |
| Capability identification quality | 5 | Capabilities are true primitives and separated from products/interfaces |
| Diagnosis of where intelligence lives | 5 | Current intelligence location is concrete, credible, and explains bottlenecks |
| Gap identification and prioritization | 5 | At least 3 meaningful gaps with clear systemic implications |
#### Artifact 2 rubric: Signal Inventory
| Criterion | Points | What earns full credit |
|---|---:|---|
| Honest signal identification | 6 | High-value signals are consequential and clearly justified |
| Compounding analysis | 4 | Student explains which signals grow in value over time and why |
| Decision linkage | 5 | Each signal is tied to a real decision or operating use |
| New signal pathway design | 5 | Pathway is specific, feasible, and operationally meaningful |
#### Artifact 3 rubric: Artifact Discipline Log
| Criterion | Points | What earns full credit |
|---|---:|---|
| Artifact audit quality | 4 | Audit diagnoses strengths and missing model-ready elements accurately |
| Decision log rigor | 8 | Entries include context, alternatives, reasoning, outcome, and signal |
| Postmortem quality | 4 | Failure analysis is systemic and learning-oriented |
| Operationalization plan | 4 | Student specifies how this discipline will persist in real work |
#### Artifact 4 rubric: Composition Challenge Analysis
| Criterion | Points | What earns full credit |
|---|---:|---|
| Moment definition | 4 | The moment is real, consequential, and signal-detectable |
| Capability composition logic | 6 | Required capabilities are reusable, well-scoped, and coherent |
| Failure-signal definition | 5 | Missing element and resulting failure signal are plausible and clear |
| Roadmap implication | 5 | The analysis shows how failure should generate future capability priorities |
#### Artifact 5 rubric: Operating Principles Document
| Criterion | Points | What earns full credit |
|---|---:|---|
| Specificity of principles | 5 | Principles are concrete and behavior-shaping |
| Grounding in portfolio evidence | 5 | Principles clearly emerge from prior artifacts |
| Operational implications | 5 | Each principle changes artifacts, signals, review habits, or decision rights |
| Coherence and non-redundancy | 5 | Principles fit together as a usable operating model |
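Since every artifact is worth 20 points, each rubric table above should sum to exactly 20. A quick consistency sketch over the transcribed criterion weights (the dictionary keys are illustrative names, not required identifiers):

```python
# Criterion point weights transcribed from the five rubric tables above,
# in table order. Each artifact's criteria should total its 20 points.
RUBRIC_POINTS = {
    "organizational_map":      [5, 5, 5, 5],
    "signal_inventory":        [6, 4, 5, 5],
    "artifact_discipline_log": [4, 8, 4, 4],
    "composition_challenge":   [4, 6, 5, 5],
    "operating_principles":    [5, 5, 5, 5],
}

def rubric_totals() -> dict:
    """Total rubric points per artifact; each should equal 20."""
    return {artifact: sum(points) for artifact, points in RUBRIC_POINTS.items()}
```

A check like this is cheap insurance whenever rubric weights are revised between cohorts.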
### Portfolio evaluator prompt
Use this prompt when grading the full portfolio:
```text
You are evaluating a student portfolio for Project Agni Course 12: AI-Native Operating.
Your job is to grade only the evidence present in the submission. Do not assume competence, context, or implementation details that are not stated. Penalize category errors such as confusing products with capabilities, dashboards with intelligence layers, or self-report with honest signal.
Score each of the five artifacts out of 20 using the supplied rubric criteria:
1. 4-Block Organizational Map
2. Signal Inventory
3. Artifact Discipline Log
4. Composition Challenge Analysis
5. Operating Principles Document
Then produce:
- overall score out of 100
- confidence level: high, medium, or low
- 3 strengths grounded in evidence
- 3 weaknesses grounded in evidence
- 3 revision priorities ordered by leverage
Important evaluation rules:
- If an artifact is incomplete, score the artifact based on what is present and note missing sections.
- If products or interfaces are mislabeled as capabilities, reduce the Organizational Map score materially.
- If self-report or vanity metrics are rated as the strongest signals without strong justification, reduce the Signal Inventory score materially.
- If decision logs lack reasoning or outcome linkage, reduce the Artifact Discipline Log score materially.
- If the composition challenge is effectively a feature request, reduce the Composition Challenge score materially.
- If the operating principles read like generic values rather than operating rules grounded in the portfolio, reduce the Operating Principles score materially.
Return your answer in structured JSON using this shape:
{
  "overall_score": 0,
  "artifact_scores": {
    "organizational_map": 0,
    "signal_inventory": 0,
    "artifact_discipline_log": 0,
    "composition_challenge": 0,
    "operating_principles": 0
  },
  "confidence": "",
  "strengths": [],
  "weaknesses": [],
  "revision_priorities": [],
  "evidence_used": [],
  "notes_for_human_reviewer": []
}
```
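Because the evaluator must return structured JSON, it helps to validate the response before storing it or showing it to a student. The sketch below is a minimal Python illustration, not part of the spec: `validate_evaluation` is a hypothetical helper, while the field names and ranges come directly from the prompt and schema above (five artifacts at 20 points each reconciling to a 100-point total).

```python
import json

# Hypothetical validation helper for the portfolio evaluator's JSON response.
ARTIFACT_KEYS = {
    "organizational_map",
    "signal_inventory",
    "artifact_discipline_log",
    "composition_challenge",
    "operating_principles",
}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def validate_evaluation(raw: str) -> dict:
    """Parse an evaluator response and reject malformed grades."""
    data = json.loads(raw)
    scores = data["artifact_scores"]
    if set(scores) != ARTIFACT_KEYS:
        raise ValueError("artifact_scores must cover exactly the five artifacts")
    if not all(0 <= s <= 20 for s in scores.values()):
        raise ValueError("each artifact score must be between 0 and 20")
    # Five artifacts at 20 points each should reconcile to the 100-point total.
    if data["overall_score"] != sum(scores.values()):
        raise ValueError("overall_score must equal the sum of artifact scores")
    if data["confidence"] not in ALLOWED_CONFIDENCE:
        raise ValueError("confidence must be high, medium, or low")
    for key in ("strengths", "weaknesses", "revision_priorities"):
        if len(data[key]) != 3:
            raise ValueError(f"{key} must contain exactly 3 items")
    return data
```

A response that fails any check can be re-requested from the evaluator or routed straight to a human reviewer.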
### Single-artifact evaluator prompts
**Organizational Map**
```text
Evaluate this 4-block organizational map.
Score out of 20 using these criteria:
- correct use of the four-block framework
- capability identification quality
- diagnosis of where intelligence lives
- gap identification and prioritization
Flag any case where products, interfaces, teams, or dashboards are mislabeled as capabilities or intelligence layers.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```
**Signal Inventory**
```text
Evaluate this signal inventory.
Score out of 20 using these criteria:
- honest signal identification
- compounding analysis
- decision linkage
- new signal pathway design
Specifically identify whether the student overrates surveys, self-report, or vanity metrics.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```
**Artifact Discipline Log**
```text
Evaluate this artifact discipline submission.
Score out of 20 using these criteria:
- artifact audit quality
- decision log rigor
- postmortem quality
- operationalization plan
Flag whether decision logs preserve reasoning, whether outcomes are linked, and whether postmortems are systemic rather than blame-centric.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```
**Composition Challenge Analysis**
```text
Evaluate this composition challenge analysis.
Score out of 20 using these criteria:
- moment definition
- capability composition logic
- failure-signal definition
- roadmap implication
Flag whether the student has actually identified a reusable capability gap or has simply written a feature request.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```
**Operating Principles Document**
```text
Evaluate this operating principles document.
Score out of 20 using these criteria:
- specificity of principles
- grounding in portfolio evidence
- operational implications
- coherence and non-redundancy
Flag any principles that read like generic values statements rather than operating rules.
Provide 2 strengths, 2 weaknesses, and 2 revision priorities.
```
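One way to operationalize the five single-artifact prompts is a dispatch table keyed by artifact type. The sketch below is a hedged Python illustration only: the prompt bodies are abbreviated with `...`, and `build_grading_request` plus the submission delimiter are assumptions, not part of this spec.

```python
# Hypothetical dispatch: pair each artifact type with its evaluator prompt.
# Prompt bodies are abbreviated; in practice each holds the full rubric text.
ARTIFACT_PROMPTS = {
    "organizational_map": "Evaluate this 4-block organizational map. ...",
    "signal_inventory": "Evaluate this signal inventory. ...",
    "artifact_discipline_log": "Evaluate this artifact discipline submission. ...",
    "composition_challenge": "Evaluate this composition challenge analysis. ...",
    "operating_principles": "Evaluate this operating principles document. ...",
}


def build_grading_request(artifact_type: str, submission: str) -> str:
    """Combine the rubric prompt with the student's artifact text."""
    try:
        prompt = ARTIFACT_PROMPTS[artifact_type]
    except KeyError:
        raise ValueError(f"unknown artifact type: {artifact_type}") from None
    return f"{prompt}\n\n--- STUDENT SUBMISSION ---\n{submission}"
```

Keeping the prompts in one table makes it harder for a cohort runner to grade an artifact against the wrong rubric.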
### Human-in-the-loop moderation guidance
- Send low-confidence evaluations to a human reviewer
- Require human review if the portfolio includes highly domain-specific regulated capabilities
- Require human review if the grader detects contradictions across artifacts but cannot resolve them
- Allow the human reviewer to override scores when real organizational context is clearly present but lightly documented
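The first three routing rules above can be sketched as a small gate in the grading pipeline. This illustration carries assumptions: it presumes the evaluator surfaces regulated-domain and contradiction flags inside `notes_for_human_reviewer`, and the keyword check stands in for whatever flagging convention a real deployment would define.

```python
def needs_human_review(evaluation: dict) -> bool:
    """Escalate per the moderation rules: low confidence, regulated
    capabilities, or unresolved cross-artifact contradictions."""
    if evaluation.get("confidence") == "low":
        return True
    notes = " ".join(evaluation.get("notes_for_human_reviewer", [])).lower()
    return "regulated" in notes or "contradiction" in notes
```

The fourth rule, the human override when real context is lightly documented, is a reviewer action rather than a routing condition, so it stays out of the gate.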
## 8. Feedback strategy: what strong/average/weak responses look like and how an LLM should respond
### Feedback design goals
- Make feedback diagnostic, not decorative
- Tie every critique to observable evidence or missing evidence
- Distinguish conceptual misunderstanding from incomplete execution
- Prioritize revisions that improve operating clarity and future decision quality
### What strong responses look like
**Characteristics**
- Uses the four-block framework correctly and consistently
- Separates capabilities from products and interfaces
- Identifies at least one overlooked, compounding, honest signal
- Writes decision logs with genuine alternatives and reasoning
- Frames the composition challenge around a real moment and a reusable capability gap
- Produces principles that clearly change how the organization should document, decide, and learn
**How the LLM should respond**
- Acknowledge the strongest analytical moves specifically
- Preserve the student's strongest distinctions so they are not revised away
- Suggest one higher-order improvement, such as tighter signal instrumentation or cross-artifact alignment
- Avoid generic praise; quote or paraphrase the evidence that justifies the score
**Example feedback posture**
- "Your strongest move is distinguishing merchant underwriting as a capability while correctly placing the merchant dashboard as an interface. That distinction holds across the portfolio and improves the quality of your composition analysis."
### What average responses look like
**Characteristics**
- Understands the course concepts in broad terms but slips into generic language
- Has some real examples but not enough evidence depth
- Mixes honest signals with weak proxies without fully defending the ratings
- Writes decision logs that capture context but only partially capture reasoning or outcomes
- Identifies a plausible gap but not yet a clearly reusable capability
**How the LLM should respond**
- Name the partial success first
- Then isolate 2-3 concrete category errors or thin sections
- Give explicit revision instructions tied to the rubric
- Recommend the highest-leverage next move rather than many small edits
**Example feedback posture**
- "You are directionally using the four-block model well, but several items in your capability list still read as product bundles. Rework that section first; until the primitive layer is clean, the rest of the portfolio remains harder to evaluate."
### What weak responses look like
**Characteristics**
- Mostly abstract or slogan-heavy
- Treats the map like an org chart or tool list
- Overrates surveys, NPS, or qualitative opinion as primary truth without justification
- Logs decisions as summaries with no tension, alternatives, or learning
- Writes a feature request instead of a composition challenge
- Produces principles that sound like values posters
**How the LLM should respond**
- Be direct and specific
- State that the current artifact does not yet demonstrate the required course understanding
- Point to the exact missing elements
- Give a short revision sequence in the order that will unlock improvement
- Avoid shaming language and avoid pretending the work is stronger than it is
**Example feedback posture**
- "This submission does not yet show capability-first thinking. Most of the listed capabilities are interfaces or features. Revise the map before revising the composition challenge, because the current challenge inherits the same category error."
### LLM feedback rules
- Do not praise polish if analytical quality is weak
- Do not invent hidden strengths
- Do not soften essential corrections with vague positivity
- Do identify one thing worth preserving in any submission that is at least partially correct
- Do distinguish between `missing`, `unclear`, and `incorrect`
### Recommended feedback template for all evaluations
```text
Overall judgment:
[1-3 sentence summary of demonstrated understanding]
What is working:
- [evidence-based strength]
- [evidence-based strength]
What needs revision:
- [most important weakness]
- [second weakness]
- [third weakness]
Highest-leverage next step:
- [the one revision that most improves the portfolio]
Why this matters:
- [connect the revision to AI-native operating, not just the grade]
```
### Revision guidance by failure mode
**If the student confuses capabilities and products**
- Ask them to rewrite the map using the test: "Could this support multiple future experiences without being tied to one interface?"
**If the student overvalues weak signals**
- Ask them to rerank signals by consequence, cost of action, and resistance to gaming
**If decision logs are thin**
- Ask them to add one rejected alternative and one downstream signal for every log entry
**If the composition challenge is too feature-like**
- Ask them to remove UI language and restate the missing reusable capability
**If operating principles are generic**
- Ask them to tie each principle to one failure, one signal, and one artifact behavior from the portfolio
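The failure-mode responses above can be encoded as a lookup table so that revision guidance stays consistent across evaluators. In this minimal sketch the mode keys and the `revision_plan` helper are assumptions; the instruction text is taken verbatim in spirit from the guidance above.

```python
# Hypothetical mapping from a detected failure mode to its revision instruction.
REVISION_GUIDANCE = {
    "capability_product_confusion": (
        "Rewrite the map using the test: could this support multiple "
        "future experiences without being tied to one interface?"
    ),
    "weak_signal_overvaluation": (
        "Rerank signals by consequence, cost of action, and resistance to gaming."
    ),
    "thin_decision_logs": (
        "Add one rejected alternative and one downstream signal to every log entry."
    ),
    "feature_like_composition": (
        "Remove UI language and restate the missing reusable capability."
    ),
    "generic_principles": (
        "Tie each principle to one failure, one signal, and one artifact "
        "behavior from the portfolio."
    ),
}


def revision_plan(detected_modes: list[str]) -> list[str]:
    """Return revision instructions for the failure modes an evaluator flagged."""
    return [REVISION_GUIDANCE[m] for m in detected_modes if m in REVISION_GUIDANCE]
```

Unknown modes are silently skipped here; a production pipeline would more likely log them for rubric maintenance.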
## Implementation notes for instructors
- Run this course with real organizational examples whenever possible; hypothetical examples should be allowed only when clearly marked
- Encourage anonymization but forbid total abstraction
- Use the website prompt's layered, system-architecture framing in slides and worksheets to reinforce transfer
- Keep the critique standard high; the course only works if category errors are corrected early