# Course 09 Course Content Spec: Deep End Deployment
## 1. Title and source files used
Course title: Deep End Deployment
Owned output file: 09-deep-end-deployment/09_deep_end_deployment_course_content.md
### Source files used
- 09-deep-end-deployment/curriculum.md
- 09-deep-end-deployment/website-prompt.md
### Source-of-truth rule
- curriculum.md governs pedagogy, learning objectives, course arc, and assessment intent.
- website-prompt.md governs tone, visual language, framing, and student-facing emotional posture.
- Where the two sources differ operationally, this document resolves the ambiguity into one implementation-ready delivery plan.
## 2. Design decisions at the top
- This course only works with real partner organizations. No simulated cases, dummy briefs, or classroom-only substitutes. If no partner pipeline exists, the course should not run.
- Standardized format: Phase 0 placement + 4 live sessions + async deployment + async memo. The curriculum references "5 sessions" and the website prompt lists four live moments; this spec resolves that by treating placement as a facilitated pre-course phase rather than a numbered live class.
- Default runtime is 3 weeks, expandable to 4 weeks. A 2-week version is possible only for highly responsive partners and smaller-scope projects.
- The first 72 hours are the primary leading indicator. The course is designed around initial diagnosis, speed of orientation, and first contribution, not around polished final deliverables.
- Students are judged on contribution quality under ambiguity, not domain mastery. A rough but correctly targeted artifact beats a polished but irrelevant one.
- Reflection is part of the work, not a postscript. The identity shift matters as much as the external artifact.
- Partner experience matters. Students are not free labor. They must create value without creating coordination drag or trust damage.
- Assessment uses LLM support but not LLM autonomy. LLMs generate evidence-tagged draft evaluations; a human facilitator approves final grades, especially for borderline cases, partner complaints, or integrity concerns.
- Evidence beats self-report. Students must submit artifacts, communications logs, notes, and stakeholder feedback. Unsupported claims score lower.
- Tone should feel like a mission briefing, not school. Facilitation should be direct, high-trust, and high-standard, consistent with the website prompt's deployment framing.
## 3. Delivery model assumptions
### Cohort and staffing assumptions
- Target cohort size: 12-24 students
- Partner count: 4-8 organizations
- Placement ratio: 1-3 students per organization, with 1 preferred for highest accountability
- Lead facilitator: runs live sessions, calibrates standards, resolves escalations
- Deployment coordinator: manages partner matching, logistics, risk, and check-ins
- Assessment lead: oversees rubric calibration, LLM evaluation pipeline, and final grades
- Optional partner liaison: useful if partners are high-profile, time-constrained, or international
### Learner assumptions
- Age band: 16-25
- Prerequisite completion: at least 2 prior Project Agni courses
- Recommended prior courses: 01, 03, or 05
- Baseline capability: can write clearly, run meetings, synthesize ambiguity, and produce a basic deliverable without step-by-step instruction
- Weekly time commitment: 8-12 hours per week minimum, including partner time
### Delivery format assumptions
- Primary mode: cohort-based, live-sync + field deployment
- Default duration: 3 weeks
- Live sessions: 4 synchronous sessions, 90-120 minutes each
- Async deployment time: ongoing across weeks 1-3
- Memo writing window: 3-5 days after the main deployment ends
- Communication stack: email, Slack/Discord/WhatsApp, shared drive, scheduling tool, and optional project management board
### Partner/project assumptions
- Partners must offer:
  - A real project with live stakes
  - A named point of contact
  - Access to enough context for the student to act
  - A problem with genuine ambiguity
  - A timeline where a student can produce useful movement in 2-4 weeks
- Partners must not offer:
  - Purely clerical work
  - Tasks with zero stakeholder relevance
  - Fake "innovation projects" no one cares about
  - Work requiring legal/regulatory responsibility beyond a student's authority
  - Projects where success requires proprietary access the partner will not grant
### Risk and safeguarding assumptions
- Students must sign a professionalism and confidentiality agreement if partner work requires it.
- No student should be asked to represent themselves as an employee, expert, or decision-maker beyond their actual role.
- Students should escalate if they encounter unethical requests, unsafe labor expectations, or unmanageable ambiguity.
- Facilitators should have a fallback reassignment protocol if a partner disappears or becomes unusable in week 1.
### Evidence collection assumptions
Each student must maintain a lightweight deployment record containing:
- Time-stamped notes from stakeholder conversations
- Public research collected in the first 48 hours
- Drafts and shipped artifacts
- A communication log showing initiative and follow-through
- A short partner feedback form or recorded feedback statement
## 4. Detailed course content broken down by module, session, lesson, activity, timing, facilitator moves, and student outputs
### Course arc overview
| Phase | Timing | Purpose | Core output |
|---|---|---|---|
| Phase 0: Placement and briefing prep | 5-7 days before launch | Match students to real projects and establish stakes | Student deployment brief + ranked match rationale |
| Module 1: The Briefing | Week 1, Day 1 | Frame the rules, first-72-hours plan, and contribution standard | 72-hour deployment plan |
| Deployment Window A | Week 1, Days 1-3 | Run rapid diagnosis and produce first output | First shipped contribution |
| Module 2: Mid-Deployment Check-In | End of Week 1 | Diagnose reality vs. assumptions and unblock execution | Status report + revised scope |
| Deployment Window B | Week 2 | Continue contribution with tighter targeting | Work product v1 or v2 |
| Module 3: Structured Debrief | End of Week 2 | Extract lessons from live execution | Debrief presentation |
| Module 4: Deep End Memo | Week 3 async | Convert experience into durable judgment and narrative | 2,000-3,000 word memo |
| Module 5: Cohort Showcase | Week 3 or 4 | Publicly articulate the deep-end story and identity shift | Showcase presentation |
### Phase 0: Placement and partner readiness
- Duration: 5-7 days pre-course
- Facilitated touchpoints: partner intake, student ranking, match confirmation, deployment brief release
#### Objectives
- Confirm only high-quality partner projects enter the cohort.
- Match students to projects where ambiguity is real but survivable.
- Preserve the "no hand-holding" philosophy without creating preventable chaos.
#### Partner intake requirements
Partners submit:
- Organization description
- One-sentence stated problem
- One paragraph on what is actually at stake
- Available tools/access
- Primary contact and expected response time
- Constraints, confidentiality needs, and non-negotiables
Facilitator screens for:
- Real urgency
- Clarity of stakes
- Appropriate scope
- Student safety and legitimacy
- Likelihood of meaningful contribution within 2-4 weeks
#### Student pre-work
Students submit:
- One-page capability profile
- Ranked partner preferences
- A short note on why each top choice fits their edge
- A statement of what kind of ambiguity they handle well and poorly
#### Phase 0 activity plan
| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 0.1 Partner screening | Review incoming partner briefs | 60-90 min internal | Reject vague or low-stakes projects; tighten sloppy scopes; secure point-of-contact commitment | None |
| 0.2 Student self-positioning | Capability + preference submission | 30 min async | Require specificity; push students away from generic claims like "I can help with anything" | Capability profile + ranked preferences |
| 0.3 Matchmaking | Staff match meeting | 45-60 min internal | Pair for stretch, not comfort; avoid stacking weak communicators at the same partner | Match roster |
| 0.4 Deployment brief release | Send intentionally sparse brief | 10 min | Reveal only enough to establish stakes; do not overspecify | Deployment brief received |
| 0.5 Readiness checkpoint | Optional short logistics call with partner contacts | 15 min per partner | Confirm access, start date, and communication channel | Calendar hold + contact confirmation |
#### Phase 0 facilitator script guidance
- Use language like: "You are being matched to a live problem, not a role."
- Do not give students hidden context the partner has not shared.
- Do tell partners that students should not be treated as passive interns waiting for tasks.
### Module 1: The Briefing
- Session length: 120 minutes
- When: Week 1, Day 1
#### Session goals
- Establish the psychological contract.
- Teach the 48-hour diagnostic protocol.
- Force each student to define a concrete first contribution.
#### Session agenda
| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 1.1 Mission framing | Opening talk: why deep-end experiences change identity | 15 min | Set stakes fast; reject school framing; make it clear that discomfort is expected | None |
| 1.2 Rules of engagement | Walk through the seven rules from the curriculum | 20 min | Read each rule plainly; ask for one implication and one failure mode per rule | Annotated rule sheet |
| 1.3 The 48-hour rule | Teach the first-48-hours diagnostic protocol | 20 min | Model questions: "What is the real problem?" "Who actually matters?" "What can ship by hour 72?" | Diagnostic note template |
| 1.4 Case contrast | Compare a good vs. bad first-week approach | 15 min | Use facilitator-created examples showing over-waiting, overreaching, and correct pacing | Notes on pitfalls |
| 1.5 Stakeholder map sprint | Students sketch org map and information gaps | 15 min | Push them to identify power, not just titles; ask who can unblock and who can veto | Initial stakeholder map |
| 1.6 First-output planning | Draft first 72-hour plan | 20 min | Require a deliverable with a verb and a recipient; ban vague plans like "learn more" | Draft 72-hour plan |
| 1.7 Peer pressure test | Students pair-review each plan | 10 min | Instruct peers to challenge softness, not to be nice | Revised 72-hour plan |
| 1.8 Commitment round | Verbal commitment: who they will contact, what they will ship, what they fear | 5 min per 4-5 students | Force clarity and public accountability | Spoken commitment |
#### Required takeaways from Module 1
Each student leaves with:
- A list of the first 3 people to contact
- A 72-hour deliverable
- A hypothesis about the real problem
- A top risk they will monitor
- A communication norm for escalating confusion
### Deployment Window A: First 72 hours
- Duration: Days 1-3 after Session 1
- Primary goal: establish traction quickly
#### Required student actions
- Research the organization using only public and partner-provided materials.
- Hold at least 2 conversations:
  - one with the point of contact
  - one with another stakeholder
- Produce one concrete shipped output by hour 72.
- Log what changed between the stated problem and the observed problem.
#### Acceptable first outputs
- A rewritten investor or donor narrative
- A scoped operating memo
- A cleaned dataset with initial insights
- A user-interview synthesis
- A product teardown with recommended decisions
- A dashboard prototype
- A process map revealing bottlenecks
#### Unacceptable first outputs
- "I am still getting context"
- A long note with no decision or deliverable attached
- A speculative strategy deck no one asked for and no stakeholder has seen
- A work product that ignores partner realities or permissions
#### Facilitator monitoring during Window A
| Checkpoint | Timing | Facilitator moves | Student outputs |
|---|---|---|---|
| 24-hour pulse | End of Day 1 | Ask for one sentence on the real problem and one blocker | Brief update |
| 48-hour pulse | End of Day 2 | Check whether conversations happened and output is on track | Evidence of outreach |
| 72-hour submission | End of Day 3 | Require proof of shipment, not a promise | First output + commentary |
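The three checkpoints in the table above reduce to fixed offsets from the end of the briefing. The following is a minimal Python sketch of that arithmetic. The function names are hypothetical, and reading "End of Day N" as N x 24 hours after Session 1 is an assumption of this sketch rather than something the curriculum states.

```python
from datetime import datetime, timedelta

# Offsets taken from the monitoring table. Treating "End of Day N" as
# N * 24 hours after the briefing ends is an assumption of this sketch.
CHECKPOINT_HOURS = {
    "24-hour pulse": 24,
    "48-hour pulse": 48,
    "72-hour submission": 72,
}


def checkpoint_deadlines(briefing_end: datetime) -> dict[str, datetime]:
    """Return the due time for each Window A checkpoint."""
    return {
        name: briefing_end + timedelta(hours=hours)
        for name, hours in CHECKPOINT_HOURS.items()
    }


def overdue(deadlines: dict[str, datetime], submitted: set[str], now: datetime) -> list[str]:
    """List checkpoints that are past due and have no submission recorded."""
    return [
        name
        for name, due in deadlines.items()
        if now > due and name not in submitted
    ]


# Example: 50 hours in, the student sent the 24-hour pulse but nothing since.
start = datetime(2025, 1, 6, 18, 0)
deadlines = checkpoint_deadlines(start)
print(overdue(deadlines, {"24-hour pulse"}, start + timedelta(hours=50)))
# ['48-hour pulse']
```

A coordinator could run a check like this at each pulse to decide whom to contact, while leaving the judgment about the quality of the update to the facilitator.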
### Module 2: Mid-Deployment Check-In
- Session length: 90 minutes
- When: End of Week 1
#### Session goals
- Surface the gap between initial assumptions and actual organizational reality.
- Correct under-contribution, overreach, and drift.
- Re-scope week 2 work around a sharper understanding of value.
#### Session agenda
| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 2.1 Rapid status reports | Each student gives a 4-5 minute update | 35 min | Enforce the exact format: stated problem, actual problem, what shipped, what is stuck, next move | Status report |
| 2.2 Cohort pattern readout | Facilitator names recurring failure modes | 10 min | Call out patterns like hesitation, overbuilding, poor stakeholder mapping, weak asks | Cohort notes |
| 2.3 Targeted peer consults | Structured problem-solving in small groups | 20 min | Require students to ask for a specific kind of help: info, perspective, connection, or decision framing | Peer consult notes |
| 2.4 Scope reset | Students rewrite their week-2 objective | 15 min | Push toward narrower, higher-leverage work; cut vanity work | Revised scope statement |
| 2.5 Escalation coaching | Short mini-lesson on asking better questions inside organizations | 10 min | Teach concise asks, escalation timing, and how to avoid sounding lost | Outreach revision |
#### Facilitator diagnostics in Module 2
Watch for:
- Students hiding behind busyness instead of shipped work
- Students creating value no one internally recognizes
- Students avoiding politically sensitive but central stakeholders
- Students over-identifying with the first person they met
- Students confusing responsiveness with alignment
#### Student output requirements after Module 2
Within 72 hours of the session, each student must submit:
- Revised problem statement
- Updated stakeholder map
- One week-2 objective
- One explicit thing they will stop doing
### Deployment Window B: Focused contribution
- Duration: Week 2
- Primary goal: turn orientation into useful movement
#### Recommended work pattern
- Day 1 of week 2: align on target outcome
- Days 2-4: build, test, revise
- Day 5: present, hand off, or secure decision
#### Facilitator role in Window B
- Stay available for escalation, but do not rescue students from ambiguity that is still productive.
- Intervene if the student is clearly mis-scoped, blocked by access, or damaging partner trust.
- Push students to define what "partner-relevant progress" means by the end of week 2.
### Module 3: Structured Debrief
- Session length: 120 minutes
- When: End of Week 2
#### Session goals
- Convert raw activity into judgment.
- Separate the problem given from the problem discovered.
- Make students narrate their own behavior under pressure.
#### Session agenda
| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 3.1 Artifact show-and-tell | Students show the actual work product | 30 min | Do not let them summarize without showing evidence; ask who used it and what changed | Work product evidence |
| 3.2 Problem reframe debrief | Compare stated problem vs. actual problem | 20 min | Push for precision: what evidence changed your view, and when? | Problem delta statement |
| 3.3 Behavioral reflection | Analyze default reactions to ambiguity and authority | 20 min | Ask where they hesitated, overstepped, or asked weak questions | Reflection notes |
| 3.4 Peer review circle | Three peers give structured feedback | 30 min | Use the curriculum's three prompts; enforce candor over kindness theater | Peer review record |
| 3.5 Reset for memo | Teach memo expectations and evidence requirements | 20 min | Clarify that the memo must be analytical, not diary-style | Memo outline |
#### Peer review prompts
Peers must answer:
- What did this student do that demonstrated day-one contribution?
- What assumption did they carry too long?
- What question should they sit with before the next deployment?
### Module 4: Deep End Memo
- Format: asynchronous, 2,000-3,000 words
- When due: 3-5 days after Module 3
#### Memo purpose
The memo converts a live deployment into durable operating insight. It should become a future interview story, self-diagnostic artifact, and proof that the student can extract principles from live ambiguity.
#### Required memo structure
| Section | Target length | What must be included |
|---|---|---|
| The setup | 250-400 words | Org, project, initial brief, what the student thought they were entering |
| The first 72 hours | 400-600 words | Conversations, discoveries, first output, shift in understanding |
| The pivots | 400-600 words | When the student realized the given problem was not the actual problem |
| What the experience revealed | 500-700 words | Analysis of ambiguity, ownership, default patterns, and judgment |
| What comes next | 250-400 words | Specific behavior change in future deployments |
#### Memo quality bar
Strong memos:
- cite evidence
- name real mistakes
- distinguish signal from story
- show changed judgment, not just changed feelings
- connect personal reflection to real operating decisions
Weak memos:
- overdramatize simple events
- describe effort without effect
- claim growth without evidence
- avoid naming misreads, political mistakes, or shallow assumptions
### Module 5: Cohort Showcase
- Session length: 120 minutes
- When: Week 3 or 4, after memo submission
#### Session goals
- Help students consolidate a durable "deep end story."
- Create public accountability for learning extraction.
- Make the identity shift explicit.
#### Session agenda
| Lesson | Activity | Timing | Facilitator moves | Student outputs |
|---|---|---|---|---|
| 5.1 Showcase framing | Explain that this is a story, not a status report | 10 min | Demand narrative shape: setup, surprise, move, mistake, lesson | None |
| 5.2 Student presentations | 5-7 minute presentations + 3 minutes Q&A each | 80 min | Cut rambling; ask one sharp follow-up on judgment or self-awareness | Showcase deck or talk |
| 5.3 Collective reflection | Whole-cohort synthesis | 20 min | Ask what patterns repeated across organizations and what future rule they will carry forward | Cohort synthesis notes |
| 5.4 Identity close | Final reflection round | 10 min | Name the before/after identity shift explicitly | Exit reflection |
#### Showcase presentation structure
Students must include:
- The original brief
- The actual situation they discovered
- The artifact they produced
- The biggest mistake they made
- The most important thing they learned about how they operate
## 5. Assignments and artifacts
### Assignment list
| Assignment | Due | Weight | Required artifact(s) | Minimum acceptance threshold |
|---|---|---|---|---|
| A1. Capability profile + partner ranking | Phase 0 | Ungraded gateway | Capability profile, ranked preferences | Specific, honest, usable for matching |
| A2. 72-hour plan | End of Session 1 | 15% | Written plan with contacts, hypothesis, and first output | Concrete, time-bound, not generic |
| A3. First shipped output | Hour 72 | Included in A2 score | Actual artifact and proof it was delivered | Real recipient, relevant contribution |
| A4. Week 1 status report | End of Week 1 | 10% | Live report or recording + revised scope | Honest diagnosis with evidence |
| A5. Work product | End of Week 2 | 25% | Partner-relevant artifact, notes, or handoff | Useful movement for partner |
| A6. Structured debrief presentation | End of Week 2 | 10% | Presentation artifact and peer feedback log | Clear distinction between stated and real problem |
| A7. Deep End Memo | Week 3 | 20% | 2,000-3,000 word memo | Analytical, evidence-based reflection |
| A8. Cohort showcase | Final session | 10% | Final presentation | Coherent deep-end story |
| A9. Partner feedback | Final week | 10% | Short feedback form or interview summary | Confirms professionalism and usefulness |
### Artifact standards
#### A2-A3: 72-hour plan and first output
Must include:
- 3 named stakeholders or stakeholder roles
- 1 working hypothesis about the actual problem
- 1 deliverable due by hour 72
- Proof of delivery
- 100-200 words on what changed between hour 0 and hour 72
#### A4: Week 1 status report
Must answer:
- What did you think the problem was?
- What do you now think it is?
- What did you produce?
- What is still unclear?
- What will you do in week 2?
#### A5: Work product
Acceptable formats:
- memo
- analysis
- prototype
- dashboard
- process redesign
- research synthesis
- communication asset
- operating recommendation
The artifact must matter to a real stakeholder. Purely performative classroom work does not count.
#### A7: Deep End Memo
Submission packet must include:
- memo
- appendix with artifacts or links
- partner/project context summary
- one paragraph on what the student would do differently if restarting
#### A9: Partner feedback
Partner form should ask:
- Was the student proactive?
- Did the student improve in relevance over time?
- Did the student produce something useful?
- Did the student require too much hand-holding?
- Would you trust this student on another ambiguous project?
## 6. AI/LLM grading and assessment framework
### Role of the LLM
The LLM is used to:
- extract evidence from student submissions
- score against rubrics using stated criteria
- draft feedback aligned to the student's performance band
- flag contradictions, unsupported claims, and missing evidence
The LLM is not used to:
- invent missing evidence
- decide integrity violations without human review
- override documented partner feedback
- make final pass/fail decisions in disputed cases
### Assessment pipeline
- Ingest all student artifacts: plans, outputs, notes, presentations, memo, partner feedback.
- Normalize them into one evaluation packet per student.
- Extract evidence using the rubric dimensions below.
- Assign provisional criterion scores with direct evidence citations.
- Run contradiction check between student self-report and artifacts/partner feedback.
- Generate narrative feedback matched to the student's actual evidence band.
- Human facilitator reviews low-confidence, edge-case, or escalated evaluations before release.
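The ingest and normalize steps above can be pictured as a simple grouping pass. This is an illustrative Python sketch only: `EvaluationPacket`, `build_packets`, and the artifact kind labels are hypothetical names, and the spec does not prescribe a storage format or schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Artifact kinds named in the ingest step; the exact labels are assumed.
EXPECTED_KINDS = ("plan", "output", "notes", "presentation", "memo", "partner_feedback")


@dataclass
class EvaluationPacket:
    """One student's artifacts, grouped by kind (hypothetical structure)."""
    student_id: str
    artifacts: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))

    def missing_kinds(self) -> list[str]:
        """Kinds with no artifact yet, in the order the pipeline lists them."""
        return [kind for kind in EXPECTED_KINDS if not self.artifacts.get(kind)]


def build_packets(submissions: list[tuple[str, str, str]]) -> dict[str, EvaluationPacket]:
    """Group (student_id, kind, reference) tuples into one packet per student."""
    packets: dict[str, EvaluationPacket] = {}
    for student_id, kind, reference in submissions:
        if student_id not in packets:
            packets[student_id] = EvaluationPacket(student_id)
        packets[student_id].artifacts[kind].append(reference)
    return packets
```

Surfacing `missing_kinds()` before scoring gives the evaluator an explicit list of evidence gaps, which fits the rule that missing evidence should be stated rather than inferred.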
### Core rubric dimensions for LLM scoring
| Dimension | Definition | Evidence heuristics |
|---|---|---|
| Initiative under ambiguity | Did the student move without waiting for permission to think? | Early outreach, specific asks, first output by hour 72, evidence of self-starting behavior |
| Diagnostic quality | Did the student correctly distinguish stated problem from actual problem? | Explicit problem reframing, evidence trail, stakeholder interviews, changed plan |
| Value creation | Did the student create movement that mattered to the partner? | Adoption, usage, response from stakeholders, handoff quality, partner testimony |
| Stakeholder navigation | Did the student read the org well enough to act without unnecessary damage? | Correct escalation, tone, stakeholder map quality, lack of avoidable political mistakes |
| Reflection quality | Did the student extract real lessons rather than perform growth? | Specific mistakes, causal analysis, changed behavior commitments |
| Professional reliability | Did the student follow through consistently? | Timeliness, submission completeness, communication quality, partner trust |
### Scoring model
- Each rubric dimension is scored on a 1-5 scale.
- Deliverable scores are calculated from the relevant dimensions.
- Final course grade is a weighted composite of assignments, not a simple average of rubric dimensions.
- If partner feedback directly contradicts the student's self-report, the LLM must reduce confidence and flag for human review.
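As a worked illustration of the weighted composite, the sketch below applies the weights from the assignment list to assignment-level scores. It assumes each graded assignment already has a score on the 1-5 scale; how those scores are derived from the rubric dimensions is not specified here, so that step is left out. The names `WEIGHTS` and `composite_grade` are hypothetical.

```python
# Weights in percent, from the assignment list. A1 is an ungraded gateway
# and A3 is scored inside A2, so neither has its own entry.
WEIGHTS = {
    "A2_A3": 15,
    "A4": 10,
    "A5": 25,
    "A6": 10,
    "A7": 20,
    "A8": 10,
    "A9": 10,
}


def composite_grade(scores: dict[str, float]) -> float:
    """Weighted mean of assignment scores, reported on the 1-5 scale."""
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("assignment weights must total 100%")
    missing = sorted(set(WEIGHTS) - set(scores))
    if missing:
        raise ValueError(f"missing assignment scores: {missing}")
    for name in WEIGHTS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} score must be between 1 and 5")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


example = {"A2_A3": 4, "A4": 3, "A5": 5, "A6": 3, "A7": 4, "A8": 4, "A9": 5}
print(composite_grade(example))
# 4.15
```

The result stays provisional under this spec: a human facilitator still approves final grades.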
### Confidence rules for the LLM
- High confidence: multiple artifacts support the same conclusion.
- Medium confidence: evidence is present but uneven or partially inferred from context.
- Low confidence: the student makes claims without artifact support, or partner data is missing.
The LLM must label each evaluation with one of those confidence levels.
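One way to make the three confidence levels checkable is a small rule function. The sketch below is an assumption-laden reading of the rules above: it treats "multiple artifacts" as two or more, and it takes as inputs three signals (`supporting_artifacts`, `partly_inferred`, `partner_data_present`) that are hypothetical names for values an upstream step would have to supply.

```python
def confidence_level(
    supporting_artifacts: int,
    partly_inferred: bool,
    partner_data_present: bool,
) -> str:
    """Map evidence signals to one of the three confidence labels."""
    # Low: claims lack artifact support, or partner data is missing.
    if supporting_artifacts == 0 or not partner_data_present:
        return "low"
    # High: multiple artifacts support the same conclusion.
    # Reading "multiple" as two or more is an assumption of this sketch.
    if supporting_artifacts >= 2 and not partly_inferred:
        return "high"
    # Medium: evidence is present but uneven or partly inferred from context.
    return "medium"
```

Checking the low-confidence conditions first means missing partner data can never be masked by a large number of student-supplied artifacts.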
### Automatic flags for human review
- No first output by hour 72
- Partner reports unreliability or boundary issues
- Submission contains generic reflection with no evidence
- Work product appears substantially AI-generated without grounded deployment evidence
- Student claims impact that cannot be traced to any stakeholder response or artifact
- Severe mismatch between polished memo and weak documented execution
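The flag list above can be expressed as a rule table so that every routed packet carries its reasons. This is a hedged Python sketch: `ReviewSignals` and its field names are hypothetical, and several of the signals (for example, whether a reflection is generic) would themselves come from LLM or facilitator judgment rather than from code.

```python
from dataclasses import dataclass


@dataclass
class ReviewSignals:
    """Per-student signals feeding the flag check (hypothetical field names)."""
    shipped_by_hour_72: bool
    partner_reported_unreliability_or_boundary_issue: bool
    generic_reflection_without_evidence: bool
    ai_generated_without_deployment_evidence: bool
    untraceable_impact_claim: bool
    polished_memo_weak_execution: bool


# (field name, value that triggers the flag, reason shown to the reviewer)
FLAG_RULES = [
    ("shipped_by_hour_72", False, "No first output by hour 72"),
    ("partner_reported_unreliability_or_boundary_issue", True,
     "Partner reports unreliability or boundary issues"),
    ("generic_reflection_without_evidence", True,
     "Generic reflection with no evidence"),
    ("ai_generated_without_deployment_evidence", True,
     "Work product appears AI-generated without grounded deployment evidence"),
    ("untraceable_impact_claim", True,
     "Claimed impact not traceable to a stakeholder response or artifact"),
    ("polished_memo_weak_execution", True,
     "Polished memo but weak documented execution"),
]


def human_review_flags(signals: ReviewSignals) -> list[str]:
    """Return every reason this packet should go to a human reviewer."""
    return [
        reason
        for name, trigger, reason in FLAG_RULES
        if getattr(signals, name) == trigger
    ]
```

An empty list means no automatic flag fired; it does not replace the separate rule that low-confidence evaluations also go to a human.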
## 7. Rubrics, scoring criteria, and evaluator prompt guidance
### Master performance scale
| Score | Label | General meaning |
|---|---|---|
| 5 | Exceptional | High-agency, evidence-rich, materially useful, self-aware |
| 4 | Strong | Clear contribution, good diagnosis, credible reflection |
| 3 | Competent | Met the bar with uneven sharpness or partial evidence |
| 2 | Weak | Limited contribution, soft diagnosis, or shallow reflection |
| 1 | Failing | Little useful action, missing evidence, or poor professionalism |
### Assignment-specific scoring criteria
#### A2-A3: 72-hour plan + first output
| Criterion | 5 | 3 | 1 |
|---|---|---|---|
| Specificity of plan | Names people, questions, risks, and a concrete shipped output | Has a plausible plan but parts remain vague | Mostly generic intentions |
| Speed to contribution | Ships something relevant by hour 72 | Ships late or ships something marginally useful | Does not ship or ships something irrelevant |
| Problem diagnosis | Early hypothesis is thoughtful and evidence-linked | Hypothesis exists but is shallow | No meaningful hypothesis |
| Evidence of initiative | Student drove outreach and momentum | Some initiative, some waiting | Mostly passive |
#### A4: Week 1 status report
| Criterion | 5 | 3 | 1 |
|---|---|---|---|
| Reality update | Clearly distinguishes actual vs. stated problem | Some update but still muddy | Still speaking in original vague brief |
| Honesty about unknowns | Names gaps without defensiveness | Some honesty, some glossing | Defensive or evasive |
| Scope adjustment | Week-2 plan is sharper and better targeted | Adjusted, but not decisively | No meaningful adjustment |
#### A5: Work product
| Criterion | 5 | 3 | 1 |
|---|---|---|---|
| Partner relevance | Directly useful to a stakeholder decision or process | Potentially useful, but adoption unclear | Little sign anyone needed it |
| Quality of thinking | Shows sharp prioritization and judgment | Adequate but conventional | Sloppy, generic, or mis-scoped |
| Execution quality | Coherent, usable, and handed off cleanly | Understandable but rough | Hard to use or incomplete |
#### A6-A7: Debrief + Deep End Memo
| Criterion | 5 | 3 | 1 |
|---|---|---|---|
| Self-awareness | Names concrete mistakes and causal patterns | Reflection is present but partial | Reflection is abstract or self-protective |
| Evidence use | Connects claims to artifacts and moments | Some evidence, some summary claims | Mostly unsupported narrative |
| Identity-level learning | Articulates a real operating shift | Some lesson, not yet durable | No clear learning extracted |
#### A8-A9: Showcase + partner feedback
| Criterion | 5 | 3 | 1 |
|---|---|---|---|
| Story coherence | Clear narrative arc with stakes, pivot, and lesson | Story is understandable but flat | Rambling or unclear |
| Credibility | Claims align with artifacts and partner feedback | Mostly aligned, some soft spots | Inflated or contradicted |
| Trust signal | Partner would re-engage the student | Mixed signal | Partner would not re-engage |
### Concrete assessment heuristics for LLM evaluation
The LLM should use the following heuristics:
- Initiative heuristic: score up when the student made specific asks, secured conversations quickly, and shipped before being fully comfortable.
- Diagnosis heuristic: score up when the student names what evidence changed their understanding of the problem.
- Value heuristic: score up when there is a visible stakeholder response, adoption, decision influence, or handoff.
- Scope-control heuristic: score down when the student built too much before alignment or stayed too abstract for too long.
- Reflection heuristic: score up when the student can name not just what happened, but why they behaved as they did and what they will change.
- Credibility heuristic: score down when the memo claims growth or impact that the artifacts do not support.
- Professionalism heuristic: score down when deadlines slip without communication, partner trust declines, or deliverables arrive without context.
### Evaluator prompt guidance
#### System prompt for evaluator LLM
```
You are evaluating a Project Agni course called "Deep End Deployment."
You must grade only from provided evidence.
Do not infer unstated impact.
Do not reward polish over relevance.
Do not punish roughness if the work clearly created value under ambiguity.
For every score, cite the evidence that supports it.
If evidence is missing, say so explicitly.
If the student's claims conflict with partner feedback or artifacts, lower confidence and flag for human review.
```
#### Core evaluation prompt template
Evaluate this student's performance in Deep End Deployment.
Inputs:
- Deployment brief
- 72-hour plan
- First shipped output
- Week 1 status report
- Work product
- Debrief notes
- Deep End Memo
- Showcase transcript or slides
- Partner feedback
Tasks:
- Summarize the student's actual deployment in 120 words or fewer.
- Extract evidence for these dimensions:
  - Initiative under ambiguity
  - Diagnostic quality
  - Value creation
  - Stakeholder navigation
  - Reflection quality
  - Professional reliability
- Score each dimension from 1-5 with one quoted or paraphrased evidence reference.
- Calculate provisional assignment-level judgments:
  - 72-hour plan + first output
  - Week 1 status report
  - Work product
  - Debrief + memo
  - Showcase + partner trust
- Assign confidence level: high, medium, or low.
- Flag any contradictions, unsupported claims, or reasons for human review.
- Write feedback in three parts:
  - What the student did well
  - Where the student misread or underperformed
  - What they should change on the next deployment
Output format:
- Summary
- Dimension scores
- Assignment judgments
- Confidence
- Human review flags
- Feedback
#### Prompt add-ons for specific failure cases
- If the student overstates impact: "Check whether claimed partner impact is verified by partner feedback, artifact usage, or a decision trace."
- If the memo is polished but execution is weak: "Prioritize execution evidence over narrative quality."
- If the work product is rough but partner loved it: "Do not penalize low polish if usefulness is clearly evidenced."
## 8. Feedback strategy: what strong/average/weak responses look like and how an LLM should respond
### Feedback principles
- Be direct.
- Tie feedback to observed evidence.
- Separate execution feedback from identity judgment.
- Praise only what is concretely earned.
- Turn weak performance into a next-deployment playbook, not generic encouragement.
### What strong responses look like
Strong responses usually show:
- movement in the first 72 hours
- a visible shift from stated problem to actual problem
- at least one useful artifact adopted, discussed, or acted on by stakeholders
- accurate reading of organizational dynamics
- memo-level honesty about mistakes without collapsing into self-criticism theater
#### How the LLM should respond to strong work
The LLM should:
- identify the exact behaviors worth repeating
- name the operating principles the student demonstrated
- point to the next level of challenge, usually sharper stakeholder navigation or bigger-scope ambiguity
**Example feedback posture**
- "You created trust by shipping early and revising quickly once the real problem surfaced. Keep that pattern. Your next edge is political reading: get to the real decision-maker faster."
### What average responses look like
Average responses usually show:
- some initiative, but delayed sharpness
- a useful but only partially aligned artifact
- an emerging understanding of the real problem
- reflection that is sincere but not yet rigorous
- partner value that is plausible but not strongly evidenced
#### How the LLM should respond to average work
The LLM should:
- acknowledge what crossed the bar
- identify precisely where the student stayed too vague or moved too late
- convert the feedback into 2-3 concrete behaviors for next time
**Example feedback posture**
- "You did not freeze, which matters. But you stayed in context-gathering mode too long before putting a point of view in front of the partner. Next time, force a draft recommendation earlier, even if it is provisional."
### What weak responses look like
Weak responses usually show:
- passive waiting disguised as professionalism
- generic communication with few sharp questions
- no meaningful first output, or an output disconnected from partner needs
- reflection focused on feelings or effort rather than misread decisions
- claims of impact with little evidence
#### How the LLM should respond to weak work
The LLM should:
- state plainly that the student did not yet meet the course bar
- identify the first broken link in the chain: diagnosis, initiative, execution, or reflection
- offer a narrow corrective path instead of a long motivational speech
**Example feedback posture**
- "You did not convert ambiguity into contribution. The main failure was not effort; it was passivity. On the next deployment, schedule two stakeholder conversations in the first 24 hours and ship a draft artifact by hour 48, even if incomplete."
### Response-style guardrails for the LLM
- Do not say "great job" unless the evidence clearly supports it.
- Do not use therapeutic language to soften performance feedback.
- Do not confuse verbosity with thoughtfulness.
- Do not produce identical feedback patterns for all mid-band students.
- Always end with a specific next-deployment behavior change.
### Recommended feedback format
| Section | Guidance |
|---|---|
| Evidence-backed strength | Name 1-2 things the student did that clearly worked |
| Core miss | Identify the most important mistaken behavior or assumption |
| Consequence | Explain what that miss cost them in the deployment |
| Next move | Give one concrete operating rule for the next ambiguous environment |
## Appendix: recommended operational assets for the facilitation team
These assets should exist in the delivery environment even if they are stored outside this file:
- Partner intake form
- Student capability profile template
- Deployment brief template
- 72-hour plan template
- Status report template
- Partner feedback form
- Evaluation packet template for LLM scoring
## Implementation summary
Deep End Deployment should feel like a controlled professional stress test. The course succeeds when students are forced to create value before they feel ready, then can explain with precision what that experience taught them about contribution, judgment, and identity. The bar is not polish. The bar is day-one usefulness under real conditions.