# Course 04 Course Content Spec: AI Agent Build Lab
## 1. Title and Source Files Used
Course Title: AI Agent Build Lab
Owned Output File: 04-ai-agent-build-lab/04_ai_agent_build_lab_course_content.md
Primary source files used:
- 04-ai-agent-build-lab/curriculum.md
- 04-ai-agent-build-lab/website-prompt.md
Purpose of this document: This is the implementation-ready course-content specification for Course 04. It converts the source curriculum into a facilitation, delivery, assignment, and assessment plan that can be taught consistently by an instructor and evaluated consistently by humans and LLMs.
## 2. Design Decisions at the Top
- The curriculum is the source of truth; this document expands rather than reinterprets it.
The source curriculum defines the core philosophy, modules, example projects, and assessment priorities. This document preserves those and adds operational detail.
- The course is taught as a build lab, not a lecture series.
The dominant learning mode is making, testing, debugging, and shipping. Explanations are short and instrumental. Students should leave each session with visible progress.
- The central mental model is assistant vs. agent.
The course repeatedly reinforces the difference between asking for answers and delegating outcomes. This framing appears in instruction, exercises, demos, and assessment.
- No-code and low-code are the default, not a fallback.
The curriculum explicitly says no coding experience is required. Tool choices, assignments, and evaluation therefore reward clear workflow design and reliable execution more than technical sophistication.
- Working boring agents beat ambitious broken agents.
This is both a teaching principle and a grading principle. Scope control is treated as a skill, not a compromise.
- Students build on their own real problem space.
The curriculum emphasizes agent opportunities in students' actual lives. All major artifacts should stay anchored to a real workflow the student cares about.
- Evaluation must distinguish plausible output from successful execution.
Because agents can look impressive while failing silently, the assessment framework requires evidence of runs, test cases, failure analysis, and iteration.
- The website prompt informs tone and presentation language.
The website source adds a high-energy, futuristic framing: "A workforce, not a tool," "You delegate. It executes," and "architect vs. operator." Those phrases should shape facilitator language, slide copy, and demo framing, but not override the curriculum's practical focus.
- The source duration conflict is resolved explicitly.
The curriculum metadata says "5 sessions x 2 hours," while Modules 3-5 imply a longer build sprint with Days 3-4 at 4 hours each and Day 5 split into morning and afternoon blocks. This spec assumes a 5-day intensive with 14 contact hours total:
- Day 1: 2 hours
- Day 2: 2 hours
- Day 3: 4 hours
- Day 4: 4 hours
- Day 5: 2 hours
- Assessment uses human judgment supported by LLM evaluation, not replaced by it.
LLMs are used for first-pass scoring, rubric alignment, and formative feedback. Final high-stakes decisions should remain reviewable by a facilitator.
## 3. Delivery Model Assumptions
Target learner profile
- Ages 15-25
- Mixed academic backgrounds
- No coding experience assumed
- Comfortable with web apps and basic digital workflows
Cohort size
- Ideal: 15-24 students
- Minimum viable: 8 students
- Maximum without instructional support: 28 students
Facilitation staffing
- 1 lead facilitator
- 1 teaching assistant or floating technical coach for every 12-15 students during build days
Instructional format
- In-person preferred
- Can run hybrid if all tools are browser-based and students can share screens
- Projected live demos are required
Technology assumptions
- Every student has a laptop
- Stable internet access
- Access to at least one LLM interface such as ChatGPT or Claude
- Access to one workflow tool such as n8n, Zapier, Make, or equivalent
- Students can create free accounts if institutional accounts are not provided
Tool policy
- Default stack: ChatGPT or Claude plus n8n
- Simpler fallback: ChatGPT or Claude only, using structured multi-step prompting
- More technical extension path: LangFlow or similar visual builder
Data and privacy assumptions
- Students should avoid connecting sensitive personal data unless explicit safeguards are taught
- Demo tasks should use low-risk workflows, sample data, or sanitized personal data
- Email, calendar, and messaging automations should default to draft mode rather than send mode
Instructional pacing assumptions
- Day 1 and Day 2 are concept plus design heavy
- Day 3 and Day 4 are build heavy
- Day 5 is test, demo, and reflection heavy
Definition of success
- Every student finishes with a functioning agent or tightly scoped agent workflow prototype that completes a real task with evidence
- Every student can explain the goal, workflow, limitations, and next iteration
## 4. Detailed Course Content
Course Arc
The week progresses through five stages:
- See the paradigm shift
- Find a real problem worth delegating
- Design an agent workflow with prompts and guardrails
- Build and debug until the workflow works
- Evaluate, present, and plan the next version
Module 1 / Session 1
Session title: What Is an AI Agent?
Day and duration: Day 1, 2 hours
Session outcomes
- Students can explain the difference between an AI assistant and an AI agent
- Students can identify at least three possible agent opportunities from their own lives
- Students understand the five core components of an agent: goal, tools, memory, feedback loop, termination condition
Session agenda
0:00-0:10 | Opening framing
- Activity: Welcome, course challenge, and week outcome preview
- Facilitator moves:
- State the promise: by the end of the week each student ships one working agent
- Frame the course as delegation, not prompting trivia
- Use language from the source materials: "A workforce, not a tool" and "You are the architect, not the operator"
- Student outputs:
- Verbal check-in: one repetitive task they wish they could hand off
0:10-0:30 | Live contrast demo: smartest assistant vs. dumbest agent
- Activity: Demonstrate the same problem in two modes
- Mode A: single-turn assistant response
- Mode B: multi-step agent workflow with a defined goal and tool use
- Recommended demo task:
- "Prepare a one-page brief on a topic, with sources, summary, and next actions saved in a structured format"
- Facilitator moves:
- Narrate the difference between answering and executing
- Make the invisible visible: point out planning, tool use, verification, and completion criteria
- Ask students which version they would trust with repeated work
- Student outputs:
- Quick written comparison: three differences between assistant mode and agent mode
0:30-0:50 | Mini-lesson: anatomy of an agent
- Activity: Direct instruction with examples
- Content:
- Goal definition
- Tool access
- Memory/context
- Feedback loop
- Termination condition
- Facilitator moves:
- Show one bad example with a vague goal and one improved version
- Emphasize that most failures start with unclear success criteria
- Student outputs:
- Fill in a simple agent anatomy worksheet for a sample use case
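For facilitators who want a concrete handout, the five components above can be sketched as a minimal control loop. This is illustrative teaching scaffolding only, not a real agent framework; every name here (`run_agent`, `goal_check`, the toy tools) is a hypothetical example:

```python
# Minimal sketch of the five agent components: goal, tools, memory,
# feedback loop, termination condition. Hypothetical names throughout.

def run_agent(goal_check, tools, max_steps=10):
    """goal_check(memory) -> bool is the termination condition.
    tools maps step names to callables (tool access)."""
    memory = []  # memory/context: accumulated results
    for step in range(max_steps):  # hard cap guards against infinite loops
        if goal_check(memory):     # termination condition met
            return {"done": True, "steps": step, "memory": memory}
        # feedback loop: choose the next tool based on progress so far
        next_tool = "research" if not memory else "summarize"
        memory.append(tools[next_tool](memory))
    return {"done": False, "steps": max_steps, "memory": memory}

# Toy run: the goal is "one research note plus one summary exist"
tools = {
    "research": lambda mem: "note: three sources found",
    "summarize": lambda mem: "summary of " + mem[-1],
}
result = run_agent(lambda mem: len(mem) >= 2, tools)
print(result["done"], result["steps"])  # True 2
```

The point of the sketch is that a vague `goal_check` is exactly the "unclear success criteria" failure the mini-lesson warns about.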
0:50-1:15 | Exercise: Find the agent opportunities
- Activity: Students audit the last 24 hours of their life or school/workflow
- Prompt:
- What did you do repeatedly?
- What did you do by pattern?
- What consumed time but not deep judgment?
- What could be delegated if the delegate understood your instructions?
- Facilitator moves:
- Push students away from generic ideas toward concrete workflows
- Help students distinguish "hard because unclear" from "hard because deeply human"
- Encourage boring candidates such as organizing, summarizing, triaging, tracking, formatting, and routine research
- Student outputs:
- A labeled list of tasks under three columns:
- Manual only
- Delegatable to an AI agent
- Human judgment required
1:15-1:35 | Pair share and shortlist
- Activity: Students explain one top opportunity to a partner
- Facilitator moves:
- Instruct peers to challenge vagueness
- Require each student to turn a broad idea into a concrete workflow
- Ask, "What would count as done?"
- Student outputs:
- Top 3 candidate agent ideas ranked by feasibility and usefulness
1:35-1:55 | Tool overview and path selection
- Activity: Brief introduction to available build paths
- Tracks:
- Track A: ChatGPT or Claude only
- Track B: n8n plus LLM
- Track C: LangFlow or more technical visual builder
- Facilitator moves:
- Recommend the simplest tool path that can solve the student's problem
- Normalize choosing a low-complexity stack
- Student outputs:
- Selected build path
- One-sentence problem statement for tomorrow
1:55-2:00 | Exit ticket
- Student outputs:
- "My agent will help me ___ by ___"
- "The biggest risk is ___"
Artifacts produced
- Agent Opportunity Audit
- Initial problem statement
- Build-path choice
Facilitator prep
- Preload assistant-vs-agent demo
- Prepare worksheet for agent anatomy
- Prepare example workflows at three difficulty levels
Module 2 / Session 2
Session title: Designing the Agent
Day and duration: Day 2, 2 hours
Session outcomes
- Students can decompose a goal into agent-executable steps
- Students can draft a system prompt, task prompt, output format, and guardrails
- Students can define where human review belongs in their workflow
Session agenda
0:00-0:15 | Warm start and idea refinement
- Activity: Review yesterday's shortlisted agent ideas
- Facilitator moves:
- Have each student state their chosen workflow in one sentence
- Force specificity: input, transformation, output, and user value
- Student outputs:
- Final selected project idea
0:15-0:40 | Lesson: goal decomposition
- Activity: Direct instruction plus worked example
- Content:
- Task vs. goal
- Sequential steps
- Decision points
- What the human does vs. what the agent does
- Where failure is most likely
- Facilitator moves:
- Demonstrate a bad decomposition that skips inputs, checks, or outputs
- Model a good decomposition with explicit checkpoints
- Student outputs:
- Notes on decomposition pattern
0:40-1:05 | Exercise: Goal Autopsy
- Activity: Students map their selected problem into steps
- Required outputs in the map:
- Trigger/input
- Processing steps
- Tools required
- Human review points
- Final output
- Stop condition
- Facilitator moves:
- Ask "Can an AI actually do this step with the tools you chose?"
- Mark any steps that are too ambiguous or too broad
- Encourage scope cuts if a student has more than 5-7 major steps
- Student outputs:
- Agent workflow map
1:05-1:25 | Lesson: prompt design for agents
- Activity: Direct instruction and prompt teardown
- Content:
- Role definition
- Goal statement
- Constraints
- Output format
- Tool use instructions
- Error handling
- Escalation rules
- Facilitator moves:
- Show one weak prompt and one strong prompt
- Emphasize structure over clever wording
- Teach students to specify what the agent should do when uncertain
- Student outputs:
- Prompt template draft
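The prompt elements taught in this lesson can be packaged as a fill-in template. The template below is a sketch of one possible structure, not a canonical format; the placeholder names are assumptions for illustration:

```python
# Illustrative agent prompt template covering role, goal, constraints,
# output format, tool use, error handling, and escalation rules.
AGENT_PROMPT = """\
ROLE: You are a {role}.
GOAL: {goal}
CONSTRAINTS:
{constraints}
OUTPUT FORMAT: {output_format}
TOOL USE: {tool_rules}
IF UNCERTAIN: Stop and ask the user; do not guess.
IF A TOOL FAILS: Report the error and the last successful step.
"""

prompt = AGENT_PROMPT.format(
    role="research assistant that produces one-page briefs",
    goal="Produce a brief with sources, summary, and next actions.",
    constraints="- Use only the provided sources\n- Max 300 words",
    output_format="Markdown with headings: Sources, Summary, Next Actions",
    tool_rules="Use web search only for source verification.",
)
print("IF UNCERTAIN" in prompt)  # True
```

Note how the uncertainty and tool-failure lines are fixed text: students specify what the agent does when things go wrong, which is the point the lesson emphasizes.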
1:25-1:50 | Hands-on workshop: Agent Blueprint
- Activity: Students complete a build-ready blueprint
- Blueprint sections:
- Project name
- User problem
- Success definition
- Input format
- Step-by-step workflow
- Tools
- Prompt set
- Guardrails
- Failure modes
- Test cases
- Facilitator moves:
- Review for feasibility, not elegance
- Require at least two test cases and one likely failure case
- Student outputs:
- Version 1 Agent Blueprint
1:50-2:00 | Build readiness check
- Activity: Rapid desk checks or pair reviews
- Facilitator moves:
- Verify each student can answer:
- What starts the workflow?
- What tools are used?
- What does "done" look like?
- What could fail first?
- Student outputs:
- Build-ready approval or scoped-down revision
Artifacts produced
- Goal Autopsy map
- Agent Blueprint v1
- Prompt set v1
- Test case set
Facilitator prep
- Blueprint template
- Prompt template
- Decomposition examples for simple and complex projects
Module 3 / Session 3
Session title: Build Sprint Part 1
Day and duration: Day 3, 4 hours
Session outcomes
- Students configure their chosen tools
- Students build a first runnable version of their workflow
- Students complete at least one end-to-end test, even if partial or fragile
Session agenda
0:00-0:20 | Sprint kickoff
- Activity: Re-state build rules and ship criteria
- Facilitator moves:
- Set the rule: "By end of Day 4, your agent must work"
- State that broken complexity is not rewarded
- Ask students to define today's concrete milestone
- Student outputs:
- Sprint goal for the day
0:20-1:00 | Tool setup and environment readiness
- Activity: Account access, workflow tool setup, folder structure, test input preparation
- Facilitator moves:
- Use a setup checklist projected on screen
- Encourage pair troubleshooting before facilitator escalation
- Offer fallback tools if account or integration issues stall progress
- Student outputs:
- Working tool access
- Ready-to-use inputs and sample data
1:00-1:45 | Build block 1: trigger plus first action
- Activity: Students create the first executable step
- Facilitator moves:
- Require early testing rather than planning forever
- Tell students to start with the smallest useful slice
- Help students remove unnecessary automation branches
- Student outputs:
- First functioning trigger and action
1:45-2:00 | Stand-up checkpoint
- Activity: Fast progress round
- Prompt:
- What works?
- What breaks?
- What is your next smallest step?
- Facilitator moves:
- Identify common blockers for a mini-clinic
- Student outputs:
- Updated sprint plan
2:00-2:45 | Build block 2: core workflow path
- Activity: Students connect the main sequence of steps
- Facilitator moves:
- Push students to keep one path working before adding branches
- Ask for explicit output formatting
- Ensure the workflow produces an observable result
- Student outputs:
- Core workflow v1
2:45-3:15 | Mini-lesson: prompt iteration under failure
- Activity: Short intervention based on real blockers in the room
- Common topics:
- Vague prompts causing messy outputs
- Missing required fields
- Tool mismatch
- Overly large scope
- Facilitator moves:
- Use student examples with permission
- Model one prompt change and one workflow change
- Student outputs:
- Revised prompt set
3:15-3:50 | Build block 3: first real run
- Activity: Students execute one real workflow run against a real or sanitized task
- Facilitator moves:
- Require capture of evidence: screenshots, logs, outputs
- Ask students to annotate what failed and why
- Student outputs:
- First real run evidence
- Failure notes
3:50-4:00 | Close and next-step commitment
- Student outputs:
- Written plan for Day 4:
- one bug to fix
- one feature to cut or simplify
- one success criterion for tomorrow
Artifacts produced
- Workflow v1
- Run evidence set
- Failure log v1
- Revised prompt set
Facilitator prep
- Setup checklist
- Troubleshooting board for common errors
- Fast fallback exercise for students blocked by tool access
Module 3 / Session 4
Session title: Build Sprint Part 2
Day and duration: Day 4, 4 hours
Session outcomes
- Students improve reliability and handle at least one edge case
- Students test the workflow with multiple inputs
- Students reach a demonstrable working version
Session agenda
0:00-0:15 | Sprint reset
- Activity: Review Day 3 evidence and set shipping targets
- Facilitator moves:
- Have students define the minimum viable working agent
- Make students cut optional features before resuming build
- Student outputs:
- Final ship target
0:15-1:15 | Build block 1: stabilize the happy path
- Activity: Students fix the most important broken step
- Facilitator moves:
- Keep them on the main path until it succeeds consistently
- Ban feature creep during this block
- Student outputs:
- Stable happy-path run
1:15-1:35 | Lesson: common agent failure modes
- Activity: Targeted instruction
- Failure modes:
- Goal drift
- Infinite loops
- Hallucinated tools
- Premature termination
- Output not matching user need
- Facilitator moves:
- Tie each failure mode to what students are seeing in the room
- Give one concrete fix pattern for each
- Student outputs:
- Self-diagnosis of likely failure mode
1:35-2:20 | Build block 2: add one control mechanism
- Activity: Students improve reliability by adding one of:
- Validation step
- Human review checkpoint
- Explicit stop condition
- Required output schema
- Retry or fallback instruction
- Facilitator moves:
- Require students to articulate why the control matters
- Favor simple validation over complex autonomy
- Student outputs:
- Workflow v2 with one reliability feature
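Two of the control options above, a required output schema and a retry-then-fallback rule, can be shown in one short sketch. The field names and the `run_with_retry` helper are hypothetical examples of the pattern, not part of any student's actual stack:

```python
# Hedged sketch of one control mechanism: required output schema
# plus a single retry, then a hand-back to a human.
REQUIRED_FIELDS = {"summary", "sources", "next_actions"}

def validate(output: dict):
    missing = REQUIRED_FIELDS - output.keys()
    return (len(missing) == 0, missing)

def run_with_retry(step, max_attempts=2):
    """step() stands in for whatever produces the agent's output."""
    missing = REQUIRED_FIELDS
    for attempt in range(max_attempts):
        output = step()
        ok, missing = validate(output)
        if ok:
            return output
    # fallback: escalate to a human instead of looping forever
    raise RuntimeError(f"Validation failed after {max_attempts} attempts: missing {missing}")

# Toy step that omits fields on the first try, succeeds on the second
attempts = iter([
    {"summary": "s"},
    {"summary": "s", "sources": [], "next_actions": []},
])
result = run_with_retry(lambda: next(attempts))
print(sorted(result))  # ['next_actions', 'sources', 'summary']
```

This favors simple validation over complex autonomy, which is exactly what the facilitator moves above ask for.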
2:20-2:35 | Break and peer debug
- Activity: Students pair up and test each other's agent
- Facilitator moves:
- Tell peers to break the workflow, not praise it
- Require specific bug reports
- Student outputs:
- Peer bug report
2:35-3:20 | Build block 3: edge case plus second test
- Activity: Students run at least two test cases and one edge case
- Facilitator moves:
- Require evidence for each run
- Push students to test the exact scenario they fear most
- Student outputs:
- Test log with results for:
- Test Case 1
- Test Case 2
- Edge Case
3:20-3:45 | Documentation block
- Activity: Students prepare a compact project record
- Required contents:
- What the agent does
- Input
- Output
- Tools used
- Prompt summary
- Known limitations
- What still fails
- Facilitator moves:
- Stress honesty over polish
- Explain that documentation is part of real agent design
- Student outputs:
- Agent project sheet
3:45-4:00 | Ship review
- Activity: Instructor checks if the agent is demo-ready
- Facilitator moves:
- Sort students into:
- working
- almost working
- needs rescue scope cut
- Student outputs:
- Demo readiness status
Artifacts produced
- Workflow v2 or working agent
- Reliability enhancement
- Test log with multiple runs
- Peer bug report
- Agent project sheet
Facilitator prep
- Failure mode examples
- Peer debug template
- Demo readiness checklist
Module 4 and Module 5 / Session 5
Session title: Evaluate, Demo, and Roadmap
Day and duration: Day 5, 2 hours
Session outcomes
- Students can evaluate whether their agent actually solves the intended problem
- Students can explain failure modes and future improvements
- Students present a working or near-working agent with evidence
Session agenda
0:00-0:20 | Lesson: the evaluation problem
- Activity: Short instruction
- Content:
- Why plausible outputs are not enough
- What counts as evidence of success
- When to use human review
- How to detect goal drift and premature completion
- Facilitator moves:
- Contrast "looks smart" with "completed the job"
- Show one example of output that is polished but wrong
- Student outputs:
- Personal checklist: how I know my agent works
0:20-0:40 | Exercise: Agent Surgery
- Activity: Diagnose a broken workflow
- Facilitator moves:
- Give students a pre-made failing agent example
- Ask them to name the failure mode, likely cause, and fix
- Student outputs:
- Short diagnostic response
0:40-1:30 | Demo day
- Activity: 5-minute presentations with evidence
- Student demo structure:
- Problem
- Workflow
- Live run or recorded run evidence
- What failed and what changed
- What comes next
- Facilitator moves:
- Keep time aggressively
- Require honesty about limitations
- Ask one question about evaluation or reliability for each presenter
- Student outputs:
- Demo presentation
1:30-1:45 | Peer feedback
- Activity: One specific strength and one specific improvement per presenter
- Facilitator moves:
- Ban empty praise
- Encourage comments on scope, clarity, and reliability
- Student outputs:
- Written peer feedback
1:45-2:00 | Closing: the agent operating system
- Activity: Reflection and next-step roadmap
- Facilitator moves:
- Frame each shipped agent as a reusable capability
- Encourage weekly agent sprints and monthly tool reviews
- Name the next frontier: multi-agent systems, agent-to-agent calls, feedback-driven improvement
- Student outputs:
- Next-version roadmap
- Reflection on architect vs. operator mindset
Artifacts produced
- Evaluation checklist
- Agent Surgery response
- Demo recording or live presentation
- Peer feedback set
- Next-version roadmap
Facilitator prep
- Broken-agent exercise
- Demo timer and order
- Closing reflection prompt
## 5. Assignments and Artifacts
Assignment 1: Agent Opportunity Audit
When: Day 1
Purpose: Identify viable agent opportunities from the student's real life
Submission requirements
- Minimum 10 tasks from the student's recent routine
- Each labeled as manual, delegatable, or human-judgment-heavy
- Top 3 agent opportunities ranked with a brief reason
Artifact produced
- Opportunity audit sheet
Assignment 2: Goal Autopsy and Agent Blueprint
When: Day 2
Purpose: Turn one selected opportunity into a buildable workflow
Submission requirements
- One clear problem statement
- Success definition
- Stepwise workflow map
- Tool list
- Prompt set
- Guardrails
- At least 2 normal test cases and 1 failure or edge case
Artifact produced
- Agent Blueprint v1
Assignment 3: Build Sprint Log
When: Days 3-4
Purpose: Capture implementation progress, test evidence, and iteration decisions
Submission requirements
- Date and version markers
- Screenshots or logs from at least 3 runs
- Notes on prompt or workflow changes
- At least 2 failures documented with cause and attempted fix
- Clear statement of what was cut or simplified
Artifact produced
- Build Sprint Log
Assignment 4: Working Agent and Project Sheet
When: Day 4 end
Purpose: Produce a usable agent with enough documentation for evaluation and demo
Submission requirements
- The actual workflow or reproducible setup
- Input example
- Output example
- Tool stack used
- Known limitations
- Instructions for how to run it
Artifact produced
- Working agent or runnable workflow
- Agent project sheet
Assignment 5: Demo and Reflection
When: Day 5
Purpose: Explain what was built, demonstrate reliability, and articulate next steps
Submission requirements
- 5-minute demo
- Evidence of at least one successful run
- One major failure mode encountered
- One improvement planned
- Reflection on what changed in the student's understanding of AI agents
Artifact produced
- Demo deck or live walkthrough
- Reflection note
Required end-of-course artifact bundle
Each student should leave with:
- Agent Opportunity Audit
- Agent Blueprint
- Prompt set
- Build Sprint Log
- Working agent or runnable workflow
- Test log with multiple runs
- Agent project sheet
- Demo artifact
- Reflection and next-version roadmap
## 6. AI/LLM Grading and Assessment Framework
Assessment philosophy
The source curriculum is explicit: execution matters more than polish. Therefore the grading system must reward:
- Real utility
- Clear design thinking
- Evidence of testing
- Honest debugging
- Practical scope choices
It must not over-reward:
- Fancy language
- Complex tooling without reliable outcomes
- Ambition without execution
Recommended grading weights
- Working Agent: 40%
- Agent Blueprint and Goal Decomposition: 20%
- Testing and Evaluation Evidence: 20%
- Demo and Explanation: 10%
- Peer Feedback Participation: 10%
This slightly expands the source assessment framework by splitting "Working Agent" from "Testing and Evaluation Evidence." That change is intentional: a workflow that merely looks functional but lacks credible test evidence should not receive top marks.
What LLMs should grade directly
LLMs are well suited for:
- Checking completeness of written artifacts
- Evaluating clarity and specificity of goals and prompts
- Assessing whether workflow descriptions are coherent
- Comparing student work against rubric descriptors
- Producing formative feedback aligned to evidence
What LLMs should not decide alone
Human review is required or strongly recommended for:
- Whether the agent actually ran successfully when evidence is ambiguous
- Whether screenshots or logs are authentic
- Safety concerns or inappropriate automation choices
- Final grade overrides in borderline cases
Submission package for LLM evaluation
To evaluate consistently, the evaluator should receive:
- Student identifier
- Agent Opportunity Audit
- Agent Blueprint
- Prompt set
- Build Sprint Log
- Test log
- Agent project sheet
- Demo summary or transcript
- Run evidence excerpts
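A first-pass completeness check on the bundle can be automated before any LLM scoring runs. The artifact keys below are illustrative labels for the items listed above, not a prescribed file format:

```python
# Sketch: automated completeness check on a submission bundle.
# Key names are hypothetical labels for the required artifacts.
REQUIRED_ARTIFACTS = [
    "opportunity_audit", "agent_blueprint", "prompt_set",
    "build_sprint_log", "test_log", "project_sheet",
    "demo_summary", "run_evidence",
]

def missing_artifacts(bundle: dict) -> list:
    """bundle maps artifact name -> content; empty or absent counts as missing."""
    return [a for a in REQUIRED_ARTIFACTS if not bundle.get(a)]

bundle = {"opportunity_audit": "...", "agent_blueprint": "...", "prompt_set": "..."}
print(missing_artifacts(bundle))
```

Gating evaluation on a complete bundle keeps the LLM from scoring (and inventing evidence for) artifacts that were never submitted.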
Core LLM evaluation heuristics
The evaluator should inspect the following:
1. Problem clarity
- Is the problem concrete, real, and narrow enough to build in one week?
- Does the submission name a user, input, process, and output?
2. Agent suitability
- Is the workflow actually a candidate for delegation?
- Does the student distinguish agent work from human judgment?
3. Workflow coherence
- Do the steps logically connect?
- Are tools matched to tasks?
- Is there a clear start and stop condition?
4. Prompt quality
- Does the prompt specify role, goal, constraints, and output format?
- Does it say what to do when uncertain or when a tool fails?
5. Evidence of execution
- Is there proof of actual runs?
- Are outputs tied to inputs?
- Is at least one result demonstrably successful?
6. Reliability and testing
- Did the student test more than once?
- Did they include an edge case?
- Did they use validation, guardrails, or human review appropriately?
7. Debugging quality
- Did the student identify real failure modes?
- Did they respond with specific changes rather than vague complaints?
8. Reflection and transfer
- Can the student explain what they learned about agent design?
- Can they name a credible next iteration?
Concrete assessment heuristics for LLM scoring
The LLM should apply these rules:
- Score down if the "agent" is just a one-off chat answer with no multi-step workflow, no defined outcome, and no repeatable process.
- Score down if the project goal remains broad, such as "help me with school," without a bounded workflow.
- Score down if the artifact bundle lacks evidence of more than one test.
- Score down if the student cannot state what success looks like.
- Score down if tools are named but not actually used in the described workflow.
- Score down if the output is polished but the student provides no failure analysis.
- Score up if the student made smart scope cuts that increased reliability.
- Score up if guardrails, validation, or review checkpoints are intentionally placed.
- Score up if the student demonstrates awareness of where the agent should stop and hand back to a human.
- Score up if the workflow is simple, repeatable, and clearly useful.
Pass threshold guidance
Pass / proficient baseline
- The student built a repeatable workflow that solves one real task at least once with evidence
- The student can explain its logic and limitations
- The student has documented at least one iteration based on failure
Strong pass / distinction
- The workflow succeeds across multiple runs
- The project is well scoped and clearly useful
- The student shows thoughtful evaluation and reliability improvements
Needs revision
- The workflow is mostly conceptual or incomplete
- Evidence is missing or weak
- The student confuses an assistant output with an agent workflow
## 7. Rubrics, Scoring Criteria, and Evaluator Prompt Guidance
Rubric overview
Use a 4-point scale for each criterion:
- 4 = Exceeds
- 3 = Meets
- 2 = Approaching
- 1 = Not yet
Criterion A: Problem Selection and Scope
4
- Problem is real, specific, valuable, and appropriately scoped for one week
- Student clearly identified what the agent should and should not do
3
- Problem is relevant and mostly well scoped
- Minor ambiguity remains, but build target is clear
2
- Problem is somewhat vague or slightly too broad
- Scope required facilitator intervention to become buildable
1
- Problem remains abstract, unrealistic, or not suited to agent delegation
Criterion B: Workflow and Goal Decomposition
4
- Workflow is explicit, stepwise, and feasible
- Human vs. agent responsibilities are clearly separated
- Stop condition is defined
3
- Workflow is coherent with only minor gaps
- Most steps are feasible and connected
2
- Workflow has important missing transitions, unclear steps, or tool mismatches
1
- Workflow is fragmented, implausible, or cannot be followed
Criterion C: Prompt and Guardrail Design
4
- Prompts are structured, precise, and include role, goal, constraints, outputs, and error handling
- Guardrails meaningfully reduce failure risk
3
- Prompts are generally strong but miss one important element
- Guardrails exist but may be light
2
- Prompts are partially useful but vague, underspecified, or inconsistent
1
- Prompts are generic, incomplete, or unusable for reliable execution
Criterion D: Working Agent Execution
4
- Agent completes the intended task reliably across multiple runs
- Evidence clearly links inputs, process, and outputs
3
- Agent completes the intended task at least once and mostly works
- Some fragility remains
2
- Agent partially works or only works with heavy intervention
1
- Agent does not demonstrate successful task completion
Criterion E: Testing and Evaluation
4
- Student ran multiple tests, included an edge case, documented results, and added reliability controls
3
- Student completed at least two tests and documented basic outcomes
2
- Student tested minimally or incompletely
1
- Testing evidence is missing or superficial
Criterion F: Debugging and Iteration
4
- Student identified specific failure modes and made targeted, justified fixes
3
- Student documented at least one real issue and one sensible adjustment
2
- Student noticed problems but responses were vague or ineffective
1
- Student provides little evidence of iteration or learning from failure
Criterion G: Demo and Explanation
4
- Presentation is clear, concrete, honest about limits, and grounded in evidence
3
- Presentation explains the project competently with minor gaps
2
- Presentation is understandable but vague, overly polished, or missing key evidence
1
- Presentation does not clearly explain the project or its outcome
Criterion H: Peer Feedback
4
- Feedback is specific, actionable, and grounded in the peer's demonstrated workflow
3
- Feedback is constructive and relevant
2
- Feedback is generic or only partially useful
1
- Feedback is missing, superficial, or non-constructive
Suggested scoring model
Recommended weights by criterion:
- A: 10%
- B: 15%
- C: 10%
- D: 25%
- E: 15%
- F: 10%
- G: 10%
- H: 5%
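The weighted total works out as a simple dot product of weights and 4-point scores. A minimal sketch, assuming scores are reported as a percentage of the maximum possible (the function name and output scale are choices made here, not mandated by the spec):

```python
# Sketch of the weighted scoring model: 4-point rubric scores -> 0-100 total.
WEIGHTS = {"A": 0.10, "B": 0.15, "C": 0.10, "D": 0.25,
           "E": 0.15, "F": 0.10, "G": 0.10, "H": 0.05}

def weighted_total(scores: dict) -> float:
    """scores maps criterion -> 1..4. Returns percent of the max score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 100%
    raw = sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)  # between 1.0 and 4.0
    return round(raw / 4 * 100, 1)

# Example: strong execution (D=4) but weak testing evidence (E=2)
scores = {"A": 3, "B": 3, "C": 3, "D": 4, "E": 2, "F": 3, "G": 3, "H": 3}
print(weighted_total(scores))  # 77.5
```

Note how the D=25% weight keeps a working agent decisive while the E=15% weight still pulls the total down when testing evidence is thin.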
Evaluator prompt guidance for LLM use
Use the following operating rules when prompting an LLM evaluator:
- Require the evaluator to cite evidence from the submission bundle for every score
- Instruct it not to assume unprovided success evidence
- Tell it to reward appropriate scoping and reliability over sophistication
- Tell it to separate "good idea" from "working implementation"
- Require it to flag uncertainty when evidence is incomplete
### Recommended system prompt for an LLM evaluator
```
You are evaluating a student project for the course "AI Agent Build Lab."
Your job is to score the work against the provided rubric using only evidence present in the submission. Do not assume facts not in evidence. Do not reward polish, ambition, or advanced tooling unless the workflow actually works and the student shows proof.
Prioritize:
1. Whether the student defined a concrete problem.
2. Whether the workflow is a real agent-like delegation workflow rather than a one-off assistant answer.
3. Whether there is evidence of successful execution.
4. Whether the student tested, debugged, and improved the workflow.
5. Whether the student can explain limitations honestly.
For each criterion:
- Assign a score from 1 to 4.
- Quote or paraphrase the exact evidence that supports the score.
- State one reason the score is not higher.
Then provide:
- A weighted total score.
- A 3-5 sentence summary.
- 3 prioritized improvement actions.
If evidence is missing, say so explicitly and lower the score accordingly.
```
### Recommended user prompt template for an LLM evaluator
Evaluate the following student submission for the course "AI Agent Build Lab."
Course expectations:
- Students should build a working AI agent or agent workflow that solves a real problem.
- Execution matters more than polish.
- Boring and working beats ambitious and broken.
- Evidence of multiple test runs and at least one iteration is important.
Rubric: [paste rubric criteria and weights]
Student submission: [paste or attach artifact bundle]
Output format:
- Criterion-by-criterion scores with evidence
- Weighted total
- Strengths
- Risks or gaps
- Improvement actions
- Confidence level: High / Medium / Low
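When the evaluator is driven programmatically, the output format above can be checked structurally before a grade is accepted. A minimal sketch, assuming the evaluator echoes the section labels verbatim (that assumption, and the function name, are this sketch's, not the course's):

```python
# Structural check on an LLM evaluator's response: verify that the
# required sections appear and that the confidence level is one of the
# three allowed values from the user prompt template.

REQUIRED_SECTIONS = ["Weighted total", "Strengths", "Risks or gaps",
                     "Improvement actions", "Confidence level"]
CONFIDENCE_LEVELS = ("High", "Medium", "Low")

def check_evaluator_output(text: str) -> list[str]:
    """Return a list of problems; an empty list means the output is usable."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS
                if s not in text]
    if not any(f"Confidence level: {lvl}" in text for lvl in CONFIDENCE_LEVELS):
        problems.append("confidence level is not one of High / Medium / Low")
    return problems
```

A check like this catches the common failure where an evaluator returns a fluent summary but silently drops the weighted total or the confidence declaration.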
### Calibration guidance for evaluators
Before grading a cohort, evaluators should review three anchor examples:
- A clearly excellent simple project
- A competent but fragile project
- A polished but mostly nonfunctional project
This reduces the common LLM error of over-scoring polished language and undervaluing simple, reliable systems.
## 8. Feedback Strategy: What Strong, Average, and Weak Responses Look Like and How an LLM Should Respond
### Feedback principles
Feedback should be:
- Evidence-based
- Specific
- Actionable
- Honest about what works and what does not
- Focused on the next best improvement, not generic encouragement
LLM feedback should avoid:
- Overpraising vague work
- Inventing success that is not shown
- Giving ten suggestions at once
- Criticizing ambition without offering scope-control guidance
### Strong response profile
**What strong work looks like**
- The student chose a concrete problem with real utility
- The workflow is clear and repeatable
- The prompt design is structured and constrained
- There is evidence from multiple runs
- The student can explain one or more failure modes and what changed
**How an LLM should respond**
- Acknowledge the specific strengths with evidence
- Preserve what is already working
- Suggest one or two high-leverage improvements, such as stronger validation or broader test coverage
**Example feedback pattern**
- "Your project is strong because the workflow is narrow, repeatable, and backed by multiple test runs. The clearest evidence is your documented input-output sequence and the edge-case test. The next improvement is to add a validation check before final output so the agent can catch incomplete results."
### Average response profile
**What average work looks like**
- The idea is good and mostly scoped
- The workflow makes sense, but evidence is limited or reliability is shaky
- The prompt or guardrails are only partially specified
- The student shows some iteration but not enough testing
**How an LLM should respond**
- Confirm what is promising
- Identify the main missing element
- Recommend a concrete next step that is feasible within a short iteration
**Example feedback pattern**
- "The project is promising because the problem is real and the workflow is understandable. The main limitation is that the evidence shows only one successful run, so reliability is still unclear. Your next step should be to run two additional tests, including one edge case, and document what fails or changes."
### Weak response profile
**What weak work looks like**
- The problem is vague or too large
- The project is mostly conceptual
- The student presents an assistant-style answer as if it were an agent
- There is little or no test evidence
- Reflection is generic and not tied to the actual build
**How an LLM should respond**
- State plainly what is missing
- Avoid demoralizing language
- Recommend a scope cut and a minimum viable version
- Give a short path back to passing work
**Example feedback pattern**
- "This submission does not yet demonstrate a working agent workflow. The biggest gap is that the artifacts describe what the system should do, but do not show a repeatable multi-step run with evidence. To reach a passing standard, narrow the project to one task, define the exact input and output, run it twice, and document one change you made after a failure."
### LLM response structure for formative feedback
For each student, the LLM should respond in this order:
1. **Current status**
- One sentence: strong, developing, or not yet meeting expectations
2. **What is working**
- Two or three evidence-based observations
3. **What is limiting performance**
- One to three specific gaps
4. **Best next move**
- The single highest-leverage improvement
5. **If there is time for one more improvement**
- One optional secondary action
### Tone guidance for LLM feedback
The tone should be:
- Direct
- Specific
- Non-patronizing
- Grounded in evidence
The tone should not be:
- Hype-heavy
- Vague
- Overly harsh
- Generic praise followed by generic critique
### Instructor use of LLM feedback
Facilitators should use LLM feedback as:
- A first-pass evaluation aid
- A consistency tool across multiple student projects
- A way to generate draft written comments quickly
Facilitators should still review:
- Borderline grades
- Cases with unclear evidence
- Cases involving safety or privacy concerns
## 9. Recommended Implementation Notes
- Build templates should be prepared before the course starts: opportunity audit, blueprint, build log, test log, peer bug report, project sheet, and demo prompt.
- Facilitators should maintain a visible "common failure modes" board during Days 3-4.
- Every student should be pushed to save evidence as they go; otherwise demo day becomes storytelling rather than proof.
- If a student is far behind by Day 4, the required intervention is scope reduction, not motivational coaching.
- The instructor demo should use a real problem and include at least one visible failure plus iteration, so students see debugging as normal.