Day 11
Claude Certified Architect Study Guide
Claude Certified Architect (Foundations)
The complete study guide. Every domain. Every concept. Every trap.
60 questions -- 120 minutes -- 720/1000 pass score -- ~85 study hours
“You don't need the certificate to build production-grade applications. You just need the knowledge.”
-- The only thing that matters
5 domains -- weighted by exam percentage
The exam tests 5 domains. Domain 1 (Agentic Architecture) is the heaviest at 27%. Together, these cover everything you need to architect production Claude systems.
1. Agentic Architecture & Orchestration -- 27% of exam, ~23 hours study
2. Tool Design & MCP Integration -- 18% of exam, ~15 hours study
3. Claude Code Configuration & Workflows -- 20% of exam, ~17 hours study
4. Prompt Engineering & Structured Output -- 20% of exam, ~17 hours study
5. Context Management & Reliability -- 15% of exam, ~13 hours study
Exam logistics
Proctored exam. No Claude allowed during the test. You cannot use the tool you are being tested on.
Exclusive to Claude Partner Network members. $99 fee, but free for the first 5,000 partners.
But the knowledge is what matters. Whether you take the exam or not, the concepts in this guide are what separate someone who uses Claude from someone who architects production systems with it.
What you actually need to know
The exam tests 4 core technologies. If you understand these deeply, you pass.
Claude Code
CLAUDE.md hierarchy, slash commands, skills, plan mode, CI/CD with -p flag, built-in tools (Grep, Glob, Edit, Read)
Claude Agent SDK
Agentic loops, stop_reason handling, multi-agent orchestration, hooks (PostToolUse, tool call interception), session management
Claude API
Messages API, tool_use, JSON schemas, tool_choice, Batch API, structured output, validation-retry loops
Model Context Protocol (MCP)
Project-level vs user-level config, .mcp.json, environment variable expansion, community servers, custom server design
6 scenario types tested
The exam presents real-world scenarios. These are the 6 archetypes you will encounter:
1. Customer Support Resolution Agent
Agent SDK + MCP tools + escalation triggers + error propagation
2. Code Generation with Claude Code
CLAUDE.md config + plan mode + slash commands + built-in tools
3. Multi-Agent Research System
Coordinator-subagent orchestration + isolated context + explicit passing
4. Developer Productivity Tools
Built-in tools + MCP servers + tool descriptions + distribution
5. Claude Code for CI/CD
Non-interactive pipelines (-p flag) + structured output + independent review
6. Structured Data Extraction
JSON schemas + tool_use + validation-retry loops + batch processing
Domain 1 -- 27% of Exam
Agentic Architecture & Orchestration
The heaviest domain. Agentic loops, multi-agent coordination, workflow enforcement, task decomposition, and session management.
Core Concept: Agentic Loops
The agentic loop -- how agents actually work
stop_reason is the ONLY reliable termination signal
Every agent follows the same core loop. Understand this and you understand agents.
Step 1: Send a request to Claude via the Messages API.
Step 2: Inspect the stop_reason field in the response.
Step 3: If stop_reason === "tool_use": execute the tool, append the results to the conversation, and send it back.
Step 4: If stop_reason === "end_turn": the agent is finished. Exit the loop.
# The canonical agentic loop
while True:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        messages=conversation,
        tools=tools,
    )
    if response.stop_reason == "end_turn":
        break  # Agent finished
    if response.stop_reason == "tool_use":
        tool_results = execute_tools(response)  # returns tool_result content blocks
        conversation.append({"role": "assistant", "content": response.content})
        conversation.append({"role": "user", "content": tool_results})
3 anti-patterns the exam will test you on
The exam will present these as plausible options. Reject all three.
1. Parsing natural language to determine loop termination. Checking whether Claude says “I'm done” or “task complete.” This is unreliable.
2. Arbitrary iteration caps as the primary stopping mechanism. Setting max_iterations=10 and hoping the agent finishes. Caps are safety nets, not control flow.
3. Checking for assistant text as a completion indicator. Looking for the presence of text content as a signal. The only reliable signal is stop_reason.
Multi-Agent Orchestration
Hub-and-spoke architecture
ALL communication flows through the coordinator
Multi-agent systems use hub-and-spoke architecture. There is one coordinator and multiple subagents.
Rule 1: All communication flows through the coordinator. Subagents NEVER communicate directly with each other.
Rule 2: Subagents do NOT share memory with the coordinator. They operate with isolated context.
Rule 3: Every piece of information a subagent needs must be passed explicitly in its prompt.
Coordinator Agent -- owns the conversation, delegates tasks, aggregates results
Subagent A -- isolated context
Subagent B -- isolated context
Subagent C -- isolated context
“People assume subagents share memory with the coordinator. They do not. Subagents operate with isolated context. Every piece of information must be passed explicitly in the prompt.”
-- The biggest mistake on the exam
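The explicit passing the quote describes can be sketched in plain Python. Everything here (function names, fact keys) is illustrative rather than an SDK API; the point is that the subagent's prompt is its entire world.

```python
# Sketch: a coordinator serializes everything a subagent needs into its prompt.
# The subagent has no memory of the coordinator's conversation.

def build_subagent_prompt(subtopic: str, shared_facts: dict) -> str:
    facts = "\n".join(f"- {k}: {v}" for k, v in shared_facts.items())
    return (
        f"Research the subtopic: {subtopic}\n"
        f"Known facts from the coordinator (your only shared state):\n{facts}\n"
        "Return a structured summary. You cannot query other subagents."
    )

prompt = build_subagent_prompt(
    "geothermal energy",
    {"report_scope": "renewable energy", "deadline": "2025-06-01"},
)
```

If a fact is not in `shared_facts`, the subagent simply does not know it -- which is exactly the failure mode the exam probes.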
Workflow Enforcement
Prompt-based guidance
Probabilistic -- works most of the time
Tell the agent what to do in the system prompt. It will follow instructions ~95% of the time.
Good for: formatting, style, tone, general workflow suggestions, low-stakes decisions.
Bad for: anything where failure has real consequences. Financial transactions, security checks, compliance rules.
Programmatic enforcement (hooks)
Deterministic -- works every time
Use hooks to intercept and enforce. The agent cannot bypass programmatic checks.
Good for: financial thresholds, security policies, compliance rules, data validation, access control.
Rule: If the consequence of failure is financial, security, or compliance related, ALWAYS use programmatic enforcement.
Agent SDK Hooks
PostToolUse and tool call interception
PostToolUse hooks: Run after a tool executes. Use for normalizing data formats, validating outputs, enriching results.
Tool call interception: Run before a tool executes. Use for blocking actions that violate policy. Example: block any refund > $500 without manager approval.
Decision framework: Hooks = deterministic guarantees. Prompts = probabilistic guidance.
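A sketch of the refund interception described above. The (input_data, tool_use_id, context) signature mirrors the PostToolUse example later in this guide, but the "tool_name" field and the shape of the block-decision return value are assumptions -- check the Agent SDK hook documentation for the real contract.

```python
# Tool call interception sketch: deterministically block refunds over $500.
REFUND_LIMIT = 500.00

async def block_large_refunds(input_data, tool_use_id, context):
    if input_data.get("tool_name") != "process_refund":
        return {}  # not our tool; allow it through
    amount = float(input_data.get("tool_input", {}).get("amount", 0))
    if amount > REFUND_LIMIT:
        # Programmatic enforcement: the agent cannot talk its way past this.
        return {
            "decision": "block",
            "reason": f"Refund ${amount:.2f} exceeds ${REFUND_LIMIT:.2f}; manager approval required.",
        }
    return {}
```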
Task Decomposition
Fixed sequential pipelines
Best for: predictable, structured tasks where steps are known upfront.
Example: Extract data from PDF, validate against schema, write to database, generate report.
Dynamic adaptive decomposition
Best for: open-ended investigation where next steps depend on what was found.
Example: Research a topic, follow interesting threads, synthesize findings.
Attention dilution problem
When you pass too many files in a single analysis pass, Claude's attention is diluted. The result: inconsistent depth.
The fix: a per-file local analysis pass plus a separate cross-file integration pass. Two passes beat one.
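The two-pass pattern reduces to a simple shape. In this sketch, analyze_file and integrate_findings are placeholders standing in for real Claude calls; only the structure (one focused pass per file, then one pass over the summaries) is the point.

```python
# Two-pass analysis sketch: per-file summaries first, cross-file integration second.

def analyze_file(path: str, text: str) -> dict:
    # Placeholder for a focused, single-file Claude call.
    return {"file": path, "lines": len(text.splitlines())}

def integrate_findings(summaries: list[dict]) -> dict:
    # Placeholder for a cross-file Claude call that only sees the summaries.
    return {"files_reviewed": len(summaries),
            "total_lines": sum(s["lines"] for s in summaries)}

files = {"a.py": "x = 1\ny = 2", "b.py": "print('hi')"}
summaries = [analyze_file(p, t) for p, t in files.items()]  # pass 1: one file at a time
report = integrate_findings(summaries)                      # pass 2: integration only
```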
Session State Management
Session management options
--resume: Continue a named session. All previous context is preserved.
fork_session: Create an independent branch from a shared baseline.
Fresh start with summary injection: Start a new session but inject a summary of previous work.
Stale context problem: When resuming, files may have changed. You must inform the agent of SPECIFIC file changes.
What to build for this domain
A multi-tool agent with 3-4 MCP tools, proper stop_reason handling, a PostToolUse hook normalizing data formats, and a tool call interception hook blocking policy violations. If you can build this, you understand Domain 1.
Real Code: Agent SDK
# The canonical agentic loop (Agent SDK)
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    async for message in query(
        prompt="Find and fix the bug in auth.py",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Edit", "Bash"]
        ),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
# PostToolUse hook: log every file change
from datetime import datetime

async def log_file_change(input_data, tool_use_id, context):
    file_path = input_data.get("tool_input", {}).get("file_path", "unknown")
    with open("./audit.log", "a") as f:
        f.write(f"{datetime.now()}: modified {file_path}\n")
    return {}
Practice Scenarios -- Domain 1
PRACTICE SCENARIO
A developer's agent sometimes terminates prematurely because they check if response.content[0].type === 'text' to determine completion. What is the bug?
ANSWER
The model can return text alongside tool_use blocks. Checking for text content is not a reliable termination signal. Use stop_reason === 'end_turn' as the ONLY reliable termination signal.
PRACTICE SCENARIO
A multi-agent research system produces a report on 'renewable energy' that only covers solar and wind, missing geothermal, tidal, and biomass. Where is the root cause?
ANSWER
The coordinator's task decomposition is too narrow. The failure is in the coordinator, not the subagents. The coordinator must decompose research scope to cover all major subtopics before delegating.
PRACTICE SCENARIO
Production data shows that in 8% of cases, a customer support agent processes refunds without verifying account ownership. What is the correct fix?
ANSWER
Programmatic prerequisite gate (hook) that blocks the refund tool until account verification is complete. NOT enhanced prompts, NOT few-shot examples, NOT routing classifiers. Financial operations require deterministic enforcement.
Study Prompt -- Domain 1
Paste this into Claude to study Domain 1
I am studying for the Claude Certified Architect exam. Quiz me on Domain 1: Agentic Architecture and Orchestration (27% of exam).
Test me on these specific concepts:
1. The agentic loop: stop_reason as the ONLY reliable termination signal. Why checking for text content or parsing "I'm done" are anti-patterns.
2. Multi-agent orchestration: hub-and-spoke architecture, coordinator-subagent communication, isolated context (subagents do NOT share memory).
3. Workflow enforcement: prompt-based guidance (probabilistic) vs programmatic hooks (deterministic). When to use each. Financial/security = always hooks.
4. Agent SDK hooks: PostToolUse (data normalization, validation) vs tool call interception (policy enforcement, blocking).
5. Task decomposition: fixed sequential pipelines vs dynamic adaptive decomposition. Attention dilution and per-file analysis passes.
6. Session management: --resume, fork_session, fresh start with summary injection, stale context.
Present scenarios and ask me to identify the correct architectural decision. After I answer, tell me if I am right or wrong and explain why. Focus on the traps: things that sound plausible but are wrong.
Domain 2 -- 18% of Exam
Tool Design & MCP Integration
Tool descriptions, error responses, tool distribution, MCP server configuration, and built-in tools.
Tool descriptions -- the #1 selection mechanism
Vague descriptions = misrouting. Fix descriptions first.
Tool descriptions are THE mechanism Claude uses for tool selection. When Claude picks the wrong tool, the fix is almost always a better description.
Structured Error Responses
4 error categories -- know which is retryable
Transient -- timeouts, service unavailability. Strategy: retry with backoff.
Validation -- invalid input, wrong format, missing fields. Strategy: fix the input, retry.
Business -- policy violations, insufficient funds. Strategy: NOT retryable. Needs an alternative workflow.
Permission -- access denied, unauthorized. Strategy: escalation needed.
Critical distinction: access failure vs valid empty result
When a tool returns nothing, the agent must distinguish between “I couldn't access the data” (error) and “the data genuinely doesn't exist” (valid result). The exam tests this distinction explicitly.
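A minimal sketch of result envelopes that keep the two cases distinct. The field names (status, category) are illustrative, not a Claude API contract -- the point is that "empty" carries a success status while "unreachable" carries an error category that tells the agent its retry strategy.

```python
# Sketch: tool results that never confuse "nothing found" with "couldn't look".

def lookup_customer(customer_id: str, db: dict) -> dict:
    try:
        rows = db[customer_id]
    except KeyError:
        # Valid empty result: the source answered, and the answer is "nothing".
        return {"status": "ok", "data": [], "note": "no matching customer"}
    return {"status": "ok", "data": rows}

def lookup_with_outage() -> dict:
    # Access failure: we never reached the source. The category drives
    # the agent's strategy (transient -> retry with backoff).
    return {"status": "error", "category": "transient", "message": "CRM timeout after 5s"}
```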
Tool Distribution & MCP
Too many tools = degraded selection
18 tools on one agent is an anti-pattern
At ~18 tools, Claude's tool selection accuracy degrades. Optimal: 4-5 tools per agent.
Project-level: .mcp.json
Lives in the project root. Version controlled. Shared with the team. Use ${GITHUB_TOKEN} expansion syntax for secrets.
User-level: ~/.claude.json
Lives in your home directory. NOT version controlled. NOT shared. Exam trap: new team member missing tools? Check whether they're in user-level config.
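A minimal .mcp.json sketch. The server entry is illustrative (a hypothetical GitHub server launched via npx); only the ${GITHUB_TOKEN} expansion syntax comes from the text above, so verify key names against the MCP configuration docs.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
```

Because this file is committed, the token itself never appears in the repo -- only the expansion placeholder does.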
Built-in Tools
Know the difference -- the exam tests this
Grep: Searches file CONTENTS for patterns.
Glob: Matches file PATHS by naming patterns.
Edit: Targeted modifications to specific parts of a file.
Incremental exploration strategy: Grep entry points first, then Read to trace flows. NOT read all files upfront.
# Wrong: read all files upfront
Read src/api/*.ts # wastes context on irrelevant files
# Right: incremental exploration
Grep "processPayment" # find entry points
Read src/payments/handler.ts # trace the flow
Practice Scenarios -- Domain 2
PRACTICE SCENARIO
An agent routes 'check the status of order #12345' to get_customer instead of lookup_order. Both tools have minimal descriptions. What is the first fix?
ANSWER
Better tool descriptions. Not few-shot examples, not routing classifiers. Tool descriptions are the PRIMARY selection mechanism.
PRACTICE SCENARIO
A tool returns an empty array after a customer lookup. The agent retries 3 times then escalates. But the customer account simply does not exist. What went wrong?
ANSWER
The system confuses a valid empty result with an access failure. Valid empty = the tool found nothing (this IS the answer). Access failure = the tool could not reach the source. Different handling required.
Study Prompt -- Domain 2
Paste this into Claude to study Domain 2
I am studying for the Claude Certified Architect exam. Quiz me on Domain 2: Tool Design and MCP Integration (18% of exam).
Test me on these specific concepts:
1. Tool descriptions as the PRIMARY selection mechanism. Why better descriptions fix misrouting before anything else.
2. Structured error responses: 4 error categories (transient, validation, business, permission) and which are retryable.
3. Access failure vs valid empty result -- the critical distinction the exam tests explicitly.
4. Tool distribution: why 18 tools on one agent degrades selection. Optimal is 4-5 per agent.
5. tool_choice parameter: "auto" vs "any" vs forced specific tool.
6. MCP server configuration: project-level (.mcp.json) vs user-level (~/.claude.json). Version control implications.
7. Community vs custom MCP servers: use community first, build custom only when needed.
8. Built-in tools: Grep (content search) vs Glob (path matching) vs Edit vs Read. Incremental exploration strategy.
Present scenarios and ask me to identify the correct tool design decision. After I answer, tell me if I am right or wrong and explain why.
Domain 3 -- 20% of Exam
Claude Code Configuration & Workflows
CLAUDE.md hierarchy, custom commands, skills, path-specific rules, plan mode, and CI/CD integration.
CLAUDE.md Hierarchy
3 levels -- know where each lives and who sees it
The hierarchy determines which instructions apply to whom.
1. User-level: ~/.claude/CLAUDE.md -- only you. NOT shared. NOT version controlled.
2. Project-level: .claude/CLAUDE.md -- everyone on the team. Version controlled. Shared via git.
3. Directory-level: subdirectory/CLAUDE.md -- only when working in that directory. Additive to project-level.
Exam trap: the missing instructions problem
Scenario: A new team member joins and Claude is not following conventions. If instructions are in user-level config, the new member doesn't have them. Fix: move to project-level (.claude/CLAUDE.md).
Path-Specific Rules & Plan Mode
.claude/rules/ with YAML frontmatter
Token-efficient, pattern-matched instructions
Instead of putting everything in CLAUDE.md, use path-specific rules that only load when editing matching files. A rule with paths: ["**/*.test.tsx"] only loads when editing test files.
Plan mode
Think before acting
Use when: monolith restructuring, multi-file migration, architectural decisions, ambiguous requirements.
Direct execution
Just do it
Use when: single-file bug fix, clear scope, repetitive task, small well-defined changes.
CI/CD Integration
Non-interactive mode for pipelines
The -p flag is essential -- without it, CI hangs
-p flag: Runs Claude Code in non-interactive mode.
--output-format json: Produces structured findings for downstream pipeline steps.
Independent review instance: Use a separate Claude instance for code review -- not the same session that wrote the code.
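The flags combine into a single non-interactive pipeline step. A sketch of what that line might look like (the prompt and output filename are illustrative; only the -p and --output-format json flags come from the text above):

```
# -p prints the result and exits instead of waiting for terminal input
claude -p "Review this PR for security issues" --output-format json > findings.json
```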
Practice Scenarios -- Domain 3
PRACTICE SCENARIO
Developer A's Claude Code follows API naming conventions perfectly. Developer B (joined last week) gets inconsistent naming. Both work on the same repo. What is the root cause?
ANSWER
The naming conventions are in user-level config (~/.claude/CLAUDE.md) instead of project-level (.claude/CLAUDE.md). User-level is not version controlled and not shared.
PRACTICE SCENARIO
A CI pipeline script 'claude Analyze this PR' hangs indefinitely. Logs show Claude waiting for input. What is the fix?
ANSWER
Add the -p flag for non-interactive (print) mode. Without -p, Claude Code expects interactive terminal input.
Study Prompt -- Domain 3
Paste this into Claude to study Domain 3
I am studying for the Claude Certified Architect exam. Quiz me on Domain 3: Claude Code Configuration and Workflows (20% of exam).
Test me on these specific concepts:
1. CLAUDE.md hierarchy: user-level (~/.claude/CLAUDE.md) vs project-level (.claude/CLAUDE.md) vs directory-level (subdirectory/CLAUDE.md). Who sees what. Version control implications.
2. Path-specific rules: .claude/rules/ with YAML frontmatter and glob patterns. Token efficiency vs directory-level CLAUDE.md.
3. Custom slash commands: project-scoped (.claude/commands/) vs personal (~/.claude/commands/).
4. Skills: SKILL.md frontmatter (context: fork, allowed-tools, argument-hint).
5. Plan mode vs direct execution: when to use each. Monolith restructuring vs single-file bug fix.
6. CI/CD integration: -p flag for non-interactive mode, --output-format json, --json-schema, independent review instances.
7. The "missing instructions" problem: new team member joins, Claude ignores conventions. Root cause and fix.
Present scenarios and ask me to identify the correct configuration decision. After I answer, tell me if I am right or wrong and explain why.
Domain 4 -- 20% of Exam
Prompt Engineering & Structured Output
Explicit criteria, few-shot prompting, tool_use for structured output, validation-retry loops, batch processing, and multi-instance review.
Explicit criteria -- vague instructions fail
“Be conservative” does NOT work
Vague instructions produce inconsistent results. The fix: Define exactly which issues to report vs skip. Include concrete code examples for each severity level.
Few-shot prompting -- the most effective technique for consistency
2-4 targeted examples for ambiguous scenarios. Each example shows REASONING for why one action was chosen over another. Include boundary cases.
Structured Output & Batch API
tool_use with JSON schemas
Eliminates syntax errors. Does NOT prevent semantic errors.
Schema compliance does not prevent fabrication. Claude can return perfectly structured JSON with made-up data. Design schemas with nullable fields and “unclear” enum values.
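One way to design such a schema, sketched as a tool definition dict. The tool and field names are illustrative; the design choice is that nullable fields and an "unclear" enum value give Claude an honest way out instead of forcing fabrication.

```python
# Sketch: an extraction tool schema with an explicit "I don't know" path.
extract_invoice = {
    "name": "record_invoice",
    "description": "Record fields extracted from an invoice document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "total": {
                "type": ["number", "null"],  # null when the document doesn't state it
                "description": "Invoice total, or null if not stated.",
            },
            "currency": {
                "type": "string",
                "enum": ["USD", "EUR", "GBP", "unclear"],  # "unclear" beats a guess
            },
        },
        "required": ["total", "currency"],
    },
}
```

Requiring the fields while allowing null/"unclear" values means every response is complete AND honest.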
Batch API
50% cost savings vs synchronous. Up to 24 hours. No multi-turn tool calling. Good for bulk extraction where latency doesn't matter.
Synchronous API
Real-time results. Full multi-turn tool calling. Higher cost. Good for pre-merge checks and interactive agents.
“If developers wait for it, use synchronous. If it can run overnight, use batch.”
-- The decision rule
Practice Scenarios -- Domain 4
PRACTICE SCENARIO
A code review tool has a 97% overall accuracy rate but developers still don't trust it. Why?
ANSWER
97% overall can hide 40% error rates on specific document types. High false positive rates in one category destroy trust in ALL categories. Fix: validate accuracy by type and category.
PRACTICE SCENARIO
A manager proposes using the Batch API for all code review tasks to save costs. What is wrong with this?
ANSWER
Batch API has up to 24-hour processing and no latency SLA. Pre-merge code reviews are blocking workflows that developers wait for. Keep blocking workflows synchronous.
Study Prompt -- Domain 4
Paste this into Claude to study Domain 4
I am studying for the Claude Certified Architect exam. Quiz me on Domain 4: Prompt Engineering and Structured Output (20% of exam).
Test me on these specific concepts:
1. Explicit criteria: why vague instructions like "be conservative" fail. How to define concrete thresholds and examples.
2. Few-shot prompting: 2-4 targeted examples with reasoning. Boundary cases matter more than happy-path examples.
3. Structured output with tool_use: JSON schema guarantees structure but NOT semantic correctness. Schema design tips (nullable fields, "unclear" enums, "other" + detail_string).
4. tool_choice parameter: "auto" vs "any" vs forced specific tool. When to use each.
5. Validation-retry loops: send back original + error + failed attempt. Effective for format errors, ineffective for missing data.
6. Batch API vs synchronous: 50% savings but up to 24 hours, no multi-turn tool calling. Decision rule: blocking = sync, overnight = batch.
7. Multi-instance review: why self-review is less effective. Independent instance catches more. Per-file + cross-file passes.
8. False positive rates: 97% overall accuracy hiding 40% error in specific categories. Trust destruction.
Present scenarios and ask me to identify the correct prompt engineering decision. After I answer, tell me if I am right or wrong and explain why.
Domain 5 -- 15% of Exam
Context Management & Reliability
Context preservation, escalation triggers, error propagation, codebase exploration, and information provenance.
Context preservation -- what progressive summarization kills
Never summarize transactional data
Progressive summarization destroys critical data: dollar amounts, case IDs, timestamps, order numbers.
The fix: Maintain a persistent “case facts” block that is never summarized.
“Lost in the middle” effect: Place key summaries at the beginning, not buried in the middle.
Trim verbose tool results: Tool responses often include 90% irrelevant data.
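A sketch of the case-facts pattern: the block is rebuilt verbatim into every prompt, so summarization can never touch it, and it leads the prompt to dodge the "lost in the middle" effect. Field names and values are illustrative.

```python
# Sketch: transactional facts injected verbatim at the top of every prompt.
CASE_FACTS = {
    "order_id": "ORD-88412",
    "refund_amount": "$247.83",
    "opened_at": "2025-03-14T09:02:00Z",
}

def render_prompt(summary: str, facts: dict) -> str:
    facts_block = "\n".join(f"{k}: {v}" for k, v in facts.items())
    # Facts first (beginning of context), summarized history after.
    return (
        f"CASE FACTS (verbatim, never summarized):\n{facts_block}\n\n"
        f"Conversation so far (summarized):\n{summary}"
    )

prompt = render_prompt("Customer requested a refund on their last order.", CASE_FACTS)
```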
Escalation Triggers
Valid escalation triggers
Customer requests human: Honor immediately. No negotiation.
Policy gaps: Agent encounters a situation not covered by instructions.
Inability to progress: After reasonable attempts, cannot resolve the issue.
Unreliable escalation triggers
Sentiment analysis: Sarcasm, cultural differences, and tone misreads create false triggers.
Self-reported confidence scores: Claude's self-reported confidence does not correlate well with actual accuracy.
Error Propagation & Exploration
Structured error context
Propagate: what failed, what was attempted, and any partial results.
Anti-pattern 1: Silent suppression. Catching an error and continuing as if nothing happened.
Anti-pattern 2: Workflow termination on single failure. Partial results are often still valuable.
Information provenance
Every finding must include: claim, source URL, relevant excerpt, and date accessed. When sources conflict, annotate both.
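The four required fields map naturally onto a small record type. This dataclass shape (and the placeholder values) are illustrative, not a prescribed format:

```python
# Sketch: a provenance record carrying the four required fields.
from dataclasses import dataclass

@dataclass
class Finding:
    claim: str
    source_url: str
    excerpt: str     # the relevant quote backing the claim
    accessed: str    # ISO date the source was retrieved

finding = Finding(
    claim="Geothermal capacity grew in 2024.",          # placeholder claim
    source_url="https://example.com/report",            # placeholder source
    excerpt="Installed geothermal capacity rose year over year.",
    accessed="2025-03-01",
)
```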
Practice Scenarios -- Domain 5
PRACTICE SCENARIO
After several conversation turns, a support agent refunds $247.83 to the wrong order because it 'remembers' $247.83 but lost the order number. What architectural fix prevents this?
ANSWER
Persistent 'case facts' block with extracted transactional data (amounts, dates, order numbers) included in every prompt. Never rely on progressive summarization for transactional data.
PRACTICE SCENARIO
A customer says 'I want to speak to a human.' The agent offers to help first and tries to resolve the issue. Is this correct?
ANSWER
No. When a customer explicitly requests a human, honor it immediately. Do NOT attempt to resolve first. This is a hard rule.
Study Prompt -- Domain 5
Paste this into Claude to study Domain 5
I am studying for the Claude Certified Architect exam. Quiz me on Domain 5: Context Management and Reliability (15% of exam).
Test me on these specific concepts:
1. Context preservation: progressive summarization kills transactional data. Persistent "case facts" block that is never summarized.
2. "Lost in the middle" effect: Claude pays more attention to beginning and end of context. Place critical data at the beginning.
3. Escalation triggers: valid (customer request, policy gap, inability to progress) vs unreliable (sentiment analysis, self-reported confidence).
4. When customer requests human: honor immediately. No negotiation.
5. Error propagation: structured context (what failed, what was attempted, partial results). Anti-patterns: silent suppression and full workflow termination.
6. Access failure vs valid empty result: timeout = error, zero rows = valid result. Different handling required.
7. Codebase exploration: scratchpad files, subagent delegation, /compact for context management.
8. Information provenance: every finding needs source URL, relevant excerpt, date accessed. Conflicting sources: annotate both.
Present scenarios and ask me to identify the correct reliability decision. After I answer, tell me if I am right or wrong and explain why.
Preparation
Your Study Plan
Three phases: Foundation, Build, Practice. Each phase is one week.
Phase 1: Foundation (Week 1)
Read and understand
- Read Anthropic's official docs on Agent SDK, MCP, Claude Code
- Understand the agentic loop (stop_reason, tool_use cycle) completely
- Set up a CLAUDE.md hierarchy on a real project
- Build your first MCP server
- Understand Grep, Glob, Edit, Read differences
- Read tool_use documentation and JSON schema validation
Phase 2: Build (Week 2)
Hands-on construction
- Build a multi-agent system with coordinator + 2 subagents
- Implement PostToolUse and tool call interception hooks
- Create a tool_use extraction pipeline with validation-retry
- Set up CI/CD with -p flag and --output-format json
- Create .claude/rules/ with path-specific patterns
- Build a skill with SKILL.md frontmatter
Phase 3: Practice (Week 3)
Test yourself
- Run through all 5 domain concepts
- For each domain: “Can I explain every concept without looking?”
- Build the capstone: a full agent system with all 5 domains
- Review anti-patterns and exam traps
- Practice classifying error types
Anthropic Academy Courses (Free)
Claude 101
anthropic.skilljar.com/claude-101
Building with the Claude API
anthropic.skilljar.com/claude-with-the-anthropic-api
Claude Code in Action
anthropic.skilljar.com/claude-code-in-action
Introduction to Agent Skills
anthropic.skilljar.com/introduction-to-agent-skills
Introduction to MCP
anthropic.skilljar.com/introduction-to-model-context-protocol
MCP Advanced Topics
anthropic.skilljar.com/model-context-protocol-advanced-topics
Official Documentation
Agent SDK Overview
platform.claude.com/docs/en/agent-sdk/overview
Claude Code Skills
code.claude.com/docs/en/skills
MCP Specification
modelcontextprotocol.io
Anthropic Prompt Engineering Guide
docs.anthropic.com/en/docs/build-with-claude/prompt-engineering
Community Study Materials
110-Question Practice Set (SGridworks)
github.com/SGridworks/claude-certified-architect-training
5 things to build before the exam
If you can build all 5, you know enough to pass.
1
Multi-tool agent with stop_reason handling + hooks
3-4 MCP tools, proper agentic loop, PostToolUse hook for data normalization, tool call interception for policy enforcement. Covers Domain 1.
2
MCP tools with intentionally similar descriptions
Create 2-3 tools with overlapping functionality. Practice writing descriptions that eliminate misrouting. Covers Domain 2.
3
Project with full CLAUDE.md hierarchy + rules + skills
User-level, project-level, and directory-level CLAUDE.md files. Path-specific rules. At least one skill with SKILL.md frontmatter. Covers Domain 3.
4
Extraction pipeline with tool_use + validation-retry
JSON schema for structured output, forced tool_choice, validation step, retry loop. Covers Domain 4.
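The validation-retry loop in build #4 can be sketched with a stubbed extractor. The required fields, the stub responses, and the `extract_with_retry` name are invented for illustration; in a real pipeline `extract` would be a model call with forced tool_choice:

```python
import json

# Sketch of a validation-retry pipeline: parse the extraction, validate
# required fields, and retry with the validation error fed back to the model.
REQUIRED = {"name", "amount"}

def validate(payload):
    missing = REQUIRED - payload.keys()
    return f"missing fields: {sorted(missing)}" if missing else None

def extract_with_retry(extract, max_retries=3):
    feedback = None
    for _ in range(max_retries):
        payload = json.loads(extract(feedback))
        feedback = validate(payload)
        if feedback is None:
            return payload
    raise ValueError(f"extraction failed after retries: {feedback}")

# Stub model: the first attempt drops a field; the retry (with feedback) fixes it.
attempts = iter(['{"name": "ACME"}', '{"name": "ACME", "amount": 42}'])
result = extract_with_retry(lambda feedback: next(attempts))
assert result == {"name": "ACME", "amount": 42}
```

The key design point: the validation error is passed back into the next attempt rather than retrying blindly, so the model can correct the specific failure.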
5
Coordinator with subagents + error propagation + provenance
Coordinator delegates to 2+ subagents with isolated context. Error propagation with structured context. Research findings include source + excerpt + date. Covers Domain 5.
Practice Exam
60 multiple-choice questions. One correct answer. Three distractors. Weighted by domain. Just like the real thing.
60
Questions
~120
Minutes
720
/ 1000 to Pass
Questions are shuffled each attempt. You cannot change an answer after selecting it.
Exam Day
Exam Day Cheat Sheet
The 10 facts to remember and the 7 anti-patterns to reject. If you know these cold, you will not be tricked.
10 key facts -- always true
Subagents do NOT share memory. Every piece of information must be passed explicitly in the prompt.
stop_reason is the ONLY reliable termination signal. Not "I'm done." Not iteration caps. Not checking for text content. Only stop_reason === "end_turn".
Financial/security = programmatic enforcement. Never use prompt-based guidance for financial thresholds or security policies. Use hooks.
Tool descriptions are the PRIMARY selection mechanism. When Claude picks the wrong tool, fix the description first.
-p flag for CI/CD. Without the -p flag, Claude Code in CI will hang waiting for interactive input.
Independent instance for code review. A session reviewing its own output retains its reasoning context and pulls punches.
Batch API: no multi-turn tool calling. 50% cheaper, up to 24 hours, but single request/response only.
Progressive summarization kills transactional data. Maintain a persistent "case facts" block that is never summarized.
Access failure != valid empty result. A timeout is an error. Zero rows returned is a valid result.
Few-shot examples beat prose instructions. 2-4 targeted examples with reasoning produce more consistent output.
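The last fact looks like this in practice. The classification task and the three examples below are invented purely to illustrate the pattern of targeted examples with reasoning:

```python
# Invented illustration: a few targeted examples with reasoning, instead of a
# long prose instruction, tend to produce more consistent classifications.
FEW_SHOT = """Classify the ticket as BUG, FEATURE, or QUESTION.

Ticket: "App crashes when I tap Export."
Reasoning: describes broken existing behavior.
Label: BUG

Ticket: "Can you add dark mode?"
Reasoning: requests new behavior that does not exist yet.
Label: FEATURE

Ticket: "Where do I find my invoices?"
Reasoning: asks how to use existing functionality.
Label: QUESTION

Ticket: "{ticket}"
Reasoning:"""

prompt = FEW_SHOT.format(ticket="Search returns no results since the update.")
assert prompt.count("Label:") == 3  # three worked examples, one open slot
```

Each example pairs input, reasoning, and label, so the model sees not just the answer format but the decision procedure.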
7 anti-patterns -- always reject these
Parsing "I'm done" for loop termination. Natural language is unreliable for control flow. Use stop_reason.
Arbitrary iteration caps as primary stopping condition. Caps are safety nets, not control flow.
18 tools on one agent. Selection accuracy degrades. Split into 4-5 tools per agent.
Sentiment-based escalation. Sentiment analysis is unreliable. Use explicit triggers.
Self-reported confidence for escalation. Claude's self-reported confidence does not correlate with actual accuracy.
Silent error suppression. Empty catch blocks, swallowed errors. Always propagate with structured context.
Reading all files upfront. Wastes the context window. Use incremental exploration: Grep first, then Read.
Quick decision matrix
When the exam gives you a scenario, use this to pick the right answer:
Scenario → Correct approach
Financial/security enforcement → Programmatic hooks (NEVER prompts)
Tool misrouting → Better descriptions (FIRST step)
CI pipeline hangs → Add -p flag
Code self-review → Independent instance
Batch vs sync → Blocking = sync, overnight = batch
Subagent needs data → Pass explicitly in prompt
Customer wants human → Honor immediately
Loop termination → stop_reason only
18 tools on one agent → Scope to 4-5 per agent
Progressive summarization → Persistent case facts block
New team member missing config → Move to project-level CLAUDE.md
Ambiguous architecture decision → Plan mode, not direct execution