Claude Fable 5 API Tutorial: Python, TypeScript, and Streaming Examples (2026)
This is the practical tutorial for calling Claude Fable 5 from your own code. We cover the minimal "hello world" in Python and TypeScript, streaming for long generations, prompt caching for cost control, tool use, and the three gotchas that catch most first-time users.
If you're just getting set up, see our day-zero access guide for installation across Claude.ai, Claude Code, and the desktop app.
Prerequisites
- An Anthropic API key — generate one at console.anthropic.com
- Python 3.10+ or Node.js 18+
- The official SDK:
  - Python: pip install anthropic
  - Node: npm install @anthropic-ai/sdk
Set your API key in your shell:
export ANTHROPIC_API_KEY="sk-ant-..."
Minimal Python Example
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-fable-5",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Refactor this 200-line module into 4 testable units. Return a single diff."
        }
    ]
)

print(message.content[0].text)
print(f"\nUsage: {message.usage.input_tokens} in / {message.usage.output_tokens} out")
That's a complete working integration. Three things worth noting:
- model="claude-fable-5": the exact model ID. If you get a 404, your API tier may not have access yet; check the models list endpoint to confirm what your org sees.
- max_tokens=4096: a hard cap on the length of the response. The Messages API requires it, so size it deliberately; with a generous cap, Fable 5 will gladly write 30K tokens.
- message.usage: log this. Tracking input and output tokens per call is the simplest way to catch cost regressions before your bill spikes.
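One way to act on the message.usage advice is a small helper that turns token counts into a dollar estimate. This is only a sketch: the function name is made up for illustration, and the per-million-token prices are placeholders, not Fable 5's actual rates. Substitute the figures that apply to your account.

```python
from types import SimpleNamespace


def estimate_cost(usage, input_price_per_mtok, output_price_per_mtok):
    """Return the estimated dollar cost of one call.

    `usage` is any object with `input_tokens` and `output_tokens`
    attributes, such as `message.usage` from the SDK. Prices are in
    dollars per million tokens and must be supplied by the caller.
    """
    input_cost = usage.input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = usage.output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Stand-in for message.usage so the sketch runs without an API call.
fake_usage = SimpleNamespace(input_tokens=2_000, output_tokens=4_000)

# Placeholder prices (NOT real Fable 5 rates): $10 in / $50 out per million tokens.
cost = estimate_cost(fake_usage, input_price_per_mtok=10.0, output_price_per_mtok=50.0)
print(f"{fake_usage.input_tokens} in / {fake_usage.output_tokens} out -> ${cost:.4f}")
```

In a real integration you would pass message.usage straight in and write the result to whatever logging or metrics system you already use.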
Minimal TypeScript Example
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-fable-5",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: "Build a complete /pricing page in Next.js App Router that loads from a JSON config and matches our existing Tailwind theme. Return all files in a single response."
    }
  ]
});

if (message.content[0].type === "text") {
  console.log(message.content[0].text);
}
console.log(`Usage: ${message.usage.input_tokens} in / ${message.usage.output_tokens} out`);
Same shape as Python. The TypeScript SDK uses a typed content discriminated union — guard with .type === "text" before accessing .text.
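The same guard is worth having in Python, where message.content[0].text raises an error if the first block is not a text block. A minimal sketch, assuming only that each content block exposes a type attribute and that text blocks expose text (the helper name is ours, not part of the SDK):

```python
from types import SimpleNamespace


def extract_text(message):
    """Join the text of every text block in a response, skipping other block types."""
    return "".join(
        block.text
        for block in message.content
        if getattr(block, "type", None) == "text"
    )


# Stand-in for an SDK response: one non-text block followed by two text blocks.
fake_message = SimpleNamespace(
    content=[
        SimpleNamespace(type="tool_use", id="toolu_demo", name="demo", input={}),
        SimpleNamespace(type="text", text="Hello"),
        SimpleNamespace(type="text", text=" world"),
    ]
)

print(extract_text(fake_message))
```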
Streaming (You Should Always Use This for Fable 5)
Fable 5 runs are slow. A non-streaming call to Fable 5 for a complex task means staring at a blank terminal for 60+ seconds. Use streaming. It gives you progress visibility and lets you start processing the output as it arrives.
Python streaming
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-fable-5",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Build a complete user-authentication system with tests."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # Still inside the with block: fetch the assembled message for usage stats
    final = stream.get_final_message()

print(f"\n\nUsage: {final.usage.input_tokens} in / {final.usage.output_tokens} out")
TypeScript streaming
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = client.messages.stream({
  model: "claude-fable-5",
  max_tokens: 8192,
  messages: [{ role: "user", content: "Build a complete user-authentication system with tests." }],
});

for await (const chunk of stream) {
  if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
    process.stdout.write(chunk.delta.text);
  }
}

const final = await stream.finalMessage();
console.log(`\n\nUsage: ${final.usage.input_tokens} in / ${final.usage.output_tokens} out`);
Streaming changes the user experience entirely for long-running Fable 5 calls. Make it the default in your integrations.
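"Start processing the output as it arrives" usually means working on something larger than a raw text fragment. The sketch below, with a helper name of our own choosing, regroups any iterable of text fragments into complete lines, so you could feed each finished line to a linter, a diff parser, or a UI update.

```python
def iter_lines(text_chunks):
    """Yield complete lines from an iterable of text fragments as they arrive.

    Fragments may split a line anywhere; text is buffered until a newline
    shows up, and any trailing partial line is yielded at the end.
    """
    buffer = ""
    for chunk in text_chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            yield line
    if buffer:
        yield buffer


# Simulated stream fragments, standing in for stream.text_stream.
fragments = ["def f", "():\n    ret", "urn 1\n", "# done"]
for line in iter_lines(fragments):
    print(repr(line))
```

Inside the with block of the Python streaming example, iter_lines(stream.text_stream) could replace the plain loop over stream.text_stream.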
Prompt Caching (Critical for Cost)
If you're calling Fable 5 with a large system prompt or tool definitions repeatedly (e.g. an agent that runs over many turns, or a batch pipeline that processes 1,000 records), enable prompt caching. It cuts the input cost of cached tokens by ~90%.
import anthropic
client = anthropic.Anthropic()
SYSTEM_PROMPT = """You are a senior software engineer. When given a refactoring task:
1. Read the entire input carefully
2. Identify the natural seams in the code
3. Propose a clean, testable decomposition
4. Return a unified diff
... (long system prompt here) ..."""
message = client.messages.create(
    model="claude-fable-5",
    max_tokens=4096,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}  # cache this block
        }
    ],
    messages=[{"role": "user", "content": "Refactor this module: ..."}]
)
The cache_control marker tells Anthropic to cache the prompt prefix up to and including the marked block. Subsequent requests that repeat that prefix exactly (within a 5-minute window by default) hit the cache and pay ~10% of the normal input cost for those tokens.
You can set up to 4 cache breakpoints per request, typically on the system prompt, tool definitions, and large reference documents.
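To see what the ~10% figure means for a batch job, here is a rough back-of-the-envelope sketch. It measures cost in "full-price input token" units rather than dollars. The 0.10 read multiplier comes from the figure above; the 1.25 write multiplier for the first call is an assumption you should check against current pricing, and the model assumes every call lands inside the cache window.

```python
def relative_input_cost(prefix_tokens, calls, read_multiplier=0.10, write_multiplier=1.25):
    """Compare input cost for a repeated prompt prefix with and without caching.

    Returns (uncached, cached) in full-price-token units. The first call
    writes the cache; the remaining calls read it. Both multipliers are
    assumptions supplied by the caller.
    """
    uncached = prefix_tokens * calls
    cached = (
        prefix_tokens * write_multiplier
        + prefix_tokens * read_multiplier * (calls - 1)
    )
    return uncached, cached


# Example: a 5,000-token system prompt reused across 1,000 records.
uncached, cached = relative_input_cost(prefix_tokens=5_000, calls=1_000)
savings = 1 - cached / uncached
print(f"uncached: {uncached:,.0f}  cached: {cached:,.0f}  savings: {savings:.1%}")
```

The per-record user message is not part of the cached prefix, so it is billed at the normal rate either way and is left out of this comparison.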
Tool Use (Function Calling)
Fable 5 supports tool use with the same schema as Opus 4.8 and Sonnet 4.6 — no migration needed.
import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]

def get_weather(location, unit="celsius"):
    # your real implementation here
    return {"location": location, "temp": 22, "unit": unit, "conditions": "clear"}

# Conversation history; the loop below appends to it
messages = [{"role": "user", "content": "What's the weather in San Francisco?"}]

# First call: the model decides whether to use the tool
response = client.messages.create(
    model="claude-fable-5",
    max_tokens=1024,
    tools=tools,
    messages=messages
)

# Loop: handle tool_use → tool_result until the model is done
while response.stop_reason == "tool_use":
    # A single response can contain several tool_use blocks; answer each one
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = get_weather(**block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result)
            })
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    response = client.messages.create(
        model="claude-fable-5",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )

# Final text response
for block in response.content:
    if block.type == "text":
        print(block.text)
The pattern: model returns stop_reason: "tool_use" → you run the tool → append the result → call again. Repeat until stop_reason: "end_turn".
For multi-tool agents, the loop runs many times. Fable 5 is particularly good at long tool-use chains — that's part of what makes it a "whole-job delegation" model.
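For agents with several tools, the same loop can be generalized with a name-to-function registry and a cap on the number of turns. The following is a sketch under stated assumptions, not a reference implementation: run_tool_loop, handlers, and max_turns are names introduced here, and client is expected to be an Anthropic client (or any object with a compatible messages.create method). Failures are reported back to the model through the tool_result block's is_error field rather than raised.

```python
import json


def run_tool_loop(client, model, tools, handlers, messages, max_turns=10, max_tokens=1024):
    """Drive the tool_use -> tool_result cycle until the model stops asking for tools.

    `handlers` maps tool names to Python callables. `messages` is mutated
    in place so the caller keeps the full conversation history.
    """
    for _ in range(max_turns):
        response = client.messages.create(
            model=model, max_tokens=max_tokens, tools=tools, messages=messages
        )
        if response.stop_reason != "tool_use":
            return response

        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            try:
                handler = handlers.get(block.name)
                if handler is None:
                    raise ValueError(f"unknown tool: {block.name}")
                output, is_error = json.dumps(handler(**block.input)), False
            except Exception as exc:
                # Tell the model what went wrong instead of crashing the loop.
                output, is_error = str(exc), True
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": output,
                "is_error": is_error,
            })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})

    raise RuntimeError(f"tool loop did not finish within {max_turns} turns")
```

With the weather example above, a call might look like run_tool_loop(client, "claude-fable-5", tools, {"get_weather": get_weather}, messages). The turn cap is a blunt safeguard; a sensible value depends on how long your tool chains are expected to run.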
Three Gotchas That Catch First-Time Users
Gotcha 1: 404 on model: "claude-fable-5"
Two likely causes:
- Your API tier doesn't have access yet. Anthropic rolls out new models region-by-region; some orgs get access on day 0, others over the following 24–48 hours. Verify by calling the models list endpoint.
- Your SDK is out of date. Update with pip install -U anthropic or npm install @anthropic-ai/sdk@latest. Older SDKs may have stale model registries.
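Checking the models list can be scripted. The sketch below assumes a recent SDK in which client.models.list() returns an iterable of model objects with an id attribute; the helper name is ours, and on an older SDK that method may not exist at all.

```python
def model_is_visible(client, model_id):
    """Return True if `model_id` appears among the models this API key can list.

    `client` is expected to be an Anthropic client; the only thing used
    here is `client.models.list()` and each entry's `id` attribute.
    """
    return any(model.id == model_id for model in client.models.list())
```

For example, model_is_visible(anthropic.Anthropic(), "claude-fable-5") returning False would point to the access cause rather than a typo in your request.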
Gotcha 2: Unexpected routing to Opus 4.8
Fable 5 includes safeguards that route cybersecurity and biology queries to Opus 4.8 automatically. If your prompt touches offensive security, malware, exploit development, gain-of-function research, or synthesis routes, you'll get an Opus 4.8 response back — and the response itself will note the routing.
If you're doing legitimate research in these areas, you'd need Mythos 5 access (the unrestricted variant), which is limited to vetted partners. Apply through Anthropic's enterprise contact.
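How the routing note is surfaced is not specified here, so treat the following as a guess rather than documented behavior. It assumes the response's model field reports the model that actually served the request, which is worth verifying against a real routed response before relying on it. The helper name is hypothetical.

```python
def served_by_requested_model(message, requested_model):
    """Return True when the response's `model` field matches the requested ID.

    A prefix match is used because the API may report a dated variant of
    the ID you asked for. ASSUMPTION: rerouted responses carry a different
    `model` value; this tutorial does not confirm that.
    """
    return message.model.startswith(requested_model)
```

Logging a warning whenever this returns False would at least make unexpected routing visible in your own telemetry.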
Gotcha 3: Hitting max_tokens mid-response
If Fable 5's response is cut off, you'll see stop_reason: "max_tokens" in the response. The output is truncated; you didn't get the full answer.
Two fixes:
- Increase max_tokens for the call. Fable 5 can produce 8K–32K token responses on complex tasks.
- Continue the response by sending the truncated output back as the assistant's message and asking the model to continue. This is the right approach for long-form generations where you want to chunk output (e.g. for streaming UX).
# Continue a truncated response.
# original_user_message and first_response come from the earlier, truncated call.
continued = client.messages.create(
    model="claude-fable-5",
    max_tokens=4096,
    messages=[
        original_user_message,
        {"role": "assistant", "content": first_response.content},  # the truncated output
        {"role": "user", "content": "continue"}
    ]
)
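If a single continuation might not be enough, the same idea can be wrapped in a loop that keeps going while stop_reason is "max_tokens". This is a sketch with names of our own (generate_with_continuation, max_rounds); it assumes client behaves like an Anthropic client and that plain concatenation of the pieces is acceptable for your use case.

```python
def generate_with_continuation(client, model, user_message, max_tokens=4096, max_rounds=5):
    """Call the model repeatedly until the response is no longer truncated.

    Returns the concatenated text of all rounds. `max_rounds` bounds the
    number of API calls so a runaway generation cannot loop forever.
    """
    messages = [user_message]
    parts = []
    for _ in range(max_rounds):
        response = client.messages.create(
            model=model, max_tokens=max_tokens, messages=messages
        )
        parts.append("".join(b.text for b in response.content if b.type == "text"))
        if response.stop_reason != "max_tokens":
            break
        # Feed the truncated output back and ask for the rest.
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": "continue"})
    return "".join(parts)
```

The model may restate a few words or add a short preamble when asked to continue, so inspect the seams before treating the stitched text as a single clean artifact such as a diff.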
Where to Go From Here
- Anthropic API docs — full reference for messages, streaming, tool use, vision, prompt caching, and the Batch API
- Anthropic Python SDK on GitHub
- Anthropic TypeScript SDK on GitHub
- Our Fable 5 pricing breakdown — real cost per workload
- Our Fable 5 vs Opus 4.8 comparison — when to pick each
Shipping Fable 5 in Production?
Building production agents on Fable 5 means making decisions about model fallbacks (Fable → Opus when rate-limited), prompt caching strategy (which blocks to cache, when to invalidate), tool-use loop design (how many turns is too many), and observability (cost per request, p95 latency, tool-call success rate).
Getting these right is the difference between an agent that ships and an agent that burns $5K/month and silently fails on 8% of requests.
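Of the decisions listed above, the model fallback is the easiest to sketch. The version below retries once on a fallback model when the primary call fails with HTTP 429. It detects rate limiting through the exception's status_code attribute, which the SDK's status errors carry; the function name is ours, and the fallback model ID is left to the caller because this tutorial does not give the exact Opus 4.8 identifier.

```python
def create_with_fallback(client, primary_model, fallback_model, **request):
    """Try the primary model, then the fallback if the primary is rate-limited.

    `request` holds the remaining messages.create arguments (max_tokens,
    messages, tools, ...). Errors other than HTTP 429 are re-raised.
    """
    try:
        return client.messages.create(model=primary_model, **request)
    except Exception as exc:
        if getattr(exc, "status_code", None) != 429:
            raise
        return client.messages.create(model=fallback_model, **request)
```

The SDK already retries some failures on its own before raising, so this wrapper only kicks in once those built-in retries are exhausted. Logging which model ultimately served each request keeps the fallback from hiding a persistent rate-limit problem.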
AY Automate places senior AI engineers into your team to design, build, and ship production agents on Claude Fable 5 and the broader Anthropic stack. Book a free 30-min strategy call — we'll look at your architecture and tell you where the risks are.

Adel keeps the engine running at AY Automate. He owns internal processes, team coordination, and the operational excellence that lets us ship fast for clients.
