DSPy
Declarative framework for programming foundation models
- Best for: Teams with eval sets who want to stop hand-tuning prompts
- Pricing: Free and open-source (MIT)
- Stack layer: AI Stack

Overview
DSPy treats prompts the way ML treats model weights: as parameters to optimize automatically against a metric, not to tune by hand. You declare modules (Predict, ChainOfThought, ReAct) with typed signatures, write a metric, and a DSPy optimizer compiles the program, searching for the instructions and few-shot examples that score best on your data.
The payoff is largest on hard tasks, where prompt quality can swing accuracy by more than 10 points. We reach for DSPy when the use case has a clear eval set (classification, extraction, QA) and brittle prompts have been a recurring pain.
It's not for every project. If you don't have an eval set, DSPy has nothing to optimize against.
Key Features
Declarative Modules
Predict, ChainOfThought, ReAct, ProgramOfThought
Typed Signatures
Declare input and output fields with Python type annotations
Prompt Optimizers
BootstrapFewShot, MIPRO, COPRO, and more
Metric-Driven
Compile against your eval metric, not hunches
Model Agnostic
Works with any LLM provider or local model
Stanford-Backed
Active research from the Stanford NLP group
Why We Recommend DSPy
Once you have evals, DSPy turns prompt engineering into an actual optimization problem. In our experience, the compiled programs consistently outperform our hand-tuned prompts.
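To make "prompt engineering as an optimization problem" concrete without an LLM or an API key, here is a deliberately tiny, library-free toy. It is not DSPy's API or any of its optimizers. The "model" is a hypothetical nearest-neighbour stand-in that labels an input by copying the label of the few-shot demo it shares the most words with, and the "optimizer" exhaustively tries every pair of demos and keeps the pair that scores best on a dev set. The point is only the shape of the loop: candidate prompt in, metric score out, keep the best.

```python
from itertools import combinations

# Hypothetical pool of candidate few-shot demos: (text, label).
POOL = [
    ("refund never arrived", "negative"),
    ("love it works great", "positive"),
    ("package was late", "negative"),
    ("great value love the color", "positive"),
]

# Hypothetical held-out dev set, used only for scoring.
DEV = [
    ("the refund was never processed", "negative"),
    ("great phone, love it", "positive"),
    ("arrived late and dented", "negative"),
]


def toy_model(demos, text):
    """Stand-in 'LM': copy the label of the demo with the most word overlap."""
    words = set(text.lower().replace(",", "").split())
    best = max(demos, key=lambda demo: len(words & set(demo[0].split())))
    return best[1]


def metric(gold, predicted):
    return gold == predicted


def score(demos):
    """Dev-set accuracy of the toy model when 'prompted' with `demos`."""
    return sum(metric(label, toy_model(demos, text)) for text, label in DEV) / len(DEV)


def compile_demos(k=2):
    """Exhaustive search: try every k-demo subset, keep the highest-scoring one
    (the first one found wins ties)."""
    return max(combinations(POOL, k), key=score)
```

A plausible hand-picked prompt, the two positive demos `(POOL[1], POOL[3])`, scores 1/3 on this dev set, while `compile_demos()` returns a pair that scores 1.0. Real optimizers search far larger spaces (instructions as well as demos) with smarter strategies than brute force, but they are judged by the same metric-on-held-out-data loop.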