DSPy

Declarative framework for programming foundation models


Best for: Teams with eval sets who want to stop hand-tuning prompts
Pricing: Free and open-source (MIT)
Stack layer: AI Stack

dspy.ai


Overview

DSPy treats prompts the way ML treats model weights: as something you should optimize automatically against a metric, not hand-tune. You declare modules (Predict, ChainOfThought, ReAct) with typed signatures, write a metric, and DSPy compiles the best prompts and few-shot examples for your data.
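The declare, measure, compile loop described above can be sketched in a few lines. Treat this as a minimal sketch rather than a definitive recipe: the ticket data, the exact_match metric, the compile_classifier helper, and the openai/gpt-4o-mini model name are illustrative assumptions, and calling compile_classifier requires DSPy to be installed along with credentials for whichever provider you choose.

```python
from types import SimpleNamespace

# Hypothetical labeled examples; in practice these come from your own eval set.
TRAIN = [
    {"ticket": "I was charged twice this month.", "label": "billing"},
    {"ticket": "My parcel never arrived.", "label": "shipping"},
    {"ticket": "Please refund my last invoice.", "label": "billing"},
    {"ticket": "Where is my package?", "label": "shipping"},
]


def exact_match(example, pred, trace=None):
    """Metric: True when the predicted label equals the gold label.

    Case and surrounding whitespace are ignored. The (example, pred, trace)
    argument order is the one DSPy passes to metrics.
    """
    return example.label.strip().lower() == pred.label.strip().lower()


def compile_classifier(model_name="openai/gpt-4o-mini"):
    """Declare a module, then let an optimizer pick few-shot demos for it.

    model_name is an assumed placeholder; swap in any model you have access to.
    """
    # Imported here so the metric above can be unit-tested without DSPy installed.
    import dspy
    from dspy.teleprompt import BootstrapFewShot

    dspy.configure(lm=dspy.LM(model_name))

    # Mark which field is the input; the remaining field is treated as the label.
    trainset = [dspy.Example(**row).with_inputs("ticket") for row in TRAIN]

    # Typed signature written inline: one string input, one string output.
    program = dspy.ChainOfThought("ticket: str -> label: str")

    optimizer = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4)
    return optimizer.compile(program, trainset=trainset)


# The metric is plain Python, so it can be checked without calling a model.
gold = SimpleNamespace(label="billing")
print(exact_match(gold, SimpleNamespace(label=" Billing ")))  # True
```

Keeping the metric as an ordinary function is a deliberate choice here: it can be tested on its own before any model calls are spent on compilation.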

The win shows up on hard tasks where prompt quality can move accuracy by more than 10 percentage points. We use DSPy when the use case has a clear eval set (classification, extraction, QA) and brittle prompts have been a recurring pain.

It's not for every project. If you don't have an eval set, DSPy has nothing to optimize against.
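To make "something to optimize against" concrete, here is a rough sketch of scoring two candidate programs on a small dev set and reporting the difference in accuracy points. It uses plain Python stand-ins rather than DSPy modules: the DEVSET rows and the baseline and candidate functions are hypothetical, and in a real project each would be replaced by your own data and by calls into a model.

```python
from typing import Callable, Sequence, Tuple

LabeledExample = Tuple[str, str]  # (input text, gold label)


def accuracy(program: Callable[[str], str], devset: Sequence[LabeledExample]) -> float:
    """Percentage of dev examples whose predicted label equals the gold label."""
    if not devset:
        raise ValueError("an empty eval set gives nothing to score against")
    hits = sum(program(text) == gold for text, gold in devset)
    return 100.0 * hits / len(devset)


# Hypothetical dev set; real ones are usually far larger.
DEVSET = [
    ("I was charged twice this month.", "billing"),
    ("My parcel never arrived.", "shipping"),
    ("Please refund my last invoice.", "billing"),
    ("Where is my package?", "shipping"),
]


def baseline(text: str) -> str:
    """Stand-in for a weak prompt: always predicts the same class."""
    return "billing"


def candidate(text: str) -> str:
    """Stand-in for an improved prompt: keys on shipping vocabulary."""
    lowered = text.lower()
    return "shipping" if ("parcel" in lowered or "package" in lowered) else "billing"


gain = accuracy(candidate, DEVSET) - accuracy(baseline, DEVSET)
print(f"gain: {gain:.1f} points")  # gain: 50.0 points
```

Without the labeled rows in DEVSET there is no way to compute either score, which is the situation the paragraph above warns about.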

Key Features

Declarative Modules

Predict, ChainOfThought, ReAct, ProgramOfThought

Typed Signatures

Define inputs and outputs as Python types

Prompt Optimizers

BootstrapFewShot, MIPRO, COPRO, and more

Metric-Driven

Compile against your eval metric, not hunches

Model-Agnostic

Works with any LLM provider or local model

Stanford-Backed

Active research from the Stanford NLP group

Why We Recommend DSPy

Once you have evals, DSPy turns prompt engineering into an actual optimization problem. In our experience, the optimized prompts consistently outperform manual tuning.