Llama 3.3 70B by Cerebras
Billed as the fastest free tokens/sec available, served by Cerebras wafer-scale inference. The API is OpenAI-compatible.
Access: Free API tier
Free limits: ~14k tok/min
Modality: text
Credit card: Not required
Commercial use: Allowed
Model ID: llama-3.3-70b
Base URL: https://api.cerebras.ai/v1
Last verified: June 2026
How to use Llama 3.3 70B
Quickstart
curl https://api.cerebras.ai/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.3-70b","messages":[{"role":"user","content":"hi"}]}'
Frequently asked
Is Llama 3.3 70B free?
Yes. Cerebras offers it on a free API tier, limited to roughly 14k tokens per minute. No credit card is required.
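To stay under a per-minute token limit like this one, a client can track its own usage. The sketch below is one possible approach, not something Cerebras prescribes. It assumes the limit is a rolling 60-second window over total tokens, which the listing does not confirm, and the `TokenBudget` class and its default of 14,000 are hypothetical names and values taken from the approximate figure above.

```python
import time
from collections import deque


class TokenBudget:
    """Client-side sliding window: at most `limit` tokens per `window` seconds.

    Assumption: the provider counts tokens over a rolling minute. The real
    accounting may differ, so treat this as a conservative local guard.
    """

    def __init__(self, limit=14_000, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the class can be tested
        self.events = deque()   # (timestamp, tokens) pairs still in the window
        self.used = 0           # sum of tokens in self.events

    def _expire(self, now):
        # Drop spends that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def wait_time(self, tokens):
        """Return seconds to wait until `tokens` more fit (0.0 if they fit now)."""
        if tokens > self.limit:
            raise ValueError("request is larger than the whole window budget")
        now = self.clock()
        self._expire(now)
        if self.used + tokens <= self.limit:
            return 0.0
        freed = 0
        # Walk the oldest spends until enough of them would have expired.
        for stamp, spent in self.events:
            freed += spent
            if self.used - freed + tokens <= self.limit:
                return max(0.0, stamp + self.window - now)
        return 0.0

    def record(self, tokens):
        """Register tokens that were just spent."""
        now = self.clock()
        self._expire(now)
        self.events.append((now, tokens))
        self.used += tokens
```

Before each request, call `wait_time(estimated_tokens)`, sleep for the returned number of seconds, then call `record(...)` with the tokens actually consumed.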
Can I use Llama 3.3 70B commercially?
Yes, commercial use is allowed. Verify the current license or terms before shipping.
How do I start using Llama 3.3 70B?
Get a Cerebras API key, then send OpenAI-style chat completion requests to https://api.cerebras.ai/v1 with the model ID llama-3.3-70b, as in the Quickstart.
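The Quickstart curl request can also be built in Python with only the standard library. This is a minimal sketch, not an official client. The base URL and model ID come from the listing. The function names `build_chat_request` and `send_chat` are hypothetical, and the response parsing assumes the usual OpenAI-style shape (`choices[0].message.content`), which the listing implies by calling the API OpenAI-compatible but does not spell out.

```python
import json
import urllib.request

BASE_URL = "https://api.cerebras.ai/v1"  # from the listing
MODEL_ID = "llama-3.3-70b"               # from the listing


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(prompt: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the reply text.

    Assumption: the response follows the OpenAI chat completion shape.
    """
    request = build_chat_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For example, `send_chat("hi", os.environ["KEY"])` mirrors the curl call, reading the key from the same `KEY` environment variable (this needs `import os`). Splitting the work into two functions lets the request be inspected without any network access.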
Related free models
- Kimi K2.6 (Ollama Cloud) · Free tier
- Gemini 2.5 Flash (Google AI Studio) · Generous · no card
- GLM 4.6 (Z.ai) · Self-host free
- @cf/openai/gpt-oss-120b (Cloudflare Workers AI) · 10K neurons/day (shared)
- bytedance-seed/dola-seed-2.0-pro:free (Kilo Code) · ~200 req/hr
- @cf/deepseek-ai/deepseek-r1-distill-qwen-32b (Cloudflare Workers AI) · 10K neurons/day (shared)