Llama 3.3 70B on Cerebras

The fastest free tokens/sec available, served on Cerebras wafer-scale inference hardware. The API is OpenAI-compatible.

Access: Free API tier
Free limits: ~14k tok/min
Modality: text
Credit card: Not required
Commercial use: Allowed
Model ID: llama-3.3-70b
Base URL: https://api.cerebras.ai/v1
Last verified: June 2026

How to use Llama 3.3 70B

Quickstart

curl https://api.cerebras.ai/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.3-70b","messages":[{"role":"user","content":"hi"}]}'
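The same request can be sketched in Python using only the standard library. This is a minimal illustration, not official client code: the function names `build_chat_request` and `chat` are made up for this example, and the response parsing assumes the usual OpenAI-style shape (`choices[0].message.content`), which the listing implies by calling the API OpenAI-compatible but does not spell out.

```python
import json
import urllib.request

BASE_URL = "https://api.cerebras.ai/v1"  # base URL from this listing
MODEL_ID = "llama-3.3-70b"               # model ID from this listing


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    body = json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt: str, api_key: str) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=30) as resp:
        data = json.load(resp)
    # Assumption: OpenAI-style response layout.
    return data["choices"][0]["message"]["content"]
```

To try it, call `chat("hi", key)` where `key` holds your free API key. Keeping request construction separate from sending makes the payload easy to inspect without touching the network.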

Frequently asked

Is Llama 3.3 70B free?

Yes. Cerebras offers it on a free API tier, limited to roughly 14k tokens per minute. No credit card is required.
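To stay under a per-minute token limit, a client can pace its own requests. The sketch below is one possible approach and rests on assumptions: it treats "~14k tok/min" as 14,000 tokens over a sliding 60-second window, while the listing does not say how the provider actually measures the limit or which tokens count toward it. `TokenBudget` is a hypothetical helper, not part of any Cerebras SDK.

```python
import time


class TokenBudget:
    """Client-side sliding one-minute window over tokens spent."""

    def __init__(self, tokens_per_minute: int = 14_000, clock=time.monotonic):
        self.limit = tokens_per_minute  # assumed value for "~14k tok/min"
        self.clock = clock              # injectable so the logic can be tested
        self.spent = []                 # (timestamp, tokens) pairs, oldest first

    def _used(self, now: float) -> int:
        # Drop entries older than 60 seconds, then sum what remains.
        self.spent = [(t, n) for t, n in self.spent if now - t < 60.0]
        return sum(n for _, n in self.spent)

    def wait_time(self, tokens: int) -> float:
        """Seconds to wait until `tokens` more would fit in the window."""
        if tokens > self.limit:
            raise ValueError("request exceeds the whole per-minute budget")
        now = self.clock()
        used = self._used(now)
        if used + tokens <= self.limit:
            return 0.0
        # Walk oldest-first until enough earlier usage would have expired.
        freed = 0
        for t, n in self.spent:
            freed += n
            if used - freed + tokens <= self.limit:
                return max(0.0, t + 60.0 - now)
        return 0.0  # not reached when tokens <= limit

    def record(self, tokens: int) -> None:
        """Note that `tokens` were just consumed."""
        self.spent.append((self.clock(), tokens))
```

Before each call, sleep for `budget.wait_time(estimated_tokens)`, then call `budget.record(actual_tokens)` afterwards. If the response carries an OpenAI-style `usage` field, its total is a natural value to record.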

Can I use Llama 3.3 70B commercially?

Yes, commercial use is allowed. Verify the current license or terms before shipping.

How do I start using Llama 3.3 70B?

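Get a free API key from Cerebras, then send chat completion requests to https://api.cerebras.ai/v1 with the model ID llama-3.3-70b. The API is OpenAI-compatible, so the Quickstart request above works as a starting point.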

Related free models

Kimi K2.6 (Ollama Cloud) · Free tier
Gemini 2.5 Flash (Google AI Studio) · Generous · no card
GLM 4.6 (Z.ai) · Self-host free
@cf/openai/gpt-oss-120b (Cloudflare Workers AI) · 10K neurons/day (shared)
bytedance-seed/dola-seed-2.0-pro:free (Kilo Code) · ~200 req/hr
@cf/deepseek-ai/deepseek-r1-distill-qwen-32b (Cloudflare Workers AI) · 10K neurons/day (shared)
