
Replicate · Frontier

Llama 3.1 405B (Rep)

Name: Llama 3.1 405B (Rep)
Brand: Replicate
Price: 9.500000 USD

Serverless

Largest dense Llama. Used as a quality benchmark for open weights — heavy to run, often hosted via Together / DeepInfra / Cerebras.

Llama 3.1 405B (Rep) is a frontier-tier AI model hosted by Replicate. It costs $9.500 per million input tokens and $9.500 per million output tokens (blended $9.500/M), with a 128K-token context window.

Profile inherited from upstream Llama 3.1 405B: this is a hosted variant of the same open-weights model.

Modalities: text

INPUT: $9.500/M (per million input tokens)

OUTPUT: $9.500/M (per million output tokens)

BLENDED 70/30: $9.500/M (default reference rate)

CONTEXT: 128K (128,000 tokens)

What it's good at

Top open-weights quality
Permissive license
Strong reasoning

Typical use cases

Open-weights quality ceiling
Synthetic data generation
Distillation source

Benchmarks


Scores inherited from Llama 3.1 405B — this is a hosted variant of the same open-weights model, so the underlying benchmark scores are identical.

MMLU: 88%

Multitask academic knowledge across 57 subjects.

GPQA Diamond: 51%

Graduate-level science questions, "Google-proof".

MATH: 73%

High-school competition math problems.

HumanEval: 89%

Python function synthesis from docstrings.

LMArena Elo: 1290

Crowd-sourced head-to-head preference Elo rating.

Hand-curated from each provider's published reports and public leaderboards. Methodology varies across sources — treat as directional rather than authoritative.

How much does Llama 3.1 405B (Rep) cost?

Llama 3.1 405B (Rep) costs $9.500 per million input tokens and $9.500 per million output tokens, for a blended reference rate of $9.500 per million tokens.
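As a rough illustration of how these figures relate, the sketch below computes a blended rate and the cost of a single request from the listed prices. It assumes the 70/30 blend means 70% input tokens and 30% output tokens (the page does not spell out which side is which), and the function names are hypothetical, not part of any provider API.

```python
# Prices as listed on this page, in USD per million tokens.
INPUT_PRICE_PER_M = 9.50
OUTPUT_PRICE_PER_M = 9.50


def blended_rate(input_price, output_price, input_share=0.70):
    """Weighted average price per million tokens.

    Assumption: the 70/30 blend is 70% input tokens and 30% output tokens.
    """
    return input_share * input_price + (1.0 - input_share) * output_price


def request_cost(input_tokens, output_tokens,
                 input_price=INPUT_PRICE_PER_M,
                 output_price=OUTPUT_PRICE_PER_M):
    """Cost in USD of one request at per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    rate = blended_rate(INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M)
    print(f"blended rate: ${rate:.3f}/M")
    print(f"70K in + 30K out: ${request_cost(70_000, 30_000):.4f}")
```

Because the input and output prices are equal here, the blended rate matches the per-token price regardless of the weighting; the split only matters for models whose input and output prices differ.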

What is Llama 3.1 405B (Rep)'s context window?

Llama 3.1 405B (Rep) supports up to 128K tokens of context (128,000 tokens).
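A minimal sketch of budgeting a request against this limit follows. It assumes the 128,000-token window is shared between the prompt and the generated output, which is typical for this kind of model but is not stated on this page; the helper names are hypothetical, and token counts are taken as given rather than computed by a tokenizer.

```python
CONTEXT_WINDOW = 128_000  # tokens, as listed on this page


def fits_in_context(prompt_tokens, max_output_tokens, window=CONTEXT_WINDOW):
    """Return True if the prompt plus the requested output fits in the window.

    Assumption: input and generated tokens share one window.
    """
    return prompt_tokens + max_output_tokens <= window


def max_output_budget(prompt_tokens, window=CONTEXT_WINDOW):
    """Tokens left for generation after the prompt (never negative)."""
    return max(0, window - prompt_tokens)


if __name__ == "__main__":
    print(fits_in_context(120_000, 8_000))
    print(max_output_budget(100_000))
```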

What is Llama 3.1 405B (Rep) best for?

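Llama 3.1 405B (Rep)'s listed strengths are top open-weights quality, a permissive license, and strong reasoning. Typical use cases are serving as an open-weights quality ceiling, synthetic data generation, and acting as a distillation source.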

Who makes Llama 3.1 405B (Rep)?

Llama 3.1 405B (Rep) is served by Replicate. The underlying open-weights model, Llama 3.1 405B, is developed by Meta.