LangSmith API Pricing - The practical guide to estimating cost and avoiding surprises

When people search “LangSmith API pricing”, they often expect a single “per API call” meter like a model provider. LangSmith is different: it’s an observability + evaluation platform for LLM apps and agents, so pricing typically combines seat-based access with usage-based telemetry (traces) and, if you use them, deployment runtime and Agent Builder runs. This page is written to help you budget confidently: it explains what you pay for, what counts, what levers control cost, and how to build a simple calculator.

Pricing and limits can change. Use this page as a structured explainer, then confirm final numbers in your official LangSmith billing/pricing pages.

  • Seats: who can use the workspace
  • Traces: how much you observe and store
  • Retention: how long traces stay available
  • Deployments: per-run charges for hosted agents
  • Agent Builder: runs + traces during building/testing

1) What “LangSmith API Pricing” really means

LangSmith sits in the “agent engineering” category: it helps you debug and evaluate chains/agents, and (in some tiers) helps you deploy them. That means API pricing isn’t the same as “model pricing.” You are not paying LangSmith for the tokens produced by your model provider. Instead, you’re paying for the LangSmith platform’s ability to ingest, store, query, and analyze traces, and to run evaluations over time.

In practice, people use “LangSmith API pricing” to ask one of these questions:

  • What does it cost to use LangSmith as my tracing backend? (trace volume + retention)
  • What does it cost for my team to collaborate in LangSmith? (seat pricing)
  • What does it cost to run deployed agents through LangSmith-managed deployments? (deployment runs and uptime patterns)
  • What does it cost to build agents using Agent Builder? (Agent Builder runs + trace usage)
  • How can I budget accurately and avoid overage surprises? (sampling, retention, and governance)
Key takeaway: Treat LangSmith costs as “observability + QA infrastructure.” Your biggest levers are how much you trace and how long you keep it.
Model costs ≠ LangSmith costs

Token costs are billed by your model provider. LangSmith costs are driven by tracing volume, retention, and platform features.

“Traces” are the main meter

Most teams underestimate how quickly trace volume grows in production. Measure early and adopt sampling policies.

Retention is a multiplier

Keeping everything for months can dominate spend. Use short retention broadly, and long retention selectively.

2) Quick pricing snapshot

The following is a typical snapshot of what’s often discussed publicly for self-serve tiers. Always confirm the current details in your account.

  • Seats. What it covers: workspace users, collaboration, reviews, admin roles. Common self-serve pattern: Plus is per seat per month (often cited as $39/seat/month). What changes the most: how many non-engineers need access. Best cost control: right-size seats; use roles; keep "view-only" where possible.
  • Traces. What it covers: trace ingest, storage, querying, and UI analytics. Common self-serve pattern: monthly included traces, then pay-as-you-go per 1k traces. What changes the most: traffic growth and tracing percentage. Best cost control: sampling, batching, and filtering low-value traces.
  • Retention. What it covers: how long trace data is kept. Common self-serve pattern: short base retention (often ~14 days) vs. extended (often ~400 days). What changes the most: whether you keep long retention for too much volume. Best cost control: use extended retention only for a curated subset.
  • Deployments. What it covers: hosted agent invocations and possibly uptime. Common self-serve pattern: per deployment run charges (billing docs sometimes cite $0.005/run). What changes the most: how many times your product calls deployed agents. Best cost control: caching, routing, and consolidating agent calls.
  • Agent Builder. What it covers: Agent Builder run quotas/overages and the traces they generate. Common self-serve pattern: monthly included runs, then per-run charges (plan dependent). What changes the most: experiment loops during agent development. Best cost control: batch experiments; rely on offline evals; avoid repeated identical runs.
Budgeting in one sentence: Seats are predictable; traces and retention create the variable part; deployments and Agent Builder matter if you use them heavily.

3) Plans: Developer vs Plus vs Enterprise

LangSmith plans are usually described in three layers: a free Developer tier (for individual development and learning), a self-serve Plus tier (for teams shipping and collaborating), and an Enterprise tier (for advanced security, governance, and support needs). Depending on the organization, “Enterprise” may also include optional self-hosting and custom retention policies.

3.1 Plan comparison

  • Developer. Best for: solo developers, prototypes, learning the tooling. Pricing model: free. Typical included usage: often a limited monthly trace allowance (frequently described around 5k traces/month). Common reasons to upgrade: need team collaboration, more traces, deployments, or org governance.
  • Plus. Best for: teams building and iterating quickly. Pricing model: per-seat monthly fee + usage-based billing (traces, etc.). Typical included usage: often a base trace allowance (frequently described around 10k traces/month). Common reasons to upgrade: higher security needs, SSO, SLAs, custom retention, self-hosting.
  • Enterprise. Best for: large orgs, regulated environments, strict governance. Pricing model: custom contract. Typical included usage: custom. What it adds: SSO/SAML, advanced RBAC, audit logs, data residency, self-hosted options, support.

3.2 What “self-hosted” usually means for pricing

When teams ask about self-hosting, the real requirement is usually one of these: data residency, compliance, private networking, stricter access control, or internal policy around third-party storage. In many product categories, self-hosted plans are negotiated under Enterprise because they include licensing, support, and technical requirements. If you’re preparing an internal business case, plan for a contract conversation rather than a simple public rate card.

Does self-hosting remove usage fees?
In many platforms, self-hosting changes where the service runs and how it’s licensed, but it does not automatically mean “unlimited usage.” Contracts can still define usage tiers, storage policies, and support scope. Treat it as a different commercial model rather than “free.”

4) Seats: who needs one and how to right-size collaboration

Seat pricing is the predictable portion of your budget. A “seat” typically corresponds to a user who can log into the workspace, view traces, annotate runs, manage datasets, execute evaluations, and potentially administer projects depending on their role. Seats matter most when your workflow includes reviewers and stakeholders beyond the core engineering team.

4.1 Who usually needs a seat

Engineers and ML builders

They instrument tracing, debug failures, create datasets, and iterate on prompts/tools. They nearly always need seats.

QA reviewers / annotators

They label outputs, attach feedback, validate eval results, and maintain rubrics. Seats are often worth it for quality workflows.

Product / support leads

They review outcomes, identify trends, and help prioritize improvements. Sometimes they can work from exported dashboards instead.

Admins / security

They manage org settings, retention, and access controls. These seats are few but important.

4.2 Seat optimization strategies (that don’t break your workflow)

  • Role-based access: give full access only to those who need it; keep others on limited roles if supported.
  • Export dashboards: for stakeholders who only need high-level charts, export metrics instead of giving full trace access.
  • Rotate reviewers: if you do periodic review sprints, you may not need permanent seats for everyone all month.
  • Separate workspaces: keep dev/staging review separate from prod if policy demands it, but avoid unnecessary duplication.
Tip: Seat cost is rarely the source of a surprise bill. Most surprises come from trace volume and retention decisions.

5) Traces: what counts, what scales, and how to estimate accurately

Traces are the primary usage meter for LangSmith in many billing descriptions. A trace records an execution of your app or agent: inputs, outputs, intermediate tool calls, retrieval, errors, latency, metadata, and sometimes token/cost estimates. You can think of traces as “recorded footage” of what your system did.

5.1 Root traces vs internal steps

In many common setups, pricing is based on a “root trace” (or similar unit) that represents an end-to-end user request. Inside a root trace, you may have many child runs (LLM calls, tools, retrievers). The UI shows the full tree so you can debug step-by-step. Whether billing counts only root traces or also counts certain internal spans can depend on current plan definitions and ingestion method, so always confirm how your account’s billing defines “trace units.”

Why do trace counts grow faster than you expect?
Because agent systems often do multiple LLM calls per request, retries on failure, retrieval/reranking, and tool calls. Even if you are billed mostly per root trace, stored data volume and query load grow with complexity, so retention decisions and high sampling rates can still cost more operationally.
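To make the root-vs-child distinction concrete, here is a minimal sketch; the `parent_id` field is an illustrative convention, not the platform's actual schema. One user request yields one parentless root run plus several child runs, and only the root counts as the end-to-end trace unit, even though every run adds stored volume.

```python
def count_root_traces(runs):
    """Count end-to-end trace units: runs with no parent are roots.

    `runs` is a list of dicts with illustrative keys `id` and `parent_id`
    (None for a root). Child runs (LLM calls, tools, retrievers) hang off
    a root and add stored data without adding to the root-trace count.
    """
    return sum(1 for run in runs if run["parent_id"] is None)


# One user request: a root plus four child runs.
request_tree = [
    {"id": "r1", "parent_id": None},  # root: the end-to-end request
    {"id": "c1", "parent_id": "r1"},  # planner LLM call
    {"id": "c2", "parent_id": "r1"},  # retriever
    {"id": "c3", "parent_id": "r1"},  # answer LLM call
    {"id": "c4", "parent_id": "r1"},  # formatting tool
]

print(count_root_traces(request_tree))  # 1 root trace, but 5 stored runs
```

This is why two systems with identical root-trace counts can still differ sharply in stored volume: the child-run fan-out is invisible in the headline trace number.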

5.2 The simplest forecasting method (works for almost everyone)

  1. Count monthly user requests (or agent invocations): requests/month.
  2. Decide what fraction you trace in production: sampling_rate (example: 0.10 for 10%).
  3. Trace all errors at 100% (recommended): adjust for your error rate if you sample successes differently.
  4. Compute estimated traces: traces/month ≈ requests/month × sampling_rate (plus error overrides).
  5. Split by retention: what fraction is short vs long.
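The five steps above can be sketched as a small helper; parameter names and the example numbers are assumptions for illustration, not billing definitions:

```python
def forecast_traces(requests_per_month, sampling_rate, error_rate=0.0,
                    extended_fraction=0.0):
    """Steps 1-5: sample successes, trace all errors, split by retention."""
    successes = requests_per_month * (1 - error_rate)
    errors = requests_per_month * error_rate
    total = successes * sampling_rate + errors  # errors traced at 100%
    extended = total * extended_fraction        # curated long-retention slice
    base = total - extended                     # everything else stays short
    return {"total": total, "base": base, "extended": extended}


# 100k requests/month, 10% success sampling, 2% error rate, 5% promoted
estimate = forecast_traces(100_000, 0.10, error_rate=0.02, extended_fraction=0.05)
print(estimate)  # total 11,800: 9,800 sampled successes + 2,000 errors
```

Note how the error override matters: at a 2% error rate, errors alone contribute 2,000 traces, nearly a fifth of the total, even though successes are sampled at 10%.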

5.3 A trace sampling policy that teams actually keep

The best policy is simple enough to remember and enforce:

  • Dev/staging: trace 100% (you want maximum visibility while building).
  • Production:
    • Trace 100% of errors and timeouts (always).
    • Trace a small percentage of successes (start at 1–10%).
    • Increase sampling temporarily during new releases (e.g., 20–50% for 24–72 hours).
Video analogy: You don’t archive every second of every rehearsal for a year. You archive enough footage to improve and to investigate. Sampling is your “which clips do we keep?” decision.
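One way to enforce this policy in code rather than in memory is a deterministic sampler. This is a generic sketch, not a LangSmith API; the hash-bucket approach is just one common choice that keeps the decision stable per request id:

```python
import zlib


def should_trace(request_id: str, is_error: bool,
                 success_rate: float = 0.05, release_boost: bool = False) -> bool:
    """Apply the policy: 100% of errors, a sampled slice of successes.

    Hash-based bucketing makes the decision deterministic per request id,
    so retries of the same request get a consistent answer.
    """
    if is_error:
        return True  # always trace errors and timeouts
    rate = max(success_rate, 0.20) if release_boost else success_rate
    bucket = zlib.crc32(request_id.encode()) % 10_000
    return bucket < rate * 10_000


print(should_trace("req-123", is_error=True, success_rate=0.0))  # True
```

During a launch window, flip `release_boost` on (here a hypothetical floor of 20%) and flip it back off after stabilization; the rest of the policy never changes.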

6) Retention: the hidden multiplier on trace cost

Retention is the most underestimated pricing driver. Two teams can have the same trace volume, but vastly different costs if one keeps short retention and the other keeps long retention. Long retention is valuable for compliance and investigations, but expensive if you apply it to everything.

6.1 Typical retention tiers (conceptual)

Many pricing descriptions talk about a short “base retention” window (often around two weeks) and an “extended retention” window (often around a year+). The exact numbers, availability, and costs can vary by plan and can change over time.

  • Short / base. What you get: recent traces for debugging and iteration. Best for: day-to-day engineering, incident triage, rapid iteration. Risk if misused: very low; you might lose old context if you don't export summaries. Smart default: use for most production volume.
  • Extended. What you get: long-term trace availability. Best for: audits, compliance, long investigations, longitudinal analysis. Risk if misused: costs explode if applied to all traffic; privacy exposure increases. Smart default: use for a curated subset only.

6.2 A retention strategy that balances cost and compliance

  1. Keep short retention for broad traffic (cheap, high volume visibility).
  2. Promote selected traces to extended retention:
    • security/compliance investigations
    • high-value customer incidents
    • golden evaluation traces used for long-term comparisons
  3. Export derived metrics (counts, latency, score distributions) so you can analyze trends beyond the retention window.
What if I need long retention but want to minimize privacy risk?
Store long retention with reduced content: log metadata, costs, and outcomes, but redact or omit full user text and retrieved documents. You can keep enough information to analyze quality and incidents while limiting sensitive payload storage.
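The "promote a curated subset" step can be sketched as a simple filter; the tag names here are hypothetical labels you would attach in your own pipeline, not built-in platform tags:

```python
# Hypothetical tags a team might attach to traces worth archiving.
PROMOTE_TAGS = {"security-investigation", "customer-incident", "golden-eval"}


def select_for_extended(traces):
    """Return ids of traces worth promoting to extended retention.

    `traces` are dicts with illustrative `id` and `tags` fields; anything
    not matching stays on short/base retention by default.
    """
    return [t["id"] for t in traces if PROMOTE_TAGS & set(t.get("tags", []))]


batch = [
    {"id": "t1", "tags": ["routine"]},
    {"id": "t2", "tags": ["customer-incident"]},
    {"id": "t3", "tags": []},
    {"id": "t4", "tags": ["golden-eval", "routine"]},
]
print(select_for_extended(batch))  # ['t2', 't4']
```

The default-deny shape is the point: extended retention is opt-in per trace, never the fallback.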

7) Deployments: run pricing, uptime patterns, and common cost traps

Deployments apply when you run agents through managed hosting features (often used for agent graphs). In this case, your cost model may include: (1) per deployment run charges, and (2) potentially uptime/instance charges depending on tier and deployment type.

7.1 Deployment runs in plain English

A deployment run is typically one complete invocation of a deployed agent—one end-to-end “session” for a request. If your app calls a deployed agent multiple times for a single user action (for example: “draft answer,” “verify answer,” “format answer”), you may be paying for multiple deployment runs per user action. That’s a common trap: deployment runs can be high even if user traffic is moderate.

7.2 Deployment billing items

  • Deployment runs. Unit: per run. Trigger: each invocation of a deployed agent. Scales with: how often your product calls the deployed agent. Cost controls: caching, routing simple queries, reducing retries, consolidating multiple agent calls.
  • Uptime / instance (if applicable). Unit: per unit of time. Trigger: keeping a deployment "warm" or always-on. Scales with: always-on hours, number of deployments, capacity settings. Cost controls: autoscaling, turning off dev deployments off-hours, merging low-traffic deployments.

7.3 Avoiding the “agent cascade” billing trap

In larger systems, one agent calls another agent, which calls another. This creates an invocation cascade that multiplies run counts. If you notice deployment runs growing much faster than user requests, check whether:

  • You trigger multiple agent invocations per request (planner → executor → verifier).
  • You retry entire agent executions on partial failures instead of retrying only the failing step.
  • You call a deployed agent for tasks that could be handled locally (formatting, trivial classification, templated responses).
Best practice: Make “runs per user request” a tracked metric. It’s the quickest way to spot runaway deployment usage.
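A minimal version of that metric, assuming you can count agent runs per served request, might look like the following; the threshold of 3.0 is an illustrative default, not a platform limit:

```python
class RunsPerRequest:
    """Track the 'runs per user request' metric for cascade detection.

    Wire `record` into whatever code path serves a user request, passing
    the number of deployed-agent runs that request triggered.
    """

    def __init__(self, alert_threshold: float = 3.0):
        self.requests = 0
        self.runs = 0
        self.alert_threshold = alert_threshold

    def record(self, runs_for_this_request: int) -> None:
        self.requests += 1
        self.runs += runs_for_this_request

    @property
    def ratio(self) -> float:
        return self.runs / self.requests if self.requests else 0.0

    def cascade_suspected(self) -> bool:
        return self.ratio > self.alert_threshold


meter = RunsPerRequest()
for runs in (1, 2, 9):  # third request fanned out: planner -> executor -> verifier
    meter.record(runs)
print(meter.ratio, meter.cascade_suspected())  # 4.0 True
```

When the ratio creeps up, check the three cascade causes listed above before touching sampling or retention: the fix is usually architectural, not a billing knob.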

8) Agent Builder: how building/testing contributes to cost

Agent Builder is often used to rapidly prototype and test agent behaviors. From a pricing perspective, you should think in two layers: Agent Builder activity can be limited by run quotas, and those runs can also generate traces because the platform records what happened. That means agent development loops can create real usage even before you have production users.

8.1 Development loop patterns that inflate costs

  • Re-running the same prompt/tool config repeatedly without saving results.
  • Exploring with large, high-cost models for every test case instead of a smaller model for early iteration.
  • Keeping long retention for experiments that don’t matter after you decide what works.
  • Running “manual experiments” instead of running a small offline evaluation suite.

8.2 Cost-friendly Agent Builder workflow

  1. Use a small “starter dataset” (20–50 examples) that represents your target tasks.
  2. Run changes against the dataset and compare results—avoid random ad hoc testing.
  3. Keep experiments in short retention unless they are “golden references.”
  4. When you find a failure, add it to the dataset so it doesn’t regress later.
Do Agent Builder runs count toward trace usage?
In many setups, Agent Builder runs are traced for visibility, which means they can contribute to trace usage. Check your account billing definitions to see how the platform counts them for your plan.

9) Cost controls: the levers that actually work

The most effective cost controls are not complicated. They are policies and defaults you can enforce in code and in workflow. Think of cost control as a “video production discipline”: record enough to improve, then keep only what you need long term.

9.1 Sampling and routing (primary dial)

Trace all errors, sample successes

Ensure debuggability and incident response, while keeping volume manageable in stable periods.

Feature-based sampling

Trace high-risk features more, stable features less. Raise sampling during launches, lower after stabilization.

9.2 Retention mixing (secondary dial)

Choose a retention mix: short retention for broad coverage, long retention for critical subsets. If your plan supports multiple retention tiers, you can treat extended retention like an “archive” bucket, not the default.

9.3 Content minimization (privacy and cost together)

Logging full inputs/outputs can increase both risk and volume. A common pattern is:

  • Store full content in dev/staging.
  • Store redacted/summarized content in production.
  • Store derived metrics everywhere: latency, model id, tool counts, success labels, evaluation scores.
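A sketch of that pattern, with an illustrative payload shape (not a platform schema): full content survives in dev/staging, production keeps only derived metrics plus redaction markers.

```python
def minimize_trace(payload: dict, env: str) -> dict:
    """Keep full content off production traces; keep derived metrics everywhere.

    The payload keys (latency_ms, model, tool_calls, ...) are illustrative.
    """
    metrics = {
        "latency_ms": payload["latency_ms"],
        "model": payload["model"],
        "tool_count": len(payload.get("tool_calls", [])),
        "success": payload["success"],
    }
    if env in ("dev", "staging"):
        return {**metrics, "input": payload["input"], "output": payload["output"]}
    # Production: redact user-visible text, keep everything analyzable.
    return {**metrics, "input": "[redacted]", "output": "[redacted]"}


raw = {"latency_ms": 840, "model": "gpt-x", "tool_calls": ["search"],
       "success": True, "input": "user question", "output": "assistant answer"}
print(minimize_trace(raw, "production")["input"])  # [redacted]
```

Because the metrics survive redaction, you can still chart latency, success rates, and tool usage over long windows without storing sensitive payloads.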

9.4 “Project hygiene” to prevent accidental mixing

Many orgs accidentally send production volume to a dev project or keep dev traces for too long. Keep a strict naming convention like: product-dev, product-staging, product-prod. Also add metadata keys like env and app.version so you can filter and detect misrouted traces immediately.

Best habit: Put cost levers in code, not in human memory (sampling rate, retention choice, and content redaction).
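Putting those levers in code can be as simple as a frozen config object per environment; the project names follow the convention above, and all numeric values are illustrative defaults:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TracingPolicy:
    """Cost levers as code, not tribal knowledge. Values are illustrative."""
    project: str              # strict naming: product-dev / -staging / -prod
    env: str                  # metadata key for filtering misrouted traces
    success_sampling: float   # fraction of successes traced
    extended_retention: bool  # default retention tier for this env
    redact_content: bool      # content minimization toggle


POLICIES = {
    "dev": TracingPolicy("product-dev", "dev", 1.0, False, False),
    "staging": TracingPolicy("product-staging", "staging", 1.0, False, False),
    "prod": TracingPolicy("product-prod", "prod", 0.05, False, True),
}

print(POLICIES["prod"].success_sampling)  # 0.05
```

A misrouted trace is now easy to catch: if a trace tagged `env=prod` lands in `product-dev`, something is pointing at the wrong policy.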

10) Build a simple pricing calculator (transparent and accurate enough)

You can estimate your monthly spend with a straightforward calculator. The goal is not to match every billing nuance perfectly. The goal is to avoid “order-of-magnitude mistakes” and to know which dial to turn if costs rise.

10.1 Inputs

  • Seats: number of paid users
  • Requests/month: user requests or agent invocations
  • Sampling: fraction of requests traced (plus “trace all errors” adjustment)
  • Retention mix: % base vs % extended
  • Included traces: monthly allowance in your plan
  • Trace rates: your account’s per-1k trace pricing for each tier
  • Deployment runs: monthly deployment invocations and per-run rate
  • Agent Builder runs: monthly runs, included quotas, and overage rate (if applicable)

10.2 A clear formula (copy/paste friendly)

Calculator (runnable Python; inputs correspond to section 10.1)
def estimate_monthly_cost(
    *,
    seats, seat_price_per_month,
    requests_per_month, sampling_rate, error_rate=0.0,
    base_retention_fraction=1.0, extended_retention_fraction=0.0,
    included_base_traces=0, base_rate_per_1k=0.0, extended_rate_per_1k=0.0,
    deployment_runs_per_month=0, deployment_rate_per_run=0.0,
    agent_builder_runs=0, included_agent_builder_runs=0,
    agent_builder_rate_per_run=0.0,
):
    # Seats: the predictable part
    seat_cost = seats * seat_price_per_month

    # Traces (estimate root traces): sampled successes plus 100% of errors
    estimated_traces = requests_per_month * sampling_rate
    estimated_traces += requests_per_month * error_rate * (1 - sampling_rate)

    base_traces = estimated_traces * base_retention_fraction
    extended_traces = estimated_traces * extended_retention_fraction

    # Overage logic (example: the allowance applies to base-retention traces)
    base_overage = max(0.0, base_traces - included_base_traces)
    trace_cost = ((base_overage / 1000) * base_rate_per_1k
                  + (extended_traces / 1000) * extended_rate_per_1k)

    # Deployments
    deployment_cost = deployment_runs_per_month * deployment_rate_per_run

    # Agent Builder (if billed separately)
    agent_overage = max(0, agent_builder_runs - included_agent_builder_runs)
    agent_builder_cost = agent_overage * agent_builder_rate_per_run

    return seat_cost + trace_cost + deployment_cost + agent_builder_cost

10.3 What to do if your estimate is off

If the estimate and the real bill disagree, it’s usually because your trace counting assumption is wrong or because you have multiple sources of traces: production app traces plus evaluation runs plus Agent Builder runs plus deployment runtime. The solution is simple: measure actual trace counts for one week, categorize by project/env, then scale to a month and update your calculator.

How do I measure trace volume without guesswork?
Use your LangSmith usage/billing dashboard or run queries by time range and project to count root traces. Measure for 7 days, calculate average per day, then multiply by 30. Also record the sampling rate and retention.
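The scale-up arithmetic is trivial but worth writing down; the daily counts below are made up for illustration, and the weekend dip is exactly why you average a full week instead of one busy day:

```python
# Seven days of counted root traces from a usage dashboard (made-up numbers).
daily_counts = [410, 395, 460, 505, 390, 120, 95]  # Mon..Sun

avg_per_day = sum(daily_counts) / len(daily_counts)
monthly_estimate = avg_per_day * 30
print(round(monthly_estimate))  # 10179
```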

11) Budget examples and scenarios (useful patterns, not promises)

The examples below are designed to teach the structure of budgeting. Replace the numbers with your actual traffic and your actual billing rates. The key learning is which levers matter and how the cost changes when you change sampling or retention.

11.1 Scenario A: small team shipping an internal assistant

A team of 5 uses LangSmith to debug an internal support assistant. They trace all dev/staging requests and sample 10% of production successes, while tracing 100% of production errors. They keep base retention for most traces and promote a small set of incident traces to extended retention.

  • Seats. Example value: 5 (engineers + one QA reviewer). Cost lever: right-size reviewers; export summary dashboards. Risk if ignored: low; seat costs are predictable.
  • Requests/month. Example value: 15,000 (internal usage across teams). Cost lever: routing and caching; reduce unnecessary requests. Risk if ignored: higher trace volume if you trace everything.
  • Prod sampling. Example value: 10% successes + 100% errors (keeps failures visible while controlling volume). Cost lever: reduce to 5% if stable; raise temporarily during releases. Risk if ignored: cost spikes as traffic grows.
  • Retention mix. Example value: 95% base / 5% extended (archive only incident/golden traces). Cost lever: keep extended small; store only redacted content. Risk if ignored: large long-term storage cost and privacy exposure.
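Plugging Scenario A into the calculator structure from section 10 makes the levers visible. The seat price and per-1k trace rates below are placeholder assumptions for the arithmetic, not quoted prices; substitute your account's actual rates and allowance.

```python
# Scenario A with illustrative placeholder rates.
seats, seat_price = 5, 39.00
requests, success_sampling, error_rate = 15_000, 0.10, 0.02
extended_fraction = 0.05
included_base, base_per_1k, extended_per_1k = 10_000, 0.50, 4.50  # assumed rates

traces = requests * (1 - error_rate) * success_sampling + requests * error_rate
base_traces = traces * (1 - extended_fraction)   # short retention, broad coverage
extended_traces = traces * extended_fraction     # curated archive only
base_overage = max(0.0, base_traces - included_base)  # fits inside the allowance

monthly = (seats * seat_price
           + (base_overage / 1000) * base_per_1k
           + (extended_traces / 1000) * extended_per_1k)
print(f"traces={traces:.0f} monthly=${monthly:.2f}")
```

Under these assumptions the sampled volume (1,770 traces) sits well inside the base allowance, so nearly the entire bill is seats; double the sampling rate and the structure, not just the total, starts to shift.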

11.2 Scenario B: high-traffic consumer app

A consumer product has 600k requests/month. Tracing 100% of production is unnecessary and expensive. They trace 100% errors and 1–2% successes by default, then raise sampling to 20% during major releases for 48 hours. They keep base retention broadly and use extended retention only for flagged customer incidents.

Lesson: With high traffic, your sampling policy is your budget. Make it explicit, version it, and enforce it at runtime.

11.3 Scenario C: deployment-heavy architecture

A product relies on deployed agent graphs. Their trace volume is moderate because they sample, but deployment runs are high because the product calls the agent repeatedly for multi-step operations. They reduce cost by consolidating calls, caching results, and moving trivial steps out of the deployment (formatting, static rules, and post-processing).


Fast checklist to keep spend stable

  • Track traces/day and runs/request.
  • Enforce sampling in production.
  • Keep extended retention small.
  • Batch ingest and async flush.
  • Route simple tasks away from deployments.

12) Glossary of pricing and billing terms

Billing language can be confusing because different products use similar terms differently. This glossary is written for practical budgeting.

  • Seat. Meaning: a user account with access to workspace features. Pricing impact: predictable monthly cost that scales with team size. Common misunderstanding: "all users need seats" (some stakeholders can use exports). Best practice: right-size seats and roles; keep stakeholders on metrics when possible.
  • Trace. Meaning: a recorded execution of your workflow (often a root request + child steps). Pricing impact: primary usage meter; grows with traffic and sampling. Common misunderstanding: "each internal LLM call is a new billable trace" (depends on billing definition). Best practice: measure actual counted traces and tune sampling.
  • Retention. Meaning: how long traces stay stored/accessible. Pricing impact: long retention can multiply costs and privacy risk. Common misunderstanding: "long retention is always better." Best practice: use short retention broadly; long retention selectively.
  • Deployment run. Meaning: one invocation of a deployed agent. Pricing impact: can become a major line item in deployment-heavy systems. Common misunderstanding: "one user request equals one run" (can be multiple runs). Best practice: track runs/request; reduce cascades and retries.
  • Agent Builder run. Meaning: a run created while building/testing an agent. Pricing impact: development loops can generate significant usage. Common misunderstanding: "only production traffic costs money." Best practice: use offline evals; batch experiments; keep short retention.
Reminder: Terms and exact units can change across versions and plans. Always validate your account’s billing definitions.

13) FAQ: LangSmith API Pricing

Is LangSmith “priced per API call” like OpenAI or Anthropic?
Not in the same way. Model providers price by tokens. LangSmith prices around platform usage (traces, retention, deployments) plus seats. You should budget for both: model inference costs + LangSmith observability/evaluation costs.
What is the fastest way to estimate cost before a big launch?
Run a one-week pilot. Measure counted traces/day by project/env, record the sampling policy, and then scale to a month. Decide retention mix and add expected deployment runs. This beats guessing by a huge margin.
What’s the most common reason bills spike?
Tracing 100% of high-traffic production requests with extended retention (or tracing far more than intended due to misconfigured sampling). Fix it by enforcing sampling in code and limiting extended retention to a small curated set.
Should I keep extended retention for “just in case” investigations?
Usually no. Keep short retention for most traffic and promote only the traces you actually need to archive. Also consider storing redacted content or derived metrics for long periods instead of full text.
How do I control costs without losing debuggability?
Trace all errors, sample successes, and increase sampling temporarily during releases. This keeps failures visible while limiting volume. Combine with a stable metadata scheme so you can target your investigations even with sampling.
Do evaluations increase billing?
Evaluations run workflows over datasets, which can generate traces and model calls. The model calls cost money with your model provider, and the traces can count toward LangSmith usage. Use smaller eval suites for PR checks and full suites before releases.
What should I track monthly to keep pricing under control?
Track: traces/day (by env), sampling rate, retention mix, runs per request (if using deployments), and the top projects producing volume. A simple monthly report prevents surprises.