What is LangSmith, and why does pricing feel “different” from a typical API?
Many pricing pages in AI look like the classic developer meter: a simple table of per-token rates, a few model tiers,
and maybe a couple of add-ons. LangSmith is different because it is not a model provider. It is an “agent engineering platform”
focused on observing and improving your LLM applications. When you use it well, it becomes part of your reliability loop:
you instrument your application, inspect trace trees, run evaluations, build datasets, compare prompt and tool changes, monitor production drift,
and (for some teams) deploy agents in a managed way.
That means a good pricing calculator must consider usage patterns that do not show up in model bills:
how many “runs” you produce (traces), how long you retain them, whether certain classes of traces are automatically upgraded to longer retention,
and whether you are using hosted deployments that come with their own run and uptime meters.
The purpose of this page is to help you estimate those costs with clear assumptions and to give you enough context to
choose the right plan and architecture. If you are a solo builder experimenting locally, your costs may be near zero and the main question is
“How do I stay within included traces?” If you are a team shipping a customer-facing agent, the questions become:
“How do we avoid sending every dev run to production tracing?”, “Which traces should we keep for 400 days?”, and
“How do we set usage limits so a runaway loop doesn’t blow up our bill?”
Think of LangSmith pricing like observability pricing, not like model pricing.
Your model costs are driven by tokens. Your LangSmith costs are driven by trace volume, retention, and deployment execution.
The “best” configuration is the one that gives you enough visibility to ship reliable agents without paying to store noise.
How LangSmith pricing works
LangSmith pricing typically breaks into a few buckets. You can summarize them as:
Seats
Traces
Retention (base vs extended)
Deployments (runs + uptime)
Optional product features / enterprise terms
The public plan descriptions usually highlight the first three: plan seat cost, included traces per month, and pay-as-you-go overages.
The billing documentation highlights deployment runs and clarifies what constitutes a billable “deployment run.”
1) Seats
Seats are the simplest part. If you are on a plan that charges per user, your base subscription cost is
seat count × price per seat. In the public plan snapshot, the Plus plan is shown at a per-seat monthly rate,
while the Developer plan is shown as $0 per seat per month for a single seat (solo use).
Practically, seats matter because they anchor your “minimum bill.” Even if you send only a small number of traces,
a team plan may still have a baseline monthly seat charge. This is common in developer tools because it funds support,
collaboration features, and the product surface around traces (dashboards, projects, workspaces, and evaluation workflows).
2) Traces (base traces)
A trace is the recorded story of a run. In a simple chain, it might contain a prompt, an LLM call, and the final result.
In an agent, it can contain steps, tool calls, intermediate model calls, retrieval, and more. Traces are a core unit because
they let you debug and evaluate behavior. When something goes wrong, the trace tree tells you where.
Plans include a number of base traces per month. After you exceed that included amount, base trace overages are billed at a per-1,000-trace rate.
The commonly surfaced public overage figure is “starting at $0.50 per 1k base traces” once you exceed the included amount.
Your calculator therefore needs both the included traces and the overage rate, because the billable portion is:
max(0, total base traces − included base traces).
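That formula can be written as a tiny helper. This is a sketch of the estimate used throughout this page, not billing logic from any SDK:

```python
def billable_base_traces(total_traces: int, included_traces: int) -> int:
    """Traces billed at the overage rate: only the portion above the plan's included amount."""
    return max(0, total_traces - included_traces)

# Under the included amount nothing is billable; above it, only the excess is.
print(billable_base_traces(4_000, 5_000))     # 0
print(billable_base_traces(200_000, 10_000))  # 190000
```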
3) Retention: base vs extended
Retention is about how long traces stay available. Many observability systems offer a “default retention” tier and a “long retention” tier.
Base retention covers the short window you need for day-to-day debugging; extended retention covers the longer window you need for audits,
regression investigation, and longitudinal monitoring.
In practice, you rarely want to keep everything for extended retention. Most traces are routine or noisy:
dev experiments, repeated failures, tests, and iterative runs. A smaller subset is genuinely valuable for a long time:
traces from production incidents, traces tied to important dataset examples, traces for compliance workflows, or traces that demonstrate
a key behavior your team cares about. That is why this page models “upgrades” as a percentage or a fixed count of traces.
4) Deployments: runs and uptime
If you deploy agents via the platform, deployment billing can have separate meters. One commonly documented meter is
deployment runs, defined as one end-to-end invocation of a deployed agent. The billing doc clarifies that
nodes and subgraphs inside a single execution are not billed separately; however, calls to other agents can be charged to the hosting deployment,
and resuming after an interrupt can count as a separate run. That matters for teams using human-in-the-loop patterns.
Some teams also see deployment uptime line items on invoices, especially if deployments are continuously hosted.
Uptime can behave like the “always-on service” component of a platform: you pay for the infrastructure to keep the deployment available,
plus you pay per run when it is actually invoked. Since uptime can vary by contract and deployment type, the calculator keeps this optional and editable.
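The two deployment meters combine additively. A minimal sketch, with rates left as inputs because uptime pricing varies by contract:

```python
def deployment_cost(runs: int, run_price: float,
                    uptime_minutes: float = 0, uptime_rate: float = 0.0) -> float:
    """Per-run charge plus an optional per-minute uptime charge."""
    return runs * run_price + uptime_minutes * uptime_rate

# Runs only, using an illustrative per-run price:
print(deployment_cost(25_000, 0.005))  # 125.0
```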
Key idea: most surprising bills come from retention upgrades and “too many traces.” If you treat tracing like logging
and send everything forever, costs rise quickly. If you treat tracing like sampling—keeping the right traces for longer—cost stays manageable.
Plan snapshot (public-facing)
The following table summarizes typical public plan highlights. Always verify in your dashboard or the latest plan page before final decisions.
| Plan | Seats / pricing | Included base traces / month | Base overage (after included) | Best for |
| --- | --- | --- | --- | --- |
| Developer | 1 seat (solo), $0 / month | 5,000 | Starts at ~$0.50 / 1k base traces | Solo builders, early prototyping, small-scale debugging |
| Plus | $39 / seat / month | 10,000 | Starts at ~$0.50 / 1k base traces | Teams shipping agents, collaboration + deployment workflows |
| Enterprise | Custom | Custom | Custom | SSO, governance, large-scale usage, custom terms |
Trace retention and upgrades: how to think about it
Retention is both a technical and a budgeting decision. Technically, retention affects what you can inspect later.
Budget-wise, retention is a storage-and-access decision: keeping traces longer costs more than keeping them briefly.
The simplest mental model is:
Base retention is short-term and cheap.
Extended retention is long-term and costs more.
An upgrade moves a trace from base to extended for the long window.
In the calculator you can choose “upgraded traces” as a percentage of total traces or a fixed count.
Percent is useful when you are forecasting. Fixed count is useful when you have operational clarity:
you know how many traces are “important” each month, or you know how many traces are automatically upgraded by rules.
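Both modes can live in one helper: forecast mode takes a percentage, operational mode takes a known count. Function name and signature are illustrative:

```python
def upgraded_trace_count(total_traces, percent=None, fixed_count=None):
    """Estimate extended-retention upgrades as a share of total traces
    (forecasting) or as a known monthly count (operational clarity)."""
    if fixed_count is not None:
        return min(fixed_count, total_traces)  # can't upgrade more than you have
    if percent is not None:
        return round(total_traces * percent / 100)
    return 0

print(upgraded_trace_count(200_000, percent=10))       # 20000
print(upgraded_trace_count(200_000, fixed_count=5_000))  # 5000
```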
Why upgrades happen
Many teams explicitly upgrade traces tied to:
- Incidents: production errors, outages, or user complaints where you want a record for later postmortems.
- Evaluations: labeled datasets and regression tests where you need trace history as evidence of behavior.
- Key journeys: “golden paths” in your application that define success for your users.
- Compliance: environments where a longer evidence trail is beneficial (subject to policy).
- Long-cycle debugging: intermittent issues that occur weekly or monthly and are hard to reproduce.
Upgrades can also happen automatically depending on platform behavior and settings. The safest approach is to assume
that “some portion” of traces will be upgraded in real deployments unless you actively manage retention settings and upgrade rules.
How to pick an upgrade percentage
There is no universal correct percentage. Here are practical starting points:
- 1–3%: Mature teams with strict sampling, strong limits, and clear retention rules. Mostly production issues only.
- 5–10%: Common for teams shipping and iterating quickly. Some dev experiments plus production issues are kept long-term.
- 15–30%: Early teams discovering patterns, running heavy evaluations, or keeping many traces for analysis. This can be expensive.
If you are unsure, start at 10% in your model, run the calculator, then re-run at 5% and 20%. You will quickly see how sensitive your total is
to retention. This sensitivity test is one of the fastest ways to find the best cost-control lever for your organization.
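The sensitivity sweep looks like this. The base overage rate is the public “starting at” figure; the extended-retention rate here is a placeholder assumption you should replace with the rate from your own billing:

```python
BASE_RATE_PER_1K = 0.50      # public "starting at" overage rate
UPGRADE_RATE_PER_1K = 5.00   # placeholder assumption, not a published price

def monthly_trace_cost(total: int, included: int, upgrade_pct: float) -> float:
    """Base overage plus extended-retention upgrades, per month."""
    overage = max(0, total - included) / 1_000 * BASE_RATE_PER_1K
    upgrades = total * (upgrade_pct / 100) / 1_000 * UPGRADE_RATE_PER_1K
    return overage + upgrades

for pct in (5, 10, 20):
    print(f"{pct}% upgraded -> ${monthly_trace_cost(200_000, 10_000, pct):.2f}")
```

With these rates, 200k traces cost roughly $145, $195, and $295 per month at 5%, 10%, and 20% upgrades: the upgrade dial swings the bill more than the base overage does.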
Retention tip: “Keep everything” is rarely the best plan. Decide what you must keep for 400 days, and let the rest expire
on base retention. You can still be highly reliable without storing noise.
Deployment costs: runs, uptime, and how the math works
Deployments are where people often mix up “trace volume” and “execution volume.” Traces measure observability records.
Deployment runs measure invocations of the hosted agent. These are related but not the same. You can have many traces without deployments
(for example, local runs instrumented during development). And you can have deployments with fewer traces if you limit tracing or sample.
Deployment runs
The documented definition of a deployment run is one end-to-end invocation. Importantly:
- Nodes and subgraphs inside one agent execution are not charged separately as separate runs.
- Calling other agents can incur charges on the deployment hosting the called agent.
- Human-in-the-loop patterns can create additional runs when you resume after an interrupt.
The calculator uses a default run price field and multiplies it by the number of runs you enter. If your workflow includes interrupts and resumes,
you should consider modeling a higher effective run count than “user requests,” because a single user request could produce multiple runs.
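A rough way to model that inflation, assuming each resume after an interrupt bills as one additional run (parameter names are illustrative):

```python
def effective_run_count(user_requests: int, interrupt_rate: float,
                        resumes_per_interrupt: float = 1.0) -> int:
    """Billable runs when a fraction of requests pause for human input
    and each resume counts as an additional run."""
    return round(user_requests * (1 + interrupt_rate * resumes_per_interrupt))

# 25,000 requests where 20% pause for human review and resume once:
print(effective_run_count(25_000, 0.20))  # 30000
```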
Uptime (optional modeling)
If you are hosting deployments continuously, uptime may appear as a separate cost component. It behaves like a capacity reservation:
the platform keeps the deployment available. Uptime is often billed in minutes, which is why the calculator asks for minutes.
A practical way to estimate uptime minutes is:
- Always-on deployment: about 43,200 minutes in a 30-day month (24 × 60 × 30), minus any pauses, maintenance, or scaling behavior.
- Work-hours only: 8 hours/day × 22 workdays ≈ 10,560 minutes/month.
- Staging environment: often a fraction of the above depending on team practice.
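The minute math above is simple enough to encode directly:

```python
def uptime_minutes(hours_per_day: float, days: int) -> int:
    """Active minutes per billing period for a given daily schedule."""
    return round(hours_per_day * 60 * days)

print(uptime_minutes(24, 30))  # 43200 (always-on, 30-day month)
print(uptime_minutes(8, 22))   # 10560 (work-hours only)
```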
Because uptime rates and calculation rules can vary by contract and deployment type, uptime fields are intentionally editable and off by default.
Use them to plan and to compare scenarios: “always-on vs work-hours” or “dev deployment vs prod deployment.”
Deployment planning: For early-stage teams, start by modeling costs without uptime, then add uptime later if you move to continuously hosted agents.
This avoids overestimating on day one while still keeping the tool useful as you scale.
Examples: realistic monthly scenarios and step-by-step math
Examples are the fastest way to build intuition. The goal isn’t to perfectly match an invoice; it’s to understand how each dial changes the outcome
so you can decide what to instrument, what to retain, and what to limit. The examples below use simple math, then highlight practical actions
you can take to reduce cost without losing reliability.
Example A: Solo developer prototyping (Developer plan)
You are a solo builder. You test a small RAG chatbot, run a few evaluations, and keep only a handful of traces for extended retention.
Your numbers:
- Plan: Developer
- Seats: 1
- Total traces: 4,000
- Upgraded traces: 2% (80 traces)
- Deployment runs: 0 (you deploy elsewhere or not yet)
Since total traces are below included traces, billable base traces are 0. Your estimate becomes:
- Seat cost: $0
- Base overage: $0
- Upgrades: tiny (80 / 1,000 × the per-1k upgrade rate)
- Runs: $0
In this scenario the main cost driver is not the platform; it’s your model usage. LangSmith is effectively “free” for observability.
Your best strategy is to keep trace volume low: do not trace every unit test, avoid tracing huge synthetic loops, and keep upgrades near 0–2%.
Example B: Small team shipping an internal assistant (Plus plan)
You have a team of 4. You trace heavily in staging and lightly in production. You keep 10% of traces for long retention because you run
monthly evaluation cycles and you want to compare results across releases.
- Plan: Plus
- Seats: 4
- Total traces: 200,000
- Included traces: 10,000
- Billable base traces: 190,000
- Upgraded traces: 10% (20,000)
- Deployment runs: 25,000
Your cost components:
- Seats: 4 × $39 = $156/month baseline
- Base overage: 190,000 / 1,000 × base rate
- Upgrades: 20,000 / 1,000 × upgrade rate
- Runs: 25,000 × $0.005 = $125/month (if using that default)
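Putting Example B together in one script (the upgrade rate is a placeholder assumption, and the run price is the illustrative default mentioned above):

```python
SEAT_PRICE = 39.00           # Plus plan, per seat per month
BASE_RATE_PER_1K = 0.50      # public "starting at" overage rate
UPGRADE_RATE_PER_1K = 5.00   # placeholder assumption; check your own billing
RUN_PRICE = 0.005            # illustrative default per deployment run

seats, total, included, upgraded, runs = 4, 200_000, 10_000, 20_000, 25_000

seat_cost = seats * SEAT_PRICE                                 # 156.00
overage = max(0, total - included) / 1_000 * BASE_RATE_PER_1K  # 95.00
upgrades = upgraded / 1_000 * UPGRADE_RATE_PER_1K              # 100.00 (placeholder)
run_cost = runs * RUN_PRICE                                    # 125.00
print(f"${seat_cost + overage + upgrades + run_cost:.2f}")     # $476.00
```

Note that at this volume the $156 seat baseline is already the smallest line item; traces and runs dominate.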
The practical takeaway: once trace volume gets large, “included traces” becomes a small part of the story. Your cost is primarily about
how many traces you send and how many you keep long-term. This is where you should introduce:
- Sampling: trace 100% of failures and only a portion of successes.
- Environment rules: avoid sending dev/test loops to the same workspace as production.
- Upgrade discipline: keep upgrades for incidents, eval datasets, and key journeys only.
Example C: Customer-facing agent with always-on deployments
You run a customer-facing agent 24/7. You keep longer retention for compliance and incident review, and you see uptime as a cost component.
Here the best lever is often to reduce uptime minutes by pausing unused environments (like staging) and ensuring your production deployment
uses the right tier for its traffic.
If you enable uptime in the calculator, enter your dev and prod minutes and rates. Then compare:
- Always-on staging + always-on production
- Work-hours staging + always-on production
- Work-hours staging + scaled-down production during low-traffic windows (if supported)
Even if you cannot perfectly control uptime, this scenario planning helps you understand where budget goes and what to optimize first.
Cost optimization: how to reduce LangSmith spend without losing visibility
The best cost strategy is not “trace less.” It is “trace smarter.” You want enough data to diagnose issues and prove improvements, and you want
your data to be the right kind of data. That means:
- Prioritize high-signal traces (errors, unusual tool calls, low-confidence answers, low satisfaction feedback).
- Reduce or exclude low-signal traces (unit tests, repeated dev loops, synthetic sweeps not needed for diagnosis).
- Keep only a curated subset for extended retention.
- Enforce usage limits so runaway behavior cannot create a surprise bill.
1) Sampling strategies that work in practice
Sampling is common in observability because it balances cost and insight. A robust sampling approach might include:
- Trace 100% of failures: all exceptions, timeouts, tool errors, and parsing failures.
- Trace 100% of “high-risk” actions: actions that trigger side effects (emails sent, tickets created, payments initiated).
- Trace a fixed percentage of successful runs: e.g., 5–10% of successes for baseline monitoring.
- Trace more when you ship: temporarily increase sampling for a release window, then decrease.
This gives you a consistent picture while keeping trace counts bounded. A common pattern is “dynamic sampling”: if the system sees high error rate,
it increases sampling for a short period to capture more context.
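The policy above can be condensed into a single decision function. This is a generic sketch of the sampling rules, not an API from any SDK; the "surge" flag models the temporary release-window increase:

```python
import random

def should_trace(outcome: str, high_risk: bool = False,
                 success_rate: float = 0.10, surge: bool = False) -> bool:
    """Keep every failure and high-risk action; sample ordinary successes,
    with a temporarily higher rate during a release window."""
    if outcome != "success" or high_risk:
        return True  # 100% of failures and side-effecting actions
    rate = min(1.0, success_rate * 3) if surge else success_rate
    return random.random() < rate

# Failures and side-effecting actions are always kept:
print(should_trace("error"))                    # True
print(should_trace("success", high_risk=True))  # True
```

In production you would call this once per run before deciding whether to export the trace.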
2) Separate workspaces by environment
One of the easiest ways to accidentally inflate trace volume is to send dev, test, staging, and production runs into the same workspace.
It becomes hard to filter, and you end up retaining data you do not need.
A cleaner setup is:
- Dev workspace: short retention, aggressive sampling, minimal upgrades.
- Staging workspace: medium retention for release cycles, upgrades only for evaluation datasets.
- Production workspace: high signal, upgrades for incidents and key journeys.
This also simplifies budgeting because you can allocate a budget per environment and enforce limits accordingly.
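A per-environment policy can be a small config table. Project names and sample rates here are illustrative, and the `LANGSMITH_PROJECT` variable reflects a common SDK convention; verify the exact variable names against the current LangSmith docs before relying on them:

```python
import os

# Illustrative per-environment tracing policy (all values are assumptions).
POLICY = {
    "dev":     {"project": "myapp-dev",     "success_sample_rate": 0.05},
    "staging": {"project": "myapp-staging", "success_sample_rate": 0.25},
    "prod":    {"project": "myapp-prod",    "success_sample_rate": 1.00},
}

def configure_tracing(env: str) -> dict:
    """Point the SDK at the environment's workspace project and return its policy."""
    settings = POLICY[env]
    os.environ["LANGSMITH_PROJECT"] = settings["project"]
    return settings

print(configure_tracing("dev")["project"])  # myapp-dev
```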
3) Set usage limits and alerting
Cost surprises happen when a loop runs unexpectedly—an agent gets stuck, a retriever returns huge context, or a queue replays messages.
The safest solution is to set trace limits for a workspace and to monitor usage. Even if you later raise limits, having the guardrails prevents
the “overnight bill spike” scenario.
4) Keep extended retention for only what you truly need
Extended retention is extremely valuable for regression detection and long-cycle debugging, but it is also a common cost driver. A good
extended-retention policy usually defines:
- What is automatically upgraded (if anything).
- What must be upgraded during incidents.
- What evaluation traces are always upgraded.
- How long you actually need the data (and who can access it).
Then you can pick a stable upgraded-trace percentage. Over time, many teams push that percentage down as they become more disciplined.
Rule of thumb: Reduce trace volume first, then reduce upgraded traces. If you keep upgraded traces constant but reduce overall traces,
you typically preserve the most valuable long-term evidence while lowering the base overage.
5) Use evaluation datasets to “pay once, learn many times”
If you invest in a curated evaluation dataset, you can run repeated experiments and compare improvements without keeping every production trace forever.
This shifts your long-term knowledge from raw traces to structured evaluations. In many teams, that is the path to both higher reliability
and lower storage cost.
LangSmith Self-Hosted Pricing Calculator
A LangSmith Self-Hosted Pricing Calculator is a planning tool that estimates the total cost of running LangSmith in your own environment: not just the Enterprise license, but also the infrastructure you will operate. Unlike a basic SaaS calculator, a self-hosted calculator combines two major cost layers:
- Enterprise/self-hosted licensing (contract-driven, often based on seats, expected usage, support level, and security requirements), and
- Your infrastructure costs (Kubernetes/compute, Postgres and storage, backups, monitoring, and networking), which grow with trace volume and retention.
A good self-hosted calculator lets you model the real cost drivers that matter most in production: number of seats, monthly trace volume, the split between base and extended retention, the sampling rate for successful requests (while keeping 100% of errors), and whether you will run deployments or high-volume Agent Builder workflows. It then outputs a clear monthly estimate and shows which levers reduce spend, such as defaulting to base retention, promoting only high-signal traces to extended retention, separating dev/staging/prod environments, and trimming large trace payloads by storing big artifacts externally.
In short, a LangSmith Self-Hosted Pricing Calculator helps engineering and finance teams build a realistic budget and rollout plan so you can meet compliance/data residency goals while keeping both licensing and infrastructure costs predictable as usage scales.
FAQ: LangSmith Pricing Calculator
These are practical answers focused on budgeting and estimating.
What does this calculator estimate?
It estimates your monthly spend based on seats, base trace overages (after included traces), extended retention upgrades, deployment runs,
and optional uptime minutes. It is meant for forecasting and scenario planning.
Why do you model upgraded traces separately from base overages?
Retention upgrades are a different behavior from simply exceeding included traces. Even if you are under included traces, keeping data longer can
still create additional cost. Modeling upgrades separately helps you see that lever clearly.
Should “upgraded traces” apply only to billable traces or to all traces?
Different billing implementations can treat retention as an add-on line item. For planning, applying upgrades to total traces is a conservative and
common estimate because retention is a property of stored traces, not only overage traces. If your invoice shows upgrades only after a threshold,
switch to Custom and align the logic to your observed billing.
What is a “deployment run”?
A deployment run is one end-to-end invocation of a deployed agent. It can be higher than user requests if your workflow includes interrupts and resumes,
because resuming after an interrupt can count as another run.
How do I estimate deployment uptime minutes?
For a 30-day month, always-on is roughly 43,200 minutes. If you only keep a deployment active during working hours, estimate 8 hours/day × 22 workdays
× 60 minutes ≈ 10,560 minutes. Use your real operational pattern when possible.
Does tracing affect my model costs?
Tracing itself doesn’t directly add model tokens, but the way you build your system might. For example, if you log or store huge payloads, you might
do additional processing. The primary model cost driver remains token usage; the primary LangSmith cost driver is trace volume and retention.
What’s the best way to lower costs quickly?
Start with sampling and environment separation. Trace 100% of failures, sample successes, and keep dev/test noise out of production workspaces.
Then reduce extended retention upgrades by defining clear upgrade rules.
Can I rely on this calculator for accounting?
It’s a planning tool. For accounting, rely on invoices and your billing dashboard. Use the Advanced section to match the calculator’s rates to
what you observe in your own billing.
Glossary
A short glossary helps align vocabulary across engineering, product, and finance stakeholders.
| Term | Meaning in plain language | Why it matters for cost |
| --- | --- | --- |
| Trace | A recorded run (often a tree) of your app or agent execution. | Trace volume drives base overage and storage. |
| Base retention | Default “short window” trace storage for debugging. | Cheaper than extended; most traces should stay here. |
| Extended retention | Longer storage window (useful for audits, regression, long-cycle debugging). | Usually more expensive; upgrade only high-signal traces. |
| Upgrade | Moving a trace from base retention to extended retention. | A major lever: reducing upgrades can lower bills fast. |
| Deployment run | One full invocation of a deployed agent. | Billed per run; human-in-the-loop resumes can increase counts. |
| Deployment uptime | Minutes the deployment is kept active/available. | Can be a baseline infra cost for hosted deployments. |
Shareable takeaway: If you want predictable bills, track three numbers monthly:
total traces, upgraded traces, and deployment runs. Everything else is a multiplier.
Disclaimer & maintenance notes
This page is an educational calculator. Actual billing depends on your LangSmith account settings, plan details, and contract terms.
If pricing changes, update the values in the “Advanced pricing settings” section (seat price, included traces, overage rates, upgrade rates,
deployment run price, and uptime rates).
Recommended maintenance:
- Update the “Last updated” date when you change assumptions.
- Keep a short changelog for transparency (what changed, why).
- Cross-check one real invoice monthly until you trust your model.