1) What is “retention” in LangSmith?
In LangSmith, retention is the length of time your trace data remains available for viewing, querying, and analysis. Traces are the core artifact of observability: they contain the sequence of runs/spans that represent one “operation” (a user request, an agent task, an evaluation, a pipeline job). Retention defines whether you can still open that trace in the UI two days later, two weeks later, or a year later.
Retention matters because traces contain the evidence behind your product decisions. When an agent fails, the trace shows the prompt, retrieved docs, tool calls, and intermediate outputs. When you run evaluations, traces show exactly how the system behaved and why it earned a particular score. When you respond to customer tickets, the trace can explain what happened for that user at that moment in time.
Retention is a tradeoff: longer retention improves auditability and long-term analysis, but increases storage cost and privacy/compliance obligations. Shorter retention is cheaper and safer, but can remove the data you need for debugging and quality improvements.
Retention is not the same as sampling
Sampling and retention solve different problems. Sampling decides how many traces you store (e.g., store 1% of traffic, store 100% of errors). Retention decides how long you keep whatever you stored. A strong strategy uses both: sample aggressively to reduce volume, then keep only the most valuable subset for a longer period.
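The two decisions can be sketched as a single ingestion policy. This is an illustrative sketch, not a LangSmith API: the tier names match the document, but the function, thresholds, and the `coin` parameter (a uniform [0, 1) draw such as `random.random()`) are assumptions.

```python
def ingest_policy(is_error: bool, coin: float) -> tuple[bool, str]:
    """Return (store?, retention_tier).

    Sampling decides *whether* the trace is stored at all;
    retention decides *how long* a stored trace lives.
    """
    if is_error:
        return True, "extended"   # keep 100% of errors, long-lived
    if coin < 0.01:
        return True, "base"       # ~1% sample of normal traffic, short-lived
    return False, "none"          # everything else is never stored
```

Passing the random draw in as an argument keeps the policy deterministic and easy to unit-test.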
Retention is not the same as deletion
Retention sets a policy window; deletion is an action. Sometimes you’ll delete data earlier than retention for privacy, contractual, or operational reasons. Sometimes you’ll preserve a specific trace longer by upgrading it to a longer retention tier. In other words: retention gives you a default life cycle, and deletion/upgrade lets you override it in special cases.
2) Retention tiers: base vs extended
LangSmith retention is organized into two main tiers:
- Base retention: short-lived traces kept for 14 days
- Extended retention: long-lived traces kept for 400 days
These two fixed retention periods are documented in LangChain support and billing materials. (See references below.)
Base retention (14 days)
Base retention is designed for high-volume, short-term usage: daily debugging, recent incidents, quick checks, and ad-hoc analysis. For most production teams, base retention is the default for the majority of traces because most traces are never revisited after a week or two.
- Best for: day-to-day debugging, short incident windows, rapid iteration
- Risks: you may lose older evidence if you didn’t save it explicitly
- Strategy: keep base for most traffic; selectively preserve “important” traces
Extended retention (400 days)
Extended retention is designed for traces that remain valuable over time: golden datasets, audit trails, long-running regressions, and recurring issue investigations. Extended retention is also important when features (like feedback and certain evaluation workflows) require trace data to remain available for longer than 14 days.
- Best for: audits, long-term analysis, “golden” examples, repeatable eval corpora
- Risks: greater privacy/compliance exposure; higher cost
- Strategy: limit extended to cases that justify the long-term value
Why fixed retention tiers exist
Many teams want custom retention (30 days, 90 days, 180 days). LangSmith’s hosted platform uses fixed tiers primarily to simplify cost predictability and operational guarantees. Rather than managing infinite combinations of TTL windows, it offers two clear choices with known economics and makes workarounds available when policy requires something else.
Self-hosted retention can be different
If you are self-hosting LangSmith, you may have additional controls. LangSmith’s self-host TTL documentation describes enabling data retention and configuring system-wide retention periods for short-lived and long-lived traces, then managing default tiers at the org/project level. This is separate from the hosted, fixed-tier experience and is worth reading if you operate LangSmith in your own infrastructure.
3) How retention affects pricing and billing
Retention isn’t only “how long data lives.” In LangSmith pricing, retention tier also affects the price of trace storage. The official pricing page explains that:
- Base traces have 14-day retention and are priced per 1,000 traces.
- Extended traces have 400-day retention and are priced higher per 1,000 traces.
- You can upgrade base traces to extended traces at an upgrade price per 1,000 traces.
| Tier | Retention period | How it’s priced (conceptually) | Typical use |
|---|---|---|---|
| Base | 14 days | Low cost / high volume short-term traces | Default tracing for debugging and monitoring |
| Extended | 400 days | Higher cost / preserve for long-term value | Audits, golden sets, recurring regressions, feedback/evals |
| Upgrade | Base → Extended | Additional cost to extend life from 14 days to 400 days | Promote only the traces you truly need long-term |
The most important billing principle: upgrades can be automatic
Many teams assume: “We set base retention, so all traces are cheap and short-lived.” But LangSmith documentation explicitly warns that certain features can cause automatic upgrades from base to extended retention. When that happens, you pay extended-tier pricing for those traces, and they remain stored longer.
Practical takeaway: Retention is both a policy and a cost lever. When you enable features like automation rules or online evaluations, you should assume retention upgrades can occur and audit them like you would any other spend driver.
Deleting traces does not typically “undo” billed usage
It’s tempting to think: “If my extended retention usage is high, I’ll just delete traces and the bill will drop.” In many metered systems, usage is counted at ingestion time (when traces are recorded), and deletion later does not retroactively change what was counted for that period. Community discussions and guidance around LangSmith indicate that deleting traces doesn’t affect what has already been counted for billing/limits in the period.
How to forecast retention cost without perfect information
Forecasting becomes easier when you separate three numbers:
- Total traces per month (your request volume, batch jobs, eval runs)
- Fraction upgraded to extended (via defaults, rules, feedback, evaluators)
- Price per 1,000 for base + extended and/or upgrade
Even a rough estimate is better than guessing. Most surprises come from auto-upgrades: you think only 1% should be long-lived, but a rule accidentally upgrades 30% because it matches too broadly.
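Those three numbers combine into a back-of-the-envelope forecast. The formula below assumes every trace pays the base rate at ingestion and the upgraded fraction additionally pays the upgrade rate; the prices in the usage example are placeholders, so substitute the current figures from the official pricing page.

```python
def forecast_monthly_cost(
    traces_per_month: int,
    extended_fraction: float,
    base_price_per_k: float,
    upgrade_price_per_k: float,
) -> float:
    """Rough monthly trace-storage estimate (prices are per 1,000 traces)."""
    base_cost = traces_per_month / 1000 * base_price_per_k
    upgrade_cost = traces_per_month * extended_fraction / 1000 * upgrade_price_per_k
    return base_cost + upgrade_cost

# 5M traces/month, 2% upgraded, illustrative (not official) prices:
estimate = forecast_monthly_cost(5_000_000, 0.02, 0.50, 4.50)
```

Re-run the estimate whenever you change `extended_fraction` assumptions; that is the term auto-upgrades silently inflate.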
4) Setting a default retention tier (org and project defaults)
LangSmith’s billing documentation explains that there are two trace tiers — base and extended — and that you can set a default data retention tier in your account settings. The intention is that when new traces are registered, they receive that default tier unless your workflow upgrades them later.
Default retention is a governance decision
Choosing the default tier is like choosing the default logging level in a production system. If you default to extended, you increase long-term availability but you also increase cost and compliance burden. If you default to base, you keep cost and privacy exposure low, but you must create a process for preserving important traces.
Default to base when…
- Your app has high traffic and you primarily debug recent issues.
- You have strong sampling and “promote to extended” workflows.
- You handle sensitive data and want to minimize long-lived storage.
- You’re still iterating and your criteria for “important traces” are not yet stable.
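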
Default to extended when…
- You must keep traces for auditability (e.g., regulated workflows).
- Your volume is low enough that long retention is affordable.
- You rely heavily on feedback/evaluation workflows that expect trace longevity.
- You have strong access controls and governance around viewing stored data.
Project-level strategies
Even if you set an org-wide default, you can still structure projects to enforce different retention behaviors:
- Production (base default): keep volume manageable, promote only valuable incidents.
- Evaluation lab (extended default): preserve datasets and results for long-term comparisons.
- Customer escalation project (extended via rules): preserve traces tied to critical tickets.
- Privacy-sensitive project (base + aggressive deletion): keep data short-lived and restricted.
If you’re self-hosting, the TTL documentation describes managing organization/project default TTL tiers and configuring system-wide retention periods for short-lived and long-lived traces. That can be powerful for organizations with strict internal policy requirements.
5) What triggers automatic upgrades to extended retention?
The most important thing to understand about LangSmith retention is that traces can be upgraded to extended retention automatically when you use certain features. LangSmith documentation for automation rules explicitly states: If an automation rule matches any run within a trace, the trace will be auto-upgraded to extended data retention.
Key rule: When a trigger matches any run in the trace, the whole trace is upgraded. That means a single child span can promote a large trace to extended retention.
5.1 Automation rules
Automation rules are a common and powerful upgrade mechanism. They let you define criteria such as:
- Run errors or exceptions
- Latency thresholds (slow traces)
- Specific tags or metadata fields
- Model name, tool name, or run type
- Specific customers, environments, or products
If a rule is written too broadly, it can unintentionally upgrade far more traces than you intended. For example: “upgrade anything with tag ‘prod’” would likely upgrade the majority of your production traffic if you tag everything with “prod.” The correct pattern is to make rules selective, like “upgrade if error=true” or “upgrade if eval_score < X” or “upgrade if user_tier=enterprise AND outcome=escalated.”
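Automation rules are configured in LangSmith itself, but the same selectivity can be expressed as a predicate to sanity-check before you tag runs for upgrade. The field names (`error`, `latency_ms`, tag strings) are illustrative assumptions, not the LangSmith rule schema.

```python
def should_upgrade(run: dict) -> bool:
    """Selective promotion: errors, very slow runs, or escalated enterprise users."""
    if run.get("error"):
        return True
    if run.get("latency_ms", 0) > 30_000:   # slower than 30s
        return True
    tags = set(run.get("tags", []))
    return "user_tier:enterprise" in tags and "outcome:escalated" in tags
```

Note what is absent: no broad match like `"prod" in tags`, which would promote nearly all production traffic.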
5.2 Online evaluations / LLM-as-a-judge evaluators
LangSmith documentation for online evaluators notes that when an online evaluator runs on any run within a trace, the trace will be auto-upgraded to extended data retention. The intent is sensible: online evaluations usually generate high-value traces that you’ll want to keep for investigation, audits, or longitudinal quality tracking.
However, this means you should be deliberate about where you run online evaluators. If you run an evaluator on all production traffic, you may be upgrading a huge fraction of traces. Many teams run online evaluation on:
- A small, representative sample of production traffic
- Only error traces or high-risk categories
- Only specific cohorts (new feature flag users, new customers, new model versions)
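A gating function for where evaluators run can combine those three approaches. This is a sketch under assumptions: the dict fields and the `new-model-rollout` flag are hypothetical, and hashing the trace ID makes the sample deterministic per trace rather than per call.

```python
import hashlib

def should_evaluate(run: dict, sample_rate: float = 0.05) -> bool:
    """Gate an online evaluator: failures and risky cohorts always, plus a sample."""
    if run.get("error"):
        return True                                   # always judge failures
    if run.get("feature_flag") == "new-model-rollout":
        return True                                   # always judge the risky cohort
    digest = hashlib.sha256(run["trace_id"].encode()).hexdigest()
    return int(digest, 16) % 100 < sample_rate * 100  # stable ~5% sample
```

Because every evaluated trace is auto-upgraded, `sample_rate` here is effectively also your extended-retention rate for healthy traffic.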
5.3 Feedback and annotation workflows
Retention upgrades can also be driven by feedback and evaluation workflows. Community discussion indicates that traces with feedback can be converted to extended retention because feedback scores for evaluations are pulled from traces and would otherwise expire after base retention. This is a strong reason to treat feedback as “promote to long-lived evidence.”
That is typically what you want: if someone took the time to label a trace as “wrong,” “unsafe,” “excellent,” or “needs improvement,” it’s valuable training and evaluation data. It should remain available long enough to act on.
5.4 “Why are my traces being automatically upgraded?”
LangChain support has a dedicated article explaining that LangSmith automatically upgrades certain traces from base retention (14 days) to extended retention (400 days) when they meet specific conditions, and that the goal is to help preserve traces that are most valuable for analysis and investigation. If you see unexpected extended usage, this is the first official reference to consult.
5.5 Summary: the upgrade trigger mental model
A clean mental model is:
- Most traces enter as base (unless your default is extended).
- High-value workflows (rules, evaluators, feedback) can upgrade traces to extended.
- Upgraded traces remain available longer and cost more.
- Therefore: define what “high value” means, and ensure only those traces are upgraded.
6) Manual upgrades, “golden traces,” and what you should preserve
Automatic upgrades are useful, but you also need a deliberate “manual preservation” strategy — especially when your default is base retention. The goal is to define which traces deserve to live for 400 days and why.
The concept of “golden traces”
“Golden traces” are high-quality examples you keep to prevent regressions and to teach your system what good looks like. They usually include:
- High-value user journeys (refund, cancellation, onboarding, security-sensitive flows)
- Representative success cases (so you can validate new models don’t break core behavior)
- Known failure cases (so you can prove fixes actually work)
- Edge cases that recur (long documents, unusual languages, rare tool errors)
- Escalated ticket traces (real customer pain points)
How to preserve golden traces without upgrading everything
The sustainable pattern is:
- Keep most traffic at base retention (14 days).
- Use rules to upgrade only high-value or high-risk traces.
- Use feedback/annotation to preserve traces that humans label.
- Maintain a curated dataset or collection outside LangSmith if you need longer-term archival.
Manual retention decisions in incident response
Incidents are a classic case for manual promotion. When something breaks, you’ll investigate in real time. But the best teams also preserve key evidence so they can:
- Write a postmortem with concrete trace examples
- Build regression tests and evals using those traces
- Validate that the fix remains stable weeks later
- Train new team members using real incidents
If your incident response process doesn’t include “preserve representative traces,” you’ll often lose the most useful evidence once the 14-day window passes.
7) Custom retention: what if you need 30/90/180 days?
In hosted LangSmith, retention periods are not generally configurable beyond the two fixed tiers (14 and 400 days). LangChain support explicitly states that direct configuration of retention periods is not available, and provides workaround options for customers who need custom retention windows.
Workaround A: programmatic deletion after your desired retention period
The most common workaround is to keep traces (base or extended) and then run a scheduled deletion process that removes traces older than your desired policy window. For example:
- Policy requires 30 days → delete traces older than 30 days nightly
- Policy requires 90 days for some tenants, 14 days for others → delete per-tenant using metadata tags
- Policy requires immediate deletion for specific requests → delete those traces by trace ID when triggered
Example approach (pseudo-steps):
1) Query runs in project P older_than=30d AND tenant_id="X"
2) Resolve trace IDs for those runs
3) Call deletion API for trace IDs
4) Log deletion actions for audit
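Steps 1–2 above reduce to a pure selection function like the one below. It operates on plain dicts; in the real job you would build `runs` from the LangSmith client (e.g. a root-runs listing) and then call whatever deletion endpoint your LangSmith version exposes — both of those calls are deliberately left out here because they depend on your deployment and SDK version.

```python
from datetime import datetime, timedelta, timezone

def expired_trace_ids(runs: list[dict], retention_days: int, now: datetime) -> set[str]:
    """Select trace IDs whose root run started before the policy cutoff.

    `runs` is assumed to be a list of dicts with `trace_id` and a
    timezone-aware `start_time`; field names are assumptions.
    """
    cutoff = now - timedelta(days=retention_days)
    return {r["trace_id"] for r in runs if r["start_time"] < cutoff}
```

Keeping selection separate from deletion makes step 4 (audit logging) easy: log the returned set before any destructive call.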
Workaround B: move long-term archives outside LangSmith
Some teams keep LangSmith as the “nearline” debugging system and export important traces to a data warehouse or secure archive for long-term analysis. This is useful when you need retention longer than 400 days or you need custom access controls and analytics workflows. The tradeoff is you must invest in:
- Export pipelines and schema/versioning
- Redaction and privacy compliance
- Tooling to search and analyze archived traces
Multi-tenant custom retention patterns
In multi-tenant SaaS apps, retention may vary by tenant contract. A common pattern is to attach:
- tenant_id (or customer_id)
- retention_policy (e.g., 14, 30, 90, 400)
- data_sensitivity (e.g., standard, restricted, regulated)
Then you can implement deletion jobs that enforce tenant-specific policy. LangChain support also has guidance about deleting traces and runs based on custom retention periods, especially in multi-tenant environments.
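A deletion job can then derive each tenant’s cutoff from the policy metadata. The tenant names and day counts below are illustrative; the real mapping would come from your contract data, and the 14-day default mirrors base retention.

```python
from datetime import datetime, timedelta

# Illustrative per-tenant retention policies, in days.
TENANT_POLICIES = {"acme": 90, "globex": 30}
DEFAULT_POLICY_DAYS = 14  # fall back to the base-retention window

def deletion_cutoff(tenant_id: str, now: datetime) -> datetime:
    """Traces for this tenant that started before the cutoff should be deleted."""
    days = TENANT_POLICIES.get(tenant_id, DEFAULT_POLICY_DAYS)
    return now - timedelta(days=days)
```

Attaching `tenant_id` as trace metadata at creation time is what makes this enforceable later: the job filters runs by tenant, then deletes anything older than `deletion_cutoff(tenant_id, now)`.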
Self-host: TTL configuration can provide true custom windows
If you self-host LangSmith, the TTL documentation indicates you can enable data retention and configure system-wide retention periods for short-lived and long-lived traces. This can allow you to make your “short-lived” period 30 days and your “long-lived” period 365 days, for example — depending on your infra and policy requirements.
Reality check: Deleting traces later doesn’t necessarily reduce usage already counted for the month. Use deletion for compliance and safety, not as a retroactive billing refund mechanism.
8) Privacy, security, and compliance: retention is a data risk multiplier
If you store user prompts, documents, tool outputs, or any personally identifiable information in traces, retention increases your data exposure. Longer retention means:
- More time for accidental access
- More data to manage in audits
- Greater impact if credentials or permissions are misconfigured
- More obligations under privacy rules and contracts
Privacy-first tracing + retention strategy
The safest retention strategy begins before retention is even applied: it begins with what you trace.
1) Redact before storing
Mask secrets, remove raw identifiers, and avoid storing full sensitive documents unless you must. Use hashed IDs for user and session references. The best retention policy can’t protect data that shouldn’t be stored at all.
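A minimal redaction pass might look like the sketch below, run on prompts and outputs before they are attached to a trace. Real redaction needs far broader coverage (names, addresses, API keys, tenant-specific formats); these two patterns are only illustrative.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US Social Security number shape

def redact(text: str) -> str:
    """Mask obvious identifiers before text is stored in a trace."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)
```

The key design point is placement: redaction must run in your application before ingestion, because no retention tier can protect data that was stored in the clear.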
2) Restrict access by workspace/project
Put sensitive traces into projects with limited membership. Use strong admin controls and audit who can access production traces. Many organizations treat trace access as production access.
Compliance-driven retention policies
Some teams must store evidence for audits. Others must delete data quickly. Retention is therefore contextual:
- Regulated flows: you may need extended retention for audit and investigation.
- Privacy-sensitive flows: you may need base retention plus aggressive deletion.
- Enterprise customers: contract may require fixed deletion windows and clear access logs.
“Least privilege” meets retention
A simple principle: the longer you keep data, the fewer people should have access to it. If you keep traces for 400 days, treat them as an evidence store, not as a casual debugging feed.
9) Governance patterns that keep retention sane (and costs predictable)
Most retention problems are governance problems. Teams either keep everything too long (cost + risk) or keep nothing long enough (lost evidence). The best pattern is to encode governance into defaults, rules, and runbooks.
Pattern A: “Base by default, promote on signal”
This is the most common mature approach:
- Default tier = base retention (14 days)
- Auto-upgrade on signal (errors, severe latency, low eval score, escalation tags)
- Manual promotion for incident evidence and golden traces
- Periodic review of which rules are upgrading data
It mirrors how teams handle logs: keep high-volume logs short-lived, keep high-signal logs longer.
Pattern B: “Extended for eval lab, base for production”
Keep production data short-lived for privacy and cost, but keep evaluation datasets long-lived because those are explicitly curated and repeatedly reused.
- Production: base + rule upgrades only for high-signal traces
- Eval lab: extended, because you want long-term comparability
- Golden set: extended + exported for archival
Pattern C: “Extended only when humans touch it”
If your team uses feedback and annotation, this pattern can work extremely well:
- Everything starts as base.
- Any trace that receives feedback, annotation, or is used in a formal eval pipeline becomes extended.
- Everything else expires naturally after 14 days.
This yields an organically curated long-lived dataset: the traces people care about survive.
Pattern D: Multi-tenant retention with deletion enforcement
If tenants have different policies, rely on deletion jobs. Attach tenant policy metadata at trace creation, then run a scheduled enforcement job that deletes according to policy. This is more work, but it is the most precise control when policy requirements vary.
Governance needs a review cadence
Retention rules should be reviewed like security rules:
- Weekly during early production: check for accidental broad upgrades.
- Monthly for stable products: review upgrade rates and new feature flags/evaluators.
- Quarterly for compliance: confirm that policy is met, access controls are correct, deletion jobs work.
10) Retention runbook: checklist you can adopt today
This section is designed to be copy/pasted into an internal runbook. It helps you answer: What should our default retention be, what upgrades should we allow, and how do we prevent surprises?
A) Decide defaults
- Choose org-wide default tier (base vs extended).
- Decide project structure (dev/staging/prod/eval-lab).
- Document which projects may store sensitive data.
- Assign owners for each project (who is accountable).
B) Define upgrade triggers
- List “signals” that justify extended retention (errors, escalations, low scores).
- Implement automation rules narrowly (avoid broad tags like “prod”).
- Decide where online evaluators can run (sampled vs full traffic).
- Decide whether feedback automatically implies preservation.
C) Prevent surprises
- Audit extended upgrade rate weekly at first.
- Track which rule/evaluator caused upgrades (keep a change log).
- Set usage limits/caps if available so spikes don’t blow the budget.
- Train the team: adding feedback or evaluators can increase retention usage.
D) Enforce policy
- Implement redaction before traces are stored.
- Restrict access to sensitive projects; review membership monthly.
- Implement deletion jobs for custom retention windows (if needed).
- Document deletion and retention policies for audits.
One-sentence strategy: Default to base, promote on high-value signal, run evaluators intentionally, and use deletion jobs for custom policy — while keeping access tight and data redacted.
11) FAQ: LangSmith retention
What retention periods does LangSmith offer?
Two fixed tiers in the hosted platform: base (14 days) and extended (400 days).
Can I set retention to 30 days or 90 days?
Not directly in hosted LangSmith. Use scheduled deletion jobs for shorter windows, or self-hosted TTL configuration for true custom periods.
Why are my traces being automatically upgraded to extended retention?
Certain features trigger upgrades: automation rules that match a run, online evaluators running on a run, and feedback/annotation workflows.
Do automation rules upgrade an entire trace or only the matched span?
The entire trace. A match on any run within the trace upgrades the whole trace.
Do online evaluators cause retention upgrades?
Yes. When an online evaluator runs on any run within a trace, the trace is auto-upgraded to extended retention.
Does adding feedback to a trace affect retention?
Typically yes: traces with feedback are preserved at extended retention so the feedback scores remain usable beyond the 14-day base window.
If I delete traces, will my current month usage decrease?
Generally no. Usage is counted at ingestion, so later deletion does not retroactively reduce what was already billed for the period.
What’s the best default retention tier for most teams?
For most high-volume production teams, base — combined with selective promotion of high-value traces to extended.
How should I choose what gets extended retention?
Promote on high-value signal: errors, escalations, low eval scores, human feedback, incident evidence, and golden examples.
12) Official references (source of truth)
Pricing and retention behavior can evolve. Use the official pages below to verify the latest rules and settings:
- Support: Can I configure data retention periods for traces?
- Docs: Set up automation rules (auto-upgrade behavior)
- Docs: Online evaluations (auto-upgrade behavior)
- Support: Why are my traces being automatically upgraded?
- Docs: Manage billing (default retention tier settings)
- Pricing: Base vs extended trace retention and upgrade pricing
- Support: Delete traces and runs for custom retention periods
- Self-host: Enable TTL and data retention
- Docs: Administration overview (auto upgrades, billing considerations)
Educational note: This page is an independent guide. The official docs and pricing pages above are the source of truth for production decisions.