1) What is a LangSmith trace?
A trace in LangSmith represents a single “operation” end-to-end: most commonly a user request, a background job, an evaluation run, or a scheduled agent workflow. In practical terms, a trace is the unit you open in the UI when you want to answer: “What happened, in what order, and why did we get this output?”
LangSmith traces are built from smaller pieces called runs (also described as spans). If you have experience with OpenTelemetry, the mapping is intuitive: a trace is a collection of spans, where each span is one step in the overall operation.
One-sentence definition: A trace is a collection of runs/spans for a single operation, bound together by a shared trace ID, so you can inspect an operation as a coherent timeline.
What counts as “one operation”?
In real systems, “one operation” depends on your product. A chat assistant might define one trace as one user message and the system’s response. A document processing pipeline might define one trace as “ingest one PDF and produce one summary + embeddings + metadata.” A multi-agent planner might define one trace as “complete one task,” even if it takes dozens of sub-steps and tool calls.
The point isn’t to force a single perfect definition—it’s to choose a trace boundary that makes debugging and evaluation meaningful. Too small and you lose context (you can’t see the whole chain of decisions). Too large and your traces become noisy and expensive, and you’ll spend time hunting for the “interesting” parts.
What you see in the UI when you open a trace
The LangSmith UI typically presents a trace as:
- A timeline/tree of runs (root run → children)
- Inputs/outputs at each step
- Timing information and (often) token/cost metadata when available
- Tags, metadata, errors/exceptions, and attachments such as retrieved documents
- Links to annotations, eval results, or monitoring dashboards depending on your setup
That structure is why traces are so useful: they convert “the model did something weird” into inspectable facts you can act on.
2) Runs and spans: how traces are structured
The building blocks of a trace are runs (a.k.a. spans). Each run captures one step in the execution of your application. A run has a start time, an end time, inputs, outputs, and optional metadata. Runs can be nested, which is how LangSmith represents complex agent behavior.
Root run (the “operation”)
A trace usually has a root run that represents the top-level operation: “handle user message,” “answer question,” “run evaluation,” etc. The root run’s children represent the steps that happened inside.
If you make just one LLM call and return the result, you might see a short trace: root run → LLM run → output parser run.
Child runs (the “steps”)
Child runs are the nested steps: LLM calls, tool calls, retrieval operations, formatting, re-ranking, routing, validators, and more. Nesting is the key to making the execution understandable.
When an agent loops, you’ll often see repeated patterns: plan → tool → observe → plan → tool → observe.
Trace IDs and run IDs
Runs are bound to a trace by a trace ID. That trace ID is how LangSmith knows “these spans belong together.” Within the trace, each run also has its own identifier so you can query, link, and reference specific spans.
This matters for production systems because you often want to connect LangSmith traces to external systems: your request IDs, user session IDs, error tracking events, A/B test cohorts, or incident tickets. The best practice is to store those as metadata/tags and also keep a stable external request ID that you can search for.
Why the run (span) data format matters
LangSmith stores trace data in a structured format designed to be easy to export and import. Understanding the shape of the data helps when you want to:
- Build internal dashboards beyond the UI
- Archive traces for compliance or long-term analytics
- Write scripts to triage failed runs
- Compare runs across versions of prompts or tools
If you ever feel like the UI is “just a viewer,” remember: traces are data, and you can treat them like a dataset.
3) Why tracing matters for LLM apps (debugging, evals, monitoring)
LLM apps fail differently than classic software. Instead of a stack trace and a deterministic exception, you get hallucinations, tool misuse, retrieval misses, instruction drift, formatting errors, latency spikes, cost blow-ups, and “it works on my prompt but not on theirs” behavior.
Tracing turns “mystery behavior” into inspectable evidence
- Debugging: see exact prompt, tool parameters, retrieved docs, parsing steps, and errors.
- Quality: connect outputs to eval scores, labels, and human feedback.
- Ops: monitor latency, failure rates, and high-cost traces; set alerts when patterns change.
- Iteration: compare traces across prompt versions or model swaps to identify what improved.
Tracing is not just for engineers
Strong organizations turn traces into a shared artifact:
- Product reviews: “Show me 10 traces where the agent failed on refunds.”
- Support triage: “Here’s the trace for customer ticket #18493.”
- Safety audits: “Here are traces that triggered a policy rule.”
- Model governance: “What changed when we upgraded to a new model?”
A good trace is a story: the context the system saw, the decisions it made, and the outcome it produced.
4) How to send traces to LangSmith
LangSmith supports multiple ways to capture traces, depending on your stack:
A) Native LangChain tracing
If you’re using LangChain Runnables, you can enable tracing via environment configuration and/or runtime config. This is usually the fastest path: you write normal LangChain code and LangSmith records the run tree.
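As a sketch, the environment-based setup usually looks like the following. The exact variable names have changed across SDK versions (older setups used `LANGCHAIN_TRACING_V2` and `LANGCHAIN_API_KEY`), so verify against your SDK’s documentation:

```shell
# Enable LangSmith tracing for LangChain code via environment variables.
# Variable names differ by SDK version; check the current docs.
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="<your-api-key>"   # from your LangSmith settings
export LANGSMITH_PROJECT="myapp-dev"        # project to file traces under
```

With these set, ordinary LangChain code produces run trees in the named project without code changes.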
B) LangGraph tracing
If you’re building multi-step agent graphs with LangGraph, you can trace graph execution so each node and tool call becomes a run inside the trace.
C) REST API / OpenTelemetry
If you’re not in the LangChain ecosystem—or you want deeper control—you can send traces via the LangSmith REST API or instrument with OpenTelemetry and forward traces to LangSmith.
Tracing setup: the core idea
Regardless of the integration method, tracing is a pipeline:
- Create a trace context (trace ID, root run)
- Create nested runs/spans for steps (LLM, tool, retrieval, parse)
- Attach inputs/outputs/metadata
- Send the run tree to LangSmith (ideally asynchronously)
- View/query/export in UI or via SDK/API
Performance note (especially for API-based tracing)
When you send traces yourself (e.g., via REST), avoid blocking your user response on trace uploads. Synchronous trace posting can add latency and reduce reliability if the observability backend is slow or temporarily unavailable. The usual production pattern is: enqueue trace events → send in background → retry with backoff → drop if necessary.
Production principle: Observability must never be your bottleneck. Trace aggressively, but ship reliably.
Minimal mental model for “traceable” work
If you’re instrumenting custom functions, think in spans:
- Wrap each meaningful unit of work in a run/span.
- Use nesting to preserve causality: “this tool call happened because of this plan step.”
- Store the things you will need later: prompt, tool args, retrieved docs, errors, versions.
- Keep sensitive fields out or redacted (more on this later).
5) Projects: organizing traces so they stay usable
Projects are how you keep trace data from turning into a junk drawer. A good project structure makes it obvious: which environment produced the trace, which application produced it, and which experiment or version it belongs to.
Common project structures that work
By environment
Use separate projects for dev, staging, and prod. Production traces are precious for incident response and quality monitoring. Dev traces are noisy and often contain sensitive test content.
- proj: myapp-dev
- proj: myapp-staging
- proj: myapp-prod
By product or agent
If you run multiple agents, separate them. Otherwise your monitoring signals become ambiguous and your debugging time increases.
- proj: support-agent
- proj: sales-agent
- proj: doc-summarizer
Naming runs for readability
Within a trace, run names act like headings. Good names make the trace understandable at a glance. For example: “Retrieve policies,” “Call refund tool,” “Generate final reply,” “Validate JSON schema,” “Safety filter,” etc.
Naming is not cosmetic: it’s the difference between a trace that your team can read in 30 seconds and a trace that requires tribal knowledge and guessing.
6) What a trace typically contains (and what you should add)
A trace is only as useful as the data captured inside. At minimum, you want inputs and outputs for each run. In practice, high-quality traces contain additional fields that make analysis and debugging faster.
Core fields you’ll usually see
- Inputs: prompt, system instruction, tool arguments, retrieved docs, structured request.
- Outputs: model response, tool results, parsed structured output, final message.
- Timing: start/end timestamps and duration per run.
- Status: success/failure, error messages, exception type/stack where applicable.
- Trace/run IDs: stable identifiers for linking and querying.
High-leverage extras you should consider adding
Versioning metadata
Store prompt version, model version, tool version, and app build SHA. This is essential for regression analysis.
metadata: { prompt_version: "v12", model: "gpt-4.1", build: "a13f9c2" }
User & session context
Store a hashed user ID or session ID, plus important flags (locale, plan tier, channel). Avoid raw PII unless you have explicit permission and strong controls.
tags: ["locale:en-US","channel:web","tier:pro"]
Retrieval diagnostics
When doing RAG, log the query, top-k docs, doc IDs, scores, and any re-ranking results. Most RAG failures are “retrieved the wrong thing,” not “the model is dumb.”
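One way to make those diagnostics consistent is to build them with a small helper and attach the result as run metadata. A sketch, where the payload field names are illustrative rather than any LangSmith schema:

```python
def retrieval_diagnostics(query, docs, scores, k=5):
    """Build a metadata payload for a retrieval run.

    `docs` are (doc_id, text) pairs and `scores` their similarity
    scores. Field names are illustrative, not a fixed schema.
    """
    ranked = sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)[:k]
    return {
        "query": query,
        "top_k": k,
        "doc_ids": [doc_id for (doc_id, _), _ in ranked],
        "scores": [round(s, 4) for _, s in ranked],
    }

diag = retrieval_diagnostics(
    "refund policy for pro tier",
    docs=[("doc-7", "..."), ("doc-2", "..."), ("doc-9", "...")],
    scores=[0.81, 0.94, 0.40],
    k=2,
)
# Highest-scoring docs come first, so a wrong retrieval is visible at a glance.
```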
Token/cost metadata (when available)
For budgeting and performance work, token counts and per-call cost fields can be extremely useful. If your stack provides token usage in responses, capture it on the corresponding run. If you don’t have this data, you’ll end up guessing which parts of your agent are expensive.
Redaction and safety
Traces can contain sensitive user content. Before you trace in production, decide what you will store and what you will redact. Many teams use:
- Field-level redaction: remove or mask PII in inputs/outputs.
- Selective tracing: only trace a sample of requests, or only trace error cases.
- Separate projects: isolate sensitive traces and restrict access.
- Short retention: keep most traces in base retention and only upgrade “golden” cases.
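Field-level redaction can be as simple as masking known PII patterns before a payload leaves your process. A minimal sketch; the two regexes below are illustrative and nowhere near exhaustive, so a real policy needs review (names, addresses, account numbers, and so on):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Mask obvious PII patterns. Illustrative, not exhaustive."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

def redact_run_payload(payload: dict) -> dict:
    """Apply redaction to every string field of a run's inputs/outputs."""
    return {k: redact(v) if isinstance(v, str) else v
            for k, v in payload.items()}

safe = redact_run_payload({"prompt": "Refund order for jane@example.com"})
```

Running redaction at the point where you assemble run payloads (rather than in the handler) keeps the policy in one place.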
7) Pricing fundamentals: base traces and included monthly amounts
If you’re reading this page, you likely searched “LangSmith traces” because you want to understand what you’re being billed for. The key is to separate:
- What is counted: traces (collections of runs), with pricing expressed in “base traces”
- What is included: a monthly included amount per plan
- What changes cost: volume beyond included + retention upgrades
Included base traces by plan (self-serve)
| Plan | Included base traces / month | After included amount | Notes |
|---|---|---|---|
| Developer | Up to 5,000 base traces / month | Pay-as-you-go beyond included | Billing setup removes the 5k rate limit; overages are charged at the pricing-page rate. |
| Plus | Up to 10,000 base traces / month | Pay-as-you-go beyond included | Team orgs include 10k traces per month before overage rates apply. |
| Enterprise | Custom | Custom | Contracts may include different bundled usage and retention options. |
Overage rate (base traces)
Official pricing describes base trace overage as $0.50 per 1,000 base traces. A practical way to think about it is: once you exceed included traces, each additional trace has a small per-trace fee—then retention can multiply it.
Budgeting shortcut: Estimate monthly traces (N). If N exceeds included, overage cost is roughly (N − included) / 1000 × $0.50 for base traces, before considering retention upgrades.
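That shortcut, as a tiny function. The rate and included amounts mirror the figures quoted in this section; confirm them against the current pricing page before budgeting:

```python
def base_trace_overage(monthly_traces: int, included: int,
                       rate_per_1k: float = 0.50) -> float:
    """Overage cost in dollars for base traces, before retention
    upgrades. Rates here mirror the text; verify on the pricing page."""
    extra = max(0, monthly_traces - included)
    return extra / 1000 * rate_per_1k

# Example: 40,000 traces/month on a Plus plan (10k included)
# leaves 30,000 overage traces.
cost = base_trace_overage(40_000, included=10_000)
```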
What counts as “one trace” for billing?
In LangSmith, pricing is stated in terms of traces rather than runs/spans. A trace corresponds to one top-level operation (one trace ID). That means an agent trace might contain many runs (LLM + tools + retrieval), but it still counts as one trace at the “base trace” level.
That’s good news: a richly instrumented trace doesn’t automatically mean you pay per span. However, richly instrumented traces may encourage you to trace more operations, and high throughput systems can generate many traces quickly—so the biggest cost driver is typically request volume and retention, not how many spans you have within each trace.
How billing limits affect tracing behavior
The billing docs describe a rate limit on personal organizations (5k traces/month) until a card is added, and that team organizations have an initial 10k traces/month included. This means if you want to run high-volume load tests or production traffic, you should plan billing setup early—otherwise you might hit trace limits mid-experiment.
8) Retention: base (14 days) vs extended (400 days)
Retention is the second major axis of trace cost and governance. Official support documentation describes two fixed retention periods:
- Base retention: 14 days
- Extended retention: 400 days
Why retention exists (and why it’s not configurable)
Retention affects storage and indexing cost. Many teams want “keep everything forever,” but that’s rarely necessary. In fact, if you keep everything, you’ll stop looking at the data because the signal-to-noise ratio collapses.
Today, retention is offered as two fixed tiers rather than arbitrary configuration. If you need custom retention, support documentation suggests workarounds like programmatic deletion after your desired period, implemented as an automated job.
Automatic upgrades to extended retention
Retention is also influenced by your automation setup. If an automation rule matches any run within a trace, the trace can be auto-upgraded to extended retention. This is intentional: teams often want to preserve exactly the traces that match certain criteria (errors, high latency, low eval scores, certain tags, specific customers, etc.).
Important: Automations are powerful, but they can also increase extended retention usage. Treat rules like a cost lever: define them precisely and audit them periodically.
How to choose what gets extended retention
A sustainable retention strategy usually looks like this:
- Default: base retention for the majority of traffic.
- Upgrade: extended retention for high-value traces (failures, edge cases, golden datasets, audits).
- Delete: programmatically remove traces that must not be stored longer than policy allows.
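The default/upgrade/delete strategy above amounts to a small decision function you can run over trace summaries. A sketch; the field names and thresholds are illustrative policy knobs, not LangSmith attributes:

```python
def retention_decision(trace: dict) -> str:
    """Return 'delete', 'extended', or 'base' for a trace summary.
    Field names and thresholds are illustrative policy knobs."""
    if trace.get("contains_restricted_data"):
        return "delete"      # policy: must not outlive the base window
    if trace.get("error") or trace.get("eval_score", 1.0) < 0.5:
        return "extended"    # preserve failures and low-quality cases
    if trace.get("golden"):
        return "extended"    # curated examples for future evals
    return "base"            # the bulk of traffic stays cheap
```

Note the ordering: the delete rule wins even for failures, which is usually what a compliance policy requires.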
Retention and compliance
Retention decisions are not just cost decisions. They affect privacy and legal posture. If your application processes personal data, you should:
- Redact sensitive fields before tracing.
- Restrict access to sensitive projects/workspaces.
- Document the retention tier used per project.
- Implement deletion workflows if required by policy.
Retention and debugging velocity
Short retention can feel risky: “What if we need an old trace?” The fix is not “store everything for 400 days.” The fix is to identify which traces matter and preserve those. When you do this well, you end up with:
- A compact set of “golden” traces you can revisit
- A clear set of failure examples you can evaluate against
- Lower noise when you investigate current incidents
9) Querying and exporting traces (SDK + API patterns)
Once you have traces, the next step is: find the right ones. The recommended way to query the span data is to query runs (runs are the span objects inside traces). In other words: you filter runs by project, time range, tags, status, name, metadata fields, or trace ID—and then you can reconstruct or inspect the trace context.
Why “query runs” instead of “query traces”?
Runs are the atomic units that carry the details you filter on: run name, errors, model type, tool name, tags, and metadata. Traces are bundles. When you search “tool failed,” you’re really searching for runs where a specific tool run errored, and then you want the trace that contains it.
Common query patterns
Incident response
- Filter: last 30 minutes
- Filter: error status
- Group: by run name (e.g., “Call payment tool”)
- Open: top failing traces
Quality regression
- Filter: model version changed
- Filter: eval score below threshold
- Compare: traces before vs after deployment
- Extract: golden set of failures for fixes
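In the SDK, filtering like this is typically done with the client’s run-query methods (e.g. `Client.list_runs` in the Python SDK). To keep the example self-contained, here is the incident-triage logic sketched over plain run dicts, with illustrative field names:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def triage(runs, window_minutes=30):
    """Group recent errored runs by run name, most frequent first."""
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)
    errored = [r for r in runs
               if r["error"] and r["start_time"] >= cutoff]
    return Counter(r["name"] for r in errored).most_common()

now = datetime.now(timezone.utc)
runs = [
    {"name": "Call payment tool", "error": True, "start_time": now},
    {"name": "Call payment tool", "error": True, "start_time": now},
    {"name": "Generate final reply", "error": False, "start_time": now},
    {"name": "Retrieve policies", "error": True,
     "start_time": now - timedelta(hours=2)},   # outside the window
]
top = triage(runs)   # [("Call payment tool", 2)]
```

The output tells you which run name to open first; from any of those runs you pivot to the containing trace for full context.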
Exporting traces (why and when)
Exporting becomes important when you want to:
- Build internal BI dashboards from trace data
- Archive high-value traces into a long-term dataset
- Run offline analyses at scale
- Create reproducible evaluation corpora
A practical approach is to schedule a daily export for specific projects, then analyze in your data warehouse. If you do this, make sure you also export versioning metadata so you can compare changes over time.
Accessing the “current span” in custom code
Advanced tracing often requires injecting metadata into the currently active run. LangSmith provides helper functions in the SDKs to access the current run tree, which enables advanced workflows like tagging runs with request IDs, attaching additional artifacts, or dynamically changing trace naming.
Best practice: attach the external request ID and build SHA to the root run early, then inherit or reuse it on child runs as needed for consistent querying.
10) Production tracing patterns (sampling, async, privacy, and “don’t DDoS yourself”)
Production tracing has one rule: don’t break your product to record observability data. Everything else is implementation detail. Below are patterns that keep tracing useful and safe.
Pattern A: asynchronous ingestion
Send traces in the background. A clean architecture is:
- Capture runs/spans in memory during request handling
- Serialize to a queue (memory buffer, Redis, Kafka, etc.)
- Worker flushes to LangSmith with retries/backoff
- On failure, drop non-critical traces or sample down
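The queue-and-worker architecture above can be sketched in-process with the stdlib. A real pipeline would batch events, use a durable queue, and have `send` call the LangSmith ingestion API; here `send` is just a parameter:

```python
import queue
import threading
import time

trace_queue: "queue.Queue" = queue.Queue(maxsize=10_000)

def enqueue_trace(event: dict) -> bool:
    """Non-blocking: if the buffer is full, drop rather than stall."""
    try:
        trace_queue.put_nowait(event)
        return True
    except queue.Full:
        return False             # dropped under pressure

def flush_worker(send, max_retries=3):
    """Background worker: retry with exponential backoff, then give up."""
    while True:
        event = trace_queue.get()
        if event is None:        # sentinel: shut down
            return
        for attempt in range(max_retries):
            try:
                send(event)      # e.g. POST to the ingestion endpoint
                break
            except Exception:
                time.sleep(0.1 * 2 ** attempt)   # 0.1s, 0.2s, 0.4s
        trace_queue.task_done()

sent = []
worker = threading.Thread(target=flush_worker, args=(sent.append,),
                          daemon=True)
worker.start()
enqueue_trace({"trace_id": "t-1", "runs": []})
trace_queue.put(None)            # stop the worker
worker.join()
```

The key property: `enqueue_trace` never blocks the request path, so a slow or unavailable observability backend degrades tracing, not your product.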
Pattern B: sampling (trace less, learn more)
Sampling is not a compromise—it’s a strategy. Many systems use:
- Baseline sample: 1% of all requests
- Error sample: 100% of error requests
- Latency sample: 100% of slow requests
- Customer sample: 100% of whitelisted accounts during onboarding
This preserves visibility where it matters while keeping trace volume manageable and cost predictable.
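A sampling policy like the one above is often implemented as a deterministic hash of the request ID, so every service makes the same decision for the same request. A sketch with illustrative names and rates:

```python
import hashlib

def should_trace(request_id: str, *, error: bool, latency_ms: float,
                 vip_account: bool, baseline_rate: float = 0.01,
                 slow_threshold_ms: float = 5_000) -> bool:
    """Trace 100% of errors, slow requests, and flagged accounts;
    otherwise apply a deterministic ~1% baseline keyed on request ID."""
    if error or vip_account or latency_ms >= slow_threshold_ms:
        return True
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return bucket < baseline_rate

# Hash-based bucketing means the same request ID always gets the same
# decision, even when several services decide independently.
```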
Pattern C: “upgrade only the good stuff” retention strategy
Use base retention for the bulk of traffic. Then upgrade:
- Traces with low eval scores
- Traces with high user impact (refunds, payment flows, safety issues)
- Traces associated with escalated tickets
- Traces selected for golden datasets
Pattern D: privacy-first tracing
Privacy-first tracing is a combination of:
- Redaction: remove PII before sending to LangSmith
- Least privilege: limit who can view sensitive projects
- Isolation: separate projects/workspaces for sensitive workflows
- Deletion: implement trace deletion if required by policy
Pattern E: version every change that affects outputs
Many trace investigations fail because teams can’t answer: “Which prompt/model/tool version produced this output?” Always log versions:
- Prompt version or commit hash
- Model name and provider
- Tool version and schema version
- App build SHA / container image tag
11) Best practices checklist (copy/paste into your runbook)
Trace design
- Define “one operation” clearly (one trace = one user request, job, or task).
- Make a root run with a descriptive name.
- Use nested runs for steps; keep naming consistent.
- Attach external request/session IDs early.
Data & governance
- Redact PII and secrets; avoid storing raw sensitive content by default.
- Store versions: prompt, model, tools, build SHA.
- Separate projects by env (dev/staging/prod).
- Restrict access to sensitive traces; audit permissions.
Cost control
- Estimate monthly traces; compare to included amounts (5k Developer / 10k Plus).
- Use sampling; trace 100% of errors, not 100% of everything.
- Keep most traces at base retention; upgrade only valuable cases.
- Audit automation rules that upgrade retention.
Reliability
- Send traces asynchronously; never block user latency on tracing.
- Retry with backoff; drop non-critical traces if under pressure.
- Implement rate limiting on trace upload pipeline.
- Monitor tracing failures separately from app failures.
Rule of thumb: Trace enough to explain failures, measure quality, and monitor drift—then use sampling + retention to keep the signal strong and the cost predictable.
12) FAQ: LangSmith traces
Does LangSmith charge per run/span or per trace?
How many traces are included on Developer and Plus?
What’s the safest way to trace in production?
Why are some traces stored much longer than others?
Can I set a custom retention period like 30 or 90 days?
How do I query traces programmatically?
13) Official references (verify current behavior here)
Pricing, billing, and retention can evolve. Use these official pages as the source of truth:
- LangSmith Plans & Pricing
- Manage billing in your account
- Observability concepts (Traces/Runs)
- Automation rules (retention upgrades)
- Retention periods (support)
- Query/export traces (runs/query)
- Trace with API (REST)
- Run (span) data format
Educational note: This page is an independent guide. Confirm pricing/limits/retention rules on the official pages above.