TL;DR
- CNCF Incubating (since 2021), Apache 2.0. Formed in 2019 by merging OpenTracing (tracing API) and OpenCensus (Google's metrics + tracing library); the second most active CNCF project after Kubernetes by contributor count.
- Defines a vendor-neutral API, SDK and wire protocol (OTLP) for four signals: traces, metrics, logs and (in-progress) profiling. Instrument once, export to any backend that speaks OTLP.
- Operational surface is two artefacts: language SDKs (Python, Go, Java, JS/TS, Rust, .NET, Ruby, PHP, C++, Swift) and the Collector — a Go binary with receiver / processor / exporter pipelines for any combination of source and backend.
- OTLP wire protocol uses Protocol Buffers over gRPC (port 4317) or HTTP/Protobuf and HTTP/JSON (port 4318). The Collector ships in two main builds: `otelcol` (core) and `otelcol-contrib`, the latter with 100+ receivers and exporters.
- Yobibyte emits OTel traces and metrics for every inference and fine-tune workload under OpenInference / OpenLLMetry semantic conventions; Yobitel NeoCloud regions run Collector gateways that customers can OTLP-push to without standing up their own collector tier.
Overview#
OpenTelemetry (OTel) is a CNCF Incubating project that standardises how applications produce telemetry — traces, metrics, logs and profiling — and how that telemetry moves from the application to a backend. It is *not* a backend itself: OTel ships the instrumentation API, the language SDKs that implement the API, the OTLP wire protocol that carries telemetry over the network and the Collector that routes telemetry between sources and backends. The storage and UI live downstream in Prometheus, Tempo, Loki, Jaeger, Phoenix, Langfuse, Datadog, Honeycomb, New Relic, Grafana Cloud and every major observability vendor.
Before OTel, every observability vendor shipped its own SDK. Instrumenting a service for one backend locked you in; switching meant rewriting the instrumentation. OpenTracing standardised the tracing API but left the implementation to vendors. OpenCensus standardised the implementation but had a smaller ecosystem. In 2019 the two merged under the CNCF as OpenTelemetry with the explicit goal of making instrumentation a write-once, deploy-anywhere concern. The bet has largely played out: every major observability vendor accepts OTLP, and several now recommend OTel over their proprietary SDKs for new instrumentation.
On AI infrastructure OTel matters because the request flow is inherently multi-component. A single LLM request crosses an API gateway, an orchestrator, a retrieval step (embedding + vector DB lookup), a model-server call (vLLM or TensorRT-LLM), tool invocations and a response post-processor. Without trace context propagation those become disconnected log lines; with OTel they become one trace with parent-child spans, latency attribution and a trace ID that links from a Prometheus exemplar to a Jaeger waterfall to a Phoenix evaluation view.
Yobibyte instruments every inference and fine-tune workload with OTel under the OpenInference / OpenLLMetry semantic conventions and emits OTLP traces, metrics and logs back to the customer's chosen backend. The Yobitel NeoCloud regional Collector gateway accepts customer OTLP pushes — useful for application-side instrumentation that needs to enrich and sample before egress — and the regional control plane itself is OTel-instrumented for Yobitel's internal SRE rota.
This entry helps you instrument an AI application correctly the first time, choose between sidecar / DaemonSet / gateway Collector topologies, apply OpenInference semantic conventions to LLM spans, and integrate the result with Yobibyte's managed observability surface and the Yobitel NeoCloud Collector gateway.
Quick start#
The snippet below has three numbered parts. Part 1 instruments a Python LLM application with OTel through `opentelemetry-instrument` and the OpenAI instrumentation packages. Part 2 deploys the Collector on Kubernetes with Helm as a DaemonSet agent plus a Deployment gateway. Part 3 points an OTLP exporter at a Yobitel NeoCloud regional Collector gateway. Manual spans for the retrieval step, the fan-out to Prometheus, Tempo and Loki, and end-to-end trace verification are not shown here; the gateway configuration under Workload patterns covers the fan-out.
```shell
# 1. Python application: auto-instrument the OpenAI SDK
# opentelemetry-distro is added so that opentelemetry-instrument defaults to the OTLP exporter
pip install \
  opentelemetry-api \
  opentelemetry-sdk \
  opentelemetry-distro \
  opentelemetry-exporter-otlp \
  opentelemetry-instrumentation-openai \
  openinference-instrumentation-openai

# Run with auto-instrumentation
OTEL_SERVICE_NAME=rag-app \
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.observability:4317 \
OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod,k8s.cluster.name=london-1 \
OTEL_TRACES_SAMPLER=parentbased_traceidratio \
OTEL_TRACES_SAMPLER_ARG=0.1 \
opentelemetry-instrument python app.py

# 2. Deploy the Collector with the Helm chart: DaemonSet agent + Deployment gateway
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install otel-collector open-telemetry/opentelemetry-collector \
  --namespace observability --create-namespace \
  --set mode=daemonset \
  --set image.repository=otel/opentelemetry-collector-contrib \
  --set presets.kubernetesAttributes.enabled=true \
  --set presets.kubeletMetrics.enabled=true \
  --set presets.hostMetrics.enabled=true

# Gateway tier: central collector for sampling + export
helm upgrade --install otel-gateway open-telemetry/opentelemetry-collector \
  --namespace observability \
  --set mode=deployment \
  --set image.repository=otel/opentelemetry-collector-contrib \
  --set replicaCount=3

# 3. Push OTLP to a Yobitel NeoCloud regional Collector gateway
# (illustrative env vars)
OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.london-1.yobitel.com:4317 \
OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer $YOBITEL_TENANT_TOKEN" \
opentelemetry-instrument python app.py
```
Always identify the workload with resource attributes: set `service.name` (through `OTEL_SERVICE_NAME` or `OTEL_RESOURCE_ATTRIBUTES`), `service.version`, `deployment.environment` and `k8s.cluster.name` at the very minimum. Spans without identifying resource attributes are operationally useless: you lose the join from a trace to the workload, the pod, the cluster and the tenant.
How it works#
OTel separates four concerns: the API (what the application calls), the SDK (the in-process implementation), the wire protocol (OTLP — how telemetry leaves the process) and the Collector (how telemetry is routed, processed and exported). The API is intentionally minimal — start a span, record a metric, emit a log — so application code is portable across SDK implementations. The SDK adds the production machinery: span processors, batch exporters, samplers, resource detection, propagation across process boundaries.
Each signal has its own data model. A trace is a tree of spans sharing a 128-bit trace ID; each span has a 64-bit span ID, a parent span ID, a name, a start and end timestamp, a status, a kind (`SERVER`, `CLIENT`, `INTERNAL`, `PRODUCER`, `CONSUMER`) and an arbitrary attribute map. Metrics come in four instruments — counter, up-down counter, histogram, gauge — each with an attribute set and an aggregation temporality (delta or cumulative). Logs in OTel are timestamped severity records with attributes and a trace context, designed to coexist with traces rather than replace structured-log frameworks.
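The trace data model described above can be sketched with a plain dataclass. This is an illustration only: the `Span` class and `children_of` helper are hypothetical names, not the SDK's types, and the IDs and timings are made-up values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Toy span record; field names follow the prose, not the real SDK classes."""
    name: str
    trace_id: int                    # 128-bit, shared by every span in the trace
    span_id: int                     # 64-bit, unique per span
    parent_span_id: Optional[int]    # None marks the root span
    kind: str = "INTERNAL"
    start_ns: int = 0
    end_ns: int = 0
    attributes: dict = field(default_factory=dict)

    @property
    def duration_ms(self) -> float:
        return (self.end_ns - self.start_ns) / 1e6


def children_of(spans: List[Span], parent: Span) -> List[Span]:
    """Return the direct children of `parent` within the same trace."""
    return [
        s for s in spans
        if s.trace_id == parent.trace_id and s.parent_span_id == parent.span_id
    ]


TRACE_ID = 0x4BF92F3577B34DA6A3CE929D0E0E4736  # made-up 128-bit ID
root = Span("POST /chat", TRACE_ID, 1, None, "SERVER", 0, 900_000_000)
retrieval = Span("retrieval", TRACE_ID, 2, root.span_id, "CLIENT", 50_000_000, 250_000_000)
llm = Span("llm.generate", TRACE_ID, 3, root.span_id, "CLIENT", 260_000_000, 880_000_000)
spans = [root, retrieval, llm]

# Latency attribution: how much of the root span each child accounts for.
for child in children_of(spans, root):
    print(f"{child.name}: {child.duration_ms:.0f} ms of {root.duration_ms:.0f} ms")
```

The parent span ID is the only link between spans, which is why a lost parent ID shows up downstream as a disconnected tree rather than as an error.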
Context propagation is the mechanism that knits cross-process spans into one trace. The W3C Trace Context standard defines a `traceparent` HTTP header (`00-<trace-id>-<span-id>-<flags>`) and a `tracestate` header for vendor extensions; the OTel SDK auto-injects these on outbound HTTP, gRPC, Kafka and AWS SDK calls when the respective instrumentation is active. On the receiving side the SDK extracts the headers and makes the upstream span the parent of the new server-side span.
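To make the header format concrete, here is a minimal sketch of building and parsing a version `00` `traceparent` value with the standard library. The function names are hypothetical; in practice the SDK's propagators do this for you, and a production parser would also need to handle future header versions.

```python
import re
from typing import Optional

# version "00": 2 hex chars, 32-hex trace ID, 16-hex parent span ID, 2-hex flags
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Render the header an instrumented client would inject on an outbound call."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(header: str) -> Optional[dict]:
    """Extract the upstream context; return None if the header is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_hex, span_hex, flags_hex = match.groups()
    trace_id, span_id = int(trace_hex, 16), int(span_hex, 16)
    # Version ff and all-zero IDs are invalid under W3C Trace Context.
    if version == "ff" or trace_id == 0 or span_id == 0:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": bool(int(flags_hex, 16) & 0x01),
    }


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
context = parse_traceparent(incoming)
print(context)
```

The server-side span reuses `trace_id`, takes `parent_span_id` as its parent and generates a fresh span ID of its own before injecting a new header on any onward call.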
OTLP (OpenTelemetry Protocol) is the wire format. Protocol Buffers schemas define `TracesData`, `MetricsData`, `LogsData` and (in-progress) `ProfilesData` messages; transports are gRPC on port 4317 (preferred) and HTTP/Protobuf or HTTP/JSON on port 4318. Because every signal shares one wire format, a single Collector pipeline can fan signals out to different backends — Prometheus for metrics, Tempo for traces, Loki for logs — without changing any application code.
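As an illustration of the HTTP/JSON transport, the sketch below builds a single-span OTLP/JSON payload and the matching request for the `/v1/traces` path on port 4318. The helper names, the `manual-sketch` scope name and the `localhost` Collector address are assumptions for the example; real applications would normally let the SDK exporter do this.

```python
import json
import time
import urllib.request


def otlp_json_span(service: str, name: str, trace_id: int, span_id: int, duration_ms: int) -> dict:
    """Build a minimal OTLP/JSON traces payload containing one span."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-sketch"},
                "spans": [{
                    "traceId": f"{trace_id:032x}",   # hex-encoded in OTLP/JSON
                    "spanId": f"{span_id:016x}",
                    "name": name,
                    "kind": 1,                        # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


def build_request(collector: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST; sending it is left to the caller."""
    return urllib.request.Request(
        url=f"{collector}/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = otlp_json_span("rag-app", "rag.answer", 0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7, 120)
request = build_request("http://localhost:4318", payload)
# urllib.request.urlopen(request)  # uncomment with a Collector listening on 4318
```

The same logical payload, serialised as Protocol Buffers, is what travels over gRPC on 4317; only the encoding and transport differ.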
The Collector is the most underrated part of OTel. It is a Go binary with three pluggable stages: receivers (OTLP, Prometheus scrape, Jaeger, Zipkin, Fluentd, Kafka, AWS Firehose, 80+ more), processors (batch, memory limiter, attributes, k8sattributes, resource, tail_sampling, transform, redaction) and exporters (OTLP to another Collector, Prometheus remote_write, Tempo, Loki, Jaeger, every major vendor). Pipelines are wired per-signal — you can have one metrics pipeline, two traces pipelines and three logs pipelines in one Collector process.
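The receiver, processor and exporter stages can be pictured as plain function composition. The toy model below is not how the Go Collector is implemented; every name in it (`insert_attribute`, `drop_if`, `run_pipeline`) is hypothetical and spans are simplified to dictionaries.

```python
from typing import Callable, Dict, List

Span = Dict[str, object]
Processor = Callable[[List[Span]], List[Span]]
Exporter = Callable[[List[Span]], None]


def insert_attribute(key: str, value: object) -> Processor:
    """Mimics an `insert` action: set the attribute only when it is absent."""
    def process(batch: List[Span]) -> List[Span]:
        return [{key: value, **span} for span in batch]
    return process


def drop_if(predicate: Callable[[Span], bool]) -> Processor:
    """Filter out spans that match the predicate."""
    def process(batch: List[Span]) -> List[Span]:
        return [span for span in batch if not predicate(span)]
    return process


def run_pipeline(batch: List[Span], processors: List[Processor], exporters: List[Exporter]) -> List[Span]:
    """Apply processors in order, then fan the result out to every exporter."""
    for processor in processors:
        batch = processor(batch)
    for exporter in exporters:
        exporter(batch)
    return batch


tempo: List[Span] = []    # stand-ins for two export destinations
vendor: List[Span] = []
received = [
    {"name": "rag.answer", "service.name": "rag-app"},
    {"name": "GET /healthz", "service.name": "rag-app"},
]
run_pipeline(
    received,
    processors=[
        insert_attribute("deployment.environment", "prod"),
        drop_if(lambda span: span["name"] == "GET /healthz"),
    ],
    exporters=[tempo.extend, vendor.extend],
)
print(tempo)
```

Processor order matters in the real Collector for the same reason it matters here: each stage only sees what the previous stage let through.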
- API → SDK → Exporter → Collector → Backend — the standard pipeline; each step is independently swappable.
- Trace context propagation: W3C `traceparent` HTTP header injected by auto-instrumentation on outbound calls.
- Sampling: head-based at the SDK (`parentbased_traceidratio`) or tail-based at the gateway Collector (`tail_sampling` processor).
- Resource attributes: `service.name`, `service.version`, `deployment.environment`, `k8s.*`, `cloud.*`, `gpu.uuid` — stamped on every signal from a process.
- Semantic conventions: standard attribute names such as `http.request.method` (formerly `http.method`), `db.system`, `messaging.system`, `gen_ai.system`, `gen_ai.usage.input_tokens`.
- OpenInference / OpenLLMetry: LLM-specific conventions — `llm.prompt`, `llm.completion`, `llm.token_count.completion`, `llm.model_name`, `retrieval.documents`.
- Collector topology: agent (DaemonSet, near app) + gateway (Deployment, centralised) is the standard production shape.
- Tail-based sampling: keep all error traces, slow traces and statistically interesting traces; drop the boring majority — typically cuts trace storage by 90 percent.
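The head-based sampling bullet above can be sketched as a deterministic decision on the trace ID, so that every service seeing the same trace reaches the same verdict. This is a sketch of the idea behind `parentbased_traceidratio`, not the SDK's source; the function names are hypothetical.

```python
from typing import Optional

LOW_64_BITS = (1 << 64) - 1


def sampled_by_ratio(trace_id: int, ratio: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below ratio * 2^64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << 64))
    return (trace_id & LOW_64_BITS) < bound


def should_sample(trace_id: int, ratio: float, parent_sampled: Optional[bool]) -> bool:
    """Parent-based wrapper: honour the upstream decision, else apply the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return sampled_by_ratio(trace_id, ratio)


trace_id = 0x4BF92F3577B34DA6A3CE929D0E0E4736
print(should_sample(trace_id, 0.1, parent_sampled=None))   # root span: ratio decides
print(should_sample(trace_id, 0.1, parent_sampled=True))   # child span: parent decides
```

Because the decision depends only on the trace ID and the ratio, no coordination between services is needed. The cost is that the decision is made before anyone knows whether the trace will turn out slow or failed, which is the gap tail-based sampling fills.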
Reference and specifications#
The reference below documents the OTel surface that an AI-infrastructure engineer touches most. The full specification is much larger — propagation formats, exemplars, log data model, metric aggregation rules — but the table covers the API artefacts, environment variables, OTLP wire details, Collector building blocks and the LLM-specific semantic conventions that matter when instrumenting a vLLM-backed RAG application or a Yobibyte fine-tune workflow.
| Artefact | Type | Purpose |
|---|---|---|
| `Tracer.start_as_current_span(name)` | API | Open a span and make it the active context. |
| `Span.set_attribute(key, value)` | API | Attach a key-value attribute to the current span. |
| `Span.record_exception(exc)` | API | Record an exception as a span event; set the error status separately with `Span.set_status(...)`. |
| `Meter.create_counter / create_histogram` | API | Construct a metric instrument. |
| `Counter.add(value, attributes)` | API | Record a counter increment with dimension attributes. |
| `Histogram.record(value, attributes)` | API | Record an observation in a histogram instrument. |
| `BatchSpanProcessor` | SDK | Buffer spans and flush in batches to the exporter. |
| `TraceIdRatioBased(rate)` sampler | SDK | Head-based sampling — keep `rate` fraction of trace IDs. |
| `ParentBased` sampler | SDK | Honour the upstream sampling decision; default in production. |
| `OTEL_SERVICE_NAME` | env | Sets `service.name` resource attribute. |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | env | OTLP target — gRPC `:4317` or HTTP `:4318`. |
| `OTEL_EXPORTER_OTLP_PROTOCOL` | env | `grpc`, `http/protobuf` or `http/json`; the default varies by SDK (the specification recommends `http/protobuf`). |
| `OTEL_RESOURCE_ATTRIBUTES` | env | Comma-separated resource attributes. |
| `OTEL_TRACES_SAMPLER` | env | Sampler name. The SDK default is `parentbased_always_on`; `parentbased_traceidratio` is the usual production choice. |
| `OTEL_TRACES_SAMPLER_ARG` | env | Sampling ratio — `0.1` keeps 10% of new traces. |
| `OTEL_PROPAGATORS` | env | `tracecontext,baggage` is the W3C default. |
| `OTEL_LOG_LEVEL` | env | SDK self-log verbosity. |
| OTLP/gRPC port 4317 | wire | Protocol Buffers over gRPC — preferred transport. |
| OTLP/HTTP port 4318 | wire | Protocol Buffers (or JSON) over HTTP — firewall-friendly. |
| W3C `traceparent` header | propagation | `00-<trace-id>-<span-id>-<flags>`; identifies the parent span across processes. |
| W3C `tracestate` header | propagation | Vendor-specific routing data; stays opaque to most middleware. |
| `receivers.otlp` (Collector) | Collector | Accept OTLP over gRPC and/or HTTP. |
| `receivers.prometheus` (Collector) | Collector | Scrape Prometheus targets and convert to OTLP metrics. |
| `processors.batch` | Collector | Buffer and flush — always include in production pipelines. |
| `processors.k8sattributes` | Collector | Enrich spans with `k8s.pod.*`, `k8s.namespace.*`, `k8s.deployment.*`. |
| `processors.tail_sampling` | Collector | Per-trace sampling at the gateway based on errors / latency / attributes. |
| `processors.transform` | Collector | OTTL — OpenTelemetry Transformation Language for span/metric rewriting. |
| `exporters.otlp` | Collector | Forward to another Collector or any OTLP backend. |
| `exporters.prometheusremotewrite` | Collector | Send OTel metrics to Prometheus. |
| `gen_ai.system` attribute | semconv | LLM provider name — `openai`, `anthropic`, `yobibyte`. |
| `gen_ai.request.model` | semconv | Model identifier — `gpt-4o`, `llama-3.1-70b-instruct`. |
| `gen_ai.usage.input_tokens` / `output_tokens` | semconv | Token counts — basis of cost attribution. |
| `gen_ai.response.finish_reasons` | semconv | `stop` / `length` / `content_filter` etc. |
| OpenInference `llm.prompt` | semconv | Full prompt content (subject to redaction). |
| OpenInference `retrieval.documents` | semconv | Retrieved document IDs and scores for RAG spans. |
Avoid putting raw prompts or completions in span attributes without a redaction processor. The OpenInference convention names exist (`llm.prompt`, `llm.completion`) but leaking customer prompts to a third-party tracing vendor is a routine compliance incident. Use the Collector's `transform` or `redaction` processor to strip or hash sensitive fields before egress — see the Security section.
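For teams that prefer to scrub before the data ever leaves the process, the same idea can be sketched in application code. This is a minimal illustration, not a substitute for the Collector processors: the `.sha256` attribute suffix, the email pattern and the `redact_attributes` helper are all assumptions made for the example.

```python
import hashlib
import re

SENSITIVE_KEYS = ("llm.prompt", "llm.completion")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_attributes(attributes: dict, hash_sensitive: bool = True) -> dict:
    """Return a copy that is safe to export: raw prompts and completions never survive."""
    cleaned = {}
    for key, value in attributes.items():
        if key in SENSITIVE_KEYS:
            if hash_sensitive and isinstance(value, str):
                # Keep a digest so identical prompts can still be correlated in audits.
                cleaned[key + ".sha256"] = hashlib.sha256(value.encode("utf-8")).hexdigest()
            continue
        if isinstance(value, str):
            cleaned[key] = EMAIL.sub("<redacted>", value)
        else:
            cleaned[key] = value
    return cleaned


span_attributes = {
    "llm.prompt": "Summarise the contract for bob@example.com",
    "llm.model_name": "llama-3.1-70b-instruct",
    "llm.token_count.completion": 212,
    "user.note": "mail bob@example.com now",
}
print(redact_attributes(span_attributes))
```

Hashing rather than deleting keeps an audit trail without retaining the content. Note that an unsalted hash of a short or guessable prompt can still be reversed by brute force, so treat the digest as sensitive too.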
Workload patterns#
Three workload shapes cover the bulk of OTel deployments on AI infrastructure: a single-service Python application emitting traces directly to a Collector gateway, a multi-tenant Kubernetes cluster with agent-plus-gateway Collector topology, and a customer pushing OTLP into a Yobitel NeoCloud regional Collector gateway. Each pattern has a different propagation, sampling and processor profile.
Pattern A — single Python service, head-based sampling, direct push. Application uses OTel SDK with auto-instrumentation for the OpenAI SDK, FastAPI, requests, httpx, SQLAlchemy and Redis plus manual spans for orchestration. `OTEL_TRACES_SAMPLER=parentbased_traceidratio` with a 10 percent ratio. Direct OTLP push to a single Collector gateway that handles enrichment and export. This is the right shape for a small RAG service or a development cluster — minimal infrastructure, easy to debug.
Pattern B — agent + gateway on Kubernetes. The OpenTelemetry Operator runs a DaemonSet agent Collector on every node performing local enrichment (k8sattributes, hostmetrics, kubeletstats) and a Deployment gateway Collector tier (3-5 replicas) performing tail-based sampling, redaction and export. Applications push OTLP to the local node-agent (low-latency, no cross-zone hop); the agent forwards to the gateway via OTLP/gRPC; the gateway fans out to Tempo, Prometheus, Loki and the LLM observability backend (Langfuse, Phoenix or Helicone). This is the production shape on every Yobitel NeoCloud region and the shape we recommend customers adopt for medium-to-large AI workloads.
Pattern C — push OTLP into a Yobibyte / NeoCloud regional Collector gateway. The customer runs application instrumentation but does not stand up their own Collector tier. The OTel SDK exports OTLP/gRPC to `otel.<region>.yobitel.com:4317` with a tenant-scoped bearer token. Yobitel's regional gateway accepts the push, enriches with NeoCloud-side attributes (`gpu.uuid`, `nvidia.gpu.model`), applies tenant-scoped tail sampling and forwards to the customer's configured backend — by default the Yobibyte console's trace browser, but customers can route to Datadog, Honeycomb or any OTLP target. This is the recommended integration for customers who want immediate trace visibility without operating their own Collector tier.
```yaml
# Gateway Collector: tail sampling, k8s enrichment, multi-backend export
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: gateway
  namespace: observability
spec:
  mode: deployment
  replicas: 3
  image: otel/opentelemetry-collector-contrib:0.110.0
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 5s
        send_batch_size: 8192
      memory_limiter:
        check_interval: 2s
        limit_percentage: 80
        spike_limit_percentage: 25
      k8sattributes:
        auth_type: serviceAccount
        passthrough: false
        extract:
          metadata: [k8s.namespace.name, k8s.pod.name, k8s.deployment.name, k8s.node.name]
      resource:
        attributes:
          - key: deployment.environment
            value: prod
            action: insert
      transform/redact:
        trace_statements:
          - context: span
            statements:
              - replace_pattern(attributes["llm.prompt"], "(?i)email[^ ]+", "<redacted>")
              - delete_key(attributes, "llm.completion") where attributes["customer.tier"] == "regulated"
      tail_sampling:
        decision_wait: 10s
        num_traces: 100000
        policies:
          - { name: errors, type: status_code, status_code: { status_codes: [ERROR] } }
          - { name: slow, type: latency, latency: { threshold_ms: 1000 } }
          - { name: sample, type: probabilistic, probabilistic: { sampling_percentage: 10 } }
    exporters:
      otlp/tempo:
        endpoint: tempo:4317
        tls: { insecure: true }
      prometheusremotewrite:
        endpoint: http://prometheus:9090/api/v1/write
      loki:
        endpoint: http://loki:3100/loki/api/v1/push
      otlp/yobibyte:
        endpoint: otel.london-1.yobitel.com:4317
        # ${env:VAR} is the Collector's environment-variable expansion syntax
        headers: { authorization: "Bearer ${env:YOBITEL_TENANT_TOKEN}" }
    service:
      pipelines:
        # memory_limiter runs first and batch runs last, so redaction and
        # sampling see individual spans before they are grouped for export
        traces:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resource, transform/redact, tail_sampling, batch]
          exporters: [otlp/tempo, otlp/yobibyte]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resource, batch]
          exporters: [prometheusremotewrite]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resource, batch]
          exporters: [loki]
```
In production, run the OTel Collector as a DaemonSet plus a Deployment gateway. The DaemonSet collects local node telemetry with low latency; the gateway centralises sampling and export. Never let application pods talk directly to the trace backend: the indirection is what makes tail sampling, redaction and vendor switching cheap.
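The three `tail_sampling` policies in the gateway configuration (errors, slow traces, a 10 percent baseline) amount to an OR over the spans of a completed trace. The sketch below shows that decision logic in isolation; it is not the processor's implementation, and `FinishedSpan` and `decide` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FinishedSpan:
    start_ms: float
    end_ms: float
    is_error: bool = False


def decide(
    spans: List[FinishedSpan],
    latency_threshold_ms: float = 1000.0,
    baseline_ratio: float = 0.10,
    rng: Callable[[], float] = random.random,
) -> Optional[str]:
    """Return the name of the policy that keeps the trace, or None to drop it."""
    if any(span.is_error for span in spans):
        return "errors"
    # Trace duration: earliest start to latest end across all buffered spans.
    duration_ms = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration_ms > latency_threshold_ms:
        return "slow"
    if rng() < baseline_ratio:
        return "sample"
    return None


fast_ok = [FinishedSpan(0, 300), FinishedSpan(20, 280)]
slow_ok = [FinishedSpan(0, 1500), FinishedSpan(100, 1400)]
failed = [FinishedSpan(0, 200), FinishedSpan(50, 150, is_error=True)]

print(decide(failed), decide(slow_ok), decide(fast_ok))
```

The decision needs every span of the trace, which is why the gateway has to buffer for `decision_wait` and why all spans of one trace must reach the same gateway replica.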
Sizing and capacity planning#
OTel sizing is governed by span rate, span size, processor cost (especially tail sampling), batch/flush configuration and the number of exporter destinations. As a planning anchor, a typical LLM application produces 5-20 spans per user request (gateway → orchestrator → 1-3 retrieval spans → 1 LLM call → response post-processor) and each span averages 1-3 KB after attributes. The table below maps representative workload sizes onto Collector footprint at agent and gateway tiers, plus the downstream tracing-store impact.
On Yobitel NeoCloud the regional gateway Collector tier runs as a 3-replica HPA-backed Deployment per region, sized for the regional inference and fine-tune traffic with tail sampling at 10 percent of clean traces, 100 percent of error traces and 100 percent of traces above 1 s latency. This typically cuts downstream Tempo storage by 90 percent without losing investigable incidents — the same pattern we recommend for customer-side gateway deployments.
- Default flush — `batch` processor: 5 s timeout, 8,192 spans per batch on the gateway; 2 s / 1,024 spans on the agent.
- Memory limiter — always include `memory_limiter` processor first; it back-pressures the receiver before OOM.
- Sampling — head-based at 10 percent on the SDK, tail-based at the gateway for error and slow-trace retention.
- Resource — `~3 KB` RAM per buffered span pre-export; pre-sample buffer dominates gateway sizing.
- Network — OTLP/gRPC is roughly 4-6x more efficient than OTLP/HTTP-JSON; prefer gRPC where the network allows.
- Yobitel NeoCloud gateway anchor: 3-replica Deployment, 2 vCPU / 4 GB RAM per replica per 5,000 spans/s.
- Customer push to Yobibyte: bandwidth caps apply per tenant token — 50 MB/s for standard tiers, configurable up.
| Workload | Spans/s | Pre-sample bandwidth | Post-sample bandwidth | Collector RAM (agent / gateway) | Tempo storage (30d) |
|---|---|---|---|---|---|
| Single dev RAG app | ~10 | ~50 KB/s | ~5 KB/s | 100 MB / 256 MB | ~3 GB |
| Small production LLM service | ~200 | ~600 KB/s | ~60 KB/s | 200 MB / 512 MB | ~30 GB |
| Multi-tenant inference platform (100 RPS) | ~2,000 | ~6 MB/s | ~600 KB/s | 500 MB × N nodes / 2 GB × 3 | ~300 GB |
| Yobitel London-1 region | ~10,000 | ~30 MB/s | ~3 MB/s | 1 GB × N / 4 GB × 5 | ~1.5 TB |
| Yobitel multi-region fleet | ~40,000 | ~120 MB/s | ~12 MB/s | Per-region only | ~6 TB |
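The bandwidth columns in the table follow from simple arithmetic on request rate, spans per request and span size. A minimal sketch, assuming a flat keep ratio (it ignores the extra error and slow traces that tail sampling retains at 100 percent); the function name is hypothetical.

```python
def collector_bandwidth(requests_per_s: float, spans_per_request: float,
                        bytes_per_span: float, keep_ratio: float):
    """Return (spans/s, pre-sample bytes/s, post-sample bytes/s)."""
    spans_per_s = requests_per_s * spans_per_request
    pre_sample = spans_per_s * bytes_per_span
    post_sample = pre_sample * keep_ratio
    return spans_per_s, pre_sample, post_sample


# Multi-tenant inference platform row: 100 RPS, 20 spans per request, ~3 KB per span
spans_per_s, pre, post = collector_bandwidth(100, 20, 3_000, 0.10)
print(f"{spans_per_s:,.0f} spans/s, {pre / 1e6:.1f} MB/s before sampling, {post / 1e3:.0f} KB/s after")
```

With these inputs the result lines up with the ~2,000 spans/s, ~6 MB/s and ~600 KB/s figures in the table. Plug in your own span size, since prompt-bearing spans can be far larger than 3 KB.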
Limits and quotas#
OTel itself has very few hard limits — the SDK and Collector are designed to be horizontally scalable. The constraints that matter operationally are span and attribute size caps, sampling-decision memory in tail samplers, exporter queue depth, and downstream backpressure. The table below documents each ceiling and the operational lever for raising it.
| Limit | Default | Ceiling | How to raise / work around |
|---|---|---|---|
| Max attributes per span | 128 | configurable | `OTEL_ATTRIBUTE_COUNT_LIMIT` env or SDK config. |
| Max attribute value length | unlimited | wire-size bound | `OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT`; redact long prompts. |
| Max events per span | 128 | configurable | `OTEL_SPAN_EVENT_COUNT_LIMIT`; prefer attributes. |
| Max links per span | 128 | configurable | `OTEL_SPAN_LINK_COUNT_LIMIT`; rarely needed above default. |
| SDK batch queue size | 2,048 | memory-bound | `BatchSpanProcessor(max_queue_size=4096)`. |
| SDK batch flush interval | 5 s | n/a | Lower for SLA-critical short-lived spans. |
| OTLP/gRPC max message | 4 MB | configurable | `grpc.max_recv_msg_size_mib` on receiver. |
| Collector receiver buffer | default | memory_limiter | `memory_limiter` processor stops accepting at threshold. |
| `tail_sampling` `num_traces` | 50,000 | RAM-bound | Raise `num_traces`; `decision_wait` longer for slow traces. |
| `tail_sampling` decision_wait | 30 s | trace duration | Must exceed max realistic trace duration. |
| Exporter queue depth | 1,000 | memory-bound | `sending_queue.queue_size` on each exporter. |
| Exporter retry on failure | 5 min total (`max_elapsed_time: 300s`) | n/a | Tune `retry_on_failure.initial_interval`, `max_interval` and `max_elapsed_time`; circuit-breaker downstream. |
| OpenInference `llm.prompt` length | unlimited | wire-size + compliance | Redaction processor; convert to hash for audit trail. |
| Yobitel tenant push rate | 50 MB/s standard | Tier-dependent | Higher tiers available via Yobitel sales. |
Tail sampling needs to buffer every span of every active trace in memory until the decision_wait elapses. A noisy service with 10,000 spans/s and 30 s decision_wait holds 300,000 spans in RAM — roughly 1 GB. Either raise gateway RAM, lower decision_wait, or move tail sampling closer to the source.
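The buffer estimate above is span rate times `decision_wait` times bytes per span. A small sketch of that arithmetic, using the ~3 KB per buffered span planning figure from the sizing section; the function name is hypothetical.

```python
def tail_sampling_buffer(spans_per_s: int, decision_wait_s: int, bytes_per_span: int = 3_000):
    """Return (spans held in memory, approximate bytes) while decisions are pending."""
    buffered_spans = spans_per_s * decision_wait_s
    return buffered_spans, buffered_spans * bytes_per_span


for wait_s in (30, 10):
    spans, size = tail_sampling_buffer(10_000, wait_s)
    print(f"decision_wait={wait_s}s: {spans:,} spans buffered, ~{size / 1e9:.1f} GB")
```

At 30 s the estimate is about 0.9 GB, which the text rounds to 1 GB; dropping `decision_wait` to 10 s cuts the buffer to roughly a third, at the risk of deciding before slow traces have finished.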
Observability#
The Collector is itself observable — it exposes Prometheus metrics on port 8888 by default, plus its own OTLP self-telemetry if configured. The metrics below cover the failure modes that account for almost all production Collector incidents, plus the SDK-side telemetry every OTel deployment should alert on.
- `otelcol_receiver_accepted_spans` / `_refused_spans`: receiver-side throughput and rejection rate.
- `otelcol_processor_batch_send_size`: batch sizes — too small wastes bandwidth, too large blows memory.
- `otelcol_processor_dropped_spans`: spans dropped by a processor; investigate immediately, never silently tolerate.
- `otelcol_exporter_send_failed_spans`: downstream exporter failures; pair with retry queue depth.
- `otelcol_processor_memory_limiter_state`: `1` is healthy, `2` indicates limiter is actively dropping.
- `otelcol_process_runtime_total_alloc_bytes`: Go runtime memory; pair with limiter for OOM defence.
- SDK-side `otel.sdk.span_processor.spans_processed`: spans flushed by the BatchSpanProcessor.
- SDK-side `otel.sdk.exporter.export_calls`: exporter activity and failure rate.
- Trace round-trip canary: synthetic trace every 30 s end-to-end through agent + gateway + backend.
```yaml
# Prometheus alerting rules for OTel Collector health
groups:
  - name: otel-collector
    interval: 30s
    rules:
      - alert: OTelCollectorDroppingSpans
        expr: rate(otelcol_processor_dropped_spans_total[5m]) > 0
        for: 5m
        labels: { severity: critical, team: observability }
        annotations:
          summary: "Collector {{ $labels.instance }} dropping spans: back-pressure or config error"
      - alert: OTelCollectorMemoryLimiterTripped
        expr: otelcol_processor_memory_limiter_state == 2
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Collector memory limiter forcing GC/drop: scale up or sample harder"
      - alert: OTelCollectorExporterFailed
        expr: rate(otelcol_exporter_send_failed_spans_total[10m]) > 10
        for: 10m
        labels: { severity: warning }
        annotations:
          summary: "Exporter {{ $labels.exporter }} failing: backend unreachable?"
      - alert: OTelCollectorExporterQueueFull
        expr: otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.9
        for: 10m
        labels: { severity: warning }
        annotations:
          summary: "Exporter {{ $labels.exporter }} queue >90%: downstream slow"
      - alert: OTelCollectorReceiverRefusing
        expr: rate(otelcol_receiver_refused_spans_total[10m]) > 0
        for: 10m
        labels: { severity: critical }
        annotations:
          summary: "Receiver refusing spans: config, TLS, or auth error"
      - alert: OTelTraceCanaryFailed
        expr: synthetic_trace_roundtrip_seconds > 60
        for: 10m
        labels: { severity: critical }
        annotations:
          summary: "Synthetic trace not round-tripping: collector or backend down"
```
Add a synthetic trace canary every 30 s: emit a tiny trace from a known service and verify it appears in your tracing store within a minute. This is the only alert that catches "collector running but not delivering" failure modes, which the per-component metrics often miss.
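The canary itself is a small loop: emit a trace with a known ID, poll the tracing store until it appears, and record the elapsed time. The sketch below keeps the emit and lookup steps as injected callables because they depend on your exporter and backend query API; `run_canary`, `emit` and `found` are hypothetical names.

```python
import secrets
import time
from typing import Callable, Optional


def run_canary(
    emit: Callable[[str], None],
    found: Callable[[str], bool],
    timeout_s: float = 60.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Return the round-trip time in seconds, or None if the trace never arrived."""
    trace_id = secrets.token_hex(16)      # 32 hex chars, i.e. a 128-bit trace ID
    started = time.monotonic()
    emit(trace_id)                         # e.g. export one tiny span with this ID
    while time.monotonic() - started < timeout_s:
        if found(trace_id):                # e.g. query the trace store by ID
            return time.monotonic() - started
        sleep(poll_s)
    return None


# In-memory stand-ins so the sketch runs without a Collector or backend.
store = set()
round_trip = run_canary(emit=store.add, found=lambda tid: tid in store, timeout_s=5, poll_s=0.1)
print(f"round trip: {round_trip:.3f} s" if round_trip is not None else "canary failed")
```

Publishing the returned value as a gauge (the alert above assumes a metric named `synthetic_trace_roundtrip_seconds`) lets the same rule cover both slow and missing deliveries, provided a failed run reports a value above the threshold rather than nothing.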
Cost and FinOps#
OTel itself is free under Apache 2.0, so there is no licence cost. The operational cost is the Collector compute footprint plus the downstream tracing, metrics and logs stores, and the bandwidth between them. Tail sampling is the single biggest lever: a well-tuned tail sampler typically cuts trace storage cost by 90 percent without losing investigable traces. The table below puts the self-hosted and vendor-hosted options in USD terms for representative AI workloads.
- Collector compute: ~$50-100/month per 3-replica gateway on a small VM tier; scales linearly with span rate.
- Tempo / Jaeger self-hosted: ~$0.025/GB-month on object storage — trace storage is the dominant cost.
- Datadog APM: ~$31/host/month or per-span usage tier; tail sampling is essential to control cost.
- Honeycomb: per-event pricing; tail sampling at the Collector keeps event count predictable.
- Yobitel NeoCloud: regional Collector gateway + Tempo + Loki are included in the GPU rate; no per-span fee.
- Yobibyte managed observability: OTel push endpoint and 30-day trace retention included in the workspace fee.
- FinOps wedge: redact or hash `llm.prompt` and `llm.completion` before export to per-event-priced vendors — prompt/completion bytes are the single biggest cost lever on LLM traces.
| Workload | Spans/s pre-sample | Collector + Tempo (self-hosted, monthly USD) | Datadog APM (monthly USD) | Honeycomb (monthly USD) | Yobitel NeoCloud |
|---|---|---|---|---|---|
| Single dev RAG app | ~10 | ~$25 | ~$60 | ~$0 (free tier) | Included via Yobibyte |
| Small production LLM service | ~200 | ~$200 | ~$600 | ~$130 | Included via Yobibyte |
| Multi-tenant inference platform | ~2,000 | ~$1,400 | ~$5,500 | ~$1,300 | Included via Yobibyte |
| Yobitel London-1 region | ~10,000 | ~$7,000 | ~$28,000 | ~$6,500 | Yobitel-operated |
| Push to Yobitel NeoCloud Collector | varies | ~$25 (small Collector) | n/a | n/a | Federation/push included |
Security and compliance#
OTel does not authenticate callers by default, but Collector receivers and exporters support TLS (including mutual TLS) and bearer-token auth. Out of the box the Collector accepts arbitrary OTLP from any caller that can reach the port; in production it should be fronted by NetworkPolicy, an Envoy/Istio mesh, or a public-facing reverse proxy with mTLS. The Yobitel NeoCloud regional gateway requires per-tenant bearer tokens scoped to that tenant; the same token shape is documented for customer-side Collector ingress.
The most sensitive OTel field on AI workloads is the LLM prompt and completion. The OpenInference convention provides `llm.prompt` and `llm.completion` attribute names, but populating them sends raw user input to whatever trace backend you have configured. For UK public-sector workloads (NCSC Cloud Security Principles, G-Cloud 14, OFFICIAL-handling) and EU Data Boundary commitments, leaking prompt content cross-region is a documented control breach. The Collector's `transform` and `redaction` processors strip or hash the prompt/completion before egress; both are configurable per-tenant on the Yobitel NeoCloud regional gateway.
Yobibyte's customer-facing OTel surface enforces three controls. Per-tenant API tokens limit OTLP push and federation to that tenant's scope. The regional Collector gateway applies a default redaction pipeline that hashes `llm.prompt` and `llm.completion` for tenants on regulated tiers; raw content stays inside the sovereign tenancy. All cross-tenant export paths are signed and rate-limited per token. Customers see enough trace fidelity to debug their own workloads — span tree, latency breakdown, retrieval document IDs, token counts — without raw payload escaping the sovereignty boundary.
Treat `llm.prompt` and `llm.completion` as PII by default. Even when you control the trace backend, the same span will end up on a developer's laptop the first time they investigate a slow request. Redact or hash at the Collector before export and decide explicitly which tenants and tiers are allowed raw prompt visibility.
Migration and alternatives#
Most migrations to OTel come from one of four origins: vendor-specific tracing SDKs (Datadog APM SDK, New Relic SDK, AWS X-Ray SDK), Jaeger or Zipkin direct instrumentation, legacy OpenTracing or OpenCensus code, or no tracing at all. The table below documents the trade-offs of each migration path. For green-field LLM applications, install the OTel SDK plus auto-instrumentation and skip the alternatives — every backend you might want to use already accepts OTLP.
Vendor SDK migrations are typically straightforward: the OTel SDK is the long-term direction every vendor publicly endorses, and most vendor SDKs offer an OTel-compatible mode that emits OTLP alongside the proprietary protocol. Run both for one quarter, confirm trace fidelity, then drop the proprietary SDK.
| Migration source | Effort | What you gain | What you lose |
|---|---|---|---|
| Datadog APM SDK | Low — auto-instrumentation overlaps heavily | Vendor portability, open semconv, no per-host fee for Collector | Datadog-specific dashboards re-built on Tempo or kept via OTLP export to Datadog |
| New Relic SDK | Low | Same | Same |
| AWS X-Ray SDK | Medium | Multi-cloud portability, open semconv | X-Ray service map regenerated downstream |
| Jaeger direct (Jaeger SDK) | Trivial — Jaeger receiver in Collector | OTLP everywhere, modern auto-instrumentation | Jaeger SDK was deprecated in favour of OTel |
| Zipkin direct | Trivial — Zipkin receiver in Collector | Same | Same |
| OpenTracing / OpenCensus | Medium | Active community, no maintenance fork | Compatibility shims exist; minor API changes |
| No tracing at all | Trivial — install SDK + auto-instrumentation | Every benefit | n/a — this is the right migration |
| Send OTLP to Datadog / Honeycomb / New Relic | Configuration only | Vendor-neutral application code | n/a — the explicit goal of OTel |
| Yobitel-managed observability (Yobibyte) | OTLP env var | No Collector to run; included in workspace fee | Choose Yobibyte UI or your own backend |
# Equivalent: vendor SDK vs OTel SDK
# Old: Datadog APM SDK
import openai
from ddtrace import tracer, patch_all
patch_all()
@tracer.wrap()
def answer_question(question, docs):
    span = tracer.current_span()
    span.set_tag("rag.docs", len(docs))
    return openai.chat.completions.create(...)
# New: OTel SDK + auto-instrumentation + OpenInference for LLM spans
import openai
from opentelemetry import trace
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from openinference.instrumentation.openai import OpenAIInstrumentor as OIOpenAI
# OpenLLMetry instrumentor for the OpenAI client
OpenAIInstrumentor().instrument()
# OpenInference instrumentor: LLM-aware semantic conventions
# (both instrumentors wrap the same client; keep one if duplicate LLM spans appear)
OIOpenAI().instrument()
tracer = trace.get_tracer(__name__)
def answer_question(question, docs):
    # The custom span becomes the parent of the auto-instrumented LLM span
    with tracer.start_as_current_span("rag.answer") as span:
        span.set_attribute("rag.docs", len(docs))
        return openai.chat.completions.create(...)
# Export config is environment, not code:
# OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.london-1.yobitel.com:4317
# OTEL_EXPORTER_OTLP_HEADERS=authorization=Bearer $YOBITEL_TENANT_TOKEN
Picking between Tempo, Jaeger, Honeycomb, Datadog and Phoenix as your trace backend is a much smaller decision than picking OTel over a vendor SDK. The instrumentation work is what costs months; the export destination is one Collector config change. Optimise the instrumentation choice first.
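The environment-driven export configuration in the listing above relies on `OTEL_EXPORTER_OTLP_HEADERS` being a comma-separated list of `key=value` pairs. The sketch below shows roughly how such a value could be turned into request headers. It is a standard-library illustration, not the parsing code of any particular SDK; the function name `parse_otlp_headers` and the `x-tenant` header are hypothetical, and the URL-decoding step assumes values are percent-encoded.

```python
from urllib.parse import unquote


def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Parse a comma-separated key=value string into a header dict.

    Illustrative only: real SDKs ship their own parsers, and details such as
    key normalisation may differ between languages.
    """
    headers: dict[str, str] = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # tolerate empty input and trailing commas
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"header entry without '=': {pair!r}")
        # Assumption: values are percent-encoded, so a space is written as %20
        headers[key.strip().lower()] = unquote(value.strip())
    return headers


if __name__ == "__main__":
    # "x-tenant" is a made-up header used only for this example
    print(parse_otlp_headers("authorization=Bearer%20abc123,x-tenant=acme"))
```

If the bearer token contains a literal space after `Bearer`, writing it as `%20` is the conservative choice, since some SDKs reject or mis-parse unencoded values.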
Troubleshooting#
The error table below covers the failure modes that account for almost all real OTel incidents. Each row maps an observable symptom to the underlying cause and the minimum-viable fix. Most issues trace back to one of four root causes: misconfigured propagation, dropped spans under backpressure, exporter target unreachable, or attribute cardinality blowing out downstream storage.
| Symptom | Cause | Fix |
|---|---|---|
| Spans appear but no parent-child relationship | Propagation header not injected/extracted across services | Confirm `OTEL_PROPAGATORS=tracecontext,baggage`; verify auto-instrumentation covers the HTTP client. |
| Traces split into multiple disconnected trees | Trace context lost across an async boundary | Use `trace.use_span(span)` context manager or carry context manually. |
| SDK dropping spans silently | BatchSpanProcessor queue full | Raise `max_queue_size`; lower `schedule_delay_millis`; investigate exporter throughput. |
| Collector receiver refusing OTLP | TLS, auth, port mismatch | Curl `/v1/traces` directly; check Collector logs for receiver errors. |
| Tail sampler dropping known errors | Policy ordering — probabilistic before status_code | Reorder policies so error / latency policies precede sampling. |
| High cardinality blowing out metrics backend | User ID or request ID set as metric attribute | Move identifier to span attribute (free), drop from metric attributes. |
| LLM prompt visible in shared dashboard | Redaction not applied at the gateway | Add `transform` processor to delete/replace `llm.prompt` before export. |
| Spans missing k8s metadata | k8sattributes processor not enabled or RBAC missing | Add processor; grant ClusterRole over pods, namespaces and deployments. |
| Span timestamps in the future | Container clock drift | Run chrony/NTP on every node; verify with `chronyc tracking`. |
| Exporter retry storm during backend outage | No circuit breaker, infinite retry | Set `retry_on_failure.max_elapsed_time`; pair with bounded queue. |
| Trace canary failing intermittently | One of agent/gateway/backend periodically slow | Run canary at each hop; identify which leg is slow. |
| Tenant push to Yobitel returns 401 | Tenant token expired or scoped to wrong tenant | Rotate via the Yobibyte console; verify `authorization: Bearer` header on exporter. |
| OpenInference attributes missing | Standard auto-instrumentation but no openinference instrumentor | Install and call `openinference.instrumentation.openai.OpenAIInstrumentor().instrument()`. |
| Token counts wrong on streamed responses | Auto-instrumentation captures usage only on final chunk | Confirm stream completion handler runs; some SDK versions need manual span finalisation. |
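The "SDK dropping spans silently" row is easier to reason about with a toy model of a bounded queue. The sketch below is not the SDK's `BatchSpanProcessor`; `BoundedSpanQueue` is a hypothetical stand-in that only mimics the relationship between `max_queue_size`, export batches and dropped spans. It assumes the newest span is discarded when the queue is full; which span is discarded can differ between SDK implementations.

```python
from collections import deque


class BoundedSpanQueue:
    """Toy model of a batch processor queue with a hard size limit."""

    def __init__(self, max_queue_size: int) -> None:
        self.max_queue_size = max_queue_size
        self._queue: deque[str] = deque()
        self.dropped = 0  # counter to alert on instead of dropping silently

    def on_end(self, span_name: str) -> bool:
        """Enqueue a finished span; return False if it had to be dropped."""
        if len(self._queue) >= self.max_queue_size:
            self.dropped += 1
            return False
        self._queue.append(span_name)
        return True

    def export_batch(self, max_export_batch_size: int) -> list[str]:
        """Remove and return up to max_export_batch_size spans, oldest first."""
        batch: list[str] = []
        while self._queue and len(batch) < max_export_batch_size:
            batch.append(self._queue.popleft())
        return batch


if __name__ == "__main__":
    queue = BoundedSpanQueue(max_queue_size=3)
    for i in range(5):
        queue.on_end(f"span-{i}")
    print("dropped:", queue.dropped)
    print("exported:", queue.export_batch(2))
```

The model makes the fix in the table concrete: drops stop only when the exporter drains the queue at least as fast as spans arrive, so raising `max_queue_size` buys headroom for bursts but does not cure a slow exporter.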
Where this fits in the Yobitel stack#
OpenTelemetry is the tracing and structured-telemetry layer Yobitel runs internally and the integration surface Yobitel publishes to customers. Every Yobibyte inference replica, fine-tune workflow and marketplace deployment is OTel-instrumented under the OpenInference / OpenLLMetry conventions. Spans carry the OpenAI-compatible request/response shape, retrieval document IDs, model name, region pin and tenant identifier. The Yobitel NeoCloud regional Collector gateway accepts the traces, applies tenant-scoped tail sampling, redaction and enrichment, and exports to the customer's chosen backend: the Yobibyte console with its included trace browser by default, or any OTLP target the customer configures.
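The redaction step mentioned above can be pictured as a pure function over a span's attributes. The sketch below is a hedged illustration only: it is neither the Collector's `transform` processor nor Yobitel's gateway code, and the function name `redact_attributes` and the choice to keep a short SHA-256 fingerprint are assumptions made for the example. The attribute key `llm.prompt` is taken from the troubleshooting table.

```python
import hashlib

# Keys treated as sensitive in this example; a real deployment would configure this list
SENSITIVE_KEYS = frozenset({"llm.prompt"})


def redact_attributes(attributes: dict, sensitive_keys=SENSITIVE_KEYS) -> dict:
    """Return a copy of the attributes with sensitive values replaced.

    A short hash fingerprint is kept so identical prompts can still be
    correlated without exposing their content.
    """
    redacted = {}
    for key, value in attributes.items():
        if key in sensitive_keys:
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]
            redacted[key] = f"[REDACTED sha256:{digest}]"
        else:
            redacted[key] = value
    return redacted


if __name__ == "__main__":
    span_attributes = {"llm.prompt": "What is our Q3 revenue?", "rag.docs": 3}
    print(redact_attributes(span_attributes))
```

Returning a new dictionary rather than mutating the input keeps the original attributes available to any earlier pipeline stage that still needs them.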
Customer-side integration uses one of two paths. Customers running their own Collector tier point an exporter at the Yobitel NeoCloud regional Collector gateway with a tenant-scoped bearer token, and Yobibyte's spans appear nested inside the customer's application traces — a single trace ID from the user's request through the customer's orchestrator into Yobibyte's inference replica and back. Customers without a Collector tier push OTLP directly from their application SDK to the same gateway and rely on the Yobibyte console for the trace UI. Either path preserves W3C trace context across the Yobibyte boundary.
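Preserving W3C trace context across a service boundary comes down to forwarding the `traceparent` header with the same trace ID and a fresh span ID. The sketch below shows that mechanic with the standard library only. In practice the SDK propagators do this automatically; the helper names `new_traceparent` and `child_traceparent` are hypothetical, and only version `00` of the header is handled.

```python
import re
import secrets

# Version 00 layout: version - 32 hex trace-id - 16 hex parent span-id - 2 hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random trace ID and span ID, with the sampled flag set or cleared."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(parent_header: str) -> str:
    """Build the header for an outgoing call: same trace ID, new span ID."""
    match = TRACEPARENT_RE.match(parent_header)
    if match is None:
        raise ValueError(f"malformed traceparent: {parent_header!r}")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    incoming = new_traceparent()
    outgoing = child_traceparent(incoming)
    # Both headers share the trace ID, which is what keeps the trace in one tree
    print(incoming)
    print(outgoing)
```

When spans show up as disconnected trees, comparing the trace ID segment of the header on each hop is a quick way to find the service that started a new trace instead of continuing the incoming one.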
On UK and EU sovereign tenancies the regional Collector gateway never forwards traces cross-region. Tail sampling, redaction and tenant scoping run in-region; the Yobibyte console queries only the in-region Tempo. Sovereign customers under NCSC Cloud Security Principles, G-Cloud 14 OFFICIAL-handling or EU Data Boundary commitments see a one-region trace surface and a documented control boundary. The recipe-protection rule applies: customers see the trace fidelity they need (latency breakdown, retrieval IDs, token counts) without disclosing Yobitel's internal scheduling, admission or routing spans — see the [yobibyte](/knowledge-base/yobibyte) entry for the customer-facing observability surface and the [prometheus](/knowledge-base/prometheus) entry for the parallel metrics path.
References
- OpenTelemetry Documentation · OpenTelemetry Project
- OpenTelemetry on GitHub · GitHub
- OpenTelemetry at the CNCF · Cloud Native Computing Foundation
- OpenTelemetry Specification · OpenTelemetry
- OTLP Specification · OpenTelemetry
- W3C Trace Context · W3C
- OpenInference Semantic Conventions · GitHub (Arize)
- OpenLLMetry Conventions · GitHub (Traceloop)
- OpenTelemetry Collector Components · GitHub