OpenTelemetry

TL;DR

CNCF Incubating (since 2021), Apache 2.0. Formed in 2019 by merging OpenTracing (tracing API) and OpenCensus (Google's metrics + tracing library); the second most active CNCF project after Kubernetes by contributor count.
Defines a vendor-neutral API, SDK and wire protocol (OTLP) for four signals: traces, metrics, logs and (in-progress) profiling. Instrument once, export to any backend that speaks OTLP.
Operational surface is two artefacts: language SDKs (Python, Go, Java, JS/TS, Rust, .NET, Ruby, PHP, C++, Swift) and the Collector — a Go binary with receiver / processor / exporter pipelines for any combination of source and backend.
OTLP wire protocol uses Protocol Buffers over gRPC (port 4317) or HTTP/Protobuf and HTTP/JSON (port 4318). The Collector ships in two main distributions: `otelcol` (core) and `otelcol-contrib`, which bundles 100+ receivers and exporters.
Yobibyte emits OTel traces and metrics for every inference and fine-tune workload under OpenInference / OpenLLMetry semantic conventions; Yobitel NeoCloud regions run Collector gateways that customers can OTLP-push to without standing up their own collector tier.

Overview

OpenTelemetry (OTel) is a CNCF Incubating project that standardises how applications produce telemetry — traces, metrics, logs and profiling — and how that telemetry moves from the application to a backend. It is not a backend itself: OTel ships the instrumentation API, the language SDKs that implement the API, the OTLP wire protocol that carries telemetry over the network and the Collector that routes telemetry between sources and backends. The storage and UI live downstream in Prometheus, Tempo, Loki, Jaeger, Phoenix, Langfuse, Datadog, Honeycomb, New Relic, Grafana Cloud and every major observability vendor.

Before OTel, every observability vendor shipped its own SDK. Instrumenting a service for one backend locked you in; switching meant rewriting the instrumentation. OpenTracing standardised the tracing API but left the implementation to vendors. OpenCensus standardised the implementation but had a smaller ecosystem. In 2019 the two merged under the CNCF as OpenTelemetry with the explicit goal of making instrumentation a write-once, deploy-anywhere concern. The bet has played out: every major observability vendor accepts OTLP natively, and several now recommend OTel instrumentation over their proprietary SDKs.

On AI infrastructure OTel matters because the request flow is inherently multi-component. A single LLM request crosses an API gateway, an orchestrator, a retrieval step (embedding + vector DB lookup), a model-server call (vLLM or TensorRT-LLM), tool invocations and a response post-processor. Without trace context propagation those become disconnected log lines; with OTel they become one trace with parent-child spans, latency attribution and a trace ID that links from a Prometheus exemplar to a Jaeger waterfall to a Phoenix evaluation view.

Yobibyte instruments every inference and fine-tune workload with OTel under the OpenInference / OpenLLMetry semantic conventions and emits OTLP traces, metrics and logs back to the customer's chosen backend. The Yobitel NeoCloud regional Collector gateway accepts customer OTLP pushes — useful for application-side instrumentation that needs to enrich and sample before egress — and the regional control plane itself is OTel-instrumented for Yobitel's internal SRE rota.

This entry helps you instrument an AI application correctly the first time, choose between sidecar / DaemonSet / gateway Collector topologies, apply OpenInference semantic conventions to LLM spans, and integrate the result with Yobibyte's managed observability surface and the Yobitel NeoCloud Collector gateway.

Quick start

The example below instruments a Python LLM application with OTel auto-instrumentation for the OpenAI SDK, then deploys the OTel Collector on Kubernetes with the Helm chart as a DaemonSet agent plus a Deployment gateway. The third block points an OTLP exporter at a Yobitel NeoCloud regional Collector gateway. Export to Prometheus, Tempo and Loki is configured on the gateway; a full gateway configuration appears under Workload patterns.

# 1. Python application: auto-instrument the OpenAI SDK
# opentelemetry-distro supplies the default configuration used by opentelemetry-instrument
pip install \
  opentelemetry-api \
  opentelemetry-sdk \
  opentelemetry-distro \
  opentelemetry-exporter-otlp \
  opentelemetry-instrumentation-openai \
  openinference-instrumentation-openai

# Run with auto-instrumentation
OTEL_SERVICE_NAME=rag-app \
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.observability:4317 \
OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod,k8s.cluster.name=london-1 \
OTEL_TRACES_SAMPLER=parentbased_traceidratio \
OTEL_TRACES_SAMPLER_ARG=0.1 \
opentelemetry-instrument python app.py

# 2. Deploy the OTel Collector with the Helm chart: DaemonSet agent + Deployment gateway
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install otel-collector open-telemetry/opentelemetry-collector \
    --namespace observability --create-namespace \
    --set mode=daemonset \
    --set image.repository=otel/opentelemetry-collector-contrib \
    --set presets.kubernetesAttributes.enabled=true \
    --set presets.kubeletMetrics.enabled=true \
    --set presets.hostMetrics.enabled=true

# Gateway tier — central collector for sampling + export
helm upgrade --install otel-gateway open-telemetry/opentelemetry-collector \
    --namespace observability \
    --set mode=deployment \
    --set image.repository=otel/opentelemetry-collector-contrib \
    --set replicaCount=3

# 3. Push OTLP to a Yobitel NeoCloud regional Collector gateway
# (illustrative env var)
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.london-1.yobitel.com:4317
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer $YOBITEL_TENANT_TOKEN"

Tip: Always set OTEL_RESOURCE_ATTRIBUTES with service.name, service.version, deployment.environment and k8s.cluster.name at the very minimum. Spans without identifying resource attributes are operationally useless: you lose the join from a trace to the workload, the pod, the cluster and the tenant.
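One way to enforce the tip above is to build the variable from a dictionary and fail fast when an identifying key is missing. The sketch below is illustrative only: `build_resource_attributes` and `REQUIRED_KEYS` are hypothetical names, and the cluster key uses `k8s.cluster.name`, the semantic-convention spelling of the cluster attribute. Values are percent-encoded so that a comma or equals sign cannot break the `key=value` list.

```python
from urllib.parse import quote

# Hypothetical minimum set, following the tip above.
REQUIRED_KEYS = (
    "service.name",
    "service.version",
    "deployment.environment",
    "k8s.cluster.name",
)


def build_resource_attributes(attrs: dict) -> str:
    """Return a string suitable for the OTEL_RESOURCE_ATTRIBUTES variable."""
    missing = [key for key in REQUIRED_KEYS if not attrs.get(key)]
    if missing:
        raise ValueError("missing resource attributes: " + ", ".join(missing))
    # Percent-encode values so ',' and '=' cannot split the list incorrectly.
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in attrs.items()
    )


example = build_resource_attributes({
    "service.name": "rag-app",
    "service.version": "1.4.2",
    "deployment.environment": "prod",
    "k8s.cluster.name": "london-1",
})
print(example)
```

A deployment script could export the returned string before launching the application, so a missing attribute fails the rollout instead of producing anonymous spans.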

How it works

OTel separates four concerns: the API (what the application calls), the SDK (the in-process implementation), the wire protocol (OTLP — how telemetry leaves the process) and the Collector (how telemetry is routed, processed and exported). The API is intentionally minimal — start a span, record a metric, emit a log — so application code is portable across SDK implementations. The SDK adds the production machinery: span processors, batch exporters, samplers, resource detection, propagation across process boundaries.

Each signal has its own data model. A trace is a tree of spans sharing a 128-bit trace ID; each span has a 64-bit span ID, a parent span ID, a name, a start and end timestamp, a status, a kind (SERVER, CLIENT, INTERNAL, PRODUCER, CONSUMER) and an arbitrary attribute map. Metrics come in four instrument kinds (counter, up-down counter, histogram, gauge), with asynchronous (observable) variants of the counter, up-down counter and gauge; each has an attribute set and an aggregation temporality (delta or cumulative). Logs in OTel are timestamped severity records with attributes and a trace context, designed to coexist with traces rather than replace structured-log frameworks.

Context propagation is the mechanism that knits cross-process spans into one trace. The W3C Trace Context standard defines a traceparent HTTP header (00-<trace-id>-<span-id>-<flags>) and a tracestate header for vendor extensions; the OTel SDK auto-injects these on outbound HTTP, gRPC, Kafka and AWS SDK calls when the respective instrumentation is active. On the receiving side the SDK extracts the headers and makes the upstream span the parent of the new server-side span.
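To make the header format concrete, here is a minimal standard-library sketch of what a propagator does with `traceparent`: parse the incoming header, reject invalid values, and build the header for an outbound call that keeps the trace ID but carries a new span ID. The names `TraceParent`, `parse_traceparent` and `child_traceparent` are hypothetical, and the sketch only handles the fixed-length version `00` layout; real SDK propagators cover more edge cases.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version - trace-id (32 hex) - parent span id (16 hex) - flags (2 hex)
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per W3C Trace Context.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: TraceParent) -> str:
    """Build the header for an outbound call: same trace ID, new span ID."""
    span_id = secrets.token_hex(8)  # 64-bit span ID as 16 hex characters
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(child_traceparent(parse_traceparent(incoming)))
```

When spans from two services fail to join into one trace, logging the inbound and outbound header at each hop and comparing the trace ID segment is usually the fastest check.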

OTLP (OpenTelemetry Protocol) is the wire format. Protocol Buffers schemas define TracesData, MetricsData, LogsData and (in-progress) ProfilesData messages; transports are gRPC on port 4317 (preferred) and HTTP/Protobuf or HTTP/JSON on port 4318. Because every signal shares one wire format, a single Collector pipeline can fan signals out to different backends — Prometheus for metrics, Tempo for traces, Loki for logs — without changing any application code.
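As a rough illustration of the wire format, the sketch below builds a one-span OTLP/HTTP JSON payload by hand and prepares a POST to the `/v1/traces` path on port 4318. It is a debugging aid under stated assumptions, not a replacement for an SDK exporter: the function names, the scope name `manual-example` and the `localhost` endpoint are hypothetical, and every attribute is encoded as a string value for brevity.

```python
import json
import secrets
import time
import urllib.request


def _kv(key: str, value) -> dict:
    # Simplification: encode every attribute as a string value.
    return {"key": key, "value": {"stringValue": str(value)}}


def build_otlp_trace_payload(service_name: str, span_name: str,
                             duration_ms: float, attributes: dict) -> dict:
    """Build a TracesData message in OTLP JSON encoding with one span."""
    end_ns = time.time_ns()
    start_ns = end_ns - int(duration_ms * 1_000_000)
    span = {
        "traceId": secrets.token_hex(16),  # 128-bit trace ID, hex encoded
        "spanId": secrets.token_hex(8),    # 64-bit span ID, hex encoded
        "name": span_name,
        "kind": 1,                         # SPAN_KIND_INTERNAL
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": [_kv(k, v) for k, v in attributes.items()],
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [_kv("service.name", service_name)]},
            "scopeSpans": [{"scope": {"name": "manual-example"}, "spans": [span]}],
        }]
    }


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the OTLP/HTTP JSON request."""
    return urllib.request.Request(
        endpoint.rstrip("/") + "/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = build_otlp_trace_payload("rag-app", "rag.retrieve", 12.5, {"retrieval.top_k": 5})
request = build_request("http://localhost:4318", payload)
# urllib.request.urlopen(request, timeout=5)  # uncomment with a Collector listening
```

Sending a hand-built span like this is a quick way to confirm that a Collector's OTLP/HTTP receiver is reachable before debugging SDK configuration.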

The Collector is the most underrated part of OTel. It is a Go binary with three pluggable stages: receivers (OTLP, Prometheus scrape, Jaeger, Zipkin, Fluentd, Kafka, AWS Firehose, 80+ more), processors (batch, memory limiter, attributes, k8sattributes, resource, tail_sampling, transform, redaction) and exporters (OTLP to another Collector, Prometheus remote_write, Tempo, Loki, Jaeger, every major vendor). Pipelines are wired per-signal — you can have one metrics pipeline, two traces pipelines and three logs pipelines in one Collector process.
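Because pipelines reference components by name, a small pre-deployment lint can catch wiring mistakes before a rollout. The sketch below assumes the Collector configuration has already been loaded into a Python dictionary (for example from YAML); `lint_pipelines` is a hypothetical helper, and its two ordering checks encode common guidance (memory limiter first, batching after tail sampling) rather than rules the Collector itself enforces.

```python
def lint_pipelines(config: dict) -> list:
    """Return human-readable problems found in service.pipelines."""
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for name, pipeline in pipelines.items():
        # Every component a pipeline references must be defined at top level.
        for kind in ("receivers", "processors", "exporters"):
            defined = config.get(kind) or {}
            for component in pipeline.get(kind, []):
                if component not in defined:
                    problems.append(
                        f"{name}: {kind[:-1]} '{component}' is not defined"
                    )
        processors = pipeline.get("processors", [])
        # Guidance check: the memory limiter should see data first.
        if "memory_limiter" in processors and processors[0] != "memory_limiter":
            problems.append(f"{name}: memory_limiter should be the first processor")
        # Guidance check: batch after sampling so dropped spans are not batched.
        if "batch" in processors and "tail_sampling" in processors:
            if processors.index("batch") < processors.index("tail_sampling"):
                problems.append(f"{name}: batch should come after tail_sampling")
    return problems


example_config = {
    "receivers": {"otlp": {}},
    "processors": {"memory_limiter": {}, "batch": {}, "tail_sampling": {}},
    "exporters": {"otlp/tempo": {}},
    "service": {"pipelines": {
        "traces": {
            "receivers": ["otlp"],
            "processors": ["batch", "memory_limiter", "tail_sampling"],
            "exporters": ["otlp/tempo", "loki"],
        },
    }},
}
for problem in lint_pipelines(example_config):
    print(problem)
```

Running a check like this in CI is cheaper than discovering an undefined exporter when the Collector refuses to start.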

API → SDK → Exporter → Collector → Backend — the standard pipeline; each step is independently swappable.
Trace context propagation: W3C traceparent HTTP header injected by auto-instrumentation on outbound calls.
Sampling: head-based at the SDK (parentbased_traceidratio) or tail-based at the gateway Collector (tail_sampling processor).
Resource attributes: service.name, service.version, deployment.environment, k8s.*, cloud.*, gpu.uuid — stamped on every signal from a process.
Semantic conventions: standard attribute names — http.method, db.system, messaging.system, gen_ai.system, gen_ai.usage.input_tokens.
OpenInference / OpenLLMetry: LLM-specific conventions — llm.prompt, llm.completion, llm.token_count.completion, llm.model_name, retrieval.documents.
Collector topology: agent (DaemonSet, near app) + gateway (Deployment, centralised) is the standard production shape.
Tail-based sampling: keep all error traces, slow traces and statistically interesting traces; drop the boring majority — typically cuts trace storage by 90 percent.
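The head-based option in the list above can be sketched in a few lines. The idea is that the keep/drop decision is a deterministic function of the trace ID, so every service that sees the same trace ID reaches the same answer, and a parent-based wrapper simply honours whatever the caller already decided. The function names are hypothetical, and the threshold arithmetic is one common way to implement a ratio sampler, not necessarily identical to any particular SDK.

```python
from typing import Optional

_LOWER_64_MASK = (1 << 64) - 1


def ratio_decision(trace_id: int, ratio: float) -> bool:
    """Keep a trace if the low 64 bits of its ID fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << 64))
    return (trace_id & _LOWER_64_MASK) < bound


def parent_based_decision(trace_id: int, ratio: float,
                          parent_sampled: Optional[bool]) -> bool:
    """Honour the upstream decision; fall back to the ratio for root spans."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_decision(trace_id, ratio)


trace_id = int("4bf92f3577b34da6a3ce929d0e0e4736", 16)
print(parent_based_decision(trace_id, 0.1, parent_sampled=None))
```

Because the decision depends only on the trace ID, head sampling needs no buffering; the trade-off is that it cannot know whether the trace will later turn out to be slow or failed, which is the gap tail sampling fills.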

Reference and specifications

The reference below documents the OTel surface that an AI-infrastructure engineer touches most. The full specification is much larger — propagation formats, exemplars, log data model, metric aggregation rules — but the table covers the API artefacts, environment variables, OTLP wire details, Collector building blocks and the LLM-specific semantic conventions that matter when instrumenting a vLLM-backed RAG application or a Yobibyte fine-tune workflow.

Artefact	Type	Purpose
`Tracer.start_as_current_span(name)`	API	Open a span and make it the active context.
`Span.set_attribute(key, value)`	API	Attach a key-value attribute to the current span.
`Span.record_exception(exc)`	API	Record an exception as a span event; set the error status separately with `Span.set_status`.
`Meter.create_counter / create_histogram`	API	Construct a metric instrument.
`Counter.add(value, attributes)`	API	Record a counter increment with dimension attributes.
`Histogram.record(value, attributes)`	API	Record an observation in a histogram instrument.
`BatchSpanProcessor`	SDK	Buffer spans and flush in batches to the exporter.
`TraceIdRatioBased(rate)` sampler	SDK	Head-based sampling — keep `rate` fraction of trace IDs.
`ParentBased` sampler	SDK	Honour the upstream sampling decision and delegate root spans to a wrapped sampler; the SDK default is `ParentBased` around `AlwaysOn`.
`OTEL_SERVICE_NAME`	env	Sets `service.name` resource attribute.
`OTEL_EXPORTER_OTLP_ENDPOINT`	env	OTLP target — gRPC `:4317` or HTTP `:4318`.
`OTEL_EXPORTER_OTLP_PROTOCOL`	env	`grpc`, `http/protobuf` or `http/json`; the default varies by SDK, so set it explicitly.
`OTEL_RESOURCE_ATTRIBUTES`	env	Comma-separated resource attributes.
`OTEL_TRACES_SAMPLER`	env	Sampler name; the SDK default is `parentbased_always_on`, and `parentbased_traceidratio` is a common production choice.
`OTEL_TRACES_SAMPLER_ARG`	env	Sampling ratio — `0.1` keeps 10% of new traces.
`OTEL_PROPAGATORS`	env	`tracecontext,baggage` is the W3C default.
`OTEL_LOG_LEVEL`	env	SDK self-log verbosity.
OTLP/gRPC port 4317	wire	Protocol Buffers over gRPC — preferred transport.
OTLP/HTTP port 4318	wire	Protocol Buffers (or JSON) over HTTP — firewall-friendly.
W3C `traceparent` header	propagation	`00-<trace-id>-<span-id>-<flags>`; identifies the parent span across processes.
W3C `tracestate` header	propagation	Vendor-specific routing data; stays opaque to most middleware.
`receivers.otlp` (Collector)	Collector	Accept OTLP over gRPC and/or HTTP.
`receivers.prometheus` (Collector)	Collector	Scrape Prometheus targets and convert to OTLP metrics.
`processors.batch`	Collector	Buffer and flush — always include in production pipelines.
`processors.k8sattributes`	Collector	Enrich spans with `k8s.pod.*`, `k8s.namespace.*`, `k8s.deployment.*`.
`processors.tail_sampling`	Collector	Per-trace sampling at the gateway based on errors / latency / attributes.
`processors.transform`	Collector	OTTL — OpenTelemetry Transformation Language for span/metric rewriting.
`exporters.otlp`	Collector	Forward to another Collector or any OTLP backend.
`exporters.prometheusremotewrite`	Collector	Send OTel metrics to Prometheus.
`gen_ai.system` attribute	semconv	LLM provider name — `openai`, `anthropic`, `yobibyte`.
`gen_ai.request.model`	semconv	Model identifier — `gpt-4o`, `llama-3.1-70b-instruct`.
`gen_ai.usage.input_tokens` / `output_tokens`	semconv	Token counts — basis of cost attribution.
`gen_ai.response.finish_reasons`	semconv	`stop` / `length` / `content_filter` etc.
OpenInference `llm.prompt`	semconv	Full prompt content (subject to redaction).
OpenInference `retrieval.documents`	semconv	Retrieved document IDs and scores for RAG spans.

Warning: Avoid putting raw prompts or completions in span attributes without a redaction processor. The OpenInference convention names exist (llm.prompt, llm.completion) but leaking customer prompts to a third-party tracing vendor is a routine compliance incident. Use the Collector's transform or redaction processor to strip or hash sensitive fields before egress — see the Security section.
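Redaction can also happen in the application before the span is ever populated. The sketch below replaces sensitive attribute values with a keyed hash and a length, which keeps traces correlatable (the same prompt yields the same digest) without exporting the text. Everything here is an assumption for illustration: the helper name, the `.sha256` and `.length` attribute suffixes and the choice of HMAC-SHA256 are not part of OpenInference or any OTel convention.

```python
import hashlib
import hmac

# Attribute names the warning above singles out.
SENSITIVE_KEYS = ("llm.prompt", "llm.completion")


def redact_attributes(attributes: dict, secret: bytes,
                      sensitive_keys=SENSITIVE_KEYS) -> dict:
    """Return a copy with sensitive values replaced by a keyed hash and length."""
    redacted = dict(attributes)  # never mutate the caller's dictionary
    for key in sensitive_keys:
        if key in redacted:
            raw = str(redacted.pop(key)).encode("utf-8")
            # A keyed hash resists dictionary attacks on short prompts.
            digest = hmac.new(secret, raw, hashlib.sha256).hexdigest()
            redacted[key + ".sha256"] = digest
            redacted[key + ".length"] = len(raw)
    return redacted


safe = redact_attributes(
    {"llm.prompt": "my email is a@example.com", "llm.model_name": "gpt-4o"},
    secret=b"rotate-me",
)
print(sorted(safe))
```

The returned dictionary can be passed to `Span.set_attribute` key by key; Collector-side redaction remains useful as a second line of defence for services that skip this step.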

Workload patterns

Three workload shapes cover the bulk of OTel deployments on AI infrastructure: a single-service Python application emitting traces directly to a Collector gateway, a multi-tenant Kubernetes cluster with agent-plus-gateway Collector topology, and a customer pushing OTLP into a Yobitel NeoCloud regional Collector gateway. Each pattern has a different propagation, sampling and processor profile.

Pattern A — single Python service, head-based sampling, direct push. Application uses OTel SDK with auto-instrumentation for the OpenAI SDK, FastAPI, requests, httpx, SQLAlchemy and Redis plus manual spans for orchestration. OTEL_TRACES_SAMPLER=parentbased_traceidratio with a 10 percent ratio. Direct OTLP push to a single Collector gateway that handles enrichment and export. This is the right shape for a small RAG service or a development cluster — minimal infrastructure, easy to debug.

Pattern B — agent + gateway on Kubernetes. The OpenTelemetry Operator runs a DaemonSet agent Collector on every node performing local enrichment (k8sattributes, hostmetrics, kubeletstats) and a Deployment gateway Collector tier (3-5 replicas) performing tail-based sampling, redaction and export. Applications push OTLP to the local node-agent (low-latency, no cross-zone hop); the agent forwards to the gateway via OTLP/gRPC; the gateway fans out to Tempo, Prometheus, Loki and the LLM observability backend (Langfuse, Phoenix or Helicone). This is the production shape on every Yobitel NeoCloud region and the shape we recommend customers adopt for medium-to-large AI workloads.

Pattern C — push OTLP into a Yobibyte / NeoCloud regional Collector gateway. The customer runs application instrumentation but does not stand up their own Collector tier. The OTel SDK exports OTLP/gRPC to otel.<region>.yobitel.com:4317 with a tenant-scoped bearer token. Yobitel's regional gateway accepts the push, enriches with NeoCloud-side attributes (gpu.uuid, nvidia.gpu.model), applies tenant-scoped tail sampling and forwards to the customer's configured backend — by default the Yobibyte console's trace browser, but customers can route to Datadog, Honeycomb or any OTLP target. This is the recommended integration for customers who want immediate trace visibility without operating their own Collector tier.

# Gateway Collector — tail sampling, k8s enrichment, multi-backend export
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: gateway
  namespace: observability
spec:
  mode: deployment
  replicas: 3
  image: otel/opentelemetry-collector-contrib:0.110.0
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 5s
        send_batch_size: 8192
      memory_limiter:
        check_interval: 2s
        limit_percentage: 80
        spike_limit_percentage: 25
      k8sattributes:
        auth_type: serviceAccount
        passthrough: false
        extract:
          metadata: [k8s.namespace.name, k8s.pod.name, k8s.deployment.name, k8s.node.name]
      resource:
        attributes:
          - key: deployment.environment
            value: prod
            action: insert
      transform/redact:
        trace_statements:
          - context: span
            statements:
              - replace_pattern(attributes["llm.prompt"], "(?i)email[^ ]+", "<redacted>")
              - delete_key(attributes, "llm.completion") where attributes["customer.tier"] == "regulated"
      tail_sampling:
        decision_wait: 10s
        num_traces: 100000
        policies:
          - { name: errors, type: status_code, status_code: { status_codes: [ERROR] } }
          - { name: slow, type: latency, latency: { threshold_ms: 1000 } }
          - { name: sample, type: probabilistic, probabilistic: { sampling_percentage: 10 } }
    exporters:
      otlp/tempo:
        endpoint: tempo:4317
        tls: { insecure: true }
      prometheusremotewrite:
        endpoint: http://prometheus:9090/api/v1/write
      loki:
        endpoint: http://loki:3100/loki/api/v1/push
      otlp/yobibyte:
        endpoint: otel.london-1.yobitel.com:4317
        headers: { authorization: "Bearer ${env:YOBITEL_TENANT_TOKEN}" }
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resource, transform/redact, tail_sampling, batch]
          exporters: [otlp/tempo, otlp/yobibyte]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resource, batch]
          exporters: [prometheusremotewrite]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resource, batch]
          exporters: [loki]

Tip: In production, run the OTel Collector as a DaemonSet plus a Deployment gateway. The DaemonSet collects local node telemetry with low latency; the gateway centralises sampling and export. A single gateway (Pattern A) is enough for small or development setups, but avoid letting application pods talk directly to the trace backend: the indirection is what makes tail sampling, redaction and vendor switching cheap.
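The three tail-sampling policies in the gateway configuration (errors, slow traces, a probabilistic remainder) amount to a short decision function evaluated once all spans of a trace have arrived. The sketch below mirrors that logic in plain Python. `SpanSummary` and `tail_sampling_decision` are hypothetical names, trace duration is taken as latest end minus earliest start, and the random draw is injectable so the behaviour is testable; the real processor's internals differ.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class SpanSummary:
    start_ms: float
    end_ms: float
    is_error: bool = False


def tail_sampling_decision(
    spans: Sequence[SpanSummary],
    latency_threshold_ms: float = 1000.0,
    sampling_percentage: float = 10.0,
    rng: Callable[[], float] = random.random,
) -> Optional[str]:
    """Return the name of the policy that keeps the trace, or None to drop it."""
    if not spans:
        return None
    # Policy 1: keep every trace that contains an error span.
    if any(span.is_error for span in spans):
        return "errors"
    # Policy 2: keep traces whose end-to-end duration exceeds the threshold.
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration > latency_threshold_ms:
        return "slow"
    # Policy 3: keep a fixed percentage of the remaining traces.
    if rng() * 100.0 < sampling_percentage:
        return "sample"
    return None


trace = [SpanSummary(0, 900), SpanSummary(800, 1500)]
print(tail_sampling_decision(trace))
```

The decision needs the whole trace, which is why the processor has to hold spans for `decision_wait` before choosing.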

Sizing and capacity planning

OTel sizing is governed by span rate, span size, processor cost (especially tail sampling), batch/flush configuration and the number of exporter destinations. As a planning anchor, a typical LLM application produces 5-20 spans per user request (gateway → orchestrator → 1-3 retrieval spans → 1 LLM call → response post-processor) and each span averages 1-3 KB after attributes. The table below maps representative workload sizes onto Collector footprint at agent and gateway tiers, plus the downstream tracing-store impact.

On Yobitel NeoCloud the regional gateway Collector tier runs as a 3-replica HPA-backed Deployment per region, sized for the regional inference and fine-tune traffic with tail sampling at 10 percent of clean traces, 100 percent of error traces and 100 percent of traces above 1 s latency. This typically cuts downstream Tempo storage by 90 percent without losing investigable incidents — the same pattern we recommend for customer-side gateway deployments.

Batch flush: 5 s timeout, 8,192 spans per batch on the gateway; 2 s / 1,024 spans on the agent. These are tuned values; the batch processor's own default timeout is 200 ms.
Memory limiter — always include memory_limiter processor first; it back-pressures the receiver before OOM.
Sampling — head-based at 10 percent on the SDK, tail-based at the gateway for error and slow-trace retention.
Resource — ~3 KB RAM per buffered span pre-export; pre-sample buffer dominates gateway sizing.
Network — OTLP/gRPC is roughly 4-6x more efficient than OTLP/HTTP-JSON; prefer gRPC where the network allows.
Yobitel NeoCloud gateway anchor: 3-replica Deployment, 2 vCPU / 4 GB RAM per replica per 5,000 spans/s.
Customer push to Yobibyte: bandwidth caps apply per tenant token — 50 MB/s for standard tiers, configurable up.

Workload	Spans/s	Pre-sample bandwidth	Post-sample bandwidth	Collector RAM (agent / gateway)	Tempo storage (30d)
Single dev RAG app	~10	~50 KB/s	~5 KB/s	100 MB / 256 MB	~3 GB
Small production LLM service	~200	~600 KB/s	~60 KB/s	200 MB / 512 MB	~30 GB
Multi-tenant inference platform (100 RPS)	~2,000	~6 MB/s	~600 KB/s	500 MB × N nodes / 2 GB × 3	~300 GB
Yobitel London-1 region	~10,000	~30 MB/s	~3 MB/s	1 GB × N / 4 GB × 5	~1.5 TB
Yobitel multi-region fleet	~40,000	~120 MB/s	~12 MB/s	Per-region only	~6 TB
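The arithmetic behind the table can be reproduced with a short estimator. The sketch below multiplies span rate by average span size, applies the sampling keep ratio, and converts to stored bytes over the retention window. The 3 KB span size and 10 percent keep ratio come from the text above; the compression ratio of 5 is an assumption chosen because it roughly reproduces the table's storage column, not a figure stated in the source, and `estimate_trace_volume` is a hypothetical helper.

```python
def estimate_trace_volume(
    spans_per_s: float,
    avg_span_bytes: float = 3_000,    # upper end of the 1-3 KB range above
    keep_ratio: float = 0.10,         # share of traffic surviving sampling
    retention_days: int = 30,
    compression_ratio: float = 5.0,   # assumption, not a measured value
) -> dict:
    """Estimate bandwidth before and after sampling, plus stored bytes."""
    pre_sample = spans_per_s * avg_span_bytes
    post_sample = pre_sample * keep_ratio
    stored = post_sample * 86_400 * retention_days / compression_ratio
    return {
        "pre_sample_bytes_per_s": pre_sample,
        "post_sample_bytes_per_s": post_sample,
        "stored_bytes": stored,
    }


# Multi-tenant inference platform row: ~2,000 spans/s.
estimate = estimate_trace_volume(2_000)
print({name: f"{value / 1e6:,.1f} MB" for name, value in estimate.items()})
```

For 2,000 spans/s this gives about 6 MB/s before sampling, 600 KB/s after, and roughly 311 GB over 30 days, in line with the corresponding table row.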

Limits and quotas

OTel itself has very few hard limits — the SDK and Collector are designed to be horizontally scalable. The constraints that matter operationally are span and attribute size caps, sampling-decision memory in tail samplers, exporter queue depth, and downstream backpressure. The table below documents each ceiling and the operational lever for raising it.

Limit	Default	Ceiling	How to raise / work around
Max attributes per span	128	configurable	`OTEL_ATTRIBUTE_COUNT_LIMIT` env or SDK config.
Max attribute value length	unlimited	wire-size bound	`OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT`; redact long prompts.
Max events per span	128	configurable	`OTEL_SPAN_EVENT_COUNT_LIMIT`; prefer attributes.
Max links per span	128	configurable	`OTEL_SPAN_LINK_COUNT_LIMIT`; rarely needed above default.
SDK batch queue size	2,048	memory-bound	`BatchSpanProcessor(max_queue_size=4096)`.
SDK batch flush interval	5 s	n/a	Lower for SLA-critical short-lived spans.
OTLP/gRPC max message	4 MB	configurable	`max_recv_msg_size_mib` under `protocols.grpc` on the OTLP receiver.
Collector receiver buffer	default	memory_limiter	`memory_limiter` processor stops accepting at threshold.
`tail_sampling` `num_traces`	50,000	RAM-bound	Raise `num_traces`; `decision_wait` longer for slow traces.
`tail_sampling` decision_wait	30 s	trace duration	Must exceed max realistic trace duration.
Exporter queue depth	1,000	memory-bound	`sending_queue.queue_size` on each exporter.
Exporter retry on failure	5 s initial interval, 300 s max elapsed	n/a	`retry_on_failure.initial_interval`, `max_interval` and `max_elapsed_time`; circuit-breaker downstream.
OpenInference `llm.prompt` length	unlimited	wire-size + compliance	Redaction processor; convert to hash for audit trail.
Yobitel tenant push rate	50 MB/s standard	Tier-dependent	Higher tiers available via Yobitel sales.

Warning: Tail sampling needs to buffer every span of every active trace in memory until the decision_wait elapses. A noisy service with 10,000 spans/s and 30 s decision_wait holds 300,000 spans in RAM — roughly 1 GB. Either raise gateway RAM, lower decision_wait, or move tail sampling closer to the source.
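The warning's figure follows directly from span rate, wait time and per-span memory, so it is easy to recompute for other workloads. The sketch below does that and also derives a lower bound for `num_traces`. Both function names are hypothetical, the 3 KB per buffered span matches the sizing section, and the 1.5x headroom factor is an assumption for illustration.

```python
import math


def tail_sampling_buffer_bytes(spans_per_s: float, decision_wait_s: float,
                               bytes_per_span: int = 3_000) -> float:
    """Approximate memory held while traces wait for a sampling decision."""
    return spans_per_s * decision_wait_s * bytes_per_span


def min_num_traces(traces_per_s: float, decision_wait_s: float,
                   headroom: float = 1.5) -> int:
    """Lower bound for num_traces so in-flight traces are not evicted early."""
    return math.ceil(traces_per_s * decision_wait_s * headroom)


# The case from the warning: 10,000 spans/s held for 30 s.
buffered = tail_sampling_buffer_bytes(10_000, 30)
print(f"{buffered / 1e9:.1f} GB buffered")

# Assuming ~10 spans per trace, that is ~1,000 traces/s.
print(min_num_traces(1_000, 30))
```

Halving `decision_wait` halves both numbers, which is often cheaper than adding gateway RAM, provided realistic traces still complete within the shorter window.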

Observability

The Collector is itself observable — it exposes Prometheus metrics on port 8888 by default, plus its own OTLP self-telemetry if configured. The metrics below cover the failure modes that account for almost all production Collector incidents, plus the SDK-side telemetry every OTel deployment should alert on.

otelcol_receiver_accepted_spans / _refused_spans: receiver-side throughput and rejection rate.
otelcol_processor_batch_send_size: batch sizes — too small wastes bandwidth, too large blows memory.
otelcol_processor_dropped_spans: dropped spans; investigate immediately, never silently tolerate.
otelcol_exporter_send_failed_spans: downstream exporter failures; pair with retry queue depth.
otelcol_processor_memory_limiter_state: 1 is healthy, 2 indicates limiter is actively dropping.
otelcol_process_runtime_total_alloc_bytes: cumulative Go heap allocation; watch its rate as an allocation-pressure signal and pair with the limiter for OOM defence.
SDK-side otel.sdk.span_processor.spans_processed: spans flushed by the BatchSpanProcessor.
SDK-side otel.sdk.exporter.export_calls: exporter activity and failure rate.
Trace round-trip canary: synthetic trace every 30 s end-to-end through agent + gateway + backend.

# Prometheus alerting rules for OTel Collector health
groups:
  - name: otel-collector
    interval: 30s
    rules:
      - alert: OTelCollectorDroppingSpans
        expr: rate(otelcol_processor_dropped_spans_total[5m]) > 0
        for: 5m
        labels: { severity: critical, team: observability }
        annotations:
          summary: "Collector {{ $labels.instance }} dropping spans — back-pressure or config error"

      - alert: OTelCollectorMemoryLimiterTripped
        expr: otelcol_processor_memory_limiter_state == 2
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Collector memory limiter forcing GC/drop — scale up or sample harder"

      - alert: OTelCollectorExporterFailed
        expr: rate(otelcol_exporter_send_failed_spans_total[10m]) > 10
        for: 10m
        labels: { severity: warning }
        annotations:
          summary: "Exporter {{ $labels.exporter }} failing — backend unreachable?"

      - alert: OTelCollectorExporterQueueFull
        expr: otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.9
        for: 10m
        labels: { severity: warning }
        annotations:
          summary: "Exporter {{ $labels.exporter }} queue >90% — downstream slow"

      - alert: OTelCollectorReceiverRefusing
        expr: rate(otelcol_receiver_refused_spans_total[10m]) > 0
        for: 10m
        labels: { severity: critical }
        annotations:
          summary: "Receiver refusing spans — config, TLS, or auth error"

      - alert: OTelTraceCanaryFailed
        expr: synthetic_trace_roundtrip_seconds > 60
        for: 10m
        labels: { severity: critical }
        annotations:
          summary: "Synthetic trace not round-tripping — collector or backend down"

Tip: Add a synthetic trace canary every 30 s — emit a tiny trace from a known service and verify it appears in your tracing store within a minute. This is the only alert that catches "collector running but not delivering" failure modes, which the per-component metrics often miss.
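A canary of this kind reduces to "emit, then poll until found or timed out". The sketch below keeps the two integration points abstract: `emit` is any callable that sends a trace and returns its ID, and `trace_exists` is any callable that asks the tracing store whether that ID has arrived. Both are hypothetical hooks supplied by the caller, as is the `run_canary` name; the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional


def run_canary(
    emit: Callable[[], str],
    trace_exists: Callable[[str], bool],
    timeout_s: float = 60.0,
    poll_interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Return the round-trip time in seconds, or None if the trace never arrived."""
    started = clock()
    trace_id = emit()
    while clock() - started < timeout_s:
        if trace_exists(trace_id):
            return clock() - started
        sleep(poll_interval_s)
    return None


# Stand-in hooks for illustration: the trace "arrives" on the second lookup.
lookups = []
result = run_canary(
    emit=lambda: "4bf92f3577b34da6a3ce929d0e0e4736",
    trace_exists=lambda trace_id: lookups.append(trace_id) or len(lookups) >= 2,
    poll_interval_s=0.01,
)
print(result is not None)
```

The returned duration could be published as the `synthetic_trace_roundtrip_seconds` gauge used by the canary alert, with a timeout reported as a value above the alert threshold.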

Cost and FinOps

OTel itself is free under Apache 2.0; there is no licence cost. The operational cost is the Collector compute footprint plus the downstream tracing, metrics and logs stores, and the bandwidth between them. Tail sampling is the single biggest lever: a well-tuned tail sampler typically cuts trace storage cost by 90 percent without losing investigable traces. The table below puts the self-hosted and vendor options in USD terms for representative AI workloads.

Collector compute: ~$50-100/month per 3-replica gateway on a small VM tier; scales linearly with span rate.
Tempo / Jaeger self-hosted: ~$0.025/GB-month on object storage — trace storage is the dominant cost.
Datadog APM: ~$31/host/month or per-span usage tier; tail sampling is essential to control cost.
Honeycomb: per-event pricing; tail sampling at the Collector keeps event count predictable.
Yobitel NeoCloud: regional Collector gateway + Tempo + Loki are included in the GPU rate; no per-span fee.
Yobibyte managed observability: OTel push endpoint and 30-day trace retention included in the workspace fee.
FinOps wedge: redact or hash llm.prompt and llm.completion before export to per-event-priced vendors — prompt/completion bytes are the single biggest cost lever on LLM traces.

Workload	Spans/s pre-sample	Collector + Tempo (self-hosted, monthly USD)	Datadog APM (monthly USD)	Honeycomb (monthly USD)	Yobitel NeoCloud
Single dev RAG app	~10	~$25	~$60	~$0 (free tier)	Included via Yobibyte
Small production LLM service	~200	~$200	~$600	~$130	Included via Yobibyte
Multi-tenant inference platform	~2,000	~$1,400	~$5,500	~$1,300	Included via Yobibyte
Yobitel London-1 region	~10,000	~$7,000	~$28,000	~$6,500	Yobitel-operated
Push to Yobitel NeoCloud Collector	varies	~$25 (small Collector)	n/a	n/a	Federation/push included

Security and compliance

The Collector does not authenticate callers by default. TLS (including mutual TLS) can be enabled on receivers and exporters, and bearer-token authentication is added through authenticator extensions such as `bearertokenauth`. Without them the Collector accepts arbitrary OTLP from any caller that can reach the port; in production it is usually fronted by NetworkPolicy, an Envoy/Istio mesh, or a public-facing reverse proxy with mTLS. The Yobitel NeoCloud regional gateway requires per-tenant bearer tokens scoped to that tenant; the same token shape is documented for customer-side Collector ingress.

The most sensitive OTel field on AI workloads is the LLM prompt and completion. The OpenInference convention provides llm.prompt and llm.completion attribute names, but populating them sends raw user input to whatever trace backend you have configured. For UK public-sector workloads (NCSC Cloud Security Principles, G-Cloud 14, OFFICIAL-handling) and EU Data Boundary commitments, leaking prompt content cross-region is a documented control breach. The Collector's transform and redaction processors strip or hash the prompt/completion before egress; both are configurable per-tenant on the Yobitel NeoCloud regional gateway.

Yobibyte's customer-facing OTel surface enforces three controls. Per-tenant API tokens limit OTLP push and federation to that tenant's scope. The regional Collector gateway applies a default redaction pipeline that hashes llm.prompt and llm.completion for tenants on regulated tiers; raw content stays inside the sovereign tenancy. All cross-tenant export paths are signed and rate-limited per token. Customers see enough trace fidelity to debug their own workloads — span tree, latency breakdown, retrieval document IDs, token counts — without raw payload escaping the sovereignty boundary.

Warning: Treat llm.prompt and llm.completion as PII by default. Even when you control the trace backend, the same span will end up on a developer's laptop the first time they investigate a slow request. Redact or hash at the Collector before export and decide explicitly which tenants and tiers are allowed raw prompt visibility.

Migration and alternatives

Most migrations to OTel come from one of four origins: vendor-specific tracing SDKs (Datadog APM SDK, New Relic SDK, AWS X-Ray SDK), Jaeger or Zipkin direct instrumentation, legacy OpenTracing or OpenCensus code, or no tracing at all. The table below documents the trade-offs of each migration path. For green-field LLM applications, install the OTel SDK plus auto-instrumentation and skip the alternatives — every backend you might want to use already accepts OTLP.

Vendor SDK migrations are typically straightforward: the OTel SDK is the long-term direction every vendor publicly endorses, and most vendor SDKs offer an OTel-compatible mode that emits OTLP alongside the proprietary protocol. Run both for one quarter, confirm trace fidelity, then drop the proprietary SDK.

Migration source	Effort	What you gain	What you lose
Datadog APM SDK	Low — auto-instrumentation overlaps heavily	Vendor portability, open semconv, no per-host fee for Collector	Datadog-specific dashboards re-built on Tempo or kept via OTLP export to Datadog
New Relic SDK	Low	Same	Same
AWS X-Ray SDK	Medium	Multi-cloud portability, open semconv	X-Ray service map regenerated downstream
Jaeger direct (Jaeger SDK)	Trivial: Jaeger receiver in Collector	OTLP everywhere, modern auto-instrumentation	Nothing material; the Jaeger client libraries are deprecated in favour of OTel
Zipkin direct	Trivial — Zipkin receiver in Collector	Same	Same
OpenTracing / OpenCensus	Medium	Active community, no maintenance fork	Compatibility shims exist; minor API changes
No tracing at all	Trivial — install SDK + auto-instrumentation	Every benefit	n/a — this is the right migration
Send OTLP to Datadog / Honeycomb / New Relic	Configuration only	Vendor-neutral application code	n/a — the explicit goal of OTel
Yobitel-managed observability (Yobibyte)	OTLP env var	No Collector to run; included in workspace fee	Choose Yobibyte UI or your own backend

# Equivalent: vendor SDK vs OTel SDK
# (two alternative versions of the same module, shown side by side)

# Old: Datadog APM SDK
import openai
from ddtrace import tracer, patch_all
patch_all()  # patch supported libraries for automatic tracing

@tracer.wrap()
def answer_question(question, docs):
    span = tracer.current_span()
    span.set_tag("rag.docs", len(docs))
    return openai.chat.completions.create(...)

# New: OTel SDK + auto-instrumentation + OpenInference for LLM spans
import openai
from opentelemetry import trace
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from openinference.instrumentation.openai import OpenAIInstrumentor as OIOpenAI

# OpenAI client auto-instrumentation (opentelemetry-instrumentation-openai)
OpenAIInstrumentor().instrument()
# LLM-aware OpenInference conventions; with both enabled, each OpenAI call may produce two spans
OIOpenAI().instrument()

tracer = trace.get_tracer(__name__)

def answer_question(question, docs):
    # Manual span around the RAG step; the instrumented OpenAI call nests under it
    with tracer.start_as_current_span("rag.answer") as span:
        span.set_attribute("rag.docs", len(docs))
        return openai.chat.completions.create(...)

# Export config is environment, not code:
#   OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.london-1.yobitel.com:4317
#   OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer $YOBITEL_TENANT_TOKEN"

Note: Picking between Tempo, Jaeger, Honeycomb, Datadog and Phoenix as your trace backend is a much smaller decision than picking OTel over a vendor SDK. The instrumentation work is what costs months; the export destination is one Collector config change. Optimise the instrumentation choice first.

Troubleshooting

The error table below covers the failure modes seen most often in OTel deployments. Each row maps an observable symptom to the underlying cause and the minimum-viable fix. Most issues trace back to one of four root causes: misconfigured propagation, dropped spans under backpressure, an unreachable exporter target, or attribute cardinality blowing out downstream storage.

Symptom	Cause	Fix
Spans appear but no parent-child relationship	Propagation header not injected/extracted across services	Confirm `OTEL_PROPAGATORS=tracecontext,baggage`; verify auto-instrumentation covers the HTTP client.
Traces split into multiple disconnected trees	Trace context lost across an async boundary	Use `trace.use_span(span)` context manager or carry context manually.
SDK dropping spans silently	BatchSpanProcessor queue full	Raise `max_queue_size`; lower `schedule_delay_millis`; investigate exporter throughput.
Collector receiver refusing OTLP	TLS, auth, port mismatch	Curl `/v1/traces` directly; check Collector logs for receiver errors.
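Tail sampler dropping known errors	Spans of one trace spread across gateway replicas, or the error span arrives after `decision_wait`	Route by trace ID with the `loadbalancing` exporter ahead of the gateway; raise `decision_wait`. Policy order is not the cause: a trace is kept when any policy matches.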
High cardinality blowing out metrics backend	User ID or request ID set as metric attribute	Move the identifier to a span attribute (no time-series cardinality cost) and drop it from metric attributes.
LLM prompt visible in shared dashboard	Redaction not applied at the gateway	Add `transform` processor to delete/replace `llm.prompt` before export.
Spans missing k8s metadata	k8sattributes processor not enabled or RBAC missing	Add processor; grant ClusterRole over pods, namespaces and deployments.
Span timestamps in the future	Container clock drift	Run chrony/NTP on every node; verify with `chronyc tracking`.
Exporter retry storm during backend outage	No circuit breaker, infinite retry	Set `retry_on_failure.max_elapsed_time`; pair with bounded queue.
Trace canary failing intermittently	One of agent/gateway/backend periodically slow	Run canary at each hop; identify which leg is slow.
Tenant push to Yobitel returns 401	Tenant token expired or scoped to wrong tenant	Rotate via the Yobibyte console; verify `authorization: Bearer` header on exporter.
OpenInference attributes missing	Standard auto-instrumentation but no openinference instrumentor	Install and call `openinference.instrumentation.openai.OpenAIInstrumentor().instrument()`.
Token counts wrong on streamed responses	Auto-instrumentation captures usage only on final chunk	Confirm stream completion handler runs; some SDK versions need manual span finalisation.
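
For the first two rows of the table, a quick check is to log the inbound `traceparent` header at each hop and confirm that it is well-formed and carries the same trace ID throughout. The sketch below validates the version-00 layout from the W3C Trace Context specification using only the standard library; the function name `parse_traceparent` is illustrative and is not an OTel SDK API.

```python
import re

# Version-00 layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value: str):
    """Return (trace_id, parent_span_id, sampled) or None if the header is invalid."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    sampled = bool(int(flags, 16) & 0x01)
    return trace_id, span_id, sampled
```

If a downstream service logs a different trace ID from the one its caller sent, or the header is missing altogether, the break in propagation sits between those two hops.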

Where this fits in the Yobitel stack

OpenTelemetry is the tracing and structured-telemetry layer Yobitel runs internally and the integration surface Yobitel publishes to customers. Every Yobibyte inference replica, fine-tune workflow and marketplace deployment is OTel-instrumented under the OpenInference / OpenLLMetry conventions. Spans carry the OpenAI-compatible request/response shape, retrieval document IDs, model name, region pin and tenant identifier. The Yobitel NeoCloud regional Collector gateway accepts the traces, applies tenant-scoped tail sampling, redaction and enrichment, and exports to the customer's chosen backend: by default the Yobibyte console with its included trace browser, or any OTLP target the customer configures.

Customer-side integration uses one of two paths. Customers running their own Collector tier point an exporter at the Yobitel NeoCloud regional Collector gateway with a tenant-scoped bearer token, and Yobibyte's spans appear nested inside the customer's application traces — a single trace ID from the user's request through the customer's orchestrator into Yobibyte's inference replica and back. Customers without a Collector tier push OTLP directly from their application SDK to the same gateway and rely on the Yobibyte console for the trace UI. Either path preserves W3C trace context across the Yobibyte boundary.

On UK and EU sovereign tenancies the regional Collector gateway never forwards traces cross-region. Tail sampling, redaction and tenant scoping run in-region; the Yobibyte console queries only the in-region Tempo. Sovereign customers under NCSC Cloud Security Principles, G-Cloud 14 OFFICIAL-handling or EU Data Boundary commitments see a one-region trace surface and a documented control boundary. The recipe-protection rule applies: customers see the trace fidelity they need (latency breakdown, retrieval IDs, token counts) without disclosing Yobitel's internal scheduling, admission or routing spans — see the yobibyte entry for the customer-facing observability surface and the prometheus entry for the parallel metrics path.

References

OpenTelemetry Documentation · OpenTelemetry Project
OpenTelemetry on GitHub · GitHub
OpenTelemetry at the CNCF · Cloud Native Computing Foundation
OpenTelemetry Specification · OpenTelemetry
OTLP Specification · OpenTelemetry
W3C Trace Context · W3C
OpenInference Semantic Conventions · GitHub (Arize)
OpenLLMetry Conventions · GitHub (Traceloop)
OpenTelemetry Collector Components · GitHub

Quick start

# 1. Python application: auto-instrument OpenAI + manual RAG span
# opentelemetry-distro supplies the defaults that opentelemetry-instrument relies on
pip install \
  opentelemetry-distro \
  opentelemetry-api \
  opentelemetry-sdk \
  opentelemetry-exporter-otlp \
  opentelemetry-instrumentation-openai \
  openinference-instrumentation-openai

# Run with auto-instrumentation
OTEL_SERVICE_NAME=rag-app \
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.observability:4317 \
OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod,k8s.cluster.name=london-1 \
OTEL_TRACES_SAMPLER=parentbased_traceidratio \
OTEL_TRACES_SAMPLER_ARG=0.1 \
opentelemetry-instrument python app.py

# 2. Deploy the OTel Collector with the Helm chart in the DaemonSet + gateway pattern
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install otel-collector open-telemetry/opentelemetry-collector \
    --namespace observability --create-namespace \
    --set mode=daemonset \
    --set image.repository=otel/opentelemetry-collector-contrib \
    --set presets.kubernetesAttributes.enabled=true \
    --set presets.kubeletMetrics.enabled=true \
    --set presets.hostMetrics.enabled=true

# Gateway tier: central collector for sampling + export
helm upgrade --install otel-gateway open-telemetry/opentelemetry-collector \
    --namespace observability \
    --set mode=deployment \
    --set image.repository=otel/opentelemetry-collector-contrib \
    --set replicaCount=3

# 3. Push OTLP to a Yobitel NeoCloud regional Collector gateway
# (illustrative endpoint and token variable; export before starting the application)
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.london-1.yobitel.com:4317
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer $YOBITEL_TENANT_TOKEN"

Tip: Always set OTEL_RESOURCE_ATTRIBUTES with service.name, service.version, deployment.environment and k8s.cluster.name at the very minimum. Spans without identifying resource attributes are operationally useless: you lose the join from a trace to the workload, the pod, the cluster and the tenant.

How it works

API → SDK → Exporter → Collector → Backend — the standard pipeline; each step is independently swappable.
Trace context propagation: the W3C traceparent HTTP header is injected by auto-instrumentation on outbound calls and extracted on inbound requests, so child spans join the caller's trace.
Sampling: head-based at the SDK (parentbased_traceidratio) or tail-based at the gateway Collector (tail_sampling processor).
Resource attributes: service.name, service.version, deployment.environment, k8s.*, cloud.*, gpu.uuid — stamped on every signal from a process.
Semantic conventions: standard attribute names — http.method, db.system, messaging.system, gen_ai.system, gen_ai.usage.input_tokens.
OpenInference / OpenLLMetry: LLM-specific conventions — llm.prompt, llm.completion, llm.token_count.completion, llm.model_name, retrieval.documents.
Collector topology: agent (DaemonSet, near app) + gateway (Deployment, centralised) is the standard production shape.
Tail-based sampling: keep all error traces, slow traces and statistically interesting traces; drop the boring majority — typically cuts trace storage by 90 percent.
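
The tail-based rule in the last item (keep errors, keep slow traces, keep a small random share of the rest) can be sketched as a single decision function. This is a simplified illustration, not the Collector's `tail_sampling` processor: the `Span` dataclass and `keep_trace` are hypothetical names, and the longest span duration stands in for whole-trace latency.

```python
import random
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    duration_ms: float
    is_error: bool = False


def keep_trace(spans, latency_threshold_ms=1000.0, baseline_ratio=0.10,
               rng=random.random):
    """Decide whether to keep a complete trace once all of its spans are buffered."""
    # Policy 1: always keep traces that contain an error span.
    if any(span.is_error for span in spans):
        return True
    # Policy 2: always keep slow traces (approximated by the longest span).
    if max((span.duration_ms for span in spans), default=0.0) >= latency_threshold_ms:
        return True
    # Policy 3: keep a random baseline share of the remaining traces.
    return rng() < baseline_ratio
```

The decision needs every span of the trace, which is why it runs at the gateway after a wait window instead of in the SDK.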

Reference and specifications

Artefact	Type	Purpose
`Tracer.start_as_current_span(name)`	API	Open a span and make it the active context.
`Span.set_attribute(key, value)`	API	Attach a key-value attribute to the current span.
`Span.record_exception(exc)`	API	Record an exception event on the span; set the error status separately with `Span.set_status`.
`Meter.create_counter / create_histogram`	API	Construct a metric instrument.
`Counter.add(value, attributes)`	API	Record a counter increment with dimension attributes.
`Histogram.record(value, attributes)`	API	Record an observation in a histogram instrument.
`BatchSpanProcessor`	SDK	Buffer spans and flush in batches to the exporter.
`TraceIdRatioBased(rate)` sampler	SDK	Head-based sampling — keep `rate` fraction of trace IDs.
`ParentBased` sampler	SDK	Honour the upstream sampling decision; default in production.
`OTEL_SERVICE_NAME`	env	Sets `service.name` resource attribute.
`OTEL_EXPORTER_OTLP_ENDPOINT`	env	OTLP target — gRPC `:4317` or HTTP `:4318`.
`OTEL_EXPORTER_OTLP_PROTOCOL`	env	`grpc`, `http/protobuf` or `http/json`; the default varies by SDK.
`OTEL_RESOURCE_ATTRIBUTES`	env	Comma-separated resource attributes.
`OTEL_TRACES_SAMPLER`	env	SDK default is `parentbased_always_on`; `parentbased_traceidratio` is the usual production choice.
`OTEL_TRACES_SAMPLER_ARG`	env	Sampling ratio — `0.1` keeps 10% of new traces.
`OTEL_PROPAGATORS`	env	`tracecontext,baggage` is the W3C default.
`OTEL_LOG_LEVEL`	env	SDK self-log verbosity.
OTLP/gRPC port 4317	wire	Protocol Buffers over gRPC — preferred transport.
OTLP/HTTP port 4318	wire	Protocol Buffers (or JSON) over HTTP — firewall-friendly.
W3C `traceparent` header	propagation	`00-<trace-id>-<span-id>-<flags>`; identifies the parent span across processes.
W3C `tracestate` header	propagation	Vendor-specific routing data; stays opaque to most middleware.
`receivers.otlp` (Collector)	Collector	Accept OTLP over gRPC and/or HTTP.
`receivers.prometheus` (Collector)	Collector	Scrape Prometheus targets and convert to OTLP metrics.
`processors.batch`	Collector	Buffer and flush — always include in production pipelines.
`processors.k8sattributes`	Collector	Enrich spans with `k8s.pod.*`, `k8s.namespace.*`, `k8s.deployment.*`.
`processors.tail_sampling`	Collector	Per-trace sampling at the gateway based on errors / latency / attributes.
`processors.transform`	Collector	OTTL — OpenTelemetry Transformation Language for span/metric rewriting.
`exporters.otlp`	Collector	Forward to another Collector or any OTLP backend.
`exporters.prometheusremotewrite`	Collector	Send OTel metrics to Prometheus.
`gen_ai.system` attribute	semconv	LLM provider name — `openai`, `anthropic`, `yobibyte`.
`gen_ai.request.model`	semconv	Model identifier — `gpt-4o`, `llama-3.1-70b-instruct`.
`gen_ai.usage.input_tokens` / `output_tokens`	semconv	Token counts — basis of cost attribution.
`gen_ai.response.finish_reasons`	semconv	`stop` / `length` / `content_filter` etc.
OpenInference `llm.prompt`	semconv	Full prompt content (subject to redaction).
OpenInference `retrieval.documents`	semconv	Retrieved document IDs and scores for RAG spans.
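
To make the `TraceIdRatioBased(rate)` row concrete, here is a minimal sketch of ratio-based head sampling. It compares the low 64 bits of the trace ID with a bound derived from the rate, so every service that sees the same trace ID reaches the same decision. It is written in the spirit of the SDK samplers but is not the SDK's code; `should_sample` is an illustrative name.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID


def should_sample(trace_id: int, rate: float) -> bool:
    """Deterministically keep roughly `rate` of all trace IDs."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    bound = round(rate * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound
```

Because the decision is a pure function of the trace ID, no coordination between services is needed; `ParentBased` then simply honours whatever the root span decided.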

Warning: Avoid putting raw prompts or completions in span attributes without a redaction processor. The OpenInference convention names exist (llm.prompt, llm.completion) but leaking customer prompts to a third-party tracing vendor is a routine compliance incident. Use the Collector's transform or redaction processor to strip or hash sensitive fields before egress — see the Security section.

Workload patterns

# Gateway Collector — tail sampling, k8s enrichment, multi-backend export
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: gateway
  namespace: observability
spec:
  mode: deployment
  replicas: 3
  image: otel/opentelemetry-collector-contrib:0.110.0
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 5s
        send_batch_size: 8192
      memory_limiter:
        check_interval: 2s
        limit_percentage: 80
        spike_limit_percentage: 25
      k8sattributes:
        auth_type: serviceAccount
        passthrough: false
        extract:
          metadata: [k8s.namespace.name, k8s.pod.name, k8s.deployment.name, k8s.node.name]
      resource:
        attributes:
          - key: deployment.environment
            value: prod
            action: insert
      transform/redact:
        trace_statements:
          - context: span
            statements:
              - replace_pattern(attributes["llm.prompt"], "(?i)email[^ ]+", "<redacted>")
              - delete_key(attributes, "llm.completion") where attributes["customer.tier"] == "regulated"
      tail_sampling:
        decision_wait: 10s
        num_traces: 100000
        policies:
          - { name: errors, type: status_code, status_code: { status_codes: [ERROR] } }
          - { name: slow, type: latency, latency: { threshold_ms: 1000 } }
          - { name: sample, type: probabilistic, probabilistic: { sampling_percentage: 10 } }
    exporters:
      otlp/tempo:
        endpoint: tempo:4317
        tls: { insecure: true }
      prometheusremotewrite:
        endpoint: http://prometheus:9090/api/v1/write
      loki:
        endpoint: http://loki:3100/loki/api/v1/push
      otlp/yobibyte:
        endpoint: otel.london-1.yobitel.com:4317
        headers: { authorization: "Bearer ${env:YOBITEL_TENANT_TOKEN}" }
    service:
      pipelines:
        traces:
          receivers: [otlp]
          # memory_limiter first; batch last, after enrichment and sampling
          processors: [memory_limiter, k8sattributes, resource, transform/redact, tail_sampling, batch]
          exporters: [otlp/tempo, otlp/yobibyte]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resource, batch]
          exporters: [prometheusremotewrite]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resource, batch]
          exporters: [loki]

Tip: Always run the OTel Collector as a DaemonSet plus a Deployment gateway. The DaemonSet collects local node telemetry with low latency; the gateway centralises sampling and export. Never let application pods talk directly to the trace backend — the indirection is what makes tail sampling, redaction and vendor switching cheap.

Sizing and capacity planning

Default flush — batch processor: 5 s timeout, 8,192 spans per batch on the gateway; 2 s / 1,024 spans on the agent.
Memory limiter — always include memory_limiter processor first; it back-pressures the receiver before OOM.
Sampling — head-based at 10 percent on the SDK, tail-based at the gateway for error and slow-trace retention.
Resource — ~3 KB RAM per buffered span pre-export; pre-sample buffer dominates gateway sizing.
Network — OTLP/gRPC is roughly 4-6x more efficient than OTLP/HTTP-JSON; prefer gRPC where the network allows.
Yobitel NeoCloud gateway anchor: 3-replica Deployment, 2 vCPU / 4 GB RAM per replica per 5,000 spans/s.
Customer push to Yobibyte: bandwidth caps apply per tenant token — 50 MB/s for standard tiers, configurable up.

Workload	Spans/s	Pre-sample bandwidth	Post-sample bandwidth	Collector RAM (agent / gateway)	Tempo storage (30d)
Single dev RAG app	~10	~50 KB/s	~5 KB/s	100 MB / 256 MB	~3 GB
Small production LLM service	~200	~600 KB/s	~60 KB/s	200 MB / 512 MB	~30 GB
Multi-tenant inference platform (100 RPS)	~2,000	~6 MB/s	~600 KB/s	500 MB × N nodes / 2 GB × 3	~300 GB
Yobitel London-1 region	~10,000	~30 MB/s	~3 MB/s	1 GB × N / 4 GB × 5	~1.5 TB
Yobitel multi-region fleet	~40,000	~120 MB/s	~12 MB/s	Per-region only	~6 TB
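
The bandwidth columns in the table follow from simple arithmetic, sketched below. The per-span size of about 3 KB is an assumption inferred from the larger rows of the table (the single-app row implies closer to 5 KB), and the 10 percent keep ratio matches the sampling figure quoted above; `otlp_bandwidth` is an illustrative helper.

```python
def otlp_bandwidth(spans_per_second, bytes_per_span=3_000, keep_ratio=0.10):
    """Return (pre_sample_bytes_per_s, post_sample_bytes_per_s)."""
    pre_sample = spans_per_second * bytes_per_span
    post_sample = pre_sample * keep_ratio
    return pre_sample, post_sample


# Multi-tenant inference platform row: ~2,000 spans/s
pre, post = otlp_bandwidth(2_000)
print(f"pre-sample {pre / 1e6:.1f} MB/s, post-sample {post / 1e3:.0f} KB/s")
```

Measure real span sizes before relying on the result: LLM spans that carry full prompts can be many times larger than the assumed 3 KB.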

Limits and quotas

Limit	Default	Ceiling	How to raise / work around
Max attributes per span	128	configurable	`OTEL_ATTRIBUTE_COUNT_LIMIT` env or SDK config.
Max attribute value length	unlimited	wire-size bound	`OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT`; redact long prompts.
Max events per span	128	configurable	`OTEL_SPAN_EVENT_COUNT_LIMIT`; prefer attributes.
Max links per span	128	configurable	`OTEL_SPAN_LINK_COUNT_LIMIT`; rarely needed above default.
SDK batch queue size	2,048	memory-bound	`BatchSpanProcessor(max_queue_size=4096)`.
SDK batch flush interval	5 s	n/a	Lower for SLA-critical short-lived spans.
OTLP/gRPC max message	4 MB	configurable	`grpc.max_recv_msg_size_mib` on receiver.
Collector receiver buffer	default	memory_limiter	`memory_limiter` processor stops accepting at threshold.
`tail_sampling` `num_traces`	50,000	RAM-bound	Raise `num_traces` when more traces are in flight; lengthen `decision_wait` for slow traces.
`tail_sampling` decision_wait	30 s	trace duration	Must exceed max realistic trace duration.
Exporter queue depth	1,000	memory-bound	`sending_queue.queue_size` on each exporter.
Exporter retry on failure	300 s max elapsed, exponential backoff	n/a	Tune `retry_on_failure.max_elapsed_time`; circuit-breaker downstream.
OpenInference `llm.prompt` length	unlimited	wire-size + compliance	Redaction processor; convert to hash for audit trail.
Yobitel tenant push rate	50 MB/s standard	Tier-dependent	Higher tiers available via Yobitel sales.

Warning: Tail sampling needs to buffer every span of every active trace in memory until the decision_wait elapses. A noisy service with 10,000 spans/s and 30 s decision_wait holds 300,000 spans in RAM — roughly 1 GB. Either raise gateway RAM, lower decision_wait, or move tail sampling closer to the source.
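
The figure in this warning can be reproduced with a two-line estimate. The 3 KB per buffered span comes from the sizing section; treat it as an approximation, and `tail_sampling_buffer` as an illustrative helper.

```python
def tail_sampling_buffer(spans_per_second, decision_wait_s, bytes_per_span=3_000):
    """Return (buffered_spans, buffered_bytes) held while decisions are pending."""
    buffered_spans = spans_per_second * decision_wait_s
    return buffered_spans, buffered_spans * bytes_per_span


spans, size = tail_sampling_buffer(10_000, 30)
print(f"{spans:,} spans, about {size / 1e9:.1f} GB")  # 300,000 spans, about 0.9 GB
```

Halving `decision_wait` halves the buffer, which is usually cheaper than doubling gateway RAM as long as traces complete within the shorter window.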

Observability

otelcol_receiver_accepted_spans / _refused_spans: receiver-side throughput and rejection rate.
otelcol_processor_batch_send_size: batch sizes — too small wastes bandwidth, too large blows memory.
otelcol_processor_dropped_spans: dropped traces — investigate immediately, never silently tolerate.
otelcol_exporter_send_failed_spans: downstream exporter failures; pair with retry queue depth.
otelcol_processor_memory_limiter_state: 1 is healthy, 2 indicates limiter is actively dropping.
otelcol_process_runtime_total_alloc_bytes: Go runtime memory; pair with limiter for OOM defence.
SDK-side otel.sdk.span_processor.spans_processed: spans flushed by the BatchSpanProcessor.
SDK-side otel.sdk.exporter.export_calls: exporter activity and failure rate.
Trace round-trip canary: synthetic trace every 30 s end-to-end through agent + gateway + backend.

# Prometheus alerting rules for OTel Collector health
groups:
  - name: otel-collector
    interval: 30s
    rules:
      - alert: OTelCollectorDroppingSpans
        expr: rate(otelcol_processor_dropped_spans_total[5m]) > 0
        for: 5m
        labels: { severity: critical, team: observability }
        annotations:
          summary: "Collector {{ $labels.instance }} dropping spans — back-pressure or config error"

      - alert: OTelCollectorMemoryLimiterTripped
        expr: otelcol_processor_memory_limiter_state == 2
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Collector memory limiter forcing GC/drop — scale up or sample harder"

      - alert: OTelCollectorExporterFailed
        expr: rate(otelcol_exporter_send_failed_spans_total[10m]) > 10
        for: 10m
        labels: { severity: warning }
        annotations:
          summary: "Exporter {{ $labels.exporter }} failing — backend unreachable?"

      - alert: OTelCollectorExporterQueueFull
        expr: otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.9
        for: 10m
        labels: { severity: warning }
        annotations:
          summary: "Exporter {{ $labels.exporter }} queue >90% — downstream slow"

      - alert: OTelCollectorReceiverRefusing
        expr: rate(otelcol_receiver_refused_spans_total[10m]) > 0
        for: 10m
        labels: { severity: critical }
        annotations:
          summary: "Receiver refusing spans — config, TLS, or auth error"

      - alert: OTelTraceCanaryFailed
        expr: synthetic_trace_roundtrip_seconds > 60
        for: 10m
        labels: { severity: critical }
        annotations:
          summary: "Synthetic trace not round-tripping — collector or backend down"

Tip: Add a synthetic trace canary every 30 s — emit a tiny trace from a known service and verify it appears in your tracing store within a minute. This is the only alert that catches "collector running but not delivering" failure modes, which the per-component metrics often miss.
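
A canary of this kind can be sketched without committing to any particular backend. In the outline below, `emit` and `lookup` are hypothetical callables that you would supply: the first sends a tiny trace carrying the generated `traceparent`, the second asks the tracing store whether that trace ID has arrived. The returned round-trip time is the sort of value that could back the `synthetic_trace_roundtrip_seconds` series used in the alert rules above.

```python
import secrets
import time


def new_traceparent() -> str:
    """Build a sampled version-00 W3C traceparent with random IDs."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def run_canary(emit, lookup, timeout_s=60.0, poll_s=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Emit one synthetic trace and poll until the backend has it.

    Returns the round-trip time in seconds, or None on timeout.
    """
    traceparent = new_traceparent()
    trace_id = traceparent.split("-")[1]
    started = clock()
    emit(traceparent)
    while clock() - started < timeout_s:
        if lookup(trace_id):
            return clock() - started
        sleep(poll_s)
    return None
```

Running one canary per hop (agent, gateway, backend) narrows an intermittent failure down to a single leg.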

Cost and FinOps

Collector compute: ~$50-100/month per 3-replica gateway on a small VM tier; scales linearly with span rate.
Tempo / Jaeger self-hosted: ~$0.025/GB-month on object storage — trace storage is the dominant cost.
Datadog APM: ~$31/host/month or per-span usage tier; tail sampling is essential to control cost.
Honeycomb: per-event pricing; tail sampling at the Collector keeps event count predictable.
Yobitel NeoCloud: regional Collector gateway + Tempo + Loki are included in the GPU rate; no per-span fee.
Yobibyte managed observability: OTel push endpoint and 30-day trace retention included in the workspace fee.
FinOps wedge: redact or hash llm.prompt and llm.completion before export to per-event-priced vendors — prompt/completion bytes are the single biggest cost lever on LLM traces.

Workload	Spans/s pre-sample	Collector + Tempo (self-hosted, monthly USD)	Datadog APM (monthly USD)	Honeycomb (monthly USD)	Yobitel NeoCloud
Single dev RAG app	~10	~$25	~$60	~$0 (free tier)	Included via Yobibyte
Small production LLM service	~200	~$200	~$600	~$130	Included via Yobibyte
Multi-tenant inference platform	~2,000	~$1,400	~$5,500	~$1,300	Included via Yobibyte
Yobitel London-1 region	~10,000	~$7,000	~$28,000	~$6,500	Yobitel-operated
Push to Yobitel NeoCloud Collector	varies	~$25 (small Collector)	n/a	n/a	Federation/push included
