OpenTelemetry Tracing in Kubernetes Core

Oct 20, 2021 | Hybrid Cloud

In this post, we examine OpenTelemetry tracing and the OpenTelemetry Protocol (OTLP). As an example, we walk through the instrumentation required to generate and export OTLP traces from CRI-O, an implementation of the Kubernetes Container Runtime Interface and the default container engine for Red Hat OpenShift. A later post demonstrates how to collect and visualize traces from CRI-O and from other Kubernetes core components.

OTLP describes the encoding, transport, and delivery of telemetry data between sources, collectors, and backends. OpenTelemetry itself is a Cloud Native Computing Foundation (CNCF) incubating project and the result of merging two open source solutions: in 2019, the OpenTracing and OpenCensus projects joined to form OpenTelemetry. Its goal is to provide open standards for vendor-neutral and interoperable cloud-native observability solutions.

As of today, the OpenTelemetry tracing API is stable, while the metrics API is newly stable with experimental features and the logging API is a work in progress heading toward stability. In this post we focus on trace data. A trace is telemetry data that represents work being done, and spans are the building blocks of a trace. A trace records the path of a request through a system: spans from different services can be assembled into a trace, so a transaction can be followed as it moves through a distributed system.
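
As a minimal illustration of the span concept (a hypothetical sketch, not code from CRI-O; the tracer name, span name, and attribute below are made up), a span wraps a unit of work and becomes part of the surrounding trace:

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
)

// doWork records one span; spans started from the returned context become
// children of this span and belong to the same trace.
func doWork(ctx context.Context) {
    // "example" and "pull-image" are illustrative names only.
    tracer := otel.Tracer("example")
    ctx, span := tracer.Start(ctx, "pull-image")
    defer span.End()

    span.SetAttributes(attribute.String("image", "quay.io/example/app:latest"))
    _ = ctx // pass ctx to downstream work so child spans join this trace
}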

OpenTelemetry’s vision is to make handling telemetry data simple and a standard feature of cloud-native applications. The benefit of a community-led project for generating and exporting telemetry data is twofold. It empowers users to observe complex applications, and it allows vendors and developers to focus on improving the tools to analyze and visualize data.

Red Hat’s Emerging Technologies blog includes posts that discuss technologies that are under active development in upstream open source communities and at Red Hat. We believe in sharing early and often the things we’re working on, but we want to note that unless otherwise stated the technologies and how-tos shared here aren’t part of supported products, nor promised to be in the future.

Why OpenTelemetry?

OpenTelemetry provides vendor-neutral standards, APIs, and SDKs to generate, collect, and export trace data. OTLP data can be exported to and analyzed with a variety of tracing backends without the need to update application code.

Kubernetes is a complex distributed system. Core components like the API server, kube-controller-manager, scheduler, kubelet, container runtime, and etcd interact across multiple machines and nodes. Tracking interactions between components and across workloads is difficult, and it can be nearly impossible to answer important questions: Why is my application running slowly? Where is the bottleneck occurring? Which component needs a fix? What has caused this regression? Observing transactions as they flow through a microservice-based or otherwise complex system is essential to view, debug, and tune the performance of applications. Gaining this insight is known as distributed tracing.

Kubernetes components have recently added OpenTelemetry instrumentation. The API server includes the option to export OTLP traces, etcd added tracing as an experimental feature, and CRI-O can also be configured to export OpenTelemetry traces. A Kubernetes Enhancement Proposal to instrument the kubelet for trace exporting is currently being reviewed. Recently graduated from a CNCF sandbox project to an incubating project, OpenTelemetry is becoming the standard way to implement distributed tracing in Kubernetes.

Let’s start exporting!

OpenTelemetry is a developing project. The documentation is helpful and improving as the community grows. However, many online tutorials are outdated, with example code that has been refactored, moved, or no longer exists. The Go implementation of OpenTelemetry is also under active development, and there are breaking changes as the project moves toward a fully stable release. We gathered what we considered best practices from the community and crafted a solution to instrument CRI-O that brings together previous examples with the current state of OpenTelemetry. The examples use the most recent release of OpenTelemetry-Go, v1.0.0.

With the OpenTelemetry tracing API and SDKs, instrumenting an application is not difficult, but there are choices to consider. To generate traces, OpenTelemetry gRPC interceptors were added to CRI-O's existing gRPC server. With the interceptors in place, a span is generated for every sampled gRPC request. With a few lines of code, we can export OTLP trace data from CRI-O.

The OpenTelemetry-Go SDK is imported to configure an OTLP exporter and a tracer provider, along with a context propagator. The tracer provider is required to generate traces, and it can include span processors that handle span data before it is exported. Here is a look at the (abbreviated) code that configures an OTLP exporter, returns a tracer provider, and then configures a gRPC server with OpenTelemetry interceptors:

var tp *sdktrace.TracerProvider
// otelServiceIDKey, _ := os.Hostname()
// otelServiceName := "crio"

res := resource.NewWithAttributes(
    semconv.SchemaURL,
    semconv.ServiceNameKey.String(otelServiceName),
    semconv.ServiceInstanceIDKey.String(otelServiceIDKey),
)

address := "0.0.0.0:4317"
exporter, _ := otlptracegrpc.New(ctx,
    otlptracegrpc.WithEndpoint(address),
    otlptracegrpc.WithInsecure(),
)

// Refer to the Go SDK for sampling options.
// Only emit spans when the kubelet sends a request with a sampled trace.
sampler := sdktrace.NeverSample()

// Or, emit spans for a configured fraction of generated data.
if samplingRate != nil && *samplingRate > 0 {
    sampler = sdktrace.TraceIDRatioBased(float64(*samplingRate) / float64(1000000))
}

// Batch span processor to aggregate spans before export.
bsp := sdktrace.NewBatchSpanProcessor(exporter)
tp = sdktrace.NewTracerProvider(
    sdktrace.WithSampler(sdktrace.ParentBased(sampler)),
    sdktrace.WithSpanProcessor(bsp),
    sdktrace.WithResource(res),
)

// Configure context propagation across processes and services.
tmp := propagation.NewCompositeTextMapPropagator(
    propagation.TraceContext{}, propagation.Baggage{})

otel.SetTracerProvider(tp)
otel.SetTextMapPropagator(tmp)
opts := []otelgrpc.Option{otelgrpc.WithPropagators(tmp), otelgrpc.WithTracerProvider(tp)}

// Configure the gRPC server with OpenTelemetry interceptors.
grpcServer := grpc.NewServer(
    grpc.UnaryInterceptor(grpc_middleware.ChainUnaryServer(
        metrics.UnaryInterceptor(),
        log.UnaryInterceptor(),
        otelgrpc.UnaryServerInterceptor(opts...),
    )),
    grpc.StreamInterceptor(grpc_middleware.ChainStreamServer(
        log.StreamInterceptor(),
        otelgrpc.StreamServerInterceptor(opts...),
    )),
    grpc.MaxSendMsgSize(config.GRPCMaxSendMsgSize),
    grpc.MaxRecvMsgSize(config.GRPCMaxRecvMsgSize),
)
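
The snippet above is abbreviated, so its import block is not shown. Assuming OpenTelemetry-Go v1.0.0 and its contrib instrumentation, it relies on roughly the following packages (metrics, log, and config are CRI-O-internal packages and are not listed):

import (
    grpc_middleware "github.com/grpc-ecosystem/go-grpc-middleware"
    "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/propagation"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.4.0" // the semconv path tracks the SDK release
    "google.golang.org/grpc"
)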

What’s with the sampler and processor?

The sampler and processor for a tracer provider are configurable. OpenTelemetry offers built-in samplers and processors, and they are explained in depth in the SDK specification.

  • The sampler is set to NeverSample() to minimize the number of spans emitted. With tracing enabled and the NeverSample setting, spans are emitted only when the parent request is sampled, which keeps the CPU and latency overhead of tracing minimal. An administrator can override this behavior by setting a sampling-rate-per-million configuration value greater than 0.
  • The TraceIDRatioBased sampler exports a configurable fraction of trace data. The default sampling rate per million is 0. Setting a rate greater than 0 means spans are emitted for that fraction of requests even when the caller does not send a sampled trace; if tracing is enabled but the rate is left at 0, spans are emitted only when the parent request is sampled.
  • Either sampler is wrapped in a ParentBased sampler. In the absence of a parent span, the sampler above applies (if tracing is enabled); otherwise, the sampling decision of the parent span is inherited.

A batch span processor sends spans to the exporter in batches, reducing the number of outgoing connections when transmitting data. There are also sampling and processing options when collecting trace data; these are configured on the collector, which runs externally to the application. Until the OpenTelemetry-Go SDK is fully stable, it is best to refer to the SDK's sampling and processing code when making these decisions in your own code.
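
For illustration on the SDK side, the batch span processor accepts tuning options for queue size, batch size, and flush interval. The values below are arbitrary examples chosen for this sketch, not CRI-O's settings:

import (
    "time"

    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// newBatchProcessor wraps an exporter in a batch span processor with
// explicit (illustrative) buffering settings.
func newBatchProcessor(exporter sdktrace.SpanExporter) sdktrace.SpanProcessor {
    return sdktrace.NewBatchSpanProcessor(exporter,
        sdktrace.WithMaxQueueSize(2048),          // spans buffered before new ones are dropped
        sdktrace.WithMaxExportBatchSize(512),     // spans sent per export call
        sdktrace.WithBatchTimeout(5*time.Second), // flush a partial batch after this delay
    )
}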

What about context propagation?

A context is stored state that can be shared across services through a configured propagator. OpenTelemetry-Go provides the propagation package, which also lets you attach custom key-value pairs to a context. Distributed tracing consists of propagating context and exporting spans across instrumented services and processes. While it is possible to stitch together time-stamped logs to troubleshoot, or to identify latency issues with metrics, tracing makes this easier and, through context propagation, gives more insight into the root cause of a problem. Structured OTLP trace data, collected across service boundaries with context propagation, is what tracing provides.
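
The gRPC interceptors shown earlier handle this injection and extraction automatically, but a sketch of what the propagator does at each service boundary looks roughly like the following (the HTTP headers here simply stand in for whatever carrier a request uses):

import (
    "context"
    "net/http"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
)

// Sender side: write the active span context (traceparent and baggage
// headers) into the outgoing request.
func injectContext(ctx context.Context, headers http.Header) {
    otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(headers))
}

// Receiver side: rebuild a context carrying the remote span context, so
// spans started from it join the caller's trace.
func extractContext(headers http.Header) context.Context {
    return otel.GetTextMapPropagator().Extract(context.Background(), propagation.HeaderCarrier(headers))
}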

Where are the spans?

Now that a tracer provider, an exporter, and the gRPC OpenTelemetry interceptors are configured, CRI-O can be restarted with tracing enabled. With the commands below, CRI-O will export spans for each sampled gRPC request.

sudo su
cat <<EOF > /etc/crio/crio.conf.d/tracing.conf
[crio.tracing]
enable_tracing=true
EOF

systemctl daemon-reload
systemctl restart crio

By default, when tracing is enabled, the sampler is set to NeverSample(). With NeverSample, CRI-O will only emit spans when handling a sampled request. In Kubernetes, requests to CRI-O come from the kubelet. In other words, if the kubelet were instrumented, then CRI-O would emit spans for whichever kubelet requests were sampled. To emit spans for a fraction of all gRPC calls, regardless of whether the kubelet is emitting spans, add a configuration file for CRI-O like the following:

sudo su
cat <<EOF > /etc/crio/crio.conf.d/tracing.conf
[crio.tracing]
enable_tracing=true
tracing_sampling_rate_per_million=999999
EOF

systemctl daemon-reload
systemctl restart crio

With tracing enabled, CRI-O traces from gRPC calls are exported from each node's systemd crio service, and OTLP data can now be collected. The examples here should help anyone add distributed tracing to their applications. To learn how the CRI-O traces are collected, check out Part 2 of this series, coming next week! It will demonstrate collecting and visualizing OpenTelemetry traces from a single-node Kubernetes cluster and from a multi-node OpenShift cluster.