Telemetry for Python Web Apps: A Comprehensive Guide


Introduction

Have you ever deployed a Python web application only to face mysterious performance issues or crashes that you can’t reproduce locally? Without proper telemetry, debugging production systems becomes a frustrating game of guesswork. Modern Python applications—whether they’re Flask APIs, Django backends, or FastAPI microservices—generate thousands of requests daily, and understanding their behavior requires visibility into what’s actually happening.

Telemetry is the practice of collecting, processing, and analyzing data about your application’s performance and behavior. In this guide, you’ll learn how to implement comprehensive telemetry in Python web applications using OpenTelemetry, the industry-standard framework. We’ll cover automatic instrumentation that gets you started in minutes, manual instrumentation for custom insights, and production deployment patterns that scale. By the end, you’ll be able to track request flows, identify performance bottlenecks, and troubleshoot issues with confidence.

Prerequisites

Before diving into implementation, ensure you have:

  • Python 3.9 or higher installed
  • Basic understanding of Python web frameworks (Flask, Django, or FastAPI)
  • Familiarity with HTTP requests and API concepts
  • Access to an observability backend (we’ll use console output for examples, but production requires a backend like SigNoz, Grafana, or Jaeger)
  • Docker installed (optional, for running OpenTelemetry Collector)

Understanding Telemetry Signals

Telemetry encompasses three primary types of data that work together to give you complete application visibility:

Traces represent the journey of a request through your system. Each trace contains spans—individual units of work like database queries, HTTP calls, or function executions. Traces answer “what happened and in what order?”

Metrics are numerical measurements collected over time: request counts, response times, error rates, and resource utilization. They answer “how much and how often?”

Logs are timestamped event records with varying severity levels. They provide detailed context about specific events and answer “what was the system doing when X occurred?”

OpenTelemetry unifies all three signals under one framework, automatically correlating them so you can jump from a slow request trace directly to relevant logs, or from high error rates in metrics to the specific traces causing problems.
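
In practice, that correlation is what lets you pivot between signals. As one example, the logging instrumentation can inject the active trace and span IDs into standard-library log records so each log line can be matched to its trace. A minimal sketch, assuming the opentelemetry-instrumentation-logging package is installed:

# Sketch: attach trace context to stdlib logging
import logging

from opentelemetry.instrumentation.logging import LoggingInstrumentor

# Adds otelTraceID / otelSpanID fields to log records; set_logging_format=True
# also installs a default format string that prints them.
LoggingInstrumentor().instrument(set_logging_format=True)

logging.getLogger(__name__).info("this line carries the current trace and span IDs")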

[Diagram: an incoming request flows through the web framework into the OpenTelemetry SDK; each signal type is routed through its own pipeline stage (span processor for traces, metric reader for metrics, log processor for logs) to the OTLP exporter, then on to the OpenTelemetry Collector, the observability backend, and finally dashboards and alerts.]

Setting Up OpenTelemetry for Python

OpenTelemetry provides two approaches: automatic instrumentation with zero code changes, and manual instrumentation for custom telemetry. We’ll start with the automatic approach.

Installation

Install the core OpenTelemetry packages and distribution:

# Install OpenTelemetry distro with API, SDK, and tools
pip install opentelemetry-distro opentelemetry-exporter-otlp

# Bootstrap auto-instrumentation for installed packages
opentelemetry-bootstrap -a install

The opentelemetry-bootstrap command scans your environment and installs instrumentation libraries for frameworks it detects—Flask, Django, FastAPI, requests, SQLAlchemy, and more.
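
If you want to see what it detected before changing your environment, the same command can simply print the list instead of installing it:

# List the instrumentation packages detected for this environment (no install)
opentelemetry-bootstrap -a requirements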

Quick Start with Flask

Let’s instrument a simple Flask application. Create app.py:

# app.py - Flask 3.0+ compatible
from random import randint
from flask import Flask, request
import logging

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@app.route("/rolldice")
def roll_dice():
    player = request.args.get('player', default='Anonymous', type=str)
    result = str(roll())
    logger.info(f"{player} rolled: {result}")
    return result

def roll():
    return randint(1, 6)

if __name__ == "__main__":
    app.run(port=8080)

Now run it with OpenTelemetry instrumentation:

# Enable logging auto-instrumentation
export OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true

# Run with console exporters for development
opentelemetry-instrument \
    --traces_exporter console \
    --metrics_exporter console \
    --logs_exporter console \
    --service_name dice-server \
    python app.py

Visit http://localhost:8080/rolldice and watch your console fill with telemetry data. You’ll see spans tracking the HTTP request lifecycle, metrics about request counts and latencies, and your application logs enriched with trace IDs.
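
The exact layout varies by SDK version, but a console-exported span is printed as a JSON object roughly like this (abbreviated, with illustrative values):

{
    "name": "GET /rolldice",
    "context": {
        "trace_id": "0x5b8aa5a2d2c872e8321cf37308d69df2",
        "span_id": "0x051581bf3cb55c13"
    },
    "kind": "SpanKind.SERVER",
    "attributes": {
        "http.method": "GET",
        "http.route": "/rolldice",
        "http.status_code": 200
    }
}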

OpenTelemetry supports all major Python web frameworks with zero-code instrumentation.

FastAPI Applications

FastAPI applications are ASGI-based, so you instrument the ASGI server (uvicorn) rather than running the script directly. First, the application:

# fastapi_app.py
from fastapi import FastAPI
from random import randint

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

@app.get("/compute")
async def compute():
    result = sum(range(1000000))
    return {"result": result}

Run with instrumentation:

pip install opentelemetry-instrumentation-fastapi

opentelemetry-instrument \
    --traces_exporter console \
    --service_name fastapi-service \
    uvicorn fastapi_app:app --host 0.0.0.0 --port 8000
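
If you prefer wiring instrumentation up in code rather than using the CLI wrapper, the FastAPI instrumentation also exposes a programmatic hook. A minimal sketch; exporters and the service name still need to be configured separately (via the SDK or the OTEL_* environment variables):

# Programmatic alternative to the opentelemetry-instrument wrapper
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

# Adds ASGI middleware that creates a server span per request
FastAPIInstrumentor.instrument_app(app)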

Django Projects

For Django applications, modify your manage.py or wsgi.py:

# manage.py additions
import os
import sys

from opentelemetry.instrumentation.django import DjangoInstrumentor

if __name__ == '__main__':
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')
    # Instrument after the settings module is set, before any requests are handled
    DjangoInstrumentor().instrument()
    # ... rest of manage.py (execute_from_command_line(sys.argv))

Or use the automatic approach:

opentelemetry-instrument python manage.py runserver

Manual Instrumentation: Adding Custom Spans

Automatic instrumentation captures system boundaries, but you’ll want custom spans for business logic, complex computations, or external API calls.

Creating Custom Spans

Here’s how to add detailed tracing to your application:

# app_with_manual_instrumentation.py
from flask import Flask
from opentelemetry import trace
from random import randint
import time

app = Flask(__name__)

# Get a tracer instance
tracer = trace.get_tracer(__name__)

@app.route("/process")
def process_request():
    # Automatic span created by Flask instrumentation
    # Add custom child span for business logic
    with tracer.start_as_current_span("process_business_logic") as span:
        span.set_attribute("user.type", "premium")
        
        # Simulate validation
        with tracer.start_as_current_span("validate_input"):
            is_valid = validate_data()
            span.set_attribute("validation.result", is_valid)
        
        # Simulate processing
        with tracer.start_as_current_span("complex_calculation"):
            result = perform_calculation()
            span.set_attribute("calculation.result", result)
        
        return {"status": "success", "result": result}

def validate_data():
    time.sleep(0.05)  # Simulate validation time
    return True

def perform_calculation():
    time.sleep(0.1)  # Simulate work
    return randint(100, 999)

if __name__ == "__main__":
    app.run(port=8080)
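
For small helpers like perform_calculation, start_as_current_span can also be applied as a decorator (supported by recent opentelemetry-api releases), which avoids nesting with blocks:

# Decorator form of the same helper
@tracer.start_as_current_span("perform_calculation")
def perform_calculation():
    time.sleep(0.1)  # Simulate work
    return randint(100, 999)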

Adding Events and Attributes

Spans can be enriched with events and attributes for debugging:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

@app.route("/checkout")
def checkout():
    with tracer.start_as_current_span("checkout_process") as span:
        current_span = trace.get_current_span()
        
        # Add custom attributes
        current_span.set_attribute("cart.items", 5)
        current_span.set_attribute("cart.total", 129.99)
        current_span.set_attribute("user.id", "user_12345")
        
        # Add events for significant moments
        current_span.add_event("payment_initiated")
        
        try:
            process_payment()
            current_span.add_event("payment_successful")
        except Exception as e:
            current_span.add_event("payment_failed", {
                "error.message": str(e)
            })
            current_span.set_status(trace.Status(trace.StatusCode.ERROR))
            raise
        
        return {"status": "complete"}

Implementing Metrics Collection

While traces show individual request flows, metrics aggregate data over time for monitoring trends.

# metrics_example.py
import time

from flask import Flask
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader, ConsoleMetricExporter

app = Flask(__name__)

# Set up metrics provider
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

# Get a meter instance
meter = metrics.get_meter(__name__)

# Create different metric types
request_counter = meter.create_counter(
    "requests_total",
    description="Total number of requests",
    unit="1"
)

active_requests = meter.create_up_down_counter(
    "active_requests",
    description="Number of requests currently being processed"
)

request_duration = meter.create_histogram(
    "request_duration_seconds",
    description="Request processing time",
    unit="s"
)

@app.route("/api/data")
def get_data():
    start = time.time()
    
    # Increment counters
    request_counter.add(1, {"endpoint": "/api/data", "method": "GET"})
    active_requests.add(1)
    
    try:
        # Process request
        time.sleep(0.1)
        result = {"data": "sample"}
        return result
    finally:
        # Record duration and decrement active requests
        duration = time.time() - start
        request_duration.record(duration, {"endpoint": "/api/data"})
        active_requests.add(-1)
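
Counters and histograms cover most request-level measurements. For values you sample on demand, such as queue depth or memory usage, an observable gauge with a callback is a better fit. A sketch, reusing the meter from above; the metric name and unit are illustrative:

import resource

from opentelemetry.metrics import CallbackOptions, Observation

def read_max_rss(options: CallbackOptions):
    # ru_maxrss is reported in kilobytes on Linux
    yield Observation(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

# The PeriodicExportingMetricReader invokes the callback on its export schedule
meter.create_observable_gauge(
    "process.max_rss",
    callbacks=[read_max_rss],
    description="Peak resident set size of the process",
    unit="KiBy"
)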

Production Deployment with OpenTelemetry Collector

In production, you don’t want applications sending telemetry directly to backends. The OpenTelemetry Collector acts as a central pipeline for receiving, processing, and exporting telemetry.

Collector Configuration

Create an otel-collector-config.yaml file:

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  
  # Remove sensitive data
  attributes:
    actions:
      - key: password
        action: delete
      - key: api_key
        action: delete

exporters:
  # Console exporter for debugging
  debug:
    verbosity: detailed
  
  # OTLP exporter to backend (e.g., Jaeger, Grafana, SigNoz)
  otlp:
    endpoint: "your-backend:4317"
    tls:
      insecure: false

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes]
      exporters: [debug, otlp]
    
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug, otlp]
    
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes]
      exporters: [debug, otlp]

Running the Collector

# Using Docker
docker run -p 4317:4317 -p 4318:4318 \
  -v $(pwd)/otel-collector-config.yaml:/etc/otelcol/config.yaml \
  otel/opentelemetry-collector:latest

Configuring Your Application

Update your application to send telemetry to the collector:

# Set OTLP endpoint to collector
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_SERVICE_NAME="my-python-app"
export OTEL_RESOURCE_ATTRIBUTES="service.version=1.0.0,deployment.environment=production"

# Run your instrumented application
opentelemetry-instrument python app.py

Common Pitfalls and Troubleshooting

Issue 1: Missing Trace Data

Problem: Instrumentation installed but no traces appearing.

Solution: Make sure the instrumentation is applied before the application object is created, or instrument the existing app directly:

# WRONG - app created before instrumentation is applied
from flask import Flask
from opentelemetry.instrumentation.flask import FlaskInstrumentor

app = Flask(__name__)
FlaskInstrumentor().instrument()  # too late: the existing app is not patched
# (use FlaskInstrumentor().instrument_app(app) to patch an existing app)

# CORRECT - instrument first, then create the app
from opentelemetry.instrumentation.flask import FlaskInstrumentor
FlaskInstrumentor().instrument()

from flask import Flask
app = Flask(__name__)

Or use the opentelemetry-instrument CLI wrapper, which handles initialization order automatically.

Issue 2: High Memory Usage

Problem: Application memory grows continuously.

Solution: Configure batch processors with appropriate limits:

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
exporter = OTLPSpanExporter()  # or any other configured span exporter

# Configure batching to bound memory usage
span_processor = BatchSpanProcessor(
    exporter,
    max_queue_size=2048,         # maximum spans held in memory
    max_export_batch_size=512,   # spans sent per export call
    schedule_delay_millis=5000   # export interval in milliseconds
)

provider.add_span_processor(span_processor)

Issue 3: Duplicate Telemetry in Forked Processes

Problem: When using Gunicorn or uWSGI with multiple workers, metrics are duplicated or broken.

Solution: Pre-fork servers initialize the SDK in the master process, and the exporters' background threads do not survive the fork, so workers can emit broken or duplicated telemetry. Initialize telemetry per worker (see the post_fork sketch after the commands below), or keep a single worker while testing:

# Single worker for development/testing
gunicorn -w 1 app:app

# For production, use the collector pattern
# Each worker sends to collector, which handles aggregation
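
One documented pattern for pre-fork servers is to initialize the SDK inside Gunicorn's post_fork hook, so every worker gets its own providers and exporter threads. A sketch, assuming the OTLP gRPC exporters and a collector reachable via the standard OTEL_EXPORTER_OTLP_ENDPOINT setting:

# gunicorn.conf.py - initialize OpenTelemetry per worker
from opentelemetry import metrics, trace
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

def post_fork(server, worker):
    resource = Resource.create({"service.name": "my-python-app"})

    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(tracer_provider)

    reader = PeriodicExportingMetricReader(OTLPMetricExporter())
    metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[reader]))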

Issue 4: Missing Database Query Details

Problem: Seeing HTTP traces but not database operations.

Solution: Install and enable database-specific instrumentation:

# For PostgreSQL with psycopg2
pip install opentelemetry-instrumentation-psycopg2

# For SQLAlchemy
pip install opentelemetry-instrumentation-sqlalchemy

# Re-run bootstrap
opentelemetry-bootstrap -a install

Issue 5: Cannot Connect to OTLP Endpoint

Problem: Failed to export spans or connection errors.

Solution: Verify endpoint configuration and network connectivity:

# In your application: turn on debug logging to surface exporter errors
import logging
logging.basicConfig(level=logging.DEBUG)

# In your shell: enable gRPC debug output (when using the gRPC exporter)
export GRPC_VERBOSITY=debug
export GRPC_TRACE=http,call_error,connectivity_state

Ensure your endpoint uses the correct protocol (gRPC vs HTTP) and port:

  • gRPC: typically port 4317
  • HTTP: typically port 4318
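
A quick way to separate a network problem from a protocol mismatch is to hit the collector's OTLP/HTTP endpoint directly; if the collector is reachable you get an HTTP response rather than a connection error. A sketch, assuming the default port:

# An empty request body is enough to confirm the endpoint answers
curl -i -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{}'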

Best Practices for Production

1. Use Sampling for High-Traffic Applications

Don’t trace every request at scale. Implement sampling:

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Sample 10% of traces
sampler = TraceIdRatioBased(0.1)
provider = TracerProvider(sampler=sampler)
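
When you launch with the opentelemetry-instrument wrapper instead of configuring the SDK in code, the same sampling can be set through the standard environment variables:

# Ratio sampler configured via environment variables
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1

# In distributed systems, parentbased_traceidratio is usually preferable so
# downstream services follow the caller's sampling decision.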

2. Add Resource Attributes

Enrich telemetry with service metadata:

import os

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

resource = Resource.create({
    "service.name": "user-api",
    "service.version": "2.1.0",
    "deployment.environment": "production",
    "service.instance.id": os.getenv("HOSTNAME", "unknown")
})

provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

3. Handle Sensitive Data

Never log passwords, tokens, or PII:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def process_user_data(user_email, password):
    with tracer.start_as_current_span("process_user") as span:
        # DO NOT do this
        # span.set_attribute("user.password", password)
        
        # DO this instead
        span.set_attribute("user.email_domain", user_email.split("@")[1])
        span.set_attribute("auth.method", "password")

Configure the collector to strip sensitive attributes as a safety net.

4. Monitor the Monitoring System

Your telemetry pipeline itself needs monitoring:

# Add self-telemetry in collector config
service:
  telemetry:
    logs:
      level: info
    metrics:
      address: :8888

5. Use Semantic Conventions

Follow OpenTelemetry semantic conventions for consistency:

# Use standard attribute names
span.set_attribute("http.method", "GET")
span.set_attribute("http.status_code", 200)
span.set_attribute("db.system", "postgresql")
span.set_attribute("db.statement", "SELECT * FROM users WHERE id = ?")

Conclusion

Implementing telemetry in Python web applications transforms debugging from guesswork into data-driven analysis. With OpenTelemetry, you can start with automatic instrumentation in minutes, then progressively add custom instrumentation for deeper insights. The three signals—traces, metrics, and logs—work together to give you complete visibility into application behavior.

Key takeaways: Start with automatic instrumentation for quick wins. Use the OpenTelemetry Collector in production for scalability and flexibility. Add custom spans around critical business logic. Follow sampling strategies for high-traffic services. Always handle sensitive data appropriately.

Next steps: Deploy a collector in your staging environment, experiment with different exporters and backends, and gradually instrument your most critical endpoints. For advanced topics, explore distributed tracing across microservices, custom metrics for business KPIs, and integration with alerting systems.

