You’ve probably spent the last three years watching Datadog’s pricing climb like a Tesla at a Supercharger. You’ve heard about the LGTM stack (Loki, Grafana, Tempo, Mimir), assembled it yourself, and now you’re managing four separate backends, four separate retention policies, and wondering why you need a PhD to correlate a trace with a metric.
Here’s the thing: there’s a better way. Two, actually.
SigNoz and Uptrace are OpenTelemetry-first observability platforms that do what Datadog does — traces, metrics, logs, alerts — but they’re built from the ground up around a single storage engine (ClickHouse) and shipped as cohesive products. No assembly required. No PhD. Just install, point your applications at them via OTLP, and watch your whole system talk to itself.
This isn’t “close enough.” This is the real deal. And if you’re self-hosting, one of these two will save you weeks of Grafana dashboard tweaking.
What “OTLP-First” Actually Means
Let me get this out of the way because it matters.
Traditional observability platforms (Prometheus, Grafana, the ELK stack) were built around isolated data models. Prometheus does metrics. Loki does logs. Tempo does traces. You bolt them together with glue and hope they talk.
OTLP-first means: the platform is designed around the OpenTelemetry Protocol from day one. Your applications send traces, metrics, and logs over one protocol, tagged with the same resource attributes (service name, host, version) and trace context, and the platform catches it all, indexes it, and makes it queryable.
This matters because:
- You don’t need five exporters per application
- Correlation comes built in (logs emitted inside a span carry its trace ID, and metrics can point back to traces through exemplars)
- Storage efficiency improves (shared indexes, smarter compression)
- Your mental model simplifies: “I have telemetry. Spans, metrics, and logs all share the same service identity and trace context.”
Both SigNoz and Uptrace embrace this. Prometheus does not (it can now accept OTLP metrics, but it’s still a metrics-only system). Neither does the LGTM stack (it’s compatible with OTLP, but not designed around it).
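Here’s a stdlib-only sketch of that log-to-trace correlation, assuming the W3C Trace Context `traceparent` header (the format OpenTelemetry propagates by default): parse the trace ID out of an incoming header, stash it in a context variable, and stamp it onto every log line. The logger name, `current_trace_id`, and the helper functions are hypothetical; in a real app, OpenTelemetry’s propagators and logging instrumentation do this for you.

```python
import contextvars
import io
import logging
import re
from typing import Optional

# W3C traceparent: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

# Hypothetical per-request slot for the active trace ID ("-" when there is none).
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


def parse_traceparent(header: str) -> Optional[str]:
    """Return the 32-hex-char trace ID from a traceparent header, or None if malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id = match.group(2)
    if trace_id == "0" * 32:  # the spec treats an all-zero trace ID as invalid
        return None
    return trace_id


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto each record so the formatter can print it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("trace-demo")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(logging.Formatter("trace_id=%(trace_id)s %(message)s"))
    logger.handlers = [handler]  # replace, so repeated calls don't duplicate output
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_logger(buf)
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    current_trace_id.set(parse_traceparent(incoming) or "-")
    log.info("charging card")
    print(buf.getvalue().strip())  # trace_id=4bf92f3577b34da6a3ce929d0e0e4736 charging card
```

Searching logs for that trace ID then returns exactly the lines written while handling that request, which is the jump-from-trace-to-logs workflow both platforms build on.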
ClickHouse: The Bet They Both Made
Both platforms chose ClickHouse as their storage backend. This is not an accident.
ClickHouse is a columnar OLAP database that handles high-cardinality data well. It compresses aggressively (often 10x or better, depending on the data), handles billions of rows per table, and can ingest hundreds of thousands of rows per second with batched inserts on modest hardware.
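For observability, ClickHouse is a strong fit. A Prometheus setup with a million active time series scraped every 15 seconds ingests about 66,000 samples per second; at the 1-2 bytes per sample the Prometheus docs cite, that’s roughly 6 to 12 GB per day, so it passes 100 GB in about 9 to 17 days. ClickHouse’s columnar codecs store the same kind of data compactly and keep metrics, traces, and logs in one engine, but exact sizes depend heavily on your data, so measure before you size disks.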
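The back-of-envelope Prometheus math, as a small sketch using the sizing formula from the Prometheus storage docs (retention seconds × samples per second × bytes per sample) and their 1-2 bytes per sample rule of thumb. The workload numbers are the ones from the paragraph above; plug in your own.

```python
SECONDS_PER_DAY = 86_400


def prometheus_disk_bytes(active_series: int, scrape_interval_s: float,
                          retention_days: float, bytes_per_sample: float) -> float:
    """Estimated TSDB size: retention_seconds * samples_per_second * bytes_per_sample."""
    samples_per_second = active_series / scrape_interval_s
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


def days_until(limit_bytes: float, active_series: int, scrape_interval_s: float,
               bytes_per_sample: float) -> float:
    """How many days of retention fit before the disk reaches `limit_bytes`."""
    per_day = prometheus_disk_bytes(active_series, scrape_interval_s, 1, bytes_per_sample)
    return limit_bytes / per_day


if __name__ == "__main__":
    for bps in (1.0, 2.0):
        per_day_gb = prometheus_disk_bytes(1_000_000, 15, 1, bps) / 1e9
        days = days_until(100e9, 1_000_000, 15, bps)
        print(f"{bps} B/sample: {per_day_gb:.2f} GB/day, 100 GB after {days:.1f} days")
    # 1.0 B/sample: 5.76 GB/day, 100 GB after 17.4 days
    # 2.0 B/sample: 11.52 GB/day, 100 GB after 8.7 days
```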
The gotcha: ClickHouse is a database. It needs tuning. You’ll encounter settings like max_memory_usage and max_partitions_per_insert_block, TTL-based retention that only deletes data when background merges get around to it, and the occasional burst of part merges that tanks your CPU.
Not a dealbreaker. Just not “install and forget.”
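Part count is the number to watch: lots of small inserts create lots of parts, and the background merges that consolidate them are what eat CPU. Here’s a minimal stdlib-only sketch that asks ClickHouse’s `system.parts` table for active parts per table over the HTTP interface on port 8123 (exposed in the compose files below). It assumes the `default` user can connect without a password, and the database name is a placeholder; list yours with `SHOW DATABASES`.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def quote_literal(value: str) -> str:
    """Single-quoted ClickHouse string literal with backslashes and quotes escaped."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def active_parts_sql(database: str) -> str:
    """Active parts, rows, and on-disk size per table, busiest tables first."""
    return (
        "SELECT table, count() AS active_parts, sum(rows) AS total_rows, "
        "formatReadableSize(sum(bytes_on_disk)) AS size "
        f"FROM system.parts WHERE active AND database = {quote_literal(database)} "
        "GROUP BY table ORDER BY active_parts DESC FORMAT TabSeparated"
    )


def clickhouse_url(sql: str, host: str = "localhost", port: int = 8123) -> str:
    """GET URL for ClickHouse's HTTP interface (GET requests run read-only)."""
    return f"http://{host}:{port}/?" + urlencode({"query": sql})


def run_query(sql: str, host: str = "localhost", port: int = 8123, timeout: float = 10.0) -> str:
    """Execute the query and return the raw TabSeparated response body."""
    with urlopen(clickhouse_url(sql, host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    try:
        print(run_query(active_parts_sql("my_observability_db")))  # placeholder name
    except OSError as exc:  # URLError and HTTPError both subclass OSError
        print(f"ClickHouse not reachable: {exc}")
```

A table whose active part count keeps climbing into the hundreds usually means inserts are too small or too frequent; left alone, ClickHouse eventually starts rejecting inserts with a “too many parts” error.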
SigNoz: Polished, Feature-Rich, Dashboards Are First-Class
SigNoz is the bigger project. Founded in 2020, well-funded, used at scale by teams at Figma, PagerDuty, and others. The UI feels mature.
Installing SigNoz
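```yaml
# Simplified sketch. SigNoz's official Docker install lives in the SigNoz repo and
# adds more services; image tags, ports, and env var names change between releases.
version: '3.9'

services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "9000:9000"   # native protocol
      - "8123:8123"   # HTTP interface
    environment:
      CLICKHOUSE_DB: signoz
    volumes:
      - clickhouse-data:/var/lib/clickhouse
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8123/ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  otel-collector:
    # SigNoz ships its own collector build with exporters for its ClickHouse schema;
    # the stock otel/opentelemetry-collector images don't include them.
    image: signoz/signoz-otel-collector:latest
    command:
      - "--config=/etc/otel-collector-config.yaml"
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "9411:9411"   # Zipkin (optional)
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    depends_on:
      clickhouse:
        condition: service_healthy
    environment:
      GOGC: 80

  signoz:
    image: signoz/signoz:latest
    ports:
      - "3301:3301"   # UI (newer releases serve it on 8080)
    environment:
      # Illustrative names; check the official compose file for your release.
      CLICKHOUSE_HOST: clickhouse
      CLICKHOUSE_PORT: 9000
      CLICKHOUSE_DB: signoz
    depends_on:
      - clickhouse
      - otel-collector
    volumes:
      - signoz-data:/var/lib/signoz

volumes:
  clickhouse-data:
  signoz-data:
```

That’s SigNoz in about 50 lines. The UI pops up at http://localhost:3301 (newer releases serve it on port 8080).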
What You Get
Dashboards: SigNoz treats dashboards as a first-class citizen. You get a template gallery, drag-and-drop panels, and queries that feel intuitive. Instead of hand-editing dashboard JSON (where complex Grafana setups often end up), you’re clicking.
Traces: The trace viewer is gorgeous. Click into a span, see latency breakdown, identify the slow service in two clicks. Dependency graph builds itself. Span attributes are searchable.
Metrics: Scrape Prometheus targets or ingest via OTLP. The query builder is solid (though you can drop down to ClickHouse SQL if needed).
Logs: Full-text searchable, tagged by service and span ID. You can jump from a trace to “show all logs from this service in this 5-minute window.” The UX is snappy.
Alerts: SigNoz has rules. You define thresholds on metrics or logs, set notification channels (Slack, PagerDuty, email, webhooks), and they fire. The alert management UI is actually usable.
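Pricing: Open-source core (enterprise features sit under a separate license; the repo’s LICENSE file spells out which is which). A cloud offering exists and is marketed at a fraction of Datadog’s price.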
Uptrace: Lean, Fast, OTLP-Native by Religion
Uptrace is the minimalist’s choice. It’s built by the author of the go-redis library. If SigNoz is the feature-rich sedan, Uptrace is the motorcycle.
Installing Uptrace
```yaml
# Simplified sketch; the official Uptrace compose file is the reference.
# Recent Uptrace releases also expect PostgreSQL for metadata (users, projects).
version: '3.8'

services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "9000:9000"
      - "8123:8123"
    environment:
      CLICKHOUSE_DB: uptrace
    volumes:
      - clickhouse-data:/var/lib/clickhouse
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8123/ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  uptrace:
    image: uptrace/uptrace:latest
    ports:
      - "14317:14317"   # OTLP gRPC
      - "14318:14318"   # OTLP HTTP + UI
      - "14250:14250"   # Jaeger gRPC (optional)
    volumes:
      # The ClickHouse DSN, listen addresses, and retention are set in uptrace.yml.
      - ./uptrace.yml:/etc/uptrace/uptrace.yml
      - uptrace-data:/var/lib/uptrace
    depends_on:
      clickhouse:
        condition: service_healthy

volumes:
  clickhouse-data:
  uptrace-data:
```

Wait, that’s it? Yeah. Uptrace is small. No separate collector service (it has one built in). No orchestration. Just the database and the app (recent releases add PostgreSQL for metadata).
UI is at http://localhost:14318 by default.
What You Get
Traces: Top-notch trace visualization. Flame graphs, span timelines, dependency graphs. Incredibly fast even with millions of traces.
Metrics: If you’re sending metrics via OTLP Metrics, Uptrace ingests and dashboards them. The dashboarding is simpler than SigNoz (no drag-and-drop), but you can write raw ClickHouse SQL queries.
Logs: Searchable, correlated by trace ID. The logs view is cleaner than SigNoz’s in some ways (less chrome, more density).
Alerts: Built-in alerting, but simpler than SigNoz. You define alert rules on metric thresholds, route to Slack/Discord/email/webhooks. It works. It’s not as fancy, but it works.
Pricing: Free to self-host under the Business Source License (BSL), which converts to Apache 2.0 after two years. Uptrace Cloud exists but is priced like a bootstrapped project (cheap).
Philosophy: Uptrace optimizes for simplicity and speed over feature richness. The UI is snappier, queries are faster, and the codebase is leaner.
Wiring Up Your Applications
Here’s where both shine equally. OTLP is OTLP. Same sender, two receivers.
Node.js Example
```javascript
// Written against the OpenTelemetry JS 1.x packages; 2.x deprecates or renames some
// exports (e.g. resourceFromAttributes and ATTR_SERVICE_NAME).
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

// SigNoz's collector: 4317. Uptrace: 14317, plus an uptrace-dsn header with your project DSN.
const OTLP_ENDPOINT = process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4317';

const traceExporter = new OTLPTraceExporter({ url: OTLP_ENDPOINT });
const metricExporter = new OTLPMetricExporter({ url: OTLP_ENDPOINT });

const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-app',
  }),
  traceExporter,
  // The reader takes an options object, not the bare exporter.
  metricReader: new PeriodicExportingMetricReader({ exporter: metricExporter }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
console.log(`Telemetry initialized. Sending to ${OTLP_ENDPOINT}`);

// Flush pending telemetry before exiting.
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('Tracing terminated'))
    .catch((err) => console.error('Tracing shutdown failed', err))
    .finally(() => process.exit(0));
});

// Your app code here. Require http after sdk.start() so auto-instrumentation can patch it.
const http = require('http');
http.createServer((req, res) => {
  res.writeHead(200);
  res.end('Hello World\n');
}).listen(3000);
```

Python Example
```python
import os

from flask import Flask
from opentelemetry import metrics, trace
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Both SigNoz and Uptrace accept standard OTLP. The http:// scheme tells the gRPC
# exporter to use a plaintext channel, which a local collector expects.
otlp_endpoint = os.getenv('OTEL_EXPORTER_OTLP_ENDPOINT', 'http://localhost:4317')

resource = Resource(attributes={SERVICE_NAME: 'my-python-app'})

# Traces: register the provider globally so the instrumentations pick it up.
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=otlp_endpoint)))
trace.set_tracer_provider(tracer_provider)

# Metrics: same resource, same endpoint.
meter_provider = MeterProvider(
    resource=resource,
    metric_readers=[PeriodicExportingMetricReader(OTLPMetricExporter(endpoint=otlp_endpoint))],
)
metrics.set_meter_provider(meter_provider)

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()


@app.route('/')
def hello():
    return 'Hello World'


if __name__ == '__main__':
    app.run(port=5000)
```

Point the exporters at whichever backend you run: localhost:4317 for SigNoz’s collector, localhost:14317 for Uptrace (which also expects an uptrace-dsn header carrying your project DSN; with the Python exporter you can set it through OTEL_EXPORTER_OTLP_HEADERS). That’s it. Traces and metrics flow in; logs need the OpenTelemetry log exporter or a collector tailing your log files.
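Because OTLP is just a protocol, you can also poke it without any SDK. Below is a minimal stdlib-only sketch that builds a single span in the OTLP/JSON encoding and POSTs it to an OTLP/HTTP endpoint’s `/v1/traces` path. It assumes an OTLP/HTTP receiver on `localhost:4318` (SigNoz’s collector in the compose file above) or `localhost:14318` for Uptrace, which also expects its `uptrace-dsn` header; the service, span, and scope names are hypothetical.

```python
import json
import os
import time
from typing import Dict, Optional
from urllib.request import Request, urlopen


def build_span_payload(service_name: str, span_name: str, duration_ms: int = 25) -> dict:
    """One span in OTLP/JSON: IDs are lowercase hex, 64-bit timestamps are decimal strings."""
    trace_id = os.urandom(16).hex()  # 32 hex chars
    span_id = os.urandom(8).hex()    # 16 hex chars
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-demo"},
                "spans": [{
                    "traceId": trace_id,
                    "spanId": span_id,
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }


def send_traces(payload: dict, base_url: str = "http://localhost:4318",
                extra_headers: Optional[Dict[str, str]] = None, timeout: float = 5.0) -> int:
    """POST the payload to <base_url>/v1/traces and return the HTTP status code."""
    headers = {"Content-Type": "application/json", **(extra_headers or {})}
    req = Request(base_url.rstrip("/") + "/v1/traces",
                  data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST")
    with urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    payload = build_span_payload("my-python-app", "manual-test-span")
    try:
        # For Uptrace: base_url="http://localhost:14318", extra_headers={"uptrace-dsn": "<your DSN>"}
        print(send_traces(payload))
    except OSError as exc:  # URLError and HTTPError both subclass OSError
        print(f"collector not reachable: {exc}")
```

JSON is handy for a smoke test; the SDKs default to protobuf, which is more compact. The same payload works against either backend, which is the whole point of OTLP.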
Feature Parity: A Closer Look
| Feature | SigNoz | Uptrace |
|---|---|---|
| Trace ingestion (OTLP) | ✓ | ✓ |
| Metrics (OTLP + Prometheus scrape) | ✓ | ✓ (OTLP only) |
| Log ingestion + search | ✓ | ✓ |
| Alert rules | ✓ (rich) | ✓ (basic) |
| Dashboards | ✓ (drag-and-drop UI) | ✓ (SQL-heavy, manual) |
| Trace dependency graph | ✓ | ✓ |
| Span-level metrics | ✓ | ✓ |
| Retention policies | ✓ (by dataset) | ✓ (by table) |
| Multi-user + RBAC | ✓ (Pro) | ✓ (Pro) |
| ClickHouse backend | ✓ | ✓ |
| Built-in OTLP collector | ✗ (separate service) | ✓ (integrated) |
| Grafana integration | ✓ (as datasource) | ✓ (as datasource) |
The big delta: dashboarding and alerting flavor. SigNoz leans into visual, point-and-click UX. Uptrace leans into SQL and simplicity.
Retention and TTL: The Tuning Dance
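ClickHouse doesn’t delete old data on a precise schedule the way managed solutions appear to; it expires rows through table TTLs that are applied during background merges. Both platforms ship with retention helpers, but you’ll need to think about it.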
SigNoz
```yaml
# In the SigNoz config
retention:
  traces: 1209600   # 14 days in seconds
  metrics: 2592000  # 30 days
  logs: 604800      # 7 days
```

Set it once (SigNoz also exposes these settings in its UI), and SigNoz applies them as ClickHouse TTLs; ClickHouse then drops expired data during background merges. Works. Not always instant, and those merges can chew CPU on large datasets.
Uptrace
```yaml
# In uptrace.yml (exact key names vary between releases; check the file your version ships)
retention:
  # Uptrace applies these as ClickHouse table TTLs (ALTER TABLE ... MODIFY TTL)
  traces: 14d
  metrics: 30d
  logs: 7d
```

Similar story. Uptrace delegates to ClickHouse’s TTL system, which keeps things simple but requires understanding how ClickHouse handles data lifecycle.
In practice: Don’t assume 7-day retention means “exactly 7 days.” ClickHouse removes expired rows only when it merges parts, which happens in the background every few hours (or when you trigger it manually). Expired data can linger for hours past its TTL, but it won’t be removed early. Plan for “at least X days” and monitor actual disk usage.
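To see what those retention settings turn into, here’s a small sketch that converts days to the seconds used in the SigNoz snippet and generates ClickHouse `ALTER TABLE ... MODIFY TTL` statements. The table and column names are hypothetical, and both platforms manage their own schemas, so treat this as a way to understand what they do rather than something to run blindly against their tables.

```python
SECONDS_PER_DAY = 86_400


def days_to_seconds(days: int) -> int:
    """Retention expressed in seconds."""
    return days * SECONDS_PER_DAY


def modify_ttl_sql(table: str, time_column: str, days: int) -> str:
    """Expire rows `days` after their timestamp; ClickHouse deletes them during merges."""
    if days <= 0:
        raise ValueError("days must be positive")
    return f"ALTER TABLE {table} MODIFY TTL toDateTime({time_column}) + INTERVAL {days} DAY"


def materialize_ttl_sql(table: str) -> str:
    """Ask ClickHouse to re-apply the TTL to existing parts instead of waiting for merges."""
    return f"ALTER TABLE {table} MATERIALIZE TTL"


if __name__ == "__main__":
    # Hypothetical table and column names; use the ones your platform actually creates.
    for table, days in [("demo.spans", 14), ("demo.metrics", 30), ("demo.logs", 7)]:
        print(days_to_seconds(days), modify_ttl_sql(table, "timestamp", days))
```

Both statements can rewrite a lot of data on large tables, so run them off-peak. The ClickHouse docs also describe a MergeTree setting, `merge_with_ttl_timeout` (four hours by default), that spaces out TTL merges, which is one reason expired data lingers.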
Resource Footprint: How Much RAM Do You Need?
Both run on ClickHouse, so the hardware profile is similar. Here’s what to expect on a small homelab (3 nodes, ~50 microservices, ~100K spans/min):
| Component | SigNoz | Uptrace |
|---|---|---|
| ClickHouse | 8 GB+ | 8 GB+ |
| OTLP collector (separate in SigNoz, built into Uptrace) | 2 GB | 1 GB |
| UI / app service | 2 GB | 1 GB |
| Total RAM | ~12 GB | ~10 GB |
| Disk (30 days retention, 100K spans/min) | ~200 GB | ~180 GB |
Uptrace edges out SigNoz on RAM (no separate collector process), but both are light compared to the LGTM stack:
- Prometheus + Grafana + Tempo + Loki + Mimir: 24+ GB RAM, 500+ GB disk (same workload, spread across four separate storage backends)
- SigNoz: 12 GB RAM, 200 GB disk
- Uptrace: 10 GB RAM, 180 GB disk
You’re saving 50% on memory and 60% on storage. Your 2 AM self will appreciate it.
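A quick sanity check on those numbers, using only the figures from the table and list above: the implied on-disk cost per span and the percentage savings against the LGTM baseline (24 GB RAM, 500 GB disk).

```python
SPANS_PER_MINUTE = 100_000
RETENTION_DAYS = 30


def implied_bytes_per_span(disk_gb: float, spans_per_minute: int = SPANS_PER_MINUTE,
                           days: int = RETENTION_DAYS) -> float:
    """Disk budget divided by spans retained (treats the whole disk as span storage)."""
    spans = spans_per_minute * 60 * 24 * days
    return disk_gb * 1e9 / spans


def savings_pct(baseline: float, candidate: float) -> float:
    """Percentage reduction of `candidate` relative to `baseline`."""
    return 100 * (baseline - candidate) / baseline


if __name__ == "__main__":
    for name, ram_gb, disk_gb in [("SigNoz", 12, 200), ("Uptrace", 10, 180)]:
        print(f"{name}: {implied_bytes_per_span(disk_gb):.1f} B/span, "
              f"RAM -{savings_pct(24, ram_gb):.0f}%, disk -{savings_pct(500, disk_gb):.0f}% vs LGTM")
    # SigNoz: 46.3 B/span, RAM -50%, disk -60% vs LGTM
    # Uptrace: 41.7 B/span, RAM -58%, disk -64% vs LGTM
```

Under 50 bytes per span, with logs and metrics sharing the same disk, is on the lean side, so treat these as one homelab’s measurements and check your own disk usage before sizing hardware.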
Dashboards and Customization
Here’s where the philosophies diverge.
SigNoz Dashboards
You get a visual editor. Drag panels, add time-series charts, heatmaps, tables. Queries are built visually (no PromQL required). It feels like Grafana, which is good if you like Grafana and bad if you hate clicking.
SigNoz also ships with pre-built dashboards for common setups (Node.js, Python, Go, Redis, Postgres). Useful as starting points.
Pro: Non-engineers can build dashboards. Onboarding is faster.
Con: Complex queries or unusual visualizations require custom ClickHouse SQL (you’re not escaping SQL entirely).
Uptrace Dashboards
Uptrace leans on SQL. You write ClickHouse queries, bind them to panels, and tweak formatting. There’s no drag-and-drop.
Pro: Powerful. You have the full ClickHouse query language.
Con: You need to know ClickHouse SQL. Non-engineers won’t build dashboards without help.
Honest take: If your team knows SQL, Uptrace is faster and more flexible. If your team is mixed skill levels, SigNoz’s UI is a win.
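For a feel of what “write ClickHouse queries, bind them to panels” looks like, here’s a sketch that generates a per-minute p99 latency query for one service. The table and column names (`timestamp`, `duration_ns`, `service_name`) are hypothetical; SigNoz and Uptrace each use their own schemas, so adjust before pasting into a panel.

```python
def quote_literal(value: str) -> str:
    """Single-quoted ClickHouse string literal with backslashes and quotes escaped."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def p99_latency_sql(table: str, service: str, lookback_hours: int = 1,
                    time_col: str = "timestamp", duration_col: str = "duration_ns",
                    service_col: str = "service_name") -> str:
    """Per-minute p99 latency (ms) and span count for one service; names are hypothetical."""
    return (
        f"SELECT toStartOfMinute({time_col}) AS minute, "
        f"quantile(0.99)({duration_col}) / 1e6 AS p99_ms, "
        "count() AS spans "
        f"FROM {table} "
        f"WHERE {service_col} = {quote_literal(service)} "
        f"AND {time_col} >= now() - INTERVAL {int(lookback_hours)} HOUR "
        "GROUP BY minute ORDER BY minute"
    )


if __name__ == "__main__":
    print(p99_latency_sql("spans", "checkout", lookback_hours=6))
```

ClickHouse’s `quantile` is approximate (it samples); swap in `quantileExact` if you need exact percentiles and can afford the memory.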
Alerting: SigNoz Wins Here
SigNoz has a richer alerting system. You define rules on metric thresholds, log patterns, or trace-derived signals such as latency and error rate. You can compose alerts with AND/OR logic, route by team, and escalate.
Uptrace’s alerting is simpler. Threshold-based on metrics or logs, route to Slack/email. It works, but it’s not as featureful.
If alerting complexity matters to you (multi-team escalation, sophisticated routing), SigNoz is the move.
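To make “compose alerts with AND/OR logic” concrete, here’s a generic sketch of threshold rules combined with AND/OR plus a “for” duration (the condition must hold for N consecutive evaluations before the alert fires). This is not SigNoz’s or Uptrace’s rule engine or config format, and the metric names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

Sample = Dict[str, float]
Condition = Callable[[Sample], bool]


@dataclass(frozen=True)
class Threshold:
    metric: str
    op: str  # ">" or "<"
    value: float

    def __post_init__(self) -> None:
        if self.op not in (">", "<"):
            raise ValueError(f"unsupported operator {self.op!r}")

    def __call__(self, sample: Sample) -> bool:
        observed = sample[self.metric]
        return observed > self.value if self.op == ">" else observed < self.value


def all_of(*conds: Condition) -> Condition:
    """AND: every condition must hold for the same sample."""
    return lambda sample: all(c(sample) for c in conds)


def any_of(*conds: Condition) -> Condition:
    """OR: at least one condition must hold."""
    return lambda sample: any(c(sample) for c in conds)


def fires(cond: Condition, samples: Iterable[Sample], for_evals: int) -> bool:
    """True once `cond` holds for `for_evals` consecutive evaluations."""
    streak = 0
    for sample in samples:
        streak = streak + 1 if cond(sample) else 0
        if streak >= for_evals:
            return True
    return False


if __name__ == "__main__":
    rule = all_of(Threshold("error_rate", ">", 0.05), Threshold("p99_ms", ">", 500))
    window = [{"error_rate": 0.08, "p99_ms": 640}] * 3
    print(fires(rule, window, for_evals=3))  # True
```

The consecutive-evaluation requirement is what keeps a single noisy scrape from paging anyone; routing and escalation then decide who hears about it.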
The “vs Grafana + Tempo + Loki + Prometheus” Comparison
Let me be blunt: the LGTM stack won the popularity contest. It’s what everyone recommends on Reddit.
Here’s the trade-off:
LGTM Stack Wins:
- Dashboards are infinitely customizable (Grafana is the UI god)
- You can mix datasources (Prometheus, Loki, Tempo, external APIs)
- If you already have Prometheus scrapers deployed, reuse them
- Community is massive
SigNoz / Uptrace Win:
- Single pane of glass (traces, metrics, logs in one UI)
- No glue code (no Grafana plugin shuffling)
- Cheaper to run (ClickHouse is efficient)
- Correlation is built in (trace IDs carry through automatically)
- Simpler mental model (one telemetry stream, not five)
My take: If you’re building observability from scratch and your team is <50 people, SigNoz or Uptrace saves you weeks of frustration. If you already have Grafana dashboards you love and a Prometheus infra, stick with LGTM.
Licensing: Both Self-Hostable, Both Have Cloud
SigNoz:
- Open-source core (enterprise features sit under a separate license; check the repo’s LICENSE file)
- Cloud offering (https://signoz.io/cloud), marketed at a fraction of Datadog’s price
Uptrace:
- Business Source License (converts to Apache 2.0 after two years)
- Cloud offering (https://uptrace.dev/cloud) with pricing for bootstrapped projects
Both are production-ready, and because both ingest standard OTLP, switching later means repointing your exporters, not re-instrumenting your code.
Decision Matrix: Pick Your Poison
| You Should Use SigNoz If… | You Should Use Uptrace If… |
|---|---|
| You want point-and-click dashboarding | You’re comfortable writing ClickHouse SQL |
| Your team has mixed technical levels | Your team is engineers-heavy |
| You need rich alerting with routing/escalation | Simple threshold alerts are enough |
| You’re migrating from Datadog | You value minimal footprint and speed |
| You like visual UX | You prioritize simplicity and performance |
| You want pre-built dashboards | You like building custom queries |
The SumGuy Take
I’ve run both on my homelab. Both work. Both are leagues better than debugging five separate UIs at 2 AM.
SigNoz is the safe bet. It has more features and better UX, and it feels like a real product. If you’re unsure, start here.
Uptrace is the minimalist’s choice. It’s lean, fast, and if you know SQL, you’ll move faster than clicking through SigNoz. The built-in OTLP collector is a nice touch — fewer moving parts.
My honest recommendation: Use Uptrace if you’re comfortable with SQL, use SigNoz if you’re not. Both will save you money and sanity compared to Datadog or the LGTM stack assembly tax. You’ll thank yourself the next time you’re chasing a production bug and your traces, metrics, and logs are actually correlated.
Deploy one this weekend. Your 2 AM self is counting on you.