Ingesting Mobile Navigation Telemetry into ClickHouse: Architecture and Best Practices


webscraper
2026-02-09
11 min read

Practical 2026 guide to ingesting high‑velocity navigation telemetry into ClickHouse—partitioning, codecs, ingestion patterns, and queries.

Why mobile navigation telemetry breaks traditional warehouses — and how ClickHouse solves it

High‑velocity location pings, noisy sensor data, and bursty ingestion from millions of devices create a unique operational profile: extreme write concurrency, time-series hot partitions, and a relentless need for compact storage. If your analytics stack is stalling on insert latency, exploding storage costs, or brittle queries that can't keep up with 2026 real‑time expectations, this guide is for you.

Executive summary (most important first)

This guide explains an end‑to‑end architecture for ingesting mobile navigation telemetry into ClickHouse in 2026. You'll get a production‑ready schema, recommended partitioning and compression strategies, reliable ingestion patterns (Kafka → ClickHouse, Buffering, Native protocol), deduplication and TTL retention patterns, and query patterns for real‑time analytics and historical aggregations. We include code snippets, DDL examples and operational tips designed for developers and SREs running at 10k–10M events/second.

Context: Why ClickHouse in 2026?

ClickHouse's rapid commercial growth and major funding rounds through late 2025 underscore its investment in scalability, cloud native connectors, and operational features such as ClickHouse Keeper and improved S3 integration. For telemetry use cases — where low‑latency ad‑hoc analytics and cost‑efficient long‑term storage matter — ClickHouse is now a mature choice and competes with other OLAP systems on both price and latency.

Core constraints of mobile telemetry

  • High cardinality device identifiers and irregular activity patterns (many devices idle, a subset bursty).
  • Frequent small writes (per‑second pings per device) that must be batched and de‑duplicated.
  • Queries that are both point lookups (device session) and large time‑range analytics (heatmaps, congestion trends).
  • Strong cost and retention controls for PII and location data due to regulatory changes in 2025–2026.

Reference architecture

The high‑level pipeline:

  1. Mobile SDK → Edge collector (mobile batching, encryption, local caching)
  2. Edge collector → Message bus (Kafka / Pulsar with partitioning tuned to ingest parallelism)
  3. ClickHouse ingestion layer: Kafka Engine or External connector → Materialized View → MergeTree table
  4. Real‑time materialized rollups into AggregatingMergeTree / SummingMergeTree for dashboards
  5. Cold storage via TTL moves to S3 volumes or separate cold ClickHouse cluster
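Step 4 can be sketched as a per‑minute rollup. The table and view names below are illustrative, assuming the telemetry_events raw table defined in the schema section of this guide:

CREATE TABLE telemetry_rollup_1m
(
    minute DateTime,
    provider LowCardinality(String),
    avg_speed AggregateFunction(avg, Float32),
    events AggregateFunction(count)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(minute)
ORDER BY (provider, minute);

-- Continuously populate the rollup from the raw table
CREATE MATERIALIZED VIEW mv_rollup_1m TO telemetry_rollup_1m AS
SELECT
    toStartOfMinute(event_ts) AS minute,
    provider,
    avgState(speed_kmh) AS avg_speed,
    countState() AS events
FROM telemetry_events
GROUP BY minute, provider;

Dashboards then read the rollup with avgMerge(avg_speed) and countMerge(events), grouped by minute.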

Why Kafka (or Pulsar)?

A message bus decouples bursty mobile traffic from ClickHouse's insert capacity. Configure Kafka partitions equal to (or a multiple of) the number of parallel ClickHouse inserters so ingestion threads stay busy. As of 2026, managed Kafka (or Kafka‑compatible cloud streaming) plus the ClickHouse Kafka engine and ClickHouse Keeper provide low‑ops cluster reliability; similar streaming patterns are described in guides for low‑latency event pipelines like hybrid game events.

Schema design — practical DDL for telemetry

Below is a pragmatic table optimized for storage, compression and queries you’ll run most.

CREATE TABLE telemetry_events
(
    event_id UUID,
    device_id UInt64, -- hashed client ID for privacy
    event_ts DateTime64(3, 'UTC'),
    recv_ts DateTime64(3, 'UTC') DEFAULT now64(3),

    lat Float32 CODEC(Gorilla, ZSTD(1)),
    lon Float32 CODEC(Gorilla, ZSTD(1)),
    speed_kmh Float32 CODEC(Delta, ZSTD(3)),
    bearing UInt16 CODEC(Delta, ZSTD(1)),
    accuracy_m Float32,

    provider LowCardinality(String),
    status UInt8,
    properties String, -- JSON blob for sparse attributes

    geohash String MATERIALIZED geohashEncode(lon, lat, 8) -- 8-char geohash cell
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(event_ts)
ORDER BY (device_id, event_ts)
SETTINGS index_granularity = 16384;

Notes:

  • Store device_id as a UInt64 produced by a deterministic hash (e.g., xxHash64) to avoid PII leaks and to get natural sharding semantics.
  • Use DateTime64(3) for millisecond precision needed by navigation telemetry.
  • Materialize a geohash as a fast spatial filter, keeping heavy geospatial indexes out of the primary storage. Geohash prefixes are compact, cheap to filter on, and great for heatmaps — for local mapping considerations see map plugin guidance.
  • Keep sparse attributes in a JSON string for flexibility; use JSONExtract when needed. If high‑volume structured attributes appear, promote to columns.
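As a sketch of the JSON pattern above, pulling a hypothetical sparse key (satellites is an assumed attribute, not part of the schema) out of the properties blob at query time:

SELECT
    device_id,
    JSONExtractUInt(properties, 'satellites') AS satellites -- hypothetical sparse key
FROM telemetry_events
WHERE JSONHas(properties, 'satellites')
LIMIT 10;

If such a key shows up in most rows, that is the signal to promote it to a real column.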

Partitioning: choosing a granularity that balances reads and merges

Partitioning strategy is the single biggest lever for operational performance.

Guidelines (2026 best practices)

  • Target average partition size: 10 GB – 50 GB. Too small: partition management and merges explode. Too large: heavy merges slow compaction and increase query latency.
  • For most mobile navigation workloads, start with daily partitions (PARTITION BY toYYYYMMDD(event_ts)). If you ingest >500M rows/day per shard, switch to hourly (toStartOfHour).
  • Design partitions so hot time windows are limited — huge hourly bursts (e.g., events during commuting hour) should still remain within a partition you can compact quickly.
  • Use ORDER BY (device_id, event_ts) for efficient per‑device queries and sequential writes. If your most common queries are purely time‑range scans, prefer ORDER BY (event_ts, device_id).

Partition sizing heuristic

Estimate partition size = average_event_size_bytes * events_per_partition. Example: 200 bytes/event * 200M events/day = 40 GB/day → daily partition OK. If you exceed 100 GB/day, consider hourly partitions or vertical partitioning (hot vs. cold columns).
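Once data is flowing, you can check the heuristic against actual on‑disk sizes via the system.parts table:

SELECT
    partition,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk,
    sum(rows) AS rows
FROM system.parts
WHERE table = 'telemetry_events' AND active
GROUP BY partition
ORDER BY partition DESC;

If daily partitions routinely exceed the 10–50 GB target, that is the cue to move to hourly partitioning.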

Compression: codecs and column‑level tuning

ClickHouse offers per‑column codecs. For telemetry you can get significant gains by using time/number specific codecs.

  • LZ4 (default): fastest decode, decent ratio — good for low‑CPU clusters and HOT queries.
  • ZSTD(level): better ratio, higher CPU. Use for archival or large columns like JSON.
  • Delta / DoubleDelta: for integers and timestamps — excellent where values are monotonic or small differences.
  • Gorilla (for floats): low error and high compression for time series floats (lat/lon, speed).

Recommended codec examples:

  • event_ts: CODEC(DoubleDelta, ZSTD(1))
  • lat/lon: CODEC(Gorilla, ZSTD(1))
  • speed_kmh: CODEC(Delta, ZSTD(3))
  • properties (JSON): CODEC(ZSTD(6))
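Codecs can also be changed after the fact; existing parts are rewritten only as they merge. A sketch:

ALTER TABLE telemetry_events
    MODIFY COLUMN properties String CODEC(ZSTD(6));

-- Optionally force a rewrite of existing parts (can be expensive)
OPTIMIZE TABLE telemetry_events FINAL;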

Ingestion patterns and code examples

1) Kafka engine + materialized view (pull‑based path)

Create a Kafka engine table and a materialized view into the MergeTree table so ClickHouse pulls at its own pace. Many high-throughput streaming designs follow the same decoupling patterns used in live events and gaming ingestion pipelines — see notes on building low‑latency streaming systems in hybrid game events.

CREATE TABLE kafka_telemetry (
  payload String
) ENGINE = Kafka SETTINGS
  kafka_broker_list = 'broker1:9092,broker2:9092',
  kafka_topic_list = 'telemetry-events',
  kafka_group_name = 'ch_ingest_group',
  kafka_format = 'JSONEachRow';

CREATE MATERIALIZED VIEW mv_telemetry
TO telemetry_events
AS SELECT
  toUUID(JSONExtractString(payload, 'event_id')) AS event_id,
  xxHash64(JSONExtractString(payload, 'device_id')) AS device_id,
  parseDateTime64BestEffort(JSONExtractString(payload, 'event_ts'), 3) AS event_ts,
  now64(3) AS recv_ts,
  JSONExtractFloat(payload, 'lat') AS lat,
  JSONExtractFloat(payload, 'lon') AS lon,
  JSONExtractFloat(payload, 'speed_kmh') AS speed_kmh,
  JSONExtractString(payload, 'provider') AS provider,
  JSONExtractString(payload, 'properties') AS properties
FROM kafka_telemetry;

2) Native protocol or HTTP batch inserts (low‑latency path)

For minimal latency use the native binary protocol or client libraries (clickhouse-go, clickhouse‑connect for Python). Send batches of 1k–50k rows to amortize TCP overhead. Example: JSONEachRow via HTTP:

curl -sS -X POST 'https://clickhouse.example.com/?query=INSERT%20INTO%20telemetry_events%20FORMAT%20JSONEachRow' \
  --data-binary @batch.json

3) Buffer table pattern for burst smoothing

If you expect large bursts, put a Buffer engine table in front of the MergeTree table. Configure buffer limits so memory spikes are controlled and flush frequency optimized to your latency SLA. For architecture notes on edge buffering and observability, see materials on edge observability.
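A minimal Buffer sketch, assuming the current database and the telemetry_events target; the flush thresholds are illustrative starting points to tune against your latency SLA:

CREATE TABLE telemetry_buffer AS telemetry_events
ENGINE = Buffer(
    currentDatabase(), telemetry_events,
    16,                  -- num_layers: parallel in-memory buffers
    10, 100,             -- min/max seconds before flush
    10000, 1000000,      -- min/max rows before flush
    10000000, 100000000  -- min/max bytes before flush
);

Clients insert into telemetry_buffer; ClickHouse flushes batches to telemetry_events when any max threshold is hit or all min thresholds are met.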

Deduplication and ordering

Mobile telemetry can produce duplicates (retries) and out‑of‑order events. Patterns:

  • Include an event_id UUID from the client. Use ReplacingMergeTree keyed on a version column (recv_ts works if you have no explicit event_version) or a separate dedupe table to remove duplicates during merges.
  • If exact ordering matters, use event_ts from the device and compute session windows on query time with window functions (lag/lead).
CREATE TABLE telemetry_replacing
ENGINE = ReplacingMergeTree(recv_ts) -- newest recv_ts wins per sorting key
PARTITION BY toYYYYMMDD(event_ts)
ORDER BY (device_id, event_ts, event_id) -- event_id in the key lets duplicates collapse on merge
AS SELECT * FROM telemetry_events;

Deduplication happens at merge time, not insert time; use SELECT ... FINAL when a query needs exact results.

Retention, TTLs and cold storage

Modern ClickHouse supports moving parts to S3 or remote volumes. Define TTLs for columns and partitions to manage PII and cost.

ALTER TABLE telemetry_events
MODIFY TTL
  event_ts + INTERVAL 30 DAY TO VOLUME 'cold_s3',
  event_ts + INTERVAL 90 DAY DELETE;

This keeps recent data local & hot, and moves older data to cheaper S3 volumes automatically. For teams worried about runaway cloud costs, read platform updates like the cloud per‑query cost cap guidance for city data teams — it illustrates how cost controls can impact retention choices.

Query patterns and performance tips

Common queries

  • Per‑device recent history: WHERE device_id = X ORDER BY event_ts DESC LIMIT 1000
  • Congestion heatmap: group by geohash and time buckets
  • Top speed events per city: filter by geohash prefix and ORDER BY speed_kmh DESC
  • Sessionization: use window functions to identify trip starts (speed > x and gap > y seconds)

SQL examples

-- Heatmap: avg speed per geohash cell per 5-minute bucket
SELECT
  toStartOfFiveMinutes(event_ts) AS t,
  geohashEncode(lon, lat, 6) AS gh6,
  avg(speed_kmh) AS avg_speed
FROM telemetry_events
WHERE event_ts BETWEEN now() - INTERVAL 1 HOUR AND now()
GROUP BY t, gh6
ORDER BY t, gh6;

-- Per device recent history
SELECT * FROM telemetry_events
WHERE device_id = 123456789
ORDER BY event_ts DESC
LIMIT 1000;
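Sessionization can be sketched with window functions; the 300‑second gap threshold is an assumption, not a fixed rule:

-- Flag trip starts: a new trip begins after a gap of more than 5 minutes
SELECT
    device_id,
    event_ts,
    dateDiff('second', prev_ts, event_ts) AS gap_s,
    gap_s > 300 AS trip_start -- first row per device also flags (prev_ts defaults to 1970)
FROM
(
    SELECT
        device_id,
        event_ts,
        lagInFrame(event_ts) OVER (
            PARTITION BY device_id ORDER BY event_ts
            ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
        ) AS prev_ts
    FROM telemetry_events
    WHERE event_ts >= now() - INTERVAL 1 DAY
)
ORDER BY device_id, event_ts;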

Sampling and exploratory analysis

Define SAMPLE BY at table creation to support fast approximate queries. The sampling expression must be part of the sorting key; since device_id is already a uniform hash, it can serve as the sampling key directly:

ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(event_ts)
ORDER BY (device_id, event_ts)
SAMPLE BY device_id

Use the SAMPLE clause in queries for fast, approximate results during iteration, then run full queries for production metrics.
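For example, assuming the table was created with a sampling key, an approximate distinct‑device count over 1% of the data:

SELECT uniq(device_id) AS approx_devices
FROM telemetry_events
SAMPLE 0.01
WHERE event_ts >= now() - INTERVAL 1 DAY;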

Operational best practices

  • Shard by device hash to distribute writes evenly. Align Kafka partitioning with ClickHouse insert parallelism.
  • Monitor MergeTree background merges, mutation queues, and rejected inserts. Set alerts for long merge queues.
  • Use ClickHouse Keeper (2025+) rather than ZooKeeper where possible; it reduces operational complexity and matches ClickHouse's own development path.
  • Set index_granularity to balance point lookups vs. storage. Typical ranges: 8192–65536. For telemetry with large range scans, 16384–32768 is a good starting point.
  • Benchmark codecs and levels on a representative sample. ZSTD(6) might give large savings on JSON, but adds CPU cost on ingest and queries.
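One way to compare codec results on a live table is to read per‑column compressed vs. uncompressed bytes from system.columns:

SELECT
    name,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'telemetry_events'
ORDER BY data_compressed_bytes DESC;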

Privacy, compliance and data governance

By 2026 regulators expanded guidance on location telemetry. Build privacy in:

  • Hash or salt device identifiers; store only hashed IDs used for analytics. If you're designing consent and data flows, pair technical controls with legal guidance similar to architecting consent flows.
  • Enforce TTLs aligned with consent windows and legal limits; automate deletions with TTLs.
  • Minimize verbatim location retention if not required—consider downsampling or obfuscation for analytics that don't need full precision.
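Downsampling can be as simple as rounding coordinates before analytics; three decimal places is roughly 110 m of precision at the equator:

-- Coarse density grid that never exposes full-precision locations
SELECT
    round(lat, 3) AS lat_coarse, -- ~110 m cells
    round(lon, 3) AS lon_coarse,
    count() AS pings
FROM telemetry_events
GROUP BY lat_coarse, lon_coarse
ORDER BY pings DESC
LIMIT 100;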

Operational tip: Treat telemetry like money — store it accurately for only as long as it earns value. Use TTLs, rollups, and cold storage aggressively.

Scaling case study (real‑world example)

Scenario: A navigation app with 3M active devices generating on average 1 event/sec during active use, peaking at 5M events/sec during morning commute across global shards.

  • Ingestion: Kafka with 200 partitions, each consumed by a dedicated ClickHouse inserter; Buffer tables at 10k row flush sizes smoothed spikes.
  • Storage: Daily partitions sized ~30GB; index_granularity = 16384. Compression: lat/lon with Gorilla; JSON with ZSTD(6).
  • Querying: Hot cluster for last 30 days, cold S3 for older. Materialized views aggregated per‑minute for dashboards reduced query latency by 30x vs raw scans.
  • Result: Sustained 4–5M inserts/sec with 200ms tail latency for most analytical queries and compressed storage at ~6x ratio vs raw JSON.

Benchmarks and tradeoffs

Expect tradeoffs between CPU (compression), storage cost and query latency:

  • LZ4: fastest, lowest CPU. Good for low latency. Ratio 2x–4x.
  • ZSTD(3–6): 1.5x–2x better ratio than LZ4, with higher CPU. Use for cold or JSON columns.
  • Per‑column advanced codecs (Gorilla/Delta): up to 3–10x improvements on time‑series numeric columns with minimal query penalty.

Common pitfalls and how to avoid them

  • Too many tiny partitions — causes overhead and slows merges. Consolidate via ALTER TABLE ... MOVE PARTITION or change partitioning strategy.
  • Unbounded JSON growth — promote frequently used keys to columns early to allow efficient compression and querying.
  • No ingestion backpressure — always put a Kafka or Buffer layer in front so ingestion proceeds at a controlled pace. Don't rely solely on direct HTTP inserts at peak load. Streaming and rate-limit patterns from notification and messaging systems are useful references; see implementation patterns like RCS fallback architectures.
  • Ignoring privacy requirements — implement TTLs and hashing from day one and follow regional regulatory playbooks such as adapting to Europe’s AI rules for compliance planning.
Looking ahead: trends to plan for

  • Stronger native geospatial primitives and indexes in ClickHouse — start by using geohash but plan to migrate to native spatial indexes as they mature. For embedding maps and geospatial filters, see guidance on map plugin choices.
  • Improved integration with cloud object stores and tiering — push more cold data to S3 or cloud analytics lakes automatically.
  • Managed ClickHouse services reducing operational overhead; still plan for sharding and partitioning choices early to avoid costly refactors.
  • Increasing adoption of privacy‑first telemetry: expect analytics to rely more on hashed identifiers and aggregated rollups rather than raw location retention.

Actionable checklist (quick start)

  1. Design event schema with DateTime64(3), hashed device_id, and materialized geohash column.
  2. Estimate daily event volume and choose partition granularity to hit 10–50GB per partition.
  3. Use Kafka + Materialized View ingestion; batch 1k–50k rows per insert; prefer native protocol where latency matters.
  4. Tune codecs: Gorilla for lat/lon; Delta for integers/time deltas; ZSTD for JSON.
  5. Configure TTLs to delete/move old data and ensure compliance with consent windows.
  6. Create per‑minute / per‑hour rollups into AggregatingMergeTree tables for dashboards.
  7. Monitor merges, backpressure, and storage growth; alert on long merge queues and high mutation times.

Closing: why this matters now

With ClickHouse's continued investment and ecosystem growth through late 2025 and into 2026, teams building navigation analytics can deploy cost‑efficient, low‑latency pipelines that scale to millions of events per second. Correct partitioning and compression choices reduce both cloud costs and maintenance overhead. Proper ingestion topology prevents outages during peak commuting hours while allowing near real‑time insights.

Call to action

Ready to prototype? Start with the DDL above on a representative dataset (1–10M rows) and run codec/partitioning experiments to measure compression and query latency. If you want a ready‑made architecture review—share a sample ingestion profile and query patterns and we’ll propose tuned schema and operational knobs for your environment. For operational observability patterns at the edge, see the edge observability playbook and for streaming/rollup design review the hybrid events guidance.
