Analytics DB: Dual-DB Architecture with ClickHouse
Date: 2026-02-19
Problem
TimescaleDB hits 100% CPU on read queries that combine session data with message data. All workloads — high-volume message writes, trigger-based session upserts, CAGG refreshes, dashboard aggregation queries, session detail lookups — compete for CPU/IO on a single PG instance.
Top operations by total DB time:
| # | Operation | Avg (s) | Calls | Total Time (s) | Source |
|---|---|---|---|---|---|
| 1 | GetSessionContent | 1.0 | 152,033 | 155,461 | raw chatlogs + correlated subqueries to reactions, corrections, insights |
| 2 | GetSessions | 19.0 | 2,985 | 56,675 | sessions_hourly CAGG + LEFT JOIN reactions + LEFT JOIN insights |
| 3 | RatingsCounts | 56.2 | 321 | 18,049 | raw chatlogs scan, COUNT DISTINCT, PERCENTILE_CONT |
| 4 | SessionsMeteringFast | 2.3 | 2,054 | 4,720 | sessions CAGGs |
| 5 | GetNormalizedRatingFeedback | 21.8 | 205 | 4,469 | raw chatlogs |
| 6 | LogsSessionsIntervalV2 | 13.9 | 207 | 2,872 | chatlogs CAGG |
| 7 | GetMonthlyActiveUsers | 4.6 | 534 | 2,437 | monthly_active_users_1m CAGG |
| 8 | CountsPerBrainFast | 95.4 | 25 | 2,385 | cross-join: sessions CAGG + chatlogs CAGG |
Additional slow queries (low volume, extreme latency):
| # | Operation | Avg (s) | Calls | Total Time (s) |
|---|---|---|---|---|
| 9 | RequestsCountsFast | 120.0 (timeout) | 2 | 240 |
| 10 | RequestsCounts | 120.0 (timeout) | 2 | 240 |
| 11 | GetSessionsCount | 82.2 | 3 | 247 |
| 12 | GetAgentsPerformance | 66.0 | 6 | 396 |
| 13 | BrainsPerformance | 64.5 | 4 | 258 |
Why GetSessionContent Dominates (60% of total DB time)
Despite a modest 1s average, it's called 152K times. Per call it:
- Scans raw chatlogs hypertable (~45 columns) for all messages in a session
- For every `message:brain_send` row, executes 2 correlated subqueries: one to `reactions`, one to `response_corrections`
- Builds 30+ field JSONB objects per message via `ARRAY_AGG(JSONB_BUILD_OBJECT(...))`
- Deduplicates tag and rule arrays via `ARRAY(SELECT DISTINCT UNNEST(ARRAY_CAT_AGG(...)))`
- LEFT JOINs the `insights` table
A session with 100 messages and 50 brain_sends = 100 JSONB constructions + 100 index lookups against other hypertables. Caching is not viable: users browse historical sessions going back months with low repeat rate.
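For orientation, a stripped-down illustration of that per-call shape (not the actual function; the reaction/correction columns are assumed):

```sql
-- Illustrative only: the two correlated subqueries run once per brain_send row,
-- so work grows with the number of messages, not with the result size.
SELECT ARRAY_AGG(JSONB_BUILD_OBJECT(
         'request_id', c.request_id,
         'event',      c.event,
         'text',       c.message_text,
         'reaction',   (SELECT r.type  FROM reactions            r  WHERE r.request_id = c.request_id LIMIT 1),
         'correction', (SELECT rc.body FROM response_corrections rc WHERE rc.request_id = c.request_id LIMIT 1)
       ) ORDER BY c.time)
FROM chatlogs c
WHERE c.session_id = $1;
```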
Why GetSessions Is Second (23% of total DB time)
Reads from the sessions_hourly CAGG (derived from chatlogs), then LEFT JOINs reactions and insights, and applies a HAVING clause with 15+ filter conditions including JSONB array unnesting, text LIKE searches, and insight filter evaluation. 19s average across ~3K invocations.
Current Architecture
Data Flow
Kafka (Strimzi, KRaft v3.8.0)
3 brokers, SCRAM-SHA-512 auth
12h retention, gzip compression
│
┌──────────────────────┬┴──────────────────────┐
│ │ │
tracking.analytics. tracking.insights.1 tracking.analytics.
message.2 (6 partitions) message.failed.1
(6 partitions) │ (6 partitions)
│ │ │
▼ ▼ ▼
┌─────────────────── Go Analytics Service ───────────────────┐
│ Consumer groups: │
│ analytics.group.4 → MessagePool │
│ analytics.insights.group.1 → InsightsPool │
│ analytics.failed.group.1 → MessageFailedPool │
│ │
│ pgx pool (analyticsUser) → INSERT chatlogs │
│ → insert_session() trigger │
│ fires → UPSERT sessions │
└────────────────────────────────────────────────────────────┘
│
▼
TimescaleDB (PG17)
Timescale Cloud (Aiven)
│
┌────────────┴────────────┐
▼ ▼
Hasura v2 (OSS) Go service (gRPC)
GraphQL API query functions
│ │
┌───────────────┼───────────────┐ │
▼ ▼ ▼ ▼
Orchestrator Tooling (React) Hub services Tooling
(JWT auth, (dashboard (metering, (session
only calls queries via weekly detail)
get_session_ Hasura) email)
content)
Database
- TimescaleDB (PG17 + TimescaleDB 2.22/2.23) on Timescale Cloud (Aiven infra)
- AWS Prod: `prod-analytics-moveo-0dff.a.timescaledb.io:21879`
- GCP Prod: internal CNPG (CloudNativePG, 3 instances, PG17 + TimescaleDB extension)
- Dev access via Tailscale VPN (`prod-analytics-db:21879`)
Who Connects
| Service | Method | Auth / functions called |
|---|---|---|
| analytics (Go) | Direct pgx pool (3 max connections) | analyticsUser (sealed secret) |
| datawarehouse (Go) | Direct pgx pool, same binary different mode | separate DB credentials |
| hasura (v2 OSS) | PostgreSQL URL | adminUser (sealed secret) |
| orchestrator (Node.js) | Hasura GraphQL | JWT (HS256, 5min expiry) — only get_session_content() |
| tooling (React) | Hasura GraphQL | JWT via orchestrator webhook |
| hub/metering (Node.js) | Hasura GraphQL | sessions_metering_fast(), get_meaningful_sessions_count_by_hour_range() |
| hub/analytics-emails (Node.js) | Hasura GraphQL | sessions_counts_fast(), ratings_counts(), agents_performance() |
Kafka Topology
| Topic | Partitions | Consumer Group | Pool |
|---|---|---|---|
| `tracking.analytics.message.2` | 6 | `analytics.group.4` | MessagePool |
| `tracking.insights.1` | 6 | `analytics.insights.group.1` | InsightsPool |
| `tracking.analytics.message.failed.1` | 6 | `analytics.failed.group.1` | MessageFailedPool |
Kafka cluster: Strimzi KRaft v3.8.0, 3 brokers, min.insync.replicas=2, SCRAM-SHA-512, 12h retention, gzip compression. Broker address: p4-kafka-kafka-bootstrap:9093.
Schema
Hypertables:
- `chatlogs` — primary event log, partitioned on `time`. ~45 columns. Every message event is a row.
- `sessions` — session aggregation, partitioned on `start_time`. ~23 columns. UNIQUE on `(start_time, session_id, brain_parent_id)`.
- `reactions` — user reactions (thumbs up/down), partitioned on `time`.
- `response_corrections` — agent response edits, partitioned on `time`.
- `insights` — AI-generated session insights (CSAT, sentiment, goal), partitioned on `time`.
Continuous Aggregates (CAGGs):
| CAGG | Source | Granularity | Purpose |
|---|---|---|---|
| `sessions_account_desk_daily` | sessions | 1 day | Dashboard counts by account/desk |
| `sessions_brain_daily` | sessions | 1 day | Dashboard counts by brain |
| `chatlogs_account_desk_brain_daily` | chatlogs | 1 day | Message counts, last_used |
| `ratings_daily` | chatlogs | 1 day | Rating aggregates by account/desk/brain/agent |
| `agents_performance_1h` | chatlogs | 1 hour | Per-agent perf with ROLLUP stats |
| `monthly_active_users_1m` | chatlogs | 1 month | Unique users/month |
| `sessions_hourly` | chatlogs | 1 hour | Main sessions list (GetSessions) |
| `meaningful_sessions_per_hour` | chatlogs | 1 hour | GCP metering |
Event Types (18 tracked)
Message events: message:received, message:brain_send, message:send, message:broadcast_send, message:campaign_send, message:compose, message:delivered, message:read, message:rating, message:correction
Conversation events: conversation:closed, conversation:member_join, conversation:member_leave
Session events: session:closed, session:rating, session:expired, context:updated
The insert_session() trigger fires only on 6 of these: message:received, message:brain_send, message:send, session:rating, message:broadcast_send, message:campaign_send.
Architectural Pain Points
- CPU contention — writes (ingestion + triggers + CAGG refresh) compete with reads (dashboard queries + session detail) on one PG instance
- Triple redundancy — same session data in 3 places: raw chatlogs, the `sessions` table (trigger-maintained), the `sessions_hourly` CAGG
- `insert_session()` trigger — every chatlog INSERT fires a 14-column CASE UPSERT into sessions (row lock contention, read-before-write, array dedup per row)
- 189-day CAGG window — CAGGs only cover ~6 months (some functions allow 369 days). Beyond the window = raw table scans
- `_fast` suffix pattern — 6+ functions duplicated: `requests_counts_fast` (CAGG) vs `requests_counts` (raw), `counts_per_brain_fast`, `counts_per_desk_fast`, `log_requests_interval_counts_v2_fast`, `log_sessions_interval_counts_v2_fast`, `sessions_metering_fast`
- Session uniqueness is messy — one session = multiple rows if it spans brains; `sessions_hourly` collapses differently; counts disagree
Query Pattern Summary
| Category | Source | Functions |
|---|---|---|
| Chatlogs-only | Raw chatlogs or chatlogs CAGGs | ~20 functions (RatingsCounts, GetSessionsCount, etc.) |
| Sessions-only | Sessions CAGGs | ~5 _fast functions (SessionsCountsFast, SessionsMeteringFast, etc.) |
| Cross-table (CAGG join) | sessions CAGG + chatlogs CAGG | CountsPerBrainFast, CountsPerDeskFast |
| Session list | sessions_hourly CAGG + reactions + insights | GetSessions (served via Hasura to frontend) |
| Session detail | Raw chatlogs + reactions + corrections + insights | GetSessionContent (called by orchestrator + frontend via Hasura) |
Proposed Architecture
Guiding Principle
Move all event/message data and its associated tables to ClickHouse. Keep only session metadata in PG for Hasura.
Kafka (Strimzi)
│
┌────────────────────┬┴───────────────────────┐
│ │ │
▼ ▼ ▼
tracking.analytics. tracking.insights.1 tracking.analytics.
message.2 message.failed.1
│ │ │
┌──────── ┼ ────────┐ │ │
▼ ▼ ▼ ▼
ClickHouse (Aiven) Go Analytics Service (same Go service)
Kafka Engine tables (modified consumer)
───────────────── ──────────────────
chatlogs Only upsert_session()
reactions for 6 event types
corrections → PG sessions table
insights (no chatlog INSERTs)
│
├── MVs: daily aggs, ratings, agent perf, MAU, sessions_hourly
├── Bloom filters on session_id, user_id, request_id
└── Dictionaries: pull session metadata from PG
│
▼
Go analytics service
queries ClickHouse for:
- GetSessionContent (single CH query)
- GetSessions (CH MV + reactions + insights)
- All aggregation queries
│
PG retains only:
- sessions table (for Hasura + CountsPerBrain/Desk)
- Hasura GraphQL (session metadata)
Why This Split
ClickHouse gets: chatlogs (high-volume, append-only), reactions, corrections, insights — all append-only with the same access pattern. This means:
- GetSessionContent (60% of DB time) becomes a single ClickHouse query — no cross-DB joins, no correlated subqueries (sketched after this list)
- GetSessions (23% of DB time) becomes a single ClickHouse query against a sessions_hourly MV with reactions/insights JOINed in CH
- All aggregation queries (RatingsCounts, MAU, AgentsPerformance, etc.) are pure ClickHouse
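A sketch of that single-query shape for GetSessionContent, using ClickHouse's `{name:Type}` parameter syntax; the reaction/correction/insight columns are assumed:

```sql
-- One pass over chatlogs; reactions, corrections and insights live in the same
-- engine, so they become ordinary LEFT JOINs instead of per-row subqueries.
SELECT
    c.request_id,
    c.event,
    c.message_text,
    r.type  AS reaction,
    rc.body AS correction,
    i.csat  AS csat
FROM chatlogs AS c
LEFT JOIN reactions            AS r  ON r.request_id = c.request_id
LEFT JOIN response_corrections AS rc ON rc.request_id = c.request_id
LEFT JOIN insights             AS i  ON i.session_id = c.session_id
WHERE c.account_id = {account_id:String}
  AND c.session_id = {session_id:String}
ORDER BY c.time;
```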
PG keeps: sessions table (relational, needs UPSERT updates, serves Hasura). PG's workload drops to:
- Session metadata upserts (from Go service, no more trigger)
- CountsPerBrain/DeskFast session-side subquery (lightweight CAGG scan)
- Hasura GraphQL for session metadata
Only 2 functions need cross-DB joins (CountsPerBrainFast, CountsPerDeskFast) — these join session counts from PG CAGGs with message counts from ClickHouse MVs. Handled as app-level joins in Go (both subqueries return small aggregated result sets).
Impact by Query
| Operation | Total Time (s) | Current | After Split |
|---|---|---|---|
| GetSessionContent | 155,461 | PG: chatlogs + reactions + corrections + insights | CH only: all tables in one engine |
| GetSessions | 56,675 | PG: sessions_hourly CAGG + reactions + insights | CH only: MV + reactions + insights |
| RatingsCounts | 18,049 | PG: raw chatlogs scan | CH only: MV or raw scan |
| SessionsMeteringFast | 4,720 | PG: sessions CAGGs | CH only: MV |
| GetNormalizedRatingFeedback | 4,469 | PG: raw chatlogs | CH only |
| LogsSessionsIntervalV2 | 2,872 | PG: chatlogs CAGG | CH only: MV |
| GetMonthlyActiveUsers | 2,437 | PG: chatlogs CAGG | CH only: MV |
| CountsPerBrainFast | 2,385 | PG: sessions CAGG + chatlogs CAGG | Split: PG sessions + CH chatlogs, app join |
~98% of total DB time moves to ClickHouse. PG retains only the CountsPerBrain/DeskFast session-side scans (~2% of load).
Ingestion Strategy: Kafka Fan-out (Not CDC)
Kafka writes to both DBs via independent consumer groups. CDC with Debezium was evaluated and rejected:
| Dimension | CDC (Debezium) | Kafka Fan-out (chosen) |
|---|---|---|
| Latency | +20-500ms (WAL → Debezium → Kafka → CH) | Near-zero. Both consumers read same topics |
| Ops overhead | Kafka Connect cluster, replication slot monitoring, WAL retention | Additional consumer group per topic. Already manage Kafka consumers |
| Failure blast radius | Debezium falling behind → WAL grows → PG disk fills | Consumer lag = Kafka offset. No impact on source DB |
| Schema coupling | CH schema derived from PG physical schema | Each DB independently interprets Kafka events |
| Throughput | PG connector single-task: ~7-15K events/sec | No bottleneck. Both consumers scale independently |
Additional Debezium/TimescaleDB limitations:
- Must use `FOR ALL TABLES` publications (dynamic chunk tables)
- Compressed chunks are invisible to CDC
- No ClickHouse source connector (ClickHouse has no WAL)
Fan-out Topology
Kafka Topics Consumer Groups
───────────── ────────────────
tracking.analytics.message.2 ──→ analytics.group.4 (existing Go, modified)
└→ upsert_session() for 6 event types → PG sessions
──→ clickhouse-chatlogs (CH Kafka engine)
└→ chatlogs + reactions + corrections tables in CH
tracking.insights.1 ──→ analytics.insights.group.1 (existing Go, kept)
└→ insights still need special processing?
──→ clickhouse-insights (CH Kafka engine)
└→ insights table in CH
tracking.analytics. ──→ analytics.failed.group.1 (existing Go, kept)
message.failed.1 └→ error tracking, retries → PG
Critical constraint: Kafka retention is only 12 hours. No replay-based backfill. Historical data migration must come from TimescaleDB directly.
Session Creation Without Chatlogs in PG
The current insert_session() trigger fires AFTER INSERT on chatlogs. If chatlogs stop going to PG, sessions table stays empty.
Current: Kafka → PG chatlogs INSERT → trigger fires → sessions UPSERT
After: Kafka → ClickHouse (chatlogs)
→ Go service → upsert_session() → PG sessions (no chatlog INSERT)
Solution: Replace the trigger with a callable upsert_session() function. The Go consumer reads all Kafka messages but only writes session UPSERTs — no chatlog INSERTs.
The function preserves identical semantics: sticky booleans (is_covered once false stays false), conditional is_meaningful calculation referencing current row state, tag array dedup via SELECT DISTINCT UNNEST, and atomic read-before-write via INSERT ... ON CONFLICT DO UPDATE. The UPSERT logic must stay in PG (not Go) because it requires atomicity guarantees that would otherwise require explicit row locking.
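A minimal sketch of what the callable function could look like, with the column list heavily abridged; types and every column beyond the documented unique key are assumptions:

```sql
CREATE OR REPLACE FUNCTION upsert_session(
    _start_time      timestamptz,
    _session_id      text,
    _brain_parent_id uuid,       -- assumed type
    _is_covered      boolean,
    _tags            text[]
) RETURNS void
LANGUAGE sql AS $$
    INSERT INTO sessions (start_time, session_id, brain_parent_id, is_covered, tags)
    VALUES (_start_time, _session_id, _brain_parent_id, _is_covered, _tags)
    ON CONFLICT (start_time, session_id, brain_parent_id) DO UPDATE SET
        -- sticky boolean: once false, it stays false
        is_covered = sessions.is_covered AND EXCLUDED.is_covered,
        -- tag dedup, mirroring the trigger's SELECT DISTINCT UNNEST
        tags = ARRAY(SELECT DISTINCT UNNEST(sessions.tags || EXCLUDED.tags));
$$;
```

The Go consumer calls this once per qualifying event; the ON CONFLICT target matches the existing UNIQUE constraint on `(start_time, session_id, brain_parent_id)`.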
Consistency Between PG and ClickHouse
Both consumer groups read independently, leaving a small window where one is ahead of the other:
| Scenario | Effect | Acceptable? |
|---|---|---|
| PG ahead — session exists, CH messages still ingesting | Session listed; content returns partial messages | Yes — messages appear within ms, LEFT JOINs handle NULLs |
| CH ahead — messages in CH, session row not yet in PG | Session not listed yet | Yes — session appears within ms, no user-visible impact |
Typical inter-consumer-group lag: <100ms. Acceptable for analytics.
Fallback if stronger guarantees needed: merge into single consumer group that writes PG (sync) then CH (async/buffered). Tradeoff: coupling if CH is down.
ClickHouse Schema Design Principles
Main Table: chatlogs
- Engine: MergeTree (auto-remapped to ReplicatedMergeTree on Aiven)
- Partition: monthly (`toYYYYMM(time)`)
- ORDER BY: `(account_id, desk_id, brain_id, toStartOfHour(time), session_id, event)`
- PRIMARY KEY: `(account_id, desk_id, brain_id, toStartOfHour(time))` — sparse index only
- TTL: 2 years
ORDER BY rationale:
- `account_id` first — lowest cardinality, every query filters by it
- `desk_id`, `brain_id` — aggregations roll up by these
- `toStartOfHour(time)` — bucketed for sparse index efficiency
- `session_id` in ORDER BY (not PRIMARY KEY) — helps compression + GetSessionContent lookups
- `event` last — groups same event types for compression
Secondary indexes: Bloom filters on `session_id`, `user_id`, `request_id`. Full-text index on `message_text`. A consolidated DDL sketch follows the JSONB tiers below.
JSONB message Handling (Three-Tier)
| Tier | What | How |
|---|---|---|
| Extracted columns | `message_text`, `message_type`, `message_sender` | Dedicated typed columns. Best compression + query speed |
| Native JSON | `message_data` with `JSON(max_dynamic_paths=256)` | Semi-structured remainder. Lazy parsing |
| Overflow | Rare/long-tail keys | Auto-stored in shared format. Still queryable |
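Pulling the key layout and the three message tiers together, a sketch rather than a final schema; column names and codecs beyond those listed above are assumptions, and the `JSON(max_dynamic_paths=...)` type requires a recent ClickHouse release:

```sql
CREATE TABLE chatlogs
(
    account_id     LowCardinality(String),
    desk_id        LowCardinality(String),
    brain_id       LowCardinality(String),
    session_id     String,
    request_id     String,
    user_id        String,
    event          LowCardinality(String),
    time           DateTime64(3),
    -- tier 1: extracted, typed columns
    message_text   String CODEC(ZSTD(3)),
    message_type   LowCardinality(String),
    message_sender LowCardinality(String),
    -- tiers 2 + 3: semi-structured remainder; paths beyond the limit land in
    -- the JSON type's shared data and remain queryable
    message_data   JSON(max_dynamic_paths = 256),
    INDEX idx_session_id session_id TYPE bloom_filter GRANULARITY 4,
    INDEX idx_user_id    user_id    TYPE bloom_filter GRANULARITY 4,
    INDEX idx_request_id request_id TYPE bloom_filter GRANULARITY 4
)
ENGINE = MergeTree                 -- remapped to ReplicatedMergeTree on Aiven
PARTITION BY toYYYYMM(time)
ORDER BY (account_id, desk_id, brain_id, toStartOfHour(time), session_id, event)
PRIMARY KEY (account_id, desk_id, brain_id, toStartOfHour(time))
TTL toDateTime(time) + INTERVAL 2 YEAR;

-- A text-search skipping index on message_text (still experimental in ClickHouse)
-- can be added separately once the full-text requirements are pinned down.
```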
Supporting Tables
- reactions — ORDER BY `(account_id, session_id, time)`, bloom on session_id + request_id
- response_corrections — ORDER BY `(account_id, request_id, time)`, bloom on request_id
- insights — ORDER BY `(account_id, session_id, time)`, bloom on session_id
All use MergeTree, monthly partitions, 2-year TTL.
Materialized Views (Replacing CAGGs)
ClickHouse MVs fire on INSERT (truly real-time) vs TimescaleDB CAGGs (periodic refresh, 1-15 min lag). No 189-day window limitation.
| Current CAGG | CH Replacement | Engine | Key Difference |
|---|---|---|---|
| `chatlogs_account_desk_brain_daily` | `chatlogs_daily_agg` | AggregatingMergeTree | Real-time, uses -State/-Merge combinators |
| `ratings_daily` | `ratings_daily_agg` | AggregatingMergeTree | Filters on rating events |
| `agents_performance_1h` | `agents_perf_hourly` | AggregatingMergeTree | Hourly agent metrics |
| `monthly_active_users_1m` | `mau_monthly` | AggregatingMergeTree | uniqState for approximate distinct |
| `sessions_hourly` | `sessions_hourly_agg` | AggregatingMergeTree | Session-level aggregation with bit operations |
All MVs use AggregatingMergeTree with *State() aggregate functions. Queries use *Merge() combinators. This enables incremental aggregation on INSERT without full recomputation.
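A sketch of the pattern, using the hourly sessions aggregate as the example (columns abridged and assumed):

```sql
-- Target table holds partial aggregation states.
CREATE TABLE sessions_hourly_agg
(
    account_id LowCardinality(String),
    brain_id   LowCardinality(String),
    hour       DateTime,
    sessions   AggregateFunction(uniq, String),
    messages   AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (account_id, brain_id, hour);

-- MV fires on every INSERT into chatlogs and writes -State values.
CREATE MATERIALIZED VIEW sessions_hourly_agg_mv TO sessions_hourly_agg AS
SELECT
    account_id,
    brain_id,
    toStartOfHour(time)   AS hour,
    uniqState(session_id) AS sessions,
    countState()          AS messages
FROM chatlogs
GROUP BY account_id, brain_id, hour;

-- Reads finalize the partial states with the matching -Merge combinators.
SELECT hour, uniqMerge(sessions) AS unique_sessions, countMerge(messages) AS message_count
FROM sessions_hourly_agg
WHERE account_id = {account_id:String}
GROUP BY hour
ORDER BY hour;
```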
Compression Expectations
| Column type | Expected ratio |
|---|---|
| `LowCardinality(String)` (event, channel, account_id) | 50-100x |
| `String` (session_id, request_id) | 5-15x |
| `message_text` (ZSTD) | 3-8x |
| Numerics (Delta+LZ4) | 10-20x |
| Overall vs TimescaleDB | ~10-20x less disk |
Kafka Ingestion (Three-Table Pattern)
Each topic uses ClickHouse's Kafka Engine:
- Kafka table — reads from topic, no storage
- Materialized view — transforms + routes (e.g., JSON extraction for message fields)
- Target MergeTree table — final storage
Kafka engine settings must match partition counts (6 consumers, not 4) and use SCRAM-SHA-512 auth. At-least-once semantics (acceptable for analytics).
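A generic sketch of the three objects for the chatlogs topic; on Aiven, broker and SCRAM-SHA-512/TLS settings are supplied through the Kafka-to-ClickHouse service integration rather than inline DDL, and the object names here are assumptions:

```sql
-- 1. Kafka engine table: reads the topic, stores nothing.
CREATE TABLE chatlogs_queue
(
    raw String
)
ENGINE = Kafka
SETTINGS kafka_broker_list   = 'p4-kafka-kafka-bootstrap:9093',
         kafka_topic_list    = 'tracking.analytics.message.2',
         kafka_group_name    = 'clickhouse-chatlogs',
         kafka_format        = 'JSONAsString',  -- whole event as one String column
         kafka_num_consumers = 6;               -- match the 6 partitions

-- 2. Materialized view: extracts fields and routes rows into the target table.
CREATE MATERIALIZED VIEW chatlogs_queue_mv TO chatlogs AS
SELECT
    JSONExtractString(raw, 'account_id')      AS account_id,
    JSONExtractString(raw, 'session_id')      AS session_id,
    JSONExtractString(raw, 'event')           AS event,
    parseDateTime64BestEffort(JSONExtractString(raw, 'time')) AS time,
    JSONExtractString(raw, 'message', 'text') AS message_text
    -- ... remaining columns, including message_data
FROM chatlogs_queue;

-- 3. Target MergeTree table: the chatlogs table defined earlier.
```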
Cross-DB Query Strategies
Only 2 functions need cross-DB queries: counts_per_desk_fast and counts_per_brain_fast.
1. Application-Level Joins (primary)
Go analytics service queries both DBs, merges in memory. Both subqueries return small aggregated result sets (one row per desk/brain).
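For illustration, the two pre-aggregated result sets the Go service would merge in memory, keyed by brain; table and column names are assumed:

```sql
-- PostgreSQL side: session counts from the sessions CAGG.
SELECT brain_parent_id, SUM(session_count) AS sessions
FROM sessions_brain_daily
WHERE bucket >= $1 AND bucket < $2 AND account_id = $3
GROUP BY brain_parent_id;

-- ClickHouse side: message counts from the daily chatlogs MV.
SELECT brain_id, countMerge(messages) AS message_count
FROM chatlogs_daily_agg
WHERE day >= {from:Date} AND day < {to:Date} AND account_id = {account_id:String}
GROUP BY brain_id;
```

Each side returns one small row per brain; the Go service joins them on the brain id before responding.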
2. ClickHouse Dictionaries (for enrichment)
HASHED dictionary sourced from PG sessions table, refreshed every 5-10 min. Useful for enriching ClickHouse queries with session metadata without leaving CH.
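A sketch of the dictionary DDL; because `session_id` is a string key, the complex-key variant of the hashed layout applies, and the connection details and attribute columns are assumptions:

```sql
CREATE DICTIONARY session_meta_dict
(
    session_id      String,
    brain_parent_id String,
    start_time      DateTime,
    is_covered      UInt8
)
PRIMARY KEY session_id
SOURCE(POSTGRESQL(
    host 'prod-analytics-moveo-0dff.a.timescaledb.io'
    port 21879
    user 'dict_reader'          -- assumed read-only role
    password '...'
    db 'analytics'              -- assumed database name
    table 'sessions'
))
LAYOUT(COMPLEX_KEY_HASHED())    -- string keys require the complex-key layout
LIFETIME(MIN 300 MAX 600);      -- refresh every 5-10 minutes

-- Enrich ClickHouse rows with session metadata without leaving CH:
SELECT
    session_id,
    dictGet('session_meta_dict', 'brain_parent_id', tuple(session_id)) AS brain_parent_id
FROM chatlogs
WHERE account_id = {account_id:String}
LIMIT 10;
```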
3. Aiven PG Integration (for ad-hoc)
Aiven auto-creates a PostgreSQL engine database in ClickHouse for live remote queries. Suitable for small lookups, not large scans.
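For such ad-hoc checks, the integration-created database can be queried directly from ClickHouse; the database name below is illustrative (Aiven derives it from the PG service name):

```sql
-- Live remote query through the PostgreSQL database engine: fine for small
-- lookups, not for large scans.
SELECT count()
FROM service_pg_analytics.sessions
WHERE start_time >= now() - INTERVAL 1 DAY;
```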
Hasura Implications
- Hasura v2 OSS (confirmed current deployment, `dardanos/hasura` image)
- Statement timeout: 120s
- Auth: webhook to orchestrator, returns `X-Hasura-Role` + `X-Hasura-Allowed-Account-Id` for row-level security
- All analytics functions tracked as table types with auto-generated GraphQL
- Orchestrator only calls `get_session_content()` — all other analytics queries come from the tooling frontend or hub services
Recommendation: Keep Hasura on PG for session metadata. Route all ClickHouse-backed queries (GetSessionContent, GetSessions, aggregations) through the Go analytics service gRPC, not Hasura. This means:
- Frontend (tooling) must switch from Hasura GraphQL to Go gRPC for these queries
- Hub metering/analytics-emails must switch from Hasura to Go gRPC
- Orchestrator's `get_session_content()` must route through the Go service instead of Hasura
Performance Expectations
| Query type | TimescaleDB (current) | ClickHouse (expected) |
|---|---|---|
| Daily aggregation (from MV) | 50-200ms | 5-30ms |
| Complex aggregation (raw) | 2+ seconds | ~280ms |
| GetSessionContent (per call) | ~1s (correlated subqueries) | ~100-200ms (JOINs, columnar, bloom filter) |
| GetSessions (session list) | ~19s (CAGG + JOINs + HAVING) | ~1-3s (MV + JOINs, columnar) |
| RatingsCounts | ~56s (raw scan, PERCENTILE_CONT) | ~1-5s (quantile(), columnar) |
| Funnel analysis | Complex CTEs, seconds | Native windowFunnel(), sub-second |
| Full-text search | pg_trgm, slow | full_text index, fast |
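As one concrete mapping, the COUNT DISTINCT / PERCENTILE_CONT pair inside RatingsCounts becomes (rating column name assumed):

```sql
SELECT
    uniqExact(session_id)       AS rated_sessions,
    -- quantile() is approximate; quantileExact() matches PERCENTILE_CONT more closely
    quantile(0.5)(rating_value) AS median_rating
FROM chatlogs
WHERE event IN ('message:rating', 'session:rating')
  AND account_id = {account_id:String}
  AND time >= {from:DateTime} AND time < {to:DateTime};
```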
Estimated Total DB Time Reduction
| Operation | Current Total (s) | Expected After Split |
|---|---|---|
| GetSessionContent | 155,461 | ~15,000-30,000 (5-10x faster per call) |
| GetSessions | 56,675 | ~3,000-9,000 (6-20x faster with MV) |
| RatingsCounts | 18,049 | ~500-1,500 (10-30x faster) |
| Remaining aggregations | ~17,000 | ~1,000-3,000 |
| Total | ~247,000 | ~20,000-44,000 |
| PG CPU load | 100% | ~2% (only sessions CAGG for 2 cross-DB functions) |
What This Eliminates
- `insert_session()` trigger (replaced by callable `upsert_session()` — same logic, no chatlog INSERT needed)
- `sessions_hourly` CAGG (replaced by ClickHouse MV)
- `_fast` / non-`_fast` function duplication (6+ function pairs eliminated)
- 189-day CAGG window limitation (ClickHouse MVs cover full retention)
- ~10-20x of current disk usage (columnar compression)
- CPU contention between reads and writes on single PG instance
- Correlated subqueries in GetSessionContent (replaced by ClickHouse JOINs)
Unresolved Questions
- Current chatlogs volume? (rows/day, GB/day) — needed to size ClickHouse
- Message JSONB schema — which fields are stable/frequent? (determines extracted columns)
- `logs_sessions_preview_v2` still called or dead code?
- Non-`_fast` count functions still called? (RequestsCounts times out at 120s — remove?)
- Insights pipeline: does `InsightsPool` do processing before the DB write, or pass-through? If processing, the CH Kafka engine may not suffice for that topic
- `conversation:reopened` event appearing in logs but handled as "invalid type" — new event type to support?
- Team ClickHouse experience?
- Frontend (tooling) currently queries Hasura for GetSessions — rerouting to Go gRPC is a frontend change. Scope?
- Hub metering/analytics-emails also query Hasura — reroute to Go gRPC or keep Hasura for session-only queries?
- `upsert_session()`: the Go consumer already parses Kafka headers (ReqID, AccountID, DeskID, SessionID, BrainParentID, IsTest, Channel, etc.) — does it also parse `session_start_time`, `collection_response_code`, `is_covered`, `is_contained` from the message body?
- Migration cutover: parallel (trigger + new consumer) during transition, or hard switch?
- Historical data backfill: Kafka has only 12h retention. Must export from TimescaleDB directly — what's the export strategy?
Sources
- Aiven: ClickHouse
- Aiven: Connect Kafka to ClickHouse
- Aiven: Connect PostgreSQL to ClickHouse
- Aiven: ClickHouse Limitations
- Aiven: Create Dictionaries
- Aiven: Plans & Pricing
- ClickHouse vs TimescaleDB (Tinybird)
- ClickHouse vs TimescaleDB benchmarks
- Wingify: PG to ClickHouse, 80% cost savings
- SigNoz: ClickHouse JSON constraints
- Altinity: Pick keys for MergeTree
- Altinity: Kafka Engine FAQ
- Hasura: ClickHouse v2
- Hasura: DDN ClickHouse Connector
- ClickHouse: Native Postgres Service (Jan 2026)
- PG + ClickHouse: Industry Standard Pattern