Analytics DB: Dual-DB Architecture with ClickHouse

Date: 2026-02-19

Problem

TimescaleDB hits 100% CPU on read queries that combine session data with message data. All workloads — high-volume message writes, trigger-based session upserts, CAGG refreshes, dashboard aggregation queries, session detail lookups — compete for CPU/IO on a single PG instance.

Top operations by total DB time:

#  | Operation                   | Avg (s) | Calls   | Total Time (s) | Source
1  | GetSessionContent           | 1.0     | 152,033 | 155,461        | raw chatlogs + correlated subqueries to reactions, corrections, insights
2  | GetSessions                 | 19.0    | 2,985   | 56,675         | sessions_hourly CAGG + LEFT JOIN reactions + LEFT JOIN insights
3  | RatingsCounts               | 56.2    | 321     | 18,049         | raw chatlogs scan, COUNT DISTINCT, PERCENTILE_CONT
4  | SessionsMeteringFast        | 2.3     | 2,054   | 4,720          | sessions CAGGs
5  | GetNormalizedRatingFeedback | 21.8    | 205     | 4,469          | raw chatlogs
6  | LogsSessionsIntervalV2      | 13.9    | 207     | 2,872          | chatlogs CAGG
7  | GetMonthlyActiveUsers       | 4.6     | 534     | 2,437          | monthly_active_users_1m CAGG
8  | CountsPerBrainFast          | 95.4    | 25      | 2,385          | cross-join: sessions CAGG + chatlogs CAGG

Additional slow queries (low volume, extreme latency):

#  | Operation            | Avg (s)         | Calls | Total Time (s)
9  | RequestsCountsFast   | 120.0 (timeout) | 2     | 240
10 | RequestsCounts       | 120.0 (timeout) | 2     | 240
11 | GetSessionsCount     | 82.2            | 3     | 247
12 | GetAgentsPerformance | 66.0            | 6     | 396
13 | BrainsPerformance    | 64.5            | 4     | 258

Why GetSessionContent Dominates (60% of total DB time)

Despite a modest 1s average, it's called 152K times. Per call it:

  1. Scans raw chatlogs hypertable (~45 columns) for all messages in a session
  2. For every message:brain_send row, executes 2 correlated subqueries: one to reactions, one to response_corrections
  3. Builds 30+ field JSONB objects per message via ARRAY_AGG(JSONB_BUILD_OBJECT(...))
  4. Deduplicates tag and rule arrays via ARRAY(SELECT DISTINCT UNNEST(ARRAY_CAT_AGG(...)))
  5. LEFT JOINs insights table

A session with 100 messages and 50 brain_sends = 100 JSONB constructions + 100 index lookups against other hypertables. Caching is not viable: users browse historical sessions going back months with low repeat rate.
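For context, the per-message shape of the current PG query is roughly the following — a heavily simplified sketch with assumed column names, not the actual get_session_content() body:

  -- Simplified sketch of the correlated-subquery pattern (column names assumed):
  SELECT ARRAY_AGG(
           JSONB_BUILD_OBJECT(
             'request_id', c.request_id,
             'event',      c.event,
             'text',       c.message_text,
             -- one correlated lookup per brain_send row:
             'reaction',   (SELECT r.type
                              FROM reactions r
                             WHERE r.request_id = c.request_id
                             LIMIT 1),
             'correction', (SELECT rc.body
                              FROM response_corrections rc
                             WHERE rc.request_id = c.request_id
                             LIMIT 1)
           ) ORDER BY c.time)
    FROM chatlogs c
   WHERE c.session_id = $1;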

Why GetSessions Is Second (23% of total DB time)

Reads from the sessions_hourly CAGG (derived from chatlogs), then LEFT JOINs reactions and insights. The HAVING clause applies 15+ filter conditions, including JSONB array unnesting, text LIKE searches, and insight filter evaluation. 19s average across ~3K invocations.


Current Architecture

Data Flow

                                  Kafka (Strimzi, KRaft v3.8.0)
                                  3 brokers, SCRAM-SHA-512 auth
                                  12h retention, gzip compression
                                           │
                    ┌──────────────────────┬┴──────────────────────┐
                    │                      │                       │
        tracking.analytics.       tracking.insights.1     tracking.analytics.
           message.2                 (6 partitions)       message.failed.1
          (6 partitions)                   │                (6 partitions)
                    │                      │                       │
                    ▼                      ▼                       ▼
              ┌─────────────────── Go Analytics Service ───────────────────┐
              │  Consumer groups:                                          │
              │    analytics.group.4        → MessagePool                  │
              │    analytics.insights.group.1 → InsightsPool               │
              │    analytics.failed.group.1 → MessageFailedPool            │
              │                                                            │
              │  pgx pool (analyticsUser) → INSERT chatlogs                │
              │                            → insert_session() trigger      │
              │                               fires → UPSERT sessions     │
              └────────────────────────────────────────────────────────────┘
                                           │
                                           ▼
                                    TimescaleDB (PG17)
                                   Timescale Cloud (Aiven)
                                           │
                              ┌────────────┴────────────┐
                              ▼                          ▼
                      Hasura v2 (OSS)          Go service (gRPC)
                      GraphQL API               query functions
                              │                          │
              ┌───────────────┼───────────────┐          │
              ▼               ▼               ▼          ▼
        Orchestrator    Tooling (React)    Hub services   Tooling
        (JWT auth,      (dashboard         (metering,     (session
        only calls       queries via        weekly         detail)
        get_session_     Hasura)            email)
        content)

Database

  • TimescaleDB (PG17 + TimescaleDB 2.22/2.23) on Timescale Cloud (Aiven infra)
  • AWS Prod: prod-analytics-moveo-0dff.a.timescaledb.io:21879
  • GCP Prod: internal CNPG (CloudNativePG, 3 instances, PG17 + TimescaleDB extension)
  • Dev access via Tailscale VPN (prod-analytics-db:21879)

Who Connects

Service                        | Method                                            | Auth / Functions
analytics (Go)                 | Direct pgx pool (3 max connections)               | analyticsUser (sealed secret)
datawarehouse (Go)             | Direct pgx pool, same binary, different mode      | separate DB credentials
hasura (v2 OSS)                | PostgreSQL URL                                    | adminUser (sealed secret)
orchestrator (Node.js)         | Hasura GraphQL                                    | JWT (HS256, 5min expiry) — only get_session_content()
tooling (React)                | Hasura GraphQL                                    | JWT via orchestrator webhook
hub/metering (Node.js)         | Hasura GraphQL                                    | sessions_metering_fast(), get_meaningful_sessions_count_by_hour_range()
hub/analytics-emails (Node.js) | Hasura GraphQL                                    | sessions_counts_fast(), ratings_counts(), agents_performance()

Kafka Topology

Topic                               | Partitions | Consumer Group             | Pool
tracking.analytics.message.2        | 6          | analytics.group.4          | MessagePool
tracking.insights.1                 | 6          | analytics.insights.group.1 | InsightsPool
tracking.analytics.message.failed.1 | 6          | analytics.failed.group.1   | MessageFailedPool

Kafka cluster: Strimzi KRaft v3.8.0, 3 brokers, min.insync.replicas=2, SCRAM-SHA-512, 12h retention, gzip compression. Broker address: p4-kafka-kafka-bootstrap:9093.

Schema

Hypertables:

  • chatlogs — primary event log, partitioned on time. ~45 columns. Every message event is a row.
  • sessions — session aggregation, partitioned on start_time. ~23 columns. UNIQUE on (start_time, session_id, brain_parent_id).
  • reactions — user reactions (thumbs up/down), partitioned on time.
  • response_corrections — agent response edits, partitioned on time.
  • insights — AI-generated session insights (CSAT, sentiment, goal), partitioned on time.

Continuous Aggregates (CAGGs):

CAGG                              | Source   | Granularity | Purpose
sessions_account_desk_daily       | sessions | 1 day       | Dashboard counts by account/desk
sessions_brain_daily              | sessions | 1 day       | Dashboard counts by brain
chatlogs_account_desk_brain_daily | chatlogs | 1 day       | Message counts, last_used
ratings_daily                     | chatlogs | 1 day       | Rating aggregates by account/desk/brain/agent
agents_performance_1h             | chatlogs | 1 hour      | Per-agent perf with ROLLUP stats
monthly_active_users_1m           | chatlogs | 1 month     | Unique users/month
sessions_hourly                   | chatlogs | 1 hour      | Main sessions list (GetSessions)
meaningful_sessions_per_hour      | chatlogs | 1 hour      | GCP metering

Event Types (18 tracked)

Message events: message:received, message:brain_send, message:send, message:broadcast_send, message:campaign_send, message:compose, message:delivered, message:read, message:rating, message:correction

Conversation events: conversation:closed, conversation:member_join, conversation:member_leave

Session events: session:closed, session:rating, session:expired, context:updated

The insert_session() trigger fires only on 6 of these: message:received, message:brain_send, message:send, session:rating, message:broadcast_send, message:campaign_send.
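For orientation, the trigger binding is roughly the following sketch (trigger name assumed; the real definition lives in the DB migrations):

  -- Sketch of the existing trigger wiring (assumed naming):
  CREATE TRIGGER chatlogs_insert_session
    AFTER INSERT ON chatlogs
    FOR EACH ROW
    WHEN (NEW.event IN ('message:received', 'message:brain_send', 'message:send',
                        'session:rating', 'message:broadcast_send', 'message:campaign_send'))
    EXECUTE FUNCTION insert_session();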

Architectural Pain Points

  1. CPU contention — writes (ingestion + triggers + CAGG refresh) compete with reads (dashboard queries + session detail) on one PG instance
  2. Triple redundancy — same session data in 3 places: raw chatlogs, sessions table (trigger-maintained), sessions_hourly CAGG
  3. insert_session() trigger — every chatlog INSERT fires a 14-column CASE UPSERT into sessions (row lock contention, read-before-write, array dedup per row)
  4. 189-day CAGG window — CAGGs only cover ~6 months (some functions allow 369 days). Beyond window = raw table scans
  5. _fast suffix pattern — 6+ functions duplicated: requests_counts_fast (CAGG) vs requests_counts (raw), counts_per_brain_fast, counts_per_desk_fast, log_requests_interval_counts_v2_fast, log_sessions_interval_counts_v2_fast, sessions_metering_fast
  6. Session uniqueness is messy — one session = multiple rows if it spans brains. sessions_hourly collapses differently. Counts disagree.

Query Pattern Summary

Category                | Source                                             | Functions
Chatlogs-only           | Raw chatlogs or chatlogs CAGGs                     | ~20 functions (RatingsCounts, GetSessionsCount, etc.)
Sessions-only           | Sessions CAGGs                                     | ~5 _fast functions (SessionsCountsFast, SessionsMeteringFast, etc.)
Cross-table (CAGG join) | sessions CAGG + chatlogs CAGG                      | CountsPerBrainFast, CountsPerDeskFast
Session list            | sessions_hourly CAGG + reactions + insights        | GetSessions (served via Hasura to frontend)
Session detail          | Raw chatlogs + reactions + corrections + insights  | GetSessionContent (called by orchestrator + frontend via Hasura)

Proposed Architecture

Guiding Principle

Move all event/message data and its associated tables to ClickHouse. Keep only session metadata in PG for Hasura.

                                    Kafka (Strimzi)
                                         │
                    ┌────────────────────┬┴───────────────────────┐
                    │                    │                         │
                    ▼                    ▼                         ▼
         tracking.analytics.    tracking.insights.1     tracking.analytics.
            message.2                                   message.failed.1
                    │                    │                         │
         ┌──────── ┼ ────────┐          │                         │
         ▼                   ▼          ▼                         ▼
   ClickHouse (Aiven)    Go Analytics Service              (same Go service)
   Kafka Engine tables    (modified consumer)
   ─────────────────      ──────────────────
   chatlogs              Only upsert_session()
   reactions             for 6 event types
   corrections           → PG sessions table
   insights              (no chatlog INSERTs)
         │
         ├── MVs: daily aggs, ratings, agent perf, MAU, sessions_hourly
         ├── Bloom filters on session_id, user_id, request_id
         └── Dictionaries: pull session metadata from PG
                    │
                    ▼
          Go analytics service
          queries ClickHouse for:
          - GetSessionContent (single CH query)
          - GetSessions (CH MV + reactions + insights)
          - All aggregation queries
                    │
          PG retains only:
          - sessions table (for Hasura + CountsPerBrain/Desk)
          - Hasura GraphQL (session metadata)

Why This Split

ClickHouse gets: chatlogs (high-volume, append-only), reactions, corrections, insights — all append-only with the same access pattern. This means:

  • GetSessionContent (60% of DB time) becomes a single ClickHouse query — no cross-DB joins, no correlated subqueries
  • GetSessions (23% of DB time) becomes a single ClickHouse query against a sessions_hourly MV with reactions/insights JOINed in CH
  • All aggregation queries (RatingsCounts, MAU, AgentsPerformance, etc.) are pure ClickHouse

PG keeps: sessions table (relational, needs UPSERT updates, serves Hasura). PG's workload drops to:

  • Session metadata upserts (from Go service, no more trigger)
  • CountsPerBrain/DeskFast session-side subquery (lightweight CAGG scan)
  • Hasura GraphQL for session metadata

Only 2 functions need cross-DB joins (CountsPerBrainFast, CountsPerDeskFast) — these join session counts from PG CAGGs with message counts from ClickHouse MVs. Handled as app-level joins in Go (both subqueries return small aggregated result sets).

Impact by Query

Operation                   | Total Time (s) | Current                                            | After Split
GetSessionContent           | 155,461        | PG: chatlogs + reactions + corrections + insights  | CH only: all tables in one engine
GetSessions                 | 56,675         | PG: sessions_hourly CAGG + reactions + insights    | CH only: MV + reactions + insights
RatingsCounts               | 18,049         | PG: raw chatlogs scan                              | CH only: MV or raw scan
SessionsMeteringFast        | 4,720          | PG: sessions CAGGs                                 | CH only: MV
GetNormalizedRatingFeedback | 4,469          | PG: raw chatlogs                                   | CH only
LogsSessionsIntervalV2      | 2,872          | PG: chatlogs CAGG                                  | CH only: MV
GetMonthlyActiveUsers       | 2,437          | PG: chatlogs CAGG                                  | CH only: MV
CountsPerBrainFast          | 2,385          | PG: sessions CAGG + chatlogs CAGG                  | Split: PG sessions + CH chatlogs, app join

~98% of total DB time moves to ClickHouse. PG retains only the CountsPerBrain/DeskFast session-side scans (~2% of load).
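To make the biggest win concrete: GetSessionContent collapses from a correlated-subquery loop into a single ClickHouse statement of roughly this shape (column names assumed; the real query would be built by the Go analytics service):

  -- Hypothetical single-statement shape for GetSessionContent in ClickHouse:
  SELECT c.request_id,
         c.event,
         c.message_text,
         r.type  AS reaction,
         rc.body AS correction
  FROM chatlogs AS c
  LEFT JOIN reactions            AS r  ON r.request_id  = c.request_id
  LEFT JOIN response_corrections AS rc ON rc.request_id = c.request_id
  WHERE c.account_id = {account:String}
    AND c.session_id = {session:String}    -- bloom filter on session_id prunes granules
  ORDER BY c.time;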


Ingestion Strategy: Kafka Fan-out (Not CDC)

Kafka writes to both DBs via independent consumer groups. CDC with Debezium was evaluated and rejected:

Dimension            | CDC (Debezium)                                                     | Kafka Fan-out (chosen)
Latency              | +20-500ms (WAL → Debezium → Kafka → CH)                            | Near-zero. Both consumers read same topics
Ops overhead         | Kafka Connect cluster, replication slot monitoring, WAL retention  | Additional consumer group per topic. Already manage Kafka consumers
Failure blast radius | Debezium falling behind → WAL grows → PG disk fills                | Consumer lag = Kafka offset. No impact on source DB
Schema coupling      | CH schema derived from PG physical schema                          | Each DB independently interprets Kafka events
Throughput           | PG connector single-task: ~7-15K events/sec                        | No bottleneck. Both consumers scale independently

Additional Debezium/TimescaleDB limitations:

  • Must use FOR ALL TABLES publications (dynamic chunk tables)
  • Compressed chunks invisible to CDC
  • No ClickHouse source connector (ClickHouse has no WAL)

Fan-out Topology

Kafka Topics                        Consumer Groups
─────────────                       ────────────────
tracking.analytics.message.2   ──→  analytics.group.4 (existing Go, modified)
                                      └→ upsert_session() for 6 event types → PG sessions
                               ──→  clickhouse-chatlogs (CH Kafka engine)
                                      └→ chatlogs + reactions + corrections tables in CH

tracking.insights.1            ──→  analytics.insights.group.1 (existing Go, kept)
                                      └→ insights still need special processing?
                               ──→  clickhouse-insights (CH Kafka engine)
                                      └→ insights table in CH

tracking.analytics.             ──→  analytics.failed.group.1 (existing Go, kept)
  message.failed.1                    └→ error tracking, retries → PG

Critical constraint: Kafka retention is only 12 hours. No replay-based backfill. Historical data migration must come from TimescaleDB directly.

Session Creation Without Chatlogs in PG

The current insert_session() trigger fires AFTER INSERT on chatlogs. If chatlogs stop going to PG, sessions table stays empty.

Current:  Kafka → PG chatlogs INSERT → trigger fires → sessions UPSERT
After:    Kafka → ClickHouse (chatlogs)
                → Go service → upsert_session() → PG sessions (no chatlog INSERT)

Solution: Replace the trigger with a callable upsert_session() function. The Go consumer reads all Kafka messages but only writes session UPSERTs — no chatlog INSERTs.

The function preserves identical semantics: sticky booleans (is_covered once false stays false), conditional is_meaningful calculation referencing current row state, tag array dedup via SELECT DISTINCT UNNEST, and atomic read-before-write via INSERT ... ON CONFLICT DO UPDATE. The UPSERT logic must stay in PG (not Go) because it requires atomicity guarantees that would otherwise require explicit row locking.
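A minimal sketch of the callable replacement, assuming a small subset of the real columns (the actual function carries the trigger's full 14-column CASE logic, including the is_meaningful calculation omitted here):

  -- Minimal sketch (assumed column subset; real function mirrors the trigger's CASE logic):
  CREATE OR REPLACE FUNCTION upsert_session(
    p_start_time      timestamptz,
    p_session_id      text,
    p_brain_parent_id uuid,
    p_is_covered      boolean,
    p_tags            text[]
  ) RETURNS void LANGUAGE sql AS $$
    INSERT INTO sessions (start_time, session_id, brain_parent_id, is_covered, tags)
    VALUES (p_start_time, p_session_id, p_brain_parent_id, p_is_covered, p_tags)
    ON CONFLICT (start_time, session_id, brain_parent_id) DO UPDATE
    SET
      -- sticky boolean: once false, stays false
      is_covered = sessions.is_covered AND EXCLUDED.is_covered,
      -- dedup tag arrays
      tags = ARRAY(SELECT DISTINCT UNNEST(sessions.tags || EXCLUDED.tags));
  $$;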

Consistency Between PG and ClickHouse

Both consumer groups read independently. Small window where one is ahead:

Scenario                                               | Effect                                            | Acceptable?
PG ahead — session exists, CH messages still ingesting | Session listed; content returns partial messages  | Yes — messages appear within ms, LEFT JOINs handle NULLs
CH ahead — messages in CH, session row not yet in PG   | Session not listed yet                            | Yes — session appears within ms, no user-visible impact

Typical inter-consumer-group lag: <100ms. Acceptable for analytics.

Fallback if stronger guarantees are needed: merge into a single consumer group that writes to PG (sync) and then to CH (async/buffered). Tradeoff: ingestion becomes coupled — a ClickHouse outage now affects the shared consumer.


ClickHouse Schema Design Principles

Main Table: chatlogs

  • Engine: MergeTree (auto-remapped to ReplicatedMergeTree on Aiven)
  • Partition: Monthly (toYYYYMM(time))
  • ORDER BY: (account_id, desk_id, brain_id, toStartOfHour(time), session_id, event)
  • PRIMARY KEY: (account_id, desk_id, brain_id, toStartOfHour(time)) — sparse index only
  • TTL: 2 years

ORDER BY rationale:

  • account_id first — lowest cardinality, every query filters by it
  • desk_id, brain_id — aggregations roll up by these
  • toStartOfHour(time) — bucketed for sparse index efficiency
  • session_id in ORDER BY (not PRIMARY KEY) — helps compression + GetSessionContent lookups
  • event last — groups same event types for compression

Secondary indexes: Bloom filters on session_id, user_id, request_id. Full-text index on message_text.
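Putting the above together, the DDL would look roughly like this (abridged column list, assumed names and types; the full-text index on message_text is omitted from the sketch):

  -- Sketch only: abridged columns, assumed names/types.
  CREATE TABLE chatlogs
  (
      time         DateTime64(3),
      account_id   LowCardinality(String),
      desk_id      LowCardinality(String),
      brain_id     LowCardinality(String),
      session_id   String,
      request_id   String,
      user_id      String,
      event        LowCardinality(String),
      channel      LowCardinality(String),
      message_text String CODEC(ZSTD(3)),
      message_data JSON(max_dynamic_paths = 256),

      INDEX idx_session_id session_id TYPE bloom_filter GRANULARITY 4,
      INDEX idx_user_id    user_id    TYPE bloom_filter GRANULARITY 4,
      INDEX idx_request_id request_id TYPE bloom_filter GRANULARITY 4
  )
  ENGINE = MergeTree                      -- Aiven remaps to ReplicatedMergeTree
  PARTITION BY toYYYYMM(time)
  PRIMARY KEY (account_id, desk_id, brain_id, toStartOfHour(time))
  ORDER BY (account_id, desk_id, brain_id, toStartOfHour(time), session_id, event)
  TTL toDateTime(time) + INTERVAL 2 YEAR;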

JSONB message Handling (Three-Tier)

Tier              | What                                          | How
Extracted columns | message_text, message_type, message_sender    | Dedicated typed columns. Best compression + query speed
Native JSON       | message_data with JSON(max_dynamic_paths=256) | Semi-structured remainder. Lazy parsing
Overflow          | Rare/long-tail keys                           | Auto-stored in shared format. Still queryable
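With the native JSON type, paths read as subcolumns; a hypothetical example (path names assumed):

  -- Hypothetical path access on the JSON column (paths are illustrative):
  SELECT message_data.flow_id        AS flow_id,
         message_data.nlp.confidence AS confidence
  FROM chatlogs
  WHERE account_id = {account:String}
    AND event = 'message:brain_send'
  LIMIT 10;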

Supporting Tables

  • reactions — ORDER BY (account_id, session_id, time), bloom on session_id + request_id
  • response_corrections — ORDER BY (account_id, request_id, time), bloom on request_id
  • insights — ORDER BY (account_id, session_id, time), bloom on session_id

All use MergeTree, monthly partitions, 2-year TTL.

Materialized Views (Replacing CAGGs)

ClickHouse MVs fire on INSERT (truly real-time) vs TimescaleDB CAGGs (periodic refresh, 1-15 min lag). No 189-day window limitation.

Current CAGG                      | CH Replacement      | Engine               | Key Difference
chatlogs_account_desk_brain_daily | chatlogs_daily_agg  | AggregatingMergeTree | Real-time, uses -State/-Merge combinators
ratings_daily                     | ratings_daily_agg   | AggregatingMergeTree | Filters on rating events
agents_performance_1h             | agents_perf_hourly  | AggregatingMergeTree | Hourly agent metrics
monthly_active_users_1m           | mau_monthly         | AggregatingMergeTree | uniqState for approximate distinct
sessions_hourly                   | sessions_hourly_agg | AggregatingMergeTree | Session-level aggregation with bit operations

All MVs use AggregatingMergeTree with *State() aggregate functions. Queries use *Merge() combinators. This enables incremental aggregation on INSERT without full recomputation.
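The pattern for one of them, e.g. the daily message counts, sketched with assumed names and measures:

  -- Sketch of the MV pattern (target table + MV + read query); names/measures assumed.
  CREATE TABLE chatlogs_daily_agg
  (
      day        Date,
      account_id LowCardinality(String),
      desk_id    LowCardinality(String),
      brain_id   LowCardinality(String),
      messages   AggregateFunction(count),
      users      AggregateFunction(uniq, String),
      last_used  AggregateFunction(max, DateTime64(3))
  )
  ENGINE = AggregatingMergeTree
  PARTITION BY toYYYYMM(day)
  ORDER BY (account_id, desk_id, brain_id, day);

  CREATE MATERIALIZED VIEW chatlogs_daily_agg_mv TO chatlogs_daily_agg AS
  SELECT
      toDate(time)       AS day,
      account_id, desk_id, brain_id,
      countState()       AS messages,
      uniqState(user_id) AS users,
      maxState(time)     AS last_used
  FROM chatlogs
  GROUP BY day, account_id, desk_id, brain_id;

  -- Reads use the -Merge combinators:
  SELECT day, countMerge(messages) AS messages, uniqMerge(users) AS users
  FROM chatlogs_daily_agg
  WHERE account_id = {account:String}
  GROUP BY day
  ORDER BY day;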

Compression Expectations

Column type                                         | Expected ratio
LowCardinality(String) (event, channel, account_id) | 50-100x
String (session_id, request_id)                     | 5-15x
message_text (ZSTD)                                 | 3-8x
Numerics (Delta+LZ4)                                | 10-20x
Overall vs TimescaleDB                              | ~10-20x less disk

Kafka Ingestion (Three-Table Pattern)

Each topic uses ClickHouse's Kafka Engine:

  1. Kafka table — reads from topic, no storage
  2. Materialized view — transforms + routes (e.g., JSON extraction for message fields)
  3. Target MergeTree table — final storage

Kafka engine settings must match partition counts (6 consumers, not 4) and use SCRAM-SHA-512 auth. At-least-once semantics (acceptable for analytics).
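Sketch of the three objects for the message topic (event envelope, field names, and settings are assumptions; SASL/SCRAM credentials would come from a named collection or server config, omitted here):

  -- 1) Kafka engine table: consumes the topic, stores nothing.
  CREATE TABLE chatlogs_kafka
  (
      raw String                              -- whole event as a JSON string
  )
  ENGINE = Kafka
  SETTINGS kafka_broker_list   = 'p4-kafka-kafka-bootstrap:9093',
           kafka_topic_list    = 'tracking.analytics.message.2',
           kafka_group_name    = 'clickhouse-chatlogs',
           kafka_format        = 'JSONAsString',
           kafka_num_consumers = 6;           -- match the 6 partitions

  -- 2) Materialized view: extracts fields and routes into the target table
  --    (assumed event envelope).
  CREATE MATERIALIZED VIEW chatlogs_kafka_mv TO chatlogs AS
  SELECT
      parseDateTime64BestEffort(JSONExtractString(raw, 'time')) AS time,
      JSONExtractString(raw, 'account_id')          AS account_id,
      JSONExtractString(raw, 'desk_id')             AS desk_id,
      JSONExtractString(raw, 'brain_id')            AS brain_id,
      JSONExtractString(raw, 'session_id')          AS session_id,
      JSONExtractString(raw, 'request_id')          AS request_id,
      JSONExtractString(raw, 'user_id')             AS user_id,
      JSONExtractString(raw, 'event')               AS event,
      JSONExtractString(raw, 'channel')             AS channel,
      JSONExtractString(raw, 'message', 'text')     AS message_text,
      CAST(JSONExtractRaw(raw, 'message') AS JSON)  AS message_data
  FROM chatlogs_kafka;

  -- 3) Target table is the MergeTree chatlogs table sketched earlier.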


Cross-DB Query Strategies

Only 2 functions need cross-DB queries: counts_per_desk_fast and counts_per_brain_fast.

1. Application-Level Joins (primary)

Go analytics service queries both DBs, merges in memory. Both subqueries return small aggregated result sets (one row per desk/brain).
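Both sides are small aggregates; roughly (table and column names assumed), the Go service runs one query per DB and merges on the brain/desk id:

  -- PG side: session counts from the existing CAGG (assumed column names):
  SELECT brain_parent_id, SUM(sessions_count) AS sessions
  FROM sessions_brain_daily
  WHERE account_id = $1 AND bucket >= $2 AND bucket < $3
  GROUP BY brain_parent_id;

  -- ClickHouse side: message counts from the daily MV (assumed names):
  SELECT brain_id, countMerge(messages) AS messages
  FROM chatlogs_daily_agg
  WHERE account_id = {account:String}
    AND day >= {from:Date} AND day < {to:Date}
  GROUP BY brain_id;

  -- The Go service merges the two result sets in memory by brain id.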

2. ClickHouse Dictionaries (for enrichment)

HASHED dictionary sourced from PG sessions table, refreshed every 5-10 min. Useful for enriching ClickHouse queries with session metadata without leaving CH.
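Sketched below with an illustrative column subset and connection details; since session_id is a String key, the complex_key_hashed variant of the hashed layout applies:

  -- Sketch of a PG-backed dictionary (assumed columns, db name, and credentials):
  CREATE DICTIONARY sessions_dict
  (
      session_id      String,
      brain_parent_id String,
      start_time      DateTime,
      is_covered      UInt8
  )
  PRIMARY KEY session_id
  SOURCE(POSTGRESQL(
      host 'prod-analytics-db' port 21879
      db 'analytics' table 'sessions'
      user 'dict_reader' password '...'
  ))
  LAYOUT(COMPLEX_KEY_HASHED())            -- hashed layout for a String key
  LIFETIME(MIN 300 MAX 600);              -- refresh every 5-10 minutes

  -- Enrich a ClickHouse query without leaving CH:
  SELECT session_id,
         dictGet('sessions_dict', 'brain_parent_id', tuple(session_id)) AS brain_parent_id
  FROM chatlogs
  LIMIT 10;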

3. Aiven PG Integration (for ad-hoc)

Aiven auto-creates a PostgreSQL engine database in ClickHouse for live remote queries. Suitable for small lookups, not large scans.


Hasura Implications

  • Hasura v2 OSS (confirmed current deployment, dardanos/hasura image)
  • Statement timeout: 120s
  • Auth: webhook to orchestrator, returns X-Hasura-Role + X-Hasura-Allowed-Account-Id for row-level security
  • All analytics SQL functions tracked (return types exposed as tables) with auto-generated GraphQL
  • Orchestrator only calls get_session_content() — all other analytics queries come from tooling frontend or hub services

Recommendation: Keep Hasura on PG for session metadata. Route all ClickHouse-backed queries (GetSessionContent, GetSessions, aggregations) through the Go analytics service gRPC, not Hasura. This means:

  • Frontend (tooling) must switch from Hasura GraphQL to Go gRPC for these queries
  • Hub metering/analytics-emails must switch from Hasura to Go gRPC
  • Orchestrator's get_session_content() must route through Go service instead of Hasura

Performance Expectations

Query type                   | TimescaleDB (current)            | ClickHouse (expected)
Daily aggregation (from MV)  | 50-200ms                         | 5-30ms
Complex aggregation (raw)    | 2+ seconds                       | ~280ms
GetSessionContent (per call) | ~1s (correlated subqueries)      | ~100-200ms (JOINs, columnar, bloom filter)
GetSessions (session list)   | ~19s (CAGG + JOINs + HAVING)     | ~1-3s (MV + JOINs, columnar)
RatingsCounts                | ~56s (raw scan, PERCENTILE_CONT) | ~1-5s (quantile(), columnar)
Funnel analysis              | Complex CTEs, seconds            | Native windowFunnel(), sub-second
Full-text search             | pg_trgm, slow                    | full_text index, fast
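For instance, a funnel over the existing event types could be expressed directly (window, steps, and time range chosen for illustration):

  -- Illustrative funnel: received → brain_send → session:closed within 1 hour, per user.
  SELECT level, count() AS users
  FROM
  (
      SELECT user_id,
             windowFunnel(3600)(
                 toDateTime(time),
                 event = 'message:received',
                 event = 'message:brain_send',
                 event = 'session:closed'
             ) AS level
      FROM chatlogs
      WHERE account_id = {account:String}
        AND time >= now() - INTERVAL 7 DAY
      GROUP BY user_id
  )
  GROUP BY level
  ORDER BY level;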

Estimated Total DB Time Reduction

Operation              | Current Total (s) | Expected After Split
GetSessionContent      | 155,461           | ~15,000-30,000 (5-10x faster per call)
GetSessions            | 56,675            | ~3,000-9,000 (6-20x faster with MV)
RatingsCounts          | 18,049            | ~500-1,500 (10-30x faster)
Remaining aggregations | ~17,000           | ~1,000-3,000
Total                  | ~247,000          | ~20,000-44,000
PG CPU load            | 100%              | ~2% (only sessions CAGG for 2 cross-DB functions)

What This Eliminates

  • insert_session() trigger (replaced by callable upsert_session() — same logic, no chatlog INSERT needed)
  • sessions_hourly CAGG (replaced by ClickHouse MV)
  • _fast / non-_fast function duplication (6+ function pairs eliminated)
  • 189-day CAGG window limitation (ClickHouse MVs cover full retention)
  • Most of the current disk footprint (~10-20x reduction from columnar compression)
  • CPU contention between reads and writes on single PG instance
  • Correlated subqueries in GetSessionContent (replaced by ClickHouse JOINs)

Unresolved Questions

  • Current chatlogs volume? (rows/day, GB/day) — needed to size ClickHouse
  • Message JSONB schema — which fields are stable/frequent? (determines extracted columns)
  • logs_sessions_preview_v2 still called or dead code?
  • Non-_fast count functions still called? (RequestsCounts times out at 120s — remove?)
  • Insights pipeline: does InsightsPool do processing before DB write, or pass-through? If processing, CH Kafka engine may not suffice for that topic
  • conversation:reopened event appearing in logs but handled as "invalid type" — new event type to support?
  • Team ClickHouse experience?
  • Frontend (tooling) currently queries Hasura for GetSessions — rerouting to Go gRPC is a frontend change. Scope?
  • Hub metering/analytics-emails also query Hasura — reroute to Go gRPC or keep Hasura for session-only queries?
  • upsert_session(): Go consumer already parses Kafka headers (ReqID, AccountID, DeskID, SessionID, BrainParentID, IsTest, Channel, etc.) — does it also parse session_start_time, collection_response_code, is_covered, is_contained from message body?
  • Migration cutover: parallel (trigger + new consumer) during transition, or hard switch?
  • Historical data backfill: Kafka has only 12h retention. Must export from TimescaleDB directly — what's the export strategy?

Sources