Analytics DB: Dual-DB Architecture with ClickHouse

Date: 2026-02-19

Problem

TimescaleDB hits 100% CPU on read queries that combine session data with message data. All workloads — high-volume message writes, trigger-based session upserts, CAGG refreshes, dashboard aggregation queries, session detail lookups — compete for CPU/IO on a single PG instance.

Top operations by total DB time:

#  | Operation                   | Avg (s) | Calls   | Total Time (s) | Source
1  | GetSessionContent           | 1.0     | 152,033 | 155,461        | raw chatlogs + correlated subqueries to reactions, corrections, insights
2  | GetSessions                 | 19.0    | 2,985   | 56,675         | sessions_hourly CAGG + LEFT JOIN reactions + LEFT JOIN insights
3  | RatingsCounts               | 56.2    | 321     | 18,049         | raw chatlogs scan, COUNT DISTINCT, PERCENTILE_CONT
4  | SessionsMeteringFast        | 2.3     | 2,054   | 4,720          | sessions CAGGs
5  | GetNormalizedRatingFeedback | 21.8    | 205     | 4,469          | raw chatlogs
6  | LogsSessionsIntervalV2      | 13.9    | 207     | 2,872          | chatlogs CAGG
7  | GetMonthlyActiveUsers       | 4.6     | 534     | 2,437          | monthly_active_users_1m CAGG
8  | CountsPerBrainFast          | 95.4    | 25      | 2,385          | cross-join: sessions CAGG + chatlogs CAGG

Additional slow queries (low volume, extreme latency):

#  | Operation            | Avg (s)         | Calls | Total Time (s)
9  | RequestsCountsFast   | 120.0 (timeout) | 2     | 240
10 | RequestsCounts       | 120.0 (timeout) | 2     | 240
11 | GetSessionsCount     | 82.2            | 3     | 247
12 | GetAgentsPerformance | 66.0            | 6     | 396
13 | BrainsPerformance    | 64.5            | 4     | 258

Why GetSessionContent Dominates (60% of total DB time)

Despite a modest 1s average, it's called 152K times. Per call it:

  1. Scans raw chatlogs hypertable (~45 columns) for all messages in a session
  2. For every message:brain_send row, executes 2 correlated subqueries: one to reactions, one to response_corrections
  3. Builds 30+ field JSONB objects per message via ARRAY_AGG(JSONB_BUILD_OBJECT(...))
  4. Deduplicates tag and rule arrays via ARRAY(SELECT DISTINCT UNNEST(ARRAY_CAT_AGG(...)))
  5. LEFT JOINs insights table

A session with 100 messages and 50 brain_sends = 100 JSONB constructions + 100 index lookups against other hypertables. Caching is not viable: users browse historical sessions going back months with low repeat rate.
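For context, the per-message shape of the current PG query is roughly the following — a heavily simplified sketch with assumed column names, not the actual get_session_content() body:

  -- Simplified sketch of the correlated-subquery pattern (column names assumed):
  SELECT ARRAY_AGG(
           JSONB_BUILD_OBJECT(
             'request_id', c.request_id,
             'event',      c.event,
             'text',       c.message_text,
             -- one correlated lookup per brain_send row:
             'reaction',   (SELECT r.type
                              FROM reactions r
                             WHERE r.request_id = c.request_id
                             LIMIT 1),
             'correction', (SELECT rc.body
                              FROM response_corrections rc
                             WHERE rc.request_id = c.request_id
                             LIMIT 1)
           ) ORDER BY c.time)
    FROM chatlogs c
   WHERE c.session_id = $1;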

Why GetSessions Is Second (23% of total DB time)

Reads from the sessions_hourly CAGG (derived from chatlogs), then LEFT JOINs reactions and insights. The HAVING clause applies 15+ filter conditions, including JSONB array unnesting, text LIKE searches, and insight filter evaluation. 19s average across ~3K invocations.


Current Architecture

Data Flow

                                  Kafka (Strimzi, KRaft v3.8.0)
                                  3 brokers, SCRAM-SHA-512 auth
                                  12h retention, gzip compression
                                           │
                    ┌──────────────────────┬┴──────────────────────┐
                    │                      │                       │
        tracking.analytics.       tracking.insights.1     tracking.analytics.
           message.2                 (6 partitions)       message.failed.1
          (6 partitions)                   │                (6 partitions)
                    │                      │                       │
                    ▼                      ▼                       ▼
              ┌─────────────────── Go Analytics Service ───────────────────┐
              │  Consumer groups:                                          │
              │    analytics.group.4        → MessagePool                  │
              │    analytics.insights.group.1 → InsightsPool               │
              │    analytics.failed.group.1 → MessageFailedPool            │
              │                                                            │
              │  pgx pool (analyticsUser) → INSERT chatlogs                │
              │                            → insert_session() trigger      │
              │                               fires → UPSERT sessions     │
              └────────────────────────────────────────────────────────────┘
                                           │
                                           ▼
                                    TimescaleDB (PG17)
                                   Timescale Cloud (Aiven)
                                           │
                              ┌────────────┴────────────┐
                              ▼                          ▼
                      Hasura v2 (OSS)          Go service (gRPC)
                      GraphQL API               query functions
                              │                          │
              ┌───────────────┼───────────────┐          │
              ▼               ▼               ▼          ▼
        Orchestrator    Tooling (React)    Hub services   Tooling
        (JWT auth,      (dashboard         (metering,     (session
        only calls       queries via        weekly         detail)
        get_session_     Hasura)            email)
        content)

Database

  • TimescaleDB (PG17 + TimescaleDB 2.22/2.23) on Timescale Cloud (Aiven infra)
  • AWS Prod: prod-analytics-moveo-0dff.a.timescaledb.io:21879
  • GCP Prod: internal CNPG (CloudNativePG, 3 instances, PG17 + TimescaleDB extension)
  • Dev access via Tailscale VPN (prod-analytics-db:21879)

Who Connects

Service                        | Method                                            | Auth / Functions
analytics (Go)                 | Direct pgx pool (3 max connections)               | analyticsUser (sealed secret)
datawarehouse (Go)             | Direct pgx pool, same binary, different mode      | separate DB credentials
hasura (v2 OSS)                | PostgreSQL URL                                    | adminUser (sealed secret)
orchestrator (Node.js)         | Hasura GraphQL                                    | JWT (HS256, 5min expiry) — only get_session_content()
tooling (React)                | Hasura GraphQL                                    | JWT via orchestrator webhook
hub/metering (Node.js)         | Hasura GraphQL                                    | sessions_metering_fast(), get_meaningful_sessions_count_by_hour_range()
hub/analytics-emails (Node.js) | Hasura GraphQL                                    | sessions_counts_fast(), ratings_counts(), agents_performance()

Kafka Topology

Topic                               | Partitions | Consumer Group             | Pool
tracking.analytics.message.2        | 6          | analytics.group.4          | MessagePool
tracking.insights.1                 | 6          | analytics.insights.group.1 | InsightsPool
tracking.analytics.message.failed.1 | 6          | analytics.failed.group.1   | MessageFailedPool

Kafka cluster: Strimzi KRaft v3.8.0, 3 brokers, min.insync.replicas=2, SCRAM-SHA-512, 12h retention, gzip compression. Broker address: p4-kafka-kafka-bootstrap:9093.

Schema

Hypertables:

  • chatlogs — primary event log, partitioned on time. ~45 columns. Every message event is a row.
  • sessions — session aggregation, partitioned on start_time. ~23 columns. UNIQUE on (start_time, session_id, brain_parent_id).
  • reactions — user reactions (thumbs up/down), partitioned on time.
  • response_corrections — agent response edits, partitioned on time.
  • insights — AI-generated session insights (CSAT, sentiment, goal), partitioned on time.

Continuous Aggregates (CAGGs):

CAGG                              | Source   | Granularity | Purpose
sessions_account_desk_daily       | sessions | 1 day       | Dashboard counts by account/desk
sessions_brain_daily              | sessions | 1 day       | Dashboard counts by brain
chatlogs_account_desk_brain_daily | chatlogs | 1 day       | Message counts, last_used
ratings_daily                     | chatlogs | 1 day       | Rating aggregates by account/desk/brain/agent
agents_performance_1h             | chatlogs | 1 hour      | Per-agent perf with ROLLUP stats
monthly_active_users_1m           | chatlogs | 1 month     | Unique users/month
sessions_hourly                   | chatlogs | 1 hour      | Main sessions list (GetSessions)
meaningful_sessions_per_hour      | chatlogs | 1 hour      | GCP metering

Event Types (18 tracked)

Message events: message:received, message:brain_send, message:send, message:broadcast_send, message:campaign_send, message:compose, message:delivered, message:read, message:rating, message:correction

Conversation events: conversation:closed, conversation:member_join, conversation:member_leave

Session events: session:closed, session:rating, session:expired, context:updated

The insert_session() trigger fires only on 6 of these: message:received, message:brain_send, message:send, session:rating, message:broadcast_send, message:campaign_send.
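For orientation, the trigger binding is roughly the following sketch (trigger name assumed; the real definition lives in the DB migrations):

  -- Sketch of the existing trigger wiring (assumed naming):
  CREATE TRIGGER chatlogs_insert_session
    AFTER INSERT ON chatlogs
    FOR EACH ROW
    WHEN (NEW.event IN ('message:received', 'message:brain_send', 'message:send',
                        'session:rating', 'message:broadcast_send', 'message:campaign_send'))
    EXECUTE FUNCTION insert_session();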

Architectural Pain Points

  1. CPU contention — writes (ingestion + triggers + CAGG refresh) compete with reads (dashboard queries + session detail) on one PG instance
  2. Triple redundancy — same session data in 3 places: raw chatlogs, sessions table (trigger-maintained), sessions_hourly CAGG
  3. insert_session() trigger — every chatlog INSERT fires a 14-column CASE UPSERT into sessions (row lock contention, read-before-write, array dedup per row)
  4. 189-day CAGG window — CAGGs only cover ~6 months (some functions allow 369 days). Beyond window = raw table scans
  5. _fast suffix pattern — 6+ functions duplicated: requests_counts_fast (CAGG) vs requests_counts (raw), counts_per_brain_fast, counts_per_desk_fast, log_requests_interval_counts_v2_fast, log_sessions_interval_counts_v2_fast, sessions_metering_fast
  6. Session uniqueness is messy — one session = multiple rows if it spans brains. sessions_hourly collapses differently. Counts disagree.

Query Pattern Summary

Category                | Source                                             | Functions
Chatlogs-only           | Raw chatlogs or chatlogs CAGGs                     | ~20 functions (RatingsCounts, GetSessionsCount, etc.)
Sessions-only           | Sessions CAGGs                                     | ~5 _fast functions (SessionsCountsFast, SessionsMeteringFast, etc.)
Cross-table (CAGG join) | sessions CAGG + chatlogs CAGG                      | CountsPerBrainFast, CountsPerDeskFast
Session list            | sessions_hourly CAGG + reactions + insights        | GetSessions (served via Hasura to frontend)
Session detail          | Raw chatlogs + reactions + corrections + insights  | GetSessionContent (called by orchestrator + frontend via Hasura)

Proposed Architecture

Guiding Principle

Move all event/message data and its associated tables to ClickHouse. Keep only session metadata in PG for Hasura.

                                    Kafka (Strimzi)
                                         │
                    ┌────────────────────┬┴───────────────────────┐
                    │                    │                         │
                    ▼                    ▼                         ▼
         tracking.analytics.    tracking.insights.1     tracking.analytics.
            message.2                                   message.failed.1
                    │                    │                         │
         ┌──────── ┼ ────────┐          │                         │
         ▼                   ▼          ▼                         ▼
   ClickHouse (Aiven)    Go Analytics Service              (same Go service)
   Kafka Engine tables    (modified consumer)
   ─────────────────      ──────────────────
   chatlogs              Only upsert_session()
   reactions             for 6 event types
   corrections           → PG sessions table
   insights              (no chatlog INSERTs)
         │
         ├── MVs: daily aggs, ratings, agent perf, MAU, sessions_hourly
         ├── Bloom filters on session_id, user_id, request_id
         └── Dictionaries: pull session metadata from PG
                    │
                    ▼
          Go analytics service
          queries ClickHouse for:
          - GetSessionContent (single CH query)
          - GetSessions (CH MV + reactions + insights)
          - All aggregation queries
                    │
          PG retains only:
          - sessions table (for Hasura + CountsPerBrain/Desk)
          - Hasura GraphQL (session metadata)

Why This Split

ClickHouse gets: chatlogs (high-volume, append-only), reactions, corrections, insights — all append-only with the same access pattern. This means:

  • GetSessionContent (60% of DB time) becomes a single ClickHouse query — no cross-DB joins, no correlated subqueries
  • GetSessions (23% of DB time) becomes a single ClickHouse query against a sessions_hourly MV with reactions/insights JOINed in CH
  • All aggregation queries (RatingsCounts, MAU, AgentsPerformance, etc.) are pure ClickHouse

PG keeps: sessions table (relational, needs UPSERT updates, serves Hasura). PG's workload drops to:

  • Session metadata upserts (from Go service, no more trigger)
  • CountsPerBrain/DeskFast session-side subquery (lightweight CAGG scan)
  • Hasura GraphQL for session metadata

Only 2 functions need cross-DB joins (CountsPerBrainFast, CountsPerDeskFast) — these join session counts from PG CAGGs with message counts from ClickHouse MVs. Handled as app-level joins in Go (both subqueries return small aggregated result sets).

Impact by Query

Operation                   | Total Time (s) | Current                                            | After Split
GetSessionContent           | 155,461        | PG: chatlogs + reactions + corrections + insights  | CH only: all tables in one engine
GetSessions                 | 56,675         | PG: sessions_hourly CAGG + reactions + insights    | CH only: MV + reactions + insights
RatingsCounts               | 18,049         | PG: raw chatlogs scan                              | CH only: MV or raw scan
SessionsMeteringFast        | 4,720          | PG: sessions CAGGs                                 | CH only: MV
GetNormalizedRatingFeedback | 4,469          | PG: raw chatlogs                                   | CH only
LogsSessionsIntervalV2      | 2,872          | PG: chatlogs CAGG                                  | CH only: MV
GetMonthlyActiveUsers       | 2,437          | PG: chatlogs CAGG                                  | CH only: MV
CountsPerBrainFast          | 2,385          | PG: sessions CAGG + chatlogs CAGG                  | Split: PG sessions + CH chatlogs, app join

~98% of total DB time moves to ClickHouse. PG retains only the CountsPerBrain/DeskFast session-side scans (~2% of load).
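To make the biggest win concrete: GetSessionContent collapses from a correlated-subquery loop into a single ClickHouse statement of roughly this shape (column names assumed; the real query would be built by the Go analytics service):

  -- Hypothetical single-statement shape for GetSessionContent in ClickHouse:
  SELECT c.request_id,
         c.event,
         c.message_text,
         r.type  AS reaction,
         rc.body AS correction
  FROM chatlogs AS c
  LEFT JOIN reactions            AS r  ON r.request_id  = c.request_id
  LEFT JOIN response_corrections AS rc ON rc.request_id = c.request_id
  WHERE c.account_id = {account:String}
    AND c.session_id = {session:String}    -- bloom filter on session_id prunes granules
  ORDER BY c.time;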


Ingestion Strategy: Kafka Fan-out (Not CDC)

Kafka writes to both DBs via independent consumer groups. CDC with Debezium was evaluated and rejected:

Dimension            | CDC (Debezium)                                                     | Kafka Fan-out (chosen)
Latency              | +20-500ms (WAL → Debezium → Kafka → CH)                            | Near-zero. Both consumers read same topics
Ops overhead         | Kafka Connect cluster, replication slot monitoring, WAL retention  | Additional consumer group per topic. Already manage Kafka consumers
Failure blast radius | Debezium falling behind → WAL grows → PG disk fills                | Consumer lag = Kafka offset. No impact on source DB
Schema coupling      | CH schema derived from PG physical schema                          | Each DB independently interprets Kafka events
Throughput           | PG connector single-task: ~7-15K events/sec                        | No bottleneck. Both consumers scale independently

Additional Debezium/TimescaleDB limitations:

  • Must use FOR ALL TABLES publications (dynamic chunk tables)
  • Compressed chunks invisible to CDC
  • No ClickHouse source connector (ClickHouse has no WAL)

Fan-out Topology

Kafka Topics                        Consumer Groups
─────────────                       ────────────────
tracking.analytics.message.2   ──→  analytics.group.4 (existing Go, modified)
                                      └→ upsert_session() for 6 event types → PG sessions
                               ──→  clickhouse-chatlogs (CH Kafka engine)
                                      └→ chatlogs + reactions + corrections tables in CH

tracking.insights.1            ──→  analytics.insights.group.1 (existing Go, kept)
                                      └→ insights still need special processing?
                               ──→  clickhouse-insights (CH Kafka engine)
                                      └→ insights table in CH

tracking.analytics.             ──→  analytics.failed.group.1 (existing Go, kept)
  message.failed.1                    └→ error tracking, retries → PG

Critical constraint: Kafka retention is only 12 hours. No replay-based backfill. Historical data migration must come from TimescaleDB directly.

Session Creation Without Chatlogs in PG

The current insert_session() trigger fires AFTER INSERT on chatlogs. If chatlogs stop going to PG, sessions table stays empty.

Current:  Kafka → PG chatlogs INSERT → trigger fires → sessions UPSERT
After:    Kafka → ClickHouse (chatlogs)
                → Go service → upsert_session() → PG sessions (no chatlog INSERT)

Solution: Replace the trigger with a callable upsert_session() function. The Go consumer reads all Kafka messages but only writes session UPSERTs — no chatlog INSERTs.

The function preserves identical semantics: sticky booleans (is_covered once false stays false), conditional is_meaningful calculation referencing current row state, tag array dedup via SELECT DISTINCT UNNEST, and atomic read-before-write via INSERT ... ON CONFLICT DO UPDATE. The UPSERT logic must stay in PG (not Go) because it requires atomicity guarantees that would otherwise require explicit row locking.
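A minimal sketch of the callable replacement, assuming a small subset of the real columns (the actual function carries the trigger's full 14-column CASE logic, including the is_meaningful calculation omitted here):

  -- Minimal sketch (assumed column subset; real function mirrors the trigger's CASE logic):
  CREATE OR REPLACE FUNCTION upsert_session(
    p_start_time      timestamptz,
    p_session_id      text,
    p_brain_parent_id uuid,
    p_is_covered      boolean,
    p_tags            text[]
  ) RETURNS void LANGUAGE sql AS $$
    INSERT INTO sessions (start_time, session_id, brain_parent_id, is_covered, tags)
    VALUES (p_start_time, p_session_id, p_brain_parent_id, p_is_covered, p_tags)
    ON CONFLICT (start_time, session_id, brain_parent_id) DO UPDATE
    SET
      -- sticky boolean: once false, stays false
      is_covered = sessions.is_covered AND EXCLUDED.is_covered,
      -- dedup tag arrays
      tags = ARRAY(SELECT DISTINCT UNNEST(sessions.tags || EXCLUDED.tags));
  $$;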

Consistency Between PG and ClickHouse

Both consumer groups read independently. Small window where one is ahead:

Scenario                                               | Effect                                            | Acceptable?
PG ahead — session exists, CH messages still ingesting | Session listed; content returns partial messages  | Yes — messages appear within ms, LEFT JOINs handle NULLs
CH ahead — messages in CH, session row not yet in PG   | Session not listed yet                            | Yes — session appears within ms, no user-visible impact

Typical inter-consumer-group lag: <100ms. Acceptable for analytics.

Fallback if stronger guarantees are needed: merge into a single consumer group that writes to PG (sync) and then to CH (async/buffered). Tradeoff: ingestion becomes coupled — a ClickHouse outage now affects the shared consumer.


ClickHouse Schema Design Principles

Main Table: chatlogs

  • Engine: MergeTree (auto-remapped to ReplicatedMergeTree on Aiven)
  • Partition: Monthly (toYYYYMM(time))
  • ORDER BY: (account_id, desk_id, brain_id, toStartOfHour(time), session_id, event)
  • PRIMARY KEY: (account_id, desk_id, brain_id, toStartOfHour(time)) — sparse index only
  • TTL: 2 years

ORDER BY rationale:

  • account_id first — lowest cardinality, every query filters by it
  • desk_id, brain_id — aggregations roll up by these
  • toStartOfHour(time) — bucketed for sparse index efficiency
  • session_id in ORDER BY (not PRIMARY KEY) — helps compression + GetSessionContent lookups
  • event last — groups same event types for compression

Secondary indexes: Bloom filters on session_id, user_id, request_id. Full-text index on message_text.
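Putting the above together, the DDL would look roughly like this (abridged column list, assumed names and types; the full-text index on message_text is omitted from the sketch):

  -- Sketch only: abridged columns, assumed names/types.
  CREATE TABLE chatlogs
  (
      time         DateTime64(3),
      account_id   LowCardinality(String),
      desk_id      LowCardinality(String),
      brain_id     LowCardinality(String),
      session_id   String,
      request_id   String,
      user_id      String,
      event        LowCardinality(String),
      channel      LowCardinality(String),
      message_text String CODEC(ZSTD(3)),
      message_data JSON(max_dynamic_paths = 256),

      INDEX idx_session_id session_id TYPE bloom_filter GRANULARITY 4,
      INDEX idx_user_id    user_id    TYPE bloom_filter GRANULARITY 4,
      INDEX idx_request_id request_id TYPE bloom_filter GRANULARITY 4
  )
  ENGINE = MergeTree                      -- Aiven remaps to ReplicatedMergeTree
  PARTITION BY toYYYYMM(time)
  PRIMARY KEY (account_id, desk_id, brain_id, toStartOfHour(time))
  ORDER BY (account_id, desk_id, brain_id, toStartOfHour(time), session_id, event)
  TTL toDateTime(time) + INTERVAL 2 YEAR;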

JSONB message Handling (Three-Tier)

Tier              | What                                          | How
Extracted columns | message_text, message_type, message_sender    | Dedicated typed columns. Best compression + query speed
Native JSON       | message_data with JSON(max_dynamic_paths=256) | Semi-structured remainder. Lazy parsing
Overflow          | Rare/long-tail keys                           | Auto-stored in shared format. Still queryable
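With the native JSON type, paths read as subcolumns; a hypothetical example (path names assumed):

  -- Hypothetical path access on the JSON column (paths are illustrative):
  SELECT message_data.flow_id        AS flow_id,
         message_data.nlp.confidence AS confidence
  FROM chatlogs
  WHERE account_id = {account:String}
    AND event = 'message:brain_send'
  LIMIT 10;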

Supporting Tables

  • reactions — ORDER BY (account_id, session_id, time), bloom on session_id + request_id
  • response_corrections — ORDER BY (account_id, request_id, time), bloom on request_id
  • insights — ORDER BY (account_id, session_id, time), bloom on session_id

All use MergeTree, monthly partitions, 2-year TTL.

Materialized Views (Replacing CAGGs)

ClickHouse MVs fire on INSERT (truly real-time) vs TimescaleDB CAGGs (periodic refresh, 1-15 min lag). No 189-day window limitation.

Current CAGG                      | CH Replacement      | Engine               | Key Difference
chatlogs_account_desk_brain_daily | chatlogs_daily_agg  | AggregatingMergeTree | Real-time, uses -State/-Merge combinators
ratings_daily                     | ratings_daily_agg   | AggregatingMergeTree | Filters on rating events
agents_performance_1h             | agents_perf_hourly  | AggregatingMergeTree | Hourly agent metrics
monthly_active_users_1m           | mau_monthly         | AggregatingMergeTree | uniqState for approximate distinct
sessions_hourly                   | sessions_hourly_agg | AggregatingMergeTree | Session-level aggregation with bit operations

All MVs use AggregatingMergeTree with *State() aggregate functions. Queries use *Merge() combinators. This enables incremental aggregation on INSERT without full recomputation.
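The pattern for one of them, e.g. the daily message counts, sketched with assumed names and measures:

  -- Sketch of the MV pattern (target table + MV + read query); names/measures assumed.
  CREATE TABLE chatlogs_daily_agg
  (
      day        Date,
      account_id LowCardinality(String),
      desk_id    LowCardinality(String),
      brain_id   LowCardinality(String),
      messages   AggregateFunction(count),
      users      AggregateFunction(uniq, String),
      last_used  AggregateFunction(max, DateTime64(3))
  )
  ENGINE = AggregatingMergeTree
  PARTITION BY toYYYYMM(day)
  ORDER BY (account_id, desk_id, brain_id, day);

  CREATE MATERIALIZED VIEW chatlogs_daily_agg_mv TO chatlogs_daily_agg AS
  SELECT
      toDate(time)       AS day,
      account_id, desk_id, brain_id,
      countState()       AS messages,
      uniqState(user_id) AS users,
      maxState(time)     AS last_used
  FROM chatlogs
  GROUP BY day, account_id, desk_id, brain_id;

  -- Reads use the -Merge combinators:
  SELECT day, countMerge(messages) AS messages, uniqMerge(users) AS users
  FROM chatlogs_daily_agg
  WHERE account_id = {account:String}
  GROUP BY day
  ORDER BY day;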

Compression Expectations

Column type                                         | Expected ratio
LowCardinality(String) (event, channel, account_id) | 50-100x
String (session_id, request_id)                     | 5-15x
message_text (ZSTD)                                 | 3-8x
Numerics (Delta+LZ4)                                | 10-20x
Overall vs TimescaleDB                              | ~10-20x less disk

Kafka Ingestion (Three-Table Pattern)

Each topic uses ClickHouse's Kafka Engine:

  1. Kafka table — reads from topic, no storage
  2. Materialized view — transforms + routes (e.g., JSON extraction for message fields)
  3. Target MergeTree table — final storage

Kafka engine settings must match partition counts (6 consumers, not 4) and use SCRAM-SHA-512 auth. At-least-once semantics (acceptable for analytics).
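Sketch of the three objects for the message topic (event envelope, field names, and settings are assumptions; SASL/SCRAM credentials would come from a named collection or server config, omitted here):

  -- 1) Kafka engine table: consumes the topic, stores nothing.
  CREATE TABLE chatlogs_kafka
  (
      raw String                              -- whole event as a JSON string
  )
  ENGINE = Kafka
  SETTINGS kafka_broker_list   = 'p4-kafka-kafka-bootstrap:9093',
           kafka_topic_list    = 'tracking.analytics.message.2',
           kafka_group_name    = 'clickhouse-chatlogs',
           kafka_format        = 'JSONAsString',
           kafka_num_consumers = 6;           -- match the 6 partitions

  -- 2) Materialized view: extracts fields and routes into the target table
  --    (assumed event envelope).
  CREATE MATERIALIZED VIEW chatlogs_kafka_mv TO chatlogs AS
  SELECT
      parseDateTime64BestEffort(JSONExtractString(raw, 'time')) AS time,
      JSONExtractString(raw, 'account_id')          AS account_id,
      JSONExtractString(raw, 'desk_id')             AS desk_id,
      JSONExtractString(raw, 'brain_id')            AS brain_id,
      JSONExtractString(raw, 'session_id')          AS session_id,
      JSONExtractString(raw, 'request_id')          AS request_id,
      JSONExtractString(raw, 'user_id')             AS user_id,
      JSONExtractString(raw, 'event')               AS event,
      JSONExtractString(raw, 'channel')             AS channel,
      JSONExtractString(raw, 'message', 'text')     AS message_text,
      CAST(JSONExtractRaw(raw, 'message') AS JSON)  AS message_data
  FROM chatlogs_kafka;

  -- 3) Target table is the MergeTree chatlogs table sketched earlier.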


Cross-DB Query Strategies

Only 2 functions need cross-DB queries: counts_per_desk_fast and counts_per_brain_fast.

1. Application-Level Joins (primary)

Go analytics service queries both DBs, merges in memory. Both subqueries return small aggregated result sets (one row per desk/brain).
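Both sides are small aggregates; roughly (table and column names assumed), the Go service runs one query per DB and merges on the brain/desk id:

  -- PG side: session counts from the existing CAGG (assumed column names):
  SELECT brain_parent_id, SUM(sessions_count) AS sessions
  FROM sessions_brain_daily
  WHERE account_id = $1 AND bucket >= $2 AND bucket < $3
  GROUP BY brain_parent_id;

  -- ClickHouse side: message counts from the daily MV (assumed names):
  SELECT brain_id, countMerge(messages) AS messages
  FROM chatlogs_daily_agg
  WHERE account_id = {account:String}
    AND day >= {from:Date} AND day < {to:Date}
  GROUP BY brain_id;

  -- The Go service merges the two result sets in memory by brain id.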

2. ClickHouse Dictionaries (for enrichment)

HASHED dictionary sourced from PG sessions table, refreshed every 5-10 min. Useful for enriching ClickHouse queries with session metadata without leaving CH.
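Sketched below with an illustrative column subset and connection details; since session_id is a String key, the complex_key_hashed variant of the hashed layout applies:

  -- Sketch of a PG-backed dictionary (assumed columns, db name, and credentials):
  CREATE DICTIONARY sessions_dict
  (
      session_id      String,
      brain_parent_id String,
      start_time      DateTime,
      is_covered      UInt8
  )
  PRIMARY KEY session_id
  SOURCE(POSTGRESQL(
      host 'prod-analytics-db' port 21879
      db 'analytics' table 'sessions'
      user 'dict_reader' password '...'
  ))
  LAYOUT(COMPLEX_KEY_HASHED())            -- hashed layout for a String key
  LIFETIME(MIN 300 MAX 600);              -- refresh every 5-10 minutes

  -- Enrich a ClickHouse query without leaving CH:
  SELECT session_id,
         dictGet('sessions_dict', 'brain_parent_id', tuple(session_id)) AS brain_parent_id
  FROM chatlogs
  LIMIT 10;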

3. Aiven PG Integration (for ad-hoc)

Aiven auto-creates a PostgreSQL engine database in ClickHouse for live remote queries. Suitable for small lookups, not large scans.


Hasura Implications

  • Hasura v2 OSS (confirmed current deployment, dardanos/hasura image)
  • Statement timeout: 120s
  • Auth: webhook to orchestrator, returns X-Hasura-Role + X-Hasura-Allowed-Account-Id for row-level security
  • All analytics SQL functions tracked (return types exposed as tables) with auto-generated GraphQL
  • Orchestrator only calls get_session_content() — all other analytics queries come from tooling frontend or hub services

Recommendation: Keep Hasura on PG for session metadata. Route all ClickHouse-backed queries (GetSessionContent, GetSessions, aggregations) through the Go analytics service gRPC, not Hasura. This means:

  • Frontend (tooling) must switch from Hasura GraphQL to Go gRPC for these queries
  • Hub metering/analytics-emails must switch from Hasura to Go gRPC
  • Orchestrator's get_session_content() must route through Go service instead of Hasura

Performance Expectations

Query type                   | TimescaleDB (current)            | ClickHouse (expected)
Daily aggregation (from MV)  | 50-200ms                         | 5-30ms
Complex aggregation (raw)    | 2+ seconds                       | ~280ms
GetSessionContent (per call) | ~1s (correlated subqueries)      | ~100-200ms (JOINs, columnar, bloom filter)
GetSessions (session list)   | ~19s (CAGG + JOINs + HAVING)     | ~1-3s (MV + JOINs, columnar)
RatingsCounts                | ~56s (raw scan, PERCENTILE_CONT) | ~1-5s (quantile(), columnar)
Funnel analysis              | Complex CTEs, seconds            | Native windowFunnel(), sub-second
Full-text search             | pg_trgm, slow                    | full_text index, fast
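For instance, a funnel over the existing event types could be expressed directly (window, steps, and time range chosen for illustration):

  -- Illustrative funnel: received → brain_send → session:closed within 1 hour, per user.
  SELECT level, count() AS users
  FROM
  (
      SELECT user_id,
             windowFunnel(3600)(
                 toDateTime(time),
                 event = 'message:received',
                 event = 'message:brain_send',
                 event = 'session:closed'
             ) AS level
      FROM chatlogs
      WHERE account_id = {account:String}
        AND time >= now() - INTERVAL 7 DAY
      GROUP BY user_id
  )
  GROUP BY level
  ORDER BY level;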

Estimated Total DB Time Reduction

Operation              | Current Total (s) | Expected After Split
GetSessionContent      | 155,461           | ~15,000-30,000 (5-10x faster per call)
GetSessions            | 56,675            | ~3,000-9,000 (6-20x faster with MV)
RatingsCounts          | 18,049            | ~500-1,500 (10-30x faster)
Remaining aggregations | ~17,000           | ~1,000-3,000
Total                  | ~247,000          | ~20,000-44,000
PG CPU load            | 100%              | ~2% (only sessions CAGG for 2 cross-DB functions)

What This Eliminates

  • insert_session() trigger (replaced by callable upsert_session() — same logic, no chatlog INSERT needed)
  • sessions_hourly CAGG (replaced by ClickHouse MV)
  • _fast / non-_fast function duplication (6+ function pairs eliminated)
  • 189-day CAGG window limitation (ClickHouse MVs cover full retention)
  • Most of the current disk footprint (~10-20x reduction from columnar compression)
  • CPU contention between reads and writes on single PG instance
  • Correlated subqueries in GetSessionContent (replaced by ClickHouse JOINs)

Unresolved Questions

  • Current chatlogs volume? (rows/day, GB/day) — needed to size ClickHouse
  • Message JSONB schema — which fields are stable/frequent? (determines extracted columns)
  • logs_sessions_preview_v2 still called or dead code?
  • Non-_fast count functions still called? (RequestsCounts times out at 120s — remove?)
  • Insights pipeline: does InsightsPool do processing before DB write, or pass-through? If processing, CH Kafka engine may not suffice for that topic
  • conversation:reopened event appearing in logs but handled as "invalid type" — new event type to support?
  • Team ClickHouse experience?
  • Frontend (tooling) currently queries Hasura for GetSessions — rerouting to Go gRPC is a frontend change. Scope?
  • Hub metering/analytics-emails also query Hasura — reroute to Go gRPC or keep Hasura for session-only queries?
  • upsert_session(): Go consumer already parses Kafka headers (ReqID, AccountID, DeskID, SessionID, BrainParentID, IsTest, Channel, etc.) — does it also parse session_start_time, collection_response_code, is_covered, is_contained from message body?
  • Migration cutover: parallel (trigger + new consumer) during transition, or hard switch?
  • Historical data backfill: Kafka has only 12h retention. Must export from TimescaleDB directly — what's the export strategy?

Sources