TimescaleDB Continuous Aggregates: What I Got Wrong (and How to Fix It)
I thought continuous aggregates would solve my crypto data performance problems. They did, for about three weeks. Then the wheels came off, and I spent two months untangling the mess I'd made. If you're using TimescaleDB as your time-series database and thinking about continuous aggregates, read this first. I wish someone had warned me.

I was storing OHLCV candlestick data (Open, High, Low, Close, Volume) for crypto pairs in TimescaleDB, a time-series database built on PostgreSQL, with millions of rows per day and queries aggregating across hours and days. Query performance on the raw hypertable was crawling, sometimes 30 seconds for a week's worth of aggregated data.

Continuous aggregates seemed like the obvious fix. In TimescaleDB, a continuous aggregate is a PostgreSQL materialized view that refreshes itself incrementally in the background: pre-compute the aggregations, query the materialized view instead of raw data, and let TimescaleDB handle the rest. Clean, simple, automatic.

It was none of those things.

I created an hourly aggregate because I figured "hourly should be enough for most queries." It wasn't. Users wanted 4-hour candles, 6-hour candles, and 15-minute ones. The coarser requests could at least be rolled up from the hourly buckets; the 15-minute requests were the real problem.

Here's what I didn't understand upfront: you can't query a continuous aggregate at a finer granularity than the bucket you defined. An hourly aggregate gives you hourly data. If you want 15-minute buckets, you're hitting the raw hypertable anyway, which defeats the whole point.

```sql
-- This works fine: same granularity as the aggregate
SELECT time_bucket('1 hour', bucket), AVG(close)
FROM candlesticks_1h
GROUP BY 1;

-- This has to go to the raw table: 15 minutes is finer than what we materialized
SELECT time_bucket('15 minutes', time), AVG(close)
FROM candlesticks_raw
GROUP BY 1;
```

The fix was creating multiple aggregates at different granularities. And that led straight to my next problem.
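For illustration, here's roughly what one of those extra aggregates can look like. This is a sketch, not my exact production DDL: it assumes `candlesticks_raw` has `open`/`high`/`low`/`close`/`volume` columns and a TimescaleDB version (2.7+) where `first()`/`last()` are allowed inside continuous aggregates.

```sql
-- A 15-minute aggregate alongside the hourly one.
-- WITH NO DATA defers the initial backfill to a policy or manual refresh.
CREATE MATERIALIZED VIEW candlesticks_15m
WITH (timescaledb.continuous) AS
SELECT time_bucket('15 minutes', time) AS bucket,
       pair,
       first(open, time) AS open,   -- TimescaleDB's first(): value at earliest time
       MAX(high)         AS high,
       MIN(low)          AS low,
       last(close, time) AS close,  -- last(): value at latest time
       SUM(volume)       AS volume
FROM candlesticks_raw
GROUP BY bucket, pair
WITH NO DATA;
```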
Worth noting: since TimescaleDB v2.9, you can stack continuous aggregates on top of each other. So you could create a 15-minute aggregate, then build an hourly aggregate on top of it, then a daily on top of that. I didn't know this at the time and ended up doing things the hard way.

With three aggregate views (15-minute, hourly, daily), the refresh policies started stepping on each other. Data would appear in one view but not another. Sometimes queries returned different results depending on which view the planner hit.

```sql
-- The configuration that caused headaches
SELECT add_continuous_aggregate_policy('candlesticks_15m',
  start_offset      => INTERVAL '30 minutes',
  end_offset        => INTERVAL '5 minutes',
  schedule_interval => INTERVAL '5 minutes');

SELECT add_continuous_aggregate_policy('candlesticks_1h',
  start_offset      => INTERVAL '2 hours',
  end_offset        => INTERVAL '30 minutes',
  schedule_interval => INTERVAL '15 minutes');
```

The start_offset and end_offset parameters define the refresh window: what range of data gets recomputed each time the job runs. Get these wrong and you either have stale data (offsets too large) or a gap where queries return nothing (the refresh job simply hasn't caught up yet).

I spent three days wondering why the last hour of data was missing. The answer: end_offset on the hourly view was 30 minutes, so refreshes only materialized data up to 30 minutes ago, and the real-time aggregation that would fill that gap was disabled in my config. Once I understood the interplay between the offsets and the schedule interval, it clicked, but the documentation doesn't make this obvious.

This one actually killed performance. Each continuous aggregate creates its own internal materialization hypertable, which means its own set of chunks. With the raw hypertable and the aggregate views all running, I had several independent chunk hierarchies to manage.
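In case it helps anyone in the same hole: the setting that had bitten me is `materialized_only`. Turning it off re-enables real-time aggregation, which unions the materialized data with a live query over the not-yet-materialized tail between end_offset and now. (Note that the default for this setting has changed across TimescaleDB versions, so check yours.)

```sql
-- Re-enable real-time aggregation so queries also see raw data newer
-- than the last materialized point (the end_offset gap).
ALTER MATERIALIZED VIEW candlesticks_1h
  SET (timescaledb.materialized_only = false);
```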
```sql
-- Checking disk usage per table (returns one row per chunk)
SELECT * FROM chunks_detailed_size('candlesticks_raw');
-- 847 rows returned
SELECT * FROM chunks_detailed_size('candlesticks_1h');
-- 892 rows returned
SELECT * FROM chunks_detailed_size('candlesticks_daily');
-- 412 rows returned
```

Note: chunks_detailed_size() returns one row per chunk with size information, so the row count above is effectively your chunk count. When I saw nearly 900 chunks on a single aggregate view, I knew something had gone sideways.

Query performance started degrading rather than improving. The database was spending more time on chunk exclusion (figuring out which chunks were relevant to a query) than actually reading data. Too many small chunks and the metadata overhead overwhelms the actual work.

The fix was TimescaleDB chunk compression: compressing older chunks with a compression policy and adjusting the chunk time interval so chunks weren't being created too frequently. This isn't in the quick-start guide, but it absolutely should be.

Here's the thing nobody talks about much: continuous aggregates are rigid by design. The GROUP BY you define at creation time is the GROUP BY you're stuck with. Adding a new grouping dimension later means dropping the materialized view and rebuilding from scratch, including re-running the backfill.

```sql
-- Original aggregate
CREATE MATERIALIZED VIEW candlesticks_1h
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       pair,
       AVG(close) AS close,
       MAX(high)  AS high,
       MIN(low)   AS low
FROM candlesticks_raw
GROUP BY bucket, pair;
```

A few weeks after shipping this, a stakeholder asked: "Can we group by exchange too?" The answer was either rebuild everything and backfill, or query raw data. Both defeated the purpose. There's no ALTER MATERIALIZED VIEW that lets you change the query definition or add columns. You can tweak settings like materialized_only, but the underlying query is immutable.
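To make the rebuild concrete, here's roughly what saying yes to the stakeholder would have required. A sketch only: the `exchange` column is hypothetical, and creation triggers a full backfill over all existing raw data unless you add WITH NO DATA and refresh incrementally.

```sql
-- The only way to add a grouping dimension: drop and recreate.
DROP MATERIALIZED VIEW candlesticks_1h;

CREATE MATERIALIZED VIEW candlesticks_1h
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       pair,
       exchange,            -- hypothetical new dimension
       AVG(close) AS close,
       MAX(high)  AS high,
       MIN(low)   AS low
FROM candlesticks_raw
GROUP BY bucket, pair, exchange;
```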
That said, the hierarchical aggregate approach I mentioned earlier does offer some flexibility here. If I'd planned things out with stacked aggregates (raw → 15m → 1h → daily), I could have added exchange grouping at one layer without rebuilding everything. Hindsight is 20/20.

I'm not saying continuous aggregates are the wrong tool for crypto data. When the conditions are right, they work well:

- Your query patterns are fixed and known upfront
- You don't need to add grouping dimensions later
- Your refresh windows are carefully tested before going live

But for crypto data specifically, where new trading pairs get added constantly and query requirements change, I'd now approach it differently:

- Start with the raw hypertable and good indexes before jumping to aggregates. This sounds boring, but TimescaleDB's chunk exclusion is fast. You might not need aggregates as much as you think.
- Use hierarchical continuous aggregates (available since TimescaleDB v2.9) to build a chain from fine to coarse granularity: raw → 15m → 1h → daily. This reuses computation at each level, improves query performance at every tier, and leaves you more flexibility if requirements change.
- Plan for chunk compression from day one. Older chunks should be compressed with a TimescaleDB compression policy; set it up early and don't let chunks pile up unmanaged.
- Manually call refresh_continuous_aggregate to validate your aggregate design before setting up automated policies. Debugging a broken policy after the fact is painful.
- Limit the total number of aggregates. More views means more chunks, more refresh jobs, and more things to get out of sync.

The documentation frames continuous aggregates as a "set it and forget it" solution. They're not. They're a commitment to a specific query shape, and that shape is hard to change later. Go in with your eyes open.
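For reference, that approach sketches out to something like this. Names, intervals, and dates are illustrative, not a tested production config; the hierarchical aggregate needs TimescaleDB 2.9+ and assumes a `candlesticks_15m` continuous aggregate already exists.

```sql
-- 1. Stack an hourly aggregate on the 15-minute one instead of on raw data.
CREATE MATERIALIZED VIEW candlesticks_1h
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', bucket) AS bucket,
       pair,
       MAX(high)           AS high,
       MIN(low)            AS low,
       last(close, bucket) AS close
FROM candlesticks_15m
GROUP BY 1, 2;

-- 2. Compress raw chunks once they are a week old, and keep chunk counts sane.
ALTER TABLE candlesticks_raw
  SET (timescaledb.compress, timescaledb.compress_segmentby = 'pair');
SELECT add_compression_policy('candlesticks_raw', INTERVAL '7 days');
SELECT set_chunk_time_interval('candlesticks_raw', INTERVAL '1 day');

-- 3. Validate by refreshing a known window by hand before adding policies.
CALL refresh_continuous_aggregate('candlesticks_15m',
     '2024-01-01'::timestamptz, '2024-01-08'::timestamptz);
```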
