
Rahul J
Jul 01, 2025
Introduction
PostgreSQL’s query planner is one of its greatest strengths, but it can also be a source of mystery and frustration when your queries suddenly slow down and indexes seem to stop working.
Often, the culprit isn’t a missing or broken index, but outdated statistics.
In this post, I’ll walk you through:
- How PostgreSQL’s statistics system works
- How the query planner makes decisions using those stats
- Why and when statistics become outdated
- How to reproduce, detect, and fix this issue yourself with real SQL examples
1. What Are PostgreSQL Statistics?
PostgreSQL tracks detailed statistics for every table and column, including:
- Number of live tuples (
n_live_tup
) - Most common values and their frequencies (
most_common_vals
,most_common_freqs
) - Histograms (distribution buckets for the rest of the data)
- Null fraction, number of distinct values, and more
These statistics are maintained in the system catalogs:
pg_stats
The query planner relies on these to estimate how many rows will match each filter in a query.
2. How the Query Planner Uses Statistics
pg_stats
Understanding The planner’s decision to use a sequential scan, an index scan, or a bitmap index scan is entirely dependent on up-to-date column statistics.
pg_stats
?
What’s inside - most_common_vals: The most frequent values in the column.
- most_common_freqs: The frequency (as a decimal) for each value in
most_common_vals
. - histogram_bounds: Buckets that describe the spread of remaining values.
- null_frac: Fraction of NULL values in the column.
- n_distinct: Number of distinct values seen in the column.
Example
Suppose you have a table with status values, mostly 'active'
, some 'inactive'
:
SELECT attname, most_common_vals, most_common_freqs, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'useractivity';
Sample output:
attname | most_common_vals | most_common_freqs | null_frac | n_distinct |
---|---|---|---|---|
status | {active,inactive} | {0.9527,0.0473} | 0 | 2 |
Interpretation:
- ~95% of rows have
'active'
, ~5% have'inactive'
, and there are 2 possible status values.
How the Planner Uses These Stats to Pick a Scan Type
- If your filter matches very few rows (e.g.,
status = 'inactive'
), the planner prefers an index scan. - If your filter matches most rows (e.g.,
status = 'active'
), the planner prefers a sequential scan.
What if the Stats Are Outdated?
If you change your data—say, update all rows to 'inactive'
, but do not run ANALYZE
, the planner will continue to make its decisions based on old, now-wrong numbers.
This leads to bad query plans and poor performance.
3. When and How Are Statistics Updated?
Postgres does NOT update statistics after every data change.
Stats are refreshed in only two ways:
- When you run
ANALYZE
(orVACUUM ANALYZE
) manually. - When autovacuum’s autoanalyze process triggers—based on thresholds.
How Is the Autoanalyze Threshold Calculated?
Autoanalyze will only run after the following number of rows have been changed (inserted, updated, or deleted):
autovacuumAnalyzeThreshold + (autovacuumAnalyzeScaleFactor × number_of_rows_in_table)
Default values:
autovacuumAnalyzeThreshold = 50
autovacuumAnalyzeScaleFactor = 0.1
(i.e., 10%)
Example Calculation
If your table has 100,000 rows:
0.1 × 100,000 = 10,000
- Add the threshold:
10,000 + 50 = 10,050
So autoanalyze will not run until you have at least 10,051 row changes.
If you only do 9,999 changes, statistics will remain stale until more changes are made or you run ANALYZE
manually.
See These Settings in Your DB
SHOW autovacuum_analyze_threshold;
SHOW autovacuum_analyze_scale_factor;
4. Deep Dive: When Indexes “Fail”
Let’s reproduce the problem from scratch:
Step 1: Create Table and Index
CREATE TABLE useractivity (
id serial PRIMARY KEY,
userid integer NOT NULL,
status varchar(20) NOT NULL
);
CREATE INDEX idxuseractivitystatus ON useractivity(status);
Step 2: Insert Data (skewed)
-- 100,000 inactive users
INSERT INTO useractivity (userid, status)
SELECT generate_series(1, 100000), 'inactive';
-- 1,000 active users
INSERT INTO useractivity (userid, status)
SELECT generate_series(100001, 101000), 'active';
ANALYZE useractivity;
Step 3: See Current Stats
SELECT attname, most_common_vals, most_common_freqs, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'useractivity';
Sample output:
attname | most_common_vals | most_common_freqs | null_frac | n_distinct |
---|---|---|---|---|
status | {inactive,active} | {0.9527,0.0473} | 0 | 2 |
Step 4: Query Plans
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM useractivity WHERE status = 'inactive';
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM useractivity WHERE status = 'active';
You’ll likely see:
'active'
: Index Scan or Bitmap Index Scan (planner expects a small fraction of rows)'inactive'
: Seq Scan (planner expects almost all rows to match)
Step 5: Bulk Data Change Without Analyze
-- Make everyone active
UPDATE useractivity SET status = 'active';
-- Add 10,000 inactive users
INSERT INTO useractivity (userid, status)
SELECT generate_series(105001, 115000), 'inactive';
-- Do NOT run ANALYZE yet!
Step 6: See What Planner Does With Stale Stats
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM useractivity WHERE status = 'inactive';
You may find the planner still does an sequential scan based on the old stats even though now there are less 'inactive'
rows. This means that the query could be much slower than it should be.
Step 7: Refresh Stats and Observe the Change
ANALYZE useractivity;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM useractivity WHERE status = 'inactive';
Now that the statistics are fresh, the planner realizes 'inactive' is actually rare, and it will likely switch to using an Index Scan for this query.
5. How to Fix and Prevent Bad Plans
-
Always run
ANALYZE
after big updates/inserts/deletes. -
Monitor last stats update:
SELECT relname, last_analyze, last_autoanalyze FROM pg_stat_user_tables WHERE relname = 'useractivity';
-
Lower the threshold for critical tables:
ALTER TABLE useractivity SET (autovacuum_analyze_scale_factor = 0.01);
6. Tuning Autovacuum and Analyze
-
See your global settings:
SHOW autovacuum_analyze_scale_factor; SHOW autovacuum_analyze_threshold;
-
Set lower thresholds on a per-table basis with
ALTER TABLE ... SET (...)
.
7. Conclusion
Indexes themselves rarely break in PostgreSQL, what fails is the planner’s ability to recognize when to use them if statistics are outdated. By keeping your table statistics current, you empower PostgreSQL to make optimal decisions, ensuring your queries are fast and your indexes deliver the performance you expect.