Why PostgreSQL Indexes Fail - The Role of Outdated Statistics

Rahul J

Jul 01, 2025

Overview
This post explains why PostgreSQL sometimes ignores indexes when statistics are outdated, causing slow queries even when indexes exist. You’ll learn how the query planner uses stats, how to spot stale statistics, and simple steps to keep your queries fast.

Introduction

PostgreSQL’s query planner is one of its greatest strengths, but it can also be a source of mystery and frustration when your queries suddenly slow down and indexes seem to stop working.
Often, the culprit isn’t a missing or broken index, but outdated statistics.

In this post, I’ll walk you through:

  • How PostgreSQL’s statistics system works
  • How the query planner makes decisions using those stats
  • Why and when statistics become outdated
  • How to reproduce, detect, and fix this issue yourself with real SQL examples

1. What Are PostgreSQL Statistics?

PostgreSQL tracks detailed statistics for every table and column, including:

  • Number of live tuples (n_live_tup)
  • Most common values and their frequencies (most_common_vals, most_common_freqs)
  • Histograms (distribution buckets for the rest of the data)
  • Null fraction, number of distinct values, and more

These statistics are stored in the pg_statistic system catalog and exposed in a readable form through the pg_stats view.

The query planner relies on these to estimate how many rows will match each filter in a query.
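
You can see these estimates directly: the rows= figure in EXPLAIN output is the planner's statistics-based guess, and EXPLAIN ANALYZE adds the actual match count for comparison. A minimal sketch (using the useractivity demo table built in section 4):

-- "rows=" in the plan is the estimate derived from statistics;
-- "actual ... rows=" is how many rows really matched.
EXPLAIN ANALYZE
SELECT * FROM useractivity WHERE status = 'inactive';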

2. How the Query Planner Uses Statistics

Understanding pg_stats

Whether the planner chooses a sequential scan, an index scan, or a bitmap index scan comes down to the row-count estimates it derives from these column statistics, so stale statistics lead directly to bad choices.

What’s inside pg_stats?

  • most_common_vals: The most frequent values in the column.
  • most_common_freqs: The frequency (as a decimal) for each value in most_common_vals.
  • histogram_bounds: Buckets that describe the spread of remaining values.
  • null_frac: Fraction of NULL values in the column.
  • n_distinct: Number of distinct values seen in the column.

Example

Suppose you have a table with status values, mostly 'active', some 'inactive':

SELECT attname, most_common_vals, most_common_freqs, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'useractivity';

Sample output:

 attname | most_common_vals  | most_common_freqs | null_frac | n_distinct
---------+-------------------+-------------------+-----------+------------
 status  | {active,inactive} | {0.9527,0.0473}   |         0 |          2

Interpretation:

  • ~95% of rows have 'active', ~5% have 'inactive', and there are 2 possible status values.

How the Planner Uses These Stats to Pick a Scan Type

  • If your filter matches very few rows (e.g., status = 'inactive'), the planner prefers an index scan.
  • If your filter matches most rows (e.g., status = 'active'), the planner prefers a sequential scan.
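
The arithmetic behind this is simple: estimated rows ≈ selectivity × total row count, where the selectivity for a common value comes straight from most_common_freqs. A rough sketch using the sample stats above (assuming a table of about 101,000 rows, as in the demo in section 4):

-- Planner's rough estimate for status = 'inactive':
-- selectivity 0.0473 is taken from most_common_freqs above.
SELECT 0.0473 * 101000 AS estimated_inactive_rows;  -- ≈ 4777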

What if the Stats Are Outdated?

If you change your data (say, update every row to 'inactive') but do not run ANALYZE, the planner will continue to make its decisions based on the old, now-wrong numbers.
This leads to bad query plans and poor performance.

3. When and How Are Statistics Updated?

Postgres does NOT update statistics after every data change.
Stats are refreshed in only two ways:

  • When you run ANALYZE (or VACUUM ANALYZE) manually (see the examples just below).
  • When autovacuum's autoanalyze process triggers, once enough rows have changed to cross a threshold.
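
A manual refresh is a one-liner, and you can restrict it to specific columns to keep it cheap on wide tables:

-- Refresh statistics for the whole table
ANALYZE useractivity;

-- Or refresh a single column's statistics only
ANALYZE useractivity (status);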

How Is the Autoanalyze Threshold Calculated?

Autoanalyze will only run after the following number of rows have been changed (inserted, updated, or deleted):

autovacuum_analyze_threshold + (autovacuum_analyze_scale_factor × number_of_rows_in_table)

Default values:

  • autovacuum_analyze_threshold = 50
  • autovacuum_analyze_scale_factor = 0.1 (i.e., 10%)

Example Calculation

If your table has 100,000 rows:

  • 0.1 × 100,000 = 10,000
  • Add the threshold: 10,000 + 50 = 10,050

So autoanalyze will not run until more than 10,050 rows have changed, i.e., at least 10,051 row changes.
If you make only 10,050 changes, statistics will remain stale until more changes accumulate or you run ANALYZE manually.
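
You can check how close a table is to its trigger point. A sketch assuming the default threshold and scale factor (n_mod_since_analyze counts rows modified since the last analyze):

-- Changes accumulated since the last (auto)analyze vs. the trigger point,
-- assuming the defaults: threshold = 50, scale factor = 0.1
SELECT s.relname,
       s.n_mod_since_analyze,
       50 + 0.1 * c.reltuples AS autoanalyze_trigger_point
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.relname = 'useractivity';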

See These Settings in Your DB

SHOW autovacuum_analyze_threshold;
SHOW autovacuum_analyze_scale_factor;

4. Deep Dive: When Indexes “Fail”

Let’s reproduce the problem from scratch:

Step 1: Create Table and Index

CREATE TABLE useractivity (
    id serial PRIMARY KEY,
    userid integer NOT NULL,
    status varchar(20) NOT NULL
);

CREATE INDEX idxuseractivitystatus ON useractivity(status);

Step 2: Insert Data (skewed)

-- 100,000 inactive users
INSERT INTO useractivity (userid, status)
SELECT generate_series(1, 100000), 'inactive';

-- 1,000 active users
INSERT INTO useractivity (userid, status)
SELECT generate_series(100001, 101000), 'active';

ANALYZE useractivity;

Step 3: See Current Stats

SELECT attname, most_common_vals, most_common_freqs, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'useractivity';

Sample output:

 attname | most_common_vals  | most_common_freqs | null_frac | n_distinct
---------+-------------------+-------------------+-----------+------------
 status  | {inactive,active} | {0.9901,0.0099}   |         0 |          2

(Exact frequencies vary slightly because ANALYZE samples the table, but with 100,000 inactive and 1,000 active rows they should be close to 0.99 and 0.01.)

Step 4: Query Plans

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM useractivity WHERE status = 'inactive';
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM useractivity WHERE status = 'active';

You’ll likely see:

  • 'active': Index Scan or Bitmap Index Scan (planner expects a small fraction of rows)
  • 'inactive': Seq Scan (planner expects almost all rows to match)
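
For reference, the plan shapes typically look like the following (node types only; your costs, row counts, and timings will differ):

-- status = 'active' (rare, so the index pays off):
Bitmap Heap Scan on useractivity
  Recheck Cond: ((status)::text = 'active'::text)
  ->  Bitmap Index Scan on idxuseractivitystatus
        Index Cond: ((status)::text = 'active'::text)

-- status = 'inactive' (nearly every row matches, so a full scan is cheaper):
Seq Scan on useractivity
  Filter: ((status)::text = 'inactive'::text)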

Step 5: Bulk Data Change Without Analyze

-- Make everyone active
UPDATE useractivity SET status = 'active';

-- Add 10,000 inactive users
INSERT INTO useractivity (userid, status)
SELECT generate_series(105001, 115000), 'inactive';

-- Do NOT run ANALYZE yet!

Step 6: See What the Planner Does With Stale Stats

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM useractivity WHERE status = 'inactive';

You may find the planner still chooses a sequential scan based on the old statistics, even though 'inactive' rows are now a small minority (about 10,000 of ~111,000). The query can therefore be much slower than it needs to be.
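
To confirm the statistics really are stale, check when they were last refreshed and how many modifications have accumulated since:

SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'useractivity';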

Step 7: Refresh Stats and Observe the Change

ANALYZE useractivity;

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM useractivity WHERE status = 'inactive';

Now that the statistics are fresh, the planner realizes 'inactive' is actually rare, and it will likely switch to using an Index Scan for this query.

5. How to Fix and Prevent Bad Plans

  • Always run ANALYZE after big updates/inserts/deletes.

  • Monitor last stats update:

    SELECT relname, last_analyze, last_autoanalyze FROM pg_stat_user_tables WHERE relname = 'useractivity';
    
  • Lower the threshold for critical tables:

    ALTER TABLE useractivity SET (autovacuum_analyze_scale_factor = 0.01);
    

6. Tuning Autovacuum and Analyze

  • See your global settings:

    SHOW autovacuum_analyze_scale_factor;
    SHOW autovacuum_analyze_threshold;
    
  • Set lower thresholds on a per-table basis with ALTER TABLE ... SET (...).
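
For example, a sketch of a per-table override and a way to verify it took effect (the exact numbers here are illustrative, not recommendations):

-- Analyze after ~1% of rows change (plus a base threshold of 100)
ALTER TABLE useractivity SET (
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_analyze_threshold = 100
);

-- Verify the per-table overrides
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'useractivity';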

7. Conclusion

Indexes themselves rarely break in PostgreSQL; what fails is the planner's ability to recognize when to use them once statistics go stale. By keeping your table statistics current, you empower PostgreSQL to make optimal decisions, ensuring your queries stay fast and your indexes deliver the performance you expect.
