How Postgres chooses which index to use #26959

TheOtherBrian1 · 2024-06-03T05:53:16Z

TheOtherBrian1
Jun 3, 2024
Maintainer

For the curious: here is a list of all built-in indexes in Postgres

Postgres Internals

How an index is chosen

PostgreSQL, internally, contains a few components that manage query execution:

Module	Description
Parser	Converts SQL into an easily traversable query tree
Planner/Optimizer	Takes the query tree and uses rules and database statistics to find the optimal strategy for getting the data
Executor	Executes the plan created by the planner

The planner will consider using an index when an indexed column appears in a filtering, joining, or sorting construct, such as:

WHERE
LIKE
ILIKE
DISTINCT
SIMILAR TO
JOIN
ORDER BY

Otherwise, it will likely perform a full table scan (sequential scan).

In the majority of cases, the indexed column must not only be present but must also be filtered by a comparison operator (for example =, >, or <) that is compatible with the index.

As an example, one can create the following table:

Column Name	Data Type
id	INT
data	JSONB

On the data column, a GIN index can be applied, which is excellent for filtering JSONB datatypes:

CREATE INDEX some_arbitrary_index_name ON some_table USING gin (data);

Here's a link to the list of operators supported by the GIN index; notably, it does not support greater than (>):

--GIN index will never be used
SELECT * FROM some_table
WHERE (data->>'val')::int > 5;

GIN does support the @> operator:

--GIN will be considered
SELECT id FROM some_table 
WHERE data @> '[ { "itemId": "p11" } ]';
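
As a further sketch, the default GIN operator class for JSONB also supports the key-existence operators. The queries below reuse some_table from above; the key name 'sku' is a made-up placeholder:

```sql
-- Does the top-level object contain the key "itemId"?
SELECT id FROM some_table
WHERE data ? 'itemId';

-- Does it contain any of the listed keys? ('sku' is hypothetical)
SELECT id FROM some_table
WHERE data ?| ARRAY['itemId', 'sku'];
```

As with @>, the GIN index will be considered for these filters, but the planner still decides whether using it is cheaper than a sequential scan.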

In most cases, developers work with the default BTREE index. It is the most practical and performant choice for general-purpose workloads and is compatible with the following filter operators:

Comparison Operator
<
<=
=
>=
>

An operator's functional equivalents, such as IN, BETWEEN, and ANY, are also valid.
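
A rough illustration of those equivalents follows. It assumes a hypothetical orders table with a BTREE index on amount; the table, column, and index names are invented for the example:

```sql
-- Hypothetical table and index, for illustration only
CREATE TABLE orders (
    id     SERIAL PRIMARY KEY,
    amount INT
);
CREATE INDEX orders_amount_idx ON orders (amount);

-- BETWEEN is shorthand for amount >= 10 AND amount <= 20
SELECT id FROM orders WHERE amount BETWEEN 10 AND 20;

-- IN and = ANY are treated as a set of equality checks
SELECT id FROM orders WHERE amount IN (5, 10, 15);
SELECT id FROM orders WHERE amount = ANY (ARRAY[5, 10, 15]);
```

Each filter uses an operator the BTREE index can serve, so the index is a candidate for all three queries.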

However, meeting the base requirements (relevant column, filter, and operators) doesn't guarantee that an index will be used.

Indexes have a startup cost, so for small tables, Postgres might use a sequential scan if it believes that it will take less time. The database keeps statistics about each table that it uses to inform these choices.

In very rare cases, these statistics can become stale, and Postgres may opt to use a slower index or sequential scan when a better option is available.

You can see the query plan with the EXPLAIN keyword:

EXPLAIN <your query>

To understand how to interpret its output, you can check out this explainer.
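
A minimal sketch of both forms, reusing the some_table example from earlier. Plain EXPLAIN only shows the estimated plan, while EXPLAIN ANALYZE actually runs the query and reports real row counts and timings, so take care with statements that modify data:

```sql
-- Shows the estimated plan without running the query
EXPLAIN
SELECT id FROM some_table
WHERE data @> '[ { "itemId": "p11" } ]';

-- Runs the query and adds actual row counts and timings
EXPLAIN ANALYZE
SELECT id FROM some_table
WHERE data @> '[ { "itemId": "p11" } ]';
```

In the output, a Seq Scan node means the table was read in full, while Index Scan, Index Only Scan, or Bitmap Index Scan nodes indicate that an index was used.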

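To refresh the planner's statistics, run ANALYZE. Note that pg_stat_reset() only resets the cumulative activity counters (the pg_stat_* views), which autovacuum also relies on, so it does not repair stale planner statistics:

-- recollects planner statistics for the current database
ANALYZE;

-- resets activity counters only; use judiciously
SELECT pg_stat_reset();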
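
Before resetting anything, it can help to check how fresh the planner's statistics are. The sketch below uses the built-in pg_stat_user_tables view and the ANALYZE command; some_table is the example table from earlier:

```sql
-- When were planner statistics last gathered for each table?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;

-- Recollect planner statistics for a single table
ANALYZE some_table;
```

Autovacuum normally runs ANALYZE on its own, so a manual run is mostly useful right after bulk loads or large deletes.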

Complex or Composite indexes

For a more complete rundown, check the Postgres Official Docs

Multi-column indexes

If you make independent indexes on multiple columns, Postgres can scan each of them separately to find the relevant rows and then combine the results (a bitmap AND/OR).

It is possible to make multi-column indexes. If you are regularly filtering against multiple columns, there can be performance benefits to using them instead of several independent indexes.

-- multi-column index
CREATE INDEX test2_mm_idx ON test2 (major, minor);

-- multi-column comparison:
SELECT name FROM test2 
WHERE major = constant AND minor = constant;
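
Column order matters for a multi-column BTREE index: it is most effective when the query constrains the leading column(s). A hedged sketch using the test2 index above, with placeholder literal values and the assumption that major and minor are integers:

```sql
-- Constrains the leading column: test2_mm_idx is a good candidate
SELECT name FROM test2 WHERE major = 1;

-- Constrains both columns: test2_mm_idx is a good candidate
SELECT name FROM test2 WHERE major = 1 AND minor = 2;

-- Leading column is not constrained: the index is much less
-- likely to help, and the planner may prefer a sequential scan
SELECT name FROM test2 WHERE minor = 2;
```

If you often filter on minor alone, a separate index on that column may be worth considering.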

Ordered indexes

If you're using an ORDER BY clause, indexes can also be pre-sorted by DESC/ASC for better performance.

-- organizes the index in a DESC order, places NULL values at the end
CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST);
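
As a sketch of queries that line up with that index (the LIMIT value is arbitrary), the planner can read rows straight from the index in the requested order instead of sorting them:

```sql
-- Same ordering as the index definition
SELECT id FROM test3
ORDER BY id DESC NULLS LAST
LIMIT 10;

-- A BTREE can also be scanned backward, so the exact reverse
-- ordering can be served by the same index
SELECT id FROM test3
ORDER BY id ASC NULLS FIRST
LIMIT 10;
```

An ordering that is neither the index order nor its exact reverse, such as id DESC NULLS FIRST, would still need a sort step.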

Functional indexes

Although not as common, indexes can also be leveraged against modified values, such as when using a LOWER function:

-- Index on modified column through function
CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));

-- Index will be considered for the following query:
SELECT * FROM test1 WHERE lower(col1) = 'value';

Covering indexes

An index entry holds a pointer to a specific row and a copy of the indexed value, but you can instruct an index to hold a copy of another column's value, too. These are known as covering indexes, and they let Postgres answer some queries from the index alone (an index-only scan). Because this is more storage intensive, you should avoid including values with large data footprints. FULL VIDEO ON TOPIC

CREATE INDEX a_b_idx ON x (a,b) INCLUDE (c);
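
A sketch of a query this index could cover, assuming a, b, and c are numeric columns of the example table x (the literal values are placeholders):

```sql
-- Every referenced column (a, b, c) is stored in a_b_idx,
-- so an index-only scan is possible
SELECT a, b, c FROM x
WHERE a = 1 AND b = 2;

-- Column d (hypothetical) is not in the index, so Postgres
-- must also visit the table to fetch it
-- SELECT a, b, c, d FROM x WHERE a = 1 AND b = 2;
```

Whether an index-only scan is actually chosen also depends on the table's visibility map, which VACUUM keeps up to date.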

Indexes on JSONB

Although a GIN index can be used to index entire JSONB bodies, you can also target just specific key values with standard BTREE indexes:

-- Example table
CREATE TABLE person (
    id SERIAL PRIMARY KEY,
    data JSONB
);


create index index_name on person (
    (data->>'name') 
);
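
For the planner to consider this index, the query has to filter on the same expression that was indexed. A minimal sketch against the person table above, with 'Ada' as a placeholder value:

```sql
-- The filter expression matches the indexed expression exactly
SELECT id FROM person
WHERE data->>'name' = 'Ada';

-- Different expression (-> returns jsonb, not text), so the
-- index on (data->>'name') does not apply
SELECT id FROM person
WHERE data->'name' = '"Ada"';
```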
