How Postgres chooses which index to use #26959
TheOtherBrian1
announced in
Troubleshooting
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Postgres Internals
How an index is choosen
PostgreSQL, internally, contains a few components that manage query execution:
The planner will consider using an index when an indexed column is present in a filter statement, such as:
WHERE
LIKE
ILIKE
DISTINCT
SIMILAR TO
JOIN
ORDER BY
Otherwise, it will likely perform a full table scan (sequential scan).
In the majority of cases, the indexed column must not only be present but also must be filtered by a comparison operator (=, >, <>) that is compatible with the index.
As an example, one can create the following table:
On the data column, a GIN index can be applied, which is excellent for filtering JSONB datatypes:
Here's a link to the list operators supported by the GIN index; notably, it does not support greater than
>
:GIN does support the
@>
operator:In most cases, developers work with the default BTREE index. It is the most practical and performant in the majority of cases and is compatible with the following filter operators:
An operator's functional equivalents, such as
IN
,BETWEEN
, andANY
, are also valid.However, just because the base requirements (relevant column, filter, and operators) are present, doesn't mean that an index will be used.
Indexes have a startup cost, so for small tables, Postgres might use a sequential scan if it believes that it will take less time. The database keeps statistics about each table that it uses to inform these choices.
In very rare cases, these statistics can become stale, and Postgres may opt to use a slower index or sequential scan when a better option is available.
You can see the query plan with the
EXPLAIN
keyword:To understand how to interpret its output, you can check out this explainer.
To reset statistics within the database, you can use the following query:
Complex or Composite indexes
Multi-column indexes
If you make independent indexes on multiple columns, Postgres will likely use each of them independently to find the relevant rows and then combine the results together.
It is possible to make multi-column indexes. If you are regularly filtering against multiple columns, there can be performance benefits using them instead of several independent indexes.
Ordered indexes
If you're using an ORDER BY clause, indexes can also be pre-sorted by DESC/ASC for better performance.
Functional indexes
Although not as common, indexes can also be leveraged against modified values, such as when using a LOWER function:
Covering indexes
Indexes contain pointers to a specific row and a copy of the indexed value, but you could instruct an index to hold a copy of another column's value, too. These are known as
covering
indexes. Because this is more storage intensive, you should avoid using it for values with large data footprints. FULL VIDEO ON TOPICIndexes on JSONB
Although a GIN/GIST index can be used to index entire JSONB bodies, you can also target just specific Key-values with standard BTREE indexes:
Beta Was this translation helpful? Give feedback.
All reactions