This style guide covers Postgres-based SQL dialects. Its purpose is to standardize analytic queries so that code is consistent and easier to peer-review and maintain.
- Use lower case and underscores to name tables, views, CTEs, columns, etc. No spaces.
- Don't use SQL keywords for naming things.
- If, for some reason (e.g., mirroring an external data source), you absolutely must use a SQL keyword as a name, always enclose it in double quotes (e.g., `"timestamp"`).
- When aliasing tables or subqueries, use short (1-5 character) aliases or don't alias. If using a single character, try to use one that can be easily associated with the table or subquery in question. For example, use `c` for `customers`.
- Use spaces before and after operators: `first_name = 'John'`
- Use spaces after commas in function calls: `dateadd(day, 23, getdate())`
- Use new lines after the SQL keyword introducing a SQL clause.
- Use new lines after each individual column or equivalent in a SQL clause.
- Use new lines before logical operators in `where`, `having`, or `on` clauses.
- Use new lines after the opening and closing parentheses for a CTE or subquery.
For example...
```sql
with the_cte as ( -- New line due to opening CTE parenthesis
    select -- New line due to SQL keyword introducing a new clause
        c.id as customer_id, -- New line after a column
        e.email
    from
        customers as c
    left join
        emails as e
    on
        c.id = e.customer -- New line due to logical operator in on clause
        and e.current = true
)
select
    email,
    count(*)
from
    the_cte
group by
    email
```
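The naming, aliasing, and spacing rules in the list above can be seen together in one small query. This is a minimal sketch, assuming a hypothetical `raw_events` source whose `"timestamp"` column mirrors an external system. Its data is inlined with `values` so the query runs on its own in Postgres. The names `raw_events`, `re`, and `last_seen_at` are illustrative only.

```sql
-- Hypothetical stand-in for a table mirrored from an external source:
-- its "timestamp" column keeps the external name, so it is always double-quoted.
with raw_events (customer_id, "timestamp") as (
    values
        (1, timestamp '2024-01-01 09:00:00'),
        (1, timestamp '2024-01-02 10:30:00'),
        (2, timestamp '2024-01-03 08:15:00')
)
select
    re.customer_id,
    max(re."timestamp") as last_seen_at -- lower case with underscores, no spaces
from
    raw_events as re -- short alias that is easy to associate with raw_events
where
    re.customer_id = 1 -- spaces before and after the operator
group by
    re.customer_id
```

For customer 1, this returns the later of that customer's two timestamps.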
- Indent SQL code using four (4) spaces, not tabs
- Indent lists of columns or equivalents within each clause.
- Indent code for CTEs or subqueries within their parentheses.
For example...
```sql
with the_cte as (
    select -- Start indenting due to being within CTE parentheses
        c.id as customer_id, -- Indent further due to being a list of columns within select clause
        e.email
    from
        customers as c
    left join
        emails as e
    on
        c.id = e.customer
        and e.current = true
)
select
    email,
    count(*)
from
    the_cte
group by
    email
```
- Use your discretion in dividing long `case` statements across multiple lines. If you do divide long `case` statements, it's preferable to start a new line for each `when` or `else`.
- If you have nested logic in a `where` clause (e.g., `foo = 1 and (bar = 2 or baz = 3)`), use discretion in splitting the inner clause over multiple lines. If it's short, one line is fine. If it's long, insert a line break and indent after the opening parenthesis, as you would format a subquery or CTE.
- Feel free to write very simple queries on a single line.
- For readability purposes, use Common Table Expressions (CTEs) rather than subqueries to build complicated queries.
- Subqueries are generally unavoidable when filtering rows based on a particular column's presence in another table (e.g., `where email in (select email from blacklist)`). If the subquery in question starts getting too large, move it to a CTE and reference the CTE in the subquery rather than the original table. Use your discretion here.
- Never nest CTEs. This has all the disadvantages of both CTEs and subqueries with none of the advantages of either.
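- Before Postgres 12, CTEs are always optimization fences: the planner evaluates each CTE on its own and cannot push conditions from the outer query into it. Since Postgres 12, a non-recursive, side-effect-free CTE that is referenced only once is inlined by default, and `as materialized` or `as not materialized` overrides that choice. If a CTE is significantly hurting query performance, convert it to a subquery (on Postgres 12 or later, `as not materialized` is an alternative). Use your discretion.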
For example...
```sql
with the_cte as ( -- CTE used instead of a subquery for readability.
    select
        left(c.last_name, 3) as first_three_letters
    from
        customers as c
    left join
        emails as e
    on
        c.id = e.customer
        and e.current = true
    where
        e.email in ( -- This subquery is unavoidable.
            select
                email
            from
                blacklist
        )
)
select
    first_three_letters, -- We use this twice, so it's calculated in the CTE to prevent repeating code (see below).
    count(*)
from
    the_cte
group by
    first_three_letters
```
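The `case` and nested `where` guidance above can be sketched in one query. The `orders` data and its columns are hypothetical, inlined with `values` so the query runs on its own in Postgres. The long `case` statement gets one line per `when`/`else`, while the short nested condition stays on a single line.

```sql
-- Hypothetical orders data, inlined so the example is self-contained.
with orders (id, status, amount, region) as (
    values
        (1, 'shipped', 120.00, 'eu'),
        (2, 'pending', 40.00, 'us'),
        (3, 'cancelled', 75.00, 'eu'),
        (4, 'shipped', 15.00, 'apac')
)
select
    o.id,
    case -- long case statement: one new line per when/else
        when o.amount >= 100 then 'large'
        when o.amount >= 50 then 'medium'
        else 'small'
    end as order_size
from
    orders as o
where
    o.status <> 'cancelled'
    and (o.amount >= 100 or o.region = 'us') -- short nested logic: one line is fine
```

Orders 1 (`large`) and 2 (`small`) are returned. Order 3 is cancelled, and order 4 fails the nested condition. If the nested condition grew longer, it would move onto its own indented lines after the opening parenthesis.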
- If you need to use the result of a particular function (e.g., `dateadd(day, 14, date_column)`) multiple times in a query, consider doing the calculation in a CTE before referencing it in your main query.
- If you need to use the result of a particular subquery in multiple queries, consider creating a temporary table containing the results.
- In `group by` and `order by` clauses, reference the actual names of selected columns (e.g., `group by customer_id`) rather than their position (e.g., `group by 1`). Grouping or ordering by position is very brittle and can fail silently.
- When joining two tables, always specify which table a column comes from, even when it's only present in one table. For example, use `customers.id` or `c.id`, not just `id`.
- In general, it's preferable to list the individual columns to be selected rather than using `select *`. This is especially true for the final select list in a query used by an application (e.g., a Python script), because applications could break in unexpected ways if the number, contents, or order of the columns changes.
- Use lower case for SQL keywords.
- `cast()` is preferable to `::` for compatibility with other flavors of SQL.
- For performance reasons, use `like` or `ilike` instead of `~` unless `~` is needed.
- For compatibility, `coalesce()` (standard SQL) is preferable to `nvl()`.
- When aliasing, include the `as` for clarity, even though it's not necessary.
- Use `between` (inclusive at both ends) where possible instead of combining multiple comparisons with `and`.
- Rather than using multiple `or` clauses, use `in ()`.
- Write booleans in all lowercase (`true` or `false`), rather than mixed case, strings (e.g., `'true'`), or integer equivalents (e.g., `0` or `1`).
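The temporary-table advice can be sketched as follows. This assumes a hypothetical `customer_signups` result that several later queries need. The inline `values` list stands in for whatever expensive subquery would produce it in practice. Both follow-up queries list columns explicitly and group or filter by column name rather than by position.

```sql
-- Hypothetical: store an intermediate result once, for reuse in later queries.
-- The inline values list stands in for the real (possibly expensive) subquery.
create temporary table customer_signups as
select
    c.id as customer_id,
    c.signup_date
from
    (
        values
            (1, date '2024-01-05'),
            (2, date '2024-01-20'),
            (3, date '2024-02-02')
    ) as c (id, signup_date);

-- First reuse: signups per month, grouped by column name rather than "group by 1".
select
    date_trunc('month', cs.signup_date) as signup_month,
    count(*) as signups
from
    customer_signups as cs
group by
    signup_month
order by
    signup_month;

-- Second reuse: the same stored result, without re-running the original subquery.
select
    count(*) as january_signups
from
    customer_signups as cs
where
    cs.signup_date between date '2024-01-01' and date '2024-01-31';
```

A temporary table is dropped automatically at the end of the session, so it suits intermediate results that only a single analysis session needs.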
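Several rules in this list can be seen together in one query. The `accounts` data and columns are hypothetical and inlined with `values` so the query runs on its own in Postgres. Each preferred form is marked with a comment.

```sql
-- Hypothetical accounts data, inlined so the example is self-contained.
with accounts (id, plan, nickname, balance_text, is_active) as (
    values
        (1, 'free', null, '10', true),
        (2, 'pro', 'ace', '250', true),
        (3, 'team', null, '75', false)
)
select
    a.id as account_id, -- "as" written out even though it is optional
    coalesce(a.nickname, 'unknown') as display_name, -- coalesce() rather than nvl()
    cast(a.balance_text as integer) as balance -- cast() rather than ::
from
    accounts as a
where
    a.plan in ('free', 'pro') -- in () rather than chained or clauses
    and cast(a.balance_text as integer) between 0 and 100 -- between includes both ends
    and a.is_active = true -- lower-case boolean literal
```

Only account 1 satisfies all three conditions.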