-
Notifications
You must be signed in to change notification settings - Fork 106
Enabling an expression based sort
Some of our API endpoints make use of SQL expressions (functions) in the WHERE
clause of the underlying SQL statements. In these cases, we should be making use of the same expression in the ORDER BY
clause of the SQL statement if we're also sorting (or would like to enable sorting) on the same field(s) found in the WHERE
clause.
In order to accomplish this and ensure that the queries remain performant (or become more performant in the case of addressing existing performance issues), we must make use of indexes on expressions. With an index setup on the expression itself we can take advantage of the performance gains in both the WHERE
and ORDER BY
clauses in the SQL statements generated for the API.
There are two steps for accomplishing this:
- Setting up the expression-based index(es)
- Setting up the expression-based sort clause to map to the column(s) exposed for sorting in the API
Let's dive into the details of these two steps using [issue #2791](https://github.com/18F/openFEC/issues/2791 as the basis for the example.
Setting up an expression-based index is identical to a regular index, the only difference is there is a SQL expression in place of a column name. For example, there are COALESCE
statements in the WHERE
clause for the Schedule B queries:
COALESCE(ofec_sched_b_master.disb_dt, '9999-12-31'::date);
We'll be referencing this statement in the example. But first, note that the API is also keeping track of an index for paging, and in the case of Schedule A and B, a CHECK CONSTRAINT
for table partitioning due to the size of those data sets. All of our indexes must keep these things in mind, so they are usually setup as multicolumn indexes for maximum performance with our filtering, paging, and sorting options.
Back to our example with Schedule B, in order to create an index for this statement we would execute this statement:
CREATE INDEX idx_ofec_sched_b_1977_1978_election_cycle_coalesce_disb_dt
ON public.ofec_sched_b_1977_1978
USING btree
(two_year_transaction_period, COALESCE(disb_dt, '9999-12-31'::date), sub_id);
Please note that we would do this for all tables in the Schedule B partition. With an expression-based index created, the WHERE
and ORDER BY
clauses can now take advantage of it. In this case, it's the COALESCE
statement using the disb_dt
column.
A full example of this can be seen in PR #2904, where a migration was created to add a group of expression-based indexes for the Schedule B partition.
With expression-based indexes in place, any query with an expression in the WHERE
clause that matches the index will automatically take advantage of it. The ORDER BY
clause will not, however, because of the way the API is setup and how our sorting is exposed in the interface and URL. Only a column name is exposed in the API (both for usability and security, as we don't want to open ourselves up to SQL injection attacks), so we must map the column name to the expression we would like actually like to sort by. This is transparent to the user, meaning the API continues to work as it always has by specifying one or more (where supported) sorting columns with an optional -
in front to toggle between ascending or descending order.
In order to setup this mapping to enable expression-based sorting for a given field, a hybrid_property
must be setup on the model associated with the resource that is mapped to the endpoint being modified. For example, when using the API to view the Schedule B (disbursement) data, you would see this in the URL parameters by default: sort=-disbursement_date
. This is the column name exposed in the API, but under the hood it should actually sort by COALESCE(ofec_sched_b_master.disb_dt, '9999-12-31'::date)
. The API model is what enables us to do this if we add a new property to it.
Continuing with the Schedule B example, take a look at the ScheduleB
model, and what will be added to it is this:
@hybrid_property
def sort_expressions(self):
return {
'disbursement_date': sa.func.coalesce(
self.disbursement_date,
sa.cast('9999-12-31', sa.Date)
)
}
Let's unpack what is going on here:
- First, we need to use the [
@hybrid_property]([
hybrid_property](http://docs.sqlalchemy.org/en/rel_1_0/orm/extensions/hybrid.html?highlight=hybrid_property#sqlalchemy.ext.hybrid.hybrid_property)) decorator because we are making use of a column defined in the model and a SQLAlchemy construct (the reference to
coalesce` in this case). - Second, we create the property as a method and call it
sort_expressions
. This is important as the sorting code checks for the presence of this property name on a model to see if any columns need to be swapped for expressions. - Third, we return a
dict
in the method that contains the column names found in the model as the keys and the expressions we would like to swap them with as the values. Again, the keys are the column names that are defined on the model, meaning the model properties, not the database column name itself. In this example,disbursement_date
is the model property anddisb_dt
is the database column name. We need to usedisbursement_date
because that is the model property and what is exposed in the API.
If we were going to enable sorting by expression for another column, we would add another key/value pair in the dictionary of the model property and the expression to order by.
The sorting and paging code takes care of all of the rest for us, including supporting which direction to sort in. With this property and mapping in place, the expression will be used in the ORDER BY
clause, and the ORDER BY
clause in turn will make use of the index setup for the expression.
If you are interested in seeing the full implementation of how this is supporting, PR #2905 contains the changeset that enables the underlying support for this.