Skip to content

Enabling an expression based sort

Carlo Costino edited this page Feb 9, 2018 · 15 revisions

Overview

Some of our API endpoints make use of SQL expressions (functions) in the WHERE clause of the underlying SQL statements. In these cases, we should be making use of the same expression in the ORDER BY clause of the SQL statement if we're also sorting (or would like to enable sorting) on the same field(s) found in the WHERE clause.

In order to accomplish this and ensure that the queries remain performant (or become more performant in the case of addressing existing performance issues), we must make use of indexes on expressions. With an index setup on the expression itself we can take advantage of the performance gains in both the WHERE and ORDER BY clauses in the SQL statements generated for the API.

There are two steps for accomplishing this:

  • Setting up the expression-based index(es)
  • Setting up the expression-based sort clause to map to the column(s) exposed for sorting in the API

Let's dive into the details of these two steps using [issue #2791](https://github.com/18F/openFEC/issues/2791 as the basis for the example.

Expression-based indexes

Setting up an expression-based index is identical to a regular index, the only difference is there is a SQL expression in place of a column name. For example, there are COALESCE statements in the WHERE clause for the Schedule B queries:

COALESCE(ofec_sched_b_master.disb_dt, '9999-12-31'::date);

We'll be referencing this statement in the example. But first, note that the API is also keeping track of an index for paging, and in the case of Schedule A and B, a CHECK CONSTRAINT for table partitioning due to the size of those data sets. All of our indexes must keep these things in mind, so they are usually setup as multicolumn indexes for maximum performance with our filtering, paging, and sorting options.

Back to our example with Schedule B, in order to create an index for this statement we would execute this statement:

CREATE INDEX idx_ofec_sched_b_1977_1978_election_cycle_coalesce_disb_dt
  ON public.ofec_sched_b_1977_1978
  USING btree
  (two_year_transaction_period, COALESCE(disb_dt, '9999-12-31'::date), sub_id);

Please note that we would do this for all tables in the Schedule B partition. With an expression-based index created, the WHERE and ORDER BY clauses can now take advantage of it. In this case, it's the COALESCE statement using the disb_dt column.

A full example of this can be seen in PR #2904, where a migration was created to add a group of expression-based indexes for the Schedule B partition.

Expression-based sort clauses

With expression-based indexes in place, any query with an expression in the WHERE clause that matches the index will automatically take advantage of it. The ORDER BY clause will not, however, because of the way the API is setup and how our sorting is exposed in the interface and URL. Only a column name is exposed in the API (both for usability and security, as we don't want to open ourselves up to SQL injection attacks), so we must map the column name to the expression we would like actually like to sort by. This is transparent to the user, meaning the API continues to work as it always has by specifying one or more (where supported) sorting columns with an optional - in front to toggle between ascending or descending order.

In order to setup this mapping to enable expression-based sorting for a given field, a hybrid_property must be setup on the model associated with the resource that is mapped to the endpoint being modified. For example, when using the API to view the Schedule B (disbursement) data, you would see this in the URL parameters by default: sort=-disbursement_date. This is the column name exposed in the API, but under the hood it should actually sort by COALESCE(ofec_sched_b_master.disb_dt, '9999-12-31'::date). The API model is what enables us to do this if we add a new property to it.

Continuing with the Schedule B example, take a look at the ScheduleB model, and what will be added to it is this:

@hybrid_property
def sort_expressions(self):
    return {
        'disbursement_date': sa.func.coalesce(
            self.disbursement_date,
            sa.cast('9999-12-31', sa.Date)
        )
    }

Let's unpack what is going on here:

  • First, we need to use the [@hybrid_property]([hybrid_property](http://docs.sqlalchemy.org/en/rel_1_0/orm/extensions/hybrid.html?highlight=hybrid_property#sqlalchemy.ext.hybrid.hybrid_property)) decorator because we are making use of a column defined in the model and a SQLAlchemy construct (the reference to coalesce` in this case).
  • Second, we create the property as a method and call it sort_expressions. This is important as the sorting code checks for the presence of this property name on a model to see if any columns need to be swapped for expressions.
  • Third, we return a dict in the method that contains the column names found in the model as the keys and the expressions we would like to swap them with as the values. Again, the keys are the column names that are defined on the model, meaning the model properties, not the database column name itself. In this example, disbursement_date is the model property and disb_dt is the database column name. We need to use disbursement_date because that is the model property and what is exposed in the API.

If we were going to enable sorting by expression for another column, we would add another key/value pair in the dictionary of the model property and the expression to order by.

The sorting and paging code takes care of all of the rest for us, including supporting which direction to sort in. With this property and mapping in place, the expression will be used in the ORDER BY clause, and the ORDER BY clause in turn will make use of the index setup for the expression.

If you are interested in seeing the full implementation of how this is supporting, PR #2905 contains the changeset that enables the underlying support for this.