Skip to content
Aaron Mussig edited this page Apr 12, 2024 · 7 revisions

This outlines the steps required to update the website for each release.

1 - Preparing the GTDB database

1.1 - Backup & restore

On Watson, run the command below. Then copy it across to the database server.

pg_dump -U gtdb --data-only --format=tar --no-owner --no-acl --no-privileges --dbname=gtdb_pierre_r220 --file "/tmp/gtdb_r220.tar"

On the database server, restore the release into a new database:

docker exec -it gtdb-postgres psql -U postgres gtdb_r220 -f /backup/gtdb_r220.sql

1.2 - Extensions

Check what extensions were enabled on the previous release database, and enable them on the new database:

SELECT *
FROM pg_extension;

For example, this enables the pg_trgm extension:

create extension pg_trgm;

1.3 - Views

There are a number of materialized views that need to be created, and potentially views that may need to be created. Check the previous database and bring across any that do not exist.

Views:

select 'Materialized Views' AS type, matviewname AS name
from pg_matviews
    UNION
select 'Views', table_name
from INFORMATION_SCHEMA.views
where table_schema = 'public'
order by 1, 2;

1.4 - Permissions

Update permissions on the database so that the API is allowed to read.

GRANT SELECT
    ON ALL TABLES IN SCHEMA public
    TO gtdb_api;

ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT 
    ON TABLES TO gtdb_api;

1.5 - Maintenance

Reclaim any unused space by running the following:

Finally, reclaim any unused space:

VACUUM FULL ANALYZE;

Make sure that all tables are clustered on their primary keys, this can be done by querying for all primary keys, then copying the SQL from the output and running it:

SELECT conrelid::regclass                                              AS tbl,
       conname                                                         AS pkey,
       pg_get_constraintdef(oid)                                       AS description,
       CONCAT('CLUSTER ', conrelid::regclass, ' USING ', conname, ';') AS query
FROM pg_constraint
WHERE contype = 'p'
  AND connamespace = 'public'::regnamespace
ORDER BY conrelid::regclass::text, contype DESC;

You will also need to manually re-cluster the materialized views, they should have a "clst" index that dictates what index to use.

Now, reindex the database and update the statistics:

REINDEX DATABASE gtdb_r220;
ANALYZE;
CHECKPOINT;

2 - Preparing the GTDB website database

2.1 - Backup & restore

To do this, dump the previous release website database, then restore it as follows:

docker exec -it gtdb-postgres pg_dump -U postgres --dbname=gtdb_r214_web --no-acl --no-privileges --no-owner --file="/backup/gtdb_r214_web.sql"

docker exec -it gtdb-postgres psql -U postgres gtdb_r220_web -f /backup/gtdb_r214_web.sql

2.2 - Permissions

Update permissions on the database so that the API is allowed to read.

GRANT SELECT
    ON ALL TABLES IN SCHEMA public
    TO gtdb_api;

ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT 
    ON TABLES TO gtdb_api;

2.2 - Updating the genome_taxid table

First, truncate the genome_taxid table:

TRUNCATE TABLE genome_taxid;

Run the script from the GTDB API repository: scripts/release/update_genome_taxid.py

Ensure that the connection string is setup to use the new web release database (see api/db/__init__.py).

2.3 - Updating the gtdb_taxa_not_in_lit table

This is done manually, discuss with the ledger holder to ensure that it is up-to-date.

2.4 - Updating the gtdb_tree and gtdb_tree_children tables

2.5 - Updating the lpsn_url table

2.6 - Updating the taxon_hist table

  1. Connect to the gtdb_r220_web.taxon_hist table and run the following command:
DELETE FROM taxon_hist WHERE release_ver = 'NCBI';

Run the following command in the new GTDB release database (not the web database):

select 'NCBI'                            AS release_ver,
       g.formatted_source_id             AS genome_id,
       split_part(ncbi_taxonomy, ';', 1) AS rank_domain,
       split_part(ncbi_taxonomy, ';', 2) AS rank_phylum,
       split_part(ncbi_taxonomy, ';', 3) AS rank_class,
       split_part(ncbi_taxonomy, ';', 4) AS rank_order,
       split_part(ncbi_taxonomy, ';', 5) AS rank_family,
       split_part(ncbi_taxonomy, ';', 6) AS rank_genus,
       split_part(ncbi_taxonomy, ';', 7) AS rank_species
from genomes g
         inner join metadata_taxonomy mt ON mt.id = g.id
where mt.gtdb_domain != 'd__'
  AND mt.gtdb_phylum != 'p__'
  AND mt.gtdb_class != 'c__'
  AND mt.gtdb_order != 'o__'
  AND mt.gtdb_family != 'f__'
  AND mt.gtdb_genus != 'g__'
  AND mt.gtdb_species != 's__'

UNION

select 'R220'                AS release_ver,
       g.formatted_source_id AS genome_id,
       mt.gtdb_domain        AS rank_domain,
       mt.gtdb_phylum        AS rank_phylum,
       mt.gtdb_class         AS rank_class,
       mt.gtdb_order         AS rank_order,
       mt.gtdb_family        AS rank_family,
       mt.gtdb_genus         AS rank_genus,
       mt.gtdb_species       AS rank_species
from genomes g
         inner join metadata_taxonomy mt ON mt.id = g.id
where mt.gtdb_domain != 'd__'
  AND mt.gtdb_phylum != 'p__'
  AND mt.gtdb_class != 'c__'
  AND mt.gtdb_order != 'o__'
  AND mt.gtdb_family != 'f__'
  AND mt.gtdb_genus != 'g__'
  AND mt.gtdb_species != 's__'

Export the results and import them to the gtdb_r220_web.taxon_hist table.

2.xxxx - Indexing

Make sure that all tables are clustered on their primary keys, this can be done by querying for all primary keys, then copying the SQL from the output and running it:

SELECT conrelid::regclass                                              AS tbl,
       conname                                                         AS pkey,
       pg_get_constraintdef(oid)                                       AS description,
       CONCAT('CLUSTER ', conrelid::regclass, ' USING ', conname, ';') AS query
FROM pg_constraint
WHERE contype = 'p'
  AND connamespace = 'public'::regnamespace
ORDER BY conrelid::regclass::text, contype DESC;

Finally, re-index the database:

REINDEX DATABASE gtdb_r220_web;

3 - Updating the GTDB website

TODO: Put the commit with the release changes in.