-
Notifications
You must be signed in to change notification settings - Fork 2
Updating website
This outlines the steps required to update the website for each release.
On Watson, run the command below. Then copy it across to the database server.
pg_dump -U gtdb --data-only --format=tar --no-owner --no-acl --no-privileges --dbname=gtdb_pierre_r220 --file "/tmp/gtdb_r220.tar"
On the database server, restore the release into a new database:
docker exec -it gtdb-postgres psql -U postgres gtdb_r220 -f /backup/gtdb_r220.sql
Check what extensions were enabled on the previous release database, and enable them on the new database:
SELECT *
FROM pg_extension;
For example, this enables the pg_trgm extension:
create extension pg_trgm;
There are a number of materialized views that need to be created, and potentially views that may need to be created. Check the previous database and bring across any that do not exist.
Views:
select 'Materialized Views' AS type, matviewname AS name
from pg_matviews
UNION
select 'Views', table_name
from INFORMATION_SCHEMA.views
where table_schema = 'public'
order by 1, 2;
Update permissions on the database so that the API is allowed to read.
GRANT SELECT
ON ALL TABLES IN SCHEMA public
TO gtdb_api;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT
ON TABLES TO gtdb_api;
Reclaim any unused space by running the following:
Finally, reclaim any unused space:
VACUUM FULL ANALYZE;
Make sure that all tables are clustered on their primary keys, this can be done by querying for all primary keys, then copying the SQL from the output and running it:
SELECT conrelid::regclass AS tbl,
conname AS pkey,
pg_get_constraintdef(oid) AS description,
CONCAT('CLUSTER ', conrelid::regclass, ' USING ', conname, ';') AS query
FROM pg_constraint
WHERE contype = 'p'
AND connamespace = 'public'::regnamespace
ORDER BY conrelid::regclass::text, contype DESC;
You will also need to manually re-cluster the materialized views, they should have a "clst" index that dictates what index to use.
Now, reindex the database and update the statistics:
REINDEX DATABASE gtdb_r220;
ANALYZE;
CHECKPOINT;
To do this, dump the previous release website database, then restore it as follows:
docker exec -it gtdb-postgres pg_dump -U postgres --dbname=gtdb_r214_web --no-acl --no-privileges --no-owner --file="/backup/gtdb_r214_web.sql"
docker exec -it gtdb-postgres psql -U postgres gtdb_r220_web -f /backup/gtdb_r214_web.sql
Update permissions on the database so that the API is allowed to read.
GRANT SELECT
ON ALL TABLES IN SCHEMA public
TO gtdb_api;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT
ON TABLES TO gtdb_api;
First, truncate the genome_taxid table:
TRUNCATE TABLE genome_taxid;
Run the script from the GTDB API repository: scripts/release/update_genome_taxid.py
Ensure that the connection string is setup to use the new web release database (see api/db/__init__.py
).
This is done manually, discuss with the ledger holder to ensure that it is up-to-date.
- Connect to the
gtdb_r220_web.taxon_hist
table and run the following command:
DELETE FROM taxon_hist WHERE release_ver = 'NCBI';
Run the following command in the new GTDB release database (not the web database):
select 'NCBI' AS release_ver,
g.formatted_source_id AS genome_id,
split_part(ncbi_taxonomy, ';', 1) AS rank_domain,
split_part(ncbi_taxonomy, ';', 2) AS rank_phylum,
split_part(ncbi_taxonomy, ';', 3) AS rank_class,
split_part(ncbi_taxonomy, ';', 4) AS rank_order,
split_part(ncbi_taxonomy, ';', 5) AS rank_family,
split_part(ncbi_taxonomy, ';', 6) AS rank_genus,
split_part(ncbi_taxonomy, ';', 7) AS rank_species
from genomes g
inner join metadata_taxonomy mt ON mt.id = g.id
where mt.gtdb_domain != 'd__'
AND mt.gtdb_phylum != 'p__'
AND mt.gtdb_class != 'c__'
AND mt.gtdb_order != 'o__'
AND mt.gtdb_family != 'f__'
AND mt.gtdb_genus != 'g__'
AND mt.gtdb_species != 's__'
UNION
select 'R220' AS release_ver,
g.formatted_source_id AS genome_id,
mt.gtdb_domain AS rank_domain,
mt.gtdb_phylum AS rank_phylum,
mt.gtdb_class AS rank_class,
mt.gtdb_order AS rank_order,
mt.gtdb_family AS rank_family,
mt.gtdb_genus AS rank_genus,
mt.gtdb_species AS rank_species
from genomes g
inner join metadata_taxonomy mt ON mt.id = g.id
where mt.gtdb_domain != 'd__'
AND mt.gtdb_phylum != 'p__'
AND mt.gtdb_class != 'c__'
AND mt.gtdb_order != 'o__'
AND mt.gtdb_family != 'f__'
AND mt.gtdb_genus != 'g__'
AND mt.gtdb_species != 's__'
Export the results and import them to the gtdb_r220_web.taxon_hist
table.
Make sure that all tables are clustered on their primary keys, this can be done by querying for all primary keys, then copying the SQL from the output and running it:
SELECT conrelid::regclass AS tbl,
conname AS pkey,
pg_get_constraintdef(oid) AS description,
CONCAT('CLUSTER ', conrelid::regclass, ' USING ', conname, ';') AS query
FROM pg_constraint
WHERE contype = 'p'
AND connamespace = 'public'::regnamespace
ORDER BY conrelid::regclass::text, contype DESC;
Finally, re-index the database:
REINDEX DATABASE gtdb_r220_web;
TODO: Put the commit with the release changes in.