-
Notifications
You must be signed in to change notification settings - Fork 1
Website update
The full database should be restored to the website instance to improve reliability, excluding the aligned_markers
table.
Restore the GTDB rXXX database to the website instance with the following name: gtdb_r***
pg_dump --host=??? --username=??? --password --exclude-table=aligned_markers --no-privileges --no-owner -F t gtdb_pierre_r202 > /tmp/gtdb_r202.tar
pg_restore --host=??? --username=??? --password --dbname gtdb_r202 --no-privileges --no-owner --verbose /tmp/gtdb_r202.tar
Create a new database called gtdb_r***_web
. This is a database containing pre-processed information used in tools (e.g. taxon history).
The old version of the database should be restored, note that some data will be modified for the new release.
pg_dump --host=??? --username=??? --password --no-privileges --no-owner -F t gtdb_r95_web > /tmp/gtdb_r95_web.tar
pg_restore --host=??? --username=??? --password --dbname gtdb_r202_web --no-privileges --no-owner --verbose /tmp/gtdb_r95_web.tar
The taxon_alias
table contains a list of aliases for a specific canonical genome_id. For example:
-
GCA_000006945.2
,GCF_000006945.1
,GCF_000006945.2
=G000006945
- Note that this includes UBA IDs (UBA1234, 61231).
Steps:
- As all the UBA mapping is in the table, there is no need to re-process it.
- Add the new genomes to the table.
- Simply, this can be done by just exporting the data and doing an intersection.
The taxon_hist
table contains the taxon history for each canonical genome id per release.
This is used by the taxon history tool.
Steps:
- Query the
gtdb_r***
database and export the result to disk:
select 'R202' AS release_ver,
g.formatted_source_id AS genome_id,
gtdb_domain,
gtdb_phylum,
gtdb_class,
gtdb_order,
gtdb_family,
gtdb_genus,
gtdb_species
from genomes g
inner join metadata_taxonomy mt on g.id = mt.id
where gtdb_phylum != 'p__';
The genome_path
table is used by the FastANI calculator.
Simply query the R202
table, export the query:
select formatted_source_id,
CASE
WHEN substring(fasta_file_location, 0, 4) = 'GCA'
THEN CONCAT('/srv/db/gtdb/genomes/ncbi/release202/genbank/', fasta_file_location)
WHEN substring(fasta_file_location, 0, 4) = 'GCF'
THEN CONCAT('/srv/db/gtdb/genomes/ncbi/release202/refseq/', fasta_file_location)
END AS path
from genomes
Then import the result from the above query into the table.
The genome_taxid
table is used by the genome page to infer the NCBI taxonomy ID, so that it can
be linked out.
This is generated through the GTDB-Migration-Tk package. Truncate the existing table, and import the new records. <<<< TODOS
The gtdb_release
table is probably not used but it should be updated as well as per what is already in there.
Find the SQLALCHEMY_BINDS
variable in config.py
and add the new databases, e.g.:
SQLALCHEMY_BINDS = {
'r95': f'postgresql://{DB_USER}:{DB_PASS}@{DB_HOST}/gtdb_r95',
'gtdb_web': f'postgresql://{DB_USER}:{DB_PASS}@{DB_HOST}/gtdb_r95_web',
'dev': f'postgresql://{DB_USER}:{DB_PASS}@{DB_HOST}/gtdb_release202_temp',
}
SQLALCHEMY_BINDS['current'] = SQLALCHEMY_BINDS['r95']
SQLALCHEMY_BINDS = {
'r95': f'postgresql://{DB_USER}:{DB_PASS}@{DB_HOST}/gtdb_r95',
'r202': f'postgresql://{DB_USER}:{DB_PASS}@{DB_HOST}/gtdb_r202',
'gtdb_web': f'postgresql://{DB_USER}:{DB_PASS}@{DB_HOST}/gtdb_r202_web',
'dev': f'postgresql://{DB_USER}:{DB_PASS}@{DB_HOST}/gtdb_release202_temp',
}
SQLALCHEMY_BINDS['current'] = SQLALCHEMY_BINDS['r202']
This ideally would be automatically configured, but the time required to do that has not yet reached a point that is worth it.