Unit tests can be run locally via `python -m unittest discover`. There is also a GitHub action to run them.

For the frontend, we currently don't have a unit test suite, but we do have some end-to-end test workflows. These can be run locally via `make test-frontend`; if you check the makefile, you will see alternative commands (e.g. for debugging). There is also a GitHub action for this.
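For example, running both locally from the repo root (assuming dependencies are installed) looks like this:

```sh
# Backend unit tests
python -m unittest discover

# Frontend end-to-end tests (check the makefile for variants, e.g. for debugging)
make test-frontend
```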
Refresh: Concept set tables (`code_sets`, `concept_set_container`, `concept_set_version_item`, `concept_set_members`)
There are several database refreshes, which synchronize TermHub with its data source, the N3C data enclave. The most important of these is for the concept set tables. After these tables are synchronized, any dependent tables or views are also regenerated.
A refresh is done nightly via GitHub action, but it can also be run manually, either by (a) using the GitHub action, or (b) running the Python script directly via `python backend/db/full_data_refresh.py`, which supports the following CLI parameters (an example invocation follows the table).
| Parameter | Default | Description |
|---|---|---|
| `-t` / `--hours-threshold-for-updates` | 24 | Threshold for how many hours since the last update before we require a refresh. If the last update time was less than this, nothing will happen. This is evaluated separately for downloads of local artefacts as well as for uploading data to the DB. |
| `-o` / `--objects` | False | Download objects. |
| `-c` / `--datasets-csets` | False | Download datasets from the "cset" group. |
| `-v` / `--datasets-vocab` | False | Download datasets from the "vocab" group. |
| `-f` / `--force-download-if-exists` | True | If the dataset/object already exists as a local file, force a re-download. This is moot if the last update was done within `--hours-threshold-for-updates`. |
| `-l` / `--use-local-db` | False | Use the local database instead of the server. |
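As an illustration, here is a hypothetical manual run that downloads objects and the "cset" dataset group into the local database, using only flags from the table above (the flag combination itself is just an example):

```sh
python backend/db/full_data_refresh.py \
  --objects \
  --datasets-csets \
  --use-local-db \
  --hours-threshold-for-updates 0  # 0 should bypass the default 24-hour threshold
```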
Patient and record counts are updated in the N3C data enclave routinely, typically every few weeks or months. There is a GitHub action that checks nightly for any changes and updates if so.
This refresh updates the `concept_set_counts_clamped` and `deidentified_term_usage_by_domain_clamped` tables, as well as their derived tables and views.
This can also be run manually via `make refresh-counts`, or `python backend/db/refresh_dataset_group_tables.py --dataset-group counts`.
OMOP vocabulary tables are updated typically every 6 months. There is a GitHub action that checks nightly for any changes and updates if so.
Warning: As of November 2024, the GitHub action for this no longer works 100% of the time. Inexplicably, even with the same inputs, sometimes it only takes ~4.5 hours, and sometimes it takes >6 hours, which is greater than the maximum allowable time for GitHub actions. If it fails, then we can do the following: (i) perhaps just try running the action again; maybe it will take <6 hours the second time, or (ii) just run the refresh locally.
This refresh updates the `concept`, `concept_ancestor`, `concept_relationship`, and `relationship` tables, as well as their derived tables and views.
Additionally, whenever this refresh occurs, the `networkx` graph `term-vocab/relationship_graph.pickle` needs updating. Presently this does not happen as part of the refresh runs, but afterward: the next time the app starts, if it sees that the pickle is out of date, it will regenerate it. This takes about 5 minutes.
This can also be run manually via `make refresh-vocab`, or `python backend/db/refresh_dataset_group_tables.py --dataset-group vocab`.
Every 6 months or so, whenever the vocab tables are updated:
- Run the vocab refresh locally (see the command sketch after this list).
- Go to https://portal.azure.com/ and restart the TermHub server(s).
- After 15 minutes or so, face check the app in the browser to make sure it's working.
- It's also a good idea to run the frontend Playwright E2E tests, though these do run nightly.
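A rough command-line sketch of that routine (the Azure restart can also be done in the portal as described above; the `az` commands and resource group name are assumptions, not something prescribed by this repo):

```sh
# 1. Run the vocab refresh locally
make refresh-vocab

# 2. Restart the TermHub server(s): hypothetical CLI equivalent of the portal "Restart" button
az webapp restart --name termhub --resource-group <resource-group>              # prod
az webapp restart --name termhub --resource-group <resource-group> --slot dev   # dev slot

# 3. After ~15 minutes, face check the app in the browser, then run the E2E tests
make test-frontend
```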
To allow a new user to access the database remotely, their IP address must be added to Azure: (i) select DB resource, (ii) select 'Networking', (iii) add IP address.
Prerequisites
You should have an environmental variable called `psql_conn`, set as follows:

```sh
psql_conn="host=$TERMHUB_DB_HOST port=$TERMHUB_DB_PORT dbname=$TERMHUB_DB_DB user=$TERMHUB_DB_USER password=$TERMHUB_DB_PASS sslmode=require"
```
If you run `./db_backup.sh`, it will generate commands that you can copy/paste directly into the terminal to (i) create the backup, and (ii) restore it.
Optional steps
- Google Drive: Uploading the backup there as well can be helpful, because it has happened in the past that our backup schemas on PostgreSQL have gotten corrupted (a sketch for producing a file copy follows).
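If you want a file copy of the backup (e.g. to upload to Google Drive), a minimal sketch using the standard PostgreSQL client tools might look like the following. This is an assumption about one reasonable way to do it, not necessarily what `db_backup.sh` generates:

```sh
# Dump only the n3c schema to a local file (file name is illustrative)
pg_dump "$psql_conn" --schema=n3c --format=custom --file="n3c_backup_$(date +%Y%m%d).dump"

# Restore it later if needed
pg_restore --dbname="$psql_conn" "n3c_backup_YYYYMMDD.dump"
```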
If any new views or derived tables are added, there are some additional steps that need to be followed in order to avoid breaking the database refresh and to ensure that these tables stay up to date when refreshes happen.
- Create a DDL file: This needs to be added to `backend/db/`. The standard file naming is `ddl-N-MODULE.jinja.sql`, where `N` is the position in which it should run relative to other such files, and `MODULE` is the module name, e.g. the name of the table(s) or other operations done by the DDL. When choosing the position of the file (`N`), you should put it after any/all other DDL which is used to create any of the tables/views that it is derived from.
- Update `backend/db/config.py:DERIVED_TABLE_DEPENDENCY_MAP`.
- Update `backend/db/config.py:DERIVED_TABLE_DEPENDENCY_MAP` if it is a view.
Additional rules:
- Each table and view should be in its own DDL file, not part of another DDL, even if it only depends on one table. Otherwise there will be errors.
- The Jinja `{{optional_suffix}}` should only go on the table/view being created itself, not on what it's selecting from. E.g. you want something like `CREATE OR REPLACE VIEW {{schema}}all_csets_view{{optional_suffix}} ...` and `... FROM {{schema}}all_csets);` (see the sketch after this list).
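Putting these rules together, a hypothetical DDL file for a new derived view might look like the following; the file position (21), module name (`my_derived_view`), and choice of source view are placeholders for illustration only:

```sh
# Hypothetical example; adjust N, the module name, and the source table/view to your case
cat > backend/db/ddl-21-my_derived_view.jinja.sql <<'EOF'
CREATE OR REPLACE VIEW {{schema}}my_derived_view{{optional_suffix}} AS (
  SELECT *
  FROM {{schema}}all_csets  -- no {{optional_suffix}} on the source table/view
);
EOF
```

You would then also register the new object in `backend/db/config.py:DERIVED_TABLE_DEPENDENCY_MAP`, as described in the list above.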
Or, you may see: `WARNING: transaction read-write mode must be set before any query`

This seems to result from situations where we're running commands after the database recently ran out of memory. It seems like it goes into read-only mode at that point, and these commands need to be run to reset it:

```sql
BEGIN;
SET transaction read write;
ALTER DATABASE termhub SET transaction_read_only = off;
ALTER DATABASE termhub SET default_transaction_read_only = off;
COMMIT;
```
If you see something like this...

```sh
psql -d $psql_conn
psql: error: connection to server at "termhub.postgres.database.azure.com" (20.62.151.251), port 5432 failed: FATAL: remaining connection slots are reserved for non-replication superuser connections
```

...this can be resolved by opening the DB management page in Azure and clicking the "Restart" button near the top.
You likely will already have a schema called `n3c` which is corrupted in some way, which is why you want to restore from backup.

Step 1: Remove/rename the corrupted `n3c` schema
1.a. Back up the corrupted `n3c` schema: If you want to keep that schema for whatever reason, you can create a new backup for it as in step (1), or simply give it a new schema name:

```sql
ALTER SCHEMA n3c RENAME TO <new name>;
```

1.b. Drop the corrupted `n3c` schema: But there's a good chance you just want to drop it instead, like so:

```sql
DROP SCHEMA n3c CASCADE;
```
Step 2: Rename `n3c_backup_YYYYMMDD` to `n3c`

```sql
ALTER SCHEMA <backup schema name> RENAME TO n3c;
```
This script/workflow will download datasets from the N3C data enclave and perform the necessary steps to remake all of the database tables. It can be run via (a) its GitHub Action, or (b) directly via `python backend/db/refresh_from_datasets.py`.
Some things you might want to try to make sure that the restoration worked.
- Run `make counts-update`. Then run `make counts-table`, `make counts-compare-schemas`, or `make counts-docs`, depending on the situation, and check that the counts/deltas look good.
- Restart the local backend/frontend, load the frontend in the browser, clear `localStorage` (from the console: `localStorage.clear()`), and face check various application features.
- Load the dev deployment, clear `localStorage` in the same way, and face check various application features.
- Face check the `n3c` schema. Just look at the tables and make sure they don't look weird.
Many of these steps are specific to the JHU BIDS team, which deploys on JHU's Azure infrastructure.
Update environmental variables for GitHub actions when necessary:
Every once in a while, you will need to update the `ENV_FILE` GitHub Secret with a copy/paste of `env/.env`. This is only necessary when environmental variables have changed, such as `PALANTIR_ENCLAVE_AUTHENTICATION_BEARER_TOKEN` getting refreshed, which happens every few months. If you're confident that the environment is still up to date, this step can be skipped, but to be safe, it can be done every time. You can read more about this in the related "Periodic maintenance > Updating auth token" section.
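If you have the GitHub CLI installed, one way to update the secret without copy/pasting in the browser is something like the following (this is an assumption about tooling, not an established TermHub workflow; the repo slug is a placeholder):

```sh
gh secret set ENV_FILE --repo <owner>/<repo> < env/.env
```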
First, replace the current `dev` or `prod` git tag with the commit that you are deploying from. The reason for this is that the Playwright tests are set to run on those tagged commits, so that the tests that run are in sync with the actual deployment. That way, we'll know if some problem has occurred with our infrastructure. Playwright tests using the latest code should only be run locally.

Example using `dev`:
```sh
# Optional: back up old tag
git checkout dev
git tag dev-YYYY-MM-DD  # The date the commit was made. There might already be such a tag.
# Make new tags
git checkout BRANCH_OR_COMMIT_TO_DEPLOY
git tag -d dev  # delete locally
git tag dev
git tag dev-YYYY-MM-DD  # This is a backup. YYYY-MM-DD: The date the commit was made.
# Push updates
git push --tags -f  # -f (force) overwrites the previous `dev` tag
```
Use the GitHub actions. Click these links for backend and frontend actions, and then click "Run Workflow". After both are deployed, test the app here: http://bit.ly/termhub-dev
Use the GitHub actions. Click these links for backend and frontend actions, and then click "Run Workflow". After both are deployed, the app will be usable here: http://bit.ly/termhub
After deploying, do some manual quality assurance.
I. Clear cache

Start by clearing the cache. Right now, the best way to do that is by opening the console and running `localStorage.clear()`.

As of 2023/08, we're in the process of fixing the clear-cache button in the "Help / About" page. We're also in the process of automatically clearing the cache on version, code, and data changes.
II. Checks

Face-check the app. Some ideas of things to try:
1. Does the example comparison page load?
2. Can you select some concept sets from the cset search page?
3. After doing (2), can you go to the comparison page and see the concept sets you selected?
4. Do the controls on the comparison page work correctly?
If you notice something wrong with a deployment, follow these steps to roll back to the last stable deployment.
1. Go to the GitHub actions page.
2. From the left sidebar, click the action that corresponds to the broken deployment (e.g. Backend dev, Frontend dev, Backend prod, or Frontend prod).
3. Find the workflow run that you think produced the last stable deployment, and click it to open its logs.
4. If you're looking at a backend workflow, under the "Jobs" section of the left sidebar, select the job with the name "build". If you're looking at a frontend workflow, select the job with the name "Build and Deploy".
5. This will scroll you to the bottom of the logs page. Scroll to the top, and you should see a step called "Print commit hash & branch for rollbacks & troubleshooting". Click that.
6. You should now see the commit hash. Copy it.
7. In your terminal, run `git checkout <commit hash>`. This will check out the code at the commit hash you copied.
8. Create a new branch & push it; for example, `git checkout -b ROLLBACK; git push -u origin ROLLBACK`.
9. Go to the GitHub actions page.
10. From the left sidebar, click the action that corresponds to the broken deployment (e.g. Backend dev, Frontend dev, Backend prod, or Frontend prod).
11. Click the "Run workflow" button on the right, and in the popup that appears, where it says "Use workflow from", click the dropdown menu and select the branch you just created (e.g. ROLLBACK). Then click "Run workflow".
12. Finally, after the deployment has been successful, you can delete that branch (e.g. `ROLLBACK`) locally and on GitHub.
After the action finishes, your deployment should be rolled back to the last stable deployment. You will also want to figure out what went wrong, fix it separately (i.e. in the `develop` branch), and make a new deployment when things are stable and ready for a new release.
Note that for the backend only instead of steps 2 - 4, you can also find the commit hash by going to the deployments page.
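For reference, steps 7 and 8 above amount to the following in the terminal:

```sh
git checkout <commit hash>   # the hash copied from the workflow logs
git checkout -b ROLLBACK
git push -u origin ROLLBACK
```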
Sometimes you may want to redeploy an existing deployment with only minor changes. For example, you may have updated the `ENV_FILE` as discussed in the "Periodic maintenance > Updating auth token" section. Or, you may have noticed a small bug and want to deploy the fix on top of the currently deployed commit, rather than deploying further updates from `develop` or `main`.
What you want to do here is follow basically the same steps as in the "Deployment > Rollbacks" section. If all you did was update `ENV_FILE`, you can follow those steps exactly. If you are making additional changes, then you would basically modify step (8): instead of `git checkout -b BRANCH; git push -u origin BRANCH`, you would do `git checkout -b BRANCH`, make your changes and do a new commit, and then `git push -u origin BRANCH`.
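In other words, a patch on top of the currently deployed commit might look roughly like this (branch name and commit message are placeholders):

```sh
git checkout <currently deployed commit hash>
git checkout -b PATCH-BRANCH
# ...make the small fix, then:
git add -A
git commit -m "Fix: <short description>"
git push -u origin PATCH-BRANCH
# Then run the deployment workflow from PATCH-BRANCH, as in the Rollbacks section.
```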
Special case: Failed deployments: Below are steps to find general logs. In order to find the Azure log for failed deployments, go here, select "Logs", and then select the one that failed.
Logs for the backend can be opened via the following steps.

Option A: Log stream

Advantages of log stream vs granular logs: (i) easier to get to, (ii) combines multiple logs into one. Disadvantages: (i) when it refreshes, you will have to scroll back down to the bottom, (ii) it can be harder to find what you're looking for. (A CLI alternative is sketched after the steps below.)
- Log into https://portal.azure.com.
- From the list of Resources, select the deployment. For production, its "Name" is "termhub" and its "Type" is "App Service". For development, its "Name" is "dev / (termhub/dev)", and its "Type" is "App Service (Slot)".
- There is a "Log stream" option in the left sidebar. Click it.
Option B: Granular logs
- Log into https://portal.azure.com.
- From the list of Resources, select the deployment. For production, its "Name" is "termhub" and its "Type" is "App Service". For development, its "Name" is "dev / (termhub/dev)", and its "Type" is "App Service (Slot)".
- From the left sidebar, select "Advanced Tools" (unfortunately, none of "Logs", "App Service Logs", or "Log stream" are what you want).
- Click "Go ->".
- Click "Current Docker logs".
- Find the log: A JSON list of log references will appear. Each log ref has an "href" field with a URL. It seems like there's always about a dozen or so refs. Unfortunately, there doesn't seem to be a way to tell which one is the one that we're usually going to want, i.e. the one which will show things like stacktraces from server errors. You'll have to open each and scroll to the bottom until you find what you are looking for.
Option C: Less recent logs
- Log into https://portal.azure.com.
- From the list of Resources, select the deployment. For production, its "Name" is "termhub" and its "Type" is "App Service". For development, its "Name" is "dev / (termhub/dev)", and its "Type" is "App Service (Slot)".
- From the left sidebar, select "Advanced Tools" (unfortunately, none of "Logs", "App Service Logs", or "Log stream" are what you want).
- Click "Go ->".
- Click "Bash" at the top.
- There will be a logs folder that you can `cd` into, which has a history of logs with dates in their names (see the sketch below).
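For example, once in the Bash console, something like the following; the path is an assumption (/home/LogFiles is a common location on Linux App Service) and may differ in our setup:

```sh
cd /home/LogFiles  # assumed location of the logs folder
ls -lt | head -20  # most recently modified logs first
```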
Note that if the app is not starting, SSH will not work. Also, it's helpful to know the difference between "Bash" and "SSH" in the "Advanced Tools" console. "Bash" is for accessing the machine that manages deployments. "SSH" is for accessing the machine where the app source code is located.
- Log into https://portal.azure.com.
- From the list of Resources, select the deployment. For production, its "Name" is "termhub" and its "Type" is "App Service". For development, its "Name" is "dev / (termhub/dev)", and its "Type" is "App Service (Slot)".
- From the left sidebar, select "Advanced Tools" (unfortunately, none of "Logs", "App Service Logs", or "Log stream" are what you want).
- Click "Go ->".
- Click "SSH" at the top.
It may be that the app has run out of memory. In https://portal.azure.com there is a way to check memory usage. You can find this by opening the "App Service" (example: develop) and selecting "App Service Plan" (example: develop). If it looks like it's maxed out and/or the logs say something about no memory, try increasing the memory to see if that solves the problem. If the memory must be increased, let a BIDS administrator (e.g. Tricia) know.
Example use case: Changing the Python version. Steps: Stop the app service, change the config, start the service, and run the GH action to re-deploy.
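A hypothetical Azure CLI version of those steps (the CLI usage, resource group, and Python version are assumptions; the portal works just as well):

```sh
az webapp stop --name termhub --resource-group <resource-group>
az webapp config set --name termhub --resource-group <resource-group> \
  --linux-fx-version "PYTHON|3.11"  # hypothetical target version
az webapp start --name termhub --resource-group <resource-group>
# Then run the GitHub action to re-deploy.
```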
TermHub uses semver (semantic versioning). That is, given a version number `MAJOR.MINOR.PATCH`, increment the:
- `MAJOR` version when making incompatible API changes
- `MINOR` version when adding functionality in a backward compatible manner
- `PATCH` version when making backward compatible bug fixes
For example, if we add a new feature or make an update to an existing UI / user-facing feature, but don't change anything that breaks functionality for users or systems that depend on TermHub, that would be a `MINOR` version update. So if the version was 1.10.2 before, it would be 1.11.0 after.
- Update the version in `frontend/src/env.js`
- Tag the version in GitHub: `git tag VERSION; git push --tags`
If any environmental variables are created/updated/deleted, follow these steps.
- Local `.env`: All devs need to update their `env/.env` file. Each dev's `.env` should be exactly the same.
- GitHub settings: Then the `.env` file should be copied and pasted into the `ENV_FILE` variable on GitHub. To do this, go to the GitHub actions secrets page, scroll down to "repository secrets", edit `ENV_FILE`, paste the contents there, and save. You can read more about this in the related "Deployment > Prerequisite steps" section.
- Deployment: Note that after the `ENV_FILE` is up to date, the currently deployed apps will not have access to its new contents. In order to deploy these changes, follow the instructions in "Deployment > Applying patches to existing deployments".
Not necessary if using OAuth, but we were not able to finish setting that up.
The environmental variable `PALANTIR_ENCLAVE_AUTHENTICATION_BEARER_TOKEN` needs to be updated every 6 months. To do so, contact Mariam Deacy, or another member of the enclave's IT team. After doing so, set a reminder on your calendar before the expiry date. Note that when the token is close to expiring, messages about this will start to appear in the logs.