[RFR] use pgmigrate for schema migrations #759
That way we can use the same
Maybe there's a better way to go about all of this, but to retain the current functionality / behaviour, you'd have to make sure of the following:
As for the
That's unfortunate. Let's skip it for now then.
PostgreSQL auth is a confusing mix between UNIX accounts and the database's own ones, so don't feel bad. Currently implemented: `docker run -it dockermediacloud/postgresql-server` (defaults to the `:latest` tag) and then `docker exec -it <container_id> psql`. So dunno, maybe run
Not sure how to go about testing this in production and extremely paranoid about doing so, but I think the PR is in decent shape now and behaving as intended locally.
Thanks, will have a look soon.
Thank you for all of your work on this, and sorry for the 428342837th time for the delay.
Just some minor changes here and there, plus a single bug (migrations don't seem to work on second run of the database service in the container).
```sh
cd /opt/mediacloud && pgmigrate -t latest migrate

# Dump schema file for reference in development
psql -v ON_ERROR_STOP=1 mediacloud -c '\! pg_dump mediacloud > /tmp/mediawords.sql'
```
- `/tmp` and `/var/tmp` could be tmpfs filesystems mounted by Docker to the container; I think it's better to store the generated schema somewhere less temporary, e.g. `/`;
- Instead of running `pg_dump` from within `psql` (`psql`'s `\!` command just starts a shell), you can just run `pg_dump` directly;
- By the way, by default `psql` just ignores errors that it encounters in the input SQL. For example, if you had the following SQL file:

  ```sql
  CREATE TABLE foo (name TEXT);
  blergh;
  CREATE TABLE bar (name TEXT);
  ```

  and were to run `psql -d database_name -f that_file_with_a_typo_in_the_middle.sql`, it would `CREATE TABLE foo`, complain about the `blergh` statement, and then happily `CREATE TABLE bar`. This is something to be wary of when, for example, importing large dumps, because one might end up with an incomplete imported dump.
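To make that failure mode easy to reproduce, here is the reviewer's example written out as a script. The file path and `database_name` are placeholders; the `psql` invocations are left as comments because they need a running server:

```shell
#!/bin/sh
# Write out a dump with a typo ("blergh;") in the middle.
cat > /tmp/that_file_with_a_typo_in_the_middle.sql <<'EOF'
CREATE TABLE foo (name TEXT);
blergh;
CREATE TABLE bar (name TEXT);
EOF

# Default psql keeps going past the error, so "bar" still gets created:
#   psql -d database_name -f /tmp/that_file_with_a_typo_in_the_middle.sql
#
# With ON_ERROR_STOP (already used in this PR's init script),
# psql aborts at "blergh" instead of importing an incomplete dump:
#   psql -v ON_ERROR_STOP=1 -d database_name -f /tmp/that_file_with_a_typo_in_the_middle.sql
```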
A single tiny revert please, plus update a bunch of `docker-compose.tests.yml`.

Edit: no, still volume problems; I'll keep investigating
@pypt alrighty, the only test failures are Crimson Hexagon-related, so I think this is good to go assuming it looks okay to you
One more quick find-and-replace, and we're good to go!
Amazing, thank you so much!
Fixes #754.
Hey @jtotoole @pypt, any suggestions on how I can resolve this?
From the comments, I see I can set
Progress here: pgmigrate is successfully creating the db and running the migrations on container start. Some questions (apologies in advance that these reflect a shaky grasp of the existing system and are therefore likely dumb):

1. `pgmigrate` does the work both of initializing the schema and applying migrations, so I'm thinking that `initialize_schema.sh`, `apply_migrations.sh`, and `generate_empty_sql_migration.sh` can perhaps be consolidated into one file. The challenge, then, is whether there's a way of running a check, like what's done here, to determine whether new migrations are necessary on container start. Do you have thoughts on that? Maybe running all the migrations every time is unavoidable? One thing that `pgmigrate` can do is point to a specific migration number as opposed to applying them all at once (e.g. `pgmigrate -t 3 migrate` to run up to migration 3), and it creates a `schema_version` table in the DB. So, maybe there's a way of scanning all the files in the `migrations` folder, identifying the filename with the highest number, and comparing that to the highest number in the `schema_version` table?
2. `pgmigrate` has a `--dryrun` option, which rolls back rather than committing, but it doesn't seem to actually log the SQL anywhere when that flag is set, so I'm not sure of the best way to output the pending SQL code without running it. I think, though, that all it's going to do each time is run the files in `/migrations` sequentially?
3. Using `pg_dump` to get the reference schema file: based on the permissions errors I've been getting when attempting to run as `root` and `postgres`, as well as a read of the Postgres docs, it seems like I need to execute the command as the `mediacloud` superuser. When I've tried that (specifically, executing `pg_dump --dbname=mediacloud --username=mediacloud` in `initialize_db.sh`), I get the error: `pg_dump: [archiver (db)] connection to database "mediacloud" failed: FATAL: Peer authentication failed for user "mediacloud"`. Any thoughts on how to solve this one?

Perhaps it's easiest to talk through this via Google Meet, lmk!
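A rough sketch of the "scan the migrations folder and compare against `schema_version`" idea. This assumes pgmigrate-style filenames like `V0003__add_foo.sql`; the helper name and paths are made up, and the `psql` query is left as a comment since it needs a live database:

```shell
#!/bin/sh
# Highest migration version present on disk, parsed from filenames
# like V0001__initial.sql (assumption: this naming scheme).
latest_file_version() {
    ls "$1" | sed -n -E 's/^V0*([0-9]+)__.*\.sql$/\1/p' | sort -n | tail -n 1
}

# Highest version already applied, per pgmigrate's schema_version table
# (illustration only; needs a running server):
#   applied=$(psql -tAc 'SELECT coalesce(max(version), 0) FROM schema_version' mediacloud)
#
# Then migrate only when something is pending:
#   [ "$(latest_file_version /opt/mediacloud/migrations)" -gt "$applied" ] \
#       && pgmigrate -t latest migrate
```

That said, if I understand pgmigrate correctly it already skips versions recorded in `schema_version`, so running `pgmigrate -t latest migrate` on every start should be a cheap no-op when nothing is pending; the file scan mainly saves a database round trip. On the peer-auth failure in the last question: peer authentication only applies to UNIX-socket connections (the UNIX username must match the database role), so forcing a TCP connection with `pg_dump -h localhost ...` or running the command as a matching UNIX user is the usual workaround.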