This repository has been archived by the owner on Dec 14, 2023. It is now read-only.

[RFR] use pgmigrate for schema migrations #759

Merged
merged 49 commits into from
Jul 22, 2021

Conversation

jtotoole
Contributor

Progress here: pgmigrate is successfully creating the db and running the migrations on container start. Some questions (apologies in advance that these reflect a shaky grasp of the existing system and are therefore likely dumb):

  • pgmigrate does the work both of initializing the schema and applying migrations, so I'm thinking that initialize_schema.sh, apply_migrations.sh, and generate_empty_sql_migration.sh can perhaps be consolidated into one file. The challenge, then, is whether there's a way of running a check, like what's done here, to determine whether new migrations are necessary on container start. Do you have thoughts on that? Maybe running all the migrations every time is unavoidable? One thing pgmigrate can do is target a specific migration number rather than applying them all at once (e.g. pgmigrate -t 3 migrate to run up to migration 3), and it creates a schema_version table in the DB. So maybe there's a way of scanning all the files in the migrations folder, identifying the filename with the highest number, and comparing that to the highest number in the schema_version table?
  • It seems important to validate whether the ZFS backup snapshot works; however, I don't exactly grok what's happening in that code block. How do you think that check might be incorporated into the new system?
  • You noted in the issue:

It would be nice to have (retain) some way of applying any given migrations manually before deploying the rebuilt postgresql-server service.... If the migration tool would be able to print out SQL that would be run on the live database instead of the tool insisting that it has to run the SQL itself, that would be pretty great!

  • ^pgmigrate has a --dryrun option, which rolls back rather than committing, but it doesn't seem to actually log the SQL anywhere when that flag is set, so I'm not sure of the best way to output the pending SQL code without running it. I think, though, that all it's going to do each time is run the files in /migrations sequentially?
  • I'm considering the best way to run pg_dump to get the reference schema file. Based on the permissions errors I've been getting when attempting to run as root and postgres, as well as a read of the Postgres docs, it seems like I need to execute the command as the mediacloud superuser. When I've tried that (specifically, executing pg_dump --dbname=mediacloud --username=mediacloud in initialize_db.sh), I get the error: pg_dump: [archiver (db)] connection to database "mediacloud" failed: FATAL: Peer authentication failed for user "mediacloud". Any thoughts on how to solve this one?
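The filename-vs-schema_version comparison floated in the first bullet could be sketched roughly like this. This is only a sketch: the `V<number>__name.sql` filename convention and the `schema_version` query in the comment are assumptions about pgmigrate's layout, not verified against the repo.

```shell
#!/usr/bin/env bash
# Sketch of the pending-migrations check: compare the highest-numbered file
# in the migrations directory against the version recorded in the database.
set -euo pipefail

highest_migration_file() {
    local dir="$1"
    # Extract the numeric prefix from names like "V0042__add_foo.sql"
    # and print the largest one.
    ls "$dir" | sed -n 's/^V0*\([0-9][0-9]*\)__.*\.sql$/\1/p' | sort -n | tail -n 1
}

# The live side would come from the database (not run here), e.g. something like:
#   applied=$(psql -At -c 'SELECT max(version) FROM schema_version' mediacloud)
# If highest_migration_file prints a larger number, migrations are pending.
```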

Perhaps it's easiest to talk through this via Google Meet—lmk!

@jtotoole jtotoole marked this pull request as draft January 24, 2021 19:01
@pypt
Contributor

pypt commented Jan 25, 2021

I'm thinking that initialize_schema.sh, apply_migrations.sh, and generate_empty_sql_migration.sh can perhaps be consolidated into one file

generate_empty_sql_migration.sh is a script that creates a bunch of boilerplate for a new migration, so given that there's not going to be any boilerplate to wrap the actual migration in, I don't think it's necessary to have that script anymore.

initialize_schema.sh preloads /var/lib/postgresql/ with an empty schema. We employ a somewhat neat Docker trick here: /var/lib/postgresql/ is defined as a volume in the Dockerfile, and if you write something to a directory (/var/lib/postgresql/) and later declare it as a volume, then on container start Docker will:

  1. If no volumes from the host system are attached to the container, use /var/lib/postgresql/ as any other directory;
  2. If a newly created volume gets attached to the container, preload that new volume with the contents of /var/lib/postgresql from the container image;
  3. If a previously created volume gets attached to the container, don't preload anything and just start the container with the volume mounted.

That way we can use the same postgresql-server image for both testing and production:

  1. When running a test (or anything else that gets started via docker-compose.tests.yml), no volumes get mounted to /var/lib/postgresql/ in postgresql-server, so the container starts with an empty schema loaded to it via initialize_schema.sh;
  2. When starting the very same Docker image on production, a data volume gets mounted from the host to the container so PostgreSQL gets started with the production data instead of an empty schema that got previously built into the container image.
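The three cases can be made concrete with a hedged demo. `schema-image` is a made-up name for an image whose Dockerfile writes an empty schema into /var/lib/postgresql/ and then declares `VOLUME /var/lib/postgresql`; it needs a Docker daemon, so it's wrapped in a function rather than run directly.

```shell
# Illustration of the three volume cases above; image and volume names are
# made up, and a Docker daemon is required, hence the function wrapper.
demo_volume_preload() {
    # Case 1: no volume attached -- the directory baked into the image is used.
    docker run --rm schema-image ls /var/lib/postgresql

    # Case 2: a brand-new named volume -- Docker preloads it with the
    # image's /var/lib/postgresql contents.
    docker volume create pg-data
    docker run --rm -v pg-data:/var/lib/postgresql schema-image ls /var/lib/postgresql

    # Case 3: the same (now non-empty) volume again -- its existing contents
    # are kept; nothing is copied from the image.
    docker run --rm -v pg-data:/var/lib/postgresql schema-image ls /var/lib/postgresql
}
```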

Maybe there's a better way to go about all of this, but to retain the current functionality / behaviour, you'd have to make sure of the following:

  • On Docker build, the image ends up with the empty, up-to-date schema preloaded under /var/lib/postgresql. To do that, you'd probably want to start PostgreSQL in the background, apply all the migrations in the sequence to it, and stop it (as is currently done in initialize_schema.sh).
  • On container start, whenever the container comes around to opening up a public port (i.e. lets users connect), it should expose an up-to-date schema to the users. To achieve that, you might start PostgreSQL on a non-public port (to avoid letting users connect to an outdated database), run the migration tool against that instance (which would in turn figure out which "version" the currently live schema is at and apply the missing migrations to bring the schema up to date), stop the instance, and restart it on a public port for the rest of the services to use.
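That start-migrate-restart dance might be sketched like this. The PGDATA path, the pg_ctl invocation, and the pgmigrate working directory are assumptions, and the sequence needs a live PostgreSQL install, so it's wrapped in a function instead of executed.

```shell
# Sketch of the start-migrate-restart sequence described above; paths and
# flags are assumptions, not the repo's actual entrypoint.
migrate_then_expose() {
    # 1. Start PostgreSQL on localhost only, so no service can connect
    #    to a schema that is still outdated.
    pg_ctl -D "$PGDATA" -o "-c listen_addresses=127.0.0.1" -w start

    # 2. Let the migration tool inspect the current schema version and
    #    apply whatever is missing.
    (cd /opt/mediacloud && pgmigrate -t latest migrate)

    # 3. Stop, then restart on all interfaces for the rest of the stack.
    pg_ctl -D "$PGDATA" -m fast -w stop
    exec postgres -c "listen_addresses=*"
}
```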

As for the schema_version table, I think the tool is supposed to do all of the version comparing and such itself?

It seems important to validate whether the ZFS backup snapshot works; however, I don't exactly grok what's happening in that code block. How do you think that check might be incorporated into the new system?

MC_POSTGRESQL_SKIP_MIGRATIONS is an environment variable which, when set, skips all of the migration stuff and just starts PostgreSQL on whatever is currently in /var/lib/postgresql/. ZFS backups are just one of the ways it gets used. Please leave support for that environment variable if you can.

^pgmigrate has a --dryrun option, which rolls back rather than committing, but it doesn't seem to actually log the SQL anywhere when that flag is set, so I'm not sure of the best way to output the pending SQL code without running it. I think, though, that all it's going to do each time is run the files in /migrations sequentially?

That's unfortunate. Let's skip it for now then.

I'm considering the best way to run pg_dump to get the reference schema file. Based on the permissions errors I've been getting when attempting to run as root and postgres, as well as a read of the Postgres docs, it seems like I need to execute the command as the mediacloud superuser. When I've tried that (specifically, executing pg_dump --dbname=mediacloud --username=mediacloud in initialize_db.sh), I get the error: pg_dump: [archiver (db)] connection to database "mediacloud" failed: FATAL: Peer authentication failed for user "mediacloud". Any thoughts on how to solve this one?

PostgreSQL auth is a confusing mix between UNIX accounts and the database's own ones, so don't feel bad. The currently implemented postgresql-server is configured to authenticate as mediacloud via environment variables, so you should be able to do something like:

docker run -it dockermediacloud/postgresql-server    # defaults to ":latest" tag

and then

docker exec -it <container_id> psql

So dunno, maybe run pg_dump at some point when building and then add a shell script in ./dev/ that would cat the schema from postgresql-server:latest?
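That ./dev/ script could be as small as the following. The image tag and the /mediawords.sql path inside the image are guesses (whatever location pg_dump writes to at build time), and it needs Docker, hence the function wrapper.

```shell
# Hypothetical dev/get_schema.sh along the lines suggested above: print the
# schema file that pg_dump baked into the image at build time. Image name
# and file path are guesses.
get_schema() {
    docker run --rm dockermediacloud/postgresql-server:latest cat /mediawords.sql
}
```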

@jtotoole jtotoole marked this pull request as ready for review January 29, 2021 00:44
@jtotoole jtotoole requested a review from pypt January 29, 2021 00:44
@jtotoole jtotoole changed the title [WIP] use pgmigrate for schema migrations use pgmigrate for schema migrations Jan 29, 2021
@jtotoole
Contributor Author

not sure how to go about testing this in production and extremely paranoid about doing so, but i think the PR is in decent shape now, and behaving as intended locally

@pypt
Contributor

pypt commented Feb 8, 2021

Thanks, will have a look soon.

@pypt pypt self-assigned this Jun 8, 2021
Contributor

@pypt pypt left a comment


Thank you for all of your work on this, and sorry for the 428342837th time for the delay.

Just some minor changes here and there, plus a single bug (migrations don't seem to work on second run of the database service in the container).

apps/postgresql-server/Dockerfile
cd /opt/mediacloud && pgmigrate -t latest migrate

# Dump schema file for reference in development
psql -v ON_ERROR_STOP=1 mediacloud -c '\! pg_dump mediacloud > /tmp/mediawords.sql'

  1. /tmp and /var/tmp could be tmpfs filesystems mounted by Docker to the container; I think it's better to store the generated schema somewhere less temporary, e.g. /;

  2. Instead of running pg_dump from within psql (psql's \! command just starts a shell), you can just run pg_dump directly;

  3. By the way, by default psql just ignores errors that it encounters in the input SQL. For example, if you had the following SQL file:

    CREATE TABLE foo (name TEXT);
    
    blergh;
    
    CREATE TABLE bar (name TEXT);

    and were to run psql -d database_name -f that_file_with_a_typo_in_the_middle.sql, it would CREATE TABLE foo, complain about the blergh statement and then happily CREATE TABLE bar. This is something to be wary of when, for example, importing large dumps because one might end up with an incomplete imported dump.
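The footgun in point 3 has a one-flag fix: with `ON_ERROR_STOP=1`, psql aborts at the first failing statement and exits non-zero instead of logging the error and carrying on. Sketched as a function since it needs a reachable database to actually run.

```shell
# Strict variant of "psql -d database_name -f file.sql": abort on the first
# error instead of silently continuing past it.
psql_strict() {
    local db="$1" file="$2"
    psql -v ON_ERROR_STOP=1 -d "$db" -f "$file"
}
```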

Contributor

@pypt pypt left a comment


A single tiny revert please, plus update a bunch of docker-compose.tests.yml.

.github/workflows/build.yml
@jtotoole
Contributor Author

jtotoole commented Jul 2, 2021

Looks like the latest failures are all crimson hexagon related (so no more volume problems I don't think)—I can confirm once #793 is merged

Edit: no, still volume problems; I'll keep investigating

@jtotoole jtotoole changed the title use pgmigrate for schema migrations [RFR] use pgmigrate for schema migrations Jul 6, 2021
@jtotoole
Contributor Author

jtotoole commented Jul 6, 2021

@pypt alrighty, only test failures are crimson hexagon-related, so i think this is good to go assuming it looks okay to you

Contributor

@pypt pypt left a comment


One more quick find-and-replace, and we're good to go!

apps/common/docker-compose.tests.yml
@pypt pypt merged commit 82ead93 into master Jul 22, 2021
@pypt pypt deleted the jot-pgmigrate branch July 22, 2021 12:47
@pypt
Contributor

pypt commented Jul 22, 2021

Amazinglymazing, thank you so much!

@pypt
Contributor

pypt commented Jul 22, 2021

Fixes #754.

@esirK

esirK commented Oct 19, 2021

Hey @jtotoole @pypt
I restarted our Swarm, and since it's using gcr.io/mcback/postgresql-server:release for the PostgreSQL image, it picked up all the latest changes, which I think might be breaking the system. I'm currently getting the following error:

psycopg2.errors.DuplicateTable: relation "database_variables" already exists

During handling of the above exception, another exception occurred:

  File "/usr/local/bin/pgmigrate", line 8, in <module>
    sys.exit(_main())
  File "/usr/local/lib/python3.8/dist-packages/pgmigrate.py", line 891, in _main
    COMMANDS[args.cmd](config)
  File "/usr/local/lib/python3.8/dist-packages/pgmigrate.py", line 753, in migrate
    _migrate_step(state, config.callbacks, config.base_dir, config.user,
  File "/usr/local/lib/python3.8/dist-packages/pgmigrate.py", line 555, in _migrate_step
    _apply_version(version, base_dir, user, schema, cursor)
  File "/usr/local/lib/python3.8/dist-packages/pgmigrate.py", line 465, in _apply_version
    _apply_file(version_info.file_path, cursor)
  File "/usr/local/lib/python3.8/dist-packages/pgmigrate.py", line 451, in _apply_file
    _apply_statement(statement, cursor)
  File "/usr/local/lib/python3.8/dist-packages/pgmigrate.py", line 442, in _apply_statement
    raise MigrateError('Unable to apply statement')
pgmigrate.MigrateError: Unable to apply statement

Any suggestions on how I can resolve this?

@esirK

esirK commented Oct 19, 2021

MC_POSTGRESQL_SKIP_MIGRATIONS is an environment variable which, when set, skips all of the migration stuff and just starts PostgreSQL on whatever is currently in /var/lib/postgresql/. ZFS backups are just one of the ways it gets used. Please leave support for that environment variable if you can.

From the comments, I see I can set MC_POSTGRESQL_SKIP_MIGRATIONS as an environment variable. Where should I set this from?
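One hedged way to set it on an already-deployed Swarm service is via `docker service update`; the service name below is only a guess based on the log prefixes above, and the variable could equally go under the service's `environment:` section in the stack's compose file. Wrapped in a function since it needs a Swarm manager.

```shell
# Set MC_POSTGRESQL_SKIP_MIGRATIONS on a running Swarm service; the service
# name is a guess inferred from the log lines above, not verified.
skip_migrations() {
    docker service update \
        --env-add MC_POSTGRESQL_SKIP_MIGRATIONS=1 \
        cfamediacloud_postgresql-server
}
```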
