Replace home-grown PostgreSQL database migrations with something modern #754
Alrighty, I've reviewed a variety of options in this space. Given that we don't want to use an ORM and we want to write our migrations in plain SQL or PL/pgSQL, the top contenders in my mind are Flyway in combination with pgTAP for testing, or Sqitch (maybe also with pgTAP, but that's not strictly necessary). You're already familiar with Flyway, so I won't go too much into its details. Pros:
Cons:
Sqitch, meanwhile, is a system for tracking changes and automatically testing them before deploy. When starting on a new migration, you run a command that creates three files (deploy, revert and verify scripts). Pros:
Cons:
Stray thoughts:
|
Thanks, this is very thorough!
Thanks, this is very thorough!
By that, are you referring to dry runs or undos? I would feel inclined to just give up on the idea of writing those "down" (revert) migrations altogether. First of all, it's pretty hard to write a "down" migration for an "up" migration like this:
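(The original snippet is a stand-in here; this is a hedged sketch of the kind of "up" migration being described, i.e. a DO block that iterates over tables. The table names and the dropped column are illustrative, not the actual Media Cloud migration.)

```sql
-- Illustrative only: an "up" migration that loops over dynamically named
-- tables and destructively rewrites each one. A faithful "down" migration
-- would have to restore data that no longer exists.
DO $$
DECLARE
    tbl TEXT;
BEGIN
    FOR tbl IN
        SELECT tablename
        FROM pg_tables
        WHERE schemaname = 'public'
          AND tablename LIKE 'stories_p_%'   -- hypothetical partition naming
    LOOP
        EXECUTE format('ALTER TABLE %I DROP COLUMN IF EXISTS guid', tbl);
    END LOOP;
END $$;
```

Once the column (and its data) is gone from every matched table, no revert script can bring it back without a backup, which is exactly the point.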
(Here we iterate over tables…) Also, I can't quite figure out the reason for having those "down" migrations in the first place:
Am I missing something here? Feel free to disagree, maybe I'm downright wrong somewhere.
I would be of the opinion that we're better off testing the application that uses the schema rather than the schema migrations themselves. I don't think there's much of a point in verifying that a table really exists right after executing the statement that creates it. Again, maybe I don't see something here: what is it that those tests would actually test for?
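For reference, this is roughly what such a schema test looks like in pgTAP (a sketch based on pgTAP's documented assertion functions; the table and column names are illustrative):

```sql
-- Requires the pgtap extension: CREATE EXTENSION pgtap;
BEGIN;
SELECT plan(3);

SELECT has_table('public', 'stories', 'stories table exists');
SELECT has_column('public', 'stories', 'guid', 'stories.guid column exists');
SELECT col_is_pk('public', 'stories', 'stories_id', 'stories_id is the PK');

SELECT * FROM finish();
ROLLBACK;
```

i.e. assertions that largely restate the DDL they run right after, which is the crux of the objection.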
I kinda like Yandex's KISS approach to migrations: their tool seems to be something that one of us could code together in a weekend with a few Red Bulls :), meaning that it wouldn't be hard to "migrate" away from this migration tool if need be. Yeah, the docs are a bit funny. They get a few things right:
I wonder why they insist that migration files be plain ASCII, though: https://github.com/yandex/pgmigrate/blob/master/doc/tutorial.md#utf-8-migrations
Again, don't let yourself suffer too much over these comments; feel free to skip the non-useful ones. If copying those comments over gets too time-consuming (e.g. takes more than two runs of Dekmantel Podcast 013 - Linkwood), don't hesitate to cut corners. Or maybe there's a better way to preserve some of this information in the comments instead.
Update for anyone else reading this issue: Linas and I discussed and actually settled on trying Yandex's tool. There don't seem to be super large differences in functionality vs. Flyway, it's fully FOSS, and it wouldn't be crazy complicated to rewrite on our own if we ever needed to do so. I'm going to take a stab at implementing it and will link to the PR here when ready.
Fixed in #759
We forgot about one thing: when pgmigrate tries to apply migrations to the current production database (or any other existing dataset) for the first time, it will assume that the schema is currently at version 0, i.e. it will try applying every migration from the very beginning, including the initial schema. So, we need to tell pgmigrate that the current schema is actually at version 1 already. pgmigrate seems to store its state in a schema_version table, so running something like this against the existing database should mark it as already migrated to version 1:

```sql
-- Create the enum type used by pgmigrate's bookkeeping table
-- (guarded so that the script can be re-run safely):
DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_type WHERE typname = 'schema_version_type') THEN
        CREATE TYPE public.schema_version_type AS ENUM (
            'auto',
            'manual'
        );
    END IF;
END $$;

CREATE TABLE IF NOT EXISTS public.schema_version (
    version BIGINT NOT NULL PRIMARY KEY,
    description TEXT NOT NULL,
    type public.schema_version_type NOT NULL DEFAULT 'auto',
    installed_by TEXT NOT NULL,
    installed_on TIMESTAMP WITHOUT TIME ZONE DEFAULT now() NOT NULL
);

-- Record the initial schema as already applied:
INSERT INTO public.schema_version (
    version,
    description,
    type,
    installed_by,
    installed_on
) VALUES (
    1,
    'initial schema',
    'auto',
    'mediacloud',
    NOW()
) ON CONFLICT (version) DO NOTHING;
```

I've run this on our production database and will tell the CfA people to do the same on their end.
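One way to sanity-check that the baseline got recorded (not part of the original instructions, just a suggested follow-up query against the same database):

```sql
-- Should return a single row with version = 1 after the script above has run:
SELECT version, description, installed_by
FROM public.schema_version
ORDER BY version DESC
LIMIT 1;
```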
Excellent catch!
Hey James!
So, are you up for fixing up some past mistakes of mine?
As you might have noticed (and as is described in the docs), we use a self-made database migration system that I came up with. The essence of it is that to come up with a migration, one has to:

1. Update apps/postgresql-server/schema/mediawords.sql, which serves as our reference schema file.
2. Add a migration file apps/postgresql-server/schema/migrations/mediawords-XXX0-XXX1.sql that will get imported when the production database actually gets migrated.

This process kinda works, but there are a few issues with it:

- When postgresql-server gets built into a Docker image, only the reference schema/mediawords.sql gets imported, which is not something that we do at all when deploying. Instead, migrations get applied on postgresql-server container start, i.e. bin/postgresql_server.sh (via bin/apply_migrations.sh) starts the server on an unpublished port (IIRC 1234), tests whether the schema of the database is up to date, applies the migrations if needed, stops the server and then restarts it on a "proper" port.
- If a migration fails, the container never really starts, which is okay because we don't want a database instance with an out-of-date schema, but also not okay because we should somehow learn that the migration is broken before we even try deploying it.
- Every schema change has to be written twice: once in schema/mediawords.sql and once in a migration file in schema/migrations/.

So, a new migration system is due. Something like this would be a tremendous improvement IMHO: drop the mediawords.sql file altogether and manage the schema only via migrations. Or is there a better way to do migrations these days?
Vague to-do:

- Port the existing migrations: remove the set_database_schema_version() stuff and quite possibly fix a few syntax errors here and there (let me know if you get stuck with those, I might be able to help out).
- Compare schema/mediawords.sql with whatever schema got imported through those migrations and make sure that they more or less look like the same thing. I don't expect a perfect match between the mediawords.sql schema file and what ends up in the database after going through all of the migrations; if it's just column order or something like that that's different, then that's fine, but I think it's important to not miss a table or two.
- To do the comparison, one could import schema/mediawords.sql into one database, all of the migrations into another, pg_dump both databases and review a diff between them.

Notes, considerations and a wishlist:
- While schema/mediawords.sql has to go, it would still be tremendously useful to have a single (auto-generated) file with the currently active schema for our own reference (i.e. something that you could look at while developing things). Maybe the container image build process could import all the migrations and then do a quick schema dump to a separate file, so that later one could run

  docker run dockermediacloud/postgresql-server cat /mediawords.sql > mediawords.sql

  to extract the latest schema. Or is there some sort of a better way?
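The build-time dump could be wired up roughly like this (a sketch, not the actual Media Cloud Dockerfile; the /opt/schema path and the mediacloud database name are assumptions, while the pg_dump flags are standard):

```dockerfile
# Illustrative Dockerfile fragment: after applying all migrations to a
# throwaway server during the image build, dump the resulting schema to a
# well-known path inside the image.
RUN pg_ctl -D /var/lib/postgresql/data start && \
    for f in /opt/schema/migrations/*.sql; do psql mediacloud -f "$f"; done && \
    pg_dump --schema-only --no-owner mediacloud > /mediawords.sql && \
    pg_ctl -D /var/lib/postgresql/data stop
```

which is what would make the "docker run … cat /mediawords.sql" one-liner work.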
- One downside of pg_dump-generated reference schema files is that we'd lose the "-- comments" that we have in the schema. Some of these comments are not particularly useful (e.g. the feeds.type column is described as "-- Feed type" :)), but others are something that we'd like to retain. Maybe the most useful comments could be ported to COMMENT ON statements? Or would this be too much of a hassle?
- It would be nice to have (retain) some way of applying any given migration manually before deploying the rebuilt postgresql-server service. Sometimes migrations that work fine on an empty / testing database don't quite work as well on a production one (e.g. CREATE INDEX on a table with a billion rows), so sometimes it's useful to apply the migration manually before deploying anything. If the migration tool were able to print out the SQL that it would run on the live database, instead of insisting on running the SQL itself, that would be pretty great! If not, then oh well, we'll think of something.
- If it so happens that your migration tool of choice uses Java, consider using (and rebasing postgresql-base on) the java-base app, which is a base container image for apps that use Java.
- If you don't like it that migrations get applied at deployment time, that's up for discussion too - it just seemed to make sense to me so I did it that way, but maybe there's a better way (point in time) to apply those migrations.
- Years ago, I started doing this in a flyway branch but never finished up the work. What might be useful for you in that branch is the pre-migrations full schema file (the first "migration", so to say) and a bunch of migrations with some fixed-up SQL.
- While doing the task, please keep in mind that if something seems to take too much time (e.g. porting the existing migrations or rewriting comments to COMMENT ON statements), we can always decide to skip the too-time-consuming parts of this task: porting the existing migrations is just a nice-to-have, and the comments on various tables could potentially be fished out some other way. Don't let yourself get scope-creeped!

As always, do let me know if you have any questions and / or need any help.