Skip to content

🧰 Postgres upgrade data integrity script

License

Notifications You must be signed in to change notification settings

processout/pg-upgrade-data-check

Repository files navigation

pg-upgrade-data-check

Installing

go install github.com/processout/pg-upgrade-data-check@latest or use one of the binaries available in https://github.com/processout/pg-upgrade-data-check/releases

Description

pg-upgrade-data-check will compare Postgres table contents of two databases (source and target) to ensure there are no differences in data.

It will detect discrepancies like;

  • Row missing in the target database table
  • Extra row in the target database table
  • Difference in any column value for the same id

Note: It's designed to work purely on tables content (it will not check indexes, etc). Note 2: It will only work for tables that have numeric id that is incremented for each row.

Configuration

All configurations related to tables are contained in the examples/config.yaml file. You must specify the table that needs to be checked in .tables and then configure a function for collect and compare:

  • collect: is executed in first and second stage in order to identify range of id's to compare, usually the max of the current table.
  • compare: should return two values: id and hash; hash is used to compare rows of the same id between two databases.

A full example would be:

 tables:
   users:
     collect: select max(id) from %s limit 1;
     compare: select id, md5(%s::TEXT) as hash from %s where id>=%d and id<=%d order by id asc;

Each command will require the database DSN, which might be indicated by --source-url, --target-url or both.

Usage

As pg-upgrade-data-check has been designed primarily for verifying effects of cloning & replication, it operates in three stages: before, after and compare. Each of these are commands in the cli itself, e.g: pg-upgrade-data-check before --source-url postgres://postgres@localhost:6001/postgres

Stage 1: before

pg-upgrade-data-check before --source-url postgres://postgres@localhost:6001/postgres --config config.yaml

For each configured table find the starting id. This query is executed on the source database, and output is saved to data/before_ids.json file. For safety reasons it is not possible to execute this step if the destination file already exist. Delete that file first if you don't care about previous result.

When using pg-upgrade-data-check during Postgres upgrade this stage should be executed before you create a replication slot on the source database.

Stage 2: after

pg-upgrade-data-check after --target-url postgres://postgres@localhost:6002/postgres --config config.yaml

For each configured table find the ending id. Again, this is executed on the source database, with output written this time to data/after_ids.json file. It has the same safety mechanism as previous stage.

When using pg-upgrade-data-check during Postgres upgrade this stage should be executed after target database has been restored from snapshot and logical replication caught up to source.

Stage 3: compare

pg-upgrade-data-check compare --source-url postgres://postgres@localhost:6001/postgres --target-url postgres://postgres@localhost:6002/postgres --config config.yaml

Load the contents of after_ids.json (generated by the previous stages) and for all tables from configuration compare rows between startId and stopId, detecting additionally any missing or extra rows on the target db.