
Allow for passthrough of extra args to pg_dump #26

Merged (1 commit), Jul 8, 2022


alexhall
Contributor

I started off looking for a way to restrict the anonymized database dump to a particular schema, which pg_dump already supports via the -n flag. While we could conceivably define an equivalent pg-anonymizer flag and pass it through, pg_dump supports a rich variety of CLI flags that control which objects get dumped, and we wouldn't want to define a separate flag for each one that somebody might want to use.

A more flexible approach, implemented here, is to replace the single named positional arg (a database connection string) with variable-length arguments that we then pass through to pg_dump, after sanitizing them to ensure they don't break pg-anonymizer by altering the dump format or by writing to a file rather than stdout.
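The description doesn't show how the sanitization step works. As a minimal sketch, assuming the options to reject are pg_dump's -f/--file (write to a file) and -F/--format (change the dump format), a check could look like the following. The function name sanitizePgDumpArgs is hypothetical and this is not the merged implementation.

```typescript
// Hypothetical sanitizer for arguments forwarded to pg_dump.
// Assumption: pg-anonymizer needs a plain-text SQL dump on pg_dump's stdout,
// so -f/--file (output file) and -F/--format (dump format) are rejected.
// Bundled short options (e.g. "-cf out.sql") are not detected by this sketch.
const BLOCKED_SHORT = ["-f", "-F"];
const BLOCKED_LONG = ["--file", "--format"];

function sanitizePgDumpArgs(args: string[]): string[] {
  for (const arg of args) {
    // Short options may carry their value separately ("-F c") or attached ("-Fc").
    const shortHit = BLOCKED_SHORT.some((opt) => arg.startsWith(opt));
    // Long options may carry their value separately or after "=" ("--format=c").
    const longHit = BLOCKED_LONG.some(
      (opt) => arg === opt || arg.startsWith(`${opt}=`),
    );
    if (shortHit || longHit) {
      throw new Error(`pg_dump option not supported by pg-anonymizer: ${arg}`);
    }
  }
  return args;
}

console.log(sanitizePgDumpArgs(["-n", "myschema", "mydb"])); // [ '-n', 'myschema', 'mydb' ]
try {
  sanitizePgDumpArgs(["-Fc", "mydb"]);
} catch (err) {
  console.log((err as Error).message); // pg_dump option not supported by pg-anonymizer: -Fc
}
```

Rejecting the options, rather than silently dropping them, tells the user their flag was not honored. The trade-off is that it also rejects -Fp, which would be harmless since plain text is already pg_dump's default format.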

By setting strict = false, we instruct the OCLIF arg parser to gather up any unrecognized flags as positional arguments, which are then exposed in argv. You can also force the issue by using the special -- argument to separate flags from arguments, à la GNU getopt. So the following commands are equivalent, and both result in passing -n myschema mydb to pg_dump:

$ pg-anonymizer -n myschema -l first_name:faker.name.firstName mydb
$ pg-anonymizer -l first_name:faker.name.firstName -- -n myschema mydb
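The non-strict parsing described above can be modeled with a small stand-alone function to see why these two commands end up equivalent. This is a simplified, hypothetical sketch, not oclif's actual parser: it assumes pg-anonymizer's only own flags are -l and -o (the two used in this thread), each followed by a separate value, and it forwards everything else to pg_dump, treating -- as a pass-through separator.

```typescript
// Hypothetical model of the non-strict argument handling (NOT oclif's parser).
// Assumption: -l and -o are pg-anonymizer's own flags, each taking a value.
const OWN_FLAGS = new Set(["-l", "-o"]);

interface SplitArgs {
  ownFlags: Map<string, string>; // consumed by pg-anonymizer itself
  pgDumpArgs: string[]; // forwarded to pg_dump in their original order
}

function splitArgs(tokens: string[]): SplitArgs {
  const ownFlags = new Map<string, string>();
  const pgDumpArgs: string[] = [];
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (token === "--") {
      // GNU getopt-style separator: everything after it is passed through as-is.
      pgDumpArgs.push(...tokens.slice(i + 1));
      break;
    }
    if (OWN_FLAGS.has(token) && i + 1 < tokens.length) {
      // Known flag: consume it together with its value.
      ownFlags.set(token, tokens[i + 1]);
      i++;
      continue;
    }
    // Unrecognized flags and positional arguments are gathered for pg_dump.
    pgDumpArgs.push(token);
  }
  return { ownFlags, pgDumpArgs };
}

// The two equivalent commands above:
const a = splitArgs(["-n", "myschema", "-l", "first_name:faker.name.firstName", "mydb"]);
const b = splitArgs(["-l", "first_name:faker.name.firstName", "--", "-n", "myschema", "mydb"]);
console.log(a.pgDumpArgs); // [ '-n', 'myschema', 'mydb' ]
console.log(b.pgDumpArgs); // [ '-n', 'myschema', 'mydb' ]
```

Under this model, the existing default invocation (a connection string followed by -o dump.sql) forwards only the connection string to pg_dump and keeps -o for pg-anonymizer, which is the backwards-compatibility property the PR relies on.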

I believe this change to be fully backwards compatible with the existing CLI-parsing behavior, but in the absence of a test suite I'm not sure what the best way of demonstrating that might be.

@alexhall
Contributor Author

This is a potential fix for #23

@alexhall
Contributor Author

Hi! Just wanted to see if there are any questions I can answer or anything else I can do to help get this PR reviewed.

@rap2hpoutre
Owner

Hi @alexhall! Thank you for your contribution and sorry for the delay. Your addition seems legit and I guess it's a good move.

> I believe this change to be fully backwards compatible with the existing CLI-parsing behavior, but in the absence of a test suite I'm not sure what the best way of demonstrating that might be.

Oops, sorry about that!

The only thing I want to be sure is: will this command (the default command) still work? (TBH I don't remember how to test a CLI without publishing it so I prefer asking)

npx pg-anonymizer postgres://user:secret@localhost:1234/mydb -o dump.sql

rap2hpoutre self-requested a review on May 4, 2022 07:21
@alexhall
Contributor Author

alexhall commented May 6, 2022

Hi @rap2hpoutre, thanks for getting back to me!

> The only thing I want to be sure is: will this command (the default command) still work? (TBH I don't remember how to test a CLI without publishing it so I prefer asking)
>
> npx pg-anonymizer postgres://user:secret@localhost:1234/mydb -o dump.sql

Yes, I can confirm that this still works. Here's a simple test run to demonstrate:

~/src/pg-anonymizer (add-pgdump-args) > createdb testdb
~/src/pg-anonymizer (add-pgdump-args) > psql testdb -c "create table people(name text, email text); insert into people (name, email) values ('John Doe', '[email protected]');"
INSERT 0 1
~/src/pg-anonymizer (add-pgdump-args) > bin/run postgres://alex:**********@localhost/testdb -o dump.sql
Launching pg_dump
Command pg_dump started, running anonymization.
Output file: dump.sql
Anonymizing table public.people
Columns to anonymize: name, email
~/src/pg-anonymizer (add-pgdump-args) > cat dump.sql 
--
-- PostgreSQL database dump
--

-- Dumped from database version 12.10 (Ubuntu 12.10-1.pgdg20.04+1)
-- Dumped by pg_dump version 12.10 (Ubuntu 12.10-1.pgdg20.04+1)

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

SET default_tablespace = '';

SET default_table_access_method = heap;

--
-- Name: people; Type: TABLE; Schema: public; Owner: alex
--

CREATE TABLE public.people (
    name text,
    email text
);


ALTER TABLE public.people OWNER TO alex;

--
-- Data for Name: people; Type: TABLE DATA; Schema: public; Owner: alex
--

COPY public.people (name, email) FROM stdin;
Samuel Bogisich	[email protected]
\.


--
-- PostgreSQL database dump complete
--

@GeekOnCoffee

This would be a huge win for our use!

rap2hpoutre merged commit 776b5be into rap2hpoutre:main on Jul 8, 2022
github-actions bot pushed a commit that referenced this pull request Jul 8, 2022
# [0.6.0](v0.5.1...v0.6.0) (2022-07-08)

### Features

* allow for passthrough of extra args to pg_dump ([#26](#26)) ([776b5be](776b5be))
@rap2hpoutre
Owner

Thank you for your contribution (and sorry for the late answer!)

@github-actions

github-actions bot commented Jul 8, 2022

🎉 This PR is included in version 0.6.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
