GCS2Postgres is a Go-based solution designed to facilitate the loading of various open data formats stored in Google Cloud Storage (GCS) and BigQuery into a PostgreSQL database. This solution leverages Google BigQuery for data extraction and transformation, providing a seamless and scalable data pipeline.
- Supports multiple file formats: CSV, JSON, Parquet, Avro, Iceberg.
- Utilizes Google BigQuery to create external tables for data processing.
- Implements connection pooling with
pgx/v5
for efficient database operations. - Configurable via YAML file for flexibility.
- Retrieves PostgreSQL credentials securely from Google Secret Manager.
- Concurrent data processing to maximize performance.
The configuration is managed through a config.yaml
file. Below is an example configuration:
postgres:
host: "localhost"
port: 5432
user: "postgres"
dbname: "tfmv"
sslmode: "disable"
secret_name: "projects/your_project_number/secrets/your_secret_name/versions/latest"
gcs:
bucket_name: "your_bucket_name"
project_id: "your_gcp_project_id"
dataset: "your_bigquery_dataset"
files:
- name: "regions.parquet"
table: "regions"
- name: "cities.avro"
table: "cities"
concurrent_jobs: 3
bq:
project_id: "your_gcp_project_id"
dataset: "your_bigquery_dataset"
tables:
- name: "nation"
table: "nation"
- Set up the YAML configuration file with the necessary details, including PostgreSQL connection settings and GCS details.
- Ensure the necessary permissions are granted for accessing GCS, BigQuery, and Google Secret Manager.
- Build the project using the Go build tool:
go build -o GCS2Postgres
./GCS2Postgres
To load data from multiple files in a GCS bucket into PostgreSQL, ensure your config.yaml is properly set up and run the application. The application will create external tables in BigQuery for each file, fetch the data, and load it into the specified PostgreSQL database.
Contributions are welcome! Please fork this repository, make your changes, and submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.