A Go-based solution designed to load various open data formats stored in Google Cloud Storage (GCS) and BigQuery into a PostgreSQL database.

TFMV/GCS2Postgres

GCS2Postgres

Overview

GCS2Postgres is a Go tool that loads open data formats stored in Google Cloud Storage (GCS), as well as existing BigQuery tables, into a PostgreSQL database. It uses Google BigQuery to extract and transform the data before writing it to PostgreSQL.

Features

  • Supports multiple file formats: CSV, JSON, Parquet, Avro, Iceberg.
  • Utilizes Google BigQuery to create external tables for data processing.
  • Implements connection pooling with pgx/v5 for efficient database operations.
  • Configurable via YAML file for flexibility.
  • Retrieves PostgreSQL credentials securely from Google Secret Manager.
  • Concurrent data processing to maximize performance.
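To make the external-table feature more concrete, here is a minimal sketch of how a BigQuery `CREATE EXTERNAL TABLE` statement could be derived from a GCS object name. The helper `externalTableDDL` and the extension-to-format map are hypothetical and not taken from this repository. Iceberg is left out because BigQuery expects a metadata file URI for it rather than a data file. The names passed in are assumed to be trusted configuration values.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// formatByExt maps a file extension to a BigQuery external table format.
// The mapping is an assumption for illustration; Iceberg is intentionally omitted.
var formatByExt = map[string]string{
	".csv":     "CSV",
	".json":    "NEWLINE_DELIMITED_JSON",
	".parquet": "PARQUET",
	".avro":    "AVRO",
}

// externalTableDDL is a hypothetical helper that builds the DDL for an
// external table backed by a single object in a GCS bucket.
func externalTableDDL(dataset, table, bucket, object string) (string, error) {
	format, ok := formatByExt[strings.ToLower(path.Ext(object))]
	if !ok {
		return "", fmt.Errorf("unsupported file extension in %q", object)
	}
	return fmt.Sprintf(
		"CREATE OR REPLACE EXTERNAL TABLE `%s.%s` OPTIONS (format = '%s', uris = ['gs://%s/%s'])",
		dataset, table, format, bucket, object,
	), nil
}

func main() {
	ddl, err := externalTableDDL("your_bigquery_dataset", "regions", "your_bucket_name", "regions.parquet")
	if err != nil {
		panic(err)
	}
	fmt.Println(ddl)
}
```

The resulting statement could then be submitted as a regular BigQuery query job, after which the external table can be read with plain SQL.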

Configuration

The configuration is managed through a config.yaml file. Below is an example configuration:

postgres:
  host: "localhost"
  port: 5432
  user: "postgres"
  dbname: "tfmv"
  sslmode: "disable"
  secret_name: "projects/your_project_number/secrets/your_secret_name/versions/latest"

gcs:
  bucket_name: "your_bucket_name"
  project_id: "your_gcp_project_id"
  dataset: "your_bigquery_dataset"
  files:
    - name: "regions.parquet"
      table: "regions"
    - name: "cities.avro"
      table: "cities"
  concurrent_jobs: 3

bq:
  project_id: "your_gcp_project_id"
  dataset: "your_bigquery_dataset"
  tables:
    - name: "nation"
      table: "nation"
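For readers who want to see how this file could map onto Go types, the sketch below mirrors the keys shown above. The struct and method names are hypothetical and may differ from the project's actual code. The `yaml` tags follow the convention used by common Go YAML packages; which package the project uses is not stated here. `ConnString` assumes the secret referenced by `secret_name` holds the database password, which is also an assumption.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// PostgresConfig mirrors the "postgres" section of config.yaml.
type PostgresConfig struct {
	Host       string `yaml:"host"`
	Port       int    `yaml:"port"`
	User       string `yaml:"user"`
	DBName     string `yaml:"dbname"`
	SSLMode    string `yaml:"sslmode"`
	SecretName string `yaml:"secret_name"`
}

// Mapping pairs a source (a GCS file or a BigQuery table) with a target table.
type Mapping struct {
	Name  string `yaml:"name"`
	Table string `yaml:"table"`
}

// GCSConfig mirrors the "gcs" section.
type GCSConfig struct {
	BucketName     string    `yaml:"bucket_name"`
	ProjectID      string    `yaml:"project_id"`
	Dataset        string    `yaml:"dataset"`
	Files          []Mapping `yaml:"files"`
	ConcurrentJobs int       `yaml:"concurrent_jobs"`
}

// BQConfig mirrors the "bq" section.
type BQConfig struct {
	ProjectID string    `yaml:"project_id"`
	Dataset   string    `yaml:"dataset"`
	Tables    []Mapping `yaml:"tables"`
}

// Config is the root of config.yaml.
type Config struct {
	Postgres PostgresConfig `yaml:"postgres"`
	GCS      GCSConfig      `yaml:"gcs"`
	BQ       BQConfig       `yaml:"bq"`
}

// ConnString builds a PostgreSQL connection URL. The password is passed in
// separately because it is assumed to come from Secret Manager, not the file.
func (p PostgresConfig) ConnString(password string) string {
	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(p.User, password),
		Host:     net.JoinHostPort(p.Host, strconv.Itoa(p.Port)),
		Path:     "/" + p.DBName,
		RawQuery: url.Values{"sslmode": {p.SSLMode}}.Encode(),
	}
	return u.String()
}

func main() {
	pg := PostgresConfig{Host: "localhost", Port: 5432, User: "postgres", DBName: "tfmv", SSLMode: "disable"}
	fmt.Println(pg.ConnString("example-password"))
}
```

Building the URL with `net/url` rather than string concatenation keeps special characters in the user name or password correctly escaped.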

Usage

  1. Set up the YAML configuration file with the necessary details, including the PostgreSQL connection settings and the GCS and BigQuery sources.
  2. Ensure the necessary permissions are granted for accessing GCS, BigQuery, and Google Secret Manager.
  3. Build the project using the Go build tool, then run the resulting binary:
go build -o GCS2Postgres
./GCS2Postgres

Example

To load data from multiple files in a GCS bucket into PostgreSQL, ensure your config.yaml is properly set up and run the application. The application will create external tables in BigQuery for each file, fetch the data, and load it into the specified PostgreSQL database.
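The per-file flow described above, combined with the `concurrent_jobs` setting, can be pictured as a bounded worker pattern. The following is a minimal sketch using only the standard library. `job`, `runBounded`, and the placeholder `process` callback are hypothetical names, and the real application may organize its concurrency differently.

```go
package main

import (
	"fmt"
	"sync"
)

// job describes one source file and its target PostgreSQL table.
type job struct{ File, Table string }

// runBounded calls process for every job while keeping at most limit calls
// in flight. It returns the errors encountered, keyed by target table.
func runBounded(jobs []job, limit int, process func(job) error) map[string]error {
	if limit < 1 {
		limit = 1 // fall back to sequential processing
	}
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var mu sync.Mutex
	failed := make(map[string]error)

	for _, j := range jobs {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit jobs are already running
		go func(j job) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := process(j); err != nil {
				mu.Lock()
				failed[j.Table] = err
				mu.Unlock()
			}
		}(j)
	}
	wg.Wait()
	return failed
}

func main() {
	jobs := []job{{"regions.parquet", "regions"}, {"cities.avro", "cities"}}
	failed := runBounded(jobs, 3, func(j job) error {
		// Placeholder for: create external table, query it, load rows into PostgreSQL.
		fmt.Printf("loading %s into %s\n", j.File, j.Table)
		return nil
	})
	fmt.Println("failed tables:", len(failed))
}
```

Collecting errors per table instead of aborting on the first failure lets the remaining files finish loading. Whether the project behaves this way is not specified in this README.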

Contributing

Contributions are welcome! Please fork this repository, make your changes, and submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
