spanner-changestreams-bigquery

Stream Spanner Data Changes to BigQuery

This repo uses Terraform to create the resources below, deploying an end-to-end pipeline that reads Spanner change streams and streams the observed changes to BigQuery.

  • The Cloud Spanner instance to read change streams from.
  • The Cloud Spanner database to read change streams from.
  • The Cloud Spanner instance to use for the change streams connector metadata table.
  • The Cloud Spanner database to use for the change streams connector metadata table.
  • The Cloud Spanner change stream in the database to be monitored.
  • The BigQuery dataset for change streams output.
  • The Dataflow Flex Template pipeline that streams Cloud Spanner data change records and writes them into BigQuery tables using Dataflow Runner V2.
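
To illustrate how a change stream can be declared from Terraform, here is a minimal sketch. It is not taken from the `spanner_cs_bq_dataflow` module, which may do this differently. It assumes the change stream is created through the `ddl` argument of `google_spanner_database`; the resource labels, the `Singers` table, the instance size, and the `regional-<region>` instance config are illustrative assumptions.

```hcl
# Variables mirror the names in the Inputs table of this README.
variable "project_id" { type = string }
variable "region" { type = string }
variable "spanner_instance_name_for_userdata" { type = string }
variable "spanner_database_name_for_userdata" { type = string }
variable "spanner_changestream_name" { type = string }

# Assumption: a small regional instance; sizing is a placeholder.
resource "google_spanner_instance" "userdata" {
  project          = var.project_id
  name             = var.spanner_instance_name_for_userdata
  display_name     = var.spanner_instance_name_for_userdata
  config           = "regional-${var.region}"
  processing_units = 100
}

resource "google_spanner_database" "userdata" {
  project  = var.project_id
  instance = google_spanner_instance.userdata.name
  name     = var.spanner_database_name_for_userdata

  ddl = [
    # Hypothetical table, present only so the stream has something to watch.
    "CREATE TABLE Singers (SingerId INT64 NOT NULL, Name STRING(MAX)) PRIMARY KEY (SingerId)",
    # FOR ALL watches every table and column in the database.
    "CREATE CHANGE STREAM ${var.spanner_changestream_name} FOR ALL",
  ]

  # Assumption: disabled so a test environment can be destroyed easily.
  deletion_protection = false
}
```

A stream scoped to specific tables would replace `FOR ALL` with a table list; whether that suits this pipeline depends on which changes should reach BigQuery.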

Requirements

  • A project in an org where all the resources will be created

  • A service account for Terraform to use, with the following roles:

    • Spanner
      • "roles/spanner.admin"
    • BigQuery
      • "roles/bigquery.dataOwner"
    • Dataflow
      • "roles/dataflow.admin"
    • On the bucket used to store Terraform state
      • "roles/storage.objectAdmin"
  • The user or service account that runs the Terraform code needs the following role on the Terraform service account above:

    • "roles/iam.serviceAccountTokenCreator"

Providers

| Name | Version |
|------|---------|
| google.impersonate | n/a |

Modules

| Name | Source | Version |
|------|--------|---------|
| cs-bq-env | ./spanner_cs_bq_dataflow | n/a |

Resources

| Name | Type |
|------|------|
| google_service_account_access_token.default | data source |
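
The `google.impersonate` provider alias and the `google_service_account_access_token.default` data source listed above suggest the common service account impersonation pattern. The following is a sketch of that pattern, not a copy of this repo's configuration; the scopes and the `1200s` token lifetime are assumptions.

```hcl
variable "terraform_service_account" { type = string }
variable "project_id" { type = string }
variable "region" { type = string }

# Aliased provider that runs with the caller's own credentials.
# It is used only to mint a token for the Terraform service account.
provider "google" {
  alias = "impersonate"
  scopes = [
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/userinfo.email",
  ]
}

# Requires roles/iam.serviceAccountTokenCreator on the target account.
data "google_service_account_access_token" "default" {
  provider               = google.impersonate
  target_service_account = var.terraform_service_account
  scopes                 = ["userinfo-email", "cloud-platform"]
  lifetime               = "1200s" # assumption: 20-minute token
}

# Default provider: all resources are created as the impersonated account.
provider "google" {
  project      = var.project_id
  region       = var.region
  access_token = data.google_service_account_access_token.default.access_token
}
```

With this layout the caller needs no long-lived service account key, only the token creator role described under Requirements.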

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| terraform_service_account | Service Account to be impersonated by Terraform. | string | n/a | yes |
| region | Google Cloud region. | string | n/a | yes |
| project_id | Google Project ID. | string | n/a | yes |
| spanner_instance_name_for_userdata | The Cloud Spanner instance to read change streams from. | string | n/a | yes |
| spanner_database_name_for_userdata | The Cloud Spanner database to read change streams from. | string | n/a | yes |
| spanner_instance_name_for_metadata | The Cloud Spanner instance to use for the change streams connector metadata table. | string | n/a | yes |
| spanner_database_name_for_metadata | The Cloud Spanner database to use for the change streams connector metadata table. | string | n/a | yes |
| bigquery_dataset_name | The BigQuery dataset for change streams output. | string | n/a | yes |
| dataflow_job_name | Dataflow Streaming Job Name for change streams from Spanner to BigQuery. | string | n/a | yes |
| spanner_changestream_name | The name of the Cloud Spanner change stream to read from. | string | n/a | yes |
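
Since every input is required and has no default, one way to supply them is a `terraform.tfvars` file. The example below is a sketch; every value is a hypothetical placeholder to be replaced with names from your own project.

```hcl
# terraform.tfvars (all values are hypothetical placeholders)
terraform_service_account = "terraform-sa@my-project.iam.gserviceaccount.com"
project_id                = "my-project"
region                    = "us-central1"

# Source of the change stream.
spanner_instance_name_for_userdata = "userdata-instance"
spanner_database_name_for_userdata = "userdata-db"
spanner_changestream_name          = "userdata_change_stream"

# Connector metadata table location.
spanner_instance_name_for_metadata = "metadata-instance"
spanner_database_name_for_metadata = "metadata-db"

# Destination and streaming job.
bigquery_dataset_name = "spanner_changes"
dataflow_job_name     = "spanner-cs-to-bq"
```

Terraform loads a file named `terraform.tfvars` automatically, so `terraform plan` should run without prompting for these values.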

Outputs

No outputs.