Pub/Sub Direct to BigQuery Lab
Note the project ID.
We will need this for the rest of the lab.
This is needed for the networking setup.
Go to Cloud IAM and through the UI, grant yourself security admin role.
This is needed to set project level policies
In the UI, set context to organization level (instead of project)
Go to Cloud IAM and through the UI, grant yourself Organization Policy Administrator at an Organization level.
Don't forget to set the project back to the project you created in Step 1 above in the UI.
The following services and resources will be created via Terraform scripts:
- VPC, Subnetwork and NAT rules
- IAM permissions for user principals and Google Managed default service accounts
- CloudSQL mysql instance; test database and peoples table
- Pub/Sub topic and BQ Subscription for Debezium CDC Data Updates
- Pub/Sub topic and BQ Subscription for Debezium CDC DDL Updates
- BigQuery Dataset and peoples table
- Composer Environment with Pre-Defined Environment Variables
- Terraform for automation
- Cloud Shell for executing Terraform
This section covers creating the environment via Terraform from Cloud Shell.
- Launch cloud shell
- Clone this git repo
- Provision foundational resources such as Google APIs and Organization Policies
- Provision the GCP data Analytics services and their dependencies for the lab
Instructions for launching and using cloud shell are available here.
cd ~
git clone https://github.com/j-f-oleary-bigdata/pubsub2bq
cd ~/pubsub2bq/
Browse and familiarize yourself with the layout and optionally, review the scripts for an understanding of the constructs as well as how dependencies are managed.
- Define variables for use with Terraform
- Initialize Terraform
- Run a Terraform plan & study it
- Apply the Terraform to create the environment
- Validate the environment created
Modify the below as appropriate for your deployment..e.g. region, zone etc. Be sure to use the right case for GCP region & zone.
Make the corrections as needed below and then cut and paste the text into the Cloud Shell Session.
PROJECT_ID=`gcloud config list --format "value(core.project)" 2>/dev/null`
PROJECT_NBR=`gcloud projects describe $PROJECT_ID | grep projectNumber | cut -d':' -f2 | tr -d "'" | xargs`
echo "PROJECT_ID=$PROJECT_ID"
Needs to run in cloud shell from ~/pubsub2bq/terraform
cd ~/pubsub2bq/terraform
terraform init
Needs to run in cloud shell from ~/pubsub2bq/terraform
terraform apply \
-var="project_id=${PROJECT_ID}" \
-var="project_nbr=${PROJECT_NBR}" \
--auto-approve
Time taken to complete: <10 minutes
Needs to run in cloud shell from ~/pubsub2bq/terraform
nohup cloud-sql-proxy --port 3306 $PROJECT_ID:us-central1:mysql-pubsub2bq > sqlproxy.log 2>&1 &
Execute the command:
admin_@cloudshell:~ (pubsub2bqv2)$ tail -f sqlproxy.log
Output should look similar to this:
2023/11/26 23:28:13 Authorizing with Application Default Credentials
2023/11/26 23:28:13 [pubsub2bqv2:us-central1:mysql-pubsub2bq] Listening on 127.0.0.1:3306
2023/11/26 23:28:13 The proxy has started successfully and is ready for new connections!
Extract the mysql password with the following command
MYPASS=$(sed -n "/^debezium.source.database.password=/p" application.properties | sed "s/^debezium.source.database.password=//")
echo $MYPASS
N.B.This is for demo purposes only; typically you wouldn't store the password in clear text
Run the following command:
mysql -u pubsub2bq -p${MYPASS} --host 127.0.0.1 < pubsub2bq.sql
Run the following command:
mysql -u pubsub2bq -p${MYPASS} --host 127.0.0.1
then from the sql prompt run the following commmads: ``` select * from test.people; ```
Output should look similar to this: ```
+-----+------------+-------------+---------------+---------+-----------+---------+
| id | first_name | last_name | email | zipcode | city | country |
+-----+------------+-------------+---------------+---------+-----------+---------+
| 990 | Tom | BagofDonuts | [email protected] | 2174 | Arlington | USA |
| 991 | Phil | BagofDonuts | [email protected] | 2174 | Arlington | USA |
| 992 | Ted | BagofDonuts | [email protected] | 2174 | Arlington | USA |
+-----+------------+-------------+---------------+---------+-----------+---------+
3 rows in set (0.03 sec)
```
The command below needs to run in cloud shell from ~/pubsub2bq/terraform/debezium-server
cd ~/pubsub2bq/terraform/~/pubsub2bq/terraform/debezium-server
./run.sh
Stop the debezium server by hitting ctrl-c in the cloud-shell terminal window
Run the query below from the cloud shell:
bq query "select * from ${PROJECT_ID}.pubsub2bq_dataset.people"
The results should look similar to this: ```
+-----+------------+-------------+---------------+---------+-----------+---------+-----------+
| id | first_name | last_name | email | zipcode | city | country | __deleted |
+-----+------------+-------------+---------------+---------+-----------+---------+-----------+
| 990 | Tom | BagofDonuts | [email protected] | 2174 | Arlington | USA | false |
| 991 | Phil | BagofDonuts | [email protected] | 2174 | Arlington | USA | false |
| 992 | Ted | BagofDonuts | [email protected] | 2174 | Arlington | USA | false |
```
Run the following command:
mysql -u pubsub2bq -p${MYPASS} --host 127.0.0.1
then from the sql prompt run the following commmads:
``` use test;
alter table people ADD column phone VARCHAR(255);
INSERT INTO people (id, first_name, last_name,email,zipcode,city,country,phone)
values (994, "Tim", "BagofDonuts","[email protected]", 2174 ,"Arlington","USA","508-555-1212");
select * from people;
```
The results should look similar to the following: [Notice the new 'phone' column] ```
+-----+------------+-------------+---------------+---------+-----------+---------+--------------+
| id | first_name | last_name | email | zipcode | city | country | phone |
+-----+------------+-------------+---------------+---------+-----------+---------+--------------+
| 990 | Tom | BagofDonuts | [email protected] | 2174 | Arlington | USA | NULL |
| 991 | Phil | BagofDonuts | [email protected] | 2174 | Arlington | USA | NULL |
| 992 | Ted | BagofDonuts | [email protected] | 2174 | Arlington | USA | NULL |
| 994 | Tim | BagofDonuts | [email protected] | 2174 | Arlington | USA | 508-555-1212 |
+-----+------------+-------------+---------------+---------+-----------+---------+--------------+
```
The command below needs to run in cloud shell from ~/pubsub2bq/terraform/debezium-server
cd ~/pubsub2bq/terraform/~/pubsub2bq/terraform/debezium-server
./run.sh
Stop the debezium server by hitting ctrl-c in the cloud-shell terminal window
Run the query below from the cloud shell:
bq query "select * from ${PROJECT_ID}.pubsub2bq_dataset.people"
The results should look similar to this:
[Notice there is not 'phone' column]
```
+-----+------------+-------------+---------------+---------+-----------+---------+-----------+
| id | first_name | last_name | email | zipcode | city | country | __deleted |
+-----+------------+-------------+---------------+---------+-----------+---------+-----------+
| 990 | Tom | BagofDonuts | [email protected] | 2174 | Arlington | USA | false |
| 991 | Phil | BagofDonuts | [email protected] | 2174 | Arlington | USA | false |
| 992 | Ted | BagofDonuts | [email protected] | 2174 | Arlington | USA | false |
| 994 | Tim | BagofDonuts | [email protected] | 2174 | Arlington | USA | false |
```
Run the query below from the cloud shell:
gcloud pubsub subscriptions pull pubsub2bq-nobq-sub
The results should look similar to this: [Notice there is a 'phone' column but it is dropped from BigQuery because the 'phone' column is not in the topic schema] ``` DATA: {"id":994,"first_name":"Tim","last_name":"BagofDonuts","email":"[email protected]","zipcode":2174,"city":"Arlington","country":"USA","phone":"508-555-1212","__deleted":"false"} MESSAGE_ID: 9760094994850475 ORDERING_KEY: {"id":994} ATTRIBUTES: googclient_schemaencoding=JSON googclient_schemaname=projects/pubsub2bqv2/schemas/pubsub2bq-schema googclient_schemarevisionid=3afd3cc8 ```
In the BigQuery UI, edit the people table and add an 'age' column (as integer) as shown below:
Run the following command:
mysql -u pubsub2bq -p${MYPASS} --host 127.0.0.1
then from the sql prompt run the following commmads: ``` use test; INSERT INTO people (id, first_name, last_name,email,zipcode,city,country,phone) values (995, "Kim", "BagofDonuts","[email protected]", 2174 ,"Arlington","USA","508-555-1212"); select * from people; ```
The results should look similar to the following: [Notice the new row of data] ```
+-----+------------+-------------+---------------+---------+-----------+---------+--------------+
| id | first_name | last_name | email | zipcode | city | country | phone |
+-----+------------+-------------+---------------+---------+-----------+---------+--------------+
| 990 | Tom | BagofDonuts | [email protected] | 2174 | Arlington | USA | NULL |
| 991 | Phil | BagofDonuts | [email protected] | 2174 | Arlington | USA | NULL |
| 992 | Ted | BagofDonuts | [email protected] | 2174 | Arlington | USA | NULL |
| 994 | Tim | BagofDonuts | [email protected] | 2174 | Arlington | USA | 508-555-1212 |
| 995 | Kim | BagofDonuts | [email protected] | 2174 | Arlington | USA | 508-555-1212 |
+-----+------------+-------------+---------------+---------+-----------+---------+--------------+
5 rows in set (0.03 sec)
```
The command below needs to run in cloud shell from ~/pubsub2bq/terraform/debezium-server
cd ~/pubsub2bq/terraform/~/pubsub2bq/terraform/debezium-server
./run.sh
Stop the debezium server by hitting ctrl-c in the cloud-shell terminal window
Run the query below from the cloud shell:
bq query "select * from ${PROJECT_ID}.pubsub2bq_dataset.people"
The results should look similar to this:
[Notice there is a new row for 'id' of '995'
```
+-----+------------+-------------+---------------+---------+-----------+---------+-----------+------+
| id | first_name | last_name | email | zipcode | city | country | __deleted | age |
+-----+------------+-------------+---------------+---------+-----------+---------+-----------+------+
| 995 | Kim | BagofDonuts | [email protected] | 2174 | Arlington | USA | false | NULL |
| 990 | Tom | BagofDonuts | [email protected] | 2174 | Arlington | USA | false | NULL |
| 991 | Phil | BagofDonuts | [email protected] | 2174 | Arlington | USA | false | NULL |
| 992 | Ted | BagofDonuts | [email protected] | 2174 | Arlington | USA | false | NULL |
| 994 | Tim | BagofDonuts | [email protected] | 2174 | Arlington | USA | false | NULL |
+-----+------------+-------------+---------------+---------+-----------+---------+-----------+------+
```
Run the query below from the cloud shell:
bq rm --table "pubsub2bq_dataset.people"
Run the query below from the cloud shell from ~/pubsub2bq/terraform/:
bq mk \
--table \
--expiration 3600 \
pubsub2bq_dataset.people \
bq_schemav2.json
Run the following command:
mysql -u pubsub2bq -p${MYPASS} --host 127.0.0.1
then from the sql prompt run the following commmads: ``` use test; INSERT INTO people (id, first_name, last_name,email,zipcode,city,country,phone) values (996, "Sam", "BagofDonuts","[email protected]", 2174 ,"Arlington","USA","508-555-1212"); select * from people; ```
The results should look similar to the following: [Notice the new row of data] ```
+-----+------------+-------------+---------------+---------+-----------+---------+--------------+
| id | first_name | last_name | email | zipcode | city | country | phone |
+-----+------------+-------------+---------------+---------+-----------+---------+--------------+
| 990 | Tom | BagofDonuts | [email protected] | 2174 | Arlington | USA | NULL |
| 991 | Phil | BagofDonuts | [email protected] | 2174 | Arlington | USA | NULL |
| 992 | Ted | BagofDonuts | [email protected] | 2174 | Arlington | USA | NULL |
| 994 | Tim | BagofDonuts | [email protected] | 2174 | Arlington | USA | 508-555-1212 |
| 995 | Kim | BagofDonuts | [email protected] | 2174 | Arlington | USA | 508-555-1212 |
| 996 | Sam | BagofDonuts | [email protected] | 2174 | Arlington | USA | 508-555-1212 |
+-----+------------+-------------+---------------+---------+-----------+---------+--------------+
6 rows in set (0.03 sec)
```
The command below needs to run in cloud shell from ~/pubsub2bq/terraform/debezium-server
cd ~/pubsub2bq/terraform/~/pubsub2bq/terraform/debezium-server
./run.sh
Stop the debezium server by hitting ctrl-c in the cloud-shell terminal window
Run the query below from the cloud shell:
bq query "select * from ${PROJECT_ID}.pubsub2bq_dataset.people"
You get zero results because there was a schema mismatch as shown below: