# Delta Lake PoC: Kafka Connect (MySQL Debezium CDC and S3 sink connectors), MinIO S3, and Spark + Delta Lake
We use Docker Compose to deploy the following components:
* MySQL
* Kafka
  * ZooKeeper
  * Kafka broker
* Kafka Connect with [Debezium](https://debezium.io/) and [JDBC](https://github.com/confluentinc/kafka-connect-jdbc) connectors
* PostgreSQL
* MinIO (local S3)
* Spark
  * master
  * spark-worker-1
  * spark-worker-2
* PySpark Jupyter notebook
### Usage
How to run:
```shell
docker-compose up -d
# see the Kafka Confluent Control Center at http://localhost:9021/

# Start the PostgreSQL (JDBC sink) connector
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @jdbc-sink.json

# Start the S3 (MinIO) sink connector
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @s3-minio-sink.json

# Start the MySQL (Debezium source) connector
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @source.json
```
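For reference, `source.json` is a Debezium MySQL source configuration along these lines. This is only a sketch following the standard Debezium inventory tutorial: the hostnames, credentials, server ID, and server name are assumptions, so check the actual file in the repo.

```json
{
  "name": "inventory-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

With a configuration like this, Debezium emits one change-event topic per captured table, e.g. `dbserver1.inventory.customers`.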
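Similarly, `s3-minio-sink.json` would use the Confluent S3 sink connector pointed at MinIO through the `store.url` override; again a sketch, with the topic, bucket, and format as assumptions:

```json
{
  "name": "s3-minio-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.inventory.customers",
    "s3.bucket.name": "kafka-connect",
    "s3.region": "us-east-1",
    "store.url": "http://minio:9000",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1"
  }
}
```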
Check the contents of the MySQL database:
```shell
docker-compose exec mysql bash -c 'mysql -u $MYSQL_USER -p$MYSQL_PASSWORD inventory -e "select * from customers"'
+------+------------+-----------+-----------------------+
| id   | first_name | last_name | email                 |
+------+------------+-----------+-----------------------+
| 1001 | Sally      | Thomas    | [email protected] |
| 1002 | George     | Bailey    | [email protected]    |
| 1003 | Edward     | Walker    | [email protected]         |
| 1004 | Anne       | Kretchmar | [email protected]    |
+------+------------+-----------+-----------------------+
```
Verify that the PostgreSQL database has the same content:
```shell
docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
 Thomas    | 1001 | Sally      | [email protected]
 Bailey    | 1002 | George     | [email protected]
 Walker    | 1003 | Edward     | [email protected]
 Kretchmar | 1004 | Anne       | [email protected]
(4 rows)
```
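The mirroring works because the JDBC sink consumes Debezium's change events: a typical `jdbc-sink.json` flattens each event with Debezium's `ExtractNewRecordState` SMT and writes it in upsert mode keyed on `id`, with `delete.enabled` so tombstones become deletes. That is why the inserts, updates, and deletes below all propagate. The sketch below follows Debezium's unwrap-SMT example; the topic name and connection URL are assumptions:

```json
{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "customers",
    "connection.url": "jdbc:postgresql://postgres:5432/inventory?user=postgresuser&password=postgrespw",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
    "auto.create": "true",
    "insert.mode": "upsert",
    "delete.enabled": "true",
    "pk.mode": "record_key",
    "pk.fields": "id"
  }
}
```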
Insert a new record into MySQL:
```shell
docker-compose exec mysql bash -c 'mysql -u $MYSQL_USER -p$MYSQL_PASSWORD inventory'
mysql> insert into customers values(default, 'John', 'Doe', '[email protected]');
Query OK, 1 row affected (0.02 sec)
```
Verify that PostgreSQL contains the new record:
```shell
docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
...
 Doe       | 1005 | John       | [email protected]
(5 rows)
```
Update a record in MySQL:
```shell
mysql> update customers set first_name='Jane', last_name='changed' where last_name='Thomas';
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0
```
Verify that the record is updated in PostgreSQL:
```shell
docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
...
 changed   | 1001 | Jane       | [email protected]
(5 rows)
```
Delete a record in MySQL:
```shell
mysql> delete from customers where email='[email protected]';
Query OK, 1 row affected (0.01 sec)
```
Verify that the record is deleted in PostgreSQL:
```shell
docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
...
(4 rows)
```
As you can see, 'John Doe' is no longer a customer.
Get the Jupyter notebook token:
```shell
docker-compose exec pyspark bash -c "jupyter server list"
```
Open http://localhost:9999 and open the `work/write_read_to_minio.ipynb` notebook.
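The notebook writes a Delta table to MinIO and reads it back. A minimal sketch of that flow is below; the MinIO endpoint, credentials, bucket name, and the availability of the delta-spark and hadoop-aws jars in the notebook image are assumptions, so match them to the compose file:

```python
from pyspark.sql import SparkSession

# Minimal sketch of the notebook's flow. The MinIO endpoint, credentials,
# and bucket name below are assumptions -- match them to docker-compose.yml.
# Assumes the delta-spark and hadoop-aws jars are already on the classpath.
spark = (
    SparkSession.builder.appName("delta-minio-poc")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minio")
    .config("spark.hadoop.fs.s3a.secret.key", "minio123")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Write a small DataFrame to MinIO as a Delta table, then read it back.
df = spark.createDataFrame(
    [(1001, "Sally", "Thomas")], ["id", "first_name", "last_name"]
)
df.write.format("delta").mode("overwrite").save("s3a://delta-lake/customers")
spark.read.format("delta").load("s3a://delta-lake/customers").show()
```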
See http://localhost:8080/ for the Spark master UI, which shows the workers and the job DAG visualization.
Shut down the application:
```shell
docker-compose down
```