This repo serves as an example of how you can run Kafka Connect or KSQL on Heroku. Specifically, it demonstrates how fundamental Kafka components integrate well with the Heroku Runtime.
Kafka Connect and KSQL, though not yet officially supported by Heroku, run well on Heroku and work under both single- and multi-tenant Kafka plans.
This setup uses the Heroku Runtime as a horizontally scalable processing layer. That is, Kafka's consumer group functionality combined with Heroku's dynos lets Kafka Connect and KSQL scale in and out, and worker instances are rebalanced transparently on our behalf as the end user.
Kafka Connect runs on dynos as a distributed system. You can scale dynos up or down depending on how many instances of Kafka Connect you want. Rebalancing is done behind the scenes. Notice in the Procfile that these instances are started as a web dyno. That is, they expose port 80 or 443. This is because Kafka Connect in distributed mode exposes a REST interface that end users interact with to manage Kafka connectors.
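As a rough sketch of what talking to that REST interface can look like, the functions below list the registered connectors and the installed connector plugins. `GET /connectors` and `GET /connector-plugins` are standard Kafka Connect REST endpoints; the `CONNECT_URL` variable and its default value are placeholders for your own app, not something this repo defines.

```shell
# Sketch only: CONNECT_URL is a placeholder for your own app's URL.
CONNECT_URL="${CONNECT_URL:-https://my-app.herokuapp.com}"

# List the names of all connectors registered with the Connect cluster.
list_connectors() {
  curl -sS -H "Accept: application/json" "${CONNECT_URL}/connectors"
}

# List the connector plugins installed on the workers,
# e.g. to check that the Debezium plugin was picked up.
list_connector_plugins() {
  curl -sS -H "Accept: application/json" "${CONNECT_URL}/connector-plugins"
}
```

Because every worker in the cluster serves the same REST API, it should not matter which web dyno ends up handling the request.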
KSQL also runs on dynos as a distributed system. You can scale dynos up or down, and rebalancing is taken care of here as well. KSQL also exposes a web port. This is because the ksql CLI is essentially a read-eval-print loop that communicates with the KSQL server(s), which communicate with the Kafka brokers on your behalf. Essentially, you submit a KSQL query via REST and get a response. This response can be a one-time/ad-hoc response or a streaming response, depending on your query.
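To make the two kinds of response concrete, here is a minimal sketch that calls the KSQL server's REST API directly with curl. It assumes the server's `/ksql` endpoint for one-time statements and `/query` endpoint for streaming queries; `KSQL_URL` and the stream name `pageviews` are hypothetical placeholders.

```shell
# Sketch only: KSQL_URL is a placeholder for your own app's URL.
KSQL_URL="${KSQL_URL:-https://my-app.herokuapp.com}"

# Run a one-time statement such as "LIST STREAMS;" via POST /ksql.
# Note: no JSON escaping is done, so the statement must not contain double quotes.
ksql_statement() {
  local stmt="$1"
  curl -sS -X POST "${KSQL_URL}/ksql" \
    -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
    -d "{\"ksql\": \"${stmt}\", \"streamsProperties\": {}}"
}

# Run a streaming query via POST /query. -N turns off curl's output
# buffering so rows are printed as they arrive.
ksql_query() {
  local stmt="$1"
  curl -sS -N -X POST "${KSQL_URL}/query" \
    -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
    -d "{\"ksql\": \"${stmt}\", \"streamsProperties\": {}}"
}

# Example usage (pageviews is a hypothetical stream):
#   ksql_statement 'LIST STREAMS;'
#   ksql_query 'SELECT * FROM pageviews;'
```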
This repo has an Aptfile, which requires that your app use the Heroku Apt Buildpack. The Aptfile tells each dyno, when it boots, to download the Kafka Connect and KSQL dependencies. These dependencies contain the libraries and binaries necessary to run Kafka Connect and KSQL. These binaries eventually run java, so this repo also requires the Heroku JVM Buildpack.
This repo has a Procfile, which tells a dyno what to do when it runs. The Procfile has commands to start a Kafka Connect cluster in distributed mode and/or a KSQL server.
This repo has three scripts:
- `setup_certs`: Takes config vars as input and creates the required keystores. This is required because Apache Kafka on Heroku strictly requires SSL.
- `start-distributed`: Uses `setup_certs` to create keystores, sets up config properties, downloads any additional Kafka Connect dependencies, and starts the Kafka Connect binary.
- `start-ksql`: Uses `setup_certs` to create keystores, sets up config properties, downloads additional KSQL dependencies, and starts the KSQL binary.
All scripts are run on the dyno in Heroku.
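The following is a hedged sketch of the kind of work these scripts do, not a copy of them. It assumes the certificates come from the `KAFKA_TRUSTED_CERT`, `KAFKA_CLIENT_CERT`, and `KAFKA_CLIENT_CERT_KEY` config vars that the Kafka add-on sets, that the brokers are listed in `KAFKA_URL`, and that `keytool` and `openssl` are on the path. The function names, file names, and the `heroku-kafka-ca` alias are made up for illustration.

```shell
# Turn KAFKA_URL ("kafka+ssl://host1:9096,kafka+ssl://host2:9096")
# into a bootstrap.servers value ("host1:9096,host2:9096").
to_bootstrap_servers() {
  printf '%s' "$1" | sed 's#kafka+ssl://##g'
}

# Write the PEM config vars to disk and build JKS stores from them.
create_keystores() {
  local dir="$1"
  printf '%s\n' "$KAFKA_TRUSTED_CERT"    > "$dir/ca.pem"
  printf '%s\n' "$KAFKA_CLIENT_CERT"     > "$dir/cert.pem"
  printf '%s\n' "$KAFKA_CLIENT_CERT_KEY" > "$dir/key.pem"

  # Truststore: holds the CA certificate used to verify the brokers.
  keytool -importcert -noprompt -alias heroku-kafka-ca \
    -file "$dir/ca.pem" \
    -keystore "$dir/truststore.jks" -storepass "$TRUSTSTORE_PASSWORD"

  # Keystore: bundle client cert and key as PKCS12, then convert to JKS.
  openssl pkcs12 -export -in "$dir/cert.pem" -inkey "$dir/key.pem" \
    -out "$dir/keystore.p12" -passout "pass:$KEYSTORE_PASSWORD"
  keytool -importkeystore -noprompt \
    -srckeystore "$dir/keystore.p12" -srcstoretype PKCS12 \
    -srcstorepass "$KEYSTORE_PASSWORD" \
    -destkeystore "$dir/keystore.jks" -deststorepass "$KEYSTORE_PASSWORD"
}

# Print the SSL-related client properties that point at those stores.
ssl_properties() {
  local dir="$1"
  cat <<EOF
bootstrap.servers=$(to_bootstrap_servers "$KAFKA_URL")
security.protocol=SSL
ssl.truststore.location=$dir/truststore.jks
ssl.truststore.password=$TRUSTSTORE_PASSWORD
ssl.keystore.location=$dir/keystore.jks
ssl.keystore.password=$KEYSTORE_PASSWORD
EOF
}
```

A start script could then call `create_keystores` with a writable directory and append the output of `ssl_properties` to the worker's properties file before launching the binary.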
Notice that this repo doesn't contain any application code, because none is necessary. Instead, all we have is configuration in the form of demo-connector.json, which Kafka Connect uses to specify Debezium and Connect properties. All that's required of us as the end user is to fill in this config file. Moreover, this file does not even need to be checked in; it is present in this repo only as an example. The config file can be stored anywhere, since it is eventually sent as a request body to Kafka Connect.
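For orientation, here is a sketch of what a connector config of this kind can look like, generated from environment variables. It is not the contents of this repo's demo-connector.json. The `name`/`config` envelope is what Kafka Connect's `POST /connectors` expects; the property names follow the Debezium Postgres connector and may differ between Debezium versions; the `PG_*` and `CONNECTOR_NAME` variables and the default values are hypothetical.

```shell
# Print a Debezium Postgres connector config as JSON.
# All PG_* variables and defaults are placeholders for illustration.
connector_json() {
  cat <<EOF
{
  "name": "${CONNECTOR_NAME:-demo-connector}",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "${PG_HOST}",
    "database.port": "${PG_PORT:-5432}",
    "database.user": "${PG_USER}",
    "database.password": "${PG_PASSWORD}",
    "database.dbname": "${PG_DATABASE}",
    "database.server.name": "${PG_SERVER_NAME:-dbserver1}"
  }
}
EOF
}

# Example usage:
#   connector_json > my-connector.json
```

Generating the file this way keeps database credentials out of version control, which fits the point that the config does not need to be checked in.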
Create app, add buildpacks, add Kafka add-on, modify configs, deploy to Heroku.
$ heroku apps:create my-app
$ heroku buildpacks:add --index 1 https://github.com/actionsprout/heroku-buildpack-apt
$ heroku buildpacks:add heroku/jvm
# Add a credit card to your account so you can create addons.
$ heroku addons:create heroku-kafka:standard-0
# Add key passwords, please change these
$ heroku config:set TRUSTSTORE_PASSWORD=changeme KEYSTORE_PASSWORD=changeme
$ git push heroku master
# Comment out or un-comment desired `web` dyno in the `Procfile` (or just deploy to separate apps)
# and scale appropriate dynos.
$ heroku ps:scale web=1
This repo leverages Debezium to do CDC from Postgres into Kafka. For this process, you will also need Heroku Postgres.
$ heroku addons:create heroku-postgresql:standard-0
Then,
- pre-create tables in Postgres corresponding to what you have in demo-connector.json.
- pre-create Kafka topics from demo-connector.json as well.
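The two pre-creation steps above might be scripted roughly as follows, using `heroku pg:psql` and `heroku kafka:topics:create`. The app name, table definition, topic names, and partition count are hypothetical; take the real ones from your own connector config.

```shell
# Create a table by running one SQL statement through heroku pg:psql.
create_table() {
  local app="$1" sql="$2"
  heroku pg:psql --app "$app" --command "$sql"
}

# Create each listed Kafka topic on the given app.
create_topics() {
  local app="$1"
  shift
  local topic
  for topic in "$@"; do
    heroku kafka:topics:create "$topic" --partitions 1 --app "$app"
  done
}

# Example usage (all names are placeholders):
#   create_table my-app 'CREATE TABLE users (id serial PRIMARY KEY, email text);'
#   create_topics my-app dbserver1.public.users
```

The example topic name follows Debezium's `<server name>.<schema>.<table>` pattern for Postgres change topics.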
Interact with Kafka Connect via REST API.
Example:
$ curl -X POST -H "Content-Type: application/json" --data @demo-connector.json https://my-app.herokuapp.com/connectors
To deploy a connector, use demo-connector.json as the body and POST it to your app at the /connectors endpoint.
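Once a connector has been posted, the same REST API can report on it or remove it. The sketch below uses Kafka Connect's `GET /connectors/{name}/status` and `DELETE /connectors/{name}` endpoints; `CONNECT_URL` and the connector name `demo-connector` are placeholders, so use whatever `name` your config file declares.

```shell
# Sketch only: CONNECT_URL is a placeholder for your own app's URL.
CONNECT_URL="${CONNECT_URL:-https://my-app.herokuapp.com}"

# Show the state (e.g. RUNNING or FAILED) of a connector and its tasks.
connector_status() {
  local name="$1"
  curl -sS -H "Accept: application/json" \
    "${CONNECT_URL}/connectors/${name}/status"
}

# Remove a connector from the cluster.
delete_connector() {
  local name="$1"
  curl -sS -X DELETE "${CONNECT_URL}/connectors/${name}"
}

# Example usage:
#   connector_status demo-connector
#   delete_connector demo-connector
```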
Interact with KSQL via the ksql cli.
Example:
$ ksql https://my-app.herokuapp.com
- Kafka Connect and KSQL use a variety of internal topics and consumer groups. Heroku's multitenant Kafka plans leverage Kafka ACLs. This means that you will have to pre-create internal topics and consumer groups in order for this to work on multitenant plans.
- The Heroku Runtime has an ephemeral file system. This means that leveraging RocksDB is not yet possible. Fortunately, RocksDB is not a hard requirement because log-compacted Kafka topics are the source of truth. However, the trade-off is longer boot-up times, because worker instances have to replay data.
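For the first point, pre-creating Kafka Connect's internal topics and its worker group on a multitenant plan could look roughly like this. The topic and group names are hypothetical and must match whatever `config.storage.topic`, `offset.storage.topic`, `status.storage.topic`, and `group.id` are set to in your worker properties; the partition counts are likewise only illustrative. Names on multitenant plans also carry a prefix, so check your app's `KAFKA_PREFIX` config var. KSQL's internal topics are not covered here.

```shell
# Pre-create the internal topics and consumer group for a Connect cluster.
# All topic and group names below are placeholders.
precreate_connect_internals() {
  local app="$1"
  # Kafka Connect expects its config topic to be compacted with one partition.
  heroku kafka:topics:create connect-configs --partitions 1 --compaction --app "$app"
  # Offsets and status topics are compacted too, but may have more partitions.
  heroku kafka:topics:create connect-offsets --partitions 8 --compaction --app "$app"
  heroku kafka:topics:create connect-status --partitions 8 --compaction --app "$app"
  # The group the workers join for coordination (group.id).
  heroku kafka:consumer-groups:create connect-cluster --app "$app"
}

# Example usage:
#   precreate_connect_internals my-app
```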