This project is a step-by-step introduction to leveraging Docker and the open-source ecosystem for metrics, logs, and alerting.
Note: This project is only intended to present ideas.
Note: if you are using Docker for Mac, please assign at least 5 GB of memory.
- 1. Introduction
- 2. Container logging
- 3. Listening for logs using a container
- 4. Elasticsearch
- 5. Elasticsearch Metrics!
- 6. Better metrics: the TICK stack
- 7. Getting the best of the ecosystem
- 8. Kafka the data hub
- 9. Enter JMX!
- 10. Let's do some manual monitoring
- 11. Self descriptive visualizations
- 12. Your SQL databases are back
- 13. Share your database tables as Kafka tables
- 14. Going even further with Kafka using KSQL
- 15. Going C3
- 16. Going Prometheus
- 17. Going distributed open tracing
- 18. Monitoring Federation
- 19. Security
In docker-compose-step1.yml
we create a simple container that displays hello world
The container definition is as follows
example:
image: ubuntu
command: echo hello world
Run it with docker-compose -f docker-compose-step1.yml up
$ docker-compose -f docker-compose-step1.yml up
Creating network "monitoring-demo_default" with the default driver
Creating monitoring-demo_example_1 ...
Creating monitoring-demo_example_1 ... done
Attaching to monitoring-demo_example_1
example_1 | hello world
monitoring-demo_example_1 exited with code 0
Hello world has been written to stdout. How fancy!
The output of the container has also been captured by docker.
Run docker logs monitoring-demo_example_1 and you should see
$ docker logs monitoring-demo_example_1
hello world
When a container writes to stdout or stderr, Docker captures these logs and sends them to its log bus. A listener then stores each container's logs in its own log file.
graph LR;
container --> Docker;
Docker -- write to --> stdout;
Docker -- write to --> File;
To find out where it's stored, just inspect the container with docker inspect monitoring-demo_example_1 and you should see
$ docker inspect monitoring-demo_example_1
[
{
"Id": "cf1a86e1dc9ac16bc8f60b234f9b3e6310bd591dc385bc1da8e1081d2837752a",
"Created": "2017-10-24T21:24:57.558550709Z",
"Path": "echo",
"Args": [
"hello",
"world"
],
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
...
... snip snip ...
...
}
}
}
]
That's a lot of information; let's look for the log info.
$ docker inspect monitoring-demo_example_1 | grep log
"LogPath": "/var/lib/docker/containers/cf1a86e1dc9ac16bc8f60b234f9b3e6310bd591dc385bc1da8e1081d2837752a/cf1a86e1dc9ac16bc8f60b234f9b3e6310bd591dc385bc1da8e1081d2837752a-json.log",
Perfect, let's extract that field now with jq
$ docker inspect monitoring-demo_example_1 | jq -r '.[].LogPath'
/var/lib/docker/containers/cf1a86e1dc9ac16bc8f60b234f9b3e6310bd591dc385bc1da8e1081d2837752a/cf1a86e1dc9ac16bc8f60b234f9b3e6310bd591dc385bc1da8e1081d2837752a-json.log
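On a Linux host you can read that raw JSON log directly; a minimal sketch, reusing the jq extraction above (requires root):

```bash
# Dump the raw JSON-formatted log that Docker captured for our container.
sudo cat "$(docker inspect monitoring-demo_example_1 | jq -r '.[].LogPath')"
```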
Note: you will not be able to read this file directly when using Docker for Mac.
More about logs: https://docs.docker.com/engine/admin/logging/overview/#use-environment-variables-or-labels-with-logging-drivers
The objective now is to leverage the Docker log bus: listen to it and output everything on the console.
graph LR;
container --> Docker((Docker));
Docker -- write to --> stdout;
Docker -- write to --> File;
Listener -- listen to --> Docker;
Listener -- write to --> stdout;
Therefore anything written to stdout should appear twice.
We will use logspout to listen for all the docker logs.
logspout:
image: bekt/logspout-logstash
restart: on-failure
volumes:
- /var/run/docker.sock:/tmp/docker.sock
environment:
ROUTE_URIS: logstash://logstash:5000
depends_on:
- logstash
Note: In order to read from the log bus, we need access to the Docker socket, hence the volume mapping configuration.
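If you are curious about what that socket exposes, here is a minimal sketch querying the Docker API by hand; it assumes curl ≥ 7.40 (for --unix-socket) and jq on the host:

```bash
# List the running containers straight from the Docker socket,
# the same socket logspout reads container logs from.
curl --silent --unix-socket /var/run/docker.sock http://localhost/containers/json | jq '.[].Names'
```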
Once logspout gets a log, it sends it to logstash.
Logstash is defined as follows
logstash:
image: logstash
restart: on-failure
command: -e "input { udp { port => 5000 codec => json } } filter { if [docker][image] =~ /^logstash/ { drop { } } } output { stdout { codec => rubydebug } }"
Here I define a complete logstash
configuration on the command line.
Note: logspout will also forward Logstash's own log events, so we filter out the logstash ones to prevent an infinite logging loop.
So here are the containers at play:
graph LR;
container --> Docker((Docker));
Docker -- write to --> stdout;
Docker -- write to --> File;
Logspout -- listen to --> Docker;
Logspout -- write to --> Logstash;
Logstash -- write to --> stdout;
Run the demo with docker-compose -f docker-compose-step2.yml up
, you should see
$ docker-compose -f docker-compose-step2.yml up
Recreating monitoring-demo_logstash_1 ...
Recreating monitoring-demo_logstash_1
Starting monitoring-demo_example_1 ...
Recreating monitoring-demo_logstash_1 ... done
Recreating monitoring-demo_logspout_1 ...
Recreating monitoring-demo_logspout_1 ... done
Attaching to monitoring-demo_example_1, monitoring-demo_logstash_1, monitoring-demo_logspout_1
example_1 | 11597
example_1 | 9666
example_1 | 3226
...
... snip snip ...
...
example_1 | 10854
logstash_1 | {
logstash_1 | "@timestamp" => 2017-10-24T21:49:09.787Z,
logstash_1 | "stream" => "stdout",
logstash_1 | "@version" => "1",
logstash_1 | "host" => "172.24.0.4",
logstash_1 | "message" => "10854",
logstash_1 | "docker" => {
logstash_1 | "image" => "ubuntu",
logstash_1 | "hostname" => "15716aaf6095",
logstash_1 | "name" => "/monitoring-demo_example_1",
logstash_1 | "id" => "15716aaf6095efdde8ab3e566a911aac284e63d3c949dd19ddfd64258d20de9b",
logstash_1 | "labels" => nil
logstash_1 | },
logstash_1 | "tags" => []
logstash_1 | }
Note: Along with the message comes container metadata! This will be of tremendous help when debugging your cluster!
It's kind of silly to grab stdout in such a convoluted way only to write it back to stdout.
Let's make something useful such as sending all the logs to elasticsearch.
Let's first define an elasticsearch server
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:5.6.0
restart: on-failure
ports:
- "9200:9200"
- "9300:9300"
environment:
xpack.security.enabled: "false"
and its Kibana companion
kibana:
image: docker.elastic.co/kibana/kibana:5.5.2
restart: on-failure
ports:
- "5601:5601"
environment:
xpack.security.enabled: "false"
depends_on:
- elasticsearch
Let's now ask logstash to send all logs to elasticsearch instead of stdout.
-e "input { udp { port => 5000 codec => json } } filter { if [docker][image] =~ /^logstash/ { drop { } } } output { stdout { codec => rubydebug } }"
becomes
-e "input { udp { port => 5000 codec => json } } filter { if [docker][image] =~ /^logstash/ { drop { } } } output { elasticsearch { hosts => "elasticsearch" } }"
By default the logs will be sent to the logstash-*
index.
So let's create the default kibana index pattern.
kibana_index_pattern:
image: ubuntu
command: |
bash -c "sleep 30 ; curl 'http://kibana:5601/es_admin/.kibana/index-pattern/logstash-*/_create' -H 'kbn-version: 5.5.2' -H 'content-type: application/json' --data-binary '{\"title\":\"logstash-*\",\"timeFieldName\":\"@timestamp\",\"notExpandable\":true}'"
depends_on:
- kibana
Here are the containers involved:
graph LR;
Logspout --listen to--> Docker((Docker));
Logspout -- write to --> Logstash;
Logstash -- write to --> Elasticsearch;
Kibana -- reads --> Elasticsearch;
Run the demo with docker-compose -f docker-compose-step3.yml up
$ docker-compose -f docker-compose-step3.yml up
Starting monitoring-demo_example_1 ...
Starting monitoring-demo_example_1
Creating monitoring-demo_elasticsearch_1 ...
Creating monitoring-demo_elasticsearch_1 ... done
Recreating monitoring-demo_logstash_1 ...
Recreating monitoring-demo_logstash_1
Creating monitoring-demo_kibana_1 ...
Recreating monitoring-demo_logstash_1 ... done
Recreating monitoring-demo_logspout_1 ...
Recreating monitoring-demo_logspout_1 ... done
Attaching to monitoring-demo_example_1, monitoring-demo_elasticsearch_1, monitoring-demo_logstash_1, monitoring-demo_kibana_1, monitoring-demo_logspout_1
...
... snip snip ...
...
Now look at the logs in Kibana:
- open http://localhost:5601/
- click Discover
- win!
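You can also check from the command line that the logs have reached Elasticsearch; a minimal sketch, assuming the default port mapping defined above:

```bash
# Count the documents indexed so far under the logstash-* index pattern.
curl -s 'http://localhost:9200/logstash-*/_count?pretty'
```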
Docker has metrics about the state of each container, but also about what it consumes. Let's leverage that!
Let's use metricbeat for that
metricbeat:
image: docker.elastic.co/beats/metricbeat:5.6.3
volumes:
- /var/run/docker.sock:/tmp/docker.sock
depends_on:
- elasticsearch
Note: as with logspout, we need to ask Docker about containers via its socket.
The nice thing about Metricbeat is that it comes with ready-made dashboards. Let's leverage that too.
metricbeat-dashboard-setup:
image: docker.elastic.co/beats/metricbeat:5.6.3
command: ./scripts/import_dashboards -es http://elasticsearch:9200
depends_on:
- elasticsearch
Here are the containers at play:
graph LR;
MetricBeat -- listen to --> Docker((Docker));
MetricBeat -- write to --> Elasticsearch;
MetricBeat -- setup dashboards --> Kibana;
Kibana -- reads from --> Elasticsearch;
Run the demo with docker-compose -f docker-compose-step4.yml up
then look at the ready-made Metricbeat dashboards in Kibana at http://localhost:5601/.
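You can also check from the command line that Metricbeat is shipping data; a minimal sketch, assuming Metricbeat's default daily index naming:

```bash
# Metricbeat 5.x writes daily indices named metricbeat-YYYY.MM.DD by default.
curl -s 'http://localhost:9200/_cat/indices/metricbeat-*?v'
```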
The TICK stack is composed of
- Telegraf, an agent that gathers metrics (similar to collectd)
- InfluxDB, a time-series database
- Chronograf, a visualization tool
- Kapacitor, a real-time alerting platform
This stack has many very interesting properties, let's leverage them.
Let's start with influxdb
influxdb:
image: influxdb:1.3.7
ports:
- "8086:8086"
Then kapacitor
kapacitor:
image: kapacitor:1.3.3
hostname: kapacitor
environment:
KAPACITOR_HOSTNAME: kapacitor
KAPACITOR_INFLUXDB_0_URLS_0: http://influxdb:8086
depends_on:
- influxdb
Then chronograf
chronograf:
image: chronograf:1.3.10
environment:
KAPACITOR_URL: http://kapacitor:9092
INFLUXDB_URL: http://influxdb:8086
ports:
- "8888:8888"
depends_on:
- influxdb
- kapacitor
Then telegraf
telegraf:
image: telegraf:1.4.3
volumes:
- /var/run/docker.sock:/tmp/docker.sock
- ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
links:
- influxdb
- elasticsearch
A few things to notice here:
- we once again mount docker.sock: we will gather metrics from Docker in telegraf too
- we have a local telegraf.conf
- we link to influxdb, as metrics will be shipped there
- we link to elasticsearch ... we will monitor elasticsearch too!
Let's look at what the telegraf configuration looks like.
I removed many default values; if you want to see them in full, go to https://github.com/influxdata/telegraf/blob/master/etc/telegraf.conf
[agent]
interval = "10s"
## Outputs
[[outputs.influxdb]]
urls = ["http://influxdb:8086"]
database = "telegraf"
## Inputs
[[inputs.cpu]]
[[inputs.disk]]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.net]]
[[inputs.netstat]]
[[inputs.interrupts]]
[[inputs.linux_sysctl_fs]]
[[inputs.docker]]
endpoint = "unix:///tmp/docker.sock"
[[inputs.elasticsearch]]
servers = ["http://elasticsearch:9200"]
This configuration should be self-explanatory, right?
Note: The telegraf plugin ecosystem is huge; see the full list here: https://github.com/influxdata/telegraf#input-plugins
Now run the demo docker-compose -f docker-compose-step5.yml up
You are starting to have many containers:
The ELK story:
graph LR;
Logspout -- listen to --> Docker((Docker));
Logspout -- write to --> Logstash;
Logstash -- write to --> Elasticsearch;
Kibana -- reads from --> Elasticsearch;
MetricBeat -- listen to --> Docker;
MetricBeat -- write to --> Elasticsearch;
MetricBeat -- one time dashboards setup --> Kibana;
And the TICK story:
graph LR;
Telegraf -- listen to --> Docker((Docker));
Telegraf -- write to --> Influxdb;
Chronograf -- reads from --> Influxdb;
Kapacitor -- listen to --> Influxdb;
Chronograf -- setup rules --> Kapacitor;
Kapacitor -- notifies --> Notification;
Run the demo with docker-compose -f docker-compose-step5.yml up
then look at the following links
You can play around with the alerting system etc.
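You can also query InfluxDB directly to confirm that Telegraf metrics are flowing; a minimal sketch using the InfluxDB 1.x HTTP query API and the telegraf database configured above:

```bash
# List the measurements Telegraf has created so far.
curl -sG 'http://localhost:8086/query' \
  --data-urlencode 'db=telegraf' \
  --data-urlencode 'q=SHOW MEASUREMENTS'
```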
We are now in pretty good shape:
- we have all the logs in elasticsearch
- we have metrics in elasticsearch
- we have metrics in influxdb
- we have a means of visualization via chronograf
- we have a means of alerting via kapacitor
We should be all set right ?
Well, no, we can do better: as an admin I want to mix and match logs, visualization and alerting in a single page.
Let's do that together by leveraging grafana
grafana:
image: grafana/grafana:4.6.1
ports:
- "3000:3000"
depends_on:
- influxdb
- elasticsearch
Nothing fancy here, but if you run it like this, you'll have to set up manually
- the elasticsearch datasource
- the influxdb datasource
- the alert channels
- a few default dashboards
Well, there's a local build that does just that
grafana-setup:
build: grafana-setup/
depends_on:
- grafana
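The grafana-setup image is essentially a scripted version of those manual steps. Here is a minimal sketch of what such a script could look like, using Grafana's HTTP API and the default admin/admin credentials (the datasource names are illustrative, not necessarily what the real grafana-setup/ build uses):

```bash
#!/bin/bash
# Wait until the Grafana API answers, then register the two datasources.
until curl -s -u admin:admin http://grafana:3000/api/datasources > /dev/null; do sleep 2; done

curl -s -u admin:admin -H 'Content-Type: application/json' -X POST http://grafana:3000/api/datasources \
  -d '{"name":"influxdb","type":"influxdb","access":"proxy","url":"http://influxdb:8086","database":"telegraf"}'

curl -s -u admin:admin -H 'Content-Type: application/json' -X POST http://grafana:3000/api/datasources \
  -d '{"name":"elasticsearch","type":"elasticsearch","access":"proxy","url":"http://elasticsearch:9200","database":"logstash-*","jsonData":{"timeField":"@timestamp","esVersion":5}}'
```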
graph LR;
Grafana -- reads from --> Influxdb
Grafana -- reads from --> Elasticsearch
Grafana -- write to --> AlertChannels
GrafanaSetup -- one time setup --> Grafana
Run the demo with docker-compose -f docker-compose-step6.yml up
then enjoy your docker metrics in Grafana!
Note: Use username admin
password admin
Go to the bottom of the page ... here are the logs for the container you are looking at!
Note: do not hesitate to rely on dashboards from the community at https://grafana.com/dashboards
You can create alerts etc. That's great.
We can't keep all this data to ourselves, right? We are most probably not the only users.
What about the security team, auditing, performance engineers, or pushing the data to other storage systems?
Well, Kafka is very useful here; let's leverage that component.
Kafka relies on zookeeper, let's use the simplest images I could find:
zookeeper:
image: wurstmeister/zookeeper:3.4.6
ports:
- "2181:2181"
Same thing for Kafka
kafka:
image: wurstmeister/kafka:1.0.0
ports:
- "9092"
environment:
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
depends_on:
- zookeeper
Now we can update Telegraf to ship all its data to Kafka too.
Let's add the kafka output in the telegraf configuration
[[outputs.kafka]]
brokers = ["kafka:9092"]
topic = "telegraf"
And add a link from the telegraf container to the kafka server
telegraf:
image: telegraf:1.4.3
volumes:
- /var/run/docker.sock:/tmp/docker.sock
- ./telegraf/telegraf-with-kafka-output.conf:/etc/telegraf/telegraf.conf:ro
links:
- influxdb
- elasticsearch
- kafka
The Kafka story
graph LR;
Telegraf -- listen to --> Docker;
Telegraf -- write to --> Influxdb;
Telegraf -- write to --> Kafka
Kafka -- read/writes --> Zookeeper
Run the demo docker-compose -f docker-compose-step7.yml up
Let's see if we got our metrics data readily available in kafka ...
docker exec -ti monitoring-demo_kafka_1 kafka-console-consumer.sh --zookeeper zookeeper --topic telegraf --max-messages 5
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
docker_container_mem,build-date=20170801,com.docker.compose.service=kibana,license=GPLv2,com.docker.compose.config-hash=1e1f2bf92f25fcc3a4b235d04f600cd276809e7195a0c5196f0a8098e82e47b3,host=c280c5e69493,container_image=docker.elastic.co/kibana/kibana,maintainer=Elastic\ Docker\ Team\ <[email protected]>,com.docker.compose.version=1.16.1,com.docker.compose.oneoff=False,com.docker.compose.project=monitoring-demo,vendor=CentOS,com.docker.compose.container-number=1,name=CentOS\ Base\ Image,engine_host=moby,container_name=monitoring-demo_kibana_1,container_version=5.5.2 pgpgin=98309i,rss_huge=0i,total_pgmajfault=3i,total_pgpgin=98309i,total_rss_huge=0i,usage_percent=1.9058546412278363,active_anon=155103232i,hierarchical_memory_limit=9223372036854771712i,max_usage=272527360i,container_id="aa2195088fd305079d2942b009c9e9fd1bb38781aa558be6a9f084a334b1b755",writeback=0i,pgfault=116807i,pgpgout=59702i,total_mapped_file=0i,total_unevictable=0i,total_writeback=0i,unevictable=0i,active_file=0i,mapped_file=0i,total_inactive_anon=20480i,total_pgfault=116807i,total_rss=154718208i,usage=162758656i,total_active_anon=155103232i,cache=3416064i,rss=154718208i,total_cache=3416064i,total_inactive_file=3010560i,total_pgpgout=59702i,limit=8360689664i,pgmajfault=3i,total_active_file=0i,inactive_anon=20480i,inactive_file=3010560i 1508887282000000000
docker_container_cpu,vendor=CentOS,com.docker.compose.container-number=1,build-date=20170801,container_image=docker.elastic.co/kibana/kibana,com.docker.compose.project=monitoring-demo,container_name=monitoring-demo_kibana_1,cpu=cpu-total,host=c280c5e69493,license=GPLv2,com.docker.compose.config-hash=1e1f2bf92f25fcc3a4b235d04f600cd276809e7195a0c5196f0a8098e82e47b3,com.docker.compose.oneoff=False,engine_host=moby,container_version=5.5.2,com.docker.compose.service=kibana,maintainer=Elastic\ Docker\ Team\ <[email protected]>,com.docker.compose.version=1.16.1,name=CentOS\ Base\ Image usage_total=11394168870i,usage_system=27880670000000i,throttling_periods=0i,throttling_throttled_periods=0i,throttling_throttled_time=0i,usage_in_usermode=10420000000i,usage_in_kernelmode=970000000i,container_id="aa2195088fd305079d2942b009c9e9fd1bb38781aa558be6a9f084a334b1b755",usage_percent=7.948400539083559 1508887282000000000
docker_container_cpu,com.docker.compose.project=monitoring-demo,vendor=CentOS,com.docker.compose.container-number=1,com.docker.compose.oneoff=False,container_image=docker.elastic.co/kibana/kibana,maintainer=Elastic\ Docker\ Team\ <[email protected]>,engine_host=moby,com.docker.compose.config-hash=1e1f2bf92f25fcc3a4b235d04f600cd276809e7195a0c5196f0a8098e82e47b3,build-date=20170801,license=GPLv2,com.docker.compose.version=1.16.1,container_name=monitoring-demo_kibana_1,host=c280c5e69493,name=CentOS\ Base\ Image,cpu=cpu0,container_version=5.5.2,com.docker.compose.service=kibana container_id="aa2195088fd305079d2942b009c9e9fd1bb38781aa558be6a9f084a334b1b755",usage_total=3980860071i 1508887282000000000
docker_container_cpu,com.docker.compose.container-number=1,host=c280c5e69493,name=CentOS\ Base\ Image,com.docker.compose.oneoff=False,container_version=5.5.2,build-date=20170801,com.docker.compose.service=kibana,maintainer=Elastic\ Docker\ Team\ <[email protected]>,vendor=CentOS,com.docker.compose.project=monitoring-demo,engine_host=moby,license=GPLv2,com.docker.compose.config-hash=1e1f2bf92f25fcc3a4b235d04f600cd276809e7195a0c5196f0a8098e82e47b3,com.docker.compose.version=1.16.1,container_name=monitoring-demo_kibana_1,container_image=docker.elastic.co/kibana/kibana,cpu=cpu1 usage_total=3942753596i,container_id="aa2195088fd305079d2942b009c9e9fd1bb38781aa558be6a9f084a334b1b755" 1508887282000000000
docker_container_cpu,maintainer=Elastic\ Docker\ Team\ <[email protected]>,cpu=cpu2,host=c280c5e69493,build-date=20170801,container_version=5.5.2,com.docker.compose.config-hash=1e1f2bf92f25fcc3a4b235d04f600cd276809e7195a0c5196f0a8098e82e47b3,com.docker.compose.version=1.16.1,com.docker.compose.container-number=1,name=CentOS\ Base\ Image,com.docker.compose.oneoff=False,container_name=monitoring-demo_kibana_1,com.docker.compose.service=kibana,container_image=docker.elastic.co/kibana/kibana,vendor=CentOS,com.docker.compose.project=monitoring-demo,engine_host=moby,license=GPLv2 usage_total=1607029783i,container_id="aa2195088fd305079d2942b009c9e9fd1bb38781aa558be6a9f084a334b1b755" 1508887282000000000
Processed a total of 5 messages
Yes, it looks like it!
We are in pretty good shape, right?
Well, we can do better. We have many JVM-based components, such as Kafka, and we know their monitoring is based on the JMX standard.
Telegraf is a Go application; it does not speak JMX natively. However, it speaks Jolokia.
Let's leverage that.
So let's create our own image based on wurstmeister/kafka, download Jolokia and add it to the image.
FROM wurstmeister/kafka:1.0.0
ENV JOLOKIA_VERSION 1.3.5
ENV JOLOKIA_HOME /usr/jolokia-${JOLOKIA_VERSION}
RUN curl -sL --retry 3 \
"https://github.com/rhuss/jolokia/releases/download/v${JOLOKIA_VERSION}/jolokia-${JOLOKIA_VERSION}-bin.tar.gz" \
| gunzip \
| tar -x -C /usr/ \
&& ln -s $JOLOKIA_HOME /usr/jolokia \
&& rm -rf $JOLOKIA_HOME/client \
&& rm -rf $JOLOKIA_HOME/reference
CMD ["start-kafka.sh"]
And link the new kafka definition to this image
kafka:
build: kafka-with-jolokia/
ports:
- "9092"
environment:
JOLOKIA_VERSION: 1.3.5
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
KAFKA_OPTS: -javaagent:/usr/jolokia-1.3.5/agents/jolokia-jvm.jar=host=0.0.0.0
depends_on:
- zookeeper
Configure telegraf to gather JMX metrics using the Jolokia agent
[[inputs.jolokia]]
context = "/jolokia/"
[[inputs.jolokia.servers]]
name = "kafka"
host = "kafka"
port = "8778"
[[inputs.jolokia.metrics]]
name = "heap_memory_usage"
mbean = "java.lang:type=Memory"
attribute = "HeapMemoryUsage"
[[inputs.jolokia.metrics]]
name = "messages_in"
mbean = "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec"
[[inputs.jolokia.metrics]]
name = "bytes_in"
mbean = "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec"
Then configure telegraf to use the new configuration with jolokia input
telegraf:
image: telegraf:1.4.3
volumes:
- /var/run/docker.sock:/tmp/docker.sock
- ./telegraf/telegraf-with-kafka-output-and-jolokia.conf:/etc/telegraf/telegraf.conf:ro
links:
- influxdb
- elasticsearch
- kafka
Run the demo docker-compose -f docker-compose-step8.yml up
You'll see the new kafka image created
$ docker images | grep demo
monitoring-demo_kafka latest 5a746c9ff5ea 2 minutes ago 270MB
The JMX story
graph LR;
Telegraf -- write to --> Kafka
Telegraf -- get metrics --> Jolokia
Jolokia -- reads JMX --> Kafka
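Before checking the Kafka topic, you can verify by hand that the Jolokia agent answers and that the MBeans configured above are readable. A minimal sketch run from inside the kafka container (curl is available there since the Dockerfile above already uses it):

```bash
# Is the Jolokia agent up?
docker exec -ti monitoring-demo_kafka_1 curl -s http://localhost:8778/jolokia/version

# Read the same MBean attribute Telegraf collects as heap_memory_usage.
docker exec -ti monitoring-demo_kafka_1 \
  curl -s 'http://localhost:8778/jolokia/read/java.lang:type=Memory/HeapMemoryUsage'
```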
Do we have Jolokia metrics?
$ docker exec -ti monitoring-demo_kafka_1 kafka-console-consumer.sh --zookeeper zookeeper --topic telegraf | grep jolokia
jolokia,host=cde5575b52a5,jolokia_name=kafka,jolokia_port=8778,jolokia_host=kafka heap_memory_usage_used=188793344,messages_in_MeanRate=12.98473084303969,bytes_out_FiveMinuteRate=1196.4939381458667,bytes_out_RateUnit="SECONDS",active_controller_Value=1,heap_memory_usage_init=1073741824,heap_memory_usage_committed=1073741824,messages_in_FiveMinuteRate=4.794914163942757,messages_in_EventType="messages",isr_expands_Count=0,isr_expands_FiveMinuteRate=0,isr_expands_OneMinuteRate=0,messages_in_RateUnit="SECONDS",bytes_in_FifteenMinuteRate=995.4606306690374,bytes_out_OneMinuteRate=3453.5697437249646,bytes_out_Count=413240,offline_partitions_Value=0,isr_shrinks_OneMinuteRate=0,messages_in_FifteenMinuteRate=1.8164700620801133,messages_in_OneMinuteRate=11.923477587504813,bytes_in_Count=955598,bytes_in_MeanRate=7110.765507856953,isr_shrinks_Count=0,isr_expands_RateUnit="SECONDS",isr_shrinks_EventType="shrinks",isr_expands_MeanRate=0,bytes_in_RateUnit="SECONDS",bytes_in_OneMinuteRate=6587.34465794122,bytes_in_FiveMinuteRate=2631.3776025779002,bytes_out_EventType="bytes",isr_shrinks_FiveMinuteRate=0,isr_expands_EventType="expands",messages_in_Count=1745,bytes_out_MeanRate=3074.982298604404,isr_expands_FifteenMinuteRate=0,heap_memory_usage_max=1073741824,bytes_in_EventType="bytes",bytes_out_FifteenMinuteRate=438.0280170256858,isr_shrinks_MeanRate=0,isr_shrinks_RateUnit="SECONDS",isr_shrinks_FifteenMinuteRate=0 1508889300000000000
jolokia,jolokia_name=kafka,jolokia_port=8778,jolokia_host=kafka,host=cde5575b52a5 bytes_in_MeanRate=6630.745414108696,isr_shrinks_RateUnit="SECONDS",isr_expands_EventType="expands",isr_expands_FiveMinuteRate=0,isr_expands_RateUnit="SECONDS",heap_memory_usage_max=1073741824,messages_in_Count=1745,isr_expands_FifteenMinuteRate=0,bytes_out_RateUnit="SECONDS",isr_shrinks_OneMinuteRate=0,isr_shrinks_FifteenMinuteRate=0,isr_shrinks_MeanRate=0,messages_in_RateUnit="SECONDS",bytes_in_OneMinuteRate=5576.066868503058,messages_in_FifteenMinuteRate=1.796398775034883,bytes_in_FiveMinuteRate=2545.1107836610863,bytes_out_Count=413240,active_controller_Value=1,isr_expands_Count=0,heap_memory_usage_committed=1073741824,messages_in_EventType="messages",bytes_in_Count=955598,isr_expands_OneMinuteRate=0,messages_in_FiveMinuteRate=4.637718179794651,messages_in_MeanRate=12.107909165680097,isr_shrinks_Count=0,isr_shrinks_EventType="shrinks",bytes_in_FifteenMinuteRate=984.461178226918,offline_partitions_Value=0,bytes_out_OneMinuteRate=2923.3836736983444,bytes_out_EventType="bytes",isr_shrinks_FiveMinuteRate=0,isr_expands_MeanRate=0,bytes_in_EventType="bytes",bytes_out_MeanRate=2867.3907911149618,messages_in_OneMinuteRate=10.093005874965653,bytes_in_RateUnit="SECONDS",bytes_out_FifteenMinuteRate=433.18797795919676,bytes_out_FiveMinuteRate=1157.2682011038034,heap_memory_usage_init=1073741824,heap_memory_usage_used=189841920 1508889310000000000
Well, looks like we do!
Let's say you have some hand-coded monitoring tools, written in Python or bash, such as this one:
#!/bin/bash
# Usage: kafka-lag.sh <consumer-group> <kafka-host> <kafka-port>
group=$1
kafkaHost=$2
kafkaPort=$3
# Describe the consumer group, skip the header lines,
# then reformat each row as an InfluxDB line-protocol point.
kafka-consumer-groups.sh --bootstrap-server ${kafkaHost}:${kafkaPort} --group ${group} --describe 2> /dev/null \
| tail -n +3 \
| awk -v GROUP=${group} '{print "kafka_group_lag,group="GROUP",topic="$1",partition="$2",host="$7" current_offset="$3"i,log_end_offset="$4"i,lag="$5"i"}'
You have many possibilities here.
The exec plugin:
[[inputs.exec]]
commands = ["kafka-lag.sh mygroup broker 9092"]
timeout = "5s"
You can also make Telegraf listen for line-protocol metrics on a socket.
Note: Telegraf has many networking options and protocols supported.
[[inputs.socket_listener]]
service_address = "tcp://:8094"
You would then update your bash to send the data to telegraf!
#!/bin/bash
group=$1
kafkaHost=$2
kafkaPort=$3
telegrafHost=$4
telegrafPort=$5
echo Fetching metrics for the ${group} group in ${kafkaHost}:${kafkaPort} and pushing the metrics into ${telegrafHost}:${telegrafPort}
while true
do
kafka-consumer-groups.sh --bootstrap-server ${kafkaHost}:${kafkaPort} --group ${group} --describe 2> /dev/null \
| tail -n +3 \
| awk -v GROUP=${group} '{print "kafka_group_lag,group="GROUP",topic="$1",partition="$2",host="$7" current_offset="$3"i,log_end_offset="$4"i,lag="$5"i"}' \
| nc ${telegrafHost} ${telegrafPort}
echo Sleeping for 10s
sleep 10s
done
The bash telemetry story
graph LR;
Telegraf -- write to --> Influxdb
Group-Kafka-Lag -- read group metrics --> Kafka
Group-Kafka-Lag -- send metrics to over TCP --> Telegraf
Run the demo docker-compose -f docker-compose-step9.yml up
You can now graph the lag of your consumers.
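You can also test the socket listener by hand by sending a single line-protocol point; a minimal sketch, assuming the default compose network name seen earlier and the telegraf service listening on port 8094:

```bash
# Push one hand-crafted kafka_group_lag point into Telegraf's TCP socket listener.
docker run --rm --network monitoring-demo_default ubuntu \
  bash -c 'echo "kafka_group_lag,group=test,topic=demo,partition=0 lag=42i" > /dev/tcp/telegraf/8094'
```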
Let's rely on jdbranham-diagram-panel to show pretty diagrams that are live.
For that we need to install a plugin; let's leverage the grafana GF_INSTALL_PLUGINS environment variable
grafana:
image: grafana/grafana:4.6.1
ports:
- "3000:3000"
environment:
GF_INSTALL_PLUGINS: jdbranham-diagram-panel
depends_on:
- influxdb
- elasticsearch
Run the demo docker-compose -f docker-compose-step10.yml up
You can now create live diagrams!
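If you want to confirm that the plugin was installed, a minimal sketch (assuming grafana-cli is on the PATH inside the image, which it is in the official one):

```bash
# List the plugins installed in the running Grafana container.
docker exec -ti monitoring-demo_grafana_1 grafana-cli plugins ls
```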
Note: Todo
Leverage your sql databases in your grafana dashboards with http://docs.grafana.org/features/datasources/mysql/
You can capture your database changes (Change Data Capture) and push them to Kafka using Kafka Connect. Look at the connector ecosystem: https://www.confluent.io/product/connectors/
Note: Todo
Now that Kafka is the data hub of your architecture, you can leverage KSQL's declarative power, for example:
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
Note: Todo
Now that Kafka, KSQL and Connect are driving many parts of your monitoring, you will want a dedicated tool that enriches your existing metrics/visualizations: https://www.confluent.io/product/control-center/
Note: Todo
Note: Todo
Note: Todo
Have a global overview of many clusters.
Note: Todo
Always a bit of a pain.