These steps guide you through setting up your environment and running the clickstream analysis demo. For instructions using Docker, see this documentation.
Prerequisites:

- Confluent 3.3.0 installed locally
- Only one Kafka broker running on the host
- Elasticsearch
- Grafana
- Git
- Maven
- Java 1.8 or later (install an Oracle Java JRE or JDK >= 1.8 on your local machine)
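Before continuing, it can save time to confirm the command-line prerequisites are on your PATH. This is a small sketch written for this guide, not part of the demo; `have` is a local helper:

```shell
# Check that each command-line prerequisite is available on the PATH.
# have() is a local helper written for this guide, not a demo script.
have() { command -v "$1" >/dev/null 2>&1; }

for tool in git mvn java; do
  if have "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```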
- Clone the Confluent KSQL repository.

  $ git clone git@github.com:confluentinc/ksql.git
- Change directory to the ksql directory and compile the KSQL code.

  $ cd ksql
  $ mvn clean compile install -DskipTests
- Copy the Kafka Connect Elasticsearch null-filter jar (ksql/ksql-clickstream-demo/demo/connect-config/null-filter-4.0.0-SNAPSHOT.jar) to your Confluent installation share directory (confluent-3.3.0/share/java/kafka-connect-elasticsearch/).

  $ cp ksql-clickstream-demo/demo/connect-config/null-filter-4.0.0-SNAPSHOT.jar <path-to-confluent-3.3.0>/share/java/kafka-connect-elasticsearch/
- From your terminal, start the Confluent Platform. Kafka Connect should be running on its default port, 8083.

  $ <path-to-confluent-3.3.0>/bin/confluent start

  The output should resemble:

  Starting zookeeper
  zookeeper is [UP]
  Starting kafka
  kafka is [UP]
  Starting schema-registry
  schema-registry is [UP]
  Starting kafka-rest
  kafka-rest is [UP]
  Starting connect
  connect is [UP]
- From your terminal, start the Elasticsearch and Grafana servers. Elasticsearch should be running on its default port, 9200. Grafana should be running on its default port, 3000.
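The demo does not ship start commands for these two servers; how you launch them depends on your installation (for example, `elasticsearch -d` and `grafana-server` from their respective install directories). Once started, one way to confirm both are listening on the expected ports is a helper like this (`check_port` is written here for illustration and assumes curl is installed):

```shell
# Confirm Elasticsearch (9200) and Grafana (3000) answer HTTP requests.
# check_port is a local helper written for this guide, not a demo script.
check_port() {
  if curl -fs -o /dev/null "http://localhost:$1/" 2>/dev/null; then
    echo "port $1: up"
  else
    echo "port $1: down"
  fi
}

check_port 9200   # Elasticsearch
check_port 3000   # Grafana
```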
- From your terminal, create the clickstream data using the ksql-datagen utility. This stream will run continuously until you terminate it.

  Tip: Because of shell redirection, this command does not print a newline, so it might look like it's still in the foreground. The process is running as a daemon; just press return again to see the shell prompt.

  $ <path-to-ksql>/bin/ksql-datagen -daemon quickstart=clickstream format=json topic=clickstream maxInterval=100 iterations=500000

  Your output should resemble:

  Writing console output to /tmp/ksql-logs/ksql.out
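Since the datagen daemon logs to a file rather than the console, you can peek at its progress with a small helper (`show_log` is written here for illustration; the log path comes from the output above):

```shell
# Print the last few lines of the datagen log, if it exists yet.
# show_log is a local helper written for this guide, not a demo script.
show_log() {
  if [ -f "$1" ]; then
    tail -n 5 "$1"
  else
    echo "log not found: $1"
  fi
}

show_log /tmp/ksql-logs/ksql.out
```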
- From your terminal, create the status codes using the ksql-datagen utility. This stream runs once to populate the table.

  $ <path-to-ksql>/bin/ksql-datagen quickstart=clickstream_codes format=json topic=clickstream_codes maxInterval=20 iterations=100

  Your output should resemble:

  200 --> ([ 200 | 'Successful' ])
  302 --> ([ 302 | 'Redirect' ])
  200 --> ([ 200 | 'Successful' ])
  406 --> ([ 406 | 'Not acceptable' ])
  ...
- From your terminal, create a set of users using the ksql-datagen utility. This stream runs once to populate the table.

  $ <path-to-ksql>/bin/ksql-datagen quickstart=clickstream_users format=json topic=clickstream_users maxInterval=10 iterations=1000

  Your output should resemble:

  1 --> ([ 1 | 'GlenAlan_23344' | 1424796387808 | 'Curran' | 'Lalonde' | 'Palo Alto' | 'Gold' ])
  2 --> ([ 2 | 'ArlyneW8ter' | 1433932319457 | 'Oriana' | 'Vanyard' | 'London' | 'Platinum' ])
  3 --> ([ 3 | 'akatz1022' | 1478233258664 | 'Ferd' | 'Trice' | 'Palo Alto' | 'Platinum' ])
  ...
- Launch the KSQL CLI in local mode.

  $ <path-to-ksql>/bin/ksql-cli local

  You should see the KSQL CLI welcome screen.

                    ======================================
                    =      _  __ _____  ____  _          =
                    =     | |/ // ____|/ __ \| |         =
                    =     | ' /| (___ | |  | | |         =
                    =     |  <  \___ \| |  | | |         =
                    =     | . \ ____) | |__| | |____     =
                    =     |_|\_\_____/ \___\_\______|    =
                    =                                    =
                    =   Streaming SQL Engine for Kafka   =
  Copyright 2017 Confluent Inc.

  CLI v0.1, Server v0.1 located at http://localhost:9098

  Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!

  ksql>
- From the KSQL CLI, load the clickstream-schema.sql file that runs the demo app.

  Important: Before running this step, you must have already run the ksql-datagen utility to create the clickstream data, status codes, and set of users.

  ksql> run script 'ksql-clickstream-demo/demo/clickstream-schema.sql';

  The output should resemble:

   Message
  ------------------------------------
   Executing statement
  ksql>
- From the KSQL CLI, verify that the tables are created.

  ksql> list TABLES;

  Your output should resemble:

   Table Name                 | Kafka Topic                | Format | Windowed
  -----------------------------------------------------------------------------
   WEB_USERS                  | clickstream_users          | JSON   | false
   ERRORS_PER_MIN_ALERT       | ERRORS_PER_MIN_ALERT       | JSON   | true
   CLICKSTREAM_CODES_TS       | CLICKSTREAM_CODES_TS       | JSON   | false
   USER_IP_ACTIVITY           | USER_IP_ACTIVITY           | JSON   | true
   CLICKSTREAM_CODES          | clickstream_codes          | JSON   | false
   PAGES_PER_MIN              | PAGES_PER_MIN              | JSON   | true
   CLICK_USER_SESSIONS        | CLICK_USER_SESSIONS        | JSON   | true
   ENRICHED_ERROR_CODES_COUNT | ENRICHED_ERROR_CODES_COUNT | JSON   | true
   EVENTS_PER_MIN_MAX_AVG     | EVENTS_PER_MIN_MAX_AVG     | JSON   | true
   ERRORS_PER_MIN             | ERRORS_PER_MIN             | JSON   | true
   EVENTS_PER_MIN             | EVENTS_PER_MIN             | JSON   | true
- From the KSQL CLI, verify that the streams are created.

  ksql> list STREAMS;

  Your output should resemble:

   Stream Name               | Kafka Topic               | Format
  ----------------------------------------------------------------
   USER_CLICKSTREAM          | USER_CLICKSTREAM          | JSON
   EVENTS_PER_MIN_MAX_AVG_TS | EVENTS_PER_MIN_MAX_AVG_TS | JSON
   ERRORS_PER_MIN_TS         | ERRORS_PER_MIN_TS         | JSON
   EVENTS_PER_MIN_TS         | EVENTS_PER_MIN_TS         | JSON
   ENRICHED_ERROR_CODES      | ENRICHED_ERROR_CODES      | JSON
   ERRORS_PER_MIN_ALERT_TS   | ERRORS_PER_MIN_ALERT_TS   | JSON
   CLICK_USER_SESSIONS_TS    | CLICK_USER_SESSIONS_TS    | JSON
   PAGES_PER_MIN_TS          | PAGES_PER_MIN_TS          | JSON
   ENRICHED_ERROR_CODES_TS   | ENRICHED_ERROR_CODES_TS   | JSON
   USER_IP_ACTIVITY_TS       | USER_IP_ACTIVITY_TS       | JSON
   CUSTOMER_CLICKSTREAM      | CUSTOMER_CLICKSTREAM      | JSON
   CLICKSTREAM               | clickstream               | JSON
- From the KSQL CLI, verify that data is being streamed through various tables and streams.

  View the clickstream data:

  ksql> SELECT * FROM CLICKSTREAM LIMIT 5;

  Your output should resemble:

  1503585407989 | 222.245.174.248 | 1503585407989 | 24/Aug/2017:07:36:47 -0700 | 233.90.225.227 | GET /site/login.html HTTP/1.1 | 407 | 19 | 4096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  1503585407999 | 233.168.257.122 | 1503585407999 | 24/Aug/2017:07:36:47 -0700 | 233.173.215.103 | GET /site/user_status.html HTTP/1.1 | 200 | 15 | 14096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  1503585408009 | 222.168.57.122 | 1503585408009 | 24/Aug/2017:07:36:48 -0700 | 111.249.79.93 | GET /images/track.png HTTP/1.1 | 406 | 22 | 4096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  1503585408019 | 122.145.8.244 | 1503585408019 | 24/Aug/2017:07:36:48 -0700 | 122.249.79.233 | GET /site/user_status.html HTTP/1.1 | 404 | 6 | 4006 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  1503585408029 | 222.152.45.45 | 1503585408029 | 24/Aug/2017:07:36:48 -0700 | 222.249.79.93 | GET /images/track.png HTTP/1.1 | 200 | 29 | 14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
  LIMIT reached for the partition.
  Query terminated
  View the events per minute:

  ksql> SELECT * FROM EVENTS_PER_MIN_TS LIMIT 5;

  Your output should resemble:

  1503585450000 | 29^�8 | 1503585450000 | 29 | 19
  1503585450000 | 37^�8 | 1503585450000 | 37 | 25
  1503585450000 | 8^�8 | 1503585450000 | 8 | 35
  1503585450000 | 36^�8 | 1503585450000 | 36 | 14
  1503585450000 | 24^�8 | 1503585450000 | 24 | 22
  LIMIT reached for the partition.
  Query terminated
  View the pages per minute:

  ksql> SELECT * FROM PAGES_PER_MIN LIMIT 5;

  Your output should resemble:

  1503585475000 | 4 : Window{start=1503585475000 end=-} | 4 | 14
  1503585480000 | 25 : Window{start=1503585480000 end=-} | 25 | 9
  1503585480000 | 16 : Window{start=1503585480000 end=-} | 16 | 6
  1503585475000 | 25 : Window{start=1503585475000 end=-} | 25 | 20
  1503585480000 | 37 : Window{start=1503585480000 end=-} | 37 | 6
  LIMIT reached for the partition.
  Query terminated
- Go to your terminal and send the KSQL tables to Elasticsearch and Grafana.
- From your terminal, navigate to the demo directory:

  $ cd ksql-clickstream-demo/demo/
- Run this command to send the KSQL tables to Elasticsearch and Grafana:

  $ ./ksql-tables-to-grafana.sh

  Your output should resemble:

  Loading Clickstream-Demo TABLES to Confluent-Connect => Elastic => Grafana datasource
  Logging to: /tmp/ksql-connect.log
  Charting CLICK_USER_SESSIONS_TS
  Charting USER_IP_ACTIVITY_TS
  Charting CLICKSTREAM_STATUS_CODES_TS
  Charting ENRICHED_ERROR_CODES_TS
  Charting ERRORS_PER_MIN_ALERT_TS
  Charting ERRORS_PER_MIN_TS
  Charting EVENTS_PER_MIN_MAX_AVG_TS
  Charting EVENTS_PER_MIN_TS
  Charting PAGES_PER_MIN_TS
  Navigate to http://localhost:3000/dashboard/db/click-stream-analysis
- From your terminal, load the dashboard into Grafana.

  $ ./clickstream-analysis-dashboard.sh

  Your output should resemble:

  Loading Grafana ClickStream Dashboard
  {"slug":"click-stream-analysis","status":"success","version":1}
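As an optional check, you can ask Grafana's HTTP API whether the dashboard was registered instead of opening the browser. This is a sketch; `grafana_has_dashboard` is a helper written here (not a demo script), and it assumes Grafana 4.x on port 3000 with the default admin:admin credentials:

```shell
# Query Grafana's dashboard-by-slug endpoint; succeed if the dashboard exists.
# grafana_has_dashboard is a local helper, not part of the demo scripts.
grafana_has_dashboard() {
  curl -fs -u admin:admin \
    "http://localhost:${2:-3000}/api/dashboards/db/$1" >/dev/null 2>&1
}

if grafana_has_dashboard click-stream-analysis; then
  echo "dashboard loaded"
else
  echo "dashboard not found (is Grafana running?)"
fi
```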
- Go to your browser and view the Grafana output at http://localhost:3000/dashboard/db/click-stream-analysis. You can log in with user ID admin and password admin.

  Important: If you already have the Grafana UI open, you may need to enter the specific clickstream URL: http://localhost:3000/dashboard/db/click-stream-analysis.
Interesting things to try:

- Understand how the clickstream-schema.sql file is structured. We use a DataGen.KafkaTopic.clickstream -> Stream -> Table (for windows and analytics with group-by) -> Table (to add EVENT_TS for time-index) -> Elasticsearch/Connect topic.
- Run the LIST TOPICS; command to see where data is persisted.
- Run the KSQL CLI history command.
Troubleshooting:

- Docker must not be running on the host machine.
- Check that Elasticsearch is running: http://localhost:9200/.
- Check the Data Sources page in Grafana.
  - If your data source is shown, select it, scroll to the bottom, and click the Save & Test button. This will indicate whether your data source is valid.
  - If your data source is not shown, go to /ksql/ksql-clickstream-demo/demo/ and run ./ksql-tables-to-grafana.sh.
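The Grafana data-source check above can also be done from the command line via Grafana's HTTP API. This is a sketch; `list_datasources` is a helper written here (not a demo script), and it assumes Grafana on port 3000 with the default admin:admin credentials:

```shell
# Fetch the configured Grafana data sources as JSON.
# list_datasources is a local helper written for this guide.
list_datasources() {
  curl -fs -u admin:admin "http://localhost:${1:-3000}/api/datasources" 2>/dev/null
}

list_datasources || echo "could not reach Grafana (is it running on port 3000?)"
```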