Skip to content

Latest commit

 

History

History
 
 

anomaly-detection

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 

Event pattern detection with Apache Flink and Pravega

Sample application which simulates network anomaly intrusion and detection using Apache Flink and Apache Pravega.

This application is based on streaming-state-machine which is slightly extended to demonstrate Pravega/Flink integration capabilities.

Events in streams (generated by devices and services, such as firewalls, routers, authentication services etc.,) are expected to occur in certain patterns. Any deviation from these patterns indicates an anomaly (attempted intrusion) that the streaming system should recognize and that should trigger an alert.

The event patterns are tracked per interacting party (here simplified per source IP address) and are validated by a state machine. The state machine's states define what possible events may occur next, and what new states these events will result in.

The final aggregated results are grouped under network id which acts as a network domain abstraction hosting multiple server machines.

The aggregated results are sinked to Elastic Search for visualizing from the Kibana user interface.

The following diagram depicts the state machine used in this example.

           +--<a>--> W --<b>--> Y --<e>---+
           |                    ^         |     +-----<g>---> TERM
   INITIAL-+                    |         |     |
           |                    |         +--> (Z)---<f>-+
           +--<c>--> X --<b>----+         |     ^        |
                     |                    |     |        |
                     +--------<d>---------+     +--------+

Getting Started

Building the code (instructions in the parent README.md) provides two main artifacts for this sample:

  • A JAR of the Flink application that can be uploaded to Nautilus: anomaly-detection/build/libs/anomaly-detection-0.1.0-SNAPSHOT-all.jar
  • A distribution directory that includes everything (minus Java) needed to run this application outside of Flink: anomaly-detection/build/install/anomaly-detection/

Application Configuration

This application is configured via command line arguments using Flink's ParameterTool.

  • mode: The mode to the Flink application in. One of processor or producer. Defaults to processor.

  • controller: The pravega controller endpoint to connect to.

  • stream: The pravega stream to use (<scope>/<name>). If not specified, defaults to examples/NetworkPacket.

  • producer-latency: How frequently events are generated and published to Pravega.

  • producer-error-factor: How frequently error records needs to be simulated. A value between 0.0 and 1.0.

  • producer-controlled: When this is true, the value of producer-error-factor will be ignored and error records will be generated when you press ENTER in a console.

  • pipeline-checkpoint-interval: Flink checkpointing interval in milliseconds.

  • pipeline-disable-checkpoint: Disable Flink checkpointing.

  • pipeline-watermark-offset: Window watermark offset interval.

  • pipeline-window-interval: Window frequency interval in seconds

The default argument values are optimized to run in Nautilus. Find a full reference and the default values in PipelineRunner::parseConfigurations.

Running in Nautilus (from the UI)

To run this example in Nautilus, you will have to create two applications:

  1. The anomaly detector application.
  2. An event producer of simulated network .

The launch steps below assume that you already have:

  • A Nautilus project and you are ready to create your application.
  • Built the samples and have the application JAR, such as anomaly-detection/build/libs/anomaly-detection-0.1.0-SNAPSHOT-all.jar.

1) Launch the Anomaly Detector

The first step is the createa and launch the application in the processor mode.

  • From the Analytics menu in Nautilus, make sure to select your project from the drop down.
  • Choose the Apps sub-navigation item and click on the + icon to start creating a new application.
  • Give your application a name and description.
  • In the Source section, select Upload Artifact option.
  • Choose a Maven Coordinate for your artifact. For example:
    • Group: io.pravega
    • Artifact: anomaly-detection
    • Version: 0.1
  • Choose the application JAR to launch (anomaly-detection-0.1.0-SNAPSHOT-all.jar)
  • Click on Upload
  • In the Configuration section:
    • Add a Stream property. Its Key should be stream. Choose an existing Pravega stream or create a new one with the Create Stream link.
    • Add any other properties described in Application Configuration above to customize the behavior as you wish. The application will work without any other additional arguments.
  • Click on Launch.

Notes:

  • Nautilus will automatically convert application properties into the ParameterTool arguments described in the Application Configuration section.
  • Nautilus will set the controller argument to the pravega endpoint being run in Nautilus.
  • The parallelism option in Nautilus sets the default application parallelism. This sample will make use of that argument when building up the job graph.
  • This application is also responsible for setting up the elastic search index and visualization.

2) Launch the Event Producer

Now that you have a the detector running, we need to write simulated events to the pravega stream that is being processed. There are two options to run the producer:

  1. Run it in Flink (less work).
  2. Run it in controlled mode on one of the Nautilus nodes (more work).

2.1) Run Producer in Flink

  • From the Analytics menu in Nautilus, make sure to select your project from the drop down.
  • Choose the Apps sub-navigation item and click on the + icon to start creating a new application.
  • Give your application a different name and description from the processor.
  • In the Source section, select the same artifact that you uploaded in Step #1.
  • In the Configuration section:
    • Add a Stream property. Its Key should be stream. Choose the same stream that was chosen or created in Step #1.
    • Add a regular Property. Its Key should be mode and set the value to producer
    • Add any other properties described in Application Configuration above to customize the behavior as you wish. The application will work without any other additional arguments.
  • Click on Launch.

2.2) Run Producer in Controlled Mode

Controlled mode allows you to choose when to inject invalid events by pressing ENTER in a console.

To run this sample in controlled mode in Nautilus, you need to run the distribution (anomaly-detection/build/install/anomaly-detection/) from within the cluster. Any of the Nautilus nodes will do.

From within this distribution directory, launch the application by running this command:

./bin/anomaly-detection --mode producer --controller tcp://controller.pravega.l4lb.thisdcos.directory:9091 --scope <SCOPE> --stream <STREAM> --producer-controlled true

Leave the program running for a while, to generate a few events per second for a number of simulated network clients. Press ENTER to simulate invalid events or CTRL-C to exit the producer.

3) Profit: Visualize Results in Kibana

Now that you have a processor and producer running, from Nautilus you can go to Infrastructure => Kibana to visualize the results.

Once in Kibana, click on Visualize => Anomaly Demo. Final results:

Kibana Screenshot

Running in Nautilus (from the Command Line)

First you should install the nautilus command line from the UI by selecting the "Download CLI" option from the user menu.

Setup

Before you can do anything with the CLI you need to target your Nautilus cluster and log in.

nautilus target --insecure [IP]
nautilus login

You either need to create a new project or lookup the ID of an existing project. This can be done by one of these commands.

# Create the project
export PROJECT_ID=$(nautilus project create anomaly --cpus 5 --mem 7)

# If you project already exists, look it up
export PROJECT_ID=$(nautilus project lookup anomaly)

Wait for the project deployment to complete. This may take a few minutes.

Create and Launch the App

Upload the anomaly detection artifact:

nautilus artifact upload io.pravega:anomaly-detection:0.1 anomaly-detection/build/libs/anomaly-detection-0.1.0-SNAPSHOT-all.jar

Create and launch the processor app

export PROCESSOR_ID=$(nautilus app create anomaly-processor)
nautilus app launch --artifact io.pravega:anomaly-detection:0.1 $PROCESSOR_ID

Create and launch the producer app

export PRODUCER_ID=$(nautilus app create anomaly-producer)
nautilus app launch --artifact io.pravega:anomaly-detection:0.1 -p mode=producer $PRODUCER_ID