Polygraph quantifies the number of anomalies produced by an application.
Yazeed Alabdulkarim, Marwan Almaymoni, and Shahram Ghandeharizadeh
- Quantifies the number of anomalies produced by an application.
- Operates in either an online or an offline setting.
- Takes as input log records stored in Kafka or in a directory of files.
- Consists of three sub-projects: Authoring, Validation, and Monitoring.
Polygraph is data-store agnostic and produces no false positives when quantifying the number of anomalies produced by an application and all its components. The anomalies produced by software and hardware limitations and by design choices are categorized as violations of the isolation, linearizability, and atomicity properties of transactions. See database lab technical report 2017-02 for details.
Polygraph consists of three distinct phases: Authoring, Validation, and Monitoring. During the authoring phase, the experimentalist uses Polygraph's visualization tool to identify the entity and relationship sets of an application, the actions that constitute each transaction, and the entity/relationship sets those actions reference. An action may insert or delete one or more entities/relationships, and/or read or update one or more attribute values of an entity/relationship. For example, the payment transaction of the TPC-C benchmark reads and updates a customer entity.
Authoring produces three outputs. First, it generates a transaction-specific code snippet to be embedded in each transaction. These snippets generate a log record for each executed transaction and push the log records to a distributed framework for processing by a cluster of Polygraph servers. Second, it generates a configuration file for the Polygraph servers, customized to process these log records. Third, it identifies how the log records should be partitioned for parallel processing, including both the Kafka topics and the partitioning attributes of the log records.
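To make "one log record per executed transaction, partitioned by an attribute" concrete, here is a minimal sketch. It assumes a log record carries a transaction name and id, start/end timestamps, and the actions it performed, and that records are routed by hashing a partitioning attribute value. `LogRecord`, `Action`, `serialize`, `partitionFor`, and the comma-separated format are all hypothetical, not Polygraph's actual classes or wire format.

```java
import java.util.List;

/** Hypothetical sketch: one log record per executed transaction, routed to a partition by key. */
public class LogRecordSketch {

    /** One action on a single entity/relationship; type is "R" (read) or "U" (update) in this sketch. */
    record Action(String type, String entitySet, String key) {}

    /** Hypothetical log record: transaction name and id, start/end timestamps, and its actions. */
    record LogRecord(String txnName, String txnId, long start, long end, List<Action> actions) {

        /** Serializes the record as one comma-separated line (the format is an assumption). */
        String serialize() {
            StringBuilder sb = new StringBuilder()
                    .append(txnName).append(',').append(txnId).append(',')
                    .append(start).append(',').append(end);
            for (Action a : actions) {
                sb.append(',').append(a.type()).append(':')
                  .append(a.entitySet()).append(':').append(a.key());
            }
            return sb.toString();
        }
    }

    /** Maps a partitioning attribute value to a partition in [0, numPartitions). */
    static int partitionFor(String partitioningValue, int numPartitions) {
        // Math.floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(partitioningValue.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // TPC-C payment reads and then updates one customer entity.
        LogRecord r = new LogRecord("Payment", "t42", 1000L, 1005L, List.of(
                new Action("R", "Customer", "1-1-7"),
                new Action("U", "Customer", "1-1-7")));
        System.out.println(r.serialize());
        // Partition by the customer key so records touching the same customer land together.
        System.out.println("partition " + partitionFor("Customer:1-1-7", 10));
    }
}
```

Hashing a key that all related records share is one common way to keep records about the same entity in one partition; which attributes Polygraph actually uses is decided by the Authoring step.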
During the deployment phase, the experimentalist deploys (1) the Kafka brokers and ZooKeeper servers, and creates the topics output by the Authoring step, (2) the Polygraph servers, using the configuration files output by the Authoring step, and (3) the application, whose transactions are extended with the code snippets output by the Authoring step.
During the monitoring phase, the experimentalist uses Polygraph's monitoring tool to view the read transactions that produce anomalies. For each such transaction, Polygraph shows the transactions that read or wrote the entities/relationships referenced by the violating transaction.
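Finding "the transactions that read or wrote the entities/relationships referenced by the violating transaction" amounts to a set-intersection filter over the log. The sketch below assumes each transaction is summarized by its id and the set of keys it referenced; `Txn` and `related` are hypothetical names, and Polygraph's real bookkeeping may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: find transactions that referenced the same data as a violating one. */
public class RelatedTransactions {

    /** Minimal view of an executed transaction: its id and the entity/relationship keys it read or wrote. */
    record Txn(String id, Set<String> keys) {}

    /** Ids of other transactions sharing at least one key with the violating transaction, in log order. */
    static List<String> related(Txn violating, List<Txn> log) {
        List<String> out = new ArrayList<>();
        for (Txn t : log) {
            if (t.id().equals(violating.id())) {
                continue; // skip the violating transaction itself
            }
            if (!Collections.disjoint(t.keys(), violating.keys())) {
                out.add(t.id());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Txn violating = new Txn("t3", Set.of("Customer:1-1-7"));
        List<Txn> log = List.of(
                new Txn("t1", Set.of("Customer:1-1-7", "Warehouse:1")),
                new Txn("t2", Set.of("Stock:1-55")),
                violating);
        System.out.println(related(violating, log)); // [t1]
    }
}
```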
Below, we describe how to deploy the Authoring, Validation, and Monitoring components.
Polygraph Authoring generates code snippets that extend transactions to create and publish log records. It currently supports Java.
Getting Started
- Download the Polygraph Authoring code.
- Import it as a Java EE project.
- Run the main page, "app.jsp". This step may require configuring a Tomcat server.
- Specify the application name.
- Click "Edit ER" to provide the application's ER diagram.
- When finished, click "Save ER".
- Click the Authoring link in the left corner to return to the main page.
- Click the "Add Transaction" button to add a transaction.
- Specify the transaction name.
- Click the entity/relationship sets (in the ER canvas on the right) referenced by the transaction.
- For each referenced entity/relationship set:
  - Specify the number of entities/relationships referenced.
  - Specify the action type, such as read or update.
  - Specify the name of the variable holding the primary key value.
  - For each referenced property, specify the name of the variable holding its value and the action type.
- Click "Save".
- Click the Authoring link in the left corner to return to the main page.
- Click "Generate Code" to generate the code snippets.
- Include the source files in the "common" folder along with the generated code snippets in your application.
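The steps above tie each snippet to variables the transaction already holds: one for the primary key and one per referenced property. The sketch below shows what such an extended transaction could look like. Everything in it is hypothetical: `payment`, `LogSink`, `InMemorySink`, and the record layout are illustrations, not the code Polygraph generates, and the real snippets publish to Kafka rather than to an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a transaction extended with a log-record snippet. */
public class PaymentWithSnippet {

    /** Where log records go; the generated snippets publish to Kafka (this interface is hypothetical). */
    interface LogSink {
        void publish(String record);
    }

    /** In-memory sink so the sketch runs without Kafka. */
    static final class InMemorySink implements LogSink {
        final List<String> records = new ArrayList<>();

        @Override
        public void publish(String record) {
            records.add(record);
        }
    }

    /**
     * Hypothetical payment transaction. customerId holds the primary key value and balance /
     * newBalance hold the property values, mirroring the variable names specified during Authoring.
     */
    static double payment(String customerId, double balance, double amount, LogSink sink) {
        long start = System.currentTimeMillis(); // snippet: timestamp before the transaction body
        double newBalance = balance - amount;     // application logic (database calls elided)
        long end = System.currentTimeMillis();    // snippet: timestamp after commit
        // snippet: one log record with the value read and the value written for Customer.balance
        sink.publish("Payment," + start + "," + end + ",Customer," + customerId
                + ",R:balance=" + balance + ",U:balance=" + newBalance);
        return newBalance;
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        double after = payment("1-1-7", 100.0, 10.0, sink);
        System.out.println(after);
        System.out.println(sink.records.get(0));
    }
}
```

Keeping the sink behind an interface lets the same transaction code run against Kafka in deployment and against an in-memory list in a unit test.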
The Polygraph Validation component processes log records and has the following configuration parameters:

| Parameter | Description | Example |
|---|---|---|
| app | The application name, which is also the name of the generated Kafka topic | -app TPCC |
| er | The JSON file describing the entity/relationship sets of the application | -er /home/yaz/erFile |
| numvalidators | The total number of validation threads across all Polygraph servers | -numvalidators 3 |
| numpartitions | The number of partitions for the topic | -numpartitions 10 |
| numclients | The number of Polygraph servers | -numclients 1 |
| clientid | The id of this Polygraph server, in [0, numclients-1] | -clientid 0 |
| kafka | Boolean flag: if true, Polygraph processes log records from Kafka; if false, from a directory of files | -kafka true |
| filelogdir | The directory containing the log records to be processed (requires kafka set to false) | -filelogdir /home/yaz/dir |
| online | Boolean flag: if true, Polygraph servers run in online mode (requires kafka set to true); if false, in offline mode | -online true |
| printfreq | Print a statistics message every printfreq reads | -printfreq 1000 |
| kafkahosts | Kafka host string: IP1:port1,IP2:port2,... | -kafkahosts 10.0.0.127:9298 |
| zookhosts | ZooKeeper host string: IP1:port1,IP2:port2,... | -zookhosts 10.0.0.127:2128 |
| buffer | The number of log records to buffer in memory from each Kafka partition (requires kafka set to true) | -buffer 100 |
| freshness | Boolean flag: if true, Polygraph servers compute freshness confidence | -freshness true |
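Several parameters in the table depend on each other (for example, filelogdir only makes sense with kafka set to false, and online requires kafka set to true). A small pre-flight check can catch such mistakes before launching a server. This is a hypothetical helper, not part of Polygraph: the default values it assumes for missing flags, and the extra rule that Kafka mode needs kafkahosts, are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check for Polygraph Validation parameters (not part of Polygraph). */
public class ValidationArgsCheck {

    /** Parses "-name value" pairs into a map keyed by parameter name (without the dash). */
    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String name = args[i];
            if (!name.startsWith("-") || name.length() == 1) {
                throw new IllegalArgumentException("Expected -name, got: " + name);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + name);
            }
            params.put(name.substring(1), args[i + 1]);
        }
        return params;
    }

    /** Returns the violated constraints from the parameter table (empty list if none). */
    static List<String> check(Map<String, String> p) {
        List<String> problems = new ArrayList<>();
        // Defaults below are assumptions for this sketch, not Polygraph's documented defaults.
        boolean kafka = Boolean.parseBoolean(p.getOrDefault("kafka", "false"));
        boolean online = Boolean.parseBoolean(p.getOrDefault("online", "false"));
        int numClients = Integer.parseInt(p.getOrDefault("numclients", "1"));
        int clientId = Integer.parseInt(p.getOrDefault("clientid", "0"));

        if (clientId < 0 || clientId >= numClients) problems.add("clientid must be in [0, numclients-1]");
        if (p.containsKey("filelogdir") && kafka) problems.add("filelogdir requires -kafka false");
        if (online && !kafka) problems.add("-online true requires -kafka true");
        if (p.containsKey("buffer") && !kafka) problems.add("buffer requires -kafka true");
        if (kafka && !p.containsKey("kafkahosts")) problems.add("-kafka true requires -kafkahosts");
        return problems;
    }

    public static void main(String[] args) {
        String[] example = {"-app", "tpcc", "-numclients", "1", "-clientid", "0",
                "-kafka", "true", "-online", "true", "-kafkahosts", "127.0.0.1:9298"};
        System.out.println(check(parse(example))); // []
    }
}
```

Because the parser requires every name to start with a dash followed by at least one character, a stray space between the dash and a parameter name is reported instead of silently shifting all later name/value pairs.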
Getting Started
- Download the Polygraph Validation code.
- Import the project as a Java project.
- To use Polygraph with Kafka:
  - Launch the Kafka brokers and ZooKeeper (see the Kafka website).
  - Create the application topic with the desired number of partitions. The number of partitions must be twice numpartitions (the Polygraph configuration parameter).
- Launch your application, extended with the code snippets, to generate log records.
- Launch the Polygraph "ValidationMain" class, configuring the parameters above. For example, you may use the following parameters:

-numvalidators 1 -app tpcc -printfreq 1000 -buffer 5000 -numpartitions 1 -numclients 1 -clientid 0 -kafkahosts 127.0.0.1:9298 -zookhosts 127.0.0.1:2128 -er erFilePath -kafka true
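The partition rule and the per-server parameters can be computed rather than typed by hand. The sketch below assumes that, when running several Polygraph servers, every server receives the same settings except -clientid. That assumption is inferred from the parameter table, and `LaunchPlan`, `topicPartitions`, and `argsFor` are hypothetical helpers, not Polygraph code.

```java
import java.util.List;

/** Hypothetical helper: topic size and per-server ValidationMain arguments. */
public class LaunchPlan {

    /** Topic partitions required for a given numpartitions value (twice numpartitions). */
    static int topicPartitions(int numPartitions) {
        return numPartitions * 2;
    }

    /** ValidationMain arguments for the Polygraph server with the given clientId. */
    static List<String> argsFor(int clientId, int numClients, int numValidators, int numPartitions,
                                String app, String erPath, String kafkaHosts, String zookHosts) {
        if (clientId < 0 || clientId >= numClients) {
            throw new IllegalArgumentException("clientid must be in [0, numclients-1]");
        }
        return List.of(
                "-app", app,
                "-er", erPath,
                "-numvalidators", Integer.toString(numValidators),
                "-numpartitions", Integer.toString(numPartitions),
                "-numclients", Integer.toString(numClients),
                "-clientid", Integer.toString(clientId),
                "-kafka", "true",
                "-kafkahosts", kafkaHosts,
                "-zookhosts", zookHosts);
    }

    public static void main(String[] args) {
        int numPartitions = 1;
        int numClients = 2;
        System.out.println("topic partitions: " + topicPartitions(numPartitions));
        for (int id = 0; id < numClients; id++) {
            System.out.println(String.join(" ", argsFor(id, numClients, 2, numPartitions,
                    "tpcc", "erFilePath", "127.0.0.1:9298", "127.0.0.1:2128")));
        }
    }
}
```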
Polygraph Monitoring visualizes anomalous transactions.
Getting Started
- Download the Polygraph Monitoring code.
- Import it as a Java EE project.
- Run the main page, "visual.jsp". This step may require configuring a Tomcat server.
- Specify the topic name and the Kafka and ZooKeeper hosts.
- Click "visualize".
- The left panel shows anomalies grouped by the entities/relationships they reference, and the right panel shows them grouped by transaction. If the Polygraph servers computed freshness confidence, its graph is shown at the bottom.
- Clicking an entity/relationship or a transaction name shows the anomalies referencing it.
- Click a transaction id in the left panel to visualize the anomaly.
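The two Monitoring panels are two indexes over the same set of anomalies: one keyed by referenced entity/relationship, one keyed by transaction name. The sketch below builds both, assuming each anomaly is summarized by a transaction id, a transaction name, and the keys it referenced. `Anomaly`, `byEntity`, and `byTransaction` are hypothetical names, not the Monitoring tool's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the two Monitoring views: anomalies by entity and by transaction. */
public class AnomalyIndex {

    /** One anomalous read transaction: its id, its name, and the entity/relationship keys it referenced. */
    record Anomaly(String txnId, String txnName, List<String> references) {}

    /** Entity/relationship view: key -> ids of anomalous transactions referencing it. */
    static Map<String, List<String>> byEntity(List<Anomaly> anomalies) {
        Map<String, List<String>> index = new TreeMap<>(); // sorted for a stable display order
        for (Anomaly a : anomalies) {
            for (String ref : a.references()) {
                index.computeIfAbsent(ref, k -> new ArrayList<>()).add(a.txnId());
            }
        }
        return index;
    }

    /** Transaction view: transaction name -> ids of its anomalous executions. */
    static Map<String, List<String>> byTransaction(List<Anomaly> anomalies) {
        Map<String, List<String>> index = new TreeMap<>();
        for (Anomaly a : anomalies) {
            index.computeIfAbsent(a.txnName(), k -> new ArrayList<>()).add(a.txnId());
        }
        return index;
    }

    public static void main(String[] args) {
        List<Anomaly> anomalies = List.of(
                new Anomaly("t7", "OrderStatus", List.of("Customer:1-1-7", "Order:1-1-30")),
                new Anomaly("t9", "StockLevel", List.of("Stock:1-55")),
                new Anomaly("t12", "OrderStatus", List.of("Customer:1-1-7")));
        System.out.println(byEntity(anomalies));
        System.out.println(byTransaction(anomalies));
    }
}
```

Clicking an entry in either panel then corresponds to a single map lookup.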