
Beyond outliers and on to micro-clusters: Vision-guided Anomaly Detection


wenchieh/eaglemine


EagleMine


Implementation of "Beyond Outliers and on to Micro-clusters: Vision-Guided Anomaly Detection" by Wenjie Feng et al. (2019).

Feng, W., Liu, S., Faloutsos, C., Hooi, B., Shen, H. and Cheng, X.
Beyond Outliers and on to Micro-clusters: Vision-Guided Anomaly Detection. 
In Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2019 (pp. 541-554). Springer, Cham.

About

EagleMine is a novel tree-based mining approach that recognizes and summarizes micro-clusters in a heatmap. Its main features are:

  • automatic summarization: automatically summarizes the heatmap derived from correlated graph features, and recognizes node groups that form disjoint dense areas, as human vision does;
  • effectiveness: detects interpretable groups and outperforms the baselines in both quantitative (i.e., code length of the compact model description) and qualitative (i.e., consistency with vision-based judgment) comparisons;
  • anomaly detection: spots, and even explains, anomalies in real data by identifying suspicious micro-clusters, achieving higher accuracy than state-of-the-art methods;
  • scalability: runs in nearly linear time in the number of graph nodes, and can handle more correlated features in multi-dimensional space.

Inspired by the mechanisms of the human visual and cognitive systems, EagleMine detects and summarizes micro-clusters (dense blocks) in the heatmap using a hierarchical tree structure (WaterLevelTree), and reports a suspiciousness score for each micro-cluster based on its deviation from the norm.

For a large graph, the heatmap can be constructed from correlated features of its nodes. Micro-clusters then correspond to node groups, some of which deviate from the majority and are likely to contain anomalous or suspicious objects.

Correlated node features include (in-/out-) degree, number of triangles, PageRank, hubness/authority, coreness, etc.
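To make the heatmap idea concrete, here is a minimal sketch that bins nodes by two features on log-log axes with NumPy's histogram2d. The log scaling, the default bin count, and the function name feature_heatmap are assumptions for illustration; this is not the repository's run_graphfeature_histogram.py.

```python
import numpy as np


def feature_heatmap(x, y, bins=50):
    """Return (counts, x_edges, y_edges) for a log-log heatmap of two node features.

    x, y: one value per node (e.g. out-degree and hubness). Hypothetical helper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # log10 is undefined at 0, so keep only nodes with positive values in both features.
    keep = (x > 0) & (y > 0)
    lx, ly = np.log10(x[keep]), np.log10(y[keep])
    # Each cell counts how many nodes fall into that (log x, log y) bin.
    counts, x_edges, y_edges = np.histogram2d(lx, ly, bins=bins)
    return counts.astype(int), x_edges, y_edges


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    deg = rng.pareto(2.0, 10000) + 1                 # synthetic heavy-tailed "degree"
    score = deg * rng.lognormal(0.0, 0.3, 10000)     # synthetic correlated second feature
    H, xe, ye = feature_heatmap(deg, score, bins=40)
    print(H.shape, H.sum())
```

Dense cells in H are the candidate regions that a method like EagleMine would then try to group into micro-clusters.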


Datasets

Download links for the datasets used in the paper are available online.

  • Amazon ratings graph.
  • Android App rating graph.
  • Beer Advocate graph. [Regrettably, no longer available, per request.]
  • Tagged graph.
  • Yelp graph.
  • Youtube graph.

Dataset statistics

Name         | Content                 | Size                 | #Edges          | Graph      | Download
------------ | ----------------------- | -------------------- | --------------- | ---------- | --------
Amazon       | User × Item × Rating    | 2.14M × 1.23M × 5    | 5.84M           | bipartite  | Link
Android      | User × App × Rating     | 1.32M × 61.27K × 5   | 2.64M           | bipartite  | Link
BeerAdvocate | User × Beer × Rating    | 33.37K × 65.91K × 4  | 1.57M           | bipartite  | -
Tagged       | User × User × Relation  | 2.73M × 4.65M        | 858M (partial)  | unipartite | Link
Yelp         | User × Business × Rating| 686K × 85.5K × 5     | 2.68M           | bipartite  | Link
Youtube      | User × User             | 3.22M × 3.22M        | 9.37M           | unipartite | Link

Environment

The current version supports only Python 2.7. (⚠)

To install the required dependencies, please type

./install_libs.sh

Building and Running EagleMine

Please see the User Guide.


Running Demo

Graph Analysis

To run the demo from scratch, starting with graph feature extraction, please type

make

In brief:

  • First, run_graphfeature_histogram.py generates a histogram H of the out-degree vs. hubness node features of the example graph G;
  • Then, EagleMine (run_eaglemine.py) takes H as input, constructs the WaterLevelTree T, identifies and summarizes the micro-clusters C in H, and measures the suspiciousness of each micro-cluster in C;
  • Finally, the view tools in run_eaglemine_view.py visualize the micro-cluster detection result and the model description (with the DTM Gaussian vocabulary) for heatmap H.
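The "water level" intuition behind WaterLevelTree can be pictured as flooding the heatmap: at each level, cells whose count exceeds the level form islands, and raising the level splits islands into smaller, denser ones. The sketch below is a simplified illustration of that idea only, assuming 4-connectivity and a user-supplied list of levels; it omits the tree refinement (contract, prune, expand), TreeExplore, and the suspiciousness measure, and the names islands and water_level_tree are hypothetical, not the repository's API.

```python
import numpy as np
from collections import deque


def islands(H, level):
    """Return 4-connected groups of cells whose count exceeds `level`.

    Each group (an island above the water) is a frozenset of (row, col) cells.
    """
    above = np.asarray(H) > level
    seen = np.zeros(above.shape, dtype=bool)
    rows, cols = above.shape
    groups = []
    for r in range(rows):
        for c in range(cols):
            if not above[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill starting from this seed cell.
            seen[r, c] = True
            queue, cells = deque([(r, c)]), []
            while queue:
                i, j = queue.popleft()
                cells.append((i, j))
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if 0 <= ni < rows and 0 <= nj < cols and above[ni, nj] and not seen[ni, nj]:
                        seen[ni, nj] = True
                        queue.append((ni, nj))
            groups.append(frozenset(cells))
    return groups


def water_level_tree(H, levels):
    """Link islands found at increasing water levels into a tree.

    Returns a list of (level, island, parent) tuples, where `parent` is the index
    of the enclosing island at the previous level (None at the lowest level).
    """
    nodes, prev = [], []
    for level in sorted(levels):
        current = []
        for isl in islands(H, level):
            # A higher level only removes cells, so each island lies inside
            # exactly one island from the level below.
            parent = next((k for k, p in prev if isl <= p), None)
            nodes.append((level, isl, parent))
            current.append((len(nodes) - 1, isl))
        prev = current
    return nodes


if __name__ == "__main__":
    H = np.array([[0, 1, 0, 0],
                  [1, 5, 0, 3],
                  [0, 1, 0, 4],
                  [0, 0, 0, 0]])
    for level, isl, parent in water_level_tree(H, [0, 2]):
        print(level, sorted(isl), parent)
```

In this toy heatmap, the two islands at level 0 each keep one denser core at level 2, and those cores become their children in the tree.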

To extract correlated node features for the example graph, please type

make graph

run_graph_feature.py takes a directed graph in edge-list format as input and extracts out-/in-degree and hubness/authority features for each node. For an undirected unipartite graph (e.g., Youtube), it can extract degree and PageRank features.
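For readers who want to see what these features are, the following pure-NumPy sketch computes out-/in-degree with HITS hubness/authority for a directed edge list, and degree with PageRank for an undirected one, both by power iteration. It assumes integer node ids in [0, n), a fixed iteration count, and a damping factor of 0.85 (a common default, assumed here); the names directed_features and undirected_features are hypothetical, and this is not necessarily how run_graph_feature.py computes them.

```python
import numpy as np


def directed_features(src, dst, n, iters=50):
    """Out-/in-degree plus HITS hubness/authority for directed edges src[k] -> dst[k]."""
    src, dst = np.asarray(src), np.asarray(dst)
    out_deg = np.bincount(src, minlength=n)
    in_deg = np.bincount(dst, minlength=n)
    hub = np.ones(n)
    for _ in range(max(iters, 1)):
        # authority(v) = sum of hub scores of the nodes pointing to v
        auth = np.bincount(dst, weights=hub[src], minlength=n)
        auth /= np.linalg.norm(auth) or 1.0
        # hub(u) = sum of authority scores of the nodes u points to
        hub = np.bincount(src, weights=auth[dst], minlength=n)
        hub /= np.linalg.norm(hub) or 1.0
    return out_deg, in_deg, hub, auth


def undirected_features(u, v, n, d=0.85, iters=100):
    """Degree plus PageRank for undirected edges (u[k], v[k])."""
    # Treat every undirected edge as two directed ones.
    src = np.concatenate([np.asarray(u), np.asarray(v)])
    dst = np.concatenate([np.asarray(v), np.asarray(u)])
    deg = np.bincount(src, minlength=n)
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        # Each node splits its rank evenly among its neighbours.
        share = pr / np.maximum(deg, 1)
        received = np.bincount(dst, weights=share[src], minlength=n)
        # Rank held by isolated nodes is spread uniformly so the total stays 1.
        dangling = pr[deg == 0].sum()
        pr = (1 - d) / n + d * (received + dangling / n)
    return deg, pr


if __name__ == "__main__":
    out_deg, in_deg, hub, auth = directed_features([0, 0, 3], [1, 2, 1], n=4)
    deg, pr = undirected_features([0, 1, 2], [1, 2, 0], n=4)
    print(out_deg, in_deg, hub.round(3), auth.round(3))
    print(deg, pr.round(3))
```

Working on edge arrays with np.bincount keeps memory linear in the number of edges; for very large graphs, a dedicated graph library is still the more practical route.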

Note: src/tools/graph.py provides interfaces for some common graph features. For very large graphs, users can extract features with more powerful graph analysis tools such as GraphLab; src/tools/large_graph.py gives a simple example.

Heatmap demo

To run the heatmap analysis demo, please type

./demo

EagleMine detects micro-clusters in the given example heatmap (histogram); please see run_eaglemine.py for an explanation of its parameters.


NOTE

👉 The run_*.py scripts contain detailed usage information and parameter explanations.

  • run_eaglemine.py:
    The EagleMine algorithm, consisting of WaterLevelTree, TreeExplore, and the suspiciousness measure.
  • run_eaglemine_view.py:
    Visualization tools for EagleMine's detection and model description results.
  • run_waterleveltree.py:
    The WaterLevelTree algorithm: constructs the raw tree and refines its structure (contract, prune, and expand).
  • run_graph_feature.py:
    Simple tools for extracting correlated node features. Bipartite graph: out-/in-degree and hubness/authority;
    unipartite graph: degree and PageRank.
  • run_graphfeature_histogram.py:
    Tools for constructing histograms of graph features.

Reference

If you use this code as part of any published research, please cite the following paper.

@inproceedings{feng2019beyond,
  title={Beyond Outliers and on to Micro-clusters: Vision-Guided Anomaly Detection},
  author={Feng, Wenjie and Liu, Shenghua and Faloutsos, Christos and Hooi, Bryan and Shen, Huawei and Cheng, Xueqi},
  booktitle={The 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining},
  pages={541--554},
  year={2019},
  organization={Springer}
}
