Implementation of _Beyond Outliers and on to Micro-clusters: Vision-Guided Anomaly Detection_ by Wenjie Feng et al. (2019).
Feng, W., Liu, S., Faloutsos, C., Hooi, B., Shen, H. and Cheng, X.
Beyond Outliers and on to Micro-clusters: Vision-Guided Anomaly Detection.
In Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2019 (pp. 541-554). Springer, Cham.
EagleMine is a novel tree-based mining approach to recognize and summarize the micro-clusters in the heatmap. EagleMine:
- automatic summarization: automatically summarizes the heatmap derived from correlated graph features, and recognizes node groups forming disjointed dense areas as human vision does;
- effectiveness: detects interpretable groups, and outperforms the baselines, achieving better performance both in quantitative (i.e., code length for compact model description) and qualitative (i.e., consistent with vision-based judgment) comparisons;
- anomaly detection: spots and even explains anomalies on real data by identifying suspicious micro-clusters, and achieves higher accuracy than state-of-the-art methods;
- scalability: runs in nearly linear time in the number of graph nodes, and can handle more correlated features in multi-dimensional space.
Inspired by the mechanisms of human vision and cognition, EagleMine detects and summarizes micro-clusters (dense blocks) in the heatmap with a hierarchical tree structure (WaterLevelTree), and reports a suspiciousness score for each micro-cluster (based on its deviation from the normal).
For a large graph, the heatmap can be constructed from correlated features of the graph nodes. The micro-clusters then correspond to node groups, some of which deviate from the majority and contain anomalous / suspicious objects with high probability.
Correlated features of graph nodes can be: (in / out) Degree, # Triangle, PageRank, Hubness / Authority, Coreness, etc.
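To make the heatmap construction above concrete, here is a minimal sketch that bins two node features into a 2-D histogram. It is only an illustration: the function name `build_heatmap`, the `bins_per_decade` parameter, and the choice of log-scale binning are assumptions, not the interface of the scripts in this repository.

```python
import math
from collections import Counter


def build_heatmap(feature_pairs, bins_per_decade=10):
    """Count nodes per log-scale cell for two positive node features.

    feature_pairs: iterable of (x, y) values, e.g. (out-degree, hubness).
    Returns a Counter mapping (x_bin, y_bin) -> number of nodes.
    Pairs with a non-positive value are skipped because log10 is undefined.
    """
    heatmap = Counter()
    for x, y in feature_pairs:
        if x <= 0 or y <= 0:
            continue
        cell = (int(math.floor(math.log10(x) * bins_per_decade)),
                int(math.floor(math.log10(y) * bins_per_decade)))
        heatmap[cell] += 1
    return heatmap


# Toy input (made up): three nodes share a cell, one lies far away,
# and one is skipped because its first feature is zero.
pairs = [(2, 3), (2, 3), (2, 3), (150, 1500), (0, 5)]
heatmap = build_heatmap(pairs, bins_per_decade=1)
# heatmap[(0, 0)] == 3 and heatmap[(2, 3)] == 1
```

Each key of the returned mapping is one heatmap cell, and its count is the number of nodes that fall into it; dense regions of such cells are what the micro-cluster detection operates on.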
Download links for the datasets used in the paper are available online.
- Amazon ratings graph.
- Android App rating graph.
- Beer Advocate graph. [REGRET: No longer available as per request]
- Tagged graph.
- Yelp graph.
- Youtube graph.
Name | Content | Size | #Edge | Graph | Download |
---|---|---|---|---|---|
Amazon | User X Item X Ratings | 2.14M X 1.23M X 5 | 5.84M | bipartite | Link |
Android | User X Apps X Ratings | 1.32M X 61.27K X 5 | 2.64M | bipartite | Link |
BeerAdvocate | User X Beer X Rating | 33.37K X 65.91K X 4 | 1.57M | bipartite | - |
Tagged | User X User X Relation | 2.73M X 4.65M | 858M (partial) | unipartite | Link |
Yelp | User X Business X Rating | 686K X 85.5K X 5 | 2.68M | bipartite | Link |
Youtube | User X User | 3.22M X 3.22M | 9.37M | unipartite | Link |
Python 2.7 is the only version supported by the current release. (⚠)
To install required dependencies, please type
./install_libs.sh
Please see User Guide
To run the demo from scratch, starting from graph features, please type
make
In brief:
- First, `run_graphfeature_histogram.py` generates the histogram H of the out-degree vs. hubness node features of the example graph G.
- Then, EagleMine (`run_eaglemine.py`) takes the histogram H as input, constructs the WaterLevelTree T, identifies and summarizes the micro-clusters C in H, and measures the suspiciousness of each element in C.
- Finally, the view tools in `run_eaglemine_view.py` visualize the micro-cluster detection result and the model description result (with the DTM Gaussian vocabulary) for the heatmap H.
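One way to picture how a tree of dense blocks can be derived from the histogram is to raise a count threshold (the "water level") and track the connected groups of cells that stay above it. The sketch below shows only that thresholding step, under the assumption that this is how the levels of the tree arise. It is not the code in `run_waterleveltree.py`; the function name `islands_at_level`, the 8-neighbourhood, and the toy data are all hypothetical.

```python
def islands_at_level(heatmap, level):
    """Return connected groups of cells whose count is >= level.

    heatmap: dict mapping (row, col) -> count.
    Cells are neighbours when they touch horizontally, vertically,
    or diagonally (an assumed 8-neighbourhood).
    """
    remaining = set(cell for cell, count in heatmap.items() if count >= level)
    islands = []
    while remaining:
        start = remaining.pop()
        island, stack = {start}, [start]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbour = (r + dr, c + dc)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        island.add(neighbour)
                        stack.append(neighbour)
        islands.append(island)
    return islands


# Toy heatmap (made up): a dense block joined to a smaller block
# by a single low-count "bridge" cell at (2, 2).
toy = {(0, 0): 9, (0, 1): 8, (1, 0): 8, (1, 1): 7,
       (2, 2): 1,
       (3, 3): 5, (3, 4): 4}

sizes_by_level = {
    level: sorted(len(island) for island in islands_at_level(toy, level))
    for level in (1, 2, 6)
}
# sizes_by_level == {1: [7], 2: [2, 4], 6: [4]}
```

At level 1 everything forms one island; at level 2 the bridge is submerged and the island splits into two; at level 6 only the densest block remains. Recording which island each group descends from as the level rises gives a hierarchy of the kind a tree can represent.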
To extract correlated node features for the example graph, please type
make graph
`run_graph_feature.py` takes a directed graph in edgelist format as input and extracts the out- / in-degree and hubness / authority features of each node.
For an undirected unipartite graph (e.g. Youtube), it can extract the degree and PageRank features.
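As an illustration of what these features are, the sketch below computes out- / in-degree and hub / authority scores from a list of directed edges using the standard HITS power iteration. It is a standalone example in pure Python, not the implementation in `run_graph_feature.py`; the function name `degree_and_hits`, the `iterations` parameter, and the toy edges are assumptions.

```python
import math
from collections import defaultdict


def degree_and_hits(edges, iterations=50):
    """Compute out/in-degree and HITS hub/authority scores from (src, dst) edges."""
    out_nbrs, in_nbrs = defaultdict(list), defaultdict(list)
    nodes = set()
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)
        nodes.update((src, dst))

    out_deg = {n: len(out_nbrs[n]) for n in nodes}
    in_deg = {n: len(in_nbrs[n]) for n in nodes}

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of n: sum of the hub scores of nodes pointing to n.
        auth = {n: sum(hub[s] for s in in_nbrs[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hubness of n: sum of the authority scores of nodes n points to.
        hub = {n: sum(auth[d] for d in out_nbrs[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return out_deg, in_deg, hub, auth


# Toy user -> item edges (made up for illustration).
edges = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u3", "a")]
out_deg, in_deg, hub, auth = degree_and_hits(edges)
# out_deg["u1"] == 2, in_deg["a"] == 3
```

In the toy graph, `u1` gets the largest hub score because it points to both items, and `a` gets the largest authority score because every user points to it. Pairing two such per-node values, for example out-degree and hubness, gives the points from which a heatmap is built.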
Note: `src/tools/graph.py` provides interfaces for some common graph features.
For very large graphs, users can extract features with more powerful graph analysis tools, such as graphlab; `src/tools/large_graph.py` gives a simple example.
To run the heatmap analysis demo, please type
./demo
EagleMine detects micro-clusters in the given example heatmap (histogram); please see the parameter explanation in `run_eaglemine.py`.
👉 The interfaces in `run_*.py` contain detailed information and parameter explanations.
- `run_eaglemine.py`: the EagleMine algorithm, which consists of WaterLevelTree, TreeExplore, and the suspiciousness measure.
- `run_eaglemine_view.py`: visualization tools for the detection result and the model description result of EagleMine.
- `run_waterleveltree.py`: the WaterLevelTree algorithm, which constructs the raw tree and refines the tree structure (contract, prune, and expand).
- `run_graph_feature.py`: simple tools for extracting correlated node features of a graph. Bipartite graph: out- / in-degree and hubness / authority; unipartite graph: degree and PageRank.
- `run_graphfeature_histogram.py`: tools for constructing the histogram of graph features.
If you use this code as part of any published research, please cite the following paper.
@inproceedings{feng2019beyond,
title={Beyond Outliers and on to Micro-clusters: Vision-Guided Anomaly Detection},
author={Feng, Wenjie and Liu, Shenghua and Faloutsos, Christos and Hooi, Bryan and Shen, Huawei and Cheng, Xueqi},
booktitle={The 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining},
pages={541--554},
year={2019},
organization={Springer}
}