Introduction

This repository contains the source code developed for the experiments on the FPCubes algorithm, proposed in the following paper:

Hamid Fadishei and Azadeh Soltani, "The Curse of Indecomposable Aggregates for Big Data Exploratory Analysis with A Case for Frequent Pattern Cubes", Journal of Supercomputing, Springer, DOI: 10.1007/s11227-019-03053-8

Frequent pattern cubes refer to structures and algorithms that facilitate exploratory frequent pattern analysis on multidimensional big datasets. For more information, please refer to the published paper.

Compiling

To compile the source code, run the make command:

make clean
make

Upon a successful build, you will have the fpc binary, which can be executed from the Linux command line. Use the -h switch to see its command-line arguments.

Data format

The fpc command expects the multidimensional input data to be in a specific format. The data should be stored in a filesystem tree whose folders are named after dimensions and dimension values, alternating from the root down. Each leaf folder must contain a transactions.csv file. For an example, please see the data/example folder:

tree data/example
data/example
└── device
    ├── 0
    │   └── gender
    │       ├── 0
    │       │   └── transactions.csv
    │       └── 1
    │           └── transactions.csv
    └── 1
        └── gender
            ├── 0
            │   └── transactions.csv
            └── 1
                └── transactions.csv

Dimension values should be zero-indexed integers. Each transactions.csv file contains the transactions for its combination of dimension values, one transaction per line, with items separated by commas. Items are also zero-indexed integers:

cat data/example/device/0/gender/0/transactions.csv
0 2 3 5
1 4 5
1 4
0 2 3 4
3 5
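
The layout above can be read back with a short Python sketch. This is not part of the repository: read_cells is a hypothetical helper, and because the text says items are comma-separated while the sample file shows spaces, the sketch assumes either separator may occur and accepts both.

```python
import os
import re


def read_cells(root, sep_pattern=r"[,\s]+"):
    """Map each leaf cell, e.g. (("device", 0), ("gender", 1)), to its transactions.

    Hypothetical helper, not the fpc loader. Assumes the folders below `root`
    alternate between a dimension name and an integer dimension value.
    """
    cells = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "transactions.csv" not in filenames:
            continue
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        if len(parts) % 2 != 0:
            raise ValueError(f"expected dimension/value pairs, got: {rel}")
        key = tuple((parts[i], int(parts[i + 1])) for i in range(0, len(parts), 2))
        with open(os.path.join(dirpath, "transactions.csv")) as fh:
            # One transaction per line; items are integers split on commas or spaces.
            rows = [
                [int(tok) for tok in re.split(sep_pattern, line.strip())]
                for line in fh
                if line.strip()
            ]
        cells[key] = rows
    return cells
```

Calling read_cells("data/example") would return one entry per leaf folder, keyed by its (dimension, value) pairs.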

The datasets used in the paper can be obtained from the official sources cited there, but they must first be converted to the above format. I will not upload the large converted files here, but you can find the Python scripts developed for the conversion in the script folder.
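
For readers preparing their own data, the conversion might look roughly like the following. This is a minimal sketch and not one of the repository's conversion scripts: write_cell and encode are hypothetical names, and the separator is left as a parameter (defaulting to a comma, as the text describes) because the sample file above shows spaces.

```python
import os


def encode(values):
    """Assign zero-indexed integer codes to raw labels in first-seen order."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return codes


def write_cell(root, cell, transactions, sep=","):
    """Write one leaf cell, e.g. cell=[("device", 0), ("gender", 1)].

    Hypothetical helper: creates <root>/<dim>/<value>/.../transactions.csv
    with one transaction per line. `sep` is an assumption, see the lead-in.
    """
    leaf = root
    for dim, value in cell:
        leaf = os.path.join(leaf, dim, str(int(value)))
    os.makedirs(leaf, exist_ok=True)
    path = os.path.join(leaf, "transactions.csv")
    with open(path, "w") as fh:
        for items in transactions:
            fh.write(sep.join(str(int(i)) for i in items) + "\n")
    return path
```

encode can be used first to map raw dimension labels or item names to the zero-indexed integers the format requires, for example encode(["mobile", "desktop", "mobile"]) gives mobile the code 0 and desktop the code 1.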

Usage

The fpc command first preprocesses (premines) the input dataset. When that finishes, it starts accepting exploratory queries from standard input in a specific format; please see the query folder for some examples. You can submit multiple queries by feeding multiple lines to the program. Some examples of executing this command are given below:

Example 1 - run a single query and print frequent patterns:

echo "/device/0/gender/0-1" | ./fpc -d data/example -m 0.27 -p
# premine: t=0.000060,m=12804096,ands=45
3 (8)
 5 (3)
4 (6)
2 (5)
 3 (5)
  4 (3)
 4 (3)
1 (4)
0 (3)
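
The printed patterns can be post-processed with a few lines of Python. The sketch below rests on an assumption that this README does not state: each line is read as "item (count)", and each level of indentation is taken to extend the pattern on the nearest less-indented line above it. parse_patterns is a hypothetical helper, not part of fpc.

```python
import re

# "item (count)" with optional leading spaces; assumed from the sample output.
LINE = re.compile(r"^( *)(\d+) \((\d+)\)$")

SAMPLE = """# premine: t=0.000060,m=12804096,ands=45
3 (8)
 5 (3)
4 (6)
2 (5)
 3 (5)
  4 (3)
 4 (3)
1 (4)
0 (3)
"""


def parse_patterns(text):
    """Return [(itemset_tuple, count), ...] from indented pattern output."""
    patterns = []
    path = []
    for line in text.splitlines():
        m = LINE.match(line.rstrip())
        if not m:
            continue  # skips "# premine: ..." and blank lines
        depth, item, count = len(m.group(1)), int(m.group(2)), int(m.group(3))
        # Keep the first `depth` items of the current path, then append this item.
        path = path[:depth] + [item]
        patterns.append((tuple(path), count))
    return patterns


if __name__ == "__main__":
    for itemset, count in parse_patterns(SAMPLE):
        print(itemset, count)
```

Under that reading, the line "  4 (3)" in the sample is parsed as the itemset (2, 3, 4) with a count of 3.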

Example 2 - run a number of queries and print runtime statistics without printing frequent patterns:

cat query/example.txt | ./fpc -d data/example -m 0.27 -H -s
# premine: t=0.000060,m=12804096,ands=45
query,cuboids,time,memory,fpcnt,fplenavg,ands,trancnt,tranlenavg,digs,leafdigs
0,2,0.000023,12804096,11,1.545455,7,11,2.727273,14,7
1,2,0.000009,12804096,8,1.250000,4,10,2.500000,10,5
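
Because the statistics are printed as a commented premine line followed by CSV with a header row, they are easy to load for further analysis. The following is a minimal sketch based only on the output shown above; parse_stats is a hypothetical helper, and values are kept as strings because the units of fields such as time and memory are not documented here.

```python
import csv
import io

SAMPLE = """# premine: t=0.000060,m=12804096,ands=45
query,cuboids,time,memory,fpcnt,fplenavg,ands,trancnt,tranlenavg,digs,leafdigs
0,2,0.000023,12804096,11,1.545455,7,11,2.727273,14,7
1,2,0.000009,12804096,8,1.250000,4,10,2.500000,10,5
"""


def parse_stats(output):
    """Split fpc statistics output into (premine_dict, list_of_row_dicts)."""
    premine = {}
    data_lines = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # e.g. "# premine: t=...,m=...,ands=..."
            label, _, rest = line[1:].partition(":")
            if label.strip() == "premine":
                premine = dict(kv.split("=", 1) for kv in rest.strip().split(","))
            continue
        data_lines.append(line)
    # The first non-comment line is treated as the CSV header.
    rows = list(csv.DictReader(io.StringIO("\n".join(data_lines))))
    return premine, rows


if __name__ == "__main__":
    premine, rows = parse_stats(SAMPLE)
    print(premine)
    for row in rows:
        print(row["query"], row["time"], row["fpcnt"])
```

The same function could be applied to output captured from the command itself, for instance via subprocess.run with capture_output=True.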
