PotentialFinder - finding business potential in data lakes

This work was published in HICS 2021.

Abstract

Finding exponential growth trends and growth potential is key to success for businesses and startups: building a product for a market that can grow exponentially increases the likelihood of success. Such growth potential can be found in a variety of sectors, but finding exponential patterns and trends poses several challenges. This paper deals with finding these exponential patterns in data lakes. It proposes algorithms that can scale to petabytes of data stored as tabular files of different sizes and formats. These algorithms can be key to pattern discovery in data lakes, ultimately supporting the search for growth opportunities.

Files Description

step0_data_processing

The scripts listed in this file are used to preprocess the data.

step0_kaggle_dataset_download

Downloads datasets from the Kaggle data lake.

step1_generate_header

Generates headerfile.txt from the dataset folder; the splitter uses this file.
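As an illustration of what a header-generation step could look like, the sketch below reads the first row of every CSV file under a folder and writes one line per file. The function name `collect_headers` and the line layout (file name, a tab, then comma-joined column names) are assumptions made for this example; the README does not specify the actual format of headerfile.txt.

```python
import csv
from pathlib import Path


def collect_headers(dataset_dir, out_file="headerfile.txt"):
    """Write one line per CSV file: file name, a tab, then the column names.

    Hypothetical layout: the real headerfile.txt format is not documented here.
    """
    rows = []
    for path in sorted(Path(dataset_dir).rglob("*.csv")):
        with path.open(newline="", encoding="utf-8", errors="replace") as fh:
            # An empty file yields an empty header instead of raising.
            header = next(csv.reader(fh), [])
        rows.append(f"{path.name}\t{','.join(header)}")
    Path(out_file).write_text("\n".join(rows) + "\n", encoding="utf-8")
    return rows
```

Reading only the first row keeps the cost independent of file size, which matters when the dataset folder holds many large tables.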

step2_sampling_localmachine

Generates sample files on the local machine.
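The README does not say which sampling method the local script uses. One common choice for files too large to load at once is reservoir sampling, sketched below; `reservoir_sample` is a hypothetical helper, not a function from this repository.

```python
import random


def reservoir_sample(lines, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Replace a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = line
    return sample
```

Because the function accepts any iterable, it can be given an open file handle directly, so only `k` lines are held in memory at a time.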

step2_sampling_map_reduce

Generates sample files using MapReduce.
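To show how sampling can be split into a map phase and a reduce phase, here is a minimal sketch written as plain Python functions. The README does not name the MapReduce framework or the sampling scheme, so `mapper`, `reducer`, and the parameters `rate` and `k` are assumptions: each mapper keeps roughly a fraction `rate` of its lines and tags them with a random key, and the reducer keeps the `k` lines with the smallest keys.

```python
import heapq
import random


def mapper(lines, rate, seed=None):
    """Emit (random_key, line) for roughly a fraction `rate` of the input lines."""
    rng = random.Random(seed)
    for line in lines:
        if rng.random() < rate:
            yield rng.random(), line


def reducer(pairs, k):
    """Keep the k lines with the smallest random keys across all mappers."""
    smallest = heapq.nsmallest(k, pairs, key=lambda pair: pair[0])
    return [line for _, line in smallest]
```

In a real job the pairs from all mappers would reach the reducer through the framework's shuffle; here they can simply be chained together, for example with `itertools.chain`.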

step3_splitting_partfiles_into_sample_files

Generates sample files from the MapReduce part files.

step5_potentialfinder

Preprocesses the dataset and fits exponential and logistic patterns to it.
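To make the idea of an exponential pattern fit concrete, the sketch below fits y = a * exp(b * x) by ordinary least squares on ln(y). This is one standard technique chosen for illustration, not necessarily the method used in the repository, and `fit_exponential` is a hypothetical name. The returned R² is measured on the log scale.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) via linear regression on ln(y); returns (a, b, r2)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    if any(y <= 0 for y in ys):
        raise ValueError("log-linear fit needs strictly positive y values")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Slope and intercept of the line ln(y) = ln(a) + b * x.
    b = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs)) / sxx
    ln_a = mean_l - b * mean_x
    # Coefficient of determination on the log scale.
    ss_res = sum((l - (ln_a + b * x)) ** 2 for x, l in zip(xs, logs))
    ss_tot = sum((l - mean_l) ** 2 for l in logs)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(ln_a), b, r2
```

A column whose fit has a positive `b` and an R² close to 1 would be a candidate exponential growth pattern; the threshold to use is a design decision that this sketch leaves open.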

step5_potentialfinder_functions

Contains support functions for the exponential and logistic fits; it is called by step5_potentialfinder.
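A logistic fit can be sketched in a similar spirit. The example below assumes the curve cap / (1 + exp(-rate * (x - mid))), tries a grid of candidate capacities above the largest observed value, and for each candidate fits `rate` and `mid` by linear regression on the logit scale. All names (`logistic`, `fit_logistic`, `steps`, `max_factor`) are hypothetical, and the repository's support functions may work differently.

```python
import math


def logistic(x, cap, rate, mid):
    """Logistic curve cap / (1 + exp(-rate * (x - mid)))."""
    z = -rate * (x - mid)
    if z > 700:  # exp would overflow; the curve is effectively 0 here
        return 0.0
    return cap / (1.0 + math.exp(z))


def fit_logistic(xs, ys, steps=200, max_factor=3.0):
    """Grid-search the capacity; fit rate and midpoint on the logit scale."""
    if len(xs) != len(ys) or len(set(xs)) < 3:
        raise ValueError("need at least three points with distinct x values")
    if any(y <= 0 for y in ys):
        raise ValueError("logit transform needs strictly positive y values")
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    top = max(ys)
    best = None
    for i in range(1, steps + 1):
        # Candidate capacities run from just above max(y) up to max_factor * max(y).
        cap = top * (1.0 + (max_factor - 1.0) * i / steps)
        zs = [math.log(y / (cap - y)) for y in ys]
        mean_z = sum(zs) / n
        slope = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs)) / sxx
        if slope == 0:
            continue
        mid = mean_x - mean_z / slope  # x at which the fitted logit is zero
        err = sum((y - logistic(x, cap, slope, mid)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, cap, slope, mid)
    if best is None:
        raise ValueError("no usable fit found")
    return best[1], best[2], best[3]
```

Comparing the residual error of this fit with that of an exponential fit on the same column is one way to tell saturating growth from growth that is still exponential.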

step6_plots_classification

Generates the classification graphs and plots.

step7_creating_graphs

Generates graphs.
