CO2gather: Automatic CO2 Saving Incentive Management Service

[2022 KPMG Ideathon Challenge] Pink Cow Lab. - "CO2gather"

Overview

This is an API that extracts the core brand name from an arbitrary merchant name found in a credit card log. We use this API to automatically calculate the incentive a person is eligible for, based on their credit card log.

API Base URL (Only during the Competition)

http://20.194.102.148:3000/api/투썸플레이스신촌

Replace '투썸플레이스신촌' (the merchant name "A Twosome Place, Sinchon branch") with the string whose core brand name you want to extract.
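Because the merchant name goes directly into the URL path, non-ASCII names such as Korean text need to be percent-encoded when calling the API from code. The following is a minimal sketch using only the Python standard library. The helper names `build_brand_url` and `fetch_brand` are hypothetical, and since the response format is not documented here, the body is returned as raw text.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from this README; it was only served during the competition.
API_BASE = "http://20.194.102.148:3000/api/"


def build_brand_url(merchant_name: str) -> str:
    """Return the request URL with the merchant name percent-encoded as UTF-8."""
    # safe="" also encodes "/" so the name stays a single path segment.
    return API_BASE + quote(merchant_name, safe="")


def fetch_brand(merchant_name: str, timeout: float = 5.0) -> str:
    """Send a GET request and return the raw response body as text.

    The response format is an unknown here, so nothing is parsed.
    """
    with urlopen(build_brand_url(merchant_name), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made.
    print(build_brand_url("투썸플레이스신촌"))
```

Calling `fetch_brand("투썸플레이스신촌")` would perform the actual request, which only works while the server is reachable.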

Algorithm Description

As you can see in the flow chart, the solution learns from brand name datasets using make_scores.py. Here, 'learning' means computing 'word scores'. Word scores can be defined freely to suit the characteristics of the solution and the datasets; we define ours as the cohesion probability.

Cohesion Probability: The cumulative (prefix) product of the probabilities of encountering each following character in a given string, as the string is built up one character at a time from its first character.
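To make the definition concrete, here is a minimal sketch that computes this cumulative product from prefix counts over a toy word list. It illustrates the definition above and is not the code in make_scores.py; the function names `prefix_counts` and `cohesion` and the sample words are assumptions made for the example.

```python
from collections import Counter


def prefix_counts(words):
    """Count how often every prefix occurs across the training words."""
    counts = Counter()
    for word in words:
        for end in range(1, len(word) + 1):
            counts[word[:end]] += 1
    return counts


def cohesion(word, counts):
    """Product of P(next character | prefix so far) along the word."""
    if len(word) < 2:
        return 0.0  # a single character has no following character
    score = 1.0
    for end in range(1, len(word)):
        seen_prefix = counts[word[:end]]
        seen_longer = counts[word[:end + 1]]
        if seen_prefix == 0:
            return 0.0  # prefix never observed in the training words
        score *= seen_longer / seen_prefix
    return score


# Toy training data (illustrative only).
counts = prefix_counts(["starbucks", "starbucks", "starfield", "subway"])
print(cohesion("star", counts))       # 3/4 * 3/3 * 3/3 = 0.75
print(cohesion("starbucks", counts))  # drops at "starb": about 0.5
```

In this toy data the score stays high while the prefix is shared by many words and drops where the words diverge, which is what lets a tokenizer find a likely boundary.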

After learning, the string_processing.py script takes an arbitrary string and extracts the core brand name from it. I implemented the extraction by tokenizing: the script tokenizes the given string based on cohesion probabilities and returns a list of tokens sorted in decreasing order of word score. The core brand name is the first element of the returned list.
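The ranking step can be sketched as follows. This is a simplified stand-in, not the tokenizer used in string_processing.py: it collects every substring of the input that has a learned score and sorts those candidates by score. The function name `rank_candidates` and the score values are made up for illustration.

```python
def rank_candidates(text, scores):
    """Return the substrings of `text` that have a score, best score first."""
    candidates = set()
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in scores:
                candidates.add(piece)
    # Highest score first; ties go to the longer token, then alphabetical order.
    return sorted(candidates, key=lambda token: (-scores[token], -len(token), token))


# Hypothetical word scores (not taken from scores_dict.pkl).
toy_scores = {"투썸플레이스": 0.9, "투썸": 0.6, "신촌": 0.2}

ranked = rank_candidates("투썸플레이스신촌", toy_scores)
core_brand = ranked[0] if ranked else None
print(ranked)
print(core_brand)
```

The `if ranked else None` guard matters because an input with no known token yields an empty list, and indexing its first element would raise an error.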

Because the word score is based only on cohesion probability, the token with the highest word score is very likely to be the core brand name. Measured with the data_lab.py script, the accuracy of the current solution was 80.501393%.
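One plausible way to measure such an accuracy is top-1 exact match: the share of labeled merchant names for which the first returned token equals the expected brand. The sketch below assumes that metric; how data_lab.py actually computes its figure is not shown here, and `top1_accuracy`, the toy extractor, and the sample pairs are all hypothetical.

```python
def top1_accuracy(samples, extract):
    """Fraction of (merchant_name, expected_brand) pairs that extract() gets right."""
    if not samples:
        return 0.0
    hits = sum(1 for name, expected in samples if extract(name) == expected)
    return hits / len(samples)


def toy_extract(merchant_name):
    """Stand-in extractor: takes the first whitespace-separated word."""
    return merchant_name.split()[0]


# Made-up labeled pairs, for illustration only.
samples = [
    ("starbucks gangnam", "starbucks"),
    ("subway sinchon", "subway"),
    ("cafe bene", "bene"),
    ("olive young", "olive"),
]

print(f"{top1_accuracy(samples, toy_extract):.2%}")
```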

Local Requirements

Node.js, npm (Uses express.js and python-shell, but npm takes care of these dependencies)

Python 3 and some Python packages:

$ pip install numpy
$ pip install psutil
$ pip install scipy
$ pip install scikit-learn
$ pip install soynlp
$ pip install openpyxl
$ pip install pandas
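After installing, a quick check like the one below can confirm that every package is importable. It is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical. Note that scikit-learn is imported under the module name `sklearn`, while the other packages use the same name for pip and for import.

```python
from importlib.util import find_spec

# pip package name -> importable module name
REQUIRED = {
    "numpy": "numpy",
    "psutil": "psutil",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "soynlp": "soynlp",
    "openpyxl": "openpyxl",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```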

How to Run

Clone this repository.
Satisfy all the requirements.
Go to the local repository folder and run:
```
$ cd myapp
$ nohup npm start &
```

How to Put Datasets

Datasets: Lists of brand names that you want to detect in arbitrary strings

Make an Excel (.xlsx) file containing the brand name data. (Use the same format as the existing files!)
Put it in /core/test_data
Add the filename to the list 'file_list' in make_scores.py as shown below (do not include the .xlsx extension!):
```
file_list = ['purchase_data_7Feb22', 'YOUR NEW FILE NAME']
```
Run /core/make_scores.py once to generate the word scores. (You don't need to restart the server.)

It saves the dictionary data to a file (scores_dict.pkl). When the API starts processing, it loads the word scores from this pickle file. This means new brand names will not be considered until you run make_scores.py, even if you have added new dataset files.
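The save and load steps can be sketched as a plain pickle round trip. This assumes scores_dict.pkl holds an ordinary dictionary mapping tokens to scores; the helper names `save_scores` and `load_scores` are hypothetical, and the demo writes to a temporary directory rather than the real file.

```python
import pickle
import tempfile
from pathlib import Path


def save_scores(scores, path):
    """Write the word-score dictionary to a pickle file."""
    with Path(path).open("wb") as handle:
        pickle.dump(scores, handle)


def load_scores(path):
    """Read the word-score dictionary back from a pickle file."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Round trip in a temporary directory (illustrative scores only).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "scores_dict.pkl"
    save_scores({"투썸플레이스": 0.9, "신촌": 0.2}, demo_path)
    restored = load_scores(demo_path)

print(restored)
```

Since the scores are read from the file whenever processing starts, regenerating the file is enough for new brand names to take effect. Only load pickle files you created yourself, because unpickling untrusted data can execute arbitrary code.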

File Description

api.py: Script that the Python shell runs
data_lab.py: Measures the accuracy of the solution
make_scores.py: Reads every dataset and computes cohesion scores (learning)
scores_dict.pkl: Data file containing the cohesion scores
read_data.py: Script that reads the datasets
string_processing.py: Script that tokenizes a given string and extracts the core brand name
