🐷🐮 Woochuri daily sale prediction project using machine learning

young-hun-jo/WoochuriService

🐮 Woochuri Livestock Daily Sales Prediction Project 🐷

🎯 Project Goal

  • Woochuri Livestock is located at 21 Dohwagongwon-gil, Seo-gu, Daejeon.
  • Since February 2020, the worldwide spread of the novel coronavirus has hit its sales hard.
  • The goal is to use machine learning to predict daily sales in real time, optimize meat inventory management, and ultimately minimize operating costs.

📋 Data Specification

  • Woochuri Livestock daily sales data

    • Daily sales from opening day (2009-01-01) to the present
    • Collected by hand in Excel
  • Surface (synoptic, ASOS) daily weather data query service

    • Public data Open API
    • Mean temperature
    • Minimum temperature
    • Maximum temperature
    • Maximum 1-hour precipitation
    • Daily precipitation
    • Mean wind speed
    • Maximum wind speed
    • Mean relative humidity
    • Minimum relative humidity
    • Maximum 1-hour solar radiation
    • Solar radiation
  • Livestock product grading service

    • Public data Open API
    • Wholesale prices of Hanwoo (Korean native cattle) and yukwoo (other domestic beef cattle)
      • 2009-01-01 to 2011-03-02: nationwide wholesale prices
      • 2011-03-03 to present: central-region wholesale prices (Daejeon belongs to the central region)
    • Wholesale price of pork (scalded carcass, "tangbak")

🛠 Data Preprocessing

  • Missing values

    • Surface (synoptic, ASOS) daily weather data
      • The variables in this dataset are highly correlated with one another, so missing values are filled with a KNN (K-Nearest Neighbors) imputer, guided by Pearson correlation.
  • Outliers

    • Wholesale prices of Hanwoo and yukwoo
      • The Hanwoo price is about 1.8 times the yukwoo price; the replacement logic is built on this ratio.
      • If the Hanwoo price is wrong on a given date ➡️ replace the outlier using that date's yukwoo price.
      • If the yukwoo price is wrong on a given date ➡️ replace the outlier using that date's Hanwoo price.
  • Derived variables for traditional holidays and public holidays

  • Seollal (Lunar New Year) and Chuseok

    • Woochuri Livestock always stays open through the day before the holiday itself.
    • EDA showed that, because of the holiday, sales are unusually high over the 6 days counting back from the day before the holiday.
    • Korean holiday dates are preloaded with the open-source holidays package; if the day before a holiday falls within the next 6 days of a given date, that date receives a graded weight from 1 to 6.
      • The same logic can assign a holiday weight to the current date in real time.
  • Regular public holidays

    • Sales are also unusually high on regular public holidays such as Children's Day and Buddha's Birthday.
    • The same logic as for the traditional holidays is applied.
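The missing-value step above can be illustrated with a small, dependency-free sketch of KNN imputation. It is not the project's code: the function name `knn_impute`, the choice of `k`, and the sample rows are assumptions made for illustration, and the sketch does not reproduce however Pearson correlation is used in the project. In practice, scikit-learn's `KNNImputer` provides this technique in library form.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is Euclidean over the columns both rows have observed, so rows
    that contain gaps can still be compared with each other.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Only rows that have column j observed can donate a value.
            candidates = sorted(
                (distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d < math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical rows: [mean temp, min temp, max temp] in degrees Celsius.
weather = [
    [10.0, 5.0, 15.0],
    [11.0, 6.0, 16.0],
    [25.0, 20.0, 30.0],
    [10.5, None, 15.5],
]
imputed = knn_impute(weather, k=2)
print(imputed[3])
```

The sketch leaves the features unscaled; with variables in different units (temperature, wind speed, humidity), scaling them first would keep one variable from dominating the distance.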
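The outlier rule for Hanwoo and yukwoo prices can be sketched as follows. The 1.8 ratio comes from the text above; everything else is an assumption for illustration, in particular the function name `repair_prices` and the test for a "wrong" price (missing or non-positive), since the README does not say how wrong prices are detected.

```python
HANWOO_TO_YUKWOO = 1.8  # approximate price ratio stated in the README

def is_bad(price):
    # Assumption: a price counts as wrong when it is missing or not positive.
    return price is None or price <= 0

def repair_prices(hanwoo, yukwoo):
    """Return (hanwoo, yukwoo) with a single bad price rebuilt from the other."""
    if is_bad(hanwoo) and not is_bad(yukwoo):
        hanwoo = yukwoo * HANWOO_TO_YUKWOO
    elif is_bad(yukwoo) and not is_bad(hanwoo):
        yukwoo = hanwoo / HANWOO_TO_YUKWOO
    # If both are bad, nothing can be recovered from this date alone.
    return hanwoo, yukwoo

# Hypothetical prices in KRW per kg.
fixed_hanwoo, _ = repair_prices(0, 10000)
_, fixed_yukwoo = repair_prices(18000, None)
print(fixed_hanwoo, fixed_yukwoo)
```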
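The holiday weighting can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: the function name `holiday_weight` is hypothetical, the holiday dates are hard-coded here instead of being loaded from the holidays package, and the README does not say which end of the window gets the larger weight, so the sketch assumes the weight grows as the last open day approaches.

```python
from datetime import date, timedelta

# Example dates only; the project loads Korean holidays from the holidays package.
HOLIDAYS = [date(2020, 10, 1), date(2021, 2, 12)]  # Chuseok 2020, Seollal 2021

def holiday_weight(day, holidays, window=6):
    """Return 0 outside the pre-holiday window, otherwise a weight in 1..window.

    The last open day is the day before each holiday. A date that is `gap`
    days before that last open day (0 <= gap < window) gets window - gap.
    """
    best = 0
    for holiday in holidays:
        last_open = holiday - timedelta(days=1)
        gap = (last_open - day).days
        if 0 <= gap < window:
            best = max(best, window - gap)
    return best

for offset in range(8):
    d = date(2020, 9, 23) + timedelta(days=offset)
    print(d.isoformat(), holiday_weight(d, HOLIDAYS))
```

Because the function only needs a date and a list of holidays, it can be called with the current date at prediction time, which matches the real-time use described above.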

🦾 Prediction Model Performance Comparison

  • Train : 2009-01-01 ~ 2020-05-03
  • Validation : 2020-05-04 ~ 2021-05-04 (cross-validated one day at a time)
  • MAPE (mean absolute percentage error): the average gap between the predicted and actual values, expressed as a percentage of the actual value
| Model | Train MAPE | Test MAPE |
|---|---|---|
| Prophet | 51.80% | 41.75% |
| ARIMA | 63.25% | 67.43% |
| Linear Regression | 43.51% | 52.20% |
| Polynomial Linear Regression (2 degree) | 41.02% | 49.41% |
| PLS Regression | 44.68% | 62.21% |
| Random Forest | 11.10% | 34.32% |
| XGBoost | 28.41% | 33.51% |
| LightGBM | 22.19% | 34.22% |
| LightGBM (dimensionality reduced with PCA) | 25.50% | 37.70% |
| Hybrid Voting (Random Forest+XGBoost+LightGBM) | 20.20% | 33.35% |
| LSTM (with Convolution) | 37.31% | 42.95% |
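For reference, MAPE as used in the comparison above can be computed as in the sketch below. The function name and sample numbers are illustrative. Skipping days whose actual value is zero is an assumption (MAPE is undefined there, for example on days the shop is closed), and the README does not say how the project handles such days.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Days with an actual value of 0 are skipped because the ratio is undefined.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE needs at least one non-zero actual value")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)

# Hypothetical daily sales versus predictions.
score = mape([100, 200, 0], [110, 180, 50])
print(f"{score:.2f}%")
```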

💡 Final model : Random Forest Regressor
๐Ÿ’ก Optimal Hyper-parameter : n_estimators=100, min_samples_split=2
💡 Daily data will keep being collected and used for retraining, so Random Forest was chosen for having the lowest Train MAPE
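Validating one day at a time while retraining on newly collected data resembles walk-forward validation. The sketch below shows that general idea under assumptions: `walk_forward` and `last_week_mean` are hypothetical names, the sales figures are made up, and the 7-day mean is only a stand-in for refitting the real model on each day's history.

```python
def walk_forward(series, start, fit_predict):
    """Predict each day from `start` onward using only the days before it."""
    predictions = []
    for t in range(start, len(series)):
        history = series[:t]  # everything observed before day t
        predictions.append(fit_predict(history))
    return predictions

def last_week_mean(history):
    # Stand-in model: the mean of the most recent 7 observations.
    window = history[-7:]
    return sum(window) / len(window)

# Hypothetical daily sales.
sales = [100, 120, 110, 130, 125, 115, 140, 150, 135, 145]
preds = walk_forward(sales, start=7, fit_predict=last_week_mean)
print(preds)
```

Each prediction sees only data available before that day, so the validation scores are not inflated by information from the future.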

🧷 Final model prediction graph (2009-01-01 ~ 2021-05-04)
[Figure: graph]

📊 Analysis report

📟 Automation

  • Before predicting, enter the previous day's Woochuri Livestock sales and whether the shop was closed into the today_sale variable and remark
  • Run main.py twice on the day to be predicted, at 8 AM and 11 AM
  • Scheduling uses cron, the time-based job scheduler on Unix-like macOS
  • Predictions are delivered by SMS through Twilio, a paid SMS platform with a small fee
from model import WoochuriPredModel
from twilio.rest import Client
import pandas as pd
print('Prediction date:', pd.Timestamp.now())

# Load updated crawling dataset and modeling to predict tomorrow's sale
# Setting parameters using local MySQL id, password
user, password = 'root', 'your password'
end_time = (pd.Timestamp.now() - pd.Timedelta(days=1)).strftime("%Y-%m-%d")

# run crawling, preprocess datasets, and finally prediction at one time
PredModel = WoochuriPredModel(user=user, password=password, end_time=end_time,
                              today_sale='must be integer', remark='평일')  # '평일' means "weekday"
FinalDataset = PredModel.execute()
result = PredModel.run(FinalDataset)

# Sending message
account_sid = 'AC8f9d9f4c8983ee648153f5347ee027a9'
auth_token = 'your_auth_token'  # customize your_auth_token
client = Client(account_sid, auth_token)

woochuri_master = '+821094125854'
message = client.messages.create(from_='+13132543287', body=result, to=woochuri_master)
print(message.sid)
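The twice-daily schedule described above could be written as a single crontab entry like the one below. The interpreter path, project path, and log path are placeholders, not values taken from the repository. The five fields are minute, hour, day of month, month, and day of week, so `0 8,11 * * *` fires at 08:00 and 11:00 every day.

```crontab
# m  h     dom mon dow  command
0    8,11  *   *   *    /usr/bin/python3 /path/to/WoochuriService/main.py >> /path/to/woochuri.log 2>&1
```

The entry can be added with `crontab -e`. Redirecting output to a log file keeps the printed prediction date and message SID available for later inspection.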

⚙️ Stack


  • Python 3.7.7
  • BeautifulSoup 4.6.0
  • MySQL 8.0.21(pymysql 1.0.2)
  • Pandas
  • Numpy
  • Scikit-learn 0.24.1
  • Tensorflow 2.x
  • PowerPoint
  • IDE: PyCharm, Jupyter notebook
  • Install Twilio by running the following command in a terminal
    pip install twilio
  • After signing up for Twilio, open the Twilio web console to find your account_sid and auth_token, then send messages with the same code as in main.py
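One way to keep account_sid and auth_token out of main.py is to read them from environment variables, as in this minimal sketch. The helper name `load_twilio_credentials` and the variable names `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` are assumptions chosen for illustration; the repository itself hard-codes the values.

```python
import os

def load_twilio_credentials(env=os.environ):
    """Return (account_sid, auth_token) from the given mapping.

    The variable names are an assumption; any names work as long as they
    match what is exported in the shell or cron environment.
    """
    names = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]

# Usage in main.py would then look like:
#   account_sid, auth_token = load_twilio_credentials()
#   client = Client(account_sid, auth_token)
```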
