Skip to content
This repository has been archived by the owner on Jun 6, 2019. It is now read-only.

use case

Ng Yik Kai edited this page Apr 23, 2019 · 3 revisions

Use Case : Fundamental Analysis

Note

  • Root Directory = /kynetic
  • Blueprint = single-container-kylin
  • MySQL IP = 172.20.0.5
  • Master Node IP = 172.20.0.2

Prerequisite

  • Docker is installed
  • Single Container Kylin Docker Image is built
  • ODBC Driver is installed on client computer | Download

Steps

Docker Image Setup

  1. Set up single-container-kylin HDP cluster
docker-compose -f ./assests/compose/single-container-kylin.yml up
  1. Submit amabari blueprint to start creating cluster
sh submit-blueprint.sh single-container-kylin

Configure Master Node for Hive View and Kylin

  1. Access into master node
docker exec -it compose_master0.dev_1 bash
  1. Switch user and create necessary folder in HDFS
su - hdfs
hdfs dfs -mkdir /user/root /kylin /user/admin
hdfs dfs -chown root /user/root /kylin
hdfs dfs -chown admin /user/admin
  1. Open Kylin Configuration File
cd $KYLIN_HOME/conf
vi kylin_job_conf_inmem.xml
  1. Update the attribtes as follows
mapreduce.map.java.opts = -Xmx410m
mapreduce.map.memory.mb = 512
  1. Start up Kylin
$KYLIN_HOME/bin/kylin.sh start

Preparing Fundamental Data

Fundamental Data is scraped from stockpup

  1. Prepare the list of ticker | LIST_COMP-01.txt
  2. Scrap Data from stockpup
sh doc/assets/scrap-01.sh
  1. Since there are files that are not being scraped properly, a bash command is used to get the list of ticker that are not being scraped properly | LIST_COMP-02.txt
sh doc/assets/scrap-02.sh
  1. Merging all fundamental data into one file | fundamental-raw.csv
sh merge.sh
  1. Edit and prepare table with the following columns
Column Name Previous Columne Name Notes
index Not Exist computed through append.sh
ticker Not Exist computed through merge.sh
year Not Exist ==LEFT(TEXT(quarter_end,"yyyy"),4)
quarter Not Exist computed through append.sh
quarter_end Quarter end
assets Assets
current-assets Current Assets
liabilities Liabilities
current liabilities Current Liabilities
shareholder_equity Shareholder equity
preferred_equity Preferred equity
goodwill Goodwill & intangibles
long_debt Long-term debt
revenue Revenue
earning Earnings
eps EPS diluted
cash Cash at end of period
price Price
roe ROE
roa ROA
pe_ratio Existed but discarded
de_ratio Long-term debt to equity ratio
current_ratio Current Ratio
cash_ratio Not Exist
profit_margin Not Exist = IF(ISNUMBER(earning/revenue),earing/revenue,0)
nta Not Exist = assets - goodwill - liabilities
pnta_ratio Not Exist = IF(ISNUMBER(price/nta),price/nta)
  1. Convert all 'None' value to 0
  2. Compute column index and quarter
python append.py

Preparing Company Profile Data

  1. Getting US Ticker Profile Data | Link
  2. Update and output the table as the following schema
  3. Since this table acts as look up table, it need to containes every Ticker existed in Fundamental Table, the list of missing ticker is being added to the table with blank attributes | profile.csv

Import Data to MySQL

  1. Copy fundamental and profile data into MySQL Docker Container
docker cp ./doc/assets/fundamental.csv mysql:/root
docker cp ./doc/assets/profile.csv mysql:/root
  1. Launch into MySQL docker container
docker exec -it compose_master0.dev_1 bash
  1. Launch MySQL
mysql --local-infile=1 -u root -PPASSWORD mysql
  1. Reset MySQL Local Infile Permission
SET GLOBAL local_infile = 1;
  1. Import fundamental and profile data with the following SQL
LOAD DATA LOCAL INFILE "/root/fundamental.csv"
INTO TABLE fundamental
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
LOAD DATA LOCAL INFILE "/root/profile.csv"
INTO TABLE profile
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
  1. Make index primary key in RDBMS

Import Data to Hive by using Sqoop

  1. Launch master node
docker exec -it compose_master0.dev_1 bash
  1. Run the following Sqoop Command to import fundamental and profile data to Hive fro RDB<S
sqoop import --connect
jdbc:mysql://172.20.0.5:3306/mysql?characterEncoding=utf8 --username root
--password PASSWORD --table fundamental --hive-import
sqoop import --connect
jdbc:mysql://172.20.0.5:3306/mysql?characterEncoding=utf8 --username root
--password PASSWORD --table profile --hive-import
  1. Create View for Fundamental Analysis
CREATE VIEW FUND_VIEW_1 as
SELECT TICKER, YEAR, QUARTER, QUARTER_END, REVENUE, EARNING, EPS, PE_RATIO,
DE_RATIO, CURRENT_RATIO, PROFIT_MARGIN, PNTA_RATIO, ROW_NUMBER()
OVER(PARTITION BY TICKER ORDER BY QUARTER_END DESC) as CUM_QUARTER

Build Kylin Cube

Log in to Kylin Portal and start building cube with the following schema

Dimension Measure
TIME : year, ticker, quarter_end, quarter, cum_quarter sum of all ratios
LOCATION : location
COMPANY : ticker, sector, industry
RATIOS : pe_ratio, de_ratio, current_ratio, cash_ratio, profit_margin, pnta_ratio

Preparing SQL for the following query

  1. Profit for past 5 years all positive
WITH A AS (SELECT TICKER, CUM_QUARTER, EARNING AS PQ1,
LAG(EARNING, 1) OVER(PARTITION BY TICKER ORDER BY QUARTER_END) AS PQ2,
LAG(EARNING, 2) OVER(PARTITION BY TICKER ORDER BY QUARTER_END) AS PQ3,
LAG(EARNING, 3) OVER(PARTITION BY TICKER ORDER BY QUARTER_END) AS PQ4,
LAG(EARNING, 4) OVER(PARTITION BY TICKER ORDER BY QUARTER_END) AS PQ5
FROM FINAL.FUND_VIEW_1)
SELECT * FROM A
  1. Profit Margin > 0.05
SELECT TICKER, QUARTER_END, PROFIT_MARGIN
FROM FINAL.FUND_VIEW_1
WHERE PROFIT_MARGIN > 0.05
AND CUM_QUARTER = 1
  1. Current Ratio > 1
SELECT TICKER, QUARTER_END, CURRENT_RATIO
FROM FINAL.FUND_VIEW_1
WHERE CURRENT_RATIO > 1
AND CUM_QUARTER = 1
  1. Debt to Equity Ratio < 2
SELECT TICKER, QUARTER_END, DE_RATIO
FROM FINAL.FUND_VIEW_1
WHERE DE_RATIO < 2
AND CUM_QUARTER = 1
  1. P/E ratio < 10
SELECT TICKER, QUARTER_END, PE_RATIO
FROM FINAL.FUND_VIEW_1
WHERE PE_RATIO < 10
AND CUM_QUARTER = 1
  1. Continuous Growth of P/E for last 2 quarter
SELECT TICKER, PE_RATIO, LAG(PE_RATIO, 1) OVER(PARTITION BY TICKER ORDER BY QUARTER_END) AS PE_RATIO_PQ
FROM FINAL.FUND_VIEW_1