This repository has been archived by the owner on Jun 6, 2019. It is now read-only.
forked from randerzander/docker-hdp
-
Notifications
You must be signed in to change notification settings - Fork 1
use case
Ng Yik Kai edited this page Apr 23, 2019
·
3 revisions
- Root Directory = /kynetic
- Blueprint = single-container-kylin
- MySQL IP = 172.20.0.5
- Master Node IP = 172.20.0.2
- Docker is installed
- Single Container Kylin Docker Image is built
- ODBC Driver is installed on client computer | Download
- Set up single-container-kylin HDP cluster
docker-compose -f ./assests/compose/single-container-kylin.yml up
- Submit amabari blueprint to start creating cluster
sh submit-blueprint.sh single-container-kylin
- Access into master node
docker exec -it compose_master0.dev_1 bash
- Switch user and create necessary folder in HDFS
su - hdfs
hdfs dfs -mkdir /user/root /kylin /user/admin
hdfs dfs -chown root /user/root /kylin
hdfs dfs -chown admin /user/admin
- Open Kylin Configuration File
cd $KYLIN_HOME/conf
vi kylin_job_conf_inmem.xml
- Update the attribtes as follows
mapreduce.map.java.opts = -Xmx410m
mapreduce.map.memory.mb = 512
- Start up Kylin
$KYLIN_HOME/bin/kylin.sh start
Fundamental Data is scraped from stockpup
- Prepare the list of ticker | LIST_COMP-01.txt
- Scrap Data from stockpup
sh doc/assets/scrap-01.sh
- Since there are files that are not being scraped properly, a bash command is used to get the list of ticker that are not being scraped properly | LIST_COMP-02.txt
sh doc/assets/scrap-02.sh
- Merging all fundamental data into one file | fundamental-raw.csv
sh merge.sh
- Edit and prepare table with the following columns
Column Name | Previous Columne Name | Notes |
---|---|---|
index | Not Exist | computed through append.sh |
ticker | Not Exist | computed through merge.sh |
year | Not Exist | ==LEFT(TEXT(quarter_end,"yyyy"),4) |
quarter | Not Exist | computed through append.sh |
quarter_end | Quarter end | |
assets | Assets | |
current-assets | Current Assets | |
liabilities | Liabilities | |
current liabilities | Current Liabilities | |
shareholder_equity | Shareholder equity | |
preferred_equity | Preferred equity | |
goodwill | Goodwill & intangibles | |
long_debt | Long-term debt | |
revenue | Revenue | |
earning | Earnings | |
eps | EPS diluted | |
cash | Cash at end of period | |
price | Price | |
roe | ROE | |
roa | ROA | |
pe_ratio | Existed but discarded | |
de_ratio | Long-term debt to equity ratio | |
current_ratio | Current Ratio | |
cash_ratio | Not Exist | |
profit_margin | Not Exist | = IF(ISNUMBER(earning/revenue),earing/revenue,0)
|
nta | Not Exist | = assets - goodwill - liabilities |
pnta_ratio | Not Exist | = IF(ISNUMBER(price/nta),price/nta)
|
- Convert all 'None' value to 0
- Compute column index and quarter
python append.py
- Getting US Ticker Profile Data | Link
- Update and output the table as the following schema
- Since this table acts as look up table, it need to containes every Ticker existed in Fundamental Table, the list of missing ticker is being added to the table with blank attributes | profile.csv
- Copy fundamental and profile data into MySQL Docker Container
docker cp ./doc/assets/fundamental.csv mysql:/root
docker cp ./doc/assets/profile.csv mysql:/root
- Launch into MySQL docker container
docker exec -it compose_master0.dev_1 bash
- Launch MySQL
mysql --local-infile=1 -u root -PPASSWORD mysql
- Reset MySQL Local Infile Permission
SET GLOBAL local_infile = 1;
- Import fundamental and profile data with the following SQL
LOAD DATA LOCAL INFILE "/root/fundamental.csv"
INTO TABLE fundamental
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
LOAD DATA LOCAL INFILE "/root/profile.csv"
INTO TABLE profile
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
- Make index primary key in RDBMS
- Launch master node
docker exec -it compose_master0.dev_1 bash
- Run the following Sqoop Command to import fundamental and profile data to Hive fro RDB<S
sqoop import --connect
jdbc:mysql://172.20.0.5:3306/mysql?characterEncoding=utf8 --username root
--password PASSWORD --table fundamental --hive-import
sqoop import --connect
jdbc:mysql://172.20.0.5:3306/mysql?characterEncoding=utf8 --username root
--password PASSWORD --table profile --hive-import
- Create View for Fundamental Analysis
CREATE VIEW FUND_VIEW_1 as
SELECT TICKER, YEAR, QUARTER, QUARTER_END, REVENUE, EARNING, EPS, PE_RATIO,
DE_RATIO, CURRENT_RATIO, PROFIT_MARGIN, PNTA_RATIO, ROW_NUMBER()
OVER(PARTITION BY TICKER ORDER BY QUARTER_END DESC) as CUM_QUARTER
Log in to Kylin Portal and start building cube with the following schema
Dimension | Measure |
---|---|
TIME : year, ticker, quarter_end, quarter, cum_quarter | sum of all ratios |
LOCATION : location | |
COMPANY : ticker, sector, industry | |
RATIOS : pe_ratio, de_ratio, current_ratio, cash_ratio, profit_margin, pnta_ratio |
- Profit for past 5 years all positive
WITH A AS (SELECT TICKER, CUM_QUARTER, EARNING AS PQ1,
LAG(EARNING, 1) OVER(PARTITION BY TICKER ORDER BY QUARTER_END) AS PQ2,
LAG(EARNING, 2) OVER(PARTITION BY TICKER ORDER BY QUARTER_END) AS PQ3,
LAG(EARNING, 3) OVER(PARTITION BY TICKER ORDER BY QUARTER_END) AS PQ4,
LAG(EARNING, 4) OVER(PARTITION BY TICKER ORDER BY QUARTER_END) AS PQ5
FROM FINAL.FUND_VIEW_1)
SELECT * FROM A
- Profit Margin > 0.05
SELECT TICKER, QUARTER_END, PROFIT_MARGIN
FROM FINAL.FUND_VIEW_1
WHERE PROFIT_MARGIN > 0.05
AND CUM_QUARTER = 1
- Current Ratio > 1
SELECT TICKER, QUARTER_END, CURRENT_RATIO
FROM FINAL.FUND_VIEW_1
WHERE CURRENT_RATIO > 1
AND CUM_QUARTER = 1
- Debt to Equity Ratio < 2
SELECT TICKER, QUARTER_END, DE_RATIO
FROM FINAL.FUND_VIEW_1
WHERE DE_RATIO < 2
AND CUM_QUARTER = 1
- P/E ratio < 10
SELECT TICKER, QUARTER_END, PE_RATIO
FROM FINAL.FUND_VIEW_1
WHERE PE_RATIO < 10
AND CUM_QUARTER = 1
- Continuous Growth of P/E for last 2 quarter
SELECT TICKER, PE_RATIO, LAG(PE_RATIO, 1) OVER(PARTITION BY TICKER ORDER BY QUARTER_END) AS PE_RATIO_PQ
FROM FINAL.FUND_VIEW_1