Financial Data Science projects in Jupyter notebooks, with the FinDS Python package:
- use of database engines: SQL, Redis, MongoDB
- interfaces for
  - structured data from CRSP, Compustat, IBES, TAQ
  - APIs from ALFRED, BEA
  - unstructured data from SEC Edgar, Federal Reserve websites
  - academic websites by Fama and French, Loughran and McDonald, Hoberg and Phillips
- recipes for econometrics, finance, graphs, event studies, backtesting
- applications of statistics, machine learning, NLP, neural networks and LLMs

notebook | Financial | Data | Science |
---|---|---|---|
stock_prices | Stock distributions, delistings | CRSP stocks | Statistical moments |
jegadeesh_titman | Overlapping portfolios; Momentum effect | CRSP stocks | Hypothesis testing; Newey-West estimator |
fama_french | Portfolio sorts; Value effect | CRSP stocks; Compustat | Linear regression |
fama_macbeth | Cross-sectional Regressions; CAPM | Ken French research library | Non-linear regression; Quadratic optimization |
weekly_reversals | Mean reversion; Implementation shortfall | CRSP stocks | Structural breaks; Performance evaluation |
quant_factors | Factor investing; Backtests | CRSP stocks; Compustat; IBES | Cluster analysis |
event_study | Event studies | S&P key developments | Multiple testing; FFT |
economic_releases | Economic data revisions; Employment payrolls | ALFRED | Outliers |
regression_diagnostics | Consumer and producer prices | FRED | Linear regression diagnostics; Residual analysis |
econometric_forecast | Production and Inflation | FRED | Time series analysis |
approximate_factors | Approximate factor models | FRED-MD | Unit root test |
economic_states | State space models | FRED-MD | Gaussian Mixture; HMM |
term_structure | Interest rates | FRED yield curve | SVD |
bond_returns | Bond risk factors | FRED bond returns | PCA |
option_pricing | Binomial tree; Black-Scholes-Merton and the Greeks | simulated data | Monte Carlo simulation |
conditional_volatility | Value at risk | FRED crypto-currencies | EWMA; GARCH |
covariance_matrix | Portfolio risk | Fama-French industries | Covariance matrix estimation |
market_microstructure | Market impact; Liquidity risk | TAQ tick data | High frequency volatility |
event_risk | Earnings misses | IBES | Poisson regression; GLM |
customer_ego | Supply chain | Compustat principal customers | Graph networks |
industry_community | Industry sectors | Hoberg and Phillips research library | Community detection |
bea_centrality | Input-output tables | Bureau of Economic Analysis | Graph centrality |
link_prediction | Product markets | Hoberg and Phillips | Link prediction |
spatial_regression | Earnings surprises | IBES; Hoberg and Phillips | Spatial regression |
fomc_topics | FOMC meetings | Federal Reserve website | Topic modeling |
mda_sentiment | 10-K Management Discussion | SEC Edgar; Loughran and McDonald research library | Sentiment analysis |
business_description | 10-K Business Description | SEC Edgar | POS tagging; Density-based clustering |
classification_models | Industry classification | SEC Edgar | Classification |
regression_models | Macroeconomic forecasts | FRED-MD | Regression |
deep_classifier | Industry classification | SEC Edgar | Neural networks; Word embeddings |
recurrent_net | Macroeconomic forecasts | FRED-MD | Recurrent Neural Nets; Dynamic factor models |
convolutional_net | Macroeconomic forecasts | FRED-MD | Convolutional Neural Nets; Vector autoregression |
reinforcement_learning | Retirement spending | SBBI | Reinforcement learning |
fomc_language | Fedspeak | FOMC meetings minutes | Language modelling; Transformers |
sentiment_llm | Financial news sentiment | Kaggle | LLM prompting |
summarization_llm | 10-K Market Risks | SEC Edgar | Text summarization |
finetune_llm | Industry classification | SEC Edgar | LLM fine-tuning |
rag_agent | Corporate philanthropy | text documents | RAG, LLM chatbots and agents |
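
As a flavor of the kind of recipe the notebooks cover, the following is a minimal sketch of the Black-Scholes-Merton call price and delta listed under `option_pricing`. It uses only the Python standard library and is not taken from the FinDS package; the function names `norm_cdf` and `bsm_call` and their parameters are hypothetical, chosen for illustration.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(spot: float, strike: float, rate: float, vol: float,
             maturity: float, div_yield: float = 0.0) -> tuple[float, float]:
    """Return (price, delta) of a European call under Black-Scholes-Merton.

    Hypothetical helper for illustration; rates and volatility are
    annualized and continuously compounded, maturity is in years.
    """
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate - div_yield + 0.5 * vol ** 2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discounted_spot = spot * math.exp(-div_yield * maturity)
    discounted_strike = strike * math.exp(-rate * maturity)
    price = discounted_spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    delta = math.exp(-div_yield * maturity) * norm_cdf(d1)
    return price, delta


if __name__ == "__main__":
    # At-the-money one-year call: 5% rate, 20% volatility, no dividends
    price, delta = bsm_call(spot=100.0, strike=100.0, rate=0.05,
                            vol=0.2, maturity=1.0)
    print(f"call price = {price:.4f}, delta = {delta:.4f}")
```

A closed-form value like this is a convenient check for the binomial tree and Monte Carlo approaches named in the same row, since both should converge to it for a European call.
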

GitHub: https://terence-lim.github.io