bacross/buffett_topics: Topic Modelling of Warren Buffett's Letters

This repo explores several topic modelling algorithms for extracting information from Warren Buffett's investor letters.

Getting started

After cloning or downloading this repo, navigate to the repo folder and create the conda environment using the requirements.txt file:

conda env create --file requirements.txt

Data

Buffett's letters can be found here: http://www.berkshirehathaway.com/letters/letters.html. The letters from 1977 to 2003 are offered as HTML; the remainder are offered only as PDFs. Rather than deal with messy PDF text extraction, I downloaded the HTML letters by hand. Tackling the PDFs is on my to-do list, but for now the code covers only the letters available as HTML. I'll leave it to the reader to decide how to fetch the letters efficiently. My code assumes the HTML is saved locally.
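Since the code assumes the letters are saved locally as HTML, here is a minimal sketch of how such files could be turned into plain text before topic modelling. It is not the code in this repo: the helper names `html_to_text` and `load_letters` and the example folder `data/letters` are assumptions made for illustration, and it uses only the Python standard library.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def load_letters(folder):
    """Return {file stem: plain text} for every .html file in `folder`."""
    letters = {}
    for path in sorted(Path(folder).glob("*.html")):
        # The encoding of the saved files is an assumption; undecodable
        # bytes are replaced rather than raising an error.
        raw = path.read_text(encoding="utf-8", errors="replace")
        letters[path.stem] = html_to_text(raw)
    return letters
```

Calling `load_letters("data/letters")` would then give one text string per saved letter, keyed by file name without the extension. Joining chunks with spaces keeps adjacent paragraphs from running together, at the cost of occasionally splitting a word that is interrupted by an inline tag.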

Folders and files

data
docs
etl
models
.gitignore
README.md
cfg.py
requirements.txt
run_all.py
