A simple script that extracts images, text, and presenter notes from a folder of PowerPoint (`.pptx`) files, using the python-pptx library. (python-pptx reads `.pptx` files only, not the legacy `.ppt` format.)
The script requires python-pptx. If you don't have it already, install it with:

`pip install python-pptx`
- Clone the repository onto your local drive.
- Copy PowerPoint files into the input folder.
- Run `extract.py`.
- Text is saved to a new `text.csv` file in the root folder. It has one row per slide, with columns for the presentation name, slide number, all the text on the slide, and any presenter notes.
- Images are saved to a new `images` folder, each named with the presentation's name and a sequential number.
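The output layout described above can be sketched with the standard library alone. This is an illustrative version, not the repository's code: the function `write_outputs`, the header row, the per-presentation counter, and the `<presentation>_<n>.<ext>` file naming are all assumptions, since the README only says images are named sequentially with the presentation name.

```python
import csv
import os


def write_outputs(records, root="."):
    """Write text.csv and an images/ folder under root (hypothetical helper).

    records: iterable of (presentation, page, text, notes, images), where
    images is a list of (blob, ext) pairs. The header row and the
    "<presentation>_<n>.<ext>" naming scheme are assumptions.
    """
    image_dir = os.path.join(root, "images")
    os.makedirs(image_dir, exist_ok=True)
    counters = {}  # next image number, tracked separately per presentation
    csv_path = os.path.join(root, "text.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["presentation", "page", "text", "notes"])
        for presentation, page, text, notes, images in records:
            # One row per slide; csv quotes fields containing newlines.
            writer.writerow([presentation, page, text, notes])
            for blob, ext in images:
                counters[presentation] = counters.get(presentation, 0) + 1
                name = f"{presentation}_{counters[presentation]}.{ext}"
                with open(os.path.join(image_dir, name), "wb") as img:
                    img.write(blob)
```

Writing the CSV with `newline=""` lets the `csv` module keep multi-line slide text inside a single quoted cell, so each slide still occupies exactly one logical row.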
This is a quick and dirty script I wrote for a specific project. I welcome PRs to clean up the code, add features, and so on.