questions

  • A collection of PySpark scripts that explore Wikipedia data dumps and how they can be used to generate questions.
  • A small Flask-based web app that generates questions using llama3 deployed via Ollama.
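
As a rough illustration of the second item, asking llama3 for a question through Ollama might look like the sketch below. It assumes an Ollama server on its default local address (`http://localhost:11434`) and uses only the standard library. The function names and the prompt wording are hypothetical and are not taken from this repository.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(topic: str, model: str = "llama3") -> dict:
    """Build a non-streaming generate request (the prompt wording is hypothetical)."""
    return {
        "model": model,
        "prompt": f"Write one quiz question about: {topic}",
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def generate_question(topic: str, url: str = OLLAMA_URL) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    data = json.dumps(build_payload(topic)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With a server running and the model pulled, `generate_question("the Eiffel Tower")` would return the model's text. A Flask route could wrap the same call.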

Description

More information about this project can be found in its accompanying dev journal here:

Getting started

This project uses Poetry for Python dependency management and for running the scripts, Docker for containerization, and GNU Make as the build tool. With these three tools installed, you can run

make WIKIDATA_DUMP=<path_to_xml.bz2_file>

and all the necessary steps should run without further setup.
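
If the build fails early, it may help to confirm that the three tools are actually on your `PATH`. The following is a minimal sketch of such a check, assuming the executables are named `poetry`, `docker`, and `make`. The helper name `missing_tools` is hypothetical and not part of this project.

```python
import shutil

# Assumption: these are the executable names of the three required tools.
REQUIRED_TOOLS = ("poetry", "docker", "make")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` returns an empty list when everything is installed. The `which` parameter exists only so the lookup can be replaced in tests.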

To clean everything up, run

make clean