Jack the Reader

A Machine Reading Comprehension framework.
  • All work and no play makes Jack a great framework!
  • All work and no play makes Jack a great framework!
  • All work and no play makes Jack a great framework!

Jack the Reader - or jack, for short - is a framework for building and using models on a variety of tasks that require reading comprehension.

Installation

To install Jack, please see How to Install and Run.

Quick Start & Tutorials

We provide IPython notebooks with tutorials on Jack. For the quickest start, you can begin here. If you're interested in training a model yourself, see this tutorial; if you'd like to implement a new model, this notebook explains the process in more detail.

There is documentation on our command-line interface for training and evaluating models. For a high-level explanation of the ideas and vision, see Understanding Jack the Reader.

Supported ML Backends

We currently support TensorFlow and PyTorch, and readers can be implemented in either. Input and output modules (i.e., pre- and post-processing) are independent of the ML backend and can thus be reused with model modules written for either backend. Most models are currently implemented in TensorFlow, but because the cumbersome pre- and post-processing can be reused, it is easy to build new readers in PyTorch as well.
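
The split described above can be pictured with a small, framework-free sketch. All names below (`InputModule`, `OutputModule`, `Reader`, and the two toy model functions) are hypothetical stand-ins and not Jack's actual API; the only point is that the pre- and post-processing stay the same while the model in the middle is swapped.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, List[str]]
Span = Tuple[int, int]


class InputModule:
    """Backend-independent pre-processing: raw text -> plain Python features."""

    def __call__(self, question: str, support: str) -> Features:
        return {"question": question.split(), "support": support.split()}


class OutputModule:
    """Backend-independent post-processing: token span -> answer string."""

    def __call__(self, features: Features, span: Span) -> str:
        start, end = span
        return " ".join(features["support"][start:end + 1])


def first_token_model(features: Features) -> Span:
    """Toy stand-in for a model written in one backend: selects the first token."""
    return (0, 0)


def last_token_model(features: Features) -> Span:
    """Toy stand-in for a model written in another backend: selects the last token."""
    last = len(features["support"]) - 1
    return (last, last)


class Reader:
    """Chains input module, model and output module."""

    def __init__(self, input_module: InputModule,
                 model: Callable[[Features], Span],
                 output_module: OutputModule) -> None:
        self.input_module = input_module
        self.model = model
        self.output_module = output_module

    def __call__(self, question: str, support: str) -> str:
        features = self.input_module(question, support)
        span = self.model(features)
        return self.output_module(features, span)


# The same pre- and post-processing objects are shared by both readers.
pre, post = InputModule(), OutputModule()
reader_a = Reader(pre, first_token_model, post)
reader_b = Reader(pre, last_token_model, post)

print(reader_a("Who?", "Bernadette saw Mary"))  # Bernadette
print(reader_b("Who?", "Bernadette saw Mary"))  # Mary
```

Because the model only sees and returns plain Python structures in this sketch, nothing in the input or output module needs to know which backend produced the span.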

Dedicated Task Documentation and Pre-trained Models

Quickstart Examples - Training and Usage of a Question Answering System

To illustrate how jack works, we will show how to train a question answering model. It is probably best to set up a virtual environment to avoid clashes with system-wide Python library versions. A more comprehensive

First, install the framework:

$ python3 -m pip install -e .[tf]

Then download the SQuAD dataset and the GloVe word embeddings:

$ ./data/SQuAD/download.sh
$ ./data/GloVe/download.sh

Train a FastQA model:

$ python3 bin/jack-train.py with train='data/SQuAD/train-v1.1.json' dev='data/SQuAD/dev-v1.1.json' reader='fastqa_reader' \
> repr_dim=300 dropout=0.5 batch_size=64 seed=1337 loader='squad' save_dir='./fastqa_reader' epochs=20 \
> with_char_embeddings=True embedding_format='memory_map_dir' embedding_file='data/GloVe/glove.840B.300d.memory_map_dir' vocab_from_embeddings=True

Or, more concisely, use our prepared config:

$ python3 bin/jack-train.py with config='./conf/qa/squad/fastqa.yaml'
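
When launching several training runs from a script, it can help to keep the overrides in a dictionary and generate the argument list from it. The sketch below only builds and prints the command, it does not run it. `build_train_command` is a hypothetical helper, not part of Jack, and the values are copied from the long training command above.

```python
import shlex

# Overrides copied from the long training command above.
params = {
    "train": "data/SQuAD/train-v1.1.json",
    "dev": "data/SQuAD/dev-v1.1.json",
    "reader": "fastqa_reader",
    "repr_dim": 300,
    "dropout": 0.5,
    "batch_size": 64,
    "seed": 1337,
    "loader": "squad",
    "save_dir": "./fastqa_reader",
    "epochs": 20,
    "with_char_embeddings": True,
    "embedding_format": "memory_map_dir",
    "embedding_file": "data/GloVe/glove.840B.300d.memory_map_dir",
    "vocab_from_embeddings": True,
}


def build_train_command(overrides):
    """Hypothetical helper: render a dict as `jack-train.py with key=value ...`."""
    args = ["python3", "bin/jack-train.py", "with"]
    for key, value in overrides.items():
        # Each override becomes one argument, as the shell would pass it
        # after removing the quotes used in the command above.
        args.append(f"{key}={value}")
    return args


command = build_train_command(params)

# Print a shell-safe version for copy and paste.
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the top-level jack directory if you want the script to start training itself.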

A copy of the model is written to the save_dir directory after each training epoch in which performance improves. Saved models can be loaded using the commands below; see also, e.g., the quickstart.
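
The "save only on improvement" behaviour can be illustrated with a minimal sketch. This is not Jack's training loop: `train_with_checkpoints` is a hypothetical function, the dev scores are made-up numbers standing in for real evaluation results, and overwriting a single checkpoint file is an assumption made for brevity.

```python
import json
import tempfile
from pathlib import Path


def train_with_checkpoints(dev_scores, save_dir):
    """Write a checkpoint only after epochs whose dev score beats the best so far.

    `dev_scores` stands in for the per-epoch evaluation of a real training
    loop; the JSON file is a placeholder for the saved model.
    """
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    best = float("-inf")
    saved_epochs = []
    for epoch, score in enumerate(dev_scores, start=1):
        if score > best:
            best = score
            checkpoint = {"epoch": epoch, "dev_score": score}
            (save_dir / "checkpoint.json").write_text(json.dumps(checkpoint))
            saved_epochs.append(epoch)
    return saved_epochs


with tempfile.TemporaryDirectory() as tmp:
    # Made-up scores: epoch 3 is worse than epoch 2, so nothing is saved then.
    saved = train_with_checkpoints([0.61, 0.68, 0.66, 0.71], tmp)
    final = json.loads((Path(tmp) / "checkpoint.json").read_text())

print(saved)           # [1, 2, 4]
print(final["epoch"])  # 4
```

With this scheme the directory always holds the best model seen so far, so a run that is interrupted or starts to overfit still leaves a usable reader behind.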

Want to train another model? No problem: we have a fairly modular QAModel implementation that lets you assemble your own model. There are examples in conf/qa/squad/ (e.g., bidaf.yaml or our own creation, jack_qa.yaml). These models are defined solely in the configs, i.e., there is no implementation in code. This is possible through our ModularQAModel.

If all of that is too cumbersome for you and you just want to play, you can download a pretrained model:

$ # we still need GloVe in memory mapped format, ignore the next 2 commands if already downloaded and transformed
$ data/GloVe/download.sh
$ wget -O fastqa.zip 'https://www.dropbox.com/s/qb796uljoqj0lvo/fastqa.zip?dl=1'
$ unzip fastqa.zip && mv fastqa fastqa_reader

Then load the reader in Python and ask it a question:

from jack import readers
from jack.core import QASetting

fastqa_reader = readers.reader_from_file("./fastqa_reader")

support = """It is a replica of the grotto at Lourdes,
France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. 
At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), 
is a simple, modern stone statue of Mary."""

answers = fastqa_reader([QASetting(
    question="To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?",
    support=[support]
)])

print(answers[0][0].text)
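
Since the reader is called with a list of QASetting objects, several questions can be sent in one call. The sketch below follows the calling convention shown above but uses stand-ins so that it runs without Jack or a trained model: the `QASetting` and `Answer` namedtuples, `stub_reader`, and the helper `ask_all` are all hypothetical, and treating index 0 of each result as the top-ranked answer is an assumption.

```python
from collections import namedtuple

# Stand-ins so the sketch runs without Jack. With Jack installed, use
# `from jack.core import QASetting` and a reader loaded with
# `readers.reader_from_file` instead.
QASetting = namedtuple("QASetting", ["question", "support"])
Answer = namedtuple("Answer", ["text"])


def stub_reader(settings):
    """Mimics the call above: returns one list of answers per QASetting.

    The 'answer' is simply the first word of the support text.
    """
    return [[Answer(text=s.support[0].split()[0])] for s in settings]


def ask_all(reader, questions, support):
    """Send several questions about one support text in a single call."""
    settings = [QASetting(question=q, support=[support]) for q in questions]
    results = reader(settings)
    # Keep the first answer for each question (assumed to be the best one).
    return {q: answers[0].text for q, answers in zip(questions, results)}


support = "It is a replica of the grotto at Lourdes, France."
questions = [
    "To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?",
    "What is the grotto a replica of?",
]

result = ask_all(stub_reader, questions, support)
for question, answer in result.items():
    print(question, "->", answer)
```

Swapping `stub_reader` for the loaded `fastqa_reader` should leave `ask_all` unchanged, provided the real reader returns results in the same nested-list shape as in the example above.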

Support

We are thankful for support from:

Developer guidelines

$ pwd
/home/pasquale/workspace/jack
$ python3 bin/jack-train.py [..]