Text embedding models yield high-resolution insights into conceptual knowledge from short multiple-choice quizzes
This repository contains all data and code used to produce the paper "Text embedding models yield high-resolution insights into conceptual knowledge from short multiple-choice quizzes" by Paxton C. Fitzpatrick, Andrew C. Heusser, and Jeremy R. Manning.
We also include reproducible environments for running our experiment and analyses via Docker.
The repository is organized as follows:
.
├── code : all analysis code used in the paper
│ ├── notebooks : Jupyter notebooks for running analyses
│ └── khan_helpers : Python package with helper code for analyses
├── data : all data analyzed in the paper
│ ├── embeddings : 2D UMAP embeddings for knowledge maps
│ ├── models : trained models
│ ├── participants : individual & average participant data objects
│ ├── raw : raw lecture transcripts, quiz questions, and performance data
│ └── trajectories : topic trajectories for lectures and question sets
├── docker : files for building experiment & analysis environments
├── exp : all code for running the experiment
│ ├── static : stimuli, scripts, stylesheets, and other static files
│ └── templates : HTML templates for experiment pages
└── paper : all files for generating the paper
    ├── CDL-bibliography : submodule for CDL BibTeX file
    ├── admin : files related to submission & review process
    └── figs : PDFs of all figures from the paper
You can install the Docker Desktop app for your operating system using one of the guides below:
Alternatively, you can install Docker Engine (CLI only) for various Linux OSes using one of the guides listed here.
You do not need to create a Docker ID or Docker Hub account to use Docker with this repo.
Option 1: launch_notebooks.sh
The easiest way to set up and run the analyses is to use the launch_notebooks.sh script included in this repository.
From the repository root, simply run:
./launch_notebooks.sh
The script will:
- Start the Docker daemon, if it isn't already running
- Build the image from Dockerfile-analyses, if it doesn't already exist
- Create and run a container from the image, if one doesn't already exist
- Launch a Jupyter notebook server inside the container, if one isn't already running
- Open the notebook app in your default browser
- Attach stdout to the notebook server logs in the container
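The "if it doesn't already exist" checks in the list above can be sketched as follows. This is a minimal illustration, not the contents of launch_notebooks.sh: the function names are hypothetical, and the image tag (khan), container name (Khan), port, and mount are assumed to match the manual setup commands given later in this README.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "build/create only if missing" checks.
# Not taken from launch_notebooks.sh; names and defaults are assumptions.
IMAGE_NAME="${IMAGE_NAME:-khan}"          # image tag used in the manual steps
CONTAINER_NAME="${CONTAINER_NAME:-Khan}"  # container name used in the manual steps

# Succeeds if an image with the given tag exists locally.
image_exists() {
    docker image inspect "$1" >/dev/null 2>&1
}

# Succeeds if a container (running or stopped) with the given name exists.
container_exists() {
    docker container inspect "$1" >/dev/null 2>&1
}

# Build and create only what is missing, then start and attach.
ensure_and_start() {
    if ! image_exists "$IMAGE_NAME"; then
        docker build --rm -f docker/Dockerfile-analyses -t "$IMAGE_NAME" .
    fi
    if ! container_exists "$CONTAINER_NAME"; then
        docker create -it -p 8888:8888 --name "$CONTAINER_NAME" \
            -v "$PWD:/mnt" "$IMAGE_NAME" >/dev/null
    fi
    docker start "$CONTAINER_NAME" >/dev/null
    docker attach "$CONTAINER_NAME"
}
```

Calling ensure_and_start from the repository root would then behave the same on the first and on later runs, because each step is skipped when its result already exists.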
The script also accepts a few options to customize behavior:
$ ./launch_notebooks.sh --help
launch_notebooks.sh [-h] [-d] [-b] [-i NAME] [-c NAME]
Launch a Jupyter notebook server inside a Docker container for running the
analysis notebooks. The container is set up automatically the first time the
script is run.
Options:
-h, --help Show this help message and exit
-d, --detach Don't attach the terminal to the streaming
notebook server log
-b, --no-browser Don't try to automatically open notebooks in a
browser window
-i, --image-name NAME Run a container from existing image NAME, or
build a new image and tag it NAME
-c, --container-name NAME Start the existing container NAME, or create a
new container named NAME
To stop the notebook server and exit the container, press Control+c.
The script should work on most systems. If for some reason it doesn't work for you, or you prefer to manage the environment manually, you can build and run the analysis environment following the steps below (and if you encounter any errors, feel free to open an issue!).
- After installing Docker, launch the desktop app or start the daemon from the command line.
- From the repository's root directory, build the "khan" image from the Dockerfile-analyses file in the docker directory:
  docker build --rm -f docker/Dockerfile-analyses -t khan .
- Run a container (named "Khan") from the newly built image:
  docker run -it -p 8888:8888 --name Khan -v "$PWD:/mnt" khan
  The command above binds port 8888 in the container to port 8888 on the host so we can access the Jupyter notebook server from a web browser, and bind-mounts the repository to the container's /mnt directory so we can read and write files from inside it.
- The notebook server will launch automatically when the container is run. Copy and paste the third link that appears (the one starting with http://127.0.0.1:8888) into a web browser to access the notebook app.
- You can then open any notebook in code/notebooks/ and run the code inside it. When finished, return to the terminal and press Control+c to stop the notebook server and exit the container.
- To launch the container and notebooks any time after this initial setup, run:
  docker start Khan && docker attach Khan
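The -p flag in the docker run command above takes the form HOST_PORT:CONTAINER_PORT. If port 8888 is already in use on your machine, one option (not covered by the steps above, so treat it as an assumption) is to publish the container's port 8888 on a different host port. The hypothetical helper below only prints the adjusted command so you can inspect it before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the "docker run"
# command from the manual setup steps with a custom host port.
# The container-side port stays 8888, as in the command shown in this README.
notebook_run_command() {
    host_port="${1:-8888}"   # port on your machine; defaults to 8888
    printf 'docker run -it -p %s:8888 --name Khan -v "$PWD:/mnt" khan\n' "$host_port"
}

# Example: publish the notebook server on host port 8889 instead.
notebook_run_command 8889
```

With a remapped port, the links printed by the notebook server will still mention 8888; replace that with the host port you chose when pasting the link into a browser.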
- You can launch an interactive bash shell inside the container to explore or modify its contents with docker exec -it Khan bash. To exit the container, either press Control+d or type exit. Note that:
  - If the container isn't already running, you must start it first with docker start Khan.
  - When you enter the container this way (rather than with docker attach or the launch_notebooks.sh script), the container isn't automatically stopped when you exit it. To stop the container after exiting, use docker stop Khan.
- You can get the URL of the running notebook server from inside the container with jupyter notebook list.
- The container pre-installs some nifty extensions for customizing the Jupyter Notebook interface. If you want to enable any of them:
  - Open the notebook application in a browser and click on the "Nbextensions" tab. (To launch the notebook application, start the notebook server as described above and visit the server's address in your web browser.)
  - Uncheck the "disable configuration for nbextensions without explicit compatibility ..." box (nearly all of them are compatible; we're just using a newer version of Jupyter Notebook).
  - Click on any of the listed extensions to see a description and further options, and check the box next to its name to enable it.
  - Refresh any running notebooks for changes to take effect.
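As a small addition to the note above about jupyter notebook list, the same command can also be run from the host through docker exec. The sketch below assumes a running container named Khan (as in the manual setup steps) and that the command's output contains a line beginning with the server URL; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "jupyter notebook list" output on stdin and
# print the first URL found in it.
first_notebook_url() {
    grep -o 'https\{0,1\}://[^ ]*' | head -n 1
}

# Intended use from the host (assumes a running container named "Khan"):
#   docker exec Khan jupyter notebook list | first_notebook_url
```

If the printed URL uses 0.0.0.0 or a container hostname as its host, substituting 127.0.0.1 should work from the host machine, since the port is published there.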
To run the experiment locally, set up the experiment environment as follows:
- After installing Docker, launch the desktop app or start the daemon from the command line.
- From the repository's root directory, build the "khan-exp" image from the Dockerfile-experiment file in the docker directory:
  docker build --rm -f docker/Dockerfile-experiment -t khan-exp .
- Run a container (named "Khan-exp") from the newly built image:
  docker run -it -p 22363:22363 -v "$PWD/exp:/exp" --name Khan-exp khan-exp
  The command above bind-mounts the repository's exp/ directory to /exp in the container so the psiTurk server can read and run the experiment code, and binds port 22363 in the container to port 22363 on the host so the server can be accessed from a web browser.
  Note: the port published by the container must match the port listed in exp/config.txt.
- Your shell prompt ($PS1) should now start with root@, indicating that you're now running a bash shell from inside the container. To start the psiTurk experiment server, run:
  psiturk server on
  When you see "Now serving on http://0.0.0.0:22363," the experiment server is ready. Starting the server for the first time will also create exp/server.log, a logfile for the experiment server, and exp/efficient-learning-khan.db, a SQLite database to hold raw experiment data.
- Generate a link to the experiment in "debug mode":
  psiturk debug -p
  This will output a URL in the format
  http://0.0.0.0:22363/ad?assignmentId=debug<XXXXXX>&hitId=debug<YYYYYY>&workerId=debug<ZZZZZZ>&mode=debug
  where <XXXXXX> and <ZZZZZZ> will form a unique identifier for the run (i.e., a participant's unique ID). In debug mode, the experiment will behave normally and data will still be saved properly, but psiTurk will not try to connect to Amazon Mechanical Turk's servers. This lets you run the experiment locally without creating AWS & MTurk accounts, supplying access keys, etc.
- Copy and paste the URL into a web browser, and follow the on-screen instructions to progress through the experiment. Note: the experiment will not work in Google Chrome. Recommended browsers include Safari and Firefox.
- When finished, return to the terminal and shut down the experiment server:
  psiturk server off
  and exit the Docker container by pressing Control+d or typing exit.
- To start and enter the container any time after this initial setup, run:
  docker start Khan-exp && docker attach Khan-exp
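The note in the steps above, that the port published by the container must match the port listed in exp/config.txt, can be checked before running the container. The sketch below is a hypothetical helper, not part of this repository; it assumes the config file contains a line of the form "port = N", which is the usual layout of a psiTurk config file but is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first "port = N" line in a
# psiTurk-style config file. Assumes the key is literally named "port".
config_port() {
    sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Compare the configured port against the one you plan to publish with -p.
check_port() {
    config_file="$1"
    published="$2"
    configured="$(config_port "$config_file")"
    if [ "$configured" = "$published" ]; then
        echo "OK: port $published matches $config_file"
    else
        echo "Mismatch: $config_file has '$configured', docker publishes '$published'" >&2
        return 1
    fi
}

# Intended use from the repository root:
#   check_port exp/config.txt 22363
```

If the check reports a mismatch, either edit the port in exp/config.txt or change both sides of the -p mapping in the docker run command so the two agree.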