This repository contains all resources for the Applied Machine Learning Days workshop "Meet your Artificial Self: Generate text that sounds like you".
In this workshop, participants are asked to download their own chat logs and build a chatbot that generates text similar to their own writing. As an alternative to chat logs, we provide a number of other conversational and non-conversational datasets in this repository.
Feel free to join our Gitter during the workshop:
Find the workshop slides here.
The workshop is split into 3 tasks. You can run each task locally (by cloning this repository) or in the Colab notebook (see links below). If you run locally, make sure you have access to one or more GPUs, are running Python 3.6+, and have sufficient storage space. More detailed instructions are provided in the different subfolders.
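If you plan to run locally, a quick check like the following can flag obvious setup problems before you start. It is only a sketch using the Python standard library: the function name `check_environment` and the 5 GB storage threshold are assumptions made for illustration, not requirements stated by the workshop, and finding `nvidia-smi` on the `PATH` is just a hint that an NVIDIA driver is installed, not proof that a GPU is usable.

```python
import shutil
import sys


def check_environment(min_python=(3, 6), min_free_gb=5.0, path="."):
    """Return a dict describing whether the local setup looks usable.

    min_free_gb is an illustrative threshold, not a workshop requirement.
    """
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return {
        # The workshop asks for Python 3.6 or newer.
        "python_ok": sys.version_info[:2] >= min_python,
        # Heuristic only: nvidia-smi on PATH suggests an NVIDIA driver is installed.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        "free_gb": round(free_gb, 1),
        "storage_ok": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

If any of the checks fail, running the Colab notebook instead avoids the local setup entirely.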
Fine-tune GPT-2 on various datasets (including tweets, poetry, programming code, chess, music and more!). Thanks to @manueth for compiling the datasets!
➡️ Read more
We use the same style-transfer approach to train a conversational model on our chat logs. You can either use Chatistics to parse your own chat logs or use some of the provided resources. Thanks to @MasterScrat for compiling the conversational datasets!
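To give a rough idea of how chat logs can become training data for a conversational model, the sketch below pairs each message you wrote with the few messages that preceded it. The input format (a list of `(sender, text)` tuples), the function name `build_examples`, and the `max_context` parameter are all hypothetical; this is not the output format of Chatistics and not necessarily the preprocessing used in this task.

```python
def build_examples(messages, me, max_context=3):
    """Pair each message written by `me` with the messages that preceded it."""
    examples = []
    for i, (sender, text) in enumerate(messages):
        # Skip messages from other people, and a first message with no context.
        if sender != me or i == 0:
            continue
        context = messages[max(0, i - max_context):i]
        examples.append({
            "context": [f"{s}: {t}" for s, t in context],
            "reply": text,
        })
    return examples


# Hypothetical toy conversation, for illustration only.
chat = [
    ("alice", "Are you coming tonight?"),
    ("me", "Yes, around eight."),
    ("alice", "Great, bring snacks."),
    ("me", "Will do!"),
]
examples = build_examples(chat, me="me")
for example in examples:
    print(example)
```

Each example keeps the reply separate from its context, so the model can be trained to continue a conversation in your voice rather than to imitate the other participants.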
➡️ Read more
We extend the approach in task 2 by introducing multi-task learning, improving data preprocessing, and adding token types.
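As an illustration of what "token types" can mean here, the sketch below labels every token with an id for the speaker who wrote it, producing a second sequence aligned with the tokens. It makes several simplifying assumptions: whitespace splitting stands in for a real subword tokenizer, and the function name `with_token_types` and the speaker-to-id mapping are hypothetical. The task's actual implementation may differ.

```python
def with_token_types(turns, type_ids):
    """Flatten turns into parallel lists of tokens and token type ids."""
    tokens, token_types = [], []
    for speaker, text in turns:
        words = text.split()  # stand-in for a real subword tokenizer
        tokens.extend(words)
        # Every token in a turn gets the id of the speaker who wrote it.
        token_types.extend([type_ids[speaker]] * len(words))
    return tokens, token_types


# Hypothetical two-turn exchange, for illustration only.
turns = [("other", "how are you"), ("self", "fine thanks")]
tokens, token_types = with_token_types(turns, {"other": 0, "self": 1})
print(list(zip(tokens, token_types)))
```

The two lists always have the same length, which lets a model receive the speaker information as an extra input alongside each token.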
➡️ Read more