Imagine the power of your favorite Twitter banger, in the palm of your hands.
Forked from yiminghan. GOAT fr.
- Added long tweet collection (going back and scrolling down by a third of the page height).
- Some text cleaning code, including getting rid of "hf.co" links (that kind of serious posting is too neutral); there's a sketch of this kind of cleaning right after this list.
- Added a watchlist of users you can customize (only affects training data selection).
- Moved credentials to a separate file (config.json) that's gitignored.
- Also deletes hashtags. Fuck you, ChatGPT bread-and-butter tweet generation.
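If you're curious what that cleaning amounts to, here's a minimal sketch (illustrative only; the function name and exact patterns are mine, not the repo's actual cleaning code):

```python
import re

# Illustrative sketch of the cleaning described above, not the repo's actual code.
def clean_tweet(text: str) -> str:
    text = re.sub(r"https?://hf\.co/\S+", "", text)   # drop hf.co links
    text = re.sub(r"#\w+", "", text)                   # drop hashtags
    return re.sub(r"\s+", " ", text).strip()           # tidy leftover whitespace

print(clean_tweet("new model just dropped https://hf.co/some/model #AI #ML"))
# -> "new model just dropped"
```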
Besides the original instructions down below, you should probably also create a `config.json`
in the root folder to store your throwaway Twitter account credentials and OpenAI API key. Use this starter schema:
{
"username": "your_name_here",
"password": "your_pw_here",
"openai_api_key": "key_here"
}
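Reading it back out is as simple as it looks; a minimal Python sketch (the scraper itself presumably reads the same file from the Node side):

```python
import json

# Illustrative only: pull the fields defined in the schema above out of config.json.
with open("config.json") as f:
    cfg = json.load(f)

username = cfg["username"]        # throwaway account handle
api_key = cfg["openai_api_key"]   # for whatever ends up calling the OpenAI API
```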
Happy hunting for tokens.
Well, now you can!
With Twitter Banger Generator, you can scrape, train, and generate your own Twitter bangers, all locally on your own machine!
Let's start by generating your banger in a few easy steps!
# init nanoGPT
git submodule update --init --recursive
# install dependencies
npm install
pip install torch numpy transformers datasets tiktoken wandb tqdm argparse
Download pre-trained models:
wget https://huggingface.co/yiminghan/twitter-bangers/resolve/main/goth600.pt
# The script expects the model to be in nanoGPT/out-{TWITTER HANDLE}/ckpt.pt
mkdir -p nanoGPT/out-goth600
mv goth600.pt nanoGPT/out-goth600/ckpt.pt
# Find all the models here: https://huggingface.co/yiminghan/twitter-bangers/tree/main
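If you want to sanity-check the download, the checkpoint should open as an ordinary PyTorch dict (assuming nanoGPT's usual checkpoint layout; the key names below are what nanoGPT typically saves):

```python
import torch

# Quick sanity check on the downloaded checkpoint (nanoGPT saves a plain dict).
ckpt = torch.load("nanoGPT/out-goth600/ckpt.pt", map_location="cpu")
print(list(ckpt.keys()))       # usually includes "model" and "model_args", among others
print(ckpt.get("model_args"))  # layer count, embedding size, block size, ...
```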
That's it! Let's Generate!
(Honestly, I find it struggles on macOS, so I recommend running it on CUDA if you can.)
bash sample_finetune.sh goth600
# On macOS, you might need to set the MPS flag:
MPS=1 bash sample_finetune.sh goth600
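Not sure which device you actually have? A quick check with plain PyTorch (nothing repo-specific):

```python
import torch

# Device sanity check before sampling or finetuning.
print("CUDA available:", torch.cuda.is_available())
print("MPS available: ", torch.backends.mps.is_available())  # Apple silicon Macs
```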
The sampling script generates a lot of stuff, for example:
Protip: I usually like to generate a lot and pick out the good stuff
% MPS=1 bash sample_finetune.sh goth600
---------------
The concept is simple, if you're not paying attention, you're not paying attention.
It's a game.
It's a living.
It's a time.
It's an obsession.
And you need to be here.
We can start by building a simulation.
We can start by building a simulation.
We can start by creating an AI simulation.
We can start by creating an AI simulation.
We can start by creating a simulation using only data.
// .....
---------------
You can also prompt the beginning of the text with a custom prompt:
bash sample_finetune.sh goth600 "It's 2023"
MPS=1 bash sample_finetune.sh goth600 "It's 2023"
And there are bangers for your specific prompt!
---------------
It's 2023. You have internet access. You have data. You have a choice. Have a drink…
— Kevin Spacey, House of Cards
If you have internet access, there's nothing stopping you from connecting to the internet and inventing some new future. No one has the authority to stop you. No one has the right to control you. Just relax and take your time. It's 2023. You have internet access. You have data. You have a choice. Have a drink…
---------------
It's 2023. You have internet access. You are Rich. You are Me.
I am my father's best friend. I am my father's best friend.
My father is my godfather. I am my father's godfather.
You are always there to support me. You are always there to support me.
You are the best. You are the best. You are the best.
You are the best. You are the best. You are the best
---------------
All scraping logic is done inside `scraper.spec.js` via Playwright.
To begin scraping your favorite Twitter account, do the following:
- Set up a throwaway Twitter account to log in with; use a temporary email service provider.
- On line 7 and line 8, replace the `username` and `password` with your throwaway account's credentials.
- On line 24, replace the `twitterAt` with the Twitter handle of the account you want to scrape.
- Begin scraping by running `npx playwright test --project=LocalChrome --ui`, and start scraping in the browser window that pops up.
- There's already some pre-scraped data in `scrape/`, so you can skip the scraping step if you want to just start playing around.
- Please don't abuse this, I don't want Elon to get pissed.
Before you finetune, please `cd` into nanoGPT and complete the setup instructions there.
Finetuning is all done via nanoGPT, but there is a simple script to do so:
bash finetune_gpt2.sh {TWITTER HANDLE}
(Honestly, I don't recommend finetuning on macOS; run it on CUDA if you can, macOS is just too slow.)
Feel free to play around with the learning rates in `finetune.py`.
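Under the hood, nanoGPT wants the scraped text tokenized into train.bin / val.bin before training; `finetune_gpt2.sh` presumably takes care of this, but for reference, nanoGPT-style GPT-2 BPE prep looks roughly like this (the input path and 90/10 split are assumptions, check the script for the real ones):

```python
import numpy as np
import tiktoken

# Rough sketch of nanoGPT-style data prep; real paths/splits live in the repo's scripts.
with open("scrape/goth600.txt", encoding="utf-8") as f:  # assumed path to scraped tweets
    data = f.read()

split = int(0.9 * len(data))
enc = tiktoken.get_encoding("gpt2")
train_ids = np.array(enc.encode_ordinary(data[:split]), dtype=np.uint16)
val_ids = np.array(enc.encode_ordinary(data[split:]), dtype=np.uint16)

train_ids.tofile("train.bin")  # nanoGPT memory-maps these during training
val_ids.tofile("val.bin")
```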
Once I get a good GPU, it's over for you guys. I'm going to train a Llama2 finetune and unleash it on the world.
But anyways, right now I'm still learning about parameter-efficient finetuning techniques, so I plan to add QLoRA training soon, either by submitting a PR to nanoGPT or by doing it in this repo directly.
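For the curious: QLoRA isn't wired up here yet, but the LoRA half of it looks roughly like this with Hugging Face's peft (not a dependency of this repo; QLoRA would additionally load the base model quantized to 4-bit before attaching adapters):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative only: LoRA adapters on GPT-2; not part of this repo yet.
model = AutoModelForCausalLM.from_pretrained("gpt2")
lora_cfg = LoraConfig(
    r=8,                        # adapter rank
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a tiny fraction of weights stay trainable
```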
- Automated built-in Twitter scraper
- Character-level Transformer training
- GPT-2 full fine-tuning (using nanoGPT)
- GPT-2 QLoRA fine-tuning (Coming Soon)
- Llama fine-tuning
PRs welcome! My GPU is trash right now so I can't train anything bigger; I would be excited to see what a bigger model can generate!