Degen Scraper

Pipeline for generating AI character files and training datasets by scraping public figures' online presence across Twitter and blogs.

⚠️ IMPORTANT: Create a new Twitter account for this tool. DO NOT use your main account: scraping may trigger Twitter's automation detection and result in account restrictions.

Setup

Install dependencies:
```
npm install
```

Copy .env.example to a new .env file and fill in the values:

# (Required) Twitter Authentication
TWITTER_USERNAME=     # your twitter username
TWITTER_PASSWORD=     # your twitter password
TWITTER_EMAIL=        # your twitter email

# RapidAPI Configuration
RAPIDAPI_URL=
RAPIDAPI_KEY=

# (Optional) Blog Configuration
BLOG_URLS_FILE=      # path to file containing blog URLs

# (Optional) Scraping Configuration
MAX_TWEETS=          # max tweets to scrape
MAX_RETRIES=         # max retries for scraping
RETRY_DELAY=         # delay between retries
MIN_DELAY=           # minimum delay between requests
MAX_DELAY=           # maximum delay between requests
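
As a rough illustration of how these settings might be consumed, the sketch below reads them from an environment object, checks the required Twitter credentials, and picks a random delay between MIN_DELAY and MAX_DELAY. The helper names `loadConfig` and `randomDelay` and every default value are assumptions made for this example, not the tool's actual code or defaults. The README does not state the unit of the delay settings, so the sketch leaves it unspecified.

```javascript
// Hypothetical helper: turn environment variables into a validated config object.
// All default values below are placeholders chosen for this sketch.
function loadConfig(env) {
  const required = ["TWITTER_USERNAME", "TWITTER_PASSWORD", "TWITTER_EMAIL"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required variables: ${missing.join(", ")}`);
  }

  // Parse an optional numeric variable; use the fallback when it is unset or empty.
  const num = (name, fallback) => {
    const raw = env[name];
    if (raw === undefined || raw.trim() === "") return fallback;
    const value = Number(raw);
    if (!Number.isFinite(value) || value < 0) {
      throw new Error(`${name} must be a non-negative number`);
    }
    return value;
  };

  const config = {
    twitter: {
      username: env.TWITTER_USERNAME,
      password: env.TWITTER_PASSWORD,
      email: env.TWITTER_EMAIL,
    },
    rapidApi: { url: env.RAPIDAPI_URL || null, key: env.RAPIDAPI_KEY || null },
    blogUrlsFile: env.BLOG_URLS_FILE || null,
    maxTweets: num("MAX_TWEETS", 1000), // placeholder default
    maxRetries: num("MAX_RETRIES", 3), // placeholder default
    retryDelay: num("RETRY_DELAY", 5000), // placeholder default
    minDelay: num("MIN_DELAY", 1000), // placeholder default
    maxDelay: num("MAX_DELAY", 3000), // placeholder default
  };

  if (config.minDelay > config.maxDelay) {
    throw new Error("MIN_DELAY must not exceed MAX_DELAY");
  }
  return config;
}

// Pick a random delay between minDelay and maxDelay so requests are not evenly spaced.
function randomDelay({ minDelay, maxDelay }) {
  return minDelay + Math.random() * (maxDelay - minDelay);
}
```

With the variables already present in the environment, `loadConfig(process.env)` would return the config object, or throw if a required credential is missing.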

Update

RapidAPI has been added as a data source to retrieve more data.

Get the full text of a tweet:

const twitterCrawlAPI = new TwitterCrawlAPI();
twitterCrawlAPI.getFullTextTweet();

For tweets posted before Sep 29, 2022, use the Puppeteer-based fallback to get the full text:

twitterCrawlAPI.fallbackGetFullTextTweet();
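
One way to combine the two calls is to choose between them by tweet date. The sketch below is only an illustration: `pickFetcher` and `getFullText` are hypothetical names, the `createdAt` field is an assumed tweet property, and `primary` and `fallback` stand in for `getFullTextTweet` and `fallbackGetFullTextTweet`, whose real signatures are not documented here. Treating the cutoff as midnight UTC is also an assumption.

```javascript
// Cutoff date from the note above; midnight UTC is an assumption of this sketch.
const FALLBACK_CUTOFF = new Date("2022-09-29T00:00:00Z");

// Hypothetical: choose which fetcher to use based on the tweet's creation date.
// `tweet.createdAt` is an assumed field holding a date string or Date.
function pickFetcher(tweet, { primary, fallback }) {
  const createdAt = new Date(tweet.createdAt);
  if (Number.isNaN(createdAt.getTime())) {
    throw new Error("tweet.createdAt is not a valid date");
  }
  // Tweets older than the cutoff go through the Puppeteer-based fallback.
  return createdAt < FALLBACK_CUTOFF ? fallback : primary;
}

// Hypothetical wrapper: both fetchers are assumed to accept a tweet
// and return a promise that resolves to its full text.
async function getFullText(tweet, fetchers) {
  const fetcher = pickFetcher(tweet, fetchers);
  return fetcher(tweet);
}
```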

Get message examples:

const messageExamplesCrawler = new MessageExamplesCrawler();
messageExamplesCrawler.addExample();

Usage

Run as Server

npm run start

This starts an Express server.

APIs:

GET /api/characters/:username - get character data by username
POST /api/characters - scrape tweets and blogs by username, with a request body such as:

{
  "username": "pmarca", // twitter username
  "date": "2024-12-23", // generate character from this date
  "is_crawl": true // scrape tweets and blogs
}
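
A client call to the POST endpoint could look roughly like the sketch below. The helper name `buildScrapeRequest` and the base URL are assumptions (the README does not state the host or port), and sending `date` only when it is provided is a guess about the API rather than documented behavior. The `//` comments in the body above are explanatory only; the JSON actually sent must not contain them.

```javascript
// Hypothetical helper: build the arguments for fetch() against POST /api/characters.
function buildScrapeRequest(baseUrl, { username, date, isCrawl = true }) {
  if (!username) {
    throw new Error("username is required");
  }
  const body = { username, is_crawl: isCrawl };
  if (date !== undefined) {
    // The example body uses the YYYY-MM-DD format.
    if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
      throw new Error("date must be formatted as YYYY-MM-DD");
    }
    body.date = date;
  }
  return {
    url: new URL("/api/characters", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Usage (the address is an assumption; use whatever the server listens on):
//   const { url, options } = buildScrapeRequest("http://localhost:3000", {
//     username: "pmarca",
//     date: "2024-12-23",
//   });
//   const response = await fetch(url, options);
```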

Collect Tweets and Blogs by using CLI

Twitter Collection

npm run twitter -- username

Example: npm run twitter -- pmarca

Blog Collection

npm run blog

Generate Character

npm run character -- username

Example: npm run character -- pmarca

Finetune

npm run finetune

Finetune (with test)

npm run finetune:test

Generate Virtuals Character Card

See the Virtuals documentation for character card and goal samples: https://whitepaper.virtuals.io/developer-documents/agent-contribution/contribute-to-cognitive-core#character-card-and-goal-samples

Run this after the Twitter Collection step:

npm run generate-virtuals -- username date

Example: npm run generate-virtuals -- pmarca 2024-11-29

Example without date: npm run generate-virtuals -- pmarca

The generated character file is written to characters/[username].json. Edit the clients and modelProvider fields to match your needs.

The generated tweet dataset file will be in pipeline/[username]/[date]/raw/tweets.json.
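
The two path patterns above can be expressed as a small helper, for example when post-processing the outputs in a script. This is a sketch only: `outputPaths` is a hypothetical name, and the YYYY-MM-DD form of the `[date]` segment is assumed from the command examples rather than confirmed by the tool.

```javascript
// Hypothetical helper: build the output locations described above.
function outputPaths(username, date) {
  // Twitter usernames are limited to letters, digits, and underscores (max 15 chars);
  // checking this also keeps path separators out of the result.
  if (!/^[A-Za-z0-9_]{1,15}$/.test(username)) {
    throw new Error("invalid Twitter username");
  }
  // Assumption: the [date] directory uses the same YYYY-MM-DD form as the CLI examples.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error("date must be formatted as YYYY-MM-DD");
  }
  return {
    characterFile: `characters/${username}.json`,
    tweetsFile: `pipeline/${username}/${date}/raw/tweets.json`,
  };
}
```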

sqrDAO/twitter-scraper-finetune
