jan-schaeffer/google_scholar_scrape

Google Scholar Scrape

A Python script that scrapes Google Scholar profiles for each author's name and email domain, guesses how the two could be combined into an email address, and verifies whether these addresses exist using the isitaralemail API.

Features

  • Navigates the first 100 results pages for a given search term. Each search is limited to a one-year range (e.g. 2021-2022) to maximize the number of results, because Scholar does not show more than 100 pages per search
  • Looks for links to authors' profiles
  • Scrapes name and domain from profile page (see picture below)
  • Automatically removes non-Latin and non-English (ä, ü, ö, ...) characters from the name
  • Combines a person's names into candidate email addresses ('guesses'), depending on how many names the person has
  • Sends these guesses to the isitaralemail API
  • Writes verified email addresses to a .csv file
  • Sleeps automatically to avoid tripping bot detection
  • Saves already-found names, searched terms and visited pages in .csv files, so that the script can be restarted without verifying the same email addresses twice
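The name-cleaning and guessing steps from the list above could look roughly like the sketch below. It is an illustration only, not the script's actual code: the function names `clean_name` and `guess_emails` and the specific address patterns (`first.last`, `flast`, and so on) are assumptions, since the README does not say which combinations the script tries.

```python
import re


def clean_name(name: str) -> list[str]:
    """Split a display name into lowercase parts, dropping every
    character that is not an ASCII letter (so 'ü' is removed, not
    turned into 'u')."""
    parts = []
    for token in re.split(r"[\s\-]+", name):
        cleaned = re.sub(r"[^A-Za-z]", "", token).lower()
        if cleaned:  # skip tokens that contained no ASCII letters at all
            parts.append(cleaned)
    return parts


def guess_emails(name: str, domain: str) -> list[str]:
    """Build candidate addresses; the patterns are hypothetical examples."""
    parts = clean_name(name)
    if not parts:
        return []
    first, last = parts[0], parts[-1]
    if len(parts) == 1:
        local_parts = [first]
    else:
        local_parts = [
            f"{first}.{last}",
            f"{first}{last}",
            f"{first[0]}.{last}",
            f"{first[0]}{last}",
            last,
        ]
        if len(parts) > 2:
            # With middle names, also try all initials followed by the last name.
            initials = "".join(p[0] for p in parts[:-1])
            local_parts.append(f"{initials}{last}")
    # dict.fromkeys removes duplicates while keeping the original order.
    return [f"{local}@{domain}" for local in dict.fromkeys(local_parts)]
```

For a single name the sketch yields one candidate, for example `guess_emails("Plato", "example.org")` gives `["plato@example.org"]`. Each candidate would then be passed to the verification API, and only the confirmed addresses written to the .csv file.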

Scholar profile page

scholar profile
