A Python script that scrapes an Indeed job filter and categorizes the data using the OpenAI API.

Robinh0/indeed-linkedin-scraper-categorizer

This is a personal project that I created while searching for jobs, both to make my own life easier and to learn more about cloud and DevOps practices.

How does it work? The script scrapes an Indeed search URL and collects the job information for each listing. It then calls the OpenAI API to extract and categorize variables such as programming languages, frameworks, seniority level, etc.
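The categorization step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names in `FIELDS`, the prompt wording, and the `ask_model` callable (a stand-in for whatever function sends a prompt to the OpenAI API and returns the text reply) are all assumptions.

```python
import json

# Hypothetical field names; the real script may extract different variables.
FIELDS = ["programming_languages", "frameworks", "seniority_level"]


def build_prompt(job_description):
    """Build a prompt asking the model for one JSON object with the keys in FIELDS."""
    return (
        "Extract the following fields from the job description and answer "
        f"with a single JSON object using exactly these keys: {', '.join(FIELDS)}.\n\n"
        f"Job description:\n{job_description}"
    )


def categorize(job_description, ask_model):
    """Return a dict with one entry per field, or None where the model gave nothing.

    ask_model is any callable that takes a prompt string and returns the
    model's text reply (assumed here, so the sketch runs without network access).
    """
    reply = ask_model(build_prompt(job_description))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Keep only the expected keys so unexpected model output cannot add columns.
    return {field: data.get(field) for field in FIELDS}
```

Passing the model call in as a function keeps the parsing logic testable with a fake reply, and malformed replies fall back to empty values instead of crashing the run.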

Input: The input is an Indeed search URL with the desired filters applied, like: "https://nl.indeed.com/jobs?q=python+developer&l=Randstad&from=searchOnDesktopSerp&vjk=20f5fbb9784589b5"
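To illustrate what such a URL carries, the search term and location can be read from its query string with the standard library. The helper name `parse_search_url` and the returned keys are made up for this sketch; the script itself may handle the URL differently.

```python
from urllib.parse import parse_qs, urlsplit


def parse_search_url(url):
    """Pull the host, search term (q) and location (l) out of a search URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # decodes "+" to a space
    return {
        "host": parts.netloc,
        "query": query.get("q", [""])[0],
        "location": query.get("l", [""])[0],
    }


example = (
    "https://nl.indeed.com/jobs?q=python+developer&l=Randstad"
    "&from=searchOnDesktopSerp&vjk=20f5fbb9784589b5"
)
print(parse_search_url(example))
```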

Output: The output is a CSV file with the extracted and enriched job data.
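A rough sketch of how enriched job records could be written to CSV is shown below. The column names and the choice to join list values with "; " are assumptions for illustration; the real output file may use a different layout.

```python
import csv
import io


def jobs_to_csv(jobs, columns):
    """Serialize a list of job dicts to CSV text with the given column order."""
    buffer = io.StringIO(newline="")
    # Keys not listed in columns are ignored; missing keys become empty cells.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for job in jobs:
        row = {}
        for key, value in job.items():
            # Flatten list values (e.g. several languages) into a single cell.
            row[key] = "; ".join(value) if isinstance(value, list) else value
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical columns and sample data, for illustration only.
columns = ["title", "company", "programming_languages", "seniority_level"]
jobs = [
    {"title": "Python Developer", "company": "Example BV",
     "programming_languages": ["Python", "SQL"], "seniority_level": "medior"},
]
print(jobs_to_csv(jobs, columns))
```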

The scraper runs locally, but it is also deployed on AWS, where it can be triggered with a POST request to an API Gateway endpoint. The request invokes a Lambda function that runs the container image stored in ECR.
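As a sketch of the Lambda side of this setup, a handler behind an API Gateway proxy integration receives the POST body as a string in `event["body"]` and returns a dict with `statusCode` and `body`. The `search_url` payload field and the `run_scraper` placeholder are assumptions; the deployed function may expect a different payload.

```python
import json


def run_scraper(search_url):
    """Hypothetical placeholder for the actual scrape-and-categorize run."""
    return {"search_url": search_url, "status": "started"}


def _response(status_code, payload):
    """Build a response in the shape API Gateway proxy integration expects."""
    return {"statusCode": status_code, "body": json.dumps(payload)}


def handler(event, context):
    """Entry point for an API Gateway POST request (proxy integration)."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "request body must be valid JSON"})
    # "search_url" is an assumed field name for the search URL to scrape.
    search_url = body.get("search_url") if isinstance(body, dict) else None
    if not search_url:
        return _response(400, {"error": "missing 'search_url'"})
    return _response(200, run_scraper(search_url))
```

Validating the body before starting the scrape means a malformed request gets a 400 response straight away rather than failing partway through a run.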

This is still a work in progress. The next step is to create a subclass of the Extractor class that adds support for scraping LinkedIn jobs as well.
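The planned subclassing could look roughly like this. The interface shown here (a constructor taking the search URL and an `extract_jobs` method) is a guess for illustration; the existing Extractor class in the repository may be structured differently, and the subclass bodies are left as stubs.

```python
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Hypothetical base class: shared setup, site-specific scraping in subclasses."""

    source = "unknown"

    def __init__(self, search_url):
        self.search_url = search_url

    @abstractmethod
    def extract_jobs(self):
        """Return a list of job dicts scraped from self.search_url."""


class IndeedExtractor(Extractor):
    source = "indeed"

    def extract_jobs(self):
        # Stub: the Indeed-specific scraping logic would go here.
        return []


class LinkedInExtractor(Extractor):
    source = "linkedin"

    def extract_jobs(self):
        # Stub: the LinkedIn-specific scraping logic would go here.
        return []
```

With this layout, the categorization and CSV steps can stay the same for both sites, since they only depend on the list of job dicts an extractor returns.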
