Sample Scrapy project integrated with AWS Step Functions to trigger all Lambdas at once, then save the results to an AWS S3 bucket.
References:
- Another Way to Trigger a Lambda Function Every 5–10 Seconds
- Serverless Scraping with Scrapy, AWS Lambda and Fargate – a guide (with a couple of modifications to make Scrapy work with AWS Lambda)
Requirements:
- Docker (on non-Linux environments, for building Python packages compatible with the AWS Lambda environment)
- Python 3.9
- Pipenv
- Node.js 16
- AWS CLI + AWS profile
- Serverless CLI 3.22
Python packages are managed by Pipenv. Run pipenv install to install the required packages, and pipenv shell to start a Python development environment with those packages available.
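For reference, a minimal Pipfile for such a project might look like the sketch below. The exact package list and versions here are assumptions; the Pipfile checked into this repository is authoritative.

```toml
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
# Assumed core dependencies: Scrapy for crawling, boto3 for S3 uploads.
scrapy = "*"
boto3 = "*"

[requires]
python_version = "3.9"
```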
This repository is already a Scrapy project, so any Scrapy command can be used. For example, to crawl a spider that is already defined:
scrapy crawl quotes -o test.json
We can test the Lambda function by invoking it locally:
serverless invoke local -f scrape_quotes
Change the stage of the deployment in the serverless.yml file.
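As a sketch, the relevant section of serverless.yml looks something like the following. The service name and region here are illustrative assumptions, not values taken from this repository.

```yaml
service: scrapy-scraper   # illustrative service name

provider:
  name: aws
  runtime: python3.9
  stage: dev          # change this to deploy a different stage
  region: us-east-1   # assumption; set your own region
```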
With a configured AWS CLI profile, the serverless deployment can be done by running:
serverless deploy
The deployed serverless stack can be removed with:
serverless remove
Or delete the corresponding stack on CloudFormation.
All buckets created by the deployment must be empty before the resources can be removed; if removal fails because of non-empty buckets, empty them and run the removal again.
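For example, a results bucket can be emptied from the command line before retrying the removal. The bucket name below is a placeholder, not the actual bucket created by this project.

```shell
# Empty the bucket (placeholder name) so CloudFormation can delete it,
# then retry the stack removal.
aws s3 rm s3://my-scrape-results-bucket --recursive
serverless remove
```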