flask-scraper

A scraper engine built on the Selenium Python bindings. It uses Flask for its API and React for its client.

Requirements

Python v3.7+
Node.js v12+
Docker (latest)

Install

Set up the Python environment:
```
python -m venv .env
source .env/bin/activate
python -m pip install -r requirements.txt
```

Windows users (Command Prompt) should run activate.bat instead:
```
.env\Scripts\activate.bat
```

Set up the Node.js environment:
```
npm install
```
Build the client:
```
npm run build
```
Edit composer/standalone-chrome and replace the default VNC password, flaskscraper@123, with your own.
Edit docker-compose.yml to point to your PostgreSQL database, then start the services with:
```
sudo docker compose up
```
Windows users should omit sudo:
```
docker compose up
```
Then open the home page:
```
http://localhost:5000
```

Development & Testing

Environment Variables

PYTHONUNBUFFERED: Forces Python's stdout and stderr to be unbuffered so log output appears immediately. Set to true.
FLASK_ENV: Configures the Flask server. Set to development.
NODE_ENV: Configures the Webpack build. Set to development.
DATABASE_URI: Connection URI for the PostgreSQL database. Default postgres://postgress:postgress@localhost/postgres
SELENIUM_URI: URL of the Selenium server. Must not include a trailing slash. Default http://localhost:4444
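The variables above can be read with their defaults in a few lines of Python. This is a minimal sketch, not the project's actual configuration code: `Settings` and `load_settings` are hypothetical names, and the defaults simply mirror the values listed above. It strips trailing slashes from SELENIUM_URI defensively, since the server expects the URI without one.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the list above (hypothetical helper, not the project's code).
DEFAULT_DATABASE_URI = "postgres://postgress:postgress@localhost/postgres"
DEFAULT_SELENIUM_URI = "http://localhost:4444"


@dataclass(frozen=True)
class Settings:
    database_uri: str
    selenium_uri: str


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Read DATABASE_URI and SELENIUM_URI, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return Settings(
        database_uri=env.get("DATABASE_URI", DEFAULT_DATABASE_URI),
        # SELENIUM_URI must not end with "/", so strip any trailing slashes.
        selenium_uri=env.get("SELENIUM_URI", DEFAULT_SELENIUM_URI).rstrip("/"),
    )
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests, for example `load_settings({"SELENIUM_URI": "http://selenium:4444/"})`.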

Directory Structure

```
.
+-- src
|   +-- client
|   |   +-- static
|   |       +-- index.htm       # React single-page app
|   |       +-- favicon.ico
|   |       +-- main.js         # Webpack bundle file
|   +-- server
|       +-- app.py              # Flask application file
|       +-- conftest.py         # pytest configuration file
|       +-- routes
|           +-- scraper         # scripts are stored here
+-- .browserslistrc             # browser targets used by babel-loader
+-- .babelrc                    # babel-loader configuration file
+-- docker-compose.yml          # Docker service configuration
+-- Dockerfile                  # Dockerfile for the Flask container
+-- package.json                # Node.js configuration
+-- setup.py                    # Python package configuration
+-- requirements.txt            # Python dependencies
```

React.js client files are found in the src/client directory. They are compiled with Webpack into the src/client/static directory. See README.md for more information.

Flask REST server files are found in the src/server directory. You can add a new script by creating a folder in the src/server/routes/scraper directory. See README.md for more information.
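To check which script folders exist, a short stdlib-only helper can list the sub-directories of the scraper routes directory. This is a hypothetical sketch: `discover_scripts` is not part of the project, and it assumes each script folder is a Python package containing an `__init__.py`, which this README does not state.

```python
from pathlib import Path
from typing import List, Union

# Directory named in this README; adjust if your checkout differs.
SCRIPTS_DIR = Path("src/server/routes/scraper")


def discover_scripts(root: Union[str, Path] = SCRIPTS_DIR) -> List[str]:
    """Return the names of script folders under ``root``, sorted alphabetically.

    Assumption: a script folder is a package, i.e. it contains ``__init__.py``.
    Hidden folders and dunder folders such as ``__pycache__`` are skipped.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir()
        and not child.name.startswith((".", "__"))
        and (child / "__init__.py").is_file()
    )
```

Run it from the repository root (for example `print(discover_scripts())`) to confirm a newly added folder is picked up.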

Run Selenium in Docker

```
docker run \
    --rm -d -p 4444:4444/tcp -p 5900:5900/tcp \
    --name selenium \
    -e SE_NODE_SESSION_TIMEOUT=240 \
    -e SE_NODE_MAX_SESSIONS=16 \
    -v /dev/shm:/dev/shm \
    selenium/standalone-chrome:91.0
```
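The container can take a few seconds before it accepts sessions, so it may help to poll it before starting the scraper. A minimal stdlib sketch follows. It assumes the W3C WebDriver `GET /status` endpoint, whose JSON body carries `value.ready`; the helper names `grid_status_url`, `parse_ready` and `wait_for_grid` are hypothetical, and `http://localhost:4444` is the SELENIUM_URI default listed above.

```python
import json
import time
import urllib.request


def grid_status_url(selenium_uri: str) -> str:
    """Build the status endpoint URL; tolerates a stray trailing slash."""
    return selenium_uri.rstrip("/") + "/status"


def parse_ready(body: bytes) -> bool:
    """Return True if a /status response body reports the server as ready."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    value = payload.get("value") if isinstance(payload, dict) else None
    return isinstance(value, dict) and value.get("ready") is True


def wait_for_grid(selenium_uri: str = "http://localhost:4444",
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll the status endpoint until it reports ready or ``timeout`` seconds pass."""
    deadline = time.monotonic() + timeout
    url = grid_status_url(selenium_uri)
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if parse_ready(resp.read()):
                    return True
        except OSError:  # URLError, refused connections and timeouts: not up yet
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_grid()` right after `docker run` returns True once the container reports ready, or False if it is still unavailable after 60 seconds.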

Start Development Server (Linux / bash)

```
export NODE_ENV=development
export FLASK_ENV=development
source .env/bin/activate
npm run watch &
python -m flask run
```

Start Development Server (Windows / powershell)

```
$env:NODE_ENV="development"
$env:FLASK_ENV="development"
.env\Scripts\Activate.ps1
Start-Process -NoNewWindow npm -ArgumentList "run", "watch"
python -m flask run
```

wotsyula/flask-scraper
