This Python project crawls a website and converts each page to a PDF file.
Clone the repository:
git clone https://github.com/yourusername/web_to_pdf.git
cd web_to_pdf
Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
Install the required dependencies:
pip install -r requirements.txt
Install wkhtmltopdf (required by pdfkit):
- On Ubuntu:
sudo apt-get install wkhtmltopdf
- On macOS:
brew install wkhtmltopdf
- On Windows: Download and install from https://wkhtmltopdf.org/downloads.html
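After installing, it can help to confirm that the `wkhtmltopdf` executable is actually reachable from Python. The snippet below is a minimal sketch using only the standard library; the function name `find_wkhtmltopdf` is hypothetical and is not part of this project.

```python
import shutil


def find_wkhtmltopdf():
    """Return the full path to the wkhtmltopdf executable, or None if it is not on PATH."""
    # shutil.which searches the directories listed in PATH, like the shell does.
    return shutil.which("wkhtmltopdf")


path = find_wkhtmltopdf()
if path is None:
    print("wkhtmltopdf was not found on PATH; install it before running the script.")
else:
    print(f"wkhtmltopdf found at {path}")
```

If the executable is installed somewhere outside `PATH` (common on Windows), this check reports it as missing even though the binary exists.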
Run the script with a URL as an argument:
python src/main.py https://example.com
This crawls the website and saves a PDF of each page in the output directory.
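To illustrate the crawling step, here is a rough sketch of how links on the same site could be collected from a page using only the standard library. It is not the code in `src/main.py`; the names `LinkCollector` and `same_site_links` are hypothetical, and the rule "same host only, fragments ignored" is an assumption about what counts as part of the website.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return absolute, fragment-free links that share a host with page_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        # Keep only links on the same host, and skip duplicates.
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


sample = '<a href="/about">About</a> <a href="https://other.org/x">Elsewhere</a>'
print(same_site_links("https://example.com/", sample))
```

Each URL returned this way could then be handed to `pdfkit.from_url(url, output_path)` to produce one PDF per page.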
To specify a custom output directory:
python src/main.py https://example.com --output custom_directory
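For reference, a command line of this shape can be parsed with `argparse`. The sketch below is an assumption about how such an interface might be wired up, not a copy of `src/main.py`; the function name `build_parser` is hypothetical, and the default directory name `output` is assumed.

```python
import argparse


def build_parser():
    """Build a parser for a positional URL and an optional --output directory."""
    parser = argparse.ArgumentParser(
        description="Crawl a website and save each page as a PDF."
    )
    parser.add_argument("url", help="Start URL of the site to crawl")
    # The default value "output" is an assumption, not confirmed by the project.
    parser.add_argument("--output", default="output", help="Directory for the generated PDFs")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["https://example.com", "--output", "custom_directory"])
print(args.url, args.output)  # prints: https://example.com custom_directory
```

Passing an explicit list to `parse_args` keeps the example independent of how the script is launched; a real entry point would call `parse_args()` with no arguments to read the command line.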
To run the tests, use pytest:
pytest
This project is licensed under the MIT License.