Web Downloader is a powerful and user-friendly application built with Python and PyQt5, designed to help you effortlessly download entire websites, including their HTML, CSS, JavaScript, images, SVGs, videos, and documents. Whether you're archiving your favorite sites, working offline, or analyzing website structures, Web Downloader has got you covered!
- Batch Downloads: Enter multiple website URLs at once and download them simultaneously.
- Selective Resource Downloading: Choose which types of resources (HTML, CSS, JS, Images, Fonts, Videos, SVGs, Documents) you want to download.
- Respect or Bypass robots.txt: Decide whether to adhere to a website's robots.txt rules.
- Concurrency Control: Set the number of concurrent threads to optimize download speeds based on your system's capabilities.
- Proxy Support: Configure proxy settings to route your downloads through a proxy server, enhancing privacy and bypassing network restrictions.
- Rate Limiting: Prevent server overload by setting delays between HTTP requests.
- User-Agent Customization: Select from predefined User-Agent strings or input a custom one to mimic different browsers or devices.
- Custom HTTP Headers: Add custom HTTP headers to your requests for advanced configurations.
- Basic Authentication: Provide a username and password for websites that require basic authentication.
- Ignore Specific MIME Types: Exclude certain MIME types from being downloaded to streamline the process.
- Schedule Downloads: Schedule your downloads to start at a specific time, allowing for automated operations.
- Download History: Keep track of your downloads with a searchable and filterable history log.
- Pause, Resume, and Stop Downloads: Control your download processes with easy-to-use pause, resume, and stop functionalities.
- Responsive UI: Enjoy a sleek, dark-themed interface with real-time progress indicators, detailed logs, and a resource table.
- Preview Downloaded Pages: (If PyQtWebEngine is installed) Preview the last downloaded page directly within the application.
- Open Download Folder: Quickly access the folder where your websites are downloaded.
- Cross-Platform Compatibility: Runs smoothly on Windows, macOS, and Linux.
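The robots.txt option in the feature list can be illustrated with Python's standard `urllib.robotparser`. This is a minimal sketch, not the application's actual code: the helper name `build_robots_checker`, the `respect_robots` flag, and the user-agent string are assumptions made for illustration, and the rules are passed in as text so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_lines, user_agent, respect_robots=True):
    """Return a function that reports whether a URL may be fetched.

    Hypothetical helper: the real application may structure this differently.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # rules supplied as a list of text lines

    def is_allowed(url):
        if not respect_robots:
            return True  # "bypass" mode: ignore the rules entirely
        return parser.can_fetch(user_agent, url)

    return is_allowed


# Example rules, as they might appear in a site's robots.txt
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
allowed = build_robots_checker(rules, "WebDownloaderSketch/0.1")
print(allowed("https://example.com/index.html"))
print(allowed("https://example.com/private/data.html"))
```

With `respect_robots=False`, the checker allows every URL, which corresponds to the "bypass" choice.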
- Python 3.7+: Ensure you have Python 3.7 or newer installed (available from python.org).
- Pip: Python package installer. It typically comes bundled with Python.
- Clone the Repository
git clone https://github.com/skillerious/WebDownloader.git
cd WebDownloader
- Install Required Dependencies
It's recommended to use a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
If you don't have a requirements.txt, install manually:
pip install PyQt5 requests beautifulsoup4 playwright qdarkstyle
Install Playwright Browsers:
After installing Playwright, you need to install the necessary browsers:
playwright install
- Run the Application
python main.py
- (Optional) Create an Executable
For easier distribution, you can create a standalone executable using PyInstaller:
pip install pyinstaller
pyinstaller --onefile --windowed main.py
The executable will be available in the dist folder.
After running main.py, the Web Downloader window will appear with a clean and intuitive interface.
- Batch Input: Enter one or more website URLs in the "Website URLs" text area, each on a new line.
https://example.com
https://anotherexample.com
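One way the one-URL-per-line input could be turned into a clean list is sketched below. The function name `parse_url_list` is hypothetical, and the choice to skip blank lines, non-HTTP entries, and duplicates is an assumption; the application may validate input differently.

```python
from urllib.parse import urlparse


def parse_url_list(text):
    """Split multi-line input into a de-duplicated list of http(s) URLs."""
    urls = []
    seen = set()
    for line in text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # skip blank lines
        parsed = urlparse(candidate)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # skip anything that is not an absolute http(s) URL
        if candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls


sample = "https://example.com\n\n  https://anotherexample.com \nnot a url\nhttps://example.com\n"
print(parse_url_list(sample))
```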
- Click the Browse button to choose the folder where you want the downloaded websites to be saved.
- Resource Types: Select the types of resources you wish to download by checking the corresponding boxes (HTML, CSS, JS, Images, Fonts, Videos, SVGs, Documents).
- Timeout: Set the timeout duration for HTTP requests.
- Retries: Specify the number of retry attempts for failed downloads.
- Max Depth: Determine the recursion depth for downloading linked pages.
- Concurrency: Set the number of concurrent threads to optimize download speed.
- Proxy (Optional): Enter proxy server details if you need to route downloads through a proxy.
- Proxy Authentication: If your proxy requires authentication, enable the proxy authentication option and provide the username and password.
- Respect robots.txt: Toggle this option to adhere to or bypass robots.txt rules.
- Rate Limit: Set delays between HTTP requests to prevent server overload.
- Ignore HTTPS Errors: Choose to ignore HTTPS certificate errors if necessary.
- Max File Size: Specify the maximum file size (in MB) for downloaded resources. Resources exceeding this size will be skipped.
- Download Structure: Choose between "keep" to preserve the original website structure or "flatten" to organize all resources into a single directory.
- Follow External Links: Decide whether to download resources from external domains linked within the websites.
- Ignore MIME Types: Exclude certain MIME types from being downloaded to streamline the process.
- User-Agent: Choose from predefined User-Agent strings or select "Custom" to input your own.
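To make the difference between the "keep" and "flatten" download structures concrete, here is a small sketch of how a URL might map to a local path under each mode. The function `local_path`, the `index.html` fallback, and the hash suffix used to avoid name collisions when flattening are all assumptions for illustration; the application's real naming scheme is not documented here.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path(url, structure="keep"):
    """Map a URL to a relative save path (hypothetical naming scheme).

    Path sanitization (for example rejecting '..' segments) is omitted
    for brevity and would be needed in real code.
    """
    parsed = urlparse(url)
    path = parsed.path
    if not path or path.endswith("/"):
        path += "index.html"  # directory-style URLs get a default file name
    rel = PurePosixPath(parsed.netloc) / path.lstrip("/")
    if structure == "keep":
        # Mirror the site's own directory layout under the host name
        return str(rel)
    if structure == "flatten":
        # Single directory: append a short hash so equal file names do not clash
        digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
        return f"{rel.stem}_{digest}{rel.suffix}"
    raise ValueError(f"unknown structure: {structure!r}")


print(local_path("https://example.com/css/site.css", "keep"))
print(local_path("https://example.com/css/site.css", "flatten"))
```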
- Navigate to the Advanced tab in Settings.
- Add any custom HTTP headers required for specific websites or use cases.
- If the target website requires basic authentication, navigate to the Advanced tab in Settings.
- Enter the required username and password.
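The User-Agent, custom header, and basic authentication settings all end up as HTTP request headers. The sketch below shows this with the standard library's `urllib.request.Request`. It is an illustration only: the application itself lists `requests` as a dependency, and `build_request` and the header values are hypothetical.

```python
import base64
from urllib.request import Request


def build_request(url, user_agent, extra_headers=None, username=None, password=None):
    """Assemble a request carrying a User-Agent, custom headers, and Basic auth."""
    headers = {"User-Agent": user_agent}
    headers.update(extra_headers or {})
    if username is not None:
        # Basic auth: base64 of "username:password" in the Authorization header
        credentials = f"{username}:{password or ''}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return Request(url, headers=headers)


req = build_request(
    "https://example.com",
    user_agent="WebDownloaderSketch/0.1",
    extra_headers={"X-Trace-Id": "demo"},
    username="user",
    password="pass",
)
for name, value in req.header_items():
    print(name, value)
```

Basic authentication only encodes the credentials; it does not encrypt them, so it is best used over HTTPS.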
- In the Advanced tab within Settings, enable "Schedule Download" and set the desired time.
- The download will automatically start at the scheduled time.
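Scheduling a download for a given time of day comes down to computing the delay until that time and starting a timer. The following is a sketch under assumptions: `seconds_until` and `schedule` are hypothetical names, and `threading.Timer` stands in for whatever timer mechanism the PyQt5 application actually uses.

```python
import threading
from datetime import datetime, time, timedelta


def seconds_until(target, now=None):
    """Seconds from `now` until the next occurrence of clock time `target`."""
    now = now or datetime.now()
    run_at = datetime.combine(now.date(), target)
    if run_at <= now:
        run_at += timedelta(days=1)  # time already passed today: use tomorrow
    return (run_at - now).total_seconds()


def schedule(target, callback):
    """Start a background timer that calls `callback` at `target`."""
    timer = threading.Timer(seconds_until(target), callback)
    timer.daemon = True  # do not keep the process alive just for the timer
    timer.start()
    return timer  # call timer.cancel() to abort the scheduled run


# Fixed "now" so the printed value is reproducible
print(seconds_until(time(23, 30), now=datetime(2024, 1, 1, 22, 0)))
```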
- Click the Download button to initiate the download process.
- Progress Bar: Monitor the download progress in real-time.
- Logs: View detailed logs of each action, including successes and errors.
- Resource Table: Keep track of each resource's download status and path.
- Pause/Resume: Use the Pause and Resume buttons to control ongoing downloads.
- Stop: Click the Stop button to terminate the download process immediately.
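Pause, resume, and stop controls for a worker thread are commonly built on `threading.Event`. The sketch below shows that general pattern; the `DownloadControl` class and `worker` function are hypothetical and are not taken from the application's source.

```python
import threading


class DownloadControl:
    """Shared flags that a worker checks between resources."""

    def __init__(self):
        self._resume = threading.Event()
        self._resume.set()  # start in the running (not paused) state
        self._stop = threading.Event()

    def pause(self):
        self._resume.clear()

    def resume(self):
        self._resume.set()

    def stop(self):
        self._stop.set()
        self._resume.set()  # wake a paused worker so it can exit

    def checkpoint(self):
        """Block while paused; return False once stop was requested."""
        self._resume.wait()
        return not self._stop.is_set()


def worker(items, control, done):
    for item in items:
        if not control.checkpoint():
            break
        done.append(item)  # stand-in for downloading one resource


control = DownloadControl()
finished = []
worker(["a.html", "b.css"], control, finished)
print(finished)
```

Because the worker only checks the flags between items, a stop request takes effect after the resource currently in progress, not in the middle of it.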
- Navigate to the History section to view past downloads.
- Search & Filter: Utilize the search bar to find specific downloads.
- Open Folder: Quickly access the folder where a particular website was downloaded.
- (If PyQtWebEngine is installed) After a successful download, a Preview Last Downloaded Page button will appear.
- Click it to view the downloaded HTML page within the application.
- After downloading, an Open Download Folder button will appear.
- Click it to open the folder containing the downloaded website files.
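Opening a folder in the system file manager differs by platform. A minimal sketch of one common approach follows; `open_folder_command` and `open_folder` are hypothetical helpers, and the application may use a different mechanism (for example a Qt call).

```python
import subprocess
import sys


def open_folder_command(path, platform=sys.platform):
    """Return the command that opens `path` in the platform's file manager."""
    if platform.startswith("win"):
        return ["explorer", path]
    if platform == "darwin":
        return ["open", path]
    return ["xdg-open", path]  # typical on Linux desktops


def open_folder(path):
    """Launch the file manager without waiting for it to exit."""
    subprocess.Popen(open_folder_command(path))


print(open_folder_command("/tmp/downloads", platform="linux"))
```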
- Go to the Settings page to modify application-wide settings.
- Remember to click Save Settings after making changes.
- Visit the About page for information about the application, version details, and acknowledgments.
- Operating System: Windows, macOS, or Linux
- Python: Version 3.7 or higher
- Python Packages:
- PyQt5
- requests
- beautifulsoup4
- playwright
- qdarkstyle
- Settings: Stored in settings.json within the application directory. This file holds all user-configurable settings.
- History: Stored in history.json, maintaining a log of all downloaded websites and their respective download paths.
- Download Cache: Managed in cache.json, preventing duplicate downloads and tracking failed resources.
These files are automatically managed by the application. Avoid manual edits to prevent corruption.
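For readers curious how an application can keep JSON state files such as these resistant to corruption, here is a sketch of a common pattern: fall back to defaults when a file is missing or unreadable, and write through a temporary file that is then renamed into place. The function names and the keys in `DEFAULTS` are assumptions; the real schema of settings.json is not described here.

```python
import json
import os
import tempfile

DEFAULTS = {"timeout": 30, "retries": 3}  # hypothetical keys, for illustration


def load_json(path, defaults):
    """Load a JSON object, falling back to defaults if missing or corrupt."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(defaults)
    if not isinstance(data, dict):
        return dict(defaults)
    return {**defaults, **data}  # stored values override defaults


def save_json(path, data):
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
        os.replace(tmp, path)  # replaces the old file in a single step
    except Exception:
        os.unlink(tmp)  # clean up the partial temp file
        raise


with tempfile.TemporaryDirectory() as folder:
    settings_path = os.path.join(folder, "settings.json")
    save_json(settings_path, {"timeout": 10})
    print(load_json(settings_path, DEFAULTS))
```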
Contributions are welcome! If you'd like to enhance Web Downloader, follow these steps:
- Fork the Repository
- Create a Feature Branch
git checkout -b feature/YourFeature
- Commit Your Changes
git commit -m "Add YourFeature"
- Push to the Branch
git push origin feature/YourFeature
- Open a Pull Request
Please ensure your code follows the project's coding standards and includes relevant documentation.
This project is licensed under the MIT License. You are free to use, modify, and distribute this software as per the license terms.
If you encounter any issues or have questions about Web Downloader, feel free to reach out:
- Email: [email protected]
- GitHub Issues: Open an Issue
- PyQt5: The framework used for building the graphical user interface.
- Requests: Simplifies HTTP requests.
- BeautifulSoup: Facilitates HTML parsing.
- Playwright: Enables headless browser automation for dynamic content rendering.
- QDarkStyle: Provides the dark theme for the application.
- Icon Design: Icons made by Freepik from www.flaticon.com.
- Open-Source Community: Special thanks for providing invaluable tools and libraries that made this project possible.
While Web Downloader provides the tools to download website content, please ensure you have the necessary permissions to download and use the content from the target websites. Always respect copyright laws and website policies.
Enjoy seamless website downloading with Web Downloader! If you find this tool helpful, please consider giving it a 👍 and sharing it with others. Happy downloading! 🌐💾