Update README.md
VanitasXIV authored Jul 15, 2024
1 parent 830f338 commit ae9ca85
Showing 1 changed file with 66 additions and 2 deletions.
68 changes: 66 additions & 2 deletions README.md
@@ -1,2 +1,66 @@
# WebScrapping
A Web Scrapping App made with Pyton using urllib and BeautifulSoup libraries
# Web Scraping Tool

This Python script allows you to scrape web pages for various types of content such as links, headers, paragraphs, and images. It uses the `BeautifulSoup` library to parse HTML and extract the desired data, which can then be saved to text files.
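
As a rough illustration of the parsing step described above, the sketch below pulls the four content types out of an HTML string. It is not the script's actual code: the function name `extract_content`, the returned dictionary keys, and the sample HTML are assumptions made for this example.

```python
from typing import Dict, List

from bs4 import BeautifulSoup


def extract_content(html: str) -> Dict[str, List[str]]:
    """Collect links, headers, paragraphs, and image sources from HTML text.

    Hypothetical helper for illustration; the real script may be organized differently.
    """
    soup = BeautifulSoup(html, "html.parser")
    header_tags = ["h1", "h2", "h3", "h4", "h5", "h6"]
    return {
        # href=True / src=True skip tags that lack the attribute.
        "links": [a["href"] for a in soup.find_all("a", href=True)],
        "headers": [h.get_text(strip=True) for h in soup.find_all(header_tags)],
        "paragraphs": [p.get_text(strip=True) for p in soup.find_all("p")],
        "images": [img["src"] for img in soup.find_all("img", src=True)],
    }


# Made-up sample page used only to exercise the helper.
sample_html = """
<html><body>
  <h1>Title</h1>
  <p>First paragraph.</p>
  <a href="https://www.example.com/page1">Page 1</a>
  <img src="logo.png" alt="Logo">
</body></html>
"""

result = extract_content(sample_html)
for kind, items in result.items():
    print(kind, items)
```

Passing `href=True` and `src=True` to `find_all` keeps anchors and images without the attribute from raising a `KeyError`.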

## Features

- **Input Validation**: Ensures the URL provided is valid.
- **Exception Handling**: Catches errors during URL fetching.
- **Tag Extraction**: Retrieves and prints headers, paragraphs, and images in addition to links.
- **Exporting Data**: Saves each type of extracted data to its own text file.
- **Menu-Driven Interface**: Provides a menu for the user to choose what to extract.
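
The input validation and exception handling features could look something like the following minimal sketch, which uses only the standard `urllib` modules. The function names `is_valid_url` and `fetch_html`, the 10-second timeout, and the choice to accept only `http` and `https` URLs are assumptions for this example, not details taken from the script.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlparse
from urllib.request import urlopen


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host (an assumed policy)."""
    try:
        parts = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch_html(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the page body as text, or None if validation or the request fails."""
    if not is_valid_url(url):
        print(f"Invalid URL: {url}")
        return None
    try:
        with urlopen(url, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        print(f"HTTP error {err.code} while fetching {url}")
    except OSError as err:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        print(f"Could not reach {url}: {err}")
    return None


print(is_valid_url("https://www.example.com"))
print(is_valid_url("example.com"))
```

`HTTPError` is caught before `OSError` because it is a subclass of `URLError`, which in turn derives from `OSError`; reversing the order would hide the status code.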

## Prerequisites

- Python 3.x
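- `BeautifulSoup` library (installed as the `beautifulsoup4` package)
- `urllib` module (part of the Python standard library, so no separate installation is needed)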

## Installation

1. Clone this repository:
```sh
git clone https://github.com/your-username/web-scraping-tool.git
cd web-scraping-tool
```

2. Install the required packages:
```sh
pip install beautifulsoup4
```

## Usage

1. Run the script:
```sh
python web_scraping_tool.py
```

2. Enter the URL you want to scrape when prompted.

3. Choose what type of content you want to extract by selecting the corresponding option:
- Links
- Headers
- Paragraphs
- Images
- Exit

4. The extracted data will be printed on the screen and saved to a corresponding text file (`links.txt`, `headers.txt`, `paragraphs.txt`, or `images.txt`).
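
The menu and export steps above can be sketched as follows. The `MENU` mapping, the helpers `save_items` and `handle_choice`, and the one-item-per-line file layout are hypothetical; only the option names and output file names come from this README. The demo writes into a temporary directory so that running it leaves no files behind.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical mapping from menu choice to (label, output file name).
MENU = {
    "1": ("Links", "links.txt"),
    "2": ("Headers", "headers.txt"),
    "3": ("Paragraphs", "paragraphs.txt"),
    "4": ("Images", "images.txt"),
}


def save_items(items: List[str], filename: str, directory: str = ".") -> Path:
    """Write one item per line and return the path of the written file."""
    path = Path(directory) / filename
    path.write_text("\n".join(items) + "\n", encoding="utf-8")
    return path


def handle_choice(
    choice: str, extracted: Dict[str, List[str]], directory: str = "."
) -> Optional[Path]:
    """Print and save the selected content; return None for Exit or unknown input."""
    if choice not in MENU:
        return None
    label, filename = MENU[choice]
    items = extracted.get(label.lower(), [])
    print(f"{label}:")
    for item in items:
        print(item)
    path = save_items(items, filename, directory)
    print(f"Data saved to {path.name}")
    return path


# Demo with made-up data, written to a temporary directory.
demo_data = {
    "links": ["https://www.example.com/page1", "https://www.example.com/page2"]
}
with tempfile.TemporaryDirectory() as tmp:
    saved = handle_choice("1", demo_data, directory=tmp)
    saved_text = saved.read_text(encoding="utf-8")
```

Keeping the choice-to-file mapping in one dictionary means a new content type needs only one extra entry rather than another branch in the menu logic.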

## Example

```text
Enter URL: https://www.example.com
Choose what to extract:
1. Links
2. Headers
3. Paragraphs
4. Images
5. Exit
Enter your choice: 1
Links:
https://www.example.com/page1
https://www.example.com/page2
...
Data saved to links.txt
```
