Implemented a web crawler using the Java library jsoup on the publicly available website Books to Scrape

Arham-12336/Web-Scrapping-Java

Web-Scrapping-Java

⚠️ Problem

You have an HTML document that you want to extract data from, and you know the general structure of that document.

♻️ Java HTML Parser

jsoup is a Java library for working with real-world HTML. It provides a convenient API for fetching URLs and for extracting and manipulating data, using HTML5 DOM methods and CSS selectors. jsoup can parse HTML files, input streams, URLs, or even strings. It eases data extraction from HTML by offering Document Object Model (DOM) traversal methods and CSS and jQuery-like selectors. jsoup can also manipulate the content: the HTML element itself, its attributes, or its text.

Visit https://jsoup.org/ for more details
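
As a rough illustration of the parsing, selecting and manipulating described above, here is a minimal sketch. It is not the code from this repository: the class name `BookTitleExtractor`, the sample markup, and the selectors `article.product_pod h3 a` and `p.price_color` are assumptions made for the example, and the HTML is parsed from a string so that no network access is needed. It requires the jsoup jar on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class BookTitleExtractor {

    // Hypothetical markup, assumed for illustration only.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<article class=\"product_pod\">"
        + "<h3><a href=\"book-1.html\" title=\"First Book\">First ...</a></h3>"
        + "<p class=\"price_color\">10.00</p>"
        + "</article>"
        + "<article class=\"product_pod\">"
        + "<h3><a href=\"book-2.html\" title=\"Second Book\">Second ...</a></h3>"
        + "<p class=\"price_color\">12.50</p>"
        + "</article>"
        + "</body></html>";

    // Extraction: parse a string and read an attribute from each element
    // matched by a CSS selector.
    static List<String> extractTitles(String html) {
        Document doc = Jsoup.parse(html);
        List<String> titles = new ArrayList<>();
        for (Element link : doc.select("article.product_pod h3 a")) {
            titles.add(link.attr("title"));
        }
        return titles;
    }

    // Manipulation: replace the text of the first price element and
    // return the text that element now holds.
    static String replaceFirstPrice(String html, String newText) {
        Document doc = Jsoup.parse(html);
        Element price = doc.selectFirst("p.price_color");
        if (price == null) {
            return null; // no price element in this document
        }
        price.text(newText);
        return price.text();
    }

    public static void main(String[] args) {
        System.out.println(extractTitles(SAMPLE_HTML));
        System.out.println(replaceFirstPrice(SAMPLE_HTML, "n/a"));
    }
}
```

To work against a live page instead of a string, `Jsoup.connect(url).get()` returns the same kind of `Document`, so the selector logic would stay unchanged.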

🔰 Getting Started

✅ Prerequisites

💻 Running the Application

  1. Clone the repository
$ git clone https://github.com/Arham-12336/Web-Scrapping-Java-.git
  2. Change into the cloned repository
$ cd Web-Scrapping-Java-
  3. Install the dependencies and package the application
$ mvn package
  4. Run the web scraper
Run the xml file in your IDE

🤝 Contribution

Please feel free to raise issues on this repository and I'll get back to you.

You can also fork the repository, make changes and submit a Pull Request.
