Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. The web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
Web scraping is the process of automatically mining data or collecting information from the World Wide Web. It is an actively developing field that shares a common goal with the semantic web vision, an ambitious initiative that still requires breakthroughs in text processing, semantic understanding, artificial intelligence, and human-computer interaction. Current web scraping solutions range from ad hoc approaches that require human effort to fully automated systems that can convert entire websites into structured information, albeit with limitations.
The purpose of this project is to build a scraper tool for web scraping. It was implemented in Ruby using Open-URI and the Nokogiri gem, and the Byebug debugger was used to inspect the values of the data scraped from the page. In this project, I created a scraper that extracts job advertisements for junior web developers from SimplyHired.com.
Built with:
- Ruby 2.7.1
To get started, get a local copy of this project by downloading it or by running:
`git clone https://github.com/arslanbisharat/Capstone_Project_Ruby/`
Prerequisites:
- Ruby installed on your local machine
- A text editor (preferably VS Code, Atom, or Sublime Text)
- Git
If you have Ruby installed on your machine:
- Clone the project to your local machine using the `git clone` command, or download the zip file.
- Go into the project directory using the `cd <directory-name>` command.
- Install the required gems with `gem install <gem-name>`:
  - `gem install nokogiri`
  - `gem install colorize`
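- From the root directory, run the `bin/main.rb` command (or `ruby bin/main.rb` if the file is not executable).
- Run `rspec` to test the various methods in the classes (install it first with `gem install rspec` if it is not already available).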
When you run the project, it shows the job advertisements on the selected page and then asks whether you want to see more or stop. To see more results, press 'y' or the Enter/Return key. If you want to stop, or you have found a job that suits you, press 'n' or 'a' and the scraping process will stop.
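The see-more/stop prompt described above could be implemented along these lines. This sketch assumes that input arrives line by line from standard input. `continue_scraping?` and `browse` are hypothetical names, treating replies case-insensitively is an assumption, and the `pages` array stands in for pages of scraped results.

```ruby
# Hypothetical helper: maps the user's reply to a decision.
# 'y' or a bare Enter/Return (empty line) => true (show more)
# 'n' or 'a'                              => false (stop)
# anything else                           => nil (ask again)
# Case-insensitivity is an assumption, not documented behaviour.
def continue_scraping?(reply)
  case reply.to_s.strip.downcase
  when 'y', '' then true
  when 'n', 'a' then false
  end
end

# Hypothetical loop: prints each page, then asks whether to continue.
# Returns how many pages were shown. At end of input, gets returns nil,
# which is treated like Enter, so the loop cannot hang.
def browse(pages, input: $stdin, output: $stdout)
  shown = 0
  pages.each do |page|
    output.puts page
    shown += 1
    answer = nil
    while answer.nil?
      output.print 'See more jobs? [Y/n] '
      answer = continue_scraping?(input.gets)
    end
    break unless answer
  end
  shown
end
```

Because `input` and `output` are passed in rather than hard-wired to `gets` and `puts`, the loop can be exercised from RSpec with `StringIO` objects.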
Each job advertisement shows the job title, the hiring company and its location, the estimated yearly salary, and a link to the full job description. If you are interested in a job, follow its URL to read the details and apply.
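The fields listed above can be held in a small value object. This sketch shows one possible way to model and print a job record. It is not the project's actual class. `Job` and its attribute names are hypothetical. The project uses the colorize gem for terminal styling, but this sketch omits it so that the example depends only on core Ruby.

```ruby
# Hypothetical value object holding the fields the README lists.
# keyword_init lets callers omit fields, which then default to nil.
Job = Struct.new(:title, :company, :location, :salary, :url, keyword_init: true) do
  # Renders the job as a short multi-line block for the terminal.
  # A missing salary is shown as "not listed".
  def to_s
    [
      "Title:    #{title}",
      "Company:  #{company} (#{location})",
      "Salary:   #{salary || 'not listed'}",
      "Details:  #{url}"
    ].join("\n")
  end
end

# Example usage (values are made up):
#   puts Job.new(title: 'Junior Web Developer', company: 'Acme Corp',
#                location: 'Remote', url: 'https://www.simplyhired.com/job/abc123')
```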
🤝 Contributions, issues, and feature requests are welcome! Start by:
1. Forking the project
2. Cloning the project to your local machine
3. `cd` into the project directory
4. Run `git checkout -b your-branch-name`
5. Make your contributions
6. Push your branch up to your forked repository
7. Open a pull request with a detailed description against the development branch of the original project for review
Please feel free to contribute to any of these!
Feel free to check the issues page.
👤 Muhammad Arslan
- GitHub: @githubhandle
- Twitter: @twitterhandle
- LinkedIn: linkedin
Give a 🌟 if you like this project! 😊
📝 Copyright
This project is MIT licensed.
Happy coding!