A PHP library/toolkit designed to handle all of your web scraping needs under an MIT or LGPL license.

stibiumz/ultimate-web-scraper

 
 

Ultimate Web Scraper Toolkit

A PHP library of tools designed to handle all of your web scraping needs under an MIT or LGPL license. The toolkit makes RFC-compliant web requests that are indistinguishable from those of a real web browser, provides a web browser-like state engine for handling cookies and redirects, and includes a full cURL emulation layer for web hosts that lack the PHP cURL extension. Simple HTML DOM is included for extracting the desired content from each retrieved document.

Features

  • Carefully follows the IETF RFC Standards surrounding the HTTP protocol.
  • Supports file transfers, SSL, and HTTP/HTTPS/CONNECT proxies.
  • Easy to emulate various web browser headers.
  • A web browser-like state engine that follows redirects (e.g. 301) and handles cookies automatically across multiple requests.
  • Extensive callback support.
  • A full cURL emulation layer for drop-in use on web hosts that are missing cURL.
  • Includes Simple HTML DOM to easily parse and extract the desired content from HTML.
  • Has a liberal open source license: MIT or LGPL, your choice.
  • Designed for relatively painless integration into your project.
  • Hosted on GitHub, so changes can be submitted as pull requests and ideas through the issue tracker.
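
To make the "state engine" feature concrete, here is a minimal sketch of automatic cookie handling in plain PHP. It is not the toolkit's code or API: the class name MiniCookieJar and its methods are hypothetical, invented for illustration, and the sketch ignores cookie attributes such as Path, Expires, and Secure that a real state engine has to honor.

```php
<?php
// Hypothetical illustration only: MiniCookieJar is NOT part of the toolkit.
// It shows the basic idea of remembering cookies between requests.
final class MiniCookieJar
{
    /** @var array<string, array<string, string>> Cookies keyed by host, then by name. */
    private array $cookies = [];

    // Record the name=value pair from one Set-Cookie header value for $host.
    public function store(string $host, string $setCookie): void
    {
        // Only the leading "name=value" segment is kept; attributes after
        // the first ';' are ignored in this sketch.
        $pair = trim(explode(';', $setCookie, 2)[0]);
        $eq = strpos($pair, '=');
        if ($eq === false || $eq === 0) {
            return; // Malformed cookie: no name, or no '=' at all.
        }
        $this->cookies[strtolower($host)][substr($pair, 0, $eq)] = substr($pair, $eq + 1);
    }

    // Build the Cookie request header value for $host ('' if nothing is stored).
    public function header(string $host): string
    {
        $parts = [];
        foreach ($this->cookies[strtolower($host)] ?? [] as $name => $value) {
            $parts[] = $name . '=' . $value;
        }
        return implode('; ', $parts);
    }
}

// Usage: feed in Set-Cookie values from a response, then build the
// Cookie header to send with the next request to the same host.
$jar = new MiniCookieJar();
$jar->store('example.com', 'session=abc123; Path=/; HttpOnly');
$jar->store('example.com', 'theme=dark');
echo $jar->header('example.com'), PHP_EOL; // session=abc123; theme=dark
```

A browser-like engine layers redirect following on top of this: after each response it stores any new cookies, resolves the Location header against the current URL, and repeats the request with the updated Cookie header.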

More Information

Documentation, examples, and official downloads of this project sit on the Barebones CMS website:

http://barebonescms.com/documentation/ultimate_web_scraper_toolkit/
