Ultimate Web Scraper Toolkit

A PHP library of tools designed to handle all of your web scraping needs under an MIT or LGPL license. The toolkit easily makes RFC-compliant web requests that are indistinguishable from a real web browser. It also provides a web browser-like state engine for handling cookies and redirects, plus a full cURL emulation layer for web hosts that do not have the PHP cURL extension installed. Simple HTML DOM is included to easily extract the desired content from each retrieved document.

Features

Carefully follows the IETF RFC Standards surrounding the HTTP protocol.
Supports file transfers, SSL, and HTTP/HTTPS/CONNECT proxies.
Easy to emulate various web browser headers.
A web browser-like state engine that follows redirects (e.g. 301) and handles cookies automatically across multiple requests.
Extensive callback support.
A full cURL emulation layer for drop-in use on web hosts that are missing cURL.
Includes Simple HTML DOM to easily parse and extract the desired content from HTML.
Has a liberal open source license. MIT or LGPL, your choice.
Designed for relatively painless integration into your project.
Sits on GitHub for all of that pull request and issue tracker goodness, making it easy to submit changes and ideas, respectively.
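To make the "web browser-like state engine" idea concrete, here is a minimal sketch of what such an engine does between requests: it follows 3xx redirects, resolves relative Location headers, stores cookies from Set-Cookie, and sends them back on later requests to the same host. This is not the toolkit's PHP API (see the official documentation for that). The names BrowserState, process, Response and fake_server are hypothetical. To keep the sketch short, it assumes at most one Set-Cookie header per response and scopes cookies by host only, ignoring Domain, Path and Expires. The transport is injected so the example runs offline.

```python
from dataclasses import dataclass, field
from http.cookies import SimpleCookie
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


class BrowserState:
    """Hypothetical sketch: follows redirects and replays cookies per host."""

    def __init__(self, transport, max_redirects=10):
        # transport(method, url, headers) -> Response; injected so no network is needed
        self.transport = transport
        self.max_redirects = max_redirects
        self.cookies = {}  # host -> {cookie name: value}; Domain/Path are ignored

    def _cookie_header(self, url):
        jar = self.cookies.get(urlsplit(url).hostname, {})
        return "; ".join(f"{name}={value}" for name, value in jar.items())

    def _store_cookies(self, url, response):
        # Simplification: assumes at most one Set-Cookie header per response
        raw = response.headers.get("Set-Cookie")
        if raw:
            parsed = SimpleCookie()
            parsed.load(raw)
            jar = self.cookies.setdefault(urlsplit(url).hostname, {})
            for name, morsel in parsed.items():
                jar[name] = morsel.value

    def process(self, url, method="GET"):
        # One initial request plus up to max_redirects followed redirects
        for _ in range(self.max_redirects + 1):
            headers = {}
            cookie = self._cookie_header(url)
            if cookie:
                headers["Cookie"] = cookie
            response = self.transport(method, url, headers)
            self._store_cookies(url, response)
            if response.status not in REDIRECT_CODES or "Location" not in response.headers:
                return url, response
            # Relative Location values are resolved against the current URL
            url = urljoin(url, response.headers["Location"])
            # Browsers switch to GET after 303, and after 301/302 in response to a POST
            if response.status == 303 or (response.status in (301, 302) and method == "POST"):
                method = "GET"
        raise RuntimeError("too many redirects")


def fake_server(method, url, headers):
    # Hypothetical site: /login sets a session cookie and redirects to /home
    if url == "http://example.com/login":
        return Response(302, {"Location": "/home", "Set-Cookie": "session=abc123; Path=/"})
    if url == "http://example.com/home":
        if headers.get("Cookie") == "session=abc123":
            return Response(200, {}, "welcome back")
        return Response(403, {}, "no session")
    return Response(404)


browser = BrowserState(fake_server)
final_url, resp = browser.process("http://example.com/login")
print(final_url, resp.status, resp.body)  # http://example.com/home 200 welcome back
```

Injecting the transport keeps the redirect and cookie logic separate from the HTTP layer, which also makes the state engine easy to test without a live server.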

More Information

Documentation, examples, and official downloads of this project sit on the Barebones CMS website:

http://barebonescms.com/documentation/ultimate_web_scraper_toolkit/
