Main Console Data Elements and Operations
The "rescan" button causes Heritrix to scan the file system for any changes to the "jobs" directory. The display is then synchronized with the file system.
The "create" button allows you to enter the name of a new crawl job and create it. The crawl job will be based on the defaults profile.
The "add" button allows you to specify a job directory that is not currently managed by Heritrix. After you enter the path to the new job directory and click "add," Heritrix will allow you to manage the directory. For example, you will then be able to configure the job using the crawler-beans.cxml file.
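The three buttons above can also be driven without the web UI. The following is a minimal sketch, assuming the Heritrix 3 REST API accepts form-encoded POST requests at the engine URL with the fields `action`, `createpath`, and `addpath`. Those field names, the default URL, and the helper `console_action_body` are assumptions for illustration and should be checked against the API documentation for your Heritrix version. The sketch only builds the request body; it does not contact a server.

```python
from urllib.parse import urlencode

# Assumed default engine endpoint; host, port, and credentials depend on your setup.
ENGINE_URL = "https://localhost:8443/engine"


def console_action_body(action, path=None):
    """Build the form-encoded POST body for a Main Console action.

    Hypothetical helper: the field names below are assumed to match
    the Heritrix 3 REST API and are not taken from this page.
    """
    if action == "rescan":
        # Rescan takes no extra parameters.
        fields = {"action": "rescan"}
    elif action == "create":
        if not path:
            raise ValueError("create requires a job name")
        fields = {"action": "create", "createpath": path}
    elif action == "add":
        if not path:
            raise ValueError("add requires a job directory path")
        fields = {"action": "add", "addpath": path}
    else:
        raise ValueError("unsupported action: %r" % (action,))
    return urlencode(fields)


if __name__ == "__main__":
    print(console_action_body("rescan"))
    print(console_action_body("create", "myjob"))
    print(console_action_body("add", "/data/jobs/myjob"))
```

To send such a body, an HTTP client would POST it to `ENGINE_URL` using the operator credentials configured when Heritrix was started. Because a Heritrix instance commonly uses a self-signed certificate, the client may also need to be told to trust it.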
The Main Console page displays the status of each running job, the number of times each job has been launched, and the path to its configuration file. It also shows whether a job is a profile, along with Heritrix memory statistics.
As of Heritrix 3.1, an "Exit Java Process" button is provided. When clicked with the "I'm sure" checkbox selected, it shuts down the Heritrix software and exits its Java process.