
HOWTO Ship a Heritrix Release

Alex Osborne edited this page Dec 20, 2024 · 12 revisions

The Heritrix issue tracker is at: https://github.com/internetarchive/heritrix3/issues

The GitHub project is at: https://github.com/internetarchive/heritrix3

The project homepage is at: http://crawler.archive.org

The Docker Hub images are at: https://hub.docker.com/r/iipc/heritrix

And of course, this wiki's entry page is: https://github.com/internetarchive/heritrix3/wiki

In a release number X.Y.Z, X is the 'major' release number, Y is the 'minor' release number, and Z is the 'micro' release number. Interim releases may also carry an additional -SUFFIX. For more details, see Version Numbering.
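As a rough illustration of that numbering scheme, the sketch below splits a version string into its parts using plain POSIX parameter expansion. The function name parse_version is hypothetical and not part of Heritrix or its build; it also assumes the suffix is everything after the first hyphen.

```shell
# Hypothetical helper: split X.Y.Z[-SUFFIX] into its components.
parse_version() {
  v="$1"
  core="${v%%-*}"              # part before the first '-' (X.Y.Z)
  case "$v" in
    *-*) suffix="${v#*-}" ;;   # part after the first '-', if any
    *)   suffix="" ;;
  esac
  major="${core%%.*}"          # X
  rest="${core#*.}"            # Y.Z
  minor="${rest%%.*}"          # Y
  micro="${rest#*.}"           # Z
  echo "major=$major minor=$minor micro=$micro suffix=$suffix"
}
```

For example, parse_version 3.4.0-20190205 would report major 3, minor 4, micro 0 and the suffix 20190205.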

Getting Started

Before any release, verify that:

  • all tracked issues targeted for that release are resolved or rescheduled for a later release
  • the continuous build box builds successfully, and all automated unit tests pass on both a local developer box and the build box
  • the lead developer agrees the code is ready for release and has reviewed recent commit logs for areas of concern
  • committers have known that a release is upcoming for a reasonable period (days for micro releases; a week+ for minor releases) and have refrained from making destabilizing changes

(For 'minor' and 'major' releases, other production-scale test crawling should have already occurred, and an announced 'code freeze' on the relevant trunk may have been in effect for a week or more.)

Using a previous Release Notes wiki page as a template, create a skeleton Release Notes wiki page for the planned version. Where the release date is declared, leave a 'planned' or 'TK'/'TBD' ('to come' or 'to be determined') notation.

Add notes there on significant changes that anyone upgrading should be aware of, with links to other wiki pages or JIRA issues with more info.

Use the dynamic-inclusion links to pull in a live copy of the 'release notes' issue list from JIRA.

Add acknowledgement of any new or outside contributors to this release.

Roll to Official Release Version Numbers

Make a commit to the trunk that sets the official release version number and links the in-distribution 'release notes' to the full wiki release notes.

'Smoke Test'

Verify all expected artifacts (.tar.gz, .zip, -src.tar.gz, -src.zip) were created and have their official distribution names.

Download these each to a remote directory and confirm they expand without error and create expected directory trees.
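One way to script the "expand without error" check is sketched below. The helper name check_archives is hypothetical, and the sketch assumes tar and unzip are installed; it only tests archive integrity and does not inspect the resulting directory trees, which still need a manual look.

```shell
# Hypothetical helper: test every .tar.gz and .zip in a directory.
# Returns non-zero if any archive fails its integrity check.
check_archives() {
  dir="$1"
  status=0
  for f in "$dir"/*.tar.gz; do
    [ -e "$f" ] || continue            # glob matched nothing
    if tar -tzf "$f" >/dev/null 2>&1; then
      echo "ok: $f"
    else
      echo "FAILED: $f"
      status=1
    fi
  done
  for f in "$dir"/*.zip; do
    [ -e "$f" ] || continue
    if unzip -tq "$f" >/dev/null 2>&1; then
      echo "ok: $f"
    else
      echo "FAILED: $f"
      status=1
    fi
  done
  return "$status"
}
```

Run it against the download directory, for example check_archives ./downloads, and investigate any line marked FAILED before continuing.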

For at least the .tar.gz, launch the crawler with a webui. Connect to the webui and verify visible version identifiers are as expected.

Using the default profile, configure a minimal test crawl of a several-pages site (>1 page, <100). Launch crawl and verify expected output in crawl.log and normal termination of crawl when finished.

Release to Maven Central

The main project POM has an ossrh build profile, intended to be used to submit Maven artefacts to Maven Central, as per the OSSRH Guide.

To use it, you'll need an OSSRH account, and you'll need to request access: ask a current user who has the rights to push to org.archive to comment here with a request to add you to the account.

Then, you'll need to add your username and password to your Maven ~/.m2/settings.xml file, using the sonatype-nexus-staging and sonatype-nexus-snapshots IDs, like this:

  <servers>
    <server>
      <id>sonatype-nexus-snapshots</id>
      <username>anjackson</username>
      <password>********</password>
    </server>
    <server>
      <id>sonatype-nexus-staging</id>
      <username>anjackson</username>
      <password>********</password>
    </server>
  </servers>

and set up GPG as outlined in the OSSRH guide.

Then, you should be able to deploy snapshots with

mvn -Possrh clean deploy

and for releases:

mvn -Possrh release:clean release:prepare
mvn -Possrh release:perform

Note that GPG signing may fail unless you set the GPG_TTY environment variable (export GPG_TTY=$(tty)); see this for more details.

If there is a problem, you can try mvn release:rollback, but sometimes you'll also have to delete the local tag (if it has been created) or reset your git repository.
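As a sketch of the manual cleanup, the function below deletes a local tag only if it exists. The name drop_local_tag is hypothetical, and the tag name is passed in by hand because this page does not state how the release tag is named.

```shell
# Hypothetical helper: delete a local git tag if present.
drop_local_tag() {
  tag="$1"
  if git rev-parse -q --verify "refs/tags/$tag" >/dev/null; then
    git tag -d "$tag"
  else
    echo "no local tag named $tag"
  fi
}
```

This only touches the local repository. Resetting the working copy (for example with git reset --hard to a known-good commit) is destructive, so check git status and git log before doing so.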

Update the Change Log

To get the change log right, we need to update it after the release so that the changes are associated with the new release tag.

We can update the change log via github-changelog-generator. You'll need a suitable token, then you can use:

export CHANGELOG_GITHUB_TOKEN="«your-40-digit-github-token»"
github_changelog_generator -u internetarchive -p heritrix3 --release-branch master

Then commit the updated CHANGELOG.md to the master branch.

Create a release on GitHub

Go to https://github.com/internetarchive/heritrix3/releases and create a release from the release tag. Add a brief summary and include links to the dist TAR and ZIP files hosted on Maven Central (see e.g. https://oss.sonatype.org/content/repositories/releases/org/archive/heritrix/heritrix/3.4.0-20190205/).

Announce

Update the wiki release notes with the actual release date.

Update the project wiki front page to list the new release as the latest, and adjust other wording about upcoming releases accordingly.

Send email to the archive-crawler project list announcing the release, with links to the release notes and download area.

Commit a change to the 'xdocs/index.xml' file in heritrix trunk which auto-generates the http://crawler.archive.org home page, to include a news item in the appropriate place announcing the latest release. (BROKEN NEEDS FIXING: Currently the auto-builds are not uploading the changed website automatically to crawler.archive.org.)

Add a Docker Hub image

See Docker about building current images.

Build images for the current release number, tag them as <user>/heritrix[:<label>], and push them to Docker Hub. Here <user> is iipc, and the optional <label> is made up of the release number, contrib for contribution builds, and jre for Java JRE builds.
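The tag naming can be sketched as a small helper. The function name image_ref is hypothetical, and joining the release number and the variant with a hyphen is an assumption, since the exact label layout is not spelled out here; check the existing tags on Docker Hub before pushing.

```shell
# Hypothetical helper: compose <user>/heritrix:<label>.
# Assumption: the variant (contrib, jre) is appended to the release
# number with a hyphen.
image_ref() {
  user="$1"
  release="$2"
  variant="${3:-}"
  if [ -n "$variant" ]; then
    echo "${user}/heritrix:${release}-${variant}"
  else
    echo "${user}/heritrix:${release}"
  fi
}
```

The resulting name can then be passed to docker build -t and docker push.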
