
Features List

Shana Moore edited this page Jan 19, 2023 · 9 revisions

Ingest & metadata:

| Feature | Description |
| --- | --- |
| IiifPrint content models | Includes Hyrax work types for IiifPrintTitle (publication), IiifPrintContainer (microform reel or bound volume), IiifPrintIssue, IiifPrintPage, and IiifPrintArticle. See Data Model for more info. |
| IiifPrint's metadata | In addition to Hyrax's BasicMetadata module, each model includes content-type-specific metadata fields for rich description. See Metadata Profile for more info. |
| NDNP batch ingest | Batch ingest of files conforming to NDNP digitization specs via a command-line rake task. See the NDNP Batch Ingest Guide for more info. |
| PDF ingest | IiifPrint supports single-item ingest (via the Hyrax UI) or batch ingest (via a command-line rake task) of issue-level PDF files. PDFs are split into component pages and OCR'd, and derivative files are created for each page. See the PDF Batch Ingest Guide for more info. |
| TIFF image ingest | Batch ingest of page-level TIFF or JP2 files (that may not have existing ALTO or OCR files) via a command-line rake task. See the TIFF or JP2 Batch Ingest Guide for more info. |
| PDF derivative generation | A PDF for each page will be created during batch ingest. An issue-level PDF will be compiled during NDNP or TIFF/JP2 page-level batch ingest. |
| JPEG2000 derivative generation | A JPEG2000 file for each page will be created during batch ingest (if one does not already exist) to support deep zooming viewer functionality. |
| TIFF derivative generation | A TIFF master file for each page will be created during batch ingest (if one does not already exist) to support long-term preservation. |
| OCR text generation | Page images will be OCR'd and a plain TXT file for each page will be created during batch ingest (if one does not already exist) to support full-text searching. |
| ALTO 2.0 XML generation | Page images will be analyzed for word coordinates within the image and an ALTO XML file for each page will be created during batch ingest (if one does not already exist) to support search term highlighting on page images. |
| Word-coordinate JSON generation | ALTO XML word coordinate data for each page will be extracted to a JSON file during batch ingest to support search term highlighting on page images and IIIF Content Search. |
| Automated metadata generation | The batch ingest process uses folder and file names to determine the LCCN and issue dates, and extracts publication-level metadata from Library of Congress APIs to populate the metadata for objects. Any metadata present in NDNP XML manifest files will also be added. |
| Controlled vocabulary for article genres | A controlled vocabulary of article types (advertisements, obituaries, editorials, etc.) is added to the application during installation (see config/authorities/iiif_print_article_genres.yml) to support filtering search results by article type. |
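
As a rough illustration of how a controlled vocabulary of article genres can drive filtering, the sketch below loads a small YAML vocabulary and selects records by genre. Everything in it is an assumption made for illustration: the `terms`/`id`/`term` layout, the sample entries, and the article records are hypothetical, and the real config/authorities/iiif_print_article_genres.yml may differ.

```ruby
require "yaml"

# Assumed sample vocabulary; not copied from the gem's actual YAML file.
SAMPLE_AUTHORITY = <<~YAML
  terms:
    - id: advertisement
      term: Advertisement
    - id: obituary
      term: Obituary
    - id: editorial
      term: Editorial
YAML

# Map each genre id to its display label.
genres = YAML.safe_load(SAMPLE_AUTHORITY).fetch("terms")
labels_by_id = genres.to_h { |entry| [entry["id"], entry["term"]] }

# Hypothetical article records, each tagged with a genre id.
articles = [
  { title: "Notice of sale", genre: "advertisement" },
  { title: "In memoriam", genre: "obituary" }
]

# Keep only the articles whose genre id matches the requested one.
obituaries = articles.select { |article| article[:genre] == "obituary" }

puts labels_by_id["obituary"]
puts obituaries.map { |article| article[:title] }.inspect
```

Keying records on a stable id rather than on the display label means a label can be reworded without touching the records that use it.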

User experience:

| Feature | Description |
| --- | --- |
| Title <-> Issue <-> Page navigation | Hierarchical navigation to allow users to easily navigate between iiif_print pages and their parent issues and publications. |
| Calendar-view issue browse | Browse issues for a publication by date. |
| Chronicling America-style semantic URLs | Semantic URLs allow access to titles, issues, and pages via a predictable URL pattern based on LCCN, date, edition, and page number (e.g. http://IiifPrint.example.org/iiif_prints/sn85025202/1857-02-14/ed-1/seq-1) to facilitate bookmarking and sharing. This functionality mimics a popular [Chronicling America](https://chroniclingamerica.loc.gov/about/api/#link) feature. |
| Download OCR text as TXT, ALTO XML, or word-coordinate JSON | Text can be downloaded at the page level in a variety of formats. |
| Download iiif_print issue as PDF | Download an issue-level PDF. |
| Download iiif_print page image as PDF or JP2 | Download page images in a variety of formats. |
| Download iiif_print article images as PDF or JP2 | Download article page images in a variety of formats. |
| Display front page of each iiif_print within iiif_print title | This feature allows users to view all front pages for an iiif_print title to quickly locate items of interest. |
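
The semantic URL pattern described above can be sketched in plain Ruby. This is a minimal illustration, not the gem's routing code: the `SemanticUrl` module and its method names are hypothetical, and the path layout is inferred only from the example URL in the table.

```ruby
require "date"

# Hypothetical helper; illustrates the URL pattern only, not IiifPrint's routing.
module SemanticUrl
  PATTERN = %r{\A/iiif_prints/(?<lccn>[^/]+)/(?<date>\d{4}-\d{2}-\d{2})/ed-(?<edition>\d+)/seq-(?<page>\d+)\z}

  # Build a page-level path from its components.
  def self.build(lccn:, date:, edition: 1, page: 1)
    "/iiif_prints/#{lccn}/#{date.strftime('%Y-%m-%d')}/ed-#{edition}/seq-#{page}"
  end

  # Parse a page-level path back into its components; returns nil on no match.
  def self.parse(path)
    match = PATTERN.match(path)
    return nil unless match

    {
      lccn: match[:lccn],
      date: Date.strptime(match[:date], "%Y-%m-%d"),
      edition: match[:edition].to_i,
      page: match[:page].to_i
    }
  end
end

path = SemanticUrl.build(lccn: "sn85025202", date: Date.new(1857, 2, 14))
puts path
# prints "/iiif_prints/sn85025202/1857-02-14/ed-1/seq-1"
```

Because every component is recoverable from the path, a bookmarked link identifies the same page without needing an internal record ID.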

Search Features

| Feature | Description |
| --- | --- |
| Full-text (OCR) search | OCR text is automatically extracted and indexed to support full-text searching. |
| Advanced search | An advanced search page is available at /iiif_prints_search, which allows users to search only within full-text iiif_print content in your repository. The advanced search form also allows users to limit searches by date range, language, or iiif_print title. |
| Search within iiif_print title | Each publication has its own page in the repository, which includes a search form to support searching only issues and pages from that publication. |
| Search within iiif_print issue | The issue detail page includes a viewer that supports searching within the issue. |
| Filter search results by iiif_print title | Filters search results to pages from a specific publication. |
| Filter search results by language | Filters search results by language. |
| Filter search results by article type | Filters search results by article type (advertisements, obituaries, editorials, etc.). |
| Filter search results by place of publication | Filters search results to pages from publications from a specific geographic area. |
| Limit search results to front pages | Users can limit search results to front pages using the search form on the publication show page. |
| Highlight search terms on page image | Keyword highlights are indicated on the page thumbnail in search results. |
| Highlight keyword text matches in search results | Search results include OCR text snippets with highlighted keyword matches. |
| Sort search results by publication date | Sorts search results by the issue publication date. |

Viewer/IIIF Features

The features below require enabling an IIIF image viewer in your application's config/initializers/hyrax.rb. IiifManifest v2 and v3 are supported.
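
For orientation, a stock Hyrax initializer usually exposes this through the `iiif_image_server` setting. The fragment below is a sketch under that assumption; confirm the option name against the initializer generated for your Hyrax version, since this page does not name it.

```ruby
# config/initializers/hyrax.rb
Hyrax.config do |config|
  # Assumption: Hyrax's standard switch for turning on the IIIF image viewer.
  config.iiif_image_server = true
end
```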

| Feature | Description |
| --- | --- |
| Deep zooming viewer / page turner | Issues, pages, and articles are displayed in a viewer that provides zooming and page-turning functionality. |
| Full text search | The viewer includes a search box to search the full text of the item. |
| Search term highlighting in viewer | Keyword search matches are highlighted by default when the show page is loaded. |
| IIIF Presentation API manifest for iiif_print issues, pages, and articles | A JSON manifest conforming to the IIIF Presentation API spec can be accessed by adding /manifest to the show page URL for issues, articles, and pages. |
| IIIF Content Search API | An API endpoint conforming to the IIIF Content Search API spec can be accessed by adding /iiif_search to the show page URL for issues, articles, and pages. |
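
As a small illustration of the two URL conventions above, the following sketch derives the manifest and content-search URLs from a show page URL. The helper name and the example URL are hypothetical; only the /manifest and /iiif_search suffixes come from the table.

```ruby
require "uri"

# Hypothetical helper: appends an IIIF endpoint suffix to a show page URL.
def iiif_endpoint(show_url, suffix)
  uri = URI.parse(show_url)
  # Drop any trailing slash so the result never contains "//".
  uri.path = "#{uri.path.chomp('/')}/#{suffix}"
  uri.to_s
end

# Assumed show page URL, for illustration only.
show_url = "https://repo.example.org/concern/iiif_print_issues/abc123"

manifest_url = iiif_endpoint(show_url, "manifest")
search_url = iiif_endpoint(show_url, "iiif_search")

puts manifest_url
puts search_url
```

The IIIF Content Search spec defines a `q` query parameter for search terms, so a client would typically append something like `?q=term` to the search URL.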

Configurability

The following features are configurable:

DERIVATIVES


By default, the gem will always create four extra derivatives (TIFF, JP2, PDF, and Text Extraction) for all work types.

To configure which derivatives are created, include IiifPrint.model_configuration in the application's model:

class Book < ActiveFedora::Base
  include IiifPrint.model_configuration(
    derivative_service_plugins: [
      IiifPrint::TIFFDerivativeService
    ]
  )
end

In the example above, the gem creates only the TIFF derivative.

class Book < ActiveFedora::Base
  include IiifPrint.model_configuration(
    derivative_service_plugins: [
      IiifPrint::JP2DerivativeService,
      IiifPrint::PDFDerivativeService,
      IiifPrint::TextExtractionDerivativeService
    ]
  )
end

In the example above, the gem creates the JP2, PDF, and Text Extraction derivatives.

class Book < ActiveFedora::Base
  include IiifPrint.model_configuration
end

In the example above, the gem falls back on all four default derivative services. The same is true if the module is not included.

NOTE: In addition to the specified derivative services, the default Hyrax::FileSetDerivativesService will still run, since it is responsible for things like thumbnail creation.
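
The fallback behaviour described above (explicit plugins, defaults when none are given, defaults when the module is never included) can be modelled in plain Ruby with a method that returns a configured module. This is only a sketch of the general pattern, not the gem's implementation: the `Sketch` namespace, the symbol plugin names, and `plugins_for` are all hypothetical.

```ruby
# Hypothetical stand-in for the gem; every name under Sketch is illustrative.
module Sketch
  DEFAULT_PLUGINS = %i[tiff jp2 pdf text_extraction].freeze

  # Returns a fresh module that, when included, records the chosen plugins
  # on the including class.
  def self.model_configuration(derivative_service_plugins: DEFAULT_PLUGINS)
    plugins = derivative_service_plugins
    Module.new do
      define_singleton_method(:included) do |base|
        base.define_singleton_method(:derivative_plugins) { plugins }
      end
    end
  end

  # Classes that never included the module also fall back to the defaults.
  def self.plugins_for(model)
    model.respond_to?(:derivative_plugins) ? model.derivative_plugins : DEFAULT_PLUGINS
  end
end

class TiffOnlyBook
  include Sketch.model_configuration(derivative_service_plugins: [:tiff])
end

class DefaultBook
  include Sketch.model_configuration
end

class PlainBook; end

p Sketch.plugins_for(TiffOnlyBook) # explicit list
p Sketch.plugins_for(DefaultBook)  # defaults, module included without arguments
p Sketch.plugins_for(PlainBook)    # defaults, module never included
```

Returning a module from a method keeps the configuration next to the model definition and lets each model carry its own settings without global state.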

PDF Splitting

Splits PDFs of configurable work types into child works. By default...

At the model level, include the configuration as follows:

include IiifPrint.model_configuration(pdf_split_child_model: GenericWork)

Metadata Properties for display in the UV
