This is work in progress!
XProc is a versatile language for describing XML-based processing pipelines. XML Calabash, together with its embedded Saxon processor, provides a stable and feature-rich implementation that allows developers to express document processing logic in a platform-independent way using XML-based technologies.
Idrovora provides an execution context for XProc pipelines, where
- pipelines are run with input read from and results written to the filesystem following a common directory layout,
- the execution of pipelines can be triggered via requests to an embedded HTTP server or via filesystem events (hot folders),
- pipelines are run concurrently and asynchronously by a daemon process, thereby not incurring JVM-related startup costs for each pipeline execution.
It is designed as a backend service for systems that depend on XML-based processing logic and therefore have to incorporate the Java runtime environment required by the aforementioned tools, but want to do so behind a minimal interface based on HTTP and the local filesystem.
In order to build, test and run Idrovora, you need a Java runtime, the Clojure CLI tools and GNU Make (see the commands below).
Alternatively, you can run Idrovora within a Docker container, in which case nothing is needed except a Docker installation.
$ make
$ make test
$ clojure -m idrovora.cli --help
Idrovora - A pump station for your XProc pipelines
Copyright (C) 2020 Gregor Middell
See <https://github.com/gremid/idrovora> for more information.
Usage: clojure -m idrovora.cli [OPTION]...
Options:
-x, --xpl-dir $IDROVORA_XPL_DIR workspace/xpl source directory with XProc pipeline definitions
-j, --job-dir $IDROVORA_JOB_DIR workspace/jobs spool directory for pipeline jobs
-p, --port $IDROVORA_HTTP_PORT 3000 HTTP port for embedded server
-c, --cleanup $IDROVORA_CLEANUP_SCHEDULE 0 1 0 * * ? Schedule for periodic cleanup of old jobs (cron expression)
-a, --job-max-age $IDROVORA_JOB_MAX_AGE PT168H Maximum age of jobs; older jobs are removed periodically
-h, --help
[…]
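Two of these defaults are worth unpacking. The cleanup schedule is a Quartz-style cron expression (the trailing `?` suggests Quartz; its fields are second, minute, hour, day of month, month, day of week), so "0 1 0 * * ?" fires daily at 00:01. The maximum job age is an ISO-8601 duration; PT168H amounts to seven days, as a quick Python check illustrates:

```python
from datetime import timedelta

# The default --job-max-age, PT168H, is an ISO-8601 duration of 168 hours.
job_max_age = timedelta(hours=168)
print(job_max_age.days)  # → 7
```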
To start Idrovora with its defaults, creating a workspace/ in the current directory and running an HTTP server on port 3000, simply run it without any arguments:
$ clojure -m idrovora.cli
2020-03-09 14:52:51 [main | INFO | idrovora.cli ] Starting Idrovora
2020-03-09 14:52:51 [main | INFO | idrovora.http ] Starting HTTP server at 3000/tcp
2020-03-09 14:52:51 [main | INFO | idrovora.workspace ] Start watching 'workspace/jobs'
2020-03-09 14:52:51 [main | INFO | idrovora.workspace ] Scheduling cleanup of old jobs (0 1 0 * * ?)
Pipelines can be defined in workspace/xpl/, for instance a pipeline named test in workspace/xpl/test.xpl:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
version="1.0">
<p:option name="source-dir" required="true"/>
<p:option name="result-dir" required="true"/>
<p:load name="read-from-input">
<p:with-option name="href" select="concat($source-dir,'document.xml')"/>
</p:load>
<p:identity/>
<p:store name="store-to-output">
<p:with-option name="href" select="concat($result-dir,'document.xml')"/>
</p:store>
</p:declare-step>
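In effect, this example pipeline just loads document.xml from the source directory and stores it unchanged in the result directory. A rough Python equivalent, purely for illustration (the function name is made up here; Idrovora itself runs the XProc version via XML Calabash):

```python
import shutil
from pathlib import Path

def run_identity_pipeline(source_dir: str, result_dir: str) -> None:
    """Illustrative stand-in for the XProc pipeline above: read
    source-dir/document.xml and store it unchanged as
    result-dir/document.xml."""
    src = Path(source_dir) / "document.xml"
    dst = Path(result_dir) / "document.xml"
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
```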
Jobs for this pipeline can then be created in sub-directories under workspace/jobs/test/, e.g.
workspace/jobs/
└── test
└── 82d11012-cf02-4ec0-b3c6-f9fe004de7b0
├── result
│ └── document.xml
├── source
│ └── document.xml
└── status
├── job-failed
├── result-ready
└── source-ready
In a job directory, here one named with the UUID 82d11012-cf02-4ec0-b3c6-f9fe004de7b0, the sub-directories source/ and result/ hold input and output data for the job; they are passed to the pipeline as URIs via the options source-dir and result-dir. Files in the status/ sub-directory are used for controlling job execution and signaling job completion:
- Touching status/source-ready signals to Idrovora that the sources have been written to source/ and the job can be scheduled for execution.
- Upon creation/modification of status/result-ready, a listening process can assume that the pipeline executed successfully and that results can be picked up from result/.
- The creation/modification of status/job-failed signals that an error occurred while the job was run through the pipeline.
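Putting the directory layout and the status protocol together, a client could drive Idrovora purely through the filesystem along these lines (a sketch under the conventions described above; the helper names and the polling strategy are assumptions, not part of Idrovora):

```python
import time
import uuid
from pathlib import Path

def submit_job(jobs_dir: str, pipeline: str, sources: dict) -> Path:
    """Create a fresh job directory for the given pipeline, write the
    source documents, and touch status/source-ready to hand the job
    over to Idrovora. Returns the job directory."""
    job = Path(jobs_dir) / pipeline / str(uuid.uuid4())
    (job / "source").mkdir(parents=True)
    (job / "status").mkdir()
    for name, content in sources.items():
        (job / "source" / name).write_text(content)
    (job / "status" / "source-ready").touch()
    return job

def await_result(job: Path, timeout: float = 60.0, poll: float = 0.5) -> Path:
    """Poll the status/ directory until completion is signaled; return
    the result/ directory, or raise on failure or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if (job / "status" / "job-failed").exists():
            raise RuntimeError(f"pipeline failed for job {job.name}")
        if (job / "status" / "result-ready").exists():
            return job / "result"
        time.sleep(poll)
    raise TimeoutError(f"no result for job {job.name} within {timeout}s")
```

In production one would more likely watch the status/ directory for filesystem events instead of polling, mirroring what Idrovora itself does with its hot folders.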
Features:
- XProc engine based on XML Calabash
  - Call XProc pipelines with job context (source/result dirs) given as pipeline options
  - Improved error reporting
- Filesystem watcher, notifying the engine when jobs are ready to be run
- HTTP interface
  - Embedded HTTP server
  - Allow jobs to be triggered via POST requests
  - Synchronously respond to job trigger requests when jobs are completed
- User interface
  - Basic command line interface
  - Provide usage information/help
  - Make HTTP port configurable
- Provide runtime metrics
Copyright © 2020 Gregor Middell.
This project is licensed under the GNU General Public License v3.0.