A collection of java applications that process structured text inputs.

havrikov/text-processing-java-projects

Text Processing Java Projects

This is a collection of drivers for Java projects that process structured text inputs.
The projects are built with JaCoCo instrumentation so that each run can report code coverage and thrown exceptions.

Build Instructions

You need Java 1.8 or later.
To build all projects, run ./gradlew build (or .\gradlew.bat build on Windows) in the project root directory.
This generates instrumented, executable jars in the build/libs directory.

There is also the command ./gradlew gatherOriginals, which will download the original, uninstrumented versions of the projects' artifacts into build/originals. These are required for producing coverage reports.

Running the Projects

After being built, every project can be invoked like any normal runnable jar.
To get more information, you can call a project with the --help argument, e.g.:

java -jar build/libs/argo-subject.jar --help

For a more interesting example, if you have some inputs located in ~/tmp/json, you can run the Argo JSON parser on them with the following command:

java -jar build/libs/argo-subject.jar \
--ignore-exceptions \
--log-exceptions argo.exceptions.json \
--report-coverage argo.coverage.csv \
--original-bytecode build/originals/argo-5.16.jar \
~/tmp/json

This runs the parser on every input in ~/tmp/json, logs all thrown exceptions to argo.exceptions.json, and writes a coverage report to argo.coverage.csv.
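
To run several subjects over the same inputs, the command above can be assembled programmatically. The following is a minimal sketch, not part of this repository: the class SubjectCommand and its build method are hypothetical, and the jar naming pattern build/libs/<name>-subject.jar is an assumption extrapolated from argo-subject.jar. Only the flags shown in the example above are used.

```java
import java.util.ArrayList;
import java.util.List;

public class SubjectCommand {

    /**
     * Builds the argument list for one subject run, mirroring the flags
     * from the example above. The "<name>-subject.jar" pattern is an
     * assumption based on argo-subject.jar.
     */
    static List<String> build(String subject, String originalJar, String inputDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add("build/libs/" + subject + "-subject.jar");
        cmd.add("--ignore-exceptions");
        cmd.add("--log-exceptions");
        cmd.add(subject + ".exceptions.json");
        cmd.add("--report-coverage");
        cmd.add(subject + ".coverage.csv");
        cmd.add("--original-bytecode");
        cmd.add(originalJar);
        cmd.add(inputDir);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: SubjectCommand <subject> <original-jar> <input-dir>");
            return;
        }
        List<String> cmd = build(args[0], args[1], args[2]);
        System.out.println(String.join(" ", cmd));
        // inheritIO forwards the subject's stdout and stderr to this process.
        int exitCode = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.exit(exitCode);
    }
}
```

Note that ProcessBuilder does not go through a shell, so a path such as ~/tmp/json is not expanded; pass an absolute path instead.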

Turning off the Instrumentation

You can set the de.cispa.se.subjects.instrument property to false to build the subjects without JaCoCo instrumentation. The resulting jars are still runnable, but the reported coverage will be zero.

Repository Structure

This repository is organized as a Gradle multi-project build in which each subdirectory contains the driver for one project, with a few notable exceptions:

.
├── argo      <-- driver for project argo
├── autolink  <-- driver for project autolink
├── ...       <-- more project drivers...
├── build     <-- the output directory where the built projects end up
├── buildSrc  <-- single source of truth for dependency and project versions
├── gradle    <-- gradle wrapper, so you don't have to install a build tool
└── utils     <-- this contains the entry point, command line processing, and coverage and exception reporting; it is used in all drivers
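
The split between the per-project drivers and the shared utils module can be pictured roughly as follows. This is only an illustrative sketch under stated assumptions: the names DriverSketch, Subject, and runAll are hypothetical and do not reflect the actual classes in utils, and the toy integer parser stands in for a real project such as argo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DriverSketch {

    /** Hypothetical contract that a per-project driver could implement. */
    interface Subject {
        void process(String input) throws Exception;
    }

    /**
     * Hypothetical shared runner, standing in for the role of utils:
     * feeds every input to the subject and records the thrown exceptions.
     * When ignoreExceptions is false, the first exception aborts the run.
     */
    static List<String> runAll(Subject subject, List<String> inputs,
                               boolean ignoreExceptions) throws Exception {
        List<String> thrown = new ArrayList<>();
        for (String input : inputs) {
            try {
                subject.process(input);
            } catch (Exception e) {
                thrown.add(e.getClass().getName());
                if (!ignoreExceptions) {
                    throw e;
                }
            }
        }
        return thrown;
    }

    public static void main(String[] args) throws Exception {
        // Toy subject that accepts only decimal integers.
        Subject intParser = input -> Integer.parseInt(input.trim());
        List<String> thrown = runAll(intParser, Arrays.asList("1", "x", "42"), true);
        System.out.println(thrown); // prints [java.lang.NumberFormatException]
    }
}
```

With this shape, adding a new project would only require a new Subject implementation, while argument handling and reporting stay in one place.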

Projects

The following projects are currently supported:

JSON

| Project | Version | Instrumented Package |
|---|---|---|
| argo | 5.16 | argo |
| fastjson | 1.2.76 | com.alibaba.fastjson |
| genson | 1.6 | com.owlike.genson |
| gson | 2.8.6 | com.google.gson |
| jackson-databind | 2.12.2 | com.fasterxml.jackson |
| json-flattener | 0.12.0 | com.github.wnameless.json |
| json-java | 20210307 | org.json |
| json-simple-cliftonlabs | 3.1.1 | com.github.cliftonlabs.json_simple |
| json-simple | 1.1.1 | org.json.simple |
| json2flat | 1.0.3 | com.github.opendevl |
| minimal-json | 0.9.5 | com.eclipsesource.json |
| pojo | 1.1.0 | org.jsonschema2pojo |
URL

| Project | Version | Instrumented Package |
|---|---|---|
| autolink | 0.10.0 | org.nibor.autolink |
| galimatias-nu | 0.1.3 | io.mola.galimatias |
| galimatias | 0.2.1 | io.mola.galimatias |
| jurl | v0.4.2 | com.anthonynsimon.url |
| url-detector | 0.1.17 | com.linkedin.urls.detection |

Markdown

| Project | Version | Instrumented Package |
|---|---|---|
| commonmark | 0.17.0 | org.commonmark |
| flexmark | 0.34.48 | com.vladsch.flexmark |
| markdown-papers | 1.4.4 | org.tautua.markdownpapers |
| markdown4j | 2.2-cj-1.1 | org.markdown4j |
| markdownj | 0.4 | org.markdownj |
| txtmark | 0.13 | com.github.rjeschke.txtmark |

CSV

| Project | Version | Instrumented Package |
|---|---|---|
| commons-csv | 1.8 | org.apache.commons.csv |
| jackson-dataformat-csv | 2.12.2 | com.fasterxml.jackson.dataformat |
| jcsv | 1.4.0 | com.googlecode.jcsv |
| sfm-csv | 8.2.3 | org.simpleflatmapper.csv |
| simplecsv | 2.1 | net.quux00.simplecsv |
| super-csv | 2.4.0 | org.supercsv |

JavaScript

| Project | Version | Instrumented Package | Notes |
|---|---|---|---|
| closure | v20210302 | com.google.javascript.jscomp | |
| nashorn-sandbox | 0.2.0 | delight.nashornsandbox | Delegates to Nashorn (Rhino's successor) |
| rhino-sandbox | 0.0.15 | delight.rhinosandox | Not a typo. Delegates to Rhino. |
| rhino | 1.7.13 | org.mozilla.javascript | |

CSS

| Project | Version | Instrumented Package | Notes |
|---|---|---|---|
| batik-css | 1.14 | org.apache.batik.css | |
| css-validator | 1.0.8 | org.w3c.css.css | ⚠️ Currently unsupported because of JaCoCo error: "Method too large: org/w3c/css/parser/analyzer/CssParserTokenManager.jjMoveNfa_0 (II)I" |
| cssparser | 0.9.29 | net.sourceforge.cssparser | ⚠️ Currently unsupported because of JaCoCo error: "Method too large: com/steadystate/css/parser/SACParserCSS21TokenManager.jjMoveNfa_0 (II)I" |
| flute | 1.3 | org.w3c.flute | |
| jstyleparser | 4.0.0 | net.sf.cssbox | |
| ph-css | 6.3.0 | com.helger.css | ⚠️ Currently unsupported because of JaCoCo error: "Method too large: com/helger/css/parser/ParserCSS30TokenManager.jjMoveNfa_0 (II)I" |
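
A likely reading of the "Method too large" errors above: the class file format caps the bytecode of a single method at 65535 bytes, and coverage instrumentation inserts extra probe instructions, so a generated method that is already close to the limit no longer fits once instrumented. The arithmetic can be sketched as below. The class MethodSizeCheck, the probe count, and the bytes-per-probe figure are hypothetical values chosen for illustration, not measurements of JaCoCo or of the affected token managers.

```java
public class MethodSizeCheck {

    /** Maximum code length of a single method allowed by the class file format. */
    static final int MAX_CODE_LENGTH = 65535;

    /**
     * Returns true if a method of originalBytes still fits within the limit
     * after adding probeCount probes of bytesPerProbe bytes each.
     * bytesPerProbe is a hypothetical per-probe overhead.
     */
    static boolean fitsAfterInstrumentation(int originalBytes, int probeCount, int bytesPerProbe) {
        long instrumented = (long) originalBytes + (long) probeCount * bytesPerProbe;
        return instrumented <= MAX_CODE_LENGTH;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a 64000-byte method with 400 probes of 6 bytes each
        // grows to 66400 bytes, which exceeds the limit.
        System.out.println(fitsAfterInstrumentation(64000, 400, 6)); // prints false
        // A small method stays well below the limit.
        System.out.println(fitsAfterInstrumentation(1000, 10, 6));   // prints true
    }
}
```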

INI

Project Version Instrumented Package
fastini 1.2.1 com.github.onlynight.fastini
ini4j 0.5.4 org.ini4j
java-configparser 0.2 ca.szc.configparser

Dot

Project Version Instrumented Package
digraph-parser 1.0 com.paypal.digraph.parser
graphstream 2.0 org.graphstream
graphviz-java 0.18.1 guru.nidi
