Skip to content

JavaScript library for working with Concrete, a data serialization format for NLP

License

Notifications You must be signed in to change notification settings

hltcoe/concrete-js

Repository files navigation

concrete-js

Build status

concrete-js is a Node.js/JavaScript library for working with Concrete, a set of NLP data types defined by a Thrift schema. Thrift makes it easy to use a shared set of data structures across multiple programming languages.

This document is available online alongside an API reference.

Use cases

The concrete-js library is designed for visualization and annotation. While it is possible to implement NLP algorithms in JavaScript, consider using concrete-python or concrete-java instead.

In general, you should use concrete-js when you have a Concrete Communication object that was created by another tool, and need a user interface that visually represents data in the Communication. You may also need the user or annotator to interactively modify Concrete data structures and then save their modifications.

Node.js / JavaScript+jQuery

There are two supported distributions of concrete-js: a Node.js distribution and a JavaScript + jQuery distribution. These two distributions provide similar functionality but are designed to support different approaches to web development. If you use the npm or yarn commands to install third-party libraries, the Node.js distribution will likely fit best in your workflow. If you don't know which distribution you should use, we suggest using the JavaScript + jQuery distribution.

Both distributions provide serialization and deserialization code for Concrete data structures that is generated by Thrift according to the Concrete schema. They also provide utility functions for working with Concrete objects. The JavaScript + jQuery distribution additionally provides visualization code that maps Concrete data structures to DOM elements using jQuery.

The rest of this document describes the Node.js distribution of concrete-js. For documentation of the JavaScript + jQuery distribution, go to the JavaScript + jQuery concrete-js documentation page.

Installation

The Node.js distribution of concrete-js is published as @hltcoe/concrete.

To use concrete-js in your Node.js project, use npm install to add it to your dependencies:

npm install @hltcoe/concrete

Usage: Fetching Communications

The following code provides an example of using concrete-js: Fetching and displaying the first ten Communications from a FetchCommunicationService. To use this example, replace the example URL with the URL to your own FetchCommunicationService:

const concrete = require("@hltcoe/concrete");

/**
 * Returns up to the first ten Communications from a Fetch service.
 *
 * @param fetchClient A Fetch client for the desired Fetch service
 * @param maxNumCommunications The maximum number of Communications to
 *   retrieve
 * @returns A list of maxNumCommunications Communications (or fewer
 *   if the Fetch client does not have that many Communications).
 */
async function headCommunications(fetchClient, maxNumCommunications = 10) {
  const numCommunications = await fetchClient.getCommunicationCount();
  const communicationIds = await fetchClient.getCommunicationIDs(
    0,
    Math.min(maxNumCommunications, numCommunications)
  );
  const request = new concrete.access.FetchRequest({
    communicationIds: communicationIds,
  });
  const result = await fetchClient.fetch(request);
  return result.communications;
}

const client = concrete.util.createXHRClientFromURL(
  "http://localhost:8080/fetch_http_endpoint/",
  concrete.access.FetchCommunicationService,
);
headCommunications(client).then((communications) =>
  communications.forEach((communication) => {
    console.log(`--- ${communication.id}`);
    console.log(communication.text);
    console.log("---");
  })
).catch((error) => console.error(error));

Development

The concrete-js source code can be accessed from GitHub.

Building

You do not need to build concrete-js yourself in order to use it. The Node.js package is available on the npm registry as @hltcoe/concrete, and a working copy of the JavaScript + jQuery library is available in the dist/ directory in the concrete-js repository or here: readable version, minified version. The concrete-js library only needs to be (re)built when the Thrift schema files for Concrete have been updated or when the code in src/ or src_nodejs/ is modified.

Requirements for building concrete-js:

First, install the Node.js packages required for building concrete-js:

npm ci

Then build concrete-js:

npx grunt

This command will build both the JavaScript + jQuery and the Node.js version of the library and then regenerate the documentation:

  1. Build Node.js library
    • call the Thrift compiler to generate Node.js code for the Thrift schema under ~/concrete
    • run JSHint on the code to check for problems
    • combine the Thrift-generated code with the hand-written utility code in dist_nodejs
  2. Build JavaScript + jQuery library
    • call the Thrift compiler to generate JavaScript + jQuery code for the Thrift schema under ~/concrete
    • run JSHint on the code to check for problems
    • combine the Thrift-generated code with the hand-written utility code in dist
    • minify the combined code to reduce download size
    • download a supported version of thrift.js to dist/thrift.js
  3. Generate documentation in docs

Testing

Code linting will be performed automatically when concrete-js is built.

To run tests for both libraries, do:

npm test

Or to run the tests for just the Node.js library or just the JavaScript + jQuery library, do npm run test_nodejs or npm run test_js, respectively.

Publishing a new version to NPM

Read this section carefully, as the publication process is complex.

Publishing a new version of @hltcoe/concrete involves the following steps:

  1. Update the package version
  2. Perform a clean build of the package with the new version
  3. Add changes to git and commit
  4. Tag the commit
  5. Push the built package in dist_nodejs to NPM
  6. Update the package version to a pre-release version
  7. Add changes to git and commit
  8. Push main and the new tag to GitHub and GitLab

These steps are automated in publish.bash. Note that a failure at any point in the process will cause the script to exit with an error. The script echoes each command as it runs to help you recover from a failure.

Example usage:

./publish.bash patch