An Open Source Java implementation of the Validation Transformation Language, based on the VTL 1.1 draft specification. The implementation follows the JSR-223 Java Scripting API and exposes a simple connector interface one can implement in order to integrate with any data stores. VTL is a standard language for defining validation and transformat…

takvamborgen/java-vtl

 
 

Java VTL: Java implementation of VTL

The Java VTL project is an open source Java implementation of the VTL 1.1 draft specification. It follows the JSR-223 Java Scripting API and exposes a simple connector interface that can be implemented to integrate with any data store.

Visit the interactive reference manual for more information.

Modules

The project is divided into modules:

  • java-vtl-parent
    • java-vtl-parser, contains the lexer and parser for VTL.
    • java-vtl-model, VTL data model.
    • java-vtl-script, JSR-223 (ScriptEngine) implementation.
    • java-vtl-connector, connector API.
    • java-vtl-tools, various tools.

Usage

Add a dependency to your Maven project:

<dependency>
    <groupId>no.ssb.vtl</groupId>
    <artifactId>java-vtl-script</artifactId>
    <version>[VERSION]</version>
</dependency>

Evaluate VTL expressions

ScriptEngine engine = new VTLScriptEngine(connector);

Bindings bindings = engine.getBindings(ScriptContext.ENGINE_SCOPE);
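// Each line ends with "\n" so that the concatenated statements do not run together.
engine.eval("ds1 := get(\"foo\")\n" +
            "ds2 := get(\"bar\")\n" +
            "ds3 := [ds1, ds2] {\n" +
            "   filter ds1.id = \"string\",\n" +
            "   total := ds1.measure + ds2.measure\n" +
            "}");

// Datasets assigned in the script are available through the engine bindings.
System.out.println(bindings.get("ds3"));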

Connect to external systems

Java VTL uses the no.ssb.vtl.connector.Connector interface to read data from and write data to external systems.

The Connector interface defines three methods:

public interface Connector {

    boolean canHandle(String identifier);

    Dataset getDataset(String identifier) throws ConnectorException;

    Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;

}

The method canHandle(String identifier) is used by the engine to find which connector is able to provide a Dataset for a given identifier.

The method getDataset(String identifier) is then called to fetch the dataset. Example implementations can be found in the java-vtl-ssb-api-connector module, but a very crude Dataset that a connector could return might look like this:

class StaticDataset implements Dataset {

    private final DataStructure structure = DataStructure.builder()
            .put("id", Role.IDENTIFIER, String.class)
            .put("period", Role.IDENTIFIER, Instant.class)
            .put("measure", Role.MEASURE, Long.class)
            .put("attribute", Role.ATTRIBUTE, String.class)
            .build();

    @Override
    public Stream<DataPoint> getData() {

        List<Map<String, Object>> data = new ArrayList<>();
        Instant period = Instant.now();
        for (int i = 0; i < 100; i++) {
            // Create a new map per row; reusing one instance would make
            // every entry of the list point to the same (last) row.
            Map<String, Object> row = new HashMap<>();
            row.put("id", "id #" + i);
            row.put("period", period);
            row.put("measure", Long.valueOf(i));
            row.put("attribute", "attribute #" + i);
            data.add(row);
        }

        return data.stream().map(structure::wrap);
    }

    @Override
    public Optional<Map<String, Integer>> getDistinctValuesCount() {
        return Optional.empty();
    }

    @Override
    public Optional<Long> getSize() {
        return Optional.of(100L);
    }

    @Override
    public DataStructure getDataStructure() {
        return structure;
    }
}
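
To illustrate how canHandle and getDataset work together, here is a minimal, self-contained sketch of connector lookup. It does not use the Java VTL classes: SimpleConnector, PrefixConnector, findConnector and resolve are hypothetical names, the dataset is reduced to a plain String, and the "first matching connector wins" rule is an assumption, not necessarily what the engine does.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch only: simplified stand-ins for the Connector and Dataset types,
// so that the example runs without the Java VTL library.
class ConnectorDispatchSketch {

    /** Hypothetical stand-in for Connector, using String instead of Dataset. */
    interface SimpleConnector {
        boolean canHandle(String identifier);
        String getDataset(String identifier);
        String putDataset(String identifier, String dataset);
    }

    /** In-memory connector that only handles identifiers starting with a prefix. */
    static final class PrefixConnector implements SimpleConnector {
        private final String prefix;
        private final Map<String, String> store = new HashMap<>();

        PrefixConnector(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public boolean canHandle(String identifier) {
            return identifier.startsWith(prefix);
        }

        @Override
        public String getDataset(String identifier) {
            String dataset = store.get(identifier);
            if (dataset == null) {
                throw new IllegalStateException("unknown dataset: " + identifier);
            }
            return dataset;
        }

        @Override
        public String putDataset(String identifier, String dataset) {
            store.put(identifier, dataset);
            return dataset;
        }
    }

    /** Assumption: the first connector that reports canHandle(...) is used. */
    static Optional<SimpleConnector> findConnector(List<SimpleConnector> connectors,
                                                   String identifier) {
        return connectors.stream()
                .filter(connector -> connector.canHandle(identifier))
                .findFirst();
    }

    /** Looks up a connector for the identifier, then asks it for the dataset. */
    static String resolve(List<SimpleConnector> connectors, String identifier) {
        SimpleConnector connector = findConnector(connectors, identifier)
                .orElseThrow(() -> new IllegalArgumentException(
                        "no connector can handle: " + identifier));
        return connector.getDataset(identifier);
    }

    public static void main(String[] args) {
        PrefixConnector memory = new PrefixConnector("mem:");
        PrefixConnector files = new PrefixConnector("file:");
        memory.putDataset("mem:foo", "dataset foo");

        List<SimpleConnector> connectors = List.of(memory, files);
        System.out.println(resolve(connectors, "mem:foo"));
    }
}
```

A real connector would implement no.ssb.vtl.connector.Connector, return Dataset instances such as the StaticDataset above, and report failures through ConnectorException.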

Implementation roadmap

This is an overview of the implementation progress.

| Group | Operators | Progress | Comment |
|---|---|---|---|
| General purpose | round parentheses | done | |
| General purpose | := (assignment) | done | |
| General purpose | membership | done | |
| General purpose | get | usable | The keep, filter and aggregate are not yet reflected in the connector interface. |
| General purpose | put | usable | The Connector interface is defined but expressions are not recognized yet. |
| Join expression | []{} | done | |
| Join clause | filter | done | |
| Join clause | keep | done | |
| Join clause | drop | done | |
| Join clause | fold | done | |
| Join clause | unfold | done | |
| Join clause | rename | done | |
| Join clause | := (assignment) | done | |
| Join clause | . (membership) | done | |
| Clauses | rename | done | |
| Clauses | filter | done | |
| Clauses | keep | done | |
| Clauses | calc | todo | |
| Clauses | attrcalc | todo | |
| Clauses | aggregate | todo | |
| Conditional | if-then-else | todo | |
| Conditional | nvl | usable | Dataset as input not implemented. |
| Validation | Comparisons (>, <, >=, <=, =, <>) | usable | Only inside join expression (no lifting). |
| Validation | in, not in, between | todo | |
| Validation | isnull | done | The implemented syntaxes are isnull(value), value is null and value is not null. |
| Validation | exist_in, not_exist_in | todo | |
| Validation | exist_in_all, not_exist_in_all | todo | |
| Validation | check | usable | The boolean dataset must be built manually (no lifting). |
| Validation | match_characters | todo | |
| Validation | match_values | todo | |
| Statistical | min, max | todo | |
| Statistical | hierarchy | usable | The inline definition is not supported. A dataset that has a correct structure can be used instead. |
| Statistical | aggregate | todo | |
| Relational | union | done | |
| Relational | intersect | todo | |
| Relational | symdiff | todo | |
| Relational | setdiff | todo | |
| Relational | merge | todo | |
| Boolean | and | usable | Only inside join expression (no lifting). |
| Boolean | or | usable | Only inside join expression (no lifting). |
| Boolean | xor | usable | Only inside join expression (no lifting). |
| Boolean | not | usable | Only inside join expression (no lifting). |
| Mathematical | unary plus and minus | usable | Only inside join expression (no lifting). |
| Mathematical | addition, subtraction | usable | Only inside join expression (no lifting). |
| Mathematical | multiplication, division | usable | Only inside join expression (no lifting). |
| Mathematical | round | todo | |
| Mathematical | abs | todo | |
| Mathematical | trunc | todo | |
| Mathematical | power, exp, nroot | todo | |
| Mathematical | ln, log | todo | |
| Mathematical | mod | todo | |
| String | length | todo | |
| String | concatenation | todo | |
| String | trim | todo | |
| String | upper/lower case | todo | |
| String | substring | todo | |
| String | indexof | todo | |
| String | date_from_string | usable | Dataset as input not implemented. Only YYYY date format accepted. |
