Semargl is a modular framework for crawling linked data from structured documents. The main goal of the project is to provide a lightweight, performant tool without excess dependencies.
At the moment Semargl offers high-performance streaming parsers for RDF/XML, RDFa, N-Triples and JSON-LD, streaming serializers for Turtle, N-Triples and N-Quads, and integration with Jena, Clerezza and Sesame.
Its small memory footprint and low CPU requirements allow the framework to be embedded in almost any system. It runs seamlessly on Android and Google App Engine (GAE).
You can try out some of the framework's capabilities in the RDFa parser demo.
Semargl’s code is small and simple to understand. It has no external dependencies and it will never read mail. Internally it operates on raw strings and creates as few objects as possible, so your Android or GAE applications will be happy.
All parsers and serializers fully support the corresponding W3C specifications and test suites.
No jokes!
<dependency>
    <groupId>org.semarglproject</groupId>
    <artifactId>semargl-rdfa</artifactId>
    <version>0.7</version>
</dependency>
// first, initialize the triple store you want to use
MGraph graph = ... // Clerezza calls
// create the processing pipe: source -> parser -> sink
StreamProcessor sp = new StreamProcessor(NTriplesParser.connect(ClerezzaSink.connect(graph)));
// and run it
sp.process(file, docUri);
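To make the idea of a processing pipe more concrete, here is a minimal, self-contained sketch of the same pattern: a parser reads input line by line and pushes each triple downstream as three raw strings. It does not use Semargl's API. The names PipeSketch, TripleSink, CountingSink and parse are hypothetical and exist only for illustration, and the parser handles just a simplified N-Triples-like line format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class PipeSketch {

    /** Hypothetical sink (not Semargl's interface): receives a triple as raw strings. */
    interface TripleSink {
        void addTriple(String subject, String predicate, String object);
    }

    /** Hypothetical pipe stage: counts triples and forwards them to the next sink. */
    static final class CountingSink implements TripleSink {
        private final TripleSink next;
        int count;

        CountingSink(TripleSink next) {
            this.next = next;
        }

        @Override
        public void addTriple(String subject, String predicate, String object) {
            count++;
            next.addTriple(subject, predicate, object);
        }
    }

    /**
     * Simplified line parser (an assumption of this sketch, not a full N-Triples
     * parser): expects "subject predicate object ." on each line, skips blank
     * lines, comment lines and lines that do not fit the pattern.
     */
    static void parse(Reader input, TripleSink sink) throws IOException {
        BufferedReader reader = new BufferedReader(input);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#") || !line.endsWith(".")) {
                continue;
            }
            String body = line.substring(0, line.length() - 1).trim();
            // Split into at most three parts so a literal object may contain spaces.
            String[] parts = body.split("\\s+", 3);
            if (parts.length == 3) {
                sink.addTriple(parts[0], parts[1], parts[2]);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String doc = "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
                + "# a comment line\n"
                + "<http://example.org/a> <http://example.org/name> \"Alice Smith\" .\n";
        List<String> collected = new ArrayList<>();
        // Build the pipe from the end: the last sink collects, the first one counts.
        CountingSink counter = new CountingSink((s, p, o) -> collected.add(s + " " + p + " " + o));
        parse(new StringReader(doc), counter);
        System.out.println(counter.count + " triples");
        collected.forEach(System.out::println);
    }
}
```

As in the snippet above, the pipe is assembled from the end: each stage is handed the sink it should forward to, so triples flow through without being buffered into an intermediate model.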
If you want to use Semargl as a standalone framework, you may find its internal serializers and easily extensible API useful.
To build the framework, just run mvn clean install. The RDFa tests require a direct Internet connection.