Alternative Tabular to RDF converters

csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

csv2rdf4lod is a tool by some folks in the Tetherless World Constellation at RPI. It is currently being used as part of the infrastructure for their Linking Open Government Data and Linking Open Biomedical Data projects. Tim Lebo wrote it with some invaluable design guidance from Greg Williams.

The number of utilities available to convert tabular data to RDF suggests a large and diverse set of requirements. To help you find the right match for your needs, this page collects pointers to other utilities that can convert tabular data to RDF.

If you know of yet another, feel free to email Tim or jot (and save) suggestions on this piratepad.

Please note that csv2rdf4lod was NOT used to produce the RDF available at http://www.data.gov, such as http://www.data.gov/semantic/data/alpha/92/dataset-92.rdf.gz. That was from some code somewhere in Google.

Special thanks to Jim McCusker, Paola, Li, Greg, Christoph, and Alvaro for their help in developing this list.

Other listings

http://www.lbd.dcc.ufmg.br/colecoes/escience/2016/010.pdf compares Refine, SemantEco, csv2rdf4lod, Any23, RML, SML, and BioDSL
circa 2015 W3C draft http://w3c.github.io/csvw/csv2rdf/
W3C's wiki: ConverterToRdf
MIT SIMILE listing: RDFizers
LOD2 deliverable: Report on Knowledge Extraction from Structured Sources
Michael Bergman's Sweet Tools listing
LATC project's Data Publication & Consumption Tools Library
The Open Data Institute's open data tech review
Linked University's Converting Legacy Data to RDF
(broken) http://www.opendataday.org/wiki/Tools
VIVO's listing
stackexchange.com

SETLr

As of Oct 2017, SETLr is still RPI's latest approach to tabular conversions.

Ontop

Ontop exposes the content of a relational database as RDF: http://ontop.inf.unibz.it/

Tabula

https://github.com/tabulapdf/tabula/releases/tag/v1.0.0

KARMA

homepage: http://www.isi.edu/integration/karma/
Available under Apache 2 License on GitHub
GUI based
Uses Conditional Random Field (CRF) to propose mappings to classes and properties.
Uses relational database and views.
Does not perform data preparation itself; requires it as a preprocessing step.
Provides entity matching based on Song and Heflin's entity coreference approach (Silk did not work for them)
Permits manual curation of sameAs links. Uses PROV-O to distinguish different sets of links.
It reads data from several different formats, including relational databases. One then loads one or more ontologies, and it provides a graphical UI to map the source data to the ontology.

Publications:

Used in the best in-use paper at ESWC 2013: Connecting the Smithsonian American Art Museum to the Linked Data Cloud
Earliest paper: IUI 2007

Any23

Anything To Triples (any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents.

http://any23.apache.org/download.html
July 2013: Apache Any23 0.8.0 was released. It includes a major refactoring of the codebase that improves modularity and makes Any23 much easier to use within your applications. It supports the following input formats:
- RDF/XML, Turtle, Notation 3
- RDFa with RDFa1.1 prefix mechanism
- Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License, XFN and Species
- HTML5 Microdata: (such as Schema.org)
- CSV: Comma Separated Values with separator autodetection.
May 2014: The Apache Any23 PMC announced the release of Any23 1.0, a major release for the project. A release report is available at http://s.apache.org/Ull. Although the project suggests consuming the Any23 Maven artifacts, a number of other download options are listed on the downloads page, along with documentation on how to include Any23 in your projects: http://any23.apache.org/download.html. As of this release it supports the following input formats:
- RDF/XML, Turtle, Notation 3
- RDFa with RDFa1.1 prefix mechanism
- Microformats: Adr, Geo, hCalendar, hCard, hListing, hRecipe, hReview, License, XFN and Species
- HTML5 Microdata: (such as Schema.org)
- JSON-LD: JSON for Linking Data, a lightweight Linked Data format that is based on the already successful JSON format and provides a way to help JSON data interoperate at Web scale.
- CSV: Comma Separated Values with separator autodetection.
- Vocabularies: Extraction support for CSV, Dublin Core Terms, Description of a Career, Description Of A Project, Friend Of A Friend, GEO Names, ICAL, lkif-core, Open Graph Protocol, BBC Programmes Ontology, RDF Review Vocabulary, schema.org, VCard, BBC Wildlife Ontology and XHTML.
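
As a rough illustration of what "separator autodetection" can look like, here is a minimal sketch using `csv.Sniffer` from Python's standard library. It is not Any23's implementation (Any23 is a Java project, and its detection logic is not described on this page); the function names and the candidate separator set are assumptions made for the example.

```python
import csv
import io


def detect_separator(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the field separator of a CSV sample.

    csv.Sniffer picks the candidate character whose per-line count is
    the most consistent; it raises csv.Error if nothing fits.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


def read_rows(text: str) -> list[list[str]]:
    """Parse CSV text after detecting its separator."""
    separator = detect_separator(text)
    return list(csv.reader(io.StringIO(text), delimiter=separator))
```

For example, `read_rows("name;age\nAlice;30\nBob;25\n")` splits each line on the semicolon without being told the separator. Very short or irregular samples may defeat the heuristic, so a converter would normally still let the user override the guess.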

UMBC's T2LD

"Tabular to Linked Data"

UMBC's Varish Mulwad
ISWC CEUR paper
Varish Mulwad/UMBC T2LD MS Thesis
Arbitrary target structures
Mulwad et al., "Automatically Generating Government Linked Data from Tables" (AAAI 2011), use ontologies and existing linked data to drive suggestions for enhancements.

AKSW's CSVImport

Representing multi-dimensional statistical data as RDF using the RDF Data Cube Vocabulary

(csv2rdf4lod handles n-ary relations in spreadsheets, including multi-dimensional statistics; see Converting with cell based subjects)
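
To make the idea of cell-based subjects concrete, here is a minimal sketch that emits one subject per data cell instead of one per row, which suits tables whose column headers are themselves values of a dimension (years, for instance). This is not csv2rdf4lod's actual output or configuration: the `example.org` namespaces, the property names, and the function name are all hypothetical, and literal escaping is skipped on the assumption that values contain no quotes.

```python
import csv
import io

BASE = "http://example.org/dataset/"   # hypothetical instance namespace
VOCAB = "http://example.org/vocab/"    # hypothetical vocabulary namespace


def cell_based_triples(text: str, row_key: str) -> list[str]:
    """Return N-Triples-style lines with one subject per data cell.

    `row_key` names the column that identifies the row (one dimension);
    every other column header is treated as a second dimension.
    """
    triples = []
    reader = csv.DictReader(io.StringIO(text))
    for r, row in enumerate(reader, start=1):
        for c, (column, value) in enumerate(row.items(), start=1):
            if column == row_key:
                continue  # the row key is a dimension, not a measurement
            subject = f"<{BASE}cell_{r}_{c}>"
            triples.append(f'{subject} <{VOCAB}{row_key}> "{row[row_key]}" .')
            triples.append(f'{subject} <{VOCAB}column> "{column}" .')
            triples.append(f'{subject} <{VOCAB}value> "{value}" .')
    return triples
```

Given the table `state,2009,2010` / `NY,10,12`, the sketch yields two subjects (one per year cell), each carrying the state, the column header, and the cell value. That is the n-ary shape that a row-per-subject conversion cannot express directly.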

Trifacta

A nicer, commercial version of OpenRefine: http://www.trifacta.com. Partnered with Tableau.

Google's Refine

Browser based
Faceted browsing
Concurrent editing for efficient manual data cleaning
Reconciliation with Freebase
Programmatic control of values
DERI offers module that exports to RDF: http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/
https://groups.google.com/forum/#!topic/google-refine/5O-jSE0NBTU/discussion
http://dataist.wordpress.com/2012/04/10/tutorial-using-google-refine-to-clean-mortgage-data/
http://ckan.org/2011/07/05/google-refine-extension-for-ckan/
screencast
For one-off conversions, Google Refine is quite easy to get started with. It has a great many data cleaning facilities for noisy or illogical data. With its RDF extension you get automated data reconciliation with outside linked data sources of your choice, such as DBpedia. (Rafael)

OpenRefine

As of October 2nd, 2012, Google no longer actively supports Refine, which has been rebranded as OpenRefine.

http://openrefine.org/

TAO's RDBToOnto

RDBToOnto is a tool that automatically generates fine-tuned, populated ontologies (in RDFS/OWL) from relational databases.

A major feature of this tool is the ability to produce highly structured ontologies by exploiting both the database schema and structuring patterns hidden in the data (see publications for details on the RTAXON learning method, including its formal description).

Though automated to a large extent, the process can be constrained in many ways through a friendly user interface. It also provides a framework that eases the development and integration of new learning methods and database readers. A database optimization module allows the input database to be enhanced before ontology generation.

homepage: http://www.tao-project.eu/researchanddevelopment/demosanddownloads/RDBToOnto.html

Ermilov's wiki.publicdata.eu CSV2RDF Application

Ivan Ermilov, Sören Auer, Claus Stadler: Crowd-Sourcing the Large-Scale Semantic Mapping of Tabular Data at WebSci 2013 (wiki intro)

More notes and comments at Ermilov's wiki.publicdata.eu CSV2RDF Application

DataLift

Datalift brings raw structured data coming from various formats (relational databases, CSV, XML, ...) to semantic data interlinked on the Web of Data.

homepage: http://datalift.org
tutorial

Anzo

Cambridge Semantics Anzo
http://www.cambridgesemantics.com/products/anzo_for_excel - designed to keep large numbers (potentially hundreds) of spreadsheets continuously integrated and in sync across an enterprise, each independently curated.

Anzo (in particular Anzo for Excel) is designed for enterprises to curate large numbers of spreadsheets, map them to ontologies & to existing RDF instance data, and maintain them as changes are made to the spreadsheets or to the data in the spreadsheets. It can be used for CSV-style "tabular" spreadsheets and also for arbitrarily "human-oriented" spreadsheets. It can be used both in interactive modes (where people are opening up and interacting with spreadsheets) and also in automated batch modes.

Anzo stores the RDF data from spreadsheets in an RDF database. Anzo includes both authenticated and unauthenticated SPARQL endpoints for this data; Anzo can also directly publish the data as Linked Data. Finally, Anzo gives you several ways to export RDF data from the database.

Anzo is available in several editions:
- Anzo Express Starter -- includes Anzo for Excel as above for limited numbers of users; freely available
- Anzo Express -- includes Anzo for Excel and Anzo on the Web, a user-friendly browser-based dashboard tool for visualizing, searching, and analyzing RDF data
- Anzo Enterprise -- includes the above in addition to tools to connect to data in relational databases, to integrate unstructured data from documents, web pages, etc., to run rules, reasoning, and workflow processes, various server-side and client-side APIs, etc.

We also make Anzo available for free for academic use. (Lee)

Michel Dumontier's php-lib

Michel Dumontier's php-lib library is what Bio2RDF has been using for converting TSV, CSV files (and other file formats) to RDF [1]. It contains some aspects that are Bio2RDF specific, namely its support for prefixed URIs, but any Pull Requests on GitHub would be appreciated to generalise that. OSX has PHP installed by default as far as I know so you can use it on the command line without any other dependencies.

You can find examples of scripts using php-lib in the bio2rdf-scripts repository on GitHub [2]. A fairly simple example would be the HGNC converter, which is Tab separated, but quite similar [3].

Cheers,

Peter

[1] https://github.com/micheldumontier/php-lib
[2] https://github.com/bio2rdf/bio2rdf-scripts
[3] https://github.com/bio2rdf/bio2rdf-scripts/blob/master/hgnc/hgnc.php#L129

Christopher Gutteridge's Grinder

https://github.com/cgutteridge/Grinder
Christopher Gutteridge's http://graphite.ecs.soton.ac.uk/stuff2rdf/

raw2ld

Set of tools and scripts for converting raw data (csv, tsv, $sv, and other custom formats), creating links, and managing a triple store http://www.data2semantics.org

https://github.com/Data2Semantics/raw2ld

TabLinker

https://github.com/Data2Semantics/TabLinker

RightField

Homepage: http://rightfield.org.uk

RightField allows the creation of spreadsheets that have ontology terms embedded within them for data validation. -Simon Jupp

RightField (http://www.rightfield.org.uk) allows you to embed ontology term selection into spreadsheets, and to extract these selections as RDF. It is designed more for assisting in the data collection process (i.e. when users fill in a spreadsheet that has been marked up using RightField, they are automatically collecting semantically enriched data). Their paper describes RDF extraction in more detail:

Wolstencroft, Katherine; Owen, Stuart; Goble, Carole; Nguyen, Quyen; Krebs, Olga; Muller, Wolfgang; , "RightField: Semantic enrichment of Systems Biology data using spreadsheets," E-Science (e-Science), 2012 IEEE 8th International Conference on , vol., no., pp.1-8, 8-12 Oct. 2012

doi: 10.1109/eScience.2012.6404412 (Katy)

Populous is a spawn of RightField

Populous

Populous (http://populous.org.uk) is a spawn of RightField (http://rightfield.org.uk). It uses the Ontology Pre-Processing Language (OPPL) to convert spreadsheet data into OWL/RDF. It also supports validating spreadsheet content against existing ontologies.

IO Informatics’ Knowledge Explorer

IO Informatics’ Knowledge Explorer is a good tool. I used Google Refine plus the RDF plugin and faced some problems with large datasets, but KE worked perfectly well. -Abdul Mateen Rajput
IO Informatics’ Knowledge Explorer, Professional Edition, also provides an automated way to import into and update a triplestore backend of your choice via monitored folders, which map and import incoming spreadsheets to RDF. You can set up multiple monitored folders with different data mappings, and these run as background processes to continuously update one or multiple connected triplestores (or different graphs in a single triplestore).

The Knowledge Explorer also provides scripting within the import mapping, application of thesauri, and other mechanisms for data transformation to clean, consolidate, and harmonize data during the import.

You can find out more about this tool here: http://www.io-informatics.com/products/sentient-KE.html -Erich Gombocz

Spain I-SEM 2010 submission

http://portal.acm.org/citation.cfm?id=1839753

eBiquity's RDF123

© 2007; status "Past project"; active April-December 2007
google group last non-spam entry was 19 Sept 2007
RDF123 by UMBC's Lushan Han; see ebiquity's RDF123 page. pdf ontology.

OWL spreadsheets

The idea behind "spreadsheet" work in .bib is to enrich spreadsheets with an ontology that makes the semantics of the spreadsheet cells, particularly of derived/computed values, more explicit, and to use that information to provide user assistance. -Christoph

Spreadsheets with a Semantic Layer

Talis' csvmapper

homepage: http://tiree.snipit.org/talis/tables
download: http://tiree.snipit.org/talis/tables/downloads/csv_mapper_remote.zip

Tetherless World's 2009 data-gov converter

http://code.google.com/p/data-gov-wiki/source/browse/#svn
Chunks output into multiple files to suit Tabulator's memory constraints.
Uses hash-based URIs for quick and easy Linked Data deployment.
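
The two design points above fit together: when output is split into static files and each subject is named as a fragment of the file that describes it, dereferencing the URI retrieves that file with no server-side configuration. The following is a minimal sketch of the idea, not the 2009 converter's actual code; the `part-N.nt` naming scheme, the `example.org` property, and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def chunk_number(row_number: int, rows_per_file: int) -> int:
    """1-based index of the output file that holds a given 1-based row."""
    return (row_number - 1) // rows_per_file + 1


def subject_uri(base_url: str, row_number: int, rows_per_file: int) -> str:
    """Hash-based URI: the document is the chunk file, the fragment names the row."""
    part = chunk_number(row_number, rows_per_file)
    return f"{base_url}/part-{part}.nt#row_{row_number}"


def chunk_triples(values: list[str], base_url: str,
                  rows_per_file: int) -> dict[str, list[str]]:
    """Group one triple per input value into files of at most rows_per_file rows."""
    files: dict[str, list[str]] = defaultdict(list)
    for n, value in enumerate(values, start=1):
        part = chunk_number(n, rows_per_file)
        subject = subject_uri(base_url, n, rows_per_file)
        files[f"part-{part}.nt"].append(
            f'<{subject}> <http://example.org/vocab/value> "{value}" .'
        )
    return dict(files)
```

With three values and `rows_per_file=2`, the sketch produces `part-1.nt` with two triples and `part-2.nt` with one, and row 3 is named `…/part-2.nt#row_3`. Each file stays small enough for a memory-constrained client to load on its own.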

Simple Sloppy Semantic Database

S3DB stands for Simple Sloppy Semantic Database. It is a way to represent information on the Semantic Web without the rigidity of relational/XML schema while avoiding the "spaghetti" of unconstrained RDF stores. The critical feature of S3DB is a core data model that makes an explicit distinction between the domain of discourse and its instantiation. The motivation and basic design are introduced in our publications [Nature Biotechnology - 24, 1070 - 1071 (2006)], [PLoS ONE 3(8) 2008] and [BMC Bioinformatics 11:387 (2010)]. For a shortcut to the syntax of the REST protocol used to expose S3DB's API click here. For the sprawling list of documents and media describing installation and usage see the documentation page.

https://sites.google.com/a/s3db.org/s3db/
http://www.biomedcentral.com/content/pdf/1471-2105-11-387.pdf
http://ibl.mdanderson.org/~jsalmeida/

Li Ding's lod-apps

page: http://code.google.com/p/lod-apps/wiki/phpLod#phpCsv2Rdf
last code update: Aug 4, 2011
version 2011-02-08 is in "testing status"

Talend Open Studio

"tons of connectors to get your data from any sources"
"nice data cleaning and transformation components to massage your data"
"fuzzymatch option (using Levenshtein and Metaphone) for reconciliation"
"job can be exported in a shell script and included in a cron job."
"Talend is more complex than Refine and the learning curve a bit longer"
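
To show what Levenshtein-based fuzzy matching does during reconciliation, here is a minimal pure-Python sketch. It is not Talend's component or API: the function names and the `max_distance` threshold are assumptions, and the Metaphone (phonetic) half of the quoted option is left out.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(value: str, candidates: list[str],
               max_distance: int = 2) -> Optional[str]:
    """Return the closest candidate, or None if nothing is close enough."""
    if not candidates:
        return None
    scored = [(levenshtein(value.lower(), c.lower()), c) for c in candidates]
    distance, candidate = min(scored)
    return candidate if distance <= max_distance else None
```

For example, `best_match("Albny", ["Albany", "Troy"])` reconciles the misspelling to `"Albany"`, while a value that is far from every candidate returns `None` rather than a forced match. The threshold trades recall against false matches and would need tuning per dataset.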

Michael Grove's ConvertToRDF

Command line version of Mindswap Convert To RDF Tool

homepage: http://www.mindswap.org/~mhgrove/ConvertToRDF/

Michael Grove's Mindswap Convert To RDF Tool

GUI version of Michael Grove's ConvertToRDF

homepage: http://www.mindswap.org/~mhgrove/convert/

Mindswap's Excel2RDF

Windows exe circa 2002
Can convert up to 26 columns and 100 rows.
Excel2RDF by University of Maryland's Mindswap.
Defers to Michael Grove's ConvertToRDF

R2RML

R2RML tutorial (circa April 2013) http://rdb2rdf.org/

http://www.w3.org/TR/r2rml/

Dave Reynolds' talk (esp slide 26) http://www.slideshare.net/der42/industrialized-linked-data
in virtuoso

D2RQ

http://d2rq.org/

http://www.w3.org/2001/sw/wiki/D2RQ

Others

http://dataincubator.org (google group active April 2009 to August 2011)
http://www.w3.org/2011/Talks/0223-cshals-egp/#%2843%29, http://www.w3.org/2011/Talks/0223-cshals-egp/#%2847%29
VIVO slurps their csvs into a relational database and uses the JDBC or d2r widgets to produce RDF.
Hibernate to map SPARQL to object-relational model.
RDB-RDF
http://www.w3.org/2001/sw/rdb2rdf/
C. Bizer, D2R MAP – A Database to RDF Mapping Language, Proceedings of the 12th International World Wide Web Conference, 2003.
Assem et al. mention their own in ISWC 2010
Interactively Mapping Data Sources into the Semantic Web (presented at ISWC) http://ceur-ws.org/Vol-783/paper2.pdf
http://rdf-translator.appspot.com/
XLWrap (ISWC 2009 paper)
W3C's A Direct Mapping of Relational Data to RDF First Public Working Draft
Stanford's M2: a Language for Mapping Spreadsheets to OWL from OWLED 2010
http://www.mindswap.org/~rreck/excel2rdf.shtml
Auer's Triplify / http://aksw.org/Projects/ReDDObservatory/OntoWiki_sCSV2RDFPlugin
- Sören Auer, Sebastian Dietzold, Jens Lehmann, Sebastian Hellmann, and David Aumueller. Triplify: Light-weight linked data publication from relational databases. In Juan Quemada, Gonzalo León, Yoëlle S. Maarek, and Wolfgang Nejdl, editors, Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009, pages 621–630. ACM, 2009. (eprint)
http://aksw.org/Projects/Stats2RDF is now at http://aksw.org/Projects/CSVImport.html
Leigh Dodds describing his Gridworks reconciliation api hack: http://www.ldodds.com/blog/2010/08/gridworks-reconciliation-api-implementation/
H2R Michael Krauthammer
Mapping Master http://data.semanticweb.org/conference/iswc/2010/paper/414/html
EasyRdf (php) Homepage: http://www.aelius.com/njh/easyrdf/ Download: http://github.com/downloads/njh/easyrdf/easyrdf-0.6.0.tar.gz API Docs: http://www.aelius.com/njh/easyrdf/docs/
Linked Data Integration Framework (uses R2R Mapping and SILK)
Spotfire "Spreadsheets in Spotfire as Linked Open Data"
Pentaho Data Integration suite (http://kettle.pentaho.com/) for converting from relational DBs to RDF. They also used it to translate from XML to RDF.
ALOE - Assisted Linked Data Consumption framework http://aksw.org/projects/aloe
Information Workbench, developed by fluid Operations. Haase, P., Schmidt, M., Schwarte, A.: The Information Workbench as a Self-Service Platform for Linked Data Applications. 2nd International Workshop on Consuming Linked Data (COLD 2011), Bonn, October 2011. http://www.fluidops.com/information-workbench/
http://arxiv.org/abs/1202.3667
http://www.sysmo-db.org/rightfield
Stanford’s DataWrangler app – a tool for visually creating a script to reformat/clean data
Tabels is a tool by CTIC to bridge the gap between tabular formats and linked data. Tabels can process spreadsheets and CSV files, as well as other tabular formats such as statistics-specific formats and analysis tool formats. Tabels is more than a transformation tool: it comes with data-sensitive front-end widgets that help end users exploit the data. For multidimensional information, Tabels programs can produce DataCube-compliant datasets, which can be dynamically explored using the chart view, an HTML5-based visualization component that automatically generates an interactive interface to explore the data. An example of how to transform a Eurostat PX file to Data Cube with a generic Tabels program is found at http://idi.fundacionctic.org/tabels/project/eurostat/. Ermilov et al. claim that it is the most advanced because it features "Tables Language": "This language is similar to Sparqlify-ML in the sense that it re-uses syntactic constructs already known from SPARQL. However, it introduces additional features specifically for CSV-RDF transformations, such as loops for iterating over CSV files in ZIP archives and workbooks and pages in Excel spreadsheets."
https://github.com/njh/easyrdf
http://www.data2semantics.org/2012/11/09/update-tablinker-untablinker/
Tomas Knap presented a poster on ODCleanStore at ISWC 2012. Some more documentation is here.
Revelytix makes a tool called Spyder, which is not open-source, but is free - http://www.revelytix.com/content/spyder. It will let you use R2RML over a CSV file directly to convert to RDF (or query with SPARQL without converting).
OpenRefine (formerly Google Refine): http://github.com/OpenRefine/OpenRefine/wiki
NOR2O a Library for Transforming Non-Ontological Resources to Ontologies
Data Shapes and data transformations http://www.slideshare.net/boricles/data-shapes-and-data-transformations http://arxiv.org/abs/1211.1565
http://linkdata.org/
VIVO Harvester: http://vivo-project.github.com/
https://github.com/AKSW/Sparqlify
http://grafter.org/
https://www.researchgate.net/publication/300897674_Sheet2RDF_a_Flexible_and_Dynamic_Spreadsheet_ImportLifting_Framework_for_RDF

Other (non-converter) related work

Triple Store Evaluation

Conversion to RDF is reported by the triple store evaluation literature, where they propose queries as well. Hexastore used as evaluation, but didn't mention how they converted. Library thing a dataset (LUBM?). Rdf4x guys have a non-public dataset. Work did not describe their considerations during the conversion process. (was some of this work from MIT?)

Bibtex to RDF tools

http://www.w3.org/wiki/ConverterToRdf#BibTex lists a handful
Simile's http://simile.mit.edu/repository/RDFizers/bibtex2rdf/
Wolf Siberski's http://www.l3s.de/~siberski/bibtex2rdf/
BibBase http://data.bibbase.org/
Text::BibTeX module of the btOOL library

Conferences

PCI 2013 - Special Session on the Web of Data (DATAWEB) Production and deployment of Open, Linked and Big Data