
conversion:subject_discriminator

Timothy Lebo edited this page Feb 14, 2012 · 6 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

URI construction for datasets with multiple (tabular) components.

Why is the table (identified by conversion:subject_discriminator) placed between dataset/ and version/ in the URI?

  • The ordering of components used to construct the URI of the dataset follows a decompositional metaphor.
    • i.e., sources are grouped by the aggregator's base_uri; datasets are grouped by their sources; dataset components (tables, files, discriminators, subdatasets) are grouped by their datasets; versions are grouped by their datasets (or subdatasets); and entities are grouped by (initially) the versioned datasets or (eventually) the abstract datasets.
  • Individual tables are also more a part of the dataset than a part of the version.
  • We assume that the tables (i.e., the individual CSV files) will persist across multiple versions.
  • This design was guided by logical organization and NOT physical organization of the data.
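
The ordering described above can be illustrated with a small sketch. It is not taken from csv2rdf4lod itself: the function name `dataset_uri`, the example base URI, and the identifiers are hypothetical, and the exact path segments that the converter emits are assumed from the "between dataset/ and version/" description.

```shell
#!/bin/sh
# Hypothetical helper (not part of csv2rdf4lod): composes a versioned dataset
# URI following the decompositional order base_uri > source > dataset >
# discriminator (table) > version.
dataset_uri() {
  base_uri="$1"; source_id="$2"; dataset_id="$3"; discriminator="$4"; version_id="$5"
  uri="$base_uri/source/$source_id/dataset/$dataset_id"
  # Assumption: the discriminator segment sits between dataset/ and version/,
  # and is simply left out when no discriminator is given.
  if [ -n "$discriminator" ]; then
    uri="$uri/$discriminator"
  fi
  printf '%s\n' "$uri/version/$version_id"
}

# With a discriminator, then without one (all values are illustrative).
dataset_uri "http://example.org" "my-source" "my-dataset" "pa-0001" "2012-Feb-14"
dataset_uri "http://example.org" "my-source" "my-dataset" ""        "2012-Feb-14"
```

Under these assumptions the two calls differ only in the extra path segment, which is why the same table keeps a stable position in the URI from one version to the next.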

If I have multiple CSVs, should I always keep the subject_discriminator?

Keeping or removing the subject_discriminator changes the URIs minted for the entities named during conversion.

Automated creation of subject_discriminator when creating the conversion trigger

When creating the conversion trigger, cr-create-convert-sh.sh will create a subject_discriminator for each file if multiple files are listed:

cr-create-convert-sh.sh -w source/PA/pa-0001.csv source/NY/ny-0001.csv

Looking into the conversion trigger shell script:

subjectDiscriminator="pa-0001"
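
The value above matches the input file's base name with the .csv extension removed. As a rough sketch of how such a value could be derived for each listed file (an assumption about the behavior, not the actual code of cr-create-convert-sh.sh; the function name `discriminator_for` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: strip the directory and the .csv extension from a path.
# Assumption: the discriminator is the file's base name; the real script may
# derive it differently.
discriminator_for() {
  basename "$1" .csv
}

for f in source/PA/pa-0001.csv source/NY/ny-0001.csv; do
  printf 'subjectDiscriminator="%s"\n' "$(discriminator_for "$f")"
done
```

Note that two files with the same base name in different directories would receive the same value under this scheme, so distinct file names are assumed here.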