Processing Multivalue References in CSV #173

Open
tobiasschweizer opened this issue Jun 21, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@tobiasschweizer

Hi there

I am trying to create linking property values, one per foreign key, from a CSV cell that contains several concatenated foreign keys.

CSV data source: https://data.snf.ch/Exportcsv/Person.csv
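
A minimal Python sketch of how the relevant column reads with a quote-aware parser. It uses Python's standard `csv` module, not the RMLMapper's parser, with the dialect declared in the mapping below (`;` as delimiter) plus the CSVW default quote character `"`. The excerpt is a reduced, hypothetical one: only the column names come from the mapping, and the row values are illustrative. The final column lookup is roughly what `rml:reference "ResponsibleApplicantGrantNumber"` is expected to yield for that row.

```python
import csv
import io

# Hypothetical, reduced excerpt in the shape of Person.csv: ';' separates columns
# and the multi-valued grant-number cell is wrapped in double quotes.
SAMPLE = (
    "PersonNumber;FirstName;Surname;ResponsibleApplicantGrantNumber\n"
    '1;Ada;Example;"111925;34468"\n'
)

# CSVW dialect from the mapping, plus the CSVW default quote character.
CSVW_DIALECT = {"delimiter": ";", "quoteChar": '"'}


def read_rows(text: str, dialect: dict) -> list[dict[str, str]]:
    """Read CSV text into dicts keyed by the header row, honouring the dialect."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter=dialect.get("delimiter", ","),
        quotechar=dialect.get("quoteChar", '"'),
    )
    return list(reader)


rows = read_rows(SAMPLE, CSVW_DIALECT)
# Looking up the column by name: the quoted cell survives as one value.
print(rows[0]["ResponsibleApplicantGrantNumber"])  # 111925;34468
```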

mapping:

@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix schema: <http://schema.org/>.
@prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>.
@prefix gn: <http://www.geonames.org/ontology#>.
@prefix carml: <http://carml.taxonic.com/carml/> .
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .
@prefix fno: <https://w3id.org/function/ontology#> .
@base <http://example.com/ns#>.

<#LogicalSourcePerson> a rml:BaseSource ;
  rml:source <#CSVW_sourcePerson> ;
  rml:referenceFormulation ql:CSV .

<#CSVW_sourcePerson> a csvw:Table;
   csvw:url "Person.csv" ;
   csvw:dialect [ a csvw:Dialect;
       csvw:delimiter ";"
   ] .

<#PersonMapping> a rr:TriplesMap;
  rml:logicalSource <#LogicalSourcePerson> ;

  rr:subjectMap [
    rr:template "http://snf.ch/person/{PersonNumber}";
    rr:class schema:Person
  ] ;

  rr:predicateObjectMap [
    rr:predicate schema:memberOf ;
    rr:objectMap <#JoinMap> ;
  ] ;

  rr:predicateObjectMap [
    rr:predicate schema:givenName ;
    rr:objectMap [
      rml:reference "FirstName"
    ]
  ] ;

  rr:predicateObjectMap [
    rr:predicate schema:familyName ;
    rr:objectMap [
      rml:reference "Surname"
    ]
  ] .

<#JoinMap>
    fnml:functionValue [
        rr:predicateObjectMap [
            rr:predicate fno:executes ;
            rr:objectMap [ rr:constant grel:array_join ]
        ];
        rr:predicateObjectMap [
            rr:predicate grel:p_array_a ;
            rr:objectMap [ rr:constant "http://snf.ch/project/" ]
        ];
        rr:predicateObjectMap [
            rr:predicate grel:p_array_a ;
            rr:objectMap <#FunctionMap>
        ];
    ] .


# https://stackoverflow.com/questions/53715353/converting-a-csv-to-rdf-where-one-column-is-a-set-of-values
<#FunctionMap>
    fnml:functionValue [
        rml:logicalSource <#LogicalSourceGrant>;
        rr:predicateObjectMap [
            rr:predicate fno:executes;
            rr:objectMap [
                rr:constant grel:string_split # function to use
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:valueParameter;
            rr:objectMap [
                rml:reference "ResponsibleApplicantGrantNumber" # input string: concatenated foreign keys
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:p_string_sep;
            rr:objectMap [
                rr:constant ";";
            ];
        ];
    ].
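
The subject IRIs come from `rr:template "http://snf.ch/person/{PersonNumber}"`. A rough Python sketch of that template expansion, assuming a hypothetical row. R2RML's IRI-safe percent-encoding is only approximated here with `urllib.parse.quote(..., safe="")`, which, unlike R2RML, also encodes non-ASCII characters.

```python
import re
from urllib.parse import quote


def expand_template(template: str, row: dict[str, str]) -> str:
    """Replace each {Column} placeholder with the row's value for that column.

    quote(..., safe="") percent-encodes everything except unreserved ASCII
    characters; this only approximates R2RML's IRI-safe encoding.
    """
    return re.sub(r"\{([^}]+)\}", lambda m: quote(row[m.group(1)], safe=""), template)


# Hypothetical row from Person.csv.
row = {"PersonNumber": "12345", "FirstName": "Ada", "Surname": "Example"}
print(expand_template("http://snf.ch/person/{PersonNumber}", row))
# http://snf.ch/person/12345

# A whole multi-valued cell in a template (hypothetical key "Grants")
# would yield a single IRI with ';' percent-encoded.
print(expand_template("http://snf.ch/project/{Grants}", {"Grants": "111925;34468"}))
# http://snf.ch/project/111925%3B34468
```

The second call shows that a multi-valued cell has to be split before it can produce one value per foreign key.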

result:

  "http://schema.org/memberOf" : [ {
    "@value" : "http://snf.ch/project/111925667315583634468"
  } ]

expected result:

  "http://schema.org/memberOf" : [ {
    "@value" : "http://snf.ch/project/111925"
  }, {
    "@value" : "http://snf.ch/project/34468"
  }, {
    "@value" : "http://snf.ch/project/55836"
  }, {
    "@value" : "http://snf.ch/project/66731"
  } ]
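
To compare the result with the expected result, the function composition in `<#JoinMap>` and `<#FunctionMap>` can be modelled in plain Python. `string_split` and `array_join` below are hypothetical stand-ins for `grel:string_split` and `grel:array_join`. That the values bound to `grel:p_array_a` are flattened into one list and joined with an empty separator (no `grel:p_string_sep` is given in `<#JoinMap>`) is an assumption, not confirmed RMLMapper behaviour. The grant numbers are illustrative.

```python
def string_split(value: str, sep: str) -> list[str]:
    """Hypothetical stand-in for grel:string_split."""
    return value.split(sep)


def array_join(parts: list[str], sep: str = "") -> str:
    """Hypothetical stand-in for grel:array_join (empty default separator assumed)."""
    return sep.join(parts)


cell = "111925;34468"  # illustrative ResponsibleApplicantGrantNumber value
prefix = "http://snf.ch/project/"

# <#JoinMap> as modelled here: prefix and split values in one array, then joined.
joined = array_join([prefix] + string_split(cell, ";"))
print(joined)  # http://snf.ch/project/11192534468

# Shape of the expected result: the prefix applied to each split value.
per_value = [prefix + key for key in string_split(cell, ";")]
print(per_value)  # ['http://snf.ch/project/111925', 'http://snf.ch/project/34468']
```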

For more details, see the reply in the discussion thread kg-construct/rml-questions#15.

@DylanVanAssche
Contributor

The problem is that the same delimiter is used both to separate columns and to separate the values within a multi-valued cell.
To avoid confusion, those values are quoted. However, it seems that the OpenCSV library used by the RMLMapper does not pick this up.
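
A quick illustration of the point about quoting, using Python's `csv` module as a stand-in for a quote-aware parser. This says nothing about how OpenCSV itself is configured in the RMLMapper. The data line is hypothetical.

```python
import csv
import io

# Hypothetical data line: the grant-number cell contains ';' and is therefore quoted.
line = '1;Ada;Example;"111925;34468"\n'

# Splitting on the delimiter without honouring quotes breaks the cell apart.
naive = line.rstrip("\n").split(";")
print(naive)  # ['1', 'Ada', 'Example', '"111925', '34468"']

# A quote-aware parser keeps the quoted cell intact and strips the quotes.
parsed = next(csv.reader(io.StringIO(line), delimiter=";", quotechar='"'))
print(parsed)  # ['1', 'Ada', 'Example', '111925;34468']
```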

@DylanVanAssche DylanVanAssche added the bug Something isn't working label Jul 1, 2022
@tobiasschweizer
Author

> The problem is that the same delimiter is used both to separate columns and to separate the values within a multi-valued cell. To avoid confusion, those values are quoted. However, it seems that the OpenCSV library used by the RMLMapper does not pick this up.

Yes, this is also how I understand it works in CSV: quoting is like escaping characters that have a special meaning (metacharacters). Maybe we could look at the library you mentioned or open an issue in its repository. Let me know if I can be of any assistance.
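
The escaping analogy is easy to see from the writing side. In this Python sketch (standard `csv` module, hypothetical field values), the writer quotes only the fields that contain the delimiter or the quote character and doubles any embedded quote character. This matches the CSVW dialect defaults of `quoteChar` `"` and `doubleQuote` true.

```python
import csv
import io

buf = io.StringIO()
# QUOTE_MINIMAL (the default) quotes only fields containing special characters;
# doublequote=True (also the default) doubles an embedded quote character.
writer = csv.writer(buf, delimiter=";", quotechar='"', lineterminator="\n")
writer.writerow(["1", "111925;34468", 'say "hi"'])  # hypothetical values
print(buf.getvalue(), end="")  # 1;"111925;34468";"say ""hi"""

# Reading it back with the same dialect recovers the original fields.
fields = next(csv.reader(io.StringIO(buf.getvalue()), delimiter=";", quotechar='"'))
print(fields)  # ['1', '111925;34468', 'say "hi"']
```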
