conversion:object_search
- conversion:object_search is one of many conversion:Enhancements.
- conversion:object_search is like conversion:SubjectAnnotation because it adds triples to describe the subject(s) of the table row, but instead of explicit predicate-object pairs, conversion:object_search searches the cell value to determine the predicate and/or object of the triple describing the row subject.
This page describes how to search a cell value to assert additional triples describing the table row subject(s).
- Reusing the cell value
- Reusing a comma-delimited cell value
- Example: Searching for stocks mentioned in tweets
- Processing an annotation in a cell value
Take the entire value of the cell and construct a URL with it:
conversion:enhance [
   ov:csvCol 3;
   ...
   conversion:equivalent_property dcterms:identifier;
   conversion:range rdfs:Literal;
   ...
   conversion:object_search [
      conversion:regex "^(.*)$";
      conversion:predicate foaf:homepage;
      conversion:object "http://www.ncbi.nlm.nih.gov/pubmed/[\\1]";
   ];
];
Will produce:
<http://bio2rdf.org/pubmed:11587856>
dcterms:identifier "11587856" ;
foaf:homepage <http://www.ncbi.nlm.nih.gov/pubmed/11587856> ;
from the line in gene2pubmed:
205920 3927647 11587856
(If you want to affect the subject of the triple, see this)
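To see what the regex and object template do together, here is a rough Python sketch. It is not the converter's implementation; the helper name object_search and the plain string replacement of [\1] are assumptions made only to illustrate the idea that each regex match yields one object value.

```python
import re

def object_search(cell, regex, template):
    # Hypothetical helper: one object value per regex match,
    # with [\1] in the template replaced by capture group 1.
    return [template.replace(r"[\1]", m.group(1))
            for m in re.finditer(regex, cell)]

# Third column of the gene2pubmed line shown above.
objects = object_search(
    "11587856",
    r"^(.*)$",
    r"http://www.ncbi.nlm.nih.gov/pubmed/[\1]",
)
print(objects)
```

Because the regex is anchored at both ends, the whole cell value is captured once, so a single foaf:homepage object comes out.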
NOTE: This should only be used in degenerate cases when you can't use conversion:delimits_object because, for some odd reason, you want to keep the unparsed value around in your enhancement. conversion:delimits_object is a much more elegant way to parse the cell value.
conversion:enhance [
   ov:csvCol 14;
   ...
   conversion:object_search [
      conversion:regex "([^,]+), ";
      conversion:predicate dcterms:subject;
      conversion:object "[\\1]";
   ];
   conversion:object_search [
      conversion:regex ", ([^,]+)$"; # If you have a single regex, feel free to email me.
      conversion:predicate dcterms:subject;
      conversion:object "[\\1]";
   ];
];
If you're wrestling with spacing, try the [>\\1<] template variable (see Using template variables to construct new values).
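The two patterns split the work between them: the first picks up every item that is followed by ", " and the second picks up the final item. A small Python sketch shows this, assuming find-all semantics for each regex; the cell value is made up for illustration.

```python
import re

cell = "Tabels, D2R Server, Jena"  # hypothetical comma-delimited cell value

leading = re.findall(r"([^,]+), ", cell)    # items followed by ", "
trailing = re.findall(r", ([^,]+)$", cell)  # the last item
subjects = leading + trailing

# What a plain delimiter split gives for the same value.
delimited = cell.split(", ")

print(subjects)
print(delimited)
```

Under these assumptions both approaches yield the same three items, which is why the delimiter approach is preferable when you don't need the unparsed value. Note also that a cell holding a single item with no comma matches neither pattern.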
After some initial enhancements, twapperkeeper's CSV row (full input file here):
High Volume Stock: stock analysis website - $ABC - http://www.dojispace.com/stock-picks/amerisourcebergen-stock-price-ABC.aspx,,timlisten27,14522987982098432,130595362,en,<a href="http://www.dojispace.com" rel="nofollow">Stock Screener</a>,http://s.twimg.com/a/1291760612/images/default_profile_0_normal.png,,0,0,Tue 14 Dec 2010 03:32:04 +0000,1292297524
can become:
stocks:tweet_14522987982098432
dcterms:identifier "tweet_14522987982098432" ;
dcterms:isReferencedBy
<http://logd.tw.rpi.edu/source/twapperkeeper-com/dataset/stocks/version/2011-Mar-26> ;
a stocks_vocab:Tweet , sioctypes:MicroblogPost ;
sioc:content
"High Volume Stock: stock analysis website - $ABC - http://www.dojispace.com/stock-picks/amerisourcebergen-stock-price-ABC.aspx" ;
But we'd like to not have to regex a tweet to find which stocks it mentions; we'd like to precompute it so we can query it as triples. This can be done with conversion:object_search, which specifies a regex to search the object of a triple, and -- for each match -- the predicate and object to assert on the original subject. (full enhancements file here.)
conversion:enhance [
   ov:csvCol 1;
   ov:csvHeader "text";
   conversion:domain_name "Tweet";
   conversion:domain_template "tweet_[#4]";
   conversion:equivalent_property sioc:content;
   #conversion:label "text";
   conversion:comment "";
   conversion:range rdfs:Literal;
   conversion:object_search [
      conversion:eg "is website - $ABC - http:";
      conversion:regex "\\$([^\\s]*)";
      conversion:predicate foaf:topic;
      conversion:object "$[\\1]";
   ];
   conversion:object_search [
      conversion:regex "\\$([^\\s]*)";
      conversion:predicate sioc:topic;
      conversion:object "http://dbpedia.org/resource/[\\1]";
   ];
   conversion:object_search [
      conversion:regex "\\$([^\\s]*)";
      conversion:predicate foaf:homepage;
      conversion:object "[/sd][\\1]";
   ];
];
adds the following triples to those shown above (full output file here):
@prefix stocks_global_value: <http://logd.tw.rpi.edu/source/twapperkeeper-com/dataset/stocks/> .
stocks:tweet_14522987982098432
foaf:topic "$ABC" ;
foaf:homepage stocks_global_value:ABC ;
sioc:topic dbpedia:ABC ;
Note that the enhancements use template variables (see Using template variables to construct new values), with additional [\1] variables that result from captured groups in the regex.
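The same search can be tried outside the converter. The Python sketch below runs the regex over the tweet text and fills in the three object templates. The value of SD is an assumption read off the stocks_global_value prefix in the output above, and the string replacement is a stand-in for the converter's real template handling.

```python
import re

text = ("High Volume Stock: stock analysis website - $ABC - "
        "http://www.dojispace.com/stock-picks/amerisourcebergen-stock-price-ABC.aspx")

# Assumed expansion of [/sd] for this dataset, taken from the output prefix.
SD = "http://logd.tw.rpi.edu/source/twapperkeeper-com/dataset/stocks/"

templates = {
    "foaf:topic": r"$[\1]",
    "sioc:topic": r"http://dbpedia.org/resource/[\1]",
    "foaf:homepage": r"[/sd][\1]",
}

triples = []
for m in re.finditer(r"\$([^\s]*)", text):
    for predicate, template in templates.items():
        obj = template.replace("[/sd]", SD).replace(r"[\1]", m.group(1))
        triples.append((predicate, obj))

for predicate, obj in triples:
    print(predicate, obj)
```

A tweet mentioning several tickers would produce one set of three triples per match. Because the capture group uses *, a lone "$" followed by whitespace would match with an empty ticker, so a stricter pattern may be worth considering for real data.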
Given an "annotated cell value" that contains a long, messy string preceded by a clean, processable string:
[Tabels, D2R Server, Jena, Virtuoso] * Tabels (Conversion XLS to RDF) [http://idi.fu...
We want:
:row
dcterms:description "[Tabels, D2R Server, Jena, Virtuoso] * Tabels (Conversion XLS to RDF) [http://idi.fu...";
dcterms:references :Tabels, :D2R_Server, :Jena, :Virtuoso;
.
The following hairball of a regex doesn't handle it. Even if a working regex is possible, it certainly requires too much expertise and time to work out.
conversion:enhance [
   ov:csvCol 10;
   ov:csvHeader "lod 2 (org type)";
   conversion:object_search [ # [Tabels, D2R Server, Jena, Virtuoso]
      conversion:regex "^\\[([^,\\]]+)[,\\]]",
                       "[^\\]]+?, ([^,\\]]+),",
                       "^[^\\]]+, ([^,\\]]+)\\]", "^(academic)$";
      conversion:predicate dcterms:yippie;
      conversion:object "[/sd]org/[\\1]";
   ];
];
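One way to see where the patterns fall short is to run the three list-related ones over the example cell in Python. This is only a sketch: it assumes each pattern is applied independently with find-all semantics, which may not be exactly what the converter does.

```python
import re

cell = ("[Tabels, D2R Server, Jena, Virtuoso] * Tabels "
        "(Conversion XLS to RDF) [http://idi.fu...")

patterns = [
    r"^\[([^,\]]+)[,\]]",     # first item
    r"[^\]]+?, ([^,\]]+),",   # an item between two commas
    r"^[^\]]+, ([^,\]]+)\]",  # last item
]

found = [item for p in patterns for item in re.findall(p, cell)]
wanted = ["Tabels", "D2R Server", "Jena", "Virtuoso"]
missed = [tool for tool in wanted if tool not in found]

print(found)
print(missed)
```

Under these assumptions the middle pattern consumes the comma after D2R Server, so the next item no longer has a leading ", " to match against and Jena is dropped. Longer lists would lose more of their middle items the same way.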
So we combine conversion:object_search and conversion:delimits_object to first select the cell value substring to process, then to parse that string with the given delimiter:
conversion:enhance [
   ov:csvCol 11;
   ov:csvHeader "lod 3 (tools)";
   conversion:object_search [
      conversion:regex "^\\[([^\\]]+)\\]"; # [Tabels, D2R Server, Jena, Virtuoso]
      conversion:delimits_object ",\\s*"; # "Tabels", "D2R Server", "Jena", and "Virtuoso"
      conversion:predicate dcterms:references;
      conversion:object "[/sd]org/[\\1]"; # <http://purl.org/twc/lodcloud/id/tool/Virtuoso>
   ];
];
Unfortunately, this doesn't allow us to leverage the full Resource handling (such as conversion:links_via).
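The select-then-split idea is easy to try in Python. In this sketch the selecting pattern, which captures everything between the first pair of square brackets, is my own choice for illustration, and re.split stands in for whatever conversion:delimits_object does internally.

```python
import re

cell = ("[Tabels, D2R Server, Jena, Virtuoso] * Tabels "
        "(Conversion XLS to RDF) [http://idi.fu...")

# Step 1: select the clean substring at the start of the cell.
selected = re.search(r"^\[([^\]]+)\]", cell)

# Step 2: split the selected substring on the delimiter regex.
tools = re.split(r",\s*", selected.group(1)) if selected else []

print(tools)
```

Each resulting item would then be dropped into the object template in place of [\1], giving one dcterms:references triple per tool.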
- conversion:interpret specifies how to "rewrite" the input before it is handled for conversion.
- Using template variables to construct new values, for use within values of conversion:object_search.