Option for fatal error when links_via can't locate a match (for validation) #369

Open
ewpatton opened this issue Sep 4, 2013 · 0 comments

ewpatton commented Sep 4, 2013

As we collect bloodwork data and convert it into RDF, we are aggregating labels, because each lab facility uses its own. For example, one facility provides "neu#", another provides "Neu # (ANC)", and yet another provides "ne #r". To a physician the mappings are obvious, but to a machine they are not. It would be a nice feature for CSV2RDF4LOD to stop the conversion when links_via fails to find a match, because we want coverage of the underlying data to be as complete as possible. Currently, we work around this by scanning the output for lines where the column's property appears with only a single value rather than multiple values:

$ find * -name '*.e1.ttl' -exec grep -H ofCharacteristic {} \; | grep -v ,
2013-08-20/automatic/cbc_ruby.csv.e1.ttl:   health:ofCharacteristic value_of_characteristic:Neu_ANC ;
2013-08-27/automatic/250_comprehensive_panel.csv.e1.ttl:    health:ofCharacteristic value_of_characteristic:Bilirubin_Total ;
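
The same scan can be wrapped in a small script that returns a non-zero status, so that it can gate a build. This is only a sketch of the workaround, not part of CSV2RDF4LOD: the function names are hypothetical, and it keeps the same assumption as the grep, namely that a successful match puts a comma-separated list of values on the property's line.

```python
from pathlib import Path

# Assumption: as in the grep workaround, a line that mentions the property
# but contains no comma is treated as a value that failed to match.
PROPERTY = "ofCharacteristic"


def find_unlinked(lines, prop=PROPERTY):
    """Return (line_number, text) pairs where prop appears without a comma."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if prop in line and "," not in line
    ]


def scan_tree(root, pattern="*.e1.ttl", prop=PROPERTY):
    """Scan every file under root matching pattern; return 'path:line: text' hits."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        with path.open(encoding="utf-8") as handle:
            for number, text in find_unlinked(handle, prop):
                hits.append(f"{path}:{number}: {text}")
    return hits


def exit_code(root="."):
    """Print every hit and return 1 if any were found, otherwise 0."""
    hits = scan_tree(root)
    for hit in hits:
        print(hit)
    return 1 if hits else 0
```

Calling `sys.exit(exit_code("."))` from the directory that holds the version folders would make a shell script or Makefile stop on the first incomplete conversion. It still runs only after conversion, so it does not address the cost of reconverting.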

Once the failures have been identified, we can add the missing labels to the ontology, wipe the version, and reconvert. On large datasets, however, this rinse-and-repeat procedure would be cumbersome, because the conversion might take significant time and we'd like to know about failures early in the process.
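
To illustrate the kind of early failure being requested, here is a minimal sketch of a check that could run before conversion starts. Everything in it is an assumption for illustration: the function names and the idea of passing the known labels in as a plain list are hypothetical, and the exact-match rule (after trimming whitespace) is a stand-in, not a description of how links_via actually matches.

```python
import csv
import io


def unmatched_labels(csv_text, column, known_labels):
    """Return labels in `column` that are absent from `known_labels`, in first-seen order."""
    # Assumption: a label matches only if it is identical after trimming whitespace.
    known = {label.strip() for label in known_labels}
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = (row.get(column) or "").strip()
        if label and label not in known and label not in missing:
            missing.append(label)
    return missing


def require_full_coverage(csv_text, column, known_labels):
    """Raise ValueError when any label has no match (hypothetical fatal-error mode)."""
    missing = unmatched_labels(csv_text, column, known_labels)
    if missing:
        raise ValueError(f"no match for labels in column {column!r}: {missing}")
```

Because this reads only the source CSV and the label list, it would report every missing label in one pass, before any time is spent on conversion.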
