Measurement of prevalence in the shacl-report #51
Hi @yum-yab. This would be fairly easy to implement, not in pySHACL itself but in a client application. You can compare the validation report to the input data and generate precisely the custom metrics your application requires.
Hi @ashleysommer, thank you for your quick reply. I understand the focus on the standard, but even one of the authors of the SHACL spec implemented such a feature in RDFUnit (unfortunately it has no support for SHACL-AF), so I am not really convinced that the vast majority is uninterested in an error/success rate for the individual tests (which, I agree, they could calculate themselves, if they knew the number of focus nodes that succeeded). Maybe we are missing some "shortcut" or other trick, but it does not seem easy at all to get the number of focus nodes which succeeded for a given test without implementing large parts of the SHACL standard and basically writing another (incomplete) SHACL engine.

What would help would be, e.g., an option to enable a "logging level" which would also include successful focus nodes in the report, or to return some triples (maybe even in a separate file) giving the number of successful focus nodes for every shape/test; for large-scale analysis the latter would be the better option. If you see other easy options to count the number of successful nodes on the client side, please let us know.

P.S.: I could also imagine letting @yum-yab make a PR for this (in case it is possible without major changes to core routines), but we are looking for a sustainable solution, so if you don't see this as a meaningful feature for pySHACL, that would not make sense for us.

P.P.S.: If you are aware of any other free or open SHACL engine which would satisfy this requirement, that would also help us. I appreciate any hints and pointers you can provide.
Hi @JJ-Author. Re-reading the original post, it seems there are two different requests here. One is quite easy, one is very hard.
i) As you alluded to, pySHACL does not keep a record of passed tests. The SHACL specification states that a SHACL engine should apply a constraint test against a collected set of focus nodes and generate a validation report item (i.e. a failure) for every focus node which fails the test. There is no mechanism described for keeping a record of focus nodes which do conform to the shape constraint. Adding this feature would be non-trivial and would affect a large portion of the pySHACL codebase (every constraint-type evaluator method would need new parameters passed in and out, etc.).

ii) A collected set of focus nodes is not necessarily the result of an explicit target declaration such as sh:targetClass. This introduces a lot of requirement solicitation into this feature request. For example, should implicit class target focus nodes count in the pass/fail ratio, even though they were not an explicit target? What about subjects-of and objects-of target-type focus nodes? And that is only discussing Node Shapes. Should Property Shapes be included in the count too? They don't have a targetClass and cannot map cleanly to any class for the kind of metrics required.

There would need to be some important decisions made, and every individual's picture of the ideal feature set would be different. There is no specification for how a feature like this should behave, so it would essentially be a bespoke solution, made up as we go. If this feature were implemented in pySHACL, it would likely generate different metrics output than RDFUnit provides, if that is what it is compared against.

I am currently the sole maintainer of pySHACL, and I don't have much time available to spend on building and maintaining new features. If @yum-yab would like to attempt building the feature, please go ahead, but as stated above, due to the current architecture of pySHACL it would be non-trivial.
Thank you very much @ashleysommer for the explanations. We are sorry for the confusion; the first message was trying to explain what we would like to achieve using one example, but we knew that there are several ways to specify targets and that it would not be straightforward to derive the number of focus nodes passed to a test from the SHACL definition alone. At https://github.com/RDFLib/pySHACL/blob/master/pyshacl/shape.py#L456 (and also at https://github.com/RDFLib/pySHACL/blob/master/pyshacl/shape.py#L470) we could hook in to write how many focus_nodes are passed per constraint (test) to a separate file. This should be consistent for all target types and comparable between different implementations.
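The hook proposed above could be prototyped by wrapping the method that enumerates focus nodes and tallying how many each shape receives. The sketch below uses a stand-in `Shape` class because `pyshacl.shape.Shape.focus_nodes` is an internal, unstable API; the decorator itself is the point, and all names and data here are invented for illustration.

```python
# Sketch of the proposed hook: wrap a focus-node-enumerating method and
# record how many focus nodes each shape receives per call. In pySHACL one
# would wrap pyshacl.shape.Shape.focus_nodes (internal API, may change);
# the Shape class below is a stand-in for demonstration only.
import functools
from collections import Counter

focus_node_counts = Counter()

def count_focus_nodes(method):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        nodes = method(self, *args, **kwargs)
        # Tally under the shape's identifier before returning unchanged.
        focus_node_counts[self.name] += len(nodes)
        return nodes
    return wrapper

class Shape:  # stand-in for pyshacl.shape.Shape
    def __init__(self, name, nodes):
        self.name, self._nodes = name, nodes

    @count_focus_nodes
    def focus_nodes(self, data_graph=None):
        return self._nodes

Shape("ex:PersonShape", ["ex:a", "ex:b", "ex:c"]).focus_nodes()
print(dict(focus_node_counts))  # {'ex:PersonShape': 3}
```

The counts could then be serialized to a separate file after validation, as suggested, without touching the report format itself.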
@JJ-Author
Hello,
is there any way to measure the prevalence of executed SHACL tests, such as getting the total number of instances of the sh:targetClass, or a percentage, e.g. 0.95 of the instances of the given sh:targetClass fulfill the restrictions? If not, I think it would be nice to have.
Best Regards