90% - Prevent null values being added to ExtendedGraph #82

Merged

lordtatty
Contributor

To address issue #81

This will ensure that subjects, predicates, and resource values are always strings. Literal values can be any scalar - we would like to enforce strings for literal values too, but doing so may break legacy databases which use non-strings as values.

*
* @return bool
*/
protected function isValueValid($value){
Contributor

change name to isValidLiteralValue() ?

Contributor Author

Wouldn't we want to check resource values as well though? isValidTripleValue?

Member

From Tripod's perspective, don't we just need to ensure that subject, predicate, and object are all strings?

Member

non-empty strings

Contributor Author

Agree, but enforcing strings, as opposed to just rejecting null, could be very painful for any graphs which are already storing non-string values (e.g. integers). This current PR to ignore null values could well cause enough headaches without taking out integers at the same time. I do think we should ultimately enforce strings-only, but maybe one step at a time, once we are comfortable with no nulls.

Member

Or arrays of strings and integers.

Member

I guess what I'm saying is that we should be doing type checking here so we're not just allowing some other bad data that is just as incompatible as a null value.

Contributor Author

How do we deal with legacy data, though? We have no idea what crazy things have ended up in the graph. The php data types are:

String.
Integer.
Float (floating point numbers - also called double)
Boolean.
Array.
Object.
NULL.
Resource.

Integers, floats and booleans could all have ended up in the graph. Is it possible to add an array via the public ExtendedGraph methods?

Agree that really we should be ignoring anything that isn't a string, but I guess the question is how much we are prepared to break anything currently using tripod to enforce this.

Member

Tripod only supports plain literals. Full-stop. If other data types are in the documents they would produce invalid RDF, anyway, because we do not support typed literals at all.

Contributor Author

Discussed this with @rsinger Agreed to continue with the current approach. I have just tested to see if add_to_literal can accept arrays, and it can, so we can't simply ignore any non-scalar.

* @return bool
*/
protected function isValidTripleValue($value){
if(!is_string($value) && !is_int($value) && !is_array($value) && !is_float($value) && !is_bool($value)){
Member

Why not is_scalar here?

Contributor Author

Uh yep, I meant to use that, no idea why I didn't.... one moment.
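
For reference, a minimal sketch of how the explicit type checks in the diff above compare with is_scalar(). Both helper names are hypothetical and exist only for this illustration. Note that the explicit condition also lets arrays through, which is_scalar() does not, so the two are not a drop-in swap if arrays need to be accepted.

```php
<?php
// Hypothetical helpers for illustration only; not part of Tripod.

// Mirrors the condition in the diff: string, int, array, float or bool pass.
function passesExplicitCheck($value)
{
    return is_string($value) || is_int($value) || is_array($value)
        || is_float($value) || is_bool($value);
}

// is_scalar() is true for int, float, string and bool only.
function passesScalarCheck($value)
{
    return is_scalar($value);
}

$samples = array(
    'string' => 'foo',
    'int'    => 1,
    'float'  => 1.5,
    'bool'   => true,
    'array'  => array('foo'),
    'null'   => null,
    'object' => new stdClass(),
);

// Print one row per sample so the differences are easy to spot.
foreach ($samples as $label => $value) {
    printf(
        "%-6s explicit=%s scalar=%s\n",
        $label,
        var_export(passesExplicitCheck($value), true),
        var_export(passesScalarCheck($value), true)
    );
}
```

The only row where the two checks disagree is the array; null and objects fail both.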

@lordtatty lordtatty changed the title 30% - Prevent null values being added to ExtendedGraph 80% - Prevent null values being added to ExtendedGraph Jul 21, 2015
@lordtatty
Contributor Author

@rsinger @RobotRobot Something I just noticed is https://github.com/talis/tripod-php/blob/master/src/classes/ExtendedGraph.class.php#L141 - @return boolean true if the triple was new, false if it already existed in the graph.

What should the return value of add_resource_triple and add_literal_triple be if we aren't adding the triple because the value is invalid? Null? False? At the moment it is false.
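
To make the ambiguity concrete, here is a toy sketch. TinyGraph and its method are hypothetical stand-ins, not ExtendedGraph itself, and the scalar check is an assumption taken from the PR description. When false already means "the triple already existed", returning false for an invalid value leaves the caller unable to tell the two cases apart.

```php
<?php
// Hypothetical toy class for illustration; this is not ExtendedGraph.
class TinyGraph
{
    private $triples = array();

    /**
     * @return bool true if the triple was new, false otherwise
     */
    public function addLiteralTriple($s, $p, $o)
    {
        if (!is_scalar($o)) {
            return false; // not added: the value is invalid
        }
        $key = $s . ' ' . $p . ' ' . $o;
        if (isset($this->triples[$key])) {
            return false; // not added: the triple is already present
        }
        $this->triples[$key] = true;
        return true;
    }
}

$g = new TinyGraph();
$first     = $g->addLiteralTriple('http://example.com/s', 'http://example.com/p', 'v');
$duplicate = $g->addLiteralTriple('http://example.com/s', 'http://example.com/p', 'v');
$invalid   = $g->addLiteralTriple('http://example.com/s', 'http://example.com/p', null);

var_dump($first, $duplicate, $invalid);
```

Here the duplicate and the invalid call both come back false, which is exactly the overlap the question is about.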

@lordtatty
Contributor Author

@rsinger @RobotRobot also at the moment we are allowing scalars for both literals and resources. Should we restrict resources to only strings? I really can't imagine how having anything other than a string can be legitimate for a uri.

@scaleupcto
Contributor

resources should definitely always be strings. So I think we should validate them as such, rather than scalars.

* @param array $mongoValueObject
* @return array
* @return array| false an array of values or false if the value is not valid
Member

array| false isn't a valid phpdoc, it needs to be array|bool
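
A sketch of how the annotation might look after following this suggestion. The function name and body are hypothetical stand-ins, not the method from the diff; only the @return line is the point.

```php
<?php
/**
 * Hypothetical example: wraps a value in an array, or reports failure.
 *
 * @param mixed $value
 * @return array|bool an array of values, or false if the value is not valid
 */
function toValueArray($value)
{
    if (!is_scalar($value)) {
        return false;
    }
    return array($value);
}
```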

@rsinger
Member

rsinger commented Jul 22, 2015

This looks good. Should we validate $s and $p while we're here? Is it possible to add invalid values for them currently?

@lordtatty
Contributor Author

@rsinger I feel like maybe that should be raised as a separate issue so we can get this branch down.

@rsinger
Member

rsinger commented Jul 22, 2015

Why can't you just run $s and $p through isValidResourceProperty?

@rsinger
Member

rsinger commented Jul 22, 2015

I mean, my real question is, can you even add invalid values there? I don't see why we wouldn't fix it here, if so.

@lordtatty
Contributor Author

@rsinger my only real reservation about fixing that in this branch is that I'll have to add a whole bunch of tests to cover it, and this needs to get shipped today, ideally. It's currently passing as it is, so I think the right strategy is to merge this down and fix up $s and $p in another branch...

@rsinger
Member

rsinger commented Jul 22, 2015

Can you just confirm whether or not it's even an issue? There is no point in making a release to prevent us from adding duff data if we've still left it open to add duff data.

@lordtatty lordtatty changed the title 80% - Prevent null values being added to ExtendedGraph 90% - Prevent null values being added to ExtendedGraph Jul 22, 2015
@lordtatty
Contributor Author

Added checks for subjects and predicates

@lordtatty
Contributor Author

This will now also throw an exception if trying to add a triple with a blank subject

*
* @return bool
*/
protected function isValidTripleValue($value){
Member

Triple value?

Contributor Author

I thought that would be a good generic name to cover subject, predicate and value.

Member

I think it made more sense when it was named resourcevalue

Contributor Author

It's used for more than just validating a resource value though - the only thing it doesn't validate now is a literal value. isValidTripleComponentValue?

Contributor Author

Should I be breaking this out into isValidSubject, isValidPredicate, and isValidValue? It might make sense since we are now ensuring that the subject is not an empty string, as we can just add the check into there...

Member

No, they're all just uris.

Contributor

isTripleValue() really doesn't make a lot of sense.

The graph is made up of subjects ($s) predicates ($p) and objects ($o).

Subjects and predicates are always resources (another name for URIs), objects can be either a resource or a literal.

So it makes more sense to have two methods that reflect that.

isValidResource() - this covers subjects, predicates or objects that are of type URI. Those are all resources - a different term for URI.

isValidLiteral() covers objects that are literals only.

You are currently doing validation in two places - once in _add_triples() and again in the individual methods - pick one or the other - don't do both - that just wastes CPU cycles unnecessarily.

Finally you're null checking outside these methods. Do the null checking inside the isValidResource(). If the validity check fails for resources or predicates (i.e. returns false) I think the calling method might validly throw an exception, but for objects we are saying it is allowed but ignored (to keep the peace).
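
A rough sketch of the structure being suggested, under some assumptions: SketchGraph and its storage are hypothetical, the literal rule (any scalar) is taken from the PR description, and the exception type is just one plausible choice. It is not the merged code.

```php
<?php
// Hypothetical sketch of the two-validator structure; not the merged code.
class SketchGraph
{
    private $triples = array();

    // Subjects, predicates and resource objects: non-empty strings.
    // No separate null check is needed because is_string(null) is false.
    protected function isValidResource($value)
    {
        return is_string($value) && $value !== '';
    }

    // Literal objects: any scalar (assumed here to keep legacy data working).
    protected function isValidLiteral($value)
    {
        return is_scalar($value);
    }

    public function addLiteralTriple($s, $p, $o)
    {
        // An invalid subject or predicate is treated as a caller error.
        if (!$this->isValidResource($s) || !$this->isValidResource($p)) {
            throw new InvalidArgumentException('Subject and predicate must be non-empty strings');
        }
        // An invalid object is allowed but ignored.
        if (!$this->isValidLiteral($o)) {
            return false;
        }
        $this->triples[] = array($s, $p, $o);
        return true;
    }

    public function size()
    {
        return count($this->triples);
    }
}

$g = new SketchGraph();
$g->addLiteralTriple('http://example.com/s', 'http://example.com/p', null); // ignored
$g->addLiteralTriple('http://example.com/s', 'http://example.com/p', 42);   // stored
```

Validation happens once, inside the add method, so a bulk helper that calls it would not need to repeat the checks.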

Contributor Author

I see your point, in my head I was separating subjects and resource values unnecessarily here, will fix up.

Contributor Author

I hadn't spotted that duplicate validation - that was a leftover from a previous iteration. Removing.

if(!isset($value['r']) || !$this->isValidTripleValue($value['r'])){
return;
}
if($value['r'] === ""){
Contributor

again - why is your validity checking not inside the method validating the value? Instead if isValidWhatever returns false then throw your exception.

@lordtatty
Contributor Author

@RobotRobot @rsinger Fixed up from PR comments:

  • Renamed isValidTripleValue back to isValidResourceValue, and added the empty string check into there.
  • Removed an unnecessary predicate validation which is already covered by the labeller, and added appropriate tests to cover that.
  • toGraphValueObject no longer returns false if the array is empty.

* @param string $value
* @return bool
*/
protected function isValidLiteralValue($value){
Contributor

Too verbose, rename isValidLiteral()

@scaleupcto
Contributor

👍 just rename those methods, then merge

lordtatty added a commit that referenced this pull request Jul 23, 2015
…d-to-extendedgraph

90% - Prevent null values being added to ExtendedGraph
@lordtatty lordtatty merged commit 89044e8 into master Jul 23, 2015
@lordtatty lordtatty deleted the do-not-allow-null-values-to-be-added-to-extendedgraph branch July 23, 2015 09:01
3 participants