Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

if the namespace of an object is changed it should be reflected in the object #412

Open
JMante1 opened this issue Oct 10, 2022 · 14 comments · Fixed by #413
Open

if the namespace of an object is changed it should be reflected in the object #412

JMante1 opened this issue Oct 10, 2022 · 14 comments · Fixed by #413
Assignees

Comments

@JMante1
Copy link

JMante1 commented Oct 10, 2022

obj = sbol3.Component('hello', [sbol3.SBO_DNA]) obj.name = 'hello' obj.namespace = 'parts.igem.org' doc.add(obj)

This code leads to an object with http://sbols.org/unspecified_namespace used everywhere, except for the has namespace property. if the namespace property is updated it should change the namespace used throughout the property

@jakebeal
Copy link
Contributor

jakebeal commented Oct 11, 2022

pySBOL3 intentionally uses a "permissive" model of property changes, in which it does not check for potential inconsistencies until the user requests validation, since the user may be performing a given operation in the middle of some sort of multi-stage operation.

If you run a document validation after running the code that you provided, however, you will find that it does identify an error due to the invalid state that has been created:

>>> print(doc.validate())
http://sbols.org/unspecified_namespace/hello sbol3-10301: The namespace parts.igem.org is not a prefix of the identity http://sbols.org/unspecified_namespace/hello

Changing the namespace of an object is a potentially dangerous operation, since it can break links with other objects that point to this object, as well with with its child objects. As such, we recommend changing namespace via the "safe" function provided for that purpose on the Document class:

def change_object_namespace(top_levels: Iterable[TopLevel], new_namespace: str) -> Any:

This function provides a guaranteed safe change for a group of objects, moving all of them as a unit into the new namespace and preserving links both within the group and with external references.

@JMante1
Copy link
Author

JMante1 commented Oct 11, 2022

The change_object_namespace almost does what I want, unfortunately it doesn't update other references to the object (e.g. if a component is used in a combinatorial derivation the component is updated but the reference to it in the combinatorial derivation is not). Is there a function to do that too, i.e. to preserve the external references?

@tcmitchell
Copy link
Collaborator

@JMante1 this could be a bug. Document.change_object_namespace is supposed to maintain referential integrity among the top_levels passed to the method. If that is not what is happening for you, could you please share a reproducible test case so that the bug can be fixed?

@JMante1
Copy link
Author

JMante1 commented Oct 12, 2022

@JMante1
Copy link
Author

JMante1 commented Oct 12, 2022

I wrote this as a work around:

def update_uri_refs(doc, update_dict, use_derived=True, derived_ls = ['_sequence']):
    """
    This updates a set of referenced uris (may be a namespace or identity update)

    Args:
        doc (SBOL3 Document): document to be updated
        update_dict (dict): dictionary of the form {old_uri:new_uri}
        use_derived (bool, optional): Whether or not to also update derived uris. Defaults to True.
        derived_ls (list, optional): List of derivations e.g. also version of the uri
                                     with _sequence added to the end. Defaults to ['_sequence'].

    Returns:
        doc (SBOL3 Document): updated document
    """
    # create all the additional uris that will need to be updated
    derived_keys = []
    for deriv in derived_ls:
        der_update = [f'{x}{deriv}' for x in  update_dict.keys()]
        derived_keys.extend(der_update)

    # pull the graph from the document
    g = doc.graph()
    for index, (subject, predicate, _object) in enumerate(g):
        # if the object is one of the items to be updated do so
        if str(_object) in update_dict:
            g.remove((subject, predicate, _object))
            new = rdflib.URIRef(update_dict[str(_object)])
            g.add((subject, predicate, new))
        # update any derived objects
        elif use_derived and str(_object) in derived_keys:
            suffix = str(_object).split('_')[-1]  # assumes suffix starts with '_'
            suffix = f'_{suffix}'
            g.remove((subject, predicate, _object))
            old = str(_object)
            new = f"{update_dict[old.replace(suffix, '')]}{suffix}"
            new = rdflib.URIRef(new)
            g.add((subject, predicate, new))
        # update any derived subjects
        if use_derived and str(subject) in derived_keys:
            suffix = str(subject).split('_')[-1]  # assumes suffix starts with '_'
            suffix = f'_{suffix}'
            g.remove((subject, predicate, _object))
            old = str(subject)
            new = f"{update_dict[old.replace(suffix, '')]}{suffix}"
            new = rdflib.URIRef(new)
            g.add((new, predicate, _object))
    doc._parse_graph(g)
    return doc

@tcmitchell
Copy link
Collaborator

@JMante1 Thank you for the short and simple test case to demonstrate the problem, and thank you for taking the time to develop a workaround. Both are appreciated.
I think I see where Document.change_object_namespace is coming up short. It only takes one list of TopLevel objects. Those objects all have their namespace changed and their references are made consistent after the change. Your use case is slightly different. You have two distinct sets of objects. One set should have their namespace changed and get their references updated. The other set should get their references updated without changing their namespace. We had not considered this case when we designed and implemented Document.change_object_namespace.
I am testing a fix that will add a third argument, optional, to Document.change_object_namespace. This argument is an iterable of TopLevel objects that should have their references updated. It can be list or a Document (which is itself an Iterable[TopLevel].

@tcmitchell
Copy link
Collaborator

@JMante1 I have merged an enhancement to provide the functionality I think you need. With this enhancement you can add a third argument to Document.change_object_namespace. In your sample code above to demonstrate the error, you would change the invocation to Document.change_object_namespace to include the document:

doc.change_object_namespace([seq], 'http://parts.igem.org', doc)

Are you able to test from the "main" branch to see if this fix works for your original case? If possible I'd like to hear back from you before I cut a new release of pySBOL3 that includes this fix.
Thanks again for your assistance in creating the test case. It was tremendously helpful in understanding the issue and providing a fix.

@jakebeal
Copy link
Contributor

Reopening because the autoclose happened before @JMante1 could test the fix.

@jakebeal jakebeal reopened this Oct 13, 2022
@tcmitchell
Copy link
Collaborator

Hi @JMante1 please let us know when you are able to test this fix. Does it resolve your issue?
Thanks!

@JMante1
Copy link
Author

JMante1 commented Oct 20, 2022

Hi, it does work for the simpler test case but not for the more complicated one. I am not sure if it is because of the way I am referencing documents or not? Additionally, I have derived parts e.g. partname_sequence etc that I also want to update. Your code doesn't allow this. Do you intend to add this or should I continue to use the work around for that case anyway?

@tcmitchell
Copy link
Collaborator

I'd be happy to try, but I'll need more information. There's not enough to go on here.

For "the more complicated" test case, I'm assuming you are referring to this one:

import sbol3
import os

direct = os.path.split(__file__)[0]
file_path_out = os.path.join(direct, 'out.nt')

doc = sbol3.Document()
comp = sbol3.Component('http://sbolstandard.org/testfiles/hello', sbol3.SBO_DNA)
seq = sbol3.Sequence('http://sbolstandard.org/testfiles/hello_sequence', elements='ATGC')
doc.add(comp)
doc.add(seq)

comp.sequences = [seq]

doc.change_object_namespace([seq], 'http://parts.igem.org')

doc.write(file_path_out)

Did you make the change I noted in my comment?

Here is the change you need to make, in case you did not update the code:

doc.change_object_namespace([seq], 'http://parts.igem.org', doc)

Note the third argument. The above code works for me. I think I used a variant of your code for a test case.

Let's get this settled before moving on to the derived parts issue. For that one I think I will need an example of failing code in order to understand the differences and test a fix.

Thanks!

@JMante1
Copy link
Author

JMante1 commented Oct 20, 2022

No, for test case above it does work for me. I am using it as part of the excel to sbol code and it doesn't work as part of that code. However, that is a really complicated code base. I can try and create a simpler version that gives the same errors, however, I was wondering if it was worth the effort if the derived objects will still need to be fixed via the work around.

@tcmitchell
Copy link
Collaborator

I've read your workaround code and I think a derived object is one whose URI appears to be "derived from" another URI. For example, http://example.org/foo_sequence is derived from http://example.org/foo.

If both of those objects, the Component named http://example.org/foo and the derived object, a Sequence named http://example.org/foo_sequence are in the first argument to change_object_namespace , I would think they would both be properly updated to a new namespace.

I think your third case, under the comment # update any derived subjects will not be handled. If the object named http://example.org/foo_sequence is not specified in the first argument to change_object_namespace, then no, it will not have its identity modified only because http://example.org/foo was in that first argument. It will have its identity changed only if it is one of the objects in the first argument to change_object_namespace.

I could work up an example if you think I'm understanding your code and your question. Thanks for your patience.

@JMante1
Copy link
Author

JMante1 commented Oct 26, 2022

I think you understand and it would work if I restructured my code somewhat. What happens to your function if you give it objects to change which don't exist? Does it just skip them or do I need to create an intersect of possible derived objects and objects that exist?

@tcmitchell tcmitchell modified the milestone: 1.1 Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants