Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clips-executive: release lock if we suddenly become the owner #321

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

TarikViehmann
Copy link
Contributor

@TarikViehmann TarikViehmann commented Mar 31, 2022

We encountered a case, where the local mutex fact was saying that
another robot held the lock, hence the own mutex request was rejected.
After the rejection went through we finally got the update about the
request that was sent asynchronously (prior to the requection).
Hence the mutex fact changed from
(mutex (state LOCKED) (locked-by "other") (request NONE)
to
(mutex (state LOCKED) (locked-by "us") (request NONE)
which is of course not wanted.

 14:02:48.340853 CLIPS (executive): FIRE   25 mutex-trigger-event: f-1906,f-65681,*
D 14:02:48.340873 CLIPS (executive): Trigger: { "_id" : { "_data" : "8262459867000000252B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType" : "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 37 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167656 }, "locked-by" : "Set" }, "ns" : { "db" : "robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "locked" : false }, "removedFields" : [ "lock-time", "locked-by" ] } }
D 14:02:48.340906 CLIPS (executive): <== f-63573 (mutex (name resource-PROMISE-C-RS1-OUTPUT) (state LOCKED) (locked-by "Upsilan") (lock-time 1648728167 626000) (request NONE) (pending-requests) (response NONE) (auto-renew TRUE) (error-msg ""))
D 14:02:48.340913 CLIPS (executive): ==> f-65710 (mutex (name resource-PROMISE-C-RS1-OUTPUT) (state LOCKED) (locked-by "Set") (lock-time 1648728167 656000) (request NONE) (pending-requests) (response NONE) (auto-renew TRUE) (error-msg ""))

We encountered a case, where the local mutex fact was saying that
another robot held the lock, hence the own mutex request was rejected.
After the rejection went through we finally got the update about the
request that was sent asynchronously (prior to the requection).
Hence the mutex fact changed from
(mutex (state LOCKED) (locked-by "other") (request NONE)
to
(mutex (state LOCKED) (locked-by "us") (request NONE)
which is of course not wanted.

Co-authored-by: Daniel Swoboda <[email protected]>
@morxa
Copy link
Member

morxa commented Apr 1, 2022

Hm at least the example log that you posted shows a different issue:

  "updateDescription": {
    "updatedFields": {
      "locked": false
    },
    "removedFields": [
      "lock-time",
      "locked-by"
    ]
  }

But at the same time:

{
  "fullDocument": {
    "_id": "resource-PROMISE-C-RS1-OUTPUT",
    "locked": true,
    "lock-time": {
      "$date": 1648728167656
    },
    "locked-by": "Set"
  }
}

So it seems like fullDocument contains the document before the update (I think this can be configured when registering the trigger). May this be the actual issue?

@TarikViehmann
Copy link
Contributor Author

That would make much more sense, i already wondered why the UpdateDescription differs from the doc... So far we always process a trigger by applying the data stored in the fullDocument.

  (bind ?doc (bson-get ?obj "fullDocument"))                                                                           
  (bind ?id (sym-cat (bson-get ?doc "_id")))                                                                           
  (if ?id then                                                                                                         
    (bind ?locked FALSE)                                                                                               
    (bind ?locked-by "")                                                                                               
    (bind ?lock-time (create$ 0 0))                                                                                    
                                                                                                                       
    (bind ?locked (bson-get ?doc "locked"))                                                                            
    (if (bson-has-field ?doc "locked-by") then                                                                         
      (bind ?locked-by (bson-get ?doc "locked-by"))                                                                    
    )                                                                                                                  
    (if (bson-has-field ?doc "lock-time") then                                                                         
      (bind ?lock-time (bson-get-time ?doc "lock-time"))                                                               
    ) 

But did anything change or did we always only acted upon outdated data?

@TarikViehmann
Copy link
Contributor Author

Also, looking into the log again, i find the trigger history for this resource a bit strange:

:g/Trigger.*resource-PROMISE-C-RS1-OUTPUT
D 14:02:05.021218 CLIPS (executive): Trigger: { "_id" : { "_data" : "826245983C000000152B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "insert", "clusterTime" : { "$timestamp" : { "t" : 1648728124, "i" : 21 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728124928 }, "locked-by" : "Icks" }, "ns" : { "db" :
 "robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" } }
D 14:02:36.206661 CLIPS (executive): Trigger: { "_id" : { "_data" : "826245985C000000032B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728156, "i" : 3 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : false }, "ns" : { "db" : "robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_i
d" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "locked" : false }, "removedFields" : [ "lock-time", "locked-by" ] } }
D 14:02:47.679170 CLIPS (executive): Trigger: { "_id" : { "_data" : "8262459867000000172B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 23 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167626 }, "locked-by" : "Upsilan" }, "ns" : { "db
" : "robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "lock-time" : { "$date" : 1648728167626 }, "locked" : true, "locked-by" : "Upsilan" },
 "removedFields" : [  ] } }
D 14:02:48.340873 CLIPS (executive): Trigger: { "_id" : { "_data" : "8262459867000000252B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 37 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167656 }, "locked-by" : "Set" }, "ns" : { "db" :
"robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "locked" : false }, "removedFields" : [ "lock-time", "locked-by" ] } }
D 14:02:48.341151 CLIPS (executive): Trigger: { "_id" : { "_data" : "82624598670000002B2B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 43 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167656 }, "locked-by" : "Set" }, "ns" : { "db" :
"robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "lock-time" : { "$date" : 1648728167656 }, "locked" : true, "locked-by" : "Set" }, "remove
dFields" : [  ] } }

In particular, i wonder why after a Trigger containing the full Document with locked-by Upsilan and updateField locked-by Upsilan we get the next Trigger containing a full Document with locked-by Set and removed-Field locked-by.
Did we miss a trigger here?

@morxa
Copy link
Member

morxa commented Apr 1, 2022

Looking at the doc of mongocxx change streams, the issue might be the following:

When set to ‘updateLookup’, the change stream will include both a delta describing the changes to the document, as well as a copy of the entire document that was changed from some time after the change occurred. This will be stored in the "fullDocument" field of the notification.

(emphasis mine).

So it may be that we get a trigger, but the document that is sent with the trigger already contains data from some subsequent change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants