Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are we sure Flavors are being removed on File deletion/removal from an ADO? #307

Open
DiegoPino opened this issue Mar 1, 2024 · 3 comments
Assignees
Labels
Can't reproduce Stuff that only fails in production Danger Mr Robinson Things so core to us that need extra care. Please submit automated tests? Drupal 10 Upgrade Economy (capitalism) enhancement New feature or request Events and Subscriber Files Never old, always fresh help wanted Extra attention is needed JSON Postprocessors Drupal Plugins that do stuff with JSON data Search API F around and find out Strawberry Flavor Post Processing data extracted that goes into Solr Testing Deal with automated and human testing Working Group's 💜 Imagined, curated and loved by the Working Group
Milestone

Comments

@DiegoPino
Copy link
Member

DiegoPino commented Mar 1, 2024

What?

I just stumbled on a few left over OCRs on a server. The case was a PDF file was removed and replace with a new one, the old OCRs were still there. I can't confirm is this happened because of me (e.g I re-enqueued everything to be reindex not letting the untracking to have an effect) or bc this https://github.com/esmero/strawberryfield/blob/1.4.0/src/EventSubscriber/StrawberryEventSaveFlavorSubscriber.php is not doing its job correctly or entering some type of race condition where untracking is being overriden by updating again (that would be a bug in the Search API)

To get to the obscure mechanics of "finding what needs to be deleted" on a file removal/adding, checking if untracking for deletion is persistent and works and that nothing else (e.g the new OCR processing starts first and gets in the way) I will have to do intensive testing. One idea I have is that because a NODE reindex will also request (at the Data Source level) a Flavor reindex, there are chances that if that reindex is requested, the untracking for deletion might not have any effect.

@alliomeria this is related to an open ticket of one of our users

A backup solution to this problem would be to "save in a key/value" what we know needs to be untracked too and use as secondary mechanist found in the search api (a hook) named ::alterIndexedItems() that need to be present in a processor (we have a few we could use/re-use) that could in case an already deleted item comes into the index, its intercepted there by querying this key/value from DB, removed and then the key/value is deleted. This key value could be temporary? Only issue with temporary is that those are used dependendent, so the "deleter" will generate it, but the index will run as anonymous ....

Another issue might be a misconfiguration of the Solr fields belonging to a Flavor. We make a SOlr query to know what needs to be deleted. So maybe we should make some Solr fields "fixed" and untouchable... I saw a few examples somewhere on how to do that

mmmm

@DiegoPino DiegoPino self-assigned this Mar 1, 2024
@DiegoPino DiegoPino added this to the 1.4.0 milestone Mar 1, 2024
@DiegoPino DiegoPino added enhancement New feature or request JSON Postprocessors Drupal Plugins that do stuff with JSON data Testing Deal with automated and human testing Events and Subscriber Danger Mr Robinson Things so core to us that need extra care. Please submit automated tests? Working Group's 💜 Imagined, curated and loved by the Working Group Strawberry Flavor Post Processing data extracted that goes into Solr Search API F around and find out Drupal 10 Upgrade Economy (capitalism) Files Never old, always fresh help wanted Extra attention is needed labels Mar 1, 2024
@DiegoPino DiegoPino changed the title Are we sure Flavors are being removed on File deletion/removal of an ADO? Are we sure Flavors are being removed on File deletion/removal from an ADO? Mar 1, 2024
@DiegoPino
Copy link
Member Author

Some findings:

  • If an Object has a relationship and that was removed at the same time as a file was deleted, the Eventsubscriber will not know it needs to also update the parent ADO. This needs to be fixed. This here
    if ($parent_entity_index_needs_update && $entity->field_sbf_nodetonode instanceof EntityReferenceFieldItemListInterface) {
    $indexes = [];
    /** @var \Drupal\search_api\Plugin\search_api\datasource\ContentEntityTrackingManager $tracking_manager */
    $tracking_manager = \Drupal::getContainer()
    ->get('search_api.entity_datasource.tracking_manager');
    foreach ($entity->field_sbf_nodetonode->referencedEntities() as $key => $referencedEntity) {
    if (!isset($indexes[$referencedEntity->getType()])) {
    $indexes[$referencedEntity->getType()]
    = $tracking_manager->getIndexesForEntity($referencedEntity);
    }
    $updated_item_ids = [];
    $entity_id = $referencedEntity->id();
    $langcode = [$referencedEntity->language()->getId()];
    $combine_id = function ($langcode) use ($entity_id) {
    return $entity_id . ':' . $langcode;
    };
    $updated_item_ids = array_map($combine_id, array_values($langcode));
    if (isset($indexes[$referencedEntity->getType()])) {
    foreach ($indexes[$referencedEntity->getType()] as $index) {
    $index->trackItemsUpdated('entity:node', $updated_item_ids);
    }
    }
    }
    }
    }
    needs to run on both "previous" relations and "new relations". Only affects aggregation and/or transitive relationships (custom) that go from ADO down to OCR and not the other way around (reason why I dislike aggregating at all)

@DiegoPino
Copy link
Member Author

Another finding:

  • Since Drupal has made mandatory "Read Committed" as DB flag, deleting and accessing Key Values from a DB in a single PHP request might (might) lead to getting back on the "reading" what was already deleted. The way Drupal "commits" DB is in a Service destructor which basically means any DB transaction? (why) happens once all is done. This requires us to use a service level "cache" that can be used to access/always see what was deleted inside a single PHP request. For a good example see \Drupal\search_api\Utility\QueryHelper and how it caches results by IDs.

@DiegoPino DiegoPino added the Can't reproduce Stuff that only fails in production label Mar 5, 2024
@DiegoPino
Copy link
Member Author

So. I can't reproduce this after a lot of testing... running 1.4.0 with transaction isolation read committed, and doing exactly that repeatedly. Adding/removing OCR over and over.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Can't reproduce Stuff that only fails in production Danger Mr Robinson Things so core to us that need extra care. Please submit automated tests? Drupal 10 Upgrade Economy (capitalism) enhancement New feature or request Events and Subscriber Files Never old, always fresh help wanted Extra attention is needed JSON Postprocessors Drupal Plugins that do stuff with JSON data Search API F around and find out Strawberry Flavor Post Processing data extracted that goes into Solr Testing Deal with automated and human testing Working Group's 💜 Imagined, curated and loved by the Working Group
Projects
None yet
Development

No branches or pull requests

1 participant