-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISSUE-14: LoD Reconciling + CSV Export + Release Improvements #31
Conversation
Not perfect yet. Still some tiny issues missing and some re-reconciling/correcting UI needed. Should be ready tonight? Left on this pull still a few DPM()s for my own enjoyment.
@alliomeria gosh. I made it i think? OF course the UI can get some better things and we need to add a LOT of options to avoid "overwriting" all your lovely Manually LoD. But gosh. This is so good!!
And our CSV headers are not (lowercase/normalized) And gosh. Well. But this seems to work. Last commit for tonight.
@DiegoPino woohoo! This is indeed so good! 🌟😄 Is this ok for me to pull & use/test later today? |
@alliomeria i will need you to test this when you can. This is finally working as expected. For DCMNY this gives us initially 714 headers! but then the cleanup reduces things significantly to lower numbers (e.g for gsmt:icc to 87 or so) We get more data of course now!! (because of children really providing their correct data when not collapsing). Uff. Was super hard to find my own error but the gist for my own future me: Batch driven Plugins like the SolrImport one provide the headers via their ::getInfo method but because we do not really know until the batch ends how many of the headers will really survive the cleanup we do not set them into the actual Config (config that is stored in each AMI set) which really means we have to manually pass them to the batch! Good, so good
@alliomeria this includes now the correct Header normalization, inclusive for non collapsed Children. You may see a LOT of new headers appear when getting children (sorry) but its OK, better so. Still from the 714 initial fields, most of the time the real number ends being reduced to a manageable 87-88 one. Next step will be to allow all RELS-EXT an RELS-INT to not be harvest to make the amount smaller. Please give it a try tomorrow, this is really really better now! |
…ering On Books with pages
@DiegoPino --AMI's LoD Reconciliation is awesome. All’s been going very well and running quickly with various testing and checking (for batches big & small). This is such a great tool for Archipelago users, one that is going to enable a lot of very good work to take place in repository lands. 🤓 Will share some additional notes/comments about a couple small UI things a little later on (preview: think your idea of separating out the various elements into different pages/sections for the Edit LoD is a good one, especially for larger sets), but wanted to drop off this kudos here first. This is really great & I am truly looking forward to using this in our work ahead! |
@alliomeria this should fix the double encoding issue, but just to be sure, could you try Solr import 7 with a collection that has complex weird characters in its descriptions/titles (maybe NYHS?) and also with a google sheet using, e.g Japanese + "" and & and ' somewher? WE can test tomorrow together. Thing is i removed a lot of code that i "for some reason" had added, so wonder if i will not break now something somewhere else, and thus "testing" is required!
I was resetting the array to give this a default value. happens that i needed the first key, not the first value and now that our "labels" are Capitalized and not == to the value (e.g Direct v/s direct) i was breaking the Form State. What a mess!
- Can not handle CSV rows that are either indexed arrays (default) or associative ones. - ::createAmiSet() can be passed a Name (title/label) via the \stdClass $data using $data->name. used for now on the CSV export but as easy-peasy as it gets to be added to the AMI normal Multi Step one (next pull?)
And mostly the logic generated by \Drupal\views_bulk_operations\Form\ConfigureAction::buildForm where form_state does not survive the ajax triggered dynamic fields that have internally a submit (e.g OpenStreetmaps). So will mark as experimental until i find a strange/dark hack to get this rolling. webform elements, gosh
Quite smart may i say! So much and so little code really. Test it, let me know how it goes
Hello @DiegoPino & @aksm, adding some follow up related to the batch offsets per our recent discussions. When testing using the I7 Solr Importer for iterative batches for a single large (30k+ objects) collection (👀 photosnycbeyond:collection) using both the ISSUE-14 and 0.2.0 branches, I came across these 2 things:
|
@alliomeria thanks. Re 1. That is a glitch (a.k.a Bug): Because the batch sizes are of 100, I'm failing on limiting the last batch to the already harvested + missing part instead of going full in 100s. Can fix that. FYI. I can only do that for Top objects since we will never know upfront how many child objects we may have in case you go "uncollapsed" with your workflow Thanks, will work on that! |
- This adds new methods for offsetting and paging CSV data (was a pain because JSON pretty printed has breaklines). Since i was here i also added options for getting headers (or not) for offset and paged CSV access - Adds an actual pager to the Edit Reconciling page - Nothing of this is Done-done. I need to now change the "saving" algorithm to allow partial / offset saves to happen. Probably using the already created KeyValue for the set instead of reading the actual Form data as we did with large full sets and also need the "this was fixed" checkbox to be added to a new Column More tomorrow @alliomeria !
After dealing with Direct Ingest i broke Twig templates.... @patdunlavey ... you will see this fail... should back again to its normal behavior.
It was getting weird to have such cross dependencies. Now both deal with their own concerns and there is complementation. This changes a lot Also: - Json Decoding Cells now deals with Smart Quote exceptions and tries to be as smart as possible in case of errors. But if the key is one of our core ones ap: or as: then in case of failure we reset to NULL value. This allows us to avoid breaking code in case one our controlled values via a Twig template OR a Direct ingest ends with wrong data. @todo. Implement a JSON-SCHEMA level validation for our own internal control vocabulary (WIP!)
… numbers @alliomeria when you come back. Solr exact fetch is fixed, also fixes the real time report to the number of rows requested. Tested on collapsed/uncollapsed and with offsets to check if a set of consequent harvests with offsets was correct and not missing anything. All checks out!
…the options This is basically the same as setting type = "node" but allows us in the future to also allow certain Actions to run on e.g AMI Sets @alliomeria how to test? - git pull this branch - Clear Caches - Edit your Search and replace View. Remove the "Global: Views bulk operations" field (instead of installing/reinstalling the module, this seems to be the best option a.k.a as TRICK to make actions show again) - Add it again. The list should be refreshed. Check the ones you want to use - Save the view - test!
Hi @DiegoPino! I re-pulled earlier in the week and again a few hours ago. All has been going very well with various testing & so far have checked these things without encountering any issues:
Exporting:
Hope some of these notes are useful. I will keep testing & using AMI and report any issues if encountered. AMI is for sure a beast 🐯🐉 of a module as you say @DiegoPino! |
ps: @DiegoPino --your latest commit about the EntityChangedActionDeriver did address the issue I was having with that errant leftover Action. The Action was gone after clearing caches and I did not need to remove the "Global" Views bulk operations" from the view (I could though, if that's the best/a good trick to use in this case). Thank you very much for helping me sort out this issue in my local! |
@alliomeria thanks!!! We will come back to do some UX refinements to the CSV export when I'm back. I really appreciate all the testing! |
Happens that i uncommented a line i had commented because i used to be smart. Not anymore. If you add a #name property to an AJAX driven form with select, the original value gets cached for ever and never changes even if you submit it differently. How do i know this? Just because i tested it. There is NO documentation. Wonder if that is also the issue with Webform based find and replace??
Internally our CSV to data structure is already normalized for lower case (we need to document this better). So for Solr import, also inmediatelly set the column names to lower case and in general enforce checking against lowercases when doing validation. Does not require any change really on Twig templates because for Twig "data.HoLa" is the same as "data.hola". Just saying @alliomeria
all ZIP files will go eventually (once the AMI set entity is saved or when uploaded directly via "Edit" of a Set, into private://ami/zip
Since we can not predict the final extension of a download upfront without download i check now using gob() for the first part of the possible filename. If there is already a single (only a single one) File with the same starting future name (which is consistent since we use the URL of the file to generate it) then we reuse the already downloaded one. Will add also a button that deletes all files for an AMI set in the future and one that forces the download even if there. Also this piece of code ensures ZIP files (After first upload) are moved into private://ami/zip once the AMI set is saved
And first pass works, like a charm! Ok, need to add 1.- Config entry to set "what is many files" and what not. 2.- Better reuse of as:technical metadata, so we do not reprocess that (file re use after download is already in place). Probably the best thing ever here. @aksm @alliomeria
@alliomeria can you test with the same CSV with 387 objects? And please twice? First time may be slow-ish, second time quite fast
also reprocessing of cached TECHMD, etc
@dmer @patdunlavey can you check this code please (as you know release process so as soon as you can). I wonder if you removed moderation workflows which enables by default revisions or if you have not set revisions at the bundle level?) But this will force that.
Works. Looks good. I'm good. A good person right @alliomeria and @aksm ? Revisit CSV offsets once we are on PHP 8.1 I found a PHP BUG!!
Need to check now CSV write!
I have been willing to write the first PHP remote streamer library from forever and i never have the time. So for now, for the people that decide to put ALLL on S3 we have to download the file.. no other option. So we inject the StrawberryfieldFileMetadataService to reuse the ::ensureFileAvailability function. But we can not delete the file here since we may need it for the rest of our times.
But.. VBO actions is not respecting the Facet selection when using the "Select all" option... We should make sure that is clear? Documentation? Eventually patch VBO or, alter the form and remove Select All completely?
See #14 and #4
LOD reconciliation works but not perfect yet. Still some tiny issues missing and some re-reconciling/correcting UI needed. Should be ready tonight? Left on this pull still a few DPM()s for my own enjoyment.Main issue here is when the Terms to reconciliation are hundreds and our current form spends too much time building it. Will need a workaround that only enables the autocompletes on click (opening the eternal AJAX battle of dynamic field rendering of course)
On LOD CSV is being generated and each LoD endpoint is being called.
Fix/manual attaches also reconciled data to temporary structure that can be read into the twig template natively. Need more tests
CSV export works now perfectly (really)
Webform search and replace (what a mess but it works!)
Many improvement on AMI, Separate batches for files, Solr works better, CSV and ZIP can live in S3.
Forced revisions on Search and Replace and AMI Updates with revision Notes