Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two new fields: wf_edited_by and wf_event? #1634

Open
fjorba opened this issue Oct 9, 2024 · 24 comments
Open

Two new fields: wf_edited_by and wf_event? #1634

fjorba opened this issue Oct 9, 2024 · 24 comments

Comments

@fjorba
Copy link
Contributor

fjorba commented Oct 9, 2024

Most of Muscat main tables have workflow (wf_*) fields like:

  • wf_owner, integer, stores the user id who created the record.
  • wf_stage, integer, stores the record status (published, unpublished, deleted).
  • wf_audit, integer, stores other values like minimal, full, basic, imported, etc.

I tried to understand how paper_trail works, and how it fits into Muscat (as like any other Rails application), and I learned that paper_trail "stores the pre-change version of the model" (https://github.com/paper-trail-gem/paper_trail?tab=readme-ov-file#1c-basic-usage), but not the current one, that lives in its standard Rails table. Paper_trail uses two fields to keep who did the last change (whodunnit) and which event created this last change (event).

With this knowledge, many months ago we were able to create an export and import procedure to migrate all our record history from Invenio into Muscat (currently 4,601,595 versions for our largest database).

However, as we are testing our Muscat instances, we noticed that there is no fields to store the equivalent of the the whodunnit or event fields into the current version of the record. Those fields should be the ones that would mutate into paper_trail versions table when a new version of a record is updated (whodunnit probably should store the text identification of the user). Otherwise, we cannot know who and how the record was last modified, and audit it.

So, we are interested in two new fields, and I'd like to ask whether this need could be shared, and part of standard Muscat:

  • wf_edited_by, integer, should store the user id of the last person who edited the record. I propose this name as whodunnit is quite ugly and the meaning not obvious (at least it was not obvious to me!).
  • wf_event, text, would be the equivalence of the event field of the versions table. One clear value would be "Muscat editor" if the record was manually edited by a person, and some identification of the program o procedure if the update was done by a program (we do many updates via automated procedures). Unless I've messed up our instances (which I cannot discard), now it always says "update".

During the last weeks I have done some ugly hacks to overcome this limitation without altering Muscat database schema, but the more it goes, the more I feel that this is fragile and probably the solution could be solved in Muscat itself.

Would you accept patches and migrations to initially create the fields? Updating them (should be automatic) and displaying them would be a second phase.

@fjorba
Copy link
Contributor Author

fjorba commented Oct 14, 2024

Let me make ask it in another way: in your Muscat instance at RISM, can you know who did the last modification of a record? (Or which program, if it was an automatic correction or update.) And the list of all people who modified it (and when and what, of course)? Would you be interested in this information?

@xhero
Copy link
Contributor

xhero commented Oct 14, 2024

@jenniferward @lpugin What do you think? In theory we can use the versions to know who made a snapshot, but not all events are always snapshotted.

@ahankinson
Copy link
Contributor

wf_event is similar to the "commit message" idea we've been throwing around...

@xhero
Copy link
Contributor

xhero commented Oct 14, 2024

true, it could have a default value or some message users can add

@lpugin
Copy link
Contributor

lpugin commented Oct 14, 2024

Yes, I think this is a similar idea. It should be also in the version history, maybe simply be using it int the versions.event column?

I agree we need a default message, because otherwise people just put subjective information (e.g., "minor changes")

@fjorba
Copy link
Contributor Author

fjorba commented Oct 14, 2024

Not really, the wf_event should be automatic and, as we have been using are, for example:

  • Muscat editor
  • Modified via person authority
  • Modified via person merge
  • fix773g (or any other automated program)

Except for the last one, all they also have the person (user id) who has edited or merged the records. It is not needed to explain it, like a git commit message, it is automatic.

I have it working in my instances, but reusing unused db fields. I'd prefer to have them right.

@lpugin
Copy link
Contributor

lpugin commented Oct 14, 2024

A problem we had when storing the user as id in versions was when a user would eventually be deleted. That might be an issue with wf_edited_by too.

@fjorba
Copy link
Contributor Author

fjorba commented Oct 14, 2024

That's why I proposed that in the versions table the whodunnit field should mutate into a text version of this user. We store the email address.

However, probably a better policy would be to never delete users, just mark them as deleted.

@fjorba
Copy link
Contributor Author

fjorba commented Oct 14, 2024

OTOH, the problem of deleted users also happens now with wf_owner, that keeps the user id.

@lpugin
Copy link
Contributor

lpugin commented Oct 14, 2024

Yes, but it makes sense to require for the owner of a record to be changed before you can deleted a user.

@fjorba
Copy link
Contributor Author

fjorba commented Oct 15, 2024

This is an example of one of our records history. The screenshot is part of an alternative record history view that I will propose to you in the future, in case that interests you (both can coexist: the current Muscat one and the one that we use here). Those messages are imported:

Captura de pantalla_2024-10-15_08-17-05

And this is a one message created already in Muscat:

image

@fjorba
Copy link
Contributor Author

fjorba commented Oct 17, 2024

Hi @jenniferward, your thoughts about this issue will be much appreciated. Again, here I'm just currently asking for Muscat to add a couple of fields for each record type to store who did the last modification of the record and which tool/method/program was used. If the knowledge of this information is useful or necessary (for us it is necessary to comply with the institutional repository auditing procedures), those fields are needed, because currently there is no place to keep them.

It is not yet time to discuss the exact nature of this information, or how it will be stored or shown; that will be part of a future issue.

@jenniferward
Copy link
Contributor

I'm not sure if I understand all the technical implications, but I'll chime in with this:
It would be good if the script changes were more specific about what the script did, even if in a general way. Currently they are recorded as author: "system" event: "update" changes. A brief description of the goal of the script or naming a GitHub ticket number would help reconstruct the history.

Some changes done on the authority side are captured in sources with the names of the authority editors (example: https://muscat.rism.info/admin/sources/600147100/edit), but (for reasons I don't understand) not all of them (https://muscat.rism.info/admin/sources/600147109/edit). Here, "system" and "update" again. A more specific description would be more helpful, just so we don't have to click around and guess.

New records, even from imports, don't have a modification history. I guess this makes sense in a way, but sometimes it would be good to mark the start of a record as an "event" (record created / record imported). This imported record https://muscat.rism.info/admin/sources/600100740 is dated Thursday 22 March 2012 but this is when the record on the originating library's side created the record. The history should state when it came into Muscat.

That is all on the technical side, though. Catalogers don't need to record changes because our assumption so far has been that the record history captures this. Any decisions worth preserving should be in a 500 note (if it is relevant to everyone, like a composer attribution) or 599 (if it is really just an internal thing).

@fjorba
Copy link
Contributor Author

fjorba commented Oct 18, 2024

May I understand that there are arguments for creating them, and no one against? Are the names ok? I'd like to prepare the migration to have them available, so I rewrite my ugly hack and use them properly.

@xhero
Copy link
Contributor

xhero commented Oct 18, 2024

Sorry I just had a moment now to read through this. In principle for me it is ok to add a field to track the last user that saved a record, but I'm wondering isn't this a duplication of the versions in PaperTrail? Any time you save a record the current (pre-modification) version is saved, but with the user that is doing the modification. In other words if I save a document a snapshot is created with my name attached to it. So the last version has always the name of the last person that changed the record (even if the contents of the version are version before the changes)
For example, I just edited a record:
Screenshot 2024-10-18 alle 09 30 02
The contents of that version are the record before my changes, but it records correctly me as the last person that edited it. If someone then comes in and modifies the record again they will get pushed above me and be the last person to edit the record. So I'm just wondering if we cannot leverage the functionality that is already there.
We could make the "event" editable to I can type in something else than "updated".
Another limitation (or feature, depending where we see it) is that whodunnit (other than the ugly name) is a string field, so it could be useful to also have a reference to the user id?

@fjorba
Copy link
Contributor Author

fjorba commented Oct 18, 2024

I had been puzzled by this duality myself, and it took me a while to conclude that a new field indeed is necessary.

For example, when a program performs an automatic correction. In the record that you have put as an example, the version of the record that you edited will be pushed to the versions table, and your name as whodunnit. You need some field that the current version has been updated by a program named fixbadchars (to use #1599 as example) as user Admin (for example, or System, or creating a new user for those automated corrections).

Another case is when the bibliographic record (for example the one you modified) is updated because an authority record is updated. If @jenniferward corrects the name, or dates, of a composer, all the associated bibliographic records (and sometimes other authorities) should reflect that this version has been updated by her, through an authority record update. The Admin user and fixbadchars event will be pushed down into versions and Jennifer will be the author of the current version as wf_changed_by, via an authority update wf_event. Similarly when merging authority records. In our current system we know (as seen in the screenshots above) all the modifications of a bibliographic records even if they are indirect.

People make mistakes, or interpret things differently, or misunderstand a guideline. Knowing when, who and how each update was make is important. I have patched Muscat to report all those changes, but of course I'll have to adapt it to reuse the "correct" new wf_* fields.

@fjorba
Copy link
Contributor Author

fjorba commented Oct 18, 2024

Regarding the whodunnit paper_trail field in versions table, I think that should be a text version of the user, either user.name or user.email.

@xhero
Copy link
Contributor

xhero commented Oct 18, 2024

Sorry I'm afraid I don't follow completely. Currently, when you save a record, the version pre-save plus the user that is saving are pushed in the versions stack. This means that the last item on top of the stack always contains the name of the last user that modified the record. Isn't this what we are trying to implement with wf_changed_by?

Scripts are a bit different. When you save stuff from the backend, as default you get snapshots. Since there is no user there, it is set to NULL (and displayed as System) or Admin depending where it is run. The event field can be set or not depending on the script. For example, the SaveItemsJob, which is responsible for saving sources when @jenniferward corrects a name, will set event to "auth save" and user to NULL.
Other scripts may write longer events, such as Fix encoding: "Ün Übersetzung Ün, and others will not make version snapshots at all. This is normally the case when many thousands of records have to be saved and we don't want to create 100k+ versions. For example, I will need to resave all the 156'533 People we have in Muscat. PaperTrail will be disabled or it will create over 150k versions.
Is this one of the use cases you are thinking of (it is quite rare tbh)

@fjorba
Copy link
Contributor Author

fjorba commented Oct 18, 2024

Ok, let's take the automatic programs aside and modifications via authority records. As I understand it, the current situation, without the new wf_* fields, is:

  1. User A creates record 1234 (v1). wf_owner is A, there are no versions (in the versions table, that is).
  2. User B modifies record 1234 (v2). wf_owner is A, there is one version (v1), with user A as whodunnit. Where is stored that user B has done the last modification?

What I pretend with the new wf_* fields is:

  1. User A creates record 1234 (v1). wf_owner is A, wf_edited_by is A, there are no versions.
  2. User B modifies record 1234 (v2). wf_owner is A, wf_edited_by is B, there is one version (v1), with user A as whodunnit).

@xhero
Copy link
Contributor

xhero commented Oct 18, 2024

wait! in the example 1, when user B makes 1234 (v2), the old version is of v1 BUT whodunnit == user B. To say it differently: user B makes v2 and saves v1 in the versions, setting whodunnit as itself. You can see it on my example: before I saved the record there were no versions, but when I saved it is saying that the latest version was saved by me at 9.29.

@lpugin
Copy link
Contributor

lpugin commented Oct 18, 2024

Yes, I think that makes sense. It seems that what your proposal to have wf_event and wf_edited_by in the records implies to change where things are stored in version because I would expect that wf_event and wf_edited_by (i.e.,wf_whodunit) would be preserved in version::event and version::whodunit.

We now store the information about the modification, so we have something like:

Edit: reverse the version numbers

record (wf_)event (wf_)whodunit
source - -
version3 update title user1
version2 update author user2
version1 update subject user1

And we would then do:

record (wf_)event (wf_)whodunit
source update title user1
version3 update author user2
version2 update subject user1
version1 import admin

We would need to see how the existing data should be migrated.

@xhero
Copy link
Contributor

xhero commented Oct 18, 2024

But do we actually need this? because right now, when any user saves a record, that action is recorded. If I click "save" in a record it will create a version that says "Rodolfo Zitellini saved this record at 12h30" and then will backup the record as it is before saving. So Source.versions.last is always pointing to the last person that touched the record.

@fjorba
Copy link
Contributor Author

fjorba commented Oct 18, 2024

Oh, oh... Sorry for misunderstanding paper_trail. As I read the documentation to create our migrating procedure and using our previous setup as starting point, my glasses didn't allow me to understand how paper_trail works. Maybe it's me who has to review our versions table.

I'll thing harder about it, and also about wf_event.

@xhero
Copy link
Contributor

xhero commented Oct 18, 2024

Well I go agree with both you and @lpugin that it is super confusing as it is :) That's why I wanted to be sure we are all on the same page!
I do think we can make this better, but I need a moment to think how it will affect our data and make sure we don't end up in an even more confusing situation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants