Various bugfixes and improvements around buffer md5 hash calculation and `fastdirty` #3430

dmaluka · 2024-08-17T15:28:50Z

See commit messages for details.

According to the Go hash package documentation [1]: type Hash interface { // Write (via the embedded io.Writer interface) adds more data to the running hash. // It never returns an error. io.Writer [1] https://pkg.go.dev/hash#Hash

Make calcHash() respect the buffer's file endings (unix vs dos), to make its calculation of the file size consistent with how we calculate it in other cases (i.e. when opening or saving the file) and with the `fastdirty` option documentation, i.e. make calcHash() return ErrFileTooLarge if and only if the exact file size exceeds 50KB.

This behavior is then aligned to the actual documentation of `fastdirty`. Additionally set the origHash to zero in case the buffer was already modified.

When we have already enabled `fastdirty` but have not updated origHash yet, we shouldn't use Modified() since it depends on origHash which is still outdated, and thus returns wrong values. This fixes the following issue: enable `fastdirty`, modify the buffer, save the buffer and disable `fastdirty` -> micro wrongly reports the buffer as modified (whereas it has just been saved). Note that this fix, though, also causes a regression: e.g. if we run `set fastdirty false` while fastdirty is already disabled, micro may unexpectedly report a non-modified buffer as modified (in the case if isModified is true but the buffer it actually not modified, since its md5 sum matches and fastdirty is disabled), since this fix assumes that since we are disabling fastdirty, it has been enabled. This shall be fixed by PR zyedidia#3343 which makes `set` do nothing if the option value doesn't change.

Let calcHash() unconditionally hash whatever buffer it is asked to hash, and let its callers explicitly check if the buffer is too large before calling calcHash(). This makes things simpler and less error-prone (no extra source of truth about whether the file is too large, we don't need to remember to check if calcHash() fails, we can be sure calcHash() will actually update the provided hash), and actually faster (since just calculating the buffer size, i.e. adding line lengths, is faster than md5 calculation). In particular, this fixes the following bugs: 1. Since ReOpen() doesn't check calcHash() return value, if the reloaded file is too large while the old version of the file is not, calcHash() returns ErrFileTooLarge and doesn't update origHash, so so Modified() returns true since the reloaded file's md5 sum doesn't match the old origHash, so micro wrongly reports the newly reloaded file as modified. 2. Since Modified() doesn't check calcHash() return value, Modified() may return false positives or false negatives if the buffer has *just* become too large so calcHash() returns ErrFileTooLarge and doesn't update `buff`.

Similarly to how we force `fastdirty` to true when opening a large file (when creating the buffer), force it also when reopening a file, in case the file on disk became large since we opened it.

JoeKar · 2024-08-18T15:15:21Z

Shouldn't we check the actual buffer size in Modified() before calling calcHash() and directly return b.isModified in case LargeFileThreshold is exceeded?
While typing the file could grow above the limit and in case we reopen it again we apply the special handling, but not in case it's growing in case fastdirty is off all the time.

dmaluka · 2024-08-18T15:52:25Z

I was thinking about that, but it's not so non-controversial. For example, if the user accidentally pastes a large chunk of text so that the file grows above the limit, we enable fastdirty, and then the user undoes this change and the file is small again, yet fastdirty is still enabled (or should we disable it again? so, should we remember if it was enabled by us or by the user?..)

...Honestly, this md5 hashing feature always seemed dubious to me. I think it would be better if we just had better support for basic detection: return non-modified status not only if the buffer was not modified but also if all its modifications have been undone, i.e. if the undo stack is empty (or with saveundo enabled, if the undo stack size is the same as when the file was opened).

This entire md5 feature seems like a workaround for the lack of this basic functionality. And it consumes much more CPU, of course.

And I'm actually planning to implement this basic functionality, for better detection when fastdirty is enabled. (It's really annoying e.g. when I accidentally edit something in internal/action/actions.go, whose size is above 50000 so fastdirty is forced, and then I undo all my edits and yet it is still "modified", so I have to save it to the disk to make it "non-modified".) But first we should fix the problems with the undo stack bypassed in some cases (e.g in Retab and MoveLinesUp).

JoeKar · 2024-08-18T16:34:25Z

For example, if the user accidentally pastes a large chunk of text so that the file grows above the limit, we enable fastdirty, and then the user undoes this change and the file is small again, yet fastdirty is still enabled (or should we disable it again? so, should we remember if it was enabled by us or by the user?..)

I was thinking about only returning b.isModified, but not touching fastdirty within Modified(). So it would be just a fast path of returning the actual state.

dmaluka · 2024-08-18T16:39:40Z

Hmm, maybe... OTOH that would be an undocumented behavior, and not consistent with other cases (in other cases we override fastdirty to true, while in this case we override the behavior of fastdirty set to false)...

JoeKar · 2024-08-18T17:54:12Z

Ok, acceptable.
Then we try it the other way around by fixing the basic modifications incl. undo stack and maybe we can remove the hashing then.

dmaluka · 2024-08-18T18:02:50Z

and maybe we can remove the hashing then.

Or at least disable it by default (I suppose most users would be fine with "unmodified == all modifications undone" instead of "unmodified == file content not changed", and micro will be more CPU thrifty).

dmaluka mentioned this pull request Aug 17, 2024

Rework filetype change, reload command and autosave #3343

Merged

dmaluka and others added 7 commits August 18, 2024 13:35

calcHash: remove unneeded h.Write() error checks

e0f5361

According to the Go hash package documentation [1]: type Hash interface { // Write (via the embedded io.Writer interface) adds more data to the running hash. // It never returns an error. io.Writer [1] https://pkg.go.dev/hash#Hash

buffer/settings: On fastdirty off set to on in case of "large file"

352fd2b

This behavior is then aligned to the actual documentation of `fastdirty`. Additionally set the origHash to zero in case the buffer was already modified.

buffer/settings: Add comment why do we need to zero origHash

d31095f

buffer: Set fastdirty=true for large file when reopening

0b15b57

Similarly to how we force `fastdirty` to true when opening a large file (when creating the buffer), force it also when reopening a file, in case the file on disk became large since we opened it.

dmaluka force-pushed the calchash-fixes branch from 57edb8f to 0b15b57 Compare August 18, 2024 13:53

dmaluka changed the title ~~Various small bugfixes in around buffer md5 hash calculation and fastdirty~~ Various bugfixes and improvements around buffer md5 hash calculation and fastdirty Aug 18, 2024

JoeKar approved these changes Aug 18, 2024

View reviewed changes

JoeKar merged commit f88ac6d into zyedidia:master Aug 18, 2024
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Various bugfixes and improvements around buffer md5 hash calculation and `fastdirty` #3430

Various bugfixes and improvements around buffer md5 hash calculation and `fastdirty` #3430

dmaluka commented Aug 17, 2024

JoeKar commented Aug 18, 2024

dmaluka commented Aug 18, 2024

JoeKar commented Aug 18, 2024

dmaluka commented Aug 18, 2024

JoeKar commented Aug 18, 2024

dmaluka commented Aug 18, 2024

Various bugfixes and improvements around buffer md5 hash calculation and fastdirty #3430

Various bugfixes and improvements around buffer md5 hash calculation and fastdirty #3430

Conversation

dmaluka commented Aug 17, 2024

JoeKar commented Aug 18, 2024

dmaluka commented Aug 18, 2024

JoeKar commented Aug 18, 2024

dmaluka commented Aug 18, 2024

JoeKar commented Aug 18, 2024

dmaluka commented Aug 18, 2024

Various bugfixes and improvements around buffer md5 hash calculation and `fastdirty` #3430

Various bugfixes and improvements around buffer md5 hash calculation and `fastdirty` #3430