Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various bugfixes and improvements around buffer md5 hash calculation and fastdirty #3430

Merged
merged 7 commits into from
Aug 18, 2024

Conversation

dmaluka
Copy link
Collaborator

@dmaluka dmaluka commented Aug 17, 2024

See commit messages for details.

dmaluka and others added 7 commits August 18, 2024 13:35
According to the Go hash package documentation [1]:

type Hash interface {
	// Write (via the embedded io.Writer interface) adds more data to the running hash.
	// It never returns an error.
	io.Writer

[1] https://pkg.go.dev/hash#Hash
Make calcHash() respect the buffer's file endings (unix vs dos), to make
its calculation of the file size consistent with how we calculate it in
other cases (i.e. when opening or saving the file) and with the
`fastdirty` option documentation, i.e. make calcHash() return
ErrFileTooLarge if and only if the exact file size exceeds 50KB.
This behavior is then aligned to the actual documentation of `fastdirty`.
Additionally set the origHash to zero in case the buffer was already modified.
When we have already enabled `fastdirty` but have not updated origHash
yet, we shouldn't use Modified() since it depends on origHash which is
still outdated, and thus returns wrong values.

This fixes the following issue: enable `fastdirty`, modify the buffer,
save the buffer and disable `fastdirty` -> micro wrongly reports the
buffer as modified (whereas it has just been saved).

Note that this fix, though, also causes a regression: e.g. if we run
`set fastdirty false` while fastdirty is already disabled, micro may
unexpectedly report a non-modified buffer as modified (in the case if
isModified is true but the buffer it actually not modified, since its
md5 sum matches and fastdirty is disabled), since this fix assumes that
since we are disabling fastdirty, it has been enabled. This shall be
fixed by PR zyedidia#3343 which makes `set` do nothing if the option value
doesn't change.
Let calcHash() unconditionally hash whatever buffer it is asked to hash,
and let its callers explicitly check if the buffer is too large before
calling calcHash(). This makes things simpler and less error-prone
(no extra source of truth about whether the file is too large, we don't
need to remember to check if calcHash() fails, we can be sure calcHash()
will actually update the provided hash), and actually faster (since just
calculating the buffer size, i.e. adding line lengths, is faster than
md5 calculation).

In particular, this fixes the following bugs:

1. Since ReOpen() doesn't check calcHash() return value, if the reloaded
   file is too large while the old version of the file is not,
   calcHash() returns ErrFileTooLarge and doesn't update origHash, so
   so Modified() returns true since the reloaded file's md5 sum doesn't
   match the old origHash, so micro wrongly reports the newly reloaded
   file as modified.

2. Since Modified() doesn't check calcHash() return value, Modified()
   may return false positives or false negatives if the buffer has
   *just* become too large so calcHash() returns ErrFileTooLarge and
   doesn't update `buff`.
Similarly to how we force `fastdirty` to true when opening a large file
(when creating the buffer), force it also when reopening a file, in case
the file on disk became large since we opened it.
@dmaluka dmaluka changed the title Various small bugfixes in around buffer md5 hash calculation and fastdirty Various bugfixes and improvements around buffer md5 hash calculation and fastdirty Aug 18, 2024
@JoeKar
Copy link
Collaborator

JoeKar commented Aug 18, 2024

Shouldn't we check the actual buffer size in Modified() before calling calcHash() and directly return b.isModified in case LargeFileThreshold is exceeded?
While typing the file could grow above the limit and in case we reopen it again we apply the special handling, but not in case it's growing in case fastdirty is off all the time.

@dmaluka
Copy link
Collaborator Author

dmaluka commented Aug 18, 2024

I was thinking about that, but it's not so non-controversial. For example, if the user accidentally pastes a large chunk of text so that the file grows above the limit, we enable fastdirty, and then the user undoes this change and the file is small again, yet fastdirty is still enabled (or should we disable it again? so, should we remember if it was enabled by us or by the user?..)

...Honestly, this md5 hashing feature always seemed dubious to me. I think it would be better if we just had better support for basic detection: return non-modified status not only if the buffer was not modified but also if all its modifications have been undone, i.e. if the undo stack is empty (or with saveundo enabled, if the undo stack size is the same as when the file was opened).

This entire md5 feature seems like a workaround for the lack of this basic functionality. And it consumes much more CPU, of course.

And I'm actually planning to implement this basic functionality, for better detection when fastdirty is enabled. (It's really annoying e.g. when I accidentally edit something in internal/action/actions.go, whose size is above 50000 so fastdirty is forced, and then I undo all my edits and yet it is still "modified", so I have to save it to the disk to make it "non-modified".) But first we should fix the problems with the undo stack bypassed in some cases (e.g in Retab and MoveLinesUp).

@JoeKar
Copy link
Collaborator

JoeKar commented Aug 18, 2024

For example, if the user accidentally pastes a large chunk of text so that the file grows above the limit, we enable fastdirty, and then the user undoes this change and the file is small again, yet fastdirty is still enabled (or should we disable it again? so, should we remember if it was enabled by us or by the user?..)

I was thinking about only returning b.isModified, but not touching fastdirty within Modified(). So it would be just a fast path of returning the actual state.

@dmaluka
Copy link
Collaborator Author

dmaluka commented Aug 18, 2024

Hmm, maybe... OTOH that would be an undocumented behavior, and not consistent with other cases (in other cases we override fastdirty to true, while in this case we override the behavior of fastdirty set to false)...

@JoeKar
Copy link
Collaborator

JoeKar commented Aug 18, 2024

Ok, acceptable.
Then we try it the other way around by fixing the basic modifications incl. undo stack and maybe we can remove the hashing then.

@JoeKar JoeKar merged commit f88ac6d into zyedidia:master Aug 18, 2024
3 checks passed
@dmaluka
Copy link
Collaborator Author

dmaluka commented Aug 18, 2024

and maybe we can remove the hashing then.

Or at least disable it by default (I suppose most users would be fine with "unmodified == all modifications undone" instead of "unmodified == file content not changed", and micro will be more CPU thrifty).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants