Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

installPkg: do no work when using cached package and annotate safely #726

Merged
merged 11 commits into from
Jan 3, 2024

Conversation

aronatkins
Copy link
Contributor

@aronatkins aronatkins commented Jan 2, 2024

First, the installPkg() function no longer does additional work when the package was discovered in the cache. This avoids the most common situation where multiple processes performing package installation can race to the annotatePkgDesc() and moveInstalledPackageToCache() calls.

Second, annotatePkgDesc() writes its update to a temporary file before renaming to the target DESCRIPTION. This avoids a (now more rare) situation where a second process might see an incompletely written DESCRIPTION file.

As a side-effect, only the originally installing actor will annotate the DESCRIPTION. Previously, an update occurred even when the package was used from the cache, which overwrote an existing annotation.

Update: After thinking about this problem a bit more, realized that the root cause to #720 is because annotatePkgDesc() was previously writing into the package cache. After calling restoreWithCopyFromCache(), the library contains a symlink into the cache. Calling annotatePkgDesc() reads through the symlink and modifies the in-cache DESCRIPTION. This is why we saw multiple actors reading/writing the same file.

Fixes #720

R/restore.R Outdated
# Write it out using a temporary file so DESCRIPTION is never partial.
tmpf <- tempfile(tmpdir = dirname(descFile))
write_dcf(content, tmpf)
file.rename(tmpf, descFile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will emit a warning and return false on failure -- do we need to handle that scenario?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I had dismissed the need to handle the error because the directory is known to be writable and we are not performing a cross-file-system move, but it's probably worth checking the return value.

Copy link
Contributor

@toph-allen toph-allen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No major comments from a code read. Questions that came up were answered elsewhere in the code.

One conceptual question: Do we know why the description was updated when installing from the cache? Are there any times where un-annotated descriptions can enter the cache (e.g. a package was installed via install.packages without Packrat) where we'd want to add the annotation? And if so, would the atomic updates (via renaming a temp file) be sufficient to fix the problem?

@kevinushey
Copy link
Contributor

Are there any times where un-annotated descriptions can enter the cache (e.g. a package was installed via install.packages without Packrat) where we'd want to add the annotation?

A long time ago, Packrat tried to distinguish between packages which were installed by Packrat itself, versus packages that were installed by the user. The intention here was to have a way of differentiating between packages that were "managed" or "known" to Packrat, and use that to customize the UI presented to the user in certain scenarios (e.g. helping them choose whether to snapshot or restore).

In practice, users found the behavior hard to understand, so that was phased out over time. I'm not sure if we still make use of the InstallAgent annotations anywhere in the Packrat sources, but if we do, it'd be nice to drop those once and for all.

R/restore.R Outdated Show resolved Hide resolved
@aronatkins
Copy link
Contributor Author

I'm not sure if we still make use of the InstallAgent annotations anywhere in the Packrat sources

We do still use InstallAgent. See calls to installedByPackrat().

@aronatkins
Copy link
Contributor Author

Are there any times where un-annotated descriptions can enter the cache (e.g. a package was installed via install.packages without Packrat) where we'd want to add the annotation? And if so, would the atomic updates (via renaming a temp file) be sufficient to fix the problem?

Users do not enroll packages in the cache; only Packrat performs cache additions and cache-uses while installing a package. I suppose it's possible that a user could add something to the Packrat library, but that is separate from cache management.

@aronatkins aronatkins merged commit 7062580 into main Jan 3, 2024
9 checks passed
@aronatkins aronatkins deleted the aron-annotate branch January 3, 2024 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

packrat restore race condition
3 participants