Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Renamed packages should redirect to the new name #268

Open
savetheclocktower opened this issue Jul 30, 2024 · 3 comments
Open

[BUG] Renamed packages should redirect to the new name #268

savetheclocktower opened this issue Jul 30, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@savetheclocktower
Copy link
Sponsor

(Couldn't decide between “feature” and “bug,” so I flipped a coin.)

What is the Bug
We support the renaming of a package, but this only seems to point the GitHub repo to a new name. It effectively “orphans” the old name and its subscribers.

Instead, the old name should function as an “alias” for the new name. Consider a package renamed from old-name to new-name; when Pulsar checks what the latest version of old-name is, it should get the same result as if it had asked what the newest version of new-name is. The package “card” in Pulsar can show the new name, but otherwise nothing else would have to change about the repo on the user's machine; it could still exist at ~/.pulsar/packages/old-name without any problem.

I'm pretty sure this would be seamless. An edge case would be if the user explicitly searched for new-name; ideally they'd get the new-name result and Pulsar would know that it was already installed.

How to Replicate the Bug
Check out these search results. They reflect both the original name of a package and what I renamed it to.

@savetheclocktower savetheclocktower added the bug Something isn't working label Jul 30, 2024
@savetheclocktower
Copy link
Sponsor Author

I didn't even know you could rename packages until I started contributing to Pulsar, so I had no idea whether the feature had been properly implemented back in the Atom days. Since NPM doesn't have the ability to rename packages, I was skeptical that Pulsar and PPM had managed to handle this seamlessly.

So I did some digging. Here's what I found. (Keep in mind that most of this is hypothetical, and can’t actually be tested until we fix how the /packages/:packageName endpoint works.)

  • In all these scenarios, imagine that a package has been renamed from old-name to new-name and had a new version 0.0.2 published under new-name. The last version published under old-name was 0.0.1. The user has version 0.0.1 of the package installed under its previous name.

  • ppm’s ability to migrate a renamed package to a new directory hinges on this flow:

    • User runs ppm install old-name
    • /api/packages/old-name returns JSON with a name field of new-name
    • PPM installs the package into $ATOM_HOME/packages/new-name (not $ATOM_HOME/packages/old-name)
    • PPM sees that the argument was old-name, yet the package says it’s called new-name; it thus knows it should attempt to clean up the old directory.
    • So it deletes $ATOM_HOME/packages/old-name.
  • So here’s where it’s tricky, and where I think ppm does the right thing:

    • To detect which packages are outdated, ppm outdated gets a list of installed packages and maps each one to an HTTP request against the API.
    • old-name will be in that list and will map to a request to /api/packages/old-name — the exact same endpoint that it’ll request in an upgrade scenario.
    • The only thing that’s actually preserved from the API request is the latest version metadata. Everything else re-uses the metadata we already have about the package on disk. So we take version out of the response of /api/packages/old-name and nothing else. At this point, ppm hasn’t bothered to check whether the package has a new name yet. It will report that old-name has an upgrade. This is crucial; if it reported the package's new name at this point, some of this stuff wouldn't work!
  • At this point, everything should work right:

    • If the user runs ppm upgrade and accepts the confirmation, ppm will run the equivalent of ppm install [email protected], which does the right thing for the reasons explained above.
    • If the user runs ppm upgrade old-name, ppm will upgrade all packages that match old-name, and the effect will be the same.
    • If the user somehow runs ppm upgrade new-name, it won’t work, since there isn’t a local package called new-name yet.
  • There’s one possible wrinkle: if the user runs ppm install new-name but doesn’t realize that they’ve already got that package installed under a different name, then there’s no way for ppm to know that it should clean anything up. I don’t think this is a big deal, and it can be addressed with explicit checking by the package in question. (On startup, if both old-name and new-name are detected among the local packages, new-name can pop up a notification asking the user to uninstall old-name.)

    • Eventually, though, we might want to include a new piece of metadata at /packages/:packageName that lists all of a package’s previous names. Since those names can’t be used for new packages, it should always be safe to test for the existence of directories by those names locally, and to delete those directories when we find them.
  • The GUI in settings-view should operate identically, since it delegates to ppm outdated --json to know which packages to upgrade.

This was a rollercoaster — I spent at least half an hour writing this where I was convinced that this flow wouldn’t work, but then I read the code more closely. I think the takeaway is this:

  • Internally, when we ask for a package by name, we should make sure that we translate that to a package pointer, then look up the packages by that pointer so that we get the metadata for the package’s most recent name. Currently, I think we include the original name in the database query, which means we’re not going to pick up on any versions of the package published under its new name.
  • Once we do that, /pakages/:packageName will (I think) automatically return the latest metadata for a package, no matter which of its names we use to request it. If we’re being technical, we should issue a 301 redirect whenever a package is looked up under an old name; this is right to do on the web site for SEO and for other reasons, but I think we can also get away with doing it for the API! The crucial thing is that requesting /packages/old-name will seamlessly return JSON with a name field of new-name.

Honestly, though, the first item is the crucial one; once that logic fix is deployed, we can run real-world experiments about this stuff and act accordingly.

@confused-Techie
Copy link
Member

Fantastic write up here.
So from reading that and seeing how empty our current logic is during a rename, it seems to me the best thing to do and the easiest fix is we add an additional field to the names table of the database. This could simply be migrated_name which contains the new name of a given package. From there this function returns a call to itself for that package's name.

Since we return another call this could follow infinite changes no matter how many times a package's name is changed, and ensures it will always return the most up to date package name, which then can be returned via the API.

While the method outlined above isn't the best for SEO, since like you said that should be a 301 redirect, this could get things working rather quickly, and sometime later we look into a proper redirect. But a redirect could likely be accomplished with an check of the name being returned having matched the name request by the user and redirecting. Which actually sounds incredibly easy, but does require both implementations.

I'll try to take a shot at this one later today if I find the time

@savetheclocktower
Copy link
Sponsor Author

savetheclocktower commented Aug 4, 2024

I opened #269 yesterday to improve the testing situation around package renaming and make it better resemble real-world renaming scenarios.

Here's a rough checklist of how to proceed:

  • Introduce a schema change to make it possible to get the canonical name of a package from either an older package name or a pointer. My thought was to introduce a new canonical boolean column into names; @confused-Techie’s idea to add a migrated_name column seems just as good.
  • Rewrite getPackageByName to first convert a package's name to its canonical name, then look up the package by name and pointer from the packages table. In my experiments, the INNER JOIN on names needed to be removed so that we didn't return extra results for versions, but I'm sure there are other ways to pull it off.
  • Ensure searches don't return any results from non-canonical package names, nor accidentally duplicate results if a package has more than one entry in the names table. (This isn't a huge deal, since clicking on a result would load the package's new name anyway;)
    • For instance: if I rename test-rename to test-renamed, then a search for test-rename should return test-renamed (since the search term is a substring of the new package name), but should not return a result for test-rename (since it's an old package name).
    • Extra credit: if I rename old-name to new-name, then a search for old-name probably should return new-name. The result would still be listed under new-name.
  • In package-frontend, ensure we redirect if the :packageName parameter in the route doesn't match up with the name of the package that we got from the API.
  • Extra credit: issue a redirect in the API when /api/packages/:packageName is hit with the old name of a package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants