Use builds.dotnet.microsoft.com not download.visualstudio.microsoft.com #9675

Open
richlander opened this issue Dec 31, 2024 · 7 comments

Comments

@richlander
Member

releases.json and release.json files currently reference the VS CDN. That's a bad practice since it requires that both builds.dotnet.microsoft.com and download.visualstudio.microsoft.com are up. If either is down, then users cannot download files via this flow. That naturally means that the JSON files should reference files on the builds.dotnet.microsoft.com CDN. Also, the VS CDN doesn't have stable URLs, making it impossible to have a stable link to the JSON files should we want to locate them there instead.

Example JSON file:

"files": [
{
"name": "dotnet-runtime-linux-arm.tar.gz",
"rid": "linux-arm",
"url": "https://download.visualstudio.microsoft.com/download/pr/b4d8f2f3-a0fd-4d48-b584-cae2c3af5c06/97479f98b5746e515d7d99f72b67c852/dotnet-runtime-8.0.11-linux-arm.tar.gz",
"hash": "279b93bf6b5c5c2f45427b620c56bff0e22ec8f3fb9a4f3749e7a6a0d0d0ee8163851b5bd081c6814b758068df7ba1b9401c844ba5905b27a830020846ef6406"
},
{
"name": "dotnet-runtime-linux-arm64.tar.gz",
"rid": "linux-arm64",
"url": "https://download.visualstudio.microsoft.com/download/pr/501c5677-1a80-4232-9223-2c1ad336a304/867b5afc628837835a409cf4f465211d/dotnet-runtime-8.0.11-linux-arm64.tar.gz",

We are currently validating that the Azure Front Door and Akamai CDNs that we'll be using provide good latency/service to various large countries.

This change also means that we will no longer be publishing new URLs for the VS CDN. It will be used as a private implementation detail for .NET delivery via Visual Studio. All users will need to move to using the dotnet CDN.

FYI: We're going to call builds.dotnet.microsoft.com the "dotnet CDN". Feel free to use that terminology. We intend for it to have a very long future.

@agocke
Member

agocke commented Jan 5, 2025

It's actually worse than that: the current behavior is inconsistent and, as far as I can see, cannot be made consistent using the current layout.

What I mean by consistent is the data in one of the JSON files shouldn't be contradicted by the links and data in referenced files. So, for instance, a URL in the original JSON file should never 404. There may be network issues that prevent delivery, but a 404 would indicate that the content is inconsistent.

As far as I can see, we may have some limited ability to ensure consistency in one CDN. For instance we could stage our releases-index.json such that the referenced URLs are uploaded before we publish the index.

But across CDNs I think this is just impossible. So, an example where this is broken is the CDN-delivered releases-index.json file. The file at https://builds.dotnet.microsoft.com/dotnet/release-metadata/releases-index.json looks like

{
    "$schema": "https://json.schemastore.org/dotnet-releases-index.json",
    "releases-index": [
        {
            "channel-version": "9.0",
            "latest-release": "9.0.0",
            "latest-release-date": "2024-12-03",
            "security": true,
            "latest-runtime": "9.0.0",
            "latest-sdk": "9.0.101",
            "product": ".NET",
            "support-phase": "active",
            "eol-date": "2026-05-12",
            "release-type": "sts",
            "releases.json": "https://dotnetcli.blob.core.windows.net/dotnet/release-metadata/9.0/releases.json",
            "supported-os.json": "https://dotnetcli.blob.core.windows.net/dotnet/release-metadata/9.0/supported-os.json"
        }
...

Note that this file was delivered from https://builds.dotnet.microsoft.com, but the referenced releases.json is at https://dotnetcli.blob.core.windows.net. Even if we published things in the right order, we can't control CDN delivery or cache expiry. There's no way to guarantee that the CDN info is consistent with the referenced JSON file. And note that there's lots of stuff that can (and does) go out of date! The latest SDK could very easily be older on the CDN than in the releases.json file.

This isn't a theoretical matter -- in practice I've seen the CDN take 24 hours to replicate after we release a new build. So in multiple cases with dnvm I've had inconsistent data from the two endpoints that has caused crashes and data corruption.

I've currently just abandoned using any CDN at all. It's fine if the CDN is out of date, but inconsistencies are hard to impossible to guard against. It makes all the data suspect and it's not worth the trouble to try to sort through it.

I think the right answer here is that we should always use the same source domain URLs in each JSON file. In fact, we might want to consider using relative paths instead of absolute paths. The relative path needs to implicitly resolve against the same domain used originally.
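A minimal sketch of how a relative path would resolve against whichever domain served the original file, using Python's standard `urllib.parse.urljoin`. The relative value `9.0/releases.json` is hypothetical; the published files use absolute URLs today.

```python
from urllib.parse import urljoin


def resolve_reference(document_url: str, reference: str) -> str:
    """Resolve a reference found inside a JSON file against the URL
    that the JSON file itself was fetched from."""
    return urljoin(document_url, reference)


# URL the index was actually downloaded from (taken from the thread).
index_url = "https://builds.dotnet.microsoft.com/dotnet/release-metadata/releases-index.json"

# Hypothetical relative value for the "releases.json" field.
resolved = resolve_reference(index_url, "9.0/releases.json")
print(resolved)

# An absolute reference is returned unchanged, so it can still point
# at a different host than the one that served the index.
absolute = resolve_reference(
    index_url,
    "https://dotnetcli.blob.core.windows.net/dotnet/release-metadata/9.0/releases.json",
)
print(absolute)
```

With relative references, a client that fetched the index from a mirror or from the GitHub repo would follow links on that same host, which is the property the proposal is after.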

@richlander
Member Author

richlander commented Jan 5, 2025

Can you elaborate on how relative paths help?

This file will be hosted in two places:

  • builds.dotnet.microsoft.com
  • dotnet/core GitHub repo

All links (once we make the change) will reference builds.dotnet.microsoft.com.

I thought the behavior of a CDN was to request from origin any file that it doesn't have, and that a missing file added to origin 5 mins ago is treated the same as an aged-out file that hasn't been requested in a year.

I read this post recently: https://scotthelme.co.uk/lets-encrypt-to-end-ocsp-support-in-2025/. The CDN / origin relationship seems to be the expected one.

It would be great to see logs that demonstrate the corruption / bad behavior you've seen with dnvm.

I am wondering if something else is at play. Perhaps we can look at the pattern that dnvm is using? There are lots of other users, from whom we have not received failure reports. Our own CI relies on a peer CDN every day w/new content. My understanding is that we are not seeing CI failures due to bad CDN behavior.

We are planning on telling people that using the blob storage URLs is unsupported. We'd like to eventually restrict their usage to the IP ranges used by our CDNs. I'm not sure doing that is actually practical, but that's the north star. Enabling people to use origin is bad. We've clearly had multiple bad patterns at play.

@agocke
Member

agocke commented Jan 5, 2025

Sorry, no logs from dnvm, I didn't save anything. The specific behavior I saw was this:

  1. dnvm hits the CDN to grab the above releases-index.json file. It looks at the "latest-release" number and saves it.
  2. Then it tries to find the actual "releases.json" for the appropriate major release. It pulls the file from the URL in the "releases.json" field above.
  3. Once it has the releases.json file, it scans the "latest-release" field in this file as well. It asserts that this value and the previous value are the same.
  4. They are not! Command blows up.
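The steps above can be sketched roughly as follows. This is not dnvm's actual code; it is an illustration that works on already-parsed JSON, with hypothetical function names and a hypothetical newer version number (`9.0.1`) standing in for a release that reached blob storage before the CDN.

```python
class InconsistentMetadataError(Exception):
    """Raised when the index and releases.json disagree."""


def find_channel(index: dict, channel_version: str) -> dict:
    """Return the releases-index entry for a channel such as "9.0"."""
    for entry in index["releases-index"]:
        if entry["channel-version"] == channel_version:
            return entry
    raise KeyError(channel_version)


def check_latest_release(index: dict, channel_version: str, releases: dict) -> str:
    """Steps 1-3: compare "latest-release" in the index entry with the
    same field in the referenced releases.json."""
    expected = find_channel(index, channel_version)["latest-release"]
    actual = releases["latest-release"]
    if expected != actual:
        # Step 4: the two documents contradict each other.
        raise InconsistentMetadataError(
            f"index says {expected}, releases.json says {actual}"
        )
    return expected


# Index as served by the CDN (values from the thread).
index = {"releases-index": [{"channel-version": "9.0", "latest-release": "9.0.0"}]}

# releases.json from the other host; "9.0.1" is a hypothetical newer release.
stale_match = {"channel-version": "9.0", "latest-release": "9.0.0"}
newer_on_origin = {"channel-version": "9.0", "latest-release": "9.0.1"}

print(check_latest_release(index, "9.0", stale_match))
try:
    check_latest_release(index, "9.0", newer_on_origin)
except InconsistentMetadataError as err:
    print("inconsistent:", err)
```

A client could downgrade the mismatch to a warning instead of raising, but it still cannot tell which of the two documents is the stale one.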

What I believe was happening is that this was just after we pushed the release (day of). The new versions had not yet propagated to the CDN, but they did appear in the "dotnetcli.blob.core.windows.net" location. While this could theoretically even happen on the same host, in between updating the two resources, what I saw with the CDN was a lot of delay. This split situation was occurring for multiple hours, at least. The queries in dnvm are immediate, so otherwise the race would be on the order of hundreds of milliseconds.

This gets much worse if we delete a release from the files (which we have done!). Now not only is the latest release not the same, but the one on the CDN is newer than the one on the dotnetcli blob store. Moreover, the requested release won't appear in the list of available SDKs.

My idea is that the state should always be coherent. If one file says that the latest release is 9.0.100, the other one should agree unless a race is hit where 9.0.101 was published in between the queries. I think that's doable with a single domain/server. But I don't see how it's possible to ensure coherency if the files are delivered from multiple servers. The CDN (by design) may lag the original server by an indeterminate period.

Hence my proposal that we only list relative paths. In that case, I could ignore the linked domain and instead use the original domain as the resolved host.
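Until the files carry relative paths, a client could approximate the same effect by discarding the host of an absolute link and reusing the host that served the original file. The sketch below assumes both hosts expose the same path layout, which the thread does not guarantee; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def rehost(reference_url: str, source_url: str) -> str:
    """Keep the path, query, and fragment of reference_url, but take the
    scheme and host from source_url (the URL the linking file came from)."""
    src = urlsplit(source_url)
    ref = urlsplit(reference_url)
    return urlunsplit((src.scheme, src.netloc, ref.path, ref.query, ref.fragment))


source = "https://builds.dotnet.microsoft.com/dotnet/release-metadata/releases-index.json"
linked = "https://dotnetcli.blob.core.windows.net/dotnet/release-metadata/9.0/releases.json"

rehosted = rehost(linked, source)
print(rehosted)
```

This only works when the path portion is valid on the original host, so it is a workaround rather than a substitute for publishing single-domain or relative links.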

Alternatively, we could stick with the plan of "we only ever use a single CDN's domain". That seems OK, but it does mean that if the CDN goes down, availability is down too. If we have confidence in the CDN uptime I'm OK with that too.

@richlander
Member Author

This could also be caused by human error. We are going to write some tools that validate that the files are always coherent.

@agocke
Member

agocke commented Jan 7, 2025

Could you elaborate? How would you handle this situation:

But I don't see how it's possible to ensure coherency if the files are delivered from multiple domains. The CDN (by design) may lag the original server by an indeterminate period.

@richlander
Member Author

richlander commented Jan 7, 2025

We're planning on using a single domain. The intent is for the JSON files to be served from domain A and reference files exclusively from domain A. This is for going forward. We're not going to make changes to existing JSON content.

We will be relying on multiple CDNs as an implementation detail, however, for that domain.
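One way a validation tool could check the single-domain rule is to walk a parsed JSON file and report every URL whose host differs from the host that serves the file. This is a sketch under assumptions, not the planned tooling: the function names are hypothetical, and skipping keys that start with `$` (so that `$schema` is not flagged) is a choice made for this example.

```python
from urllib.parse import urlsplit


def iter_urls(node, key=None):
    """Yield (key, url) for every http(s) string in a parsed JSON document."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k.startswith("$"):  # assumption: ignore metadata such as "$schema"
                continue
            yield from iter_urls(v, k)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item, key)
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield key, node


def foreign_urls(document, allowed_host):
    """Return the (key, url) pairs that point at a host other than allowed_host."""
    return [(k, u) for k, u in iter_urls(document) if urlsplit(u).hostname != allowed_host]


# Trimmed copy of the index quoted earlier in the thread.
index = {
    "$schema": "https://json.schemastore.org/dotnet-releases-index.json",
    "releases-index": [
        {
            "channel-version": "9.0",
            "releases.json": "https://dotnetcli.blob.core.windows.net/dotnet/release-metadata/9.0/releases.json",
            "supported-os.json": "https://dotnetcli.blob.core.windows.net/dotnet/release-metadata/9.0/supported-os.json",
        }
    ],
}

violations = foreign_urls(index, "builds.dotnet.microsoft.com")
for key, url in violations:
    print(f"{key}: {url}")
```

Run against the index quoted earlier, both the `releases.json` and `supported-os.json` links are reported, since they point at blob storage rather than builds.dotnet.microsoft.com.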

@richlander
Member Author

Closed incorrectly. This is still coming.

@richlander richlander changed the title Use builds.dotnet.microsoft.com CDN in releases.json files Use builds.dotnet.microsoft.com not download.visualstudio.microsoft.com Jan 10, 2025
3 participants
@agocke @richlander and others