Efficient ways to sync with the Podcast Index database #33

Open
ryan-lp opened this issue Dec 27, 2023 · 2 comments

ryan-lp commented Dec 27, 2023

In Podcastindex-org/podcast-namespace#558 I wrote:

> As for what I'm currently doing (downloading the whole database every week, computing a diff, then integrating that), I feel this could be streamlined. [...]

@daveajones replied:

> There are a bunch of ways to stay up to date actually. I’m glad to share them.
> [...] This should probably be in the “database” repo instead of in the namespace repo.

I'm moving that discussion here, and I'd be interested to hear about the ways you mentioned. I'm particularly interested in approaches that don't hit the API server, in part because of Podcastindex-org/legal#1, which prohibits building databases out of content returned from the API. Ideally we want an efficient and permissible way to create mirror databases, not only to improve locality but also to avoid a single point of failure.

ryan-lp commented Dec 28, 2023

To keep a mirror up to date, it would help to have a diff listing insertions, deletions, and updates. I don't mean updates to the feed contents, but updates to the feed identity (feed URL, iTunes ID, ...). This kind of mirroring parallels the way Linux distribution mirrors use rsync to transfer only what has changed. In practice, a Podcast Index mirror DB might be an exact replica, or it might be a custom DB with extra columns; as long as it has the same primary key, the diff approach will still work. Because there are straightforward instructions for creating a Linux distribution mirror, there are many such mirrors and no single point of failure. My Arch Linux mirrorlist file has 500 alternative mirror sites in it.
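As a rough illustration of the diff idea, here is a minimal Python sketch that compares two snapshots keyed by the same primary key and reports insertions, deletions, and identity updates. The `FeedIdentity` type, its field names, and the example IDs and URLs are all made up for illustration; they are not the actual Podcast Index schema.

```python
from typing import Dict, NamedTuple, Optional


class FeedIdentity(NamedTuple):
    """Identity columns only. Field names are hypothetical."""
    url: str
    itunes_id: Optional[int]


def diff_snapshots(old: Dict[int, FeedIdentity], new: Dict[int, FeedIdentity]):
    """Compare two {primary_key: FeedIdentity} snapshots."""
    old_ids, new_ids = old.keys(), new.keys()
    # Keys only present in the new snapshot are insertions.
    inserted = {k: new[k] for k in new_ids - old_ids}
    # Keys only present in the old snapshot are deletions.
    deleted = sorted(old_ids - new_ids)
    # Keys present in both whose identity changed are updates.
    updated = {k: new[k] for k in old_ids & new_ids if old[k] != new[k]}
    return inserted, deleted, updated


# Made-up example data.
old = {
    1: FeedIdentity("https://a.example/rss", 100),
    2: FeedIdentity("https://b.example/rss", None),
    3: FeedIdentity("https://c.example/rss", 300),
}
new = {
    1: FeedIdentity("https://a.example/rss", 100),
    2: FeedIdentity("https://b.example/feed.xml", 200),
    4: FeedIdentity("https://d.example/rss", None),
}

inserted, deleted, updated = diff_snapshots(old, new)
```

Because only the primary key and the identity columns are compared, a mirror with extra columns of its own could apply the same three sets without touching its custom data.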

In theory, the podping network could also be used to broadcast at least the insertions. The Podcast Index might then publish guidelines on how mirrors can detect deletions and updates (i.e. to the identity) on their own. This approach might, however, require adding the iTunes ID to the podping message. Ideally the ID would be in the feed content anyway, but that is unlikely to be a realistic option in the near to medium term.

ryan-lp commented Dec 28, 2023

> Although this approach might need to involve adding the iTunes ID to the podping message.

I suppose an alternative would be to leave the podping message format as it is, broadcasting only the feed URLs, and then rely on the iTunes API to look up the iTunes ID whenever a new podcast appears. There's no official API to look up an iTunes ID by feed URL, but you can search by title, get a set of results, and then iterate over those results to find the one whose feed URL matches.

GET https://itunes.apple.com/search?term=PODCAST_TITLE&attribute=titleTerm&entity=podcast

There is a limit of 20 API calls per minute, so this assumes new podcasts are created at a rate no greater than 20 per minute.
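The lookup described above could be sketched in Python roughly as follows. This assumes the search response is JSON with a `results` list whose entries carry `feedUrl` and `collectionId` fields; the function names and the fixed sleep used for throttling are my own choices for illustration, not part of any official client.

```python
import json
import time
import urllib.parse
import urllib.request

SEARCH_URL = "https://itunes.apple.com/search"
# Spacing between calls derived from the 20-calls-per-minute limit above.
MIN_INTERVAL = 60 / 20


def build_search_url(title):
    """Build the title search URL shown above."""
    query = urllib.parse.urlencode(
        {"term": title, "attribute": "titleTerm", "entity": "podcast"}
    )
    return f"{SEARCH_URL}?{query}"


def match_itunes_id(results, feed_url):
    """Return the collectionId of the first result whose feedUrl matches."""
    for item in results:
        if item.get("feedUrl") == feed_url:
            return item.get("collectionId")
    return None


def lookup_itunes_id(title, feed_url):
    """Query the search endpoint, then pause to stay under the rate limit."""
    with urllib.request.urlopen(build_search_url(title), timeout=10) as resp:
        payload = json.load(resp)
    time.sleep(MIN_INTERVAL)
    return match_itunes_id(payload.get("results", []), feed_url)
```

Note that the match here is an exact string comparison, so a feed listed under a slightly different URL (for example `http` versus `https`, or a trailing slash) would be missed unless the URLs are normalized first.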
