x/net/publicsuffix: automate periodic regeneration #15518
Comments
@nigeltao, we should talk about this today.
We talked about making some online serving mode where your program could fetch the latest data at runtime and periodically in the background, relieving us of the need to regularly rebuild this and litter the git repo's history with huge binary blob updates every week. If somebody wants to write that mode, that'd be great.
FYI, to solve this problem I made available an alternative implementation that can parse a list dynamically: https://github.com/weppos/publicsuffix-go
There is an open ticket in the Let's Encrypt repo about using a different way to pull the PSL changes.
To @Webpos, I think you mean https://github.com/weppos/publicsuffix-go
Indeed, wrong buffer. Thanks for pointing it out, I've updated the link.
This came up again in a private conversation with a colleague. As I understand the original request, "automate periodic regeneration", the idea is to subscribe to changes to the canonical https://publicsuffix.org/ data. When it changes, generate and commit the
To echo an earlier comment by @bradfitz, another cause for pushback is that every update causes non-trivial growth in the x/net git repository. Even if an individual upstream commit's diff is relatively small, the downstream
The disk usage difference, 10995803 - 10841140, is 154663 bytes, or 1.4% of 10841140. One idea (not necessarily a good one, but I don't want to forget it) to cap git repo size concerns is to keep
Back to solving the original problem,
Again, as @bradfitz suggested above, we could add an API where the caller brings their own copy of the PSL-as-a-text-file for the library to compile. Compilation used to take minutes, but now takes seconds, so that might be more feasible than it was a few years ago. We might have to cut the overall feature into multiple packages, so that people who just want the pre-compiled version don't have to import all the dependencies for dynamic generation, but if we want to do dynamic generation, that ought to be a solvable problem. To be clear, I don't have much spare time to work on this myself, but I'm writing this all down before I forget the conversation.
They currently use my Go library. I set up a process that does exactly what you described. It checks for changes, and automatically creates a PR for review. In my case, this is trivial because I don't generate a blob. I clean up and pre-process the list and pre-compile a Go file for optimal load, but I don't go any further. As a result, diffs are small:
Furthermore, weppos/publicsuffix-go#128 was designed to allow loading an external list, with the exact intent of letting the consumer bypass the shipped version of the list. There are a couple of cases where this is a good idea: if the packaged list is outdated, or if you want to use a custom list (after all, the PSL is a format as well, so you can supply your own list).
@weppos my colleague had some concerns about https://github.com/weppos/publicsuffix-go performance. Every List.Find call does a linear iteration over the entire list of rules, right? (Over 8000, by default?) Have any of your other users raised performance concerns? Is there any intention (an issue?) to build a fancier data structure (i.e. a tree)?
@nigeltao the concern is absolutely valid, and the only reason I haven't flagged the lib as 1.x (it's still 0.x) is exactly because I want to change the lookup strategy. My plan was to switch to the same hash-based lookup I implemented in the Ruby library (weppos/publicsuffix-ruby#133), which proved to be extremely efficient. I also ran tests and benchmarks with a trie-based lookup (weppos/publicsuffix-ruby#134) but there was little if any gain. I also did some research into using a DAWG/DAFSA, but it turns out that while it is extremely memory efficient (I tried to balance lookup performance with memory size), it works well only in cases where the list is not intended to be dynamically modified. In fact, it works well for the Chromium use case (the folks at Chrome implemented the lookup with a DAFSA) but not for a more general-purpose library. Peter from Amazon also reported another proposal that seems to provide further efficiency over the current version, which I still need to review: weppos/publicsuffix-ruby#143. All these versions provide good performance while still adopting a representation that is diff-efficient. Long story short, I definitely want to use a more efficient data structure. To be fair, I have not put much effort into it lately as nobody really complained about performance. Still, it's on the roadmap. I should probably create a ticket on the repo and dump this info there (so that it can also inspire contributions).
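To make the hash-based idea concrete, here is a minimal sketch, not the code from either library: rules are kept in maps keyed by suffix, and a lookup probes each trailing portion of the domain, so the cost depends on the number of labels in the name rather than the number of rules. `RuleSet`, `NewRuleSet` and `PublicSuffix` are hypothetical names; the matching follows the published PSL rules (exceptions win, otherwise the longest match, with `*` as the default rule).

```go
package main

import (
	"fmt"
	"strings"
)

// RuleSet stores rules in maps so each probe is a single hash lookup.
type RuleSet struct {
	exact     map[string]bool // "com", "co.uk"
	wildcard  map[string]bool // "ck" stands for the rule "*.ck"
	exception map[string]bool // "www.ck" stands for the rule "!www.ck"
}

// NewRuleSet builds a RuleSet from already-classified rule strings.
func NewRuleSet(exact, wildcard, exception []string) *RuleSet {
	toSet := func(items []string) map[string]bool {
		m := make(map[string]bool, len(items))
		for _, s := range items {
			m[s] = true
		}
		return m
	}
	return &RuleSet{exact: toSet(exact), wildcard: toSet(wildcard), exception: toSet(exception)}
}

// PublicSuffix returns the public suffix of domain under this rule set.
func (rs *RuleSet) PublicSuffix(domain string) string {
	labels := strings.Split(strings.ToLower(domain), ".")
	// Exception rules take priority: the suffix is the rule minus its
	// leftmost label.
	for i := range labels {
		if rs.exception[strings.Join(labels[i:], ".")] {
			return strings.Join(labels[i+1:], ".")
		}
	}
	// Otherwise walk from the longest candidate to the shortest and
	// return the first exact or wildcard match.
	for i := range labels {
		candidate := strings.Join(labels[i:], ".")
		if rs.exact[candidate] {
			return candidate
		}
		if i+1 < len(labels) && rs.wildcard[strings.Join(labels[i+1:], ".")] {
			return candidate
		}
	}
	// Default rule "*": the last label is the public suffix.
	return labels[len(labels)-1]
}

func main() {
	rs := NewRuleSet([]string{"com", "uk", "co.uk"}, []string{"ck"}, []string{"www.ck"})
	for _, d := range []string{"www.example.com", "a.b.co.uk", "foo.bar.ck", "www.ck"} {
		fmt.Println(d, "->", rs.PublicSuffix(d))
	}
}
```

Because the rules stay as plain strings in maps, the set can be rebuilt or modified at runtime, which is the property the DAFSA approach gives up.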
You might want to try https://github.com/globalsign/publicsuffix which aims for performance close to https://golang.org/x/net/publicsuffix whilst adding the capability to update the PSL similar to https://github.com/weppos/publicsuffix-go
Change https://go.dev/cl/450935 mentions this issue:
Regenerating the publicsuffix package every time we tag a new x/net version (monthly) wouldn't add all that much repo growth. Is monthly good enough? Or do we ideally want something that tracks the upstream list more closely? If we want a schedule faster than the x/net tagging one, that's another reason to move to a new module.
At least half the time spent generating the publicsuffix repo right now is text compression, where we take a list of label strings and crush them into a single string containing all the individual labels, possibly overlapping. On my laptop, this step takes 100ms and reduces the text size by 33%, saving 16KiB. (For comparison,
Maybe we could stop compressing the list? 16KiB isn't a lot these days, and storing the list uncompressed would reduce commit deltas.
Use //go:embed to embed the public suffix tables, rather than generating .go files containing the data.
Creating an empty git repo and generating commits for the last 20 updates to the public suffix list, the total size of the repository directory as measured by "du -sh" decreases from 2.2M to 668K when using embedding.
For golang/go#15518.
Change-Id: Id71759765831a7699e7a182937095b3820bb643b
Reviewed-on: https://go-review.googlesource.com/c/net/+/450935
Run-TryBot: Damien Neil
TryBot-Result: Gopher Robot
Reviewed-by: Nigel Tao
Reviewed-by: Nigel Tao (INACTIVE; USE @golang.org INSTEAD)
`golang.org/x/net/publicsuffix` is updated infrequently and there have been [discussions over time](golang/go#15518) which seem to imply that more frequent updates of the list are [unlikely to be considered](golang/go#15518 (comment)). The proposed library [`github.com/weppos/publicsuffix-go`](https://github.com/weppos/publicsuffix-go) keeps itself updated with the list maintained at the PSL source and has a backwards compatible API which we can use as a drop-in replacement. Fixes #8074.
Please answer these questions before submitting your issue. Thanks!
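What version of Go are you using (go version)? 1.6.1
What operating system and processor architecture are you using (go env)? linux/amd64
What did you do? I used golang.org/x/net/publicsuffix and it doesn't contain the last updated list
What did you expect to see? An updated list
What did you see instead? A list from 2016-03-01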
The list is changing more often than before. The trigger was Let's Encrypt, but people are beginning to understand the security implications beyond Let's Encrypt.