Nathan Friedly edited this page May 7, 2021 · 34 revisions

Notes to self on future development

Pluggable URL rewriters (WIP)

To allow base64 encoding and the like.

  • Will need a couple of pre-flight methods to determine whether a URL should be proxied and to handle 404s (maybe those can be one and the same?)
  • Will need to provide rewrite logic to other parts of the proxy such as the actual url-rewriter
    • Logic will need to be serializable for eventual client script support
  • Will need to provide domain and path logic to cookies
  • Might need to provide a way to provide a disallow rule for robots.txt
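
A rough sketch of what a pluggable rewriter could look like, using the path-based scheme as the example. Everything here is an assumption for illustration: the `createPathRewriter` name, the method names, and the `/proxy/` prefix are hypothetical and not taken from the existing code.

```javascript
// Hypothetical rewriter shape; none of these names exist in the current code.
function createPathRewriter(prefix = "/proxy/") {
  // Pre-flight check: does this request path carry a proxied URL?
  // A false result could double as the "handle 404s" signal.
  function shouldProxy(requestPath) {
    return (
      requestPath.startsWith(prefix) &&
      /^https?:\/\//i.test(requestPath.slice(prefix.length))
    );
  }

  // Remote URL -> proxied path (what the url-rewriter would call).
  function wrap(remoteUrl) {
    return prefix + remoteUrl;
  }

  // Proxied path -> remote URL, or null when the path is not proxied.
  function unwrap(requestPath) {
    return shouldProxy(requestPath) ? requestPath.slice(prefix.length) : null;
  }

  // Path to put on a proxied cookie so it stays scoped to the remote site.
  function cookiePath(remoteUrl, remoteCookiePath = "/") {
    return prefix + new URL(remoteUrl).origin + remoteCookiePath;
  }

  return { shouldProxy, wrap, unwrap, cookiePath };
}
```

Keeping the only state in a plain string (the prefix) would make it easier to serialize the same logic for client scripts later.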

Path-based

Current implementation, just needs to be de-coupled from existing code

Subdomain + Path based

Move the domain portion of the url into the subdomain, optionally drop the path prefix.

Will need to either force HTTP (and perhaps use a querystring to request HTTPS at the destination) or else get an HTTPS wildcard certificate plus some kind of escaping for dots in URLs, since a wildcard certificate only covers a single subdomain level.

Simple escaping: convert - to --, convert . to -. Relatively pretty URLs, but may break on subdomains that have dashes at the beginning or end. Also, need to be careful about xn--* punycode domains. Maybe de-punycode everything first?
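
A minimal sketch of that escaping, mostly to show where it breaks. The function names `escapeHost` and `unescapeHost` are made up for this note.

```javascript
// Hypothetical helpers: squash a hostname into a single subdomain label.
function escapeHost(host) {
  // Double existing dashes first so they survive, then turn dots into dashes.
  return host.replace(/-/g, "--").replace(/\./g, "-");
}

function unescapeHost(label) {
  // "--" is tried before "-" at each position, so doubled dashes win.
  return label.replace(/--|-/g, (m) => (m === "--" ? "-" : "."));
}

// Round-trips cleanly:
//   escapeHost("my-site.example.com") gives "my--site-example-com"
// Does not round-trip when a label starts with a dash:
//   escapeHost("a.-b.com") gives "a---b-com", which decodes to "a-.b.com"
```

A dot followed by a dash produces three dashes in a row, and the decoder has no way to tell which side the single dash came from.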

Base64

Like path-based, but obscured (#141)
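
Something like the following could work for the encoding itself. This is a sketch that assumes the URL-safe base64 alphabet (so the result can sit in a path segment); the function names are hypothetical, and whether #141 wants exactly this variant is an open question.

```javascript
// Hypothetical helpers: URL-safe base64 without padding, using Node's Buffer.
function encodeUrl(url) {
  return Buffer.from(url, "utf8")
    .toString("base64")
    .replace(/\+/g, "-") // "+" and "/" are awkward in paths
    .replace(/\//g, "_")
    .replace(/=+$/, ""); // drop padding
}

function decodeUrl(encoded) {
  const base64 = encoded.replace(/-/g, "+").replace(/_/g, "/");
  // Node's base64 decoder accepts input with the padding stripped.
  return Buffer.from(base64, "base64").toString("utf8");
}
```

Note that this only obscures the URL; anyone can decode it.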

Client scripts

  • postMessage could be wrapped to send messages to "*", and wrap them in something to identify what the originating domain should be (?)

  • window.location cannot be wrapped, but perhaps scripts can be wrapped and given proxies for window and location. Alternatively, JavaScript could be parsed and rewritten as it passes through the proxy. This might fix #149 and #162

    • ditto for fixing the domain/path on writes to document.cookie
  • DOM Mutation Observer to detect changes and fixup urls - https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver
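
For the "give scripts a proxy for location" idea above, a rough sketch using the built-in `Proxy` object. It assumes proxied pages live at a `/proxy/` prefix followed by the full remote URL; `PREFIX` and `makeLocationProxy` are hypothetical names, and it takes a location-like object as an argument so it can be tried outside a browser.

```javascript
// Hypothetical prefix; assumes proxied pages live at PREFIX + full remote URL.
const PREFIX = "/proxy/";

// Wrap a location-like object so wrapped scripts see the remote URL.
function makeLocationProxy(realLocation) {
  // Recover the remote URL from the proxied path, query, and fragment.
  const remote = () =>
    new URL(
      realLocation.pathname.slice(PREFIX.length) +
        realLocation.search +
        realLocation.hash
    );

  return new Proxy(
    {},
    {
      // Reads (hostname, href, origin, ...) are answered from the remote URL.
      get(_target, prop) {
        const url = remote();
        if (!(prop in url)) return undefined;
        const value = url[prop];
        return typeof value === "function" ? value.bind(url) : value;
      },
      // Only href writes are handled here: resolve against the remote URL,
      // then send the real location to the proxied equivalent.
      set(_target, prop, value) {
        if (prop !== "href") return false;
        realLocation.href = PREFIX + new URL(value, remote()).href;
        return true;
      },
    }
  );
}
```

A wrapped script would then receive `makeLocationProxy(window.location)` in place of `location`. Methods such as `assign()` and `replace()`, and writes to properties other than `href`, are left out of this sketch.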

Proxy the srcset attribute

It's just a list of URLs, so it should be fairly straightforward. Would be easier with proper parsing rather than regexes. https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#attr-srcset (#77)
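
A sketch of a small hand-written parser for this. It is a simplified reading of the srcset format (URL is a run of non-whitespace, descriptor runs to the next comma) and ignores some edge cases in the full spec; `parseSrcset` and `rewriteSrcset` are hypothetical names, and `rewriteUrl` stands in for whatever the URL wrapper ends up exposing.

```javascript
// Split a srcset value into { url, descriptor } candidates.
function parseSrcset(srcset) {
  const candidates = [];
  let i = 0;
  while (i < srcset.length) {
    // Skip whitespace and commas between candidates.
    while (i < srcset.length && /[\s,]/.test(srcset[i])) i++;
    if (i >= srcset.length) break;

    // The URL is a run of non-whitespace, so it may contain commas.
    let start = i;
    while (i < srcset.length && !/\s/.test(srcset[i])) i++;
    let url = srcset.slice(start, i);
    let descriptor = "";

    if (url.endsWith(",")) {
      // Trailing comma: candidate ends here with no descriptor.
      url = url.replace(/,+$/, "");
    } else {
      // Otherwise the descriptor (e.g. "2x", "480w") runs to the next comma.
      start = i;
      while (i < srcset.length && srcset[i] !== ",") i++;
      descriptor = srcset.slice(start, i).trim();
    }
    candidates.push({ url, descriptor });
  }
  return candidates;
}

// Rewrite every URL and rebuild the attribute value.
function rewriteSrcset(srcset, rewriteUrl) {
  return parseSrcset(srcset)
    .map(({ url, descriptor }) =>
      descriptor ? `${rewriteUrl(url)} ${descriptor}` : rewriteUrl(url)
    )
    .join(", ");
}
```

Splitting on every comma would be shorter, but it breaks URLs that contain commas, which is the main argument for parsing properly.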

Get smarter about content-types

Some middleware should operate on html (only), some on html + css, some on js, etc. Some middleware, such as compression, should operate if any other middleware is used. So, we need to allow the categories of content-types to be configurable, and then allow middleware to specify which categories it should work on, with a special "all of the above" category that includes anything that any other middleware needs to touch.
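
One possible shape for this, as a sketch only. The category table, the middleware names, and the `"all"` marker are all made up for illustration and are not the current `processContentTypes` behavior.

```javascript
// Hypothetical, configurable category table.
const categories = {
  html: ["text/html", "application/xhtml+xml"],
  css: ["text/css"],
  js: ["application/javascript", "text/javascript"],
};

// Hypothetical middleware declarations. "all" means "all of the above":
// every category that some other middleware asked for.
const middleware = [
  { name: "urlRewriter", categories: ["html", "css"] },
  { name: "metaRobots", categories: ["html"] },
  { name: "compression", categories: ["all"] },
];

// Expand a middleware's categories into a set of MIME types.
function resolveTypes(mw, allMiddleware) {
  const names = mw.categories.includes("all")
    ? allMiddleware.flatMap((m) => m.categories).filter((c) => c !== "all")
    : mw.categories;
  return new Set(names.flatMap((c) => categories[c] || []));
}

// Decide whether a middleware should touch a response.
function shouldRun(mw, contentTypeHeader, allMiddleware = middleware) {
  const mime = contentTypeHeader.split(";")[0].trim().toLowerCase();
  return resolveTypes(mw, allMiddleware).has(mime);
}
```

With this setup, compression would run on html and css responses but not on js, because no other middleware in the list touches js.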

Make an "inject into <head> once" helper

Client scripts and meta robots could both use this.
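
A minimal sketch of such a helper, assuming the response body arrives as string chunks. `createHeadInjector` is a hypothetical name, and it still uses a regex, so it would miss a `<head>` tag that is split across two chunks.

```javascript
// Returns a function that inserts `snippet` right after the first <head> tag
// it sees, and passes every later chunk through untouched.
function createHeadInjector(snippet) {
  let injected = false;
  return function inject(chunk) {
    if (injected) return chunk;
    // Require whitespace or ">" after "head" so <header> does not match.
    const match = /<head(\s[^>]*)?>/i.exec(chunk);
    if (!match) return chunk;
    injected = true;
    const at = match.index + match[0].length;
    return chunk.slice(0, at) + snippet + chunk.slice(at);
  };
}
```

Each response would get its own injector, so the "once" state is per request rather than global.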

Handle Access-Control-Allow-Origin headers

Need to be a little bit careful about this one to ensure it doesn't enable anything nasty

Enable async middleware

Would make things like the youtube plugin simpler

Modernize the JS (WIP)

ES Modules, classes, const, etc.

This was initially written for Node.js v0.2 and it kind of shows in places...

Client scripts will be a little bit more tricky and may require the use of something like esbuild.

Modules would bump the minimum Node.js version to 13 unless a flag is passed in.

Robots.txt

  • Would need to play nice with the url wrapper to know what to disallow.

New Examples & Improvements

  • DNS-based adblocker
  • more fleshed-out youtube wrapper - at least implement search
  • polyfill.io for URL & array.includes
    • Might want to also add an object + copy + setTimeout comparison if Proxy isn't available?
  • fastify
  • cross-user caching for things that have to be parsed (I'm looking at you, 7.7mb youtube js file!)

Breaking changes for v3

  • Bump the minimum Node.js version, at least 10, maybe 14
  • Switch data.url to a URL object
  • use real html parser instead of regexp
  • rework how processContentTypes works and/or deprecate it
  • split client js into multiple files and set up a bundler
  • remove the standardMiddleware configuration option

Other changes for v3

  • make context an event emitter
    • move html events to context
  • Organize the files better
  • Tests for client scripts

Maybe:

  • make the url wrapper emit events that cookies can hook into instead of having to unwrap and re-write urls (?)
  • figure out how to avoid needing both global and per-request instances of url wrapper - maybe static methods for the global one?
  • rework configuration:
    • separate configuration objects for each module, with false = off and true = default config
    • add an origin option that supersedes host
  • add an event-based system to allow custom middleware to hook into different portions
    • make default spots for requestMiddleware and responseMiddleware
  • use ES Modules - https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c#how-can-i-move-my-commonjs-project-to-esm
  • use real js parser - not likely
  • use a real css parser - maybe?
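
For the "rework configuration" item above (separate objects per module, false = off, true = default config), the normalization could look roughly like this. The function name and the example defaults are hypothetical, and treating a missing value as "use the defaults" is an assumption that the notes do not settle.

```javascript
// Resolve one module's config entry into either null (off) or a full object.
function resolveModuleConfig(userValue, defaultConfig) {
  if (userValue === false) return null; // explicitly off
  if (userValue === true || userValue === undefined) {
    return { ...defaultConfig }; // default config (undefined is an assumption)
  }
  return { ...defaultConfig, ...userValue }; // user overrides on top of defaults
}

// Hypothetical defaults for an imaginary module, only for demonstration.
const exampleDefaults = { enabled: true, maxAge: 3600 };
```

For example, `resolveModuleConfig({ maxAge: 60 }, exampleDefaults)` keeps `enabled: true` and overrides only `maxAge`.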