Skip to content
This repository has been archived by the owner on Feb 29, 2024. It is now read-only.

problems about WebTextHandler #2

Open
3 tasks
CoolSpring8 opened this issue Jul 31, 2020 · 1 comment
Open
3 tasks

problems about WebTextHandler #2

CoolSpring8 opened this issue Jul 31, 2020 · 1 comment
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@CoolSpring8
Copy link
Owner

CoolSpring8 commented Jul 31, 2020

Currently we have this in internal/proxy/proxy.go:

// WebTextHandler fixes links in page and "src" issues in javascript files.
// This solution, however, may prevent content streaming. Fix it?
func WebTextHandler() goproxy.RespHandler {
	return goproxy_html.HandleString(
		func(s string, ctx *goproxy.ProxyCtx) string {
			c := RVPNURLMatcher.ReplaceAllString(s, "$1://")
			rawURLWithPort := ctx.UserData.(reqData).rawURLWithPort
			rawURLWithoutPort := ctx.UserData.(reqData).rawURLWithoutPort
			c = strings.ReplaceAll(c, rawURLWithPort[:strings.LastIndex(rawURLWithPort, "/")+1], "") // possible out of bounds?
			c = strings.ReplaceAll(c, rawURLWithoutPort[:strings.LastIndex(rawURLWithoutPort, "/")+1], "")
			return c
		})
}

As the comments indicates, there are several problems.

  1. Issue of "src" in javascript files is not completely resolved. Corruption of scripts still exists on some sites.
  2. Out of bounds in trying to get the string before (including) last "/".
  3. Streaming of response. Apparently for now the result is first "ReadAll"-ed, processed and sent to the browser.

Solutions and plans:

  • 1: This requires a close inspecting and some experiments on some sites.
  • 2: A better way to do the extracting. Note since this is also related to "src" issue as mentioned above, if we can find other ways to solve that, there is no need to do this substitution anymore.
  • 3: This is more of a performance concern. Let's have a detailed explanation in the next reply.

And one more small thing: CSS files can probably be excluded from being applied with this handler.

@CoolSpring8
Copy link
Owner Author

CoolSpring8 commented Jul 31, 2020

For 3:
Well, I'm not sure if browsers really start rendering one element before its downloading finishes.

The current behavior is that "Waiting (TTFB)" takes a long time, and "Content Download" phase finishes like lightning (as rwppa runs locally).

As introducing streaming text replacement will add overhead and requires more work, it would be better to know if this have a perceivable impact on user experience.

An extra bonus would be being more stable and have a lower memory usage.

So, if having decided to optimize this:

  • Streaming text replacement: https://github.com/icholy/replace
    This one supports regular expressions besides normal bytes replace and string replace.
  • Character set conversion is a big problem. One solution is to convert content to utf-8, do processing and converts back. Another is to convert the new string (in strings.ReplaceAll) to the given encoding. (However, it looks like that regexp cannot be encoded this way.) Refer to following resources:

@CoolSpring8 CoolSpring8 added enhancement New feature or request help wanted Extra attention is needed labels Jul 31, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

1 participant