JS Browserdetection fail and redirect #368

Open
toniritter opened this issue Jul 7, 2021 · 8 comments
Labels
js-engine Issues related to the js engine

Comments

@toniritter

toniritter commented Jul 7, 2021

Based on the "JavaScript execution exception" question on Stack Overflow.

HtmlUnit Version: 2.50.0

During the getPage call for the web page flashscore.com, I got the following exceptions:

2021-07-07 08:46:05.408  WARN 4828 --- [nio-8080-exec-1] c.g.htmlunit.IncorrectnessListenerImpl   : Obsolete content type encountered: 'text/javascript'.
2021-07-07 08:46:05.564 ERROR 4828 --- [nio-8080-exec-1] c.g.h.j.DefaultJavaScriptErrorListener   : Error during JavaScript execution

com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function entries in object function Object() { [native code] }. (script in https://www.flashscore.com/unsupported/ from (31, 9) to (53, 10)#35)
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:949) ~[htmlunit-2.50.0.jar:2.50.0]
	at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:598) ~[htmlunit-core-js-2.50.0.jar:na]
	at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:487) ~[htmlunit-core-js-2.50.0.jar:na]
	at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.callSecured(HtmlUnitContextFactory.java:353) ~[htmlunit-2.50.0.jar:2.50.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:829) ~[htmlunit-2.50.0.jar:2.50.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:805) ~[htmlunit-2.50.0.jar:2.50.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:796) ~[htmlunit-2.50.0.jar:2.50.0]
	at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScript(HtmlPage.java:942) ~[htmlunit-2.50.0.jar:2.50.0]
	at com.gargoylesoftware.htmlunit.html.ScriptElementSupport.executeInlineScriptIfNeeded(ScriptElementSupport.java:378) ~[htmlunit-2.50.0.jar:2.50.0]

I've tried two different versions of the class, and the problem still occurs.

@PostMapping("/startScraping")
	public ResponseEntity<FlashScraper> startScraping(@NonNull @RequestBody FlashScraper flashScraper) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
		logger.info("startScraping request incoming");
		logger.info("Call URL: " + flashScraper.getScrapeUrl());
		
	    String url = "https://flashScore.com";

	    try (final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
	        HtmlPage page = webClient.getPage(url);
	        webClient.waitForBackgroundJavaScript(3_000);

	        System.out.println();
	        System.out.println();
	        System.out.println("----------------");
	        System.out.println(page.asNormalizedText());
	        System.out.println("----------------");
	    }
		
		return new ResponseEntity<>(flashScraper, HttpStatus.OK);
	}
@PostMapping("/startScraping")
	public ResponseEntity<FlashScraper> startScraping(@NonNull @RequestBody FlashScraper flashScraper) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
		logger.info("startScraping request incoming");
		logger.info("Call URL: " + flashScraper.getScrapeUrl());
		
		
		final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED);
		webClient.getOptions().setJavaScriptEnabled(true);
		webClient.getOptions().setThrowExceptionOnScriptError(false);
		webClient.waitForBackgroundJavaScriptStartingBefore(1000);

		HtmlPage scrapePage = webClient.getPage(flashScraper.getScrapeUrl());
		webClient.waitForBackgroundJavaScript(3000);
		
		
		
		System.out.println(scrapePage.getByXPath("//*[@id=\"g_25_rwPxTVj1\"]"));
		
		return new ResponseEntity<>(flashScraper, HttpStatus.OK);
	}
@toniritter
Author

After switching the dependency to version 2.51.0, the exception is no longer thrown, but I still end up on the "Unsupported" page https://flashscore.com/unsupported/

@rbri
Member

rbri commented Jul 11, 2021

The browser detection is done by the code in https://www.flashscore.com/x/js/browsercompatibility_4.js:

// !!! for update iterate manually `browser_compatibility_serial`
"use strict";
try {
	(function () {
		var cssRequirements = [["display", "flex"], ["display", "grid"], ["color", "red"]];
		for (var i in cssRequirements) {
			if (!CSS.supports(cssRequirements[i][0], cssRequirements[i][1])) {
				throw "no-" + cssRequirements[i][0] + "-" + cssRequirements[i][1];
			}
		}
		try {
			new XMLHttpRequest();
		}
		catch (pass) {
			throw "no-ajax";
		}
		try {
			eval("var foo = (x)=>x+1");
		}
		catch (pass) {
			throw "no-es6";
		}
		try {
			eval("var foo = {}; var bar = {...foo};")
		}
		catch (pass) {
			throw "no-spread";
		}
	})();
}
catch (e) {
	var utm = "";
	if (typeof e == "string" && /^[a-z0-9\-]+$/.test(e)) {
		utm = "?err=" + e;
	}
	window.location.replace("/unsupported/" + utm);
}

For the moment I can fix CSS.supports(), but because Rhino does not (yet) support the spread syntax (mozilla/rhino#968), this will still fail.
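
The catch block of the detection script appends the name of the failed check as an `err` query parameter to the `/unsupported/` URL, so the URL of the page you land on can tell you which check failed. Here is a minimal sketch of reading it in plain Java. `UnsupportedRedirect` and `failedCheck` are hypothetical names chosen for illustration; they are not part of HtmlUnit.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of HtmlUnit. */
final class UnsupportedRedirect {

    private UnsupportedRedirect() { }

    /**
     * Returns the value of the "err" query parameter (for example "no-spread")
     * when the URL points at the /unsupported/ page, otherwise an empty Optional.
     */
    static Optional<String> failedCheck(String url) {
        URI uri = URI.create(url);
        String path = uri.getPath();
        if (path == null || !path.startsWith("/unsupported/")) {
            return Optional.empty();
        }
        String query = uri.getQuery();
        if (query == null) {
            // Redirected, but the script did not attach an error code.
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            if (pair.startsWith("err=")) {
                return Optional.of(pair.substring("err=".length()));
            }
        }
        return Optional.empty();
    }
}
```

With HtmlUnit, the input would presumably be `page.getUrl().toString()` once `getPage` has returned and background JavaScript has had time to run.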

The only option you have is to 'patch' the script and replace or comment out some parts (see https://htmlunit.sourceforge.io/faq.html#HowToModifyRequestOrResponse). At least it is worth a try.
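
One way to patch the script without risking a syntax error is to leave its structure alone and only neutralize the `throw` statement of the check that cannot pass. The sketch below does that with a literal string replacement. `CompatibilityScriptPatcher` and `skipCheck` are hypothetical names, and the approach assumes the script still contains the statement exactly as `throw "<code>";`, as in the listing above.

```java
/** Hypothetical helper for illustration; not part of HtmlUnit. */
final class CompatibilityScriptPatcher {

    private CompatibilityScriptPatcher() { }

    /**
     * Replaces the statement throw "<errorCode>"; with a comment, so the
     * surrounding catch block swallows the failure instead of rethrowing it.
     * The script is returned unchanged when the statement is not found.
     */
    static String skipCheck(String script, String errorCode) {
        String throwStatement = "throw \"" + errorCode + "\";";
        // String.replace performs a literal (non-regex) replacement of every occurrence.
        return script.replace(throwStatement, "/* check skipped */");
    }
}
```

The patched text could then be returned from a `WebConnectionWrapper` as described in the linked FAQ, for example as `skipCheck(response.getContentAsString(), "no-spread")`. This only addresses the detection script; other scripts on the page may still use syntax that the engine cannot parse.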

@rbri
Member

rbri commented Jul 11, 2021

I have fixed CSS.supports() and will make a new snapshot available soon (check Twitter for updates).

@toniritter
Author

I did as suggested and tried to modify the response, but now I get the following exception (still on version 2.51.0):

2021-07-12 19:23:13.844 ERROR 2820 --- [nio-8080-exec-2] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is com.gargoylesoftware.htmlunit.ScriptException: syntax error (https://www.flashscore.com/x/js/browsercompatibility_4.js#1)] with root cause

net.sourceforge.htmlunit.corejs.javascript.EvaluatorException: syntax error (https://www.flashscore.com/x/js/browsercompatibility_4.js#1)
	at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory$HtmlUnitErrorReporter.error(HtmlUnitContextFactory.java:436) ~[htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.corejs.javascript.Parser.addError(Parser.java:251) ~[htmlunit-core-js-2.51.0.jar:na]

@rbri
Member

rbri commented Jul 15, 2021

Looks like there is a syntax error in your replacement script. Maybe you can replace it with an empty one?

@toniritter
Author

Hey rbri, in the meantime I've tried it with this, but it still fails:

	public void startScraper() throws FailingHttpStatusCodeException, MalformedURLException, IOException {
		
		
		String url = "https://www.flashscore.com/basketball/";
		
		
		try (final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
			
			webClient.getOptions().setThrowExceptionOnScriptError(false);
			webClient.getOptions().setUseInsecureSSL(true);
		    webClient.getOptions().setCssEnabled(true);
			webClient.getOptions().setJavaScriptEnabled(true);
			webClient.waitForBackgroundJavaScriptStartingBefore(1000);
			
			
			new WebConnectionWrapper(webClient) {

	            public WebResponse getResponse(WebRequest request) throws IOException {
	                WebResponse response = super.getResponse(request);
	                if (request.getUrl().toExternalForm().contains("browsercompatibility")) {
	                    String content = "";
	                    // intercept and/or change content

	                    WebResponseData data = new WebResponseData(content.getBytes(),response.getStatusCode(), response.getStatusMessage(), response.getResponseHeaders());
	                    response = new WebResponse(data, request, response.getLoadTime());
	                }
	                return response;
	            }
	        };
			
			
			
	        HtmlPage page = webClient.getPage(url);
	        webClient.waitForBackgroundJavaScript(3_000);

	        System.out.println();
	        System.out.println();
	        System.out.println("----------------");
	        System.out.println(page.asNormalizedText());
	        System.out.println("----------------");
	    }
		
		
		
	}
2021-07-16 15:22:45.844  WARN 1524 --- [           main] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://www.flashscore.com/res/_fs/build/livetableresponsive.c7059bf.css' [1:8910] Error in pseudo class or element. (Invalid token ".". Was expecting one of: <S>, <NUMBER>, <IDENT>, <STRING>, "-", <PLUS>, <DIMENSION>.)
2021-07-16 15:22:45.844  WARN 1524 --- [           main] c.g.htmlunit.DefaultCssErrorHandler      : CSS warning: 'https://www.flashscore.com/res/_fs/build/livetableresponsive.c7059bf.css' [1:8910] Ignoring the whole rule.
2021-07-16 15:22:46.305  WARN 1524 --- [           main] c.g.htmlunit.IncorrectnessListenerImpl   : Obsolete content type encountered: 'text/javascript'.
2021-07-16 15:22:46.487 ERROR 1524 --- [           main] c.g.h.j.DefaultJavaScriptErrorListener   : Error during JavaScript execution

com.gargoylesoftware.htmlunit.ScriptException: invalid property id (https://www.flashscore.com/res/_fs/build/loader.5714507.js#1)
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:954) ~[htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:580) ~[htmlunit-core-js-2.51.0.jar:na]
	at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:481) ~[htmlunit-core-js-2.51.0.jar:na]
	at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.callSecured(HtmlUnitContextFactory.java:352) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.compile(JavaScriptEngine.java:785) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.compile(JavaScriptEngine.java:751) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.compile(JavaScriptEngine.java:112) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.HtmlPage.loadJavaScriptFromUrl(HtmlPage.java:1122) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1002) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.ScriptElementSupport.executeScriptIfNeeded(ScriptElementSupport.java:196) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.ScriptElementSupport$1.execute(ScriptElementSupport.java:120) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.ScriptElementSupport.onAllChildrenAddedToPage(ScriptElementSupport.java:143) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:191) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:551) ~[htmlunit-2.51.0.jar:2.51.0]
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) ~[xercesImpl-2.12.0.jar:na]
	at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:503) ~[htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1216) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1156) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:219) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:312) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3189) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2114) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:937) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:443) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:394) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ~[xercesImpl-2.12.0.jar:na]
	at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.parse(HtmlUnitNekoDOMBuilder.java:751) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parse(HtmlUnitNekoHtmlParser.java:208) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:297) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:217) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:684) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:586) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:501) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:413) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:548) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:529) ~[htmlunit-2.51.0.jar:2.51.0]
Caused by: net.sourceforge.htmlunit.corejs.javascript.EvaluatorException: invalid property id (https://www.flashscore.com/res/_fs/build/loader.5714507.js#1)

@rbri
Member

rbri commented Jul 21, 2021

Looks like another error. This time:

invalid property id (https://www.flashscore.com/res/_fs/build/loader.5714507.js#1)

This JS file is one huge minified script. At the very least, it uses syntax that is not supported:

function(...e){let t=this._configData;

I fear you have to wait until this is fixed in Rhino.
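
To get a rough idea of whether a downloaded script will hit this limitation before loading it, you could scan its source for `...` followed by an identifier, `[`, or `{`, which is how rest parameters and spread expressions typically look. The sketch below is only a text heuristic: it does not parse JavaScript, so it can also match inside strings or comments. `RestSpreadScanner` and `firstRestOrSpread` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical diagnostic helper for illustration; not part of HtmlUnit. */
final class RestSpreadScanner {

    // Three dots, optional whitespace, then the start of an identifier,
    // an array literal, or an object literal.
    private static final Pattern REST_OR_SPREAD =
            Pattern.compile("\\.\\.\\.\\s*[A-Za-z_$\\[{]");

    private RestSpreadScanner() { }

    /**
     * Returns the index of the first likely rest/spread usage in the given
     * JavaScript source, or -1 when none is found.
     */
    static int firstRestOrSpread(String script) {
        Matcher matcher = REST_OR_SPREAD.matcher(script);
        return matcher.find() ? matcher.start() : -1;
    }
}
```

The returned index can help locate the offending spot in a minified file where the reported line number is always 1.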

@rbri
Member

rbri commented Mar 27, 2024

see #755

@rbri rbri added the js-engine Issues related to the js engine label Mar 27, 2024