Crash when select string include '(' char. #24

Open
nifflin opened this issue Jan 4, 2016 · 1 comment

Comments


nifflin commented Jan 4, 2016

I am searching for a "script" node in the Facebook HTML source. The node looks like this:
<.s.c.r.i.p.t.>require("TimeSlice").guard(function() ... ///< The dots in "script" are there so that this line displays normally on the issue page.

So I use this selector to find the node:
CSelection c = doc.find("script:contains(require("TimeSlice"))");

But the app crashed with the error "terminate called after throwing an instance of 'std::string'". GDB says it crashes in the doc.find function.
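The "terminate called after throwing an instance of 'std::string'" message means a std::string was thrown and nothing caught it. A minimal sketch of guarding a call against that is below. The names run_guarded and failing_query are hypothetical and not part of gumbo-query, and failing_query only stands in for a call that throws. This stops the abort but does not make the selector parse.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: runs `query` and turns a thrown std::string into an
// error message instead of letting it reach std::terminate.
// Returns true on success; on failure returns false and fills `error`.
bool run_guarded(const std::function<void()>& query, std::string& error) {
    try {
        query();
        return true;
    } catch (const std::string& message) {
        error = message;
        return false;
    }
}

// Hypothetical stand-in for a selector parser that rejects its input by
// throwing a std::string (the message text is made up for illustration).
void failing_query() {
    throw std::string("unexpected character in selector");
}
```

In real code the doc.find call from the report would go inside the callable passed to run_guarded, so a bad selector produces an error message rather than a crash.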

If I use CSelection c = doc.find("script:contains(require)"), it works fine, but the nodes it returns are not the ones I want. So I think gumbo-query's "contains" filter does not support a '(' inside its argument.

@TechnikEmpire

No, gumbo-query does not properly parse string sequences like this. Once the parser encounters a "special" character such as '(' or even '"', it discards whatever operation it was in the middle of (such as parsing a string) and switches tasks based on the newly discovered "special" character.

To be clear, I'm not criticizing @lazytiger when I criticize stuff like this. He said he blindly ported cascadia over without doing much testing, assuming that cascadia worked well. The problem is that cascadia has lots of bugs, _lots_ of them. Improper string parsing is one example. It doesn't parse a string by consuming all data between two matching unescaped quote characters. Instead it has a per-character context: starting at a quote, it checks every single character for a "special" character and assumes it is done parsing the string as soon as it finds one.
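For comparison, here is a minimal sketch of the approach described above: consume everything between two matching unescaped quote characters and treat '(' and the other quote type as ordinary data. The function name parse_quoted is hypothetical, this is not code from gumbo-query or cascadia, and escape handling is simplified to "a backslash keeps the next character literally".

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses a quoted string starting at input[pos],
// which must be ' or ". Consumes everything up to the matching unescaped
// quote; characters such as '(' are plain data while inside the quotes.
// On success, pos is advanced just past the closing quote.
std::string parse_quoted(const std::string& input, std::size_t& pos) {
    if (pos >= input.size() || (input[pos] != '"' && input[pos] != '\'')) {
        throw std::invalid_argument("expected opening quote");
    }
    const char quote = input[pos];
    std::string result;
    std::size_t i = pos + 1;
    while (i < input.size()) {
        const char c = input[i];
        if (c == '\\' && i + 1 < input.size()) {
            result.push_back(input[i + 1]);  // keep the escaped character literally
            i += 2;
        } else if (c == quote) {
            pos = i + 1;  // step past the closing quote
            return result;
        } else {
            result.push_back(c);
            ++i;
        }
    }
    throw std::invalid_argument("unterminated quoted string");
}
```

With a parser that reads quoted arguments this way, the selector from the report could be written as script:contains('require("TimeSlice")'), and the whole require("TimeSlice") text would arrive as one string.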
