Handling of literal double quotes #10

Open
ppf2 opened this issue Mar 21, 2017 · 7 comments

@ppf2
Member

ppf2 commented Mar 21, 2017

Example log entry:

Test:"4394740425750718628","Something","2017-03-14 22:31:13"

If I use the dissect mapping:

message => "Test:\"%{supportID}\",\"%{attackType}\",\"%{msg}\""

Fields are not extracted, because Logstash does not yet handle the escaped double quotes (\") in this pattern.

The following works but will generate fields with literal double quotes in them:

message => "Test:%{supportID},%{attackType},%{msg}"

e.g.:

"supportID" => ""4394740425750718628""

Possible workarounds are to pre-process the log entry, either removing the literal double quotes with a gsub or reformatting the log entry using one of the workarounds in the elastic/logstash#1645 ticket.
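As a rough illustration of the gsub workaround, here is a minimal Ruby sketch that deletes every double quote and then splits the remainder the way the unquoted pattern above would. The helper name strip_quotes_and_split is hypothetical; in a pipeline the equivalent step would be a mutate filter's gsub option placed before dissect.

```ruby
# Hypothetical helper illustrating the "remove the quotes first" workaround.
# It deletes every literal double quote, then splits the remainder the way the
# unquoted pattern "Test:%{supportID},%{attackType},%{msg}" would.
def strip_quotes_and_split(line)
  unquoted = line.gsub('"', '')          # drop all literal double quotes
  body = unquoted.delete_prefix('Test:') # remove the fixed "Test:" prefix
  # Limit of 3 keeps any commas inside the last value attached to msg.
  support_id, attack_type, msg = body.split(',', 3)
  { 'supportID' => support_id, 'attackType' => attack_type, 'msg' => msg }
end

line = 'Test:"4394740425750718628","Something","2017-03-14 22:31:13"'
p strip_quotes_and_split(line)
```

One caveat of this approach: it also deletes any double quotes that legitimately appear inside a field value.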

For users who are switching from grok to dissect, it may be helpful to document this use case in a reference/caveat section.

@jordansissel
Contributor

message => "Test:\"%{supportID}\",\"%{attackType}\",\"%{msg}\""

Logstash syntax doesn't support escapes the way you are trying to use them. I recommend using single quotes.

message => 'Test:"%{supportID}","%{attackType}","%{msg}"'
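To show why single quotes are enough, here is a simplified Ruby sketch of dissect-style matching, assuming the core idea is to locate the literal text between %{...} fields verbatim rather than compiling it as a regex. mini_dissect is a hypothetical helper, not the plugin's code; it ignores dissect's key modifiers and assumes every field except the last is followed by a non-empty delimiter.

```ruby
# Hypothetical, simplified dissect-style matcher (NOT the plugin's code).
# The literal text between %{...} fields is located with String#index, so
# characters such as ", [, ] and . always match themselves.
def mini_dissect(pattern, text)
  # split with a capture group keeps the field names, alternating:
  # literal, field, literal, field, ..., literal
  parts = pattern.split(/%\{([^}]*)\}/, -1)
  literals = parts.select.with_index { |_, i| i.even? }
  fields   = parts.select.with_index { |_, i| i.odd? }

  # The leading literal must sit at the very start of the text.
  lead = literals.first || ''
  return nil unless text.start_with?(lead)

  pos = lead.length
  result = {}
  fields.each_with_index do |name, i|
    delim = literals[i + 1]
    # An empty delimiter (end of pattern) means the field takes the rest.
    stop = delim.empty? ? text.length : text.index(delim, pos)
    return nil if stop.nil? # delimiter not found: no match
    result[name] = text[pos...stop]
    pos = stop + delim.length
  end
  result
end

line = 'Test:"4394740425750718628","Something","2017-03-14 22:31:13"'
p mini_dissect('Test:"%{supportID}","%{attackType}","%{msg}"', line)
```

Because the delimiters are plain strings, characters like ", [ and . need no escaping inside the pattern itself; only the quoting of the surrounding config string matters.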

@ppf2
Member Author

ppf2 commented Mar 21, 2017

It might be helpful to document this in the dissect filter section for users rewriting their configs from grok to dissect, because grok appears to handle this just fine.

match => ["message","Test:\"%{NUMBER:supportID}?\",\"%{ALPHANUMSPACESPECIAL:attackType}?\",\"%{DATA:msg}?\""]

Returns:

    "attackType" => "Something",
      "@version" => "1",
          "host" => "Firestorm",
     "supportID" => "4394740425750718628",
           "msg" => "2017-03-14 22:31:13"

@jordansissel
Contributor

That is unexpected behavior and a bug, and I don't want us to document a bug as if it were a feature.

It works by accident: the regex compiler interprets \" as a request to match a double quote literally. In this sense, /\"/ and /"/ are identical regular expressions.

In Ruby, we can see that they behave identically:

# Show what /\"/ matches
>> puts /\"/.match('"').to_s
"

# Show what /"/ matches
>> puts /"/.match('"').to_s
"

@ppf2
Member Author

ppf2 commented Mar 21, 2017

Thanks for the explanation ❤️ Sounds good then! We do have it documented in this issue, so that will suffice.

ppf2 closed this as completed Mar 21, 2017
@jordansissel
Contributor

jordansissel commented Mar 21, 2017

Maybe instead of focusing the documentation on quotation marks, we could highlight that the regex engine does additional processing when compiling the pattern. This applies to much more than just \" and covers any character that is special to Joni (the regexp engine), such as ., [], {}, (), ^, $, etc.
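A small Ruby sketch of the "additional processing" point: Regexp.escape shows which characters the regex engine would otherwise treat specially, and escaping them makes each string match itself literally, which is roughly what dissect does implicitly. This runs on plain Ruby's engine (Onigmo in MRI); Joni is a Java port of Oniguruma, so these basic metacharacters are assumed to behave the same there.

```ruby
# Strings containing characters that are special in Ruby regexes.
specials = ['.', '[hello]', '{2}', '(x)', '^', '$']

specials.each do |s|
  escaped = Regexp.escape(s)    # e.g. "[hello]" -> "\[hello\]"
  literal = Regexp.new(escaped) # regex that matches s verbatim
  puts "#{s.inspect}: escaped /#{escaped}/, literal match #{literal.match?(s)}"
end
```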

Another example of something wildly different in dissect and grok is this pattern:

"[hello]"

In Grok, this is compiled into a regex meaning "match a single character that can be any of h, e, l, or o". In Dissect, this means literally match the text "[hello]".
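The [hello] difference can be reproduced in plain Ruby, using its regex engine as a stand-in for grok's compiled pattern and a plain substring search as a stand-in for dissect's literal match. Both stand-ins are simplifications, not the plugins' code.

```ruby
# Grok compiles "[hello]" into a regex character class: it matches ONE
# character that is h, e, l, or o.
grok_style = Regexp.new('[hello]')
p grok_style.match('yellow')[0]              # prints "e"

# Dissect treats "[hello]" as literal text that must appear verbatim.
dissect_literal = '[hello]'
p 'yellow'.include?(dissect_literal)         # prints false
p 'say [hello] there'.index(dissect_literal) # prints 4
```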

And another example:

"."

In Grok, this means "any character". In Dissect, this means "a period".
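The same kind of Ruby stand-in works for ".": as a regex it matches any single character (except a newline, by default), while as literal text it is found only where an actual period exists. Again, this is a simplified illustration rather than the plugins' code.

```ruby
# Regex (grok-style) reading: "." matches any single character.
any_char = Regexp.new('.')
p any_char.match?('x')      # prints true: "x" counts as "any character"

# Literal (dissect-style) reading: "." is just a period.
p 'x'.index('.')            # prints nil: there is no period in "x"
p 'host.example'.index('.') # prints 4
```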

@ppf2
Member Author

ppf2 commented Mar 21, 2017

This sounds like a good idea :)

ppf2 reopened this Mar 21, 2017
@jordansissel
Contributor

@ppf2 thank you for filing. Sometimes it's easy to forget that two similar ideas (grok and dissect) can have very different implementations and effects ;)
