XML filter creates parsed_xml inconsistently for nested xml #2498

Closed
mboyanna opened this issue Feb 3, 2015 · 1 comment

mboyanna commented Feb 3, 2015

Hi

I am trying to parse nested XML using Logstash. When elements repeat at the same level, the first one is attached to its parent, while the rest are attached to the grandparent, and all of them are put in an array.

Input file:

1A B-element1a B-element1b B-element1c B-element2a B-element2b B-element2c

Config file:

input {
  file {
    path => "/Users/bparman/Rawdata/resource/ls-jira.xml"
    start_position => beginning
    sincedb_path => "/dev/null"
  }
}

filter {
  multiline {
    patterns_dir => "/Users/bparman/awdata/resource/mypatterns"
    pattern => "^<ROOT.|."
    what => "previous"
    negate => "true"
  }
}

filter {
  mutate {
    gsub => ["message","\n"," "]
    gsub => ["message","&lt;","<"]
    gsub => ["message","&gt;",">"]
    gsub => ["message","/>",">"]
    gsub => ["message","&quot;",'"']
  }

  if [message] != "" {
    mutate {
      replace => [ "message", "%{message}" ]
    }
  }

  if "multiline" in [tags] {
    xml {
      source => message
      target => parsed_xml
      xpath => ["/ROOT/@root_attr", "root_attr"]
      xpath => ["/ROOT/elementA/item", "item"]
      xpath => ["/ROOT/elementB/arry/text()", "array_of_fields"]

      add_field => {
        one_element => "%{[parsed_xml][ROOT][elementA][item]}"
        arr_elements => "%{[parsed_xml][elementB][1][arry]}" # This doesn't work: errors in the parsed XML structure, see parsed_xml in the debug output
      }
    }
  }
}

output {
  stdout { codec => rubydebug }
  if "_xmlparsefailure" not in [tags] {
    file {
      path => "/Users/bparman/Rawdata/resource/xml-good.tsv"
      message_format => "%{root_attr} %{item} %{array_of_fields} %{arr_elements}"
    }
  } else {
    file {
      path => "/Users/bparman/Rawdata/resource/xml-bad.tsv"
      message_format => "%{message}"
    }
  }
}
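
Since the config already extracts values with xpath, one possible way to reach the second elementB without going through parsed_xml is a positional predicate. The sketch below illustrates the idea in Python with the standard library's ElementTree. The sample document is hypothetical: it is shaped to match the XPath expressions in the config and is not the actual input file, and `arry_values` is a name made up for this illustration. The equivalent Logstash expression would presumably be something like "/ROOT/elementB[2]/arry/text()", which is untested here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped to match the XPath expressions in the config.
# This is an assumption, not the reporter's actual input file.
SAMPLE = """<ROOT root_attr="test-root-attribute">
  <elementA><item>1A</item></elementA>
  <elementB>
    <arry>B-element1a</arry><arry>B-element1b</arry><arry>B-element1c</arry>
  </elementB>
  <elementB>
    <arry>B-element2a</arry><arry>B-element2b</arry><arry>B-element2c</arry>
  </elementB>
</ROOT>"""

root = ET.fromstring(SAMPLE)


def arry_values(occurrence):
    """Return the arry texts of the Nth elementB (1-based, as in XPath)."""
    return [a.text for a in root.findall(f"./elementB[{occurrence}]/arry")]


first = arry_values(1)
second = arry_values(2)
print(first)
print(second)
```

Note that the positional predicate is 1-based, whereas the field reference [parsed_xml][elementB][1] in add_field is 0-based.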

Here's the debug output:

Note how the arry from the 2nd occurrence of elementB is not under the elementB hash, but rather directly under ROOT.

Using milestone 2 output plugin 'file'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.2/plugin-milestones {:level=>:warn}
{
    "message" => "<ROOT root_attr="test-root-attribute"> 1A B-element1a B-element1b B-element1c B-element2a B-element2b B-element2c",
    "@version" => "1",
    "@timestamp" => "2015-02-03T00:51:45.804Z",
    "host" => "bparman-05210.gracenote.gracenote.com",
    "path" => "/Users/bparman/Rawdata/resource/ls-jira.xml",
    "tags" => [
        [0] "multiline"
    ],
    "root_attr" => [
        [0] "test-root-attribute"
    ],
    "item" => [
        [0] "1A"
    ],
    "array_of_fields" => [
        [0] "B-element1a",
        [1] "B-element1b",
        [2] "B-element1c"
    ],
    "parsed_xml" => {
        "root_attr" => "test-root-attribute",
        "elementA" => [
            [0] {
                "item" => [
                    [0] "1A"
                ]
            }
        ],
        "elementB" => [
            [0] {
                "arry" => [
                    [0] "B-element1a",
                    [1] "B-element1b",
                    [2] "B-element1c"
                ]
            },
            [1] {}
        ],
        "arry" => [
            [0] "B-element2a",
            [1] "B-element2b",
            [2] "B-element2c"
        ],
        "ROOT" => {
            "elementA" => {}
        }
    },
    "one_element" => "%{[parsed_xml][ROOT][elementA][item]}",
    "arr_elements" => "%{[parsed_xml][elementB][1][arry]}"
}
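
For comparison, here is a rough sketch of the nesting one would expect from a converter that wraps every child element in a list, which is the style the parsed_xml output above appears to follow. It is written in Python with the standard library only and is not the xml filter's actual implementation. The sample document is hypothetical (shaped to match the XPath expressions in the config, not the real input file), and `to_hash` is a name invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped to match the XPath expressions in the config.
# This is an assumption, not the reporter's actual input file.
SAMPLE = """<ROOT root_attr="test-root-attribute">
  <elementA><item>1A</item></elementA>
  <elementB>
    <arry>B-element1a</arry><arry>B-element1b</arry><arry>B-element1c</arry>
  </elementB>
  <elementB>
    <arry>B-element2a</arry><arry>B-element2b</arry><arry>B-element2c</arry>
  </elementB>
</ROOT>"""


def to_hash(elem):
    """Convert an element to a dict, wrapping every child in a list."""
    # Leaf elements without attributes collapse to their text.
    if len(elem) == 0 and not elem.attrib:
        return (elem.text or "").strip()
    node = dict(elem.attrib)
    for child in elem:
        # Repeated siblings accumulate in one list under their own tag,
        # so each occurrence keeps its own children.
        node.setdefault(child.tag, []).append(to_hash(child))
    return node


parsed_xml = to_hash(ET.fromstring(SAMPLE))
print(parsed_xml["elementB"][1]["arry"])
```

With this kind of nesting, a reference such as [parsed_xml][elementB][1][arry] would resolve to the second group of values, and no arry key would appear at the top level of parsed_xml.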

@jordansissel
Contributor

For Logstash 1.5.0, we've moved all plugins to individual repositories, so I have moved this issue to logstash-plugins/logstash-filter-xml#11. Let's continue the discussion there! :)
