investigate performance improvements for netbox enrichment #547

Open
mmguero opened this issue Jan 9, 2025 · 0 comments
Labels
netbox Related to Malcolm's use of NetBox performance Related to speed/performance
mmguero commented Jan 9, 2025

The NetBox enrichment code is by far the slowest part of the Logstash pipeline. Below is the tail of the per-filter statistics for all Logstash filters, sorted by the final column. The columns are the filter ID, events in, events out, and the total duration of that filter in milliseconds:

$ docker compose exec logstash curl -XGET http://localhost:9600/_node/stats/pipelines | jq -r '.. | .filters? // empty | .[] | objects | select (.events.in > 0) | [.id, .events.in, .events.out, .events.duration_in_millis] | join (";")' | sort -n -t ';' -k4
...
...
...
cidr_detect_network_type_ipv4_source;22357;22357;1953
ruby_dns_freq_lookup;407;407;2098
ruby_zeek_remove_empty_values;13007;13007;2962
ruby_suricata_timestamp_calc;11469;11469;4816
cidr_add_tag_internal_destination;22397;22397;8875
cidr_add_tag_internal_source;22357;22357;14472
ruby_netbox_enrich_destination_ip_segment;8434;8403;858710
ruby_netbox_enrich_source_ip_segment;9759;9691;1227869
ruby_netbox_enrich_destination_ip_device;8403;8403;1391125
ruby_netbox_enrich_source_ip_device;9691;9600;1782989
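
The totals above are easier to compare once they are normalized by event count. Here is a minimal sketch that does that, assuming the semicolon-separated format produced by the jq command above. The `per_event_cost` helper and the `STATS` constant are hypothetical names used for illustration only, and the rows are copied from the output above.

```ruby
# Filter stats as printed above: id;events_in;events_out;duration_in_millis
STATS = <<~ROWS
  cidr_add_tag_internal_source;22357;22357;14472
  ruby_netbox_enrich_destination_ip_segment;8434;8403;858710
  ruby_netbox_enrich_source_ip_segment;9759;9691;1227869
  ruby_netbox_enrich_destination_ip_device;8403;8403;1391125
  ruby_netbox_enrich_source_ip_device;9691;9600;1782989
ROWS

# Hypothetical helper: parse each row and derive the average number of
# milliseconds spent in the filter per incoming event.
def per_event_cost(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    id, events_in, _events_out, millis = line.split(';')
    {
      id: id,
      events_in: events_in.to_i,
      millis: millis.to_i,
      ms_per_event: millis.to_f / events_in.to_i
    }
  end
end

rows = per_event_cost(STATS)
rows.each do |row|
  puts format('%-45s %8.2f ms/event', row[:id], row[:ms_per_event])
end
```

By this measure the NetBox filters average roughly 100 to 185 ms per event, while `cidr_add_tag_internal_source` averages under 1 ms per event.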

The enrichment filters are far and away the most costly: each of the four NetBox filters accounts for roughly 859 to 1,783 seconds of filter time, while the next slowest filter accounts for about 14 seconds. Beyond some caching, I'm not doing much in the way of optimization. We should examine the NetBox enrichment Ruby filter code (linked above) and see if we can do some of the following:

  • examine the cache settings: do they make sense? Are we getting cache misses?
  • is there any profiling code we can add to find the hot spots in the code?
  • are particular features (autodiscovery, regular lookups, devices, services, etc.) more costly than others?
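
One way to start on the first two questions is sketched below: a cache wrapper that counts hits and misses and accumulates the time spent in the miss path. This is only an illustration and not the filter's actual caching code. `InstrumentedCache` and `slow_lookup` are hypothetical names, the plain Hash stands in for whatever cache the filter really uses, and the `sleep` stands in for a NetBox API call. The sketch is not thread-safe, so it would need a lock before being used across multiple pipeline workers.

```ruby
# Hypothetical wrapper: counts cache hits and misses and records the time
# spent computing values on a miss. Not thread-safe as written.
class InstrumentedCache
  attr_reader :hits, :misses, :miss_seconds

  def initialize
    @store = {}
    @hits = 0
    @misses = 0
    @miss_seconds = 0.0
  end

  # Return the cached value for key, or run the block (the slow lookup),
  # store its result, and record how long the block took.
  def fetch(key)
    if @store.key?(key)
      @hits += 1
      return @store[key]
    end

    @misses += 1
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    value = yield(key)
    @miss_seconds += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @store[key] = value
  end

  # Fraction of fetches served from the cache (0.0 when nothing was fetched).
  def hit_ratio
    total = @hits + @misses
    total.zero? ? 0.0 : @hits.to_f / total
  end
end

# Stand-in for a NetBox lookup (assumption: the real filter would query
# the NetBox API here instead of sleeping).
slow_lookup = lambda do |ip|
  sleep(0.01)
  { ip: ip, segment: 'example' }
end

cache = InstrumentedCache.new
%w[10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.1].each { |ip| cache.fetch(ip, &slow_lookup) }

puts "hits=#{cache.hits} misses=#{cache.misses} " \
     "ratio=#{cache.hit_ratio.round(2)} miss_time=#{cache.miss_seconds.round(3)}s"
```

Keeping one such counter per feature (segment lookups, device lookups, autodiscovery, and so on) would show both whether the cache settings are effective and which feature dominates the miss time.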

All in all, improving the speed of that code without sacrificing functionality would probably be the biggest performance benefit we could get for Malcolm.

@mmguero mmguero added netbox Related to Malcolm's use of NetBox performance Related to speed/performance labels Jan 9, 2025
@mmguero mmguero added this to the z.staging milestone Jan 9, 2025
@mmguero mmguero added this to Malcolm Jan 9, 2025
@mmguero mmguero moved this to Todo (investigate) in Malcolm Jan 9, 2025