Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[System] Add dimensions to system package metrics data_streams only, except core data_streams #6118

Merged
merged 17 commits into from
May 31, 2023

Conversation

tetianakravchenko
Copy link
Contributor

@tetianakravchenko tetianakravchenko commented May 8, 2023

What does this PR do?

Add/adjust dimensions fields:

  • ⛔ (when adding core.id fields all documents are dropped, need to double-check, will add in different PR) core - add standard fields + core.id
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index
  • ✅ cpu - adjust list of fields according to [ECS] [TSDB] Centralisation of Dimension Fields  #5193 (comment).
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface ✅
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index ✅
  • ✅ diskio - add standard fields + system.diskio.name
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface ✅
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index ✅
  • ✅ (retest) filesystem - add standard fields + system.filesystem.device_name and system.filesystem.mount_point
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface ✅
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index ✅-
  • ✅ fsstat - add standard fields
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface ✅
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index ✅
  • ✅ load - adjust list of fields according to [ECS] [TSDB] Centralisation of Dimension Fields  #5193 (comment).
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface ✅
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index ✅
  • ✅ memory - adjust list of fields according to [ECS] [TSDB] Centralisation of Dimension Fields  #5193 (comment).
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface ✅
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index ✅
  • ⛔ network - add standard fields + system.network.name - will be added in separate PR
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface ✅
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index ✅
  • ⛔ process - add standard fields + process.pid- will be added in separate PR
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index
  • ✅ process_summary - add standard fields
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface ✅
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index ✅
  • ✅ socket_summary - add standard fields
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface ✅
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index ✅
  • ✅ uptime - add standard fields
    • Verification of the count of documents (before & after TSDB enablement) in Discover Interface ✅
    • For every field annotated with dimension:true, there exists time_series_dimension: true annotation in the index ✅

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

@tetianakravchenko tetianakravchenko marked this pull request as ready for review May 8, 2023 14:25
@tetianakravchenko tetianakravchenko requested a review from a team as a code owner May 8, 2023 14:25
Signed-off-by: Tetiana Kravchenko <[email protected]>
@elasticmachine
Copy link

elasticmachine commented May 8, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-05-31T09:22:26.110+0000

  • Duration: 15 min 4 sec

Test stats 🧪

Test Results
Failed 0
Passed 145
Skipped 0
Total 145

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

Signed-off-by: Tetiana Kravchenko <[email protected]>
Signed-off-by: Tetiana Kravchenko <[email protected]>
Signed-off-by: Tetiana Kravchenko <[email protected]>
Signed-off-by: Tetiana Kravchenko <[email protected]>
@elasticmachine
Copy link

elasticmachine commented May 8, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (3/3) 💚
Files 100.0% (4/4) 💚
Classes 100.0% (4/4) 💚
Methods 60.759% (48/79)
Lines 99.536% (2790/2803)
Conditionals 100.0% (0/0) 💚

Signed-off-by: Tetiana Kravchenko <[email protected]>
@tetianakravchenko tetianakravchenko changed the title [System] Add dimensions to system package metrics data_streams only, except core and process data_streams [System] Add dimensions to system package metrics data_streams only, except core data_streams May 9, 2023
Copy link
Contributor

@agithomas agithomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

system.process.cmdline should be considered as a dimension field.

@agithomas
Copy link
Contributor

system.process.cmdline should be considered as a dimension field.

Please ignore the above review comment.

@lalit-satapathy
Copy link
Collaborator

CC: @ruflin current system package migration PR.

@@ -202,6 +209,13 @@
description: Process metrics.
type: group
fields:
- name: pid
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: it implies that new time series will be created when restarting a process, the same for k8s environment since we have this setting in manifest: https://github.com/elastic/elastic-agent/blob/main/deploy/kubernetes/elastic-agent-managed-kubernetes.yaml#L29

@@ -202,6 +209,13 @@
description: Process metrics.
type: group
fields:
- name: pid
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a field, for which mapping was completely missing before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lalit-satapathy yes. this field is defined in ecs https://www.elastic.co/guide/en/ecs/current/ecs-process.html
note: there are other process.* fields in document, I've added here only the relevant for this PR field
document example with process.* fields:
Screenshot 2023-05-10 at 17 11 55

@@ -1357,7 +1358,7 @@ This data should be available without elevated permissions.
| host.os.full.text | Multi-field of `host.os.full`. | match_only_text | | |
| host.os.kernel | Operating system kernel version as a raw string. | keyword | | |
| host.os.name | Operating system name, without the version. | keyword | | |
| host.os.name.text | Multi-field of `host.os.name`. | match_only_text | | |
| host.os.name.text | Multi-field of `host.os.name`. | text | | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this change coming from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't touch any ecs fields in this PR, I thought that this is coming from the ecs definition, when running elastic-package build
but, you are right - it should still be match_only_text from the ecs fields definition - https://github.com/elastic/ecs/blob/8.0/experimental/generated/beats/fields.ecs.yml#L3857-L3866

it appeared that this field is defined in fields folder twice - in agent.yml

- name: os.name
level: extended
type: keyword
ignore_above: 1024
multi_fields:
- name: text
type: text
norms: false
default_field: false
description: Operating system name, without the version.
example: Mac OS X
and in ecs.yml -
- external: ecs
name: host.os.name

I will remove the custom definition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done - b762ecc

there are lots of duplicated fields, to not mix up fields clean up and tsdb dimensions pr - I am not going to introduce more fields removal here. @elastic/elastic-agent-data-plane could you please take care of it?

I think that elastic-package check could verify that there are no duplicated fields to avoid such situations, wdyt? should it be created as a feature request for elastic-package?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: elastic/elastic-package#1287, there is already available a check for the duplicated fields, it is only needed to update the format_version

@@ -1733,7 +1737,7 @@ This data should be available without elevated permissions.
| host.mac | Host mac addresses. | keyword | | |
| host.name | Name of the host. It can contain what `hostname` returns on Unix systems, the fully qualified domain name, or a name specified by the user. The sender decides which value to use. | keyword | | |
| host.network.in.bytes | The number of bytes received on all network interfaces by the host in a given period of time. | scaled_float | byte | counter |
| host.network.in.packets | The number of packets received on all network interfaces by the host in a given period of time. | long | | |
| host.network.in.packets | The number of packets received on all network interfaces by the host in a given period of time. | scaled_float | | counter |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks off. Why is this suddenly a scaled float but at the same time a counter which I assume is always a long ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same here: the same field defined twice: once in fields.yml:

- name: network.in.bytes
type: scaled_float
format: bytes
unit: byte
metric_type: counter
description: |
The number of bytes received on all network interfaces by the host in a given period of time.

another one in agent.yml:

- name: network.in.bytes
type: long
format: bytes
description: >
The number of bytes received on all network interfaces by the host in a given period of time.

I am not sure what is a correct 😕 Could be that the files evaluation sequence changed that caused this

@elastic/elastic-agent-data-plane I need some help here, what is the correct value for this field?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to say the one in fields.yml is wrong. Two questions:

  • Can you check what is loaded today if you install the package? Hopefully long.
  • Can you check what we have in metricbeat for the same?

If we load long today and it is also the one used in Metricbeat, lets fix the fields.yml file. Otherwise lets continue the discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check what is loaded today if you install the package? Hopefully long.

long
Screenshot 2023-05-31 at 11 08 05

Can you check what we have in metricbeat for the same?

also long - https://github.com/elastic/beats/blob/main/metricbeat/module/system/network/_meta/fields.yml#L30-L33

I've reverted network changes for this PR, will create an another one

Signed-off-by: Tetiana Kravchenko <[email protected]>
@ruflin
Copy link
Member

ruflin commented May 31, 2023

Could we take out the process and network changes to get this in quickly and follow up with the other two separately?

Signed-off-by: Tetiana Kravchenko <[email protected]>
Signed-off-by: Tetiana Kravchenko <[email protected]>
Signed-off-by: Tetiana Kravchenko <[email protected]>
Signed-off-by: Tetiana Kravchenko <[email protected]>
@tetianakravchenko
Copy link
Contributor Author

Could we take out the process and network changes to get this in quickly and follow up with the other two separately?

@ruflin done, process and network data_stream changes were reverted

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

As this PR does mostly add dimensions + some cleanups, this should be low risk.

@@ -1350,7 +1351,7 @@ This data should be available without elevated permissions.
| host.hostname | Hostname of the host. It normally contains what the `hostname` command returns on the host machine. | keyword | | |
| host.id | Unique host id. As hostname is not always unique, use values that are meaningful in your environment. Example: The current usage of `beat.name`. | keyword | | |
| host.ip | Host ip addresses. | ip | | |
| host.mac | Host MAC addresses. The notation format from RFC 7042 is suggested: Each octet (that is, 8-bit byte) is represented by two [uppercase] hexadecimal digits giving the value of the octet as an unsigned integer. Successive octets are separated by a hyphen. | keyword | | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where this change is coming from but I assume pulled in from some ECS changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same issue as above - there are duplication one in ecs file, another in

- name: mac
level: core
type: keyword
ignore_above: 1024
description: Host mac addresses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets ignore it for now.

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

As this PR does mostly add dimensions + some cleanups, this should be low risk.

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

As this PR does mostly add dimensions + some cleanups, this should be low risk.

@agithomas
Copy link
Contributor

Referring to #5193 (comment), please include agent.id as the dimension field.

@agithomas
Copy link
Contributor

Referring to #5193 (comment), please include agent.id as the dimension field.

I see it is already handled. Please ignore this

Copy link
Contributor

@agithomas agithomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tetianakravchenko
Copy link
Contributor Author

@elastic/elastic-agent-data-plane could you please review this PR? I can't merge it since Code owner review required

@lalit-satapathy
Copy link
Collaborator

@tetianakravchenko,

Can we make this beta and update the stack.version as in this PR: #6129

@lalit-satapathy
Copy link
Collaborator

@tetianakravchenko,

Please confirm that system dashboards loading fine (without any errors) in the TSDB mode?

@tetianakravchenko
Copy link
Contributor Author

tetianakravchenko commented May 31, 2023

@tetianakravchenko,
Can we make this beta and update the stack.version as in this PR: #6129

@lalit-satapathy before doing it - this PR also should be merged #6395, there are some missing metric_type metadata fields.

after merging this and #6395, tsdb can be enabled for all except: core, network, process datastream

@tetianakravchenko
Copy link
Contributor Author

@tetianakravchenko,
Please confirm that system dashboards loading fine (without any errors) in the TSDB mode?

@lalit-satapathy yes, dashboards and inventory page works (note: tested on linux system only, so the windows dashboard have some errors regarding missing metrics)

@cmacknz cmacknz requested a review from leehinman May 31, 2023 12:42
@elasticmachine
Copy link

Package system - 1.30.0 containing this change is available at https://epr.elastic.co/search?package=system

agithomas pushed a commit to agithomas/integrations that referenced this pull request Jun 5, 2023
…except core data_streams (elastic#6118)

* add dimensions to system package, metrics data_streams only

Signed-off-by: Tetiana Kravchenko <[email protected]>

* revert k8s package

Signed-off-by: Tetiana Kravchenko <[email protected]>

* revert k8s changes; add extra dimensions for diskio and network

Signed-off-by: Tetiana Kravchenko <[email protected]>

* filesystem: add mount_point and device_name

Signed-off-by: Tetiana Kravchenko <[email protected]>

* move core and process datastream to different PR

Signed-off-by: Tetiana Kravchenko <[email protected]>

* update PR number

Signed-off-by: Tetiana Kravchenko <[email protected]>

* move agent.id field to ecs.yaml

Signed-off-by: Tetiana Kravchenko <[email protected]>

* run elastic-package build

Signed-off-by: Tetiana Kravchenko <[email protected]>

* run elastic-package build

Signed-off-by: Tetiana Kravchenko <[email protected]>

* elastic-package check

Signed-off-by: Tetiana Kravchenko <[email protected]>

* add dimensions to process data_stream

Signed-off-by: Tetiana Kravchenko <[email protected]>

* clean up some duplicated fields

Signed-off-by: Tetiana Kravchenko <[email protected]>

* revert network data_stream changes

Signed-off-by: Tetiana Kravchenko <[email protected]>

* rever process data_stream changes

Signed-off-by: Tetiana Kravchenko <[email protected]>

* adjust the changelog description

Signed-off-by: Tetiana Kravchenko <[email protected]>

* revert process data_stream changes

Signed-off-by: Tetiana Kravchenko <[email protected]>

---------

Signed-off-by: Tetiana Kravchenko <[email protected]>
sodhikirti07 pushed a commit that referenced this pull request Jun 15, 2023
…except core data_streams (#6118)

* add dimensions to system package, metrics data_streams only

Signed-off-by: Tetiana Kravchenko <[email protected]>

* revert k8s package

Signed-off-by: Tetiana Kravchenko <[email protected]>

* revert k8s changes; add extra dimensions for diskio and network

Signed-off-by: Tetiana Kravchenko <[email protected]>

* filesystem: add mount_point and device_name

Signed-off-by: Tetiana Kravchenko <[email protected]>

* move core and process datastream to different PR

Signed-off-by: Tetiana Kravchenko <[email protected]>

* update PR number

Signed-off-by: Tetiana Kravchenko <[email protected]>

* move agent.id field to ecs.yaml

Signed-off-by: Tetiana Kravchenko <[email protected]>

* run elastic-package build

Signed-off-by: Tetiana Kravchenko <[email protected]>

* run elastic-package build

Signed-off-by: Tetiana Kravchenko <[email protected]>

* elastic-package check

Signed-off-by: Tetiana Kravchenko <[email protected]>

* add dimensions to process data_stream

Signed-off-by: Tetiana Kravchenko <[email protected]>

* clean up some duplicated fields

Signed-off-by: Tetiana Kravchenko <[email protected]>

* revert network data_stream changes

Signed-off-by: Tetiana Kravchenko <[email protected]>

* rever process data_stream changes

Signed-off-by: Tetiana Kravchenko <[email protected]>

* adjust the changelog description

Signed-off-by: Tetiana Kravchenko <[email protected]>

* revert process data_stream changes

Signed-off-by: Tetiana Kravchenko <[email protected]>

---------

Signed-off-by: Tetiana Kravchenko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants