[O365_metrics] Add teams_device_usage data stream. #12218

Open
wants to merge 2 commits into
base: main
12 changes: 11 additions & 1 deletion packages/o365_metrics/_dev/build/docs/README.md
@@ -82,4 +82,14 @@ Please refer to the following [document](https://www.elastic.co/guide/en/ecs/cur

Please refer to the following [document](https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html) for detailed information on ECS fields.

{{fields "active_users"}}
{{fields "active_users"}}

### Teams Device Usage

Contributor

Suggestion: It would be good to add a line or two about what the dataset is for.

Contributor Author

I've added this to the issue list so it can be updated for all datasets together in a single documentation PR.

{{event "teams_device_usage"}}

**ECS Field Reference**

Please refer to the following [document](https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html) for detailed information on ECS fields.

{{fields "teams_device_usage"}}
3 changes: 3 additions & 0 deletions packages/o365_metrics/changelog.yml
@@ -1,6 +1,9 @@
# newer versions go on top
- version: "0.2.0-next"
changes:
- description: Add `teams_device_usage` data stream.
type: enhancement
link: https://github.com/elastic/integrations/pull/12218
- description: Add `active_users` data stream.
type: enhancement
link: https://github.com/elastic/integrations/pull/11934
@@ -0,0 +1,2 @@
dynamic_fields:
"event.ingested": ".*"
@@ -0,0 +1,7 @@
{
"events": [
{
"teamsdeviceusage": "{\"Android Phone\":\"10\",\"Chrome OS\":\"20\",\"Linux\":\"30\",\"Mac\":\"40\",\"Report Date\":\"2024-12-25\",\"Report Period\":\"7\",\"Web\":\"40\",\"Windows\":\"5\",\"Windows Phone\":\"8\",\"iOS\":\"2\",\"Report Refresh Date\":\"2024-12-31\"}"
}
]
}
@@ -0,0 +1,44 @@
{
"expected": [
{
"ecs": {
"version": "8.16.0"
},
"o365": {
"metrics": {
"teams": {
"device": {
"usage": {
"android_phone": {
"count": "10"
},
"ios": {
"count": "2"
},
"mac": {
"count": "40"
},
"report": {
"date": "2024-12-25",
"period": {
"day": "7"
},
"refresh_date": "2024-12-31"
},
"web": {
"count": "40"
},
"windows": {
"count": "5"
},
"windows_phone": {
"count": "8"
}
}
}
}
}
}
}
]
}
@@ -0,0 +1,84 @@
config_version: 2
interval: {{interval}}
auth.oauth2:
client.id: {{client_id}}
client.secret: {{client_secret}}
provider: azure
scopes:
{{#each token_scopes as |token_scope|}}
- {{token_scope}}
{{/each}}
endpoint_params:
grant_type: client_credentials
{{#if token_url}}
token_url: {{token_url}}/{{azure_tenant_id}}/oauth2/v2.0/token
{{else if azure_tenant_id}}
azure.tenant_id: {{azure_tenant_id}}
{{/if}}

resource.url: {{url}}
{{#if resource_ssl}}
resource.ssl:
{{resource_ssl}}
{{/if}}

{{#if enable_request_tracer}}
resource.tracer.filename: "../../logs/cel/http-request-trace-*.ndjson"
{{/if}}

tags:
{{#if preserve_original_event}}
- preserve_original_event
{{/if}}
{{#each tags as |tag|}}
- {{tag}}
{{/each}}
{{#contains "forwarded" tags}}
publisher_pipeline.disable_host: true
{{/contains}}
{{#if processors}}
processors:
{{processors}}
{{/if}}

state:
want_more: false
base:
tenant_id: "{{azure_tenant_id}}"
period: "{{period}}"

redact:
fields:
- base.tenant_id


program: |
state.with(
request(
"GET",
state.url + "/reports/getTeamsDeviceUsageUserCounts(period='" + state.base.period + "')"
).do_request().as(resp,
resp.StatusCode == 200
?
bytes(resp.Body).mime("text/csv; header=present").as(events, {
"events": events.map(e, {"teamsdeviceusage": e.encode_json()}),

})
:
{
"events": {
"error": {
"code": string(resp.StatusCode),
"id": string(resp.Status),
"message": "GET:"+(
size(resp.Body) != 0 ?
string(resp.Body)
:
string(resp.Status) + ' (' + string(resp.StatusCode) + ')'
),
},
},
"want_more": false,
}
)
)
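
As a rough illustration of what the success branch of the `program` above appears to produce, the following Python sketch parses a CSV body that has a header row and wraps each row as a JSON string under the `teamsdeviceusage` key. The function name `csv_to_events` and the sample body are made up for illustration; the real input evaluates CEL, not Python, and the column names are taken from the test fixture in this PR.

```python
import csv
import io
import json


def csv_to_events(body: bytes) -> dict:
    """Mimic the 200 branch: one event per CSV row, each row encoded as JSON."""
    text = body.decode("utf-8")
    rows = csv.DictReader(io.StringIO(text))  # first line is treated as the header
    return {"events": [{"teamsdeviceusage": json.dumps(dict(row))} for row in rows]}


# Hypothetical response body, not real API output.
sample = (
    b"Report Refresh Date,Web,Windows,Report Date,Report Period\n"
    b"2024-12-31,40,5,2024-12-25,7\n"
)
result = csv_to_events(sample)
```

Every value stays a string at this stage, which matches the string counts in the expected test output.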
@@ -0,0 +1,82 @@
---
description: Pipeline for renaming object
processors:
- set:
field: ecs.version
value: "8.16.0"
- json:
field: teamsdeviceusage
target_field: o365.metrics.teams.device.usage
Contributor

Maybe add ignore_failure: true here so the pipeline doesn't break.

- script:
lang: painless
description: Replace spaces and dashes in field names under o365.metrics.teams.device.usage.
tag: painless_purge_spaces_and_dashes
if: ctx.o365.metrics?.teams.device.usage instanceof Map
source: |
String underscore(String s) {
String result = /[ -]/.matcher(s).replaceAll('_').toLowerCase();
return /[\ufeff]/.matcher(result).replaceAll('')
}

def out = [:];
for (def item : ctx.o365.metrics.teams.device.usage.entrySet()) {
out[underscore(item.getKey())] = item.getValue();
}
ctx.o365.metrics.teams.device.usage = out;

- remove:
if: ctx.teamsdeviceusage != null
field: teamsdeviceusage
ignore_missing: true

- rename:
field: o365.metrics.teams.device.usage.android_phone
target_field: o365.metrics.teams.device.usage.android_phone.count
ignore_missing: true
- rename:
field: o365.metrics.teams.device.usage.ios
target_field: o365.metrics.teams.device.usage.ios.count
ignore_missing: true
- rename:
field: o365.metrics.teams.device.usage.mac
target_field: o365.metrics.teams.device.usage.mac.count
ignore_missing: true
- rename:
field: o365.metrics.teams.device.usage.web
target_field: o365.metrics.teams.device.usage.web.count
ignore_missing: true
- rename:
field: o365.metrics.teams.device.usage.windows
target_field: o365.metrics.teams.device.usage.windows.count
ignore_missing: true
- rename:
field: o365.metrics.teams.device.usage.windows_phone
target_field: o365.metrics.teams.device.usage.windows_phone.count
ignore_missing: true
- rename:
Member

For this field and refresh_date, are we going to use the date processor to parse them?

Contributor Author

From my understanding, the date processor parses a date from a field, and the parsed value is then typically used as the document's timestamp in Elasticsearch. We cannot use report_date or report_refresh_date as the document timestamp, because these fields are generated by Microsoft and do not represent when the document was actually ingested into Elasticsearch.

For example, if a report is fetched today (January 10) to retrieve the last 7 days of data, the report_refresh_date might be January 7, and it would be the same for all 7 documents. Meanwhile, the report_date for these 7 documents would range from January 1 to January 7. So neither report_date nor report_refresh_date can reliably represent the document's ingestion timestamp.

Please let me know if I’ve misunderstood something or if your question was different.
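
The example above can be sketched in a few lines of Python. This assumes, as in the example, that a report covers the `period` days ending on its refresh date; the year 2025 and the helper name `report_dates` are hypothetical.

```python
from datetime import date, timedelta


def report_dates(refresh: date, period_days: int) -> list:
    """Dates covered by a report, assuming it ends on its refresh date."""
    return [refresh - timedelta(days=offset) for offset in range(period_days - 1, -1, -1)]


fetched_on = date(2025, 1, 10)    # hypothetical day the API is called
refresh_date = date(2025, 1, 7)   # hypothetical Report Refresh Date, shared by all rows
dates = report_dates(refresh_date, 7)  # one Report Date per document
```

All seven documents share one refresh date and are ingested on the fetch day, so only `report_date` varies between them.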

Member

Okay, I understand now. But when you build the visualizations, what are you going to use as the timestamp? If a user wants to visualize or aggregate this data, we cannot use the ingestion time, right? Are you going to manually switch the time field to one of these fields when plotting?

Contributor Author

Good question! I assume they can use report_date if they want the visualizations to reflect the actual date the data pertains to, or report_refresh_date if the visualizations should represent the date the report was last updated or refreshed. What are your thoughts?

Contributor

During integration testing, did you see any latency in the availability of metrics?

For example, if you run the integration for the current date, the data might be available only up to yesterday.

If so, it would be best to mention this latency in the README.

Contributor Author

In integration testing I set the API call interval to 3 minutes, and I do see data being ingested every 3 minutes. That short interval is only for testing, though; it is not practical in production. Ideally the interval should be at least 24 hours, because the Microsoft API only returns the latest report, and reports are usually refreshed every 3-4 days. So even if we call the API every 24 hours, the records might remain the same.

I agree that it is important to document this in the README, especially since we receive around 7 days of data here, so that users can configure both the period and the interval accordingly. I assume Microsoft users are already aware of how this works, but I will make sure the README is detailed enough to be clear.

field: o365.metrics.teams.device.usage.report_date
target_field: o365.metrics.teams.device.usage.report.date
ignore_missing: true
- rename:
field: o365.metrics.teams.device.usage.report_period
target_field: o365.metrics.teams.device.usage.report.period.day
ignore_missing: true
- rename:
field: o365.metrics.teams.device.usage.report_refresh_date
target_field: o365.metrics.teams.device.usage.report.refresh_date
ignore_missing: true
- remove:
field: o365.metrics.teams.device.usage.linux
ignore_missing: true
- remove:
field: o365.metrics.teams.device.usage.chrome_os
ignore_missing: true

on_failure:
- append:
field: error.message
value: 'Processor {{{_ingest.on_failure_processor_type}}} with tag {{{_ingest.on_failure_processor_tag}}} in pipeline {{{_ingest.on_failure_pipeline}}} failed with message: {{{_ingest.on_failure_message}}}'
- append:
field: event.kind
value: pipeline_error
allow_duplicates: false
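
To make the effect of the pipeline above easier to follow, here is a minimal Python sketch of the same reshaping: normalize the keys, drop the two platforms the pipeline removes, nest the counts under `count`, and group the report fields. It is an approximation for illustration only; the names `underscore` and `normalize` mirror the Painless helper but are otherwise hypothetical, and the leading byte-order mark in the sample is an assumed input.

```python
import re

# Fields that the pipeline renames to "<name>.count".
COUNT_FIELDS = ("android_phone", "ios", "mac", "web", "windows", "windows_phone")


def underscore(key: str) -> str:
    """Lower-case the key, turn spaces and dashes into underscores, drop any BOM."""
    result = re.sub(r"[ -]", "_", key).lower()
    return result.replace("\ufeff", "")


def normalize(row: dict) -> dict:
    usage = {underscore(k): v for k, v in row.items()}
    # The pipeline removes these two fields outright.
    for dropped in ("linux", "chrome_os"):
        usage.pop(dropped, None)
    out = {}
    for key, value in usage.items():
        if key in COUNT_FIELDS:
            out[key] = {"count": value}
        elif key == "report_date":
            out.setdefault("report", {})["date"] = value
        elif key == "report_period":
            out.setdefault("report", {})["period"] = {"day": value}
        elif key == "report_refresh_date":
            out.setdefault("report", {})["refresh_date"] = value
        else:
            out[key] = value  # anything else is left untouched
    return out


sample = {
    "\ufeffReport Refresh Date": "2024-12-31",  # assumed BOM on the first header
    "Android Phone": "10",
    "Chrome OS": "20",
    "Linux": "30",
    "Report Date": "2024-12-25",
    "Report Period": "7",
    "iOS": "2",
}
normalized = normalize(sample)
```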
@@ -0,0 +1,33 @@
- name: cloud
title: Cloud
group: 2
description: Fields related to the cloud or infrastructure the events are coming from.
footnote: 'Examples: If Metricbeat is running on an EC2 host and fetches data from its host, the cloud info contains the data about this machine. If Metricbeat runs on a remote machine outside the cloud and fetches data from a service running in the cloud, the field contains cloud data from the machine the service is running on.'
type: group
fields:
- name: image.id
type: keyword
description: Image ID for the cloud instance.
- name: host
title: Host
group: 2
description: 'A host is defined as a general computing instance. ECS host.* fields should be populated with details about the host on which the event happened, or from which the measurement was taken. Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes.'
type: group
fields:
- name: containerized
type: boolean
description: >
If the host is a container.

- name: os.build
type: keyword
example: "18D109"
description: >
OS build information.

- name: os.codename
type: keyword
example: "stretch"
description: >
OS codename, if any.

@@ -0,0 +1,12 @@
- name: data_stream.type
type: constant_keyword
description: Data stream type.
- name: data_stream.dataset
type: constant_keyword
description: Data stream dataset.
- name: data_stream.namespace
type: constant_keyword
description: Data stream namespace.
- name: '@timestamp'
type: date
description: Event timestamp.
@@ -0,0 +1,43 @@
- name: o365.metrics.teams.device.usage
type: group
fields:
- name: android_phone.count
type: integer
Contributor

Can you please verify whether `type: integer` imposes any limitation on the value range? 2**31-1 is not a large upper bound. Please verify this for the other integer fields as well.

Contributor Author

Since I couldn't find any specific details about these fields in the API documentation and we're only retrieving 7 days of data, it should not exceed the limit. However, I agree that I should verify this, not just for this field but for all fields across all data streams. Additionally, using long might indeed be a better choice here. I've added this to the issue where we're tracking all review comments for verification across all data streams. I'll address this in a separate PR covering all the data streams. Does that sound okay?
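
For reference, the range question can be checked with a small Python sketch. It relies only on the documented bounds of the Elasticsearch `integer` (signed 32-bit) and `long` (signed 64-bit) types; the helper names are hypothetical, and the sample values are not taken from the API.

```python
INTEGER_MAX = 2**31 - 1  # upper bound of the Elasticsearch `integer` type
LONG_MAX = 2**63 - 1     # upper bound of the Elasticsearch `long` type


def fits(value: int, upper: int) -> bool:
    """True if value lies in the signed range whose maximum is `upper`."""
    return -(upper + 1) <= value <= upper


def smallest_type(value: int) -> str:
    """Smallest of the two mapping types that can hold the value."""
    if fits(value, INTEGER_MAX):
        return "integer"
    if fits(value, LONG_MAX):
        return "long"
    raise ValueError("value does not fit in a 64-bit signed integer")
```

A per-platform user count would have to exceed roughly 2.1 billion before `integer` overflows, so `long` is mainly a safety margin here.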

description: |
The number of active Teams users on Android devices.
- name: ios.count
type: integer
description: |
The number of active Teams users on iOS devices (iPhone and iPad).
- name: mac.count
type: integer
description: |
The number of active Teams users on macOS devices.
- name: web.count
type: integer
description: |
The number of active Teams users accessing via web browsers.
- name: windows.count
type: integer
description: |
The number of active Teams users on Windows devices.
- name: windows_phone.count
type: integer
description: |
The number of active Teams users on Windows Phone devices.
- name: report
type: group
fields:
- name: period.day
unit: d
type: integer
description: |
The duration (e.g., 7 days) over which the report data is aggregated.
- name: refresh_date
type: date
description: |
The date when the report data was last updated.
- name: date
type: date
description: |
The specific date for which the report data applies.
@@ -0,0 +1,48 @@
title: Microsoft Office 365 Teams Device Usage metrics
type: metrics
streams:
- input: cel
title: Office 365 Teams Device Usage metrics.
enabled: true
description: Collect Office 365 Teams Device Usage metrics.
template_path: cel.yml.hbs
vars:
- name: interval
type: text
title: Interval
description: The interval at which the API is polled, supported in seconds, minutes, and hours.
show_user: true
required: true
default: 3m
- name: processors
type: yaml
title: Processors
multi: false
required: false
show_user: false
description: >
Processors are used to reduce the number of fields in the exported event or to enhance the event with metadata. This executes in the agent before the logs are parsed. See [Processors](https://www.elastic.co/guide/en/fleet/current/elastic-agent-processor-configuration.html) for details.
- name: period
type: text
title: Period
description: >
Specifies the length of time over which the report is aggregated. The supported values are: D7, D30, D90, and D180.
show_user: true
required: true
default: D7
- name: tags
type: text
title: Tags
multi: true
required: false
show_user: false
default:
- o365.metrics.outlook.activity
Contributor

Rename tag.

- name: enable_request_tracer
type: bool
title: Enable request tracing
multi: false
required: false
show_user: false
description: >-
The request tracer logs HTTP requests and responses to the agent's local file-system for debugging configurations. Enabling this request tracing compromises security and should only be used for debugging. See [documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-cel.html#_resource_tracer_filename) for details.