[O365_metrics] Add teams_device_usage data stream. #12218
base: main
Conversation
🚀 Benchmarks report

| Data stream | Previous EPS | New EPS | Diff (%) | Result |
|---|---|---|---|---|
| mailbox_usage | 6451.61 | 4255.32 | -2196.29 (-34.04%) | 💔 |
To see the full report, comment with `/test benchmark fullreport`.
💚 Build Succeeded

cc @ritalwar

Quality Gate failed: Failed conditions
required: false
show_user: false
default:
  - o365.metrics.outlook.activity
Rename tag.
value: "8.16.0"
- json:
    field: teamsdeviceusage
    target_field: o365.metrics.teams.device.usage
Maybe add `ignore_failure: true` here so the pipeline doesn't break.
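For illustration, this is a sketch of what the suggestion would look like applied to the `json` processor shown in the diff (the surrounding pipeline is assumed):

```yaml
- json:
    field: teamsdeviceusage
    target_field: o365.metrics.teams.device.usage
    # If the source field is not valid JSON, skip this processor
    # instead of failing the whole document.
    ignore_failure: true
```

An alternative worth considering is an `on_failure` handler on the processor, which can tag the event with an error message rather than silently skipping the parse.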
type: group
fields:
  - name: android_phone.count
    type: integer
Can you please verify whether `type: integer` leads to any limitations in the value range? 2**31-1 is not a big value range. Please verify the other `integer` fields as well.
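If the counts could plausibly exceed 2^31-1 (about 2.15 billion), the field definitions could be widened to `long`; a hypothetical sketch based on the fields file in the diff:

```yaml
- name: android_phone.count
  # `long` is a signed 64-bit integer (max 2^63-1), avoiding the
  # 2^31-1 ceiling that `integer` imposes for very large tenants.
  type: long
```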
Since I couldn't find any specific details about these fields in the API documentation, and we're only retrieving 7 days of data, it should not exceed the limit. However, I agree that I should verify this, not just for this field but for all fields across all data streams. Additionally, using `long` might indeed be a better choice here. I've added this to the issue where we're tracking all review comments for verification across all data streams. I'll address this in a separate PR covering all the data streams. Does that sound okay?
### Teams Device Usage
Suggestion: It would be good to add a line or two about what this dataset is for.
I’ve added this to the issue list to update it for all datasets collectively in a single documentation PR.
I assume the TSDS enablement will be done through a separate PR. Please remember to add the `metric_type` mapping and TSDS enablement without fail.
field: o365.metrics.teams.device.usage.windows_phone
target_field: o365.metrics.teams.device.usage.windows_phone.count
ignore_missing: true
- rename:
For this and `refresh_date`, are we going to use the `date` processor to process them?
From my understanding, we use the `date` processor to parse dates from fields and then use the parsed date or timestamp as the document's timestamp in Elasticsearch. However, we cannot use `report_date` or `report_refresh_date` as the document timestamp because these fields are generated by Microsoft and do not represent the actual ingestion time of the document in Elasticsearch.

For example, if a report is fetched today (January 10) to retrieve the last 7 days of data, the `report_refresh_date` might be January 7, which would be the same for all 7 documents. Meanwhile, the `report_date` for these 7 documents would range from January 1 to January 7. This means neither `report_date` nor `report_refresh_date` can reliably represent the document's ingestion timestamp.

Please let me know if I've misunderstood something or if your question was different.
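For reference, if the fields were to be parsed into proper date types anyway (while leaving `@timestamp` as the ingestion time), a `date` processor with an explicit `target_field` could do it. A sketch, under the assumption the API returns these fields as `yyyy-MM-dd` strings:

```yaml
- date:
    field: o365.metrics.teams.device.usage.report_date
    # Write back to the same field instead of the default @timestamp,
    # so the ingestion timestamp is left untouched.
    target_field: o365.metrics.teams.device.usage.report_date
    formats:
      - "yyyy-MM-dd"
    timezone: UTC
    ignore_failure: true
```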
Okay, I understand now. But suppose you build a visualization: what are you going to use as the timestamp? If the user wants to use the visualization (or use the data for aggregation), then we cannot use the ingestion time, right? Are you going to manually switch the timestamp to these fields when plotting?
Good question! I assume they can use `report_date` if they want the visualizations to reflect the actual date the data pertains to, or `report_refresh_date` if the visualizations should represent the date the report was last updated or refreshed. What are your thoughts?
As part of the integration testing, do you see any latency in the availability of metrics? For example: if you run the integration for the current date, is the data available only up to yesterday? If so, it would be best to mention this latency in the README.
In the integration testing, I've set the API call interval to 3 minutes, and during this period I do see data ingestion occurring every 3 minutes. However, this short interval is just for testing purposes. In a production environment, a 3-minute interval is not practical. Ideally, the interval should be at least 24 hours, since the Microsoft API only fetches the latest report, and reports are usually refreshed every 3-4 days. So even if we call the API every 24 hours, the records might remain the same.

But I agree that documenting these details in the README is important, especially since we're receiving data for a 7-day window here, so users can configure both the period and the interval accordingly. While I assume Microsoft users should already be aware of how this works, I will make sure the README is detailed for clarity.
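To make the point concrete, a hedged sketch of the kind of settings this implies (the option names here are assumptions for illustration, not the integration's actual variable names):

```yaml
# Hypothetical integration settings (names are illustrative):
period: 7d      # trailing window of report data fetched per call
interval: 24h   # Microsoft refreshes these reports only every 3-4 days,
                # so polling more often than daily mostly returns
                # the same records again
```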
Proposed commit message
Add teams_device_usage data stream to O365_metrics package.
Checklist
changelog.yml file.
How to test this PR locally
Related issues
Screenshots