AzurePublicDatasetV2 - Workload categories and VM roles #12

Open
AndreaMorichetta opened this issue Nov 17, 2020 · 2 comments

@AndreaMorichetta

Hello everyone,
I am using the dataset for research purposes, and I have some questions related to the workload. In vmtable.csv some VMs are labeled as "interactive", others as "delay-insensitive", and most of them as "unknown". I would like to know how this classification was performed and what the labels mean. E.g., is it safe to assume that the "interactive" workload includes web services?
Related to that, what does a deployment represent? Is it an application? Does it follow the definition of deployments used in container strategies?

Thank you very much in advance.

@mehdibouhamidi

Hello, I am using the dataset for research purposes too, and I didn't find a complete description of this dataset. If you did find something, please contact me; I would be grateful.

@clevilll

clevilll commented Aug 8, 2024

Hi,

Since no one has commented on this or offered a description from the data provider, it makes sense to do some EDA over time and plot the workload. One could also ask domain experts in cloud data about this. The following two examples are categorized as Delay-insensitive when you filter on their vmid in vmtable.csv:

Fig. 1: vmid = '//20EFdlSE3atYr9P03/9X4nF16d9RXI+JKVFfvpC281ohXWjFoS9L+ldKyb3ple'
Fig. 2: vmid = '/KlNtIK2BCkiGhURgesiA/MQNyTpAgt7daSRsu2kJldWyCBGwnZCbtXR3w+vR4kq'

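# Assumes `df` is a pandas DataFrame built from vmtable.csv that keeps at least
# the 'vmid' and 'vmcategory' columns (the loading step is not shown here).
# Note: DataFrame.to_markdown needs the optional `tabulate` package.

# Filter the VM_CPU_dataframe to find the category of the given vmid
vmid_to_check = '/KlNtIK2BCkiGhURgesiA/MQNyTpAgt7daSRsu2kJldWyCBGwnZCbtXR3w+vR4kq'
filtered_df = df[df['vmid'] == vmid_to_check]

# Print the filtered dataframe as a grid table
print(filtered_df.to_markdown(tablefmt="grid"))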
+--------+-------------------+------------------------------------------------------------------+
|        | vmcategory        | vmid                                                             |
+========+===================+==================================================================+
| 267924 | Delay-insensitive | /KlNtIK2BCkiGhURgesiA/MQNyTpAgt7daSRsu2kJldWyCBGwnZCbtXR3w+vR4kq |
+--------+-------------------+------------------------------------------------------------------+

# Filter the VM_CPU_dataframe to find the category of the given vmid
vmid_to_check = '//20EFdlSE3atYr9P03/9X4nF16d9RXI+JKVFfvpC281ohXWjFoS9L+ldKyb3ple'
filtered_df = df[df['vmid'] == vmid_to_check]

# Display the filtered dataframe
#display(filtered_df)
print(filtered_df.to_markdown(tablefmt="grid"))
+---------+-------------------+------------------------------------------------------------------+
|         | vmcategory        | vmid                                                             |
+=========+===================+==================================================================+
| 1011821 | Delay-insensitive | //20EFdlSE3atYr9P03/9X4nF16d9RXI+JKVFfvpC281ohXWjFoS9L+ldKyb3ple |
+---------+-------------------+------------------------------------------------------------------+
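
To look up several VMs in one pass and also see how the labels are distributed, something like the sketch below might help. It uses only the standard library and a tiny made-up sample in place of vmtable.csv. The two-column layout with a header row, the placeholder vmids, and the helper names `load_categories` and `lookup` are all assumptions for illustration, not the dataset's documented schema.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for vmtable.csv (assumption: the real file is far
# larger and has more columns). The vmids are placeholders, not real ones.
SAMPLE = """vmid,vmcategory
vm-a,Delay-insensitive
vm-b,Interactive
vm-c,Unknown
vm-d,Unknown
"""


def load_categories(text):
    """Return a dict mapping vmid -> vmcategory from CSV text with a header."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["vmid"]: row["vmcategory"] for row in reader}


def lookup(categories, vmids):
    """Return the category for each requested vmid (None if it is absent)."""
    return {vmid: categories.get(vmid) for vmid in vmids}


categories = load_categories(SAMPLE)

# Count how many VMs carry each label.
label_counts = Counter(categories.values())

print(lookup(categories, ["vm-a", "vm-c", "vm-z"]))
print(label_counts.most_common())
```

This only reports what the labels are and how often they occur; it says nothing about how the data provider assigned them.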

Without domain knowledge and comments from the data providers who collected the data, it is difficult to reason about these labels.

In the left picture, I see some delay, which is especially clear in 'avgcpu' at the start, but it is not very clear in the right picture. I am also interested to know why they classified this type of VM as delay-insensitive.
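
For comparing the 'avgcpu' behaviour of two VMs without relying on the plots alone, one option is to average the readings in fixed time buckets and compare the numbers side by side. The sketch below is only an illustration: the readings are made up, the tuple layout (timestamp in seconds, vmid, avgcpu) is an assumption about the CPU readings file, and `hourly_avgcpu` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (timestamp in seconds, vmid, avgcpu in percent).
# Both the values and the layout are assumptions for illustration.
READINGS = [
    (0, "vm-a", 10.0),
    (300, "vm-a", 20.0),
    (3600, "vm-a", 50.0),
    (0, "vm-b", 5.0),
    (3900, "vm-b", 15.0),
]


def hourly_avgcpu(readings, vmid, bucket_seconds=3600):
    """Average the avgcpu readings of one VM inside fixed-size time buckets."""
    buckets = defaultdict(list)
    for timestamp, reading_vmid, avgcpu in readings:
        if reading_vmid == vmid:
            # Integer division assigns each reading to its bucket index.
            buckets[timestamp // bucket_seconds].append(avgcpu)
    return {bucket: mean(values) for bucket, values in sorted(buckets.items())}


print(hourly_avgcpu(READINGS, "vm-a"))
print(hourly_avgcpu(READINGS, "vm-b"))
```

A summary like this can make a ramp at the start of a trace easier to spot, but it would not by itself explain why a VM was labeled delay-insensitive.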

  • Does the Unknown class mean that the VM is neither Delay-insensitive nor interactive?
  • On which factor was the classification based during data acquisition?
