'/1.0/instances?recursion=2' Endpoint has missing information. #14277

Open
6 tasks
Kxiru opened this issue Oct 15, 2024 · 0 comments
Kxiru commented Oct 15, 2024

Required information

  • Distribution:
  • Distribution version:
  • The output of "snap list --all lxd core20 core22 core24 snapd":
  • The output of "lxc info" or if that fails:
    • Kernel version: 6.8.0-45-generic
    • LXC version: 5.21.2 LTS
    • LXD version: 5.21.2 LTS
    • Storage backend in use:

Issue description

This issue documents the findings of an investigation into the limitations and issues with using certain API endpoints for fetching instance data, specifically disk and memory information. The investigation focused on two main endpoints: the /1.0/metrics endpoint and the /1.0/instances?recursion=2 endpoint. Both endpoints exhibit shortcomings in their ability to provide the required information, especially when instances are in a "stopped" state.

Findings

1. Issues with the /1.0/metrics Endpoint
The metrics endpoint is currently used on the "Detail Instances" page to calculate various instance-related metrics.

  • Problem 1: It does not provide data when an instance is stopped. Specifically, metrics such as "lxd_memory_MemFree_bytes" and "lxd_memory_MemTotal_bytes" are absent from the API response when the instance is not running, so disk and memory totals cannot be calculated.
  • Problem 2: The metrics endpoint returns a large amount of data, much of which is filtered out after retrieval. This can impose a significant load on larger systems, making it a suboptimal choice for regular use. LXD-UI uses lazy loading to mitigate this, but that may not be a sustainable solution.

Given these limitations, it is not feasible to rely on the metrics endpoint for obtaining instance data, especially when aiming for a lightweight solution that works regardless of instance status.
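
To make Problem 1 concrete, here is a minimal sketch of how a client could check whether the memory metrics named above are present for a given instance. It assumes the endpoint returns Prometheus text-format samples and that each sample carries a `name` label identifying the instance; the sample text, the instance names, and the helper `memory_metrics_for` are illustrative, not taken from a real response.

```python
import re

# One Prometheus text-format sample line: metric{label="value",...} number
SAMPLE_RE = re.compile(
    r'^(?P<metric>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# The two metrics the issue reports as missing for stopped instances.
WANTED = {"lxd_memory_MemTotal_bytes", "lxd_memory_MemFree_bytes"}


def memory_metrics_for(metrics_text, instance):
    """Return the wanted memory metrics found for `instance` (hypothetical helper)."""
    found = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or match.group("metric") not in WANTED:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        # Assumption: the instance is identified by a "name" label.
        if labels.get("name") == instance:
            found[match.group("metric")] = float(match.group("value"))
    return found


# Illustrative input only: a running instance has samples, a stopped one has none.
SAMPLE = """\
# TYPE lxd_memory_MemTotal_bytes gauge
lxd_memory_MemTotal_bytes{name="running-c1",project="default",type="container"} 7.82334e+09
lxd_memory_MemFree_bytes{name="running-c1",project="default",type="container"} 7.5e+09
lxd_cpu_seconds_total{cpu="0",mode="user",name="running-c1",project="default",type="container"} 12.5
"""

running_metrics = memory_metrics_for(SAMPLE, "running-c1")
stopped_metrics = memory_metrics_for(SAMPLE, "stopped-c1")
```

An empty result for the stopped instance is the situation described in Problem 1: the client has nothing from which to compute a total.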

2. Issues with the /1.0/instances?recursion=2 Endpoint

The /1.0/instances?recursion=2 endpoint is designed to fetch comprehensive data on all instances. It should ideally return all necessary details, including disk and memory metrics, irrespective of the instance state.

  • Problem 1: When an instance is stopped, the total field for disk and memory metrics that is returned from the API is set to 0, meaning the data is not accurately reported. Please see the responses below for context.
  • Problem 2: When the instance is running, the disk total is still reported as 0, which makes this endpoint unreliable for fetching disk usage metrics.

Note: for memory usage, it is understandable that the endpoint returns no data when an instance is stopped, as no memory would be in use.

/1.0/instances?recursion=2 on a running instance

{
    "status": "Running",
    "status_code": 103,
    "disk": {
        "root": {
            "usage": 1183744,
            "total": 0
        }
    },
    "memory": {
        "usage": 1310720,
        "usage_peak": 0,
        "total": 7823340000,
        "swap_usage": 331776,
        "swap_usage_peak": 0
    },
...
}

(Note that although the instance is running, the disk 'total' returned is 0.)

/1.0/instances?recursion=2 on a Stopped Instance

{
    "status": "Stopped",
    "status_code": 102,
    "disk": {
        "root": {
            "usage": 1182720,
            "total": 0
        }
    },
    "memory": {
        "usage": 0,
        "usage_peak": 0,
        "total": 0,
        "swap_usage": 0,
        "swap_usage_peak": 0
    },
...
}

Note: disk data should still be available here, and perhaps the memory total as well.
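
The two responses above can be checked mechanically. The following is a minimal sketch, assuming the state object has the shape shown in this issue; `missing_totals` is a hypothetical helper name, and the input values are copied from the responses above (with the elided fields left out).

```python
def missing_totals(state):
    """List the 'total' fields that are reported as 0 in an instance state object."""
    problems = []
    for device, disk in (state.get("disk") or {}).items():
        if disk.get("total", 0) == 0:
            problems.append(f"disk.{device}.total")
    memory = state.get("memory") or {}
    if memory.get("total", 0) == 0:
        problems.append("memory.total")
    return problems


# Values taken from the running-instance response in this issue.
running = {
    "status": "Running",
    "status_code": 103,
    "disk": {"root": {"usage": 1183744, "total": 0}},
    "memory": {"usage": 1310720, "usage_peak": 0, "total": 7823340000,
               "swap_usage": 331776, "swap_usage_peak": 0},
}

# Values taken from the stopped-instance response in this issue.
stopped = {
    "status": "Stopped",
    "status_code": 102,
    "disk": {"root": {"usage": 1182720, "total": 0}},
    "memory": {"usage": 0, "usage_peak": 0, "total": 0,
               "swap_usage": 0, "swap_usage_peak": 0},
}

running_problems = missing_totals(running)
stopped_problems = missing_totals(stopped)
```

For the running instance only the root disk total is flagged; for the stopped instance both the disk total and the memory total are flagged.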

Steps to reproduce

  1. Create an instance in LXD-UI
  2. Attempt to view its disk/memory usage when the instance is running vs when it is stopped.
  3. Review the API responses from the /1.0/instances?recursion=2 endpoint.

Or

  1. Call the API on a running or stopped instance.
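
One way to call the API directly is over the local unix socket. The sketch below uses only the Python standard library and rests on several assumptions: the socket path is the one used by the snap package, the calling user has permission to access it, and each entry in the response's `metadata` list carries `name` and `state` fields. `UnixHTTPConnection`, `fetch_instances`, and `states_by_name` are hypothetical names chosen for illustration.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a unix socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        # The host name is a placeholder; only the socket path is used.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch_instances(socket_path="/var/snap/lxd/common/lxd/unix.socket"):
    """GET /1.0/instances?recursion=2 and return the decoded JSON body.

    Assumption: the default path is the snap install's socket.
    """
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/1.0/instances?recursion=2")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


def states_by_name(payload):
    """Map instance name to its state object (assumed response layout)."""
    return {
        inst["name"]: inst.get("state")
        for inst in payload.get("metadata") or []
    }
```

On a host with LXD running, `states_by_name(fetch_instances())` would give the per-instance state objects to compare between a running and a stopped instance.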

Information to attach

  • Any relevant kernel output (dmesg)
  • Container log (lxc info NAME --show-log)
  • Container configuration (lxc config show NAME --expanded)
  • Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
  • Output of the client with --debug
  • Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)
@tomponline tomponline added the Bug Confirmed to be a bug label Oct 15, 2024
@tomponline tomponline added this to the lxd-6.2 milestone Oct 15, 2024