Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Zypper updates and patches script metrics #222

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

Wabri
Copy link

@Wabri Wabri commented Oct 14, 2024

No description provided.

@Wabri Wabri marked this pull request as ready for review October 15, 2024 11:40
zypper.sh Outdated Show resolved Hide resolved
zypper.sh Outdated Show resolved Hide resolved
@dswarbrick
Copy link
Member

I notice that several of the metrics are specified as counters, but they will presumably fluctuate up and down as the number of available updates ebbs and flows. As such, they should be designated as gauge, not counter.

Whilst this repo accepts both shell scripts and Python scripts, once a shell script starts to include non-trivial amounts of awk, I think my personal feeling would be that it's time to write it in Python - where you can leverage the official Prometheus Python client library.

@Wabri
Copy link
Author

Wabri commented Oct 19, 2024

Ok, I'll create the python script so we can compare what are the most complex to maintain or read 👍

@Wabri Wabri marked this pull request as draft October 19, 2024 16:38
@Wabri Wabri force-pushed the feature/zypper branch 6 times, most recently from c95cf87 to 2c54027 Compare October 23, 2024 09:41
@Wabri Wabri marked this pull request as ready for review October 23, 2024 09:46
@Wabri
Copy link
Author

Wabri commented Oct 23, 2024

Hi @dswarbrick I've transcript the basch script into python script. Let us know what do you think about it!

@dswarbrick
Copy link
Member

@Wabri I won't take a serious look at this PR until the lint CI job is green. However, you seem to have overlooked my comment where I said that Python scripts can utilize the official Prometheus Python client library. This was more than just a suggestion - it's a requirement for Python scripts in this repo. It avoids the need for people to (re)invent their own text exposition format functions, which are often incomplete or buggy.

@Wabri Wabri force-pushed the feature/zypper branch 2 times, most recently from 5ef22b9 to 4eb861d Compare October 24, 2024 16:38
@Wabri
Copy link
Author

Wabri commented Oct 25, 2024

@dswarbrick let me know if there is anything to change in the code using the prometheus python client library.

@dswarbrick
Copy link
Member

@Wabri There seems to be some miscommunication here, because your Python script does not make use of the official Prometheus Python client library at all.

Let me be clear. Python code such as this will not be accepted by this repo:

    print('# HELP zypper_patches_pending_reboot_total zypper patches available which require '
          'reboot total')
    print('# TYPE zypper_patches_pending_reboot_total counter')
    print_patches_sum(data_zypper_lp,
                      prefix="zypper_patches_pending_reboot_total",
                      filters={'Interactive': 'reboot'})

The rendering of text exposition format metrics is handled by the client_python library, avoiding the need for users to implement their own (often incomplete and / or buggy) methods.

Documentation for client_python can be found at https://prometheus.github.io/client_python/, and you can also look at the existing Python scripts in this repo for some examples of how to use it.

Jumping ahead a little bit, I also see that there is no error checking or handling being performed when calling the /usr/bin/zypper external process. It would be nice to see something added there, e.g. handle command execution failure gracefully, output error (to stderr) and exit with non-zero status.

@Wabri
Copy link
Author

Wabri commented Oct 27, 2024

No miscommunication, I understand that you want the library in my python script, but I've never use it and in my previous comment I was just asking where do I need to use it. I saw that the library is lacking of documentation, for me at least.

Thanks for the clarification, I'll go ahead with the changes you ask to do and I'll be back.

@Wabri Wabri marked this pull request as draft October 27, 2024 08:28
@Wabri Wabri force-pushed the feature/zypper branch 2 times, most recently from 3014ba2 to 68a414f Compare October 27, 2024 16:49
@Wabri
Copy link
Author

Wabri commented Oct 27, 2024

Hi @dswarbrick!

I've done some changes:

  • I removed all the non standard information print by replacing them with the prometheus client library functions
  • Add the subprocess error handling with the output of stderr
  • Refactor the extraction of the zypper list datas to let the code be more readable

I've not tested the scripts yet, I'll set the pr ready for review when I've done it!

@dswarbrick
Copy link
Member

I saw that the library is lacking of documentation, for me at least.

I literally posted a link to the client_python documentation in my previous comment: https://prometheus.github.io/client_python/

@Wabri
Copy link
Author

Wabri commented Oct 27, 2024

You are right, I meant lacking "a good" documentation. But it is a personal thought.

Wabri and others added 14 commits October 28, 2024 14:11
Co-authored-by: Bernd Schubert <[email protected]>
Signed-off-by: Wabri <[email protected]>
Signed-off-by: Bernd Schubert <[email protected]>
Signed-off-by: Wabri <[email protected]>
Signed-off-by: Wabri <[email protected]>
Signed-off-by: Bernd Schubert <[email protected]>
Signed-off-by: Wabri <[email protected]>
Signed-off-by: Bernd Schubert <[email protected]>
Signed-off-by: Wabri <[email protected]>
Co-authored-by: Daniel Swarbrick <[email protected]>
Signed-off-by: GabrielePuliti <[email protected]>
Signed-off-by: Wabri <[email protected]>
Co-authored-by: Daniel Swarbrick <[email protected]>
Signed-off-by: GabrielePuliti <[email protected]>
Signed-off-by: Wabri <[email protected]>
@Wabri Wabri changed the title feat(zypper.sh): add zypper updates and patches script metrics Add Zypper updates and patches script metrics Oct 28, 2024
@Wabri Wabri marked this pull request as ready for review November 4, 2024 09:52
@Wabri
Copy link
Author

Wabri commented Nov 4, 2024

@dswarbrick tested and it works as expected!

def __parse_arguments(argv):
parser = argparse.ArgumentParser()

parser.add_mutually_exclusive_group(required=False)
Copy link
Member

@dswarbrick dswarbrick Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks incorrect. I think what you meant to do is

group = parser.add_mutually_exclusive_group(required=False)
group.add_argument("-m", ...)
group.add_argument("-l", ...)

However, this combination of "more / less" in a mutually exclusive group, and with a default that sets the --more option true (thus overriding the fact you set the group required=False), is really unintuitive from a user's perspective.

You have essentially just reinvented a boolean option, which by definition can only be one of two values, i.e. mutually exclusive.

Please re-work this to something that is a bit clearer.

"zypper package update available from repository. "
"(0 = not available, 1 = available)"
)
info = Info(prefix, description)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You specify a metric description that includes "(0 = not available, 1 = available)", yet Info metrics cannot take a user-defined value, and always have a value of 1. I'm not sure what your intentions are here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dswarbrick Thanks for your comments and effort to review it. The explanation for 0 and 1 in the help section clarifies what value can be expected to get from the metrics. Even if it is clear for some people not of of the users will have the same knowledge from the beginning. From my point of view having this hint mentioned here does hurt anyone and makes the usage a bit easier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pirat013 The issue is not with having that information in the metric description. The issue is that using an Info metric type does not allow the value to be specified - they always have a value of 1. Encoding the 0 or 1 status as a label is not an acceptable approach, due to its propensity to create new series and result in stale metrics when that label value changes. Using a Gauge metric, and setting the metric value to 0 or 1, is the only correct way to do this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, got it.

("Status", "status")]
prefix = "zypper_patch_pending"
description = "zypper patch available from repository. (0 = not available , 1 = available)"
info = Info(prefix, description)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above regarding Info metric types.


def print_reboot_required():
needs_restarting_path = '/usr/bin/needs-restarting'
is_path_ok = os.path.isfile(needs_restarting_path) and os.access(needs_restarting_path, os.X_OK)
Copy link
Member

@dswarbrick dswarbrick Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overly paranoid (and not atomic). A more elegant approach would simply be to catch subprocess.CalledProcessError exceptions thrown by subprocess.run.

@dswarbrick
Copy link
Member

You are right, I meant lacking "a good" documentation. But it is a personal thought.

I agree, the online HTML documentation is a bit sparse, mostly only showing example usage. However, client_python, like most Python code, is effectively self-documenting. For example:

>>> from prometheus_client import Gauge
>>> help(Gauge)
Help on class Gauge in module prometheus_client.metrics:

class Gauge(MetricWrapperBase)
 |  Gauge(name: str, documentation: str, labelnames: Iterable[str] = (), namespace: str = '', subsystem: str = '', unit: str = '', registry: Optional[prometheus_client.registry.CollectorRegistry] = <prometheus_client.registry.CollectorRegistry object at 0x7c1844c2d5b0>, _labelvalues: Optional[Sequence[str]] = None, multiprocess_mode: Literal['all', 'liveall', 'min', 'livemin', 'max', 'livemax', 'sum', 'livesum', 'mostrecent', 'livemostrecent'] = 'all')

stderr=subprocess.DEVNULL,
check=False)
if result.returncode == 0:
info.info({"node_reboot_required": "0"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again you are abusing the Info metric type. Specifying a 0 or 1 value as a label will create new series whenever that value changes, leading to stale metrics due to Prometheus' look-behind mechanism. What you need to use is a Gauge metric.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants