decision in the matter of cloudmon/stackmon approach #576

piobig2871 · 2024-04-23T11:48:46Z

No description provided.

Signed-off-by: Piotr Bigos <[email protected]>

artificial-intelligence · 2024-04-25T07:17:06Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

+
+## Decision
+
+By opting for Gherkin test scenarios with Python mapping API calls to OpenStack, we aim to address the


We could link to Gherkin here, maybe, for people not familiar with it (myself included)?
I believe this would be an official source: https://cucumber.io/docs/gherkin/

@artificial-intelligence this is a very good idea. I had not thought about that, thanks for pointing it out. I can also write a sentence or two about the behave framework (https://behave.readthedocs.io/en/latest/) that we are using and link its documentation there. Would that be helpful as well?

Yes, that would be nice. I'm only lightly familiar with the topic (I think I read some articles about it some years ago).
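
For readers who have not used Gherkin either: the idea under discussion is that plain-language `Given`/`When`/`Then` steps are matched to Python functions. The sketch below is a toy, standard-library-only illustration of that mapping. It is not how behave is implemented, and `step`, `FakeCloud`, `run_scenario`, and the step texts are hypothetical names invented for this example. In behave itself, step functions are registered with the `given`, `when`, and `then` decorators from the `behave` package.

```python
import re
from typing import Callable

# Registry of (compiled pattern, handler) pairs; a toy stand-in for a BDD framework.
STEPS: list[tuple[re.Pattern, Callable]] = []


def step(pattern: str) -> Callable:
    """Register a function as the handler for step texts matching `pattern`."""
    def decorator(func: Callable) -> Callable:
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


class FakeCloud:
    """Hypothetical in-memory stand-in for an OpenStack API client."""

    def __init__(self) -> None:
        self.servers: dict[str, str] = {}

    def create_server(self, name: str) -> None:
        self.servers[name] = "ACTIVE"


@step(r'a cloud with no servers')
def given_empty_cloud(ctx: dict) -> None:
    ctx["cloud"] = FakeCloud()


@step(r'I create a server named "(?P<name>[^"]+)"')
def when_create_server(ctx: dict, name: str) -> None:
    ctx["cloud"].create_server(name)


@step(r'the server "(?P<name>[^"]+)" is ACTIVE')
def then_server_active(ctx: dict, name: str) -> None:
    assert ctx["cloud"].servers.get(name) == "ACTIVE"


def run_scenario(text: str) -> dict:
    """Run each Given/When/Then/And line against the registered handlers."""
    ctx: dict = {}
    for line in text.strip().splitlines():
        # Split off the leading keyword and keep the remaining step text.
        keyword, _, body = line.strip().partition(" ")
        if keyword not in {"Given", "When", "Then", "And"}:
            continue  # e.g. the "Scenario:" title line
        for pattern, handler in STEPS:
            match = pattern.fullmatch(body)
            if match:
                handler(ctx, **match.groupdict())
                break
        else:
            raise LookupError(f"no step matches: {body!r}")
    return ctx


SCENARIO = '''
Scenario: Create a server
  Given a cloud with no servers
  When I create a server named "probe-1"
  Then the server "probe-1" is ACTIVE
'''
```

Calling `run_scenario(SCENARIO)` executes the three steps in order against the fake client. A real setup would replace `FakeCloud` with actual OpenStack API calls, which is the "Python mapping" the decision text refers to.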

bitkeks · 2024-04-25T14:58:03Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

@@ -0,0 +1,43 @@
+---
+title: Evaluate current state of CloudMon


Maybe a better fitting title for the SCS DR would be "Evaluating the deployment of CloudMon in SCS infrastructure"?

ACK, I will apply that

bitkeks · 2024-04-25T15:00:44Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

+## Motivation
+
+One such solution, the [Cloudmon](https://stackmon.org/) project, presents challenges that may hinder its
+suitability for organizations seeking efficient and reliable monitoring capabilities. This introduction


Should we list the challenges? The tool itself does good work for infrastructures of a bigger scale, like OTC. We need to compare it to our use case to put the challenges into context.

I will describe that

bitkeks · 2024-04-25T15:02:32Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

+## Decision
+
+By opting for [Gherkin](https://cucumber.io/docs/gherkin/) test scenarios with Python mapping API calls to OpenStack,
+we aim to address the complexities and shortcomings of the Cloudmon project while ensuring our monitoring and testing


> we aim to address the complexities and shortcomings of the Cloudmon project

See comment above - what do these complexities and shortcomings consist of?

gtema

This is a very subjective outcome, and the proposed alternative does not look good to me while requiring enormous effort to implement.

gtema · 2024-06-07T06:33:07Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

+against utilizing the Cloudmon project and instead embraces a more streamlined and effective approach involving Gherkin
+test scenarios and mapping Python API calls to interact with OpenStack resources. By addressing the complexities and
+shortcomings of the Cloudmon project, SCS organization aims to adopt a monitoring solution that not only meets but
+exceeds our requirements for simplicity, reliability, and ease of use.


Honestly, this phrasing is more than subjective. After dozens of hours spent explaining the documentation, setup scenarios, and testing capabilities, and holding workshops, this statement is very offensive. The SCS organization entered the evaluation, asked for multiple workshops to understand the tooling, and offered help in improving the documentation. There was absolutely no output.
What you offer as an alternative is another "one of thousands" of nominally universal testing frameworks that were not designed to monitor a cloud environment and do not really deal with measuring low-level performance (of the API itself).

gtema · 2024-06-07T06:36:07Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

+
+Our approach was to base on a technical concept. This document serves as a proposal, with the final decision
+subject to discussion with SCS team members. We propose a behavior-driven system based on solutions using Java framework
+Cucumber, utilizing the Gherkin domain-specific language for defining executable specifications. This approach ensures


So what you offer clearly contradicts the objective you stated above: simplicity. Taking Java and a widely unknown framework with an even less known DSL is going to harm the simplicity and understandability of the solution. It requires an enormous amount of effort from operators to learn those things and be able to extend it.

The main idea of stackmon is to use Ansible for test scenarios that literally anybody is able to read, understand, replay locally, and extend (endlessly).

gtema · 2024-06-07T06:39:38Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

+Cucumber, utilizing the Gherkin domain-specific language for defining executable specifications. This approach ensures
+clear, human-readable test behavior definitions, facilitating participation from both developers and non-technical
+contributors. Considering the team's proficiency in Python, the language's simplicity and clarity, alignment with
+OpenStack's ecosystem, and the robust support from the Python community, it's evident that Python presents a superior


I do not get how you get from the Java mentioned above to Python.

gtema · 2024-06-07T06:42:50Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

+## Challanges
+
+During our assessment of the Cloudmon/Stackmon project, we encountered significant challenges related to documentation,
+particularly regarding lack of examples for configuration setups and usage guidelines. The lack of comprehensive


This doesn't sound fair. You were offered a dedicated session, pointed to the relevant documentation (and, as mentioned above, it was communicated to the CloudMon team that by taking those workshops you would help improve the documentation), pointed to a detailed explanation of the configuration, and pointed to the live production configuration. If a production configuration still counts as a "lack of examples", I do not know what more to say. If the production testing scenarios of a very big OpenStack-based public cloud, with many more services than vanilla SCS comes with, are not sufficient, that shows me the evaluation was not performed in much detail.

gtema · 2024-06-07T06:46:46Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

+
+## Decision
+
+By opting for [Gherkin](https://cucumber.io/docs/gherkin/) test scenarios with Python mapping API calls to OpenStack,


OpenTelekomCloud uses an alternative framework (Robot Framework) for system acceptance testing. Your proposal comes with a huge effort (which you do not explicitly mention here, but which can be read between the lines) of developing this "Python mapping API calls to OpenStack". This is huge and complex work that can be avoided by relying on existing tooling for OpenStack. And by the way, Robot Framework (at OTC) and behavior-driven testing are great for testing, but absolutely unsuitable for health and performance monitoring.

artificial-intelligence · 2024-06-07T08:15:30Z

As a general comment on how to approach system design decisions:

From this text, it is not clear to me why exactly Cucumber is superior.

I don't think it's a matter of the implementation language.

You should be able to show concrete examples of what Cucumber actually does better than cloudmon, so that you could convince me as a reader.

I would also expect a proposal to use new technology to list the shortcomings of said technology as well, because in my experience every technological decision has, of course, up- and downsides.

Presenting only upsides suggests to me that we are not aware of the downsides, which is dangerous, because it means we will learn of them much later, and they might be even worse than those of the previous solution. In rare cases there are no downsides and you have a straight-up better technical solution; that should be spelled out in the text as well, showing that at least some thought was given to whether the new solution has downsides.

If I don't see such things, it's a red flag to me, as it shows a clear bias in the decision. If we are honest, all of us, as humans, like one technology more than another for a variety of subjective reasons such as prior familiarity, trendiness, marketing and, last but not least, technical merit.

I think though it's important that we know about our own biases and at least make an attempt to counter them by an honest technical analysis of the up- and downsides of a given software.

So either not enough research was done to be able to list the up- and downsides of all solutions, or there is some bias sweeping downsides under the rug.

Both are understandable things that happen to all of us. Still, I think that if we want to make better decisions, we should try to avoid them.

Thanks for reading, if you made it this far.

artificial-intelligence · 2024-06-07T07:54:11Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

+In the fast-paced environment of modern cloud computing, ensuring the reliability and performance
+of cloud infrastructure is paramount for organizations. Effective monitoring and testing framework


Unfortunately I have many problems with this.

The text as a whole lacks a clear definition of how, for example, Cloudmon "monitors" "reliability and performance", and of what that even means here, because the context is unclear.

First, it only talks about "Cloud Computing". From guessing and from what I know about Cloudmon, I think this is about the IaaS layer of a cloud, but that isn't mentioned anywhere.

I think this should be spelled out.

Second, none of "reliability", "performance", or "monitoring" is really defined here, and the terms are used interchangeably, which poses some problems, as you can see from my next questions.

artificial-intelligence · 2024-06-07T08:08:12Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

+
+In the fast-paced environment of modern cloud computing, ensuring the reliability and performance
+of cloud infrastructure is paramount for organizations. Effective monitoring and testing framework
+are essential tools in this endeavor, enabling proactive identification and resolution of issues before


For example, most monitoring doesn't really allow proactive resolution of issues.

Monitoring is just analysing live data after the fact: you can see in monitoring that an API is slow, or that a disk is filling up. But all the activities monitoring enables a DevOps team to do are reactive. You notice a problem in monitoring, then you react.

A testing framework, on the other hand, can of course be used to detect problems e.g. before deployment.
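
The distinction can be made concrete with a small sketch. It assumes a hypothetical `probe` helper that times a single API call and a hypothetical `is_healthy` threshold check; neither name comes from Cloudmon or from any testing framework. The injectable `clock` parameter exists only so the example can be checked deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    ok: bool        # did the call complete without raising?
    seconds: float  # duration of the call as reported by the clock


def probe(call: Callable[[], object],
          clock: Callable[[], float] = time.perf_counter) -> Sample:
    """Time one API call; this is the raw measurement a monitor would record."""
    start = clock()
    try:
        call()
        ok = True
    except Exception:
        ok = False
    return Sample(ok=ok, seconds=clock() - start)


def is_healthy(sample: Sample, max_seconds: float) -> bool:
    """Reactive check: it can only flag a slow or failing call after the fact."""
    return sample.ok and sample.seconds <= max_seconds
```

Run periodically against a production endpoint, this is monitoring: it reports a slow or failing call only once it has happened. Run once against a staging deployment before a release, the same measurement acts as a test and can block the rollout.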

artificial-intelligence · 2024-06-07T08:10:55Z

Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

+Our approach was to base on a technical concept. This document serves as a proposal, with the final decision
+subject to discussion with SCS team members. We propose a behavior-driven system based on solutions using Java framework


There are many different testing frameworks and testing strategies, which are not really differentiated in this text. You propose a behavior-driven test concept using cucumber, but somehow the document fails to mention any up- or downsides of behavior driven testing or of this specific framework.

There is no reasoning I have read in this text why behavior driven testing is superior to the cloudmon approach.

And I don't doubt that there might be advantages to it, but those need to be clearly spelled out somewhere.

This text also talks about a "technical concept", where can I see that? It isn't linked here, or included. If it is the basis for the decision, it surely should be included?

I only know both technologies superficially, for the record, that is cloudmon and cucumber.

bitkeks · 2024-07-03T14:14:39Z

What's the status here?

decision in the matter of cloudmon/stackmon approach - review

bcc4b2f

Signed-off-by: Piotr Bigos <[email protected]>

piobig2871 requested a review from bitkeks April 23, 2024 11:49

piobig2871 added the SCS-VP12 Related to tender lot SCS-VP12 label Apr 23, 2024

piobig2871 changed the title ~~decision in the matter of cloudmon/stackmon approach - review~~ decision in the matter of cloudmon/stackmon approach Apr 23, 2024

Piotr Bigos added 2 commits April 23, 2024 14:36

pylint fixes

6f1520c

Signed-off-by: Piotr Bigos <[email protected]>

pylint fixes, removing space and extra line

82ade46

Signed-off-by: Piotr Bigos <[email protected]>

artificial-intelligence reviewed Apr 25, 2024

piobig2871 requested a review from artificial-intelligence April 25, 2024 07:44

Piotr Bigos added 2 commits April 25, 2024 10:59

including sugested links into document

de1a29f

Signed-off-by: Piotr Bigos <[email protected]>

pylint fix, removing double spaces and space on the end of line

7fb6258

Signed-off-by: Piotr Bigos <[email protected]>

bitkeks reviewed Apr 25, 2024

Piotr Bigos and others added 3 commits May 6, 2024 15:20

adding suggested title to the documentation

7382ac6

Signed-off-by: Piotr Bigos <[email protected]>

Merge branch 'main' into standards/scs-cloudmon-decision

1cb5529

adding explanation about challenges meet during testing cloudmon/stackmon approach

68d3d0e

Signed-off-by: Piotr Bigos <[email protected]>

matofeder self-requested a review May 16, 2024 07:12

matofeder reviewed May 17, 2024

piobig2871 and others added 3 commits May 17, 2024 16:24

Update Standards/scs-0106-v1-evaluate-current-state-of-cloudmon.md

1df0f6e

Co-authored-by: Matej Feder <[email protected]> Signed-off-by: Piotr <[email protected]>

rephrase motivation

3501430

Signed-off-by: Piotr Bigos <[email protected]>

Merge branch 'main' into standards/scs-cloudmon-decision

33b28f3

matofeder approved these changes Jun 6, 2024

gtema requested changes Jun 7, 2024

artificial-intelligence requested changes Jun 7, 2024

bitkeks assigned piobig2871 Jun 13, 2024

bitkeks assigned bitkeks and unassigned piobig2871 Aug 13, 2024

bitkeks added this to the R7 (v8.0.0) milestone Aug 13, 2024
