Demote "specialized_init_system" certification test from "essential" to "normal" #129
Comments
You can still have zombies with one of the popular, proper init systems like dumb-init, just FYI, so another, intentionally redundant check for zombies is warranted. If you know of other well-known init systems, feel free to add them. Outside of that, reducing the number of tests for microservices will make passing harder, not easier, because you'll have fewer 'chances' to prove your good microservice practices. For example, if we removed the test you would have 17 chances to pass 10 tests instead of 18 chances. This count of tests was chosen to capture the idea that what counts as a cloud-native good practice, e.g. a good microservice practice, is hazy. Check https://cloud.google.com/architecture/best-practices-for-building-containers and https://ahmet.im/blog/minimal-init-process-for-containers/ for some background.
Are we still requiring 10 tests to pass with the most recent addition of 'essential' tests? On a few runs, I was seeing the CoreDNS CNF fail the cert when it passed 14 of 19 tests.
@wavell My view on it is: the essential best practice is to handle signals and zombies correctly, and we already have that covered in other tests. As the best practices describe, vendors have basically 3 options for handling this, and a specialized init process is just one of them. Vendors can have their own reasons to choose one of the 2 other ways to handle signals/zombies. From that point of view, I don't see this check as "essential". I'm open to being convinced, but I haven't seen a strong reason so far. As for the number of tests: @wavell

@agentpoyo: sorry Martin, I edited your post by accident when hitting quote reply on my phone. Note to self: that button is dangerous.
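For context, here is a minimal, hypothetical sketch (in Go, not taken from any vendor's image or from cnf-testsuite) of one of the non-init-system options: the application's own entrypoint acting as PID 1, forwarding termination signals to its child and reaping zombies. The child path `/app/server` is a placeholder.

```go
package main

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	// Hypothetical workload; in a real image this would be the main service binary.
	cmd := exec.Command("/app/server")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		os.Exit(1)
	}

	sigs := make(chan os.Signal, 16)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT, syscall.SIGCHLD)

	for sig := range sigs {
		switch sig {
		case syscall.SIGCHLD:
			// Reap terminated children, including orphans re-parented to us
			// because this process runs as PID 1 inside the container.
			for {
				var status syscall.WaitStatus
				pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
				if pid <= 0 || err != nil {
					break
				}
				if pid == cmd.Process.Pid {
					// Main child exited; propagate its exit code.
					os.Exit(status.ExitStatus())
				}
			}
		default:
			// Forward termination signals to the main child process.
			_ = cmd.Process.Signal(sig)
		}
	}
}
```

A wrapper like this is essentially what tini and dumb-init provide out of the box, which is part of why the cited references treat a maintained, specialized init system as the easier option.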
Yeah, sorry, that was the only recent screenshot I had from doing tests prior to the removal. I was only confirming what Watson stated: passing 10 essential tests would pass the certification.
That answers my question, but if an essential test is removed and the count goes from 19 to 18 or even 17, are we also lowering the number of essential tests that must pass for one to pass the certification? I think this is what @wavell was getting at about the chances of passing enough tests to pass certification, not the test itself. If we keep the threshold at 15 but remove a test from "essential", we should consider promoting another test to essential or lowering the threshold of 15.
@martin-mat even the authors of this article recognized at the beginning that not all the best practices are applicable. Personally, I prefer to use the …
" And vendors can have their reasons to choose one of 2 different ways how to handle signals/zombies. From that point of view, I don't see this check as "essential"." You are getting hung up on the word "essential". Essential doesn't mean required. It means heavily weighted in this test suite. "I'm open to be convinced, but I haven't seen a strong reason so far." The strong reason is the recommendation from the Google docs et al previously posted. Specialized init system is one of the ways to handle this which is recommended from multiple sources. For these discussions we try to reference best practices from other upstream projects that have authority in the space, in this case a hyperscaler that hosts containers, when driving the test rationale. If you have some references on this subject feel free to add them to the discussion. "As for number of tests: There was a drive to reduce the weighting of microservice tests (specifically the single process type test) because it was a too difficult an area to pass. With multiple ways to increase the hygiene of the microservice (i.e. avoiding lift and shift) adding more pathways for participants to pass in a difficult category reduces the weighting. Again the essentials are about weighting not requirement. . |
Correct me if I'm wrong in my logic, but I think that "specialized_init_system" is not a correctly designed test and it would be better if it were removed (or rather replaced, more on that below).

As I see it, the best practice of using a specialized init system shouldn't be centered around specific names of init systems, but rather around the benefits, safeguards and functionality that they provide. If that is correct, why do we test for the init system names rather than for the actual points of the best practice? By having a set list of "allowed" init systems, we overextend our trust to these systems and possibly undervalue any init system that is compliant with all the best practices but is "underweighted" just because it isn't on the list. If there were a certification of some sort for init systems, this test could make more sense and be renamed to "certified_init_system", but, to my knowledge, there is nothing like that.

P.S. This is not a call to act. CNTI Certification and CNF-Testsuite have more significant issues to attend to, but to me it seems that issues like this can hurt CNTI in the long term: directly, and by setting a precedent which could be used in other discussions.
"As i see it - best practice of using specialized init system shouldn't be centered around specific names of init systems, but rather around benefits, safeguards and functionality that they provide." I cited multiple references where indeed, using specifc independently and openly developed and maintained "specializeed init systems" is recommended. Eg. Dummy-init is maintained by Yelp. Rolling your own is it's own strategy and not best for most, which is why the cited references call it an option. Writing a sophisticated supervisor, let alone a "lite" one is non-trivial and covered in the references and in the community. Please cite a reference for why you say relying on specific, openly developed supervisors is not good to support your argument. I have provided multiple, and can provide more, saying it is a best practice. |
Bitnami container images, which are widely used, well designed, and from a respected cloud-native authority, don't use a specialized init system while still handling signals and zombies correctly.
I just checked their repo; only 2 of 335 images (gitlab-runner and jupyter-base-notebook) use a specialized init system.
@wavell
I'm not saying that that is not good, but rather pointing out that it shouldn't be the main goal (and consequently, the best practice).
Let's go through the references that you have provided:
- A post about graceful shutdowns for containers. I agree that this should be a best practice, and this best practice is not tied to specific init systems; we need to check that signals are being correctly trapped and that shutdown is handled.
- A post that contains a list of the init systems that an engineer considered and found via Twitter for his needs. It's nice info, but it doesn't contribute much to the idea of treating a handful of named systems as a best practice.
To me, the strongest reference is the document that contains a list of best practices for building containers:
And that last sentence totally aligns with my position: I agree with having a recommendation to use specific specialized init systems, but I don't agree with having it as a best practice. It should be a way of achieving compliance with the best practices, not a best practice itself.

P.S. I think we should be careful when using references to blog posts as ground truth. It can surely be good information from a knowledgeable person, but I don't think it should be weighted significantly more than comments from the people in this issue.
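For the graceful-shutdown point above, here is a minimal sketch, assuming a plain Go HTTP service; the port and the 10-second grace period are illustrative assumptions, not anything the testsuite checks for. The idea is simply that SIGTERM is trapped and in-flight work is drained, regardless of which (if any) init system sits in front of the process.

```go
package main

import (
	"context"
	"net/http"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"} // illustrative port

	// ctx is cancelled when SIGTERM or SIGINT arrives (e.g. on pod deletion).
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	go func() { _ = srv.ListenAndServe() }()

	<-ctx.Done() // block until a termination signal is received

	// Drain in-flight requests within a grace period before exiting.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	_ = srv.Shutdown(shutdownCtx)
}
```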
You should provide references for what you think.
I see that you skipped over this from the reference:
The engineer is the same engineer who writes for, and is employed by, Google Cloud on the other post ... He's not "just an engineer".
Let's quote it: "Solution 3: Use a specialized init system. As you would in a more classic Linux environment, you can also use an init system to deal with those problems." It's presented as a solution.
It's time for you to post your references.
See the above reference about the "engineer".
@wavell About the references: I'm saying that your references actually prove my point more than yours. On all of those pages, usage of a specialized init system is a way to achieve a best practice, not a best practice itself.

About Mr. Ahmet Alp Balkan, the author of the posts: I'm not undervaluing him as an engineer; I'm only saying that we should be careful about taking the opinions of a single person as ground truth and as a final argument. (Even though, in this case, I agree with the correctness and usefulness of the references provided.)

I have little interest in continuing this discussion, as I see a need for third-party intervention for us to find common ground on this topic, and, in my view, the problem is not important or critical enough to spend more time debating it here.
From Google's Best practices for building containers, they list 2 problems that are addressed with a specialized init system: PID 1 not handling Linux signals properly, and orphaned (zombie) processes not being reaped.
They list 2 other solutions.
The first solution, handling the signals in the application, does not deal with the second problem of orphaned processes; the specialized init system does handle that issue. The second solution also addresses both problems.
Continuing from the document regarding the specialized init system, they have the following.
Tini is recommended by many larger projects, providers and experts, which is why it keeps making it onto the list. For example, Docker mentions it all over, including in their discussion about init systems in containers here: https://github.com/docker-library/official-images#init. It is also recommended to use an init system if your container spawns child processes.
It seems like there could be some check for whether the container runs only a single process with no children, in which case an init system would not be needed. That said, maybe the test runs when there are no child processes, but later in the lifecycle a child process starts... how do we know?

My understanding of the objection to this test is that it only checks against a list of validated "good" init systems, and that list is not all-inclusive of the init systems that meet the recommendations for a minimal init system satisfying the requirements listed above. Sometimes testing functionality and application qualities/attributes is difficult to implement, or the test coverage area is open-ended and ambiguous. That can happen because the current landscape is large or is actively growing/changing. When that is the case and time/resources/priorities do not support implementing this type of more extensive and open-ended testing, a shortcut is used: find pre-validated options and use those as a reference list of good choices. This is what happened with this test (see the sketch below). Test coverage that would allow any init system would need to include the items previously listed.
I would love to see a set of tests that can keep this flexible while enforcing the best practices for containers. In the meantime, if there is an application that does not pass this test, and that we care about passing, then I would like to examine the application and see how we can improve the test or the application. For example, if a CNF uses an unlisted open source init system that could be considered minimal and handles the init requirements for a container, then we can add that init system to the list. If the init system is closed source and unknown... well, I do not know what to do regarding this test when we have no other tests validating whether it is minimal (not complex) and handles all the requirements. The single-process situation should be handled separately from multiple processes, IMO.
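To illustrate the name-list style of check being discussed, here is a rough sketch of how such a check could look. This is not the cnf-testsuite implementation; the allow-list contents and the use of /proc/1/cmdline are assumptions for the example. Anything not on the list fails, even an init system that forwards signals and reaps zombies correctly, which is the limitation raised above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Known "good" init systems; anything else fails even if it handles
// signals and zombies correctly.
var knownInitSystems = []string{"tini", "dumb-init", "s6-svscan"}

// pidOneIsKnownInit reports whether PID 1 in this container is one of the
// pre-validated init systems, by inspecting its command line.
func pidOneIsKnownInit() (bool, error) {
	// /proc/1/cmdline is NUL-separated; the first field is the executable path.
	raw, err := os.ReadFile("/proc/1/cmdline")
	if err != nil {
		return false, err
	}
	fields := strings.Split(string(raw), "\x00")
	if len(fields) == 0 || fields[0] == "" {
		return false, fmt.Errorf("empty cmdline for PID 1")
	}
	name := filepath.Base(fields[0])
	for _, init := range knownInitSystems {
		if name == init {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ok, err := pidOneIsKnownInit()
	if err != nil {
		fmt.Println("could not inspect PID 1:", err)
		os.Exit(2)
	}
	fmt.Println("specialized init system detected:", ok)
}
```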
Reading this, it feels like it's not about whether a specialised init system is a best practice or not (I think everyone in the thread agrees it is), but about a measure of the importance of that best practice. It feels like this is a more generic question: how do we gauge whether a best practice is essential or not? What are the metrics by which we define a best practice as essential? Once we have such a definition, we can then discuss the original point, which wasn't whether this is a best practice, but whether it is essential.
Specialized_init_system, in my opinion, does not deserve to be "essential" for the following reasons:

- The main reason for having an init system inside containers is handling signals and zombies. There are already specialized tests for those among the essential tests, so this one seems redundant. Additionally, a specialized init system is just one of the ways to handle signals/zombies.
- There is no good universal check for a specialized init system in cnf-testsuite. The current implementation only checks for the use of "tini", "dumb-init" and "s6-svscan", which is limiting.

I propose to leave it among the certification tests, but demote it from "essential" to "normal".