Merge branch 'master' into feat/newrelic-timeout

argoproj · Jul 31, 2024 · 4881cd4
2 parents 6643cea + 6c873a9
commit 4881cd4

Showing 3 changed files with 32 additions and 2 deletions.
diff --git a/docs/FAQ.md b/docs/FAQ.md
@@ -1,5 +1,7 @@
 # FAQ
 
+Be sure to read the [Best practices page](../best-practices) as well.
+
 ## General
 
 ### Does Argo Rollouts depend on Argo CD or any other Argo project?

diff --git a/docs/best-practices.md b/docs/best-practices.md
@@ -29,7 +29,23 @@ You should *NOT* use Argo Rollouts for preview/ephemeral environments. For that
 
 The recommended way to use Argo Rollouts is for brief deployments that take 15-20 minutes or maximum 1-2 hours. If you want to run new versions for days or weeks before deciding to promote, then Argo Rollouts is probably not the best solution for you.
 
-Also, if you want to run a wave of multiple versions at the same time (i.e. have 1.1 and 1.2 and 1.3 running at the same time), know that Argo Rollouts was not designed for this scenario.
+Keeping parallel releases running for long periods complicates the deployment process considerably and raises several questions on which people disagree about how Argo Rollouts should work.
+
+For example, say that for a week you have been testing version 1.3 as stable and 1.4 as preview.
+Then somebody deploys 1.5:
+
+1. Some people believe that the new state should be 1.3 stable and 1.5 as preview
+1. Some people believe that the new state should be 1.4 stable and 1.5 as preview
+
+Currently, Argo Rollouts follows the first approach, on the assumption that something was seriously wrong with 1.4 and that 1.5 is the hotfix.
+
+Now suppose that 1.5 has an issue. Some people believe that Argo Rollouts should roll back to 1.3, while others think it should roll back to 1.4.
+
+Currently, Argo Rollouts assumes that the version to roll back to is always 1.3, regardless of how many "hotfixes" have been previewed in between.
+
+None of these problems arise if you assume that each release stays active only for a minimal time and that you create a new version only after the previous one has finished.
+
+Also, if you want to run a wave of multiple versions at the same time (e.g. have 1.1, 1.2, and 1.3 running at the same time), know that Argo Rollouts was not designed for this scenario. Argo Rollouts always assumes that there is one stable/previous version and one preview/next version.
 
 A version that has just been promoted is assumed to be ready for production and has already passed all your tests (either manual or automated).
 
@@ -41,6 +57,8 @@ While Argo Rollouts supports manual promotions and other manual pauses, these ar
 
 Ideally you should have proper metrics that tell you in 5-15 minutes if a deployment is successful or not. If you don't have those metrics, then you will miss a lot of value from Argo Rollouts.
 
+If you deploy today and then have a human watch logs/metrics/traces for the next 2 hours, adopting Argo Rollouts will not help you much with automated deployments.
+
 Get your [metrics](../features/analysis) in place first and test them with dry-runs before applying them to production deployments.
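The stable/preview behavior that the best-practices text describes (a new deployment replaces the preview, and a rollback always targets the stable version) can be illustrated with a toy model. This is only a sketch of the documented behavior, not Argo Rollouts code: the `releaseState` type and its methods are hypothetical names invented for illustration.

```go
package main

import "fmt"

// releaseState is a hypothetical toy model of "one stable version plus
// one preview version". It is not an Argo Rollouts type.
type releaseState struct {
	Stable  string
	Preview string
}

// Deploy replaces whatever is currently in preview.
// The stable version is left untouched (the "first approach" in the text).
func (s *releaseState) Deploy(version string) {
	s.Preview = version
}

// Rollback discards the preview and returns the version that keeps
// serving, which is always the stable one.
func (s *releaseState) Rollback() string {
	s.Preview = ""
	return s.Stable
}

// Promote makes the preview the new stable version.
func (s *releaseState) Promote() {
	if s.Preview == "" {
		return // nothing to promote
	}
	s.Stable = s.Preview
	s.Preview = ""
}

func main() {
	// A week of testing: 1.3 is stable, 1.4 is preview.
	s := &releaseState{Stable: "1.3", Preview: "1.4"}

	// Somebody deploys 1.5: 1.4 is dropped, 1.3 stays stable.
	s.Deploy("1.5")
	fmt.Printf("stable=%s preview=%s\n", s.Stable, s.Preview)

	// 1.5 has an issue: the rollback target is 1.3, never 1.4.
	fmt.Println("rollback target:", s.Rollback())
}
```

In this model version 1.4 is never recorded anywhere once 1.5 is deployed, which is why short-lived releases that are promoted or aborted one at a time avoid the ambiguity.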
 
 

diff --git a/pkg/kubectl-argo-rollouts/info/pod_info.go b/pkg/kubectl-argo-rollouts/info/pod_info.go
@@ -53,6 +53,12 @@ func newPodInfo(pod *corev1.Pod) rollout.PodInfo {
 		},
 	}
 	restarts := 0
+	rs := make(map[string]bool, len(pod.Spec.InitContainers))
+	for _, c := range pod.Spec.InitContainers {
+		p := c.RestartPolicy
+		rs[c.Name] = p != nil && *p == corev1.ContainerRestartPolicyAlways
+	}
+
 	totalContainers := len(pod.Spec.Containers)
 	readyContainers := 0
 
@@ -69,7 +75,7 @@ func newPodInfo(pod *corev1.Pod) rollout.PodInfo {
 			continue
 		case container.State.Terminated != nil:
 			// initialization is failed
-			if len(container.State.Terminated.Reason) == 0 {
+			if container.State.Terminated.Reason == "" {
 				if container.State.Terminated.Signal != 0 {
 					reason = fmt.Sprintf("Init:Signal:%d", container.State.Terminated.Signal)
 				} else {
@@ -79,6 +85,10 @@ func newPodInfo(pod *corev1.Pod) rollout.PodInfo {
 				reason = "Init:" + container.State.Terminated.Reason
 			}
 			initializing = true
+		case rs[container.Name] && container.Started != nil && *container.Started:
+			if container.Ready {
+				continue
+			}
 		case container.State.Waiting != nil && len(container.State.Waiting.Reason) > 0 && container.State.Waiting.Reason != "PodInitializing":
 			reason = "Init:" + container.State.Waiting.Reason
 			initializing = true
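The `pod_info.go` change above first records which init containers are restartable (restart policy `Always`, i.e. sidecar-style init containers) and then skips those that have started and are ready. The following is a minimal standalone sketch of that check. It assumes simplified stand-in types (`initContainer`, `initStatus`, `restartPolicy`) that only mirror the fields the diff reads from `corev1`, and the helper names are hypothetical, so it runs without the Kubernetes libraries.

```go
package main

import "fmt"

// restartPolicy and the structs below are simplified stand-ins for the
// corev1 types used in the diff. They are assumptions for illustration.
type restartPolicy string

const restartPolicyAlways restartPolicy = "Always"

type initContainer struct {
	Name          string
	RestartPolicy *restartPolicy // nil for ordinary init containers
}

type initStatus struct {
	Name    string
	Started *bool
	Ready   bool
}

// restartableInitContainers maps each init container name to whether it
// declares restart policy Always, mirroring the `rs` map in the diff.
func restartableInitContainers(containers []initContainer) map[string]bool {
	rs := make(map[string]bool, len(containers))
	for _, c := range containers {
		p := c.RestartPolicy
		rs[c.Name] = p != nil && *p == restartPolicyAlways
	}
	return rs
}

// isRunningSidecar reports whether a status belongs to a restartable init
// container that has started and is ready. In the diff this is the case
// that is skipped with `continue`.
func isRunningSidecar(st initStatus, rs map[string]bool) bool {
	return rs[st.Name] && st.Started != nil && *st.Started && st.Ready
}

func main() {
	always := restartPolicyAlways
	started := true

	rs := restartableInitContainers([]initContainer{
		{Name: "setup"},                         // ordinary init container
		{Name: "proxy", RestartPolicy: &always}, // sidecar-style init container
	})

	statuses := []initStatus{
		{Name: "setup", Started: &started, Ready: true},
		{Name: "proxy", Started: &started, Ready: true},
	}
	for _, st := range statuses {
		fmt.Printf("%s: running sidecar = %t\n", st.Name, isRunningSidecar(st, rs))
	}
}
```

Looking up a name that is absent from the map yields `false`, so statuses for unknown containers are treated like ordinary init containers.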