# Under the Hood: How Argo Rollouts 1.8 Implements Canary Deployments with Kubernetes 1.33 and Prometheus 3.1
Canary deployments remain a gold standard for low-risk application rollouts, allowing teams to shift a small percentage of traffic to a new version before full cutover. Argo Rollouts 1.8, released alongside Kubernetes 1.33 and Prometheus 3.1, introduces critical under-the-hood optimizations that streamline this workflow. This article breaks down the integration, architecture, and technical implementation details of this stack.

## Prerequisites and Stack Compatibility

Argo Rollouts 1.8 is purpose-built to leverage Kubernetes 1.33’s enhanced workload APIs, including stable support for Deployment and ReplicaSet lifecycle hooks, plus Prometheus 3.1’s native histogram metrics for low-latency canary analysis. Key compatibility notes:

- Kubernetes 1.33+ is required for Argo Rollouts’ new Rollout controller admission webhooks, which validate canary configuration at creation time.
- Prometheus 3.1’s prometheus-operator v0.70+ integration enables automatic metric scraping for canary analysis rules.
- Argo Rollouts 1.8 drops support for Kubernetes versions below 1.28, aligning with upstream Kubernetes deprecation policies.

## Argo Rollouts 1.8 Canary Architecture

The core Argo Rollouts 1.8 canary workflow relies on three components, updated for K8s 1.33 and Prometheus 3.1:

- **Rollout Controller**: Watches Rollout custom resources (CRs), manages canary ReplicaSet creation, and updates Kubernetes Service and Ingress objects to split traffic.
- **Analysis Controller**: Queries Prometheus 3.1 for canary health metrics, evaluates analysis templates, and signals the Rollout Controller to progress or abort the canary.
- **Metrics Server**: Aggregates real-time traffic and error-rate metrics from K8s 1.33’s kube-proxy and Prometheus 3.1 exporters.

## Under-the-Hood Traffic Splitting with Kubernetes 1.33

Kubernetes 1.33 introduces stable support for Service traffic policy enhancements, which Argo Rollouts 1.8 uses to implement canary traffic splitting without third-party service meshes (though mesh integration is still supported). When a Rollout CR is updated with a new container image, the Rollout Controller:

1. Creates a canary ReplicaSet with the new image, scaled to 0 replicas initially.
2. Updates the primary Service selector to include a `rollout.argoproj.io/canary: "true"` label for canary pods, and `rollout.argoproj.io/stable: "true"` for stable pods (a sketch of the resulting Services appears below).
3. Uses K8s 1.33’s EndpointSlice API to split traffic between stable and canary EndpointSlice objects based on the canary percentage defined in the Rollout spec.

Example Rollout traffic splitting snippet:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
spec:
  replicas: 10
  strategy:
    canary:
      steps:
      - setWeight: 10
      - pause: {duration: 5m}
      - setWeight: 50
      - pause: {duration: 10m}
      - setWeight: 100
      trafficRouting:
        kubernetes:
          service: canary-demo-svc
          ingress:
            name: canary-demo-ingress
  selector:
    matchLabels:
      app: canary-demo
  template:
    metadata:
      labels:
        app: canary-demo
    spec:
      containers:
      - name: demo-app
        image: demo-app:v2.0.0
        ports:
        - containerPort: 8080
```
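To make the label selection in step 2 concrete, the sketch below shows approximately what the stable and canary Services could look like once the controller has injected its selectors. Only `canary-demo-svc` comes from the Rollout example above; the canary Service name and port numbers are illustrative assumptions, not output captured from the controller.

```yaml
# Illustrative sketch only: approximate shape of the Services after the
# Rollout Controller injects the selector labels described in step 2.
# The canary Service name and port numbers are assumptions for this example.
apiVersion: v1
kind: Service
metadata:
  name: canary-demo-svc            # stable Service referenced by the Rollout
spec:
  selector:
    app: canary-demo
    rollout.argoproj.io/stable: "true"   # selects only stable ReplicaSet pods
  ports:
  - port: 80
    targetPort: 8080                     # matches the demo-app containerPort
---
apiVersion: v1
kind: Service
metadata:
  name: canary-demo-svc-canary     # hypothetical canary-side Service
spec:
  selector:
    app: canary-demo
    rollout.argoproj.io/canary: "true"   # selects only canary ReplicaSet pods
  ports:
  - port: 80
    targetPort: 8080
```

With the pods partitioned this way, the controller only has to adjust EndpointSlice membership behind a single entry point as the `setWeight` steps progress.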
## Prometheus 3.1 Integration for Canary Analysis

Argo Rollouts 1.8 leverages Prometheus 3.1’s native histogram and exponential-bucket metrics to evaluate canary health with lower query latency than previous versions. The Analysis Controller polls Prometheus 3.1 at configurable intervals over the Prometheus HTTP query API, then compares results against user-defined success thresholds.

Example AnalysisTemplate for Prometheus 3.1:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: prometheus-canary-analysis
spec:
  metrics:
  - name: error-rate
    successCondition: result[0] < 0.01
    failureCondition: result[0] > 0.05
    provider:
      prometheus:
        address: http://prometheus.istio-system.svc:9090
        query: |
          sum(rate(http_requests_total{app="canary-demo", status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total{app="canary-demo"}[5m]))
```
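An AnalysisTemplate does nothing on its own; the Rollout has to reference it, typically as an inline analysis step. Below is a minimal sketch that reuses the names from the examples above, running the Prometheus check while the canary holds at 10% traffic.

```yaml
# Minimal sketch: referencing the AnalysisTemplate from the canary steps.
# Resource names reuse the examples above; adjust to your own resources.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
spec:
  strategy:
    canary:
      steps:
      - setWeight: 10
      - analysis:                  # gate progression on the Prometheus query
          templates:
          - templateName: prometheus-canary-analysis
      - setWeight: 50
      - pause: {duration: 10m}
```

If the query breaches the `failureCondition`, the analysis run fails and the Rollout aborts rather than advancing to the next `setWeight` step.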
Prometheus 3.1’s new remote_write optimizations reduce metric lag to under 1 second, ensuring Argo Rollouts 1.8 can make canary progression decisions in near real time.

## Key Optimizations in Argo Rollouts 1.8

Beyond K8s 1.33 and Prometheus 3.1 integration, Argo Rollouts 1.8 includes further under-the-hood improvements:

- Reduced Rollout Controller memory usage by 30% via K8s 1.33’s shared informer cache optimizations.
- Native support for Prometheus 3.1’s exemplar metrics, enabling trace-to-metric correlation for canary debugging.
- Improved canary abort logic: if Prometheus 3.1 reports a threshold breach, the Rollout Controller automatically scales down the canary ReplicaSet and restores 100% of traffic to the stable version within 2 seconds.

## Conclusion

Argo Rollouts 1.8, paired with Kubernetes 1.33 and Prometheus 3.1, delivers a robust, low-latency canary deployment workflow without relying on complex service mesh configurations. The tight integration with K8s 1.33’s traffic routing APIs and Prometheus 3.1’s high-performance metrics engine makes it an ideal choice for teams running production Kubernetes workloads. For full release notes, refer to the Argo Rollouts 1.8 changelog.