The open source Istio service mesh, currently being transferred to the Cloud Native Computing Foundation (CNCF), is built on a pair of components: the control plane and the data plane. The control plane is called Istiod. It manages microservice configuration and certificates, and receives metrics and traffic information via the data plane.
This data plane relies on sidecars: proxies deployed alongside the services running in Kubernetes pods. The proxy must intercept incoming and outgoing network traffic and the calls between services. For this, the Istio project uses an “extended” version of the Envoy proxy. It uses various functions of this component to discover services, perform load balancing, manage TLS and mTLS flows, retrieve metrics, perform fault injection, organize progressive deployments, and so on.
However, this architecture has several drawbacks: the microservices and the data plane are not “perfectly” separated. Injecting sidecars into pods remains “invasive”.
According to the project leads, the Kubernetes pod must be modified and its traffic redirected, which means the pod restarts with each sidecar update. Despite the “lightness” of the sidecar proxy, extra CPU and RAM must be provisioned per pod to support its workload. Additionally, “traffic capture and HTTP processing, as typically performed by Istio sidecars, is computationally expensive and can break some applications with non-compliant HTTP implementations,” acknowledge the authors, the main contributors to the project.
“Sidecars do have an impact on resource consumption, but that is not the main criticism leveled at them,” insists Denis Jannot, EMEA field engineering director at Solo.io, speaking to MagIT. “The number 1 problem reported by our customers is operational. Currently, for an application to join a service mesh, it must be restarted. However, the service mesh is often administered by the team in charge of the Kubernetes platform. That team must coordinate with the application teams whenever the sidecars need to be modified.”
The same problem arises when switching from one version of Istio to another, even though sidecars and Istiod can remain compatible across version differences.
Decouple L4 and L7 layers
Hence the desire to offer Istio Ambient Mesh. “It’s about offering a new option so that a single control plane can manage both applications that use sidecars and applications that do without them. This choice is made at the namespace level,” explains Denis Jannot.
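As a rough illustration of that namespace-level choice, the sketch below models how a namespace's labels could select a data plane mode. The label keys `istio.io/dataplane-mode` and `istio-injection` are the ones Istio uses to opt a namespace in, but they are not mentioned in this article; the `Namespace` class, the `dataplane_mode` helper, and the precedence between the two labels are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Namespace:
    """Hypothetical, minimal stand-in for a Kubernetes namespace."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


def dataplane_mode(ns: Namespace) -> str:
    """Return which data plane mode a namespace has opted into.

    Assumption of this sketch: the ambient label is checked first.
    How a real cluster resolves a namespace carrying both labels is
    not covered here.
    """
    if ns.labels.get("istio.io/dataplane-mode") == "ambient":
        return "ambient"   # no sidecar: traffic goes through the node's ztunnel
    if ns.labels.get("istio-injection") == "enabled":
        return "sidecar"   # classic mode: a proxy is injected into each pod
    return "none"          # namespace is outside the mesh


# One control plane, two namespaces, two different modes.
payments = Namespace("payments", {"istio.io/dataplane-mode": "ambient"})
legacy = Namespace("legacy", {"istio-injection": "enabled"})
modes = {ns.name: dataplane_mode(ns) for ns in (payments, legacy)}
```

Both namespaces would still be driven by the same Istiod instance; only the way traffic is captured differs.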
To do without the sidecar, Solo.io and Google propose splitting Istio's functionality “into two distinct layers”. “The idea is to separate the L4 and L7 layers,” specifies the field engineering director at Solo.io.
These names refer to the OSI (Open Systems Interconnection) model, which describes the seven layers that a computer system uses to communicate across a network. L4 is the transport layer, home to protocols such as TCP and UDP. In Istio, it is used to perform TCP routing, to retrieve TCP logs and metrics, to establish mTLS tunnels and to manage simple authorizations.
The L7 layer is the application layer. In Istio, it is used for HTTP routing, load balancing, circuit breaking, setting rate limits and fault injection, in addition to managing advanced authorizations and retrieving HTTP logs, traces and metrics. Until now, all of these data plane functions depended on the sidecar.
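The split described above can be summarized as a simple lookup. The following is only a sketch: the feature identifiers are hypothetical names chosen to mirror the two lists in the text, not Istio configuration keys.

```python
# Features the article attributes to each layer (identifiers are hypothetical).
L4_FEATURES = frozenset({
    "tcp_routing",
    "tcp_logs_and_metrics",
    "mtls_tunnel",
    "simple_authorization",
})
L7_FEATURES = frozenset({
    "http_routing",
    "load_balancing",
    "circuit_breaking",
    "limits",
    "fault_injection",
    "advanced_authorization",
    "http_logs_traces_metrics",
})


def required_layer(features):
    """Return "L7" if any requested feature needs application-level
    processing, otherwise "L4"."""
    requested = set(features)
    unknown = requested - L4_FEATURES - L7_FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return "L7" if requested & L7_FEATURES else "L4"


# A workload that only needs encrypted tunnels stays at L4;
# adding HTTP routing pulls in L7 processing.
layer_basic = required_layer(["mtls_tunnel", "simple_authorization"])
layer_http = required_layer(["mtls_tunnel", "http_routing"])
```

The point of the split is that a workload needing only the first set never has to pay for HTTP processing.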
Instead of a sidecar, Solo and Google propose a shared agent, named ztunnel (for zero-trust tunnel), running on each node of a Kubernetes cluster. This agent, deployed as a DaemonSet, is dedicated to L4 functions and to connecting and authenticating the elements within the mesh. “This makes it possible to indicate which services can communicate together in a secure manner,” sums up Denis Jannot.
“The node’s network stack redirects all traffic from participating workloads through the local ztunnel agent,” project officials say.
Another advantage is that managing mTLS encryption and the other L4 functions this way consumes less RAM and CPU.
Importantly, the ztunnel agent and its associated rules can be updated without affecting the application. In this case, the data plane really is separate from the application. “There are even applications that did not support sidecars well, including Apache Kafka, which will be able to take advantage of it,” boasts Denis Jannot.
The L7 layer is configured at the namespace level via a variant of the Envoy proxy, called Waypoint. Waypoint proxies are Kubernetes pods that can scale automatically with workloads, according to the documentation. They are deployed per identity (called a service account in Kubernetes) to avoid the need for a multi-tenant L7 proxy.
Istiod tells the ztunnel agents which Waypoint proxy or proxies to use. “The Istio control plane configures the cluster’s ztunnels to pass all traffic requiring L7 processing through the Waypoint proxy.”
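The routing decision described here can be sketched as follows. This is a simplified model, not Istio code: the `Destination` class, the `plan_path` function, the `waypoints` mapping and the hop names are all hypothetical, and the sketch assumes that traffic always enters and leaves through the ztunnel of the relevant node, with a Waypoint hop added only when L7 processing is required.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Destination:
    """Hypothetical description of the workload being called."""
    service_account: str   # the identity a Waypoint proxy is deployed for
    node: str              # the node hosting the destination pod
    needs_l7: bool         # True if an L7 policy or route applies


def plan_path(source_node: str, dest: Destination,
              waypoints: Dict[str, str]) -> List[str]:
    """Return the ordered list of hops for one request.

    `waypoints` maps a service account to the name of its Waypoint
    proxy; in the real system this knowledge is pushed by Istiod.
    """
    path = [f"ztunnel@{source_node}"]
    if dest.needs_l7:
        waypoint = waypoints.get(dest.service_account)
        if waypoint is None:
            raise LookupError(
                f"no Waypoint proxy for identity {dest.service_account!r}")
        path.append(waypoint)          # L7 processing happens here
    path.append(f"ztunnel@{dest.node}")
    return path


waypoints = {"reviews-sa": "waypoint-reviews"}
l4_only = plan_path("node-a", Destination("ratings-sa", "node-b", False), waypoints)
with_l7 = plan_path("node-a", Destination("reviews-sa", "node-c", True), waypoints)
```

Because each Waypoint proxy is keyed by identity rather than by node, a compromised or overloaded proxy only affects the workloads sharing that service account.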
It might have seemed simpler to deploy one Waypoint proxy per node. “It’s technically easy, but it’s a very bad idea,” warns the field engineering director. “If a proxy manages all the L7 functionalities of a node, that means that with each vulnerability, there is a risk of takeover. Envoy is robust, but the majority of CVEs reported against it concerned L7 processing.”
In addition, one proxy per node would create a “noisy neighbor” problem. “With Envoy, we have the notion of filters to process HTTP requests,” explains Denis Jannot. “These filters can impact latency. If a lot of services are using the same proxy, we’re going to run into this noisy neighbors problem and potentially impact multiple apps.”
“We are not saying goodbye to sidecars”
Istio’s main contributors had already worked to stabilize and simplify the deployment of this service mesh. To that end, they had consolidated a set of components into the Istiod monolith, but had not yet touched Envoy.
Although it seems to break with the project's original philosophy, Ambient Mesh does not spell the end of sidecars. “We are not saying goodbye to sidecars,” points out the field engineering director. “A lot of people won’t understand it, but it’s about bringing the two modes together: sidecar and ambient. It will be possible to have a single service mesh made up of pods with and without sidecars,” he repeats.
This dual approach is not so surprising given the scale of some Istio deployments at large corporations and software vendors. In a press release, Idit Levine, CEO of Solo.io, mentions customers who execute 30 billion transactions per day on their existing service mesh. As a reminder, the contributors have sought to stabilize Istio, which they claim to have achieved four years after its launch. The new approach could be confusing, so Solo and Google are treading carefully.
Nevertheless, the new architecture option seems to be of interest to some Solo customers, notably T-Mobile. “The biggest barrier to service mesh adoption has always been complexity,” said Joe Searcy, technical staff member at T-Mobile, in a press release. “The overhead in terms of resources and services associated with managing the service mesh within large enterprises continues to make service mesh adoption difficult, even as projects like Istio strive to reduce complexity,” he notes.
“The possibilities offered by Ambient Mesh are extremely interesting. With better application transparency, fewer moving parts, easier invocation, and huge potential to save IT resources and engineering hours… all I can say is, I agree!”
According to Google and Solo.io, combining ambient and sidecar modes does not introduce additional limitations or security issues. On this point, Denis Jannot considers that each mode has its advantages and disadvantages. However, the security analysis leans in favor of the new data plane.
“Sidecars are co-located with the workloads they serve, and therefore a vulnerability in one compromises the other,” the project leaders write. “In Ambient Mesh mode, even if an application is compromised, ztunnels and Waypoint proxies can still enforce a strict security policy on that application’s traffic.”
Beyond the fact that L7 processing is more prone to vulnerabilities, the project leads argue that although the ztunnel agent is shared, its attack surface is smaller. “As people better understand Ambient Mesh’s security posture, we’re confident that Ambient Mesh will be the preferred mode of Istio deployment, while sidecars will be used for other specific use cases,” they say.
An alternative to the Envoy – eBPF duo
In addition to seeking greater simplicity for users, the community behind Istio sees a new trend on the horizon: building a service mesh on top of eBPF. That is the ambition of Cilium, also a partner of Solo.io.
“The ztunnel agent, which manages the L4 layer, could be replaced by an eBPF-based equivalent,” suggests Denis Jannot. “This is not possible at the L7 layer: even Cilium uses Envoy to manage it.”
“Furthermore, ztunnel outperforms eBPF in providing mTLS and maintaining source identity. eBPF could make it possible to obtain the same functionalities, but it remains technically complex and nobody does it today,” he says.
While Solo.io promises to automate certain aspects of Ambient Mesh deployment, the framework is only available as a technical preview in version 2.1 of the Gloo Mesh platform. The open source distribution is presented as an “experimental” version that can be tested on GKE and EKS. “Even if Ambient Mesh is already well equipped, there are still limitations and the proposal is open to contributions,” says Denis Jannot.