Supporting Multiple Architectures
Authors may decide to distribute their bundles for various architectures: x86_64, aarch64, ppc64le, s390x, and so on, to accommodate the diversity of Kubernetes clusters and reach a larger number of potential users. Each supported architecture requires compatible binaries. Binary compatibility is based on the platform, which generally comprises an operating system, a CPU architecture, and other architecture variants. For this guide, we will treat the CPU architecture as the platform differentiator.
That said, the same general advice applies to supporting workloads on clusters of the same architecture but different operating systems (e.g. linux/amd64, windows/amd64). In this guide, we assume all workloads will be using the linux operating system.
Fundamentals
The basic principle of supporting multiple architectures is to ensure that each of your operator images is built for each of the architectures to be supported. From there, the images should be hosted in image registries as manifest lists. Finally, you’ll need to update your distribution configuration to set which architectures are supported. This section explains each of these concepts in turn.
Building an Operator for Multiple Architectures
Kubebuilder explains how you can use docker buildx to build images for multiple architectures. Operator SDK leverages Kubebuilder to ensure that builds can be cross-platform from the start.
Manifest lists
The most straightforward way of building operators and operands supporting multiple architectures is to leverage manifest lists, specified by Image Manifest V2, Schema 2 or OCI Image Index. A manifest list points to specific image manifests for one or more architectures.
For convenience, tools like buildah allow you to cross-build multi-arch containers and assemble their manifest lists on a single host. For instance, with buildah:
for a in amd64 arm64 ppc64le s390x; do \
buildah bud --manifest registry/username/repo:v1 --arch $a; \
done
This creates the manifest list, builds each image, and adds them to the manifest list.
The result can then be pushed to the desired registry.
buildah push registry/username/repo:v1
Docker with buildx provides similar capabilities.
docker buildx build --push --platform linux/amd64,linux/arm64,linux/ppc64le,linux/s390x --tag registry/username/repo:v1 .
See the docker documentation for additional options.
Caveat: the Dockerfile generated by the SDK for the operator explicitly references GOARCH=amd64 for go build. This can be amended to GOARCH=$TARGETARCH. Docker will automatically set the environment variable to the value specified by --platform. With buildah, --build-arg will need to be used for this purpose.
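As an illustration, here is a minimal sketch of the amended build stage, assuming the standard SDK scaffold; your base image, module layout, and build flags may differ:

ARG TARGETOS
ARG TARGETARCH
# docker buildx populates TARGETARCH automatically for each --platform entry;
# with buildah, pass it explicitly per build: --build-arg TARGETARCH=$a
RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o manager main.go

With the buildah loop above, the corresponding invocation would then be buildah bud --manifest registry/username/repo:v1 --arch $a --build-arg TARGETARCH=$a.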
Caveat: when mirroring registries for disconnected installations (environments without an internet connection), all the images referenced by a manifest list need to be copied, including images for architectures that may not be used in the target environment.
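For example, skopeo can copy a manifest list together with all of the per-architecture images it references; the registry names below are placeholders:

skopeo copy --all docker://registry/username/repo:v1 docker://mirror.internal/username/repo:v1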
Operator Lifecycle Manager
For operators distributed through the Operator Lifecycle Manager (OLM):
- Bundle images are not architecture-specific. They contain only plaintext Kubernetes manifests and operator metadata.
- All image references in the ClusterServiceVersion should be manifest lists containing the pointers to the image manifests for the supported architectures.
- Labels for OS and architectures can be set in the CSV, as shown in the example below. Please refer to the Operator Lifecycle Manager documentation for details.
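For instance, a CSV advertising support for linux on amd64 and arm64 would carry labels following OLM's operatorframework.io convention:

metadata:
  labels:
    operatorframework.io/os.linux: supported
    operatorframework.io/arch.amd64: supported
    operatorframework.io/arch.arm64: supported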
Supporting Clusters with Multi-Architecture Compute Nodes
The Fundamentals above aim to guide authors on the key steps to building and distributing operators that can run on multiple architectures. These instructions are sufficient when your cluster’s compute nodes share the same architecture. However, operator authors should also understand the implications of running their operators in a cluster with multi-architecture compute nodes since it is not always guaranteed that the architectures of the compute nodes will match the architectures supported by the operator.
Safe Scheduling Using Node Affinity
Node affinity is a mechanism exposed in a Kubernetes pod template that allows a PodSpec author to instruct the scheduler to restrict a pod to run only on (or with a preference for) nodes that meet specific criteria. To ensure that pods are always scheduled to nodes of a compatible architecture, it is a best practice for authors to set node affinity requirements so that their operators and operands will only schedule to nodes with architectures available in the pod's images. If you don't do this, a container scheduled to an incompatible node will immediately crash with an exec format error, which will ultimately lead to a CrashLoopBackOff event as the pod is restarted only to crash again with the same error.
Determining the Architectures Supported by an Image
For a given container image, you can check which architectures are supported by inspecting its manifest. Piping the output to the python json.tool module enables pretty-printed JSON output.
$ skopeo inspect --raw <image> | python -m 'json.tool'
Here’s an example of the architectures distributed for the Alpine Linux container image. Notice that both linux/amd64 and linux/arm64 are supported.
$ skopeo inspect --raw docker://alpine:latest | python -m 'json.tool'
{
    "manifests": [
        {
            "digest": "sha256:c0669ef34cdc14332c0f1ab0c2c01acb91d96014b172f1a76f3a39e63d1f0bda",
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "platform": {
                "architecture": "amd64",
                "os": "linux"
            },
            "size": 528
        },
        ...
        {
            "digest": "sha256:30e6d35703c578ee703230b9dc87ada2ba958c1928615ac8a674fcbbcbb0f281",
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "platform": {
                "architecture": "arm64",
                "os": "linux",
                "variant": "v8"
            },
            "size": 528
        },
        ...
You can also use docker to inspect the manifest, but beware that it will return a no such manifest error if the referenced image is not actually a manifest list.
$ docker manifest inspect <image>
If the above commands do not produce output containing manifests, it may be because the referenced name points to a single image rather than a manifest list. In this case, you can find the architecture by inspecting the image directly.
$ skopeo inspect <image>
This example shows how to inspect an image directly to determine its architecture.
$ skopeo inspect docker://alpine
{
    "Name": "docker.io/library/alpine",
    ...
    "Architecture": "amd64",
    "Os": "linux",
    ...
}
Alternatively, you can pull the image down and inspect it with docker.
$ docker pull <image> && docker inspect <image>
Setting Node-Affinity Criteria for Operators & Operands
Kubernetes provides a mechanism called nodeAffinity which can be used to limit the possible node targets where a pod can be scheduled. The following example can be used to update a PodSpec or PodTemplateSpec to prevent the scheduling of pods on nodes of incompatible architecture. Here we compare the kubernetes.io/arch and kubernetes.io/os keys set on the node to ensure that the values match one of the supported OS/architecture pairs. This assumes that the referenced image points to a manifest list with references to images for each of the supported architectures. It is important to remember that nodeAffinity should be set anywhere a container reference is defined, including Pod, Deployment, DaemonSet, StatefulSet, or any other object that defines a PodSpec or PodTemplateSpec.
The list of architecture values should only include the architectures the operator supports. The full list of possible GOARCH values is available here. It should be noted that Kubernetes allows a user to specify both requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. The syntax for each block is the same. At a minimum, an operator author should set the required terms, since they protect against a pod being scheduled on an incompatible node. If the operator in question has much better performance on a subset of the supported architectures, it may also be prudent to set preferredDuringSchedulingIgnoredDuringExecution so that the optimal architectures are selected first if they are available.
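As a sketch, here is a preferred term assuming, purely for illustration, that arm64 is the best-performing architecture for the operator; the weight value is arbitrary:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 90
      preference:
        matchExpressions:
        - key: kubernetes.io/arch
          operator: In
          values:
          - arm64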
Setting Node Affinity in a Kubernetes Manifest
To update the PodSpec and PodTemplateSpec objects of an operator and its operands, you will need to scan your operator for instances of these objects. Most of the time, these objects will be defined directly in a Kubernetes manifest yaml; however, for dynamically created workloads, they are sometimes embedded directly in the operator’s logic.
The most common location for PodSpec and PodTemplateSpec objects that will need to be updated is the operator’s config directory. Additionally, if the operator is configured for OLM, there will likely be a deployment object in bundle/manifests/*.clusterserviceversion.yaml.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/arch
          operator: In
          values:
          - amd64
          - arm64
          - ppc64le
          - s390x
        - key: kubernetes.io/os
          operator: In
          values:
          - linux
Setting Node Affinity in Golang
While Ansible and Helm operators will usually define their operators and operands according to the syntax defined above, directly in yaml files or in a role or variable template file, Go operators will sometimes embed the logic directly in the operator itself. Here is an example of how a PodSpec would be updated according to the syntax of the Go API.
Template: corev1.PodTemplateSpec{
	...
	Spec: corev1.PodSpec{
		Affinity: &corev1.Affinity{
			NodeAffinity: &corev1.NodeAffinity{
				RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
					NodeSelectorTerms: []corev1.NodeSelectorTerm{
						{
							MatchExpressions: []corev1.NodeSelectorRequirement{
								{
									Key:      "kubernetes.io/arch",
									Operator: "In",
									Values:   []string{"amd64"},
								},
								{
									Key:      "kubernetes.io/os",
									Operator: "In",
									Values:   []string{"linux"},
								},
							},
						},
					},
				},
			},
		},
		SecurityContext: &corev1.PodSecurityContext{
			...
		},
		Containers: []corev1.Container{{
			...
		}},
	},
Updating Operator Lifecycle Manager Configurations for Multi-Architecture Compute Nodes
The Operator Lifecycle Manager (OLM) is often used to distribute operators via operator catalogs. The operator deployment object in an OLM-integrated operator is defined in the ClusterServiceVersion yaml. It is important to remember to set the node affinity block in each spec.template.spec under spec.install.spec.deployments, as sketched below.
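A sketch of where the affinity block lands in a CSV; the deployment name is hypothetical and unrelated fields are elided:

spec:
  install:
    strategy: deployment
    spec:
      deployments:
      - name: my-operator-controller-manager
        spec:
          template:
            spec:
              affinity:
                nodeAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    nodeSelectorTerms:
                    - matchExpressions:
                      - key: kubernetes.io/arch
                        operator: In
                        values:
                        - amd64
                        - arm64
                      - key: kubernetes.io/os
                        operator: In
                        values:
                        - linux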
If you’re planning to distribute the operator via OLM, you can find more information in the OLM guide for supporting multiarch.
Overriding Affinity for an Operator Pod as a Cluster Admin
A cluster admin might have some context that can be used to refine the scheduling requirements for the operator images specified in the deployment spec of the Cluster Service Version. To ensure the fine control of scheduling remains with the cluster admin, OLM provides a mechanism for overriding the operator pod affinity configuration as part of the Subscription object.
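For example, a cluster admin could pin the operator pod to amd64 nodes through the Subscription’s config override; the subscription name, channel, and catalog source below are hypothetical:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-operator
  namespace: operators
spec:
  channel: stable
  name: my-operator
  source: community-operators
  sourceNamespace: olm
  config:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
              - amd64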
Caveat: No built-in affinity override controls exist that allow cluster admins to override affinity for operands. It is up to the operator author to determine whether this would be appropriate for one or more of their operands and what configuration options should be allowed.
Validating Your Operator’s Multi-Architecture Readiness
A validator is available to help authors ensure that operators are defined according to the best practices. Because operands can be declared and defined in a variety of ways depending on the language used and the structure of the operator, this validator focuses on verifying that the images defined in the CSV deployment spec are compliant with the fundamentals and best practices. It is up to the operator author to ensure that affinity best practices are also being followed for each operand’s PodSpec and PodTemplateSpec definitions.
$ operator-sdk bundle validate ./bundle --select-optional name=multiarch