Supporting Multiple Architectures
Authors may decide to distribute their bundles for various architectures: x86_64, aarch64, ppc64le, s390x, and so on, to accommodate the diversity of Kubernetes clusters and reach a larger number of potential users. Each supported architecture requires compatible binaries. Binary compatibility is based on the platform, which generally comprises an operating system, a CPU architecture, and other architecture variants. For this guide, we will treat the CPU architecture as the platform differentiator.
That said, the same general advice applies to supporting workloads on clusters of the same architecture but different operating systems (e.g. linux/amd64, windows/amd64). In this guide, we assume all workloads will be using the linux operating system.
Fundamentals
The basic principle of supporting multiple architectures is to ensure that each of your operator images is built for each of the architectures to be supported. From there, the images should be hosted in image registries as manifest lists. Finally, you’ll need to update your distribution configuration to set which architectures are supported. This section explains each of these concepts in turn.
Building an Operator for Multiple Architectures
Kubebuilder explains how you can use docker buildx to build images for multiple architectures. Operator SDK leverages Kubebuilder to ensure that builds can be cross-platform from the start.
Manifest lists
The most straightforward way of building operators and operands supporting multiple architectures is to leverage manifest lists, specified by Image Manifest V2, Schema 2 or OCI Image Index. A manifest list points to specific image manifests for one or more architectures.
For convenience, tools like buildah allow you to cross-build multi-arch containers and assemble their manifest lists on a single host. For instance, with buildah:
for a in amd64 arm64 ppc64le s390x; do \
buildah bud --manifest registry/username/repo:v1 --arch $a; \
done
This creates the manifest list, builds each image, and adds them to the manifest list.
The result can then be pushed to the desired registry.
buildah push registry/username/repo:v1
Docker with buildx provides similar capabilities.
docker buildx build --push --platform linux/amd64,linux/arm64,linux/ppc64le,linux/s390x --tag registry/username/repo:v1 .
See the docker documentation for additional options.
Caveat: the Dockerfile generated by the SDK for the operator explicitly references GOARCH=amd64 for go build. This can be amended to GOARCH=$TARGETARCH. Docker will automatically set the environment variable to the value specified by --platform. With buildah, --build-arg will need to be used for this purpose.
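As an illustration, here is a minimal sketch of the amended build stage, assuming the standard SDK scaffold; your base image, module layout, and build flags may differ:

ARG TARGETOS
ARG TARGETARCH
# docker buildx populates TARGETARCH automatically for each --platform entry;
# with buildah, pass it explicitly per build: --build-arg TARGETARCH=$a
RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o manager main.go

With the buildah loop above, the corresponding invocation would then be buildah bud --manifest registry/username/repo:v1 --arch $a --build-arg TARGETARCH=$a.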
Caveat: when mirroring registries for disconnected installations (environments without an internet connection), all the images referenced by a manifest list need to be copied, including images for architectures that may not be used in the target environment.
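For example, skopeo can copy a manifest list together with all of the per-architecture images it references; the registry names below are placeholders:

skopeo copy --all docker://registry/username/repo:v1 docker://mirror.internal/username/repo:v1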
Operator Lifecycle Manager
For operators distributed through the Operator Lifecycle Manager (OLM):
- Bundle images are not architecture-specific. They contain only plaintext Kubernetes manifests and operator metadata.
- All image references in the ClusterServiceVersion should be manifest lists containing the pointers to the image manifests for the supported architectures.
- Labels for OS and architectures can be set in the CSV, as shown in the example below. Please refer to the Operator Lifecycle Manager documentation for details.
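For instance, a CSV advertising support for linux on amd64 and arm64 would carry labels following OLM's operatorframework.io convention:

metadata:
  labels:
    operatorframework.io/os.linux: supported
    operatorframework.io/arch.amd64: supported
    operatorframework.io/arch.arm64: supported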
Supporting Clusters with Multi-Architecture Compute Nodes
The Fundamentals above aim to guide authors on the key steps to building and distributing operators that can run on multiple architectures. These instructions are sufficient when your cluster’s compute nodes share the same architecture. However, operator authors should also understand the implications of running their operators in a cluster with multi-architecture compute nodes since it is not always guaranteed that the architectures of the compute nodes will match the architectures supported by the operator.
Safe Scheduling Using Node Affinity
Node affinity is a mechanism exposed in a Kubernetes pod template that allows a PodSpec author to instruct the scheduler to restrict a pod to run only on (or with a preference for) nodes that meet specific criteria. To ensure that pods are always scheduled to nodes of a compatible architecture, it is a best practice for authors to set node affinity requirements so that their operators and operands will only schedule to nodes with architectures available in the pod's images. If you don't do this, a container scheduled to an incompatible node will immediately crash with an exec format error, which will ultimately lead to a CrashLoopBackOff event as the pod is restarted only to crash again with the same error.
Determining the Architectures Supported by an Image
For a given container image, you can check which architectures are supported by inspecting its manifest. Piping the output to the python json.tool module enables pretty-printed JSON output.
$ skopeo inspect --raw <image> | python -m 'json.tool'
Here’s an example of the architectures distributed for the Alpine Linux container image. Notice that both linux/amd64 and linux/arm64 are supported.
$ skopeo inspect --raw docker://alpine:latest | python -m 'json.tool'
{
    "manifests": [
        {
            "digest": "sha256:c0669ef34cdc14332c0f1ab0c2c01acb91d96014b172f1a76f3a39e63d1f0bda",
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "platform": {
                "architecture": "amd64",
                "os": "linux"
            },
            "size": 528
        },
        ...
        {
            "digest": "sha256:30e6d35703c578ee703230b9dc87ada2ba958c1928615ac8a674fcbbcbb0f281",
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "platform": {
                "architecture": "arm64",
                "os": "linux",
                "variant": "v8"
            },
            "size": 528
        },
        ...
You can also use docker to inspect the manifest, but beware that it will return a no such manifest error if the referenced image is not actually a manifest list.
$ docker manifest inspect <image>
If the above commands do not produce output containing manifests, it may be because the referenced name points to a single image rather than a manifest list. In this case, you can find the architecture by inspecting the image directly.
$ skopeo inspect <image>
This example shows how to inspect an image directly to determine its architecture.
$ skopeo inspect docker://alpine
{
    "Name": "docker.io/library/alpine",
    ...
    "Architecture": "amd64",
    "Os": "linux",
    ...
}
Alternatively, you can pull the image down and inspect it with docker.
$ docker pull <image> && docker inspect <image>
Setting Node-Affinity Criteria for Operators & Operands
Kubernetes provides a mechanism called nodeAffinity which can be used to limit the possible node targets where a pod can be scheduled. The following example can be used to update a PodSpec or PodTemplateSpec to prevent the scheduling of pods on nodes of incompatible architecture. Here we compare the kubernetes.io/arch and kubernetes.io/os keys set on the node to ensure that the values match one of the supported OS/architecture pairs. This assumes that the referenced image points to a manifest list with references to images for each of the supported architectures. It is important to remember that nodeAffinity should be set anywhere a container reference is defined, including Pod, Deployment, DaemonSet, StatefulSet, or any other object that defines a PodSpec or PodTemplateSpec.
The list of architecture values should only include the architectures the operator supports. The full list of possible GOARCH values is available here. It should be noted that Kubernetes allows a user to specify both requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. The syntax for each block is the same. At a minimum, an operator author should set the required terms, since they protect against a pod being scheduled on an incompatible node. If the operator in question has much better performance on a subset of the supported architectures, it may also be prudent to set preferredDuringSchedulingIgnoredDuringExecution so that the optimal architectures are selected first if they are available.
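As a sketch, here is a preferred term assuming, purely for illustration, that arm64 is the best-performing architecture for the operator; the weight value is arbitrary:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 90
      preference:
        matchExpressions:
        - key: kubernetes.io/arch
          operator: In
          values:
          - arm64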
Setting Node Affinity in a Kubernetes Manifest
To update the PodSpec and PodTemplateSpec objects of an operator and its operands, you will need to scan your operator for instances of these objects. Most of the time, these objects will be defined directly in a Kubernetes manifest yaml; however, for dynamically created workloads, they are sometimes embedded directly in the operator’s logic.
The most common location for PodSpec and PodTemplateSpec objects that will need to be updated is the operator’s config directory. Additionally, if the operator is configured for OLM, there will likely be a deployment object in bundle/manifests/*.clusterserviceversion.yaml.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/arch
          operator: In
          values:
          - amd64
          - arm64
          - ppc64le
          - s390x
        - key: kubernetes.io/os
          operator: In
          values:
          - linux
Setting Node Affinity in Golang
While Ansible and Helm operators will usually define their operators and operands according to the syntax defined above, directly in yaml files or in a role or variable template file, Go operators will sometimes embed the logic directly in the operator itself. Here is an example of how a PodSpec would be updated according to the syntax of the Go API.
Template: corev1.PodTemplateSpec{
	...
	Spec: corev1.PodSpec{
		Affinity: &corev1.Affinity{
			NodeAffinity: &corev1.NodeAffinity{
				RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
					NodeSelectorTerms: []corev1.NodeSelectorTerm{
						{
							MatchExpressions: []corev1.NodeSelectorRequirement{
								{
									Key:      "kubernetes.io/arch",
									Operator: "In",
									Values:   []string{"amd64"},
								},
								{
									Key:      "kubernetes.io/os",
									Operator: "In",
									Values:   []string{"linux"},
								},
							},
						},
					},
				},
			},
		},
		SecurityContext: &corev1.PodSecurityContext{
			...
		},
		Containers: []corev1.Container{{
			...
		}},
	},
Updating Operator Lifecycle Manager Configurations for Multi-Architecture Compute Nodes
The Operator Lifecycle Manager (OLM) is often used to distribute operators via operator catalogs. The operator deployment object in an OLM-integrated operator is defined in the ClusterServiceVersion yaml. It is important to remember to set the node affinity block in each spec.template.spec under spec.install.spec.deployments, as sketched below.
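A sketch of where the affinity block lands in a CSV; the deployment name is hypothetical and unrelated fields are elided:

spec:
  install:
    strategy: deployment
    spec:
      deployments:
      - name: my-operator-controller-manager
        spec:
          template:
            spec:
              affinity:
                nodeAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    nodeSelectorTerms:
                    - matchExpressions:
                      - key: kubernetes.io/arch
                        operator: In
                        values:
                        - amd64
                        - arm64
                      - key: kubernetes.io/os
                        operator: In
                        values:
                        - linux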
If you’re planning to distribute the operator via OLM, you can find more information in the OLM guide for supporting multiarch.
Overriding Affinity for an Operator Pod as a Cluster Admin
A cluster admin might have some context that can be used to refine the scheduling requirements for the operator images specified in the deployment spec of the Cluster Service Version. To ensure the fine control of scheduling remains with the cluster admin, OLM provides a mechanism for overriding the operator pod affinity configuration as part of the Subscription object.
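For example, a cluster admin could pin the operator pod to amd64 nodes through the Subscription’s config override; the subscription name, channel, and catalog source below are hypothetical:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-operator
  namespace: operators
spec:
  channel: stable
  name: my-operator
  source: community-operators
  sourceNamespace: olm
  config:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
              - amd64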
Caveat: No built-in affinity override controls exist that allow cluster admins to override affinity for operands. It is up to the operator author to determine whether this would be appropriate for one or more of their operands and what configuration options should be allowed.
Validating Your Operator’s Multi-Architecture Readiness
A validator is available to help authors ensure that operators are defined according to the best practices. Because operands can be declared and defined in a variety of ways depending on the language used and the structure of the operator, this validator focuses on verifying that the images defined in the CSV deployment spec are compliant with the fundamentals and best practices. It is up to the operator author to ensure that affinity best practices are also being followed for each operand’s PodSpec and PodTemplateSpec definitions.
$ operator-sdk bundle validate ./bundle --select-optional name=multiarch