1. Installing kubeadm and container runtime

  • A compatible Linux host. The Kubernetes project provides generic instructions for Linux distributions based on Debian and Red Hat, as well as for distributions without a package manager. [2]

  • 2 GB or more of RAM per machine (any less will leave little room for your apps), 2 CPUs or more.

    You can also use the --ignore-preflight-errors=NumCPU,Mem flag to skip these preflight errors when running kubeadm init or kubeadm join on a node.

  • Full network connectivity between all machines in the cluster (public or private network is fine).

    kubeadm, similarly to other Kubernetes components, tries to find a usable IP on the network interfaces associated with a default gateway on a host. Such an IP is then used for the advertising and/or listening performed by a component. [1]

    To find out what this IP is on a Linux host you can use:

    ip route show # Look for a line starting with "default via"
  • Unique hostname, MAC address, and product_uuid for every node.

    You can get the MAC address of the network interfaces using the command ip link or ifconfig -a

    The product_uuid can be checked by using the command sudo cat /sys/class/dmi/id/product_uuid
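
    For convenience, you can run these uniqueness checks together and compare the output across all nodes; the grep pattern below is just one way to pick out the MAC addresses:

    hostname
    ip link show | grep 'link/ether'          # MAC addresses of the network interfaces
    sudo cat /sys/class/dmi/id/product_uuid   # product_uuid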

  • Certain ports are open on your machines.

    Table 1. Control plane

    Protocol    Direction    Port Range    Purpose                    Used By
    TCP         Inbound      6443          Kubernetes API server      All
    TCP         Inbound      2379-2380     etcd server client API     kube-apiserver, etcd
    TCP         Inbound      10250         Kubelet API                Self, Control plane
    TCP         Inbound      10259         kube-scheduler             Self
    TCP         Inbound      10257         kube-controller-manager    Self

    Table 2. Worker node(s)

    Protocol    Direction    Port Range     Purpose             Used By
    TCP         Inbound      10250          Kubelet API         Self, Control plane
    TCP         Inbound      30000-32767    NodePort Services   All

    These required ports need to be open in order for Kubernetes components to communicate with each other. The pod network plugin you use may also require certain ports to be open.
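
    One way to verify that a required port is reachable is with netcat; the addresses below are illustrative:

    # On a control plane node, check the API server port locally
    nc -v 127.0.0.1 6443

    # From another machine, probe the kubelet port on a node (replace the IP with your node's address)
    nc -v 192.168.0.100 10250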

  • Swap disabled. You MUST disable swap in order for the kubelet to work properly.

    To check swap status, use: [3]

    swapon --show

    Or to show physical memory as well as swap usage:

    free -h
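
    A common way to disable swap persistently; the sed pattern assumes a conventional /etc/fstab layout:

    # Turn swap off for the running system
    sudo swapoff -a

    # Comment out swap entries so it stays off across reboots
    sudo sed -i '/\sswap\s/ s/^#*/#/' /etc/fstab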

1.1. Installing a container runtime

To run containers in Pods, Kubernetes uses a container runtime.

By default, Kubernetes uses the Container Runtime Interface (CRI) to interface with your chosen container runtime.

If you don’t specify a runtime, kubeadm automatically tries to detect an installed container runtime by scanning through a list of known endpoints. [2]

Table 3. Known endpoints for Linux supported operating systems

Runtime                              Path to Unix domain socket
containerd                           unix:///var/run/containerd/containerd.sock
CRI-O                                unix:///var/run/crio/crio.sock
Docker Engine (using cri-dockerd)    unix:///var/run/cri-dockerd.sock

If multiple or no container runtimes are detected, kubeadm will throw an error and ask you to specify which one you want to use.

Docker Engine does not implement the CRI, which is a requirement for a container runtime to work with Kubernetes. For that reason, an additional service, cri-dockerd, has to be installed. cri-dockerd is a project based on the legacy built-in Docker Engine support that was removed from the kubelet in version 1.24.
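
For example, if both containerd and Docker Engine (via cri-dockerd) are installed on a node, you can point kubeadm at the socket you want explicitly; the same flag also works for kubeadm join:

sudo kubeadm init --cri-socket unix:///var/run/cri-dockerd.sock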

Kubernetes 1.26 defaults to using v1 of the CRI API. If a container runtime does not support the v1 API, the kubelet falls back to using the (deprecated) v1alpha2 API instead. [4]

// Show the details of the `cri` plugin on an existing containerd installation using `ctr`
$ sudo ctr plugins ls -d id==cri
Type:          io.containerd.grpc.v1
ID:            cri
Requires:
               io.containerd.event.v1
               io.containerd.service.v1
               io.containerd.warning.v1
Platforms:     linux/amd64
Exports:
               CRIVersion           v1
               CRIVersionAlpha      v1alpha2

1.1.1. Forwarding IPv4 and letting iptables see bridged traffic

Verify that the br_netfilter module is loaded by running lsmod | grep br_netfilter.

To load it explicitly, run sudo modprobe br_netfilter.

In order for a Linux node’s iptables to correctly view bridged traffic, verify that net.bridge.bridge-nf-call-iptables is set to 1 in your sysctl config. For example:

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system

# Verify that the `br_netfilter`, `overlay` modules are loaded
lsmod | grep br_netfilter
lsmod | grep overlay

# Verify that the
#   `net.bridge.bridge-nf-call-iptables`, `net.bridge.bridge-nf-call-ip6tables`, and `net.ipv4.ip_forward`
#   system variables are set to `1`
sudo sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward

1.1.2. Cgroup drivers

Both kubelet and the underlying container runtime need to interface with control groups to enforce resource management for pods and containers and set resources such as cpu/memory requests and limits.

It’s critical that the kubelet and the container runtime use the same cgroup driver and are configured consistently. [4]

The cgroupfs driver is NOT recommended when systemd is the init system because systemd expects a single cgroup manager on the system.

Starting with v1.22 and later, when creating a cluster with kubeadm, if the user does not set the cgroupDriver field under KubeletConfiguration, kubeadm defaults it to systemd.

Check the cgroup driver of the kubelet at the cluster level of an existing cluster:

$ kubectl get -n kube-system cm kubelet-config -oyaml | grep cgroupDriver
    cgroupDriver: systemd
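
You can also check which cgroup version the host itself is running; on cgroup v2 hosts the systemd driver is the recommended choice:

# Identify the cgroup version on the node
stat -fc %T /sys/fs/cgroup/
# cgroup2fs -> cgroup v2
# tmpfs     -> cgroup v1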

1.1.3. Containerd

Follow the instructions for getting started with containerd.

For more information about Cgroups, see Linux CGroups and Containers.

For more information about containerd, see RUNC CONTAINERD CRI DOCKERSHIM.

In the containerd config /etc/containerd/config.toml:

  • To use the systemd cgroup driver, set the following (and restart containerd afterwards; see the sketch after this list):

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      SystemdCgroup = true
  • To overwrite the sandbox (pause) image:

    [plugins."io.containerd.grpc.v1.cri"]
      sandbox_image = "registry.k8s.io/pause:3.2"
    Please note that it is a best practice for the kubelet to declare the matching pod-infra-container-image. If it is not configured, the kubelet may attempt to garbage collect the pause image.
  • Find or overwrite the settings for persistent and runtime storage locations as well as grpc, debug, and metrics addresses for the various APIs.

    #root = "/var/lib/containerd"
    #state = "/run/containerd"
  • Check the CRI integration plugin status.

    $ sudo ctr plugin ls id==cri
    TYPE                     ID     PLATFORMS      STATUS
    io.containerd.grpc.v1    cri    linux/amd64    ok
  • Check the systemd driver status using crictl.

    $ sudo crictl info -o go-template --template '{{.config.containerd.runtimes.runc.options.SystemdCgroup}}'
    true
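
After editing /etc/containerd/config.toml, restart containerd so the changes take effect. A minimal sketch, assuming containerd is managed by systemd; regenerating the default config is optional and will overwrite any existing customizations:

# (Optional) regenerate a complete default config as a starting template,
# then re-apply customizations such as SystemdCgroup = true
containerd config default | sudo tee /etc/containerd/config.toml

# Restart containerd and confirm it is running
sudo systemctl restart containerd
systemctl is-active containerd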

1.2. Installing kubeadm, kubelet and kubectl

Note: The legacy package repositories (apt.kubernetes.io and yum.kubernetes.io) have been deprecated and frozen starting from September 13, 2023. Using the new package repositories hosted at pkgs.k8s.io is strongly recommended and required in order to install Kubernetes versions released after September 13, 2023. The deprecated legacy repositories, and their contents, might be removed at any time in the future and without a further notice period. The new package repositories provide downloads for Kubernetes versions starting with v1.24.0. [2]
  • Debian-based distributions

    sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates curl
    curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key \
        | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg (1)
    echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.26/deb/ /' \
        | sudo tee /etc/apt/sources.list.d/kubernetes.list (2)
    sudo apt-get update
    sudo apt-get install -y kubelet kubeadm kubectl (3)
    sudo apt-mark hold kubelet kubeadm kubectl
    1 Download the public signing key for the Kubernetes package repositories. The same signing key is used for all repositories so you can disregard the version in the URL.
    2 Please NOTE that this repository has packages only for Kubernetes 1.26; for other Kubernetes minor versions, you need to change the Kubernetes minor version in the URL to match your desired minor version. For example:
    deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /
    deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /
    deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.27/deb/ /
    deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.26/deb/ /
    3 You can also install a specific package version:
    $ apt-cache madison kubeadm | head -n 5
       kubeadm | 1.26.4-1.1 | https://pkgs.k8s.io/core:/stable:/v1.26/deb  Packages
       kubeadm | 1.26.3-1.1 | https://pkgs.k8s.io/core:/stable:/v1.26/deb  Packages
       kubeadm | 1.26.2-1.1 | https://pkgs.k8s.io/core:/stable:/v1.26/deb  Packages
       kubeadm | 1.26.1-1.1 | https://pkgs.k8s.io/core:/stable:/v1.26/deb  Packages
       kubeadm | 1.26.0-2.1 | https://pkgs.k8s.io/core:/stable:/v1.26/deb  Packages
    
    $ sudo apt-get install -y kubelet=1.26.0-2.1 kubeadm=1.26.0-2.1 kubectl=1.26.0-2.1

    Output shell completion code for the specified shell (bash or zsh).

    # Install the bash-completion framework
    sudo apt-get install -y bash-completion
    
    # Output bash completion
    sudo sh -c 'kubeadm completion bash > /etc/bash_completion.d/kubeadm'
    sudo sh -c 'kubectl completion bash > /etc/bash_completion.d/kubectl'
    sudo sh -c 'crictl completion > /etc/bash_completion.d/crictl'
    
    # Load the completion code for bash into the current shell
    source /etc/bash_completion
  • Red Hat-based distributions

    # This overwrites any existing configuration in /etc/yum.repos.d/kubernetes.repo
    cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
    [kubernetes]
    name=Kubernetes
    baseurl=https://pkgs.k8s.io/core:/stable:/v1.26/rpm/
    enabled=1
    gpgcheck=1
    gpgkey=https://pkgs.k8s.io/core:/stable:/v1.26/rpm/repodata/repomd.xml.key
    exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni (1)
    EOF
    
    # Set SELinux in permissive mode (effectively disabling it) (2)
    sudo setenforce 0
    sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
    
    sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes (3)
    
    sudo systemctl enable --now kubelet
    1 The exclude parameter in the repository definition ensures that the packages related to Kubernetes are not upgraded upon running yum update as there’s a special procedure that must be followed for upgrading Kubernetes.

    Please NOTE that this repository has packages only for Kubernetes 1.26; for other Kubernetes minor versions, you need to change the Kubernetes minor version in the URL to match your desired minor version.

    2 Setting SELinux in permissive mode by running setenforce 0 and sed …​ effectively disables it. This is required to allow containers to access the host filesystem, which is needed by pod networks for example. You have to do this until SELinux support is improved in the kubelet.

    You can leave SELinux enabled if you know how to configure it but it may require settings that are not supported by kubeadm.

    3 You can also install a specific package version:
    $ yum --showduplicates --disableexcludes=kubernetes list kubeadm | tail -n 5
    kubeadm.x86_64                   1.26.0-150500.2.1                    kubernetes
    kubeadm.x86_64                   1.26.1-150500.1.1                    kubernetes
    kubeadm.x86_64                   1.26.2-150500.1.1                    kubernetes
    kubeadm.x86_64                   1.26.3-150500.1.1                    kubernetes
    kubeadm.x86_64                   1.26.4-150500.1.1                    kubernetes
    
    $ sudo yum --disableexcludes=kubernetes install kubelet-1.26.0-150500.2.1 kubeadm-1.26.0-150500.2.1 kubectl-1.26.0-150500.2.1

    Output shell completion code for the specified shell (bash or zsh).

    # Install the bash-completion framework
    sudo yum install -y bash-completion
    
    # Output bash completion
    sudo sh -c 'kubeadm completion bash > /etc/bash_completion.d/kubeadm'
    sudo sh -c 'kubectl completion bash > /etc/bash_completion.d/kubectl'
    sudo sh -c 'crictl completion > /etc/bash_completion.d/crictl'
    
    # Load the completion code for bash into the current shell
    source /usr/share/bash-completion/bash_completion
You may need to set the runtime endpoint of crictl explicitly, such as sudo crictl config --set runtime-endpoint=unix:///run/containerd/containerd.sock.
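
That command persists the endpoint in crictl's configuration file, typically /etc/crictl.yaml; a typical file, assuming containerd as the runtime, might look like:

# /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false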

Consider enabling the containerd snapshotters feature on Docker Engine.

{
  "features": {
    "containerd-snapshotter": true
  }
}

You can also set the cgroup driver to systemd on Docker Engine.

{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "features": {
    "containerd-snapshotter": true
  }
}
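
After changing Docker's daemon.json (usually /etc/docker/daemon.json), restart the daemon for the settings to take effect, then confirm the cgroup driver. A minimal check, assuming Docker is managed by systemd:

sudo systemctl restart docker
docker info --format '{{.CgroupDriver}}'   # expected: systemd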

2. Creating a cluster with kubeadm

Kubeadm has commands that can help you pre-pull the required images when creating a cluster without an internet connection on its nodes.

You can list and pull the images using the kubeadm config images sub-command:

kubeadm config images list # [--kubernetes-version=v1.26.0] [--image-repository=registry.k8s.io]
kubeadm config images pull # [--kubernetes-version=v1.26.0] [--image-repository=registry.k8s.io]

Kubeadm allows you to use a custom image repository for the required images. For example:

sudo kubeadm config images pull \
  --kubernetes-version=v1.26.0 \
  --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers

You can use ctr to retag the images in the k8s.io namespace back to the default repository registry.k8s.io:

#!/bin/sh
kubernetes_version=v1.26.0
image_repository=registry.cn-hangzhou.aliyuncs.com/google_containers
images=$(kubeadm config images list \
    --kubernetes-version $kubernetes_version \
    --image-repository $image_repository)

for i in $images; do
    case "$i" in
        *coredns*)
            new_repo="registry.k8s.io/coredns"
            ;;
        *)
            new_repo="registry.k8s.io"
            ;;
    esac
    newtag=$(echo "$i" | sed "s@$image_repository@$new_repo@")
    sudo ctr -n k8s.io images tag $i $newtag
done

Or, remove these images by using crictl:

sudo crictl images | \
    grep registry.cn-hangzhou.aliyuncs.com/google_containers | \
    awk '{print $1":"$2}' | \
    xargs sudo crictl rmi

You can also override the image repository behavior of kubeadm init by using kubeadm with a configuration file.

# Run `kubeadm config print init-defaults` to see the default Init configuration.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
imageRepository: registry.k8s.io
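
If you save a configuration like the one above as kubeadm-config.yaml (a filename chosen here just for illustration), you can pass it to kubeadm instead of individual flags:

sudo kubeadm config images pull --config kubeadm-config.yaml
sudo kubeadm init --config kubeadm-config.yaml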

2.1. Customizing components with the kubeadm API

The preferred way to configure kubeadm is to pass a YAML configuration file with the --config option. A kubeadm config file can contain multiple configuration types, separated by three dashes (---).

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration

2.1.1. Customizing the control plane with flags in ClusterConfiguration

The kubeadm ClusterConfiguration object exposes a way for users to override the default flags passed to control plane components such as the APIServer, ControllerManager, Scheduler and Etcd. [6]

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  timeoutForControlPlane: 4m0s
controllerManager: {}
scheduler: {}
etcd:
  local:
    dataDir: /var/lib/etcd
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
dns: {}
imageRepository: registry.k8s.io
kubernetesVersion: 1.26.0
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes

2.1.2. Customizing with patches

Kubeadm allows you to pass a directory with patch files to InitConfiguration and JoinConfiguration on individual nodes. These patches can be used as the last customization step before component configuration is written to disk.

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
patches:
  directory: /home/user/somedir
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
patches:
  directory: /home/user/somedir
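
Patch files in that directory are named target[suffix][+patchtype].extension, where target is a component such as kube-apiserver, kube-controller-manager, kube-scheduler, etcd, or kubeletconfiguration, and patchtype is strategic (the default), merge, or json. A sketch of a strategic-merge patch for the kube-apiserver static Pod; the resource values here are purely illustrative:

# /home/user/somedir/kube-apiserver+strategic.yaml
# Merged into the generated kube-apiserver static Pod manifest on this node.
spec:
  containers:
    - name: kube-apiserver
      resources:
        requests:
          cpu: "300m"
          memory: "512Mi"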

2.1.3. Customizing the kubelet

Some kubelet configuration details need to be the same across all kubelets involved in the cluster, while other configuration aspects need to be set on a per-kubelet basis to accommodate the different characteristics of a given machine (such as OS, storage, and networking). [7]

2.1.3.1. Kubelet configuration patterns
  • Propagating cluster-level configuration to each kubelet

    You can provide the kubelet with default values to be used by kubeadm init and kubeadm join commands. Interesting examples include using a different container runtime or setting the default subnet used by services.

    If you want your services to use the subnet 10.96.0.0/12 as the default for services, you can pass the --service-cidr parameter to kubeadm:

    kubeadm init --service-cidr 10.96.0.0/12

    The KubeletConfiguration is a versioned, structured API object that configures most parameters of the kubelet. You can pass this object to kubeadm init, and kubeadm will apply the same base KubeletConfiguration to all nodes in the cluster.

    kind: ClusterConfiguration
    apiVersion: kubeadm.k8s.io/v1beta3
    ---
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    clusterDNS:
    - 10.96.0.10
    cgroupDriver: systemd
  • Providing instance-specific configuration details

    Some hosts require specific kubelet configurations due to differences in hardware, operating system, networking, or other host-specific parameters. The following list provides a few examples.

    • The path to the DNS resolution file, as specified by the --resolv-conf kubelet configuration flag, may differ among operating systems, or depending on whether you are using systemd-resolved. If this path is wrong, DNS resolution will fail on the Node whose kubelet is configured incorrectly.

    • The Node API object .metadata.name is set to the machine’s hostname by default, unless you are using a cloud provider. You can use the --hostname-override flag to override the default behavior if you need to specify a Node name different from the machine’s hostname.

    • Currently, the kubelet cannot automatically detect the cgroup driver used by the container runtime, but the value of --cgroup-driver must match the cgroup driver used by the container runtime to ensure the health of the kubelet.

    • To specify the container runtime you must set its endpoint with the --container-runtime-endpoint=<path> flag.

    The recommended way of applying such instance-specific configuration is by using KubeletConfiguration patches.
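
    For example, a per-node kubelet override could be dropped into the patches directory as kubeletconfiguration+strategic.yaml (the file name and value are illustrative only); kubeadm applies it on top of the cluster-wide KubeletConfiguration for that node:

    # /home/user/somedir/kubeletconfiguration+strategic.yaml
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    # Point this node's kubelet at the systemd-resolved stub resolver
    resolvConf: /run/systemd/resolve/resolv.conf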

2.1.3.2. Configure kubelets using kubeadm

When you call kubeadm init, the kubelet configuration is marshalled to disk at /var/lib/kubelet/config.yaml, and also uploaded to a kubelet-config ConfigMap in the kube-system namespace of the cluster.

To address the second pattern of providing instance-specific configuration details, kubeadm writes an environment file to /var/lib/kubelet/kubeadm-flags.env, which contains a list of flags to pass to the kubelet when it starts. The flags are presented in the file like this:

KUBELET_KUBEADM_ARGS="--flag1=value1 --flag2=value2 ..."

In addition to the flags used when starting the kubelet, the file also contains dynamic parameters such as the cgroup driver and whether to use a different container runtime socket (--cri-socket).

When you run kubeadm join, kubeadm uses the Bootstrap Token credential to perform a TLS bootstrap, which fetches the credential needed to download the kubelet-config ConfigMap and writes it to /var/lib/kubelet/config.yaml. The dynamic environment file is generated in exactly the same way as it is for kubeadm init.
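
You can inspect both artifacts on a node to see exactly what kubeadm generated:

# Cluster-wide kubelet configuration fetched/written by kubeadm
sudo cat /var/lib/kubelet/config.yaml

# Instance-specific flags passed to the kubelet at startup
sudo cat /var/lib/kubelet/kubeadm-flags.env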

2.1.3.3. The kubelet drop-in file for systemd

kubeadm ships with configuration for how systemd should run the kubelet [7], written to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf. For example:

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generate at runtime, populating
# the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably,
# the user should use the .NodeRegistration.KubeletExtraArgs object in the configuration files instead.
# KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

This file specifies the default locations for all of the files managed by kubeadm for the kubelet.

  • The KubeConfig file to use for the TLS Bootstrap is /etc/kubernetes/bootstrap-kubelet.conf, but it is only used if /etc/kubernetes/kubelet.conf does not exist.

  • The KubeConfig file with the unique kubelet identity is /etc/kubernetes/kubelet.conf.

  • The file containing the kubelet’s ComponentConfig is /var/lib/kubelet/config.yaml.

  • The dynamic environment file that contains KUBELET_KUBEADM_ARGS is sourced from /var/lib/kubelet/kubeadm-flags.env.

  • The file that can contain user-specified flag overrides with KUBELET_EXTRA_ARGS is sourced from /etc/default/kubelet (for DEBs), or /etc/sysconfig/kubelet (for RPMs). KUBELET_EXTRA_ARGS is last in the flag chain and has the highest priority in the event of conflicting settings.
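
To inspect the effective systemd unit together with the kubeadm drop-in on a node, and to see which flags the running kubelet was actually started with:

# Show kubelet.service plus all drop-in files, including 10-kubeadm.conf
systemctl cat kubelet

# Show the kubelet's current status and the full command line it was started with
systemctl status kubelet --no-pager -l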

2.1.3.4. Configurations for local ephemeral storage

Nodes have local ephemeral storage, backed by locally-attached writeable devices or, sometimes, by RAM. [8] [9]

Pods use ephemeral local storage for scratch space, caching, and for logs. The kubelet can provide scratch space to Pods using local ephemeral storage to mount emptyDir volumes into containers.

The kubelet also uses this kind of storage to hold node-level container logs, container images, and the writable layers of running containers.

Note: The kubelet tracks tmpfs emptyDir volumes as container memory use, rather than as local ephemeral storage.
Note: The kubelet will only track the root filesystem for ephemeral storage. OS layouts that mount a separate disk to /var/lib/kubelet or /var/lib/containers will not report ephemeral storage correctly.
The kubelet writes logs to files inside its configured log directory (/var/log by default), and has a base directory for other locally stored data (/var/lib/kubelet by default).

The kubelet recognizes two specific filesystem identifiers: [10]

  • nodefs: The node’s main filesystem, used for local disk volumes, emptyDir volumes not backed by memory, log storage, and more. For example, nodefs contains /var/lib/kubelet/.

  • imagefs: An optional filesystem that container runtimes use to store container images and container writable layers. [11]

    The containerd runtime uses a TOML configuration file to control where persistent (default "/var/lib/containerd") and ephemeral data (default "/run/containerd") is stored.

The kubelet auto-discovers these filesystems and ignores other node-local filesystems; it does not support other configurations.
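
A quick way to see how much space nodefs and imagefs have on a node; the paths assume the defaults mentioned above (kubelet data under /var/lib/kubelet and containerd data under /var/lib/containerd):

# nodefs: kubelet state, emptyDir volumes not backed by memory, logs
df -h /var/lib/kubelet

# imagefs: container images and writable layers (when containerd is the runtime)
df -h /var/lib/containerd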

2.2. Initializing control-plane node

The control-plane node is the machine where the control plane components run, including etcd (the cluster database) and the API Server (which the kubectl command line tool communicates with). [1]

sudo kubeadm init \
    --kubernetes-version=v1.26.0 \
    --control-plane-endpoint=cluster-endpoint \
    --apiserver-advertise-address=192.168.0.100 \
    --pod-network-cidr=10.244.0.0/16 \
    --service-cidr=10.96.0.0/12 \
    --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers \
    --ignore-preflight-errors=NumCPU,Mem \
    --dry-run
  1. (Recommended) If you have plans to upgrade this single control-plane kubeadm cluster to high availability you should specify the --control-plane-endpoint to set the shared endpoint for all control-plane nodes. Such an endpoint can be either a DNS name or an IP address of a load-balancer.

  2. Choose a Pod network add-on, and verify whether it requires any arguments to be passed to kubeadm init. Depending on which third-party provider you choose, you might need to set the --pod-network-cidr to a provider-specific value.

  3. (Optional) kubeadm tries to detect the container runtime by using a list of well-known endpoints. To use a different container runtime, or if more than one is installed on the provisioned node, specify the --cri-socket argument to kubeadm.

Considerations about apiserver-advertise-address and ControlPlaneEndpoint

  • Unless otherwise specified, kubeadm uses the network interface associated with the default gateway to set the advertise address for this particular control-plane node’s API server. To use a different network interface, specify the --apiserver-advertise-address=<ip-address> argument to kubeadm init.

  • While --apiserver-advertise-address can be used to set the advertise address for this particular control-plane node’s API server, --control-plane-endpoint can be used to set the shared endpoint for all control-plane nodes.

  • --control-plane-endpoint allows both IP addresses and DNS names that can map to IP addresses, for example via an /etc/hosts entry such as:

    192.168.56.130	cluster-endpoint

    Where 192.168.56.130 is the IP address of this node and cluster-endpoint is a custom DNS name that maps to this IP. Later you can modify cluster-endpoint to point to the address of your load-balancer in an high availability scenario.

Run the following command to initialize a control plane:

sudo kubeadm init \
    --kubernetes-version=v1.26.0 \
    --control-plane-endpoint=cluster-endpoint \
    --pod-network-cidr=10.244.0.0/16
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:

  kubeadm join cluster-endpoint:6443 --token ed790l.ylclzoyoa7l9v0e9 \
	--discovery-token-ca-cert-hash sha256:cb046f4d8183a66f930155654cc34354612eeab839d7ed97971154fa8f35072f \
	--control-plane

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join cluster-endpoint:6443 --token ed790l.ylclzoyoa7l9v0e9 \
	--discovery-token-ca-cert-hash sha256:cb046f4d8183a66f930155654cc34354612eeab839d7ed97971154fa8f35072f

2.2.1. Installing a Pod network add-on

You must deploy a Container Network Interface (CNI) based Pod network add-on so that your Pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed.

  • Take care that your Pod network must not overlap with any of the host networks: you are likely to see problems if there is any overlap. (If you find a collision between your network plugin’s preferred Pod network and some of your host networks, you should think of a suitable CIDR block to use instead, then use that during kubeadm init with --pod-network-cidr and as a replacement in your network plugin’s YAML).

  • By default, kubeadm sets up your cluster to use and enforce use of RBAC (role based access control). Make sure that your Pod network plugin supports RBAC, and so do any manifests that you use to deploy it.

  • If you want to use IPv6—​either dual-stack, or single-stack IPv6 only networking—​for your cluster, make sure that your Pod network plugin supports IPv6. IPv6 support was added to CNI in v0.6.0.

Flannel is a simple and easy way to configure a layer 3 network fabric designed for Kubernetes. For Kubernetes v1.17+, you can deploy Flannel with kubectl or with Helm:

  • Deploying Flannel with kubectl

    kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

    If you use a custom podCIDR (not 10.244.0.0/16), you first need to download the above manifest and modify the network to match yours.

  • Deploying Flannel with helm

    # Needs manual creation of namespace to avoid helm error
    kubectl create ns kube-flannel
    kubectl label --overwrite ns kube-flannel pod-security.kubernetes.io/enforce=privileged
    
    helm repo add flannel https://flannel-io.github.io/flannel/
    helm install flannel --set podCidr="10.244.0.0/16" --namespace kube-flannel flannel/flannel
    
    # helm install flannel oci://registry-1.docker.io/qqbuby/flannel --namespace kube-flannel --version v0.24.4

Flannel may be paired with several different backends. Once set, the backend should not be changed at runtime.

  • VXLAN is the recommended choice.

  • host-gw is recommended for more experienced users who want the performance improvement and whose infrastructure supports it (typically it can’t be used in cloud environments).

  • UDP is suggested for debugging only or for very old kernels that don’t support VXLAN.

Several external projects provide Kubernetes Pod networks using CNI, some of which also support Network Policy. See a list of add-ons that implement the Kubernetes networking model.

2.2.2. Control plane node isolation

By default, Pods will not be scheduled on the control plane nodes for security reasons. To be able to schedule Pods on the control plane nodes, run:

kubectl taint nodes --all node-role.kubernetes.io/control-plane-

2.3. Joining the worker nodes

To add new nodes to your cluster do the following for each machine:

  1. SSH to the machine

  2. Become root (e.g. sudo su -)

  3. Install a runtime if needed

  4. Run the command that was output by kubeadm init. For example:

    Then you can join any number of worker nodes by running the following on each as root:
    
    kubeadm join cluster-endpoint:6443 --token ed790l.ylclzoyoa7l9v0e9 \
    	--discovery-token-ca-cert-hash sha256:cb046f4d8183a66f930155654cc34354612eeab839d7ed97971154fa8f35072f

If you do not have the token, you can get it by running the following command on the control-plane node:

kubeadm token list

By default, tokens expire after 24 hours. If you are joining a node to the cluster after the current token has expired, you can create a new token by running the following command on the control-plane node:

kubeadm token create

If you don’t have the value of --discovery-token-ca-cert-hash, you can get it by running the following command chain on the control-plane node:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | \
   openssl dgst -sha256 -hex | sed 's/^.* //'

You can also run the following command to create and print join command:

kubeadm token create --print-join-command

2.4. Joining the stacked control plane and etcd nodes

  1. Upload the certificates that should be shared across all the control-plane instances to the cluster, and note the certificate key.

    sudo kubeadm init phase upload-certs --upload-certs
    [upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
    [upload-certs] Using certificate key:
    a455917454410f7d8bcdfa5795ed54526c7484e4e6316ef57a3aa16c3454ada2
  2. Run the command that was output by kubeadm init with the additional --certificate-key <certificate key> generated above.

    You can now join any number of control-plane nodes by copying certificate authorities
    and service account keys on each node and then running the following as root:
    
      kubeadm join cluster-endpoint:6443 --token ed790l.ylclzoyoa7l9v0e9 \
    	--discovery-token-ca-cert-hash sha256:cb046f4d8183a66f930155654cc34354612eeab839d7ed97971154fa8f35072f \
    	--control-plane
    kubeadm join cluster-endpoint:6443 --token ed790l.ylclzoyoa7l9v0e9 \
      --discovery-token-ca-cert-hash sha256:cb046f4d8183a66f930155654cc34354612eeab839d7ed97971154fa8f35072f \
      --control-plane \
      --certificate-key a455917454410f7d8bcdfa5795ed54526c7484e4e6316ef57a3aa16c3454ada2
    This node has joined the cluster and a new control plane instance was created:
    
    * Certificate signing request was sent to apiserver and approval was received.
    * The Kubelet was informed of the new secure connection details.
    * Control plane label and taint were applied to the new node.
    * The Kubernetes control plane instances scaled up.
    * A new etcd member was added to the local/stacked etcd cluster.
    
    To start administering your cluster from this node, you need to run the following as a regular user:
    
    	mkdir -p $HOME/.kube
    	sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    	sudo chown $(id -u):$(id -g) $HOME/.kube/config
    
    Run 'kubectl get nodes' to see this node join the cluster.
    $ kubectl get nodes
    NAME                 STATUS   ROLES           AGE   VERSION
    node-0               Ready    control-plane   92m   v1.26.0
    node-2               Ready    control-plane   27s   v1.26.13

2.5. Removing the nodes

Talking to the control-plane node with the appropriate credentials, run:

kubectl drain <node name> --delete-emptydir-data --force --ignore-daemonsets

Before removing the node, reset the state installed by kubeadm (run this on the node being removed):

kubeadm reset

Now remove the node:

kubectl delete node <node name>

2.6. Installing Addons

Add-ons extend the functionality of Kubernetes.

2.6.1. Ingress controllers

In order for the Ingress resource to work, the cluster must have an ingress controller running. Unlike other types of controllers which run as part of the kube-controller-manager binary, Ingress controllers are not started automatically with a cluster. Kubernetes as a project supports and maintains AWS, GCE, and nginx ingress controllers. [14]

There are multiple ways to install the Ingress-Nginx Controller: [15]

  • with Helm, using the project repository chart;

  • with kubectl apply, using YAML manifests;

  • with specific addons (e.g. for minikube or MicroK8s).

You can also expose the Ingress Nginx over a NodePort service. [16]

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "10254"
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: NodePort
  ports:
    - name: http
      port: 80
      nodePort: 30080
      protocol: TCP
      targetPort: http
      appProtocol: http
    - name: https
      port: 443
      nodePort: 30443
      protocol: TCP
      targetPort: https
      appProtocol: https
    - name: prometheus
      port: 10254
      protocol: TCP
      targetPort: prometheus

Aliyun (a Chinese corporation) provides a mirror repository (registry.aliyuncs.com/google_containers) for the images, to which Chinese users have access. [17] You can consider updating the ingress-nginx images as follows:

images:
  # registry.k8s.io/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c
  - name: registry.k8s.io/ingress-nginx/controller
    newName: registry.aliyuncs.com/google_containers/nginx-ingress-controller
    # remove the digest to ignore the integrity checking.
    newTag: v1.9.6
  # registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20231226-1a7112e06@sha256:25d6a5f11211cc5c3f9f2bf552b585374af287b4debf693cacbe2da47daa5084
  - name: registry.k8s.io/ingress-nginx/kube-webhook-certgen
    newName: registry.aliyuncs.com/google_containers/kube-webhook-certgen
    # remove the digest to ignore the integrity checking.
    newTag: v20231226-1a7112e06

Checking ingress controller version

Run /nginx-ingress-controller --version within the pod, for instance with kubectl exec:

POD_NAMESPACE=ingress-nginx
POD_NAME=$(kubectl get pods -n $POD_NAMESPACE -l app.kubernetes.io/name=ingress-nginx --field-selector=status.phase=Running -o name)
kubectl exec $POD_NAME -n $POD_NAMESPACE -- /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.9.6
  Build:         6a73aa3b05040a97ef8213675a16142a9c95952a
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

2.6.2. Metrics server

Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines. [18]

Metrics Server can be installed either directly from YAML manifest or via the official Helm chart. To install the latest Metrics Server release from the components.yaml manifest, run the following command.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

You can also consider updating the YAML as follows:

# metrics-server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: metrics-server
          args:
            - --cert-dir=/tmp
            - --secure-port=10250
            - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
            - --kubelet-use-node-status-port
            - --metric-resolution=15s
            # Do not verify the CA of serving certificates presented by Kubelets. For testing purposes only.
            - --kubelet-insecure-tls
# kustomization.yaml
resources:
  - ../manifests
patchesStrategicMerge:
  - metrics-server-deployment.yaml
images:
  - name: registry.k8s.io/metrics-server/metrics-server
    newName: registry.aliyuncs.com/google_containers/metrics-server
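
Once the metrics-server Deployment is rolled out, you can verify the metrics pipeline from any machine with cluster access:

kubectl -n kube-system rollout status deployment metrics-server
kubectl top nodes
kubectl top pods -A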

3. Upgrading kubeadm clusters

If you are performing a minor version upgrade for any kubelet, you must first drain the node (or nodes) that you are upgrading. In the case of control plane nodes, they could be running CoreDNS Pods or other critical workloads. [19]
The Kubernetes project recommends that you match your kubelet and kubeadm versions. You can instead use a version of kubelet that is older than kubeadm, provided it is within the range of supported versions.

If you’re using the community-owned package repositories (pkgs.k8s.io), you need to enable the package repository for the desired Kubernetes minor release.

# /etc/apt/sources.list.d/kubernetes.list
deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /
# /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
# Find the latest 1.29 version in the list.
# It should look like 1.29.x-*, where x is the latest patch.
sudo apt update
sudo apt-cache madison kubeadm # OR apt-cache policy kubeadm
# Find the latest 1.29 version in the list.
# It should look like 1.29.x-*, where x is the latest patch.
sudo yum clean all --disablerepo="*" --enablerepo=kubernetes # Make sure the YUM cache of the kubernetes repo is cleaned.
sudo yum list --showduplicates kubeadm --disableexcludes=kubernetes

(Optional) Pre-pull the images:

#!/bin/sh

# replace x in 1.29.x with the latest patch version
kubernetes_version=v1.29.x
image_repository=registry.cn-hangzhou.aliyuncs.com/google_containers

sudo kubeadm config images pull \
    --kubernetes-version=$kubernetes_version \
    --image-repository=$image_repository

images=$(kubeadm config images list \
    --kubernetes-version $kubernetes_version \
    --image-repository $image_repository)

for i in $images; do
    case "$i" in
        *coredns*)
            new_repo="registry.k8s.io/coredns"
            ;;
        *)
            new_repo="registry.k8s.io"
            ;;
    esac
    newtag=$(echo "$i" | sed "s@$image_repository@$new_repo@")
    sudo ctr -n k8s.io images tag $i $newtag
done

3.1. Upgrading control plane nodes

The upgrade procedure on control plane nodes should be executed one node at a time.

3.1.1. Upgrade kubeadm

For the first control plane node

  1. Upgrade kubeadm:

    # replace x in 1.29.x-* with the latest patch version
    sudo apt-mark unhold kubeadm && \
    sudo apt-get update && sudo apt-get install -y kubeadm='1.29.x-*' && \
    sudo apt-mark hold kubeadm
    # replace x in 1.29.x-* with the latest patch version
    sudo yum install -y kubeadm-'1.29.x-*' --disableexcludes=kubernetes
  2. Verify that the download works and has the expected version:

    kubeadm version
  3. Verify the upgrade plan:

    sudo kubeadm upgrade plan
  4. Choose a version to upgrade to, and run the appropriate command. For example:

    # replace x with the patch version you picked for this upgrade
    sudo kubeadm upgrade apply v1.29.x

For the other control plane nodes

Same as the first control plane node but use:

sudo kubeadm upgrade node

instead of:

sudo kubeadm upgrade apply

3.1.2. Upgrade kubelet and kubectl

  1. Drain the node, prepare the node for maintenance by marking it unschedulable and evicting the workloads:

    # replace <node-to-drain> with the name of your node you are draining
    kubectl drain <node-to-drain> --ignore-daemonsets
  2. Upgrade the kubelet and kubectl:

    # replace x in 1.29.x-* with the latest patch version
    sudo apt-mark unhold kubelet kubectl && \
    sudo apt-get update && sudo apt-get install -y kubelet='1.29.x-*' kubectl='1.29.x-*' && \
    sudo apt-mark hold kubelet kubectl
    # replace x in 1.29.x-* with the latest patch version
    sudo yum install -y kubelet-'1.29.x-*' kubectl-'1.29.x-*' --disableexcludes=kubernetes
  3. Restart the kubelet:

    sudo systemctl daemon-reload
    sudo systemctl restart kubelet
  4. Uncordon the node, bring the node back online by marking it schedulable:

    # replace <node-to-uncordon> with the name of your node
    kubectl uncordon <node-to-uncordon>

3.2. Upgrade worker nodes

The upgrade procedure on worker nodes should be executed one node at a time or a few nodes at a time, without compromising the minimum required capacity for running your workloads. [20]

3.2.1. Upgrade kubeadm

# replace x in 1.29.x-* with the latest patch version
sudo apt-mark unhold kubeadm && \
sudo apt-get update && sudo apt-get install -y kubeadm='1.29.x-*' && \
sudo apt-mark hold kubeadm
# replace x in 1.29.x-* with the latest patch version
sudo yum install -y kubeadm-'1.29.x-*' --disableexcludes=kubernetes
# For worker nodes this upgrades the local kubelet configuration:
sudo kubeadm upgrade node

3.2.2. Upgrade kubelet and kubectl

  1. Drain the node, prepare the node for maintenance by marking it unschedulable and evicting the workloads:

    # execute this command on a control plane node
    # replace <node-to-drain> with the name of your node you are draining
    kubectl drain <node-to-drain> --ignore-daemonsets
  2. Upgrade the kubelet and kubectl:

    # replace x in 1.29.x-* with the latest patch version
    sudo apt-mark unhold kubelet kubectl && \
    sudo apt-get update && sudo apt-get install -y kubelet='1.29.x-*' kubectl='1.29.x-*' && \
    sudo apt-mark hold kubelet kubectl
    # replace x in 1.29.x-* with the latest patch version
    sudo yum install -y kubelet-'1.29.x-*' kubectl-'1.29.x-*' --disableexcludes=kubernetes
  3. Restart the kubelet:

    sudo systemctl daemon-reload
    sudo systemctl restart kubelet
  4. Uncordon the node, bring the node back online by marking it schedulable:

    # execute this command on a control plane node
    # replace <node-to-uncordon> with the name of your node
    kubectl uncordon <node-to-uncordon>

3.3. Verify the status of the cluster

After the kubelet is upgraded on all nodes, verify that all nodes are available again by running the following command from anywhere kubectl can access the cluster:

kubectl get nodes

The STATUS column should show Ready for all your nodes, and the version number should be updated.

References