1. Installing kubeadm and container runtime

  • A compatible Linux host. The Kubernetes project provides generic instructions for Linux distributions based on Debian and Red Hat, and those distributions without a package manager.

  • 2 GB or more of RAM per machine (any less will leave little room for your apps).

  • 2 CPUs or more.

  • Full network connectivity between all machines in the cluster (public or private network is fine).

  • Unique hostname, MAC address, and product_uuid for every node.

  • Certain ports are open on your machines.

  • Swap disabled. You MUST disable swap in order for the kubelet to work properly.
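
For example, a minimal sketch to turn swap off now and keep it off across reboots (assuming swap is configured through /etc/fstab; review that file before and after editing):

# Turn off swap immediately
sudo swapoff -a
# Comment out any swap entries so the change persists across reboots
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab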

1.1. Verify the MAC address and product_uuid are unique for every node

  • You can get the MAC address of the network interfaces using the command ip link or ifconfig -a

  • The product_uuid can be checked by using the command sudo cat /sys/class/dmi/id/product_uuid

    [x@node-2 ~]$ ip link show ens32
    2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
        link/ether 00:0c:29:f2:e6:ca brd ff:ff:ff:ff:ff:ff
    
    [x@node-2 ~]$ sudo cat /sys/class/dmi/id/product_uuid
    44314D56-8B56-37B6-C94C-6A2D5FF2E6CA

1.2. Check required ports

Table 1. Control plane

Protocol   Direction   Port Range   Purpose                   Used By
TCP        Inbound     6443         Kubernetes API server     All
TCP        Inbound     2379-2380    etcd server client API    kube-apiserver, etcd
TCP        Inbound     10250        Kubelet API               Self, Control plane
TCP        Inbound     10259        kube-scheduler            Self
TCP        Inbound     10257        kube-controller-manager   Self

Table 2. Worker node(s)

Protocol   Direction   Port Range    Purpose             Used By
TCP        Inbound     10250         Kubelet API         Self, Control plane
TCP        Inbound     30000-32767   NodePort Services   All

These required ports need to be open in order for Kubernetes components to communicate with each other. You can use tools like netcat to check if a port is open. For example:

nc 127.0.0.1 6443

The pod network plugin you use may also require certain ports to be open.
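
For example, a small loop like the following checks the standard control-plane ports from another machine (a hedged sketch: 192.168.0.100 is an illustrative control-plane address, and the -z/-w options assume the OpenBSD/traditional netcat variant):

# Check the required control-plane ports from a worker or admin machine
for port in 6443 2379 2380 10250 10259 10257; do
    nc -zv -w 2 192.168.0.100 "$port"
done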

1.3. Installing a container runtime

To run containers in Pods, Kubernetes uses a container runtime.

By default, Kubernetes uses the Container Runtime Interface (CRI) to interface with your chosen container runtime.

Kubernetes 1.26 requires that you use a runtime that conforms with the Container Runtime Interface (CRI).

If you don’t specify a runtime, kubeadm automatically tries to detect an installed container runtime by scanning through a list of known endpoints.

Table 3. Known endpoints for Linux supported operating systems

Runtime                             Path to Unix domain socket
containerd                          unix:///var/run/containerd/containerd.sock
CRI-O                               unix:///var/run/crio/crio.sock
Docker Engine (using cri-dockerd)   unix:///var/run/cri-dockerd.sock

If multiple container runtimes are detected, or none at all, kubeadm will throw an error and request that you specify which one you want to use.

Docker Engine does not implement the CRI which is a requirement for a container runtime to work with Kubernetes. For that reason, an additional service cri-dockerd has to be installed. cri-dockerd is a project based on the legacy built-in Docker Engine support that was removed from the kubelet in version 1.24.

1.3.1. CRI version support

Your container runtime must support at least v1alpha2 of the container runtime interface.

Kubernetes 1.26 defaults to using v1 of the CRI API. If a container runtime does not support the v1 API, the kubelet falls back to using the (deprecated) v1alpha2 API instead.

1.3.2. Install and configure prerequisites

The following steps apply common settings for Kubernetes nodes on Linux.

You can skip a particular setting if you’re certain you don’t need it.

1.3.2.1. Forwarding IPv4 and letting iptables see bridged traffic

Verify that the br_netfilter module is loaded by running lsmod | grep br_netfilter.

To load it explicitly, run sudo modprobe br_netfilter.

In order for a Linux node’s iptables to correctly view bridged traffic, verify that net.bridge.bridge-nf-call-iptables is set to 1 in your sysctl config. For example:

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system
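
You can then verify that the modules are loaded and that the sysctl values have been applied:

lsmod | grep -E 'overlay|br_netfilter'
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward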

1.3.3. Cgroup drivers

On Linux, control groups are used to constrain resources that are allocated to processes. [2]

Both kubelet and the underlying container runtime need to interface with control groups to enforce resource management for pods and containers and set resources such as cpu/memory requests and limits.

To interface with control groups, the kubelet and the container runtime need to use a cgroup driver.

It is critical that the kubelet and the container runtime use the same cgroup driver and are configured consistently.

There are two cgroup drivers available:

1.3.3.1. cgroupfs driver

The cgroupfs driver is the default cgroup driver in the kubelet. When the cgroupfs driver is used, the kubelet and the container runtime directly interface with the cgroup filesystem to configure cgroups.

The cgroupfs driver is not recommended when systemd is the init system because systemd expects a single cgroup manager on the system.

Additionally, if you use cgroup v2 , use the systemd cgroup driver instead of cgroupfs.

1.3.3.2. systemd cgroup driver

When systemd is chosen as the init system for a Linux distribution, the init process generates and consumes a root control group (cgroup) and acts as a cgroup manager.

systemd has a tight integration with cgroups and allocates a cgroup per systemd unit. As a result, if you use systemd as the init system with the cgroupfs driver, the system gets two different cgroup managers.

Two cgroup managers result in two views of the available and in-use resources in the system.

In some cases, nodes that are configured to use cgroupfs for the kubelet and container runtime, but use systemd for the rest of the processes become unstable under resource pressure.

The approach to mitigate this instability is to use systemd as the cgroup driver for the kubelet and the container runtime when systemd is the selected init system.

To set systemd as the cgroup driver, edit the KubeletConfiguration option of cgroupDriver and set it to systemd. For example: [2][3]

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
...
cgroupDriver: systemd
In v1.22, if the user is not setting the cgroupDriver field under KubeletConfiguration, kubeadm will default it to systemd. [3]
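
A quick way to confirm that systemd is the init system on a node (and therefore that the systemd cgroup driver is the appropriate choice):

# If PID 1 is systemd, prefer the systemd cgroup driver
ps -p 1 -o comm=
# Expected output on systemd-based distributions:
# systemd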

1.3.4. Container runtimes

1.3.4.1. containerd

Follow the instructions for getting started with containerd. Return to this step once you’ve created a valid configuration file, config.toml.

You can find this file under the path /etc/containerd/config.toml.

On Linux the default CRI socket for containerd is /run/containerd/containerd.sock.

  1. Configuring the systemd cgroup driver

    To use the systemd cgroup driver in /etc/containerd/config.toml with runc, set

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      ...
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
        SystemdCgroup = true

    The systemd cgroup driver is recommended if you use cgroup v2.

    The cgroup version depends on the Linux distribution being used and the default cgroup version configured on the OS.

    To check which cgroup version your distribution uses, run the stat -fc %T /sys/fs/cgroup/ command on the node: [4]

    stat -fc %T /sys/fs/cgroup/

    For cgroup v2, the output is cgroup2fs.

    For cgroup v1, the output is tmpfs.

    If you installed containerd from a package (for example, RPM or .deb), you may find that the CRI integration plugin is disabled by default.

    You need CRI support enabled to use containerd with Kubernetes. Make sure that cri is not included in the disabled_plugins list within /etc/containerd/config.toml; if you made changes to that file, also restart containerd.

    $ apt-get download containerd.io
    Get:1 https://download.docker.com/linux/debian buster/stable amd64 containerd.io amd64 1.6.13-1 [27.7 MB]
    Fetched 27.7 MB in 24s (1,154 kB/s)
    $ dpkg -c containerd.io_1.6.13-1_amd64.deb
    drwxr-xr-x root/root         0 2022-12-16 02:39 ./
    drwxr-xr-x root/root         0 2022-12-16 02:39 ./etc/
    drwxr-xr-x root/root         0 2022-12-16 02:39 ./etc/containerd/
    -rw-r--r-- root/root       886 2022-12-16 02:39 ./etc/containerd/config.toml
    ....

    The following configuration of /etc/containerd/config.toml is used by Docker CE by default.

    #   Copyright 2018-2022 Docker Inc.
    
    #   Licensed under the Apache License, Version 2.0 (the "License");
    #   you may not use this file except in compliance with the License.
    #   You may obtain a copy of the License at
    
    #       http://www.apache.org/licenses/LICENSE-2.0
    
    #   Unless required by applicable law or agreed to in writing, software
    #   distributed under the License is distributed on an "AS IS" BASIS,
    #   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    #   See the License for the specific language governing permissions and
    #   limitations under the License.
    
    disabled_plugins = ["cri"]
    
    #root = "/var/lib/containerd"
    #state = "/run/containerd"
    #subreaper = true
    #oom_score = 0
    
    #[grpc]
    #  address = "/run/containerd/containerd.sock"
    #  uid = 0
    #  gid = 0
    
    #[debug]
    #  address = "/run/containerd/debug.sock"
    #  uid = 0
    #  gid = 0
    #  level = "info"
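
    One common way to re-enable CRI support is a hedged sketch like the following: it replaces the packaged file with containerd's full default configuration (which does not disable the cri plugin), so back up any local changes first.

    # Back up the packaged config, regenerate containerd's full default config,
    # and restart the service
    sudo cp /etc/containerd/config.toml /etc/containerd/config.toml.orig
    containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
    sudo systemctl restart containerd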
  2. Overriding the sandbox (pause) image

    In your containerd config you can override the sandbox image by setting the following config:

    [plugins."io.containerd.grpc.v1.cri"]
      sandbox_image = "registry.k8s.io/pause:3.2"
  3. Configure root and state storage locations

    In the containerd config file you will find settings for persistent and runtime storage locations as well as grpc, debug, and metrics addresses for the various APIs.

    #root = "/var/lib/containerd"
    #state = "/run/containerd"

    The containerd root will be used to store any type of persistent data for containerd. Snapshots, content, metadata for containers and images, as well as any plugin data will be kept in this location.

    The root is also namespaced for plugins that containerd loads. Each plugin will have its own directory where it stores data. containerd itself does not actually have any persistent data that it needs to store, its functionality comes from the plugins that are loaded.

    $ sudo tree  /var/lib/containerd/
    /var/lib/containerd/
    ├── io.containerd.content.v1.content
    │   └── ingest
    ├── io.containerd.metadata.v1.bolt
    │   └── meta.db
    ├── io.containerd.runtime.v1.linux
    ├── io.containerd.runtime.v2.task
    ├── io.containerd.snapshotter.v1.btrfs
    ├── io.containerd.snapshotter.v1.native
    │   └── snapshots
    ├── io.containerd.snapshotter.v1.overlayfs
    │   └── snapshots
    └── tmpmounts
    
    11 directories, 1 file

    The containerd state will be used to store any type of ephemeral data. Sockets, pids, runtime state, mount points, and other plugin data that must not persist between reboots are stored in this location.

    $ sudo tree /run/containerd/
    /run/containerd/
    ├── containerd.sock
    ├── containerd.sock.ttrpc
    ├── io.containerd.runtime.v1.linux
    └── io.containerd.runtime.v2.task
    
    2 directories, 2 files
  4. Configure HTTP or HTTPS proxy.

    The containerd daemon uses the HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables in its start-up environment to configure HTTP or HTTPS proxy behavior.

    1. Create a systemd drop-in directory for the containerd service:

      $ sudo mkdir -p /etc/systemd/system/containerd.service.d
    2. Create a file called 10-http_proxy.conf in the above directory that adds the HTTP_PROXY environment variable:

      [Service]
      Environment="HTTP_PROXY=http://proxy.example.com:80/"

      Or, if you are behind an HTTPS proxy server, also add the HTTPS_PROXY environment variable:

      [Service]
      Environment="HTTP_PROXY=http://proxy.example.com:80/"
      Environment="HTTPS_PROXY=https://proxy.example.com:443/"

      If you have internal registries that you need to contact without proxying you can specify them via the NO_PROXY environment variable:

      [Service]
      Environment="HTTP_PROXY=http://proxy.example.com:80/"
      Environment="HTTPS_PROXY=https://proxy.example.com:443/"
      Environment="NO_PROXY=localhost,127.0.0.1,docker-registry.somecorporation.com"

      The NO_PROXY environment variable specifies URLs that should be excluded from proxying (on servers that should be contacted directly). This should be a comma-separated list of hostnames, domain names, or a mixture of both. Asterisks can be used as wildcards, but other clients may not support that. Domain names may be indicated by a leading dot. For example:

      NO_PROXY="*.aventail.com,home.com,.seanet.com"

      says to contact all machines in the ‘aventail.com’ and ‘seanet.com’ domains directly, as well as the machine named ‘home.com’. If NO_PROXY isn’t defined, no_PROXY and no_proxy are also tried, in that order.

      You can also run systemctl edit containerd to edit override.conf under /etc/systemd/system/containerd.service.d for the containerd service.

    3. Flush changes and restart containerd:

      $ sudo systemctl daemon-reload
      $ sudo systemctl restart containerd
    4. Verify that the configuration has been loaded:

      $ systemctl show --property=Environment containerd --full --no-pager

The containerd.io packages in DEB and RPM formats are distributed by Docker (not by the containerd project)

  • Debian

    # Update the apt package index and install packages to allow apt to use a repository over HTTPS
    sudo apt-get update
    sudo apt-get install \
        ca-certificates \
        curl \
        gnupg \
        lsb-release
    
    # Add Docker’s official GPG key:
    sudo mkdir -p /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    
    # Use the following command to set up the repository:
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
      $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    
    # Install containerd.io
    sudo apt-get update && sudo apt-get install -y containerd.io
  • CentOS

    # Install the yum-utils package (which provides the yum-config-manager utility) and set up the repository.
    sudo yum install -y yum-utils
    sudo yum-config-manager \
        --add-repo \
        https://download.docker.com/linux/centos/docker-ce.repo
    # Install the latest version of containerd.
    # If prompted to accept the GPG key, verify that the fingerprint matches
    # `060A 61C5 1B55 8A7F 742B 77AA C52F EB6B 621E 9F35`, and if so, accept it.
    sudo yum install containerd.io
    # Start containerd.
    sudo systemctl enable containerd.service
    sudo systemctl start containerd.service
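
Whichever distribution you installed from, you can verify that containerd is running and responding:

systemctl is-active containerd
sudo ctr version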

For more information about Cgroups, see Linux CGroups and Containers.

For more information about containerd, see RUNC CONTAINERD CRI DOCKERSHIM.

1.3.4.2. Docker Engine
  • On each of your nodes, install Docker for your Linux distribution as per Install Docker Engine.

  • Install cri-dockerd, following the instructions in that source code repository.

    For cri-dockerd, the CRI socket is /run/cri-dockerd.sock by default.

This example sets the cgroup driver to systemd: [6]

sudo sh -c 'cat > /etc/docker/daemon.json <<EOF
{
  "data-root": "/var/lib/docker",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF'
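
After writing daemon.json, restart Docker and confirm the cgroup driver in use (the --format query assumes a reasonably recent Docker CLI):

sudo systemctl restart docker
# Print the cgroup driver the daemon is actually using; expect "systemd"
sudo docker info --format '{{.CgroupDriver}}'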

1.4. Installing kubeadm, kubelet and kubectl

You will install these packages on all of your machines:

  • kubeadm: the command to bootstrap the cluster.

  • kubelet: the component that runs on all of the machines in your cluster and does things like starting pods and containers.

  • kubectl: the command line util to talk to your cluster.

kubeadm will not install or manage kubelet or kubectl for you, so you will need to ensure they match the version of the Kubernetes control plane you want kubeadm to install for you.

If you do not, there is a risk of a version skew occurring that can lead to unexpected, buggy behaviour.

However, one minor version of skew between the kubelet and the control plane is supported; the kubelet version may never exceed the API server version.

For example, the kubelet running 1.7.0 should be fully compatible with a 1.8.0 API server, but not vice versa.

For more information on version skews, see the Kubernetes version and version-skew policy and the kubeadm-specific version skew policy.

1.4.1. Debian-based distributions

  1. Update the apt package index and install packages needed to use the Kubernetes apt repository:

    $ sudo apt-get update
    $ sudo apt-get install -y apt-transport-https ca-certificates curl
  2. Download the Google Cloud public signing key (on older releases where the /etc/apt/keyrings directory does not exist, create it first with sudo mkdir -p /etc/apt/keyrings):

    $ sudo curl -fsSLo /etc/apt/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
  3. Add the Kubernetes apt repository:

    $ echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

    Note: You can also set up the kubernetes.list repository with the following mirror provided by USTC (China).

    # deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main
    deb [arch=amd64 signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://mirrors.ustc.edu.cn/kubernetes/apt/  kubernetes-xenial main
  4. Update apt package index, install kubelet, kubeadm and kubectl, and pin their version:

    $ sudo apt-get update
    $ sudo apt-get install -y kubelet kubeadm kubectl
    $ sudo apt-mark hold kubelet kubeadm kubectl

    You can also install a specific package version:

    $ apt-cache madison kubeadm | head -n 5
       kubeadm |  1.26.0-00 | https://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial/main amd64 Packages
       kubeadm |  1.25.5-00 | https://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial/main amd64 Packages
       kubeadm |  1.25.4-00 | https://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial/main amd64 Packages
       kubeadm |  1.25.3-00 | https://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial/main amd64 Packages
       kubeadm |  1.25.2-00 | https://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial/main amd64 Packages
    
    $ sudo apt-get install -y kubelet=1.26.0-00 kubeadm=1.26.0-00 kubectl=1.26.0-00
  5. Output shell completion code for the specified shell (bash or zsh). [5]

    # Install the bash-completion framework
    sudo apt-get install -y bash-completion
    
    # Output bash completion
    sudo sh -c 'kubeadm completion bash > /etc/bash_completion.d/kubeadm'
    sudo sh -c 'kubectl completion bash > /etc/bash_completion.d/kubectl'
    sudo sh -c 'crictl completion > /etc/bash_completion.d/crictl'
    
    # Load the completion code for bash into the current shell
    source /etc/bash_completion

Set HTTP proxy for APT:

cat <<EOF | sudo tee /etc/apt/apt.conf.d/httproxy
Acquire::http::Proxy "http://PROXY_HOST:PORT";
EOF

Here is a config /etc/apt/apt.conf.d/10httproxy file:

Acquire::http::Proxy "http://10.20.30.40:1080";
Acquire::http::Proxy {
  # the special keyword DIRECT meaning to use no proxies
  #security.debian.org DIRECT;
  #security-cdn.debian.org DIRECT;
  ftp2.cn.debian.org DIRECT;
  ftp.cn.debian.org DIRECT;
  mirror.lzu.edu.cn DIRECT;
  mirrors.163.com DIRECT;
  mirrors.huaweicloud.com DIRECT;
  mirrors.tuna.tsinghua.edu.cn DIRECT;
  mirrors.ustc.edu.cn DIRECT;

  download.docker.com DIRECT;
  packages.microsoft.com DIRECT;
};

1.4.2. Red Hat-based distributions

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

# Set SELinux in permissive mode (effectively disabling it)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes

sudo systemctl enable --now kubelet

# Install the bash-completion framework
sudo yum install -y bash-completion

# Output bash completion
sudo sh -c 'kubeadm completion bash > /etc/bash_completion.d/kubeadm'
sudo sh -c 'kubectl completion bash > /etc/bash_completion.d/kubectl'
sudo sh -c 'crictl completion > /etc/bash_completion.d/crictl'

# Load the completion code for bash into the current shell
source /usr/share/bash-completion/bash_completion

  • Setting SELinux in permissive mode by running setenforce 0 and the sed command above effectively disables it. This is required to allow containers to access the host filesystem, which is needed by pod networks, for example. You have to do this until SELinux support is improved in the kubelet.

  • You can leave SELinux enabled if you know how to configure it but it may require settings that are not supported by kubeadm.

  • If the baseurl fails because your Red Hat-based distribution cannot interpret basearch, replace \$basearch with your computer’s architecture. Type uname -m to see that value. For example, the baseurl URL for x86_64 could be: https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64

  • You can also replace the kubernetes repository with the USTC (China) mirror. [7]

    1. Update /etc/yum.repos.d/kubernetes.repo:

      [kubernetes]
      name=Kubernetes
      baseurl=https://mirrors.ustc.edu.cn/kubernetes/yum/repos/kubernetes-el7-\$basearch
      enabled=1
      gpgcheck=1
      gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
      exclude=kubelet kubeadm kubectl
    2. You can install and import RPM GPG Key manually: [8]

      $ curl -fsSLo /tmp/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
      $ sudo rpm --import /tmp/kubernetes-archive-keyring.gpg
      
      $ rpm -qa gpg-pubkey
      gpg-pubkey-f4a80eb5-53a7ff4b
      gpg-pubkey-3e1ba8d5-558ab6a8
      
      $ rpm -qi gpg-pubkey-3e1ba8d5-558ab6a8
      Version     : 3e1ba8d5
      Release     : 558ab6a8
      ...
      Packager    : Google Cloud Packages RPM Signing Key <gc-team@google.com>
      ...

      You can also download rpm-package-key.gpg directly from the gpgkey URL above.

  • You can also install a specific package version:

    $ yum --showduplicates --disableexcludes=kubernetes list kubeadm | tail -n 5
    kubeadm.x86_64                       1.25.2-0                        kubernetes
    kubeadm.x86_64                       1.25.3-0                        kubernetes
    kubeadm.x86_64                       1.25.4-0                        kubernetes
    kubeadm.x86_64                       1.25.5-0                        kubernetes
    kubeadm.x86_64                       1.26.0-0                        kubernetes
    
    $ sudo yum --disableexcludes=kubernetes install kubelet-1.26.0-0 kubeadm-1.26.0-0 kubectl-1.26.0-0

Set HTTP proxy for YUM:

echo 'proxy=http://PROXY_HOST:PORT' >> /etc/yum.conf

Here is a complete config /etc/yum.repos.d/kubernetes.repo file:

[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kube*
proxy=http://10.20.30.40:1080/

1.4.3. Set CRICTL endpoint of CRI container runtime service

This example sets the crictl container runtime endpoint to unix:///run/containerd/containerd.sock.

$ sudo crictl config --set runtime-endpoint=unix:///run/containerd/containerd.sock
$ sudo cat /etc/crictl.yaml
runtime-endpoint: "unix:///run/containerd/containerd.sock"
image-endpoint: ""
timeout: 0
debug: false
pull-image-on-create: false
disable-pull-on-run: false

$ sudo crictl info

  "cniconfig": {
    "PluginDirs": [
      "/opt/cni/bin"
    ],
    "PluginConfDir": "/etc/cni/net.d",

  "config": {
    "containerd": {
      "runtimes": {
        "runc": {
          "options": {
            "SystemdCgroup": false

    "cni": {
      "binDir": "/opt/cni/bin",
      "confDir": "/etc/cni/net.d",
    },
    "sandboxImage": "registry.k8s.io/pause:3.6",

    "containerdRootDir": "/var/lib/containerd",
    "containerdEndpoint": "/run/containerd/containerd.sock",
    "rootDir": "/var/lib/containerd/io.containerd.grpc.v1.cri",
    "stateDir": "/run/containerd/io.containerd.grpc.v1.cri"

1.4.4. Configuring a cgroup driver

Both the container runtime and the kubelet have a property called "cgroup driver", which is important for the management of cgroups on Linux machines.

Matching the container runtime and kubelet cgroup drivers is required; otherwise the kubelet process will fail.

To set systemd as the cgroup driver, edit the KubeletConfiguration option of cgroupDriver and set it to systemd. For example: [2][3]

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
...
cgroupDriver: systemd
In v1.22, if the user is not setting the cgroupDriver field under KubeletConfiguration, kubeadm will default it to systemd.
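
After kubeadm init has run, you can check which driver the kubelet ended up with on a node (the file path is the kubeadm default mentioned later in this document):

grep cgroupDriver /var/lib/kubelet/config.yaml
# cgroupDriver: systemd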

2. Creating a cluster with kubeadm

2.1. Preparing the required container images

This step is optional and only applies if you do not want kubeadm init and kubeadm join to download the default container images, which are hosted at registry.k8s.io.

Kubeadm has commands that can help you pre-pull the required images when creating a cluster without an internet connection on its nodes.

You can list and pull the images using the kubeadm config images sub-command:

kubeadm config images list # [--kubernetes-version=v1.26.0] [--image-repository=registry.k8s.io]
kubeadm config images pull # [--kubernetes-version=v1.26.0] [--image-repository=registry.k8s.io]

Kubeadm allows you to use a custom image repository for the required images.

This example uses the custom image repository with registry.cn-hangzhou.aliyuncs.com/google_containers:

sudo kubeadm config images pull \
  --kubernetes-version=v1.26.0 \
  --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers

You can use ctr to retag the images back to the default repository registry.k8s.io:

#!/bin/sh
kubernetes_version=v1.26.0
image_repository=registry.cn-hangzhou.aliyuncs.com/google_containers
images=$(kubeadm config images list \
    --kubernetes-version $kubernetes_version \
    --image-repository $image_repository)

for i in $images; do
    case "$i" in
        *coredns*)
            new_repo="registry.k8s.io/coredns"
            ;;
        *)
            new_repo="registry.k8s.io"
            ;;
    esac
    newtag=$(echo "$i" | sed "s@$image_repository@$new_repo@")
    ctr -n k8s.io images tag $i $newtag
done

You can override this behavior by using kubeadm with a configuration file.

$ kubeadm config print init-defaults
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: node
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.25.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
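
A minimal sketch of such a configuration file, written with a heredoc in the style used above (the file name kubeadm-config.yaml and the field values are illustrative):

cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.26.0
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
networking:
  podSubnet: 10.244.0.0/16
EOF

sudo kubeadm init --config kubeadm-config.yaml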

2.2. Initializing your control-plane node

The control-plane node is the machine where the control plane components run, including etcd (the cluster database) and the API Server (which the kubectl command line tool communicates with). [9]

  1. (Recommended) If you have plans to upgrade this single control-plane kubeadm cluster to high availability you should specify the --control-plane-endpoint to set the shared endpoint for all control-plane nodes. Such an endpoint can be either a DNS name or an IP address of a load-balancer.

  2. Choose a Pod network add-on, and verify whether it requires any arguments to be passed to kubeadm init. Depending on which third-party provider you choose, you might need to set the --pod-network-cidr to a provider-specific value.

  3. (Optional) kubeadm tries to detect the container runtime by using a list of well known endpoints. To use a different container runtime, or if more than one is installed on the provisioned node, specify the --cri-socket argument to kubeadm.

  4. (Optional) Unless otherwise specified, kubeadm uses the network interface associated with the default gateway to set the advertise address for this particular control-plane node’s API server. To use a different network interface, specify the --apiserver-advertise-address=<ip-address> argument to kubeadm init. To deploy an IPv6 Kubernetes cluster using IPv6 addressing, you must specify an IPv6 address, for example --apiserver-advertise-address=2001:db8::101.

While --apiserver-advertise-address can be used to set the advertise address for this particular control-plane node’s API server, --control-plane-endpoint can be used to set the shared endpoint for all control-plane nodes.

--control-plane-endpoint allows both IP addresses and DNS names that can map to IP addresses. Please contact your network administrator to evaluate possible solutions with respect to such mapping.

Here is an example mapping:

192.168.0.102 cluster-endpoint

Where 192.168.0.102 is the IP address of this node and cluster-endpoint is a custom DNS name that maps to this IP. This will allow you to pass --control-plane-endpoint=cluster-endpoint to kubeadm init and pass the same DNS name to kubeadm join. Later you can modify cluster-endpoint to point to the address of your load-balancer in a high-availability scenario.

Turning a single control plane cluster created without --control-plane-endpoint into a highly available cluster is not supported by kubeadm.

sudo kubeadm init \
    --kubernetes-version=v1.26.0 \
    --pod-network-cidr=10.244.0.0/16 \
    --apiserver-advertise-address=192.168.0.100 \
    --control-plane-endpoint=cluster-endpoint \
    --ignore-preflight-errors=NumCPU,Mem \
    --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers \
    --dry-run
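
After a successful (non dry-run) init, set up kubeconfig access for your regular user, as printed in the kubeadm init output:

mkdir -p "$HOME/.kube"
sudo cp -i /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"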

2.3. The kubelet drop-in file for systemd

kubeadm ships with configuration for how systemd should run the kubelet. Note that the kubeadm CLI command never touches this drop-in file. [14]

This configuration file installed by the kubeadm DEB or RPM package is written to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and is used by systemd. It augments the basic kubelet.service for RPM or kubelet.service for DEB:

Note: The contents below are just an example.
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generate at runtime, populating
# the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably,
# the user should use the .NodeRegistration.KubeletExtraArgs object in the configuration files instead.
# KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

This file specifies the default locations for all of the files managed by kubeadm for the kubelet.

  • The KubeConfig file to use for the TLS Bootstrap is /etc/kubernetes/bootstrap-kubelet.conf, but it is only used if /etc/kubernetes/kubelet.conf does not exist.

  • The KubeConfig file with the unique kubelet identity is /etc/kubernetes/kubelet.conf.

  • The file containing the kubelet’s ComponentConfig is /var/lib/kubelet/config.yaml.

  • The dynamic environment file that contains KUBELET_KUBEADM_ARGS is sourced from /var/lib/kubelet/kubeadm-flags.env.

  • The file that can contain user-specified flag overrides with KUBELET_EXTRA_ARGS is sourced from /etc/default/kubelet (for DEBs), or /etc/sysconfig/kubelet (for RPMs). KUBELET_EXTRA_ARGS is last in the flag chain and has the highest priority in the event of conflicting settings.
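
For example, a hedged sketch of a last-resort override on a DEB-based host (the --node-ip value is illustrative, and the heredoc overwrites any existing /etc/default/kubelet):

cat <<EOF | sudo tee /etc/default/kubelet
KUBELET_EXTRA_ARGS=--node-ip=192.168.0.100
EOF

sudo systemctl restart kubelet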

2.4. Configurations for local ephemeral storage

Nodes have local ephemeral storage, backed by locally-attached writeable devices or, sometimes, by RAM. "Ephemeral" means that there is no long-term guarantee about durability. [13] [15]

Pods use ephemeral local storage for scratch space, caching, and for logs. The kubelet can provide scratch space to Pods using local ephemeral storage to mount emptyDir volumes into containers.

The kubelet also uses this kind of storage to hold node-level container logs, container images, and the writable layers of running containers.

Kubernetes supports two ways to configure local ephemeral storage on a node:

  • Single filesystem

    In this configuration, you place all different kinds of ephemeral local data (emptyDir volumes, writeable layers, container images, logs) into one filesystem. The most effective way to configure the kubelet means dedicating this filesystem to Kubernetes (kubelet) data.

    The kubelet also writes node-level container logs and treats these similarly to ephemeral local storage.

    The kubelet writes logs to files inside its configured log directory (/var/log by default); and has a base directory for other locally stored data (/var/lib/kubelet by default).

    Typically, both /var/lib/kubelet and /var/log are on the system root filesystem, and the kubelet is designed with that layout in mind.

    Your node can have as many other filesystems, not used for Kubernetes, as you like.

  • Two filesystems

    You have a filesystem on the node that you’re using for ephemeral data that comes from running Pods: logs, and emptyDir volumes. You can use this filesystem for other data (for example: system logs not related to Kubernetes); it can even be the root filesystem.

    The kubelet also writes node-level container logs into the first filesystem, and treats these similarly to ephemeral local storage.

    You also use a separate filesystem, backed by a different logical storage device. In this configuration, the directory where you tell the kubelet to place container image layers and writeable layers is on this second filesystem.

    The first filesystem does not hold any image layers or writeable layers.

    Your node can have as many other filesystems, not used for Kubernetes, as you like.

The kubelet can measure how much local storage it is using. It does this provided that you have set up the node using one of the supported configurations for local ephemeral storage.

If you have a different configuration, then the kubelet does not apply resource limits for ephemeral local storage.

Note: The kubelet tracks tmpfs emptyDir volumes as container memory use, rather than as local ephemeral storage.
Note: The kubelet will only track the root filesystem for ephemeral storage. OS layouts that mount a separate disk to /var/lib/kubelet or /var/lib/containers will not report ephemeral storage correctly.

2.5. Installing a Pod network add-on

You must deploy a Container Network Interface (CNI) based Pod network add-on so that your Pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed.

  • Take care that your Pod network must not overlap with any of the host networks: you are likely to see problems if there is any overlap. (If you find a collision between your network plugin’s preferred Pod network and some of your host networks, you should think of a suitable CIDR block to use instead, then use that during kubeadm init with --pod-network-cidr and as a replacement in your network plugin’s YAML).

  • By default, kubeadm sets up your cluster to use and enforce use of RBAC (role based access control). Make sure that your Pod network plugin supports RBAC, and so do any manifests that you use to deploy it.

  • If you want to use IPv6 (either dual-stack, or single-stack IPv6-only networking) for your cluster, make sure that your Pod network plugin supports IPv6. IPv6 support was added to CNI in v0.6.0.

Several external projects provide Kubernetes Pod networks using CNI, some of which also support Network Policy.

See a list of add-ons that implement the Kubernetes networking model.

You can install a Pod network add-on with the following command on the control-plane node or a node that has the kubeconfig credentials:

$ kubectl apply -f <add-on.yaml>

You can install only one Pod network per cluster.

Once a Pod network has been installed, you can confirm that it is working by checking that the CoreDNS Pod is Running in the output of kubectl get pods --all-namespaces. And once the CoreDNS Pod is up and running, you can continue by joining your nodes.

Deploying flannel manually [10]

Flannel can be added to any existing Kubernetes cluster though it’s simplest to add flannel before any pods using the pod network have been started.

For Kubernetes v1.17+

$ kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/v0.20.2/Documentation/kube-flannel.yml

If you use a custom podCIDR (not 10.244.0.0/16), you first need to download the above manifest and modify the network to match yours.
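
A hedged sketch of that workflow (the replacement CIDR 10.10.0.0/16 is illustrative and must match the --pod-network-cidr you passed to kubeadm init):

curl -fsSLo kube-flannel.yml https://raw.githubusercontent.com/flannel-io/flannel/v0.20.2/Documentation/kube-flannel.yml
# Change the default pod network in the manifest before applying it
sed -i 's@10.244.0.0/16@10.10.0.0/16@' kube-flannel.yml
kubectl apply -f kube-flannel.yml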

2.6. Control plane node isolation

By default, your cluster will not schedule Pods on the control plane nodes for security reasons. If you want to be able to schedule Pods on the control plane nodes, for example for a single machine Kubernetes cluster, run:

$ kubectl taint nodes --all node-role.kubernetes.io/control-plane-

The output will look something like:

node/node-1.localdomain untainted
...

This will remove the node-role.kubernetes.io/control-plane:NoSchedule taint from any nodes that have it, including the control plane nodes, meaning that the scheduler will then be able to schedule Pods everywhere.
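
If you later want to restore the default behaviour, re-apply the taint to the control-plane node (replace <node name> accordingly):

$ kubectl taint nodes <node name> node-role.kubernetes.io/control-plane:NoSchedule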

2.7. Joining your nodes

The nodes are where your workloads (containers and Pods, etc) run. To add new nodes to your cluster do the following for each machine:

  • SSH to the machine

  • Become root (e.g. sudo su -)

  • Install a runtime if needed

  • Run the command that was output by kubeadm init. For example:

    You can now join any number of control-plane nodes by copying certificate authorities
    and service account keys on each node and then running the following as root:
    
      kubeadm join cluster-endpoint:6443 --token r8zo5w.rfrg93x0luuo01cy \
    	--discovery-token-ca-cert-hash sha256:ffede9eb2a183a66e3ba5dd313abe9423e36ee57ac3d6b75e7d693c3df3f23f1 \
    	--control-plane
    
    Then you can join any number of worker nodes by running the following on each as root:
    
    kubeadm join cluster-endpoint:6443 --token r8zo5w.rfrg93x0luuo01cy \
    	--discovery-token-ca-cert-hash sha256:ffede9eb2a183a66e3ba5dd313abe9423e36ee57ac3d6b75e7d693c3df3f23f1

If you do not have the token, you can get it by running the following command on the control-plane node:

$ kubeadm token list
TOKEN                     TTL         EXPIRES                USAGES                   DESCRIPTION                                                EXTRA GROUPS
r8zo5w.rfrg93x0luuo01cy   23h         2022-12-27T06:07:36Z   authentication,signing   The default bootstrap token generated by 'kubeadm init'.   system:bootstrappers:kubeadm:default-node-token

By default, tokens expire after 24 hours. If you are joining a node to the cluster after the current token has expired, you can create a new token by running the following command on the control-plane node:

$ kubeadm token create
jlur5d.5qzgyjl28ssfj3za

If you don’t have the value of --discovery-token-ca-cert-hash, you can get it by running the following command chain on the control-plane node:

$ openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | \
   openssl dgst -sha256 -hex | sed 's/^.* //'
ffede9eb2a183a66e3ba5dd313abe9423e36ee57ac3d6b75e7d693c3df3f23f1

You can also run the following command to create a token and print the join command:

$ kubeadm token create --print-join-command
kubeadm join cluster-endpoint:6443 --token 2ihyt2.g933wbzyatjdw56i --discovery-token-ca-cert-hash sha256:ffede9eb2a183a66e3ba5dd313abe9423e36ee57ac3d6b75e7d693c3df3f23f1

As the cluster nodes are usually initialized sequentially, the CoreDNS Pods are likely to all run on the first control-plane node.

To provide higher availability, please rebalance the CoreDNS Pods with kubectl -n kube-system rollout restart deployment coredns after at least one new node is joined.
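
On the control-plane node, you can confirm that the joined nodes have registered and become Ready:

$ kubectl get nodes -o wide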

2.8. Remove the node

Talking to the control-plane node with the appropriate credentials, run:

$ kubectl drain <node name> --delete-emptydir-data --force --ignore-daemonsets

Before removing the node, reset the state installed by kubeadm:

$ kubeadm reset

The reset process does not reset or clean up iptables rules or IPVS tables. If you wish to reset iptables, you must do so manually:

$ iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If you want to reset the IPVS tables, you must run the following command:

$ ipvsadm -C

Now remove the node:

$ kubectl delete node <node name>

3. Installing Addons

3.1. Metrics Server

Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines. [11]

Metrics Server collects resource metrics from Kubelets and exposes them in Kubernetes apiserver through Metrics API for use by Horizontal Pod Autoscaler and Vertical Pod Autoscaler.

Metrics API can also be accessed by kubectl top, making it easier to debug autoscaling pipelines.

Installation instructions can be found in Metrics Server releases.

You can also consider updating the image as follows:

# kustomization.yaml
resources:
  - ../base
patchesStrategicMerge:
  - metrics-server-deployment.yaml
images:
  - name: k8s.gcr.io/metrics-server/metrics-server
    newName: registry.aliyuncs.com/google_containers/metrics-server
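
A hedged example of installing from the upstream manifest and verifying that the Metrics API responds (the release URL follows the pattern documented by the Metrics Server project; lab clusters with self-signed kubelet certificates may additionally need the --kubelet-insecure-tls flag on the metrics-server container):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Once the metrics-server Deployment is ready:
kubectl top nodes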

3.2. Ingress Controllers

In order for the Ingress resource to work, the cluster must have an ingress controller running. [12]

Kubernetes as a project supports and maintains AWS, GCE, and nginx ingress controllers.

You can also consider updating the ingress-nginx images as follows:

images:
  - name: registry.k8s.io/ingress-nginx/controller
    newName: registry.aliyuncs.com/google_containers/nginx-ingress-controller
  - name: registry.k8s.io/ingress-nginx/kube-webhook-certgen
    newName: registry.aliyuncs.com/google_containers/kube-webhook-certgen
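
A hedged example of deploying the maintained ingress-nginx controller from its published manifest (the controller version tag and provider path are illustrative; check the ingress-nginx documentation for the manifest matching your cluster version):

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.5.1/deploy/static/provider/cloud/deploy.yaml
# Verify the controller Pods come up
kubectl get pods -n ingress-nginx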