Helm is a package manager for Kubernetes that uses a packaging format called charts, and is implemented in two distinct parts: the Helm Client and the Helm Library. [1]
A chart is a bundle of information needed to create an instance of a Kubernetes application, and a repository is a place where charts can be collected and shared.
A release is a running instance of a chart combined with a specific config, where a config contains configuration information that can be merged into a packaged chart to create a releasable object.
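Concretely, a chart is a directory of files whose identity is described by its Chart.yaml. A minimal sketch (the chart name my-app is a hypothetical placeholder):
apiVersion: v2
name: my-app                 # hypothetical chart name
description: A minimal example chart
type: application
version: 0.1.0               # chart version (SemVer 2)
appVersion: "1.0.0"          # version of the application being packaged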
Helm can be installed either from the official binary releases or through community package managers. [2]
Every release of Helm provides binary releases for a variety of OSes. These binary versions can be manually downloaded and installed.
Download a desired version that is supported by your Kubernetes cluster (see the Helm version support policy).
Unpack it (tar -zxvf helm-v3.0.0-linux-amd64.tar.gz).
Find the helm binary in the unpacked directory, and move it to its desired destination (mv linux-amd64/helm /usr/local/bin/helm).
For example, to install Helm 3.14.x, which supports Kubernetes 1.26.x through 1.29.x:
$ curl -sLO https://get.helm.sh/helm-v3.14.2-linux-amd64.tar.gz
$ tar xf helm-v3.14.2-linux-amd64.tar.gz
$ sudo cp linux-amd64/helm /usr/local/bin/
The following commands install Helm from Apt (Debian/Ubuntu):
curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
sudo apt-get install apt-transport-https --yes
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm
It’s recommended to run the following command to generate the completion scripts for the specified shell (e.g., Bash):
helm completion bash | sudo tee /etc/bash_completion.d/helm > /dev/null
source /etc/bash_completion # reload the completion scripts
Once Helm is ready, a chart repository can be added using the helm repo add command. Check Artifact Hub for available Helm chart repositories. [3]
For example, to add the bitnami repo:
helm repo add bitnami https://charts.bitnami.com/bitnami
Once the repository is added, you can list the charts it provides:
$ helm search repo bitnami
NAME CHART VERSION APP VERSION DESCRIPTION
bitnami/airflow 17.2.1 2.8.2 Apache Airflow is a tool to express and execute...
bitnami/apache 10.9.1 2.4.58 Apache HTTP Server is an open-source HTTP serve...
bitnami/apisix 2.10.0 3.8.0 Apache APISIX is high-performance, real-time AP...
...
To install a chart, you can run the helm install command. Helm has several ways to find and install a chart, but the easiest is to use the bitnami charts. [3]
Make sure you have the latest list of charts:
$ helm repo update bitnami
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
Install the bitnami/mysql chart.
$ helm install bitnami/mysql --generate-name
NAME: mysql-1709977095
LAST DEPLOYED: Sat Mar 9 17:38:19 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
...
List the deployed releases.
$ helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
mysql-1709977095 default 1 2024-03-09 17:38:19.258433628 +0800 CST deployed mysql-9.23.0 8.0.36
Uninstall a release.
$ helm uninstall mysql-1709977095
release "mysql-1709977095" uninstalled
Docker uses storage drivers to store image layers, and to store data in the writable layer of a container. [1]
Storage drivers are optimized for space efficiency, but (depending on the storage driver) write speeds are lower than native file system performance, especially for storage drivers that use a copy-on-write filesystem.
Use Docker volumes for write-intensive data, data that must persist beyond the container’s lifespan, and data that must be shared between containers.
A Docker image is built up from a series of layers. Each layer represents an instruction in the image’s Dockerfile. Each layer except the very last one is read-only. Consider the following Dockerfile:
# syntax=docker/dockerfile:1
FROM ubuntu:22.04
LABEL org.opencontainers.image.authors="org@example.com"
COPY . /app
RUN make /app
RUN rm -r $HOME/.cache
CMD python /app/app.py
This Dockerfile contains six instructions. Commands that modify the filesystem (here FROM, COPY, and the two RUNs) create a layer.
The FROM statement starts out by creating a layer from the ubuntu:22.04 image.
The LABEL command only modifies the image’s metadata, and doesn’t produce a new layer.
The COPY command adds some files from your Docker client’s current directory.
The first RUN command builds your application using the make command, and writes the result to a new layer.
The second RUN command removes a cache directory, and writes the result to a new layer.
Finally, the CMD instruction specifies what command to run within the container; it only modifies the image’s metadata and doesn’t produce an image layer.
When a new container is created, a new writable layer is added on top of the underlying layers, which is often called the container layer.
A storage driver handles the details about the way these layers interact with each other.
To see what storage driver Docker is currently using, use docker info and look for the Storage Driver line:
$ docker info 2> /dev/null | grep 'Storage Driver' -A 5
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
$ df -T /var/lib/docker
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/sda1 ext4 102624184 57865288 39499736 60% /
containerd, the industry-standard container runtime, uses snapshotters instead of the classic storage drivers for storing image and container data.
Docker has two options for containers to store files on the host machine, so that the files are persisted even after the container stops: volumes, and bind mounts. [3]
Volumes are stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem. Volumes are the best way to persist data in Docker.
Bind mounts may be stored anywhere on the host system. They may even be important system files or directories. Non-Docker processes on the Docker host or a Docker container can modify them at any time.
tmpfs mounts are stored in the host system’s memory only, and are never written to the host system’s filesystem.
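As a hedged sketch, the three options map onto a Compose file like this (the service and volume names are hypothetical):
version: "2.4"
services:
  app:
    image: nginx:alpine
    volumes:
      - app-data:/var/lib/app          # named volume, managed by Docker
      - ./config:/etc/app/config:ro    # bind mount from the host
    tmpfs:
      - /tmp/scratch                   # in-memory only, never persisted
volumes:
  app-data: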
Kubernetes supports many types of volumes. Ephemeral volume types have a lifetime of a pod, but persistent volumes exist beyond the lifetime of a pod. [4]
To use a volume, specify the volumes to provide for the Pod in .spec.volumes and declare where to mount those volumes into containers in .spec.containers[*].volumeMounts.
A process in a container sees a filesystem view composed from the initial contents of the container image, plus volumes (if defined) mounted inside the container.
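For instance, a minimal sketch of a Pod that declares an emptyDir volume (described below) in .spec.volumes and mounts it through .spec.containers[*].volumeMounts; all names are hypothetical:
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo
spec:
  containers:
  - name: app
    image: busybox:stable
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch      # where the volume appears inside the container
  volumes:
  - name: scratch
    emptyDir: {}               # created when the Pod is assigned to a node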
Kubernetes supports several types of volumes.
configMap
A ConfigMap provides a way to inject configuration data into pods. The data stored in a ConfigMap can be referenced in a volume of type configMap and then consumed by containerized applications running in a pod (see the sketch after this list).
downwardAPI
A downwardAPI volume makes downward API data available to applications. Within the volume, you can find the exposed data as read-only files in plain text format.
emptyDir
For a Pod that defines an emptyDir volume, the volume is created when the Pod is assigned to a node. As the name says, the emptyDir volume is initially empty.
All containers in the Pod can read and write the same files in the emptyDir volume, though that volume can be mounted at the same or different paths in each container.
When a Pod is removed from a node for any reason, the data in the emptyDir is deleted permanently.
The emptyDir.medium field controls where emptyDir volumes are stored. By default emptyDir volumes are stored on whatever medium backs the node, such as disk, SSD, or network storage, determined by the medium of the filesystem holding the kubelet root dir (typically /var/lib/kubelet).
If you set the emptyDir.medium field to "Memory", Kubernetes mounts a tmpfs (RAM-backed filesystem) for you instead.
While tmpfs is very fast, be aware that, unlike disks, files you write count against the memory limit of the container that wrote them.
hostPath
A hostPath volume mounts a file or directory from the host node’s filesystem into your Pod. This is not something that most Pods will need, but it offers a powerful escape hatch for some applications.
local
A local volume represents a mounted local storage device such as a disk, partition or directory.
Local volumes can only be used as a statically created PersistentVolume. When using local volumes, it is recommended to create a StorageClass with volumeBindingMode set to WaitForFirstConsumer.
nfs
An nfs volume allows an existing NFS (Network File System) share to be mounted into a Pod. NFS can be mounted by multiple writers simultaneously.
persistentVolumeClaim
A persistentVolumeClaim volume is used to mount a PersistentVolume into a Pod. PersistentVolumeClaims are a way for users to "claim" durable storage (such as an iSCSI volume) without knowing the details of the particular cloud environment.
projected
A projected volume maps several existing volume sources into the same directory.
secret
A secret volume is used to pass sensitive information, such as passwords, to Pods. secret volumes are backed by tmpfs (a RAM-backed filesystem), so their contents are never written to non-volatile storage.
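As referenced under configMap above, a hedged sketch of consuming a ConfigMap through a volume (the ConfigMap app-config is assumed to already exist in the same namespace):
apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo
spec:
  containers:
  - name: app
    image: busybox:stable
    command: ["sh", "-c", "ls /etc/config && sleep 3600"]
    volumeMounts:
    - name: config
      mountPath: /etc/config     # each ConfigMap key becomes a file here
  volumes:
  - name: config
    configMap:
      name: app-config           # hypothetical ConfigMap name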
Container Storage Interface (CSI) defines a standard interface for container orchestration systems (like Kubernetes) to expose arbitrary storage systems to their container workloads.
Once a CSI compatible volume driver is deployed on a Kubernetes cluster, users may use the csi volume type to attach or mount the volumes exposed by the CSI driver.
A csi volume can be used in a Pod in three different ways:
through a reference to a PersistentVolumeClaim
with a generic ephemeral volume
with a CSI ephemeral volume if the driver supports that
The following fields are available to storage administrators to configure a CSI persistent volume:
driver: A string value that specifies the name of the volume driver to use.
volumeHandle: A string value that uniquely identifies the volume.
readOnly: An optional boolean value indicating whether the volume is to be "ControllerPublished" (attached) as read only. Default is false.
fsType: If the PV’s VolumeMode is Filesystem, this field may be used to specify the filesystem that should be used to mount the volume. If the volume has not been formatted and formatting is supported, this value will be used to format the volume.
volumeAttributes: A map of string to string that specifies static properties of a volume.
controllerPublishSecretRef: A reference to the secret object containing sensitive information to pass to the CSI driver to complete the CSI ControllerPublishVolume and ControllerUnpublishVolume calls.
nodeExpandSecretRef: A reference to the secret containing sensitive information to pass to the CSI driver to complete the CSI NodeExpandVolume call.
nodePublishSecretRef: A reference to the secret object containing sensitive information to pass to the CSI driver to complete the CSI NodePublishVolume call.
nodeStageSecretRef: A reference to the secret object containing sensitive information to pass to the CSI driver to complete the CSI NodeStageVolume call.
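Putting a few of these fields together, a hedged sketch of a CSI PersistentVolume (the driver name csi.example.com and the volume handle are hypothetical):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: csi-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  csi:
    driver: csi.example.com        # hypothetical CSI driver name
    volumeHandle: vol-0123456789   # unique ID understood by the driver
    fsType: ext4
    volumeAttributes:
      tier: standard               # driver-specific static property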
Mount propagation [4] allows for sharing volumes mounted by a container to other containers in the same pod, or even to other pods on the same node, which is controlled by the mountPropagation field in containers[*].volumeMounts.
None - This volume mount will not receive any subsequent mounts that are mounted to this volume or any of its subdirectories by the host. In similar fashion, no mounts created by the container will be visible on the host. This is the default mode.
HostToContainer - This volume mount will receive all subsequent mounts that are mounted to this volume or any of its subdirectories.
Bidirectional - This volume mount behaves the same as the HostToContainer mount. In addition, all volume mounts created by the container will be propagated back to the host and to all containers of all pods that use the same volume.
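A minimal sketch of where the field sits in a Pod spec (names are hypothetical; HostToContainer lets the container see mounts the host adds later):
apiVersion: v1
kind: Pod
metadata:
  name: propagation-demo
spec:
  containers:
  - name: agent
    image: busybox:stable
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: host-mnt
      mountPath: /mnt
      mountPropagation: HostToContainer   # receive mounts made on the host after startup
  volumes:
  - name: host-mnt
    hostPath:
      path: /mnt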
Managing storage is a distinct problem from managing compute instances. The PersistentVolume subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes.
It is a resource in the cluster just like a node is a cluster resource, that captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod.
Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory).
Claims can request specific size and access modes (e.g., ReadWriteOnce, ReadOnlyMany, ReadWriteMany, or ReadWriteOncePod).
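For example, a minimal claim requesting a size and an access mode might look like this (the class name standard is a hypothetical assumption):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
  - ReadWriteOnce              # requested access mode
  resources:
    requests:
      storage: 8Gi             # requested size
  storageClassName: standard   # hypothetical class name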
While PersistentVolumeClaims allow a user to consume abstract storage resources, it is common that users need PersistentVolumes with varying properties, such as performance, for different problems.
A StorageClass provides a way for administrators to describe the classes of storage they offer. Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators. [5]
Each StorageClass contains the fields provisioner, parameters, and reclaimPolicy, which are used when a PersistentVolume belonging to the class needs to be dynamically provisioned to satisfy a PersistentVolumeClaim (PVC).
The name of a StorageClass object is significant, and is how users can request a particular class. Administrators set the name and other parameters of a class when first creating StorageClass objects.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
PVs are resources in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle: [6]
There are two ways PVs may be provisioned: statically or dynamically.
Static
A cluster administrator creates a number of PVs. They carry the details of the real storage, which is available for use by cluster users. They exist in the Kubernetes API and are available for consumption.
Dynamic
When none of the static PVs the administrator created match a user’s PersistentVolumeClaim, the cluster may try to dynamically provision a volume specially for the PVC based on StorageClasses.
A control loop in the control plane watches for new PVCs, finds a matching PV (if possible), and binds them together.
If a PV was dynamically provisioned for a new PVC, the loop will always bind that PV to the PVC.
Otherwise, the user will always get at least what they asked for, but the volume may be in excess of what was requested.
The volumeBindingMode field of a StorageClass controls when volume binding and dynamic provisioning should occur; when unset, Immediate mode is used by default. [5]
The Immediate mode indicates that volume binding and dynamic provisioning occur once the PersistentVolumeClaim is created.
For storage backends that are topology-constrained and not globally accessible from all Nodes in the cluster, PersistentVolumes will be bound or provisioned without knowledge of the Pod’s scheduling requirements. This may result in unschedulable Pods.
A cluster administrator can address this issue by specifying the WaitForFirstConsumer mode, which will delay the binding and provisioning of a PersistentVolume until a Pod using the PersistentVolumeClaim is created.
PersistentVolumes will be selected or provisioned conforming to the topology that is specified by the Pod’s scheduling constraints.
Pods use claims as volumes.
The cluster inspects the claim to find the bound volume and mounts that volume for a Pod.
For volumes that support multiple access modes, the user specifies which mode is desired when using their claim as a volume in a Pod.
If a user deletes a PVC in active use by a Pod, the PVC is not removed immediately. PVC removal is postponed until the PVC is no longer actively used by any Pods. Also, if an admin deletes a PV that is bound to a PVC, the PV is not removed immediately. PV removal is postponed until the PV is no longer bound to a PVC.
The reclaim policy for a PersistentVolume tells the cluster what to do with it after it has been released of its claim, which can either be Retained or Deleted.
FEATURE STATE: Kubernetes v1.23 [alpha]
Finalizers can be added on a PersistentVolume to ensure that PersistentVolumes having a Delete reclaim policy are deleted only after the backing storage is deleted.
The newly introduced finalizers kubernetes.io/pv-controller and external-provisioner.volume.kubernetes.io/finalizer are only added to dynamically provisioned volumes.
The finalizer kubernetes.io/pv-controller is added to in-tree plugin volumes.
The finalizer external-provisioner.volume.kubernetes.io/finalizer is added for CSI volumes.
If you want a PVC to bind to a specific PV, you need to pre-bind them.
By specifying a PersistentVolume in a PersistentVolumeClaim, you declare a binding between that specific PV and PVC.
If the PersistentVolume exists and has not reserved PersistentVolumeClaims through its claimRef field, then the PersistentVolume and PersistentVolumeClaim will be bound.
The binding happens regardless of some volume matching criteria, including node affinity.
The control plane still checks that storage class, access modes, and requested storage size are valid.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: foo-pvc
  namespace: foo
spec:
  # Empty string must be explicitly set otherwise default StorageClass will be set.
  storageClassName: ""
  volumeName: foo-pv
  ...
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: foo-pv
spec:
  storageClassName: ""
  claimRef:
    name: foo-pvc
    namespace: foo
  ...
FEATURE STATE: Kubernetes v1.24 [stable]
To request a larger volume for a PVC, edit the PVC object and specify a larger size. This triggers expansion of the volume that backs the underlying PersistentVolume. A new PersistentVolume is never created to satisfy the claim. Instead, an existing volume is resized.
You can only expand a PVC if its storage class’s allowVolumeExpansion field is set to true.
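A hedged sketch of such a class (the disk.csi.azure.com provisioner from the AKS section below is used for illustration; any resize-capable driver works):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable
provisioner: disk.csi.azure.com   # any provisioner whose driver supports resize
allowVolumeExpansion: true        # required for PVC expansion
reclaimPolicy: Delete
With that in place, editing the PVC’s spec.resources.requests.storage to a larger value triggers the resize.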
Pods access storage by using the claim as a volume.
Claims must exist in the same namespace as the Pod using the claim.
The cluster finds the claim in the Pod’s namespace and uses it to get the PersistentVolume backing the claim.
The volume is then mounted to the host and into the Pod.
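A minimal sketch of a Pod mounting a claim as a volume (reusing the hypothetical claim name data-pvc from the sketch above):
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo
spec:
  containers:
  - name: app
    image: busybox:stable
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data            # the bound PV is mounted here
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc         # must be in the Pod's namespace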
FEATURE STATE: Kubernetes v1.18 [stable]
The following volume plugins support raw block volumes, including dynamic provisioning where applicable:
CSI
FC (Fibre Channel)
iSCSI
Local volume
OpenStack Cinder
RBD (Ceph Block Device; deprecated)
VsphereVolume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: block-pv
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 5Gi
  local:
    path: /dev/sdb
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node.local.io/block-storage
          operator: In
          values:
          - local
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  volumeMode: Block
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    limits:
      storage: 5Gi
    requests:
      storage: 5Gi
  storageClassName: local-storage
  volumeMode: Block
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-block-volume
spec:
  containers:
  - name: busybox
    image: busybox:stable
    command: ["/bin/sh", "-c"]
    args: ["tail -f /dev/null"]
    volumeDevices:
    - name: data
      devicePath: /dev/xvda
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: block-pvc
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 10G 0 loop
sda 8:0 0 100G 0 disk
└─sda1 8:1 0 100G 0 part /
sdb 8:16 0 10G 0 disk
$ kubectl get storageclasses.storage.k8s.io local-storage
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local-storage kubernetes.io/no-provisioner Delete WaitForFirstConsumer false 3d11h
The Container Storage Interface (CSI) is a standard for exposing arbitrary block and file storage systems to containerized workloads on Kubernetes.
By adopting and using CSI, Azure Kubernetes Service (AKS) can write, deploy, and iterate plug-ins to expose new or improve existing storage systems in Kubernetes without having to touch the core Kubernetes code and wait for its release cycles. [6]
A PersistentVolumeClaim requests storage of a particular StorageClass, access mode, and size. The Kubernetes API server can dynamically provision the underlying Azure storage resource if no existing resource can fulfill the claim based on the defined StorageClass.
The CSI storage driver support on AKS allows you to natively use:
Azure Disks can be used to create a Kubernetes DataDisk resource.
Disks can use Azure Premium Storage, backed by high-performance SSDs, or Azure Standard Storage, backed by regular HDDs or Standard SSDs. For most production and development workloads, use Premium Storage.
Azure Disks are mounted as ReadWriteOnce and are only available to one node in AKS. For storage volumes that can be accessed by multiple nodes simultaneously, use Azure Files.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azuredisk-csi-waitforfirstconsumer
provisioner: disk.csi.azure.com
parameters:
  skuname: StandardSSD_LRS
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
Azure Files can be used to mount an SMB 3.0/3.1 share backed by an Azure storage account to pods.
With Azure Files, you can share data across multiple nodes and pods.
Azure Files can use Azure Standard storage backed by regular HDDs or Azure Premium storage backed by high-performance SSDs.
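For illustration, a hedged sketch of a StorageClass for Azure Files using the file.csi.azure.com provisioner (the class name and SKU choice are assumptions):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-csi
provisioner: file.csi.azure.com
parameters:
  skuName: Standard_LRS     # or Premium_LRS for SSD-backed shares
reclaimPolicy: Delete
allowVolumeExpansion: true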
Azure Blob storage can be used to mount Blob storage (or object storage) as a file system into a container or pod.
Using Blob storage enables your cluster to support applications that work with large unstructured datasets like log file data, images or documents, HPC, and others.
Additionally, if you ingest data into Azure Data Lake storage, you can directly mount and use it in AKS without configuring another interim filesystem.
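Similarly, a hedged sketch of a Blob storage class using the blob.csi.azure.com provisioner (the class name, SKU, and protocol choice are assumptions):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azureblob-fuse
provisioner: blob.csi.azure.com
parameters:
  skuName: Standard_LRS
  protocol: fuse            # mount via blobfuse; an NFS protocol option also exists
reclaimPolicy: Delete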
Istio addresses the challenges developers and operators face with a distributed or microservices architecture. [1]
Modern applications are typically architected as distributed collections of microservices, with each collection of microservices performing some discrete business function. [1]
A service mesh is a dedicated infrastructure layer that you can add to your applications. It allows you to transparently add capabilities like observability, traffic management, and security, without adding them to your own code.
The term “service mesh” describes both the type of software you use to implement this pattern, and the security or network domain that is created when you use that software.
Istio is an open source service mesh that layers transparently onto existing distributed applications.
Istio’s powerful features provide a uniform and more efficient way to secure, connect, and monitor services. Istio is the path to load balancing, service-to-service authentication, and monitoring – with few or no service code changes. Its powerful control plane brings vital features, including:
Secure service-to-service communication in a cluster with TLS encryption, strong identity-based authentication and authorization
Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic
Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault injection
A pluggable policy layer and configuration API supporting access controls, rate limits and quotas
Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress
Istio’s control plane runs on Kubernetes, and you can add applications deployed in that cluster to your mesh, extend the mesh to other clusters, or even connect VMs or other endpoints running outside of Kubernetes.
An Istio service mesh is logically split into a data plane and a control plane. [2]
The data plane is composed of a set of intelligent proxies (Envoy) deployed as sidecars. These proxies mediate and control all network communication between microservices. They also collect and report telemetry on all mesh traffic.
The control plane manages and configures the proxies to route traffic.
Envoy
Istio uses an extended version of the Envoy proxy. Envoy is a high-performance proxy developed in C++ to mediate all inbound and outbound traffic for all services in the service mesh. Envoy proxies are the only Istio components that interact with data plane traffic.
Istiod
Istiod provides service discovery, configuration and certificate management.
Go to the Istio release page to download the installation file for your OS, or download and extract the latest release automatically (Linux or macOS): [3]
curl -L https://istio.io/downloadIstio | sh -
Move to the Istio package directory. For example, if the package is istio-1.20.3:
cd istio-1.20.3
The installation directory contains:
Sample applications in samples/
The istioctl client binary in the bin/ directory.
Add the istioctl client to your path (Linux or macOS):
export PATH=$PWD/bin:$PATH
For this installation, we use the demo configuration profile. It’s selected to have a good set of defaults for testing, but there are other profiles for production or performance testing.
$ istioctl install --set profile=demo -y
✔ Istio core installed
✔ Istiod installed
✔ Egress gateways installed
✔ Ingress gateways installed
✔ Installation complete
Made this installation the default for injection and validation.
Add a namespace label to instruct Istio to automatically inject Envoy sidecar proxies when you deploy your application later:
$ kubectl label namespace default istio-injection=enabled
namespace/default labeled
Uninstall
The Istio uninstall deletes the RBAC permissions and all resources hierarchically under the istio-system namespace. It is safe to ignore errors for non-existent resources because they may have been deleted hierarchically.
istioctl uninstall -y --purge
The istio-system namespace is not removed by default. If no longer needed, use the following command to remove it:
kubectl delete namespace istio-system
The label to instruct Istio to automatically inject Envoy sidecar proxies is not removed by default. If no longer needed, use the following command to remove it:
kubectl label namespace default istio-injection-
The controller is a central, coordinating process which stores configuration, loads plugins, and renders the various user interfaces for Jenkins. An agent is typically a machine, or container, which connects to a Jenkins controller and executes tasks when directed by the controller. A node is a machine which is part of the Jenkins environment and capable of executing Pipelines or jobs; both the controller and agents are considered to be nodes. An executor is a slot for execution of work defined by a Pipeline or job on a node; a node may have zero or more executors configured, which corresponds to how many concurrent jobs or Pipelines are able to execute on that node. A workspace is a disposable directory on the file system of a node where work can be done by a Pipeline or job. Workspaces are typically left in place after a build or Pipeline run completes, unless specific workspace cleanup policies have been put in place on the Jenkins controller. [1]
A Jenkins controller can operate by itself both managing the build environment and executing the builds with its own executors and resources. If you stick with this "standalone" configuration you will most likely run out of resources when the number or the load of your projects increase.
An agent, where the workload of building projects are delegated to, is a machine set up to offload projects from the controller. The method with which builds are scheduled depends on the configuration given to each project. For example, some projects may be configured to "restrict where this project is run" which ties the project to a specific agent or set of labeled agents. Other projects which omit this configuration will select an agent from the available pool in Jenkins.
In a distributed builds environment, the Jenkins controller will use its resources to only handle HTTP requests and manage the build environment. Actual execution of builds will be delegated to the agents. With this configuration it is possible to horizontally scale an architecture, which allows a single Jenkins installation to host a large number of projects and build environments. [2]
In order for a machine to be recognized as an agent, it needs to run a specific agent program to establish bi-directional communication with the controller.
There are different ways to establish a connection between controller and agent:
The SSH connector: Configuring an agent to use the SSH connector is the preferred and the most stable way to establish controller-agent communication.
The Inbound connector: In this case the communication is established through a connection initiated by an agent program running on the agent machine.
The Inbound-HTTP connector: This approach is quite similar to the Inbound-TCP Java Web Start approach, with the difference in this case being that the agent is executed as headless and the connection can be tunneled via HTTP(s).
Custom-script: It is also possible to create a custom script to initialize the communication between controller and agent if the other solutions do not provide enough flexibility for a specific use-case.
Builds in a distributed builds architecture use nodes, agents, and executors, which are distinct from the Jenkins controller itself. Understanding what each of these components are is useful when managing nodes: [3]
The Jenkins controller is the Jenkins service itself and where Jenkins is installed. It is also a web server that also acts as a "brain" for deciding how, when, and where to run tasks. Management tasks such as configuration, authorization, and authentication are executed on the controller, which serves HTTP requests. Files written when a Pipeline executes are written to the filesystem on the controller, unless they are off-loaded to an artifact repository such as Nexus or Artifactory.
Agents manage the task execution on behalf of the Jenkins controller by using executors. An agent is a small (170KB single jar) Java client process that connects to a Jenkins controller and is assumed to be unreliable. An agent can use any operating system that supports Java. Any tools required for building and testing get installed on the node where the agent runs. Because these tools are a part of the node, they can be installed directly or in a container, such as Docker or Kubernetes. Each agent is effectively a process with its own Process Identifier (PID) on the host machine. In practice, nodes and agents are essentially the same but it is good to remember that they are conceptually distinct.
Nodes are the "machines" on which build agents run. Jenkins monitors each attached node for disk space, free temp space, free swap, clock time/sync, and response time. A node is taken offline if any of these values go outside the configured threshold. Jenkins supports two types of nodes:
agents (described above)
built-in node
The built-in node is a node that exists within the controller process. It is possible to use agents and the built-in node to run tasks. However, running tasks on the built-in node is discouraged for security, performance, and scalability reasons. The number of executors configured for the node determines the node’s ability to run tasks. Set the number of executors to 0 to disable running tasks on the built-in node.
An executor is a slot for the execution of tasks. Effectively, it is a thread in the agent. The number of executors on a node defines the number of concurrent tasks that can run. In other words, this determines the number of concurrent Pipeline stages that can execute at the same time. The correct number of executors per build node must be determined based on the resources available on the node and the resources required for the workload. When determining how many executors to run on a node, consider CPU and memory requirements, as well as the amount of I/O and network activity:
One executor per node is the safest configuration.
One executor per CPU core can work well, if the tasks running are small.
Monitor I/O performance, CPU load, memory usage, and I/O throughput carefully when running multiple executors on a node.
Due to Docker’s fundamental platform and container design, a Docker image for a given application, such as Jenkins, can be run on any supported operating system or cloud service also running Docker. [4]
Open up a terminal window, and create a directory named controller.
mkdir controller
cd controller
Create an environment file named .env and set the project name to jenkins.
echo -n COMPOSE_PROJECT_NAME=jenkins > .env
Create a groovy file named executors.groovy with the following content.
import jenkins.model.*
Jenkins.instance.setNumExecutors(0) // Recommended to not run builds on the built-in node
Create a bridge network for the controller.
docker network create -d bridge jenkins-controller
Create a compose file named compose.yml with the following content.
version: "2.4"
services:
controller:
container_name: jenkins-controller
build:
context: .
dockerfile_inline: |
ARG JENKINS_TAG=2.426.3-jdk21
FROM jenkins/jenkins:$${JENKINS_TAG} (1)
COPY --chown=jenkins:jenkins executors.groovy /usr/share/jenkins/ref/init.groovy.d/executors.groovy (2)
restart: on-failure
ports:
- "8080:8080"
- "50000:50000" (3)
volumes:
- jenkins-home:/var/jenkins_home:rw (4)
networks:
controller:
volumes:
jenkins-home:
name: jenkins-home
networks:
controller:
external: true (5)
name: jenkins-controller
1 | Use the recommended official jenkins/jenkins image from the Docker Hub repository. [4] |
2 | Extend the image and change it to your desired number of executors (recommended 0 executors on the built-in node). [5] |
3 | In order to connect agents through an inbound TCP connection, map the port: -p 50000:50000 . That port will be used when you connect agents to the controller.
If you are only using SSH (outbound) build agents, this port is not required, as connections are established from the controller. If you connect agents using web sockets (since Jenkins 2.217), the TCP agent port is not used either. [5] |
4 | NOTE: Avoid using a bind mount from a folder on the host machine into /var/jenkins_home, as this might result in file permission issues (the user used inside the container might not have rights to the folder on the host machine). If you really need to bind mount jenkins_home, ensure that the directory on the host is accessible by the jenkins user inside the container (jenkins user - uid 1000) or use -u some_other_user parameter with docker run . [5] |
5 | external specifies that this network’s lifecycle is maintained outside of that of the application. |
(Optional) Create a compose file named compose.override.yml with the following content.
Docker Compose lets you merge and override a set of Compose files together to create a composite Compose file. By default, Compose reads two files, a compose.yml and an optional compose.override.yml file. By convention, the compose.yml contains your base configuration. The override file can contain configuration overrides for existing services or entirely new services. [8] |
version: "2.4"
services:
controller:
build:
args:
- JENKINS_TAG=2.426.3-jdk21
environment:
- TZ=Asia/Shanghai
Starting the controller container:
docker compose up -d
Post-installation setup wizard.
Follow this post-installation setup to finish the last steps, and print the initial admin password at the console.
(Optional) Expose Jenkins with a Kubernetes service.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: jenkins
  name: jenkins
spec:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
      name: ''
  type: ClusterIP
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: jenkins-1
  labels:
    kubernetes.io/service-name: jenkins
addressType: IPv4
ports:
  - name: ''
    appProtocol: http
    protocol: TCP
    port: 8080
endpoints:
  - addresses:
      - "192.168.56.130" (1)
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jenkins.dev.test
  labels:
    app: jenkins
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  ingressClassName: "nginx"
  tls: (2)
    - hosts:
        - "*.dev.test"
      secretName: "dev.test"
  rules:
    - host: jenkins.dev.test (2)
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: jenkins
                port:
                  number: 8080
1 | Replace the IP address with that of the server hosting the Jenkins controller, e.g., 192.168.56.130. |
2 | Replace the TLS and hosts of the Ingress with your settings. |
Generating an SSH key pair.
To generate the SSH key pair, execute a command line tool named ssh-keygen on a machine you have access to. [6]
ssh-keygen -t ed25519 -f ~/.ssh/jenkins_agent_key
Create a Jenkins SSH credential.
Go to your Jenkins dashboard.
Go to Manage Jenkins option in left main menu and click on the Credentials button under the Security.
Select the dropdown option Add Credentials from the (global) item under the Stores scoped to Jenkins.
Fill in the form.
Kind: SSH Username with private key
ID: jenkins
Description: Jenkins SSH private key
Username: jenkins
Private Key: Select Enter directly and press the Add button to insert the content of your private key file at ~/.ssh/jenkins_agent_key.
Passphrase: Fill your passphrase used to generate the SSH key pair (leave empty if you didn’t use one at the previous step) and then press the Create button.
Open up a terminal window, and create a directory named agents.
mkdir agents
cd agents
Create an environment file named .env and set the project name to jenkins-agents.
echo -n COMPOSE_PROJECT_NAME=jenkins-agents > .env
Create a bridge network for the agent.
docker network create -d bridge jenkins-agents
Create a compose file named compose.yml with the following content.
version: "2.4"
services:
agent:
container_name: jenkins-agent
image: jenkins/ssh-agent:alpine-jdk21
restart: on-failure
ports:
- "2200:22"
environment:
- "JENKINS_AGENT_SSH_PUBKEY=[your-public-key]" (1)
# e.g. - "JENKINS_AGENT_SSH_PUBKEY=ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKBBHLJ+8RuLPO8dO1tm3RAt5kc3HqYwJUYMmRPjhtI3" (1)
volumes:
- agent-home:/home/jenkins/agent:rw (2)
networks:
agents:
volumes:
agent-home:
name: jenkins-agent-home
networks:
agents:
external: true
name: jenkins-agents
1 | The value of JENKINS_AGENT_SSH_PUBKEY MUST include the full contents of your .pub file created above (i.e. ~/.ssh/jenkins_agent_key.pub), including the ssh-XXXX prefix. [6] |
2 | When using the Linux image, you have to set the value of the Remote root directory to /home/jenkins/agent in the agent configuration UI.
When using the Windows image, you have to set the value of the Remote root directory to |
Starting the agent container.
docker compose up -d
Set up the jenkins-agent on Jenkins.
Go to your Jenkins dashboard.
Go to Manage Jenkins option in left main menu.
Go to Nodes item under the System Configuration.
Go to New Node option in top right menu.
Fill the Node name and select the type; (e.g. Name: agent1, Type: Permanent Agent), and then press the Create button.
Now fill the fields.
Remote root directory; (e.g. /home/jenkins/agent)
Labels; (e.g. agent1 )
Usage; (e.g. Use this node as much as possible)
Launch method; (e.g. Launch agents via SSH)
Host; (e.g. localhost or your IP address)
Credentials; (e.g. jenkins)
Host Key verification Strategy (e.g.: Non verifying Verification Strategy. test only, NOT recommended)
See also, Host Key Verification Strategy.
It’s recommended to use the Manually trusted key Verification Strategy, then open the agent configuration page to trust the host key manually.
Expand the Advanced tab, and set the Port to 2200.
Press the Save button and the agent1 will be registered, and be launched by the Controller.
Delegating the first job to agent1.
Go to your Jenkins dashboard
Select New Item on side menu
Enter an item name. (e.g.: First Job to Agent1)
Select the Freestyle project and press OK.
Now select the option Execute shell at Build Steps section.
Add the command echo $NODE_NAME in the Command field of the Execute shell step; the name of the agent will be printed in the log when this job is run.
Press the Save button and then select the option Build Now.
Wait some seconds and then go to Console Output page.
Started by user admin
Running as SYSTEM
Building remotely on agent1 in workspace /home/jenkins/agent/workspace/test
[test] $ /bin/sh -xe /tmp/jenkins5590136104445527177.sh
+ echo agent1
agent1
Finished: SUCCESS
Open up a terminal window, and create a directory named agents/dind:
mkdir -p agents/dind
cd agents/dind
Create an environment file named .env and set the project name to jenkins-agents-dind:
echo -n COMPOSE_PROJECT_NAME=jenkins-agents-dind > .env
Create a bridge network for the agent:
docker network create -d bridge jenkins-agents-dind
Create a compose file named compose.yml with the following content:
version: "2.4"
services:
agent:
container_name: jenkins-agent-dind
# image: qqbuby/jenkins-ssh-dind-agent:5.25.0-jdk21
build:
context: .
dockerfile_inline: |
ARG SSH_AGENET_TAG=jdk21
FROM jenkins/ssh-agent:$${SSH_AGENET_TAG}
ARG DOCKER_CE_CLI_VERSION=5:25.0.1-1~debian.12~bookworm
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
ca-certificates \
curl \
lsb-release \
&& rm -rf /var/lib/apt/lists/*
RUN curl -fsSLo /usr/share/keyrings/docker-archive-keyring.asc https://download.docker.com/linux/debian/gpg
RUN echo "deb [arch=$(dpkg --print-architecture) \
signed-by=/usr/share/keyrings/docker-archive-keyring.asc] \
https://download.docker.com/linux/debian \
$(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker.list
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
docker-ce-cli=$${DOCKER_CE_CLI_VERSION} \ (1)
&& rm -rf /var/lib/apt/lists/*
restart: on-failure
ports:
- "2210:22" (2)
environment:
- "JENKINS_AGENT_SSH_PUBKEY=[your-public-key]" (3)
# e.g. - "JENKINS_AGENT_SSH_PUBKEY=ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKBBHLJ+8RuLPO8dO1tm3RAt5kc3HqYwJUYMmRPjhtI3"
- DOCKER_HOST=tcp://docker:2376
- DOCKER_CERT_PATH=/certs/client
- DOCKER_TLS_VERIFY=1
volumes:
- agent-home:/home/jenkins/agent:rw
- docker-certs:/certs/client:ro
networks:
agents:
depends_on:
- docker
docker:
container_name: jenkins-docker
image: docker:25
restart: on-failure
ports:
- "2376"
privileged: true
environment:
- DOCKER_TLS_CERTDIR=/certs
volumes:
- agent-home:/home/jenkins/agent:rw (4)
- docker-certs:/certs/client:rw
- docker-root:/var/lib/docker:rw
networks:
agents:
aliases:
- docker
volumes:
agent-home:
name: jenkins-agent-home-dind
docker-certs:
name: jenkins-agent-docker-certs
docker-root:
name: jenkins-agent-docker-root
networks:
agents:
external: true
name: jenkins-agents-dind
1 | Extend the jenkins/ssh-agent image to install Docker CLI. |
2 | If your machine already has an SSH server running on port 22, use another port to publish the agent container port 22 (SSH), such as 2210:22 . |
3 | The value of JENKINS_AGENT_SSH_PUBKEY MUST include the full contents of your .pub file created above (i.e. ~/.ssh/jenkins_agent_key.pub), including the ssh-XXXX prefix. [6] |
4 | Share the agent home volume (i.e. agent-home) with the Docker container; otherwise the pipeline will get stuck. |
(Optional) Create a compose file named compose.override.yml with the following content:
version: "2.4"
services:
agent:
build:
args:
- SSH_AGENET_TAG=jdk21
- DOCKER_CE_CLI_VERSION=5:25.0.1-1~debian.12~bookworm
docker:
image: docker:25
# If an insecure registry isn’t marked as insecure,
# docker pull, docker push, and docker search result
# in error messages, prompting the user to either
# secure or pass the --insecure-registry flag to the
# Docker daemon.
# command: ["--insecure-registry=192.168.56.0/24"]
Starting the agent and docker container:
docker compose up -d
Refer to Configuring agents using the SSH connector in Docker (replacing the SSH port 2200 with 2210) to set up the agent on Jenkins, create a Freestyle project using Execute shell with the docker version command, select the option Build Now, and then go to the Console Output page.
Started by user admin
Running as SYSTEM
Building remotely on agent1 in workspace /home/jenkins/agent/workspace/test
[test] $ /bin/sh -xe /tmp/jenkins2069680891022148280.sh
+ docker version
Client: Docker Engine - Community
Version: 25.0.1
API version: 1.44
Go version: go1.21.6
Git commit: 29cf629
Built: Tue Jan 23 23:09:46 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 25.0.1
API version: 1.44 (minimum version 1.24)
Go version: go1.21.6
Git commit: 71fa3ab
Built: Tue Jan 23 23:09:59 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.7.12
GitCommit: 71909c1814c544ac47ab91d2e8b84718e517bb99
runc:
Version: 1.1.11
GitCommit: v1.1.11-0-g4bccb38
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Finished: SUCCESS
Blue Ocean as it stands provides easy-to-use Pipeline visualization. It was intended to be a rethink of the Jenkins user experience, designed from the ground up for Jenkins Pipeline. Blue Ocean was intended to reduce clutter and increase clarity for all users. [9]
Sophisticated visualization of continuous delivery (CD) Pipelines, allowing for fast and intuitive comprehension of your Pipeline’s status.
Pipeline editor makes the creation of Pipelines more approachable, by guiding the user through a visual process to create a Pipeline.
Personalization to suit the role-based needs of each member of the team.
Pinpoint precision when intervention is needed or issues arise. Blue Ocean shows where attention is needed, facilitating exception handling and increasing productivity.
Native integration for branches and pull requests, which enables maximum developer productivity when collaborating on code in GitHub and Bitbucket.
When Jenkins is installed on most platforms, the Blue Ocean plugin and all necessary dependent plugins, which compose the Blue Ocean suite of plugins, are not installed by default.
To install the Blue Ocean suite of plugins on an existing Jenkins instance: [10]
Ensure you are logged in to Jenkins as a user with the Administer permission.
From the Jenkins home page, select Manage Jenkins on the left and then Plugins under the System Configuration.
Select the Available plugins tab and enter blueocean in the Filter text box. This filters the list of plugins based on the name and description.
Select the box to the left of Blue Ocean, and then select either the Install after restart option (recommended) or the Install without restart option at the top right of the page.
It is not necessary to select other plugins in this list. The main Blue Ocean plugin automatically selects and installs all dependent plugins, composing the Blue Ocean suite of plugins. If you select the Install without restart option, you must restart Jenkins to gain full Blue Ocean functionality. |
Once a Jenkins environment has Blue Ocean installed and you are logged in to the Jenkins classic UI, the Blue Ocean UI can be accessed by selecting Open Blue Ocean on the left side of the screen.
Alternatively, access Blue Ocean directly by appending /blue to the end of the Jenkins server’s URL, for example https://jenkins-server-url/blue.
If you need to access these features, select the Go to classic icon at the top of a common section of Blue Ocean’s navigation bar.
Jenkins Pipeline (or simply "Pipeline" with a capital "P") is a suite of plugins which supports implementing and integrating continuous delivery pipelines into Jenkins.
The definition of a Jenkins Pipeline is written into a text file (called a Jenkinsfile) which in turn can be committed to a project’s source control repository. This is the foundation of "Pipeline-as-code": treating the CD pipeline as a part of the application to be versioned and reviewed like any other code. [9]
The following concepts are key aspects of Jenkins Pipeline, which tie in closely to Pipeline syntax.
Pipeline
A Pipeline is a user-defined model of a CD pipeline. A Pipeline’s code defines your entire build process, which typically includes stages for building an application, testing it and then delivering it.
Also, a pipeline block is a key part of Declarative Pipeline syntax.
Node
A node is a machine which is part of the Jenkins environment and is capable of executing a Pipeline.
Also, a node block is a key part of Scripted Pipeline syntax.
Stage
A stage block defines a conceptually distinct subset of tasks performed through the entire Pipeline (e.g. "Build", "Test" and "Deploy" stages), which is used by many plugins to visualize or present Jenkins Pipeline status/progress.
Step
A single task. Fundamentally, a step tells Jenkins what to do at a particular point in time (or "step" in the process). For example, to execute the shell command make, use the sh step: sh 'make'. When a plugin extends the Pipeline DSL, that typically means the plugin has implemented a new step.
For an overview of available steps, please refer to the Pipeline Steps reference which contains a comprehensive list of steps built into Pipeline as well as steps provided by plugins. [12]
A Pipeline can be created in one of the following ways:
Through Blue Ocean - after setting up a Pipeline project in Blue Ocean, the Blue Ocean UI helps you write your Pipeline’s Jenkinsfile and commit it to source control.
Blue Ocean automatically generates an SSH public/private key pair or provides you with an existing pair for the current Jenkins user. This credential is automatically registered in Jenkins with the following details for this Jenkins user:
Through the classic UI - you can enter a basic Pipeline directly in Jenkins through the classic UI.
In SCM - you can write a Jenkinsfile manually, which you can commit to your project’s source control repository.
The Multibranch Pipeline project type enables you to implement different Jenkinsfiles for different branches of the same project. In a Multibranch Pipeline project, Jenkins automatically discovers, manages and executes Pipelines for branches which contain a Jenkinsfile in source control. |
Using a text editor, ideally one which supports Groovy syntax highlighting, create a new Jenkinsfile in the root directory of the project. [11]
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                echo 'Building..'
            }
        }
        stage('Test') {
            steps {
                echo 'Testing..'
            }
        }
        stage('Deploy') {
            steps {
                echo 'Deploying....'
            }
        }
    }
}
The Declarative Pipeline example above contains the minimum necessary structure to implement a continuous delivery pipeline. The agent directive, which is required, instructs Jenkins to allocate an executor and workspace for the Pipeline. Without an agent directive, not only is the Declarative Pipeline not valid, it would not be capable of doing any work! By default the agent directive ensures that the source repository is checked out and made available for steps in the subsequent stages.
The stages and steps directives are also required for a valid Declarative Pipeline, as they instruct Jenkins what to execute and in which stage it should be executed.
Many organizations use Docker to unify their build and test environments across machines, and to provide an efficient mechanism for deploying applications.
To use Docker with Pipeline, install the Docker Pipeline plugin.
Pipeline is designed to easily use Docker images as the execution environment for a single Stage or the entire Pipeline. This means that a user can define the tools required for their Pipeline, without having to manually configure agents. Any tool that can be packaged in a Docker container can be used with ease, by making only minor edits to a Jenkinsfile. [13]
pipeline {
    agent {
        docker { image 'node:20.11.0-alpine3.19' }
    }
    stages {
        stage('Test') {
            steps {
                sh 'id'
                sh 'node --version'
            }
        }
    }
}
When the Pipeline executes, Jenkins will automatically start the specified container and execute the defined steps within:
. . .
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Test)
[Pipeline] sh
+ id
uid=1000(node) gid=1000(node) groups=1000(node)
[Pipeline] sh
+ node --version
v20.11.0
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
. . .
If it is important to keep the workspace synchronized with other stages, use reuseNode true. Otherwise, a dockerized stage can be run on the same agent or any other agent, but in a temporary workspace.
By default, for a containerized stage, Jenkins:
Picks an agent.
Creates a new empty workspace.
Clones pipeline code into it.
Mounts this new workspace into the container.
If you have multiple Jenkins agents, your containerized stage can be started on any of them.
When reuseNode is set to true, no new workspace will be created, and the current workspace from the current agent will be mounted into the container. After this, the container will be started on the same node, so all of the data will be synchronized.
pipeline {
    agent any
    stages {
        stage('Build') {
            agent {
                docker {
                    image 'gradle:8.2.0-jdk17-alpine'
                    // Run the container on the node specified at the
                    // top-level of the Pipeline, in the same workspace,
                    // rather than on a new node entirely:
                    reuseNode true
                }
            }
            steps {
                sh 'gradle --version'
            }
        }
    }
}
Many build tools will download external dependencies and cache them locally for future re-use. Since containers are initially created with "clean" file systems, this can result in slower Pipelines, as they may not take advantage of on-disk caches between subsequent Pipeline runs.
Pipeline supports adding custom arguments that are passed to Docker, allowing users to specify custom Docker Volumes to mount, which can be used for caching data on the agent between Pipeline runs. The following example will cache ~/.m2 between Pipeline runs utilizing the maven container, avoiding the need to re-download dependencies for subsequent Pipeline runs.
pipeline {
    agent {
        docker {
            image 'maven:3.9.3-eclipse-temurin-17'
            args '-v $HOME/.m2:/root/.m2'
        }
    }
    stages {
        stage('Build') {
            steps {
                sh 'mvn -B'
            }
        }
    }
}
It has become increasingly common for code bases to rely on multiple different technologies. For example, a repository might have both a Java-based back-end API implementation and a JavaScript-based front-end implementation. Combining Docker and Pipeline allows a Jenkinsfile to use multiple types of technologies, by combining the agent {} directive with different stages.
pipeline {
    agent none
    stages {
        stage('Back-end') {
            agent {
                docker { image 'maven:3.9.6-eclipse-temurin-17-alpine' }
            }
            steps {
                sh 'mvn --version'
            }
        }
        stage('Front-end') {
            agent {
                docker { image 'node:20.11.0-alpine3.19' }
            }
            steps {
                sh 'node --version'
            }
        }
    }
}
Install Kubernetes CLI plugin.
Using the GUI: From the Jenkins dashboard navigate to Manage Jenkins > Plugins and select the Available tab. Locate this plugin by searching for kubernetes-cli.
Using the CLI tool:
jenkins-plugin-cli --plugins kubernetes-cli:1.12.1
Configure Credentials
The following types of credentials are supported and can be used to authenticate against Kubernetes clusters:
Token, as secrets (Kind: Secret text)(see Plain Credentials plugin)
Plain KubeConfig files (Kind: Secret file) (see Plain Credentials plugin)
Username and Password (see Credentials plugin)
Certificates (see Credentials plugin)
OpenShift OAuth tokens, as secrets (see Kubernetes Credentials plugin)
If the Jenkins Agent is running within a Pod (e.g. by using the Kubernetes plugin), you can fallback to the Pod’s ServiceAccount by not setting any credentials.
Now, let’s create a KubeConfig credential using the Secret file kind. On the Jenkins dashboard, go to Manage Jenkins > Credentials, move the mouse over (global) and select Add credentials. Fill in the fields as below:
Kind: Secret file.
Scope: Global (Jenkins, nodes, items, all child items, etc)
File: Upload your cluster kubeconfig file.
ID: kubernetes-admin.
Description: (optional)
Create a testing Freestyle project job:
Scroll down to the Build Environment section.
Select Configure Kubernetes CLI (kubectl) with multiple credentials.
In the Credential dropdown, select the credentials (e.g., kubernetes-admin) to authenticate on the cluster, or the kubeconfig stored in Jenkins.
Under Build Steps, add an Execute shell step with the kubectl cluster-info command.
Click "Save", and select the option Build Now then go to Console Output page.
Wait a seconds and then go to Console Output page.
Started by user admin
Running as SYSTEM
Building remotely on agent-dind-2 in workspace /home/jenkins/agent/workspace/First Job to K8s
[First Job to K8s] $ /bin/sh -xe /tmp/jenkins17537654207595799867.sh
+ kubectl cluster-info
/tmp/jenkins17537654207595799867.sh: 2: kubectl: not found (1)
Build step 'Execute shell' marked build as failure
[kubernetes-cli] kubectl configuration cleaned up
Finished: FAILURE
(1) To solve the kubectl: not found problem, install the kubectl command-line tool on the agent node. See also Install kubectl on Linux.
Again, click Build Now, and see the log on the Console Output page.
Started by user admin
Running as SYSTEM
Building remotely on agent-dind-2 in workspace /home/jenkins/agent/workspace/First Job to K8s
[First Job to K8s] $ /bin/sh -xe /tmp/jenkins9182137363539535938.sh
+ kubectl cluster-info
Kubernetes control plane is running at https://192.168.56.130:6443
CoreDNS is running at https://192.168.56.130:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[kubernetes-cli] kubectl configuration cleaned up
Finished: SUCCESS
GitLab is a fully featured software development platform that includes, among other powerful features, built-in GitLab CI/CD to leverage the ability to build, test, and deploy your apps without requiring you to integrate with CI/CD external tools. [14]
However, many organizations have been using Jenkins for their deployment processes, and need an integration with Jenkins to be able to onboard to GitLab before switching to GitLab CI/CD. Others have to use Jenkins to build and deploy their applications because of the inability to change the established infrastructure for current projects, but they want to use GitLab for all the other capabilities.
With GitLab’s Jenkins integration, you can effortlessly set up your project to build with Jenkins, and GitLab will output the results for you right from GitLab’s UI.
After you configure a Jenkins integration, Jenkins triggers a build when you push code to your repository or create a merge request in GitLab. The Jenkins pipeline status displays on merge request widgets and on the GitLab project’s home page. [21]
To configure a Jenkins integration with GitLab:
Grant Jenkins access to the GitLab project.
Configure the Jenkins server.
Configure the Jenkins project.
Configure the GitLab project.
Open a terminal, and create a bridge network named gitlab-ce.
docker network create gitlab-ce
Create a compose.yml file.
version: "2.4"
services:
gitlab-ce:
container_name: gitlab-ce
image: gitlab/gitlab-ce:16.5.8-ce.0 # Pin GitLab to a specific Community Edition version
restart: "on-failure:3"
volumes:
- data:/var/opt/gitlab:rw # For storing application data.
- logs:/var/log/gitlab:rw # For storing logs.
- config:/etc/gitlab:rw # For storing the GitLab configuration files.
networks:
gitlab-ce:
volumes:
data:
name: gitlab-ce-data
logs:
name: gitlab-ce-logs
config:
name: gitlab-ce-config
networks:
gitlab-ce:
external: true
name: gitlab-ce
Create a compose.override.yml file.
version: "2.4"
services:
gitlab-ce:
# Pin GitLab to a specific Community Edition version
image: gitlab/gitlab-ce:16.5.8-ce.0
# Use a valid externally-accessible hostname or IP address. Do not use `localhost`.
hostname: 'node-0'
environment:
# If you want to use a different host port than 80 (HTTP), 443 (HTTPS), or 22 (SSH), you
# need to add a separate --publish directive to the docker run command.
GITLAB_OMNIBUS_CONFIG: |
# Add any other gitlab.rb configuration here, each on its own line
external_url 'http://node-0:8929'
gitlab_rails['gitlab_shell_ssh_port'] = 2424
ports:
- '8929:8929'
- '2424:22'
extra_hosts:
- "node-0:192.168.56.130"
Start the gitlab-ce container.
docker compose up -d
The initialization process may take a long time. You can track this process with: [20]
docker logs -f gitlab-ce
After starting the container, you can visit http://node-0:8929. It might take a while before the Docker container starts to respond to queries.
Visit the GitLab URL, and sign in with the username root and the password from the following command:
sudo cat $(docker inspect gitlab-ce-config -f "{{.Mountpoint}}")/initial_root_password
The password file is automatically deleted on the first container restart after 24 hours.
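Alternatively, you can read the password directly from the running container:
docker exec -it gitlab-ce grep 'Password:' /etc/gitlab/initial_root_password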
Sonatype Nexus Repository Manager provides a central platform for storing build artifacts. [15]
Open a terminal, create a .env file, and set the project name to sonatype-nexus.
echo -n COMPOSE_PROJECT_NAME=sonatype-nexus > .env
Create a bridge network named sonatype-nexus.
docker network create -d bridge sonatype-nexus
Create a compose.yml file.
version: "2.4"
services:
nexus:
container_name: sonatype-nexus
user: nexus:nexus
image: sonatype/nexus3:3.64.0
restart: "on-failure:3"
volumes:
- data:/nexus-data:rw
networks:
nexus:
volumes:
data:
name: nexus-data
networks:
nexus:
external: true
name: sonatype-nexus
Create a compose.override.yml file.
version: "2.4"
services:
nexus:
ports:
- "8081:8081"
- "8082:8082" # Using for Docker Registry
# environment:
# NEXUS_CONTEXT: nexus (1)
# INSTALL4J_ADD_VM_PARAMS, passed to the Install4J startup script. Defaults to -Xms2703m -Xmx2703m -XX:MaxDirectMemorySize=2703m -Djava.util.prefs.userRoot=${NEXUS_DATA}/javaprefs.
Start the sonatype-nexus container.
docker compose up -d
Open a browser at http://localhost:8081, click the Sign in button at the top right, fill in the login fields, and then complete the required setup tasks.
Your admin user password is located in /nexus-data/admin.password on the server.
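For example, you can read it from the running container:
docker exec sonatype-nexus cat /nexus-data/admin.password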
A hosted repository using the Docker repository format is typically called a private Docker registry. It can be used to upload your own container images as well as third-party images. It is common practice to create two separate hosted repositories for these purposes. [18]
Go to the Nexus dashboard, and select the gear icon at the top bar, or enter http://localhost:8081/#admin/repository.
Select the Repositories on the left menu to the Manage repositories panel, or enter http://localhost:8081/#admin/repository/repositories.
Click the Create repository button, and select the docker (hosted) recipe, then fill the form.
Name: docker-registry
HTTP: 8082
Click the Create repository button at the bottom.
Log in with Docker, and push/pull images to/from the Nexus registry.
docker login -u admin -p [YOUR ADMIN PASSWORD OF NEXUS] http://localhost:8082
$ docker pull busybox
Using default tag: latest
latest: Pulling from library/busybox
9ad63333ebc9: Pull complete
Digest: sha256:6d9ac9237a84afe1516540f40a0fafdc86859b2141954b4d643af7066d598b74
Status: Downloaded newer image for busybox:latest
docker.io/library/busybox:latest
$ docker tag busybox:latest localhost:8082/busybox
$ docker push localhost:8082/busybox
Using default tag: latest
The push refers to repository [localhost:8082/busybox]
2e112031b4b9: Pushed
latest: digest: sha256:d319b0e3e1745e504544e931cde012fc5470eba649acc8a7b3607402942e5db7 size: 527
$ docker pull localhost:8082/busybox
Using default tag: latest
latest: Pulling from busybox
Digest: sha256:d319b0e3e1745e504544e931cde012fc5470eba649acc8a7b3607402942e5db7
Status: Image is up to date for localhost:8082/busybox:latest
localhost:8082/busybox:latest
Go back to the Browser (e.g. http://localhost:8081/#browse/browse:docker-registry) in the Nexus to check the Repository status.
By default, Docker assumes all registries to be secure, except for local registries. Communicating with an insecure registry isn’t possible if Docker assumes that registry is secure. In order to communicate with an insecure registry, the Docker daemon requires the --insecure-registry flag, given either as a registry host:port or as a CIDR range.
The flag can be used multiple times to allow multiple registries to be marked as insecure. If an insecure registry isn’t marked as insecure, pulls and pushes to it will fail. Local registries, whose IP address falls in the 127.0.0.0/8 range, are automatically marked as insecure as of Docker 1.3.2. It isn’t recommended to rely on this, as it may change in the future.
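A common alternative to the command-line flag is the daemon configuration file. A minimal sketch for /etc/docker/daemon.json, using the Nexus registry address from this guide (adjust it to yours), followed by a daemon restart:
{
  "insecure-registries": ["192.168.56.130:8082"]
}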
A hosted repository for NuGet can be used to upload your own packages as well as third-party packages. The repository manager includes a hosted NuGet repository named nuget-hosted by default. [19]
Go to the Nexus dashboard, sign in, and click the user name at the top right, or enter http://localhost:8081/#user/account.
On the left panel, select the NuGet API Key.
Click Access API Key, authenticate with your credentials, and then click Copy to Clipboard.
Click the gear icon at the top panel, select the Realms on the left panel under the Security.
Select the NuGet API-Key Realm on the left Available tab panel, and transfer it to the right Active tab panel.
Click the Save button at the bottom right.
Push a NuGet package to Nexus.
$ dotnet new classlib -o HelloLib
The template "Class Library" was created successfully.
. . .
$ dotnet pack HelloLib/
$ dotnet nuget push HelloLib/bin/Release/HelloLib.1.0.0.nupkg -k [REPLACE WITH YOUR API KEY] -s http://localhost:8081/repository/nuget-hosted/index.json
warn : You are running the 'push' operation with an 'HTTP' source, 'http://localhost:8081/repository/nuget-hosted/index.json'. Non-HTTPS access will be removed in a future version. Consider migrating to an 'HTTPS' source.
Pushing HelloLib.1.0.0.nupkg to 'http://localhost:8081/repository/nuget-hosted'...
warn : You are running the 'push' operation with an 'HTTP' source, 'http://localhost:8081/repository/nuget-hosted/'. Non-HTTPS access will be removed in a future version. Consider migrating to an 'HTTPS' source.
PUT http://localhost:8081/repository/nuget-hosted/
Created http://localhost:8081/repository/nuget-hosted/ 40ms
Your package was pushed.
You can also create a nuget.config file and add the NuGet source to the project.
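A minimal sketch of such a nuget.config, pointing at the nuget-hosted repository above (the source key nexus is arbitrary):
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <add key="nexus" value="http://localhost:8081/repository/nuget-hosted/index.json" />
  </packageSources>
</configuration>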
Open a terminal, create a working folder if you haven’t already, and enter it.
In the working folder, run the following command to create a demo ASP.NET Core Web project:
dotnet new gitignore
dotnet new globaljson --sdk-version=8.0.101 --roll-forward=latestFeature
dotnet new sln -n jenkins-getting-started
dotnet new web -o src/HelloWorld
dotnet sln add -s src src/HelloWorld/
Create a Dockerfile used to build the Docker image.
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /source
# Copy everything
COPY . ./
# Restore as distinct layers
RUN dotnet restore
# Build and publish a release
RUN dotnet publish -c release -o /app --no-restore
# Build runtime image
FROM mcr.microsoft.com/dotnet/aspnet:8.0
WORKDIR /app
COPY --from=build /app ./
ENTRYPOINT ["dotnet", "HelloWorld.dll"]
Create Jenkinsfile.
pipeline {
environment {
// Explicitly specify the DOTNET_CLI_HOME environment variable to a writable directory, like /tmp:
// See also: https://github.com/dotnet/cli/pull/9327
// https://github.com/dotnet/sdk/blob/main/src/Common/CliFolderPathCalculatorCore.cs#L14
// System.UnauthorizedAccessException: Access to the path '/.dotnet' is denied.
DOTNET_CLI_HOME = '/tmp'
}
agent any
stages {
stage('Build') {
agent {
docker {
image 'mcr.microsoft.com/dotnet/sdk:8.0'
// Run the container on the node specified at the
// top-level of the Pipeline, in the same workspace,
// rather than on a new node entirely:
reuseNode true
}
}
steps {
sh 'dotnet build'
}
}
stage('Test') {
agent {
docker {
image 'mcr.microsoft.com/dotnet/sdk:8.0'
// Run the container on the node specified at the
// top-level of the Pipeline, in the same workspace,
// rather than on a new node entirely:
reuseNode true
}
}
steps {
sh 'dotnet test'
}
}
stage('Deploy') {
agent {
docker {
image 'mcr.microsoft.com/dotnet/sdk:8.0'
// Run the container on the node specified at the
// top-level of the Pipeline, in the same workspace,
// rather than on a new node entirely:
reuseNode true
}
}
steps {
sh 'dotnet publish'
}
}
stage('Docker') {
// Execute the stage on a node pre-configured to accept Docker-based Pipelines
environment {
// Create the Docker Registry credential with ID as `jenkins-docker-registry-creds` on Jenkins.
DOCKER_REGISTRY_CREDS = credentials('jenkins-docker-registry-creds')
// Replace the following variables with your registry.
REGISTRY_SCHEME= 'http'
REGISTRY_HOSTNAME = '192.168.56.130'
REGISTRY_PORT = '8082'
}
steps {
sh 'docker build . -t $REGISTRY_HOSTNAME:$REGISTRY_PORT/hello-world:$BRANCH_NAME'
sh 'docker login -u $DOCKER_REGISTRY_CREDS_USR -p $DOCKER_REGISTRY_CREDS_PSW $REGISTRY_SCHEME://$REGISTRY_HOSTNAME:$REGISTRY_PORT'
sh 'docker push $REGISTRY_HOSTNAME:$REGISTRY_PORT/hello-world:$BRANCH_NAME'
sh 'docker logout $REGISTRY_SCHEME://$REGISTRY_HOSTNAME:$REGISTRY_PORT'
}
}
}
}
The final project structure should be as below.
$ tree
.
├── Dockerfile
├── global.json
├── Jenkinsfile
├── jenkins-getting-started.sln
└── src
└── HelloWorld
├── appsettings.Development.json
├── appsettings.json
├── HelloWorld.csproj
├── Program.cs
└── Properties
└── launchSettings.json
4 directories, 9 files
Build and test the project.
Run the Web application.
$ dotnet run --project src/HelloWorld/
Building...
info: Microsoft.Hosting.Lifetime[14]
Now listening on: http://localhost:5062
info: Microsoft.Hosting.Lifetime[0]
Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
Hosting environment: Development
...
Open another terminal, and test the above endpoint.
$ curl -i http://localhost:5062
HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8
Date: Tue, 30 Jan 2024 03:25:20 GMT
Server: Kestrel
Transfer-Encoding: chunked
Hello World!
The following is a sample output on Jenkins.
. . .
+ dotnet build
MSBuild version 17.8.3+195e7f5a3 for .NET
Determining projects to restore...
. . .
+ docker build . -t 192.168.56.130:8082/hello-world:main
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
Install the buildx component to build images with BuildKit:
https://docs.docker.com/go/buildx/
Sending build context to Docker daemon 1.535MB
. . .
+ docker login -u **** -p **** http://192.168.56.130:8082
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /home/jenkins/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
[Pipeline] sh
+ docker push 192.168.56.130:8082/hello-world:main
The push refers to repository [192.168.56.130:8082/hello-world]
. . .
+ docker logout http://192.168.56.130:8082
Removing login credentials for 192.168.56.130:8082
. . .
An error message is displayed in the "Manage Jenkins" page: It appears that your reverse proxy setup is broken. [22]
For a reverse proxy to work correctly, it needs to rewrite both the request and the response. Request rewriting involves receiving an inbound HTTP call and then making a forwarding request to Jenkins (sometimes with some HTTP headers modified, sometimes not). Failing to configure the request rewriting is easy to catch, because you just won’t see any pages at all.
But correct reverse proxying also involves one of two options: EITHER
rewrite the response with a "Location" header in the response, which is used during redirects. Jenkins sends Location: http://actual.server:8080/jenkins/foobar and the reverse proxy must rewrite it to Location: http://nice.name/jenkins/foobar. Unfortunately, failing to configure this correctly is harder to catch; OR
set the headers X-Forwarded-Host (and perhaps X-Forwarded-Port) on the forwarded request. Jenkins will parse those headers and generate all the redirects and other links on the basis of those headers. Depending on your reverse proxy, it may be easier to set X-Forwarded-Host and X-Forwarded-Port to the hostname and port in the original Host header respectively, or it may be easier to just pass the original Host header through as X-Forwarded-Host and delete the X-Forwarded-Port header from the request. You will also need to set the X-Forwarded-Proto header if your reverse proxy is changing from https to http or vice-versa.
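If you use Nginx as the reverse proxy, the header-forwarding option might look like the following minimal sketch (assuming Jenkins listens on 127.0.0.1:8080 under the /jenkins prefix; adjust both to your setup):
location /jenkins/ {
    proxy_pass http://127.0.0.1:8080/jenkins/;
    # Forward the original host, port, and scheme so Jenkins can
    # generate correct redirects and absolute links:
    proxy_set_header Host              $host;
    proxy_set_header X-Forwarded-Host  $host;
    proxy_set_header X-Forwarded-Port  $server_port;
    proxy_set_header X-Forwarded-Proto $scheme;
}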
[2] https://www.jenkins.io/doc/book/scaling/architecting-for-scale/
[5] https://github.com/jenkinsci/docker/blob/master/README.md
[8] https://docs.docker.com/compose/multiple-compose-files/merge/
[10] https://www.jenkins.io/doc/book/blueocean/getting-started/
[15] https://www.sonatype.com/products/sonatype-nexus-repository
[18] https://help.sonatype.com/en/hosted-repository-for-docker---private-registry-for-docker-.html
[19] https://help.sonatype.com/en/nuget-hosted-repositories.html
[22] https://www.jenkins.io/doc/book/system-administration/reverse-proxy-configuration-troubleshooting/
TL;DR: The docker0 is the default bridge when starting the Docker daemon.
Container networking refers to the ability for containers to connect to and communicate with each other, or to non-Docker workloads. [1]
Containers have networking enabled by default, and they can make outgoing connections. A container has no information about what kind of network it’s attached to, or whether their peers are also Docker workloads or not. A container only sees a network interface with an IP address, a gateway, a routing table, DNS services, and other networking details. That is, unless the container uses the none network driver.
You can create custom, user-defined networks, and connect multiple containers to the same network. Once connected to a user-defined network, containers can communicate with each other using container IP addresses or container names.
The following example creates a network using the bridge network driver and running a container in the created network:
docker network create -d bridge my-net
docker run --network=my-net -itd --name=container3 busybox
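Containers on a user-defined bridge network can also resolve each other by name; for a quick check (using the container3 started above):
docker run --rm --network=my-net busybox ping -c 1 container3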
The following network drivers are available by default, and provide core networking functionality:
Driver | Description
---|---
bridge | The default network driver.
host | Remove network isolation between the container and the Docker host.
none | Completely isolate a container from the host and other containers.
overlay | Overlay networks connect multiple Docker daemons together.
ipvlan | IPvlan networks provide full control over both IPv4 and IPv6 addressing.
macvlan | Assign a MAC address to a container.
In terms of networking, a bridge network is a Link Layer device which forwards traffic between network segments. A bridge can be a hardware device or a software device running within a host machine’s kernel. [2]
In terms of Docker, a bridge network uses a software bridge which lets containers connected to the same bridge network communicate, while providing isolation from containers that aren’t connected to that bridge network. The Docker bridge driver automatically installs rules in the host machine so that containers on different bridge networks can’t communicate directly with each other.
Bridge networks apply to containers running on the same Docker daemon host. For communication among containers running on different Docker daemon hosts, you can either manage routing at the OS level, or you can use an overlay network.
When you start Docker, a default bridge network (also called bridge) is created automatically, and newly-started containers connect to it unless otherwise specified. You can also create user-defined custom bridge networks.
$ docker run --rm -it qqbuby/net-tools:1.0 hostname -i
172.17.0.2
$ docker run --rm -it qqbuby/net-tools:1.0 ip r
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.2
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
78ec0b2e6034 bridge bridge local
52eb0b3c3639 host host local
97378c7bca5f none null local
To configure the default bridge network, specify options in daemon.json. Here is an example daemon.json with several options specified. Only specify the settings you need to customize. [3]
{
"bip": "192.168.1.1/24",
"fixed-cidr": "192.168.1.0/25",
"fixed-cidr-v6": "2001:db8::/64",
"mtu": 1500,
"default-gateway": "192.168.1.254",
"default-gateway-v6": "2001:db8:abcd::89",
"dns": ["10.20.1.2","10.20.1.3"]
}
Restart Docker for the changes to take effect.
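For example, on a systemd-based host:
sudo systemctl restart docker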
$ ip a show docker0
11: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:67:64:cd:53 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/24 brd 192.168.1.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:67ff:fe64:cd53/64 scope link
valid_lft forever preferred_lft forever
$ docker inspect bridge
[
{
"Name": "bridge",
"Id": "335bd5ba267bde54a9b125270c4a010d0031ece7e75f43addf70df04571290b1",
"Created": "2024-01-25T12:54:54.930262205+08:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "192.168.1.0/24",
"IPRange": "192.168.1.0/25",
"Gateway": "192.168.1.1",
"AuxiliaryAddresses": {
"DefaultGatewayIPv4": "192.168.1.254"
}
},
{
"Subnet": "2001:db8::/64",
"AuxiliaryAddresses": {
"DefaultGatewayIPv6": "2001:db8:abcd::89"
}
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {},
"Options": {
"com.docker.network.bridge.default_bridge": "true",
"com.docker.network.bridge.enable_icc": "true",
"com.docker.network.bridge.enable_ip_masquerade": "true",
"com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
"com.docker.network.bridge.name": "docker0",
"com.docker.network.driver.mtu": "1500"
},
"Labels": {}
}
]
$ docker run --rm -it qqbuby/net-tools:1.0 hostname -i
192.168.1.2
$ docker run --rm -it qqbuby/net-tools:1.0 ip r
default via 192.168.1.254 dev eth0
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.2
Use the docker network create command to create a user-defined bridge network.
docker network create my-net
You can specify the subnet, the IP address range, the gateway, and other options. See the docker network create reference or the output of docker network create --help for details.
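For example, a sketch that picks an explicit subnet, IP range, and gateway (all values here are arbitrary):
docker network create --subnet 172.28.0.0/16 --ip-range 172.28.5.0/24 --gateway 172.28.5.254 my-custom-net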
$ docker network create my-net
6fae1652e77ea4aa5452ce8f7321005dec3cbdfd5480bd6ad2caf92ae2646f85
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
449f407d5f92 bridge bridge local
e25b7a4625b8 host host local
6fae1652e77e my-net bridge local
8c9b745f69e5 none null local
$ docker inspect my-net
[
{
"Name": "my-net",
"Id": "6fae1652e77ea4aa5452ce8f7321005dec3cbdfd5480bd6ad2caf92ae2646f85",
"Created": "2024-01-25T14:05:37.630914427+08:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "172.17.0.0/16",
"Gateway": "172.17.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {},
"Options": {},
"Labels": {}
}
]
The IP range of the network my-net is still 172.17.0.0/16.
To configure the default address range, specify options in daemon.json.
{
"bip": "192.168.1.1/24",
"fixed-cidr": "192.168.1.0/25",
"fixed-cidr-v6": "2001:db8::/64",
"mtu": 1500,
"default-gateway": "192.168.1.254",
"default-gateway-v6": "2001:db8:abcd::89",
"dns": ["10.20.1.2","10.20.1.3"],
"default-address-pools": [
{
"base": "10.201.0.0/16",
"size": 24
},
{
"base": "10.202.0.0/16",
"size": 24
}
]
}
Restart Docker for the changes to take effect.
$ docker info
...
Default Address Pools:
Base: 10.201.0.0/16, Size: 24
Base: 10.202.0.0/16, Size: 24
$ docker network create my-net2
c77a9f13ba7732575a3d99d5bfde8852ee5c6827a3cad7d7f268be306394856e
$ docker inspect my-net2
[
{
"Name": "my-net2",
"Id": "c77a9f13ba7732575a3d99d5bfde8852ee5c6827a3cad7d7f268be306394856e",
"Created": "2024-01-25T14:10:14.285419243+08:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "10.201.0.0/24",
"Gateway": "10.201.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {},
"Options": {},
"Labels": {}
}
]
$ docker run --rm -it --network my-net2 qqbuby/net-tools:1.0 hostname -i
10.201.0.2
$ docker run --rm -it --network my-net2 qqbuby/net-tools:1.0 ip r
default via 10.201.0.1 dev eth0
10.201.0.0/24 dev eth0 proto kernel scope link src 10.201.0.2
The IP range of the network my-net2 is 10.201.0.0/24, allocated from the configured default address pools.
Remove the user-defined network.
$ docker network rm my-net my-net2
my-net
my-net2
The default bridge network can’t be removed with the docker network rm command.
Stop Docker, delete the docker0 interface, and clean the network config files.
$ sudo systemctl stop docker.service docker.socket
$ sudo ip link delete docker0
$ sudo rm /var/lib/docker/network/files/local-kv.db
This will reset all the networking in the current Docker instance.
Start Docker and check the changes.
$ sudo systemctl start docker.service
$ ip a show docker0
24: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:e6:78:3a:e3 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
Apache Kafka Raft (KRaft, pronounced craft) is the consensus protocol that was introduced in KIP-500 to remove Apache Kafka’s dependency on ZooKeeper for metadata management.
KRaft mode makes use of a new quorum controller service in Kafka which replaces the previous controller and makes use of an event-based variant of the Raft consensus protocol. [1]
KRaft mode is production ready for new clusters as of Apache Kafka 3.3. The development progress for additional features like migration from ZooKeeper is tracked in KIP-833.
The KRaft controller nodes comprise a Raft quorum which manages the Kafka metadata log. This log contains information about each change to the cluster metadata. Everything that is currently stored in ZooKeeper, such as topics, partitions, ISRs, configurations, and so on, is stored in this log. [2]
A Kafka cluster can be broken down into two components: a control plane and a data plane, each with its own responsibilities that work together to transfer data where it needs to go. [3]
Control plane responsibilities include:
Knowing which servers are alive.
Making appropriate changes when a server is detected as down.
Storing and exchanging metadata.
Data plane responsibilities include:
Handling requests to produce and fetch records and other application requests.
Reacting to metadata changes from the control plane.
Historically, Kafka used an Apache ZooKeeper cluster to provide most of its control plane functionality. ZooKeeper tracks each broker and provides replicated and consistent storage for the cluster metadata. ZooKeeper also elects one Kafka broker to be the controller. The controller has extra, non data plane duties to manage the state of the cluster, such as responding to brokers that crash or restart.
The new architecture removes the ZooKeeper dependency and replaces it with a flavor of the Raft consensus protocol, allowing each server in the Kafka cluster to take the role of broker, controller, or both. The controller cluster will perform the same roles as the cluster of ZooKeeper nodes did previously, but the Kafka controller will now be elected from the controllers instead of the brokers.
For a Kafka cluster to be highly available, you need to make certain both the data plane and control plane (whichever kind is being used) are highly available.
Your local environment must have Java 8+ installed.
Go to https://kafka.apache.org/, download the latest Kafka:
$ curl -LO https://dlcdn.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz
Create a kafka user and extract the tarball to its home directory:
$ sudo useradd -m kafka # [-s /bin/bash] Specify the login shell of the new account.
$ sudo su - kafka
$ sudo tar xf kafka_2.13-3.6.1.tgz -C /home/kafka/ --strip-components=1
Running Kafka as root is not a recommended configuration.
Generate a Cluster UUID:
$ KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
Format Log Directories:
$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties
Formatting /tmp/kraft-combined-logs with metadata.version 3.6-IV2.
Start the Kafka Server:
$ bin/kafka-server-start.sh config/kraft/server.properties
...
[2024-01-12 23:22:34,872] INFO [SocketServer listenerType=CONTROLLER, nodeId=1] Enabling request processing. (kafka.network.SocketServer)
[2024-01-12 23:22:34,881] INFO [MetadataLoader id=1] InitializeNewPublishers: initializing ScramPublisher controller id=1 with a snapshot at offset 4 (org.apache.kafka.image.loader.MetadataLoader)
[2024-01-12 23:22:34,911] INFO Awaiting socket connections on 0.0.0.0:9093. (kafka.network.DataPlaneAcceptor)
...
[2024-01-12 23:22:36,629] INFO [SocketServer listenerType=BROKER, nodeId=1] Enabling request processing. (kafka.network.SocketServer)
[2024-01-12 23:22:36,629] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.DataPlaneAcceptor)
...
The application logs (not to be confused with the commit log) are located in the logs/ directory, as configured in config/log4j.properties.
Once the Kafka server has successfully launched:
Open another terminal session and create a topic:
$ bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092
Created topic quickstart-events.
$ bin/kafka-topics.sh --describe --topic quickstart-events --bootstrap-server localhost:9092
Topic: quickstart-events TopicId: wx6vplZjRHaJubPnPP3_QQ PartitionCount: 1 ReplicationFactor: 1 Configs: segment.bytes=1073741824
Topic: quickstart-events Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Run the console producer client to write a few events into your topic:
$ bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
This is my first event
This is my second event
Open another terminal session and run the console consumer client to read the events you just created:
$ bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
This is my first event
This is my second event
Make sure the nodes in the cluster can reach each other. You can use hostnames, DNS names, or even IP addresses to connect them.
You can run ip a s to show the addresses assigned to all network interfaces.
The following steps will be demonstrated with the following two nodes (/etc/hosts):
192.168.46.131 node-1
192.168.46.132 node-2
Create a kafka user and extract the tarball to its home directory on each node:
$ sudo useradd -m kafka # [-s /bin/bash] Specify the login shell of the new account.
$ sudo su - kafka
$ sudo tar xf kafka_2.13-3.6.1.tgz -C /home/kafka/ --strip-components=1
Running Kafka as root is not a recommended configuration.
Generate a Cluster UUID:
$ KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
$ echo $KAFKA_CLUSTER_ID
MkU3OEVBNTcwNTJENDM2Qk
Note down the value of KAFKA_CLUSTER_ID and copy it to each node in /etc/profile.d/kafka.sh with the following content:
export KAFKA_CLUSTER_ID=MkU3OEVBNTcwNTJENDM2Qk
Load the environment variables to the current shell with the following command:
$ source /etc/profile
Back up the original config directory on each node:
$ cp -a config config.org
Create the log.dirs directories with the following commands on each node:
$ sudo mkdir -p /var/lib/kafka
$ sudo chown kafka:kafka /var/lib/kafka
Update the config/kraft/controller.properties:
# The node id associated with this instance's roles
# !!! on the second node, set the node.id to be 3002.
node.id=3001
# The connect string for the controller quorum
controller.quorum.voters=3001@node-1:9093,3002@node-2:9093
# Use to specify where the metadata log for clusters in KRaft mode is placed.
log.dirs=/var/lib/kafka/controller
Each node ID (node.id) must be unique across all the servers in a particular cluster.
Update the config/kraft/broker.properties:
# The node id associated with this instance's roles
# !!! on the second node, set the node.id to be 1002.
node.id=1001
# The connect string for the controller quorum
controller.quorum.voters=3001@node-1:9093,3002@node-2:9093
# The address the socket server listens on.
listeners=PLAINTEXT://:9092
# Listener name, hostname and port the broker will advertise to clients.
# !!! on the second node, set it to be `PLAINTEXT://node-2:9092`.
advertised.listeners=PLAINTEXT://node-1:9092
# The directory in which the log data is kept.
log.dirs=/var/lib/kafka/data
Each node ID (node.id) must be unique across all the servers in a particular cluster.
The advertised.listeners should be reachable by clients outside the cluster. You can set it to a reachable hostname or DNS name, or an external IP address. [7]
Format Log Directories:
$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/controller.properties
Formatting /var/lib/kafka/controller with metadata.version 3.6-IV2.
$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/broker.properties
Formatting /var/lib/kafka/data with metadata.version 3.6-IV2.
Start the Kafka Controller and Broker on each node:
$ bin/kafka-server-start.sh -daemon config/kraft/controller.properties
$ bin/kafka-server-start.sh -daemon config/kraft/broker.properties
Note that authentication is disabled for JMX by default in Kafka, and security configs must be overridden for production deployments.
Use the kafka-metadata-quorum tool to query the metadata quorum status.
The following code example displays a summary of the metadata quorum:
$ bin/kafka-metadata-quorum.sh --bootstrap-server node-1:9092 describe --status
ClusterId: MkU3OEVBNTcwNTJENDM2Qg
LeaderId: 3002
LeaderEpoch: 83
HighWatermark: 779
MaxFollowerLag: 0
MaxFollowerLagTimeMs: 408
CurrentVoters: [3001,3002]
CurrentObservers: [1001,1002]
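As a quick smoke test of the two-node cluster (a sketch; the topic name smoke-test is arbitrary), create a replicated topic and describe it:
$ bin/kafka-topics.sh --create --topic smoke-test --partitions 2 --replication-factor 2 --bootstrap-server node-1:9092,node-2:9092
$ bin/kafka-topics.sh --describe --topic smoke-test --bootstrap-server node-1:9092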
Schema Registry provides a centralized repository for managing and validating schemas for topic message data, and for serialization and deserialization of the data over the network. [8] [9]
The Schema Registry is not part of Apache Kafka but there are several open source options to choose from. Here we use the Confluent Schema Registry for this example. [10]
Schema Registry lives outside of and separately from your Kafka brokers. Your producers and consumers still talk to Kafka to publish and read data (messages) to topics. Concurrently, they can also talk to Schema Registry to send and retrieve schemas that describe the data models for the messages. [14]
Schema Registry is a distributed storage layer for schemas which uses Kafka as its underlying storage mechanism. Some key design decisions:
Assigns globally unique ID to each registered schema. Allocated IDs are guaranteed to be monotonically increasing and unique, but not necessarily consecutive.
Kafka provides the durable backend, and functions as a write-ahead changelog for the state of Schema Registry and the schemas it contains.
Schema Registry is designed to be distributed, with single-primary architecture, and ZooKeeper/Kafka coordinates primary election (based on the configuration).
Download Confluent Platform using only Confluent Community components by using the curl command:
$ curl -O https://packages.confluent.io/archive/7.5/confluent-community-7.5.3.tar.gz
Extract the contents of the archive to /home/kafka/confluent:
$ mkdir /home/kafka/confluent
$ tar xf confluent-community-7.5.3.tar.gz -C /home/kafka/confluent/ --strip-components=1
$ cd /home/kafka/confluent
$ cp -a etc/ etc.org
Navigate to the Schema Registry properties file (etc/schema-registry/schema-registry.properties) and specify or update the following properties:
# Specify the address the socket server listens on, e.g. listeners = PLAINTEXT://your.host.name:9092
listeners=http://0.0.0.0:8081
# The advertised host name. Make sure to set this if running Schema Registry with multiple nodes.
host.name=node-1
# List of Kafka brokers to connect to, e.g. PLAINTEXT://hostname:9092,SSL://hostname2:9092
kafkastore.bootstrap.servers=PLAINTEXT://node-1:9092,PLAINTEXT://node-2:9092
Schema Registry on Confluent Platform can be deployed using a single primary source, with either Kafka or ZooKeeper leader election. You can also set up multiple Schema Registry servers for high availability deployments, where you switch to a secondary Schema Registry cluster if the primary goes down, and for data migration, one time or as a continuous feed. [13]
Start Schema Registry. Run this command in its own terminal:
$ bin/schema-registry-start -daemon etc/schema-registry/schema-registry.properties
View the runtime logs of Schema Registry:
$ tail -f logs/schema-registry.log
[2024-01-13 01:58:05,916] INFO DefaultSessionIdManager workerName=node0 (org.eclipse.jetty.server.session)
[2024-01-13 01:58:05,916] INFO No SessionScavenger set, using defaults (org.eclipse.jetty.server.session)
[2024-01-13 01:58:05,918] INFO node0 Scavenging every 600000ms (org.eclipse.jetty.server.session)
[2024-01-13 01:58:06,798] INFO HV000001: Hibernate Validator 6.1.7.Final (org.hibernate.validator.internal.util.Version)
[2024-01-13 01:58:07,291] INFO Started o.e.j.s.ServletContextHandler@53a84ff4{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
[2024-01-13 01:58:07,319] INFO Started o.e.j.s.ServletContextHandler@5807efad{/ws,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
[2024-01-13 01:58:07,349] INFO Started NetworkTrafficServerConnector@65a15628{HTTP/1.1, (http/1.1, h2c)}{0.0.0.0:8081} (org.eclipse.jetty.server.AbstractConnector)
[2024-01-13 01:58:07,354] INFO Started @9485ms (org.eclipse.jetty.server.Server)
[2024-01-13 01:58:07,355] INFO Schema Registry version: 7.5.3 commitId: 03b675da443c5687684ecae6736d873560f7c441 (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain)
[2024-01-13 01:58:07,356] INFO Server started, listening for requests... (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain)
Show the _schemas topic information:
$ bin/kafka-topics.sh --describe --topic _schemas --bootstrap-server node-1:9092
Topic: _schemas TopicId: 9A_-36hMRYuTfUyhQwMm6Q PartitionCount: 1 ReplicationFactor: 2 Configs: cleanup.policy=compact,segment.bytes=1073741824
Topic: _schemas Partition: 0 Leader: 1001 Replicas: 1001,1002 Isr: 1001,1002
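To verify the registry end to end, you can register a trivial schema and list the subjects through its REST API (a minimal sketch; the subject name quickstart-value is arbitrary):
$ curl -s -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\": \"string\"}"}' http://node-1:8081/subjects/quickstart-value/versions
$ curl -s http://node-1:8081/subjects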
UI for Apache Kafka is a free, open-source web UI to monitor and manage Apache Kafka clusters. [15]
$ docker run -it -p 8080:8080 -e DYNAMIC_CONFIG_ENABLED=true provectuslabs/kafka-ui
Make sure the nodes in the cluster can reach each other. You can use hostnames, DNS names, or external IP addresses to connect them.
You can run ip a s to show the addresses assigned to all network interfaces.
The following steps will be demonstrated with the following two nodes:
192.168.56.131 node-1
192.168.56.132 node-2
Optional: Install Docker Engine
See https://docs.docker.com/engine/install/ to install Docker Engine.
You might need to configure the Docker daemon to use a different data directory (by default: /var/lib/docker on Linux) and the log driver options.
Optional: Stop and disable the firewalld.service.
View the current status:
sudo firewall-cmd --state
Stop the FirewallD service:
sudo systemctl stop firewalld.service
List the rules:
$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Disable the FirewallD service
sudo systemctl disable firewalld.service
Optional: Generate a Cluster UUID:
$ KAFKA_CLUSTER_ID="$(docker run --rm confluentinc/cp-kafka:7.5.3 kafka-storage random-uuid)"
$ echo $KAFKA_CLUSTER_ID
MkU3OEVBNTcwNTJENDM2Qg
Copy the docker/ directory to all the nodes in the Kafka cluster:
Start the controllers:
On node-1:
Update the compose.override.yml in docker/controller/compose.override.yml:
version: "2.4"
services:
controller:
environment:
KAFKA_NODE_ID: 3001
KAFKA_CONTROLLER_QUORUM_VOTERS: '3001@node-1:9093,3002@node-2:9093'
CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qg'
extra_hosts:
- "node-1:192.168.56.131"
- "node-2:192.168.56.132"
Update the CLUSTER_ID with the KAFKA_CLUSTER_ID that was generated in the step above.
Each node ID (KAFKA_NODE_ID) must be unique across all the nodes in a particular cluster.
Start the Kraft controller:
cd docker/controller
docker compose up -d
On node-2:
Repeat the above steps and update the KAFKA_NODE_ID with 3002.
Start the brokers:
On node-1:
Update the compose.override.yml in docker/broker/compose.override.yml:
version: "2.4"
services:
broker:
environment:
KAFKA_NODE_ID: 1001
KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://node-1:9092'
KAFKA_CONTROLLER_QUORUM_VOTERS: '3001@node-1:9093,3002@node-2:9093'
CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qg'
extra_hosts:
- "node-1:192.168.56.131"
- "node-2:192.168.56.132"
Update the CLUSTER_ID with the KAFKA_CLUSTER_ID that was generated at step 2.
Each node ID (KAFKA_NODE_ID) must be unique across all the nodes in a particular cluster.
The KAFKA_ADVERTISED_LISTENERS should be reachable by clients outside the cluster. You can set it to a reachable hostname or DNS name, or an external IP address.
Start the broker:
cd docker/broker
docker compose up -d
Use kcat to display the current state of the Kafka cluster and its topics, partitions, replicas, and in-sync replicas (ISR).
$ docker run --rm --add-host node-1:192.168.56.131 confluentinc/cp-kcat:7.5.3 -b node-1:9092 -L
Metadata for all topics (from broker -1: node-1:9092/bootstrap):
1 brokers:
broker 1001 at node-1:9092 (controller)
0 topics:
Use the kafka-metadata-quorum tool to view the metadata quorum status.
$ docker run --rm --add-host node-1:192.168.56.131 confluentinc/cp-kafka:7.5.3 kafka-metadata-quorum --bootstrap-server node-1:9092 describe --status
ClusterId: MkU3OEVBNTcwNTJENDM2Qg
LeaderId: 3002
LeaderEpoch: 28
HighWatermark: 47816
MaxFollowerLag: 0
MaxFollowerLagTimeMs: 32
CurrentVoters: [3001,3002]
CurrentObservers: [1001]
On node-2:
Repeat the above steps and update the KAFKA_NODE_ID with 1002, and KAFKA_ADVERTISED_LISTENERS with 'PLAINTEXT://node-2:9092'.
Use kcat to display the current state of the Kafka cluster and its topics, partitions, replicas, and in-sync replicas (ISR).
$ docker run --rm --add-host node-2:192.168.56.132 confluentinc/cp-kcat:7.5.3 -b node-2:9092 -L
Metadata for all topics (from broker 1002: node-2:9092/1002):
2 brokers:
broker 1001 at node-2:9092
broker 1002 at node-2:9092 (controller)
0 topics:
Use the kafka-metadata-quorum tool to view the metadata quorum status.
$ docker run --rm --add-host node-2:192.168.56.132 confluentinc/cp-kafka:7.5.3 kafka-metadata-quorum --bootstrap-server node-2:9092 describe --status
ClusterId: MkU3OEVBNTcwNTJENDM2Qg
LeaderId: 3002
LeaderEpoch: 28
HighWatermark: 47816
MaxFollowerLag: 0
MaxFollowerLagTimeMs: 32
CurrentVoters: [3001,3002]
CurrentObservers: [1001,1002]
Start the Schema Registry:
On node-1:
Update the compose.override.yml in docker/schema-registry/compose.override.yml:
version: "2.4"
services:
schema-registry:
environment:
SCHEMA_REGISTRY_HOST_NAME: node-1
SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: node-1:9092,node-2:9092
SCHEMA_REGISTRY_DEBUG: true
extra_hosts:
- "node-1:192.168.56.131"
- "node-2:192.168.56.132"
Start the Schema Registry:
cd docker/schema-registry
docker compose up -d
On node-2:
Repeat the above steps, and replace the SCHEMA_REGISTRY_HOST_NAME with node-2 to set up a replica if you need a highly available service.
Start the Kafka UI:
On node-1:
Update the compose.override.yml in docker/kafka-ui/compose.override.yml:
version: "2.4"
services:
kafka-ui:
environment:
KAFKA_CLUSTERS_0_NAME: iot
KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: node-1:9092,node-2:9092
extra_hosts:
- "node-1:192.168.56.131"
- "node-2:192.168.56.132"
Start the kafka-ui:
cd docker/kafka-ui
docker compose up -d
Go to http://node-1:8080 with your browser to view the cluster status.
On node-2:
Repeat the above steps to set up a replica of the kafka-ui if you need a highly available service.
Go to http://node-2:8080 with your browser to view the cluster status.
[2] https://docs.confluent.io/platform/current/kafka-metadata/kraft.html
[3] https://www.redhat.com/en/resources/high-availability-for-apache-kafka-detail
[6] https://access.redhat.com/documentation/en-us/red_hat_amq_streams/2.5/html/using_amq_streams_on_rhel/monitoring-str
[7] https://www.confluent.io/blog/kafka-listeners-explained/
[8] https://docs.confluent.io/platform/current/schema-registry/index.html
[9] https://www.conduktor.io/blog/what-is-the-schema-registry-and-why-do-you-need-to-use-it/
[10] "20170707-EB-Confluent_Kafka_Definitive-Guide_Complete", https://www.confluent.io/resources/kafka-the-definitive-guide/
[11] https://docs.confluent.io/platform/current/installation/installing_cp/zip-tar.html
[12] https://docs.confluent.io/platform/current/schema-registry/installation/deployment.html
[13] https://docs.confluent.io/platform/current/schema-registry/multidc.html
[14] https://docs.confluent.io/platform/current/schema-registry/fundamentals/index.html
[15] https://docs.kafka-ui.provectus.io/overview/getting-started
[16] https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html
[19] https://docs.confluent.io/platform/current/installation/docker/installation.html
How do you produce and consume events on Kafka with Spring Cloud Stream?
Apache Kafka is a highly popular option for data streaming. Spring Cloud Stream is a framework built upon Spring Boot for building message-driven microservice applications and it provides built-in capabilities to work with Apache Kafka as the underlying message broker.
Here’s a basic guide on how to produce and consume events on Kafka using Spring Cloud Stream:
Set Up Kafka: First you need to have a working Kafka instance. You can either install it on your machine or run it in a docker container.
Create Spring Boot Application: You can create a Spring Boot application using Spring Initializr.
Add Required Dependencies:
You need to have the following dependencies in your pom.xml or build.gradle file.
For Maven:
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-stream-kafka</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-stream</artifactId>
</dependency>
</dependencies>
For Gradle:
dependencies {
implementation 'org.springframework.cloud:spring-cloud-starter-stream-kafka'
implementation 'org.springframework.cloud:spring-cloud-stream'
}
Configure the Application:
In your application.properties or application.yml file, set the Kafka broker address, the name of the topic, and other parameters if necessary.
spring:
cloud:
stream:
bindings:
output:
destination: topic-name
binder: kafka
kafka:
binder:
brokers: localhost:9092
Create a Source:
A source is a channel through which data flows out to the Kafka broker. Annotate an output channel with @Output.
public interface ProducerBinding {
String OUTPUT = "output";
@Output(OUTPUT)
MessageChannel output();
}
Publish Messages:
Inject the source in your service, create a message, and send it using output().send(MessageBuilder.withPayload(yourMessage).build()).
@Autowired
private ProducerBinding source;
public void publish(String message){
source.output().send(MessageBuilder.withPayload(message).build());
}
Create a Sink:
A sink is a channel through which data flows into the application from Kafka. Annotate an input channel with @Input.
public interface ConsumerBinding {
String INPUT = "input";
@Input(INPUT)
SubscribableChannel input();
}
Consume Messages:
Create a consumer class where you listen to the messages. Annotate a method with @StreamListener to receive messages.
@Service
public class ConsumerService {
@StreamListener(ConsumerBinding.INPUT)
public void handle(String value) {
// Process the message
System.out.println("Received: " + value);
}
}
Run the Application: Now, you can run your Spring Boot application. It should be able to send and receive the messages from Kafka.
Remember, this is just a basic guide that covers the core concepts. There are more configurations and details that you can explore as per your use case. Make sure to refer to the official documentation for both Spring Cloud Stream and Apache Kafka.
What are bindings and binders?
Both “bindings” and “binder” are terms used within the Spring Cloud Stream (SCS) framework, and they’re central to understanding how Spring Cloud Stream operates.
Bindings: In Spring Cloud Stream, a binding is a bridge between your application’s message channels and the message broker (like Apache Kafka or RabbitMQ). They form the core abstraction in Spring Cloud Stream programming model.
The concept of ‘binding’ is about connecting your application with the messaging system via a Message Channel (a key abstraction in Spring Integration, where ‘input’ or ‘output’ represent an inbound or outbound entity).
This wiring is auto-created by Spring Cloud Stream. In simple terms, SCS provides connectivity between your application and messaging middleware, and this connectivity is facilitated by ‘bindings’.
For example, if you define a binding named “output” as in the previous example, Spring Cloud Stream will know that a channel named “output” should be used to dispatch the messages to the messaging middleware.
Binder: The Binder is the component of Spring Cloud Stream responsible for providing the necessary configuration and implementation for the connection to the messaging system.
Binders exist for various message brokers, including but not limited to RabbitMQ, Apache Kafka, Amazon Kinesis, etc. The binder brings the bits that are needed to connect the application to the specific message broker and acts as an adapter layer between your application and the messaging system.
When Spring Cloud Stream receives a message that needs to be published, it’s the binder which takes care of how to connect to the broker, how to serialize the message, where to publish it (topics, routing key, etc.), and so on. It also deals with how to read and convert incoming messages before sending them to your application.
Through this system of bindings and binders, Spring Cloud Stream abstracts away the specifics of the underlying message brokers, allowing developers to focus on writing the business logic.
The application.yml file is a configuration file used by Spring Boot applications. It’s a space-sensitive file in YAML format that Spring Boot reads at launch to configure properties of the application.
In the example you referenced:
spring:
cloud:
stream:
bindings:
output:
destination: topic-name
binder: kafka
kafka:
binder:
brokers: localhost:9092
Here’s the explanation for this:
spring: This is the root configuration prefix for all Spring Boot applications.
cloud: This is the specific configuration prefix for all Spring Cloud applications.
stream: This is the specific configuration prefix from the Spring Cloud Stream project. It’s used to configure the Spring Cloud Stream properties.
bindings: These are the definitions of your input and output channels. In your case, you’re defining an output channel.
output: This is the name of your output channel. It corresponds to the OUTPUT constant value (‘output’) defined in your code.
destination: This is the name of the Kafka topic where the events will be produced. Messages sent to the ‘output’ channel will end up in this topic on the Kafka broker.
binder: This defines the binder configuration that this binding should be created from. This is optional; if not present, the default binder configuration will be used.
kafka: This is for Kafka-specific configuration.
binder: It holds properties for the Kafka binder.
brokers: This is the hostname and port for your Kafka broker(s). The application will connect to Kafka using these details.
Remember that YAML files are whitespace and indentation sensitive. Misalignment may lead to an error, or properties may not be assigned correctly.
What does the default binder mean?
In Spring Cloud Stream, the binder is the piece that ties your application to the specific message broker (like RabbitMQ, Kafka, Google Pub-Sub, etc.). It does this by providing the necessary configuration and implementation of the broker-specific code.
When you define the ‘binder’ property for a specific binding, you’re telling Spring Cloud Stream to use that as the binder (and hence the message broker) for that binding.
However, when you don’t specify a ‘binder’ property for a binding, Spring Cloud Stream will use the ‘defaultBinder’ that’s defined in your application configurations.
For example, consider a scenario where your application connects to two different message brokers - Kafka and RabbitMQ. In your configurations, you can define both Kafka and RabbitMQ binders, but also specify the Kafka binder as the ‘defaultBinder’.
In the application.yaml file:
spring:
cloud:
stream:
defaultBinder: kafka
kafka:
binder:
brokers: localhost:9092
rabbitmq:
binder:
address: localhost
...
Any binding definitions that don’t explicitly set the ‘binder’ attribute will automatically use the Kafka binder because that’s defined as the ‘defaultBinder’. However, if you want a particular channel to use RabbitMQ instead, you can achieve that by explicitly stating so in the ‘binder’ attribute for that specific channel.
spring:
cloud:
stream:
bindings:
myOutputChannel:
destination: mytopic
myRabbitChannel:
destination: myqueue
binder: rabbitmq
In this example, myOutputChannel will use Kafka as the binder (as Kafka is defined as the default binder), and myRabbitChannel will use RabbitMQ as the binder (as it’s explicitly defined for this binding).
What if there is only one binder, like Kafka? Do we need to specify the defaultBinder explicitly?
No, if you only have one binder (like Kafka) in your application, you don’t need to specify it as the ‘defaultBinder’ explicitly.
Spring Cloud Stream will automatically use the only available binder for your application. If you have only one binder configuration in your classpath, then Spring Cloud Stream smartly understands that to be the default binder for your application.
Here is an example configuration where no default binder is specified, and there is only one binder, Kafka:
spring:
cloud:
stream:
kafka:
binder:
brokers: localhost:9092
bindings:
output:
destination: topic-name
In this case, the spring.cloud.stream.kafka.binder.brokers property maps to the Kafka broker, and spring.cloud.stream.bindings.output.destination specifies the name of the topic on the Kafka broker to which the messages will be published.
Your application will use Kafka as the binder since it’s the only binder available in the setup, and you don’t need to specify it as a default binder explicitly.
The sample code you showed uses MessageChannel, etc.; nowadays SCS uses Spring Cloud Function types like Supplier and Consumer.
You’re correct. The Spring Cloud Stream framework has moved toward a more functional programming model, using java.util.function’s Supplier, Consumer, and Function for message handling. This is especially recommended for Spring Cloud Stream 3.0 and above. Here’s a quick sketch of how to use them with your Apache Kafka setup:
Producer:
Firstly, define a Supplier bean for producing a message:
@Bean
public Supplier<String> produce() {
return () -> {
// generate message content
String message = "Hello, Kafka!";
return message;
};
}
In your application.yml, configure the output stream for the Supplier:
spring:
cloud:
function:
definition: produce
stream:
bindings:
produce-out-0:
destination: topic-name
kafka:
binder:
brokers: localhost:9092
Consumer:
Next, declare a Consumer to consume messages from a broker:
@Bean
public Consumer<String> consume() {
return message -> {
// process message content
System.out.println("Received: " + message);
};
}
In your application.yml, configure the input stream for the Consumer:
spring:
cloud:
function:
definition: consume
stream:
bindings:
consume-in-0:
destination: topic-name
kafka:
binder:
brokers: localhost:9092
By convention, SCS uses function names as the binding names: ‘produce-out-0’ and ‘consume-in-0’. ‘0’ is the index in case you have multiple inputs or outputs. But you can change these via Spring Cloud Stream’s ‘spring.cloud.stream.function.bindings.*’ property.
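For instance, a minimal sketch of renaming a generated binding so the rest of the configuration can refer to it as plain output (the alias name is arbitrary):
spring:
  cloud:
    stream:
      function:
        bindings:
          produce-out-0: output # alias the generated binding name
      bindings:
        output:
          destination: topic-name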
In the latest functional programming model, you don’t explicitly deal with message channels or use annotations such as @Input
or @Output
.
The method’s signature implies whether the application is intended to be used as a source, sink, or processor.
Remember, these examples are just basic scenarios and more complicated scenarios like content-type negotiation or error handling in Apache Kafka with Spring Cloud Stream may need additional steps.
How does the produce function work? I mean how it is triggered to send messages?
Great question! The Supplier function in Spring Cloud Stream is used to produce or generate messages that need to be sent to the message broker system, like Apache Kafka or RabbitMQ.
However, a Supplier
in Spring Cloud Stream is not called or triggered directly. Instead, the framework does it automatically for you at regular intervals. This interval is by default set at 1 message/second, but it can be easily configured using the property spring.cloud.stream.poller.fixed-delay
.
Here’s an example of how to adjust the poller’s interval:
spring:
cloud:
stream:
poller:
fixed-delay: 5000
This configuration will trigger the Supplier
function approximately every 5 seconds (value is in milliseconds) to generate a new message.
Moreover, when the application starts up, Spring Cloud Stream automatically sets up a background loop that calls the Supplier.get()
method at the specified rate (or the default rate, if no rate is specified) and sends its output to the specified message broker destination.
Note: The Supplier function is generally used in source applications where you are producing messages and not consuming any.
What if we want to send a message manually, like when we receive a user login event on a Rest Controller?
Indeed, the Supplier function is automatically triggered at regular intervals which is not an ideal scenario when you want to produce a message based on a certain event - like a user login event, as in your example.
In such a case, you typically use a MessageChannel
to manually send messages. However, with the functional programming model, you could use a StreamBridge
. The StreamBridge
is a utility that is automatically available in the application context; you can autowire it and use it anywhere in your application to dynamically send messages to a destination.
Here’s an example of how to use StreamBridge
:
@RestController
public class LoginController {
private final StreamBridge streamBridge;
@Autowired
public LoginController(StreamBridge streamBridge) {
this.streamBridge = streamBridge;
}
@PostMapping("/login")
public String loginUser(@RequestBody User user) {
// validate and login user, generate response
// ...
// send a message when a user logs in
streamBridge.send("produce-out-0", user);
return "Login Successful!";
}
}
In this example, each time the “/login” endpoint is hit, a User
object would be sent to the output channel “produce-out-0”.
Remember to configure the output binding in your application.yml
appropriately:
spring:
cloud:
stream:
bindings:
produce-out-0:
destination: topic-name
kafka:
binder:
brokers: localhost:9092
The StreamBridge
utility provides a flexible way to produce messages from anywhere within your application, making it a great choice when you need to manually send messages in a Spring Cloud Stream application.
What if there are multiple functions, each used to handle a different topic?
If you have multiple functions corresponding to different topics (destinations), you can easily configure them in Spring Cloud Stream.
Each java.util.function.Function
, java.util.function.Consumer
or java.util.function.Supplier
bean can correspond to a different binder destination.
Here’s an example:
Producer Function Beans:
@Bean
public Supplier<String> produceUserEvent() {
return () -> {
// generate user event message content
String message = "User event";
return message;
};
}
@Bean
public Supplier<String> produceProductEvent() {
return () -> {
// generate product event message content
String message = "Product event";
return message;
};
}
Consumer Function Beans:
@Bean
public Consumer<String> consumeUserEvent() {
return message -> {
// process user event message content
System.out.println("Received user event: " + message);
};
}
@Bean
public Consumer<String> consumeProductEvent() {
return message -> {
// process product event message content
System.out.println("Received product event: " + message);
};
}
Configuration in application.yml
:
You can declare the bindings and specific destinations in your application.yml
:
spring:
cloud:
function:
definition: produceUserEvent;consumeUserEvent;produceProductEvent;consumeProductEvent
stream:
bindings:
produceUserEvent-out-0:
destination: user-events
consumeUserEvent-in-0:
destination: user-events
produceProductEvent-out-0:
destination: product-events
consumeProductEvent-in-0:
destination: product-events
kafka:
binder:
brokers: localhost:9092
In this example, produceUserEvent-out-0
and consumeUserEvent-in-0
are used to produce and consume messages for “user events”, while produceProductEvent-out-0
and consumeProductEvent-in-0
are used to produce and consume “product events”. Each -out-0
or -in-0
suffix represents the output or input channel binding for that function.
In the end, each function serves its own topic, producing or consuming a different set of data, with its own separate responsibility.
Got it. It seems that function.definition is used to tell SCS how to identify the handlers?
That’s correct. The spring.cloud.function.definition property in the application configuration can be used to specify the beans - either Supplier, Consumer, or Function - that Spring Cloud Stream should bind to messaging channels.
If you have multiple function beans in your Spring application context, but you only want to use some of them for messaging purposes, this is where spring.cloud.function.definition
comes in handy.
The property takes a semi-colon separated list of the bean names to be bound to the messaging system.
For example:
spring:
cloud:
function:
definition: myFunction;myConsumer;mySupplier
In this example, myFunction
, myConsumer
, mySupplier
are the names of Function
, Consumer
, Supplier
beans respectively. Spring Cloud Stream will bind only these beans to messaging channels according to the output/input bindings specified elsewhere in your properties.
It’s important to remember that the function definition should have the same name as the function bean in the application. If your function bean is defined as follows:
@Bean
public Consumer<String> myConsumer() {
return data -> { /*...*/ };
}
You should use myConsumer
in the spring.cloud.function.definition
.
How to compose functions to handle a topic?
To compose functions, you use the pipe symbol |
, similar to how you would in a Unix/Linux command line. Here’s an example:
@Bean
public Function<String, String> firstFunction() {
return input -> input.toUpperCase();
}
@Bean
public Function<String, String> secondFunction() {
return input -> "Message: " + input;
}
And in your application configuration file:
spring:
cloud:
function:
definition: firstFunction|secondFunction
The firstFunction
will be applied first, and then its output will be used as the input to the secondFunction
. This way, when a message comes in, it’ll be first converted to uppercase by firstFunction
, and then secondFunction
will prepend “Message: ” to it.
Keep in mind, when composing functions: composing two or more consumers (Consumer<A> | Consumer<B>) or two or more suppliers (Supplier<A> | Supplier<B>) is not valid per the Spring Cloud Stream documentation, because a supplier does not consume any input and a consumer does not produce any output.
How are composite functions bound to a destination?
The composed (composite) functions can be bound to a destination (topic) through the application configuration file (like application.yml) in Spring Cloud Stream.
The basic procedure is to define the destination in the spring.cloud.stream.bindings.<functionName>-in-0.destination
for input bindings and spring.cloud.stream.bindings.<functionName>-out-0.destination
for output bindings, where <functionName>
is the composed function name.
For example, consider two functions: uppercase
and prefix
which you have composed as processFunction
:
@Bean
public Function<String, String> uppercase() {
return String::toUpperCase;
}
@Bean
public Function<String, String> prefix() {
return input -> "Hello " + input;
}
You define the composite function in your spring.cloud.function.definition
like this:
spring:
cloud:
function:
definition: processFunction
To bind the composite function to a topic, define the input and output bindings:
spring:
cloud:
stream:
bindings:
processFunction-in-0:
destination: input-topic
processFunction-out-0:
destination: output-topic
In this setup, messages coming in from input-topic
will trigger the composite function processFunction
(which consists of uppercase
and prefix
) and the results will be sent to output-topic
.
Remember, the bindings
property should match the function name. So if your composed function is called processFunction
, you should have your bindings as processFunction-in-0
and processFunction-out-0
.
I didn’t see any configuration about the relation between processFunction and the uppercase and prefix functions.
I apologize for the confusion; I should have clarified that in the explanation. The processFunction was intended as a placeholder name in the bindings configuration for the composed function (uppercase | prefix).
The binding happens based on the spring.cloud.function.definition property. The composed function is defined as the concatenation of the beans you want to compose (uppercase | prefix, in this case) in that property.
Here’s how you can specify uppercase
and prefix
functions and bind them to input and output channels:
@Bean
public Function<String, String> uppercase() {
return String::toUpperCase;
}
@Bean
public Function<String, String> prefix() {
return input -> "Hello " + input;
}
In your application.yml
, you compose them together:
spring:
cloud:
function:
definition: uppercase|prefix
stream:
bindings:
uppercase|prefix-in-0:
destination: input-topic
uppercase|prefix-out-0:
destination: output-topic
In this setup, when a message comes in from the input-topic
, it is first processed by the uppercase
function, and then its output is processed by the prefix
function, and then the final output is sent to output-topic
.
The use of |
(pipe) character enables function composition, where the output of one function becomes the input of the next one in line.
Generics let you tailor a method, class, structure, or interface to the precise data type it acts upon. For example, instead of using the Hashtable
class, which allows keys and values to be of any type, you can use the Dictionary<TKey,TValue>
generic class and specify the types allowed for the key and the value. Among the benefits of generics are increased code reusability and type safety. [1]
Generics are classes, structures, interfaces, and methods that have placeholders (type parameters) for one or more of the types that they store or use. A generic collection class might use a type parameter as a placeholder for the type of objects that it stores. The type parameters appear as the types of its fields and the parameter types of its methods. A generic method might use its type parameter as the type of its return value or as the type of one of its formal parameters.
The following code illustrates a simple generic class definition.
public class SimpleGenericClass<T>
{
public T Field;
}
When you create an instance of a generic class, you specify the actual types to substitute for the type parameters. This establishes a new generic class, referred to as a constructed generic class, with your chosen types substituted everywhere that the type parameters appear. The result is a type-safe class that is tailored to your choice of types, as the following code illustrates.
public static void Main()
{
SimpleGenericClass<string> g = new SimpleGenericClass<string>();
g.Field = "A string";
//...
Console.WriteLine("SimpleGenericClass.Field = \"{0}\"", g.Field);
Console.WriteLine("SimpleGenericClass.Field.GetType() = {0}", g.Field.GetType().FullName);
}
The following terms are used to discuss generics in .NET:
A generic type definition is a class, structure, or interface declaration that functions as a template, with placeholders for the types that it can contain or use. For example, the System.Collections.Generic.Dictionary<TKey,TValue>
class can contain two types: keys and values. Because a generic type definition is only a template, you cannot create instances of a class, structure, or interface that is a generic type definition.
Generic type parameters, or type parameters, are the placeholders in a generic type or method definition. The System.Collections.Generic.Dictionary<TKey,TValue>
generic type has two type parameters, TKey
and TValue
, that represent the types of its keys and values.
A constructed generic type, or constructed type, is the result of specifying types for the generic type parameters of a generic type definition.
A generic type argument is any type that is substituted for a generic type parameter.
The general term generic type includes both constructed types and generic type definitions.
Covariance and contravariance of generic type parameters enable you to use constructed generic types whose type arguments are more derived (covariance) or less derived (contravariance) than a target constructed type. Covariance and contravariance are collectively referred to as variance.
Constraints are limits placed on generic type parameters. For example, you might limit a type parameter to types that implement the System.Collections.Generic.IComparer<T>
generic interface, to ensure that instances of the type can be ordered. You can also constrain type parameters to types that have a particular base class, that have a parameterless constructor, or that are reference types or value types. Users of the generic type cannot substitute type arguments that do not satisfy the constraints.
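For instance, a minimal sketch of constraint declarations (the OrderedBag and Factory type names are illustrative, not from the cited docs):
// T must implement IComparer<T>, so instances of T can order other instances.
public class OrderedBag<T> where T : IComparer<T>
{
    public T? Comparer;
}
// U must be a reference type that has a parameterless constructor.
public class Factory<U> where U : class, new()
{
    public U Create() => new U();
}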
A generic method definition is a method with two parameter lists: a list of generic type parameters and a list of formal parameters. Type parameters can appear as the return type or as the types of the formal parameters, as the following code shows.
T MyGenericMethod<T>(T arg)
{
T temp = arg;
//...
return temp;
}
Generic methods can appear on generic or nongeneric types. It’s important to note that a method is not generic just because it belongs to a generic type, or even because it has formal parameters whose types are the generic parameters of the enclosing type. A method is generic only if it has its own list of type parameters. In the following code, only method G is generic.
class A
{
T G<T>(T arg)
{
T temp = arg;
//...
return temp;
}
}
class MyGenericClass<T>
{
T M(T arg)
{
T temp = arg;
//...
return temp;
}
}
There are many advantages to using generic collections and delegates:
Type safety. Generics shift the burden of type safety from you to the compiler. There is no need to write code to test for the correct data type because it is enforced at compile time. The need for type casting and the possibility of run-time errors are reduced.
Less code, and code is more easily reused. There is no need to inherit from a base type and override members. For example, the LinkedList<T>
class is ready for immediate use: you can create a linked list of strings with the following variable declaration:
LinkedList<string> llist = new LinkedList<string>();
Better performance. Generic collection types generally perform better for storing and manipulating value types because there is no need to box the value types.
Boxing is the process of converting a value type to the type object or to any interface type implemented by this value type. When the common language runtime (CLR) boxes a value type, it wraps the value inside a
System.Object
instance and stores it on the managed heap. Unboxing extracts the value type from the object. Boxing is implicit; unboxing is explicit. The concept of boxing and unboxing underlies the C# unified view of the type system in which a value of any type can be treated as an object.
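A small sketch contrasting boxing with a generic collection (a console context is assumed):
int n = 42;
object boxed = n;          // boxing: the int is copied into a new heap object
int unboxed = (int)boxed;  // unboxing: explicit cast extracts the value
List<int> numbers = new List<int> { 1, 2, 3 };
int first = numbers[0];    // no boxing: the values are stored directly as int
Console.WriteLine($"{unboxed} {first}");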
Generic delegates enable type-safe callbacks without the need to create multiple delegate classes. For example, the Predicate<T>
generic delegate allows you to create a method that implements your own search criteria for a particular type and to use your method with methods of the Array
type such as Find
, FindLast
, and FindAll
.
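For instance, a brief sketch of Predicate<T> with the Array search methods named above:
int[] values = { 3, 8, 15, 4, 42 };
Predicate<int> isEven = n => n % 2 == 0;
Console.WriteLine(Array.Find(values, isEven));            // 8 (first match)
Console.WriteLine(Array.FindLast(values, isEven));        // 42 (last match)
Console.WriteLine(Array.FindAll(values, isEven).Length);  // 3 (all matches)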
Generics streamline dynamically generated code. When you use generics with dynamically generated code you do not need to generate the type. This increases the number of scenarios in which you can use lightweight dynamic methods instead of generating entire assemblies.
The following are some limitations of generics:
Generic types can be derived from most base classes, such as MarshalByRefObject
(and constraints can be used to require that generic type parameters derive from base classes like MarshalByRefObject
). However, .NET does not support context-bound generic types. A generic type can be derived from ContextBoundObject
, but trying to create an instance of that type causes a TypeLoadException
.
Enumerations cannot have generic type parameters. An enumeration can be generic only incidentally (for example, because it is nested in a generic type that is defined using Visual Basic, C#, or C++).
Lightweight dynamic methods cannot be generic.
In Visual Basic, C#, and C++, a nested type that is enclosed in a generic type cannot be instantiated unless types have been assigned to the type parameters of all enclosing types. Another way of saying this is that in reflection, a nested type that is defined using these languages includes the type parameters of all its enclosing types. This allows the type parameters of enclosing types to be used in the member definitions of a nested type.
Liskov’s notion of a behavioural subtype defines a notion of substitutability for objects; that is, if S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program (e.g. correctness). [3]
Liskov substitution principle imposes some standard requirements on signatures that have been adopted in newer object-oriented programming languages (usually at the level of classes rather than types):
Contravariance of method parameter types in the subtype.
Covariance of method return types in the subtype.
New exceptions cannot be thrown by the methods in the subtype, except if they are subtypes of exceptions thrown by the methods of the supertype.
Covariance and contravariance are terms that refer to the ability to use a more derived type (more specific) or a less derived type (less specific) than originally specified. Generic type parameters support covariance and contravariance to provide greater flexibility in assigning and using generic types. [2]
When you’re referring to a type system, covariance, contravariance, and invariance have the following definitions. The examples assume a base class named Base
and a derived class named Derived
.
Covariance
Enables you to use a more derived type than originally specified.
You can assign an instance of IEnumerable<Derived>
to a variable of type IEnumerable<Base>
.
Contravariance
Enables you to use a more generic (less derived) type than originally specified.
You can assign an instance of Action<Base>
to a variable of type Action<Derived>
.
Invariance
Means that you can use only the type originally specified. An invariant generic type parameter is neither covariant nor contravariant.
You cannot assign an instance of List<Base>
to a variable of type List<Derived>
or vice versa.
Covariant type parameters enable you to make assignments that look much like ordinary polymorphism, as shown in the following code.
IEnumerable<Derived> d = new List<Derived>();
IEnumerable<Base> b = d;
Contravariance, on the other hand, seems counterintuitive.
Action<Base> b = (target) => { Console.WriteLine(target.GetType().Name); };
Action<Derived> d = b;
d(new Derived());
In general, a covariant type parameter can be used as the return type of a delegate, and contravariant type parameters can be used as parameter types. For an interface, covariant type parameters can be used as the return types of the interface’s methods, and contravariant type parameters can be used as the parameter types of the interface’s methods.
Covariance and contravariance are collectively referred to as variance. A generic type parameter that is not marked covariant or contravariant is referred to as invariant. A brief summary of facts about variance in the common language runtime:
Variant type parameters are restricted to generic interface and generic delegate types.
A generic interface or generic delegate type can have both covariant and contravariant type parameters.
Variance applies only to reference types; if you specify a value type for a variant type parameter, that type parameter is invariant for the resulting constructed type.
Variance does not apply to delegate combination. That is, given two delegates of types Action<Derived>
and Action<Base>
(Action(Of Derived)
and Action(Of Base)
in Visual Basic), you cannot combine the second delegate with the first although the result would be type safe. Variance allows the second delegate to be assigned to a variable of type Action<Derived>
, but delegates can combine only if their types match exactly.
Starting in C# 9, covariant return types are supported. An overriding method can declare a more derived return type than the method it overrides, and an overriding, read-only property can declare a more derived type.
abstract class Animal
{
public abstract Food GetFood();
...
}
class Tiger : Animal
{
public override Meat GetFood() => ...;
}
A covariant type parameter is marked with the out
keyword (Out
keyword in Visual Basic).
You can use a covariant type parameter as the return value of a method that belongs to an interface, or as the return type of a delegate.
If a method of an interface has a parameter that is a generic delegate type, a covariant type parameter of the interface type can be used to specify a contravariant type parameter of the delegate type.
You cannot use a covariant type parameter as a generic type constraint for interface methods. [4]
interface ICovariant<out R>
{
// The following statement generates a compiler error
// because you can use only contravariant or invariant types
// in generic constraints.
// void DoSomething<T>() where T : R;
}
A contravariant type parameter is marked with the in
keyword (In
keyword in Visual Basic).
You can use a contravariant type parameter as the type of a parameter of a method that belongs to an interface, or as the type of a parameter of a delegate.
You can use a contravariant type parameter as a generic type constraint for an interface method.
interface IContravariant<in A>
{
void SetSomething(A sampleArg);
void DoSomething<T>() where T : A;
// The following statement generates a compiler error.
// A GetSomething();
}
An interface or delegate type can have both covariant and contravariant type parameters.
public delegate TResult Func<in T, out TResult>(T arg);
Only interface types and delegate types can have variant type parameters.
When a generic type or method is compiled into Microsoft intermediate language (MSIL), it contains metadata that identifies it as having type parameters. How the MSIL for a generic type is used differs based on whether the supplied type parameter is a value type or reference type. [5]
When a generic type is first constructed with a value type as a parameter, the runtime creates a specialized generic type with the supplied parameter or parameters substituted in the appropriate locations in the MSIL. Specialized generic types are created one time for each unique value type that is used as a parameter.
However, if the generic type is constructed with a different value type as its parameter at another point, the runtime generates another version of the generic type and substitutes the type arguments in the appropriate locations in MSIL. Conversions are no longer necessary because each specialized generic class natively contains the value type.
The first time a generic type is constructed with any reference type, the runtime creates a specialized generic type with object references substituted for the parameters in the MSIL. Then, every time that a constructed type is instantiated with a reference type as its parameter, regardless of what type it is, the runtime reuses the previously created specialized version of the generic type. This is possible because all references are the same size.
Because the number of reference types can vary wildly from program to program, the C# implementation of generics greatly reduces the amount of code by reducing to one the number of specialized classes created by the compiler for generic classes of reference types.
Moreover, when a generic C# class is instantiated by using a value type or reference type parameter, reflection can query it at run time and both its actual type and its type parameter can be ascertained.
The runtime creates specific versions of the generic type based on the actual types used to instantiate it. For example, if you have a List<T> and you create a List<int>, the runtime generates a specialized version of the type for int. When you instantiate the generic type with a reference type, like List<string>, the runtime reuses the single specialized version that is shared by all reference type arguments. However, the .NET CLR maintains type safety by treating these as separate types at the type system level, even though the underlying implementation is the same.
From the point of view of reflection, the difference between a generic type and an ordinary type is that a generic type has associated with it a set of type parameters (if it is a generic type definition) or type arguments (if it is a constructed type). A generic method differs from an ordinary method in the same way. [6]
There are two keys to understanding how reflection handles generic types and methods:
The type parameters of generic type definitions and generic method definitions are represented by instances of the Type class.
If an instance of Type represents a generic type, then it includes an array of types that represent the type parameters (for generic type definitions) or the type arguments (for constructed types). The same is true of an instance of the MethodInfo class that represents a generic method.
A generic type or method is closed if instantiable types have been substituted for all its type parameters, including all the type parameters of all enclosing types. You can only create an instance of a generic type if it is closed.
WriteLine(typeof(Dictionary<,>));
WriteLine(typeof(MySet<>));
WriteLine(typeof(MySet<int>));
MySet<int> mySet = (MySet<int>)typeof(MySet<int>).GetConstructor([])!.Invoke(null);
mySet.AddRange([1, 2, 1, 1, 2, 1, 1, 2, 3, 4]);
mySet.Keys.ToList().ForEach(Console.WriteLine);
static void WriteLine(Type type)
{
string message = string.Format("{0} {1,-31} {2}",
$"IsGenericType: {type.IsGenericType}.",
$"IsGenericTypeDefinition: {type.IsGenericTypeDefinition}.",
$"Open or Closed: {(type.ContainsGenericParameters ? "Open" : "Closed")}.");
Console.WriteLine(message);
}
class MySet<E> : Dictionary<E, bool> where E : notnull
{
public void AddRange(params E[] items) => Array.ForEach(items, Add);
public void Add(E e) { if (ContainsKey(e)) return; Add(e, true); }
}
// $ dotnet run
// IsGenericType: True. IsGenericTypeDefinition: True.  Open or Closed: Open.
// IsGenericType: True. IsGenericTypeDefinition: True.  Open or Closed: Open.
// IsGenericType: True. IsGenericTypeDefinition: False. Open or Closed: Closed.
// 1
// 2
// 3
// 4
> Why does the value type NOT need to be boxed in a generic collection in .NET? * In .NET, boxing is the process of converting a value type to a reference type; this involves creating a new object on the heap and copying the value into the new object. Unboxing is the reverse process. Both can be costly in terms of performance. When using a generic collection, like List<int>, the values are stored directly, without boxing. Generics in .NET use a mechanism called 'reified generics', which means that the system knows the actual types a generic class was instantiated with, even at runtime. It uses this information to directly allocate and deallocate memory for the values in the collection, avoiding the need to box and unbox value types. That’s one of the key benefits of utilizing generics in .NET: they allow you to enforce type safety without incurring the performance cost of boxing and unboxing for value types.
> With reified generics, the memory would still be allocated on the managed heap? * Yes, the memory for any object, including those constructed using generic types, is still allocated on the managed heap in .NET. However, the point of difference lies in how these objects are managed. With reference types, an object is created on the heap and a reference to this object is used to access it. Value types, on the other hand, hold their value directly. When value types are used in a generic collection, a consecutive block of memory is allocated on the heap for the collection to store the actual values directly rather than references to the values. This optimization contributes to both performance and reduced pressure on the garbage collector. The 'reified generics' mechanism keeps the information about the specific type that a generic type was instantiated with, which helps with efficient memory allocation and type safety at runtime.
[1] https://learn.microsoft.com/en-us/dotnet/standard/generics/
[2] https://learn.microsoft.com/en-us/dotnet/standard/generics/covariance-and-contravariance
[3] https://en.wikipedia.org/wiki/Liskov_substitution_principle
[4] https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/covariance-contravariance/creating-variant-generic-interfaces
[5] https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/generics/generics-in-the-run-time
[6] https://learn.microsoft.com/en-us/dotnet/framework/reflection-and-codedom/reflection-and-generic-types
Delegates are reference types that serve a purpose similar to that of function pointers in C++. They are used for event handlers and callback functions in .NET. Unlike function pointers, delegates are secure, verifiable, and type safe. A delegate type can represent any instance method or static method that has a compatible signature. [1]
A parameter of a delegate is compatible with the corresponding parameter of a method if the type of the delegate parameter is more restrictive than the type of the method parameter, because this guarantees that an argument passed to the delegate can be passed safely to the method.
Similarly, the return type of a delegate is compatible with the return type of a method if the return type of the method is more restrictive than the return type of the delegate, because this guarantees that the return value of the method can be cast safely to the return type of the delegate.
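As a hedged sketch of these compatibility rules (the Base, Derived, and Factory names are illustrative):
class Base { }
class Derived : Base { }
delegate Base Factory(Derived arg);
class VarianceDemo
{
    // The method's parameter type (Base) is less restrictive than the delegate's (Derived),
    // and its return type (Derived) is more restrictive than the delegate's (Base).
    static Derived Make(Base arg) => new Derived();
    static void Main()
    {
        Factory f = Make;               // compatible: safe in both directions
        Base result = f(new Derived());
        Console.WriteLine(result.GetType().Name); // Derived
    }
}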
Liskov’s notion of a behavioural subtype defines a notion of substitutability for objects; that is, if S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program (e.g. correctness).
All delegates inherit from System.MulticastDelegate, which inherits from System.Delegate. The C#, Visual Basic, and C++ languages do not allow inheritance from these types. Instead, they provide keywords for declaring delegates.
Because delegates inherit from MulticastDelegate, a delegate has an invocation list, which is a list of methods that the delegate represents and that are executed when the delegate is invoked. All methods in the list receive the arguments supplied when the delegate is invoked.
The return value is not defined for a delegate that has more than one method in its invocation list, even if the delegate has a return type.
In many cases, such as with callback methods, a delegate represents only one method, and the only actions you have to take are creating the delegate and invoking it.
For delegates that represent multiple methods, .NET provides methods of the Delegate and MulticastDelegate delegate classes to support operations such as adding a method to a delegate’s invocation list (the Delegate.Combine
method), removing a method (the Delegate.Remove
method), and getting the invocation list (the Delegate.GetInvocationList
method).
The following example declares a delegate named Callback
that can encapsulate a method that takes a string
as an argument and returns void
: [2]
public delegate void Callback(string message);
A delegate object is normally constructed by providing the name of the method the delegate will wrap, or with a lambda expression [4]. Once a delegate is instantiated in this manner it can be invoked. Invoking a delegate calls the method attached to the delegate instance. The parameters passed to the delegate by the caller are passed to the method, and the return value, if any, from the method is returned to the caller by the delegate. For example:
// Create a method for a delegate.
public static void DelegateMethod(string message)
{
Console.WriteLine(message);
}
// Instantiate the delegate.
Callback handler = DelegateMethod;
// Call the delegate.
handler("Hello World");
A delegate can call more than one method when invoked. This is referred to as multicasting. To add an extra method to the delegate’s list of methods (the invocation list) simply requires adding two delegates using the addition or addition assignment operators ('+' or '+='). For example:
var obj = new MethodClass();
Callback d1 = obj.Method1;
Callback d2 = obj.Method2;
Callback d3 = MethodClass.Method3;
Callback allMethodsDelegate = d1 + d2;
allMethodsDelegate += d3;
allMethodsDelegate -= d2;
Delegate[] delegates = allMethodsDelegate.GetInvocationList();
int invocationCount = delegates.Length;
public class MethodClass
{
public void Method1(string message) => Console.WriteLine($"Method 1: {message}");
public void Method2(string message) => Console.WriteLine($"Method 2: {message}");
public static void Method3(string message) => Console.WriteLine($"Method 3: {message}");
}
In order to streamline the development process, .NET includes a set of delegate types that programmers can reuse and not have to create new types. These types are Func<>
, Action<>
and Predicate<>
, and they can be used without having to define new delegate types.
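A quick sketch of the three built-in delegate types:
Func<int, int> square = x => x * x;          // takes an int, returns an int
Action<string> print = Console.WriteLine;    // takes a string, returns nothing
Predicate<int> isPositive = n => n > 0;      // takes an int, returns a bool
print($"square(5) = {square(5)}");           // square(5) = 25
print($"isPositive(-3) = {isPositive(-3)}"); // isPositive(-3) = False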
.NET Framework 2.0 introduced the concept of anonymous delegates, which let you create "inline" delegates without having to specify any additional type or method.
List<int> nums = new List<int>([1, 5, 3, 2, 0, 4]);
nums.Sort(delegate (int x, int y) { return y - x; });
nums.ForEach(delegate (int num) { Console.Write($"{num} "); });
// 5 4 3 2 1 0
Lambda expressions, or just "lambdas" for short, were introduced in C# 3.0 as one of the core building blocks of Language Integrated Query (LINQ). They are just a more convenient syntax for using delegates. They declare a parameter list and method body, but don’t have a formal identity of their own, unless they are assigned to a delegate. Unlike delegates, they can be directly assigned as the right-hand side of event registration or in various LINQ clauses and methods.
List<int> nums = new List<int>([1, 5, 3, 2, 0, 4]);
nums.Sort((x, y) => y - x);
nums.ForEach(num => Console.Write($"{num} "));
// 5 4 3 2 1 0
> What’s the difference between delegates, lambdas, and anonymous delegates, in essence? * In C#, delegates, lambdas, and anonymous methods are all used to create and work with function pointers; they’re best understood in the context of their introduction and the situations they’re used for.
In essence, under the hood, lambdas and anonymous methods are compiled into delegates by the C# compiler.
Events in .NET are based on the delegate model. The delegate model follows the observer design pattern, which enables a subscriber to register with and receive notifications from a provider. An event sender pushes a notification that an event has happened, and an event receiver receives that notification and defines a response to it. [3]
To define an event, you use the C# event
or the Visual Basic Event
keyword in the signature of your event class, and specify the type of delegate for the event.
Typically, to raise an event, you add a method that is marked as protected
and virtual (in C#) or Protected
and Overridable
(in Visual Basic).
Counter counter = new Counter();
counter.Callback += Console.WriteLine;
counter.Count++;
counter.Count++;
class Counter
{
public event Callback? Callback;
private void OnCallback(string message)
{
Callback?.Invoke(message);
}
private int _count;
public int Count
{
get => _count;
set
{
if (value != _count)
{
int old = _count;
_count = value;
OnCallback($"Count was changed from {old} to {_count}.");
}
}
}
}
// $ dotnet run
// Count was changed from 0 to 1.
// Count was changed from 1 to 2.
[1] https://learn.microsoft.com/en-us/dotnet/standard/base-types/common-type-system#delegates
[2] https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/delegates/using-delegates
[3] https://learn.microsoft.com/en-us/dotnet/standard/events/
[4] https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/lambda-expressions
"Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once." — Rob Pike
Multithreading allows you to increase the responsiveness of your application and, if your application runs on a multiprocessor or multi-core system, increase its throughput. [1]
A process is an executing program. An operating system uses processes to separate the applications that are being executed.
A thread is the basic unit to which an operating system allocates processor time. Each thread has a scheduling priority and maintains a set of structures the system uses to save the thread context when the thread’s execution is paused.
The thread context includes all the information the thread needs to seamlessly resume execution, including the thread’s set of CPU registers and stack. Multiple threads can run in the context of a process. All threads of a process share its virtual address space. A thread can execute any part of the program code, including parts currently being executed by another thread.
.NET Framework provides a way to isolate applications within a process with the use of application domains. (Application domains are not available on .NET Core.)
By default, a .NET program is started with a single thread, often called the primary thread. However, it can create additional threads to execute code in parallel or concurrently with the primary thread. These threads are often called worker threads.
Starting with .NET Framework 4, the recommended way to utilize multithreading is to use Task Parallel Library (TPL) and Parallel LINQ (PLINQ).
Both TPL and PLINQ rely on ThreadPool threads. The System.Threading.ThreadPool
class provides a .NET application with a pool of worker threads. You can also queue work to thread pool threads directly.
Finally, you can use the System.Threading.Thread class, which represents a managed thread.
With .NET, you can write applications that perform multiple operations at the same time. Operations with the potential of holding up other operations can execute on separate threads, a process known as multithreading or free threading. [2]
Applications that use multithreading are more responsive to user input because the user interface stays active as processor-intensive tasks execute on separate threads. Multithreading is also useful when you create scalable applications because you can add threads as the workload increases.
You create a new thread by creating a new instance of the System.Threading.Thread class. You provide the name of the method that you want to execute on the new thread to the constructor. To start a created thread, call the Thread.Start
method.
new Thread(() => Console.WriteLine("Hello Thread")).Start();
To terminate the execution of a thread, use the System.Threading.CancellationToken. It provides a unified way to stop threads cooperatively.
Sometimes it’s not possible to stop a thread cooperatively because it runs third-party code not designed for cooperative cancellation. In this case, you might want to terminate its execution forcibly. To terminate the execution of a thread forcibly, in .NET Framework you can use the Thread.Abort
method. That method raises a ThreadAbortException
on the thread on which it’s invoked.
The Thread.Abort method isn’t supported in .NET Core. If you need to terminate the execution of third-party code forcibly in .NET Core, run it in the separate process and use the Process.Kill method.
The System.Threading.CancellationToken
isn’t available before .NET Framework 4. To stop a thread in older .NET Framework versions, use the thread synchronization techniques to implement the cooperative cancellation manually. For example, you can create a volatile bool
field shouldStop
and use it to request the code executed by the thread to stop.
Use the Thread.Join
method to make the calling thread wait for the termination of the thread being stopped.
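A minimal sketch of this manual cooperative cancellation (the Worker type and its members are illustrative):
var worker = new Worker();
var thread = new Thread(worker.DoWork);
thread.Start();
Thread.Sleep(300);
worker.RequestStop(); // request cancellation via the volatile flag
thread.Join();        // wait for the worker thread to terminate
class Worker
{
    private volatile bool _shouldStop;
    public void DoWork()
    {
        while (!_shouldStop) Thread.Sleep(100); // poll the flag between units of work
        Console.WriteLine("Worker thread: stopping gracefully.");
    }
    public void RequestStop() => _shouldStop = true;
}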
You use the Thread.Sleep
method to pause the current thread for a specified amount of time. You can interrupt a blocked thread by calling the Thread.Interrupt
method.
Calling the Thread.Sleep
method causes the current thread to immediately block for the number of milliseconds or the time interval you pass to the method, and yields the remainder of its time slice to another thread. Once that interval elapses, the sleeping thread resumes execution. [4]
One thread cannot call Thread.Sleep on another thread. Thread.Sleep is a static method that always causes the current thread to sleep.
Calling Thread.Sleep
with a value of Timeout.Infinite
causes a thread to sleep until it is interrupted by another thread that calls the Thread.Interrupt
method on the sleeping thread, or until it is terminated by a call to its Thread.Abort
method.
You can interrupt a waiting thread by calling the Thread.Interrupt
method on the blocked thread to throw a ThreadInterruptedException
, which breaks the thread out of the blocking call. The thread should catch the ThreadInterruptedException
and do whatever is appropriate to continue working. If the thread ignores the exception, the runtime catches the exception and stops the thread.
If the target thread is not blocked when Thread.Interrupt is called, the thread is not interrupted until it blocks. If the thread never blocks, it could complete without ever being interrupted.
If a wait is a managed wait, then Thread.Interrupt
and Thread.Abort
both wake the thread immediately. If a wait is an unmanaged wait (for example, a platform invoke call to the Win32 WaitForSingleObject
function), neither Thread.Interrupt
nor Thread.Abort
can take control of the thread until it returns to or calls into managed code. In managed code, the behavior is as follows:
Thread.Interrupt
wakes a thread out of any wait it might be in and causes a ThreadInterruptedException
to be thrown in the destination thread.
.NET Framework only: Thread.Abort
wakes a thread out of any wait it might be in and causes a ThreadAbortException
to be thrown on the thread.
Thread sleepingThread = new Thread(() =>
{
Console.WriteLine("Thread '{0}' about to sleep indefinitely.", Thread.CurrentThread.Name);
try
{
Thread.Sleep(Timeout.Infinite);
}
catch (ThreadInterruptedException)
{
Console.WriteLine("Thread '{0}' awoken.", Thread.CurrentThread.Name);
}
finally
{
Console.WriteLine("Thread '{0}' executing finally block.", Thread.CurrentThread.Name);
}
Console.WriteLine("Thread '{0} finishing normal execution.", Thread.CurrentThread.Name);
});
sleepingThread.Name = "Sleeping";
sleepingThread.Start();
Thread.Sleep(2000);
sleepingThread.Interrupt();
// Thread 'Sleeping' about to sleep indefinitely.
// Thread 'Sleeping' awoken.
// Thread 'Sleeping' executing finally block.
// Thread 'Sleeping' finishing normal execution.
Starting with .NET Framework 4, .NET uses a unified model for cooperative cancellation of asynchronous or long-running synchronous operations. This model is based on a lightweight object called a cancellation token. The object that invokes one or more cancelable operations, for example by creating new threads or tasks, passes the token to each operation. Individual operations can in turn pass copies of the token to other operations. At some later time, the object that created the token can use it to request that the operations stop what they are doing. Only the requesting object can issue the cancellation request, and each listener is responsible for noticing the request and responding to it in an appropriate and timely manner. [3]
The general pattern for implementing the cooperative cancellation model is:
Instantiate a CancellationTokenSource
object, which manages and sends cancellation notification to the individual cancellation tokens.
Pass the token returned by the CancellationTokenSource.Token
property to each task or thread that listens for cancellation.
Provide a mechanism for each task or thread to respond to cancellation.
Call the CancellationTokenSource.Cancel
method to provide notification of cancellation.
// Create the token source.
CancellationTokenSource cts = new CancellationTokenSource();
// Pass the token to the cancelable operation.
ThreadPool.QueueUserWorkItem(obj =>
{
if (obj is CancellationToken token)
{
for (int i = 0; i < 100000; i++)
{
if (token.IsCancellationRequested)
{
Console.WriteLine("In iteration {0}, cancellation has been requested...", i + 1);
// Perform cleanup if necessary.
//...
// Terminate the operation.
break;
}
// Simulate some work.
Thread.SpinWait(500000);
}
}
}, cts.Token);
Thread.Sleep(2500);
// Request cancellation.
cts.Cancel();
Console.WriteLine("Cancellation set in token source...");
Thread.Sleep(2500);
// Cancellation should have happened, so call Dispose.
cts.Dispose();
// The example displays output like the following:
// Cancellation set in token source...
// In iteration 1430, cancellation has been requested...
The CancellationTokenSource class implements the IDisposable interface. You should be sure to call the CancellationTokenSource.Dispose method when you have finished using the cancellation token source to free any unmanaged resources it holds.
(Illustration omitted: the relationship between a token source and all the copies of its token.)
The cooperative cancellation model makes it easier to create cancellation-aware applications and libraries, and it supports the following features:
Cancellation is cooperative and is not forced on the listener. The listener determines how to gracefully terminate in response to a cancellation request.
Requesting is distinct from listening. An object that invokes a cancelable operation can control when (if ever) cancellation is requested.
The requesting object issues the cancellation request to all copies of the token by using just one method call.
A listener can listen to multiple tokens simultaneously by joining them into one linked token (see the sketch after this list).
User code can notice and respond to cancellation requests from library code, and library code can notice and respond to cancellation requests from user code.
Listeners can be notified of cancellation requests by polling, callback registration, or waiting on wait handles.
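For instance, a sketch of joining two tokens into one linked token:
using var ctsA = new CancellationTokenSource();
using var ctsB = new CancellationTokenSource();
// The linked source is canceled as soon as either of its sources is canceled.
using var linked = CancellationTokenSource.CreateLinkedTokenSource(ctsA.Token, ctsB.Token);
ctsB.Cancel();
Console.WriteLine(linked.Token.IsCancellationRequested); // True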
In more complex cases, it might be necessary for the user delegate to notify library code that cancellation has occurred. In such cases, the correct way to terminate the operation is for the delegate to call the ThrowIfCancellationRequested
method, which will cause an OperationCanceledException
to be thrown. Library code can catch this exception on the user delegate thread and examine the exception’s token to determine whether the exception indicates cooperative cancellation or some other exceptional situation.
The System.Threading.Tasks.Task
and System.Threading.Tasks.Task<TResult>
classes support cancellation by using cancellation tokens. You can terminate the operation by using one of these options:
By returning from the delegate. In many scenarios, this option is sufficient. However, a task instance that’s canceled in this way transitions to the TaskStatus.RanToCompletion
state, not to the TaskStatus.Canceled
state.
By throwing an OperationCanceledException
and passing it the token on which cancellation was requested. The preferred way to do this is to use the ThrowIfCancellationRequested
method. A task that’s canceled in this way transitions to the Canceled
state, which the calling code can use to verify that the task responded to its cancellation request.
When a task instance observes an OperationCanceledException
thrown by the user code, it compares the exception’s token to its associated token (the one that was passed to the API that created the Task). If the tokens are the same and the token’s IsCancellationRequested
property returns true
, the task interprets this as acknowledging cancellation and transitions to the Canceled
state. If you don’t use a Wait
or WaitAll
method to wait for the task, then the task just sets its status to Canceled
.
If you’re waiting on a Task that transitions to the Canceled
state, a System.Threading.Tasks.TaskCanceledException
exception (wrapped in an AggregateException
exception) is thrown. This exception indicates successful cancellation instead of a faulty situation. Therefore, the task’s Exception
property returns null
.
public class TaskCanceledException : OperationCanceledException
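A minimal sketch of acknowledged task cancellation (the timings are chosen arbitrarily):
var cts = new CancellationTokenSource();
CancellationToken token = cts.Token;
Task task = Task.Run(() =>
{
    while (true)
    {
        token.ThrowIfCancellationRequested(); // throws OperationCanceledException
        Thread.SpinWait(100000);              // simulate work
    }
}, token); // the same token is associated with the task
Thread.Sleep(100);
cts.Cancel();
try
{
    task.Wait();
}
catch (AggregateException ae) when (ae.InnerException is TaskCanceledException)
{
    Console.WriteLine(task.Status);            // Canceled
    Console.WriteLine(task.Exception is null); // True: cancellation is not a fault
}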
If the token’s IsCancellationRequested
property returns false
or if the exception’s token doesn’t match the Task’s token, the OperationCanceledException
is treated like a normal exception, causing the Task to transition to the Faulted
state. The presence of other exceptions will also cause the Task to transition to the Faulted
state. You can get the status of the completed task in the Status
property.
It’s possible that a task might continue to process some items after cancellation is requested.
A managed thread is either a background thread or a foreground thread. Background threads are identical to foreground threads with one exception: a background thread does not keep the managed execution environment running. Once all foreground threads have been stopped in a managed process (where the .exe file is a managed assembly), the system stops all background threads and shuts down.
Use the Thread.IsBackground
property to determine whether a thread is a background or a foreground thread, or to change its status. A thread can be changed to a background thread at any time by setting its IsBackground
property to true
.
Threads that belong to the managed thread pool (that is, threads whose IsThreadPoolThread
property is true
) are background threads. All threads that enter the managed execution environment from unmanaged code are marked as background threads. All threads generated by creating and starting a new Thread object are by default foreground threads.
If you use a thread to monitor an activity, such as a socket connection, set its IsBackground
property to true
so that the thread does not prevent your process from terminating.
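A small sketch of the difference (the monitoring loop is illustrative):
Thread monitor = new Thread(() =>
{
    while (true)
    {
        Console.WriteLine("monitoring...");
        Thread.Sleep(500);
    }
});
monitor.IsBackground = true; // without this line, the process would never exit
monitor.Start();
Thread.Sleep(1200);
// Main exits here; the runtime stops the background thread automatically.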
* In .NET, even though you can technically change the IsBackground property at any point while a thread is alive, it cannot be set once the thread has stopped; doing so throws a ThreadStateException. In the code you provided, you’re attempting to change the property after the thread has terminated, which is why it fails. Always remember that IsBackground must be set while the thread is alive, typically before or shortly after calling Start.
The System.Threading.ThreadPool class provides your application with a pool of worker threads that are managed by the system, allowing you to concentrate on application tasks rather than thread management. If you have short tasks that require background processing, the managed thread pool is an easy way to take advantage of multiple threads. Use of the thread pool is significantly easier in Framework 4 and later, since you can create Task
and Task<TResult>
objects that perform asynchronous tasks on thread pool threads. [5]
Thread pool threads are background threads. Each thread uses the default stack size, runs at the default priority, and is in the multithreaded apartment. Once a thread in the thread pool completes its task, it’s returned to a queue of waiting threads. From this moment it can be reused. This reuse enables applications to avoid the cost of creating a new thread for each task.
There is only one thread pool per process.
Unhandled exceptions in thread pool threads terminate the process. There are three exceptions to this rule:
A System.Threading.ThreadAbortException
is thrown in a thread pool thread because Thread.Abort
was called.
A System.AppDomainUnloadedException
is thrown in a thread pool thread because the application domain is being unloaded.
The common language runtime or a host process terminates the thread.
The number of operations that can be queued to the thread pool is limited only by available memory. However, the thread pool limits the number of threads that can be active in the process simultaneously. If all thread pool threads are busy, additional work items are queued until threads to execute them become available. The default size of the thread pool for a process depends on several factors, such as the size of the virtual address space. A process can call the ThreadPool.GetMaxThreads
method to determine the number of threads.
You can control the maximum number of threads by using the ThreadPool.GetMaxThreads
and ThreadPool.SetMaxThreads
methods.
The thread pool provides new worker threads or I/O completion threads on demand until it reaches a specified minimum for each category. You can use the ThreadPool.GetMinThreads
method to obtain these minimum values.
When demand is low, the actual number of thread pool threads can fall below the minimum values.
When a minimum is reached, the thread pool can create additional threads or wait until some tasks complete. The thread pool creates and destroys worker threads in order to optimize throughput, which is defined as the number of tasks that complete per unit of time. Too few threads might not make optimal use of available resources, whereas too many threads could increase resource contention.
You can use the ThreadPool.SetMinThreads method to set the minimum number of worker and I/O completion threads.
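For example, to inspect and adjust the pool limits:
ThreadPool.GetMinThreads(out int minWorker, out int minIo);
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
Console.WriteLine($"Min: {minWorker} worker / {minIo} I/O threads");
Console.WriteLine($"Max: {maxWorker} worker / {maxIo} I/O threads");
// Raising the minimum lets bursts of short tasks start without the pool's ramp-up delay.
ThreadPool.SetMinThreads(minWorker * 2, minIo);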
There are several scenarios in which it’s appropriate to create and manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
.NET provides a range of types that you can use to synchronize access to a shared resource or coordinate thread interaction. [6]
Multiple .NET synchronization primitives derive from the System.Threading.WaitHandle class, which encapsulates a native operating system synchronization handle and uses a signaling mechanism for thread interaction. Those classes include:
System.Threading.Mutex
, which grants exclusive access to a shared resource. The state of a mutex is signaled if no thread owns it.
System.Threading.Semaphore
, which limits the number of threads that can access a shared resource or a pool of resources concurrently. The state of a semaphore is set to signaled when its count is greater than zero, and nonsignaled when its count is zero.
System.Threading.EventWaitHandle
, which represents a thread synchronization event and can be either in a signaled or unsignaled state.
System.Threading.AutoResetEvent
, which derives from EventWaitHandle
and, when signaled, resets automatically to an unsignaled state after releasing a single waiting thread.
System.Threading.ManualResetEvent
, which derives from EventWaitHandle
and, when signaled, stays in a signaled state until the Reset
method is called.
In .NET Framework, because WaitHandle
derives from System.MarshalByRefObject
, these types can be used to synchronize the activities of threads across application domain boundaries.
In .NET Framework, .NET Core, and .NET 5+, some of these types can represent named system synchronization handles, which are visible throughout the operating system and can be used for the inter-process synchronization:
Mutex
Semaphore (on Windows)
EventWaitHandle (on Windows)
Lightweight synchronization types don’t rely on underlying operating system handles and typically provide better performance. However, they cannot be used for the inter-process synchronization. Use those types for thread synchronization within one application.
Some of those types are alternatives to the types derived from WaitHandle
. For example, SemaphoreSlim
is a lightweight alternative to Semaphore
.
public class SemaphoreSlim : IDisposable
public sealed class Semaphore : System.Threading.WaitHandle
.NET provides a range of synchronization primitives to control access to a shared resource by multiple threads.
The System.Threading.Monitor class grants mutually exclusive access to a shared resource by acquiring or releasing a lock on the object that identifies the resource. While a lock is held, the thread that holds the lock can again acquire and release the lock. Any other thread is blocked from acquiring the lock and the Monitor.Enter
method waits until the lock is released. The Enter
method acquires a released lock. You can also use the Monitor.TryEnter
method to specify the amount of time during which a thread attempts to acquire a lock. Because the Monitor class has thread affinity, the thread that acquired a lock must release the lock by calling the Monitor.Exit
method.
You can coordinate the interaction of threads that acquire a lock on the same object by using the Monitor.Wait
, Monitor.Pulse
, and Monitor.PulseAll
methods.
The following example uses the Monitor.Enter, Monitor.Wait, and Monitor.Pulse methods to implement a simple single-slot blocking channel:
var ch = new BlockingChannel<object>();
ThreadPool.QueueUserWorkItem(_ =>
{
for (int i = 0; i < 10; i++)
{
ch.Add(i);
}
ch.Add(null!);
});
foreach (var v in ch)
{
Console.Write($"{v} ");
}
// A single-slot blocking channel: Add blocks while the slot is occupied and
// Get blocks while it is empty; Monitor.Wait/Pulse coordinate the two sides.
class BlockingChannel<T> : IEnumerable<T> where T : class, new()
{
private readonly object lockObj = new();
private bool _isEmpty = true;
private T? _val;
public void Add(T value)
{
Monitor.Enter(lockObj);
try
{
while (!_isEmpty)
{
Monitor.Wait(lockObj);
}
_isEmpty = false;
_val = value;
Monitor.Pulse(lockObj);
}
finally
{
Monitor.Exit(lockObj);
}
}
public T? Get()
{
Monitor.Enter(lockObj);
try
{
while (_isEmpty)
{
Monitor.Wait(lockObj);
}
_isEmpty = true;
Monitor.Pulse(lockObj);
return _val;
}
finally
{
Monitor.Exit(lockObj);
}
}
public IEnumerator<T> GetEnumerator()
{
while (true)
{
T? val = Get();
if (val == null) break;
yield return val;
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
// $ dotnet run
// 0 1 2 3 4 5 6 7 8 9
The System.Threading.Mutex class, like Monitor, grants exclusive access to a shared resource. Use one of the Mutex.WaitOne
method overloads to request the ownership of a mutex. Like Monitor, Mutex has thread affinity and the thread that acquired a mutex must release it by calling the Mutex.ReleaseMutex
method.
Unlike Monitor
, the Mutex
class can be used for inter-process synchronization. To do that, use a named mutex, which is visible throughout the operating system. To create a named mutex instance, use a Mutex constructor that specifies a name. You can also call the Mutex.OpenExisting
method to open an existing named system mutex.
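As a minimal sketch of inter-process mutual exclusion, the snippet below uses a named mutex; the name "Global\MyAppMutex" is a hypothetical example, and any two processes that open the same name synchronize on the same OS-level mutex:

using var mutex = new Mutex(initiallyOwned: false, name: "Global\\MyAppMutex");

if (mutex.WaitOne(TimeSpan.FromSeconds(5))) // request ownership, with a timeout
{
    try
    {
        Console.WriteLine("This process owns the mutex.");
        // ... work with the shared, cross-process resource here ...
    }
    finally
    {
        mutex.ReleaseMutex(); // must be released by the owning thread
    }
}
else
{
    Console.WriteLine("Timed out waiting for the mutex.");
}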
The System.Threading.SpinLock structure, like Monitor, grants exclusive access to a shared resource based on the availability of a lock. When SpinLock attempts to acquire a lock that is unavailable, it waits in a loop, repeatedly checking until the lock becomes available.
using System.Text; // StringBuilder

SpinLock sl = new SpinLock();
StringBuilder sb = new StringBuilder();
// Action taken by each parallel job.
// Append to the StringBuilder 10000 times, protecting
// access to sb with a SpinLock.
Action action = () =>
{
bool gotLock = false;
for (int i = 0; i < 10000; i++)
{
gotLock = false;
try
{
sl.Enter(ref gotLock);
sb.Append(i % 10);
}
finally
{
// Only give up the lock if you actually acquired it
if (gotLock) { sl.Exit(); }
}
}
};
// Invoke 3 concurrent instances of the action above
Parallel.Invoke(action, action, action);
// Check/Show the results
Console.WriteLine("sb.Length = {0} (should be 30000)", sb.Length);
Console.WriteLine("number of occurrences of '5' in sb: {0} (should be 3000)",
sb.ToString().Where(c => (c == '5')).Count());
The System.Threading.ReaderWriterLockSlim class grants exclusive access to a shared resource for writing and allows multiple threads to access the resource simultaneously for reading. You might want to use ReaderWriterLockSlim
to synchronize access to a shared data structure that supports thread-safe read operations but requires exclusive access to perform write operations. When a thread requests exclusive access (for example, by calling the ReaderWriterLockSlim.EnterWriteLock
method), subsequent reader and writer requests block until all existing readers have exited the lock, and the writer has entered and exited the lock.
using System.Diagnostics.CodeAnalysis; // [MaybeNullWhen]

class SynchronizedDictionary<TKey, TValue> : IDisposable where TKey : notnull
{
private readonly Dictionary<TKey, TValue> _dictionary = new Dictionary<TKey, TValue>();
private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
public void Add(TKey key, TValue value)
{
_lock.EnterWriteLock();
try
{
_dictionary.Add(key, value);
}
finally { _lock.ExitWriteLock(); }
}
public void TryAddValue(TKey key, TValue value)
{
_lock.EnterUpgradeableReadLock();
try
{
if (_dictionary.TryGetValue(key, out var res) && res != null && res.Equals(value)) return;
_lock.EnterWriteLock();
try
{
_dictionary[key] = value;
}
finally { _lock.ExitWriteLock(); }
}
finally { _lock.ExitUpgradeableReadLock(); }
}
public bool TryGetValue(TKey key, [MaybeNullWhen(false)] out TValue value)
{
_lock.EnterReadLock();
try
{
return _dictionary.TryGetValue(key, out value);
}
finally { _lock.ExitReadLock(); }
}
private bool _disposed;
protected virtual void Dispose(bool disposing)
{
if (!_disposed)
{
if (disposing)
{
// perform managed resource cleanup here
_lock.Dispose();
}
// perform unmanaged resource cleanup here
_disposed = true;
}
}
~SynchronizedDictionary() => Dispose(disposing: false);
public void Dispose()
{
Dispose(disposing: true);
GC.SuppressFinalize(this);
}
}
The System.Threading.Semaphore and System.Threading.SemaphoreSlim classes limit the number of threads that can access a shared resource or a pool of resources concurrently. Additional threads that request the resource wait until any thread releases the semaphore. Because the semaphore doesn’t have thread affinity, a thread can acquire the semaphore and another one can release it.
SemaphoreSlim is a lightweight alternative to Semaphore and can be used only for synchronization within a single process boundary.
On Windows, you can use Semaphore for inter-process synchronization. To do that, create a Semaphore instance that represents a named system semaphore by using one of the Semaphore constructors that specifies a name or the Semaphore.OpenExisting
method. SemaphoreSlim doesn’t support named system semaphores.
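A minimal sketch of SemaphoreSlim limiting in-process concurrency; the count of 2, the delay, and the method name are arbitrary illustration values:

var gate = new SemaphoreSlim(initialCount: 2); // at most two concurrent holders

async Task UseResourceAsync(int id)
{
    await gate.WaitAsync(); // asynchronously wait for a free slot
    try
    {
        Console.WriteLine($"{id} entered at {DateTime.Now:HH:mm:ss.fff}");
        await Task.Delay(200); // simulate work on the shared resource
    }
    finally
    {
        gate.Release(); // hand the slot to a waiting caller
    }
}

await Task.WhenAll(Enumerable.Range(0, 5).Select(UseResourceAsync));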
Thread interaction (or thread signaling) means that a thread must wait for notification, or a signal, from one or more threads in order to proceed. For example, if thread A calls the Thread.Join
method of thread B, thread A is blocked until thread B completes. The synchronization primitives described in the preceding section provide a different mechanism for signaling: by releasing a lock, a thread notifies another thread that it can proceed by acquiring the lock.
The System.Threading.EventWaitHandle class represents a thread synchronization event.
A synchronization event can be either in an unsignaled or signaled state. When the state of an event is unsignaled, a thread that calls the event’s WaitOne
overload is blocked until an event is signaled. The EventWaitHandle.Set
method sets the state of an event to signaled.
The behavior of an EventWaitHandle that has been signaled depends on its reset mode:
An EventWaitHandle created with the EventResetMode.AutoReset
flag resets automatically after releasing a single waiting thread. It’s like a turnstile that allows only one thread through each time it’s signaled. The System.Threading.AutoResetEvent class, which derives from EventWaitHandle, represents that behavior.
An EventWaitHandle created with the EventResetMode.ManualReset
flag remains signaled until its Reset
method is called. It’s like a gate that is closed until signaled and then stays open until someone closes it. The System.Threading.ManualResetEvent class, which derives from EventWaitHandle, represents that behavior. The System.Threading.ManualResetEventSlim class is a lightweight alternative to ManualResetEvent.
On Windows, you can use EventWaitHandle for inter-process synchronization. To do that, create an EventWaitHandle instance that represents a named system synchronization event by using one of the EventWaitHandle constructors that specifies a name or the EventWaitHandle.OpenExisting
method.
Event wait handles are not .NET events. There are no delegates or event handlers involved. The word "event" is used to describe them because they have traditionally been referred to as operating-system events, and because the act of signaling the wait handle indicates to waiting threads that an event has occurred.
Event Wait Handles That Reset Automatically [7]
You create an automatic reset event by specifying EventResetMode.AutoReset
when you create the EventWaitHandle
object. As its name implies, this synchronization event resets automatically when signaled, after releasing a single waiting thread. Signal the event by calling its Set
method.
Automatic reset events are usually used to provide exclusive access to a resource for a single thread at a time. A thread requests the resource by calling the WaitOne
method. If no other thread is holding the wait handle, the method returns true and the calling thread has control of the resource.
If an automatic reset event is signaled when no threads are waiting, it remains signaled until a thread attempts to wait on it. The event releases the thread and immediately resets, blocking subsequent threads.
Event Wait Handles That Reset Manually [7]
You create a manual reset event by specifying EventResetMode.ManualReset
when you create the EventWaitHandle
object. As its name implies, this synchronization event must be reset manually after it has been signaled. Until it is reset, by calling its Reset
method, threads that wait on the event handle proceed immediately without blocking.
A manual reset event acts like the gate of a corral. When the event is not signaled, threads that wait on it block, like horses in a corral. When the event is signaled, by calling its Set
method, all waiting threads are free to proceed. The event remains signaled until its Reset
method is called. This makes the manual reset event an ideal way to hold up threads that need to wait until one thread finishes a task.
Like horses leaving a corral, it takes time for the released threads to be scheduled by the operating system and to resume execution. If the Reset
method is called before all the threads have resumed execution, the remaining threads once again block. Which threads resume and which threads block depends on random factors like the load on the system, the number of threads waiting for the scheduler, and so on. This is not a problem if the thread that signals the event ends after signaling, which is the most common usage pattern. If you want the thread that signaled the event to begin a new task after all the waiting threads have resumed, you must block it until all the waiting threads have resumed. Otherwise, you have a race condition, and the behavior of your code is unpredictable.
EventWaitHandle ewh = new EventWaitHandle(false, EventResetMode.ManualReset);
ThreadPool.QueueUserWorkItem(_ =>
{
ewh.WaitOne();
Console.WriteLine("FooSingled");
});
ThreadPool.QueueUserWorkItem(_ =>
{
ewh.WaitOne();
Console.WriteLine("BarSingled");
});
ewh.Set();
Thread.Sleep(1000);
// $ dotnet run
// BarSignaled
// FooSignaled
The System.Threading.CountdownEvent class represents an event that becomes set when its count is zero. While CountdownEvent.CurrentCount
is greater than zero, a thread that calls CountdownEvent.Wait
is blocked. Call CountdownEvent.Signal
to decrement an event’s count.
In contrast to ManualResetEvent
or ManualResetEventSlim
, which you can use to unblock multiple threads with a signal from one thread, you can use CountdownEvent to unblock one or more threads with signals from multiple threads.
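A small sketch: three workers each signal once, and the waiting thread proceeds when the count reaches zero (the count and sleep time are illustrative):

var countdown = new CountdownEvent(initialCount: 3);

for (int i = 0; i < 3; i++)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        Thread.Sleep(100);  // simulate work
        countdown.Signal(); // decrement the event's count
    });
}

countdown.Wait(); // blocks until the count reaches zero
Console.WriteLine("All workers signaled.");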
The System.Threading.Barrier class represents a thread execution barrier. A thread that calls the Barrier.SignalAndWait
method signals that it reached the barrier and waits until other participant threads reach the barrier. When all participant threads reach the barrier, they proceed and the barrier is reset and can be used again.
You might use Barrier when one or more threads require the results of other threads before proceeding to the next computation phase.
The System.Threading.Interlocked class provides static methods that perform simple atomic operations on a variable. Those atomic operations include addition, increment and decrement, exchange and conditional exchange that depends on a comparison, and read operation of a 64-bit integer value.
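For example, a short sketch of atomic increments from several threads; the iteration counts are arbitrary:

long counter = 0;

// Ten workers each perform 100,000 atomic increments; a plain counter++
// here would lose updates to races.
Parallel.For(0, 10, _ =>
{
    for (int i = 0; i < 100_000; i++)
    {
        Interlocked.Increment(ref counter);
    }
});

Console.WriteLine(Interlocked.Read(ref counter)); // 1000000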
The System.Threading.SpinWait structure provides support for spin-based waiting. You might want to use it when a thread has to wait for an event to be signaled or a condition to be met, but when the actual wait time is expected to be less than the waiting time required by using a wait handle or by otherwise blocking the thread. By using SpinWait, you can specify a short period of time to spin while waiting, and then yield (for example, by waiting or sleeping) only if the condition was not met in the specified time.
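A sketch using the static SpinWait.SpinUntil helper, which spins briefly and then yields or sleeps if the condition stays false; the flag and delay are illustrative:

bool ready = false;

ThreadPool.QueueUserWorkItem(_ =>
{
    Thread.Sleep(10); // simulate a very short delay before the condition holds
    Volatile.Write(ref ready, true);
});

// Spins first, then falls back to yielding/sleeping for longer waits.
SpinWait.SpinUntil(() => Volatile.Read(ref ready));
Console.WriteLine("Condition met.");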
The System.Collections.Concurrent namespace includes several collection classes that are both thread-safe and scalable. Multiple threads can safely and efficiently add or remove items from these collections, without requiring additional synchronization in user code. When you write new code, use the concurrent collection classes whenever multiple threads write to the collection concurrently. If you’re only reading from a shared collection, then you can use the classes in the System.Collections.Generic namespace.
Some of the concurrent collection types use lightweight synchronization mechanisms such as SpinLock
, SpinWait
, SemaphoreSlim
, and CountdownEvent
. These synchronization types typically use busy spinning for brief periods before they put the thread into a true Wait
state. When wait times are expected to be short, spinning is far less computationally expensive than waiting, which involves an expensive kernel transition. For collection classes that use spinning, this efficiency means that multiple threads can add and remove items at a high rate.
The ConcurrentQueue<T>
and ConcurrentStack<T>
classes don’t use locks at all. Instead, they rely on Interlocked
operations to achieve thread safety.
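For instance, a minimal ConcurrentQueue<T> sketch with many concurrent producers and a single consumer; the sizes are arbitrary:

using System.Collections.Concurrent;

var queue = new ConcurrentQueue<int>();

// Many threads enqueue concurrently without any user-level locking.
Parallel.For(0, 1000, i => queue.Enqueue(i));

int sum = 0;
while (queue.TryDequeue(out int item)) // non-blocking, lock-free dequeue
{
    sum += item;
}
Console.WriteLine(sum); // 499500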
The following table lists the collection types in the System.Collections.Concurrent namespace:
Type | Description
---|---
BlockingCollection<T> | Provides bounding and blocking functionality for any type that implements IProducerConsumerCollection<T>.
ConcurrentDictionary<TKey,TValue> | Thread-safe implementation of a dictionary of key-value pairs.
ConcurrentQueue<T> | Thread-safe implementation of a FIFO (first-in, first-out) queue.
ConcurrentStack<T> | Thread-safe implementation of a LIFO (last-in, first-out) stack.
ConcurrentBag<T> | Thread-safe implementation of an unordered collection of elements.
IProducerConsumerCollection<T> | The interface that a type must implement to be used in a BlockingCollection<T>.
* BlockingCollection<T>
and Channel<T>
are both useful for producer/consumer scenarios where one thread or task is producing data and another thread or task is consuming that data. However, their implementation and features are quite different, and they are designed to handle different use cases.
BlockingCollection<T>
is part of the System.Collections.Concurrent
namespace and was introduced in .NET Framework 4.0. It provides a thread-safe, blocking and bounded collection that can be used with multiple producers and consumers.
Benefits of BlockingCollection<T>
:
It simplifies thread communication, as it blocks and waits when trying to add to a full collection or take from an empty one.
It provides Add
and Take
methods for managing the collection which, if the collection is bounded, block when it is full (on Add) or empty (on Take), respectively; see the sketch after this list.
It implements IEnumerable<T>
, allowing easy enumeration of the items in the collection.
It has built-in functionality for creating a complete producer/consumer on top of any IProducerConsumerCollection<T>
.
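A minimal producer/consumer sketch with a bounded BlockingCollection<T>; the capacity and item counts are illustrative:

using System.Collections.Concurrent;

var collection = new BlockingCollection<int>(boundedCapacity: 5);

var producer = Task.Run(() =>
{
    for (int i = 0; i < 10; i++)
    {
        collection.Add(i); // blocks while the collection is full
    }
    collection.CompleteAdding(); // signal that no more items will arrive
});

// Blocks until items are available; the loop ends once CompleteAdding has
// been called and the collection is drained.
foreach (int item in collection.GetConsumingEnumerable())
{
    Console.Write($"{item} ");
}
await producer;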
Channel<T>
is part of the System.Threading.Channels
namespace and was introduced in .NET Core 3.0. It’s newer and designed for the modern .NET threading infrastructure using async and await design patterns. [8]
Benefits of Channel<T>
:
It supports the async programming model and can be used with async
and await
keywords in C#.
It is designed for scenarios where you have asynchronous data streams that need to be processed.
It provides both synchronous and asynchronous methods for adding (Writer.TryWrite
, Writer.WriteAsync
) and receiving (Reader.TryRead
, Reader.ReadAsync
) data.
It supports back pressure by naturally making the producer wait if the channel is full.
It allows for creating unbounded or bounded channels via Channel.CreateUnbounded<T>
and Channel.CreateBounded<T>
.
In general, Channel<T>
is more modern and better integrated with async programming model. Therefore, for newer applications it is recommended to use the Channel<T>
class.
However, if you have a legacy application where you cannot use async and await extensively, or where you are using ThreadPool and Tasks heavily, then BlockingCollection<T>
might be a better choice.
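For comparison, here is the same producer/consumer shape with a bounded Channel<T>, written with async/await throughout; the capacity and counts are again illustrative:

using System.Threading.Channels;

var channel = Channel.CreateBounded<int>(capacity: 5);

var producer = Task.Run(async () =>
{
    for (int i = 0; i < 10; i++)
    {
        await channel.Writer.WriteAsync(i); // waits while the channel is full
    }
    channel.Writer.Complete(); // no more items
});

// Asynchronously consumes items until the channel is completed and drained.
await foreach (int item in channel.Reader.ReadAllAsync())
{
    Console.Write($"{item} ");
}
await producer;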
Typically, WPF applications start with two threads: one for handling rendering and another for managing the UI. The rendering thread effectively runs hidden in the background while the UI thread receives input, handles events, paints the screen, and runs application code. Most applications use a single UI thread, although in some situations it is best to use several. [11]
The UI thread queues work items inside an object called a Dispatcher. The Dispatcher selects work items on a priority basis and runs each one to completion. Every UI thread must have at least one Dispatcher, and each Dispatcher can execute work items in exactly one thread.
The trick to building responsive, user-friendly applications is to maximize the Dispatcher throughput by keeping the work items small. This way items never get stale sitting in the Dispatcher queue waiting for processing. Any perceivable delay between input and response can frustrate a user.
How then are WPF applications supposed to handle big operations? What if your code involves a large calculation or needs to query a database on some remote server? Usually, the answer is to handle the big operation in a separate thread, leaving the UI thread free to tend to items in the Dispatcher queue. When the big operation is complete, it can report its result back to the UI thread for display.
If only one thread can modify the UI, how do background threads interact with the user? A background thread can ask the UI thread to perform an operation on its behalf. It does this by registering a work item with the Dispatcher of the UI thread. The Dispatcher class provides the methods for registering work items: Dispatcher.InvokeAsync
, Dispatcher.BeginInvoke
, and Dispatcher.Invoke
. These methods schedule a delegate for execution. Invoke
is a synchronous call – that is, it doesn’t return until the UI thread actually finishes executing the delegate. InvokeAsync
and BeginInvoke
are asynchronous and return immediately.
The volatile
keyword indicates that a field might be modified by multiple threads that are executing at the same time. The compiler, the runtime system, and even hardware may rearrange reads and writes to memory locations for performance reasons. Fields that are declared volatile are excluded from certain kinds of optimizations. There is no guarantee of a single total ordering of volatile writes as seen from all threads of execution. [9]
On a multiprocessor system, a volatile read operation does not guarantee to obtain the latest value written to that memory location by any processor. Similarly, a volatile write operation does not guarantee that the value written would be immediately visible to other processors.
The volatile
keyword can be applied to fields of these types:
Reference types.
Pointer types (in an unsafe context). Note that although the pointer itself can be volatile, the object that it points to cannot. In other words, you cannot declare a "pointer to volatile."
Simple types such as sbyte, byte, short, ushort, int, uint, char, float, and bool.
An enum type with one of the following base types: byte, sbyte, short, ushort, int, or uint.
Generic type parameters known to be reference types.
IntPtr and UIntPtr.
Other types, including double and long, cannot be marked volatile because reads and writes to fields of those types cannot be guaranteed to be atomic. To protect multi-threaded access to those types of fields, use the Interlocked
class members or protect access using the lock
statement.
The volatile
keyword can only be applied to fields of a class or struct. Local variables cannot be declared volatile.
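The classic use is a stop flag polled by a worker loop; a minimal sketch, with illustrative names:

class Worker
{
    // Written by one thread, polled by another; volatile keeps the read in
    // the loop from being cached in a register.
    private volatile bool _shouldStop;

    public void RequestStop() => _shouldStop = true;

    public void DoWork()
    {
        while (!_shouldStop)
        {
            // ... do a unit of work ...
        }
        Console.WriteLine("Worker stopped.");
    }
}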
.NET provides three patterns for performing asynchronous operations:
Task-based Asynchronous Pattern (TAP), which uses a single method to represent the initiation and completion of an asynchronous operation. TAP was introduced in .NET Framework 4. It’s the recommended approach to asynchronous programming in .NET; a minimal example follows this list. The async
and await
keywords in C# and the Async
and Await
operators in Visual Basic add language support for TAP.
Event-based Asynchronous Pattern (EAP)
, which is the event-based legacy model for providing asynchronous behavior. It requires a method that has the Async
suffix and one or more events, event handler delegate types, and EventArg-derived types. EAP was introduced in .NET Framework 2.0. It’s no longer recommended for new development.
Asynchronous Programming Model (APM)
pattern (also called the IAsyncResult pattern), which is the legacy model that uses the IAsyncResult
interface to provide asynchronous behavior. In this pattern, asynchronous operations require Begin
and End
methods (for example, BeginWrite
and EndWrite
to implement an asynchronous write operation). This pattern is no longer recommended for new development.
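A minimal TAP sketch, assuming a console app with network access; the method name and URL are illustrative:

// One Task-returning method represents the whole asynchronous operation.
async Task<int> GetPageLengthAsync(HttpClient client, string url)
{
    string content = await client.GetStringAsync(url); // suspension point
    return content.Length;
}

using var http = new HttpClient();
Console.WriteLine(await GetPageLengthAsync(http, "https://example.com"));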
Lazy initialization of an object means that its creation is deferred until it is first used. (For this topic, the terms lazy initialization and lazy instantiation are synonymous.) Lazy initialization is primarily used to improve performance, avoid wasteful computation, and reduce program memory requirements. [12]
Although you can write your own code to perform lazy initialization, we recommend that you use Lazy<T> instead. Lazy<T> and its related types also support thread-safety and provide a consistent exception propagation policy.
Type | Description
---|---
Lazy<T> | A wrapper class that provides lazy initialization semantics for any class library or user-defined type.
ThreadLocal<T> | Resembles Lazy<T>, except that it provides lazy initialization semantics on a thread-local basis: every thread has access to its own value.
LazyInitializer | Provides advanced static (Shared in Visual Basic) methods for lazy initialization of objects without the overhead of a class.
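A small Lazy<T> sketch; the factory and values are illustrative:

// The factory runs only on the first access of lazy.Value; the thread-safe
// mode guards against concurrent first accesses.
var lazy = new Lazy<string>(() =>
{
    Console.WriteLine("Initializing...");
    return "expensive value";
}, isThreadSafe: true);

Console.WriteLine(lazy.IsValueCreated); // False
Console.WriteLine(lazy.Value);          // prints "Initializing..." then the value
Console.WriteLine(lazy.Value);          // cached; the factory doesn't run again
Console.WriteLine(lazy.IsValueCreated); // True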
Many personal computers and workstations have multiple CPU cores that enable multiple threads to be executed simultaneously. To take advantage of the hardware, you can parallelize your code to distribute work across multiple processors. [13]
In the past, parallelization required low-level manipulation of threads and locks. Visual Studio and .NET enhance support for parallel programming by providing a runtime, class library types, and diagnostic tools. These features, which were introduced in .NET Framework 4, simplify parallel development. You can write efficient, fine-grained, and scalable parallel code in a natural idiom without having to work directly with threads or the thread pool.
The following illustration provides a high-level overview of the parallel programming architecture in .NET.
The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces. The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications. The TPL dynamically scales the degree of concurrency to use all the available processors most efficiently. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.
Data parallelism refers to scenarios in which the same operation is performed concurrently (that is, in parallel) on elements in a source collection or array. In data parallel operations, the source collection is partitioned so that multiple threads can operate on different segments concurrently. [14]
The Task Parallel Library (TPL) supports data parallelism through the System.Threading.Tasks.Parallel class. This class provides method-based parallel implementations of for
and foreach
loops (For
and For Each
in Visual Basic). You write the loop logic for a Parallel.For
or Parallel.ForEach
loop much as you would write a sequential loop. You do not have to create threads or queue work items. In basic loops, you do not have to take locks. The TPL handles all the low-level work for you.
using System.Diagnostics; // Stopwatch

string path = Path.Combine(
Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), ".nuget/packages/");
string[] fileNames = Directory.GetFiles(path, "*", SearchOption.AllDirectories);
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 2; i++)
{
sw.Restart();
long parallelTotalSize = 0;
Parallel.ForEach(fileNames,
fileName => Interlocked.Add(ref parallelTotalSize, new FileInfo(fileName).Length));
Console.WriteLine($"Parallel: {parallelTotalSize}, {sw.ElapsedMilliseconds}ms");
sw.Restart();
long totalSize = 0;
foreach (string fileName in fileNames) totalSize += new FileInfo(fileName).Length;
Console.WriteLine($"Sequential : {totalSize}, {sw.ElapsedMilliseconds}ms");
}
// $ dotnet run
// Parallel: 2743226084, 400ms
// Sequential : 2743226084, 598ms
// Parallel: 2743226084, 220ms
// Sequential : 2743226084, 429ms
The Task Parallel Library (TPL) provides dataflow components to help increase the robustness of concurrency-enabled applications. These dataflow components are collectively referred to as the TPL Dataflow Library. This dataflow model promotes actor-based programming by providing in-process message passing for coarse-grained dataflow and pipelining tasks. The dataflow components build on the types and scheduling infrastructure of the TPL and integrate with the C#, Visual Basic, and F# language support for asynchronous programming. These dataflow components are useful when you have multiple operations that must communicate with one another asynchronously or when you want to process data as it becomes available.
The TPL Dataflow Library provides a foundation for message passing and parallelizing CPU-intensive and I/O-intensive applications that have high throughput and low latency. Because the runtime manages dependencies between data, you can often avoid the requirement to synchronize access to shared data. In addition, because the runtime schedules work based on the asynchronous arrival of data, dataflow can improve responsiveness and throughput by efficiently managing the underlying threads.
The TPL Dataflow Library consists of dataflow blocks, which are data structures that buffer and process data. The TPL defines three kinds of dataflow blocks: source blocks, target blocks, and propagator blocks.
A source block acts as a source of data and can be read from.
A target block acts as a receiver of data and can be written to.
A propagator block acts as both a source block and a target block, and can be read from and written to.
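A minimal pipeline sketch: a propagator (TransformBlock) linked to a target (ActionBlock). This assumes the System.Threading.Tasks.Dataflow NuGet package is referenced; the doubling transform is illustrative:

using System.Threading.Tasks.Dataflow;

var transform = new TransformBlock<int, int>(n => n * 2);      // propagator block
var print = new ActionBlock<int>(n => Console.Write($"{n} ")); // target block

// Completion flows from the transform to the printer when linked this way.
transform.LinkTo(print, new DataflowLinkOptions { PropagateCompletion = true });

for (int i = 0; i < 5; i++) transform.Post(i);
transform.Complete();
await print.Completion; // output: 0 2 4 6 8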
The Task Parallel Library (TPL) is based on the concept of a task, which represents an asynchronous operation. In some ways, a task resembles a thread or ThreadPool work item but at a higher level of abstraction. The term task parallelism refers to one or more independent tasks running concurrently. Tasks provide two primary benefits: [15]
More efficient and more scalable use of system resources.
Behind the scenes, tasks are queued to the ThreadPool, which has been enhanced with algorithms that determine and adjust to the number of threads. These algorithms provide load balancing to maximize throughput. This process makes tasks relatively lightweight, and you can create many of them to enable fine-grained parallelism.
More programmatic control than is possible with a thread or work item.
Tasks and the framework built around them provide a rich set of APIs that support waiting, cancellation, continuations, robust exception handling, detailed status, custom scheduling, and more.
For both reasons, TPL is the preferred API for writing multi-threaded, asynchronous, and parallel code in .NET.
Language-Integrated Query (LINQ) is the name for a set of technologies based on the integration of query capabilities directly into the C# language.
Traditionally, queries against data are expressed as simple strings without type checking at compile time or IntelliSense support. Furthermore, you have to learn a different query language for each type of data source: SQL databases, XML documents, various Web services, and so on.
With LINQ, a query is a first-class language construct, just like classes, methods, and events. [19]
In-memory data
There are two ways you enable LINQ querying of in-memory data. If the data is of a type that implements IEnumerable<T>
, you query the data by using LINQ to Objects. If it doesn’t make sense to enable enumeration by implementing the IEnumerable<T>
interface, you define LINQ standard query operator methods, either in that type or as extension methods for that type. Custom implementations of the standard query operators should use deferred execution to return the results.
Remote data
The best option for enabling LINQ querying of a remote data source is to implement the IQueryable<T>
interface.
At compile time, query expressions are converted to standard query operator method calls according to the rules defined in the C# specification. Any query that can be expressed by using query syntax can also be expressed by using method syntax. In some cases, query syntax is more readable and concise. In others, method syntax is more readable. There’s no semantic or performance difference between the two different forms.
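For instance, the two forms below are equivalent; the compiler turns the query expression into the method calls shown beneath it:

int[] numbers = { 5, 10, 8, 3, 6, 12 };

// Query syntax...
var evensQuery =
    from n in numbers
    where n % 2 == 0
    orderby n
    select n;

// ...compiles to the equivalent method syntax.
var evensMethod = numbers.Where(n => n % 2 == 0).OrderBy(n => n);

Console.WriteLine(string.Join(" ", evensQuery));  // 6 8 10 12
Console.WriteLine(string.Join(" ", evensMethod)); // 6 8 10 12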
Parallel LINQ (PLINQ) is a parallel implementation of the Language-Integrated Query (LINQ) pattern. PLINQ implements the full set of LINQ standard query operators as extension methods for the System.Linq namespace and has additional operators for parallel operations. PLINQ combines the simplicity and readability of LINQ syntax with the power of parallel programming. [20]
A PLINQ query in many ways resembles a non-parallel LINQ to Objects query. PLINQ queries, just like sequential LINQ queries, operate on any in-memory IEnumerable
or IEnumerable<T>
data source, and have deferred execution, which means they do not begin executing until the query is enumerated. The primary difference is that PLINQ attempts to make full use of all the processors on the system. It does this by partitioning the data source into segments, and then executing the query on each segment on separate worker threads in parallel on multiple processors. In many cases, parallel execution means that the query runs significantly faster.
The System.Linq.ParallelEnumerable class exposes almost all of PLINQ’s functionality. It includes implementations of all the standard query operators that LINQ to Objects supports, although it does not attempt to parallelize each one.
In addition to the standard query operators, the ParallelEnumerable class contains a set of methods that enable behaviors specific to parallel execution. These PLINQ-specific methods are listed in the following table.
ParallelEnumerable Operator | Description
---|---
AsParallel | The entry point for PLINQ. Specifies that the rest of the query should be parallelized, if it is possible.
AsSequential | Specifies that the rest of the query should be run sequentially, as a non-parallel LINQ query.
AsOrdered | Specifies that PLINQ should preserve the ordering of the source sequence for the rest of the query, or until the ordering is changed, for example by the use of an orderby (Order By in Visual Basic) clause.
AsUnordered | Specifies that PLINQ for the rest of the query is not required to preserve the ordering of the source sequence.
WithCancellation | Specifies that PLINQ should periodically monitor the state of the provided cancellation token and cancel execution if it is requested.
WithDegreeOfParallelism | Specifies the maximum number of processors that PLINQ should use to parallelize the query.
WithMergeOptions | Provides a hint about how PLINQ should, if it is possible, merge parallel results back into just one sequence on the consuming thread.
WithExecutionMode | Specifies whether PLINQ should parallelize the query even when the default behavior would be to run it sequentially.
ForAll | A multithreaded enumeration method that, unlike iterating over the results of the query, enables results to be processed in parallel without first merging back to the consumer thread.
Aggregate overload | An overload that is unique to PLINQ and enables intermediate aggregation over thread-local partitions, plus a final aggregation function to combine the results of all partitions.
using System.Diagnostics; // Stopwatch

IEnumerable<string> files = Directory.EnumerateFiles("/usr/share/man", "*.gz", SearchOption.AllDirectories);
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 2; i++)
{
sw.Restart();
var parallelLetters = files.AsParallel()
.Select(SplitLetters)
.SelectMany(w => w)
.GroupBy(char.ToLower)
.OrderByDescending(g => g.Count())
.First();
Console.WriteLine($"Parallel: {parallelLetters.Key}: {parallelLetters.Count()}, {sw.ElapsedMilliseconds}ms");
sw.Restart();
var sequentialLetters = files // .AsParallel().AsSequential()
.Select(SplitLetters)
.SelectMany(w => w)
.GroupBy(char.ToLower)
.OrderByDescending(g => g.Count())
.First();
Console.WriteLine($"Sequential: {sequentialLetters.Key}: {sequentialLetters.Count()}, {sw.ElapsedMilliseconds}ms");
}
static IEnumerable<char> SplitLetters(string fileName)
{
using StreamReader reader = new StreamReader(fileName);
string? line;
while ((line = reader.ReadLine()) != null)
{
foreach (char c in line.ToCharArray())
{
if (char.IsLetter(c))
yield return c;
}
}
}
// $ dotnet run
// Parallel: e: 251378, 2242ms
// Sequential: e: 251378, 1996ms
// Parallel: e: 251378, 1133ms
// Sequential: e: 251378, 1824ms
You can avoid performance bottlenecks and enhance the overall responsiveness of your application by using asynchronous programming. However, traditional techniques for writing asynchronous applications can be complicated, making them difficult to write, debug, and maintain.
C# supports a simplified approach, async programming, that leverages asynchronous support in the .NET runtime. The compiler does the difficult work that the developer used to do, and your application retains a logical structure that resembles synchronous code. As a result, you get all the advantages of asynchronous programming with a fraction of the effort. [16]
Asynchrony is essential for activities that are potentially blocking, such as web access. Access to a web resource sometimes is slow or delayed. If such an activity is blocked in a synchronous process, the entire application must wait. In an asynchronous process, the application can continue with other work that doesn’t depend on the web resource until the potentially blocking task finishes.
Asynchrony proves especially valuable for applications that access the UI thread because all UI-related activity usually shares one thread. If any process is blocked in a synchronous application, all are blocked. Your application stops responding, and you might conclude that it has failed when instead it’s just waiting.
When you use asynchronous methods, the application continues to respond to the UI. You can resize or minimize a window, for example, or you can close the application if you don’t want to wait for it to finish.
The async-based approach adds the equivalent of an automatic transmission to the list of options that you can choose from when designing asynchronous operations. That is, you get all the benefits of traditional asynchronous programming but with much less effort from the developer.
Async methods are intended to be non-blocking operations. An await expression in an async method doesn’t block the current thread while the awaited task is running. Instead, the expression signs up the rest of the method as a continuation and returns control to the caller of the async method.
The async
and await
keywords don’t cause additional threads to be created. Async methods don’t require multithreading because an async method doesn’t run on its own thread. The method runs on the current synchronization context and uses time on the thread only when the method is active. You can use Task.Run
to move CPU-bound work to a background thread, but a background thread doesn’t help with a process that’s just waiting for results to become available.
If you specify that a method is an async method by using the async
modifier, you enable the following two capabilities.
The marked async method can use await
to designate suspension points. The await operator tells the compiler that the async method can’t continue past that point until the awaited asynchronous process is complete. In the meantime, control returns to the caller of the async method.
The suspension of an async method at an await expression doesn’t constitute an exit from the method, and finally blocks don’t run.
The marked async method can itself be awaited by methods that call it.
An async method typically contains one or more occurrences of an await operator, but the absence of await expressions doesn’t cause a compiler error. If an async method doesn’t use an await
operator to mark a suspension point, the method executes as a synchronous method does, despite the async
modifier. The compiler issues a warning for such methods.
SynchronizationContext was also introduced in .NET Framework 2.0, as an abstraction for a general scheduler. In particular, SynchronizationContext’s most used method is Post
, which queues a work item to whatever scheduler is represented by that context. [17]
Consider a UI framework like Windows Forms. As with most UI frameworks on Windows, controls are associated with a particular thread, and that thread runs a message pump which runs work that’s able to interact with those controls: only that thread should try to manipulate those controls, and any other thread that wants to interact with the controls should do so by sending a message to be consumed by the UI thread’s pump. Windows Forms makes this easy with methods like Control.BeginInvoke
, which queues the supplied delegate and arguments to be run by whatever thread is associated with that Control. You can thus write code like this:
private void button1_Click(object sender, EventArgs e)
{
ThreadPool.QueueUserWorkItem(_ =>
{
string message = ComputeMessage();
button1.BeginInvoke(() =>
{
button1.Text = message;
});
});
}
That will offload the ComputeMessage()
work to be done on a ThreadPool thread (so as to keep the UI responsive while it’s being processed), and then when that work has completed, queue a delegate back to the thread associated with button1
to update button1’s label. Easy enough. WPF has something similar, just with its Dispatcher
type:
private void button1_Click(object sender, RoutedEventArgs e)
{
ThreadPool.QueueUserWorkItem(_ =>
{
string message = ComputeMessage();
button1.Dispatcher.InvokeAsync(() =>
{
button1.Content = message;
});
});
}
Each application model then publishes, as SynchronizationContext.Current, a SynchronizationContext-derived type that does the "right thing." For example, Windows Forms has this:
public sealed class WindowsFormsSynchronizationContext : SynchronizationContext, IDisposable
{
public override void Post(SendOrPostCallback d, object? state) =>
_controlToSendTo?.BeginInvoke(d, new object?[] { state });
...
}
and WPF has this:
public sealed class DispatcherSynchronizationContext : SynchronizationContext
{
public override void Post(SendOrPostCallback d, Object state) =>
_dispatcher.BeginInvoke(_priority, d, state);
...
}
SynchronizationContext makes it possible to call reusable helpers and automatically be scheduled back whenever and to wherever the calling environment deems fit. As a result, it’s natural to expect that to "just work" with async/await, and it does.
button1.Text = await Task.Run(() => ComputeMessage());
That invocation of ComputeMessage
is offloaded to the thread pool, and upon the method’s completion, execution transitions back to the UI thread associated with the button, and the setting of its Text
property happens on that thread.
That integration with SynchronizationContext is left up to the awaiter implementation (the code generated for the state machine knows nothing about SynchronizationContext), as it’s the awaiter that is responsible for actually invoking or queueing the supplied continuation when the represented asynchronous operation completes. While a custom awaiter need not respect SynchronizationContext.Current
, the awaiters for Task
, Task<TResult>
, ValueTask
, and ValueTask<TResult>
all do. That means that, by default, when you await a Task
, a Task<TResult>
, a ValueTask
, a ValueTask<TResult>
, or even the result of a Task.Yield()
call, the awaiter by default will look up the current SynchronizationContext
and then if it successfully got a non-default one, will eventually queue the continuation to that context.
The ConfigureAwait method isn’t special: it’s not recognized in any special way by the compiler or by the runtime. It is simply a method that returns a struct (a ConfiguredTaskAwaitable
) that wraps the original task it was called on as well as the specified Boolean value. Remember that await can be used with any type that exposes the right pattern. By returning a different type, it means that when the compiler accesses the instance’s GetAwaiter
method (part of the pattern), it’s doing so off of the type returned from ConfigureAwait rather than off of the task directly, and that provides a hook to change the behavior of how the await behaves via this custom awaiter. ConfigureAwait(continueOnCapturedContext: false)
is used to avoid forcing the callback to be invoked on the original context or scheduler. [18]
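For instance, a hypothetical library helper (the name LoadAsync and its parameter are illustrative) would typically opt out of resuming on the captured context:

// Library code has no reason to resume on the caller's UI context,
// so it passes continueOnCapturedContext: false.
async Task<string> LoadAsync(string path)
{
    using var reader = new StreamReader(path);
    // The continuation may run on whatever thread completed the read.
    return await reader.ReadToEndAsync().ConfigureAwait(false);
}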
> What happens on Thread.Sleep(0) in .NET? * In .NET, it’s effectively a way to signal to the operating system that the thread is willing to give up the rest of its slice of processor time, if there are other threads that are ready to run on the same processor. This can be useful to prevent a thread from consuming too much CPU time in a busy-wait scenario, or when you want to give other threads the chance to run. Remember that if no other thread of equal priority is ready to run, the call returns immediately and the current thread keeps running.
//
// Summary:
// Sets the number of requests to the thread pool that can be active concurrently.
// All requests above that number remain queued until thread pool threads become
// available.
//
// Parameters:
// workerThreads:
// The maximum number of worker threads in the thread pool.
//
// completionPortThreads:
// The maximum number of asynchronous I/O threads in the thread pool.
//
// Returns:
// true if the change is successful; otherwise, false.
public static bool SetMaxThreads(int workerThreads, int completionPortThreads);
public static bool SetMinThreads(int workerThreads, int completionPortThreads);
> What are the worker and completion port threads in SetMaxThreads(int workerThreads, int completionPortThreads)? * The thread pool maintains two types of threads: worker threads, which for the most part handle compute-bound operations, and I/O (completion port) threads, which, as the name suggests, handle I/O-bound operations. Asynchronous I/O operations, which are operations that complete at some later point, often use callback methods to signal completion. When the system calls these callback methods, it does so using a thread from the ThreadPool. It’s typically not necessary to change the ThreadPool size, because the ThreadPool already optimizes the number of threads it uses based on the number of CPUs and the types of tasks it’s running. In most cases, the ThreadPool is efficient in managing its threads. Its algorithm aims to efficiently use all available CPU resources while not overloading the system with too many threads at the same time. It tries to maintain a balance.
> How does .NET identify I/O-bound or compute-bound operations? * The .NET runtime itself doesn’t actively categorize tasks as I/O-bound or compute-bound. It’s up to developers to ensure that appropriate APIs and techniques are used for I/O-bound and compute-bound tasks. For I/O-bound operations, which typically include file, database, and network operations, you should use asynchronous methods when they’re available to avoid blocking (or tying up) a thread from the thread pool. When an I/O-bound operation is asynchronous, this generally means the operation is offloaded to the operating system, freeing up the .NET runtime thread to perform other work. I/O bound means the data is transferred between the CPU and the peripheral devices (disk, network, etc.), during which the CPU isn’t doing much work. On the other hand, when a task is compute-bound, meaning it involves heavy computations and keeps the processor busy, developers should generally use synchronous methods and potentially multiple threads to utilize more cores and finish the computations faster. This kind of operation is typically done in parallel to split the work across multiple threads, where each thread uses synchronous code to do its part of the work. So, while .NET doesn’t automatically distinguish between I/O-bound and compute-bound operations, it provides facilities (like asynchronous methods for I/O-bound tasks and parallel processing techniques such as the TPL for compute-bound tasks) for developers to write efficient code for both types of operations.
> How to understand 'It’s up to developers…'? * What "It’s up to developers…" means is that the .NET runtime, or indeed any other runtime, cannot automatically discern if the operation you’re performing is I/O-bound or compute-bound. It’s the responsibility of the developer writing the code to understand what kind of operation they’re performing and to use the correct method to handle it. For example, use asynchronous APIs (such as Stream.ReadAsync) for I/O-bound work, and parallel constructs (such as Parallel.For) for compute-bound work.
The appropriate programming approach (asynchronous or multithreaded) is dictated by the nature of the operation. The developer must know their application, the tasks, and their nature, and then use the appropriate method. The .NET runtime will not do this discernment automatically.
> How does CLR manage the number of threads (worker and I/O threads) in the ThreadPool? * The CLR, or Common Language Runtime, uses a set algorithm to manage the number of threads in its ThreadPool. Upon application startup, the ThreadPool possesses few threads. As requests come in, threads are added to fulfill those requests. The CLR considers several factors to make an adjustment, including the workload demand, number of processors, and the number of existing threads. For each pending task or work item, a check is performed to see if a thread is available. If not, and the current thread count is below the maximum limit, a new thread is created, unless the thread injection "rate" limit has been reached. If a ThreadPool thread is idle for some time (around a minute), it gets terminated, which reduces resource usage. If the thread count reaches the max limit and the requests are still not fulfilled, they are queued. "Worker" threads are commonly used for executing synchronous and asynchronous managed code, and "I/O threads" are specifically designated for completing I/O-bound work. Both the maximum threads and "rate" are automatically set based on several factors like the size of virtual memory, number of processors available, etc. The thread injection rate (how often new threads are added) helps prevent situations where rapidly creating too many threads would cause resource saturation and decrease performance. You can manually get and change the minimum and maximum number of threads in the ThreadPool by using the ThreadPool.GetMinThreads/SetMinThreads and ThreadPool.GetMaxThreads/SetMaxThreads methods. However, it’s recommended to let the CLR manage the threading details because of its smart heuristics and tuning based on extensive research and testing. Custom changes should only be made after careful testing and analysis.
> What’s the algorithm of the thread pool in .NET? * According to .NET’s official documentation, the CLR employs a hill-climbing algorithm for adjusting the number of ThreadPool threads that can run simultaneously. This heuristic algorithm is designed specifically to find the most efficient and effective number of threads for any given workload. Here’s the idea, in broad strokes: the pool periodically adjusts the number of active threads, measures the resulting throughput (tasks completed per unit of time), and keeps moving the thread count in whichever direction improves it.
This algorithm works to maintain a balance between the number of threads and the amount of work that needs to be done. By ensuring threads are only created as needed, and cleaning up idle threads, it helps to optimize the system resources and performance. It’s important to note that the .NET ThreadPool’s algorithm is intentionally not publicly documented in detail because it’s subject to change as Microsoft fine-tunes it based on new findings and requirements. As always, it’s recommended to let the CLR manage the ThreadPool for you, unless you have a specific reason and have done careful performance analysis.
> About the Interlocked class in .NET, what if we increment a 64-bit integer on 32-bit hardware? * The Interlocked class still guarantees atomicity in that case. In a multithreaded or parallel computing scenario, multiple threads might be trying to increment the value of a shared 64-bit integer simultaneously. In such a case, Interlocked.Increment ensures that the read-modify-write sequence happens as one indivisible operation. A plain operation on a 64-bit integer (long/Int64) on 32-bit hardware requires two CPU instructions, because the size of the long value is greater than the word size of the 32-bit processor. But even then, the .NET runtime ensures via the Interlocked class that the operation is performed atomically. Still, it’s important to note that such atomic operations come with a slightly increased performance overhead as compared to non-atomic operations, due to preventing reordering and caching of the value being incremented. Thus, they must be used judiciously.
> How does CLR make the two CPU instructions atomic? * The .NET Common Language Runtime (CLR) and the underlying operating system provide certain guarantees to ensure atomicity of operations on 64-bit values, even on 32-bit platforms. On Windows, for example, this can be achieved using a combination of OS-provided synchronization primitives and specific processor instructions; on 32-bit x86, for instance, a LOCK-prefixed CMPXCHG8B instruction can compare and exchange a full 64-bit value atomically. When you call an Interlocked method such as Interlocked.Increment on a long, the runtime compiles it down to such an instruction. These atomic operations prevent any thread interference or inconsistencies while performing read-modify-write actions by ensuring that the operation is completed as a single, uninterruptible unit.
> I heard there are some risks on atomic operations in Go or sth else? * Yes, coding with atomic operations can indeed be risky, not just in Go but in any language, if they are misused or misunderstood. Here are a few reasons why this is the case: atomic operations only protect a single access to a single variable, so invariants spanning several variables can still race; their memory-ordering semantics are subtle and depend on the hardware, the compiler, and the language memory model; and lock-free algorithms built on them are prone to issues such as the ABA problem (see the next question).
In Go, for example, the sync/atomic package’s own documentation advises that, except for special low-level applications, synchronization is better done with channels or the facilities of the sync package. So while atomic operations are a powerful tool in concurrent programming, they are best suited for a limited set of problems and should be used judiciously and cautiously, usually when performance is a deciding factor and you’re comfortable and proficient with the complexities of concurrent system behavior and memory models.
> What’s the ABA problem? * The ABA problem is a particularly tricky issue that can occur in multi-threaded or concurrent programs that use lock-free data structures and make use of atomic primitives, such as Compare-and-Swap (CAS). Here’s a brief explanation of the ABA problem: Thread A reads a shared value and observes A, then gets preempted. Meanwhile, Thread B changes the value from A to B and then back to A. When Thread A resumes and performs its compare-and-swap, the operation succeeds because the value is again A, even though the underlying state changed in between.
The result here is that Thread A acted on the assumption that nothing had changed when actually something had, but it was effectively invisible to it. This can have outcomes ranging from inconsequential to catastrophic, depending on what the shared data is used for. This is the ABA problem. This issue is typically resolved by assigning a version number or a timestamp along with every change so that the revisiting thread can check not only the value but whether it is the same instance of the value it observed before. This kind of solution is often referred to as "tagging". Double-width Compare-and-Swap (DWCAS) or similar atomic primitives are used to atomically change the value and its associated version number or tag.
> How to understand 'hardware, compilers, and the language memory model'? * Understanding 'hardware, compilers, and the language memory model' is important when working with lower-level concurrent constructs such as atomic operations or volatile fields. Let me break them down: hardware can make memory operations visible out of program order (caches, store buffers, out-of-order execution); compilers and the JIT may reorder, combine, or eliminate reads and writes as optimizations; and the language memory model defines which ordering and visibility guarantees your code may rely on despite all of that.
All these can significantly impact the correctness, performance, and portability of multi-threaded code. Learning about them usually involves studying computer architecture, compilers, operating systems, and of course, the programming language’s specification. For most developers, however, using higher-level synchronization primitives and libraries that abstract these details away is sufficient and recommended.
> Anyway, for a single operation like Interlocked.Increment, is it safe? * Yes, for the Interlocked.Increment method, atomicity is guaranteed. This method guarantees that the increment operation, which is otherwise a read-modify-write operation consisting of multiple steps, is executed in a single, unbroken step. No context switch or thread interference can happen in the middle of the operation that could leave the data in an inconsistent state. It can be used safely in a multithreaded and concurrent environment, ensuring that all incrementing operations on a shared variable are done in full, one at a time. Please be aware that while each individual Interlocked call is atomic, a sequence of such calls is not; combining several operations still requires a lock or another synchronization mechanism.
> How to understand the volatile keyword in .NET? And how to understand "There is no guarantee of a single total ordering of volatile writes as seen from all threads of execution."? * The volatile keyword tells the compiler and the runtime that a field may be accessed by multiple threads at once, so reads and writes of it must not be cached in registers or reordered in certain ways. In multithreaded applications, the keyword is typically applied to a simple field (for example, a flag) that one thread writes and another thread polls. The quote "There is no guarantee of a single total ordering of volatile writes as seen from all threads of execution" means that the order in which threads observe volatile writes may not be consistent across all threads. While each individual thread sees its own volatile reads and writes in a constrained order, different threads can disagree about the global order of writes to different memory locations. Here’s an example to illustrate:
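(A reconstructed sketch; the fields _a and _b and the two methods are hypothetical names for illustration.)

class VolatileOrdering
{
    private volatile int _a;
    private volatile int _b;

    public void Writer() // Thread 1
    {
        _a = 1; // a volatile write cannot be reordered after the later write to _b
        _b = 1;
    }

    public void Reader() // Thread 2
    {
        if (_b == 1)
        {
            // The volatile read of _b has acquire semantics, so the read of _a
            // below cannot move before it; the value observed still depends on
            // timing relative to the writer.
            Console.WriteLine(_a);
        }
    }
}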
The volatile keyword ensures that Thread 1 will write to variable A before variable B. If Thread 2 reads variable B, it will see this write (or a later one). However, a volatile read of variable A performed elsewhere can still return an older value. This is because a volatile read has only 'acquire' semantics; it doesn’t force a refresh of all other modifications. This is why it’s often recommended to use higher-level synchronization constructs (like the lock statement, Monitor, or Interlocked) when multiple variables must stay consistent with each other.
> How to understand "On a multiprocessor system, a volatile read operation does not guarantee to obtain the latest value written to that memory location by any processor."? * This statement draws attention to an important fact about volatile keyword usage in .NET on a multiprocessor system where threads could be executing on different processors. The volatile keyword in C# tells the compiler not to reorder reads or writes to a field, or cache the field in a register, which helps ensure that all reads and writes on one processor occur in the order they’re specified in the code. Because of this, the volatile keyword is helpful for ensuring consistency in a multithreaded context. However, it does not guarantee immediate visibility on all processors. For example, if one processor writes to a volatile field and another processor reads from that field, the reading processor might not get the most recent value written by the writing processor. The reasons can include delay due to propagation of write from one processor’s cache to others (in a multiprocessor, multi-caching systems) or the order in which these reads and writes are performed. This is why it’s stated that "On a multiprocessor system, a volatile read operation does not guarantee to obtain the latest value written to that memory location by any processor." In general, for complex synchronization or if strong cross-processor memory guarantees are required, usage of higher-level synchronization primitives or constructs (like locks, |
> What’s the diff of the volatile keyword and the Volatile class? * The volatile keyword is a field modifier: every read and write of that field is treated as volatile, everywhere in the code. The Volatile class instead provides static Volatile.Read and Volatile.Write methods that let you perform a volatile access on demand, on an individual read or write. The Volatile class also works in places where the keyword cannot be used, for example on long and double fields, where Volatile.Read and Volatile.Write additionally guarantee atomicity. So, while both are used for volatile memory operations, the keyword applies blanket semantics to a field, whereas the Volatile class gives you explicit, per-access control.
> It’s about the happens-before relation in a memory model. It seems we should avoid using the volatile keyword? * In general, understanding and correctly using the volatile keyword requires a solid grasp of the memory model, and it’s easy to get wrong. The primary purpose of volatile is to guarantee visibility and limited ordering for a single field, not mutual exclusion. However, using it correctly for anything beyond a simple flag is subtle. Higher-level concurrency constructs (the lock statement, Monitor, Interlocked, SemaphoreSlim, and so on) express intent more clearly and are much harder to misuse.
In the end, prefer those higher-level constructs, and reach for volatile only in well-understood, low-level scenarios.
> What’s the diff of asynchronous and parallel programming in .NET? * In .NET, both asynchronous and parallel programming paradigms are used to execute code that can potentially run concurrently to improve performance. However, they are used in different scenarios and address different types of problems.
Asynchronous programming is about freeing up the main executing thread to do other work when it would otherwise be blocked, usually waiting for I/O operations (like reading a file, downloading from the internet, querying a database, etc.). The goal of asynchronous programming is to make the application more responsive. With async programming, the application can yield control while it waits on these long-running tasks, allowing it to perform other tasks concurrently. In .NET, you often see the async and await keywords and Task-returning APIs used for this.
Parallel programming, on the other hand, is about executing multiple operations, or tasks, at the same time by utilizing multiple cores of the CPU. This is particularly helpful for CPU-bound work that can be broken down into smaller, independent tasks and processed in parallel. For example, if you’re processing an array of elements, you can use parallel programming to process multiple elements at the same time on different cores. In .NET, this could be achieved using the Parallel class (Parallel.For, Parallel.ForEach), PLINQ, or tasks from the Task Parallel Library; see the sketch below.
In summary, use asynchronous programming when dealing with I/O-bound tasks to avoid blocking and keep the application responsive, and use parallel programming when breaking CPU-intensive work into pieces that run simultaneously to improve throughput.
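A minimal sketch of the two paradigms side by side (FetchAsync and SumOfSquares are illustrative names):

using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class AsyncVsParallel
{
    // I/O-bound: async frees the calling thread while the download is in flight.
    static async Task<string> FetchAsync(HttpClient client, string url) =>
        await client.GetStringAsync(url);

    // CPU-bound: Parallel.For spreads the loop across the available cores.
    static long SumOfSquares(int[] data)
    {
        long total = 0;
        Parallel.For(0, data.Length,
            () => 0L,                                     // per-thread running total
            (i, _, local) => local + (long)data[i] * data[i],
            local => Interlocked.Add(ref total, local));  // merge once per thread
        return total;
    }
}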
> What’s the 'control' meaning in async and await programming? * In the context of asynchronous programming with async and await, 'control' refers to the flow of execution. When you call an async method and execution reaches an await on a task that hasn’t completed yet, the method suspends itself and returns control to its caller, meaning the calling thread is free to continue with other work instead of blocking. When the awaited operation (such as an I/O operation, web request, etc.) completes, the remainder of the async method, its continuation, is scheduled to resume. So "yielding control" in async/await means temporarily handing execution back to the caller while waiting, rather than occupying a thread for the duration of the wait.
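A minimal console sketch of control moving back and forth; the numbered lines typically print in order 1 through 5:

using System;
using System.Threading.Tasks;

class ControlDemo
{
    static async Task Main()
    {
        Console.WriteLine("1: caller starts");
        Task work = DoWorkAsync();   // runs synchronously until its first await
        Console.WriteLine("3: control is back in the caller");
        await work;
        Console.WriteLine("5: caller resumes after the work completes");
    }

    static async Task DoWorkAsync()
    {
        Console.WriteLine("2: inside DoWorkAsync, before await");
        await Task.Delay(50);        // yields control back to the caller here
        Console.WriteLine("4: continuation runs after the delay");
    }
}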
> What’s underneath the 'control', like a thread or something else? * "Control" primarily refers to the sequence of execution of code, the flow of a program; it is not itself a thread. When we talk about control in the context of async/await, we mean which piece of code the current thread will run next.
At a lower level in the .NET runtime, this "control" is facilitated by the task scheduler and, in many scenarios, the synchronization context. Here’s a simplified view: when a method awaits an incomplete operation (e.g., a Task that has not finished), the compiler-generated state machine captures the current SynchronizationContext (or, failing that, the current TaskScheduler) and registers the rest of the method as a continuation to be run through it.
In a UI-based application, the synchronization context is typically tied to the main UI thread, which allows the continuation (the code following the await keyword) to be executed on that same UI thread. This makes async programming easier and safer in a UI context. In an environment without a SynchronizationContext (like console applications or thread pool threads), the default TaskScheduler schedules the continuation to run on the ThreadPool.
With this in mind, when you write await, you are effectively saying: suspend here, hand control back to the caller, and resume this method later via the captured context or scheduler. Again, this is a simplified view. In reality, there are several optimizations and mechanisms in place to make async programming more efficient and easier to use.
> Does it mean the 'someTask' will be executed on another thread? * Not necessarily. When you call an async method, it does its work on the same thread that called it until it awaits an incomplete task. At the await, if the task isn’t already complete, the async method is paused and control returns to the caller of the async method. This all happens on the same thread, and no extra threads are created. What happens next depends on what the awaited task is doing. Here’s a simple way to think about it:
When the awaited operation completes, the remainder of the async method is scheduled to continue running. The scheduling depends on the context captured when the 'await' keyword was encountered. In a UI application, it resumes on the UI thread (as that is the captured context); in a server application, it generally runs on a ThreadPool thread, not necessarily the same thread it started on. Remember, in most cases, async doesn’t mean concurrent or multi-threaded. Async methods don’t require multithreading because an async method doesn’t run on its own thread: the method runs on the current synchronization context and uses time on a thread only when the method is active. You can use Task.Run to move CPU-bound work to a background thread explicitly when that is what you need.
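A minimal console sketch that makes this visible by printing managed thread IDs (the exact IDs vary from run to run):

using System;
using System.Threading.Tasks;

class ThreadIdDemo
{
    static async Task Main()
    {
        Console.WriteLine($"Before await: thread {Environment.CurrentManagedThreadId}");
        await Task.Delay(100);  // no thread is blocked while the delay is pending
        // A console app has no SynchronizationContext, so the continuation
        // runs on a ThreadPool thread, often a different one than before.
        Console.WriteLine($"After await:  thread {Environment.CurrentManagedThreadId}");
    }
}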
> How to understand "Async methods don’t require multithreading because an async method doesn’t run on its own thread."? * In asynchronous programming, the fundamental misunderstanding is often thinking that "asynchronous" automatically means "on a different thread". This is not strictly true. When you mark a method with An At that point, assuming the task being awaited has not yet completed, the method yields and the remainder of the method is turned into a continuation that will be invoked once the awaited operation completes. What’s important to understand here is that the continuation does not have to run on the same thread where it left off - it could run on whatever thread the completed task calls back on. So an However, if you were to use So to sum it up, |
> Can the async/await improve the responsiveness on ASP.NET Core? * Yes, utilizing async/await can significantly improve the scalability of an ASP.NET Core application, and with it the responsiveness under load. The primary benefit of using async/await on the server is thread economy: request-handling threads are not held hostage by I/O waits. In a synchronous ASP.NET Core application without async/await, each in-flight request occupies a thread for its entire duration, so under heavy load the thread pool can be exhausted and new requests must queue. In ASP.NET Core, when you use async methods and await async tasks, the thread handling a request can be used for starting another request as soon as the current one enters a waiting state (awaiting an I/O-bound operation). So by leveraging async/await, the same pool of threads can serve far more concurrent requests.
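As an illustration, a minimal sketch of an async endpoint; OrdersController, IOrderRepository, and FindAsync are hypothetical names, not part of ASP.NET Core itself:

using Microsoft.AspNetCore.Mvc;
using System.Threading.Tasks;

public interface IOrderRepository          // hypothetical dependency
{
    Task<object?> FindAsync(int id);
}

[ApiController]
[Route("api/orders")]
public class OrdersController : ControllerBase
{
    private readonly IOrderRepository _repository;

    public OrdersController(IOrderRepository repository) => _repository = repository;

    [HttpGet("{id}")]
    public async Task<IActionResult> Get(int id)
    {
        // While the lookup is in flight, this request thread goes back to the
        // pool and can start serving other requests.
        var order = await _repository.FindAsync(id);
        if (order is null) return NotFound();
        return Ok(order);
    }
}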
> It seems that it is because the I/O operations are offloaded from the current thread, and release the CPU capacity. But who should be in charge of the I/O operations? * Yes, that’s correct; the benefit of async I/O comes precisely from offloading the wait. The operating system is in charge of coordinating and managing I/O operations. When you start an I/O-bound operation, like reading from disk or making a network request, the request is handed to the operating system, which interfaces with the necessary hardware, such as the disk controller or network card. From your application’s perspective, once you start an I/O operation and await it, no thread in your process is tied up while the hardware does its work; when the operation completes, the OS notifies the runtime (on Windows, for example, through I/O completion ports) and the awaiting method’s continuation is scheduled. So you can think of I/O-bound async work as "no thread at all while waiting", rather than "another thread doing the waiting for you".
> Actually, if we use sync methods, the current thread will be in wait status for the I/O, but it still can free up CPU. * Yes, you’re correct. Even with synchronous I/O operations, the CPU isn’t actively engaged the whole time; the operating system still manages the I/O operation, and the CPU is freed up to perform other tasks. However, the key difference lies in the usage of threads. In synchronous operations, the thread performing the operation goes into a wait state and cannot be used for anything else. This means the application can handle fewer concurrent operations, as it might run out of available threads if many I/O operations are waiting to complete. With asynchronous operations using async/await, the thread is released back to the thread pool as soon as the wait begins and can serve other work in the meantime. So while both sync and async free up the CPU during I/O operations, async also allows the thread itself to be reused, leading to more efficient utilization of system resources.
> This is the non-blocking I/O in essence? * Exactly; using async/await over truly asynchronous APIs is non-blocking I/O in essence. In blocking or synchronous I/O, the execution thread is halted, or "blocked", until the I/O operation completes. This means the thread can’t be used for any other tasks, regardless of how long the I/O operation takes. Non-blocking or asynchronous I/O, on the other hand, starts the I/O operation and then allows the execution thread to continue with other tasks. It doesn’t wait for the I/O operation to complete; instead, it is notified once the I/O operation is done, at which point it can proceed with any steps that depend on the result. Thus, by enabling non-blocking I/O, the async/await pattern lets a small number of threads drive a large number of concurrent I/O operations.
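A minimal sketch of the effect: a hundred requests in flight at once, driven by only a handful of threads (https://example.com is a placeholder URL):

using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

class NonBlockingDemo
{
    static async Task Main()
    {
        using var client = new HttpClient();
        var tasks = Enumerable.Range(0, 100)
                              .Select(_ => client.GetStringAsync("https://example.com"));
        // While all 100 requests are pending, no thread in the process is
        // blocked; the OS signals completions and continuations run on the pool.
        string[] bodies = await Task.WhenAll(tasks);
        Console.WriteLine(bodies.Length);
    }
}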
* Here’s a summary of the key points regarding the use of async/await for I/O-bound work:
During an I/O operation the CPU is largely idle either way; the operating system drives the actual I/O.
With synchronous I/O, the calling thread blocks for the duration and cannot serve other work.
With async/await, the thread is released during the wait and reused, which is non-blocking I/O in essence.
This thread economy is what improves scalability and responsiveness under load, especially on servers.
> Is there a SynchronizationContext on ASP.NET Core? * No, there’s no SynchronizationContext in ASP.NET Core. In the traditional .NET Framework, several environments install one: WinForms and WPF each provide a UI synchronization context that marshals await continuations back to the UI thread. In ASP.NET on the .NET Framework, an AspNetSynchronizationContext ensured that continuations resumed with the original request context, which is also why ConfigureAwait(false) became a common optimization in library code of that era. However, ASP.NET Core does not have a SynchronizationContext at all; continuations after an await simply resume on a thread pool thread, so ConfigureAwait(false) is not needed for context reasons in ASP.NET Core application code. With that said, developers need to be aware that the code after an await may run on a different thread than the code before it, so they should not rely on thread-affine state across an await.
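For library code that may also run on .NET Framework hosts where a context does exist, ConfigureAwait(false) remains reasonable hygiene; a minimal sketch (FileUtil is an illustrative name):

using System.IO;
using System.Threading.Tasks;

public static class FileUtil
{
    // ConfigureAwait(false) tells the awaiter not to capture a
    // SynchronizationContext. On ASP.NET Core there is none, so this is
    // effectively a no-op there, but it still protects callers on
    // context-bound hosts such as WPF or classic ASP.NET.
    public static async Task<string> ReadAllTextAsync(string path)
    {
        using var reader = new StreamReader(path);
        return await reader.ReadToEndAsync().ConfigureAwait(false);
    }
}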
> What’s the diff of AsOrdered and AsUnordered in PLINQ? * In PLINQ, the methods AsOrdered and AsUnordered control whether a parallel query preserves the ordering of its source sequence. AsOrdered tells PLINQ to treat the source as ordered, so results are produced in source order; this requires the query to track where each element came from and to buffer results, which adds overhead. AsUnordered tells PLINQ that ordering is not required from that point onward in the query, removing the bookkeeping and letting results be yielded as soon as they are computed.
It’s important to note that, by default, PLINQ does not preserve the original order of the source; you opt in with AsOrdered (or an explicit OrderBy) when ordering matters, and AsUnordered can switch ordering back off for later stages of the same query. Partitioning is the process in which PLINQ divides the source data into multiple chunks, or partitions, so that it can process them independently and concurrently on multiple cores. Order preservation affects this partitioning process: an ordered query must remember each element’s original position and buffer results before yielding them, which constrains the partitioning strategies PLINQ can use and adds coordination cost.
Generally, if the order in which results are produced doesn’t matter, you should use AsUnordered (or simply not request ordering in the first place), since unordered queries avoid that bookkeeping and usually run faster and scale better.
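A minimal sketch contrasting the two (the first five ordered squares are 0, 1, 4, 9, 16):

using System;
using System.Linq;

class PlinqDemo
{
    static void Main()
    {
        var numbers = Enumerable.Range(0, 1000);

        // Opt in to source ordering: results come back in source order.
        int[] ordered = numbers.AsParallel()
                               .AsOrdered()
                               .Select(n => n * n)
                               .Take(5)
                               .ToArray();

        // No ordering requested (the PLINQ default): results may arrive in
        // any order, skipping the buffering that order preservation needs.
        int[] unordered = numbers.AsParallel()
                                 .Select(n => n * n)
                                 .ToArray();

        Console.WriteLine(string.Join(", ", ordered));  // 0, 1, 4, 9, 16
        Console.WriteLine(unordered.Length);            // 1000
    }
}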