Parallel programming in .NET

"Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once." — Rob Pike

1. Threads and threading
2. Asynchronous programming patterns
3. Lazy Initialization
4. Parallel programming in .NET
5. Asynchronous programming with async and await
Appendix A: FAQ
References

1. Threads and threading

Multithreading allows you to increase the responsiveness of your application and, if your application runs on a multiprocessor or multi-core system, increase its throughput. [1]

1.1. Processes and threads

A process is an executing program. An operating system uses processes to separate the applications that are being executed.

A thread is the basic unit to which an operating system allocates processor time. Each thread has a scheduling priority and maintains a set of structures the system uses to save the thread context when the thread’s execution is paused.

The thread context includes all the information the thread needs to seamlessly resume execution, including the thread’s set of CPU registers and stack. Multiple threads can run in the context of a process. All threads of a process share its virtual address space. A thread can execute any part of the program code, including parts currently being executed by another thread.

.NET Framework provides a way to isolate applications within a process with the use of application domains. (Application domains are not available on .NET Core.)

By default, a .NET program is started with a single thread, often called the primary thread. However, it can create additional threads to execute code in parallel or concurrently with the primary thread. These threads are often called worker threads.

1.2. How to use multithreading in .NET

Starting with .NET Framework 4, the recommended way to utilize multithreading is to use Task Parallel Library (TPL) and Parallel LINQ (PLINQ).

Both TPL and PLINQ rely on ThreadPool threads. The System.Threading.ThreadPool class provides a .NET application with a pool of worker threads. You can also use thread pool threads directly.

Finally, you can use the System.Threading.Thread class, which represents a managed thread.

1.3. Using threads and threading

With .NET, you can write applications that perform multiple operations at the same time. Operations with the potential of holding up other operations can execute on separate threads, a process known as multithreading or free threading. [2]

Applications that use multithreading are more responsive to user input because the user interface stays active as processor-intensive tasks execute on separate threads. Multithreading is also useful when you create scalable applications because you can add threads as the workload increases.

1.3.1. Create and start a new thread

You create a new thread by creating a new instance of the System.Threading.Thread class. You pass the method to execute on the new thread (as a method name, delegate, or lambda expression) to the constructor. To start a created thread, call the Thread.Start method.

new Thread(() => Console.WriteLine("Hello Thread")).Start();

1.3.2. Stop a thread

To terminate the execution of a thread, use the System.Threading.CancellationToken. It provides a unified way to stop threads cooperatively.

Sometimes it’s not possible to stop a thread cooperatively because it runs third-party code not designed for cooperative cancellation. In this case, you might want to terminate its execution forcibly. To terminate the execution of a thread forcibly, in .NET Framework you can use the Thread.Abort method. That method raises a ThreadAbortException on the thread on which it’s invoked.

The Thread.Abort method isn’t supported in .NET Core. If you need to terminate the execution of third-party code forcibly in .NET Core, run it in a separate process and use the Process.Kill method.

The System.Threading.CancellationToken isn’t available before .NET Framework 4. To stop a thread in older .NET Framework versions, use thread synchronization techniques to implement cooperative cancellation manually. For example, you can create a volatile Boolean field, shouldStop, and use it to request that the code executed by the thread stop.

Use the Thread.Join method to make the calling thread wait for the termination of the thread being stopped.
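
As a minimal sketch of the manual approach described above, the worker below polls a volatile flag, and the caller uses Thread.Join to wait for the thread to finish. The Worker class and its member names are hypothetical and exist only for this illustration; only the shouldStop idea comes from the text.

```csharp
using System;
using System.Threading;

var worker = new Worker();
var thread = new Thread(worker.Run);
thread.Start();

Thread.Sleep(100);     // let the worker run for a moment
worker.RequestStop();  // ask the worker to stop cooperatively
thread.Join();         // wait until the worker thread has terminated

Console.WriteLine($"Worker stopped: {worker.Stopped}");

// Hypothetical worker type used only for this illustration.
class Worker
{
    // volatile makes the write in RequestStop visible to the polling loop.
    private volatile bool _shouldStop;

    public bool Stopped { get; private set; }

    public void RequestStop() => _shouldStop = true;

    public void Run()
    {
        while (!_shouldStop)
        {
            Thread.SpinWait(1000); // simulate a small unit of work
        }
        Stopped = true;
    }
}
```

On .NET Framework 4 and later, a CancellationToken is usually the better choice; the flag is shown only because the text mentions it for older versions.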

1.3.3. Pause or interrupt a thread

You use the Thread.Sleep method to pause the current thread for a specified amount of time. You can interrupt a blocked thread by calling the Thread.Interrupt method.

Calling the Thread.Sleep method causes the current thread to immediately block for the number of milliseconds or the time interval you pass to the method, and yields the remainder of its time slice to another thread. Once that interval elapses, the sleeping thread resumes execution. [4]

One thread cannot call Thread.Sleep on another thread. Thread.Sleep is a static method that always causes the current thread to sleep.

Calling Thread.Sleep with a value of Timeout.Infinite causes a thread to sleep until it is interrupted by another thread that calls the Thread.Interrupt method on the sleeping thread, or until it is terminated by a call to its Thread.Abort method.

You can interrupt a waiting thread by calling the Thread.Interrupt method on the blocked thread to throw a ThreadInterruptedException, which breaks the thread out of the blocking call. The thread should catch the ThreadInterruptedException and do whatever is appropriate to continue working. If the thread ignores the exception, the runtime catches the exception and stops the thread.

// Interrupts a thread that is in the WaitSleepJoin thread state.
public void Interrupt ();

WaitSleepJoin: The thread is blocked.

This could be the result of calling Sleep(Int32) or Join(), of requesting a lock - for example, by calling Enter(Object) or Wait(Object, Int32, Boolean) - or of waiting on a thread synchronization object such as ManualResetEvent.

If the target thread is not blocked when Thread.Interrupt is called, the thread is not interrupted until it blocks. If the thread never blocks, it could complete without ever being interrupted.

If a wait is a managed wait, then Thread.Interrupt and Thread.Abort both wake the thread immediately. If a wait is an unmanaged wait (for example, a platform invoke call to the Win32 WaitForSingleObject function), neither Thread.Interrupt nor Thread.Abort can take control of the thread until it returns to or calls into managed code. In managed code, the behavior is as follows:

Thread.Interrupt wakes a thread out of any wait it might be in and causes a ThreadInterruptedException to be thrown in the destination thread.
.NET Framework only: Thread.Abort wakes a thread out of any wait it might be in and causes a ThreadAbortException to be thrown on the thread.

Thread sleepingThread = new Thread(() =>
{
    Console.WriteLine("Thread '{0}' about to sleep indefinitely.", Thread.CurrentThread.Name);
    try
    {
        Thread.Sleep(Timeout.Infinite);
    }
    catch (ThreadInterruptedException)
    {
        Console.WriteLine("Thread '{0}' awoken.", Thread.CurrentThread.Name);
    }
    finally
    {
        Console.WriteLine("Thread '{0}' executing finally block.", Thread.CurrentThread.Name);
    }
    Console.WriteLine("Thread '{0}' finishing normal execution.", Thread.CurrentThread.Name);
});

sleepingThread.Name = "Sleeping";
sleepingThread.Start();
Thread.Sleep(2000);
sleepingThread.Interrupt();

// Thread 'Sleeping' about to sleep indefinitely.
// Thread 'Sleeping' awoken.
// Thread 'Sleeping' executing finally block.
// Thread 'Sleeping' finishing normal execution.

1.4. Cancellation in Managed Threads

Starting with .NET Framework 4, .NET uses a unified model for cooperative cancellation of asynchronous or long-running synchronous operations. This model is based on a lightweight object called a cancellation token. The object that invokes one or more cancelable operations, for example by creating new threads or tasks, passes the token to each operation. Individual operations can in turn pass copies of the token to other operations. At some later time, the object that created the token can use it to request that the operations stop what they are doing. Only the requesting object can issue the cancellation request, and each listener is responsible for noticing the request and responding to it in an appropriate and timely manner. [3]

The general pattern for implementing the cooperative cancellation model is:

Instantiate a CancellationTokenSource object, which manages and sends cancellation notification to the individual cancellation tokens.
Pass the token returned by the CancellationTokenSource.Token property to each task or thread that listens for cancellation.
Provide a mechanism for each task or thread to respond to cancellation.
Call the CancellationTokenSource.Cancel method to provide notification of cancellation.

// Create the token source.
CancellationTokenSource cts = new CancellationTokenSource();

// Pass the token to the cancelable operation.
ThreadPool.QueueUserWorkItem(obj =>
{
    if (obj is CancellationToken token)
    {
        for (int i = 0; i < 100000; i++)
        {
            if (token.IsCancellationRequested)
            {
                Console.WriteLine("In iteration {0}, cancellation has been requested...", i + 1);
                // Perform cleanup if necessary.
                //...
                // Terminate the operation.
                break;
            }
            // Simulate some work.
            Thread.SpinWait(500000);
        }
    }
}, cts.Token);
Thread.Sleep(2500);

// Request cancellation.
cts.Cancel();
Console.WriteLine("Cancellation set in token source...");
Thread.Sleep(2500);
// Cancellation should have happened, so call Dispose.
cts.Dispose();

// The example displays output like the following:
//       Cancellation set in token source...
//       In iteration 1430, cancellation has been requested...

The CancellationTokenSource class implements the IDisposable interface. You should be sure to call the CancellationTokenSource.Dispose method when you have finished using the cancellation token source to free any unmanaged resources it holds.

The following illustration shows the relationship between a token source and all the copies of its token.

[Figure: CancellationTokenSource and cancellation tokens]

The cooperative cancellation model makes it easier to create cancellation-aware applications and libraries, and it supports the following features:

Cancellation is cooperative and is not forced on the listener. The listener determines how to gracefully terminate in response to a cancellation request.
Requesting is distinct from listening. An object that invokes a cancelable operation can control when (if ever) cancellation is requested.
The requesting object issues the cancellation request to all copies of the token by using just one method call.
A listener can listen to multiple tokens simultaneously by joining them into one linked token.
User code can notice and respond to cancellation requests from library code, and library code can notice and respond to cancellation requests from user code.
Listeners can be notified of cancellation requests by polling, callback registration, or waiting on wait handles.
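
Two of the features listed above, linked tokens and callback registration, can be sketched as follows. This is an illustrative example rather than a prescribed pattern; LinkedTokenDemo and the variable names are hypothetical, while CreateLinkedTokenSource and Register are the standard APIs.

```csharp
using System;
using System.Threading;

Console.WriteLine(LinkedTokenDemo.Run());

// Hypothetical demo type used only for this illustration.
static class LinkedTokenDemo
{
    public static string Run()
    {
        using var userCts = new CancellationTokenSource();     // e.g. the user pressed Cancel
        using var shutdownCts = new CancellationTokenSource(); // e.g. the app is shutting down

        // The linked source is canceled as soon as either parent token is canceled.
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(
            userCts.Token, shutdownCts.Token);

        bool callbackRan = false;
        // Callback registration: the delegate runs when the linked token is canceled.
        using CancellationTokenRegistration registration =
            linked.Token.Register(() => callbackRan = true);

        shutdownCts.Cancel(); // one call notifies every listener of this source

        return $"linked={linked.Token.IsCancellationRequested}, " +
               $"user={userCts.Token.IsCancellationRequested}, callback={callbackRan}";
    }
}
// Prints: linked=True, user=False, callback=True
```

The listener only observes the linked token, so it doesn't need to know which of the two sources requested cancellation.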

In more complex cases, it might be necessary for the user delegate to notify library code that cancellation has occurred. In such cases, the correct way to terminate the operation is for the delegate to call the ThrowIfCancellationRequested method, which causes an OperationCanceledException to be thrown. Library code can catch this exception on the user delegate thread and examine the exception’s token to determine whether the exception indicates cooperative cancellation or some other exceptional situation.

The System.Threading.Tasks.Task and System.Threading.Tasks.Task<TResult> classes support cancellation by using cancellation tokens. You can terminate the operation by using one of these options:

By returning from the delegate. In many scenarios, this option is sufficient. However, a task instance that’s canceled in this way transitions to the TaskStatus.RanToCompletion state, not to the TaskStatus.Canceled state.
By throwing an OperationCanceledException and passing it the token on which cancellation was requested. The preferred way to do this is to use the ThrowIfCancellationRequested method. A task that’s canceled in this way transitions to the Canceled state, which the calling code can use to verify that the task responded to its cancellation request.
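
The difference between the two options can be observed in the final task status. The sketch below is illustrative only; CancelOutcomeDemo and its variable names are hypothetical, and a CountdownEvent is used merely to make sure both delegates are running before cancellation is requested.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var (returned, thrown) = CancelOutcomeDemo.Run();
Console.WriteLine($"Returned from delegate: {returned}");
Console.WriteLine($"Threw OperationCanceledException: {thrown}");

// Hypothetical demo type used only for this illustration.
static class CancelOutcomeDemo
{
    public static (TaskStatus Returned, TaskStatus Thrown) Run()
    {
        using var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;
        using var started = new CountdownEvent(2);

        // Option 1: notice the request and simply return from the delegate.
        Task byReturn = Task.Run(() =>
        {
            started.Signal();
            while (!token.IsCancellationRequested) Thread.SpinWait(100);
        }, token);

        // Option 2: throw by calling ThrowIfCancellationRequested.
        Task byThrow = Task.Run(() =>
        {
            started.Signal();
            while (true)
            {
                token.ThrowIfCancellationRequested();
                Thread.SpinWait(100);
            }
        }, token);

        started.Wait(); // both delegates are running before we cancel
        cts.Cancel();

        try { Task.WaitAll(byReturn, byThrow); }
        catch (AggregateException) { /* expected: byThrow was canceled */ }

        return (byReturn.Status, byThrow.Status);
    }
}
// Prints:
// Returned from delegate: RanToCompletion
// Threw OperationCanceledException: Canceled
```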

When a task instance observes an OperationCanceledException thrown by the user code, it compares the exception’s token to its associated token (the one that was passed to the API that created the Task). If the tokens are the same and the token’s IsCancellationRequested property returns true, the task interprets this as acknowledging cancellation and transitions to the Canceled state. If you don’t use a Wait or WaitAll method to wait for the task, then the task just sets its status to Canceled.

If you’re waiting on a Task that transitions to the Canceled state, a System.Threading.Tasks.TaskCanceledException exception (wrapped in an AggregateException exception) is thrown. This exception indicates successful cancellation instead of a faulty situation. Therefore, the task’s Exception property returns null.

public class TaskCanceledException : OperationCanceledException

If the token’s IsCancellationRequested property returns false or if the exception’s token doesn’t match the Task’s token, the OperationCanceledException is treated like a normal exception, causing the Task to transition to the Faulted state. The presence of other exceptions will also cause the Task to transition to the Faulted state. You can get the status of the completed task in the Status property.

It’s possible that a task might continue to process some items after cancellation is requested.

Note that if you use Task.Run(() => …, cancellationToken) and cancellation is requested before the delegate starts executing, the resulting Task ends up in the Canceled status without running. For this to work, make sure to pass the CancellationToken as an argument to the Task.Run method.

CancellationTokenSource cts = new CancellationTokenSource();
CancellationToken ct = cts.Token;

Task task = Task.Run(() =>
{
    for (int i = 0; i < 100000; i++)
    {
        ct.ThrowIfCancellationRequested();
        Thread.SpinWait(500000);
    }
}, ct);

Thread.Sleep(1000);
cts.Cancel();

try
{
    task.Wait();
}
catch (AggregateException ae)
{
    Console.WriteLine($"Task.Status: {task.Status}. Task.Exception is null: {task.Exception is null}."
        + $" Caught: {ae.InnerException?.GetType().Name}.");
    // Task.Status: Canceled. Task.Exception is null: True. Caught: TaskCanceledException.
}

try
{
    await task;
}
catch (OperationCanceledException)
{
    Console.WriteLine($"Task.Status: {task.Status}. Task.Exception is null: {task.Exception is null}.");
    // Task.Status: Canceled. Task.Exception is null: True.
}

1.5. Foreground and background threads

A managed thread is either a background thread or a foreground thread. Background threads are identical to foreground threads with one exception: a background thread does not keep the managed execution environment running. Once all foreground threads have been stopped in a managed process (where the .exe file is a managed assembly), the system stops all background threads and shuts down.

Use the Thread.IsBackground property to determine whether a thread is a background or a foreground thread, or to change its status. A thread can be changed to a background thread at any time by setting its IsBackground property to true.

Threads that belong to the managed thread pool (that is, threads whose IsThreadPoolThread property is true) are background threads. All threads that enter the managed execution environment from unmanaged code are marked as background threads. All threads generated by creating and starting a new Thread object are by default foreground threads.
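
The defaults described above can be checked with a short sketch. BackgroundDemo is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Threading;

var (created, pooled) = BackgroundDemo.Run();
Console.WriteLine($"new Thread IsBackground: {created}");
Console.WriteLine($"pool thread IsBackground: {pooled}");

// Hypothetical demo type used only for this illustration.
static class BackgroundDemo
{
    public static (bool Created, bool Pooled) Run()
    {
        bool created = false, pooled = false;

        // A thread created with the Thread class is a foreground thread by default.
        var thread = new Thread(() => created = Thread.CurrentThread.IsBackground);
        thread.Start();
        thread.Join();

        // A thread pool thread is a background thread.
        using var done = new ManualResetEventSlim();
        ThreadPool.QueueUserWorkItem(_ =>
        {
            pooled = Thread.CurrentThread.IsBackground;
            done.Set();
        });
        done.Wait();

        return (created, pooled);
    }
}
// Prints:
// new Thread IsBackground: False
// pool thread IsBackground: True
```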

If you use a thread to monitor an activity, such as a socket connection, set its IsBackground property to true so that the thread does not prevent your process from terminating.

ThreadPool.QueueUserWorkItem(_ =>
{
    Thread.CurrentThread.IsBackground = false;
    Console.WriteLine($"Thread.CurrentThread.IsBackground: {Thread.CurrentThread.IsBackground}");
});

Although the code above runs without an exception, setting the IsBackground property is not a reliable way to control a ThreadPool thread.

The ThreadPool manages the life cycle of its threads and expects them to be background threads. A change made inside a work item applies, at most, while that work item runs, and the pool may reset such thread state when the thread returns to it. Don’t rely on a thread pool thread to keep the process alive.

Remember that ThreadPool threads are designed for short operations or independent tasks in a multithreaded application. For long-running work, or when you need fine-grained control over a thread’s background or foreground status, a dedicated Thread object may be a better choice.

1.6. The managed thread pool

The System.Threading.ThreadPool class provides your application with a pool of worker threads that are managed by the system, allowing you to concentrate on application tasks rather than thread management. If you have short tasks that require background processing, the managed thread pool is an easy way to take advantage of multiple threads. Use of the thread pool is significantly easier in .NET Framework 4 and later, since you can create Task and Task<TResult> objects that perform asynchronous tasks on thread pool threads. [5]

1.6.1. Thread pool characteristics

.NET uses thread pool threads for many purposes, including Task Parallel Library (TPL) operations, asynchronous I/O completion, timer callbacks, registered wait operations, asynchronous method calls using delegates, and System.Net socket connections.

Thread pool threads are background threads. Each thread uses the default stack size, runs at the default priority, and is in the multithreaded apartment. Once a thread in the thread pool completes its task, it’s returned to a queue of waiting threads. From this moment it can be reused. This reuse enables applications to avoid the cost of creating a new thread for each task.

There is only one thread pool per process.

1.6.2. Exceptions in thread pool threads

Unhandled exceptions in thread pool threads terminate the process. There are three exceptions to this rule:

A System.Threading.ThreadAbortException is thrown in a thread pool thread because Thread.Abort was called.
A System.AppDomainUnloadedException is thrown in a thread pool thread because the application domain is being unloaded.
The common language runtime or a host process terminates the thread.

1.6.3. Maximum number of thread pool threads

The number of operations that can be queued to the thread pool is limited only by available memory. However, the thread pool limits the number of threads that can be active in the process simultaneously. If all thread pool threads are busy, additional work items are queued until threads to execute them become available. The default size of the thread pool for a process depends on several factors, such as the size of the virtual address space. A process can call the ThreadPool.GetMaxThreads method to determine the maximum number of threads.

You can control the maximum number of threads by using the ThreadPool.GetMaxThreads and ThreadPool.SetMaxThreads methods.
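
A short sketch of querying these limits follows. The actual numbers depend on the machine and runtime, so no output is shown; the variable names are only illustrative.

```csharp
using System;
using System.Threading;

// Upper limits for worker threads and I/O completion threads.
ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);

// Difference between the maximum and the number of currently active threads.
ThreadPool.GetAvailableThreads(out int freeWorkers, out int freeIo);

Console.WriteLine($"Max worker threads: {maxWorkers}, max I/O completion threads: {maxIo}");
Console.WriteLine($"Available worker threads: {freeWorkers}, available I/O threads: {freeIo}");
```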

1.6.4. Thread pool minimums

The thread pool provides new worker threads or I/O completion threads on demand until it reaches a specified minimum for each category. You can use the ThreadPool.GetMinThreads method to obtain these minimum values.

When demand is low, the actual number of thread pool threads can fall below the minimum values.

When a minimum is reached, the thread pool can create additional threads or wait until some tasks complete. The thread pool creates and destroys worker threads in order to optimize throughput, which is defined as the number of tasks that complete per unit of time. Too few threads might not make optimal use of available resources, whereas too many threads could increase resource contention.

You can use the ThreadPool.SetMinThreads method to increase the minimum number of idle threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases the thread pool will perform better with its own algorithm for allocating threads.
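
The following sketch reads the minimums and temporarily raises the worker minimum. The increment of 4 is an arbitrary, hypothetical value chosen for illustration; as noted above, in most cases the default algorithm should be left alone.

```csharp
using System;
using System.Threading;

ThreadPool.GetMinThreads(out int minWorkers, out int minIo);
Console.WriteLine($"Min worker threads: {minWorkers}, min I/O completion threads: {minIo}");

// Raise the worker minimum ahead of an expected burst of work.
// SetMinThreads returns false if the requested values are rejected.
bool raised = ThreadPool.SetMinThreads(minWorkers + 4, minIo);
Console.WriteLine($"Minimum raised: {raised}");

// Restore the original values once the burst is over.
ThreadPool.SetMinThreads(minWorkers, minIo);
```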

1.6.5. When not to use thread pool threads

There are several scenarios in which it’s appropriate to create and manage your own threads instead of using thread pool threads:

You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
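
As a sketch of several of the scenarios above, the code below creates a dedicated foreground thread with its own name and priority. DedicatedThreadDemo, the thread name, and the Thread.Sleep call that stands in for long blocking work are all hypothetical.

```csharp
using System;
using System.Threading;

var (name, isPoolThread, isBackground) = DedicatedThreadDemo.Run();
Console.WriteLine($"'{name}': pool thread = {isPoolThread}, background = {isBackground}");

// Hypothetical demo type used only for this illustration.
static class DedicatedThreadDemo
{
    public static (string? Name, bool IsPoolThread, bool IsBackground) Run()
    {
        (string?, bool, bool) observed = default;

        var worker = new Thread(() =>
        {
            Thread.Sleep(200); // stands in for a long blocking operation
            Thread current = Thread.CurrentThread;
            observed = (current.Name, current.IsThreadPoolThread, current.IsBackground);
        })
        {
            Name = "Dedicated worker",            // stable identity
            IsBackground = false,                 // foreground thread
            Priority = ThreadPriority.AboveNormal // explicit priority
        };

        worker.Start();
        worker.Join();
        return observed;
    }
}
// Prints: 'Dedicated worker': pool thread = False, background = False
```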

1.7. Overview of synchronization primitives

.NET provides a range of types that you can use to synchronize access to a shared resource or coordinate thread interaction. [6]

1.7.1. WaitHandle class and lightweight synchronization types

Multiple .NET synchronization primitives derive from the System.Threading.WaitHandle class, which encapsulates a native operating system synchronization handle and uses a signaling mechanism for thread interaction. Those classes include:

System.Threading.Mutex, which grants exclusive access to a shared resource. The state of a mutex is signaled if no thread owns it.
System.Threading.Semaphore, which limits the number of threads that can access a shared resource or a pool of resources concurrently. The state of a semaphore is set to signaled when its count is greater than zero, and nonsignaled when its count is zero.
System.Threading.EventWaitHandle, which represents a thread synchronization event and can be either in a signaled or unsignaled state.
System.Threading.AutoResetEvent, which derives from EventWaitHandle and, when signaled, resets automatically to an unsignaled state after releasing a single waiting thread.
System.Threading.ManualResetEvent, which derives from EventWaitHandle and, when signaled, stays in a signaled state until the Reset method is called.

In .NET Framework, because WaitHandle derives from System.MarshalByRefObject, these types can be used to synchronize the activities of threads across application domain boundaries.

In .NET Framework, .NET Core, and .NET 5+, some of these types can represent named system synchronization handles, which are visible throughout the operating system and can be used for inter-process synchronization:

Mutex
Semaphore (on Windows)
EventWaitHandle (on Windows)

Lightweight synchronization types don’t rely on underlying operating system handles and typically provide better performance. However, they cannot be used for inter-process synchronization. Use those types for thread synchronization within one application.

Some of those types are alternatives to the types derived from WaitHandle. For example, SemaphoreSlim is a lightweight alternative to Semaphore.

public class SemaphoreSlim : IDisposable
public sealed class Semaphore : System.Threading.WaitHandle

1.7.2. Synchronization of access to a shared resource

.NET provides a range of synchronization primitives to control access to a shared resource by multiple threads.

1.7.2.1. Monitor class

The System.Threading.Monitor class grants mutually exclusive access to a shared resource by acquiring or releasing a lock on the object that identifies the resource. While a lock is held, the thread that holds the lock can again acquire and release the lock. Any other thread is blocked from acquiring the lock and the Monitor.Enter method waits until the lock is released. The Enter method acquires a released lock. You can also use the Monitor.TryEnter method to specify the amount of time during which a thread attempts to acquire a lock. Because the Monitor class has thread affinity, the thread that acquired a lock must release the lock by calling the Monitor.Exit method.

You can coordinate the interaction of threads that acquire a lock on the same object by using the Monitor.Wait, Monitor.Pulse, and Monitor.PulseAll methods.

Use the lock statement in C# and the SyncLock statement in Visual Basic to synchronize access to a shared resource instead of using the Monitor class directly. Those statements are implemented by using the Enter and Exit methods and a try…finally block to ensure that the acquired lock is always released.

var ch = new BlockingChannel<object>();
ThreadPool.QueueUserWorkItem(_ =>
{
    for (int i = 0; i < 10; i++)
    {
        ch.Add(i);
    }
    ch.Add(null!);
});

foreach (var v in ch)
{
    Console.Write($"{v} ");
}

class BlockingChannel<T> : IEnumerable<T> where T : class, new()
{
    private readonly object lockObj = new();
    private bool _isEmpty = true;
    private T? _val;

    public void Add(T value)
    {
        Monitor.Enter(lockObj);
        try
        {
            while (!_isEmpty)
            {
                Monitor.Wait(lockObj);
            }
            _isEmpty = false;
            _val = value;
            Monitor.Pulse(lockObj);
        }
        finally
        {
            Monitor.Exit(lockObj);
        }
    }

    public T? Get()
    {
        Monitor.Enter(lockObj);
        try
        {
            while (_isEmpty)
            {
                Monitor.Wait(lockObj);
            }
            _isEmpty = true;
            Monitor.Pulse(lockObj);
            return _val;
        }
        finally
        {
            Monitor.Exit(lockObj);
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        while (true)
        {
            T? val = Get();
            if (val == null) break;
            yield return val;
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
// $ dotnet run
// 0 1 2 3 4 5 6 7 8 9
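
For the common case of simply protecting a shared value, the lock statement recommended above is enough and is shorter than calling Monitor directly. The following is a minimal sketch; CounterDemo and Gate are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

Console.WriteLine(CounterDemo.Run());

// Hypothetical demo type used only for this illustration.
static class CounterDemo
{
    // A private object dedicated to locking identifies the protected resource.
    private static readonly object Gate = new();

    public static int Run()
    {
        int counter = 0;

        // Many iterations run concurrently; the lock makes each increment atomic.
        Parallel.For(0, 100_000, _ =>
        {
            lock (Gate)
            {
                counter++;
            }
        });

        return counter;
    }
}
// Prints: 100000
```

Without the lock, concurrent increments could be lost and the result would usually be lower than 100000.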

1.7.2.2. Mutex class

The System.Threading.Mutex class, like Monitor, grants exclusive access to a shared resource. Use one of the Mutex.WaitOne method overloads to request the ownership of a mutex. Like Monitor, Mutex has thread affinity and the thread that acquired a mutex must release it by calling the Mutex.ReleaseMutex method.

Unlike Monitor, the Mutex class can be used for inter-process synchronization. To do that, use a named mutex, which is visible throughout the operating system. To create a named mutex instance, use a Mutex constructor that specifies a name. You can also call the Mutex.OpenExisting method to open an existing named system mutex.
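
A named mutex can be sketched as follows. The mutex name and the SingleInstanceDemo type are hypothetical. For brevity, both handles are opened inside one process here; in practice the second handle would typically be opened by another process, for example to detect that an instance of the application is already running.

```csharp
using System;
using System.Threading;

var (firstCreated, secondCreated) = SingleInstanceDemo.Run("Demo.SingleInstance.Mutex");
Console.WriteLine($"first created: {firstCreated}, second created: {secondCreated}");

// Hypothetical demo type used only for this illustration.
static class SingleInstanceDemo
{
    public static (bool FirstCreated, bool SecondCreated) Run(string name)
    {
        // createdNew is true only for the caller that actually created the system mutex.
        using var first = new Mutex(initiallyOwned: true, name, out bool firstCreated);
        try
        {
            // Opening the same name again returns a handle to the existing mutex.
            using var second = new Mutex(initiallyOwned: false, name, out bool secondCreated);
            return (firstCreated, secondCreated);
        }
        finally
        {
            // Ownership was granted only if this call created the mutex,
            // and the owning thread must be the one to release it.
            if (firstCreated) first.ReleaseMutex();
        }
    }
}
```

If no other process already holds a mutex with this name, the first flag is true and the second is false.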

1.7.2.3. SpinLock structure

The System.Threading.SpinLock structure, like Monitor, grants exclusive access to a shared resource based on the availability of a lock. When SpinLock attempts to acquire a lock that is unavailable, it waits in a loop, repeatedly checking until the lock becomes available.

SpinLock sl = new SpinLock();
StringBuilder sb = new StringBuilder();

// Action taken by each parallel job.
// Append to the StringBuilder 10000 times, protecting
// access to sb with a SpinLock.
Action action = () =>
{
    bool gotLock = false;
    for (int i = 0; i < 10000; i++)
    {
        gotLock = false;
        try
        {
            sl.Enter(ref gotLock);
            sb.Append(i % 10);
        }
        finally
        {
            // Only give up the lock if you actually acquired it
            if (gotLock) { sl.Exit(); }
        }
    }
};

// Invoke 3 concurrent instances of the action above
Parallel.Invoke(action, action, action);

// Check/Show the results
Console.WriteLine("sb.Length = {0} (should be 30000)", sb.Length);
Console.WriteLine("number of occurrences of '5' in sb: {0} (should be 3000)",
    sb.ToString().Where(c => (c == '5')).Count());

1.7.2.4. ReaderWriterLockSlim class

The System.Threading.ReaderWriterLockSlim class grants exclusive access to a shared resource for writing and allows multiple threads to access the resource simultaneously for reading. You might want to use ReaderWriterLockSlim to synchronize access to a shared data structure that supports thread-safe read operations, but requires exclusive access to perform a write operation. When a thread requests exclusive access (for example, by calling the ReaderWriterLockSlim.EnterWriteLock method), subsequent reader and writer requests block until all existing readers have exited the lock, and the writer has entered and exited the lock.

class SynchronizedDictionary<TKey, TValue> : IDisposable where TKey : notnull
{
    private readonly Dictionary<TKey, TValue> _dictionary = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(TKey key, TValue value)
    {
        _lock.EnterWriteLock();
        try
        {
            _dictionary.Add(key, value);
        }
        finally { _lock.ExitWriteLock(); }
    }

    public void TryAddValue(TKey key, TValue value)
    {
        _lock.EnterUpgradeableReadLock();
        try
        {
            if (_dictionary.TryGetValue(key, out var res) && res != null && res.Equals(value)) return;

            _lock.EnterWriteLock();
            try
            {
                _dictionary[key] = value;
            }
            finally { _lock.ExitWriteLock(); }
        }
        finally { _lock.ExitUpgradeableReadLock(); }
    }

    public bool TryGetValue(TKey key, [MaybeNullWhen(false)] out TValue value)
    {
        _lock.EnterReadLock();
        try
        {
            return _dictionary.TryGetValue(key, out value);
        }
        finally { _lock.ExitReadLock(); }
    }

    private bool _disposed;

    protected virtual void Dispose(bool disposing)
    {
        if (!_disposed)
        {
            if (disposing)
            {
                // perform managed resource cleanup here
                _lock.Dispose();
            }

            // perform unmanaged resource cleanup here
            _disposed = true;
        }
    }

    ~SynchronizedDictionary() => Dispose(disposing: false);

    public void Dispose()
    {
        Dispose(disposing: true);
        GC.SuppressFinalize(this);
    }
}

1.7.2.5. Semaphore and SemaphoreSlim classes

The System.Threading.Semaphore and System.Threading.SemaphoreSlim classes limit the number of threads that can access a shared resource or a pool of resources concurrently. Additional threads that request the resource wait until any thread releases the semaphore. Because the semaphore doesn’t have thread affinity, a thread can acquire the semaphore and another one can release it.

SemaphoreSlim is a lightweight alternative to Semaphore and can be used only for synchronization within a single process boundary.

On Windows, you can use Semaphore for inter-process synchronization. To do that, create a Semaphore instance that represents a named system semaphore by using one of the Semaphore constructors that specifies a name or the Semaphore.OpenExisting method. SemaphoreSlim doesn’t support named system semaphores.
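
As an in-process illustration, the sketch below uses SemaphoreSlim to cap how many tasks work at the same time. ThrottleDemo, the worker count, the limit, and the Thread.Sleep call that simulates work are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

Console.WriteLine($"Peak concurrency: {ThrottleDemo.Run(workers: 8, limit: 3)}");

// Hypothetical demo type used only for this illustration.
static class ThrottleDemo
{
    public static int Run(int workers, int limit)
    {
        using var gate = new SemaphoreSlim(limit, limit);
        object sync = new();
        int current = 0, peak = 0;

        Task[] tasks = Enumerable.Range(0, workers).Select(_ => Task.Run(() =>
        {
            gate.Wait(); // blocks while 'limit' tasks are already inside
            try
            {
                lock (sync) { current++; peak = Math.Max(peak, current); }
                Thread.Sleep(50); // simulate use of the shared resource
                lock (sync) { current--; }
            }
            finally
            {
                gate.Release(); // let the next waiting task in
            }
        })).ToArray();

        Task.WaitAll(tasks);
        return peak; // never exceeds 'limit'
    }
}
```

The reported peak depends on scheduling, but it cannot exceed the limit passed to the semaphore.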

1.7.3. Thread interaction, or signaling

Thread interaction (or thread signaling) means that a thread must wait for notification, or a signal, from one or more threads in order to proceed. For example, if thread A calls the Thread.Join method of thread B, thread A is blocked until thread B completes. The synchronization primitives described in the preceding section provide a different mechanism for signaling: by releasing a lock, a thread notifies another thread that it can proceed by acquiring the lock.

1.7.3.1. EventWaitHandle, AutoResetEvent, ManualResetEvent, and ManualResetEventSlim classes

The System.Threading.EventWaitHandle class represents a thread synchronization event.

A synchronization event can be either in an unsignaled or signaled state. When the state of an event is unsignaled, a thread that calls the event’s WaitOne overload is blocked until the event is signaled. The EventWaitHandle.Set method sets the state of an event to signaled.

The behavior of an EventWaitHandle that has been signaled depends on its reset mode:

An EventWaitHandle created with the EventResetMode.AutoReset flag resets automatically after releasing a single waiting thread. It’s like a turnstile that allows only one thread through each time it’s signaled. The System.Threading.AutoResetEvent class, which derives from EventWaitHandle, represents that behavior.
An EventWaitHandle created with the EventResetMode.ManualReset flag remains signaled until its Reset method is called. It’s like a gate that is closed until signaled and then stays open until someone closes it. The System.Threading.ManualResetEvent class, which derives from EventWaitHandle, represents that behavior. The System.Threading.ManualResetEventSlim class is a lightweight alternative to ManualResetEvent.

On Windows, you can use EventWaitHandle for inter-process synchronization. To do that, create an EventWaitHandle instance that represents a named system synchronization event by using one of the EventWaitHandle constructors that specifies a name or the EventWaitHandle.OpenExisting method.

Event wait handles are not .NET events. There are no delegates or event handlers involved. The word "event" is used to describe them because they have traditionally been referred to as operating-system events, and because the act of signaling the wait handle indicates to waiting threads that an event has occurred.

Event Wait Handles That Reset Automatically [7]

You create an automatic reset event by specifying EventResetMode.AutoReset when you create the EventWaitHandle object. As its name implies, this synchronization event resets automatically when signaled, after releasing a single waiting thread. Signal the event by calling its Set method.

Automatic reset events are usually used to provide exclusive access to a resource for a single thread at a time. A thread requests the resource by calling the WaitOne method. If no other thread is holding the wait handle, the method returns true and the calling thread has control of the resource.

If an automatic reset event is signaled when no threads are waiting, it remains signaled until a thread attempts to wait on it. The event releases the thread and immediately resets, blocking subsequent threads.
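The turnstile behavior can be sketched as follows. This is an illustrative example, not taken from the source: two threads wait on an AutoResetEvent, and each call to Set lets exactly one of them through. The `passed` semaphore and the `released` counter are hypothetical helpers that exist only so the main thread can observe each release.

```csharp
using System;
using System.Threading;

using var turnstile = new AutoResetEvent(initialState: false);
using var passed = new SemaphoreSlim(0);   // lets the main thread observe each release
int released = 0;

var waiters = new Thread[2];
for (int i = 0; i < waiters.Length; i++)
{
    waiters[i] = new Thread(() =>
    {
        turnstile.WaitOne();               // blocks until the event is signaled
        Interlocked.Increment(ref released);
        passed.Release();
    });
    waiters[i].Start();
}

turnstile.Set();                           // releases exactly one waiter, then resets
passed.Wait();
int afterFirstSet = Volatile.Read(ref released);

turnstile.Set();                           // releases the remaining waiter
passed.Wait();
foreach (Thread waiter in waiters) waiter.Join();

Console.WriteLine($"After first Set: {afterFirstSet}, after second Set: {released}");
// After first Set: 1, after second Set: 2
```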

Event Wait Handles That Reset Manually [7]

You create a manual reset event by specifying EventResetMode.ManualReset when you create the EventWaitHandle object. As its name implies, this synchronization event must be reset manually after it has been signaled. Until it is reset, by calling its Reset method, threads that wait on the event handle proceed immediately without blocking.

A manual reset event acts like the gate of a corral. When the event is not signaled, threads that wait on it block, like horses in a corral. When the event is signaled, by calling its Set method, all waiting threads are free to proceed. The event remains signaled until its Reset method is called. This makes the manual reset event an ideal way to hold up threads that need to wait until one thread finishes a task.

Like horses leaving a corral, it takes time for the released threads to be scheduled by the operating system and to resume execution. If the Reset method is called before all the threads have resumed execution, the remaining threads once again block. Which threads resume and which threads block depends on random factors like the load on the system, the number of threads waiting for the scheduler, and so on. This is not a problem if the thread that signals the event ends after signaling, which is the most common usage pattern. If you want the thread that signaled the event to begin a new task after all the waiting threads have resumed, you must block it until all the waiting threads have resumed. Otherwise, you have a race condition, and the behavior of your code is unpredictable.
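
```
using System;
using System.Threading;

// Manual-reset event, initially unsignaled: waiters block until Set is called.
EventWaitHandle ewh = new EventWaitHandle(false, EventResetMode.ManualReset);
ThreadPool.QueueUserWorkItem(_ =>
{
    ewh.WaitOne();
    Console.WriteLine("Foo signaled");
});
ThreadPool.QueueUserWorkItem(_ =>
{
    ewh.WaitOne();
    Console.WriteLine("Bar signaled");
});
ewh.Set();          // opens the gate for every waiter; it stays open until Reset
Thread.Sleep(1000); // crude wait so both pool threads can print before the process exits
// $ dotnet run
// Bar signaled
// Foo signaled
// (the order of the two lines can vary between runs)
```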

1.7.3.2. CountdownEvent class

The System.Threading.CountdownEvent class represents an event that becomes set when its count is zero. While CountdownEvent.CurrentCount is greater than zero, a thread that calls CountdownEvent.Wait is blocked. Call CountdownEvent.Signal to decrement an event’s count.

In contrast to ManualResetEvent or ManualResetEventSlim, which you can use to unblock multiple threads with a signal from one thread, you can use CountdownEvent to unblock one or more threads with signals from multiple threads.
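A small sketch of that "many signalers, one waiter" shape follows. The worker count and the `completed` counter are assumptions made for the example.

```csharp
using System;
using System.Threading;

const int workerCount = 3;                 // hypothetical number of signaling threads
int completed = 0;
using var countdown = new CountdownEvent(workerCount);

for (int i = 0; i < workerCount; i++)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        Interlocked.Increment(ref completed);  // stand-in for real work
        countdown.Signal();                    // decrements CurrentCount by one
    });
}

countdown.Wait();                          // blocks until CurrentCount reaches zero
Console.WriteLine($"Completed: {completed}, CurrentCount: {countdown.CurrentCount}");
// Completed: 3, CurrentCount: 0
```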

1.7.3.3. Barrier class

The System.Threading.Barrier class represents a thread execution barrier. A thread that calls the Barrier.SignalAndWait method signals that it reached the barrier and waits until other participant threads reach the barrier. When all participant threads reach the barrier, they proceed and the barrier is reset and can be used again.

You might use Barrier when one or more threads require the results of other threads before proceeding to the next computation phase.
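The phase behavior can be illustrated with a short sketch. Three threads run two phases each, and a post-phase action records when every participant has arrived. The participant count, phase count, and log messages are assumptions made for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

const int participants = 3;
const int phases = 2;
var log = new ConcurrentQueue<string>();

// The post-phase action runs once per phase, after every participant has
// arrived and before any of them is released into the next phase.
using var barrier = new Barrier(participants,
    b => log.Enqueue($"barrier: phase {b.CurrentPhaseNumber} complete"));

var threads = new Thread[participants];
for (int i = 0; i < participants; i++)
{
    int id = i;                            // copy the loop variable for the closure
    threads[i] = new Thread(() =>
    {
        for (int phase = 0; phase < phases; phase++)
        {
            log.Enqueue($"worker {id}: phase {phase}");
            barrier.SignalAndWait();       // wait here until all workers finish the phase
        }
    });
    threads[i].Start();
}

foreach (Thread thread in threads) thread.Join();
foreach (string entry in log) Console.WriteLine(entry);
```

The order of the worker lines within a phase can vary, but no worker starts phase 1 before the "phase 0 complete" entry is logged.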

1.7.4. Interlocked class

The System.Threading.Interlocked class provides static methods that perform simple atomic operations on a variable. Those atomic operations include addition, increment and decrement, exchange and conditional exchange that depends on a comparison, and read operation of a 64-bit integer value.
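As an illustration of these operations, the sketch below updates two shared variables from a parallel loop without taking a lock. The variable names and iteration count are arbitrary choices for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int counter = 0;
long total = 0;

Parallel.For(0, 100_000, i =>
{
    Interlocked.Increment(ref counter);    // atomic counter++
    Interlocked.Add(ref total, 2);         // atomic total += 2
});

// Conditional exchange: reset counter to 0 only if it still equals 100000.
// The return value is the value that was in the variable before the call.
int previous = Interlocked.CompareExchange(ref counter, 0, 100_000);

// Atomic read of a 64-bit value (matters on 32-bit platforms).
long snapshot = Interlocked.Read(ref total);

Console.WriteLine($"previous = {previous}, counter = {counter}, total = {snapshot}");
// previous = 100000, counter = 0, total = 200000
```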

1.7.5. SpinWait structure

The System.Threading.SpinWait structure provides support for spin-based waiting. You might want to use it when a thread has to wait for an event to be signaled or a condition to be met, but when the actual wait time is expected to be less than the waiting time required by using a wait handle or by otherwise blocking the thread. By using SpinWait, you can specify a short period of time to spin while waiting, and then yield (for example, by waiting or sleeping) only if the condition was not met in the specified time.
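A minimal sketch of both styles follows: a manual loop around SpinWait.SpinOnce, and the SpinWait.SpinUntil helper with a timeout. The `ready` flag and the 10 ms delay are assumptions made for the example.

```csharp
using System;
using System.Threading;

bool ready = false;
var setter = new Thread(() =>
{
    Thread.Sleep(10);                      // the condition is met shortly after start
    Volatile.Write(ref ready, true);
});
setter.Start();

// Manual loop: SpinOnce spins briefly at first and starts yielding the
// thread's time slice if the wait drags on.
var spinner = new SpinWait();
while (!Volatile.Read(ref ready))
{
    spinner.SpinOnce();
}
setter.Join();

// Helper form: spin until the condition holds or the timeout elapses.
bool met = SpinWait.SpinUntil(() => Volatile.Read(ref ready), TimeSpan.FromSeconds(1));

Console.WriteLine($"ready = {ready}, met = {met}");
// ready = True, met = True
```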

1.8. Thread-safe collections

The System.Collections.Concurrent namespace includes several collection classes that are both thread-safe and scalable. Multiple threads can safely and efficiently add or remove items from these collections, without requiring additional synchronization in user code. When you write new code, use the concurrent collection classes whenever multiple threads write to the collection concurrently. If you’re only reading from a shared collection, then you can use the classes in the System.Collections.Generic namespace.
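For instance, a word count can be accumulated from several threads without an explicit lock. This is a sketch with made-up input data.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

string[] words = { "a", "b", "a", "c", "b", "a" };   // sample input for illustration
var counts = new ConcurrentDictionary<string, int>();

// AddOrUpdate inserts 1 for a new key, or applies the update function to the
// existing value; the dictionary handles the synchronization.
Parallel.ForEach(words, word =>
    counts.AddOrUpdate(word, 1, (_, existing) => existing + 1));

foreach (var pair in counts.OrderBy(p => p.Key))
    Console.WriteLine($"{pair.Key}: {pair.Value}");
// a: 3
// b: 2
// c: 1
```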

1.8.1. Fine-grained locking and lock-free mechanisms

Some of the concurrent collection types use lightweight synchronization mechanisms such as SpinLock, SpinWait, SemaphoreSlim, and CountdownEvent. These synchronization types typically use busy spinning for brief periods before they put the thread into a true Wait state. When wait times are expected to be short, spinning is far less computationally expensive than waiting, which involves an expensive kernel transition. For collection classes that use spinning, this efficiency means that multiple threads can add and remove items at a high rate.

The ConcurrentQueue<T> and ConcurrentStack<T> classes don’t use locks at all. Instead, they rely on Interlocked operations to achieve thread safety.

The following table lists the collection types in the System.Collections.Concurrent namespace:

Type	Description
`BlockingCollection<T>`	Provides bounding and blocking functionality for any type that implements `IProducerConsumerCollection<T>`.
`ConcurrentDictionary<TKey,TValue>`	Thread-safe implementation of a dictionary of key-value pairs.
`ConcurrentQueue<T>`	Thread-safe implementation of a FIFO (first-in, first-out) queue.
`ConcurrentStack<T>`	Thread-safe implementation of a LIFO (last-in, first-out) stack.
`ConcurrentBag<T>`	Thread-safe implementation of an unordered collection of elements.
`IProducerConsumerCollection<T>`	The interface that a type must implement to be used in a `BlockingCollection<T>`.

FAQ :: Are all of the new concurrent collections lock-free?

1.8.2. What’s the difference between BlockingCollection<T> and Channel<T>?

BlockingCollection<T> and Channel<T> are both useful for producer/consumer scenarios where one thread or task is producing data and another thread or task is consuming that data. However, their implementation and features are quite different, and they are designed to handle different use cases.

BlockingCollection<T> is part of the System.Collections.Concurrent namespace and was introduced in .NET Framework 4.0. It provides a thread-safe, blocking and bounded collection that can be used with multiple producers and consumers.

Benefits of BlockingCollection<T>:

It simplifies thread communication, as it blocks and waits when trying to add to a full collection or take from an empty one.
It provides Add and Take methods for managing the collection: Add blocks while a bounded collection is full, and Take blocks while the collection is empty.
It implements IEnumerable<T>, allowing easy enumeration of the items in the collection.
It has built-in functionality for creating a complete producer/consumer on top of any IProducerConsumerCollection<T>.

Channel<T> is part of the System.Threading.Channels namespace, which has been included in .NET Core since version 3.0 and is also available as a NuGet package for earlier targets. It’s newer and designed for asynchronous code built on the async and await patterns. [8]

Benefits of Channel<T>:

It supports the async programming model and can be used with async and await keywords in C#.
It is designed for scenarios where you have asynchronous data streams that need to be processed.
It provides both synchronous and asynchronous methods for adding (Writer.TryWrite, Writer.WriteAsync) and receiving (Reader.TryRead, Reader.ReadAsync) data.
It supports back pressure by naturally making the producer wait if the channel is full.
It allows for creating unbounded or bounded channels via Channel.CreateUnbounded<T> and Channel.CreateBounded<T>.

In general, Channel<T> is more modern and better integrated with the async programming model. Therefore, for newer applications it is recommended to use the Channel<T> class.

However, if you have a legacy application where you cannot use async and await extensively, or where you are using ThreadPool and Tasks heavily, then BlockingCollection<T> might be a better choice.
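To make the contrast concrete, here is a small side-by-side sketch that pushes the same five integers through each type. The capacity of two and the item values are arbitrary choices for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Channels;
using System.Threading.Tasks;

// Blocking style: producer and consumer threads block while waiting.
using var collection = new BlockingCollection<int>(boundedCapacity: 2);
Task producer = Task.Run(() =>
{
    for (int i = 1; i <= 5; i++) collection.Add(i);   // blocks while the collection is full
    collection.CompleteAdding();
});
int blockingSum = 0;
foreach (int item in collection.GetConsumingEnumerable())  // blocks while it is empty
    blockingSum += item;
await producer;

// Async style: waiting is expressed with await instead of a blocked thread.
Channel<int> channel = Channel.CreateBounded<int>(capacity: 2);
Task asyncProducer = Task.Run(async () =>
{
    for (int i = 1; i <= 5; i++) await channel.Writer.WriteAsync(i);
    channel.Writer.Complete();
});
int channelSum = 0;
await foreach (int item in channel.Reader.ReadAllAsync())
    channelSum += item;
await asyncProducer;

Console.WriteLine($"BlockingCollection sum: {blockingSum}, Channel sum: {channelSum}");
// BlockingCollection sum: 15, Channel sum: 15
```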

1.9. Windows Presentation Foundation (WPF): Threading model

Typically, WPF applications start with two threads: one for handling rendering and another for managing the UI. The rendering thread effectively runs hidden in the background while the UI thread receives input, handles events, paints the screen, and runs application code. Most applications use a single UI thread, although in some situations it is best to use several. [11]

The UI thread queues work items inside an object called a Dispatcher. The Dispatcher selects work items on a priority basis and runs each one to completion. Every UI thread must have at least one Dispatcher, and each Dispatcher can execute work items in exactly one thread.

The trick to building responsive, user-friendly applications is to maximize the Dispatcher throughput by keeping the work items small. This way items never get stale sitting in the Dispatcher queue waiting for processing. Any perceivable delay between input and response can frustrate a user.

How then are WPF applications supposed to handle big operations? What if your code involves a large calculation or needs to query a database on some remote server? Usually, the answer is to handle the big operation in a separate thread, leaving the UI thread free to tend to items in the Dispatcher queue. When the big operation is complete, it can report its result back to the UI thread for display.

If only one thread can modify the UI, how do background threads interact with the user? A background thread can ask the UI thread to perform an operation on its behalf. It does this by registering a work item with the Dispatcher of the UI thread. The Dispatcher class provides the methods for registering work items: Dispatcher.InvokeAsync, Dispatcher.BeginInvoke, and Dispatcher.Invoke. These methods schedule a delegate for execution. Invoke is a synchronous call – that is, it doesn’t return until the UI thread actually finishes executing the delegate. InvokeAsync and BeginInvoke are asynchronous and return immediately.

1.10. The volatile keyword

The volatile keyword indicates that a field might be modified by multiple threads that are executing at the same time. The compiler, the runtime system, and even hardware may rearrange reads and writes to memory locations for performance reasons. Fields that are declared volatile are excluded from certain kinds of optimizations. There is no guarantee of a single total ordering of volatile writes as seen from all threads of execution. [9]

On a multiprocessor system, a volatile read operation does not guarantee to obtain the latest value written to that memory location by any processor. Similarly, a volatile write operation does not guarantee that the value written would be immediately visible to other processors.

The volatile keyword can be applied to fields of these types:

Reference types.
Pointer types (in an unsafe context). Note that although the pointer itself can be volatile, the object that it points to cannot. In other words, you cannot declare a "pointer to volatile."
Simple types such as sbyte, byte, short, ushort, int, uint, char, float, and bool.
An enum type with one of the following base types: byte, sbyte, short, ushort, int, or uint.
Generic type parameters known to be reference types.
IntPtr and UIntPtr.

Other types, including double and long, cannot be marked volatile because reads and writes to fields of those types cannot be guaranteed to be atomic. To protect multi-threaded access to those types of fields, use the Interlocked class members or protect access using the lock statement.

The volatile keyword can only be applied to fields of a class or struct. Local variables cannot be declared volatile.

2. Asynchronous programming patterns

.NET provides three patterns for performing asynchronous operations:

Task-based Asynchronous Pattern (TAP), which uses a single method to represent the initiation and completion of an asynchronous operation. TAP was introduced in .NET Framework 4. It’s the recommended approach to asynchronous programming in .NET. The async and await keywords in C# and the Async and Await operators in Visual Basic add language support for TAP.
Event-based Asynchronous Pattern (EAP), which is the event-based legacy model for providing asynchronous behavior. It requires a method that has the Async suffix and one or more events, event handler delegate types, and EventArg-derived types. EAP was introduced in .NET Framework 2.0. It’s no longer recommended for new development.
Asynchronous Programming Model (APM) pattern (also called the IAsyncResult pattern), which is the legacy model that uses the IAsyncResult interface to provide asynchronous behavior. In this pattern, asynchronous operations require Begin and End methods (for example, BeginWrite and EndWrite to implement an asynchronous write operation). This pattern is no longer recommended for new development.
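The difference between TAP and APM can be seen on a stream read. The sketch below reads the same bytes twice from a MemoryStream: once through the TAP method ReadAsync, and once through the APM pair BeginRead/EndRead wrapped into a task with TaskFactory.FromAsync. The sample payload is an assumption made for the example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

byte[] data = Encoding.UTF8.GetBytes("hello");   // sample payload for illustration

// TAP: a single method returns a Task that represents the whole operation.
using var tapStream = new MemoryStream(data);
var tapBuffer = new byte[data.Length];
int tapRead = await tapStream.ReadAsync(tapBuffer, 0, tapBuffer.Length);

// APM: a Begin/End method pair. FromAsync adapts the pair to a Task so that
// legacy APIs can still be awaited.
using var apmStream = new MemoryStream(data);
var apmBuffer = new byte[data.Length];
int apmRead = await Task<int>.Factory.FromAsync(
    apmStream.BeginRead, apmStream.EndRead, apmBuffer, 0, apmBuffer.Length, null);

Console.WriteLine($"TAP read {tapRead} bytes, APM read {apmRead} bytes");
// TAP read 5 bytes, APM read 5 bytes
```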

3. Lazy Initialization

Lazy initialization of an object means that its creation is deferred until it is first used. (For this topic, the terms lazy initialization and lazy instantiation are synonymous.) Lazy initialization is primarily used to improve performance, avoid wasteful computation, and reduce program memory requirements. [12]

Although you can write your own code to perform lazy initialization, we recommend that you use Lazy<T> instead. Lazy<T> and its related types also support thread-safety and provide a consistent exception propagation policy.

Table 1. The following table lists the types that the .NET Framework version 4 provides to enable lazy initialization in different scenarios.
Type	Description
Lazy<T>	A wrapper class that provides lazy initialization semantics for any class library or user-defined type.
ThreadLocal<T>	Resembles `Lazy<T>` except that it provides lazy initialization semantics on a thread-local basis. Every thread has access to its own unique value.
LazyInitializer	Provides advanced static (Shared in Visual Basic) methods for lazy initialization of objects without the overhead of a class.
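As a brief sketch of Lazy<T> in use, the example below reads a lazily created value from several threads and counts how often the factory ran. With the default thread-safety mode of the `Lazy<T>(Func<T>)` constructor, the factory runs once. The `factoryCalls` counter and the string value are assumptions made for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int factoryCalls = 0;
var lazyValue = new Lazy<string>(() =>
{
    Interlocked.Increment(ref factoryCalls);   // track how often the factory runs
    return "expensive value";
});

bool createdBefore = lazyValue.IsValueCreated; // nothing has been created yet

int totalLength = 0;
Parallel.For(0, 8, i =>
{
    // The first access to Value runs the factory; later accesses reuse the result.
    Interlocked.Add(ref totalLength, lazyValue.Value.Length);
});

Console.WriteLine(
    $"Created before: {createdBefore}, factory calls: {factoryCalls}, created after: {lazyValue.IsValueCreated}");
// Created before: False, factory calls: 1, created after: True
```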

4. Parallel programming in .NET

Many personal computers and workstations have multiple CPU cores that enable multiple threads to be executed simultaneously. To take advantage of the hardware, you can parallelize your code to distribute work across multiple processors. [13]

In the past, parallelization required low-level manipulation of threads and locks. Visual Studio and .NET enhance support for parallel programming by providing a runtime, class library types, and diagnostic tools. These features, which were introduced in .NET Framework 4, simplify parallel development. You can write efficient, fine-grained, and scalable parallel code in a natural idiom without having to work directly with threads or the thread pool.

The following illustration provides a high-level overview of the parallel programming architecture in .NET.

4.1. Task Parallel Library (TPL)

The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces. The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications. The TPL dynamically scales the degree of concurrency to use all the available processors most efficiently. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.

4.2. Data Parallelism (Task Parallel Library)

Data parallelism refers to scenarios in which the same operation is performed concurrently (that is, in parallel) on elements in a source collection or array. In data parallel operations, the source collection is partitioned so that multiple threads can operate on different segments concurrently. [14]

The Task Parallel Library (TPL) supports data parallelism through the System.Threading.Tasks.Parallel class. This class provides method-based parallel implementations of for and foreach loops (For and For Each in Visual Basic). You write the loop logic for a Parallel.For or Parallel.ForEach loop much as you would write a sequential loop. You do not have to create threads or queue work items. In basic loops, you do not have to take locks. The TPL handles all the low-level work for you.

string path = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), ".nuget/packages/");
string[] fileNames = Directory.GetFiles(path, "*", SearchOption.AllDirectories);

Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 2; i++)
{
    sw.Restart();
    long parallelTotalSize = 0;
    Parallel.ForEach(fileNames,
        fileName => Interlocked.Add(ref parallelTotalSize, new FileInfo(fileName).Length));
    Console.WriteLine($"Parallel: {parallelTotalSize}, {sw.ElapsedMilliseconds}ms");

    sw.Restart();
    long totalSize = 0;
    foreach (string fileName in fileNames) totalSize += new FileInfo(fileName).Length;
    Console.WriteLine($"Sequential : {totalSize}, {sw.ElapsedMilliseconds}ms");
}
// $ dotnet run
// Parallel: 2743226084, 400ms
// Sequential : 2743226084, 598ms
// Parallel: 2743226084, 220ms
// Sequential : 2743226084, 429ms

4.3. Dataflow (Task Parallel Library)

The Task Parallel Library (TPL) provides dataflow components to help increase the robustness of concurrency-enabled applications. These dataflow components are collectively referred to as the TPL Dataflow Library. This dataflow model promotes actor-based programming by providing in-process message passing for coarse-grained dataflow and pipelining tasks. The dataflow components build on the types and scheduling infrastructure of the TPL and integrate with the C#, Visual Basic, and F# language support for asynchronous programming. These dataflow components are useful when you have multiple operations that must communicate with one another asynchronously or when you want to process data as it becomes available.

The TPL Dataflow Library provides a foundation for message passing and parallelizing CPU-intensive and I/O-intensive applications that have high throughput and low latency. Because the runtime manages dependencies between data, you can often avoid the requirement to synchronize access to shared data. In addition, because the runtime schedules work based on the asynchronous arrival of data, dataflow can improve responsiveness and throughput by efficiently managing the underlying threads.

The TPL Dataflow Library consists of dataflow blocks, which are data structures that buffer and process data. The TPL defines three kinds of dataflow blocks: source blocks, target blocks, and propagator blocks.

A source block acts as a source of data and can be read from.
A target block acts as a receiver of data and can be written to.
A propagator block acts as both a source block and a target block, and can be read from and written to.
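A two-stage pipeline shows how these block kinds fit together. In the sketch below, a TransformBlock (a propagator) squares each input and an ActionBlock (a target) collects the results. The block names and input values are assumptions made for the example, and on .NET Framework the types come from the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks.Dataflow;

var results = new ConcurrentQueue<int>();

// Propagator block: receives an int, emits its square.
var square = new TransformBlock<int, int>(n => n * n);

// Target block: consumes each squared value.
var collect = new ActionBlock<int>(n => results.Enqueue(n));

// Link source to target and let completion flow down the pipeline.
square.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

for (int i = 1; i <= 5; i++) square.Post(i);
square.Complete();            // no more input
await collect.Completion;     // wait until the last item is processed

Console.WriteLine(string.Join(", ", results));
// 1, 4, 9, 16, 25
```

Both blocks process one message at a time by default, which is why the output order matches the input order here.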

4.4. Task-based asynchronous programming

The Task Parallel Library (TPL) is based on the concept of a task, which represents an asynchronous operation. In some ways, a task resembles a thread or ThreadPool work item but at a higher level of abstraction. The term task parallelism refers to one or more independent tasks running concurrently. Tasks provide two primary benefits: [15]

More efficient and more scalable use of system resources.

Behind the scenes, tasks are queued to the ThreadPool, which has been enhanced with algorithms that determine and adjust to the number of threads. These algorithms provide load balancing to maximize throughput. This process makes tasks relatively lightweight, and you can create many of them to enable fine-grained parallelism.

More programmatic control than is possible with a thread or work item.

Tasks and the framework built around them provide a rich set of APIs that support waiting, cancellation, continuations, robust exception handling, detailed status, custom scheduling, and more.

For both reasons, TPL is the preferred API for writing multi-threaded, asynchronous, and parallel code in .NET.
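A few of those APIs can be sketched together: starting tasks, chaining a continuation, and observing cancellation. The workloads are trivial placeholders chosen for the example.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Two independent tasks queued to the thread pool.
Task<int> first = Task.Run(() => Enumerable.Range(1, 100).Sum());   // 5050
Task<int> second = Task.Run(() => Enumerable.Range(1, 10).Sum());   // 55

// Continuation: runs after both tasks finish and combines their results.
Task<int> total = Task.WhenAll(first, second)
    .ContinueWith(done => done.Result.Sum());
int sum = await total;

// Cancellation: a task started with an already-cancelled token never runs.
using var cts = new CancellationTokenSource();
cts.Cancel();
Task cancelled = Task.Run(() => Console.WriteLine("never runs"), cts.Token);
bool observedCancellation = false;
try
{
    await cancelled;
}
catch (OperationCanceledException)
{
    observedCancellation = true;
}

Console.WriteLine($"sum = {sum}, status = {cancelled.Status}, observed = {observedCancellation}");
// sum = 5105, status = Canceled, observed = True
```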

4.5. Parallel LINQ (PLINQ)

Language-Integrated Query (LINQ) is the name for a set of technologies based on the integration of query capabilities directly into the C# language.

Traditionally, queries against data are expressed as simple strings without type checking at compile time or IntelliSense support. Furthermore, you have to learn a different query language for each type of data source: SQL databases, XML documents, various Web services, and so on.

With LINQ, a query is a first-class language construct, just like classes, methods, and events. [19]

In-memory data

There are two ways you enable LINQ querying of in-memory data. If the data is of a type that implements IEnumerable<T>, you query the data by using LINQ to Objects. If it doesn’t make sense to enable enumeration by implementing the IEnumerable<T> interface, you define LINQ standard query operator methods, either in that type or as extension methods for that type. Custom implementations of the standard query operators should use deferred execution to return the results.

Remote data

The best option for enabling LINQ querying of a remote data source is to implement the IQueryable<T> interface.

At compile time, query expressions are converted to standard query operator method calls according to the rules defined in the C# specification. Any query that can be expressed by using query syntax can also be expressed by using method syntax. In some cases, query syntax is more readable and concise. In others, method syntax is more readable. There’s no semantic or performance difference between the two different forms.

string sentence = "the quick brown fox jumps over the lazy dog";
// Split the string into individual words to create a collection.
string[] words = sentence.Split(' ');

// Using query expression syntax.
var query = from word in words
            group word.ToUpper() by word.Length into gr
            orderby gr.Key
            select new { Length = gr.Key, Words = gr };

// Using method-based query syntax.
var query2 = words.
    GroupBy(w => w.Length, w => w.ToUpper()).
    Select(g => new { Length = g.Key, Words = g }).
    OrderBy(o => o.Length);

foreach (var obj in query)
{
    Console.WriteLine("Words of length {0}:", obj.Length);
    foreach (string word in obj.Words)
        Console.WriteLine(word);
}

Parallel LINQ (PLINQ) is a parallel implementation of the Language-Integrated Query (LINQ) pattern. PLINQ implements the full set of LINQ standard query operators as extension methods for the System.Linq namespace and has additional operators for parallel operations. PLINQ combines the simplicity and readability of LINQ syntax with the power of parallel programming. [20]

A PLINQ query in many ways resembles a non-parallel LINQ to Objects query. PLINQ queries, just like sequential LINQ queries, operate on any in-memory IEnumerable or IEnumerable<T> data source, and have deferred execution, which means they do not begin executing until the query is enumerated. The primary difference is that PLINQ attempts to make full use of all the processors on the system. It does this by partitioning the data source into segments, and then executing the query on each segment on separate worker threads in parallel on multiple processors. In many cases, parallel execution means that the query runs significantly faster.

The System.Linq.ParallelEnumerable class exposes almost all of PLINQ’s functionality. It includes implementations of all the standard query operators that LINQ to Objects supports, although it does not attempt to parallelize each one.

In addition to the standard query operators, the ParallelEnumerable class contains a set of methods that enable behaviors specific to parallel execution. These PLINQ-specific methods are listed in the following table.

ParallelEnumerable Operator	Description
AsParallel	The entry point for PLINQ. Specifies that the rest of the query should be parallelized, if it is possible.
AsSequential	Specifies that the rest of the query should be run sequentially, as a non-parallel LINQ query.
AsOrdered	Specifies that PLINQ should preserve the ordering of the source sequence for the rest of the query, or until the ordering is changed, for example by the use of an orderby (Order By in Visual Basic) clause.
AsUnordered	Specifies that PLINQ for the rest of the query is not required to preserve the ordering of the source sequence.
WithCancellation	Specifies that PLINQ should periodically monitor the state of the provided cancellation token and cancel execution if it is requested.
WithDegreeOfParallelism	Specifies the maximum number of processors that PLINQ should use to parallelize the query.
WithMergeOptions	Provides a hint about how PLINQ should, if it is possible, merge parallel results back into just one sequence on the consuming thread.
WithExecutionMode	Specifies whether PLINQ should parallelize the query even when the default behavior would be to run it sequentially.
ForAll	A multithreaded enumeration method that, unlike iterating over the results of the query, enables results to be processed in parallel without first merging back to the consumer thread.
Aggregate overload	An overload that is unique to PLINQ and enables intermediate aggregation over thread-local partitions, plus a final aggregation function to combine the results of all partitions.

IEnumerable<string> files = Directory.EnumerateFiles("/usr/share/man", "*.gz", SearchOption.AllDirectories);

Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 2; i++)
{
    sw.Restart();
    var parallelLetters = files.AsParallel()
        .Select(SplitLetters)
        .SelectMany(w => w)
        .GroupBy(char.ToLower)
        .OrderByDescending(g => g.Count())
        .First();
    Console.WriteLine($"Parallel: {parallelLetters.Key}: {parallelLetters.Count()}, {sw.ElapsedMilliseconds}ms");

    sw.Restart();
    var sequentialLetters = files // .AsParallel().AsSequential()
        .Select(SplitLetters)
        .SelectMany(w => w)
        .GroupBy(char.ToLower)
        .OrderByDescending(g => g.Count())
        .First();
    Console.WriteLine($"Sequential: {sequentialLetters.Key}: {sequentialLetters.Count()}, {sw.ElapsedMilliseconds}ms");
}

static IEnumerable<char> SplitLetters(string fileName)
{
    using StreamReader reader = new StreamReader(fileName);
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        foreach (char c in line.ToCharArray())
        {
            if (char.IsLetter(c))
                yield return c;
        }
    }
}
// $ dotnet run
// Parallel: e: 251378, 2242ms
// Sequential: e: 251378, 1996ms
// Parallel: e: 251378, 1133ms
// Sequential: e: 251378, 1824ms
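
Some of the PLINQ-specific operators listed in the table can be sketched on a small in-memory source. The input range, the degree of parallelism of four, and the variable names are assumptions made for the example.

```csharp
using System;
using System.Linq;

int[] numbers = Enumerable.Range(1, 1000).ToArray();

// AsOrdered keeps source order, so Take(3) returns the first three matches.
int[] evenSquares = numbers
    .AsParallel()
    .AsOrdered()
    .WithDegreeOfParallelism(4)
    .Where(n => n % 2 == 0)
    .Select(n => n * n)
    .Take(3)
    .ToArray();

// PLINQ-specific Aggregate overload: each partition accumulates its own
// subtotal, and the subtotals are combined at the end.
long sum = numbers.AsParallel().Aggregate(
    () => 0L,                          // seed for each partition
    (subtotal, n) => subtotal + n,     // accumulate within a partition
    (left, right) => left + right,     // combine partition subtotals
    total => total);                   // final projection

Console.WriteLine($"{string.Join(", ", evenSquares)}; sum = {sum}");
// 4, 16, 36; sum = 500500
```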

5. Asynchronous programming with async and await

You can avoid performance bottlenecks and enhance the overall responsiveness of your application by using asynchronous programming. However, traditional techniques for writing asynchronous applications can be complicated, making them difficult to write, debug, and maintain.

C# supports a simplified approach, async programming, that leverages asynchronous support in the .NET runtime. The compiler does the difficult work that the developer used to do, and your application retains a logical structure that resembles synchronous code. As a result, you get all the advantages of asynchronous programming with a fraction of the effort. [16]

5.1. Async improves responsiveness

Asynchrony is essential for activities that are potentially blocking, such as web access. Access to a web resource sometimes is slow or delayed. If such an activity is blocked in a synchronous process, the entire application must wait. In an asynchronous process, the application can continue with other work that doesn’t depend on the web resource until the potentially blocking task finishes.

Asynchrony proves especially valuable for applications that access the UI thread because all UI-related activity usually shares one thread. If any process is blocked in a synchronous application, all are blocked. Your application stops responding, and you might conclude that it has failed when instead it’s just waiting.

When you use asynchronous methods, the application continues to respond to the UI. You can resize or minimize a window, for example, or you can close the application if you don’t want to wait for it to finish.

The async-based approach adds the equivalent of an automatic transmission to the list of options that you can choose from when designing asynchronous operations. That is, you get all the benefits of traditional asynchronous programming but with much less effort from the developer.

5.2. Threads

Async methods are intended to be non-blocking operations. An await expression in an async method doesn’t block the current thread while the awaited task is running. Instead, the expression signs up the rest of the method as a continuation and returns control to the caller of the async method.

The async and await keywords don’t cause additional threads to be created. Async methods don’t require multithreading because an async method doesn’t run on its own thread. The method runs on the current synchronization context and uses time on the thread only when the method is active. You can use Task.Run to move CPU-bound work to a background thread, but a background thread doesn’t help with a process that’s just waiting for results to become available.
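A minimal sketch of the two cases described above, assuming a console application. The names AwaitDemo, SumTo, and RunAsync are hypothetical and exist only for illustration: Task.Delay stands in for an operation that merely waits, and Task.Run moves a CPU-bound loop to a thread pool thread.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Hypothetical CPU-bound helper: sums the integers 1..n.
    public static long SumTo(int n)
    {
        long sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    public static async Task<long> RunAsync()
    {
        // Waiting only: no thread is blocked while the delay is pending.
        await Task.Delay(100);

        // CPU-bound work: Task.Run queues it to the thread pool, and await
        // releases the calling thread until the result is available.
        return await Task.Run(() => SumTo(1_000));
    }

    public static void Main()
    {
        Task<long> pending = RunAsync();
        // Control came back at the first await, before RunAsync finished.
        Console.WriteLine($"Completed at return: {pending.IsCompleted}");
        Console.WriteLine($"Result: {pending.GetAwaiter().GetResult()}");
    }
}
```

Blocking with GetResult is acceptable in this console sketch only because no synchronization context is installed; UI code would await the task instead.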

5.3. async and await

If you specify that a method is an async method by using the async modifier, you enable the following two capabilities.

The marked async method can use await to designate suspension points. The await operator tells the compiler that the async method can’t continue past that point until the awaited asynchronous process is complete. In the meantime, control returns to the caller of the async method. The suspension of an async method at an await expression doesn’t constitute an exit from the method, and finally blocks don’t run.
The marked async method can itself be awaited by methods that call it.

An async method typically contains one or more occurrences of an await operator, but the absence of await expressions doesn’t cause a compiler error. If an async method doesn’t use an await operator to mark a suspension point, the method executes as a synchronous method does, despite the async modifier. The compiler issues a warning for such methods.
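The suspension behavior can be observed with a small sketch, assuming .NET 5 or later (for the non-generic TaskCompletionSource). SuspensionDemo, WorkAsync, and Log are hypothetical names. The log shows that the caller resumes while the method is suspended and that the finally block runs only when the method really exits.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SuspensionDemo
{
    public static readonly List<string> Log = new();

    public static async Task WorkAsync(Task gate)
    {
        try
        {
            Log.Add("before await");
            await gate;              // suspension point: control returns to the caller
            Log.Add("after await");
        }
        finally
        {
            Log.Add("finally");      // runs on real exit, not at the suspension
        }
    }

    public static void Main()
    {
        Log.Clear();
        var gate = new TaskCompletionSource();
        Task work = WorkAsync(gate.Task);   // runs until the first incomplete await
        Log.Add("caller resumed");          // the method is suspended here
        gate.SetResult();                   // lets the awaited task complete
        work.GetAwaiter().GetResult();      // the async method can itself be waited on
        Console.WriteLine(string.Join(", ", Log));
        // prints: before await, caller resumed, after await, finally
    }
}
```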

5.4. SynchronizationContext and ConfigureAwait

Don’t Need ConfigureAwait(false), But Still Use It in Libraries. [22]

SynchronizationContext was also introduced in .NET Framework 2.0, as an abstraction for a general scheduler. In particular, SynchronizationContext’s most used method is Post, which queues a work item to whatever scheduler is represented by that context. [17]

Consider a UI framework like Windows Forms. As with most UI frameworks on Windows, controls are associated with a particular thread, and that thread runs a message pump which runs work that’s able to interact with those controls: only that thread should try to manipulate those controls, and any other thread that wants to interact with the controls should do so by sending a message to be consumed by the UI thread’s pump. Windows Forms makes this easy with methods like Control.BeginInvoke, which queues the supplied delegate and arguments to be run by whatever thread is associated with that Control. You can thus write code like this:

private void button1_Click(object sender, EventArgs e)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        string message = ComputeMessage();
        button1.BeginInvoke(() =>
        {
            button1.Text = message;
        });
    });
}

That will offload the ComputeMessage() work to be done on a ThreadPool thread (so as to keep the UI responsive while it’s being processed), and then when that work has completed, queue a delegate back to the thread associated with button1 to update button1’s label. Easy enough. WPF has something similar, just with its Dispatcher type:

private void button1_Click(object sender, RoutedEventArgs e)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        string message = ComputeMessage();
        button1.Dispatcher.InvokeAsync(() =>
        {
            button1.Content = message;
        });
    });
}

Each application model then ensures that it publishes, as SynchronizationContext.Current, a SynchronizationContext-derived type that does the "right thing." For example, Windows Forms has this:

public sealed class WindowsFormsSynchronizationContext : SynchronizationContext, IDisposable
{
    public override void Post(SendOrPostCallback d, object? state) =>
        _controlToSendTo?.BeginInvoke(d, new object?[] { state });
    ...
}

and WPF has this:

public sealed class DispatcherSynchronizationContext : SynchronizationContext
{
    public override void Post(SendOrPostCallback d, Object state) =>
        _dispatcher.BeginInvoke(_priority, d, state);
    ...
}

SynchronizationContext makes it possible to call reusable helpers and automatically be scheduled back whenever and to wherever the calling environment deems fit. As a result, it’s natural to expect that to "just work" with async/await, and it does.

button1.Text = await Task.Run(() => ComputeMessage());

That invocation of ComputeMessage is offloaded to the thread pool, and upon the method’s completion, execution transitions back to the UI thread associated with the button, and the setting of its Text property happens on that thread.

That integration with SynchronizationContext is left up to the awaiter implementation (the code generated for the state machine knows nothing about SynchronizationContext), as it’s the awaiter that is responsible for actually invoking or queueing the supplied continuation when the represented asynchronous operation completes. While a custom awaiter need not respect SynchronizationContext.Current, the awaiters for Task, Task<TResult>, ValueTask, and ValueTask<TResult> all do. That means that when you await a Task, a Task<TResult>, a ValueTask, a ValueTask<TResult>, or even the result of a Task.Yield() call, the awaiter by default will look up the current SynchronizationContext and then, if it successfully got a non-default one, will eventually queue the continuation to that context.

The ConfigureAwait method isn’t special: it’s not recognized in any special way by the compiler or by the runtime. It is simply a method that returns a struct (a ConfiguredTaskAwaitable) that wraps the original task it was called on as well as the specified Boolean value. Remember that await can be used with any type that exposes the right pattern. By returning a different type, it means that when the compiler accesses the instance’s GetAwaiter method (part of the pattern), it’s doing so off of the type returned from ConfigureAwait rather than off of the task directly, and that provides a hook to change the behavior of how the await behaves via this custom awaiter. ConfigureAwait(continueOnCapturedContext: false) is used to avoid forcing the callback to be invoked on the original context or scheduler. [18]
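The effect of ConfigureAwait(false) can be made visible with a sketch that installs a custom context and counts how many continuations are posted to it. CountingContext and ConfigureAwaitDemo are hypothetical names, and the context simply defers to the base class, which queues callbacks to the thread pool; a real UI context would marshal to its own thread instead.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context: counts Post calls, then lets the base class
// queue the callback to the thread pool.
public sealed class CountingContext : SynchronizationContext
{
    private int _postCount;
    public int PostCount => Volatile.Read(ref _postCount);

    public override void Post(SendOrPostCallback d, object? state)
    {
        Interlocked.Increment(ref _postCount);
        base.Post(d, state);
    }
}

public static class ConfigureAwaitDemo
{
    // Default await: the continuation is posted to the captured context.
    static async Task DefaultAwaitAsync() =>
        await Task.Delay(50);

    // Library-style await: the current context is not captured.
    static async Task LibraryStyleAwaitAsync() =>
        await Task.Delay(50).ConfigureAwait(false);

    public static int Run()
    {
        var context = new CountingContext();
        SynchronizationContext? previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            Task a = DefaultAwaitAsync();
            Task b = LibraryStyleAwaitAsync();
            // Blocking is safe in this sketch because Post never needs this thread.
            Task.WaitAll(a, b);
            return context.PostCount;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }

    public static void Main() => Console.WriteLine($"Posts: {Run()}");
}
```

Only the default await goes through Post, so Run returns 1. On a single-threaded UI context the same blocking wait could deadlock, which is one reason library code tends to use ConfigureAwait(false).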

Appendix A: FAQ

A.1. What happens on Thread.Sleep(0) in .NET?

> What happens on Thread.Sleep(0) in .NET?

* In .NET, Thread.Sleep(0) has a special meaning. It relinquishes the thread’s current time slice immediately, voluntarily handing over the CPU to other threads. The operating system could decide to immediately give the time slice back to the same thread if there are no other threads that need to run.

It’s effectively a way to signal to the operating system that the thread is willing to give up its slice of processor time, if there are other threads that are ready to run on the same processor.

However, Thread.Sleep(0) does not suspend the current thread if no other threads of equal priority are ready to run.

This can be useful to prevent a thread from consuming too much CPU time in a busy-wait scenario or when you might want to give other threads the chance to run.

Remember that using Thread.Sleep is generally not the best practice because it blocks the thread, and it’s not precise: the thread might not wake exactly after the specified time due to the way CPU scheduling works. Instead, techniques such as Task, async/await, or the synchronization primitives in System.Threading (like ManualResetEvent, Semaphore, etc.) are typically better approaches for managing thread synchronization and timing.
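As a rough illustration of the alternative, the sketch below waits on a ManualResetEventSlim instead of polling in a Thread.Sleep loop. SignalDemo and the value 42 are hypothetical; the point is only that the waiter wakes as soon as the producer signals.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SignalDemo
{
    public static int Run()
    {
        int result = 0;
        using var ready = new ManualResetEventSlim(initialState: false);

        Task producer = Task.Run(() =>
        {
            result = 42;   // hypothetical work product
            ready.Set();   // wake the waiter as soon as the value exists
        });

        // Blocks until Set is called; no sleep-and-check loop is needed.
        ready.Wait();
        producer.Wait();
        return result;
    }

    public static void Main() => Console.WriteLine(Run());   // prints 42
}
```

In asynchronous code, awaiting Task.Delay or a TaskCompletionSource plays the same role without blocking a thread at all.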

A.2. What are the worker and completion port threads?

//
// Summary:
//     Sets the number of requests to the thread pool that can be active concurrently.
//     All requests above that number remain queued until thread pool threads become
//     available.
//
// Parameters:
//   workerThreads:
//     The maximum number of worker threads in the thread pool.
//
//   completionPortThreads:
//     The maximum number of asynchronous I/O threads in the thread pool.
//
// Returns:
//     true if the change is successful; otherwise, false.
public static bool SetMaxThreads(int workerThreads, int completionPortThreads);
public static bool SetMinThreads(int workerThreads, int completionPortThreads);

> What are the worker and completion port threads in SetMaxThreads(int workerThreads, int completionPortThreads)?

* The ThreadPool.SetMaxThreads(int workerThreads, int completionPortThreads) method in .NET sets the number of requests to the thread pool that can be active concurrently. These two parameters define:

workerThreads: This is the maximum number of worker threads in the ThreadPool. A worker thread is a thread where your work is actually getting done. It is a thread on which a task is performed. "Worker" threads are used for compute-bound operations, i.e., your average piece of synchronous work.
completionPortThreads: This indicates the maximum number of asynchronous I/O threads. Completion ports are used to execute callbacks after certain asynchronous I/O operations have completed. "I/O completion" threads are used for I/O-bound operations (typically disk and network operations).

The thread pool maintains two types of threads: worker threads, which for the most part handle compute operations, and I/O threads, which, as the name suggests, handle I/O-bound operations.

Asynchronous I/O operations, which are operations that complete at some later point, often use callback methods to signal completion. When the system calls these callback methods, it does so using a thread from the ThreadPool.

The ThreadPool.SetMaxThreads method is usually used to increase the maximum number of concurrent threads if you have a lot of short tasks that are I/O bound.

It’s typically not necessary to change the ThreadPool size, because the ThreadPool already optimizes the number of threads it uses based on the number of CPUs and the types of tasks it’s running. In most cases, ThreadPool is efficient in managing its threads. Its algorithm aims to efficiently use all available CPU resources while not overloading the system with too many threads at the same time. It tries to maintain a balance.

A.3. How does .NET identify I/O-bound or compute-bound operations?

> How does .NET identify I/O-bound or compute-bound operations?

* The .NET runtime itself doesn’t actively categorize tasks as I/O-bound or compute-bound. It’s up to developers to ensure that appropriate APIs and techniques are used for I/O-bound and compute-bound tasks.

For I/O-bound operations, which typically include file, database, and network operations, you should use asynchronous methods when they’re available to avoid blocking (or tying up) a thread from the thread pool. When an I/O-bound operation is asynchronous, this generally means the operation is offloaded to the operating system, freeing up the .NET runtime thread to perform other work. I/O bound means the data is transferred between the CPU and the peripheral devices (disk, network, etc.), during which the CPU isn’t doing much work.

On the other hand, when a task is compute-bound, meaning it involves heavy computations and keeps the processor busy, developers should generally use synchronous methods and potentially multiple threads to utilize more cores and finish the computations faster. This kind of operation is typically done in parallel to split the work across multiple threads, where each thread uses synchronous code to do its part of the work.

So, while .NET doesn’t automatically distinguish between I/O-bound and compute-bound operations, it provides facilities (like asynchronous methods for I/O-bound tasks and parallel processing techniques such as the TPL for compute-bound tasks) for developers to write efficient code for both types of operations.

> How to understand 'It’s up to developers…'?

* What "It’s up to developers…" means is that the .NET runtime, or indeed any other runtime, cannot automatically discern if the operation you’re performing is I/O-bound or compute-bound. It’s the responsibility of the developer writing the code to understand what kind of operation they’re performing and to use the correct method to handle it.

For example,

If a developer is reading data from a file, database, or making a network call, these are typically I/O-bound operations. They should use asynchronous methods (async/await) provided by .NET to avoid blocking the thread while the data is being fetched.
If a developer intends to perform an action that requires heavy computation, like sorting a large array or running a complex algorithm, these operations are typically compute-bound. They would then use parallel programming constructs like Parallel.For or Parallel.ForEach, or manually create multiple threads.

The appropriate programming approach (asynchronous or multithreaded) is dictated by the nature of the operation. The developer must know their application, the tasks, and their nature, and then use the appropriate method. The .NET runtime will not do this discernment automatically.
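The two choices might look like this in practice. This is a sketch under stated assumptions: BoundDemo, CountCharsAsync, and SumOfSquares are hypothetical names, the file is a temporary file created only for the demo, and File.ReadAllTextAsync requires .NET Core 2.0 or later.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class BoundDemo
{
    // I/O-bound: await the asynchronous API so no thread is blocked during the read.
    public static async Task<int> CountCharsAsync(string path)
    {
        string text = await File.ReadAllTextAsync(path);
        return text.Length;
    }

    // Compute-bound: split the loop across cores with Parallel.For.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.For(0, n,
            () => 0L,                                     // per-thread partial sum
            (i, _, local) => local + (long)i * i,         // synchronous work per item
            local => Interlocked.Add(ref total, local));  // merge partial sums atomically
        return total;
    }

    public static async Task Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            await File.WriteAllTextAsync(path, "hello");
            Console.WriteLine(await CountCharsAsync(path));   // prints 5
            Console.WriteLine(SumOfSquares(1_000));           // prints 332833500
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```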

A.4. How does CLR manage the number of threads (worker and I/O threads) in the ThreadPool?

> How does CLR manage the number of threads (worker and I/O threads) in the ThreadPool?

* The CLR, or Common Language Runtime, uses a set algorithm to manage the number of threads in its ThreadPool.

Upon application startup, the ThreadPool possesses few threads. As requests come in, threads are added to fulfill those requests. The CLR considers several factors to make an adjustment, including the workload demand, number of processors, and the number of existing threads.

For each pending task or work item, a check is performed to see if a thread is available. If not, and the current thread count is below the maximum limit, a new thread is created, unless the thread injection rate limit has been reached. If a ThreadPool thread is idle for some time (around a minute), it gets terminated, which reduces resource usage.

If the thread count reaches the max limit and the requests are still not fulfilled, they are queued. "Worker" threads are commonly used for executing synchronous and asynchronous managed code, and "I/O threads" are specifically designated for completing I/O-bound work.

Both the maximum threads and "rate" are automatically set based on several factors like the size of virtual memory, number of processors available, etc.

The thread injection rate (how often new threads are added) helps prevent situations where rapidly creating too many threads would cause resource saturation and decrease performance.

You can manually get and change the minimum and maximum number of threads in the ThreadPool by using GetMinThreads, GetMaxThreads, SetMinThreads, and SetMaxThreads methods.

However, it’s recommended to let the CLR manage the threading details because of its smart heuristics and tuning based on extensive research and testing. Custom changes should only be made after careful testing and analysis.
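For reference, querying the current limits looks like the sketch below. PoolLimits and WorkerLimits are hypothetical names, and the printed numbers depend on the machine and runtime. The SetMinThreads call re-applies the values just read, so it only shows the call shape and changes nothing.

```csharp
using System;
using System.Threading;

public static class PoolLimits
{
    // Returns the current minimum and maximum worker thread counts.
    public static (int Min, int Max) WorkerLimits()
    {
        ThreadPool.GetMinThreads(out int minWorker, out _);
        ThreadPool.GetMaxThreads(out int maxWorker, out _);
        return (minWorker, maxWorker);
    }

    public static void Main()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        Console.WriteLine($"min: worker={minWorker}, completionPort={minIo}");
        Console.WriteLine($"max: worker={maxWorker}, completionPort={maxIo}");

        // Re-applying the current minimums leaves the pool unchanged.
        bool ok = ThreadPool.SetMinThreads(minWorker, minIo);
        Console.WriteLine($"SetMinThreads returned {ok}");
    }
}
```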

A.5. What’s the algorithm of the thread pool in .NET?

> What’s the algorithm of the thread pool in .NET?

* According to .NET’s official documentation, the CLR employs a hill-climbing algorithm for adjusting the number of ThreadPool threads that can run simultaneously. This heuristic algorithm is designed specifically to find the most efficient and effective number of threads for any given workload.

In simplified terms, thread injection works as follows:

It starts with the minimum number of ThreadPool worker threads and regularly checks if there are pending tasks in the queue which are waiting for an available thread.
If it finds tasks waiting in the queue, and it hasn’t already created a new thread recently, it creates a new one, up to the maximum limit.
If there are still tasks in the queue, but it just created a new thread, it waits a bit before creating another one. By waiting, it allows the currently running tasks to finish and free up some ThreadPool threads before deciding to create new ones.
If threads are idle for more than a certain time (like a minute), these threads can be retired and removed from the pool.

This algorithm works to maintain a balance between the number of threads and the amount of work that needs to be done. By ensuring threads are only created as needed, and cleaning up idle threads, it helps to optimize the system resources and performance.

It’s important to note that the details of the .NET ThreadPool’s algorithm are an implementation detail, and they are subject to change as Microsoft fine-tunes it based on new findings and requirements.

As always, it’s recommended to let the CLR manage the ThreadPool for you, unless you have a specific reason and have done careful performance analysis.

A.6. What if Interlocked.Increment a 64-bit integer on a 32-bit hardware?

> About the Interlocked class in .NET, what if we increment a 64-bit integer on a 32-bit hardware?

* The Interlocked.Increment method guarantees atomic increment operations, even with a 64-bit integer on 32-bit hardware. This method is thread-safe and ensures that the increment operation completes as one uninterruptible operation.

In a multithreaded or parallel computing scenario, multiple threads might be trying to increment the value of a shared 64-bit integer simultaneously. In such a case, Interlocked.Increment ensures that each increment operation happens in its entirety without being interrupted by another thread. Without such atomic operations, a thread could be halfway through incrementing the integer when another thread reads from or writes to it, leading to hard-to-diagnose bugs.

However, a plain increment of a 64-bit integer (long/Int64) on 32-bit hardware takes more than one CPU instruction, because the size of the long value is greater than the word size of the 32-bit processor. Even so, the .NET runtime ensures via the Interlocked.Increment method that the operation is atomic and no partial updates can be seen by another thread, provided that other threads read the value with Interlocked.Read.

Still, it’s important to note that such atomic operations cost more than non-atomic operations, because each one acts as a full memory barrier and needs exclusive access to the memory location. Thus, they must be used judiciously.
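A small sketch of the scenario: several workers increment one shared long. CounterDemo is a hypothetical name. Because every increment is atomic, no update is lost, and Interlocked.Read performs the final read so that it is also atomic on 32-bit platforms.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    public static long Run(int workers, int incrementsPerWorker)
    {
        long counter = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < incrementsPerWorker; i++)
                Interlocked.Increment(ref counter);   // atomic read-modify-write
        });
        return Interlocked.Read(ref counter);         // atomic 64-bit read
    }

    public static void Main() =>
        Console.WriteLine(Run(workers: 8, incrementsPerWorker: 100_000));   // prints 800000
}
```

Replacing Interlocked.Increment with counter++ in this sketch would typically produce a smaller total, because concurrent read-modify-write sequences would overwrite each other.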

A.7. How does .NET make the multiple CPU instructions as an atomic?

> How does CLR make the two CPU instructions as an atomic?

* The .NET Common Language Runtime (CLR) and the underlying operating system provide certain guarantees to ensure atomicity of operations on 64-bit values, even on 32-bit platforms.

The Interlocked class methods in .NET, like Interlocked.Increment, wrap platform-specific instructions for atomic operations. On a 32-bit system, where a 64-bit increment operation would generally require two CPU instructions, these interlocked operations use special instructions provided by the platform to ensure atomicity.

On Windows, for example, the InterlockedIncrement64 function in the Windows API performs a 64-bit atomic increment operation even on a 32-bit platform. On 32-bit x86 this is typically implemented as a retry loop around the 64-bit compare-and-exchange instruction (lock cmpxchg8b).

When you call an Interlocked method, the .NET runtime translates that into the appropriate CPU instruction or helper routine that guarantees the operation’s atomicity. It’s important to note that this relies on atomic instructions provided by the CPU hardware; the runtime only selects the right ones for the target platform.

These atomic operations prevent any thread interference or inconsistencies while performing read-modify-write actions by ensuring that the operation is completed as a single, uninterruptible unit.

A.8. I heard there are some risk on atomic operations in Go or sth else?

> I heard there are some risk on atomic operations in Go or sth else?

* Yes, coding with atomic operations can indeed be risky, not just in Go but in any language, if they are misused or misunderstood. Here are a few reasons why this is the case:

Atomic operations do not compose: If there are multiple separate atomic operations, there is no guarantee about the order in which they will execute, and other operations (not part of the atomic operation) may interleave among them.
As atomic operations do not lock or synchronize larger pieces of the code, using them incorrectly can leave data in an inconsistent state if a sequence of operations (transaction) must be completed without interruption.
Atomic operations generally require a precise understanding of hardware, compilers, and the language memory model to be used correctly. Any misunderstanding can lead to subtle, intermittent bugs that are hard to trace or replicate.
Atomics can suffer from issues like ABA problems where a value changes from A to B then back to A between your atomic check and usage, causing potential problems when the code assumes the value has not changed.
Atomic operations are tough to debug since typical stepping debuggers might disrupt the precise timing which is usually involved in the bugs related to atomic operations.

In Go, for example, the sync/atomic package provides low-level atomic memory primitives, but they are not sufficient to implement high-level synchronizations. Go encourages the use of higher-level synchronization primitives such as goroutines, channels, and select statements for most cases, and the sync package’s Mutex and WaitGroup types for others.

So while atomic operations are a powerful tool in concurrent programming, they are best suited for a limited set of problems and should be used judiciously and cautiously, usually when performance is a deciding factor and you’re comfortable and proficient with the complexities of concurrent system behaviour and memory models.

A.9. What’s ABA problems?

> What’s ABA problems?

* The ABA problem is a particularly tricky issue that can occur in multi-threaded or concurrent programs that use lock-free data structures and make use of atomic primitives, such as Compare-and-Swap (CAS).

Here’s a brief explanation of the ABA problem:

Say a thread, Thread A, accesses shared data and notes the value there is 'A'.
Thread A gets preempted, allowing Thread B to run.
Thread B changes the value in the shared data from 'A' to 'B', then changes it back to 'A' before it’s finished.
Thread A resumes execution, goes to change the shared data using a CAS operation. It checks if the value is 'A', which it is, but what Thread A doesn’t know is that the value has changed since it first loaded it. As far as Thread A can tell, though, nothing has changed, and so it goes ahead with the modification.

The result here is that Thread A acted on the assumption that nothing had changed when actually something had, but it was effectively invisible to it. This can have outcomes ranging from inconsequential to catastrophic, depending on what the shared data is used for. This is the ABA problem.

This issue is typically resolved by assigning a version number or a timestamp along with every change so that the revisiting thread can check not only the value but whether it is the same instance of the value it observed before. This kind of solution is often referred to as "tagging". Double-width Compare-and-Swap (DWCAS) or similar atomic primitives are used to atomically change the value and its associated version number or tag.
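The tagging idea can be sketched in C# by packing a 32-bit value and a 32-bit version into one long and doing the compare-and-swap on the packed word. Everything here (AbaDemo, Pack, TryUpdate, the values A, B, C) is hypothetical, and the interleaving of the two threads is simulated sequentially so that the outcome is deterministic.

```csharp
using System;
using System.Threading;

public static class AbaDemo
{
    // Hypothetical packing: low 32 bits hold the value, high 32 bits a version tag.
    static long Pack(int value, int version) => ((long)version << 32) | (uint)value;
    static int VersionOf(long packed) => (int)(packed >> 32);

    // CAS on value and version together; the version is bumped on success.
    static bool TryUpdate(ref long slot, long observed, int newValue)
    {
        long desired = Pack(newValue, VersionOf(observed) + 1);
        return Interlocked.CompareExchange(ref slot, desired, observed) == observed;
    }

    public static (bool PlainCas, bool TaggedCas) Run()
    {
        const int A = 1, B = 2, C = 3;

        // Plain CAS on the bare value.
        int plain = A;
        int seenPlain = plain;                    // "Thread A" observes A
        Interlocked.Exchange(ref plain, B);       // "Thread B": A -> B
        Interlocked.Exchange(ref plain, A);       // "Thread B": B -> A
        bool plainCas =
            Interlocked.CompareExchange(ref plain, C, seenPlain) == seenPlain;

        // Tagged CAS on value plus version.
        long tagged = Pack(A, 0);
        long seenTagged = tagged;                 // "Thread A" observes (A, version 0)
        TryUpdate(ref tagged, tagged, B);         // now (B, version 1)
        TryUpdate(ref tagged, tagged, A);         // now (A, version 2)
        bool taggedCas = TryUpdate(ref tagged, seenTagged, C);

        return (plainCas, taggedCas);
    }

    public static void Main()
    {
        var (plainCas, taggedCas) = Run();
        Console.WriteLine($"plain CAS succeeded: {plainCas}");    // True: the change went unnoticed
        Console.WriteLine($"tagged CAS succeeded: {taggedCas}");  // False: the version differs
    }
}
```

The plain CAS succeeds even though the value changed twice in between, while the tagged CAS fails because the version no longer matches.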

A.10. How to understand 'hardware, compilers, and the language memory model'?

> How to understand 'hardware, compilers, and the language memory model'?

* Understanding 'hardware, compilers, and the language memory model' is important when working with lower-level concurrent constructs such as atomic operations or volatile fields. Let me break them down:

Hardware: Modern CPUs can have multiple cores with their own caches. When one core updates a value, other cores may not see that change immediately, because the write can sit in the core’s store buffer or cache until the cache-coherency protocol propagates it. Similarly, hardware can execute instructions out of order for optimization purposes, which can lead to unexpected results in concurrent code.
Compilers: Similar to hardware, the compiler can reorder instructions or optimize away redundant loads/stores for performance. This can change the program’s behavior unless specific memory barriers or volatile qualifiers are used. Also, understanding how the hardware and compiler interact is crucial. For example, an x86 CPU has a strong memory model and won’t naturally reorder certain instructions, but the C++ compiler, without any memory barriers, is free to reorder instructions as it sees fit.
Language Memory Model: A language’s memory model defines how threads interact through memory and what behaviors are allowed in multi-threaded code. For instance, the C++11 memory model allows several levels of memory ordering, provides atomic operations and fences, and describes how synchronizes-with and happens-before relationships are formed. Understanding this is vital to writing correct concurrent code.

All these can significantly impact the correctness, performance, and portability of multi-threaded code. Learning about them usually involves studying computer architecture, compilers, operating systems, and of course, the programming language’s specification. For most developers, however, using higher-level synchronization primitives and libraries that abstract these details away is sufficient and recommended.

A.11. Anyway, for a single operation like Interlocked.Increment, it will always ensure it as an atomic?

> Anyway, for a single operation like Interlocked.Increment, it will always ensure it as an atomic?

* Yes, for the Interlocked.Increment method in .NET, it always ensures atomicity of the increment operation, regardless of the system architecture (32-bit or 64-bit) and the data type of the variable being incremented (32-bit like int or 64-bit like long).

This method guarantees that the increment operation, which is otherwise a read-modify-write operation consisting of multiple steps, is executed as a single, indivisible step. No other thread can observe or modify the variable in the middle of the operation, so the data is never left in an inconsistent state.

It can be used safely in a multithreaded and concurrent environment, ensuring that all incrementing operations on a shared variable are done in full, one at a time.

Please be aware that while Interlocked.Increment ensures atomic increment operations, complex multi-step operations or transactions would require additional synchronization mechanisms, like locks or mutexes.
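The difference between a single atomic step and a multi-step operation can be sketched as follows. Account and AccountDemo are hypothetical names. The success counter needs only Interlocked.Increment, while the check-then-subtract pair in TryWithdraw has to run under a lock so that no other thread can slip in between the check and the update.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class Account
{
    private readonly object _gate = new();
    private long _balance;

    public Account(long initial) => _balance = initial;

    // Check-then-act spans two steps, so the pair runs under a lock.
    public bool TryWithdraw(long amount)
    {
        lock (_gate)
        {
            if (_balance < amount) return false;
            _balance -= amount;
            return true;
        }
    }

    public long Balance { get { lock (_gate) return _balance; } }
}

public static class AccountDemo
{
    public static (int Succeeded, long Balance) Run()
    {
        var account = new Account(100);
        int succeeded = 0;
        Parallel.For(0, 1_000, _ =>
        {
            // A single-step counter update: Interlocked alone is enough.
            if (account.TryWithdraw(1)) Interlocked.Increment(ref succeeded);
        });
        return (succeeded, account.Balance);
    }

    public static void Main()
    {
        var (succeeded, balance) = Run();
        Console.WriteLine($"{succeeded} withdrawals, balance {balance}");   // 100 withdrawals, balance 0
    }
}
```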

A.12. How to understand the volatile keyword in .NET?

> How to understand the volatile keyword in .NET? And how to understand "There is no guarantee of a single total ordering of volatile writes as seen from all threads of execution."?

* The volatile keyword in .NET is a modifier that you can use on fields. It tells the compiler and the runtime that accesses to that field should not be subject to certain kinds of compiler or hardware optimizations such as caching its value in a register or reordering reads and writes.

In multithreaded applications, the volatile keyword can be used to ensure that a modification made by one thread is not hidden from other threads by compiler or JIT optimizations. Without volatile, the value of the field could be kept in a register and not written back to (or re-read from) the memory location until later, causing another thread to see out-of-date values.

The quote "There is no guarantee of a single total ordering of volatile writes as seen from all threads of execution" means that the order in which threads observe volatile writes may not be consistent across all threads. While volatile ensures that each individual thread will observe the writes in the order they were performed, it doesn’t guarantee that all threads will perceive those writes in the same order.

Here’s an example to illustrate:

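Thread 1
    Volatile write to variable A

Thread 2
    Volatile write to variable B

Thread 3
    Volatile read from variable A (sees the new value)
    Volatile read from variable B (sees the old value)

Thread 4
    Volatile read from variable B (sees the new value)
    Volatile read from variable A (sees the old value)

The memory model permits this outcome: Thread 3 concludes that A was written before B, while Thread 4 concludes the opposite, so the two writes have no single order that every thread agrees on. What volatile does guarantee is ordering around each individual access: a volatile write has release semantics and a volatile read has acquire semantics. So if one thread writes A and then B, and another thread reads B and sees the new value, its subsequent volatile read of A also sees the new value.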

This is why it’s often recommended to use higher-level synchronization constructs (like lock, Monitor, Mutex, Semaphore, etc.) or the System.Threading.Interlocked class for complex synchronization, because they provide stronger guarantees about how they behave in multithreaded scenarios. The volatile keyword is a low-level mechanism that’s harder to use correctly and it’s easy to accidentally build incorrect code when using volatile for complex synchronization.

> How to understand "On a multiprocessor system, a volatile read operation does not guarantee to obtain the latest value written to that memory location by any processor."?

* This statement draws attention to an important fact about volatile keyword usage in .NET on a multiprocessor system where threads could be executing on different processors.

The volatile keyword in C# tells the compiler not to reorder reads or writes to a field, or cache the field in a register, which helps ensure that all reads and writes on one processor occur in the order they’re specified in the code. Because of this, the volatile keyword is helpful for ensuring consistency in a multithreaded context.

However, it does not guarantee immediate visibility on all processors. For example, if one processor writes to a volatile field and another processor reads from that field, the reading processor might not get the most recent value written by the writing processor. The reasons can include the delay in propagating the write from one processor’s cache to the others (in multiprocessor systems with per-core caches) or the order in which these reads and writes are performed.

This is why it’s stated that "On a multiprocessor system, a volatile read operation does not guarantee to obtain the latest value written to that memory location by any processor."

In general, for complex synchronization or if strong cross-processor memory guarantees are required, usage of higher-level synchronization primitives or constructs (like locks, Monitor, Mutex, Semaphore, ConcurrentDictionary, Interlocked, MemoryBarrier, etc.) should be preferred. These toolsets are built to handle advanced multithreading and synchronization tasks, giving strict guarantees about order of operations and memory visibility on multi-core and multiprocessor systems.

A.13. What’s the diff of volatile keyword and Volatile class?

> What’s the diff of volatile keyword and Volatile class?

* The volatile keyword and the Volatile class in .NET both provide volatile memory access (reads and writes with ordering guarantees), but they do so in slightly different ways.

The volatile keyword is a modifier that can be applied to fields of a class or a struct. The .NET runtime knows not to cache the value of that field and always accesses it directly from memory, which makes it useful in multithreading scenarios where you need to ensure that a specific field’s latest value is accessible across multiple threads. It provides acquire semantics for reads (memory accesses that follow a volatile read in program order cannot be moved before it) and release semantics for writes (memory accesses that precede a volatile write in program order cannot be moved after it).

The Volatile class, introduced in .NET Framework 4.5, provides a set of methods that perform volatile reads and writes. This is beneficial as it allows volatile operations on types that were not declared as volatile. It provides stronger memory guarantees: a read done via Volatile.Read has acquire-fence semantics, i.e., it ensures that all subsequent reads and writes occur strictly after the volatile read, and a write done via Volatile.Write has release-fence semantics, i.e., it ensures that all previous reads and writes happen strictly before the volatile write. It acts as a full-fence, not allowing reordering or caching optimizations around it.

So, while both are used for volatile memory operations, the Volatile class provides stronger guarantees and offers more flexibility because it lets you perform volatile operations on non-volatile declared variables, while the volatile keyword is simpler to use but only operates on fields declared as volatile.
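The following is a minimal sketch of the two styles side by side. The Worker class and its field names are hypothetical and exist only for illustration; one stop flag is declared with the volatile keyword, the other is an ordinary field accessed through Volatile.Read and Volatile.Write.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var worker = new Worker();
var task = Task.Run(() => worker.Run());   // spins until both stop flags are observed
Thread.Sleep(50);                          // give the worker time to start spinning
worker.Stop();
task.Wait();
Console.WriteLine("worker stopped");

// Hypothetical type, for illustration only.
class Worker
{
    private volatile bool _stopKeyword;    // every access to this field is volatile
    private bool _stopApi;                 // ordinary field; volatile only where we say so

    public void Stop()
    {
        _stopKeyword = true;                   // volatile write (release semantics)
        Volatile.Write(ref _stopApi, true);    // same semantics, requested explicitly
    }

    public void Run()
    {
        // Volatile reads (acquire semantics) keep the loop from caching either flag.
        while (!_stopKeyword || !Volatile.Read(ref _stopApi))
        {
            Thread.SpinWait(100);
        }
    }
}
```

Both flags behave the same way here; the difference is only where the volatile semantics are declared (on the field, or at each call site).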

A.14. It seems we should avoid to use the volatile keyword?

> It’s about the happen-before on a memory model. It seems we should avoid to use the volatile keyword?

* In general, understanding and correctly using the volatile keyword can be tricky. volatile only provides a very specific type of memory barrier, and can be misused easily. Beginners often have misunderstandings about what volatile actually does.

The primary purpose of volatile is to declare fields that can be modified by multiple threads, so that accesses to them are not subject to certain compiler and hardware optimizations. Essentially, the value is not kept in a register across accesses, and reads and writes are not reordered across the volatile access in the directions that acquire and release semantics forbid.

However, using volatile won’t prevent all problems with visibility or ordering of operations. It does not provide atomicity, and often it’s not strong enough to replace other types of synchronization. For example, complex state manipulation or operations involving more than one memory location, like switching pointers or managing linked nodes, often need to be done in a single, uninterruptible step. Those operations are far beyond the scope of volatile.

Higher-level concurrency constructs (lock, Monitor, Mutex, Semaphore, ManualResetEvent, AutoResetEvent, and types in System.Collections.Concurrent or System.Threading.Tasks) should be preferred. These handle most of the complexity of ensuring correct memory barriers and state visibility, and they are generally the safer choice.

The Interlocked class also offers atomic operations on a single memory location and can be used for lightweight synchronization in high-performance scenarios, although using it correctly can be as tricky as using volatile.
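To make the atomicity point concrete, here is a small sketch that increments two counters from many threads. The variable names are illustrative. The first counter uses a volatile read followed by a volatile write, which is still a non-atomic read-modify-write; the second uses Interlocked.Increment.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

const int Iterations = 100_000;
int unsafeCount = 0;
int safeCount = 0;

Parallel.For(0, Iterations, _ =>
{
    // Two separate steps (read, then write): another thread can update the
    // value in between, so increments can be lost even though each access
    // is volatile.
    Volatile.Write(ref unsafeCount, Volatile.Read(ref unsafeCount) + 1);

    // One atomic read-modify-write on a single memory location.
    Interlocked.Increment(ref safeCount);
});

Console.WriteLine($"volatile read + write: {unsafeCount} (may be less than {Iterations})");
Console.WriteLine($"Interlocked.Increment: {safeCount}");
```

The Interlocked counter always ends at the iteration count; the volatile one may or may not, depending on timing and the number of cores.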

In the end, volatile isn’t something you should be reaching for regularly. Its usage should be limited to very specific cases. In general, you are better off using a higher-level synchronization construct in .NET when dealing with multithreading scenarios.

A.15. What’s the diff of asynchronous and parallel programming in .NET?

> What’s the diff of asynchronous and parallel programming in .NET?

* In .NET, both asynchronous and parallel programming paradigms are used to execute code that can potentially run concurrently to improve performance. However, they are used in different scenarios and address different types of problems.

Asynchronous Programming:

Asynchronous programming is about freeing up the main executing thread to do other work when it would otherwise be blocked, usually waiting for I/O operations (like reading a file, downloading from the internet, querying a database, etc.).

The goal of asynchronous programming is to make the application more responsive. With async programming, the application can yield control while it waits on these long-running tasks, allowing it to perform other tasks concurrently.

In .NET, asynchronous code is usually written with the async and await keywords. Awaiting does not necessarily create more threads; it relies on the ability of the runtime and the operating system to handle I/O-bound operations efficiently.

Parallel Programming:

Parallel programming, on the other hand, is about executing multiple operations, or tasks, at the same time by utilizing multiple cores of the CPU.

This is particularly helpful for CPU-bound work that can be broken down into smaller, independent tasks and processed in parallel. For example, if you’re processing an array of elements, you can use parallel programming to process multiple elements at the same time in different cores.

In .NET, this could be achieved using the Parallel class (like Parallel.For, Parallel.ForEach), or PLINQ (Parallel LINQ), or creating Tasks manually, and these generally involve multi-threading.

In summary, use asynchronous programming when dealing with I/O-bound tasks to avoid blocking and keep the application responsive, and use parallel programming when trying to break down CPU-intensive tasks to run simultaneously to improve throughput.
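As a rough sketch of the distinction, the example below awaits two simulated I/O operations and then runs a CPU-bound computation across cores. FetchLengthAsync and SumOfSquares are hypothetical helpers; Task.Delay stands in for a real network or database call.

```csharp
using System;
using System.Threading.Tasks;

// I/O-bound stand-in (hypothetical): no thread is blocked during the delay.
static async Task<int> FetchLengthAsync(string name)
{
    await Task.Delay(100);
    return name.Length;
}

// CPU-bound stand-in (hypothetical): pure computation.
static long SumOfSquares(int n)
{
    long sum = 0;
    for (int i = 1; i <= n; i++) sum += (long)i * i;
    return sum;
}

// Asynchronous: start both operations, then await them together.
int[] lengths = await Task.WhenAll(FetchLengthAsync("alpha"), FetchLengthAsync("beta"));

// Parallel: spread independent computations across the available cores.
var results = new long[4];
Parallel.For(0, results.Length, i => results[i] = SumOfSquares((i + 1) * 1000));

Console.WriteLine(string.Join(", ", lengths));
Console.WriteLine(string.Join(", ", results));
```

The asynchronous part gains nothing from extra cores, and the parallel part gains nothing from await; they address different bottlenecks.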

A.16. What’s the control meaning in async and await programming?

> What’s the 'control' meaning in async and await programming?

* In the context of asynchronous programming with async and await, "control" refers to the flow of execution in a program.

When you call an async method, control is returned to the caller as soon as an await is encountered whose awaited operation is not yet complete. The remainder of the async method is suspended at that point, and the calling code can proceed to do other work without waiting for the async method to complete.

When the awaited operation (such as an I/O operation, web request, etc.) completes, the remainder of the async method is then executed. This may involve a "context switch" back to the original context where the await was called (for example, the user interface thread), but the important point is that the initial call to the async method doesn’t block and hold up that original context while the awaited operation is ongoing.

So "yielding control" in async/await programming means giving back the control of execution to the caller code immediately, without waiting for the async method to finish, thus keeping the application responsive for doing other tasks. It’s an effective way to achieve concurrency, particularly for I/O-bound operations.
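The flow of control described above can be traced with a small sketch. A TaskCompletionSource stands in for an I/O operation so that the completion point is explicit; WorkAsync and the log list are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var log = new List<string>();
var pending = new TaskCompletionSource<int>();   // stands in for an I/O operation

async Task WorkAsync()
{
    log.Add("1: WorkAsync started");
    int value = await pending.Task;              // not complete yet: control returns to the caller
    log.Add($"3: WorkAsync resumed with {value}");
}

Task work = WorkAsync();                         // runs synchronously up to the await
log.Add("2: caller continues");
pending.SetResult(42);                           // the "I/O" completes
await work;                                      // make sure the remainder has run

foreach (var line in log) Console.WriteLine(line);
// 1: WorkAsync started
// 2: caller continues
// 3: WorkAsync resumed with 42
```

The caller gets control back between steps 1 and 3, which is exactly the "yield" that await performs.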

> What’s the underneath in 'control', like a thread or sth else?

* "Control" primarily refers to the sequence of execution of code or the flow of a program. When we talk about control in the context of async and await, we are referring to which code or operation gets to use the executing thread at a given time.

On a lower level in .NET runtime, this "control" is facilitated by the Task Scheduler and in many scenarios, the synchronization context. Here’s a simplified view:

When a method awaits an incomplete operation (e.g. await someTask;), the current SynchronizationContext is captured and the rest of the method is registered as a continuation. When the operation completes, the continuation is posted back to the captured context. If there is no SynchronizationContext (SynchronizationContext.Current is null), the current TaskScheduler is used instead.

In a UI-based application, the synchronization context is typically tied to the main UI thread, which allows the continuation (the code following the await keyword) to be executed on the same UI thread. This makes async programming easier and safer in a UI context.

In an environment without a SynchronizationContext (like in Console applications or thread pool threads), the default TaskScheduler schedules the continuation to run on the ThreadPool.

With this in mind, when you call await in a method, it signs up the rest of the method to be continued later, and then returns. This frees up the current thread to go off and do other work. When the awaited task finishes, the continuation (the remainder of the method) is then executed on the captured context.

Again, this is a simplified view. In reality, there are several optimizations and mechanisms in place to make async programming more efficient and easier to use.
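One way to observe the capture is to install a toy SynchronizationContext that counts how many continuations are posted to it. This is only a sketch under simplifying assumptions: CountingContext is a hypothetical class, and it runs the callback inline instead of queueing it to a specific thread as a UI context would.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var context = new CountingContext();
SynchronizationContext.SetSynchronizationContext(context);

var pending = new TaskCompletionSource<bool>();

async Task WaitAsync()
{
    await pending.Task;   // captures SynchronizationContext.Current (the CountingContext)
}

Task waiting = WaitAsync();

// Complete the operation from a thread pool thread, which has no context of
// its own, so the continuation has to be posted to the captured one.
Task.Run(() => pending.SetResult(true)).Wait();
waiting.Wait();

Console.WriteLine($"continuations posted: {context.PostCount}");

// Hypothetical context, for illustration only.
sealed class CountingContext : SynchronizationContext
{
    private int _postCount;
    public int PostCount => Volatile.Read(ref _postCount);

    public override void Post(SendOrPostCallback d, object? state)
    {
        Interlocked.Increment(ref _postCount);
        d(state);   // run inline for simplicity; a UI context would queue to its thread
    }
}
```

Blocking with Wait is safe here only because the toy context runs continuations inline; with a real single-threaded context, blocking the context's thread this way is a classic source of deadlocks.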

> Does it mean the 'someTask' will be executed on another thread?

* Not necessarily. When you call an async method, it does its work on the same thread that called it until it awaits an incomplete task. At the await, if the task isn’t already complete, the async method is paused and control returns to the caller of the async method. This all happens on the same thread, and no extra threads are created.

What happens next depends on what the awaited task is doing. Here’s a simple way to think about it:

If it’s a CPU-bound task doing computation, it may be done on another thread, especially if you specifically started the task on a thread pool thread (for example, using Task.Run).
If it’s an I/O-bound task (like reading from disk or a network operation), then actually no thread is needed while it’s waiting for the I/O operation to complete. Threads are a precious resource and I/O-bound tasks are not doing any CPU-bound computation, they’re just waiting, so it would be wasteful to dedicate a thread to that.

When the awaited operation completes, the remainder of the async method is scheduled to continue running. The scheduling depends on the context when the 'await' keyword is encountered.

In a UI application, it resumes on the UI thread (as this is the captured context); for a server application, it generally runs on a ThreadPool thread, but not necessarily the same thread as the one it started on.

Remember, in most cases, async doesn’t mean concurrent or multi-threaded. Async methods don’t require multithreading because an async method doesn’t run on its own thread. The method runs on the current synchronization context and uses time on the thread only when the method is active.

You can use Task.Run to queue work on the ThreadPool if you want to use separate threads for CPU-bound work.
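A short sketch of the two cases, assuming a console application with no SynchronizationContext: the delegate passed to Task.Run executes on a thread pool thread, while Task.Delay (used here as a stand-in for I/O) is timer-based and occupies no thread while it is pending.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int callerThread = Environment.CurrentManagedThreadId;

// CPU-bound work: Task.Run queues the delegate to the thread pool.
bool ranOnPool = await Task.Run(() => Thread.CurrentThread.IsThreadPoolThread);

// I/O-like wait: no thread is blocked while the delay is pending.
await Task.Delay(100);

Console.WriteLine($"caller thread: {callerThread}");
Console.WriteLine($"Task.Run delegate ran on a pool thread: {ranOnPool}");
Console.WriteLine($"thread after the awaits: {Environment.CurrentManagedThreadId}");
```

The thread ID printed last may differ from the caller's, because without a captured context the continuation resumes on whichever pool thread is available.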

A.17. How to understand "Async methods don’t require multithreading because an async method doesn’t run on its own thread."?

> How to understand "Async methods don’t require multithreading because an async method doesn’t run on its own thread."?

* In asynchronous programming, the fundamental misunderstanding is often thinking that "asynchronous" automatically means "on a different thread". This is not strictly true.

When you mark a method with async, all this does is allow the use of the await keyword in the method (and have the compiler rewrite the method into a state machine). It doesn’t inherently create a new thread for the method to run on.

An async method starts running on the same thread, because invoking an async method is just like invoking any regular method. So when an async method is called, it runs on the current thread in a synchronous manner until it encounters an await that applies to a task which has not yet completed.

At that point, assuming the task being awaited has not yet completed, the method yields and the remainder of the method is turned into a continuation that will be invoked once the awaited operation completes. What’s important to understand here is that the continuation does not have to run on the same thread where it left off. Unless a SynchronizationContext was captured (as in a UI application), it can run on whatever thread the completed task calls back on, typically a thread pool thread.

So an async method doesn’t require multithreading: it does not occupy a dedicated thread of its own. It uses a thread only while its code is actually running, and while it is suspended at an await the work is in progress elsewhere (in the operating system for I/O, or on a thread pool thread if it was started with Task.Run).

However, if you were to use Task.Run within an async method, that would indeed use a separate thread from the thread pool to perform the function passed to Task.Run, thereby enabling multithreading.

So to sum it up, async does not create additional threads. It simply provides a mechanism to better utilize the threads by allowing them to perform other work instead of sitting idle when waiting for operations to complete.
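The claim that calling an async method does not switch threads can be checked directly. In this sketch (names are illustrative), the method records its thread before the first await and then awaits an already-completed task, so it never yields at all.

```csharp
using System;
using System.Threading.Tasks;

int callerThread = Environment.CurrentManagedThreadId;
int threadBeforeAwait = -1;
bool reachedEnd = false;

async Task RunAsync()
{
    threadBeforeAwait = Environment.CurrentManagedThreadId;   // still the caller's thread
    await Task.CompletedTask;                                 // already complete: no yield
    reachedEnd = true;
}

Task task = RunAsync();

Console.WriteLine(threadBeforeAwait == callerThread);   // True
Console.WriteLine(task.IsCompleted && reachedEnd);      // True
```

The whole method ran synchronously on the calling thread; the async keyword by itself scheduled nothing.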

A.18. Can the async/await improve the responsiveness on ASP.NET Core?

> Can the async/await improve the responsiveness on ASP.NET Core?

* Yes, using async and await in ASP.NET Core can noticeably improve how responsive an application stays under load, especially when requests involve I/O-bound operations like database calls, file reads, or API requests.

The primary benefit of using async and await in ASP.NET Core is not faster execution of individual requests, but rather the ability to handle more concurrent requests with the same resources. This happens because async and await enable your application to free up threads when they would otherwise be blocked, waiting for I/O-bound tasks to complete.

In a synchronous ASP.NET Core application without async/await, each incoming request is handled by a dedicated thread from the ThreadPool. If the request requires I/O-bound operation like a database query or an external API call, that thread sits idle and waits for the operation to complete, which is a waste of resources.

In ASP.NET Core, when you use async methods and await async tasks, the thread handling the request can be used for starting another request as soon as it enters a waiting state (awaiting I/O-bound operation).

So by leveraging async and await in your ASP.NET Core applications, the overall resource usage becomes more efficient and your application can handle more concurrent clients/requests. This can lead to greater scalability of your application and also, in many cases, to a better user experience through increased responsiveness.

> It seems that it is because the I/O operations are offloaded from the current thread, and release the CPU capacity. But who should be in charge of the I/O operations?

* Yes, that’s correct. The benefit of async/await for I/O operations is to free up the current thread, rather than waiting for the I/O operation to complete. The CPU is then free to do other work.

The operating system is in charge of coordinating and managing I/O operations. When you start an I/O-bound operation, like reading from disk or making a network request, the request is sent to the operating system. The operating system takes care of the actual I/O operation, interfacing with the necessary hardware, like the disk or network card.

From your application’s perspective, once you start an I/O operation and await the returned task, your application doesn’t need to worry about the operation anymore. It gets a notification (in the form of the awaited task completing) when the operation is done.

So you can think of I/O-bound async operations as being "offloaded" to the operating system. Your application doesn’t need to use a thread to wait for the I/O operation to complete, making it available for other work. This is the key benefit of asynchrony for I/O-bound operations.

> Actually, if we use sync methods, the current thread will be in wait status for the I/O. but it still can free up CPU.

* Yes, you’re correct. Even with synchronous I/O operations, the CPU isn’t actively engaged the whole time. The operating system still manages the I/O operation, and the CPU is freed up to perform other tasks. However, the key difference lies in the usage of threads.

In synchronous operations, the thread performing the operation goes into a wait state and cannot be used for anything else. This means the application can handle fewer concurrent operations, as it might run out of available threads if multiple I/O operations are waiting to complete.

With asynchronous operations using async/await, after initiating the I/O operation, the thread is freed up entirely - it doesn’t enter a wait state. Instead, it can be returned to the thread pool and used to service other incoming requests. This is what makes async/await so beneficial in a scenario handling many concurrent I/O-bound operations, such as a server application.

So while both sync and async free up CPU during I/O operations, async also allows for the thread to be reused during this time, leading to more efficient utilization of system resources.
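The effect on thread usage can be approximated outside ASP.NET Core with a plain console sketch. HandleRequestAsync is a hypothetical stand-in for a request handler, and Task.Delay simulates the database or HTTP call; this is not how ASP.NET Core dispatches requests, only an illustration of many waits sharing few threads.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical handler: the delay stands in for a database or HTTP call.
static async Task<int> HandleRequestAsync(int id)
{
    await Task.Delay(200);   // the thread goes back to the pool while the "I/O" is pending
    return id;
}

var stopwatch = Stopwatch.StartNew();
int[] handled = await Task.WhenAll(
    Enumerable.Range(0, 1_000).Select(id => HandleRequestAsync(id)));
stopwatch.Stop();

Console.WriteLine($"handled {handled.Length} requests in {stopwatch.ElapsedMilliseconds} ms");
```

A thousand simulated requests are in flight at once without a thousand threads. Replacing the await with a blocking Thread.Sleep inside Task.Run would instead tie up one pool thread per request for the whole wait.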

> This is the non-blocking I/O in essence?

* Exactly, using async/await for I/O-bound operations is a form of non-blocking I/O.

In blocking or synchronous I/O, the execution thread is halted—or "blocked"—until the I/O operation completes. This means the thread can’t be used for any other tasks, regardless of how long the I/O operation takes.

Non-blocking or asynchronous I/O, on the other hand, starts the I/O operation and then allows the execution thread to continue other tasks. It doesn’t wait for the I/O operation to complete. Instead, it will be notified once the I/O operation is done, at which point it can proceed with any steps that depend on the result.

Thus, by enabling non-blocking I/O, the async/await pattern can improve the scalability and responsiveness of applications, particularly in scenarios handling many concurrent I/O-bound operations, such as server applications or services.

* Here’s a summary of the key points regarding the use of async/await from the ASP.NET Core section:

Improves Responsiveness: Using async and await in ASP.NET Core can noticeably improve application responsiveness under load, particularly when dealing with I/O-bound operations like database calls, file reads, or API requests.
Enhances Scalability: The primary benefit of using async/await in ASP.NET Core is to handle more concurrent requests with the same resources. This is achieved by freeing up threads, which would otherwise be blocked during I/O-bound tasks.
Operating System’s Role: In async/await programming, the operating system handles the actual I/O operation. The application receives a notification when the operation is complete, reducing the need for threads to wait.
Benefit of Async Over Sync: While both synchronous and asynchronous operations free up the CPU, asynchronous programming also frees up threads. This allows them to be reused for other tasks, leading to more efficient utilization of system resources.
Non-Blocking I/O: Using async/await for I/O-bound operations is a form of non-blocking I/O. The I/O operation is initiated, and the thread is then freed up to continue with other tasks. This is the crux of how async/await improves the scalability and responsiveness of applications.

A.19. Is there a SynchronizationContext on ASP.NET Core?

> Is there a SynchronizationContext on ASP.NET Core?

* No, there’s no SynchronizationContext in ASP.NET Core.

In traditional .NET Framework, SynchronizationContext is utilized to marshal operations back to the original context or thread, for instance, a particular thread in a desktop application that interacts with the user interface. This behavior is often necessary because UI updates must be performed on the main/UI thread.

In classic ASP.NET (on .NET Framework), SynchronizationContext is also used to provide a "request context", meaning that after an await, you’re back on the same request context.

However, ASP.NET Core does not have a SynchronizationContext. It’s designed to be efficient and scalable right from the start. In most server applications, synchronization back to the original context is not necessary. Allowing the system to execute continuations on any available thread tends to be more efficient and doesn’t block resources, which helps ASP.NET Core handle many requests simultaneously.

With that said, developers need to be aware that the code after an await may be running on a different thread than the code before it. HttpContext is not thread-safe, so it must not be accessed from multiple threads in parallel (for example, from several tasks started within one request), and it must not be used after the request has completed (for example, from background work that outlives the request).

A.20. What’s the diff of AsOrdered and AsUnordered in PLINQ?

> What’s the diff of AsOrdered and AsUnordered in PLINQ?

* In PLINQ, the methods AsOrdered and AsUnordered are used to determine whether the original order of the source sequence should be preserved in the output.

The AsOrdered method enforces that the output sequence will be in the same order as the input sequence. This means even though the operations may occur in parallel or out of order, the actual results are buffered and then delivered in the original order. This can impact performance due to the overhead of maintaining order.

The AsUnordered method allows the output sequence to provide data as soon as it is calculated, regardless of the input sequence’s original order. This means if the third item is processed before the second item, it can be returned before the second item.

using System;
using System.Linq;
using System.Threading;

var numbers = Enumerable.Range(1, 10);

// SpinWait makes smaller numbers take longer, so later items tend to finish first.
var squareNumbersOrdered = numbers.AsParallel().AsOrdered()
                             .Select(n => { Thread.SpinWait(50_000 / n); return n * n; });

var squareNumbersUnordered = numbers.AsParallel().AsUnordered()
                             .Select(n => { Thread.SpinWait(50_000 / n); return n * n; });

Console.WriteLine(string.Join(", ", squareNumbersOrdered));
Console.WriteLine(string.Join(", ", squareNumbersUnordered));
// $ dotnet run
// 1, 4, 9, 16, 25, 36, 49, 64, 81, 100
// 16, 49, 81, 25, 64, 100, 36, 1, 4, 9   (one possible order; it varies between runs)

It’s important to note that by default, PLINQ does not preserve the original order; it treats the source as unordered so that it can run as fast as possible. Call AsOrdered when the order of the results matters, and accept that it can decrease performance due to the ordering requirement. AsUnordered is mainly useful to switch ordering off again for the remainder of a query, for example after an ordered stage such as OrderBy or an earlier AsOrdered call.

Partitioning is the process in which PLINQ divides the source data into multiple chunks, or partitions, so that it can process them independently and concurrently on multiple cores.

Order preservation affects this partitioning process:

Ordered PLINQ queries (AsOrdered): PLINQ keeps track of each element’s position in the source, and the results have to be merged back in that order. Results that are ready early may have to be buffered until the results that precede them are available. This adds synchronization and buffering overhead, but the ordering of data is maintained.
Unordered PLINQ queries (AsUnordered): PLINQ divides the input into multiple partitions, and each partition can produce and output its results independently of other partitions. This increases concurrency and performance, but data ordering is not preserved.

Generally, if the order in which results are produced doesn’t matter, leave the query unordered (the default, or AsUnordered after an ordered stage), as it gives PLINQ more flexibility in how it performs partitioning and merging and can often lead to better performance.
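As a quick check of the ordering guarantees, the sketch below compares an ordered and an unordered PLINQ query against the sequential result. The variable names are illustrative.

```csharp
using System;
using System.Linq;

var numbers = Enumerable.Range(1, 1_000).ToArray();
var expected = numbers.Select(n => n * n).ToArray();   // sequential result, ascending

// AsOrdered: results come back in source order.
var ordered = numbers.AsParallel().AsOrdered().Select(n => n * n).ToArray();

// No ordering requested: same elements, but their order is not guaranteed.
var unordered = numbers.AsParallel().Select(n => n * n).ToArray();

Console.WriteLine(ordered.SequenceEqual(expected));                    // True
Console.WriteLine(unordered.OrderBy(n => n).SequenceEqual(expected));  // True
```

The unordered query may happen to return the elements in source order on a given run, but only the AsOrdered query promises it.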

References