1. WebSockets

WebSocket, standardized by the IETF as RFC 6455 in 2011, is a computer communications protocol, providing a simultaneous two-way communication channel over a single Transmission Control Protocol (TCP) connection. [1]

The WebSocket is designed to work over HTTP ports 80 and 443 as well as to support HTTP proxies and intermediaries on the existing HTTP infrastructure, and could also use a simpler handshake over a dedicated port without reinventing the entire protocol. [2]

WebSocket, conceptually, is really just a layer on top of TCP that does the following: [2]

  • adds a web origin-based security model for browsers

  • adds an addressing and protocol naming mechanism to support multiple services on one port and multiple host names on one IP address

  • layers a framing mechanism on top of TCP to get back to the IP packet mechanism that TCP is built on, but without length limits

  • includes an additional closing handshake in-band that is designed to work in the presence of proxies and other intermediaries

The protocol has two parts: a handshake and the data transfer. After a successful handshake, clients and servers transfer messages back and forth. [2]

  • On the wire, a message is composed of one or more fragmented frames.

  • A frame has an associated type and broadly speaking, there are types for textual data, binary data, and control frames.

WebSocket specification defines two URI schemes: [2]

  • ws-URI = "ws:" "//" host [ ":" port ] path [ "?" query ]

  • wss-URI = "wss:" "//" host [ ":" port ] path [ "?" query ]

The protocol uses the HTTP/1.1 Upgrade mechanism (Section 6.7 of RFC7230) to transition a TCP connection from HTTP into a WebSocket connection, and uses the extended CONNECT method to initiate a WebSocket connection on an HTTP/2 stream. [3]

1.1. Date framing

A high-level overview of the framing is given in the following figure. [2]

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

FIN:  1 bit

   Indicates that this is the final fragment in a message.  The first
   fragment MAY also be the final fragment.

Opcode:  4 bits

   Defines the interpretation of the "Payload data".  If an unknown
   opcode is received, the receiving endpoint MUST _Fail the
   WebSocket Connection_.  The following values are defined.

   *  %x0 denotes a continuation frame

   *  %x1 denotes a text frame

   *  %x2 denotes a binary frame

   *  %x3-7 are reserved for further non-control frames

   *  %x8 denotes a connection close

   *  %x9 denotes a ping

   *  %xA denotes a pong

   *  %xB-F are reserved for further control frames
  • Control frames are identified by opcodes where the most significant bit of the opcode is 1.

    • Currently defined opcodes for control frames include 0x8 (Close), 0x9 (Ping), and 0xA (Pong).

    • Control frames are used to communicate state about the WebSocket.

  • Data frames (e.g., non-control frames) are identified by opcodes where the most significant bit of the opcode is 0.

    • Currently defined opcodes for data frames include 0x1 (Text), 0x2 (Binary).

    • Data frames carry application-layer and/or extension-layer data.

1.2. Opening handshake in HTTP/1.1

The opening handshake is intended to be compatible with HTTP-based server-side software and intermediaries, so that a single port can be used by both HTTP clients talking to that server and WebSocket clients talking to that server. [2]

  • The WebSocket client’s handshake is an HTTP Upgrade request with the Request-Line [4] format: [2]

    GET /chat HTTP/1.1
    Host: server.example.com
    Upgrade: websocket
    Connection: Upgrade
    Origin: http://example.com
    Sec-WebSocket-Protocol: chat, superchat (1)
    Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ== (2)
    Sec-WebSocket-Version: 13
    1 The Sec-WebSocket-Protocol request-header field can be used to indicate what subprotocols (application-level protocols layered over the WebSocket Protocol) are acceptable to the client.
    2 The server takes the Sec-WebSocket-Key header field and echo the Sec-WebSocket-Accept header field to prove the received handshake.
  • The handshake from the server is much simpler than the client handshake, and looks as follows with the Status-Line [4] format:

    HTTP/1.1 101 Switching Protocols (1)
    Upgrade: websocket (2)
    Connection: Upgrade (2)
    Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo= (3)
    Sec-WebSocket-Protocol: chat (4)
    1 Any status code other than 101 indicates that the WebSocket handshake has not completed and that the semantics of HTTP still apply.
    2 The Connection and Upgrade header fields complete the HTTP Upgrade.
    3 The Sec-WebSocket-Accept header field indicates whether the server is willing to accept the connection.
    4 The Sec-WebSocket-Protocol is an option field, which indicates the subprotocol that the server has selected.

Either peer can send a control frame with data containing a specified control sequence to begin the closing handshake.

1.3. WebSocket and HTTP Protocol [Gemini]

While WebSockets leverage the initial HTTP connection for the handshake, proxy servers do typically handle WebSocket traffic and normal HTTP traffic distinctly.

WebSocket vs. HTTP Protocol Differences:

  • Purpose: HTTP is designed for request-response interactions, while WebSocket establishes a full-duplex communication channel for real-time data exchange.

  • Data Format: HTTP uses a text-based request-response format with headers and payloads. WebSocket uses a binary frame format for efficient data transfer.

  • Connection State: HTTP connections are typically short-lived, closing after the response is sent. WebSockets maintain persistent connections for bi-directional communication.

Proxy Handling:

  • Initial Handshake: For both HTTP and WebSocket traffic, the proxy first establishes a standard HTTP connection with the target server.

  • Handshake Differentiation: The proxy can identify WebSocket traffic by recognizing the specific handshake headers used in the initial HTTP request.

  • Separate Handling: Once a WebSocket handshake is detected, the proxy switches to handling the subsequent frames using the WebSocket protocol. It might involve unmasking/remasking data and forwarding it appropriately.

  • HTTP Traffic Handling: Normal HTTP requests and responses continue to be handled using the standard HTTP protocol by the proxy.

Benefits of Separate Handling:

  • Performance: By handling WebSocket traffic differently, the proxy can optimize processing for the specific needs of each protocol. This can improve performance for both WebSocket and HTTP traffic.

  • Security: Some proxies might have specific security mechanisms tailored for HTTP traffic (e.g., content filtering). These wouldn’t be applicable to the binary data format of WebSockets. Separate handling allows for targeted security measures.

  • Complexity Management: Separating the handling logic simplifies the proxy implementation as it deals with each protocol according to its unique characteristics.

In summary:

  • A single proxy server can manage both HTTP and WebSocket traffic.

  • However, it differentiates between the two protocols during the initial handshake and then employs separate handling mechanisms for each to ensure optimal performance and proper data flow.

1.4. WebSockets in .NET

Using WebSockets over HTTP/2 takes advantage of new features are available in Kestrel on all HTTP/2 enabled platforms such as: [5]

  • Header compression.

  • Multiplexing, which reduces the time and resources needed when making multiple requests to the server.

HTTP/2 WebSockets use CONNECT requests rather than GET.

WebSockets Server in ASP.NET Core

var webSocketOptions = new WebSocketOptions
{
    KeepAliveInterval = TimeSpan.FromMinutes(2)
};

webSocketOptions.AllowedOrigins.Add("https://client.com");
webSocketOptions.AllowedOrigins.Add("https://www.client.com");

// Add the WebSockets middleware in `Program.cs`:
app.UseWebSockets(webSocketOptions);

app.Use(async (context, next) =>
{
    // [Route("/ws")] // HTTP/2 WebSockets use CONNECT requests rather than GET.
    if (context.Request.Path == "/ws")
    {
        // Accept WebSocket requests
        if (context.WebSockets.IsWebSocketRequest)
        {
            using var webSocket = await context.WebSockets.AcceptWebSocketAsync();
            await EchoAsync(webSocket);
        }
        else
        {
            context.Response.StatusCode = StatusCodes.Status400BadRequest;
        }
    }
    else
    {
        await next(context);
    }

});

app.Run();

// Send and receive messages
static async Task EchoAsync(WebSocket webSocket)
{
    var buffer = new byte[1024 * 4];
    var receiveResult = await webSocket.ReceiveAsync(
        new ArraySegment<byte>(buffer), CancellationToken.None);

    while (!receiveResult.CloseStatus.HasValue)
    {
        await webSocket.SendAsync(
            new ArraySegment<byte>(buffer, 0, receiveResult.Count),
            receiveResult.MessageType,
            receiveResult.EndOfMessage,
            CancellationToken.None);

        receiveResult = await webSocket.ReceiveAsync(
            new ArraySegment<byte>(buffer), CancellationToken.None);
    }

    await webSocket.CloseAsync(
        receiveResult.CloseStatus.Value,
        receiveResult.CloseStatusDescription,
        CancellationToken.None);
}

WebSockets Client in .NET

string[] messages = [
    "我们的生命不是消逝于那些重大的事件中,而是流逝在那些日常琐碎的小事里。",
    "生活是由无数微不足道的细节构建起来的,而记忆正是这些细节的忠实记录者,它们在某个不经意的瞬间被唤醒,带我们穿越回往昔。",
    "人们在追求他们以为是幸福的东西时,常常错过真正的幸福。",
    "在失去之后,我们才开始寻找那些曾经拥有但未被珍惜的东西,而记忆,则成了我们找回那些失落时光的唯一线索。"
    ];

Uri uri = new("ws://localhost:5000/ws");
using ClientWebSocket ws = new();
var cts = new CancellationTokenSource();
await ws.ConnectAsync(uri, cts.Token);

foreach (var message in messages)
{
    var bytes = Encoding.UTF8.GetBytes(message);
    await ws.SendAsync(bytes, WebSocketMessageType.Text, true, cts.Token);
}

ThreadPool.QueueUserWorkItem(async _ =>
{
    while (!cts.Token.IsCancellationRequested)
    {
        var (echoMessage, _, _, _, _) = await ReadMessageAsync(ws, cts.Token);
        Console.WriteLine(Encoding.UTF8.GetString(echoMessage.ToArray()));
    }
});

await Task.Delay(1000);

await ws.CloseAsync(WebSocketCloseStatus.NormalClosure, "Client closed", cts.Token);

// Read a complete message from a WebSocket.
static async Task<(IList<byte>, WebSocketMessageType, bool, WebSocketCloseStatus?, string?)> ReadMessageAsync(WebSocket webSocket, CancellationToken token = default)
{
    var message = new List<byte>(1024 * 2);
    var buffer = new byte[8 * 4];
    var receiveResult = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), token).ConfigureAwait(false);
    while (true)
    {
        message.AddRange(new ArraySegment<byte>(buffer, 0, receiveResult.Count));
        if (receiveResult.CloseStatus.HasValue || receiveResult.EndOfMessage)
        {
            break;
        }
        receiveResult = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), token).ConfigureAwait(false);
    }

    return (message.AsReadOnly(), receiveResult.MessageType, receiveResult.EndOfMessage, receiveResult.CloseStatus, receiveResult.CloseStatusDescription);
}

1.5. WebSockets in Browser

The WebSocket API is an advanced technology that makes it possible to open a two-way interactive communication session between browser and server, which can send messages to a server and receive event-driven responses without having to poll the server for a reply. [7]

const excerpts = [
  'Grown-ups never understand anything by themselves, and it is tiresome for children to be always and forever explaining things to them.',
  'And now here is my secret, a very simple secret: It is only with the heart that one can see rightly; what is essential is invisible to the eye.',
  "People have forgotten this truth,' the fox said. 'But you mustn’t forget it. You become responsible forever for what you’ve tamed. You’re responsible for your rose.",
  'All grown-ups were once children... but only few of them remember it.',
  'It is the time you have wasted for your rose that makes your rose so important.',
  'One sees clearly only with the heart. Anything essential is invisible to the eyes.',
  'You - you alone will have the stars as no one else has them... In one of the stars I shall be living. In one of them I shall be laughing. And so it will be as if all the stars were laughing, when you look at the sky at night... You - only you - will have stars that can laugh.',
  'You become responsible, forever, for what you have tamed.'
]

// Creating a WebSocket object
const ws = new WebSocket('ws://localhost:5000/ws')
// Listen for possible errors
ws.addEventListener('error', (event) => {
  console.log('WebSocket error: ', event)
})

// Sending data to the server
ws.onopen = () => {
  for (const excerpt of excerpts) {
    ws.send(excerpt)
  }
}

// Receiving messages from the server
ws.onmessage = (e) => {
  console.log(e.data)
}

2. Server-sent events

Server-Sent Events (SSE) is a server push technology enabling a client to receive automatic updates from a server via an HTTP connection, and describes how servers can initiate data transmission towards clients once an initial client connection has been established. [8]

2.1. Event stream format

The event stream is a simple stream of text data messages which are separated by a pair of newline characters (\n\n), and must be encoded using UTF-8. [9]

  • A colon (:) as the first character of a line is in essence a comment, and is ignored.

  • Each message consists of one or more lines of text listing the fields for that message.

  • Each field is represented by the field name (event, data, id, and retry), followed by a colon, followed by the text data for that field’s value.

    : this is a test stream (1)
    
    
    event: userconnect (2)
    data: {"username": "bobby", "time": "02:33:48"}
    
    
    data: another message (3)
    data: with two lines (3)
    
    
    event: usermessage (2)
    data: {"username": "bobby", "time": "02:34:11", "text": "Hi everyone."}
    1 The first is just a comment, since it starts with a colon character.
    2 This sends custom named events.
    3 The third message contains a data field with the value "another message\nwith two lines". Note the newline special character in the value.

2.2. Sending events from the server

The server-side that sends events needs to respond using the MIME type text/event-stream. Each notification is sent as a block of text terminated by a pair of newlines. [9]

Here is the .NET code for the example:

app.UseCors(policy => policy.AllowAnyOrigin()); // builder.Services.AddCors();

var excerpts = new string[]
{
  "Notre vie ne se gaspille pas dans les grands événements, mais s'écoule dans les petites choses de tous les jours.",
  "La vie est faite de millions de détails insignifiants, et la mémoire est le fidèle enregistreur de ces détails, qui se réveillent à un moment inattendu et nous transportent dans le passé.",
  "En poursuivant ce qu'ils croient être le bonheur, les gens passent souvent à côté du vrai bonheur.",
  "Ce n'est qu'après avoir perdu quelque chose que nous commençons à chercher ce que nous avions et que nous n'avons pas chéri, et la mémoire devient alors le seul fil conducteur pour retrouver ces moments perdus."
};

app.Use(async (context, next) =>
{
    if (context.Request.Path == "/sse")
    {
        if (context.Request.Headers.Accept.Any(x => x != null && x.Contains("text/event-stream")))
        {
            context.Response.Headers.ContentType = "text/event-stream";
            context.Response.Headers.CacheControl = "no-cache";

            await context.Response.Body.WriteAsync(System.Text.Encoding.UTF8.GetBytes($"event: ping\ndata: pong!\n\n"));
            await context.Response.Body.FlushAsync();

            foreach (var excerpt in excerpts)
            {
                await context.Response.Body.WriteAsync(System.Text.Encoding.UTF8.GetBytes($"data: {excerpt}\n\n"));
                await context.Response.Body.FlushAsync();
            }

            // the stream terminated by a data: [DONE]
            await context.Response.Body.WriteAsync(System.Text.Encoding.UTF8.GetBytes("data: [DONE]\n\n"));
            await context.Response.Body.FlushAsync();
        }
        else
        {
            context.Response.StatusCode = StatusCodes.Status415UnsupportedMediaType;
        }
    }
    else
    {
        await next(context);
    }
});

app.Run();
$ curl -i localhost:5000/sse -H 'accept: text/event-stream'
HTTP/1.1 200 OK
Content-Type: text/event-stream
Date: Thu, 16 May 2024 11:14:26 GMT
Server: Kestrel
Cache-Control: no-cache
Transfer-Encoding: chunked

event: ping
data: pong!

data: Notre vie ne se gaspille pas dans les grands événements, mais s'écoule dans les petites choses de tous les jours.

data: La vie est faite de millions de détails insignifiants, et la mémoire est le fidèle enregistreur de ces détails, qui se réveillent à un moment inattendu et nous transportent dans le passé.

data: En poursuivant ce qu'ils croient être le bonheur, les gens passent souvent à côté du vrai bonheur.

data: Ce n'est qu'après avoir perdu quelque chose que nous commençons à chercher ce que nous avions et que nous n'avons pas chéri, et la mémoire devient alors le seul fil conducteur pour retrouver ces moments perdus.

data: [DONE]

2.3. Receiving events from the server

The server-sent event API is contained in the EventSource interface.

// Creating an EventSource instance
const evtSource = new EventSource('//localhost:5000/sse', {
  // withCredentials: true
})
// Listening for message events
evtSource.onmessage = (event) => {
  console.log(`message: ${event.data}`)
  // the stream terminated by a data: [DONE]
  if (event.data === '[DONE]') {
    // Closing event streams
    evtSource.close()
  }
}
// Listening for custom events
evtSource.addEventListener('ping', (event) => {
  console.log(`ping: ${event.data}`)
})
// Error handling
evtSource.onerror = (err) => {
  console.error('EventSource failed:', err)
}

3. Websocket, Server-Sent Events (SSE), and HTTP 206 Partial Content [Gemini]

While all three terms involve transferring data from a server to a client, they serve distinct purposes and operate in different contexts. This blog post clarifies the key distinctions to help you choose the right approach for your application.

  1. WebSockets: Real-time, Two-Way Communication

    Imagine a live chat conversation. WebSockets facilitate full-duplex communication, where both the server and client can send and receive messages in real-time. This allows for interactive experiences like chat applications, online games, or collaborative editing tools. A long-lived connection is established, enabling continuous data flow in both directions.

  2. Server-Sent Events (SSE): Server Pushes Updates

    Think of a live news feed. SSE allows the server to proactively push updates or data to connected clients. Clients cannot directly send messages back. This unidirectional approach is ideal for real-time updates where the server needs to broadcast information, like stock tickers, sensor readings, or live sports scores. Similarly to WebSockets, a long-lived connection is maintained between the server and client.

  3. HTTP 206 Partial Content: Downloading Large Files in Chunks

    Imagine downloading a large movie. HTTP 206 Partial Content is part of the standard HTTP protocol for handling partial downloads. The client requests a specific portion of a resource (e.g., a specific chunk of the movie file), the server sends only that part, and the connection closes. This is useful for downloading large files more efficiently, allowing for progress updates and potentially faster perceived download speeds.

Choosing the Right Tool:

The best approach depends on your application’s needs:

  • Real-time, two-way communication: Use WebSockets.

  • Server-side updates without client interaction: Use SSE.

  • Downloading large resources in chunks: Use HTTP 206 Partial Content.

By understanding these concepts, you can make informed decisions when designing real-time or download functionalities in your web applications.