1. Kubernetes Logging Architecture

Application logs can help you understand what is happening inside your application. The logs are particularly useful for debugging problems and monitoring cluster activity. Most modern applications have some kind of logging mechanism. Likewise, container engines are designed to support logging. The easiest and most adopted logging method for containerized applications is writing to standard output and standard error streams.

However, the native functionality provided by a container engine or runtime is usually not enough for a complete logging solution. For example, you may want to access your application's logs if a container crashes, a pod gets evicted, or a node dies. In a cluster, logs should have a separate storage and lifecycle independent of nodes, pods, or containers. This concept is called cluster-level logging.

Cluster-level logging architectures require a separate backend to store, analyze, and query logs. Kubernetes does not provide a native storage solution for log data. Instead, there are many logging solutions that integrate with Kubernetes.

1.1. Logging at the Node Level


A container engine handles and redirects any output generated to a containerized application’s stdout and stderr streams. For example, the Docker container engine redirects those two streams to a logging driver, which is configured in Kubernetes to write to a file in JSON format.

By default, if a container restarts, the kubelet keeps one terminated container with its logs. If a pod is evicted from the node, all corresponding containers are also evicted, along with their logs.

An important consideration in node-level logging is implementing log rotation so that logs don't consume all available storage on the node. Kubernetes itself is not responsible for rotating logs; a deployment tool should set up a solution to address that.

When using a CRI container runtime, the kubelet is responsible for rotating the logs and managing the logging directory structure. The kubelet sends this information to the CRI container runtime, and the runtime writes the container logs to the given location. The two kubelet parameters containerLogMaxSize and containerLogMaxFiles in the kubelet configuration file can be used to configure the maximum size of each log file and the maximum number of files allowed per container, respectively.
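As a sketch, these two parameters are set in the kubelet configuration file (a KubeletConfiguration object); the values shown here are the defaults:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Rotate a container's log file once it reaches this size.
containerLogMaxSize: "10Mi"
# Keep at most this many log files per container.
containerLogMaxFiles: 5
```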

When you run kubectl logs, the kubelet on the node handles the request and reads directly from the log file. The kubelet returns the content of the log file.

1.1.1. System component logs

There are two types of system components: those that run in a container and those that do not run in a container. For example:

  • The Kubernetes scheduler and kube-proxy run in a container.

  • The kubelet and container runtime do not run in containers.

On machines with systemd, the kubelet and container runtime write to journald. If systemd is not present, the kubelet and container runtime write to .log files in the /var/log directory. System components inside containers always write to the /var/log directory, bypassing the default logging mechanism.

1.2. Cluster-level logging architectures

While Kubernetes does not provide a native solution for cluster-level logging, there are several common approaches you can consider. Here are some options:

  • Use a node-level logging agent that runs on every node.

  • Include a dedicated sidecar container for logging in an application pod.

  • Push logs directly to a backend from within an application.


1.3. CRI: Log Management for Container Stdout/Stderr Streams

  • Logging in Kubernetes with Docker

    Docker supports various logging drivers (e.g., syslog, journal, and json-file) and allows users to configure the driver by passing flags to the Docker daemon at startup.

    Kubernetes defaults to the "json-file" logging driver, in which Docker writes the stdout/stderr streams to a file in JSON format, as shown below.

    {"log": "The actual log line", "stream": "stderr", "time": "2016-10-05T00:00:30.082640485Z"}

    In a production cluster, logs are usually collected, aggregated, and shipped to a remote store where advanced analysis/search/archiving functions are supported. In Kubernetes, the default cluster add-ons include a per-node log collection daemon, fluentd. To facilitate log collection, the kubelet creates symbolic links to all the Docker container logs under /var/log/containers, with pod and container metadata embedded in the filename.

    The fluentd daemon watches the /var/log/containers/ directory and extracts the metadata associated with the log from the path.

    Use crictl to determine the log path of containers:

    $ sudo crictl version
    Version:  0.1.0
    RuntimeName:  docker
    RuntimeVersion:  20.10.11
    RuntimeApiVersion:  1.41.0
    $ sudo crictl ps --state Running | head -n 2
    CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID
    5aa9ed1035b18       a4ca41631cc7a       About an hour ago   Running             coredns                   0                   9ea61ef06c670
    $ sudo crictl inspectp -o go-template --template '{{.status.metadata.name}}_{{.status.metadata.namespace}}' 9ea61ef06c670
    $ sudo crictl inspect -o go-template --template '{{.status.metadata.name}}-{{.status.id}}' 5aa9ed1035b18
    $ sudo readlink /var/log/containers/coredns-64897985d-6ps6n_kube-system_coredns-5aa9ed1035b1870f1c1551f4fcc4b195ca33ce0726109f3493a81508f315a087.log
    $ sudo crictl inspect -o go-template --template '{{.status.logPath}}' 5aa9ed1035b18
    $ sudo docker info -f '{{.LoggingDriver}}'
    $ sudo tail -n 1 /var/log/pods/kube-system_coredns-64897985d-6ps6n_fb974956-1f41-41f1-ba30-2658262cdbd2/coredns/0.log
    {"log":"linux/amd64, go1.17.1, 13a9191\n","stream":"stdout","time":"2022-01-07T05:37:18.356105709Z"}
  • Logging in Kubernetes with CRI-compliant Runtimes

    Kubelet will be configured with a root directory (e.g., /var/log/pods or /var/lib/kubelet/logs/) to store all container logs. Below is an example of a path to the log of a container in a pod:

    /var/log/pods/<podUID>/<containerName>_<instance#>.log

    In CRI, this is implemented by setting the pod-level log directory when creating the pod sandbox, and passing the relative container log path when creating a container.

    PodSandboxConfig.LogDirectory: /var/log/pods/<podUID>/
    ContainerConfig.LogPath: <containerName>_<instance#>.log

    The runtime should decorate each log entry with an RFC 3339Nano timestamp prefix, the stream type (i.e., "stdout" or "stderr"), the tags of the log entry, and the log content, ending with a newline.

    The tag field can support multiple tags, delimited by :. Currently, only one tag is defined in CRI to support multi-line log entries: partial or full. Partial (P) is used when a log entry is split into multiple lines by the runtime and the entry has not ended yet. Full (F) indicates that the log entry is complete: it is either a single-line entry, or the last line of a multi-line entry.

    For example,

    2016-10-06T00:17:09.669794202Z stdout F The content of the log entry 1
    2016-10-06T00:17:09.669794202Z stdout P First line of log entry 2
    2016-10-06T00:17:09.669794202Z stdout P Second line of the log entry 2
    2016-10-06T00:17:10.113242941Z stderr F Last line of the log entry 2

    Use crictl to determine the log path of containers:

    $ sudo crictl version
    Version:  0.1.0
    RuntimeName:  containerd
    RuntimeVersion:  v1.5.8
    RuntimeApiVersion:  v1alpha2
    $ sudo crictl ps --state Running | head -n 2
    CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID
    a140d889bac72       ae1a7201ec954       3 hours ago         Running             controller          0                   97db7329bd6f2
    $ sudo crictl inspectp -o go-template --template '{{.info.config.log_directory}}' 97db7329bd6f2
    $ sudo crictl inspect -o go-template --template '{{.info.config.log_path}}' a140d889bac72
    $ sudo crictl inspect -o go-template --template '{{.status.logPath}}' a140d889bac72
    $ sudo realpath /var/log/containers/ingress-nginx-controller-7dc8994d6f-w84bm_ingress-nginx_controller-a140d889bac72aeb8a94f706baca61d2a9f1a2490b4b8b546d7609108f9c0b92.log
    $ sudo tail -n 1 /var/log/pods/ingress-nginx_ingress-nginx-controller-7dc8994d6f-w84bm_f8a81dc8-5f3e-4e08-bcb7-46352b45e8e9/controller/0.log
    2022-01-07T14:00:57.629313444+08:00 stderr F I0107 06:00:57.629072       6 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"devtools", Name:"echo.onelinkplus.com", UID:"1d67a4a8-5465-4c10-b103-289ffc2cd1a7", APIVersion:"networking.k8s.io/v1", ResourceVersion:"6772918", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
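The CRI log format above lends itself to simple line-based parsing. Below is a minimal Python sketch (not the actual kubelet or runtime code) that reads such lines and reassembles partial (P) entries into full ones; joining split lines with a newline is a simplification of this sketch for readability:

```python
# Minimal sketch (not kubelet/runtime code) of parsing the CRI log format:
# "<RFC 3339Nano timestamp> <stream> <P|F> <content>".
def parse_cri_lines(lines):
    entries, buffered = [], None
    for line in lines:
        timestamp, stream, tag, content = line.split(" ", 3)
        # P (partial): the entry continues on a following line; buffer it.
        buffered = content if buffered is None else buffered + "\n" + content
        if tag == "F":  # F (full): the entry is complete
            entries.append((timestamp, stream, buffered))
            buffered = None
    return entries

# The four example lines from the text:
lines = [
    "2016-10-06T00:17:09.669794202Z stdout F The content of the log entry 1",
    "2016-10-06T00:17:09.669794202Z stdout P First line of log entry 2",
    "2016-10-06T00:17:09.669794202Z stdout P Second line of the log entry 2",
    "2016-10-06T00:17:10.113242941Z stderr F Last line of the log entry 2",
]
```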

2. What is Fluent Bit?

Fluent Bit is a Fast and Lightweight Logs and Metrics Processor and Forwarder for Linux, OSX, Windows and BSD family operating systems. It has been made with a strong focus on performance to allow the collection of events from different sources without complexity.

Fluent Bit is a CNCF sub-project under the umbrella of Fluentd; it is licensed under the terms of the Apache License v2.0. The project was originally created by Treasure Data and is currently a vendor-neutral and community-driven project.

2.1. Fluentd & Fluent Bit

Logging and data processing in general can be complex, and even more so at scale; that is why Fluentd was born.

Fluentd has become more than a simple tool, it has grown into a fullscale ecosystem that contains SDKs for different languages and sub-projects like Fluent Bit.

Both projects share a lot of similarities; Fluent Bit is fully designed and built on top of the best ideas of Fluentd's architecture and general design. Choosing which one to use depends on the end-user needs.

The following table describes a comparison in different areas of the projects:

                  Fluentd                                  Fluent Bit

    Scope         Containers / Servers                     Embedded Linux / Containers / Servers

    Language      C & Ruby                                 C

    Performance   High Performance                         High Performance

    Dependencies  Built as a Ruby Gem, it requires a       Zero dependencies, unless some
                  certain number of gems.                  special plugin requires them.

    Plugins       More than 1000 plugins available         Around 70 plugins available

    License       Apache License v2.0                      Apache License v2.0

Both Fluentd and Fluent Bit can work as Aggregators or Forwarders; they can complement each other or be used as standalone solutions.

2.2. Key Concepts

  • Event or Record

    Every incoming piece of data that belongs to a log or a metric that is retrieved by Fluent Bit is considered an Event or a Record.

    As an example consider the following content of a Syslog file:

    Jan 18 12:52:16 flb systemd[2222]: Starting GNOME Terminal Server
    Jan 18 12:52:16 flb dbus-daemon[2243]: [session uid=1000 pid=2243] Successfully activated service 'org.gnome.Terminal'
    Jan 18 12:52:16 flb systemd[2222]: Started GNOME Terminal Server.
    Jan 18 12:52:16 flb gsd-media-keys[2640]: # watch_fast: "/org/gnome/terminal/legacy/" (establishing: 0, active: 0)

    It contains four lines, and each of them represents an independent Event.

    Internally, an Event always has two components, in an array form: the Timestamp and the Message.

  • Filtering

    In some cases it is required to perform modifications on the Events content; the process of altering, enriching or dropping Events is called Filtering.

    There are many use cases when Filtering is required, for example:

    • Append specific information to the Event, such as an IP address or metadata.

    • Select a specific piece of the Event content.

    • Drop Events that match a certain pattern.

  • Tag

    Every Event that gets into Fluent Bit is assigned a Tag. This Tag is an internal string that is used at a later stage by the Router to decide which Filter or Output phase it must go through.

    Most of the Tags are assigned manually in the configuration. If a Tag is not specified, Fluent Bit assigns the name of the Input plugin instance where that Event was generated.

  • Timestamp

    The Timestamp represents the time when an Event was created. Every Event contains an associated Timestamp. The Timestamp is a numeric fractional integer in the format SECONDS.NANOSECONDS:

    • SECONDS: the number of seconds that have elapsed since the Unix epoch.

    • NANOSECONDS: the fractional second, or one thousand-millionth of a second.

  • Match

    Fluent Bit allows you to deliver your collected and processed Events to one or multiple destinations; this is done through a routing phase. A Match represents a rule that selects Events whose Tag matches a defined pattern.

  • Structured Messages

    Source events may or may not have a structure. A structure defines a set of keys and values inside the Event message. As an example, consider the following two messages:

    • Unstructured Message

      "Project Fluent Bit created on 1398289291"
    • Structured Message

      {"project": "Fluent Bit", "created": 1398289291}

    At a low level both are just an array of bytes, but the structured message defines keys and values; having a structure helps to implement faster operations on data modifications.
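The Event structure described above can be modeled in a small sketch (illustrative names, not Fluent Bit's internal API): an Event is a two-element array of Timestamp and Message, and the Tag travels alongside it as routing metadata.

```python
import time

# Illustrative sketch: an Event is a two-element array [TIMESTAMP, MESSAGE];
# the Tag is kept next to it for the Router (names here are ours,
# not Fluent Bit's internal API).
def make_event(message, tag):
    return {"tag": tag, "event": [time.time(), message]}

structured = {"project": "Fluent Bit", "created": 1398289291}
e = make_event(structured, tag="app.logs")
```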

2.3. Data Pipeline

  • Input

    Fluent Bit provides different Input Plugins to gather information from different sources, some of them just collect data from log files while others can gather metrics information from the operating system. There are many plugins for different needs.


    When an input plugin is loaded, an internal instance is created. Every instance has its own independent configuration. Configuration keys are often called properties.

  • Parser

    Dealing with raw strings or unstructured messages is a constant pain; having a structure is highly desired. Ideally we want to set a structure on the incoming data as soon as it is collected by the Input Plugins:


    The Parser allows you to convert from unstructured to structured data. As a demonstrative example, consider the following Apache (HTTP Server) log entry:

    - - [28/Jul/2006:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395

    The above log line is a raw string without format; ideally we would like to give it a structure that can be processed easily later. If the proper configuration is used, the log entry could be converted to:

      {
        "host":    "",
        "user":    "-",
        "method":  "GET",
        "path":    "/cgi-bin/try/",
        "code":    "200",
        "size":    "3395",
        "referer": "",
        "agent":   ""
      }
  • Filter

    In production environments we want to have full control of the data we are collecting, filtering is an important feature that allows us to alter the data before delivering it to some destination.


    Filtering is implemented through plugins, so each filter available could be used to match, exclude or enrich your logs with some specific metadata.

  • Buffer

    The buffer phase in the pipeline aims to provide a unified and persistent mechanism to store your data, either using the primary in-memory model or using the filesystem based mode.

    The buffer phase already contains the data in an immutable state, meaning that no other filter can be applied at this stage.


    Fluent Bit offers a buffering mechanism in the file system that acts as a backup system to avoid data loss in case of system failures.

  • Router

    Routing is a core feature that allows you to route your data through Filters and finally to one or multiple destinations. The router relies on the concept of Tags and Matching rules.


    When the data is generated by the input plugins, it comes with a Tag (most of the time the Tag is configured manually). The Tag is a human-readable indicator that helps to identify the data source.

    In order to define where the data should be routed, a Match rule must be specified in the output configuration.

    Consider the following configuration example that aims to deliver CPU metrics to an Elasticsearch database and Memory metrics to the standard output interface:

    [INPUT]
        name cpu
        tag  my_cpu

    [INPUT]
        name mem
        tag  my_mem

    [OUTPUT]
        name   es
        match  my_cpu

    [OUTPUT]
        name   stdout
        match  my_mem

    Routing works automatically by reading the Input Tags and the Output Match rules. If some data has a Tag that does not match at routing time, the data is deleted.

    Routing is flexible enough to support wildcards in the Match pattern. The example below defines a common destination for both sources of data:

    [INPUT]
        name cpu
        tag  my_cpu

    [INPUT]
        name mem
        tag  my_mem

    [OUTPUT]
        name   stdout
        match  my_*

    The match rule is set to my_* which means it will match any Tag that starts with my_.

  • Output

    The output interface allows us to define destinations for the data. Common destinations are remote services, the local file system, or a standard interface. Outputs are implemented as plugins and there are many available.


    When an output plugin is loaded, an internal instance is created. Every instance has its own independent configuration. Configuration keys are often called properties.
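As an illustration of the routing rules described above (a sketch, not Fluent Bit internals), Tag matching behaves like shell-style wildcards: an event is delivered to every output whose Match pattern matches its Tag, and dropped when nothing matches.

```python
from fnmatch import fnmatchcase

# Sketch of Tag-based routing (not Fluent Bit internals): an event is
# delivered to every output whose Match pattern matches its Tag, and is
# dropped when no pattern matches.
outputs = [
    {"name": "es",     "match": "my_cpu"},
    {"name": "stdout", "match": "my_mem"},
]

def route(tag):
    return [o["name"] for o in outputs if fnmatchcase(tag, o["match"])]
```

With a single wildcard output such as {"name": "stdout", "match": "my_*"}, both my_cpu and my_mem events would be delivered to the same destination.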

2.4. Configuring Fluent Bit

Fluent Bit can optionally use a configuration file to define how the service will behave.

A simple example of a configuration file is as follows:

    [SERVICE]
        # This is a commented line
        daemon    off
        log_level debug

The configuration schema is defined by three concepts:

  • Sections

    A section is defined by a name or title inside brackets. Looking at the example above, a Service section has been set using [SERVICE] definition. Section rules:

    • All section content must be indented (4 spaces ideally).

    • Multiple sections can exist on the same file.

    • A section is expected to have comments and entries; it cannot be empty.

    • Any commented line under a section must be indented too.

  • Entries: Key/Value

    A section may contain Entries; an entry is defined by a line of text that contains a Key and a Value. Using the above example, the [SERVICE] section contains two entries: one is the key daemon with value off, and the other is the key log_level with the value debug. Entries rules:

    • An entry is defined by a key and a value.

    • A key must be indented.

    • A key must contain a value which ends at the line break.

    • Multiple keys with the same name can exist.

      Commented lines are set by prefixing the # character; those lines are not processed, but they must be indented too.

  • Indented Configuration Mode

    Fluent Bit configuration files are based on a strict Indented Mode; each configuration file must follow the same pattern of alignment from left to right when writing text. By default, an indentation level of four spaces from left to right is suggested.

One of the ways to configure Fluent Bit is using a main configuration file. The main configuration file supports four types of sections: Service, Input, Filter, Output. In addition, it’s also possible to split the main configuration file in multiple files using the feature to include external files: Include File.

The following configuration file example demonstrates how to collect CPU metrics and flush the results every five seconds to the standard output:

    [SERVICE]
        flush     5
        daemon    off
        log_level debug

    [INPUT]
        name  cpu
        tag   my_cpu

    [OUTPUT]
        name  stdout
        match my*cpu

To avoid complicated, long configuration files, it is better to split specific parts into different files and include them from one main file.

Starting from Fluent Bit 0.12 the new configuration command @INCLUDE has been added and can be used in the following way:

@INCLUDE somefile.conf

The configuration reader will try to open the path somefile.conf; if it is not found, it will assume it is a path relative to the base configuration file.

The @INCLUDE command only works at the top-left level of the configuration; it cannot be used inside sections.

The wildcard character (*) is supported to include multiple files, e.g.:

@INCLUDE input_*.conf

Fluent Bit supports the usage of environment variables in any value associated to a key when using a configuration file.

The variables are case sensitive and can be used in the following format:

${MY_VARIABLE}

When Fluent Bit starts, the configuration reader will detect any request for ${MY_VARIABLE} and will try to resolve its value.
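For example, assuming an environment variable named MY_OUTPUT has been exported before startup (the variable name here is illustrative), it could select the output plugin:

```
[SERVICE]
    flush 1

[OUTPUT]
    name  ${MY_OUTPUT}
    match *
```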

2.5. Parsers

Parsers are an important component of Fluent Bit: with them you can take any unstructured log entry and give it a structure that makes it easier to process and filter.

The parser engine is fully configurable and can process log entries based on two types of format: JSON and Regular Expressions.

By default, Fluent Bit provides a set of pre-configured parsers that can be used for different use cases such as logs from:

  • Apache

  • Nginx

  • Docker

  • Syslog rfc5424

  • Syslog rfc3164

Parsers are defined in one or multiple configuration files that are loaded at start time, either from the command line or through the main Fluent Bit configuration file.

Note: If you are using Regular Expressions, note that Fluent Bit uses Ruby-based regular expressions, and we encourage you to use the Rubular web site as an online editor to test them.

Multiple parsers can be defined, and each section has its own properties. The following table describes the available options for each parser definition:

Key          Description

name         Set a unique name for the parser in question.

format       Specify the format of the parser. The available options are: json, regex, ltsv or logfmt.

regex        If format is regex, this option must be set, specifying the Ruby Regular Expression that will be used to parse and compose the structured message.

time_key     If the log entry provides a field with a timestamp, this option specifies the name of that field.

time_format  Specify the format of the time field so it can be recognized and analyzed properly. Fluent Bit uses strptime(3) to parse time, so you can refer to the strptime documentation for the available modifiers.

time_offset  Specify a fixed UTC time offset (e.g. -0600, +0200, etc.) for local dates.

types        Specify the data types of parsed fields. The syntax is types <field_name_1>:<type_name_1> <field_name_2>:<type_name_2> …​. The supported types are string (default), integer, bool, float and hex. The option is supported by the ltsv, logfmt and regex formats.

decode_field Decode a field value; the only decoder available is json. The syntax is: decode_field json <field_name>.

All parsers must be defined in a parsers.conf file, not in the Fluent Bit global configuration file. The parsers file exposes all the available parsers that can be used by the Input plugins that are aware of this feature.

  • JSON Parser

    The JSON parser is the simplest option: if the original log source is a JSON map string, it will take its structure and convert it directly to the internal binary representation.

    A simple configuration that can be found in the default parsers configuration file is the entry to parse Docker log files (when the tail input plugin is used):

    [PARSER]
        name        docker
        format      json
        time_key    time
        time_format %Y-%m-%dT%H:%M:%S %z

    The following log entry is a valid content for the parser defined above:

    {"key1": 12345, "key2": "abc", "time": "2006-07-28T13:22:04Z"}

    After processing, its internal representation will be:

    [1154103724, {"key1"=>12345, "key2"=>"abc"}]

    The time has been converted to a Unix timestamp (UTC) and the map reduced to the remaining components of the original message.

  • Regex Parser

    The regex parser allows you to define a custom Ruby Regular Expression that uses the named capture feature to define which content belongs to which key name.

    The following parser configuration example aims to provide rules that can be applied to a Nginx combined access log entry:

    [PARSER]
        name   nginx
        format regex
        # log_format combined '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ''"$http_referer" "$http_user_agent"';
        regex ^(?<remote_addr>[^ ]+) - (?<remote_user>[^ ]+) \[(?<time>[^\]]+)\] "(?<method>\w+) (?<path>[^ ]+) (?<proto>[^"]+)" (?<status>\d+) (?<body_byte_sent>\d+) "(?<referer>[^"]+)" "(?<user_agent>[^"]+)"$
        time_key time
        time_format %d/%b/%Y:%H:%M:%S %z
        types status:integer body_byte_sent:integer

    As an example, take the following Nginx access log entry:

    - - [12/Jan/2022:08:24:28 +0000] "GET / HTTP/1.1" 200 615 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"

    The above content does not provide a defined structure for Fluent Bit, but by enabling the proper parser we can make a structured representation of it:

        "user_agent"=>"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"

    A common pitfall is that you cannot use characters other than letters, numbers and underscores in group names. For example, a group name like (?<user-name>.*) will cause an error because it contains an invalid character (-).
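The regex parser behavior can be approximated in Python (Ruby's named groups (?<name>...) become (?P<name>...) in Python). The client address 192.0.2.1 below is a placeholder, since the sample entry in the text elides it:

```python
import re

# Python transcription of the document's Nginx "combined" parser regex
# (Ruby's (?<name>...) groups become (?P<name>...) in Python).
NGINX_RE = re.compile(
    r'^(?P<remote_addr>[^ ]+) - (?P<remote_user>[^ ]+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>[^ ]+) (?P<proto>[^"]+)" '
    r'(?P<status>\d+) (?P<body_byte_sent>\d+) '
    r'"(?P<referer>[^"]+)" "(?P<user_agent>[^"]+)"$'
)

# Mimics the "types status:integer body_byte_sent:integer" option.
TYPES = {"status": int, "body_byte_sent": int}

def parse(line):
    m = NGINX_RE.match(line)
    if m is None:
        return None
    record = m.groupdict()
    for key, cast in TYPES.items():
        record[key] = cast(record[key])
    return record

# 192.0.2.1 is a placeholder client address (the sample entry elides it).
line = ('192.0.2.1 - - [12/Jan/2022:08:24:28 +0000] "GET / HTTP/1.1" 200 615 '
        '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) '
        'Gecko/20100101 Firefox/95.0"')
record = parse(line)
```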

2.6. Parser Filter

The Parser Filter plugin allows for parsing fields in event records.

The plugin supports the following configuration parameters:

Key           Description

key_name      Specify the field name in the record to parse.

parser        Specify the parser name to interpret the field. Multiple parser entries are allowed (one per line).

preserve_key  Keep the original key_name field in the parsed result. If false, the field is removed. (default: false)

reserve_data  Keep all other original fields in the parsed result. If false, all other original fields are removed. (default: false)

unescape_key  If the key is an escaped string (e.g. stringified JSON), unescape the string before applying the parser. (default: false)

This is an example of parsing a record {"data":"100 0.5 true This is example"}.

The plugin needs a parser file which defines how to parse each field.

    [PARSER]
        name dummy_test
        format regex
        regex ^(?<INT>[^ ]+) (?<FLOAT>[^ ]+) (?<BOOL>[^ ]+) (?<STRING>.+)$

The path of the parser file should be written in the configuration file under the [SERVICE] section.

    [SERVICE]
        parsers_file parsers.conf

    [INPUT]
        name dummy
        tag  dummy.data
        dummy {"data":"100 0.5 true This is example"}
        samples 3

    [FILTER]
        name parser
        match dummy.*
        key_name data
        parser dummy_test

    [OUTPUT]
        name stdout
        match *

The raw output before applying the parser filter is:

$ docker run --rm \
    fluent/fluent-bit:1.8 \
    /fluent-bit/bin/fluent-bit -q \
    -i dummy \
    -p 'tag=dummy.data' \
    -p 'samples=3' \
    -p 'dummy={"data":"100 0.5 true This is example"}' \
    -o stdout

[0] dummy.data: [1641963560.833349997, {"data"=>"100 0.5 true This is example"}]
[1] dummy.data: [1641963561.834293264, {"data"=>"100 0.5 true This is example"}]
[2] dummy.data: [1641963562.834409396, {"data"=>"100 0.5 true This is example"}]

The output after applying the parser filter is:

$ docker run --rm \
    -v $PWD/etc:/etc/fluent-bit \
    fluent/fluent-bit:1.8 \
    /fluent-bit/bin/fluent-bit -q \
    -c /etc/fluent-bit/fluent-bit.conf

[0] dummy.data: [1641970270.834847487, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]
[1] dummy.data: [1641970271.833919275, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]
[2] dummy.data: [1641970272.834001854, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]

By default, the parser plugin only keeps the parsed fields in its output.

If you enable Reserve_Data, all other fields are preserved:

    [PARSER]
        name dummy_test
        format regex
        regex ^(?<INT>[^ ]+) (?<FLOAT>[^ ]+) (?<BOOL>[^ ]+) (?<STRING>.+)$

    [SERVICE]
        parsers_file parsers.conf

    [INPUT]
        name dummy
        tag  dummy.data
        dummy {"data":"100 0.5 true This is example", "key1":"value1", "key2":"value2"}
        samples 3

    [FILTER]
        name parser
        match dummy.*
        key_name data
        parser dummy_test
        reserve_data on

    [OUTPUT]
        name stdout
        match *

This will produce the output:

$ docker run --rm \
    -v $PWD/etc2:/etc/fluent-bit \
    fluent/fluent-bit:1.8 \
    /fluent-bit/bin/fluent-bit -q \
    -c /etc/fluent-bit/fluent-bit.conf

[0] dummy.data: [1641971163.834882081, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "key1"=>"value1", "key2"=>"value2"}]
[1] dummy.data: [1641971164.834110226, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "key1"=>"value1", "key2"=>"value2"}]
[2] dummy.data: [1641971165.833051479, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "key1"=>"value1", "key2"=>"value2"}]

If you enable Reserve_Data and Preserve_Key, the original key field will be preserved as well:

    [PARSER]
        name dummy_test
        format regex
        regex ^(?<INT>[^ ]+) (?<FLOAT>[^ ]+) (?<BOOL>[^ ]+) (?<STRING>.+)$

    [SERVICE]
        parsers_file parsers.conf

    [INPUT]
        name dummy
        tag  dummy.data
        dummy {"data":"100 0.5 true This is example", "key1":"value1", "key2":"value2"}
        samples 3

    [FILTER]
        name parser
        match dummy.*
        key_name data
        parser dummy_test
        reserve_data on
        preserve_key on

    [OUTPUT]
        name stdout
        match *

This will produce the output:

$ docker run --rm \
    -v $PWD/etc3:/etc/fluent-bit \
    fluent/fluent-bit:1.8 \
    /fluent-bit/bin/fluent-bit -q  \
    -c /etc/fluent-bit/fluent-bit.conf

[0] dummy.data: [1641971438.833271871, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "data"=>"100 0.5 true This is example", "key1"=>"value1", "key2"=>"value2"}]
[1] dummy.data: [1641971439.834690742, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "data"=>"100 0.5 true This is example", "key1"=>"value1", "key2"=>"value2"}]
[2] dummy.data: [1641971440.834035007, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "data"=>"100 0.5 true This is example", "key1"=>"value1", "key2"=>"value2"}]
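The reserve_data / preserve_key semantics shown in the outputs above can be summarized in a small Python sketch (illustrative, not the plugin's actual code):

```python
import re

# The dummy_test parser regex from the text.
DUMMY_RE = re.compile(r'^(?P<INT>[^ ]+) (?P<FLOAT>[^ ]+) (?P<BOOL>[^ ]+) (?P<STRING>.+)$')

# Sketch of the Parser Filter semantics (not the plugin's actual code).
def parser_filter(record, key_name, reserve_data=False, preserve_key=False):
    m = DUMMY_RE.match(record.get(key_name, ""))
    if m is None:
        return record                      # unparsable records pass through
    parsed = m.groupdict()                 # by default only parsed fields remain
    if reserve_data:                       # keep the other original fields
        parsed.update({k: v for k, v in record.items() if k != key_name})
    if preserve_key:                       # keep the original key_name field too
        parsed[key_name] = record[key_name]
    return parsed
```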

2.7. Docker

Fluent Bit container images are available on Docker Hub, ready for production usage. The currently available images can be deployed on multiple architectures.

The following simple test makes Fluent Bit measure CPU usage inside the container:

$ docker run -ti fluent/fluent-bit:1.8 /fluent-bit/bin/fluent-bit -i cpu -o stdout -f 1

That command will make Fluent Bit measure CPU usage every second and flush the results to the standard output:

Fluent Bit v1.8.11
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/01/07 05:02:04] [ info] [engine] started (pid=1)
[2022/01/07 05:02:04] [ info] [storage] version=1.1.5, initializing...
[2022/01/07 05:02:04] [ info] [storage] in-memory
[2022/01/07 05:02:04] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/01/07 05:02:04] [ info] [cmetrics] version=0.2.2
[2022/01/07 05:02:04] [ info] [sp] stream processor started
[0] cpu.0: [1641531724.834023688, {"cpu_p"=>1.750000, "user_p"=>0.500000, "system_p"=>1.250000, "cpu0.p_cpu"=>2.000000, "cpu0.p_user"=>1.000000, "cpu0.p_system"=>1.000000, "cpu1.p_cpu"=>1.000000, "cpu1.p_user"=>0.000000, "cpu1.p_system"=>1.000000, "cpu2.p_cpu"=>0.000000, "cpu2.p_user"=>0.000000, "cpu2.p_system"=>0.000000, "cpu3.p_cpu"=>4.000000, "cpu3.p_user"=>1.000000, "cpu3.p_system"=>3.000000}]

2.8. Kubernetes

Fluent Bit is a lightweight and extensible Log Processor that comes with full support for Kubernetes:

  • Process Kubernetes containers logs from the file system or Systemd/Journald.

  • Enrich logs with Kubernetes Metadata.

  • Centralize your logs in third party storage services like Elasticsearch, InfluxDB, HTTP, etc.

Kubernetes manages a cluster of nodes, so our log agent tool needs to run on every node to collect logs from every Pod; hence Fluent Bit is deployed as a DaemonSet (a Pod that runs on every node of the cluster).

When Fluent Bit runs, it will read, parse and filter the logs of every Pod and enrich each entry with the following information (metadata):

  • Pod Name

  • Pod ID

  • Container Name

  • Container ID

  • Labels

  • Annotations

To obtain this information, a built-in filter plugin called kubernetes talks to the Kubernetes API Server to retrieve relevant information such as the pod_id, labels and annotations; other fields such as pod_name, container_id and container_name are retrieved locally from the log file names. All of this is handled automatically; no intervention is required from a configuration aspect.
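As an illustration (a sketch, not the plugin's implementation), the locally derived fields can be recovered from the /var/log/containers file name layout shown in the symlink examples earlier, <pod_name>_<namespace>_<container_name>-<container_id>.log:

```python
import re

# File name layout under /var/log/containers, as seen in the symlink
# examples earlier: <pod_name>_<namespace>_<container_name>-<container_id>.log
LOGFILE_RE = re.compile(
    r'^(?P<pod_name>[^_]+)_(?P<namespace>[^_]+)_'
    r'(?P<container_name>.+)-(?P<container_id>[0-9a-f]{64})\.log$'
)

def metadata_from_filename(name):
    m = LOGFILE_RE.match(name)
    return m.groupdict() if m else None
```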

Kubernetes Filter depends on either the Tail or Systemd input plugins to process and enrich records with Kubernetes metadata. Here we will explain the workflow of Tail and how its configuration is correlated with the Kubernetes filter. Consider the following configuration example (for demo purposes only, not production-ready):

    [INPUT]
        name    tail
        tag     kube.*
        path    /var/log/containers/*.log
        parser  docker

    [FILTER]
        name             kubernetes
        match            kube.*
        kube_url         https://kubernetes.default.svc:443
        kube_ca_file     /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        kube_token_file  /var/run/secrets/kubernetes.io/serviceaccount/token
        kube_tag_prefix  kube.var.log.containers.
        merge_log        on
        merge_log_key    log_processed
  • Systemd

    The systemd input plugin allows you to collect log messages from the Journald daemon on Linux environments.

        [INPUT]
            name            systemd
            tag             host.*
            systemd_filter  _SYSTEMD_UNIT=docker.service
  • Tail

    The tail input plugin allows you to monitor one or several text files. It behaves similarly to the tail -f shell command.

    The plugin reads every file matched by the Path pattern, and for every new line found it generates a new record. Optionally, a database file can be used so the plugin can keep a history of tracked files and their offsets; this is very useful to resume the state if the service is restarted.

    If you are running Fluent Bit to process logs coming from containers like Docker or CRI, you can use the built-in modes for such purposes. This will help to reassemble multiline messages originally split by Docker or CRI:

        [INPUT]
            name              tail
            path              /var/log/containers/*.log
            # exclude_path      /var/log/containers/*_logging_*.log,/var/log/containers/*_default*.log
            multiline.parser  docker, cri

    The two options separated by a comma mean multi-format: try the docker and cri multiline formats.