Kubernetes Logging with Fluent Bit
1. Kubernetes Logging Architecture
Application logs can help you understand what is happening inside your application. The logs are particularly useful for debugging problems and monitoring cluster activity. Most modern applications have some kind of logging mechanism. Likewise, container engines are designed to support logging. The easiest and most adopted logging method for containerized applications is writing to standard output and standard error streams.
However, the native functionality provided by a container engine or runtime is usually not enough for a complete logging solution. For example, you may want to access your application’s logs if a container crashes, a pod gets evicted, or a node dies. In a cluster, logs should have a separate storage and lifecycle independent of nodes, pods, or containers. This concept is called cluster-level logging.
Cluster-level logging architectures require a separate backend to store, analyze, and query logs. Kubernetes does not provide a native storage solution for log data. Instead, there are many logging solutions that integrate with Kubernetes.
1.1. Logging at the Node Level

A container engine handles and redirects any output generated to a containerized application’s stdout and stderr streams. For example, the Docker container engine redirects those two streams to a logging driver, which is configured in Kubernetes to write to a file in JSON format.
By default, if a container restarts, the kubelet keeps one terminated container with its logs. If a pod is evicted from the node, all corresponding containers are also evicted, along with their logs.
An important consideration in node-level logging is implementing log rotation, so that logs don’t consume all available storage on the node. Kubernetes is not responsible for rotating logs, but rather a deployment tool should set up a solution to address that.
When using a CRI container runtime, the kubelet is responsible for rotating the logs and managing the logging directory structure. The kubelet sends this information to the CRI container runtime and the runtime writes the container logs to the given location. The two kubelet parameters containerLogMaxSize and containerLogMaxFiles in the kubelet config file can be used to configure the maximum size for each log file and the maximum number of files allowed for each container respectively.
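As a sketch, both parameters could be set in a KubeletConfiguration file like the following (the values shown are illustrative, not recommendations):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 10Mi     # rotate a container's log file once it reaches 10 MiB
containerLogMaxFiles: 5       # keep at most 5 log files per container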
When you run kubectl logs, the kubelet on the node handles the request and reads directly from the log file. The kubelet returns the content of the log file.
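For example (the pod and container names below are placeholders):
$ kubectl logs <pod-name> -c <container-name>     # logs of a specific container in the pod
$ kubectl logs <pod-name> --previous              # logs of the previous, terminated container instance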
1.1.1. System component logs
There are two types of system components: those that run in a container and those that do not run in a container. For example:
-
The Kubernetes scheduler and kube-proxy run in a container.
-
The kubelet and container runtime do not run in containers.
On machines with systemd, the kubelet and container runtime write to journald. If systemd is not present, the kubelet and container runtime write to .log files in the /var/log directory. System components inside containers always write to the /var/log directory, bypassing the default logging mechanism.
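On a systemd host those logs can be read with journalctl, for example (assuming the kubelet and containerd run as the systemd units kubelet and containerd):
$ journalctl -u kubelet --since "1 hour ago"
$ journalctl -u containerd -f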
1.2. Cluster-level logging architectures
While Kubernetes does not provide a native solution for cluster-level logging, there are several common approaches you can consider. Here are some options:
-
Use a node-level logging agent that runs on every node.
-
Include a dedicated sidecar container for logging in an application pod.
-
Push logs directly to a backend from within an application.
1.3. CRI: Log Management for Container Stdout/Stderr Streams
-
Logging in Kubernetes with Docker
Docker supports various logging drivers (e.g., syslog, journal, and json-file), and allows users to configure the driver by passing flags to the Docker daemon at startup.
Kubernetes defaults to the "json-file" logging driver, in which Docker writes the stdout/stderr streams to a file in JSON format as shown below.
{"log": "The actual log line", "stream": "stderr", "time": "2016-10-05T00:00:30.082640485Z"}
In a production cluster, logs are usually collected, aggregated, and shipped to a remote store where advanced analysis/search/archiving functions are supported. In Kubernetes, the default cluster add-ons include a per-node log collection daemon, fluentd. To facilitate log collection, the kubelet creates symbolic links to all the Docker container logs under /var/log/containers with pod and container metadata embedded in the filename:
/var/log/containers/<pod_name>_<pod_namespace>_<container_name>-<container_id>.log
The fluentd daemon watches the /var/log/containers/ directory and extracts the metadata associated with the log from the path. Use crictl to determine the log path of containers:
$ sudo crictl version
Version:  0.1.0
RuntimeName:  docker
RuntimeVersion:  20.10.11
RuntimeApiVersion:  1.41.0
$ sudo crictl ps --state Running | head -n 2
CONTAINER           IMAGE               CREATED             STATE     NAME      ATTEMPT   POD ID
5aa9ed1035b18       a4ca41631cc7a       About an hour ago   Running   coredns   0         9ea61ef06c670
$ sudo crictl inspectp -o go-template --template '{{.status.metadata.name}}_{{.status.metadata.namespace}}' 9ea61ef06c670
coredns-64897985d-6ps6n_kube-system
$ sudo crictl inspect -o go-template --template '{{.status.metadata.name}}-{{.status.id}}' 5aa9ed1035b18
coredns-5aa9ed1035b1870f1c1551f4fcc4b195ca33ce0726109f3493a81508f315a087
$ sudo readlink /var/log/containers/coredns-64897985d-6ps6n_kube-system_coredns-5aa9ed1035b1870f1c1551f4fcc4b195ca33ce0726109f3493a81508f315a087.log
/var/log/pods/kube-system_coredns-64897985d-6ps6n_fb974956-1f41-41f1-ba30-2658262cdbd2/coredns/0.log
$ sudo crictl inspect -o go-template --template '{{.status.logPath}}' 5aa9ed1035b18
/var/log/pods/kube-system_coredns-64897985d-6ps6n_fb974956-1f41-41f1-ba30-2658262cdbd2/coredns/0.log
$ sudo docker info -f '{{.LoggingDriver}}'
json-file
$ sudo tail -n 1 /var/log/pods/kube-system_coredns-64897985d-6ps6n_fb974956-1f41-41f1-ba30-2658262cdbd2/coredns/0.log
{"log":"linux/amd64, go1.17.1, 13a9191\n","stream":"stdout","time":"2022-01-07T05:37:18.356105709Z"}
-
Logging in Kubernetes with CRI-compliant Runtimes
The kubelet will be configured with a root directory (e.g., /var/log/pods or /var/lib/kubelet/logs/) to store all container logs. Below is an example of a path to the log of a container in a pod:
/var/log/pods/<podUID>/<containerName>_<instance#>.log
In CRI, this is implemented by setting the pod-level log directory when creating the pod sandbox, and passing the relative container log path when creating a container.
PodSandboxConfig.LogDirectory: /var/log/pods/<podUID>/
ContainerConfig.LogPath: <containerName>_<instance#>.log
The runtime should decorate each log entry with an RFC 3339Nano timestamp prefix, the stream type (i.e., "stdout" or "stderr"), the tags of the log entry, and the log content ending with a newline.
The tags field can support multiple tags, delimited by :. Currently, only one tag is defined in CRI to support multi-line log entries: partial or full. Partial (P) is used when a log entry is split into multiple lines by the runtime and the entry has not ended yet. Full (F) indicates that the log entry is completed; it is either a single-line entry or the last line of a multi-line entry. For example:
2016-10-06T00:17:09.669794202Z stdout F The content of the log entry 1
2016-10-06T00:17:09.669794202Z stdout P First line of log entry 2
2016-10-06T00:17:09.669794202Z stdout P Second line of the log entry 2
2016-10-06T00:17:10.113242941Z stderr F Last line of the log entry 2
Use crictl to determine the log path of containers:
$ sudo crictl version
Version:  0.1.0
RuntimeName:  containerd
RuntimeVersion:  v1.5.8
RuntimeApiVersion:  v1alpha2
$ sudo crictl ps --state Running | head -n 2
CONTAINER           IMAGE               CREATED       STATE     NAME         ATTEMPT   POD ID
a140d889bac72       ae1a7201ec954       3 hours ago   Running   controller   0         97db7329bd6f2
$ sudo crictl inspectp -o go-template --template '{{.info.config.log_directory}}' 97db7329bd6f2
/var/log/pods/ingress-nginx_ingress-nginx-controller-7dc8994d6f-w84bm_f8a81dc8-5f3e-4e08-bcb7-46352b45e8e9
$ sudo crictl inspect -o go-template --template '{{.info.config.log_path}}' a140d889bac72
controller/0.log
$ sudo crictl inspect -o go-template --template '{{.status.logPath}}' a140d889bac72
/var/log/pods/ingress-nginx_ingress-nginx-controller-7dc8994d6f-w84bm_f8a81dc8-5f3e-4e08-bcb7-46352b45e8e9/controller/0.log
$ sudo realpath /var/log/containers/ingress-nginx-controller-7dc8994d6f-w84bm_ingress-nginx_controller-a140d889bac72aeb8a94f706baca61d2a9f1a2490b4b8b546d7609108f9c0b92.log
/var/log/pods/ingress-nginx_ingress-nginx-controller-7dc8994d6f-w84bm_f8a81dc8-5f3e-4e08-bcb7-46352b45e8e9/controller/0.log
$ sudo tail -n 1 /var/log/pods/ingress-nginx_ingress-nginx-controller-7dc8994d6f-w84bm_f8a81dc8-5f3e-4e08-bcb7-46352b45e8e9/controller/0.log
2022-01-07T14:00:57.629313444+08:00 stderr F I0107 06:00:57.629072       6 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"devtools", Name:"echo.onelinkplus.com", UID:"1d67a4a8-5465-4c10-b103-289ffc2cd1a7", APIVersion:"networking.k8s.io/v1", ResourceVersion:"6772918", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
2. What is Fluent Bit?
Fluent Bit is a Fast and Lightweight Logs and Metrics Processor and Forwarder for Linux, OSX, Windows and BSD family operating systems. It has been made with a strong focus on performance to allow the collection of events from different sources without complexity.
Fluent Bit is a CNCF sub-project under the umbrella of Fluentd and is licensed under the terms of the Apache License v2.0. The project was originally created by Treasure Data and is currently a vendor-neutral, community-driven project.
2.1. Fluentd & Fluent Bit
Logging and data processing in general can be complex, and even more so at scale; that’s why Fluentd was born.
Fluentd has become more than a simple tool; it has grown into a full-scale ecosystem that contains SDKs for different languages and sub-projects like Fluent Bit.
Both projects share a lot of similarities: Fluent Bit is fully designed and built on top of the best ideas of Fluentd’s architecture and general design. Choosing which one to use depends on the end-user needs.
The following table describes a comparison in different areas of the projects:
| | Fluentd | Fluent Bit |
|---|---|---|
| Scope | Containers / Servers | Embedded Linux / Containers / Servers |
| Language | C & Ruby | C |
| Memory | ~40MB | ~650KB |
| Performance | High Performance | High Performance |
| Dependencies | Built as a Ruby Gem, it requires a certain number of gems. | Zero dependencies, unless some special plugin requires them. |
| Plugins | More than 1000 plugins available | Around 70 plugins available |
| License | Apache License v2.0 | Apache License v2.0 |
Both Fluentd and Fluent Bit can work as Aggregators or Forwarders; they can complement each other or be used as standalone solutions.
2.2. Key Concepts
-
Event or Record
Every incoming piece of data that belongs to a log or a metric that is retrieved by Fluent Bit is considered an Event or a Record.
As an example consider the following content of a Syslog file:
Jan 18 12:52:16 flb systemd[2222]: Starting GNOME Terminal Server
Jan 18 12:52:16 flb dbus-daemon[2243]: [session uid=1000 pid=2243] Successfully activated service 'org.gnome.Terminal'
Jan 18 12:52:16 flb systemd[2222]: Started GNOME Terminal Server.
Jan 18 12:52:16 flb gsd-media-keys[2640]: # watch_fast: "/org/gnome/terminal/legacy/" (establishing: 0, active: 0)
It contains four lines, and each of them represents an independent Event.
Internally, an Event always has two components (in an array form):
[TIMESTAMP, MESSAGE]
-
Filtering
In some cases it is required to perform modifications on the Event content; the process to alter, enrich, or drop Events is called Filtering.
There are many use cases where Filtering is required, such as:
-
Append specific information to the Event like an IP address or metadata.
-
Select a specific piece of the Event content.
-
Drop Events that match a certain pattern.
-
Tag
Every Event that gets into Fluent Bit gets assigned a Tag. This tag is an internal string that is used in a later stage by the Router to decide which Filter or Output phase it must go through.
Most of the tags are assigned manually in the configuration. If a tag is not specified, Fluent Bit will assign the name of the Input plugin instance where that Event was generated.
-
Timestamp
The Timestamp represents the time when an Event was created. Every Event has an associated Timestamp. The Timestamp is a numeric fractional integer in the format:
SECONDS.NANOSECONDS
-
SECONDS
It is the number of seconds that have elapsed since the Unix epoch.
-
NANOSECONDS
Fractional second or one thousand-millionth of a second.
-
Match
Fluent Bit allows you to deliver your collected and processed Events to one or multiple destinations; this is done through a routing phase. A Match represents a simple rule to select Events whose Tag matches a defined rule.
-
Structured Messages
Source events may or may not have a structure. A structure defines a set of keys and values inside the Event message. As an example consider the following two messages:
-
No structured message
"Project Fluent Bit created on 1398289291"
-
Structured Message
{"project": "Fluent Bit", "created": 1398289291}
At a low level both are just arrays of bytes, but the structured message defines keys and values; having a structure helps to implement faster operations on data modifications.
2.3. Data Pipeline
-
Input
Fluent Bit provides different Input Plugins to gather information from different sources, some of them just collect data from log files while others can gather metrics information from the operating system. There are many plugins for different needs.
When an input plugin is loaded, an internal instance is created. Every instance has its own and independent configuration. Configuration keys are often called properties.
-
Parser
Dealing with raw strings or unstructured messages is a constant pain; having a structure is highly desired. Ideally we want to set a structure to the incoming data collected by the Input Plugins as soon as it arrives.
The Parser allows you to convert from unstructured to structured data. As a demonstrative example consider the following Apache (HTTP Server) log entry:
192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395
The above log line is a raw string without format; ideally we would like to give it a structure that can be easily processed later. If the proper configuration is used, the log entry could be converted to:
{ "host": "192.168.2.20", "user": "-", "method": "GET", "path": "/cgi-bin/try/", "code": "200", "size": "3395", "referer": "", "agent": "" }
-
Filter
In production environments we want to have full control of the data we are collecting; filtering is an important feature that allows us to alter the data before delivering it to some destination.
Filtering is implemented through plugins, so each filter available could be used to match, exclude or enrich your logs with some specific metadata.
-
Buffer
The buffer phase in the pipeline aims to provide a unified and persistent mechanism to store your data, either using the primary in-memory model or using the filesystem based mode.
The buffer phase already contains the data in an immutable state, meaning that no other filter can be applied.
Fluent Bit offers a buffering mechanism in the file system that acts as a backup system to avoid data loss in case of system failures.
-
Router
Routing is a core feature that allows you to route your data through Filters and finally to one or multiple destinations. The router relies on the concept of Tags and Matching rules.
When the data is generated by the input plugins, it comes with a Tag (most of the time the Tag is configured manually). The Tag is a human-readable indicator that helps to identify the data source.
In order to define where the data should be routed, a Match rule must be specified in the output configuration.
Consider the following configuration example that aims to deliver CPU metrics to an Elasticsearch database and Memory metrics to the standard output interface:
[INPUT]
    name  cpu
    tag   my_cpu
[INPUT]
    name  mem
    tag   my_mem
[OUTPUT]
    name  es
    match my_cpu
[OUTPUT]
    name  stdout
    match my_mem
Routing works automatically reading the Input Tags and the Output Match rules. If some data has a Tag that doesn’t match upon routing time, the data is deleted.
Routing is flexible enough to support wildcards in the Match pattern. The below example defines a common destination for both sources of data:
[INPUT]
    name  cpu
    tag   my_cpu
[INPUT]
    name  mem
    tag   my_mem
[OUTPUT]
    name  stdout
    match my_*
The match rule is set to my_* which means it will match any Tag that starts with my_.
-
Output
The output interface allows us to define destinations for the data. Common destinations are remote services, local file system or standard interface with others. Outputs are implemented as plugins and there are many available.
When an output plugin is loaded, an internal instance is created. Every instance has its own independent configuration. Configuration keys are often called properties.
2.4. Configuring Fluent Bit
Fluent Bit might optionally use a configuration file to define how the service will behave.
A simple example of a configuration file is as follows:
[SERVICE]
# This is a commented line
daemon off
log_level debug
The configuration schema is defined by three concepts:
-
Sections
A section is defined by a name or title inside brackets. Looking at the example above, a Service section has been set using the [SERVICE] definition. Section rules:
-
All section content must be indented (4 spaces ideally).
-
Multiple sections can exist on the same file.
-
A section is expected to have comments and entries; it cannot be empty.
-
Any commented line under a section must be indented too.
-
Entries: Key/Value
A section may contain Entries; an entry is defined by a line of text that contains a Key and a Value. Using the above example, the [SERVICE] section contains two entries: one is the key daemon with value off, and the other is the key log_level with the value debug. Entries rules:
-
An entry is defined by a key and a value.
-
A key must be indented.
-
A key must contain a value which ends at the line break.
-
Multiple keys with the same name can exist.
Commented lines are set by prefixing them with the # character; those lines are not processed but they must be indented too.
-
Indented Configuration Mode
Fluent Bit configuration files are based on a strict Indented Mode, which means that each configuration file must follow the same pattern of alignment from left to right when writing text. By default an indentation level of four spaces from left to right is suggested.
One of the ways to configure Fluent Bit is using a main configuration file. The main configuration file supports four types of sections: Service, Input, Filter, Output. In addition, it’s also possible to split the main configuration file in multiple files using the feature to include external files: Include File.
The following configuration file example demonstrates how to collect CPU metrics and flush the results every five seconds to the standard output:
[SERVICE]
flush 5
daemon off
log_level debug
[INPUT]
name cpu
tag my_cpu
[OUTPUT]
name stdout
match my*cpu
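As a sketch, the configuration above (saved locally, e.g. as fluent-bit.conf) could be run with the official container image in the same way as the later examples in this article; the container mount path is arbitrary:
$ docker run --rm \
    -v $PWD/fluent-bit.conf:/etc/fluent-bit/fluent-bit.conf \
    fluent/fluent-bit:1.8 \
    /fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf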
To avoid complicated long configuration files, it is better to split specific parts into different files and include them from one main file.
Starting from Fluent Bit 0.12 the new configuration command @INCLUDE has been added and can be used in the following way:
@INCLUDE somefile.conf
The configuration reader will try to open the path somefile.conf; if not found, it will assume it is a relative path based on the path of the base configuration file.
The @INCLUDE command only works at the top-left level of the configuration; it cannot be used inside sections.
The wildcard character (*) is supported to include multiple files, e.g.:
@INCLUDE input_*.conf
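For example, a sketch of a main file that includes per-pipeline fragments (the fragment file names below are arbitrary):
# fluent-bit.conf (main file)
[SERVICE]
    flush   5
    daemon  off
@INCLUDE input_*.conf
@INCLUDE output_stdout.conf

# input_cpu.conf (picked up by the input_*.conf wildcard)
[INPUT]
    name  cpu
    tag   my_cpu

# output_stdout.conf
[OUTPUT]
    name   stdout
    match  my_cpu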
Fluent Bit supports the usage of environment variables in any value associated to a key when using a configuration file.
The variables are case sensitive and can be used in the following format:
${MY_VARIABLE}
When Fluent Bit starts, the configuration reader will detect any request for ${MY_VARIABLE} and will try to resolve its value.
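A short sketch, assuming an Elasticsearch output whose host is taken from the environment (ES_HOST is an arbitrary variable name and must be exported before Fluent Bit starts, e.g. export ES_HOST=es.example.com):
[OUTPUT]
    name   es
    match  *
    host   ${ES_HOST}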
2.5. Parsers
Parsers are an important component of Fluent Bit; with them you can take any unstructured log entry and give it a structure that makes processing and further filtering easier.
The parser engine is fully configurable and can process log entries based on two types of format:
-
JSON Maps
-
Regular Expressions (named capture)
By default, Fluent Bit provides a set of pre-configured parsers that can be used for different use cases such as logs from:
-
Apache
-
Nginx
-
Docker
-
Syslog rfc5424
-
Syslog rfc3164
Parsers are defined in one or multiple configuration files that are loaded at start time, either from the command line or through the main Fluent Bit configuration file.
Note: Fluent Bit uses Ruby-based regular expressions, and we encourage you to use the Rubular web site as an online editor to test them.
Multiple parsers can be defined and each section has its own properties. The following table describes the available options for each parser definition:
| Key | Description |
|---|---|
| Name | Set a unique name for the parser in question. |
| Format | Specify the format of the parser. The options covered here are json and regex. |
| Regex | If format is regex, this option must be set, specifying the Ruby Regular Expression that will be used to parse and compose the structured message. |
| Time_Key | If the log entry provides a field with a timestamp, this option specifies the name of that field. |
| Time_Format | Specify the format of the time field so it can be recognized and analyzed properly. Fluent Bit uses strptime(3) to parse the time. |
| Time_Offset | Specify a fixed UTC time offset (e.g. -0600, +0200, etc.) for local dates. |
| Types | Specify the data type of parsed fields. The syntax is types <field_name>:<type_name> <field_name>:<type_name> ..., e.g. types status:integer. |
| Decode_Field | Decode a field value; the only decoder available is json. The syntax is Decode_Field json <field_name>. |
All parsers must be defined in a parsers.conf file, not in the Fluent Bit global configuration file. The parsers file exposes all available parsers that can be used by the Input plugins that are aware of this feature.
-
JSON Parser
The JSON parser is the simplest option: if the original log source is a JSON map string, it will take its structure and convert it directly to the internal binary representation.
A simple configuration that can be found in the default parsers configuration file is the entry to parse Docker log files (when the tail input plugin is used):
[PARSER]
    name         docker
    format       json
    time_key     time
    time_format  %Y-%m-%dT%H:%M:%S %z
The following log entry is a valid content for the parser defined above:
{"key1": 12345, "key2": "abc", "time": "2006-07-28T13:22:04Z"}
After processing, its internal representation will be:
[1154103724, {"key1"=>12345, "key2"=>"abc"}]
The time has been converted to a Unix timestamp (UTC) and the map reduced to each component of the original message.
-
Regex Parser
The regex parser allows you to define a custom Ruby Regular Expression that uses the named capture feature to define which content belongs to which key name.
The following parser configuration example aims to provide rules that can be applied to a Nginx combined access log entry:
[PARSER]
    name    nginx
    format  regex
    # log_format combined '$remote_addr - $remote_user [$time_local] '
    #                     '"$request" $status $body_bytes_sent '
    #                     '"$http_referer" "$http_user_agent"';
    regex   ^(?<remote_addr>[^ ]+) - (?<remote_user>[^ ]+) \[(?<time>[^\]]+)\] "(?<method>\w+) (?<path>[^ ]+) (?<proto>[^"]+)" (?<status>\d+) (?<body_byte_sent>\d+) "(?<referer>[^"]+)" "(?<user_agent>[^"]+)"$
    time_key     time
    time_format  %d/%b/%Y:%H:%M:%S %z
    types        status:integer body_byte_sent:integer
As an example, take the following Nginx access log entry:
192.168.91.1 - - [12/Jan/2022:08:24:28 +0000] "GET / HTTP/1.1" 200 615 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"
The above content does not provide a defined structure for Fluent Bit, but by enabling the proper parser we can build a structured representation of it:
[ 1641975868.000000000, { "remote_addr"=>"192.168.91.1", "remote_user"=>"-", "method"=>"GET", "path"=>"/", "proto"=>"HTTP/1.1", "status"=>200, "body_byte_sent"=>615, "referer"=>"-", "user_agent"=>"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0" } ]
A common pitfall is that you cannot use characters other than letters, numbers and underscores in group names. For example, a group name like (?<user-name>.*) will cause an error because it contains an invalid character (-).
2.6. Parser Filter
The Parser Filter plugin allows for parsing fields in event records.
The plugin supports the following configuration parameters:
| Key | Description | Default |
|---|---|---|
| Key_Name | Specify field name in record to parse. | |
| Parser | Specify the parser name to interpret the field. Multiple Parser entries are allowed (one per line). | |
| Preserve_Key | Keep the original Key_Name field in the parsed result. If false, the field is removed. | False |
| Reserve_Data | Keep all other original fields in the parsed result. If false, all other original fields will be removed. | False |
| Unescape_Key | If the key is an escaped string (e.g. stringified JSON), unescape the string before applying the parser. | False |
This is an example of parsing a record {"data":"100 0.5 true This is example"}.
The plugin needs a parser file which defines how to parse each field.
[PARSER]
name dummy_test
format regex
regex ^(?<INT>[^ ]+) (?<FLOAT>[^ ]+) (?<BOOL>[^ ]+) (?<STRING>.+)$
The path of the parser file should be written in the configuration file under the [SERVICE] section.
[SERVICE]
parsers_file parsers.conf
[INPUT]
name dummy
tag dummy.data
dummy {"data":"100 0.5 true This is example"}
samples 3
[FILTER]
name parser
match dummy.*
key_name data
parser dummy_test
[OUTPUT]
name stdout
match *
The raw output before parser filtering is:
$ docker run --rm \
fluent/fluent-bit:1.8 \
/fluent-bit/bin/fluent-bit -q \
-i dummy \
-p 'tag=dummy.data' \
-p 'samples=3' \
-p 'dummy={"data":"100 0.5 true This is example"}' \
-o stdout
[0] dummy.data: [1641963560.833349997, {"data"=>"100 0.5 true This is example"}]
[1] dummy.data: [1641963561.834293264, {"data"=>"100 0.5 true This is example"}]
[2] dummy.data: [1641963562.834409396, {"data"=>"100 0.5 true This is example"}]
The output after parser filtering is:
$ docker run --rm \
-v $PWD/etc:/etc/fluent-bit \
fluent/fluent-bit:1.8 \
/fluent-bit/bin/fluent-bit -q \
-c /etc/fluent-bit/fluent-bit.conf
[0] dummy.data: [1641970270.834847487, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]
[1] dummy.data: [1641970271.833919275, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]
[2] dummy.data: [1641970272.834001854, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]
By default, the parser plugin only keeps the parsed fields in its output.
If you enable Reserve_Data, all other fields are preserved:
[PARSER]
name dummy_test
format regex
regex ^(?<INT>[^ ]+) (?<FLOAT>[^ ]+) (?<BOOL>[^ ]+) (?<STRING>.+)$
[SERVICE]
parsers_file parsers.conf
[INPUT]
name dummy
tag dummy.data
dummy {"data":"100 0.5 true This is example", "key1":"value1", "key2":"value2"}
samples 3
[FILTER]
name parser
match dummy.*
key_name data
parser dummy_test
reserve_data on
[OUTPUT]
name stdout
match *
This will produce the output:
$ docker run --rm \
-v $PWD/etc2:/etc/fluent-bit \
fluent/fluent-bit:1.8 \
/fluent-bit/bin/fluent-bit -q \
-c /etc/fluent-bit/fluent-bit.conf
[0] dummy.data: [1641971163.834882081, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "key1"=>"value1", "key2"=>"value2"}]
[1] dummy.data: [1641971164.834110226, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "key1"=>"value1", "key2"=>"value2"}]
[2] dummy.data: [1641971165.833051479, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "key1"=>"value1", "key2"=>"value2"}]
If you enable Reserve_Data and Preserve_Key, the original key field will be preserved as well:
[PARSER]
name dummy_test
format regex
regex ^(?<INT>[^ ]+) (?<FLOAT>[^ ]+) (?<BOOL>[^ ]+) (?<STRING>.+)$
[SERVICE]
parsers_file parsers.conf
[INPUT]
name dummy
tag dummy.data
dummy {"data":"100 0.5 true This is example", "key1":"value1", "key2":"value2"}
samples 3
[FILTER]
name parser
match dummy.*
key_name data
parser dummy_test
reserve_data on
preserve_key on
[OUTPUT]
name stdout
match *
This will produce the output:
$ docker run --rm \
-v $PWD/etc3:/etc/fluent-bit \
fluent/fluent-bit:1.8 \
/fluent-bit/bin/fluent-bit -q \
-c /etc/fluent-bit/fluent-bit.conf
[0] dummy.data: [1641971438.833271871, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "data"=>"100 0.5 true This is example", "key1"=>"value1", "key2"=>"value2"}]
[1] dummy.data: [1641971439.834690742, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "data"=>"100 0.5 true This is example", "key1"=>"value1", "key2"=>"value2"}]
[2] dummy.data: [1641971440.834035007, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example", "data"=>"100 0.5 true This is example", "key1"=>"value1", "key2"=>"value2"}]
2.7. Docker
Fluent Bit container images are available on Docker Hub ready for production usage. Current available images can be deployed in multiple architectures.
The following command runs a (useless) test which makes Fluent Bit measure CPU usage in the container:
$ docker run -ti fluent/fluent-bit:1.8 /fluent-bit/bin/fluent-bit -i cpu -o stdout -f 1
That command will let Fluent Bit measure CPU usage every second and flush the results to the standard output:
Fluent Bit v1.8.11
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2022/01/07 05:02:04] [ info] [engine] started (pid=1)
[2022/01/07 05:02:04] [ info] [storage] version=1.1.5, initializing...
[2022/01/07 05:02:04] [ info] [storage] in-memory
[2022/01/07 05:02:04] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/01/07 05:02:04] [ info] [cmetrics] version=0.2.2
[2022/01/07 05:02:04] [ info] [sp] stream processor started
[0] cpu.0: [1641531724.834023688, {"cpu_p"=>1.750000, "user_p"=>0.500000, "system_p"=>1.250000, "cpu0.p_cpu"=>2.000000, "cpu0.p_user"=>1.000000, "cpu0.p_system"=>1.000000, "cpu1.p_cpu"=>1.000000, "cpu1.p_user"=>0.000000, "cpu1.p_system"=>1.000000, "cpu2.p_cpu"=>0.000000, "cpu2.p_user"=>0.000000, "cpu2.p_system"=>0.000000, "cpu3.p_cpu"=>4.000000, "cpu3.p_user"=>1.000000, "cpu3.p_system"=>3.000000}]
2.8. Kubernetes
Fluent Bit is a lightweight and extensible Log Processor that comes with full support for Kubernetes:
-
Process Kubernetes containers logs from the file system or Systemd/Journald.
-
Enrich logs with Kubernetes Metadata.
-
Centralize your logs in third party storage services like Elasticsearch, InfluxDB, HTTP, etc.
Kubernetes manages a cluster of nodes, so our log agent needs to run on every node to collect logs from every pod; hence Fluent Bit is deployed as a DaemonSet (a pod that runs on every node of the cluster). A minimal sketch of such a DaemonSet is shown below.
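The following is only an illustrative sketch, not a production manifest; the namespace, ServiceAccount, ConfigMap name and image tag are assumptions, and on Docker-based nodes /var/lib/docker/containers usually has to be mounted as well so the symlinked log files are readable:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging                   # assumed namespace
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit   # needs RBAC to read pod metadata from the API server
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.8
        volumeMounts:
        - name: varlog                 # /var/log/containers and /var/log/pods live here
          mountPath: /var/log
          readOnly: true
        - name: config                 # fluent-bit.conf and parsers.conf
          mountPath: /fluent-bit/etc/
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: config
        configMap:
          name: fluent-bit-config      # assumed ConfigMap holding the configuration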
When Fluent Bit runs, it will read, parse and filter the logs of every POD and will enrich each entry with the following information (metadata):
-
Pod Name
-
Pod ID
-
Container Name
-
Container ID
-
Labels
-
Annotations
To obtain this information, a built-in filter plugin called kubernetes talks to the Kubernetes API Server to retrieve relevant information such as the pod_id, labels and annotations; other fields such as pod_name, container_id and container_name are retrieved locally from the log file names. All of this is handled automatically; no intervention is required from a configuration point of view.
The Kubernetes Filter depends on either the Tail or Systemd input plugins to process and enrich records with Kubernetes metadata. Here we will explain the workflow of Tail and how its configuration is correlated with the Kubernetes filter. Consider the following configuration example (just for demo purposes, not production):
[INPUT]
name tail
tag kube.*
path /var/log/containers/*.log
parser docker
[FILTER]
name kubernetes
match kube.*
kube_url https://kubernetes.default.svc:443
kube_ca_file /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
kube_token_file /var/run/secrets/kubernetes.io/serviceaccount/token
kube_tag_prefix kube.var.log.containers.
merge_log on
merge_log_key log_processed
-
Systemd
The systemd input plugin allows you to collect log messages from the Journald daemon on Linux environments.
[INPUT]
    name            systemd
    tag             host.*
    systemd_filter  _SYSTEMD_UNIT=docker.service
See also: systemd-journald.service(8), journald.conf(5).
-
Tail
The tail input plugin allows you to monitor one or several text files. It has a behavior similar to the tail -f shell command.
The plugin reads every matched file in the Path pattern and for every new line found, it generates a new record. Optionally a database file can be used so the plugin can keep a history of tracked files and a state of offsets; this is very useful to resume a state if the service is restarted.
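A minimal sketch using that optional database file (the db path below is arbitrary):
[INPUT]
    name  tail
    path  /var/log/containers/*.log
    db    /var/log/flb_kube.db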
If you are running Fluent Bit to process logs coming from containers like Docker or CRI, you can use the built-in modes for such purposes. This will help to reassemble multiline messages originally split by Docker or CRI:
[INPUT]
    name              tail
    path              /var/log/containers/*.log
    # exclude_path    /var/log/containers/*_logging_*.log,/var/log/containers/*_default*.log
    multiline.parser  docker, cri
The two options separated by a comma mean multi-format: try the docker and cri multiline formats.
3. References
-
https://kubernetes.io/docs/concepts/cluster-administration/logging/
-
https://docs.docker.com/config/containers/logging/json-file/
-
https://github.com/kubernetes/design-proposals-archive/blob/main/node/kubelet-cri-logging.md
-
https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/configuration-file
-
https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/issues/105
-
https://docs.fluentbit.io/manual/pipeline/filters/kubernetes