1. Device Files

In Unix-like operating systems, a device file or special file is an interface to a device driver that appears in a file system as if it were an ordinary file. These special files allow an application program to interact with a device by using its device driver via standard input/output system calls. Using standard system calls simplifies many programming tasks, and leads to consistent user-space I/O mechanisms regardless of device features and functions.

There are two general kinds of device files in Unix-like operating systems, known as character special files and block special files. The difference between them lies in how much data is read and written by the operating system and hardware. These together can be called device special files in contrast to named pipes, which are not connected to a device but are not ordinary files either.

In some Unix-like systems, most device files are managed as part of a virtual file system traditionally mounted at /dev, possibly associated with a controlling daemon, which monitors hardware addition and removal at run time, making corresponding changes to the device file system if that’s not automatically done by the kernel, and possibly invoking scripts in system or user space to handle special device needs. FreeBSD, DragonFly BSD, and Darwin have a dedicated file system, devfs; device nodes are managed automatically by this file system, in kernel space.

1.1. Unix and Unix-like Systems

Device nodes correspond to resources that an operating system’s kernel has already allocated. Unix identifies those resources by a major number and a minor number, both stored as part of the structure of a node. The assignment of these numbers occurs uniquely in different operating systems and on different computer platforms. Generally, the major number identifies the device driver and the minor number identifies a particular device (possibly out of many) that the driver controls: in this case, the system may pass the minor number to a driver. However, in the presence of dynamic number allocation, this may not be the case.

Simplified Structure of the Linux Kernel

User space programs access character and block devices through device nodes, also referred to as device special files.

# lsblk
NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                       8:0    0   20G  0 disk
├─docker-thinpool_tmeta 254:0    0   52M  0 lvm
│ └─docker-thinpool     254:2    0  864M  0 lvm
└─docker-thinpool_tdata 254:1    0  864M  0 lvm
  └─docker-thinpool     254:2    0  864M  0 lvm
sdb                       8:16   0  100G  0 disk
└─sdb1                    8:17   0  100G  0 part /
sdc                       8:32   0   10G  0 disk

# ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 Nov 29 20:13 /dev/sda
brw-rw---- 1 root disk 8, 16 Nov 29 19:16 /dev/sdb
brw-rw---- 1 root disk 8, 17 Nov 29 19:16 /dev/sdb1
brw-rw---- 1 root disk 8, 32 Nov 29 20:13 /dev/sdc
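
Since a node is identified only by its type and major/minor pair, an equivalent node can, for illustration, be created anywhere with mknod; the name sda-alias below is purely hypothetical:

# mknod /dev/sda-alias b 8 0
# rm /dev/sda-alias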

1.1.1. Character Devices

Character special files or character devices provide unbuffered, direct access to the hardware device. They do not necessarily allow programs to read or write single characters at a time; that is up to the device in question.
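
For example, on a typical Linux system /dev/null and /dev/tty are character devices, marked with a leading c in long listings (major/minor numbers 1:3 and 5:0 are the conventional Linux assignments; timestamps will differ):

# ls -l /dev/null /dev/tty
crw-rw-rw- 1 root root 1, 3 Nov 29 19:16 /dev/null
crw-rw-rw- 1 root tty  5, 0 Nov 29 19:16 /dev/tty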

1.1.2. Block Devices

Block special files or block devices provide buffered access to hardware devices, and provide some abstraction from their specifics. Unlike character devices, block devices will always allow the programmer to read or write a block of any size (including single characters/bytes) and any alignment.

1.1.3. Pseudo-devices

Device nodes on Unix-like systems do not necessarily have to correspond to physical devices. Nodes that lack this correspondence form the group of pseudo-devices. They provide various functions handled by the operating system. Some of the most commonly used (character-based) pseudo-devices include:

  • /dev/null – accepts and discards all input written to it; provides an end-of-file indication when read from.

  • /dev/zero – accepts and discards all input written to it; produces a continuous stream of null characters (zero-value bytes) as output when read from.

  • /dev/full – produces a continuous stream of null characters (zero-value bytes) as output when read from, and generates an ENOSPC ("disk full") error when attempting to write to it.

  • /dev/random – produces bytes generated by the kernel’s cryptographically secure pseudorandom number generator.
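
The behavior of these pseudo-devices is easy to verify from a shell; a short session might look like the following (the exact error wording depends on the shell):

# echo hello > /dev/null
# head -c 8 /dev/zero | od -An -tx1
 00 00 00 00 00 00 00 00
# echo hello > /dev/full
-bash: echo: write error: No space left on device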

1.1.4. Naming Conventions

The following prefixes are used for the names of some devices in the /dev hierarchy, to identify the type of device:

  • pt: pseudo-terminals (virtual terminals)

  • tty: terminals

  • hd: ("classic") IDE driver

    • hda: the master device on the first ATA channel

    • hdb: the slave device on the first ATA channel

    • hdc: the master device on the second ATA channel

    • hdd: the slave device on the second ATA channel

  • sd: mass-storage driver (block device), SCSI driver

    • sda: first registered device

    • sdb, sdc, etc.: second, third, etc. registered devices

2. Device Mapper

The device mapper is a framework provided by the Linux kernel for mapping physical block devices onto higher-level virtual block devices. It forms the foundation of the logical volume manager (LVM), software RAIDs and dm-crypt disk encryption, and offers additional features such as file system snapshots.

The device mapper works by passing data from a virtual block device, which is provided by the device mapper itself, to another block device. The data can also be modified in transit, as happens, for example, when the device mapper provides disk encryption or simulates unreliable hardware behavior.

The following mapping targets are available:

  • cache – allows creation of hybrid volumes, by using solid-state drives (SSDs) as caches for hard disk drives (HDDs)

  • clone – clones a read-only source device onto a writable destination device, permitting use of the destination before the background copy is complete

  • crypt – provides data encryption, by using the Linux kernel’s Crypto API

  • delay – delays reads and/or writes to different devices (used for testing)

  • era – behaves in a way similar to the linear target, while it keeps track of blocks that were written to within a user-defined period of time

  • error – simulates I/O errors for all mapped blocks (used for testing)

  • flakey – simulates periodic unreliable behaviour (used for testing)

  • linear – maps a continuous range of blocks onto another block device

  • mirror – maps a mirrored logical device, while providing data redundancy

  • multipath – supports the mapping of multipathed devices, through usage of their path groups

  • raid – offers an interface to the Linux kernel’s software RAID driver (md)

  • snapshot and snapshot-origin – used for creation of LVM snapshots, as part of the underlying copy-on-write scheme

  • striped – stripes the data across physical devices, with the number of stripes and the striping chunk size as parameters

  • thin – allows creation of devices larger than the underlying physical device, physical space is allocated only when written to

  • zero – an equivalent of /dev/zero, all reads return blocks of zeros, and writes are discarded
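
As a minimal illustration of a mapping target, the zero target can be instantiated directly with dmsetup; the commands below create and remove a 1 GiB virtual device (2097152 sectors of 512 bytes) backed by nothing (the device name zero0 is arbitrary):

# dmsetup create zero0 --table '0 2097152 zero'
# blockdev --getsize64 /dev/mapper/zero0
1073741824
# dmsetup remove zero0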

The Linux Storage Stack Diagram

2.1. Device Table Mappings

A mapped device is defined by a table that specifies how to map each range of logical sectors of the device using a supported Device Table mapping. The table for a mapped device is constructed from a list of lines of the form:

start length mapping [mapping_parameters...]

In the first line of a Device Mapper table, the start parameter must equal 0. The value of start plus length on one line must equal the start on the next line. Which mapping parameters are specified in a line of the mapping table depends on which mapping type is specified on the line.

Sizes in the Device Mapper are always specified in sectors (512 bytes); a 1 GiB device, for example, spans 2097152 sectors.

When a device is specified as a mapping parameter in the Device Mapper, it can be referenced by the device name in the filesystem (for example, /dev/hda) or by the major and minor numbers in the format major:minor. The major:minor format is preferred because it avoids pathname lookups.
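
For instance, a hypothetical table that concatenates two devices into a single linear device could look like the following (device numbers chosen for illustration); the first line maps the first 20 GiB (41943040 sectors) onto 8:16, and the second line continues at sector 41943040 on 8:32, satisfying the start/length rule above:

0 41943040 linear 8:16 0
41943040 20971520 linear 8:32 0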

The following shows a sample mapping table for a device.

# dmsetup ls
docker-thinpool_tdata	(254:1)
docker-thinpool_tmeta	(254:0)
docker-thinpool	(254:2)
docker-8:17-3541243-067f280bcb2a54b005f1c85751b39d0b60fbe82a27bb84f0292b310f70fb754b	(254:3)

# dmsetup table
docker-thinpool_tdata: 0 1024000 linear 8:0 2048
docker-thinpool_tdata: 1024000 745472 linear 8:0 1239040
docker-thinpool_tmeta: 0 106496 linear 8:0 1026048
docker-thinpool: 0 1769472 thin-pool 254:0 254:1 1024 345 1 skip_block_zeroing
docker-8:17-3541243-067f280bcb2a54b005f1c85751b39d0b60fbe82a27bb84f0292b310f70fb754b: 0 20971520 thin 254:2 24

2.2. The dmsetup Command

The application interface to the Device Mapper is the ioctl system call. The user interface is the dmsetup command.

The dmsetup command is a command line wrapper for communication with the Device Mapper. For general system information about LVM devices, you may find the info, ls, status, and deps options of the dmsetup command to be useful.

  • The dmsetup ls Command

    You can list the device names of mapped devices with the dmsetup ls command. You can list devices that have at least one target of a specified type with the dmsetup ls --target target_type command. The dmsetup ls command provides a --tree option that displays dependencies between devices as a tree.

    # dmsetup ls
    vg0-lvol0	(254:0)
    
    # dmsetup ls --target linear
    vg0-lvol0	(254, 0)
    
    # dmsetup ls --tree
    vg0-lvol0 (254:0)
     └─ (8:0)
    
    # lvextend -L +20G vg0/lvol0
      Size of logical volume vg0/lvol0 changed from 500.00 MiB (125 extents) to <20.49 GiB (5245 extents).
      Logical volume vg0/lvol0 successfully resized.
    
    # lsblk
    NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sda           8:0    0   20G  0 disk
    └─vg0-lvol0 254:0    0 20.5G  0 lvm
    sdb           8:16   0   10G  0 disk
    └─vg0-lvol0 254:0    0 20.5G  0 lvm
    sdc           8:32   0  100G  0 disk
    └─sdc1        8:33   0  100G  0 part /
    
    # dmsetup ls --tree
    vg0-lvol0 (254:0)
     ├─ (8:16)
     └─ (8:0)
  • The dmsetup info Command

    The dmsetup info device command provides summary information about Device Mapper devices. If you do not specify a device name, the output is information about all of the currently configured Device Mapper devices.

    # dmsetup info vg0-lvol0
    Name:              vg0-lvol0
    State:             ACTIVE
    Read Ahead:        256
    Tables present:    LIVE
    Open count:        0
    Event number:      0
    Major, minor:      254, 0
    Number of targets: 2
    UUID: LVM-iGEIRSIULIXi00RrqQZzoFEHYupSo8xDEYdOMnMSAjPKLNsXtT3wp9ozCyHzfZa5
    
    # lvs -v
      LV    VG  #Seg Attr       LSize   Maj Min KMaj KMin Pool Origin Data%  Meta%  Move Cpy%Sync Log Convert LV UUID                                LProfile
      lvol0 vg0    2 -wi-a----- <20.49g  -1  -1  254    0                                                     EYdOMn-MSAj-PKLN-sXtT-3wp9-ozCy-HzfZa5
  • The dmsetup status Command

    The dmsetup status device command provides status information for each target in a specified device. If you do not specify a device name, the output is information about all of the currently configured Device Mapper devices.

    # dmsetup status vg0-lvol0
    0 41934848 linear
    41934848 1032192 linear
  • The dmsetup deps Command

    The dmsetup deps device command provides a list of (major, minor) pairs for devices referenced by the mapping table for the specified device. If you do not specify a device name, the output is information about all of the currently configured Device Mapper devices.

    # dmsetup deps vg0-lvol0
    2 dependencies	: (8, 16) (8, 0)
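
Beyond inspection, dmsetup can also change device state. For example, a mapped device can be suspended, which queues new I/O until it is resumed; this is essentially what LVM does internally when it loads an updated table. A short illustrative session:

# dmsetup suspend vg0-lvol0
# dmsetup info vg0-lvol0 | grep State
State:             SUSPENDED
# dmsetup resume vg0-lvol0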

3. Logical Volume Manager (LVM)

In computer storage, logical volume management or LVM provides a method of allocating space on mass-storage devices that is more flexible than conventional partitioning schemes. In particular, a volume manager can concatenate, stripe together or otherwise combine partitions (or block devices in general) into larger virtual partitions that administrators can re-size or move, potentially without interrupting system use.

Disk partitioning or disk slicing is the creation of one or more regions on secondary storage, so that each region can be managed separately. These regions are called partitions. It is typically the first step of preparing a newly installed disk, before any file system is created. The disk stores the information about the partitions' locations and sizes in an area known as the partition table that the operating system reads before any other part of the disk. Each partition then appears to the operating system as a distinct "logical" disk that uses part of the actual disk.

Data striping is the technique of segmenting logically sequential data, such as a file, so that consecutive segments are stored on different physical storage devices. Striping is useful when a processing device requests data more quickly than a single storage device can provide it. By spreading segments across multiple devices which can be accessed concurrently, total data throughput is increased.

Most volume-manager implementations share the same basic design. They start with physical volumes (PVs), which can be either hard disks, hard disk partitions, or Logical Unit Numbers (LUNs) of an external storage device. Volume management treats each PV as being composed of a sequence of chunks called physical extents (PEs).

Normally, PEs simply map one-to-one to logical extents (LEs). With mirroring, multiple PEs map to each LE. These PEs are drawn from a physical volume group (PVG), a set of same-sized PVs which act similarly to hard disks in a RAID1 array. PVGs are usually laid out so that they reside on different disks or data buses for maximum redundancy.

The system pools LEs into a volume group (VG). The pooled LEs can then be concatenated together into virtual disk partitions called logical volumes or LVs. Systems can use LVs as raw block devices just like disk partitions: creating mountable file systems on them, or using them as swap storage.

LVM is used for the following purposes:

  • Creating single logical volumes from multiple physical volumes or entire hard disks (somewhat similar to RAID 0, but more similar to JBOD), allowing for dynamic volume resizing.

  • Managing large hard disk farms by allowing disks to be added and replaced without downtime or service disruption, in combination with hot swapping.

    Hot swapping is the replacement or addition of components to a computer system without stopping, shutting down, or rebooting the system; hot plugging describes the addition of components only. Components which have such functionality are said to be hot-swappable or hot-pluggable; likewise, components which do not are cold-swappable or cold-pluggable.
  • On small systems (like a desktop), instead of having to estimate at installation time how big a partition might need to be, LVM allows filesystems to be easily resized as needed.

  • Performing consistent backups by taking snapshots of the logical volumes.

  • Encrypting multiple physical partitions with one password.

LVM can be considered as a thin software layer on top of the hard disks and partitions, which creates an abstraction of continuity and ease-of-use for managing hard drive replacement, repartitioning and backup.

3.1. Logical Volumes

Volume management creates a layer of abstraction over physical storage, allowing you to create logical storage volumes. This provides much greater flexibility in a number of ways than using physical storage directly. With a logical volume, you are not restricted to physical disk sizes. In addition, the hardware storage configuration is hidden from the software so it can be resized and moved without stopping applications or unmounting file systems. This can reduce operational costs. Logical volumes provide the following advantages over using physical storage directly:

  • Flexible capacity

    When using logical volumes, file systems can extend across multiple disks, since you can aggregate disks and partitions into a single logical volume.

  • Resizeable storage pools

    You can extend logical volumes or reduce logical volumes in size with simple software commands, without reformatting and repartitioning the underlying disk devices.

  • Online data relocation

    To deploy newer, faster, or more resilient storage subsystems, you can move data while your system is active. Data can be rearranged on disks while the disks are in use. For example, you can empty a hot-swappable disk before removing it.

  • Convenient device naming

    Logical storage volumes can be managed in user-defined and custom named groups.

  • Disk striping

    You can create a logical volume that stripes data across two or more disks. This can dramatically increase throughput.

  • Mirroring volumes

    Logical volumes provide a convenient way to configure a mirror for your data.

  • Volume Snapshots

    Using logical volumes, you can take device snapshots for consistent backups or to test the effect of changes without affecting the real data.

3.2. LVM with CLI Commands

3.2.1. Physical Volumes

  • Removing the Partition Table

    If you are using a whole disk device for your physical volume, the disk must have no partition table. For whole disk devices, only the partition table must be erased, which effectively destroys all data on that disk. You can remove an existing partition table by zeroing the first sector with the following command:

    # dd if=/dev/zero of=PhysicalVolume bs=512 count=1
    • Use dd to erase the disk partition table

      # lsblk
      NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
      sda      8:0    0   20G  0 disk
      └─sda1   8:1    0   20G  0 part
      sdb      8:16   0   10G  0 disk
      └─sdb1   8:17   0    5G  0 part
      sdc      8:32   0  100G  0 disk
      └─sdc1   8:33   0  100G  0 part /
      
      # dd if=/dev/zero of=/dev/sda bs=512 count=1
      1+0 records in
      1+0 records out
      512 bytes copied, 0.00303601 s, 169 kB/s
      
      # lsblk
      NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
      sda      8:0    0   20G  0 disk
      sdb      8:16   0   10G  0 disk
      └─sdb1   8:17   0    5G  0 part
      sdc      8:32   0  100G  0 disk
      └─sdc1   8:33   0  100G  0 part /
  • Initializing Physical Volumes

    Use the pvcreate command to initialize a block device to be used as a physical volume. Initialization is analogous to formatting a file system.

    The following command initializes the whole disk /dev/sda, and partition /dev/sdb1 as LVM physical volumes for later use as part of LVM logical volumes.

    # pvcreate /dev/sda /dev/sdb1
      Physical volume "/dev/sda" successfully created.
      Physical volume "/dev/sdb1" successfully created.
  • Scanning for Block Devices

    You can scan for block devices that may be used as physical volumes with the lvmdiskscan command, as shown in the following example.

    # lvmdiskscan
      /dev/sda  [      20.00 GiB] LVM physical volume
      /dev/sdb1 [       5.00 GiB] LVM physical volume
      /dev/sdc1 [    <100.00 GiB]
      0 disks
      1 partition
      1 LVM physical volume whole disk
      1 LVM physical volume
  • Displaying Physical Volumes

    There are three commands you can use to display properties of LVM physical volumes: pvs, pvdisplay, and pvscan.

    The pvs command provides physical volume information in a configurable form, displaying one line per physical volume.

    The pvdisplay command provides a verbose multi-line output for each physical volume. It displays physical properties (size, extents, volume group, and so on) in a fixed format.

    The pvscan command scans all supported LVM block devices in the system for physical volumes.

    # pvs
      PV         VG Fmt  Attr PSize  PFree
      /dev/sda      lvm2 ---  20.00g 20.00g
      /dev/sdb1     lvm2 ---   5.00g  5.00g
    
    # pvdisplay
      "/dev/sda" is a new physical volume of "20.00 GiB"
      --- NEW Physical volume ---
      PV Name               /dev/sda
      VG Name
      PV Size               20.00 GiB
      Allocatable           NO
      PE Size               0
      Total PE              0
      Free PE               0
      Allocated PE          0
      PV UUID               dkb7NA-jjx0-203S-wb8K-KUnu-dbj3-RLQ1lc
    
      "/dev/sdb1" is a new physical volume of "5.00 GiB"
      --- NEW Physical volume ---
      PV Name               /dev/sdb1
      VG Name
      PV Size               5.00 GiB
      Allocatable           NO
      PE Size               0
      Total PE              0
      Free PE               0
      Allocated PE          0
      PV UUID               TYTlaL-Wbzd-wZhW-tNeb-GWFA-HErD-NJbKNU
    
    # pvscan
      PV /dev/sda                       lvm2 [20.00 GiB]
      PV /dev/sdb1                      lvm2 [5.00 GiB]
      Total: 2 [25.00 GiB] / in use: 0 [0   ] / in no VG: 2 [25.00 GiB]
  • Resizing a Physical Volume

    If you need to change the size of an underlying block device for any reason, use the pvresize command to update LVM with the new size. You can execute this command while LVM is using the physical volume.

    # pvresize --setphysicalvolumesize 10G /dev/sda
    /dev/sda: Requested size 10.00 GiB is less than real size 20.00 GiB. Proceed?  [y/n]: y
      WARNING: /dev/sda: Pretending size is 20971520 not 41943040 sectors.
      Physical volume "/dev/sda" changed
      1 physical volume(s) resized or updated / 0 physical volume(s) not resized
  • Removing Physical Volumes

    If a device is no longer required for use by LVM, you can remove the LVM label with the pvremove command. Executing the pvremove command zeroes the LVM metadata on an empty physical volume.

    # pvremove /dev/sda
      Labels on physical volume "/dev/sda" successfully wiped.

3.2.2. Volume Groups

  • Creating Volume Groups

    To create a volume group from one or more physical volumes, use the vgcreate command. The vgcreate command creates a new volume group by name and adds at least one physical volume to it.

    # vgcreate vg0 /dev/sda /dev/sdb1
      Volume group "vg0" successfully created
    
    # pvs
      PV         VG  Fmt  Attr PSize   PFree
      /dev/sda   vg0 lvm2 a--  <20.00g <20.00g
      /dev/sdb1  vg0 lvm2 a--   <5.00g  <5.00g

    When physical volumes are used to create a volume group, their disk space is divided into 4 MB extents by default.
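
    A different extent size can be selected at creation time with the -s (--physicalextentsize) option of vgcreate; for example, the following hypothetical invocation would use 16 MiB extents:

    # vgcreate -s 16M vg0 /dev/sda /dev/sdb1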

    LVM volume groups and underlying logical volumes are included in the device special file directory tree in the /dev directory with the following layout:

    /dev/<vg>/<lv>/

    The device special files are not present if the corresponding logical volume is not currently active.

  • Adding Physical Volumes to a Volume Group

    To add additional physical volumes to an existing volume group, use the vgextend command. The vgextend command increases a volume group’s capacity by adding one or more free physical volumes.

    # vgextend vg0 /dev/sdb2
      Volume group "vg0" successfully extended
  • Displaying Volume Groups

    The vgscan command, which scans all the disks for volume groups and rebuilds the LVM cache file, also displays the volume groups.

    The vgs command provides volume group information in a configurable form, displaying one line per volume group.

    The vgdisplay command displays volume group properties (such as size, extents, number of physical volumes, and so on) in a fixed form.

    # vgs
      VG  #PV #LV #SN Attr   VSize   VFree
      vg0   3   0   0 wz--n- <25.99g <25.99g
    
    # vgscan
      Found volume group "vg0" using metadata type lvm2
    
    # vgdisplay
      --- Volume group ---
      VG Name               vg0
      System ID
      Format                lvm2
      Metadata Areas        3
      Metadata Sequence No  2
      VG Access             read/write
      VG Status             resizable
      MAX LV                0
      Cur LV                0
      Open LV               0
      Max PV                0
      Cur PV                3
      Act PV                3
      VG Size               <25.99 GiB
      PE Size               4.00 MiB
      Total PE              6653
      Alloc PE / Size       0 / 0
      Free  PE / Size       6653 / <25.99 GiB
      VG UUID               5dLR48-em6r-8UIA-PcPe-RyLY-p8gB-QNOzpU
  • Removing Physical Volumes from a Volume Group

    To remove unused physical volumes from a volume group, use the vgreduce command. The vgreduce command shrinks a volume group’s capacity by removing one or more empty physical volumes. This frees those physical volumes to be used in different volume groups or to be removed from the system.

    Before removing a physical volume from a volume group, you can make sure that the physical volume is not used by any logical volumes by using the pvdisplay command.

    # pvdisplay /dev/sdb2
      --- Physical volume ---
      PV Name               /dev/sdb2
      VG Name               vg0
      PV Size               1.00 GiB / not usable 4.00 MiB
      Allocatable           yes
      PE Size               4.00 MiB
      Total PE              255
      Free PE               255
      Allocated PE          0
      PV UUID               sBmEek-5ylr-T3FE-daaw-mNOb-J2Yu-XzNR1q

    If the physical volume is still being used you will have to migrate the data to another physical volume using the pvmove command. Then use the vgreduce command to remove the physical volume.

    # pvmove /dev/sdb2 /dev/sdb1
      No data to move for vg0.
  • Activating and Deactivating Volume Groups

    When you create a volume group it is, by default, activated. This means that the logical volumes in that group are accessible and subject to change.

    There are various circumstances for which you need to make a volume group inactive and thus unknown to the kernel. To deactivate or activate a volume group, use the -a (--activate) argument of the vgchange command.

    # vgchange -a n vg0
      0 logical volume(s) in volume group "vg0" now active
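
    Reactivating the volume group makes its logical volumes known to the kernel again; the output mirrors the deactivation above:

    # vgchange -a y vg0
      0 logical volume(s) in volume group "vg0" now active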
  • Renaming a Volume Group

    Use the vgrename command to rename an existing volume group.

    # vgrename vg0 vg1
      Volume group "vg0" successfully renamed to "vg1"
  • Removing Volume Groups

    To remove a volume group that contains no logical volumes, use the vgremove command.

    # vgremove vg1
      Volume group "vg1" successfully removed

3.2.3. Logical Volumes

  • Creating Linear Logical Volumes

    To create a logical volume, use the lvcreate command. If you do not specify a name for the logical volume, the default name lvol# is used where # is the internal number of the logical volume.

    When you create a logical volume, the logical volume is carved from a volume group using the free extents on the physical volumes that make up the volume group. Normally logical volumes use up any space available on the underlying physical volumes on a next-free basis. Modifying the logical volume frees and reallocates space in the physical volumes.

    The following command creates a logical volume 10 gigabytes in size in the volume group vg1 (renamed from vg0 above).

    # lvcreate -L 10G vg1
      Logical volume "lvol0" created.
    
    # ls -l /dev/vg1/lvol0
    lrwxrwxrwx 1 root root 7 Nov 29 15:39 /dev/vg1/lvol0 -> ../dm-0

    You can use the -l argument of the lvcreate command to specify the size of the logical volume in extents.

    # lvcreate -l 50 vg1
      Logical volume "lvol1" created.
    
    # lvs
      LV    VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0 vg1 -wi-a-----  10.00g
      lvol1 vg1 -wi-a----- 200.00m
  • Creating Thinly-Provisioned Logical Volumes

    Logical volumes can be thinly provisioned. This allows you to create logical volumes that are larger than the available extents. Using thin provisioning, you can manage a storage pool of free space, known as a thin pool, which can be allocated to an arbitrary number of devices when needed by applications. You can then create devices that can be bound to the thin pool for later allocation when an application actually writes to the logical volume. The thin pool can be expanded dynamically when needed for cost-effective allocation of storage space.

    You can use the -T (or --thin) option of the lvcreate command to create either a thin pool or a thin volume. You can also use the -T option to create both a thin pool and a thin volume in that pool at the same time with a single command.

    # lvcreate -L 100M -T vg1/mythinpool0
      Thin pool volume with chunk size 64.00 KiB can address at most 15.81 TiB of data.
      Logical volume "mythinpool0" created.
    
    # lvs
      LV          VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0       vg1 -wi-a-----  10.00g
      lvol1       vg1 -wi-a----- 200.00m
      mythinpool0 vg1 twi-a-tz-- 100.00m             0.00   10.84
    
    # lvcreate -V 1G -T vg1/mythinpool0 -n thinvolume0
      WARNING: Sum of all thin volume sizes (1.00 GiB) exceeds the size of thin pool vg1/mythinpool0 (100.00 MiB).
      WARNING: You have not turned on protection against thin pools running out of space.
      WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
      Logical volume "thinvolume0" created.
    
    # lvs
      LV          VG  Attr       LSize   Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0       vg1 -wi-a-----  10.00g
      lvol1       vg1 -wi-a----- 200.00m
      mythinpool0 vg1 twi-aotz-- 100.00m                    0.00   10.94
      thinvolume0 vg1 Vwi-a-tz--   1.00g mythinpool0        0.00
    
    
    # lvcreate -L 100m -T vg1/mythinpool1 -V 50m -n thinvolume1
      Rounding up size to full physical extent 52.00 MiB
      Thin pool volume with chunk size 64.00 KiB can address at most 15.81 TiB of data.
      Logical volume "thinvolume1" created.
    
    # lvs
      LV          VG  Attr       LSize   Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0       vg1 -wi-a-----  10.00g
      lvol1       vg1 -wi-a----- 200.00m
      mythinpool0 vg1 twi-aotz-- 100.00m                    0.00   10.94
      mythinpool1 vg1 twi-aotz-- 100.00m                    0.00   10.94
      thinvolume0 vg1 Vwi-a-tz--   1.00g mythinpool0        0.00
      thinvolume1 vg1 Vwi-a-tz--  52.00m mythinpool1        0.00
  • Creating Snapshot Volumes

    Use the -s argument of the lvcreate command to create a snapshot volume. A snapshot volume is writable.

    LVM does not allow you to create a snapshot volume that is larger than the size of the origin volume plus needed metadata for the volume. If you specify a snapshot volume that is larger than this, the system will create a snapshot volume that is only as large as will be needed for the size of the origin.

    # lvcreate -L 100m -n snap0 -s /dev/vg1/lvol0
      WARNING: Sum of all thin volume sizes (1.05 GiB) exceeds the size of thin pools (200.00 MiB).
      WARNING: You have not turned on protection against thin pools running out of space.
      WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
      Logical volume "snap0" created.
    
    # lvs
      LV          VG  Attr       LSize   Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0       vg1 owi-a-s---  10.00g
      lvol1       vg1 -wi-a----- 200.00m
      mythinpool0 vg1 twi-aotz-- 100.00m                    0.00   10.94
      mythinpool1 vg1 twi-aotz-- 100.00m                    0.00   10.94
      snap0       vg1 swi-a-s--- 100.00m             lvol0  0.00
      thinvolume0 vg1 Vwi-a-tz--   1.00g mythinpool0        0.00
      thinvolume1 vg1 Vwi-a-tz--  52.00m mythinpool1        0.00
  • Creating Thinly-Provisioned Snapshot Volumes

    Thin snapshot volumes allow many virtual devices to be stored on the same data volume. This simplifies administration and allows for the sharing of data between snapshot volumes.

    Thin snapshot volumes provide the following benefits:

    • A thin snapshot volume can reduce disk usage when there are multiple snapshots of the same origin volume.

    • If there are multiple snapshots of the same origin, then a write to the origin will cause one COW operation to preserve the data. Increasing the number of snapshots of the origin should yield no major slowdown.

    • Thin snapshot volumes can be used as a logical volume origin for another snapshot. This allows for an arbitrary depth of recursive snapshots (snapshots of snapshots of snapshots…).

    • A snapshot of a thin logical volume also creates a thin logical volume. This consumes no data space until a COW operation is required, or until the snapshot itself is written.

    • A thin snapshot volume does not need to be activated with its origin, so a user may have only the origin active while there are many inactive snapshot volumes of the origin.

    • When you delete the origin of a thinly-provisioned snapshot volume, each snapshot of that origin volume becomes an independent thinly-provisioned volume. This means that instead of merging a snapshot with its origin volume, you may choose to delete the origin volume and then create a new thinly-provisioned snapshot using that independent volume as the origin volume for the new snapshot.

    Thin snapshots can be created for thinly-provisioned origin volumes, or for origin volumes that are not thinly-provisioned.

    # lvcreate -s -n mysnapshot1  vg1/thinvolume0
      WARNING: Sum of all thin volume sizes (2.05 GiB) exceeds the size of thin pools (200.00 MiB).
      WARNING: You have not turned on protection against thin pools running out of space.
      WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
      Logical volume "mysnapshot1" created.
    
    # lvchange -p r vg1/lvol1
      Logical volume vg1/lvol1 changed.
    
    # lvchange -a n vg1/lvol1
    
    # lvcreate -s -n mysnapshot2 --thinpool mythinpool1  vg1/lvol1
      WARNING: Sum of all thin volume sizes (<2.25 GiB) exceeds the size of thin pools (200.00 MiB).
      WARNING: You have not turned on protection against thin pools running out of space.
      WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
      Logical volume "mysnapshot2" created.
    
    # lvs
      LV          VG  Attr       LSize   Pool        Origin      Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0       vg1 owi-a-s---  10.00g
      lvol1       vg1 ori------- 200.00m
      mysnapshot1 vg1 Vwi---tz-k   1.00g mythinpool0 thinvolume0
      mysnapshot2 vg1 Vwi-a-tz-- 200.00m mythinpool1 lvol1       0.00
      mythinpool0 vg1 twi-aotz-- 100.00m                         0.00   10.94
      mythinpool1 vg1 twi-aotz-- 100.00m                         0.00   11.04
      snap0       vg1 swi-a-s--- 100.00m             lvol0       0.00
      thinvolume0 vg1 Vwi-a-tz--   1.00g mythinpool0             0.00
      thinvolume1 vg1 Vwi-a-tz--  52.00m mythinpool1             0.00

3.3. LVM Configuration Examples

  • To use disks in a volume group, label them as LVM physical volumes with the pvcreate command.

    # lsblk
    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sda      8:0    0   20G  0 disk
    └─sda1   8:1    0   10G  0 part
    sdb      8:16   0   10G  0 disk
    sdc      8:32   0  100G  0 disk
    └─sdc1   8:33   0  100G  0 part /
    
    # dd if=/dev/zero of=/dev/sda count=1
    1+0 records in
    1+0 records out
    512 bytes copied, 0.00135126 s, 379 kB/s
    
    # pvcreate /dev/sda /dev/sdb
      Physical volume "/dev/sda" successfully created.
      Physical volume "/dev/sdb" successfully created.
    
    # pvs
      PV         VG Fmt  Attr PSize  PFree
      /dev/sda      lvm2 ---  20.00g 20.00g
      /dev/sdb      lvm2 ---  10.00g 10.00g
  • Create a volume group that consists of the LVM physical volumes you have created.

    # vgcreate vg0 /dev/sda /dev/sdb
      Volume group "vg0" successfully created
    
    # vgs
      VG  #PV #LV #SN Attr   VSize  VFree
      vg0   2   0   0 wz--n- 29.99g 29.99g
  • Create the logical volume from the volume group you have created.

    # lvcreate -L 5G vg0
      Logical volume "lvol0" created.
    
    # lvs
      LV    VG  Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0 vg0 -wi-a----- 5.00g
  • Create a file system on the logical volume.

    # mkfs.ext4 /dev/vg0/lvol0
    mke2fs 1.46.2 (28-Feb-2021)
    Creating filesystem with 1310720 4k blocks and 327680 inodes
    Filesystem UUID: b08cfa69-5034-4e46-b045-d5d7221bc434
    Superblock backups stored on blocks:
    	32768, 98304, 163840, 229376, 294912, 819200, 884736
    
    Allocating group tables: done
    Writing inode tables: done
    Creating journal (16384 blocks): done
    Writing superblocks and filesystem accounting information: done
  • Resize the file system online.

    # mkdir /mnt/data
    # mount /dev/mapper/vg0-lvol0 /mnt/data/
    # grep mapper /proc/mounts
    /dev/mapper/vg0-lvol0 /mnt/data ext4 rw,relatime 0 0
    # df -h /mnt/data/
    Filesystem             Size  Used Avail Use% Mounted on
    /dev/mapper/vg0-lvol0  4.9G   24K  4.6G   1% /mnt/data
    
    # lvextend -L +5G /dev/vg0/lvol0
      Size of logical volume vg0/lvol0 changed from 5.00 GiB (1280 extents) to 10.00 GiB (2560 extents).
      Logical volume vg0/lvol0 successfully resized.
    
    # resize2fs /dev/vg0/lvol0
    resize2fs 1.46.2 (28-Feb-2021)
    Filesystem at /dev/vg0/lvol0 is mounted on /mnt/data; on-line resizing required
    old_desc_blocks = 1, new_desc_blocks = 2
    The filesystem on /dev/vg0/lvol0 is now 2621440 (4k) blocks long.
    
    # df -h /mnt/data/
    Filesystem             Size  Used Avail Use% Mounted on
    /dev/mapper/vg0-lvol0  9.8G   23M  9.3G   1% /mnt/data
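
    Alternatively, the -r (--resizefs) option of lvextend resizes the file system together with the logical volume in a single step, calling fsadm internally:

    # lvextend -r -L +5G /dev/vg0/lvol0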
  • Cleanup

    # umount /mnt/data && rm -rf /mnt/data/
    
    # vgremove vg0 -y
      Logical volume "lvol0" successfully removed
      Volume group "vg0" successfully removed
    
    # pvremove /dev/sda /dev/sdb
      Labels on physical volume "/dev/sda" successfully wiped.
      Labels on physical volume "/dev/sdb" successfully wiped.

4. Docker Device Mapper Storage Driver

Device Mapper is a kernel-based framework that underpins many advanced volume management technologies on Linux. Docker’s devicemapper storage driver leverages the thin provisioning and snapshotting capabilities of this framework for image and container management.

The devicemapper driver uses block devices dedicated to Docker and operates at the block level, rather than the file level. These devices can be extended by adding physical storage to your Docker host, and they perform better than using a filesystem at the operating system (OS) level.

The devicemapper storage driver requires direct-lvm mode for production environments, because loop-lvm mode, while zero-configuration, has very poor performance. devicemapper was long the recommended storage driver for CentOS and RHEL, as their kernel versions did not support overlay2. Current versions of CentOS and RHEL now support overlay2, which is now the recommended driver.
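
For comparison, selecting overlay2 uses the same daemon.json mechanism shown in the next section:

{
  "storage-driver": "overlay2"
}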

4.1. Configure loop-lvm mode for testing

This configuration is only appropriate for testing. The loop-lvm mode makes use of a ‘loopback’ mechanism that allows files on the local disk to be read from and written to as if they were an actual physical disk or block device. However, the addition of the loopback mechanism, and interaction with the OS filesystem layer, means that IO operations can be slow and resource-intensive. Use of loopback devices can also introduce race conditions. However, setting up loop-lvm mode can help identify basic issues (such as missing user space packages, kernel drivers, etc.) ahead of attempting the more complex set up required to enable direct-lvm mode. loop-lvm mode should therefore only be used to perform rudimentary testing prior to configuring direct-lvm.

  1. Stop Docker.

    sudo systemctl stop docker
  2. Edit /etc/docker/daemon.json. If it does not yet exist, create it. Assuming that the file was empty, add the following contents.

    {
      "storage-driver": "devicemapper"
    }
  3. Start Docker.

    sudo systemctl start docker
  4. Verify that the daemon is using the devicemapper storage driver. Use the docker info command and look for Storage Driver.

    $ docker info
    <...>
    Server:
    <...>
     Server Version: 20.10.10
     Storage Driver: devicemapper
      Pool Name: docker-8:33-3832377-pool
      Pool Blocksize: 65.54kB
      Base Device Size: 10.74GB
      Backing Filesystem: ext4
      Udev Sync Supported: true
      Data file: /dev/loop0
      Metadata file: /dev/loop1
      Data loop file: /var/lib/docker/devicemapper/devicemapper/data
      Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
      Data Space Used: 240.1MB
      Data Space Total: 107.4GB
      Data Space Available: 39.26GB
      Metadata Space Used: 17.47MB
      Metadata Space Total: 2.147GB
      Metadata Space Available: 2.13GB
      Thin Pool Minimum Free Space: 10.74GB
      Deferred Removal Enabled: true
      Deferred Deletion Enabled: true
      Deferred Deleted Device Count: 0
      Library Version: 1.02.175 (2021-01-08)
    <...>
    WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release.
    WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
             Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.

4.2. Configure direct-lvm mode manually

  • Create a physical volume on your block device

    # pvcreate /dev/sda /dev/sdb
      Physical volume "/dev/sda" successfully created.
      Physical volume "/dev/sdb" successfully created.
    
    # pvs
      PV         VG Fmt  Attr PSize  PFree
      /dev/sda      lvm2 ---  20.00g 20.00g
      /dev/sdb      lvm2 ---  10.00g 10.00g
  • Create a docker volume group on the same device

    # vgcreate docker /dev/sda /dev/sdb
      Volume group "docker" successfully created
    
    # vgs
      VG     #PV #LV #SN Attr   VSize  VFree
      docker   2   0   0 wz--n- 29.99g 29.99g
  • Create two logical volumes named thinpool and thinpoolmeta

    # lvcreate --wipesignatures y -n thinpool docker -L 500m
      Logical volume "thinpool" created.
    
    # lvcreate --wipesignatures y -n thinpoolmeta docker -L 50m
      Rounding up size to full physical extent 52.00 MiB
      Logical volume "thinpoolmeta" created.
    
    # dmsetup ls
    docker-thinpoolmeta	(254:1)
    docker-thinpool	(254:0)
    
    # lvs
      LV           VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      thinpool     docker -wi-a----- 500.00m
      thinpoolmeta docker -wi-a-----  52.00m
  • Convert the volumes to a thin pool and a storage location for metadata for the thin pool.

    # lvconvert -y \
        --zero n \
        -c 512K \
        --thinpool docker/thinpool \
        --poolmetadata docker/thinpoolmeta
    
      Thin pool volume with chunk size 512.00 KiB can address at most 126.50 TiB of data.
      WARNING: Converting docker/thinpool and docker/thinpoolmeta to thin pool's data and metadata volumes with metadata wiping.
      THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
      Converted docker/thinpool and docker/thinpoolmeta to thin pool.
    
    # dmsetup ls
    docker-thinpool_tdata	(254:1)
    docker-thinpool_tmeta	(254:0)
    docker-thinpool	(254:2)
    
    # lvs
      LV       VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      thinpool docker twi-a-t--- 500.00m             0.00   10.07
  • Configure autoextension of thin pools via an LVM profile.

    # cat <<EOF > /etc/lvm/profile/docker-thinpool.profile
    activation {
      thin_pool_autoextend_threshold=80
      thin_pool_autoextend_percent=20
    }
    EOF
  • Apply the LVM profile.

    # lvchange --metadataprofile docker-thinpool docker/thinpool
      Logical volume docker/thinpool changed.
    
    # lvs -v
      LV       VG     #Seg Attr       LSize   Maj Min KMaj KMin Pool Origin Data%  Meta%  Move Cpy%Sync Log Convert LV UUID                                LProfile
      thinpool docker    1 twi-a-t--- 500.00m  -1  -1  254    2             0.00   10.07                            34X3Lb-QjmS-tkgG-LZWm-OUs3-FECR-X2hmMp docker-thinpool
  • Ensure monitoring of the logical volume is enabled.

    # lvs -o+seg_monitor
      LV       VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Monitor
      thinpool docker twi-a-t--- 500.00m             0.00   10.07                            monitored
  • If you have ever run Docker on this host before, or if /var/lib/docker/ exists, move it out of the way so that Docker can use the new LVM pool to store the contents of images and containers.

  • Edit /etc/docker/daemon.json and configure the options needed for the devicemapper storage driver.

    {
        "storage-driver": "devicemapper",
        "storage-opts": [
        "dm.thinpooldev=/dev/mapper/docker-thinpool",
        "dm.use_deferred_removal=true",
        "dm.use_deferred_deletion=true"
        ]
    }
  • Verify that Docker is using the new configuration using docker info.

    # systemctl start docker
    
    # docker info
    <...>
    Server:
    <...>
     Server Version: 20.10.10
     Storage Driver: devicemapper
      Pool Name: docker-thinpool
      Pool Blocksize: 524.3kB
      Base Device Size: 10.74GB
      Backing Filesystem: ext4
      Udev Sync Supported: true
      Data Space Used: 246.4MB
      Data Space Total: 524.3MB
      Data Space Available: 277.9MB
      Metadata Space Used: 5.505MB
      Metadata Space Total: 54.53MB
      Metadata Space Available: 49.02MB
      Thin Pool Minimum Free Space: 52.43MB
      Deferred Removal Enabled: true
      Deferred Deletion Enabled: true
      Deferred Deleted Device Count: 0
      Library Version: 1.02.175 (2021-01-08)
    <...>
    
    WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release.
    # docker pull nginx
    Using default tag: latest
    latest: Pulling from library/nginx
    Digest: sha256:097c3a0913d7e3a5b01b6c685a60c03632fc7a2b50bc8e35bcaa3691d788226e
    Status: Image is up to date for nginx:latest
    docker.io/library/nginx:latest
    
    # lvs
      LV       VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      thinpool docker twi-a-t--- 600.00m             69.42  10.31
    
    # docker run --rm -d nginx
    9c4cb272b38b3bc1cf469cfa885abe2547df49c93f983b3b48596bed1cdb1b8e
    # docker run --rm -d nginx
    99d9be78d9707f36bda6329e42448e3fb0f18eb79b41eb072db4144f490bebbd
    # docker run --rm -d nginx
    53e8d193c92132431ffb0c34c2b3575f1a0df81dff718d34153970f3bdb61a9a
    # docker run --rm -d nginx
    dd2834a4f72305a90eb332c1970667e601dabc6a47ca27be4882b7e01a4c7107
    # docker run --rm -d nginx
    353a9cb5896670f7edbd0caed4a716552a1d1bf8948c31b1c2791ae26715cda9
    # docker run --rm -d nginx
    9e8c545bcbfa2a01c0708140bb9e3ab29b829e14a096084839e9c428e1cf6c72
    
    # docker ps -s
    CONTAINER ID   IMAGE     COMMAND                  CREATED         STATUS         PORTS     NAMES                 SIZE
    9e8c545bcbfa   nginx     "/docker-entrypoint.…"   2 minutes ago   Up 2 minutes   80/tcp    romantic_chaum        1.09kB (virtual 141MB)
    353a9cb58966   nginx     "/docker-entrypoint.…"   2 minutes ago   Up 2 minutes   80/tcp    peaceful_carson       1.09kB (virtual 141MB)
    dd2834a4f723   nginx     "/docker-entrypoint.…"   2 minutes ago   Up 2 minutes   80/tcp    suspicious_thompson   1.09kB (virtual 141MB)
    53e8d193c921   nginx     "/docker-entrypoint.…"   2 minutes ago   Up 2 minutes   80/tcp    lucid_antonelli       1.09kB (virtual 141MB)
    99d9be78d970   nginx     "/docker-entrypoint.…"   2 minutes ago   Up 2 minutes   80/tcp    gifted_chaum          1.09kB (virtual 141MB)
    9c4cb272b38b   nginx     "/docker-entrypoint.…"   2 minutes ago   Up 2 minutes   80/tcp    youthful_allen        1.09kB (virtual 141MB)
    
    # df -h
    Filesystem      Size  Used Avail Use% Mounted on
    udev            1.9G     0  1.9G   0% /dev
    tmpfs           391M  1.4M  389M   1% /run
    /dev/sdb1        98G   62G   32G  67% /
    tmpfs           2.0G     0  2.0G   0% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    tmpfs           391M  4.0K  391M   1% /run/user/1000
    /dev/dm-4       9.8G  148M  9.1G   2% /var/lib/docker/devicemapper/mnt/1e3bc2b9ece4a7496fb62ac28b70f81c2c9c2c12c1a11f8be45bb0d1aba37a46
    /dev/dm-3       9.8G  148M  9.1G   2% /var/lib/docker/devicemapper/mnt/3de2e58935eae6cec5b8412db8f75cf1113c5df202a4c3c52354892af054a5b4
    /dev/dm-5       9.8G  148M  9.1G   2% /var/lib/docker/devicemapper/mnt/28cca0406d4e6e0d166d8da345bbb113bff1c40f5a493d957877eecd1d6b214b
    /dev/dm-7       9.8G  148M  9.1G   2% /var/lib/docker/devicemapper/mnt/67bba018334381eb84a5c8bcdd123af5db429d3add3d27cea9ee39359d9d127f
    /dev/dm-6       9.8G  148M  9.1G   2% /var/lib/docker/devicemapper/mnt/f37bd1fabee07ec2e4256c3e725e7d9001a5cb042c6993a56cc3db1840ad3d5e
    /dev/dm-8       9.8G  148M  9.1G   2% /var/lib/docker/devicemapper/mnt/6e9517b86e554438cd35d9d78474677da7a65de9ab4fd1dd25583dfd5ba8e6f1
    # lvs
      LV       VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      thinpool docker twi-aot--- 600.00m             73.42  10.63
    # docker pull debian:bullseye
    bullseye: Pulling from library/debian
    647acf3d48c2: Pull complete
    Digest: sha256:e8c184b56a94db0947a9d51ec68f42ef5584442f20547fa3bd8cbd00203b2e7a
    Status: Downloaded newer image for debian:bullseye
    docker.io/library/debian:bullseye
    
    # lvs
      LV       VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      thinpool docker twi-aot--- 864.00m             71.30  10.71