883 lines
		
	
	
		
			40 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			883 lines
		
	
	
		
			40 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| .. SPDX-License-Identifier: GPL-2.0
 | |
| 
 | |
| .. _fsverity:
 | |
| 
 | |
| =======================================================
 | |
| fs-verity: read-only file-based authenticity protection
 | |
| =======================================================
 | |
| 
 | |
| Introduction
 | |
| ============
 | |
| 
 | |
| fs-verity (``fs/verity/``) is a support layer that filesystems can
 | |
| hook into to support transparent integrity and authenticity protection
 | |
| of read-only files.  Currently, it is supported by the ext4, f2fs, and
 | |
| btrfs filesystems.  Like fscrypt, not too much filesystem-specific
 | |
| code is needed to support fs-verity.
 | |
| 
 | |
| fs-verity is similar to `dm-verity
 | |
| <https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_
 | |
| but works on files rather than block devices.  On regular files on
 | |
| filesystems supporting fs-verity, userspace can execute an ioctl that
 | |
| causes the filesystem to build a Merkle tree for the file and persist
 | |
| it to a filesystem-specific location associated with the file.
 | |
| 
 | |
| After this, the file is made readonly, and all reads from the file are
 | |
| automatically verified against the file's Merkle tree.  Reads of any
 | |
| corrupted data, including mmap reads, will fail.
 | |
| 
 | |
| Userspace can use another ioctl to retrieve the root hash (actually
 | |
| the "fs-verity file digest", which is a hash that includes the Merkle
 | |
| tree root hash) that fs-verity is enforcing for the file.  This ioctl
 | |
| executes in constant time, regardless of the file size.
 | |
| 
 | |
| fs-verity is essentially a way to hash a file in constant time,
 | |
| subject to the caveat that reads which would violate the hash will
 | |
| fail at runtime.
 | |
| 
 | |
| Use cases
 | |
| =========
 | |
| 
 | |
| By itself, fs-verity only provides integrity protection, i.e.
 | |
| detection of accidental (non-malicious) corruption.
 | |
| 
 | |
| However, because fs-verity makes retrieving the file hash extremely
 | |
| efficient, it's primarily meant to be used as a tool to support
 | |
| authentication (detection of malicious modifications) or auditing
 | |
| (logging file hashes before use).
 | |
| 
 | |
| A standard file hash could be used instead of fs-verity.  However,
 | |
| this is inefficient if the file is large and only a small portion may
 | |
| be accessed.  This is often the case for Android application package
 | |
| (APK) files, for example.  These typically contain many translations,
 | |
| classes, and other resources that are infrequently or even never
 | |
| accessed on a particular device.  It would be slow and wasteful to
 | |
| read and hash the entire file before starting the application.
 | |
| 
 | |
| Unlike an ahead-of-time hash, fs-verity also re-verifies data each
 | |
| time it's paged in.  This ensures that malicious disk firmware can't
 | |
| undetectably change the contents of the file at runtime.
 | |
| 
 | |
| fs-verity does not replace or obsolete dm-verity.  dm-verity should
 | |
| still be used on read-only filesystems.  fs-verity is for files that
 | |
| must live on a read-write filesystem because they are independently
 | |
| updated and potentially user-installed, so dm-verity cannot be used.
 | |
| 
 | |
| fs-verity does not mandate a particular scheme for authenticating its
 | |
| file hashes.  (Similarly, dm-verity does not mandate a particular
 | |
| scheme for authenticating its block device root hashes.)  Options for
 | |
| authenticating fs-verity file hashes include:
 | |
| 
 | |
| - Trusted userspace code.  Often, the userspace code that accesses
 | |
|   files can be trusted to authenticate them.  Consider e.g. an
 | |
|   application that wants to authenticate data files before using them,
 | |
|   or an application loader that is part of the operating system (which
 | |
|   is already authenticated in a different way, such as by being loaded
 | |
|   from a read-only partition that uses dm-verity) and that wants to
 | |
|   authenticate applications before loading them.  In these cases, this
 | |
|   trusted userspace code can authenticate a file's contents by
 | |
|   retrieving its fs-verity digest using `FS_IOC_MEASURE_VERITY`_, then
 | |
|   verifying a signature of it using any userspace cryptographic
 | |
|   library that supports digital signatures.
 | |
| 
 | |
| - Integrity Measurement Architecture (IMA).  IMA supports fs-verity
 | |
|   file digests as an alternative to its traditional full file digests.
 | |
|   "IMA appraisal" enforces that files contain a valid, matching
 | |
|   signature in their "security.ima" extended attribute, as controlled
 | |
|   by the IMA policy.  For more information, see the IMA documentation.
 | |
| 
 | |
| - Trusted userspace code in combination with `Built-in signature
 | |
|   verification`_.  This approach should be used only with great care.
 | |
| 
 | |
| User API
 | |
| ========
 | |
| 
 | |
| FS_IOC_ENABLE_VERITY
 | |
| --------------------
 | |
| 
 | |
| The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file.  It takes
 | |
| in a pointer to a struct fsverity_enable_arg, defined as
 | |
| follows::
 | |
| 
 | |
|     struct fsverity_enable_arg {
 | |
|             __u32 version;
 | |
|             __u32 hash_algorithm;
 | |
|             __u32 block_size;
 | |
|             __u32 salt_size;
 | |
|             __u64 salt_ptr;
 | |
|             __u32 sig_size;
 | |
|             __u32 __reserved1;
 | |
|             __u64 sig_ptr;
 | |
|             __u64 __reserved2[11];
 | |
|     };
 | |
| 
 | |
| This structure contains the parameters of the Merkle tree to build for
 | |
| the file.  It must be initialized as follows:
 | |
| 
 | |
| - ``version`` must be 1.
 | |
| - ``hash_algorithm`` must be the identifier for the hash algorithm to
 | |
|   use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256.  See
 | |
|   ``include/uapi/linux/fsverity.h`` for the list of possible values.
 | |
| - ``block_size`` is the Merkle tree block size, in bytes.  In Linux
 | |
|   v6.3 and later, this can be any power of 2 between (inclusively)
 | |
|   1024 and the minimum of the system page size and the filesystem
 | |
|   block size.  In earlier versions, the page size was the only allowed
 | |
|   value.
 | |
| - ``salt_size`` is the size of the salt in bytes, or 0 if no salt is
 | |
|   provided.  The salt is a value that is prepended to every hashed
 | |
|   block; it can be used to personalize the hashing for a particular
 | |
|   file or device.  Currently the maximum salt size is 32 bytes.
 | |
| - ``salt_ptr`` is the pointer to the salt, or NULL if no salt is
 | |
|   provided.
 | |
| - ``sig_size`` is the size of the builtin signature in bytes, or 0 if no
 | |
|   builtin signature is provided.  Currently the builtin signature is
 | |
|   (somewhat arbitrarily) limited to 16128 bytes.
 | |
| - ``sig_ptr``  is the pointer to the builtin signature, or NULL if no
 | |
|   builtin signature is provided.  A builtin signature is only needed
 | |
|   if the `Built-in signature verification`_ feature is being used.  It
 | |
|   is not needed for IMA appraisal, and it is not needed if the file
 | |
|   signature is being handled entirely in userspace.
 | |
| - All reserved fields must be zeroed.
 | |
| 
 | |
| FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for
 | |
| the file and persist it to a filesystem-specific location associated
 | |
| with the file, then mark the file as a verity file.  This ioctl may
 | |
| take a long time to execute on large files, and it is interruptible by
 | |
| fatal signals.
 | |
| 
 | |
| FS_IOC_ENABLE_VERITY checks for write access to the inode.  However,
 | |
| it must be executed on an O_RDONLY file descriptor and no processes
 | |
| can have the file open for writing.  Attempts to open the file for
 | |
| writing while this ioctl is executing will fail with ETXTBSY.  (This
 | |
| is necessary to guarantee that no writable file descriptors will exist
 | |
| after verity is enabled, and to guarantee that the file's contents are
 | |
| stable while the Merkle tree is being built over it.)
 | |
| 
 | |
| On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a
 | |
| verity file.  On failure (including the case of interruption by a
 | |
| fatal signal), no changes are made to the file.
 | |
| 
 | |
| FS_IOC_ENABLE_VERITY can fail with the following errors:
 | |
| 
 | |
| - ``EACCES``: the process does not have write access to the file
 | |
| - ``EBADMSG``: the builtin signature is malformed
 | |
| - ``EBUSY``: this ioctl is already running on the file
 | |
| - ``EEXIST``: the file already has verity enabled
 | |
| - ``EFAULT``: the caller provided inaccessible memory
 | |
| - ``EFBIG``: the file is too large to enable verity on
 | |
| - ``EINTR``: the operation was interrupted by a fatal signal
 | |
| - ``EINVAL``: unsupported version, hash algorithm, or block size; or
 | |
|   reserved bits are set; or the file descriptor refers to neither a
 | |
|   regular file nor a directory.
 | |
| - ``EISDIR``: the file descriptor refers to a directory
 | |
| - ``EKEYREJECTED``: the builtin signature doesn't match the file
 | |
| - ``EMSGSIZE``: the salt or builtin signature is too long
 | |
| - ``ENOKEY``: the ".fs-verity" keyring doesn't contain the certificate
 | |
|   needed to verify the builtin signature
 | |
| - ``ENOPKG``: fs-verity recognizes the hash algorithm, but it's not
 | |
|   available in the kernel's crypto API as currently configured (e.g.
 | |
|   for SHA-512, missing CONFIG_CRYPTO_SHA512).
 | |
| - ``ENOTTY``: this type of filesystem does not implement fs-verity
 | |
| - ``EOPNOTSUPP``: the kernel was not configured with fs-verity
 | |
|   support; or the filesystem superblock has not had the 'verity'
 | |
|   feature enabled on it; or the filesystem does not support fs-verity
 | |
|   on this file.  (See `Filesystem support`_.)
 | |
| - ``EPERM``: the file is append-only; or, a builtin signature is
 | |
|   required and one was not provided.
 | |
| - ``EROFS``: the filesystem is read-only
 | |
| - ``ETXTBSY``: someone has the file open for writing.  This can be the
 | |
|   caller's file descriptor, another open file descriptor, or the file
 | |
|   reference held by a writable memory map.
 | |
| 
 | |
| FS_IOC_MEASURE_VERITY
 | |
| ---------------------
 | |
| 
 | |
| The FS_IOC_MEASURE_VERITY ioctl retrieves the digest of a verity file.
 | |
| The fs-verity file digest is a cryptographic digest that identifies
 | |
| the file contents that are being enforced on reads; it is computed via
 | |
| a Merkle tree and is different from a traditional full-file digest.
 | |
| 
 | |
| This ioctl takes in a pointer to a variable-length structure::
 | |
| 
 | |
|     struct fsverity_digest {
 | |
|             __u16 digest_algorithm;
 | |
|             __u16 digest_size; /* input/output */
 | |
|             __u8 digest[];
 | |
|     };
 | |
| 
 | |
| ``digest_size`` is an input/output field.  On input, it must be
 | |
| initialized to the number of bytes allocated for the variable-length
 | |
| ``digest`` field.
 | |
| 
 | |
| On success, 0 is returned and the kernel fills in the structure as
 | |
| follows:
 | |
| 
 | |
| - ``digest_algorithm`` will be the hash algorithm used for the file
 | |
|   digest.  It will match ``fsverity_enable_arg::hash_algorithm``.
 | |
| - ``digest_size`` will be the size of the digest in bytes, e.g. 32
 | |
|   for SHA-256.  (This can be redundant with ``digest_algorithm``.)
 | |
| - ``digest`` will be the actual bytes of the digest.
 | |
| 
 | |
| FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time,
 | |
| regardless of the size of the file.
 | |
| 
 | |
| FS_IOC_MEASURE_VERITY can fail with the following errors:
 | |
| 
 | |
| - ``EFAULT``: the caller provided inaccessible memory
 | |
| - ``ENODATA``: the file is not a verity file
 | |
| - ``ENOTTY``: this type of filesystem does not implement fs-verity
 | |
| - ``EOPNOTSUPP``: the kernel was not configured with fs-verity
 | |
|   support, or the filesystem superblock has not had the 'verity'
 | |
|   feature enabled on it.  (See `Filesystem support`_.)
 | |
| - ``EOVERFLOW``: the digest is longer than the specified
 | |
|   ``digest_size`` bytes.  Try providing a larger buffer.
 | |
| 
 | |
| FS_IOC_READ_VERITY_METADATA
 | |
| ---------------------------
 | |
| 
 | |
| The FS_IOC_READ_VERITY_METADATA ioctl reads verity metadata from a
 | |
| verity file.  This ioctl is available since Linux v5.12.
 | |
| 
 | |
| This ioctl allows writing a server program that takes a verity file
 | |
| and serves it to a client program, such that the client can do its own
 | |
| fs-verity compatible verification of the file.  This only makes sense
 | |
| if the client doesn't trust the server and if the server needs to
 | |
| provide the storage for the client.
 | |
| 
 | |
| This is a fairly specialized use case, and most fs-verity users won't
 | |
| need this ioctl.
 | |
| 
 | |
| This ioctl takes in a pointer to the following structure::
 | |
| 
 | |
|    #define FS_VERITY_METADATA_TYPE_MERKLE_TREE     1
 | |
|    #define FS_VERITY_METADATA_TYPE_DESCRIPTOR      2
 | |
|    #define FS_VERITY_METADATA_TYPE_SIGNATURE       3
 | |
| 
 | |
|    struct fsverity_read_metadata_arg {
 | |
|            __u64 metadata_type;
 | |
|            __u64 offset;
 | |
|            __u64 length;
 | |
|            __u64 buf_ptr;
 | |
|            __u64 __reserved;
 | |
|    };
 | |
| 
 | |
| ``metadata_type`` specifies the type of metadata to read:
 | |
| 
 | |
| - ``FS_VERITY_METADATA_TYPE_MERKLE_TREE`` reads the blocks of the
 | |
|   Merkle tree.  The blocks are returned in order from the root level
 | |
|   to the leaf level.  Within each level, the blocks are returned in
 | |
|   the same order that their hashes are themselves hashed.
 | |
|   See `Merkle tree`_ for more information.
 | |
| 
 | |
| - ``FS_VERITY_METADATA_TYPE_DESCRIPTOR`` reads the fs-verity
 | |
|   descriptor.  See `fs-verity descriptor`_.
 | |
| 
 | |
| - ``FS_VERITY_METADATA_TYPE_SIGNATURE`` reads the builtin signature
 | |
|   which was passed to FS_IOC_ENABLE_VERITY, if any.  See `Built-in
 | |
|   signature verification`_.
 | |
| 
 | |
| The semantics are similar to those of ``pread()``.  ``offset``
 | |
| specifies the offset in bytes into the metadata item to read from, and
 | |
| ``length`` specifies the maximum number of bytes to read from the
 | |
| metadata item.  ``buf_ptr`` is the pointer to the buffer to read into,
 | |
| cast to a 64-bit integer.  ``__reserved`` must be 0.  On success, the
 | |
| number of bytes read is returned.  0 is returned at the end of the
 | |
| metadata item.  The returned length may be less than ``length``, for
 | |
| example if the ioctl is interrupted.
 | |
| 
 | |
| The metadata returned by FS_IOC_READ_VERITY_METADATA isn't guaranteed
 | |
| to be authenticated against the file digest that would be returned by
 | |
| `FS_IOC_MEASURE_VERITY`_, as the metadata is expected to be used to
 | |
| implement fs-verity compatible verification anyway (though absent a
 | |
| malicious disk, the metadata will indeed match).  E.g. to implement
 | |
| this ioctl, the filesystem is allowed to just read the Merkle tree
 | |
| blocks from disk without actually verifying the path to the root node.
 | |
| 
 | |
| FS_IOC_READ_VERITY_METADATA can fail with the following errors:
 | |
| 
 | |
| - ``EFAULT``: the caller provided inaccessible memory
 | |
| - ``EINTR``: the ioctl was interrupted before any data was read
 | |
| - ``EINVAL``: reserved fields were set, or ``offset + length``
 | |
|   overflowed
 | |
| - ``ENODATA``: the file is not a verity file, or
 | |
|   FS_VERITY_METADATA_TYPE_SIGNATURE was requested but the file doesn't
 | |
|   have a builtin signature
 | |
| - ``ENOTTY``: this type of filesystem does not implement fs-verity, or
 | |
|   this ioctl is not yet implemented on it
 | |
| - ``EOPNOTSUPP``: the kernel was not configured with fs-verity
 | |
|   support, or the filesystem superblock has not had the 'verity'
 | |
|   feature enabled on it.  (See `Filesystem support`_.)
 | |
| 
 | |
| FS_IOC_GETFLAGS
 | |
| ---------------
 | |
| 
 | |
| The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity)
 | |
| can also be used to check whether a file has fs-verity enabled or not.
 | |
| To do so, check for FS_VERITY_FL (0x00100000) in the returned flags.
 | |
| 
 | |
| The verity flag is not settable via FS_IOC_SETFLAGS.  You must use
 | |
| FS_IOC_ENABLE_VERITY instead, since parameters must be provided.
 | |
| 
 | |
| statx
 | |
| -----
 | |
| 
 | |
| Since Linux v5.5, the statx() system call sets STATX_ATTR_VERITY if
 | |
| the file has fs-verity enabled.  This can perform better than
 | |
| FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require
 | |
| opening the file, and opening verity files can be expensive.
 | |
| 
 | |
| .. _accessing_verity_files:
 | |
| 
 | |
| Accessing verity files
 | |
| ======================
 | |
| 
 | |
| Applications can transparently access a verity file just like a
 | |
| non-verity one, with the following exceptions:
 | |
| 
 | |
| - Verity files are readonly.  They cannot be opened for writing or
 | |
|   truncate()d, even if the file mode bits allow it.  Attempts to do
 | |
|   one of these things will fail with EPERM.  However, changes to
 | |
|   metadata such as owner, mode, timestamps, and xattrs are still
 | |
|   allowed, since these are not measured by fs-verity.  Verity files
 | |
|   can also still be renamed, deleted, and linked to.
 | |
| 
 | |
| - Direct I/O is not supported on verity files.  Attempts to use direct
 | |
|   I/O on such files will fall back to buffered I/O.
 | |
| 
 | |
| - DAX (Direct Access) is not supported on verity files, because this
 | |
|   would circumvent the data verification.
 | |
| 
 | |
| - Reads of data that doesn't match the verity Merkle tree will fail
 | |
|   with EIO (for read()) or SIGBUS (for mmap() reads).
 | |
| 
 | |
| - If the sysctl "fs.verity.require_signatures" is set to 1 and the
 | |
|   file is not signed by a key in the ".fs-verity" keyring, then
 | |
|   opening the file will fail.  See `Built-in signature verification`_.
 | |
| 
 | |
| Direct access to the Merkle tree is not supported.  Therefore, if a
 | |
| verity file is copied, or is backed up and restored, then it will lose
 | |
| its "verity"-ness.  fs-verity is primarily meant for files like
 | |
| executables that are managed by a package manager.
 | |
| 
 | |
| File digest computation
 | |
| =======================
 | |
| 
 | |
| This section describes how fs-verity hashes the file contents using a
 | |
| Merkle tree to produce the digest which cryptographically identifies
 | |
| the file contents.  This algorithm is the same for all filesystems
 | |
| that support fs-verity.
 | |
| 
 | |
| Userspace only needs to be aware of this algorithm if it needs to
 | |
| compute fs-verity file digests itself, e.g. in order to sign files.
 | |
| 
 | |
| .. _fsverity_merkle_tree:
 | |
| 
 | |
| Merkle tree
 | |
| -----------
 | |
| 
 | |
| The file contents is divided into blocks, where the block size is
 | |
| configurable but is usually 4096 bytes.  The end of the last block is
 | |
| zero-padded if needed.  Each block is then hashed, producing the first
 | |
| level of hashes.  Then, the hashes in this first level are grouped
 | |
| into 'blocksize'-byte blocks (zero-padding the ends as needed) and
 | |
| these blocks are hashed, producing the second level of hashes.  This
 | |
| proceeds up the tree until only a single block remains.  The hash of
 | |
| this block is the "Merkle tree root hash".
 | |
| 
 | |
| If the file fits in one block and is nonempty, then the "Merkle tree
 | |
| root hash" is simply the hash of the single data block.  If the file
 | |
| is empty, then the "Merkle tree root hash" is all zeroes.
 | |
| 
 | |
| The "blocks" here are not necessarily the same as "filesystem blocks".
 | |
| 
 | |
| If a salt was specified, then it's zero-padded to the closest multiple
 | |
| of the input size of the hash algorithm's compression function, e.g.
 | |
| 64 bytes for SHA-256 or 128 bytes for SHA-512.  The padded salt is
 | |
| prepended to every data or Merkle tree block that is hashed.
 | |
| 
 | |
| The purpose of the block padding is to cause every hash to be taken
 | |
| over the same amount of data, which simplifies the implementation and
 | |
| keeps open more possibilities for hardware acceleration.  The purpose
 | |
| of the salt padding is to make the salting "free" when the salted hash
 | |
| state is precomputed, then imported for each hash.
 | |
| 
 | |
| Example: in the recommended configuration of SHA-256 and 4K blocks,
 | |
| 128 hash values fit in each block.  Thus, each level of the Merkle
 | |
| tree is approximately 128 times smaller than the previous, and for
 | |
| large files the Merkle tree's size converges to approximately 1/127 of
 | |
| the original file size.  However, for small files, the padding is
 | |
| significant, making the space overhead proportionally more.
 | |
| 
 | |
| .. _fsverity_descriptor:
 | |
| 
 | |
| fs-verity descriptor
 | |
| --------------------
 | |
| 
 | |
| By itself, the Merkle tree root hash is ambiguous.  For example, it
 | |
| can't a distinguish a large file from a small second file whose data
 | |
| is exactly the top-level hash block of the first file.  Ambiguities
 | |
| also arise from the convention of padding to the next block boundary.
 | |
| 
 | |
| To solve this problem, the fs-verity file digest is actually computed
 | |
| as a hash of the following structure, which contains the Merkle tree
 | |
| root hash as well as other fields such as the file size::
 | |
| 
 | |
|     struct fsverity_descriptor {
 | |
|             __u8 version;           /* must be 1 */
 | |
|             __u8 hash_algorithm;    /* Merkle tree hash algorithm */
 | |
|             __u8 log_blocksize;     /* log2 of size of data and tree blocks */
 | |
|             __u8 salt_size;         /* size of salt in bytes; 0 if none */
 | |
|             __le32 __reserved_0x04; /* must be 0 */
 | |
|             __le64 data_size;       /* size of file the Merkle tree is built over */
 | |
|             __u8 root_hash[64];     /* Merkle tree root hash */
 | |
|             __u8 salt[32];          /* salt prepended to each hashed block */
 | |
|             __u8 __reserved[144];   /* must be 0's */
 | |
|     };
 | |
| 
 | |
| Built-in signature verification
 | |
| ===============================
 | |
| 
 | |
| CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y adds supports for in-kernel
 | |
| verification of fs-verity builtin signatures.
 | |
| 
 | |
| **IMPORTANT**!  Please take great care before using this feature.
 | |
| It is not the only way to do signatures with fs-verity, and the
 | |
| alternatives (such as userspace signature verification, and IMA
 | |
| appraisal) can be much better.  It's also easy to fall into a trap
 | |
| of thinking this feature solves more problems than it actually does.
 | |
| 
 | |
| Enabling this option adds the following:
 | |
| 
 | |
| 1. At boot time, the kernel creates a keyring named ".fs-verity".  The
 | |
|    root user can add trusted X.509 certificates to this keyring using
 | |
|    the add_key() system call.
 | |
| 
 | |
| 2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted
 | |
|    detached signature in DER format of the file's fs-verity digest.
 | |
|    On success, the ioctl persists the signature alongside the Merkle
 | |
|    tree.  Then, any time the file is opened, the kernel verifies the
 | |
|    file's actual digest against this signature, using the certificates
 | |
|    in the ".fs-verity" keyring.
 | |
| 
 | |
| 3. A new sysctl "fs.verity.require_signatures" is made available.
 | |
|    When set to 1, the kernel requires that all verity files have a
 | |
|    correctly signed digest as described in (2).
 | |
| 
 | |
| The data that the signature as described in (2) must be a signature of
 | |
| is the fs-verity file digest in the following format::
 | |
| 
 | |
|     struct fsverity_formatted_digest {
 | |
|             char magic[8];                  /* must be "FSVerity" */
 | |
|             __le16 digest_algorithm;
 | |
|             __le16 digest_size;
 | |
|             __u8 digest[];
 | |
|     };
 | |
| 
 | |
| That's it.  It should be emphasized again that fs-verity builtin
 | |
| signatures are not the only way to do signatures with fs-verity.  See
 | |
| `Use cases`_ for an overview of ways in which fs-verity can be used.
 | |
| fs-verity builtin signatures have some major limitations that should
 | |
| be carefully considered before using them:
 | |
| 
 | |
| - Builtin signature verification does *not* make the kernel enforce
 | |
|   that any files actually have fs-verity enabled.  Thus, it is not a
 | |
|   complete authentication policy.  Currently, if it is used, the only
 | |
|   way to complete the authentication policy is for trusted userspace
 | |
|   code to explicitly check whether files have fs-verity enabled with a
 | |
|   signature before they are accessed.  (With
 | |
|   fs.verity.require_signatures=1, just checking whether fs-verity is
 | |
|   enabled suffices.)  But, in this case the trusted userspace code
 | |
|   could just store the signature alongside the file and verify it
 | |
|   itself using a cryptographic library, instead of using this feature.
 | |
| 
 | |
| - A file's builtin signature can only be set at the same time that
 | |
|   fs-verity is being enabled on the file.  Changing or deleting the
 | |
|   builtin signature later requires re-creating the file.
 | |
| 
 | |
| - Builtin signature verification uses the same set of public keys for
 | |
|   all fs-verity enabled files on the system.  Different keys cannot be
 | |
|   trusted for different files; each key is all or nothing.
 | |
| 
 | |
| - The sysctl fs.verity.require_signatures applies system-wide.
 | |
|   Setting it to 1 only works when all users of fs-verity on the system
 | |
|   agree that it should be set to 1.  This limitation can prevent
 | |
|   fs-verity from being used in cases where it would be helpful.
 | |
| 
 | |
| - Builtin signature verification can only use signature algorithms
 | |
|   that are supported by the kernel.  For example, the kernel does not
 | |
|   yet support Ed25519, even though this is often the signature
 | |
|   algorithm that is recommended for new cryptographic designs.
 | |
| 
 | |
| - fs-verity builtin signatures are in PKCS#7 format, and the public
 | |
|   keys are in X.509 format.  These formats are commonly used,
 | |
|   including by some other kernel features (which is why the fs-verity
 | |
|   builtin signatures use them), and are very feature rich.
 | |
|   Unfortunately, history has shown that code that parses and handles
 | |
|   these formats (which are from the 1990s and are based on ASN.1)
 | |
|   often has vulnerabilities as a result of their complexity.  This
 | |
|   complexity is not inherent to the cryptography itself.
 | |
| 
 | |
|   fs-verity users who do not need advanced features of X.509 and
 | |
|   PKCS#7 should strongly consider using simpler formats, such as plain
 | |
|   Ed25519 keys and signatures, and verifying signatures in userspace.
 | |
| 
 | |
|   fs-verity users who choose to use X.509 and PKCS#7 anyway should
 | |
|   still consider that verifying those signatures in userspace is more
 | |
|   flexible (for other reasons mentioned earlier in this document) and
 | |
|   eliminates the need to enable CONFIG_FS_VERITY_BUILTIN_SIGNATURES
 | |
|   and its associated increase in kernel attack surface.  In some cases
 | |
|   it can even be necessary, since advanced X.509 and PKCS#7 features
 | |
|   do not always work as intended with the kernel.  For example, the
 | |
|   kernel does not check X.509 certificate validity times.
 | |
| 
 | |
|   Note: IMA appraisal, which supports fs-verity, does not use PKCS#7
 | |
|   for its signatures, so it partially avoids the issues discussed
 | |
|   here.  IMA appraisal does use X.509.
 | |
| 
 | |
| Filesystem support
 | |
| ==================
 | |
| 
 | |
| fs-verity is supported by several filesystems, described below.  The
 | |
| CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity on
 | |
| any of these filesystems.
 | |
| 
 | |
| ``include/linux/fsverity.h`` declares the interface between the
 | |
| ``fs/verity/`` support layer and filesystems.  Briefly, filesystems
 | |
| must provide an ``fsverity_operations`` structure that provides
 | |
| methods to read and write the verity metadata to a filesystem-specific
 | |
| location, including the Merkle tree blocks and
 | |
| ``fsverity_descriptor``.  Filesystems must also call functions in
 | |
| ``fs/verity/`` at certain times, such as when a file is opened or when
 | |
| pages have been read into the pagecache.  (See `Verifying data`_.)
 | |
| 
 | |
| ext4
 | |
| ----
 | |
| 
 | |
| ext4 supports fs-verity since Linux v5.4 and e2fsprogs v1.45.2.
 | |
| 
 | |
| To create verity files on an ext4 filesystem, the filesystem must have
 | |
| been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
 | |
| it.  "verity" is an RO_COMPAT filesystem feature, so once set, old
 | |
| kernels will only be able to mount the filesystem readonly, and old
 | |
| versions of e2fsck will be unable to check the filesystem.  Moreover,
 | |
| currently ext4 only supports mounting a filesystem with the "verity"
 | |
| feature when its block size is equal to PAGE_SIZE (often 4096 bytes).
 | |
| 
 | |
| ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files.  It
 | |
| can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared.
 | |
| 
 | |
| ext4 also supports encryption, which can be used simultaneously with
 | |
| fs-verity.  In this case, the plaintext data is verified rather than
 | |
| the ciphertext.  This is necessary in order to make the fs-verity file
 | |
| digest meaningful, since every file is encrypted differently.
 | |
| 
 | |
| ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
 | |
| past the end of the file, starting at the first 64K boundary beyond
 | |
| i_size.  This approach works because (a) verity files are readonly,
 | |
| and (b) pages fully beyond i_size aren't visible to userspace but can
 | |
| be read/written internally by ext4 with only some relatively small
 | |
| changes to ext4.  This approach avoids having to depend on the
 | |
| EA_INODE feature and on rearchitecturing ext4's xattr support to
 | |
| support paging multi-gigabyte xattrs into memory, and to support
 | |
| encrypting xattrs.  Note that the verity metadata *must* be encrypted
 | |
| when the file is, since it contains hashes of the plaintext data.
 | |
| 
 | |
| ext4 only allows verity on extent-based files.
 | |
| 
 | |
| f2fs
 | |
| ----
 | |
| 
 | |
| f2fs supports fs-verity since Linux v5.4 and f2fs-tools v1.11.0.
 | |
| 
 | |
| To create verity files on an f2fs filesystem, the filesystem must have
 | |
| been formatted with ``-O verity``.
 | |
| 
 | |
| f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files.
 | |
| It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be
 | |
| cleared.
 | |
| 
 | |
| Like ext4, f2fs stores the verity metadata (Merkle tree and
 | |
| fsverity_descriptor) past the end of the file, starting at the first
 | |
| 64K boundary beyond i_size.  See explanation for ext4 above.
 | |
| Moreover, f2fs supports at most 4096 bytes of xattr entries per inode
 | |
| which usually wouldn't be enough for even a single Merkle tree block.
 | |
| 
 | |
| f2fs doesn't support enabling verity on files that currently have
 | |
| atomic or volatile writes pending.
 | |
| 
 | |
| btrfs
 | |
| -----
 | |
| 
 | |
| btrfs supports fs-verity since Linux v5.15.  Verity-enabled inodes are
 | |
| marked with a RO_COMPAT inode flag, and the verity metadata is stored
 | |
| in separate btree items.
 | |
| 
 | |
| Implementation details
 | |
| ======================
 | |
| 
 | |
| Verifying data
 | |
| --------------
 | |
| 
 | |
| fs-verity ensures that all reads of a verity file's data are verified,
 | |
| regardless of which syscall is used to do the read (e.g. mmap(),
 | |
| read(), pread()) and regardless of whether it's the first read or a
 | |
| later read (unless the later read can return cached data that was
 | |
| already verified).  Below, we describe how filesystems implement this.
 | |
| 
 | |
| Pagecache
 | |
| ~~~~~~~~~
 | |
| 
 | |
| For filesystems using Linux's pagecache, the ``->read_folio()`` and
 | |
| ``->readahead()`` methods must be modified to verify folios before
 | |
| they are marked Uptodate.  Merely hooking ``->read_iter()`` would be
 | |
| insufficient, since ``->read_iter()`` is not used for memory maps.
 | |
| 
 | |
| Therefore, fs/verity/ provides the function fsverity_verify_blocks()
 | |
| which verifies data that has been read into the pagecache of a verity
 | |
| inode.  The containing folio must still be locked and not Uptodate, so
 | |
| it's not yet readable by userspace.  As needed to do the verification,
 | |
| fsverity_verify_blocks() will call back into the filesystem to read
 | |
| hash blocks via fsverity_operations::read_merkle_tree_page().
 | |
| 
 | |
| fsverity_verify_blocks() returns false if verification failed; in this
 | |
| case, the filesystem must not set the folio Uptodate.  Following this,
 | |
| as per the usual Linux pagecache behavior, attempts by userspace to
 | |
| read() from the part of the file containing the folio will fail with
 | |
| EIO, and accesses to the folio within a memory map will raise SIGBUS.
 | |
| 
 | |
| In principle, verifying a data block requires verifying the entire
 | |
| path in the Merkle tree from the data block to the root hash.
 | |
| However, for efficiency the filesystem may cache the hash blocks.
 | |
| Therefore, fsverity_verify_blocks() only ascends the tree reading hash
 | |
| blocks until an already-verified hash block is seen.  It then verifies
 | |
| the path to that block.
 | |
| 
 | |
| This optimization, which is also used by dm-verity, results in
 | |
| excellent sequential read performance.  This is because usually (e.g.
 | |
| 127 in 128 times for 4K blocks and SHA-256) the hash block from the
 | |
| bottom level of the tree will already be cached and checked from
 | |
| reading a previous data block.  However, random reads perform worse.
 | |
| 
 | |
| Block device based filesystems
 | |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 | |
| 
 | |
| Block device based filesystems (e.g. ext4 and f2fs) in Linux also use
 | |
| the pagecache, so the above subsection applies too.  However, they
 | |
| also usually read many data blocks from a file at once, grouped into a
 | |
| structure called a "bio".  To make it easier for these types of
 | |
| filesystems to support fs-verity, fs/verity/ also provides a function
 | |
| fsverity_verify_bio() which verifies all data blocks in a bio.
 | |
| 
 | |
| ext4 and f2fs also support encryption.  If a verity file is also
 | |
| encrypted, the data must be decrypted before being verified.  To
 | |
| support this, these filesystems allocate a "post-read context" for
 | |
| each bio and store it in ``->bi_private``::
 | |
| 
 | |
|     struct bio_post_read_ctx {
 | |
|            struct bio *bio;
 | |
|            struct work_struct work;
 | |
|            unsigned int cur_step;
 | |
|            unsigned int enabled_steps;
 | |
|     };
 | |
| 
 | |
| ``enabled_steps`` is a bitmask that specifies whether decryption,
 | |
| verity, or both is enabled.  After the bio completes, for each needed
 | |
| postprocessing step the filesystem enqueues the bio_post_read_ctx on a
 | |
| workqueue, and then the workqueue work does the decryption or
 | |
| verification.  Finally, folios where no decryption or verity error
 | |
| occurred are marked Uptodate, and the folios are unlocked.
 | |
| 
 | |
| On many filesystems, files can contain holes.  Normally,
 | |
| ``->readahead()`` simply zeroes hole blocks and considers the
 | |
| corresponding data to be up-to-date; no bios are issued.  To prevent
 | |
| this case from bypassing fs-verity, filesystems use
 | |
| fsverity_verify_blocks() to verify hole blocks.
 | |
| 
 | |
| Filesystems also disable direct I/O on verity files, since otherwise
 | |
| direct I/O would bypass fs-verity.
 | |
| 
 | |
| Userspace utility
 | |
| =================
 | |
| 
 | |
| This document focuses on the kernel, but a userspace utility for
 | |
| fs-verity can be found at:
 | |
| 
 | |
| 	https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git
 | |
| 
 | |
| See the README.md file in the fsverity-utils source tree for details,
 | |
| including examples of setting up fs-verity protected files.
 | |
| 
 | |
| Tests
 | |
| =====
 | |
| 
 | |
| To test fs-verity, use xfstests.  For example, using `kvm-xfstests
 | |
| <https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_::
 | |
| 
 | |
|     kvm-xfstests -c ext4,f2fs,btrfs -g verity
 | |
| 
 | |
| FAQ
 | |
| ===
 | |
| 
 | |
| This section answers frequently asked questions about fs-verity that
 | |
| weren't already directly answered in other parts of this document.
 | |
| 
 | |
| :Q: Why isn't fs-verity part of IMA?
 | |
| :A: fs-verity and IMA (Integrity Measurement Architecture) have
 | |
|     different focuses.  fs-verity is a filesystem-level mechanism for
 | |
|     hashing individual files using a Merkle tree.  In contrast, IMA
 | |
|     specifies a system-wide policy that specifies which files are
 | |
|     hashed and what to do with those hashes, such as log them,
 | |
|     authenticate them, or add them to a measurement list.
 | |
| 
 | |
|     IMA supports the fs-verity hashing mechanism as an alternative
 | |
|     to full file hashes, for those who want the performance and
 | |
|     security benefits of the Merkle tree based hash.  However, it
 | |
|     doesn't make sense to force all uses of fs-verity to be through
 | |
|     IMA.  fs-verity already meets many users' needs even as a
 | |
|     standalone filesystem feature, and it's testable like other
 | |
|     filesystem features e.g. with xfstests.
 | |
| 
 | |
| :Q: Isn't fs-verity useless because the attacker can just modify the
 | |
|     hashes in the Merkle tree, which is stored on-disk?
 | |
| :A: To verify the authenticity of an fs-verity file you must verify
 | |
|     the authenticity of the "fs-verity file digest", which
 | |
|     incorporates the root hash of the Merkle tree.  See `Use cases`_.
 | |
| 
 | |
| :Q: Isn't fs-verity useless because the attacker can just replace a
 | |
|     verity file with a non-verity one?
 | |
| :A: See `Use cases`_.  In the initial use case, it's really trusted
 | |
|     userspace code that authenticates the files; fs-verity is just a
 | |
|     tool to do this job efficiently and securely.  The trusted
 | |
|     userspace code will consider non-verity files to be inauthentic.
 | |
| 
 | |
| :Q: Why does the Merkle tree need to be stored on-disk?  Couldn't you
 | |
|     store just the root hash?
 | |
| :A: If the Merkle tree wasn't stored on-disk, then you'd have to
 | |
|     compute the entire tree when the file is first accessed, even if
 | |
|     just one byte is being read.  This is a fundamental consequence of
 | |
|     how Merkle tree hashing works.  To verify a leaf node, you need to
 | |
|     verify the whole path to the root hash, including the root node
 | |
|     (the thing which the root hash is a hash of).  But if the root
 | |
|     node isn't stored on-disk, you have to compute it by hashing its
 | |
|     children, and so on until you've actually hashed the entire file.
 | |
| 
 | |
|     That defeats most of the point of doing a Merkle tree-based hash,
 | |
|     since if you have to hash the whole file ahead of time anyway,
 | |
|     then you could simply do sha256(file) instead.  That would be much
 | |
|     simpler, and a bit faster too.
 | |
| 
 | |
|     It's true that an in-memory Merkle tree could still provide the
 | |
|     advantage of verification on every read rather than just on the
 | |
|     first read.  However, it would be inefficient because every time a
 | |
|     hash page gets evicted (you can't pin the entire Merkle tree into
 | |
|     memory, since it may be very large), in order to restore it you
 | |
|     again need to hash everything below it in the tree.  This again
 | |
|     defeats most of the point of doing a Merkle tree-based hash, since
 | |
|     a single block read could trigger re-hashing gigabytes of data.
 | |
| 
 | |
| :Q: But couldn't you store just the leaf nodes and compute the rest?
 | |
| :A: See previous answer; this really just moves up one level, since
 | |
|     one could alternatively interpret the data blocks as being the
 | |
|     leaf nodes of the Merkle tree.  It's true that the tree can be
 | |
|     computed much faster if the leaf level is stored rather than just
 | |
|     the data, but that's only because each level is less than 1% the
 | |
|     size of the level below (assuming the recommended settings of
 | |
|     SHA-256 and 4K blocks).  For the exact same reason, by storing
 | |
|     "just the leaf nodes" you'd already be storing over 99% of the
 | |
|     tree, so you might as well simply store the whole tree.
 | |
| 
 | |
| :Q: Can the Merkle tree be built ahead of time, e.g. distributed as
 | |
|     part of a package that is installed to many computers?
 | |
| :A: This isn't currently supported.  It was part of the original
 | |
|     design, but was removed to simplify the kernel UAPI and because it
 | |
|     wasn't a critical use case.  Files are usually installed once and
 | |
|     used many times, and cryptographic hashing is somewhat fast on
 | |
|     most modern processors.
 | |
| 
 | |
| :Q: Why doesn't fs-verity support writes?
 | |
| :A: Write support would be very difficult and would require a
 | |
|     completely different design, so it's well outside the scope of
 | |
|     fs-verity.  Write support would require:
 | |
| 
 | |
|     - A way to maintain consistency between the data and hashes,
 | |
|       including all levels of hashes, since corruption after a crash
 | |
|       (especially of potentially the entire file!) is unacceptable.
 | |
|       The main options for solving this are data journalling,
 | |
|       copy-on-write, and log-structured volume.  But it's very hard to
 | |
|       retrofit existing filesystems with new consistency mechanisms.
 | |
|       Data journalling is available on ext4, but is very slow.
 | |
| 
 | |
|     - Rebuilding the Merkle tree after every write, which would be
 | |
|       extremely inefficient.  Alternatively, a different authenticated
 | |
|       dictionary structure such as an "authenticated skiplist" could
 | |
|       be used.  However, this would be far more complex.
 | |
| 
 | |
|     Compare it to dm-verity vs. dm-integrity.  dm-verity is very
 | |
|     simple: the kernel just verifies read-only data against a
 | |
|     read-only Merkle tree.  In contrast, dm-integrity supports writes
 | |
|     but is slow, is much more complex, and doesn't actually support
 | |
|     full-device authentication since it authenticates each sector
 | |
|     independently, i.e. there is no "root hash".  It doesn't really
 | |
|     make sense for the same device-mapper target to support these two
 | |
|     very different cases; the same applies to fs-verity.
 | |
| 
 | |
| :Q: Since verity files are immutable, why isn't the immutable bit set?
 | |
| :A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a
 | |
|     specific set of semantics which not only make the file contents
 | |
|     read-only, but also prevent the file from being deleted, renamed,
 | |
|     linked to, or having its owner or mode changed.  These extra
 | |
|     properties are unwanted for fs-verity, so reusing the immutable
 | |
|     bit isn't appropriate.
 | |
| 
 | |
| :Q: Why does the API use ioctls instead of setxattr() and getxattr()?
 | |
| :A: Abusing the xattr interface for basically arbitrary syscalls is
 | |
|     heavily frowned upon by most of the Linux filesystem developers.
 | |
|     An xattr should really just be an xattr on-disk, not an API to
 | |
|     e.g. magically trigger construction of a Merkle tree.
 | |
| 
 | |
| :Q: Does fs-verity support remote filesystems?
 | |
| :A: So far all filesystems that have implemented fs-verity support are
 | |
|     local filesystems, but in principle any filesystem that can store
 | |
|     per-file verity metadata can support fs-verity, regardless of
 | |
|     whether it's local or remote.  Some filesystems may have fewer
 | |
|     options of where to store the verity metadata; one possibility is
 | |
|     to store it past the end of the file and "hide" it from userspace
 | |
|     by manipulating i_size.  The data verification functions provided
 | |
|     by ``fs/verity/`` also assume that the filesystem uses the Linux
 | |
|     pagecache, but both local and remote filesystems normally do so.
 | |
| 
 | |
| :Q: Why is anything filesystem-specific at all?  Shouldn't fs-verity
 | |
|     be implemented entirely at the VFS level?
 | |
| :A: There are many reasons why this is not possible or would be very
 | |
|     difficult, including the following:
 | |
| 
 | |
|     - To prevent bypassing verification, folios must not be marked
 | |
|       Uptodate until they've been verified.  Currently, each
 | |
|       filesystem is responsible for marking folios Uptodate via
 | |
|       ``->readahead()``.  Therefore, currently it's not possible for
 | |
|       the VFS to do the verification on its own.  Changing this would
 | |
|       require significant changes to the VFS and all filesystems.
 | |
| 
 | |
|     - It would require defining a filesystem-independent way to store
 | |
|       the verity metadata.  Extended attributes don't work for this
 | |
|       because (a) the Merkle tree may be gigabytes, but many
 | |
|       filesystems assume that all xattrs fit into a single 4K
 | |
|       filesystem block, and (b) ext4 and f2fs encryption doesn't
 | |
|       encrypt xattrs, yet the Merkle tree *must* be encrypted when the
 | |
|       file contents are, because it stores hashes of the plaintext
 | |
|       file contents.
 | |
| 
 | |
|       So the verity metadata would have to be stored in an actual
 | |
|       file.  Using a separate file would be very ugly, since the
 | |
|       metadata is fundamentally part of the file to be protected, and
 | |
|       it could cause problems where users could delete the real file
 | |
|       but not the metadata file or vice versa.  On the other hand,
 | |
|       having it be in the same file would break applications unless
 | |
|       filesystems' notion of i_size were divorced from the VFS's,
 | |
|       which would be complex and require changes to all filesystems.
 | |
| 
 | |
|     - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's
 | |
|       transaction mechanism so that either the file ends up with
 | |
|       verity enabled, or no changes were made.  Allowing intermediate
 | |
|       states to occur after a crash may cause problems.
 |