28 Jan, 2021

1 commit

  • Overlayfs's volatile option allows the user to bypass all forced sync calls
    to the upperdir filesystem. This comes at the cost of safety. We can never
    ensure that the user's data is intact, but we can make a best effort to
    expose whether or not the data is likely to be in a bad state.

    The best way to handle this for the time being is that if an overlayfs's
    upperdir experiences an error after a volatile mount occurs, that error
    will be returned on fsync, fdatasync, sync, and syncfs. This differs
    from the traditional behaviour of the VFS, which fails the call once and
    only raises an error again if a subsequent fsync error has occurred and
    been raised by the filesystem.

    One awkward aspect of the patch is that we have to manually set the
    superblock's errseq_t after the sync_fs callback as opposed to just
    returning an error from syncfs. This is because the call chain looks
    something like this:

    sys_syncfs ->
      sync_filesystem ->
        __sync_filesystem ->
          /* The return value is ignored here */
          sb->s_op->sync_fs(sb)
          __sync_blockdev
      /* Where the VFS fetches the error to raise to userspace */
      errseq_check_and_advance

    Because of this we call errseq_set every time the sync_fs callback occurs.
    Due to the nature of this seen / unseen dichotomy, if the upperdir is in
    an inconsistent state at the initial mount time, overlayfs will refuse to
    mount, as overlayfs cannot get a snapshot of the upperdir's errseq that
    will increment on error until the user calls syncfs.
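
    The difference between the traditional report-once behaviour and the
    sticky behaviour described above can be sketched with a toy model. This
    is not the kernel's errseq_t (which packs an error, a "seen" flag and a
    counter into one word); the toy_* names and the plain counter are
    assumptions made purely for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for errseq_t: a counter bumped on every recorded error. */
struct toy_errseq {
    unsigned int seq; /* incremented each time an error is recorded */
    int err;          /* most recent error, as a negative errno */
};

/* Record an error, loosely analogous to errseq_set(). */
void toy_errseq_set(struct toy_errseq *e, int err)
{
    e->err = err;
    e->seq++;
}

/* Take a cursor to compare against later, loosely like errseq_sample(). */
unsigned int toy_errseq_sample(const struct toy_errseq *e)
{
    return e->seq;
}

/*
 * Report-once: return the error if something was recorded since the
 * cursor was taken, then advance the cursor so the next call is clean.
 */
int toy_check_and_advance(const struct toy_errseq *e, unsigned int *since)
{
    if (e->seq == *since)
        return 0;
    *since = e->seq;
    return e->err;
}

/*
 * Sticky: compare against the sample taken at mount time and never
 * advance it, so every later sync keeps reporting the error.
 */
int toy_check_sticky(const struct toy_errseq *e, unsigned int mount_sample)
{
    return e->seq == mount_sample ? 0 : e->err;
}
```

    With one error recorded after both cursors were taken,
    toy_check_and_advance() returns -EIO once and then 0, while
    toy_check_sticky() returns -EIO on every call.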

    Signed-off-by: Sargun Dhillon
    Suggested-by: Amir Goldstein
    Reviewed-by: Amir Goldstein
    Fixes: c86243b090bc ("ovl: provide a mount option "volatile"")
    Cc: stable@vger.kernel.org
    Reviewed-by: Vivek Goyal
    Reviewed-by: Jeff Layton
    Signed-off-by: Miklos Szeredi

    Sargun Dhillon
     

25 Dec, 2020

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Various bug fixes and cleanups for ext4; no new features this cycle"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (29 commits)
    ext4: remove unnecessary wbc parameter from ext4_bio_write_page
    ext4: avoid s_mb_prefetch to be zero in individual scenarios
    ext4: defer saving error info from atomic context
    ext4: simplify ext4 error translation
    ext4: move functions in super.c
    ext4: make ext4_abort() use __ext4_error()
    ext4: standardize error message in ext4_protect_reserved_inode()
    ext4: remove redundant sb checksum recomputation
    ext4: don't remount read-only with errors=continue on reboot
    ext4: fix deadlock with fs freezing and EA inodes
    jbd2: add a helper to find out number of fast commit blocks
    ext4: make fast_commit.h byte identical with e2fsprogs/fast_commit.h
    ext4: fix fall-through warnings for Clang
    ext4: add docs about fast commit idempotence
    ext4: remove the unused EXT4_CURRENT_REV macro
    ext4: fix an IS_ERR() vs NULL check
    ext4: check for invalid block size early when mounting a file system
    ext4: fix a memory leak of ext4_free_data
    ext4: delete nonsensical (commented-out) code inside ext4_xattr_block_set()
    ext4: update ext4_data_block_valid related comments
    ...

    Linus Torvalds
     

21 Dec, 2020

1 commit

  • Pull gfs2 updates from Andreas Gruenbacher:

    - Don't wait for unfreeze of the wrong filesystems

    - Remove an obsolete delete_work_func hack and an incorrect
    sb_start_write

    - Minor documentation updates and cosmetic care

    * tag 'gfs2-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    gfs2: in signal_our_withdraw wait for unfreeze of _this_ fs only
    gfs2: Remove sb_start_write from gfs2_statfs_sync
    gfs2: remove trailing semicolons from macro definitions
    Revert "GFS2: Prevent delete work from occurring on glocks used for create"
    gfs2: Make inode operations static
    MAINTAINERS: Add gfs2 bug tracker link
    Documentation: Update filesystems/gfs2.rst

    Linus Torvalds
     

18 Dec, 2020

4 commits

  • Pull overlayfs updates from Miklos Szeredi:

    - Allow unprivileged mounting in a user namespace.

    For quite some time the security model of overlayfs has been that
    operations on underlying layers shall be performed with the
    privileges of the mounting task.

    This way an unprivileged user cannot gain privileges by the act of
    mounting an overlayfs instance. A full audit of all function calls
    made by the overlayfs code has been performed to see whether they
    conform to this model, and this branch contains some fixes in this
    regard.

    - Support running on copied filesystem images by optionally disabling
    UUID verification.

    - Bug fixes as well as documentation updates.

    * tag 'ovl-update-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: unprivieged mounts
    ovl: do not get metacopy for userxattr
    ovl: do not fail because of O_NOATIME
    ovl: do not fail when setting origin xattr
    ovl: user xattr
    ovl: simplify file splice
    ovl: make ioctl() safe
    ovl: check privs before decoding file handle
    vfs: verify source area in vfs_dedupe_file_range_one()
    vfs: move cap_convert_nscap() call into vfs_setxattr()
    ovl: fix incorrect extent info in metacopy case
    ovl: expand warning in ovl_d_real()
    ovl: document lower modification caveats
    ovl: warn about orphan metacopy
    ovl: doc clarification
    ovl: introduce new "uuid=off" option for inodes index feature
    ovl: propagate ovl_fs to ovl_decode_real_fh and ovl_encode_real_fh

    Linus Torvalds
     
  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've made more work into per-file compression support.

    For example, F2FS_IOC_GET | SET_COMPRESS_OPTION provides a way to
    change the algorithm or cluster size per file. F2FS_IOC_COMPRESS |
    DECOMPRESS_FILE provides a way to compress and decompress the existing
    normal files manually.

    There is also a new mount option, compress_mode=fs|user, which can
    control who compresses the data.

    Chao also added a checksum feature with a mount option so that
    we are able to detect any corrupted cluster.

    In addition, Daniel contributed casefolding with encryption patch,
    which will be used for Android devices.

    Summary:

    Enhancements:
    - add ioctls and mount option to manage per-file compression feature
    - support casefolding with encryption
    - support checksum for compressed cluster
    - avoid IO starvation by replacing mutex with rwsem
    - add sysfs, max_io_bytes, to control max bio size

    Bug fixes:
    - fix use-after-free issue when compression and fsverity are enabled
    - fix consistency corruption during fault injection test
    - fix data offset for lseek
    - get rid of buffer_head which has 32bits limit in fiemap
    - fix some bugs in multi-partitions support
    - fix nat entry count calculation in shrinker
    - fix some stat information

    And, we've refactored some logic and fixed minor bugs as well"

    * tag 'f2fs-for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
    f2fs: compress: fix compression chksum
    f2fs: fix shift-out-of-bounds in sanity_check_raw_super()
    f2fs: fix race of pending_pages in decompression
    f2fs: fix to account inline xattr correctly during recovery
    f2fs: inline: fix wrong inline inode stat
    f2fs: inline: correct comment in f2fs_recover_inline_data
    f2fs: don't check PAGE_SIZE again in sanity_check_raw_super()
    f2fs: convert to F2FS_*_INO macro
    f2fs: introduce max_io_bytes, a sysfs entry, to limit bio size
    f2fs: don't allow any writes on readonly mount
    f2fs: avoid race condition for shrinker count
    f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE
    f2fs: add compress_mode mount option
    f2fs: Remove unnecessary unlikely()
    f2fs: init dirty_secmap incorrectly
    f2fs: remove buffer_head which has 32bits limit
    f2fs: fix wrong block count instead of bytes
    f2fs: use new conversion functions between blks and bytes
    f2fs: rename logical_to_blk and blk_to_logical
    f2fs: fix kbytes written stat for multi-device case
    ...

    Linus Torvalds
     
  • Pull ext2, reiserfs, quota and writeback updates from Jan Kara:

    - a couple of quota fixes (mostly for problems found by syzbot)

    - several ext2 cleanups

    - one fix for reiserfs crash on corrupted image

    - a fix for spurious warning in writeback code

    * tag 'for_v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    writeback: don't warn on an unregistered BDI in __mark_inode_dirty
    fs: quota: fix array-index-out-of-bounds bug by passing correct argument to vfs_cleanup_quota_inode()
    reiserfs: add check for an invalid ih_entry_count
    ext2: Fix fall-through warnings for Clang
    fs/ext2: Use ext2_put_page
    docs: filesystems: Reduce ext2.rst to one top-level heading
    quota: Sanity-check quota file headers on load
    quota: Don't overflow quota file offsets
    ext2: Remove unnecessary blank
    fs/quota: update quota state flags scheme with project quota flags

    Linus Torvalds
     
  • Fast commit on-disk format is designed such that the replay of these
    tags can be idempotent. This patch adds documentation in the code in
    the form of comments and kernel docs that describes these
    characteristics. This patch also adds a TODO item needed to ensure
    kernel fast commit replay idempotence.

    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201119232822.1860882-1-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     

16 Dec, 2020

6 commits

  • Merge yet more updates from Andrew Morton:

    - lots of little subsystems

    - some post-linux-next MM material. Most of the rest awaits more
    merging of other trees.

    Subsystems affected by this series: alpha, procfs, misc, core-kernel,
    bitmap, lib, lz4, checkpatch, nilfs, kdump, rapidio, gcov, bfs, relay,
    resource, ubsan, reboot, fault-injection, lzo, apparmor, and mm (swap,
    memory-hotplug, pagemap, cleanups, and gup).

    * emailed patches from Andrew Morton: (86 commits)
    mm: fix some spelling mistakes in comments
    mm: simplify follow_pte{,pmd}
    mm: unexport follow_pte_pmd
    apparmor: remove duplicate macro list_entry_is_head()
    lib/lzo/lzo1x_compress.c: make lzogeneric1x_1_compress() static
    fault-injection: handle EI_ETYPE_TRUE
    reboot: hide from sysfs not applicable settings
    reboot: allow to override reboot type if quirks are found
    reboot: remove cf9_safe from allowed types and rename cf9_force
    reboot: allow to specify reboot mode via sysfs
    reboot: refactor and comment the cpu selection code
    lib/ubsan.c: mark type_check_kinds with static keyword
    kcov: don't instrument with UBSAN
    ubsan: expand tests and reporting
    ubsan: remove UBSAN_MISC in favor of individual options
    ubsan: enable for all*config builds
    ubsan: disable UBSAN_TRAP for all*config
    ubsan: disable object-size sanitizer under GCC
    ubsan: move cc-option tests into Kconfig
    ubsan: remove redundant -Wno-maybe-uninitialized
    ...

    Linus Torvalds
     
  • Similar to speculation store bypass, show information about the indirect
    branch speculation mode of a task in /proc/$pid/status.

    For testing/benchmarking, I needed to see whether IB (Indirect Branch)
    speculation (see Spectre-v2) is enabled on a task, to see whether an
    IBPB instruction should be executed on an address space switch.
    Unfortunately, this information isn't available anywhere else and
    currently the only way to get it is to hack the kernel to expose it
    (like this change). It also helped expose a bug with conditional IB
    speculation on certain CPUs.

    Another place this could be useful is to audit the system when using
    sandboxing. With this change, I can confirm that seccomp-enabled
    processes have IB speculation force disabled as expected when the kernel
    command line parameter `spectre_v2_user=seccomp` is set.

    Since there's already a 'Speculation_Store_Bypass' field, I used that
    as precedent for adding this one.
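
    Reading such a field from userspace could look like the sketch below.
    The helper name status_field() is hypothetical, and the field name
    "SpeculationIndirectBranch" used in the usage note is an assumption
    based on the note about removing underscores from the field name. The
    helper works on text already read from /proc/$pid/status, so it makes
    no assumptions about the running kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find a line of the form "key:<whitespace>value" in text and copy the
 * value into out. Returns 0 on success, -1 if the key is missing or the
 * value does not fit in out (including its terminating NUL).
 */
int status_field(const char *text, const char *key, char *out, size_t len)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t llen = eol ? (size_t)(eol - line) : strlen(line);

        if (llen > klen && strncmp(line, key, klen) == 0 &&
            line[klen] == ':') {
            const char *val = line + klen + 1;
            const char *end = line + llen;
            size_t vlen;

            /* Skip the tab or spaces that follow the colon. */
            while (val < end && (*val == ' ' || *val == '\t'))
                val++;
            vlen = (size_t)(end - val);
            if (vlen >= len)
                return -1;
            memcpy(out, val, vlen);
            out[vlen] = '\0';
            return 0;
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}
```

    A caller would read /proc/self/status into a buffer and then call
    status_field(buf, "SpeculationIndirectBranch", value, sizeof(value)).
    A return of -1 would be expected on kernels without this change.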

    [amistry@google.com: remove underscores from field name to workaround documentation issue]
    Link: https://lkml.kernel.org/r/20201106131015.v2.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid

    Link: https://lkml.kernel.org/r/20201030172731.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid
    Signed-off-by: Anand K Mistry
    Cc: Anthony Steinhauser
    Cc: Thomas Gleixner
    Cc: Anand K Mistry
    Cc: Alexey Dobriyan
    Cc: Alexey Gladkov
    Cc: Jonathan Corbet
    Cc: Kees Cook
    Cc: Mauro Carvalho Chehab
    Cc: Michal Hocko
    Cc: Mike Rapoport
    Cc: NeilBrown
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anand K Mistry
     
  • …biederm/user-namespace

    Pull execve updates from Eric Biederman:
    "This set of changes ultimately fixes the interaction of posix file
    lock and exec. Fundamentally most of the change is just moving where
    unshare_files is called during exec, and tweaking the users of
    files_struct so that the count of files_struct is not unnecessarily
    played with.

    Along the way fcheck and related helpers were renamed to more
    accurately reflect what they do.

    There were also many other small changes that fell out, as this is the
    first time in a long time much of this code has been touched.

    Benchmarks haven't turned up any practical issues but Al Viro has
    observed a possibility for a lot of pounding on task_lock. So I have
    some changes in progress to convert put_files_struct to always rcu
    free files_struct. That wasn't ready for the merge window so that will
    have to wait until next time"

    * 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
    exec: Move io_uring_task_cancel after the point of no return
    coredump: Document coredump code exclusively used by cell spufs
    file: Remove get_files_struct
    file: Rename __close_fd_get_file close_fd_get_file
    file: Replace ksys_close with close_fd
    file: Rename __close_fd to close_fd and remove the files parameter
    file: Merge __alloc_fd into alloc_fd
    file: In f_dupfd read RLIMIT_NOFILE once.
    file: Merge __fd_install into fd_install
    proc/fd: In fdinfo seq_show don't use get_files_struct
    bpf/task_iter: In task_file_seq_get_next use task_lookup_next_fd_rcu
    proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu
    file: Implement task_lookup_next_fd_rcu
    kcmp: In get_file_raw_ptr use task_lookup_fd_rcu
    proc/fd: In tid_fd_mode use task_lookup_fd_rcu
    file: Implement task_lookup_fd_rcu
    file: Rename fcheck lookup_fd_rcu
    file: Replace fcheck_files with files_lookup_fd_rcu
    file: Factor files_lookup_fd_locked out of fcheck_files
    file: Rename __fcheck_files to files_lookup_fd_raw
    ...

    Linus Torvalds
     
  • Pull nfsd updates from Chuck Lever:
    "Several substantial changes this time around:

    - Previously, exporting an NFS mount via NFSD was considered to be an
    unsupported feature. With v5.11, the community has attempted to
    make re-exporting a first-class feature of NFSD.

    This would enable the Linux in-kernel NFS server to be used as an
    intermediate cache for a remotely-located primary NFS server, for
    example, even with other NFS server implementations, like a NetApp
    filer, as the primary.

    - A short series of patches brings support for multiple RPC/RDMA data
    chunks per RPC transaction to the Linux NFS server's RPC/RDMA
    transport implementation.

    This is a part of the RPC/RDMA spec that the other premier
    NFS/RDMA implementation (Solaris) has had for a very long time, and
    completes the implementation of RPC/RDMA version 1 in the Linux
    kernel's NFS server.

    - Long ago, NFSv4 support was introduced to NFSD using a series of C
    macros that hid dprintk's and goto's. Over time, the kernel's XDR
    implementation has been greatly improved, but these C macros have
    remained and become fallow. A series of patches in this pull
    request completely replaces those macros with the use of current
    kernel XDR infrastructure. Benefits include:

    - More robust input sanitization in NFSD's NFSv4 XDR decoders.

    - Make it easier to use common kernel library functions that use
    XDR stream APIs (for example, GSS-API).

    - Align the structure of the source code with the RFCs so it is
    easier to learn, verify, and maintain our XDR implementation.

    - Removal of more than a hundred hidden dprintk() call sites.

    - Removal of some explicit manipulation of pages to help make the
    eventual transition to xdr->bvec smoother.

    - On top of several related fixes in 5.10-rc, there are a few more
    fixes to get the Linux NFSD implementation of NFSv4.2 inter-server
    copy up to speed.

    And as usual, there is a pinch of seasoning in the form of a
    collection of unrelated minor bug fixes and clean-ups.

    Many thanks to all who contributed this time around!"

    * tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6: (131 commits)
    nfsd: Record NFSv4 pre/post-op attributes as non-atomic
    nfsd: Set PF_LOCAL_THROTTLE on local filesystems only
    nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE
    exportfs: Add a function to return the raw output from fh_to_dentry()
    nfsd: close cached files prior to a REMOVE or RENAME that would replace target
    nfsd: allow filesystems to opt out of subtree checking
    nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
    Revert "nfsd4: support change_attr_type attribute"
    nfsd4: don't query change attribute in v2/v3 case
    nfsd: minor nfsd4_change_attribute cleanup
    nfsd: simplify nfsd4_change_info
    nfsd: only call inode_query_iversion in the I_VERSION case
    nfs_common: need lock during iterate through the list
    NFSD: Fix 5 seconds delay when doing inter server copy
    NFSD: Fix sparse warning in nfs4proc.c
    SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall
    sunrpc: clean-up cache downcall
    nfsd: Fix message level for normal termination
    NFSD: Remove macros that are no longer used
    NFSD: Replace READ* macros in nfsd4_decode_compound()
    ...

    Linus Torvalds
     
  • Merge misc updates from Andrew Morton:

    - a few random little subsystems

    - almost all of the MM patches which are staged ahead of linux-next
    material. I'll trickle the post-linux-next work in as the dependents
    get merged up.

    Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
    ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
    gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
    kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
    oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
    uaccess, zram, and cleanups).

    * emailed patches from Andrew Morton: (200 commits)
    mm: cleanup kstrto*() usage
    mm: fix fall-through warnings for Clang
    mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
    mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
    mm:backing-dev: use sysfs_emit in macro defining functions
    mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
    mm: use sysfs_emit for struct kobject * uses
    mm: fix kernel-doc markups
    zram: break the strict dependency from lzo
    zram: add stat to gather incompressible pages since zram set up
    zram: support page writeback
    mm/process_vm_access: remove redundant initialization of iov_r
    mm/zsmalloc.c: rework the list_add code in insert_zspage()
    mm/zswap: move to use crypto_acomp API for hardware acceleration
    mm/zswap: fix passing zero to 'PTR_ERR' warning
    mm/zswap: make struct kernel_param_ops definitions const
    userfaultfd/selftests: hint the test runner on required privilege
    userfaultfd/selftests: fix retval check for userfaultfd_open()
    userfaultfd/selftests: always dump something in modes
    userfaultfd: selftests: make __{s,u}64 format specifiers portable
    ...

    Linus Torvalds
     
  • Fix a typo, punctuation, use uppercase for CPUs, and limit
    tmpfs to keeping only its files in virtual memory (phrasing).

    Link: https://lkml.kernel.org/r/20201202010934.18566-1-rdunlap@infradead.org
    Signed-off-by: Randy Dunlap
    Acked-by: Hugh Dickins
    Cc: Chris Down
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

15 Dec, 2020

1 commit

  • Pull documentation updates from Jonathan Corbet:
    "A much quieter cycle for documentation (happily), with, one hopes, the
    bulk of the churn behind us. Significant stuff in this pull includes:

    - A set of new Chinese translations

    - Italian translation updates

    - A mechanism from Mauro to automatically format
    Documentation/features for the built docs

    - Automatic cross references without explicit :ref: markup

    - A new reset-controller document

    - An extensive new document on reporting problems from Thorsten

    That last patch also adds the CC-BY-4.0 license to LICENSES/dual;
    there was some discussion on this, but we seem to have consensus and
    an ack from Greg for that addition"

    * tag 'docs-5.11' of git://git.lwn.net/linux: (50 commits)
    docs: fix broken cross reference in translations/zh_CN
    docs: Note that sphinx 1.7 will be required soon
    docs: update requirements to install six module
    docs: reporting-issues: move 'outdated, need help' note to proper place
    docs: Update documentation to reflect what TAINT_CPU_OUT_OF_SPEC means
    docs: add a reset controller chapter to the driver API docs
    docs: make reporting-bugs.rst obsolete
    docs: Add a new text describing how to report bugs
    LICENSES: Add the CC-BY-4.0 license
    Documentation: fix multiple typos found in the admin-guide subdirectory
    Documentation: fix typos found in admin-guide subdirectory
    kernel-doc: Fix example in Nested structs/unions
    docs: clean up sysctl/kernel: titles, version
    docs: trace: fix event state structure name
    docs: nios2: add missing ReST file
    scripts: get_feat.pl: reduce table width for all features output
    scripts: get_feat.pl: change the group by order
    scripts: get_feat.pl: make complete table more coincise
    scripts: kernel-doc: fix parsing function-like typedefs
    Documentation: fix typos found in process, dev-tools, and doc-guide subdirectories
    ...

    Linus Torvalds
     

14 Dec, 2020

1 commit

  • Optionally allow using "user.overlay." namespace instead of
    "trusted.overlay."

    This is necessary for overlayfs to be able to be mounted in an unprivileged
    namespace.

    Make the option explicit, since it makes the filesystem format be
    incompatible.

    Disable redirect_dir and metacopy options, because these would allow
    privilege escalation through direct manipulation of the
    "user.overlay.redirect" or "user.overlay.metacopy" xattrs.

    Signed-off-by: Miklos Szeredi
    Reviewed-by: Amir Goldstein

    Miklos Szeredi
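
    The namespace switch and the feature restriction described in this
    commit can be sketched as two small helpers. Both toy_ovl_* functions
    and the userxattr parameter are hypothetical illustrations, not the
    overlayfs implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the full xattr name for an overlay attribute suffix such as
 * "opaque". Returns 0 on success, -1 if buf is too small.
 */
int toy_ovl_xattr_name(bool userxattr, const char *suffix,
                       char *buf, size_t len)
{
    const char *prefix = userxattr ? "user.overlay." : "trusted.overlay.";
    int n = snprintf(buf, len, "%s%s", prefix, suffix);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * With the user.* namespace, anyone who can write the upper layer can
 * set these xattrs directly, so redirect_dir and metacopy are refused.
 */
bool toy_ovl_feature_allowed(bool userxattr, const char *feature)
{
    if (!userxattr)
        return true;
    return strcmp(feature, "redirect_dir") != 0 &&
           strcmp(feature, "metacopy") != 0;
}
```

    For example, toy_ovl_xattr_name(true, "opaque", buf, sizeof(buf))
    yields "user.overlay.opaque", and toy_ovl_feature_allowed(true,
    "metacopy") is false.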
     

11 Dec, 2020

2 commits

  • Also remove the confusing comment about checking if a fd exists. I
    could not find one instance in the entire kernel that still matches
    the description or the reason for the name fcheck.

    The need for better names became apparent in the last round of
    discussion of this set of changes[1].

    [1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com
    Link: https://lkml.kernel.org/r/20201120231441.29911-10-ebiederm@xmission.com
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • This change renames fcheck_files to files_lookup_fd_rcu. All of the
    remaining callers take the rcu_read_lock before calling this function
    so the _rcu suffix is appropriate. This change also tightens up the
    debug check to verify that all callers hold the rcu_read_lock.

    All callers that used to call fcheck_files with the files->file_lock
    held have now been changed to call files_lookup_fd_locked.

    This change of name has helped remind me of which locks and which
    guarantees are in place helping me to catch bugs later in the
    patchset.

    The need for better names became apparent in the last round of
    discussion of this set of changes[1].

    [1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com
    Link: https://lkml.kernel.org/r/20201120231441.29911-9-ebiederm@xmission.com
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

09 Dec, 2020

3 commits

  • It's not uncommon for some workloads to do a bunch of I/O to a file and
    delete it just afterward. If knfsd has a cached open file however, then
    the file may still be open when the dentry is unlinked. If the
    underlying filesystem is nfs, then that could trigger it to do a
    sillyrename.

    On a REMOVE or RENAME scan the nfsd_file cache for open files that
    correspond to the inode, and proactively unhash and put their
    references. This should prevent any delete-on-last-close activity from
    occurring, solely due to knfsd's open file cache.

    This must be done synchronously though so we use the variants that call
    flush_delayed_fput. There are deadlock possibilities if you call
    flush_delayed_fput while holding locks, however. In the case of
    nfsd_rename, we don't even do the lookups of the dentries to be renamed
    until we've locked for rename.

    Once we've figured out what the target dentry is for a rename, check to
    see whether there are cached open files associated with it. If there
    are, then unwind all of the locking, close them all, and then reattempt
    the rename.

    None of this is really necessary for "typical" filesystems though. It's
    mostly of use for NFS, so declare a new export op flag and use that to
    determine whether to close the files beforehand.

    Signed-off-by: Jeff Layton
    Signed-off-by: Lance Shelton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Chuck Lever

    Jeff Layton
     
  • When we start allowing NFS to be reexported, then we have some problems
    when it comes to subtree checking. In principle, we could allow it, but
    it would mean encoding parent info in the filehandles and there may not
    be enough space for that in a NFSv3 filehandle.

    To enforce this at export upcall time, we add a new export_ops flag
    that declares the filesystem ineligible for subtree checking.

    Signed-off-by: Jeff Layton
    Signed-off-by: Lance Shelton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Chuck Lever

    Jeff Layton
     
  • With NFSv3 nfsd will always attempt to send along WCC data to the
    client. This generally involves saving off the in-core inode information
    prior to doing the operation on the given filehandle, and then issuing a
    vfs_getattr to it after the op.

    Some filesystems (particularly clustered or networked ones) have an
    expensive ->getattr inode operation. Atomicity is also often difficult
    or impossible to guarantee on such filesystems. For those, we're best
    off not trying to provide WCC information to the client at all, and to
    simply allow it to poll for that information as needed with a GETATTR
    RPC.

    This patch adds a new flags field to struct export_operations, and
    defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
    that nfsd should not attempt to provide WCC info in NFSv3 replies. It
    also adds a blurb about the new flags field and flag to the exporting
    documentation.

    The server will also now skip collecting this information for NFSv2 as
    well, since that info is never used there anyway.
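
    The decision described above can be modelled in a few lines. This is a
    toy sketch: the struct, the bit value of the flag and the helper name
    are assumptions for illustration, and it covers only the NFSv2 and
    flag checks mentioned in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit value; the real definition lives in the kernel. */
#define TOY_EXPORT_OP_NOWCC 0x1UL

/* Toy stand-in for the new flags field in struct export_operations. */
struct toy_export_operations {
    unsigned long flags;
};

/* Decide whether pre/post-operation attributes should be collected. */
bool toy_should_collect_wcc(const struct toy_export_operations *ops,
                            int nfs_version)
{
    /* NFSv2 replies never carry WCC data, so skip the work there. */
    if (nfs_version == 2)
        return false;
    /* The filesystem opted out, e.g. because ->getattr is expensive. */
    return !(ops->flags & TOY_EXPORT_OP_NOWCC);
}
```

    A filesystem that sets the flag gets false for NFSv3, and the client
    is left to fetch attributes with a GETATTR RPC when it needs them.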

    Note that this patch does not add this flag to any filesystem
    export_operations structures. This was originally developed to allow
    reexporting nfs via nfsd.

    Other filesystems may want to consider enabling this flag too. It's hard
    to tell however which ones have export operations to enable export via
    knfsd and which ones mostly rely on them for open-by-filehandle support,
    so I'm leaving that up to the individual maintainers to decide. I am
    cc'ing the relevant lists for those filesystems that I think may want to
    consider adding this though.

    Cc: HPDD-discuss@lists.01.org
    Cc: ceph-devel@vger.kernel.org
    Cc: cluster-devel@redhat.com
    Cc: fuse-devel@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Signed-off-by: Jeff Layton
    Signed-off-by: Lance Shelton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Chuck Lever

    Jeff Layton
     

04 Dec, 2020

1 commit

  • Change wording to say that messages are logged to the kernel log
    buffer instead of to dmesg. dmesg is just one program that can
    print the kernel log buffer.

    Signed-off-by: Randy Dunlap
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Alexander Viro
    Link: https://lore.kernel.org/r/20201202012409.19194-1-rdunlap@infradead.org
    Signed-off-by: Jonathan Corbet

    Randy Dunlap
     

03 Dec, 2020

2 commits

  • We will add a new "compress_mode" mount option to control file
    compression mode. This supports "fs" and "user". In "fs" mode (default),
    f2fs does automatic compression on the compression enabled files.
    In "user" mode, f2fs disables the automatic compression and gives the
    user discretion of choosing the target file and the timing. It means
    the user can do manual compression/decompression on the compression
    enabled files using ioctls.
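
    The two modes can be summarised with a small option parser. This is a
    hypothetical sketch, not the f2fs mount code: the enum, the function
    names and the treatment of a missing option as "fs" (the stated
    default) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum toy_compress_mode {
    TOY_COMPRESS_MODE_FS,      /* f2fs compresses automatically */
    TOY_COMPRESS_MODE_USER,    /* user triggers compression via ioctls */
    TOY_COMPRESS_MODE_INVALID, /* unrecognised option value */
};

/* Map the value of compress_mode= to a mode; NULL means "not given". */
enum toy_compress_mode toy_parse_compress_mode(const char *value)
{
    if (value == NULL || strcmp(value, "fs") == 0)
        return TOY_COMPRESS_MODE_FS;
    if (strcmp(value, "user") == 0)
        return TOY_COMPRESS_MODE_USER;
    return TOY_COMPRESS_MODE_INVALID;
}

/* Only "fs" mode compresses compression-enabled files on its own. */
bool toy_compresses_automatically(enum toy_compress_mode mode)
{
    return mode == TOY_COMPRESS_MODE_FS;
}
```

    In this model, toy_parse_compress_mode("user") selects the manual
    mode, for which toy_compresses_automatically() returns false.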

    Signed-off-by: Daeho Jeong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Daeho Jeong
     
  • This patch adds support for storing a checksum value with compressed
    data, and for verifying the integrity of the compressed data while
    reading it.

    The feature can be enabled through specifying mount option
    'compress_chksum'.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

02 Dec, 2020

1 commit

  • Document that /proc/PID/smaps shows PROT_MTE settings in VmFlags.
    Support for this was introduced in

    commit 9f3419315f3cdc41a7318e4d50ba18a592b30c8c
    arm64: mte: Add PROT_MTE support to mmap() and mprotect()

    Signed-off-by: Szabolcs Nagy
    Reviewed-by: Catalin Marinas
    Cc: linux-doc@vger.kernel.org
    Link: https://lore.kernel.org/r/20201106101940.5777-1-szabolcs.nagy@arm.com
    Signed-off-by: Jonathan Corbet

    Szabolcs Nagy
     

01 Dec, 2020

1 commit


24 Nov, 2020

1 commit

  • Although it isn't used directly by the ioctls,
    "struct fsverity_descriptor" is required by userspace programs that need
    to compute fs-verity file digests in a standalone way. Therefore
    it's also needed to sign files in a standalone way.

    Similarly, "struct fsverity_formatted_digest" (previously called
    "struct fsverity_signed_digest" which was misleading) is also needed to
    sign files if the built-in signature verification is being used.

    Therefore, move these structs to the UAPI header.

    While doing this, try to make it clear that the signature-related fields
    in fsverity_descriptor aren't used in the file digest computation.
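
    A userspace program that signs files needs to build the formatted
    digest itself. The sketch below assumes the layout of
    "struct fsverity_formatted_digest" is an 8-byte "FSVerity" magic, a
    little-endian 16-bit algorithm number, a little-endian 16-bit digest
    size, and then the digest bytes; check the UAPI header before relying
    on it. The function name toy_format_digest() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Serialise a file digest into the buffer that gets signed.
 * Returns the number of bytes written, or 0 if out is too small.
 */
size_t toy_format_digest(uint16_t alg, const uint8_t *digest,
                         uint16_t digest_size, uint8_t *out, size_t out_len)
{
    size_t total = 8 + 2 + 2 + (size_t)digest_size;

    if (out_len < total)
        return 0;

    memcpy(out, "FSVerity", 8);              /* magic, no NUL terminator */
    out[8] = (uint8_t)(alg & 0xff);          /* algorithm, little-endian */
    out[9] = (uint8_t)(alg >> 8);
    out[10] = (uint8_t)(digest_size & 0xff); /* size, little-endian */
    out[11] = (uint8_t)(digest_size >> 8);
    memcpy(out + 12, digest, digest_size);   /* the fs-verity file digest */
    return total;
}
```

    For a 32-byte digest this writes 44 bytes. The algorithm number to
    pass comes from the fs-verity UAPI constants and is not defined by
    this sketch.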

    Acked-by: Luca Boccassi
    Link: https://lore.kernel.org/r/20201113211918.71883-5-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
     

17 Nov, 2020

2 commits

  • I originally chose the name "file measurement" to refer to the fs-verity
    file digest to avoid confusion with traditional full-file digests or
    with the bare root hash of the Merkle tree.

    But the name "file measurement" hasn't caught on, and usually people are
    calling it something else, usually the "file digest". E.g. see
    "struct fsverity_digest" and "struct fsverity_formatted_digest", the
    libfsverity_compute_digest() and libfsverity_sign_digest() functions in
    libfsverity, and the "fsverity digest" command.

    Having multiple names for the same thing is always confusing.

    So to hopefully avoid confusion in the future, rename
    "fs-verity file measurement" to "fs-verity file digest".

    This leaves FS_IOC_MEASURE_VERITY as the only reference to "measure" in
    the kernel, which makes some amount of sense since the ioctl is actively
    "measuring" the file.

    I'll be renaming this in fsverity-utils too (though similarly the
    'fsverity measure' command, which is a wrapper for
    FS_IOC_MEASURE_VERITY, will stay).

    Acked-by: Luca Boccassi
    Link: https://lore.kernel.org/r/20201113211918.71883-4-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
     
  • The name "struct fsverity_signed_digest" is causing confusion because it
    isn't actually a signed digest, but rather it's the way that the digest
    is formatted in order to be signed. Rename it to
    "struct fsverity_formatted_digest" to prevent this confusion.

    Also update the struct's comment to clarify that it's specific to the
    built-in signature verification support and isn't a requirement for all
    fs-verity users.

    I'll be renaming this struct in fsverity-utils too.

    Acked-by: Luca Boccassi
    Link: https://lore.kernel.org/r/20201113211918.71883-3-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
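
To make the distinction concrete, the sketch below builds the byte string that gets signed: the file digest wrapped in a small header. The layout (8-byte magic "FSVerity", 16-bit algorithm number, 16-bit digest size, then the digest, all little-endian) follows the struct as I understand it from the fs-verity documentation; the Python function name is chosen for the example only.

```python
import struct


def fsverity_formatted_digest(digest: bytes, digest_algorithm: int = 1) -> bytes:
    """Wrap a file digest in the header that is fed to the signing step."""
    header = struct.pack("<8sHH", b"FSVerity", digest_algorithm, len(digest))
    return header + digest
```

The result is the input to signing, not a signature, which is why the older name "signed digest" was misleading.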
     

14 Nov, 2020

1 commit


12 Nov, 2020

3 commits

  • Some overlayfs optional features are incompatible with offline changes to
    the lower tree and may result in -EXDEV, -EIO, or other errors. Such
    modification is not supported and the error behavior is intentionally not
    specified.

    Update the "Changes to underlying filesystems" section to note this
    restriction. Move the paragraph describing the offline behavior below the
    online behavior so it is adjacent to the following 3 paragraphs describing
    the NFS export offline modification behavior.

    Link: https://lore.kernel.org/linux-unionfs/20200708142353.GA103536@redhat.com/
    Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxi23Zsmfb4rCed1n=On0NNA5KZD74jjjeyz+et32sk-gg@mail.gmail.com/
    Link: https://lore.kernel.org/linux-unionfs/20200817135651.GA637139@redhat.com/
    Link: https://lore.kernel.org/linux-unionfs/20200709153616.GE150543@redhat.com/
    Link: https://lore.kernel.org/linux-unionfs/20200812135529.GA122370@kevinolos/
    Signed-off-by: Kevin Locke
    Signed-off-by: Miklos Szeredi

    Kevin Locke
     
  • Documentation says "The lower filesystem can be any filesystem supported by
    Linux". However, this is not the case, as Linux supports vfat and vfat
    doesn't work as a lower filesystem.

    Reported-by: nerdopolis
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • This replaces the uuid with null in overlayfs file handles and thus relaxes
    the uuid checks for the overlay index feature. This is only possible when
    there is a single filesystem for all the work/upper/lower directories and
    bare file handles from this backing filesystem are unique. Otherwise, when
    there are multiple filesystems, just fall back to "uuid=on", which is
    equivalent to how it worked before, with all uuid checks.

    This is needed when overlayfs is/was mounted in a container with index
    enabled (e.g. to be able to resolve inotify watch file handles on it to
    paths in CRIU), and this container is copied and started alongside the
    original one. In that case the "copy" container can't have the same uuid
    on the superblock, and mounting the overlayfs from it later would fail.

    Here is an example of the problem on top of loop+ext4:

    dd if=/dev/zero of=loopbackfile.img bs=100M count=10
    losetup -fP loopbackfile.img
    losetup -a
    #/dev/loop0: [64768]:35 (/loop-test/loopbackfile.img)
    mkfs.ext4 loopbackfile.img
    mkdir loop-mp
    mount -o loop /dev/loop0 loop-mp
    mkdir loop-mp/{lower,upper,work,merged}
    mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\
    upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged
    umount loop-mp/merged
    umount loop-mp
    e2fsck -f /dev/loop0
    tune2fs -U random /dev/loop0

    mount -o loop /dev/loop0 loop-mp
    mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\
    upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged
    #mount: /loop-test/loop-mp/merged:
    #mount(2) system call failed: Stale file handle.

    If you just change the uuid of the backing filesystem, the overlay no
    longer mounts. In Virtuozzo we copy container disks (ploops) when
    creating a copy of a container, and we require the fs uuid to be unique
    for the new container.

    Signed-off-by: Pavel Tikhomirov
    Reviewed-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Pavel Tikhomirov
     

10 Nov, 2020

1 commit

  • Pull ext4 fixes and cleanups from Ted Ts'o:
    "More fixes and cleanups for the new fast_commit features, but also a
    few other miscellaneous bug fixes and a cleanup for the MAINTAINERS
    file"

    * tag 'ext4_for_linus_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (28 commits)
    jbd2: fix up sparse warnings in checkpoint code
    ext4: fix sparse warnings in fast_commit code
    ext4: cleanup fast commit mount options
    jbd2: don't start fast commit on aborted journal
    ext4: make s_mount_flags modifications atomic
    ext4: issue fsdev cache flush before starting fast commit
    ext4: disable fast commit with data journalling
    ext4: fix inode dirty check in case of fast commits
    ext4: remove unnecessary fast commit calls from ext4_file_mmap
    ext4: mark buf dirty before submitting fast commit buffer
    ext4: fix code documentatioon
    ext4: dedpulicate the code to wait on inode that's being committed
    jbd2: don't read journal->j_commit_sequence without taking a lock
    jbd2: don't touch buffer state until it is filled
    jbd2: add todo for a fast commit performance optimization
    jbd2: don't pass tid to jbd2_fc_end_commit_fallback()
    jbd2: don't use state lock during commit path
    jbd2: drop jbd2_fc_init documentation
    ext4: clean up the JBD2 API that initializes fast commits
    jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufs
    ...

    Linus Torvalds
     

09 Nov, 2020

1 commit


07 Nov, 2020

2 commits

  • Now that jbd2_fc_init is dropped, drop its docs too.

    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201106035911.1942128-8-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     
  • The fast commit feature has flags in the file system as well as in JBD2.
    The meaning of the fast commit feature flags can get confusing. Update
    docs and code to add more documentation about it.

    Suggested-by: Jan Kara
    Signed-off-by: Harshad Shirwadkar
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20201106035911.1942128-2-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     

04 Nov, 2020

1 commit

  • Pull documentation build warning fixes from Jonathan Corbet:
    "This contains a series of warning fixes from Mauro; once applied, the
    number of warnings from the once-noisy docs build process is nearly
    zero.

    Getting to this point has required a lot of work; once there,
    hopefully we can keep things that way.

    I have packaged this as a separate pull because it does a fair amount
    of reaching outside of Documentation/. The changes are all in comments
    and in code placement. It's all been in linux-next since last week"

    * tag 'docs-5.10-warnings' of git://git.lwn.net/linux: (24 commits)
    docs: SafeSetID: fix a warning
    amdgpu: fix a few kernel-doc markup issues
    selftests: kselftest_harness.h: fix kernel-doc markups
    drm: amdgpu_dm: fix a typo
    gpu: docs: amdgpu.rst: get rid of wrong kernel-doc markups
    drm: amdgpu: kernel-doc: update some adev parameters
    docs: fs: api-summary.rst: get rid of kernel-doc include
    IB/srpt: docs: add a description for cq_size member
    locking/refcount: move kernel-doc markups to the proper place
    docs: lockdep-design: fix some warning issues
    MAINTAINERS: fix broken doc refs due to yaml conversion
    ice: docs fix a devlink info that broke a table
    crypto: sun8x-ce*: update entries to its documentation
    net: phy: remove kernel-doc duplication
    mm: pagemap.h: fix two kernel-doc markups
    blk-mq: docs: add kernel-doc description for a new struct member
    docs: userspace-api: add iommu.rst to the index file
    docs: hwmon: mp2975.rst: address some html build warnings
    docs: net: statistics.rst: remove a duplicated kernel-doc
    docs: kasan.rst: add two missing blank lines
    ...

    Linus Torvalds
     

30 Oct, 2020

1 commit


29 Oct, 2020

1 commit

  • The direct-io.c file used to have just two exported symbols:

    - dio_end_io()
    - __blockdev_direct_IO()

    The first one was removed by changeset
    c33fe275b530 ("fs: remove no longer used dio_end_io()")

    The last one is used in most places indirectly, via
    the inline macro blockdev_direct_IO() provided by fs.h.
    Yet neither the macro nor the function has kernel-doc
    markups.

    So, drop the inclusion of fs/direct-io.c from the docs.

    Fixes: c33fe275b530 ("fs: remove no longer used dio_end_io()")
    Signed-off-by: Mauro Carvalho Chehab
    Link: https://lore.kernel.org/r/d0a9fffedca102633c168adaf157f34288a4ea67.1603791716.git.mchehab+huawei@kernel.org
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

23 Oct, 2020

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "The significant new ext4 feature this time around is Harshad's new
    fast_commit mode.

    In addition, thanks to Mauricio for fixing a race where mmap'ed pages
    that are being changed in parallel with a data=journal transaction
    commit could result in bad checksums in the failure that could cause
    journal replays to fail.

    Also notable is Ritesh's buffered write optimization which can result
    in significant improvements on parallel write workloads. (The kernel
    test robot reported a 330.6% improvement on fio.write_iops on a 96
    core system using DAX)

    Besides that, we have the usual miscellaneous cleanups and bug fixes"

    Link: https://lore.kernel.org/r/20200925071217.GO28663@shao2-debian

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
    ext4: fix invalid inode checksum
    ext4: add fast commit stats in procfs
    ext4: add a mount opt to forcefully turn fast commits on
    ext4: fast commit recovery path
    jbd2: fast commit recovery path
    ext4: main fast-commit commit path
    jbd2: add fast commit machinery
    ext4 / jbd2: add fast commit initialization
    ext4: add fast_commit feature and handling for extended mount options
    doc: update ext4 and journalling docs to include fast commit feature
    ext4: Detect already used quota file early
    jbd2: avoid transaction reuse after reformatting
    ext4: use the normal helper to get the actual inode
    ext4: fix bs < ps issue reported with dioread_nolock mount opt
    ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()
    ext4: data=journal: fixes for ext4_page_mkwrite()
    jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers()
    jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()
    ext4: introduce ext4_sb_bread_unmovable() to replace sb_bread_unmovable()
    ext4: use ext4_sb_bread() instead of sb_bread()
    ...

    Linus Torvalds