02 Jul, 2022

1 commit

  • commit 8cc5c54de44c5e8e104d364a627ac4296845fc7f upstream.

    Now that we implement the full remapping algorithms described in our
    documentation, remove the section about short-circuiting them.

    Link: https://lore.kernel.org/r/20211123114227.3124056-6-brauner@kernel.org (v1)
    Link: https://lore.kernel.org/r/20211130121032.3753852-6-brauner@kernel.org (v2)
    Link: https://lore.kernel.org/r/20211203111707.3901969-6-brauner@kernel.org
    Cc: Seth Forshee
    Cc: Amir Goldstein
    Cc: Christoph Hellwig
    Cc: Al Viro
    CC: linux-fsdevel@vger.kernel.org
    Reviewed-by: Seth Forshee
    Signed-off-by: Christian Brauner
    Signed-off-by: Christian Brauner (Microsoft)
    Signed-off-by: Greg Kroah-Hartman

    Christian Brauner
     

09 Jun, 2022

1 commit

  • [ Upstream commit 10a26878564f27327b12e8f4b4d8d7b43065fae5 ]

    This patch adds a new function, f2fs_dquot_initialize(), that wraps
    dquot_initialize() and supports fault injection, so that an internal
    failure of dquot_initialize() can be simulated.

    Usage:
    a) echo 65536 > /sys/fs/f2fs//inject_type or
    b) mount -o fault_type=65536
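
    As a rough illustration of the value used above: the fault type is a
    bitmask, and 65536 is the mask with only bit 16 set. The sketch below
    assumes that bit 16 is the one that selects the fault added by this
    patch; the helper name fault_mask() is hypothetical and not part of any
    f2fs tool.

```python
def fault_mask(*bits: int) -> int:
    """Combine fault bit positions into a single fault_type mask."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


# Assumption: bit 16 selects the f2fs_dquot_initialize() fault.
DQUOT_INIT_BIT = 16

# The resulting number is what would be passed as fault_type / inject_type.
print(fault_mask(DQUOT_INIT_BIT))
```

    Passing several bit positions to the helper yields a combined mask, which
    is how more than one fault type could be enabled at once.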

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Chao Yu
     

27 Apr, 2022

1 commit

  • commit 7102ffe4c166ca0f5e35137e9f9de83768c2d27d upstream.

    According to the documentation and the code, ext4_xattr_header is 32
    bytes in size, so h_reserved should have 3 elements.
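
    The arithmetic can be checked with a short sketch. It assumes the header
    consists of five 32-bit fields (h_magic, h_refcount, h_blocks, h_hash,
    h_checksum) followed by the h_reserved array of 32-bit words; that field
    list is an assumption stated here for illustration, not taken from this
    commit message.

```python
import struct

# Assumed layout: five little-endian 32-bit fields followed by the
# h_reserved array of 32-bit words.
FIXED_FIELDS = 5
RESERVED_WORDS = 3

header_format = "<" + "I" * (FIXED_FIELDS + RESERVED_WORDS)
header_size = struct.calcsize(header_format)

# With three reserved words the header comes out at the documented size.
print(header_size)
```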

    Signed-off-by: Wang Jianjian
    Link: https://lore.kernel.org/r/92fcc3a6-7d77-8c09-4126-377fcb4c46a5@huawei.com
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    wangjianjian (C)
     

19 Nov, 2021

1 commit

  • [ Upstream commit 7f595d6a6cdc336834552069a2e0a4f6d4756ddf ]

    fscrypt currently requires a 512-bit master key when AES-256-XTS is
    used, since AES-256-XTS keys are 512-bit and fscrypt requires that the
    master key be at least as long as any key that will be derived from it.

    However, this is overly strict because AES-256-XTS doesn't actually have
    a 512-bit security strength, but rather 256-bit. The fact that XTS
    takes twice the expected key size is a quirk of the XTS mode. It is
    sufficient to use 256 bits of entropy for AES-256-XTS, provided that it
    is first properly expanded into a 512-bit key, which HKDF-SHA512 does.

    Therefore, relax the check of the master key size to use the security
    strength of the derived key rather than the size of the derived key
    (except for v1 encryption policies, which don't use HKDF).
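
    The relaxed rule can be sketched as follows. This is a simplified
    illustration, not the kernel code: the function names are hypothetical,
    only AES-256-XTS is modelled, and sizes are given in bytes.

```python
# Illustrative values only; sizes are in bytes.
AES_256_XTS_KEY_SIZE = 64            # XTS takes a 512-bit key
AES_256_XTS_SECURITY_STRENGTH = 32   # but provides 256-bit security


def min_master_key_size(policy_version: int) -> int:
    """Return the smallest acceptable master key size for AES-256-XTS."""
    if policy_version == 1:
        # v1 policies do not use HKDF, so the full derived key size is needed.
        return AES_256_XTS_KEY_SIZE
    # Later policies expand the master key with HKDF, so the security
    # strength of the derived key is the relevant bound.
    return AES_256_XTS_SECURITY_STRENGTH


def master_key_ok(master_key: bytes, policy_version: int) -> bool:
    """Check a master key against the relaxed length requirement."""
    return len(master_key) >= min_master_key_size(policy_version)
```

    Under this sketch a 32-byte master key is accepted for a v2 policy but
    still rejected for a v1 policy.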

    Besides making things more flexible for userspace, this is needed in
    order for the use of a KDF which only takes a 256-bit key to be
    introduced into the fscrypt key hierarchy. This will happen with
    hardware-wrapped keys support, as all known hardware which supports that
    feature uses an SP800-108 KDF using AES-256-CMAC, so the wrapped keys
    are wrapped 256-bit AES keys. Moreover, there is interest in fscrypt
    supporting the same type of AES-256-CMAC based KDF in software as an
    alternative to HKDF-SHA512. There is no security problem with such
    features, so fix the key length check to work properly with them.

    Reviewed-by: Paul Crowley
    Link: https://lore.kernel.org/r/20210921030303.5598-1-ebiggers@kernel.org
    Signed-off-by: Eric Biggers
    Signed-off-by: Sasha Levin

    Eric Biggers
     

15 Oct, 2021

1 commit

  • Pull ntfs3 fixes from Konstantin Komarov:
    "Use the new api for mounting as requested by Christoph.

    Also fixed:

    - some memory leaks and panic

    - xfstests (tested on x86_64) generic/016 generic/021 generic/022
    generic/041 generic/274 generic/423

    - some typos, wrong returned error codes, dead code, etc"

    * tag 'ntfs3_for_5.15' of git://github.com/Paragon-Software-Group/linux-ntfs3: (70 commits)
    fs/ntfs3: Check for NULL pointers in ni_try_remove_attr_list
    fs/ntfs3: Refactor ntfs_read_mft
    fs/ntfs3: Refactor ni_parse_reparse
    fs/ntfs3: Refactor ntfs_create_inode
    fs/ntfs3: Refactor ntfs_readlink_hlp
    fs/ntfs3: Rework ntfs_utf16_to_nls
    fs/ntfs3: Fix memory leak if fill_super failed
    fs/ntfs3: Keep prealloc for all types of files
    fs/ntfs3: Remove unnecessary functions
    fs/ntfs3: Forbid FALLOC_FL_PUNCH_HOLE for normal files
    fs/ntfs3: Refactoring of ntfs_set_ea
    fs/ntfs3: Remove locked argument in ntfs_set_ea
    fs/ntfs3: Use available posix_acl_release instead of ntfs_posix_acl_release
    fs/ntfs3: Check for NULL if ATTR_EA_INFO is incorrect
    fs/ntfs3: Refactoring of ntfs_init_from_boot
    fs/ntfs3: Reject mount if boot's cluster size < media sector size
    fs/ntfs3: Refactoring lock in ntfs_init_acl
    fs/ntfs3: Change posix_acl_equiv_mode to posix_acl_update_mode
    fs/ntfs3: Pass flags to ntfs_set_ea in ntfs_set_acl_ex
    fs/ntfs3: Refactor ntfs_get_acl_ex for better readability
    ...

    Linus Torvalds
     

20 Sep, 2021

1 commit

  • The current ntfs3 rst documentation is broken. Convert the table to a
    list table, as this is the current Linux documentation guideline. A
    simple table also did not quite work in our situation, as we need to
    span rows together.

    It still looks quite good as text, so we did not lose anything. This
    will also make diffing quite a bit more pleasant.

    Signed-off-by: Kari Argillander
    Signed-off-by: Konstantin Komarov

    Kari Argillander
     

12 Sep, 2021

1 commit

  • Pull block fixes from Jens Axboe:

    - NVMe pull request from Christoph:
    - fix nvmet command set reporting for passthrough controllers (Adam Manzanares)
    - update a MAINTAINERS email address (Chaitanya Kulkarni)
    - set QUEUE_FLAG_NOWAIT for nvme-multipath (me)
    - handle errors from add_disk() (Luis Chamberlain)
    - update the keep alive interval when kato is modified (Tatsuya Sasaki)
    - fix a buffer overrun in nvmet_subsys_attr_serial (Hannes Reinecke)
    - do not reset transport on data digest errors in nvme-tcp (Daniel Wagner)
    - only call synchronize_srcu when clearing current path (Daniel Wagner)
    - revalidate paths during rescan (Hannes Reinecke)

    - Split out the fs/block_dev into block/fops.c and block/bdev.c, which
    has been long overdue. Do this now before -rc1, to avoid annoying
    conflicts due to this (Christoph)

    - blk-throtl use-after-free fix (Li)

    - Improve plug depth for multi-device plugs, greatly increasing md
    resync performance (Song)

    - blkdev_show() locking fix (Tetsuo)

    - n64cart error check fix (Yang)

    * tag 'block-5.15-2021-09-11' of git://git.kernel.dk/linux-block:
    n64cart: fix return value check in n64cart_probe()
    blk-mq: allow 4x BLK_MAX_REQUEST_COUNT at blk_plug for multiple_queues
    block: move fs/block_dev.c to block/bdev.c
    block: split out operations on block special files
    blk-throttle: fix UAF by deleteing timer in blk_throtl_exit()
    block: genhd: don't call blkdev_show() with major_names_lock held
    nvme: update MAINTAINERS email address
    nvme: add error handling support for add_disk()
    nvme: only call synchronize_srcu when clearing current path
    nvme: update keep alive interval when kato is modified
    nvme-tcp: Do not reset transport on data digest errors
    nvmet: fixup buffer overrun in nvmet_subsys_attr_serial()
    nvmet: return bool from nvmet_passthru_ctrl and nvmet_is_passthru_req
    nvmet: looks at the passthrough controller when initializing CAP
    nvme: move nvme_multi_css into nvme.h
    nvme-multipath: revalidate paths during rescan
    nvme-multipath: set QUEUE_FLAG_NOWAIT

    Linus Torvalds
     

10 Sep, 2021

3 commits


07 Sep, 2021

1 commit


05 Sep, 2021

2 commits

  • Merge NTFSv3 filesystem from Konstantin Komarov:
    "This patch adds NTFS Read-Write driver to fs/ntfs3.

    Having decades of expertise in commercial file systems development and
    huge test coverage, we at Paragon Software GmbH want to make our
    contribution to the Open Source Community by providing implementation
    of NTFS Read-Write driver for the Linux Kernel.

    This is a fully functional NTFS Read-Write driver. The current version
    works with NTFS (including v3.1) and normal/compressed/sparse files
    and supports journal replaying.

    We plan to support this version once the codebase is merged, and to
    add new features and fix bugs. For example, full journaling support
    over JBD will be added in later updates"

    Link: https://lore.kernel.org/lkml/20210729134943.778917-1-almaz.alexandrovich@paragon-software.com/
    Link: https://lore.kernel.org/lkml/aa4aa155-b9b2-9099-b7a2-349d8d9d8fbd@paragon-software.com/

    * git://github.com/Paragon-Software-Group/linux-ntfs3: (35 commits)
    fs/ntfs3: Change how module init/info messages are displayed
    fs/ntfs3: Remove GPL boilerplates from decompress lib files
    fs/ntfs3: Remove unnecessary condition checking from ntfs_file_read_iter
    fs/ntfs3: Fix integer overflow in ni_fiemap with fiemap_prep()
    fs/ntfs3: Restyle comments to better align with kernel-doc
    fs/ntfs3: Rework file operations
    fs/ntfs3: Remove fat ioctl's from ntfs3 driver for now
    fs/ntfs3: Restyle comments to better align with kernel-doc
    fs/ntfs3: Fix error handling in indx_insert_into_root()
    fs/ntfs3: Potential NULL dereference in hdr_find_split()
    fs/ntfs3: Fix error code in indx_add_allocate()
    fs/ntfs3: fix an error code in ntfs_get_acl_ex()
    fs/ntfs3: add checks for allocation failure
    fs/ntfs3: Use kcalloc/kmalloc_array over kzalloc/kmalloc
    fs/ntfs3: Do not use driver own alloc wrappers
    fs/ntfs3: Use kernel ALIGN macros over driver specific
    fs/ntfs3: Restyle comment block in ni_parse_reparse()
    fs/ntfs3: Remove unused including
    fs/ntfs3: Fix fall-through warnings for Clang
    fs/ntfs3: Fix one none utf8 char in source file
    ...

    Linus Torvalds
     
  • Pull f2fs updates from Jaegeuk Kim:
    "In this cycle, we've addressed some performance issues such as lock
    contention, misbehaving compress_cache, allowing extent_cache for
    compressed files, and new sysfs to adjust ra_size for fadvise.

    In order to diagnose the performance issues quickly, we also added an
    iostat which shows the IO latencies periodically.

    On the stability side, we've found two memory leakage cases in the
    error path in compression flow. And, we've also fixed various corner
    cases in fiemap, quota, checkpoint=disable, zstd, and so on.

    Enhancements:
    - avoid long checkpoint latency by releasing nat_tree_lock
    - collect and show iostats periodically
    - support extent_cache for compressed files
    - add a sysfs entry to manage ra_size given fadvise(POSIX_FADV_SEQUENTIAL)
    - report f2fs GC status via sysfs
    - add discard_unit=%s in mount option to handle zoned device

    Bug fixes:
    - fix two memory leakages when an error happens in the compressed IO flow
    - fix compress_cache to get the right LBA
    - fix fiemap to deal with compressed case correctly
    - fix wrong EIO returns due to SBI_NEED_FSCK
    - fix missing writes when enabling checkpoint back
    - fix quota deadlock
    - fix zstd level mount option

    In addition to the above major updates, we've cleaned up several code
    paths such as dio, unnecessary operations, debugfs/f2fs/status, sanity
    check, and typos"

    * tag 'f2fs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
    f2fs: should put a page beyond EOF when preparing a write
    f2fs: deallocate compressed pages when error happens
    f2fs: enable realtime discard iff device supports discard
    f2fs: guarantee to write dirty data when enabling checkpoint back
    f2fs: fix to unmap pages from userspace process in punch_hole()
    f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()
    f2fs: fix to account missing .skipped_gc_rwsem
    f2fs: adjust unlock order for cleanup
    f2fs: Don't create discard thread when device doesn't support realtime discard
    f2fs: rebuild nat_bits during umount
    f2fs: introduce periodic iostat io latency traces
    f2fs: separate out iostat feature
    f2fs: compress: do sanity check on cluster
    f2fs: fix description about main_blkaddr node
    f2fs: convert S_IRUGO to 0444
    f2fs: fix to keep compatibility of fault injection interface
    f2fs: support fault injection for f2fs_kmem_cache_alloc()
    f2fs: compress: allow write compress released file after truncate to zero
    f2fs: correct comment in segment.h
    f2fs: improve sbi status info in debugfs/f2fs/status
    ...

    Linus Torvalds
     

03 Sep, 2021

3 commits

  • Pull ext4 updates from Ted Ts'o:
    "In addition to some ext4 bug fixes and cleanups, this cycle we add the
    orphan_file feature, which eliminates bottlenecks when doing a large
    number of parallel truncates and file deletions, and move the discard
    operation out of the jbd2 commit thread when using the discard mount
    option, to better support devices with slow discard operations"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
    ext4: make the updating inode data procedure atomic
    ext4: remove an unnecessary if statement in __ext4_get_inode_loc()
    ext4: move inode eio simulation behind io completeion
    ext4: Improve scalability of ext4 orphan file handling
    ext4: Orphan file documentation
    ext4: Speedup ext4 orphan inode handling
    ext4: Move orphan inode handling into a separate file
    ext4: Support for checksumming from journal triggers
    ext4: fix race writing to an inline_data file while its xattrs are changing
    jbd2: add sparse annotations for add_transaction_credits()
    ext4: fix sparse warnings
    ext4: Make sure quota files are not grabbed accidentally
    ext4: fix e2fsprogs checksum failure for mounted filesystem
    ext4: if zeroout fails fall back to splitting the extent node
    ext4: reduce arguments of ext4_fc_add_dentry_tlv
    ext4: flush background discard kwork when retry allocation
    ext4: get discard out of jbd2 commit kthread contex
    ext4: remove the repeated comment of ext4_trim_all_free
    ext4: add new helper interface ext4_try_to_trim_range()
    ext4: remove the 'group' parameter of ext4_trim_extent
    ...

    Linus Torvalds
     
  • Pull overlayfs update from Miklos Szeredi:

    - Copy up immutable/append/sync/noatime attributes (Amir Goldstein)

    - Improve performance by enabling RCU lookup.

    - Misc fixes and improvements

    The reason this touches so many files is that the ->get_acl() method now
    gets a "bool rcu" argument. The ->get_acl() API was updated based on
    comments from Al and Linus:

    Link: https://lore.kernel.org/linux-fsdevel/CAJfpeguQxpd6Wgc0Jd3ks77zcsAv_bn0q17L3VNnnmPKu11t8A@mail.gmail.com/

    * tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: enable RCU'd ->get_acl()
    vfs: add rcu argument to ->get_acl() callback
    ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
    ovl: use kvalloc in xattr copy-up
    ovl: update ctime when changing fileattr
    ovl: skip checking lower file's i_writecount on truncate
    ovl: relax lookup error on mismatch origin ftype
    ovl: do not set overlay.opaque for new directories
    ovl: add ovl_allow_offline_changes() helper
    ovl: disable decoding null uuid with redirect_dir
    ovl: consistent behavior for immutable/append-only inodes
    ovl: copy up sync/noatime fileattr flags
    ovl: pass ovl_fs to ovl_check_setxattr()
    fs: add generic helper for filling statx attribute flags

    Linus Torvalds
     
  • Pull erofs updates from Gao Xiang:
    "In this cycle, direct I/O and fsdax support for uncompressed files are
    now added in order to avoid double-caching for loop device and VM
    container use cases. All uncompressed cases are now turned into iomap
    infrastructure, which looks much simpler and cleaner.

    In addition, fiemap support is added for both (un)compressed files by
    using iomap infrastructure as well so end users can easily get file
    distribution. We've also added chunk-based uncompressed files support
    for data deduplication as the next step of VM container use cases.

    Summary:

    - support direct I/O for all uncompressed files

    - support fsdax for non-tailpacking regular files

    - use iomap infrastructure for all uncompressed cases

    - support fiemap for both (un)compressed files

    - introduce chunk-based files for chunk deduplication

    - some cleanups"

    * tag 'erofs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
    erofs: fix double free of 'copied'
    erofs: support reading chunk-based uncompressed files
    erofs: introduce chunk-based file on-disk format
    erofs: add fiemap support with iomap
    erofs: add support for the full decompressed length
    erofs: remove the mapping parameter from erofs_try_to_free_cached_page()
    erofs: directly use wrapper erofs_page_is_managed() when shrinking
    erofs: convert all uncompressed cases to iomap
    erofs: dax support for non-tailpacking regular file
    erofs: iomap support for non-tailpacking DIO

    Linus Torvalds
     

01 Sep, 2021

3 commits

  • Pull idmapping documentation updates from Christian Brauner:
    "The bulk of the idmapped work this cycle was adding support for
    idmapped mounts to btrfs.

    While this required the addition of a (simple) new vfs helper all the
    work is going through David Sterba's btrfs tree. It was way simpler to
    do it this way rather than forcing David to coordinate between his
    btrfs and my tree. Plus I don't care who merges it as long as I feel I
    can trust the maintainer and the btrfs folks were really fast and
    helpful in reviewing this work.

    As always, associated with the btrfs port for idmapped mounts is a new
    fstests extension specifically concerned with btrfs ioctls (e.g.
    subvolume creation, deletion etc.) on idmapped mounts which can be
    found in the fstests repo as 5f8179ce8b00 ("btrfs: introduce btrfs
    specific idmapped mounts tests").

    Consequently, this cycle the idmapping pull is boring. It only
    contains documentation updates, specifically about how idmappings and
    idmapped mounts work"

    * tag 'fs.idmapped.v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    doc: give a more thorough id handling explanation

    Linus Torvalds
     
  • Pull fscrypt updates from Eric Biggers:
    "Some small fixes and cleanups for fs/crypto/:

    - Fix ->getattr() for ext4, f2fs, and ubifs to report the correct
    st_size for encrypted symlinks

    - Use base64url instead of a custom Base64 variant

    - Document struct fscrypt_operations"

    * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
    fscrypt: document struct fscrypt_operations
    fscrypt: align Base64 encoding with RFC 4648 base64url
    fscrypt: remove mention of symlink st_size quirk from documentation
    ubifs: report correct st_size for encrypted symlinks
    f2fs: report correct st_size for encrypted symlinks
    ext4: report correct st_size for encrypted symlinks
    fscrypt: add fscrypt_symlink_getattr() for computing st_size

    Linus Torvalds
     
  • Pull initial ksmbd implementation from Steve French:
    "Initial merge of kernel smb3 file server, ksmbd.

    The SMB family of protocols is the most widely deployed network
    filesystem protocol, the default on Windows and Macs (and even on many
    phones and tablets), with clients and servers on all major operating
    systems, but lacked a kernel server for Linux. For many cases the
    current userspace server choices were suboptimal either due to memory
    footprint, performance or difficulty integrating well with advanced
    Linux features.

    ksmbd is a new kernel module which implements the server-side of the
    SMB3 protocol. The target is to provide optimized performance, GPLv2
    SMB server, and better lease handling (distributed caching). The
    bigger goal is to add new features more rapidly (e.g. RDMA aka
    "smbdirect", and recent encryption and signing improvements to the
    protocol) which are easier to develop on a smaller, more tightly
    optimized kernel server than for example in Samba.

    The Samba project is much broader in scope (tools, security services,
    LDAP, Active Directory Domain Controller, and a cross platform file
    server for a wider variety of purposes) but the user space file server
    portion of Samba has proved hard to optimize for some Linux workloads,
    including for smaller devices.

    This is not meant to replace Samba, but rather be an extension to
    allow better optimizing for Linux, and will continue to integrate well
    with Samba user space tools and libraries where appropriate. Working
    with the Samba team we have already made sure that the configuration
    files and xattrs are in a compatible format between the kernel and
    user space server.

    Various types of functional and regression tests are regularly run
    against it. One example is the automated 'buildbot' regression tests
    which use the Linux client to test against ksmbd, e.g.

    http://smb3-test-rhel-75.southcentralus.cloudapp.azure.com/#/builders/8/builds/56

    but other test suites, including Samba's smbtorture functional test
    suite are also used regularly"

    * tag '5.15-rc-first-ksmbd-merge' of git://git.samba.org/ksmbd: (219 commits)
    ksmbd: fix __write_overflow warning in ndr_read_string
    MAINTAINERS: ksmbd: add cifs_common directory to ksmbd entry
    MAINTAINERS: ksmbd: update my email address
    ksmbd: fix permission check issue on chown and chmod
    ksmbd: don't set FILE DELETE and FILE_DELETE_CHILD in access mask by default
    MAINTAINERS: add git adddress of ksmbd
    ksmbd: update SMB3 multi-channel support in ksmbd.rst
    ksmbd: smbd: fix kernel oops during server shutdown
    ksmbd: remove select FS_POSIX_ACL in Kconfig
    ksmbd: use proper errno instead of -1 in smb2_get_ksmbd_tcon()
    ksmbd: update the comment for smb2_get_ksmbd_tcon()
    ksmbd: change int data type to boolean
    ksmbd: Fix multi-protocol negotiation
    ksmbd: fix an oops in error handling in smb2_open()
    ksmbd: add ipv6_addr_v4mapped check to know if connection from client is ipv4
    ksmbd: fix missing error code in smb2_lock
    ksmbd: use channel signingkey for binding SMB2 session setup
    ksmbd: don't set RSS capable in FSCTL_QUERY_NETWORK_INTERFACE_INFO
    ksmbd: Return STATUS_OBJECT_PATH_NOT_FOUND if smb2_creat() returns ENOENT
    ksmbd: fix -Wstringop-truncation warnings
    ...

    Linus Torvalds
     

31 Aug, 2021

3 commits

  • Add documentation about the orphan file feature.

    Reviewed-by: Theodore Ts'o
    Signed-off-by: Jan Kara
    Link: https://lore.kernel.org/r/20210816095713.16537-4-jack@suse.cz
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Pull file locking updates from Jeff Layton:
    "This starts with a couple of fixes for potential deadlocks in the
    fowner/fasync handling.

    The next patch removes the old mandatory locking code from the kernel
    altogether.

    The last patch cleans up rw_verify_area a bit more after the mandatory
    locking removal"

    * tag 'locks-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    fs: clean up after mandatory file locking support removal
    fs: remove mandatory file locking support
    fcntl: fix potential deadlock for &fasync_struct.fa_lock
    fcntl: fix potential deadlocks for &fown_struct.lock

    Linus Torvalds
     
  • Pull fs hole punching vs cache filling race fixes from Jan Kara:
    "Fix races leading to possible data corruption or stale data exposure
    in multiple filesystems when hole punching races with operations such
    as readahead.

    This is the series I was sending for the last merge window but with
    your objection fixed - now filemap_fault() has been modified to take
    invalidate_lock only when we need to create new page in the page cache
    and / or bring it uptodate"

    * tag 'hole_punch_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    filesystems/locking: fix Malformed table warning
    cifs: Fix race between hole punch and page fault
    ceph: Fix race between hole punch and page fault
    fuse: Convert to using invalidate_lock
    f2fs: Convert to using invalidate_lock
    zonefs: Convert to using invalidate_lock
    xfs: Convert double locking of MMAPLOCK to use VFS helpers
    xfs: Convert to use invalidate_lock
    xfs: Refactor xfs_isilocked()
    ext2: Convert to using invalidate_lock
    ext4: Convert to use mapping->invalidate_lock
    mm: Add functions to lock invalidate_lock for two mappings
    mm: Protect operations adding pages to page cache with invalidate_lock
    documentation: Sync file_operations members with reality
    mm: Fix comments mentioning i_mutex

    Linus Torvalds
     

23 Aug, 2021

1 commit

  • We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
    off in Fedora and RHEL8. Several other distros have followed suit.

    I've heard of one problem in all that time: Someone migrated from an
    older distro that supported "-o mand" to one that didn't, and the host
    had a fstab entry with "mand" in it which broke on reboot. They didn't
    actually _use_ mandatory locking so they just removed the mount option
    and moved on.

    This patch rips out mandatory locking support wholesale from the kernel,
    along with the Kconfig option and the Documentation file. It also
    changes the mount code to ignore the "mand" mount option instead of
    erroring out, and to throw a big, ugly warning.

    Signed-off-by: Jeff Layton

    Jeff Layton
     

20 Aug, 2021

1 commit

  • Currently, uncompressed data except for tail-packing inline is
    consecutive on disk.

    In order to support chunk-based data deduplication, add a new
    corresponding inode data layout.

    In the future, the data source of chunks can be either compressed or
    uncompressed.

    Link: https://lore.kernel.org/r/20210820100019.208490-1-hsiangkao@linux.alibaba.com
    Reviewed-by: Liu Bo
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     

19 Aug, 2021

1 commit


18 Aug, 2021

3 commits


17 Aug, 2021

1 commit

  • It is possible that a directory tree is shared between multiple overlay
    instances as a lower layer. In this case when one instance executes a file
    residing on the lower layer, the other instance denies a truncate(2) call
    on this file.

    This only happens for truncate(2) and not for open(2) with the O_TRUNC
    flag.

    Fix this interference and inconsistency by removing the preliminary
    i_writecount check before copy-up.

    This means that unlike on normal filesystems truncate(argv[0]) will now
    succeed. If this ever causes a regression in a real world use case this
    needs to be revisited.

    One way to fix this properly would be to keep a correct i_writecount in the
    overlay inode, but that is difficult due to memory mapping code only
    dealing with the real file/inode.

    Signed-off-by: Chengguang Xu
    Signed-off-by: Miklos Szeredi

    Chengguang Xu
     

13 Aug, 2021

2 commits


11 Aug, 2021

1 commit

  • Currently there's no document explaining how idmappings work at all.
    Add a document that gives an introduction and also goes into a bit more
    detail for more advanced use-cases.

    Link: https://lore.kernel.org/r/20210727104416.828293-1-brauner@kernel.org
    Cc: Seth Forshee
    Cc: Christoph Hellwig
    Cc: Aleksa Sarai
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Christian Brauner

    Christian Brauner
     

10 Aug, 2021

1 commit

  • DAX is quite useful for some VM use cases, since it can save a large
    amount of guest memory when combined with a minimal, lightweight EROFS.

    In order to prepare for such use cases, add preliminary dax support
    for non-tailpacking regular files for now.

    Tested with the DRAM-emulated PMEM and the EROFS image generated by
    "mkfs.erofs -Enoinline_data enwik9.fsdax.img enwik9"

    Link: https://lore.kernel.org/r/20210805003601.183063-3-hsiangkao@linux.alibaba.com
    Cc: nvdimm@lists.linux.dev
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     

04 Aug, 2021

1 commit

  • As James Z reported in bugzilla:

    https://bugzilla.kernel.org/show_bug.cgi?id=213877

    [1.] One-line summary of the problem:
    Mount multiple SMR block devices exceed certain number cause system non-response

    [2.] Full description of the problem/report:
    Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
    Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
    The number of SMR devices with other FS mounted on this system does not interfere with the result above.

    [3.] Keywords (i.e., modules, networking, kernel):
    F2FS, SMR, Memory

    [4.] Kernel information
    [4.1.] Kernel version (uname -a):
    Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

    [4.2.] Kernel .config file:
    Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64

    [5.] Most recent kernel version which did not have the bug:
    None

    [6.] Output of Oops.. message (if applicable) with symbolic information
    resolved (see Documentation/admin-guide/oops-tracing.rst)
    None

    [7.] A small shell script or example program which triggers the
    problem (if possible)
    mount /dev/sdX /mnt/0X

    [8.] Memory consumption

    With 24 * 14T SMR Block device with F2FS
    free -g
                  total        used        free      shared  buff/cache   available
    Mem:             46          36           0           0          10          10
    Swap:             0           0           0

    With 3 * 14T SMR Block device with F2FS
    free -g
                  total        used        free      shared  buff/cache   available
    Mem:              7           5           0           0           1           1
    Swap:             7           0           7

    The root cause is that there are three bitmaps:
    - cur_valid_map
    - ckpt_valid_map
    - discard_map
    and each of them costs ~500MB of memory. {cur, ckpt}_valid_map are
    necessary, but discard_map is optional, since this bitmap is only
    useful on a mountpoint where small discard is enabled.
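
    A back-of-the-envelope check of the per-bitmap cost, as a sketch under
    stated assumptions: a 4096-byte block size, one bit per block in each
    bitmap, and a 14 TB drive counted in decimal terabytes. The helper name
    bitmap_bytes() is hypothetical, and the figures are only meant to show
    the order of magnitude.

```python
BLOCK_SIZE = 4096            # assumed f2fs block size in bytes
DEVICE_BYTES = 14 * 10**12   # 14 TB drive from the report (decimal TB assumed)


def bitmap_bytes(device_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Size of a bitmap holding one bit per block, rounded up to whole bytes."""
    blocks = device_bytes // block_size
    return (blocks + 7) // 8


per_bitmap = bitmap_bytes(DEVICE_BYTES)
all_three = 3 * per_bitmap         # cur_valid_map + ckpt_valid_map + discard_map
without_discard = 2 * per_bitmap   # discard_map left out

print(f"one bitmap:      {per_bitmap / 10**6:.0f} MB")
print(f"three bitmaps:   {all_three / 10**6:.0f} MB")
print(f"without discard: {without_discard / 10**6:.0f} MB")
```

    Dropping discard_map saves one bitmap's worth of memory per mounted
    device, which is roughly a third of the total under these assumptions.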

    For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
    discard for a section(zone) when all blocks of that section are invalid,
    so, for such device, we don't need small discard functionality at all.

    This patch introduces a new mount option,
    "discard_unit=block|segment|section", to support issuing discards with
    a different basic unit, aligned to a block, segment or section, so that
    the user can specify "discard_unit=segment" or "discard_unit=section"
    to disable small discard functionality.

    Note that this mount option cannot be changed by remount(), because the
    related metadata needs to be initialized during mount().

    In order to save memory, let's use "discard_unit=section" for blkzoned
    devices by default.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

28 Jul, 2021

1 commit

  • Update the bottom border to be the same as the top border.

    Documentation/filesystems/locking.rst:274: WARNING: Malformed table.
    Bottom/header table border does not match top border.

    Fixes: 730633f0b7f9 ("mm: Protect operations adding pages to page cache with invalidate_lock")
    Link: https://lore.kernel.org/r/20210727232212.12510-1-rdunlap@infradead.org
    Signed-off-by: Randy Dunlap
    Reviewed-by: Darrick J. Wong
    Cc: Darrick J. Wong
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Jan Kara

    Randy Dunlap
     

26 Jul, 2021

2 commits

  • fscrypt uses a Base64 encoding to encode no-key filenames (the filenames
    that are presented to userspace when a directory is listed without its
    encryption key). There are many variants of Base64, but the most common
    ones are specified by RFC 4648. fscrypt can't use the regular RFC 4648
    "base64" variant because "base64" uses the '/' character, which isn't
    allowed in filenames. However, RFC 4648 also specifies a "base64url"
    variant for use in URLs and filenames. "base64url" is less common than
    "base64", but it's still implemented in many programming libraries.

    Unfortunately, what fscrypt actually uses is a custom Base64 variant
    that differs from "base64url" in several ways:

    - The binary data is divided into 6-bit chunks differently.

    - Values 62 and 63 are encoded with '+' and ',' instead of '-' and '_'.

    - '='-padding isn't used. This isn't a problem per se, as the padding
    isn't technically necessary, and RFC 4648 doesn't strictly require it.
    But it needs to be properly documented.

    There have been two attempts to copy the fscrypt Base64 code into lib/
    (https://lkml.kernel.org/r/20200821182813.52570-6-jlayton@kernel.org and
    https://lkml.kernel.org/r/20210716110428.9727-5-hare@suse.de), and both
    have been caught up by the fscrypt Base64 variant being nonstandard and
    not properly documented. Also, the planned use of the fscrypt Base64
    code in the CephFS storage back-end will prevent it from being changed
    later (whereas currently it can still be changed), so we need to choose
    an encoding that we're happy with before it's too late.

    Therefore, switch the fscrypt Base64 variant to base64url, in order to
    align more closely with RFC 4648 and other implementations and uses of
    Base64. However, I opted not to implement '='-padding, as '='-padding
    adds complexity, is unnecessary, and isn't required by the RFC.
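
    For readers who want to see what unpadded base64url looks like, here
    is a minimal userspace sketch using Python's standard base64 module.
    It is not the kernel code, and it only shows the encoding itself, not
    how fscrypt builds a no-key filename. The function names are
    hypothetical.

```python
import base64


def b64url_nopad_encode(raw: bytes) -> str:
    """Encode bytes as RFC 4648 base64url with the '=' padding removed."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_nopad_decode(text: str) -> bytes:
    """Decode unpadded base64url by restoring the padding first."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


# The bytes 0xfb 0xff contain the 6-bit values 62 and 63, which regular
# "base64" writes as '+' and '/', and "base64url" writes as '-' and '_'.
sample = b"\xfb\xff"
standard = base64.b64encode(sample).decode("ascii")
urlsafe = b64url_nopad_encode(sample)
```

    Here the regular variant yields "+/8=", which contains a '/', while
    the unpadded base64url form is "-_8" and is safe to use as a filename
    component.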

    Link: https://lore.kernel.org/r/20210718000125.59701-1-ebiggers@kernel.org
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Eric Biggers

    Eric Biggers
     
  • Now that the correct st_size is reported for encrypted symlinks on all
    filesystems, update the documentation accordingly.

    Link: https://lore.kernel.org/r/20210702065350.209646-6-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
     

18 Jul, 2021

1 commit

  • The documentation was not updated when the script was renamed in commit
    80e715a06c2d ("initramfs: rename gen_initramfs_list.sh to
    gen_initramfs.sh"). Fix this.

    Basically does:

    $ sed -i -e s/gen_initramfs_list.sh/gen_initramfs.sh/g $(git grep -l gen_initramfs_list.sh)

    Fixes: 80e715a06c2d ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh")
    Signed-off-by: Robert Richter
    Signed-off-by: Masahiro Yamada

    Robert Richter
     

13 Jul, 2021

2 commits

  • Currently, serializing operations such as page fault, read, or readahead
    against hole punching is rather difficult. The basic race scheme looks
    like:

    fallocate(FALLOC_FL_PUNCH_HOLE)         read / fault / ..
      truncate_inode_pages_range()

    The problem is that, this way, read / page fault / readahead can
    instantiate pages in the page cache with potentially stale data (if
    blocks get quickly reused). Avoiding this race is not simple: page
    locks do not work because we want to make sure there are *no* pages in
    the given range. inode->i_rwsem does not work because page faults
    happen under mmap_sem, which ranks below inode->i_rwsem. Also, using it
    for reads makes the performance of mixed read-write workloads suffer.

    So create a new rw_semaphore in the address_space - invalidate_lock -
    that protects adding of pages to page cache for page faults / reads /
    readahead.
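
    The locking pattern can be modelled in userspace. The sketch below is
    a toy illustration, not kernel code: the classes, the dict-based "page
    cache" and the simple reader-writer lock (which makes no fairness
    guarantees) are all hypothetical. Readers take the lock shared while
    they add pages; hole punching takes it exclusive while it drops pages
    and frees blocks.

```python
import threading
from contextlib import contextmanager


class InvalidateLock:
    """Minimal reader-writer lock; writers may starve in this toy model."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


class ToyFile:
    def __init__(self, blocks):
        self.blocks = dict(blocks)   # block index -> data "on disk"
        self.page_cache = {}         # block index -> cached data
        self.invalidate_lock = InvalidateLock()

    def read(self, index):
        # Adding a page to the cache happens under the shared lock.
        with self.invalidate_lock.shared():
            if index not in self.page_cache:
                self.page_cache[index] = self.blocks.get(index, b"\0")
            return self.page_cache[index]

    def punch_hole(self, start, end):
        # No reader can instantiate pages while the range is torn down.
        with self.invalidate_lock.exclusive():
            for i in range(start, end):
                self.page_cache.pop(i, None)
                self.blocks.pop(i, None)


f = ToyFile({0: b"a", 1: b"b"})
before = f.read(1)
f.punch_hole(1, 2)
after = f.read(1)
```

    Because the cache entry and the block are removed under the exclusive
    lock, a later read of the punched range sees the hole rather than a
    stale cached page.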

    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Sync listing of struct file_operations members with the real one in
    fs.h.

    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara