09 Feb, 2020

1 commit

  • Pull misc vfs updates from Al Viro:

    - bmap series from cmaiolino

    - getting rid of convolutions in copy_mount_options() (use a couple of
    copy_from_user() instead of the __get_user() crap)

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    saner copy_mount_options()
    fibmap: Reject negative block numbers
    fibmap: Use bmap instead of ->bmap method in ioctl_fibmap
    ecryptfs: drop direct calls to ->bmap
    cachefiles: drop direct usage of ->bmap method.
    fs: Enable bmap() function to properly return errors

    Linus Torvalds
     

05 Feb, 2020

1 commit

  • Pull vfs timestamp updates from Al Viro:
    "More 64bit timestamp work"

    * 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    kernfs: don't bother with timestamp truncation
    fs: Do not overload update_time
    fs: Delete timespec64_trunc()
    fs: ubifs: Eliminate timespec64_trunc() usage
    fs: ceph: Delete timespec64_trunc() usage
    fs: cifs: Delete usage of timespec64_trunc
    fs: fat: Eliminate timespec64_trunc() usage
    utimes: Clamp the timestamps in notify_change()

    Linus Torvalds
     

04 Feb, 2020

1 commit

  • 'PTR_ERR(p) == -E*' is a stronger condition than IS_ERR(p).
    Hence, IS_ERR(p) is unneeded.

    The semantic patch that generates this commit is as follows:

    //
    @@
    expression ptr;
    constant error_code;
    @@
    -IS_ERR(ptr) && (PTR_ERR(ptr) == - error_code)
    +PTR_ERR(ptr) == - error_code
    //
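
The equivalence behind this semantic patch can be sketched in userspace. The helpers below are simplified stand-ins modelled on the kernel's <linux/err.h> (the 4095 bound and the casts are assumptions of this sketch, not copied from the tree); `is_enoent_old()` and `is_enoent_new()` are hypothetical names for the two forms of the test.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel helpers; assumption of this sketch. */
#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Decode a pointer back into a (possibly meaningless) long. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the top MAX_ERRNO addresses. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Form removed by the semantic patch. */
static bool is_enoent_old(const void *p)
{
	return IS_ERR(p) && PTR_ERR(p) == -ENOENT;
}

/* Form kept by the semantic patch: equality with -ENOENT already implies IS_ERR(). */
static bool is_enoent_new(const void *p)
{
	return PTR_ERR(p) == -ENOENT;
}
```

Both forms agree for error pointers, valid pointers and NULL, because a pointer whose value equals a specific negative errno is necessarily inside the error range.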

    Link: http://lkml.kernel.org/r/20200106045833.1725-1-masahiroy@kernel.org
    Signed-off-by: Masahiro Yamada
    Cc: Julia Lawall
    Acked-by: Stephen Boyd [drivers/clk/clk.c]
    Acked-by: Bartosz Golaszewski [GPIO]
    Acked-by: Wolfram Sang [drivers/i2c]
    Acked-by: Rafael J. Wysocki [acpi/scan.c]
    Acked-by: Rob Herring
    Cc: Eric Biggers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

03 Feb, 2020

1 commit

  • Until now, bmap() has returned either the physical block number for
    the requested file offset, or 0 both on error and when the requested
    offset maps into a hole.
    This patch makes the changes needed for bmap() to properly return
    errors: the return value now carries only the error status, and a
    pointer must be passed to bmap() to be filled with the mapped physical
    block.

    It changes the behavior of bmap() on return:

    - negative value in case of error
    - zero on success or when the mapping falls into a hole

    In case of a hole, *block will be zero too.

    Since this is a prep patch, for now the only error return is -EINVAL if
    ->bmap doesn't exist.
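
A minimal userspace sketch of the calling convention described above. `struct model_inode`, `model_bmap()` and `toy_bmap()` are hypothetical names, and passing the logical block in through the same pointer that receives the physical block is an assumption of this sketch rather than something the message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-in for an inode with an optional ->bmap method. */
struct model_inode {
	/* Returns the physical block for a logical block, or 0 for a hole. */
	sector_t (*bmap)(const struct model_inode *inode, sector_t logical);
};

/*
 * Return value carries only the error status; *block is the logical block
 * on entry (assumption) and the mapped physical block on successful return.
 */
static int model_bmap(const struct model_inode *inode, sector_t *block)
{
	assert(inode != NULL && block != NULL);
	if (!inode->bmap)
		return -EINVAL;		/* the only error in the prep patch */
	*block = inode->bmap(inode, *block);
	return 0;			/* success, including the hole case */
}

/* Toy mapping: even logical blocks are holes, odd ones map to logical + 1000. */
static sector_t toy_bmap(const struct model_inode *inode, sector_t logical)
{
	(void)inode;
	return (logical % 2) ? logical + 1000 : 0;
}
```

A caller can now tell "no ->bmap method" (negative return) apart from "hole" (zero return with *block == 0), which the old single return value could not express.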

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Carlos Maiolino
    Signed-off-by: Al Viro

    Carlos Maiolino
     

31 Jan, 2020

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this series, we've implemented transparent compression
    experimentally. It supports LZO and LZ4, but will add more later as we
    investigate in the field more.

    At this point, the feature doesn't expose compressed space to user
    directly in order to guarantee potential data updates later to the
    space. Instead, the main goal is to reduce data writes to flash disk
    as much as possible, resulting in extending disk life time as well as
    relaxing IO congestion.

    Alternatively, we're also considering to add ioctl() to reclaim
    compressed space and show it to user after putting the immutable bit.

    Enhancements:
    - add compression support
    - avoid unnecessary locks in quota ops
    - harden power-cut scenario for zoned block devices
    - use private bio_set to avoid IO congestion
    - replace GC mutex with rwsem to serialize callers

    Bug fixes:
    - fix dentry consistency and memory corruption in rename()'s error case
    - fix wrong swap extent reports
    - fix casefolding bugs
    - change lock coverage to avoid deadlock
    - avoid GFP_KERNEL under f2fs_lock_op

    And, we've cleaned up sysfs entries to prepare no debugfs"

    * tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (31 commits)
    f2fs: fix race conditions in ->d_compare() and ->d_hash()
    f2fs: fix dcache lookup of !casefolded directories
    f2fs: Add f2fs stats to sysfs
    f2fs: delete duplicate information on sysfs nodes
    f2fs: change to use rwsem for gc_mutex
    f2fs: update f2fs document regarding to fsync_mode
    f2fs: add a way to turn off ipu bio cache
    f2fs: code cleanup for f2fs_statfs_project()
    f2fs: fix miscounted block limit in f2fs_statfs_project()
    f2fs: show the CP_PAUSE reason in checkpoint traces
    f2fs: fix deadlock allocating bio_post_read_ctx from mempool
    f2fs: remove unneeded check for error allocating bio_post_read_ctx
    f2fs: convert inline_dir early before starting rename
    f2fs: fix memleak of kobject
    f2fs: fix to add swap extent correctly
    f2fs: run fsck when getting bad inode during GC
    f2fs: support data compression
    f2fs: free sysfs kobject
    f2fs: declare nested quota_sem and remove unnecessary sems
    f2fs: don't put new_page twice in f2fs_rename
    ...

    Linus Torvalds
     

29 Jan, 2020

1 commit

  • Pull fsverity updates from Eric Biggers:

    - Optimize fs-verity sequential read performance by implementing
    readahead of Merkle tree pages. This allows the Merkle tree to be
    read in larger chunks.

    - Optimize FS_IOC_ENABLE_VERITY performance in the uncached case by
    implementing readahead of data pages.

    - Allocate the hash requests from a mempool in order to eliminate the
    possibility of allocation failures during I/O.

    * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
    fs-verity: use u64_to_user_ptr()
    fs-verity: use mempool for hash requests
    fs-verity: implement readahead of Merkle tree pages
    fs-verity: implement readahead for FS_IOC_ENABLE_VERITY

    Linus Torvalds
     

25 Jan, 2020

2 commits

  • Since ->d_compare() and ->d_hash() can be called in RCU-walk mode,
    ->d_parent and ->d_inode can be concurrently modified, and in
    particular, ->d_inode may be changed to NULL. For f2fs_d_hash() this
    resulted in a reproducible NULL dereference if a lookup is done in a
    directory being deleted, e.g. with:

    int main()
    {
            if (fork()) {
                    for (;;) {
                            mkdir("subdir", 0700);
                            rmdir("subdir");
                    }
            } else {
                    for (;;)
                            access("subdir/file", 0);
            }
    }

    ... or by running the 't_encrypted_d_revalidate' program from xfstests.
    Both repros work in any directory on a filesystem with the encoding
    feature, even if the directory doesn't actually have the casefold flag.

    I couldn't reproduce a crash in f2fs_d_compare(), but it appears that a
    similar crash is possible there.

    Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and
    falling back to the case sensitive behavior if the inode is NULL.
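
The shape of this fix can be sketched in userspace as follows. This is not the f2fs code: `TOY_READ_ONCE()` is a simplified volatile-read macro that relies on the GCC/Clang `__typeof__` extension, and `struct toy_inode`, `struct toy_dentry` and `toy_use_casefold()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified single-read helper (assumption: GCC/Clang __typeof__). */
#define TOY_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct toy_inode {
	bool casefolded;
};

struct toy_dentry {
	struct toy_inode *d_inode;	/* may be set to NULL concurrently */
};

/*
 * Decide whether to use case-insensitive handling. The pointer is read
 * exactly once into a local, so the NULL check and the later dereference
 * are guaranteed to see the same value.
 */
static bool toy_use_casefold(struct toy_dentry *parent)
{
	struct toy_inode *inode = TOY_READ_ONCE(parent->d_inode);

	if (!inode || !inode->casefolded)
		return false;	/* fall back to case-sensitive behavior */
	return true;
}
```

Reading the field twice (once for the check, once for the dereference) is what leaves a window for a NULL dereference when another thread clears it in between.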

    Reported-by: Al Viro
    Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups")
    Cc: # v5.4+
    Signed-off-by: Eric Biggers
    Signed-off-by: Jaegeuk Kim

    Eric Biggers
     
  • Do the name comparison for non-casefolded directories correctly.

    This is analogous to ext4's commit 66883da1eee8 ("ext4: fix dcache
    lookup of !casefolded directories").

    Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups")
    Cc: # v5.4+
    Signed-off-by: Eric Biggers
    Signed-off-by: Jaegeuk Kim

    Eric Biggers
     

24 Jan, 2020

1 commit

  • Currently f2fs stats are only available from /d/f2fs/status. This patch
    adds some of the f2fs stats to sysfs so that they are accessible even
    when debugfs is not mounted.

    The following sysfs nodes are added:
    -/sys/fs/f2fs//free_segments
    -/sys/fs/f2fs//cp_foreground_calls
    -/sys/fs/f2fs//cp_background_calls
    -/sys/fs/f2fs//gc_foreground_calls
    -/sys/fs/f2fs//gc_background_calls
    -/sys/fs/f2fs//moved_blocks_foreground
    -/sys/fs/f2fs//moved_blocks_background
    -/sys/fs/f2fs//avg_vblocks

    Signed-off-by: Hridya Valsaraju
    [Jaegeuk Kim: allow STAT_FS without DEBUG_FS]
    Signed-off-by: Jaegeuk Kim

    Hridya Valsaraju
     

18 Jan, 2020

12 commits

  • A mutex does not serialize its callers fairly; to avoid starving an
    unlucky caller, let's use an rwsem instead.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch adds missing fsync_mode entry in f2fs document.

    Fixes: 04485987f053 ("f2fs: introduce async IPU policy")
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Setting 0x40 in /sys/fs/f2fs/dev/ipu_policy gives a way to turn off the
    bio cache, which is useful for checking whether a block layer using a
    hardware encryption engine merges IOs correctly.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Call min_not_zero() to simplify the complicated prjquota
    limit comparison in f2fs_statfs_project().
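
A minimal sketch of the comparison this simplifies. `min_not_zero_u64()` is a userspace equivalent of the kernel's min_not_zero() macro and `effective_limit()` is a hypothetical helper; a limit of 0 is taken to mean "not set".

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two values, ignoring a value of zero. */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Effective project quota limit from soft and hard limits (0 = unset). */
static uint64_t effective_limit(uint64_t soft, uint64_t hard)
{
	return min_not_zero_u64(soft, hard);
}
```

One call covers all four cases: neither limit set, only soft, only hard, and both set in either order.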

    Signed-off-by: Chengguang Xu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chengguang Xu
     
  • statfs calculates Total/Used/Avail disk space in block units,
    so we should translate the soft/hard prjquota limits to block units
    as well.

    The test results below show that the block/inode numbers of
    Total/Used/Avail from the df command are all correct after
    applying this patch.

    [root@localhost quota-tools]# ./repquota -P /dev/sdb1
    *** Report for project quotas on device /dev/sdb1
    Block grace time: 7days; Inode grace time: 7days
                            Block limits                File limits
    Project         used    soft    hard  grace    used  soft  hard  grace
    -----------------------------------------------------------
    #0        --       4       0       0              1     0     0
    #101      --       0       0       0              2     0     0
    #102      --       0   10240       0              2    10     0
    #103      --       0       0   20480              2     0    20
    #104      --       0   10240   20480              2    10    20
    #105      --       0   20480   10240              2    20    10

    [root@localhost sdb1]# lsattr -p t{1,2,3,4,5}
    101 ----------------N-- t1/a1
    102 ----------------N-- t2/a2
    103 ----------------N-- t3/a3
    104 ----------------N-- t4/a4
    105 ----------------N-- t5/a5

    [root@localhost sdb1]# df -hi t{1,2,3,4,5}
    Filesystem     Inodes IUsed IFree IUse% Mounted on
    /dev/sdb1        2.4M    21  2.4M    1% /mnt/sdb1
    /dev/sdb1          10     2     8   20% /mnt/sdb1
    /dev/sdb1          20     2    18   10% /mnt/sdb1
    /dev/sdb1          10     2     8   20% /mnt/sdb1
    /dev/sdb1          10     2     8   20% /mnt/sdb1

    [root@localhost sdb1]# df -h t{1,2,3,4,5}
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1        10G  489M  9.6G   5% /mnt/sdb1
    /dev/sdb1        10M     0   10M   0% /mnt/sdb1
    /dev/sdb1        20M     0   20M   0% /mnt/sdb1
    /dev/sdb1        10M     0   10M   0% /mnt/sdb1
    /dev/sdb1        10M     0   10M   0% /mnt/sdb1

    Fixes: 909110c060f2 ("f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()")
    Signed-off-by: Chengguang Xu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chengguang Xu
     
  • Without any form of coordination, any case where multiple allocations
    from the same mempool are needed at a time to make forward progress can
    deadlock under memory pressure.

    This is the case for struct bio_post_read_ctx, as one can be allocated
    to decrypt a Merkle tree page during fsverity_verify_bio(), which itself
    is running from a post-read callback for a data bio which has its own
    struct bio_post_read_ctx.

    Fix this by freeing the first bio_post_read_ctx before calling
    fsverity_verify_bio(). This works because verity (if enabled) is always
    the last post-read step.

    This deadlock can be reproduced by trying to read from an encrypted
    verity file after reducing NUM_PREALLOC_POST_READ_CTXS to 1 and patching
    mempool_alloc() to pretend that pool->alloc() always fails.

    Note that since NUM_PREALLOC_POST_READ_CTXS is actually 128, to actually
    hit this bug in practice would require reading from lots of encrypted
    verity files at the same time. But it's theoretically possible, as N
    available objects doesn't guarantee forward progress when > N/2 threads
    each need 2 objects at a time.

    Fixes: 95ae251fe828 ("f2fs: add fs-verity support")
    Signed-off-by: Eric Biggers
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Eric Biggers
     
  • Since allocating an object from a mempool never fails when
    __GFP_DIRECT_RECLAIM (which is included in GFP_NOFS) is set, the check
    for failure to allocate a bio_post_read_ctx is unnecessary. Remove it.

    Signed-off-by: Eric Biggers
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Eric Biggers
     
  • If we hit an error during rename, we'll get two dentries in different
    directories.

    Chao adds to check the room in inline_dir which can avoid needless
    inversion. This should be done by inode_lock(&old_dir).

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • If kobject_init_and_add() fails, the caller needs to invoke kobject_put()
    to release the kobject explicitly.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Youling reported in mailing list:

    https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/

    https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/

    There is a test case that can corrupt an f2fs image:
    - dd if=/dev/zero of=/swapfile bs=1M count=4096
    - chmod 600 /swapfile
    - mkswap /swapfile
    - swapon --discard /swapfile

    The root cause is that f2fs_swap_activate() intends to return zero
    to setup_swap_extents() to enable SWP_FS mode (swap file goes through
    fs). In this flow, setup_swap_extents() sets up the swap extent with a
    wrong block address range, resulting in discard_swap() erasing the
    wrong addresses.

    Because f2fs_swap_activate() has pinned the swapfile, its data block
    addresses will not change, so it is safe to let swap handle IO through
    the raw device. We can therefore get rid of SWP_FS mode and initialize
    the swap extents inside f2fs_swap_activate(); this way, a later
    discard_swap() trims the right address range.

    Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This is to avoid infinite GC when trying to disable checkpoint.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch tries to support compression in f2fs.

    - A new term, cluster, is defined as the basic unit of compression; a file
    can be divided into multiple clusters logically. One cluster includes
    4 << n (n >= 0) logical pages, the compression size is also the cluster
    size, and each cluster can be compressed or not.

    - In the cluster metadata layout, one special flag indicates whether a
    cluster is a compressed one or a normal one. For a compressed cluster,
    the following metadata maps the cluster to [1, (4 << n) - 1] physical
    blocks, in which f2fs stores data including the compress header and the
    compressed data.

    - In order to eliminate write amplification during overwrite, F2FS only
    supports compression on write-once files; data can be compressed only
    when all logical blocks in file are valid and the cluster compress ratio
    is lower than the specified threshold.

    - To enable compression on regular inode, there are three ways:
    * chattr +c file
    * chattr +c dir; touch dir/file
    * mount w/ -o compress_extension=ext; touch file.ext
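
The cluster arithmetic from the first two points above can be sketched as follows. All function names are hypothetical, and the bound on n is an arbitrary limit chosen for this sketch, not an f2fs constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Logical pages per cluster, following the "4 << n (n >= 0)" rule. */
static uint32_t cluster_pages(unsigned int n)
{
	assert(n < 16);	/* arbitrary sketch bound, keeps the shift well-defined */
	return 4u << n;
}

/* Index of the cluster that contains a given logical page. */
static uint64_t cluster_index(uint64_t page_index, unsigned int n)
{
	return page_index / cluster_pages(n);
}

/*
 * A compressed cluster occupies between 1 and cluster_pages(n) - 1 physical
 * blocks, so the compressed form must save at least one block.
 */
static bool fits_compressed_cluster(uint32_t compressed_blocks, unsigned int n)
{
	return compressed_blocks >= 1 &&
	       compressed_blocks <= cluster_pages(n) - 1;
}
```

With n = 0 a cluster is 4 pages, so a compressed cluster may use 1 to 3 physical blocks; a result that still needs 4 blocks is stored as a normal cluster instead.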

    Compress metadata layout:
    [Dnode Structure]
    +-----------------------------------------------+
    | cluster 1 | cluster 2 | ......... | cluster N |
    +-----------------------------------------------+
    .           .                       .           .
    .                       .                .                      .
    .         Compressed Cluster       .        .        Normal Cluster            .
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
    |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
               .                             .
             .                                           .
           .                                                           .
          +-------------+-------------+----------+----------------------------+
          | data length | data chksum | reserved |      compressed data       |
          +-------------+-------------+----------+----------------------------+

    Changelog:

    20190326:
    - fix error handling of read_end_io().
    - remove unneeded comments in f2fs_encrypt_one_page().

    20190327:
    - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
    - don't jump into loop directly to avoid uninitialized variables.
    - add TODO tag in error path of f2fs_write_cache_pages().

    20190328:
    - fix wrong merge condition in f2fs_read_multi_pages().
    - check compressed file in f2fs_post_read_required().

    20190401
    - allow overwrite on non-compressed cluster.
    - check cluster meta before writing compressed data.

    20190402
    - don't preallocate blocks for compressed file.

    - add lz4 compress algorithm
    - process multiple post read works in one workqueue
    Now f2fs supports processing post read work in multiple workqueue,
    it shows low performance due to schedule overhead of multiple
    workqueue executing orderly.

    20190921
    - compress: support buffered overwrite
    C: compress cluster flag
    V: valid block address
    N: NEW_ADDR

    One cluster contains 4 blocks

    before overwrite      after overwrite

    - VVVV            ->  CVNN
    - CVNN            ->  VVVV

    - CVNN            ->  CVNN
    - CVNN            ->  CVVV

    - CVVV            ->  CVNN
    - CVVV            ->  CVVV

    20191029
    - add kconfig F2FS_FS_COMPRESSION to isolate compression related
    codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
    note that: will remove lzo backend if Jaegeuk agreed that too.
    - update codes according to Eric's comments.

    20191101
    - apply fixes from Jaegeuk

    20191113
    - apply fixes from Jaegeuk
    - split workqueue for fsverity

    20191216
    - apply fixes from Jaegeuk

    20200117
    - fix to avoid NULL pointer dereference

    [Jaegeuk Kim]
    - add tracepoint for f2fs_{,de}compress_pages()
    - fix many bugs and add some compression stats
    - fix overwrite/mmap bugs
    - address 32bit build error, reported by Geert.
    - bug fixes when handling errors and i_compressed_blocks

    Reported-by:
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

16 Jan, 2020

9 commits

  • Detected kmemleak.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • 1.
    f2fs_quota_sync
    -> down_read(&sbi->quota_sem)
    -> dquot_writeback_dquots
    -> f2fs_dquot_commit
    -> down_read(&sbi->quota_sem)

    2.
    f2fs_quota_sync
    -> down_read(&sbi->quota_sem)
    -> f2fs_write_data_pages
    -> f2fs_write_single_data_page
    -> down_write(&F2FS_I(inode)->i_sem)

    f2fs_mkdir
    -> f2fs_do_add_link
    -> down_write(&F2FS_I(inode)->i_sem)
    -> f2fs_init_inode_metadata
    -> f2fs_new_node_page
    -> dquot_alloc_inode
    -> f2fs_dquot_mark_dquot_dirty
    -> down_read(&sbi->quota_sem)

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • In f2fs_rename(), new_page is gone after f2fs_set_link(), but the code
    tries to put it again when the whiteout step fails and jumps to
    put_out_dir.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch moves setting I_LINKABLE early in rename2(whiteout) to avoid the
    below warning.

    [ 3189.163385] WARNING: CPU: 3 PID: 59523 at fs/inode.c:358 inc_nlink+0x32/0x40
    [ 3189.246979] Call Trace:
    [ 3189.248707] f2fs_init_inode_metadata+0x2d6/0x440 [f2fs]
    [ 3189.251399] f2fs_add_inline_entry+0x162/0x8c0 [f2fs]
    [ 3189.254010] f2fs_add_dentry+0x69/0xe0 [f2fs]
    [ 3189.256353] f2fs_do_add_link+0xc5/0x100 [f2fs]
    [ 3189.258774] f2fs_rename2+0xabf/0x1010 [f2fs]
    [ 3189.261079] vfs_rename+0x3f8/0xaa0
    [ 3189.263056] ? tomoyo_path_rename+0x44/0x60
    [ 3189.265283] ? do_renameat2+0x49b/0x550
    [ 3189.267324] do_renameat2+0x49b/0x550
    [ 3189.269316] __x64_sys_renameat2+0x20/0x30
    [ 3189.271441] do_syscall_64+0x5a/0x230
    [ 3189.273410] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 3189.275848] RIP: 0033:0x7f270b4d9a49

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • META_MAPPING is used to move blocks for both encrypted and verity files.
    So the META_MAPPING invalidation condition in do_checkpoint() should
    consider verity too, not just encrypt.

    Signed-off-by: Eric Biggers
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Eric Biggers
     
  • In a low-memory scenario, we can allocate multiple bios without
    submitting any of them.

    - f2fs_write_checkpoint()
     - block_operations()
      - f2fs_sync_node_pages()
       step 1) flush cold nodes, allocate new bio from mempool
       - bio_alloc()
        - mempool_alloc()
       step 2) flush hot nodes, allocate a bio from mempool
       - bio_alloc()
        - mempool_alloc()
       step 3) flush warm nodes, be stuck in below call path
       - bio_alloc()
        - mempool_alloc()
         - loop to wait mempool element release, as we only
           reserved memory for two bio allocation, however above
           allocated two bios may never be submitted.

    So we need to avoid using the default bioset. This patch introduces a
    private bioset whose mempool element count is enlarged to the total
    number of log headers, so that we have enough reserved memory pool
    elements when allocating/holding multiple bios.

    Signed-off-by: Gao Xiang
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Remove the duplicate sbi->aw_cnt stats counter that tracks
    the number of atomic files currently opened (it also sometimes shows
    an incorrect value). Use the more reliable sbi->atomic_files
    to show in the stats.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • To catch f2fs bugs in the write pointer handling code for zoned block
    devices, check write pointers of non-open zones that current segments do
    not point to. Do this check at mount time, after the fsync data recovery
    and the current segments' write pointer consistency fix. Or, when fsync
    data recovery is disabled by mount option, do the check when there is no
    fsync data.

    Check two items, comparing write pointers with the valid block maps in
    SIT. The first item is a check for zones with no valid blocks. When there
    are no valid blocks in a zone, the write pointer should be at the start
    of the zone. If not, the next write operation to the zone will cause an
    unaligned write error. If the write pointer is not at the zone start,
    reset the write pointer to place it at the zone start.

    The second item is a check between the write pointer position and the
    last valid block in the zone. It is unexpected for the last valid block
    position to be beyond the write pointer. In such a case, report it as a
    bug. A fix is not required for such a zone, because the zone is not
    selected for the next write operation until the zone gets discarded.
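
The two checks above can be sketched as a small decision function. `struct toy_zone`, `enum toy_wp_action` and `toy_check_zone()` are hypothetical names, not f2fs code, and treating a valid block located exactly at the write pointer as "beyond" it is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_wp_action {
	TOY_WP_OK,		/* nothing to do */
	TOY_WP_RESET,		/* reset the write pointer to the zone start */
	TOY_WP_REPORT_BUG,	/* valid block beyond the write pointer */
};

struct toy_zone {
	uint64_t start;		/* first block of the zone */
	uint64_t wp;		/* write pointer: next block to be written */
	bool has_valid;		/* does SIT record any valid block in the zone? */
	uint64_t last_valid;	/* last valid block, only used if has_valid */
};

static enum toy_wp_action toy_check_zone(const struct toy_zone *z)
{
	assert(z->wp >= z->start);

	/* First item: a zone with no valid blocks must have wp at its start. */
	if (!z->has_valid)
		return z->wp == z->start ? TOY_WP_OK : TOY_WP_RESET;

	/* Second item: the last valid block must sit before the write pointer. */
	if (z->last_valid >= z->wp)
		return TOY_WP_REPORT_BUG;

	return TOY_WP_OK;
}
```

Only the first case is repaired; the second is reported without a fix, matching the reasoning that such a zone is not written again until it is discarded.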

    Signed-off-by: Shin'ichiro Kawasaki
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Shin'ichiro Kawasaki
     
  • On sudden f2fs shutdown, write pointers of zoned block devices can go
    further, but f2fs metadata keeps current segments at positions before the
    write operations. After remounting the f2fs, this inconsistency causes
    write operations that are not at write pointers, and an "Unaligned write
    command" error is reported.

    To avoid the error, compare current segments with the write pointers of
    the open zones that the current segments point to during the mount
    operation. If the write pointer position is not aligned with the current
    segment position, assign a new zone to the current segment. Also check
    that the newly assigned zone has its write pointer at the zone start. If
    not, reset the write pointer of the zone.

    Perform the consistency check during fsync recovery. So as not to lose
    the fsync data, do the check after fsync data gets restored and before
    the checkpoint commit, which flushes data at current segment positions.
    So as not to conflict with kworker's dirty data/node flush, do the fix
    within SBI_POR_DOING protection.

    Signed-off-by: Shin'ichiro Kawasaki
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Shin'ichiro Kawasaki
     

15 Jan, 2020

1 commit

  • When fs-verity verifies data pages, currently it reads each Merkle tree
    page synchronously using read_mapping_page().

    Therefore, when the Merkle tree pages aren't already cached, fs-verity
    causes an extra 4 KiB I/O request for every 512 KiB of data (assuming
    that the Merkle tree uses SHA-256 and 4 KiB blocks). This results in
    more I/O requests and performance loss than is strictly necessary.

    Therefore, implement readahead of the Merkle tree pages.

    For simplicity, we take advantage of the fact that the kernel already
    does readahead of the file's *data*, just like it does for any other
    file. Due to this, we don't really need a separate readahead state
    (struct file_ra_state) just for the Merkle tree, but rather we just need
    to piggy-back on the existing data readahead requests.

    We also only really need to bother with the first level of the Merkle
    tree, since the usual fan-out factor is 128, so normally over 99% of
    Merkle tree I/O requests are for the first level.
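
The figures quoted above follow from simple arithmetic, sketched here. The function names are hypothetical; the inputs (4 KiB Merkle tree blocks, 32-byte SHA-256 digests) are the ones assumed in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Number of hashes that fit in one Merkle tree block (the fan-out). */
static uint32_t hashes_per_block(uint32_t block_size, uint32_t digest_size)
{
	assert(digest_size > 0 && block_size >= digest_size);
	return block_size / digest_size;
}

/* Bytes of file data covered by one first-level Merkle tree block. */
static uint64_t data_per_level0_block(uint32_t block_size, uint32_t digest_size)
{
	return (uint64_t)hashes_per_block(block_size, digest_size) * block_size;
}
```

For 4096-byte blocks and 32-byte digests this gives a fan-out of 128 and 512 KiB of data per first-level block, which is where the "extra 4 KiB I/O request for every 512 KiB of data" comes from. With that fan-out there are about 128 first-level blocks for every second-level block, consistent with the statement that over 99% of Merkle tree I/O is for the first level.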

    Therefore, make fsverity_verify_bio() enable readahead of the first
    Merkle tree level, for up to 1/4 the number of pages in the bio, when it
    sees that the REQ_RAHEAD flag is set on the bio. The readahead size is
    then passed down to ->read_merkle_tree_page() for the filesystem to
    (optionally) implement if it sees that the requested page is uncached.

    While we're at it, also make build_merkle_tree_level() set the Merkle
    tree readahead size, since it's easy to do there.

    However, for now don't set the readahead size in fsverity_verify_page(),
    since currently it's only used to verify holes on ext4 and f2fs, and it
    would need parameters added to know how much to read ahead.

    This patch significantly improves fs-verity sequential read performance.
    Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches:

    On an ARM64 phone (using sha256-ce):
    Before: 217 MB/s
    After: 263 MB/s
    (compare to sha256sum of non-verity file: 357 MB/s)

    In an x86_64 VM (using sha256-avx2):
    Before: 173 MB/s
    After: 215 MB/s
    (compare to sha256sum of non-verity file: 223 MB/s)

    Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org
    Reviewed-by: Theodore Ts'o
    Signed-off-by: Eric Biggers

    Eric Biggers
     

01 Jan, 2020

2 commits

  • The commit 643fa9612bf1 ("fscrypt: remove filesystem specific
    build config option") removed modular support for fs/crypto. This
    causes the Crypto API to be built-in whenever fscrypt is enabled.
    This makes it very difficult for me to test modular builds of
    the Crypto API without disabling fscrypt which is a pain.

    As fscrypt is still evolving and it's developing new ties with the
    fs layer, it's hard to build it as a module for now.

    However, the actual algorithms are not required until a filesystem
    is mounted. Therefore we can allow them to be built as modules.

    Signed-off-by: Herbert Xu
    Link: https://lore.kernel.org/r/20191227024700.7vrzuux32uyfdgum@gondor.apana.org.au
    Signed-off-by: Eric Biggers

    Herbert Xu
     
  • fscrypt_get_encryption_info() returns 0 if the encryption key is
    unavailable; it never returns ENOKEY. So remove checks for ENOKEY.

    Link: https://lore.kernel.org/r/20191209212348.243331-1-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
     

13 Dec, 2019

3 commits


11 Dec, 2019

1 commit

  • Otherwise, we can hit deadlock by waiting for the locked page in
    move_data_block in GC.

    Thread A                     Thread B
    - do_page_mkwrite
     - f2fs_vm_page_mkwrite
      - lock_page
                                 - f2fs_balance_fs
                                  - mutex_lock(gc_mutex)
                                  - f2fs_gc
                                   - do_garbage_collect
                                    - ra_data_block
                                     - grab_cache_page
     - f2fs_balance_fs
      - mutex_lock(gc_mutex)

    Fixes: 39a8695824510 ("f2fs: refactor ->page_mkwrite() flow")
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

10 Dec, 2019

1 commit

  • The previous preallocation and DIO decision was as follows.

                              allow_outplace_dio              !allow_outplace_dio
    f2fs_force_buffered_io    (*) No_Prealloc / Buffered_IO   Prealloc / Buffered_IO
    !f2fs_force_buffered_io   No_Prealloc / DIO               Prealloc / DIO

    But, Javier reported Case (*) where zoned device bypassed preallocation but
    fell back to buffered writes in f2fs_direct_IO(), resulting in stale data
    being read.

    In order to fix the issue, actually we need to preallocate blocks whenever
    we fall back to buffered IO like this. No change is made in the other cases.

                              allow_outplace_dio              !allow_outplace_dio
    f2fs_force_buffered_io    (*) Prealloc / Buffered_IO      Prealloc / Buffered_IO
    !f2fs_force_buffered_io   No_Prealloc / DIO               Prealloc / DIO
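
The two decision tables can be written as a pair of predicates. This is only a restatement of the tables with hypothetical function names; the boolean parameters stand for the results of f2fs_force_buffered_io() and allow_outplace_dio().

```c
#include <assert.h>
#include <stdbool.h>

/* Previous behavior: preallocation depended only on allow_outplace_dio. */
static bool need_prealloc_old(bool force_buffered_io, bool allow_outplace_dio)
{
	(void)force_buffered_io;
	return !allow_outplace_dio;
}

/* New behavior: also preallocate whenever we fall back to buffered IO. */
static bool need_prealloc_new(bool force_buffered_io, bool allow_outplace_dio)
{
	return force_buffered_io || !allow_outplace_dio;
}
```

The two predicates differ only for case (*), where buffered IO is forced and out-of-place DIO is allowed; the other three cases are unchanged.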

    Reported-and-tested-by: Javier Gonzalez
    Signed-off-by: Damien Le Moal
    Tested-by: Shin'ichiro Kawasaki
    Reviewed-by: Chao Yu
    Reviewed-by: Javier González
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

09 Dec, 2019

1 commit

  • Push clamping timestamps into notify_change(), so in-kernel
    callers like nfsd and overlayfs will get similar timestamp
    set behavior as utimes.

    AV: get rid of clamping in ->setattr() instances; we don't need
    to bother with that there, with notify_change() doing normalization
    in all cases now (it already did for implicit case, since current_time()
    clamps).

    Suggested-by: Miklos Szeredi
    Fixes: 42e729b9ddbb ("utimes: Clamp the timestamps before update")
    Cc: stable@vger.kernel.org # v5.4
    Cc: Deepa Dinamani
    Cc: Jeff Layton
    Signed-off-by: Amir Goldstein
    Signed-off-by: Al Viro

    Amir Goldstein
     

02 Dec, 2019

1 commit

  • Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
    "As part of the cleanup of some remaining y2038 issues, I came to
    fs/compat_ioctl.c, which still has a couple of commands that need
    support for time64_t.

    In completely unrelated work, I spent time on cleaning up parts of
    this file in the past, moving things out into drivers instead.

    After Al Viro reviewed an earlier version of this series and did a lot
    more of that cleanup, I decided to try to completely eliminate the
    rest of it and move it all into drivers.

    This series incorporates some of Al's work and many patches of my own,
    but in the end stops short of actually removing the last part, which
    is the scsi ioctl handlers. I have patches for those as well, but they
    need more testing or possibly a rewrite"

    * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
    scsi: sd: enable compat ioctls for sed-opal
    pktcdvd: add compat_ioctl handler
    compat_ioctl: move SG_GET_REQUEST_TABLE handling
    compat_ioctl: ppp: move simple commands into ppp_generic.c
    compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
    compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
    compat_ioctl: unify copy-in of ppp filters
    tty: handle compat PPP ioctls
    compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
    compat_ioctl: handle SIOCOUTQNSD
    af_unix: add compat_ioctl support
    compat_ioctl: reimplement SG_IO handling
    compat_ioctl: move WDIOC handling into wdt drivers
    fs: compat_ioctl: move FITRIM emulation into file systems
    gfs2: add compat_ioctl support
    compat_ioctl: remove unused convert_in_user macro
    compat_ioctl: remove last RAID handling code
    compat_ioctl: remove /dev/raw ioctl translation
    compat_ioctl: remove PCI ioctl translation
    compat_ioctl: remove joystick ioctl translation
    ...

    Linus Torvalds