02 Dec, 2019

1 commit

  • Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
    "As part of the cleanup of some remaining y2038 issues, I came to
    fs/compat_ioctl.c, which still has a couple of commands that need
    support for time64_t.

    In completely unrelated work, I spent time on cleaning up parts of
    this file in the past, moving things out into drivers instead.

    After Al Viro reviewed an earlier version of this series and did a lot
    more of that cleanup, I decided to try to completely eliminate the
    rest of it and move it all into drivers.

    This series incorporates some of Al's work and many patches of my own,
    but in the end stops short of actually removing the last part, which
    is the scsi ioctl handlers. I have patches for those as well, but they
    need more testing or possibly a rewrite"

    * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
    scsi: sd: enable compat ioctls for sed-opal
    pktcdvd: add compat_ioctl handler
    compat_ioctl: move SG_GET_REQUEST_TABLE handling
    compat_ioctl: ppp: move simple commands into ppp_generic.c
    compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
    compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
    compat_ioctl: unify copy-in of ppp filters
    tty: handle compat PPP ioctls
    compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
    compat_ioctl: handle SIOCOUTQNSD
    af_unix: add compat_ioctl support
    compat_ioctl: reimplement SG_IO handling
    compat_ioctl: move WDIOC handling into wdt drivers
    fs: compat_ioctl: move FITRIM emulation into file systems
    gfs2: add compat_ioctl support
    compat_ioctl: remove unused convert_in_user macro
    compat_ioctl: remove last RAID handling code
    compat_ioctl: remove /dev/raw ioctl translation
    compat_ioctl: remove PCI ioctl translation
    compat_ioctl: remove joystick ioctl translation
    ...

    Linus Torvalds
     

01 Dec, 2019

2 commits

  • Pull ext2, quota, reiserfs cleanups and fixes from Jan Kara:

    - Refactor the quota on/off kernel internal interfaces (mostly for
    ubifs quota support as ubifs does not want to have inodes holding
    quota information)

    - A few other small quota fixes and cleanups

    - Various small ext2 fixes and cleanups

    - Reiserfs xattr fix and one cleanup

    * tag 'for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
    ext2: code cleanup for descriptor_loc()
    fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
    ext2: fix improper function comment
    ext2: code cleanup for ext2_try_to_allocate()
    ext2: skip unnecessary operations in ext2_try_to_allocate()
    ext2: Simplify initialization in ext2_try_to_allocate()
    ext2: code cleanup by calling ext2_group_last_block_no()
    ext2: introduce new helper ext2_group_last_block_no()
    reiserfs: replace open-coded atomic_dec_and_mutex_lock()
    ext2: check err when partial != NULL
    quota: Handle quotas without quota inodes in dquot_get_state()
    quota: Make dquot_disable() work without quota inodes
    quota: Drop dquot_enable()
    fs: Use dquot_load_quota_inode() from filesystems
    quota: Rename vfs_load_quota_inode() to dquot_load_quota_inode()
    quota: Simplify dquot_resume()
    quota: Factor out setup of quota inode
    quota: Check that quota is not dirty before release
    quota: fix livelock in dquot_writeback_dquots
    ext2: don't set *count in the case of failure in ext2_try_to_allocate()
    ...

    Linus Torvalds
     
  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've introduced a fairly small number of patches, as below.

    Enhancements:
    - improve the in-place-update IO flow
    - allocate segment to guarantee no GC for pinned files

    Bug fixes:
    - fix updatetime in lazytime mode
    - potential memory leak in f2fs_listxattr
    - record parent inode number in rename2 correctly
    - fix deadlock in f2fs_gc along with atomic writes
    - avoid needless data migration in GC"

    * tag 'f2fs-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
    f2fs: stop GC when the victim becomes fully valid
    f2fs: expose main_blkaddr in sysfs
    f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
    f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
    f2fs: show f2fs instance in printk_ratelimited
    f2fs: fix potential overflow
    f2fs: fix to update dir's i_pino during cross_rename
    f2fs: support aligned pinned file
    f2fs: avoid kernel panic on corruption test
    f2fs: fix wrong description in document
    f2fs: cache global IPU bio
    f2fs: fix to avoid memory leakage in f2fs_listxattr
    f2fs: check total_segments from devices in raw_super
    f2fs: update multi-dev metadata in resize_fs
    f2fs: mark recovery flag correctly in read_raw_super_block()
    f2fs: fix to update time in lazytime mode

    Linus Torvalds
     

26 Nov, 2019

5 commits

  • Pull fsverity updates from Eric Biggers:
    "Expose the fs-verity bit through statx()"

    * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
    docs: fs-verity: mention statx() support
    f2fs: support STATX_ATTR_VERITY
    ext4: support STATX_ATTR_VERITY
    statx: define STATX_ATTR_VERITY
    docs: fs-verity: document first supported kernel version

    Linus Torvalds
     
  • Pull fscrypt updates from Eric Biggers:

    - Add the IV_INO_LBLK_64 encryption policy flag which modifies the
    encryption to be optimized for UFS inline encryption hardware.

    - For AES-128-CBC, use the crypto API's implementation of ESSIV (which
    was added in 5.4) rather than doing ESSIV manually.

    - A few other cleanups.

    * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
    f2fs: add support for IV_INO_LBLK_64 encryption policies
    ext4: add support for IV_INO_LBLK_64 encryption policies
    fscrypt: add support for IV_INO_LBLK_64 policies
    fscrypt: avoid data race on fscrypt_mode::logged_impl_name
    docs: ioctl-number: document fscrypt ioctl numbers
    fscrypt: zeroize fscrypt_info before freeing
    fscrypt: remove struct fscrypt_ctx
    fscrypt: invoke crypto API for ESSIV handling

    Linus Torvalds
     
  • We must stop GC once the segment becomes fully valid. Otherwise, it can
    produce more dirty segments by partially moving valid blocks in the segment.

    Ramon sometimes hit a no-free-segment panic and saw this case happen when
    validating the reliable file pinning feature.

    Signed-off-by: Ramon Pantin
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Expose in /sys/fs/f2fs/<dev>/main_blkaddr the block address where the
    main area starts. This allows user mode programs to determine:

    - That pinned files that are made exclusively of fully allocated 2MB
    segments will never be unpinned by the file system.

    - Where the main area starts. This is required by programs that want to
    verify if a file is made exclusively of 2MB f2fs segments, the alignment
    boundary for segments starts at this address. Testing for 2MB alignment
    relative to the start of the device is incorrect, because for some
    filesystems main_blkaddr is not at a 2MB boundary relative to the start
    of the device.

    The entry will be used when validating reliable pinning file feature proposed
    by "f2fs: support aligned pinned file".
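
A minimal sketch of the alignment test described above, not code from the patch. It assumes 4KB blocks (so a 2MB segment spans 512 blocks), and the helper name and parameters are hypothetical. The point it illustrates is that alignment is measured from main_blkaddr, not from the start of the device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4KB blocks, so one 2MB segment covers 512 blocks. */
#define BLOCKS_PER_SEGMENT 512u

/*
 * Hypothetical helper: returns true when an extent of `len` blocks starting
 * at block address `blkaddr` covers only whole 2MB segments, with the
 * segment boundaries counted from main_blkaddr.
 */
bool extent_is_segment_aligned(uint64_t blkaddr, uint64_t len,
                               uint64_t main_blkaddr)
{
    /* Extents before the main area, or empty ones, never qualify. */
    if (blkaddr < main_blkaddr || len == 0)
        return false;

    return (blkaddr - main_blkaddr) % BLOCKS_PER_SEGMENT == 0 &&
           len % BLOCKS_PER_SEGMENT == 0;
}
```

A user mode program would read main_blkaddr from the sysfs entry, obtain the file's extents (for example via FIEMAP), and apply a check of this kind to each extent.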

    Signed-off-by: Ramon Pantin
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Setting softlimit larger than hardlimit seems meaningless
    for disk quota, but currently it is allowed. In this case,
    there may be a bit of confusion for users when they run the
    df command on a directory which has a project quota.

    For example, we set a 20M softlimit and a 10M hardlimit on
    block usage for the project quota of test_dir (project id 123).

    [root@hades f2fs]# repquota -P -a
    *** Report for project quotas on device /dev/nvme0n1p8
    Block grace time: 7days; Inode grace time: 7days
                            Block limits                File limits
    Project         used    soft    hard  grace    used  soft  hard  grace
    ----------------------------------------------------------------------
    0         --       4       0       0              1     0     0
    123       +-   10248   20480   10240              2     0     0

    The result of df command as below:

    [root@hades f2fs]# df -h /mnt/f2fs/test
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/nvme0n1p8   20M   11M   10M  51% /mnt/f2fs

    Even though it looks like there is another 10M of free space to use,
    if we write new data to the directory test (which inherits the project id),
    the write will fail with errno -EDQUOT.

    After this patch, the df result looks like below.

    [root@hades f2fs]# df -h /mnt/f2fs/test
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/nvme0n1p8   10M   10M     0 100% /mnt/f2fs
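
The limit selection implied by this change can be sketched as follows. This is an illustration under assumptions, not the patch itself: the function names are hypothetical, and a limit of 0 is taken to mean "not set".

```c
#include <stdint.h>

/* Smaller of two limits, treating 0 as "no limit set". */
uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}

/*
 * Hypothetical helper: the block limit that statfs should report for a
 * project quota. When softlimit is larger than hardlimit, the hardlimit
 * wins, because writes fail with -EDQUOT once it is reached.
 */
uint64_t effective_block_limit(uint64_t softlimit, uint64_t hardlimit)
{
    return min_not_zero_u64(softlimit, hardlimit);
}
```

With the values from the example (softlimit 20480, hardlimit 10240), the reported size is derived from 10240, which matches the 10M shown by df after the patch.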

    Signed-off-by: Chengguang Xu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chengguang Xu
     

20 Nov, 2019

2 commits

  • The FS got stuck in the stack below when the storage was in an almost
    full/dirty condition (while FG_GC was being done).

    schedule_timeout
    io_schedule_timeout
    congestion_wait
    f2fs_drop_inmem_pages_all
    f2fs_gc
    f2fs_balance_fs
    __write_node_page
    f2fs_fsync_node_pages
    f2fs_do_sync_file
    f2fs_ioctl

    The root cause for this issue is a potential infinite loop
    in f2fs_drop_inmem_pages_all() for the case where gc_failure is true
    and there is an inode whose i_gc_failures[GC_FAILURE_ATOMIC] is
    not set. Fix this by keeping track of the total atomic files
    currently opened and using that to exit from this condition.

    Fix-suggested-by: Chao Yu
    Signed-off-by: Chao Yu
    Signed-off-by: Sahitya Tummala
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • As Eric mentioned, bare printk{,_ratelimited} won't show which
    filesystem instance these messages are coming from. This patch
    shows the fs instance via the sb->s_id field in all the places we
    missed before.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

14 Nov, 2019

1 commit


13 Nov, 2019

1 commit

  • Avoid the need to allocate a potentially large array of struct blk_zone
    in the block layer by switching the ->report_zones method interface to
    a callback model. Now the caller simply supplies a callback that is
    executed on each reported zone, and private data for it.
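
The callback model can be illustrated with a small userspace sketch. All names here are hypothetical and the zone descriptor is simplified; this is not the kernel's struct blk_zone or the real ->report_zones signature. It only shows how the caller supplies a callback plus private data instead of a preallocated array.

```c
#include <stddef.h>

/* Hypothetical, simplified zone descriptor (not the kernel's struct blk_zone). */
struct zone_info {
    unsigned long long start;
    unsigned long long len;
};

/* Called once per reported zone; a non-zero return value stops the walk. */
typedef int (*zone_cb)(const struct zone_info *zone, unsigned int idx,
                       void *data);

/*
 * Hypothetical reporter: walks nr_zones fixed-size zones and hands each one
 * to cb, so no array of zones is ever allocated. Returns the number of
 * zones reported, or the callback's non-zero return value.
 */
int report_zones_sketch(unsigned long long zone_len, unsigned int nr_zones,
                        zone_cb cb, void *data)
{
    for (unsigned int i = 0; i < nr_zones; i++) {
        struct zone_info z = {
            .start = (unsigned long long)i * zone_len,
            .len = zone_len,
        };
        int ret = cb(&z, i, data);

        if (ret)
            return ret;
    }
    return (int)nr_zones;
}

/* Example callback: adds each zone's length to the caller's private total. */
int sum_zone_len(const struct zone_info *zone, unsigned int idx, void *data)
{
    (void)idx;
    *(unsigned long long *)data += zone->len;
    return 0;
}
```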

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Shin'ichiro Kawasaki
    Signed-off-by: Damien Le Moal
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

08 Nov, 2019

4 commits

  • We expect a 64-bit result from the statement below. However, on a
    32-bit machine, a looped left shift operation on a pgoff_t type
    variable may cause an overflow issue. Fix it by forcing a type cast.

    page->index << PAGE_SHIFT;
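
A small sketch of the overflow, assuming a 32-bit page index type and a page shift of 12 (both assumptions for illustration, standing in for pgoff_t on a 32-bit machine and PAGE_SHIFT). The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 4KB pages, standing in for the kernel's PAGE_SHIFT. */
#define PAGE_SHIFT_SKETCH 12

/* The shift is done in 32 bits, so the upper bits are lost before widening. */
uint64_t offset_without_cast(uint32_t index)
{
    return index << PAGE_SHIFT_SKETCH;
}

/* Widening first keeps the full 64-bit result. */
uint64_t offset_with_cast(uint32_t index)
{
    return (uint64_t)index << PAGE_SHIFT_SKETCH;
}
```

For a page index of 0x100000 (a 4GB offset), the uncast version yields 0 while the cast version yields 0x100000000.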

    Fixes: 26de9b117130 ("f2fs: avoid unnecessary updating inode during fsync")
    Fixes: 0a2aa8fbb969 ("f2fs: refactor __exchange_data_block for speed up")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Eric reported:

    RENAME_EXCHANGE support was just added to fsstress in xfstests:

    commit 65dfd40a97b6bbbd2a22538977bab355c5bc0f06
    Author: kaixuxia
    Date: Thu Oct 31 14:41:48 2019 +0800

    fsstress: add EXCHANGE renameat2 support

    This is causing xfstest generic/579 to fail due to fsck.f2fs reporting errors.
    I'm not sure what the problem is, but it still happens even with all the
    fs-verity stuff in the test commented out, so that the test just runs fsstress.

    generic/579 23s ... [10:02:25]
    [ 7.745370] run fstests generic/579 at 2019-11-04 10:02:25
    _check_generic_filesystem: filesystem on /dev/vdc is inconsistent
    (see /results/f2fs/results-default/generic/579.full for details)
    [10:02:47]
    Ran: generic/579
    Failures: generic/579
    Failed 1 of 1 tests
    Xunit report: /results/f2fs/results-default/result.xml

    Here's the contents of 579.full:

    _check_generic_filesystem: filesystem on /dev/vdc is inconsistent
    *** fsck.f2fs output ***
    [ASSERT] (__chk_dots_dentries:1378) --> Bad inode number[0x24] for '..', parent parent ino is [0xd10]

    The root cause is that we forgot to update the directory's i_pino during
    cross_rename. Fix it.

    Fixes: 32f9bc25cbda0 ("f2fs: support ->rename2()")
    Signed-off-by: Chao Yu
    Tested-by: Eric Biggers
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch supports 2MB-aligned pinned files, which can guarantee no GC at
    all by allocating fully valid 2MB segments.

    Check free segments by has_not_enough_free_secs() with a large budget.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • xfstests generic/475 triggers a kernel warning/panic while testing a corrupted disk.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

07 Nov, 2019

2 commits

  • Zoned block devices (ZBC and ZAC devices) allow an explicit control
    over the condition (state) of zones. The operations allowed are:
    * Open a zone: Transition to open condition to indicate that a zone will
    actively be written
    * Close a zone: Transition to closed condition to release the drive
    resources used for writing to a zone
    * Finish a zone: Transition an open or closed zone to the full
    condition to prevent write operations

    To enable this control for in-kernel zoned block device users, define
    the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE
    and REQ_OP_ZONE_FINISH as well as the generic function
    blkdev_zone_mgmt() for submitting these operations on a range of zones.
    This results in blkdev_reset_zones() removal and replacement with this
    new zone management function. Users of blkdev_reset_zones() (f2fs and
    dm-zoned) are updated accordingly.

    Contains contributions from Matias Bjorling, Hans Holmberg,
    Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.

    Reviewed-by: Javier González
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ajay Joshi
    Signed-off-by: Matias Bjorling
    Signed-off-by: Hans Holmberg
    Signed-off-by: Dmitry Fomichev
    Signed-off-by: Keith Busch
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Ajay Joshi
     
  • f2fs inode numbers are stable across filesystem resizing, and f2fs inode
    and file logical block numbers are always 32-bit. So f2fs can always
    support IV_INO_LBLK_64 encryption policies. Wire up the needed
    fscrypt_operations to declare support.
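
Why 32-bit inode and logical block numbers matter can be seen in a sketch of the IV layout. This assumes the layout described in the fscrypt documentation, with the logical block number in bits 0-31 and the inode number in bits 32-63. The function name is hypothetical and this is not the fscrypt implementation.

```c
#include <stdint.h>

/*
 * Hypothetical helper: packs a 32-bit inode number and a 32-bit file
 * logical block number into one 64-bit IV value. Both inputs must fit in
 * 32 bits for the pair to map to a unique IV.
 */
uint64_t iv_ino_lblk_64(uint32_t ino, uint32_t lblk)
{
    return ((uint64_t)ino << 32) | lblk;
}
```

Because the inode number is part of the IV, it has to stay stable for the lifetime of the file, which is why stability across filesystem resizing is called out.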

    Acked-by: Jaegeuk Kim
    Signed-off-by: Eric Biggers

    Eric Biggers
     

04 Nov, 2019

1 commit


26 Oct, 2019

1 commit

  • In commit 8648de2c581e ("f2fs: add bio cache for IPU"), we added
    f2fs_submit_ipu_bio() in __write_data_page() as below:

    __write_data_page()

        if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
            f2fs_submit_ipu_bio(sbi, bio, page);
            ....
        }

    in order to avoid below deadlock:

    Thread A                                Thread B
    - __write_data_page (inode x, page y)
     - f2fs_do_write_data_page
      - set_page_writeback        ---- set writeback flag in page y
      - f2fs_inplace_write_data
     - f2fs_balance_fs
                                            - lock gc_mutex
       - lock gc_mutex
                                             - f2fs_gc
                                              - do_garbage_collect
                                               - gc_data_segment
                                                - move_data_page
                                                 - f2fs_wait_on_page_writeback
                                                  - wait_on_page_writeback  --- wait writeback of page y

    However, the bio submission breaks the merge of IPU IOs.

    So in this patch, let's add a global bio cache for merged IPU pages, so
    that f2fs_wait_on_page_writeback() is able to submit the bio if a
    page under writeback is cached in the global bio cache.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

23 Oct, 2019

5 commits


05 Oct, 2019

1 commit

  • generic/018 reports an inconsistent status of atime. The
    testcase is as below:
    - open file with O_SYNC
    - write file to construct fragmented space
    - calc md5 of file
    - record {a,c,m}time
    - defrag file --- do nothing
    - umount & mount
    - check {a,c,m}time

    The root cause is that, as f2fs enables lazytime by default, an atime
    update dirties the VFS inode rather than the f2fs inode (by setting
    FI_DIRTY_INODE), so a later f2fs_write_inode() called from VFS will
    fail to update the inode page due to our skip:

    f2fs_write_inode()
        if (!is_inode_flag_set(inode, FI_DIRTY_INODE))
            return 0;

    So eventually, after evict(), we lose the last atime forever.

    To fix this issue, we need to check whether {a,c,m,cr}time is
    consistent in between inode cache and inode page, and only skip
    f2fs_update_inode() if f2fs inode is not dirty and time is
    consistent as well.
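
The revised skip condition can be sketched as below. The struct and function names are hypothetical and timestamps are reduced to plain integers; this illustrates the rule stated above, not the f2fs code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified set of the four timestamps being compared. */
struct inode_times {
    int64_t atime, ctime, mtime, crtime;
};

/* True when the in-memory inode and the on-disk inode page agree. */
bool times_consistent(const struct inode_times *cache,
                      const struct inode_times *page)
{
    return cache->atime == page->atime &&
           cache->ctime == page->ctime &&
           cache->mtime == page->mtime &&
           cache->crtime == page->crtime;
}

/*
 * The inode page update may be skipped only if the f2fs inode is not dirty
 * AND the timestamps already match. A lazytime atime update leaves the f2fs
 * inode clean but the times inconsistent, so it is no longer skipped.
 */
bool can_skip_inode_update(bool f2fs_inode_dirty,
                           const struct inode_times *cache,
                           const struct inode_times *page)
{
    return !f2fs_inode_dirty && times_consistent(cache, page);
}
```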

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

22 Sep, 2019

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we introduced casefolding support in f2fs, and fixed
    various bugs in individual features such as IO alignment,
    checkpoint=disable, quota, and swapfile.

    Enhancement:
    - support casefolding w/ enhancement in ext4
    - support fiemap for directory
    - support FS_IO_GET|SET_FSLABEL

    Bug fix:
    - fix IO stuck during checkpoint=disable
    - avoid infinite GC loop
    - fix panic/overflow related to IO alignment feature
    - fix livelock in swap file
    - fix discard command leak
    - disallow dio for atomic_write"

    * tag 'f2fs-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
    f2fs: add a condition to detect overflow in f2fs_ioc_gc_range()
    f2fs: fix to add missing F2FS_IO_ALIGNED() condition
    f2fs: fix to fallback to buffered IO in IO aligned mode
    f2fs: fix to handle error path correctly in f2fs_map_blocks
    f2fs: fix extent corrupotion during directIO in LFS mode
    f2fs: check all the data segments against all node ones
    f2fs: Add a small clarification to CONFIG_FS_F2FS_FS_SECURITY
    f2fs: fix inode rwsem regression
    f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()
    f2fs: avoid infinite GC loop due to stale atomic files
    f2fs: Fix indefinite loop in f2fs_gc()
    f2fs: convert inline_data in prior to i_size_write
    f2fs: fix error path of f2fs_convert_inline_page()
    f2fs: add missing documents of reserve_root/resuid/resgid
    f2fs: fix flushing node pages when checkpoint is disabled
    f2fs: enhance f2fs_is_checkpoint_ready()'s readability
    f2fs: clean up __bio_alloc()'s parameter
    f2fs: fix wrong error injection path in inc_valid_block_count()
    f2fs: fix to writeout dirty inode during node flush
    f2fs: optimize case-insensitive lookups
    ...

    Linus Torvalds
     

20 Sep, 2019

1 commit

  • Pull y2038 vfs updates from Arnd Bergmann:
    "Add inode timestamp clamping.

    This series from Deepa Dinamani adds a per-superblock minimum/maximum
    timestamp limit for a file system, and clamps timestamps as they are
    written, to avoid random behavior from integer overflow as well as
    having different time stamps on disk vs in memory.

    At mount time, a warning is now printed for any file system that can
    represent current timestamps but not future timestamps more than 30
    years into the future, similar to the arbitrary 30 year limit that was
    added to settimeofday().

    This was picked as a compromise to warn users to migrate to other file
    systems (e.g. ext4 instead of ext3) when they need the file system to
    survive beyond 2038 (or similar limits in other file systems), but not
    get in the way of normal usage"
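
The clamping described in this pull message can be sketched as follows. The function name is hypothetical and timestamps are reduced to whole seconds; the minimum and maximum stand in for the per-superblock limits the series adds.

```c
#include <stdint.h>

/*
 * Hypothetical helper: clamps a timestamp (in seconds) into the range a
 * file system can represent, so the value kept in memory matches what can
 * be stored on disk instead of wrapping around.
 */
int64_t clamp_timestamp(int64_t t, int64_t time_min, int64_t time_max)
{
    if (t < time_min)
        return time_min;
    if (t > time_max)
        return time_max;
    return t;
}
```

For a file system limited to signed 32-bit seconds, a timestamp past January 2038 would be stored as INT32_MAX rather than overflowing to a date in 1901.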

    * tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
    ext4: Reduce ext4 timestamp warnings
    isofs: Initialize filesystem timestamp ranges
    pstore: fs superblock limits
    fs: omfs: Initialize filesystem timestamp ranges
    fs: hpfs: Initialize filesystem timestamp ranges
    fs: ceph: Initialize filesystem timestamp ranges
    fs: sysv: Initialize filesystem timestamp ranges
    fs: affs: Initialize filesystem timestamp ranges
    fs: fat: Initialize filesystem timestamp ranges
    fs: cifs: Initialize filesystem timestamp ranges
    fs: nfs: Initialize filesystem timestamp ranges
    ext4: Initialize timestamps limits
    9p: Fill min and max timestamps in sb
    fs: Fill in max and min timestamps in superblock
    utimes: Clamp the timestamps before update
    mount: Add mount warning for impending timestamp expiry
    timestamp_truncate: Replace users of timespec64_trunc
    vfs: Add timestamp_truncate() api
    vfs: Add file timestamp range support

    Linus Torvalds
     

19 Sep, 2019

1 commit

  • Pull fs-verity support from Eric Biggers:
    "fs-verity is a filesystem feature that provides Merkle tree based
    hashing (similar to dm-verity) for individual readonly files, mainly
    for the purpose of efficient authenticity verification.

    This pull request includes:

    (a) The fs/verity/ support layer and documentation.

    (b) fs-verity support for ext4 and f2fs.

    Compared to the original fs-verity patchset from last year, the UAPI
    to enable fs-verity on a file has been greatly simplified. Lots of
    other things were cleaned up too.

    fs-verity is planned to be used by two different projects on Android;
    most of the userspace code is in place already. Another userspace tool
    ("fsverity-utils"), and xfstests, are also available. e2fsprogs and
    f2fs-tools already have fs-verity support. Other people have shown
    interest in using fs-verity too.

    I've tested this on ext4 and f2fs with xfstests, both the existing
    tests and the new fs-verity tests. This has also been in linux-next
    since July 30 with no reported issues except a couple minor ones I
    found myself and folded in fixes for.

    Ted and I will be co-maintaining fs-verity"

    * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
    f2fs: add fs-verity support
    ext4: update on-disk format documentation for fs-verity
    ext4: add fs-verity read support
    ext4: add basic fs-verity support
    fs-verity: support builtin file signatures
    fs-verity: add SHA-512 support
    fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
    fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
    fs-verity: add data verification hooks for ->readpages()
    fs-verity: add the hook for file ->setattr()
    fs-verity: add the hook for file ->open()
    fs-verity: add inode and superblock fields
    fs-verity: add Kconfig and the helper functions for hashing
    fs: uapi: define verity bit for FS_IOC_GETFLAGS
    fs-verity: add UAPI header
    fs-verity: add MAINTAINERS file entry
    fs-verity: add a documentation file

    Linus Torvalds
     

18 Sep, 2019

1 commit


16 Sep, 2019

9 commits

  • In f2fs_allocate_data_block(), we will reset fio.retry for IO
    alignment feature instead of IO serialization feature.

    In addition, spread F2FS_IO_ALIGNED() to check IO alignment
    feature status explicitly.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • In LFS mode, we allow OPU for direct IO. However, we didn't consider
    the IO alignment feature, so direct IO can trigger unaligned IO. Let's
    just fall back to buffered IO to keep correct IO alignment semantics
    in all places.

    Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • In f2fs_map_blocks(), we should bail out once __allocate_data_block()
    fails.

    Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • In LFS mode, por_fsstress testcase reports a bug as below:

    [ASSERT] (fsck_chk_inode_blk: 931) --> ino: 0x12fe has wrong ext: [pgofs:142, blk:215424, len:16]

    Since commit f847c699cff3 ("f2fs: allow out-place-update for direct
    IO in LFS mode"), we allow OPU mode for direct IO. However, we
    failed to update the extent cache in __allocate_data_block(), which
    causes the extent field to be inconsistent with the physical block
    address. Fix it.

    Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As a part of the sanity checking while mounting, distinct segment number
    assignment to data and node segments is verified. Fix a small bug in
    this verification between node and data segments: we need to check all
    the data segments against all the node segments.

    Fixes: 042be0f849e5f ("f2fs: fix to do sanity check with current segment number")
    Signed-off-by: Surbhi Palande
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Surbhi Palande
     
  • Signed-off-by: Lockywolf
    Signed-off-by: Jaegeuk Kim

    Lockywolf
     
  • This is similar to 942491c9e6d6 ("xfs: fix AIM7 regression")
    Apparently our current rwsem code doesn't like doing the trylock, then
    lock for real scheme. So change our read/write methods to just do the
    trylock for the RWF_NOWAIT case.

    We don't need a check for IOCB_NOWAIT and !direct-IO because it
    is checked in generic_write_checks().

    Fixes: b91050a80cec ("f2fs: add nowait aio support")
    Signed-off-by: Goldwyn Rodrigues
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Goldwyn Rodrigues
     
  • If an inode is newly created, its inode page may not be synchronized with
    the inode cache, so fields like .i_inline or .i_extra_isize could be wrong.
    In the call path below, we may access such wrong fields, resulting in a
    failure to migrate a valid target block.

    Thread A                                Thread B
    - f2fs_create
     - f2fs_add_link
      - f2fs_add_dentry
       - f2fs_init_inode_metadata
        - f2fs_add_inline_entry
         - f2fs_new_inode_page
         - f2fs_put_page
         : inode page wasn't updated with inode cache
                                            - gc_data_segment
                                             - is_alive
                                              - f2fs_get_node_page
                                              - datablock_addr
                                               - offset_in_addr
                                               : access uninitialized fields

    Fixes: 7a2af766af15 ("f2fs: enhance on-disk inode structure scalability")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • If committing atomic pages fails when doing f2fs_do_sync_file(), we can
    get committed pages but with atomic_file still set, like:

    - inmem: 0, atomic IO: 4 (Max. 10), volatile IO: 0 (Max. 0)

    If GC selects this block, we can get an infinite loop like this:

    f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
    f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
    f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
    f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
    f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
    f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
    f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
    f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c

    In that moment, we can observe:

    [Before]
    Try to move 5084219 blocks (BG: 384508)
    - data blocks : 4962373 (274483)
    - node blocks : 121846 (110025)
    Skipped : atomic write 4534686 (10)

    [After]
    Try to move 5088973 blocks (BG: 384508)
    - data blocks : 4967127 (274483)
    - node blocks : 121846 (110025)
    Skipped : atomic write 4539440 (10)

    So, refactor atomic_write flow like this:
    1. start_atomic_write
       - add inmem_list and set atomic_file

    2. write()
       - register it in inmem_pages

    3. commit_atomic_write
       - if no error, f2fs_drop_inmem_pages()
       - f2fs_commit_inmem_pages() failed
         : __revoke_inmem_pages() was done
       - f2fs_do_sync_file failed
         : abort_atomic_write later

    4. abort_atomic_write
       - f2fs_drop_inmem_pages

    5. f2fs_drop_inmem_pages
       - clear atomic_file
       - remove inmem_list

    Based on this change, when GC fails to move block in atomic_file,
    f2fs_drop_inmem_pages_all() can call f2fs_drop_inmem_pages().

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

09 Sep, 2019

1 commit

  • Policy - foreground GC, LFS mode and greedy GC mode.

    Under this policy, f2fs_gc() loops forever to GC as it doesn't have
    enough free segments to proceed and thus it keeps calling gc_more
    for the same victim segment. This can happen if the selected victim
    segment could not be GC'd due to a failed blkaddr validity check, i.e.
    is_alive() returns false for the blocks set in the current validity map.

    Fix this by not resetting the sbi->cur_victim_sec to NULL_SEGNO, when
    the segment selected could not be GC'd. This helps to select another
    segment for GC and thus helps to proceed forward with GC.

    [Note]
    This can happen due to is_alive as well as atomic_file, which skips
    GC.

    Signed-off-by: Sahitya Tummala
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala