06 Jan, 2021

3 commits

  • [ Upstream commit 6422a71ef40e4751d59b8c9412e7e2dafe085878 ]

    f2fs_free_dic() could be invoked too early, while f2fs_verify_bio()
    still needed the dic, which triggered the kernel panic below. The
    cause is a race on the pending_pages value between the decompression
    and verity logic when the same compression cluster has been split
    across different bios. With split bios, f2fs_verify_bio() ended up
    decrementing pending_pages before f2fs_decompress_pages() reset it
    to nr_cpages, which caused the panic.

    [ 4416.564763] Unable to handle kernel NULL pointer dereference
    at virtual address 0000000000000000
    ...
    [ 4416.896016] Workqueue: fsverity_read_queue f2fs_verity_work
    [ 4416.908515] pc : fsverity_verify_page+0x20/0x78
    [ 4416.913721] lr : f2fs_verify_bio+0x11c/0x29c
    [ 4416.913722] sp : ffffffc019533cd0
    [ 4416.913723] x29: ffffffc019533cd0 x28: 0000000000000402
    [ 4416.913724] x27: 0000000000000001 x26: 0000000000000100
    [ 4416.913726] x25: 0000000000000001 x24: 0000000000000004
    [ 4416.913727] x23: 0000000000001000 x22: 0000000000000000
    [ 4416.913728] x21: 0000000000000000 x20: ffffffff2076f9c0
    [ 4416.913729] x19: ffffffff2076f9c0 x18: ffffff8a32380c30
    [ 4416.913731] x17: ffffffc01f966d97 x16: 0000000000000298
    [ 4416.913732] x15: 0000000000000000 x14: 0000000000000000
    [ 4416.913733] x13: f074faec89ffffff x12: 0000000000000000
    [ 4416.913734] x11: 0000000000001000 x10: 0000000000001000
    [ 4416.929176] x9 : ffffffff20d1f5c7 x8 : 0000000000000000
    [ 4416.929178] x7 : 626d7464ff286b6b x6 : ffffffc019533ade
    [ 4416.929179] x5 : 000000008049000e x4 : ffffffff2793e9e0
    [ 4416.929180] x3 : 000000008049000e x2 : ffffff89ecfa74d0
    [ 4416.929181] x1 : 0000000000000c40 x0 : ffffffff2076f9c0
    [ 4416.929184] Call trace:
    [ 4416.929187] fsverity_verify_page+0x20/0x78
    [ 4416.929189] f2fs_verify_bio+0x11c/0x29c
    [ 4416.929192] f2fs_verity_work+0x58/0x84
    [ 4417.050667] process_one_work+0x270/0x47c
    [ 4417.055354] worker_thread+0x27c/0x4d8
    [ 4417.059784] kthread+0x13c/0x320
    [ 4417.063693] ret_from_fork+0x10/0x18

    Chao pointed out that this can happen through the race condition below.

    Thread A                      f2fs_post_read_wq                          fsverity_wq
    - f2fs_read_multi_pages()
      - f2fs_alloc_dic
        - dic->pending_pages = 2
        - submit_bio()
        - submit_bio()
                                  - f2fs_post_read_work() handle first bio
                                    - f2fs_decompress_work()
                                      - __read_end_io()
                                        - f2fs_decompress_pages()
                                          - dic->pending_pages--
                                    - enqueue f2fs_verity_work()
                                                                             - f2fs_verity_work() handle first bio
                                                                               - f2fs_verify_bio()
                                                                                 - dic->pending_pages--
                                  - f2fs_post_read_work() handle second bio
                                    - f2fs_decompress_work()
                                    - enqueue f2fs_verity_work()
                                                                                 - f2fs_verify_pages()
                                                                                 - f2fs_free_dic()

                                                                             - f2fs_verity_work() handle second bio
                                                                               - f2fs_verify_bio()
                                                                                 - use-after-free on dic
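
    The interleaving above can be replayed deterministically in user space.
    The sketch below is a simplified single-threaded model, not kernel code:
    struct dic_model and every function in it are hypothetical names, and the
    split-counter variant is only one plausible way to keep the context alive,
    assumed here for illustration. The second bio's decompression step is left
    out of the shared replay because the trace shows no counter update for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a decompress context. */
struct dic_model {
    int pending_pages;    /* pages still owed by the decompression stage */
    int verity_pages;     /* pages still owed by verity (split model only) */
    bool freed;           /* set once the model "frees" the context */
    bool use_after_free;  /* set if a stage touches the context after that */
};

/* Every access goes through this check so late accesses are recorded. */
static bool dic_alive(struct dic_model *dic)
{
    assert(dic != NULL);
    if (dic->freed)
        dic->use_after_free = true;
    return !dic->freed;
}

static void decompress_step(struct dic_model *dic)
{
    if (dic_alive(dic))
        dic->pending_pages--;
}

/* Racy variant: verity decrements the counter decompression also uses. */
static void verity_step_shared(struct dic_model *dic)
{
    if (dic_alive(dic) && --dic->pending_pages == 0)
        dic->freed = true;
}

/* Split variant: verity owns a counter that decompression never touches. */
static void verity_step_split(struct dic_model *dic)
{
    if (dic_alive(dic) && --dic->verity_pages == 0)
        dic->freed = true;
}

/* Returns true if the replay touched the context after it was freed. */
static bool replay_shared(void)
{
    struct dic_model dic = { .pending_pages = 2 };

    decompress_step(&dic);     /* first bio: 2 -> 1 */
    verity_step_shared(&dic);  /* first bio: 1 -> 0, context freed early */
    verity_step_shared(&dic);  /* second bio: touches the freed context */
    return dic.use_after_free;
}

static bool replay_split(void)
{
    struct dic_model dic = { .pending_pages = 2, .verity_pages = 2 };

    decompress_step(&dic);     /* first bio */
    verity_step_split(&dic);   /* first bio: verity_pages 2 -> 1 */
    decompress_step(&dic);     /* second bio */
    verity_step_split(&dic);   /* second bio: 1 -> 0, freed only now */
    return dic.use_after_free;
}
```

    In this model replay_shared() reports a late access while replay_split()
    does not, because the free is tied to a counter that only one stage
    decrements.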

    Signed-off-by: Daeho Jeong
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Daeho Jeong
     
  • [ Upstream commit a95ba66ac1457b76fe472c8e092ab1006271f16c ]

    Light reported that the shrinker sometimes sees nat_cnt < dirty_nat_cnt,
    which makes it do the wrong amount of work. Let's avoid returning an insane
    overflowed value by adding a single tracking value.
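
    The failure mode is an unsigned subtraction that wraps around. A minimal
    sketch, assuming the count is derived as nat_cnt - dirty_nat_cnt from two
    separately updated counters; struct nat_counts and the helpers below are
    hypothetical and are not the f2fs definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters for this sketch. */
struct nat_counts {
    unsigned long nat_cnt;        /* all cached NAT entries */
    unsigned long dirty_nat_cnt;  /* dirty subset */
    unsigned long reclaimable;    /* single tracked value: clean cached entries */
};

/* Derived count: wraps to a huge value if dirty_nat_cnt exceeds nat_cnt. */
static unsigned long reclaimable_derived(const struct nat_counts *c)
{
    assert(c != NULL);
    return c->nat_cnt - c->dirty_nat_cnt;
}

/* Tracked count: adjusted at each state change, no subtraction at read time. */
static void entry_cached(struct nat_counts *c)
{
    assert(c != NULL);
    c->nat_cnt++;
    c->reclaimable++;
}

static void entry_dirtied(struct nat_counts *c)
{
    assert(c != NULL && c->reclaimable > 0);
    c->dirty_nat_cnt++;
    c->reclaimable--;
}

static void entry_cleaned(struct nat_counts *c)
{
    assert(c != NULL && c->dirty_nat_cnt > 0);
    c->dirty_nat_cnt--;
    c->reclaimable++;
}
```

    A reader that samples the two counters at an unlucky moment gets a wrapped
    result from the derived form, whereas the tracked value is read in one step.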

    Reported-by: Light Hsieh
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Jaegeuk Kim
     
  • commit e584bbe821229a3e7cc409eecd51df66f9268c21 upstream.

    syzbot reported a bug that could cause a shift-out-of-bounds issue;
    fix it.
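
    A shift-out-of-bounds report of this kind typically comes from shifting by
    an exponent read from an untrusted on-disk field. The sketch below shows
    the general pattern of validating the exponent first; the constant
    SKETCH_BLKSIZE_BITS and the function name are assumptions for illustration
    and do not reproduce the actual check in sanity_check_raw_super().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bound for this sketch: accept only 4 KiB blocks (log2 == 12). */
#define SKETCH_BLKSIZE_BITS 12u

/*
 * Validate the on-disk exponent before shifting. Shifting a 32-bit value by
 * 32 or more is undefined behaviour in C, which UBSAN reports as
 * shift-out-of-bounds.
 */
static bool blocksize_from_log(uint32_t log_blocksize, uint32_t *blocksize)
{
    assert(blocksize != NULL);
    if (log_blocksize != SKETCH_BLKSIZE_BITS)
        return false;                       /* reject before any shift */
    *blocksize = UINT32_C(1) << log_blocksize;
    return true;
}
```

    On rejection the output parameter is left untouched, so a caller can treat
    a false return as a mount failure without inspecting the value.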

    Call Trace:
    __dump_stack lib/dump_stack.c:79 [inline]
    dump_stack+0x107/0x163 lib/dump_stack.c:120
    ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
    __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
    sanity_check_raw_super fs/f2fs/super.c:2812 [inline]
    read_raw_super_block fs/f2fs/super.c:3267 [inline]
    f2fs_fill_super.cold+0x16c9/0x16f6 fs/f2fs/super.c:3519
    mount_bdev+0x34d/0x410 fs/super.c:1366
    legacy_get_tree+0x105/0x220 fs/fs_context.c:592
    vfs_get_tree+0x89/0x2f0 fs/super.c:1496
    do_new_mount fs/namespace.c:2896 [inline]
    path_mount+0x12ae/0x1e70 fs/namespace.c:3227
    do_mount fs/namespace.c:3240 [inline]
    __do_sys_mount fs/namespace.c:3448 [inline]
    __se_sys_mount fs/namespace.c:3425 [inline]
    __x64_sys_mount+0x27f/0x300 fs/namespace.c:3425
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported-by: syzbot+ca9a785f8ac472085994@syzkaller.appspotmail.com
    Signed-off-by: Anant Thazhemadam
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     

30 Dec, 2020

2 commits

  • [ Upstream commit 89ff6005039a878afac87889fee748fa3f957c3a ]

    When fill_super is retried with skip_recovery, s_encoding for
    casefold is not loaded again, because the pointer is still non-NULL
    even though it has already been freed.
    Set it to NULL after freeing to prevent a double free on unmount.
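
    The free-then-clear pattern described above can be sketched in plain C.
    struct sb_model and both helpers are hypothetical stand-ins, not the f2fs
    code; the point is only that a "load if NULL" check and a later teardown
    both rely on the pointer being cleared after the free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a superblock that owns an encoding table. */
struct sb_model {
    char *encoding;  /* NULL means "not loaded" */
    int loads;       /* how many times the table was (re)loaded */
};

/* Load only when the pointer is NULL, like a "skip if already set" check. */
static int load_encoding(struct sb_model *sb)
{
    assert(sb != NULL);
    if (sb->encoding)
        return 0;
    sb->encoding = malloc(16);
    if (!sb->encoding)
        return -1;
    sb->loads++;
    return 0;
}

/* Free and clear, so a retry reloads and a later teardown cannot free twice. */
static void unload_encoding(struct sb_model *sb)
{
    assert(sb != NULL);
    free(sb->encoding);   /* free(NULL) is a no-op */
    sb->encoding = NULL;
}
```

    Without the assignment to NULL, the retry would skip the reload and the
    final teardown would free the same pointer a second time.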

    Fixes: eca4873ee1b6 ("f2fs: Use generic casefolding support")
    Signed-off-by: Hyeongseok Kim
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Hyeongseok Kim
     
  • [ Upstream commit 3acc4522d89e0a326db69e9d0afaad8cf763a54c ]

    When running a fault injection test, if we don't stop the checkpoint, some
    stale NAT entries are flushed, which breaks consistency.

    Fixes: 86f33603f8c5 ("f2fs: handle errors of f2fs_get_meta_page_nofail")
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Jaegeuk Kim
     

26 Dec, 2020

3 commits

  • commit bfc2b7e8518999003a61f91c1deb5e88ed77b07d upstream.

    As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to
    create a duplicate filename in an encrypted directory by creating a file
    concurrently with adding the directory's encryption key.

    Fix this bug on f2fs by rejecting no-key dentries in f2fs_add_link().

    Note that the weird check for the current task in f2fs_do_add_link()
    seems to make this bug difficult to reproduce on f2fs.

    Fixes: 9ea97163c6da ("f2fs crypto: add filename encryption for f2fs_add_link")
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20201118075609.120337-4-ebiggers@kernel.org
    Signed-off-by: Eric Biggers
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     
  • commit 5335bfc6eb688344bfcd4b4133c002c0ae0d0719 upstream.

    The section is dirty, but dirty_secmap may not be set.

    Reported-by: Jia Yang
    Fixes: da52f8ade40b ("f2fs: get the right gc victim section when section has several segments")
    Cc:
    Signed-off-by: Jack Qiu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Jack Qiu
     
  • commit 7a6e59d719ef0ec9b3d765cba3ba98ee585cbde3 upstream.

    As kitestramuort reported:

    F2FS-fs (nvme0n1p4): access invalid blkaddr:1598541474
    [ 25.725898] ------------[ cut here ]------------
    [ 25.725903] WARNING: CPU: 6 PID: 2018 at f2fs_is_valid_blkaddr+0x23a/0x250
    [ 25.725923] Call Trace:
    [ 25.725927] ? f2fs_llseek+0x204/0x620
    [ 25.725929] ? ovl_copy_up_data+0x14f/0x200
    [ 25.725931] ? ovl_copy_up_inode+0x174/0x1e0
    [ 25.725933] ? ovl_copy_up_one+0xa22/0xdf0
    [ 25.725936] ? ovl_copy_up_flags+0xa6/0xf0
    [ 25.725938] ? ovl_aio_cleanup_handler+0xd0/0xd0
    [ 25.725939] ? ovl_maybe_copy_up+0x86/0xa0
    [ 25.725941] ? ovl_open+0x22/0x80
    [ 25.725943] ? do_dentry_open+0x136/0x350
    [ 25.725945] ? path_openat+0xb7e/0xf40
    [ 25.725947] ? __check_sticky+0x40/0x40
    [ 25.725948] ? do_filp_open+0x70/0x100
    [ 25.725950] ? __check_sticky+0x40/0x40
    [ 25.725951] ? __check_sticky+0x40/0x40
    [ 25.725953] ? __x64_sys_openat+0x1db/0x2c0
    [ 25.725955] ? do_syscall_64+0x2d/0x40
    [ 25.725957] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

    llseek() reports an invalid block address access. The root cause is
    that if the file has inline data, f2fs_seek_block() treats the inline
    data as block address indexes in the inode block, which is wrong. Fix it.

    Reported-by: kitestramuort
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     

25 Oct, 2020

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted stuff all over the place (the largest group here is
    Christoph's stat cleanups)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: remove KSTAT_QUERY_FLAGS
    fs: remove vfs_stat_set_lookup_flags
    fs: move vfs_fstatat out of line
    fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
    fs: remove vfs_statx_fd
    fs: omfs: use kmemdup() rather than kmalloc+memcpy
    [PATCH] reduce boilerplate in fsid handling
    fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS
    selftests: mount: add nosymfollow tests
    Add a "nosymfollow" mount option.

    Linus Torvalds
     

17 Oct, 2020

2 commits

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've added new features such as zone capacity for ZNS
    and a new GC policy, ATGC, along with in-memory segment management. In
    addition, we could improve the decompression speed significantly by
    changing virtual mapping method. Even though we've fixed lots of small
    bugs in compression support, I feel that it becomes more stable so
    that I could give it a try in production.

    Enhancements:
    - support zone capacity in NVMe Zoned Namespace devices
    - introduce in-memory current segment management
    - add standard casefolding support
    - support age threshold based garbage collection
    - improve decompression speed by changing virtual mapping method

    Bug fixes:
    - fix condition checks in some ioctl() such as compression, move_range, etc
    - fix 32/64bits support in data structures
    - fix memory allocation in zstd decompress
    - add some boundary checks to avoid kernel panic on corrupted image
    - fix disallowing compression for non-empty file
    - fix slab leakage of compressed block writes

    In addition, it includes code refactoring for better readability and
    minor bug fixes for compression and zoned device support"

    * tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
    f2fs: code cleanup by removing unnecessary check
    f2fs: wait for sysfs kobject removal before freeing f2fs_sb_info
    f2fs: fix writecount false positive in releasing compress blocks
    f2fs: introduce check_swap_activate_fast()
    f2fs: don't issue flush in f2fs_flush_device_cache() for nobarrier case
    f2fs: handle errors of f2fs_get_meta_page_nofail
    f2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode
    f2fs: reject CASEFOLD inode flag without casefold feature
    f2fs: fix memory alignment to support 32bit
    f2fs: fix slab leak of rpages pointer
    f2fs: compress: fix to disallow enabling compress on non-empty file
    f2fs: compress: introduce cic/dic slab cache
    f2fs: compress: introduce page array slab cache
    f2fs: fix to do sanity check on segment/section count
    f2fs: fix to check segment boundary during SIT page readahead
    f2fs: fix uninit-value in f2fs_lookup
    f2fs: remove unneeded parameter in find_in_block()
    f2fs: fix wrong total_sections check and fsmeta check
    f2fs: remove duplicated code in sanity_check_area_boundary
    f2fs: remove unused check on version_bitmap
    ...

    Linus Torvalds
     
  • Define it in the callers instead of in page_cache_ra_unbounded().

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: David Howells
    Cc: Eric Biggers
    Link: https://lkml.kernel.org/r/20200903140844.14194-4-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

15 Oct, 2020

2 commits

  • f2fs_seek_block() is only used for regular files,
    so there is no need to check for inline dentries in it.

    Signed-off-by: Chengguang Xu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chengguang Xu
     
  • syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, unmounting an
    f2fs filesystem could result in the following splat:

    kobject: 'loop5' ((____ptrval____)): kobject_release, parent 0000000000000000 (delayed 250)
    kobject: 'f2fs_xattr_entry-7:5' ((____ptrval____)): kobject_release, parent 0000000000000000 (delayed 750)
    ------------[ cut here ]------------
    ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98
    WARNING: CPU: 0 PID: 699 at lib/debugobjects.c:485 debug_print_object+0x180/0x240
    Kernel panic - not syncing: panic_on_warn set ...
    CPU: 0 PID: 699 Comm: syz-executor.5 Tainted: G S 5.9.0-rc8+ #101
    Hardware name: linux,dummy-virt (DT)
    Call trace:
    dump_backtrace+0x0/0x4d8
    show_stack+0x34/0x48
    dump_stack+0x174/0x1f8
    panic+0x360/0x7a0
    __warn+0x244/0x2ec
    report_bug+0x240/0x398
    bug_handler+0x50/0xc0
    call_break_hook+0x160/0x1d8
    brk_handler+0x30/0xc0
    do_debug_exception+0x184/0x340
    el1_dbg+0x48/0xb0
    el1_sync_handler+0x170/0x1c8
    el1_sync+0x80/0x100
    debug_print_object+0x180/0x240
    debug_check_no_obj_freed+0x200/0x430
    slab_free_freelist_hook+0x190/0x210
    kfree+0x13c/0x460
    f2fs_put_super+0x624/0xa58
    generic_shutdown_super+0x120/0x300
    kill_block_super+0x94/0xf8
    kill_f2fs_super+0x244/0x308
    deactivate_locked_super+0x104/0x150
    deactivate_super+0x118/0x148
    cleanup_mnt+0x27c/0x3c0
    __cleanup_mnt+0x28/0x38
    task_work_run+0x10c/0x248
    do_notify_resume+0x9d4/0x1188
    work_pending+0x8/0x34c

    Like the error handling for f2fs_register_sysfs(), we need to wait for
    the kobject to be destroyed before returning to prevent a potential
    use-after-free.

    Fixes: bf9e697ecd42 ("f2fs: expose features to sysfs entry")
    Cc: Jaegeuk Kim
    Cc: Chao Yu
    Signed-off-by: Jamie Iles
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jamie Iles
     

14 Oct, 2020

4 commits

  • The current condition check returns -EBUSY whenever it detects a nonzero
    writecount, regardless of the f_mode of the file. Fix it.
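
    One way to read this fix is that a caller holding the file open for
    writing contributes to the write count itself. The sketch below encodes
    that reading; it assumes the count includes the caller's own writable
    open, and check_other_writers() is a hypothetical helper rather than the
    condition used in f2fs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether some other writer exists. If the caller's own descriptor is
 * open for writing it accounts for one writer, so only a count above one
 * signals contention; a read-only caller tolerates no writers at all.
 */
static int check_other_writers(bool caller_has_write_mode, int writecount)
{
    assert(writecount >= 0);
    if (caller_has_write_mode)
        return writecount > 1 ? -EBUSY : 0;
    return writecount > 0 ? -EBUSY : 0;
}
```

    A check that ignores the open mode would return -EBUSY for the first case
    even when the only writer is the caller.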

    Signed-off-by: Daeho Jeong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Daeho Jeong
     
  • check_swap_activate() looks up block mappings via bmap() one by one, so
    its performance is very bad. This patch introduces check_swap_activate_fast(),
    which uses f2fs_fiemap() to speed this up: since f2fs_fiemap() looks up
    block mappings in batches, it can improve swapon()'s performance
    significantly.

    Note that this enhancement only works when page size is equal to f2fs' block
    size.

    Testcase: (backend device: zram)
    - touch file
    - pin & fallocate file to 8GB
    - mkswap file
    - swapon file

    Before:
    real 0m2.999s
    user 0m0.000s
    sys 0m2.980s

    After:
    real 0m0.081s
    user 0m0.000s
    sys 0m0.064s

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch changes f2fs_flush_device_cache() to skip issuing flush for
    nobarrier case.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • The first problem is that we hit BUG_ON() in f2fs_get_sum_page() when
    f2fs_get_meta_page_nofail() gets EIO.

    The quick fix was to never return an error and loop infinitely instead, but
    syzbot caught a case where a fuzzed image enters that loop. It turned out we
    abused f2fs_get_meta_page_nofail(), as in the call stack below.

    - f2fs_fill_super
    - f2fs_build_segment_manager
    - build_sit_entries
    - get_current_sit_page

    INFO: task syz-executor178:6870 can't die for more than 143 seconds.
    task:syz-executor178 state:R
    stack:26960 pid: 6870 ppid: 6869 flags:0x00004006
    Call Trace:

    Showing all locks held in the system:
    1 lock held by khungtaskd/1179:
    #0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242
    1 lock held by systemd-journal/3920:
    1 lock held by in:imklog/6769:
    #0: ffff88809eebc130 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930
    1 lock held by syz-executor178/6870:
    #0: ffff8880925120e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229

    Actually, we didn't have to use _nofail in this case, since the existing
    error handler could already return the error to mount(2).

    As a result, this patch tries to 1) remove _nofail callers as much as possible,
    2) deal with error case in last remaining caller, f2fs_get_sum_page().

    Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

13 Oct, 2020

1 commit

  • Pull fscrypt updates from Eric Biggers:
    "This release, we rework the implementation of creating new encrypted
    files in order to fix some deadlocks and prepare for adding fscrypt
    support to CephFS, which Jeff Layton is working on.

    We also export a symbol in preparation for the above-mentioned CephFS
    support and also for ext4/f2fs encrypt+casefold support.

    Finally, there are a few other small cleanups.

    As usual, all these patches have been in linux-next with no reported
    issues, and I've tested them with xfstests"

    * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
    fscrypt: export fscrypt_d_revalidate()
    fscrypt: rename DCACHE_ENCRYPTED_NAME to DCACHE_NOKEY_NAME
    fscrypt: don't call no-key names "ciphertext names"
    fscrypt: use sha256() instead of open coding
    fscrypt: make fscrypt_set_test_dummy_encryption() take a 'const char *'
    fscrypt: handle test_dummy_encryption in more logical way
    fscrypt: move fscrypt_prepare_symlink() out-of-line
    fscrypt: make "#define fscrypt_policy" user-only
    fscrypt: stop pretending that key setup is nofs-safe
    fscrypt: require that fscrypt_encrypt_symlink() already has key
    fscrypt: remove fscrypt_inherit_context()
    fscrypt: adjust logging for in-creation inodes
    ubifs: use fscrypt_prepare_new_inode() and fscrypt_set_context()
    f2fs: use fscrypt_prepare_new_inode() and fscrypt_set_context()
    ext4: use fscrypt_prepare_new_inode() and fscrypt_set_context()
    ext4: factor out ext4_xattr_credits_for_new_inode()
    fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()
    fscrypt: restrict IV_INO_LBLK_32 to ino_bits <= 32
    fscrypt: drop unused inode argument from fscrypt_fname_alloc_buffer

    Linus Torvalds
     

10 Oct, 2020

1 commit


09 Oct, 2020

2 commits

  • syzbot reported:

    general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
    KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
    CPU: 0 PID: 6860 Comm: syz-executor835 Not tainted 5.9.0-rc8-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:utf8_casefold+0x43/0x1b0 fs/unicode/utf8-core.c:107
    [...]
    Call Trace:
    f2fs_init_casefolded_name fs/f2fs/dir.c:85 [inline]
    __f2fs_setup_filename fs/f2fs/dir.c:118 [inline]
    f2fs_prepare_lookup+0x3bf/0x640 fs/f2fs/dir.c:163
    f2fs_lookup+0x10d/0x920 fs/f2fs/namei.c:494
    __lookup_hash+0x115/0x240 fs/namei.c:1445
    filename_create+0x14b/0x630 fs/namei.c:3467
    user_path_create fs/namei.c:3524 [inline]
    do_mkdirat+0x56/0x310 fs/namei.c:3664
    do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [...]

    The problem is that an inode has F2FS_CASEFOLD_FL set, but the
    filesystem doesn't have the casefold feature flag set, and therefore
    super_block::s_encoding is NULL.

    Fix this by making sanity_check_inode() reject inodes that have
    F2FS_CASEFOLD_FL when the filesystem doesn't have the casefold feature.
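
    The shape of such a check can be sketched as follows. The structs, the
    flag bit SKETCH_CASEFOLD_FL and inode_is_sane() are hypothetical; the
    sketch uses a NULL encoding pointer as a stand-in for "the filesystem lacks
    the casefold feature", which is an assumption drawn from the description
    above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit and structs for this sketch. */
#define SKETCH_CASEFOLD_FL 0x1u

struct fs_model    { const void *encoding; };  /* NULL when casefold is off */
struct inode_model { uint32_t flags; };

/* Reject an inode whose flag promises casefolding the filesystem cannot do. */
static bool inode_is_sane(const struct fs_model *fs,
                          const struct inode_model *inode)
{
    assert(fs != NULL && inode != NULL);
    if ((inode->flags & SKETCH_CASEFOLD_FL) && fs->encoding == NULL)
        return false;
    return true;
}
```

    Rejecting the inode at load time means later lookup code never reaches a
    casefold call with a NULL encoding.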

    Reported-by: syzbot+05139c4039d0679e19ff@syzkaller.appspotmail.com
    Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups")
    Signed-off-by: Eric Biggers
    Reviewed-by: Gabriel Krisman Bertazi
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Eric Biggers
     
  • On 32-bit systems, a 64-bit key breaks memory alignment.
    This fixes the commit "f2fs: support 64-bits key in f2fs rb-tree node entry".

    Reported-by: Nicolas Chauvet
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

30 Sep, 2020

4 commits

  • This fixes the memory leak below.

    [ 130.157600] =============================================================================
    [ 130.159662] BUG f2fs_page_array_entry-252:16 (Tainted: G W O ): Objects remaining in f2fs_page_array_entry-252:16 on __kmem_cache_shutdown()
    [ 130.162742] -----------------------------------------------------------------------------
    [ 130.162742]
    [ 130.164979] Disabling lock debugging due to kernel taint
    [ 130.166188] INFO: Slab 0x000000009f5a52d2 objects=22 used=4 fp=0x00000000ba72c3e9 flags=0xfffffc0010200
    [ 130.168269] CPU: 7 PID: 3560 Comm: umount Tainted: G B W O 5.9.0-rc4+ #35
    [ 130.170019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    [ 130.171941] Call Trace:
    [ 130.172528] dump_stack+0x74/0x9a
    [ 130.173298] slab_err+0xb7/0xdc
    [ 130.174044] ? kernel_poison_pages+0xc0/0xc0
    [ 130.175065] ? on_each_cpu_cond_mask+0x48/0x90
    [ 130.176096] __kmem_cache_shutdown.cold+0x34/0x141
    [ 130.177190] kmem_cache_destroy+0x59/0x100
    [ 130.178223] f2fs_destroy_page_array_cache+0x15/0x20 [f2fs]
    [ 130.179527] f2fs_put_super+0x1bc/0x380 [f2fs]
    [ 130.180538] generic_shutdown_super+0x72/0x110
    [ 130.181547] kill_block_super+0x27/0x50
    [ 130.182438] kill_f2fs_super+0x76/0xe0 [f2fs]
    [ 130.183448] deactivate_locked_super+0x3b/0x80
    [ 130.184456] deactivate_super+0x3e/0x50
    [ 130.185363] cleanup_mnt+0x109/0x160
    [ 130.186179] __cleanup_mnt+0x12/0x20
    [ 130.187003] task_work_run+0x70/0xb0
    [ 130.187841] exit_to_user_mode_prepare+0x18f/0x1b0
    [ 130.188917] syscall_exit_to_user_mode+0x31/0x170
    [ 130.189989] do_syscall_64+0x45/0x90
    [ 130.190828] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 130.191986] RIP: 0033:0x7faf868ea2eb
    [ 130.192815] Code: 7b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 7b 0c 00 f7 d8 64 89 01
    [ 130.196872] RSP: 002b:00007fffb7edb478 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
    [ 130.198494] RAX: 0000000000000000 RBX: 00007faf86a18204 RCX: 00007faf868ea2eb
    [ 130.201021] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055971df71c50
    [ 130.203415] RBP: 000055971df71a40 R08: 0000000000000000 R09: 00007fffb7eda1f0
    [ 130.205772] R10: 00007faf86a04339 R11: 0000000000000246 R12: 000055971df71c50
    [ 130.208150] R13: 0000000000000000 R14: 000055971df71b38 R15: 0000000000000000
    [ 130.210515] INFO: Object 0x00000000a980843a @offset=744
    [ 130.212476] INFO: Allocated in page_array_alloc+0x3d/0xe0 [f2fs] age=1572 cpu=0 pid=3297
    [ 130.215030] __slab_alloc+0x20/0x40
    [ 130.216566] kmem_cache_alloc+0x2a0/0x2e0
    [ 130.218217] page_array_alloc+0x3d/0xe0 [f2fs]
    [ 130.219940] f2fs_init_compress_ctx+0x1f/0x40 [f2fs]
    [ 130.221736] f2fs_write_cache_pages+0x3db/0x860 [f2fs]
    [ 130.223591] f2fs_write_data_pages+0x2c9/0x300 [f2fs]
    [ 130.225414] do_writepages+0x43/0xd0
    [ 130.226907] __filemap_fdatawrite_range+0xd5/0x110
    [ 130.228632] filemap_write_and_wait_range+0x48/0xb0
    [ 130.230336] __generic_file_write_iter+0x18a/0x1d0
    [ 130.232035] f2fs_file_write_iter+0x226/0x550 [f2fs]
    [ 130.233737] new_sync_write+0x113/0x1a0
    [ 130.235204] vfs_write+0x1a6/0x200
    [ 130.236579] ksys_write+0x67/0xe0
    [ 130.237898] __x64_sys_write+0x1a/0x20
    [ 130.239309] do_syscall_64+0x38/0x90

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Compressed inodes and normal inodes have different layouts, so we should
    disallow enabling compression on a non-empty file to avoid a race condition
    while parsing and updating the inode's .i_addr array.

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: Fix missing condition]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Add two slab caches: "f2fs_cic_entry" and "f2fs_dic_entry" for memory
    allocation of compress_io_ctx and decompress_io_ctx structure.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Add a per-sbi slab cache "f2fs_page_array_entry-%u:%u" for memory
    allocation of page pointer array in compress context.

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: Fix wrong memory allocation]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

29 Sep, 2020

11 commits

  • As syzbot reported:

    BUG: KASAN: slab-out-of-bounds in init_min_max_mtime fs/f2fs/segment.c:4710 [inline]
    BUG: KASAN: slab-out-of-bounds in f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792
    Read of size 8 at addr ffff8880a1b934a8 by task syz-executor682/6878

    CPU: 1 PID: 6878 Comm: syz-executor682 Not tainted 5.9.0-rc6-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x198/0x1fd lib/dump_stack.c:118
    print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
    __kasan_report mm/kasan/report.c:513 [inline]
    kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
    init_min_max_mtime fs/f2fs/segment.c:4710 [inline]
    f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792
    f2fs_fill_super+0x381a/0x6e80 fs/f2fs/super.c:3633
    mount_bdev+0x32e/0x3f0 fs/super.c:1417
    legacy_get_tree+0x105/0x220 fs/fs_context.c:592
    vfs_get_tree+0x89/0x2f0 fs/super.c:1547
    do_new_mount fs/namespace.c:2875 [inline]
    path_mount+0x1387/0x20a0 fs/namespace.c:3192
    do_mount fs/namespace.c:3205 [inline]
    __do_sys_mount fs/namespace.c:3413 [inline]
    __se_sys_mount fs/namespace.c:3390 [inline]
    __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The root cause is: if segs_per_sec is larger than one, and segment count
    in last section is less than segs_per_sec, we will suffer out-of-boundary
    memory access on sit_i->sentries[] in init_min_max_mtime().

    Fix this by adding sanity check among segment count, section count and
    segs_per_sec value in sanity_check_raw_super().
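
    The arithmetic behind the out-of-bounds access can be made concrete. The
    sketch below is an assumption-laden illustration, not the check added to
    sanity_check_raw_super(): both function names are hypothetical, and it
    assumes the per-section scan visits segs_per_sec entries for every section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the last entry a per-section scan would touch. */
static uint64_t last_scanned_segment(uint32_t total_sections,
                                     uint32_t segs_per_sec)
{
    assert(total_sections > 0 && segs_per_sec > 0);
    return (uint64_t)total_sections * segs_per_sec - 1;
}

/*
 * Geometry check: every section must be fully populated, so the segment
 * count has to equal total_sections * segs_per_sec. 64-bit math avoids
 * overflow when multiplying fuzzed 32-bit fields.
 */
static bool geometry_is_sane(uint32_t segment_count_main,
                             uint32_t total_sections,
                             uint32_t segs_per_sec)
{
    if (segs_per_sec == 0 || total_sections == 0)
        return false;
    return (uint64_t)total_sections * segs_per_sec == segment_count_main;
}
```

    For example, with 10 segments, 3 sections and 4 segments per section, the
    scan would reach index 11 in an array of 10 entries, and the geometry check
    rejects that layout.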

    Reported-by: syzbot+481a3ffab50fed41dcc0@syzkaller.appspotmail.com
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As syzbot reported:

    kernel BUG at fs/f2fs/segment.h:657!
    invalid opcode: 0000 [#1] PREEMPT SMP KASAN
    CPU: 1 PID: 16220 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:f2fs_ra_meta_pages+0xa51/0xdc0 fs/f2fs/segment.h:657
    Call Trace:
    build_sit_entries fs/f2fs/segment.c:4195 [inline]
    f2fs_build_segment_manager+0x4b8a/0xa3c0 fs/f2fs/segment.c:4779
    f2fs_fill_super+0x377d/0x6b80 fs/f2fs/super.c:3633
    mount_bdev+0x32e/0x3f0 fs/super.c:1417
    legacy_get_tree+0x105/0x220 fs/fs_context.c:592
    vfs_get_tree+0x89/0x2f0 fs/super.c:1547
    do_new_mount fs/namespace.c:2875 [inline]
    path_mount+0x1387/0x2070 fs/namespace.c:3192
    do_mount fs/namespace.c:3205 [inline]
    __do_sys_mount fs/namespace.c:3413 [inline]
    __se_sys_mount fs/namespace.c:3390 [inline]
    __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    @blkno in f2fs_ra_meta_pages could exceed the maximum segment count, which
    causes a panic in the subsequent sanity check in current_sit_addr(). Add a
    check condition to avoid this issue.

    Reported-by: syzbot+3698081bcf0bb2d12174@syzkaller.appspotmail.com
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As syzbot reported:

    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x21c/0x280 lib/dump_stack.c:118
    kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122
    __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219
    f2fs_lookup+0xe05/0x1a80 fs/f2fs/namei.c:503
    lookup_open fs/namei.c:3082 [inline]
    open_last_lookups fs/namei.c:3177 [inline]
    path_openat+0x2729/0x6a90 fs/namei.c:3365
    do_filp_open+0x2b8/0x710 fs/namei.c:3395
    do_sys_openat2+0xa88/0x1140 fs/open.c:1168
    do_sys_open fs/open.c:1184 [inline]
    __do_compat_sys_openat fs/open.c:1242 [inline]
    __se_compat_sys_openat+0x2a4/0x310 fs/open.c:1240
    __ia32_compat_sys_openat+0x56/0x70 fs/open.c:1240
    do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline]
    __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139
    do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162
    do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:205
    entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

    In f2fs_lookup(), @res_page could be used before being initialized:
    in __f2fs_find_entry(), once F2FS_I(dir)->i_current_depth has been
    fuzzed to zero, @res_page is never initialized, which causes this
    kmsan warning. Relocate the @res_page initialization to fix this bug.
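
    The underlying pattern is an output parameter that is only written inside
    a loop which may run zero times. A minimal sketch, using hypothetical
    names (struct page_model, find_entry()) rather than the f2fs functions:

```c
#include <assert.h>
#include <stddef.h>

struct page_model { int id; };

/*
 * Search `depth` levels for a match. The out-parameter is cleared before the
 * loop, so a depth of zero (no iterations) still leaves it well defined.
 * Returns 1 on a match and 0 otherwise.
 */
static int find_entry(const int *levels, unsigned int depth, int wanted,
                      struct page_model **res_page)
{
    static struct page_model found;

    assert(res_page != NULL);
    *res_page = NULL;              /* initialise before any early exit */
    for (unsigned int i = 0; i < depth; i++) {
        if (levels[i] == wanted) {
            found.id = wanted;
            *res_page = &found;
            return 1;
        }
    }
    return 0;
}
```

    If the assignment to NULL lived inside the loop instead, a caller passing
    a depth of zero would read whatever value its pointer held beforehand.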

    Reported-by: syzbot+0eac6f0bbd558fd866d7@syzkaller.appspotmail.com
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • We can relocate @res_page assignment in find_in_block() to
    its caller, so unneeded parameter could be removed for cleanup.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • The meta area is not included in the section_count computation.
    So the minimum value of total_sections is 1, and it cannot be
    greater than segment_count_main.

    The minimum number of meta segments is 8 (SB + 2 (CP + SIT + NAT) + SSA).
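
    The formula above works out to 1 + 2 * 3 + 1 = 8 if each area takes one
    segment. The sketch below spells that out; the enum names, the
    one-segment-per-area values and sections_in_range() are assumptions made
    for illustration, inferred only from the formula and bounds stated here.

```c
#include <assert.h>

/* Segment counts per metadata area in this sketch (one segment each). */
enum {
    SB_SEGS  = 1,  /* superblock */
    CP_SEGS  = 1,  /* checkpoint */
    SIT_SEGS = 1,  /* segment information table */
    NAT_SEGS = 1,  /* node address table */
    SSA_SEGS = 1,  /* segment summary area */
    /* CP, SIT and NAT are each counted twice in the formula. */
    MIN_META_SEGS = SB_SEGS + 2 * (CP_SEGS + SIT_SEGS + NAT_SEGS) + SSA_SEGS
};

static_assert(MIN_META_SEGS == 8, "minimum meta segment count should be 8");

/* Bounds from the text: 1 <= total_sections <= segment_count_main. */
static int sections_in_range(unsigned int total_sections,
                             unsigned int segment_count_main)
{
    return total_sections >= 1 && total_sections <= segment_count_main;
}
```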

    Signed-off-by: Wang Xiaojun
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Wang Xiaojun
     
  • Use seg_end_blkaddr instead of "segment0_blkaddr + (segment_count <<
    log_blocks_per_seg)".

    Signed-off-by: Wang Xiaojun
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Wang Xiaojun
     
  • __bitmap_ptr will not return NULL here.
    Remove the unused check.

    Signed-off-by: Wang Xiaojun
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Wang Xiaojun
     
  • Relocate the blkzoned feature check into parse_options(), like the
    other feature checks.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • sbi->devs is initialized only if the image enables the multiple device
    feature or the blkzoned feature. If the blkzoned feature flag is set by
    fuzzing on a non-blkzoned device, we hit the panic below:

    get_zone_idx fs/f2fs/segment.c:4892 [inline]
    f2fs_usable_zone_blks_in_seg fs/f2fs/segment.c:4943 [inline]
    f2fs_usable_blks_in_seg+0x39b/0xa00 fs/f2fs/segment.c:4999
    Call Trace:
    check_block_count+0x69/0x4e0 fs/f2fs/segment.h:704
    build_sit_entries fs/f2fs/segment.c:4403 [inline]
    f2fs_build_segment_manager+0x51da/0xa370 fs/f2fs/segment.c:5100
    f2fs_fill_super+0x3880/0x6ff0 fs/f2fs/super.c:3684
    mount_bdev+0x32e/0x3f0 fs/super.c:1417
    legacy_get_tree+0x105/0x220 fs/fs_context.c:592
    vfs_get_tree+0x89/0x2f0 fs/super.c:1547
    do_new_mount fs/namespace.c:2896 [inline]
    path_mount+0x12ae/0x1e70 fs/namespace.c:3216
    do_mount fs/namespace.c:3229 [inline]
    __do_sys_mount fs/namespace.c:3437 [inline]
    __se_sys_mount fs/namespace.c:3414 [inline]
    __x64_sys_mount+0x27f/0x300 fs/namespace.c:3414
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46

    Add a sanity check for inconsistency among the following factors: the
    blkzoned flag, the device path and the device character, to avoid the
    above panic.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • The trace exit is missing in f2fs_sync_dirty_inodes().

    Signed-off-by: Zhang Qilong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Zhang Qilong
     
  • The type of SM_I(sbi)->reserved_segments is unsigned int,
    so change the return type to unsigned int.
    As a result, the type cast in reserved_sections can be removed.

    Signed-off-by: Xiaojun Wang
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Xiaojun Wang
     

24 Sep, 2020

1 commit

  • Currently we're using the term "ciphertext name" ambiguously because it
    can mean either the actual ciphertext filename, or the encoded filename
    that is shown when an encrypted directory is listed without its key.
    The latter we're now usually calling the "no-key name"; and while it's
    derived from the ciphertext name, it's not the same thing.

    To avoid this ambiguity, rename fscrypt_name::is_ciphertext_name to
    fscrypt_name::is_nokey_name, and update comments that say "ciphertext
    name" (or "encrypted name") to say "no-key name" instead when warranted.

    Link: https://lore.kernel.org/r/20200924042624.98439-2-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
     

22 Sep, 2020

3 commits

  • fscrypt_set_test_dummy_encryption() requires that the optional argument
    to the test_dummy_encryption mount option be specified as a substring_t.
    That doesn't work well with filesystems that use the new mount API,
    since the new way of parsing mount options doesn't use substring_t.

    Make it take the argument as a 'const char *' instead.

    Instead of moving the match_strdup() into the callers in ext4 and f2fs,
    make them just use arg->from directly. Since the pattern is
    "test_dummy_encryption=%s", the argument will be null-terminated.

    Acked-by: Jeff Layton
    Link: https://lore.kernel.org/r/20200917041136.178600-14-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
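
    Why the argument is already null-terminated can be seen in a small
    userspace sketch. The struct and the matcher below are hypothetical
    stand-ins, not the kernel's substring_t or its option parser. The sketch
    assumes the option string has already been split into a single
    null-terminated token and that "%s" is the last element of the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for a from/to pointer pair such as substring_t. */
struct substring {
    const char *from;
    const char *to;
};

/*
 * Hypothetical matcher for patterns of the form "<name>=%s", where %s is
 * the final element. Returns 1 and fills *arg on a match, 0 otherwise.
 */
static int match_trailing_string(const char *option, const char *name,
                                 struct substring *arg)
{
    size_t n;

    assert(option != NULL && name != NULL && arg != NULL);
    n = strlen(name);
    if (strncmp(option, name, n) != 0 || option[n] != '=')
        return 0;
    arg->from = option + n + 1;
    /* %s runs to the end of the token, so 'to' lands on the final NUL. */
    arg->to = option + strlen(option);
    return 1;
}
```

    Because the matched span ends at the token's terminator, arg->from can
    be passed on as an ordinary 'const char *' without a copy. That would
    not hold for a pattern where "%s" is followed by more text.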
     
  • The behavior of the test_dummy_encryption mount option is that when a
    new file (or directory or symlink) is created in an unencrypted
    directory, it's automatically encrypted using a dummy encryption policy.
    That's it; in particular, the encryption (or lack thereof) of existing
    files (or directories or symlinks) doesn't change.

    Unfortunately the implementation of test_dummy_encryption is a bit weird
    and confusing. When test_dummy_encryption is enabled and a file is
    being created in an unencrypted directory, we set up an encryption key
    (->i_crypt_info) for the directory. This isn't actually used to do any
    encryption, however, since the directory is still unencrypted! Instead,
    ->i_crypt_info is only used for inheriting the encryption policy.

    One consequence of this is that the filesystem ends up providing a
    "dummy context" (policy + nonce) instead of a "dummy policy". In
    commit ed318a6cc0b6 ("fscrypt: support test_dummy_encryption=v2"), I
    mistakenly thought this was required. However, actually the nonce only
    ends up being used to derive a key that is never used.

    Another consequence of this implementation is that it allows for
    'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
    case that can be forgotten about. For example, currently
    FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
    dummy encryption policy when the filesystem is mounted with
    test_dummy_encryption. That seems like the wrong thing to do, since
    again, the directory itself is not actually encrypted.

    Therefore, switch to a more logical and maintainable implementation
    where the dummy encryption policy inheritance is done without setting up
    keys for unencrypted directories. This involves:

    - Adding a function fscrypt_policy_to_inherit() which returns the
    encryption policy to inherit from a directory. This can be a real
    policy, a dummy policy, or no policy.

    - Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
    with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.

    - Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
    of an inode.

    Acked-by: Jaegeuk Kim
    Acked-by: Jeff Layton
    Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
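
    The three possible results of the inheritance lookup (a real policy, a
    dummy policy, or no policy) can be modeled in a few lines. This is a
    simplified userspace sketch, not the kernel's fscrypt_policy_to_inherit():
    the types, fields and signature are hypothetical and error handling is
    omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's data structures. */
struct policy {
    int version;
};

struct dir {
    bool encrypted;                   /* models IS_ENCRYPTED(dir) */
    const struct policy *own_policy;  /* set only when the dir is encrypted */
};

/*
 * Returns the policy a new file created in 'dir' should inherit, or NULL
 * for no policy. 'dummy_policy' is non-NULL only when the filesystem is
 * mounted with test_dummy_encryption.
 */
static const struct policy *policy_to_inherit(const struct dir *dir,
                                              const struct policy *dummy_policy)
{
    assert(dir != NULL);
    if (dir->encrypted)
        return dir->own_policy;  /* real policy */
    return dummy_policy;         /* dummy policy, or NULL for none */
}
```

    In this model no key is ever set up for an unencrypted directory: the
    dummy policy is handed to the new file directly, so the directory never
    ends up with encryption state it does not use.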
     
  • Convert f2fs to use the new functions fscrypt_prepare_new_inode() and
    fscrypt_set_context(). This avoids calling
    fscrypt_get_encryption_info() from under f2fs_lock_op(), which can
    deadlock because fscrypt_get_encryption_info() isn't GFP_NOFS-safe.

    For more details about this problem, see the earlier patch
    "fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()".

    This also fixes a f2fs-specific deadlock when the filesystem is mounted
    with '-o test_dummy_encryption' and a file is created in an unencrypted
    directory other than the root directory:

    INFO: task touch:207 blocked for more than 30 seconds.
    Not tainted 5.9.0-rc4-00099-g729e3d0919844 #2
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:touch state:D stack: 0 pid: 207 ppid: 167 flags:0x00000000
    Call Trace:
    [...]
    lock_page include/linux/pagemap.h:548 [inline]
    pagecache_get_page+0x25e/0x310 mm/filemap.c:1682
    find_or_create_page include/linux/pagemap.h:348 [inline]
    grab_cache_page include/linux/pagemap.h:424 [inline]
    f2fs_grab_cache_page fs/f2fs/f2fs.h:2395 [inline]
    f2fs_grab_cache_page fs/f2fs/f2fs.h:2373 [inline]
    __get_node_page.part.0+0x39/0x2d0 fs/f2fs/node.c:1350
    __get_node_page fs/f2fs/node.c:35 [inline]
    f2fs_get_node_page+0x2e/0x60 fs/f2fs/node.c:1399
    read_inline_xattr+0x88/0x140 fs/f2fs/xattr.c:288
    lookup_all_xattrs+0x1f9/0x2c0 fs/f2fs/xattr.c:344
    f2fs_getxattr+0x9b/0x160 fs/f2fs/xattr.c:532
    f2fs_get_context+0x1e/0x20 fs/f2fs/super.c:2460
    fscrypt_get_encryption_info+0x9b/0x450 fs/crypto/keysetup.c:472
    fscrypt_inherit_context+0x2f/0xb0 fs/crypto/policy.c:640
    f2fs_init_inode_metadata+0xab/0x340 fs/f2fs/dir.c:540
    f2fs_add_inline_entry+0x145/0x390 fs/f2fs/inline.c:621
    f2fs_add_dentry+0x31/0x80 fs/f2fs/dir.c:757
    f2fs_do_add_link+0xcd/0x130 fs/f2fs/dir.c:798
    f2fs_add_link fs/f2fs/f2fs.h:3234 [inline]
    f2fs_create+0x104/0x290 fs/f2fs/namei.c:344
    lookup_open.isra.0+0x2de/0x500 fs/namei.c:3103
    open_last_lookups+0xa9/0x340 fs/namei.c:3177
    path_openat+0x8f/0x1b0 fs/namei.c:3365
    do_filp_open+0x87/0x130 fs/namei.c:3395
    do_sys_openat2+0x96/0x150 fs/open.c:1168
    [...]

    That happened because f2fs_add_inline_entry() locks the directory
    inode's page in order to add the dentry, then f2fs_get_context() tries
    to lock it recursively in order to read the encryption xattr. This
    problem is specific to "test_dummy_encryption" because normally the
    directory's fscrypt_info would be set up prior to
    f2fs_add_inline_entry() in order to encrypt the new filename.

    Regardless, the new design fixes this test_dummy_encryption deadlock as
    well as potential deadlocks with fs reclaim, by setting up any needed
    fscrypt_info structs prior to taking so many locks.

    The test_dummy_encryption deadlock was reported by Daniel Rosenberg.

    Reported-by: Daniel Rosenberg
    Acked-by: Jaegeuk Kim
    Link: https://lore.kernel.org/r/20200917041136.178600-5-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
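
    The ordering problem in the last commit above can be illustrated with a
    single-threaded model. All names here are hypothetical, and the "lock"
    is a plain flag that reports -EDEADLK where a real non-recursive page
    lock would block forever. It is a sketch of the ordering only, not of
    the f2fs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Models a non-recursive page lock. */
struct page_lock {
    bool held;
};

static int lock_page_model(struct page_lock *l)
{
    if (l->held)
        return -EDEADLK;  /* a real lock would hang here instead */
    l->held = true;
    return 0;
}

static void unlock_page_model(struct page_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Models reading the encryption xattr, which needs the dir's page lock. */
static int get_context_model(struct page_lock *dir_page)
{
    int err = lock_page_model(dir_page);

    if (err)
        return err;
    unlock_page_model(dir_page);
    return 0;
}

/* Old ordering: lock the page, then set up encryption info under it. */
static int create_old_order(struct page_lock *dir_page)
{
    int err = lock_page_model(dir_page);

    if (err)
        return err;
    err = get_context_model(dir_page);  /* re-enters the held lock */
    unlock_page_model(dir_page);
    return err;
}

/* New ordering: set up encryption info first, then lock the page. */
static int create_new_order(struct page_lock *dir_page)
{
    int err = get_context_model(dir_page);

    if (err)
        return err;
    err = lock_page_model(dir_page);
    if (err)
        return err;
    /* ... the dentry would be added here ... */
    unlock_page_model(dir_page);
    return 0;
}
```

    On a fresh lock, create_old_order() reports -EDEADLK while
    create_new_order() returns 0, which mirrors the point of the commit:
    doing the setup before taking the lock removes the recursive
    acquisition.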