13 Oct, 2018

1 commit

  • commit d3f07c049dab1a3f1740f476afd3d5e5b738c21c upstream.

    syzbot found the following crash on:

    HEAD commit: d9bd94c0bcaa Add linux-next specific files for 20180801
    git tree: linux-next
    console output: https://syzkaller.appspot.com/x/log.txt?x=1001189c400000
    kernel config: https://syzkaller.appspot.com/x/.config?x=cc8964ea4d04518c
    dashboard link: https://syzkaller.appspot.com/bug?extid=c966a82db0b14aa37e81
    compiler: gcc (GCC) 8.0.1 20180413 (experimental)

    Unfortunately, I don't have any reproducer for this crash yet.

    IMPORTANT: if you fix the bug, please add the following tag to the commit:
    Reported-by: syzbot+c966a82db0b14aa37e81@syzkaller.appspotmail.com

    loop7: rw=12288, want=8200, limit=20
    netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'.
    openvswitch: netlink: Message has 8 unknown bytes.
    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    CPU: 1 PID: 7615 Comm: syz-executor7 Not tainted 4.18.0-rc7-next-20180801+ #29
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline]
    RIP: 0010:compound_head include/linux/page-flags.h:142 [inline]
    RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline]
    RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline]
    RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835
    Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00
    RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246
    RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000
    RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005
    RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026
    R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb
    R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40
    FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    f2fs_get_valid_checkpoint+0x436/0x1ec0 fs/f2fs/checkpoint.c:860
    f2fs_fill_super+0x2d42/0x8110 fs/f2fs/super.c:2883
    mount_bdev+0x314/0x3e0 fs/super.c:1344
    f2fs_mount+0x3c/0x50 fs/f2fs/super.c:3133
    legacy_get_tree+0x131/0x460 fs/fs_context.c:729
    vfs_get_tree+0x1cb/0x5c0 fs/super.c:1743
    do_new_mount fs/namespace.c:2603 [inline]
    do_mount+0x6f2/0x1e20 fs/namespace.c:2927
    ksys_mount+0x12d/0x140 fs/namespace.c:3143
    __do_sys_mount fs/namespace.c:3157 [inline]
    __se_sys_mount fs/namespace.c:3154 [inline]
    __x64_sys_mount+0xbe/0x150 fs/namespace.c:3154
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x45943a
    Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 bd 8a fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 3d 01 f0 ff ff 0f 83 9a 8a fb ff c3 66 0f 1f 84 00 00 00 00 00
    RSP: 002b:00007f36a61d4a88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
    RAX: ffffffffffffffda RBX: 00007f36a61d4b30 RCX: 000000000045943a
    RDX: 00007f36a61d4ad0 RSI: 0000000020000100 RDI: 00007f36a61d4af0
    RBP: 0000000020000100 R08: 00007f36a61d4b30 R09: 00007f36a61d4ad0
    R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000013
    R13: 0000000000000000 R14: 00000000004c8ea0 R15: 0000000000000000
    Modules linked in:
    Dumping ftrace buffer:
    (ftrace buffer empty)
    ---[ end trace bd8550c129352286 ]---
    RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline]
    RIP: 0010:compound_head include/linux/page-flags.h:142 [inline]
    RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline]
    RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline]
    RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835
    Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00
    RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246
    RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000
    RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005
    netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'.
    RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026
    openvswitch: netlink: Message has 8 unknown bytes.
    R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb
    R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40
    FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

    In validate_checkpoint(), if get_checkpoint_version() fails, we pass the
    resulting invalid page pointer into f2fs_put_page(), which then accesses
    invalid memory. This patch handles the error path correctly to fix the
    issue.

    Signed-off-by: Chao Yu
    Signed-off-by: Greg Kroah-Hartman

    Signed-off-by: Jaegeuk Kim

    Chao Yu
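
    The error-path rule described in this commit can be illustrated outside the kernel. The following is a minimal user-space sketch, not the actual f2fs code: `struct page`, `put_page`, `read_version` and `validate_pair` are hypothetical stand-ins, and the only point is that a page is released only if it was successfully acquired.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a page cache page. */
struct page {
    int refcount;
    unsigned long long version;
};

/* Drops a reference; like the crash above, it dereferences its argument. */
static void put_page(struct page *p)
{
    p->refcount--;
}

/*
 * Hypothetical reader: on success stores a referenced page in *out and
 * returns 0; on failure returns a negative errno and leaves *out untouched.
 */
static int read_version(struct page *src, struct page **out,
                        unsigned long long *ver)
{
    if (src == NULL)
        return -EIO;
    src->refcount++;
    *out = src;
    *ver = src->version;
    return 0;
}

/* Returns 1 if both copies agree, 0 otherwise; never puts an unacquired page. */
static int validate_pair(struct page *first, struct page *second)
{
    struct page *p1 = NULL, *p2 = NULL;
    unsigned long long v1 = 0, v2 = 0;
    int ok = 0;

    if (read_version(first, &p1, &v1))
        return 0;           /* nothing acquired, nothing to release */
    if (read_version(second, &p2, &v2))
        goto put_first;     /* only p1 is valid here */

    ok = (v1 == v2);
    put_page(p2);
put_first:
    put_page(p1);
    return ok;
}
```

    With this shape, a failure of the second read releases only the first page, and a failure of the first read releases nothing.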
     

30 May, 2018

1 commit

  • [ Upstream commit cd36d7a17f9da68be9aa67185ba3ad7969934a19 ]

    Once CP_TRIMMED_FLAG is set, after a reboot we will never issue discard
    until the LBA becomes invalid again. Fix it by clearing the flag in any
    checkpoint whose reason does not include CP_TRIMMED.

    Fixes: 1f43e2ad7bff ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
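
    The flag handling described above amounts to a set-or-clear decision on every checkpoint. Here is a minimal sketch of that logic; the bit values and the names `CP_REASON_TRIMMED`, `CP_REASON_SYNC`, `FLAG_TRIMMED` and `update_ckpt_flags` are illustrative assumptions, not the kernel's definitions.

```c
#include <stdint.h>

/* Illustrative bit values; the real in-kernel and on-disk values may differ. */
#define CP_REASON_TRIMMED 0x1u   /* checkpoint triggered after a full trim */
#define CP_REASON_SYNC    0x2u   /* any other checkpoint reason */
#define FLAG_TRIMMED      0x100u /* persisted "already trimmed" flag */

/*
 * Returns the checkpoint flags to persist: the trimmed flag survives only
 * when this checkpoint itself was issued for the trimmed reason.
 */
static uint32_t update_ckpt_flags(uint32_t ckpt_flags, uint32_t reason)
{
    if (reason & CP_REASON_TRIMMED)
        ckpt_flags |= FLAG_TRIMMED;
    else
        ckpt_flags &= ~FLAG_TRIMMED;
    return ckpt_flags;
}
```

    Any later checkpoint issued for another reason therefore drops the flag, so a stale "trimmed" state does not survive a reboot.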
     

22 Aug, 2017

1 commit

  • This patch enables f2fs to accept quota information through the
    following mount options:
    - {usr,grp,prj}jquota=
    - jqfmt=

    Then, in the ->mount flow, we can recover the quota files during log
    replay, so that journalled quota can be supported.

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: Fix wrong return values.]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
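
    To show how a comma-separated option string carrying these keys could be split, here is a small user-space sketch. It is not the f2fs option parser: `struct quota_opts`, `take_opt`, `parse_quota_opts` and the example file name are hypothetical, and no validation of the jqfmt value is attempted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parsed quota options. */
struct quota_opts {
    char usr[64];
    char grp[64];
    char prj[64];
    char fmt[16];
};

/* If the token starts with key, copy its value into dst and return 1. */
static int take_opt(const char *tok, size_t len, const char *key,
                    char *dst, size_t cap)
{
    size_t klen = strlen(key);

    if (len < klen || strncmp(tok, key, klen) != 0)
        return 0;
    snprintf(dst, cap, "%.*s", (int)(len - klen), tok + klen);
    return 1;
}

/* Returns the number of quota-related options found in the option string. */
static int parse_quota_opts(const char *options, struct quota_opts *out)
{
    const char *p = options;
    int matched = 0;

    memset(out, 0, sizeof(*out));
    while (*p) {
        size_t len = strcspn(p, ",");   /* length of the current token */

        matched += take_opt(p, len, "usrjquota=", out->usr, sizeof(out->usr))
                || take_opt(p, len, "grpjquota=", out->grp, sizeof(out->grp))
                || take_opt(p, len, "prjjquota=", out->prj, sizeof(out->prj))
                || take_opt(p, len, "jqfmt=", out->fmt, sizeof(out->fmt));
        p += len;
        if (*p == ',')
            p++;
    }
    return matched;
}
```

    Unrelated options in the string are skipped, and keys that never appear leave their field as an empty string.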
     

10 Aug, 2017

1 commit

  • This patch enables inner app/fs io stats and introduces below virtual fs
    nodes for exposing stats info:
    /sys/fs/f2fs//iostat_enable
    /proc/fs/f2fs//iostat_info

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: fix wrong stat assignment]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

04 Aug, 2017

1 commit


18 Jul, 2017

1 commit


08 Jul, 2017

2 commits

  • generic/361 reports the warning below. The cause: once someone has
    entered the critical region of sbi.cp_lock, if
    write_end_io.f2fs_stop_checkpoint is invoked from a triggered IRQ, we
    will encounter a deadlock.

    So this patch changes to use spin_{,un}lock_irq{save,restore} to create a
    critical region with IRQs disabled, avoiding the potential deadlock.

    irq event stamp: 83391573
    loop: Write error at byte offset 438729728, length 1024.
    hardirqs last enabled at (83391573): [] restore_all+0xf/0x65
    hardirqs last disabled at (83391572): [] reschedule_interrupt+0x30/0x3c
    loop: Write error at byte offset 438860288, length 1536.
    softirqs last enabled at (83389244): [] __do_softirq+0x1ae/0x476
    softirqs last disabled at (83389237): [] do_softirq_own_stack+0x2c/0x40
    loop: Write error at byte offset 438990848, length 2048.
    ================================
    WARNING: inconsistent lock state
    4.12.0-rc2+ #30 Tainted: G O
    --------------------------------
    inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
    xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
    (&(&sbi->cp_lock)->rlock){?.+...}, at: [] f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
    {HARDIRQ-ON-W} state was registered at:
    __lock_acquire+0x527/0x7b0
    lock_acquire+0xae/0x220
    _raw_spin_lock+0x42/0x50
    do_checkpoint+0x165/0x9e0 [f2fs]
    write_checkpoint+0x33f/0x740 [f2fs]
    __f2fs_sync_fs+0x92/0x1f0 [f2fs]
    f2fs_sync_fs+0x12/0x20 [f2fs]
    sync_filesystem+0x67/0x80
    generic_shutdown_super+0x27/0x100
    kill_block_super+0x22/0x50
    kill_f2fs_super+0x3a/0x40 [f2fs]
    deactivate_locked_super+0x3d/0x70
    deactivate_super+0x40/0x60
    cleanup_mnt+0x39/0x70
    __cleanup_mnt+0x10/0x20
    task_work_run+0x69/0x80
    exit_to_usermode_loop+0x57/0x85
    do_fast_syscall_32+0x18c/0x1b0
    entry_SYSENTER_32+0x4c/0x7b
    irq event stamp: 1957420
    hardirqs last enabled at (1957419): [] _raw_spin_unlock_irq+0x27/0x50
    hardirqs last disabled at (1957420): [] call_function_single_interrupt+0x30/0x3c
    softirqs last enabled at (1953784): [] __do_softirq+0x1ae/0x476
    softirqs last disabled at (1953773): [] do_softirq_own_stack+0x2c/0x40

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&(&sbi->cp_lock)->rlock);

    lock(&(&sbi->cp_lock)->rlock);

    *** DEADLOCK ***

    2 locks held by xfs_io/7959:
    #0: (sb_writers#13){.+.+.+}, at: [] vfs_write+0x16a/0x190
    #1: (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [] f2fs_file_write_iter+0x25/0x140 [f2fs]

    stack backtrace:
    CPU: 2 PID: 7959 Comm: xfs_io Tainted: G O 4.12.0-rc2+ #30
    Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    Call Trace:
    dump_stack+0x5f/0x92
    print_usage_bug+0x1d3/0x1dd
    ? check_usage_backwards+0xe0/0xe0
    mark_lock+0x23d/0x280
    __lock_acquire+0x699/0x7b0
    ? __this_cpu_preempt_check+0xf/0x20
    ? trace_hardirqs_off_caller+0x91/0xe0
    lock_acquire+0xae/0x220
    ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
    _raw_spin_lock+0x42/0x50
    ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
    f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
    f2fs_write_end_io+0x147/0x150 [f2fs]
    bio_endio+0x7a/0x1e0
    blk_update_request+0xad/0x410
    blk_mq_end_request+0x16/0x60
    lo_complete_rq+0x3c/0x70
    __blk_mq_complete_request_remote+0x11/0x20
    flush_smp_call_function_queue+0x6d/0x120
    ? debug_smp_processor_id+0x12/0x20
    generic_smp_call_function_single_interrupt+0x12/0x30
    smp_call_function_single_interrupt+0x25/0x40
    call_function_single_interrupt+0x37/0x3c
    EIP: _raw_spin_unlock_irq+0x2d/0x50
    EFLAGS: 00000296 CPU: 2
    EAX: 00000001 EBX: d2ccc51c ECX: 00000001 EDX: c1aacebd
    ESI: 00000000 EDI: 00000000 EBP: c96c9d1c ESP: c96c9d18
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    ? inherit_task_group.isra.98.part.99+0x6b/0xb0
    __add_to_page_cache_locked+0x1d4/0x290
    add_to_page_cache_lru+0x38/0xb0
    pagecache_get_page+0x8e/0x200
    f2fs_write_begin+0x96/0xf00 [f2fs]
    ? trace_hardirqs_on_caller+0xdd/0x1c0
    ? current_time+0x17/0x50
    ? trace_hardirqs_on+0xb/0x10
    generic_perform_write+0xa9/0x170
    __generic_file_write_iter+0x1a2/0x1f0
    ? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
    f2fs_file_write_iter+0x6e/0x140 [f2fs]
    ? __lock_acquire+0x429/0x7b0
    __vfs_write+0xc1/0x140
    vfs_write+0x9b/0x190
    SyS_pwrite64+0x63/0xa0
    do_fast_syscall_32+0xa1/0x1b0
    entry_SYSENTER_32+0x4c/0x7b
    EIP: 0xb7786c61
    EFLAGS: 00000293 CPU: 2
    EAX: ffffffda EBX: 00000003 ECX: 08416000 EDX: 00001000
    ESI: 18b24000 EDI: 00000000 EBP: 00000003 ESP: bf9b36b0
    DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

    Fixes: aaec2b1d1879 ("f2fs: introduce cp_lock to protect updating of ckpt_flags")
    Cc: stable@vger.kernel.org
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Skip {meta,node}_inode in ->writepages, before reaching ->writepage,
    during recovery, so that the unneeded loop in ->writepages is avoided.

    Moreover, check SBI_POR_DOING earlier while writing back pages.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

24 May, 2017

3 commits


04 May, 2017

2 commits


13 Apr, 2017

1 commit


06 Apr, 2017

1 commit


25 Mar, 2017

2 commits


22 Mar, 2017

4 commits


28 Feb, 2017

3 commits

  • There are four places that get the crc value in f2fs_checkpoint;
    add a new helper, cur_cp_crc, for them.

    Signed-off-by: Kinglong Mee
    Signed-off-by: Jaegeuk Kim

    Kinglong Mee
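
    The idea of funnelling repeated CRC reads through one helper can be sketched as follows. This is a user-space illustration under stated assumptions, not the kernel's cur_cp_crc: it assumes the CRC is stored as a little-endian 32-bit value at a byte offset inside the checkpoint block, and `get_le32` and `read_cp_crc` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value regardless of host endianness. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/*
 * Read the CRC stored at crc_offset inside a checkpoint block.
 * Returns 0 and fills *crc on success, -1 if the offset is out of range.
 */
static int read_cp_crc(const unsigned char *block, size_t block_size,
                       uint32_t crc_offset, uint32_t *crc)
{
    if (block_size < sizeof(uint32_t) ||
        crc_offset > block_size - sizeof(uint32_t))
        return -1;
    *crc = get_le32(block + crc_offset);
    return 0;
}
```

    Keeping the bounds check inside the single helper means every caller gets it, which is one practical benefit of replacing several open-coded reads.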
     
  • Previously, the kernel message showed in which function the injection
    was done, but unfortunately most of the callers are the same. To track
    more information about the injection path, the message needs to show the
    upper caller's name. This patch adds that ability.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch adds bitmaps to represent empty or full NAT blocks containing
    free nid entries.

    If we can find valid crc|cp_ver in the last block of checkpoint pack, we'll
    use these bitmaps when building free nids. In order to avoid checkpointing
    burden, up-to-date bitmaps will be flushed only during umount time. So,
    normally we can get this gain, but when power-cut happens, we rely on fsck.f2fs
    which recovers this bitmap again.

    After this patch, we build free nids from nid #0 at mount time to make more
    full NAT blocks, but in runtime, we check empty NAT blocks to load free nids
    without loading any NAT pages from disk.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

24 Feb, 2017

2 commits


23 Feb, 2017

3 commits


29 Jan, 2017

1 commit

  • This patch relaxes async discard commands to avoid waiting for their
    end_io during checkpoint.
    Instead of waiting for them during checkpoint, the wait is done when
    actually reusing them.

    Test on initial partition of nvme drive.

    # time fstrim /mnt/test

    Before : 6.158s
    After : 4.822s

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

15 Dec, 2016

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "This patch series contains several performance tuning patches
    regarding the IO submission flow, in addition to supporting new
    features such as a ZBC-based drive and multiple devices.

    It also includes some major bug fixes such as:
    - checkpoint version control
    - fdatasync-related roll-forward recovery routine
    - memory boundary or null-pointer access in corner cases
    - missing error cases

    It has various minor clean-up patches as well"

    * tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
    f2fs: fix a missing size change in f2fs_setattr
    f2fs: fix to access nullified flush_cmd_control pointer
    f2fs: free meta pages if sanity check for ckpt is failed
    f2fs: detect wrong layout
    f2fs: call sync_fs when f2fs is idle
    Revert "f2fs: use percpu_counter for # of dirty pages in inode"
    f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
    f2fs: do not activate auto_recovery for fallocated i_size
    f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack
    f2fs: fix 32-bit build
    f2fs: set ->owner for debugfs status file's file_operations
    f2fs: fix incorrect free inode count in ->statfs
    f2fs: drop duplicate header timer.h
    f2fs: fix wrong AUTO_RECOVER condition
    f2fs: do not recover i_size if it's valid
    f2fs: fix fdatasync
    f2fs: fix to account total free nid correctly
    f2fs: fix an infinite loop when flush nodes in cp
    f2fs: don't wait writeback for datas during checkpoint
    f2fs: fix wrong written_valid_blocks counting
    ...

    Linus Torvalds
     

08 Dec, 2016

1 commit


29 Nov, 2016

1 commit


26 Nov, 2016

1 commit

  • Normally, while committing a checkpoint, we wait for all pages to be
    written back, no matter whether the page is data or metadata. So in a
    scenario where lots of data IO is being submitted along with metadata, we
    may suffer long latency waiting for writeback during checkpoint.

    In fact, we only care about persistence for pages with metadata, not
    pages with data, as file system consistency is only related to metadata.
    So, in order to avoid the long latency in the above scenario, let's
    recognize and reference metadata in submitted IOs, and wait for writeback
    only for metadata.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
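
    The accounting idea in this commit, waiting only on in-flight metadata, can be modelled with two counters. This is a simplified single-threaded sketch, not the f2fs implementation; `struct wb_tracker`, `wb_submit`, `wb_end_io` and `checkpoint_may_commit` are hypothetical names.

```c
#include <stdbool.h>

enum page_kind { PAGE_DATA, PAGE_META };

/* Hypothetical tracker of pages currently under writeback, by kind. */
struct wb_tracker {
    unsigned long meta_inflight;
    unsigned long data_inflight;
};

/* Account for a page when its IO is submitted. */
static void wb_submit(struct wb_tracker *t, enum page_kind kind)
{
    if (kind == PAGE_META)
        t->meta_inflight++;
    else
        t->data_inflight++;
}

/* Account for a page when its IO completes. */
static void wb_end_io(struct wb_tracker *t, enum page_kind kind)
{
    if (kind == PAGE_META)
        t->meta_inflight--;
    else
        t->data_inflight--;
}

/* The checkpoint only has to wait for metadata writeback to drain. */
static bool checkpoint_may_commit(const struct wb_tracker *t)
{
    return t->meta_inflight == 0;
}
```

    In this model, outstanding data pages never delay the commit decision; a real implementation would also need proper locking or atomic counters.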
     

24 Nov, 2016

4 commits


01 Nov, 2016

1 commit


01 Oct, 2016

1 commit