21 Nov, 2018

1 commit

  • [ Upstream commit daf00ae71dad8aa05965713c62558aeebf2df48e ]

    commit b96672dd840f ("powerpc: Machine check interrupt is a non-maskable
    interrupt") added a call to nmi_enter() at the beginning of the machine
    check restart exception handler. Because of that, in_interrupt() always
    returns true regardless of the state before entering the exception, and
    die() panics even when the system was not already in an interrupt.

    This patch calls nmi_exit() before calling die() in order to restore
    the interrupt state we had before calling nmi_enter().
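    A minimal userspace model of the interrupt-state problem, assuming a
    single nesting counter stands in for what in_interrupt() inspects. All
    names (struct cpu_state, nmi_enter_model() and so on) are hypothetical;
    the sketch only shows why die() always sees "in interrupt" after
    nmi_enter() unless nmi_exit() runs first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: 'depth' stands in for the interrupt/NMI nesting
 * that in_interrupt() looks at. None of these are kernel APIs. */
struct cpu_state {
    int depth;
};

static void nmi_enter_model(struct cpu_state *s) { s->depth++; }
static void nmi_exit_model(struct cpu_state *s)  { s->depth--; }
static bool in_interrupt_model(const struct cpu_state *s) { return s->depth > 0; }

/* Returns true if die() would panic, i.e. if in_interrupt() is true at the
 * point die() is reached. With apply_fix, nmi_exit() runs before die(). */
static bool machine_check_panics(int depth_before, bool apply_fix)
{
    struct cpu_state s = { .depth = depth_before };

    nmi_enter_model(&s);            /* the call added by b96672dd840f */
    if (apply_fix)
        nmi_exit_model(&s);         /* restore the pre-exception state */
    /* die() panics when in_interrupt() is true here */
    return in_interrupt_model(&s);
}
```

    Without the fix the result is true for any starting depth; with it, only
    an exception taken while already in interrupt context leads to a panic.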

    Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
    Signed-off-by: Christophe Leroy
    Reviewed-by: Nicholas Piggin
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Christophe Leroy
     

14 Nov, 2018

39 commits

  • Greg Kroah-Hartman
     
  • commit 9e753ba9b9b405e3902d9f08aec5f2ea58a0c317 upstream.

    Commit d595567dc4f0 ("MD: fix invalid stored role for a disk") broke
    linear hotadd. Let's only fix the role for disks in raid1/10.
    Based on Guoqing's original patch.

    Reported-by: kernel test robot
    Cc: Gioh Kim
    Cc: Guoqing Jiang
    Signed-off-by: Shaohua Li
    Signed-off-by: Greg Kroah-Hartman

    Shaohua Li
     
  • commit 1ae80cf31938c8f77c37a29bbe29e7f1cd492be8 upstream.

    The map-in-map frequently serves as a mechanism for atomic
    snapshotting of state that a BPF program might record. The current
    implementation is dangerous to use in this way, however, since
    userspace has no way of knowing when all programs that might have
    retrieved the "old" value of the map may have completed.

    This change ensures that map update operations on map-in-map map types
    always wait for all references to the old map to drop before returning
    to userspace.

    Signed-off-by: Daniel Colascione
    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Alexei Starovoitov
    [fengc@google.com: 4.14 backport: adjust context]
    Signed-off-by: Chenbo Feng
    Signed-off-by: Greg Kroah-Hartman

    Daniel Colascione
     
  • commit e72bde6b66299602087c8c2350d36a525e75d06e upstream.

    Marco reported an error with hfsc:
    root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1
    Error: Attribute failed policy validation.

    Apparently a few implementations pass TCA_OPTIONS as a binary attribute
    instead of a nested attribute, so drop TCA_OPTIONS from the policy.
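    A simplified model of why dropping the policy entry helps, assuming a toy
    policy table in which a "nested" type rejects a flat binary payload and a
    missing entry performs no check. The enum, struct and function names are
    illustrative only and are not the kernel's netlink API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy policy types: POL_NESTED demands nested attributes,
 * POL_UNSPEC (the value of a missing entry) checks nothing. */
enum policy_type { POL_UNSPEC = 0, POL_NESTED };

/* Illustrative attribute ids, loosely mirroring TCA_KIND/TCA_OPTIONS. */
enum { T_KIND = 1, T_OPTIONS = 2, T_MAX = 2 };

struct toy_attr {
    int id;
    bool payload_is_nested;   /* false for a flat binary struct */
};

static bool toy_validate(const struct toy_attr *a, const enum policy_type *policy)
{
    if (a->id < 0 || a->id > T_MAX)
        return true;                        /* unknown ids not policed here */
    if (policy[a->id] == POL_NESTED)
        return a->payload_is_nested;        /* binary payload is rejected */
    return true;                            /* no entry: payload accepted */
}
```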

    Fixes: 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes")
    Reported-by: Marco Berizzi
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • commit 4ee3fad34a9cc2cf33303dfbd0cf554248651c86 upstream.

    When we have the no-holes mode enabled and fsync a file after punching a
    hole in it, we can end up not logging the whole hole range in the log tree.
    This happens if the file has extent items that span more than one leaf and
    we punch a hole that covers a range that starts in a leaf but does not go
    beyond the offset of the first extent in the next leaf.

    Example:

    $ mkfs.btrfs -f -O no-holes -n 65536 /dev/sdb
    $ mount /dev/sdb /mnt
    $ for ((i = 0; i /dev/null
    done
    $ sync

    # We now have 2 leaves in our filesystem fs tree, the first leaf has an
    # item corresponding to the extent at file offset 216530944 and the second
    # leaf has a first item corresponding to the extent at offset 217055232.
    # Now we punch a hole that partially covers the range of the extent at
    # offset 216530944 but does not go beyond the offset 217055232.

    $ xfs_io -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" /mnt/foobar
    $ xfs_io -c "fsync" /mnt/foobar

    # simulate a power failure, then mount to replay the log
    $ mount /dev/sdb /mnt

    # Before this patch, only the subrange [216658016, 216662016[ (length of
    # 4000 bytes) was logged, leaving an incorrect file layout after log
    # replay.

    Fix this by checking if there is a hole between the last extent item that
    we processed and the first extent item in the next leaf, and if there is
    one, log an explicit hole extent item.
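    The check described above reduces to comparing two file offsets. A
    sketch, assuming hypothetical names for the end of the last extent item
    processed in the current leaf and the offset of the first extent item in
    the next leaf. A positive gap means an explicit hole item must be logged,
    since no-holes mode stores nothing for that range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: length of the implicit hole between the last
 * processed extent and the first extent of the next leaf, or 0 if the
 * two are contiguous (or overlap, which should not happen). */
static uint64_t hole_between_leaves(uint64_t last_extent_end,
                                    uint64_t next_leaf_first_offset)
{
    if (next_leaf_first_offset > last_extent_end)
        return next_leaf_first_offset - last_extent_end;
    return 0;
}
```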

    Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit 9084cb6a24bf5838a665af92ded1af8363f9e563 upstream.

    We were iterating a block group's free space cache rbtree without first
    taking the lock that protects it (the free_space_ctl->free_space_offset
    rbtree is protected by the free_space_ctl->tree_lock spinlock).

    KASAN reported a use-after-free problem when iterating such an rbtree due
    to a concurrent rbtree delete:

    [ 9520.359168] ==================================================================
    [ 9520.359656] BUG: KASAN: use-after-free in rb_next+0x13/0x90
    [ 9520.359949] Read of size 8 at addr ffff8800b7ada500 by task btrfs-transacti/1721
    [ 9520.360357]
    [ 9520.360530] CPU: 4 PID: 1721 Comm: btrfs-transacti Tainted: G L 4.19.0-rc8-nbor #555
    [ 9520.360990] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    [ 9520.362682] Call Trace:
    [ 9520.362887] dump_stack+0xa4/0xf5
    [ 9520.363146] print_address_description+0x78/0x280
    [ 9520.363412] kasan_report+0x263/0x390
    [ 9520.363650] ? rb_next+0x13/0x90
    [ 9520.363873] __asan_load8+0x54/0x90
    [ 9520.364102] rb_next+0x13/0x90
    [ 9520.364380] btrfs_dump_free_space+0x146/0x160 [btrfs]
    [ 9520.364697] dump_space_info+0x2cd/0x310 [btrfs]
    [ 9520.364997] btrfs_reserve_extent+0x1ee/0x1f0 [btrfs]
    [ 9520.365310] __btrfs_prealloc_file_range+0x1cc/0x620 [btrfs]
    [ 9520.365646] ? btrfs_update_time+0x180/0x180 [btrfs]
    [ 9520.365923] ? _raw_spin_unlock+0x27/0x40
    [ 9520.366204] ? btrfs_alloc_data_chunk_ondemand+0x2c0/0x5c0 [btrfs]
    [ 9520.366549] btrfs_prealloc_file_range_trans+0x23/0x30 [btrfs]
    [ 9520.366880] cache_save_setup+0x42e/0x580 [btrfs]
    [ 9520.367220] ? btrfs_check_data_free_space+0xd0/0xd0 [btrfs]
    [ 9520.367518] ? lock_downgrade+0x2f0/0x2f0
    [ 9520.367799] ? btrfs_write_dirty_block_groups+0x11f/0x6e0 [btrfs]
    [ 9520.368104] ? kasan_check_read+0x11/0x20
    [ 9520.368349] ? do_raw_spin_unlock+0xa8/0x140
    [ 9520.368638] btrfs_write_dirty_block_groups+0x2af/0x6e0 [btrfs]
    [ 9520.368978] ? btrfs_start_dirty_block_groups+0x870/0x870 [btrfs]
    [ 9520.369282] ? do_raw_spin_unlock+0xa8/0x140
    [ 9520.369534] ? _raw_spin_unlock+0x27/0x40
    [ 9520.369811] ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
    [ 9520.370137] commit_cowonly_roots+0x4b9/0x610 [btrfs]
    [ 9520.370560] ? commit_fs_roots+0x350/0x350 [btrfs]
    [ 9520.370926] ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
    [ 9520.371285] btrfs_commit_transaction+0x5e5/0x10e0 [btrfs]
    [ 9520.371612] ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
    [ 9520.371943] ? start_transaction+0x168/0x6c0 [btrfs]
    [ 9520.372257] transaction_kthread+0x21c/0x240 [btrfs]
    [ 9520.372537] kthread+0x1d2/0x1f0
    [ 9520.372793] ? btrfs_cleanup_transaction+0xb50/0xb50 [btrfs]
    [ 9520.373090] ? kthread_park+0xb0/0xb0
    [ 9520.373329] ret_from_fork+0x3a/0x50
    [ 9520.373567]
    [ 9520.373738] Allocated by task 1804:
    [ 9520.373974] kasan_kmalloc+0xff/0x180
    [ 9520.374208] kasan_slab_alloc+0x11/0x20
    [ 9520.374447] kmem_cache_alloc+0xfc/0x2d0
    [ 9520.374731] __btrfs_add_free_space+0x40/0x580 [btrfs]
    [ 9520.375044] unpin_extent_range+0x4f7/0x7a0 [btrfs]
    [ 9520.375383] btrfs_finish_extent_commit+0x15f/0x4d0 [btrfs]
    [ 9520.375707] btrfs_commit_transaction+0xb06/0x10e0 [btrfs]
    [ 9520.376027] btrfs_alloc_data_chunk_ondemand+0x237/0x5c0 [btrfs]
    [ 9520.376365] btrfs_check_data_free_space+0x81/0xd0 [btrfs]
    [ 9520.376689] btrfs_delalloc_reserve_space+0x25/0x80 [btrfs]
    [ 9520.377018] btrfs_direct_IO+0x42e/0x6d0 [btrfs]
    [ 9520.377284] generic_file_direct_write+0x11e/0x220
    [ 9520.377587] btrfs_file_write_iter+0x472/0xac0 [btrfs]
    [ 9520.377875] aio_write+0x25c/0x360
    [ 9520.378106] io_submit_one+0xaa0/0xdc0
    [ 9520.378343] __se_sys_io_submit+0xfa/0x2f0
    [ 9520.378589] __x64_sys_io_submit+0x43/0x50
    [ 9520.378840] do_syscall_64+0x7d/0x240
    [ 9520.379081] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 9520.379387]
    [ 9520.379557] Freed by task 1802:
    [ 9520.379782] __kasan_slab_free+0x173/0x260
    [ 9520.380028] kasan_slab_free+0xe/0x10
    [ 9520.380262] kmem_cache_free+0xc1/0x2c0
    [ 9520.380544] btrfs_find_space_for_alloc+0x4cd/0x4e0 [btrfs]
    [ 9520.380866] find_free_extent+0xa99/0x17e0 [btrfs]
    [ 9520.381166] btrfs_reserve_extent+0xd5/0x1f0 [btrfs]
    [ 9520.381474] btrfs_get_blocks_direct+0x60b/0xbd0 [btrfs]
    [ 9520.381761] __blockdev_direct_IO+0x10ee/0x58a1
    [ 9520.382059] btrfs_direct_IO+0x25a/0x6d0 [btrfs]
    [ 9520.382321] generic_file_direct_write+0x11e/0x220
    [ 9520.382623] btrfs_file_write_iter+0x472/0xac0 [btrfs]
    [ 9520.382904] aio_write+0x25c/0x360
    [ 9520.383172] io_submit_one+0xaa0/0xdc0
    [ 9520.383416] __se_sys_io_submit+0xfa/0x2f0
    [ 9520.383678] __x64_sys_io_submit+0x43/0x50
    [ 9520.383927] do_syscall_64+0x7d/0x240
    [ 9520.384165] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 9520.384439]
    [ 9520.384610] The buggy address belongs to the object at ffff8800b7ada500
    which belongs to the cache btrfs_free_space of size 72
    [ 9520.385175] The buggy address is located 0 bytes inside of
    72-byte region [ffff8800b7ada500, ffff8800b7ada548)
    [ 9520.385691] The buggy address belongs to the page:
    [ 9520.385957] page:ffffea0002deb680 count:1 mapcount:0 mapping:ffff880108a1d700 index:0x0 compound_mapcount: 0
    [ 9520.388030] flags: 0x8100(slab|head)
    [ 9520.388281] raw: 0000000000008100 ffffea0002deb608 ffffea0002728808 ffff880108a1d700
    [ 9520.388722] raw: 0000000000000000 0000000000130013 00000001ffffffff 0000000000000000
    [ 9520.389169] page dumped because: kasan: bad access detected
    [ 9520.389473]
    [ 9520.389658] Memory state around the buggy address:
    [ 9520.389943] ffff8800b7ada400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 9520.390368] ffff8800b7ada480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 9520.390796] >ffff8800b7ada500: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc
    [ 9520.391223] ^
    [ 9520.391461] ffff8800b7ada580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 9520.391885] ffff8800b7ada600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 9520.392313] ==================================================================
    [ 9520.392772] BTRFS critical (device vdc): entry offset 2258497536, bytes 131072, bitmap no
    [ 9520.393247] BUG: unable to handle kernel NULL pointer dereference at 0000000000000011
    [ 9520.393705] PGD 800000010dbab067 P4D 800000010dbab067 PUD 107551067 PMD 0
    [ 9520.394059] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 9520.394378] CPU: 4 PID: 1721 Comm: btrfs-transacti Tainted: G B L 4.19.0-rc8-nbor #555
    [ 9520.394858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    [ 9520.395350] RIP: 0010:rb_next+0x3c/0x90
    [ 9520.396461] RSP: 0018:ffff8801074ff780 EFLAGS: 00010292
    [ 9520.396762] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81b5ac4c
    [ 9520.397115] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000011
    [ 9520.397468] RBP: ffff8801074ff7a0 R08: ffffed0021d64ccc R09: ffffed0021d64ccc
    [ 9520.397821] R10: 0000000000000001 R11: ffffed0021d64ccb R12: ffff8800b91e0000
    [ 9520.398188] R13: ffff8800a3ceba48 R14: ffff8800b627bf80 R15: 0000000000020000
    [ 9520.398555] FS: 0000000000000000(0000) GS:ffff88010eb00000(0000) knlGS:0000000000000000
    [ 9520.399007] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 9520.399335] CR2: 0000000000000011 CR3: 0000000106b52000 CR4: 00000000000006a0
    [ 9520.399679] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 9520.400023] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 9520.400400] Call Trace:
    [ 9520.400648] btrfs_dump_free_space+0x146/0x160 [btrfs]
    [ 9520.400974] dump_space_info+0x2cd/0x310 [btrfs]
    [ 9520.401287] btrfs_reserve_extent+0x1ee/0x1f0 [btrfs]
    [ 9520.401609] __btrfs_prealloc_file_range+0x1cc/0x620 [btrfs]
    [ 9520.401952] ? btrfs_update_time+0x180/0x180 [btrfs]
    [ 9520.402232] ? _raw_spin_unlock+0x27/0x40
    [ 9520.402522] ? btrfs_alloc_data_chunk_ondemand+0x2c0/0x5c0 [btrfs]
    [ 9520.402882] btrfs_prealloc_file_range_trans+0x23/0x30 [btrfs]
    [ 9520.403261] cache_save_setup+0x42e/0x580 [btrfs]
    [ 9520.403570] ? btrfs_check_data_free_space+0xd0/0xd0 [btrfs]
    [ 9520.403871] ? lock_downgrade+0x2f0/0x2f0
    [ 9520.404161] ? btrfs_write_dirty_block_groups+0x11f/0x6e0 [btrfs]
    [ 9520.404481] ? kasan_check_read+0x11/0x20
    [ 9520.404732] ? do_raw_spin_unlock+0xa8/0x140
    [ 9520.405026] btrfs_write_dirty_block_groups+0x2af/0x6e0 [btrfs]
    [ 9520.405375] ? btrfs_start_dirty_block_groups+0x870/0x870 [btrfs]
    [ 9520.405694] ? do_raw_spin_unlock+0xa8/0x140
    [ 9520.405958] ? _raw_spin_unlock+0x27/0x40
    [ 9520.406243] ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
    [ 9520.406574] commit_cowonly_roots+0x4b9/0x610 [btrfs]
    [ 9520.406899] ? commit_fs_roots+0x350/0x350 [btrfs]
    [ 9520.407253] ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
    [ 9520.407589] btrfs_commit_transaction+0x5e5/0x10e0 [btrfs]
    [ 9520.407925] ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
    [ 9520.408262] ? start_transaction+0x168/0x6c0 [btrfs]
    [ 9520.408582] transaction_kthread+0x21c/0x240 [btrfs]
    [ 9520.408870] kthread+0x1d2/0x1f0
    [ 9520.409138] ? btrfs_cleanup_transaction+0xb50/0xb50 [btrfs]
    [ 9520.409440] ? kthread_park+0xb0/0xb0
    [ 9520.409682] ret_from_fork+0x3a/0x50
    [ 9520.410508] Dumping ftrace buffer:
    [ 9520.410764] (ftrace buffer empty)
    [ 9520.411007] CR2: 0000000000000011
    [ 9520.411297] ---[ end trace 01a0863445cf360a ]---
    [ 9520.411568] RIP: 0010:rb_next+0x3c/0x90
    [ 9520.412644] RSP: 0018:ffff8801074ff780 EFLAGS: 00010292
    [ 9520.412932] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81b5ac4c
    [ 9520.413274] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000011
    [ 9520.413616] RBP: ffff8801074ff7a0 R08: ffffed0021d64ccc R09: ffffed0021d64ccc
    [ 9520.414007] R10: 0000000000000001 R11: ffffed0021d64ccb R12: ffff8800b91e0000
    [ 9520.414349] R13: ffff8800a3ceba48 R14: ffff8800b627bf80 R15: 0000000000020000
    [ 9520.416074] FS: 0000000000000000(0000) GS:ffff88010eb00000(0000) knlGS:0000000000000000
    [ 9520.416536] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 9520.416848] CR2: 0000000000000011 CR3: 0000000106b52000 CR4: 00000000000006a0
    [ 9520.418477] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 9520.418846] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 9520.419204] Kernel panic - not syncing: Fatal exception
    [ 9520.419666] Dumping ftrace buffer:
    [ 9520.419930] (ftrace buffer empty)
    [ 9520.420168] Kernel Offset: disabled
    [ 9520.420406] ---[ end Kernel panic - not syncing: Fatal exception ]---

    Fix this by acquiring the respective lock before iterating the rbtree.
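    A userspace sketch of the locking rule, assuming a linked list in place of
    the free-space rbtree and a pthread mutex in place of the tree_lock
    spinlock. The struct and function names are hypothetical. Any path that
    removes and frees entries would take the same lock, so the walk never
    steps onto a freed node.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the free-space cache. */
struct fs_entry {
    uint64_t offset, bytes;
    struct fs_entry *next;
};

struct fs_ctl {
    pthread_mutex_t tree_lock;   /* stand-in for ctl->tree_lock */
    struct fs_entry *head;
};

/* Dump-style walk: hold tree_lock for the whole iteration. */
static uint64_t dump_free_space(struct fs_ctl *ctl, size_t *count)
{
    uint64_t total = 0;

    *count = 0;
    pthread_mutex_lock(&ctl->tree_lock);
    for (struct fs_entry *e = ctl->head; e; e = e->next) {
        total += e->bytes;
        (*count)++;
    }
    pthread_mutex_unlock(&ctl->tree_lock);
    return total;
}
```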

    Reported-by: Nikolay Borisov
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Josef Bacik
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit 421f0922a2cfb0c75acd9746454aaa576c711a65 upstream.

    At inode.c:evict_inode_truncate_pages(), when we iterate over the
    inode's extent states, we access an extent state record's "state" field
    after we unlocked the inode's io tree lock. This can lead to a
    use-after-free issue because after we unlock the io tree that extent
    state record might have been freed due to being merged into another
    adjacent extent state record (a previous inflight bio for a read
    operation finished in the meantime, which unlocked a range in the io
    tree and caused a merge of extent state records, as explained in the
    comment before the while loop added in commit 6ca0709756710 ("Btrfs: fix
    hang during inode eviction due to concurrent readahead")).

    Fix this by keeping a copy of the extent state's flags in a local
    variable and using it after unlocking the io tree.
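    A sketch of the "copy under the lock" pattern, assuming hypothetical
    names (struct io_tree, take_state_flags(), STATE_DELALLOC) and a pthread
    mutex standing in for the io tree lock.

```c
#include <assert.h>
#include <pthread.h>

struct extent_state { unsigned flags; };

#define STATE_DELALLOC 0x1u   /* illustrative flag bit */

struct io_tree {
    pthread_mutex_t lock;
    struct extent_state *first;
};

/* Snapshot the flags while the lock is held. After unlocking, the record
 * may be merged into a neighbour and freed, so only the copy is used. */
static unsigned take_state_flags(struct io_tree *tree)
{
    unsigned flags;

    pthread_mutex_lock(&tree->lock);
    flags = tree->first->flags;       /* copy under the lock */
    pthread_mutex_unlock(&tree->lock);
    /* tree->first must not be dereferenced from here on */
    return flags;
}
```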

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201189
    Fixes: b9d0b38928e2 ("btrfs: Add handler for invalidate page")
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Qu Wenruo
    Signed-off-by: Filipe Manana
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit c495144bc6962186feae31d687596d2472000e45 upstream.

    We're getting a lockdep splat because we take the dio_sem under the
    log_mutex. What we really need is to protect fsync() from logging an
    extent map for an extent we never waited on higher up, so just guard the
    whole thing with dio_sem.

    ======================================================
    WARNING: possible circular locking dependency detected
    4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411 Not tainted
    ------------------------------------------------------
    aio-dio-invalid/30928 is trying to acquire lock:
    0000000092621cfd (&mm->mmap_sem){++++}, at: get_user_pages_unlocked+0x5a/0x1e0

    but task is already holding lock:
    00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #5 (&ei->dio_sem){++++}:
    lock_acquire+0xbd/0x220
    down_write+0x51/0xb0
    btrfs_log_changed_extents+0x80/0xa40
    btrfs_log_inode+0xbaf/0x1000
    btrfs_log_inode_parent+0x26f/0xa80
    btrfs_log_dentry_safe+0x50/0x70
    btrfs_sync_file+0x357/0x540
    do_fsync+0x38/0x60
    __ia32_sys_fdatasync+0x12/0x20
    do_fast_syscall_32+0x9a/0x2f0
    entry_SYSENTER_compat+0x84/0x96

    -> #4 (&ei->log_mutex){+.+.}:
    lock_acquire+0xbd/0x220
    __mutex_lock+0x86/0xa10
    btrfs_record_unlink_dir+0x2a/0xa0
    btrfs_unlink+0x5a/0xc0
    vfs_unlink+0xb1/0x1a0
    do_unlinkat+0x264/0x2b0
    do_fast_syscall_32+0x9a/0x2f0
    entry_SYSENTER_compat+0x84/0x96

    -> #3 (sb_internal#2){.+.+}:
    lock_acquire+0xbd/0x220
    __sb_start_write+0x14d/0x230
    start_transaction+0x3e6/0x590
    btrfs_evict_inode+0x475/0x640
    evict+0xbf/0x1b0
    btrfs_run_delayed_iputs+0x6c/0x90
    cleaner_kthread+0x124/0x1a0
    kthread+0x106/0x140
    ret_from_fork+0x3a/0x50

    -> #2 (&fs_info->cleaner_delayed_iput_mutex){+.+.}:
    lock_acquire+0xbd/0x220
    __mutex_lock+0x86/0xa10
    btrfs_alloc_data_chunk_ondemand+0x197/0x530
    btrfs_check_data_free_space+0x4c/0x90
    btrfs_delalloc_reserve_space+0x20/0x60
    btrfs_page_mkwrite+0x87/0x520
    do_page_mkwrite+0x31/0xa0
    __handle_mm_fault+0x799/0xb00
    handle_mm_fault+0x7c/0xe0
    __do_page_fault+0x1d3/0x4a0
    async_page_fault+0x1e/0x30

    -> #1 (sb_pagefaults){.+.+}:
    lock_acquire+0xbd/0x220
    __sb_start_write+0x14d/0x230
    btrfs_page_mkwrite+0x6a/0x520
    do_page_mkwrite+0x31/0xa0
    __handle_mm_fault+0x799/0xb00
    handle_mm_fault+0x7c/0xe0
    __do_page_fault+0x1d3/0x4a0
    async_page_fault+0x1e/0x30

    -> #0 (&mm->mmap_sem){++++}:
    __lock_acquire+0x42e/0x7a0
    lock_acquire+0xbd/0x220
    down_read+0x48/0xb0
    get_user_pages_unlocked+0x5a/0x1e0
    get_user_pages_fast+0xa4/0x150
    iov_iter_get_pages+0xc3/0x340
    do_direct_IO+0xf93/0x1d70
    __blockdev_direct_IO+0x32d/0x1c20
    btrfs_direct_IO+0x227/0x400
    generic_file_direct_write+0xcf/0x180
    btrfs_file_write_iter+0x308/0x58c
    aio_write+0xf8/0x1d0
    io_submit_one+0x3a9/0x620
    __ia32_compat_sys_io_submit+0xb2/0x270
    do_int80_syscall_32+0x5b/0x1a0
    entry_INT80_compat+0x88/0xa0

    other info that might help us debug this:

    Chain exists of:
    &mm->mmap_sem --> &ei->log_mutex --> &ei->dio_sem

    Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&ei->dio_sem);
                                   lock(&ei->log_mutex);
                                   lock(&ei->dio_sem);
      lock(&mm->mmap_sem);

    *** DEADLOCK ***

    1 lock held by aio-dio-invalid/30928:
    #0: 00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400

    stack backtrace:
    CPU: 0 PID: 30928 Comm: aio-dio-invalid Not tainted 4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
    Call Trace:
    dump_stack+0x7c/0xbb
    print_circular_bug.isra.37+0x297/0x2a4
    check_prev_add.constprop.45+0x781/0x7a0
    ? __lock_acquire+0x42e/0x7a0
    validate_chain.isra.41+0x7f0/0xb00
    __lock_acquire+0x42e/0x7a0
    lock_acquire+0xbd/0x220
    ? get_user_pages_unlocked+0x5a/0x1e0
    down_read+0x48/0xb0
    ? get_user_pages_unlocked+0x5a/0x1e0
    get_user_pages_unlocked+0x5a/0x1e0
    get_user_pages_fast+0xa4/0x150
    iov_iter_get_pages+0xc3/0x340
    do_direct_IO+0xf93/0x1d70
    ? __alloc_workqueue_key+0x358/0x490
    ? __blockdev_direct_IO+0x14b/0x1c20
    __blockdev_direct_IO+0x32d/0x1c20
    ? btrfs_run_delalloc_work+0x40/0x40
    ? can_nocow_extent+0x490/0x490
    ? kvm_clock_read+0x1f/0x30
    ? can_nocow_extent+0x490/0x490
    ? btrfs_run_delalloc_work+0x40/0x40
    btrfs_direct_IO+0x227/0x400
    ? btrfs_run_delalloc_work+0x40/0x40
    generic_file_direct_write+0xcf/0x180
    btrfs_file_write_iter+0x308/0x58c
    aio_write+0xf8/0x1d0
    ? kvm_clock_read+0x1f/0x30
    ? __might_fault+0x3e/0x90
    io_submit_one+0x3a9/0x620
    ? io_submit_one+0xe5/0x620
    __ia32_compat_sys_io_submit+0xb2/0x270
    do_int80_syscall_32+0x5b/0x1a0
    entry_INT80_compat+0x88/0xa0

    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Filipe Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 30928e9baac238a7330085a1c5747f0b5df444b4 upstream.

    Running delayed iputs from within a transaction commit could result in a
    really bad case where we do something like

    evict
    evict_refill_and_join
    btrfs_commit_transaction
    btrfs_run_delayed_iputs
    evict
    evict_refill_and_join
    btrfs_commit_transaction
    ... forever

    We have plenty of other places where we run delayed iputs that are much
    safer, let those do the work.

    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Filipe Manana
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 49940bdd57779c78462da7aa5a8650b2fea8c2ff upstream.

    When we insert the file extent once the ordered extent completes we free
    the reserved extent reservation as it'll have been migrated to the
    bytes_used counter. However if we error out after this step we'll still
    clear the reserved extent reservation, resulting in a negative
    accounting of the reserved bytes for the block group and space info.
    Fix this by only doing the free if we didn't successfully insert a file
    extent for this extent.
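    A sketch of the accounting rule with hypothetical counters for a single
    block group: a successful insert migrates the reservation to "used", and
    the error path releases the reservation only if that migration never
    happened, so it is never dropped twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct bg_counters {
    int64_t reserved;
    int64_t used;
};

static void finish_ordered_extent(struct bg_counters *bg, int64_t len,
                                  bool inserted, bool failed_later)
{
    if (inserted) {
        bg->reserved -= len;   /* reservation migrates to 'used' */
        bg->used += len;
    } else if (failed_later) {
        bg->reserved -= len;   /* free only what was never migrated */
    }
}
```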

    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Omar Sandoval
    Reviewed-by: Filipe Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit fb5c39d7a887108087de6ff93d3f326b01b4ef41 upstream.

    max_extent_size is supposed to be the largest contiguous range for the
    space info, and ctl->free_space is the total free space in the block
    group. We need to keep track of these separately and _only_ use the
    max_free_space if we don't have a max_extent_size, as that means our
    original request was too large to search any of the block groups for and
    therefore wouldn't have a max_extent_size set.
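    A sketch of the selection rule, assuming a hypothetical result struct that
    tracks the two quantities separately: the largest contiguous range is
    preferred, and the total free space is only a fallback when no
    max_extent_size was recorded.

```c
#include <assert.h>
#include <stdint.h>

struct search_result {
    uint64_t max_extent_size;   /* largest contiguous free range seen */
    uint64_t max_free_space;    /* largest total free space of a group */
};

static uint64_t reported_max(const struct search_result *r)
{
    if (r->max_extent_size)
        return r->max_extent_size;
    return r->max_free_space;   /* only when no group could be searched */
}
```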

    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Filipe Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit ad22cf6ea47fa20fbe11ac324a0a15c0a9a4a2a9 upstream.

    We can't use entry->bytes if our entry is a bitmap entry, we need to use
    entry->max_extent_size in that case. Fix up all the logic to make this
    consistent.
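    A sketch of the distinction, assuming a hypothetical, trimmed-down free
    space entry: for a plain extent entry the usable contiguous size is its
    byte count, while for a bitmap entry the byte count is a total and the
    largest contiguous run has to come from max_extent_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct free_space_entry {
    uint64_t bytes;            /* total bytes described by the entry */
    uint64_t max_extent_size;  /* largest contiguous run (bitmaps only) */
    bool bitmap;
};

static uint64_t entry_max_extent(const struct free_space_entry *e)
{
    return e->bitmap ? e->max_extent_size : e->bytes;
}
```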

    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 7ed586d0a8241e81d58c656c5b315f781fa6fc97 upstream.

    When using the NO_HOLES feature and logging a regular file, we were
    expecting that if we find an inline extent, either its size in RAM
    (uncompressed and unencoded) matches the size of the file or, if it does
    not, it matches the sector size and represents compressed data.
    This assertion does not cover a case where the length of the inline extent
    is smaller than the sector size and also smaller than the file's size;
    such a case is possible through fallocate. Example:

    $ mkfs.btrfs -f -O no-holes /dev/sdb
    $ mount /dev/sdb /mnt

    $ xfs_io -f -c "pwrite -S 0xb60 0 21" /mnt/foobar
    $ xfs_io -c "falloc 40 40" /mnt/foobar
    $ xfs_io -c "fsync" /mnt/foobar

    In the above example we trigger the assertion because the inline extent's
    length is 21 bytes while the file size is 80 bytes. The fallocate() call
    merely updated the file's size and did not touch the existing inline
    extent, as expected.

    So fix this by adjusting the assertion so that an inline extent length
    smaller than the file size is valid if the file size is smaller than the
    filesystem's sector size.
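    The relaxed condition can be written as a predicate. A sketch with
    hypothetical parameter names, assuming "compressed" means the extent's
    compression type is not "none":

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool inline_extent_ok(uint64_t ram_bytes, uint64_t i_size,
                             uint64_t sectorsize, bool compressed)
{
    if (ram_bytes == i_size)                  /* covers the whole file */
        return true;
    if (compressed && ram_bytes == sectorsize)
        return true;                          /* compressed, one sector */
    /* new case: shorter than the file, file still below one sector */
    return ram_bytes < i_size && i_size < sectorsize;
}
```

    Assuming 4 KiB sectors, the example above (a 21-byte inline extent in an
    80-byte file) is accepted by the last clause.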

    A test case for fstests follows soon.

    Reported-by: Anatoly Trosinenko
    Fixes: a89ca6f24ffe ("Btrfs: fix fsync after truncate when no_holes feature is enabled")
    CC: stable@vger.kernel.org # 4.14+
    Link: https://lore.kernel.org/linux-btrfs/CAE5jQCfRSBC7n4pUTFJcmHh109=gwyT9mFkCOL+NKfzswmR=_Q@mail.gmail.com/
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit 3527a018c00e5dbada2f9d7ed5576437b6dd5cfb upstream.

    At inode.c:compress_file_range(), under the "free_pages_out" label, we can
    end up dereferencing the "pages" pointer when it has a NULL value. This
    case happens when "start" has a value of 0 and we fail to allocate memory
    for the "pages" pointer. When that happens we jump to the "cont" label and
    then enter the "if (start == 0)" branch where we immediately call the
    cow_file_range_inline() function. If that function returns 0 (success
    creating an inline extent) or an error (like -ENOMEM for example) we jump
    to the "free_pages_out" label and then access "pages[i]" leading to a NULL
    pointer dereference, since "nr_pages" has a value greater than zero at
    that point.

    Fix this by setting "nr_pages" to 0 when we fail to allocate memory for
    the "pages" pointer.
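    A userspace model of the cleanup path, assuming calloc()/free() stand in
    for the page-array allocation and release, and a hypothetical
    compress_range_model() function. Zeroing the page count on allocation
    failure turns the free loop into a no-op instead of a NULL dereference.

```c
#include <assert.h>
#include <stdlib.h>

static int compress_range_model(unsigned long nr_pages, int fail_alloc)
{
    void **pages = fail_alloc ? NULL : calloc(nr_pages, sizeof(*pages));

    if (!pages)
        nr_pages = 0;            /* the fix: nothing to free below */

    /* ... inline extent attempt would happen here ... */

    /* free_pages_out: */
    for (unsigned long i = 0; i < nr_pages; i++)
        free(pages[i]);          /* never reached when pages == NULL */
    free(pages);
    return (int)nr_pages;
}
```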

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201119
    Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads")
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Liu Bo
    Signed-off-by: Filipe Manana
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit 9c7b0c2e8dbfbcd80a71e2cbfe02704f26c185c6 upstream.

    [BUG]
    In the following case, rescan won't zero out the numbers of qgroup 1/0:

    $ mkfs.btrfs -fq $DEV
    $ mount $DEV /mnt

    $ btrfs quota enable /mnt
    $ btrfs qgroup create 1/0 /mnt
    $ btrfs sub create /mnt/sub
    $ btrfs qgroup assign 0/257 1/0 /mnt

    $ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
    $ btrfs sub snap /mnt/sub /mnt/snap
    $ btrfs quota rescan -w /mnt
    $ btrfs qgroup show -pcre /mnt
    qgroupid       rfer       excl max_rfer max_excl parent child
    --------       ----       ---- -------- -------- ------ -----
    0/5        16.00KiB   16.00KiB     none     none ---    ---
    0/257    1016.00KiB   16.00KiB     none     none 1/0    ---
    0/258    1016.00KiB   16.00KiB     none     none ---    ---
    1/0      1016.00KiB   16.00KiB     none     none ---    0/257

    So far so good, but:

    $ btrfs qgroup remove 0/257 1/0 /mnt
    WARNING: quotas may be inconsistent, rescan needed
    $ btrfs quota rescan -w /mnt
    $ btrfs qgroup show -pcre /mnt
    qgroupid       rfer       excl max_rfer max_excl parent child
    --------       ----       ---- -------- -------- ------ -----
    0/5        16.00KiB   16.00KiB     none     none ---    ---
    0/257    1016.00KiB   16.00KiB     none     none ---    ---
    0/258    1016.00KiB   16.00KiB     none     none ---    ---
    1/0      1016.00KiB   16.00KiB     none     none ---    ---
             ^^^^^^^^^^   ^^^^^^^^ not cleared

    [CAUSE]
    Before rescan we call qgroup_rescan_zero_tracking() to zero out all
    qgroups' accounting numbers.

    However we don't mark all qgroups dirty, but rely on rescan to do so.

    If we have any high level qgroup without children, it won't be marked
    dirty during rescan, since we cannot reach that qgroup.

    This will cause the QGROUP_INFO items of childless qgroups to never get
    updated in the quota tree, so their numbers will stay the same in the
    "btrfs qgroup show" output.

    [FIX]
    Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even if
    we have childless qgroups, their QGROUP_INFO items will still get
    updated during rescan.
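    A sketch of the fix, assuming a hypothetical flat array of qgroups in
    which the dirty flag means "rewrite this QGROUP_INFO item":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct qgroup_model {
    uint64_t rfer, excl;
    bool dirty;              /* item must be written back to the tree */
};

/* Zero all accounting and mark every qgroup dirty, so even a childless
 * higher-level qgroup that rescan never reaches gets its item rewritten. */
static void rescan_zero_tracking(struct qgroup_model *qg, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        qg[i].rfer = 0;
        qg[i].excl = 0;
        qg[i].dirty = true;
    }
}
```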

    Reported-by: Misono Tomohiro
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Qu Wenruo
    Reviewed-by: Misono Tomohiro
    Tested-by: Misono Tomohiro
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Qu Wenruo
     
  • commit 0f375eed92b5a407657532637ed9652611a682f5 upstream.

    In a scenario like the following:

    mkdir /mnt/A # inode 258
    mkdir /mnt/B # inode 259
    touch /mnt/B/bar # inode 260

    sync

    mv /mnt/B/bar /mnt/A/bar
    mv -T /mnt/A /mnt/B
    fsync /mnt/B/bar

    After replaying the log we end up with file bar having 2 hard links, both
    with the name 'bar' and one in the directory with inode number 258 and the
    other in the directory with inode number 259. Also, we end up with the
    directory inode 259 still existing and with the directory inode 258 still
    named as 'A', instead of 'B'. In this scenario, file 'bar' should only
    have one hard link, located at directory inode 258, the directory inode
    259 should not exist anymore and the name for directory inode 258 should
    be 'B'.

    This incorrect behaviour happens because when attempting to log the old
    parents of an inode, we skip any parents that no longer exist. Fix this
    by forcing a full commit if an old parent no longer exists.
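    A sketch of the new policy, assuming a hypothetical array that records
    whether each old parent still exists: instead of skipping a vanished
    parent, the logging attempt gives up and requests a full transaction
    commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum log_result { LOG_OK, LOG_FULL_COMMIT };

static enum log_result log_old_parents(const bool *parent_exists, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!parent_exists[i])
            return LOG_FULL_COMMIT;   /* previously: skipped silently */
        /* ... log old parent i ... */
    }
    return LOG_OK;
}
```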

    A test case for fstests follows soon.

    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit f2d72f42d5fa3bf33761d9e47201745f624fcff5 upstream.

    When replaying a log which contains a tmpfile (which necessarily has a
    link count of 0) we end up calling inc_nlink(), at
    fs/btrfs/tree-log.c:replay_one_buffer(), which produces a warning like
    the following:

    [195191.943673] WARNING: CPU: 0 PID: 6924 at fs/inode.c:342 inc_nlink+0x33/0x40
    [195191.943723] CPU: 0 PID: 6924 Comm: mount Not tainted 4.19.0-rc6-btrfs-next-38 #1
    [195191.943724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
    [195191.943726] RIP: 0010:inc_nlink+0x33/0x40
    [195191.943728] RSP: 0018:ffffb96e425e3870 EFLAGS: 00010246
    [195191.943730] RAX: 0000000000000000 RBX: ffff8c0d1e6af4f0 RCX: 0000000000000006
    [195191.943731] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8c0d1e6af4f0
    [195191.943731] RBP: 0000000000000097 R08: 0000000000000001 R09: 0000000000000000
    [195191.943732] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb96e425e3a60
    [195191.943733] R13: ffff8c0d10cff0c8 R14: ffff8c0d0d515348 R15: ffff8c0d78a1b3f8
    [195191.943735] FS: 00007f570ee24480(0000) GS:ffff8c0dfb200000(0000) knlGS:0000000000000000
    [195191.943736] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [195191.943737] CR2: 00005593286277c8 CR3: 00000000bb8f2006 CR4: 00000000003606f0
    [195191.943739] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [195191.943740] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [195191.943741] Call Trace:
    [195191.943778] replay_one_buffer+0x797/0x7d0 [btrfs]
    [195191.943802] walk_up_log_tree+0x1c1/0x250 [btrfs]
    [195191.943809] ? rcu_read_lock_sched_held+0x3f/0x70
    [195191.943825] walk_log_tree+0xae/0x1d0 [btrfs]
    [195191.943840] btrfs_recover_log_trees+0x1d7/0x4d0 [btrfs]
    [195191.943856] ? replay_dir_deletes+0x280/0x280 [btrfs]
    [195191.943870] open_ctree+0x1c3b/0x22a0 [btrfs]
    [195191.943887] btrfs_mount_root+0x6b4/0x800 [btrfs]
    [195191.943894] ? rcu_read_lock_sched_held+0x3f/0x70
    [195191.943899] ? pcpu_alloc+0x55b/0x7c0
    [195191.943906] ? mount_fs+0x3b/0x140
    [195191.943908] mount_fs+0x3b/0x140
    [195191.943912] ? __init_waitqueue_head+0x36/0x50
    [195191.943916] vfs_kern_mount+0x62/0x160
    [195191.943927] btrfs_mount+0x134/0x890 [btrfs]
    [195191.943936] ? rcu_read_lock_sched_held+0x3f/0x70
    [195191.943938] ? pcpu_alloc+0x55b/0x7c0
    [195191.943943] ? mount_fs+0x3b/0x140
    [195191.943952] ? btrfs_remount+0x570/0x570 [btrfs]
    [195191.943954] mount_fs+0x3b/0x140
    [195191.943956] ? __init_waitqueue_head+0x36/0x50
    [195191.943960] vfs_kern_mount+0x62/0x160
    [195191.943963] do_mount+0x1f9/0xd40
    [195191.943967] ? memdup_user+0x4b/0x70
    [195191.943971] ksys_mount+0x7e/0xd0
    [195191.943974] __x64_sys_mount+0x21/0x30
    [195191.943977] do_syscall_64+0x60/0x1b0
    [195191.943980] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [195191.943983] RIP: 0033:0x7f570e4e524a
    [195191.943986] RSP: 002b:00007ffd83589478 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
    [195191.943989] RAX: ffffffffffffffda RBX: 0000563f335b2060 RCX: 00007f570e4e524a
    [195191.943990] RDX: 0000563f335b2240 RSI: 0000563f335b2280 RDI: 0000563f335b2260
    [195191.943992] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
    [195191.943993] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000563f335b2260
    [195191.943994] R13: 0000563f335b2240 R14: 0000000000000000 R15: 00000000ffffffff
    [195191.944002] irq event stamp: 8688
    [195191.944010] hardirqs last enabled at (8687): [] console_unlock+0x503/0x640
    [195191.944012] hardirqs last disabled at (8688): [] trace_hardirqs_off_thunk+0x1a/0x1c
    [195191.944018] softirqs last enabled at (8638): [] __set_page_dirty_nobuffers+0x101/0x150
    [195191.944020] softirqs last disabled at (8634): [] wb_wakeup_delayed+0x2e/0x60
    [195191.944022] ---[ end trace 5d6e873a9a0b811a ]---

    This happens because the inode does not have the flag I_LINKABLE set,
    which is a runtime only flag, not meant to be persisted, set when the
    inode is created through open(2) if the flag O_EXCL is not passed to it.
    Except for the warning, there are no other consequences (like corruptions
    or metadata inconsistencies).

    Since it's pointless to replay a tmpfile, as it would be deleted in a
    later phase of the log replay procedure (it has a link count of 0), fix
    this by not logging tmpfiles and, if a tmpfile is found in a log (created
    by a kernel without this change), skipping the replay of the inode.
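    A minimal sketch of the skip rule, assuming a hypothetical, simplified
    logged-inode record (struct logged_inode, is_tmpfile() and
    replay_inodes() are illustrative names, not btrfs functions). The only
    property used is the one stated above: a tmpfile has a link count of 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a logged inode item. */
struct logged_inode {
    unsigned long ino;
    unsigned int nlink; /* 0 for a tmpfile (O_TMPFILE) that was never linked */
};

/* A tmpfile has a link count of 0, so there is nothing worth logging. */
static bool is_tmpfile(const struct logged_inode *inode)
{
    return inode->nlink == 0;
}

/*
 * Replay loop sketch: inodes with a link count of 0 (tmpfiles logged by an
 * older kernel) are skipped instead of being recreated and later deleted.
 * Returns the number of inodes that would actually be replayed.
 */
static size_t replay_inodes(const struct logged_inode *items, size_t nr)
{
    size_t replayed = 0;

    for (size_t i = 0; i < nr; i++) {
        if (is_tmpfile(&items[i]))
            continue;
        replayed++; /* stand-in for replaying items[i] */
    }
    return replayed;
}
```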

    A test case for fstests follows soon.

    Fixes: 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay")
    CC: stable@vger.kernel.org # 4.18+
    Reported-by: Martin Steigerwald
    Link: https://lore.kernel.org/linux-btrfs/3666619.NTnn27ZJZE@merkaba/
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit 545e3366db823dc3342ca9d7fea803f829c9062f upstream.

    Allocating new chunks modifies both the extent and chunk trees, which can
    trigger new chunk allocations. So instead of using list_for_each_safe,
    just loop with while (!list_empty()) to make sure we don't exit with
    other pending block groups still on our list.
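    The difference between the two loop shapes can be reproduced in
    userspace. This sketch uses a hypothetical minimal circular list modelled
    on the kernel's list_head (the lh_* helpers are not the kernel API), and
    assumes, for illustration only, that creating block group 1 triggers a
    chunk allocation that queues one more pending block group.

```c
/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void lh_init(struct list_head *h) { h->next = h->prev = h; }
static int lh_empty(const struct list_head *h) { return h->next == h; }

static void lh_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void lh_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    lh_init(e);
}

/* Hypothetical pending block group; 'list' must stay the first member so a
 * list pointer can be cast back to the containing struct. */
struct pending_bg {
    struct list_head list;
    int id;
};

static struct pending_bg pool[8];
static int next_id;

static void queue_new_bg(struct list_head *pending)
{
    pool[next_id].id = next_id;
    lh_add_tail(&pool[next_id].list, pending);
    next_id++;
}

/* Start with two pending block groups, ids 0 and 1. */
static void setup(struct list_head *pending)
{
    lh_init(pending);
    next_id = 0;
    queue_new_bg(pending);
    queue_new_bg(pending);
}

/* Stand-in for creating one block group; creating bg 1 is modelled as
 * allocating a chunk, which queues one more pending bg. */
static void create_bg(struct pending_bg *bg, struct list_head *pending)
{
    lh_del_init(&bg->list);
    if (bg->id == 1)
        queue_new_bg(pending);
}

/* Old shape: like list_for_each_safe, ->next is cached before the entry is
 * processed, so a bg queued while handling the last entry is never visited. */
static int process_safe(struct list_head *pending)
{
    int done = 0;
    struct list_head *pos, *n;

    for (pos = pending->next, n = pos->next; pos != pending;
         pos = n, n = pos->next) {
        create_bg((struct pending_bg *)pos, pending);
        done++;
    }
    return done;
}

/* Fixed shape: keep taking the first entry until the list is empty. */
static int process_drain(struct list_head *pending)
{
    int done = 0;

    while (!lh_empty(pending)) {
        create_bg((struct pending_bg *)pending->next, pending);
        done++;
    }
    return done;
}
```

    With the safe iterator, only the two original entries are processed and
    the newly queued one is left behind; the drain loop handles all three.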

    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Omar Sandoval
    Reviewed-by: Liu Bo
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 553cceb49681d60975d00892877d4c871bf220f9 upstream.

    We need to clear the max_extent_size when we clear bits from a bitmap
    since it could have been from the range that contains the
    max_extent_size.
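    A small sketch of why the cached value must be dropped, assuming a
    hypothetical 64-block bitmap (set bit = free block) with a cached
    largest-free-run hint where 0 means "unknown, recompute". The struct and
    helper names are illustrative, not the btrfs free-space-cache code.

```c
#include <stdint.h>

/* Hypothetical free-space bitmap: one bit per block (set = free), plus a
 * cached size of the largest free run; 0 means "unknown, recompute". */
struct bitmap_entry {
    uint64_t bits;
    unsigned int max_extent_size;
};

/* Recompute the largest run of set bits (free blocks). */
static unsigned int largest_free_run(uint64_t bits)
{
    unsigned int best = 0, cur = 0;

    for (unsigned int i = 0; i < 64; i++) {
        if (bits & (UINT64_C(1) << i)) {
            if (++cur > best)
                best = cur;
        } else {
            cur = 0;
        }
    }
    return best;
}

/* Mark [start, start + count) as allocated; assumes start + count <= 64. */
static void bitmap_clear_range(struct bitmap_entry *e, unsigned int start,
                               unsigned int count)
{
    for (unsigned int i = start; i < start + count; i++)
        e->bits &= ~(UINT64_C(1) << i);
    /* The cleared range may have been part of the largest free run, so the
     * cached value is no longer trustworthy: reset it. */
    e->max_extent_size = 0;
}
```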

    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Liu Bo
    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 84de76a2fb217dc1b6bc2965cc397d1648aa1404 upstream.

    If we're allocating a new space cache inode it's likely going to be
    under a transaction handle, so we need to use memalloc_nofs_save() in
    order to avoid deadlocks, and more importantly lockdep messages that
    make xfstests fail.

    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Omar Sandoval
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 3aa7c7a31c26321696b92841d5103461c6f3f517 upstream.

    While testing my backport I noticed there was a panic if I ran
    generic/416 generic/417 generic/418 all in a row. This just happened to
    uncover a race where we had outstanding IO after we destroyed all of our
    workqueues, and then we'd go to queue the endio work on those freed
    workqueues.

    This is because we aren't waiting for the caching threads to be done
    before freeing everything up, so to fix this make sure we wait on any
    outstanding caching that's being done before we free up the block group,
    so we're sure to be done with all IO by the time we get to
    btrfs_stop_all_workers(). This fixes the panic I was seeing
    consistently in testing.

    ------------[ cut here ]------------
    kernel BUG at fs/btrfs/volumes.c:6112!
    SMP PTI
    Modules linked in:
    CPU: 1 PID: 27165 Comm: kworker/u4:7 Not tainted 4.16.0-02155-g3553e54a578d-dirty #875
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
    Workqueue: btrfs-cache btrfs_cache_helper
    RIP: 0010:btrfs_map_bio+0x346/0x370
    RSP: 0000:ffffc900061e79d0 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: ffff880071542e00 RCX: 0000000000533000
    RDX: ffff88006bb74380 RSI: 0000000000000008 RDI: ffff880078160000
    RBP: 0000000000000001 R08: ffff8800781cd200 R09: 0000000000503000
    R10: ffff88006cd21200 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8800781cd200 R15: ffff880071542e00
    FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000817ffc4 CR3: 0000000078314000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    btree_submit_bio_hook+0x8a/0xd0
    submit_one_bio+0x5d/0x80
    read_extent_buffer_pages+0x18a/0x320
    btree_read_extent_buffer_pages+0xbc/0x200
    ? alloc_extent_buffer+0x359/0x3e0
    read_tree_block+0x3d/0x60
    read_block_for_search.isra.30+0x1a5/0x360
    btrfs_search_slot+0x41b/0xa10
    btrfs_next_old_leaf+0x212/0x470
    caching_thread+0x323/0x490
    normal_work_helper+0xc5/0x310
    process_one_work+0x141/0x340
    worker_thread+0x44/0x3c0
    kthread+0xf8/0x130
    ? process_one_work+0x340/0x340
    ? kthread_bind+0x10/0x10
    ret_from_fork+0x35/0x40
    RIP: btrfs_map_bio+0x346/0x370 RSP: ffffc900061e79d0
    ---[ end trace 827eb13e50846033 ]---
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: disabled
    ---[ end Kernel panic - not syncing: Fatal exception

    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Josef Bacik
    Reviewed-by: Omar Sandoval
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 0be88e367fd8fbdb45257615d691f4675dda062f upstream.

    We check whether any device the file system is using supports discard in
    the ioctl call, but then we attempt to trim free extents on every device
    regardless of whether discard is supported. Due to the way we mask off
    EOPNOTSUPP, we can end up issuing the trim operations on each free range
    on devices that don't support it, just wasting time.
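    The effect of the fix on the amount of work can be sketched as follows,
    assuming a hypothetical device descriptor with a supports_discard flag
    and a count of free ranges; neither struct trim_device nor
    count_discards() is part of btrfs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device descriptor; supports_discard mirrors the device
 * queue's discard capability. */
struct trim_device {
    bool supports_discard;
    unsigned int free_ranges; /* free extents that would be trimmed */
};

/*
 * Return how many discard requests would be issued. Devices without
 * discard support are skipped up front instead of receiving one request
 * per free range that fails with EOPNOTSUPP and is then masked off.
 */
static unsigned int count_discards(const struct trim_device *devs, size_t nr)
{
    unsigned int issued = 0;

    for (size_t i = 0; i < nr; i++) {
        if (!devs[i].supports_discard)
            continue;
        issued += devs[i].free_ranges;
    }
    return issued;
}
```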

    Fixes: 499f377f49f08 ("btrfs: iterate over unused chunk space in FITRIM")
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Jeff Mahoney
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Jeff Mahoney
     
  • commit d4e329de5e5e21594df2e0dd59da9acee71f133b upstream.

    btrfs_trim_fs iterates over the fs_devices->alloc_list while holding the
    device_list_mutex. The problem is that ->alloc_list is protected by the
    chunk mutex. We don't want to hold the chunk mutex over the trim of the
    entire file system. Fortunately, the ->dev_list list is protected by
    the dev_list mutex and while it will give us all devices, including
    read-only devices, we already just skip the read-only devices. Then we
    can continue to take and release the chunk mutex while scanning each
    device.

    Fixes: 499f377f49f ("btrfs: iterate over unused chunk space in FITRIM")
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Jeff Mahoney
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Jeff Mahoney
     
  • commit 6ba9fc8e628becf0e3ec94083450d089b0dec5f5 upstream.

    [BUG]
    fstrim on some btrfs only trims the unallocated space, not trimming any
    space in existing block groups.

    [CAUSE]
    Before fstrim_range is passed to btrfs_trim_fs(), it gets truncated to
    range [0, super->total_bytes). So later btrfs_trim_fs() will only be
    able to trim block groups in range [0, super->total_bytes).

    However, btrfs uses its own logical address space, in which any bytenr
    aligned to sectorsize is valid, so nothing limits where block groups
    can be placed.

    On a filesystem that is balanced frequently, it's quite easy for all
    block groups to be relocated so that their bytenrs start beyond
    super->total_bytes.

    In that case, btrfs will not trim existing block groups.

    [FIX]
    Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs()
    can get the unmodified range, which is normally set to [0, U64_MAX].
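    The arithmetic behind the bug can be checked with a short sketch,
    assuming illustrative numbers: total_bytes of 10 GiB and a block group
    that balance has moved to logical address 12 GiB. The helper names are
    hypothetical; only the clamping rule comes from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does block group [bg_start, bg_start + bg_len) overlap [start, end)? */
static bool bg_in_range(uint64_t start, uint64_t end,
                        uint64_t bg_start, uint64_t bg_len)
{
    return bg_start < end && start < bg_start + bg_len;
}

/* Old behaviour, sketched: clamp the requested end to super->total_bytes. */
static uint64_t clamp_to_total_bytes(uint64_t end, uint64_t total_bytes)
{
    return end < total_bytes ? end : total_bytes;
}
```

    With the clamp, the relocated block group falls outside [0, 10 GiB) and
    is never trimmed; with the unmodified [0, U64_MAX] range it is.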

    Reported-by: Chris Murphy
    Fixes: f4c697e6406d ("btrfs: return EINVAL if start > total_bytes in fitrim ioctl")
    CC: # v4.4+
    Signed-off-by: Qu Wenruo
    Reviewed-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Qu Wenruo
     
  • commit 93bba24d4b5ad1e5cd8b43f64e66ff9d6355dd20 upstream.

    Function btrfs_trim_fs() doesn't handle errors in a consistent way. If
    an error happens when trimming existing block groups, it will skip the
    remaining block groups and continue to trim unallocated space for each
    device.

    The return value will only reflect the final error from device trimming.

    This patch will fix such behavior by:

    1) Recording the last error from block group or device trimming
    The return value will also reflect the last error during trimming,
    making developers more aware of the problem.

    2) Continuing trimming if possible
    If we fail to trim one block group or device, we can still try
    the next block group or device.

    3) Reporting the number of failures during block group and device trimming
    This is less noisy, but still gives the user a brief summary of
    what's going wrong.

    Such behavior can avoid confusion for cases like failure to trim the
    first block group and then only unallocated space is trimmed.
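    The three rules above can be sketched as one loop per phase. This is a
    simplified model, not btrfs_trim_fs() itself: the per-item results are
    passed in as arrays of 0 or negative errno values, and struct
    trim_summary is hypothetical, modelled on the bg_ret/dev_ret naming
    mentioned in the sign-off notes. Preferring the block group error over
    the device error in the return value is an assumption of this sketch.

```c
#include <stdio.h>

struct trim_summary {
    unsigned int bg_failed, dev_failed; /* number of failures */
    int bg_ret, dev_ret;                /* last error seen (negative errno) */
    unsigned int trimmed;               /* items trimmed successfully */
};

/* bg_rets[i] / dev_rets[i] stand in for the result of trimming block
 * group i / device i (0 or a negative errno). */
static int trim_all(const int *bg_rets, int nr_bgs,
                    const int *dev_rets, int nr_devs,
                    struct trim_summary *s)
{
    *s = (struct trim_summary){ 0 };

    for (int i = 0; i < nr_bgs; i++) {
        if (bg_rets[i]) {          /* record the error and keep going */
            s->bg_failed++;
            s->bg_ret = bg_rets[i];
            continue;
        }
        s->trimmed++;
    }
    for (int i = 0; i < nr_devs; i++) {
        if (dev_rets[i]) {
            s->dev_failed++;
            s->dev_ret = dev_rets[i];
            continue;
        }
        s->trimmed++;
    }

    /* One summary line per phase instead of one message per failure. */
    if (s->bg_failed)
        fprintf(stderr, "failed to trim %u block group(s), last error %d\n",
                s->bg_failed, s->bg_ret);
    if (s->dev_failed)
        fprintf(stderr, "failed to trim %u device(s), last error %d\n",
                s->dev_failed, s->dev_ret);

    return s->bg_ret ? s->bg_ret : s->dev_ret;
}
```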

    Reported-by: Chris Murphy
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    [ add bg_ret and dev_ret to the messages ]
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Qu Wenruo
     
  • commit 374b0e2d6ba5da7fd1cadb3247731ff27d011f6f upstream.

    When we hit an I/O error in free_log_tree->walk_log_tree during file system
    shutdown we can crash due to there not being a valid transaction handle.

    Use btrfs_handle_fs_error when there's no transaction handle to use.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
    IP: free_log_tree+0xd2/0x140 [btrfs]
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
    Modules linked in:
    CPU: 2 PID: 23544 Comm: umount Tainted: G W 4.12.14-kvmsmall #9 SLE15 (unreleased)
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
    task: ffff96bfd3478880 task.stack: ffffa7cf40d78000
    RIP: 0010:free_log_tree+0xd2/0x140 [btrfs]
    RSP: 0018:ffffa7cf40d7bd10 EFLAGS: 00010282
    RAX: 00000000fffffffb RBX: 00000000fffffffb RCX: 0000000000000002
    RDX: 0000000000000000 RSI: ffff96c02f07d4c8 RDI: 0000000000000282
    RBP: ffff96c013cf1000 R08: ffff96c02f07d4c8 R09: ffff96c02f07d4d0
    R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
    R13: ffff96c005e800c0 R14: ffffa7cf40d7bdb8 R15: 0000000000000000
    FS: 00007f17856bcfc0(0000) GS:ffff96c03f600000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000060 CR3: 0000000045ed6002 CR4: 00000000003606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    ? wait_for_writer+0xb0/0xb0 [btrfs]
    btrfs_free_log+0x17/0x30 [btrfs]
    btrfs_drop_and_free_fs_root+0x9a/0xe0 [btrfs]
    btrfs_free_fs_roots+0xc0/0x130 [btrfs]
    ? wait_for_completion+0xf2/0x100
    close_ctree+0xea/0x2e0 [btrfs]
    ? kthread_stop+0x161/0x260
    generic_shutdown_super+0x6c/0x120
    kill_anon_super+0xe/0x20
    btrfs_kill_super+0x13/0x100 [btrfs]
    deactivate_locked_super+0x3f/0x70
    cleanup_mnt+0x3b/0x70
    task_work_run+0x78/0x90
    exit_to_usermode_loop+0x77/0xa6
    do_syscall_64+0x1c5/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    RIP: 0033:0x7f1784f90827
    RSP: 002b:00007ffdeeb03118 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
    RAX: 0000000000000000 RBX: 0000556a60c62970 RCX: 00007f1784f90827
    RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000556a60c62b50
    RBP: 0000000000000000 R08: 0000000000000005 R09: 00000000ffffffff
    R10: 0000556a60c63900 R11: 0000000000000246 R12: 0000556a60c62b50
    R13: 00007f17854a81c4 R14: 0000000000000000 R15: 0000000000000000
    RIP: free_log_tree+0xd2/0x140 [btrfs] RSP: ffffa7cf40d7bd10
    CR2: 0000000000000060

    Fixes: 681ae50917df9 ("Btrfs: cleanup reserved space when freeing tree log on error")
    CC: # v3.13
    Signed-off-by: Jeff Mahoney
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Jeff Mahoney
     
  • commit b72c3aba09a53fc7c1824250d71180ca154517a7 upstream.

    [BUG]
    For certain crafted images whose csum root leaf has a missing backref,
    triggering a write with data csum can cause a deadlock with the
    following kernel WARN_ON():

    WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
    CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
    Workqueue: btrfs-endio-write btrfs_endio_write_helper
    RIP: 0010:btrfs_tree_lock+0x3e2/0x400
    Call Trace:
    btrfs_alloc_tree_block+0x39f/0x770
    __btrfs_cow_block+0x285/0x9e0
    btrfs_cow_block+0x191/0x2e0
    btrfs_search_slot+0x492/0x1160
    btrfs_lookup_csum+0xec/0x280
    btrfs_csum_file_blocks+0x2be/0xa60
    add_pending_csums+0xaf/0xf0
    btrfs_finish_ordered_io+0x74b/0xc90
    finish_ordered_fn+0x15/0x20
    normal_work_helper+0xf6/0x500
    btrfs_endio_write_helper+0x12/0x20
    process_one_work+0x302/0x770
    worker_thread+0x81/0x6d0
    kthread+0x180/0x1d0
    ret_from_fork+0x35/0x40

    [CAUSE]
    That crafted image has a missing backref for the csum tree root leaf.
    When we try to allocate a new tree block, since there is no
    EXTENT/METADATA_ITEM for the csum tree root, btrfs considers it a free
    slot and uses it.

    The extent tree of the image looks like:

    Normal image                      | This fuzzed image
    ----------------------------------+--------------------------------
    BG 29360128                       | BG 29360128
    One empty slot                    | One empty slot
    29364224: backref to UUID tree    | 29364224: backref to UUID tree
    Two empty slots                   | Two empty slots
    29376512: backref to CSUM tree    | One empty slot (bad type) <<<
    29380608: backref to D_RELOC tree | 29380608: backref to D_RELOC tree
    ...                               | ...

    Since bytenr 29376512 has no METADATA/EXTENT_ITEM, btrfs treats it as a
    valid slot when it tries to allocate a tree block.

    And for finish_ordered_write, when we need to insert csum, we try to CoW
    csum tree root.

    By accident, the empty slots at bytenr BG_OFFSET, BG_OFFSET + 8K and
    BG_OFFSET + 12K are already used by tree block CoW for other trees, so
    the next empty slot is BG_OFFSET + 16K, which should hold the backref
    for the CSUM tree.

    But due to the bad type, btrfs does not recognize it and still considers
    it an empty slot, so it will try to use it for the csum tree CoW.

    Then in the following call trace, we will try to lock the new tree
    block, which turns out to be the old csum tree root, which is already
    locked:

    btrfs_search_slot() called on csum tree root, which is at 29376512
    |- btrfs_cow_block()
       |- btrfs_set_lock_block()
       |  |- Now locks tree block 29376512 (old csum tree root)
       |- __btrfs_cow_block()
          |- btrfs_alloc_tree_block()
             |- btrfs_reserve_extent()
             |  Now it returns tree block 29376512, which the extent tree
             |  shows as an empty slot, but it's already held by the csum tree
             |- btrfs_init_new_buffer()
                |- btrfs_tree_lock()
                   | Triggers WARN_ON(eb->lock_owner == current->pid)
                   |- wait_event()
                      Waits for the lock owner to release the lock, but it's
                      locked by ourselves, so it will deadlock

    [FIX]
    This patch adds the lock_owner vs current->pid check in
    btrfs_init_new_buffer(), so the above deadlock can be avoided.
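    A userspace sketch of the check, assuming a hypothetical extent buffer
    whose lock_owner holds the owning pid (0 when unlocked). Returning
    -EUCLEAN for the self-owned case is an assumption of this sketch, as is
    returning -EAGAIN instead of sleeping when another task holds the lock.

```c
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* asm-generic value of "Structure needs cleaning" */
#endif

/* Hypothetical extent buffer: lock_owner is the pid holding its lock,
 * 0 if unlocked. */
struct sketch_eb {
    unsigned long long start;
    int lock_owner;
};

/*
 * Before locking a freshly reserved tree block, refuse it if the current
 * task already owns its lock: waiting would mean waiting on ourselves.
 * A lock held by another task would normally be waited for; this sketch
 * returns -EAGAIN in that case so that it never blocks.
 */
static int init_new_buffer(struct sketch_eb *eb, int current_pid)
{
    if (eb->lock_owner == current_pid)
        return -EUCLEAN; /* extent tree corruption detected */
    if (eb->lock_owner != 0)
        return -EAGAIN;
    eb->lock_owner = current_pid;
    return 0;
}
```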

    Since such a problem can only happen with a crafted image, we will still
    trigger a kernel warning for the later aborted transaction, but with a
    slightly more meaningful warning message.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
    Reported-by: Xu Wen
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Qu Wenruo
     
  • commit 65c6e82becec33731f48786e5a30f98662c86b16 upstream.

    [BUG]
    When mounting a certain crafted image, btrfs will trigger a kernel
    BUG_ON() when trying to recover balance:

    kernel BUG at fs/btrfs/extent-tree.c:8956!
    invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 1 PID: 662 Comm: mount Not tainted 4.18.0-rc1-custom+ #10
    RIP: 0010:walk_up_proc+0x336/0x480 [btrfs]
    RSP: 0018:ffffb53540c9b890 EFLAGS: 00010202
    Call Trace:
    walk_up_tree+0x172/0x1f0 [btrfs]
    btrfs_drop_snapshot+0x3a4/0x830 [btrfs]
    merge_reloc_roots+0xe1/0x1d0 [btrfs]
    btrfs_recover_relocation+0x3ea/0x420 [btrfs]
    open_ctree+0x1af3/0x1dd0 [btrfs]
    btrfs_mount_root+0x66b/0x740 [btrfs]
    mount_fs+0x3b/0x16a
    vfs_kern_mount.part.9+0x54/0x140
    btrfs_mount+0x16d/0x890 [btrfs]
    mount_fs+0x3b/0x16a
    vfs_kern_mount.part.9+0x54/0x140
    do_mount+0x1fd/0xda0
    ksys_mount+0xba/0xd0
    __x64_sys_mount+0x21/0x30
    do_syscall_64+0x60/0x210
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    [CAUSE]
    Extent tree corruption. In this particular case, reloc tree root's
    owner is DATA_RELOC_TREE (should be TREE_RELOC), thus its backref is
    corrupted and we failed the owner check in walk_up_tree().

    [FIX]
    It's pretty hard to take care of every extent tree corruption, but at
    least we can remove such BUG_ON() and exit more gracefully.

    And since in this particular image DATA_RELOC_TREE and TREE_RELOC share
    the same root (which is obviously invalid), we need to make
    __del_reloc_root() more robust and detect such invalid sharing, to avoid
    a possible NULL dereference, as root->node can be NULL in this case.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=200411
    Reported-by: Xu Wen
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Qu Wenruo
     
  • commit 3628b4ca64f24a4ec55055597d0cb1c814729f8b upstream.

    Some qgroup trace events like btrfs_qgroup_release_data() and
    btrfs_qgroup_free_delayed_ref() can still be triggered even if qgroup is
    not enabled.

    This is caused by the lack of a qgroup status check before calling some
    qgroup functions. Thankfully, the functions handle the quota-disabled
    case well and just do nothing when qgroups are disabled.

    This patch adds an earlier check before triggering the related trace
    events.

    As for races between enabling and disabling qgroups:

    1) For enabled->disabled case
    Disable will wipe out all qgroups data including reservation and
    excl/rfer. Even if we leak some reservation or numbers, it will
    still be cleared, so nothing will go wrong.

    2) For disabled -> enabled case
    Current btrfs_qgroup_release_data() will use extent_io tree to ensure
    we won't underflow reservation. And for delayed_ref we use
    head->qgroup_reserved to record the reserved space, so in that case
    head->qgroup_reserved should be 0 and we won't underflow.

    CC: stable@vger.kernel.org # 4.14+
    Reported-by: Chris Murphy
    Link: https://lore.kernel.org/linux-btrfs/CAJCQCtQau7DtuUUeycCkZ36qjbKuxNzsgqJ7+sJ6W0dK_NLE3w@mail.gmail.com/
    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Qu Wenruo
     
  • commit 48dc0ef19044bfb69193302fbe3a834e3331b7ae upstream.

    Test ptrace-tm-spd-gpr fails on the current kernel (4.19) due to a
    segmentation fault that happens in the child process prior to setting
    cptr[2] = 1. This causes the parent process to wait forever at
    'while (!pptr[2])' and the test to be killed by the test harness on
    timeout, and thus to fail.

    The segmentation fault happens because the inline assembly is
    generated as:

    0x10000355c lfs f0, 0(0)

    This is reading memory position 0x0 and causing the segmentation fault.

    This code is being generated by ASM_LOAD_FPR_SINGLE_PRECISION(flt_4), where
    flt_4 is passed to the inline assembly block as:

    [flt_4] "r" (&d)

    Since the inline assembly 'r' constraint means any GPR, gpr0 was
    chosen, which causes this issue when issuing a Load Floating-Point
    Single instruction: when r0 is used as the base register of such a
    load, it is interpreted as the literal value 0 rather than the contents
    of r0.

    This patch simply changes the constraint to 'b', which specifies a
    register to be used as a base, for which r0 is not allowed, avoiding
    this issue.

    In addition, the flt_2 register is removed from the input operands,
    since it is not used by the inline assembly code at all.

    Cc: stable@vger.kernel.org
    Signed-off-by: Breno Leitao
    Acked-by: Segher Boessenkool
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Breno Leitao
     
  • commit 1dc6bd5e39a29453bdcc17348dd2a89f1aa4004e upstream.

    Fix child-node lookup during probe, which ended up searching the whole
    device tree depth-first starting at the parent rather than just matching
    on its children.

    To make things worse, the parent pmc node could end up being prematurely
    freed as of_find_node_by_name() drops a reference to its first argument.

    Fixes: 3568df3d31d6 ("soc: tegra: Add thermal reset (thermtrip) support to PMC")
    Cc: stable # 4.0
    Cc: Mikko Perttunen
    Signed-off-by: Johan Hovold
    Reviewed-by: Mikko Perttunen
    Signed-off-by: Thierry Reding
    Signed-off-by: Greg Kroah-Hartman

    Johan Hovold
     
  • commit 74121b9aa3cd571ddfff014a9f47db36cae3cda9 upstream.

    Correct the register size of the System Manager node.

    Cc: stable@vger.kernel.org
    Fixes: 78cd6a9d8e154 ("arm64: dts: Add base stratix 10 dtsi")
    Signed-off-by: Thor Thayer
    Signed-off-by: Dinh Nguyen
    Signed-off-by: Greg Kroah-Hartman

    Thor Thayer
     
  • commit ce3bf934f919a7d675c5b7fa4cc233ded9c6256e upstream.

    The address in the SDRAM node was incorrect. Fix this to agree with the
    correct address and to match the reg definition block.

    Cc: stable@vger.kernel.org
    Fixes: 54b4a8f57848b ("arm: socfpga: dts: Add Arria10 SDRAM EDAC DTS support")
    Signed-off-by: Thor Thayer
    Signed-off-by: Dinh Nguyen
    Signed-off-by: Greg Kroah-Hartman

    Thor Thayer
     
  • commit 672ca9dd13f1aca0c17516f76fc5b0e8344b3e46 upstream.

    It is possible for corrupted filesystem images to produce very large
    block offsets that may wrap when a length is added, and wrongly pass
    the buffer size test.
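    The wrap-around can be shown with 32-bit arithmetic. This is a generic
    sketch of an overflow-safe bounds check, not the exact code of the fix;
    in_bounds_naive() and in_bounds() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive check: offset + len can wrap for a huge offset taken from a
 * corrupted image, and the wrapped sum then looks small enough. */
static bool in_bounds_naive(uint32_t offset, uint32_t len, uint32_t size)
{
    return offset + len <= size;
}

/* Wrap-safe check: never computes a sum that can overflow. */
static bool in_bounds(uint32_t offset, uint32_t len, uint32_t size)
{
    return offset <= size && len <= size - offset;
}
```

    For example, offset 0xFFFFFF00 plus length 0x200 wraps to 0x100, which
    passes the naive test against a 4096-byte buffer but fails the safe one.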

    Reported-by: Anatoly Trosinenko
    Signed-off-by: Nicolas Pitre
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Nicolas Pitre
     
  • commit 940c620d6af8fca7d115de40f19870fba415efac upstream.

    Currently a failed allocation of channel->name leads to an
    immediate return without freeing channel. Fix this by setting
    ret to -ENOMEM and jumping to an exit path that kfree's channel.
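    The goto-based cleanup pattern described above, sketched in userspace C
    with malloc()/free() in place of kzalloc()/kfree(). struct
    sketch_channel, create_channel() and the fail_name_alloc test hook are
    hypothetical; they only mirror the shape of the fix.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified channel. */
struct sketch_channel {
    char *name;
};

/* Hypothetical test hook: force the name allocation to fail. */
static int fail_name_alloc;

static int create_channel(const char *name, struct sketch_channel **out)
{
    struct sketch_channel *channel;
    size_t len = strlen(name) + 1;
    int ret;

    channel = calloc(1, sizeof(*channel));
    if (!channel)
        return -ENOMEM;

    channel->name = fail_name_alloc ? NULL : malloc(len);
    if (!channel->name) {
        /* Returning directly here would leak 'channel'. */
        ret = -ENOMEM;
        goto free_channel;
    }
    memcpy(channel->name, name, len);

    *out = channel;
    return 0;

free_channel:
    free(channel);
    return ret;
}
```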

    Detected by CoverityScan, CID#1473692 ("Resource Leak")

    Fixes: 53e2822e56c7 ("rpmsg: Introduce Qualcomm SMD backend")
    Cc: stable@vger.kernel.org
    Signed-off-by: Colin Ian King
    Signed-off-by: Bjorn Andersson
    Signed-off-by: Greg Kroah-Hartman

    Colin Ian King
     
  • commit 2a6c7c367de82951c98a290a21156770f6f82c84 upstream.

    x0 is not callee-saved in the PCS. So there is no need to specify
    -fcall-used-x0.

    Clang doesn't currently support -fcall-used flags. This patch will help
    building the kernel with clang.

    Tested-by: Nick Desaulniers
    Acked-by: Will Deacon
    Signed-off-by: Tri Vo
    Signed-off-by: Catalin Marinas
    Signed-off-by: Greg Kroah-Hartman

    Tri Vo
     
  • commit a58c37978cf02f6d35d05ee4e9288cb8455f1401 upstream.

    Drop all Adobe references and use the official opRGB standard
    instead.

    Signed-off-by: Hans Verkuil
    Cc: stable@vger.kernel.org
    Acked-by: Daniel Vetter
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Greg Kroah-Hartman

    Hans Verkuil
     
  • commit afeaade90db4c5dab93f326d9582be1d5954a198 upstream.

    The v4l2-compliance tool complains if a video doesn't start
    with a zero sequence number.

    While this shouldn't cause any real problem for apps, let's
    make it happier, in order to better check the v4l2-compliance
    differences before and after patchsets.

    This is actually an old issue. It has been there since at least the
    videobuf2 conversion, i.e. changeset d3829fadc461 ("[media] em28xx:
    convert to videobuf2"), assuming VB1 didn't suffer from the same issue.

    Cc: stable@vger.kernel.org
    Fixes: d3829fadc461 ("[media] em28xx: convert to videobuf2")
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Greg Kroah-Hartman

    Mauro Carvalho Chehab
     
  • commit 15644bfa195bd166d0a5ed76ae2d587f719c3dac upstream.

    Instead of using a register value, use an AMUX name, as otherwise
    VIDIOC_G_AUDIO would fail.

    Cc: stable@vger.kernel.org
    Fixes: 766ed64de554 ("V4L/DVB (11827): Add support for Terratec Grabster AV350")
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Greg Kroah-Hartman

    Mauro Carvalho Chehab