05 Mar, 2020

11 commits

  • commit 953aa9d136f53e226448dbd801a905c28f8071bf upstream.

    Don't allow passing arbitrary flags as they change behavior including
    memory allocation that the call stack is not prepared for.

    Fixes: ddbca70cc45c ("xfs: allocate xattr buffer on demand")
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Christoph Hellwig
     
  • commit 155fc6ba488a8bdfd1d3be3d7ba98c9cec2b2429 upstream.

    On alpha and s390x:

    fs/ubifs/debug.h:158:11: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘ino_t {aka unsigned int}’ [-Wformat=]
    ...
    fs/ubifs/orphan.c:132:3: note: in expansion of macro ‘dbg_gen’
    dbg_gen("deleted twice ino %lu", orph->inum);
    ...
    fs/ubifs/orphan.c:140:3: note: in expansion of macro ‘dbg_gen’
    dbg_gen("delete later ino %lu", orph->inum);

    __kernel_ino_t is "unsigned long" on most architectures, but not on
    alpha and s390x, where it is "unsigned int". Hence when printing an
    ino_t, it should always be cast to "unsigned long" first.

    Fix this by re-adding the recently removed casts.
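
    A minimal userspace sketch of the cast described above. The type
    demo_ino_t and the function demo_format_ino() are hypothetical
    stand-ins for illustration, not the ubifs code; the point is only that
    the argument is cast to "unsigned long" so that it matches "%lu"
    regardless of the underlying width.

```c
#include <stdio.h>

/* Hypothetical stand-in for an architecture where ino_t is "unsigned int". */
typedef unsigned int demo_ino_t;

/*
 * Format an inode number with "%lu". The explicit cast makes the argument
 * match the format specifier whatever demo_ino_t really is.
 * Returns the number of characters snprintf() would have written.
 */
int demo_format_ino(char *buf, size_t len, demo_ino_t inum)
{
    return snprintf(buf, len, "ino %lu", (unsigned long)inum);
}
```

    Without the cast, a compiler with -Wformat enabled would be expected to
    warn wherever demo_ino_t is narrower than "unsigned long".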

    Fixes: 8009ce956c3d2802 ("ubifs: Don't leak orphans on memory during commit")
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Geert Uytterhoeven
     
  • commit 3e5e479a39ce9ed60cd63f7565cc1d9da77c2a4e upstream.

    As Youling reported on the mailing list:

    https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/

    https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/

    The following test case can corrupt an f2fs image:
    - dd if=/dev/zero of=/swapfile bs=1M count=4096
    - chmod 600 /swapfile
    - mkswap /swapfile
    - swapon --discard /swapfile

    The root cause is that f2fs_swap_activate() intends to return zero to
    setup_swap_extents() to enable SWP_FS mode (swap file goes through the
    fs). In this flow, setup_swap_extents() sets up the swap extent with
    the wrong block address range, which results in discard_swap() erasing
    the wrong addresses.

    Because f2fs_swap_activate() has pinned the swapfile, its data block
    addresses will not change, so it is safe to let swap handle IO through
    the raw device. We can therefore get rid of SWP_FS mode and initialize
    the swap extents inside f2fs_swap_activate(); this way, a later
    discard_swap() trims the right address range.

    Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     
  • commit 2b98149c2377bff12be5dd3ce02ae0506e2dd613 upstream.

    It's over-zealous to return hard errors under RCU-walk here, given that
    a REF-walk will be triggered for all other cases handling ".." under
    RCU.

    The original purpose of this check was to ensure that if a rename occurs
    such that a directory is moved outside of the bind-mount which the
    resolution started in, it would be detected and blocked to avoid being
    able to mess with paths outside of the bind-mount. However, triggering a
    new REF-walk is just as effective a solution.

    Cc: "Eric W. Biederman"
    Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root")
    Suggested-by: Al Viro
    Signed-off-by: Aleksa Sarai
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Aleksa Sarai
     
  • commit d876836204897b6d7d911f942084f69a1e9d5c4d upstream.

    We must set MSG_CMSG_COMPAT if we're in compatibility mode, otherwise
    the iovec import for these commands will not do the right thing and fail
    the command with -EINVAL.

    Found by running the test suite compiled as 32-bit.

    Cc: stable@vger.kernel.org
    Fixes: aa1fa28fc73e ("io_uring: add support for recvmsg()")
    Fixes: 0fa03c624d8f ("io_uring: add support for sendmsg()")
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
  • commit 37b0b6b8b99c0e1c1f11abbe7cf49b6d03795b3f upstream.

    If sbi->s_flex_groups_allocated is zero and the first allocation fails
    then this code will crash. The problem is that "i--" will set "i" to
    -1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1
    is type promoted to unsigned and becomes UINT_MAX. Since UINT_MAX
    is more than zero, the condition is true so we call kvfree(new_groups[-1]).
    The loop will carry on freeing invalid memory until it crashes.
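
    The promotion described above can be reproduced in plain C. This is a
    sketch with hypothetical helper names, not the ext4 code: the first
    function spells out the conversion the compiler applies when an int is
    compared with an unsigned int, and the second shows one way to sidestep
    it by walking forward over the newly allocated slots instead of
    counting down.

```c
/*
 * What "i >= allocated" means when i is int and allocated is unsigned int:
 * i is converted to unsigned int first, so -1 becomes UINT_MAX.
 */
int demo_mixed_compare(int i, unsigned int allocated)
{
    return (unsigned int)i >= allocated;
}

/*
 * Count the slots a forward cleanup loop would free, i.e. the half-open
 * range [allocated, failed_at). No index ever goes below zero.
 */
unsigned int demo_slots_to_free(unsigned int allocated, unsigned int failed_at)
{
    unsigned int j, n = 0;

    for (j = allocated; j < failed_at; j++)
        n++;
    return n;
}
```

    With allocated == 0 and a failure on the very first allocation, the
    mixed comparison is true for i == -1, while the forward loop runs zero
    times.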

    Fixes: 7c990728b99e ("ext4: fix potential race between s_flex_groups online resizing and access")
    Reviewed-by: Suraj Jitindar Singh
    Signed-off-by: Dan Carpenter
    Cc: stable@kernel.org
    Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountain
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • [ Upstream commit f52aa79df43c4509146140de0241bc21a4a3b4c7 ]

    A number of the debug statements output file or directory mode
    in hex. Change these to print using octal.
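
    As an illustration of why octal is easier to read here, the sketch
    below formats the same mode value both ways. The helper names and the
    exact format strings are assumptions for the example, not the ones used
    in the cifs debug statements.

```c
#include <stdio.h>

/* Format a file mode in hex, e.g. 0644 becomes "0x1a4". */
int demo_format_mode_hex(char *buf, size_t len, unsigned int mode)
{
    return snprintf(buf, len, "0x%x", mode);
}

/* Format a file mode in octal, e.g. 0644 becomes "0644". */
int demo_format_mode_octal(char *buf, size_t len, unsigned int mode)
{
    return snprintf(buf, len, "0%o", mode);
}
```

    The octal form lines up directly with the permission bits as chmod
    users write them.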

    Signed-off-by: Frank Sorenson
    Signed-off-by: Steve French
    Signed-off-by: Sasha Levin

    Frank Sorenson
     
  • [ Upstream commit 8e4473bb50a1796c9c32b244e5dbc5ee24ead937 ]

    In O_APPEND & O_DIRECT mode, data from different writers can overlap,
    since the writers only take the shared lock.

    For example, both Writer1 and Writer2 are in O_APPEND and O_DIRECT
    mode:

    Writer1                   Writer2

    shared_lock()             shared_lock()
    getattr(CAP_SIZE)         getattr(CAP_SIZE)
    iocb->ki_pos = EOF        iocb->ki_pos = EOF
    write(data1)
                              write(data2)
    shared_unlock()           shared_unlock()

    data2 will overwrite data1, because both writes start at the same
    file offset, the old EOF.

    Switch to exclusive lock instead when O_APPEND is specified.
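
    The interleaving can be replayed deterministically in userspace. The
    sketch below is a simplified model with hypothetical names (no real
    locking and no ceph code): in "shared" mode both writers sample EOF
    before either one writes, and in "exclusive" mode the second writer
    samples EOF only after the first has finished.

```c
#include <string.h>

#define DEMO_FILE_CAP 64

struct demo_file {
    char data[DEMO_FILE_CAP];
    size_t size;                /* current EOF */
};

/* Write len bytes at pos and extend EOF if the write goes past it. */
static void demo_write_at(struct demo_file *f, size_t pos,
                          const char *src, size_t len)
{
    memcpy(f->data + pos, src, len);
    if (pos + len > f->size)
        f->size = pos + len;
}

/*
 * Replay two appending writers in a fixed order and return the final size.
 * exclusive == 0: both sample EOF up front (models the shared lock).
 * exclusive != 0: writer 2 samples EOF after writer 1 is done.
 * Assumes strlen(d1) + strlen(d2) <= DEMO_FILE_CAP.
 */
size_t demo_two_appends(int exclusive, const char *d1, const char *d2)
{
    struct demo_file f = { {0}, 0 };
    size_t len1 = strlen(d1), len2 = strlen(d2);
    size_t pos1 = f.size, pos2;

    if (!exclusive) {
        pos2 = f.size;                      /* stale EOF, same as pos1 */
        demo_write_at(&f, pos1, d1, len1);
    } else {
        demo_write_at(&f, pos1, d1, len1);
        pos2 = f.size;                      /* EOF after writer 1 */
    }
    demo_write_at(&f, pos2, d2, len2);
    return f.size;
}
```

    With two 4-byte payloads, the shared replay ends with a 4-byte file
    (data2 on top of data1) and the exclusive replay with an 8-byte file.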

    Signed-off-by: Xiubo Li
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Sasha Levin

    Xiubo Li
     
  • [ Upstream commit cf5b4059ba7197d6cef9c0e024979d178ed8c8ec ]

    We want to make sure that we revalidate the dentry if and only if
    we've done an OPEN by filename.
    In order to avoid races with remote changes to the directory on the
    server, we want to save the verifier before calling OPEN. The exception
    is if the server returned a delegation with our OPEN, as we then
    know that the filename can't have changed on the server.

    Signed-off-by: Trond Myklebust
    Reviewed-by: Benjamin Coddington
    Tested-by: Benjamin Coddington
    Signed-off-by: Anna Schumaker
    Signed-off-by: Sasha Levin

    Trond Myklebust
     
  • [ Upstream commit 96222d53842dfe54869ec4e1b9d4856daf9105a2 ]

    fstests generic/471 reports a failure when run with MOUNT_OPTIONS="-o
    dax". The reason is that the initial pwrite to an empty file with the
    RWF_NOWAIT flag set does not return -EAGAIN. It turns out that
    dax_iomap_rw doesn't pass that flag through to iomap_apply.

    With this patch applied, generic/471 passes for me.

    Signed-off-by: Jeff Moyer
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/x49r1z86e1d.fsf@segfault.boston.devel.redhat.com
    Signed-off-by: Dan Williams
    Signed-off-by: Sasha Levin

    Jeff Moyer
     
  • [ Upstream commits 9392a27d88b9 and ff002b30181d ]

    Ensure that the async work grabs ->fs from the queueing task if the
    punted commands need to do lookups.

    We don't have these two commits in 5.4-stable:

    ff002b30181d30cdfbca316dadd099c3ca0d739c
    9392a27d88b9707145d713654eb26f0c29789e50

    because they don't apply with the rework that was done in how io_uring
    handles offload. Since there's no io-wq in 5.4, it doesn't make sense to
    do two patches. I'm attaching my port of the two for 5.4-stable, it's
    been tested. Please queue it up for the next 5.4-stable, thanks!

    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Jens Axboe
     

29 Feb, 2020

23 commits

  • commit 7143b5ac5750f404ff3a594b34fdf3fc2f99f828 upstream.

    This patch drops 'cur_mm' before calling cond_resched(), to prevent
    the sq_thread from spinning even when the user process is finished.

    Before this patch, if the user process ended without closing the
    io_uring fd, the sq_thread continues to spin until the
    'sq_thread_idle' timeout ends.

    In the worst case where the 'sq_thread_idle' parameter is bigger than
    INT_MAX, the sq_thread will spin forever.

    Fixes: 6c271ce2f1d5 ("io_uring: add submission polling")
    Signed-off-by: Stefano Garzarella
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Stefano Garzarella
     
  • commit c7849be9cc2dd2754c48ddbaca27c2de6d80a95d upstream.

    Since commit a3a0e43fd770 ("io_uring: don't enter poll loop if we have
    CQEs pending"), if we already have events pending, we won't enter the
    poll loop. If SETUP_IOPOLL and SETUP_SQPOLL are both enabled, the app
    has been terminated without reaping pending events that are already in
    the cq ring, and there are some reqs in poll_list, then io_sq_thread
    will enter __io_iopoll_check(), find pending events, and return. This
    loop will never have a chance to exit.

    I have seen this issue in fio stress tests. To fix it, let
    io_sq_thread call io_iopoll_getevents() with argument 'min' being zero,
    and remove __io_iopoll_check().

    Fixes: a3a0e43fd770 ("io_uring: don't enter poll loop if we have CQEs pending")
    Signed-off-by: Xiaoguang Wang
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Xiaoguang Wang
     
  • commit 2c2a7552dd6465e8fde6bc9cccf8d66ed1c1eb72 upstream.

    In crypt_scatterlist, if the crypt_stat argument is not set up
    correctly, the kernel crashes. Instead, by returning an error code
    upstream, the error is handled safely.
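
    The general pattern, validating the argument and returning an error
    instead of crashing, might look like the sketch below. The struct, the
    flag and the function are hypothetical simplifications and are not the
    ecryptfs definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical "fully set up" flag for the example. */
#define DEMO_STRUCT_INITIALIZED 0x1u

/* Hypothetical, heavily reduced stand-in for the crypto state. */
struct demo_crypt_stat {
    void *tfm;              /* cipher handle, must be non-NULL */
    unsigned int flags;
};

/*
 * Return -EINVAL when the state is not usable, so that the caller can
 * propagate the error instead of the kernel hitting a fatal assertion.
 */
int demo_crypt_check(const struct demo_crypt_stat *cs)
{
    if (!cs || !cs->tfm || !(cs->flags & DEMO_STRUCT_INITIALIZED))
        return -EINVAL;
    return 0;
}
```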

    The issue is detected via a static analysis tool written by us.

    Fixes: 237fead619984 (ecryptfs: fs/Makefile and fs/Kconfig)
    Signed-off-by: Aditya Pakki
    Signed-off-by: Tyler Hicks
    Signed-off-by: Greg Kroah-Hartman

    Aditya Pakki
     
  • commit a5ae50dea9111db63d30d700766dd5509602f7ad upstream.

    While logging the prealloc extents of an inode during a fast fsync we call
    btrfs_truncate_inode_items(), through btrfs_log_prealloc_extents(), while
    holding a read lock on a leaf of the inode's root (not the log root, the
    fs/subvol root), and then that function locks the file range in the inode's
    iotree. This can lead to a deadlock when:

    * the fsync is ranged

    * the file has prealloc extents beyond eof

    * writeback for a range different from the fsync range starts
    during the fsync

    * the size of the file is not sector size aligned

    Because when finishing an ordered extent we lock first a file range and
    then try to COW the fs/subvol tree to insert an extent item.

    The following diagram shows how the deadlock can happen.

    CPU 1                                   CPU 2

    btrfs_sync_file()
      --> for range [0, 1MiB)

      --> inode has a size of
          1MiB and has 1 prealloc
          extent beyond the
          i_size, starting at offset
          4MiB

      flushes all delalloc for the
      range [0MiB, 1MiB) and waits
      for the respective ordered
      extents to complete

                                            --> before task at CPU 1 locks the
                                                inode, a write into file range
                                                [1MiB, 2MiB + 1KiB) is made

                                            --> i_size is updated to 2MiB + 1KiB

                                            --> writeback is started for that
                                                range, [1MiB, 2MiB + 4KiB)
                                                --> end offset rounded up to
                                                    be sector size aligned

      btrfs_log_dentry_safe()
        btrfs_log_inode_parent()
          btrfs_log_inode()

            btrfs_log_changed_extents()
              btrfs_log_prealloc_extents()
                --> does a search on the
                    inode's root
                --> holds a read lock on
                    leaf X

                                            btrfs_finish_ordered_io()
                                              --> locks range [1MiB, 2MiB + 4KiB)
                                                  --> end offset rounded up
                                                      to be sector size aligned

                                              --> tries to cow leaf X, through
                                                  insert_reserved_file_extent()
                                                  --> already locked by the
                                                      task at CPU 1

                btrfs_truncate_inode_items()

                  --> gets an i_size of
                      2MiB + 1KiB, which is
                      not sector size
                      aligned

                  --> tries to lock file
                      range [2MiB, (u64)-1)
                      --> the start range
                          is rounded down
                          from 2MiB + 1K
                          to 2MiB to be sector
                          size aligned

                      --> but the subrange
                          [2MiB, 2MiB + 4KiB) is
                          already locked by
                          task at CPU 2 which
                          is waiting to get a
                          write lock on leaf X
                          for which we are
                          holding a read lock

                        *** deadlock ***

    This results in a stack trace like the following, triggered by test case
    generic/561 from fstests:

    [ 2779.973608] INFO: task kworker/u8:6:247 blocked for more than 120 seconds.
    [ 2779.979536] Not tainted 5.6.0-rc2-btrfs-next-53 #1
    [ 2779.984503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 2779.990136] kworker/u8:6 D 0 247 2 0x80004000
    [ 2779.990457] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
    [ 2779.990466] Call Trace:
    [ 2779.990491] ? __schedule+0x384/0xa30
    [ 2779.990521] schedule+0x33/0xe0
    [ 2779.990616] btrfs_tree_read_lock+0x19e/0x2e0 [btrfs]
    [ 2779.990632] ? remove_wait_queue+0x60/0x60
    [ 2779.990730] btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
    [ 2779.990782] btrfs_search_slot+0x510/0x1000 [btrfs]
    [ 2779.990869] btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
    [ 2779.990944] __btrfs_drop_extents+0x161/0x1060 [btrfs]
    [ 2779.990987] ? mark_held_locks+0x6d/0xc0
    [ 2779.990994] ? __slab_alloc.isra.49+0x99/0x100
    [ 2779.991060] ? insert_reserved_file_extent.constprop.19+0x64/0x300 [btrfs]
    [ 2779.991145] insert_reserved_file_extent.constprop.19+0x97/0x300 [btrfs]
    [ 2779.991222] ? start_transaction+0xdd/0x5c0 [btrfs]
    [ 2779.991291] btrfs_finish_ordered_io+0x4f4/0x840 [btrfs]
    [ 2779.991405] btrfs_work_helper+0xaa/0x720 [btrfs]
    [ 2779.991432] process_one_work+0x26d/0x6a0
    [ 2779.991460] worker_thread+0x4f/0x3e0
    [ 2779.991481] ? process_one_work+0x6a0/0x6a0
    [ 2779.991489] kthread+0x103/0x140
    [ 2779.991499] ? kthread_create_worker_on_cpu+0x70/0x70
    [ 2779.991515] ret_from_fork+0x3a/0x50
    (...)
    [ 2780.026211] INFO: task fsstress:17375 blocked for more than 120 seconds.
    [ 2780.027480] Not tainted 5.6.0-rc2-btrfs-next-53 #1
    [ 2780.028482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 2780.030035] fsstress D 0 17375 17373 0x00004000
    [ 2780.030038] Call Trace:
    [ 2780.030044] ? __schedule+0x384/0xa30
    [ 2780.030052] schedule+0x33/0xe0
    [ 2780.030075] lock_extent_bits+0x20c/0x320 [btrfs]
    [ 2780.030094] ? btrfs_truncate_inode_items+0xf4/0x1150 [btrfs]
    [ 2780.030098] ? rcu_read_lock_sched_held+0x59/0xa0
    [ 2780.030102] ? remove_wait_queue+0x60/0x60
    [ 2780.030122] btrfs_truncate_inode_items+0x133/0x1150 [btrfs]
    [ 2780.030151] ? btrfs_set_path_blocking+0xb2/0x160 [btrfs]
    [ 2780.030165] ? btrfs_search_slot+0x379/0x1000 [btrfs]
    [ 2780.030195] btrfs_log_changed_extents.isra.8+0x841/0x93e [btrfs]
    [ 2780.030202] ? do_raw_spin_unlock+0x49/0xc0
    [ 2780.030215] ? btrfs_get_num_csums+0x10/0x10 [btrfs]
    [ 2780.030239] btrfs_log_inode+0xf83/0x1124 [btrfs]
    [ 2780.030251] ? __mutex_unlock_slowpath+0x45/0x2a0
    [ 2780.030275] btrfs_log_inode_parent+0x2a0/0xe40 [btrfs]
    [ 2780.030282] ? dget_parent+0xa1/0x370
    [ 2780.030309] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
    [ 2780.030329] btrfs_sync_file+0x3f3/0x490 [btrfs]
    [ 2780.030339] do_fsync+0x38/0x60
    [ 2780.030343] __x64_sys_fdatasync+0x13/0x20
    [ 2780.030345] do_syscall_64+0x5c/0x280
    [ 2780.030348] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 2780.030356] RIP: 0033:0x7f2d80f6d5f0
    [ 2780.030361] Code: Bad RIP value.
    [ 2780.030362] RSP: 002b:00007ffdba3c8548 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
    [ 2780.030364] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2d80f6d5f0
    [ 2780.030365] RDX: 00007ffdba3c84b0 RSI: 00007ffdba3c84b0 RDI: 0000000000000003
    [ 2780.030367] RBP: 000000000000004a R08: 0000000000000001 R09: 00007ffdba3c855c
    [ 2780.030368] R10: 0000000000000078 R11: 0000000000000246 R12: 00000000000001f4
    [ 2780.030369] R13: 0000000051eb851f R14: 00007ffdba3c85f0 R15: 0000557a49220d90

    So fix this by making btrfs_truncate_inode_items() not lock the range in
    the inode's iotree when the target root is a log root, since it's not
    needed to lock the range for log roots as the protection from the inode's
    lock and log_mutex are all that's needed.

    Fixes: 28553fa992cb28 ("Btrfs: fix race between shrinking truncate and fiemap")
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Josef Bacik
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit 52e29e331070cd7d52a64cbf1b0958212a340e28 upstream.

    The only time we actually leave the path spinning is if we're truncating
    a small amount and don't actually free an extent, which is not a common
    occurrence. We have to set the path blocking in order to add the
    delayed ref anyway, so the first extent we find we set the path to
    blocking and stay blocking for the duration of the operation. With the
    upcoming file extent map stuff there will be another case that we have
    to have the path blocking, so just swap to blocking always.

    Note: this patch also fixes a warning after 28553fa992cb ("Btrfs: fix
    race between shrinking truncate and fiemap") got merged that inserts
    extent locks around truncation so the path must not leave spinning locks
    after btrfs_search_slot.

    [70.794783] BUG: sleeping function called from invalid context at mm/slab.h:565
    [70.794834] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1141, name: rsync
    [70.794863] 5 locks held by rsync/1141:
    [70.794876] #0: ffff888417b9c408 (sb_writers#17){.+.+}, at: mnt_want_write+0x20/0x50
    [70.795030] #1: ffff888428de28e8 (&type->i_mutex_dir_key#13/1){+.+.}, at: lock_rename+0xf1/0x100
    [70.795051] #2: ffff888417b9c608 (sb_internal#2){.+.+}, at: start_transaction+0x394/0x560
    [70.795124] #3: ffff888403081768 (btrfs-fs-01){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
    [70.795203] #4: ffff888403086568 (btrfs-fs-00){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
    [70.795222] CPU: 5 PID: 1141 Comm: rsync Not tainted 5.6.0-rc2-backup+ #2
    [70.795362] Call Trace:
    [70.795374] dump_stack+0x71/0xa0
    [70.795445] ___might_sleep.part.96.cold.106+0xa6/0xb6
    [70.795459] kmem_cache_alloc+0x1d3/0x290
    [70.795471] alloc_extent_state+0x22/0x1c0
    [70.795544] __clear_extent_bit+0x3ba/0x580
    [70.795557] ? _raw_spin_unlock_irq+0x24/0x30
    [70.795569] btrfs_truncate_inode_items+0x339/0xe50
    [70.795647] btrfs_evict_inode+0x269/0x540
    [70.795659] ? dput.part.38+0x29/0x460
    [70.795671] evict+0xcd/0x190
    [70.795682] __dentry_kill+0xd6/0x180
    [70.795754] dput.part.38+0x2ad/0x460
    [70.795765] do_renameat2+0x3cb/0x540
    [70.795777] __x64_sys_rename+0x1c/0x20

    Reported-by: Dave Jones
    Fixes: 28553fa992cb ("Btrfs: fix race between shrinking truncate and fiemap")
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Filipe Manana
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    [ add note ]
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 28553fa992cb28be6a65566681aac6cafabb4f2d upstream.

    When there is a fiemap executing in parallel with a shrinking truncate
    we can end up in a situation where we have extent maps for which we no
    longer have corresponding file extent items. This is generally harmless
    and at the moment the only consequences are missing file extent items
    representing holes after we expand the file size again after the
    truncate operation removed the prealloc extent items, and stale
    information for future fiemap calls (reporting extents that no longer
    exist or may have been reallocated to other files for example).

    Consider the following example:

    1) Our inode has a size of 128KiB, one 128KiB extent at file offset 0
    and a 1MiB prealloc extent at file offset 128KiB;

    2) Task A starts doing a shrinking truncate of our inode to reduce it to
    a size of 64KiB. Before it searches the subvolume tree for file
    extent items to delete, it drops all the extent maps in the range
    from 64KiB to (u64)-1 by calling btrfs_drop_extent_cache();

    3) Task B starts doing a fiemap against our inode. When looking up for
    the inode's extent maps in the range from 128KiB to (u64)-1, it
    doesn't find any in the inode's extent map tree, since they were
    removed by task A. Because it didn't find any in the extent map
    tree, it scans the inode's subvolume tree for file extent items, and
    it finds the 1MiB prealloc extent at file offset 128KiB, then it
    creates an extent map based on that file extent item and adds it to
    inode's extent map tree (this ends up being done by
    btrfs_get_extent()
    Signed-off-by: Filipe Manana
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit e75fd33b3f744f644061a4f9662bd63f5434f806 upstream.

    In btrfs_wait_ordered_range() once we find an ordered extent that has
    finished with an error we exit the loop and don't wait for any other
    ordered extents that might be still in progress.

    All the users of btrfs_wait_ordered_range() expect that there are no more
    ordered extents in progress after that function returns. So past fixes
    such as the ones from the two following commits:

    ff612ba7849964 ("btrfs: fix panic during relocation after ENOSPC before
    writeback happens")

    28aeeac1dd3080 ("Btrfs: fix panic when starting bg cache writeout after
    IO error")

    don't work when there are multiple ordered extents in the range.

    Fix that by making btrfs_wait_ordered_range() wait for all ordered extents
    even after it finds one that had an error.
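
    The shape of the fix, waiting for every item while remembering the
    first error, can be sketched as follows. This is a generic userspace
    model with hypothetical names; "waiting" is reduced to visiting each
    entry of a status array.

```c
#include <stddef.h>

/*
 * Visit all n entries even if some of them report an error.
 * status[i] is 0 on success or a negative errno-style value.
 * Stores the number of entries visited in *waited and returns the first
 * error seen, or 0 if there was none.
 */
int demo_wait_all(const int *status, size_t n, size_t *waited)
{
    int ret = 0;
    size_t i;

    *waited = 0;
    for (i = 0; i < n; i++) {
        (*waited)++;                /* never break out early */
        if (status[i] && !ret)
            ret = status[i];        /* keep only the first error */
    }
    return ret;
}
```

    A caller that sees a non-zero return can still rely on nothing in the
    range being in flight any more.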

    Link: https://github.com/kdave/btrfs-progs/issues/228#issuecomment-569777554
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Qu Wenruo
    Reviewed-by: Josef Bacik
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit 1e90315149f3fe148e114a5de86f0196d1c21fa5 upstream.

    btrfs_assert_delayed_root_empty() will check if the delayed root is
    completely empty, but this is a filesystem-wide check. On cleanup we
    may have allowed other transactions to begin, for whatever reason, and
    thus the delayed root is not empty.

    So remove this check from cleanup_one_transaction(). This however can
    stay in btrfs_cleanup_transaction(), because it checks only after all of
    the transactions have been properly cleaned up, and thus is valid.

    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Nikolay Borisov
    Reviewed-by: Qu Wenruo
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 315bf8ef914f31d51d084af950703aa1e09a728c upstream.

    While running my error injection script I hit a panic when we tried to
    clean up the fs_root when freeing the fs_root. This is because
    fs_info->fs_root == ERR_PTR(-EIO), which isn't great. Fix this by
    setting fs_info->fs_root = NULL; if we fail to read the root.
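
    A userspace sketch of the pattern: an error-encoded pointer must not be
    left in a field that later cleanup code treats as "NULL or valid". The
    helpers below imitate the kernel's ERR_PTR()/IS_ERR() convention under
    demo_ names; they and the struct are assumptions for illustration, not
    the btrfs code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, ERR_PTR() style. */
static void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if p lies in the top DEMO_MAX_ERRNO addresses, IS_ERR() style. */
static int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_fs_info {
    void *fs_root;              /* NULL or a valid root, never an error */
};

/* Hypothetical root reader that can be told to fail with -EIO. */
static void *demo_read_root(int fail)
{
    static int root_storage;

    return fail ? demo_err_ptr(-EIO) : (void *)&root_storage;
}

/* On failure, reset fs_root to NULL so that cleanup can test it safely. */
int demo_open(struct demo_fs_info *info, int fail)
{
    info->fs_root = demo_read_root(fail);
    if (demo_is_err(info->fs_root)) {
        int err = (int)(intptr_t)info->fs_root;

        info->fs_root = NULL;
        return err;
    }
    return 0;
}
```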

    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Nikolay Borisov
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Qu Wenruo
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit b778cf962d71a0e737923d55d0432f3bd287258e upstream.

    I hit the following warning while running my error injection stress
    testing:

    WARNING: CPU: 3 PID: 1453 at fs/btrfs/space-info.h:108 btrfs_free_reserved_data_space_noquota+0xfd/0x160 [btrfs]
    RIP: 0010:btrfs_free_reserved_data_space_noquota+0xfd/0x160 [btrfs]
    Call Trace:
    btrfs_free_reserved_data_space+0x4f/0x70 [btrfs]
    __btrfs_prealloc_file_range+0x378/0x470 [btrfs]
    elfcorehdr_read+0x40/0x40
    ? elfcorehdr_read+0x40/0x40
    ? btrfs_commit_transaction+0xca/0xa50 [btrfs]
    ? dput+0xb4/0x2a0
    ? btrfs_log_dentry_safe+0x55/0x70 [btrfs]
    ? btrfs_sync_file+0x30e/0x420 [btrfs]
    ? do_fsync+0x38/0x70
    ? __x64_sys_fdatasync+0x13/0x20
    ? do_syscall_64+0x5b/0x1b0
    ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

    This happens if we fail to insert our reserved file extent. At this
    point we've already converted our reservation from ->bytes_may_use to
    ->bytes_reserved. However once we break we will attempt to free
    everything from [cur_offset, end] from ->bytes_may_use, but our extent
    reservation will overlap part of this.

    Fix this problem by adding ins.offset (our extent allocation size) to
    cur_offset so we remove the actual remaining part from ->bytes_may_use.
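
    The accounting can be checked with a small calculation. The sketch
    below assumes that "end" is an inclusive offset and uses a hypothetical
    helper name; it only models how many bytes are still charged to
    ->bytes_may_use, not the btrfs reservation code itself.

```c
#include <stdint.h>

/*
 * Bytes of the inclusive range [cur_offset, end] that are still accounted
 * in may_use. If an extent of extent_len bytes was already converted to
 * "reserved", skip it before computing the remainder.
 */
uint64_t demo_may_use_to_free(uint64_t cur_offset, uint64_t end,
                              uint64_t extent_len, int extent_converted)
{
    if (extent_converted)
        cur_offset += extent_len;
    if (cur_offset > end)
        return 0;
    return end - cur_offset + 1;
}
```

    For a 1MiB range with a 64KiB extent already converted, only the
    remaining 960KiB should be released from may_use.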

    I validated this fix using my inject-error.py script

    python inject-error.py -o should_fail_bio -t cache_save_setup -t \
    __btrfs_prealloc_file_range \
    -t insert_reserved_file_extent.constprop.0 \
    -r "-5" ./run-fsstress.sh

    where run-fsstress.sh simply mounts and runs fsstress on a disk.

    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Qu Wenruo
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 81f7eb00ff5bb8326e82503a32809421d14abb8a upstream.

    We clean up the delayed references when we abort a transaction but we
    leave the pending qgroup extent records behind, leaking memory.

    This patch destroys the extent records when we destroy the delayed refs
    and makes sure they're gone before releasing the transaction.

    Fixes: 3368d001ba5d ("btrfs: qgroup: Record possible quota-related extent for qgroup.")
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Josef Bacik
    Signed-off-by: Jeff Mahoney
    [ Rebased to latest upstream, remove to_qgroup() helper, use
    rbtree_postorder_for_each_entry_safe() wrapper ]
    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Jeff Mahoney
     
  • commit cb85f4d23f794e24127f3e562cb3b54b0803f456 upstream.

    If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running
    on it, the following warning in ext4_add_complete_io() can be hit:

    WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120

    Here's a minimal reproducer (not 100% reliable) (root isn't required):

    while true; do
    sync
    done &
    while true; do
    rm -f file
    touch file
    chattr -e file
    echo X >> file
    chattr +e file
    done

    The problem is that in ext4_writepages(), ext4_should_dioread_nolock()
    (which only returns true on extent-based files) is checked once to set
    the number of reserved journal credits, and also again later to select
    the flags for ext4_map_blocks() and copy the reserved journal handle to
    ext4_io_end::handle. But if EXT4_EXTENTS_FL is being concurrently set,
    the first check can see dioread_nolock disabled while the later one can
    see it enabled, causing the reserved handle to unexpectedly be NULL.

    Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races
    related to doing so as well, fix this by synchronizing changing
    EXT4_EXTENTS_FL with ext4_writepages() via the existing
    s_writepages_rwsem (previously called s_journal_flag_rwsem).

    This was originally reported by syzbot without a reproducer at
    https://syzkaller.appspot.com/bug?extid=2202a584a00fffd19fbf,
    but now that dioread_nolock is the default I also started seeing this
    when running syzkaller locally.

    Link: https://lore.kernel.org/r/20200219183047.47417-3-ebiggers@kernel.org
    Reported-by: syzbot+2202a584a00fffd19fbf@syzkaller.appspotmail.com
    Fixes: 6b523df4fb5a ("ext4: use transaction reservation for extent conversion in ext4_end_io")
    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     
  • commit bbd55937de8f2754adc5792b0f8e5ff7d9c0420e upstream.

    In preparation for making s_journal_flag_rwsem synchronize
    ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA
    flags (rather than just JOURNAL_DATA as it does currently), rename it to
    s_writepages_rwsem.

    Link: https://lore.kernel.org/r/20200219183047.47417-2-ebiggers@kernel.org
    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     
  • commit 9db176bceb5c5df4990486709da386edadc6bd1d upstream.

    When CONFIG_QFMT_V2 is configured as a module, the test in
    ext4_feature_set_ok() fails and so mount of filesystems with quota or
    project features fails. Fix the test to use IS_ENABLED macro which
    works properly even for modules.
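
    The difference can be reproduced with the preprocessor alone. For a
    modular option, kbuild defines the symbol with a _MODULE suffix rather
    than the plain symbol, so a plain "defined" test misses it. The macros
    below are a simplified imitation of the kernel's IS_ENABLED() under
    DEMO_ names, and DEMO_CONFIG_QFMT_V2 is a stand-in symbol; this is a
    sketch, not the ext4 test.

```c
/* Simulate an option built as a module: only the _MODULE symbol is set. */
#define DEMO_CONFIG_QFMT_V2_MODULE 1

/* Expands to 1 if the symbol is defined to 1, and to 0 otherwise. */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED__(arg1_or_junk) DEMO_SECOND(arg1_or_junk 1, 0, 0)

/* True for both built-in (opt) and modular (opt_MODULE) configurations. */
#define DEMO_IS_ENABLED(opt) \
    (DEMO_IS_DEFINED(opt) || DEMO_IS_DEFINED(opt##_MODULE))

/* The plain test: only sees the built-in symbol. */
int demo_defined_check(void)
{
#if defined(DEMO_CONFIG_QFMT_V2)
    return 1;
#else
    return 0;
#endif
}

/* The IS_ENABLED-style test: also sees the modular symbol. */
int demo_is_enabled_check(void)
{
    return DEMO_IS_ENABLED(DEMO_CONFIG_QFMT_V2);
}
```

    With only the _MODULE symbol defined, the first function returns 0 and
    the second returns 1, which mirrors the mount failure described above.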

    Link: https://lore.kernel.org/r/20200221100835.9332-1-jack@suse.cz
    Fixes: d65d87a07476 ("ext4: improve explanation of a mount failure caused by a misconfigured kernel")
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 7c990728b99ed6fbe9c75fc202fce1172d9916da upstream.

    During an online resize an array of s_flex_groups structures gets replaced
    so it can get enlarged. If there is a concurrent access to the array and
    this memory has been reused then this can lead to an invalid memory access.

    The s_flex_group array has been converted into an array of pointers rather
    than an array of structures. This is to ensure that the information
    contained in the structures cannot get out of sync during a resize due to
    an accessor updating the value in the old structure after it has been
    copied but before the array pointer is updated. Since the structures
    themselves are no longer copied but only the pointers to them this case
    is mitigated.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
    Link: https://lore.kernel.org/r/20200221053458.730016-4-tytso@mit.edu
    Signed-off-by: Suraj Jitindar Singh
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Suraj Jitindar Singh
     
  • commit df3da4ea5a0fc5d115c90d5aa6caa4dd433750a7 upstream.

    During an online resize an array of pointers to s_group_info gets replaced
    so it can get enlarged. If there is a concurrent access to the array in
    ext4_get_group_info() and this memory has been reused then this can lead to
    an invalid memory access.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
    Link: https://lore.kernel.org/r/20200221053458.730016-3-tytso@mit.edu
    Signed-off-by: Suraj Jitindar Singh
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Balbir Singh
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Suraj Jitindar Singh
     
  • commit 1d0c3924a92e69bfa91163bda83c12a994b4d106 upstream.

    During an online resize an array of pointers to buffer heads gets
    replaced so it can get enlarged. If there is a racing block
    allocation or deallocation which uses the old array, and the old array
    has gotten reused this can lead to a GPF or some other random kernel
    memory getting modified.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
    Link: https://lore.kernel.org/r/20200221053458.730016-2-tytso@mit.edu
    Reported-by: Suraj Jitindar Singh
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 9424ef56e13a1f14c57ea161eed3ecfdc7b2770e upstream.

    We observed a soft lockup problem in Linux 4.19 which can also
    be found in Linux 5.x.

    When a directory inode takes up a large number of blocks and the
    directory is growing while we are searching it, the restart
    branch can be taken many times, and the do-while loop can
    hold the CPU for a long time.

    Here is the call trace in Linux 4.19.

    [ 473.756186] Call trace:
    [ 473.756196] dump_backtrace+0x0/0x198
    [ 473.756199] show_stack+0x24/0x30
    [ 473.756205] dump_stack+0xa4/0xcc
    [ 473.756210] watchdog_timer_fn+0x300/0x3e8
    [ 473.756215] __hrtimer_run_queues+0x114/0x358
    [ 473.756217] hrtimer_interrupt+0x104/0x2d8
    [ 473.756222] arch_timer_handler_virt+0x38/0x58
    [ 473.756226] handle_percpu_devid_irq+0x90/0x248
    [ 473.756231] generic_handle_irq+0x34/0x50
    [ 473.756234] __handle_domain_irq+0x68/0xc0
    [ 473.756236] gic_handle_irq+0x6c/0x150
    [ 473.756238] el1_irq+0xb8/0x140
    [ 473.756286] ext4_es_lookup_extent+0xdc/0x258 [ext4]
    [ 473.756310] ext4_map_blocks+0x64/0x5c0 [ext4]
    [ 473.756333] ext4_getblk+0x6c/0x1d0 [ext4]
    [ 473.756356] ext4_bread_batch+0x7c/0x1f8 [ext4]
    [ 473.756379] ext4_find_entry+0x124/0x3f8 [ext4]
    [ 473.756402] ext4_lookup+0x8c/0x258 [ext4]
    [ 473.756407] __lookup_hash+0x8c/0xe8
    [ 473.756411] filename_create+0xa0/0x170
    [ 473.756413] do_mkdirat+0x6c/0x140
    [ 473.756415] __arm64_sys_mkdirat+0x28/0x38
    [ 473.756419] el0_svc_common+0x78/0x130
    [ 473.756421] el0_svc_handler+0x38/0x78
    [ 473.756423] el0_svc+0x8/0xc
    [ 485.755156] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [tmp:5149]

    Add cond_resched() to avoid the soft lockup and to keep the
    system responsive.
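
    The idea of placing a voluntary reschedule point inside a long scan
    can be sketched in userspace as follows. This is only an analogy:
    sched_yield() stands in for the kernel's cond_resched(), and
    scan_blocks() and yield_point() are hypothetical names, not ext4 code.

```c
#include <sched.h>
#include <stddef.h>

/* Userspace stand-in for cond_resched(); an analogy, not the kernel call. */
static void yield_point(void)
{
    sched_yield();
}

/* Scan blocks for key. Returns the index of the first match, or -1. */
static long scan_blocks(const int *blocks, size_t nblocks, int key)
{
    size_t i;

    for (i = 0; i < nblocks; i++) {
        if (blocks[i] == key)
            return (long)i;
        /* Give other runnable tasks a chance on every pass of the loop. */
        yield_point();
    }
    return -1;
}
```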

    Link: https://lore.kernel.org/r/20200215080206.13293-1-luoshijie1@huawei.com
    Signed-off-by: Shijie Luo
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Shijie Luo
     
  • commit 35df4299a6487f323b0aca120ea3f485dfee2ae3 upstream.

    EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
    KCSAN,

    BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]

    write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
    ext4_write_end+0x4e3/0x750 [ext4]
    ext4_update_i_disksize at fs/ext4/ext4.h:3032
    (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
    (inlined by) ext4_write_end at fs/ext4/inode.c:1287
    generic_perform_write+0x208/0x2a0
    ext4_buffered_write_iter+0x11f/0x210 [ext4]
    ext4_file_write_iter+0xce/0x9e0 [ext4]
    new_sync_write+0x29c/0x3b0
    __vfs_write+0x92/0xa0
    vfs_write+0x103/0x260
    ksys_write+0x9d/0x130
    __x64_sys_write+0x4c/0x60
    do_syscall_64+0x91/0xb47
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
    ext4_writepages+0x10ac/0x1d00 [ext4]
    mpage_map_and_submit_extent at fs/ext4/inode.c:2468
    (inlined by) ext4_writepages at fs/ext4/inode.c:2772
    do_writepages+0x5e/0x130
    __writeback_single_inode+0xeb/0xb20
    writeback_sb_inodes+0x429/0x900
    __writeback_inodes_wb+0xc4/0x150
    wb_writeback+0x4bd/0x870
    wb_workfn+0x6b4/0x960
    process_one_work+0x54c/0xbe0
    worker_thread+0x80/0x650
    kthread+0x1e0/0x200
    ret_from_fork+0x27/0x50

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G W O L 5.5.0-next-20200204+ #5
    Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
    Workqueue: writeback wb_workfn (flush-7:0)

    Since only the read is lockless (performed outside of
    "i_data_sem"), load tearing could introduce a logic bug. Fix it by
    adding READ_ONCE() for the read and WRITE_ONCE() for the write.
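
    A minimal userspace sketch of the marked-access pattern is shown
    below. The two macros are simplified approximations for scalar types
    (they rely on the GCC/Clang __typeof__ extension) and are not the
    kernel's actual definitions; toy_inode and both helpers are
    hypothetical.

```c
#include <stdint.h>

/* Simplified approximations of the kernel macros, scalar types only. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct toy_inode {
    int64_t i_disksize;
};

/* Writer side: only ever grows the size. The caller is assumed to hold
 * whatever lock serializes writers, so the plain comparison is safe. */
static void toy_update_disksize(struct toy_inode *inode, int64_t newsize)
{
    if (newsize > inode->i_disksize)
        WRITE_ONCE(inode->i_disksize, newsize);
}

/* Lockless reader: one marked load that the compiler may not split. */
static int64_t toy_read_disksize(struct toy_inode *inode)
{
    return READ_ONCE(inode->i_disksize);
}
```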

    Signed-off-by: Qian Cai
    Link: https://lore.kernel.org/r/1581085751-31793-1-git-send-email-cai@lca.pw
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Qian Cai
     
  • commit 8eedabfd66b68a4623beec0789eac54b8c9d0fb6 upstream.

    I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
    The running environment:
    kernel version: 4.19
    A cluster with two nodes, 5 LUNs mounted on both nodes, running
    file operations such as dd/fallocate/truncate/rm on every LUN during a
    storage network disconnection.

    The fallocate operation on dm-23-45 caused a NULL pointer dereference.

    The NULL pointer dereference information is as follows:
    [577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45.
    [577992.878290] Aborting journal on device dm-23-45.
    ...
    [577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46.
    [577992.890908] __journal_remove_journal_head: freeing b_committed_data
    [577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30
    [577992.890918] __journal_remove_journal_head: freeing b_committed_data
    [577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30
    [577992.890922] __journal_remove_journal_head: freeing b_committed_data
    [577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30
    [577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30
    [577992.890928] __journal_remove_journal_head: freeing b_committed_data
    [577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30
    [577992.890933] __journal_remove_journal_head: freeing b_committed_data
    [577992.890939] __journal_remove_journal_head: freeing b_committed_data
    [577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
    [577992.890950] Mem abort info:
    [577992.890951] ESR = 0x96000004
    [577992.890952] Exception class = DABT (current EL), IL = 32 bits
    [577992.890952] SET = 0, FnV = 0
    [577992.890953] EA = 0, S1PTW = 0
    [577992.890954] Data abort info:
    [577992.890955] ISV = 0, ISS = 0x00000004
    [577992.890956] CM = 0, WnR = 0
    [577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9
    [577992.890960] [0000000000000020] pgd=0000000000000000
    [577992.890964] Internal error: Oops: 96000004 [#1] SMP
    [577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd)
    [577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G W OE 4.19.36 #1
    [577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
    [577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO)
    [577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
    [577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2]
    [577992.891084] sp : ffff0000c8e2b810
    [577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000
    [577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70
    [577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2
    [577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30
    [577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000
    [577992.891097] x19: ffff000001681638 x18: ffffffffffffffff
    [577992.891098] x17: 0000000000000000 x16: ffff000080a03df0
    [577992.891100] x15: ffff0000811d9708 x14: 203d207375746174
    [577992.891101] x13: 73203a524f525245 x12: 20373439343a6565
    [577992.891103] x11: 0000000000000038 x10: 0101010101010101
    [577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f
    [577992.891109] x7 : 0000000000000000 x6 : 0000000000000080
    [577992.891110] x5 : 0000000000000000 x4 : 0000000000000002
    [577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00
    [577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000
    [577992.891116] Call trace:
    [577992.891139] _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
    [577992.891162] _ocfs2_free_clusters+0x100/0x290 [ocfs2]
    [577992.891185] ocfs2_free_clusters+0x50/0x68 [ocfs2]
    [577992.891206] ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2]
    [577992.891227] ocfs2_add_inode_data+0x94/0xc8 [ocfs2]
    [577992.891248] ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2]
    [577992.891269] ocfs2_allocate_extents+0x14c/0x338 [ocfs2]
    [577992.891290] __ocfs2_change_file_space+0x3f8/0x610 [ocfs2]
    [577992.891309] ocfs2_fallocate+0xe4/0x128 [ocfs2]
    [577992.891316] vfs_fallocate+0x11c/0x250
    [577992.891317] ksys_fallocate+0x54/0x88
    [577992.891319] __arm64_sys_fallocate+0x28/0x38
    [577992.891323] el0_svc_common+0x78/0x130
    [577992.891325] el0_svc_handler+0x38/0x78
    [577992.891327] el0_svc+0x8/0xc

    My analysis is as follows:
    ocfs2_fallocate
     __ocfs2_change_file_space
      ocfs2_allocate_extents
       ocfs2_extend_allocation
        ocfs2_add_inode_data
         ocfs2_add_clusters_in_btree
          ocfs2_insert_extent
           ocfs2_do_insert_extent
            ocfs2_rotate_tree_right
             ocfs2_extend_rotate_transaction
              ocfs2_extend_trans
               jbd2_journal_restart
                jbd2__journal_restart
                 /* handle->h_transaction is NULL,
                  * is_handle_aborted(handle) is true
                  */
                 handle->h_transaction = NULL;
                 start_this_handle
                  return -EROFS;
          ocfs2_free_clusters
           _ocfs2_free_clusters
            _ocfs2_free_suballoc_bits
             ocfs2_block_group_clear_bits
              ocfs2_journal_access_gd
               __ocfs2_journal_access
                jbd2_journal_get_undo_access
                 /* I think jbd2_write_access_granted() will
                  * return true, because do_get_write_access()
                  * will return -EROFS.
                  */
                 if (jbd2_write_access_granted(...)) return 0;
                 do_get_write_access
                  /* handle->h_transaction is NULL, it will
                   * return -EROFS here, so do_get_write_access()
                   * was not called.
                   */
                  if (is_handle_aborted(handle)) return -EROFS;
              /* bh2jh(group_bh) is NULL, caused NULL
                 pointer dereference */
              undo_bg = (struct ocfs2_group_desc *)
                         bh2jh(group_bh)->b_committed_data;

    If handle->h_transaction == NULL, then jbd2_write_access_granted()
    does not really guarantee that the journal_head will stay around,
    let alone its b_committed_data. The bh2jh(group_bh) can be removed
    after ocfs2_journal_access_gd() returns and before
    "bh2jh(group_bh)->b_committed_data" is dereferenced. So, we should
    move the is_handle_aborted() check from do_get_write_access() into
    jbd2_journal_get_undo_access() and jbd2_journal_get_write_access(),
    before the call to jbd2_write_access_granted().
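
    The effect of reordering the two checks can be shown with a toy
    model. Everything below is hypothetical and heavily simplified: the
    fast path is hard-wired to succeed, and an aborted handle is modelled
    only as a NULL h_transaction. It is a sketch of the ordering problem,
    not the jbd2 implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_handle {
    void *h_transaction;   /* NULL once the handle is aborted in this model */
};

static bool toy_is_handle_aborted(const struct toy_handle *h)
{
    return h->h_transaction == NULL;
}

/* Stand-in for jbd2_write_access_granted(): pretend the fast path hits. */
static bool toy_write_access_granted(const struct toy_handle *h)
{
    (void)h;
    return true;
}

/* Old ordering: the fast path reports success even for an aborted handle,
 * because the abort check sits behind it. */
static int toy_get_undo_access_old(struct toy_handle *h)
{
    if (toy_write_access_granted(h))
        return 0;
    if (toy_is_handle_aborted(h))
        return -EROFS;
    return 0;
}

/* New ordering: an aborted handle is rejected before the fast path runs. */
static int toy_get_undo_access_new(struct toy_handle *h)
{
    if (toy_is_handle_aborted(h))
        return -EROFS;
    if (toy_write_access_granted(h))
        return 0;
    return 0;
}
```

    With an aborted handle the old ordering returns 0 and lets the caller
    go on to dereference state that may be gone, while the new ordering
    returns -EROFS.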

    Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.com
    Signed-off-by: Yan Wang
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jun Piao
    Reviewed-by: Jan Kara
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    wangyan
     
  • commit bd727173e4432fe6cb70ba108dc1f3602c5409d7 upstream.

    If we're allocating a logged extent we attempt to insert an extent
    record for the file extent directly. We increase
    space_info->bytes_reserved, because the extent entry addition will call
    btrfs_update_block_group(), which will convert the ->bytes_reserved to
    ->bytes_used. However if we fail at any point while inserting the
    extent entry we will bail and leave space on ->bytes_reserved, which
    will trigger a WARN_ON() on umount. Fix this by pinning the space if we
    fail to insert, which is what happens in every other failure case that
    involves adding the extent entry.
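
    The accounting described above can be modelled with three counters.
    This is a hypothetical sketch: toy_space_info and
    toy_alloc_logged_extent() are invented names, and it assumes that
    pinning moves the bytes from the reserved counter to the pinned
    counter.

```c
#include <errno.h>

/* Toy model of the space counters named in the commit message. */
struct toy_space_info {
    long bytes_reserved;
    long bytes_used;
    long bytes_pinned;
};

/* Reserve len bytes, then either convert them to used (insert succeeded)
 * or pin them (insert failed). Returns 0 or a negative errno. */
static int toy_alloc_logged_extent(struct toy_space_info *s, long len,
                                   int insert_fails)
{
    s->bytes_reserved += len;

    if (insert_fails) {
        /* Without this step the bytes would stay in bytes_reserved. */
        s->bytes_reserved -= len;
        s->bytes_pinned += len;
        return -EIO;
    }

    /* Success path: reserved space becomes used space. */
    s->bytes_reserved -= len;
    s->bytes_used += len;
    return 0;
}
```

    In both outcomes bytes_reserved returns to its previous value, which
    is the invariant whose violation triggered the WARN_ON() on umount.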

    CC: stable@vger.kernel.org # 5.4+
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Nikolay Borisov
    Reviewed-by: Qu Wenruo
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit b4a81b87a4cfe2bb26a4a943b748d96a43ef20e8 upstream.

    In ecryptfs_init_messaging(), if the allocation for 'ecryptfs_msg_ctx_arr'
    fails, the previously allocated 'ecryptfs_daemon_hash' is not deallocated,
    leading to a memory leak bug. To fix this issue, free
    'ecryptfs_daemon_hash' before returning the error.
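
    The shape of such a fix can be sketched in plain C. The names below
    (toy_daemon_hash, toy_msg_ctx_arr, toy_init_messaging) are
    hypothetical stand-ins, and the allocators are passed in as function
    pointers only so that a failure can be simulated.

```c
#include <errno.h>
#include <stdlib.h>

static void *toy_daemon_hash;
static void *toy_msg_ctx_arr;

/* Allocator type, injectable so a failure can be simulated. */
typedef void *(*toy_alloc_fn)(size_t);

/* An allocator that always fails, for exercising the error path. */
static void *toy_failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/* Two-step initialization: if the second allocation fails, the first
 * one is released before the error is returned. */
static int toy_init_messaging(toy_alloc_fn alloc_hash, toy_alloc_fn alloc_arr)
{
    int rc = 0;

    toy_daemon_hash = alloc_hash(64);
    if (!toy_daemon_hash) {
        rc = -ENOMEM;
        goto out;
    }

    toy_msg_ctx_arr = alloc_arr(128);
    if (!toy_msg_ctx_arr) {
        free(toy_daemon_hash);      /* the step the leaking version lacked */
        toy_daemon_hash = NULL;
        rc = -ENOMEM;
        goto out;
    }
out:
    return rc;
}
```

    Calling toy_init_messaging(malloc, toy_failing_alloc) returns -ENOMEM
    and leaves toy_daemon_hash set to NULL rather than pointing at leaked
    memory.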

    Cc: stable@vger.kernel.org
    Fixes: 88b4a07e6610 ("[PATCH] eCryptfs: Public key transport mechanism")
    Signed-off-by: Wenwen Wang
    Signed-off-by: Tyler Hicks
    Signed-off-by: Greg Kroah-Hartman

    Wenwen Wang
     
  • commit fe2e082f5da5b4a0a92ae32978f81507ef37ec66 upstream.

    In parse_tag_1_packet(), if tag 1 packet contains a key larger than
    ECRYPTFS_MAX_ENCRYPTED_KEY_BYTES, no cleanup is executed, leading to a
    memory leak on the allocated 'auth_tok_list_item'. To fix this issue, go to
    the label 'out_free' to perform the cleanup work.

    Cc: stable@vger.kernel.org
    Fixes: dddfa461fc89 ("[PATCH] eCryptfs: Public key; packet management")
    Signed-off-by: Wenwen Wang
    Signed-off-by: Tyler Hicks
    Signed-off-by: Greg Kroah-Hartman

    Wenwen Wang
     

24 Feb, 2020

6 commits

  • [ Upstream commit 2f1398291bf35fe027914ae7a9610d8e601fbfde ]

    Handle the special case of fuse_readpages() wanting to read the last page
    of the largest possible file and overflowing the end offset in the process.

    This is basically to unbreak xfstests:generic/525 and prevent filesystems
    from doing bad things with an overflowing offset.
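
    One generic way to keep an end offset from overflowing is to clamp
    the byte count before adding it to the start offset. The helper
    below is a hypothetical illustration of that technique, not the
    upstream patch; it assumes offsets are signed 64-bit values bounded
    by LLONG_MAX.

```c
#include <limits.h>
#include <stddef.h>

/* Clamp count so that pos + count never exceeds LLONG_MAX.
 * Returns the (possibly reduced) count; 0 for a negative pos. */
static size_t toy_clamp_read_count(long long pos, size_t count)
{
    unsigned long long room;

    if (pos < 0)
        return 0;

    /* Bytes remaining between pos and the largest representable offset. */
    room = (unsigned long long)LLONG_MAX - (unsigned long long)pos;
    if ((unsigned long long)count > room)
        count = (size_t)room;
    return count;
}
```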

    Reported-by: Xiao Yang
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Sasha Levin

    Miklos Szeredi
     
  • [ Upstream commit d6fd41905ec577851734623fb905b1763801f5ef ]

    We ran into a confusing problem where an application wasn't checking
    the return code on close, so the user didn't realize that the
    application ran out of disk space. Log a warning message (once) in
    these cases. For example:

    [ 8407.391909] Out of space writing to \\oleg-server\small-share

    Signed-off-by: Steve French
    Reported-by: Oleg Kravtsov
    Reviewed-by: Ronnie Sahlberg
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Sasha Levin

    Steve French
     
  • [ Upstream commit 9f198a2ac543eaaf47be275531ad5cbd50db3edf ]

    If the seq_file .next function does not change the position index,
    a read after some lseek can generate unexpected output.
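
    A toy iterator in the style of a .next callback shows the rule:
    the position index is advanced on every call, including the call
    that reaches the end. toy_items and toy_seq_next() are hypothetical,
    and the signature is simplified (the kernel callback also receives a
    struct seq_file pointer).

```c
#include <stddef.h>

#define TOY_NITEMS 3

static const int toy_items[TOY_NITEMS] = { 10, 20, 30 };

/* Return the element after v, or NULL at the end of the sequence.
 * *pos is advanced unconditionally, even when NULL is returned. */
static const int *toy_seq_next(const int *v, long long *pos)
{
    (void)v;
    ++*pos;
    if (*pos < 0 || *pos >= TOY_NITEMS)
        return NULL;
    return &toy_items[*pos];
}
```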

    https://bugzilla.kernel.org/show_bug.cgi?id=206283
    Signed-off-by: Vasily Averin
    Signed-off-by: Mike Marshall
    Signed-off-by: Sasha Levin

    Vasily Averin
     
  • [ Upstream commit 123c23c6a7b7ecd2a3d6060bea1d94019f71fd66 ]

    In _nfs42_proc_copy(), 'res->commit_res.verf' is allocated through
    kzalloc() if 'args->sync' is true. In the following code, if
    'res->synchronous' is false, handle_async_copy() will be invoked. If an
    error occurs during the invocation, the following code will not be executed
    and the error will be returned. However, the allocated
    'res->commit_res.verf' is not deallocated, leading to a memory leak. This
    is also true if the invocation of process_copy_commit() returns an error.

    To fix the above leaks, redirect the execution to the 'out' label if an
    error is encountered.

    Signed-off-by: Wenwen Wang
    Signed-off-by: Anna Schumaker
    Signed-off-by: Sasha Levin

    Wenwen Wang
     
  • [ Upstream commit aacee5446a2a1aa35d0a49dab289552578657fb4 ]

    The variable inode may be NULL in reiserfs_insert_item(), but there is
    no check before accessing members of inode.

    Fix this by adding a NULL pointer check before calling reiserfs_debug().

    Link: http://lkml.kernel.org/r/79c5135d-ff25-1cc9-4e99-9f572b88cc00@huawei.com
    Signed-off-by: Yunfeng Ye
    Cc: zhengbin
    Cc: Hu Shiyuan
    Cc: Feilong Lin
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Yunfeng Ye
     
  • [ Upstream commit 9f16ca48fc818a17de8be1f75d08e7f4addc4497 ]

    I found a NULL pointer dereference in ocfs2_update_inode_fsync_trans(),
    handle->h_transaction may be NULL in this situation:

    ocfs2_file_write_iter
      ->__generic_file_write_iter
        ->generic_perform_write
          ->ocfs2_write_begin
            ->ocfs2_write_begin_nolock
              ->ocfs2_write_cluster_by_desc
                ->ocfs2_write_cluster
                  ->ocfs2_mark_extent_written
                    ->ocfs2_change_extent_flag
                      ->ocfs2_split_extent
                        ->ocfs2_try_to_merge_extent
                          ->ocfs2_extend_rotate_transaction
                            ->ocfs2_extend_trans
                              ->jbd2_journal_restart
                                ->jbd2__journal_restart
                                  // handle->h_transaction is NULL here
                                  ->handle->h_transaction = NULL;
                                  ->start_this_handle
                                    /* journal aborted due to storage
                                       network disconnection, return error */
                                    ->return -EROFS;
                          /* line 3806 in ocfs2_try_to_merge_extent (),
                             it will ignore ret error. */
                          ->ret = 0;
          ->...
          ->ocfs2_write_end
            ->ocfs2_write_end_nolock
              ->ocfs2_update_inode_fsync_trans
                // NULL pointer dereference
                ->oi->i_sync_tid = handle->h_transaction->t_tid;

    The NULL pointer dereference information is as follows:
    JBD2: Detected IO errors while flushing file data on dm-11-45
    Aborting journal on device dm-11-45.
    JBD2: Error -5 detected when updating journal superblock for dm-11-45.
    (dd,22081,3):ocfs2_extend_trans:474 ERROR: status = -30
    (dd,22081,3):ocfs2_try_to_merge_extent:3877 ERROR: status = -30
    Unable to handle kernel NULL pointer dereference at
    virtual address 0000000000000008
    Mem abort info:
    ESR = 0x96000004
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
    Data abort info:
    ISV = 0, ISS = 0x00000004
    CM = 0, WnR = 0
    user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000e74e1338
    [0000000000000008] pgd=0000000000000000
    Internal error: Oops: 96000004 [#1] SMP
    Process dd (pid: 22081, stack limit = 0x00000000584f35a9)
    CPU: 3 PID: 22081 Comm: dd Kdump: loaded
    Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
    pstate: 60400009 (nZCv daif +PAN -UAO)
    pc : ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2]
    lr : ocfs2_write_end_nolock+0x2a0/0x550 [ocfs2]
    sp : ffff0000459fba70
    x29: ffff0000459fba70 x28: 0000000000000000
    x27: ffff807ccf7f1000 x26: 0000000000000001
    x25: ffff807bdff57970 x24: ffff807caf1d4000
    x23: ffff807cc79e9000 x22: 0000000000001000
    x21: 000000006c6cd000 x20: ffff0000091d9000
    x19: ffff807ccb239db0 x18: ffffffffffffffff
    x17: 000000000000000e x16: 0000000000000007
    x15: ffff807c5e15bd78 x14: 0000000000000000
    x13: 0000000000000000 x12: 0000000000000000
    x11: 0000000000000000 x10: 0000000000000001
    x9 : 0000000000000228 x8 : 000000000000000c
    x7 : 0000000000000fff x6 : ffff807a308ed6b0
    x5 : ffff7e01f10967c0 x4 : 0000000000000018
    x3 : d0bc661572445600 x2 : 0000000000000000
    x1 : 000000001b2e0200 x0 : 0000000000000000
    Call trace:
    ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2]
    ocfs2_write_end+0x4c/0x80 [ocfs2]
    generic_perform_write+0x108/0x1a8
    __generic_file_write_iter+0x158/0x1c8
    ocfs2_file_write_iter+0x668/0x950 [ocfs2]
    __vfs_write+0x11c/0x190
    vfs_write+0xac/0x1c0
    ksys_write+0x6c/0xd8
    __arm64_sys_write+0x24/0x30
    el0_svc_common+0x78/0x130
    el0_svc_handler+0x38/0x78
    el0_svc+0x8/0xc

    To prevent a NULL pointer dereference in this situation, we check
    is_handle_aborted() before using handle->h_transaction->t_tid.
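
    The guard can be sketched with a few toy types. All names below are
    hypothetical, and demo_is_handle_aborted() is a simplified model of
    the real helper: it treats a handle as aborted when its flag is set
    or its transaction pointer is NULL.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_transaction {
    unsigned int t_tid;
};

struct demo_handle {
    struct demo_transaction *h_transaction;
    bool h_aborted;
};

struct demo_inode_info {
    unsigned int i_sync_tid;
};

/* Simplified model of is_handle_aborted(). */
static bool demo_is_handle_aborted(const struct demo_handle *h)
{
    return h->h_aborted || h->h_transaction == NULL;
}

/* Record the transaction id only when the handle is still usable. */
static void demo_update_inode_fsync_trans(const struct demo_handle *h,
                                          struct demo_inode_info *oi)
{
    if (!demo_is_handle_aborted(h))
        oi->i_sync_tid = h->h_transaction->t_tid;
}
```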

    Link: http://lkml.kernel.org/r/03e750ab-9ade-83aa-b000-b9e81e34e539@huawei.com
    Signed-off-by: Yan Wang
    Reviewed-by: Jun Piao
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Cc: Changwei Ge
    Cc: Gang He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    wangyan