06 Feb, 2020

4 commits

  • commit 5474ca7da6f34fa95e82edc747d5faa19cbdfb5c upstream.

    When a filesystem is mounted with the jdev mount option, we store the
    journal device name in an allocated string in the superblock. However,
    we never free that string. Fix it.

    Reported-by: syzbot+1c6756baf4b16b94d2a6@syzkaller.appspotmail.com
    Fixes: c3aa077648e1 ("reiserfs: Properly display mount options in /proc/mounts")
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit eed0f953b90e86e765197a1dad06bb48aedc27fe upstream.

    On filesystems with a block size smaller than the page size,
    gfs2_find_jhead can split a page across two bios (for example, when
    blocks are not allocated consecutively). When that happens, the first
    bio that completes will unlock the page in its bi_end_io handler even
    though the page hasn't been read completely yet. Fix that by using a
    chained bio for the rest of the page.

    While at it, clean up the sector calculation logic in
    gfs2_log_alloc_bio. In gfs2_find_jhead, simplify the disk block and
    offset calculation logic and fix a variable name.

    Fixes: f4686c26ecc3 ("gfs2: read journal in large chunks")
    Cc: stable@vger.kernel.org # v5.2+
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Greg Kroah-Hartman

    Andreas Gruenbacher
     
  • commit c54849ddd832ae0a45cab16bcd1ed2db7da090d7 upstream.

    RHBZ: 1795429

    In recent DFS updates we have a new variable controlling how many times we will
    retry to reconnect the share.
    If DFS is not used, then this variable is initialized to 0 in:

    static inline int
    dfs_cache_get_nr_tgts(const struct dfs_cache_tgt_list *tl)
    {
            return tl ? tl->tl_numtgts : 0;
    }

    This means that in the reconnect loop in smb2_reconnect() we will immediately wrap retries to -1
    and never actually get to pass this conditional:

    if (--retries)
            continue;

    The effect is that we no longer reach the point where we fail the commands with -EHOSTDOWN
    and basically the kernel threads are virtually hung and unkillable.
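
The wrap-around described above can be reproduced in a small userspace sketch. The function names and the `limit` safety cap are hypothetical, and the guarded variant is only one plausible way to avoid the wrap, not necessarily what the patch does.

```c
/*
 * Userspace sketch of the retry accounting; not the kernel code.
 * Returns the number of attempts made before giving up, or -1 if the
 * loop is still spinning after `limit` attempts.
 */
int attempts_before_giving_up(int retries, int limit)
{
    int attempts = 0;

    while (attempts < limit) {
        attempts++;              /* one failed reconnect attempt */
        if (--retries)           /* same shape as the conditional above */
            continue;
        return attempts;         /* here the command would fail with -EHOSTDOWN */
    }
    return -1;                   /* the failure path was never reached */
}

/* Hypothetical guarded variant: a zero count is never decremented. */
int attempts_with_guard(int retries, int limit)
{
    int attempts = 0;

    while (attempts < limit) {
        attempts++;
        if (retries && --retries)
            continue;
        return attempts;
    }
    return -1;
}
```

Starting from 0, the unguarded loop decrements to -1, which is non-zero, so it keeps looping until the artificial cap is hit.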

    Fixes: a3a53b7603798fd8 (cifs: Add support for failover in smb2_reconnect())
    Signed-off-by: Ronnie Sahlberg
    Signed-off-by: Steve French
    Reviewed-by: Paulo Alcantara (SUSE)
    CC: Stable
    Signed-off-by: Greg Kroah-Hartman

    Ronnie Sahlberg
     
  • commit 6404674acd596de41fd3ad5f267b4525494a891a upstream.

    Brown paperbag time: fetching ->i_uid/->i_mode really should've been
    done from nd->inode. I even suggested that, but the reason for that has
    slipped through the cracks and I went for dir->d_inode instead - made
    for more "obvious" patch.

    Analysis:

    - at the entry into do_last() and all the way to step_into(): dir (aka
    nd->path.dentry) is known not to have been freed; so's nd->inode and
    it's equal to dir->d_inode unless we are already doomed to -ECHILD.
    inode of the file to get opened is not known.

    - after step_into(): inode of the file to get opened is known; dir
    might be pointing to freed memory/be negative/etc.

    - at the call of may_create_in_sticky(): guaranteed to be out of RCU
    mode; inode of the file to get opened is known and pinned; dir might
    be garbage.

    The last was the reason for the original patch. Except that at the
    do_last() entry we can be in RCU mode and it is possible that
    nd->path.dentry->d_inode has already changed under us.

    In that case we are going to fail with -ECHILD, but we need to be
    careful; nd->inode is pointing to valid struct inode and it's the same
    as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we
    should use that.

    Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)"
    Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com
    Wearing-brown-paperbag: Al Viro
    Cc: stable@kernel.org
    Fixes: d0cb50185ae9 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late")
    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     

01 Feb, 2020

4 commits

  • commit 0a5a98863c9debc02387b3d23c46d187756f5e2b upstream.

    __smb2_handle_cancelled_cmd() is called under a spin lock held in
    cifs_mid_q_entry_release(), so make its memory allocation GFP_ATOMIC.

    This issue was observed when running xfstests generic/028:

    [ 1722.589204] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72064 cmd: 5
    [ 1722.590687] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72065 cmd: 17
    [ 1722.593529] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72066 cmd: 6
    [ 1723.039014] BUG: sleeping function called from invalid context at mm/slab.h:565
    [ 1723.040710] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 30877, name: cifsd
    [ 1723.045098] CPU: 3 PID: 30877 Comm: cifsd Not tainted 5.5.0-rc4+ #313
    [ 1723.046256] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
    [ 1723.048221] Call Trace:
    [ 1723.048689] dump_stack+0x97/0xe0
    [ 1723.049268] ___might_sleep.cold+0xd1/0xe1
    [ 1723.050069] kmem_cache_alloc_trace+0x204/0x2b0
    [ 1723.051051] __smb2_handle_cancelled_cmd+0x40/0x140 [cifs]
    [ 1723.052137] smb2_handle_cancelled_mid+0xf6/0x120 [cifs]
    [ 1723.053247] cifs_mid_q_entry_release+0x44d/0x630 [cifs]
    [ 1723.054351] ? cifs_reconnect+0x26a/0x1620 [cifs]
    [ 1723.055325] cifs_demultiplex_thread+0xad4/0x14a0 [cifs]
    [ 1723.056458] ? cifs_handle_standard+0x2c0/0x2c0 [cifs]
    [ 1723.057365] ? kvm_sched_clock_read+0x14/0x30
    [ 1723.058197] ? sched_clock+0x5/0x10
    [ 1723.058838] ? sched_clock_cpu+0x18/0x110
    [ 1723.059629] ? lockdep_hardirqs_on+0x17d/0x250
    [ 1723.060456] kthread+0x1ab/0x200
    [ 1723.061149] ? cifs_handle_standard+0x2c0/0x2c0 [cifs]
    [ 1723.062078] ? kthread_create_on_node+0xd0/0xd0
    [ 1723.062897] ret_from_fork+0x3a/0x50

    Signed-off-by: Paulo Alcantara (SUSE)
    Fixes: 9150c3adbf24 ("CIFS: Close open handle after interrupted close")
    Cc: Stable
    Signed-off-by: Steve French
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Greg Kroah-Hartman

    Paulo Alcantara (SUSE)
     
  • commit 731b82bb1750a906c1e7f070aedf5505995ebea7 upstream.

    Fix two places where we need to adjust down the max response size for
    ioctl when it is used together with compounding.

    Signed-off-by: Ronnie Sahlberg
    Signed-off-by: Steve French
    Reviewed-by: Pavel Shilovsky
    CC: Stable
    Signed-off-by: Greg Kroah-Hartman

    Ronnie Sahlberg
     
  • commit f1f27ad74557e39f67a8331a808b860f89254f2d upstream.

    The task which created the MID may be gone by the time cifsd attempts to
    call the callbacks on MIDs from cifs_reconnect().

    This leads to a use-after-free of the task struct in cifs_wake_up_task:

    ==================================================================
    BUG: KASAN: use-after-free in __lock_acquire+0x31a0/0x3270
    Read of size 8 at addr ffff8880103e3a68 by task cifsd/630

    CPU: 0 PID: 630 Comm: cifsd Not tainted 5.5.0-rc6+ #119
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    Call Trace:
    dump_stack+0x8e/0xcb
    print_address_description.constprop.5+0x1d3/0x3c0
    ? __lock_acquire+0x31a0/0x3270
    __kasan_report+0x152/0x1aa
    ? __lock_acquire+0x31a0/0x3270
    ? __lock_acquire+0x31a0/0x3270
    kasan_report+0xe/0x20
    __lock_acquire+0x31a0/0x3270
    ? __wake_up_common+0x1dc/0x630
    ? _raw_spin_unlock_irqrestore+0x4c/0x60
    ? mark_held_locks+0xf0/0xf0
    ? _raw_spin_unlock_irqrestore+0x39/0x60
    ? __wake_up_common_lock+0xd5/0x130
    ? __wake_up_common+0x630/0x630
    lock_acquire+0x13f/0x330
    ? try_to_wake_up+0xa3/0x19e0
    _raw_spin_lock_irqsave+0x38/0x50
    ? try_to_wake_up+0xa3/0x19e0
    try_to_wake_up+0xa3/0x19e0
    ? cifs_compound_callback+0x178/0x210
    ? set_cpus_allowed_ptr+0x10/0x10
    cifs_reconnect+0xa1c/0x15d0
    ? generic_ip_connect+0x1860/0x1860
    ? rwlock_bug.part.0+0x90/0x90
    cifs_readv_from_socket+0x479/0x690
    cifs_read_from_socket+0x9d/0xe0
    ? cifs_readv_from_socket+0x690/0x690
    ? mempool_resize+0x690/0x690
    ? rwlock_bug.part.0+0x90/0x90
    ? memset+0x1f/0x40
    ? allocate_buffers+0xff/0x340
    cifs_demultiplex_thread+0x388/0x2a50
    ? cifs_handle_standard+0x610/0x610
    ? rcu_read_lock_held_common+0x120/0x120
    ? mark_lock+0x11b/0xc00
    ? __lock_acquire+0x14ed/0x3270
    ? __kthread_parkme+0x78/0x100
    ? lockdep_hardirqs_on+0x3e8/0x560
    ? lock_downgrade+0x6a0/0x6a0
    ? lockdep_hardirqs_on+0x3e8/0x560
    ? _raw_spin_unlock_irqrestore+0x39/0x60
    ? cifs_handle_standard+0x610/0x610
    kthread+0x2bb/0x3a0
    ? kthread_create_worker_on_cpu+0xc0/0xc0
    ret_from_fork+0x3a/0x50

    Allocated by task 649:
    save_stack+0x19/0x70
    __kasan_kmalloc.constprop.5+0xa6/0xf0
    kmem_cache_alloc+0x107/0x320
    copy_process+0x17bc/0x5370
    _do_fork+0x103/0xbf0
    __x64_sys_clone+0x168/0x1e0
    do_syscall_64+0x9b/0xec0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 0:
    save_stack+0x19/0x70
    __kasan_slab_free+0x11d/0x160
    kmem_cache_free+0xb5/0x3d0
    rcu_core+0x52f/0x1230
    __do_softirq+0x24d/0x962

    The buggy address belongs to the object at ffff8880103e32c0
    which belongs to the cache task_struct of size 6016
    The buggy address is located 1960 bytes inside of
    6016-byte region [ffff8880103e32c0, ffff8880103e4a40)
    The buggy address belongs to the page:
    page:ffffea000040f800 refcount:1 mapcount:0 mapping:ffff8880108da5c0
    index:0xffff8880103e4c00 compound_mapcount: 0
    raw: 4000000000010200 ffffea00001f2208 ffffea00001e3408 ffff8880108da5c0
    raw: ffff8880103e4c00 0000000000050003 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8880103e3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8880103e3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    >ffff8880103e3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff8880103e3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8880103e3b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ==================================================================

    This can be reliably reproduced by adding the below delay to
    cifs_reconnect(), running find(1) on the mount, restarting the samba
    server while find is running, and killing find during the delay:

             spin_unlock(&GlobalMid_Lock);
             mutex_unlock(&server->srv_mutex);

    +        msleep(10000);
    +
             cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
             list_for_each_safe(tmp, tmp2, &retry_list) {
                     mid_entry = list_entry(tmp, struct mid_q_entry, qhead);

    Fix this by holding a reference to the task struct until the MID is
    freed.
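
The lifetime rule in the fix can be illustrated with a plain reference count. This is a userspace sketch only: `task_sketch`, `mid_sketch` and the helper functions are hypothetical stand-ins, not the kernel's task or MID structures.

```c
#include <stdlib.h>

struct task_sketch {
    int refcount;
};

struct mid_sketch {
    struct task_sketch *creator;   /* pinned for the lifetime of the MID */
};

static int tasks_freed;

struct task_sketch *task_new(void)
{
    struct task_sketch *t = malloc(sizeof(*t));

    if (t)
        t->refcount = 1;           /* the task's own reference */
    return t;
}

void task_put(struct task_sketch *t)
{
    if (--t->refcount == 0) {
        free(t);
        tasks_freed++;
    }
}

struct mid_sketch *mid_new(struct task_sketch *creator)
{
    struct mid_sketch *m = malloc(sizeof(*m));

    if (m) {
        creator->refcount++;       /* take a reference when the MID is created */
        m->creator = creator;
    }
    return m;
}

void mid_free(struct mid_sketch *m)
{
    task_put(m->creator);          /* drop it only when the MID is freed */
    free(m);
}

int freed_tasks(void)
{
    return tasks_freed;
}
```

With this shape, the creating task can exit while a callback still dereferences `creator`, because the memory is released only by the final `task_put()` in `mid_free()`.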

    Signed-off-by: Vincent Whitchurch
    Signed-off-by: Steve French
    CC: Stable
    Reviewed-by: Paulo Alcantara (SUSE)
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Greg Kroah-Hartman

    Vincent Whitchurch
     
  • commit a37f4958f7b63d2b3cd17a76151fdfc29ce1da5f upstream.

    When lockdown is enabled, debugfs_is_locked_down returns 1. It will then
    trigger the following:

    WARNING: CPU: 48 PID: 3747
    CPU: 48 PID: 3743 Comm: bash Not tainted 5.4.0-1946.x86_64 #1
    Hardware name: Oracle Corporation ORACLE SERVER X7-2/ASM, MB, X7-2, BIOS 41060400 05/20/2019
    RIP: 0010:do_dentry_open+0x343/0x3a0
    Code: 00 40 08 00 45 31 ff 48 c7 43 28 40 5b e7 89 e9 02 ff ff ff 48 8b 53 28 4c 8b 72 70 4d 85 f6 0f 84 10 fe ff ff e9 f5 fd ff ff 0b 41 bf ea ff ff ff e9 3b ff ff ff 41 bf e6 ff ff ff e9 b4 fe
    RSP: 0018:ffffb8740dde7ca0 EFLAGS: 00010202
    RAX: ffffffff89e88a40 RBX: ffff928c8e6b6f00 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffff928dbfd97778 RDI: ffff9285cff685c0
    RBP: ffffb8740dde7cc8 R08: 0000000000000821 R09: 0000000000000030
    R10: 0000000000000057 R11: ffffb8740dde7a98 R12: ffff926ec781c900
    R13: ffff928c8e6b6f10 R14: ffffffff8936e190 R15: 0000000000000001
    FS: 00007f45f6777740(0000) GS:ffff928dbfd80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fff95e0d5d8 CR3: 0000001ece562006 CR4: 00000000007606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
    vfs_open+0x2d/0x30
    path_openat+0x2d4/0x1680
    ? tty_mode_ioctl+0x298/0x4c0
    do_filp_open+0x93/0x100
    ? strncpy_from_user+0x57/0x1b0
    ? __alloc_fd+0x46/0x150
    do_sys_open+0x182/0x230
    __x64_sys_openat+0x20/0x30
    do_syscall_64+0x60/0x1b0
    entry_SYSCALL_64_after_hwframe+0x170/0x1d5
    RIP: 0033:0x7f45f5e5ce02
    Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 25 59 2d 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
    RSP: 002b:00007fff95e0d2e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
    RAX: ffffffffffffffda RBX: 0000561178c069b0 RCX: 00007f45f5e5ce02
    RDX: 0000000000000241 RSI: 0000561178c08800 RDI: 00000000ffffff9c
    RBP: 00007fff95e0d3e0 R08: 0000000000000020 R09: 0000000000000005
    R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000000
    R13: 0000000000000003 R14: 0000000000000001 R15: 0000561178c08800

    Change the return type to int and return -EPERM when lockdown is enabled
    to remove the warning above. Also rename debugfs_is_locked_down to
    debugfs_locked_down to make it sound less like it returns a boolean.
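
A minimal userspace sketch of the return-value change, assuming an open path expects either 0 or a negative errno. All names here are hypothetical and only mimic the shape of the old and new helpers.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's lockdown state. */
static bool locked_down_flag;

void set_locked_down(bool on)
{
    locked_down_flag = on;
}

/* Old shape: boolean-style result, 1 means "locked down". */
int is_locked_down_old(void)
{
    return locked_down_flag ? 1 : 0;
}

/* New shape: 0 when access is allowed, -EPERM otherwise. */
int locked_down_new(void)
{
    return locked_down_flag ? -EPERM : 0;
}

/* An open handler is expected to return 0 or a negative errno. */
bool is_valid_open_result(int ret)
{
    return ret <= 0;
}
```

Passing the old helper's result straight through as the open result yields a positive value, which is the kind of return the warning above complains about.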

    Fixes: 5496197f9b08 ("debugfs: Restrict debugfs when the kernel is locked down")
    Signed-off-by: Eric Snowberg
    Reviewed-by: Matthew Wilcox (Oracle)
    Cc: stable
    Acked-by: James Morris
    Link: https://lore.kernel.org/r/20191207161603.35907-1-eric.snowberg@oracle.com
    Signed-off-by: Greg Kroah-Hartman

    Eric Snowberg
     

29 Jan, 2020

6 commits

  • commit 2c6b7bcd747201441923a0d3062577a8d1fbd8f8 upstream.

    Commit 8a23eb804ca4 ("Make filldir[64]() verify the directory entry
    filename is valid") added some minimal validity checks on the directory
    entries passed to filldir[64](). But they really were pretty minimal.

    This fleshes out at least the name length check: we used to disallow
    zero-length names, but really, negative lengths or over-long names
    aren't ok either. Both could happen if there is some filesystem
    corruption going on.

    Now, most filesystems tend to use just an "unsigned char" or similar for
    the length of a directory entry name, so even with a corrupt filesystem
    you should never see anything odd like that. But since we then use the
    name length to create the directory entry record length, let's make sure
    it actually is half-way sensible.

    Note how POSIX states that the size of a path component is limited by
    NAME_MAX, but we actually use PATH_MAX for the check here. That's
    because while NAME_MAX is generally the correct maximum name length
    (it's 255, for the same old "name length is usually just a byte on
    disk"), there's nothing in the VFS layer that really cares.

    So the real limitation at a VFS layer is the total pathname length you
    can pass as a filename: PATH_MAX.
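
The length check described above could look roughly like this in userspace. The function name, the -EIO return value, the exact comparison operators and the locally defined limit are assumptions made for illustration.

```c
#include <errno.h>

/* Assumption: 4096 mirrors the usual Linux PATH_MAX; defined locally so
 * the sketch does not depend on platform headers. */
#define SKETCH_PATH_MAX 4096

/* Reject zero, negative and over-long name lengths. */
int check_name_len(int len)
{
    if (len <= 0 || len >= SKETCH_PATH_MAX)
        return -EIO;
    return 0;
}
```

A 255-byte name (NAME_MAX) passes, as does anything up to the path limit, while lengths that could only come from corruption are refused.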

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit d0cb50185ae942b03c4327be322055d622dc79f6 upstream.

    may_create_in_sticky() call is done when we already have dropped the
    reference to dir.

    Fixes: 30aba6656f61e (namei: allow restricted O_CREAT of FIFOs and regular files)
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 3c2659bd1db81ed6a264a9fc6262d51667d655ad upstream.

    In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
    unsafe_put_user()") I changed filldir to not do individual __put_user()
    accesses, but instead use unsafe_put_user() surrounded by the proper
    user_access_begin/end() pair.

    That makes them enormously faster on modern x86, where the STAC/CLAC
    games make individual user accesses fairly heavy-weight.

    However, the user_access_begin() range was not really the exact right
    one, since filldir() has the unfortunate problem that it needs to not
    only fill out the new directory entry, it also needs to fix up the
    previous one to contain the proper file offset.

    It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
    file offset of the directory entry itself - it's the offset of the next
    one. So we end up backfilling the offset in the previous entry as we
    walk along.

    But since x86 didn't really care about the exact range, and used to be
    the only architecture that did anything fancy in user_access_begin() to
    begin with, the filldir[64]() changes did something lazy, and even
    commented on it:

    /*
     * Note! This range-checks 'previous' (which may be NULL).
     * The real range was checked in getdents
     */
    if (!user_access_begin(dirent, sizeof(*dirent)))
            goto efault;

    and it all worked fine.

    But now 32-bit ppc is starting to also implement user_access_begin(),
    and the fact that we faked the range to only be the (possibly not even
    valid) previous directory entry becomes a problem, because ppc32 will
    actually be using the range that is passed in for more than just "check
    that it's user space".

    This is a complete rewrite of Christophe's original patch.

    By saving off the record length of the previous entry instead of a
    pointer to it in the filldir data structures, we can simplify the range
    check and the writing of the previous entry d_off field. No need for
    any conditionals in the user accesses themselves, although we retain the
    conditional EINTR checking for the "was this the first directory entry"
    signal handling latency logic.
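
The idea of tracking the previous record's length instead of a pointer can be sketched in userspace as follows. The fixed-size `rec_sketch` layout, the context structure and the helper names are hypothetical, and the sketch writes into an ordinary buffer rather than user memory.

```c
#include <stddef.h>
#include <string.h>

struct rec_sketch {              /* hypothetical fixed-size record */
    long d_off;                  /* offset of the *next* entry */
    unsigned short d_reclen;
    char d_name[22];
};

struct fill_ctx {
    unsigned char *buf;
    size_t pos;                  /* byte offset of the next record */
    size_t prev_reclen;          /* 0 until the first record is written */
    size_t size;
};

int fill_one(struct fill_ctx *ctx, const char *name, long offset)
{
    struct rec_sketch r;
    size_t prev_pos;

    if (ctx->size - ctx->pos < sizeof(r))
        return -1;

    /*
     * Backfill d_off of the previous record. With prev_reclen == 0 this
     * targets the current slot, which is overwritten just below, so no
     * "is this the first entry" conditional is needed.
     */
    prev_pos = ctx->pos - ctx->prev_reclen;
    memcpy(ctx->buf + prev_pos + offsetof(struct rec_sketch, d_off),
           &offset, sizeof(offset));

    memset(&r, 0, sizeof(r));
    r.d_reclen = (unsigned short)sizeof(r);
    strncpy(r.d_name, name, sizeof(r.d_name) - 1);
    memcpy(ctx->buf + ctx->pos, &r, sizeof(r));

    ctx->pos += sizeof(r);
    ctx->prev_reclen = sizeof(r);
    return 0;
}

/* Read back d_off of the record at `index`. */
long rec_d_off(const unsigned char *buf, size_t index)
{
    long off;

    memcpy(&off, buf + index * sizeof(struct rec_sketch) +
           offsetof(struct rec_sketch, d_off), sizeof(off));
    return off;
}
```

The range touched by one call is always the contiguous span from the previous record's start to the end of the new one, which is what makes a single range check sufficient.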

    Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
    Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
    Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
    Reported-and-tested-by: Christophe Leroy
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit 9c1c2b35f1d94de8325344c2777d7ee67492db3b upstream.

    Currently, we just assume that it will stick around by virtue of the
    submitter's reference, but later patches will allow the syscall to
    return early and we can't rely on that reference at that point.

    While I'm not aware of any reports of it, Xiubo pointed out that this
    may fix a use-after-free. If the wait for a reply times out or is
    canceled via signal, and then the reply comes in after the syscall
    returns, the client can end up trying to access r_parent without a
    reference.

    Take an extra reference to the inode when setting r_parent and release
    it when releasing the request.

    Cc: stable@vger.kernel.org
    Signed-off-by: Jeff Layton
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit a45ea48e2bcd92c1f678b794f488ca0bda9835b8 upstream.

    The afs filesystem needs to prohibit certain characters from cell names,
    such as '/', as these are used to form filenames in procfs, leading to
    the following warning being generated:

    WARNING: CPU: 0 PID: 3489 at fs/proc/generic.c:178

    Fix afs_alloc_cell() to disallow nonprintable characters, '/', '@' and
    names that begin with a dot.

    Remove the check for "@cell" as that is then redundant.

    This can be tested by running:

    echo add foo/.bar 1.2.3.4 >/proc/fs/afs/cells
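
The naming rules listed above can be sketched as a standalone check. The function name and the -EINVAL return value are assumptions, and the empty-name rejection is added only to keep the sketch safe.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>

/* Reject names that start with a dot or contain '/', '@' or
 * nonprintable characters. */
int check_cell_name(const char *name, size_t len)
{
    size_t i;

    if (len == 0 || name[0] == '.')
        return -EINVAL;

    for (i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)name[i];

        if (!isprint(ch) || ch == '/' || ch == '@')
            return -EINVAL;
    }
    return 0;
}
```

Under these rules the test name `foo/.bar` is refused because of the embedded '/'.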

    Note that we will also need to deal with:

    - Names ending in ".invalid" shouldn't be passed to the DNS.

    - Names that contain non-valid domainname chars shouldn't be passed to
    the DNS.

    - DNS replies that say "your-dns-needs-immediate-attention." and
    replies containing A records that say 127.0.53.53 should be
    considered invalid.
    [https://www.icann.org/en/system/files/files/name-collision-mitigation-01aug14-en.pdf]

    but these need to be dealt with by the kafs-client DNS program rather
    than the kernel.

    Reported-by: syzbot+b904ba7c947a37b4b291@syzkaller.appspotmail.com
    Cc: stable@kernel.org
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit 73e08e711d9c1d79fae01daed4b0e1fee5f8a275 upstream.

    This ends up being too restrictive for tasks that willingly fork and
    share the ring between forks. Andres reports that this breaks his
    postgresql work. Since we're close to 5.5 release, revert this change
    for now.

    Cc: stable@vger.kernel.org
    Fixes: 44d282796f81 ("io_uring: only allow submit from owning task")
    Reported-by: Andres Freund
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     

26 Jan, 2020

3 commits

  • [ Upstream commit 51590df4f3306cb1f43dca54e3ccdd121ab89594 ]

    Fixes gcc '-Wunused-but-set-variable' warning:

    fs/afs/dir_edit.c: In function afs_set_contig_bits:
    fs/afs/dir_edit.c:75:20: warning: variable after set but not used [-Wunused-but-set-variable]
    fs/afs/dir_edit.c: In function afs_set_contig_bits:
    fs/afs/dir_edit.c:75:12: warning: variable before set but not used [-Wunused-but-set-variable]
    fs/afs/dir_edit.c: In function afs_clear_contig_bits:
    fs/afs/dir_edit.c:100:20: warning: variable after set but not used [-Wunused-but-set-variable]
    fs/afs/dir_edit.c: In function afs_clear_contig_bits:
    fs/afs/dir_edit.c:100:12: warning: variable before set but not used [-Wunused-but-set-variable]

    They are never used since commit 63a4681ff39c.

    Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...")
    Reported-by: Hulk Robot
    Signed-off-by: zhengbin
    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin

    zhengbin
     
  • commit 38a2204f5298620e8a1c3b1dc7b831425106dbc0 upstream.

    The legacy client tracking infrastructure of nfsd makes use of MD5 to
    derive a client's recovery directory name. As the nfsd module doesn't
    declare any dependency on CRYPTO_MD5, though, it may fail to allocate
    the hash if the kernel was compiled without it. As a result, generation
    of client recovery directories will fail with the following error:

    NFSD: unable to generate recoverydir name

    The explicit dependency on CRYPTO_MD5 was removed as redundant back in
    6aaa67b5f3b9 (NFSD: Remove redundant "select" clauses in fs/Kconfig
    2008-02-11) as it was already implicitly selected via RPCSEC_GSS_KRB5.
    This broke when RPCSEC_GSS_KRB5 was made optional for NFSv4 in commit
    df486a25900f (NFS: Fix the selection of security flavours in Kconfig) at
    a later point.

    Fix the issue by adding back an explicit dependency on CRYPTO_MD5.

    Fixes: df486a25900f (NFS: Fix the selection of security flavours in Kconfig)
    Signed-off-by: Patrick Steinhardt
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Patrick Steinhardt
     
  • commit 3dd4d40b420846dd35869ccc8f8627feef2cff32 upstream.

    Flags passed to Q_XQUOTARM were not sanity checked for invalid values.
    Fix that.
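
A flag sanity check of this kind usually masks off the known bits and rejects anything left over. In the sketch below the flag values, the macro names and the -EINVAL return value are illustrative assumptions, not the kernel's definitions.

```c
#include <errno.h>

/* Illustrative flag bits; not the kernel's values. */
#define SK_USER_QUOTA   (1u << 0)
#define SK_GROUP_QUOTA  (1u << 1)
#define SK_PROJ_QUOTA   (1u << 2)
#define SK_VALID_FLAGS  (SK_USER_QUOTA | SK_GROUP_QUOTA | SK_PROJ_QUOTA)

/* Refuse any request that carries bits outside the known set. */
int check_quotarm_flags(unsigned int flags)
{
    if (flags & ~SK_VALID_FLAGS)
        return -EINVAL;
    return 0;
}
```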

    Fixes: 9da93f9b7cdf ("xfs: fix Q_XQUOTARM ioctl")
    Reported-by: Yang Xu
    Signed-off-by: Jan Kara
    Reviewed-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     

23 Jan, 2020

10 commits

  • commit 394440d469413fa9b74f88a11f144d76017221f2 upstream.

    Commit 60e4cf67a58 (reiserfs: fix extended attributes on the root
    directory) introduced a regression: open_xa_root started returning
    -EOPNOTSUPP, but it was not handled properly in reiserfs_for_each_xattr.

    When the reiserfs module is built without CONFIG_REISERFS_FS_XATTR,
    deleting an inode would result in a warning and chowning an inode
    would also result in a warning and then fail to complete.

    With CONFIG_REISERFS_FS_XATTR enabled, the xattr root would always be
    present for read-write operations.

    This commit handles -EOPNOTSUPP in the same way -ENODATA is handled.
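
Treating both error codes alike might look like the following userspace sketch. The function name is hypothetical; it only models how the walk's result would be folded to success when there are no xattrs to process.

```c
#include <errno.h>

/* Fold "no xattrs on this object" and "xattrs not available" into
 * success; pass every other error through unchanged. */
int xattr_walk_result(int err)
{
    if (err == -ENODATA || err == -EOPNOTSUPP)
        return 0;
    return err;
}
```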

    Fixes: 60e4cf67a582 ("reiserfs: fix extended attributes on the root directory")
    CC: stable@vger.kernel.org # Commit 60e4cf67a58 was picked up by stable
    Link: https://lore.kernel.org/r/20200115180059.6935-1-jeffm@suse.com
    Reported-by: Michael Brunnbauer
    Signed-off-by: Jeff Mahoney
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jeff Mahoney
     
  • commit 5afe6ce748c1ea99e0d648153c05075e1ab93afb upstream.

    If scrub returns an error, we do not copy the scrub arguments
    structure back to user space. This prevents user space from knowing how
    much progress scrub made before the error occurred; this includes
    -ECANCELED, which is returned when users ask for scrub to stop. A
    particular use case, found in btrfs-progs, is resuming scrub after it
    is canceled: it relies on reading the progress from the scrub
    arguments structure and then passing that progress in a call to resume
    scrub.

    So fix this by always copying the scrub arguments structure to user
    space, overwriting the value returned to user space with -EFAULT only if
    copying the structure failed. That lets user space know that either the
    copy did not happen, and therefore the structure is stale, or it
    happened only partially and the structure is probably invalid and
    corrupt.
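
The control flow of the fix can be sketched in userspace as below. `scrub_args_sketch`, `copy_out()` and the `copy_fails` switch are hypothetical stand-ins for the real arguments structure and the copy to user space.

```c
#include <errno.h>
#include <stdbool.h>

struct scrub_args_sketch {
    unsigned long long progress;
};

/* Hypothetical copy-to-user stand-in: returns nonzero on failure. */
int copy_out(struct scrub_args_sketch *dst,
             const struct scrub_args_sketch *src, bool fail)
{
    if (fail)
        return 1;
    *dst = *src;
    return 0;
}

int scrub_ioctl_sketch(int scrub_ret, unsigned long long progress,
                       struct scrub_args_sketch *user, bool copy_fails)
{
    struct scrub_args_sketch kargs = { progress };
    int ret = scrub_ret;

    /*
     * Always copy back, so progress is visible even after an error such
     * as -ECANCELED; only a failed copy replaces the scrub result.
     */
    if (copy_out(user, &kargs, copy_fails))
        ret = -EFAULT;

    return ret;
}
```

A canceled scrub therefore still reports its progress, and -EFAULT appears only when the copy itself fails.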

    Reported-by: Graham Cobb
    Link: https://lore.kernel.org/linux-btrfs/d0a97688-78be-08de-ca7d-bcb4c7fb397e@cobb.uk.net/
    Fixes: 06fe39ab15a6a4 ("Btrfs: do not overwrite scrub error with fault error in scrub ioctl")
    CC: stable@vger.kernel.org # 5.1+
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Qu Wenruo
    Tested-by: Graham Cobb
    Signed-off-by: Filipe Manana
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit b35cf1f0bf1f2b0b193093338414b9bd63b29015 upstream.

    The fstest btrfs/154 reports

    [ 8675.381709] BTRFS: Transaction aborted (error -28)
    [ 8675.383302] WARNING: CPU: 1 PID: 31900 at fs/btrfs/block-group.c:2038 btrfs_create_pending_block_groups+0x1e0/0x1f0 [btrfs]
    [ 8675.390925] CPU: 1 PID: 31900 Comm: btrfs Not tainted 5.5.0-rc6-default+ #935
    [ 8675.392780] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
    [ 8675.395452] RIP: 0010:btrfs_create_pending_block_groups+0x1e0/0x1f0 [btrfs]
    [ 8675.402672] RSP: 0018:ffffb2090888fb00 EFLAGS: 00010286
    [ 8675.404413] RAX: 0000000000000000 RBX: ffff92026dfa91c8 RCX: 0000000000000001
    [ 8675.406609] RDX: 0000000000000000 RSI: ffffffff8e100899 RDI: ffffffff8e100971
    [ 8675.408775] RBP: ffff920247c61660 R08: 0000000000000000 R09: 0000000000000000
    [ 8675.410978] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffe4
    [ 8675.412647] R13: ffff92026db74000 R14: ffff920247c616b8 R15: ffff92026dfbc000
    [ 8675.413994] FS: 00007fd5e57248c0(0000) GS:ffff92027d800000(0000) knlGS:0000000000000000
    [ 8675.416146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 8675.417833] CR2: 0000564aa51682d8 CR3: 000000006dcbc004 CR4: 0000000000160ee0
    [ 8675.419801] Call Trace:
    [ 8675.420742] btrfs_start_dirty_block_groups+0x355/0x480 [btrfs]
    [ 8675.422600] btrfs_commit_transaction+0xc8/0xaf0 [btrfs]
    [ 8675.424335] reset_balance_state+0x14a/0x190 [btrfs]
    [ 8675.425824] btrfs_balance.cold+0xe7/0x154 [btrfs]
    [ 8675.427313] ? kmem_cache_alloc_trace+0x235/0x2c0
    [ 8675.428663] btrfs_ioctl_balance+0x298/0x350 [btrfs]
    [ 8675.430285] btrfs_ioctl+0x466/0x2550 [btrfs]
    [ 8675.431788] ? mem_cgroup_charge_statistics+0x51/0xf0
    [ 8675.433487] ? mem_cgroup_commit_charge+0x56/0x400
    [ 8675.435122] ? do_raw_spin_unlock+0x4b/0xc0
    [ 8675.436618] ? _raw_spin_unlock+0x1f/0x30
    [ 8675.438093] ? __handle_mm_fault+0x499/0x740
    [ 8675.439619] ? do_vfs_ioctl+0x56e/0x770
    [ 8675.441034] do_vfs_ioctl+0x56e/0x770
    [ 8675.442411] ksys_ioctl+0x3a/0x70
    [ 8675.443718] ? trace_hardirqs_off_thunk+0x1a/0x1c
    [ 8675.445333] __x64_sys_ioctl+0x16/0x20
    [ 8675.446705] do_syscall_64+0x50/0x210
    [ 8675.448059] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 8675.479187] BTRFS: error (device vdb) in btrfs_create_pending_block_groups:2038: errno=-28 No space left

    We now use btrfs_can_overcommit() to see if we can flip a block group
    read only. Before this would fail because we weren't taking into
    account the usable un-allocated space for allocating chunks. With my
    patches we were allowed to do the balance, which is technically correct.

    The test is trying to start balance on degraded mount. So now we're
    trying to allocate a chunk and cannot because we want to allocate a
    RAID1 chunk, but there's only 1 device that's available for usage. This
    results in an ENOSPC.

    But we shouldn't even be making it this far: we don't have enough
    devices to restripe. The problem is that we're using
    btrfs_num_devices(), which also includes missing devices. That's not
    actually what we want; we need to use rw_devices.
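
The difference between the two counts can be shown with a small userspace sketch. The structure, the function names and the -EINVAL return value are hypothetical; the example assumes a RAID1 target that needs at least two writable devices.

```c
#include <errno.h>

struct fs_devices_sketch {
    unsigned int num_devices;    /* includes missing devices */
    unsigned int rw_devices;     /* devices that can actually be written */
};

/* Old behaviour: missing devices are counted as usable. */
int check_by_num_devices(const struct fs_devices_sketch *d,
                         unsigned int min_devs)
{
    return d->num_devices < min_devs ? -EINVAL : 0;
}

/* New behaviour: only writable devices count. */
int check_by_rw_devices(const struct fs_devices_sketch *d,
                        unsigned int min_devs)
{
    return d->rw_devices < min_devs ? -EINVAL : 0;
}
```

On a degraded two-device filesystem with one device missing, the first check lets the balance proceed and the second refuses it up front.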

    The chunk_mutex is not needed here: rw_devices changes only in device
    add, remove or replace, all of which are excluded by the EXCL_OP
    mechanism.

    Fixes: e4d8ec0f65b9 ("Btrfs: implement online profile changing")
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    [ add stacktrace, update changelog, drop chunk_mutex ]
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 26ef8493e1ab771cb01d27defca2fa1315dc3980 upstream.

    When running xfstests on the current btrfs I get the following splat from
    kmemleak:

    unreferenced object 0xffff88821b2404e0 (size 32):
    comm "kworker/u4:7", pid 26663, jiffies 4295283698 (age 8.776s)
    hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 10 ff fd 26 82 88 ff ff ...........&....
    10 ff fd 26 82 88 ff ff 20 ff fd 26 82 88 ff ff ...&.... ..&....
    backtrace:
    [] ulist_alloc+0x25/0x60 [btrfs]
    [] btrfs_find_all_roots_safe+0x41/0x100 [btrfs]
    [] btrfs_find_all_roots+0x52/0x70 [btrfs]
    [] btrfs_qgroup_rescan_worker+0x343/0x680 [btrfs]
    [] btrfs_work_helper+0xac/0x1e0 [btrfs]
    [] process_one_work+0x1cf/0x350
    [] worker_thread+0x28/0x3c0
    [] kthread+0x109/0x120
    [] ret_from_fork+0x35/0x40

    This corresponds to:

    (gdb) l *(btrfs_find_all_roots_safe+0x41)
    0x8d7e1 is in btrfs_find_all_roots_safe (fs/btrfs/backref.c:1413).
    1408
    1409 tmp = ulist_alloc(GFP_NOFS);
    1410 if (!tmp)
    1411 return -ENOMEM;
    1412 *roots = ulist_alloc(GFP_NOFS);
    1413 if (!*roots) {
    1414 ulist_free(tmp);
    1415 return -ENOMEM;
    1416 }
    1417

    Following the lifetime of the allocated 'roots' ulist, it gets freed
    again in btrfs_qgroup_account_extent().

    But this does not happen if the function is called with the
    'BTRFS_FS_QUOTA_ENABLED' flag cleared: btrfs_qgroup_account_extent()
    then takes an early exit and returns directly.

    Instead of directly returning we should jump to the 'out_free' in order to
    free all resources as expected.
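
The early-return leak and its fix follow a common cleanup pattern, sketched here in userspace. The allocation counter, the helper names and the two list parameters are hypothetical; only the jump to an `out_free` label mirrors the description above.

```c
#include <stdbool.h>
#include <stdlib.h>

static int live_allocs;

void *tracked_alloc(size_t n)
{
    void *p = malloc(n);

    if (p)
        live_allocs++;
    return p;
}

void tracked_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

int live_allocations(void)
{
    return live_allocs;
}

/* Takes ownership of both lists, whether or not accounting runs. */
int account_sketch(bool quota_enabled, void *old_roots, void *new_roots)
{
    int ret = 0;

    if (!quota_enabled)
        goto out_free;           /* a bare "return 0" here would leak */

    /* ... accounting work would go here ... */

out_free:
    tracked_free(old_roots);
    tracked_free(new_roots);
    return ret;
}
```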

    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Qu Wenruo
    Signed-off-by: Johannes Thumshirn
    [ add comment ]
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Johannes Thumshirn
     
  • commit 6282675e6708ec78518cc0e9ad1f1f73d7c5c53d upstream.

    [BUG]
    There are several different KASAN reports for balance + snapshot
    workloads. Involved call paths include:

    should_ignore_root+0x54/0xb0 [btrfs]
    build_backref_tree+0x11af/0x2280 [btrfs]
    relocate_tree_blocks+0x391/0xb80 [btrfs]
    relocate_block_group+0x3e5/0xa00 [btrfs]
    btrfs_relocate_block_group+0x240/0x4d0 [btrfs]
    btrfs_relocate_chunk+0x53/0xf0 [btrfs]
    btrfs_balance+0xc91/0x1840 [btrfs]
    btrfs_ioctl_balance+0x416/0x4e0 [btrfs]
    btrfs_ioctl+0x8af/0x3e60 [btrfs]
    do_vfs_ioctl+0x831/0xb10

    create_reloc_root+0x9f/0x460 [btrfs]
    btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs]
    create_pending_snapshot+0xa9b/0x15f0 [btrfs]
    create_pending_snapshots+0x111/0x140 [btrfs]
    btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
    btrfs_mksubvol+0x915/0x960 [btrfs]
    btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
    btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
    btrfs_ioctl+0x241b/0x3e60 [btrfs]
    do_vfs_ioctl+0x831/0xb10

    btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs]
    create_pending_snapshot+0x209/0x15f0 [btrfs]
    create_pending_snapshots+0x111/0x140 [btrfs]
    btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
    btrfs_mksubvol+0x915/0x960 [btrfs]
    btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
    btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
    btrfs_ioctl+0x241b/0x3e60 [btrfs]
    do_vfs_ioctl+0x831/0xb10

    [CAUSE]
    All these call sites rely only on root->reloc_root, which can
    undergo btrfs_drop_snapshot(). Since we don't have real refcount
    based protection for reloc roots, we can reach an already dropped
    reloc root, triggering KASAN.

    [FIX]
    To avoid such access to unstable root->reloc_root, we should check
    BTRFS_ROOT_DEAD_RELOC_TREE bit first.

    This patch introduces wrappers that provide the correct way to check the
    bit with memory barriers protection.

    Most callers don't distinguish merged reloc tree and no reloc tree. The
    only exception is should_ignore_root(), as merged reloc tree can be
    ignored, while no reloc tree shouldn't.

    [CRITICAL SECTION ANALYSIS]
    Although test_bit()/set_bit()/clear_bit() don't imply a barrier, the
    DEAD_RELOC_TREE bit has extra help from the transaction as a higher level
    barrier. The lifespans of root::reloc_root and the DEAD_RELOC_TREE bit are:

    NULL: reloc_root is NULL         PTR: reloc_root is not NULL
    0: DEAD_RELOC_TREE bit not set   DEAD: DEAD_RELOC_TREE bit set

    (NULL, 0)    Initial state                   __
      |                                          /\ Section A
    btrfs_init_reloc_root()                      \/
      |                                          __
    (PTR, 0)     reloc_root initialized          /\
      |                                          |
    btrfs_update_reloc_root()                    |  Section B
      |                                          |
    (PTR, DEAD)  reloc_root has been merged      \/
      |                                          __
    === btrfs_commit_transaction() ====================
      |                                          /\
    clean_dirty_subvols()                        |
      |                                          |  Section C
    (NULL, DEAD) reloc_root cleanup starts       \/
      |                                          __
    btrfs_drop_snapshot()                        /\
      |                                          |  Section D
    (NULL, 0)    Back to initial state           \/

    Every have_reloc_root() or test_bit(DEAD_RELOC_TREE) caller holds a
    transaction handle, so no such caller can cross a transaction boundary.

    In Section A, every caller finds no DEAD bit and grabs reloc_root.

    In the cross section A-B, a caller may see no DEAD bit, but since
    reloc_root is still completely valid, accessing reloc_root is completely
    safe.

    No test_bit() caller can cross the boundary of Section B and Section C.

    In Section C, every caller finds the DEAD bit, so no one will access
    reloc_root.

    In the cross section C-D, either the caller sees the DEAD bit set and
    avoids accessing reloc_root whether or not it is safe, or the caller sees
    the DEAD bit cleared and then accesses reloc_root, which is already NULL,
    so nothing goes wrong.

    The memory write barriers are between the reloc_root updates and bit
    set/clear, the pairing read side is before test_bit.
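
    The ordering rule can be modelled in userspace with C11 atomics. This is
    only a sketch under assumptions: release/acquire operations stand in for
    the kernel's bit operations and explicit barriers, the transaction
    boundary that provides the higher level protection is not modelled, and
    every *_sketch name is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct reloc_root_sketch { int unused; };

struct root_sketch {
	_Atomic(struct reloc_root_sketch *) reloc_root;
	atomic_bool dead_reloc_tree;
};

/* Initial state: (NULL, 0) */
static void root_init_sketch(struct root_sketch *root)
{
	atomic_init(&root->reloc_root, NULL);
	atomic_init(&root->dead_reloc_tree, false);
}

/* (NULL, 0) -> (PTR, 0) */
static void init_reloc_root_sketch(struct root_sketch *root,
				   struct reloc_root_sketch *rr)
{
	assert(rr != NULL);
	atomic_store_explicit(&root->reloc_root, rr, memory_order_release);
}

/* (PTR, 0) -> (PTR, DEAD): the bit is published after all earlier writes */
static void mark_merged_sketch(struct root_sketch *root)
{
	atomic_store_explicit(&root->dead_reloc_tree, true,
			      memory_order_release);
}

/* (PTR, DEAD) -> (NULL, DEAD) -> (NULL, 0): pointer first, then the bit */
static void drop_reloc_root_sketch(struct root_sketch *root)
{
	atomic_store_explicit(&root->reloc_root, NULL, memory_order_relaxed);
	atomic_store_explicit(&root->dead_reloc_tree, false,
			      memory_order_release);
}

/* Reader side: test the bit first, and only then look at the pointer */
static bool have_reloc_root_sketch(struct root_sketch *root)
{
	if (atomic_load_explicit(&root->dead_reloc_tree, memory_order_acquire))
		return false;
	return atomic_load_explicit(&root->reloc_root,
				    memory_order_relaxed) != NULL;
}
```

    Walking one root through the four states, the reader reports a usable
    reloc root only in the (PTR, 0) state.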

    Reported-by: Zygo Blaxell
    Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
    CC: stable@vger.kernel.org # 5.4+
    Reviewed-by: Josef Bacik
    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    [ barriers ]
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Qu Wenruo
     
  • commit 423a716cd7be16fb08690760691befe3be97d3fc upstream.

    btrfs_del_root_ref() will simply WARN_ON() if the ref doesn't match in
    any way, and then continue to delete the reference. This shouldn't
    happen: we have these values because there's more to the reference than
    the original root and the sub root. If any of these checks fail, return
    -ENOENT.
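
    A minimal sketch of the "fail instead of warn and continue" behaviour,
    written as plain userspace C. The structure and its fields (dirid,
    sequence, name, present) are assumptions chosen for illustration and do
    not reproduce the on-disk btrfs root ref layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of one root reference. */
struct root_ref_sketch {
	uint64_t dirid;
	uint64_t sequence;
	const char *name;
	int present;		/* nonzero while the reference exists */
};

/*
 * Delete the reference only if every identifying value matches.
 * On any mismatch, return -ENOENT and leave the reference untouched.
 */
static int del_root_ref_sketch(struct root_ref_sketch *ref, uint64_t dirid,
			       uint64_t sequence, const char *name)
{
	assert(ref != NULL && name != NULL);

	if (!ref->present)
		return -ENOENT;
	if (ref->dirid != dirid || ref->sequence != sequence ||
	    strcmp(ref->name, name) != 0)
		return -ENOENT;

	ref->present = 0;
	return 0;
}
```

    A caller that passes the wrong name gets -ENOENT back and the reference
    survives, rather than being deleted after a warning.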

    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit d49d3287e74ffe55ae7430d1e795e5f9bf7359ea upstream.

    If we have the following sequence of events

    btrfs sub create A
    btrfs sub create A/B
    btrfs sub snap A C
    mkdir C/foo
    mv A/B C/foo
    rm -rf *

    We will end up with a transaction abort.

    This happens because we create a root ref for B pointing to A.
    When we create the snapshot C we still have B in our tree, but because
    the root ref points to A and not C we will make it appear to be empty.

    The problem happens when we move B into C. This removes the root ref
    for B pointing to A and adds a ref of B pointing to C. When we rmdir C
    we'll see that we have a ref to our root and remove the root ref,
    despite not actually matching our reference name.

    Now btrfs_del_root_ref() allowing this to work is a bug as well. However,
    we know that this inode does not actually point to a root ref, so we
    shouldn't be calling btrfs_del_root_ref() in the first place; instead we
    simply look up our dir index for this item and do the rest of the
    removal.

    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • [ Upstream commit 045d3967b6920b663fc010ad414ade1b24143bd1 ]

    btrfs_unlink_subvol takes the name of the dentry and the root objectid
    based on what kind of inode this is, either a real subvolume link or an
    empty one that we inherited as a snapshot. We need to fix how we unlink
    in the case for BTRFS_EMPTY_SUBVOL_DIR_OBJECTID in the future, so rework
    btrfs_unlink_subvol to just take the dentry and handle getting the right
    objectid given the type of inode this is. There is no functional change
    here, simply pushing the work into btrfs_unlink_subvol() proper.

    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin

    Josef Bacik
     
  • commit 44d282796f81eb1debc1d7cb53245b4cb3214cb5 upstream.

    If the credentials or the mm doesn't match, don't allow the task to
    submit anything on behalf of this ring. The task that owns the ring can
    pass the file descriptor to another task, but we don't want to allow
    that task to submit an SQE that then assumes the ring mm and creds if
    it needs to go async.
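
    The check can be pictured as a simple identity comparison. The sketch
    below is an illustration only: the structures are hypothetical
    stand-ins for the task and ring state, and the -EACCES return value is
    an assumption, since the text above does not name the error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mm_sketch { int unused; };
struct cred_sketch { int unused; };

/* What the submitting task currently runs with. */
struct task_sketch {
	const struct mm_sketch *mm;
	const struct cred_sketch *cred;
};

/* What the ring recorded when it was set up. */
struct ring_sketch {
	const struct mm_sketch *mm;
	const struct cred_sketch *creds;
};

/* Refuse submission unless both the mm and the credentials match. */
static int ring_may_submit_sketch(const struct ring_sketch *ring,
				  const struct task_sketch *task)
{
	assert(ring != NULL && task != NULL);

	if (task->mm != ring->mm || task->cred != ring->creds)
		return -EACCES;	/* assumed error code */
	return 0;
}
```

    A task that merely received the file descriptor has a different mm or
    different credentials, so it fails the comparison.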

    Cc: stable@vger.kernel.org
    Suggested-by: Stefan Metzmacher
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
  • commit 7df1e988c723a066754090b22d047c3225342152 upstream.

    Buffered read in fuse normally goes via:

    -> generic_file_buffered_read()
    -> fuse_readpages()
    -> fuse_send_readpages()
    -> fuse_simple_request() [called since v5.4]

    In the case of a read request, fuse_simple_request() will return a
    non-negative bytecount on success or a negative error value. A positive
    bytecount was taken to be an error and the PG_error flag set on the page.
    This resulted in generic_file_buffered_read() falling back to ->readpage(),
    which would repeat the read request and succeed. Because of the repeated
    read succeeding the bug was not detected with regression tests or other use
    cases.

    The FTP module in GVFS however fails the second read due to the
    non-seekable nature of FTP downloads.

    Fix by checking for and ignoring a positive return value from
    fuse_simple_request().
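
    The difference between the two interpretations of the return value can
    be shown in a few lines. This is a simplified model, assuming the buggy
    path treated any nonzero result as an error; page_sketch and both
    helper functions are hypothetical and are not the fuse code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page with only the error flag modelled. */
struct page_sketch { bool error; };

/* Assumed buggy shape: any nonzero return, even a byte count, is an error. */
static void read_done_buggy_sketch(struct page_sketch *page, long ret)
{
	assert(page != NULL);
	page->error = (ret != 0);
}

/* Fixed shape: only negative values are errors; positive is a byte count. */
static void read_done_fixed_sketch(struct page_sketch *page, long ret)
{
	assert(page != NULL);
	page->error = (ret < 0);
}
```

    With a successful 4096-byte read, the first helper flags the page and
    the second does not; both still flag a negative errno value.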

    Reported-by: Ondrej Holy
    Link: https://gitlab.gnome.org/GNOME/gvfs/issues/441
    Fixes: 134831e36bbd ("fuse: convert readpages to simple api")
    Cc: # v5.4
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     

18 Jan, 2020

13 commits

  • [ Upstream commit 397eac17f86f404f5ba31d8c3e39ec3124b39fd3 ]

    If the journal is dirty at mount time, it will be replayed, but the jbd2
    sb log tail cannot be updated to mark a new start because
    journal->j_flag has already been set with JBD2_ABORT first in
    journal_init_common.

    When a new transaction is committed, it will be recorded in block 1
    first (journal->j_tail is set to 1 in journal_reset). If an emergency
    restart happens again before the journal super block is updated, the
    newly recorded transaction will not be replayed on the next mount.

    The following steps describe this procedure in detail.
    1. mount and touch some files
    2. these transactions are committed to journal area but not checkpointed
    3. emergency restart
    4. mount again and its journals are replayed
    5. journal super block's first s_start is 1, but its s_seq is not updated
    6. touch a new file and its trans is committed but not checkpointed
    7. emergency restart again
    8. mount and journal is dirty, but trans committed in 6 will not be
    replayed.

    This exception happens easily when this lun is used by only one node.
    If it is used by multiple nodes, another node will replay its journal
    and its journal super block will be updated after recovery, like what
    this patch does.

    ocfs2_recover_node->ocfs2_replay_journal.

    The following jbd2 journal can be generated by touching a new file after
    journal is replayed, and seq 15 is the first valid commit, but first seq
    is 13 in journal super block.

    logdump:
    Block 0: Journal Superblock
    Seq: 0 Type: 4 (JBD2_SUPERBLOCK_V2)
    Blocksize: 4096 Total Blocks: 32768 First Block: 1
    First Commit ID: 13 Start Log Blknum: 1
    Error: 0
    Feature Compat: 0
    Feature Incompat: 2 block64
    Feature RO compat: 0
    Journal UUID: 4ED3822C54294467A4F8E87D2BA4BC36
    FS Share Cnt: 1 Dynamic Superblk Blknum: 0
    Per Txn Block Limit Journal: 0 Data: 0

    Block 1: Journal Commit Block
    Seq: 14 Type: 2 (JBD2_COMMIT_BLOCK)

    Block 2: Journal Descriptor
    Seq: 15 Type: 1 (JBD2_DESCRIPTOR_BLOCK)
    No. Blocknum Flags
    0. 587 none
    UUID: 00000000000000000000000000000000
    1. 8257792 JBD2_FLAG_SAME_UUID
    2. 619 JBD2_FLAG_SAME_UUID
    3. 24772864 JBD2_FLAG_SAME_UUID
    4. 8257802 JBD2_FLAG_SAME_UUID
    5. 513 JBD2_FLAG_SAME_UUID JBD2_FLAG_LAST_TAG
    ...
    Block 7: Inode
    Inode: 8257802 Mode: 0640 Generation: 57157641 (0x3682809)
    FS Generation: 2839773110 (0xa9437fb6)
    CRC32: 00000000 ECC: 0000
    Type: Regular Attr: 0x0 Flags: Valid
    Dynamic Features: (0x1) InlineData
    User: 0 (root) Group: 0 (root) Size: 7
    Links: 1 Clusters: 0
    ctime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
    atime: 0x5de5d870 0x113181a1 -- Tue Dec 3 11:37:20.288457121 2019
    mtime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
    dtime: 0x0 -- Thu Jan 1 08:00:00 1970
    ...
    Block 9: Journal Commit Block
    Seq: 15 Type: 2 (JBD2_COMMIT_BLOCK)

    The following is the journal recovery log from recovering the above jbd2
    journal on the next mount.

    syslog:
    ocfs2: File system on device (252,1) was not unmounted cleanly, recovering it.
    fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 0
    fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 1
    fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 2
    fs/jbd2/recovery.c:(jbd2_journal_recover, 278): JBD2: recovery, exit status 0, recovered transactions 13 to 13

    Because the first commit seq 13 recorded in the journal super block is
    not consistent with the value recorded in block 1 (seq is 14), journal
    recovery is terminated before seq 15 even though it is an unbroken
    commit. Inode 8257802 is a new file and it will be lost.

    Link: http://lkml.kernel.org/r/20191217020140.2197-1-li.kai4@h3c.com
    Signed-off-by: Kai Li
    Reviewed-by: Joseph Qi
    Reviewed-by: Changwei Ge
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Kai Li
     
  • commit 18f428d4e2f7eff162d80b2b21689496c4e82afd upstream.

    A static checker revealed a possible error path leading to a possible
    NULL pointer dereference.

    Reported-by: Dan Carpenter
    Fixes: e0639dc5805a: ("NFSD introduce async copy feature")
    Signed-off-by: Olga Kornievskaia
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Olga Kornievskaia
     
  • commit 1f0d5c911b64165c9754139a26c8c2fad352c132 upstream.

    We expect a 64-bit result from the statement below. However, on a
    32-bit machine, a left shift of a pgoff_t variable, performed in a loop,
    may overflow. Fix it by forcing a type cast.

    page->index << PAGE_SHIFT;
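
    The overflow is easy to reproduce in userspace by modelling pgoff_t as a
    32-bit unsigned type. This is a sketch under that assumption; the
    *_sketch names and the 12-bit shift are illustrative and do not come
    from the f2fs sources.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(unsigned int) == 4, "sketch assumes a 32-bit int");

#define PAGE_SHIFT_SKETCH 12

/* Models pgoff_t (unsigned long) as it would be on a 32-bit machine. */
typedef uint32_t pgoff32_sketch_t;

/* The shift happens in 32 bits; widening afterwards is too late. */
static uint64_t offset_truncated_sketch(pgoff32_sketch_t index)
{
	return index << PAGE_SHIFT_SKETCH;
}

/* Casting first makes the shift itself a 64-bit operation. */
static uint64_t offset_widened_sketch(pgoff32_sketch_t index)
{
	return (uint64_t)index << PAGE_SHIFT_SKETCH;
}
```

    For page index 0x100000 the byte offset is exactly 4 GiB: the uncast
    version wraps to 0, while the cast version yields 0x100000000.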

    Fixes: 26de9b117130 ("f2fs: avoid unnecessary updating inode during fsync")
    Fixes: 0a2aa8fbb969 ("f2fs: refactor __exchange_data_block for speed up")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     
  • commit 10256f000932f12596dc043cf880ecf488a32510 upstream.

    If there is more than one valid snod on the sleb->nodes list,
    do_kill_orphans will allocate ino more than once without releasing
    the previous ino's memory, which results in a memory leak.
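
    One way to avoid this kind of per-iteration leak is to allocate the
    scratch buffer once before the loop and free it on the single exit
    path. The sketch below shows that shape in userspace C; it is not the
    ubifs fix itself, and the helper names, the 64-byte size, and the
    live_allocs counter are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_allocs;	/* buffers allocated and not yet freed */

static void *alloc_sketch(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void free_sketch(void *p)
{
	if (!p)
		return;
	assert(live_allocs > 0);
	live_allocs--;
	free(p);
}

/* Visit node_count nodes, reusing one scratch buffer for all of them. */
static int kill_orphans_sketch(int node_count)
{
	void *ino = alloc_sketch(64);	/* allocated once, outside the loop */
	int i;

	if (!ino)
		return -ENOMEM;

	for (i = 0; i < node_count; i++) {
		/* ... read the inode for node i into ino and process it ... */
	}

	free_sketch(ino);
	return 0;
}
```

    However many nodes are visited, exactly one buffer is allocated and it
    is released before the function returns.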

    Fixes: ee1438ce5dc4 ("ubifs: Check link count of inodes when...")
    Signed-off-by: Zhihao Cheng
    Signed-off-by: zhangyi (F)
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Zhihao Cheng
     
  • commit df22b5b3ecc6233e33bd27f67f14c0cd1b5a5897 upstream.

    The ubifs_jnl_write_inode() function calls ubifs_iget() with
    xent->inum. xent->inum is __le64, but ubifs_iget() takes a value in
    native CPU endianness.

    I think that this should be changed to passing le64_to_cpu(xent->inum)
    to fix the following sparse warning:

    fs/ubifs/journal.c:902:58: warning: incorrect type in argument 2 (different base types)
    fs/ubifs/journal.c:902:58: expected unsigned long inum
    fs/ubifs/journal.c:902:58: got restricted __le64 [usertype] inum
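
    What the conversion does can be shown with a portable userspace
    decoder. This is a sketch, not the kernel's le64_to_cpu(): the
    le64_sketch_t wrapper is a hypothetical type that keeps the raw
    little-endian bytes distinct from a native integer, loosely mirroring
    what sparse enforces with __le64.

```c
#include <assert.h>
#include <stdint.h>

/* Raw little-endian bytes as stored on the medium. */
typedef struct { unsigned char b[8]; } le64_sketch_t;

static_assert(sizeof(le64_sketch_t) == 8, "wrapper must be exactly 8 bytes");

/* Assemble a native 64-bit value; b[0] is the least significant byte. */
static uint64_t le64_to_cpu_sketch(le64_sketch_t v)
{
	uint64_t out = 0;

	for (int i = 7; i >= 0; i--)
		out = (out << 8) | v.b[i];
	return out;
}
```

    Because the decoder works byte by byte, it gives the same result on
    little-endian and big-endian hosts, which is the property the missing
    conversion would otherwise break on big-endian machines.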

    Fixes: 7959cf3a7506 ("ubifs: journal: Handle xattrs like files")
    Signed-off-by: Ben Dooks
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Ben Dooks (Codethink)
     
  • commit 91cbf01178c37086b32148c53e24b04cb77557cf upstream.

    This reverts commit 9163e0184bd7d5f779934d34581843f699ad2ffd.

    At the point when ubifs_fill_super() runs, we already have a reference
    to the super block. So upon deactivate_locked_super() c will get
    free()'ed via ->kill_sb().

    Cc: Wenwen Wang
    Fixes: 9163e0184bd7 ("ubifs: Fix memory leak bug in alloc_ubifs_info() error path")
    Reported-by: https://twitter.com/grsecurity/status/1180609139359277056
    Signed-off-by: Richard Weinberger
    Tested-by: Romain Izard
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Richard Weinberger
     
  • commit 8d0980704842e8a68df2c3164c1c165e5c7ebc08 upstream.

    Out of the four ioctl commands supported on gfs2, only FITRIM
    works in compat mode.

    Add a proper handler based on the ext4 implementation.

    Fixes: 6ddc5c3ddf25 ("gfs2: getlabel support")
    Reviewed-by: Bob Peterson
    Cc: Andreas Gruenbacher
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     
  • commit 450c3d4166837c496ebce03650c08800991f2150 upstream.

    In affs_remount if data is provided it is duplicated into new_opts. The
    allocated memory for new_opts is only released if parse_options fails.

    There's a bit of history behind new_opts: originally there was
    save/replace options on the VFS layer, so the 'data' passed must not
    change (thus strdup). This got cleaned up in later patches, but not
    completely.

    There's no reason to do the strdup in cases where the filesystem does
    not need to reuse the 'data' again, because strsep would modify it
    directly.

    Fixes: c8f33d0bec99 ("affs: kstrdup() memory handling")
    Signed-off-by: Navid Emamdoost
    [ update changelog ]
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Navid Emamdoost
     
  • commit 5326de9e94bedcf7366e7e7625d4deb8c1f1ca8a upstream.

    If nfs4_delegreturn_prepare needs to wait for a layoutreturn to complete
    then make sure we drop the sequence slot if we hold it.

    Fixes: 1c5bd76d17cc ("pNFS: Enable layoutreturn operation for return-on-close")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 5c441544f045e679afd6c3c6d9f7aaf5fa5f37b0 upstream.

    If the server returns a bad or dead session error, then we don't want
    to update the session slot number, but just immediately schedule
    recovery and allow it to proceed.

    We can/should then remove handling in other places.

    Fixes: 3453d5708b33 ("NFSv4.1: Avoid false retries when RPC calls are interrupted")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit a2e2f2dc77a18d2b0f450fb7fcb4871c9f697822 upstream.

    The new nfsdcld client tracking operations use sha256 to compute hashes
    of the kerberos principals, so make sure CRYPTO_SHA256 is enabled.

    Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2")
    Reported-by: Jamie Heilman
    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Scott Mayhew
     
  • commit 18b9a895e652979b70f9c20565394a69354dfebc upstream.

    Don't assign an error pointer to cld_net->cn_tfm, otherwise an oops will
    occur in nfsd4_remove_cld_pipe().

    Also, move the initialization of cld_net->cn_tfm so that it occurs after
    the check to see if nfsdcld is running. This is necessary because
    nfsd4_client_tracking_init() looks for -ETIMEDOUT to determine whether
    to use the "old" nfsdcld tracking ops.
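
    The first point follows the usual error-pointer discipline: keep the
    result in a local variable, check it, and only then store it in the
    long-lived structure. Below is a userspace sketch of that idea. The
    *_sketch helpers imitate the kernel's ERR_PTR(), IS_ERR() and PTR_ERR()
    but are hypothetical reimplementations, and cld_net_sketch is a
    stand-in rather than the real nfsd structure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr_sketch(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer falls in the range reserved for encoded errors. */
static int is_err_sketch(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Recover the errno value from an error pointer. */
static long ptr_err_sketch(const void *p)
{
	return (long)(intptr_t)p;
}

struct cld_net_sketch { void *tfm; };

/*
 * Store the allocation result only when it is a real pointer, so that
 * teardown code never sees an error pointer in cn->tfm.
 */
static int set_tfm_sketch(struct cld_net_sketch *cn, void *result)
{
	assert(cn != NULL);

	if (is_err_sketch(result))
		return (int)ptr_err_sketch(result);	/* cn->tfm untouched */

	cn->tfm = result;
	return 0;
}
```

    After a failed allocation the field still holds NULL, so a later
    cleanup routine that frees cn->tfm has nothing invalid to dereference.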

    Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2")
    Reported-by: Jamie Heilman
    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Scott Mayhew
     
  • commit ad97a995d8edff820d4238bd0dfc69f440031ae6 upstream.

    Encode the mtime correctly.

    Fixes: 95582b0083883 ("vfs: change inode times to use struct timespec64")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust