29 Oct, 2018

2 commits

  • This allows polling non-blk-mq queues as well. It is mostly needed for
    the NVMe multipath code, but could also be useful elsewhere.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe
    (cherry picked from commit ea435e1b9392a33deceaea2a16ebaa3397bead93)

    Christoph Hellwig
     
  • Fix a bug introduced by commit 74d46992e0d9 ("block: replace bi_bdev with a gendisk
    pointer and partitions index").

    In __btrfsic_submit_bio(), bio_dev(bio) is used to look up the dev_state.
    However, when a dev_state is added to the hashtable, it is keyed by the
    dev_t of the block_device.

    bio_dev(bio) returns the dev_t of part0, which differs from the dev_t in
    block_device (bd_dev): bd_dev identifies the exact partition.

    block_device.bd_dev =
    bio->bi_partno (same as block_device.bd_partno) + bio_dev(bio).

    When adding a dev_state to the hashtable, we use the exact partition dev_t,
    so the lookup must use the exact partition dev_t as well.
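    To illustrate the mismatch, here is a minimal user-space sketch. It assumes the kernel's internal dev_t layout (20 minor bits, as in MKDEV()) and replaces the btrfsic hashtable with a hypothetical one-entry table keyed by bd_dev; lookup_dev_state() and exact_dev() are illustrative names, not btrfs functions.

```c
#include <stddef.h>

/* Simplified copy of the kernel's internal dev_t encoding
 * (MINORBITS = 20); user-space dev_t uses a different layout. */
#define KMINORBITS 20
#define KMKDEV(ma, mi) (((unsigned int)(ma) << KMINORBITS) | (unsigned int)(mi))

/* Hypothetical stand-in for the dev_state hashtable: entries are
 * keyed by the exact partition dev_t (bd_dev). */
struct dev_state {
	unsigned int bd_dev;
	const char *name;
};

static const struct dev_state states[] = {
	{ KMKDEV(8, 7), "sda7" },
};

static const struct dev_state *lookup_dev_state(unsigned int dev)
{
	for (size_t i = 0; i < sizeof(states) / sizeof(states[0]); i++)
		if (states[i].bd_dev == dev)
			return &states[i];
	return NULL;   /* not found */
}

/* Exact partition dev_t: part0's dev_t plus the partition number,
 * following bd_dev = bio_dev(bio) + bi_partno. */
static unsigned int exact_dev(unsigned int part0_dev, unsigned char partno)
{
	return part0_dev + partno;
}
```

    Looking up KMKDEV(8, 0), the part0 dev_t that bio_dev(bio) would return for sda, finds nothing, while adding partition number 7 yields the sda7 entry.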

    Reproducer of this bug:

    Use MOUNT_OPTIONS="-o check_int" and run btrfs/001 in fstests.
    A warning like the following is then printed:

    WARNING:
    btrfs: attempt to write superblock which references block M @29523968 (sda7 /1111654400/2) which is never written!

    Signed-off-by: Gu JinXiang
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    (cherry picked from commit d28e649a5c58b779b303c252c66ee84a0f2c3b32)

    Gu JinXiang
     

20 Oct, 2018

1 commit


18 Oct, 2018

1 commit

  • commit f1782c9bc547754f4bd3043fe8cfda53db85f13f upstream.

    I received a report about suspicious growth of unreclaimable slabs on
    some machines. I've found that it happens on machines with low memory
    pressure, and these unreclaimable slabs are external names attached to
    dentries.

    External names are allocated using generic kmalloc() function, so they
    are accounted as unreclaimable. But they are held by dentries, which
    are reclaimable, and they will be reclaimed under the memory pressure.

    In particular, this breaks the MemAvailable calculation, which doesn't
    take unreclaimable slabs into account. This leads to a silly situation:
    a machine that is almost idle, has no memory pressure and therefore has
    a big dentry cache ends up with a MemAvailable that is too low to start
    a new workload.

    To address the issue, the NR_INDIRECTLY_RECLAIMABLE_BYTES counter is
    used to track the amount of memory consumed by external names. The
    counter is increased in the dentry allocation path if an external name
    structure is allocated, and decreased in the dentry freeing path.
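    The accounting pattern can be sketched in user space roughly as follows. This is an illustration under stated assumptions, not the dcache code: struct toy_dentry, the 32-byte inline capacity and the plain global counter are hypothetical stand-ins for the dentry, DNAME_INLINE_LEN and NR_INDIRECTLY_RECLAIMABLE_BYTES.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical inline-name capacity; the kernel's DNAME_INLINE_LEN
 * depends on the architecture. */
#define INLINE_NAME_LEN 32

/* Stand-in for the NR_INDIRECTLY_RECLAIMABLE_BYTES counter. */
static long indirectly_reclaimable_bytes;

struct toy_dentry {
	char inline_name[INLINE_NAME_LEN];
	char *name;          /* points to inline_name or a heap buffer */
	size_t ext_size;     /* bytes allocated externally, 0 if inline */
};

/* Allocation path: account external name buffers. */
static int toy_dentry_init(struct toy_dentry *d, const char *name)
{
	size_t len = strlen(name) + 1;

	if (len <= INLINE_NAME_LEN) {
		d->name = d->inline_name;
		d->ext_size = 0;
	} else {
		d->name = malloc(len);
		if (!d->name)
			return -1;
		d->ext_size = len;
		indirectly_reclaimable_bytes += (long)len;
	}
	memcpy(d->name, name, len);
	return 0;
}

/* Freeing path: undo the accounting. */
static void toy_dentry_free(struct toy_dentry *d)
{
	if (d->ext_size) {
		indirectly_reclaimable_bytes -= (long)d->ext_size;
		free(d->name);
	}
	d->name = NULL;
	d->ext_size = 0;
}
```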

    To reproduce the problem, I used the following Python script
    (indirect.py):

        import os

        for i in range(0, 10000000):
            try:
                name = ("/some_long_name_%d" % i) + "_" * 220
                os.stat(name)
            except Exception:
                pass

    Without this patch:
    $ cat /proc/meminfo | grep MemAvailable
    MemAvailable: 7811688 kB
    $ python indirect.py
    $ cat /proc/meminfo | grep MemAvailable
    MemAvailable: 2753052 kB

    With the patch:
    $ cat /proc/meminfo | grep MemAvailable
    MemAvailable: 7809516 kB
    $ python indirect.py
    $ cat /proc/meminfo | grep MemAvailable
    MemAvailable: 7749144 kB

    [guro@fb.com: fix indirectly reclaimable memory accounting for CONFIG_SLOB]
    Link: http://lkml.kernel.org/r/20180312194140.19517-1-guro@fb.com
    [guro@fb.com: fix indirectly reclaimable memory accounting]
    Link: http://lkml.kernel.org/r/20180313125701.7955-1-guro@fb.com
    Link: http://lkml.kernel.org/r/20180305133743.12746-5-guro@fb.com
    Signed-off-by: Roman Gushchin
    Reviewed-by: Andrew Morton
    Cc: Alexander Viro
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Roman Gushchin
     

13 Oct, 2018

2 commits

  • commit 37f31b6ca4311b94d985fb398a72e5399ad57925 upstream.

    The requested device name can be NULL or an empty string.
    Check for that and refuse to continue. UBIFS has to do this manually
    since we cannot use mount_bdev(), which checks for this condition.
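    The check itself is small. A sketch of its shape, assuming a hypothetical helper name and -EINVAL as the error code (the value returned by the actual patch may differ):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: reject a NULL or empty device name before
 * trying to open the UBI volume. */
static int check_dev_name(const char *dev_name)
{
	if (dev_name == NULL || dev_name[0] == '\0')
		return -EINVAL;
	return 0;
}
```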

    Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system")
    Reported-by: syzbot+38bd0f7865e5c6379280@syzkaller.appspotmail.com
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Richard Weinberger
     
  • commit d3f07c049dab1a3f1740f476afd3d5e5b738c21c upstream.

    syzbot found the following crash on:

    HEAD commit: d9bd94c0bcaa Add linux-next specific files for 20180801
    git tree: linux-next
    console output: https://syzkaller.appspot.com/x/log.txt?x=1001189c400000
    kernel config: https://syzkaller.appspot.com/x/.config?x=cc8964ea4d04518c
    dashboard link: https://syzkaller.appspot.com/bug?extid=c966a82db0b14aa37e81
    compiler: gcc (GCC) 8.0.1 20180413 (experimental)

    Unfortunately, I don't have any reproducer for this crash yet.

    IMPORTANT: if you fix the bug, please add the following tag to the commit:
    Reported-by: syzbot+c966a82db0b14aa37e81@syzkaller.appspotmail.com

    loop7: rw=12288, want=8200, limit=20
    netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'.
    openvswitch: netlink: Message has 8 unknown bytes.
    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    CPU: 1 PID: 7615 Comm: syz-executor7 Not tainted 4.18.0-rc7-next-20180801+ #29
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline]
    RIP: 0010:compound_head include/linux/page-flags.h:142 [inline]
    RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline]
    RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline]
    RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835
    Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00
    RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246
    RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000
    RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005
    RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026
    R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb
    R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40
    FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    f2fs_get_valid_checkpoint+0x436/0x1ec0 fs/f2fs/checkpoint.c:860
    f2fs_fill_super+0x2d42/0x8110 fs/f2fs/super.c:2883
    mount_bdev+0x314/0x3e0 fs/super.c:1344
    f2fs_mount+0x3c/0x50 fs/f2fs/super.c:3133
    legacy_get_tree+0x131/0x460 fs/fs_context.c:729
    vfs_get_tree+0x1cb/0x5c0 fs/super.c:1743
    do_new_mount fs/namespace.c:2603 [inline]
    do_mount+0x6f2/0x1e20 fs/namespace.c:2927
    ksys_mount+0x12d/0x140 fs/namespace.c:3143
    __do_sys_mount fs/namespace.c:3157 [inline]
    __se_sys_mount fs/namespace.c:3154 [inline]
    __x64_sys_mount+0xbe/0x150 fs/namespace.c:3154
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x45943a
    Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 bd 8a fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 3d 01 f0 ff ff 0f 83 9a 8a fb ff c3 66 0f 1f 84 00 00 00 00 00
    RSP: 002b:00007f36a61d4a88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
    RAX: ffffffffffffffda RBX: 00007f36a61d4b30 RCX: 000000000045943a
    RDX: 00007f36a61d4ad0 RSI: 0000000020000100 RDI: 00007f36a61d4af0
    RBP: 0000000020000100 R08: 00007f36a61d4b30 R09: 00007f36a61d4ad0
    R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000013
    R13: 0000000000000000 R14: 00000000004c8ea0 R15: 0000000000000000
    Modules linked in:
    Dumping ftrace buffer:
    (ftrace buffer empty)
    ---[ end trace bd8550c129352286 ]---

    In validate_checkpoint(), if get_checkpoint_version() fails, we pass the
    returned invalid page pointer to f2fs_put_page(), which accesses invalid
    memory. This patch handles the error path correctly to fix this issue.
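    The error-pointer convention behind this class of bug can be sketched in user space. The err_ptr()/is_err()/ptr_err() helpers re-create the idea of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and get_page()/put_page()/read_version() are hypothetical stand-ins. The sketch only shows the rule that an error pointer must be checked before it is released; it is not the actual f2fs change.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space re-creation of the ERR_PTR idiom: small negative errnos
 * are encoded in the top 4095 values of the address space. */
#define MAX_ERRNO 4095

static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int pages_released;

/* Hypothetical page helpers standing in for the meta-page API. */
static void *get_page(int fail)
{
	return fail ? err_ptr(-EIO) : malloc(16);
}

static void put_page(void *page)
{
	free(page);
	pages_released++;
}

/* Correct error path: never hand an error pointer to put_page(). */
static int read_version(int fail)
{
	void *page = get_page(fail);

	if (is_err(page))
		return (int)ptr_err(page);
	/* ... parse the checkpoint from page here ... */
	put_page(page);
	return 0;
}
```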

    Signed-off-by: Chao Yu
    Signed-off-by: Greg Kroah-Hartman

    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

10 Oct, 2018

11 commits

  • commit cbe355f57c8074bc4f452e5b6e35509044c6fa23 upstream.

    In dlm_init_lockres() we access and modify res->tracking and
    dlm->tracking_list without holding dlm->track_lock. This can cause list
    corruption and can end up in a kernel panic.

    Fix this by protecting res->tracking and dlm->tracking_list with
    dlm->track_lock instead of dlm->spinlock.
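    A minimal sketch of the locking rule, assuming POSIX threads: a pthread mutex stands in for the kernel spinlock, and struct toy_dlm, struct toy_lockres and toy_init_lockres() are hypothetical. Every insertion into the tracking list happens under the lock dedicated to that list.

```c
#include <pthread.h>

/* Minimal doubly linked list, modeled loosely on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical analogues of dlm->tracking_list and dlm->track_lock. */
struct toy_dlm {
	pthread_mutex_t track_lock;
	struct list_node tracking_list;
};

struct toy_lockres {
	struct list_node tracking;
};

/* The list is only touched while holding its own lock, never under
 * some unrelated lock such as dlm->spinlock. */
static void toy_init_lockres(struct toy_dlm *dlm, struct toy_lockres *res)
{
	pthread_mutex_lock(&dlm->track_lock);
	list_add_tail_node(&res->tracking, &dlm->tracking_list);
	pthread_mutex_unlock(&dlm->track_lock);
}
```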

    Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com
    Signed-off-by: Ashish Samant
    Reviewed-by: Changwei Ge
    Acked-by: Joseph Qi
    Acked-by: Jun Piao
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Ashish Samant
     
  • commit f8a00cef17206ecd1b30d3d9f99e10d9fa707aa7 upstream.

    Currently, you can use /proc/self/task/*/stack to cause a stack walk on
    a task you control while it is running on another CPU. That means that
    the stack can change under the stack walker. The stack walker does
    have guards against going completely off the rails and into random
    kernel memory, but it can interpret random data from your kernel stack
    as instruction pointers and stack pointers. This can cause exposure of
    kernel stack contents to userspace.

    Restrict the ability to inspect kernel stacks of arbitrary tasks to root
    in order to prevent a local attacker from exploiting racy stack unwinding
    to leak kernel task stack contents. See the added comment for a longer
    rationale.

    There don't seem to be any users of this userspace API that can't
    gracefully bail out if reading from the file fails. Therefore, I believe
    that this change is unlikely to break things. In the case that this patch
    does end up needing a revert, the next-best solution might be to fake a
    single-entry stack based on wchan.
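    For illustration, a reader that bails out gracefully might look like the sketch below. read_proc_file() is a hypothetical helper; it reports both open() and read() failures as a negative errno, since a permission check made when the file is read (rather than when it is opened) surfaces as a failed read().

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read a /proc/<pid>/stack-style file into buf.
 * Returns the number of bytes read, or -errno so the caller can bail
 * out gracefully (e.g. on -EACCES for unprivileged readers). */
static long read_proc_file(const char *path, char *buf, size_t size)
{
	int fd = open(path, O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -errno;
	n = read(fd, buf, size > 0 ? size - 1 : 0);
	if (n < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	close(fd);
	if (size > 0)
		buf[n] = '\0';   /* NUL-terminate what was read */
	return (long)n;
}
```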

    Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com
    Fixes: 2ec220e27f50 ("proc: add /proc/*/stack")
    Signed-off-by: Jann Horn
    Acked-by: Kees Cook
    Cc: Alexey Dobriyan
    Cc: Ken Chen
    Cc: Will Deacon
    Cc: Laura Abbott
    Cc: Andy Lutomirski
    Cc: Catalin Marinas
    Cc: Josh Poimboeuf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit 0595751f267994c3c7027377058e4185b3a28e75 upstream.

    When mounting a Windows share that is the root of a drive (e.g. C$),
    the server does not return the . and .. directory entries. This results
    in the smb2 code path erroneously skipping the first two entries.

    Pseudo-code of the readdir() code path:

    cifs_readdir(struct file, struct dir_context)
    initiate_cifs_search ops->query_dir_first

    dir_emit_dots
    dir_emit < (pos_in_buf)) && (cur_ent != NULL); i++) {
    /* go entry by entry figuring out which is first */
    cur_ent = nxt_dir_entry(cur_ent, end_of_smb,
    cfile->srch_inf.info_level);
    }

    C) cifs_filldir() skips . and .. so we can safely ignore them for now.

    Sample program:

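    #include <dirent.h>
    #include <errno.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const char *path = argc >= 2 ? argv[1] : ".";
        DIR *dh;
        struct dirent *de;

        printf("listing path <%s>\n", path);
        dh = opendir(path);
        if (!dh) {
            printf("opendir error %d\n", errno);
            return 1;
        }

        while (1) {
            /* readdir() returns NULL both at the end of the directory
             * and on error; clear errno first to tell them apart. */
            errno = 0;
            de = readdir(dh);
            if (!de) {
                if (errno) {
                    printf("readdir error %d\n", errno);
                    closedir(dh);
                    return 1;
                }
                printf("end of listing\n");
                break;
            }
            printf("off=%ld <%s>\n", (long)de->d_off, de->d_name);
        }

        closedir(dh);
        return 0;
    }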

    Before the fix with SMB1 on root shares:

    off=1
    off=2
    off=3
    off=4

    and on non-root shares:

    off=1
    off=4 off=5 we skipped . and .. from response buffer (C)
    off=6 but still incremented pos
    off=7
    off=8

    Therefore the fix for smb2 is to mimic smb1 behaviour and offset the
    index_of_last_entry by 2.

    Test results comparing smb1 and smb2 before/after the fix on root
    shares, non-root shares and large directories (i.e. multi-response
    dir listings):

    PRE FIX
    =======
    pre-1-root VS pre-2-root:
    ERR pre-2-root is missing [bootmgr, $Recycle.Bin]
    pre-1-nonroot VS pre-2-nonroot:
    OK~ same files, same order, different offsets
    pre-1-nonroot-large VS pre-2-nonroot-large:
    OK~ same files, same order, different offsets

    POST FIX
    ========
    post-1-root VS post-2-root:
    OK same files, same order, same offsets
    post-1-nonroot VS post-2-nonroot:
    OK same files, same order, same offsets
    post-1-nonroot-large VS post-2-nonroot-large:
    OK same files, same order, same offsets

    REGRESSION?
    ===========
    pre-1-root VS post-1-root:
    OK same files, same order, same offsets
    pre-1-nonroot VS post-1-nonroot:
    OK same files, same order, same offsets

    BugLink: https://bugzilla.samba.org/show_bug.cgi?id=13107
    Signed-off-by: Aurelien Aptel
    Signed-off-by: Paulo Alcantara
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Steve French
    CC: Stable
    Signed-off-by: Greg Kroah-Hartman

    Aurelien Aptel
     
  • commit ffc4c92227db5699493e43eb140b4cb5904c30ff upstream.

    Commit 786534b92f3c introduced a regression that caused listxattr to
    return the POSIX ACL attribute names even though sysfs doesn't support
    POSIX ACLs. This happens because simple_xattr_list checks for NULL
    i_acl / i_default_acl, but inode_init_always initializes those fields
    to ACL_NOT_CACHED ((void *)-1). For example:
    $ getfattr -m- -d /sys
    /sys: system.posix_acl_access: Operation not supported
    /sys: system.posix_acl_default: Operation not supported
    Fix this in simple_xattr_list by checking if the filesystem supports POSIX ACLs.
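    Why a NULL check is not enough can be shown with a small sketch. TOY_ACL_NOT_CACHED mirrors the ACL_NOT_CACHED value quoted above; struct toy_inode, both helpers and the fs_supports_acl flag are hypothetical simplifications of the inode and the filesystem capability test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cached ACL pointers start out as this sentinel, not as NULL. */
#define TOY_ACL_NOT_CACHED ((void *)-1)

struct toy_inode {
	void *i_acl;
	void *i_default_acl;
};

/* Buggy check: the sentinel is non-NULL, so it looks like an ACL. */
static bool has_acl_buggy(const struct toy_inode *inode)
{
	return inode->i_acl != NULL;
}

/* Shape of the fix: first ask whether the filesystem supports POSIX
 * ACLs at all (fs_supports_acl is a hypothetical flag). */
static bool has_acl_fixed(const struct toy_inode *inode, bool fs_supports_acl)
{
	return fs_supports_acl && inode->i_acl != NULL;
}
```

    For a sysfs-like inode whose fields still hold the sentinel, the buggy check reports an ACL name, while the fixed check lists nothing because the filesystem does not support POSIX ACLs.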

    Fixes: 786534b92f3c ("tmpfs: listxattr should include POSIX ACL xattrs")
    Reported-by: Marc Aurèle La France
    Tested-by: Marc Aurèle La France
    Signed-off-by: Andreas Gruenbacher
    Cc: stable@vger.kernel.org # v4.5+
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Andreas Gruenbacher
     
  • commit 1a8f8d2a443ef9ad9a3065ba8c8119df714240fa upstream.

    Format has a typo: it was meant to be "%.*s", not "%*s". But at some point
    callers grew nonprintable values as well, so use "%*pE" instead with a
    maximized length.
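    The difference between the two conversions can be checked with plain snprintf(). In "%*s" the int argument is the minimum field width, so the whole string is still read up to its NUL; in "%.*s" it is the precision, which caps how many bytes are read. The helper names are illustrative only.

```c
#include <stdio.h>

/* "%*s": len is a minimum WIDTH; the string is read up to its NUL. */
static void format_width(char *out, size_t size, int len, const char *s)
{
	snprintf(out, size, "%*s", len, s);
}

/* "%.*s": len is a PRECISION; at most len bytes of s are read, which
 * is what a name that is not NUL-terminated at len requires. */
static void format_precision(char *out, size_t size, int len, const char *s)
{
	snprintf(out, size, "%.*s", len, s);
}
```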

    Reported-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
    Cc: # v4.12
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit 63e132528032ce937126aba591a7b37ec593a6bb upstream.

    The memory leak was detected by kmemleak when running xfstests
    overlay/051 and overlay/053.

    Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     
  • commit 601350ff58d5415a001769532f6b8333820e5786 upstream.

    KASAN detected slab-out-of-bounds access in printk from overlayfs,
    because string format used %*s instead of %.*s.

    > BUG: KASAN: slab-out-of-bounds in string+0x298/0x2d0 lib/vsprintf.c:604
    > Read of size 1 at addr ffff8801c36c66ba by task syz-executor2/27811
    >
    > CPU: 0 PID: 27811 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #36
    ...
    > printk+0xa7/0xcf kernel/printk/printk.c:1996
    > ovl_lookup_index.cold.15+0xe8/0x1f8 fs/overlayfs/namei.c:689

    Reported-by: syzbot+376cea2b0ef340db3dd4@syzkaller.appspotmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin")
    Cc: # v4.13
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     
  • [ Upstream commit 097f5863b1a0c9901f180bbd56ae7d630655faaa ]

    We need to verify that the "data_offset" is within bounds.
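    One common overflow-safe way to write such a check is sketched below. data_in_bounds() and its data_len parameter are hypothetical; the point is to compare against the space remaining after the offset instead of computing data_offset + data_len, which can wrap around.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bounds check for a (data_offset, data_len) pair taken
 * from an untrusted response buffer of buf_len bytes. */
static bool data_in_bounds(size_t buf_len, size_t data_offset, size_t data_len)
{
	if (data_offset > buf_len)
		return false;                        /* offset itself is outside */
	return data_len <= buf_len - data_offset;    /* no addition, no wrap */
}
```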

    Reported-by: Dr Silvio Cesare of InfoSect
    Signed-off-by: Dan Carpenter
    Signed-off-by: Steve French
    Reviewed-by: Aurelien Aptel
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • [ Upstream commit bcfb84a996f6fa90b5e6e2954b2accb7a4711097 ]

    A powerpc build of cifs with gcc v8.2.0 produces this warning:

    fs/cifs/cifssmb.c: In function ‘CIFSSMBNegotiate’:
    fs/cifs/cifssmb.c:605:3: warning: ‘strncpy’ writing 16 bytes into a region of size 1 overflows the destination [-Wstringop-overflow=]
    strncpy(pSMB->DialectsArray+count, protocols[i].name, 16);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Since we are already doing a strlen() on the source, change the strncpy
    to a memcpy().
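    The pattern can be sketched as follows, assuming a hypothetical append_dialect() helper that packs NUL-separated names into a buffer the caller has sized correctly. Because strlen() already yields the length, memcpy() copies exactly the name plus its terminator, with no fixed 16-byte bound.

```c
#include <string.h>

/* Hypothetical helper: append name (with its NUL) at offset count and
 * return the new offset. The caller guarantees buf is large enough. */
static size_t append_dialect(char *buf, size_t count, const char *name)
{
	size_t len = strlen(name) + 1;   /* include the NUL separator */

	memcpy(buf + count, name, len);
	return count + len;
}
```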

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Steve French
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Stephen Rothwell
     
  • [ Upstream commit c15e3f19a6d5c89b1209dc94b40e568177cb0921 ]

    When a Mac client saves an item containing a backslash to a file server,
    the backslash is represented in the CIFS/SMB protocol as U+F026.
    Before this change, listing a directory containing an item with a
    backslash in its name would return that item with the backslash
    represented as a true backslash character (U+005C), because
    convert_sfm_character mapped U+F026 to U+005C when interpreting the
    CIFS/SMB protocol response. However, attempting to open or stat the
    path using a true backslash would result in an error, because
    convert_to_sfm_char does not map U+005C back to U+F026, causing the
    CIFS/SMB request to be made with the backslash represented as U+005C.

    This change simply prevents the U+F026 to U+005C conversion from
    happening. This is analogous to how the code does not do any
    translation of UNI_SLASH (U+F000).
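    The round-trip problem can be made concrete with a heavily reduced, hypothetical mapping. Only two code points are modeled, and the U+F022 and ':' pair is included purely as an illustrative assumption; the behavior that matters is that U+F026 passes through untranslated in both directions, so a name read from the server can be sent back unchanged.

```c
#include <stdint.h>

/* Server-to-local direction (hypothetical subset). */
static uint16_t sfm_to_local(uint16_t c)
{
	switch (c) {
	case 0xF022:
		return ':';      /* assumed mapping, for illustration */
	default:
		return c;        /* includes 0xF026: left untranslated */
	}
}

/* Local-to-server direction (hypothetical subset). */
static uint16_t local_to_sfm(uint16_t c)
{
	switch (c) {
	case ':':
		return 0xF022;
	default:
		return c;        /* '\\' is never mapped back to 0xF026 */
	}
}
```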

    Signed-off-by: Jon Kuhn
    Signed-off-by: Steve French
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jon Kuhn
     
  • [ Upstream commit 801660b040d132f67fac6a95910ad307c5929b49 ]

    Test case btrfs/164 reports use-after-free:

    [ 6712.084324] general protection fault: 0000 [#1] PREEMPT SMP
    ..
    [ 6712.195423] btrfs_update_commit_device_size+0x75/0xf0 [btrfs]
    [ 6712.201424] btrfs_commit_transaction+0x57d/0xa90 [btrfs]
    [ 6712.206999] btrfs_rm_device+0x627/0x850 [btrfs]
    [ 6712.211800] btrfs_ioctl+0x2b03/0x3120 [btrfs]

    The reason is that btrfs_shrink_device() adds the resized device to
    fs_devices::resized_devices after it has done its last transaction
    commit.

    So the list fs_devices::resized_devices is not empty when
    btrfs_shrink_device returns. Now the parent function
    btrfs_rm_device calls:

    btrfs_close_bdev(device);
    call_rcu(&device->rcu, free_device_rcu);

    and then does the transaction commit. That commit walks
    fs_devices::resized_devices in btrfs_update_commit_device_size, which
    leads to a use-after-free.

    Fix this by making sure btrfs_shrink_device calls the last needed
    btrfs_commit_transaction before the return. This is consistent with what
    the grow counterpart does and this makes sure the on-disk state is
    persistent when the function returns.

    Reported-by: Lu Fengqi
    Tested-by: Lu Fengqi
    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    [ update changelog ]
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Anand Jain
     

04 Oct, 2018

5 commits

  • [ Upstream commit 09a4e0be5826aa66c4ce9954841f110ffe63ef4f ]

    The largest block size supported by isofs is ISOFS_BLOCK_SIZE (2048), but
    isofs_fill_super calls sb_min_blocksize and sets the blocksize to the
    device's logical block size if it's larger than what we ended up with after
    option parsing.

    If for some reason we try to mount a hard 4k device as an isofs filesystem,
    we'll set opt.blocksize to 4096, and when we try to read the superblock
    we found via:

    block = iso_blknum << (ISOFS_BLOCK_BITS - s->s_blocksize_bits)

    with s_blocksize_bits greater than ISOFS_BLOCK_BITS, we'll have a negative
    shift and the bread will fail somewhat cryptically:

    isofs_fill_super: bread failed, dev=sda, iso_blknum=17, block=-2147483648

    It seems best to just catch and clearly reject mounts of such a device.
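    A sketch of the check, assuming ISOFS_BLOCK_BITS is 11 (matching the 2048-byte ISOFS_BLOCK_SIZE above); iso_to_dev_block() is a hypothetical helper that refuses instead of shifting by a negative amount, which is undefined behavior in C.

```c
#define ISOFS_BLOCK_BITS 11   /* 2048-byte ISO blocks */

/* Hypothetical helper: translate a 2048-byte ISO block number into a
 * device block number, or return -1 when the device block is larger
 * than 2048 bytes (the shift count would be negative). */
static long iso_to_dev_block(unsigned long iso_blknum, unsigned int blocksize_bits)
{
	if (blocksize_bits > ISOFS_BLOCK_BITS)
		return -1;   /* e.g. a 4k-sector device: refuse the mount */
	return (long)(iso_blknum << (ISOFS_BLOCK_BITS - blocksize_bits));
}
```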

    Reported-by: Bryan Gurney
    Signed-off-by: Eric Sandeen
    Signed-off-by: Jan Kara
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Eric Sandeen
     
  • commit 764baba80168ad3adafb521d2ab483ccbc49e344 upstream.

    Commit 31747eda41ef ("ovl: hash directory inodes for fsnotify")
    fixed an issue of inotify watch on directory that stops getting
    events after dropping dentry caches.

    A similar issue exists for non-dir non-upper files, for example:

    $ mkdir -p lower upper work merged
    $ touch lower/foo
    $ mount -t overlay -o lowerdir=lower,workdir=work,upperdir=upper none merged
    $ inotifywait merged/foo &
    $ echo 2 > /proc/sys/vm/drop_caches
    $ cat merged/foo

    inotifywait doesn't get the OPEN event, because ovl_lookup() called
    from 'cat' allocates a new overlay inode and does not reuse the
    watched inode.

    Fix this by hashing non-dir overlay inodes by lower real inode in
    the following cases that were not hashed before this change:
    - A non-upper overlay mount
    - A lower non-hardlink when index=off

    A helper ovl_hash_bylower() was added to put all the logic and
    documentation about which real inode an overlay inode is hashed by
    into one place.
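    The effect of hashing by the lower inode can be sketched with a tiny, hypothetical cache: looking up the same lower inode twice returns the same overlay object, so state attached to it (here a watched flag standing in for an inotify watch) survives. It is a fixed-size open-addressing table with no eviction, meant only as an illustration, not the ovl_hash_bylower() logic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_SLOTS 16

struct toy_ovl_inode {
	const void *lower;   /* key: the lower real inode */
	int watched;         /* stands in for an attached watch */
};

static struct toy_ovl_inode *cache[CACHE_SLOTS];

/* Find the overlay inode hashed by this lower inode, or create it. */
static struct toy_ovl_inode *toy_iget_bylower(const void *lower)
{
	size_t start = ((uintptr_t)lower >> 4) % CACHE_SLOTS;

	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		size_t slot = (start + i) % CACHE_SLOTS;

		if (cache[slot] == NULL) {
			/* Miss: allocate and hash a new overlay inode. */
			struct toy_ovl_inode *oi = calloc(1, sizeof(*oi));

			if (!oi)
				return NULL;
			oi->lower = lower;
			cache[slot] = oi;
			return oi;
		}
		if (cache[slot]->lower == lower)
			return cache[slot];   /* hit: reuse the watched inode */
	}
	return NULL;   /* table full */
}
```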

    The issue dates back to the initial version of overlayfs, but this
    patch depends on ovl_inode code that was introduced in kernel v4.13.

    Cc: #v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Mark Salyzyn #4.14
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     
  • [ Upstream commit 826d7bc9f013d01e92997883d2fd0c25f4af1f1c ]

    If the flock owner process is dead and its pid has already been freed,
    pid translation won't work, but we still want to show the flock owner's
    pid number when inspecting /proc/$PID/fdinfo/$FD in the init pidns.

    Reproducer:
    process A          process A1          process A2
    fork()--------->
    exit()             open()
                       flock()
                       fork()--------->
                       exit()              sleep()

    Before the patch:
    ================
    (root@vz7)/: cat /proc/${PID_A2}/fdinfo/3
    pos: 4
    flags: 02100002
    mnt_id: 257
    lock: (root@vz7)/:

    After the patch:
    ===============
    (root@vz7)/:cat /proc/${PID_A2}/fdinfo/3
    pos: 4
    flags: 02100002
    mnt_id: 295
    lock: 1: FLOCK ADVISORY WRITE ${PID_A1} b6:f8a61:529946 0 EOF

    Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks")
    Signed-off-by: Konstantin Khorenko
    Acked-by: Andrey Vagin
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Jeff Layton
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khorenko
     
  • [ Upstream commit 5b7b15aee641904ae269be9846610a3950cbd64c ]

    We're encoding a single op in the reply but leaving the number of ops
    zero, so the reply makes no sense.

    Somewhat academic as this isn't a case any real client will hit, though
    in theory perhaps that could change in a future protocol extension.

    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    J. Bruce Fields
     
  • [ Upstream commit ebf00be37de35788cad72f4f20b4a39e30c0be4a ]

    According to xfstest generic/240, applications seem to expect direct I/O
    writes to either complete as a whole or to fail; short direct I/O writes
    are apparently not appreciated. This means that when only part of an
    asynchronous direct I/O write succeeds, we can either fail the entire
    write, or we can wait for the partial write to complete and retry the
    remaining write as buffered I/O. The old __blockdev_direct_IO helper
    has code for waiting for partial writes to complete; the new
    iomap_dio_rw iomap helper does not.

    The above mentioned fallback mode is needed for gfs2, which doesn't
    allow block allocations under direct I/O to avoid taking cluster-wide
    exclusive locks. As a consequence, an asynchronous direct I/O write to
    a file range that contains a hole will result in a short write. In that
    case, wait for the short write to complete to allow gfs2 to recover.
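    The overall effect of the fallback can be sketched with hypothetical write callbacks. The sketch does not model asynchronous completion or the wait itself; it only shows that after a short direct write the remainder is finished with buffered I/O, so the caller sees either the full length or an error. The demo stubs simulate a direct write that stops halfway, as if it had hit a hole.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical write callback: returns bytes written or -errno. */
typedef ssize_t (*write_fn)(const char *buf, size_t len, off_t pos);

static ssize_t write_with_fallback(write_fn direct, write_fn buffered,
                                   const char *buf, size_t len, off_t pos)
{
	ssize_t done = direct(buf, len, pos);
	ssize_t rest;

	if (done < 0 || (size_t)done == len)
		return done;   /* error, or the direct write was complete */
	/* Short direct write: finish the remainder with buffered I/O. */
	rest = buffered(buf + done, len - (size_t)done, pos + done);
	if (rest < 0)
		return rest;
	return done + rest;
}

/* Demo stubs: a direct write that stops halfway, and a buffered
 * write that always completes. */
static ssize_t demo_direct_short(const char *buf, size_t len, off_t pos)
{
	(void)buf;
	(void)pos;
	return (ssize_t)(len / 2);
}

static ssize_t demo_buffered(const char *buf, size_t len, off_t pos)
{
	(void)buf;
	(void)pos;
	return (ssize_t)len;
}
```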

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Andreas Gruenbacher
     

29 Sep, 2018

10 commits

  • commit 338affb548c243d2af25b1ca628e67819350de6b upstream.

    When in effect, add "test_dummy_encryption" to _ext4_show_options() so
    that it is shown in /proc/mounts and other relevant procfs files.

    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     
  • commit fe18d649891d813964d3aaeebad873f281627fbc upstream.

    Marking the mmp bh dirty before writing it makes writeback pick up the
    mmp block later and submit another write. We don't want that duplicate
    write, as the kmmpd thread should have full control of reading and
    writing the mmp block.
    Another reason is that we will also get random I/O errors on the
    writeback request when blk integrity is enabled, because kmmpd could
    modify the content of the mmp block (e.g. setting a new seq and time)
    while the mmp block is under I/O requested by writeback.

    Signed-off-by: Li Dongyang
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Li Dongyang
     
  • commit 5f8c10936fab2b69a487400f2872902e597dd320 upstream.

    An online resize of a file system with the bigalloc feature enabled
    and a 1k block size would be refused, since ext4_resize_begin() did not
    understand that s_first_data_block is 0 for all bigalloc file systems,
    even when the block size is 1k.

    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit f0a459dec5495a3580f8d784555e6f8f3bf7f263 upstream.

    Avoid growing the file system to a size where the last block group
    is too small to hold all of the metadata that must be stored in that
    block group.

    This problem can be triggered with the following reproducer:

    umount /mnt
    mke2fs -F -m0 -b 4096 -t ext4 -O resize_inode,^has_journal \
    -E resize=1073741824 /tmp/foo.img 128M
    mount /tmp/foo.img /mnt
    truncate --size 1708M /tmp/foo.img
    resize2fs /dev/loop0 295400
    umount /mnt
    e2fsck -fy /tmp/foo.img
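    The shape of the fix can be approximated by a clamp on the requested block count. This is a hypothetical sketch: clamp_resize_target() is an illustrative name, the per-group metadata overhead is passed in rather than computed, and the real resize code applies more conditions.

```c
#include <stdint.h>

/* If the requested size leaves a final, partial block group smaller
 * than the metadata overhead it must hold, drop that partial group.
 * Assumes n_blocks >= first_data_block and blocks_per_group > 0. */
static uint64_t clamp_resize_target(uint64_t n_blocks, uint64_t first_data_block,
                                    uint64_t blocks_per_group, uint64_t overhead)
{
	uint64_t last = (n_blocks - first_data_block) % blocks_per_group;

	if (last != 0 && last < overhead)
		n_blocks -= last;
	return n_blocks;
}
```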

    Reported-by: Torsten Hilbrich
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 4274f516d4bc50648a4d97e4f67ecbd7b65cde4a upstream.

    When mounting the superblock, ext4_fill_super() calculates the free
    blocks and free inodes and stores them in the superblock. It's not
    strictly necessary, since we don't use them any more, but it's nice to
    keep them roughly aligned to reality.

    Since it's not critical for file system correctness, the code doesn't
    call ext4_commit_super(). The problem is that it's in
    ext4_commit_super() that we recalculate the superblock checksum. So
    if we're not going to call ext4_commit_super(), we need to call
    ext4_superblock_csum_set() to make sure the superblock checksum is
    consistent.

    Most of the time, this doesn't matter, since we end up calling
    ext4_commit_super() very soon thereafter, and definitely by the time
    the file system is unmounted. However, it doesn't work in this
    sequence:

    mke2fs -Fq -t ext4 /dev/vdc 128M
    mount /dev/vdc /vdc
    cp xfstests/git-versions /vdc
    godown /vdc
    umount /vdc
    mount /dev/vdc
    tune2fs -l /dev/vdc

    With this commit, the "tune2fs -l" no longer fails.

    Reported-by: Chengguang Xu
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit bcd8e91f98c156f4b1ebcfacae675f9cfd962441 upstream.

    A maliciously crafted file system can cause an overflow when the
    result of a 64-bit calculation is stored into a 32-bit length
    parameter.
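    A guard of this kind can be sketched as follows; narrow_len() is a hypothetical helper that refuses to store a 64-bit result into a 32-bit length when it would not fit, instead of silently truncating it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard: narrow a 64-bit length only if it fits. */
static bool narrow_len(uint64_t len64, uint32_t *len32)
{
	if (len64 > UINT32_MAX)
		return false;   /* would truncate: reject */
	*len32 = (uint32_t)len64;
	return true;
}
```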

    https://bugzilla.kernel.org/show_bug.cgi?id=200623

    Signed-off-by: Theodore Ts'o
    Reported-by: Wen Xu
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 4d982e25d0bdc83d8c64e66fdeca0b89240b3b85 upstream.

    A specially crafted file system can trick empty_inline_dir() into
    reading past the last valid entry in an inline directory, and then run
    into the end-of-xattr marker. This will trigger a divide-by-zero
    fault. Fix this by using the size of the inline directory instead of
    dir->i_size.

    Also clean up error reporting in __ext4_check_dir_entry so that the
    message is clearer and more understandable --- and avoids the division
    by zero trap if the size passed in is zero. (I'm not sure why we
    coded it that way in the first place; printing offset % size is
    actually more confusing and less useful.)

    https://bugzilla.kernel.org/show_bug.cgi?id=200933

    Signed-off-by: Theodore Ts'o
    Reported-by: Wen Xu
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit b50282f3241acee880514212d88b6049fb5039c8 upstream.

    If the destination of the rename(2) system call exists, the inode's
    link count (i_nlinks) must be non-zero. If it is zero, the inode can end
    up on the orphan list prematurely, leading to all sorts of hilarity,
    including a use-after-free.

    https://bugzilla.kernel.org/show_bug.cgi?id=200931

    Signed-off-by: Theodore Ts'o
    Reported-by: Wen Xu
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 234b69e3e089d850a98e7b3145bd00e9b52b1111 upstream.

    While reading a block, an I/O error may be returned because of an
    underlying storage issue; in that case, BH_NeedsValidate is left set in
    the buffer head. The next time the same block is read, if it has
    already been linked into the journal, the following panic is triggered.
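
    The invariant the fix restores can be modelled with a few flag bits:
    whatever path a read takes, the "needs validate" state must not survive
    a failed I/O. This is a hypothetical sketch; toy_bh, the TOY_BH_* bits
    and toy_read_block() stand in for the buffer head and
    ocfs2_read_blocks() and are not the ocfs2 code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical buffer-head state bits modelled on the names in the text. */
#define TOY_BH_UPTODATE       0x1u
#define TOY_BH_NEEDS_VALIDATE 0x2u

struct toy_bh {
    unsigned int state;
};

/* Simulated read; `io_ok` stands in for the underlying storage result. */
static int toy_read_block(struct toy_bh *bh, bool io_ok)
{
    bh->state |= TOY_BH_NEEDS_VALIDATE;     /* set before submitting I/O */

    if (!io_ok) {
        /* Do not leave the validate flag behind on an I/O error. */
        bh->state &= ~TOY_BH_NEEDS_VALIDATE;
        return -EIO;
    }

    bh->state |= TOY_BH_UPTODATE;
    bh->state &= ~TOY_BH_NEEDS_VALIDATE;    /* cleared once validated */
    return 0;
}
```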

    [203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342!
    [203748.702533] invalid opcode: 0000 [#1] SMP
    [203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod
    [203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 #2
    [203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015
    [203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000
    [203748.703088] RIP: 0010:[] [] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
    [203748.703130] RSP: 0018:ffff88006ff4b818 EFLAGS: 00010206
    [203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000
    [203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe
    [203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0
    [203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000
    [203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000
    [203748.705871] FS: 00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000
    [203748.706370] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670
    [203748.707124] Stack:
    [203748.707371] ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001
    [203748.707885] 0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00
    [203748.708399] 00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000
    [203748.708915] Call Trace:
    [203748.709175] [] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2]
    [203748.709680] [] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2]
    [203748.710185] [] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2]
    [203748.710691] [] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2]
    [203748.711204] [] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2]
    [203748.711716] [] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2]
    [203748.712227] [] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2]
    [203748.712737] [] ocfs2_mknod+0x4b2/0x1370 [ocfs2]
    [203748.713003] [] ocfs2_create+0x65/0x170 [ocfs2]
    [203748.713263] [] vfs_create+0xdb/0x150
    [203748.713518] [] do_last+0x815/0x1210
    [203748.713772] [] ? path_init+0xb9/0x450
    [203748.714123] [] path_openat+0x80/0x600
    [203748.714378] [] ? handle_pte_fault+0xd15/0x1620
    [203748.714634] [] do_filp_open+0x3a/0xb0
    [203748.714888] [] ? __alloc_fd+0xa7/0x130
    [203748.715143] [] do_sys_open+0x12c/0x220
    [203748.715403] [] ? syscall_trace_enter_phase1+0x11b/0x180
    [203748.715668] [] ? system_call_after_swapgs+0xe9/0x190
    [203748.715928] [] SyS_open+0x1e/0x20
    [203748.716184] [] system_call_fastpath+0x18/0xd7
    [203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10
    [203748.717505] RIP [] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
    [203748.717775] RSP

    Joseph previously reported a similar panic.
    Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.html

    Link: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Cc: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Changwei Ge
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Junxiao Bi
     
  • commit f061c1cc404a618858a77aea233fde0aeaad2f2d upstream.

    This reverts commit 11a6fc3dc743e22fb50f2196ec55bee5140d3c52.
    UBIFS wants to assert that xattr operations are only issued on files
    with positive link count. That patch made these operations return
    -ENOENT for unlinked files such that the asserts will no longer trigger.
    This was wrong since xattr operations are perfectly fine on unlinked
    files.
    Instead the assertions need to be fixed/removed.

    Cc:
    Fixes: 11a6fc3dc743 ("ubifs: xattr: Don't operate on deleted inodes")
    Reported-by: Koen Vandeputte
    Tested-by: Joel Stanley
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Richard Weinberger
     

26 Sep, 2018

8 commits

  • [ Upstream commit cc57c07343bd071cdf1915a91a24ab7d40c9b590 ]

    This patch fixes a kernel crash that occurs when configfs_register_group
    has added a group to a tree and userspace then does an rmdir on a
    directory somewhere above that group. The problem is that configfs_rmdir
    detaches everything under it and unlinks groups on the default_groups
    list, but it does not unlink groups added with configfs_register_group.
    So when configfs_unregister_group is later called to drop its references
    to the group/items, we crash when we try to access the freed dentries.

    The patch adds a check for whether an rmdir has been done above us
    and, if so, performs only the unlink part of unregistration.

    Sorry if you are getting this multiple times. I thought I sent
    this to some of you and lkml, but I do not see it.

    Signed-off-by: Mike Christie
    Cc: Christoph Hellwig
    Cc: Joel Becker
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Mike Christie
     
  • [ Upstream commit a6795a585929d94ca3e931bc8518f8deb8bbe627 ]

    The underlying real file used by overlayfs still contains the overlay path.
    This results in mnt_want_write_file() calls by the filesystem getting
    freeze protection on the wrong inode (the overlayfs one instead of the real
    one).

    Fix by using file_inode(file)->i_sb instead of file->f_path.mnt->mnt_sb.

    Reported-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • [ Upstream commit 2f819db565e82e5f73cd42b39925098986693378 ]

    The regset API documentation defines -ENODEV as the result of the
    `->active' handler to be used where the requested feature is not
    available on the hardware found. However, the code handling core file
    note generation in `fill_thread_core_info' interprets any non-zero
    result from the `->active' handler as the requested regset being active.
    Consequently, processing continues (and hopefully fails gracefully later
    on) rather than being abandoned right away for the requested regset.

    Fix the problem then by making the code proceed only if a positive
    result is returned from the `->active' handler.
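
    The difference between the two checks can be shown with stand-in
    handlers. toy_active_fn and the active_* functions are hypothetical;
    the sketch assumes, as the text describes, that a missing handler means
    "treat as active" and that only a positive return value should let
    processing proceed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the regset `->active' handler (NULL = none). */
typedef int (*toy_active_fn)(void);

static int active_unavailable(void) { return -ENODEV; }
static int active_none(void) { return 0; }
static int active_two(void) { return 2; }

/* Old behaviour: any non-zero result, including -ENODEV, counts as active. */
static bool regset_active_buggy(toy_active_fn active)
{
    return active == NULL || active() != 0;
}

/* Fixed behaviour: only a positive result lets note generation proceed. */
static bool regset_active_fixed(toy_active_fn active)
{
    return active == NULL || active() > 0;
}
```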

    Signed-off-by: Maciej W. Rozycki
    Signed-off-by: Paul Burton
    Fixes: 4206d3aa1978 ("elf core dump: notes user_regset")
    Patchwork: https://patchwork.linux-mips.org/patch/19332/
    Cc: Alexander Viro
    Cc: James Hogan
    Cc: Ralf Baechle
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Maciej W. Rozycki
     
  • commit 994b15b983a72e1148a173b61e5b279219bb45ae upstream.

    The previous fix broke recovery of delegated stateids because it assumes
    that if we did not mark the delegation as suspect, then the delegation has
    effectively been revoked, and so it removes that delegation irrespective
    of whether or not it is valid and still in use. While this is "mostly
    harmless" for ordinary I/O, we've seen pNFS fail with LAYOUTGET spinning
    in an infinite loop while complaining that we're using an invalid stateid
    (in this case the all-zero stateid).

    What we rather want to do here is ensure that the delegation is always
    correctly marked as needing testing when that is the case. So we want
    to close the loophole offered by nfs4_schedule_stateid_recovery(),
    which marks the state as needing to be reclaimed, but not the
    delegation that may be backing it.

    Fixes: 0e3d3e5df07dc ("NFSv4.1 fix infinite loop on IO BAD_STATEID error")
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org # v4.11+
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 56446f218af1133c802dad8e9e116f07f381846c upstream.

    The problem is that "entryptr + next_offset" and "entryptr + len + size"
    can wrap. I ended up changing the type of "entryptr" because it makes
    the math easier when we don't have to do so much casting.
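
    One common way to avoid the wrap is to never form the out-of-range
    pointer at all and instead compare lengths against the space remaining
    in the buffer. The sketch below shows that technique with a
    hypothetical toy_range_ok() helper; it is not the actual cifs change,
    only an illustration of the same class of bounds check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: does [entry + len, entry + len + size) lie inside
 * [entry, end)? Subtracting from the remaining length avoids computing
 * `entry + len + size`, which could wrap around for hostile values.
 */
static bool toy_range_ok(const uint8_t *entry, const uint8_t *end,
                         size_t len, size_t size)
{
    size_t remaining;

    if (entry > end)
        return false;
    remaining = (size_t)(end - entry);
    if (len > remaining)
        return false;
    return size <= remaining - len;
}
```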

    Signed-off-by: Dan Carpenter
    Signed-off-by: Steve French
    Reviewed-by: Aurelien Aptel
    Reviewed-by: Pavel Shilovsky
    CC: Stable
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • commit 8ad8aa353524d89fa2e09522f3078166ff78ec42 upstream.

    The "old_entry + le32_to_cpu(pDirInfo->NextEntryOffset)" can wrap
    around so I have added a check for integer overflow.

    Reported-by: Dr Silvio Cesare of InfoSect
    Reviewed-by: Ronnie Sahlberg
    Reviewed-by: Aurelien Aptel
    Signed-off-by: Dan Carpenter
    Signed-off-by: Steve French
    CC: Stable
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • commit 831b624df1b420c8f9281ed1307a8db23afb72df upstream.

    persistent_ram_vmap() returns the page start vaddr.
    persistent_ram_iomap() supports non-page-aligned mapping.

    persistent_ram_buffer_map() always adds offset-in-page to the vaddr
    returned from these two functions, which causes incorrect mapping of
    non-page-aligned persistent RAM buffers.

    By default ftrace_size is 4096 and max_ftrace_cnt is nr_cpu_ids. Without
    this patch, the zone_sz in ramoops_init_przs() is 4096/nr_cpu_ids which
    might not be page-aligned. If the offset-in-page is > 2048, the vaddr
    will be in the next page. If the next page is not mapped, it will cause
    a kernel panic:
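
    A simplified model of the mapping mismatch described above: addresses
    are treated as plain integers with an identity mapping, a "vmap-style"
    helper returns the start of the containing page, and an "iomap-style"
    helper returns the exact start. Adding the offset-in-page
    unconditionally double-counts it for the iomap case, so an offset above
    half a page lands in the next page. The toy_* names are hypothetical,
    and this is one way to express the fix, not necessarily the patch's.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

static uintptr_t toy_offset_in_page(uintptr_t a)
{
    return a & (uintptr_t)(TOY_PAGE_SIZE - 1);
}

/* vmap-style: returns the page-aligned start. */
static uintptr_t toy_vmap(uintptr_t start) { return start - toy_offset_in_page(start); }

/* iomap-style: returns the exact (possibly unaligned) start. */
static uintptr_t toy_iomap(uintptr_t start) { return start; }

/* Buggy: always adds offset-in-page, double-counting it for iomap. */
static uintptr_t toy_buffer_map_buggy(uintptr_t start, bool use_iomap)
{
    uintptr_t base = use_iomap ? toy_iomap(start) : toy_vmap(start);

    return base + toy_offset_in_page(start);
}

/* Fixed: only the page-aligned (vmap-style) result needs the offset. */
static uintptr_t toy_buffer_map_fixed(uintptr_t start, bool use_iomap)
{
    if (use_iomap)
        return toy_iomap(start);
    return toy_vmap(start) + toy_offset_in_page(start);
}
```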

    [ 0.074231] BUG: unable to handle kernel paging request at ffffa19e0081b000
    ...
    [ 0.075000] RIP: 0010:persistent_ram_new+0x1f8/0x39f
    ...
    [ 0.075000] Call Trace:
    [ 0.075000] ramoops_init_przs.part.10.constprop.15+0x105/0x260
    [ 0.075000] ramoops_probe+0x232/0x3a0
    [ 0.075000] platform_drv_probe+0x3e/0xa0
    [ 0.075000] driver_probe_device+0x2cd/0x400
    [ 0.075000] __driver_attach+0xe4/0x110
    [ 0.075000] ? driver_probe_device+0x400/0x400
    [ 0.075000] bus_for_each_dev+0x70/0xa0
    [ 0.075000] driver_attach+0x1e/0x20
    [ 0.075000] bus_add_driver+0x159/0x230
    [ 0.075000] ? do_early_param+0x95/0x95
    [ 0.075000] driver_register+0x70/0xc0
    [ 0.075000] ? init_pstore_fs+0x4d/0x4d
    [ 0.075000] __platform_driver_register+0x36/0x40
    [ 0.075000] ramoops_init+0x12f/0x131
    [ 0.075000] do_one_initcall+0x4d/0x12c
    [ 0.075000] ? do_early_param+0x95/0x95
    [ 0.075000] kernel_init_freeable+0x19b/0x222
    [ 0.075000] ? rest_init+0xbb/0xbb
    [ 0.075000] kernel_init+0xe/0xfc
    [ 0.075000] ret_from_fork+0x3a/0x50

    Signed-off-by: Bin Yang
    [kees: add comments describing the mapping differences, updated commit log]
    Fixes: 24c3d2f342ed ("staging: android: persistent_ram: Make it possible to use memory outside of bootmem")
    Cc: stable@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Bin Yang
     
  • [ Upstream commit e79e0e1428188b24c3b57309ffa54a33c4ae40c4 ]

    Before this patch, you could get into situations like this:

    1. Process 1 searches for X free blocks, finds them, makes a reservation
    2. Process 2 searches for free blocks in the same rgrp, but now the
    bitmap is full because process 1's reservation is skipped over.
    So it marks the bitmap as GBF_FULL.
    3. Process 1 tries to allocate blocks from its own reservation, but
    since the GBF_FULL bit is set, it skips over the rgrp and searches
    elsewhere, thus not using its own reservation.

    This patch adds an additional check to allow processes to use their
    own reservations.
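
    The extra check can be modelled as a single predicate deciding whether a
    bitmap is skipped. toy_bitmap and the skip functions are hypothetical
    stand-ins for the GFS2 bitmap, its GBF_FULL flag and the reservation
    owner; they only illustrate the logic of the three steps above, not the
    actual rgrp search code.

```c
#include <stdbool.h>

/* Hypothetical bitmap with a "full" flag and a single reservation owner. */
struct toy_bitmap {
    bool full;            /* analogue of GBF_FULL */
    unsigned int owner;   /* process holding a reservation here, 0 if none */
};

/* Before: a bitmap marked full is always skipped (step 3 above). */
static bool toy_should_skip_old(const struct toy_bitmap *bi, unsigned int pid)
{
    (void)pid;
    return bi->full;
}

/* After: a process may still allocate from its own reservation. */
static bool toy_should_skip_new(const struct toy_bitmap *bi, unsigned int pid)
{
    return bi->full && bi->owner != pid;
}
```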

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Bob Peterson