30 May, 2018

2 commits

  • commit 8f89c007b6dec16a1793cb88de88fcc02117bbbc upstream.

    shmat()'s SHM_REMAP option forbids passing a nil address; this is in
    fact the very first thing we check for. Andrea reported that for
    SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
    so we need to check again after the address has been rounded down to
    nil. As of this patch, such cases return -EINVAL.
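
    A minimal sketch of the kind of check described, assuming the shmat()
    path rounds the address before mapping (variable and label names
    illustrative, not verbatim kernel code):

    if (shmflg & SHM_RND) {
        addr &= ~(shmlba - 1);      /* round down to an SHMLBA boundary */
        if (!addr && (shmflg & SHM_REMAP))
            goto out;               /* rounded down to nil: return -EINVAL */
    }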

    Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
    Signed-off-by: Davidlohr Bueso
    Reported-by: Andrea Arcangeli
    Cc: Joe Lawrence
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit a73ab244f0dad8fffb3291b905f73e2d3eaa7c00 upstream.

    Patch series "ipc/shm: shmat() fixes around nil-page".

    These patches fix two issues reported[1] a while back by Joe and Andrea
    around how shmat(2) behaves with nil-page.

    The first reverts a commit based on the mistaken idea that mapping the
    nil-page (address=0) with MAP_FIXED is a no-no. This is not the case,
    with the exception of SHM_REMAP, which is addressed in the second patch.

    I chose two patches because it is easier to backport and it explicitly
    reverts the bogus behaviour. Both patches ought to go to -stable, and the
    ltp testcases need updating (the testcase added around the CVE can be
    modified to just test for SHM_RND|SHM_REMAP).

    [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805

    This patch (of 2):

    Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
    worked on the idea that we should not be mapping, as root, addr=0 with
    MAP_FIXED. However, it was reported that this scenario is in fact
    valid; the patch is therefore bogus and also breaks userspace.

    For example, X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
    initialization [1].

    [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
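
    A hedged user-space illustration of the valid use case (the segment and
    the 0x1 address are made up for the example; actually mapping at address
    0 is still subject to mmap_min_addr and privilege checks):

    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

        /* SHM_RND rounds the attach address down to an SHMLBA boundary,
         * so a low address such as 0x1 becomes 0. */
        void *p = shmat(id, (void *)0x1, SHM_RND);

        printf("attached at %p\n", p);
        return 0;
    }
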
    Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
    Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
    Signed-off-by: Davidlohr Bueso
    Reported-by: Joe Lawrence
    Reported-by: Andrea Arcangeli
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     

24 Apr, 2018

1 commit

  • commit 3f05317d9889ab75c7190dcd39491d2a97921984 upstream.

    syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
    shm_get_unmapped_area(), called via sys_remap_file_pages().

    Unfortunately it couldn't generate a reproducer, but I found a bug which
    I think caused it. When remap_file_pages() is passed a full System V
    shared memory segment, the memory is first unmapped, then a new map is
    created using the ->vm_file. Between these steps, the shm ID can be
    removed and reused for a new shm segment. But, shm_mmap() only checks
    whether the ID is currently valid before calling the underlying file's
    ->mmap(); it doesn't check whether it was reused. Thus it can use the
    wrong underlying file, one that was already freed.

    Fix this by making the "outer" shm file (the one that gets put in
    ->vm_file) hold a reference to the real shm file, and by making
    __shm_open() require that the file associated with the shm ID matches
    the one associated with the "outer" file.

    Taking the reference to the real shm file is needed to fully solve the
    problem, since otherwise sfd->file could point to a freed file, which
    then could be reallocated for the reused shm ID, causing the wrong shm
    segment to be mapped (and without the required permission checks).
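
    A simplified sketch of the check described, as it would sit in
    __shm_open() (structure and field names taken from the description
    above; error handling abbreviated):

    /* sfd->file is the real shm file the "outer" file was created against;
     * shp is the segment currently registered under the shm ID. */
    if (shp->shm_file != sfd->file) {
        /* the ID was removed and reused for a different segment */
        shm_unlock(shp);
        return -EINVAL;
    }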

    Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
    shm_mmap()") almost fixed this bug, but it didn't go far enough because
    it didn't consider the case where the shm ID is reused.

    The following program usually reproduces this bug:

    #include <stdlib.h>
    #include <sys/shm.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main()
    {
        int is_parent = (fork() != 0);
        srand(getpid());
        for (;;) {
            int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
            if (is_parent) {
                void *addr = shmat(id, NULL, 0);
                usleep(rand() % 50);
                /* keep remapping over the attached segment until the call
                 * fails; this races with the child's IPC_RMID */
                while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
            } else {
                usleep(rand() % 50);
                shmctl(id, IPC_RMID, NULL);
            }
        }
    }

    It causes the following NULL pointer dereference due to a 'struct file'
    being used while it's being freed. (I couldn't actually get a KASAN
    use-after-free splat like in the syzbot report. But I think it's
    possible with this bug; it would just take a more extraordinary race...)

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP NOPTI
    CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
    RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
    RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
    [...]
    Call Trace:
    file_accessed include/linux/fs.h:2063 [inline]
    shmem_mmap+0x25/0x40 mm/shmem.c:2149
    call_mmap include/linux/fs.h:1789 [inline]
    shm_mmap+0x34/0x80 ipc/shm.c:465
    call_mmap include/linux/fs.h:1789 [inline]
    mmap_region+0x309/0x5b0 mm/mmap.c:1712
    do_mmap+0x294/0x4a0 mm/mmap.c:1483
    do_mmap_pgoff include/linux/mm.h:2235 [inline]
    SYSC_remap_file_pages mm/mmap.c:2853 [inline]
    SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
    do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    [ebiggers@google.com: add comment]
    Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
    Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
    Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439@syzkaller.appspotmail.com
    Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
    Signed-off-by: Eric Biggers
    Acked-by: Kirill A. Shutemov
    Acked-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: "Eric W . Biederman"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     

08 Apr, 2018

1 commit

  • commit 3d942ee079b917b24e2a0c5f18d35ac8ec9fee48 upstream.

    If System V shmget/shmat operations are used to create a hugetlbfs-backed
    mapping, it is possible to munmap part of the mapping and split the
    underlying vma such that it is not huge-page aligned. This will
    ultimately result in the following BUG:

    kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE SMP NR_CPUS=2048 NUMA PowerNV
    Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
    CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G C E 4.15.0-10-generic #11-Ubuntu
    NIP: c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
    REGS: c000003fbcdcf810 TRAP: 0700 Tainted: G C E (4.15.0-10-generic)
    MSR: 9000000000029033 CR: 24002222 XER: 20040000
    CFAR: c00000000036ee44 SOFTE: 1
    NIP __unmap_hugepage_range+0xa4/0x760
    LR __unmap_hugepage_range_final+0x28/0x50
    Call Trace:
    0x7115e4e00000 (unreliable)
    __unmap_hugepage_range_final+0x28/0x50
    unmap_single_vma+0x11c/0x190
    unmap_vmas+0x94/0x140
    exit_mmap+0x9c/0x1d0
    mmput+0xa8/0x1d0
    do_exit+0x360/0xc80
    do_group_exit+0x60/0x100
    SyS_exit_group+0x24/0x30
    system_call+0x58/0x6c
    ---[ end trace ee88f958a1c62605 ]---

    This bug was introduced by commit 31383c6865a5 ("mm, hugetlbfs:
    introduce ->split() to vm_operations_struct"). A split function was
    added to vm_operations_struct to determine if a mapping can be split.
    This was mostly for device-dax and hugetlbfs mappings which have
    specific alignment constraints.

    Mappings initiated via shmget/shmat have their original vm_ops
    overwritten with shm_vm_ops. shm_vm_ops functions will call back to the
    original vm_ops if needed. Add such a split function to shm_vm_ops.
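
    A sketch of what such a forwarding ->split() could look like, modeled on
    the other shm_vm_ops callbacks described above (not necessarily the
    exact upstream code):

    static int shm_split(struct vm_area_struct *vma, unsigned long addr)
    {
        struct file *file = vma->vm_file;
        struct shm_file_data *sfd = shm_file_data(file);

        /* forward to the underlying (e.g. hugetlbfs) vm_ops */
        if (sfd->vm_ops->split)
            return sfd->vm_ops->split(vma, addr);

        return 0;
    }

    /* ...and hooked up as  .split = shm_split,  in shm_vm_ops */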

    Link: http://lkml.kernel.org/r/20180321161314.7711-1-mike.kravetz@oracle.com
    Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
    Signed-off-by: Mike Kravetz
    Reported-by: Laurent Dufour
    Reviewed-by: Laurent Dufour
    Tested-by: Laurent Dufour
    Reviewed-by: Dan Williams
    Acked-by: Michal Hocko
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Mike Kravetz
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
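
    For a C source file such as ipc/shm.c, the added tag is a single comment
    on the first line of the file:

    // SPDX-License-Identifier: GPL-2.0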

    This patch is based on work done by Thomas Gleixner, Kate Stewart, and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - the file had no licensing information in it,
    - the file was a */uapi/* one with no licensing information in it,
    - the file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be applied
    to a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files, created by Philippe Ombredanne. Philippe prepared the
    base worksheet and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis, with 60,537 files
    assessed. Kate Stewart did a file-by-file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was
    not immediately clear with lawyers working with the Linux Foundation.

    The criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
      lines of source.
    - The file already had some variant of a license header in it (even if <5
      lines).
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

26 Sep, 2017

1 commit


21 Sep, 2017

1 commit

  • Commit 553f770ef71b ("ipc: move compat shmctl to native") moved the
    compat IPC syscall handling into ipc/shm.c and refactored the struct
    accessors in the process. Unfortunately, the call to
    copy_compat_shmid_to_user when handling a compat {IPC,SHM}_STAT command
    gets the arguments the wrong way round, passing a kernel stack address
    as the user buffer (destination) and the user buffer as the kernel stack
    address (source).

    This patch fixes the parameter ordering so the buffers are accessed
    correctly.
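
    A hedged sketch of the swap (helper signature and variable names assumed
    from the description above, not verbatim kernel code):

    /* copy_compat_shmid_to_user(user_buf, kernel_struct, version) copies
     * *to* user space, so the user pointer must be the destination. */

    /* buggy: kernel stack address passed as the user buffer */
    err = copy_compat_shmid_to_user(&shmid64, uptr, version);

    /* fixed: user buffer as destination, kernel copy as source */
    err = copy_compat_shmid_to_user(uptr, &shmid64, version);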

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Will Deacon
    Signed-off-by: Al Viro

    Will Deacon
     

15 Sep, 2017

1 commit

  • Pull ipc compat cleanup and 64-bit time_t from Al Viro:
    "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
    Dinamani"

    * 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    utimes: Make utimes y2038 safe
    ipc: shm: Make shmid_kernel timestamps y2038 safe
    ipc: sem: Make sem_array timestamps y2038 safe
    ipc: msg: Make msg_queue timestamps y2038 safe
    ipc: mqueue: Replace timespec with timespec64
    ipc: Make sys_semtimedop() y2038 safe
    get rid of SYSVIPC_COMPAT on ia64
    semtimedop(): move compat to native
    shmat(2): move compat to native
    msgrcv(2), msgsnd(2): move compat to native
    ipc(2): move compat to native
    ipc: make use of compat ipc_perm helpers
    semctl(): move compat to native
    semctl(): separate all layout-dependent copyin/copyout
    msgctl(): move compat to native
    msgctl(): split the actual work from copyin/copyout
    ipc: move compat shmctl to native
    shmctl: split the work from copyin/copyout

    Linus Torvalds
     

09 Sep, 2017

1 commit

    ipc_findkey() used to scan all objects to look for the wanted key. This
    is slow when using a high number of keys. This change adds an rhashtable
    of kern_ipc_perm objects in ipc_ids, so that a single lookup ceases to
    be O(n).

    This change gives an 865% improvement in the reaim.jobs_per_min benchmark
    on a 56-thread Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G
    memory [1].

    Other (more micro) benchmark results, by the author: on an i5 laptop, the
    following loop, executed right after a reboot, took the durations below
    without and with this change:

    for (int i = 0, k = 0x424242; i < KEYS; ++i)
            semget(k++, 1, IPC_CREAT | 0600);

    (all durations in µs)

     KEYS    total without     total with    max single call without    max single call with
        1              3.5            4.9                         3.5                     4.9
       10              7.6            8.6                         3.7                     4.7
       32             16.2           15.9                         4.3                     5.3
      100             72.9           41.8                         3.7                     4.7
     1000          5,630.0          502.0                           *                       *
    10000      1,340,000.0        7,240.0                           *                       *
    31900     17,600,000.0       22,200.0                           *                       *

    *: unreliable measure: high variance

    The duration for a lookup-only usage was obtained by the same loop once
    the keys are present:

    (all durations in µs)

     KEYS    total without     total with    max single call without    max single call with
        1              2.1            2.5                         2.1                     2.5
       10              4.5            4.8                         2.2                     2.3
       32             13.0           10.8                         2.3                     2.8
      100             82.9           25.1                           *                     2.3
     1000          5,780.0          217.0                           *                       *
    10000      1,470,000.0        2,520.0                           *                       *
    31900     17,400,000.0        7,810.0                           *                       *

    Finally, executing each semget() in a new process gave, when still
    summing only the durations of these syscalls:

    creation (all durations in µs):

     KEYS    total without     total with
        1              3.7            5.0
       10             32.9           36.7
       32            125.0          109.0
      100            523.0          353.0
     1000         20,300.0        3,280.0
    10000      2,470,000.0       46,700.0
    31900     27,800,000.0      219,000.0

    lookup-only (all durations in µs):

     KEYS    total without     total with
        1              2.5            2.7
       10             25.4           24.4
       32            106.0           72.6
      100            591.0          352.0
     1000         22,400.0        2,250.0
    10000      2,510,000.0       25,700.0
    31900     28,200,000.0      115,000.0

    [1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop

    Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix
    Signed-off-by: Guillaume Knispel
    Reviewed-by: Marc Pardo
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Cc: Manfred Spraul
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: "Peter Zijlstra (Intel)"
    Cc: Ingo Molnar
    Cc: Sebastian Andrzej Siewior
    Cc: Serge Hallyn
    Cc: Andrey Vagin
    Cc: Guillaume Knispel
    Cc: Marc Pardo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillaume Knispel
     

04 Sep, 2017

1 commit

    time_t is not y2038-safe. Replace all uses of
    time_t with the y2038-safe time64_t.

    Similarly, replace the calls to get_seconds() with the
    y2038-safe ktime_get_real_seconds().
    Note that this preserves fast access on 64-bit systems,
    but 32-bit systems need sequence counters.
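
    Roughly, the pattern of the conversion for a timestamp field such as
    shm_atim (a sketch, not the full patch):

    /* before: 32-bit time_t overflows in 2038 on 32-bit systems */
    time_t          shm_atim;
    shp->shm_atim = get_seconds();

    /* after: 64-bit seconds on all architectures */
    time64_t        shm_atim;
    shp->shm_atim = ktime_get_real_seconds();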

    The syscall interfaces themselves are not changed as part of
    the patch. They will be part of a different series.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Al Viro

    Deepa Dinamani
     

03 Aug, 2017

1 commit

  • When building with the randstruct gcc plugin, the layout of the IPC
    structs will be randomized, which requires any sub-structure accesses to
    use container_of(). The proc display handlers were missing the needed
    container_of()s since the iterator is passing in the top-level struct
    kern_ipc_perm.
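
    A sketch of the fix in the shm proc show handler (the shm_perm member
    name is assumed; the function name is taken from the trace below):

    static int sysvipc_shm_proc_show(struct seq_file *s, void *it)
    {
        struct kern_ipc_perm *ipcp = it;
        struct shmid_kernel *shp;

        /* with randstruct, kern_ipc_perm need not be the first member,
         * so derive the containing object instead of casting */
        shp = container_of(ipcp, struct shmid_kernel, shm_perm);
        /* ... */
    }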

    This would lead to crashes when running the "lsipc" program after the
    system had IPC registered (e.g. after starting up Gnome):

    general protection fault: 0000 [#1] PREEMPT SMP
    ...
    RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0
    ...
    Call Trace:
    sysvipc_shm_proc_show+0x5e/0x150
    sysvipc_proc_show+0x1a/0x30
    seq_read+0x2e9/0x3f0
    ...

    Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast
    Fixes: 3859a271a003 ("randstruct: Mark various structs for randomization")
    Signed-off-by: Kees Cook
    Reported-by: Dominik Brodowski
    Acked-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

16 Jul, 2017

4 commits


13 Jul, 2017

6 commits

  • There is nothing special about the shm_alloc/free routines any more, so
    remove them to make code more readable.

    [manfred@colorfullife.com: Rediff, to continue to keep rcu for free calls after a successful security_shm_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-18-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
    Only after ipc_addid() has succeeded will refcounting be used, so move
    the initialization into ipc_addid() and remove it from the open-coded
    *_alloc() routines.

    Link: http://lkml.kernel.org/r/20170525185107.12869-17-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
    Loosely based on a patch from Kees Cook:
    - id and error can be merged,
    - if operations before ipc_addid() fail, then use call_rcu() directly.

    The difference is that call_rcu() is used for failures after
    security_shm_alloc(), to continue to guarantee an RCU delay for
    security_shm_free().

    Link: http://lkml.kernel.org/r/20170525185107.12869-15-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Instead of using ipc_rcu_alloc() which only performs the refcount bump,
    open code it. This also allows for shmid_kernel structure layout to be
    randomized in the future.

    Link: http://lkml.kernel.org/r/20170525185107.12869-11-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Avoid using ipc_rcu_free, since it just re-finds the original structure
    pointer. For the pre-list-init failure path, there is no RCU needed,
    since it was just allocated. It can be directly freed.

    Link: http://lkml.kernel.org/r/20170525185107.12869-7-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
    ipc has two management structures that exist for every id:
    - struct kern_ipc_perm, which contains e.g. the permissions;
    - struct ipc_rcu, which contains the RCU head for RCU handling and the
      refcount.

    The patch merges both structures.

    As a bonus, we may save one cacheline, because both structures are
    cacheline aligned. In addition, it reduces the number of casts; instead,
    most codepaths can use container_of().

    To simplify the code, ipc_rcu_alloc() initializes the allocation to 0.
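
    Roughly the shape after the merge (field list heavily abbreviated; a
    sketch, not the exact upstream layout):

    struct kern_ipc_perm {
        spinlock_t      lock;
        bool            deleted;
        int             id;
        key_t           key;
        /* ... owner/creator ids, mode, sequence number, security blob ... */

        /* previously in struct ipc_rcu: */
        atomic_t        refcount;
        struct rcu_head rcu;
    } ____cacheline_aligned_in_smp;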

    [manfred@colorfullife.com: really include the memset() into ipc_alloc_rcu()]
    Link: http://lkml.kernel.org/r/564f8612-0601-b267-514f-a9f650ec9b32@colorfullife.com
    Link: http://lkml.kernel.org/r/20170525185107.12869-3-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

06 Jul, 2017

1 commit


09 May, 2017

1 commit

    Clean up early flag and address some minutiae.

    Link: http://lkml.kernel.org/r/1486673582-6979-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

03 Mar, 2017

1 commit

  • Pull vfs pile two from Al Viro:

    - orangefs fix

    - series of fs/namei.c cleanups from me

    - VFS stuff coming from overlayfs tree

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    orangefs: Use RCU for destroy_inode
    vfs: use helper for calling f_op->fsync()
    mm: use helper for calling f_op->mmap()
    vfs: use helpers for calling f_op->{read,write}_iter()
    vfs: pass type instead of fn to do_{loop,iter}_readv_writev()
    vfs: extract common parts of {compat_,}do_readv_writev()
    vfs: wrap write f_ops with file_{start,end}_write()
    vfs: deny copy_file_range() for non regular files
    vfs: deny fallocate() on directory
    vfs: create vfs helper vfs_tmpfile()
    namei.c: split unlazy_walk()
    namei.c: fold the check for DCACHE_OP_REVALIDATE into d_revalidate()
    lookup_fast(): clean up the logics around the fallback to non-rcu mode
    namei: fold unlazy_link() into its sole caller

    Linus Torvalds
     

28 Feb, 2017

1 commit

  • The issue is described here, with a nice testcase:

    https://bugzilla.kernel.org/show_bug.cgi?id=192931

    The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
    the address rounded down to 0. For the regular mmap case, the
    protection mentioned above is that the kernel gets to generate the
    address -- arch_get_unmapped_area() will always check for MAP_FIXED and
    return that address. So by the time we do security_mmap_addr(0) things
    get funky for shmat().

    The testcase itself shows that while a regular user crashes, root will
    not have a problem attaching a nil-page. There are two possible fixes
    for this. The first, which this patch implements, is to simply allow root
    to crash as well -- this is also regular mmap behavior, i.e. when hacking
    up the testcase and adding mmap(... |MAP_FIXED). While this approach
    is the safer option, the second alternative is to ignore SHM_RND if the
    rounded address is 0, thus only having the MAP_SHARED flags. This makes
    the behavior of shmat() identical to the mmap() case. The downside is
    obviously user-visible, but it does make sense in that it maintains the
    semantics of the round-down wrt the 0 address and mmap.

    Passes shm related ltp tests.

    Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Reported-by: Gareth Evans
    Cc: Manfred Spraul
    Cc: Michael Kerrisk
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

25 Feb, 2017

2 commits

    When a non-cooperative userfaultfd monitor copies pages in the
    background, it may encounter regions that were already unmapped.
    The addition of UFFD_EVENT_UNMAP allows the uffd monitor to precisely
    track changes in the virtual memory layout.

    Since there might be different uffd contexts for the affected VMAs, we
    should first create a temporary representation of the unmap event for
    each uffd context and then deliver them one by one to the appropriate
    userfault file descriptors.

    The event notification occurs after the mmap_sem has been released.

    [arnd@arndb.de: fix nommu build]
    Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de
    [mhocko@suse.com: fix nommu build]
    Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Michal Hocko
    Signed-off-by: Arnd Bergmann
    Acked-by: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
    take a vma and vmf parameter when the vma already resides in vmf.

    Remove the vma parameter to simplify things.
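
    In terms of the vm_operations_struct prototypes, the change is roughly:

    /* before: the vma was passed alongside the fault descriptor */
    int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);

    /* after: the vma is reached through vmf->vma */
    int (*fault)(struct vm_fault *vmf);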

    [arnd@arndb.de: fix ARM build]
    Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Darrick J. Wong
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     

20 Feb, 2017

2 commits


15 Dec, 2016

1 commit

    This patch fixes the following warnings:

    WARNING: Missing a blank line after declarations
    WARNING: Block comments use a trailing */ on a separate line
    ERROR: spaces required around that '=' (ctx:WxV)

    The above warnings were reported by checkpatch.pl.

    Link: http://lkml.kernel.org/r/1478604980-18062-1-git-send-email-p.shailesh@samsung.com
    Signed-off-by: Shailesh Pandey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailesh Pandey
     

27 Jul, 2016

2 commits

    We are going to need to call shmem_charge() under tree_lock to get the
    accounting right on collapse of small tmpfs pages into a huge one.

    The problem is that tree_lock is irq-safe, and lockdep is not happy that
    we take an irq-unsafe lock under an irq-safe one [1].

    Let's convert the lock to irq-safe.
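
    The general pattern of such a conversion (assuming the lock in question
    is the shmem inode's info->lock; names illustrative):

    /* before: cannot nest under the irq-safe tree_lock */
    spin_lock(&info->lock);
    /* ... update accounting ... */
    spin_unlock(&info->lock);

    /* after: irq-safe, so it can be taken under tree_lock */
    spin_lock_irq(&info->lock);
    /* ... update accounting ... */
    spin_unlock_irq(&info->lock);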

    [1] https://gist.github.com/kiryl/80c0149e03ed35dfaf26628b8e03cdbc

    Link: http://lkml.kernel.org/r/1466021202-61880-34-git-send-email-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Provide a shmem_get_unmapped_area method in file_operations, called at
    mmap time to decide the mapping address. It could be conditional on
    CONFIG_TRANSPARENT_HUGEPAGE, but save #ifdefs in other places by making
    it unconditional.

    shmem_get_unmapped_area() first calls the usual mm->get_unmapped_area
    (which we treat as a black box, highly dependent on architecture,
    config and executable layout). Lots of conditions, and in most cases it
    just goes with the address that it chose; but when our huge stars are
    rightly aligned, yet that did not provide a suitable address, we go back
    to ask for a larger arena, within which to align the mapping suitably.

    There have to be some direct calls to shmem_get_unmapped_area(), not via
    the file_operations: because of the way shmem_zero_setup() is called to
    create a shmem object late in the mmap sequence, when MAP_SHARED is
    requested with MAP_ANONYMOUS or /dev/zero. Though this only matters
    when /proc/sys/vm/shmem_huge has been set.

    Link: http://lkml.kernel.org/r/1466021202-61880-29-git-send-email-kirill.shutemov@linux.intel.com
    Signed-off-by: Hugh Dickins
    Signed-off-by: Kirill A. Shutemov

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

24 May, 2016

1 commit

    shmat and shmdt rely on mmap_sem for write. If the waiting task gets
    killed by the OOM killer, it would block the oom_reaper from asynchronous
    address space reclaim and reduce the chances of timely OOM resolution.
    Wait for the lock in killable mode and return with EINTR if the task
    got killed while waiting.
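
    The pattern described, sketched for the attach path (error label
    illustrative):

    if (down_write_killable(&current->mm->mmap_sem)) {
        err = -EINTR;
        goto out_fput;
    }
    /* ... perform the mapping ... */
    up_write(&current->mm->mmap_sem);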

    Signed-off-by: Michal Hocko
    Acked-by: Davidlohr Bueso
    Acked-by: Vlastimil Babka
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

19 Feb, 2016

1 commit

    remap_file_pages(2) emulation can reach a file which represents a removed
    IPC ID as long as the memory segment is mapped. This breaks the
    expectations of the IPC subsystem.

    Test case (rewritten to be more human readable, originally autogenerated
    by syzkaller[1]):

    #define _GNU_SOURCE
    #include <stddef.h>
    #include <sys/ipc.h>
    #include <sys/mman.h>
    #include <sys/shm.h>

    #define PAGE_SIZE 4096

    int main()
    {
        int id;
        void *p;

        id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0);
        p = shmat(id, NULL, 0);
        shmctl(id, IPC_RMID, NULL);
        /* emulated remap_file_pages() on a segment whose ID was just removed */
        remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0);

        return 0;
    }

    The patch changes shm_mmap() and code around shm_lock() to propagate
    locking error back to caller of shm_mmap().

    [1] http://github.com/google/syzkaller

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Dmitry Vyukov
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

21 Jan, 2016

1 commit

    Make is_file_shm_hugepages() return bool to improve readability, since
    this particular function only returns either one or zero.

    No functional change.
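
    A sketch of the resulting helper (the huge-page file_operations symbol
    name is assumed):

    bool is_file_shm_hugepages(struct file *file)
    {
        return file->f_op == &shm_file_operations_huge;
    }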

    Signed-off-by: Yaowei Bai
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yaowei Bai
     

01 Oct, 2015

1 commit

  • As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
    having initialized the IPC object state. Yes, we initialize the IPC
    object in a locked state, but with all the lockless RCU lookup work,
    that IPC object lock no longer means that the state cannot be seen.

    We already did this for the IPC semaphore code (see commit e8577d1f0329:
    "ipc/sem.c: fully initialize sem_array before making it visible") but we
    clearly forgot about msg and shm.

    Reported-by: Dmitry Vyukov
    Cc: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Sep, 2015

1 commit

  • Considering Linus' past rants about the (ab)use of BUG in the kernel, I
    took a look at how we deal with such calls in ipc. Given that any errors
    or corruption in ipc code are most likely contained within the set of
    processes participating in the broken mechanisms, there aren't really many
    strong fatal system failure scenarios that would require a BUG call.
    Also, if something is seriously wrong, ipc might not be the place for such
    a BUG either.

    1. For example, recently, a customer hit one of these BUG_ONs in shm
    after failing shm_lock(). A busted ID imho does not merit a BUG_ON,
    and WARN would have been better.

    2. MSG_COPY functionality of SysV msgrcv(2) for checkpoint/restore.
    I don't see how we can hit this anyway -- at least it should be IS_ERR.
    The 'copy' arg from do_msgrcv is always set by calling prepare_copy()
    first and foremost. We could also probably drop this check altogether.
    Either way, it does not merit a BUG_ON.

    3. No ->fault() callback for the fs getting the corresponding page --
    seems selfish to make the system unusable.

    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

07 Aug, 2015

1 commit

  • The shm implementation internally uses shmem or hugetlbfs inodes for shm
    segments. As these inodes are never directly exposed to userspace and
    only accessed through the shm operations which are already hooked by
    security modules, mark the inodes with the S_PRIVATE flag so that inode
    security initialization and permission checking is skipped.
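
    The core of the change is a one-line flag on the backing inode (a
    sketch; where exactly it is set depends on the shmem/hugetlbfs setup
    helpers):

    /* kernel-internal inode: LSMs skip inode init and permission checks */
    inode->i_flags |= S_PRIVATE;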

    This was motivated by the following lockdep warning:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G W
    -------------------------------------------------------
    httpd/1597 is trying to acquire lock:
    (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130
    but task is already holding lock:
    (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180
    which lock already depends on the new lock.
    the existing dependency chain (in reverse order) is:
    -> #3 (&mm->mmap_sem){++++++}:
    lock_acquire+0xc7/0x270
    __might_fault+0x7a/0xa0
    filldir+0x9e/0x130
    xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
    xfs_readdir+0x1b4/0x330 [xfs]
    xfs_file_readdir+0x2b/0x30 [xfs]
    iterate_dir+0x97/0x130
    SyS_getdents+0x91/0x120
    entry_SYSCALL_64_fastpath+0x12/0x76
    -> #2 (&xfs_dir_ilock_class){++++.+}:
    lock_acquire+0xc7/0x270
    down_read_nested+0x57/0xa0
    xfs_ilock+0x167/0x350 [xfs]
    xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
    xfs_attr_get+0xbd/0x190 [xfs]
    xfs_xattr_get+0x3d/0x70 [xfs]
    generic_getxattr+0x4f/0x70
    inode_doinit_with_dentry+0x162/0x670
    sb_finish_set_opts+0xd9/0x230
    selinux_set_mnt_opts+0x35c/0x660
    superblock_doinit+0x77/0xf0
    delayed_superblock_init+0x10/0x20
    iterate_supers+0xb3/0x110
    selinux_complete_init+0x2f/0x40
    security_load_policy+0x103/0x600
    sel_write_load+0xc1/0x750
    __vfs_write+0x37/0x100
    vfs_write+0xa9/0x1a0
    SyS_write+0x58/0xd0
    entry_SYSCALL_64_fastpath+0x12/0x76
    ...

    Signed-off-by: Stephen Smalley
    Reported-by: Morten Stevens
    Acked-by: Hugh Dickins
    Acked-by: Paul Moore
    Cc: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Prarit Bhargava
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Smalley
     

01 Jul, 2015

2 commits

  • ... to ipc_obtain_object_idr, which is more meaningful and makes the code
    slightly easier to follow.

    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
    Upon every shm_lock call, we BUG_ON if an error was returned, indicating
    a race either in the idr or in shm_destroy. Move this logic into the
    locking.

    [akpm@linux-foundation.org: simplify code]
    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso