05 Sep, 2018

1 commit

  • [ Upstream commit f075faa300acc4f6301e348acde0a4580ed5f77c ]

    In order for load/store tearing prevention to work, _all_ accesses to
    the variable in question need to go through the READ_ONCE() and
    WRITE_ONCE() macros. Ensure this is done consistently for the
    q->status variable in semtimedop().
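
    As a minimal sketch (simplified from the semtimedop() wait/wake path),
    the intent is that every access to q->status goes through the
    annotated helpers:

    /* waker side: publish the result without store tearing */
    WRITE_ONCE(q->status, error);

    /* sleeper side: read the result without load tearing */
    error = READ_ONCE(q->status);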

    Link: http://lkml.kernel.org/r/20180717052654.676-1-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     

30 May, 2018

2 commits

  • commit 8f89c007b6dec16a1793cb88de88fcc02117bbbc upstream.

    shmat()'s SHM_REMAP option forbids passing a nil address; this is in
    fact the very first thing we check for. Andrea reported that for
    SHM_RND|SHM_REMAP cases we can end up bypassing that initial addr
    check, so we need to check again after the address has been rounded
    down to nil. As of this patch, such cases will return -EINVAL.
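
    A sketch of the added check in do_shmat(), assuming the surrounding
    rounding logic (the exact code may differ slightly):

    if (shmflg & SHM_RND) {
            addr &= ~(shmlba - 1);          /* round down to SHMLBA */
            /* ensure the round-down is non-nil when remapping */
            if (!addr && (shmflg & SHM_REMAP))
                    goto out;               /* err is -EINVAL here */
    }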

    Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
    Signed-off-by: Davidlohr Bueso
    Reported-by: Andrea Arcangeli
    Cc: Joe Lawrence
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit a73ab244f0dad8fffb3291b905f73e2d3eaa7c00 upstream.

    Patch series "ipc/shm: shmat() fixes around nil-page".

    These patches fix two issues reported[1] a while back by Joe and Andrea
    around how shmat(2) behaves with nil-page.

    The first patch reverts a commit based on the incorrect assumption that
    mapping the nil page (address=0) with MAP_FIXED was not allowed. This
    is not the case, with the exception of SHM_REMAP, which is addressed
    in the second patch.

    I chose two patches because it is easier to backport and it explicitly
    reverts bogus behaviour. Both patches ought to go to -stable, and the
    ltp testcases need to be updated (the testcase added for the CVE can be
    modified to just test for SHM_RND|SHM_REMAP).

    [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805

    This patch (of 2):

    Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
    worked on the idea that, as root, we should not be mapping addr=0 with
    MAP_FIXED. However, it was reported that this scenario is in fact
    valid, making the patch both bogus and a userspace regression.

    For example, X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
    initialization[1].

    [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
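
    A minimal userspace sketch of that pattern (hypothetical shmid; SHM_RND
    rounds the address 1 down to the nil page):

    #include <sys/shm.h>

    void *map_lowmem(int shmid)
    {
            /* address 1 is rounded down to 0 by SHM_RND */
            return shmat(shmid, (void *)1, SHM_RND);
    }
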
    Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
    Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
    Signed-off-by: Davidlohr Bueso
    Reported-by: Joe Lawrence
    Reported-by: Andrea Arcangeli
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     

24 Apr, 2018

1 commit

  • commit 3f05317d9889ab75c7190dcd39491d2a97921984 upstream.

    syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
    shm_get_unmapped_area(), called via sys_remap_file_pages().

    Unfortunately it couldn't generate a reproducer, but I found a bug which
    I think caused it. When remap_file_pages() is passed a full System V
    shared memory segment, the memory is first unmapped, then a new map is
    created using the ->vm_file. Between these steps, the shm ID can be
    removed and reused for a new shm segment. But, shm_mmap() only checks
    whether the ID is currently valid before calling the underlying file's
    ->mmap(); it doesn't check whether it was reused. Thus it can use the
    wrong underlying file, one that was already freed.

    Fix this by making the "outer" shm file (the one that gets put in
    ->vm_file) hold a reference to the real shm file, and by making
    __shm_open() require that the file associated with the shm ID matches
    the one associated with the "outer" file.
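
    A sketch of the identity check implied for __shm_open() (names and
    surrounding locking approximated):

    shp = shm_lock(sfd->ns, sfd->id);
    if (IS_ERR(shp))
            return PTR_ERR(shp);
    if (shp->shm_file != sfd->file) {
            /* the ID was removed and reused for a different segment */
            shm_unlock(shp);
            return -EINVAL;
    }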

    Taking the reference to the real shm file is needed to fully solve the
    problem, since otherwise sfd->file could point to a freed file, which
    then could be reallocated for the reused shm ID, causing the wrong shm
    segment to be mapped (and without the required permission checks).

    Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
    shm_mmap()") almost fixed this bug, but it didn't go far enough because
    it didn't consider the case where the shm ID is reused.

    The following program usually reproduces this bug:

    #include <stdlib.h>
    #include <sys/shm.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main()
    {
            int is_parent = (fork() != 0);
            srand(getpid());
            for (;;) {
                    int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
                    if (is_parent) {
                            void *addr = shmat(id, NULL, 0);
                            usleep(rand() % 50);
                            while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
                    } else {
                            usleep(rand() % 50);
                            shmctl(id, IPC_RMID, NULL);
                    }
            }
    }

    It causes the following NULL pointer dereference due to a 'struct file'
    being used while it's being freed. (I couldn't actually get a KASAN
    use-after-free splat like in the syzbot report. But I think it's
    possible with this bug; it would just take a more extraordinary race...)

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP NOPTI
    CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
    RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
    RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
    [...]
    Call Trace:
    file_accessed include/linux/fs.h:2063 [inline]
    shmem_mmap+0x25/0x40 mm/shmem.c:2149
    call_mmap include/linux/fs.h:1789 [inline]
    shm_mmap+0x34/0x80 ipc/shm.c:465
    call_mmap include/linux/fs.h:1789 [inline]
    mmap_region+0x309/0x5b0 mm/mmap.c:1712
    do_mmap+0x294/0x4a0 mm/mmap.c:1483
    do_mmap_pgoff include/linux/mm.h:2235 [inline]
    SYSC_remap_file_pages mm/mmap.c:2853 [inline]
    SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
    do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    [ebiggers@google.com: add comment]
    Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
    Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
    Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439@syzkaller.appspotmail.com
    Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
    Signed-off-by: Eric Biggers
    Acked-by: Kirill A. Shutemov
    Acked-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: "Eric W . Biederman"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     

08 Apr, 2018

1 commit

  • commit 3d942ee079b917b24e2a0c5f18d35ac8ec9fee48 upstream.

    If System V shmget/shmat operations are used to create a hugetlbfs
    backed mapping, it is possible to munmap part of the mapping and split
    the underlying vma such that it is not huge page aligned. This will
    ultimately result in the following BUG:

    kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE SMP NR_CPUS=2048 NUMA PowerNV
    Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
    CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G C E 4.15.0-10-generic #11-Ubuntu
    NIP: c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
    REGS: c000003fbcdcf810 TRAP: 0700 Tainted: G C E (4.15.0-10-generic)
    MSR: 9000000000029033 CR: 24002222 XER: 20040000
    CFAR: c00000000036ee44 SOFTE: 1
    NIP __unmap_hugepage_range+0xa4/0x760
    LR __unmap_hugepage_range_final+0x28/0x50
    Call Trace:
    0x7115e4e00000 (unreliable)
    __unmap_hugepage_range_final+0x28/0x50
    unmap_single_vma+0x11c/0x190
    unmap_vmas+0x94/0x140
    exit_mmap+0x9c/0x1d0
    mmput+0xa8/0x1d0
    do_exit+0x360/0xc80
    do_group_exit+0x60/0x100
    SyS_exit_group+0x24/0x30
    system_call+0x58/0x6c
    ---[ end trace ee88f958a1c62605 ]---

    This bug was introduced by commit 31383c6865a5 ("mm, hugetlbfs:
    introduce ->split() to vm_operations_struct"). A split function was
    added to vm_operations_struct to determine if a mapping can be split.
    This was mostly for device-dax and hugetlbfs mappings which have
    specific alignment constraints.

    Mappings initiated via shmget/shmat have their original vm_ops
    overwritten with shm_vm_ops. shm_vm_ops functions will call back to the
    original vm_ops if needed. Add such a split function to shm_vm_ops.
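
    A sketch of what such a pass-through ->split() handler looks like
    (names approximated from the description above):

    static int shm_split(struct vm_area_struct *vma, unsigned long addr)
    {
            struct shm_file_data *sfd = shm_file_data(vma->vm_file);

            if (sfd->vm_ops->split)
                    return sfd->vm_ops->split(vma, addr);
            return 0;
    }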

    Link: http://lkml.kernel.org/r/20180321161314.7711-1-mike.kravetz@oracle.com
    Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
    Signed-off-by: Mike Kravetz
    Reported-by: Laurent Dufour
    Reviewed-by: Laurent Dufour
    Tested-by: Laurent Dufour
    Reviewed-by: Dan Williams
    Acked-by: Michal Hocko
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Mike Kravetz
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
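
    For a C source file this amounts to a single comment on the first
    line, for example:

    // SPDX-License-Identifier: GPL-2.0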

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be applied
    to a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

21 Sep, 2017

1 commit

  • Commit 553f770ef71b ("ipc: move compat shmctl to native") moved the
    compat IPC syscall handling into ipc/shm.c and refactored the struct
    accessors in the process. Unfortunately, the call to
    copy_compat_shmid_to_user when handling a compat {IPC,SHM}_STAT command
    gets the arguments the wrong way round, passing a kernel stack address
    as the user buffer (destination) and the user buffer as the kernel stack
    address (source).

    This patch fixes the parameter ordering so the buffers are accessed
    correctly.
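
    A sketch of the fix, with hypothetical local names (the helper's first
    argument is the user-space destination, the second the kernel-side
    source structure):

    /* before (buggy): kernel stack address passed as the user destination */
    err = copy_compat_shmid_to_user(&shmid64, uptr, version);
    /* after (fixed): user buffer is the destination, kernel struct the source */
    err = copy_compat_shmid_to_user(uptr, &shmid64, version);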

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Will Deacon
    Signed-off-by: Al Viro

    Will Deacon
     

15 Sep, 2017

1 commit

  • Pull ipc compat cleanup and 64-bit time_t from Al Viro:
    "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
    Dinamani"

    * 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    utimes: Make utimes y2038 safe
    ipc: shm: Make shmid_kernel timestamps y2038 safe
    ipc: sem: Make sem_array timestamps y2038 safe
    ipc: msg: Make msg_queue timestamps y2038 safe
    ipc: mqueue: Replace timespec with timespec64
    ipc: Make sys_semtimedop() y2038 safe
    get rid of SYSVIPC_COMPAT on ia64
    semtimedop(): move compat to native
    shmat(2): move compat to native
    msgrcv(2), msgsnd(2): move compat to native
    ipc(2): move compat to native
    ipc: make use of compat ipc_perm helpers
    semctl(): move compat to native
    semctl(): separate all layout-dependent copyin/copyout
    msgctl(): move compat to native
    msgctl(): split the actual work from copyin/copyout
    ipc: move compat shmctl to native
    shmctl: split the work from copyin/copyout

    Linus Torvalds
     

09 Sep, 2017

6 commits

  • ipc_findkey() used to scan all objects to look for the wanted key. This
    is slow when using a high number of keys. This change adds an rhashtable
    of kern_ipc_perm objects in ipc_ids, so that key lookup ceases to be O(n).
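
    A simplified sketch of what the keyed lookup becomes with the rhashtable
    (helper and table names approximated; the real code lives in ipc/util.c):

    static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
    {
            struct kern_ipc_perm *ipcp;

            /* hashed lookup instead of scanning every object */
            ipcp = rhashtable_lookup_fast(&ids->key_ht, &key, ipc_kht_params);
            return ipcp;
    }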

    This change gives an 865% improvement in the reaim.jobs_per_min benchmark
    on a 56-thread Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G of
    memory [1].

    Other (more micro) benchmark results, by the author: on an i5 laptop, the
    following loop, executed right after a reboot, took the times below
    without and with this change:

    for (int i = 0, k=0x424242; i < KEYS; ++i)
            semget(k++, 1, IPC_CREAT | 0600);

                    total         total      max single   max single
     KEYS         without          with    call without    call with

        1             3.5            4.9 µs          3.5          4.9
       10             7.6            8.6 µs          3.7          4.7
       32            16.2           15.9 µs          4.3          5.3
      100            72.9           41.8 µs          3.7          4.7
     1000         5,630.0          502.0 µs            *            *
    10000     1,340,000.0        7,240.0 µs            *            *
    31900    17,600,000.0       22,200.0 µs            *            *

    *: unreliable measure: high variance

    The duration for a lookup-only usage was obtained by the same loop once
    the keys are present:

                    total         total      max single   max single
     KEYS         without          with    call without    call with

        1             2.1            2.5 µs          2.1          2.5
       10             4.5            4.8 µs          2.2          2.3
       32            13.0           10.8 µs          2.3          2.8
      100            82.9           25.1 µs            *          2.3
     1000         5,780.0          217.0 µs            *            *
    10000     1,470,000.0        2,520.0 µs            *            *
    31900    17,400,000.0        7,810.0 µs            *            *

    Finally, executing each semget() in a new process gave, when still
    summing only the durations of these syscalls:

    creation:
                    total         total
     KEYS         without          with

        1             3.7            5.0 µs
       10            32.9           36.7 µs
       32           125.0          109.0 µs
      100           523.0          353.0 µs
     1000        20,300.0        3,280.0 µs
    10000     2,470,000.0       46,700.0 µs
    31900    27,800,000.0      219,000.0 µs

    lookup-only:
                    total         total
     KEYS         without          with

        1             2.5            2.7 µs
       10            25.4           24.4 µs
       32           106.0           72.6 µs
      100           591.0          352.0 µs
     1000        22,400.0        2,250.0 µs
    10000     2,510,000.0       25,700.0 µs
    31900    28,200,000.0      115,000.0 µs

    [1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop

    Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix
    Signed-off-by: Guillaume Knispel
    Reviewed-by: Marc Pardo
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Cc: Manfred Spraul
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: "Peter Zijlstra (Intel)"
    Cc: Ingo Molnar
    Cc: Sebastian Andrzej Siewior
    Cc: Serge Hallyn
    Cc: Andrey Vagin
    Cc: Guillaume Knispel
    Cc: Marc Pardo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillaume Knispel
     
  • Replacing semop()'s kmalloc with kvmalloc was originally proposed by
    Manfred on the premise that it can be called for large (larger than
    order-1) sizes. For example, while Oracle recommends setting SEMOPM to
    a _minimum_ of 100, some distros[1] encourage the setting to be a
    factor of the number of db tasks (PROCESSES), which can get fishy for
    large systems (easily going beyond 1000).

    [1] An Example of Semaphore Settings
    https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Setting_Semaphores-An_Example_of_Semaphore_Settings.html

    So let's just convert this to kvmalloc, just like the rest of the
    allocations we do in ipc. While the vmalloc fallback obviously involves
    more overhead, it is by far the uncommon path, and it's better for the
    user than just erroring out with kmalloc.
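
    A minimal sketch of the conversion in the semtimedop() path (allocation
    size expression approximated):

    struct sembuf *sops;

    sops = kvmalloc(sizeof(*sops) * nsops, GFP_KERNEL);
    if (sops == NULL)
            return -ENOMEM;
    /* ... perform the operations ... */
    kvfree(sops);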

    Link: http://lkml.kernel.org/r/20170803184136.13855-2-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • ... 'tis not used.

    Link: http://lkml.kernel.org/r/20170803184136.13855-1-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • refcount_t type and corresponding API should be used instead of atomic_t
    when the variable is used as a reference counter. This helps avoid
    accidental refcounter overflows that might lead to use-after-free
    situations.
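
    A generic sketch of the conversion pattern (field and function names
    are illustrative, not the exact ipc code):

    /* before */
    atomic_t refcount;
    atomic_set(&p->refcount, 1);
    if (atomic_dec_and_test(&p->refcount))
            free_object(p);

    /* after */
    refcount_t refcount;
    refcount_set(&p->refcount, 1);
    if (refcount_dec_and_test(&p->refcount))
            free_object(p);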

    Link: http://lkml.kernel.org/r/1499417992-3238-4-git-send-email-elena.reshetova@intel.com
    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Cc: Peter Zijlstra
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Ingo Molnar
    Cc: Alexey Dobriyan
    Cc: Serge Hallyn
    Cc:
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Elena Reshetova
     
  • refcount_t type and corresponding API should be used instead of atomic_t
    when the variable is used as a reference counter. This helps avoid
    accidental refcounter overflows that might lead to use-after-free
    situations.

    Link: http://lkml.kernel.org/r/1499417992-3238-3-git-send-email-elena.reshetova@intel.com
    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Cc: Peter Zijlstra
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Ingo Molnar
    Cc: Alexey Dobriyan
    Cc: Serge Hallyn
    Cc:
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Elena Reshetova
     
  • refcount_t type and corresponding API should be used instead of atomic_t
    when the variable is used as a reference counter. This helps avoid
    accidental refcounter overflows that might lead to use-after-free
    situations.

    Link: http://lkml.kernel.org/r/1499417992-3238-2-git-send-email-elena.reshetova@intel.com
    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Cc: Peter Zijlstra
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Ingo Molnar
    Cc: Alexey Dobriyan
    Cc: Serge Hallyn
    Cc:
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Elena Reshetova
     

04 Sep, 2017

5 commits

  • time_t is not y2038 safe. Replace all uses of
    time_t by y2038 safe time64_t.

    Similarly, replace the calls to get_seconds() with
    y2038 safe ktime_get_real_seconds().
    Note that this preserves fast access on 64 bit systems,
    but 32 bit systems need sequence counters.
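
    A sketch of the conversion pattern (the shm_atim field is one example;
    the sem and msg timestamps are handled the same way):

    /* before: time_t / unsigned long, wraps in 2038 on 32-bit */
    shp->shm_atim = get_seconds();

    /* after: time64_t */
    shp->shm_atim = ktime_get_real_seconds();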

    The syscall interfaces themselves are not changed as part of
    the patch. They will be part of a different series.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Al Viro

    Deepa Dinamani
     
  • time_t is not y2038 safe. Replace all uses of
    time_t by y2038 safe time64_t.

    Similarly, replace the calls to get_seconds() with
    y2038 safe ktime_get_real_seconds().
    Note that this preserves fast access on 64 bit systems,
    but 32 bit systems need sequence counters.

    The syscall interfaces themselves are not changed as part of
    the patch. They will be part of a different series.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Al Viro

    Deepa Dinamani
     
  • time_t is not y2038 safe. Replace all uses of
    time_t by y2038 safe time64_t.

    Similarly, replace the calls to get_seconds() with
    y2038 safe ktime_get_real_seconds().
    Note that this preserves fast access on 64 bit systems,
    but 32 bit systems need sequence counters.

    The syscall interfaces themselves are not changed as part of
    the patch. They will be part of a different series.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Al Viro

    Deepa Dinamani
     
  • struct timespec is not y2038 safe. Replace
    all uses of timespec by y2038 safe struct timespec64.

    Even though timespec is used here to represent timeouts,
    replace these with timespec64 so as to facilitate
    verification by creating a y2038 safe kernel image
    that is free of timespec.

    The syscall interfaces themselves are not changed as part
    of the patch. They will be part of a different series.

    Signed-off-by: Deepa Dinamani
    Cc: Paul Moore
    Cc: Richard Guy Briggs
    Reviewed-by: Richard Guy Briggs
    Reviewed-by: Arnd Bergmann
    Acked-by: Paul Moore
    Signed-off-by: Al Viro

    Deepa Dinamani
     
  • struct timespec is not y2038 safe on 32 bit machines.
    Replace timespec with y2038 safe struct timespec64.

    Note that the patch only changes the internals without
    modifying the syscall interface. This will be part
    of a separate series.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Al Viro

    Deepa Dinamani
     

17 Aug, 2017

1 commit

  • There is no agreed-upon definition of spin_unlock_wait()'s semantics,
    and it appears that all callers could do just as well with a lock/unlock
    pair. This commit therefore replaces the spin_unlock_wait() call in
    exit_sem() with spin_lock() followed immediately by spin_unlock().
    This should be safe from a performance perspective because exit_sem()
    is rarely invoked in production.
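
    A sketch of the replacement in exit_sem() (lock variable name
    approximated):

    /* before */
    spin_unlock_wait(&ulp->lock);

    /* after: acquire and immediately release instead */
    spin_lock(&ulp->lock);
    spin_unlock(&ulp->lock);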

    Signed-off-by: Paul E. McKenney
    Cc: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Will Deacon
    Cc: Peter Zijlstra
    Cc: Alan Stern
    Cc: Andrea Parri
    Cc: Linus Torvalds
    Acked-by: Manfred Spraul

    Paul E. McKenney
     

03 Aug, 2017

1 commit

  • When building with the randstruct gcc plugin, the layout of the IPC
    structs will be randomized, which requires any sub-structure accesses to
    use container_of(). The proc display handlers were missing the needed
    container_of()s since the iterator is passing in the top-level struct
    kern_ipc_perm.
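
    A sketch of the kind of change needed in the proc show handlers (based
    on the shm case; exact code may differ):

    static int sysvipc_shm_proc_show(struct seq_file *s, void *it)
    {
            struct kern_ipc_perm *ipcp = it;
            struct shmid_kernel *shp;

            shp = container_of(ipcp, struct shmid_kernel, shm_perm);
            /* ... print fields from shp ... */
    }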

    This would lead to crashes when running the "lsipc" program after the
    system had IPC registered (e.g. after starting up Gnome):

    general protection fault: 0000 [#1] PREEMPT SMP
    ...
    RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0
    ...
    Call Trace:
    sysvipc_shm_proc_show+0x5e/0x150
    sysvipc_proc_show+0x1a/0x30
    seq_read+0x2e9/0x3f0
    ...

    Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast
    Fixes: 3859a271a003 ("randstruct: Mark various structs for randomization")
    Signed-off-by: Kees Cook
    Reported-by: Dominik Brodowski
    Acked-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

13 Jul, 2017

6 commits

  • Now that ipc_rcu_alloc() and ipc_rcu_free() are removed, document when
    it is valid to use ipc_getref() and ipc_putref().

    Link: http://lkml.kernel.org/r/20170525185107.12869-21-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The remaining users of __sem_free() can simply call kvfree() instead for
    better readability.

    [manfred@colorfullife.com: Rediff to keep rcu protection for security_sem_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-20-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • There is nothing special about the msg_alloc/free routines any more, so
    remove them to make code more readable.

    [manfred@colorfullife.com: Rediff to keep rcu protection for security_msg_queue_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-19-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • There is nothing special about the shm_alloc/free routines any more, so
    remove them to make code more readable.

    [manfred@colorfullife.com: Rediff, to continue to keep rcu for free calls after a successful security_shm_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-18-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Only after ipc_addid() has succeeded will refcounting be used, so move
    the initialization into ipc_addid() and remove it from the open-coded
    *_alloc() routines.
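
    A sketch of the resulting initialization inside ipc_addid() (field name
    approximated, surrounding code omitted):

    int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
    {
            /* ... id allocation ... */
            refcount_set(&new->refcount, 1);
            /* ... insert into the idr and return the id ... */
    }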

    Link: http://lkml.kernel.org/r/20170525185107.12869-17-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Loosely based on a patch from Kees Cook:
    - id and retval can be merged
    - if ipc_addid() fails, then use call_rcu() directly.

    The difference is that call_rcu() is used for failed ipc_addid() calls,
    to continue to guarantee an RCU delay for security_msg_queue_free().
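
    A sketch of the error path this describes (callback name approximated):

    retval = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni);
    if (retval < 0) {
            /* keep the RCU delay before security_msg_queue_free() */
            call_rcu(&msq->q_perm.rcu, msg_rcu_free);
            return retval;
    }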

    Link: http://lkml.kernel.org/r/20170525185107.12869-16-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul