02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

15 Sep, 2017

1 commit

  • Pull ipc compat cleanup and 64-bit time_t from Al Viro:
    "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
    Dinamani"

    * 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    utimes: Make utimes y2038 safe
    ipc: shm: Make shmid_kernel timestamps y2038 safe
    ipc: sem: Make sem_array timestamps y2038 safe
    ipc: msg: Make msg_queue timestamps y2038 safe
    ipc: mqueue: Replace timespec with timespec64
    ipc: Make sys_semtimedop() y2038 safe
    get rid of SYSVIPC_COMPAT on ia64
    semtimedop(): move compat to native
    shmat(2): move compat to native
    msgrcv(2), msgsnd(2): move compat to native
    ipc(2): move compat to native
    ipc: make use of compat ipc_perm helpers
    semctl(): move compat to native
    semctl(): separate all layout-dependent copyin/copyout
    msgctl(): move compat to native
    msgctl(): split the actual work from copyin/copyout
    ipc: move compat shmctl to native
    shmctl: split the work from copyin/copyout

    Linus Torvalds
     

09 Sep, 2017

1 commit

  • ipc_findkey() used to scan all objects to look for the wanted key. This
    is slow when using a high number of keys. This change adds an rhashtable
    of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n).

    This change gives a 865% improvement of benchmark reaim.jobs_per_min on a
    56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1]

    Other (more micro) benchmark results, by the author: On an i5 laptop, the
    following loop executed right after a reboot took, without and with this
    change:

    for (int i = 0, k=0x424242; i < KEYS; ++i)
    semget(k++, 1, IPC_CREAT | 0600);

    total total max single max single
    KEYS without with call without call with

    1 3.5 4.9 µs 3.5 4.9
    10 7.6 8.6 µs 3.7 4.7
    32 16.2 15.9 µs 4.3 5.3
    100 72.9 41.8 µs 3.7 4.7
    1000 5,630.0 502.0 µs * *
    10000 1,340,000.0 7,240.0 µs * *
    31900 17,600,000.0 22,200.0 µs * *

    *: unreliable measure: high variance

    The duration for a lookup-only usage was obtained by the same loop once
    the keys are present:

    total total max single max single
    KEYS without with call without call with

    1 2.1 2.5 µs 2.1 2.5
    10 4.5 4.8 µs 2.2 2.3
    32 13.0 10.8 µs 2.3 2.8
    100 82.9 25.1 µs * 2.3
    1000 5,780.0 217.0 µs * *
    10000 1,470,000.0 2,520.0 µs * *
    31900 17,400,000.0 7,810.0 µs * *

    Finally, executing each semget() in a new process gave, when still
    summing only the durations of these syscalls:

    creation:
    total total
    KEYS without with

    1 3.7 5.0 µs
    10 32.9 36.7 µs
    32 125.0 109.0 µs
    100 523.0 353.0 µs
    1000 20,300.0 3,280.0 µs
    10000 2,470,000.0 46,700.0 µs
    31900 27,800,000.0 219,000.0 µs

    lookup-only:
    total total
    KEYS without with

    1 2.5 2.7 µs
    10 25.4 24.4 µs
    32 106.0 72.6 µs
    100 591.0 352.0 µs
    1000 22,400.0 2,250.0 µs
    10000 2,510,000.0 25,700.0 µs
    31900 28,200,000.0 115,000.0 µs

    [1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop

    Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix
    Signed-off-by: Guillaume Knispel
    Reviewed-by: Marc Pardo
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Cc: Manfred Spraul
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: "Peter Zijlstra (Intel)"
    Cc: Ingo Molnar
    Cc: Sebastian Andrzej Siewior
    Cc: Serge Hallyn
    Cc: Andrey Vagin
    Cc: Guillaume Knispel
    Cc: Marc Pardo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillaume Knispel
     

04 Sep, 2017

1 commit

  • time_t is not y2038 safe. Replace all uses of
    time_t by y2038 safe time64_t.

    Similarly, replace the calls to get_seconds() with
    y2038 safe ktime_get_real_seconds().
    Note that this preserves fast access on 64 bit systems,
    but 32 bit systems need sequence counters.

    The syscall interfaces themselves are not changed as part of
    the patch. They will be part of a different series.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Al Viro

    Deepa Dinamani
     

03 Aug, 2017

1 commit

  • When building with the randstruct gcc plugin, the layout of the IPC
    structs will be randomized, which requires any sub-structure accesses to
    use container_of(). The proc display handlers were missing the needed
    container_of()s since the iterator is passing in the top-level struct
    kern_ipc_perm.

    This would lead to crashes when running the "lsipc" program after the
    system had IPC registered (e.g. after starting up Gnome):

    general protection fault: 0000 [#1] PREEMPT SMP
    ...
    RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0
    ...
    Call Trace:
    sysvipc_shm_proc_show+0x5e/0x150
    sysvipc_proc_show+0x1a/0x30
    seq_read+0x2e9/0x3f0
    ...

    Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast
    Fixes: 3859a271a003 ("randstruct: Mark various structs for randomization")
    Signed-off-by: Kees Cook
    Reported-by: Dominik Brodowski
    Acked-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

16 Jul, 2017

4 commits


13 Jul, 2017

6 commits

  • There is nothing special about the msg_alloc/free routines any more, so
    remove them to make code more readable.

    [manfred@colorfullife.com: Rediff to keep rcu protection for security_msg_queue_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-19-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Only after ipc_addid() has succeeded will refcounting be used, so move
    initialization into ipc_addid() and remove from open-coded *_alloc()
    routines.

    Link: http://lkml.kernel.org/r/20170525185107.12869-17-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Loosely based on a patch from Kees Cook :
    - id and retval can be merged
    - if ipc_addid() fails, then use call_rcu() directly.

    The difference is that call_rcu is used for failed ipc_addid() calls, to
    continue to guaranteed an rcu delay for security_msg_queue_free().

    Link: http://lkml.kernel.org/r/20170525185107.12869-16-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Instead of using ipc_rcu_alloc() which only performs the refcount bump,
    open code it. This also allows for msg_queue structure layout to be
    randomized in the future.

    Link: http://lkml.kernel.org/r/20170525185107.12869-12-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Avoid using ipc_rcu_free, since it just re-finds the original structure
    pointer. For the pre-list-init failure path, there is no RCU needed,
    since it was just allocated. It can be directly freed.

    Link: http://lkml.kernel.org/r/20170525185107.12869-8-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • ipc has two management structures that exist for every id:
    - struct kern_ipc_perm, it contains e.g. the permissions.
    - struct ipc_rcu, it contains the rcu head for rcu handling and the
    refcount.

    The patch merges both structures.

    As a bonus, we may save one cacheline, because both structures are
    cacheline aligned. In addition, it reduces the number of casts, instead
    most codepaths can use container_of.

    To simplify code, the ipc_rcu_alloc initializes the allocation to 0.

    [manfred@colorfullife.com: really include the memset() into ipc_alloc_rcu()]
    Link: http://lkml.kernel.org/r/564f8612-0601-b267-514f-a9f650ec9b32@colorfullife.com
    Link: http://lkml.kernel.org/r/20170525185107.12869-3-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

02 Mar, 2017

2 commits


15 Dec, 2016

1 commit

  • When LONG_MIN is passed to msgrcv, one would expect to recieve any
    message. But convert_mode does *msgtyp = -*msgtyp and -LONG_MIN is
    undefined. In particular, with my gcc -LONG_MIN produces -LONG_MIN
    again.

    So handle this case properly by assigning LONG_MAX to *msgtyp if
    LONG_MIN was specified as msgtyp to msgrcv.

    This code:
    long msg[] = { 100, 200 };
    int m = msgget(IPC_PRIVATE, IPC_CREAT | 0644);
    msgsnd(m, &msg, sizeof(msg), 0);
    msgrcv(m, &msg, sizeof(msg), LONG_MIN, 0);

    produces currently nothing:

    msgget(IPC_PRIVATE, IPC_CREAT|0644) = 65538
    msgsnd(65538, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
    msgrcv(65538, ...

    Except a UBSAN warning:

    UBSAN: Undefined behaviour in ipc/msg.c:745:13
    negation of -9223372036854775808 cannot be represented in type 'long int':

    With the patch, I see what I expect:

    msgget(IPC_PRIVATE, IPC_CREAT|0644) = 0
    msgsnd(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
    msgrcv(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, -9223372036854775808, 0) = 16

    Link: http://lkml.kernel.org/r/20161024082633.10148-1-jslaby@suse.cz
    Signed-off-by: Jiri Slaby
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

21 Nov, 2016

1 commit

  • Currently the wake_q data structure is defined by the WAKE_Q() macro.
    This macro, however, looks like a function doing something as "wake" is
    a verb. Even checkpatch.pl was confused as it reported warnings like

    WARNING: Missing a blank line after declarations
    #548: FILE: kernel/futex.c:3665:
    + int ret;
    + WAKE_Q(wake_q);

    This patch renames the WAKE_Q() macro to DEFINE_WAKE_Q() which clarifies
    what the macro is doing and eliminates the checkpatch.pl warnings.

    Signed-off-by: Waiman Long
    Acked-by: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1479401198-1765-1-git-send-email-longman@redhat.com
    [ Resolved conflict and added missing rename. ]
    Signed-off-by: Ingo Molnar

    Waiman Long
     

12 Oct, 2016

4 commits

  • Blocked tasks queued in q_senders waiting for their message to fit in the
    queue are blindly awoken every time we think there's a remote chance this
    might happen. This could cause numerous (and expensive -- thundering
    herd-ish) bogus wakeups if the queue is still really full. Adding to the
    scheduling cost/overhead, there's also the fact that we need to take the
    ipc object lock and requeue ourselves in the q_senders list.

    By keeping track of the blocked sender's message size, we can know
    previously if the wakeup ought to occur or not. Otherwise, to maintain
    the current wakeup order we just move it to the tail. This is exactly
    what occurs right now if the sender needs to go back to sleep.

    The case of EIDRM is left completely untouched, as we need to wakeup all
    the tasks, and shouldn't be playing games in the first place.

    This patch was seen to save on the 'msgctl10' ltp testcase ~15% in context
    switches (avg out of ten runs). Although these tests are really about
    functionality (as opposed to performance), is does show the direct
    benefits of the optimization.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • ... 'tis annoying.

    Link: http://lkml.kernel.org/r/1469748819-19484-4-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Currently the use of wake_qs in sysv msg queues are only for the receiver
    tasks that are blocked on the queue. But blocked sender tasks (due to
    queue size constraints) still are awoken with the ipc object lock held,
    which can be a problem particularly for small sized queues and far from
    gracious for -rt (just like it was for the receiver side).

    The paths that actually wakeup a sender are obviously related to when we
    are either getting rid of the queue or after (some) space is freed-up
    after a receiver takes the msg (msgrcv). Furthermore, with the exception
    of msgrcv, we can always piggy-back on expunge_all that has its own tasks
    lined-up for waking. Finally, upon unlinking the message, it should be no
    problem delaying the wakeups a bit until after we've released the lock.

    Link: http://lkml.kernel.org/r/1469748819-19484-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This patch moves the wakeup_process() invocation so it is not done under
    the ipc global lock by making use of a lockless wake_q. With this change,
    the waiter is woken up once the message has been assigned and it does not
    need to loop on SMP if the message points to NULL. In the signal case we
    still need to check the pointer under the lock to verify the state.

    This change should also avoid the introduction of preempt_disable() in -RT
    which avoids a busy-loop which pools for the NULL -> !NULL change if the
    waiter has a higher priority compared to the waker.

    By making use of wake_qs, the logic of sysv msg queues is greatly
    simplified (and very well suited as we can batch lockless wakeups),
    particularly around the lockless receive algorithm.

    This has been tested with Manred's pmsg-shared tool on a "AMD A10-7800
    Radeon R7, 12 Compute Cores 4C+8G":

    test | before | after | diff
    -----------------|------------|------------|----------
    pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 %
    pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 %
    pmsg-shared 2 60 | 22,884,224 | 24,278,200 | + ~6.09 %

    Link: http://lkml.kernel.org/r/1469748819-19484-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sebastian Andrzej Siewior
     

03 Aug, 2016

1 commit

  • Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
    to receive rcu freeing function but used generic ipc_rcu_free() instead
    of msg_rcu_free() which does security cleaning.

    Running LTP msgsnd06 with kmemleak gives the following:

    cat /sys/kernel/debug/kmemleak

    unreferenced object 0xffff88003c0a11f8 (size 8):
    comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
    hex dump (first 8 bytes):
    1b 00 00 00 01 00 00 00 ........
    backtrace:
    kmemleak_alloc+0x23/0x40
    kmem_cache_alloc_trace+0xe1/0x180
    selinux_msg_queue_alloc_security+0x3f/0xd0
    security_msg_queue_alloc+0x2e/0x40
    newque+0x4e/0x150
    ipcget+0x159/0x1b0
    SyS_msgget+0x39/0x40
    entry_SYSCALL_64_fastpath+0x13/0x8f

    Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
    only use ipc_rcu_free in case of security allocation failure in newary()

    Fixes: 53dad6d3a8e ("ipc: fix race with LSMs")
    Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
    Signed-off-by: Fabian Frederick
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: [3.12+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

01 Oct, 2015

1 commit

  • As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
    having initialized the IPC object state. Yes, we initialize the IPC
    object in a locked state, but with all the lockless RCU lookup work,
    that IPC object lock no longer means that the state cannot be seen.

    We already did this for the IPC semaphore code (see commit e8577d1f0329:
    "ipc/sem.c: fully initialize sem_array before making it visible") but we
    clearly forgot about msg and shm.

    Reported-by: Dmitry Vyukov
    Cc: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Jul, 2015

2 commits

  • ... to ipc_obtain_object_idr, which is more meaningful and makes the code
    slightly easier to follow.

    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • We currently use a full barrier on the sender side to to avoid receiver
    tasks disappearing on us while still performing on the sender side wakeup.
    We lack however, the proper CPU-CPU interactions pairing on the receiver
    side which busy-waits for the message. Similarly, we do not need a full
    smp_mb, and can relax the semantics for the writer and reader sides of the
    message. This is safe as we are only ordering loads and stores to r_msg.
    And in both smp_wmb and smp_rmb, there are no stores after the calls
    _anyway_.

    This obviously applies for pipelined_send and expunge_all, for EIRDM when
    destroying a queue.

    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

16 Apr, 2015

1 commit

  • The seq_printf return value, because it's frequently misused,
    will eventually be converted to void.

    See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
    seq_has_overflowed() and make public")

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

14 Dec, 2014

1 commit

  • SysV can be abused to allocate locked kernel memory. For most systems, a
    small limit doesn't make sense, see the discussion with regards to SHMMAX.

    Therefore: increase MSGMNI to the maximum supported.

    And: If we ignore the risk of locking too much memory, then an automatic
    scaling of MSGMNI doesn't make sense. Therefore the logic can be removed.

    The code preserves auto_msgmni to avoid breaking any user space applications
    that expect that the value exists.

    Notes:
    1) If an administrator must limit the memory allocations, then he can set
    MSGMNI as necessary.

    Or he can disable sysv entirely (as e.g. done by Android).

    2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
    to control latency vs. throughput:
    If MSGMNB is large, then msgsnd() just returns and more messages can be queued
    before a task switch to a task that calls msgrcv() is forced.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

07 Jun, 2014

6 commits

  • The need for volatile is not obvious, document it.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Aswin Chandramouleeswaran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Nothing big and no logical changes, just get rid of some redundant
    function declarations. Move msg_[init/exit]_ns down the end of the
    file.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Aswin Chandramouleeswaran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Call __set_current_state() instead of assigning the new state directly.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Aswin Chandramouleeswaran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • trailing whitespace

    Signed-off-by: Paul McQuade
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul McQuade
     
  • Use #include instead of
    Use #include instead of

    Signed-off-by: Paul McQuade
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul McQuade
     
  • There is no need to recreate the very same ipc_ops structure on every
    kernel entry for msgget/semget/shmget. Just declare it static and be
    done with it. While at it, constify it as we don't modify the structure
    at runtime.

    Found in the PaX patch, written by the PaX Team.

    Signed-off-by: Mathias Krause
    Cc: PaX Team
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathias Krause
     

17 Mar, 2014

1 commit

  • While testing and documenting the msgrcv() MSG_COPY flag that Stanislav
    Kinsbursky added in commit 4a674f34ba04 ("ipc: introduce message queue
    copy feature" => kernel 3.8), I discovered a couple of bugs in the
    implementation. The two bugs concern MSG_COPY interactions with other
    msgrcv() flags, namely:

    (A) MSG_COPY + MSG_EXCEPT
    (B) MSG_COPY + !IPC_NOWAIT

    The bugs are distinct (and the fix for the first one is obvious),
    however my fix for both is a single-line patch, which is why I'm
    combining them in a single mail, rather than writing two mails+patches.

    ===== (A) MSG_COPY + MSG_EXCEPT =====

    With the addition of the MSG_COPY flag, there are now two msgrcv()
    flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp'
    argument in unrelated ways. Specifying both in the same call is a
    logical error that is currently permitted, with the effect that MSG_COPY
    has priority and MSG_EXCEPT is ignored. The call should give an error
    if both flags are specified. The patch below implements that behavior.

    ===== (B) (B) MSG_COPY + !IPC_NOWAIT =====

    The test code that was submitted in commit 3a665531a3b7 ("selftests: IPC
    message queue copy feature test") shows MSG_COPY being used in
    conjunction with IPC_NOWAIT. In other words, if there is no message at
    the position 'msgtyp'. return immediately with the error in ENOMSG.

    What was not (fully) tested is the behavior if MSG_COPY is specified
    *without* IPC_NOWAIT, and there is an odd behavior. If the queue
    contains less than 'msgtyp' messages, then the call blocks until the
    next message is written to the queue. At that point, the msgrcv() call
    returns a copy of the newly added message, regardless of whether that
    message is at the ordinal position 'msgtyp'. This is clearly bogus, and
    problematic for applications that might want to make use of the MSG_COPY
    flag.

    I considered the following possible solutions to this problem:

    (1) Force the call to block until a message *does* appear at the
    position 'msgtyp'.

    (2) If the MSG_COPY flag is specified, the kernel should implicitly add
    IPC_NOWAIT, so that the call fails with ENOMSG for this case.

    (3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate
    an error (probably, EINVAL is the right one).

    I do not know if any application would really want to have the
    functionality of solution (1), especially since an application can
    determine in advance the number of messages in the queue using msgctl()
    IPC_STAT. Obviously, this solution would be the most work to implement.

    Solution (2) would have the effect of silently fixing any applications
    that tried to employ broken behavior. However, it would mean that if we
    later decided to implement solution (1), then user-space could not
    easily detect what the kernel supports (but, since I'm somewhat doubtful
    that solution (1) is needed, I'm not sure that this is much of a
    problem).

    Solution (3) would have the effect of informing broken applications that
    they are doing something broken. The downside is that this would cause
    a ABI breakage for any applications that are currently employing the
    broken behavior. However:

    a) Those applications are almost certainly not getting the results they
    expect.
    b) Possibly, those applications don't even exist, because MSG_COPY is
    currently hidden behind CONFIG_CHECKPOINT_RESTORE.

    The upside of solution (3) is that if we later decided to implement
    solution (1), user-space could determine what the kernel supports, via
    the error return.

    In my view, solution (3) is mildly preferable to solution (2), and
    solution (1) could still be done later if anyone really cares. The
    patch below implements solution (3).

    PS. For anyone out there still listening, it's the usual story:
    documenting an API (and the thinking about, and the testing of the API,
    that documentation entails) is the one of the single best ways of
    finding bugs in the API, as I've learned from a lot of experience. Best
    to do that documentation before releasing the API.

    Signed-off-by: Michael Kerrisk
    Acked-by: Stanislav Kinsbursky
    Cc: Stanislav Kinsbursky
    Cc: stable@vger.kernel.org
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    Michael Kerrisk
     

28 Jan, 2014

3 commits

  • Both expunge_all() and pipeline_send() rely on both a nil msg value and
    a full barrier to guarantee the correct ordering when waking up a task.

    While its counterpart at the receiving end is well documented for the
    lockless recv algorithm, we still need to document these specific
    smp_mb() calls.

    [akpm@linux-foundation.org: fix typo, per Mike]
    [akpm@linux-foundation.org: mroe tpyos]
    Signed-off-by: Davidlohr Bueso
    Cc: Aswin Chandramouleeswaran
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The ipc code does not adhere the typical linux coding style.
    This patch fixes lots of simple whitespace errors.

    - mostly autogenerated by
    scripts/checkpatch.pl -f --fix \
    --types=pointer_location,spacing,space_before_tab
    - one manual fixup (keep structure members tab-aligned)
    - removal of additional space_before_tab that were not found by --fix

    Tested with some of my msg and sem test apps.

    Andrew: Could you include it in -mm and move it towards Linus' tree?

    Signed-off-by: Manfred Spraul
    Suggested-by: Li Bin
    Cc: Joe Perches
    Acked-by: Rafael Aquini
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • After the locking semantics for the SysV IPC API got improved, a couple
    of IPC_RMID race windows were opened because we ended up dropping the
    'kern_ipc_perm.deleted' check performed way down in ipc_lock(). The
    spotted races got sorted out by re-introducing the old test within the
    racy critical sections.

    This patch introduces ipc_valid_object() to consolidate the way we cope
    with IPC_RMID races by using the same abstraction across the API
    implementation.

    Signed-off-by: Rafael Aquini
    Acked-by: Rik van Riel
    Acked-by: Greg Thelen
    Reviewed-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     

01 Oct, 2013

1 commit

  • This fixes a race in both msgrcv() and msgsnd() between finding the msg
    and actually dealing with the queue, as another thread can delete shmid
    underneath us if we are preempted before acquiring the
    kern_ipc_perm.lock.

    Manfred illustrates this nicely:

    Assume a preemptible kernel that is preempted just after

    msq = msq_obtain_object_check(ns, msqid)

    in do_msgrcv(). The only lock that is held is rcu_read_lock().

    Now the other thread processes IPC_RMID. When the first task is
    resumed, then it will happily wait for messages on a deleted queue.

    Fix this by checking for if the queue has been deleted after taking the
    lock.

    Signed-off-by: Davidlohr Bueso
    Reported-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Mike Galbraith
    Cc: [3.11]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso