12 Oct, 2016

4 commits

  • Blocked tasks queued in q_senders waiting for their message to fit in the
    queue are blindly awoken every time we think there's a remote chance this
    might happen. This could cause numerous (and expensive -- thundering
    herd-ish) bogus wakeups if the queue is still really full. Adding to the
    scheduling cost/overhead, there's also the fact that we need to take the
    ipc object lock and requeue ourselves in the q_senders list.

    By keeping track of the blocked sender's message size, we can know
    previously if the wakeup ought to occur or not. Otherwise, to maintain
    the current wakeup order we just move it to the tail. This is exactly
    what occurs right now if the sender needs to go back to sleep.

    The case of EIDRM is left completely untouched, as we need to wakeup all
    the tasks, and shouldn't be playing games in the first place.

    This patch was seen to save on the 'msgctl10' ltp testcase ~15% in context
    switches (avg out of ten runs). Although these tests are really about
    functionality (as opposed to performance), is does show the direct
    benefits of the optimization.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • ... 'tis annoying.

    Link: http://lkml.kernel.org/r/1469748819-19484-4-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Currently the use of wake_qs in sysv msg queues are only for the receiver
    tasks that are blocked on the queue. But blocked sender tasks (due to
    queue size constraints) still are awoken with the ipc object lock held,
    which can be a problem particularly for small sized queues and far from
    gracious for -rt (just like it was for the receiver side).

    The paths that actually wakeup a sender are obviously related to when we
    are either getting rid of the queue or after (some) space is freed-up
    after a receiver takes the msg (msgrcv). Furthermore, with the exception
    of msgrcv, we can always piggy-back on expunge_all that has its own tasks
    lined-up for waking. Finally, upon unlinking the message, it should be no
    problem delaying the wakeups a bit until after we've released the lock.

    Link: http://lkml.kernel.org/r/1469748819-19484-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This patch moves the wakeup_process() invocation so it is not done under
    the ipc global lock by making use of a lockless wake_q. With this change,
    the waiter is woken up once the message has been assigned and it does not
    need to loop on SMP if the message points to NULL. In the signal case we
    still need to check the pointer under the lock to verify the state.

    This change should also avoid the introduction of preempt_disable() in -RT
    which avoids a busy-loop which pools for the NULL -> !NULL change if the
    waiter has a higher priority compared to the waker.

    By making use of wake_qs, the logic of sysv msg queues is greatly
    simplified (and very well suited as we can batch lockless wakeups),
    particularly around the lockless receive algorithm.

    This has been tested with Manred's pmsg-shared tool on a "AMD A10-7800
    Radeon R7, 12 Compute Cores 4C+8G":

    test | before | after | diff
    -----------------|------------|------------|----------
    pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 %
    pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 %
    pmsg-shared 2 60 | 22,884,224 | 24,278,200 | + ~6.09 %

    Link: http://lkml.kernel.org/r/1469748819-19484-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sebastian Andrzej Siewior
     

03 Aug, 2016

1 commit

  • Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
    to receive rcu freeing function but used generic ipc_rcu_free() instead
    of msg_rcu_free() which does security cleaning.

    Running LTP msgsnd06 with kmemleak gives the following:

    cat /sys/kernel/debug/kmemleak

    unreferenced object 0xffff88003c0a11f8 (size 8):
    comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
    hex dump (first 8 bytes):
    1b 00 00 00 01 00 00 00 ........
    backtrace:
    kmemleak_alloc+0x23/0x40
    kmem_cache_alloc_trace+0xe1/0x180
    selinux_msg_queue_alloc_security+0x3f/0xd0
    security_msg_queue_alloc+0x2e/0x40
    newque+0x4e/0x150
    ipcget+0x159/0x1b0
    SyS_msgget+0x39/0x40
    entry_SYSCALL_64_fastpath+0x13/0x8f

    Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
    only use ipc_rcu_free in case of security allocation failure in newary()

    Fixes: 53dad6d3a8e ("ipc: fix race with LSMs")
    Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
    Signed-off-by: Fabian Frederick
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: [3.12+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

01 Oct, 2015

1 commit

  • As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
    having initialized the IPC object state. Yes, we initialize the IPC
    object in a locked state, but with all the lockless RCU lookup work,
    that IPC object lock no longer means that the state cannot be seen.

    We already did this for the IPC semaphore code (see commit e8577d1f0329:
    "ipc/sem.c: fully initialize sem_array before making it visible") but we
    clearly forgot about msg and shm.

    Reported-by: Dmitry Vyukov
    Cc: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Jul, 2015

2 commits

  • ... to ipc_obtain_object_idr, which is more meaningful and makes the code
    slightly easier to follow.

    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • We currently use a full barrier on the sender side to to avoid receiver
    tasks disappearing on us while still performing on the sender side wakeup.
    We lack however, the proper CPU-CPU interactions pairing on the receiver
    side which busy-waits for the message. Similarly, we do not need a full
    smp_mb, and can relax the semantics for the writer and reader sides of the
    message. This is safe as we are only ordering loads and stores to r_msg.
    And in both smp_wmb and smp_rmb, there are no stores after the calls
    _anyway_.

    This obviously applies for pipelined_send and expunge_all, for EIRDM when
    destroying a queue.

    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

16 Apr, 2015

1 commit

  • The seq_printf return value, because it's frequently misused,
    will eventually be converted to void.

    See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
    seq_has_overflowed() and make public")

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

14 Dec, 2014

1 commit

  • SysV can be abused to allocate locked kernel memory. For most systems, a
    small limit doesn't make sense, see the discussion with regards to SHMMAX.

    Therefore: increase MSGMNI to the maximum supported.

    And: If we ignore the risk of locking too much memory, then an automatic
    scaling of MSGMNI doesn't make sense. Therefore the logic can be removed.

    The code preserves auto_msgmni to avoid breaking any user space applications
    that expect that the value exists.

    Notes:
    1) If an administrator must limit the memory allocations, then he can set
    MSGMNI as necessary.

    Or he can disable sysv entirely (as e.g. done by Android).

    2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
    to control latency vs. throughput:
    If MSGMNB is large, then msgsnd() just returns and more messages can be queued
    before a task switch to a task that calls msgrcv() is forced.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

07 Jun, 2014

6 commits

  • The need for volatile is not obvious, document it.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Aswin Chandramouleeswaran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Nothing big and no logical changes, just get rid of some redundant
    function declarations. Move msg_[init/exit]_ns down the end of the
    file.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Aswin Chandramouleeswaran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Call __set_current_state() instead of assigning the new state directly.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Aswin Chandramouleeswaran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • trailing whitespace

    Signed-off-by: Paul McQuade
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul McQuade
     
  • Use #include instead of
    Use #include instead of

    Signed-off-by: Paul McQuade
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul McQuade
     
  • There is no need to recreate the very same ipc_ops structure on every
    kernel entry for msgget/semget/shmget. Just declare it static and be
    done with it. While at it, constify it as we don't modify the structure
    at runtime.

    Found in the PaX patch, written by the PaX Team.

    Signed-off-by: Mathias Krause
    Cc: PaX Team
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathias Krause
     

17 Mar, 2014

1 commit

  • While testing and documenting the msgrcv() MSG_COPY flag that Stanislav
    Kinsbursky added in commit 4a674f34ba04 ("ipc: introduce message queue
    copy feature" => kernel 3.8), I discovered a couple of bugs in the
    implementation. The two bugs concern MSG_COPY interactions with other
    msgrcv() flags, namely:

    (A) MSG_COPY + MSG_EXCEPT
    (B) MSG_COPY + !IPC_NOWAIT

    The bugs are distinct (and the fix for the first one is obvious),
    however my fix for both is a single-line patch, which is why I'm
    combining them in a single mail, rather than writing two mails+patches.

    ===== (A) MSG_COPY + MSG_EXCEPT =====

    With the addition of the MSG_COPY flag, there are now two msgrcv()
    flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp'
    argument in unrelated ways. Specifying both in the same call is a
    logical error that is currently permitted, with the effect that MSG_COPY
    has priority and MSG_EXCEPT is ignored. The call should give an error
    if both flags are specified. The patch below implements that behavior.

    ===== (B) (B) MSG_COPY + !IPC_NOWAIT =====

    The test code that was submitted in commit 3a665531a3b7 ("selftests: IPC
    message queue copy feature test") shows MSG_COPY being used in
    conjunction with IPC_NOWAIT. In other words, if there is no message at
    the position 'msgtyp'. return immediately with the error in ENOMSG.

    What was not (fully) tested is the behavior if MSG_COPY is specified
    *without* IPC_NOWAIT, and there is an odd behavior. If the queue
    contains less than 'msgtyp' messages, then the call blocks until the
    next message is written to the queue. At that point, the msgrcv() call
    returns a copy of the newly added message, regardless of whether that
    message is at the ordinal position 'msgtyp'. This is clearly bogus, and
    problematic for applications that might want to make use of the MSG_COPY
    flag.

    I considered the following possible solutions to this problem:

    (1) Force the call to block until a message *does* appear at the
    position 'msgtyp'.

    (2) If the MSG_COPY flag is specified, the kernel should implicitly add
    IPC_NOWAIT, so that the call fails with ENOMSG for this case.

    (3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate
    an error (probably, EINVAL is the right one).

    I do not know if any application would really want to have the
    functionality of solution (1), especially since an application can
    determine in advance the number of messages in the queue using msgctl()
    IPC_STAT. Obviously, this solution would be the most work to implement.

    Solution (2) would have the effect of silently fixing any applications
    that tried to employ broken behavior. However, it would mean that if we
    later decided to implement solution (1), then user-space could not
    easily detect what the kernel supports (but, since I'm somewhat doubtful
    that solution (1) is needed, I'm not sure that this is much of a
    problem).

    Solution (3) would have the effect of informing broken applications that
    they are doing something broken. The downside is that this would cause
    a ABI breakage for any applications that are currently employing the
    broken behavior. However:

    a) Those applications are almost certainly not getting the results they
    expect.
    b) Possibly, those applications don't even exist, because MSG_COPY is
    currently hidden behind CONFIG_CHECKPOINT_RESTORE.

    The upside of solution (3) is that if we later decided to implement
    solution (1), user-space could determine what the kernel supports, via
    the error return.

    In my view, solution (3) is mildly preferable to solution (2), and
    solution (1) could still be done later if anyone really cares. The
    patch below implements solution (3).

    PS. For anyone out there still listening, it's the usual story:
    documenting an API (and the thinking about, and the testing of the API,
    that documentation entails) is the one of the single best ways of
    finding bugs in the API, as I've learned from a lot of experience. Best
    to do that documentation before releasing the API.

    Signed-off-by: Michael Kerrisk
    Acked-by: Stanislav Kinsbursky
    Cc: Stanislav Kinsbursky
    Cc: stable@vger.kernel.org
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    Michael Kerrisk
     

28 Jan, 2014

3 commits

  • Both expunge_all() and pipeline_send() rely on both a nil msg value and
    a full barrier to guarantee the correct ordering when waking up a task.

    While its counterpart at the receiving end is well documented for the
    lockless recv algorithm, we still need to document these specific
    smp_mb() calls.

    [akpm@linux-foundation.org: fix typo, per Mike]
    [akpm@linux-foundation.org: mroe tpyos]
    Signed-off-by: Davidlohr Bueso
    Cc: Aswin Chandramouleeswaran
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The ipc code does not adhere the typical linux coding style.
    This patch fixes lots of simple whitespace errors.

    - mostly autogenerated by
    scripts/checkpatch.pl -f --fix \
    --types=pointer_location,spacing,space_before_tab
    - one manual fixup (keep structure members tab-aligned)
    - removal of additional space_before_tab that were not found by --fix

    Tested with some of my msg and sem test apps.

    Andrew: Could you include it in -mm and move it towards Linus' tree?

    Signed-off-by: Manfred Spraul
    Suggested-by: Li Bin
    Cc: Joe Perches
    Acked-by: Rafael Aquini
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • After the locking semantics for the SysV IPC API got improved, a couple
    of IPC_RMID race windows were opened because we ended up dropping the
    'kern_ipc_perm.deleted' check performed way down in ipc_lock(). The
    spotted races got sorted out by re-introducing the old test within the
    racy critical sections.

    This patch introduces ipc_valid_object() to consolidate the way we cope
    with IPC_RMID races by using the same abstraction across the API
    implementation.

    Signed-off-by: Rafael Aquini
    Acked-by: Rik van Riel
    Acked-by: Greg Thelen
    Reviewed-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     

01 Oct, 2013

1 commit

  • This fixes a race in both msgrcv() and msgsnd() between finding the msg
    and actually dealing with the queue, as another thread can delete shmid
    underneath us if we are preempted before acquiring the
    kern_ipc_perm.lock.

    Manfred illustrates this nicely:

    Assume a preemptible kernel that is preempted just after

    msq = msq_obtain_object_check(ns, msqid)

    in do_msgrcv(). The only lock that is held is rcu_read_lock().

    Now the other thread processes IPC_RMID. When the first task is
    resumed, then it will happily wait for messages on a deleted queue.

    Fix this by checking for if the queue has been deleted after taking the
    lock.

    Signed-off-by: Davidlohr Bueso
    Reported-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Mike Galbraith
    Cc: [3.11]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

25 Sep, 2013

1 commit

  • Currently, IPC mechanisms do security and auditing related checks under
    RCU. However, since security modules can free the security structure,
    for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
    race if the structure is freed before other tasks are done with it,
    creating a use-after-free condition. Manfred illustrates this nicely,
    for instance with shared mem and selinux:

    -> do_shmat calls rcu_read_lock()
    -> do_shmat calls shm_object_check().
    Checks that the object is still valid - but doesn't acquire any locks.
    Then it returns.
    -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
    -> selinux_shm_shmat calls ipc_has_perm()
    -> ipc_has_perm accesses ipc_perms->security

    shm_close()
    -> shm_close acquires rw_mutex & shm_lock
    -> shm_close calls shm_destroy
    -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
    -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
    -> ipc_free_security calls kfree(ipc_perms->security)

    This patch delays the freeing of the security structures after all RCU
    readers are done. Furthermore it aligns the security life cycle with
    that of the rest of IPC - freeing them based on the reference counter.
    For situations where we need not free security, the current behavior is
    kept. Linus states:

    "... the old behavior was suspect for another reason too: having the
    security blob go away from under a user sounds like it could cause
    various other problems anyway, so I think the old code was at least
    _prone_ to bugs even if it didn't have catastrophic behavior."

    I have tested this patch with IPC testcases from LTP on both my
    quad-core laptop and on a 64 core NUMA server. In both cases selinux is
    enabled, and tests pass for both voluntary and forced preemption models.
    While the mentioned races are theoretical (at least no one as reported
    them), I wanted to make sure that this new logic doesn't break anything
    we weren't aware of.

    Suggested-by: Linus Torvalds
    Signed-off-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

12 Sep, 2013

2 commits

  • There is only one user left, drop this function and just call
    ipc_unlock_object() and rcu_read_unlock().

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Since in some situations the lock can be shared for readers, we shouldn't
    be calling it a mutex, rename it to rwsem.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

04 Sep, 2013

1 commit

  • The check if the queue is full and adding current to the wait queue of
    pending msgsnd() operations (ss_add()) must be atomic.

    Otherwise:
    - the thread that performs msgsnd() finds a full queue and decides to
    sleep.
    - the thread that performs msgrcv() first reads all messages from the
    queue and then sleeps, because the queue is empty.
    - the msgrcv() calls do not perform any wakeups, because the msgsnd()
    task has not yet called ss_add().
    - then the msgsnd()-thread first calls ss_add() and then sleeps.

    Net result: msgsnd() and msgrcv() both sleep forever.

    Observed with msgctl08 from ltp with a preemptible kernel.

    Fix: Call ipc_lock_object() before performing the check.

    The patch also moves security_msg_queue_msgsnd() under ipc_lock_object:
    - msgctl(IPC_SET) explicitely mentions that it tries to expunge any
    pending operations that are not allowed anymore with the new
    permissions. If security_msg_queue_msgsnd() is called without locks,
    then there might be races.
    - it makes the patch much simpler.

    Reported-and-tested-by: Vineet Gupta
    Acked-by: Rik van Riel
    Cc: stable@vger.kernel.org # for 3.11
    Signed-off-by: Manfred Spraul
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

29 Aug, 2013

1 commit

  • According to 'man msgrcv': "If msgtyp is less than 0, the first message of
    the lowest type that is less than or equal to the absolute value of msgtyp
    shall be received."

    Bug: The kernel only returns a message if its type is 1; other messages
    with type < abs(msgtype) will never get returned.

    Fix: After having traversed the list to find the first message with the
    lowest type, we need to actually return that message.

    This regression was introduced by commit daaf74cf0867 ("ipc: refactor
    msg list search into separate function")

    Signed-off-by: Svenning Soerensen
    Reviewed-by: Peter Hurley
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Svenning Sørensen
     

10 Jul, 2013

10 commits

  • We can now drop the msg_lock and msg_lock_check functions along with a
    bogus comment introduced previously in semctl_down.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • do_msgrcv() is the last msg queue function that abuses the ipc lock Take
    it only when needed when actually updating msq.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • do_msgsnd() is another function that does too many things with the ipc
    object lock acquired. Take it only when needed when actually updating
    msq.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • While the INFO cmd doesn't take the ipc lock, the STAT commands do
    acquire it unnecessarily. We can do the permissions and security checks
    only holding the rcu lock.

    This function now mimics semctl_nolock().

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Add msq_obtain_object() and msq_obtain_object_check(), which will allow
    us to get the ipc object without acquiring the lock. Just as with
    semaphores, these functions are basically wrappers around
    ipc_obtain_object*().

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Similar to semctl, when calling msgctl, the *_INFO and *_STAT commands
    can be performed without acquiring the ipc object.

    Add a msgctl_nolock() function and move the logic of *_INFO and *_STAT
    out of msgctl(). This change still takes the lock and it will be
    properly lockless in the next patch

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Instead of holding the ipc lock for the entire function, use the
    ipcctl_pre_down_nolock and only acquire the lock for specific commands:
    RMID and SET.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This function currently acquires both the rw_mutex and the rcu lock on
    successful lookups, leaving the callers to explicitly unlock them,
    creating another two level locking situation.

    Make the callers (including those that still use ipcctl_pre_down())
    explicitly lock and unlock the rwsem and rcu lock.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This patchset continues the work that began in the sysv ipc semaphore
    scaling series, see

    https://lkml.org/lkml/2013/3/20/546

    Just like semaphores used to be, sysv shared memory and msg queues also
    abuse the ipc lock, unnecessarily holding it for operations such as
    permission and security checks.

    This patchset mostly deals with mqueues, and while shared mem can be
    done in a very similar way, I want to get these patches out in the open
    first. It also does some pending cleanups, mostly focused on the two
    level locking we have in ipc code, taking care of ipc_addid() and
    ipcctl_pre_down_nolock() - yes there are still functions that need to be
    updated as well.

    This patch:

    Make all callers explicitly take and release the RCU read lock.

    This addresses the two level locking seen in newary(), newseg() and
    newqueue(). For the last two, explicitly unlock the ipc object and the
    rcu lock, instead of calling the custom shm_unlock and msg_unlock
    functions. The next patch will deal with the open coded locking for
    ->perm.lock

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

01 May, 2013

4 commits

  • The ipc/msg.c code does its list operations by hand and it open-codes the
    accesses, instead of using for_each_entry_[safe].

    Signed-off-by: Nikola Pajkovsky
    Cc: Stanislav Kinsbursky
    Cc: "Eric W. Biederman"
    Cc: Peter Hurley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikola Pajkovsky
     
  • Introduce finer grained locking for semtimedop, to handle the common case
    of a program wanting to manipulate one semaphore from an array with
    multiple semaphores.

    If the call is a semop manipulating just one semaphore in an array with
    multiple semaphores, only take the lock for that semaphore itself.

    If the call needs to manipulate multiple semaphores, or another caller is
    in a transaction that manipulates multiple semaphores, the sem_array lock
    is taken, as well as all the locks for the individual semaphores.

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores, look like this:

    vanilla Davidlohr's Davidlohr's + Davidlohr's +
    threads patches rwlock patches v3 patches
    10 610652 726325 1783589 2142206
    20 341570 365699 1520453 1977878
    30 288102 307037 1498167 2037995
    40 290714 305955 1612665 2256484
    50 288620 312890 1733453 2650292
    60 289987 306043 1649360 2388008
    70 291298 306347 1723167 2717486
    80 290948 305662 1729545 2763582
    90 290996 306680 1736021 2757524
    100 292243 306700 1773700 3059159

    [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
    [davidlohr.bueso@hp.com: make refcounter atomic]
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Jason Low
    Reviewed-by: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Emmanuel Benisty
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • [fengguang.wu@intel.com: find_msg can be static]
    Signed-off-by: Peter Hurley
    Cc: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley