18 Oct, 2013

39 commits

  • commit 4271b05a227dc6175b66c3d9941aeab09048aeb2 upstream.

    This fixes a race in both msgrcv() and msgsnd() between finding the msg
    and actually dealing with the queue, as another thread can delete the
    msqid underneath us if we are preempted before acquiring the
    kern_ipc_perm.lock.

    Manfred illustrates this nicely:

    Assume a preemptible kernel that is preempted just after

    msq = msq_obtain_object_check(ns, msqid)

    in do_msgrcv(). The only lock that is held is rcu_read_lock().

    Now the other thread processes IPC_RMID. When the first task is
    resumed, then it will happily wait for messages on a deleted queue.

    Fix this by checking whether the queue has been deleted after taking the
    lock.
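
    For illustration, here is a minimal sketch of the resulting pattern (the
    wrapper function below is hypothetical; msq_obtain_object_check(),
    ipc_lock_object() and the q_perm.deleted flag are the pieces the fix
    relies on):

        static long msg_lookup_and_lock(struct ipc_namespace *ns, int msqid)
        {
            struct msg_queue *msq;

            rcu_read_lock();
            msq = msq_obtain_object_check(ns, msqid);   /* lookup under RCU only */
            if (IS_ERR(msq)) {
                rcu_read_unlock();
                return PTR_ERR(msq);
            }

            ipc_lock_object(&msq->q_perm);              /* take kern_ipc_perm.lock */

            if (msq->q_perm.deleted) {                  /* raced with IPC_RMID */
                ipc_unlock_object(&msq->q_perm);
                rcu_read_unlock();
                return -EIDRM;
            }

            /* ... the queue stays valid while the lock is held ... */
            ipc_unlock_object(&msq->q_perm);
            rcu_read_unlock();
            return 0;
        }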

    Signed-off-by: Davidlohr Bueso
    Reported-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Mike Galbraith
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 0e8c665699e953fa58dc1b0b0d09e5dce7343cc7 upstream.

    In commit 0a2b9d4c7967 ("ipc/sem.c: move wake_up_process out of the
    spinlock section"), the update of the semaphore's sem_otime (last semop
    time) was moved to one central position (do_smart_update()).

    But since do_smart_update() is only called for operations that modify
    the array, this means that wait-for-zero semops do not update sem_otime
    anymore.

    The fix is simple:
    Non-alter operations must update sem_otime.
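
    Roughly, the change amounts to the following in do_semtimedop() (a
    sketch, assuming a set_semotime() helper that stamps the affected
    semaphore):

        /* Sketch: record sem_otime for successful non-alter (wait-for-zero)
         * ops, since do_smart_update() only runs for altering operations. */
        static void set_semotime(struct sem_array *sma, struct sembuf *sops)
        {
            if (sops == NULL)
                sma->sem_base[0].sem_otime = get_seconds();
            else
                sma->sem_base[sops[0].sem_num].sem_otime = get_seconds();
        }

        /* ...and in do_semtimedop(), once the operation succeeded immediately: */
        if (alter)
            do_smart_update(sma, sops, nsops, 1, &tasks);
        else
            set_semotime(sma, sops);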

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Reported-by: Jia He
    Tested-by: Jia He
    Cc: Davidlohr Bueso
    Cc: Mike Galbraith
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit d8c633766ad88527f25d9f81a5c2f083d78a2b39 upstream.

    The proc interface is not aware of sem_lock(); instead it calls
    ipc_lock_object() directly. This means that simple semop() operations
    can run in parallel with the proc interface. Right now this is harmless,
    because the implementation doesn't do anything that requires proper
    synchronization.

    But it is dangerous and therefore should be fixed.

    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 6d07b68ce16ae9535955ba2059dedba5309c3ca1 upstream.

    Operations that need access to the whole array must guarantee that there
    are no simple operations ongoing. Right now this is achieved by
    spin_unlock_wait(sem->lock) on all semaphores.

    If complex_count is nonzero, then this spin_unlock_wait() is not
    necessary: it was already performed in the past by the thread that
    increased complex_count, and even though sem_perm.lock was dropped in
    between, no simple operation could have started, because simple
    operations cannot start while complex_count is non-zero.
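
    The resulting helper looks roughly like this (a sketch of the idea, not
    the verbatim diff):

        /* Wait until all currently ongoing simple (per-semaphore) operations
         * have dropped their locks, so a complex operation may proceed. */
        static void sem_wait_array(struct sem_array *sma)
        {
            int i;

            if (sma->complex_count) {
                /* The thread that increased complex_count already waited on
                 * all per-semaphore locks; no simple op can have started
                 * since then, so there is nothing to wait for. */
                return;
            }

            for (i = 0; i < sma->sem_nsems; i++)
                spin_unlock_wait(&sma->sem_base[i].lock);
        }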

    Signed-off-by: Manfred Spraul
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Reviewed-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 5e9d527591421ccdb16acb8c23662231135d8686 upstream.

    The exclusion of complex operations in sem_lock() is insufficient: after
    acquiring the per-semaphore lock, a simple op must first check that
    sem_perm.lock is not locked and only after that test check
    complex_count. The current code does it the other way around - and that
    creates a race. Details are below.

    The patch is a complete rewrite of sem_lock(), based in part on the code
    from Mike Galbraith. It removes all gotos and all loops and thus the
    risk of livelocks.

    I have tested the patch (together with the next one) on my i3 laptop and
    it didn't cause any problems.

    The bug is probably also present in 3.10 and 3.11, but for these kernels
    it might be simpler just to move the test of sma->complex_count after
    the spin_is_locked() test.

    Details of the bug:

    Assume:
    - sma->complex_count = 0.
    - Thread 1: semtimedop(complex op that must sleep)
    - Thread 2: semtimedop(simple op).

    Pseudo-Trace:

    Thread 1: sem_lock(): acquire sem_perm.lock
    Thread 1: sem_lock(): check for ongoing simple ops
    Nothing ongoing, thread 2 is still before sem_lock().
    Thread 1: try_atomic_semop()
    <<< preempted.

    Thread 2: sem_lock():

        static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                   int nsops)
        {
            int locknum;
    again:
            if (nsops == 1 && !sma->complex_count) {
                struct sem *sem = sma->sem_base + sops->sem_num;

                /* Lock just the semaphore we are interested in. */
                spin_lock(&sem->lock);

                /*
                 * If sma->complex_count was set while we were spinning,
                 * we may need to look at things we did not lock here.
                 */
                if (unlikely(sma->complex_count)) {
                    spin_unlock(&sem->lock);
                    goto lock_array;
                }
    <<<<<<<<<
    <<< complex_count is still 0.
    <<<
    <<< Here it is preempted
    <<<<<<<<<

    Thread 1: try_atomic_semop() returns, notices that it must sleep.
    Thread 1: increases sma->complex_count.
    Thread 1: drops sem_perm.lock
    Thread 2:
                /*
                 * Another process is holding the global lock on the
                 * sem_array; we cannot enter our critical section,
                 * but have to wait for the global lock to be released.
                 */
                if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
                    spin_unlock(&sem->lock);
                    spin_unlock_wait(&sma->sem_perm.lock);
                    goto again;
                }
    <<< sem_perm.lock already dropped, thus no "goto again;"

                locknum = sops->sem_num;
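
    The reworked fast path checks sem_perm.lock before trusting
    complex_count, roughly as follows (a sketch of the new ordering, not the
    verbatim upstream code):

        static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                   int nsops)
        {
            struct sem *sem;

            if (nsops != 1) {
                /* Complex operation: take the global lock and wait until all
                 * ongoing simple operations have dropped their locks. */
                ipc_lock_object(&sma->sem_perm);
                sem_wait_array(sma);
                return -1;
            }

            /* Optimistic fast path for a single-sop operation. */
            sem = sma->sem_base + sops->sem_num;
            if (sma->complex_count == 0) {
                spin_lock(&sem->lock);

                /* First check that no complex operation holds (or is about
                 * to hold) the global lock; only then is complex_count
                 * meaningful. */
                if (!spin_is_locked(&sma->sem_perm.lock)) {
                    smp_mb();           /* spin_is_locked() is not a barrier */
                    if (sma->complex_count == 0)
                        return sops->sem_num;   /* fast path successful */
                }
                spin_unlock(&sem->lock);
            }

            /* Slow path: decide under the global lock. */
            ipc_lock_object(&sma->sem_perm);
            if (sma->complex_count == 0) {
                /* False alarm: switch back to the per-semaphore lock. */
                spin_lock(&sem->lock);
                ipc_unlock_object(&sma->sem_perm);
                return sops->sem_num;
            }
            sem_wait_array(sma);
            return -1;
        }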

    Signed-off-by: Manfred Spraul
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 53dad6d3a8e5ac1af8bacc6ac2134ae1a8b085f1 upstream.

    Currently, IPC mechanisms do security and auditing related checks under
    RCU. However, since security modules can free the security structure,
    for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
    race if the structure is freed before other tasks are done with it,
    creating a use-after-free condition. Manfred illustrates this nicely,
    for instance with shared mem and selinux:

    -> do_shmat calls rcu_read_lock()
    -> do_shmat calls shm_object_check().
    Checks that the object is still valid - but doesn't acquire any locks.
    Then it returns.
    -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
    -> selinux_shm_shmat calls ipc_has_perm()
    -> ipc_has_perm accesses ipc_perms->security

    shm_close()
    -> shm_close acquires rw_mutex & shm_lock
    -> shm_close calls shm_destroy
    -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
    -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
    -> ipc_free_security calls kfree(ipc_perms->security)

    This patch delays the freeing of the security structures after all RCU
    readers are done. Furthermore it aligns the security life cycle with
    that of the rest of IPC - freeing them based on the reference counter.
    For situations where we need not free security, the current behavior is
    kept. Linus states:

    "... the old behavior was suspect for another reason too: having the
    security blob go away from under a user sounds like it could cause
    various other problems anyway, so I think the old code was at least
    _prone_ to bugs even if it didn't have catastrophic behavior."

    I have tested this patch with IPC testcases from LTP on both my
    quad-core laptop and on a 64 core NUMA server. In both cases selinux is
    enabled, and tests pass for both voluntary and forced preemption models.
    While the mentioned races are theoretical (at least no one has reported
    them), I wanted to make sure that this new logic doesn't break anything
    we weren't aware of.
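
    A sketch of the approach for semaphores (the same pattern applies to msg
    queues and shm; helper names are reproduced from memory and may differ
    slightly from the merged code):

        /* RCU callback: the LSM blob is freed only after all RCU readers
         * that may still dereference ipc_perms->security have finished. */
        static void sem_rcu_free(struct rcu_head *head)
        {
            struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
            struct sem_array *sma = ipc_rcu_to_struct(p);

            security_sem_free(sma);     /* free the security structure ...   */
            ipc_rcu_free(head);         /* ... and then the object itself    */
        }

        /* Callers drop their reference with the callback attached, e.g.
         *     ipc_rcu_putref(sma, sem_rcu_free);
         * instead of calling security_sem_free() directly while readers may
         * still be inside their RCU read-side critical sections. */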

    Suggested-by: Linus Torvalds
    Signed-off-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 20b8875abcf2daa1dda5cf70bd6369df5e85d4c1 upstream.

    No remaining users, we now use ipc_obtain_object_check().

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 7a25dd9e042b2b94202a67e5551112f4ac87285a upstream.

    This function was replaced by the lockless shm_obtain_object_check(),
    and no longer has any users.

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 32a2750010981216fb788c5190fb0e646abfab30 upstream.

    After previous cleanups and optimizations, this function is no longer
    heavily used and we don't have a good reason to keep it. Update the few
    remaining callers and get rid of it.

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 530fcd16d87cd2417c472a581ba5a1e501556c86 upstream.

    When !CONFIG_MMU there's a chance we can dereference a NULL pointer when
    the VM area isn't found - check the return value of find_vma().

    Also, remove the redundant -EINVAL return: retval is set to the proper
    return code and *only* changed to 0, when we actually unmap the segments.

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 05603c44a7627793219b0bd9a7b236099dc9cd9d upstream.

    As suggested by Andrew, add a generic description of the initial locking
    scheme used throughout all sysv ipc mechanisms, documenting the ids
    rwsem, how rcu can be enough to do the initial checks, and when to
    actually acquire the kern_ipc_perm.lock spinlock.

    I found that adding it to util.c was generic enough.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 4718787d1f626f45ddb239912bc07266b9880044 upstream.

    There is only one user left, drop this function and just call
    ipc_unlock_object() and rcu_read_unlock().

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit d9a605e40b1376eb02b067d7690580255a0df68f upstream.

    Since in some situations the lock can be shared by readers, we shouldn't
    be calling it a mutex; rename it to rwsem.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit c2c737a0461e61a34676bd0bd1bc1a70a1b4e396 upstream.

    Similar to other system calls, acquire the kern_ipc_perm lock after doing
    the initial permission and security checks.

    [sasha.levin@oracle.com: dont leave do_shmat with rcu lock held]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit f42569b1388b1408b574a5e93a23a663647d4181 upstream.

    Clean up some of the messy do_shmat() spaghetti code, getting rid of
    out_free and out_put_dentry labels. This makes shortening the critical
    region of this function in the next patch a little easier to do and read.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 2caacaa82a51b78fc0c800e206473874094287ed upstream.

    With the *_INFO, *_STAT, IPC_RMID and IPC_SET commands already optimized,
    deal with the remaining SHM_LOCK and SHM_UNLOCK commands. Take the
    shm_perm lock after doing the initial auditing and security checks. The
    rest of the logic remains unchanged.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit c97cb9ccab8c85428ec21eff690642ad2ce1fa8a upstream.

    While the INFO cmd doesn't take the ipc lock, the STAT commands do
    acquire it unnecessarily. We can do the permission and security checks
    while holding only the rcu lock.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 68eccc1dc345539d589ae78ee43b835c1a06a134 upstream.

    Similar to semctl and msgctl, when calling shmctl, the *_INFO and *_STAT
    commands can be performed without acquiring the ipc object.

    Add a shmctl_nolock() function and move the logic of *_INFO and *_STAT
    out of shmctl(). Since we are just moving functionality, this change
    still takes the lock and it will be properly lockless in the next patch.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 3b1c4ad37741e53804ffe0a30dd01e08b2ab6241 upstream.

    Now that sem, msgque and shm, through *_down(), all use the lockless
    variant of ipcctl_pre_down(), go ahead and delete it.

    [akpm@linux-foundation.org: fix function name in kerneldoc, cleanups]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 79ccf0f8c8e04e8b9eda6645ba0f63b0915a3075 upstream.

    Instead of holding the ipc lock for the entire function, use the
    ipcctl_pre_down_nolock and only acquire the lock for specific commands:
    RMID and SET.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 8b8d52ac382b17a19906b930cd69e2edb0aca8ba upstream.

    This is the third and final patchset that deals with reducing the amount
    of contention we impose on the ipc lock (kern_ipc_perm.lock). These
    changes mostly deal with shared memory, previous work has already been
    done for semaphores and message queues:

    http://lkml.org/lkml/2013/3/20/546 (sems)
    http://lkml.org/lkml/2013/5/15/584 (mqueues)

    With these patches applied, a custom shm microbenchmark stressing shmctl
    doing IPC_STAT with 4 threads a million times reduces the execution
    time by 50%. A similar run, this time with IPC_SET, reduces the
    execution time from 3 mins and 35 secs to 27 seconds.

    Patches 1-8: replace blindly taking the ipc lock with a smarter
    combination of rcu and ipc_obtain_object, only acquiring the spinlock
    when updating.

    Patch 9: renames the ids rw_mutex to rwsem, which is what it already was.

    Patch 10: a trivial mqueue leftover cleanup.

    Patch 11: adds a brief lock scheme description, requested by Andrew.

    This patch:

    Add shm_obtain_object() and shm_obtain_object_check(), which will allow us
    to get the ipc object without acquiring the lock. Just as with other
    forms of ipc, these functions are basically wrappers around
    ipc_obtain_object*().
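
    The wrappers themselves are trivial; roughly (sketch):

        static inline struct shmid_kernel *shm_obtain_object(struct ipc_namespace *ns, int id)
        {
            struct kern_ipc_perm *ipcp = ipc_obtain_object(&shm_ids(ns), id);

            if (IS_ERR(ipcp))
                return ERR_CAST(ipcp);

            return container_of(ipcp, struct shmid_kernel, shm_perm);
        }

        static inline struct shmid_kernel *shm_obtain_object_check(struct ipc_namespace *ns, int id)
        {
            struct kern_ipc_perm *ipcp = ipc_obtain_object_check(&shm_ids(ns), id);

            if (IS_ERR(ipcp))
                return ERR_CAST(ipcp);

            return container_of(ipcp, struct shmid_kernel, shm_perm);
        }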

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit bebcb928c820d0ee83aca4b192adc195e43e66a2 upstream.

    The check whether the queue is full and the addition of current to the
    wait queue of pending msgsnd() operations (ss_add()) must be atomic.

    Otherwise:
    - the thread that performs msgsnd() finds a full queue and decides to
    sleep.
    - the thread that performs msgrcv() first reads all messages from the
    queue and then sleeps, because the queue is empty.
    - the msgrcv() calls do not perform any wakeups, because the msgsnd()
    task has not yet called ss_add().
    - then the msgsnd()-thread first calls ss_add() and then sleeps.

    Net result: msgsnd() and msgrcv() both sleep forever.

    Observed with msgctl08 from ltp with a preemptible kernel.

    Fix: Call ipc_lock_object() before performing the check.

    The patch also moves security_msg_queue_msgsnd() under ipc_lock_object:
    - msgctl(IPC_SET) explicitly mentions that it tries to expunge any
    pending operations that are not allowed anymore with the new
    permissions. If security_msg_queue_msgsnd() is called without locks,
    then there might be races.
    - it makes the patch much simpler.
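
    A simplified sketch of the resulting ordering in do_msgsnd();
    queue_full() is an illustrative helper for the size check, and the
    reference-counting and RCU details are omitted:

        ipc_lock_object(&msq->q_perm);

        for (;;) {
            if (msq->q_perm.deleted) {          /* queue removed while waiting */
                err = -EIDRM;
                goto out_unlock;
            }

            err = security_msg_queue_msgsnd(msq, msg, msgflg);  /* now under the lock */
            if (err)
                goto out_unlock;

            if (!queue_full(msq, msgsz))
                break;                          /* room available: enqueue */

            if (msgflg & IPC_NOWAIT) {
                err = -EAGAIN;
                goto out_unlock;
            }

            /* Queue full: register as a pending sender atomically with the
             * full-queue test, then drop the lock and sleep. */
            ss_add(msq, &s);
            ipc_unlock_object(&msq->q_perm);
            schedule();
            ipc_lock_object(&msq->q_perm);
        }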

    Reported-and-tested-by: Vineet Gupta
    Acked-by: Rik van Riel
    Signed-off-by: Manfred Spraul
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 758a6ba39ef6df4cdc615e5edd7bd86eab81a5f7 upstream.

    Cleanup: some minor points that I noticed while writing the previous
    patches:

    1) The name try_atomic_semop() is misleading: The function performs the
    operation (if it is possible).

    2) Some documentation updates.

    No real code change, a rename and documentation changes.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit d12e1e50e47e0900dbbf52237b7e171f4f15ea1e upstream.

    sem_otime contains the time of the last semaphore operation that
    completed successfully. Every operation updates this value, thus access
    from multiple cpus can cause thrashing.

    Therefore the patch replaces the variable with a per-semaphore variable.
    The per-array sem_otime is only calculated when required.

    No performance improvement on a single-socket i3 - only important for
    larger systems.
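
    The per-array value is then derived on demand, roughly like this
    (sketch):

        /* sem_otime now lives in each struct sem; the array-wide value is
         * the maximum over all semaphores and is only computed when
         * userspace asks for it (semctl(IPC_STAT) and friends). */
        static time_t get_semotime(struct sem_array *sma)
        {
            time_t res = sma->sem_base[0].sem_otime;
            int i;

            for (i = 1; i < sma->sem_nsems; i++) {
                time_t to = sma->sem_base[i].sem_otime;

                if (to > res)
                    res = to;
            }
            return res;
        }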

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit f269f40ad5aeee229ed70044926f44318abe41ef upstream.

    There are two places that can contain alter operations:
    - the global queue: sma->pending_alter
    - the per-semaphore queues: sma->sem_base[].pending_alter.

    Since one of the queues must be processed first, this causes an odd
    prioritization of the wakeups: complex operations have priority over
    simple ops.

    The patch restores the behavior of linux <=3.0.9: the longest waiting
    operation has the highest priority. This is done by using only one queue:
    - if there are complex operations, then sma->pending_alter is used,
    - otherwise, the per-semaphore queues are used.

    As a side effect, do_smart_update_queue() becomes much simpler: no more
    goto logic.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 1a82e9e1d0f1b45f47a97c9e2349020536ff8987 upstream.

    Introduce separate queues for operations that do not modify the
    semaphore values. Advantages:

    - Simpler logic in check_restart().
    - Faster update_queue(): Right now, all wait-for-zero operations are
    always tested, even if the semaphore value is not 0.
    - wait-for-zero gets again priority, as in linux <=3.0.9.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit f5c936c0f267ec58641451cf8b8d39b4c207ee4d upstream.

    As now each semaphore has its own spinlock and parallel operations are
    possible, give each semaphore its own cacheline.

    On a i3 laptop, this gives up to 28% better performance:

    #semscale 10 | grep "interleave 2"
    - before:
    Cpus 1, interleave 2 delay 0: 36109234 in 10 secs
    Cpus 2, interleave 2 delay 0: 55276317 in 10 secs
    Cpus 3, interleave 2 delay 0: 62411025 in 10 secs
    Cpus 4, interleave 2 delay 0: 81963928 in 10 secs

    - after:
    Cpus 1, interleave 2 delay 0: 35527306 in 10 secs
    Cpus 2, interleave 2 delay 0: 70922909 in 10 secs <<< + 28%
    Cpus 3, interleave 2 delay 0: 80518538 in 10 secs
    Cpus 4, interleave 2 delay 0: 89115148 in 10 secs <<< + 8.7%

    i3, with 2 cores and hyperthreading enabled. Interleave 2 is used in
    order to first use the full cores. HT partially hides the delay from
    cacheline thrashing, thus the improvement is "only" 8.7% if 4 threads
    are running.
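
    The change itself boils down to a one-line annotation on struct sem,
    roughly (a sketch; the exact set of fields depends on the other patches
    in this series):

        /* One semaphore structure for each semaphore in the system. */
        struct sem {
            int              semval;        /* current value */
            int              sempid;        /* pid of last operation */
            spinlock_t       lock;          /* per-semaphore lock */
            struct list_head pending_alter; /* pending single-sop alter ops */
            struct list_head pending_const; /* pending single-sop wait-for-zero ops */
            time_t           sem_otime;     /* candidate for sem_otime */
        } ____cacheline_aligned_in_smp;     /* each semaphore gets its own cacheline */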

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 196aa0132fc7261f34b10ae1bfb44abc1bc69b3c upstream.

    Enforce that ipc_rcu_alloc returns a cacheline aligned pointer on SMP.

    Rationale:

    The SysV sem code tries to move the main spinlock into a separate
    cacheline (____cacheline_aligned_in_smp). This works only if
    ipc_rcu_alloc returns cacheline-aligned pointers. vmalloc and kmalloc do
    return cacheline-aligned pointers, but the implementation of
    ipc_rcu_alloc breaks that.
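
    A sketch of the idea: make the hidden allocation header a full cacheline,
    so the payload that follows keeps the alignment that kmalloc/vmalloc
    provided (field names approximate):

        /* Header prepended to every ipc_rcu_alloc() allocation. Padding it
         * to a cacheline boundary keeps the payload (e.g. struct sem_array)
         * aligned, so ____cacheline_aligned_in_smp members work as intended. */
        struct ipc_rcu {
            struct rcu_head rcu;
            atomic_t        refcount;
        } ____cacheline_aligned_in_smp;

        #define ipc_rcu_to_struct(p)    ((void *)(p + 1))   /* payload follows the header */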

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 9ad66ae65fc8d3e7e3344310fb0aa835910264fe upstream.

    We can now drop the msg_lock and msg_lock_check functions along with a
    bogus comment introduced previously in semctl_down.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 41a0d523d0f626e9da0dc01de47f1b89058033cf upstream.

    do_msgrcv() is the last msg queue function that abuses the ipc lock.
    Take it only when needed, when actually updating msq.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 3dd1f784ed6603d7ab1043e51e6371235edf2313 upstream.

    do_msgsnd() is another function that does too many things with the ipc
    object lock acquired. Take it only when needed, when actually updating
    msq.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit ac0ba20ea6f2201a1589d6dc26ad1a4f0f967bb8 upstream.

    While the INFO cmd doesn't take the ipc lock, the STAT commands do
    acquire it unnecessarily. We can do the permission and security checks
    while holding only the rcu lock.

    This function now mimics semctl_nolock().
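
    A simplified sketch of the resulting MSG_STAT/IPC_STAT path (error
    handling abbreviated):

        rcu_read_lock();
        if (cmd == MSG_STAT)
            msq = msq_obtain_object(ns, msqid);         /* index-based lookup */
        else
            msq = msq_obtain_object_check(ns, msqid);   /* IPC_STAT: validate id */
        if (IS_ERR(msq)) {
            err = PTR_ERR(msq);
            goto out_unlock;
        }

        err = -EACCES;
        if (ipcperms(ns, &msq->q_perm, S_IRUGO))        /* permission check, RCU only */
            goto out_unlock;

        err = security_msg_queue_msgctl(msq, cmd);      /* LSM check, RCU only */
        if (err)
            goto out_unlock;

        /* ... copy q_perm, times and counts to the user buffer; no spinlock
         * is taken anywhere on this path ... */
        rcu_read_unlock();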

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit a5001a0d9768568de5d613c3b3a5b9c7721299da upstream.

    Add msq_obtain_object() and msq_obtain_object_check(), which will allow
    us to get the ipc object without acquiring the lock. Just as with
    semaphores, these functions are basically wrappers around
    ipc_obtain_object*().

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 2cafed30f150f7314f98717b372df8173516cae0 upstream.

    Similar to semctl, when calling msgctl, the *_INFO and *_STAT commands
    can be performed without acquiring the ipc object.

    Add a msgctl_nolock() function and move the logic of *_INFO and *_STAT
    out of msgctl(). This change still takes the lock and it will be
    properly lockless in the next patch.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 15724ecb7e9bab35fc694c666ad563adba820cc3 upstream.

    Instead of holding the ipc lock for the entire function, use the
    ipcctl_pre_down_nolock and only acquire the lock for specific commands:
    RMID and SET.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 7b4cc5d8411bd4e9d61d8714f53859740cf830c2 upstream.

    This function currently acquires both the rw_mutex and the rcu lock on
    successful lookups, leaving the callers to explicitly unlock them,
    creating another two level locking situation.

    Make the callers (including those that still use ipcctl_pre_down())
    explicitly lock and unlock the rwsem and rcu lock.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit cf9d5d78d05bca96df7618dfc3a5ee4414dcae58 upstream.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 1ca7003ab41152d673d9e359632283d05294f3d6 upstream.

    Simple helpers around the (kern_ipc_perm *)->lock spinlock.
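
    The helpers are essentially (modulo comments):

        static inline void ipc_lock_object(struct kern_ipc_perm *perm)
        {
            spin_lock(&perm->lock);
        }

        static inline void ipc_unlock_object(struct kern_ipc_perm *perm)
        {
            spin_unlock(&perm->lock);
        }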

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit dbfcd91f06f0e2d5564b2fd184e9c2a43675f9ab upstream.

    This patchset continues the work that began in the sysv ipc semaphore
    scaling series, see

    https://lkml.org/lkml/2013/3/20/546

    Just like semaphores used to be, sysv shared memory and msg queues also
    abuse the ipc lock, unnecessarily holding it for operations such as
    permission and security checks.

    This patchset mostly deals with mqueues, and while shared mem can be
    done in a very similar way, I want to get these patches out in the open
    first. It also does some pending cleanups, mostly focused on the two
    level locking we have in ipc code, taking care of ipc_addid() and
    ipcctl_pre_down_nolock() - yes there are still functions that need to be
    updated as well.

    This patch:

    Make all callers explicitly take and release the RCU read lock.

    This addresses the two level locking seen in newary(), newseg() and
    newqueue(). For the last two, explicitly unlock the ipc object and the
    rcu lock, instead of calling the custom shm_unlock and msg_unlock
    functions. The next patch will deal with the open-coded locking for
    ->perm.lock.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     

08 Sep, 2013

1 commit

  • commit 368ae537e056acd3f751fa276f48423f06803922 upstream.

    According to 'man msgrcv': "If msgtyp is less than 0, the first message of
    the lowest type that is less than or equal to the absolute value of msgtyp
    shall be received."

    Bug: The kernel only returns a message if its type is 1; other messages
    with type < abs(msgtype) will never get returned.

    Fix: After having traversed the list to find the first message with the
    lowest type, we need to actually return that message.

    This regression was introduced by commit daaf74cf0867 ("ipc: refactor
    msg list search into separate function").

    Signed-off-by: Svenning Soerensen
    Reviewed-by: Peter Hurley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Svenning Soerensen