18 Oct, 2013

39 commits

  • commit 4271b05a227dc6175b66c3d9941aeab09048aeb2 upstream.

    This fixes a race in both msgrcv() and msgsnd() between finding the msg
    and actually dealing with the queue, as another thread can delete the
    msqid underneath us if we are preempted before acquiring the
    kern_ipc_perm.lock.

    Manfred illustrates this nicely:

    Assume a preemptible kernel that is preempted just after

    msq = msq_obtain_object_check(ns, msqid)

    in do_msgrcv(). The only lock that is held is rcu_read_lock().

    Now the other thread processes IPC_RMID. When the first task is
    resumed, then it will happily wait for messages on a deleted queue.

    Fix this by checking whether the queue has been deleted after taking the
    lock.
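
    For illustration, here is a minimal sketch of the resulting pattern (the
    wrapper function below is hypothetical; msq_obtain_object_check(),
    ipc_lock_object() and the q_perm.deleted flag are the pieces the fix
    relies on):

        static long msg_lookup_and_lock(struct ipc_namespace *ns, int msqid)
        {
            struct msg_queue *msq;

            rcu_read_lock();
            msq = msq_obtain_object_check(ns, msqid);   /* lookup under RCU only */
            if (IS_ERR(msq)) {
                rcu_read_unlock();
                return PTR_ERR(msq);
            }

            ipc_lock_object(&msq->q_perm);              /* take kern_ipc_perm.lock */

            if (msq->q_perm.deleted) {                  /* raced with IPC_RMID */
                ipc_unlock_object(&msq->q_perm);
                rcu_read_unlock();
                return -EIDRM;
            }

            /* ... the queue stays valid while the lock is held ... */
            ipc_unlock_object(&msq->q_perm);
            rcu_read_unlock();
            return 0;
        }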

    Signed-off-by: Davidlohr Bueso
    Reported-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Mike Galbraith
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 0e8c665699e953fa58dc1b0b0d09e5dce7343cc7 upstream.

    In commit 0a2b9d4c7967 ("ipc/sem.c: move wake_up_process out of the
    spinlock section"), the update of the semaphore's sem_otime (last semop
    time) was moved to one central position (do_smart_update()).

    But since do_smart_update() is only called for operations that modify
    the array, this means that wait-for-zero semops do not update sem_otime
    anymore.

    The fix is simple:
    Non-alter operations must update sem_otime.
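
    Roughly, the change amounts to the following in do_semtimedop() (a
    sketch, assuming a set_semotime() helper that stamps the affected
    semaphore):

        /* Sketch: record sem_otime for successful non-alter (wait-for-zero)
         * ops, since do_smart_update() only runs for altering operations. */
        static void set_semotime(struct sem_array *sma, struct sembuf *sops)
        {
            if (sops == NULL)
                sma->sem_base[0].sem_otime = get_seconds();
            else
                sma->sem_base[sops[0].sem_num].sem_otime = get_seconds();
        }

        /* ...and in do_semtimedop(), once the operation succeeded immediately: */
        if (alter)
            do_smart_update(sma, sops, nsops, 1, &tasks);
        else
            set_semotime(sma, sops);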

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Reported-by: Jia He
    Tested-by: Jia He
    Cc: Davidlohr Bueso
    Cc: Mike Galbraith
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit d8c633766ad88527f25d9f81a5c2f083d78a2b39 upstream.

    The proc interface is not aware of sem_lock(); instead it calls
    ipc_lock_object() directly. This means that simple semop() operations
    can run in parallel with the proc interface. Right now this is harmless,
    because the implementation doesn't do anything that requires proper
    synchronization.

    But it is dangerous and therefore should be fixed.

    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 6d07b68ce16ae9535955ba2059dedba5309c3ca1 upstream.

    Operations that need access to the whole array must guarantee that there
    are no simple operations ongoing. Right now this is achieved by
    spin_unlock_wait(sem->lock) on all semaphores.

    If complex_count is nonzero, then this spin_unlock_wait() is not
    necessary: it was already performed in the past by the thread that
    increased complex_count, and even though sem_perm.lock was dropped in
    between, no simple operation could have started, because simple
    operations cannot start while complex_count is non-zero.
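
    The resulting helper looks roughly like this (a sketch of the idea, not
    the verbatim diff):

        /* Wait until all currently ongoing simple (per-semaphore) operations
         * have dropped their locks, so a complex operation may proceed. */
        static void sem_wait_array(struct sem_array *sma)
        {
            int i;

            if (sma->complex_count) {
                /* The thread that increased complex_count already waited on
                 * all per-semaphore locks; no simple op can have started
                 * since then, so there is nothing to wait for. */
                return;
            }

            for (i = 0; i < sma->sem_nsems; i++)
                spin_unlock_wait(&sma->sem_base[i].lock);
        }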

    Signed-off-by: Manfred Spraul
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Reviewed-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 5e9d527591421ccdb16acb8c23662231135d8686 upstream.

    The exclusion of complex operations in sem_lock() is insufficient: after
    acquiring the per-semaphore lock, a simple op must first check that
    sem_perm.lock is not locked and only after that test check
    complex_count. The current code does it the other way around - and that
    creates a race. Details are below.

    The patch is a complete rewrite of sem_lock(), based in part on the code
    from Mike Galbraith. It removes all gotos and all loops and thus the
    risk of livelocks.

    I have tested the patch (together with the next one) on my i3 laptop and
    it didn't cause any problems.

    The bug is probably also present in 3.10 and 3.11, but for these kernels
    it might be simpler just to move the test of sma->complex_count after
    the spin_is_locked() test.

    Details of the bug:

    Assume:
    - sma->complex_count = 0.
    - Thread 1: semtimedop(complex op that must sleep)
    - Thread 2: semtimedop(simple op).

    Pseudo-Trace:

    Thread 1: sem_lock(): acquire sem_perm.lock
    Thread 1: sem_lock(): check for ongoing simple ops
    Nothing ongoing, thread 2 is still before sem_lock().
    Thread 1: try_atomic_semop()
    <<< preempted.

    Thread 2: sem_lock():

        static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                   int nsops)
        {
            int locknum;
    again:
            if (nsops == 1 && !sma->complex_count) {
                struct sem *sem = sma->sem_base + sops->sem_num;

                /* Lock just the semaphore we are interested in. */
                spin_lock(&sem->lock);

                /*
                 * If sma->complex_count was set while we were spinning,
                 * we may need to look at things we did not lock here.
                 */
                if (unlikely(sma->complex_count)) {
                    spin_unlock(&sem->lock);
                    goto lock_array;
                }
    <<<<<<<<<
    <<< complex_count is still 0.
    <<<
    <<< Here it is preempted
    <<<<<<<<<

    Thread 1: try_atomic_semop() returns, notices that it must sleep.
    Thread 1: increases sma->complex_count.
    Thread 1: drops sem_perm.lock
    Thread 2:
                /*
                 * Another process is holding the global lock on the
                 * sem_array; we cannot enter our critical section,
                 * but have to wait for the global lock to be released.
                 */
                if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
                    spin_unlock(&sem->lock);
                    spin_unlock_wait(&sma->sem_perm.lock);
                    goto again;
                }
    <<< sem_perm.lock already dropped, thus no "goto again;"

                locknum = sops->sem_num;
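
    The reworked fast path checks sem_perm.lock before trusting
    complex_count, roughly as follows (a sketch of the new ordering, not the
    verbatim upstream code):

        static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                   int nsops)
        {
            struct sem *sem;

            if (nsops != 1) {
                /* Complex operation: take the global lock and wait until all
                 * ongoing simple operations have dropped their locks. */
                ipc_lock_object(&sma->sem_perm);
                sem_wait_array(sma);
                return -1;
            }

            /* Optimistic fast path for a single-sop operation. */
            sem = sma->sem_base + sops->sem_num;
            if (sma->complex_count == 0) {
                spin_lock(&sem->lock);

                /* First check that no complex operation holds (or is about
                 * to hold) the global lock; only then is complex_count
                 * meaningful. */
                if (!spin_is_locked(&sma->sem_perm.lock)) {
                    smp_mb();           /* spin_is_locked() is not a barrier */
                    if (sma->complex_count == 0)
                        return sops->sem_num;   /* fast path successful */
                }
                spin_unlock(&sem->lock);
            }

            /* Slow path: decide under the global lock. */
            ipc_lock_object(&sma->sem_perm);
            if (sma->complex_count == 0) {
                /* False alarm: switch back to the per-semaphore lock. */
                spin_lock(&sem->lock);
                ipc_unlock_object(&sma->sem_perm);
                return sops->sem_num;
            }
            sem_wait_array(sma);
            return -1;
        }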

    Signed-off-by: Manfred Spraul
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 53dad6d3a8e5ac1af8bacc6ac2134ae1a8b085f1 upstream.

    Currently, IPC mechanisms do security and auditing related checks under
    RCU. However, since security modules can free the security structure,
    for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
    race if the structure is freed before other tasks are done with it,
    creating a use-after-free condition. Manfred illustrates this nicely,
    for instance with shared mem and selinux:

    -> do_shmat calls rcu_read_lock()
    -> do_shmat calls shm_object_check().
    Checks that the object is still valid - but doesn't acquire any locks.
    Then it returns.
    -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
    -> selinux_shm_shmat calls ipc_has_perm()
    -> ipc_has_perm accesses ipc_perms->security

    shm_close()
    -> shm_close acquires rw_mutex & shm_lock
    -> shm_close calls shm_destroy
    -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
    -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
    -> ipc_free_security calls kfree(ipc_perms->security)

    This patch delays the freeing of the security structures after all RCU
    readers are done. Furthermore it aligns the security life cycle with
    that of the rest of IPC - freeing them based on the reference counter.
    For situations where we need not free security, the current behavior is
    kept. Linus states:

    "... the old behavior was suspect for another reason too: having the
    security blob go away from under a user sounds like it could cause
    various other problems anyway, so I think the old code was at least
    _prone_ to bugs even if it didn't have catastrophic behavior."

    I have tested this patch with IPC testcases from LTP on both my
    quad-core laptop and on a 64 core NUMA server. In both cases selinux is
    enabled, and tests pass for both voluntary and forced preemption models.
    While the mentioned races are theoretical (at least no one has reported
    them), I wanted to make sure that this new logic doesn't break anything
    we weren't aware of.
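
    A sketch of the approach for semaphores (the same pattern applies to msg
    queues and shm; helper names are reproduced from memory and may differ
    slightly from the merged code):

        /* RCU callback: the LSM blob is freed only after all RCU readers
         * that may still dereference ipc_perms->security have finished. */
        static void sem_rcu_free(struct rcu_head *head)
        {
            struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
            struct sem_array *sma = ipc_rcu_to_struct(p);

            security_sem_free(sma);     /* free the security structure ...   */
            ipc_rcu_free(head);         /* ... and then the object itself    */
        }

        /* Callers drop their reference with the callback attached, e.g.
         *     ipc_rcu_putref(sma, sem_rcu_free);
         * instead of calling security_sem_free() directly while readers may
         * still be inside their RCU read-side critical sections. */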

    Suggested-by: Linus Torvalds
    Signed-off-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 20b8875abcf2daa1dda5cf70bd6369df5e85d4c1 upstream.

    No remaining users, we now use ipc_obtain_object_check().

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 7a25dd9e042b2b94202a67e5551112f4ac87285a upstream.

    This function was replaced by the lockless shm_obtain_object_check(),
    and no longer has any users.

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 32a2750010981216fb788c5190fb0e646abfab30 upstream.

    After previous cleanups and optimizations, this function is no longer
    heavily used and we don't have a good reason to keep it. Update the few
    remaining callers and get rid of it.

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 530fcd16d87cd2417c472a581ba5a1e501556c86 upstream.

    When !CONFIG_MMU there's a chance we can dereference a NULL pointer when
    the VM area isn't found - check the return value of find_vma().

    Also, remove the redundant -EINVAL return: retval is set to the proper
    return code and *only* changed to 0, when we actually unmap the segments.

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 05603c44a7627793219b0bd9a7b236099dc9cd9d upstream.

    As suggested by Andrew, add a generic description of the initial locking
    scheme used throughout all sysv ipc mechanisms, documenting the ids
    rwsem, how rcu can be enough to do the initial checks, and when to
    actually acquire the kern_ipc_perm.lock spinlock.

    I found that adding it to util.c was generic enough.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 4718787d1f626f45ddb239912bc07266b9880044 upstream.

    There is only one user left, drop this function and just call
    ipc_unlock_object() and rcu_read_unlock().

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit d9a605e40b1376eb02b067d7690580255a0df68f upstream.

    Since in some situations the lock can be shared by readers, we shouldn't
    be calling it a mutex; rename it to rwsem.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit c2c737a0461e61a34676bd0bd1bc1a70a1b4e396 upstream.

    Similar to other system calls, acquire the kern_ipc_perm lock after doing
    the initial permission and security checks.

    [sasha.levin@oracle.com: dont leave do_shmat with rcu lock held]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit f42569b1388b1408b574a5e93a23a663647d4181 upstream.

    Clean up some of the messy do_shmat() spaghetti code, getting rid of
    out_free and out_put_dentry labels. This makes shortening the critical
    region of this function in the next patch a little easier to do and read.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 2caacaa82a51b78fc0c800e206473874094287ed upstream.

    With the *_INFO, *_STAT, IPC_RMID and IPC_SET commands already optimized,
    deal with the remaining SHM_LOCK and SHM_UNLOCK commands. Take the
    shm_perm lock after doing the initial auditing and security checks. The
    rest of the logic remains unchanged.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit c97cb9ccab8c85428ec21eff690642ad2ce1fa8a upstream.

    While the INFO cmd doesn't take the ipc lock, the STAT commands do
    acquire it unnecessarily. We can do the permission and security checks
    while holding only the rcu lock.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 68eccc1dc345539d589ae78ee43b835c1a06a134 upstream.

    Similar to semctl and msgctl, when calling shmctl, the *_INFO and *_STAT
    commands can be performed without acquiring the ipc object.

    Add a shmctl_nolock() function and move the logic of *_INFO and *_STAT
    out of shmctl(). Since we are just moving functionality, this change
    still takes the lock and it will be properly lockless in the next patch.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 3b1c4ad37741e53804ffe0a30dd01e08b2ab6241 upstream.

    Now that sem, msgque and shm, through *_down(), all use the lockless
    variant of ipcctl_pre_down(), go ahead and delete it.

    [akpm@linux-foundation.org: fix function name in kerneldoc, cleanups]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 79ccf0f8c8e04e8b9eda6645ba0f63b0915a3075 upstream.

    Instead of holding the ipc lock for the entire function, use the
    ipcctl_pre_down_nolock and only acquire the lock for specific commands:
    RMID and SET.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 8b8d52ac382b17a19906b930cd69e2edb0aca8ba upstream.

    This is the third and final patchset that deals with reducing the amount
    of contention we impose on the ipc lock (kern_ipc_perm.lock). These
    changes mostly deal with shared memory, previous work has already been
    done for semaphores and message queues:

    http://lkml.org/lkml/2013/3/20/546 (sems)
    http://lkml.org/lkml/2013/5/15/584 (mqueues)

    With these patches applied, a custom shm microbenchmark stressing shmctl
    doing IPC_STAT with 4 threads a million times reduces the execution
    time by 50%. A similar run, this time with IPC_SET, reduces the
    execution time from 3 mins and 35 secs to 27 seconds.

    Patches 1-8: replace blindly taking the ipc lock with a smarter
    combination of rcu and ipc_obtain_object, only acquiring the spinlock
    when updating.

    Patch 9: renames the ids rw_mutex to rwsem, which is what it already was.

    Patch 10: a trivial mqueue leftover cleanup.

    Patch 11: adds a brief lock scheme description, requested by Andrew.

    This patch:

    Add shm_obtain_object() and shm_obtain_object_check(), which will allow us
    to get the ipc object without acquiring the lock. Just as with other
    forms of ipc, these functions are basically wrappers around
    ipc_obtain_object*().
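
    The wrappers themselves are trivial; roughly (sketch):

        static inline struct shmid_kernel *shm_obtain_object(struct ipc_namespace *ns, int id)
        {
            struct kern_ipc_perm *ipcp = ipc_obtain_object(&shm_ids(ns), id);

            if (IS_ERR(ipcp))
                return ERR_CAST(ipcp);

            return container_of(ipcp, struct shmid_kernel, shm_perm);
        }

        static inline struct shmid_kernel *shm_obtain_object_check(struct ipc_namespace *ns, int id)
        {
            struct kern_ipc_perm *ipcp = ipc_obtain_object_check(&shm_ids(ns), id);

            if (IS_ERR(ipcp))
                return ERR_CAST(ipcp);

            return container_of(ipcp, struct shmid_kernel, shm_perm);
        }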

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit bebcb928c820d0ee83aca4b192adc195e43e66a2 upstream.

    The check whether the queue is full and the addition of current to the
    wait queue of pending msgsnd() operations (ss_add()) must be atomic.

    Otherwise:
    - the thread that performs msgsnd() finds a full queue and decides to
    sleep.
    - the thread that performs msgrcv() first reads all messages from the
    queue and then sleeps, because the queue is empty.
    - the msgrcv() calls do not perform any wakeups, because the msgsnd()
    task has not yet called ss_add().
    - then the msgsnd()-thread first calls ss_add() and then sleeps.

    Net result: msgsnd() and msgrcv() both sleep forever.

    Observed with msgctl08 from ltp with a preemptible kernel.

    Fix: Call ipc_lock_object() before performing the check.

    The patch also moves security_msg_queue_msgsnd() under ipc_lock_object:
    - msgctl(IPC_SET) explicitly mentions that it tries to expunge any
    pending operations that are not allowed anymore with the new
    permissions. If security_msg_queue_msgsnd() is called without locks,
    then there might be races.
    - it makes the patch much simpler.
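
    A simplified sketch of the resulting ordering in do_msgsnd();
    queue_full() is an illustrative helper for the size check, and the
    reference-counting and RCU details are omitted:

        ipc_lock_object(&msq->q_perm);

        for (;;) {
            if (msq->q_perm.deleted) {          /* queue removed while waiting */
                err = -EIDRM;
                goto out_unlock;
            }

            err = security_msg_queue_msgsnd(msq, msg, msgflg);  /* now under the lock */
            if (err)
                goto out_unlock;

            if (!queue_full(msq, msgsz))
                break;                          /* room available: enqueue */

            if (msgflg & IPC_NOWAIT) {
                err = -EAGAIN;
                goto out_unlock;
            }

            /* Queue full: register as a pending sender atomically with the
             * full-queue test, then drop the lock and sleep. */
            ss_add(msq, &s);
            ipc_unlock_object(&msq->q_perm);
            schedule();
            ipc_lock_object(&msq->q_perm);
        }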

    Reported-and-tested-by: Vineet Gupta
    Acked-by: Rik van Riel
    Signed-off-by: Manfred Spraul
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 758a6ba39ef6df4cdc615e5edd7bd86eab81a5f7 upstream.

    Cleanup: some minor points that I noticed while writing the previous
    patches:

    1) The name try_atomic_semop() is misleading: The function performs the
    operation (if it is possible).

    2) Some documentation updates.

    No real code change, a rename and documentation changes.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit d12e1e50e47e0900dbbf52237b7e171f4f15ea1e upstream.

    sem_otime contains the time of the last semaphore operation that
    completed successfully. Every operation updates this value, thus access
    from multiple cpus can cause thrashing.

    Therefore the patch replaces the variable with a per-semaphore variable.
    The per-array sem_otime is only calculated when required.

    No performance improvement on a single-socket i3 - only important for
    larger systems.
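
    The per-array value is then derived on demand, roughly like this
    (sketch):

        /* sem_otime now lives in each struct sem; the array-wide value is
         * the maximum over all semaphores and is only computed when
         * userspace asks for it (semctl(IPC_STAT) and friends). */
        static time_t get_semotime(struct sem_array *sma)
        {
            time_t res = sma->sem_base[0].sem_otime;
            int i;

            for (i = 1; i < sma->sem_nsems; i++) {
                time_t to = sma->sem_base[i].sem_otime;

                if (to > res)
                    res = to;
            }
            return res;
        }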

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit f269f40ad5aeee229ed70044926f44318abe41ef upstream.

    There are two places that can contain alter operations:
    - the global queue: sma->pending_alter
    - the per-semaphore queues: sma->sem_base[].pending_alter.

    Since one of the queues must be processed first, this causes an odd
    prioritization of the wakeups: complex operations have priority over
    simple ops.

    The patch restores the behavior of linux <=3.0.9: the longest waiting
    operation has the highest priority. This is done by using only one queue:
    - if there are complex operations, then sma->pending_alter is used,
    - otherwise, the per-semaphore queues are used.

    As a side effect, do_smart_update_queue() becomes much simpler: no more
    goto logic.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 1a82e9e1d0f1b45f47a97c9e2349020536ff8987 upstream.

    Introduce separate queues for operations that do not modify the
    semaphore values. Advantages:

    - Simpler logic in check_restart().
    - Faster update_queue(): Right now, all wait-for-zero operations are
    always tested, even if the semaphore value is not 0.
    - wait-for-zero gets again priority, as in linux <=3.0.9.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit f5c936c0f267ec58641451cf8b8d39b4c207ee4d upstream.

    As now each semaphore has its own spinlock and parallel operations are
    possible, give each semaphore its own cacheline.

    On a i3 laptop, this gives up to 28% better performance:

    #semscale 10 | grep "interleave 2"
    - before:
    Cpus 1, interleave 2 delay 0: 36109234 in 10 secs
    Cpus 2, interleave 2 delay 0: 55276317 in 10 secs
    Cpus 3, interleave 2 delay 0: 62411025 in 10 secs
    Cpus 4, interleave 2 delay 0: 81963928 in 10 secs

    - after:
    Cpus 1, interleave 2 delay 0: 35527306 in 10 secs
    Cpus 2, interleave 2 delay 0: 70922909 in 10 secs <<< + 28%
    Cpus 3, interleave 2 delay 0: 80518538 in 10 secs
    Cpus 4, interleave 2 delay 0: 89115148 in 10 secs <<< + 8.7%

    i3, with 2 cores and hyperthreading enabled. Interleave 2 is used in
    order to first use the full cores. HT partially hides the delay from
    cacheline thrashing, thus the improvement is "only" 8.7% if 4 threads
    are running.
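
    The change itself boils down to a one-line annotation on struct sem,
    roughly (a sketch; the exact set of fields depends on the other patches
    in this series):

        /* One semaphore structure for each semaphore in the system. */
        struct sem {
            int              semval;        /* current value */
            int              sempid;        /* pid of last operation */
            spinlock_t       lock;          /* per-semaphore lock */
            struct list_head pending_alter; /* pending single-sop alter ops */
            struct list_head pending_const; /* pending single-sop wait-for-zero ops */
            time_t           sem_otime;     /* candidate for sem_otime */
        } ____cacheline_aligned_in_smp;     /* each semaphore gets its own cacheline */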

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 196aa0132fc7261f34b10ae1bfb44abc1bc69b3c upstream.

    Enforce that ipc_rcu_alloc returns a cacheline aligned pointer on SMP.

    Rationale:

    The SysV sem code tries to move the main spinlock into a separate
    cacheline (____cacheline_aligned_in_smp). This works only if
    ipc_rcu_alloc returns cacheline-aligned pointers. vmalloc and kmalloc do
    return cacheline-aligned pointers, but the implementation of
    ipc_rcu_alloc breaks that.
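
    A sketch of the idea: make the hidden allocation header a full cacheline,
    so the payload that follows keeps the alignment that kmalloc/vmalloc
    provided (field names approximate):

        /* Header prepended to every ipc_rcu_alloc() allocation. Padding it
         * to a cacheline boundary keeps the payload (e.g. struct sem_array)
         * aligned, so ____cacheline_aligned_in_smp members work as intended. */
        struct ipc_rcu {
            struct rcu_head rcu;
            atomic_t        refcount;
        } ____cacheline_aligned_in_smp;

        #define ipc_rcu_to_struct(p)    ((void *)(p + 1))   /* payload follows the header */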

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Manfred Spraul
     
  • commit 9ad66ae65fc8d3e7e3344310fb0aa835910264fe upstream.

    We can now drop the msg_lock and msg_lock_check functions along with a
    bogus comment introduced previously in semctl_down.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 41a0d523d0f626e9da0dc01de47f1b89058033cf upstream.

    do_msgrcv() is the last msg queue function that abuses the ipc lock.
    Take it only when needed, when actually updating msq.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 3dd1f784ed6603d7ab1043e51e6371235edf2313 upstream.

    do_msgsnd() is another function that does too many things with the ipc
    object lock acquired. Take it only when needed, when actually updating
    msq.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit ac0ba20ea6f2201a1589d6dc26ad1a4f0f967bb8 upstream.

    While the INFO cmd doesn't take the ipc lock, the STAT commands do
    acquire it unnecessarily. We can do the permission and security checks
    while holding only the rcu lock.

    This function now mimics semctl_nolock().
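
    A simplified sketch of the resulting MSG_STAT/IPC_STAT path (error
    handling abbreviated):

        rcu_read_lock();
        if (cmd == MSG_STAT)
            msq = msq_obtain_object(ns, msqid);         /* index-based lookup */
        else
            msq = msq_obtain_object_check(ns, msqid);   /* IPC_STAT: validate id */
        if (IS_ERR(msq)) {
            err = PTR_ERR(msq);
            goto out_unlock;
        }

        err = -EACCES;
        if (ipcperms(ns, &msq->q_perm, S_IRUGO))        /* permission check, RCU only */
            goto out_unlock;

        err = security_msg_queue_msgctl(msq, cmd);      /* LSM check, RCU only */
        if (err)
            goto out_unlock;

        /* ... copy q_perm, times and counts to the user buffer; no spinlock
         * is taken anywhere on this path ... */
        rcu_read_unlock();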

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit a5001a0d9768568de5d613c3b3a5b9c7721299da upstream.

    Add msq_obtain_object() and msq_obtain_object_check(), which will allow
    us to get the ipc object without acquiring the lock. Just as with
    semaphores, these functions are basically wrappers around
    ipc_obtain_object*().

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 2cafed30f150f7314f98717b372df8173516cae0 upstream.

    Similar to semctl, when calling msgctl, the *_INFO and *_STAT commands
    can be performed without acquiring the ipc object.

    Add a msgctl_nolock() function and move the logic of *_INFO and *_STAT
    out of msgctl(). This change still takes the lock and it will be
    properly lockless in the next patch.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 15724ecb7e9bab35fc694c666ad563adba820cc3 upstream.

    Instead of holding the ipc lock for the entire function, use the
    ipcctl_pre_down_nolock and only acquire the lock for specific commands:
    RMID and SET.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 7b4cc5d8411bd4e9d61d8714f53859740cf830c2 upstream.

    This function currently acquires both the rw_mutex and the rcu lock on
    successful lookups, leaving the callers to explicitly unlock them,
    creating another two level locking situation.

    Make the callers (including those that still use ipcctl_pre_down())
    explicitly lock and unlock the rwsem and rcu lock.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit cf9d5d78d05bca96df7618dfc3a5ee4414dcae58 upstream.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit 1ca7003ab41152d673d9e359632283d05294f3d6 upstream.

    Simple helpers around the (kern_ipc_perm *)->lock spinlock.
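
    The helpers are essentially (modulo comments):

        static inline void ipc_lock_object(struct kern_ipc_perm *perm)
        {
            spin_lock(&perm->lock);
        }

        static inline void ipc_unlock_object(struct kern_ipc_perm *perm)
        {
            spin_unlock(&perm->lock);
        }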

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • commit dbfcd91f06f0e2d5564b2fd184e9c2a43675f9ab upstream.

    This patchset continues the work that began in the sysv ipc semaphore
    scaling series, see

    https://lkml.org/lkml/2013/3/20/546

    Just like semaphores used to be, sysv shared memory and msg queues also
    abuse the ipc lock, unnecessarily holding it for operations such as
    permission and security checks.

    This patchset mostly deals with mqueues, and while shared mem can be
    done in a very similar way, I want to get these patches out in the open
    first. It also does some pending cleanups, mostly focused on the two
    level locking we have in ipc code, taking care of ipc_addid() and
    ipcctl_pre_down_nolock() - yes there are still functions that need to be
    updated as well.

    This patch:

    Make all callers explicitly take and release the RCU read lock.

    This addresses the two level locking seen in newary(), newseg() and
    newqueue(). For the last two, explicitly unlock the ipc object and the
    rcu lock, instead of calling the custom shm_unlock and msg_unlock
    functions. The next patch will deal with the open-coded locking for
    ->perm.lock.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     

08 Sep, 2013

1 commit

  • commit 368ae537e056acd3f751fa276f48423f06803922 upstream.

    According to 'man msgrcv': "If msgtyp is less than 0, the first message of
    the lowest type that is less than or equal to the absolute value of msgtyp
    shall be received."

    Bug: The kernel only returns a message if its type is 1; other messages
    with type < abs(msgtype) will never get returned.

    Fix: After having traversed the list to find the first message with the
    lowest type, we need to actually return that message.

    This regression was introduced by commit daaf74cf0867 ("ipc: refactor
    msg list search into separate function").

    Signed-off-by: Svenning Soerensen
    Reviewed-by: Peter Hurley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Svenning Soerensen