01 Oct, 2013

4 commits

  • In commit 0a2b9d4c7967 ("ipc/sem.c: move wake_up_process out of the
    spinlock section"), the update of the semaphore's sem_otime (last semop
    time) was moved to one central position (do_smart_update()).

    But since do_smart_update() is only called for operations that modify
    the array, this means that wait-for-zero semops do not update sem_otime
    anymore.

    The fix is simple:
    Non-alter operations must update sem_otime.
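
    As a quick illustration of the user-visible effect (a hypothetical
    user-space demo, not part of the patch; error handling omitted), a
    wait-for-zero semop is expected to bump sem_otime just like an altering
    one:

        #include <stdio.h>
        #include <sys/ipc.h>
        #include <sys/sem.h>

        /* glibc requires the caller to define union semun itself */
        union semun { int val; struct semid_ds *buf; unsigned short *array; };

        int main(void)
        {
            int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600); /* semval == 0 */
            struct sembuf wait_for_zero = { .sem_num = 0, .sem_op = 0 };
            struct semid_ds ds;
            union semun arg = { .buf = &ds };

            semop(id, &wait_for_zero, 1);  /* non-alter op, completes at once */
            semctl(id, 0, IPC_STAT, arg);
            /* stayed 0 before this fix, non-zero afterwards */
            printf("sem_otime = %ld\n", (long)ds.sem_otime);
            semctl(id, 0, IPC_RMID, arg);
            return 0;
        }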

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Reported-by: Jia He
    Tested-by: Jia He
    Cc: Davidlohr Bueso
    Cc: Mike Galbraith
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The proc interface is not aware of sem_lock(); it instead calls
    ipc_lock_object() directly. This means that simple semop() operations
    can run in parallel with the proc interface. Right now this is harmless,
    because the implementation doesn't do anything that requires proper
    synchronization.

    But it is dangerous and therefore should be fixed.

    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Operations that need access to the whole array must guarantee that there
    are no simple operations ongoing. Right now this is achieved by
    spin_unlock_wait(sem->lock) on all semaphores.

    If complex_count is nonzero, then this spin_unlock_wait() is not
    necessary, because it was already performed in the past by the thread
    that increased complex_count; even though sem_perm.lock was dropped
    in between, no simple operation could have started, because simple
    operations cannot start while complex_count is non-zero.
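
    A kernel-style sketch of the resulting fast path (illustrative only;
    the helper name and field names are taken from the description above,
    not quoted from the patch):

        /* Wait for in-flight simple ops only if no complex op already
         * forced them out, i.e. only if complex_count was zero so far. */
        static void sem_wait_array(struct sem_array *sma)
        {
            int i;

            if (sma->complex_count > 0)
                return;     /* already done by an earlier complex op */

            for (i = 0; i < sma->sem_nsems; i++)
                spin_unlock_wait(&sma->sem_base[i].lock);
        }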

    Signed-off-by: Manfred Spraul
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Reviewed-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The exclusion of complex operations in sem_lock() is insufficient: after
    acquiring the per-semaphore lock, a simple op must first check that
    sem_perm.lock is not locked and only after that test check
    complex_count. The current code does it the other way around - and that
    creates a race. Details are below.

    The patch is a complete rewrite of sem_lock(), based in part on the code
    from Mike Galbraith. It removes all gotos and all loops and thus the
    risk of livelocks.

    I have tested the patch (together with the next one) on my i3 laptop and
    it didn't cause any problems.

    The bug is probably also present in 3.10 and 3.11, but for these kernels
    it might be simpler just to move the test of sma->complex_count after
    the spin_is_locked() test.

    Details of the bug:

    Assume:
    - sma->complex_count = 0.
    - Thread 1: semtimedop(complex op that must sleep)
    - Thread 2: semtimedop(simple op).

    Pseudo-Trace:

    Thread 1: sem_lock(): acquire sem_perm.lock
    Thread 1: sem_lock(): check for ongoing simple ops
    Nothing ongoing, thread 2 is still before sem_lock().
    Thread 1: try_atomic_semop()
    <<< preempted.

    Thread 2: sem_lock():
        static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                   int nsops)
        {
            int locknum;
        again:
            if (nsops == 1 && !sma->complex_count) {
                struct sem *sem = sma->sem_base + sops->sem_num;

                /* Lock just the semaphore we are interested in. */
                spin_lock(&sem->lock);

                /*
                 * If sma->complex_count was set while we were spinning,
                 * we may need to look at things we did not lock here.
                 */
                if (unlikely(sma->complex_count)) {
                    spin_unlock(&sem->lock);
                    goto lock_array;
                }
    <<<<<<<<<
    <<< complex_count is still 0.
    <<<
    <<< Here it is preempted
    <<<<<<<<<

    Thread 1: try_atomic_semop() returns, notices that it must sleep.
    Thread 1: increases sma->complex_count.
    Thread 1: drops sem_perm.lock
    Thread 2:
                /*
                 * Another process is holding the global lock on the
                 * sem_array; we cannot enter our critical section,
                 * but have to wait for the global lock to be released.
                 */
                if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
                    spin_unlock(&sem->lock);
                    spin_unlock_wait(&sma->sem_perm.lock);
                    goto again;
                }
    <<< sem_perm.lock already dropped, thus no "goto again;"

                locknum = sops->sem_num;
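
    For reference, a sketch of the simpler fix suggested above for 3.10/3.11
    (illustrative only, not a tested backport): check sem_perm.lock before
    complex_count in the fast path.

        /* Lock just the semaphore we are interested in. */
        spin_lock(&sem->lock);

        /* First: is a complex op holding the global lock right now? */
        if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
            spin_unlock(&sem->lock);
            spin_unlock_wait(&sma->sem_perm.lock);
            goto again;
        }

        /* Only then: did a complex op queue itself and drop the lock? */
        if (unlikely(sma->complex_count)) {
            spin_unlock(&sem->lock);
            goto lock_array;
        }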

    Signed-off-by: Manfred Spraul
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Cc: [3.10+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

25 Sep, 2013

1 commit

  • Currently, IPC mechanisms do security and auditing related checks under
    RCU. However, since security modules can free the security structure,
    for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
    race if the structure is freed before other tasks are done with it,
    creating a use-after-free condition. Manfred illustrates this nicely,
    for instance with shared mem and selinux:

    -> do_shmat calls rcu_read_lock()
    -> do_shmat calls shm_object_check().
    Checks that the object is still valid - but doesn't acquire any locks.
    Then it returns.
    -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
    -> selinux_shm_shmat calls ipc_has_perm()
    -> ipc_has_perm accesses ipc_perms->security

    shm_close()
    -> shm_close acquires rw_mutex & shm_lock
    -> shm_close calls shm_destroy
    -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
    -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
    -> ipc_free_security calls kfree(ipc_perms->security)

    This patch delays the freeing of the security structures after all RCU
    readers are done. Furthermore it aligns the security life cycle with
    that of the rest of IPC - freeing them based on the reference counter.
    For situations where we need not free security, the current behavior is
    kept. Linus states:

    "... the old behavior was suspect for another reason too: having the
    security blob go away from under a user sounds like it could cause
    various other problems anyway, so I think the old code was at least
    _prone_ to bugs even if it didn't have catastrophic behavior."

    I have tested this patch with IPC testcases from LTP on both my
    quad-core laptop and on a 64 core NUMA server. In both cases selinux is
    enabled, and tests pass for both voluntary and forced preemption models.
    While the mentioned races are theoretical (at least no one has reported
    them), I wanted to make sure that this new logic doesn't break anything
    we weren't aware of.
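
    A kernel-style sketch of the delayed-free idea (illustrative; the helper
    names are assumptions based on the description, not a quote of the
    patch):

        /* Free the security blob from the same RCU callback that frees the
         * object, so no RCU reader can still be dereferencing
         * ipc_perms->security. */
        static void sem_rcu_free(struct rcu_head *head)
        {
            struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
            struct sem_array *sma = ipc_rcu_to_struct(p);

            security_sem_free(sma);
            ipc_rcu_free(head);
        }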

    Suggested-by: Linus Torvalds
    Signed-off-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

12 Sep, 2013

1 commit

  • Since in some situations the lock can be shared for readers, we shouldn't
    be calling it a mutex, rename it to rwsem.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

10 Jul, 2013

8 commits

  • Cleanup: Some minor points that I noticed while writing the previous
    patches

    1) The name try_atomic_semop() is misleading: The function performs the
    operation (if it is possible).

    2) Some documentation updates.

    No real code change, a rename and documentation changes.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • sem_otime contains the time of the last semaphore operation that
    completed successfully. Every operation updates this value, thus access
    from multiple cpus can cause thrashing.

    Therefore the patch replaces the variable with a per-semaphore variable.
    The per-array sem_otime is only calculated when required.

    No performance improvement on a single-socket i3 - only important for
    larger systems.
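
    A sketch of how the per-array value can be derived on demand
    (illustrative, based on the description above):

        /* The array-wide sem_otime is simply the newest per-semaphore
         * timestamp. */
        static time_t get_semotime(struct sem_array *sma)
        {
            time_t res = sma->sem_base[0].sem_otime;
            int i;

            for (i = 1; i < sma->sem_nsems; i++) {
                time_t to = sma->sem_base[i].sem_otime;

                if (to > res)
                    res = to;
            }
            return res;
        }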

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • There are two places that can contain alter operations:
    - the global queue: sma->pending_alter
    - the per-semaphore queues: sma->sem_base[].pending_alter.

    Since one of the queues must be processed first, this causes an odd
    prioritization of the wakeups: complex operations have priority over
    simple ops.

    The patch restores the behavior of linux <= 3.0.9 by always using only
    one queue for alter operations:
    - if there are complex ops, then the global queue (sma->pending_alter)
    is used,
    - otherwise, the per-semaphore queues are used.

    As a side effect, do_smart_update_queue() becomes much simpler: no more
    goto logic.
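
    A minimal sketch of the queue selection (illustrative only; semnum names
    the semaphore whose queue would otherwise be scanned):

        /* Decide which pending-alter queue a wakeup scan should walk. */
        struct list_head *pending_list;

        if (sma->complex_count)
            pending_list = &sma->pending_alter;          /* global queue */
        else
            pending_list = &sma->sem_base[semnum].pending_alter;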

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Introduce separate queues for operations that do not modify the
    semaphore values. Advantages:

    - Simpler logic in check_restart().
    - Faster update_queue(): Right now, all wait-for-zero operations are
    always tested, even if the semaphore value is not 0.
    - wait-for-zero gets again priority, as in linux <= 3.0.9

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Now that each semaphore has its own spinlock and parallel operations are
    possible, give each semaphore its own cacheline.

    On a i3 laptop, this gives up to 28% better performance:

    #semscale 10 | grep "interleave 2"
    - before:
    Cpus 1, interleave 2 delay 0: 36109234 in 10 secs
    Cpus 2, interleave 2 delay 0: 55276317 in 10 secs
    Cpus 3, interleave 2 delay 0: 62411025 in 10 secs
    Cpus 4, interleave 2 delay 0: 81963928 in 10 secs

    - after:
    Cpus 1, interleave 2 delay 0: 35527306 in 10 secs
    Cpus 2, interleave 2 delay 0: 70922909 in 10 secs <<< + 28%
    Cpus 3, interleave 2 delay 0: 80518538 in 10 secs
    Cpus 4, interleave 2 delay 0: 89115148 in 10 secs <<< + 8.7%

    The i3 has 2 cores with hyperthreading enabled; interleave 2 is used in
    order to use the full cores first. HT partially hides the delay from
    cacheline thrashing, thus the improvement is "only" 8.7% if 4 threads
    are running.
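
    A sketch of the resulting layout (illustrative; field names partly
    follow this series, partly are assumptions):

        /* One semaphore per cacheline, so parallel simple ops on different
         * semaphores do not false-share. */
        struct sem {
            int              semval;         /* current value */
            int              sempid;         /* pid of the last operation */
            spinlock_t       lock;           /* per-semaphore lock */
            struct list_head pending_alter;  /* pending alter operations */
            struct list_head pending_const;  /* pending wait-for-zero ops */
        } ____cacheline_aligned_in_smp;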

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • We can now drop the msg_lock and msg_lock_check functions along with a
    bogus comment introduced previously in semctl_down.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This function currently acquires both the rw_mutex and the rcu lock on
    successful lookups, leaving the callers to explicitly unlock them,
    creating another two level locking situation.

    Make the callers (including those that still use ipcctl_pre_down())
    explicitly lock and unlock the rwsem and rcu lock.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

27 May, 2013

1 commit

  • do_smart_update_queue() is called when an operation (semop,
    semctl(SETVAL), semctl(SETALL), ...) modified the array. It must check
    which of the sleeping tasks can proceed.

    do_smart_update_queue() missed a few wakeups:
    - if a sleeping complex op was completed, then all per-semaphore queues
    must be scanned - not only those that were modified by *sops
    - if a sleeping simple op proceeded, then the global queue must be
    scanned again

    And:
    - the test for "sops == NULL" before scanning the global queue is not
    required: If the global queue is empty, then it doesn't need to be
    scanned - regardless of the reason for calling do_smart_update_queue()

    The patch is not optimized, i.e. even completing a wait-for-zero
    operation causes a rescan. This is done to keep the patch as simple as
    possible.
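
    A sketch of the two wakeup rules above (illustrative; the _completed
    flags are placeholders for "such an op was completed in this pass"):

        /* After completing sleeping ops, decide what else must be rescanned. */
        if (complex_op_completed) {
            /* every per-semaphore queue, not only those touched by *sops */
            for (i = 0; i < sma->sem_nsems; i++)
                semop_completed |= update_queue(sma, i, pt);
        }
        if (simple_op_completed) {
            /* a waiting complex op may now be able to proceed */
            semop_completed |= update_queue(sma, -1, pt);
        }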

    Signed-off-by: Manfred Spraul
    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

10 May, 2013

2 commits

  • The semctl GETNCNT returns the number of semops waiting for the
    specified semaphore to become nonzero. After commit 9f1bc2c9022c
    ("ipc,sem: have only one list in struct sem_queue"), the semops waiting
    on just one semaphore are waiting on that semaphore's list.

    In order to return the correct count, we have to walk that list too, in
    addition to the sem_array's list for complex operations.

    Signed-off-by: Rik van Riel
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • The semctl GETZCNT returns the number of semops waiting for the
    specified semaphore to become zero. After commit 9f1bc2c9022c
    ("ipc,sem: have only one list in struct sem_queue"), the semops waiting
    on just one semaphore are waiting on that semaphore's list.

    In order to return the correct count, we have to walk that list too, in
    addition to the sem_array's list for complex operations.

    This bug broke dbench; it works again with this patch applied.
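
    A small user-space check of the semantics (hypothetical demo, not from
    the patch; error handling omitted): a child blocks in a wait-for-zero
    semop, and the parent should see it via GETZCNT.

        #include <stdio.h>
        #include <unistd.h>
        #include <sys/wait.h>
        #include <sys/ipc.h>
        #include <sys/sem.h>

        union semun { int val; struct semid_ds *buf; unsigned short *array; };

        int main(void)
        {
            int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
            union semun arg = { .val = 1 };
            struct sembuf wait_for_zero = { .sem_num = 0, .sem_op = 0 };

            semctl(id, 0, SETVAL, arg);        /* semval = 1, so sem_op = 0 blocks */
            if (fork() == 0) {
                semop(id, &wait_for_zero, 1);  /* child blocks here */
                _exit(0);
            }
            sleep(1);                          /* give the child time to block */
            printf("GETZCNT = %d\n", semctl(id, 0, GETZCNT)); /* expect 1 */
            semctl(id, 0, IPC_RMID);           /* removal also unblocks the child */
            wait(NULL);
            return 0;
        }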

    Signed-off-by: Rik van Riel
    Reported-by: Kent Overstreet
    Tested-by: Kent Overstreet
    Signed-off-by: Linus Torvalds

    Rik van Riel
     

05 May, 2013

7 commits

  • This trivially combines two rcu_read_lock() calls on both sides of an
    if-statement into a single one in front of the if-statement.

    Split out as an independent cleanup from the previous commit.

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • With various straight RCU lock/unlock movements, one common exit path
    pattern had become

    rcu_read_unlock();
    goto out_wakeup;

    and in fact there were no cases where we wanted to exit to out_wakeup
    _without_ releasing the RCU read lock.

    So replace that pattern with "goto out_rcu_wakeup", and remove the old
    out_wakeup.

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • sem_obtain_lock() was another of those functions that returned with the
    RCU lock held for reading in the success case. Move the RCU locking to
    the caller (semtimedop()), making it more obvious. We already did RCU
    locking elsewhere in that function.

    Side note: why does semtimedop() re-do the semaphore lookup after the
    sleep, rather than just getting a reference to the semaphore it already
    looked up originally?

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Fix another ipc locking buglet introduced by the scalability patches:
    when semctl_down() was changed to delay the semaphore locking, one error
    path for security_sem_semctl() went through the semaphore unlock logic
    even though the semaphore had never been locked.

    Introduced by commit 16df3674efe3 ("ipc,sem: do not hold ipc lock more
    than necessary")

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This is another ipc semaphore locking cleanup, trying to make the
    locking more straightforward. We move the rcu read locking into the
    callers of sem_lock_and_putref(), which in general means that we now
    mostly do the rcu_read_lock() and rcu_read_unlock() in the same
    function.

    Mostly. We still have the ipc_addid/newary/freeary mess, and things
    like ipcctl_pre_down_nolock().

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • ipc_rcu_putref() uses atomics for the refcount, and the games to lock
    and unlock the semaphore just to try to keep the reference counting
    working are no longer useful.

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The IPC locking is a mess, and sem_unlock() unlocks not only the
    semaphore spinlock, it also drops the rcu read lock. This is unlike
    sem_lock(), which just gets the spin-lock and expects the caller to get
    the rcu read lock.

    This all makes things very hard to follow, and it's very confusing when
    you take the rcu read lock in one function, and then release it in
    another. And it has caused actual bugs: the sem_obtain_lock() function
    ended up dropping the RCU read lock twice in one error path, because it
    first did the sem_unlock(), and then did a rcu_read_unlock() to match
    the rcu_read_lock() it had done.

    This is just a totally mindless "remove rcu_read_unlock() from
    sem_unlock() and add it immediately after each caller" (except for the
    aforementioned bug where we did too many rcu_read_unlock(), and in
    find_alloc_undo() where we just got the rcu_read_lock() to correct for
    the fact that sem_unlock would immediately drop it again).

    We can (and should) clean things up further, but this fixes the bug with
    the minimal amount of subtlety.

    Reviewed-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

03 May, 2013

1 commit

  • We can step on WARN_ON_ONCE() in sem_getref() if a semaphore is removed
    just as we are about to call sem_getref() from semctl_main(); results
    are not pretty.

    We should fail with -EIDRM, same as if IPC_RMID happened while we'd been
    doing allocation there. This also expands sem_getref() at its only
    callsite (and fixes it there), while sem_getref_and_unlock() is simply
    killed off - it has no callers at all.

    Signed-off-by: Al Viro
    Acked-by: Davidlohr Bueso
    Signed-off-by: Linus Torvalds

    Al Viro
     

01 May, 2013

4 commits

  • Introduce finer grained locking for semtimedop, to handle the common case
    of a program wanting to manipulate one semaphore from an array with
    multiple semaphores.

    If the call is a semop manipulating just one semaphore in an array with
    multiple semaphores, only take the lock for that semaphore itself.

    If the call needs to manipulate multiple semaphores, or another caller is
    in a transaction that manipulates multiple semaphores, the sem_array lock
    is taken, as well as all the locks for the individual semaphores.
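
    For illustration, the fast-path case this series targets is an ordinary
    single-semaphore semop on a larger array (hypothetical user-space
    snippet):

        #include <sys/ipc.h>
        #include <sys/sem.h>

        int main(void)
        {
            /* An array with many semaphores... */
            int id = semget(IPC_PRIVATE, 128, IPC_CREAT | 0600);

            /* ...but the operation touches only one of them (nsops == 1),
             * so only that semaphore's lock needs to be taken. */
            struct sembuf op = { .sem_num = 42, .sem_op = 1 };
            semop(id, &op, 1);

            semctl(id, 0, IPC_RMID);
            return 0;
        }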

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores, look like this:

    threads    vanilla    Davidlohr's    Davidlohr's +     Davidlohr's +
                          patches        rwlock patches    v3 patches
       10       610652       726325          1783589           2142206
       20       341570       365699          1520453           1977878
       30       288102       307037          1498167           2037995
       40       290714       305955          1612665           2256484
       50       288620       312890          1733453           2650292
       60       289987       306043          1649360           2388008
       70       291298       306347          1723167           2717486
       80       290948       305662          1729545           2763582
       90       290996       306680          1736021           2757524
      100       292243       306700          1773700           3059159

    [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
    [davidlohr.bueso@hp.com: make refcounter atomic]
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Jason Low
    Reviewed-by: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Emmanuel Benisty
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Having only one list in struct sem_queue, and only queueing simple
    semaphore operations on the list for the semaphore involved, allows us to
    introduce finer grained locking for semtimedop.
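
    A sketch of the resulting structure (illustrative; the single list_head
    is queued either on the array-wide list for complex ops or on one
    semaphore's list for simple ops):

        struct sem_queue {
            struct list_head    list;     /* queue of pending operations */
            struct task_struct *sleeper;  /* the sleeping task */
            struct sem_undo    *undo;     /* undo structure */
            int                 pid;      /* pid of the requesting process */
            int                 status;   /* completion status */
            struct sembuf      *sops;     /* array of pending operations */
            int                 nsops;    /* number of operations */
            int                 alter;    /* does *sops alter the array? */
        };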

    Signed-off-by: Rik van Riel
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
    later that only locks the sem_array and does nothing else.

    Open code the locking from ipc_lock() in sem_obtain_lock() so we can
    introduce finer grained locking for the sem_array in the next patch.

    [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno out of sem_obtain_lock()]
    Signed-off-by: Rik van Riel
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Instead of holding the ipc lock for permissions and security checks, among
    others, only acquire it when necessary.

    Some numbers....

    1) With Rik's semop-multi.c microbenchmark we can see the following
    results:

    Baseline (3.9-rc1):
    cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
    total operations: 151452270, ops/sec 5048409

    + 59.40% a.out [kernel.kallsyms] [k] _raw_spin_lock
    + 6.14% a.out [kernel.kallsyms] [k] sys_semtimedop
    + 3.84% a.out [kernel.kallsyms] [k] avc_has_perm_flags
    + 3.64% a.out [kernel.kallsyms] [k] __audit_syscall_exit
    + 2.06% a.out [kernel.kallsyms] [k] copy_user_enhanced_fast_string
    + 1.86% a.out [kernel.kallsyms] [k] ipc_lock

    With this patchset:
    cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
    total operations: 273156400, ops/sec 9105213

    + 18.54% a.out [kernel.kallsyms] [k] _raw_spin_lock
    + 11.72% a.out [kernel.kallsyms] [k] sys_semtimedop
    + 7.70% a.out [kernel.kallsyms] [k] ipc_has_perm.isra.21
    + 6.58% a.out [kernel.kallsyms] [k] avc_has_perm_flags
    + 6.54% a.out [kernel.kallsyms] [k] __audit_syscall_exit
    + 4.71% a.out [kernel.kallsyms] [k] ipc_obtain_object_check

    2) While on an Oracle swingbench DSS (data mining) workload the
    improvements are not as exciting as with Rik's benchmark, we can see
    some positive numbers. For an 8 socket machine the following are the
    percentages of %sys time incurred in the ipc lock:

    Baseline (3.9-rc1):
    100 swingbench users: 8,74%
    400 swingbench users: 21,86%
    800 swingbench users: 84,35%

    With this patchset:
    100 swingbench users: 8,11%
    400 swingbench users: 19,93%
    800 swingbench users: 77,69%

    [riel@redhat.com: fix two locking bugs]
    [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Reviewed-by: Chegu Vinod
    Acked-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Jason Low
    Cc: Emmanuel Benisty
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

06 Mar, 2013

1 commit


04 Mar, 2013

1 commit


07 Sep, 2012

1 commit

  • - Store the ipc owner and creator with a kuid.
    - Store the ipc group and the creator's group with a kgid.
    - Add error handling to ipc_update_perms, allowing it to
    fail if the uids and gids cannot be converted to kuids
    or kgids.
    - Modify the proc files to display the ipc creator and
    owner in the user namespace of the opener of the proc file.
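
    A kernel-style sketch of the conversion and error handling mentioned
    above (illustrative; make_kuid()/make_kgid() and the validity checks are
    the standard helpers, the in/out structures are assumed):

        kuid_t uid = make_kuid(current_user_ns(), in->uid);
        kgid_t gid = make_kgid(current_user_ns(), in->gid);

        /* Fail cleanly if the ids cannot be represented in this namespace. */
        if (!uid_valid(uid) || !gid_valid(gid))
            return -EINVAL;

        out->uid = uid;
        out->gid = gid;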

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

03 Nov, 2011

3 commits

  • include/linux/sem.h contains several structures that are only used within
    ipc/sem.c.

    The patch moves them into ipc/sem.c - there is no need to expose the
    structures to the whole kernel.

    No functional changes, only whitespace cleanups and 80-char per line
    fixes.

    Signed-off-by: Manfred Spraul
    Acked-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Mike Galbraith
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • semtimedop() does not handle spurious wakeups; it returns -EINTR to user
    space. Most other schedule() users would just loop and not return to
    user space. The patch adds such a loop to semtimedop().
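
    The generic pattern the changelog refers to looks roughly like this
    (illustrative sketch; "condition" stands for whatever event the task
    sleeps on):

        /* Loop around schedule(): a spurious wakeup - one where the
         * condition we slept on is not yet true - simply leads back to
         * sleep instead of being reported to user space. */
        for (;;) {
            set_current_state(TASK_INTERRUPTIBLE);
            if (condition)                  /* the event we are waiting for */
                break;
            if (signal_pending(current))    /* a real reason to return -EINTR */
                break;
            schedule();
        }
        __set_current_state(TASK_RUNNING);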

    Signed-off-by: Manfred Spraul
    Reported-by: Peter Zijlstra
    Acked-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Mike Galbraith
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • sys_semtimedop() may return -EIDRM although the semaphore operation
    completed successfully:

    Thread 2: semtimedop(), sleeps
    Thread 1: semop():
              * acquires sem_lock()
    Thread 2: semtimedop() woken up due to timeout
              sem_lock() loops
    Thread 1: * notices that thread 2 could be completed.
              * performs the operations that thread 2 is sleeping on.
              * marks the semaphore operation as IN_WAKEUP
              * drops sem_lock(), does wakeup, sets return code to 0
    Thread 2: * thread delayed due to interrupt, whatever
    Thread 1: * returns to user space
    Thread 2: * thread still delayed
    Thread 1: semctl(IPC_RMID)
              * acquires sem_lock()
              * ipc_rmid(), ipcp->deleted=1
              * drops sem_lock()
    Thread 2: * thread finally continues - but sem_lock()
                now fails due to ipcp->deleted == 1
              * returns -EIDRM instead of 0

    The fix is trivial: Always use the return code in queue.status.

    In the real world, the race probably doesn't matter:
    If the semaphore array is destroyed, the app is probably not interested
    in whether the last operation succeeded or was already cancelled.
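
    A kernel-style sketch of the idea (illustrative, not the literal patch):
    after waking up, trust the result the waker stored in queue.status
    rather than deriving an error from the failed array lookup.

        /* The waker already decided the outcome and stored it in
         * queue.status; get_queue_result() waits until IN_WAKEUP has been
         * replaced by the final value. */
        error = get_queue_result(&queue);

        if (error != -EINTR) {
            /* The operation was completed (or cancelled) by the waker:
             * report that result, even if the array was removed afterwards. */
            goto out_free;
        }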

    Signed-off-by: Manfred Spraul
    Cc: Thomas Gleixner
    Cc: Mike Galbraith
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

26 Jul, 2011

1 commit

  • If a semaphore array is removed and in parallel a sleeping task is woken
    up (signal or timeout, does not matter), then the woken up task does not
    wait until wake_up_sem_queue_do() is completed. This will cause crashes,
    because wake_up_sem_queue_do() will read from a stale pointer.

    The fix is simple: Regardless of anything, always call get_queue_result().
    This function waits until wake_up_sem_queue_do() has finished its task.

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=27142
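
    A sketch of what that wait looks like (illustrative; modeled on the
    behavior described above):

        /* Spin until the waker has replaced IN_WAKEUP with the final
         * status, so the queue entry is no longer being written by
         * wake_up_sem_queue_do(). */
        static int get_queue_result(struct sem_queue *q)
        {
            int error = q->status;

            while (unlikely(error == IN_WAKEUP)) {
                cpu_relax();
                error = q->status;
            }
            return error;
        }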

    Reported-by: Yuriy Yevtukhov
    Reported-by: Harald Laabs
    Signed-off-by: Manfred Spraul
    Acked-by: Eric Dumazet
    Cc: [2.6.35+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

21 Jul, 2011

1 commit


31 Mar, 2011

1 commit


24 Mar, 2011

1 commit

  • CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
    because the resource comes from current's own ipc namespace.

    setuid/setgid are to uids in own namespace, so again checks can be against
    current_user_ns().
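
    A kernel-style sketch of the kind of check this enables (illustrative;
    ns_capable() is the real helper, the surrounding code is assumed):

        /* Check CAP_IPC_OWNER against the user namespace that owns the ipc
         * namespace the resource lives in, not the initial namespace. */
        if (ns_capable(ns->user_ns, CAP_IPC_OWNER))
            return 0;       /* privileged in the owning namespace: allow */

        return -EACCES;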

    Changelog:
    Jan 11: Use task_ns_capable() in place of sched_capable().
    Jan 11: Use nsown_capable() as suggested by Bastian Blank.
    Jan 11: Clarify (hopefully) some logic in futex and sched.c
    Feb 15: use ns_capable for ipc, not nsown_capable
    Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
    Feb 23: pass ns down rather than taking it from current

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

02 Oct, 2010

1 commit

  • The semctl syscall has several code paths that lead to the leakage of
    uninitialized kernel stack memory (namely the IPC_INFO, SEM_INFO,
    IPC_STAT, and SEM_STAT commands) during the use of the older, obsolete
    version of the semid_ds struct.

    The copy_semid_to_user() function declares a semid_ds struct on the stack
    and copies it back to the user without initializing or zeroing the
    "sem_base", "sem_pending", "sem_pending_last", and "undo" pointers,
    allowing the leakage of 16 bytes of kernel stack memory.

    The code is still reachable on 32-bit systems - when calling semctl(),
    newer glibc versions automatically OR the IPC command with the IPC_64
    flag, but invoking the syscall directly allows users to use the older
    versions of the struct.
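
    A sketch of the standard fix for this class of leak (illustrative; shown
    for the old-ABI case of copy_semid_to_user(), with the function's in/buf
    parameters assumed):

        struct semid_ds out;

        memset(&out, 0, sizeof(out));   /* no uninitialized stack bytes
                                           (pointers, padding) reach user
                                           space anymore */
        ipc64_perm_to_ipc_perm(&in->sem_perm, &out.sem_perm);
        out.sem_otime = in->sem_otime;
        out.sem_ctime = in->sem_ctime;
        out.sem_nsems = in->sem_nsems;

        return copy_to_user(buf, &out, sizeof(out));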

    Signed-off-by: Dan Rosenberg
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Rosenberg