14 Dec, 2014

1 commit

  • When I fixed bugs in the sem_lock() logic, I was more conservative than
    necessary. Therefore it is safe to replace the smp_mb() with smp_rmb().
    And: With smp_rmb(), semop() syscalls are up to 10% faster.

    The race we must protect against is:

    sem->lock is free
    sma->complex_count = 0
    sma->sem_perm.lock held by thread B

    thread A:

    A: spin_lock(&sem->lock)

    B: sma->complex_count++; (now 1)
    B: spin_unlock(&sma->sem_perm.lock);

    A: spin_is_locked(&sma->sem_perm.lock);
    A: XXXXX memory barrier
    A: if (sma->complex_count == 0)

    Thread A must read the increased complex_count value, i.e. the read must
    not be reordered with the read of sem_perm.lock done by spin_is_locked().

    Since it's about ordering of reads, smp_rmb() is sufficient.
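
    As a concrete illustration of the required ordering, here is a minimal
    userspace analog (not the kernel code) of the two-read check, written
    with C11 atomics; the struct and field names are hypothetical stand-ins
    for sma->sem_perm.lock and sma->complex_count, and the acquire fence
    plays the role of smp_rmb():

        #include <stdatomic.h>
        #include <stdbool.h>

        struct sem_array_analog {
                atomic_bool complex_lock_held;  /* stand-in for sem_perm.lock */
                atomic_int  complex_count;      /* stand-in for complex_count */
        };

        /* May the simple (per-semaphore) fast path be used? */
        static bool fast_path_allowed(struct sem_array_analog *sma)
        {
                /* Read 1: is the global lock free? (like spin_is_locked()) */
                if (atomic_load_explicit(&sma->complex_lock_held,
                                         memory_order_relaxed))
                        return false;

                /*
                 * Read 2 below must not be reordered before read 1:
                 * smp_rmb() in the kernel, an acquire fence here.
                 */
                atomic_thread_fence(memory_order_acquire);

                /* Read 2: are complex operations pending? */
                return atomic_load_explicit(&sma->complex_count,
                                            memory_order_relaxed) == 0;
        }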

    [akpm@linux-foundation.org: update sem_lock() comment, from Davidlohr]
    Signed-off-by: Manfred Spraul
    Reviewed-by: Davidlohr Bueso
    Acked-by: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

04 Dec, 2014

1 commit

  • ipc_addid() makes a new ipc identifier visible to everyone. New objects
    start as locked, so that the caller can complete the initialization
    after the call. Within struct sem_array, at least sma->sem_base and
    sma->sem_nsems are accessed without any locks, therefore this approach
    doesn't work.

    Thus: Move the ipc_addid() to the end of the initialization.
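
    As a rough sketch of the pattern (a hypothetical userspace analog, not
    the kernel code): every field that may be read without locks is filled
    in first, and publication into the shared id table is the very last
    step.

        #include <errno.h>
        #include <stdlib.h>

        struct obj {
                int  nsems;
                int *sem_base;
        };

        static struct obj *registry[16];          /* stand-in for the ipc id table */

        static int publish(struct obj *o)         /* stand-in for ipc_addid()      */
        {
                for (int id = 0; id < 16; id++) {
                        if (!registry[id]) {
                                registry[id] = o; /* now visible to everyone       */
                                return id;
                        }
                }
                return -ENOSPC;
        }

        static int new_object(int nsems)
        {
                struct obj *o = calloc(1, sizeof(*o) + (size_t)nsems * sizeof(int));

                if (!o)
                        return -ENOMEM;

                /* Fields read without locks must be valid before publication. */
                o->nsems = nsems;
                o->sem_base = (int *)(o + 1);

                return publish(o);                /* publish as the very last step */
        }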

    Signed-off-by: Manfred Spraul
    Reported-by: Rik van Riel
    Acked-by: Rik van Riel
    Acked-by: Davidlohr Bueso
    Acked-by: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

07 Jun, 2014

9 commits

  • The current Linux implementation of semctl(GETNCNT) and semctl(GETZCNT)
    has always (since 0.99.10) reported a thread as sleeping on all semaphores
    that are listed in its semop() call.

    The documented behavior (both in the Linux man page and in the Single
    Unix Specification) is that a task should be reported on exactly one
    semaphore: The semaphore that caused the thread to go to sleep.

    This patch adds a pr_info_once() that is triggered if a thread hits the
    relevant case.

    The check triggers slightly too often; making it exact would require
    replicating the old code. As there are no known users of GETNCNT or
    GETZCNT, the simpler check is used to avoid unnecessary bloat.

    The task that triggered the message is reported with its name (tsk->comm)
    and pid.

    Signed-off-by: Manfred Spraul
    Acked-by: Davidlohr Bueso
    Cc: Michael Kerrisk
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • SUSv4 clearly defines how semncnt and semzcnt must be calculated: A task
    waits on exactly one semaphore: The semaphore from the first operation
    in the sop array that cannot proceed.

    The Linux implementation never followed the standard; it tried to count
    all semaphores that might be the reason why a task sleeps.

    This patch fixes that.

    Note:
    a) The implementation assumes that GETNCNT and GETZCNT are rare operations,
    therefore the code counts them only on demand.
    (If they were not rare, the non-compliance would have been found
    earlier.)

    b) compared to the initial version of the patch, the BUG_ONs were removed
    and it was clarified that the new behavior conforms to SUS.
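
    To see the documented rule from userspace, here is a hedged sketch (not
    part of the patch): the child below blocks because of the first
    operation in its sop array (semaphore 0), so a SUS-conforming kernel
    reports it via GETNCNT only on semaphore 0, while the old code reported
    it on both semaphores.

        #include <stdio.h>
        #include <sys/ipc.h>
        #include <sys/sem.h>
        #include <sys/types.h>
        #include <unistd.h>

        int main(void)
        {
                /* Two semaphores; on Linux a new set starts with all values 0. */
                int semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
                struct sembuf sops[2] = {
                        { .sem_num = 0, .sem_op = -1, .sem_flg = 0 }, /* blocks */
                        { .sem_num = 1, .sem_op = -1, .sem_flg = 0 },
                };

                if (fork() == 0) {
                        semop(semid, sops, 2);    /* sleeps on semaphore 0 */
                        return 0;
                }

                sleep(1);                         /* let the child block   */

                printf("GETNCNT sem 0: %d\n", semctl(semid, 0, GETNCNT));
                printf("GETNCNT sem 1: %d\n", semctl(semid, 1, GETNCNT));

                semctl(semid, 0, IPC_RMID);       /* also wakes the child  */
                return 0;
        }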

    Back-compatibility concerns:

    Manfred:

    : - there is no application in Fedora that uses GETNCNT or GETZCNT.
    :
    : - applications that use only single-sop semop() are also safe; the
    : difference only affects complex apps.
    :
    : - portable applications are also safe; the new behavior is standard
    : compliant.
    :
    : But that's it. The old behavior existed in Linux from 0.99.something
    : until now.

    Michael:

    : * These operations seem to be very little used. Grepping the public
    : source contained on the Fedora 20 source DVD, there appear to be no
    : uses. Of course, this says nothing about uses in private /
    : non-mainstream FOSS code, but it seems likely that the same pattern
    : is followed there.
    :
    : * The existing behavior is hard enough to understand that I suspect
    : that no one understood it well enough to rely on it anyway
    : (especially as that behavior contradicted both man page and POSIX).
    :
    : So, there's a chance of breakage, but I estimate that it's minute.

    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Preparation for the next patch:

    In the slow path of perform_atomic_semop(), store a pointer to the
    operation that caused the task to block.

    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Right now, perform_atomic_semop() gets the contents of sem_queue as
    individual fields. Change that; instead, pass a pointer to sem_queue.

    This is a preparation for the next patch: it uses sem_queue to store the
    reason why a task must sleep.

    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • count_semzcnt and count_semncnt are more or less identical. The patch
    creates a single function that counts either the tasks waiting for zero
    or the tasks waiting because of a decrease operation.

    Compared to the initial version, the BUG_ONs were removed.

    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • GETZCNT is supposed to return the number of threads that wait until a
    semaphore value becomes 0.

    The current implementation overlooks complex operations that contain
    both a wait-for-zero operation and operations that alter at least one
    semaphore.

    The patch fixes that. It is intentionally copy&paste; this will be
    cleaned up in the next patch.

    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • trailing whitespace

    Signed-off-by: Paul McQuade
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul McQuade
     
  • Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
    Use #include <linux/types.h> instead of <asm/types.h>

    Signed-off-by: Paul McQuade
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul McQuade
     
  • There is no need to recreate the very same ipc_ops structure on every
    kernel entry for msgget/semget/shmget. Just declare it static and be
    done with it. While at it, constify it as we don't modify the structure
    at runtime.

    Found in the PaX patch, written by the PaX Team.
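
    A generic sketch of the pattern (hypothetical names, not the kernel's
    actual ipc_ops): instead of filling a structure of callbacks on the
    stack on every call, declare it once as static const.

        struct demo_params { int key; int flg; };

        struct demo_ops {
                int (*getnew)(const struct demo_params *p);
                int (*more_checks)(const struct demo_params *p);
        };

        static int demo_getnew(const struct demo_params *p)
        {
                return p->key;
        }

        int demo_get(const struct demo_params *p)
        {
                /* Built once, shared by all callers, read-only at runtime. */
                static const struct demo_ops ops = {
                        .getnew      = demo_getnew,
                        .more_checks = NULL,
                };

                return ops.getnew(p);
        }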

    Signed-off-by: Mathias Krause
    Cc: PaX Team
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathias Krause
     

28 Jan, 2014

6 commits

  • Deal with checkpatch messages:
    WARNING: braces {} are not necessary for single statement blocks

    Signed-off-by: Davidlohr Bueso
    Cc: Aswin Chandramouleeswaran
    Cc: Rik van Riel
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • IPC commenting style is all over the place, *especially* in util.c. This
    patch orders things a bit.

    Signed-off-by: Davidlohr Bueso
    Cc: Aswin Chandramouleeswaran
    Cc: Rik van Riel
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The ipc code does not adhere to the typical Linux coding style.
    This patch fixes lots of simple whitespace errors.

    - mostly autogenerated by
      scripts/checkpatch.pl -f --fix \
        --types=pointer_location,spacing,space_before_tab
    - one manual fixup (keep structure members tab-aligned)
    - removal of additional space_before_tab that were not found by --fix

    Tested with some of my msg and sem test apps.

    Andrew: Could you include it in -mm and move it towards Linus' tree?

    Signed-off-by: Manfred Spraul
    Suggested-by: Li Bin
    Cc: Joe Perches
    Acked-by: Rafael Aquini
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • struct kern_ipc_perm.deleted is meant to be used as a boolean toggle, and
    the changes introduced by this patch are just to make the case explicit.

    Signed-off-by: Rafael Aquini
    Reviewed-by: Rik van Riel
    Cc: Greg Thelen
    Acked-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • After the locking semantics for the SysV IPC API got improved, a couple
    of IPC_RMID race windows were opened because we ended up dropping the
    'kern_ipc_perm.deleted' check performed way down in ipc_lock(). The
    spotted races got sorted out by re-introducing the old test within the
    racy critical sections.

    This patch introduces ipc_valid_object() to consolidate the way we cope
    with IPC_RMID races by using the same abstraction across the API
    implementation.
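
    The helper itself boils down to a one-line check. A sketch with a
    stand-in type (in the kernel the field is kern_ipc_perm.deleted and the
    check is made while holding the object's lock):

        #include <stdbool.h>

        struct kern_ipc_perm_sketch {
                bool deleted;   /* set by IPC_RMID before the object goes away */
        };

        /* Is the object still valid, i.e. not hit by a racing IPC_RMID? */
        static inline bool ipc_valid_object(const struct kern_ipc_perm_sketch *perm)
        {
                return !perm->deleted;
        }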

    Signed-off-by: Rafael Aquini
    Acked-by: Rik van Riel
    Acked-by: Greg Thelen
    Reviewed-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • While trying to understand the semop code, I found a small mistake in the
    check for semadj (undo) value overflow. The new undo value is not stored
    immediately, so subsequent checks are done against the old value.

    The failing scenario is not very practical. A single semop call has to
    perform multiple operations on the same semaphore. Also, semval and
    semadj must have different values, so there have to be some operations
    without the SEM_UNDO flag. For example:

    struct sembuf depositor_op[1];
    struct sembuf collector_op[2];

    depositor_op[0].sem_num = 0;
    depositor_op[0].sem_op = 20000;
    depositor_op[0].sem_flg = 0;

    collector_op[0].sem_num = 0;
    collector_op[0].sem_op = -10000;
    collector_op[0].sem_flg = SEM_UNDO;
    collector_op[1].sem_num = 0;
    collector_op[1].sem_op = -10000;
    collector_op[1].sem_flg = SEM_UNDO;

    if (semop(semid, depositor_op, 1) == -1) {
            perror("Failed to do 1st deposit");
            return 1;
    }

    if (semop(semid, collector_op, 2) == -1) {
            perror("Failed to do 1st collect");
            return 1;
    }

    if (semop(semid, depositor_op, 1) == -1) {
            perror("Failed to do 2nd deposit");
            return 1;
    }

    if (semop(semid, collector_op, 2) == -1) {
            perror("Failed to do 2nd collect");
            return 1;
    }

    return 0;

    It passes without error now, but the semadj value has overflowed in the
    2nd collector operation.

    [akpm@linux-foundation.org: restore lessened scope of local `undo']
    [davidlohr@hp.com: correct header comment for perform_atomic_semop]
    Signed-off-by: Petr Mladek
    Acked-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Cc: Jiri Kosina
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     

17 Oct, 2013

1 commit

  • After acquiring the semlock spinlock, operations must test that the
    array is still valid.

    - semctl() and exit_sem() would walk stale linked lists (ugly, but
    should be ok: all lists are empty)

    - semtimedop() would sleep forever - and if woken up due to a signal -
    access memory after free.

    The patch also:
    - standardizes the tests for .deleted, so that all tests within one
      function leave the function in the same way.
    - unconditionally tests for .deleted immediately after every call to
      sem_lock - even if it means that for semctl(GETALL), .deleted will be
      tested twice.

    Both changes make the review simpler: After every sem_lock, there must
    be a test of .deleted, followed by a goto to the cleanup code (if the
    function uses "goto cleanup").

    The only exception is semctl_down(): If sem_ids().rwsem is locked, then
    the presence in ids->ipcs_idr is equivalent to !.deleted, thus no
    additional test is required.

    Signed-off-by: Manfred Spraul
    Cc: Mike Galbraith
    Acked-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

01 Oct, 2013

4 commits

  • In commit 0a2b9d4c7967 ("ipc/sem.c: move wake_up_process out of the
    spinlock section"), the update of the semaphore's sem_otime (last semop
    time) was moved to one central position (do_smart_update()).

    But since do_smart_update() is only called for operations that modify
    the array, this means that wait-for-zero semops do not update sem_otime
    anymore.

    The fix is simple:
    Non-alter operations must update sem_otime.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Reported-by: Jia He
    Tested-by: Jia He
    Cc: Davidlohr Bueso
    Cc: Mike Galbraith
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The proc interface is not aware of sem_lock(); it calls
    ipc_lock_object() directly. This means that simple semop() operations
    can run in parallel with the proc interface. Right now this is
    harmless, because the implementation doesn't do anything that requires
    proper synchronization.

    But it is dangerous and therefore should be fixed.

    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Operations that need access to the whole array must guarantee that there
    are no simple operations ongoing. Right now this is achieved by
    spin_unlock_wait(sem->lock) on all semaphores.

    If complex_count is nonzero, then this spin_unlock_wait() is not
    necessary, because it was already performed in the past by the thread
    that increased complex_count; and even though sem_perm.lock was dropped
    in between, no simple operation could have started, because simple
    operations cannot start while complex_count is non-zero.

    Signed-off-by: Manfred Spraul
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Reviewed-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The exclusion of complex operations in sem_lock() is insufficient: after
    acquiring the per-semaphore lock, a simple op must first check that
    sem_perm.lock is not locked and only after that test check
    complex_count. The current code does it the other way around - and that
    creates a race. Details are below.

    The patch is a complete rewrite of sem_lock(), based in part on the code
    from Mike Galbraith. It removes all gotos and all loops and thus the
    risk of livelocks.

    I have tested the patch (together with the next one) on my i3 laptop and
    it didn't cause any problems.

    The bug is probably also present in 3.10 and 3.11, but for these kernels
    it might be simpler just to move the test of sma->complex_count after
    the spin_is_locked() test.

    Details of the bug:

    Assume:
    - sma->complex_count = 0.
    - Thread 1: semtimedop(complex op that must sleep)
    - Thread 2: semtimedop(simple op).

    Pseudo-Trace:

    Thread 1: sem_lock(): acquire sem_perm.lock
    Thread 1: sem_lock(): check for ongoing simple ops
    Nothing ongoing, thread 2 is still before sem_lock().
    Thread 1: try_atomic_semop()
    <<< preempted.

    Thread 2: sem_lock():
        static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                   int nsops)
        {
            int locknum;
        again:
            if (nsops == 1 && !sma->complex_count) {
                struct sem *sem = sma->sem_base + sops->sem_num;

                /* Lock just the semaphore we are interested in. */
                spin_lock(&sem->lock);

                /*
                 * If sma->complex_count was set while we were spinning,
                 * we may need to look at things we did not lock here.
                 */
                if (unlikely(sma->complex_count)) {
                    spin_unlock(&sem->lock);
                    goto lock_array;
                }
    <<<<<<<<<
    <<< complex_count is still 0.
    <<<
    <<< Here it is preempted
    <<<<<<<<<

    Thread 1: try_atomic_semop() returns, notices that it must sleep.
    Thread 1: increases sma->complex_count.
    Thread 1: drops sem_perm.lock
    Thread 2:
                /*
                 * Another process is holding the global lock on the
                 * sem_array; we cannot enter our critical section,
                 * but have to wait for the global lock to be released.
                 */
                if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
                    spin_unlock(&sem->lock);
                    spin_unlock_wait(&sma->sem_perm.lock);
                    goto again;
                }
    <<< sem_perm.lock already dropped, thus no "goto again;"

                locknum = sops->sem_num;

    Signed-off-by: Manfred Spraul
    Cc: Mike Galbraith
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Cc: [3.10+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

25 Sep, 2013

1 commit

  • Currently, IPC mechanisms do security and auditing related checks under
    RCU. However, since security modules can free the security structure,
    for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
    race if the structure is freed before other tasks are done with it,
    creating a use-after-free condition. Manfred illustrates this nicely,
    for instance with shared mem and selinux:

    -> do_shmat calls rcu_read_lock()
    -> do_shmat calls shm_object_check().
    Checks that the object is still valid - but doesn't acquire any locks.
    Then it returns.
    -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
    -> selinux_shm_shmat calls ipc_has_perm()
    -> ipc_has_perm accesses ipc_perms->security

    shm_close()
    -> shm_close acquires rw_mutex & shm_lock
    -> shm_close calls shm_destroy
    -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
    -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
    -> ipc_free_security calls kfree(ipc_perms->security)

    This patch delays the freeing of the security structures after all RCU
    readers are done. Furthermore it aligns the security life cycle with
    that of the rest of IPC - freeing them based on the reference counter.
    For situations where we need not free security, the current behavior is
    kept. Linus states:

    "... the old behavior was suspect for another reason too: having the
    security blob go away from under a user sounds like it could cause
    various other problems anyway, so I think the old code was at least
    _prone_ to bugs even if it didn't have catastrophic behavior."

    I have tested this patch with IPC testcases from LTP on both my
    quad-core laptop and on a 64 core NUMA server. In both cases selinux is
    enabled, and tests pass for both voluntary and forced preemption models.
    While the mentioned races are theoretical (at least no one has reported
    them), I wanted to make sure that this new logic doesn't break anything
    we weren't aware of.

    Suggested-by: Linus Torvalds
    Signed-off-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

12 Sep, 2013

1 commit

  • Since in some situations the lock can be shared by readers, we shouldn't
    be calling it a mutex; rename it to rwsem.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

10 Jul, 2013

8 commits

  • Cleanup: Some minor points that I noticed while writing the previous
    patches

    1) The name try_atomic_semop() is misleading: The function performs the
    operation (if it is possible).

    2) Some documentation updates.

    No real code change, a rename and documentation changes.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • sem_otime contains the time of the last semaphore operation that
    completed successfully. Every operation updates this value, thus access
    from multiple cpus can cause thrashing.

    Therefore the patch replaces the variable with a per-semaphore variable.
    The per-array sem_otime is only calculated when required.

    No performance improvement on a single-socket i3 - only important for
    larger systems.
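
    A rough sketch of the idea (simplified userspace types, not the kernel
    structures): each semaphore keeps its own otime, and the array-wide
    value is computed only when somebody asks for it.

        #include <stddef.h>
        #include <time.h>

        struct sem_slot {
                int    semval;
                time_t sem_otime;       /* per-semaphore last-semop time */
        };

        struct sem_array_sketch {
                struct sem_slot *sem_base;
                size_t           nsems;
        };

        /* Computed on demand: the newest per-semaphore timestamp. */
        static time_t array_otime(const struct sem_array_sketch *sma)
        {
                time_t res = sma->sem_base[0].sem_otime;

                for (size_t i = 1; i < sma->nsems; i++)
                        if (sma->sem_base[i].sem_otime > res)
                                res = sma->sem_base[i].sem_otime;

                return res;
        }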

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • There are two places that can contain alter operations:
    - the global queue: sma->pending_alter
    - the per-semaphore queues: sma->sem_base[].pending_alter.

    Since one of the queues must be processed first, this causes an odd
    prioritization of the wakeups: complex operations have priority over
    simple ops.

    The patch restores the behavior of earlier kernels by using only one
    queue at a time:
    - if complex alter operations are present, sma->pending_alter is used;
    - otherwise, the per-semaphore queues are used.

    As a side effect, do_smart_update_queue() becomes much simpler: no more
    goto logic.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Introduce separate queues for operations that do not modify the
    semaphore values. Advantages:

    - Simpler logic in check_restart().
    - Faster update_queue(): Right now, all wait-for-zero operations are
    always tested, even if the semaphore value is not 0.
    - wait-for-zero operations get priority again, as in older kernels.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Now that each semaphore has its own spinlock and parallel operations are
    possible, give each semaphore its own cacheline.

    On an i3 laptop, this gives up to 28% better performance:

    #semscale 10 | grep "interleave 2"
    - before:
    Cpus 1, interleave 2 delay 0: 36109234 in 10 secs
    Cpus 2, interleave 2 delay 0: 55276317 in 10 secs
    Cpus 3, interleave 2 delay 0: 62411025 in 10 secs
    Cpus 4, interleave 2 delay 0: 81963928 in 10 secs

    - after:
    Cpus 1, interleave 2 delay 0: 35527306 in 10 secs
    Cpus 2, interleave 2 delay 0: 70922909 in 10 secs <<< + 28%
    Cpus 3, interleave 2 delay 0: 80518538 in 10 secs
    Cpus 4, interleave 2 delay 0: 89115148 in 10 secs <<< + 8.7%

    i3, with 2 cores and hyperthreading enabled. Interleave 2 is used in
    order to load the full cores first. HT partially hides the delay from
    cacheline thrashing, thus the improvement is "only" 8.7% when 4 threads
    are running.
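
    A minimal userspace illustration of the layout trick, assuming a 64-byte
    cache line (common on x86) and hypothetical types rather than the
    kernel's struct sem:

        #include <stdalign.h>
        #include <stdio.h>

        /* Each slot is aligned to (and padded out to) its own cache line. */
        struct padded_sem {
                alignas(64) int semval;
                int sempid;
        };

        int main(void)
        {
                struct padded_sem sems[4];

                /* Adjacent entries are 64 bytes apart, so they never share a line. */
                printf("sizeof(struct padded_sem) = %zu\n",
                       sizeof(struct padded_sem));
                printf("entry stride              = %zu\n",
                       (size_t)((char *)&sems[1] - (char *)&sems[0]));
                return 0;
        }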

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • We can now drop the msg_lock and msg_lock_check functions along with a
    bogus comment introduced previously in semctl_down.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This function currently acquires both the rw_mutex and the rcu lock on
    successful lookups, leaving the callers to explicitly unlock them,
    creating another two level locking situation.

    Make the callers (including those that still use ipcctl_pre_down())
    explicitly lock and unlock the rwsem and rcu lock.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

27 May, 2013

1 commit

  • do_smart_update_queue() is called when an operation (semop,
    semctl(SETVAL), semctl(SETALL), ...) modified the array. It must check
    which of the sleeping tasks can proceed.

    do_smart_update_queue() missed a few wakeups:
    - if a sleeping complex op was completed, then all per-semaphore queues
    must be scanned - not only those that were modified by *sops
    - if a sleeping simple op proceeded, then the global queue must be
    scanned again

    And:
    - the test for (sops == NULL) before scanning the global queue is not
    required: If the global queue is empty, then it doesn't need to be
    scanned - regardless of the reason for calling do_smart_update_queue()

    The patch is not optimized, i.e. even completing a wait-for-zero
    operation causes a rescan. This is done to keep the patch as simple as
    possible.

    Signed-off-by: Manfred Spraul
    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

10 May, 2013

2 commits

  • The semctl GETNCNT returns the number of tasks waiting for the value of
    the specified semaphore to increase. After commit 9f1bc2c9022c
    ("ipc,sem: have only one list in struct sem_queue"), the semops waiting
    on just one semaphore are waiting on that semaphore's list.

    In order to return the correct count, we have to walk that list too, in
    addition to the sem_array's list for complex operations.

    Signed-off-by: Rik van Riel
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • The semctl GETZCNT returns the number of semops waiting for the
    specified semaphore to become zero. After commit 9f1bc2c9022c
    ("ipc,sem: have only one list in struct sem_queue"), the semops waiting
    on just one semaphore are waiting on that semaphore's list.

    In order to return the correct count, we have to walk that list too, in
    addition to the sem_array's list for complex operations.

    This bug broke dbench; it works again with this patch applied.

    Signed-off-by: Rik van Riel
    Reported-by: Kent Overstreet
    Tested-by: Kent Overstreet
    Signed-off-by: Linus Torvalds

    Rik van Riel
     

05 May, 2013

5 commits

  • This trivially combines two rcu_read_lock() calls on both sides of an
    if-statement into a single one in front of the if-statement.

    Split out as an independent cleanup from the previous commit.

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • With various straight RCU lock/unlock movements, one common exit path
    pattern had become

    rcu_read_unlock();
    goto out_wakeup;

    and in fact there were no cases where we wanted to exit to out_wakeup
    _without_ releasing the RCU read lock.

    So replace that pattern with "goto out_rcu_wakeup", and remove the old
    out_wakeup.

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • sem_obtain_lock() was another of those functions that returned with the
    RCU lock held for reading in the success case. Move the RCU locking to
    the caller (semtimedop()), making it more obvious. We already did RCU
    locking elsewhere in that function.

    Side note: why does semtimedop() re-do the semaphore lookup after the
    sleep, rather than just getting a reference to the semaphore it already
    looked up originally?

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Fix another ipc locking buglet introduced by the scalability patches:
    when semctl_down() was changed to delay the semaphore locking, one error
    path for security_sem_semctl() went through the semaphore unlock logic
    even though the semaphore had never been locked.

    Introduced by commit 16df3674efe3 ("ipc,sem: do not hold ipc lock more
    than necessary")

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This is another ipc semaphore locking cleanup, trying to make the
    locking more straightforward. We move the rcu read locking into the
    callers of sem_lock_and_putref(), which in general means that we now
    mostly do the rcu_read_lock() and rcu_read_unlock() in the same
    function.

    Mostly. We still have the ipc_addid/newary/freeary mess, and things
    like ipcctl_pre_down_nolock().

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds