Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

17 Dec, 2014

1 commit

603ba7e41 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs pile #2 from Al Viro:
"Next pile (and there'll be one or two more).

The large piece in this one is getting rid of /proc/*/ns/* weirdness;
among other things, it allows to (finally) make nameidata completely
opaque outside of fs/namei.c, making for easier further cleanups in
there"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coda_venus_readdir(): use file_inode()
fs/namei.c: fold link_path_walk() call into path_init()
path_init(): don't bother with LOOKUP_PARENT in argument
fs/namei.c: new helper (path_cleanup())
path_init(): store the "base" pointer to file in nameidata itself
make default ->i_fop have ->open() fail with ENXIO
make nameidata completely opaque outside of fs/namei.c
kill proc_ns completely
take the targets of /proc/*/ns/* symlinks to separate fs
bury struct proc_ns in fs/proc
copy address of proc_ns_ops into ns_common
new helpers: ns_alloc_inum/ns_free_inum
make proc_ns_operations work with struct ns_common * instead of void *
switch the rest of proc_ns_operations to working with &...->ns
netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
common object embedded into various struct ....ns

Linus Torvalds
2014-12-17 07:53:03 +0800

14 Dec, 2014

4 commits

07a46ed27 shmdt: use i_size_read() instead of ->i_size

Andrew Morton noted

http://lkml.kernel.org/r/20141104142027.a7a0d010772d84560b445f59@linux-foundation.org

that the shmdt uses inode->i_size outside of i_mutex being held.
There is one more case in shm.c in shm_destroy(). This converts
both users over to use i_size_read().

Signed-off-by: Dave Hansen
Cc: Manfred Spraul
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Hansen
2014-12-14 04:42:52 +0800
d3c97900b ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments

This is a highly-contrived scenario. But, a single shmdt() call can be
induced into unmapping memory from multiple shm segments. Example code
is here:

http://www.sr71.net/~dave/intel/shmfun.c

The fix is pretty simple: Record the 'struct file' for the first VMA we
encounter and then stick to it. Decline to unmap anything not from the
same file and thus the same segment.

I found this by inspection and the odds of anyone hitting this in practice
are pretty darn small.

Lightly tested, but it's a pretty small patch.
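
The "record the first file and stick to it" rule can be sketched in userspace roughly as follows. This is only a minimal model, not the kernel code: struct mock_vma, mock_shmdt() and the integer file_id (standing in for the 'struct file' pointer) are hypothetical names, and the VMA array is assumed to be sorted by start address.

```c
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a mapped range plus its backing file. */
struct mock_vma {
    unsigned long start;
    unsigned long end;
    int file_id;   /* models vma->vm_file; assumed to be >= 0 */
    int mapped;    /* 1 while mapped, 0 once "unmapped" */
};

/*
 * Detach the segment attached at addr. The file of the first VMA found at
 * addr is recorded, and only VMAs backed by that same file are unmapped.
 * Returns the number of VMAs unmapped.
 */
int mock_shmdt(struct mock_vma *vmas, size_t n, unsigned long addr)
{
    int first_file = -1;   /* no segment chosen yet */
    int unmapped = 0;

    for (size_t i = 0; i < n; i++) {
        struct mock_vma *v = &vmas[i];

        if (!v->mapped || v->start < addr)
            continue;
        if (first_file < 0) {
            if (v->start != addr)
                break;              /* nothing is attached at addr */
            first_file = v->file_id;
        }
        if (v->file_id != first_file)
            continue;               /* different segment: leave it alone */
        v->mapped = 0;
        unmapped++;
    }
    return unmapped;
}
```

With two VMAs from one segment followed by a VMA from another segment, a call at the first address unmaps only the first two.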

Signed-off-by: Dave Hansen
Cc: Manfred Spraul
Reviewed-by: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Hansen
2014-12-14 04:42:52 +0800
0050ee059 ipc/msg: increase MSGMNI, remove scaling

SysV can be abused to allocate locked kernel memory. For most systems, a
small limit doesn't make sense, see the discussion with regards to SHMMAX.

Therefore: increase MSGMNI to the maximum supported.

And: If we ignore the risk of locking too much memory, then an automatic
scaling of MSGMNI doesn't make sense. Therefore the logic can be removed.

The code preserves auto_msgmni to avoid breaking any user space applications
that expect that the value exists.

Notes:
1) If an administrator must limit the memory allocations, then he can set
MSGMNI as necessary.

Or he can disable sysv entirely (as e.g. done by Android).

2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
to control latency vs. throughput:
If MSGMNB is large, then msgsnd() just returns and more messages can be queued
before a task switch to a task that calls msgrcv() is forced.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul
Cc: Davidlohr Bueso
Cc: Rafael Aquini
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-12-14 04:42:52 +0800
2e094abfd ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()

When I fixed bugs in the sem_lock() logic, I was more conservative than
necessary. Therefore it is safe to replace the smp_mb() with smp_rmb().
And: With smp_rmb(), semop() syscalls are up to 10% faster.

The race we must protect against is:

sem->lock is free
sma->complex_count = 0
sma->sem_perm.lock held by thread B

thread A:

A: spin_lock(&sem->lock)

B: sma->complex_count++; (now 1)
B: spin_unlock(&sma->sem_perm.lock);

A: spin_is_locked(&sma->sem_perm.lock);
A: XXXXX memory barrier
A: if (sma->complex_count == 0)

Thread A must read the increased complex_count value, i.e. the read must
not be reordered with the read of sem_perm.lock done by spin_is_locked().

Since it's about ordering of reads, smp_rmb() is sufficient.
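
As a rough userspace analogue of thread A's two reads, the sketch below uses C11 atomics. It rests on assumptions: struct mock_sem_array and mock_fast_path_ok() are hypothetical names, and an acquire fence is only the closest portable counterpart to a read barrier, not the kernel's smp_rmb() itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two fields thread A reads. */
struct mock_sem_array {
    atomic_bool global_lock_held;  /* models spin_is_locked(&sma->sem_perm.lock) */
    atomic_int complex_count;      /* models sma->complex_count */
};

/* Returns true when the per-semaphore fast path may be used. */
bool mock_fast_path_ok(struct mock_sem_array *sma)
{
    /* First read: is the global lock still held? */
    if (atomic_load_explicit(&sma->global_lock_held, memory_order_relaxed))
        return false;

    /* Read barrier: the load above stays ordered before the load below. */
    atomic_thread_fence(memory_order_acquire);

    /* Second read: must not be reordered before the lock check. */
    return atomic_load_explicit(&sma->complex_count,
                                memory_order_relaxed) == 0;
}
```

Only loads need ordering here, which is why a read-side barrier is enough in this model.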

[akpm@linux-foundation.org: update sem_lock() comment, from Davidlohr]
Signed-off-by: Manfred Spraul
Reviewed-by: Davidlohr Bueso
Acked-by: Rafael Aquini
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-12-14 04:42:52 +0800

11 Dec, 2014

2 commits

707c5960f Merge branch 'nsfs' into for-next

Al Viro
2014-12-11 10:31:59 +0800
cbfe0de30 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull VFS changes from Al Viro:
"First pile out of several (there _definitely_ will be more). Stuff in
this one:

- unification of d_splice_alias()/d_materialize_unique()

- iov_iter rewrite

- killing a bunch of ->f_path.dentry users (and f_dentry macro).

Getting that completed will make life much simpler for
unionmount/overlayfs, since then we'll be able to limit the places
sensitive to file _dentry_ to reasonably few. Which allows to have
file_inode(file) pointing to inode in a covered layer, with dentry
pointing to (negative) dentry in union one.

Still not complete, but much closer now.

- crapectomy in lustre (dead code removal, mostly)

- "let's make seq_printf return nothing" preparations

- assorted cleanups and fixes

There _definitely_ will be more piles"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
copy_from_iter_nocache()
new helper: iov_iter_kvec()
csum_and_copy_..._iter()
iov_iter.c: handle ITER_KVEC directly
iov_iter.c: convert copy_to_iter() to iterate_and_advance
iov_iter.c: convert copy_from_iter() to iterate_and_advance
iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
iov_iter.c: convert iov_iter_zero() to iterate_and_advance
iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
iov_iter.c: iterate_and_advance
iov_iter.c: macros for iterating over iov_iter
kill f_dentry macro
dcache: fix kmemcheck warning in switch_names
new helper: audit_file()
nfsd_vfs_write(): use file_inode()
ncpfs: use file_inode()
kill f_dentry uses
lockd: get rid of ->f_path.dentry->d_sb
...

Linus Torvalds
2014-12-11 08:10:49 +0800

05 Dec, 2014

5 commits

33c429405 copy address of proc_ns_ops into ns_common

Signed-off-by: Al Viro

Al Viro
2014-12-05 03:34:47 +0800
6344c433a new helpers: ns_alloc_inum/ns_free_inum

take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum()

Signed-off-by: Al Viro

Al Viro
2014-12-05 03:34:36 +0800
64964528b make proc_ns_operations work with struct ns_common * instead of void *

We can do that now. And kill ->inum(), while we are at it - all instances
are identical.

Signed-off-by: Al Viro

Al Viro
2014-12-05 03:34:17 +0800
3c0411846 switch the rest of proc_ns_operations to working with &...->ns

Signed-off-by: Al Viro

Al Viro
2014-12-05 03:34:11 +0800
435d5f4bb common object embedded into various struct ....ns

for now - just move corresponding ->proc_inum instances over there
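
The idea of embedding one common object in each namespace structure, so that operations can take a typed pointer instead of void *, might look like this in a simplified userspace form. All names ending in _sketch are hypothetical, and the real structures carry far more state than shown here.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the common object. */
struct ns_common_sketch {
    unsigned int inum;
};

/* Hypothetical namespace type that embeds the common object. */
struct net_sketch {
    int dev_count;
    struct ns_common_sketch ns;
};

/* Recover the containing structure from a pointer to its member. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An operation that works on the common type rather than on void *. */
unsigned int ns_inum_sketch(const struct ns_common_sketch *ns)
{
    return ns->inum;
}

/* Namespace-specific code converts back when it needs the full object. */
struct net_sketch *to_net_sketch(struct ns_common_sketch *ns)
{
    return container_of_sketch(ns, struct net_sketch, ns);
}
```

Callers pass &net->ns around, and only namespace-specific code converts back to the containing type.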

Acked-by: "Eric W. Biederman"
Signed-off-by: Al Viro

Al Viro
2014-12-05 03:31:00 +0800

04 Dec, 2014

1 commit

e8577d1f0 ipc/sem.c: fully initialize sem_array before making it visible

ipc_addid() makes a new ipc identifier visible to everyone. New objects
start as locked, so that the caller can complete the initialization
after the call. Within struct sem_array, at least sma->sem_base and
sma->sem_nsems are accessed without any locks, therefore this approach
doesn't work.

Thus: Move the ipc_addid() to the end of the initialization.
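
The "initialize everything, publish last" ordering can be illustrated with a small C11 sketch. It is only a model under stated assumptions: struct mock_sma, mock_table, mock_newary() and mock_lookup() are hypothetical, and a single atomic pointer slot stands in for the ipc id table.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct sem_array. */
struct mock_sma {
    int nsems;
    int *base;
};

/* Hypothetical one-slot table that makes an object visible to readers. */
static _Atomic(struct mock_sma *) mock_table[1];

/* Fill in every field first, then publish the object as the last step. */
void mock_newary(struct mock_sma *sma, int *storage, int nsems)
{
    sma->base = storage;
    sma->nsems = nsems;

    /* Release ordering: a reader that sees the pointer sees the fields too. */
    atomic_store_explicit(&mock_table[0], sma, memory_order_release);
}

/* Lockless reader: returns NULL until the object has been published. */
const struct mock_sma *mock_lookup(void)
{
    return atomic_load_explicit(&mock_table[0], memory_order_acquire);
}
```

A reader that obtains a non-NULL pointer from mock_lookup() can use base and nsems without taking any lock, which is the property the lockless accesses rely on.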

Signed-off-by: Manfred Spraul
Reported-by: Rik van Riel
Acked-by: Rik van Riel
Acked-by: Davidlohr Bueso
Acked-by: Rafael Aquini
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-12-04 01:36:03 +0800

20 Nov, 2014

1 commit

9f45f5bf3 new helper: audit_file()

... for situations when we don't have any candidate in pathnames - basically,
in descriptor-based syscalls.

[Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang]

Signed-off-by: Al Viro

Al Viro
2014-11-20 02:01:26 +0800

14 Oct, 2014

4 commits

0d5e75802 ipc: resolve shadow warnings

Resolve some shadow warnings produced in W=2 builds by changing the name
of some parameters and local variables. Change instances of "s64"
because that clashes with the well-known typedef. Also change a local
variable with the name "up" because that clashes with the name of the
"up" function for semaphores. These are hazards so eliminate the
hazards by renaming them.

Signed-off-by: Mark Rustad
Signed-off-by: Jeff Kirsher
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mark Rustad
2014-10-14 08:18:23 +0800
d66a0520c ipc/util.c: use __seq_open_private() instead of seq_open()

Using __seq_open_private() removes boilerplate code from
sysvipc_proc_open().

The resultant code is shorter and easier to follow.

However, please note that __seq_open_private() calls kzalloc() rather than
kmalloc() which may affect timing due to the memory initialisation
overhead.

Signed-off-by: Rob Jones
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rob Jones
2014-10-14 08:18:23 +0800
bf77b94c9 ipc/shm: kill the historical/wrong mm->start_stack check

do_shmat() is the only user of ->start_stack (proc just reports its
value), and this check looks ugly and wrong.

The reason for this check is not clear at all, and it wrongly assumes that
the stack can only grow down.

But the main problem is that in general mm->start_stack has nothing to do
with stack_vma->vm_start. Not only can the application switch to another
stack and even unmap this area, but setup_arg_pages() also expands the stack
without updating mm->start_stack during exec(). This means that in the
likely case "addr > start_stack - size - PAGE_SIZE * 5" is simply
impossible after find_vma_intersection() == F, or the stack can't grow
anyway because of RLIMIT_STACK.

Many thanks to Hugh for his explanations.

Signed-off-by: Oleg Nesterov
Acked-by: Hugh Dickins
Cc: Cyrill Gorcunov
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2014-10-14 08:18:23 +0800
1195d94e0 ipc: always handle a new value of auto_msgmni

proc_dointvec_minmax() returns zero if a new value has been set. So we
don't need to check that all characters have been handled.

Below you can find two examples. In the first, the new value has not been
handled properly.

$ strace ./a.out
open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
write(3, "0\n\0", 3) = 2
close(3) = 0
exit_group(0)
$ cat /sys/kernel/debug/tracing/trace

$ strace ./a.out
open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
write(3, "0\n", 2) = 2
close(3) = 0

$ cat /sys/kernel/debug/tracing/trace
a.out-697 [000] .... 3280.998235: unregister_ipcns_notifier
Cc: Mathias Krause
Cc: Manfred Spraul
Cc: Joe Perches
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Vagin
2014-10-14 08:18:22 +0800

08 Oct, 2014

1 commit

28596c972 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

Pull "trivial tree" updates from Jiri Kosina:
"Usual pile from trivial tree everyone is so eagerly waiting for"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
Remove MN10300_PROC_MN2WS0038
mei: fix comments
treewide: Fix typos in Kconfig
kprobes: update jprobe_example.c for do_fork() change
Documentation: change "&" to "and" in Documentation/applying-patches.txt
Documentation: remove obsolete pcmcia-cs from Changes
Documentation: update links in Changes
Documentation: Docbook: Fix generated DocBook/kernel-api.xml
score: Remove GENERIC_HAS_IOMAP
gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
tty: doc: Fix grammar in serial/tty
dma-debug: modify check_for_stack output
treewide: fix errors in printk
genirq: fix reference in devm_request_threaded_irq comment
treewide: fix synchronize_rcu() in comments
checkstack.pl: port to AArch64
doc: queue-sysfs: minor fixes
init/do_mounts: better syntax description
MIPS: fix comment spelling
powerpc/simpleboot: fix comment
...

Linus Torvalds
2014-10-08 09:16:26 +0800

09 Sep, 2014

1 commit

da3dae54e Documentation: Docbook: Fix generated DocBook/kernel-api.xml

This patch fixes spelling typos found in DocBook/kernel-api.xml.
Because the file is generated from the source comments,
I had to fix the comments in the source code.

Signed-off-by: Masanari Iida
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina

Masanari Iida
2014-09-09 16:34:56 +0800

10 Aug, 2014

1 commit

77e40aae7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull namespace updates from Eric Biederman:
"This is a bunch of small changes built against 3.16-rc6. The most
significant change for users is the first patch which makes setns
dramatically faster by removing unneeded rcu handling.

The next chunk of changes are so that "mount -o remount,.." will not
allow the user namespace root to drop flags on a mount set by the
system wide root. That is, this forces read-only mounts to stay read-only,
no-dev mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec
mounts to stay no-exec and it prevents unprivileged users from messing
with a mount's atime settings. I have included my test case as the
last patch in this series so people performing backports can verify
this change works correctly.

The next change fixes a bug in NFS that was discovered while auditing
nsproxy users for the first optimization. Today you can oops the
kernel by reading /proc/fs/nfsfs/{servers,volumes} if you are clever
with pid namespaces. I rebased and fixed the build of the
!CONFIG_NFS_FS case yesterday when a build bot caught my typo. Given
that no one to my knowledge bases anything on my tree fixing the typo
in place seems more responsible than requiring a typo-fix to be
backported as well.

The last change is a small semantic cleanup introducing
/proc/thread-self and pointing /proc/mounts and /proc/net at it. This
prevents several kinds of problematic corner cases. It is a
user-visible change so it has a minute chance of causing regressions
so the change to /proc/mounts and /proc/net are individual one line
commits that can be trivially reverted. Unfortunately I lost and
could not find the email of the original reporter so he is not
credited. From at least one perspective this change to /proc/net is a
regression fix to allow pthread /proc/net uses that were broken by
the introduction of the network namespace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts
proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net
proc: Implement /proc/thread-self to point at the directory of the current thread
proc: Have net show up under /proc//task/
NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes
mnt: Add tests for unprivileged remount cases that have found to be faulty
mnt: Change the default remount atime from relatime to the existing value
mnt: Correct permission checks in do_remount
mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
mnt: Only change user settable mount flags in remount
namespaces: Use task_lock and not rcu to protect nsproxy

Linus Torvalds
2014-08-10 08:10:41 +0800

09 Aug, 2014

2 commits

83293c0f5 shm: allow exit_shm in parallel if only marking orphans

If shm_rmid_force is not set (the default state), then the shmids are only
marked as orphaned, which does not require any add, delete, or locking of
the tree structure.

Separate the sysctl on and off case, and only obtain the read lock. The
newly added list head can be deleted under the read lock because we are
only called with current and will only change the semids allocated by this
task and not manipulate the list.

This commit assumes that up_read includes a sufficient memory barrier for
the writes to be seen by others that later obtain a write lock.

Signed-off-by: Milton Miller
Signed-off-by: Jack Miller
Cc: Davidlohr Bueso
Cc: Manfred Spraul
Cc: Anton Blanchard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jack Miller
2014-08-09 06:57:26 +0800
ab602f799 shm: make exit_shm work proportional to task activity

This is a small set of patches our team has had kicking around for a few
versions internally that fixes tasks getting hung on shm_exit when there
are many threads hammering it at once.

Anton wrote a simple test to cause the issue:

http://ozlabs.org/~anton/junkcode/bust_shm_exit.c

Before applying this patchset, this test code will cause either hanging
tracebacks or pthread out of memory errors.

After this patchset, it will still produce output like:

root@somehost:~# ./bust_shm_exit 1024 160
...
INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 116, t=2111 jiffies, g=241, c=240, q=7113)
INFO: Stall ended before state dump start
...

But the task will continue to run along happily, so we consider this an
improvement over hanging, even if it's a bit noisy.

This patch (of 3):

exit_shm obtains the ipc_ns shm rwsem for write and holds it while it
walks every shared memory segment in the namespace. Thus the amount of
work is related to the number of shm segments in the namespace not the
number of segments that might need to be cleaned.

In addition, this occurs after the task has been notified the thread has
exited, so the number of tasks waiting for the ns shm rwsem can grow
without bound until memory is exhausted.

Add a list to the task struct of all shmids allocated by this task. Init
the list head in copy_process. Use the ns->rwsem for locking. Add
segments after id is added, remove before removing from id.

On unshare of NEW_IPCNS orphan any ids as if the task had exited, similar
to handling of semaphore undo.

I chose a define for the init sequence since it's a simple list init,
otherwise it would require a function call to avoid include loops between
the semaphore code and the task struct. Converting the list_del to
list_del_init for the unshare cases would remove the exit followed by
init, but I left it to blow up if not inited.
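
The effect of the per-task list can be sketched as below: exit-time work scales with the segments a task created, not with every segment in the namespace. This is a simplified model under assumptions: struct mock_shm, struct mock_task and both functions are hypothetical, a singly linked list replaces the kernel's list_head, and the ns->rwsem locking is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for a shared memory segment. */
struct mock_shm {
    int id;
    int orphaned;
    struct mock_shm *next_in_task;   /* link in the creator's list */
};

/* Hypothetical stand-in for the task struct with its new list head. */
struct mock_task {
    struct mock_shm *shm_list;
};

/* Called when a task creates a segment. */
void mock_task_add_shm(struct mock_task *t, struct mock_shm *s)
{
    s->next_in_task = t->shm_list;
    t->shm_list = s;
}

/*
 * Called at task exit: visits only this task's segments and marks them
 * orphaned. Returns the number of segments visited.
 */
int mock_exit_shm(struct mock_task *t)
{
    int visited = 0;

    while (t->shm_list) {
        struct mock_shm *s = t->shm_list;

        t->shm_list = s->next_in_task;
        s->next_in_task = NULL;
        s->orphaned = 1;
        visited++;
    }
    return visited;
}
```

Segments created by other tasks are never touched by the exiting task in this model.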

Signed-off-by: Milton Miller
Signed-off-by: Jack Miller
Cc: Davidlohr Bueso
Cc: Manfred Spraul
Cc: Anton Blanchard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jack Miller
2014-08-09 06:57:26 +0800

30 Jul, 2014

1 commit

728dba3a3 namespaces: Use task_lock and not rcu to protect nsproxy

The synchronous synchronize_rcu in switch_task_namespaces makes setns
a sufficiently expensive system call that people have complained.

Upon inspection, nsproxy no longer needs rcu protection for remote reads.
Remote reads are rare. So optimize for same-process reads and writes
by switching to task_lock instead.

This yields a simpler to understand lock, and a faster setns system call.

In particular this fixes a performance regression observed
by Rafael David Tinoco.

This is effectively a revert of Pavel Emelyanov's commit
cf7b708c8d1d7a27736771bcf4c457b332b0f818 Make access to task's nsproxy lighter
from 2007. The race this originally fixed no longer exists as
do_notify_parent uses task_active_pid_ns(parent) instead of
parent->nsproxy.

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2014-07-30 09:08:50 +0800

07 Jun, 2014

16 commits

a5c5928b7 ipc: convert use of typedef ctl_table to struct ctl_table

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2014-06-07 07:08:16 +0800
9b44ee2ee ipc/sem.c: add a printk_once for semctl(GETNCNT/GETZCNT)

The actual Linux implementation for semctl(GETNCNT) and semctl(GETZCNT)
always (since 0.99.10) reported a thread as sleeping on all semaphores
that are listed in the semop() call.

The documented behavior (both in the Linux man page and in the Single
Unix Specification) is that a task should be reported on exactly one
semaphore: The semaphore that caused the thread to go to sleep.

This patch adds a pr_info_once() that is triggered if a thread hits the
relevant case.

The code triggers slightly too often, otherwise it would be necessary to
replicate the old code. As there are no known users of GETNCNT or
GETZCNT, this is done to prevent unnecessary bloat.

The task that triggered is reported with name (tsk->comm) and pid.

Signed-off-by: Manfred Spraul
Acked-by: Davidlohr Bueso
Cc: Michael Kerrisk
Cc: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-06-07 07:08:15 +0800
b220c57ae ipc/sem.c: make semctl(,,{GETNCNT,GETZCNT}) standard compliant

SUSv4 clearly defines how semncnt and semzcnt must be calculated: A task
waits on exactly one semaphore: The semaphore from the first operation
in the sop array that cannot proceed.

The Linux implementation never followed the standard, it tried to count
all semaphores that might be the reason why a task sleeps.

This patch fixes that.
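
To make the rule concrete, the sketch below finds the first operation in a sop array that cannot proceed, which is the one semaphore the task would be counted on. It is a simplified userspace model, not the code in ipc/sem.c: struct mock_sembuf and mock_first_blocking_op() are hypothetical names, and limits such as SEMVMX are ignored.

```c
#include <stddef.h>

/* Hypothetical, reduced form of struct sembuf. */
struct mock_sembuf {
    unsigned short sem_num;
    short sem_op;   /* < 0: decrease, 0: wait-for-zero, > 0: increase */
};

/*
 * Try to apply all operations in order. Returns the index of the first
 * operation that cannot proceed, or -1 if every operation was applied.
 * On a block, the operations applied so far are rolled back.
 */
int mock_first_blocking_op(int *semval, const struct mock_sembuf *sops,
                           size_t nsops)
{
    size_t i;
    int blocked = -1;

    for (i = 0; i < nsops; i++) {
        int *v = &semval[sops[i].sem_num];
        int would_block;

        if (sops[i].sem_op == 0)
            would_block = (*v != 0);              /* wait-for-zero */
        else
            would_block = (*v + sops[i].sem_op < 0);

        if (would_block) {
            blocked = (int)i;
            break;
        }
        *v += sops[i].sem_op;
    }

    if (blocked >= 0) {
        /* Undo the operations that were applied before the blocking one. */
        while (i-- > 0)
            semval[sops[i].sem_num] -= sops[i].sem_op;
    }
    return blocked;
}
```

For semaphore values {1, 0, 2} and the operations "decrease sem 0", "decrease sem 1", "wait for zero on sem 2", the function reports index 1. Under the standard-compliant rule the task counts toward semncnt of semaphore 1 only, even though the wait-for-zero on semaphore 2 could not have proceeded either.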

Note:
a) The implementation assumes that GETNCNT and GETZCNT are rare operations,
therefore the code counts them only on demand.
(If they were not rare, then the non-compliance would have
been found earlier)

b) compared to the initial version of the patch, the BUG_ONs were removed
and it was clarified that the new behavior conforms to SUS.

Back-compatibility concerns:

Manfred:

: - there is no application in Fedora that uses GETNCNT or GETZCNT.
:
: - applications that use only single-sop semop() are also safe, the
: difference only affects complex apps.
:
: - portable applications are also safe, the new behavior is standard
: compliant.
:
: But that's it. The old behavior existed in Linux from 0.99.something
: until now.

Michael:

: * These operations seem to be very little used. Grepping the public
: source that is contained in the Fedora 20 source DVD, there appear to be no
: uses. Of course, this says nothing about uses in private /
: non-mainstream FOSS code, but it seems likely that the same pattern
: is followed there.
:
: * The existing behavior is hard enough to understand that I suspect
: that no one understood it well enough to rely on it anyway
: (especially as that behavior contradicted both man page and POSIX).
:
: So, there's a chance of breakage, but I estimate that it's minute.

Signed-off-by: Manfred Spraul
Cc: Davidlohr Bueso
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-06-07 07:08:15 +0800
ed247b7ca ipc/sem.c: store which operation blocks in perform_atomic_semop()

Preparation for the next patch:

In the slow-path of perform_atomic_semop(), store a pointer to the
operation that caused the operation to block.

Signed-off-by: Manfred Spraul
Cc: Davidlohr Bueso
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-06-07 07:08:15 +0800
d198cd6d6 ipc/sem.c: change perform_atomic_semop parameters

Right now, perform_atomic_semop gets the content of sem_queue as
individual fields. Change that: instead, pass a pointer to sem_queue.

This is a preparation for the next patch: it uses sem_queue to store the
reason why a task must sleep.

Signed-off-by: Manfred Spraul
Cc: Davidlohr Bueso
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-06-07 07:08:15 +0800
2f2ed41dc ipc/sem.c: remove code duplication

count_semzcnt and count_semncnt are more or less identical. The patch
creates a single function that either counts the number of tasks waiting
for zero or waiting due to a decrease operation.

Compared to the initial version, the BUG_ONs were removed.

Signed-off-by: Manfred Spraul
Cc: Davidlohr Bueso
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-06-07 07:08:15 +0800
1994862dc ipc/sem.c: bugfix for semctl(,,GETZCNT)

GETZCNT is supposed to return the number of threads that wait until a
semaphore value becomes 0.

The current implementation overlooks complex operations that contain
both wait-for-zero operation and operations that alter at least one
semaphore.

The patch fixes that. It's intentionally copy&paste, this will be
cleaned up in the next patch.

Signed-off-by: Manfred Spraul
Cc: Davidlohr Bueso
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-06-07 07:08:15 +0800
4bb6657dd ipc,msg: document volatile r_msg

The need for volatile is not obvious, document it.

Signed-off-by: Davidlohr Bueso
Signed-off-by: Manfred Spraul
Cc: Aswin Chandramouleeswaran
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2014-06-07 07:08:15 +0800
3440a6bd1 ipc,msg: move some msgq ns code around

Nothing big and no logical changes, just get rid of some redundant
function declarations. Move msg_[init/exit]_ns down to the end of the
file.

Signed-off-by: Davidlohr Bueso
Signed-off-by: Manfred Spraul
Cc: Aswin Chandramouleeswaran
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2014-06-07 07:08:14 +0800
f75a2f358 ipc,msg: use current->state helpers

Call __set_current_state() instead of assigning the new state directly.

Signed-off-by: Davidlohr Bueso
Signed-off-by: Manfred Spraul
Cc: Aswin Chandramouleeswaran
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2014-06-07 07:08:14 +0800
1376327ce ipc/shm.c: check for integer overflow during shmget.

SHMMAX is the upper limit for the size of a shared memory segment, counted
in bytes. The actual allocation is that size, rounded up to the next full
page.

Add a check that prevents the creation of segments where the rounded up
size causes an integer overflow.
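
One way to express such a check in plain C is sketched below. It is an illustration under assumptions rather than the kernel's exact test: MOCK_PAGE_SIZE and mock_shm_numpages() are hypothetical, and a 4096-byte page is assumed.

```c
#include <limits.h>
#include <stdbool.h>

#define MOCK_PAGE_SIZE 4096UL   /* assumed page size for this sketch */

/*
 * Round size up to whole pages. Returns false, leaving *numpages
 * untouched, if the rounding would wrap around.
 */
bool mock_shm_numpages(unsigned long size, unsigned long *numpages)
{
    if (size > ULONG_MAX - (MOCK_PAGE_SIZE - 1))
        return false;   /* size + PAGE_SIZE - 1 would overflow */

    *numpages = (size + MOCK_PAGE_SIZE - 1) / MOCK_PAGE_SIZE;
    return true;
}
```

Without the guard, a request for ULONG_MAX bytes would round "up" to zero pages.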

Signed-off-by: Manfred Spraul
Acked-by: Davidlohr Bueso
Acked-by: KOSAKI Motohiro
Acked-by: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-06-07 07:08:14 +0800
09c6eb1f6 ipc/shm.c: check for overflows of shm_tot

shm_tot counts the total number of pages used by shm segments.

If SHMALL is ULONG_MAX (or nearly ULONG_MAX), then the number can
overflow. Subsequent calls to shmctl(,SHM_INFO,) would return wrong
values for shm_tot.

The patch adds a detection for overflows.
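
Overflow detection of this kind can be sketched with unsigned arithmetic, where wraparound is well defined in C. The helper name mock_shm_tot_exceeded() and its parameters are hypothetical; the sketch only illustrates the idea of checking for a wrapped sum in addition to the SHMALL limit.

```c
#include <stdbool.h>

/*
 * Returns true if accounting numpages more pages on top of tot would
 * either wrap around or exceed the shmall limit.
 */
bool mock_shm_tot_exceeded(unsigned long tot, unsigned long numpages,
                           unsigned long shmall)
{
    unsigned long sum = tot + numpages;   /* may wrap around */

    return sum < tot || sum > shmall;
}
```

The `sum < tot` test catches the wrapped case that a plain comparison against a limit of ULONG_MAX would miss.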

Signed-off-by: Manfred Spraul
Acked-by: Davidlohr Bueso
Acked-by: KOSAKI Motohiro
Acked-by: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-06-07 07:08:14 +0800
247a8ce82 ipc/shm.c: check for ulong overflows in shmat

The increase of SHMMAX/SHMALL is a 4 patch series.

The change itself is trivial, the only problems are integer overflows.
The overflows are not new, but if we make huge values the default, then
the code should be free from overflows.

SHMMAX:

- shmem_file_setup places a hard limit on the segment size:
MAX_LFS_FILESIZE.

On 32-bit, the limit is > 1 TB, i.e. 4 GB-1 byte segments are
possible. Rounded up to full pages the actual allocated size
is 0. --> must be fixed, patch 3

- shmat:
- find_vma_intersection does not handle overflows properly.
--> must be fixed, patch 1

- the rest is fine, do_mmap_pgoff limits mappings to TASK_SIZE
and checks for overflows (i.e.: map 2 GB, starting from
addr=2.5GB fails).

SHMALL:
- after creating 8192 segments size (1L< must be fixed, patch 2.

Userspace:
- Obviously, there could be overflows in userspace. There is nothing
we can do, only use values smaller than ULONG_MAX.
I ended with "ULONG_MAX - 1L<
Acked-by: Davidlohr Bueso
Acked-by: KOSAKI Motohiro
Acked-by: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-06-07 07:08:14 +0800
46c0a8ca3 ipc, kernel: clear whitespace

trailing whitespace

Signed-off-by: Paul McQuade
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul McQuade
2014-06-07 07:08:14 +0800
7153e4027 ipc, kernel: use Linux headers

Use #include instead of
Use #include instead of

Signed-off-by: Paul McQuade
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul McQuade
2014-06-07 07:08:14 +0800
eb66ec44f ipc: constify ipc_ops

There is no need to recreate the very same ipc_ops structure on every
kernel entry for msgget/semget/shmget. Just declare it static and be
done with it. While at it, constify it as we don't modify the structure
at runtime.
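
The pattern can be sketched in plain C as follows: one operations table with static storage, marked const, shared by every call. The names struct mock_ipc_ops, mock_newque(), mock_security() and mock_msgget() are hypothetical and much simpler than the kernel's ipc_ops.

```c
/* Hypothetical, reduced operations table. */
struct mock_ipc_ops {
    int (*getnew)(int key);
    int (*associate)(int id);
};

static int mock_newque(int key)
{
    return key + 1;   /* placeholder for "create a new queue" */
}

static int mock_security(int id)
{
    (void)id;
    return 0;         /* placeholder for "permission check passed" */
}

/* Built once at compile time and never modified afterwards. */
static const struct mock_ipc_ops mock_msg_ops = {
    .getnew = mock_newque,
    .associate = mock_security,
};

int mock_msgget(int key)
{
    /* No per-call setup: the shared const table is used directly. */
    return mock_msg_ops.getnew(key);
}
```

Because the table is const, the compiler can place it in read-only data, and an accidental write to it fails to compile.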

Found in the PaX patch, written by the PaX Team.

Signed-off-by: Mathias Krause
Cc: PaX Team
Cc: Davidlohr Bueso
Cc: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mathias Krause
2014-06-07 07:08:14 +0800