27 May, 2013

1 commit

  • do_smart_update_queue() is called when an operation (semop,
    semctl(SETVAL), semctl(SETALL), ...) modified the array. It must check
    which of the sleeping tasks can proceed.

    do_smart_update_queue() missed a few wakeups:
    - if a sleeping complex op was completed, then all per-semaphore queues
    must be scanned - not only those that were modified by *sops
    - if a sleeping simple op proceeded, then the global queue must be
    scanned again

    And:
    - the test for "|sops == NULL) before scanning the global queue is not
    required: If the global queue is empty, then it doesn't need to be
    scanned - regardless of the reason for calling do_smart_update_queue()

    The patch is not optimized, i.e. even completing a wait-for-zero
    operation causes a rescan. This is done to keep the patch as simple as
    possible.

    Signed-off-by: Manfred Spraul
    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

10 May, 2013

3 commits

  • Dave reported an oops triggered by trinity:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: newseg+0x10d/0x390
    PGD cf8c1067 PUD cf8c2067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    CPU: 2 PID: 7636 Comm: trinity-child2 Not tainted 3.9.0+ #67
    ...
    Call Trace:
    ipcget+0x182/0x380
    SyS_shmget+0x5a/0x60
    tracesys+0xdd/0xe2

    This bug was introduced by commit af73e4d9506d ("hugetlbfs: fix mmap
    failure in unaligned size request").

    Reported-by: Dave Jones
    Cc:
    Signed-off-by: Li Zefan
    Reviewed-by: Naoya Horiguchi
    Acked-by: Rik van Riel
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • The semctl GETNCNT returns the number of semops waiting for the
    specified semaphore to become nonzero. After commit 9f1bc2c9022c
    ("ipc,sem: have only one list in struct sem_queue"), the semops waiting
    on just one semaphore are waiting on that semaphore's list.

    In order to return the correct count, we have to walk that list too, in
    addition to the sem_array's list for complex operations.
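
    For reference, a minimal (hedged) userspace query of the count in
    question; semid and the semaphore index are placeholders.

        #include <stdio.h>
        #include <sys/ipc.h>
        #include <sys/sem.h>

        /* print the GETNCNT value for semaphore `semnum` of set `semid` */
        static void show_ncnt(int semid, int semnum)
        {
            int n = semctl(semid, semnum, GETNCNT);
            if (n < 0)
                perror("semctl(GETNCNT)");
            else
                printf("sem %d: GETNCNT = %d\n", semnum, n);
        }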

    Signed-off-by: Rik van Riel
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • The semctl GETZCNT returns the number of semops waiting for the
    specified semaphore to become zero. After commit 9f1bc2c9022c
    ("ipc,sem: have only one list in struct sem_queue"), the semops waiting
    on just one semaphore are waiting on that semaphore's list.

    In order to return the correct count, we have to walk that list too, in
    addition to the sem_array's list for complex operations.

    This bug broke dbench; it works again with this patch applied.
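
    For context, the operation being counted here is a wait-for-zero semop
    (sem_op == 0); a hedged fragment, with semid as a placeholder.

        #include <sys/ipc.h>
        #include <sys/sem.h>

        /* blocks until semaphore 2 of the set reaches zero; while it sleeps,
         * semctl(semid, 2, GETZCNT) should count this waiter */
        static int wait_for_zero(int semid)
        {
            struct sembuf op = { .sem_num = 2, .sem_op = 0, .sem_flg = 0 };
            return semop(semid, &op, 1);
        }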

    Signed-off-by: Rik van Riel
    Reported-by: Kent Overstreet
    Tested-by: Kent Overstreet
    Signed-off-by: Linus Torvalds

    Rik van Riel
     

08 May, 2013

1 commit

    The current kernel returns -EINVAL unless a given mmap length is
    "almost" hugepage aligned. This is because in sys_mmap_pgoff() the
    given length is passed to vm_mmap_pgoff() as-is, without being aligned
    to a hugepage boundary.

    This is a regression introduced in commit 40716e29243d ("hugetlbfs: fix
    alignment of huge page requests"), where the alignment code was pushed
    into hugetlb_file_setup() and the variable len on the caller side was
    not changed.

    To fix this, this patch partially reverts that commit and adds the
    alignment code on the caller side. It also introduces hstate_sizelog()
    in order to get the proper hstate for the specified hugepage size.

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881
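
    A hedged illustration of the affected request, assuming the default
    2 MiB huge page size and that huge pages have been reserved via
    vm.nr_hugepages: before this fix a 3 MiB MAP_HUGETLB request failed
    with -EINVAL; with the caller-side alignment it is rounded up to 4 MiB.

        #define _GNU_SOURCE
        #include <stdio.h>
        #include <sys/mman.h>

        int main(void)
        {
            size_t len = 3 * 1024 * 1024;   /* not a multiple of 2 MiB */
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (p == MAP_FAILED)
                perror("mmap(MAP_HUGETLB)");   /* EINVAL before the fix */
            /* the mapping (rounded up to 4 MiB) is torn down at exit */
            return 0;
        }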

    [akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Johannes Weiner
    Reported-by:
    Cc: Steven Truelove
    Cc: Jianguo Wu
    Cc: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

05 May, 2013

7 commits

    This trivially combines two rcu_read_lock() calls on both sides of an
    if-statement into a single one in front of the if-statement.

    Split out as an independent cleanup from the previous commit.
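
    The shape of the cleanup, sketched (condition and bodies are
    placeholders):

        /* before: both branches start by taking the RCU read lock */
        if (cond) {
            rcu_read_lock();
            /* ... */
        } else {
            rcu_read_lock();
            /* ... */
        }

        /* after: one rcu_read_lock() hoisted in front of the if */
        rcu_read_lock();
        if (cond) {
            /* ... */
        } else {
            /* ... */
        }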

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • With various straight RCU lock/unlock movements, one common exit path
    pattern had become

    rcu_read_unlock();
    goto out_wakeup;

    and in fact there were no cases where we wanted to exit to out_wakeup
    _without_ releasing the RCU read lock.

    So replace that pattern with "goto out_rcu_wakeup", and remove the old
    out_wakeup.

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • sem_obtain_lock() was another of those functions that returned with the
    RCU lock held for reading in the success case. Move the RCU locking to
    the caller (semtimedop()), making it more obvious. We already did RCU
    locking elsewhere in that function.

    Side note: why does semtimedop() re-do the semaphore lookup after the
    sleep, rather than just getting a reference to the semaphore it already
    looked up originally?

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Fix another ipc locking buglet introduced by the scalability patches:
    when semctl_down() was changed to delay the semaphore locking, one error
    path for security_sem_semctl() went through the semaphore unlock logic
    even though the semaphore had never been locked.

    Introduced by commit 16df3674efe3 ("ipc,sem: do not hold ipc lock more
    than necessary")

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This is another ipc semaphore locking cleanup, trying to make the
    locking more straightforward. We move the rcu read locking into the
    callers of sem_lock_and_putref(), which in general means that we now
    mostly do the rcu_read_lock() and rcu_read_unlock() in the same
    function.

    Mostly. We still have the ipc_addid/newary/freeary mess, and things
    like ipcctl_pre_down_nolock().

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • ipc_rcu_putref() uses atomics for the refcount, and the games to lock
    and unlock the semaphore just to try to keep the reference counting
    working are no longer useful.

    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
    The IPC locking is a mess, and sem_unlock() unlocks not only the
    semaphore spinlock, it also drops the rcu read lock. This is unlike
    sem_lock(), which just takes the spin-lock and expects the caller to
    hold the rcu read lock.

    This all makes things very hard to follow, and it's very confusing when
    you take the rcu read lock in one function, and then release it in
    another. And it has caused actual bugs: the sem_obtain_lock() function
    ended up dropping the RCU read lock twice in one error path, because it
    first did the sem_unlock(), and then did a rcu_read_unlock() to match
    the rcu_read_lock() it had done.

    This is just a totally mindless "remove rcu_read_unlock() from
    sem_unlock() and add it immediately after each caller" (except for the
    aforementioned bug where we did too many rcu_read_unlock(), and in
    find_alloc_undo() where we just got the rcu_read_lock() to correct for
    the fact that sem_unlock would immediately drop it again).

    We can (and should) clean things up further, but this fixes the bug with
    the minimal amount of subtlety.
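
    The resulting call-site pattern, sketched from the description above (a
    simplification, not verbatim kernel code):

        sem_unlock(sma, locknum);   /* now drops only the semaphore spinlock */
        rcu_read_unlock();          /* the RCU exit is explicit at each caller */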

    Reviewed-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

03 May, 2013

1 commit

  • We can step on WARN_ON_ONCE() in sem_getref() if a semaphore is removed
    just as we are about to call sem_getref() from semctl_main(); results
    are not pretty.

    We should fail with -EIDRM, same as if IPC_RMID happened while we'd been
    doing allocation there. This also expands sem_getref() at its only
    callsite (and fixes it there), while sem_getref_and_unlock() is simply
    killed off - it has no callers at all.

    Signed-off-by: Al Viro
    Acked-by: Davidlohr Bueso
    Signed-off-by: Linus Torvalds

    Al Viro
     

02 May, 2013

3 commits

  • Commit 32fcfd40715e ("make vfree() safe to call from interrupt
    contexts") made it safe to do vfree directly from the RCU callback,
    which allows us to simplify ipc/util.c a lot by getting rid of the
    differences between vmalloc/kmalloc memory.
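
    A minimal sketch of the pattern this enables (the struct and names are
    illustrative, not the actual ipc/util.c code): the RCU callback can now
    hand a vmalloc'ed object straight to vfree().

        struct ipc_blob {
            struct rcu_head rcu;
            /* ... payload ... */
        };

        static void ipc_blob_free_rcu(struct rcu_head *head)
        {
            /* safe even though RCU callbacks may run in softirq context */
            vfree(container_of(head, struct ipc_blob, rcu));
        }

        /* deferred free of a vmalloc'ed object: */
        call_rcu(&blob->rcu, ipc_blob_free_rcu);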

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
    Pull VFS updates from Al Viro:

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     
  • Split the proc namespace stuff out into linux/proc_ns.h.

    Signed-off-by: David Howells
    cc: netdev@vger.kernel.org
    cc: Serge E. Hallyn
    cc: Eric W. Biederman
    Signed-off-by: Al Viro

    David Howells
     

01 May, 2013

19 commits

    Trying to run an application which was trying to put data into half of
    memory using shmget(), we found that having a shmall value below
    8EiB-8TiB would prevent us from using anything more than 8TiB. Setting
    kernel.shmall greater than 8EiB-8TiB made the job work.

    The limit comes from the newseg() function: the page count is
    accumulated into ns->shm_tot, which at 8TiB reaches INT_MAX.

    ipc/shm.c:
        static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
        {
        ...
            int numpages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
        ...
            if (ns->shm_tot + numpages > ns->shm_ctlall)
                return -ENOSPC;
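
    For reference (assuming 4 KiB pages, i.e. PAGE_SHIFT = 12): INT_MAX + 1
    = 2^31 pages, and 2^31 pages * 2^12 bytes/page = 2^43 bytes = 8 TiB,
    which is exactly where an int-sized page count tops out.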

    [akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
    Signed-off-by: Robin Holt
    Reported-by: Alex Thorlton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
     
    The ipc/msg.c code does its list operations by hand and it open-codes
    the accesses, instead of using list_for_each_entry_[safe].
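
    For illustration, the idiomatic form the conversion moves toward (field
    and helper names follow ipc/msg.c, but this is a hedged sketch, not the
    actual diff):

        struct msg_msg *msg, *t;

        /* walk and delete every queued message safely */
        list_for_each_entry_safe(msg, t, &msq->q_messages, m_list) {
            list_del(&msg->m_list);
            free_msg(msg);
        }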

    Signed-off-by: Nikola Pajkovsky
    Cc: Stanislav Kinsbursky
    Cc: "Eric W. Biederman"
    Cc: Peter Hurley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikola Pajkovsky
     
  • Introduce finer grained locking for semtimedop, to handle the common case
    of a program wanting to manipulate one semaphore from an array with
    multiple semaphores.

    If the call is a semop manipulating just one semaphore in an array with
    multiple semaphores, only take the lock for that semaphore itself.

    If the call needs to manipulate multiple semaphores, or another caller is
    in a transaction that manipulates multiple semaphores, the sem_array lock
    is taken, as well as all the locks for the individual semaphores.
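
    Sketched in pseudo-C, the decision described above (a simplification of
    the real sem_lock() logic, not the actual code):

        if (nsops == 1 && !sma->complex_count) {
            /* simple op: take only the per-semaphore lock */
            spin_lock(&sma->sem_base[sops->sem_num].lock);
        } else {
            /* complex op, or a complex op already in flight: take the
             * sem_array lock and the individual semaphore locks */
            spin_lock(&sma->sem_perm.lock);
            /* ... per-semaphore locks as well ... */
        }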

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores, look like this:

    threads    vanilla    Davidlohr's    Davidlohr's +     Davidlohr's +
                          patches        rwlock patches    v3 patches
         10     610652         726325           1783589          2142206
         20     341570         365699           1520453          1977878
         30     288102         307037           1498167          2037995
         40     290714         305955           1612665          2256484
         50     288620         312890           1733453          2650292
         60     289987         306043           1649360          2388008
         70     291298         306347           1723167          2717486
         80     290948         305662           1729545          2763582
         90     290996         306680           1736021          2757524
        100     292243         306700           1773700          3059159

    [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
    [davidlohr.bueso@hp.com: make refcounter atomic]
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Jason Low
    Reviewed-by: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Emmanuel Benisty
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Having only one list in struct sem_queue, and only queueing simple
    semaphore operations on the list for the semaphore involved, allows us to
    introduce finer grained locking for semtimedop.

    Signed-off-by: Rik van Riel
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
    later that only locks the sem_array and does nothing else.

    Open code the locking from ipc_lock() in sem_obtain_lock() so we can
    introduce finer grained locking for the sem_array in the next patch.

    [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno out of sem_obtain_lock()]
    Signed-off-by: Rik van Riel
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Instead of holding the ipc lock for permissions and security checks, among
    others, only acquire it when necessary.

    Some numbers....

    1) With Rik's semop-multi.c microbenchmark we can see the following
    results:

    Baseline (3.9-rc1):
    cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
    total operations: 151452270, ops/sec 5048409

    + 59.40% a.out [kernel.kallsyms] [k] _raw_spin_lock
    + 6.14% a.out [kernel.kallsyms] [k] sys_semtimedop
    + 3.84% a.out [kernel.kallsyms] [k] avc_has_perm_flags
    + 3.64% a.out [kernel.kallsyms] [k] __audit_syscall_exit
    + 2.06% a.out [kernel.kallsyms] [k] copy_user_enhanced_fast_string
    + 1.86% a.out [kernel.kallsyms] [k] ipc_lock

    With this patchset:
    cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
    total operations: 273156400, ops/sec 9105213

    + 18.54% a.out [kernel.kallsyms] [k] _raw_spin_lock
    + 11.72% a.out [kernel.kallsyms] [k] sys_semtimedop
    + 7.70% a.out [kernel.kallsyms] [k] ipc_has_perm.isra.21
    + 6.58% a.out [kernel.kallsyms] [k] avc_has_perm_flags
    + 6.54% a.out [kernel.kallsyms] [k] __audit_syscall_exit
    + 4.71% a.out [kernel.kallsyms] [k] ipc_obtain_object_check

    2) While on an Oracle swingbench DSS (data mining) workload the
    improvements are not as exciting as with Rik's benchmark, we can see
    some positive numbers. For an 8 socket machine the following are the
    percentages of %sys time incurred in the ipc lock:

    Baseline (3.9-rc1):
    100 swingbench users: 8,74%
    400 swingbench users: 21,86%
    800 swingbench users: 84,35%

    With this patchset:
    100 swingbench users: 8,11%
    400 swingbench users: 19,93%
    800 swingbench users: 77,69%

    [riel@redhat.com: fix two locking bugs]
    [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Reviewed-by: Chegu Vinod
    Acked-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Jason Low
    Cc: Emmanuel Benisty
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Various forms of ipc use ipcctl_pre_down() to retrieve an ipc object and
    check permissions, mostly for IPC_RMID and IPC_SET commands.

    Introduce ipcctl_pre_down_nolock(), a lockless version of this function.
    The locking version is retained, yet modified to call the nolock version
    without affecting its semantics, thus transparent to all ipc callers.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Through ipc_lock() and therefore ipc_lock_check() we currently return the
    locked ipc object. This is not necessary for all situations and can,
    therefore, cause unnecessary ipc lock contention.

    Introduce analogous ipc_obtain_object() and ipc_obtain_object_check()
    functions that only lookup and return the ipc object.

    Both these functions must be called within the RCU read critical section.
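
    A hedged sketch of the intended calling convention (return handling
    simplified):

        rcu_read_lock();
        ipcp = ipc_obtain_object_check(ids, id);   /* lookup only, no ipc lock */
        if (IS_ERR(ipcp)) {
            rcu_read_unlock();
            return PTR_ERR(ipcp);
        }
        /* ... read-mostly checks done under RCU alone ... */
        rcu_read_unlock();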

    [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno from ipc_lock()]
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Reviewed-by: Chegu Vinod
    Acked-by: Michel Lespinasse
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This series makes the sysv semaphore code more scalable, by reducing the
    time the semaphore lock is held, and making the locking more scalable for
    semaphore arrays with multiple semaphores.

    The first four patches were written by Davidlohr Bueso, and reduce the
    hold time of the semaphore lock.

    The last three patches change the sysv semaphore code locking to be more
    fine grained, providing a performance boost when multiple semaphores in a
    semaphore array are being manipulated simultaneously.

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores, look like this:

    threads    vanilla    Davidlohr's    Davidlohr's +     Davidlohr's +
                          patches        rwlock patches    v3 patches
         10     610652         726325           1783589          2142206
         20     341570         365699           1520453          1977878
         30     288102         307037           1498167          2037995
         40     290714         305955           1612665          2256484
         50     288620         312890           1733453          2650292
         60     289987         306043           1649360          2388008
         70     291298         306347           1723167          2717486
         80     290948         305662           1729545          2763582
         90     290996         306680           1736021          2757524
        100     292243         306700           1773700          3059159

    This patch:

    There is no reason to be holding the ipc lock while reading ipcp->seq,
    hence remove misleading comment.

    Also simplify the return value for the function.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Signed-off-by: HoSung Jung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HoSung Jung
     
  • [fengguang.wu@intel.com: find_msg can be static]
    Signed-off-by: Peter Hurley
    Cc: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Teach the helper routines about MSG_COPY so that msgtyp is preserved as
    the message number to copy.

    The security functions affected by this change were audited and no
    additional changes are necessary.
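
    A hedged userspace illustration of the interface this preserves: with
    MSG_COPY (used by checkpoint/restore and only available with
    CONFIG_CHECKPOINT_RESTORE), msgtyp is the position of the message to
    copy rather than a type. MSG_COPY may not be exposed by libc headers,
    so it is defined here if missing; the buffer size is arbitrary.

        #include <sys/ipc.h>
        #include <sys/msg.h>

        #ifndef MSG_COPY
        #define MSG_COPY 040000
        #endif

        struct msgbuf_copy { long mtype; char mtext[8192]; };

        /* peek at message number `n` of queue `msqid` without dequeuing it */
        static ssize_t peek_msg(int msqid, long n, struct msgbuf_copy *out)
        {
            return msgrcv(msqid, out, sizeof(out->mtext), n,
                          MSG_COPY | IPC_NOWAIT);
        }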

    Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • In preparation for refactoring the queue scan into a separate
    function, relocate msg copying.

    Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Separating msg allocation enables single-block vmalloc
    allocation instead.

    Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Pull compat cleanup from Al Viro:
    "Mostly about syscall wrappers this time; there will be another pile
    with patches in the same general area from various people, but I'd
    rather push those after both that and vfs.git pile are in."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    syscalls.h: slightly reduce the jungles of macros
    get rid of union semop in sys_semctl(2) arguments
    make do_mremap() static
    sparc: no need to sign-extend in sync_file_range() wrapper
    ppc compat wrappers for add_key(2) and request_key(2) are pointless
    x86: trim sys_ia32.h
    x86: sys32_kill and sys32_mprotect are pointless
    get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
    merge compat sys_ipc instances
    consolidate compat lookup_dcookie()
    convert vmsplice to COMPAT_SYSCALL_DEFINE
    switch getrusage() to COMPAT_SYSCALL_DEFINE
    switch epoll_pwait to COMPAT_SYSCALL_DEFINE
    convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
    switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
    make SYSCALL_DEFINE-generated wrappers do asmlinkage_protect
    make HAVE_SYSCALL_WRAPPERS unconditional
    consolidate cond_syscall and SYSCALL_ALIAS declarations
    teach SYSCALL_DEFINE how to deal with long long/unsigned long long
    get rid of duplicate logics in __SC_....[1-6] definitions

    Linus Torvalds
     

10 Apr, 2013

1 commit

  • The only part of proc_dir_entry the code outside of fs/proc
    really cares about is PDE(inode)->data. Provide a helper
    for that; static inline for now, eventually will be moved
    to fs/proc, along with the knowledge of struct proc_dir_entry
    layout.
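
    For context, the helper in question is PDE_DATA(inode); a hedged sketch
    of how a proc open routine uses it instead of poking PDE(inode)->data
    directly (the myinfo_* names and the struct are hypothetical):

        static int myinfo_proc_show(struct seq_file *m, void *v)
        {
            struct my_device *dev = m->private;  /* passed via PDE_DATA() */
            seq_printf(m, "value=%d\n", dev->value);
            return 0;
        }

        static int myinfo_proc_open(struct inode *inode, struct file *file)
        {
            /* PDE_DATA() hides the proc_dir_entry layout from drivers */
            return single_open(file, myinfo_proc_show, PDE_DATA(inode));
        }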

    Signed-off-by: Al Viro

    Al Viro
     

03 Apr, 2013

1 commit

    Make sure that the msg pointer is set back to an error value if the
    MSG_COPY flag is set and the desired message to copy wasn't found. This
    guarantees that msg is either an error pointer or a copy address.

    Otherwise the last message in queue will be freed without unlinking from
    the queue (which leads to memory corruption) and the dummy allocated
    copy won't be released.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     

29 Mar, 2013

1 commit

  • Pull userns fixes from Eric W Biederman:
    "The bulk of the changes are fixing the worst consequences of the user
    namespace design oversight in not considering what happens when one
    namespace starts off as a clone of another namespace, as happens with
    the mount namespace.

    The rest of the changes are just plain bug fixes.

    Many thanks to Andy Lutomirski for pointing out many of these issues."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Restrict when proc and sysfs can be mounted
    ipc: Restrict mounting the mqueue filesystem
    vfs: Carefully propogate mounts across user namespaces
    vfs: Add a mount flag to lock read only bind mounts
    userns: Don't allow creation if the user is chrooted
    yama: Better permission check for ptraceme
    pid: Handle the exit of a multi-threaded init.
    scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.

    Linus Torvalds
     

27 Mar, 2013

1 commit

    Only allow mounting the mqueue filesystem if the caller has CAP_SYS_ADMIN
    rights over the ipc namespace. The principle here is that if you created
    it or have capabilities over it, you can mount it; otherwise you get to
    live with what other people have mounted.

    This information is not particularly sensitive, and mqueue essentially
    only reports which posix message queues exist. Still, when creating a
    restricted environment for an application to live in, any extra
    information may be of use to someone with sufficient creativity. The
    historical, if imperfect, way this information has been restricted has
    been not to allow mounts, and restricting this to ipc namespace creators
    maintains the spirit of the historical restriction.
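
    For reference, the operation being restricted, as a minimal C call (the
    target directory is just the conventional choice): after this change it
    succeeds only with CAP_SYS_ADMIN over the owning ipc namespace.

        #include <stdio.h>
        #include <sys/mount.h>

        int main(void)
        {
            if (mount("mqueue", "/dev/mqueue", "mqueue", 0, NULL) != 0)
                perror("mount(mqueue)");   /* EPERM without the capability */
            return 0;
        }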

    Cc: stable@vger.kernel.org
    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman