10 Jul, 2013

10 commits

  • We can now drop the msg_lock and msg_lock_check functions along with a
    bogus comment introduced previously in semctl_down.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • do_msgrcv() is the last msg queue function that abuses the ipc lock Take
    it only when needed when actually updating msq.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • do_msgsnd() is another function that does too many things with the ipc
    object lock acquired. Take it only when needed when actually updating
    msq.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • While the INFO cmd doesn't take the ipc lock, the STAT commands do
    acquire it unnecessarily. We can do the permissions and security checks
    only holding the rcu lock.

    This function now mimics semctl_nolock().

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Add msq_obtain_object() and msq_obtain_object_check(), which will allow
    us to get the ipc object without acquiring the lock. Just as with
    semaphores, these functions are basically wrappers around
    ipc_obtain_object*().

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Similar to semctl, when calling msgctl, the *_INFO and *_STAT commands
    can be performed without acquiring the ipc object.

    Add a msgctl_nolock() function and move the logic of *_INFO and *_STAT
    out of msgctl(). This change still takes the lock and it will be
    properly lockless in the next patch

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Instead of holding the ipc lock for the entire function, use the
    ipcctl_pre_down_nolock and only acquire the lock for specific commands:
    RMID and SET.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This function currently acquires both the rw_mutex and the rcu lock on
    successful lookups, leaving the callers to explicitly unlock them,
    creating another two level locking situation.

    Make the callers (including those that still use ipcctl_pre_down())
    explicitly lock and unlock the rwsem and rcu lock.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This patchset continues the work that began in the sysv ipc semaphore
    scaling series, see

    https://lkml.org/lkml/2013/3/20/546

    Just like semaphores used to be, sysv shared memory and msg queues also
    abuse the ipc lock, unnecessarily holding it for operations such as
    permission and security checks.

    This patchset mostly deals with mqueues, and while shared mem can be
    done in a very similar way, I want to get these patches out in the open
    first. It also does some pending cleanups, mostly focused on the two
    level locking we have in ipc code, taking care of ipc_addid() and
    ipcctl_pre_down_nolock() - yes there are still functions that need to be
    updated as well.

    This patch:

    Make all callers explicitly take and release the RCU read lock.

    This addresses the two level locking seen in newary(), newseg() and
    newqueue(). For the last two, explicitly unlock the ipc object and the
    rcu lock, instead of calling the custom shm_unlock and msg_unlock
    functions. The next patch will deal with the open coded locking for
    ->perm.lock

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

01 May, 2013

6 commits

  • The ipc/msg.c code does its list operations by hand and it open-codes the
    accesses, instead of using for_each_entry_[safe].

    Signed-off-by: Nikola Pajkovsky
    Cc: Stanislav Kinsbursky
    Cc: "Eric W. Biederman"
    Cc: Peter Hurley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikola Pajkovsky
     
  • Introduce finer grained locking for semtimedop, to handle the common case
    of a program wanting to manipulate one semaphore from an array with
    multiple semaphores.

    If the call is a semop manipulating just one semaphore in an array with
    multiple semaphores, only take the lock for that semaphore itself.

    If the call needs to manipulate multiple semaphores, or another caller is
    in a transaction that manipulates multiple semaphores, the sem_array lock
    is taken, as well as all the locks for the individual semaphores.

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores, look like this:

    vanilla Davidlohr's Davidlohr's + Davidlohr's +
    threads patches rwlock patches v3 patches
    10 610652 726325 1783589 2142206
    20 341570 365699 1520453 1977878
    30 288102 307037 1498167 2037995
    40 290714 305955 1612665 2256484
    50 288620 312890 1733453 2650292
    60 289987 306043 1649360 2388008
    70 291298 306347 1723167 2717486
    80 290948 305662 1729545 2763582
    90 290996 306680 1736021 2757524
    100 292243 306700 1773700 3059159

    [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
    [davidlohr.bueso@hp.com: make refcounter atomic]
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Jason Low
    Reviewed-by: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Emmanuel Benisty
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • [fengguang.wu@intel.com: find_msg can be static]
    Signed-off-by: Peter Hurley
    Cc: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Teach the helper routines about MSG_COPY so that msgtyp is preserved as
    the message number to copy.

    The security functions affected by this change were audited and no
    additional changes are necessary.

    Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • In preparation for refactoring the queue scan into a separate
    function, relocate msg copying.

    Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     

03 Apr, 2013

1 commit

  • Make sure that msg pointer is set back to error value in case of
    MSG_COPY flag is set and desired message to copy wasn't found. This
    garantees that msg is either a error pointer or a copy address.

    Otherwise the last message in queue will be freed without unlinking from
    the queue (which leads to memory corruption) and the dummy allocated
    copy won't be released.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     

09 Mar, 2013

1 commit

  • When MSG_COPY is set, a duplicate message must be allocated for the copy
    before locking the queue. However, the copy could not be larger than was
    sent which is limited to msg_ctlmax.

    Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     

05 Jan, 2013

7 commits

  • Signed-off-by: Stanislav Kinsbursky
    Cc: "Eric W. Biederman"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     
  • Remove the redundant and confusing fill_copy(). Also add copy_msg()
    check for error. In this case exit from the function have to be done
    instead of break, because further code interprets any error as EAGAIN.

    Also define copy_msg() for the case when CONFIG_CHECKPOINT_RESTORE is
    disabled.

    Signed-off-by: Stanislav Kinsbursky
    Cc: "Eric W. Biederman"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     
  • This code works if CONFIG_CHECKPOINT_RESTORE is disabled.

    [akpm@linux-foundation.org: remove __maybe_unused]
    Signed-off-by: Stanislav Kinsbursky
    Cc: "Eric W. Biederman"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     
  • Passing and checking of msgflg to free_copy() is redundant. This patch
    sets copy to NULL on declaration instead and checks for non-NULL in
    free_copy().

    Note: in case of copy allocation failure, error is returned immediately.
    So no need to check for IS_ERR() in free_copy().

    Signed-off-by: Stanislav Kinsbursky
    Cc: "Eric W. Biederman"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     
  • This patch is required for checkpoint/restore in userspace.

    c/r requires some way to get all pending IPC messages without deleting
    them from the queue (checkpoint can fail and in this case tasks will be
    resumed, so queue have to be valid).

    To achive this, new operation flag MSG_COPY for sys_msgrcv() system call
    was introduced. If this flag was specified, then mtype is interpreted as
    number of the message to copy.

    If MSG_COPY is set, then kernel will allocate dummy message with passed
    size, and then use new copy_msg() helper function to copy desired message
    (instead of unlinking it from the queue).

    Notes:

    1) Return -ENOSYS if MSG_COPY is specified, but
    CONFIG_CHECKPOINT_RESTORE is not set.

    Signed-off-by: Stanislav Kinsbursky
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     
  • Move all message related manipulation into one function msg_fill().
    Actually, two functions because of the compat one.

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Stanislav Kinsbursky
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     
  • This is a cleanup patch. The assignment is redundant.

    Signed-off-by: Stanislav Kinsbursky
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     

07 Sep, 2012

1 commit

  • - Store the ipc owner and creator with a kuid
    - Store the ipc group and the crators group with a kgid.
    - Add error handling to ipc_update_perms, allowing it to
    fail if the uids and gids can not be converted to kuids
    or kgids.
    - Modify the proc files to display the ipc creator and
    owner in the user namespace of the opener of the proc file.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

31 Mar, 2011

1 commit


24 Mar, 2011

1 commit

  • CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
    because the resource comes from current's own ipc namespace.

    setuid/setgid are to uids in own namespace, so again checks can be against
    current_user_ns().

    Changelog:
    Jan 11: Use task_ns_capable() in place of sched_capable().
    Jan 11: Use nsown_capable() as suggested by Bastian Blank.
    Jan 11: Clarify (hopefully) some logic in futex and sched.c
    Feb 15: use ns_capable for ipc, not nsown_capable
    Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
    Feb 23: pass ns down rather than taking it from current

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

25 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

16 Dec, 2009

1 commit

  • We have apparently had a memory leak since
    7ca7e564e049d8b350ec9d958ff25eaa24226352 "ipc: store ipcs into IDRs" in
    2007. The idr of which 3 exist for each ipc namespace is never freed.

    This patch simply frees them when the ipcns is freed. I don't believe any
    idr_remove() are done from rcu (and could therefore be delayed until after
    this idr_destroy()), so the patch should be safe. Some quick testing
    showed no harm, and the memory leak fixed.

    Caught by kmemleak.

    Signed-off-by: Serge E. Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

04 Dec, 2009

1 commit

  • Commit a0d092f introduced the following warning:
    ipc/msg.c: In function ?msgctl_down?:
    ipc/msg.c:415: warning: ?msqid64? may be used uninitialized in this function

    The gcc warning in this case is actually bogus, as msqid64 is touched only
    iff cmd == IPC_SET, and in such case, copy_msqid_from_user() initializes
    it properly.

    Signed-off-by: Felipe Contreras
    Signed-off-by: Jiri Kosina

    Felipe Contreras
     

14 Jan, 2009

1 commit


07 Jun, 2008

1 commit

  • When posting:
    [PATCH 1/8] Scaling msgmni to the amount of lowmem
    (see http://lkml.org/lkml/2008/2/11/171), I have added a KERN_INFO message
    that is output each time msgmni is recomputed.

    In http://lkml.org/lkml/2008/4/29/575 Tony Luck complained that this
    message references an ipc namespace address that is useless.

    I first thought of using an audit_log instead of a printk, as suggested by
    Serge Hallyn. But unfortunately, we do not have any other information
    than the namespace address to provide here too. So I chose to move the
    message and output it only at boot time, removing the reference to the
    namespace.

    Signed-off-by: Nadia Derbey
    Cc: Pierre Peiffer
    Cc: Manfred Spraul
    Acked-by: Tony Luck
    Cc: "Serge E. Hallyn"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

29 Apr, 2008

6 commits

  • Add definitions of USHORT_MAX and others into kernel. ipc uses it and slub
    implementation might also use it.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Zhang Yanmin
    Reviewed-by: Christoph Lameter
    Cc: Nadia Derbey
    Cc: "Pierre Peiffer"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang, Yanmin
     
  • semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set
    of commands for each kind of IPC. They all start to do the same job (they
    retrieve the ipc and do some permission checks) before handling the commands
    on their own.

    This patch proposes to consolidate this by moving these same pieces of code
    into one common function called ipcctl_pre_down().

    It simplifies a little these xxxctl_down() functions and increases a little
    the maintainability.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • The IPC_SET command performs the same permission setting for all IPCs. This
    patch introduces a common ipc_update_perm() function to update these
    permissions and makes use of it for all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • All IPCs make use of an intermetiate *_setbuf structure to handle the IPC_SET
    command. This is not really needed and, moreover, it complicates a little bit
    the code.

    This patch gets rid of the use of it and uses directly the semid64_ds/
    msgid64_ds/shmid64_ds structure.

    In addition of removing one struture declaration, it also simplifies and
    improves a little bit the common 64-bits path.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Currently, sys_msgctl is not easy to read.

    This patch tries to improve that by introducing the msgctl_down function to
    handle all commands requiring the rwmutex to be taken in write mode (ie
    IPC_SET and IPC_RMID for now). It is the equivalent function of semctl_down
    for message queues.

    This greatly changes the readability of sys_msgctl and also harmonizes the way
    these commands are handled among all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Introduce the registration of a callback routine that recomputes msg_ctlmni
    upon memory add / remove.

    A single notifier block is registered in the hotplug memory chain for all the
    ipc namespaces.

    Since the ipc namespaces are not linked together, they have their own
    notification chain: one notifier_block is defined per ipc namespace.

    Each time an ipc namespace is created (removed) it registers (unregisters) its
    notifier block in (from) the ipcns chain. The callback routine registered in
    the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event.
    Each callback routine registered in the ipcns namespace, in turn, recomputes
    msgmni for the owning namespace.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey