10 Jul, 2013

3 commits

  • Enforce that ipc_rcu_alloc returns a cacheline aligned pointer on SMP.

    Rationale:

    The SysV sem code tries to move the main spinlock into a separate
    cacheline (____cacheline_aligned_in_smp). This works only if
    ipc_rcu_alloc returns cacheline aligned pointers. vmalloc and kmalloc
    return cacheline aligned pointers, but the implementation of
    ipc_rcu_alloc breaks that.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
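
A minimal userspace sketch of the alignment problem described above, assuming a 64-byte cacheline. The names obj_alloc(), obj_free(), struct rcu_hdr and HDR_SPACE are hypothetical and are not taken from the patch. The idea shown is that a bookkeeping header placed in front of the object must itself occupy a whole number of cachelines, otherwise the pointer handed to the caller loses the alignment the allocator provided. This illustrates one way to keep the alignment; it is not necessarily how the kernel change does it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed cacheline size; the kernel uses a per-architecture value. */
#define CACHELINE 64

/* Hypothetical bookkeeping header stored in front of every object. */
struct rcu_hdr {
    int refcount;
};

/* Header space rounded up to a whole number of cachelines. */
#define HDR_SPACE \
    ((sizeof(struct rcu_hdr) + CACHELINE - 1) & ~((size_t)CACHELINE - 1))

/* Allocate `size` payload bytes; the returned pointer is cacheline aligned. */
void *obj_alloc(size_t size)
{
    size_t total = HDR_SPACE + size;
    unsigned char *base;

    /* aligned_alloc() expects the size to be a multiple of the alignment. */
    total = (total + CACHELINE - 1) & ~((size_t)CACHELINE - 1);
    base = aligned_alloc(CACHELINE, total);
    if (!base)
        return NULL;

    ((struct rcu_hdr *)base)->refcount = 1;
    /* Aligned base plus a multiple of CACHELINE stays aligned. */
    return base + HDR_SPACE;
}

/* Release an object obtained from obj_alloc(). */
void obj_free(void *ptr)
{
    if (ptr)
        free((unsigned char *)ptr - HDR_SPACE);
}

/* Helper for checking the property the commit message asks for. */
int is_cacheline_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % CACHELINE) == 0;
}
```

If HDR_SPACE were simply sizeof(struct rcu_hdr), the returned pointer would sit a few bytes past a cacheline boundary, which is the kind of breakage the rationale describes.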
     
  • This function currently acquires both the rw_mutex and the rcu lock on
    successful lookups, leaving the callers to explicitly unlock them,
    creating another two level locking situation.

    Make the callers (including those that still use ipcctl_pre_down())
    explicitly lock and unlock the rwsem and rcu lock.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This patchset continues the work that began in the sysv ipc semaphore
    scaling series, see

    https://lkml.org/lkml/2013/3/20/546

    Just like semaphores used to be, sysv shared memory and msg queues also
    abuse the ipc lock, unnecessarily holding it for operations such as
    permission and security checks.

    This patchset mostly deals with mqueues, and while shared mem can be
    done in a very similar way, I want to get these patches out in the open
    first. It also does some pending cleanups, mostly focused on the two
    level locking we have in ipc code, taking care of ipc_addid() and
    ipcctl_pre_down_nolock() - yes there are still functions that need to be
    updated as well.

    This patch:

    Make all callers explicitly take and release the RCU read lock.

    This addresses the two level locking seen in newary(), newseg() and
    newqueue(). For the last two, explicitly unlock the ipc object and the
    rcu lock, instead of calling the custom shm_unlock and msg_unlock
    functions. The next patch will deal with the open coded locking for
    ->perm.lock

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

02 May, 2013

2 commits

  • Commit 32fcfd40715e ("make vfree() safe to call from interrupt
    contexts") made it safe to do vfree directly from the RCU callback,
    which allows us to simplify ipc/util.c a lot by getting rid of the
    differences between vmalloc/kmalloc memory.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Pull VFS updates from Al Viro:

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

01 May, 2013

3 commits

  • Introduce finer grained locking for semtimedop, to handle the common case
    of a program wanting to manipulate one semaphore from an array with
    multiple semaphores.

    If the call is a semop manipulating just one semaphore in an array with
    multiple semaphores, only take the lock for that semaphore itself.

    If the call needs to manipulate multiple semaphores, or another caller is
    in a transaction that manipulates multiple semaphores, the sem_array lock
    is taken, as well as all the locks for the individual semaphores.

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores, look like this:

                vanilla     Davidlohr's   Davidlohr's +    Davidlohr's +
    threads                 patches       rwlock patches   v3 patches
    10          610652      726325        1783589          2142206
    20          341570      365699        1520453          1977878
    30          288102      307037        1498167          2037995
    40          290714      305955        1612665          2256484
    50          288620      312890        1733453          2650292
    60          289987      306043        1649360          2388008
    70          291298      306347        1723167          2717486
    80          290948      305662        1729545          2763582
    90          290996      306680        1736021          2757524
    100         292243      306700        1773700          3059159

    [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
    [davidlohr.bueso@hp.com: make refcounter atomic]
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Jason Low
    Reviewed-by: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Emmanuel Benisty
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
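
The locking rule described in this commit can be sketched in userspace with pthread mutexes. This is a simplified illustration under stated assumptions, not the kernel implementation: struct sem, struct sem_array, sem_lock() and sem_unlock() here are stand-ins with hypothetical signatures, NSEMS is an arbitrary size, and a multi-semaphore operation simply takes the array lock followed by every per-semaphore lock in index order.

```c
#include <pthread.h>

#define NSEMS 4   /* illustrative array size (assumption) */

struct sem {
    pthread_mutex_t lock;   /* per-semaphore lock */
    int val;
};

struct sem_array {
    pthread_mutex_t lock;   /* array-wide lock */
    struct sem sems[NSEMS];
};

void sem_array_init(struct sem_array *sma)
{
    pthread_mutex_init(&sma->lock, NULL);
    for (int i = 0; i < NSEMS; i++) {
        pthread_mutex_init(&sma->sems[i].lock, NULL);
        sma->sems[i].val = 0;
    }
}

/*
 * Take the locks needed for an operation touching `nops` semaphores.
 * Returns the index of the single semaphore that was locked, or -1 if
 * the array lock and all per-semaphore locks were taken.
 */
int sem_lock(struct sem_array *sma, const int *idx, int nops)
{
    if (nops == 1) {
        /* Common case: only the one semaphore is locked. */
        pthread_mutex_lock(&sma->sems[idx[0]].lock);
        return idx[0];
    }

    /* Complex case: array lock first, then every semaphore in order. */
    pthread_mutex_lock(&sma->lock);
    for (int i = 0; i < NSEMS; i++)
        pthread_mutex_lock(&sma->sems[i].lock);
    return -1;
}

/* Undo sem_lock(); `locknum` is the value sem_lock() returned. */
void sem_unlock(struct sem_array *sma, int locknum)
{
    if (locknum >= 0) {
        pthread_mutex_unlock(&sma->sems[locknum].lock);
        return;
    }
    for (int i = NSEMS - 1; i >= 0; i--)
        pthread_mutex_unlock(&sma->sems[i].lock);
    pthread_mutex_unlock(&sma->lock);
}
```

Because a multi-semaphore caller holds every per-semaphore lock, a concurrent single-semaphore caller blocks on its own semaphore's lock until the transaction finishes, while single-semaphore callers working on different semaphores do not contend with each other.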
     
  • Various forms of ipc use ipcctl_pre_down() to retrieve an ipc object and
    check permissions, mostly for IPC_RMID and IPC_SET commands.

    Introduce ipcctl_pre_down_nolock(), a lockless version of this function.
    The locking version is retained, yet modified to call the nolock version
    without affecting its semantics, thus transparent to all ipc callers.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Through ipc_lock() and therefore ipc_lock_check() we currently return the
    locked ipc object. This is not necessary for all situations and can,
    therefore, cause unnecessary ipc lock contention.

    Introduce analogous ipc_obtain_object() and ipc_obtain_object_check()
    functions that only look up and return the ipc object, without locking it.

    Both these functions must be called within the RCU read critical section.

    [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno from ipc_lock()]
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Reviewed-by: Chegu Vinod
    Acked-by: Michel Lespinasse
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
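
The split between "look up" and "look up and lock" can be sketched as follows. This is a userspace analogy with hypothetical names (obj_obtain(), obj_lock(), a fixed-size objs[] table) and pthread mutexes; it returns NULL on failure and leaves out the RCU read-side critical section that the commit message requires around the lookup.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_OBJS 4   /* illustrative table size (assumption) */

struct ipc_obj {
    pthread_mutex_t lock;
    int in_use;
    int mode;
};

struct ipc_obj objs[MAX_OBJS];

void objs_init(void)
{
    for (int i = 0; i < MAX_OBJS; i++) {
        pthread_mutex_init(&objs[i].lock, NULL);
        objs[i].in_use = 0;
        objs[i].mode = 0;
    }
}

/* Lookup only: returns the object without taking its lock. */
struct ipc_obj *obj_obtain(int id)
{
    if (id < 0 || id >= MAX_OBJS || !objs[id].in_use)
        return NULL;
    return &objs[id];
}

/* Lookup and lock, built on top of the lockless lookup. */
struct ipc_obj *obj_lock(int id)
{
    struct ipc_obj *obj = obj_obtain(id);

    if (obj)
        pthread_mutex_lock(&obj->lock);
    return obj;
}
```

A caller that only needs to run permission or security checks could use obj_obtain() and take the lock later, and only if it actually has to modify the object.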
     

30 Apr, 2013

1 commit


10 Apr, 2013

1 commit

  • The only part of proc_dir_entry the code outside of fs/proc
    really cares about is PDE(inode)->data. Provide a helper
    for that; static inline for now, eventually will be moved
    to fs/proc, along with the knowledge of struct proc_dir_entry
    layout.

    Signed-off-by: Al Viro

    Al Viro
     

28 Feb, 2013

1 commit

  • Convert to the much saner new idr interface.

    The new interface doesn't directly translate to the way idr_pre_get()
    was used around ipc_addid() as preloading disables preemption. From
    my cursory reading, it seems like we should be able to do all
    allocation from ipc_addid(), so I moved it there. Can you please
    check whether this would be okay? If this is wrong and ipc_addid()
    should be allowed to be called from non-sleepable context, I'd suggest
    allocating id itself in the outer functions and later install the
    pointer using idr_replace().

    Signed-off-by: Tejun Heo
    Reported-by: Sedat Dilek
    Tested-by: Sedat Dilek
    Cc: Stanislav Kinsbursky
    Cc: "Eric W. Biederman"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

05 Jan, 2013

1 commit

  • Add 3 new variables and sysctls to tune them (one "next_id" variable
    each for messages, semaphores and shared memory). This variable can be
    used to set the desired id for the next allocated IPC object. By default
    it is equal to -1 and the old behaviour is preserved. If this variable is
    non-negative, the desired id is extracted from it and used as the
    start value when searching for a free IDR slot.

    Notes:

    1) This patch doesn't guarantee that the new object will have the desired
    id, so it is up to user space how to handle a new object with the wrong id.

    2) After a successful id allocation attempt, "next_id" will be set back
    to -1 (if it was non-negative).

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Stanislav Kinsbursky
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
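
The behaviour described above can be illustrated with a small standalone sketch. The id table, MAX_IDS, find_free() and alloc_id() are hypothetical stand-ins for the IDR-based allocation and are not taken from the patch; only the "next_id" semantics (start value for the search, no guarantee, reset to -1 after a successful allocation) follow the commit message.

```c
#define MAX_IDS 8          /* illustrative id space (assumption) */

int used[MAX_IDS];         /* 1 if the slot is taken */
int next_id = -1;          /* -1 means "no preference" */

/* Return the first free slot at or above `start`, or -1 if none. */
static int find_free(int start)
{
    for (int i = start; i < MAX_IDS; i++)
        if (!used[i])
            return i;
    return -1;
}

/* Allocate an id, honouring next_id as a starting hint if it is set. */
int alloc_id(void)
{
    int start = next_id >= 0 ? next_id : 0;
    int id = find_free(start);

    if (id < 0)
        return -1;         /* nothing free from `start` upwards */

    used[id] = 1;
    if (next_id >= 0)
        next_id = -1;      /* the hint is consumed on success */
    return id;
}
```

Setting next_id to a slot that is already taken yields the next free slot above it, which is the "wrong id" case that note 1 leaves to user space.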
     

07 Sep, 2012

1 commit

  • - Store the ipc owner and creator with a kuid.
    - Store the ipc group and the creator's group with a kgid.
    - Add error handling to ipc_update_perms, allowing it to
      fail if the uids and gids can not be converted to kuids
      or kgids.
    - Modify the proc files to display the ipc creator and
      owner in the user namespace of the opener of the proc file.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

31 Jul, 2012

1 commit

  • Rather than #define the options manually in the architecture code, add
    Kconfig options for them and select them there instead. This also allows
    us to select the compat IPC version parsing automatically for platforms
    using the old compat IPC interface.

    Reported-by: Andrew Morton
    Signed-off-by: Will Deacon
    Cc: Arnd Bergmann
    Cc: Chris Metcalf
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     

21 Jul, 2011

1 commit


28 Mar, 2011

1 commit

  • Fix ipc/util.c kernel-doc warnings:

    Warning(ipc/util.c:336): No description found for parameter 'ns'
    Warning(ipc/util.c:620): No description found for parameter 'ns'
    Warning(ipc/util.c:790): No description found for parameter 'ns'

    Signed-off-by: Randy Dunlap
    Reviewed-by: Jesper Juhl
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

24 Mar, 2011

1 commit

  • CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
    because the resource comes from current's own ipc namespace.

    setuid/setgid are to uids in own namespace, so again checks can be against
    current_user_ns().

    Changelog:
    Jan 11: Use task_ns_capable() in place of sched_capable().
    Jan 11: Use nsown_capable() as suggested by Bastian Blank.
    Jan 11: Clarify (hopefully) some logic in futex and sched.c
    Feb 15: use ns_capable for ipc, not nsown_capable
    Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
    Feb 23: pass ns down rather than taking it from current

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

25 May, 2010

1 commit


23 Sep, 2009

1 commit

  • Make all seq_operations structs const, to help mitigate against
    revectoring user-triggerable function pointers.

    This is derived from the grsecurity patch, although generated from scratch
    because it's simpler than extracting the changes from there.

    Signed-off-by: James Morris
    Acked-by: Serge Hallyn
    Acked-by: Casey Schaufler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Morris
     

07 Apr, 2009

1 commit

  • Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
    The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
    posix message queue namespaces and the SYSV ipc namespaces.

    The sysctl code will be fixed separately in patch 3. After just this
    patch, making a change to posix mqueue tunables always changes the values
    in the initial ipc namespace.

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

05 Jan, 2009

2 commits


04 Dec, 2008

1 commit


20 Nov, 2008

1 commit

  • A problem was found while reviewing the code after Bugzilla bug
    http://bugzilla.kernel.org/show_bug.cgi?id=11796.

    In ipc_addid(), the newly allocated ipc structure is inserted into the
    ipcs tree (i.e. made visible to readers) without locking it. This is not
    correct since its initialization continues after it has been inserted in
    the tree.

    This patch moves the ipc structure lock initialization + locking before
    the actual insertion.

    Signed-off-by: Nadia Derbey
    Reported-by: Clement Calmels
    Cc: Manfred Spraul
    Cc: [2.6.27.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
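
The ordering this fix establishes (initialize and take the object's lock first, publish second) can be sketched in userspace as below. The table, obj_addid() and struct ipc_obj are hypothetical; writers to the table are assumed to be serialized by the caller, and pthread mutexes stand in for the kernel spinlock.

```c
#include <pthread.h>
#include <stddef.h>

#define TABLE_SIZE 8   /* illustrative size (assumption) */

struct ipc_obj {
    pthread_mutex_t lock;
    int key;           /* filled in by the caller after insertion */
};

/* Lookup table visible to readers; zero-initialized (all slots empty). */
struct ipc_obj *table[TABLE_SIZE];

/*
 * Insert `obj` into the table. On success the slot index is returned
 * and obj->lock is HELD, so a reader that finds the object must wait
 * until the caller has finished initializing it and unlocked it.
 * Returns -1 if the table is full.
 */
int obj_addid(struct ipc_obj *obj)
{
    /* Lock before publishing, never after. */
    pthread_mutex_init(&obj->lock, NULL);
    pthread_mutex_lock(&obj->lock);

    for (int i = 0; i < TABLE_SIZE; i++) {
        if (!table[i]) {
            table[i] = obj;
            return i;
        }
    }

    pthread_mutex_unlock(&obj->lock);
    return -1;
}
```

The caller completes the remaining initialization and only then unlocks the object, which closes the window in which a reader could see a half-initialized structure.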
     

14 Nov, 2008

1 commit

  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     

26 Jul, 2008

2 commits

  • Remove the ipc_lock_down() routines: they used to call idr_find() locklessly
    (given that the ipc ids lock was already held), so they are not needed
    anymore.

    Signed-off-by: Nadia Derbey
    Acked-by: "Paul E. McKenney"
    Cc: Manfred Spraul
    Cc: Jim Houston
    Cc: Pierre Peiffer
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Call idr_find() locklessly from ipc_lock(), since the idr tree is now RCU
    protected.

    Signed-off-by: Nadia Derbey
    Acked-by: "Paul E. McKenney"
    Cc: Manfred Spraul
    Cc: Jim Houston
    Cc: Pierre Peiffer
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

29 Apr, 2008

8 commits

  • Use proc_create_data() to make sure that ->proc_fops and ->data are set up
    before gluing the PDE to the main tree.

    Signed-off-by: Denis V. Lunev
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denis V. Lunev
     
  • Add definitions of USHORT_MAX and others into kernel. ipc uses it and slub
    implementation might also use it.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Zhang Yanmin
    Reviewed-by: Christoph Lameter
    Cc: Nadia Derbey
    Cc: "Pierre Peiffer"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang, Yanmin
     
  • semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set
    of commands for each kind of IPC. They all start by doing the same job (they
    retrieve the ipc and do some permission checks) before handling the commands
    on their own.

    This patch proposes to consolidate this by moving these common pieces of code
    into one common function called ipcctl_pre_down().

    This slightly simplifies the xxxctl_down() functions and makes them a little
    easier to maintain.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • The IPC_SET command performs the same permission setting for all IPCs. This
    patch introduces a common ipc_update_perm() function to update these
    permissions and makes use of it for all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Make the memory hotplug chain's mutex held for a shorter time: when memory is
    offlined or onlined a work item is added to the global workqueue. When the
    work item is run, it notifies the ipcns notifier chain with the
    IPCNS_MEMCHANGED event.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Introduce the registration of a callback routine that recomputes msg_ctlmni
    upon memory add / remove.

    A single notifier block is registered in the hotplug memory chain for all the
    ipc namespaces.

    Since the ipc namespaces are not linked together, they have their own
    notification chain: one notifier_block is defined per ipc namespace.

    Each time an ipc namespace is created (removed) it registers (unregisters) its
    notifier block in (from) the ipcns chain. The callback routine registered in
    the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event.
    Each callback routine registered in the ipcns namespace, in turn, recomputes
    msgmni for the owning namespace.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Since all the namespaces see the same amount of memory (the total one) this
    patch introduces a new variable that counts the ipc namespaces and divides
    msg_ctlmni by this counter.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
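
As a rough illustration of the arithmetic, the sketch below derives a per-namespace message queue limit from the amount of memory and the number of ipc namespaces. All constants (MSG_MEM_SCALE, MSGMNB, MSGMNI, IPCMNI) and the clamping are assumptions chosen for the example, and msgmni_for() is a hypothetical helper rather than the kernel function.

```c
/* Illustrative constants; treat every value here as an assumption. */
#define MSG_MEM_SCALE 32ULL      /* fraction of memory given to queues */
#define MSGMNB        16384ULL   /* default max bytes per queue */
#define MSGMNI        16ULL      /* lower bound for the limit */
#define IPCMNI        32768ULL   /* upper bound on ids overall */

/*
 * Compute the per-namespace queue limit for `lowmem_bytes` of memory
 * shared by `nb_ns` ipc namespaces (nb_ns must be at least 1).
 */
unsigned long long msgmni_for(unsigned long long lowmem_bytes, int nb_ns)
{
    unsigned long long allowed;

    /* Every namespace sees the same total memory ... */
    allowed = lowmem_bytes / MSG_MEM_SCALE / MSGMNB;
    /* ... so the result is divided by the namespace count. */
    allowed /= (unsigned long long)nb_ns;

    if (allowed < MSGMNI)
        return MSGMNI;
    if (allowed > IPCMNI / (unsigned long long)nb_ns)
        return IPCMNI / (unsigned long long)nb_ns;
    return allowed;
}
```

With these example constants, 1 GiB of memory gives 2048 queues for a single namespace and 1024 per namespace once a second one exists.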
     
  • Continuing to consolidate the IPC code a little, each id can be built
    directly in ipc_addid() instead of being built by each caller of
    ipc_addid().

    And I also remove shm_addid() in order to have, as much as possible, the
    same code for shm/sem/msg.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     

09 Feb, 2008

3 commits

  • Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (one
    each for msg, sem and shm; this structure is used to store all ipcs).
    These 'struct ipc_ids' are dynamically allocated for each ipc_namespace,
    as is the ipc_namespace itself (for the init namespace, they are
    initialized with pointers to static variables instead).

    This is so for historical reasons: before idr was used to store the
    ipcs, the ipcs were stored in tables of variable length, depending on the
    maximum number of ipcs allowed. Now, these 'struct ipc_ids' have a fixed
    size. As they are allocated in any case for each new ipc_namespace, there
    is no memory gain in having them allocated separately from the struct
    ipc_namespace.

    This patch proposes to make this table static in the struct ipc_namespace.
    Thus, we can allocate everything at once and get rid of all the code needed
    to allocate and free these ipc_ids separately.

    Signed-off-by: Pierre Peiffer
    Acked-by: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • ipc_lock_check_down(), ipc_lock_check() and ipcget() seem too large to be
    inline. Besides, inlining them brings no benefit, as they make function
    calls internally in any case.

    Moving them into ipc/util.c saves 500 bytes of vmlinux and shrinks the
    internal IPC API.

    $ ./scripts/bloat-o-meter vmlinux-orig vmlinux
    add/remove: 3/2 grow/shrink: 0/10 up/down: 490/-989 (-499)
    function                 old     new   delta
    ipcget                     -     392    +392
    ipc_lock_check_down        -      49     +49
    ipc_lock_check             -      49     +49
    sys_semget               119     105     -14
    sys_shmget               108      86     -22
    sys_msgget               100      78     -22
    do_msgsnd                665     631     -34
    do_msgrcv                680     644     -36
    do_shmat                 771     733     -38
    sys_msgctl              1302    1229     -73
    ipcget_new                80       -     -80
    sys_semtimedop          1534    1452     -82
    sys_semctl              2034    1922    -112
    sys_shmctl              1919    1765    -154
    ipcget_public            322       -    -322

    The ipcget() growth is the result of gcc inlining of currently static
    ipcget_new/_public.

    Signed-off-by: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Currently the IPC namespace management code is spread over the ipc/*.c files.
    I moved this code into the ipc/namespace.c file, which is compiled out when
    it is not needed.

    The linux/ipc_namespace.h file is used to store the prototypes of the
    functions in namespace.c and the stubs for the NAMESPACES=n case. This is
    done because the stub for copy_ipc_namespace requires knowledge of the
    CLONE_NEWIPC flag, which is in sched.h. But linux/ipc.h is itself included
    into a great many .c files via the sys.h->sem.h sequence, so adding sched.h
    to it would make all these .c files depend on sched.h, which is not good.
    On the other hand, knowledge of the namespaces code is required in only
    4 .c files.

    Besides, this patch compiles out some auxiliary functions from the ipc/sem.c,
    msg.c and shm.c files. It turned out that moving these functions into
    namespace.c is not that easy because they use many other calls and macros
    from the original file. Moving them would make this patch complicated. On
    the other hand all these functions can be consolidated, so I will send a
    separate patch doing this a bit later.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

07 Feb, 2008

1 commit


20 Oct, 2007

1 commit

  • With idr now used to store the ipcs, the case where the idr cache is
    empty when idr_get_new() is called (which may happen even if
    idr_pre_get() was called beforehand) is not handled well:
    semget()/shmget()/msgget() return ENOSPC when this cache is empty, which
    1. does not reflect the facts and 2. does not conform to the man pages.

    This patch fixes this by retrying the whole allocation process in this case.

    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
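
The retry described in this commit can be sketched with a small simulation. Everything here is hypothetical: cache_fill() and alloc_from_cache() only mimic the "preload, then allocate" pattern, with steal_budget simulating other users draining the preloaded cache between the two steps. The point shown is that an empty cache leads to another attempt rather than an ENOSPC-style error being returned to the caller.

```c
#include <errno.h>

int cache;          /* preloaded nodes currently available */
int steal_budget;   /* times a simulated concurrent user drains the cache */
int next_free_id;   /* next id the simulated allocator hands out */

/* Preload the cache; returns 0 on (simulated) out-of-memory, 1 otherwise. */
static int cache_fill(void)
{
    cache = 1;
    return 1;
}

/* Allocate from the cache; -EAGAIN means the cache was found empty. */
static int alloc_from_cache(int *id)
{
    if (steal_budget > 0) {
        steal_budget--;
        cache = 0;          /* someone else used the preloaded node */
    }
    if (cache == 0)
        return -EAGAIN;

    cache--;
    *id = next_free_id++;
    return 0;
}

/* Retry the whole preload + allocate sequence while the cache runs dry. */
int get_new_id(int *id)
{
    int err;

    do {
        if (!cache_fill())
            return -ENOMEM;
        err = alloc_from_cache(id);
    } while (err == -EAGAIN);

    return err;
}
```

A real implementation would typically bound the number of retries or depend on the preload step eventually failing with an out-of-memory error; the sketch omits that for brevity.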