23 Jan, 2016

1 commit

  • There are many locations that do

    if (memory_was_allocated_by_vmalloc)
            vfree(ptr);
    else
            kfree(ptr);

    but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
    using is_vmalloc_addr(). Unless callers have special reasons, we can
    replace this branch with kvfree(), as sketched after this entry. Please
    check and reply if you find problems.

    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Acked-by: Jan Kara
    Acked-by: Russell King
    Reviewed-by: Andreas Dilger
    Acked-by: "Rafael J. Wysocki"
    Acked-by: David Rientjes
    Cc: "Luck, Tony"
    Cc: Oleg Drokin
    Cc: Boris Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
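
    A minimal sketch of the conversion this commit describes; the wrapper
    function names are illustrative, only kvfree(), is_vmalloc_addr(), vfree()
    and kfree() come from the kernel API:

    #include <linux/mm.h>        /* kvfree(), is_vmalloc_addr() */
    #include <linux/slab.h>      /* kfree() */
    #include <linux/vmalloc.h>   /* vfree() */

    /* the open-coded pattern being removed */
    static void release_buf_old(void *ptr)
    {
            if (is_vmalloc_addr(ptr))
                    vfree(ptr);
            else
                    kfree(ptr);
    }

    /* its one-line replacement: kvfree() does the is_vmalloc_addr() test */
    static void release_buf_new(void *ptr)
    {
            kvfree(ptr);
    }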
     

01 Jul, 2015

1 commit


07 Jun, 2014

2 commits

  • Remove trailing whitespace.

    Signed-off-by: Paul McQuade
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul McQuade
     
  • There is no need to recreate the very same ipc_ops structure on every
    kernel entry for msgget/semget/shmget. Just declare it static and be
    done with it. While at it, constify it as we don't modify the structure
    at runtime.

    Found in the PaX patch, written by the PaX Team.

    Signed-off-by: Mathias Krause
    Cc: PaX Team
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathias Krause
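
    A sketch of what the change looks like for msgget(); struct ipc_ops and
    the msg helpers are named as in ipc/ of that era, but treat the exact
    fields and call shape as an assumption:

    SYSCALL_DEFINE2(msgget, key_t, key, int, msgflg)
    {
            struct ipc_namespace *ns = current->nsproxy->ipc_ns;
            struct ipc_params msg_params;

            /* before this patch, msg_ops lived on the stack and was rebuilt
             * on every call; now it is a single static, read-only instance */
            static const struct ipc_ops msg_ops = {
                    .getnew    = newque,
                    .associate = msg_security,
            };

            msg_params.key = key;
            msg_params.flg = msgflg;

            return ipcget(ns, &msg_ids(ns), &msg_ops, &msg_params);
    }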
     

28 Jan, 2014

4 commits

  • This field is only used to reset the ids seq number if it exceeds the
    smaller of INT_MAX/SEQ_MULTIPLIER and USHRT_MAX, and can therefore be
    moved out of the structure and into its own macro. Since each
    ipc_namespace contains a table of 3 pointers to struct ipc_ids we can
    save space in instruction text:

     text   data  bss    dec   hex  filename
    56232   2348   24  58604  e4ec  ipc/built-in.o
    56216   2348   24  58588  e4dc  ipc/built-in.o-after

    Signed-off-by: Davidlohr Bueso
    Reviewed-by: Jonathan Gonzalez
    Cc: Aswin Chandramouleeswaran
    Cc: Rik van Riel
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
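
    Roughly what the replacement looks like; the macro name follows the
    kernel of that era, so treat the exact spelling as an assumption:

    /* was: a per-ipc_ids seq_max field initialized at namespace creation */
    #define IPCID_SEQ_MAX min_t(int, INT_MAX / SEQ_MULTIPLIER, USHRT_MAX)

    /* in ipc_addid(), the wrap-around check then becomes: */
    ids->seq++;
    if (ids->seq > IPCID_SEQ_MAX)
            ids->seq = 0;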
     
  • The ipc code does not adhere to the typical Linux coding style.
    This patch fixes lots of simple whitespace errors.

    - mostly autogenerated by
      scripts/checkpatch.pl -f --fix \
          --types=pointer_location,spacing,space_before_tab
    - one manual fixup (keep structure members tab-aligned)
    - removal of additional space_before_tab instances that were not found
      by --fix

    Tested with some of my msg and sem test apps.

    Andrew: Could you include it in -mm and move it towards Linus' tree?

    Signed-off-by: Manfred Spraul
    Suggested-by: Li Bin
    Cc: Joe Perches
    Acked-by: Rafael Aquini
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • struct kern_ipc_perm.deleted is meant to be used as a boolean toggle, and
    the changes introduced by this patch are just to make the case explicit.

    Signed-off-by: Rafael Aquini
    Reviewed-by: Rik van Riel
    Cc: Greg Thelen
    Acked-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • After the locking semantics for the SysV IPC API got improved, a couple
    of IPC_RMID race windows were opened because we ended up dropping the
    'kern_ipc_perm.deleted' check performed way down in ipc_lock(). The
    spotted races got sorted out by re-introducing the old test within the
    racy critical sections.

    This patch introduces ipc_valid_object() to consolidate the way we cope
    with IPC_RMID races by using the same abstraction across the API
    implementation.

    Signed-off-by: Rafael Aquini
    Acked-by: Rik van Riel
    Acked-by: Greg Thelen
    Reviewed-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
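
    The helper is essentially a one-liner over the 'deleted' flag, and callers
    re-check it inside their critical sections; a sketch (the example caller
    is hypothetical):

    static inline bool ipc_valid_object(struct kern_ipc_perm *perm)
    {
            return !perm->deleted;
    }

    /* sketch of a caller re-checking liveness inside its critical section */
    static int example_sem_update(struct sem_array *sma)
    {
            int err = 0;

            ipc_lock_object(&sma->sem_perm);
            if (!ipc_valid_object(&sma->sem_perm)) {
                    err = -EIDRM;           /* raced with IPC_RMID */
                    goto out_unlock;
            }
            /* ... perform the update ... */
    out_unlock:
            ipc_unlock_object(&sma->sem_perm);
            return err;
    }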
     

13 Nov, 2013

1 commit

  • On 64-bit systems the test for negative message sizes is bogus, as the
    size, which may be positive when evaluated as a long, will get truncated
    to an int when passed to load_msg(). So a long might very well contain a
    positive value, but once truncated to an int it becomes negative.

    That, in combination with a small negative value of msg_ctlmax (which will
    be promoted to an unsigned type for the comparison against msgsz, making
    it a big positive value and therefore letting it pass the check), will
    lead to two problems:

    1/ The kmalloc() call in alloc_msg() will allocate a too small buffer, as
       the addition of alen is effectively a subtraction.

    2/ The copy_from_user() call in load_msg() will first overflow the buffer
       with userland data and then, when the userland access generates an
       access violation, the fixup handler copy_user_handle_tail() will try
       to fill the remainder with zeros -- roughly 4GB. That almost instantly
       results in a system crash or reset.

    ,-[ Reproducer (needs to be run as root) ]--
    | #include <fcntl.h>
    | #include <unistd.h>
    | #include <sys/ipc.h>
    | #include <sys/msg.h>
    |
    | int main(void) {
    |     long msg = 1;
    |     int fd;
    |
    |     fd = open("/proc/sys/kernel/msgmax", O_WRONLY);
    |     write(fd, "-1", 2);
    |     close(fd);
    |
    |     msgsnd(0, &msg, 0xfffffff0, IPC_NOWAIT);
    |
    |     return 0;
    | }
    '---

    Fix the issue by preventing msgsz from getting truncated by consistently
    using size_t for the message length. This way the size checks in
    do_msgsnd() could still be passed with a negative value for msg_ctlmax but
    we would fail on the buffer allocation in that case and error out.

    Also change the type of m_ts from int to size_t to avoid similar nastiness
    in other code paths -- it is used in similar constructs, i.e. signed vs.
    unsigned checks. It should never become negative under normal
    circumstances, though.

    Setting msg_ctlmax to a negative value is an odd configuration and should
    be prevented. As that might break existing userland, it will be handled
    in a separate commit so it could easily be reverted and reworked without
    reintroducing the above described bug.

    Hardening mechanisms for user copy operations would have caught that bug
    early -- e.g. checking slab object sizes on user copy operations as the
    usercopy feature of the PaX patch does. Or, for that matter, detecting the
    long vs. int sign change due to truncation, as the size overflow plugin
    of the very same patch does.

    [akpm@linux-foundation.org: fix i386 min() warnings]
    Signed-off-by: Mathias Krause
    Cc: Pax Team
    Cc: Davidlohr Bueso
    Cc: Brad Spengler
    Cc: Manfred Spraul
    Cc: [ v2.3.27+ -- yes, that old ;) ]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathias Krause
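
    A standalone illustration of the truncation at the heart of the bug,
    assuming an LP64 system (64-bit long, 32-bit int) and the usual
    two's-complement conversion behavior:

    #include <stdio.h>

    int main(void)
    {
            long msgsz = 0xfffffff0;     /* positive as a 64-bit long */
            int  alen  = (int)msgsz;     /* what the int-based code saw */

            printf("as long: %ld, as int: %d\n", msgsz, alen);
            /* prints "as long: 4294967280, as int: -16" on common
             * compilers; that -16 then wrecks the size arithmetic
             * described above */
            return 0;
    }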
     

25 Sep, 2013

1 commit

  • Currently, IPC mechanisms do security and auditing related checks under
    RCU. However, since security modules can free the security structure,
    for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
    race if the structure is freed before other tasks are done with it,
    creating a use-after-free condition. Manfred illustrates this nicely,
    for instance with shared mem and selinux:

    -> do_shmat calls rcu_read_lock()
    -> do_shmat calls shm_object_check().
       Checks that the object is still valid - but doesn't acquire any locks.
       Then it returns.
    -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
    -> selinux_shm_shmat calls ipc_has_perm()
    -> ipc_has_perm accesses ipc_perms->security

    shm_close()
    -> shm_close acquires rw_mutex & shm_lock
    -> shm_close calls shm_destroy
    -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
    -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
    -> ipc_free_security calls kfree(ipc_perms->security)

    This patch delays the freeing of the security structures until all RCU
    readers are done. Furthermore it aligns the security life cycle with
    that of the rest of IPC - freeing them based on the reference counter.
    For situations where we need not free security, the current behavior is
    kept. Linus states:

    "... the old behavior was suspect for another reason too: having the
    security blob go away from under a user sounds like it could cause
    various other problems anyway, so I think the old code was at least
    _prone_ to bugs even if it didn't have catastrophic behavior."

    I have tested this patch with IPC testcases from LTP on both my
    quad-core laptop and on a 64 core NUMA server. In both cases selinux is
    enabled, and tests pass for both voluntary and forced preemption models.
    While the mentioned races are theoretical (at least no one has reported
    them), I wanted to make sure that this new logic doesn't break anything
    we weren't aware of.

    Suggested-by: Linus Torvalds
    Signed-off-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
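
    A sketch of the approach for semaphores; the helper names follow the ipc
    code of that time and should be read as approximate, not as the exact
    hunk:

    /* the security blob is now freed from the RCU callback, together with
     * the object itself, instead of synchronously during teardown */
    static void sem_rcu_free(struct rcu_head *head)
    {
            struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
            struct sem_array *sma = ipc_rcu_to_struct(p);

            security_sem_free(sma);     /* deferred past the grace period */
            ipc_rcu_free(head);
    }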
     

12 Sep, 2013

4 commits

  • No remaining users; we now use ipc_obtain_object_check().

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • After previous cleanups and optimizations, this function is no longer
    heavily used and we don't have a good reason to keep it. Update the few
    remaining callers and get rid of it.

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Since in some situations the lock can be shared for readers, we shouldn't
    be calling it a mutex; rename it to rwsem.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Now that sem, msgque and shm, through *_down(), all use the lockless
    variant of ipcctl_pre_down(), go ahead and delete it.

    [akpm@linux-foundation.org: fix function name in kerneldoc, cleanups]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

10 Jul, 2013

2 commits


01 May, 2013

5 commits

  • Introduce finer grained locking for semtimedop, to handle the common case
    of a program wanting to manipulate one semaphore from an array with
    multiple semaphores.

    If the call is a semop manipulating just one semaphore in an array with
    multiple semaphores, only take the lock for that semaphore itself.

    If the call needs to manipulate multiple semaphores, or another caller is
    in a transaction that manipulates multiple semaphores, the sem_array lock
    is taken, as well as all the locks for the individual semaphores.

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores look like this:

    threads     vanilla   Davidlohr's    Davidlohr's +    Davidlohr's +
                          patches        rwlock patches   v3 patches
    10           610652        726325          1783589          2142206
    20           341570        365699          1520453          1977878
    30           288102        307037          1498167          2037995
    40           290714        305955          1612665          2256484
    50           288620        312890          1733453          2650292
    60           289987        306043          1649360          2388008
    70           291298        306347          1723167          2717486
    80           290948        305662          1729545          2763582
    90           290996        306680          1736021          2757524
    100          292243        306700          1773700          3059159

    [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
    [davidlohr.bueso@hp.com: make refcounter atomic]
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Jason Low
    Reviewed-by: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Emmanuel Benisty
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
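
    A conceptual sketch of the locking split; the kernel's real sem_lock()
    handles several races this omits, and the field names are illustrative:

    static void sem_lock_sketch(struct sem_array *sma,
                                struct sembuf *sops, int nsops)
    {
            if (nsops == 1 && !sma->complex_count) {
                    /* fast path: lock only the semaphore being operated on */
                    spin_lock(&sma->sem_base[sops->sem_num].lock);
            } else {
                    /* complex operation: take the array-wide lock */
                    ipc_lock_object(&sma->sem_perm);
            }
    }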
     
  • Instead of holding the ipc lock for permissions and security checks, among
    others, only acquire it when necessary.

    Some numbers....

    1) With Rik's semop-multi.c microbenchmark we can see the following
    results:

    Baseline (3.9-rc1):
    cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
    total operations: 151452270, ops/sec 5048409

    + 59.40% a.out [kernel.kallsyms] [k] _raw_spin_lock
    + 6.14% a.out [kernel.kallsyms] [k] sys_semtimedop
    + 3.84% a.out [kernel.kallsyms] [k] avc_has_perm_flags
    + 3.64% a.out [kernel.kallsyms] [k] __audit_syscall_exit
    + 2.06% a.out [kernel.kallsyms] [k] copy_user_enhanced_fast_string
    + 1.86% a.out [kernel.kallsyms] [k] ipc_lock

    With this patchset:
    cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
    total operations: 273156400, ops/sec 9105213

    + 18.54% a.out [kernel.kallsyms] [k] _raw_spin_lock
    + 11.72% a.out [kernel.kallsyms] [k] sys_semtimedop
    + 7.70% a.out [kernel.kallsyms] [k] ipc_has_perm.isra.21
    + 6.58% a.out [kernel.kallsyms] [k] avc_has_perm_flags
    + 6.54% a.out [kernel.kallsyms] [k] __audit_syscall_exit
    + 4.71% a.out [kernel.kallsyms] [k] ipc_obtain_object_check

    2) While on an Oracle swingbench DSS (data mining) workload the
    improvements are not as exciting as with Rik's benchmark, we can see
    some positive numbers. For an 8 socket machine the following are the
    percentages of %sys time incurred in the ipc lock:

    Baseline (3.9-rc1):
    100 swingbench users: 8,74%
    400 swingbench users: 21,86%
    800 swingbench users: 84,35%

    With this patchset:
    100 swingbench users: 8,11%
    400 swingbench users: 19,93%
    800 swingbench users: 77,69%

    [riel@redhat.com: fix two locking bugs]
    [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Reviewed-by: Chegu Vinod
    Acked-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Jason Low
    Cc: Emmanuel Benisty
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Various forms of ipc use ipcctl_pre_down() to retrieve an ipc object and
    check permissions, mostly for IPC_RMID and IPC_SET commands.

    Introduce ipcctl_pre_down_nolock(), a lockless version of this function.
    The locking version is retained, yet modified to call the nolock version
    without affecting its semantics, thus transparent to all ipc callers.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Through ipc_lock() and therefore ipc_lock_check() we currently return the
    locked ipc object. This is not necessary for all situations and can,
    therefore, cause unnecessary ipc lock contention.

    Introduce analogous ipc_obtain_object() and ipc_obtain_object_check()
    functions that only lookup and return the ipc object.

    Both these functions must be called within the RCU read critical section.

    [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno from ipc_lock()]
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Reviewed-by: Chegu Vinod
    Acked-by: Michel Lespinasse
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
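
    A sketch of the intended calling pattern (function names from ipc/ of
    that era, the example caller is hypothetical and error handling is
    trimmed): look up the object without locking it, do the cheap checks
    under RCU only, and take the spinlock just for the update:

    static int example_msq_op(struct ipc_namespace *ns, int msqid)
    {
            struct kern_ipc_perm *ipcp;

            rcu_read_lock();
            ipcp = ipc_obtain_object_check(&msg_ids(ns), msqid);
            if (IS_ERR(ipcp)) {
                    rcu_read_unlock();
                    return PTR_ERR(ipcp);
            }

            /* permission/security checks can run here, still lock-free */

            ipc_lock_object(ipcp);      /* lock only to modify the object */
            /* ... update ... */
            ipc_unlock_object(ipcp);
            rcu_read_unlock();
            return 0;
    }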
     
  • This series makes the sysv semaphore code more scalable, by reducing the
    time the semaphore lock is held, and making the locking more scalable for
    semaphore arrays with multiple semaphores.

    The first four patches were written by Davidlohr Bueso, and reduce the
    hold time of the semaphore lock.

    The last three patches change the sysv semaphore code locking to be more
    fine grained, providing a performance boost when multiple semaphores in a
    semaphore array are being manipulated simultaneously.

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores look like this:

    threads     vanilla   Davidlohr's    Davidlohr's +    Davidlohr's +
                          patches        rwlock patches   v3 patches
    10           610652        726325          1783589          2142206
    20           341570        365699          1520453          1977878
    30           288102        307037          1498167          2037995
    40           290714        305955          1612665          2256484
    50           288620        312890          1733453          2650292
    60           289987        306043          1649360          2388008
    70           291298        306347          1723167          2717486
    80           290948        305662          1729545          2763582
    90           290996        306680          1736021          2757524
    100          292243        306700          1773700          3059159

    This patch:

    There is no reason to be holding the ipc lock while reading ipcp->seq,
    hence remove the misleading comment.

    Also simplify the return value of the function.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

05 Jan, 2013

2 commits

  • This patch is required for checkpoint/restore in userspace.

    c/r requires some way to get all pending IPC messages without deleting
    them from the queue (checkpoint can fail, and in this case tasks will be
    resumed, so the queue has to remain valid).

    To achieve this, a new operation flag, MSG_COPY, was introduced for the
    sys_msgrcv() system call. If this flag is specified, mtype is interpreted
    as the number of the message to copy.

    If MSG_COPY is set, the kernel will allocate a dummy message of the passed
    size and then use the new copy_msg() helper function to copy the desired
    message (instead of unlinking it from the queue); a usage sketch follows
    this entry.

    Notes:

    1) Return -ENOSYS if MSG_COPY is specified, but
    CONFIG_CHECKPOINT_RESTORE is not set.

    Signed-off-by: Stanislav Kinsbursky
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
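
    A hypothetical userspace sketch of the new flag: peek at the n-th pending
    message without dequeuing it. MSG_COPY needs CONFIG_CHECKPOINT_RESTORE in
    the kernel and is combined with IPC_NOWAIT here; the buffer size is
    arbitrary:

    #include <sys/msg.h>

    #ifndef MSG_COPY
    # define MSG_COPY 040000          /* from <linux/msg.h> */
    #endif

    struct peek_buf {
            long mtype;
            char mtext[8192];
    };

    static ssize_t peek_message(int msqid, long n, struct peek_buf *buf)
    {
            /* with MSG_COPY, msgtyp is the position of the message to copy */
            return msgrcv(msqid, buf, sizeof(buf->mtext), n,
                          MSG_COPY | IPC_NOWAIT);
    }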
     
  • Add 3 new variables and sysctls to tune them (one "next_id" variable
    each for messages, semaphores and shared memory). This variable can be
    used to set the desired id for the next allocated IPC object. By default
    it is equal to -1 and the old behaviour is preserved. If this variable is
    non-negative, then the desired id will be extracted from it and used as a
    start value to search for a free IDR slot (see the sketch after this
    entry).

    Notes:

    1) This patch doesn't guarantee that the new object will have the desired
    id, so it's up to user space to handle a new object with a different id.

    2) After a successful id allocation attempt, "next_id" will be set back
    to -1 (if it was non-negative).

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Stanislav Kinsbursky
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
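
    An illustrative (root-only) userspace sketch using the shared-memory
    variant of the new sysctl; as noted above, the requested id is a hint,
    not a guarantee:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
            int fd = open("/proc/sys/kernel/shm_next_id", O_WRONLY);
            if (fd >= 0) {
                    write(fd, "12345", 5);
                    close(fd);
            }

            int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
            printf("got shm id %d\n", id);   /* 12345 if the request held */
            return 0;
    }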
     

07 Sep, 2012

1 commit

  • - Store the ipc owner and creator with a kuid.
    - Store the ipc group and the creator's group with a kgid.
    - Add error handling to ipc_update_perms, allowing it to
      fail if the uids and gids can not be converted to kuids
      or kgids.
    - Modify the proc files to display the ipc creator and
      owner in the user namespace of the opener of the proc file.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

31 Jul, 2012

1 commit

  • Rather than #define the options manually in the architecture code, add
    Kconfig options for them and select them there instead. This also allows
    us to select the compat IPC version parsing automatically for platforms
    using the old compat IPC interface.

    Reported-by: Andrew Morton
    Signed-off-by: Will Deacon
    Cc: Arnd Bergmann
    Cc: Chris Metcalf
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     

24 Mar, 2011

1 commit

  • CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
    because the resource comes from current's own ipc namespace.

    setuid/setgid are to uids in the task's own namespace, so again the checks
    can be against current_user_ns().

    Changelog:
    Jan 11: Use task_ns_capable() in place of sched_capable().
    Jan 11: Use nsown_capable() as suggested by Bastian Blank.
    Jan 11: Clarify (hopefully) some logic in futex and sched.c
    Feb 15: use ns_capable for ipc, not nsown_capable
    Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
    Feb 23: pass ns down rather than taking it from current

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
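
    A sketch of the resulting check; ns_capable() and the ipc_namespace's
    user_ns link come from this patchset, while the wrapper name is
    illustrative:

    static bool can_override_ipc_perms(struct ipc_namespace *ns)
    {
            return ns_capable(ns->user_ns, CAP_IPC_OWNER);
    }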
     

22 Jun, 2009

1 commit

  • 31a985f "ipc: use __ARCH_WANT_IPC_PARSE_VERSION in ipc/util.h" would
    choose the implementation of ipc_parse_version() based on a symbol
    defined in .

    But it failed to also include this header and thus broke
    IPC_64-passing 32-bit userspace, because the flag wasn't masked out
    properly anymore and the command was not understood.

    Include to give the architecture a chance to ask for
    the no-no-op ipc_parse_version().

    Signed-off-by: Johannes Weiner
    Acked-by: Arnd Bergmann
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

19 Jun, 2009

2 commits

  • The function is really private to ipc/, and this avoids a forward
    declaration of struct kern_ipc_perm.

    Signed-off-by: Alexey Dobriyan
    Reviewed-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The definition of ipc_parse_version depends on
    __ARCH_WANT_IPC_PARSE_VERSION, but the header file declares it
    conditionally based on the architecture.

    Use the macro consistently to make it easier to add new architectures.

    Signed-off-by: Arnd Bergmann
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
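
    The resulting pattern in ipc/util.h looks roughly like this (treat it as
    a paraphrase rather than the exact hunk):

    #ifdef __ARCH_WANT_IPC_PARSE_VERSION
    /* architectures with the old interface get the real parser */
    int ipc_parse_version(int *cmd);
    #else
    /* everyone else: the command is always in the new (IPC_64) format */
    # define ipc_parse_version(cmd) IPC_64
    #endif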
     

07 Apr, 2009

2 commits

  • Implement multiple mounts of the mqueue file system, and link it to usage
    of CLONE_NEWIPC.

    Each ipc ns has a corresponding mqueuefs superblock. When a user does
    clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an
    internal mount of a new mqueuefs sb linked to the new ipc ns.

    When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the
    mqueuefs superblock.

    Posix message queues can be worked with both through the mq_* system calls
    (see mq_overview(7)), and through the VFS through the mqueue mount. Any
    usage of mq_open() and friends will work with the acting task's ipc
    namespace. Any actions through the VFS will work with the mqueuefs in
    which the file was created. So if a user doesn't remount mqueuefs after
    unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls
    /dev/mqueue".

    If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns,
    ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1)
    ipc_ns:1 will be freed, (2) its superblock will live on until task b
    umounts the corresponding mqueuefs, and vfs actions will continue to
    succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to
    the deceased ipc_ns:1.

    To make this happen, we must protect the ipc reference count when

    a) a task exits and drops its ipcns->count, since it might be dropping
    it to 0 and freeing the ipcns

    b) a task accesses the ipcns through its mqueuefs interface, since it
    bumps the ipcns refcount and might race with the last task in the ipcns
    exiting.

    So the kref is changed to an atomic_t so we can use
    atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns
    through ns = mqueuefs_sb->s_fs_info is protected by the same lock.

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
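
    A sketch of the resulting refcount drop (helper names approximate): the
    decrement and the superblock detach happen under the same mq_lock, which
    closes the races in (a) and (b) above:

    void put_ipc_ns(struct ipc_namespace *ns)
    {
            if (atomic_dec_and_lock(&ns->count, &mq_lock)) {
                    mq_clear_sbinfo(ns);    /* sb->s_fs_info = NULL */
                    spin_unlock(&mq_lock);
                    free_ipc_ns(ns);
            }
    }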
     
  • Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
    The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
    posix message queue namespaces and the SYSV ipc namespaces.

    The sysctl code will be fixed separately in patch 3. After just this
    patch, making a change to posix mqueue tunables always changes the values
    in the initial ipc namespace.

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

26 Jul, 2008

1 commit

  • Remove the ipc_lock_down() routines: they used to call idr_find() locklessly
    (given that the ipc ids lock was already held), so they are not needed
    anymore.

    Signed-off-by: Nadia Derbey
    Acked-by: "Paul E. McKenney"
    Cc: Manfred Spraul
    Cc: Jim Houston
    Cc: Pierre Peiffer
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

29 Apr, 2008

4 commits

  • Add definitions of USHORT_MAX and others to the common kernel headers.
    ipc uses them, and the slub implementation might also use them.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Zhang Yanmin
    Reviewed-by: Christoph Lameter
    Cc: Nadia Derbey
    Cc: "Pierre Peiffer"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang, Yanmin
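
    The added definitions, roughly as they appeared at the time (they were
    later renamed to USHRT_MAX/SHRT_MAX/SHRT_MIN):

    #define USHORT_MAX      ((u16)(~0U))
    #define SHORT_MAX       ((s16)(USHORT_MAX >> 1))
    #define SHORT_MIN       (-SHORT_MAX - 1)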
     
  • semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set
    of commands for each kind of IPC. They all start to do the same job (they
    retrieve the ipc and do some permission checks) before handling the commands
    on their own.

    This patch proposes to consolidate this by moving these same pieces of code
    into one common function called ipcctl_pre_down().

    It simplifies these xxxctl_down() functions a little and slightly
    improves maintainability.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • The IPC_SET command performs the same permission setting for all IPCs. This
    patch introduces a common ipc_update_perm() function to update these
    permissions and makes use of it for all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
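
    A sketch of the common helper, close to the ipc/util.c version of that
    time (before the later kuid/kgid conversion):

    /* update the owner/group/mode bits that IPC_SET is allowed to change */
    void ipc_update_perm(struct ipc64_perm *in, struct kern_ipc_perm *out)
    {
            out->uid  = in->uid;
            out->gid  = in->gid;
            out->mode = (out->mode & ~S_IRWXUGO) | (in->mode & S_IRWXUGO);
    }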
     
  • Introduce the registration of a callback routine that recomputes msg_ctlmni
    upon memory add / remove.

    A single notifier block is registered in the hotplug memory chain for all the
    ipc namespaces.

    Since the ipc namespaces are not linked together, they have their own
    notification chain: one notifier_block is defined per ipc namespace.

    Each time an ipc namespace is created (removed) it registers (unregisters) its
    notifier block in (from) the ipcns chain. The callback routine registered in
    the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event.
    Each callback routine registered in the ipcns namespace, in turn, recomputes
    msgmni for the owning namespace.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

09 Feb, 2008

3 commits

  • Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (3 for
    msg, sem and shm, the structure used to store all ipcs). These 'struct
    ipc_ids' are dynamically allocated for each ipc_namespace, as is the
    ipc_namespace itself (for the init namespace, they are initialized with
    pointers to static variables instead).

    It is so for historical reasons: in fact, before the use of idr to store
    the ipcs, the ipcs were stored in tables of variable length, depending on
    the maximum number of ipcs allowed. Now, these 'struct ipc_ids' have a
    fixed size. As they are allocated in any case for each new ipc_namespace,
    there is no memory gain in having them allocated separately from the
    struct ipc_namespace.

    This patch proposes to make this table static in the struct ipc_namespace.
    Thus, we can allocate everything at once and get rid of all the code
    needed to allocate and free these ipc_ids separately.

    Signed-off-by: Pierre Peiffer
    Acked-by: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
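
    A sketch of the structural change (other members omitted; see
    include/linux/ipc_namespace.h for the real layout):

    struct ipc_namespace {
            struct kref     kref;
            struct ipc_ids  ids[3];     /* was: struct ipc_ids *ids[3] */
            /* ... per-namespace sem/msg/shm limits ... */
    };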
     
  • ipc_lock_check_down(), ipc_lock_check() and ipcget() seem too large to be
    inline. Besides, inlining buys them no optimization, as they perform
    function calls internally in any case.

    Moving them into ipc/util.c saves 500 bytes of vmlinux and shortens IPC
    internal API.

    $ ./scripts/bloat-o-meter vmlinux-orig vmlinux
    add/remove: 3/2 grow/shrink: 0/10 up/down: 490/-989 (-499)
    function                   old     new   delta
    ipcget                       -     392    +392
    ipc_lock_check_down          -      49     +49
    ipc_lock_check               -      49     +49
    sys_semget                 119     105     -14
    sys_shmget                 108      86     -22
    sys_msgget                 100      78     -22
    do_msgsnd                  665     631     -34
    do_msgrcv                  680     644     -36
    do_shmat                   771     733     -38
    sys_msgctl                1302    1229     -73
    ipcget_new                  80       -     -80
    sys_semtimedop            1534    1452     -82
    sys_semctl                2034    1922    -112
    sys_shmctl                1919    1765    -154
    ipcget_public              322       -    -322

    The ipcget() growth is the result of gcc inlining of currently static
    ipcget_new/_public.

    Signed-off-by: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Currently the IPC namespace management code is spread over the ipc/*.c
    files. I moved this code into the ipc/namespace.c file, which is compiled
    out when namespaces are not needed.

    The linux/ipc_namespace.h file is used to store the prototypes of the
    functions in namespace.c and the stubs for the NAMESPACES=n case. This is
    done because the stub for copy_ipc_namespace requires knowledge of the
    CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself is
    included into many .c files via the sys.h->sem.h sequence, so adding
    sched.h to it would make all of these .c files depend on sched.h, which is
    not that good. On the other hand, knowledge about the namespaces stuff is
    required in only 4 .c files.

    Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
    msg.c and shm.c files. It turned out that moving these functions into
    namespaces.c is not that easy because they use many other calls and macros
    from the original file. Moving them would make this patch complicated. On
    the other hand all these functions can be consolidated, so I will send a
    separate patch doing this a bit later.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

20 Oct, 2007

1 commit