02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

15 Sep, 2017

1 commit

  • Pull ipc compat cleanup and 64-bit time_t from Al Viro:
    "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
    Dinamani"

    * 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    utimes: Make utimes y2038 safe
    ipc: shm: Make shmid_kernel timestamps y2038 safe
    ipc: sem: Make sem_array timestamps y2038 safe
    ipc: msg: Make msg_queue timestamps y2038 safe
    ipc: mqueue: Replace timespec with timespec64
    ipc: Make sys_semtimedop() y2038 safe
    get rid of SYSVIPC_COMPAT on ia64
    semtimedop(): move compat to native
    shmat(2): move compat to native
    msgrcv(2), msgsnd(2): move compat to native
    ipc(2): move compat to native
    ipc: make use of compat ipc_perm helpers
    semctl(): move compat to native
    semctl(): separate all layout-dependent copyin/copyout
    msgctl(): move compat to native
    msgctl(): split the actual work from copyin/copyout
    ipc: move compat shmctl to native
    shmctl: split the work from copyin/copyout

    Linus Torvalds
     

09 Sep, 2017

1 commit

  • ipc_findkey() used to scan all objects to look for the wanted key. This
    is slow when using a high number of keys. This change adds an rhashtable
    of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n).

    This change gives a 865% improvement of benchmark reaim.jobs_per_min on a
    56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1]

    Other (more micro) benchmark results, by the author: On an i5 laptop, the
    following loop executed right after a reboot took, without and with this
    change:

    for (int i = 0, k=0x424242; i < KEYS; ++i)
    semget(k++, 1, IPC_CREAT | 0600);

    total total max single max single
    KEYS without with call without call with

    1 3.5 4.9 µs 3.5 4.9
    10 7.6 8.6 µs 3.7 4.7
    32 16.2 15.9 µs 4.3 5.3
    100 72.9 41.8 µs 3.7 4.7
    1000 5,630.0 502.0 µs * *
    10000 1,340,000.0 7,240.0 µs * *
    31900 17,600,000.0 22,200.0 µs * *

    *: unreliable measure: high variance

    The duration for a lookup-only usage was obtained by the same loop once
    the keys are present:

    total total max single max single
    KEYS without with call without call with

    1 2.1 2.5 µs 2.1 2.5
    10 4.5 4.8 µs 2.2 2.3
    32 13.0 10.8 µs 2.3 2.8
    100 82.9 25.1 µs * 2.3
    1000 5,780.0 217.0 µs * *
    10000 1,470,000.0 2,520.0 µs * *
    31900 17,400,000.0 7,810.0 µs * *

    Finally, executing each semget() in a new process gave, when still
    summing only the durations of these syscalls:

    creation:
    total total
    KEYS without with

    1 3.7 5.0 µs
    10 32.9 36.7 µs
    32 125.0 109.0 µs
    100 523.0 353.0 µs
    1000 20,300.0 3,280.0 µs
    10000 2,470,000.0 46,700.0 µs
    31900 27,800,000.0 219,000.0 µs

    lookup-only:
    total total
    KEYS without with

    1 2.5 2.7 µs
    10 25.4 24.4 µs
    32 106.0 72.6 µs
    100 591.0 352.0 µs
    1000 22,400.0 2,250.0 µs
    10000 2,510,000.0 25,700.0 µs
    31900 28,200,000.0 115,000.0 µs

    [1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop

    Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix
    Signed-off-by: Guillaume Knispel
    Reviewed-by: Marc Pardo
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Cc: Manfred Spraul
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: "Peter Zijlstra (Intel)"
    Cc: Ingo Molnar
    Cc: Sebastian Andrzej Siewior
    Cc: Serge Hallyn
    Cc: Andrey Vagin
    Cc: Guillaume Knispel
    Cc: Marc Pardo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillaume Knispel
     

16 Jul, 2017

2 commits


13 Jul, 2017

5 commits

  • Now that ipc_rcu_alloc() and ipc_rcu_free() are removed, document when
    it is valid to use ipc_getref() and ipc_putref().

    Link: http://lkml.kernel.org/r/20170525185107.12869-21-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • No callers remain for ipc_rcu_alloc(). Drop the function.

    [manfred@colorfullife.com: Rediff because the memset was temporarily inside ipc_rcu_free()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-13-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • There are no more callers of ipc_rcu_free(), so remove it.

    Link: http://lkml.kernel.org/r/20170525185107.12869-9-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • The only users of ipc_alloc() were ipc_rcu_alloc() and the on-heap
    sem_io fall-back memory. Better to just open-code these to make things
    easier to read.

    [manfred@colorfullife.com: Rediff due to inclusion of memset() into ipc_rcu_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-5-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • ipc has two management structures that exist for every id:
    - struct kern_ipc_perm, it contains e.g. the permissions.
    - struct ipc_rcu, it contains the rcu head for rcu handling and the
    refcount.

    The patch merges both structures.

    As a bonus, we may save one cacheline, because both structures are
    cacheline aligned. In addition, it reduces the number of casts, instead
    most codepaths can use container_of.

    To simplify code, the ipc_rcu_alloc initializes the allocation to 0.

    [manfred@colorfullife.com: really include the memset() into ipc_alloc_rcu()]
    Link: http://lkml.kernel.org/r/564f8612-0601-b267-514f-a9f650ec9b32@colorfullife.com
    Link: http://lkml.kernel.org/r/20170525185107.12869-3-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

18 Apr, 2017

1 commit


23 Jan, 2016

1 commit

  • There are many locations that do

    if (memory_was_allocated_by_vmalloc)
    vfree(ptr);
    else
    kfree(ptr);

    but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
    using is_vmalloc_addr(). Unless callers have special reasons, we can
    replace this branch with kvfree(). Please check and reply if you found
    problems.

    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Acked-by: Jan Kara
    Acked-by: Russell King
    Reviewed-by: Andreas Dilger
    Acked-by: "Rafael J. Wysocki"
    Acked-by: David Rientjes
    Cc: "Luck, Tony"
    Cc: Oleg Drokin
    Cc: Boris Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     

01 Jul, 2015

1 commit


07 Jun, 2014

2 commits

  • trailing whitespace

    Signed-off-by: Paul McQuade
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul McQuade
     
  • There is no need to recreate the very same ipc_ops structure on every
    kernel entry for msgget/semget/shmget. Just declare it static and be
    done with it. While at it, constify it as we don't modify the structure
    at runtime.

    Found in the PaX patch, written by the PaX Team.

    Signed-off-by: Mathias Krause
    Cc: PaX Team
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathias Krause
     

28 Jan, 2014

4 commits

  • This field is only used to reset the ids seq number if it exceeds the
    smaller of INT_MAX/SEQ_MULTIPLIER and USHRT_MAX, and can therefore be
    moved out of the structure and into its own macro. Since each
    ipc_namespace contains a table of 3 pointers to struct ipc_ids we can
    save space in instruction text:

    text data bss dec hex filename
    56232 2348 24 58604 e4ec ipc/built-in.o
    56216 2348 24 58588 e4dc ipc/built-in.o-after

    Signed-off-by: Davidlohr Bueso
    Reviewed-by: Jonathan Gonzalez
    Cc: Aswin Chandramouleeswaran
    Cc: Rik van Riel
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The ipc code does not adhere the typical linux coding style.
    This patch fixes lots of simple whitespace errors.

    - mostly autogenerated by
    scripts/checkpatch.pl -f --fix \
    --types=pointer_location,spacing,space_before_tab
    - one manual fixup (keep structure members tab-aligned)
    - removal of additional space_before_tab that were not found by --fix

    Tested with some of my msg and sem test apps.

    Andrew: Could you include it in -mm and move it towards Linus' tree?

    Signed-off-by: Manfred Spraul
    Suggested-by: Li Bin
    Cc: Joe Perches
    Acked-by: Rafael Aquini
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • struct kern_ipc_perm.deleted is meant to be used as a boolean toggle, and
    the changes introduced by this patch are just to make the case explicit.

    Signed-off-by: Rafael Aquini
    Reviewed-by: Rik van Riel
    Cc: Greg Thelen
    Acked-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • After the locking semantics for the SysV IPC API got improved, a couple
    of IPC_RMID race windows were opened because we ended up dropping the
    'kern_ipc_perm.deleted' check performed way down in ipc_lock(). The
    spotted races got sorted out by re-introducing the old test within the
    racy critical sections.

    This patch introduces ipc_valid_object() to consolidate the way we cope
    with IPC_RMID races by using the same abstraction across the API
    implementation.

    Signed-off-by: Rafael Aquini
    Acked-by: Rik van Riel
    Acked-by: Greg Thelen
    Reviewed-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     

13 Nov, 2013

1 commit

  • On 64 bit systems the test for negative message sizes is bogus as the
    size, which may be positive when evaluated as a long, will get truncated
    to an int when passed to load_msg(). So a long might very well contain a
    positive value but when truncated to an int it would become negative.

    That in combination with a small negative value of msg_ctlmax (which will
    be promoted to an unsigned type for the comparison against msgsz, making
    it a big positive value and therefore make it pass the check) will lead to
    two problems: 1/ The kmalloc() call in alloc_msg() will allocate a too
    small buffer as the addition of alen is effectively a subtraction. 2/ The
    copy_from_user() call in load_msg() will first overflow the buffer with
    userland data and then, when the userland access generates an access
    violation, the fixup handler copy_user_handle_tail() will try to fill the
    remainder with zeros -- roughly 4GB. That almost instantly results in a
    system crash or reset.

    ,-[ Reproducer (needs to be run as root) ]--
    | #include
    | #include
    | #include
    | #include
    |
    | int main(void) {
    | long msg = 1;
    | int fd;
    |
    | fd = open("/proc/sys/kernel/msgmax", O_WRONLY);
    | write(fd, "-1", 2);
    | close(fd);
    |
    | msgsnd(0, &msg, 0xfffffff0, IPC_NOWAIT);
    |
    | return 0;
    | }
    '---

    Fix the issue by preventing msgsz from getting truncated by consistently
    using size_t for the message length. This way the size checks in
    do_msgsnd() could still be passed with a negative value for msg_ctlmax but
    we would fail on the buffer allocation in that case and error out.

    Also change the type of m_ts from int to size_t to avoid similar nastiness
    in other code paths -- it is used in similar constructs, i.e. signed vs.
    unsigned checks. It should never become negative under normal
    circumstances, though.

    Setting msg_ctlmax to a negative value is an odd configuration and should
    be prevented. As that might break existing userland, it will be handled
    in a separate commit so it could easily be reverted and reworked without
    reintroducing the above described bug.

    Hardening mechanisms for user copy operations would have catched that bug
    early -- e.g. checking slab object sizes on user copy operations as the
    usercopy feature of the PaX patch does. Or, for that matter, detect the
    long vs. int sign change due to truncation, as the size overflow plugin
    of the very same patch does.

    [akpm@linux-foundation.org: fix i386 min() warnings]
    Signed-off-by: Mathias Krause
    Cc: Pax Team
    Cc: Davidlohr Bueso
    Cc: Brad Spengler
    Cc: Manfred Spraul
    Cc: [ v2.3.27+ -- yes, that old ;) ]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathias Krause
     

25 Sep, 2013

1 commit

  • Currently, IPC mechanisms do security and auditing related checks under
    RCU. However, since security modules can free the security structure,
    for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
    race if the structure is freed before other tasks are done with it,
    creating a use-after-free condition. Manfred illustrates this nicely,
    for instance with shared mem and selinux:

    -> do_shmat calls rcu_read_lock()
    -> do_shmat calls shm_object_check().
    Checks that the object is still valid - but doesn't acquire any locks.
    Then it returns.
    -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
    -> selinux_shm_shmat calls ipc_has_perm()
    -> ipc_has_perm accesses ipc_perms->security

    shm_close()
    -> shm_close acquires rw_mutex & shm_lock
    -> shm_close calls shm_destroy
    -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
    -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
    -> ipc_free_security calls kfree(ipc_perms->security)

    This patch delays the freeing of the security structures after all RCU
    readers are done. Furthermore it aligns the security life cycle with
    that of the rest of IPC - freeing them based on the reference counter.
    For situations where we need not free security, the current behavior is
    kept. Linus states:

    "... the old behavior was suspect for another reason too: having the
    security blob go away from under a user sounds like it could cause
    various other problems anyway, so I think the old code was at least
    _prone_ to bugs even if it didn't have catastrophic behavior."

    I have tested this patch with IPC testcases from LTP on both my
    quad-core laptop and on a 64 core NUMA server. In both cases selinux is
    enabled, and tests pass for both voluntary and forced preemption models.
    While the mentioned races are theoretical (at least no one as reported
    them), I wanted to make sure that this new logic doesn't break anything
    we weren't aware of.

    Suggested-by: Linus Torvalds
    Signed-off-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

12 Sep, 2013

4 commits

  • No remaining users, we now use ipc_obtain_object_check().

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • After previous cleanups and optimizations, this function is no longer
    heavily used and we don't have a good reason to keep it. Update the few
    remaining callers and get rid of it.

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Since in some situations the lock can be shared for readers, we shouldn't
    be calling it a mutex, rename it to rwsem.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Now that sem, msgque and shm, through *_down(), all use the lockless
    variant of ipcctl_pre_down(), go ahead and delete it.

    [akpm@linux-foundation.org: fix function name in kerneldoc, cleanups]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

10 Jul, 2013

2 commits


01 May, 2013

5 commits

  • Introduce finer grained locking for semtimedop, to handle the common case
    of a program wanting to manipulate one semaphore from an array with
    multiple semaphores.

    If the call is a semop manipulating just one semaphore in an array with
    multiple semaphores, only take the lock for that semaphore itself.

    If the call needs to manipulate multiple semaphores, or another caller is
    in a transaction that manipulates multiple semaphores, the sem_array lock
    is taken, as well as all the locks for the individual semaphores.

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores, look like this:

    vanilla Davidlohr's Davidlohr's + Davidlohr's +
    threads patches rwlock patches v3 patches
    10 610652 726325 1783589 2142206
    20 341570 365699 1520453 1977878
    30 288102 307037 1498167 2037995
    40 290714 305955 1612665 2256484
    50 288620 312890 1733453 2650292
    60 289987 306043 1649360 2388008
    70 291298 306347 1723167 2717486
    80 290948 305662 1729545 2763582
    90 290996 306680 1736021 2757524
    100 292243 306700 1773700 3059159

    [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
    [davidlohr.bueso@hp.com: make refcounter atomic]
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Jason Low
    Reviewed-by: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Emmanuel Benisty
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Instead of holding the ipc lock for permissions and security checks, among
    others, only acquire it when necessary.

    Some numbers....

    1) With Rik's semop-multi.c microbenchmark we can see the following
    results:

    Baseline (3.9-rc1):
    cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
    total operations: 151452270, ops/sec 5048409

    + 59.40% a.out [kernel.kallsyms] [k] _raw_spin_lock
    + 6.14% a.out [kernel.kallsyms] [k] sys_semtimedop
    + 3.84% a.out [kernel.kallsyms] [k] avc_has_perm_flags
    + 3.64% a.out [kernel.kallsyms] [k] __audit_syscall_exit
    + 2.06% a.out [kernel.kallsyms] [k] copy_user_enhanced_fast_string
    + 1.86% a.out [kernel.kallsyms] [k] ipc_lock

    With this patchset:
    cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
    total operations: 273156400, ops/sec 9105213

    + 18.54% a.out [kernel.kallsyms] [k] _raw_spin_lock
    + 11.72% a.out [kernel.kallsyms] [k] sys_semtimedop
    + 7.70% a.out [kernel.kallsyms] [k] ipc_has_perm.isra.21
    + 6.58% a.out [kernel.kallsyms] [k] avc_has_perm_flags
    + 6.54% a.out [kernel.kallsyms] [k] __audit_syscall_exit
    + 4.71% a.out [kernel.kallsyms] [k] ipc_obtain_object_check

    2) While on an Oracle swingbench DSS (data mining) workload the
    improvements are not as exciting as with Rik's benchmark, we can see
    some positive numbers. For an 8 socket machine the following are the
    percentages of %sys time incurred in the ipc lock:

    Baseline (3.9-rc1):
    100 swingbench users: 8,74%
    400 swingbench users: 21,86%
    800 swingbench users: 84,35%

    With this patchset:
    100 swingbench users: 8,11%
    400 swingbench users: 19,93%
    800 swingbench users: 77,69%

    [riel@redhat.com: fix two locking bugs]
    [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Reviewed-by: Chegu Vinod
    Acked-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Jason Low
    Cc: Emmanuel Benisty
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Various forms of ipc use ipcctl_pre_down() to retrieve an ipc object and
    check permissions, mostly for IPC_RMID and IPC_SET commands.

    Introduce ipcctl_pre_down_nolock(), a lockless version of this function.
    The locking version is retained, yet modified to call the nolock version
    without affecting its semantics, thus transparent to all ipc callers.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Through ipc_lock() and therefore ipc_lock_check() we currently return the
    locked ipc object. This is not necessary for all situations and can,
    therefore, cause unnecessary ipc lock contention.

    Introduce analogous ipc_obtain_object() and ipc_obtain_object_check()
    functions that only lookup and return the ipc object.

    Both these functions must be called within the RCU read critical section.

    [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno from ipc_lock()]
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Reviewed-by: Chegu Vinod
    Acked-by: Michel Lespinasse
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This series makes the sysv semaphore code more scalable, by reducing the
    time the semaphore lock is held, and making the locking more scalable for
    semaphore arrays with multiple semaphores.

    The first four patches were written by Davidlohr Buesso, and reduce the
    hold time of the semaphore lock.

    The last three patches change the sysv semaphore code locking to be more
    fine grained, providing a performance boost when multiple semaphores in a
    semaphore array are being manipulated simultaneously.

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores, look like this:

    vanilla Davidlohr's Davidlohr's + Davidlohr's +
    threads patches rwlock patches v3 patches
    10 610652 726325 1783589 2142206
    20 341570 365699 1520453 1977878
    30 288102 307037 1498167 2037995
    40 290714 305955 1612665 2256484
    50 288620 312890 1733453 2650292
    60 289987 306043 1649360 2388008
    70 291298 306347 1723167 2717486
    80 290948 305662 1729545 2763582
    90 290996 306680 1736021 2757524
    100 292243 306700 1773700 3059159

    This patch:

    There is no reason to be holding the ipc lock while reading ipcp->seq,
    hence remove misleading comment.

    Also simplify the return value for the function.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

05 Jan, 2013

2 commits

  • This patch is required for checkpoint/restore in userspace.

    c/r requires some way to get all pending IPC messages without deleting
    them from the queue (checkpoint can fail and in this case tasks will be
    resumed, so queue have to be valid).

    To achive this, new operation flag MSG_COPY for sys_msgrcv() system call
    was introduced. If this flag was specified, then mtype is interpreted as
    number of the message to copy.

    If MSG_COPY is set, then kernel will allocate dummy message with passed
    size, and then use new copy_msg() helper function to copy desired message
    (instead of unlinking it from the queue).

    Notes:

    1) Return -ENOSYS if MSG_COPY is specified, but
    CONFIG_CHECKPOINT_RESTORE is not set.

    Signed-off-by: Stanislav Kinsbursky
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     
  • Add 3 new variables and sysctls to tune them (by one "next_id" variable
    for messages, semaphores and shared memory respectively). This variable
    can be used to set desired id for next allocated IPC object. By default
    it's equal to -1 and old behaviour is preserved. If this variable is
    non-negative, then desired idr will be extracted from it and used as a
    start value to search for free IDR slot.

    Notes:

    1) this patch doesn't guarantee that the new object will have desired
    id. So it's up to user space how to handle new object with wrong id.

    2) After a sucessful id allocation attempt, "next_id" will be set back
    to -1 (if it was non-negative).

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Stanislav Kinsbursky
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     

07 Sep, 2012

1 commit

  • - Store the ipc owner and creator with a kuid
    - Store the ipc group and the crators group with a kgid.
    - Add error handling to ipc_update_perms, allowing it to
    fail if the uids and gids can not be converted to kuids
    or kgids.
    - Modify the proc files to display the ipc creator and
    owner in the user namespace of the opener of the proc file.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

31 Jul, 2012

1 commit

  • Rather than #define the options manually in the architecture code, add
    Kconfig options for them and select them there instead. This also allows
    us to select the compat IPC version parsing automatically for platforms
    using the old compat IPC interface.

    Reported-by: Andrew Morton
    Signed-off-by: Will Deacon
    Cc: Arnd Bergmann
    Cc: Chris Metcalf
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     

24 Mar, 2011

1 commit

  • CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
    because the resource comes from current's own ipc namespace.

    setuid/setgid are to uids in own namespace, so again checks can be against
    current_user_ns().

    Changelog:
    Jan 11: Use task_ns_capable() in place of sched_capable().
    Jan 11: Use nsown_capable() as suggested by Bastian Blank.
    Jan 11: Clarify (hopefully) some logic in futex and sched.c
    Feb 15: use ns_capable for ipc, not nsown_capable
    Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
    Feb 23: pass ns down rather than taking it from current

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

22 Jun, 2009

1 commit

  • 31a985f "ipc: use __ARCH_WANT_IPC_PARSE_VERSION in ipc/util.h" would
    choose the implementation of ipc_parse_version() based on a symbol
    defined in .

    But it failed to also include this header and thus broke
    IPC_64-passing 32-bit userspace because the flag wasn't masked out
    properly anymore and the command not understood.

    Include to give the architecture a chance to ask for
    the no-no-op ipc_parse_version().

    Signed-off-by: Johannes Weiner
    Acked-by: Arnd Bergmann
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

19 Jun, 2009

2 commits

  • Function is really private to ipc/ and avoid struct kern_ipc_perm
    forward declaration.

    Signed-off-by: Alexey Dobriyan
    Reviewed-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The definition of ipc_parse_version depends on
    __ARCH_WANT_IPC_PARSE_VERSION, but the header file declares it
    conditionally based on the architecture.

    Use the macro consistently to make it easier to add new architectures.

    Signed-off-by: Arnd Bergmann
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann