20 Apr, 2018

2 commits

  • Three ipc syscalls (mq_timedsend, mq_timedreceive and semtimedop)
    take a timespec argument. After we move 32-bit architectures over to
    using 64-bit time_t based syscalls, we need separate entry points for
    the old 32-bit based interfaces.

    This changes the #ifdef guards for the existing 32-bit compat syscalls
    to check for CONFIG_COMPAT_32BIT_TIME instead, which will then be
    enabled on all existing 32-bit architectures.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
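
    The split described above (a compat entry point that accepts the old
    32-bit layout and hands a 64-bit value to the shared implementation) can
    be sketched in plain C. This is a userspace model under stated
    assumptions, not the kernel code: the model_* type and function names are
    hypothetical, and the range check is included only for the sake of the
    sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; the names are illustrative, not the kernel's. */
struct model_timespec32 {
    int32_t tv_sec;   /* cannot represent times after 2038-01-19 */
    int32_t tv_nsec;
};

struct model_timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

#define MODEL_NSEC_PER_SEC 1000000000LL

/*
 * Widen a 32-bit timeout the way a compat entry point might before calling
 * the shared 64-bit implementation. Returns false when the nanosecond field
 * is out of range (a check added for this sketch).
 */
static bool model_widen_timespec(const struct model_timespec32 *in,
                                 struct model_timespec64 *out)
{
    assert(in != NULL && out != NULL);
    if (in->tv_nsec < 0 || in->tv_nsec >= MODEL_NSEC_PER_SEC)
        return false;
    out->tv_sec = in->tv_sec;    /* sign-extends to 64 bits */
    out->tv_nsec = in->tv_nsec;
    return true;
}
```

    The point of the design is that only the thin entry point knows about
    the old layout; everything behind it works on the 64-bit form.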
     
  • This is a preparation for changing over __kernel_timespec to 64-bit
    times, which involves assigning new system call numbers for mq_timedsend(),
    mq_timedreceive() and semtimedop() for compatibility with future y2038
    proof user space.

    The existing ABIs will remain available through compat code.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

25 Mar, 2018

1 commit

  • This reverts commit 36735a6a2b5e042db1af956ce4bcc13f3ff99e21.

    Aleksa Sarai writes:
    > [REGRESSION v4.16-rc6] [PATCH] mqueue: forbid unprivileged user access to internal mount
    >
    > Felix reported weird behaviour on 4.16.0-rc6 with regards to mqueue[1],
    > which was introduced by 36735a6a2b5e ("mqueue: switch to on-demand
    > creation of internal mount").
    >
    > Basically, the reproducer boils down to being able to mount mqueue if
    > you create a new user namespace, even if you don't unshare the IPC
    > namespace.
    >
    > Previously this was not possible, and you would get an -EPERM. The mount
    > is the *host* mqueue mount, which is being cached and just returned from
    > mqueue_mount(). To be honest, I'm not sure if this is safe or not (or if
    > it was intentional -- since I'm not familiar with mqueue).
    >
    > To me it looks like there is a missing permission check. I've included a
    > patch below that I've compile-tested, and should block the above case.
    > Can someone please tell me if I'm missing something? Is this actually
    > safe?
    >
    > [1]: https://github.com/docker/docker/issues/36674

    The issue is a lot deeper than a missing permission check. sb->s_user_ns
    was improperly set as well. So in addition to the filesystem being
    mounted when it should not be mounted, some things are not allowed that
    should be.

    We are practically at the release of 4.16 and there is no agreement between
    Al Viro and myself on what the code should look like to fix things properly.
    So revert the code to what it was before so that we can take our time
    and discuss this properly.

    Fixes: 36735a6a2b5e ("mqueue: switch to on-demand creation of internal mount")
    Reported-by: Felix Abecassis
    Reported-by: Aleksa Sarai
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

12 Feb, 2018

1 commit

  • This is the mindless scripted replacement of kernel use of POLL*
    variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
    L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
    for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

    with de-mangling cleanups yet to come.

    NOTE! On almost all architectures, the EPOLL* constants have the same
    values as the POLL* constants do. But the keyword here is "almost".
    For various bad reasons they aren't the same, and epoll() doesn't
    actually work quite correctly in some cases due to this on Sparc et al.

    The next patch from Al will sort out the final differences, and we
    should be all done.

    Scripted-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

07 Feb, 2018

1 commit

  • Previous behavior added tasks to the work queue using the static_prio
    value instead of the dynamic priority value in prio. This caused RT tasks
    to be added to the work queue in a FIFO manner rather than by priority.
    Normal tasks were handled by priority.

    This fix utilizes the dynamic priority of the task to ensure that both RT
    and normal tasks are added to the work queue in priority order. Utilizing
    the dynamic priority (prio) rather than the base priority (normal_prio)
    was chosen to ensure that if a task had a boosted priority when it was
    added to the work queue, it would be woken sooner to ensure that it
    releases any other locks it may be holding in a more timely manner. It is
    understood that the task could have a lower priority when it wakes than
    when it was added to the queue in this (unlikely) case.

    Link: http://lkml.kernel.org/r/1513006652-7014-1-git-send-email-jhaws@sdl.usu.edu
    Signed-off-by: Jonathan Haws
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Davidlohr Bueso
    Cc: Ingo Molnar
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: Deepa Dinamani
    Cc: Thomas Gleixner
    Cc: Sebastian Andrzej Siewior
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonathan Haws
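
    The ordering difference described above can be illustrated with a small
    userspace model. It is a sketch under assumptions, not the kernel's wait
    queue code: the model_* names are hypothetical, a fixed array stands in
    for the list, and it assumes (as is typical) that RT tasks share the same
    nice-derived static_prio, so comparing on it degenerates to FIFO.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_WAITERS 8

struct model_task {
    int prio;         /* dynamic priority; lower value means more urgent */
    int static_prio;  /* nice-derived; assumed equal for the RT tasks here */
    int id;
};

struct model_waitq {
    struct model_task slots[MODEL_MAX_WAITERS];
    size_t len;
};

/*
 * Insert a waiter so that slots[0] is the next one to be woken: most urgent
 * first, FIFO among equal keys. use_dynamic selects prio (the fix) or
 * static_prio (the previous behavior). Returns -1 when the model is full.
 */
static int model_wq_add(struct model_waitq *q, struct model_task t,
                        int use_dynamic)
{
    assert(q != NULL);
    if (q->len == MODEL_MAX_WAITERS)
        return -1;

    int key = use_dynamic ? t.prio : t.static_prio;
    size_t pos = 0;
    while (pos < q->len) {
        int other = use_dynamic ? q->slots[pos].prio
                                : q->slots[pos].static_prio;
        if (other > key)
            break;          /* first strictly less urgent waiter */
        pos++;
    }
    for (size_t i = q->len; i > pos; i--)
        q->slots[i] = q->slots[i - 1];
    q->slots[pos] = t;
    q->len++;
    return 0;
}
```

    With two RT waiters whose static_prio is equal, keying on static_prio
    keeps arrival order, while keying on prio puts the more urgent one first.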
     

31 Jan, 2018

2 commits

  • Pull mqueue/bpf vfs cleanups from Al Viro:
    "mqueue and bpf go through rather painful and similar contortions to
    create objects in their dentry trees. Provide a primitive for doing
    that without abusing ->mknod(), switch bpf and mqueue to it.

    Another mqueue-related thing that has ended up in that branch is
    on-demand creation of internal mount (based upon the work of Giuseppe
    Scrivano)"

    * 'work.mqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    mqueue: switch to on-demand creation of internal mount
    tidy do_mq_open() up a bit
    mqueue: clean prepare_open() up
    do_mq_open(): move all work prior to dentry_open() into a helper
    mqueue: fold mq_attr_ok() into mqueue_get_inode()
    move dentry_open() calls up into do_mq_open()
    mqueue: switch to vfs_mkobj(), quit abusing ->d_fsdata
    bpf_obj_do_pin(): switch to vfs_mkobj(), quit abusing ->mknod()
    new primitive: vfs_mkobj()

    Linus Torvalds
     
  • Pull poll annotations from Al Viro:
    "This introduces a __bitwise type for POLL### bitmap, and propagates
    the annotations through the tree. Most of that stuff is as simple as
    'make ->poll() instances return __poll_t and do the same to local
    variables used to hold the future return value'.

    Some of the obvious brainos found in process are fixed (e.g. POLLIN
    misspelled as POLL_IN). At that point the amount of sparse warnings is
    low and most of them are for genuine bugs - e.g. ->poll() instance
    deciding to return -EINVAL instead of a bitmap. I hadn't touched those
    in this series - it's large enough as it is.

    Another problem it has caught was eventpoll() ABI mess; select.c and
    eventpoll.c assumed that corresponding POLL### and EPOLL### were
    equal. That's true for some, but not all of them - EPOLL### are
    arch-independent, but POLL### are not.

    The last commit in this series separates userland POLL### values from
    the (now arch-independent) kernel-side ones, converting between them
    in the few places where they are copied to/from userland. AFAICS, this
    is the least disruptive fix preserving poll(2) ABI and making epoll()
    work on all architectures.

    As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
    it will trigger only on what would've triggered EPOLLWRBAND on other
    architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
    at all on sparc. With this patch they should work consistently on all
    architectures"

    * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
    make kernel-side POLL... arch-independent
    eventpoll: no need to mask the result of epi_item_poll() again
    eventpoll: constify struct epoll_event pointers
    debugging printk in sg_poll() uses %x to print POLL... bitmap
    annotate poll(2) guts
    9p: untangle ->poll() mess
    ->si_band gets POLL... bitmap stored into a user-visible long field
    ring_buffer_poll_wait() return value used as return value of ->poll()
    the rest of drivers/*: annotate ->poll() instances
    media: annotate ->poll() instances
    fs: annotate ->poll() instances
    ipc, kernel, mm: annotate ->poll() instances
    net: annotate ->poll() instances
    apparmor: annotate ->poll() instances
    tomoyo: annotate ->poll() instances
    sound: annotate ->poll() instances
    acpi: annotate ->poll() instances
    crypto: annotate ->poll() instances
    block: annotate ->poll() instances
    x86: annotate ->poll() instances
    ...

    Linus Torvalds
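
    The conversion between userland POLL### values and arch-independent
    kernel-side values mentioned above can be sketched as a table-driven bit
    translation. This is a model, not the kernel's implementation: the
    model_* names are hypothetical, and the userland numbers in the table are
    recalled for a sparc-like layout and should be treated as assumptions to
    be checked against the architecture headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One kernel-side bit paired with the userland bit it corresponds to. */
struct model_poll_pair {
    uint16_t kernel_bit;
    uint16_t user_bit;
};

/*
 * ASSUMPTION: userland values below model a sparc-like numbering and are
 * illustrative only. Kernel-side values follow the EPOLL-style numbering.
 */
static const struct model_poll_pair model_sparc_like_map[] = {
    { 0x0001, 0x0001 },  /* IN */
    { 0x0004, 0x0004 },  /* OUT */
    { 0x0100, 0x0004 },  /* WRNORM: aliased to OUT in this userland */
    { 0x0200, 0x0100 },  /* WRBAND */
    { 0x2000, 0x0800 },  /* RDHUP */
};

#define MODEL_MAP_LEN \
    (sizeof(model_sparc_like_map) / sizeof(model_sparc_like_map[0]))

/* Translate kernel-side bits to the userland numbering. */
static uint16_t model_poll_to_user(uint16_t kernel_bits)
{
    uint16_t out = 0;
    for (size_t i = 0; i < MODEL_MAP_LEN; i++)
        if (kernel_bits & model_sparc_like_map[i].kernel_bit)
            out |= model_sparc_like_map[i].user_bit;
    return out;
}

/* Translate userland bits to the kernel-side numbering. */
static uint16_t model_poll_to_kernel(uint16_t user_bits)
{
    uint16_t out = 0;
    for (size_t i = 0; i < MODEL_MAP_LEN; i++)
        if (user_bits & model_sparc_like_map[i].user_bit)
            out |= model_sparc_like_map[i].kernel_bit;
    return out;
}
```

    In this model the userland value 0x0100 means WRBAND, while the same
    number on the kernel side means WRNORM; treating the two numberings as
    interchangeable is the kind of confusion the pull message describes.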
     

13 Jan, 2018

1 commit

  • Call clear_siginfo to ensure stack allocated siginfos are fully
    initialized before being passed to the signal sending functions.

    This ensures that if there is the kind of confusion documented by
    TRAP_FIXME, FPE_FIXME, or BUS_FIXME the kernel won't send uninitialized
    data to userspace when the kernel generates a signal with SI_USER but
    the copy to userspace assumes it is a different kind of signal, and
    different fields are initialized.

    This also prepares the way for turning copy_siginfo_to_user
    into a copy_to_user, by removing the need in many cases to perform
    a field by field copy simply to skip the uninitialized fields.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
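
    The pattern described above (zero the whole structure first, then fill
    only the fields that the signal kind uses) can be shown with a userspace
    model. The struct layout and the model_* names are hypothetical stand-ins
    for siginfo, chosen only to make the union overlap visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for siginfo: a header plus a union of payloads. */
struct model_siginfo {
    int si_signo;
    int si_errno;
    int si_code;
    union {
        struct { int pid; unsigned int uid; } kill;  /* sender identity */
        struct { void *addr; } fault;                /* faulting address */
        char pad[112];                               /* full payload area */
    } fields;
};

/*
 * Build a "sent by a user" style siginfo. The memset plays the role that
 * clear_siginfo plays in the commit: every byte not set afterwards is zero
 * rather than leftover stack data.
 */
static void model_prepare_kill_siginfo(struct model_siginfo *info, int signo,
                                       int code, int pid, unsigned int uid)
{
    assert(info != NULL);
    memset(info, 0, sizeof(*info));
    info->si_signo = signo;
    info->si_code = code;
    info->fields.kill.pid = pid;
    info->fields.kill.uid = uid;
}
```

    Even if a later copy interprets the payload as a different union member,
    it can only see zeroes or the fields that were deliberately set.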
     

06 Jan, 2018

7 commits


28 Nov, 2017

2 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • This is a pure automated search-and-replace of the internal kernel
    superblock flags.

    The s_flags are now called SB_*, with the names and the values for the
    moment mirroring the MS_* flags that they're equivalent to.

    Note how the MS_xyz flags are the ones passed to the mount system call,
    while the SB_xyz flags are what we then use in sb->s_flags.

    The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
    include/linux/fs.h include/uapi/linux/bfs_fs.h \
    security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
    DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
    POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
    I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
    ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

    Requested-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
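
    The distinction drawn above between MS_* (what mount(2) receives) and
    SB_* (what ends up in sb->s_flags) can be sketched as a simple mask. The
    MS_* numbers are the userland values from <sys/mount.h>; the model_*
    names and the choice of which bits belong in the mask are assumptions
    made for illustration.

```c
#include <assert.h>

/* mount(2) flag values as found in <sys/mount.h>. */
#define MODEL_MS_RDONLY      1UL
#define MODEL_MS_NOSUID      2UL
#define MODEL_MS_SYNCHRONOUS 16UL
#define MODEL_MS_BIND        4096UL

/* Superblock flags mirror the MS_* values for now, per the commit message. */
#define MODEL_SB_RDONLY      MODEL_MS_RDONLY
#define MODEL_SB_SYNCHRONOUS MODEL_MS_SYNCHRONOUS

/* ASSUMPTION: an illustrative subset of the bits kept in s_flags. */
#define MODEL_SB_MASK (MODEL_SB_RDONLY | MODEL_SB_SYNCHRONOUS)

/* Keep only the bits that describe the superblock itself. */
static unsigned long model_sb_flags_from_mount(unsigned long ms_flags)
{
    return ms_flags & MODEL_SB_MASK;
}
```

    Because the values still coincide, the translation is a plain mask; the
    separate names exist so the two flag spaces can diverge later.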
     

04 Sep, 2017

1 commit

  • struct timespec is not y2038 safe. Replace
    all uses of timespec by y2038 safe struct timespec64.

    Even though timespec is used here to represent timeouts,
    replace these with timespec64 to facilitate verification
    by creating a y2038 safe kernel image that is free of
    timespec.

    The syscall interfaces themselves are not changed as part
    of the patch. They will be part of a different series.

    Signed-off-by: Deepa Dinamani
    Cc: Paul Moore
    Cc: Richard Guy Briggs
    Reviewed-by: Richard Guy Briggs
    Reviewed-by: Arnd Bergmann
    Acked-by: Paul Moore
    Signed-off-by: Al Viro

    Deepa Dinamani
     

10 Jul, 2017

1 commit

  • The retry logic for netlink_attachskb() inside sys_mq_notify()
    is nasty and vulnerable:

    1) The sock refcnt is already released when retry is needed
    2) The fd is controllable by user-space because we already
    release the file refcnt

    so when we retry, but the fd has just been closed by user-space
    during this small window, we end up calling netlink_detachskb()
    on the error path, which releases the sock again. Later, when
    user-space closes this socket, a use-after-free could be
    triggered.

    Setting 'sock' to NULL here should be sufficient to fix it.

    Reported-by: GeneBlue
    Signed-off-by: Cong Wang
    Cc: Andrew Morton
    Cc: Manfred Spraul
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Cong Wang
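
    The reference imbalance described above can be traced with a small
    counting model. It is a simplification under assumptions, not the kernel
    code path: the model_* names are hypothetical, the socket starts with one
    reference held elsewhere by user space, and the second lookup is assumed
    to fail because the fd was closed in the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_sock {
    int refcnt;
};

static void model_sock_hold(struct model_sock *s)
{
    s->refcnt++;
}

static void model_sock_put(struct model_sock *s)
{
    assert(s->refcnt > 0);
    s->refcnt--;
}

/*
 * Walk the retry path once and return the socket's final reference count.
 * A result of 1 means the reference held elsewhere is intact; 0 means the
 * error path dropped a reference it did not own.
 */
static int model_notify_retry(bool clear_sock_on_retry)
{
    struct model_sock the_sock = { .refcnt = 1 };  /* held elsewhere */
    struct model_sock *sock;

    /* First pass: look the socket up from the fd, taking a reference. */
    sock = &the_sock;
    model_sock_hold(sock);

    /* The attach step asks for a retry and has already dropped our ref. */
    model_sock_put(sock);
    if (clear_sock_on_retry)
        sock = NULL;            /* the fix: forget the stale pointer */

    /* Second pass: the fd lookup fails, so we go to the error path. */
    if (sock != NULL)
        model_sock_put(sock);   /* extra put on a reference we do not hold */

    return the_sock.refcnt;
}
```

    Clearing the pointer before retrying means the error path has nothing
    to release, which is why a one-line change is enough in this model.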
     

05 Jul, 2017

1 commit


02 Mar, 2017

3 commits


28 Feb, 2017

1 commit


21 Nov, 2016

1 commit

  • Currently the wake_q data structure is defined by the WAKE_Q() macro.
    This macro, however, looks like a function doing something as "wake" is
    a verb. Even checkpatch.pl was confused as it reported warnings like

    WARNING: Missing a blank line after declarations
    #548: FILE: kernel/futex.c:3665:
    + int ret;
    + WAKE_Q(wake_q);

    This patch renames the WAKE_Q() macro to DEFINE_WAKE_Q() which clarifies
    what the macro is doing and eliminates the checkpatch.pl warnings.

    Signed-off-by: Waiman Long
    Acked-by: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1479401198-1765-1-git-send-email-longman@redhat.com
    [ Resolved conflict and added missing rename. ]
    Signed-off-by: Ingo Molnar

    Waiman Long
     

28 Sep, 2016

1 commit

  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_time() instead.

    CURRENT_TIME is also not y2038 safe.

    This is also in preparation for the patch that transitions
    vfs timestamps to use 64 bit time and hence make them
    y2038 safe. As part of the effort current_time() will be
    extended to do range checks. Hence, it is necessary for all
    file system timestamps to use current_time(). Also,
    current_time() will be transitioned along with vfs to be
    y2038 safe.

    Note that whenever a single call to current_time() is used
    to change timestamps in different inodes, it is because they
    share the same time granularity.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Acked-by: Felipe Balbi
    Acked-by: Steven Whitehouse
    Acked-by: Ryusuke Konishi
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Deepa Dinamani
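
    The granularity point above can be made concrete with a sketch of
    truncating a timestamp to a filesystem's time granularity. This is a
    userspace model with hypothetical model_* names; in the kernel the
    granularity comes from the superblock, whereas here it is a plain
    parameter in nanoseconds.

```c
#include <assert.h>

struct model_ts {
    long long tv_sec;
    long tv_nsec;     /* 0 .. 999999999 */
};

/*
 * Round the nanosecond part down to a multiple of gran_ns. A granularity of
 * 1 leaves the value unchanged; 1000000000 keeps whole seconds only.
 */
static struct model_ts model_trunc_to_gran(struct model_ts t, long gran_ns)
{
    assert(gran_ns > 0 && gran_ns <= 1000000000L);
    t.tv_nsec -= t.tv_nsec % gran_ns;
    return t;
}
```

    Two inodes on filesystems with the same granularity get identical
    results from one truncated timestamp, which is the case the commit
    message's note about sharing a single call refers to.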
     

24 Jun, 2016

3 commits

  • Introduce a function may_open_dev that tests MNT_NODEV and a new
    superblock flag SB_I_NODEV. Use this new function in all of the
    places where MNT_NODEV was previously tested.

    Add the new SB_I_NODEV s_iflag to proc, sysfs, and mqueuefs as those
    filesystems should never support device nodes, and a simple superblock
    flag makes that very hard to get wrong. With SB_I_NODEV set, if any
    device nodes somehow manage to show up on a filesystem, those
    device nodes will be unopenable.

    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
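
    The two-level check described above (a per-mount flag and a
    per-superblock flag, either of which blocks device nodes) can be modelled
    in a few lines. The struct layouts, the model_* names and the flag values
    are assumptions for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: illustrative flag values, not taken from kernel headers. */
#define MODEL_MNT_NODEV  0x02u
#define MODEL_SB_I_NODEV 0x04u

struct model_sb {
    unsigned int s_iflags;   /* internal superblock flags */
};

struct model_mnt {
    unsigned int mnt_flags;  /* per-mount flags */
    const struct model_sb *sb;
};

/* Device nodes may be opened only if neither level forbids it. */
static bool model_may_open_dev(const struct model_mnt *m)
{
    assert(m != NULL && m->sb != NULL);
    return !(m->mnt_flags & MODEL_MNT_NODEV) &&
           !(m->sb->s_iflags & MODEL_SB_I_NODEV);
}
```

    The superblock flag cannot be dropped by mounting without nodev, which
    is what makes it hard to get wrong for filesystems like mqueuefs.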
     
  • Set SB_I_NOEXEC on mqueuefs to ensure small implementation mistakes
    do not result in executable on mqueuefs by accident.

    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Today what is normally called data (the mount options) is not passed
    to fill_super through mount_ns.

    Pass the mount options and the namespace separately to mount_ns so
    that filesystems such as proc that have mount options, can use
    mount_ns.

    Pass the user namespace to mount_ns so that the standard permission
    check that verifies the mounter has permissions over the namespace can
    be performed in mount_ns instead of in each filesystem's .mount method,
    thus removing the duplication between mqueuefs and proc in terms of
    permission checks. The extra permission check does not currently
    affect the rpc_pipefs filesystem and the nfsd filesystem as those
    filesystems do not currently allow unprivileged mounts. Without
    unprivileged mounts it is guaranteed that the caller has already passed
    capable(CAP_SYS_ADMIN), which guarantees the extra permission check will
    pass.

    Update rpc_pipefs and the nfsd filesystem to ensure that the network
    namespace reference is always taken in fill_super and always put in kill_sb
    so that the logic is simpler and so that errors originating inside of
    fill_super do not cause a network namespace leak.

    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
    most filesystems overwrite the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to "account everything" approach and
    keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

07 Aug, 2015

1 commit

  • A while back, the message queue implementation in the kernel was
    improved to use btrees to speed up retrieval of messages, in commit
    d6629859b36d ("ipc/mqueue: improve performance of send/recv").

    That patch introducing the improved kernel handling of message queues
    (using btrees) has, as a by-product, changed the meaning of the QSIZE
    field in the pseudo-file created for the queue. Before, this field
    reflected the size of the user-data in the queue. Since then, it also
    takes kernel data structures into account. For example, if 13 bytes of
    user data are in the queue, on my machine the file reports a size of 61
    bytes.

    There was some discussion on this topic before (for example
    https://lkml.org/lkml/2014/10/1/115). Commenting on a thread on lkml, Michael
    Kerrisk gave the following background
    (https://lkml.org/lkml/2015/6/16/74):

    The pseudofiles in the mqueue filesystem (usually mounted at
    /dev/mqueue) expose fields with metadata describing a message
    queue. One of these fields, QSIZE, as originally implemented,
    showed the total number of bytes of user data in all messages in
    the message queue, and this feature was documented from the
    beginning in the mq_overview(7) page. In 3.5, some other (useful)
    work happened to break the user-space API in a couple of places,
    including the value exposed via QSIZE, which now includes a measure
    of kernel overhead bytes for the queue, a figure that renders QSIZE
    useless for its original purpose, since there's no way to deduce
    the number of overhead bytes consumed by the implementation.
    (The other user-space breakage was subsequently fixed.)

    This patch removes the accounting of kernel data structures in the
    queue. Reporting the size of these data-structures in the QSIZE field
    was a breaking change (see Michael's comment above). Without the QSIZE
    field reporting the total size of user-data in the queue, there is no
    way to deduce this number.

    It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
    against the worst-case size of the queue (in both the old and the new
    implementation). Therefore, the kernel overhead accounting in QSIZE is
    not necessary to help the user understand the limitations RLIMIT imposes
    on the processes.

    Signed-off-by: Marcus Gelderie
    Acked-by: Doug Ledford
    Acked-by: Michael Kerrisk
    Acked-by: Davidlohr Bueso
    Cc: David Howells
    Cc: Alexander Viro
    Cc: John Duffy
    Cc: Arto Bendiken
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcus Gelderie
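
    For readers who consume the QSIZE field discussed above, a minimal
    userspace sketch of extracting it from the first line of a
    /dev/mqueue/<name> pseudo-file follows. The function name is
    hypothetical, and the sketch assumes the line starts with the QSIZE
    field, as in the layout shown in mq_overview(7).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse the leading "QSIZE:<n>" field of a queue's pseudo-file line.
 * Returns 0 and stores the value on success, -1 if the field is absent.
 */
static int model_parse_qsize(const char *line, unsigned long *qsize)
{
    assert(line != NULL && qsize != NULL);
    return sscanf(line, "QSIZE:%lu", qsize) == 1 ? 0 : -1;
}
```

    With the accounting change in this commit, the parsed number is again
    the total of user data bytes, so for the 13-byte example in the message
    a reader would expect 13 rather than 61.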
     

08 May, 2015

1 commit

  • This patch moves the wake_up_process() invocation so it is not done under
    the info->lock by making use of a lockless wake_q. With this change, the
    waiter is woken up once it is STATE_READY and it does not need to loop
    on SMP if it is still in STATE_PENDING. In the timeout case we still need
    to grab the info->lock to verify the state.

    This change should also avoid the introduction of preempt_disable() in -rt
    which avoids a busy-loop which polls for the STATE_PENDING -> STATE_READY
    change if the waiter has a higher priority compared to the waker.

    Additionally, this patch micro-optimizes wq_sleep by using the cheaper
    cousin of set_current_state(TASK_INTERRUPTIBLE) as we will block no
    matter what, thus getting rid of the implied barrier.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: George Spelvin
    Acked-by: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Mason
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Manfred Spraul
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: dave@stgolabs.net
    Link: http://lkml.kernel.org/r/1430748166.1940.17.camel@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
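
    The idea above (record who needs waking while the lock is held, perform
    the wakeups only after it is dropped) can be shown with a
    single-threaded model. This is a sketch under assumptions rather than the
    kernel's wake_q: all model_* names are hypothetical, a boolean stands in
    for the lock, and a counter stands in for the actual wakeup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_WAKE_Q_MAX 4

/* Local list of tasks to wake, filled while the lock is held. */
struct model_wake_q {
    int ids[MODEL_WAKE_Q_MAX];
    size_t len;
};

struct model_queue {
    bool locked;           /* stands in for the queue's lock */
    int woken;             /* total wakeups performed */
    int woken_under_lock;  /* wakeups that happened with the lock held */
};

/* Only record the task; nothing is woken here. */
static void model_wake_q_add(struct model_wake_q *wq, int task_id)
{
    assert(wq != NULL && wq->len < MODEL_WAKE_Q_MAX);
    wq->ids[wq->len++] = task_id;
}

/* Perform the deferred wakeups and empty the list. */
static void model_wake_up_q(struct model_queue *q, struct model_wake_q *wq)
{
    assert(q != NULL && wq != NULL);
    for (size_t i = 0; i < wq->len; i++) {
        if (q->locked)
            q->woken_under_lock++;
        q->woken++;
    }
    wq->len = 0;
}

/* A send path that hands a message to one waiter. */
static void model_send(struct model_queue *q, int waiter_id)
{
    struct model_wake_q wq = { .len = 0 };

    q->locked = true;                  /* take the lock */
    model_wake_q_add(&wq, waiter_id);  /* mark the waiter, do not wake yet */
    q->locked = false;                 /* drop the lock */
    model_wake_up_q(q, &wq);           /* wake outside the critical section */
}
```

    Keeping the wakeup out of the critical section shortens the time the
    lock is held, and the woken task does not immediately contend for it.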
     

16 Apr, 2015

1 commit


20 Nov, 2014

1 commit

  • ... for situations when we don't have any candidate in pathnames - basically,
    in descriptor-based syscalls.

    [Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang]

    Signed-off-by: Al Viro

    Al Viro
     

08 Apr, 2014

1 commit


26 Feb, 2014

1 commit

  • Commit 93e6f119c0ce ("ipc/mqueue: cleanup definition names and
    locations") added global hardcoded limits to the amount of message
    queues that can be created. While these limits are per-namespace,
    reality is that it ends up breaking userspace applications.
    Historically users have, at least in theory, been able to create up to
    INT_MAX queues, and limiting it to just 1024 is way too low and dramatic
    for some workloads and use cases. For instance, Madars reports:

    "This update imposes bad limits on our multi-process application. As
    our app uses approaches that each process opens its own set of queues
    (usually something about 3-5 queues per process). In some scenarios
    we might run up to 3000 processes or more (which of-course for linux
    is not a problem). Thus we might need up to 9000 queues or more. All
    processes run under one user."

    Other affected users can be found in launchpad bug #1155695:
    https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695

    Instead of increasing this limit, revert it entirely and fall back to the
    original way of dealing with queue limits -- where once a user's resource
    limit is reached, and all memory is used, new queues cannot be created.

    Signed-off-by: Davidlohr Bueso
    Reported-by: Madars Vitolins
    Acked-by: Doug Ledford
    Cc: Manfred Spraul
    Cc: [3.5+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

28 Jan, 2014

2 commits

  • Deal with checkpatch messages:
    WARNING: braces {} are not necessary for single statement blocks

    Signed-off-by: Davidlohr Bueso
    Cc: Aswin Chandramouleeswaran
    Cc: Rik van Riel
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The ipc code does not adhere to the typical Linux coding style.
    This patch fixes lots of simple whitespace errors.

    - mostly autogenerated by
    scripts/checkpatch.pl -f --fix \
    --types=pointer_location,spacing,space_before_tab
    - one manual fixup (keep structure members tab-aligned)
    - removal of additional space_before_tab that were not found by --fix

    Tested with some of my msg and sem test apps.

    Andrew: Could you include it in -mm and move it towards Linus' tree?

    Signed-off-by: Manfred Spraul
    Suggested-by: Li Bin
    Cc: Joe Perches
    Acked-by: Rafael Aquini
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul