09 Jun, 2020

1 commit

  • Sparse reports a warning at freeque()

    warning: context imbalance in freeque() - unexpected unlock

    The root cause is the missing annotation at freeque()

    Add the missing __releases(RCU) annotation
    Add the missing __releases(&msq->q_perm) annotation

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Boqun Feng
    Cc: Lu Shuaibing
    Cc: Nathan Chancellor
    Cc: Manfred Spraul
    Cc: Davidlohr Bueso
    Link: http://lkml.kernel.org/r/20200403160505.2832-2-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     

04 Feb, 2020

2 commits

  • A use of uninitialized memory in msgctl_down() because msqid64 in
    ksys_msgctl hasn't been initialized. The local | msqid64 | is created in
    ksys_msgctl() and then passed into msgctl_down(). Along the way msqid64
    is never initialized before msgctl_down() checks msqid64->msg_qbytes.

    KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool)
    reports:

    ==================================================================
    BUG: KUMSAN: use of uninitialized memory in msgctl_down+0x94/0x300
    Read of size 8 at addr ffff88806bb97eb8 by task syz-executor707/2022

    CPU: 0 PID: 2022 Comm: syz-executor707 Not tainted 5.2.0-rc4+ #63
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
    dump_stack+0x75/0xae
    __kumsan_report+0x17c/0x3e6
    kumsan_report+0xe/0x20
    msgctl_down+0x94/0x300
    ksys_msgctl.constprop.14+0xef/0x260
    do_syscall_64+0x7e/0x1f0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x4400e9
    Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007ffd869e0598 EFLAGS: 00000246 ORIG_RAX: 0000000000000047
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004400e9
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
    R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401970
    R13: 0000000000401a00 R14: 0000000000000000 R15: 0000000000000000

    The buggy address belongs to the page:
    page:ffffea0001aee5c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
    flags: 0x100000000000000()
    raw: 0100000000000000 0000000000000000 ffffffff01ae0101 0000000000000000
    raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
    page dumped because: kumsan: bad access detected
    ==================================================================

    Syzkaller reproducer:
    msgctl$IPC_RMID(0x0, 0x0)

    C reproducer:
    // autogenerated by syzkaller (https://github.com/google/syzkaller)

    int main(void)
    {
    syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
    syscall(__NR_msgctl, 0, 0, 0);
    return 0;
    }

    [natechancellor@gmail.com: adjust indentation in ksys_msgctl]
    Link: https://github.com/ClangBuiltLinux/linux/issues/829
    Link: http://lkml.kernel.org/r/20191218032932.37479-1-natechancellor@gmail.com
    Link: http://lkml.kernel.org/r/20190613014044.24234-1-shuaibinglu@126.com
    Signed-off-by: Lu Shuaibing
    Signed-off-by: Nathan Chancellor
    Suggested-by: Arnd Bergmann
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: NeilBrown
    From: Andrew Morton
    Subject: drivers/block/null_blk_main.c: fix layout

    Each line here overflows 80 cols by exactly one character. Delete one tab
    per line to fix.

    Cc: Shaohua Li
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lu Shuaibing
     
  • Transfer findings from ipc/mqueue.c:

    - A control barrier was missing for the lockless receive case So in
    theory, not yet initialized data may have been copied to user space -
    obviously only for architectures where control barriers are not NOP.

    - use smp_store_release(). In theory, the refount may have been
    decreased to 0 already when wake_q_add() tries to get a reference.

    Link: http://lkml.kernel.org/r/20191020123305.14715-5-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Waiman Long
    Cc: Davidlohr Bueso
    Cc:
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

26 Jan, 2019

1 commit

  • The behavior of these system calls is slightly different between
    architectures, as determined by the CONFIG_ARCH_WANT_IPC_PARSE_VERSION
    symbol. Most architectures that implement the split IPC syscalls don't set
    that symbol and only get the modern version, but alpha, arm, microblaze,
    mips-n32, mips-n64 and xtensa expect the caller to pass the IPC_64 flag.

    For the architectures that so far only implement sys_ipc(), i.e. m68k,
    mips-o32, powerpc, s390, sh, sparc, and x86-32, we want the new behavior
    when adding the split syscalls, so we need to distinguish between the
    two groups of architectures.

    The method I picked for this distinction is to have a separate system call
    entry point: sys_old_*ctl() now uses ipc_parse_version, while sys_*ctl()
    does not. The system call tables of the five architectures are changed
    accordingly.

    As an additional benefit, we no longer need the configuration specific
    definition for ipc_parse_version(), it always does the same thing now,
    but simply won't get called on architectures with the modern interface.

    A small downside is that on architectures that do set
    ARCH_WANT_IPC_PARSE_VERSION, we now have an extra set of entry points
    that are never called. They only add a few bytes of bloat, so it seems
    better to keep them compared to adding yet another Kconfig symbol.
    I considered adding new syscall numbers for the IPC_64 variants for
    consistency, but decided against that for now.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

27 Aug, 2018

1 commit

  • Christoph Hellwig suggested a slightly different path for handling
    backwards compatibility with the 32-bit time_t based system calls:

    Rather than simply reusing the compat_sys_* entry points on 32-bit
    architectures unchanged, we get rid of those entry points and the
    compat_time types by renaming them to something that makes more sense
    on 32-bit architectures (which don't have a compat mode otherwise),
    and then share the entry points under the new name with the 64-bit
    architectures that use them for implementing the compatibility.

    The following types and interfaces are renamed here, and moved
    from linux/compat_time.h to linux/time32.h:

    old new
    --- ---
    compat_time_t old_time32_t
    struct compat_timeval struct old_timeval32
    struct compat_timespec struct old_timespec32
    struct compat_itimerspec struct old_itimerspec32
    ns_to_compat_timeval() ns_to_old_timeval32()
    get_compat_itimerspec64() get_old_itimerspec32()
    put_compat_itimerspec64() put_old_itimerspec32()
    compat_get_timespec64() get_old_timespec32()
    compat_put_timespec64() put_old_timespec32()

    As we already have aliases in place, this patch addresses only the
    instances that are relevant to the system call interface in particular,
    not those that occur in device drivers and other modules. Those
    will get handled separately, while providing the 64-bit version
    of the respective interfaces.

    I'm not renaming the timex, rusage and itimerval structures, as we are
    still debating what the new interface will look like, and whether we
    will need a replacement at all.

    This also doesn't change the names of the syscall entry points, which can
    be done more easily when we actually switch over the 32-bit architectures
    to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to
    SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix.

    Suggested-by: Christoph Hellwig
    Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

23 Aug, 2018

5 commits

  • The varable names got a mess, thus standardize them again:

    id: user space id. Called semid, shmid, msgid if the type is known.
    Most functions use "id" already.
    idx: "index" for the idr lookup
    Right now, some functions use lid, ipc_addid() already uses idx as
    the variable name.
    seq: sequence number, to avoid quick collisions of the user space id
    key: user space key, used for the rhash tree

    Link: http://lkml.kernel.org/r/20180712185241.4017-12-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Now that we know that rhashtable_init() will not fail, we can get rid of a
    lot of the unnecessary cleanup paths when the call errored out.

    [manfred@colorfullife.com: variable name added to util.h to resolve checkpatch warning]
    Link: http://lkml.kernel.org/r/20180712185241.4017-11-manfred@colorfullife.com
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Both the comment and the name of ipcctl_pre_down_nolock() are misleading:
    The function must be called while holdling the rw semaphore.

    Therefore the patch renames the function to ipcctl_obtain_check(): This
    name matches the other names used in util.c:

    - "obtain" function look up a pointer in the idr, without
    acquiring the object lock.
    - The caller is responsible for locking.
    - _check means that the sequence number is checked.

    Link: http://lkml.kernel.org/r/20180712185241.4017-5-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Reviewed-by: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • ipc_addid() is impossible to use:
    - for certain failures, the caller must not use ipc_rcu_putref(),
    because the reference counter is not yet initialized.
    - for other failures, the caller must use ipc_rcu_putref(),
    because parallel operations could be ongoing already.

    The patch cleans that up, by initializing the refcount early, and by
    modifying all callers.

    The issues is related to the finding of
    syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an
    issue with reading kern_ipc_perm.seq, here both read and write to already
    released memory could happen.

    Link: http://lkml.kernel.org/r/20180712185241.4017-4-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Cc: Herbert Xu
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • ipc_addid() initializes kern_ipc_perm.id after having called
    ipc_idr_alloc().

    Thus a parallel semctl() or msgctl() that uses e.g. MSG_STAT may use this
    unitialized value as the return code.

    The patch moves all accesses to kern_ipc_perm.id under the spin_lock().

    The issues is related to the finding of
    syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an
    issue with kern_ipc_perm.seq

    Link: http://lkml.kernel.org/r/20180712185241.4017-2-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Reviewed-by: Davidlohr Bueso
    Cc: Dmitry Vyukov
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Herbert Xu
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

22 Jun, 2018

1 commit

  • Due to the use of rhashtables in net namespaces,
    rhashtable.h is included in lots of the kernel,
    so a small changes can required a large recompilation.
    This makes development painful.

    This patch splits out rhashtable-types.h which just includes
    the major type declarations, and does not include (non-trivial)
    inline code. rhashtable.h is no longer included by anything
    in the include/ directory.
    Common include files only include rhashtable-types.h so a large
    recompilation is only triggered when that changes.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     

20 Apr, 2018

2 commits

  • The shmid64_ds/semid64_ds/msqid64_ds data structures have been extended
    to contain extra fields for storing the upper bits of the time stamps,
    this patch does the other half of the job and and fills the new fields on
    32-bit architectures as well as 32-bit tasks running on a 64-bit kernel
    in compat mode.

    There should be no change for native 64-bit tasks.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • In some places, we still used get_seconds() instead of
    ktime_get_real_seconds(), and I'm changing the remaining ones now to
    all use ktime_get_real_seconds() so we use the full available range for
    timestamps instead of overflowing the 'unsigned long' return value in
    year 2106 on 32-bit kernels.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

12 Apr, 2018

1 commit

  • There is a permission discrepancy when consulting msq ipc object
    metadata between /proc/sysvipc/msg (0444) and the MSG_STAT shmctl
    command. The later does permission checks for the object vs S_IRUGO.
    As such there can be cases where EACCESS is returned via syscall but the
    info is displayed anyways in the procfs files.

    While this might have security implications via info leaking (albeit no
    writing to the msq metadata), this behavior goes way back and showing
    all the objects regardless of the permissions was most likely an
    overlook - so we are stuck with it. Furthermore, modifying either the
    syscall or the procfs file can cause userspace programs to break (ie
    ipcs). Some applications require getting the procfs info (without root
    privileges) and can be rather slow in comparison with a syscall -- up to
    500x in some reported cases for shm.

    This patch introduces a new MSG_STAT_ANY command such that the msq ipc
    object permissions are ignored, and only audited instead. In addition,
    I've left the lsm security hook checks in place, as if some policy can
    block the call, then the user has no other choice than just parsing the
    procfs file.

    Link: http://lkml.kernel.org/r/20180215162458.10059-4-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Reported-by: Robert Kettler
    Cc: Eric W. Biederman
    Cc: Kees Cook
    Cc: Manfred Spraul
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

04 Apr, 2018

1 commit

  • Pull namespace updates from Eric Biederman:
    "There was a lot of work this cycle fixing bugs that were discovered
    after the merge window and getting everything ready where we can
    reasonably support fully unprivileged fuse. The bug fixes you already
    have and much of the unprivileged fuse work is coming in via other
    trees.

    Still left for fully unprivileged fuse is figuring out how to cleanly
    handle .set_acl and .get_acl in the legacy case, and properly handling
    of evm xattrs on unprivileged mounts.

    Included in the tree is a cleanup from Alexely that replaced a linked
    list with a statically allocated fix sized array for the pid caches,
    which simplifies and speeds things up.

    Then there is are some cleanups and fixes for the ipc namespace. The
    motivation was that in reviewing other code it was discovered that
    access ipc objects from different pid namespaces recorded pids in such
    a way that when asked the wrong pids were returned. In the worst case
    there has been a measured 30% performance impact for sysvipc
    semaphores. Other test cases showed no measurable performance impact.
    Manfred Spraul and Davidlohr Bueso who tend to work on sysvipc
    performance both gave the nod that this is good enough.

    Casey Schaufler and James Morris have given their approval to the LSM
    side of the changes.

    I simplified the types and the code dealing with sysvipc to pass just
    kern_ipc_perm for all three types of ipc. Which reduced the header
    dependencies throughout the kernel and simplified the lsm code.

    Which let me work on the pid fixes without having to worry about
    trivial changes causing complete kernel recompiles"

    * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    ipc/shm: Fix pid freeing.
    ipc/shm: fix up for struct file no longer being available in shm.h
    ipc/smack: Tidy up from the change in type of the ipc security hooks
    ipc: Directly call the security hook in ipc_ops.associate
    ipc/sem: Fix semctl(..., GETPID, ...) between pid namespaces
    ipc/msg: Fix msgctl(..., IPC_STAT, ...) between pid namespaces
    ipc/shm: Fix shmctl(..., IPC_STAT, ...) between pid namespaces.
    ipc/util: Helpers for making the sysvipc operations pid namespace aware
    ipc: Move IPCMNI from include/ipc.h into ipc/util.h
    msg: Move struct msg_queue into ipc/msg.c
    shm: Move struct shmid_kernel into ipc/shm.c
    sem: Move struct sem and struct sem_array into ipc/sem.c
    msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue security hooks
    shm/security: Pass kern_ipc_perm not shmid_kernel into the shm security hooks
    sem/security: Pass kern_ipc_perm not sem_array into the sem security hooks
    pidns: simpler allocation of pid_* caches

    Linus Torvalds
     

03 Apr, 2018

4 commits

  • Provide ksys_msgsnd() and compat_ksys_msgsnd() wrappers to avoid in-kernel
    calls to these syscalls. The ksys_ prefix denotes that these functions are
    meant as a drop-in replacement for the syscalls. In particular, they use
    the same calling convention as sys_msgsnd() and compat_sys_msgsnd().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Provide ksys_msgrcv() and compat_ksys_msgrcv() wrappers to avoid in-kernel
    calls to these syscalls. The ksys_ prefix denotes that these functions are
    meant as a drop-in replacement for the syscalls. In particular, they use
    the same calling convention as sys_msgrcv() and compat_sys_msgrcv().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Provide ksys_msgctl() and compat_ksys_msgctl() wrappers to avoid in-kernel
    calls to these syscalls. The ksys_ prefix denotes that these functions are
    meant as a drop-in replacement for the syscalls. In particular, they use
    the same calling convention as sys_msgctl() and compat_sys_msgctl().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Provide ksys_msgget() wrapper to avoid in-kernel calls to this syscall.
    The ksys_ prefix denotes that this function is meant as a drop-in
    replacement for the syscall. In particular, it uses the same calling
    convention as sys_msgget().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     

28 Mar, 2018

2 commits

  • After the last round of cleanups the shm, sem, and msg associate
    operations just became trivial wrappers around the appropriate security
    method. Simplify things further by just calling the security method
    directly.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Today msg_lspid and msg_lrpid are remembered in the pid namespace of
    the creator and the processes that last send or received a sysvipc
    message. If you have processes in multiple pid namespaces that is
    just wrong. The process ids reported will not make the least bit of
    sense.

    This fix is slightly more susceptible to a performance problem than
    the related fix for System V shared memory. By definition the pids
    are updated by msgsnd and msgrcv, the fast path of System V message
    queues. The only concern over the previous implementation is the
    incrementing and decrementing of the pid reference count. As that is
    the only difference and multiple updates by of the task_tgid by
    threads in the same process have been shown in af_unix sockets to
    create a cache line ping-pong between cpus of the same processor.

    In this case I don't expect cache lines holding pid reference counts
    to ping pong between cpus. As senders and receivers update different
    pids there is a natural separation there. Further if multiple threads
    of the same process either send or receive messages the pid will be
    updated to the same value and ipc_update_pid will avoid the reference
    count update.

    Which means in the common case I expect msg_lspid and msg_lrpid to
    remain constant, and reference counts not to be updated when messages
    are sent.

    In rare cases it may be possible to trigger the issue which was
    observed for af_unix sockets, but it will require multiple processes
    with multiple threads to be either sending or receiving messages. It
    just does not feel likely that anyone would do that in practice.

    This change updates msgctl(..., IPC_STAT, ...) to return msg_lspid and
    msg_lrpid in the pid namespace of the process calling stat.

    This change also updates cat /proc/sysvipc/msg to return print msg_lspid
    and msg_lrpid in the pid namespace of the process that opened the proc
    file.

    Fixes: b488893a390e ("pid namespaces: changes to show virtual ids to user")
    Reviewed-by: Nagarathnam Muthusamy
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

25 Mar, 2018

1 commit


23 Mar, 2018

1 commit


07 Feb, 2018

1 commit

  • As described in the title, this patch fixes id_ds inconsistency when
    ctl_stat executes concurrently with some ds-changing function, e.g.
    shmat, msgsnd or whatever.

    For instance, if shmctl(IPC_STAT) is running concurrently
    with shmat, following data structure can be returned:
    {... shm_lpid = 0, shm_nattch = 1, ...}

    Link: http://lkml.kernel.org/r/20171202153456.6514-1-philippe.mikoyan@skat.systems
    Signed-off-by: Philippe Mikoyan
    Reviewed-by: Davidlohr Bueso
    Cc: Al Viro
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philippe Mikoyan
     

18 Nov, 2017

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted stuff, really no common topic here"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: grab the lock instead of blocking in __fd_install during resizing
    vfs: stop clearing close on exec when closing a fd
    include/linux/fs.h: fix comment about struct address_space
    fs: make fiemap work from compat_ioctl
    coda: fix 'kernel memory exposure attempt' in fsync
    pstore: remove unneeded unlikely()
    vfs: remove unneeded unlikely()
    stubs for mount_bdev() and kill_block_super() in !CONFIG_BLOCK case
    make vfs_ustat() static
    do_handle_open() should be static
    elf_fdpic: fix unused variable warning
    fold destroy_super() into __put_super()
    new helper: destroy_unused_super()
    fix address space warnings in ipc/
    acct.h: get rid of detritus

    Linus Torvalds
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

12 Oct, 2017

1 commit


15 Sep, 2017

1 commit

  • Pull ipc compat cleanup and 64-bit time_t from Al Viro:
    "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
    Dinamani"

    * 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    utimes: Make utimes y2038 safe
    ipc: shm: Make shmid_kernel timestamps y2038 safe
    ipc: sem: Make sem_array timestamps y2038 safe
    ipc: msg: Make msg_queue timestamps y2038 safe
    ipc: mqueue: Replace timespec with timespec64
    ipc: Make sys_semtimedop() y2038 safe
    get rid of SYSVIPC_COMPAT on ia64
    semtimedop(): move compat to native
    shmat(2): move compat to native
    msgrcv(2), msgsnd(2): move compat to native
    ipc(2): move compat to native
    ipc: make use of compat ipc_perm helpers
    semctl(): move compat to native
    semctl(): separate all layout-dependent copyin/copyout
    msgctl(): move compat to native
    msgctl(): split the actual work from copyin/copyout
    ipc: move compat shmctl to native
    shmctl: split the work from copyin/copyout

    Linus Torvalds
     

09 Sep, 2017

1 commit

  • ipc_findkey() used to scan all objects to look for the wanted key. This
    is slow when using a high number of keys. This change adds an rhashtable
    of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n).

    This change gives a 865% improvement of benchmark reaim.jobs_per_min on a
    56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1]

    Other (more micro) benchmark results, by the author: On an i5 laptop, the
    following loop executed right after a reboot took, without and with this
    change:

    for (int i = 0, k=0x424242; i < KEYS; ++i)
    semget(k++, 1, IPC_CREAT | 0600);

    total total max single max single
    KEYS without with call without call with

    1 3.5 4.9 µs 3.5 4.9
    10 7.6 8.6 µs 3.7 4.7
    32 16.2 15.9 µs 4.3 5.3
    100 72.9 41.8 µs 3.7 4.7
    1000 5,630.0 502.0 µs * *
    10000 1,340,000.0 7,240.0 µs * *
    31900 17,600,000.0 22,200.0 µs * *

    *: unreliable measure: high variance

    The duration for a lookup-only usage was obtained by the same loop once
    the keys are present:

    total total max single max single
    KEYS without with call without call with

    1 2.1 2.5 µs 2.1 2.5
    10 4.5 4.8 µs 2.2 2.3
    32 13.0 10.8 µs 2.3 2.8
    100 82.9 25.1 µs * 2.3
    1000 5,780.0 217.0 µs * *
    10000 1,470,000.0 2,520.0 µs * *
    31900 17,400,000.0 7,810.0 µs * *

    Finally, executing each semget() in a new process gave, when still
    summing only the durations of these syscalls:

    creation:
    total total
    KEYS without with

    1 3.7 5.0 µs
    10 32.9 36.7 µs
    32 125.0 109.0 µs
    100 523.0 353.0 µs
    1000 20,300.0 3,280.0 µs
    10000 2,470,000.0 46,700.0 µs
    31900 27,800,000.0 219,000.0 µs

    lookup-only:
    total total
    KEYS without with

    1 2.5 2.7 µs
    10 25.4 24.4 µs
    32 106.0 72.6 µs
    100 591.0 352.0 µs
    1000 22,400.0 2,250.0 µs
    10000 2,510,000.0 25,700.0 µs
    31900 28,200,000.0 115,000.0 µs

    [1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop

    Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix
    Signed-off-by: Guillaume Knispel
    Reviewed-by: Marc Pardo
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Cc: Manfred Spraul
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: "Peter Zijlstra (Intel)"
    Cc: Ingo Molnar
    Cc: Sebastian Andrzej Siewior
    Cc: Serge Hallyn
    Cc: Andrey Vagin
    Cc: Guillaume Knispel
    Cc: Marc Pardo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillaume Knispel
     

04 Sep, 2017

1 commit

  • time_t is not y2038 safe. Replace all uses of
    time_t by y2038 safe time64_t.

    Similarly, replace the calls to get_seconds() with
    y2038 safe ktime_get_real_seconds().
    Note that this preserves fast access on 64 bit systems,
    but 32 bit systems need sequence counters.

    The syscall interfaces themselves are not changed as part of
    the patch. They will be part of a different series.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Al Viro

    Deepa Dinamani
     

03 Aug, 2017

1 commit

  • When building with the randstruct gcc plugin, the layout of the IPC
    structs will be randomized, which requires any sub-structure accesses to
    use container_of(). The proc display handlers were missing the needed
    container_of()s since the iterator is passing in the top-level struct
    kern_ipc_perm.

    This would lead to crashes when running the "lsipc" program after the
    system had IPC registered (e.g. after starting up Gnome):

    general protection fault: 0000 [#1] PREEMPT SMP
    ...
    RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0
    ...
    Call Trace:
    sysvipc_shm_proc_show+0x5e/0x150
    sysvipc_proc_show+0x1a/0x30
    seq_read+0x2e9/0x3f0
    ...

    Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast
    Fixes: 3859a271a003 ("randstruct: Mark various structs for randomization")
    Signed-off-by: Kees Cook
    Reported-by: Dominik Brodowski
    Acked-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

16 Jul, 2017

4 commits


13 Jul, 2017

5 commits

  • There is nothing special about the msg_alloc/free routines any more, so
    remove them to make code more readable.

    [manfred@colorfullife.com: Rediff to keep rcu protection for security_msg_queue_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-19-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Only after ipc_addid() has succeeded will refcounting be used, so move
    initialization into ipc_addid() and remove from open-coded *_alloc()
    routines.

    Link: http://lkml.kernel.org/r/20170525185107.12869-17-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Loosely based on a patch from Kees Cook :
    - id and retval can be merged
    - if ipc_addid() fails, then use call_rcu() directly.

    The difference is that call_rcu is used for failed ipc_addid() calls, to
    continue to guaranteed an rcu delay for security_msg_queue_free().

    Link: http://lkml.kernel.org/r/20170525185107.12869-16-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Instead of using ipc_rcu_alloc() which only performs the refcount bump,
    open code it. This also allows for msg_queue structure layout to be
    randomized in the future.

    Link: http://lkml.kernel.org/r/20170525185107.12869-12-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Avoid using ipc_rcu_free, since it just re-finds the original structure
    pointer. For the pre-list-init failure path, there is no RCU needed,
    since it was just allocated. It can be directly freed.

    Link: http://lkml.kernel.org/r/20170525185107.12869-8-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook