13 Mar, 2019

1 commit

  • Pull vfs mount infrastructure updates from Al Viro:
    "The rest of core infrastructure; no new syscalls in that pile, but the
    old parts are switched to new infrastructure. At that point
    conversions of individual filesystems can happen independently; some
    are done here (afs, cgroup, procfs, etc.), there's also a large series
    outside of that pile dealing with NFS (quite a bit of option-parsing
    stuff is getting used there - it's one of the most convoluted
    filesystems in terms of mount-related logics), but NFS bits are the
    next cycle fodder.

    It got seriously simplified since the last cycle; documentation is
    probably the weakest bit at the moment - I considered dropping the
    commit introducing Documentation/filesystems/mount_api.txt (cutting
    the size increase by quarter ;-), but decided that it would be better
    to fix it up after -rc1 instead.

    That pile allows to do followup work in independent branches, which
    should make life much easier for the next cycle. fs/super.c size
    increase is unpleasant; there's a followup series that allows to
    shrink it considerably, but I decided to leave that until the next
    cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
    afs: Use fs_context to pass parameters over automount
    afs: Add fs_context support
    vfs: Add some logging to the core users of the fs_context log
    vfs: Implement logging through fs_context
    vfs: Provide documentation for new mount API
    vfs: Remove kern_mount_data()
    hugetlbfs: Convert to fs_context
    cpuset: Use fs_context
    kernfs, sysfs, cgroup, intel_rdt: Support fs_context
    cgroup: store a reference to cgroup_ns into cgroup_fs_context
    cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
    cgroup_do_mount(): massage calling conventions
    cgroup: stash cgroup_root reference into cgroup_fs_context
    cgroup2: switch to option-by-option parsing
    cgroup1: switch to option-by-option parsing
    cgroup: take options parsing into ->parse_monolithic()
    cgroup: fold cgroup1_mount() into cgroup1_get_tree()
    cgroup: start switching to fs_context
    ipc: Convert mqueue fs to fs_context
    proc: Add fs_context support to procfs
    ...

    Linus Torvalds
     

08 Mar, 2019

2 commits

  • Use kvzalloc() instead of kvmalloc() and memset().

    Also, make use of the struct_size() helper instead of the open-coded
    version in order to avoid any potential type mistakes.

    This code was detected with the help of Coccinelle.

    Link: http://lkml.kernel.org/r/20190131214221.GA28930@embeddedor
    Signed-off-by: Gustavo A. R. Silva
    Reviewed-by: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gustavo A. R. Silva
     
  • There is a plan to build the kernel with -Wimplicit-fallthrough and this
    place in the code produced a warning (W=1).

    This commit remove the following warning:

    ipc/sem.c:1683:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

    Link: http://lkml.kernel.org/r/20190114203608.18218-1-malat@debian.org
    Signed-off-by: Mathieu Malaterre
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Malaterre
     

28 Feb, 2019

1 commit

  • Convert the mqueue filesystem to use the filesystem context stuff.

    Notes:

    (1) The relevant ipc namespace is selected in when the context is
    initialised (and it defaults to the current task's ipc namespace).
    The caller can override this before calling vfs_get_tree().

    (2) Rather than simply calling kern_mount_data(), mq_init_ns() and
    mq_internal_mount() create a context, adjust it and then do the rest
    of the mount procedure.

    (3) The lazy mqueue mounting on creation of a new namespace is retained
    from a previous patch, but the avoidance of sget() if no superblock
    yet exists is reverted and the superblock is again keyed on the
    namespace pointer.

    Yes, there was a performance gain in not searching the superblock
    hash, but it's only paid once per ipc namespace - and only if someone
    uses mqueue within that namespace, so I'm not sure it's worth it,
    especially as calling sget() allows avoidance of recursion.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

07 Feb, 2019

1 commit

  • A lot of system calls that pass a time_t somewhere have an implementation
    using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have
    been reworked so that this implementation can now be used on 32-bit
    architectures as well.

    The missing step is to redefine them using the regular SYSCALL_DEFINEx()
    to get them out of the compat namespace and make it possible to build them
    on 32-bit architectures.

    Any system call that ends in 'time' gets a '32' suffix on its name for
    that version, while the others get a '_time32' suffix, to distinguish
    them from the normal version, which takes a 64-bit time argument in the
    future.

    In this step, only 64-bit architectures are changed, doing this rename
    first lets us avoid touching the 32-bit architectures twice.

    Acked-by: Catalin Marinas
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

26 Jan, 2019

1 commit

  • The behavior of these system calls is slightly different between
    architectures, as determined by the CONFIG_ARCH_WANT_IPC_PARSE_VERSION
    symbol. Most architectures that implement the split IPC syscalls don't set
    that symbol and only get the modern version, but alpha, arm, microblaze,
    mips-n32, mips-n64 and xtensa expect the caller to pass the IPC_64 flag.

    For the architectures that so far only implement sys_ipc(), i.e. m68k,
    mips-o32, powerpc, s390, sh, sparc, and x86-32, we want the new behavior
    when adding the split syscalls, so we need to distinguish between the
    two groups of architectures.

    The method I picked for this distinction is to have a separate system call
    entry point: sys_old_*ctl() now uses ipc_parse_version, while sys_*ctl()
    does not. The system call tables of the five architectures are changed
    accordingly.

    As an additional benefit, we no longer need the configuration specific
    definition for ipc_parse_version(), it always does the same thing now,
    but simply won't get called on architectures with the modern interface.

    A small downside is that on architectures that do set
    ARCH_WANT_IPC_PARSE_VERSION, we now have an extra set of entry points
    that are never called. They only add a few bytes of bloat, so it seems
    better to keep them compared to adding yet another Kconfig symbol.
    I considered adding new syscall numbers for the IPC_64 variants for
    consistency, but decided against that for now.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

18 Jan, 2019

1 commit

  • The sys_ipc() and compat_ksys_ipc() functions are meant to only
    be used from the system call table, not called by another function.

    Introduce ksys_*() interfaces for this purpose, as we have done
    for many other system calls.

    Link: https://lore.kernel.org/lkml/20190116131527.2071570-3-arnd@arndb.de
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Heiko Carstens
    [heiko.carstens@de.ibm.com: compile fix for !CONFIG_COMPAT]
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Arnd Bergmann
     

31 Oct, 2018

2 commits

  • For SysV semaphores, the semmni value is the last part of the 4-element
    sem number array. To make semmni behave in a similar way to msgmni and
    shmmni, we can't directly use the _minmax handler. Instead, a special sem
    specific handler is added to check the last argument to make sure that it
    is limited to the [0, IPCMNI] range. An error will be returned if this is
    not the case.

    Link: http://lkml.kernel.org/r/1536352137-12003-3-git-send-email-longman@redhat.com
    Signed-off-by: Waiman Long
    Reviewed-by: Davidlohr Bueso
    Cc: "Eric W. Biederman"
    Cc: Jonathan Corbet
    Cc: Kees Cook
    Cc: Luis R. Rodriguez
    Cc: Matthew Wilcox
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • Patch series "ipc: IPCMNI limit check for *mni & increase that limit", v9.

    The sysctl parameters msgmni, shmmni and semmni have an inherent limit of
    IPC_MNI (32k). However, users may not be aware of that because they can
    write a value much higher than that without getting any error or
    notification. Reading the parameters back will show the newly written
    values which are not real.

    The real IPCMNI limit is now enforced to make sure that users won't put in
    an unrealistic value. The first 2 patches enforce the limits.

    There are also users out there requesting increase in the IPCMNI value.
    The last 2 patches attempt to do that by using a boot kernel parameter
    "ipcmni_extend" to increase the IPCMNI limit from 32k to 8M if the users
    really want the extended value.

    This patch (of 4):

    A user can write arbitrary integer values to msgmni and shmmni sysctl
    parameters without getting error, but the actual limit is really IPCMNI
    (32k). This can mislead users as they think they can get a value that is
    not real.

    The right limits are now set for msgmni and shmmni so that the users will
    become aware if they set a value outside of the acceptable range.

    Link: http://lkml.kernel.org/r/1536352137-12003-2-git-send-email-longman@redhat.com
    Signed-off-by: Waiman Long
    Acked-by: Luis R. Rodriguez
    Reviewed-by: Davidlohr Bueso
    Cc: Kees Cook
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: "Eric W. Biederman"
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Waiman Long
     

26 Oct, 2018

1 commit

  • Pull timekeeping updates from Thomas Gleixner:
    "The timers and timekeeping departement provides:

    - Another large y2038 update with further preparations for providing
    the y2038 safe timespecs closer to the syscalls.

    - An overhaul of the SHCMT clocksource driver

    - SPDX license identifier updates

    - Small cleanups and fixes all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
    tick/sched : Remove redundant cpu_online() check
    clocksource/drivers/dw_apb: Add reset control
    clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE
    clocksource/drivers: Unify the names to timer-* format
    clocksource/drivers/sh_cmt: Add R-Car gen3 support
    dt-bindings: timer: renesas: cmt: document R-Car gen3 support
    clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer
    clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines
    clocksource/drivers/sh_cmt: Fixup for 64-bit machines
    clocksource/drivers/sh_tmu: Convert to SPDX identifiers
    clocksource/drivers/sh_mtu2: Convert to SPDX identifiers
    clocksource/drivers/sh_cmt: Convert to SPDX identifiers
    clocksource/drivers/renesas-ostm: Convert to SPDX identifiers
    clocksource: Convert to using %pOFn instead of device_node.name
    tick/broadcast: Remove redundant check
    RISC-V: Request newstat syscalls
    y2038: signal: Change rt_sigtimedwait to use __kernel_timespec
    y2038: socket: Change recvmmsg to use __kernel_timespec
    y2038: sched: Change sched_rr_get_interval to use __kernel_timespec
    y2038: utimes: Rework #ifdef guards for compat syscalls
    ...

    Linus Torvalds
     

24 Oct, 2018

1 commit

  • …iederm/user-namespace

    Pull siginfo updates from Eric Biederman:
    "I have been slowly sorting out siginfo and this is the culmination of
    that work.

    The primary result is in several ways the signal infrastructure has
    been made less error prone. The code has been updated so that manually
    specifying SEND_SIG_FORCED is never necessary. The conversion to the
    new siginfo sending functions is now complete, which makes it
    difficult to send a signal without filling in the proper siginfo
    fields.

    At the tail end of the patchset comes the optimization of decreasing
    the size of struct siginfo in the kernel from 128 bytes to about 48
    bytes on 64bit. The fundamental observation that enables this is by
    definition none of the known ways to use struct siginfo uses the extra
    bytes.

    This comes at the cost of a small user space observable difference.
    For the rare case of siginfo being injected into the kernel only what
    can be copied into kernel_siginfo is delivered to the destination, the
    rest of the bytes are set to 0. For cases where the signal and the
    si_code are known this is safe, because we know those bytes are not
    used. For cases where the signal and si_code combination is unknown
    the bits that won't fit into struct kernel_siginfo are tested to
    verify they are zero, and the send fails if they are not.

    I made an extensive search through userspace code and I could not find
    anything that would break because of the above change. If it turns out
    I did break something it will take just the revert of a single change
    to restore kernel_siginfo to the same size as userspace siginfo.

    Testing did reveal dependencies on preferring the signo passed to
    sigqueueinfo over si->signo, so bit the bullet and added the
    complexity necessary to handle that case.

    Testing also revealed bad things can happen if a negative signal
    number is passed into the system calls. Something no sane application
    will do but something a malicious program or a fuzzer might do. So I
    have fixed the code that performs the bounds checks to ensure negative
    signal numbers are handled"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits)
    signal: Guard against negative signal numbers in copy_siginfo_from_user32
    signal: Guard against negative signal numbers in copy_siginfo_from_user
    signal: In sigqueueinfo prefer sig not si_signo
    signal: Use a smaller struct siginfo in the kernel
    signal: Distinguish between kernel_siginfo and siginfo
    signal: Introduce copy_siginfo_from_user and use it's return value
    signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE
    signal: Fail sigqueueinfo if si_signo != sig
    signal/sparc: Move EMT_TAGOVF into the generic siginfo.h
    signal/unicore32: Use force_sig_fault where appropriate
    signal/unicore32: Generate siginfo in ucs32_notify_die
    signal/unicore32: Use send_sig_fault where appropriate
    signal/arc: Use force_sig_fault where appropriate
    signal/arc: Push siginfo generation into unhandled_exception
    signal/ia64: Use force_sig_fault where appropriate
    signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
    signal/ia64: Use the generic force_sigsegv in setup_frame
    signal/arm/kvm: Use send_sig_mceerr
    signal/arm: Use send_sig_fault where appropriate
    signal/arm: Use force_sig_fault where appropriate
    ...

    Linus Torvalds
     

06 Oct, 2018

1 commit

  • This uses ERR_CAST() instead of an open-coded cast, as it is casting
    across structure pointers, which upsets __randomize_layout:

    ipc/shm.c: In function `shm_lock':
    ipc/shm.c:209:9: note: randstruct: casting between randomized structure pointer types (ssa): `struct shmid_kernel' and `struct kern_ipc_perm'

    return (void *)ipcp;
    ^~~~~~~~~~~~

    Link: http://lkml.kernel.org/r/20180919180722.GA15073@beast
    Fixes: 82061c57ce93 ("ipc: drop ipc_lock()")
    Signed-off-by: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     

03 Oct, 2018

1 commit

  • Linus recently observed that if we did not worry about the padding
    member in struct siginfo it is only about 48 bytes, and 48 bytes is
    much nicer than 128 bytes for allocating on the stack and copying
    around in the kernel.

    The obvious thing of only adding the padding when userspace is
    including siginfo.h won't work as there are sigframe definitions in
    the kernel that embed struct siginfo.

    So split siginfo in two; kernel_siginfo and siginfo. Keeping the
    traditional name for the userspace definition. While the version that
    is used internally to the kernel and ultimately will not be padded to
    128 bytes is called kernel_siginfo.

    The definition of struct kernel_siginfo I have put in include/signal_types.h

    A set of buildtime checks has been added to verify the two structures have
    the same field offsets.

    To make it easy to verify the change kernel_siginfo retains the same
    size as siginfo. The reduction in size comes in a following change.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

05 Sep, 2018

1 commit

  • When getting rid of the general ipc_lock(), this was missed furthermore,
    making the comment around the ipc object validity check bogus. Under
    EIDRM conditions, callers will in turn not see the error and continue
    with the operation.

    Link: http://lkml.kernel.org/r/20180824030920.GD3677@linux-r8p5
    Link: http://lkml.kernel.org/r/20180823024051.GC13343@shao2-debian
    Fixes: 82061c57ce9 ("ipc: drop ipc_lock()")
    Signed-off-by: Davidlohr Bueso
    Reported-by: kernel test robot
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

27 Aug, 2018

1 commit

  • Christoph Hellwig suggested a slightly different path for handling
    backwards compatibility with the 32-bit time_t based system calls:

    Rather than simply reusing the compat_sys_* entry points on 32-bit
    architectures unchanged, we get rid of those entry points and the
    compat_time types by renaming them to something that makes more sense
    on 32-bit architectures (which don't have a compat mode otherwise),
    and then share the entry points under the new name with the 64-bit
    architectures that use them for implementing the compatibility.

    The following types and interfaces are renamed here, and moved
    from linux/compat_time.h to linux/time32.h:

    old new
    --- ---
    compat_time_t old_time32_t
    struct compat_timeval struct old_timeval32
    struct compat_timespec struct old_timespec32
    struct compat_itimerspec struct old_itimerspec32
    ns_to_compat_timeval() ns_to_old_timeval32()
    get_compat_itimerspec64() get_old_itimerspec32()
    put_compat_itimerspec64() put_old_itimerspec32()
    compat_get_timespec64() get_old_timespec32()
    compat_put_timespec64() put_old_timespec32()

    As we already have aliases in place, this patch addresses only the
    instances that are relevant to the system call interface in particular,
    not those that occur in device drivers and other modules. Those
    will get handled separately, while providing the 64-bit version
    of the respective interfaces.

    I'm not renaming the timex, rusage and itimerval structures, as we are
    still debating what the new interface will look like, and whether we
    will need a replacement at all.

    This also doesn't change the names of the syscall entry points, which can
    be done more easily when we actually switch over the 32-bit architectures
    to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to
    SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix.

    Suggested-by: Christoph Hellwig
    Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

23 Aug, 2018

10 commits

  • ipc_getref has still a return value of type "int", matching the atomic_t
    interface of atomic_inc_not_zero()/atomic_add_unless().

    ipc_getref now uses refcount_inc_not_zero, which has a return value of
    type "bool".

    Therefore, update the return code to avoid implicit conversions.

    Link: http://lkml.kernel.org/r/20180712185241.4017-13-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The varable names got a mess, thus standardize them again:

    id: user space id. Called semid, shmid, msgid if the type is known.
    Most functions use "id" already.
    idx: "index" for the idr lookup
    Right now, some functions use lid, ipc_addid() already uses idx as
    the variable name.
    seq: sequence number, to avoid quick collisions of the user space id
    key: user space key, used for the rhash tree

    Link: http://lkml.kernel.org/r/20180712185241.4017-12-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Now that we know that rhashtable_init() will not fail, we can get rid of a
    lot of the unnecessary cleanup paths when the call errored out.

    [manfred@colorfullife.com: variable name added to util.h to resolve checkpatch warning]
    Link: http://lkml.kernel.org/r/20180712185241.4017-11-manfred@colorfullife.com
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • In sysvipc we have an ids->tables_initialized regarding the rhashtable,
    introduced in 0cfb6aee70bd ("ipc: optimize semget/shmget/msgget for lots
    of keys")

    It's there, specifically, to prevent nil pointer dereferences, from using
    an uninitialized api. Considering how rhashtable_init() can fail
    (probably due to ENOMEM, if anything), this made the overall ipc
    initialization capable of failure as well. That alone is ugly, but fine,
    however I've spotted a few issues regarding the semantics of
    tables_initialized (however unlikely they may be):

    - There is inconsistency in what we return to userspace: ipc_addid()
    returns ENOSPC which is certainly _wrong_, while ipc_obtain_object_idr()
    returns EINVAL.

    - After we started using rhashtables, ipc_findkey() can return nil upon
    !tables_initialized, but the caller expects nil for when the ipc
    structure isn't found, and can therefore call into ipcget() callbacks.

    Now that rhashtable initialization cannot fail, we can properly get rid of
    the hack altogether.

    [manfred@colorfullife.com: commit id extended to 12 digits]
    Link: http://lkml.kernel.org/r/20180712185241.4017-10-manfred@colorfullife.com
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • ipc/util.c contains multiple functions to get the ipc object pointer given
    an id number.

    There are two sets of function: One set verifies the sequence counter part
    of the id number, other functions do not check the sequence counter.

    The standard for function names in ipc/util.c is
    - ..._check() functions verify the sequence counter
    - ..._idr() functions do not verify the sequence counter

    ipc_lock() is an exception: It does not verify the sequence counter value,
    but this is not obvious from the function name.

    Furthermore, shm.c is the only user of this helper. Thus, we can simply
    move the logic into shm_lock() and get rid of the function altogether.

    [manfred@colorfullife.com: most of changelog]
    Link: http://lkml.kernel.org/r/20180712185241.4017-7-manfred@colorfullife.com
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The comment that explains ipc_obtain_object_check is wrong: The function
    checks the sequence number, not the reference counter.

    Note that checking the reference counter would be meaningless: The
    reference counter is decreased without holding any locks, thus an object
    with kern_ipc_perm.deleted=true may disappear at the end of the next rcu
    grace period.

    Link: http://lkml.kernel.org/r/20180712185241.4017-6-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Reviewed-by: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Both the comment and the name of ipcctl_pre_down_nolock() are misleading:
    The function must be called while holdling the rw semaphore.

    Therefore the patch renames the function to ipcctl_obtain_check(): This
    name matches the other names used in util.c:

    - "obtain" function look up a pointer in the idr, without
    acquiring the object lock.
    - The caller is responsible for locking.
    - _check means that the sequence number is checked.

    Link: http://lkml.kernel.org/r/20180712185241.4017-5-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Reviewed-by: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • ipc_addid() is impossible to use:
    - for certain failures, the caller must not use ipc_rcu_putref(),
    because the reference counter is not yet initialized.
    - for other failures, the caller must use ipc_rcu_putref(),
    because parallel operations could be ongoing already.

    The patch cleans that up, by initializing the refcount early, and by
    modifying all callers.

    The issues is related to the finding of
    syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an
    issue with reading kern_ipc_perm.seq, here both read and write to already
    released memory could happen.

    Link: http://lkml.kernel.org/r/20180712185241.4017-4-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Cc: Herbert Xu
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • ipc_addid() initializes kern_ipc_perm.seq after having called idr_alloc()
    (within ipc_idr_alloc()).

    Thus a parallel semop() or msgrcv() that uses ipc_obtain_object_check()
    may see an uninitialized value.

    The patch moves the initialization of kern_ipc_perm.seq before the calls
    of idr_alloc().

    Notes:
    1) This patch has a user space visible side effect:
    If /proc/sys/kernel/*_next_id is used (i.e.: checkpoint/restore) and
    if semget()/msgget()/shmget() fails in the final step of adding the id
    to the rhash tree, then .._next_id is cleared. Before the patch, is
    remained unmodified.

    There is no change of the behavior after a successful ..get() call: It
    always clears .._next_id, there is no impact to non checkpoint/restore
    code as that code does not use .._next_id.

    2) The patch correctly documents that after a call to ipc_idr_alloc(),
    the full tear-down sequence must be used. The callers of ipc_addid()
    do not fullfill that, i.e. more bugfixes are required.

    The patch is a squash of a patch from Dmitry and my own changes.

    Link: http://lkml.kernel.org/r/20180712185241.4017-3-manfred@colorfullife.com
    Reported-by: syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Michael Kerrisk
    Cc: Davidlohr Bueso
    Cc: Herbert Xu
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • ipc_addid() initializes kern_ipc_perm.id after having called
    ipc_idr_alloc().

    Thus a parallel semctl() or msgctl() that uses e.g. MSG_STAT may use this
    unitialized value as the return code.

    The patch moves all accesses to kern_ipc_perm.id under the spin_lock().

    The issues is related to the finding of
    syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an
    issue with kern_ipc_perm.seq

    Link: http://lkml.kernel.org/r/20180712185241.4017-2-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Reviewed-by: Davidlohr Bueso
    Cc: Dmitry Vyukov
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Herbert Xu
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

16 Aug, 2018

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    - Gustavo A. R. Silva keeps working on the implicit switch fallthru
    changes.

    - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
    Luca Coelho.

    - Re-enable ASPM in r8169, from Kai-Heng Feng.

    - Add virtual XFRM interfaces, which avoids all of the limitations of
    existing IPSEC tunnels. From Steffen Klassert.

    - Convert GRO over to use a hash table, so that when we have many
    flows active we don't traverse a long list during accumluation.

    - Many new self tests for routing, TC, tunnels, etc. Too many
    contributors to mention them all, but I'm really happy to keep
    seeing this stuff.

    - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.

    - Lots of cleanups and fixes in L2TP code from Guillaume Nault.

    - Add IPSEC offload support to netdevsim, from Shannon Nelson.

    - Add support for slotting with non-uniform distribution to netem
    packet scheduler, from Yousuk Seung.

    - Add UDP GSO support to mlx5e, from Boris Pismenny.

    - Support offloading of Team LAG in NFP, from John Hurley.

    - Allow to configure TX queue selection based upon RX queue, from
    Amritha Nambiar.

    - Support ethtool ring size configuration in aquantia, from Anton
    Mikaev.

    - Support DSCP and flowlabel per-transport in SCTP, from Xin Long.

    - Support list based batching and stack traversal of SKBs, this is
    very exciting work. From Edward Cree.

    - Busyloop optimizations in vhost_net, from Toshiaki Makita.

    - Introduce the ETF qdisc, which allows time based transmissions. IGB
    can offload this in hardware. From Vinicius Costa Gomes.

    - Add parameter support to devlink, from Moshe Shemesh.

    - Several multiplication and division optimizations for BPF JIT in
    nfp driver, from Jiong Wang.

    - Lots of prepatory work to make more of the packet scheduler layer
    lockless, when possible, from Vlad Buslov.

    - Add ACK filter and NAT awareness to sch_cake packet scheduler, from
    Toke Høiland-Jørgensen.

    - Support regions and region snapshots in devlink, from Alex Vesker.

    - Allow to attach XDP programs to both HW and SW at the same time on
    a given device, with initial support in nfp. From Jakub Kicinski.

    - Add TLS RX offload and support in mlx5, from Ilya Lesokhin.

    - Use PHYLIB in r8169 driver, from Heiner Kallweit.

    - All sorts of changes to support Spectrum 2 in mlxsw driver, from
    Ido Schimmel.

    - PTP support in mv88e6xxx DSA driver, from Andrew Lunn.

    - Make TCP_USER_TIMEOUT socket option more accurate, from Jon
    Maxwell.

    - Support for templates in packet scheduler classifier, from Jiri
    Pirko.

    - IPV6 support in RDS, from Ka-Cheong Poon.

    - Native tproxy support in nf_tables, from Máté Eckl.

    - Maintain IP fragment queue in an rbtree, but optimize properly for
    in-order frags. From Peter Oskolkov.

    - Improvde handling of ACKs on hole repairs, from Yuchung Cheng"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
    bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
    hv/netvsc: Fix NULL dereference at single queue mode fallback
    net: filter: mark expected switch fall-through
    xen-netfront: fix warn message as irq device name has '/'
    cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
    net: dsa: mv88e6xxx: missing unlock on error path
    rds: fix building with IPV6=m
    inet/connection_sock: prefer _THIS_IP_ to current_text_addr
    net: dsa: mv88e6xxx: bitwise vs logical bug
    net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
    ieee802154: hwsim: using right kind of iteration
    net: hns3: Add vlan filter setting by ethtool command -K
    net: hns3: Set tx ring' tc info when netdev is up
    net: hns3: Remove tx ring BD len register in hns3_enet
    net: hns3: Fix desc num set to default when setting channel
    net: hns3: Fix for phy link issue when using marvell phy driver
    net: hns3: Fix for information of phydev lost problem when down/up
    net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
    net: hns3: Add support for serdes loopback selftest
    bnxt_en: take coredump_record structure off stack
    ...

    Linus Torvalds
     

14 Aug, 2018

1 commit

  • Pull vfs open-related updates from Al Viro:

    - "do we need fput() or put_filp()" rules are gone - it's always fput()
    now. We keep track of that state where it belongs - in ->f_mode.

    - int *opened mess killed - in finish_open(), in ->atomic_open()
    instances and in fs/namei.c code around do_last()/lookup_open()/atomic_open().

    - alloc_file() wrappers with saner calling conventions are introduced
    (alloc_file_clone() and alloc_file_pseudo()); callers converted, with
    much simplification.

    - while we are at it, saner calling conventions for path_init() and
    link_path_walk(), simplifying things inside fs/namei.c (both on
    open-related paths and elsewhere).

    * 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
    few more cleanups of link_path_walk() callers
    allow link_path_walk() to take ERR_PTR()
    make path_init() unconditionally paired with terminate_walk()
    document alloc_file() changes
    make alloc_file() static
    do_shmat(): grab shp->shm_file earlier, switch to alloc_file_clone()
    new helper: alloc_file_clone()
    create_pipe_files(): switch the first allocation to alloc_file_pseudo()
    anon_inode_getfile(): switch to alloc_file_pseudo()
    hugetlb_file_setup(): switch to alloc_file_pseudo()
    ocxlflash_getfile(): switch to alloc_file_pseudo()
    cxl_getfile(): switch to alloc_file_pseudo()
    ... and switch shmem_file_setup() to alloc_file_pseudo()
    __shmem_file_setup(): reorder allocations
    new wrapper: alloc_file_pseudo()
    kill FILE_{CREATED,OPENED}
    switch atomic_open() and lookup_open() to returning 0 in all success cases
    document ->atomic_open() changes
    ->atomic_open(): return 0 in all success cases
    get rid of 'opened' in path_openat() and the helpers downstream
    ...

    Linus Torvalds
     

06 Aug, 2018

1 commit


03 Aug, 2018

2 commits

  • Commit 05ea88608d4e ("mm, hugetlbfs: introduce ->pagesize() to
    vm_operations_struct") adds a new ->pagesize() function to
    hugetlb_vm_ops, intended to cover all hugetlbfs backed files.

    With System V shared memory model, if "huge page" is specified, the
    "shared memory" is backed by hugetlbfs files, but the mappings initiated
    via shmget/shmat have their original vm_ops overwritten with shm_vm_ops,
    so we need to add a ->pagesize function to shm_vm_ops. Otherwise,
    vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs backed vma,
    result in below BUG:

    fs/hugetlbfs/inode.c
    443 if (unlikely(page_mapped(page))) {
    444 BUG_ON(truncate_op);

    resulting in

    hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
    ------------[ cut here ]------------
    kernel BUG at fs/hugetlbfs/inode.c:444!
    Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
    CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2
    RIP: 0010:remove_inode_hugepages+0x3db/0x3e2
    ....
    Call Trace:
    hugetlbfs_evict_inode+0x1e/0x3e
    evict+0xdb/0x1af
    iput+0x1a2/0x1f7
    dentry_unlink_inode+0xc6/0xf0
    __dentry_kill+0xd8/0x18d
    dput+0x1b5/0x1ed
    __fput+0x18b/0x216
    ____fput+0xe/0x10
    task_work_run+0x90/0xa7
    exit_to_usermode_loop+0xdd/0x116
    do_syscall_64+0x187/0x1ae
    entry_SYSCALL_64_after_hwframe+0x150/0x0

    [jane.chu@oracle.com: relocate comment]
    Link: http://lkml.kernel.org/r/20180731044831.26036-1-jane.chu@oracle.com
    Link: http://lkml.kernel.org/r/20180727211727.5020-1-jane.chu@oracle.com
    Fixes: 05ea88608d4e13 ("mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct")
    Signed-off-by: Jane Chu
    Suggested-by: Mike Kravetz
    Reviewed-by: Mike Kravetz
    Acked-by: Davidlohr Bueso
    Acked-by: Michal Hocko
    Cc: Dan Williams
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jane Chu
     
  • The BTF conflicts were simple overlapping changes.

    The virtio_net conflict was an overlap of a fix of statistics counter,
    happening alongisde a move over to a bonafide statistics structure
    rather than counting value on the stack.

    Signed-off-by: David S. Miller

    David S. Miller
     

27 Jul, 2018

1 commit

  • In order for load/store tearing prevention to work, _all_ accesses to
    the variable in question need to be done around READ and WRITE_ONCE()
    macros. Ensure everyone does so for q->status variable for
    semtimedop().

    Link: http://lkml.kernel.org/r/20180717052654.676-1-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

12 Jul, 2018

2 commits


22 Jun, 2018

1 commit

  • Due to the use of rhashtables in net namespaces,
    rhashtable.h is included in lots of the kernel,
    so a small changes can required a large recompilation.
    This makes development painful.

    This patch splits out rhashtable-types.h which just includes
    the major type declarations, and does not include (non-trivial)
    inline code. rhashtable.h is no longer included by anything
    in the include/ directory.
    Common include files only include rhashtable-types.h so a large
    recompilation is only triggered when that changes.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     

15 Jun, 2018

2 commits

  • Use new return type vm_fault_t for fault handler. For now, this is just
    documenting that the function returns a VM_FAULT value rather than an
    errno. Once all instances are converted, vm_fault_t will become a
    distinct type.

    Commit 1c8f422059ae ("mm: change return type to vm_fault_t")

    Link: http://lkml.kernel.org/r/20180425043413.GA21467@jordon-HP-15-Notebook-PC
    Signed-off-by: Souptick Joarder
    Reviewed-by: Matthew Wilcox
    Reviewed-by: Andrew Morton
    Acked-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Souptick Joarder
     
  • Both smatch and coverity are reporting potential issues with spectre
    variant 1 with the 'semnum' index within the sma->sems array, ie:

    ipc/sem.c:388 sem_lock() warn: potential spectre issue 'sma->sems'
    ipc/sem.c:641 perform_atomic_semop_slow() warn: potential spectre issue 'sma->sems'
    ipc/sem.c:721 perform_atomic_semop() warn: potential spectre issue 'sma->sems'

    Avoid any possible speculation by using array_index_nospec() thus
    ensuring the semnum value is bounded to [0, sma->sem_nsems). With the
    exception of sem_lock() all of these are slowpaths.

    Link: http://lkml.kernel.org/r/20180423171131.njs4rfm2yzyeg6do@linux-n805
    Signed-off-by: Davidlohr Bueso
    Reported-by: Dan Carpenter
    Cc: Peter Zijlstra
    Cc: "Gustavo A. R. Silva"
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

13 Jun, 2018

1 commit

  • The kvmalloc() function has a 2-factor argument form, kvmalloc_array(). This
    patch replaces cases of:

    kvmalloc(a * b, gfp)

    with:
    kvmalloc_array(a * b, gfp)

    as well as handling cases of:

    kvmalloc(a * b * c, gfp)

    with:

    kvmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kvmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kvmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kvmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kvmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kvmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kvmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kvmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kvmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kvmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kvmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kvmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kvmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kvmalloc
    + kvmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kvmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kvmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kvmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kvmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kvmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kvmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kvmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kvmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kvmalloc(C1 * C2 * C3, ...)
    |
    kvmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kvmalloc(sizeof(THING) * C2, ...)
    |
    kvmalloc(sizeof(TYPE) * C2, ...)
    |
    kvmalloc(C1 * C2 * C3, ...)
    |
    kvmalloc(C1 * C2, ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kvmalloc
    + kvmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

05 Jun, 2018

1 commit

  • Pull time/Y2038 updates from Thomas Gleixner:

    - Consolidate SySV IPC UAPI headers

    - Convert SySV IPC to the new COMPAT_32BIT_TIME mechanism

    - Cleanup the core interfaces and standardize on the ktime_get_* naming
    convention.

    - Convert the X86 platform ops to timespec64

    - Remove the ugly temporary timespec64 hack

    * 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
    x86: Convert x86_platform_ops to timespec64
    timekeeping: Add more coarse clocktai/boottime interfaces
    timekeeping: Add ktime_get_coarse_with_offset
    timekeeping: Standardize on ktime_get_*() naming
    timekeeping: Clean up ktime_get_real_ts64
    timekeeping: Remove timespec64 hack
    y2038: ipc: Redirect ipc(SEMTIMEDOP, ...) to compat_ksys_semtimedop
    y2038: ipc: Enable COMPAT_32BIT_TIME
    y2038: ipc: Use __kernel_timespec
    y2038: ipc: Report long times to user space
    y2038: ipc: Use ktime_get_real_seconds consistently
    y2038: xtensa: Extend sysvipc data structures
    y2038: powerpc: Extend sysvipc data structures
    y2038: sparc: Extend sysvipc data structures
    y2038: parisc: Extend sysvipc data structures
    y2038: mips: Extend sysvipc data structures
    y2038: arm64: Extend sysvipc compat data structures
    y2038: s390: Remove unneeded ipc uapi header files
    y2038: ia64: Remove unneeded ipc uapi header files
    y2038: alpha: Remove unneeded ipc uapi header files
    ...

    Linus Torvalds
     

26 May, 2018

2 commits

  • shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
    fact the very first thing we check for. Andrea reported that for
    SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
    but we need to check again if the address was rounded down to nil. As
    of this patch, such cases will return -EINVAL.

    Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
    Signed-off-by: Davidlohr Bueso
    Reported-by: Andrea Arcangeli
    Cc: Joe Lawrence
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Patch series "ipc/shm: shmat() fixes around nil-page".

    These patches fix two issues reported[1] a while back by Joe and Andrea
    around how shmat(2) behaves with nil-page.

    The first reverts a commit that it was incorrectly thought that mapping
    nil-page (address=0) was a no no with MAP_FIXED. This is not the case,
    with the exception of SHM_REMAP; which is address in the second patch.

    I chose two patches because it is easier to backport and it explicitly
    reverts bogus behaviour. Both patches ought to be in -stable and ltp
    testcases need updated (the added testcase around the cve can be
    modified to just test for SHM_RND|SHM_REMAP).

    [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805

    This patch (of 2):

    Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
    worked on the idea that we should not be mapping as root addr=0 and
    MAP_FIXED. However, it was reported that this scenario is in fact
    valid, thus making the patch both bogus and breaks userspace as well.

    For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
    initialization[1].

    [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
    Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
    Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
    Signed-off-by: Davidlohr Bueso
    Reported-by: Joe Lawrence
    Reported-by: Andrea Arcangeli
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso