24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where they are not needed.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
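
    As an illustration only, the conversion can be tried out in userspace
    with a stand-in macro. The sketch below assumes GCC 7+ or a recent
    Clang; setup_steps() is a hypothetical helper and is not taken from
    the patch.

```c
/* Userspace stand-in for the kernel's pseudo-keyword (assumption: the
 * compiler understands __has_attribute, as GCC and Clang do). */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical helper: counts how many setup steps run for a level.
 * Higher levels deliberately fall through into the lower ones. */
int setup_steps(int level)
{
	int steps = 0;

	switch (level) {
	case 2:
		steps++;
		fallthrough;	/* was: fall through comment */
	case 1:
		steps++;
		fallthrough;
	case 0:
		steps++;
		break;
	default:
		return -1;	/* unknown level, no marking needed */
	}
	return steps;
}
```

    Unlike a comment, the macro is a statement the compiler can check, so
    -Wimplicit-fallthrough does not depend on comment spelling.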

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

13 Aug, 2020

2 commits

  • Remove the superfluous break, as there is a 'return' before it.

    Signed-off-by: Liao Pingfang
    Signed-off-by: Yi Wang
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/1594724361-11525-1-git-send-email-wang.yi59@zte.com.cn
    Signed-off-by: Linus Torvalds

    Liao Pingfang
     
  • Two functions are only called via function pointers, don't bother
    inlining them.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Cc: Manfred Spraul
    Cc: Davidlohr Bueso
    Link: http://lkml.kernel.org/r/20200710200312.GA960353@localhost.localdomain
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

08 Aug, 2020

1 commit

  • The current split between do_mmap() and do_mmap_pgoff() was introduced in
    commit 1fcfd8db7f82 ("mm, mpx: add "vm_flags_t vm_flags" arg to
    do_mmap_pgoff()") to support MPX.

    The wrapper function do_mmap_pgoff() always passed 0 as the value of the
    vm_flags argument to do_mmap(). However, MPX support has subsequently
    been removed from the kernel and there were no more direct callers of
    do_mmap(); all calls were going via do_mmap_pgoff().

    Simplify the code by removing do_mmap_pgoff() and changing all callers to
    directly call do_mmap(), which now no longer takes a vm_flags argument.

    Signed-off-by: Peter Collingbourne
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com
    Signed-off-by: Linus Torvalds

    Peter Collingbourne
     

10 Jun, 2020

1 commit

  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

08 Apr, 2020

1 commit

  • Fix the following sparse warning:

    ipc/shm.c:1335:6: warning: symbol 'compat_ksys_shmctl' was not declared.
    Should it be static?

    Reported-by: Hulk Robot
    Signed-off-by: Jason Yan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200403063933.24785-1-yanaijie@huawei.com
    Signed-off-by: Linus Torvalds

    Jason Yan
     

26 Jan, 2019

1 commit

  • The behavior of these system calls is slightly different between
    architectures, as determined by the CONFIG_ARCH_WANT_IPC_PARSE_VERSION
    symbol. Most architectures that implement the split IPC syscalls don't set
    that symbol and only get the modern version, but alpha, arm, microblaze,
    mips-n32, mips-n64 and xtensa expect the caller to pass the IPC_64 flag.

    For the architectures that so far only implement sys_ipc(), i.e. m68k,
    mips-o32, powerpc, s390, sh, sparc, and x86-32, we want the new behavior
    when adding the split syscalls, so we need to distinguish between the
    two groups of architectures.

    The method I picked for this distinction is to have a separate system call
    entry point: sys_old_*ctl() now uses ipc_parse_version, while sys_*ctl()
    does not. The system call tables of the five architectures are changed
    accordingly.

    As an additional benefit, we no longer need the configuration specific
    definition for ipc_parse_version(), it always does the same thing now,
    but simply won't get called on architectures with the modern interface.

    A small downside is that on architectures that do set
    ARCH_WANT_IPC_PARSE_VERSION, we now have an extra set of entry points
    that are never called. They only add a few bytes of bloat, so it seems
    better to keep them compared to adding yet another Kconfig symbol.
    I considered adding new syscall numbers for the IPC_64 variants for
    consistency, but decided against that for now.
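
    A minimal sketch of what ipc_parse_version() does with the command
    word, written for userspace. The constant values are reproduced from
    memory of the uapi headers and should be treated as assumptions.

```c
/* Assumed values, mirroring the kernel's uapi definitions. */
#define IPC_OLD 0
#define IPC_64  0x0100

/* If the caller set IPC_64 in the command, strip the flag and report
 * the modern layout; otherwise report the old layout. */
int ipc_parse_version(int *cmd)
{
	if (*cmd & IPC_64) {
		*cmd ^= IPC_64;
		return IPC_64;
	}
	return IPC_OLD;
}
```

    With the split described above, only the sys_old_*ctl() style entry
    points would call such a helper; the plain sys_*ctl() entry points
    would assume the modern layout unconditionally.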

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

26 Oct, 2018

1 commit

  • Pull timekeeping updates from Thomas Gleixner:
    "The timers and timekeeping department provides:

    - Another large y2038 update with further preparations for providing
    the y2038 safe timespecs closer to the syscalls.

    - An overhaul of the SHCMT clocksource driver

    - SPDX license identifier updates

    - Small cleanups and fixes all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
    tick/sched : Remove redundant cpu_online() check
    clocksource/drivers/dw_apb: Add reset control
    clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE
    clocksource/drivers: Unify the names to timer-* format
    clocksource/drivers/sh_cmt: Add R-Car gen3 support
    dt-bindings: timer: renesas: cmt: document R-Car gen3 support
    clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer
    clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines
    clocksource/drivers/sh_cmt: Fixup for 64-bit machines
    clocksource/drivers/sh_tmu: Convert to SPDX identifiers
    clocksource/drivers/sh_mtu2: Convert to SPDX identifiers
    clocksource/drivers/sh_cmt: Convert to SPDX identifiers
    clocksource/drivers/renesas-ostm: Convert to SPDX identifiers
    clocksource: Convert to using %pOFn instead of device_node.name
    tick/broadcast: Remove redundant check
    RISC-V: Request newstat syscalls
    y2038: signal: Change rt_sigtimedwait to use __kernel_timespec
    y2038: socket: Change recvmmsg to use __kernel_timespec
    y2038: sched: Change sched_rr_get_interval to use __kernel_timespec
    y2038: utimes: Rework #ifdef guards for compat syscalls
    ...

    Linus Torvalds
     

06 Oct, 2018

1 commit

  • This uses ERR_CAST() instead of an open-coded cast, as it is casting
    across structure pointers, which upsets __randomize_layout:

    ipc/shm.c: In function `shm_lock':
    ipc/shm.c:209:9: note: randstruct: casting between randomized structure pointer types (ssa): `struct shmid_kernel' and `struct kern_ipc_perm'

    return (void *)ipcp;
    ^~~~~~~~~~~~
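
    The pattern can be sketched in userspace as follows. The ERR_PTR()
    family is re-implemented here for illustration, and shm_find() and the
    struct layouts are hypothetical simplifications rather than the
    kernel code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-implementations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
/* Makes it explicit that an error value, not an object, changes type. */
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

/* Simplified stand-ins for the real structures. */
struct kern_ipc_perm { int id; };
struct shmid_kernel { struct kern_ipc_perm shm_perm; unsigned long size; };

/* Hypothetical lookup: returns the segment or an encoded errno. */
struct shmid_kernel *shm_find(struct shmid_kernel *segs, int n, int id)
{
	struct kern_ipc_perm *ipcp = ERR_PTR(-EINVAL);

	for (int i = 0; i < n; i++)
		if (segs[i].shm_perm.id == id)
			ipcp = &segs[i].shm_perm;

	if (IS_ERR(ipcp))
		return ERR_CAST(ipcp);	/* forward the error only */

	/* Success path: recover the container from its member. */
	return (struct shmid_kernel *)((char *)ipcp -
			offsetof(struct shmid_kernel, shm_perm));
}
```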

    Link: http://lkml.kernel.org/r/20180919180722.GA15073@beast
    Fixes: 82061c57ce93 ("ipc: drop ipc_lock()")
    Signed-off-by: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     

05 Sep, 2018

1 commit

  • When getting rid of the general ipc_lock(), this was missed, which
    furthermore made the comment around the ipc object validity check
    bogus. Under EIDRM conditions, callers will in turn not see the error
    and continue with the operation.

    Link: http://lkml.kernel.org/r/20180824030920.GD3677@linux-r8p5
    Link: http://lkml.kernel.org/r/20180823024051.GC13343@shao2-debian
    Fixes: 82061c57ce9 ("ipc: drop ipc_lock()")
    Signed-off-by: Davidlohr Bueso
    Reported-by: kernel test robot
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

27 Aug, 2018

1 commit

  • Christoph Hellwig suggested a slightly different path for handling
    backwards compatibility with the 32-bit time_t based system calls:

    Rather than simply reusing the compat_sys_* entry points on 32-bit
    architectures unchanged, we get rid of those entry points and the
    compat_time types by renaming them to something that makes more sense
    on 32-bit architectures (which don't have a compat mode otherwise),
    and then share the entry points under the new name with the 64-bit
    architectures that use them for implementing the compatibility.

    The following types and interfaces are renamed here, and moved
    from linux/compat_time.h to linux/time32.h:

    old                         new
    ---                         ---
    compat_time_t               old_time32_t
    struct compat_timeval       struct old_timeval32
    struct compat_timespec      struct old_timespec32
    struct compat_itimerspec    struct old_itimerspec32
    ns_to_compat_timeval()      ns_to_old_timeval32()
    get_compat_itimerspec64()   get_old_itimerspec32()
    put_compat_itimerspec64()   put_old_itimerspec32()
    compat_get_timespec64()     get_old_timespec32()
    compat_put_timespec64()     put_old_timespec32()

    As we already have aliases in place, this patch addresses only the
    instances that are relevant to the system call interface in particular,
    not those that occur in device drivers and other modules. Those
    will get handled separately, while providing the 64-bit version
    of the respective interfaces.

    I'm not renaming the timex, rusage and itimerval structures, as we are
    still debating what the new interface will look like, and whether we
    will need a replacement at all.

    This also doesn't change the names of the syscall entry points, which can
    be done more easily when we actually switch over the 32-bit architectures
    to use them. At that point we need to change COMPAT_SYSCALL_DEFINEx to
    SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix.

    Suggested-by: Christoph Hellwig
    Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

23 Aug, 2018

6 commits

  • The variable names have become a mess, so standardize them again:

    id:  user space id. Called semid, shmid, msgid if the type is known.
         Most functions use "id" already.
    idx: "index" for the idr lookup.
         Right now, some functions use lid; ipc_addid() already uses idx as
         the variable name.
    seq: sequence number, to avoid quick collisions of the user space id
    key: user space key, used for the rhash tree
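
    How id, idx and seq relate can be sketched as below. The multiplier
    value and the helper signatures are assumptions modelled on ipc/util.h
    from around that time; in particular, the real ipc_checkid() takes the
    object pointer rather than a bare sequence number.

```c
/* Assumed value: the maximum number of ipc objects per namespace. */
#define IPCMNI         32768
#define SEQ_MULTIPLIER IPCMNI

/* Build the user space id from the idr index and the sequence number. */
int ipc_buildid(int idx, int seq)
{
	return SEQ_MULTIPLIER * seq + idx;
}

/* Recover the idr index from a user space id. */
int ipcid_to_idx(int id)
{
	return id % SEQ_MULTIPLIER;
}

/* Recover the sequence number from a user space id. */
int ipcid_to_seq(int id)
{
	return id / SEQ_MULTIPLIER;
}

/* Nonzero if the id does not match the sequence number stored in the
 * object, i.e. the index was reused for a different object. */
int ipc_checkid(int stored_seq, int id)
{
	return ipcid_to_seq(id) != stored_seq;
}
```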

    Link: http://lkml.kernel.org/r/20180712185241.4017-12-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Now that we know that rhashtable_init() will not fail, we can get rid of a
    lot of the unnecessary cleanup paths when the call errored out.

    [manfred@colorfullife.com: variable name added to util.h to resolve checkpatch warning]
    Link: http://lkml.kernel.org/r/20180712185241.4017-11-manfred@colorfullife.com
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • ipc/util.c contains multiple functions to get the ipc object pointer given
    an id number.

    There are two sets of functions: one set verifies the sequence counter
    part of the id number, the other does not check the sequence counter.

    The standard for function names in ipc/util.c is
    - ..._check() functions verify the sequence counter
    - ..._idr() functions do not verify the sequence counter

    ipc_lock() is an exception: It does not verify the sequence counter value,
    but this is not obvious from the function name.

    Furthermore, shm.c is the only user of this helper. Thus, we can simply
    move the logic into shm_lock() and get rid of the function altogether.

    [manfred@colorfullife.com: most of changelog]
    Link: http://lkml.kernel.org/r/20180712185241.4017-7-manfred@colorfullife.com
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Both the comment and the name of ipcctl_pre_down_nolock() are misleading:
    The function must be called while holding the rw semaphore.

    Therefore the patch renames the function to ipcctl_obtain_check(): This
    name matches the other names used in util.c:

    - "obtain" functions look up a pointer in the idr, without
      acquiring the object lock.
    - The caller is responsible for locking.
    - _check means that the sequence number is checked.

    Link: http://lkml.kernel.org/r/20180712185241.4017-5-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Reviewed-by: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • ipc_addid() is impossible to use:
    - for certain failures, the caller must not use ipc_rcu_putref(),
    because the reference counter is not yet initialized.
    - for other failures, the caller must use ipc_rcu_putref(),
    because parallel operations could be ongoing already.

    The patch cleans that up, by initializing the refcount early, and by
    modifying all callers.

    The issue is related to the finding of
    syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an
    issue with reading kern_ipc_perm.seq, here both read and write to already
    released memory could happen.

    Link: http://lkml.kernel.org/r/20180712185241.4017-4-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Dmitry Vyukov
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Cc: Herbert Xu
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • ipc_addid() initializes kern_ipc_perm.id after having called
    ipc_idr_alloc().

    Thus a parallel semctl() or msgctl() that uses e.g. MSG_STAT may use this
    uninitialized value as the return code.

    The patch moves all accesses to kern_ipc_perm.id under the spin_lock().

    The issue is related to the finding of
    syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an
    issue with kern_ipc_perm.seq

    Link: http://lkml.kernel.org/r/20180712185241.4017-2-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Reviewed-by: Davidlohr Bueso
    Cc: Dmitry Vyukov
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Herbert Xu
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

16 Aug, 2018

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    - Gustavo A. R. Silva keeps working on the implicit switch fallthru
    changes.

    - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
    Luca Coelho.

    - Re-enable ASPM in r8169, from Kai-Heng Feng.

    - Add virtual XFRM interfaces, which avoids all of the limitations of
    existing IPSEC tunnels. From Steffen Klassert.

    - Convert GRO over to use a hash table, so that when we have many
    flows active we don't traverse a long list during accumulation.

    - Many new self tests for routing, TC, tunnels, etc. Too many
    contributors to mention them all, but I'm really happy to keep
    seeing this stuff.

    - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.

    - Lots of cleanups and fixes in L2TP code from Guillaume Nault.

    - Add IPSEC offload support to netdevsim, from Shannon Nelson.

    - Add support for slotting with non-uniform distribution to netem
    packet scheduler, from Yousuk Seung.

    - Add UDP GSO support to mlx5e, from Boris Pismenny.

    - Support offloading of Team LAG in NFP, from John Hurley.

    - Allow to configure TX queue selection based upon RX queue, from
    Amritha Nambiar.

    - Support ethtool ring size configuration in aquantia, from Anton
    Mikaev.

    - Support DSCP and flowlabel per-transport in SCTP, from Xin Long.

    - Support list based batching and stack traversal of SKBs, this is
    very exciting work. From Edward Cree.

    - Busyloop optimizations in vhost_net, from Toshiaki Makita.

    - Introduce the ETF qdisc, which allows time based transmissions. IGB
    can offload this in hardware. From Vinicius Costa Gomes.

    - Add parameter support to devlink, from Moshe Shemesh.

    - Several multiplication and division optimizations for BPF JIT in
    nfp driver, from Jiong Wang.

    - Lots of preparatory work to make more of the packet scheduler layer
    lockless, when possible, from Vlad Buslov.

    - Add ACK filter and NAT awareness to sch_cake packet scheduler, from
    Toke Høiland-Jørgensen.

    - Support regions and region snapshots in devlink, from Alex Vesker.

    - Allow to attach XDP programs to both HW and SW at the same time on
    a given device, with initial support in nfp. From Jakub Kicinski.

    - Add TLS RX offload and support in mlx5, from Ilya Lesokhin.

    - Use PHYLIB in r8169 driver, from Heiner Kallweit.

    - All sorts of changes to support Spectrum 2 in mlxsw driver, from
    Ido Schimmel.

    - PTP support in mv88e6xxx DSA driver, from Andrew Lunn.

    - Make TCP_USER_TIMEOUT socket option more accurate, from Jon
    Maxwell.

    - Support for templates in packet scheduler classifier, from Jiri
    Pirko.

    - IPV6 support in RDS, from Ka-Cheong Poon.

    - Native tproxy support in nf_tables, from Máté Eckl.

    - Maintain IP fragment queue in an rbtree, but optimize properly for
    in-order frags. From Peter Oskolkov.

    - Improve handling of ACKs on hole repairs, from Yuchung Cheng"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
    bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
    hv/netvsc: Fix NULL dereference at single queue mode fallback
    net: filter: mark expected switch fall-through
    xen-netfront: fix warn message as irq device name has '/'
    cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
    net: dsa: mv88e6xxx: missing unlock on error path
    rds: fix building with IPV6=m
    inet/connection_sock: prefer _THIS_IP_ to current_text_addr
    net: dsa: mv88e6xxx: bitwise vs logical bug
    net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
    ieee802154: hwsim: using right kind of iteration
    net: hns3: Add vlan filter setting by ethtool command -K
    net: hns3: Set tx ring' tc info when netdev is up
    net: hns3: Remove tx ring BD len register in hns3_enet
    net: hns3: Fix desc num set to default when setting channel
    net: hns3: Fix for phy link issue when using marvell phy driver
    net: hns3: Fix for information of phydev lost problem when down/up
    net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
    net: hns3: Add support for serdes loopback selftest
    bnxt_en: take coredump_record structure off stack
    ...

    Linus Torvalds
     

14 Aug, 2018

1 commit

  • Pull vfs open-related updates from Al Viro:

    - "do we need fput() or put_filp()" rules are gone - it's always fput()
    now. We keep track of that state where it belongs - in ->f_mode.

    - int *opened mess killed - in finish_open(), in ->atomic_open()
    instances and in fs/namei.c code around do_last()/lookup_open()/atomic_open().

    - alloc_file() wrappers with saner calling conventions are introduced
    (alloc_file_clone() and alloc_file_pseudo()); callers converted, with
    much simplification.

    - while we are at it, saner calling conventions for path_init() and
    link_path_walk(), simplifying things inside fs/namei.c (both on
    open-related paths and elsewhere).

    * 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
    few more cleanups of link_path_walk() callers
    allow link_path_walk() to take ERR_PTR()
    make path_init() unconditionally paired with terminate_walk()
    document alloc_file() changes
    make alloc_file() static
    do_shmat(): grab shp->shm_file earlier, switch to alloc_file_clone()
    new helper: alloc_file_clone()
    create_pipe_files(): switch the first allocation to alloc_file_pseudo()
    anon_inode_getfile(): switch to alloc_file_pseudo()
    hugetlb_file_setup(): switch to alloc_file_pseudo()
    ocxlflash_getfile(): switch to alloc_file_pseudo()
    cxl_getfile(): switch to alloc_file_pseudo()
    ... and switch shmem_file_setup() to alloc_file_pseudo()
    __shmem_file_setup(): reorder allocations
    new wrapper: alloc_file_pseudo()
    kill FILE_{CREATED,OPENED}
    switch atomic_open() and lookup_open() to returning 0 in all success cases
    document ->atomic_open() changes
    ->atomic_open(): return 0 in all success cases
    get rid of 'opened' in path_openat() and the helpers downstream
    ...

    Linus Torvalds
     

06 Aug, 2018

1 commit


03 Aug, 2018

1 commit

  • Commit 05ea88608d4e ("mm, hugetlbfs: introduce ->pagesize() to
    vm_operations_struct") adds a new ->pagesize() function to
    hugetlb_vm_ops, intended to cover all hugetlbfs backed files.

    With the System V shared memory model, if "huge page" is specified, the
    "shared memory" is backed by hugetlbfs files, but the mappings initiated
    via shmget/shmat have their original vm_ops overwritten with shm_vm_ops,
    so we need to add a ->pagesize function to shm_vm_ops. Otherwise,
    vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs backed vma,
    which triggers the BUG below:

    fs/hugetlbfs/inode.c
    443 if (unlikely(page_mapped(page))) {
    444 BUG_ON(truncate_op);

    resulting in

    hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
    ------------[ cut here ]------------
    kernel BUG at fs/hugetlbfs/inode.c:444!
    Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
    CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2
    RIP: 0010:remove_inode_hugepages+0x3db/0x3e2
    ....
    Call Trace:
    hugetlbfs_evict_inode+0x1e/0x3e
    evict+0xdb/0x1af
    iput+0x1a2/0x1f7
    dentry_unlink_inode+0xc6/0xf0
    __dentry_kill+0xd8/0x18d
    dput+0x1b5/0x1ed
    __fput+0x18b/0x216
    ____fput+0xe/0x10
    task_work_run+0x90/0xa7
    exit_to_usermode_loop+0xdd/0x116
    do_syscall_64+0x187/0x1ae
    entry_SYSCALL_64_after_hwframe+0x150/0x0
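
    The fallback that causes the wrong page size, and the shape of the
    fix, can be modelled in a few lines of userspace C. All names and
    sizes below are hypothetical simplifications: backing_ops stands in
    for the vm_ops that shm keeps for the underlying file, and 2 MiB is
    an assumed huge page size.

```c
#include <stddef.h>

#define SK_PAGE_SIZE  4096UL
#define SK_HPAGE_SIZE (2UL * 1024 * 1024)	/* assumed huge page size */

struct vma;
struct vm_ops {
	unsigned long (*pagesize)(struct vma *vma);
};
struct vma {
	const struct vm_ops *ops;		/* ops installed on the vma */
	const struct vm_ops *backing_ops;	/* ops of the backing file */
};

/* Falls back to the base page size when no ->pagesize is provided. */
unsigned long vma_kernel_pagesize(struct vma *vma)
{
	if (vma->ops && vma->ops->pagesize)
		return vma->ops->pagesize(vma);
	return SK_PAGE_SIZE;
}

static unsigned long hugetlb_pagesize(struct vma *vma)
{
	(void)vma;
	return SK_HPAGE_SIZE;
}
const struct vm_ops hugetlb_vm_ops = { .pagesize = hugetlb_pagesize };

/* The fix: forward the query to the backing file's ops. */
static unsigned long shm_pagesize(struct vma *vma)
{
	if (vma->backing_ops && vma->backing_ops->pagesize)
		return vma->backing_ops->pagesize(vma);
	return SK_PAGE_SIZE;
}

const struct vm_ops shm_vm_ops_before = { .pagesize = NULL };
const struct vm_ops shm_vm_ops_after  = { .pagesize = shm_pagesize };
```

    With shm_vm_ops_before installed, a hugetlbfs backed vma reports the
    base page size; with shm_vm_ops_after it reports the huge page size.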

    [jane.chu@oracle.com: relocate comment]
    Link: http://lkml.kernel.org/r/20180731044831.26036-1-jane.chu@oracle.com
    Link: http://lkml.kernel.org/r/20180727211727.5020-1-jane.chu@oracle.com
    Fixes: 05ea88608d4e13 ("mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct")
    Signed-off-by: Jane Chu
    Suggested-by: Mike Kravetz
    Reviewed-by: Mike Kravetz
    Acked-by: Davidlohr Bueso
    Acked-by: Michal Hocko
    Cc: Dan Williams
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jane Chu
     

12 Jul, 2018

2 commits


22 Jun, 2018

1 commit

  • Due to the use of rhashtables in net namespaces,
    rhashtable.h is included in lots of the kernel,
    so a small change can require a large recompilation.
    This makes development painful.

    This patch splits out rhashtable-types.h which just includes
    the major type declarations, and does not include (non-trivial)
    inline code. rhashtable.h is no longer included by anything
    in the include/ directory.
    Common include files only include rhashtable-types.h so a large
    recompilation is only triggered when that changes.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     

15 Jun, 2018

1 commit

  • Use new return type vm_fault_t for fault handler. For now, this is just
    documenting that the function returns a VM_FAULT value rather than an
    errno. Once all instances are converted, vm_fault_t will become a
    distinct type.

    Commit 1c8f422059ae ("mm: change return type to vm_fault_t")

    Link: http://lkml.kernel.org/r/20180425043413.GA21467@jordon-HP-15-Notebook-PC
    Signed-off-by: Souptick Joarder
    Reviewed-by: Matthew Wilcox
    Reviewed-by: Andrew Morton
    Acked-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Souptick Joarder
     

05 Jun, 2018

1 commit

  • Pull time/Y2038 updates from Thomas Gleixner:

    - Consolidate SysV IPC UAPI headers

    - Convert SysV IPC to the new COMPAT_32BIT_TIME mechanism

    - Cleanup the core interfaces and standardize on the ktime_get_* naming
    convention.

    - Convert the X86 platform ops to timespec64

    - Remove the ugly temporary timespec64 hack

    * 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
    x86: Convert x86_platform_ops to timespec64
    timekeeping: Add more coarse clocktai/boottime interfaces
    timekeeping: Add ktime_get_coarse_with_offset
    timekeeping: Standardize on ktime_get_*() naming
    timekeeping: Clean up ktime_get_real_ts64
    timekeeping: Remove timespec64 hack
    y2038: ipc: Redirect ipc(SEMTIMEDOP, ...) to compat_ksys_semtimedop
    y2038: ipc: Enable COMPAT_32BIT_TIME
    y2038: ipc: Use __kernel_timespec
    y2038: ipc: Report long times to user space
    y2038: ipc: Use ktime_get_real_seconds consistently
    y2038: xtensa: Extend sysvipc data structures
    y2038: powerpc: Extend sysvipc data structures
    y2038: sparc: Extend sysvipc data structures
    y2038: parisc: Extend sysvipc data structures
    y2038: mips: Extend sysvipc data structures
    y2038: arm64: Extend sysvipc compat data structures
    y2038: s390: Remove unneeded ipc uapi header files
    y2038: ia64: Remove unneeded ipc uapi header files
    y2038: alpha: Remove unneeded ipc uapi header files
    ...

    Linus Torvalds
     

26 May, 2018

2 commits

  • shmat()'s SHM_REMAP option forbids passing a nil address; this is in
    fact the very first thing we check for. Andrea reported that for
    SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
    but we need to check again if the address was rounded down to nil. As
    of this patch, such cases will return -EINVAL.
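
    A simplified userspace model of the address check follows. The flag
    values and the SHMLBA of 4096 are assumptions, shmat_check_addr() is a
    hypothetical name, and the architecture-specific alignment rules of
    the real do_shmat() are left out.

```c
#include <errno.h>

#define SK_SHMLBA    4096UL	/* assumed segment low boundary */
#define SK_SHM_RND   020000	/* round attach address down to SHMLBA */
#define SK_SHM_REMAP 040000	/* take over the region on attach */

/* Returns 0 and stores the effective address in *out, or -EINVAL. */
long shmat_check_addr(unsigned long addr, int shmflg, unsigned long *out)
{
	if (addr) {
		if (addr & (SK_SHMLBA - 1)) {
			if (!(shmflg & SK_SHM_RND))
				return -EINVAL;	/* unaligned, no rounding */
			addr &= ~(SK_SHMLBA - 1);	/* round down */
			/* The added check: rounding must not yield nil
			 * when remapping. */
			if (!addr && (shmflg & SK_SHM_REMAP))
				return -EINVAL;
		}
	} else if (shmflg & SK_SHM_REMAP) {
		return -EINVAL;	/* the initial nil check */
	}

	*out = addr;
	return 0;
}
```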

    Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
    Signed-off-by: Davidlohr Bueso
    Reported-by: Andrea Arcangeli
    Cc: Joe Lawrence
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Patch series "ipc/shm: shmat() fixes around nil-page".

    These patches fix two issues reported[1] a while back by Joe and Andrea
    around how shmat(2) behaves with nil-page.

    The first reverts a commit based on the incorrect belief that mapping
    the nil-page (address=0) with MAP_FIXED was a no-no. This is not the
    case, with the exception of SHM_REMAP, which is addressed in the second
    patch.

    I chose two patches because it is easier to backport and it explicitly
    reverts bogus behaviour. Both patches ought to be in -stable and ltp
    testcases need to be updated (the added testcase around the cve can be
    modified to just test for SHM_RND|SHM_REMAP).

    [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805

    This patch (of 2):

    Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
    worked on the idea that we should not be mapping as root addr=0 and
    MAP_FIXED. However, it was reported that this scenario is in fact
    valid, which makes the patch bogus and means it breaks userspace.

    For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
    initialization[1].

    [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
    Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
    Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
    Signed-off-by: Davidlohr Bueso
    Reported-by: Joe Lawrence
    Reported-by: Andrea Arcangeli
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

20 Apr, 2018

1 commit

  • The shmid64_ds/semid64_ds/msqid64_ds data structures have been extended
    to contain extra fields for storing the upper bits of the time stamps.
    This patch does the other half of the job and fills the new fields on
    32-bit architectures as well as 32-bit tasks running on a 64-bit kernel
    in compat mode.

    There should be no change for native 64-bit tasks.
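
    Filling a low/high field pair from a 64-bit time value can be
    sketched like this. The helper names are hypothetical; only the idea
    of a separate field for the upper bits comes from the description
    above.

```c
#include <stdint.h>

/* Lower 32 bits, stored in the existing time field. */
uint32_t time_low(int64_t t)
{
	return (uint32_t)((uint64_t)t & 0xffffffffu);
}

/* Upper 32 bits, stored in the newly added *_high field. */
uint32_t time_high(int64_t t)
{
	return (uint32_t)((uint64_t)t >> 32);
}

/* What a 64-bit-time aware reader does to recover the full value. */
int64_t time_join(uint32_t lo, uint32_t hi)
{
	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

    A reader that only knows the old field still sees the low 32 bits
    unchanged, which is why native behaviour before 2038 is unaffected.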

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

14 Apr, 2018

1 commit

  • syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
    shm_get_unmapped_area(), called via sys_remap_file_pages().

    Unfortunately it couldn't generate a reproducer, but I found a bug which
    I think caused it. When remap_file_pages() is passed a full System V
    shared memory segment, the memory is first unmapped, then a new map is
    created using the ->vm_file. Between these steps, the shm ID can be
    removed and reused for a new shm segment. But, shm_mmap() only checks
    whether the ID is currently valid before calling the underlying file's
    ->mmap(); it doesn't check whether it was reused. Thus it can use the
    wrong underlying file, one that was already freed.

    Fix this by making the "outer" shm file (the one that gets put in
    ->vm_file) hold a reference to the real shm file, and by making
    __shm_open() require that the file associated with the shm ID matches
    the one associated with the "outer" file.

    Taking the reference to the real shm file is needed to fully solve the
    problem, since otherwise sfd->file could point to a freed file, which
    then could be reallocated for the reused shm ID, causing the wrong shm
    segment to be mapped (and without the required permission checks).

    Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
    shm_mmap()") almost fixed this bug, but it didn't go far enough because
    it didn't consider the case where the shm ID is reused.

    The following program usually reproduces this bug:

    #include <stdlib.h>
    #include <sys/shm.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main()
    {
            int is_parent = (fork() != 0);
            srand(getpid());
            for (;;) {
                    int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
                    if (is_parent) {
                            void *addr = shmat(id, NULL, 0);
                            usleep(rand() % 50);
                            while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
                    } else {
                            usleep(rand() % 50);
                            shmctl(id, IPC_RMID, NULL);
                    }
            }
    }

    It causes the following NULL pointer dereference due to a 'struct file'
    being used while it's being freed. (I couldn't actually get a KASAN
    use-after-free splat like in the syzbot report. But I think it's
    possible with this bug; it would just take a more extraordinary race...)

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP NOPTI
    CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
    RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
    RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
    [...]
    Call Trace:
    file_accessed include/linux/fs.h:2063 [inline]
    shmem_mmap+0x25/0x40 mm/shmem.c:2149
    call_mmap include/linux/fs.h:1789 [inline]
    shm_mmap+0x34/0x80 ipc/shm.c:465
    call_mmap include/linux/fs.h:1789 [inline]
    mmap_region+0x309/0x5b0 mm/mmap.c:1712
    do_mmap+0x294/0x4a0 mm/mmap.c:1483
    do_mmap_pgoff include/linux/mm.h:2235 [inline]
    SYSC_remap_file_pages mm/mmap.c:2853 [inline]
    SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
    do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    [ebiggers@google.com: add comment]
    Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
    Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
    Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439@syzkaller.appspotmail.com
    Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
    Signed-off-by: Eric Biggers
    Acked-by: Kirill A. Shutemov
    Acked-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: "Eric W . Biederman"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biggers
     

12 Apr, 2018

2 commits

  • This was added by the recent "ipc/shm.c: add split function to
    shm_vm_ops", but it is not necessary.

    Reviewed-by: Mike Kravetz
    Cc: Laurent Dufour
    Cc: Dan Williams
    Cc: Michal Hocko
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Patch series "sysvipc: introduce STAT_ANY commands", v2.

    The following patches add the discussed (see [1]) new command for shm
    as well as for sems and msq, as they are subject to the same
    discrepancies in ipc object permission checks between the syscall and
    procfs. These new commands are justified in that (1) we are stuck
    with these semantics, as changing the syscall or procfs can break
    userland; and (2) some users can benefit performance-wise (with large
    numbers of shm segments, for example) from not having to parse the
    procfs interface.

    Once merged, I will submit the necessary manpage updates. But I'm thinking
    something like:

    : diff --git a/man2/shmctl.2 b/man2/shmctl.2
    : index 7bb503999941..bb00bbe21a57 100644
    : --- a/man2/shmctl.2
    : +++ b/man2/shmctl.2
    : @@ -41,6 +41,7 @@
    : .\" 2005-04-25, mtk -- noted aberrant Linux behavior w.r.t. new
    : .\" attaches to a segment that has already been marked for deletion.
    : .\" 2005-08-02, mtk: Added IPC_INFO, SHM_INFO, SHM_STAT descriptions.
    : +.\" 2018-02-13, dbueso: Added SHM_STAT_ANY description.
    : .\"
    : .TH SHMCTL 2 2017-09-15 "Linux" "Linux Programmer's Manual"
    : .SH NAME
    : @@ -242,6 +243,18 @@ However, the
    : argument is not a segment identifier, but instead an index into
    : the kernel's internal array that maintains information about
    : all shared memory segments on the system.
    : +.TP
    : +.BR SHM_STAT_ANY " (Linux-specific)"
    : +Return a
    : +.I shmid_ds
    : +structure as for
    : +.BR SHM_STAT .
    : +However, the
    : +.I shm_perm.mode
    : +is not checked for read access for
    : +.IR shmid ,
    : +resembling the behaviour of
    : +/proc/sysvipc/shm.
    : .PP
    : The caller can prevent or allow swapping of a shared
    : memory segment with the following \fIcmd\fP values:
    : @@ -287,7 +300,7 @@ operation returns the index of the highest used entry in the
    : kernel's internal array recording information about all
    : shared memory segments.
    : (This information can be used with repeated
    : -.B SHM_STAT
    : +.B SHM_STAT/SHM_STAT_ANY
    : operations to obtain information about all shared memory segments
    : on the system.)
    : A successful
    : @@ -328,7 +341,7 @@ isn't accessible.
    : \fIshmid\fP is not a valid identifier, or \fIcmd\fP
    : is not a valid command.
    : Or: for a
    : -.B SHM_STAT
    : +.B SHM_STAT/SHM_STAT_ANY
    : operation, the index value specified in
    : .I shmid
    : referred to an array slot that is currently unused.

    This patch (of 3):

    There is a permission discrepancy when consulting shm ipc object metadata
    between /proc/sysvipc/shm (0444) and the SHM_STAT shmctl command. The
    latter does permission checks for the object vs S_IRUGO. As such there can
    be cases where EACCES is returned via syscall but the info is displayed
    anyway in the procfs files.

    While this might have security implications via info leaking (albeit no
    writing to the shm metadata), this behavior goes way back and showing all
    the objects regardless of the permissions was most likely an oversight - so
    we are stuck with it. Furthermore, modifying either the syscall or the
    procfs file can cause userspace programs to break (e.g. ipcs). Some
    applications need to get this info from procfs (without root privileges),
    which can be rather slow in comparison with a syscall -- up to 500x in some
    reported cases.

    This patch introduces a new SHM_STAT_ANY command such that the shm ipc
    object permissions are ignored, and only audited instead. In addition,
    I've left the lsm security hook checks in place, as if some policy can
    block the call, then the user has no other choice than just parsing the
    procfs file.
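
    From userspace, the new command could be used roughly as follows. This
    is a sketch, assuming Linux with glibc; count_shm_segments() is a
    hypothetical helper name, and the fallback define mirrors the value in
    the Linux uapi header for C libraries that do not yet expose
    SHM_STAT_ANY. On kernels without the command the SHM_STAT_ANY calls
    simply fail and the count stays at 0.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for SHM_INFO, SHM_STAT and struct shm_info */
#endif
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_STAT_ANY
#define SHM_STAT_ANY 15         /* value from the Linux uapi header */
#endif

/*
 * Count the slots in the kernel's shm array that answer to 'cmd'
 * (SHM_STAT or SHM_STAT_ANY). Returns -1 if SHM_INFO itself fails.
 */
int count_shm_segments(int cmd)
{
	struct shm_info info;
	struct shmid_ds ds;
	int maxidx, i, count = 0;

	assert(cmd == SHM_STAT || cmd == SHM_STAT_ANY);

	/* SHM_INFO returns the index of the highest used array entry. */
	maxidx = shmctl(0, SHM_INFO, (struct shmid_ds *)&info);
	if (maxidx < 0)
		return -1;

	for (i = 0; i <= maxidx; i++) {
		/* Here the first argument is an array index, not a shmid. */
		if (shmctl(i, cmd, &ds) >= 0)
			count++;
	}
	return count;
}
```

    Comparing count_shm_segments(SHM_STAT) with
    count_shm_segments(SHM_STAT_ANY) as an unprivileged user shows how many
    segments are hidden from the syscall by the read permission check
    alone.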

    [1] https://lkml.org/lkml/2017/12/19/220

    Link: http://lkml.kernel.org/r/20180215162458.10059-2-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Michal Hocko
    Cc: Michael Kerrisk
    Cc: Manfred Spraul
    Cc: Eric W. Biederman
    Cc: Kees Cook
    Cc: Robert Kettler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

04 Apr, 2018

1 commit

  • Pull namespace updates from Eric Biederman:
    "There was a lot of work this cycle fixing bugs that were discovered
    after the merge window and getting everything ready where we can
    reasonably support fully unprivileged fuse. The bug fixes you already
    have and much of the unprivileged fuse work is coming in via other
    trees.

    Still left for fully unprivileged fuse is figuring out how to cleanly
    handle .set_acl and .get_acl in the legacy case, and properly handling
    of evm xattrs on unprivileged mounts.

    Included in the tree is a cleanup from Alexey that replaced a linked
    list with a statically allocated fixed-size array for the pid caches,
    which simplifies and speeds things up.

    Then there are some cleanups and fixes for the ipc namespace. The
    motivation was that in reviewing other code it was discovered that
    accessing ipc objects from different pid namespaces recorded pids in
    such a way that when asked the wrong pids were returned. In the worst case
    there has been a measured 30% performance impact for sysvipc
    semaphores. Other test cases showed no measurable performance impact.
    Manfred Spraul and Davidlohr Bueso who tend to work on sysvipc
    performance both gave the nod that this is good enough.

    Casey Schaufler and James Morris have given their approval to the LSM
    side of the changes.

    I simplified the types and the code dealing with sysvipc to pass just
    kern_ipc_perm for all three types of ipc, which reduced the header
    dependencies throughout the kernel and simplified the lsm code.

    This let me work on the pid fixes without having to worry about
    trivial changes causing complete kernel recompiles"

    * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    ipc/shm: Fix pid freeing.
    ipc/shm: fix up for struct file no longer being available in shm.h
    ipc/smack: Tidy up from the change in type of the ipc security hooks
    ipc: Directly call the security hook in ipc_ops.associate
    ipc/sem: Fix semctl(..., GETPID, ...) between pid namespaces
    ipc/msg: Fix msgctl(..., IPC_STAT, ...) between pid namespaces
    ipc/shm: Fix shmctl(..., IPC_STAT, ...) between pid namespaces.
    ipc/util: Helpers for making the sysvipc operations pid namespace aware
    ipc: Move IPCMNI from include/ipc.h into ipc/util.h
    msg: Move struct msg_queue into ipc/msg.c
    shm: Move struct shmid_kernel into ipc/shm.c
    sem: Move struct sem and struct sem_array into ipc/sem.c
    msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue security hooks
    shm/security: Pass kern_ipc_perm not shmid_kernel into the shm security hooks
    sem/security: Pass kern_ipc_perm not sem_array into the sem security hooks
    pidns: simpler allocation of pid_* caches

    Linus Torvalds
     

03 Apr, 2018

4 commits

  • Pull removal of in-kernel calls to syscalls from Dominik Brodowski:
    "System calls are interaction points between userspace and the kernel.
    Therefore, system call functions such as sys_xyzzy() or
    compat_sys_xyzzy() should only be called from userspace via the
    syscall table, but not from elsewhere in the kernel.

    At least on 64-bit x86, it will likely be a hard requirement from
    v4.17 onwards to not call system call functions in the kernel: It is
    better to use a different calling convention for system calls
    there, where struct pt_regs is decoded on-the-fly in a syscall wrapper
    which then hands processing over to the actual syscall function. This
    means that only those parameters which are actually needed for a
    specific syscall are passed on during syscall entry, instead of
    filling in six CPU registers with random user space content all the
    time (which may cause serious trouble down the call chain). Those
    x86-specific patches will be pushed through the x86 tree in the near
    future.

    Moreover, rules on how data may be accessed may differ between kernel
    data and user data. This is another reason why calling sys_xyzzy() is
    generally a bad idea, and -- at most -- acceptable in arch-specific
    code.

    This patchset removes all in-kernel calls to syscall functions in the
    kernel with the exception of arch/. On top of this, it cleans up the
    three places where many syscalls are referenced or prototyped, namely
    kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h"

    * 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits)
    bpf: whitelist all syscalls for error injection
    kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
    kernel/sys_ni: sort cond_syscall() entries
    syscalls/x86: auto-create compat_sys_*() prototypes
    syscalls: sort syscall prototypes in include/linux/compat.h
    net: remove compat_sys_*() prototypes from net/compat.h
    syscalls: sort syscall prototypes in include/linux/syscalls.h
    kexec: move sys_kexec_load() prototype to syscalls.h
    x86/sigreturn: use SYSCALL_DEFINE0
    x86: fix sys_sigreturn() return type to be long, not unsigned long
    x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
    mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
    mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
    mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
    fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
    fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
    fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
    fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
    kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
    kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
    ...

    Linus Torvalds
     
  • Provide ksys_shmctl() and compat_ksys_shmctl() wrappers to avoid in-kernel
    calls to these syscalls. The ksys_ prefix denotes that these functions are
    meant as a drop-in replacement for the syscalls. In particular, they use
    the same calling convention as sys_shmctl() and compat_sys_shmctl().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
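
    The split between a syscall entry point and a ksys_ helper can be
    pictured with a small userspace model. All names here (toy_regs,
    toy_ksys_shmctl(), toy_sys_shmctl()) are made up for illustration, and
    the helper body is a dummy computation; the real wrappers live in the
    kernel and look different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register snapshot; the field names are illustrative only. */
struct toy_regs { long arg0, arg1, arg2, arg3, arg4, arg5; };

/*
 * Stand-in for a ksys_*() helper: ordinary C arguments, callable from
 * anywhere in the (toy) kernel. The body is a dummy computation.
 */
long toy_ksys_shmctl(int shmid, int cmd, void *buf)
{
	return buf ? (long)shmid * 100 + cmd : -1;
}

/*
 * Stand-in for the syscall entry point: decodes only the three
 * registers this call needs and ignores the rest.
 */
long toy_sys_shmctl(const struct toy_regs *regs)
{
	assert(regs != NULL);
	return toy_ksys_shmctl((int)regs->arg0, (int)regs->arg1,
			       (void *)regs->arg2);
}
```

    In-kernel users would call the helper directly with typed arguments,
    which leaves the entry point free to change how it receives its
    arguments.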

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Provide ksys_shmdt() wrapper to avoid in-kernel calls to this syscall.
    The ksys_ prefix denotes that this function is meant as a drop-in
    replacement for the syscall. In particular, it uses the same calling
    convention as sys_shmdt().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Provide ksys_shmget() wrapper to avoid in-kernel calls to this syscall.
    The ksys_ prefix denotes that this function is meant as a drop-in
    replacement for the syscall. In particular, it uses the same calling
    convention as sys_shmget().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     

29 Mar, 2018

2 commits

  • If System V shmget/shmat operations are used to create a hugetlbfs
    backed mapping, it is possible to munmap part of the mapping and split
    the underlying vma such that it is not huge page aligned. This will
    ultimately result in the following BUG:

    kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE SMP NR_CPUS=2048 NUMA PowerNV
    Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
    CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G C E 4.15.0-10-generic #11-Ubuntu
    NIP: c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
    REGS: c000003fbcdcf810 TRAP: 0700 Tainted: G C E (4.15.0-10-generic)
    MSR: 9000000000029033 CR: 24002222 XER: 20040000
    CFAR: c00000000036ee44 SOFTE: 1
    NIP __unmap_hugepage_range+0xa4/0x760
    LR __unmap_hugepage_range_final+0x28/0x50
    Call Trace:
    0x7115e4e00000 (unreliable)
    __unmap_hugepage_range_final+0x28/0x50
    unmap_single_vma+0x11c/0x190
    unmap_vmas+0x94/0x140
    exit_mmap+0x9c/0x1d0
    mmput+0xa8/0x1d0
    do_exit+0x360/0xc80
    do_group_exit+0x60/0x100
    SyS_exit_group+0x24/0x30
    system_call+0x58/0x6c
    ---[ end trace ee88f958a1c62605 ]---

    This bug was introduced by commit 31383c6865a5 ("mm, hugetlbfs:
    introduce ->split() to vm_operations_struct"). A split function was
    added to vm_operations_struct to determine if a mapping can be split.
    This was mostly for device-dax and hugetlbfs mappings which have
    specific alignment constraints.

    Mappings initiated via shmget/shmat have their original vm_ops
    overwritten with shm_vm_ops. shm_vm_ops functions will call back to the
    original vm_ops if needed. Add such a split function to shm_vm_ops.
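
    The delegation pattern can be sketched in userspace as below. This is
    a simplified model, not the patch itself: the toy_* types and
    functions are hypothetical, and the alignment rule is reduced to a
    single power-of-two page size stored in the toy vma.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_vma;

/* Toy operations table; only the split hook is modelled. */
struct toy_vm_ops {
	int (*split)(struct toy_vma *vma, unsigned long addr);
};

struct toy_vma {
	const struct toy_vm_ops *ops;        /* what the mm layer calls */
	const struct toy_vm_ops *inner_ops;  /* original ops, saved at attach */
	unsigned long huge_page_size;        /* power of two */
};

/* Backing-store rule: a split is allowed only on a huge page boundary. */
static int toy_hugetlb_split(struct toy_vma *vma, unsigned long addr)
{
	return (addr & (vma->huge_page_size - 1)) ? -EINVAL : 0;
}

/* Wrapper rule: defer to the original ops when they define split. */
static int toy_shm_split(struct toy_vma *vma, unsigned long addr)
{
	assert(vma != NULL);
	if (vma->inner_ops && vma->inner_ops->split)
		return vma->inner_ops->split(vma, addr);
	return 0;
}

const struct toy_vm_ops toy_hugetlb_ops = { .split = toy_hugetlb_split };
const struct toy_vm_ops toy_shm_ops = { .split = toy_shm_split };

/* What a caller about to split a mapping at 'addr' would ask. */
int toy_may_split(struct toy_vma *vma, unsigned long addr)
{
	return (vma->ops && vma->ops->split) ? vma->ops->split(vma, addr) : 0;
}
```

    Without a split hook in the wrapper table, the backing store's
    alignment rule would never be consulted, which is the gap this patch
    closes.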

    Link: http://lkml.kernel.org/r/20180321161314.7711-1-mike.kravetz@oracle.com
    Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
    Signed-off-by: Mike Kravetz
    Reported-by: Laurent Dufour
    Reviewed-by: Laurent Dufour
    Tested-by: Laurent Dufour
    Reviewed-by: Dan Williams
    Acked-by: Michal Hocko
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • The 0day kernel test robot reported an oops:
    >
    > IP: put_pid+0x22/0x5c
    > PGD 19efa067 P4D 19efa067 PUD 0
    > Oops: 0000 [#1]
    > CPU: 0 PID: 727 Comm: trinity Not tainted 4.16.0-rc2-00010-g98f929b #1
    > RIP: 0010:put_pid+0x22/0x5c
    > RSP: 0018:ffff986719f73e48 EFLAGS: 00010202
    > RAX: 00000006d765f710 RBX: ffff98671a4fa4d0 RCX: ffff986719f73d40
    > RDX: 000000006f6e6125 RSI: 0000000000000000 RDI: ffffffffa01e6d21
    > RBP: ffffffffa0955fe0 R08: 0000000000000020 R09: 0000000000000000
    > R10: 0000000000000078 R11: ffff986719f73e76 R12: 0000000000001000
    > R13: 00000000ffffffea R14: 0000000054000fb0 R15: 0000000000000000
    > FS: 00000000028c2880(0000) GS:ffffffffa06ad000(0000) knlGS:0000000000000000
    > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    > CR2: 0000000677846439 CR3: 0000000019fc1005 CR4: 00000000000606b0
    > Call Trace:
    > ? ipc_update_pid+0x36/0x3e
    > ? newseg+0x34c/0x3a6
    > ? ipcget+0x5d/0x528
    > ? entry_SYSCALL_64_after_hwframe+0x52/0xb7
    > ? SyS_shmget+0x5a/0x84
    > ? do_syscall_64+0x194/0x1b3
    > ? entry_SYSCALL_64_after_hwframe+0x42/0xb7
    > Code: ff 05 e7 20 9b 03 58 c9 c3 48 ff 05 85 21 9b 03 48 85 ff 74 4f 8b 47 04 8b 17 48 ff 05 7c 21 9b 03 48 83 c0 03 48 c1 e0 04 ff ca 8b 44 07 08 74 1f 48 ff 05 6c 21 9b 03 ff 0f 0f 94 c2 48 ff
    > RIP: put_pid+0x22/0x5c RSP: ffff986719f73e48
    > CR2: 0000000677846439
    > ---[ end trace ab8c5cb4389d37c5 ]---
    > Kernel panic - not syncing: Fatal exception

    In newseg, when changing shm_cprid and shm_lprid from pid_t to struct
    pid*, I misread the kvmalloc as kvzalloc and thought shp was
    initialized to 0. As that is not the case, it is not safe for the
    error handling to address shm_cprid and shm_lprid before they are
    initialized.

    Therefore move the cleanup of shm_cprid and shm_lprid from the no_file
    error cleanup path to the no_id error cleanup path, ensuring that an
    early error exit won't cause the oops above.
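
    The ordering rule can be shown with a userspace sketch in which
    malloc() plays the role of kvmalloc(). The toy_* names and the fail_at
    parameter are hypothetical, and unlike the kernel function the two
    labels here fall through to a shared free().

```c
#include <assert.h>
#include <stdlib.h>

struct toy_pid { int refs; };
struct toy_seg { struct toy_pid *cprid, *lprid; };

static void toy_put_pid(struct toy_pid *pid)
{
	if (pid)
		pid->refs--;
}

/*
 * fail_at: 0 = succeed, 1 = fail before the pid fields are set,
 * 2 = fail after they are set. On success *out owns the segment.
 */
int toy_newseg(int fail_at, struct toy_pid *creator, struct toy_seg **out)
{
	/* malloc(), like kvmalloc(), does not zero the allocation. */
	struct toy_seg *shp = malloc(sizeof(*shp));

	assert(creator != NULL && out != NULL);
	if (!shp)
		return -1;
	if (fail_at == 1)
		goto no_file;           /* cprid/lprid are still garbage here */

	shp->cprid = creator;
	creator->refs++;
	shp->lprid = NULL;
	if (fail_at == 2)
		goto no_id;

	*out = shp;
	return 0;

no_id:
	/* Only reached after initialisation, so the fields are safe to read. */
	toy_put_pid(shp->cprid);
	toy_put_pid(shp->lprid);
no_file:
	free(shp);
	return -1;
}
```

    Putting the toy_put_pid() calls under no_file instead would read
    uninitialised pointers on the early failure path, which is the shape
    of the oops above.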

    Reported-by: kernel test robot
    Reviewed-by: Nagarathnam Muthusamy
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

28 Mar, 2018

1 commit