05 Nov, 2018

1 commit


04 Nov, 2018

3 commits

  • Pull x86 fixes from Ingo Molnar:
    "A number of fixes and some late updates:

    - make in_compat_syscall() behavior on x86-32 similar to other
    platforms; this touches a number of generic files but is not
    intended to impact non-x86 platforms.

    - objtool fixes

    - PAT preemption fix

    - paravirt fixes/cleanups

    - cpufeatures updates for new instructions

    - earlyprintk quirk

    - make microcode version in sysfs world-readable (it is already
    world-readable in procfs)

    - minor cleanups and fixes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    compat: Cleanup in_compat_syscall() callers
    x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
    objtool: Support GCC 9 cold subfunction naming scheme
    x86/numa_emulation: Fix uniform-split numa emulation
    x86/paravirt: Remove unused _paravirt_ident_32
    x86/mm/pat: Disable preemption around __flush_tlb_all()
    x86/paravirt: Remove GPL from pv_ops export
    x86/traps: Use format string with panic() call
    x86: Clean up 'sizeof x' => 'sizeof(x)'
    x86/cpufeatures: Enumerate MOVDIR64B instruction
    x86/cpufeatures: Enumerate MOVDIRI instruction
    x86/earlyprintk: Add a force option for pciserial device
    objtool: Support per-function rodata sections
    x86/microcode: Make revision and processor flags world-readable

    Linus Torvalds
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Pull 9p fix from Al Viro:
    "Regression fix for net/9p handling of iov_iter; broken by braino when
    switching to iov_iter_is_kvec() et al., spotted and fixed by Marc"

    * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    iov_iter: Fix 9p virtio breakage

    Linus Torvalds
     

03 Nov, 2018

1 commit

  • When switching to the new iovec accessors, a negation got subtly
    dropped, leading to 9p being remarkably broken (here with kvmtool):

    [ 7.430941] VFS: Mounted root (9p filesystem) on device 0:15.
    [ 7.432080] devtmpfs: mounted
    [ 7.432717] Freeing unused kernel memory: 1344K
    [ 7.433658] Run /virt/init as init process
    Warning: unable to translate guest address 0x7e00902ff000 to host
    Warning: unable to translate guest address 0x7e00902fefc0 to host
    Warning: unable to translate guest address 0x7e00902ff000 to host
    Warning: unable to translate guest address 0x7e008febef80 to host
    Warning: unable to translate guest address 0x7e008febf000 to host
    Warning: unable to translate guest address 0x7e008febef00 to host
    Warning: unable to translate guest address 0x7e008febf000 to host
    [ 7.436376] Kernel panic - not syncing: Requested init /virt/init failed (error -8).
    [ 7.437554] CPU: 29 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc8-02267-g00e23707442a #291
    [ 7.439006] Hardware name: linux,dummy-virt (DT)
    [ 7.439902] Call trace:
    [ 7.440387] dump_backtrace+0x0/0x148
    [ 7.441104] show_stack+0x14/0x20
    [ 7.441768] dump_stack+0x90/0xb4
    [ 7.442425] panic+0x120/0x27c
    [ 7.443036] kernel_init+0xa4/0x100
    [ 7.443725] ret_from_fork+0x10/0x18
    [ 7.444444] SMP: stopping secondary CPUs
    [ 7.445391] Kernel Offset: disabled
    [ 7.446169] CPU features: 0x0,23000438
    [ 7.446974] Memory Limit: none
    [ 7.447645] ---[ end Kernel panic - not syncing: Requested init /virt/init failed (error -8). ]---

    Restoring the missing "!" brings the guest back to life.

    Fixes: 00e23707442a ("iov_iter: Use accessor function")
    Reported-by: Will Deacon
    Signed-off-by: Marc Zyngier
    Signed-off-by: Al Viro

    Marc Zyngier
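    The bug class is easy to see in a hypothetical, heavily simplified model. The type and function names below are illustrative only, not the real net/9p or iov_iter code. When the "!" is dropped, every routing decision is inverted, so in this model user-backed buffers are sent down the kernel-address path. That is consistent with the guest-address translation warnings in the log above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified iterator kinds; the real iov_iter has more. */
enum model_iter_type { MODEL_ITER_IOVEC, MODEL_ITER_KVEC };

struct model_iter {
	enum model_iter_type type;
};

static bool model_iter_is_kvec(const struct model_iter *i)
{
	return i->type == MODEL_ITER_KVEC;
}

enum model_path { MODEL_PIN_USER_PAGES, MODEL_USE_KERNEL_ADDRESSES };

/* Correct: only non-kvec (user-backed) iterators need their pages pinned. */
static enum model_path choose_path(const struct model_iter *i)
{
	if (!model_iter_is_kvec(i))
		return MODEL_PIN_USER_PAGES;
	return MODEL_USE_KERNEL_ADDRESSES;
}

/* Same logic with the "!" dropped: each decision is inverted. */
static enum model_path choose_path_buggy(const struct model_iter *i)
{
	if (model_iter_is_kvec(i))
		return MODEL_PIN_USER_PAGES;
	return MODEL_USE_KERNEL_ADDRESSES;
}
```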
     

02 Nov, 2018

5 commits

  • Pull AFS updates from Al Viro:
    "AFS series, with some iov_iter bits included"

    * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    missing bits of "iov_iter: Separate type from direction and use accessor functions"
    afs: Probe multiple fileservers simultaneously
    afs: Fix callback handling
    afs: Eliminate the address pointer from the address list cursor
    afs: Allow dumping of server cursor on operation failure
    afs: Implement YFS support in the fs client
    afs: Expand data structure fields to support YFS
    afs: Get the target vnode in afs_rmdir() and get a callback on it
    afs: Calc callback expiry in op reply delivery
    afs: Fix FS.FetchStatus delivery from updating wrong vnode
    afs: Implement the YFS cache manager service
    afs: Remove callback details from afs_callback_break struct
    afs: Commit the status on a new file/dir/symlink
    afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
    afs: Don't invoke the server to read data beyond EOF
    afs: Add a couple of tracepoints to log I/O errors
    afs: Handle EIO from delivery function
    afs: Fix TTL on VL server and address lists
    afs: Implement VL server rotation
    afs: Improve FS server rotation error handling
    ...

    Linus Torvalds
     
  • The sunrpc patches from the nfs tree conflict with the calling-convention
    change made in the iov_iter work. Trivial fixup.

    Signed-off-by: Al Viro

    Al Viro
     
  • backmerge to do fixup of iov_iter_kvec() conflict

    Al Viro
     
  • The seq_send & seq_send64 fields in struct krb5_ctx are used as
    atomically incrementing counters. This is implemented using cmpxchg() &
    cmpxchg64() to implement what amount to custom versions of
    atomic_fetch_inc() & atomic64_fetch_inc().

    Besides the duplication, using cmpxchg64() has another major drawback in
    that some 32 bit architectures don't provide it. As such, commit
    571ed1fd2390 ("SUNRPC: Replace krb5_seq_lock with a lockless scheme")
    resulted in build failures for some architectures.

    Change seq_send to be an atomic_t and seq_send64 to be an atomic64_t,
    then use atomic(64)_* functions to manipulate the values. The atomic64_t
    type & associated functions are provided even on architectures which
    lack real 64 bit atomic memory access via CONFIG_GENERIC_ATOMIC64 which
    uses spinlocks to serialize access. This fixes the build failures for
    architectures lacking cmpxchg64().

    A potential alternative that was raised would be to provide cmpxchg64()
    on the 32 bit architectures that currently lack it, using spinlocks.
    However this would provide a version of cmpxchg64() with semantics a
    little different to the implementations on architectures with real 64
    bit atomics - the spinlock-based implementation would only work if all
    access to the memory used with cmpxchg64() is *always* performed using
    cmpxchg64(). That is not currently a requirement for users of
    cmpxchg64(), and making it one seems questionable. As such avoiding
    cmpxchg64() outside of architecture-specific code seems best,
    particularly in cases where atomic64_t seems like a better fit anyway.

    The CONFIG_GENERIC_ATOMIC64 implementation of atomic64_* functions will
    use spinlocks & so faces the same issue, but with the key difference
    that the memory backing an atomic64_t ought to always be accessed via
    the atomic64_* functions anyway, making the issue moot.

    Signed-off-by: Paul Burton
    Fixes: 571ed1fd2390 ("SUNRPC: Replace krb5_seq_lock with a lockless scheme")
    Cc: Trond Myklebust
    Cc: Anna Schumaker
    Cc: J. Bruce Fields
    Cc: Jeff Layton
    Cc: David S. Miller
    Cc: linux-nfs@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Trond Myklebust

    Paul Burton
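    The following user-space sketch shows the two patterns side by side. It uses C11 <stdatomic.h> as a stand-in for the kernel's cmpxchg64() and atomic64_fetch_inc(). The kernel APIs differ in names and in memory-ordering details. The first function is a hand-rolled fetch-and-increment built from a compare-and-exchange retry loop. The second uses a dedicated atomic fetch-and-add.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hand-rolled fetch-and-increment: retry compare-and-exchange until no
 * other thread changed the value between our read and our update. */
static uint64_t fetch_inc_cmpxchg(_Atomic uint64_t *ctr)
{
	uint64_t old = atomic_load(ctr);

	/* On failure, 'old' is refreshed with the current value; retry. */
	while (!atomic_compare_exchange_weak(ctr, &old, old + 1))
		;
	return old;
}

/* Dedicated primitive: same result, no open-coded retry loop. */
static uint64_t fetch_inc_atomic(_Atomic uint64_t *ctr)
{
	return atomic_fetch_add(ctr, 1);
}
```

    On 32-bit targets without native 64-bit atomics, a C11 toolchain may route the 64-bit case through a lock-based runtime library. This is loosely analogous to the CONFIG_GENERIC_ATOMIC64 trade-off described above.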
     
  • Pull networking fixes from David Miller:

    1) BPF verifier fixes from Daniel Borkmann.

    2) HNS driver fixes from Huazhong Tan.

    3) FDB only works for Ethernet devices; reject attempts to install FDB
    rules for others. From Ido Schimmel.

    4) Fix spectre V1 in vhost, from Jason Wang.

    5) Don't pass on-stack object to irq_set_affinity_hint() in mvpp2
    driver, from Marc Zyngier.

    6) Fix mlx5e checksum handling when RXFCS is enabled, from Eric
    Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits)
    openvswitch: Fix push/pop ethernet validation
    net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
    bpf: test make sure to run unpriv test cases in test_verifier
    bpf: add various test cases to test_verifier
    bpf: don't set id on after map lookup with ptr_to_map_val return
    bpf: fix partial copy of map_ptr when dst is scalar
    libbpf: Fix compile error in libbpf_attach_type_by_name
    kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists
    selftests: mlxsw: qos_mc_aware: Add a test for UC awareness
    selftests: mlxsw: qos_mc_aware: Tweak for min shaper
    mlxsw: spectrum: Set minimum shaper on MC TCs
    mlxsw: reg: QEEC: Add minimum shaper fields
    net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset()
    net: hns3: bugfix for rtnl_lock's range in the hclge_reset()
    net: hns3: bugfix for handling mailbox while the command queue reinitialized
    net: hns3: fix incorrect return value/type of some functions
    net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read
    net: hns3: bugfix for is_valid_csq_clean_head()
    net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring()
    net: hns3: bugfix for the initialization of command queue's spin lock
    ...

    Linus Torvalds
     

01 Nov, 2018

6 commits

  • Now that in_compat_syscall() is consistent on all architectures and
    no longer reports true on native i686, the workarounds (ifdeffery and
    helpers) can be removed.

    Signed-off-by: Dmitry Safonov
    Signed-off-by: Thomas Gleixner
    Cc: Dmitry Safonov
    Cc: Ard Biesheuvel
    Cc: Andy Lutomirsky
    Cc: "David S. Miller"
    Cc: Herbert Xu
    Cc: "H. Peter Anvin"
    Cc: John Stultz
    Cc: "Kirill A. Shutemov"
    Cc: Oleg Nesterov
    Cc: Steffen Klassert
    Cc: Stephen Boyd
    Cc: Steven Rostedt
    Cc: linux-efi@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181012134253.23266-3-dima@arista.com

    Dmitry Safonov
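    Here is a minimal user-space model of what the cleanup relies on. It assumes a hypothetical MODEL_CONFIG_COMPAT switch in place of the kernel's CONFIG_COMPAT and a hypothetical caller, user_long_size(). It is not the kernel's actual header. Once a build without compat support always answers false, callers can use a single code path instead of x86-32-specific ifdeffery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_COMPAT. Left undefined here, which
 * models a native x86-32 (or any other !COMPAT) build. */
/* #define MODEL_CONFIG_COMPAT 1 */

#ifdef MODEL_CONFIG_COMPAT
/* Hypothetical: a 64-bit kernel would derive this from how the current
 * syscall entered the kernel. */
static bool model_entered_via_32bit_abi;

static inline bool in_compat_syscall(void)
{
	return model_entered_via_32bit_abi;
}
#else
/* Generic !COMPAT behaviour: there is no second ABI, so no syscall is a
 * compat syscall, including on a native 32-bit kernel. */
static inline bool in_compat_syscall(void)
{
	return false;
}
#endif

/* A caller after the cleanup: one code path, no CONFIG_X86_32 special
 * case. Returns the size of a userspace 'long' for the current syscall. */
static size_t user_long_size(void)
{
	return in_compat_syscall() ? 4 : sizeof(long);
}
```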
     
  • When there are both pop and push ethernet header actions among the
    actions to be applied to a packet, an unexpected EINVAL (Invalid
    argument) error is returned. This is due to mac_proto not being reset
    correctly when those actions are validated.

    Reported-at:
    https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047554.html
    Fixes: 91820da6ae85 ("openvswitch: add Ethernet push and pop actions")
    Signed-off-by: Jaime Caamaño Ruiz
    Tested-by: Greg Rose
    Reviewed-by: Greg Rose
    Signed-off-by: David S. Miller

    Jaime Caamaño Ruiz
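    The validation idea can be shown with a hypothetical, heavily simplified model. The names are illustrative, not the openvswitch code. Each push or pop action updates the tracked header state, so the next action is checked against the packet as it will look at that point. Without those updates, a pop followed by a push would be checked against a stale "has Ethernet header" state and rejected with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model of action validation. */
enum model_mac_proto { MODEL_MAC_NONE, MODEL_MAC_ETHERNET };
enum model_action { MODEL_PUSH_ETH, MODEL_POP_ETH };

static int validate_actions(const enum model_action *acts, size_t n,
			    enum model_mac_proto mac_proto)
{
	for (size_t i = 0; i < n; i++) {
		switch (acts[i]) {
		case MODEL_POP_ETH:
			if (mac_proto != MODEL_MAC_ETHERNET)
				return -EINVAL; /* nothing to pop */
			/* From here on the packet has no Ethernet header. */
			mac_proto = MODEL_MAC_NONE;
			break;
		case MODEL_PUSH_ETH:
			if (mac_proto == MODEL_MAC_ETHERNET)
				return -EINVAL; /* header already present */
			/* From here on the packet has an Ethernet header. */
			mac_proto = MODEL_MAC_ETHERNET;
			break;
		}
	}
	return 0;
}
```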
     
  • Jeff Kirsher says:

    ====================
    Intel Wired LAN Driver Updates 2018-10-31

    This series contains a collection of various fixes.

    Miroslav Lichvar from Red Hat (or should I say IBM now?) updates the
    PHC timecounter interval for igb so that it gets updated at least
    once every 550 seconds.

    Ngai-Mint provides a fix for fm10k to prevent a soft lockup or system
    crash by adding a new condition to determine if the SM mailbox is in the
    correct state before proceeding.

    Jake provides several fm10k fixes. The first marks completer aborts as
    non-fatal, since on some platforms they would otherwise trigger machine
    check errors. He also adds missing device IDs to the in-kernel driver
    and, due to the recent fixes, bumps the driver version.

    I (Jeff Kirsher) fixed an XFRM_ALGO dependency for both ixgbe and
    ixgbevf. This fix was based on the original work from Arnd Bergmann,
    which only fixed ixgbe.

    Mitch provides a fix for i40e/avf to update the status codes, which
    resolves a mismatch between i40e and the iavf driver, which also
    supports the ice LAN driver.

    Radoslaw fixes ixgbe, where the driver logged a message about
    spoofed packets being detected when the VF is restarted with a
    different MAC address.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull ceph updates from Ilya Dryomov:
    "The highlights are:

    - a series that fixes some old memory allocation issues in libceph
    (myself). We no longer allocate memory in places where allocation
    failures cannot be handled and BUG when the allocation fails.

    - support for copy_file_range() syscall (Luis Henriques). If size and
    alignment conditions are met, it leverages RADOS copy-from
    operation. Otherwise, a local copy is performed.

    - a patch that reduces memory requirement of ceph_sync_read() from
    the size of the entire read to the size of one object (Zheng Yan).

    - fallocate() syscall is now restricted to FALLOC_FL_PUNCH_HOLE (Luis
    Henriques)"

    * tag 'ceph-for-4.20-rc1' of git://github.com/ceph/ceph-client: (25 commits)
    ceph: new mount option to disable usage of copy-from op
    ceph: support copy_file_range file operation
    libceph: support the RADOS copy-from operation
    ceph: add non-blocking parameter to ceph_try_get_caps()
    libceph: check reply num_data_items in setup_request_data()
    libceph: preallocate message data items
    libceph, rbd, ceph: move ceph_osdc_alloc_messages() calls
    libceph: introduce alloc_watch_request()
    libceph: assign cookies in linger_submit()
    libceph: enable fallback to ceph_msg_new() in ceph_msgpool_get()
    ceph: num_ops is off by one in ceph_aio_retry_work()
    libceph: no need to call osd_req_opcode_valid() in osd_req_encode_op()
    ceph: set timeout conditionally in __cap_delay_requeue
    libceph: don't consume a ref on pagelist in ceph_msg_data_add_pagelist()
    libceph: introduce ceph_pagelist_alloc()
    libceph: osd_req_op_cls_init() doesn't need to take opcode
    libceph: bump CEPH_MSG_MAX_DATA_LEN
    ceph: only allow punch hole mode in fallocate
    ceph: refactor ceph_sync_read()
    ceph: check if LOOKUPNAME request was aborted when filling trace
    ...

    Linus Torvalds
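    The following user-space sketch calls copy_file_range(2), the syscall the new CephFS support serves. It assumes glibc 2.27 or later, where the wrapper is declared under _GNU_SOURCE. The kernel decides whether CephFS uses the RADOS copy-from path or a local copy, so the caller only has to cope with the call being refused. The set of errno values treated as "fall back to read/write" is an assumption for this sketch. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy up to 'len' bytes from fd_in to fd_out at their current offsets.
 * Try copy_file_range(2) first so the filesystem can do the copy; if the
 * call is refused for this pair of files, continue with read/write.
 * Returns the number of bytes copied, or -1 on error. */
static ssize_t copy_range_or_fallback(int fd_in, int fd_out, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
					    len - done, 0);
		if (n == 0)
			return (ssize_t)done; /* EOF on fd_in */
		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		/* Assumed "unsupported here" errors: use the fallback. */
		if (errno == ENOSYS || errno == EXDEV || errno == EOPNOTSUPP ||
		    errno == EINVAL || errno == EPERM)
			break;
		return -1;
	}

	char buf[4096];
	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t r = read(fd_in, buf, want);

		if (r < 0)
			return -1;
		if (r == 0)
			break; /* EOF */
		for (ssize_t off = 0; off < r;) {
			ssize_t w = write(fd_out, buf + off, (size_t)(r - off));
			if (w < 0)
				return -1;
			off += w;
		}
		done += (size_t)r;
	}
	return (ssize_t)done;
}
```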
     
  • Based on the original work from Arnd Bergmann.

    When XFRM_ALGO is not enabled, the new ixgbe IPsec code produces a
    link error:

    drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.o: In function `ixgbe_ipsec_vf_add_sa':
    ixgbe_ipsec.c:(.text+0x1266): undefined reference to `xfrm_aead_get_byname'

    Simply selecting XFRM_ALGO from here causes circular dependencies, so
    to fix it, we probably want this slightly more complex solution that is
    similar to what other drivers with XFRM offload do:

    A separate Kconfig symbol now controls whether we include the IPsec
    offload code. To keep the old behavior, this is left as 'default y'. The
    dependency in XFRM_OFFLOAD still causes a circular dependency but is
    not actually needed because this symbol is not user visible, so removing
    that dependency on top makes it all work.

    CC: Arnd Bergmann
    CC: Shannon Nelson
    Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
    Signed-off-by: Jeff Kirsher
    Tested-by: Andrew Bowers

    Jeff Kirsher
     
  • Merge more updates from Andrew Morton:

    - the rest of MM

    - lib/bitmap updates

    - hfs updates

    - fatfs updates

    - various other misc things

    * emailed patches from Andrew Morton : (94 commits)
    mm/gup.c: fix __get_user_pages_fast() comment
    mm: Fix warning in insert_pfn()
    memory-hotplug.rst: add some details about locking internals
    powerpc/powernv: hold device_hotplug_lock when calling memtrace_offline_pages()
    powerpc/powernv: hold device_hotplug_lock when calling device_online()
    mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
    mm/memory_hotplug: make add_memory() take the device_hotplug_lock
    mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
    mm/memblock.c: warn if zero alignment was requested
    memblock: stop using implicit alignment to SMP_CACHE_BYTES
    docs/boot-time-mm: remove bootmem documentation
    mm: remove include/linux/bootmem.h
    memblock: replace BOOTMEM_ALLOC_* with MEMBLOCK variants
    mm: remove nobootmem
    memblock: rename __free_pages_bootmem to memblock_free_pages
    memblock: rename free_all_bootmem to memblock_free_all
    memblock: replace free_bootmem_late with memblock_free_late
    memblock: replace free_bootmem{_node} with memblock_free
    mm: nobootmem: remove bootmem allocation APIs
    memblock: replace alloc_bootmem with memblock_alloc
    ...

    Linus Torvalds
     

31 Oct, 2018

3 commits

  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include

    @@
    @@
    - #include
    + #include

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • We return 0 in the case of a nonblocking socket that has no data
    available. However, this is incorrect and may confuse applications.
    After this patch we do the correct thing and return the error
    EAGAIN.

    Quoting the return codes from the recvmsg man page:

    EAGAIN or EWOULDBLOCK
    The socket is marked nonblocking and the receive operation would
    block, or a receive timeout had been set and the timeout expired
    before data was received.

    Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: John Fastabend
    Acked-by: Song Liu
    Signed-off-by: Daniel Borkmann

    John Fastabend
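    The following user-space sketch illustrates the recvmsg contract being restored. It runs over an AF_UNIX socketpair, not a sockmap socket, and the helper names are hypothetical. A return of 0 conventionally means the peer has shut down. If a non-blocking socket returned 0 when it merely had no data yet, a typical event loop would treat a live connection as closed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum rx_result { RX_DATA, RX_EOF, RX_WOULD_BLOCK, RX_ERROR };

/* Put 'fd' into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Classify one non-blocking receive the way a typical event loop does:
 * > 0 is data, 0 is orderly shutdown, -1 with EAGAIN/EWOULDBLOCK is
 * "try again later", anything else is an error. */
static enum rx_result try_receive(int fd, char *buf, size_t len, ssize_t *got)
{
	ssize_t n = recv(fd, buf, len, 0);

	*got = n;
	if (n > 0)
		return RX_DATA;
	if (n == 0)
		return RX_EOF;
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return RX_WOULD_BLOCK;
	return RX_ERROR;
}
```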
     
  • Pull nfsd updates from Bruce Fields:
    "Olga added support for the NFSv4.2 asynchronous copy protocol. We
    already supported COPY, by copying a limited amount of data and then
    returning a short result, letting the client resend. The asynchronous
    protocol should offer better performance at the expense of some
    complexity.

    The other highlight is Trond's work to convert the duplicate reply
    cache to a red-black tree, and to move it and some other server caches
    to RCU. (Previously these meant taking global spinlocks on every
    RPC.)

    Otherwise, some RDMA work and miscellaneous bugfixes"

    * tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux: (30 commits)
    lockd: fix access beyond unterminated strings in prints
    nfsd: Fix an Oops in free_session()
    nfsd: correctly decrement odstate refcount in error path
    svcrdma: Increase the default connection credit limit
    svcrdma: Remove try_module_get from backchannel
    svcrdma: Remove ->release_rqst call in bc reply handler
    svcrdma: Reduce max_send_sges
    nfsd: fix fall-through annotations
    knfsd: Improve lookup performance in the duplicate reply cache using an rbtree
    knfsd: Further simplify the cache lookup
    knfsd: Simplify NFS duplicate replay cache
    knfsd: Remove dead code from nfsd_cache_lookup
    SUNRPC: Simplify TCP receive code
    SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock
    SUNRPC: Remove non-RCU protected lookup
    NFS: Fix up a typo in nfs_dns_ent_put
    NFS: Lockless DNS lookups
    knfsd: Lockless lookup of NFSv4 identities.
    SUNRPC: Lockless server RPCSEC_GSS context lookup
    knfsd: Allow lockless lookups of the exports
    ...

    Linus Torvalds
     

30 Oct, 2018

15 commits

  • When an FDB entry is configured, the address is validated to have the
    length of an Ethernet address, but the device for which the address is
    configured can be of any type.

    The above can result in the use of uninitialized memory when the address
    is later compared against existing addresses since 'dev->addr_len' is
    used and it may be greater than ETH_ALEN, as with ip6tnl devices.

    Fix this by making sure that FDB entries are only configured for
    Ethernet devices.

    BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
    CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x14b/0x190 lib/dump_stack.c:113
    kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
    __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
    memcmp+0x11d/0x180 lib/string.c:863
    dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
    ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
    rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
    rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
    netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
    rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
    netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
    netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
    netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
    __sys_sendmsg net/socket.c:2152 [inline]
    __do_sys_sendmsg net/socket.c:2161 [inline]
    __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
    do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7
    RIP: 0033:0x440ee9
    Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
    48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff
    ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
    RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
    RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
    R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
    R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
    kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
    kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
    kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
    slab_post_alloc_hook mm/slab.h:446 [inline]
    slab_alloc_node mm/slub.c:2718 [inline]
    __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:996 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
    netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
    __sys_sendmsg net/socket.c:2152 [inline]
    __do_sys_sendmsg net/socket.c:2161 [inline]
    __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
    do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    v2:
    * Make error message more specific (David)

    Fixes: 090096bf3db1 ("net: generic fdb support for drivers without ndo_fdb_")
    Signed-off-by: Ido Schimmel
    Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
    Cc: Vlad Yasevich
    Cc: David Ahern
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
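    The guard can be sketched as a hypothetical user-space model. The struct, constant and function names are illustrative, not the rtnetlink code. The address attribute is checked to be exactly one Ethernet address long. In this model, refusing non-Ethernet devices, such as tunnels with a longer addr_len, means a later comparison over dev->addr_len bytes never reads past the validated address.

```c
#include <assert.h>
#include <errno.h>
#include <net/if_arp.h> /* ARPHRD_ETHER */
#include <stddef.h>

#define MODEL_ETH_ALEN 6 /* bytes in an Ethernet MAC address */

/* Hypothetical, simplified device with only the fields the check needs. */
struct model_dev {
	unsigned short type;    /* ARPHRD_* device type */
	unsigned char addr_len; /* can exceed MODEL_ETH_ALEN, e.g. tunnels */
};

/* Reject anything that would make the validated length (MODEL_ETH_ALEN)
 * and the later comparison length (dev->addr_len) disagree. */
static int model_fdb_add(const struct model_dev *dev,
			 const unsigned char *addr, size_t addr_attr_len)
{
	if (addr_attr_len != MODEL_ETH_ALEN)
		return -EINVAL;
	if (dev->type != ARPHRD_ETHER)
		return -EINVAL; /* FDB entries are Ethernet-only */
	(void)addr;             /* installing the entry is out of scope */
	return 0;
}
```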
     
  • When getting pr_assocstatus and pr_streamstatus via sctp_getsockopt(),
    the case where the policy is set to SCTP_PR_SCTP_ALL | SCTP_PR_SCTP_MASK
    is not handled correctly. It even causes a slab-out-of-bounds access
    in sctp_getsockopt_pr_streamstatus().

    This patch fixes it by returning -EINVAL in this case.

    Fixes: 0ac1077e3a54 ("sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL")
    Reported-by: syzbot+5da0d0a72a9e7d791748@syzkaller.appspotmail.com
    Suggested-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
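    The following sketch validates mutually exclusive policy flags. The flag values below are hypothetical and chosen only for illustration. The real SCTP_PR_* constants live in <linux/sctp.h> and may differ. The sketch also shows one plausible way an unchecked combination becomes a bad array index: a per-policy index derived from the bits is only meaningful for a single specific policy.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag layout, for illustration only. */
#define MODEL_PR_TTL  0x0010
#define MODEL_PR_RTX  0x0020
#define MODEL_PR_PRIO 0x0030
#define MODEL_PR_MASK 0x0030 /* selects one specific policy */
#define MODEL_PR_ALL  0x0040 /* "all policies" request */

/* Accept either one specific policy or ALL, never both, nothing else. */
static int model_check_policy(int policy)
{
	if (!policy)
		return -EINVAL;
	if (policy & ~(MODEL_PR_MASK | MODEL_PR_ALL))
		return -EINVAL; /* unknown bits */
	if ((policy & MODEL_PR_ALL) && (policy & MODEL_PR_MASK))
		return -EINVAL; /* ALL combined with a specific policy */
	return 0;
}

/* Index into a 3-entry per-policy array. Callers must handle
 * MODEL_PR_ALL separately: for it this returns -1. */
static int model_policy_index(int policy)
{
	return ((policy & MODEL_PR_MASK) >> 4) - 1;
}
```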
     
  • If a transport is removed by asconf but there are still some chunks
    queued on out_chunk_list that reference this transport, a
    use-after-free will occur when these chunks access the transport in
    sctp_outq_flush().

    This is an old bug; fix it by clearing the transport of these chunks
    in out_chunk_list when removing a transport in sctp_assoc_rm_peer().

    Reported-by: syzbot+56a40ceee5fb35932f4d@syzkaller.appspotmail.com
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
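    The fix can be illustrated with a minimal sketch using hypothetical, simplified structures (not the real sctp types). Before a transport is freed, every queued chunk that still points at it has its pointer cleared. A later flush then sees NULL and can choose another transport instead of dereferencing freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified structures; not the real sctp types. */
struct model_transport {
	int id;
};

struct model_chunk {
	struct model_transport *transport; /* NULL: choose one at flush time */
	struct model_chunk *next;
};

/* Drop every queued chunk's reference to 'gone' before it is freed. */
static void forget_transport(struct model_chunk *queue,
			     const struct model_transport *gone)
{
	for (struct model_chunk *c = queue; c; c = c->next)
		if (c->transport == gone)
			c->transport = NULL;
}
```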
     
  • Similar to the ipv6 mcast commit 89225d1ce6af3 ("net: ipv6: mld: fix v1/v2
    switchback timeout to rfc3810, 9.12.")

    i) RFC3376 8.12. Older Version Querier Present Timeout says:

    The Older Version Querier Interval is the time-out for transitioning
    a host back to IGMPv3 mode once an older version query is heard.
    When an older version query is received, hosts set their Older
    Version Querier Present Timer to Older Version Querier Interval.

    This value MUST be ((the Robustness Variable) times (the Query
    Interval in the last Query received)) plus (one Query Response
    Interval).

    Currently we only use the hardcoded value IGMP_V1/v2_ROUTER_PRESENT_TIMEOUT.
    Fix it by adding two new fields, mr_qi (Query Interval) and mr_qri (Query
    Response Interval), to struct in_device.

    Now we can calculate the switchback time via (mr_qrv * mr_qi) + mr_qri.
    These values need to be updated whenever an IGMPv3 query is received.

    Reported-by: Ying Xu
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
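    The switchback formula from RFC 3376 section 8.12 fits in a small function. Parameter names mirror the mr_qrv, mr_qi and mr_qri fields mentioned above. The unit (seconds here) is an assumption, since the kernel may keep these values in jiffies. With the RFC's defaults (Robustness Variable 2, Query Interval 125 s, Query Response Interval 10 s), it gives 2 * 125 + 10 = 260 seconds.

```c
#include <assert.h>

/* Older Version Querier Present Timeout (RFC 3376, 8.12):
 * (Robustness Variable * Query Interval) + Query Response Interval.
 * All arguments must use the same unit; the result uses it too. */
static unsigned long older_querier_timeout(unsigned long qrv,
					   unsigned long qi,
					   unsigned long qri)
{
	return qrv * qi + qri;
}
```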
     
  • In call_xpt_users(), we delete the entry from the list, but we
    do not reinitialise it. This triggers the list poisoning when
    we later call unregister_xpt_user() in nfsd4_del_conns().

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
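    The following user-space model contrasts list_del()-style removal with list_del_init()-style removal. It follows the spirit of <linux/list.h> but is written from scratch, and the names and poison values are hypothetical. Removing with poisoning leaves the entry pointing at invalid addresses, so a second removal would crash. Reinitialising the entry makes a later removal a harmless no-op. That the fix uses exactly this approach is an assumption based on the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list node. */
struct node {
	struct node *next, *prev;
};

/* Stand-ins for the kernel's list poison values: invalid pointers. */
#define POISON1 ((struct node *)0x100)
#define POISON2 ((struct node *)0x200)

static void init_node(struct node *n) { n->next = n; n->prev = n; }
static bool is_empty(const struct node *n) { return n->next == n; }

static void add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void unlink_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* list_del() style: unlink, then poison; deleting again would crash. */
static void del_poison(struct node *n)
{
	unlink_node(n);
	n->next = POISON1;
	n->prev = POISON2;
}

/* list_del_init() style: unlink, then point the node at itself so a
 * later delete only relinks the node to itself. */
static void del_init(struct node *n)
{
	unlink_node(n);
	init_node(n);
}
```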
     
  • Since commit ffe1f0df5862 ("rpcrdma: Merge svcrdma and xprtrdma
    modules into one"), the forward and backchannel components are part
    of the same kernel module. A separate try_module_get() call in the
    backchannel code is no longer necessary.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Similar to a change made in the client's forward channel reply
    handler: The xprt_release_rqst_cong() call is not necessary.

    Also, release xprt->recv_lock when taking xprt->transport_lock
    to avoid disabling and enabling BHs while holding another
    spin lock.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • There's no need to request a large number of send SGEs because the
    inline threshold already constrains the number of SGEs per Send.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Use the fact that the iov iterators already have functionality for
    skipping a base offset.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • Now that the reader functions are all RCU protected, use a regular
    spinlock rather than a reader/writer lock.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • Clean up the cache code by removing the non-RCU protected lookup.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • Use RCU protection for looking up the RPCSEC_GSS context.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • Convert structs ip_map and unix_gid to use RCU protected lookups.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • Instead of the reader/writer spinlock, allow cache lookups to use RCU
    for looking up entries. This is more efficient since modifications can
    occur while other entries are being looked up.

    Note that for now, we keep the reader/writer spinlock until all users
    have been converted to use RCU-safe freeing of their cache entries.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • Pull 9p updates from Dominique Martinet:
    "Highlights this time around are the end of Matthew's work to remove
    the custom 9p request cache and use a slab directly for requests, with
    some extra patches on my end to not degrade performance, but it's a
    very good cleanup.

    Tomas and I fixed a few more syzkaller bugs (refcount is the big one),
    and I had a go at the coverity bugs and at some of the bugzilla
    reports we had open for a while.

    I'm a bit disappointed that I couldn't get many reviews for a few of
    my own patches, but the big ones got some and it's all been soaking in
    linux-next for quite a while so I think it should be OK.

    Summary:

    - Finish removing the custom 9p request cache mechanism

    - Embed part of the fcall in the request to have better slab
    performance (msize is usually a power of two)

    - syzkaller fixes:
    * add a refcount to 9p requests to avoid use after free
    * a few double free issues

    - A few coverity fixes

    - Some old patches that were in the bugzilla:
    * do not trust pdu content for size header
    * mount option for lock retry interval"

    * tag '9p-for-4.20' of git://github.com/martinetd/linux: (21 commits)
    9p/trans_fd: put worker reqs on destroy
    9p/trans_fd: abort p9_read_work if req status changed
    9p: potential NULL dereference
    9p locks: fix glock.client_id leak in do_lock
    9p: p9dirent_read: check network-provided name length
    9p/rdma: remove useless check in cm_event_handler
    9p: acl: fix uninitialized iattr access
    9p locks: add mount option for lock retry interval
    9p: do not trust pdu content for stat item size
    9p: Rename req to rreq in trans_fd
    9p: fix spelling mistake in fall-through annotation
    9p/rdma: do not disconnect on down_interruptible EAGAIN
    9p: Add refcount to p9_req_t
    9p: rename p9_free_req() function
    9p: add a per-client fcall kmem_cache
    9p: embed fcall in req to round down buffer allocs
    9p: Remove p9_idpool
    9p: Use a slab for allocating requests
    9p: clear dangling pointers in p9stat_free
    v9fs_dir_readdir: fix double-free on p9stat_read error
    ...

    Linus Torvalds
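The 9p summary above says the fcall was embedded in the request "to have better slab performance (msize usually is power of two aligned)". The arithmetic behind that can be sketched as follows. This is a simplified model, not the kmalloc implementation: it assumes the allocator rounds each object up to the next power of two, and it uses a hypothetical 24-byte header. Allocating a header together with a power-of-two msize buffer in one object doubles the size class, while allocating the buffer on its own fits exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified size-class model (assumption): round up to the next power
 * of two. Not the real kmalloc cache layout. */
static size_t pow2_size_class(size_t n)
{
	size_t c = 1;

	assert(n > 0 && n <= (SIZE_MAX >> 1) + 1); /* avoid shift overflow */
	while (c < n)
		c <<= 1;
	return c;
}

/* Header and msize-byte buffer allocated as one object. */
static size_t combined_alloc(size_t hdr, size_t msize)
{
	return pow2_size_class(hdr + msize);
}

/* Header embedded in the request object; only the buffer is allocated. */
static size_t split_buffer_alloc(size_t msize)
{
	return pow2_size_class(msize);
}
```

With msize = 8192 and the hypothetical 24-byte header, `combined_alloc(24, 8192)` lands in the 16384-byte class, whereas `split_buffer_alloc(8192)` stays at 8192.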
     

29 Oct, 2018

4 commits

  • Since its inception, udp_dump_one has had a bug where userspace
    needs to swap src and dst addresses and ports in order to find
    the socket it wants. This is because it passes the socket source
    address to __udp[46]_lib_lookup's saddr argument, but those
    functions are intended to find local sockets matching received
    packets, so saddr is the remote address, not the local address.

    This can no longer be fixed for backwards compatibility reasons,
    so add a brief comment explaining that this is the case. This
    will avoid confusion and help ensure SOCK_DIAG implementations
    of new protocols don't have the same problem.

    Fixes: a925aa00a55 ("udp_diag: Implement the get_exact dumping functionality")
    Signed-off-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Lorenzo Colitti
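The address convention described in the udp_dump_one commit above can be illustrated with a toy model. Everything here is hypothetical (`struct toy_sock`, `toy_lookup_rx`, `toy_diag_find` are not kernel APIs, and the signature is not that of `__udp4_lib_lookup`). It only shows why a receive-path lookup treats `saddr` as the remote peer, so a caller holding the socket's own (local) address has to pass it in the `daddr` slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connected UDP socket: "local" is our end, "remote" the peer. */
struct toy_sock {
	uint32_t local_addr;
	uint16_t local_port;
	uint32_t remote_addr;
	uint16_t remote_port;
};

/* Receive-path style lookup: saddr/sport are the packet's source (the
 * REMOTE peer), daddr/dport are its destination (LOCAL). */
static const struct toy_sock *toy_lookup_rx(const struct toy_sock *tbl, size_t n,
					    uint32_t saddr, uint16_t sport,
					    uint32_t daddr, uint16_t dport)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct toy_sock *sk = &tbl[i];

		if (sk->remote_addr == saddr && sk->remote_port == sport &&
		    sk->local_addr == daddr && sk->local_port == dport)
			return sk;
	}
	return NULL;
}

/* Lookup by the socket's own tuple: local goes into the daddr slot. */
static const struct toy_sock *toy_diag_find(const struct toy_sock *tbl, size_t n,
					    uint32_t local_addr, uint16_t local_port,
					    uint32_t remote_addr, uint16_t remote_port)
{
	return toy_lookup_rx(tbl, n, remote_addr, remote_port,
			     local_addr, local_port);
}
```

Passing the local address as `saddr` directly (the behaviour the comment documents) finds nothing unless the caller swaps source and destination first.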
     
  • gred_change_table_def() takes a pointer to TCA_GRED_DPS attribute,
    and expects it will be able to interpret its contents as
    struct tc_gred_sopt. Pass the correct gred attribute, instead of
    TCA_OPTIONS.

    This bug meant the table definition could never be changed after
    Qdisc was initialized (unless whatever TCA_OPTIONS contained both
    passed netlink validation and was a valid struct tc_gred_sopt...).

    Old behaviour:
    $ ip link add type dummy
    $ tc qdisc replace dev dummy0 parent root handle 7: \
    gred setup vqs 4 default 0
    $ tc qdisc replace dev dummy0 parent root handle 7: \
    gred setup vqs 4 default 0
    RTNETLINK answers: Invalid argument

    Now:
    $ ip link add type dummy
    $ tc qdisc replace dev dummy0 parent root handle 7: \
    gred setup vqs 4 default 0
    $ tc qdisc replace dev dummy0 parent root handle 7: \
    gred setup vqs 4 default 0
    $ tc qdisc replace dev dummy0 parent root handle 7: \
    gred setup vqs 4 default 0

    Fixes: f62d6b936df5 ("[PKT_SCHED]: GRED: Use central VQ change procedure")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
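The GRED fix above is about handing the parser the right nested attribute. A minimal sketch of that idea follows, using hypothetical TLV types rather than the real netlink API (`struct toy_attr`, `TOY_GRED_DPS` and `struct toy_sopt` are illustrative stand-ins, the last only loosely modelled on `struct tc_gred_sopt`). A parser that reinterprets a payload as a fixed struct should be given the matching attribute and should reject a payload of the wrong size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the kernel's netlink types. */
enum { TOY_GRED_PARMS = 1, TOY_GRED_STAB = 2, TOY_GRED_DPS = 3 };

struct toy_attr {
	uint16_t type;
	uint16_t len;           /* payload length in bytes */
	const void *payload;
};

struct toy_sopt {               /* loosely modelled on struct tc_gred_sopt */
	uint32_t DPs;
	uint32_t def_DP;
};

/* Return the nested attribute of the requested type, or NULL. */
static const struct toy_attr *toy_find(const struct toy_attr *attrs, size_t n,
				       uint16_t type)
{
	assert(attrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (attrs[i].type == type)
			return &attrs[i];
	return NULL;
}

/* Interpret the attribute as a toy_sopt only if its size matches.
 * Returns 0 on success, -1 (think -EINVAL) otherwise. */
static int toy_change_table_def(const struct toy_attr *dps, struct toy_sopt *out)
{
	if (!dps || dps->len != sizeof(*out))
		return -1;
	memcpy(out, dps->payload, sizeof(*out));
	return 0;
}
```

Passing some other attribute (the analogue of handing over TCA_OPTIONS) fails the size check, which mirrors the "Invalid argument" in the old behaviour shown above.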
     
  • Recently a check was added which prevents marking of routers with zero
    source address, but for IPv6 that cannot happen as the relevant RFCs
    actually forbid such packets:
    RFC 2710 (MLDv1):
    "To be valid, the Query message MUST
    come from a link-local IPv6 Source Address, be at least 24 octets
    long, and have a correct MLD checksum."

    Same goes for RFC 3810.

    And also it can be seen as a requirement in ipv6_mc_check_mld_query()
    which is used by the bridge to validate the message before processing
    it. Thus any queries with :: source address won't be processed anyway.
    So just remove the check for zero IPv6 source address from the query
    processing function.

    Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Just like with normal GRO processing, we have to initialize
    skb->next to NULL when we unlink overflow packets from the
    GRO hash lists.

    Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.")
    Reported-by: Oleksandr Natalenko
    Tested-by: Oleksandr Natalenko
    Signed-off-by: David S. Miller

    David S. Miller
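The GRO fix above resets `skb->next` after unlinking a packet from a hash list. A toy model of why that matters is sketched below. The types and helpers are hypothetical, not the kernel's `list_head` or `sk_buff`. The sketch assumes, as the commit implies, that list deletion leaves a stale (here poisoned) pointer behind, and that a later consumer treats a non-NULL `next` as meaningful. Unlinking without clearing `next` therefore hands that consumer a dangling link.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; not the kernel's list_head or sk_buff. */
struct toy_node {
	struct toy_node *next, *prev;
};

struct toy_skb {
	struct toy_node list;   /* linkage in a per-bucket hash list */
	int id;
};

/* Sentinel standing in for poisoned list pointers. */
static struct toy_node toy_poison;

static void toy_list_init(struct toy_node *head)
{
	head->next = head->prev = head;
}

static void toy_list_add_tail(struct toy_node *n, struct toy_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and poison, so stale pointers are detectable. */
static void toy_list_del(struct toy_node *n)
{
	assert(n->next != &toy_poison);  /* catch double deletion */
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = &toy_poison;
}

/* Unlink, then reset next to NULL as the fix describes for skb->next. */
static void toy_skb_unlink(struct toy_skb *skb)
{
	toy_list_del(&skb->list);
	skb->list.next = NULL;
}

/* Consumer expecting a standalone skb: 0 if next is NULL, -1 if stale. */
static int toy_complete(const struct toy_skb *skb)
{
	return skb->list.next == NULL ? 0 : -1;
}
```

Unlinking with `toy_list_del()` alone leaves `next` pointing at the poison sentinel, so `toy_complete()` reports -1. `toy_skb_unlink()` leaves the skb in the state the consumer expects.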
     

27 Oct, 2018

2 commits

  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2018-10-27

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Fix toctou race in BTF header validation, from Martin and Wenwen.

    2) Fix devmap interface comparison in notifier call which was
    neglecting netns, from Taehee.

    3) Several fixes in various places, for example, correcting direct
    packet access and helper function availability, from Daniel.

    4) Fix BPF kselftest config fragment to include af_xdp and sockmap,
    from Naresh.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull networking fixes from David Miller:
    "What better way to start off a weekend than with some networking bug
    fixes:

    1) net namespace leak in dump filtering code of ipv4 and ipv6, fixed
    by David Ahern and Bjørn Mork.

    2) Handle bad checksums from hardware when using CHECKSUM_COMPLETE
    properly in UDP, from Sean Tranchetti.

    3) Remove TCA_OPTIONS from policy validation, it turns out we don't
    consistently use nested attributes for this across all packet
    schedulers. From David Ahern.

    4) Fix SKB corruption in cadence driver, from Tristram Ha.

    5) Fix broken WoL handling in r8169 driver, from Heiner Kallweit.

    6) Fix OOPS in pneigh_dump_table(), from Eric Dumazet"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
    net/neigh: fix NULL deref in pneigh_dump_table()
    net: allow traceroute with a specified interface in a vrf
    bridge: do not add port to router list when receives query with source 0.0.0.0
    net/smc: fix smc_buf_unuse to use the lgr pointer
    ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called
    net/{ipv4,ipv6}: Do not put target net if input nsid is invalid
    lan743x: Remove SPI dependency from Microchip group.
    drivers: net: remove inclusion when not needed
    net: phy: genphy_10g_driver: Avoid NULL pointer dereference
    r8169: fix broken Wake-on-LAN from S5 (poweroff)
    octeontx2-af: Use GFP_ATOMIC under spin lock
    net: ethernet: cadence: fix socket buffer corruption problem
    net/ipv6: Allow onlink routes to have a device mismatch if it is the default route
    net: sched: Remove TCA_OPTIONS from policy
    ice: Poll for link status change
    ice: Allocate VF interrupts and set queue map
    ice: Introduce ice_dev_onetime_setup
    net: hns3: Fix for warning uninitialized symbol hw_err_lst3
    octeontx2-af: Copy the right amount of memory
    net: udp: fix handling of CHECKSUM_COMPLETE packets
    ...

    Linus Torvalds