06 Apr, 2019

7 commits

  • When mknod is used to create a block special file in hugetlbfs, it will
    allocate an inode and kmalloc a 'struct resv_map' via resv_map_alloc().
    inode->i_mapping->private_data will point to the newly allocated resv_map.
    However, when the device special file is opened, bd_acquire() will set
    inode->i_mapping to bd_inode->i_mapping. Thus the pointer to the
    allocated resv_map is lost and the structure is leaked.

    Programs to reproduce:
    mount -t hugetlbfs nodev hugetlbfs
    mknod hugetlbfs/dev b 0 0
    exec 30<> hugetlbfs/dev
    umount hugetlbfs/

    resv_map structures are only needed for inodes which can have associated
    page allocations. To fix the leak, only allocate resv_map for those
    inodes which could possibly be associated with page allocations.

    Link: http://lkml.kernel.org/r/20190401213101.16476-1-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reviewed-by: Andrew Morton
    Reported-by: Yufen Yu
    Suggested-by: Yufen Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • Symmetrically to VM_FAULT_SET_HINDEX(), we need a force-cast in
    VM_FAULT_GET_HINDEX() to tell sparse that this is intentional.

    Sparse complains about the current code when building a kernel with
    CONFIG_MEMORY_FAILURE:

    arch/x86/mm/fault.c:1058:53: warning: restricted vm_fault_t degrades to integer

    Link: http://lkml.kernel.org/r/20190327204117.35215-1-jannh@google.com
    Fixes: 3d3539018d2c ("mm: create the new vm_fault_t type")
    Signed-off-by: Jann Horn
    Reviewed-by: Andrew Morton
    Cc: Souptick Joarder
    Cc: Matthew Wilcox
    Cc: Vlastimil Babka
    Cc: "Kirill A. Shutemov"
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jann Horn
     
  • For very short input data (0 - 1 bytes), lzo-rle was not behaving
    correctly. Fix this behaviour and update documentation accordingly.

    For zero-length input, lzo v0 outputs an end-of-stream marker only,
    which was misinterpreted by lzo-rle as a bitstream version number.
    Ensure bitstream versions > 0 require a minimum stream length of 5.

    Also fixes a bug in handling the tail for very short inputs when a
    bitstream version is present.
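
    A rough sketch of the length check described above. The helper name and
    the marker byte value are assumptions for illustration; only the minimum
    length of 5 comes from this message.

```c
#include <stddef.h>

#define SKETCH_HEADER_MARKER  17  /* assumed first byte of a versioned stream */
#define SKETCH_MIN_STREAM_LEN 5   /* minimum length named in the commit text */

/*
 * Hypothetical helper: report the bitstream version of a compressed
 * buffer. Anything shorter than SKETCH_MIN_STREAM_LEN is treated as
 * version 0, so a bare end-of-stream marker is never read as a header.
 */
static int sketch_bitstream_version(const unsigned char *in, size_t in_len)
{
    if (in_len >= SKETCH_MIN_STREAM_LEN && in[0] == SKETCH_HEADER_MARKER)
        return in[1];
    return 0;
}
```

    The length test comes first, so the second byte is only inspected when
    the buffer is long enough to hold a header at all.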

    Link: http://lkml.kernel.org/r/20190326165857.34613-1-dave.rodgman@arm.com
    Signed-off-by: Dave Rodgman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Rodgman
     
  • clang points out with hundreds of warnings that the bitrev macros have a
    problem with constant input:

    drivers/hwmon/sht15.c:187:11: error: variable '__x' is uninitialized when used within its own initialization
    [-Werror,-Wuninitialized]
    u8 crc = bitrev8(data->val_status & 0x0F);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    include/linux/bitrev.h:102:21: note: expanded from macro 'bitrev8'
    __constant_bitrev8(__x) : \
    ~~~~~~~~~~~~~~~~~~~^~~~
    include/linux/bitrev.h:67:11: note: expanded from macro '__constant_bitrev8'
    u8 __x = x; \
    ~~~ ^

    Both the bitrev and the __constant_bitrev macros use an internal
    variable named __x, which goes horribly wrong when passing one to the
    other.

    The obvious fix is to rename one of the variables, so this adds an extra
    '_'.

    It seems we got away with this because

    - there are only a few drivers using bitrev macros

    - usually there are no constant arguments to those

    - when they are constant, they tend to be either 0 or (unsigned)-1
    (drivers/isdn/i4l/isdnhdlc.o, drivers/iio/amplifiers/ad8366.c) and
    give the correct result by pure chance.

    In fact, the only driver that I could find that gets different results
    with this is drivers/net/wan/slic_ds26522.c, which in turn is a driver
    for fairly rare hardware (adding the maintainer to Cc for testing).
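
    The name clash can be sketched outside the kernel with two nested
    statement-expression macros (a GNU C extension). The macro names are
    hypothetical and the bodies are simplified; the point is only that the
    two temporaries carry different names.

```c
#include <stdint.h>

/* Inner macro: its temporary is named ___x (three underscores). */
#define sketch_constant_bitrev8(x)                                  \
({                                                                  \
    uint8_t ___x = (x);                                             \
    ___x = (uint8_t)((___x >> 4) | (___x << 4));                    \
    ___x = (uint8_t)(((___x & 0xCC) >> 2) | ((___x & 0x33) << 2));  \
    ___x = (uint8_t)(((___x & 0xAA) >> 1) | ((___x & 0x55) << 1));  \
    ___x;                                                           \
})

/*
 * Outer macro: its temporary is named __x (two underscores). If the
 * inner macro also declared __x, the expansion would contain
 * "uint8_t __x = (__x);", which reads the new, uninitialized variable
 * instead of the caller's value.
 */
#define sketch_bitrev8(x)                                           \
({                                                                  \
    uint8_t __x = (x);                                              \
    sketch_constant_bitrev8(__x);                                   \
})
```

    For example, sketch_bitrev8(0x01) evaluates to 0x80 and
    sketch_bitrev8(0x0F) to 0xF0.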

    Link: http://lkml.kernel.org/r/20190322140503.123580-1-arnd@arndb.de
    Fixes: 556d2f055bf6 ("ARM: 8187/1: add CONFIG_HAVE_ARCH_BITREVERSE to support rbit instruction")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Nick Desaulniers
    Cc: Zhao Qiang
    Cc: Yalin Wang
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • Commit 2d4f567103ff ("KVM: PPC: Introduce kvm_tmp framework") adds
    kvm_tmp[] into the .bss section and then frees the rest of the unused
    space back to the page allocator.

    kernel_init
    kvm_guest_init
    kvm_free_tmp
    free_reserved_area
    free_unref_page
    free_unref_page_prepare

    With DEBUG_PAGEALLOC=y, those pages are unmapped from the kernel. As a
    result, a kmemleak scan will trigger a panic when it scans the .bss
    section with unmapped pages.

    This patch creates dedicated kmemleak objects for the .data, .bss and
    potentially .data..ro_after_init sections to allow partial freeing via
    the kmemleak_free_part() in the powerpc kvm_free_tmp() function.

    Link: http://lkml.kernel.org/r/20190321171917.62049-1-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas
    Reported-by: Qian Cai
    Acked-by: Michael Ellerman (powerpc)
    Tested-by: Qian Cai
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Avi Kivity
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     
  • A recent optimization in Clang (r355672) lowers comparisons of the
    return value of memcmp against zero to comparisons of the return value
    of bcmp against zero. This helps some platforms that implement bcmp
    more efficiently than memcmp. glibc simply aliases bcmp to memcmp, but
    an optimized implementation is in the works.

    This results in linkage failures for all targets with Clang due to the
    undefined symbol. For now, just implement bcmp as a tail call to memcmp
    to unbreak the build. This routine can be further optimized in the
    future.
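
    A minimal sketch of the approach described above, written as ordinary
    userspace C. The function name is hypothetical so that it does not
    clash with the C library's own bcmp.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative stand-in for bcmp(): returns zero when the two buffers
 * are equal and nonzero otherwise, which is all a bcmp caller may
 * rely on.
 */
int sketch_bcmp(const void *a, const void *b, size_t len)
{
    /* The call is in tail position, so a compiler may emit a plain jump. */
    return memcmp(a, b, len);
}
```

    Because callers of bcmp only test the result against zero, forwarding
    to memcmp is always a valid, if not necessarily optimal, implementation.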

    Other ideas discussed:

    * A weak alias was discussed, but breaks for architectures that define
    their own implementations of memcmp since aliases to declarations are
    not permitted (only definitions). Arch-specific memcmp
    implementations typically declare memcmp in C headers, but implement
    them in assembly.

    * -ffreestanding also is used sporadically throughout the kernel.

    * -fno-builtin-bcmp doesn't work when doing LTO.

    Link: https://bugs.llvm.org/show_bug.cgi?id=41035
    Link: https://code.woboq.org/userspace/glibc/string/memcmp.c.html#bcmp
    Link: https://github.com/llvm/llvm-project/commit/8e16d73346f8091461319a7dfc4ddd18eedcff13
    Link: https://github.com/ClangBuiltLinux/linux/issues/416
    Link: http://lkml.kernel.org/r/20190313211335.165605-1-ndesaulniers@google.com
    Signed-off-by: Nick Desaulniers
    Reported-by: Nathan Chancellor
    Reported-by: Adhemerval Zanella
    Suggested-by: Arnd Bergmann
    Suggested-by: James Y Knight
    Suggested-by: Masahiro Yamada
    Suggested-by: Nathan Chancellor
    Suggested-by: Rasmus Villemoes
    Acked-by: Steven Rostedt (VMware)
    Reviewed-by: Nathan Chancellor
    Tested-by: Nathan Chancellor
    Reviewed-by: Masahiro Yamada
    Reviewed-by: Andy Shevchenko
    Cc: David Laight
    Cc: Rasmus Villemoes
    Cc: Namhyung Kim
    Cc: Greg Kroah-Hartman
    Cc: Alexander Shishkin
    Cc: Dan Williams
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Desaulniers
     
  • Pull mm/compaction fixes from Mel Gorman:
    "The merge window for 5.1 introduced a number of compaction-related
    patches, with intermittent reports of corruption and functional
    issues. The bugs are due to sloppy checking of zone boundaries and a
    corner case where invalid indexes are used to access the free lists.

    Reports are not common but at least two users and 0-day have tripped
    over them. There is a chance that one of the syzbot reports is
    related but it has not been confirmed properly.

    The normal submission path is with Andrew but there have been some
    delays and I consider them urgent enough that they should be picked up
    before RC4 to avoid duplicate reports.

    All of these have been successfully tested on older RC windows. This
    will make this branch look like a rebase but in fact, they've simply
    been lifted again from Andrew's tree and placed on a fresh branch.
    I've no reason to believe that this has invalidated the testing given
    the lack of change in compaction and the nature of the fixes"

    * tag 'mm-compaction-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux:
    mm/compaction.c: abort search if isolation fails
    mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints

    Linus Torvalds
     

05 Apr, 2019

30 commits

  • The n_r3964 line discipline driver was written in a different time, when
    SMP machines were rare, and users were trusted to do the right thing.
    Since then, the world has moved on, but not this code; it has stayed
    rooted in the past with its lovely hand-crafted list structures and
    loads of "interesting" race conditions all over the place.

    After attempting to clean up most of the issues, I just gave up and am
    now marking the driver as BROKEN so that hopefully someone who has this
    hardware will show up out of the woodwork (I know you are out there!)
    and will help with debugging a raft of changes that I had lying around
    for the code, but was too afraid to commit as odds are they would break
    things.

    Many thanks to Jann and Linus for pointing out the initial problems in
    this codebase, as well as many reviews of my attempts to fix the issues.
    It was a case of whack-a-mole, and as you can see, the mole won.

    Reported-by: Jann Horn
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds

    Greg Kroah-Hartman
     
  • Pull drm fixes from Dave Airlie:
    "Pretty quiet week, just some amdgpu and i915 fixes.

    i915:
    - deadlock fix
    - gvt fixes

    amdgpu:
    - PCIE dpm feature fix
    - Powerplay fixes"

    * tag 'drm-fixes-2019-04-05' of git://anongit.freedesktop.org/drm/drm:
    drm/i915/gvt: Fix kerneldoc typo for intel_vgpu_emulate_hotplug
    drm/i915/gvt: Correct the calculation of plane size
    drm/amdgpu: remove unnecessary rlc reset function on gfx9
    drm/i915: Always backoff after a drm_modeset_lock() deadlock
    drm/i915/gvt: do not let pin count of shadow mm go negative
    drm/i915/gvt: do not deliver a workload if its creation fails
    drm/amd/display: VBIOS can't be light up HDMI when restart system
    drm/amd/powerplay: fix possible hang with 3+ 4K monitors
    drm/amd/powerplay: correct data type to avoid overflow
    drm/amd/powerplay: add ECC feature bit
    drm/amd/amdgpu: fix PCIe dpm feature issue (v3)

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Several hash table refcount fixes in batman-adv, from Sven
    Eckelmann.

    2) Use after free in bpf_evict_inode(), from Daniel Borkmann.

    3) Fix mdio bus registration in ixgbe, from Ivan Vecera.

    4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.

    5) ila rhashtable corruption fix from Herbert Xu.

    6) Don't allow upper-devices to be added to vrf devices, from Sabrina
    Dubroca.

    7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.

    8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype,
    from Alexander Lobakin.

    9) Missing IDR checks in mlx5 driver, from Aditya Pakki.

    10) Fix false connection termination in ktls, from Jakub Kicinski.

    11) Work around some ASPM issues with r8169 by disabling rx interrupt
    coalescing on certain chips. From Heiner Kallweit.

    12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo
    Abeni.

    13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.

    14) Various BPF flow dissector fixes from Stanislav Fomichev.

    15) Divide by zero in act_sample, from Davide Caratti.

    16) Fix bridging multicast regression introduced by rhashtable
    conversion, from Nikolay Aleksandrov.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
    ibmvnic: Fix completion structure initialization
    ipv6: sit: reset ip header pointer in ipip6_rcv
    net: bridge: always clear mcast matching struct on reports and leaves
    libcxgb: fix incorrect ppmax calculation
    vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
    sch_cake: Make sure we can write the IP header before changing DSCP bits
    sch_cake: Use tc_skb_protocol() helper for getting packet protocol
    tcp: Ensure DCTCP reacts to losses
    net/sched: act_sample: fix divide by zero in the traffic path
    net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
    net: hns: Fix sparse: some warnings in HNS drivers
    net: hns: Fix WARNING when remove HNS driver with SMMU enabled
    net: hns: fix ICMP6 neighbor solicitation messages discard problem
    net: hns: Fix probabilistic memory overwrite when HNS driver initialized
    net: hns: Use NAPI_POLL_WEIGHT for hns driver
    net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
    flow_dissector: rst'ify documentation
    ipv6: Fix dangling pointer when ipv6 fragment
    net-gro: Fix GRO flush when receiving a GSO packet.
    flow_dissector: document BPF flow dissector environment
    ...

    Linus Torvalds
     
  • Fix device initialization completion handling for vNIC adapters.
    Initialize the completion structure on probe and reinitialize when needed.
    This also fixes a race condition during kdump where the driver can attempt
    to access the completion struct before it is initialized:

    Unable to handle kernel paging request for data at address 0x00000000
    Faulting instruction address: 0xc0000000081acbe0
    Oops: Kernel access of bad area, sig: 11 [#1]
    LE SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: ibmvnic(+) ibmveth sunrpc overlay squashfs loop
    CPU: 19 PID: 301 Comm: systemd-udevd Not tainted 4.18.0-64.el8.ppc64le #1
    NIP: c0000000081acbe0 LR: c0000000081ad964 CTR: c0000000081ad900
    REGS: c000000027f3f990 TRAP: 0300 Not tainted (4.18.0-64.el8.ppc64le)
    MSR: 800000010280b033 CR: 28228288 XER: 00000006
    CFAR: c000000008008934 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1
    GPR00: c0000000081ad964 c000000027f3fc10 c0000000095b5800 c0000000221b4e58
    GPR04: 0000000000000003 0000000000000001 000049a086918581 00000000000000d4
    GPR08: 0000000000000007 0000000000000000 ffffffffffffffe8 d0000000014dde28
    GPR12: c0000000081ad900 c000000009a00c00 0000000000000001 0000000000000100
    GPR16: 0000000000000038 0000000000000007 c0000000095e2230 0000000000000006
    GPR20: 0000000000400140 0000000000000001 c00000000910c880 0000000000000000
    GPR24: 0000000000000000 0000000000000006 0000000000000000 0000000000000003
    GPR28: 0000000000000001 0000000000000001 c0000000221b4e60 c0000000221b4e58
    NIP [c0000000081acbe0] __wake_up_locked+0x50/0x100
    LR [c0000000081ad964] complete+0x64/0xa0
    Call Trace:
    [c000000027f3fc10] [c000000027f3fc60] 0xc000000027f3fc60 (unreliable)
    [c000000027f3fc60] [c0000000081ad964] complete+0x64/0xa0
    [c000000027f3fca0] [d0000000014dad58] ibmvnic_handle_crq+0xce0/0x1160 [ibmvnic]
    [c000000027f3fd50] [d0000000014db270] ibmvnic_tasklet+0x98/0x130 [ibmvnic]
    [c000000027f3fda0] [c00000000813f334] tasklet_action_common.isra.3+0xc4/0x1a0
    [c000000027f3fe00] [c000000008cd13f4] __do_softirq+0x164/0x400
    [c000000027f3fef0] [c00000000813ed64] irq_exit+0x184/0x1c0
    [c000000027f3ff20] [c0000000080188e8] __do_irq+0xb8/0x210
    [c000000027f3ff90] [c00000000802d0a4] call_do_irq+0x14/0x24
    [c000000026a5b010] [c000000008018adc] do_IRQ+0x9c/0x130
    [c000000026a5b060] [c000000008008ce4] hardware_interrupt_common+0x114/0x120

    Signed-off-by: Thomas Falcon
    Signed-off-by: David S. Miller

    Thomas Falcon
     
  • ipip6 tunnels run iptunnel_pull_header on received skbs. This can
    trigger the following use-after-free when accessing the iph pointer, since
    the packet will be 'uncloned' by pskb_expand_head if it is a
    cloned gso skb (e.g. if the packet has been sent through a veth device):

    [ 706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit]
    [ 706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/=
    [ 706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016
    [ 706.771839] Call trace:
    [ 706.801159] dump_backtrace+0x0/0x2f8
    [ 706.845079] show_stack+0x24/0x30
    [ 706.884833] dump_stack+0xe0/0x11c
    [ 706.925629] print_address_description+0x68/0x260
    [ 706.982070] kasan_report+0x178/0x340
    [ 707.025995] __asan_report_load1_noabort+0x30/0x40
    [ 707.083481] ipip6_rcv+0x1678/0x16e0 [sit]
    [ 707.132623] tunnel64_rcv+0xd4/0x200 [tunnel4]
    [ 707.185940] ip_local_deliver_finish+0x3b8/0x988
    [ 707.241338] ip_local_deliver+0x144/0x470
    [ 707.289436] ip_rcv_finish+0x43c/0x14b0
    [ 707.335447] ip_rcv+0x628/0x1138
    [ 707.374151] __netif_receive_skb_core+0x1670/0x2600
    [ 707.432680] __netif_receive_skb+0x28/0x190
    [ 707.482859] process_backlog+0x1d0/0x610
    [ 707.529913] net_rx_action+0x37c/0xf68
    [ 707.574882] __do_softirq+0x288/0x1018
    [ 707.619852] run_ksoftirqd+0x70/0xa8
    [ 707.662734] smpboot_thread_fn+0x3a4/0x9e8
    [ 707.711875] kthread+0x2c8/0x350
    [ 707.750583] ret_from_fork+0x10/0x18

    [ 707.811302] Allocated by task 16982:
    [ 707.854182] kasan_kmalloc.part.1+0x40/0x108
    [ 707.905405] kasan_kmalloc+0xb4/0xc8
    [ 707.948291] kasan_slab_alloc+0x14/0x20
    [ 707.994309] __kmalloc_node_track_caller+0x158/0x5e0
    [ 708.053902] __kmalloc_reserve.isra.8+0x54/0xe0
    [ 708.108280] __alloc_skb+0xd8/0x400
    [ 708.150139] sk_stream_alloc_skb+0xa4/0x638
    [ 708.200346] tcp_sendmsg_locked+0x818/0x2b90
    [ 708.251581] tcp_sendmsg+0x40/0x60
    [ 708.292376] inet_sendmsg+0xf0/0x520
    [ 708.335259] sock_sendmsg+0xac/0xf8
    [ 708.377096] sock_write_iter+0x1c0/0x2c0
    [ 708.424154] new_sync_write+0x358/0x4a8
    [ 708.470162] __vfs_write+0xc4/0xf8
    [ 708.510950] vfs_write+0x12c/0x3d0
    [ 708.551739] ksys_write+0xcc/0x178
    [ 708.592533] __arm64_sys_write+0x70/0xa0
    [ 708.639593] el0_svc_handler+0x13c/0x298
    [ 708.686646] el0_svc+0x8/0xc

    [ 708.739019] Freed by task 17:
    [ 708.774597] __kasan_slab_free+0x114/0x228
    [ 708.823736] kasan_slab_free+0x10/0x18
    [ 708.868703] kfree+0x100/0x3d8
    [ 708.905320] skb_free_head+0x7c/0x98
    [ 708.948204] skb_release_data+0x320/0x490
    [ 708.996301] pskb_expand_head+0x60c/0x970
    [ 709.044399] __iptunnel_pull_header+0x3b8/0x5d0
    [ 709.098770] ipip6_rcv+0x41c/0x16e0 [sit]
    [ 709.146873] tunnel64_rcv+0xd4/0x200 [tunnel4]
    [ 709.200195] ip_local_deliver_finish+0x3b8/0x988
    [ 709.255596] ip_local_deliver+0x144/0x470
    [ 709.303692] ip_rcv_finish+0x43c/0x14b0
    [ 709.349705] ip_rcv+0x628/0x1138
    [ 709.388413] __netif_receive_skb_core+0x1670/0x2600
    [ 709.446943] __netif_receive_skb+0x28/0x190
    [ 709.497120] process_backlog+0x1d0/0x610
    [ 709.544169] net_rx_action+0x37c/0xf68
    [ 709.589131] __do_softirq+0x288/0x1018

    [ 709.651938] The buggy address belongs to the object at ffffe01b6bd85580
    which belongs to the cache kmalloc-1024 of size 1024
    [ 709.804356] The buggy address is located 117 bytes inside of
    1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980)
    [ 709.946340] The buggy address belongs to the page:
    [ 710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0
    [ 710.099914] flags: 0xfffff8000000100(slab)
    [ 710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600
    [ 710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000
    [ 710.334966] page dumped because: kasan: bad access detected

    Fix it by resetting the iph pointer after iptunnel_pull_header.
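
    The stale-pointer pattern can be sketched in plain C. All names below
    are hypothetical stand-ins, not the kernel's sk_buff API; the sketch
    only shows why a header pointer has to be derived again after the
    data area may have moved.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a packet buffer; not the kernel's sk_buff. */
struct sketch_buf {
    unsigned char *head;   /* start of the data area */
    size_t len;            /* bytes in the data area */
};

/*
 * Copy the data into a freshly allocated area and free the old one,
 * loosely mimicking an "unclone" step. Returns 0 on success.
 */
static int sketch_expand_head(struct sketch_buf *b)
{
    unsigned char *fresh = malloc(b->len);

    if (!fresh)
        return -1;
    memcpy(fresh, b->head, b->len);
    free(b->head);         /* pointers into the old area are now stale */
    b->head = fresh;
    return 0;
}

/* Derive a header pointer from the buffer instead of caching it. */
static unsigned char *sketch_hdr(const struct sketch_buf *b, size_t offset)
{
    return b->head + offset;
}
```

    A caller that saved the result of sketch_hdr() before calling
    sketch_expand_head() must call sketch_hdr() again afterwards, since the
    saved pointer refers to freed memory.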

    Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap")
    Tested-by: Jianlin Shi
    Signed-off-by: Lorenzo Bianconi
    Signed-off-by: David S. Miller

    Lorenzo Bianconi
     
  • …/git/palmer/riscv-linux

    Pull RISC-V fixes from Palmer Dabbelt:
    "I dropped the ball a bit here: these patches should all probably have
    been part of rc2, but I wanted to get around to properly testing them
    in the various configurations (qemu32, qemu64, unleashed) first.

    Unfortunately I've been traveling and didn't have time to actually do
    that, but since these fix concrete bugs and pass my old set of tests I
    don't want to delay the fixes any longer.

    There are four independent fixes here:

    - A fix for the rv32 port that corrects the 64-bit user accessor's
    fixup label address.

    - A fix for a regression introduced during the merge window that
    broke medlow configurations at run time. This patch also includes a
    fix that disables ftrace for the same set of functions, which was
    found by inspection at the same time.

    - A modification of the memory map to avoid overlapping the FIXMAP
    and VMALLOC regions on systems with small memory maps.

    - A fix to the module handling code to use the correct syntax for
    probing Kconfig entries.

    These have passed my standard test flow, but I didn't have time to
    expand that testing like I said I would"

    * tag 'riscv-for-linus-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
    RISC-V: Use IS_ENABLED(CONFIG_CMODEL_MEDLOW)
    RISC-V: Fix FIXMAP_TOP to avoid overlap with VMALLOC area
    RISC-V: Always compile mm/init.c with cmodel=medany and notrace
    riscv: fix accessing 8-byte variable from RV32

    Linus Torvalds
     
  • We need to be careful and always zero the whole br_ip struct when it is
    used for matching since the rhashtable change. This patch fixes all the
    places which didn't properly clear it which in turn might've caused
    mismatches.
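
    A small sketch of why the whole structure has to be cleared when it is
    compared or hashed byte by byte. The structure below is a hypothetical
    stand-in; the real br_ip layout differs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup key with a union, loosely modelled on an IP key. */
struct sketch_key {
    union {
        uint32_t ip4;
        uint8_t  ip6[16];
    } u;
    uint16_t proto;
    uint16_t vid;
};

/*
 * Zero every byte first, then set the fields in use. Without the
 * memset, the 12 union bytes not covered by ip4 would keep whatever
 * was in memory before.
 */
static void sketch_key_init_v4(struct sketch_key *k, uint32_t ip4,
                               uint16_t proto, uint16_t vid)
{
    memset(k, 0, sizeof(*k));
    k->u.ip4 = ip4;
    k->proto = proto;
    k->vid = vid;
}

/* Byte-wise comparison, as a table keyed on the whole struct would do. */
static int sketch_key_equal(const struct sketch_key *a,
                            const struct sketch_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

    Two keys built from the same address, protocol and VLAN id compare
    equal only because the unused bytes were cleared in both.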

    Thanks for the great bug report with reproducing steps and bisection.

    Steps to reproduce (from the bug report):
    ip link add br0 type bridge mcast_querier 1
    ip link set br0 up

    ip link add v2 type veth peer name v3
    ip link set v2 master br0
    ip link set v2 up
    ip link set v3 up
    ip addr add 3.0.0.2/24 dev v3

    ip netns add test
    ip link add v1 type veth peer name v1 netns test
    ip link set v1 master br0
    ip link set v1 up
    ip -n test link set v1 up
    ip -n test addr add 3.0.0.1/24 dev v1

    # Multicast receiver
    ip netns exec test socat UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork -

    # Multicast sender
    echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588

    Reported-by: liam.mcbirnie@boeing.com
    Fixes: 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Pull power management fixes from Rafael Wysocki:
    "These fix up the intel_pstate driver after recent changes to prevent
    it from printing pointless messages and update the turbostat utility
    (mostly fixes and new hardware support).

    Specifics:

    - Make intel_pstate only load on Intel processors and prevent it from
    printing pointless failure messages (Borislav Petkov).

    - Update the turbostat utility:
    * Assorted fixes (Ben Hutchings, Len Brown, Prarit Bhargava).
    * Support for AMD Fam 17h (Zen) RAPL and package power (Calvin
    Walton).
    * Support for Intel Icelake and for systems with more than one die
    per package (Len Brown).
    * Cleanups (Len Brown)"

    * tag 'pm-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    cpufreq/intel_pstate: Load only on Intel hardware
    tools/power turbostat: update version number
    tools/power turbostat: Warn on bad ACPI LPIT data
    tools/power turbostat: Add checks for failure of fgets() and fscanf()
    tools/power turbostat: Also read package power on AMD F17h (Zen)
    tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
    tools/power turbostat: Do not display an error on systems without a cpufreq driver
    tools/power turbostat: Add Die column
    tools/power turbostat: Add Icelake support
    tools/power turbostat: Cleanup CNL-specific code
    tools/power turbostat: Cleanup CC3-skip code
    tools/power turbostat: Restore ability to execute in topology-order

    Linus Torvalds
     
  • Pull ACPI fix from Rafael Wysocki:
    "Prevent stale GPE events from triggering spurious system wakeups from
    suspend-to-idle (Furquan Shaikh)"

    * tag 'acpi-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPICA: Clear status of GPEs before enabling them

    Linus Torvalds
     
  • Only one fix for DSC (backoff after drm_modeset_lock deadlock)
    and GVT's fixes including vGPU display plane size calculation,
    shadow mm pin count, error recovery path for workload create
    and one kerneldoc fix.

    Signed-off-by: Dave Airlie

    From: Rodrigo Vivi
    Link: https://patchwork.freedesktop.org/patch/msgid/20190404161116.GA14522@intel.com

    Dave Airlie
     
  • Pull mfd fixes from Lee Jones:

    - Fix failed reads due to enabled IRQs when suspended; twl-core

    - Fix driver registration when using DT; sprd-sc27xx-spi

    - Fix `make allyesconfig` on x86_64; SUN6I_PRCM

    * tag 'mfd-fixes-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
    mfd: sun6i-prcm: Allow to compile with COMPILE_TEST
    mfd: sc27xx: Use SoC compatible string for PMIC devices
    mfd: twl-core: Disable IRQ while suspended

    Linus Torvalds
     
  • Fixes for 5.1:
    - Fix for pcie dpm
    - Powerplay fixes for vega20
    - Fix vbios display on reboot if driver display state is retained
    - Gfx9 resume robustness fix

    Signed-off-by: Dave Airlie
    From: Alex Deucher
    Link: https://patchwork.freedesktop.org/patch/msgid/20190404042939.3386-1-alexander.deucher@amd.com

    Dave Airlie
     
  • BITS_TO_LONGS() uses DIV_ROUND_UP(); because of
    this, the ppmax value can be greater than the available
    per-cpu page pods.

    This patch removes BITS_TO_LONGS() to fix this
    issue.
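
    The rounding effect can be illustrated with a few lines of arithmetic.
    The macros below are local re-creations with hypothetical names, and
    how the driver consumes the value is not shown in this message, so this
    is only a sketch of why rounding up to whole longs can overshoot.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG      (sizeof(long) * CHAR_BIT)
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define SKETCH_BITS_TO_LONGS(n)   SKETCH_DIV_ROUND_UP((n), SKETCH_BITS_PER_LONG)

/* Number of bits covered once a bit count is rounded up to whole longs. */
static unsigned long sketch_bits_after_rounding(unsigned long nbits)
{
    return SKETCH_BITS_TO_LONGS(nbits) * SKETCH_BITS_PER_LONG;
}
```

    One bit more than a full long rounds up to two longs, so a limit
    derived from the rounded value can exceed the number of entries that
    really exist.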

    Signed-off-by: Varun Prakash
    Signed-off-by: David S. Miller

    Varun Prakash
     
  • Way back in 3c9c36bcedd426f2be2826da43e5163de61735f7 the
    ndo_fcoe_get_wwn pointer was switched from depending on CONFIG_FCOE to
    CONFIG_LIBFCOE in order to allow building FCoE support into the bnx2x
    driver and used by bnx2fc without including the generic software fcoe
    module.

    But, FCoE is generally used over an 802.1q VLAN, and the implementation
    of ndo_fcoe_get_wwn in the 8021q module was not similarly changed. The
    result is that if CONFIG_FCOE is disabled, then bnx2fc cannot make a
    call to ndo_fcoe_get_wwn through the 8021q interface to the underlying
    bnx2x interface. The bnx2fc driver then falls back to a potentially
    different mapping of Ethernet MAC to Fibre Channel WWN, creating an
    incompatibility with the fabric and target configurations when compared
    to the WWNs used by pre-boot firmware and differently-configured
    kernels.

    So make the conditional inclusion of FCoE code in 8021q match the
    conditional inclusion in netdevice.h

    Signed-off-by: Chris Leech
    Signed-off-by: David S. Miller

    Chris Leech
     
  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2019-04-04

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Batch of fixes to the existing BPF flow dissector API to support
    calling BPF programs from the eth_get_headlen context (support for
    the latter is planned to be added in bpf-next), from Stanislav.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • * acpica:
    ACPICA: Clear status of GPEs before enabling them

    Rafael J. Wysocki
     
  • * pm-tools:
    tools/power turbostat: update version number
    tools/power turbostat: Warn on bad ACPI LPIT data
    tools/power turbostat: Add checks for failure of fgets() and fscanf()
    tools/power turbostat: Also read package power on AMD F17h (Zen)
    tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
    tools/power turbostat: Do not display an error on systems without a cpufreq driver
    tools/power turbostat: Add Die column
    tools/power turbostat: Add Icelake support
    tools/power turbostat: Cleanup CNL-specific code
    tools/power turbostat: Cleanup CC3-skip code
    tools/power turbostat: Restore ability to execute in topology-order

    Rafael J. Wysocki
     
  • Toke Høiland-Jørgensen says:

    ====================
    sched: A few small fixes for sch_cake

    Kevin noticed a few issues with the way CAKE reads the skb protocol and the IP
    diffserv fields. This series fixes those two issues, and should probably go to
    in 4.19 as well. However, the previous refactoring patch means they don't apply
    as-is; I can send a follow-up directly to stable if that's OK with you?
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • There is not actually any guarantee that the IP headers are valid before we
    access the DSCP bits of the packets. Fix this using the same approach taken
    in sch_dsmark.

    Reported-by: Kevin Darbyshire-Bryant
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller

    Toke Høiland-Jørgensen
     
  • We shouldn't be using skb->protocol directly as that will miss cases with
    hardware-accelerated VLAN tags. Use the helper instead to get the right
    protocol number.

    Reported-by: Kevin Darbyshire-Bryant
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller

    Toke Høiland-Jørgensen
     
  • RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
    loss episodes in the same way as conventional TCP".

    Currently, Linux DCTCP performs no cwnd reduction when losses
    are encountered. Optionally, the dctcp_clamp_alpha_on_loss module parameter
    resets alpha to its maximal value if an RTO happens. This behavior
    is sub-optimal for at least two reasons: i) it ignores losses
    triggering fast retransmissions; and ii) it causes an unnecessarily large
    cwnd reduction in the future if the loss was isolated as it resets
    the historical term of DCTCP's alpha EWMA to its maximal value (i.e.,
    denoting a total congestion). The second reason has an especially
    noticeable effect when using DCTCP in high BDP environments, where
    alpha normally stays at low values.

    This patch replaces the clamping of alpha by setting ssthresh to
    half of cwnd for both fast retransmissions and RTOs, at most once
    per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
    has been removed.
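
    A simplified sketch of the loss reaction described above. The structure
    and function names are hypothetical, the floor of two packets is an
    assumption, and "at most once per RTT" is approximated here by reacting
    only once per loss episode.

```c
/* Hypothetical per-flow state; field names are not the kernel's. */
struct sketch_flow {
    unsigned int cwnd;        /* congestion window, in packets */
    unsigned int ssthresh;    /* slow-start threshold, in packets */
    int in_loss_recovery;     /* nonzero while a loss episode is handled */
};

/*
 * React to a loss signal (fast retransmit or RTO): set ssthresh to
 * half of cwnd, but only on entry to recovery, so that one episode
 * causes one reduction. The floor of 2 packets is an assumption.
 */
static void sketch_react_to_loss(struct sketch_flow *f)
{
    unsigned int half;

    if (f->in_loss_recovery)
        return;
    half = f->cwnd / 2;
    f->ssthresh = half > 2 ? half : 2;
    f->in_loss_recovery = 1;
}

/* Called when the episode ends, re-arming the reaction. */
static void sketch_recovery_done(struct sketch_flow *f)
{
    f->in_loss_recovery = 0;
}
```

    Unlike clamping alpha, this leaves the congestion estimate untouched,
    so an isolated loss does not inflate later ECN-driven reductions.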

    The table below shows experimental results where we measured the
    drop probability of a PIE AQM (not applying ECN marks) at a
    bottleneck in the presence of a single TCP flow with either the
    alpha-clamping option enabled or the cwnd halving proposed by this
    patch. Results using reno or cubic are given for comparison.

    TCP CC            | Link    | RTT      | Drop
                      | speed   | base+AQM | probability
    ==================|=========|==========|============
    CUBIC             | 40Mbps  | 7+20ms   | 0.21%
    RENO              |         |          | 0.19%
    DCTCP-CLAMP-ALPHA |         |          | 25.80%
    DCTCP-HALVE-CWND  |         |          | 0.22%
    ------------------|---------|----------|------------
    CUBIC             | 100Mbps | 7+20ms   | 0.03%
    RENO              |         |          | 0.02%
    DCTCP-CLAMP-ALPHA |         |          | 23.30%
    DCTCP-HALVE-CWND  |         |          | 0.04%
    ------------------|---------|----------|------------
    CUBIC             | 800Mbps | 1+1ms    | 0.04%
    RENO              |         |          | 0.05%
    DCTCP-CLAMP-ALPHA |         |          | 18.70%
    DCTCP-HALVE-CWND  |         |          | 0.06%

    We see that, without halving its cwnd for all sources of loss,
    DCTCP drives the AQM to large drop probabilities in order to keep
    the queue length under control (i.e., it repeatedly faces RTOs).
    Instead, if DCTCP reacts to all sources of loss, it can then be
    controlled by the AQM using drop levels similar to those of cubic
    or reno.

    Signed-off-by: Koen De Schepper
    Signed-off-by: Olivier Tilmans
    Cc: Bob Briscoe
    Cc: Lawrence Brakmo
    Cc: Florian Westphal
    Cc: Daniel Borkmann
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Cc: Eric Dumazet
    Cc: Andrew Shewmaker
    Cc: Glenn Judd
    Acked-by: Florian Westphal
    Acked-by: Neal Cardwell
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Koen De Schepper
     
  • The control path of the 'sample' action does not validate the value of
    'rate' provided by the user, but then uses it as a divisor in the traffic
    path. Validate it in tcf_sample_init(), and return -EINVAL with a proper
    extack message if that value is zero, to fix a splat with the script below:

    # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
    # tc -s a s action sample
    total acts 1

    action order 0: sample rate 1/0 group 1 pipe
    index 2 ref 1 bind 1 installed 19 sec used 19 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    # ping 192.0.2.1 -I test0 -c1 -q

    divide error: 0000 [#1] SMP PTI
    CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample]
    Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01
    RSP: 0018:ffffae320190ba30 EFLAGS: 00010246
    RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49
    RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0
    RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000
    R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80
    R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000
    FS: 00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0
    Call Trace:
    tcf_action_exec+0x7c/0x1c0
    tcf_classify+0x57/0x160
    __dev_queue_xmit+0x3dc/0xd10
    ip_finish_output2+0x257/0x6d0
    ip_output+0x75/0x280
    ip_send_skb+0x15/0x40
    raw_sendmsg+0xae3/0x1410
    sock_sendmsg+0x36/0x40
    __sys_sendto+0x10e/0x140
    __x64_sys_sendto+0x24/0x30
    do_syscall_64+0x60/0x210
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [...]
    Kernel panic - not syncing: Fatal exception in interrupt

    Add a TDC selftest to document that 'rate' is now being validated.
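
    The idea of the fix, rejecting a zero 'rate' in the control path so
    the traffic path can safely divide by it, can be sketched in user
    space. This is a model only: the function names and the error text
    are placeholders, and the modulo-based sampling decision is an
    assumption about how 'rate' is used as a divisor.

```python
import errno


def validate_sample_rate(rate: int):
    """Control-path check: reject a zero rate before it is used as a divisor.

    Returns (0, None) on success or (-EINVAL, message) on failure. The message
    text is a placeholder, not the kernel's actual extack string.
    """
    if rate == 0:
        return -errno.EINVAL, "sample rate must be non-zero"
    return 0, None


def should_sample(rate: int, random_value: int) -> bool:
    """Traffic-path decision: sample roughly one packet in 'rate'.

    Uses 'rate' as a divisor, so it relies on the control path having
    rejected zero already.
    """
    return random_value % rate == 0
```

    Doing the check once at configuration time keeps the per-packet path
    free of extra branches.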

    Reported-by: Matteo Croce
    Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
    Signed-off-by: Davide Caratti
    Acked-by: Yotam Gigi
    Signed-off-by: David S. Miller

    Davide Caratti
     
  • When a bpf program is uploaded, the driver computes the number of
    xdp tx queues resulting in the allocation of additional qsets.
    Starting from commit '2ecbe4f4a027 ("net: thunderx: replace global
    nicvf_rx_mode_wq work queue for all VFs to private for each of them")'
    the driver runs link state polling for each VF resulting in the
    following NULL pointer dereference:

    [ 56.169256] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
    [ 56.178032] Mem abort info:
    [ 56.180834] ESR = 0x96000005
    [ 56.183877] Exception class = DABT (current EL), IL = 32 bits
    [ 56.189792] SET = 0, FnV = 0
    [ 56.192834] EA = 0, S1PTW = 0
    [ 56.195963] Data abort info:
    [ 56.198831] ISV = 0, ISS = 0x00000005
    [ 56.202662] CM = 0, WnR = 0
    [ 56.205619] user pgtable: 64k pages, 48-bit VAs, pgdp = 0000000021f0c7a0
    [ 56.212315] [0000000000000020] pgd=0000000000000000, pud=0000000000000000
    [ 56.219094] Internal error: Oops: 96000005 [#1] SMP
    [ 56.260459] CPU: 39 PID: 2034 Comm: ip Not tainted 5.1.0-rc3+ #3
    [ 56.266452] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS T49 02/02/2018
    [ 56.273315] pstate: 80000005 (Nzcv daif -PAN -UAO)
    [ 56.278098] pc : __ll_sc___cmpxchg_case_acq_64+0x4/0x20
    [ 56.283312] lr : mutex_lock+0x2c/0x50
    [ 56.286962] sp : ffff0000219af1b0
    [ 56.290264] x29: ffff0000219af1b0 x28: ffff800f64de49a0
    [ 56.295565] x27: 0000000000000000 x26: 0000000000000015
    [ 56.300865] x25: 0000000000000000 x24: 0000000000000000
    [ 56.306165] x23: 0000000000000000 x22: ffff000011117000
    [ 56.311465] x21: ffff800f64dfc080 x20: 0000000000000020
    [ 56.316766] x19: 0000000000000020 x18: 0000000000000001
    [ 56.322066] x17: 0000000000000000 x16: ffff800f2e077080
    [ 56.327367] x15: 0000000000000004 x14: 0000000000000000
    [ 56.332667] x13: ffff000010964438 x12: 0000000000000002
    [ 56.337967] x11: 0000000000000000 x10: 0000000000000c70
    [ 56.343268] x9 : ffff0000219af120 x8 : ffff800f2e077d50
    [ 56.348568] x7 : 0000000000000027 x6 : 000000062a9d6a84
    [ 56.353869] x5 : 0000000000000000 x4 : ffff800f2e077480
    [ 56.359169] x3 : 0000000000000008 x2 : ffff800f2e077080
    [ 56.364469] x1 : 0000000000000000 x0 : 0000000000000020
    [ 56.369770] Process ip (pid: 2034, stack limit = 0x00000000c862da3a)
    [ 56.376110] Call trace:
    [ 56.378546] __ll_sc___cmpxchg_case_acq_64+0x4/0x20
    [ 56.383414] drain_workqueue+0x34/0x198
    [ 56.387247] nicvf_open+0x48/0x9e8 [nicvf]
    [ 56.391334] nicvf_open+0x898/0x9e8 [nicvf]
    [ 56.395507] nicvf_xdp+0x1bc/0x238 [nicvf]
    [ 56.399595] dev_xdp_install+0x68/0x90
    [ 56.403333] dev_change_xdp_fd+0xc8/0x240
    [ 56.407333] do_setlink+0x8e0/0xbe8
    [ 56.410810] __rtnl_newlink+0x5b8/0x6d8
    [ 56.414634] rtnl_newlink+0x54/0x80
    [ 56.418112] rtnetlink_rcv_msg+0x22c/0x2f8
    [ 56.422199] netlink_rcv_skb+0x60/0x120
    [ 56.426023] rtnetlink_rcv+0x28/0x38
    [ 56.429587] netlink_unicast+0x1c8/0x258
    [ 56.433498] netlink_sendmsg+0x1b4/0x350
    [ 56.437410] sock_sendmsg+0x4c/0x68
    [ 56.440887] ___sys_sendmsg+0x240/0x280
    [ 56.444711] __sys_sendmsg+0x68/0xb0
    [ 56.448275] __arm64_sys_sendmsg+0x2c/0x38
    [ 56.452361] el0_svc_handler+0x9c/0x128
    [ 56.456186] el0_svc+0x8/0xc
    [ 56.459056] Code: 35ffff91 2a1003e0 d65f03c0 f9800011 (c85ffc10)
    [ 56.465166] ---[ end trace 4a57fdc27b0a572c ]---
    [ 56.469772] Kernel panic - not syncing: Fatal exception

    Fix it by checking the nicvf_rx_mode_wq pointer in nicvf_open and
    nicvf_stop.
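
    The guard amounts to "only drain the workqueue if it exists". A
    minimal user-space sketch follows; Workqueue, VfDevice and
    drain_rx_mode_wq are hypothetical stand-ins, not driver structures.

```python
class Workqueue:
    """Stand-in for a kernel workqueue; only tracks whether it was drained."""

    def __init__(self):
        self.drained = False

    def drain(self):
        self.drained = True


class VfDevice:
    """Hypothetical per-VF state; 'rx_mode_wq' may be unset (None)."""

    def __init__(self, rx_mode_wq=None):
        self.rx_mode_wq = rx_mode_wq


def drain_rx_mode_wq(dev: VfDevice) -> bool:
    """Drain the per-VF workqueue only if it exists.

    Returns True if a drain happened, False if there was nothing to drain.
    """
    if dev.rx_mode_wq is None:
        return False
    dev.rx_mode_wq.drain()
    return True
```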

    Fixes: 2ecbe4f4a027 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them")
    Fixes: 2c632ad8bc74 ("net: thunderx: move link state polling function to VF")
    Reported-by: Matteo Croce
    Signed-off-by: Lorenzo Bianconi
    Tested-by: Matteo Croce
    Signed-off-by: David S. Miller

    Lorenzo Bianconi
     
  • Yonglong Liu says:

    ====================
    net: hns: bugfixes for HNS Driver

    This patchset fixes some bugs that were found while testing
    various scenarios, or identified by KASAN/sparse.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • There are some sparse warnings in the HNS drivers:

    warning: incorrect type in assignment (different address spaces)
    expected void [noderef] *io_base
    got void *vaddr
    warning: cast removes address space '' of expression
    [...]

    Add __iomem and change all the u8 __iomem to void __iomem to
    fix this kind of warning.

    warning: incorrect type in argument 1 (different address spaces)
    expected void [noderef] *base
    got unsigned char [usertype] *base_addr
    warning: cast to restricted __le16
    warning: incorrect type in assignment (different base types)
    expected unsigned int [usertype] tbl_tcam_data_high
    got restricted __le32 [usertype]
    warning: cast to restricted __le32
    [...]

    These variables used u32/u16 as their type and are ultimately passed
    as a parameter to writel(). writel() already does the cpu_to_le32
    conversion, so remove the little-endian conversion code to fix this
    kind of warning.
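
    Why an extra conversion before writel() is harmful can be shown with
    a small byte-order model. This is a sketch under stated assumptions:
    writel() is modelled as cpu_to_le32() followed by a native-endian
    store, and the helper names below are illustrative only.

```python
def swab32(value: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


def cpu_to_le32(value: int, big_endian_cpu: bool) -> int:
    """Model of cpu_to_le32(): a byte swap on big-endian CPUs, else a no-op."""
    return swab32(value) if big_endian_cpu else value


def writel_bus_bytes(value: int, big_endian_cpu: bool) -> bytes:
    """Bytes the device sees for a writel()-style store, lowest address first.

    Models writel() as cpu_to_le32() followed by a native-endian store.
    """
    word = cpu_to_le32(value, big_endian_cpu)
    return word.to_bytes(4, "big" if big_endian_cpu else "little")
```

    In this model a plain value reaches the device in little-endian order
    on either kind of CPU, while a value that was already converted by
    the caller is swapped twice on a big-endian CPU. On a little-endian
    CPU both conversions are no-ops, so the mistake stays invisible there.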

    Signed-off-by: Yonglong Liu
    Signed-off-by: David S. Miller

    Yonglong Liu
     
  • When SMMU is enabled, removing the HNS driver causes a WARNING:

    [ 141.924177] WARNING: CPU: 36 PID: 2708 at drivers/iommu/dma-iommu.c:443 __iommu_dma_unmap+0xc0/0xc8
    [ 141.954673] Modules linked in: hns_enet_drv(-)
    [ 141.963615] CPU: 36 PID: 2708 Comm: rmmod Tainted: G W 5.0.0-rc1-28723-gb729c57de95c-dirty #32
    [ 141.983593] Hardware name: Huawei D05/D05, BIOS Hisilicon D05 UEFI Nemo 1.8 RC0 08/31/2017
    [ 142.000244] pstate: 60000005 (nZCv daif -PAN -UAO)
    [ 142.009886] pc : __iommu_dma_unmap+0xc0/0xc8
    [ 142.018476] lr : __iommu_dma_unmap+0xc0/0xc8
    [ 142.027066] sp : ffff000013533b90
    [ 142.033728] x29: ffff000013533b90 x28: ffff8013e6983600
    [ 142.044420] x27: 0000000000000000 x26: 0000000000000000
    [ 142.055113] x25: 0000000056000000 x24: 0000000000000015
    [ 142.065806] x23: 0000000000000028 x22: ffff8013e66eee68
    [ 142.076499] x21: ffff8013db919800 x20: 0000ffffefbff000
    [ 142.087192] x19: 0000000000001000 x18: 0000000000000007
    [ 142.097885] x17: 000000000000000e x16: 0000000000000001
    [ 142.108578] x15: 0000000000000019 x14: 363139343a70616d
    [ 142.119270] x13: 6e75656761705f67 x12: 0000000000000000
    [ 142.129963] x11: 00000000ffffffff x10: 0000000000000006
    [ 142.140656] x9 : 1346c1aa88093500 x8 : ffff0000114de4e0
    [ 142.151349] x7 : 6662666578303d72 x6 : ffff0000105ffec8
    [ 142.162042] x5 : 0000000000000000 x4 : 0000000000000000
    [ 142.172734] x3 : 00000000ffffffff x2 : ffff0000114de500
    [ 142.183427] x1 : 0000000000000000 x0 : 0000000000000035
    [ 142.194120] Call trace:
    [ 142.199030] __iommu_dma_unmap+0xc0/0xc8
    [ 142.206920] iommu_dma_unmap_page+0x20/0x28
    [ 142.215335] __iommu_unmap_page+0x40/0x60
    [ 142.223399] hnae_unmap_buffer+0x110/0x134
    [ 142.231639] hnae_free_desc+0x6c/0x10c
    [ 142.239177] hnae_fini_ring+0x14/0x34
    [ 142.246540] hnae_fini_queue+0x2c/0x40
    [ 142.254080] hnae_put_handle+0x38/0xcc
    [ 142.261619] hns_nic_dev_remove+0x54/0xfc [hns_enet_drv]
    [ 142.272312] platform_drv_remove+0x24/0x64
    [ 142.280552] device_release_driver_internal+0x17c/0x20c
    [ 142.291070] driver_detach+0x4c/0x90
    [ 142.298259] bus_remove_driver+0x5c/0xd8
    [ 142.306148] driver_unregister+0x2c/0x54
    [ 142.314037] platform_driver_unregister+0x10/0x18
    [ 142.323505] hns_nic_dev_driver_exit+0x14/0xf0c [hns_enet_drv]
    [ 142.335248] __arm64_sys_delete_module+0x214/0x25c
    [ 142.344891] el0_svc_common+0xb0/0x10c
    [ 142.352430] el0_svc_handler+0x24/0x80
    [ 142.359968] el0_svc+0x8/0x7c0
    [ 142.366104] ---[ end trace 60ad1cd58e63c407 ]---

    The tx ring buffer is mapped on xmit and unmapped when xmit is done.
    So hnae_init_ring() does not map the tx ring buffer, but
    hnae_fini_ring() has an unmap operation for the tx ring buffer, which
    was already unmapped when xmit completed, and this causes the WARNING.

    hnae_alloc_buffers() is called in hnae_init_ring(),
    so hnae_free_buffers() should be called in hnae_fini_ring(), not in
    hnae_free_desc().

    In hnae_fini_ring(), add an is_rx_ring() check as in hnae_init_ring().
    When the ring is a tx ring, add a piece of code to ensure that
    the tx ring is unmapped.
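
    The asymmetry between rx and tx rings can be illustrated with toy DMA
    bookkeeping. This is a model only: DmaMappings and fini_ring are
    hypothetical, and counting a warning for an unmap of an unmapped
    buffer merely stands in for the WARNING shown above.

```python
class DmaMappings:
    """Toy DMA bookkeeping: unmapping an unmapped buffer counts as a warning."""

    def __init__(self):
        self.mapped = set()
        self.warnings = 0

    def map(self, buf: int) -> None:
        self.mapped.add(buf)

    def unmap(self, buf: int) -> None:
        if buf not in self.mapped:
            self.warnings += 1
            return
        self.mapped.discard(buf)


def fini_ring(dma: DmaMappings, buffers, is_rx_ring: bool) -> None:
    """Tear down a ring.

    rx rings were fully mapped at init, so every buffer is unmapped.
    tx buffers are mapped per transmit, so only those still mapped are unmapped.
    """
    for buf in buffers:
        if is_rx_ring or buf in dma.mapped:
            dma.unmap(buf)
```

    Unconditionally unmapping every tx buffer would hit the warning path
    for each buffer whose transmit had already completed.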

    Signed-off-by: Yonglong Liu
    Signed-off-by: Peng Li
    Signed-off-by: David S. Miller

    Yonglong Liu
     
  • ICMP6 neighbor solicitation messages are discarded by the Hip06
    chips because the forwarding pool is not set. Enabling promisc mode
    has the same problem.

    This patch fixes the wrong forwarding table configs for the multicast
    vague matching when promisc mode is enabled, and adds a forwarding
    pool for the forwarding table.

    Signed-off-by: Yonglong Liu
    Signed-off-by: David S. Miller

    Yonglong Liu
     
  • Rebooting the system again and again may cause a memory
    overwrite.

    [ 15.638922] systemd[1]: Reached target Swap.
    [ 15.667561] tun: Universal TUN/TAP device driver, 1.6
    [ 15.676756] Bridge firewalling registered
    [ 17.344135] Unable to handle kernel paging request at virtual address 0000000200000040
    [ 17.352179] Mem abort info:
    [ 17.355007] ESR = 0x96000004
    [ 17.358105] Exception class = DABT (current EL), IL = 32 bits
    [ 17.364112] SET = 0, FnV = 0
    [ 17.367209] EA = 0, S1PTW = 0
    [ 17.370393] Data abort info:
    [ 17.373315] ISV = 0, ISS = 0x00000004
    [ 17.377206] CM = 0, WnR = 0
    [ 17.380214] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
    [ 17.386926] [0000000200000040] pgd=0000000000000000
    [ 17.391878] Internal error: Oops: 96000004 [#1] SMP
    [ 17.396824] CPU: 23 PID: 95 Comm: kworker/u130:0 Tainted: G E 4.19.25-1.2.78.aarch64 #1
    [ 17.414175] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.54 08/16/2018
    [ 17.425615] Workqueue: events_unbound async_run_entry_fn
    [ 17.435151] pstate: 00000005 (nzcv daif -PAN -UAO)
    [ 17.444139] pc : __mutex_lock.isra.1+0x74/0x540
    [ 17.453002] lr : __mutex_lock.isra.1+0x3c/0x540
    [ 17.461701] sp : ffff000100d9bb60
    [ 17.469146] x29: ffff000100d9bb60 x28: 0000000000000000
    [ 17.478547] x27: 0000000000000000 x26: ffff802fb8945000
    [ 17.488063] x25: 0000000000000000 x24: ffff802fa32081a8
    [ 17.497381] x23: 0000000000000002 x22: ffff801fa2b15220
    [ 17.506701] x21: ffff000009809000 x20: ffff802fa23a0888
    [ 17.515980] x19: ffff801fa2b15220 x18: 0000000000000000
    [ 17.525272] x17: 0000000200000000 x16: 0000000200000000
    [ 17.534511] x15: 0000000000000000 x14: 0000000000000000
    [ 17.543652] x13: ffff000008d95db8 x12: 000000000000000d
    [ 17.552780] x11: ffff000008d95d90 x10: 0000000000000b00
    [ 17.561819] x9 : ffff000100d9bb90 x8 : ffff802fb89d6560
    [ 17.570829] x7 : 0000000000000004 x6 : 00000004a1801d05
    [ 17.579839] x5 : 0000000000000000 x4 : 0000000000000000
    [ 17.588852] x3 : ffff802fb89d5a00 x2 : 0000000000000000
    [ 17.597734] x1 : 0000000200000000 x0 : 0000000200000000
    [ 17.606631] Process kworker/u130:0 (pid: 95, stack limit = 0x(____ptrval____))
    [ 17.617438] Call trace:
    [ 17.623349] __mutex_lock.isra.1+0x74/0x540
    [ 17.630927] __mutex_lock_slowpath+0x24/0x30
    [ 17.638602] mutex_lock+0x50/0x60
    [ 17.645295] drain_workqueue+0x34/0x198
    [ 17.652623] __sas_drain_work+0x7c/0x168
    [ 17.659903] sas_drain_work+0x60/0x68
    [ 17.666947] hisi_sas_scan_finished+0x30/0x40 [hisi_sas_main]
    [ 17.676129] do_scsi_scan_host+0x70/0xb0
    [ 17.683534] do_scan_async+0x20/0x228
    [ 17.690586] async_run_entry_fn+0x4c/0x1d0
    [ 17.697997] process_one_work+0x1b4/0x3f8
    [ 17.705296] worker_thread+0x54/0x470

    The call trace is different every time, but the overwritten address
    is always the same:
    Unable to handle kernel paging request at virtual address 0000000200000040

    The root cause is that the write to the XGMAC_MAC_TX_LF_RF_CONTROL_REG
    register did not use the io_base offset.

    Signed-off-by: Yonglong Liu
    Signed-off-by: David S. Miller

    Yonglong Liu
     
  • When the HNS driver is loaded, it always prints an error:
    "netif_napi_add() called with weight 256"

    This is because the kernel checks the NAPI polling weights
    requested by drivers and it prints an error message if a driver
    requests a weight bigger than 64.

    So use NAPI_POLL_WEIGHT to fix it.
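
    The check that produces the message can be modelled as a simple
    threshold test. This is a sketch: napi_weight_error is a hypothetical
    helper, and only the limit of 64 and the message text come from the
    description above.

```python
NAPI_POLL_WEIGHT = 64  # limit mentioned in the commit message


def napi_weight_error(weight: int):
    """Return the error text for an oversized weight, or None if acceptable."""
    if weight > NAPI_POLL_WEIGHT:
        return f"netif_napi_add() called with weight {weight}"
    return None
```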

    Signed-off-by: Yonglong Liu
    Signed-off-by: Peng Li
    Signed-off-by: David S. Miller

    Yonglong Liu
     
  • This patch fixes the following issue:
    [27237.844750] BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x708/0xa18[hns_enet_drv]

    After hnae_queue_xmit() in hns_nic_net_xmit_hw(), execution can be
    interrupted, and the interrupt handling can then call
    hns_nic_tx_poll_one() to handle the new packets and free the skb. So,
    on return to hns_nic_net_xmit_hw(), accessing skb->len causes a
    use-after-free.

    This patch updates the tx ring statistics in hns_nic_tx_poll_one() to
    fix the bug.

    Signed-off-by: Liubin Shu
    Signed-off-by: Zhen Lei
    Signed-off-by: Yonglong Liu
    Signed-off-by: Peng Li
    Signed-off-by: David S. Miller

    Liubin Shu
     

04 Apr, 2019

3 commits

  • Rename bpf_flow_dissector.txt to bpf_flow_dissector.rst and fix
    formatting. Also, link it from the Documentation/networking/index.rst.

    Tested with 'make htmldocs' to make sure it looks reasonable.

    Fixes: ae82899bbe92 ("flow_dissector: document BPF flow dissector environment")
    Signed-off-by: Stanislav Fomichev
    Acked-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann

    Stanislav Fomichev
     
  • Running LTP oom01 in a tight loop, or memory stress testing that puts
    the system in a low-memory situation, could trigger random memory
    corruption such as the page flag corruption below. This is because in
    fast_isolate_freepages(), if isolation fails, next_search_order() does
    not abort the search immediately, which could lead to improper accesses.

    UBSAN: Undefined behaviour in ./include/linux/mm.h:1195:50
    index 7 is out of range for type 'zone [5]'
    Call Trace:
    dump_stack+0x62/0x9a
    ubsan_epilogue+0xd/0x7f
    __ubsan_handle_out_of_bounds+0x14d/0x192
    __isolate_free_page+0x52c/0x600
    compaction_alloc+0x886/0x25f0
    unmap_and_move+0x37/0x1e70
    migrate_pages+0x2ca/0xb20
    compact_zone+0x19cb/0x3620
    kcompactd_do_work+0x2df/0x680
    kcompactd+0x1d8/0x6c0
    kthread+0x32c/0x3f0
    ret_from_fork+0x35/0x40
    ------------[ cut here ]------------
    kernel BUG at mm/page_alloc.c:3124!
    invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    RIP: 0010:__isolate_free_page+0x464/0x600
    RSP: 0000:ffff888b9e1af848 EFLAGS: 00010007
    RAX: 0000000030000000 RBX: ffff888c39fcf0f8 RCX: 0000000000000000
    RDX: 1ffff111873f9e25 RSI: 0000000000000004 RDI: ffffed1173c35ef6
    RBP: ffff888b9e1af898 R08: fffffbfff4fc2461 R09: fffffbfff4fc2460
    R10: fffffbfff4fc2460 R11: ffffffffa7e12303 R12: 0000000000000008
    R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000007
    FS: 0000000000000000(0000) GS:ffff888ba8e80000(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc7abc00000 CR3: 0000000752416004 CR4: 00000000001606a0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    compaction_alloc+0x886/0x25f0
    unmap_and_move+0x37/0x1e70
    migrate_pages+0x2ca/0xb20
    compact_zone+0x19cb/0x3620
    kcompactd_do_work+0x2df/0x680
    kcompactd+0x1d8/0x6c0
    kthread+0x32c/0x3f0
    ret_from_fork+0x35/0x40

    Link: http://lkml.kernel.org/r/20190320192648.52499-1-cai@lca.pw
    Fixes: dbe2d4e4f12e ("mm, compaction: round-robin the order while searching the free lists for a target")
    Signed-off-by: Qian Cai
    Acked-by: Mel Gorman
    Cc: Daniel Jordan
    Cc: Mikhail Gavrilov
    Cc: Vlastimil Babka
    Cc: Pavel Tatashin
    Signed-off-by: Mel Gorman

    Qian Cai
     
  • Mikhail Gavrilov reported the following bug being triggered in a Fedora
    kernel based on 5.1-rc1 but it is relevant to a vanilla kernel.

    kernel: page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
    kernel: ------------[ cut here ]------------
    kernel: kernel BUG at include/linux/mm.h:1021!
    kernel: invalid opcode: 0000 [#1] SMP NOPTI
    kernel: CPU: 6 PID: 116 Comm: kswapd0 Tainted: G C 5.1.0-0.rc1.git1.3.fc31.x86_64 #1
    kernel: Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 1201 12/07/2018
    kernel: RIP: 0010:__reset_isolation_pfn+0x244/0x2b0
    kernel: Code: fe 06 e8 0f 8e fc ff 44 0f b6 4c 24 04 48 85 c0 0f 85 dc fe ff ff e9 68 fe ff ff 48 c7 c6 58 b7 2e 8c 4c 89 ff e8 0c 75 00 00 0b 48 c7 c6 58 b7 2e 8c e8 fe 74 00 00 0f 0b 48 89 fa 41 b8 01
    kernel: RSP: 0018:ffff9e2d03f0fde8 EFLAGS: 00010246
    kernel: RAX: 0000000000000034 RBX: 000000000081f380 RCX: ffff8cffbddd6c20
    kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8cffbddd6c20
    kernel: RBP: 0000000000000001 R08: 0000009898b94613 R09: 0000000000000000
    kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000100000
    kernel: R13: 0000000000100000 R14: 0000000000000001 R15: ffffca7de07ce000
    kernel: FS: 0000000000000000(0000) GS:ffff8cffbdc00000(0000) knlGS:0000000000000000
    kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    kernel: CR2: 00007fc1670e9000 CR3: 00000007f5276000 CR4: 00000000003406e0
    kernel: Call Trace:
    kernel: __reset_isolation_suitable+0x62/0x120
    kernel: reset_isolation_suitable+0x3b/0x40
    kernel: kswapd+0x147/0x540
    kernel: ? finish_wait+0x90/0x90
    kernel: kthread+0x108/0x140
    kernel: ? balance_pgdat+0x560/0x560
    kernel: ? kthread_park+0x90/0x90
    kernel: ret_from_fork+0x27/0x50

    He bisected it down to e332f741a8dd ("mm, compaction: be selective about
    what pageblocks to clear skip hints"). The problem is that the patch in
    question was sloppy with respect to the handling of zone boundaries. In
    some instances, it was possible for PFNs outside of a zone to be examined
    and if those were not properly initialised or poisoned then it would
    trigger the VM_BUG_ON. This patch corrects the zone boundary issues when
    resetting pageblock skip hints and Mikhail reported that the bug did not
    trigger after 30 hours of testing.
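
    The zone boundary handling can be pictured as clamping each
    pageblock's PFN range to the zone before touching any page in it.
    This is a user-space sketch, not the patch itself:
    clamp_block_to_zone and the half-open range convention are
    assumptions made for illustration.

```python
def clamp_block_to_zone(block_start, block_end, zone_start, zone_end):
    """Clamp a half-open PFN range [block_start, block_end) to the zone.

    Returns the clamped (start, end), or None if the block lies entirely
    outside the zone's half-open range [zone_start, zone_end).
    """
    start = max(block_start, zone_start)
    end = min(block_end, zone_end)
    if start >= end:
        return None
    return start, end
```

    With the range clamped first, PFNs that belong to a neighbouring zone
    or to an uninitialised hole are never examined.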

    Link: http://lkml.kernel.org/r/20190327085424.GL3189@techsingularity.net
    Fixes: e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints")
    Reported-by: Mikhail Gavrilov
    Tested-by: Mikhail Gavrilov
    Cc: Daniel Jordan
    Cc: Qian Cai
    Cc: Vlastimil Babka
    Signed-off-by: Mel Gorman

    Mel Gorman