26 Apr, 2018

40 commits

  • [ Upstream commit 1e98ffea5a8935ec040ab72299e349cb44b8defd ]

    Several netfilter matches and targets put kernel pointers into
    info objects, but don't set usersize in descriptors.
    This leads to kernel pointer leaks if a match/target is set
    and then read back to userspace.

    Properly set usersize for these matches/targets.

    Found with manual code inspection.

    Fixes: ec2318904965 ("xtables: extend matches and targets with .usersize")
    Signed-off-by: Dmitry Vyukov
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Vyukov
     
  • [ Upstream commit 89a8f6d4904c8cf3ff8fee9fdaff392a6bbb8bf6 ]

    In hyperv_init() it's presumed that it always has access to the VP index
    and hypercall MSRs, while according to the specification it should check
    whether access to the corresponding MSRs is allowed before accessing them.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Thomas Gleixner
    Cc: Stephen Hemminger
    Cc: kvm@vger.kernel.org
    Cc: Radim Krčmář
    Cc: Haiyang Zhang
    Cc: "Michael Kelley (EOSG)"
    Cc: Roman Kagan
    Cc: Andy Lutomirski
    Cc: devel@linuxdriverproject.org
    Cc: Paolo Bonzini
    Cc: "K. Y. Srinivasan"
    Cc: Cathy Avery
    Cc: Mohammed Gamal
    Link: https://lkml.kernel.org/r/20180124132337.30138-2-vkuznets@redhat.com
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • [ Upstream commit 202a0a70e445caee1d0ec7aae814e64b1189fa4d ]

    When the frame check sequence (FCS) is split across the last two frames
    of a fragmented packet, part of the FCS gets counted twice, once when
    subtracting the FCS, and again when subtracting the previously received
    data.

    For example, if 1602 bytes are received, and the first fragment contains
    the first 1600 bytes (including the first two bytes of the FCS), and the
    second fragment contains the last two bytes of the FCS:

    'skb->len == 1600' from the first fragment

    size = lstatus & BD_LENGTH_MASK; # 1602
    size -= ETH_FCS_LEN; # 1598
    size -= skb->len; # -2

    Since the size is unsigned, it wraps around and causes a BUG later in
    the packet handling, as shown below:

    kernel BUG at ./include/linux/skbuff.h:2068!
    Oops: Exception in kernel mode, sig: 5 [#1]
    ...
    NIP [c021ec60] skb_pull+0x24/0x44
    LR [c01e2fbc] gfar_clean_rx_ring+0x498/0x690
    Call Trace:
    [df7edeb0] [c01e2c1c] gfar_clean_rx_ring+0xf8/0x690 (unreliable)
    [df7edf20] [c01e33a8] gfar_poll_rx_sq+0x3c/0x9c
    [df7edf40] [c023352c] net_rx_action+0x21c/0x274
    [df7edf90] [c0329000] __do_softirq+0xd8/0x240
    [df7edff0] [c000c108] call_do_irq+0x24/0x3c
    [c0597e90] [c00041dc] do_IRQ+0x64/0xc4
    [c0597eb0] [c000d920] ret_from_except+0x0/0x18
    --- interrupt: 501 at arch_cpu_idle+0x24/0x5c

    Change the size to a signed integer and then trim off any part of the
    FCS that was received prior to the last fragment.
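
    The arithmetic above can be reproduced outside the driver. The sketch
    below is illustrative only: the helper names are hypothetical and are
    not taken from gianfar; only ETH_FCS_LEN and the 1602/1600 figures come
    from the example in this message.

```c
#include <stdint.h>

#define ETH_FCS_LEN 4  /* length of the Ethernet frame check sequence, in bytes */

/* Unsigned arithmetic, as in the buggy path: the result wraps around
 * when part of the FCS already arrived in an earlier fragment. */
static uint32_t last_frag_size_unsigned(uint32_t total_len, uint32_t already_received)
{
    uint32_t size = total_len;

    size -= ETH_FCS_LEN;
    size -= already_received;
    return size;
}

/* Signed arithmetic: a negative result is the number of FCS bytes that
 * were already counted in the previously received data. */
static int last_frag_size_signed(uint32_t total_len, uint32_t already_received)
{
    int size = (int)total_len;

    size -= ETH_FCS_LEN;
    size -= (int)already_received;
    return size;
}

/* Hypothetical helper: how many already-received bytes must be trimmed. */
static int fcs_bytes_to_trim(int size)
{
    return size < 0 ? -size : 0;
}
```

    With total_len = 1602 and already_received = 1600, the unsigned helper
    wraps to a huge value, while the signed helper yields -2, i.e. two FCS
    bytes to trim from the data received so far.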

    Fixes: 6c389fc931bc ("gianfar: fix size of scatter-gathered frames")
    Signed-off-by: Andy Spencer
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Andy Spencer
     
  • [ Upstream commit cbd27448faff4843ac4b66cc71445a10623ff48d ]

    When using the max_mw_size parameter of ntb_transport to limit the size of
    the Memory windows, communication cannot be established and the queues
    freeze.

    This is because the mw_size that's reported to the peer is correctly
    limited but the size used locally is not. So the MW is initialized
    with a buffer smaller than the window but the TX side is using the
    full window. This means the TX side will be writing to a region of the
    window that points nowhere.

    This is easily fixed by applying the same limit to tx_size in
    ntb_transport_init_queue().
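
    A minimal sketch of the idea, not the driver code: the same clamp has
    to be applied on both the side that reports the window size to the peer
    and the side that computes tx_size. The helper name is hypothetical,
    and treating max_mw_size == 0 as "no limit" is an assumption.

```c
#include <stdint.h>

/* Clamp a memory window size to an optional upper bound.
 * Assumption: a max_mw_size of 0 means the limit is disabled. */
static uint64_t limit_mw_size(uint64_t mw_size, uint64_t max_mw_size)
{
    if (max_mw_size && mw_size > max_mw_size)
        return max_mw_size;
    return mw_size;
}
```

    If only the reported size goes through such a clamp, the TX side still
    believes it owns the full window, which is the mismatch described
    above.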

    Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
    Signed-off-by: Logan Gunthorpe
    Acked-by: Allen Hubbe
    Cc: Dave Jiang
    Signed-off-by: Jon Mason
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Logan Gunthorpe
     
  • [ Upstream commit b081808a66345ba725b77ecd8d759bee874cd937 ]

    Failure in the XRCD FW deallocation command leaves memory leaked and
    returns an error to the user, who can't do anything about it.

    This patch changes behavior to always free memory and always return
    success to the user.

    Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
    Reviewed-by: Majd Dibbiny
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Yuval Shaia
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • [ Upstream commit ea05ba7c559c8e5a5946c3a94a2a266e9a6680a6 ]

    This patch fixes some problems encountered at runtime with
    configurations that support memory-less nodes, or that hot-add CPUs
    into nodes that are memoryless during system execution after boot. The
    problems of interest include:

    * Nodes known to powerpc to be memoryless at boot, but to have CPUs in
    them are allowed to be 'possible' and 'online'. Memory allocations
    for those nodes are taken from another node that does have memory
    until and if memory is hot-added to the node.

    * Nodes which have no resources assigned at boot, but which may still
    be referenced subsequently by affinity or associativity attributes,
    are kept in the list of 'possible' nodes for powerpc. Hot-add of
    memory or CPUs to the system can reference these nodes and bring
    them online instead of redirecting the references to one of the set
    of nodes known to have memory at boot.

    Note that this software operates under the context of CPU hotplug. We
    are not doing memory hotplug in this code, but rather updating the
    kernel's CPU topology (i.e. arch_update_cpu_topology /
    numa_update_cpu_topology). We are initializing a node that may be used
    by CPUs or memory before it can be referenced as invalid by a CPU
    hotplug operation. CPU hotplug operations are protected by a range of
    APIs including cpu_maps_update_begin/cpu_maps_update_done,
    cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
    Memory hotplug operations, including try_online_node, are protected by
    mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the
    case of CPUs being hot-added to a previously memoryless node, the
    try_online_node operation occurs wholly within the CPU locks with no
    overlap. Using HMC hot-add/hot-remove operations, we have been able to
    add and remove CPUs to any possible node without failures. HMC
    operations involve a degree of self-serialization, though.

    Signed-off-by: Michael Bringmann
    Reviewed-by: Nathan Fontenot
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Michael Bringmann
     
  • [ Upstream commit a346137e9142b039fd13af2e59696e3d40c487ef ]

    On powerpc systems which allow 'hot-add' of CPU or memory resources,
    it may occur that the new resources are to be inserted into nodes that
    were not used for these resources at bootup. In the kernel, any node
    that is used must be defined and initialized. These empty nodes may
    occur when,

    * Dedicated vs. shared resources. Shared resources require information
    such as the VPHN hcall for CPU assignment to nodes. Associativity
    decisions made based on dedicated resource rules, such as
    associativity properties in the device tree, may vary from decisions
    made using the values returned by the VPHN hcall.

    * memoryless nodes at boot. Nodes need to be defined as 'possible' at
    boot for operation with other code modules. Previously, the powerpc
    code would limit the set of possible nodes to those which have
    memory assigned at boot, and were thus online. Subsequent add/remove
    of CPUs or memory would only work with this subset of possible
    nodes.

    * memoryless nodes with CPUs at boot. Due to the previous restriction
    on nodes, nodes that had CPUs but no memory were being collapsed
    into other nodes that did have memory at boot. In practice this
    meant that the node assignment presented by the runtime kernel
    differed from the affinity and associativity attributes presented by
    the device tree or VPHN hcalls. Nodes that might be known to the
    pHyp were not 'possible' in the runtime kernel because they did not
    have memory at boot.

    This patch ensures that sufficient nodes are defined to support
    configuration requirements after boot, as well as at boot. This patch
    set fixes a couple of problems.

    * Nodes known to powerpc to be memoryless at boot, but to have CPUs in
    them are allowed to be 'possible' and 'online'. Memory allocations
    for those nodes are taken from another node that does have memory
    until and if memory is hot-added to the node.

    * Nodes which have no resources assigned at boot, but which may still
    be referenced subsequently by affinity or associativity attributes,
    are kept in the list of 'possible' nodes for powerpc. Hot-add of
    memory or CPUs to the system can reference these nodes and bring
    them online instead of redirecting to one of the set of nodes that
    were known to have memory at boot.

    This patch extracts the value of the lowest domain level (number of
    allocable resources) from the device tree property
    "ibm,max-associativity-domains" to use as the maximum number of nodes
    to setup as possibly available in the system. This new setting will
    override the instruction:

    nodes_and(node_possible_map, node_possible_map, node_online_map);

    presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

    If the "ibm,max-associativity-domains" property is not present at
    boot, no operation will be performed to define or enable additional
    nodes, or enable the above 'nodes_and()'.

    Signed-off-by: Michael Bringmann
    Reviewed-by: Nathan Fontenot
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Michael Bringmann
     
  • [ Upstream commit c25ef6a5e62fa212d298ce24995ce239f29b5f96 ]

    Do not build lib/bpf/bpf.o with this Makefile but use the one from the
    library directory. This avoids building a buggy bpf.o file (e.g. one
    with missing symbols).

    This patch is useful if some code (e.g. Landlock tests) needs both the
    bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf).

    Signed-off-by: Mickaël Salaün
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Mickaël Salaün
     
  • [ Upstream commit 40339af33c703bacb336493157d43c86a8bf2fed ]

    In commit 36777d9fa24c ("i40e: check current configured input set when
    adding ntuple filters") some code was added to report the input set
    mask for a given filter when reporting it to the user.

    This code is necessary so that the reported filter correctly displays
    that it is or is not masking certain fields.

    Unfortunately the code was incorrect. Development error accidentally
    swapped the mask values for the IPv4 addresses with the L4 port numbers.
    The port numbers are only 16bits wide while IPv4 addresses are 32 bits.
    Unfortunately we assigned only 16 bits to the IPv4 address masks.
    Additionally we assigned 32bit value 0xFFFFFFF to the TCP port numbers.
    This second part does not matter as the value would be truncated to
    16bits regardless, but it is unnecessary.

    Fix the reported masks to properly report that the entire field is
    masked.
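
    The width mismatch can be shown with a simplified structure. This is a
    sketch under assumptions: struct flow_mask and both helpers are
    hypothetical and host byte order is used for brevity; only the field
    widths (32-bit IPv4 addresses, 16-bit ports) come from the text above.

```c
#include <stdint.h>

/* Hypothetical, simplified mask layout for a TCP/IPv4 flow. */
struct flow_mask {
    uint32_t ip4src;
    uint32_t ip4dst;
    uint16_t psrc;
    uint16_t pdst;
};

/* Swapped values: addresses get a 16-bit mask, ports get a 32-bit one. */
static struct flow_mask full_mask_swapped(void)
{
    struct flow_mask m;

    m.ip4src = 0xFFFF;               /* covers only half of the address */
    m.ip4dst = 0xFFFF;
    m.psrc = (uint16_t)0xFFFFFFFF;   /* truncated to 0xFFFF, so harmless */
    m.pdst = (uint16_t)0xFFFFFFFF;
    return m;
}

/* Intended values: every field is fully masked at its own width. */
static struct flow_mask full_mask_fixed(void)
{
    struct flow_mask m;

    m.ip4src = 0xFFFFFFFF;
    m.ip4dst = 0xFFFFFFFF;
    m.psrc = 0xFFFF;
    m.pdst = 0xFFFF;
    return m;
}
```

    Both helpers report identical port masks, which is why only the address
    half of the swap was user-visible.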

    Fixes: 36777d9fa24c ("i40e: check current configured input set when adding ntuple filters")
    Signed-off-by: Jacob Keller
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jacob Keller
     
  • [ Upstream commit 02b4016bfe43d2d5ed043be7ffa56cda6a4d1100 ]

    When implementing support for IP_USER_FLOW filters, we correctly
    programmed a filter for both the non fragmented IPv4/Other filter, as
    well as the fragmented IPv4 filters. However, we did not properly
    program the input set for fragmented IPv4 PCTYPE. This meant that the
    filters would almost certainly not match, unless the user specified all
    of the flow types.

    Add support to program the fragmented IPv4 filter input set. Since we
    always program these filters together, we'll assume that the two input
    sets must match, and will thus always program the input sets to the same
    value.

    Signed-off-by: Jacob Keller
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jacob Keller
     
  • [ Upstream commit 2bafa8fac19a31ca72ae1a3e48df35f73661dbed ]

    commit 2de6aa3a666e ("ixgbe: Add support for padding packet")

    Uses RXDCTL.RLPML to limit the maximum frame size on Rx when using
    build_skb. Unfortunately that register does not work on 82599.

    Added an explicit check to avoid setting this register on 82599 MAC.

    Extended the comment related to the setting of RXDCTL.RLPML to better
    explain its purpose.

    Signed-off-by: Emil Tantilov
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Emil Tantilov
     
  • [ Upstream commit 5bdd0c6f89fba430e18d636493398389dadc3b17 ]

    If jffs2_iget() fails for a newly-allocated inode, jffs2_do_clear_inode()
    can get called twice in the error handling path, the first call in
    jffs2_iget() itself and the second through iget_failed(). This can result
    in a use-after-free error in the second jffs2_do_clear_inode() call, such
    as shown by the oops below wherein the second jffs2_do_clear_inode() call
    was trying to free node fragments that were already freed in the first
    jffs2_do_clear_inode() call.

    [ 78.178860] jffs2: error: (1904) jffs2_do_read_inode_internal: CRC failed for read_inode of inode 24 at physical location 0x1fc00c
    [ 78.178914] Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b7b
    [ 78.185871] pgd = ffffffc03a567000
    [ 78.188794] [6b6b6b6b6b6b6b7b] *pgd=0000000000000000, *pud=0000000000000000
    [ 78.194968] Internal error: Oops: 96000004 [#1] PREEMPT SMP
    ...
    [ 78.513147] PC is at rb_first_postorder+0xc/0x28
    [ 78.516503] LR is at jffs2_kill_fragtree+0x28/0x90 [jffs2]
    [ 78.520672] pc : [] lr : [] pstate: 60000105
    [ 78.526757] sp : ffffff800cea38f0
    [ 78.528753] x29: ffffff800cea38f0 x28: ffffffc01f3f8e80
    [ 78.532754] x27: 0000000000000000 x26: ffffff800cea3c70
    [ 78.536756] x25: 00000000dc67c8ae x24: ffffffc033d6945d
    [ 78.540759] x23: ffffffc036811740 x22: ffffff800891a5b8
    [ 78.544760] x21: 0000000000000000 x20: 0000000000000000
    [ 78.548762] x19: ffffffc037d48910 x18: ffffff800891a588
    [ 78.552764] x17: 0000000000000800 x16: 0000000000000c00
    [ 78.556766] x15: 0000000000000010 x14: 6f2065646f6e695f
    [ 78.560767] x13: 6461657220726f66 x12: 2064656c69616620
    [ 78.564769] x11: 435243203a6c616e x10: 7265746e695f6564
    [ 78.568771] x9 : 6f6e695f64616572 x8 : ffffffc037974038
    [ 78.572774] x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008
    [ 78.576775] x5 : 002f91d85bd44a2f x4 : 0000000000000000
    [ 78.580777] x3 : 0000000000000000 x2 : 000000403755e000
    [ 78.584779] x1 : 6b6b6b6b6b6b6b6b x0 : 6b6b6b6b6b6b6b6b
    ...
    [ 79.038551] [] rb_first_postorder+0xc/0x28
    [ 79.042962] [] jffs2_do_clear_inode+0x88/0x100 [jffs2]
    [ 79.048395] [] jffs2_evict_inode+0x3c/0x48 [jffs2]
    [ 79.053443] [] evict+0xb0/0x168
    [ 79.056835] [] iput+0x1c0/0x200
    [ 79.060228] [] iget_failed+0x30/0x3c
    [ 79.064097] [] jffs2_iget+0x2d8/0x360 [jffs2]
    [ 79.068740] [] jffs2_lookup+0xe8/0x130 [jffs2]
    [ 79.073434] [] lookup_slow+0x118/0x190
    [ 79.077435] [] walk_component+0xfc/0x28c
    [ 79.081610] [] path_lookupat+0x84/0x108
    [ 79.085699] [] filename_lookup+0x88/0x100
    [ 79.089960] [] user_path_at_empty+0x58/0x6c
    [ 79.094396] [] vfs_statx+0xa4/0x114
    [ 79.098138] [] SyS_newfstatat+0x58/0x98
    [ 79.102227] [] __sys_trace_return+0x0/0x4
    [ 79.106489] Code: d65f03c0 f9400001 b40000e1 aa0103e0 (f9400821)

    The jffs2_do_clear_inode() call in jffs2_iget() is unnecessary since
    iget_failed() will eventually call jffs2_do_clear_inode() if needed, so
    just remove it.

    Fixes: 5451f79f5f81 ("iget: stop JFFS2 from using iget() and read_inode()")
    Reviewed-by: Richard Weinberger
    Signed-off-by: Jake Daryll Obina
    Signed-off-by: Al Viro
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jake Daryll Obina
     
  • [ Upstream commit 3624a8f02568f08aef299d3b117f2226f621177d ]

    Returning EOPNOTSUPP is problematic because it can also be
    returned by the method function, and we use it in quite a few
    places in drivers these days.

    Instead, dedicate EPROTONOSUPPORT to indicate that the ioctl framework
    is enabled but the requested object and method are not supported by
    the kernel. No other case will return this code, and it lets userspace
    know to fall back to write().
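
    From the userspace side, the distinction might be consumed roughly as
    follows. This is a hypothetical sketch, not code from any existing
    library: the enum, the function name, and the convention of passing a
    negative errno value are all assumptions made for illustration.

```c
#include <errno.h>

enum cmd_path {
    PATH_IOCTL_OK,   /* the ioctl handled the request */
    PATH_USE_WRITE,  /* object/method unknown to the kernel: retry via write() */
    PATH_FAIL        /* genuine error from the method itself */
};

/* Decide what to do after an ioctl attempt.
 * Assumption: err is 0 on success or a negative errno value. */
static enum cmd_path choose_path(int err)
{
    if (err == 0)
        return PATH_IOCTL_OK;
    if (err == -EPROTONOSUPPORT)
        return PATH_USE_WRITE;
    /* -EOPNOTSUPP lands here: it may come from the method function,
     * so it must not trigger the fallback. */
    return PATH_FAIL;
}
```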

    grep says we do not use it today in drivers/infiniband subsystem.

    Signed-off-by: Jason Gunthorpe
    Reviewed-by: Matan Barak
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jason Gunthorpe
     
  • [ Upstream commit 980b4c95e78e4113cb7b9f430f121dab1c814b6c ]

    Since CRYPTO_SHA384 does not exist, Kconfig should not select it.
    Anyway, all SHA384 stuff is in CRYPTO_SHA512 which is already selected.

    Fixes: a21eb94fc4d3i ("crypto: axis - add ARTPEC-6/7 crypto accelerator driver")
    Signed-off-by: Corentin Labbe
    Signed-off-by: Herbert Xu
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Corentin LABBE
     
  • [ Upstream commit c505cbd45f6e9c539d57dd171d95ec7e5e9f9cd0 ]

    Some drivers may use the macro in runtime flow, for example:

    struct property_entry p[10];
    ...
    p[index++] = PROPERTY_ENTRY_U8("u8 property", u8_data);

    In that case, and in the absence of the data type, the compiler fails the build:

    drivers/char/ipmi/ipmi_dmi.c:79:29: error: Expected ; at end of statement
    drivers/char/ipmi/ipmi_dmi.c:79:29: error: got {
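
    The difference between a bare brace initializer and a typed compound
    literal can be sketched in isolation. Everything below is a hypothetical
    stand-in (struct prop_entry, PROP_INIT, PROP_ENTRY); it is not the
    kernel's struct property_entry or its macros.

```c
/* Hypothetical stand-in for a property entry. */
struct prop_entry {
    const char *name;
    unsigned char value;
};

/* Initializer-only form: valid in a declaration, but a syntax error
 * when used on the right-hand side of an assignment. */
#define PROP_INIT(_name, _val)  { .name = (_name), .value = (_val) }

/* Compound-literal form: carries its type, so it is also an expression. */
#define PROP_ENTRY(_name, _val) \
    (struct prop_entry){ .name = (_name), .value = (_val) }

static struct prop_entry make_entry(unsigned char v)
{
    struct prop_entry p[2] = { PROP_INIT("static", 1) };  /* initializer context */
    int index = 1;

    /* Assignment context: only the typed form compiles here. */
    p[index++] = PROP_ENTRY("u8 property", v);
    return p[1];
}
```

    Replacing PROP_ENTRY with PROP_INIT in the assignment makes the
    statement ill-formed, since a bare brace list is not an expression.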

    Acked-by: Corey Minyard
    Cc: Corey Minyard
    Signed-off-by: Andy Shevchenko
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Andy Shevchenko
     
  • [ Upstream commit c7e1b4059075c9e8eed101d7cc5da43e95eb5e18 ]

    Exar sleep wake-up handling has been done on a per-channel basis by
    virtue of INT0 being accessible from each channel's address space. I
    believe this was initially done out of necessity, but now that Exar
    devices have their own driver, we can do things more efficiently by
    registering a dedicated INT0 handler at the PCI device level.

    I see this change providing the following benefits:

    1. If more than one port is active, eliminates the redundant bus
    cycles for reading INT0 on every interrupt.
    2. This note associated with hooking in the per-channel handler in
    8250_port.c is resolved:
    /* Fixme: probably not the best place for this */

    Cc: Matt Schulte
    Signed-off-by: Aaron Sierra
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Aaron Sierra
     
  • [ Upstream commit 617ab45c9a8900e64a78b43696c02598b8cad68b ]

    When hypercall-based TLB flush was enabled for Hyper-V guests PCID feature
    was deliberately suppressed as a precaution: back then PCID was never
    exposed to Hyper-V guests and it wasn't clear what will happen if some day
    it becomes available. The day came and PCID/INVPCID features are already
    exposed on certain Hyper-V hosts.

    From TLFS (as of 5.0b) it is unclear how TLB flush hypercalls combine with
    PCID. In particular the usage of PCID is per-cpu based: the same mm gets
    different CR3 values on different CPUs. If the hypercall does exact
    matching this will fail. However, this is not the case. David Zhang
    explains:

    "In practice, the AddressSpace argument is ignored on any VM that supports
    PCIDs.

    Architecturally, the AddressSpace argument must match the CR3 with PCID
    bits stripped out (i.e., the low 12 bits of AddressSpace should be 0 in
    long mode). The flush hypercalls flush all PCIDs for the specified
    AddressSpace."

    With this, PCID can be enabled.
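
    The "PCID bits stripped out" rule from the quote amounts to clearing
    the low 12 bits of CR3. A minimal sketch, with a hypothetical mask name
    and helper:

```c
#include <stdint.h>

/* The low 12 bits of CR3 carry the PCID (per the quote above);
 * the macro name is hypothetical. */
#define CR3_PCID_BITS UINT64_C(0xFFF)

/* Value to pass as the AddressSpace argument: CR3 without its PCID. */
static uint64_t flush_address_space(uint64_t cr3)
{
    return cr3 & ~CR3_PCID_BITS;
}
```

    Two CPUs running the same mm with different PCIDs then produce the same
    AddressSpace value, so exact matching on it would not fail.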

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: Thomas Gleixner
    Cc: David Zhang
    Cc: Stephen Hemminger
    Cc: Haiyang Zhang
    Cc: "Michael Kelley (EOSG)"
    Cc: Andy Lutomirski
    Cc: devel@linuxdriverproject.org
    Cc: "K. Y. Srinivasan"
    Cc: Aditya Bhandari
    Link: https://lkml.kernel.org/r/20180124103629.29980-1-vkuznets@redhat.com
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • [ Upstream commit cf315ea596ec26d7aa542a9ce354990875a920c0 ]

    When a VF is under PF VLAN assignment:

    ip link set vf vlan

    This will remove all previous entries in the VLAN table including those
    generated by VLAN interfaces created on the VF. The issue arises when
    the VF is under PF VLAN assignment and one or more of these VLAN
    interfaces of the VF are deleted. When deleting these VLAN interfaces,
    the following message will be generated in "dmesg":

    failed to kill vid 0081/ for device

    This is due to the fact that "ndo_vlan_rx_kill_vid" exits with an error.
    The handler for this ndo is "fm10k_update_vid". Any calls to this
    function while under PF VLAN management will exit prematurely and, thus,
    it will generate the failure message.

    Additionally, since "fm10k_update_vid" exits prematurely, none of the
    VLAN update is performed. So, even though the actual VLAN interfaces of
    the VF will be deleted, the active_vlans bitmask is not cleared. When
    the VF is no longer under PF VLAN assignment, the driver mistakenly
    restores the previous entries of the VLAN table based on an
    unsynchronized list of active VLANs.

    The solution to this issue involves checking the VLAN update action type
    before exiting "fm10k_update_vid". If the VLAN update action type is to
    "add", this action will not be permitted while the VF is under PF VLAN
    assignment and the VLAN update is abandoned like before.

    However, if the VLAN update action type is to "kill", then we need to
    also clear the active_vlans bitmask. However, we don't need to actually
    queue any messages to the PF, because the MAC and VLAN tables have
    already been cleared, and the PF would silently ignore these requests
    anyways.
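
    The resulting control flow can be sketched as below. This is not the
    fm10k code: struct vf_state, update_vid(), the 64-entry bitmask, the
    queued_msgs counter, and the specific error codes are all assumptions
    chosen to keep the example self-contained.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VF state; only VIDs 0..63 are tracked in this sketch. */
struct vf_state {
    bool         pf_vlan_assigned;  /* VF is under PF VLAN assignment */
    uint64_t     active_vlans;      /* one bit per active VID */
    unsigned int queued_msgs;       /* requests queued towards the PF */
};

static int update_vid(struct vf_state *vf, unsigned int vid, bool set)
{
    if (vid >= 64)
        return -EINVAL;

    if (vf->pf_vlan_assigned) {
        if (set)
            return -EPERM;          /* "add" stays forbidden (error code assumed) */
        /* "kill": keep the bitmask in sync, but queue nothing to the PF. */
        vf->active_vlans &= ~(UINT64_C(1) << vid);
        return 0;
    }

    if (set)
        vf->active_vlans |= UINT64_C(1) << vid;
    else
        vf->active_vlans &= ~(UINT64_C(1) << vid);
    vf->queued_msgs++;              /* normal path notifies the PF */
    return 0;
}
```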

    Signed-off-by: Ngai-Mint Kwan
    Signed-off-by: Jacob Keller
    Tested-by: Krishneil Singh
    Signed-off-by: Jeff Kirsher
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ngai-Mint Kwan
     
  • [ Upstream commit 3a53285228165225a7f76c7d5ff1ddc0213ce0e4 ]

    Problem description:
    After ethernet cable connect and disconnect for several iterations on a
    device with i210, tx timestamp will stop being put into the socket.

    Steps to reproduce:
    1. Setup a device with i210 and wire it to a 802.1AS capable switch (
    Extreme Networks Summit x440 is used in our case)
    2. Have the gptp daemon running on the device and make sure it is synced
    with the switch
    3. Have the switch disable and enable the port, then wait for the device
    to resync with the switch
    4. Repeat step 3 until the device is no longer able to resync
    5. Review the log in dmesg and you will see warning message "igb : clearing
    Tx timestamp hang"

    Root cause:
    If ptp_tx_work() gets scheduled just before the port gets disabled, a LINK
    DOWN event will be processed before ptp_tx_work(), which may cause timeout
    in ptp_tx_work(). In the timeout logic, the TSYNCTXCTL's TXTT bit (Transmit
    timestamp valid bit) is not cleared, so no new timestamp is loaded into the
    TXSTMP register. Consequently, no new interrupt is triggered by the
    TSICR.TXTS bit and no more Tx timestamps are sent to the socket.

    Signed-off-by: Daniel Hua
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Daniel Hua
     
  • [ Upstream commit 177132df5e45b134c147f419f567a3b56aafaf2b ]

    Before libvirt modifies the MAC address and vlan tag for an SRIOV VF
    for use by a virtual machine (either using vfio device assignment or
    macvtap passthru mode), it saves the current MAC address and vlan tag
    so that it can reset them to their original value when the guest is
    done. Libvirt can't leave the VF MAC set to the value used by the
    now-defunct guest since it may be started again later using a
    different VF, but it certainly shouldn't just pick any random value,
    either. So it saves the state of everything prior to using the VF, and
    resets it to that.

    The igb driver initializes the MAC addresses of all VFs to
    00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK
    netlink message, also visible in the list of VFs in the output of "ip
    link show"). But when libvirt attempts to restore the MAC address back
    to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel
    responds with "Invalid argument".

    Forbidding a reset back to the original value leaves the VF MAC at the
    value set for the now-defunct virtual machine. Especially on a system
    with NetworkManager enabled, this has very bad consequences, since
    NetworkManager forces all interfaces to be IFF_UP all the time - if
    the same virtual machine is restarted using a different VF (or even on
    a different host), there will be multiple interfaces watching for
    traffic with the same MAC address.

    To allow libvirt to revert to the original state, we need a way to
    remove the administratively set MAC on a VF, to allow normal host
    operation again, and to reset/overwrite the VF MAC via VF netdev.

    This patch implements the outlined scenario by allowing the VF MAC to
    be set to 00:00:00:00:00:00 via RTM_SETLINK on the PF.
    igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0,
    so it's possible to reset the VF MAC back to the original value via
    the VF netdev.

    Note: Recent patches to libvirt allow for a workaround if the NIC
    isn't capable of resetting the administrative MAC back to all 0, but
    in theory the NIC should allow resetting the MAC in the first place.

    Signed-off-by: Corinna Vinschen
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Corinna Vinschen
     
  • [ Upstream commit fde7f9dbc71365230eeb8c8ea97ce9b552c8e5bd ]

    The rt5514 dsp captures pcm data through spi directly, so we should not
    use rockchip-i2s as its cpu dai like other codecs.

    Use dummy_dai for rt5514 dsp dailink to make voice wakeup work again.

    Reported-by: Jimmy Cheng-Yi Chiang
    Fixes: (72cfb0f20c75 ASoC: rockchip: Use codec of_node and dai_name for rt5514 dsp)
    Signed-off-by: Jeffy Chen
    Tested-by: Brian Norris
    Signed-off-by: Mark Brown
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jeffy Chen
     
  • [ Upstream commit 6b136a24b05c81a24e0b648a4bd938bcd0c4f69e ]

    Attributes that only implement .seq_ops are read-only, any write to
    them should be rejected. But currently kernel would crash when
    writing to such debugfs entries, e.g.

    chmod +w /sys/kernel/debug/block//requeue_list
    echo 0 > /sys/kernel/debug/block//requeue_list
    chmod -w /sys/kernel/debug/block//requeue_list

    Fix it by returning -EPERM in blk_mq_debugfs_write() when writing to
    such attributes.
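
    The shape of such a guard can be sketched in plain C. The structure and
    function names below are hypothetical stand-ins, not the blk-mq debugfs
    definitions; the sketch assumes a read-only attribute is one with no
    write handler.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute: write is NULL for read-only entries. */
struct dbg_attr {
    long (*write)(const char *buf, size_t count);
};

/* Reject writes to read-only attributes instead of calling through NULL. */
static long dbg_write(const struct dbg_attr *attr, const char *buf, size_t count)
{
    if (!attr->write)
        return -EPERM;
    return attr->write(buf, count);
}

/* Example handler for a writable attribute: consumes everything. */
static long consume_all(const char *buf, size_t count)
{
    (void)buf;
    return (long)count;
}
```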

    Cc: Ming Lei
    Signed-off-by: Eryu Guan
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Eryu Guan
     
  • [ Upstream commit b3ecd4aa8632a86428605ab73393d14779019d82 ]

    Another VCPU might try to modify the SCB while we are creating the
    shadow SCB. In general this is no problem - unless the compiler decides
    to not load values once, but e.g. twice.

    For us, this is only relevant when checking/working with such values.
    E.g. the prefix value, the mso, state of transactional execution and
    addresses of satellite blocks.

    E.g. if we blindly forward values (e.g. general purpose registers or
    execution controls after masking), we don't care.

    Leaving unpin_blocks() untouched for now, will handle it separately.

    The worst thing right now that I can see would be a missed prefix
    un/remap (mso, prefix, tx) or using wrong guest addresses. Nothing
    critical, but let's try to avoid unpredictable behavior.
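
    The "load once, then check and use the same value" pattern looks
    roughly like this in portable C. It is a sketch only: READ_ONCE_U32 is a
    local stand-in built on a volatile access, struct scb is reduced to one
    field, and the alignment mask used for validation is a made-up
    example.

```c
#include <stdint.h>

/* Local stand-in for a read-once helper: the volatile access keeps the
 * compiler from re-loading the value later. */
#define READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

/* Hypothetical, heavily reduced control block. */
struct scb {
    uint32_t prefix;
};

/* Copy the prefix into the shadow block after validating it.
 * The check and the store both use the same snapshot, so a concurrent
 * writer cannot slip a different value in between. */
static int shadow_prefix(const struct scb *guest, struct scb *shadow)
{
    const uint32_t prefix = READ_ONCE_U32(guest->prefix);

    if (prefix & 0x1FFFu)   /* example validity rule (assumed) */
        return -1;
    shadow->prefix = prefix;
    return 0;
}
```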

    Signed-off-by: David Hildenbrand
    Message-Id:
    Reviewed-by: Christian Borntraeger
    Acked-by: Cornelia Huck
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     
  • [ Upstream commit 587d8628fb71c3bfae29fb2bbe84c1478c59bac8 ]

    This patch prevents the thinkpad_acpi driver from warning about 2 event
    codes returned for keyboard palm-detection. No behavioral changes,
    other than suppressing the warning in the kernel log. The events are
    still forwarded via acpi-netlink channels.

    We could, optionally, decide to forward the event through a
    input-switch on the tpacpi input device. However, so far no suitable
    input-code exists, and no similar drivers report such events. Hence,
    leave it an acpi event for now.

    Note that the event-codes are named based on empirical studies. On the
    ThinkPad X1 5th Gen the sensor can be found underneath the arrow key.

    Cc: Matthew Thode
    Signed-off-by: David Herrmann
    Acked-by: Henrique de Moraes Holschuh
    Signed-off-by: Andy Shevchenko
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Herrmann
     
  • [ Upstream commit e0346f9fcb6c636d2f870e6666de8781413f34ea ]

    If we receive the link status message from PF with link up before queues
    are actually enabled, it will trigger a TX hang. This fixes the issue
    by ignoring a link up message if the VF state is not yet in RUNNING
    state.

    Signed-off-by: Alan Brady
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Alan Brady
     
  • [ Upstream commit 06aa040f039404a0039a5158cd12f41187487a1f ]

    When a host disables and enables a PF device, all the associated
    VFs are removed and added back in. It also generates a PFR which in turn
    resets all the connected VFs. This behaviour is different from that of
    Linux guest on Linux host. Hence we end up in a situation where there's
    a PFR and device removal at the same time. And watchdog doesn't have a
    clue about this and schedules a reset_task. This patch adds code to send
    signal to reset_task that the device is currently being removed.

    Signed-off-by: Avinash Dayanand
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Avinash Dayanand
     
  • [ Upstream commit 783687810e986a15ffbf86c516a1a48ff37f38f7 ]

    Bug: BPF programs and maps related to sockmaps test exist
    in memory even after test_maps ends.

    This patch fixes it as a short term workaround (sockmap
    kernel side needs real fixing) by emptying sockmaps when
    the test ends.

    Fixes: 6f6d33f3b3d0f ("bpf: selftests add sockmap tests")
    Signed-off-by: Prashant Bhole
    [ daniel: Note on workaround. ]
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Prashant Bhole
     
  • [ Upstream commit 20d59023c5ec4426284af492808bcea1f39787ef ]

    We inadvertently set it again on the source bio, but we need
    to set it on the new split bio instead.

    Fixes: fbbaf700e7b1 ("block: trace completion of all bios.")
    Signed-off-by: Goldwyn Rodrigues
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Goldwyn Rodrigues
     
  • [ Upstream commit e58decc9c51eb61697aba35ba8eda33f4b80552d ]

    Fix to return error code -EINVAL instead of 0 when num_vfs is above
    limit_vfs, as done elsewhere in this function.

    Fixes: 0dc786219186 ("nfp: handle SR-IOV already enabled when driver is probing")
    Signed-off-by: Wei Yongjun
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Wei Yongjun
     
  • [ Upstream commit 7ad81482cad67cbe1ec808490d1ddfc420c42008 ]

    We get the "new_profile_index" value from the mouse device when we're
    handling raw events. Smatch taints it as untrusted data and complains
    that we need a bounds check. This seems like a reasonable warning;
    otherwise there is a small read beyond the end of the array.
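
    A bounds check of this kind might look like the sketch below. The
    structure, the array size DEMO_PROFILE_COUNT and the function name
    are hypothetical and only stand in for the driver's real data.

```c
#include <stddef.h>

#define DEMO_PROFILE_COUNT 5 /* hypothetical array size */

struct demo_device {
    unsigned int actual_profile;
    int profile_settings[DEMO_PROFILE_COUNT];
};

/*
 * Switch to the profile index reported by the device.
 * The index is untrusted, so it is validated before it is used to
 * index profile_settings[]. Returns 0 on success, -1 if out of range.
 */
static int demo_profile_activated(struct demo_device *dev,
                                  unsigned int new_profile_index,
                                  int *settings_out)
{
    if (new_profile_index >= DEMO_PROFILE_COUNT)
        return -1;

    dev->actual_profile = new_profile_index;
    *settings_out = dev->profile_settings[new_profile_index];
    return 0;
}
```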

    Fixes: 0e70f97f257e ("HID: roccat: Add support for Kova[+] mouse")
    Signed-off-by: Dan Carpenter
    Acked-by: Silvan Jegen
    Signed-off-by: Jiri Kosina
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • [ Upstream commit cba04cdf437d745fac85220d1d692a9ae23d7004 ]

    The interrupt is requested before the device is powered on and
    its value in some cases cannot be relied upon. It happens on some
    devices that an interrupt is generated as soon as it is requested,
    before there is a chance to disable the irq.

    Set the irq flag as IRQ_NOAUTOEN before requesting it.

    This patch mutes the error:

    stmfts 2-0049: failed to read events: -11

    received sometimes during boot time.

    Signed-off-by: Andi Shyti
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Andi Shyti
     
  • [ Upstream commit 96d5eaa9bb74d299508d811d865c2c41b38b0301 ]

    While testing with the ARM specific memset() macro removed, I ran into a
    compiler warning that shows an old bug:

    drivers/scsi/arm/fas216.c: In function 'fas216_rq_sns_done':
    drivers/scsi/arm/fas216.c:2014:40: error: argument to 'sizeof' in 'memset' call is the same expression as the destination; did you mean to provide an explicit length? [-Werror=sizeof-pointer-memaccess]

    It turns out that the definition of the scsi_cmd structure changed back
    in linux-2.6.25, so now we clear only four bytes (sizeof(pointer))
    instead of 96 (SCSI_SENSE_BUFFERSIZE). I did not check whether we
    actually need to initialize the buffer here, but it's clear that if we
    do it, we should use the correct size.
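
    The difference between the two sizes can be reproduced with a small
    stand-alone sketch. The demo_cmd structure and both helper functions
    are hypothetical; only the 96-byte buffer size comes from the text
    above.

```c
#include <string.h>

#define SENSE_BUFFERSIZE 96 /* stand-in for SCSI_SENSE_BUFFERSIZE */

/* Hypothetical command: the sense buffer is a pointer, not an array. */
struct demo_cmd {
    unsigned char *sense_buffer;
};

/*
 * Buggy form: sizeof() of a pointer member yields the pointer size
 * (4 bytes on 32-bit ARM, 8 on most 64-bit targets), so only the
 * first few bytes of the buffer are cleared.
 */
static size_t clear_sense_buggy(struct demo_cmd *cmd)
{
    size_t n = sizeof(cmd->sense_buffer);

    memset(cmd->sense_buffer, 0, n);
    return n;
}

/* Fixed form: pass the real buffer length explicitly. */
static size_t clear_sense_fixed(struct demo_cmd *cmd)
{
    memset(cmd->sense_buffer, 0, SENSE_BUFFERSIZE);
    return SENSE_BUFFERSIZE;
}
```

    The buggy variant was correct while the member was still an
    embedded array, because sizeof() then gave the array length.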

    Fixes: de25deb18016 ("[SCSI] use dynamically allocated sense buffer")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     
  • [ Upstream commit 3f884a0a8bdf28cfd1e9987d54d83350096cdd46 ]

    Replace "" with NULL for product revision level, and merge TEXEL
    duplicate entries.

    Cc: Hannes Reinecke
    Cc: Martin K. Petersen
    Cc: James E.J. Bottomley
    Cc: SCSI ML
    Signed-off-by: Xose Vazquez Perez
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Xose Vazquez Perez
     
  • [ Upstream commit a9d572c7550044d5b217b5287d99a2e6d34b97b0 ]

    When io_bits is set, GCing an encrypted block may hit the following
    hung task. Since io_bits requires an aligned block address,
    f2fs_submit_page_write may return -EAGAIN if new_blkaddr does not
    satisfy the io_bits alignment. As a result, the encrypted page will
    never be written back.

    This patch makes move_data_block aware of the EAGAIN error and
    cancels the writeback.

    [ 246.751371] INFO: task kworker/u4:4:797 blocked for more than 90 seconds.
    [ 246.752423] Not tainted 4.15.0-rc4+ #11
    [ 246.754176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 246.755336] kworker/u4:4 D25448 797 2 0x80000000
    [ 246.755597] Workqueue: writeback wb_workfn (flush-7:0)
    [ 246.755616] Call Trace:
    [ 246.755695] ? __schedule+0x322/0xa90
    [ 246.755761] ? blk_init_request_from_bio+0x120/0x120
    [ 246.755773] ? pci_mmcfg_check_reserved+0xb0/0xb0
    [ 246.755801] ? __radix_tree_create+0x19e/0x200
    [ 246.755813] ? delete_node+0x136/0x370
    [ 246.755838] schedule+0x43/0xc0
    [ 246.755904] io_schedule+0x17/0x40
    [ 246.755939] wait_on_page_bit_common+0x17b/0x240
    [ 246.755950] ? wake_page_function+0xa0/0xa0
    [ 246.755961] ? add_to_page_cache_lru+0x160/0x160
    [ 246.755972] ? page_cache_tree_insert+0x170/0x170
    [ 246.755983] ? __lru_cache_add+0x96/0xb0
    [ 246.756086] __filemap_fdatawait_range+0x14f/0x1c0
    [ 246.756097] ? wait_on_page_bit_common+0x240/0x240
    [ 246.756120] ? __wake_up_locked_key_bookmark+0x20/0x20
    [ 246.756167] ? wait_on_all_pages_writeback+0xc9/0x100
    [ 246.756179] ? __remove_ino_entry+0x120/0x120
    [ 246.756192] ? wait_woken+0x100/0x100
    [ 246.756204] filemap_fdatawait_range+0x9/0x20
    [ 246.756216] write_checkpoint+0x18a1/0x1f00
    [ 246.756254] ? blk_get_request+0x10/0x10
    [ 246.756265] ? cpumask_next_and+0x43/0x60
    [ 246.756279] ? f2fs_sync_inode_meta+0x160/0x160
    [ 246.756289] ? remove_element.isra.4+0xa0/0xa0
    [ 246.756300] ? __put_compound_page+0x40/0x40
    [ 246.756310] ? f2fs_sync_fs+0xec/0x1c0
    [ 246.756320] ? f2fs_sync_fs+0x120/0x1c0
    [ 246.756329] f2fs_sync_fs+0x120/0x1c0
    [ 246.756357] ? trace_event_raw_event_f2fs__page+0x260/0x260
    [ 246.756393] ? ata_build_rw_tf+0x173/0x410
    [ 246.756397] f2fs_balance_fs_bg+0x198/0x390
    [ 246.756405] ? drop_inmem_page+0x230/0x230
    [ 246.756415] ? ahci_qc_prep+0x1bb/0x2e0
    [ 246.756418] ? ahci_qc_issue+0x1df/0x290
    [ 246.756422] ? __accumulate_pelt_segments+0x42/0xd0
    [ 246.756426] ? f2fs_write_node_pages+0xd1/0x380
    [ 246.756429] f2fs_write_node_pages+0xd1/0x380
    [ 246.756437] ? sync_node_pages+0x8f0/0x8f0
    [ 246.756440] ? update_curr+0x53/0x220
    [ 246.756444] ? __accumulate_pelt_segments+0xa2/0xd0
    [ 246.756448] ? __update_load_avg_se.isra.39+0x349/0x360
    [ 246.756452] ? do_writepages+0x2a/0xa0
    [ 246.756456] do_writepages+0x2a/0xa0
    [ 246.756460] __writeback_single_inode+0x70/0x490
    [ 246.756463] ? check_preempt_wakeup+0x199/0x310
    [ 246.756467] writeback_sb_inodes+0x2a2/0x660
    [ 246.756471] ? is_empty_dir_inode+0x40/0x40
    [ 246.756474] ? __writeback_single_inode+0x490/0x490
    [ 246.756477] ? string+0xbf/0xf0
    [ 246.756480] ? down_read_trylock+0x35/0x60
    [ 246.756484] __writeback_inodes_wb+0x9f/0xf0
    [ 246.756488] wb_writeback+0x41d/0x4b0
    [ 246.756492] ? writeback_inodes_wb.constprop.55+0x150/0x150
    [ 246.756498] ? set_worker_desc+0xf7/0x130
    [ 246.756502] ? current_is_workqueue_rescuer+0x60/0x60
    [ 246.756511] ? _find_next_bit+0x2c/0xa0
    [ 246.756514] ? wb_workfn+0x400/0x5d0
    [ 246.756518] wb_workfn+0x400/0x5d0
    [ 246.756521] ? finish_task_switch+0xdf/0x2a0
    [ 246.756525] ? inode_wait_for_writeback+0x30/0x30
    [ 246.756529] process_one_work+0x3a7/0x6f0
    [ 246.756533] worker_thread+0x82/0x750
    [ 246.756537] kthread+0x16f/0x1c0
    [ 246.756541] ? trace_event_raw_event_workqueue_work+0x110/0x110
    [ 246.756544] ? kthread_create_worker_on_cpu+0xb0/0xb0
    [ 246.756548] ret_from_fork+0x1f/0x30

    Signed-off-by: Sheng Yong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sheng Yong
     
  • [ Upstream commit 00db63c128dd3daf38f481371976c24d32678142 ]

    If a valid netdevice is not found for RoCE, the GID table should not
    be searched with a NULL netdevice.

    Doing so causes the search routines to ignore the netdev argument and may
    match the wrong GID table entry if the netdev is deleted.

    Fixes: abae1b71dd37 ("IB/cma: cma_validate_port should verify the port and netdevice")
    Signed-off-by: Parav Pandit
    Reviewed-by: Mark Bloch
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     
  • [ Upstream commit 7583d8d088ff2c323b1d4f15b191ca2c23d32558 ]

    Before rbio_orig_end_io() goes to free rbio, rbio may get merged with
    more bios from other rbios and rbio->bio_list becomes non-empty,
    in that case, these newly merged bios don't end properly.

    Once unlock_stripe() is done, rbio->bio_list will not be updated any
    more and we can call bio_endio() on all queued bios.

    It should only happen in error-out cases, the normal path of recover
    and full stripe write have already set RBIO_RMW_LOCKED_BIT to disable
    merge before doing IO, so rbio_orig_end_io() called by them doesn't
    have the above issue.

    Reported-by: Jérôme Carretero
    Signed-off-by: Liu Bo
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Liu Bo
     
  • [ Upstream commit 18e83ac75bfe67009c4ddcdd581bba8eb16f4030 ]

    This fixes a corner case that is caused by a race of dio write vs dio
    read/write.

    Here is how the race could happen.

    Suppose that no extent map has been loaded into memory yet.
    There is a file extent [0, 32K), two jobs are running concurrently
    against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio
    read from [0, 4K) or [4K, 8K).

    t1 goes ahead of t2 and splits em [0, 32K) to em [0K, 8K) and [8K, 32K).

    ------------------------------------------------------
                 t1                                t2
     btrfs_get_blocks_direct()         btrfs_get_blocks_direct()
      -> btrfs_get_extent()              -> btrfs_get_extent()
          -> lookup_extent_mapping()
          -> add_extent_mapping()            -> lookup_extent_mapping()
             # load [0, 32K)
      -> btrfs_new_extent_direct()
          -> btrfs_drop_extent_cache()
             # split [0, 32K) and
             # drop [8K, 32K)
          -> add_extent_mapping()
             # add [8K, 32K)
                                             -> add_extent_mapping()
                                                # handle -EEXIST when adding
                                                # [0, 32K)
    ------------------------------------------------------
    About how t2(dio read/write) runs into -EEXIST:

    a) add_extent_mapping() gets -EEXIST for adding em [0, 32k),

    b) search_extent_mapping() then returns [0, 8k) as the existing em,
    even though start == existing->start, em is [0, 32k) so that
    extent_map_end(em) > extent_map_end(existing), i.e. 32k > 8k,

    c) then it goes thru merge_extent_mapping() which tries to add a [8k, 8k)
    (with a length 0) and returns -EEXIST as [8k, 32k) is already in tree,

    d) so btrfs_get_extent() ends up returning -EEXIST to dio read/write,
    which is confusing applications.

    Here is a summary of all the possible situations:
    1) start < existing->start

                +-----------+em+-----------+
    +--prev---+ |     +-------------+      |
    |         | |     |             |      |
    +---------+ +     +---+existing++      ++
                   +
                   |
                   +
                 start

    2) start == existing->start

    +------------em------------+
    |     +-------------+      |
    |     |             |      |
    +     +----existing-+      +
          |
          |
          +
        start

    3) start > existing->start && start < (existing->start + existing->len)

    +------------em------------+
    |     +-------------+      |
    |     |             |      |
    +     +----existing-+      +
              |
              |
              +
            start

    4) start >= (existing->start + existing->len)

    +-----------+em+-----------+
    |     +-------------+      | +--next---+
    |     |             |      | |         |
    +     +---+existing++      + +---------+
                           +
                           |
                           +
                         start

    As we can see, it turns out that if start is within existing em (front
    inclusive), then the existing em should be returned as is, otherwise,
    we try our best to merge candidate em with sibling ems to form a
    larger em (in order to reduce the total number of em).
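
    The rule in the previous paragraph reduces to one half-open range
    test. The sketch below uses a hypothetical demo_em structure and
    helper names; it only shows the decision, not the merge itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced extent map: just a start offset and a length. */
struct demo_em {
    uint64_t start;
    uint64_t len;
};

static uint64_t demo_em_end(const struct demo_em *em)
{
    return em->start + em->len;
}

/*
 * True when start lies in [existing->start, existing end), which
 * covers situations 2) and 3): the existing em is returned as is.
 * False for situations 1) and 4): the candidate em may be merged
 * with its sibling ems instead.
 */
static bool demo_start_in_existing(const struct demo_em *existing,
                                   uint64_t start)
{
    return start >= existing->start && start < demo_em_end(existing);
}
```

    In the race described earlier, the existing em is [0, 8K) and the
    read starts at 0 or 4K, so the test is true and the existing em is
    returned rather than attempting the zero-length merge.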

    Reported-by: David Vallender
    Signed-off-by: Liu Bo
    Reviewed-by: Josef Bacik
    Signed-off-by: David Sterba

    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Liu Bo
     
  • [ Upstream commit 6f794e3c5c8f8fdd3b5bb20d9ded894e685b5bbe ]

    It appears from the original commit [1] that there isn't any design
    specific reason not to fail the mount instead of just warning. This
    patch will change it to fail.

    [1]
    commit 319e4d0661e5323c9f9945f0f8fb5905e5fe74c3
    btrfs: Enhance super validation check

    Fixes: 319e4d0661e5323 ("btrfs: Enhance super validation check")
    Signed-off-by: Anand Jain
    Reviewed-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Anand Jain
     
  • [ Upstream commit 762221f095e3932669093466aaf4b85ed9ad2ac1 ]

    The raid6 corruption case is as follows: suppose that all disks can
    be read without problems, but the content that was read out doesn't
    match its checksum. Currently, for raid6, btrfs retries at most
    twice:

    - the 1st retry is to rebuild with all other stripes, it'll eventually
    be a raid5 xor rebuild,
    - if the 1st fails, the 2nd retry will deliberately fail parity p so
    that it will do raid6 style rebuild,

    however, the chances are that another non-parity stripe content also
    has something corrupted, so that the above retries are not able to
    return correct content.

    We've fixed normal reads to rebuild raid6 correctly with more retries
    in Patch "Btrfs: make raid6 rebuild retry more"[1]; this patch fixes
    scrub to do exactly the same rebuild process.

    [1]: https://patchwork.kernel.org/patch/10091755/

    Signed-off-by: Liu Bo
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Liu Bo
     
  • [ Upstream commit 9ea2c7c9da13c9073e371c046cbbc45481ecb459 ]

    When modifying a tree where the root is at BTRFS_MAX_LEVEL - 1 then
    the level variable is going to be 7 (this is the max height of the
    tree). On the other hand btrfs_cow_block is always called with
    "level + 1" as an index into the nodes and slots arrays. This leads to
    an out of bounds access. Admittedly this will be benign since an OOB
    access of the nodes array will likely read the 0th element from the
    slots array, which in this case is going to be 0 (since we start CoW at
    the top of the tree). The OOB access into the slots array in turn will
    read the 0th and 1st values of the locks array, which would both be 0
    at the time. However, this benign behavior relies on the fact that the
    path being passed hasn't been initialised; if it has already been used to
    query a btree then it could potentially have populated the nodes/slots arrays.

    Fix it by explicitly checking if we are at level 7 (the maximum allowed
    index in nodes/slots arrays) and explicitly call the CoW routine with
    NULL for parent's node/slot.
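
    A minimal sketch of such a check, assuming arrays of 8 entries
    (indexes 0 to 7). The demo_path structure and demo_pick_parent() are
    hypothetical, and using slot 0 when there is no parent is an
    assumption made for the example.

```c
#include <stddef.h>

#define DEMO_MAX_LEVEL 8 /* assumed array size; highest valid index is 7 */

/* Hypothetical, reduced path: one node pointer and slot per level. */
struct demo_path {
    void *nodes[DEMO_MAX_LEVEL];
    int slots[DEMO_MAX_LEVEL];
};

/*
 * Pick the parent node and slot to hand to the CoW routine.
 * At the top level there is no parent, so level + 1 must not be used
 * as an index; NULL is passed for the parent instead.
 */
static void demo_pick_parent(const struct demo_path *p, int level,
                             void **parent, int *parent_slot)
{
    if (level + 1 < DEMO_MAX_LEVEL) {
        *parent = p->nodes[level + 1];
        *parent_slot = p->slots[level + 1];
    } else {
        *parent = NULL;
        *parent_slot = 0; /* assumption: slot is unused without a parent */
    }
}
```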

    Signed-off-by: Nikolay Borisov
    Fixes-coverity-id: 711515
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Borisov