21 Dec, 2019

17 commits

  • commit f53056c43063257ae4159d83c425eaeb772bcd71 upstream.

    In gfs2_page_mkwrite's gfs2_allocate_page_backing helper, try to
    allocate as many blocks at once as we need. Pass in the size of the
    requested allocation.

    Fixes: 35af80aef99b ("gfs2: don't use buffer_heads in gfs2_allocate_page_backing")
    Cc: stable@vger.kernel.org # v5.3+
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Greg Kroah-Hartman

    Andreas Gruenbacher
     
  • commit e64681b487c897ec871465083bf0874087d47b66 upstream.

    KASAN shadow map doesn't need to be accessible through the linear kernel
    mapping, allocate its pages with MEMBLOCK_ALLOC_ANYWHERE so that high
    memory can be used. This frees up to ~100MB of low memory on xtensa
    configurations with KASAN and high memory.

    Cc: stable@vger.kernel.org # v5.1+
    Fixes: f240ec09bb8a ("memblock: replace memblock_alloc_base(ANYWHERE) with memblock_phys_alloc")
    Reviewed-by: Mike Rapoport
    Signed-off-by: Max Filippov
    Signed-off-by: Greg Kroah-Hartman

    Max Filippov
     
  • commit cc90bc68422318eb8e75b15cd74bc8d538a7df29 upstream.

    This partially reverts commit e3a5d8e386c3fb973fa75f2403622a8f3640ec06.

    Commit e3a5d8e386c3 ("check bi_size overflow before merge") adds a bio_full
    check to __bio_try_merge_page. This will cause __bio_try_merge_page to fail
    when the last bi_io_vec has been reached. Instead, what we want here is only
    the bi_size overflow check.

    Fixes: e3a5d8e386c3 ("block: check bi_size overflow before merge")
    Cc: stable@vger.kernel.org # v5.4+
    Reviewed-by: Ming Lei
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Andreas Gruenbacher
     
  • commit c6a3aea93571a5393602256d8f74772bd64c8225 upstream.

    QOS requests for DEFAULT_VALUE are supposed to be ignored but this is
    not the case for FREQ_QOS_MAX. Adding one request for MAX_DEFAULT_VALUE
    and one for a real value will cause freq_qos_read_value to unexpectedly
    return MAX_DEFAULT_VALUE (-1).

    This happens because freq_qos max value is aggregated with PM_QOS_MIN
    but FREQ_QOS_MAX_DEFAULT_VALUE is (-1) so it's smaller than other
    values.

    Fix this by redefining FREQ_QOS_MAX_DEFAULT_VALUE to S32_MAX.
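
    The aggregation problem can be illustrated with a small user-space
    sketch. The names below are hypothetical and only imitate "take the
    minimum over all requests"; this is not the kernel's PM QoS code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the old and new default of a MAX request. */
#define OLD_MAX_DEFAULT (-1)
#define NEW_MAX_DEFAULT INT32_MAX

/*
 * Aggregate "max frequency" requests by taking the minimum of all values,
 * starting from the given default (the value reported with no requests).
 */
int32_t aggregate_min(const int32_t *reqs, size_t n, int32_t def)
{
	int32_t result = def;

	for (size_t i = 0; i < n; i++)
		if (reqs[i] < result)
			result = reqs[i];
	return result;
}
```

    With a default of -1, a default request always wins the minimum and
    hides the real value; with S32_MAX as the default, the real value wins.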

    Looking at current users for freq_qos it seems that none of them create
    requests for FREQ_QOS_MAX_DEFAULT_VALUE.

    Fixes: 77751a466ebd ("PM: QoS: Introduce frequency QoS")
    Signed-off-by: Leonard Crestez
    Reported-by: Matthias Kaehlcke
    Reviewed-by: Matthias Kaehlcke
    Cc: stable@vger.kernel.org # 5.4+
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Leonard Crestez
     
  • commit f338bb9f0179cb959977b74e8331b312264d720b upstream.

    Enhance the ACS quirk for Cavium Processors. Add the root port vendor IDs
    for ThunderX2 and ThunderX3 series of processors.

    [bhelgaas: add Fixes: and stable tag]
    Fixes: f2ddaf8dfd4a ("PCI: Apply Cavium ThunderX ACS quirk to more Root Ports")
    Link: https://lore.kernel.org/r/20191111024243.GA11408@dc5-eodlnx05.marvell.com
    Signed-off-by: George Cherian
    Signed-off-by: Bjorn Helgaas
    Reviewed-by: Robert Richter
    Cc: stable@vger.kernel.org # v4.12+
    Signed-off-by: Greg Kroah-Hartman

    George Cherian
     
  • commit 7c7e53e1c93df14690bd12c1f84730fef927a6f1 upstream.

    The R-Car Gen2/3 manual - available at:

    https://www.renesas.com/eu/en/products/microcontrollers-microprocessors/rz/rzg/rzg1m.html#documents

    "RZ/G Series User's Manual: Hardware" section

    strictly enforces the MACCTLR initialization value - 39.3.1 - "Initial
    Setting of PCI Express":

    "Be sure to write the initial value (= H'80FF 0000) to MACCTLR before
    enabling PCIETCTLR.CFINIT".

    To avoid unexpected behavior and to match the SW initialization sequence
    guidelines, this patch programs the MACCTLR with the correct value.

    Note that the MACCTLR.SPCHG bit in the MACCTLR register description
    reports that "Only writing 1 is valid and writing 0 is invalid" but this
    "invalid" has to be interpreted as a write-ignore aka "ignored", not
    "prohibited".

    Reported-by: Eugeniu Rosca
    Fixes: c25da4778803 ("PCI: rcar: Add Renesas R-Car PCIe driver")
    Fixes: be20bbcb0a8c ("PCI: rcar: Add the initialization of PCIe link in resume_noirq()")
    Signed-off-by: Yoshihiro Shimoda
    Signed-off-by: Lorenzo Pieralisi
    Reviewed-by: Geert Uytterhoeven
    Cc: stable@vger.kernel.org # v5.2+
    Signed-off-by: Greg Kroah-Hartman

    Yoshihiro Shimoda
     
  • commit 73884a7082f466ce6686bb8dd7e6571dd42313b4 upstream.

    As per PCIe r5.0, sec 7.8.5.2, fixed bus numbers of a bridge must be zero
    when no function that uses EA is located behind it. Hence, if EA supplies
    bus numbers of zero, assign bus numbers normally. A secondary bus can
    never have a bus number of zero, so setting a bridge's Secondary Bus Number
    to zero makes downstream devices unreachable.
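
    A minimal sketch of the "zero means not fixed" rule, assuming a
    hypothetical helper and parameter names; the real logic lives in the
    PCI probe code and is not reproduced here.

```c
#include <stdint.h>

/*
 * Pick the secondary bus number for a bridge. ea_sec is the fixed number
 * read from the EA capability (0 means "nothing fixed"); next_free is the
 * number the normal assignment would use. Both names are hypothetical.
 */
uint8_t pick_secondary_bus(uint8_t ea_sec, uint8_t next_free)
{
	if (ea_sec == 0)
		return next_free; /* fall back to normal assignment */
	return ea_sec;
}
```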

    [bhelgaas: retain bool return value so "zero is invalid" logic is local]
    Fixes: 2dbce5901179 ("PCI: Assign bus numbers present in EA capability for bridges")
    Link: https://lore.kernel.org/r/1572850664-9861-1-git-send-email-sundeep.lkml@gmail.com
    Signed-off-by: Subbaraya Sundeep
    Signed-off-by: Bjorn Helgaas
    Cc: stable@vger.kernel.org # v5.2+
    Signed-off-by: Greg Kroah-Hartman

    Subbaraya Sundeep
     
  • commit e045fa29e89383c717e308609edd19d2fd29e1be upstream.

    When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector
    Control register for each vector and saves it in desc->masked. Each
    register is 32 bits and bit 0 is the actual Mask bit.

    When we restored these registers during resume, we previously set the Mask
    bit if *any* bit in desc->masked was set instead of when the Mask bit
    itself was set:

    pci_restore_state
      pci_restore_msi_state
        __pci_restore_msix_state
          for_each_pci_msi_entry
            msix_mask_irq(entry, entry->masked)
              [ ... ] masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT
              if (flag)
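
    The difference between testing the whole saved register and testing
    only the Mask bit can be sketched as follows. The function names are
    hypothetical; only the "bit 0 is the Mask bit" layout comes from the
    description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the MSI-X Vector Control register is the Mask bit. */
#define VECTOR_CTRL_MASKBIT 0x00000001u

/* Pre-fix behaviour: any set bit in the saved register counts as "masked". */
bool masked_any_bit(uint32_t saved_ctrl)
{
	return saved_ctrl != 0;
}

/* Intended behaviour: only the Mask bit itself decides. */
bool masked_bit0_only(uint32_t saved_ctrl)
{
	return (saved_ctrl & VECTOR_CTRL_MASKBIT) != 0;
}
```

    A saved value such as 0x2 has the Mask bit clear, yet the first test
    would still treat the vector as masked.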
    Signed-off-by: Bjorn Helgaas
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Jian-Hong Pan
     
  • commit d8558ac8c93d429d65d7490b512a3a67e559d0d4 upstream.

    According to documentation [0] the correct offset for the Upstream Peer
    Decode Configuration Register (UPDCR) is 0x1014. It was previously defined
    as 0x1114.

    d99321b63b1f ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports")
    intended to enforce isolation between PCI devices allowing them to be put
    into separate IOMMU groups. Due to the wrong register offset the intended
    isolation was not fully enforced. This is fixed with this patch.

    Please note that I did not test this patch because I have no hardware that
    implements this register.

    [0] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/4th-gen-core-family-mobile-i-o-datasheet.pdf (page 325)
    Fixes: d99321b63b1f ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports")
    Link: https://lore.kernel.org/r/7a3505df-79ba-8a28-464c-88b83eefffa6@kernkonzept.com
    Signed-off-by: Steffen Liebergeld
    Signed-off-by: Bjorn Helgaas
    Reviewed-by: Andrew Murray
    Acked-by: Ashok Raj
    Cc: stable@vger.kernel.org # v3.15+
    Signed-off-by: Greg Kroah-Hartman

    Steffen Liebergeld
     
  • commit 157c1062fcd86ade3c674503705033051fd3d401 upstream.

    A sysfs request to enable or disable a PCIe hotplug slot should not
    return before it has been carried out. That is sought to be achieved by
    waiting until the controller's "pending_events" have been cleared.

    However the IRQ thread pciehp_ist() clears the "pending_events" before
    it acts on them. If pciehp_sysfs_enable_slot() / _disable_slot() happen
    to check the "pending_events" after they have been cleared but while
    pciehp_ist() is still running, the functions may return prematurely
    with an incorrect return value.

    Fix by introducing an "ist_running" flag which must be false before a sysfs
    request is allowed to return.
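
    The changed wait condition can be sketched like this. The struct is a
    hypothetical, simplified stand-in for the controller state, and the
    sketch ignores locking and wakeups entirely.

```c
#include <stdbool.h>

/* Hypothetical, simplified controller state; not the real struct controller. */
struct ctrl_state {
	unsigned int pending_events; /* events queued for the IRQ thread */
	bool ist_running;            /* IRQ thread is still acting on events */
};

/* Old condition: returns as soon as the events have been cleared. */
bool request_done_old(const struct ctrl_state *c)
{
	return c->pending_events == 0;
}

/* New condition: the IRQ thread must also have finished. */
bool request_done_new(const struct ctrl_state *c)
{
	return c->pending_events == 0 && !c->ist_running;
}
```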

    Fixes: 32a8cef274fe ("PCI: pciehp: Enable/disable exclusively from IRQ thread")
    Link: https://lore.kernel.org/linux-pci/1562226638-54134-1-git-send-email-wangxiongfeng2@huawei.com
    Link: https://lore.kernel.org/r/4174210466e27eb7e2243dd1d801d5f75baaffd8.1565345211.git.lukas@wunner.de
    Reported-and-tested-by: Xiongfeng Wang
    Signed-off-by: Lukas Wunner
    Signed-off-by: Bjorn Helgaas
    Cc: stable@vger.kernel.org # v4.19+
    Signed-off-by: Greg Kroah-Hartman

    Lukas Wunner
     
  • commit f2c33ccacb2d4bbeae2a255a7ca0cbfd03017b7c upstream.

    pci_pm_thaw_noirq() is supposed to return the device to D0 and restore its
    configuration registers, but previously it only did that for devices whose
    drivers implemented the new power management ops.

    Hibernation, e.g., via "echo disk > /sys/power/state", involves freezing
    devices, creating a hibernation image, thawing devices, writing the image,
    and powering off. The fact that thawing did not return devices with legacy
    power management to D0 caused errors, e.g., in this path:

    pci_pm_thaw_noirq
      if (pci_has_legacy_pm_support(pci_dev)) # true for Mellanox VF driver
        return pci_legacy_resume_early(dev) # ... legacy PM skips the rest
      pci_set_power_state(pci_dev, PCI_D0)
      pci_restore_state(pci_dev)
    pci_pm_thaw
      if (pci_has_legacy_pm_support(pci_dev))
        pci_legacy_resume
          drv->resume
            mlx4_resume
              ...
                pci_enable_msix_range
                  ...
                    if (dev->current_state != PCI_D0) #
    Signed-off-by: Bjorn Helgaas
    Reviewed-by: Rafael J. Wysocki
    Cc: stable@vger.kernel.org # v4.13+
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     
  • commit 6acdf7e19b37cb3a9258603d0eab315079c19c5e upstream.

    The part_event_bitmap register is 64 bits wide, so read it with ioread64()
    instead of the 32-bit ioread32().
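
    Why the width of the read matters, as a user-space sketch. The two
    helpers are hypothetical models of a narrow and a full read; they do
    not touch hardware.

```c
#include <stdint.h>

/* Models a 32-bit read of a 64-bit register: the upper half is lost. */
uint64_t read_low32(uint64_t reg)
{
	return (uint32_t)reg;
}

/* Models a full 64-bit read. */
uint64_t read_full64(uint64_t reg)
{
	return reg;
}
```

    Any event bit at position 32 or above is silently dropped by the
    narrow read.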

    Fixes: 52eabba5bcdb ("switchtec: Add IOCTLs to the Switchtec driver")
    Link: https://lore.kernel.org/r/20190910195833.3891-1-logang@deltatee.com
    Reported-by: Doug Meyer
    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Bjorn Helgaas
    Cc: stable@vger.kernel.org # v4.12+
    Cc: Kelvin Cao
    Signed-off-by: Greg Kroah-Hartman

    Logan Gunthorpe
     
  • commit 2ac55d5e5ec9ad0a07e194f0eaca865fe5aa3c40 upstream.

    It has turned out that it's not a good idea to unconditionally do a power
    cycle and then re-initialize the SDIO card, as currently done through
    mmc_hw_reset() -> mmc_sdio_hw_reset(). This is because there may be
    multiple SDIO func drivers probed, which also share the same SDIO card.

    To address these scenarios, one may be tempted to use a notification
    mechanism, as to allow the core to inform each of the probed func drivers,
    about an ongoing HW reset. However, supporting such an operation from the
    func driver point of view, may not be entirely trivial.

    Therefore, let's use a simpler approach to solve the problem, by
    instead forcing the card to be removed and re-detected, via scheduling a
    rescan-work. In this way, we can rely on existing infrastructure, as the
    func driver's ->remove() and ->probe() callbacks get invoked to deal
    with the cleanup and the re-initialization.

    This solution may be considered as rather heavy, especially if a func
    driver doesn't share its card with other func drivers. To address this,
    let's keep the current immediate HW reset option as well, but run it only
    when there is one func driver probed for the card.

    Finally, to allow the caller of mmc_hw_reset(), to understand if the reset
    is being asynchronously managed from a scheduled work, it returns 1
    (propagated from mmc_sdio_hw_reset()). If the HW reset is executed
    successfully and synchronously it returns 0, which maintains the existing
    behaviour.

    Reviewed-by: Douglas Anderson
    Tested-by: Douglas Anderson
    Cc: stable@vger.kernel.org # v5.4+
    Signed-off-by: Ulf Hansson
    Signed-off-by: Greg Kroah-Hartman

    Ulf Hansson
     
  • commit 99b4ddd8b76a6f60a8c2b3775849d65d21a418fc upstream.

    Upfront in mmc_rescan() we use the host->rescan_entered flag, to allow
    scanning only once for non-removable cards. Therefore, it's also not
    possible that we can have a corresponding card bus attached (host->bus_ops
    is NULL), when we are scanning non-removable cards.

    For this reason, let's drop the check for mmc_card_is_removable() as it's
    redundant.

    Reviewed-by: Douglas Anderson
    Tested-by: Douglas Anderson
    Cc: stable@vger.kernel.org # v5.4+
    Signed-off-by: Ulf Hansson
    Signed-off-by: Greg Kroah-Hartman

    Ulf Hansson
     
  • commit a0d4c7eb71dd08a89ad631177bb0cbbabd598f84 upstream.

    MMC IOCTLS with R1B responses may cause the card to enter the busy state,
    which means it's not ready to receive a new request. To prevent new
    requests from being sent to the card, use a CMD13 polling loop to verify
    that the card returns to the transfer state, before completing the request.
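
    A minimal sketch of such a status polling loop, assuming a
    hypothetical callback that issues CMD13 and a simulated card. The
    sketch counts attempts instead of using a timeout, which is a
    simplification and not how the block driver does it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Current state field of the R1 status word (bits 12:9). */
#define R1_CURRENT_STATE(x) (((x) & 0x00001E00u) >> 9)
#define R1_STATE_TRAN 4u
#define R1_STATE_PRG  7u

/* Hypothetical callback that issues CMD13 and returns the card status word. */
typedef uint32_t (*send_status_fn)(void *card);

/*
 * Poll the card status until it reports the transfer state.
 * Returns true on success, false if max_polls attempts were not enough.
 */
bool wait_for_tran_state(void *card, send_status_fn send_status,
			 unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		uint32_t status = send_status(card);

		if (R1_CURRENT_STATE(status) == R1_STATE_TRAN)
			return true;
	}
	return false;
}

/* Simulated card: reports the programming state for busy_polls reads. */
struct fake_card {
	unsigned int busy_polls;
};

uint32_t fake_send_status(void *card)
{
	struct fake_card *fc = card;

	if (fc->busy_polls > 0) {
		fc->busy_polls--;
		return R1_STATE_PRG << 9;
	}
	return R1_STATE_TRAN << 9;
}
```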

    Signed-off-by: Chaotian Jing
    Reviewed-by: Avri Altman
    Cc: stable@vger.kernel.org
    Signed-off-by: Ulf Hansson
    Signed-off-by: Greg Kroah-Hartman

    Chaotian Jing
     
  • commit 3869468e0c4800af52bfe1e0b72b338dcdae2cfc upstream.

    To prepare for more users of card_busy_detect(), let's drop the struct
    request * as an in-parameter and convert to log the error message via
    dev_err() instead of pr_err().

    Signed-off-by: Chaotian Jing
    Reviewed-by: Avri Altman
    Cc: stable@vger.kernel.org
    Signed-off-by: Ulf Hansson
    Signed-off-by: Greg Kroah-Hartman

    Chaotian Jing
     
  • commit f8c63edfd78905320e86b6b2be2b7a5ac768fa4e upstream.

    Fix commit 7b81cb6bddd2 ("usb: add a HCD_DMA flag instead of
    guestimating DMA capabilities") where local memory USB drivers
    erroneously allocate DMA memory instead of pool memory, causing

    OHCI Unrecoverable Error, disabled
    HC died; cleaning up

    The order between hcd_uses_dma() and hcd->localmem_pool is now
    arranged as in hcd_buffer_alloc() and hcd_buffer_free(), with the
    test for hcd->localmem_pool placed first.
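
    The ordering of the two tests can be sketched as below. The struct and
    enum are hypothetical stand-ins; the only point carried over from the
    description is that the local memory pool test comes first.

```c
#include <stdbool.h>

/* Hypothetical outcome of the buffer source decision. */
enum buf_source { BUF_LOCALMEM_POOL, BUF_DMA, BUF_NO_DMA };

/* Hypothetical, reduced view of a host controller. */
struct fake_hcd {
	bool has_localmem_pool; /* stands in for hcd->localmem_pool != NULL */
	bool uses_dma;          /* stands in for hcd_uses_dma(hcd) */
};

/* Test the local memory pool before the DMA capability. */
enum buf_source pick_buffer_source(const struct fake_hcd *hcd)
{
	if (hcd->has_localmem_pool)
		return BUF_LOCALMEM_POOL;
	if (hcd->uses_dma)
		return BUF_DMA;
	return BUF_NO_DMA;
}
```

    A controller that has a local memory pool and also sets the DMA flag
    ends up using the pool, not DMA memory.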

    As an alternative, one might consider adjusting hcd_uses_dma() with

    static inline bool hcd_uses_dma(struct usb_hcd *hcd)
    {
    -       return IS_ENABLED(CONFIG_HAS_DMA) && (hcd->driver->flags & HCD_DMA);
    +       return IS_ENABLED(CONFIG_HAS_DMA) &&
    +               (hcd->driver->flags & HCD_DMA) &&
    +               (hcd->localmem_pool == NULL);
    }

    One can also consider unsetting HCD_DMA for local memory pool drivers.

    Fixes: 7b81cb6bddd2 ("usb: add a HCD_DMA flag instead of guestimating DMA capabilities")
    Cc: stable
    Signed-off-by: Fredrik Noring
    Link: https://lore.kernel.org/r/20191210172905.GA52526@sx9
    Signed-off-by: Greg Kroah-Hartman

    Fredrik Noring
     

18 Dec, 2019

23 commits

  • Greg Kroah-Hartman
     
  • [ Upstream commit 00222d1394104f0fd6c01ca9f578afec9e0f148b ]

    RTL8125 also requires RX to be enabled for WoL.

    v2: add missing Fixes tag

    Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125")
    Signed-off-by: Heiner Kallweit
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Heiner Kallweit
     
  • [ Upstream commit 9385973fe8db9743fa93bf17245635be4eb8c4a6 ]

    Currently a switch driver deinit frees the regmaps, but the PTP clock is
    still out there, available to user space via /dev/ptpN. Any PTP
    operation is a ticking time bomb, since it will attempt to use the freed
    regmaps and thus trigger kernel panics:

    [ 4.291746] fsl_enetc 0000:00:00.2 eth1: error -22 setting up slave phy
    [ 4.291871] mscc_felix 0000:00:00.5: Failed to register DSA switch: -22
    [ 4.308666] mscc_felix: probe of 0000:00:00.5 failed with error -22
    [ 6.358270] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088
    [ 6.367090] Mem abort info:
    [ 6.369888] ESR = 0x96000046
    [ 6.369891] EC = 0x25: DABT (current EL), IL = 32 bits
    [ 6.369892] SET = 0, FnV = 0
    [ 6.369894] EA = 0, S1PTW = 0
    [ 6.369895] Data abort info:
    [ 6.369897] ISV = 0, ISS = 0x00000046
    [ 6.369899] CM = 0, WnR = 1
    [ 6.369902] user pgtable: 4k pages, 48-bit VAs, pgdp=00000020d58c7000
    [ 6.369904] [0000000000000088] pgd=00000020d5912003, pud=00000020d5915003, pmd=0000000000000000
    [ 6.369914] Internal error: Oops: 96000046 [#1] PREEMPT SMP
    [ 6.420443] Modules linked in:
    [ 6.423506] CPU: 1 PID: 262 Comm: phc_ctl Not tainted 5.4.0-03625-gb7b2a5dadd7f #204
    [ 6.431273] Hardware name: LS1028A RDB Board (DT)
    [ 6.435989] pstate: 40000085 (nZcv daIf -PAN -UAO)
    [ 6.440802] pc : css_release+0x24/0x58
    [ 6.444561] lr : regmap_read+0x40/0x78
    [ 6.448316] sp : ffff800010513cc0
    [ 6.451636] x29: ffff800010513cc0 x28: ffff002055873040
    [ 6.456963] x27: 0000000000000000 x26: 0000000000000000
    [ 6.462289] x25: 0000000000000000 x24: 0000000000000000
    [ 6.467617] x23: 0000000000000000 x22: 0000000000000080
    [ 6.472944] x21: ffff800010513d44 x20: 0000000000000080
    [ 6.478270] x19: 0000000000000000 x18: 0000000000000000
    [ 6.483596] x17: 0000000000000000 x16: 0000000000000000
    [ 6.488921] x15: 0000000000000000 x14: 0000000000000000
    [ 6.494247] x13: 0000000000000000 x12: 0000000000000000
    [ 6.499573] x11: 0000000000000000 x10: 0000000000000000
    [ 6.504899] x9 : 0000000000000000 x8 : 0000000000000000
    [ 6.510225] x7 : 0000000000000000 x6 : ffff800010513cf0
    [ 6.515550] x5 : 0000000000000000 x4 : 0000000fffffffe0
    [ 6.520876] x3 : 0000000000000088 x2 : ffff800010513d44
    [ 6.526202] x1 : ffffcada668ea000 x0 : ffffcada64d8b0c0
    [ 6.531528] Call trace:
    [ 6.533977] css_release+0x24/0x58
    [ 6.537385] regmap_read+0x40/0x78
    [ 6.540795] __ocelot_read_ix+0x6c/0xa0
    [ 6.544641] ocelot_ptp_gettime64+0x4c/0x110
    [ 6.548921] ptp_clock_gettime+0x4c/0x58
    [ 6.552853] pc_clock_gettime+0x5c/0xa8
    [ 6.556699] __arm64_sys_clock_gettime+0x68/0xc8
    [ 6.561331] el0_svc_common.constprop.2+0x7c/0x178
    [ 6.566133] el0_svc_handler+0x34/0xa0
    [ 6.569891] el0_sync_handler+0x114/0x1d0
    [ 6.573908] el0_sync+0x140/0x180
    [ 6.577232] Code: d503201f b00119a1 91022263 b27b7be4 (f9004663)
    [ 6.583349] ---[ end trace d196b9b14cdae2da ]---
    [ 6.587977] Kernel panic - not syncing: Fatal exception
    [ 6.593216] SMP: stopping secondary CPUs
    [ 6.597151] Kernel Offset: 0x4ada54400000 from 0xffff800010000000
    [ 6.603261] PHYS_OFFSET: 0xffffd0a7c0000000
    [ 6.607454] CPU features: 0x10002,21806008
    [ 6.611558] Memory Limit: none

    And now that ocelot->ptp_clock is checked at exit, prevent a potential
    error where ptp_clock_register returned a pointer-encoded error, which
    we were keeping in the ocelot private data structure. So
    ocelot->ptp_clock is now either NULL or a valid pointer.
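
    The "either NULL or valid" invariant can be sketched in user space as
    follows. is_err() imitates the kernel's pointer-encoded error
    convention; struct priv and store_clock() are hypothetical and do not
    mirror the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space imitation of the kernel's pointer-encoded error check. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical private data holding the clock handle. */
struct priv {
	void *ptp_clock;
};

/*
 * Store the registration result so that the handle is either NULL or a
 * valid pointer. Returns 0 on success or the negative error code.
 */
long store_clock(struct priv *p, void *registered)
{
	if (is_err(registered)) {
		p->ptp_clock = NULL;
		return (long)(intptr_t)registered;
	}
	p->ptp_clock = registered;
	return 0;
}
```

    Exit code can then test the handle against NULL without ever seeing an
    encoded error value.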

    Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
    Cc: Antoine Tenart
    Reviewed-by: Florian Fainelli
    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vladimir Oltean
     
  • [ Upstream commit ffac2027e18f006f42630f2e01a8a9bd8dc664b5 ]

    If the user has specified their own RSS hash key, don't
    lose it across queue resets such as DOWN/UP, MTU change,
    and number of channels change. This is fixed by moving
    the key initialization to a little earlier in the lif
    creation.

    Also, let's clean up the RSS config a little better on
    the way down by setting it all to 0.

    Fixes: aa3198819bea ("ionic: Add RSS support")
    Signed-off-by: Shannon Nelson
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Shannon Nelson
     
  • [ Upstream commit 86c76c09898332143be365c702cf8d586ed4ed21 ]

    A lockdep splat was observed when trying to remove an xdp memory
    model from the table since the mutex was obtained when trying to
    remove the entry, but not before the table walk started.

    Fix the splat by obtaining the lock before starting the table walk.

    Fixes: c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.")
    Reported-by: Grygorii Strashko
    Signed-off-by: Jonathan Lemon
    Tested-by: Grygorii Strashko
    Acked-by: Jesper Dangaard Brouer
    Acked-by: Ilias Apalodimas
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jonathan Lemon
     
  • [ Upstream commit c3f812cea0d7006469d1cf33a4a9f0a12bb4b3a3 ]

    The page pool keeps track of the number of pages in flight, and
    it isn't safe to remove the pool until all pages are returned.

    Disallow removing the pool until all pages are back, so the pool
    is always available for page producers.

    Make the page pool responsible for its own delayed destruction
    instead of relying on XDP, so the page pool can be used without
    the xdp memory model.

    When all pages are returned, free the pool and notify xdp if the
    pool is registered with the xdp memory system. Have the callback
    perform a table walk since some drivers (cpsw) may share the pool
    among multiple xdp_rxq_info.

    Note that the increment of pages_state_release_cnt may result in
    inflight == 0, resulting in the pool being released.

    Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning")
    Signed-off-by: Jonathan Lemon
    Acked-by: Jesper Dangaard Brouer
    Acked-by: Ilias Apalodimas
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jonathan Lemon
     
  • [ Upstream commit 3d7cadae51f1b7f28358e36d0a1ce3f0ae2eee60 ]

    When setting speed to 100G via ethtool (AN is set to off), only 25G*4 is
    configured while the user, who has an advanced HW which supports
    extended PTYS, expects also 50G*2 to be configured.
    With this patch, when extended PTYS mode is available, configure
    PTYS via extended fields.

    Fixes: 4b95840a6ced ("net/mlx5e: Fix matching of speed to PRM link modes")
    Signed-off-by: Aya Levin
    Reviewed-by: Eran Ben Elisha
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Greg Kroah-Hartman

    Aya Levin
     
  • [ Upstream commit 6d485e5e555436d2c13accdb10807328c4158a17 ]

    Add a missing value in translation of PTYS ext_eth_proto_oper to its
    corresponding speed. When ext_eth_proto_oper bit 10 is set, ethtool
    shows unknown speed. With this fix, ethtool shows speed is 100G as
    expected.

    Fixes: a08b4ed1373d ("net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register")
    Signed-off-by: Aya Levin
    Reviewed-by: Eran Ben Elisha
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Greg Kroah-Hartman

    Aya Levin
     
  • [ Upstream commit a23dae79fb6555c808528707c6389345d0b0c189 ]

    Flows are allocated with kzalloc() so free with kfree().

    Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
    Signed-off-by: Roi Dayan
    Reviewed-by: Eli Britstein
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Greg Kroah-Hartman

    Roi Dayan
     
  • [ Upstream commit c431f8597863a91eea6024926e0c1b179cfa4852 ]

    SFF 8472 eeprom length is 512 bytes. Fix module info return value to
    support 512 bytes read.

    Fixes: ace329f4ab3b ("net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query")
    Signed-off-by: Eran Ben Elisha
    Reviewed-by: Aya Levin
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Greg Kroah-Hartman

    Eran Ben Elisha
     
  • [ Upstream commit 95219afbb980f10934de9f23a3e199be69c5ed09 ]

    The act_ct TC module shares a common conntrack and NAT infrastructure
    exposed via netfilter. It's possible that a packet needs both SNAT and
    DNAT manipulation, due to e.g. tuple collision. Netfilter can support
    this because it runs through the NAT table twice - once on ingress and
    again after egress. The act_ct action doesn't have such capability.

    Like netfilter hook infrastructure, we should run through NAT twice to
    keep the symmetry.

    Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
    Signed-off-by: Aaron Conole
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Aaron Conole
     
  • [ Upstream commit c55d8b108caa2ec1ae8dddd02cb9d3a740f7c838 ]

    Cited patch changed (channel index, tc) => (TXQ index) mapping to be a
    static one, in order to keep indices consistent when changing number of
    channels or TCs.

    For 32 channels (OOB) and 8 TCs, real num of TXQs is 256.
    When reducing the amount of channels to 8, the real num of TXQs will be
    changed to 64.
    This indexing method is buggy:
    - For channel #0, TC 3, the TXQ index is 96, which is beyond the real
    num of TXQs (64).
    - Index 8 is not valid, as there is no such TXQ from the driver's
    perspective (it represents channel #8, TC 0, which is not valid with
    the above configuration).

    As part of the driver's select queue, it calls netdev_pick_tx, which
    returns an index in the range of the real number of TXQs. Depending on the
    return value, with the examples above, the driver could have returned an
    index larger than the real number of tx queues, or crashed the kernel as it
    tried to read the invalid address of an SQ which was not allocated.

    Fix that by allocating sequential TXQ indices, and hold a new mapping
    between (channel index, tc) => (real TXQ index). This mapping will be
    updated as part of priv channels activation, and is used in
    mlx5e_select_queue to find the selected queue index.
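
    The arithmetic behind the two mappings can be sketched as follows,
    assuming the static mapping uses the maximum channel count as stride
    (which matches the numbers quoted above). The function names are
    hypothetical.

```c
/* Static mapping: the stride is the maximum number of channels. */
unsigned int txq_static(unsigned int ch, unsigned int tc,
			unsigned int max_channels)
{
	return ch + tc * max_channels;
}

/* Sequential mapping: the stride is the number of active channels. */
unsigned int txq_sequential(unsigned int ch, unsigned int tc,
			    unsigned int num_channels)
{
	return ch + tc * num_channels;
}
```

    With 8 active channels and 8 TCs, the sequential mapping yields
    indices 0..63, so every index below the real num of TXQs (64) refers
    to an existing queue.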

    The existing indices mapping (channel_tc2txq) is no longer needed, as it
    is used only for statistics structures and can be calculated at run time.
    Delete its definition and updates.

    Fixes: 8bfaf07f7806 ("net/mlx5e: Present SW stats when state is not opened")
    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Greg Kroah-Hartman

    Eran Ben Elisha
     
  • [ Upstream commit d04ac224b1688f005a84f764cfe29844f8e9da08 ]

    The skb_mpls_push was not updating ethertype of an ethernet packet if
    the packet was originally received from a non ARPHRD_ETHER device.

    In the below OVS data path flow, since the device corresponding to
    port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does
    not update the ethertype of the packet even though the previous
    push_eth action had added an ethernet header to the packet.

    recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
    actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
    push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4

    Fixes: 8822e270d697 ("net: core: move push MPLS functionality from OvS to core helper")
    Signed-off-by: Martin Varghese
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Martin Varghese
     
  • [ Upstream commit df95467b6d2bfce49667ee4b71c67249b01957f7 ]

    hsr_dev_xmit() calls hsr_port_get_hsr() to find the master node, and that
    returns NULL if the master node does not exist in the list.
    But hsr_dev_xmit() doesn't check the returned pointer, so a NULL
    dereference could occur.

    Test commands:
    ip netns add nst
    ip link add veth0 type veth peer name veth1
    ip link add veth2 type veth peer name veth3
    ip link set veth1 netns nst
    ip link set veth3 netns nst
    ip link set veth0 up
    ip link set veth2 up
    ip link add hsr0 type hsr slave1 veth0 slave2 veth2
    ip a a 192.168.100.1/24 dev hsr0
    ip link set hsr0 up
    ip netns exec nst ip link set veth1 up
    ip netns exec nst ip link set veth3 up
    ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
    ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
    ip netns exec nst ip link set hsr1 up
    hping3 192.168.100.2 -2 --flood &
    modprobe -rv hsr

    Splat looks like:
    [ 217.351122][ T1635] kasan: CONFIG_KASAN_INLINE enabled
    [ 217.352969][ T1635] kasan: GPF could be caused by NULL-ptr deref or user memory access
    [ 217.354297][ T1635] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 217.355507][ T1635] CPU: 1 PID: 1635 Comm: hping3 Not tainted 5.4.0+ #192
    [ 217.356472][ T1635] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 217.357804][ T1635] RIP: 0010:hsr_dev_xmit+0x34/0x90 [hsr]
    [ 217.373010][ T1635] Code: 48 8d be 00 0c 00 00 be 04 00 00 00 48 83 ec 08 e8 21 be ff ff 48 8d 78 10 48 ba 00 b
    [ 217.376919][ T1635] RSP: 0018:ffff8880cd8af058 EFLAGS: 00010202
    [ 217.377571][ T1635] RAX: 0000000000000000 RBX: ffff8880acde6840 RCX: 0000000000000002
    [ 217.379465][ T1635] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 0000000000000010
    [ 217.380274][ T1635] RBP: ffff8880acde6840 R08: ffffed101b440d5d R09: 0000000000000001
    [ 217.381078][ T1635] R10: 0000000000000001 R11: ffffed101b440d5c R12: ffff8880bffcc000
    [ 217.382023][ T1635] R13: ffff8880bffcc088 R14: 0000000000000000 R15: ffff8880ca675c00
    [ 217.383094][ T1635] FS: 00007f060d9d1740(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
    [ 217.384289][ T1635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 217.385009][ T1635] CR2: 00007faf15381dd0 CR3: 00000000d523c001 CR4: 00000000000606e0
    [ 217.385940][ T1635] Call Trace:
    [ 217.386544][ T1635] dev_hard_start_xmit+0x160/0x740
    [ 217.387114][ T1635] __dev_queue_xmit+0x1961/0x2e10
    [ 217.388118][ T1635] ? check_object+0xaf/0x260
    [ 217.391466][ T1635] ? __alloc_skb+0xb9/0x500
    [ 217.392017][ T1635] ? init_object+0x6b/0x80
    [ 217.392629][ T1635] ? netdev_core_pick_tx+0x2e0/0x2e0
    [ 217.393175][ T1635] ? __alloc_skb+0xb9/0x500
    [ 217.393727][ T1635] ? rcu_read_lock_sched_held+0x90/0xc0
    [ 217.394331][ T1635] ? rcu_read_lock_bh_held+0xa0/0xa0
    [ 217.395013][ T1635] ? kasan_unpoison_shadow+0x30/0x40
    [ 217.395668][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0
    [ 217.396280][ T1635] ? __kmalloc_node_track_caller+0x3a8/0x3f0
    [ 217.399007][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0
    [ 217.400093][ T1635] ? __kmalloc_reserve.isra.46+0x2e/0xb0
    [ 217.401118][ T1635] ? memset+0x1f/0x40
    [ 217.402529][ T1635] ? __alloc_skb+0x317/0x500
    [ 217.404915][ T1635] ? arp_xmit+0xca/0x2c0
    [ ... ]

    Fixes: 311633b60406 ("hsr: switch ->dellink() to ->ndo_uninit()")
    Acked-by: Cong Wang
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • [ Upstream commit 040b5cfbcefa263ccf2c118c4938308606bb7ed8 ]

    The skb_mpls_pop was not updating ethertype of an ethernet packet if the
    packet was originally received from a non ARPHRD_ETHER device.

    In the below OVS data path flow, since the device corresponding to port 7
    is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update
    the ethertype of the packet even though the previous push_eth action had
    added an ethernet header to the packet.

    recirc_id(0),in_port(7),eth_type(0x8847),
    mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
    actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
    pop_mpls(eth_type=0x800),4

    Fixes: ed246cee09b9 ("net: core: move pop MPLS functionality from OvS to core helper")
    Signed-off-by: Martin Varghese
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Martin Varghese
     
  • [ Upstream commit 0e4940928c26527ce8f97237fef4c8a91cd34207 ]

    After pskb_may_pull() we should always refetch the header
    pointers from the skb->data in case it got reallocated.

    In gre_parse_header(), the erspan header is still fetched
    from the 'options' pointer which is fetched before
    pskb_may_pull().
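
    The refetch rule can be illustrated with a user-space analogue, where
    realloc() plays the role of the possible reallocation. struct buf,
    ensure_len() and read_header_byte() are hypothetical and are not skb
    APIs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer standing in for an skb whose data may move. */
struct buf {
	unsigned char *data;
	size_t len;
};

/* Analogue of a "may pull" helper: growing the buffer may move b->data. */
int ensure_len(struct buf *b, size_t need)
{
	if (need <= b->len)
		return 0;

	unsigned char *p = realloc(b->data, need);
	if (!p)
		return -1;
	memset(p + b->len, 0, need - b->len); /* zero the new tail */
	b->data = p;
	b->len = need;
	return 0;
}

/* Read the byte at hdr_off; returns the byte, or -1 on allocation failure. */
int read_header_byte(struct buf *b, size_t hdr_off)
{
	if (ensure_len(b, hdr_off + 1))
		return -1;

	/* Recompute the pointer from b->data; one saved earlier may be stale. */
	const unsigned char *hdr = b->data + hdr_off;
	return *hdr;
}
```

    Keeping an offset and deriving the pointer after the helper call
    avoids dereferencing memory that may have been freed by the
    reallocation.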

    Found this during code review of a KMSAN bug report.

    Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup")
    Cc: Lorenzo Bianconi
    Signed-off-by: Cong Wang
    Acked-by: Lorenzo Bianconi
    Acked-by: William Tu
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 8ffb055beae58574d3e77b4bf9d4d15eace1ca27 ]

    The recent commit 5c72299fba9d ("net: sched: cls_flower: Classify
    packets using port ranges") added filtering based on port ranges
    to tc flower. However, the commit missed the necessary changes in the
    hw-offload code, so the feature generated incorrect offloaded flow
    keys in the NIC.

    One more detailed example is below:

    $ tc qdisc add dev eth0 ingress
    $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
    dst_port 100-200 action drop

    With the setup above, an exact match filter with dst_port == 0 will be
    installed in NIC by hw-offload. IOW, the NIC will have a rule which is
    equivalent to the following one.

    $ tc qdisc add dev eth0 ingress
    $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
    dst_port 0 action drop

    The behavior was caused by the flow dissector which extracts packet
    data into the flow key in the tc flower. More specifically, regardless
    of exact match or specified port ranges, fl_init_dissector() set the
    FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract port
    numbers from skb in skb_flow_dissect() called by fl_classify(). Note
    that device drivers received the same struct flow_dissector object as
    used in skb_flow_dissect(). Thus, offloaded drivers could not identify
    which of these is used because the FLOW_DISSECTOR_KEY_PORTS flag was
    set to struct flow_dissector in either case.

    This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new
    tp_range field in struct fl_flow_key to recognize which filters are applied
    to offloaded drivers. At this point, when filters based on port ranges
    are passed to drivers, the drivers return the EOPNOTSUPP error because
    they do not support the feature (the newly created
    FLOW_DISSECTOR_KEY_PORTS_RANGE flag).

    Fixes: 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
    Signed-off-by: Yoshiki Komachi
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Yoshiki Komachi
     
  • [ Upstream commit 25a443f74bcff2c4d506a39eae62fc15ad7c618a ]

    When a device is bound to a clsact qdisc, bind events are triggered to
    registered drivers for both ingress and egress. However, if a driver
    registers to such a device using the indirect block routines then it is
    assumed that it is only interested in ingress offload and so only replays
    ingress bind/unbind messages.

    The NFP driver supports the offload of some egress filters when
    registering to a block with qdisc of type clsact. However, on unregister,
    if the block is still active, it will not receive an unbind egress
    notification which can prevent proper cleanup of other registered
    callbacks.

    Modify the indirect block callback command in TC to send messages of
    ingress and/or egress bind depending on the qdisc in use. NFP currently
    supports egress offload for TC flower offload so the changes are only
    added to TC.

    Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports")
    Signed-off-by: John Hurley
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    John Hurley
     
  • [ Upstream commit dbad3408896c3c5722ec9cda065468b3df16c5bf ]

    With indirect blocks, a driver can register for callbacks from a device
    that it does not 'own', for example, a tunnel device. When registering to
    or unregistering from a new device, a callback is triggered to generate
    a bind/unbind event. This, in turn, allows the driver to receive any
    existing rules or to properly clean up installed rules.

    When first added, it was assumed that all indirect block registrations
    would be for ingress offloads. However, the NFP driver can, in some
    instances, support clsact qdisc binds for egress offload.

    Change the name of the indirect block callback command in flow_offload to
    remove the 'ingress' identifier from it. While this does not change
    functionality, a follow-up patch will implement a more generic
    callback than the current ones, which support only ingress offload.

    Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports")
    Signed-off-by: John Hurley
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    John Hurley
     
  • [ Upstream commit 721c8dafad26ccfa90ff659ee19755e3377b829d ]

    Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
    timestamp of the last synflood. Protect them with READ_ONCE() and
    WRITE_ONCE() since reads and writes aren't serialised.

    Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
    introduced by a0f82f64e269 ("syncookies: remove last_synq_overflow from
    struct tcp_sock"). But unprotected accesses were already there when
    timestamp was stored in .last_synq_overflow.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Guillaume Nault
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Guillaume Nault
     
  • [ Upstream commit cb44a08f8647fd2e8db5cc9ac27cd8355fa392d8 ]

    When no synflood occurs, the synflood timestamp isn't updated.
    Therefore it can be so old that time_after32() can consider it to be
    in the future.

    That's a problem for tcp_synq_no_recent_overflow() as it may report
    that a recent overflow occurred while, in fact, it's just that jiffies
    has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.

    Spurious detection of recent overflows leads to extra syncookie
    verification in cookie_v[46]_check(). At that point, the verification
    should fail and the packet should be dropped. But we should have dropped
    the packet earlier as we didn't even send a syncookie.

    Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
    only if jiffies is within the
    [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
    way, no spurious recent overflow is reported when jiffies wraps and
    'last_overflow' becomes in the future from the point of view of
    time_after32().

    However, if jiffies wraps and enters the
    [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
    'last_overflow' being a stale synflood timestamp), then
    tcp_synq_no_recent_overflow() still erroneously reports an
    overflow. In such cases, we have to rely on syncookie verification
    to drop the packet. We unfortunately have no way to differentiate
    between a fresh and a stale syncookie timestamp.

    In practice, using last_overflow as lower bound is problematic.
    If the synflood timestamp is concurrently updated between the time
    we read jiffies and the moment we store the timestamp in
    'last_overflow', then 'now' becomes smaller than 'last_overflow' and
    tcp_synq_no_recent_overflow() returns true, potentially dropping a
    valid syncookie.

    Reading jiffies after loading the timestamp could fix the problem,
    but that'd require a memory barrier. Let's just accommodate for
    potential timestamp growth instead and extend the interval using
    'last_overflow - HZ' as lower bound.
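
    The refined check can be sketched in user space as follows. This is
    a minimal illustration, not the kernel source: the two macros are
    local stand-ins for time_after32() and time_between32(), HZ=1000 and
    a 120 second TCP_SYNCOOKIE_VALID are assumptions, and the function
    names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's wrapping 32-bit time comparisons. */
#define time_after32(a, b) ((int32_t)((uint32_t)(b) - (uint32_t)(a)) < 0)
#define time_between32(t, l, h) \
	((uint32_t)(h) - (uint32_t)(l) >= (uint32_t)(t) - (uint32_t)(l))

#define HZ 1000u                         /* assumption */
#define TCP_SYNCOOKIE_VALID (120u * HZ)  /* assumption: 120 seconds */

/* Old rule: one-sided test, fooled once 'now' is 2^31 past the bound. */
static bool old_no_recent_overflow(uint32_t now, uint32_t last_overflow)
{
	return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID);
}

/*
 * Refined rule: a recent overflow is reported only while 'now' lies in
 * [last_overflow - HZ, last_overflow + TCP_SYNCOOKIE_VALID]. The HZ of
 * slack below last_overflow tolerates a concurrent timestamp update.
 */
static bool no_recent_overflow(uint32_t now, uint32_t last_overflow)
{
	return !time_between32(now, last_overflow - HZ,
			       last_overflow + TCP_SYNCOOKIE_VALID);
}
```

    With last_overflow == 0 and now == 120000 + 2^31 + 1, the old rule
    still reports a recent overflow, while the refined rule does not.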

    Signed-off-by: Guillaume Nault
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Guillaume Nault
     
  • [ Upstream commit 04d26e7b159a396372646a480f4caa166d1b6720 ]

    If no synflood happens for a long enough period of time, then the
    synflood timestamp isn't refreshed and jiffies can advance so much
    that time_after32() can't accurately compare them any more.

    Therefore, we can end up in a situation where time_after32(now,
    last_overflow + HZ) returns false, just because these two values are
    too far apart. In that case, the synflood timestamp isn't updated as
    it should be, which can trick tcp_synq_no_recent_overflow() into
    rejecting valid syncookies.

    For example, let's consider the following scenario on a system
    with HZ=1000:

    * The synflood timestamp is 0, either because that's the timestamp
    of the last synflood or, more commonly, because we're working with
    a freshly created socket.

    * We receive a new SYN, which triggers synflood protection. Let's say
    that this happens when jiffies == 2147484649 (that is,
    'synflood timestamp' + HZ + 2^31 + 1).

    * Then tcp_synq_overflow() doesn't update the synflood timestamp,
    because time_after32(2147484649, 1000) returns false.
    With:
    - 2147484649: the value of jiffies, aka. 'now'.
    - 1000: the value of 'last_overflow' + HZ.

    * A bit later, we receive the ACK completing the 3WHS. But
    cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
    says that we're not under synflood. That's because
    time_after32(2147484649, 120000) returns true.
    With:
    - 2147484649: the value of jiffies, aka. 'now'.
    - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.

    Of course, in reality jiffies would have increased a bit, but this
    condition will last for the next 119 seconds, which is long enough
    to accommodate the growth of jiffies.
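
    The skipped update in this scenario can be reproduced in user space.
    A minimal sketch, assuming time_after32() is the usual signed 32-bit
    difference test (re-declared locally below) and HZ=1000;
    update_skipped() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-in for the kernel macro: "a is after b" when the 32-bit
 * difference b - a, reinterpreted as signed, is negative. The result is
 * only meaningful while a and b are less than 2^31 apart.
 */
#define time_after32(a, b) ((int32_t)((uint32_t)(b) - (uint32_t)(a)) < 0)

#define HZ 1000u /* assumption: matches the HZ=1000 scenario */

/*
 * Hypothetical helper: true when the pre-fix condition would leave the
 * synflood timestamp untouched.
 */
static bool update_skipped(uint32_t now, uint32_t last_overflow)
{
	return !time_after32(now, last_overflow + HZ);
}
```

    For now == 2147484649 and last_overflow == 0, the difference
    1000 - 2147484649 wraps to 2^31 - 1, which is positive as a signed
    value, so time_after32() is false and the update is skipped. One
    jiffy earlier the difference wraps to -2^31 and the update happens.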

    Fix this by updating the overflow timestamp whenever jiffies isn't
    within the [last_overflow, last_overflow + HZ] range. That shouldn't
    have any performance impact since the update still happens at most once
    per second.
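
    The new update condition can be sketched as below. This is an
    illustration under assumptions, not the patched kernel function:
    time_between32() is re-declared locally, HZ=1000 is assumed, and
    should_refresh() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-in for the kernel's time_between32(): true when t lies in
 * [l, h], using wrapping 32-bit arithmetic.
 */
#define time_between32(t, l, h) \
	((uint32_t)(h) - (uint32_t)(l) >= (uint32_t)(t) - (uint32_t)(l))

#define HZ 1000u /* assumption */

/*
 * Refresh the timestamp whenever 'now' is outside
 * [last_overflow, last_overflow + HZ], whether it is older, newer, or
 * too far away for a one-sided comparison to order correctly.
 */
static bool should_refresh(uint32_t now, uint32_t last_overflow)
{
	return !time_between32(now, last_overflow, last_overflow + HZ);
}
```

    In the HZ=1000 scenario, should_refresh(2147484649, 0) is true, so
    the stale timestamp gets replaced instead of being kept.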

    Now we're guaranteed to have fresh timestamps while under synflood, so
    tcp_synq_no_recent_overflow() can safely use it with time_after32() in
    such situations.

    Stale timestamps can still make tcp_synq_no_recent_overflow() return
    the wrong verdict when not under synflood. This will be handled in the
    next patch.

    For 64-bit architectures, the problem was introduced with the
    conversion of ->tw_ts_recent_stamp to a 32-bit integer by commit
    cca9bab1b72c ("tcp: use monotonic timestamps for PAWS").
    The problem has always been there on 32-bit architectures.

    Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS")
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Guillaume Nault
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Guillaume Nault
     
  • [ Upstream commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 ]

    ipv6_stub uses the ip6_dst_lookup function to allow other modules to
    perform IPv6 lookups. However, this function skips the XFRM layer
    entirely.

    All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
    ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
    which calls xfrm_lookup_route(). This patch fixes this inconsistent
    behavior by switching the stub to ip6_dst_lookup_flow, which also calls
    xfrm_lookup_route().

    This requires some changes in all the callers, as these two functions
    take different arguments and have different return types.

    Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
    Reported-by: Xiumei Mu
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca