21 Dec, 2019
17 commits
-
commit f53056c43063257ae4159d83c425eaeb772bcd71 upstream.
In gfs2_page_mkwrite's gfs2_allocate_page_backing helper, try to
allocate as many blocks at once as we need. Pass in the size of the
requested allocation.Fixes: 35af80aef99b ("gfs2: don't use buffer_heads in gfs2_allocate_page_backing")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Greg Kroah-Hartman -
commit e64681b487c897ec871465083bf0874087d47b66 upstream.
KASAN shadow map doesn't need to be accessible through the linear kernel
mapping, allocate its pages with MEMBLOCK_ALLOC_ANYWHERE so that high
memory can be used. This frees up to ~100MB of low memory on xtensa
configurations with KASAN and high memory.Cc: stable@vger.kernel.org # v5.1+
Fixes: f240ec09bb8a ("memblock: replace memblock_alloc_base(ANYWHERE) with memblock_phys_alloc")
Reviewed-by: Mike Rapoport
Signed-off-by: Max Filippov
Signed-off-by: Greg Kroah-Hartman -
commit cc90bc68422318eb8e75b15cd74bc8d538a7df29 upstream.
This partially reverts commit e3a5d8e386c3fb973fa75f2403622a8f3640ec06.
Commit e3a5d8e386c3 ("check bi_size overflow before merge") adds a bio_full
check to __bio_try_merge_page. This will cause __bio_try_merge_page to fail
when the last bi_io_vec has been reached. Instead, what we want here is only
the bi_size overflow check.Fixes: e3a5d8e386c3 ("block: check bi_size overflow before merge")
Cc: stable@vger.kernel.org # v5.4+
Reviewed-by: Ming Lei
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit c6a3aea93571a5393602256d8f74772bd64c8225 upstream.
QOS requests for DEFAULT_VALUE are supposed to be ignored but this is
not the case for FREQ_QOS_MAX. Adding one request for MAX_DEFAULT_VALUE
and one for a real value will cause freq_qos_read_value to unexpectedly
return MAX_DEFAULT_VALUE (-1).This happens because freq_qos max value is aggregated with PM_QOS_MIN
but FREQ_QOS_MAX_DEFAULT_VALUE is (-1) so it's smaller than other
values.Fix this by redefining FREQ_QOS_MAX_DEFAULT_VALUE to S32_MAX.
Looking at current users for freq_qos it seems that none of them create
requests for FREQ_QOS_MAX_DEFAULT_VALUE.Fixes: 77751a466ebd ("PM: QoS: Introduce frequency QoS")
Signed-off-by: Leonard Crestez
Reported-by: Matthias Kaehlcke
Reviewed-by: Matthias Kaehlcke
Cc: 5.4+ # 5.4+
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Greg Kroah-Hartman -
commit f338bb9f0179cb959977b74e8331b312264d720b upstream.
Enhance the ACS quirk for Cavium Processors. Add the root port vendor IDs
for ThunderX2 and ThunderX3 series of processors.[bhelgaas: add Fixes: and stable tag]
Fixes: f2ddaf8dfd4a ("PCI: Apply Cavium ThunderX ACS quirk to more Root Ports")
Link: https://lore.kernel.org/r/20191111024243.GA11408@dc5-eodlnx05.marvell.com
Signed-off-by: George Cherian
Signed-off-by: Bjorn Helgaas
Reviewed-by: Robert Richter
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Greg Kroah-Hartman -
commit 7c7e53e1c93df14690bd12c1f84730fef927a6f1 upstream.
The R-Car Gen2/3 manual - available at:
https://www.renesas.com/eu/en/products/microcontrollers-microprocessors/rz/rzg/rzg1m.html#documents
"RZ/G Series User's Manual: Hardware" section
strictly enforces the MACCTLR inizialization value - 39.3.1 - "Initial
Setting of PCI Express":"Be sure to write the initial value (= H'80FF 0000) to MACCTLR before
enabling PCIETCTLR.CFINIT".To avoid unexpected behavior and to match the SW initialization sequence
guidelines, this patch programs the MACCTLR with the correct value.Note that the MACCTLR.SPCHG bit in the MACCTLR register description
reports that "Only writing 1 is valid and writing 0 is invalid" but this
"invalid" has to be interpreted as a write-ignore aka "ignored", not
"prohibited".Reported-by: Eugeniu Rosca
Fixes: c25da4778803 ("PCI: rcar: Add Renesas R-Car PCIe driver")
Fixes: be20bbcb0a8c ("PCI: rcar: Add the initialization of PCIe link in resume_noirq()")
Signed-off-by: Yoshihiro Shimoda
Signed-off-by: Lorenzo Pieralisi
Reviewed-by: Geert Uytterhoeven
Cc: # v5.2+
Signed-off-by: Greg Kroah-Hartman -
commit 73884a7082f466ce6686bb8dd7e6571dd42313b4 upstream.
As per PCIe r5.0, sec 7.8.5.2, fixed bus numbers of a bridge must be zero
when no function that uses EA is located behind it. Hence, if EA supplies
bus numbers of zero, assign bus numbers normally. A secondary bus can
never have a bus number of zero, so setting a bridge's Secondary Bus Number
to zero makes downstream devices unreachable.[bhelgaas: retain bool return value so "zero is invalid" logic is local]
Fixes: 2dbce5901179 ("PCI: Assign bus numbers present in EA capability for bridges")
Link: https://lore.kernel.org/r/1572850664-9861-1-git-send-email-sundeep.lkml@gmail.com
Signed-off-by: Subbaraya Sundeep
Signed-off-by: Bjorn Helgaas
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Greg Kroah-Hartman -
commit e045fa29e89383c717e308609edd19d2fd29e1be upstream.
When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector
Control register for each vector and saves it in desc->masked. Each
register is 32 bits and bit 0 is the actual Mask bit.When we restored these registers during resume, we previously set the Mask
bit if *any* bit in desc->masked was set instead of when the Mask bit
itself was set:pci_restore_state
pci_restore_msi_state
__pci_restore_msix_state
for_each_pci_msi_entry
msix_mask_irq(entry, entry->masked) masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT
if (flag)
Signed-off-by: Bjorn Helgaas
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit d8558ac8c93d429d65d7490b512a3a67e559d0d4 upstream.
According to documentation [0] the correct offset for the Upstream Peer
Decode Configuration Register (UPDCR) is 0x1014. It was previously defined
as 0x1114.d99321b63b1f ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports")
intended to enforce isolation between PCI devices allowing them to be put
into separate IOMMU groups. Due to the wrong register offset the intended
isolation was not fully enforced. This is fixed with this patch.Please note that I did not test this patch because I have no hardware that
implements this register.[0] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/4th-gen-core-family-mobile-i-o-datasheet.pdf (page 325)
Fixes: d99321b63b1f ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports")
Link: https://lore.kernel.org/r/7a3505df-79ba-8a28-464c-88b83eefffa6@kernkonzept.com
Signed-off-by: Steffen Liebergeld
Signed-off-by: Bjorn Helgaas
Reviewed-by: Andrew Murray
Acked-by: Ashok Raj
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Greg Kroah-Hartman -
commit 157c1062fcd86ade3c674503705033051fd3d401 upstream.
A sysfs request to enable or disable a PCIe hotplug slot should not
return before it has been carried out. That is sought to be achieved by
waiting until the controller's "pending_events" have been cleared.However the IRQ thread pciehp_ist() clears the "pending_events" before
it acts on them. If pciehp_sysfs_enable_slot() / _disable_slot() happen
to check the "pending_events" after they have been cleared but while
pciehp_ist() is still running, the functions may return prematurely
with an incorrect return value.Fix by introducing an "ist_running" flag which must be false before a sysfs
request is allowed to return.Fixes: 32a8cef274fe ("PCI: pciehp: Enable/disable exclusively from IRQ thread")
Link: https://lore.kernel.org/linux-pci/1562226638-54134-1-git-send-email-wangxiongfeng2@huawei.com
Link: https://lore.kernel.org/r/4174210466e27eb7e2243dd1d801d5f75baaffd8.1565345211.git.lukas@wunner.de
Reported-and-tested-by: Xiongfeng Wang
Signed-off-by: Lukas Wunner
Signed-off-by: Bjorn Helgaas
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Greg Kroah-Hartman -
commit f2c33ccacb2d4bbeae2a255a7ca0cbfd03017b7c upstream.
pci_pm_thaw_noirq() is supposed to return the device to D0 and restore its
configuration registers, but previously it only did that for devices whose
drivers implemented the new power management ops.Hibernation, e.g., via "echo disk > /sys/power/state", involves freezing
devices, creating a hibernation image, thawing devices, writing the image,
and powering off. The fact that thawing did not return devices with legacy
power management to D0 caused errors, e.g., in this path:pci_pm_thaw_noirq
if (pci_has_legacy_pm_support(pci_dev)) # true for Mellanox VF driver
return pci_legacy_resume_early(dev) # ... legacy PM skips the rest
pci_set_power_state(pci_dev, PCI_D0)
pci_restore_state(pci_dev)
pci_pm_thaw
if (pci_has_legacy_pm_support(pci_dev))
pci_legacy_resume
drv->resume
mlx4_resume
...
pci_enable_msix_range
...
if (dev->current_state != PCI_D0) #
Signed-off-by: Bjorn Helgaas
Reviewed-by: Rafael J. Wysocki
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Greg Kroah-Hartman -
commit 6acdf7e19b37cb3a9258603d0eab315079c19c5e upstream.
The part_event_bitmap register is 64 bits wide, so read it with ioread64()
instead of the 32-bit ioread32().Fixes: 52eabba5bcdb ("switchtec: Add IOCTLs to the Switchtec driver")
Link: https://lore.kernel.org/r/20190910195833.3891-1-logang@deltatee.com
Reported-by: Doug Meyer
Signed-off-by: Logan Gunthorpe
Signed-off-by: Bjorn Helgaas
Cc: stable@vger.kernel.org # v4.12+
Cc: Kelvin Cao
Signed-off-by: Greg Kroah-Hartman -
commit 2ac55d5e5ec9ad0a07e194f0eaca865fe5aa3c40 upstream.
It have turned out that it's not a good idea to unconditionally do a power
cycle and then to re-initialize the SDIO card, as currently done through
mmc_hw_reset() -> mmc_sdio_hw_reset(). This because there may be multiple
SDIO func drivers probed, who also shares the same SDIO card.To address these scenarios, one may be tempted to use a notification
mechanism, as to allow the core to inform each of the probed func drivers,
about an ongoing HW reset. However, supporting such an operation from the
func driver point of view, may not be entirely trivial.Therefore, let's use a more simplistic approach to solve the problem, by
instead forcing the card to be removed and re-detected, via scheduling a
rescan-work. In this way, we can rely on existing infrastructure, as the
func driver's ->remove() and ->probe() callbacks, becomes invoked to deal
with the cleanup and the re-initialization.This solution may be considered as rather heavy, especially if a func
driver doesn't share its card with other func drivers. To address this,
let's keep the current immediate HW reset option as well, but run it only
when there is one func driver probed for the card.Finally, to allow the caller of mmc_hw_reset(), to understand if the reset
is being asynchronously managed from a scheduled work, it returns 1
(propagated from mmc_sdio_hw_reset()). If the HW reset is executed
successfully and synchronously it returns 0, which maintains the existing
behaviour.Reviewed-by: Douglas Anderson
Tested-by: Douglas Anderson
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Ulf Hansson
Signed-off-by: Greg Kroah-Hartman -
commit 99b4ddd8b76a6f60a8c2b3775849d65d21a418fc upstream.
Upfront in mmc_rescan() we use the host->rescan_entered flag, to allow
scanning only once for non-removable cards. Therefore, it's also not
possible that we can have a corresponding card bus attached (host->bus_ops
is NULL), when we are scanning non-removable cards.For this reason, let' drop the check for mmc_card_is_removable() as it's
redundant.Reviewed-by: Douglas Anderson
Tested-by: Douglas Anderson
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Ulf Hansson
Signed-off-by: Greg Kroah-Hartman -
commit a0d4c7eb71dd08a89ad631177bb0cbbabd598f84 upstream.
MMC IOCTLS with R1B responses may cause the card to enter the busy state,
which means it's not ready to receive a new request. To prevent new
requests from being sent to the card, use a CMD13 polling loop to verify
that the card returns to the transfer state, before completing the request.Signed-off-by: Chaotian Jing
Reviewed-by: Avri Altman
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson
Signed-off-by: Greg Kroah-Hartman -
commit 3869468e0c4800af52bfe1e0b72b338dcdae2cfc upstream.
To prepare for more users of card_busy_detect(), let's drop the struct
request * as an in-parameter and convert to log the error message via
dev_err() instead of pr_err().Signed-off-by: Chaotian Jing
Reviewed-by: Avri Altman
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson
Signed-off-by: Greg Kroah-Hartman -
commit f8c63edfd78905320e86b6b2be2b7a5ac768fa4e upstream.
Fix commit 7b81cb6bddd2 ("usb: add a HCD_DMA flag instead of
guestimating DMA capabilities") where local memory USB drivers
erroneously allocate DMA memory instead of pool memory, causingOHCI Unrecoverable Error, disabled
HC died; cleaning upThe order between hcd_uses_dma() and hcd->localmem_pool is now
arranged as in hcd_buffer_alloc() and hcd_buffer_free(), with the
test for hcd->localmem_pool placed first.As an alternative, one might consider adjusting hcd_uses_dma() with
static inline bool hcd_uses_dma(struct usb_hcd *hcd)
{
- return IS_ENABLED(CONFIG_HAS_DMA) && (hcd->driver->flags & HCD_DMA);
+ return IS_ENABLED(CONFIG_HAS_DMA) &&
+ (hcd->driver->flags & HCD_DMA) &&
+ (hcd->localmem_pool == NULL);
}One can also consider unsetting HCD_DMA for local memory pool drivers.
Fixes: 7b81cb6bddd2 ("usb: add a HCD_DMA flag instead of guestimating DMA capabilities")
Cc: stable
Signed-off-by: Fredrik Noring
Link: https://lore.kernel.org/r/20191210172905.GA52526@sx9
Signed-off-by: Greg Kroah-Hartman
18 Dec, 2019
23 commits
-
[ Upstream commit 00222d1394104f0fd6c01ca9f578afec9e0f148b ]
RTL8125 also requires to enable RX for WoL.
v2: add missing Fixes tag
Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125")
Signed-off-by: Heiner Kallweit
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 9385973fe8db9743fa93bf17245635be4eb8c4a6 ]
Currently a switch driver deinit frees the regmaps, but the PTP clock is
still out there, available to user space via /dev/ptpN. Any PTP
operation is a ticking time bomb, since it will attempt to use the freed
regmaps and thus trigger kernel panics:[ 4.291746] fsl_enetc 0000:00:00.2 eth1: error -22 setting up slave phy
[ 4.291871] mscc_felix 0000:00:00.5: Failed to register DSA switch: -22
[ 4.308666] mscc_felix: probe of 0000:00:00.5 failed with error -22
[ 6.358270] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088
[ 6.367090] Mem abort info:
[ 6.369888] ESR = 0x96000046
[ 6.369891] EC = 0x25: DABT (current EL), IL = 32 bits
[ 6.369892] SET = 0, FnV = 0
[ 6.369894] EA = 0, S1PTW = 0
[ 6.369895] Data abort info:
[ 6.369897] ISV = 0, ISS = 0x00000046
[ 6.369899] CM = 0, WnR = 1
[ 6.369902] user pgtable: 4k pages, 48-bit VAs, pgdp=00000020d58c7000
[ 6.369904] [0000000000000088] pgd=00000020d5912003, pud=00000020d5915003, pmd=0000000000000000
[ 6.369914] Internal error: Oops: 96000046 [#1] PREEMPT SMP
[ 6.420443] Modules linked in:
[ 6.423506] CPU: 1 PID: 262 Comm: phc_ctl Not tainted 5.4.0-03625-gb7b2a5dadd7f #204
[ 6.431273] Hardware name: LS1028A RDB Board (DT)
[ 6.435989] pstate: 40000085 (nZcv daIf -PAN -UAO)
[ 6.440802] pc : css_release+0x24/0x58
[ 6.444561] lr : regmap_read+0x40/0x78
[ 6.448316] sp : ffff800010513cc0
[ 6.451636] x29: ffff800010513cc0 x28: ffff002055873040
[ 6.456963] x27: 0000000000000000 x26: 0000000000000000
[ 6.462289] x25: 0000000000000000 x24: 0000000000000000
[ 6.467617] x23: 0000000000000000 x22: 0000000000000080
[ 6.472944] x21: ffff800010513d44 x20: 0000000000000080
[ 6.478270] x19: 0000000000000000 x18: 0000000000000000
[ 6.483596] x17: 0000000000000000 x16: 0000000000000000
[ 6.488921] x15: 0000000000000000 x14: 0000000000000000
[ 6.494247] x13: 0000000000000000 x12: 0000000000000000
[ 6.499573] x11: 0000000000000000 x10: 0000000000000000
[ 6.504899] x9 : 0000000000000000 x8 : 0000000000000000
[ 6.510225] x7 : 0000000000000000 x6 : ffff800010513cf0
[ 6.515550] x5 : 0000000000000000 x4 : 0000000fffffffe0
[ 6.520876] x3 : 0000000000000088 x2 : ffff800010513d44
[ 6.526202] x1 : ffffcada668ea000 x0 : ffffcada64d8b0c0
[ 6.531528] Call trace:
[ 6.533977] css_release+0x24/0x58
[ 6.537385] regmap_read+0x40/0x78
[ 6.540795] __ocelot_read_ix+0x6c/0xa0
[ 6.544641] ocelot_ptp_gettime64+0x4c/0x110
[ 6.548921] ptp_clock_gettime+0x4c/0x58
[ 6.552853] pc_clock_gettime+0x5c/0xa8
[ 6.556699] __arm64_sys_clock_gettime+0x68/0xc8
[ 6.561331] el0_svc_common.constprop.2+0x7c/0x178
[ 6.566133] el0_svc_handler+0x34/0xa0
[ 6.569891] el0_sync_handler+0x114/0x1d0
[ 6.573908] el0_sync+0x140/0x180
[ 6.577232] Code: d503201f b00119a1 91022263 b27b7be4 (f9004663)
[ 6.583349] ---[ end trace d196b9b14cdae2da ]---
[ 6.587977] Kernel panic - not syncing: Fatal exception
[ 6.593216] SMP: stopping secondary CPUs
[ 6.597151] Kernel Offset: 0x4ada54400000 from 0xffff800010000000
[ 6.603261] PHYS_OFFSET: 0xffffd0a7c0000000
[ 6.607454] CPU features: 0x10002,21806008
[ 6.611558] Memory Limit: noneAnd now that ocelot->ptp_clock is checked at exit, prevent a potential
error where ptp_clock_register returned a pointer-encoded error, which
we are keeping in the ocelot private data structure. So now,
ocelot->ptp_clock is now either NULL or a valid pointer.Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Cc: Antoine Tenart
Reviewed-by: Florian Fainelli
Signed-off-by: Vladimir Oltean
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit ffac2027e18f006f42630f2e01a8a9bd8dc664b5 ]
If the user has specified their own RSS hash key, don't
lose it across queue resets such as DOWN/UP, MTU change,
and number of channels change. This is fixed by moving
the key initialization to a little earlier in the lif
creation.Also, let's clean up the RSS config a little better on
the way down by setting it all to 0.Fixes: aa3198819bea ("ionic: Add RSS support")
Signed-off-by: Shannon Nelson
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 86c76c09898332143be365c702cf8d586ed4ed21 ]
A lockdep splat was observed when trying to remove an xdp memory
model from the table since the mutex was obtained when trying to
remove the entry, but not before the table walk started:Fix the splat by obtaining the lock before starting the table walk.
Fixes: c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.")
Reported-by: Grygorii Strashko
Signed-off-by: Jonathan Lemon
Tested-by: Grygorii Strashko
Acked-by: Jesper Dangaard Brouer
Acked-by: Ilias Apalodimas
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit c3f812cea0d7006469d1cf33a4a9f0a12bb4b3a3 ]
The page pool keeps track of the number of pages in flight, and
it isn't safe to remove the pool until all pages are returned.Disallow removing the pool until all pages are back, so the pool
is always available for page producers.Make the page pool responsible for its own delayed destruction
instead of relying on XDP, so the page pool can be used without
the xdp memory model.When all pages are returned, free the pool and notify xdp if the
pool is registered with the xdp memory system. Have the callback
perform a table walk since some drivers (cpsw) may share the pool
among multiple xdp_rxq_info.Note that the increment of pages_state_release_cnt may result in
inflight == 0, resulting in the pool being released.Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning")
Signed-off-by: Jonathan Lemon
Acked-by: Jesper Dangaard Brouer
Acked-by: Ilias Apalodimas
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 3d7cadae51f1b7f28358e36d0a1ce3f0ae2eee60 ]
When setting speed to 100G via ethtool (AN is set to off), only 25G*4 is
configured while the user, who has an advanced HW which supports
extended PTYS, expects also 50G*2 to be configured.
With this patch, when extended PTYS mode is available, configure
PTYS via extended fields.Fixes: 4b95840a6ced ("net/mlx5e: Fix matching of speed to PRM link modes")
Signed-off-by: Aya Levin
Reviewed-by: Eran Ben Elisha
Signed-off-by: Saeed Mahameed
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 6d485e5e555436d2c13accdb10807328c4158a17 ]
Add a missing value in translation of PTYS ext_eth_proto_oper to its
corresponding speed. When ext_eth_proto_oper bit 10 is set, ethtool
shows unknown speed. With this fix, ethtool shows speed is 100G as
expected.Fixes: a08b4ed1373d ("net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register")
Signed-off-by: Aya Levin
Reviewed-by: Eran Ben Elisha
Signed-off-by: Saeed Mahameed
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit a23dae79fb6555c808528707c6389345d0b0c189 ]
Flows are allocated with kzalloc() so free with kfree().
Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Signed-off-by: Roi Dayan
Reviewed-by: Eli Britstein
Signed-off-by: Saeed Mahameed
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit c431f8597863a91eea6024926e0c1b179cfa4852 ]
SFF 8472 eeprom length is 512 bytes. Fix module info return value to
support 512 bytes read.Fixes: ace329f4ab3b ("net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query")
Signed-off-by: Eran Ben Elisha
Reviewed-by: Aya Levin
Signed-off-by: Saeed Mahameed
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 95219afbb980f10934de9f23a3e199be69c5ed09 ]
The act_ct TC module shares a common conntrack and NAT infrastructure
exposed via netfilter. It's possible that a packet needs both SNAT and
DNAT manipulation, due to e.g. tuple collision. Netfilter can support
this because it runs through the NAT table twice - once on ingress and
again after egress. The act_ct action doesn't have such capability.Like netfilter hook infrastructure, we should run through NAT twice to
keep the symmetry.Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: Aaron Conole
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit c55d8b108caa2ec1ae8dddd02cb9d3a740f7c838 ]
Cited patch changed (channel index, tc) => (TXQ index) mapping to be a
static one, in order to keep indices consistent when changing number of
channels or TCs.For 32 channels (OOB) and 8 TCs, real num of TXQs is 256.
When reducing the amount of channels to 8, the real num of TXQs will be
changed to 64.
This indices method is buggy:
- Channel #0, TC 3, the TXQ index is 96.
- Index 8 is not valid, as there is no such TXQ from driver perspective
(As it represents channel #8, TC 0, which is not valid with the above
configuration).As part of driver's select queue, it calls netdev_pick_tx which returns an
index in the range of real number of TXQs. Depends on the return value,
with the examples above, driver could have returned index larger than the
real number of tx queues, or crash the kernel as it tries to read invalid
address of SQ which was not allocated.Fix that by allocating sequential TXQ indices, and hold a new mapping
between (channel index, tc) => (real TXQ index). This mapping will be
updated as part of priv channels activation, and is used in
mlx5e_select_queue to find the selected queue index.The existing indices mapping (channel_tc2txq) is no longer needed, as it
is used only for statistics structures and can be calculated on run time.
Delete its definintion and updates.Fixes: 8bfaf07f7806 ("net/mlx5e: Present SW stats when state is not opened")
Signed-off-by: Eran Ben Elisha
Signed-off-by: Saeed Mahameed
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit d04ac224b1688f005a84f764cfe29844f8e9da08 ]
The skb_mpls_push was not updating ethertype of an ethernet packet if
the packet was originally received from a non ARPHRD_ETHER device.In the below OVS data path flow, since the device corresponding to
port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does
not update the ethertype of the packet even though the previous
push_eth action had added an ethernet header to the packet.recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4Fixes: 8822e270d697 ("net: core: move push MPLS functionality from OvS to core helper")
Signed-off-by: Martin Varghese
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit df95467b6d2bfce49667ee4b71c67249b01957f7 ]
hsr_dev_xmit() calls hsr_port_get_hsr() to find master node and that would
return NULL if master node is not existing in the list.
But hsr_dev_xmit() doesn't check return pointer so a NULL dereference
could occur.Test commands:
ip netns add nst
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
ip link set veth1 netns nst
ip link set veth3 netns nst
ip link set veth0 up
ip link set veth2 up
ip link add hsr0 type hsr slave1 veth0 slave2 veth2
ip a a 192.168.100.1/24 dev hsr0
ip link set hsr0 up
ip netns exec nst ip link set veth1 up
ip netns exec nst ip link set veth3 up
ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
ip netns exec nst ip link set hsr1 up
hping3 192.168.100.2 -2 --flood &
modprobe -rv hsrSplat looks like:
[ 217.351122][ T1635] kasan: CONFIG_KASAN_INLINE enabled
[ 217.352969][ T1635] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 217.354297][ T1635] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 217.355507][ T1635] CPU: 1 PID: 1635 Comm: hping3 Not tainted 5.4.0+ #192
[ 217.356472][ T1635] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 217.357804][ T1635] RIP: 0010:hsr_dev_xmit+0x34/0x90 [hsr]
[ 217.373010][ T1635] Code: 48 8d be 00 0c 00 00 be 04 00 00 00 48 83 ec 08 e8 21 be ff ff 48 8d 78 10 48 ba 00 b
[ 217.376919][ T1635] RSP: 0018:ffff8880cd8af058 EFLAGS: 00010202
[ 217.377571][ T1635] RAX: 0000000000000000 RBX: ffff8880acde6840 RCX: 0000000000000002
[ 217.379465][ T1635] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 0000000000000010
[ 217.380274][ T1635] RBP: ffff8880acde6840 R08: ffffed101b440d5d R09: 0000000000000001
[ 217.381078][ T1635] R10: 0000000000000001 R11: ffffed101b440d5c R12: ffff8880bffcc000
[ 217.382023][ T1635] R13: ffff8880bffcc088 R14: 0000000000000000 R15: ffff8880ca675c00
[ 217.383094][ T1635] FS: 00007f060d9d1740(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
[ 217.384289][ T1635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 217.385009][ T1635] CR2: 00007faf15381dd0 CR3: 00000000d523c001 CR4: 00000000000606e0
[ 217.385940][ T1635] Call Trace:
[ 217.386544][ T1635] dev_hard_start_xmit+0x160/0x740
[ 217.387114][ T1635] __dev_queue_xmit+0x1961/0x2e10
[ 217.388118][ T1635] ? check_object+0xaf/0x260
[ 217.391466][ T1635] ? __alloc_skb+0xb9/0x500
[ 217.392017][ T1635] ? init_object+0x6b/0x80
[ 217.392629][ T1635] ? netdev_core_pick_tx+0x2e0/0x2e0
[ 217.393175][ T1635] ? __alloc_skb+0xb9/0x500
[ 217.393727][ T1635] ? rcu_read_lock_sched_held+0x90/0xc0
[ 217.394331][ T1635] ? rcu_read_lock_bh_held+0xa0/0xa0
[ 217.395013][ T1635] ? kasan_unpoison_shadow+0x30/0x40
[ 217.395668][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0
[ 217.396280][ T1635] ? __kmalloc_node_track_caller+0x3a8/0x3f0
[ 217.399007][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0
[ 217.400093][ T1635] ? __kmalloc_reserve.isra.46+0x2e/0xb0
[ 217.401118][ T1635] ? memset+0x1f/0x40
[ 217.402529][ T1635] ? __alloc_skb+0x317/0x500
[ 217.404915][ T1635] ? arp_xmit+0xca/0x2c0
[ ... ]Fixes: 311633b60406 ("hsr: switch ->dellink() to ->ndo_uninit()")
Acked-by: Cong Wang
Signed-off-by: Taehee Yoo
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 040b5cfbcefa263ccf2c118c4938308606bb7ed8 ]
The skb_mpls_pop was not updating ethertype of an ethernet packet if the
packet was originally received from a non ARPHRD_ETHER device.In the below OVS data path flow, since the device corresponding to port 7
is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update
the ethertype of the packet even though the previous push_eth action had
added an ethernet header to the packet.recirc_id(0),in_port(7),eth_type(0x8847),
mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
pop_mpls(eth_type=0x800),4Fixes: ed246cee09b9 ("net: core: move pop MPLS functionality from OvS to core helper")
Signed-off-by: Martin Varghese
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 0e4940928c26527ce8f97237fef4c8a91cd34207 ]
After pskb_may_pull() we should always refetch the header
pointers from the skb->data in case it got reallocated.In gre_parse_header(), the erspan header is still fetched
from the 'options' pointer which is fetched before
pskb_may_pull().Found this during code review of a KMSAN bug report.
Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup")
Cc: Lorenzo Bianconi
Signed-off-by: Cong Wang
Acked-by: Lorenzo Bianconi
Acked-by: William Tu
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 8ffb055beae58574d3e77b4bf9d4d15eace1ca27 ]
The recent commit 5c72299fba9d ("net: sched: cls_flower: Classify
packets using port ranges") had added filtering based on port ranges
to tc flower. However the commit missed necessary changes in hw-offload
code, so the feature gave rise to generating incorrect offloaded flow
keys in NIC.One more detailed example is below:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
dst_port 100-200 action dropWith the setup above, an exact match filter with dst_port == 0 will be
installed in NIC by hw-offload. IOW, the NIC will have a rule which is
equivalent to the following one.$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
dst_port 0 action dropThe behavior was caused by the flow dissector which extracts packet
data into the flow key in the tc flower. More specifically, regardless
of exact match or specified port ranges, fl_init_dissector() set the
FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract port
numbers from skb in skb_flow_dissect() called by fl_classify(). Note
that device drivers received the same struct flow_dissector object as
used in skb_flow_dissect(). Thus, offloaded drivers could not identify
which of these is used because the FLOW_DISSECTOR_KEY_PORTS flag was
set to struct flow_dissector in either case.This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new
tp_range field in struct fl_flow_key to recognize which filters are applied
to offloaded drivers. At this point, when filters based on port ranges
passed to drivers, drivers return the EOPNOTSUPP error because they do
not support the feature (the newly created FLOW_DISSECTOR_KEY_PORTS_RANGE
flag).Fixes: 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
Signed-off-by: Yoshiki Komachi
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 25a443f74bcff2c4d506a39eae62fc15ad7c618a ]
When a device is bound to a clsact qdisc, bind events are triggered to
registered drivers for both ingress and egress. However, if a driver
registers to such a device using the indirect block routines then it is
assumed that it is only interested in ingress offload and so only replays
ingress bind/unbind messages.The NFP driver supports the offload of some egress filters when
registering to a block with qdisc of type clsact. However, on unregister,
if the block is still active, it will not receive an unbind egress
notification which can prevent proper cleanup of other registered
callbacks.Modify the indirect block callback command in TC to send messages of
ingress and/or egress bind depending on the qdisc in use. NFP currently
supports egress offload for TC flower offload so the changes are only
added to TC.Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports")
Signed-off-by: John Hurley
Acked-by: Jakub Kicinski
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit dbad3408896c3c5722ec9cda065468b3df16c5bf ]
With indirect blocks, a driver can register for callbacks from a device
that is does not 'own', for example, a tunnel device. When registering to
or unregistering from a new device, a callback is triggered to generate
a bind/unbind event. This, in turn, allows the driver to receive any
existing rules or to properly clean up installed rules.When first added, it was assumed that all indirect block registrations
would be for ingress offloads. However, the NFP driver can, in some
instances, support clsact qdisc binds for egress offload.Change the name of the indirect block callback command in flow_offload to
remove the 'ingress' identifier from it. While this does not change
functionality, a follow up patch will implement a more more generic
callback than just those currently just supporting ingress offload.Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports")
Signed-off-by: John Hurley
Acked-by: Jakub Kicinski
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 721c8dafad26ccfa90ff659ee19755e3377b829d ]
Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
timestamp of the last synflood. Protect them with READ_ONCE() and
WRITE_ONCE() since reads and writes aren't serialised.Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
introduced by a0f82f64e269 ("syncookies: remove last_synq_overflow from
struct tcp_sock"). But unprotected accesses were already there when
timestamp was stored in .last_synq_overflow.Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Guillaume Nault
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit cb44a08f8647fd2e8db5cc9ac27cd8355fa392d8 ]
When no synflood occurs, the synflood timestamp isn't updated.
Therefore it can be so old that time_after32() can consider it to be
in the future.That's a problem for tcp_synq_no_recent_overflow() as it may report
that a recent overflow occurred while, in fact, it's just that jiffies
has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.Spurious detection of recent overflows lead to extra syncookie
verification in cookie_v[46]_check(). At that point, the verification
should fail and the packet dropped. But we should have dropped the
packet earlier as we didn't even send a syncookie.Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
only if jiffies is within the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
way, no spurious recent overflow is reported when jiffies wraps and
'last_overflow' becomes in the future from the point of view of
time_after32().However, if jiffies wraps and enters the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
'last_overflow' being a stale synflood timestamp), then
tcp_synq_no_recent_overflow() still erroneously reports an
overflow. In such cases, we have to rely on syncookie verification
to drop the packet. We unfortunately have no way to differentiate
between a fresh and a stale syncookie timestamp.In practice, using last_overflow as lower bound is problematic.
If the synflood timestamp is concurrently updated between the time
we read jiffies and the moment we store the timestamp in
'last_overflow', then 'now' becomes smaller than 'last_overflow' and
tcp_synq_no_recent_overflow() returns true, potentially dropping a
valid syncookie.Reading jiffies after loading the timestamp could fix the problem,
but that'd require a memory barrier. Let's just accommodate for
potential timestamp growth instead and extend the interval using
'last_overflow - HZ' as lower bound.Signed-off-by: Guillaume Nault
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 04d26e7b159a396372646a480f4caa166d1b6720 ]
If no synflood happens for a long enough period of time, then the
synflood timestamp isn't refreshed and jiffies can advance so much
that time_after32() can't accurately compare them any more.Therefore, we can end up in a situation where time_after32(now,
last_overflow + HZ) returns false, just because these two values are
too far apart. In that case, the synflood timestamp isn't updated as
it should be, which can trick tcp_synq_no_recent_overflow() into
rejecting valid syncookies.For example, let's consider the following scenario on a system
with HZ=1000:* The synflood timestamp is 0, either because that's the timestamp
of the last synflood or, more commonly, because we're working with
a freshly created socket.* We receive a new SYN, which triggers synflood protection. Let's say
that this happens when jiffies == 2147484649 (that is,
'synflood timestamp' + HZ + 2^31 + 1).* Then tcp_synq_overflow() doesn't update the synflood timestamp,
because time_after32(2147484649, 1000) returns false.
With:
- 2147484649: the value of jiffies, aka. 'now'.
- 1000: the value of 'last_overflow' + HZ.* A bit later, we receive the ACK completing the 3WHS. But
cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
says that we're not under synflood. That's because
time_after32(2147484649, 120000) returns false.
With:
- 2147484649: the value of jiffies, aka. 'now'.
- 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.Of course, in reality jiffies would have increased a bit, but this
condition will last for the next 119 seconds, which is far enough
to accommodate for jiffie's growth.Fix this by updating the overflow timestamp whenever jiffies isn't
within the [last_overflow, last_overflow + HZ] range. That shouldn't
have any performance impact since the update still happens at most once
per second.Now we're guaranteed to have fresh timestamps while under synflood, so
tcp_synq_no_recent_overflow() can safely use it with time_after32() in
such situations.Stale timestamps can still make tcp_synq_no_recent_overflow() return
the wrong verdict when not under synflood. This will be handled in the
next patch.For 64 bits architectures, the problem was introduced with the
conversion of ->tw_ts_recent_stamp to 32 bits integer by commit
cca9bab1b72c ("tcp: use monotonic timestamps for PAWS").
The problem has always been there on 32 bits architectures.Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Guillaume Nault
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 ]
ipv6_stub uses the ip6_dst_lookup function to allow other modules to
perform IPv6 lookups. However, this function skips the XFRM layer
entirely.All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
which calls xfrm_lookup_route(). This patch fixes this inconsistent
behavior by switching the stub to ip6_dst_lookup_flow, which also calls
xfrm_lookup_route().This requires some changes in all the callers, as these two functions
take different arguments and have different return types.Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
Reported-by: Xiumei Mu
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman