25 Jan, 2021
1 commit
-
Change-Id: I173bc0325c93daaf82bb894f77ba35fa827864de
06 Jan, 2021
1 commit
-
commit dc2da7b45ffe954a0090f5d0310ed7b0b37d2bd2 upstream.
VMware observed a performance regression during memmap init on their
platform, and bisected to commit 73a6e474cb376 ("mm: memmap_init:
iterate over memblock regions rather that check each PFN") causing it.Before the commit:
[0.033176] Normal zone: 1445888 pages used for memmap
[0.033176] Normal zone: 89391104 pages, LIFO batch:63
[0.035851] ACPI: PM-Timer IO Port: 0x448With commit
[0.026874] Normal zone: 1445888 pages used for memmap
[0.026875] Normal zone: 89391104 pages, LIFO batch:63
[2.028450] ACPI: PM-Timer IO Port: 0x448The root cause is the current memmap defer init doesn't work as expected.
Before, memmap_init_zone() was used to do memmap init of one whole zone,
to initialize all low zones of one numa node, but defer memmap init of
the last zone in that numa node. However, since commit 73a6e474cb376,
function memmap_init() is adapted to iterater over memblock regions
inside one zone, then call memmap_init_zone() to do memmap init for each
region.E.g, on VMware's system, the memory layout is as below, there are two
memory regions in node 2. The current code will mistakenly initialize the
whole 1st region [mem 0xab00000000-0xfcffffffff], then do memmap defer to
iniatialize only one memmory section on the 2nd region [mem
0x10000000000-0x1033fffffff]. In fact, we only expect to see that there's
only one memory section's memmap initialized. That's why more time is
costed at the time.[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[ 0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[ 0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[ 0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[ 0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to judge
whether defer need be taken in zone wide.Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
Fixes: commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Baoquan He
Reported-by: Rahul Gopakumar
Reviewed-by: Mike Rapoport
Cc: David Hildenbrand
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
30 Dec, 2020
2 commits
-
Changes in 5.10.4
hwmon: (k10temp) Remove support for displaying voltage and current on Zen CPUs
drm/gma500: fix double free of gma_connector
iio: adc: at91_adc: add Kconfig dep on the OF symbol and remove of_match_ptr()
drm/aspeed: Fix Kconfig warning & subsequent build errors
drm/mcde: Fix handling of platform_get_irq() error
drm/tve200: Fix handling of platform_get_irq() error
arm64: dts: renesas: hihope-rzg2-ex: Drop rxc-skew-ps from ethernet-phy node
arm64: dts: renesas: cat875: Remove rxc-skew-ps from ethernet-phy node
soc: renesas: rmobile-sysc: Fix some leaks in rmobile_init_pm_domains()
soc: mediatek: Check if power domains can be powered on at boot time
arm64: dts: mediatek: mt8183: fix gce incorrect mbox-cells value
arm64: dts: ipq6018: update the reserved-memory node
arm64: dts: qcom: sc7180: Fix one forgotten interconnect reference
soc: qcom: geni: More properly switch to DMA mode
Revert "i2c: i2c-qcom-geni: Fix DMA transfer race"
RDMA/bnxt_re: Set queue pair state when being queried
rtc: pcf2127: fix pcf2127_nvmem_read/write() returns
RDMA/bnxt_re: Fix entry size during SRQ create
selinux: fix error initialization in inode_doinit_with_dentry()
ARM: dts: aspeed-g6: Fix the GPIO memory size
ARM: dts: aspeed: s2600wf: Fix VGA memory region location
RDMA/core: Fix error return in _ib_modify_qp()
RDMA/rxe: Compute PSN windows correctly
x86/mm/ident_map: Check for errors from ident_pud_init()
ARM: p2v: fix handling of LPAE translation in BE mode
RDMA/rtrs-clt: Remove destroy_con_cq_qp in case route resolving failed
RDMA/rtrs-clt: Missing error from rtrs_rdma_conn_established
RDMA/rtrs-srv: Don't guard the whole __alloc_srv with srv_mutex
x86/apic: Fix x2apic enablement without interrupt remapping
ASoC: qcom: fix unsigned int bitwidth compared to less than zero
sched/deadline: Fix sched_dl_global_validate()
sched: Reenable interrupts in do_sched_yield()
drm/amdgpu: fix incorrect enum type
crypto: talitos - Endianess in current_desc_hdr()
crypto: talitos - Fix return type of current_desc_hdr()
crypto: inside-secure - Fix sizeof() mismatch
ASoC: sun4i-i2s: Fix lrck_period computation for I2S justified mode
drm/msm: Add missing stub definition
ARM: dts: aspeed: tiogapass: Remove vuart
drm/amdgpu: fix build_coefficients() argument
powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
spi: img-spfi: fix reference leak in img_spfi_resume
f2fs: call f2fs_get_meta_page_retry for nat page
RDMA/mlx5: Fix corruption of reg_pages in mlx5_ib_rereg_user_mr()
perf test: Use generic event for expand_libpfm_events()
drm/msm/dp: DisplayPort PHY compliance tests fixup
drm/msm/dsi_pll_7nm: restore VCO rate during restore_state
drm/msm/dsi_pll_10nm: restore VCO rate during restore_state
drm/msm/dpu: fix clock scaling on non-sc7180 board
spi: spi-mem: fix reference leak in spi_mem_access_start
scsi: aacraid: Improve compat_ioctl handlers
pinctrl: core: Add missing #ifdef CONFIG_GPIOLIB
ASoC: pcm: DRAIN support reactivation
drm/bridge: tpd12s015: Fix irq registering in tpd12s015_probe
crypto: arm64/poly1305-neon - reorder PAC authentication with SP update
crypto: arm/aes-neonbs - fix usage of cbc(aes) fallback
crypto: caam - fix printing on xts fallback allocation error path
selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
nl80211/cfg80211: fix potential infinite loop
spi: stm32: fix reference leak in stm32_spi_resume
bpf: Fix tests for local_storage
x86/mce: Correct the detection of invalid notifier priorities
drm/edid: Fix uninitialized variable in drm_cvt_modes()
ath11k: Initialize complete alpha2 for regulatory change
ath11k: Fix number of rules in filtered ETSI regdomain
ath11k: fix wmi init configuration
brcmfmac: Fix memory leak for unpaired brcmf_{alloc/free}
arm64: dts: exynos: Include common syscon restart/poweroff for Exynos7
arm64: dts: exynos: Correct psci compatible used on Exynos7
drm/panel: simple: Add flags to boe_nv133fhm_n61
Bluetooth: Fix null pointer dereference in hci_event_packet()
Bluetooth: Fix: LL PRivacy BLE device fails to connect
Bluetooth: hci_h5: fix memory leak in h5_close
spi: stm32-qspi: fix reference leak in stm32 qspi operations
spi: spi-ti-qspi: fix reference leak in ti_qspi_setup
spi: mt7621: fix missing clk_disable_unprepare() on error in mt7621_spi_probe
spi: tegra20-slink: fix reference leak in slink ops of tegra20
spi: tegra20-sflash: fix reference leak in tegra_sflash_resume
spi: tegra114: fix reference leak in tegra spi ops
spi: bcm63xx-hsspi: fix missing clk_disable_unprepare() on error in bcm63xx_hsspi_resume
spi: imx: fix reference leak in two imx operations
ASoC: qcom: common: Fix refcounting in qcom_snd_parse_of()
ath11k: Handle errors if peer creation fails
mwifiex: fix mwifiex_shutdown_sw() causing sw reset failure
drm/msm/a6xx: Clear shadow on suspend
drm/msm/a5xx: Clear shadow on suspend
firmware: tegra: fix strncpy()/strncat() confusion
drm/msm/dp: return correct connection status after suspend
drm/msm/dp: skip checking LINK_STATUS_UPDATED bit
drm/msm/dp: do not notify audio subsystem if sink doesn't support audio
selftests/run_kselftest.sh: fix dry-run typo
selftest/bpf: Add missed ip6ip6 test back
ASoC: wm8994: Fix PM disable depth imbalance on error
ASoC: wm8998: Fix PM disable depth imbalance on error
spi: sprd: fix reference leak in sprd_spi_remove
virtiofs fix leak in setup
ASoC: arizona: Fix a wrong free in wm8997_probe
RDMa/mthca: Work around -Wenum-conversion warning
ASoC: SOF: Intel: fix Kconfig dependency for SND_INTEL_DSP_CONFIG
arm64: dts: ti: k3-am65*/j721e*: Fix unit address format error for dss node
MIPS: BCM47XX: fix kconfig dependency bug for BCM47XX_BCMA
drm/amdgpu: fix compute queue priority if num_kcq is less than 4
soc: ti: omap-prm: Do not check rstst bit on deassert if already deasserted
crypto: Kconfig - CRYPTO_MANAGER_EXTRA_TESTS requires the manager
crypto: qat - fix status check in qat_hal_put_rel_rd_xfer()
firmware: arm_scmi: Fix missing destroy_workqueue()
drm/udl: Fix missing error code in udl_handle_damage()
staging: greybus: codecs: Fix reference counter leak in error handling
staging: gasket: interrupt: fix the missed eventfd_ctx_put() in gasket_interrupt.c
scripts: kernel-doc: Restore anonymous enum parsing
drm/amdkfd: Put ACPI table after using it
ionic: use mc sync for multicast filters
ionic: flatten calls to ionic_lif_rx_mode
ionic: change set_rx_mode from_ndo to can_sleep
media: tm6000: Fix sizeof() mismatches
media: platform: add missing put_device() call in mtk_jpeg_clk_init()
media: mtk-vcodec: add missing put_device() call in mtk_vcodec_init_dec_pm()
media: mtk-vcodec: add missing put_device() call in mtk_vcodec_release_dec_pm()
media: mtk-vcodec: add missing put_device() call in mtk_vcodec_init_enc_pm()
media: v4l2-fwnode: Return -EINVAL for invalid bus-type
media: v4l2-fwnode: v4l2_fwnode_endpoint_parse caller must init vep argument
media: ov5640: fix support of BT656 bus mode
media: staging: rkisp1: cap: fix runtime PM imbalance on error
media: cedrus: fix reference leak in cedrus_start_streaming
media: platform: add missing put_device() call in mtk_jpeg_probe() and mtk_jpeg_remove()
media: venus: core: change clk enable and disable order in resume and suspend
media: venus: core: vote for video-mem path
media: venus: core: vote with average bandwidth and peak bandwidth as zero
RDMA/cma: Add missing error handling of listen_id
ASoC: meson: fix COMPILE_TEST error
spi: dw: fix build error by selecting MULTIPLEXER
scsi: core: Fix VPD LUN ID designator priorities
media: venus: put dummy vote on video-mem path after last session release
media: solo6x10: fix missing snd_card_free in error handling case
video: fbdev: atmel_lcdfb: fix return error code in atmel_lcdfb_of_init()
mmc: sdhci: tegra: fix wrong unit with busy_timeout
drm/omap: dmm_tiler: fix return error code in omap_dmm_probe()
drm/meson: Free RDMA resources after tearing down DRM
drm/meson: Unbind all connectors on module removal
drm/meson: dw-hdmi: Register a callback to disable the regulator
drm/meson: dw-hdmi: Ensure that clocks are enabled before touching the TOP registers
ASoC: intel: SND_SOC_INTEL_KEEMBAY should depend on ARCH_KEEMBAY
iommu/vt-d: include conditionally on CONFIG_INTEL_IOMMU_SVM
Input: ads7846 - fix race that causes missing releases
Input: ads7846 - fix integer overflow on Rt calculation
Input: ads7846 - fix unaligned access on 7845
bus: mhi: core: Remove double locking from mhi_driver_remove()
bus: mhi: core: Fix null pointer access when parsing MHI configuration
usb/max3421: fix return error code in max3421_probe()
spi: mxs: fix reference leak in mxs_spi_probe
selftests/bpf: Fix broken riscv build
powerpc: Avoid broken GCC __attribute__((optimize))
powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32
ARM: dts: tacoma: Fix node vs reg mismatch for flash memory
Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"
powerpc/powernv/sriov: fix unsigned int win compared to less than zero
mfd: htc-i2cpld: Add the missed i2c_put_adapter() in htcpld_register_chip_i2c()
mfd: MFD_SL28CPLD should depend on ARCH_LAYERSCAPE
mfd: stmfx: Fix dev_err_probe() call in stmfx_chip_init()
mfd: cpcap: Fix interrupt regression with regmap clear_ack
EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId
scsi: ufs: Avoid to call REQ_CLKS_OFF to CLKS_OFF
scsi: ufs: Fix clkgating on/off
rcu: Allow rcu_irq_enter_check_tick() from NMI
rcu,ftrace: Fix ftrace recursion
rcu/tree: Defer kvfree_rcu() allocation to a clean context
crypto: crypto4xx - Replace bitwise OR with logical OR in crypto4xx_build_pd
crypto: omap-aes - Fix PM disable depth imbalance in omap_aes_probe
crypto: sun8i-ce - fix two error path's memory leak
spi: fix resource leak for drivers without .remove callback
drm/meson: dw-hdmi: Disable clocks on driver teardown
drm/meson: dw-hdmi: Enable the iahb clock early enough
PCI: Disable MSI for Pericom PCIe-USB adapter
PCI: brcmstb: Initialize "tmp" before use
soc: ti: knav_qmss: fix reference leak in knav_queue_probe
soc: ti: Fix reference imbalance in knav_dma_probe
drivers: soc: ti: knav_qmss_queue: Fix error return code in knav_queue_probe
soc: qcom: initialize local variable
arm64: dts: qcom: sm8250: correct compatible for sm8250-mtp
arm64: dts: qcom: msm8916-samsung-a2015: Disable muic i2c pin bias
Input: omap4-keypad - fix runtime PM error handling
clk: meson: Kconfig: fix dependency for G12A
staging: mfd: hi6421-spmi-pmic: fix error return code in hi6421_spmi_pmic_probe()
ath11k: Fix the rx_filter flag setting for peer rssi stats
RDMA/cxgb4: Validate the number of CQEs
soundwire: Fix DEBUG_LOCKS_WARN_ON for uninitialized attribute
pinctrl: sunxi: fix irq bank map for the Allwinner A100 pin controller
memstick: fix a double-free bug in memstick_check
ARM: dts: at91: sam9x60: add pincontrol for USB Host
ARM: dts: at91: sama5d4_xplained: add pincontrol for USB Host
ARM: dts: at91: sama5d3_xplained: add pincontrol for USB Host
mmc: pxamci: Fix error return code in pxamci_probe
brcmfmac: fix error return code in brcmf_cfg80211_connect()
orinoco: Move context allocation after processing the skb
qtnfmac: fix error return code in qtnf_pcie_probe()
rsi: fix error return code in rsi_reset_card()
cw1200: fix missing destroy_workqueue() on error in cw1200_init_common
dmaengine: mv_xor_v2: Fix error return code in mv_xor_v2_probe()
arm64: dts: qcom: sdm845: Limit ipa iommu streams
leds: netxbig: add missing put_device() call in netxbig_leds_get_of_pdata()
leds: lp50xx: Fix an error handling path in 'lp50xx_probe_dt()'
leds: turris-omnia: check for LED_COLOR_ID_RGB instead LED_COLOR_ID_MULTI
arm64: tegra: Fix DT binding for IO High Voltage entry
RDMA/cma: Fix deadlock on &lock in rdma_cma_listen_on_all() error unwind
soundwire: qcom: Fix build failure when slimbus is module
drm/imx/dcss: fix rotations for Vivante tiled formats
media: siano: fix memory leak of debugfs members in smsdvb_hotplug
platform/x86: mlx-platform: Remove PSU EEPROM from default platform configuration
platform/x86: mlx-platform: Remove PSU EEPROM from MSN274x platform configuration
arm64: dts: qcom: sc7180: limit IPA iommu streams
RDMA/hns: Only record vlan info for HIP08
RDMA/hns: Fix missing fields in address vector
RDMA/hns: Avoid setting loopback indicator when smac is same as dmac
serial: 8250-mtk: Fix reference leak in mtk8250_probe
samples: bpf: Fix lwt_len_hist reusing previous BPF map
media: imx214: Fix stop streaming
mips: cdmm: fix use-after-free in mips_cdmm_bus_discover
media: max2175: fix max2175_set_csm_mode() error code
slimbus: qcom-ngd-ctrl: Avoid sending power requests without QMI
RDMA/core: Track device memory MRs
drm/mediatek: Use correct aliases name for ovl
HSI: omap_ssi: Don't jump to free ID in ssi_add_controller()
ARM: dts: Remove non-existent i2c1 from 98dx3236
arm64: dts: armada-3720-turris-mox: update ethernet-phy handle name
power: supply: bq25890: Use the correct range for IILIM register
arm64: dts: rockchip: Set dr_mode to "host" for OTG on rk3328-roc-cc
power: supply: max17042_battery: Fix current_{avg,now} hiding with no current sense
power: supply: axp288_charger: Fix HP Pavilion x2 10 DMI matching
power: supply: bq24190_charger: fix reference leak
genirq/irqdomain: Don't try to free an interrupt that has no mapping
arm64: dts: ls1028a: fix ENETC PTP clock input
arm64: dts: ls1028a: fix FlexSPI clock input
arm64: dts: freescale: sl28: combine SPI MTD partitions
phy: tegra: xusb: Fix usb_phy device driver field
arm64: dts: qcom: c630: Polish i2c-hid devices
arm64: dts: qcom: c630: Fix pinctrl pins properties
PCI: Bounds-check command-line resource alignment requests
PCI: Fix overflow in command-line resource alignment requests
PCI: iproc: Fix out-of-bound array accesses
PCI: iproc: Invalidate correct PAXB inbound windows
arm64: dts: meson: fix spi-max-frequency on Khadas VIM2
arm64: dts: meson-sm1: fix typo in opp table
soc: amlogic: canvas: add missing put_device() call in meson_canvas_get()
scsi: hisi_sas: Fix up probe error handling for v3 hw
scsi: pm80xx: Do not sleep in atomic context
spi: spi-fsl-dspi: Use max_native_cs instead of num_chipselect to set SPI_MCR
ARM: dts: at91: at91sam9rl: fix ADC triggers
RDMA/hns: Fix 0-length sge calculation error
RDMA/hns: Bugfix for calculation of extended sge
mailbox: arm_mhu_db: Fix mhu_db_shutdown by replacing kfree with devm_kfree
soundwire: master: use pm_runtime_set_active() on add
platform/x86: dell-smbios-base: Fix error return code in dell_smbios_init
ASoC: Intel: Boards: tgl_max98373: update TDM slot_width
media: max9271: Fix GPIO enable/disable
media: rdacm20: Enable GPIO1 explicitly
media: i2c: imx219: Selection compliance fixes
ath11k: Don't cast ath11k_skb_cb to ieee80211_tx_info.control
ath11k: Reset ath11k_skb_cb before setting new flags
ath11k: Fix an error handling path
ath10k: Fix the parsing error in service available event
ath10k: Fix an error handling path
ath10k: Release some resources in an error handling path
SUNRPC: rpc_wake_up() should wake up tasks in the correct order
NFSv4.2: condition READDIR's mask for security label based on LSM state
SUNRPC: xprt_load_transport() needs to support the netid "rdma6"
NFSv4: Fix the alignment of page data in the getdeviceinfo reply
net: sunrpc: Fix 'snprintf' return value check in 'do_xprt_debugfs'
lockd: don't use interval-based rebinding over TCP
NFS: switch nfsiod to be an UNBOUND workqueue.
selftests/seccomp: Update kernel config
vfio-pci: Use io_remap_pfn_range() for PCI IO memory
hwmon: (ina3221) Fix PM usage counter unbalance in ina3221_write_enable
f2fs: fix double free of unicode map
media: tvp5150: Fix wrong return value of tvp5150_parse_dt()
media: saa7146: fix array overflow in vidioc_s_audio()
powerpc/perf: Fix crash with is_sier_available when pmu is not set
powerpc/64: Fix an EMIT_BUG_ENTRY in head_64.S
powerpc/xmon: Fix build failure for 8xx
powerpc/perf: Fix to update radix_scope_qual in power10
powerpc/perf: Update the PMU group constraints for l2l3 events in power10
powerpc/perf: Fix the PMU group constraints for threshold events in power10
clocksource/drivers/orion: Add missing clk_disable_unprepare() on error path
clocksource/drivers/cadence_ttc: Fix memory leak in ttc_setup_clockevent()
clocksource/drivers/ingenic: Fix section mismatch
clocksource/drivers/riscv: Make RISCV_TIMER depends on RISCV_SBI
arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE
iio: hrtimer-trigger: Mark hrtimer to expire in hard interrupt context
libbpf: Sanitise map names before pinning
ARM: dts: at91: sam9x60ek: remove bypass property
ARM: dts: at91: sama5d2: map securam as device
scripts: kernel-doc: fix parsing function-like typedefs
bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()
selftests/bpf: Fix invalid use of strncat in test_sockmap
pinctrl: falcon: add missing put_device() call in pinctrl_falcon_probe()
soc: rockchip: io-domain: Fix error return code in rockchip_iodomain_probe()
arm64: dts: rockchip: Fix UART pull-ups on rk3328
memstick: r592: Fix error return in r592_probe()
MIPS: Don't round up kernel sections size for memblock_add()
mt76: mt7663s: fix a possible ple quota underflow
mt76: mt7915: set fops_sta_stats.owner to THIS_MODULE
mt76: set fops_tx_stats.owner to THIS_MODULE
mt76: dma: fix possible deadlock running mt76_dma_cleanup
net/mlx5: Properly convey driver version to firmware
mt76: fix memory leak if device probing fails
mt76: fix tkip configuration for mt7615/7663 devices
ASoC: jz4740-i2s: add missed checks for clk_get()
ASoC: q6afe-clocks: Add missing parent clock rate
dm ioctl: fix error return code in target_message
ASoC: cros_ec_codec: fix uninitialized memory read
ASoC: atmel: mchp-spdifrx needs COMMON_CLK
ASoC: qcom: fix QDSP6 dependencies, attempt #3
phy: mediatek: allow compile-testing the hdmi phy
phy: renesas: rcar-gen3-usb2: disable runtime pm in case of failure
memory: ti-emif-sram: only build for ARMv7
memory: jz4780_nemc: Fix potential NULL dereference in jz4780_nemc_probe()
drm/msm: a5xx: Make preemption reset case reentrant
drm/msm: add IOMMU_SUPPORT dependency
clocksource/drivers/arm_arch_timer: Use stable count reader in erratum sne
clocksource/drivers/arm_arch_timer: Correct fault programming of CNTKCTL_EL1.EVNTI
cpufreq: ap806: Add missing MODULE_DEVICE_TABLE
cpufreq: highbank: Add missing MODULE_DEVICE_TABLE
cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE
cpufreq: qcom: Add missing MODULE_DEVICE_TABLE
cpufreq: st: Add missing MODULE_DEVICE_TABLE
cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE
cpufreq: loongson1: Add missing MODULE_ALIAS
cpufreq: scpi: Add missing MODULE_ALIAS
cpufreq: vexpress-spc: Add missing MODULE_ALIAS
cpufreq: imx: fix NVMEM_IMX_OCOTP dependency
macintosh/adb-iop: Always wait for reply message from IOP
macintosh/adb-iop: Send correct poll command
staging: bcm2835: fix vchiq_mmal dependencies
staging: greybus: audio: Fix possible leak free widgets in gbaudio_dapm_free_controls
spi: dw: Fix error return code in dw_spi_bt1_probe()
Bluetooth: btusb: Add the missed release_firmware() in btusb_mtk_setup_firmware()
Bluetooth: btmtksdio: Add the missed release_firmware() in mtk_setup_firmware()
Bluetooth: sco: Fix crash when using BT_SNDMTU/BT_RCVMTU option
block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name
block/rnbd: fix a null pointer dereference on dev->blk_symlink_name
Bluetooth: btusb: Fix detection of some fake CSR controllers with a bcdDevice val of 0x0134
platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on some HP x360 models
adm8211: fix error return code in adm8211_probe()
mtd: spi-nor: sst: fix BPn bits for the SST25VF064C
mtd: spi-nor: ignore errors in spi_nor_unlock_all()
mtd: spi-nor: atmel: remove global protection flag
mtd: spi-nor: atmel: fix unlock_all() for AT25FS010/040
arm64: dts: meson: g12b: odroid-n2: fix PHY deassert timing requirements
arm64: dts: meson: fix PHY deassert timing requirements
ARM: dts: meson: fix PHY deassert timing requirements
arm64: dts: meson: g12a: x96-max: fix PHY deassert timing requirements
arm64: dts: meson: g12b: w400: fix PHY deassert timing requirements
clk: fsl-sai: fix memory leak
scsi: qedi: Fix missing destroy_workqueue() on error in __qedi_probe
scsi: pm80xx: Fix error return in pm8001_pci_probe()
scsi: iscsi: Fix inappropriate use of put_device()
seq_buf: Avoid type mismatch for seq_buf_init
scsi: fnic: Fix error return code in fnic_probe()
platform/x86: mlx-platform: Fix item counter assignment for MSN2700, MSN24xx systems
platform/x86: mlx-platform: Fix item counter assignment for MSN2700/ComEx system
ARM: 9030/1: entry: omit FP emulation for UND exceptions taken in kernel mode
powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops
powerpc/pseries/hibernation: remove redundant cacheinfo update
powerpc/powermac: Fix low_sleep_handler with CONFIG_VMAP_STACK
drm/mediatek: avoid dereferencing a null hdmi_phy on an error message
ASoC: amd: change clk_get() to devm_clk_get() and add missed checks
coresight: remove broken __exit annotations
ASoC: max98390: Fix error codes in max98390_dsm_init()
powerpc/mm: sanity_check_fault() should work for all, not only BOOK3S
usb: ehci-omap: Fix PM disable depth umbalance in ehci_hcd_omap_probe
usb: oxu210hp-hcd: Fix memory leak in oxu_create
speakup: fix uninitialized flush_lock
nfsd: Fix message level for normal termination
NFSD: Fix 5 seconds delay when doing inter server copy
nfs_common: need lock during iterate through the list
x86/kprobes: Restore BTF if the single-stepping is cancelled
scsi: qla2xxx: Fix FW initialization error on big endian machines
scsi: qla2xxx: Fix N2N and NVMe connect retry failure
platform/chrome: cros_ec_spi: Don't overwrite spi::mode
misc: pci_endpoint_test: fix return value of error branch
bus: fsl-mc: add back accidentally dropped error check
bus: fsl-mc: fix error return code in fsl_mc_object_allocate()
fsi: Aspeed: Add mutex to protect HW access
s390/cio: fix use-after-free in ccw_device_destroy_console
iwlwifi: dbg-tlv: fix old length in is_trig_data_contained()
iwlwifi: mvm: hook up missing RX handlers
erofs: avoid using generic_block_bmap
clk: renesas: r8a779a0: Fix R and OSC clocks
can: m_can: m_can_config_endisable(): remove double clearing of clock stop request bit
powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
powerpc/sstep: Cover new VSX instructions under CONFIG_VSX
slimbus: qcom: fix potential NULL dereference in qcom_slim_prg_slew()
ALSA: hda/hdmi: fix silent stream for first playback to DP
RDMA/core: Do not indicate device ready when device enablement fails
RDMA/uverbs: Fix incorrect variable type
remoteproc/mediatek: change MT8192 CFG register base
remoteproc/mtk_scp: surround DT device IDs with CONFIG_OF
remoteproc: q6v5-mss: fix error handling in q6v5_pds_enable
remoteproc: qcom: fix reference leak in adsp_start
remoteproc: qcom: pas: fix error handling in adsp_pds_enable
remoteproc: k3-dsp: Fix return value check in k3_dsp_rproc_of_get_memories()
remoteproc: qcom: Fix potential NULL dereference in adsp_init_mmio()
remoteproc/mediatek: unprepare clk if scp_before_load fails
clk: qcom: gcc-sc7180: Use floor ops for sdcc clks
clk: tegra: Fix duplicated SE clock entry
mtd: rawnand: gpmi: fix reference count leak in gpmi ops
mtd: rawnand: meson: Fix a resource leak in init
mtd: rawnand: gpmi: Fix the random DMA timeout issue
samples/bpf: Fix possible hang in xdpsock with multiple threads
fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode()
extcon: max77693: Fix modalias string
crypto: atmel-i2c - select CONFIG_BITREVERSE
mac80211: don't set set TDLS STA bandwidth wider than possible
mac80211: fix a mistake check for rx_stats update
ASoC: wm_adsp: remove "ctl" from list on error in wm_adsp_create_control()
irqchip/alpine-msi: Fix freeing of interrupts on allocation error path
irqchip/ti-sci-inta: Fix printing of inta id on probe success
irqchip/ti-sci-intr: Fix freeing of irqs
dmaengine: ti: k3-udma: Correct normal channel offset when uchan_cnt is not 0
RDMA/hns: Limit the length of data copied between kernel and userspace
RDMA/hns: Normalization the judgment of some features
RDMA/hns: Do shift on traffic class when using RoCEv2
gpiolib: irq hooks: fix recursion in gpiochip_irq_unmask
ath11k: Fix incorrect tlvs in scan start command
irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling
watchdog: armada_37xx: Add missing dependency on HAS_IOMEM
watchdog: sirfsoc: Add missing dependency on HAS_IOMEM
watchdog: sprd: remove watchdog disable from resume fail path
watchdog: sprd: check busy bit before new loading rather than after that
watchdog: Fix potential dereferencing of null pointer
ubifs: Fix error return code in ubifs_init_authentication()
um: Monitor error events in IRQ controller
um: tty: Fix handling of close in tty lines
um: chan_xterm: Fix fd leak
sunrpc: fix xs_read_xdr_buf for partial pages receive
RDMA/mlx5: Fix MR cache memory leak
RDMA/cma: Don't overwrite sgid_attr after device is released
nfc: s3fwrn5: Release the nfc firmware
drm: mxsfb: Silence -EPROBE_DEFER while waiting for bridge
powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
powerpc/ps3: use dma_mapping_error()
perf test: Fix metric parsing test
drm/amdgpu: fix regression in vbios reservation handling on headless
mm/gup: reorganize internal_get_user_pages_fast()
mm/gup: prevent gup_fast from racing with COW during fork
mm/gup: combine put_compound_head() and unpin_user_page()
mm: memcg/slab: fix return of child memcg objcg for root memcg
mm: memcg/slab: fix use after free in obj_cgroup_charge
mm/rmap: always do TTU_IGNORE_ACCESS
sparc: fix handling of page table constructor failure
mm/vmalloc: Fix unlock order in s_stop()
mm/vmalloc.c: fix kasan shadow poisoning size
mm,memory_failure: always pin the page in madvise_inject_error
hugetlb: fix an error code in hugetlb_reserve_pages()
mm: don't wake kswapd prematurely when watermark boosting is disabled
proc: fix lookup in /proc/net subdirectories after setns(2)
checkpatch: fix unescaped left brace
s390/test_unwind: fix CALL_ON_STACK tests
lan743x: fix rx_napi_poll/interrupt ping-pong
ice, xsk: clear the status bits for the next_to_use descriptor
i40e, xsk: clear the status bits for the next_to_use descriptor
net: dsa: qca: ar9331: fix sleeping function called from invalid context bug
dpaa2-eth: fix the size of the mapped SGT buffer
net: bcmgenet: Fix a resource leak in an error handling path in the probe functin
net: mscc: ocelot: Fix a resource leak in the error handling path of the probe function
net: allwinner: Fix some resources leak in the error handling path of the probe and in the remove function
block/rnbd-clt: Get rid of warning regarding size argument in strlcpy
block/rnbd-clt: Fix possible memleak
NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()
net: korina: fix return value
devlink: use _BITUL() macro instead of BIT() in the UAPI header
libnvdimm/label: Return -ENXIO for no slot in __blk_label_update
powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
watchdog: qcom: Avoid context switch in restart handler
watchdog: coh901327: add COMMON_CLK dependency
clk: ti: Fix memleak in ti_fapll_synth_setup
pwm: zx: Add missing cleanup in error path
pwm: lp3943: Dynamically allocate PWM chip base
pwm: imx27: Fix overflow for bigger periods
pwm: sun4i: Remove erroneous else branch
io_uring: cancel only requests of current task
tools build: Add missing libcap to test-all.bin target
perf record: Fix memory leak when using '--user-regs=?' to list registers
qlcnic: Fix error code in probe
nfp: move indirect block cleanup to flower app stop callback
vdpa/mlx5: Use write memory barrier after updating CQ index
virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed()
virtio_net: Fix error code in probe()
virtio_ring: Fix two use after free bugs
vhost scsi: fix error return code in vhost_scsi_set_endpoint()
epoll: check for events when removing a timed out thread from the wait queue
clk: bcm: dvp: Add MODULE_DEVICE_TABLE()
clk: at91: sama7g5: fix compilation error
clk: at91: sam9x60: remove atmel,osc-bypass support
clk: s2mps11: Fix a resource leak in error handling paths in the probe function
clk: sunxi-ng: Make sure divider tables have sentinel
clk: vc5: Use "idt,voltage-microvolt" instead of "idt,voltage-microvolts"
kconfig: fix return value of do_error_if()
powerpc/boot: Fix build of dts/fsl
powerpc/smp: Add __init to init_big_cores()
ARM: 9044/1: vfp: use undef hook for VFP support detection
ARM: 9036/1: uncompress: Fix dbgadtb size parameter name
perf probe: Fix memory leak when synthesizing SDT probes
io_uring: fix racy IOPOLL flush overflow
io_uring: cancel reqs shouldn't kill overflow list
Smack: Handle io_uring kernel thread privileges
proc mountinfo: make splice available again
io_uring: fix io_cqring_events()'s noflush
io_uring: fix racy IOPOLL completions
io_uring: always let io_iopoll_complete() complete polled io
vfio/pci: Move dummy_resources_list init in vfio_pci_probe()
vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU
media: gspca: Fix memory leak in probe
io_uring: fix io_wqe->work_list corruption
io_uring: fix 0-iov read buffer select
io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()
io_uring: fix ignoring xa_store errors
io_uring: fix double io_uring free
io_uring: make ctx cancel on exit targeted to actual ctx
media: sunxi-cir: ensure IR is handled when it is continuous
media: netup_unidvb: Don't leak SPI master in probe error path
media: ipu3-cio2: Remove traces of returned buffers
media: ipu3-cio2: Return actual subdev format
media: ipu3-cio2: Serialise access to pad format
media: ipu3-cio2: Validate mbus format in setting subdev format
media: ipu3-cio2: Make the field on subdev format V4L2_FIELD_NONE
Input: cyapa_gen6 - fix out-of-bounds stack access
ALSA: hda/ca0132 - Change Input Source enum strings.
ACPI: NFIT: Fix input validation of bus-family
PM: ACPI: PCI: Drop acpi_pm_set_bridge_wakeup()
Revert "ACPI / resources: Use AE_CTRL_TERMINATE to terminate resources walks"
ACPI: PNP: compare the string length in the matching_id()
ALSA: hda: Fix regressions on clear and reconfig sysfs
ALSA: hda/ca0132 - Fix AE-5 rear headphone pincfg.
ALSA: hda/realtek: make bass spk volume adjustable on a yoga laptop
ALSA: hda/realtek - Enable headset mic of ASUS X430UN with ALC256
ALSA: hda/realtek - Enable headset mic of ASUS Q524UQK with ALC255
ALSA: hda/realtek - Add supported for more Lenovo ALC285 Headset Button
ALSA: pcm: oss: Fix a few more UBSAN fixes
ALSA/hda: apply jack fixup for the Acer Veriton N4640G/N6640G/N2510G
ALSA: hda/realtek: Add quirk for MSI-GP73
ALSA: hda/realtek: Apply jack fixup for Quanta NL3
ALSA: hda/realtek: Remove dummy lineout on Acer TravelMate P648/P658
ALSA: hda/realtek - Supported Dell fixed type headset
ALSA: usb-audio: Add VID to support native DSD reproduction on FiiO devices
ALSA: usb-audio: Disable sample read check if firmware doesn't give back
ALSA: usb-audio: Add alias entry for ASUS PRIME TRX40 PRO-S
ALSA: core: memalloc: add page alignment for iram
s390/smp: perform initial CPU reset also for SMT siblings
s390/kexec_file: fix diag308 subcode when loading crash kernel
s390/idle: add missing mt_cycles calculation
s390/idle: fix accounting with machine checks
s390/dasd: fix hanging device offline processing
s390/dasd: prevent inconsistent LCU device data
s390/dasd: fix list corruption of pavgroup group list
s390/dasd: fix list corruption of lcu list
binder: add flag to clear buffer on txn complete
ASoC: cx2072x: Fix doubly definitions of Playback and Capture streams
ASoC: AMD Renoir - add DMI table to avoid the ACP mic probe (broken BIOS)
ASoC: AMD Raven/Renoir - fix the PCI probe (PCI revision)
staging: comedi: mf6x4: Fix AI end-of-conversion detection
z3fold: simplify freeing slots
z3fold: stricter locking and more careful reclaim
perf/x86/intel: Add event constraint for CYCLE_ACTIVITY.STALLS_MEM_ANY
perf/x86/intel: Fix rtm_abort_event encoding on Ice Lake
perf/x86/intel/lbr: Fix the return type of get_lbr_cycles()
powerpc/perf: Exclude kernel samples while counting events in user space.
cpufreq: intel_pstate: Use most recent guaranteed performance values
crypto: ecdh - avoid unaligned accesses in ecdh_set_secret()
crypto: arm/aes-ce - work around Cortex-A57/A72 silion errata
m68k: Fix WARNING splat in pmac_zilog driver
Documentation: seqlock: s/LOCKTYPE/LOCKNAME/g
EDAC/i10nm: Use readl() to access MMIO registers
EDAC/amd64: Fix PCI component registration
cpuset: fix race between hotplug work and later CPU offline
dyndbg: fix use before null check
USB: serial: mos7720: fix parallel-port state restore
USB: serial: digi_acceleport: fix write-wakeup deadlocks
USB: serial: keyspan_pda: fix dropped unthrottle interrupts
USB: serial: keyspan_pda: fix write deadlock
USB: serial: keyspan_pda: fix stalled writes
USB: serial: keyspan_pda: fix write-wakeup use-after-free
USB: serial: keyspan_pda: fix tx-unthrottle use-after-free
USB: serial: keyspan_pda: fix write unthrottling
btrfs: do not shorten unpin len for caching block groups
btrfs: update last_byte_to_unpin in switch_commit_roots
btrfs: fix race when defragmenting leads to unnecessary IO
ext4: fix an IS_ERR() vs NULL check
ext4: fix a memory leak of ext4_free_data
ext4: fix deadlock with fs freezing and EA inodes
ext4: don't remount read-only with errors=continue on reboot
RISC-V: Fix usage of memblock_enforce_memory_limit
arm64: dts: ti: k3-am65: mark dss as dma-coherent
arm64: dts: marvell: keep SMMU disabled by default for Armada 7040 and 8040
KVM: arm64: Introduce handling of AArch32 TTBCR2 traps
KVM: x86: reinstate vendor-agnostic check on SPEC_CTRL cpuid bits
KVM: SVM: Remove the call to sev_platform_status() during setup
iommu/arm-smmu: Allow implementation specific write_s2cr
iommu/arm-smmu-qcom: Read back stream mappings
iommu/arm-smmu-qcom: Implement S2CR quirk
ARM: dts: pandaboard: fix pinmux for gpio user button of Pandaboard ES
ARM: dts: at91: sama5d2: fix CAN message ram offset and size
ARM: tegra: Populate OPP table for Tegra20 Ventana
xprtrdma: Fix XDRBUF_SPARSE_PAGES support
powerpc/32: Fix vmap stack - Properly set r1 before activating MMU on syscall too
powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_at
powerpc/rtas: Fix typo of ibm,open-errinjct in RTAS filter
powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()
powerpc/feature: Add CPU_FTR_NOEXECUTE to G2_LE
powerpc/xmon: Change printk() to pr_cont()
powerpc/8xx: Fix early debug when SMC1 is relocated
powerpc/mm: Fix verification of MMU_FTR_TYPE_44x
powerpc/powernv/npu: Do not attempt NPU2 setup on POWER8NVL NPU
powerpc/powernv/memtrace: Don't leak kernel memory to user space
powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently
ovl: make ioctl() safe
ima: Don't modify file descriptor mode on the fly
um: Remove use of asprinf in umid.c
um: Fix time-travel mode
ceph: fix race in concurrent __ceph_remove_cap invocations
SMB3: avoid confusing warning message on mount to Azure
SMB3.1.1: remove confusing mount warning when no SPNEGO info on negprot rsp
SMB3.1.1: do not log warning message if server doesn't populate salt
ubifs: wbuf: Don't leak kernel memory to flash
jffs2: Fix GC exit abnormally
jffs2: Fix ignoring mounting options problem during remounting
fsnotify: generalize handle_inode_event()
inotify: convert to handle_inode_event() interface
fsnotify: fix events reported to watching parent and child
jfs: Fix array index bounds check in dbAdjTree
drm/panfrost: Fix job timeout handling
drm/panfrost: Move the GPU reset bits outside the timeout handler
platform/x86: mlx-platform: remove an unused variable
drm/amdgpu: only set DP subconnector type on DP and eDP connectors
drm/amd/display: Fix memory leaks in S3 resume
drm/dp_aux_dev: check aux_dev before use in drm_dp_aux_dev_get_by_minor()
drm/i915: Fix mismatch between misplaced vma check and vma insert
iio: ad_sigma_delta: Don't put SPI transfer buffer on the stack
spi: pxa2xx: Fix use-after-free on unbind
spi: spi-sh: Fix use-after-free on unbind
spi: atmel-quadspi: Fix use-after-free on unbind
spi: spi-mtk-nor: Don't leak SPI master in probe error path
spi: ar934x: Don't leak SPI master in probe error path
spi: davinci: Fix use-after-free on unbind
spi: fsl: fix use of spisel_boot signal on MPC8309
spi: gpio: Don't leak SPI master in probe error path
spi: mxic: Don't leak SPI master in probe error path
spi: npcm-fiu: Disable clock in probe error path
spi: pic32: Don't leak DMA channels in probe error path
spi: rb4xx: Don't leak SPI master in probe error path
spi: rpc-if: Fix use-after-free on unbind
spi: sc18is602: Don't leak SPI master in probe error path
spi: spi-geni-qcom: Fix use-after-free on unbind
spi: spi-qcom-qspi: Fix use-after-free on unbind
spi: st-ssc4: Fix unbalanced pm_runtime_disable() in probe error path
spi: synquacer: Disable clock in probe error path
spi: mt7621: Disable clock in probe error path
spi: mt7621: Don't leak SPI master in probe error path
spi: atmel-quadspi: Disable clock in probe error path
spi: atmel-quadspi: Fix AHB memory accesses
soc: qcom: smp2p: Safely acquire spinlock without IRQs
mtd: spinand: Fix OOB read
mtd: parser: cmdline: Fix parsing of part-names with colons
mtd: core: Fix refcounting for unpartitioned MTDs
mtd: rawnand: qcom: Fix DMA sync on FLASH_STATUS register read
mtd: rawnand: meson: fix meson_nfc_dma_buffer_release() arguments
scsi: qla2xxx: Fix crash during driver load on big endian machines
scsi: lpfc: Fix invalid sleeping context in lpfc_sli4_nvmet_alloc()
scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpi
scsi: lpfc: Re-fix use after free in lpfc_rq_buf_free()
openat2: reject RESOLVE_BENEATH|RESOLVE_IN_ROOT
iio: buffer: Fix demux update
iio: adc: rockchip_saradc: fix missing clk_disable_unprepare() on error in rockchip_saradc_resume
iio: imu: st_lsm6dsx: fix edge-trigger interrupts
iio:light:rpr0521: Fix timestamp alignment and prevent data leak.
iio:light:st_uvis25: Fix timestamp alignment and prevent data leak.
iio:magnetometer:mag3110: Fix alignment and data leak issues.
iio:pressure:mpl3115: Force alignment of buffer
iio:imu:bmi160: Fix too large a buffer.
iio:imu:bmi160: Fix alignment and data leak issues
iio:adc:ti-ads124s08: Fix buffer being too long.
iio:adc:ti-ads124s08: Fix alignment and data leak issues.
md/cluster: block reshape with remote resync job
md/cluster: fix deadlock when node is doing resync job
pinctrl: sunxi: Always call chained_irq_{enter, exit} in sunxi_pinctrl_irq_handler
clk: ingenic: Fix divider calculation with div tables
clk: mvebu: a3700: fix the XTAL MODE pin to MPP1_9
clk: tegra: Do not return 0 on failure
counter: microchip-tcb-capture: Fix CMR value check
device-dax/core: Fix memory leak when rmmod dax.ko
dma-buf/dma-resv: Respect num_fences when initializing the shared fence list.
driver: core: Fix list corruption after device_del()
xen-blkback: set ring->xenblkd to NULL after kthread_stop()
xen/xenbus: Allow watches discard events before queueing
xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path()
xen/xenbus/xen_bus_type: Support will_handle watch callback
xen/xenbus: Count pending messages for each watch
xenbus/xenbus_backend: Disallow pending watch messages
memory: jz4780_nemc: Fix an error pointer vs NULL check in probe()
memory: renesas-rpc-if: Fix a node reference leak in rpcif_probe()
memory: renesas-rpc-if: Return correct value to the caller of rpcif_manual_xfer()
memory: renesas-rpc-if: Fix unbalanced pm_runtime_enable in rpcif_{enable,disable}_rpm
libnvdimm/namespace: Fix reaping of invalidated block-window-namespace labels
platform/x86: intel-vbtn: Allow switch events on Acer Switch Alpha 12
tracing: Disable ftrace selftests when any tracer is running
mt76: add back the SUPPORTS_REORDERING_BUFFER flag
of: fix linker-section match-table corruption
PCI: Fix pci_slot_release() NULL pointer dereference
regulator: axp20x: Fix DLDO2 voltage control register mask for AXP22x
remoteproc: sysmon: Ensure remote notification ordering
thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changed
rtc: ep93xx: Fix NULL pointer dereference in ep93xx_rtc_read_time
Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
null_blk: Fix zone size initialization
null_blk: Fail zone append to conventional zones
drm/edid: fix objtool warning in drm_cvt_modes()
x86/CPU/AMD: Save AMD NodeId as cpu_die_id
Linux 5.10.4Signed-off-by: Greg Kroah-Hartman
Change-Id: I25209e79d8b9faf5382087955a29b7404bdefe38 -
[ Upstream commit 597c892038e08098b17ccfe65afd9677e6979800 ]
On 2-node NUMA hosts we see bursts of kswapd reclaim and subsequent
pressure spikes and stalls from cache refaults while there is plenty of
free memory in the system.Usually, kswapd is woken up when all eligible nodes in an allocation are
full. But the code related to watermark boosting can wake kswapd on one
full node while the other one is mostly empty. This may be justified to
fight fragmentation, but is currently unconditionally done whether
watermark boosting is occurring or not.In our case, many of our workloads' throughput scales with available
memory, and pure utilization is a more tangible concern than trends
around longer-term fragmentation. As a result we generally disable
watermark boosting.Wake kswapd only woken when watermark boosting is requested.
Link: https://lkml.kernel.org/r/20201020175833.397286-1-hannes@cmpxchg.org
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Johannes Weiner
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
02 Dec, 2020
2 commits
-
Add a PCP list for __GFP_CMA allocations so as not to deprive
MIGRATE_MOVABLE allocations quick access to order-zero pages.Bug: 158645321
Signed-off-by: Liam Mark
Signed-off-by: Chris Goldsworthy
Change-Id: I601f686097de733dedeb1c47b00693bcc25829ed -
CMA pages are designed to be used as fallback for movable allocations
and cannot be used for non-movable allocations. If CMA pages are
utilized poorly, non-movable allocations may end up getting starved if
all regular movable pages are allocated and the only pages left are
CMA. Always using CMA pages first creates unacceptable performance
problems. As a midway alternative, use CMA pages for certain
userspace allocations. The userspace pages can be migrated or dropped
quickly which giving decent utilization.Additionally, add a fall-backs for failed CMA allocations in rmqueue()
and __rmqueue_pcplist() (the latter addition being driven by a report
by the kernel test robot); these fallbacks were dealt with differently
in the original version of the patch as the rmqueue() call chain has
changed).Bug: 158645321
Link: https://lore.kernel.org/lkml/cover.1604282969.git.cgoldswo@codeaurora.org/
Reported-by: kernel test robot
Signed-off-by: Kyungmin Park
Signed-off-by: Heesub Shin
Signed-off-by: Vinayak Menon
[cgoldswo@codeaurora.org: Place in bugfixes; remove cma_alloc zone flag]
Signed-off-by: Chris Goldsworthy
Change-Id: Ibca5eedfc5eacd44542ad483851d741166715f84
20 Nov, 2020
1 commit
-
…nux/kernel/git/netdev/net") into android-mainline
Steps on the way to 5.10-rc5
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I00726ee0d08f08ae6ac5edd07c8fa502b41d4800
19 Nov, 2020
1 commit
-
The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb().
This ends up to page_frag_alloc() to allocate skb->data from
page_frag_cache->va.During the memory pressure, page_frag_cache->va may be allocated as
pfmemalloc page. As a result, the skb->pfmemalloc is always true as
skb->data is from page_frag_cache->va. The skb will be dropped if the
sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour
under memory pressure.However, once kernel is not under memory pressure any longer (suppose large
amount of memory pages are just reclaimed), the page_frag_alloc() may still
re-use the prior pfmemalloc page_frag_cache->va to allocate skb->data. As a
result, the skb->pfmemalloc is always true unless page_frag_cache->va is
re-allocated, even if the kernel is not under memory pressure any longer.Here is how kernel runs into issue.
1. The kernel is under memory pressure and allocation of
PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail. Instead,
the pfmemalloc page is allocated for page_frag_cache->va.2: All skb->data from page_frag_cache->va (pfmemalloc) will have
skb->pfmemalloc=true. The skb will always be dropped by sock without
SOCK_MEMALLOC. This is an expected behaviour.3. Suppose a large amount of pages are reclaimed and kernel is not under
memory pressure any longer. We expect skb->pfmemalloc drop will not happen.4. Unfortunately, page_frag_alloc() does not proactively re-allocate
page_frag_alloc->va and will always re-use the prior pfmemalloc page. The
skb->pfmemalloc is always true even kernel is not under memory pressure any
longer.Fix this by freeing and re-allocating the page instead of recycling it.
References: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/
References: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/
Suggested-by: Matthew Wilcox (Oracle)
Cc: Aruna Ramakrishna
Cc: Bert Barbe
Cc: Rama Nichanamatlu
Cc: Venkat Venkatsubra
Cc: Manjunath Patil
Cc: Joe Jin
Cc: SRINIVAS
Fixes: 79930f5892e1 ("net: do not deplete pfmemalloc reserve")
Signed-off-by: Dongli Zhang
Acked-by: Vlastimil Babka
Reviewed-by: Eric Dumazet
Link: https://lore.kernel.org/r/20201115201029.11903-1-dongli.zhang@oracle.com
Signed-off-by: Jakub Kicinski
26 Oct, 2020
2 commits
-
…nux/kernel/git/mchehab/linux-media") into android-mainline
Steps on the way to 5.10-rc1
Resolves conflicts in:
fs/userfaultfd.cSigned-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af -
…inux/kernel/git/netdev/net-next") into android-mainline
Steps on the way to 5.10-rc1
Resolves merge issues in:
drivers/net/virtio_net.c
net/xfrm/xfrm_state.c
net/xfrm/xfrm_user.cSigned-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3132e7802f25cb775eb02d0b3a03068da39a6fe2
25 Oct, 2020
1 commit
-
steps on the way to 5.10-rc1
Change-Id: Iddc84c25b6a9d71fa8542b927d6f69c364131c3d
Signed-off-by: Greg Kroah-Hartman
17 Oct, 2020
13 commits
-
Merge more updates from Andrew Morton:
"155 patches.Subsystems affected by this patch series: mm (dax, debug, thp,
readahead, page-poison, util, memory-hotplug, zram, cleanups), misc,
core-kernel, get_maintainer, MAINTAINERS, lib, bitops, checkpatch,
binfmt, ramfs, autofs, nilfs, rapidio, panic, relay, kgdb, ubsan,
romfs, and fault-injection"* emailed patches from Andrew Morton : (155 commits)
lib, uaccess: add failure injection to usercopy functions
lib, include/linux: add usercopy failure capability
ROMFS: support inode blocks calculation
ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang
sched.h: drop in_ubsan field when UBSAN is in trap mode
scripts/gdb/tasks: add headers and improve spacing format
scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command
kernel/relay.c: drop unneeded initialization
panic: dump registers on panic_on_warn
rapidio: fix the missed put_device() for rio_mport_add_riodev
rapidio: fix error handling path
nilfs2: fix some kernel-doc warnings for nilfs2
autofs: harden ioctl table
ramfs: fix nommu mmap with gaps in the page cache
mm: remove the now-unnecessary mmget_still_valid() hack
mm/gup: take mmap_lock in get_dump_page()
binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot
coredump: rework elf/elf_fdpic vma_dump_size() into common helper
coredump: refactor page range dumping into common helper
coredump: let dump_emit() bail out on short writes
... -
The current page_order() can only be called on pages in the buddy
allocator. For compound pages, you have to use compound_order(). This is
confusing and led to a bug, so rename page_order() to buddy_order().Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Link: https://lkml.kernel.org/r/20201001152259.14932-2-willy@infradead.org
Signed-off-by: Linus Torvalds -
__free_pages_core() is used when exposing fresh memory to the buddy during
system boot and when onlining memory in generic_online_page().generic_online_page() is used in two cases:
1. Direct memory onlining in online_pages().
2. Deferred memory onlining in memory-ballooning-like mechanisms (HyperV
balloon and virtio-mem), when parts of a section are kept
fake-offline to be fake-onlined later on.In 1, we already place pages to the tail of the freelist. Pages will be
freed to MIGRATE_ISOLATE lists first and moved to the tail of the
freelists via undo_isolate_page_range().In 2, we currently don't implement a proper rule. In case of virtio-mem,
where we currently always online MAX_ORDER - 1 pages, the pages will be
placed to the HEAD of the freelist - undesireable. While the hyper-v
balloon calls generic_online_page() with single pages, usually it will
call it on successive single pages in a larger block.The pages are fresh, so place them to the tail of the freelist and avoid
the PCP. In __free_pages_core(), remove the now superflouos call to
set_page_refcounted() and add a comment regarding page initialization and
the refcount.Note: In 2. we currently don't shuffle. If ever relevant (page shuffling
is usually of limited use in virtualized environments), we might want to
shuffle after a sequence of generic_online_page() calls in the relevant
callers.Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Reviewed-by: Oscar Salvador
Reviewed-by: Wei Yang
Acked-by: Pankaj Gupta
Acked-by: Michal Hocko
Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Dave Hansen
Cc: Mike Rapoport
Cc: "K. Y. Srinivasan"
Cc: Haiyang Zhang
Cc: Stephen Hemminger
Cc: Wei Liu
Cc: Matthew Wilcox
Cc: Michael Ellerman
Cc: Michal Hocko
Cc: Scott Cheloha
Link: https://lkml.kernel.org/r/20201005121534.15649-5-david@redhat.com
Signed-off-by: Linus Torvalds -
Whenever we move pages between freelists via move_to_free_list()/
move_freepages_block(), we don't actually touch the pages:
1. Page isolation doesn't actually touch the pages, it simply isolates
pageblocks and moves all free pages to the MIGRATE_ISOLATE freelist.
When undoing isolation, we move the pages back to the target list.
2. Page stealing (steal_suitable_fallback()) moves free pages directly
between lists without touching them.
3. reserve_highatomic_pageblock()/unreserve_highatomic_pageblock() moves
free pages directly between freelists without touching them.We already place pages to the tail of the freelists when undoing isolation
via __putback_isolated_page(), let's do it in any case (e.g., if order
Signed-off-by: Andrew Morton
Reviewed-by: Oscar Salvador
Reviewed-by: Wei Yang
Acked-by: Pankaj Gupta
Acked-by: Michal Hocko
Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Dave Hansen
Cc: Vlastimil Babka
Cc: Mike Rapoport
Cc: Scott Cheloha
Cc: Michael Ellerman
Cc: Haiyang Zhang
Cc: "K. Y. Srinivasan"
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Stephen Hemminger
Cc: Wei Liu
Link: https://lkml.kernel.org/r/20201005121534.15649-4-david@redhat.com
Signed-off-by: Linus Torvalds -
__putback_isolated_page() already documents that pages will be placed to
the tail of the freelist - this is, however, not the case for "order >=
MAX_ORDER - 2" (see buddy_merge_likely()) - which should be the case for
all existing users.This change affects two users:
- free page reporting
- page isolation, when undoing the isolation (including memory onlining).This behavior is desirable for pages that haven't really been touched
lately, so exactly the two users that don't actually read/write page
content, but rather move untouched pages.The new behavior is especially desirable for memory onlining, where we
allow allocation of newly onlined pages via undo_isolate_page_range() in
online_pages(). Right now, we always place them to the head of the
freelist, resulting in undesireable behavior: Assume we add individual
memory chunks via add_memory() and online them right away to the NORMAL
zone. We create a dependency chain of unmovable allocations e.g., via the
memmap. The memmap of the next chunk will be placed onto previous chunks
- if the last block cannot get offlined+removed, all dependent ones cannot
get offlined+removed. While this can already be observed with individual
DIMMs, it's more of an issue for virtio-mem (and I suspect also ppc
DLPAR).Document that this should only be used for optimizations, and no code
should rely on this behavior for correction (if the order of the freelists
ever changes).We won't care about page shuffling: memory onlining already properly
shuffles after onlining. free page reporting doesn't care about
physically contiguous ranges, and there are already cases where page
isolation will simply move (physically close) free pages to (currently)
the head of the freelists via move_freepages_block() instead of shuffling.
If this becomes ever relevant, we should shuffle the whole zone when
undoing isolation of larger ranges, and after free_contig_range().Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Alexander Duyck
Reviewed-by: Oscar Salvador
Reviewed-by: Wei Yang
Reviewed-by: Pankaj Gupta
Acked-by: Michal Hocko
Cc: Mel Gorman
Cc: Dave Hansen
Cc: Vlastimil Babka
Cc: Mike Rapoport
Cc: Scott Cheloha
Cc: Michael Ellerman
Cc: Haiyang Zhang
Cc: "K. Y. Srinivasan"
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Stephen Hemminger
Cc: Wei Liu
Link: https://lkml.kernel.org/r/20201005121534.15649-3-david@redhat.com
Signed-off-by: Linus Torvalds -
Patch series "mm: place pages to the freelist tail when onlining and undoing isolation", v2.
When adding separate memory blocks via add_memory*() and onlining them
immediately, the metadata (especially the memmap) of the next block will
be placed onto one of the just added+onlined block. This creates a chain
of unmovable allocations: If the last memory block cannot get
offlined+removed() so will all dependent ones. We directly have unmovable
allocations all over the place.This can be observed quite easily using virtio-mem, however, it can also
be observed when using DIMMs. The freshly onlined pages will usually be
placed to the head of the freelists, meaning they will be allocated next,
turning the just-added memory usually immediately un-removable. The fresh
pages are cold, prefering to allocate others (that might be hot) also
feels to be the natural thing to do.It also applies to the hyper-v balloon xen-balloon, and ppc64 dlpar: when
adding separate, successive memory blocks, each memory block will have
unmovable allocations on them - for example gigantic pages will fail to
allocate.While the ZONE_NORMAL doesn't provide any guarantees that memory can get
offlined+removed again (any kind of fragmentation with unmovable
allocations is possible), there are many scenarios (hotplugging a lot of
memory, running workload, hotunplug some memory/as much as possible) where
we can offline+remove quite a lot with this patchset.a) To visualize the problem, a very simple example:
Start a VM with 4GB and 8GB of virtio-mem memory:
[root@localhost ~]# lsmem
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000000000000-0x00000000bfffffff 3G online yes 0-23
0x0000000100000000-0x000000033fffffff 9G online yes 32-103Memory block size: 128M
Total online memory: 12G
Total offline memory: 0BThen try to unplug as much as possible using virtio-mem. Observe which
memory blocks are still around. Without this patch set:[root@localhost ~]# lsmem
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000000000000-0x00000000bfffffff 3G online yes 0-23
0x0000000100000000-0x000000013fffffff 1G online yes 32-39
0x0000000148000000-0x000000014fffffff 128M online yes 41
0x0000000158000000-0x000000015fffffff 128M online yes 43
0x0000000168000000-0x000000016fffffff 128M online yes 45
0x0000000178000000-0x000000017fffffff 128M online yes 47
0x0000000188000000-0x0000000197ffffff 256M online yes 49-50
0x00000001a0000000-0x00000001a7ffffff 128M online yes 52
0x00000001b0000000-0x00000001b7ffffff 128M online yes 54
0x00000001c0000000-0x00000001c7ffffff 128M online yes 56
0x00000001d0000000-0x00000001d7ffffff 128M online yes 58
0x00000001e0000000-0x00000001e7ffffff 128M online yes 60
0x00000001f0000000-0x00000001f7ffffff 128M online yes 62
0x0000000200000000-0x0000000207ffffff 128M online yes 64
0x0000000210000000-0x0000000217ffffff 128M online yes 66
0x0000000220000000-0x0000000227ffffff 128M online yes 68
0x0000000230000000-0x0000000237ffffff 128M online yes 70
0x0000000240000000-0x0000000247ffffff 128M online yes 72
0x0000000250000000-0x0000000257ffffff 128M online yes 74
0x0000000260000000-0x0000000267ffffff 128M online yes 76
0x0000000270000000-0x0000000277ffffff 128M online yes 78
0x0000000280000000-0x0000000287ffffff 128M online yes 80
0x0000000290000000-0x0000000297ffffff 128M online yes 82
0x00000002a0000000-0x00000002a7ffffff 128M online yes 84
0x00000002b0000000-0x00000002b7ffffff 128M online yes 86
0x00000002c0000000-0x00000002c7ffffff 128M online yes 88
0x00000002d0000000-0x00000002d7ffffff 128M online yes 90
0x00000002e0000000-0x00000002e7ffffff 128M online yes 92
0x00000002f0000000-0x00000002f7ffffff 128M online yes 94
0x0000000300000000-0x0000000307ffffff 128M online yes 96
0x0000000310000000-0x0000000317ffffff 128M online yes 98
0x0000000320000000-0x0000000327ffffff 128M online yes 100
0x0000000330000000-0x000000033fffffff 256M online yes 102-103Memory block size: 128M
Total online memory: 8.1G
Total offline memory: 0BWith this patch set:
[root@localhost ~]# lsmem
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000000000000-0x00000000bfffffff 3G online yes 0-23
0x0000000100000000-0x000000013fffffff 1G online yes 32-39Memory block size: 128M
Total online memory: 4G
Total offline memory: 0BAll memory can get unplugged, all memory block can get removed. Of
course, no workload ran and the system was basically idle, but it
highlights the issue - the fairly deterministic chain of unmovable
allocations. When a huge page for the 2MB memmap is needed, a
just-onlined 4MB page will be split. The remaining 2MB page will be used
for the memmap of the next memory block. So one memory block will hold
the memmap of the two following memory blocks. Finally the pages of the
last-onlined memory block will get used for the next bigger allocations -
if any allocation is unmovable, all dependent memory blocks cannot get
unplugged and removed until that allocation is gone.Note that with bigger memory blocks (e.g., 256MB), *all* memory
blocks are dependent and none can get unplugged again!b) Experiment with memory intensive workload
I performed an experiment with an older version of this patch set (before
we used undo_isolate_page_range() in online_pages(): Hotplug 56GB to a VM
with an initial 4GB, onlining all memory to ZONE_NORMAL right from the
kernel when adding it. I then run various memory intensive workloads that
consume most system memory for a total of 45 minutes. Once finished, I
try to unplug as much memory as possible.With this change, I am able to remove via virtio-mem (adding individual
128MB memory blocks) 413 out of 448 added memory blocks. Via individual
(256MB) DIMMs 380 out of 448 added memory blocks. (I don't have any
numbers without this patchset, but looking at the above example, it's at
most half of the 448 memory blocks for virtio-mem, and most probably none
for DIMMs).Again, there are workloads that might behave very differently due to the
nature of ZONE_NORMAL.This change also affects (besides memory onlining):
- Other users of undo_isolate_page_range(): Pages are always placed to the
tail.
-- When memory offlining fails
-- When memory isolation fails after having isolated some pageblocks
-- When alloc_contig_range() either succeeds or fails
- Other users of __putback_isolated_page(): Pages are always placed to the
tail.
-- Free page reporting
- Other users of __free_pages_core()
-- AFAIKs, any memory that is getting exposed to the buddy during boot.
IIUC we will now usually allocate memory from lower addresses within
a zone first (especially during boot).
- Other users of generic_online_page()
-- Hyper-V balloonThis patch (of 5):
Let's prepare for additional flags and avoid long parameter lists of
bools. Follow-up patches will also make use of the flags in
__free_pages_ok().Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Alexander Duyck
Reviewed-by: Vlastimil Babka
Reviewed-by: Oscar Salvador
Reviewed-by: Wei Yang
Reviewed-by: Pankaj Gupta
Acked-by: Michal Hocko
Cc: Mel Gorman
Cc: Dave Hansen
Cc: Mike Rapoport
Cc: Matthew Wilcox
Cc: Haiyang Zhang
Cc: "K. Y. Srinivasan"
Cc: Michael Ellerman
Cc: Michal Hocko
Cc: Scott Cheloha
Cc: Stephen Hemminger
Cc: Wei Liu
Cc: Michal Hocko
Link: https://lkml.kernel.org/r/20201005121534.15649-1-david@redhat.com
Link: https://lkml.kernel.org/r/20201005121534.15649-2-david@redhat.com
Signed-off-by: Linus Torvalds -
On the memory onlining path, we want to start with MIGRATE_ISOLATE, to
un-isolate the pages after memory onlining is complete. Let's allow
passing in the migratetype.Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Oscar Salvador
Acked-by: Michal Hocko
Cc: Wei Yang
Cc: Baoquan He
Cc: Pankaj Gupta
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Logan Gunthorpe
Cc: Dan Williams
Cc: Mike Rapoport
Cc: "Matthew Wilcox (Oracle)"
Cc: Michel Lespinasse
Cc: Charan Teja Reddy
Cc: Mel Gorman
Link: https://lkml.kernel.org/r/20200819175957.28465-10-david@redhat.com
Signed-off-by: Linus Torvalds -
Commit ac5d2539b238 ("mm: meminit: reduce number of times pageblocks are
set during struct page init") moved the actual zone range check, leaving
only the alignment check for pageblocks.Let's drop the stale comment and make the pageblock check easier to read.
Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Oscar Salvador
Acked-by: Michal Hocko
Cc: Wei Yang
Cc: Baoquan He
Cc: Pankaj Gupta
Cc: Mel Gorman
Cc: Charan Teja Reddy
Cc: Dan Williams
Cc: Fenghua Yu
Cc: Logan Gunthorpe
Cc: "Matthew Wilcox (Oracle)"
Cc: Mel Gorman
Cc: Michel Lespinasse
Cc: Mike Rapoport
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20200819175957.28465-9-david@redhat.com
Signed-off-by: Linus Torvalds -
Callers no longer need the number of isolated pageblocks. Let's simplify.
Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Oscar Salvador
Acked-by: Michal Hocko
Cc: Wei Yang
Cc: Baoquan He
Cc: Pankaj Gupta
Cc: Charan Teja Reddy
Cc: Dan Williams
Cc: Fenghua Yu
Cc: Logan Gunthorpe
Cc: "Matthew Wilcox (Oracle)"
Cc: Mel Gorman
Cc: Mel Gorman
Cc: Michel Lespinasse
Cc: Mike Rapoport
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20200819175957.28465-7-david@redhat.com
Signed-off-by: Linus Torvalds -
offline_pages() is the only user. __offline_isolated_pages() never gets
called with ranges that contain memory holes and we no longer care about
the return value. Drop the return value handling and all pfn_valid()
checks.Update the documentation.
Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Oscar Salvador
Acked-by: Michal Hocko
Cc: Wei Yang
Cc: Baoquan He
Cc: Pankaj Gupta
Cc: Charan Teja Reddy
Cc: Dan Williams
Cc: Fenghua Yu
Cc: Logan Gunthorpe
Cc: "Matthew Wilcox (Oracle)"
Cc: Mel Gorman
Cc: Mel Gorman
Cc: Michel Lespinasse
Cc: Mike Rapoport
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20200819175957.28465-5-david@redhat.com
Signed-off-by: Linus Torvalds -
This patch changes the way we set and handle in-use poisoned pages. Until
now, poisoned pages were released to the buddy allocator, trusting that
the checks that take place at allocation time would act as a safe net and
would skip that page.This has proved to be wrong, as we got some pfn walkers out there, like
compaction, that all they care is the page to be in a buddy freelist.Although this might not be the only user, having poisoned pages in the
buddy allocator seems a bad idea as we should only have free pages that
are ready and meant to be used as such.Before explaining the taken approach, let us break down the kind of pages
we can soft offline.- Anonymous THP (after the split, they end up being 4K pages)
- Hugetlb
- Order-0 pages (that can be either migrated or invalited)* Normal pages (order-0 and anon-THP)
- If they are clean and unmapped page cache pages, we invalidate
then by means of invalidate_inode_page().
- If they are mapped/dirty, we do the isolate-and-migrate dance.Either way, do not call put_page directly from those paths. Instead, we
keep the page and send it to page_handle_poison to perform the right
handling.page_handle_poison sets the HWPoison flag and does the last put_page.
Down the chain, we placed a check for HWPoison page in
free_pages_prepare, that just skips any poisoned page, so those pages
do not end up in any pcplist/freelist.After that, we set the refcount on the page to 1 and we increment
the poisoned pages counter.If we see that the check in free_pages_prepare creates trouble, we can
always do what we do for free pages:- wait until the page hits buddy's freelists
- take it off, and flag itThe downside of the above approach is that we could race with an
allocation, so by the time we want to take the page off the buddy, the
page has been already allocated so we cannot soft offline it.
But the user could always retry it.* Hugetlb pages
- We isolate-and-migrate them
After the migration has been successful, we call dissolve_free_huge_page,
and we set HWPoison on the page if we succeed.
Hugetlb has a slightly different handling though.While for non-hugetlb pages we cared about closing the race with an
allocation, doing so for hugetlb pages requires quite some additional
and intrusive code (we would need to hook in free_huge_page and some other
places).
So I decided to not make the code overly complicated and just fail
normally if the page we allocated in the meantime.We can always build on top of this.
As a bonus, because of the way we handle now in-use pages, we no longer
need the put-as-isolation-migratetype dance, that was guarding for poisoned
pages to end up in pcplists.Signed-off-by: Oscar Salvador
Signed-off-by: Andrew Morton
Acked-by: Naoya Horiguchi
Cc: "Aneesh Kumar K.V"
Cc: Aneesh Kumar K.V
Cc: Aristeu Rozanski
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Dmitry Yakunin
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Oscar Salvador
Cc: Qian Cai
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20200922135650.1634-10-osalvador@suse.de
Signed-off-by: Linus Torvalds -
When trying to soft-offline a free page, we need to first take it off the
buddy allocator. Once we know is out of reach, we can safely flag it as
poisoned.take_page_off_buddy will be used to take a page meant to be poisoned off
the buddy allocator. take_page_off_buddy calls break_down_buddy_pages,
which splits a higher-order page in case our page belongs to one.Once the page is under our control, we call page_handle_poison to set it
as poisoned and grab a refcount on it.Signed-off-by: Oscar Salvador
Signed-off-by: Andrew Morton
Acked-by: Naoya Horiguchi
Cc: "Aneesh Kumar K.V"
Cc: Aneesh Kumar K.V
Cc: Aristeu Rozanski
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Dmitry Yakunin
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Oscar Salvador
Cc: Qian Cai
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20200922135650.1634-9-osalvador@suse.de
Signed-off-by: Linus Torvalds -
The implementation of split_page_owner() prefers a count rather than the
old order of the page. When we support a variable size THP, we won't
have the order at this point, but we will have the number of pages.
So change the interface to what the caller and callee would prefer.Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: SeongJae Park
Acked-by: Kirill A. Shutemov
Cc: Huang Ying
Link: https://lkml.kernel.org/r/20200908195539.25896-4-willy@infradead.org
Signed-off-by: Linus Torvalds
16 Oct, 2020
1 commit
-
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.- Allow more calls to same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning full path for given 'struct
path'.- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).- Support listing and getting information about bpf_links via the bpf
syscall.- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
14 Oct, 2020
11 commits
-
for_each_memblock() is used to iterate over memblock.memory in a few
places that use data from memblock_region rather than the memory ranges.Introduce separate for_each_mem_region() and
for_each_reserved_mem_region() to improve encapsulation of memblock
internals from its users.Signed-off-by: Mike Rapoport
Signed-off-by: Andrew Morton
Reviewed-by: Baoquan He
Acked-by: Ingo Molnar [x86]
Acked-by: Thomas Bogendoerfer [MIPS]
Acked-by: Miguel Ojeda [.clang-format]
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Christoph Hellwig
Cc: Daniel Axtens
Cc: Dave Hansen
Cc: Emil Renner Berthing
Cc: Hari Bathini
Cc: Ingo Molnar
Cc: Jonathan Cameron
Cc: Marek Szyprowski
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Palmer Dabbelt
Cc: Paul Mackerras
Cc: Paul Walmsley
Cc: Peter Zijlstra
Cc: Russell King
Cc: Stafford Horne
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yoshinori Sato
Link: https://lkml.kernel.org/r/20200818151634.14343-18-rppt@kernel.org
Signed-off-by: Linus Torvalds -
Currently for_each_mem_range() and for_each_mem_range_rev() iterators are
the most generic way to traverse memblock regions. As such, they have 8
parameters and they are hardly convenient to users. Most users choose to
utilize one of their wrappers and the only user that actually needs most
of the parameters is memblock itself.To avoid yet another naming for memblock iterators, rename the existing
for_each_mem_range[_rev]() to __for_each_mem_range[_rev]() and add a new
for_each_mem_range[_rev]() wrappers with only index, start and end
parameters.The new wrapper nicely fits into init_unavailable_mem() and will be used
in upcoming changes to simplify memblock traversals.Signed-off-by: Mike Rapoport
Signed-off-by: Andrew Morton
Acked-by: Thomas Bogendoerfer [MIPS]
Cc: Andy Lutomirski
Cc: Baoquan He
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Christoph Hellwig
Cc: Daniel Axtens
Cc: Dave Hansen
Cc: Emil Renner Berthing
Cc: Hari Bathini
Cc: Ingo Molnar
Cc: Ingo Molnar
Cc: Jonathan Cameron
Cc: Marek Szyprowski
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Miguel Ojeda
Cc: Palmer Dabbelt
Cc: Paul Mackerras
Cc: Paul Walmsley
Cc: Peter Zijlstra
Cc: Russell King
Cc: Stafford Horne
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yoshinori Sato
Link: https://lkml.kernel.org/r/20200818151634.14343-11-rppt@kernel.org
Signed-off-by: Linus Torvalds -
Here is a very rare race which leaks memory:
Page P0 is allocated to the page cache. Page P1 is free.
Thread A Thread B Thread C
find_get_entry():
xas_load() returns P0
Removes P0 from page cache
P0 finds its buddy P1
alloc_pages(GFP_KERNEL, 1) returns P0
P0 has refcount 1
page_cache_get_speculative(P0)
P0 has refcount 2
__free_pages(P0)
P0 has refcount 1
put_page(P0)
P1 is not freedFix this by freeing all the pages in __free_pages() that won't be freed
by the call to put_page(). It's usually not a good idea to split a page,
but this is a very unlikely scenario.Fixes: e286781d5f2e ("mm: speculative page references")
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Acked-by: Mike Rapoport
Cc: Nick Piggin
Cc: Hugh Dickins
Cc: Peter Zijlstra
Link: https://lkml.kernel.org/r/20200926213919.26642-1-willy@infradead.org
Signed-off-by: Linus Torvalds -
Previously 'for_next_zone_zonelist_nodemask' macro parameter 'zlist' was
unused so this patch removes it.Signed-off-by: Mateusz Nosek
Signed-off-by: Andrew Morton
Link: https://lkml.kernel.org/r/20200917211906.30059-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds -
__perform_reclaim()'s single caller expects it to return 'unsigned long',
hence change its return value and a local variable to 'unsigned long'.Suggested-by: Andrew Morton
Signed-off-by: Yanfei Xu
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Link: https://lkml.kernel.org/r/20200916022138.16740-1-yanfei.xu@windriver.com
Signed-off-by: Linus Torvalds -
finalise_ac() is just 'epilogue' for 'prepare_alloc_pages'. Therefore
there is no need to keep them both so 'finalise_ac' content can be merged
into prepare_alloc_pages() code. It would make __alloc_pages_nodemask()
cleaner when it comes to readability.Signed-off-by: Mateusz Nosek
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Cc: Mel Gorman
Cc: Mike Rapoport
Link: https://lkml.kernel.org/r/20200916110118.6537-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds -
Previously in '__init early_init_on_alloc' and '__init early_init_on_free'
the return values from 'kstrtobool' were not handled properly. That
caused potential garbage value read from variable 'bool_result'.
Introduced patch fixes error handling.Signed-off-by: Mateusz Nosek
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Link: https://lkml.kernel.org/r/20200916214125.28271-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds -
Previously flags check was separated into two separated checks with two
separated branches. In case of presence of any of two mentioned flags,
the same effect on flow occurs. Therefore checks can be merged and one
branch can be avoided.Signed-off-by: Mateusz Nosek
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Link: https://lkml.kernel.org/r/20200911092310.31136-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds -
Previously variable 'tmp' was initialized, but was not read later before
reassigning. So the initialization can be removed.[akpm@linux-foundation.org: remove `tmp' altogether]
Signed-off-by: Mateusz Nosek
Signed-off-by: Andrew Morton
Link: https://lkml.kernel.org/r/20200904132422.17387-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds -
In has_unmovable_pages(), the page parameter would not always be the first
page within a pageblock (see how the page pointer is passed in from
start_isolate_page_range() after call __first_valid_page()), so that would
cause checking unmovable pages span two pageblocks.After this patch, the checking is enforced within one pageblock no matter
the page is first one or not, and obey the semantics of this function.This issue is found by code inspection.
Michal said "this might lead to false negatives when an unrelated block
would cause an isolation failure".Signed-off-by: Li Xinhai
Signed-off-by: Andrew Morton
Reviewed-by: Oscar Salvador
Acked-by: Michal Hocko
Cc: David Hildenbrand
Link: https://lkml.kernel.org/r/20200824065811.383266-1-lixinhai.lxh@gmail.com
Signed-off-by: Linus Torvalds -
Patch series "mm / virtio-mem: support ZONE_MOVABLE", v5.
When introducing virtio-mem, the semantics of ZONE_MOVABLE were rather
unclear, which is why we special-cased ZONE_MOVABLE such that partially
plugged blocks would never end up in ZONE_MOVABLE.Now that the semantics are much clearer (and are documented in patch #6),
let's support partially plugged memory blocks in ZONE_MOVABLE, allowing
partially plugged memory blocks to be online to ZONE_MOVABLE and also
unplugging from such memory blocks. This avoids surprises when onlining
of memory blocks suddenly fails, just because they are not completely
populated by virtio-mem (yet).This is especially helpful for testing, but also paves the way for
virtio-mem optimizations, allowing more memory to get reliably unplugged.Cleanup has_unmovable_pages() and set_migratetype_isolate(), providing
better documentation of how ZONE_MOVABLE interacts with different kind of
unmovable pages (memory offlining vs. alloc_contig_range()).This patch (of 6):
Let's move the split comment regarding bootmem allocations and memory
holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
check.Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Baoquan He
Cc: Michal Hocko
Cc: Michael S. Tsirkin
Cc: Mike Kravetz
Cc: Pankaj Gupta
Cc: Jason Wang
Cc: Mike Rapoport
Cc: Qian Cai
Link: http://lkml.kernel.org/r/20200816125333.7434-1-david@redhat.com
Link: http://lkml.kernel.org/r/20200816125333.7434-2-david@redhat.com
Signed-off-by: Linus Torvalds
12 Oct, 2020
2 commits
-
Linux 5.9
Change-Id: Ic4308a3e2a4015058efdac52bd51794b604c8435
Signed-off-by: Greg Kroah-Hartman -
When memory is hotplug added or removed the min_free_kbytes should be
recalculated based on what is expected by khugepaged. Currently after
hotplug, min_free_kbytes will be set to a lower default and higher
default set when THP enabled is lost.This change restores min_free_kbytes as expected for THP consumers.
[vijayb@linux.microsoft.com: v5]
Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.microsoft.comFixes: f000565adb77 ("thp: set recommended min free kbytes")
Signed-off-by: Vijay Balakrishna
Signed-off-by: Andrew Morton
Reviewed-by: Pavel Tatashin
Acked-by: Michal Hocko
Cc: Allen Pais
Cc: Andrea Arcangeli
Cc: "Kirill A. Shutemov"
Cc: Oleg Nesterov
Cc: Song Liu
Cc:
Link: https://lkml.kernel.org/r/1600305709-2319-2-git-send-email-vijayb@linux.microsoft.com
Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com
Signed-off-by: Linus Torvalds
06 Oct, 2020
2 commits
-
Linux 5.9-rc8
Signed-off-by: Greg Kroah-Hartman
Change-Id: I4a03857522443f5d032a0cc1ecdced3675255d5e -
Rejecting non-native endian BTF overlapped with the addition
of support for it.The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.Signed-off-by: David S. Miller