30 Dec, 2020

  • Changes in 5.10.4
    hwmon: (k10temp) Remove support for displaying voltage and current on Zen CPUs
    drm/gma500: fix double free of gma_connector
    iio: adc: at91_adc: add Kconfig dep on the OF symbol and remove of_match_ptr()
    drm/aspeed: Fix Kconfig warning & subsequent build errors
    drm/mcde: Fix handling of platform_get_irq() error
    drm/tve200: Fix handling of platform_get_irq() error
    arm64: dts: renesas: hihope-rzg2-ex: Drop rxc-skew-ps from ethernet-phy node
    arm64: dts: renesas: cat875: Remove rxc-skew-ps from ethernet-phy node
    soc: renesas: rmobile-sysc: Fix some leaks in rmobile_init_pm_domains()
    soc: mediatek: Check if power domains can be powered on at boot time
    arm64: dts: mediatek: mt8183: fix gce incorrect mbox-cells value
    arm64: dts: ipq6018: update the reserved-memory node
    arm64: dts: qcom: sc7180: Fix one forgotten interconnect reference
    soc: qcom: geni: More properly switch to DMA mode
    Revert "i2c: i2c-qcom-geni: Fix DMA transfer race"
    RDMA/bnxt_re: Set queue pair state when being queried
    rtc: pcf2127: fix pcf2127_nvmem_read/write() returns
    RDMA/bnxt_re: Fix entry size during SRQ create
    selinux: fix error initialization in inode_doinit_with_dentry()
    ARM: dts: aspeed-g6: Fix the GPIO memory size
    ARM: dts: aspeed: s2600wf: Fix VGA memory region location
    RDMA/core: Fix error return in _ib_modify_qp()
    RDMA/rxe: Compute PSN windows correctly
    x86/mm/ident_map: Check for errors from ident_pud_init()
    ARM: p2v: fix handling of LPAE translation in BE mode
    RDMA/rtrs-clt: Remove destroy_con_cq_qp in case route resolving failed
    RDMA/rtrs-clt: Missing error from rtrs_rdma_conn_established
    RDMA/rtrs-srv: Don't guard the whole __alloc_srv with srv_mutex
    x86/apic: Fix x2apic enablement without interrupt remapping
    ASoC: qcom: fix unsigned int bitwidth compared to less than zero
    sched/deadline: Fix sched_dl_global_validate()
    sched: Reenable interrupts in do_sched_yield()
    drm/amdgpu: fix incorrect enum type
    crypto: talitos - Endianess in current_desc_hdr()
    crypto: talitos - Fix return type of current_desc_hdr()
    crypto: inside-secure - Fix sizeof() mismatch
    ASoC: sun4i-i2s: Fix lrck_period computation for I2S justified mode
    drm/msm: Add missing stub definition
    ARM: dts: aspeed: tiogapass: Remove vuart
    drm/amdgpu: fix build_coefficients() argument
    powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
    spi: img-spfi: fix reference leak in img_spfi_resume
    f2fs: call f2fs_get_meta_page_retry for nat page
    RDMA/mlx5: Fix corruption of reg_pages in mlx5_ib_rereg_user_mr()
    perf test: Use generic event for expand_libpfm_events()
    drm/msm/dp: DisplayPort PHY compliance tests fixup
    drm/msm/dsi_pll_7nm: restore VCO rate during restore_state
    drm/msm/dsi_pll_10nm: restore VCO rate during restore_state
    drm/msm/dpu: fix clock scaling on non-sc7180 board
    spi: spi-mem: fix reference leak in spi_mem_access_start
    scsi: aacraid: Improve compat_ioctl handlers
    pinctrl: core: Add missing #ifdef CONFIG_GPIOLIB
    ASoC: pcm: DRAIN support reactivation
    drm/bridge: tpd12s015: Fix irq registering in tpd12s015_probe
    crypto: arm64/poly1305-neon - reorder PAC authentication with SP update
    crypto: arm/aes-neonbs - fix usage of cbc(aes) fallback
    crypto: caam - fix printing on xts fallback allocation error path
    selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
    nl80211/cfg80211: fix potential infinite loop
    spi: stm32: fix reference leak in stm32_spi_resume
    bpf: Fix tests for local_storage
    x86/mce: Correct the detection of invalid notifier priorities
    drm/edid: Fix uninitialized variable in drm_cvt_modes()
    ath11k: Initialize complete alpha2 for regulatory change
    ath11k: Fix number of rules in filtered ETSI regdomain
    ath11k: fix wmi init configuration
    brcmfmac: Fix memory leak for unpaired brcmf_{alloc/free}
    arm64: dts: exynos: Include common syscon restart/poweroff for Exynos7
    arm64: dts: exynos: Correct psci compatible used on Exynos7
    drm/panel: simple: Add flags to boe_nv133fhm_n61
    Bluetooth: Fix null pointer dereference in hci_event_packet()
    Bluetooth: Fix: LL PRivacy BLE device fails to connect
    Bluetooth: hci_h5: fix memory leak in h5_close
    spi: stm32-qspi: fix reference leak in stm32 qspi operations
    spi: spi-ti-qspi: fix reference leak in ti_qspi_setup
    spi: mt7621: fix missing clk_disable_unprepare() on error in mt7621_spi_probe
    spi: tegra20-slink: fix reference leak in slink ops of tegra20
    spi: tegra20-sflash: fix reference leak in tegra_sflash_resume
    spi: tegra114: fix reference leak in tegra spi ops
    spi: bcm63xx-hsspi: fix missing clk_disable_unprepare() on error in bcm63xx_hsspi_resume
    spi: imx: fix reference leak in two imx operations
    ASoC: qcom: common: Fix refcounting in qcom_snd_parse_of()
    ath11k: Handle errors if peer creation fails
    mwifiex: fix mwifiex_shutdown_sw() causing sw reset failure
    drm/msm/a6xx: Clear shadow on suspend
    drm/msm/a5xx: Clear shadow on suspend
    firmware: tegra: fix strncpy()/strncat() confusion
    drm/msm/dp: return correct connection status after suspend
    drm/msm/dp: skip checking LINK_STATUS_UPDATED bit
    drm/msm/dp: do not notify audio subsystem if sink doesn't support audio
    selftests/run_kselftest.sh: fix dry-run typo
    selftest/bpf: Add missed ip6ip6 test back
    ASoC: wm8994: Fix PM disable depth imbalance on error
    ASoC: wm8998: Fix PM disable depth imbalance on error
    spi: sprd: fix reference leak in sprd_spi_remove
    virtiofs fix leak in setup
    ASoC: arizona: Fix a wrong free in wm8997_probe
    RDMa/mthca: Work around -Wenum-conversion warning
    ASoC: SOF: Intel: fix Kconfig dependency for SND_INTEL_DSP_CONFIG
    arm64: dts: ti: k3-am65*/j721e*: Fix unit address format error for dss node
    MIPS: BCM47XX: fix kconfig dependency bug for BCM47XX_BCMA
    drm/amdgpu: fix compute queue priority if num_kcq is less than 4
    soc: ti: omap-prm: Do not check rstst bit on deassert if already deasserted
    crypto: Kconfig - CRYPTO_MANAGER_EXTRA_TESTS requires the manager
    crypto: qat - fix status check in qat_hal_put_rel_rd_xfer()
    firmware: arm_scmi: Fix missing destroy_workqueue()
    drm/udl: Fix missing error code in udl_handle_damage()
    staging: greybus: codecs: Fix reference counter leak in error handling
    staging: gasket: interrupt: fix the missed eventfd_ctx_put() in gasket_interrupt.c
    scripts: kernel-doc: Restore anonymous enum parsing
    drm/amdkfd: Put ACPI table after using it
    ionic: use mc sync for multicast filters
    ionic: flatten calls to ionic_lif_rx_mode
    ionic: change set_rx_mode from_ndo to can_sleep
    media: tm6000: Fix sizeof() mismatches
    media: platform: add missing put_device() call in mtk_jpeg_clk_init()
    media: mtk-vcodec: add missing put_device() call in mtk_vcodec_init_dec_pm()
    media: mtk-vcodec: add missing put_device() call in mtk_vcodec_release_dec_pm()
    media: mtk-vcodec: add missing put_device() call in mtk_vcodec_init_enc_pm()
    media: v4l2-fwnode: Return -EINVAL for invalid bus-type
    media: v4l2-fwnode: v4l2_fwnode_endpoint_parse caller must init vep argument
    media: ov5640: fix support of BT656 bus mode
    media: staging: rkisp1: cap: fix runtime PM imbalance on error
    media: cedrus: fix reference leak in cedrus_start_streaming
    media: platform: add missing put_device() call in mtk_jpeg_probe() and mtk_jpeg_remove()
    media: venus: core: change clk enable and disable order in resume and suspend
    media: venus: core: vote for video-mem path
    media: venus: core: vote with average bandwidth and peak bandwidth as zero
    RDMA/cma: Add missing error handling of listen_id
    ASoC: meson: fix COMPILE_TEST error
    spi: dw: fix build error by selecting MULTIPLEXER
    scsi: core: Fix VPD LUN ID designator priorities
    media: venus: put dummy vote on video-mem path after last session release
    media: solo6x10: fix missing snd_card_free in error handling case
    video: fbdev: atmel_lcdfb: fix return error code in atmel_lcdfb_of_init()
    mmc: sdhci: tegra: fix wrong unit with busy_timeout
    drm/omap: dmm_tiler: fix return error code in omap_dmm_probe()
    drm/meson: Free RDMA resources after tearing down DRM
    drm/meson: Unbind all connectors on module removal
    drm/meson: dw-hdmi: Register a callback to disable the regulator
    drm/meson: dw-hdmi: Ensure that clocks are enabled before touching the TOP registers
    ASoC: intel: SND_SOC_INTEL_KEEMBAY should depend on ARCH_KEEMBAY
    iommu/vt-d: include conditionally on CONFIG_INTEL_IOMMU_SVM
    Input: ads7846 - fix race that causes missing releases
    Input: ads7846 - fix integer overflow on Rt calculation
    Input: ads7846 - fix unaligned access on 7845
    bus: mhi: core: Remove double locking from mhi_driver_remove()
    bus: mhi: core: Fix null pointer access when parsing MHI configuration
    usb/max3421: fix return error code in max3421_probe()
    spi: mxs: fix reference leak in mxs_spi_probe
    selftests/bpf: Fix broken riscv build
    powerpc: Avoid broken GCC __attribute__((optimize))
    powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32
    ARM: dts: tacoma: Fix node vs reg mismatch for flash memory
    Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"
    powerpc/powernv/sriov: fix unsigned int win compared to less than zero
    mfd: htc-i2cpld: Add the missed i2c_put_adapter() in htcpld_register_chip_i2c()
    mfd: MFD_SL28CPLD should depend on ARCH_LAYERSCAPE
    mfd: stmfx: Fix dev_err_probe() call in stmfx_chip_init()
    mfd: cpcap: Fix interrupt regression with regmap clear_ack
    EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId
    scsi: ufs: Avoid to call REQ_CLKS_OFF to CLKS_OFF
    scsi: ufs: Fix clkgating on/off
    rcu: Allow rcu_irq_enter_check_tick() from NMI
    rcu,ftrace: Fix ftrace recursion
    rcu/tree: Defer kvfree_rcu() allocation to a clean context
    crypto: crypto4xx - Replace bitwise OR with logical OR in crypto4xx_build_pd
    crypto: omap-aes - Fix PM disable depth imbalance in omap_aes_probe
    crypto: sun8i-ce - fix two error path's memory leak
    spi: fix resource leak for drivers without .remove callback
    drm/meson: dw-hdmi: Disable clocks on driver teardown
    drm/meson: dw-hdmi: Enable the iahb clock early enough
    PCI: Disable MSI for Pericom PCIe-USB adapter
    PCI: brcmstb: Initialize "tmp" before use
    soc: ti: knav_qmss: fix reference leak in knav_queue_probe
    soc: ti: Fix reference imbalance in knav_dma_probe
    drivers: soc: ti: knav_qmss_queue: Fix error return code in knav_queue_probe
    soc: qcom: initialize local variable
    arm64: dts: qcom: sm8250: correct compatible for sm8250-mtp
    arm64: dts: qcom: msm8916-samsung-a2015: Disable muic i2c pin bias
    Input: omap4-keypad - fix runtime PM error handling
    clk: meson: Kconfig: fix dependency for G12A
    staging: mfd: hi6421-spmi-pmic: fix error return code in hi6421_spmi_pmic_probe()
    ath11k: Fix the rx_filter flag setting for peer rssi stats
    RDMA/cxgb4: Validate the number of CQEs
    soundwire: Fix DEBUG_LOCKS_WARN_ON for uninitialized attribute
    pinctrl: sunxi: fix irq bank map for the Allwinner A100 pin controller
    memstick: fix a double-free bug in memstick_check
    ARM: dts: at91: sam9x60: add pincontrol for USB Host
    ARM: dts: at91: sama5d4_xplained: add pincontrol for USB Host
    ARM: dts: at91: sama5d3_xplained: add pincontrol for USB Host
    mmc: pxamci: Fix error return code in pxamci_probe
    brcmfmac: fix error return code in brcmf_cfg80211_connect()
    orinoco: Move context allocation after processing the skb
    qtnfmac: fix error return code in qtnf_pcie_probe()
    rsi: fix error return code in rsi_reset_card()
    cw1200: fix missing destroy_workqueue() on error in cw1200_init_common
    dmaengine: mv_xor_v2: Fix error return code in mv_xor_v2_probe()
    arm64: dts: qcom: sdm845: Limit ipa iommu streams
    leds: netxbig: add missing put_device() call in netxbig_leds_get_of_pdata()
    leds: lp50xx: Fix an error handling path in 'lp50xx_probe_dt()'
    leds: turris-omnia: check for LED_COLOR_ID_RGB instead LED_COLOR_ID_MULTI
    arm64: tegra: Fix DT binding for IO High Voltage entry
    RDMA/cma: Fix deadlock on &lock in rdma_cma_listen_on_all() error unwind
    soundwire: qcom: Fix build failure when slimbus is module
    drm/imx/dcss: fix rotations for Vivante tiled formats
    media: siano: fix memory leak of debugfs members in smsdvb_hotplug
    platform/x86: mlx-platform: Remove PSU EEPROM from default platform configuration
    platform/x86: mlx-platform: Remove PSU EEPROM from MSN274x platform configuration
    arm64: dts: qcom: sc7180: limit IPA iommu streams
    RDMA/hns: Only record vlan info for HIP08
    RDMA/hns: Fix missing fields in address vector
    RDMA/hns: Avoid setting loopback indicator when smac is same as dmac
    serial: 8250-mtk: Fix reference leak in mtk8250_probe
    samples: bpf: Fix lwt_len_hist reusing previous BPF map
    media: imx214: Fix stop streaming
    mips: cdmm: fix use-after-free in mips_cdmm_bus_discover
    media: max2175: fix max2175_set_csm_mode() error code
    slimbus: qcom-ngd-ctrl: Avoid sending power requests without QMI
    RDMA/core: Track device memory MRs
    drm/mediatek: Use correct aliases name for ovl
    HSI: omap_ssi: Don't jump to free ID in ssi_add_controller()
    ARM: dts: Remove non-existent i2c1 from 98dx3236
    arm64: dts: armada-3720-turris-mox: update ethernet-phy handle name
    power: supply: bq25890: Use the correct range for IILIM register
    arm64: dts: rockchip: Set dr_mode to "host" for OTG on rk3328-roc-cc
    power: supply: max17042_battery: Fix current_{avg,now} hiding with no current sense
    power: supply: axp288_charger: Fix HP Pavilion x2 10 DMI matching
    power: supply: bq24190_charger: fix reference leak
    genirq/irqdomain: Don't try to free an interrupt that has no mapping
    arm64: dts: ls1028a: fix ENETC PTP clock input
    arm64: dts: ls1028a: fix FlexSPI clock input
    arm64: dts: freescale: sl28: combine SPI MTD partitions
    phy: tegra: xusb: Fix usb_phy device driver field
    arm64: dts: qcom: c630: Polish i2c-hid devices
    arm64: dts: qcom: c630: Fix pinctrl pins properties
    PCI: Bounds-check command-line resource alignment requests
    PCI: Fix overflow in command-line resource alignment requests
    PCI: iproc: Fix out-of-bound array accesses
    PCI: iproc: Invalidate correct PAXB inbound windows
    arm64: dts: meson: fix spi-max-frequency on Khadas VIM2
    arm64: dts: meson-sm1: fix typo in opp table
    soc: amlogic: canvas: add missing put_device() call in meson_canvas_get()
    scsi: hisi_sas: Fix up probe error handling for v3 hw
    scsi: pm80xx: Do not sleep in atomic context
    spi: spi-fsl-dspi: Use max_native_cs instead of num_chipselect to set SPI_MCR
    ARM: dts: at91: at91sam9rl: fix ADC triggers
    RDMA/hns: Fix 0-length sge calculation error
    RDMA/hns: Bugfix for calculation of extended sge
    mailbox: arm_mhu_db: Fix mhu_db_shutdown by replacing kfree with devm_kfree
    soundwire: master: use pm_runtime_set_active() on add
    platform/x86: dell-smbios-base: Fix error return code in dell_smbios_init
    ASoC: Intel: Boards: tgl_max98373: update TDM slot_width
    media: max9271: Fix GPIO enable/disable
    media: rdacm20: Enable GPIO1 explicitly
    media: i2c: imx219: Selection compliance fixes
    ath11k: Don't cast ath11k_skb_cb to ieee80211_tx_info.control
    ath11k: Reset ath11k_skb_cb before setting new flags
    ath11k: Fix an error handling path
    ath10k: Fix the parsing error in service available event
    ath10k: Fix an error handling path
    ath10k: Release some resources in an error handling path
    SUNRPC: rpc_wake_up() should wake up tasks in the correct order
    NFSv4.2: condition READDIR's mask for security label based on LSM state
    SUNRPC: xprt_load_transport() needs to support the netid "rdma6"
    NFSv4: Fix the alignment of page data in the getdeviceinfo reply
    net: sunrpc: Fix 'snprintf' return value check in 'do_xprt_debugfs'
    lockd: don't use interval-based rebinding over TCP
    NFS: switch nfsiod to be an UNBOUND workqueue.
    selftests/seccomp: Update kernel config
    vfio-pci: Use io_remap_pfn_range() for PCI IO memory
    hwmon: (ina3221) Fix PM usage counter unbalance in ina3221_write_enable
    f2fs: fix double free of unicode map
    media: tvp5150: Fix wrong return value of tvp5150_parse_dt()
    media: saa7146: fix array overflow in vidioc_s_audio()
    powerpc/perf: Fix crash with is_sier_available when pmu is not set
    powerpc/64: Fix an EMIT_BUG_ENTRY in head_64.S
    powerpc/xmon: Fix build failure for 8xx
    powerpc/perf: Fix to update radix_scope_qual in power10
    powerpc/perf: Update the PMU group constraints for l2l3 events in power10
    powerpc/perf: Fix the PMU group constraints for threshold events in power10
    clocksource/drivers/orion: Add missing clk_disable_unprepare() on error path
    clocksource/drivers/cadence_ttc: Fix memory leak in ttc_setup_clockevent()
    clocksource/drivers/ingenic: Fix section mismatch
    clocksource/drivers/riscv: Make RISCV_TIMER depends on RISCV_SBI
    arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE
    iio: hrtimer-trigger: Mark hrtimer to expire in hard interrupt context
    libbpf: Sanitise map names before pinning
    ARM: dts: at91: sam9x60ek: remove bypass property
    ARM: dts: at91: sama5d2: map securam as device
    scripts: kernel-doc: fix parsing function-like typedefs
    bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()
    selftests/bpf: Fix invalid use of strncat in test_sockmap
    pinctrl: falcon: add missing put_device() call in pinctrl_falcon_probe()
    soc: rockchip: io-domain: Fix error return code in rockchip_iodomain_probe()
    arm64: dts: rockchip: Fix UART pull-ups on rk3328
    memstick: r592: Fix error return in r592_probe()
    MIPS: Don't round up kernel sections size for memblock_add()
    mt76: mt7663s: fix a possible ple quota underflow
    mt76: mt7915: set fops_sta_stats.owner to THIS_MODULE
    mt76: set fops_tx_stats.owner to THIS_MODULE
    mt76: dma: fix possible deadlock running mt76_dma_cleanup
    net/mlx5: Properly convey driver version to firmware
    mt76: fix memory leak if device probing fails
    mt76: fix tkip configuration for mt7615/7663 devices
    ASoC: jz4740-i2s: add missed checks for clk_get()
    ASoC: q6afe-clocks: Add missing parent clock rate
    dm ioctl: fix error return code in target_message
    ASoC: cros_ec_codec: fix uninitialized memory read
    ASoC: atmel: mchp-spdifrx needs COMMON_CLK
    ASoC: qcom: fix QDSP6 dependencies, attempt #3
    phy: mediatek: allow compile-testing the hdmi phy
    phy: renesas: rcar-gen3-usb2: disable runtime pm in case of failure
    memory: ti-emif-sram: only build for ARMv7
    memory: jz4780_nemc: Fix potential NULL dereference in jz4780_nemc_probe()
    drm/msm: a5xx: Make preemption reset case reentrant
    drm/msm: add IOMMU_SUPPORT dependency
    clocksource/drivers/arm_arch_timer: Use stable count reader in erratum sne
    clocksource/drivers/arm_arch_timer: Correct fault programming of CNTKCTL_EL1.EVNTI
    cpufreq: ap806: Add missing MODULE_DEVICE_TABLE
    cpufreq: highbank: Add missing MODULE_DEVICE_TABLE
    cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE
    cpufreq: qcom: Add missing MODULE_DEVICE_TABLE
    cpufreq: st: Add missing MODULE_DEVICE_TABLE
    cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE
    cpufreq: loongson1: Add missing MODULE_ALIAS
    cpufreq: scpi: Add missing MODULE_ALIAS
    cpufreq: vexpress-spc: Add missing MODULE_ALIAS
    cpufreq: imx: fix NVMEM_IMX_OCOTP dependency
    macintosh/adb-iop: Always wait for reply message from IOP
    macintosh/adb-iop: Send correct poll command
    staging: bcm2835: fix vchiq_mmal dependencies
    staging: greybus: audio: Fix possible leak free widgets in gbaudio_dapm_free_controls
    spi: dw: Fix error return code in dw_spi_bt1_probe()
    Bluetooth: btusb: Add the missed release_firmware() in btusb_mtk_setup_firmware()
    Bluetooth: btmtksdio: Add the missed release_firmware() in mtk_setup_firmware()
    Bluetooth: sco: Fix crash when using BT_SNDMTU/BT_RCVMTU option
    block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name
    block/rnbd: fix a null pointer dereference on dev->blk_symlink_name
    Bluetooth: btusb: Fix detection of some fake CSR controllers with a bcdDevice val of 0x0134
    platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on some HP x360 models
    adm8211: fix error return code in adm8211_probe()
    mtd: spi-nor: sst: fix BPn bits for the SST25VF064C
    mtd: spi-nor: ignore errors in spi_nor_unlock_all()
    mtd: spi-nor: atmel: remove global protection flag
    mtd: spi-nor: atmel: fix unlock_all() for AT25FS010/040
    arm64: dts: meson: g12b: odroid-n2: fix PHY deassert timing requirements
    arm64: dts: meson: fix PHY deassert timing requirements
    ARM: dts: meson: fix PHY deassert timing requirements
    arm64: dts: meson: g12a: x96-max: fix PHY deassert timing requirements
    arm64: dts: meson: g12b: w400: fix PHY deassert timing requirements
    clk: fsl-sai: fix memory leak
    scsi: qedi: Fix missing destroy_workqueue() on error in __qedi_probe
    scsi: pm80xx: Fix error return in pm8001_pci_probe()
    scsi: iscsi: Fix inappropriate use of put_device()
    seq_buf: Avoid type mismatch for seq_buf_init
    scsi: fnic: Fix error return code in fnic_probe()
    platform/x86: mlx-platform: Fix item counter assignment for MSN2700, MSN24xx systems
    platform/x86: mlx-platform: Fix item counter assignment for MSN2700/ComEx system
    ARM: 9030/1: entry: omit FP emulation for UND exceptions taken in kernel mode
    powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops
    powerpc/pseries/hibernation: remove redundant cacheinfo update
    powerpc/powermac: Fix low_sleep_handler with CONFIG_VMAP_STACK
    drm/mediatek: avoid dereferencing a null hdmi_phy on an error message
    ASoC: amd: change clk_get() to devm_clk_get() and add missed checks
    coresight: remove broken __exit annotations
    ASoC: max98390: Fix error codes in max98390_dsm_init()
    powerpc/mm: sanity_check_fault() should work for all, not only BOOK3S
    usb: ehci-omap: Fix PM disable depth umbalance in ehci_hcd_omap_probe
    usb: oxu210hp-hcd: Fix memory leak in oxu_create
    speakup: fix uninitialized flush_lock
    nfsd: Fix message level for normal termination
    NFSD: Fix 5 seconds delay when doing inter server copy
    nfs_common: need lock during iterate through the list
    x86/kprobes: Restore BTF if the single-stepping is cancelled
    scsi: qla2xxx: Fix FW initialization error on big endian machines
    scsi: qla2xxx: Fix N2N and NVMe connect retry failure
    platform/chrome: cros_ec_spi: Don't overwrite spi::mode
    misc: pci_endpoint_test: fix return value of error branch
    bus: fsl-mc: add back accidentally dropped error check
    bus: fsl-mc: fix error return code in fsl_mc_object_allocate()
    fsi: Aspeed: Add mutex to protect HW access
    s390/cio: fix use-after-free in ccw_device_destroy_console
    iwlwifi: dbg-tlv: fix old length in is_trig_data_contained()
    iwlwifi: mvm: hook up missing RX handlers
    erofs: avoid using generic_block_bmap
    clk: renesas: r8a779a0: Fix R and OSC clocks
    can: m_can: m_can_config_endisable(): remove double clearing of clock stop request bit
    powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
    powerpc/sstep: Cover new VSX instructions under CONFIG_VSX
    slimbus: qcom: fix potential NULL dereference in qcom_slim_prg_slew()
    ALSA: hda/hdmi: fix silent stream for first playback to DP
    RDMA/core: Do not indicate device ready when device enablement fails
    RDMA/uverbs: Fix incorrect variable type
    remoteproc/mediatek: change MT8192 CFG register base
    remoteproc/mtk_scp: surround DT device IDs with CONFIG_OF
    remoteproc: q6v5-mss: fix error handling in q6v5_pds_enable
    remoteproc: qcom: fix reference leak in adsp_start
    remoteproc: qcom: pas: fix error handling in adsp_pds_enable
    remoteproc: k3-dsp: Fix return value check in k3_dsp_rproc_of_get_memories()
    remoteproc: qcom: Fix potential NULL dereference in adsp_init_mmio()
    remoteproc/mediatek: unprepare clk if scp_before_load fails
    clk: qcom: gcc-sc7180: Use floor ops for sdcc clks
    clk: tegra: Fix duplicated SE clock entry
    mtd: rawnand: gpmi: fix reference count leak in gpmi ops
    mtd: rawnand: meson: Fix a resource leak in init
    mtd: rawnand: gpmi: Fix the random DMA timeout issue
    samples/bpf: Fix possible hang in xdpsock with multiple threads
    fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode()
    extcon: max77693: Fix modalias string
    crypto: atmel-i2c - select CONFIG_BITREVERSE
    mac80211: don't set set TDLS STA bandwidth wider than possible
    mac80211: fix a mistake check for rx_stats update
    ASoC: wm_adsp: remove "ctl" from list on error in wm_adsp_create_control()
    irqchip/alpine-msi: Fix freeing of interrupts on allocation error path
    irqchip/ti-sci-inta: Fix printing of inta id on probe success
    irqchip/ti-sci-intr: Fix freeing of irqs
    dmaengine: ti: k3-udma: Correct normal channel offset when uchan_cnt is not 0
    RDMA/hns: Limit the length of data copied between kernel and userspace
    RDMA/hns: Normalization the judgment of some features
    RDMA/hns: Do shift on traffic class when using RoCEv2
    gpiolib: irq hooks: fix recursion in gpiochip_irq_unmask
    ath11k: Fix incorrect tlvs in scan start command
    irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling
    watchdog: armada_37xx: Add missing dependency on HAS_IOMEM
    watchdog: sirfsoc: Add missing dependency on HAS_IOMEM
    watchdog: sprd: remove watchdog disable from resume fail path
    watchdog: sprd: check busy bit before new loading rather than after that
    watchdog: Fix potential dereferencing of null pointer
    ubifs: Fix error return code in ubifs_init_authentication()
    um: Monitor error events in IRQ controller
    um: tty: Fix handling of close in tty lines
    um: chan_xterm: Fix fd leak
    sunrpc: fix xs_read_xdr_buf for partial pages receive
    RDMA/mlx5: Fix MR cache memory leak
    RDMA/cma: Don't overwrite sgid_attr after device is released
    nfc: s3fwrn5: Release the nfc firmware
    drm: mxsfb: Silence -EPROBE_DEFER while waiting for bridge
    powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
    powerpc/ps3: use dma_mapping_error()
    perf test: Fix metric parsing test
    drm/amdgpu: fix regression in vbios reservation handling on headless
    mm/gup: reorganize internal_get_user_pages_fast()
    mm/gup: prevent gup_fast from racing with COW during fork
    mm/gup: combine put_compound_head() and unpin_user_page()
    mm: memcg/slab: fix return of child memcg objcg for root memcg
    mm: memcg/slab: fix use after free in obj_cgroup_charge
    mm/rmap: always do TTU_IGNORE_ACCESS
    sparc: fix handling of page table constructor failure
    mm/vmalloc: Fix unlock order in s_stop()
    mm/vmalloc.c: fix kasan shadow poisoning size
    mm,memory_failure: always pin the page in madvise_inject_error
    hugetlb: fix an error code in hugetlb_reserve_pages()
    mm: don't wake kswapd prematurely when watermark boosting is disabled
    proc: fix lookup in /proc/net subdirectories after setns(2)
    checkpatch: fix unescaped left brace
    s390/test_unwind: fix CALL_ON_STACK tests
    lan743x: fix rx_napi_poll/interrupt ping-pong
    ice, xsk: clear the status bits for the next_to_use descriptor
    i40e, xsk: clear the status bits for the next_to_use descriptor
    net: dsa: qca: ar9331: fix sleeping function called from invalid context bug
    dpaa2-eth: fix the size of the mapped SGT buffer
    net: bcmgenet: Fix a resource leak in an error handling path in the probe functin
    net: mscc: ocelot: Fix a resource leak in the error handling path of the probe function
    net: allwinner: Fix some resources leak in the error handling path of the probe and in the remove function
    block/rnbd-clt: Get rid of warning regarding size argument in strlcpy
    block/rnbd-clt: Fix possible memleak
    NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()
    net: korina: fix return value
    devlink: use _BITUL() macro instead of BIT() in the UAPI header
    libnvdimm/label: Return -ENXIO for no slot in __blk_label_update
    powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
    watchdog: qcom: Avoid context switch in restart handler
    watchdog: coh901327: add COMMON_CLK dependency
    clk: ti: Fix memleak in ti_fapll_synth_setup
    pwm: zx: Add missing cleanup in error path
    pwm: lp3943: Dynamically allocate PWM chip base
    pwm: imx27: Fix overflow for bigger periods
    pwm: sun4i: Remove erroneous else branch
    io_uring: cancel only requests of current task
    tools build: Add missing libcap to test-all.bin target
    perf record: Fix memory leak when using '--user-regs=?' to list registers
    qlcnic: Fix error code in probe
    nfp: move indirect block cleanup to flower app stop callback
    vdpa/mlx5: Use write memory barrier after updating CQ index
    virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed()
    virtio_net: Fix error code in probe()
    virtio_ring: Fix two use after free bugs
    vhost scsi: fix error return code in vhost_scsi_set_endpoint()
    epoll: check for events when removing a timed out thread from the wait queue
    clk: bcm: dvp: Add MODULE_DEVICE_TABLE()
    clk: at91: sama7g5: fix compilation error
    clk: at91: sam9x60: remove atmel,osc-bypass support
    clk: s2mps11: Fix a resource leak in error handling paths in the probe function
    clk: sunxi-ng: Make sure divider tables have sentinel
    clk: vc5: Use "idt,voltage-microvolt" instead of "idt,voltage-microvolts"
    kconfig: fix return value of do_error_if()
    powerpc/boot: Fix build of dts/fsl
    powerpc/smp: Add __init to init_big_cores()
    ARM: 9044/1: vfp: use undef hook for VFP support detection
    ARM: 9036/1: uncompress: Fix dbgadtb size parameter name
    perf probe: Fix memory leak when synthesizing SDT probes
    io_uring: fix racy IOPOLL flush overflow
    io_uring: cancel reqs shouldn't kill overflow list
    Smack: Handle io_uring kernel thread privileges
    proc mountinfo: make splice available again
    io_uring: fix io_cqring_events()'s noflush
    io_uring: fix racy IOPOLL completions
    io_uring: always let io_iopoll_complete() complete polled io
    vfio/pci: Move dummy_resources_list init in vfio_pci_probe()
    vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU
    media: gspca: Fix memory leak in probe
    io_uring: fix io_wqe->work_list corruption
    io_uring: fix 0-iov read buffer select
    io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()
    io_uring: fix ignoring xa_store errors
    io_uring: fix double io_uring free
    io_uring: make ctx cancel on exit targeted to actual ctx
    media: sunxi-cir: ensure IR is handled when it is continuous
    media: netup_unidvb: Don't leak SPI master in probe error path
    media: ipu3-cio2: Remove traces of returned buffers
    media: ipu3-cio2: Return actual subdev format
    media: ipu3-cio2: Serialise access to pad format
    media: ipu3-cio2: Validate mbus format in setting subdev format
    media: ipu3-cio2: Make the field on subdev format V4L2_FIELD_NONE
    Input: cyapa_gen6 - fix out-of-bounds stack access
    ALSA: hda/ca0132 - Change Input Source enum strings.
    ACPI: NFIT: Fix input validation of bus-family
    PM: ACPI: PCI: Drop acpi_pm_set_bridge_wakeup()
    Revert "ACPI / resources: Use AE_CTRL_TERMINATE to terminate resources walks"
    ACPI: PNP: compare the string length in the matching_id()
    ALSA: hda: Fix regressions on clear and reconfig sysfs
    ALSA: hda/ca0132 - Fix AE-5 rear headphone pincfg.
    ALSA: hda/realtek: make bass spk volume adjustable on a yoga laptop
    ALSA: hda/realtek - Enable headset mic of ASUS X430UN with ALC256
    ALSA: hda/realtek - Enable headset mic of ASUS Q524UQK with ALC255
    ALSA: hda/realtek - Add supported for more Lenovo ALC285 Headset Button
    ALSA: pcm: oss: Fix a few more UBSAN fixes
    ALSA/hda: apply jack fixup for the Acer Veriton N4640G/N6640G/N2510G
    ALSA: hda/realtek: Add quirk for MSI-GP73
    ALSA: hda/realtek: Apply jack fixup for Quanta NL3
    ALSA: hda/realtek: Remove dummy lineout on Acer TravelMate P648/P658
    ALSA: hda/realtek - Supported Dell fixed type headset
    ALSA: usb-audio: Add VID to support native DSD reproduction on FiiO devices
    ALSA: usb-audio: Disable sample read check if firmware doesn't give back
    ALSA: usb-audio: Add alias entry for ASUS PRIME TRX40 PRO-S
    ALSA: core: memalloc: add page alignment for iram
    s390/smp: perform initial CPU reset also for SMT siblings
    s390/kexec_file: fix diag308 subcode when loading crash kernel
    s390/idle: add missing mt_cycles calculation
    s390/idle: fix accounting with machine checks
    s390/dasd: fix hanging device offline processing
    s390/dasd: prevent inconsistent LCU device data
    s390/dasd: fix list corruption of pavgroup group list
    s390/dasd: fix list corruption of lcu list
    binder: add flag to clear buffer on txn complete
    ASoC: cx2072x: Fix doubly definitions of Playback and Capture streams
    ASoC: AMD Renoir - add DMI table to avoid the ACP mic probe (broken BIOS)
    ASoC: AMD Raven/Renoir - fix the PCI probe (PCI revision)
    staging: comedi: mf6x4: Fix AI end-of-conversion detection
    z3fold: simplify freeing slots
    z3fold: stricter locking and more careful reclaim
    perf/x86/intel: Add event constraint for CYCLE_ACTIVITY.STALLS_MEM_ANY
    perf/x86/intel: Fix rtm_abort_event encoding on Ice Lake
    perf/x86/intel/lbr: Fix the return type of get_lbr_cycles()
    powerpc/perf: Exclude kernel samples while counting events in user space.
    cpufreq: intel_pstate: Use most recent guaranteed performance values
    crypto: ecdh - avoid unaligned accesses in ecdh_set_secret()
    crypto: arm/aes-ce - work around Cortex-A57/A72 silion errata
    m68k: Fix WARNING splat in pmac_zilog driver
    Documentation: seqlock: s/LOCKTYPE/LOCKNAME/g
    EDAC/i10nm: Use readl() to access MMIO registers
    EDAC/amd64: Fix PCI component registration
    cpuset: fix race between hotplug work and later CPU offline
    dyndbg: fix use before null check
    USB: serial: mos7720: fix parallel-port state restore
    USB: serial: digi_acceleport: fix write-wakeup deadlocks
    USB: serial: keyspan_pda: fix dropped unthrottle interrupts
    USB: serial: keyspan_pda: fix write deadlock
    USB: serial: keyspan_pda: fix stalled writes
    USB: serial: keyspan_pda: fix write-wakeup use-after-free
    USB: serial: keyspan_pda: fix tx-unthrottle use-after-free
    USB: serial: keyspan_pda: fix write unthrottling
    btrfs: do not shorten unpin len for caching block groups
    btrfs: update last_byte_to_unpin in switch_commit_roots
    btrfs: fix race when defragmenting leads to unnecessary IO
    ext4: fix an IS_ERR() vs NULL check
    ext4: fix a memory leak of ext4_free_data
    ext4: fix deadlock with fs freezing and EA inodes
    ext4: don't remount read-only with errors=continue on reboot
    RISC-V: Fix usage of memblock_enforce_memory_limit
    arm64: dts: ti: k3-am65: mark dss as dma-coherent
    arm64: dts: marvell: keep SMMU disabled by default for Armada 7040 and 8040
    KVM: arm64: Introduce handling of AArch32 TTBCR2 traps
    KVM: x86: reinstate vendor-agnostic check on SPEC_CTRL cpuid bits
    KVM: SVM: Remove the call to sev_platform_status() during setup
    iommu/arm-smmu: Allow implementation specific write_s2cr
    iommu/arm-smmu-qcom: Read back stream mappings
    iommu/arm-smmu-qcom: Implement S2CR quirk
    ARM: dts: pandaboard: fix pinmux for gpio user button of Pandaboard ES
    ARM: dts: at91: sama5d2: fix CAN message ram offset and size
    ARM: tegra: Populate OPP table for Tegra20 Ventana
    xprtrdma: Fix XDRBUF_SPARSE_PAGES support
    powerpc/32: Fix vmap stack - Properly set r1 before activating MMU on syscall too
    powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_at
    powerpc/rtas: Fix typo of ibm,open-errinjct in RTAS filter
    powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()
    powerpc/feature: Add CPU_FTR_NOEXECUTE to G2_LE
    powerpc/xmon: Change printk() to pr_cont()
    powerpc/8xx: Fix early debug when SMC1 is relocated
    powerpc/mm: Fix verification of MMU_FTR_TYPE_44x
    powerpc/powernv/npu: Do not attempt NPU2 setup on POWER8NVL NPU
    powerpc/powernv/memtrace: Don't leak kernel memory to user space
    powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently
    ovl: make ioctl() safe
    ima: Don't modify file descriptor mode on the fly
    um: Remove use of asprinf in umid.c
    um: Fix time-travel mode
    ceph: fix race in concurrent __ceph_remove_cap invocations
    SMB3: avoid confusing warning message on mount to Azure
    SMB3.1.1: remove confusing mount warning when no SPNEGO info on negprot rsp
    SMB3.1.1: do not log warning message if server doesn't populate salt
    ubifs: wbuf: Don't leak kernel memory to flash
    jffs2: Fix GC exit abnormally
    jffs2: Fix ignoring mounting options problem during remounting
    fsnotify: generalize handle_inode_event()
    inotify: convert to handle_inode_event() interface
    fsnotify: fix events reported to watching parent and child
    jfs: Fix array index bounds check in dbAdjTree
    drm/panfrost: Fix job timeout handling
    drm/panfrost: Move the GPU reset bits outside the timeout handler
    platform/x86: mlx-platform: remove an unused variable
    drm/amdgpu: only set DP subconnector type on DP and eDP connectors
    drm/amd/display: Fix memory leaks in S3 resume
    drm/dp_aux_dev: check aux_dev before use in drm_dp_aux_dev_get_by_minor()
    drm/i915: Fix mismatch between misplaced vma check and vma insert
    iio: ad_sigma_delta: Don't put SPI transfer buffer on the stack
    spi: pxa2xx: Fix use-after-free on unbind
    spi: spi-sh: Fix use-after-free on unbind
    spi: atmel-quadspi: Fix use-after-free on unbind
    spi: spi-mtk-nor: Don't leak SPI master in probe error path
    spi: ar934x: Don't leak SPI master in probe error path
    spi: davinci: Fix use-after-free on unbind
    spi: fsl: fix use of spisel_boot signal on MPC8309
    spi: gpio: Don't leak SPI master in probe error path
    spi: mxic: Don't leak SPI master in probe error path
    spi: npcm-fiu: Disable clock in probe error path
    spi: pic32: Don't leak DMA channels in probe error path
    spi: rb4xx: Don't leak SPI master in probe error path
    spi: rpc-if: Fix use-after-free on unbind
    spi: sc18is602: Don't leak SPI master in probe error path
    spi: spi-geni-qcom: Fix use-after-free on unbind
    spi: spi-qcom-qspi: Fix use-after-free on unbind
    spi: st-ssc4: Fix unbalanced pm_runtime_disable() in probe error path
    spi: synquacer: Disable clock in probe error path
    spi: mt7621: Disable clock in probe error path
    spi: mt7621: Don't leak SPI master in probe error path
    spi: atmel-quadspi: Disable clock in probe error path
    spi: atmel-quadspi: Fix AHB memory accesses
    soc: qcom: smp2p: Safely acquire spinlock without IRQs
    mtd: spinand: Fix OOB read
    mtd: parser: cmdline: Fix parsing of part-names with colons
    mtd: core: Fix refcounting for unpartitioned MTDs
    mtd: rawnand: qcom: Fix DMA sync on FLASH_STATUS register read
    mtd: rawnand: meson: fix meson_nfc_dma_buffer_release() arguments
    scsi: qla2xxx: Fix crash during driver load on big endian machines
    scsi: lpfc: Fix invalid sleeping context in lpfc_sli4_nvmet_alloc()
    scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpi
    scsi: lpfc: Re-fix use after free in lpfc_rq_buf_free()
    openat2: reject RESOLVE_BENEATH|RESOLVE_IN_ROOT
    iio: buffer: Fix demux update
    iio: adc: rockchip_saradc: fix missing clk_disable_unprepare() on error in rockchip_saradc_resume
    iio: imu: st_lsm6dsx: fix edge-trigger interrupts
    iio:light:rpr0521: Fix timestamp alignment and prevent data leak.
    iio:light:st_uvis25: Fix timestamp alignment and prevent data leak.
    iio:magnetometer:mag3110: Fix alignment and data leak issues.
    iio:pressure:mpl3115: Force alignment of buffer
    iio:imu:bmi160: Fix too large a buffer.
    iio:imu:bmi160: Fix alignment and data leak issues
    iio:adc:ti-ads124s08: Fix buffer being too long.
    iio:adc:ti-ads124s08: Fix alignment and data leak issues.
    md/cluster: block reshape with remote resync job
    md/cluster: fix deadlock when node is doing resync job
    pinctrl: sunxi: Always call chained_irq_{enter, exit} in sunxi_pinctrl_irq_handler
    clk: ingenic: Fix divider calculation with div tables
    clk: mvebu: a3700: fix the XTAL MODE pin to MPP1_9
    clk: tegra: Do not return 0 on failure
    counter: microchip-tcb-capture: Fix CMR value check
    device-dax/core: Fix memory leak when rmmod dax.ko
    dma-buf/dma-resv: Respect num_fences when initializing the shared fence list.
    driver: core: Fix list corruption after device_del()
    xen-blkback: set ring->xenblkd to NULL after kthread_stop()
    xen/xenbus: Allow watches discard events before queueing
    xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path()
    xen/xenbus/xen_bus_type: Support will_handle watch callback
    xen/xenbus: Count pending messages for each watch
    xenbus/xenbus_backend: Disallow pending watch messages
    memory: jz4780_nemc: Fix an error pointer vs NULL check in probe()
    memory: renesas-rpc-if: Fix a node reference leak in rpcif_probe()
    memory: renesas-rpc-if: Return correct value to the caller of rpcif_manual_xfer()
    memory: renesas-rpc-if: Fix unbalanced pm_runtime_enable in rpcif_{enable,disable}_rpm
    libnvdimm/namespace: Fix reaping of invalidated block-window-namespace labels
    platform/x86: intel-vbtn: Allow switch events on Acer Switch Alpha 12
    tracing: Disable ftrace selftests when any tracer is running
    mt76: add back the SUPPORTS_REORDERING_BUFFER flag
    of: fix linker-section match-table corruption
    PCI: Fix pci_slot_release() NULL pointer dereference
    regulator: axp20x: Fix DLDO2 voltage control register mask for AXP22x
    remoteproc: sysmon: Ensure remote notification ordering
    thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changed
    rtc: ep93xx: Fix NULL pointer dereference in ep93xx_rtc_read_time
    Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
    null_blk: Fix zone size initialization
    null_blk: Fail zone append to conventional zones
    drm/edid: fix objtool warning in drm_cvt_modes()
    x86/CPU/AMD: Save AMD NodeId as cpu_die_id
    Linux 5.10.4

    Signed-off-by: Greg Kroah-Hartman
    Change-Id: I25209e79d8b9faf5382087955a29b7404bdefe38

    Greg Kroah-Hartman
     
  • [ Upstream commit 1e8aaedb182d6ddffc894b832e4962629907b3e0 ]

    madvise_inject_error() uses get_user_pages_fast() to translate the
    address we specified to a page. After [1], we drop the extra reference
    count on the memory_failure() path. That commit says that
    memory_failure() wanted to keep the pin in order to take the page out of
    circulation.

    The truth is that we need to keep the page pinned, otherwise the page
    might be re-used after the put_page() and we can end up messing with
    someone else's memory.

    E.g.:

    CPU0                                 CPU1
    process X
    madvise_inject_error
     get_user_pages
     put_page
                                         page gets reclaimed
                                         process Y allocates the page
     memory_failure
     // We mess with process Y memory

    madvise() is meant to operate on the caller's own address space, so
    messing with pages that do not belong to us seems the wrong thing to do.
    To avoid that, let us keep the page pinned for memory_failure as well.

    Pages for DAX mappings will release this extra refcount in
    memory_failure_dev_pagemap.

    [1] ("23e7b5c2e271: mm, madvise_inject_error:
    Let memory_failure() optionally take a page reference")

    Link: https://lkml.kernel.org/r/20201207094818.8518-1-osalvador@suse.de
    Fixes: 23e7b5c2e271 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
    Signed-off-by: Oscar Salvador
    Suggested-by: Vlastimil Babka
    Acked-by: Naoya Horiguchi
    Cc: Vlastimil Babka
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Oscar Salvador
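The CPU0/CPU1 race in the madvise_inject_error() fix above can be replayed with a toy reference-count model. This is a hypothetical userspace sketch, not kernel code: struct toy_page, toy_alloc_frame() and toy_inject_error() are invented for illustration, and a frame is assumed to become reusable only once its count drops to zero.

```c
#include <stdbool.h>

/* Hypothetical toy model of a page frame; this is NOT the kernel's struct page. */
struct toy_page {
	int refcount; /* outstanding references; 0 means the frame is free */
	int owner;    /* id of the process whose data the frame holds */
};

/* Hand the frame to a new owner, but only if nobody holds a reference. */
static bool toy_alloc_frame(struct toy_page *p, int new_owner)
{
	if (p->refcount != 0)
		return false;
	p->refcount = 1;
	p->owner = new_owner;
	return true;
}

/*
 * Replay the interleaving from the changelog and return the id of the
 * process whose memory memory_failure() would act on. keep_pin selects the
 * fixed behaviour: the get_user_pages reference is held until then.
 */
static int toy_inject_error(struct toy_page *p, int other_pid, bool keep_pin)
{
	p->refcount++;                 /* get_user_pages_fast() pins the page */
	if (!keep_pin)
		p->refcount--;         /* old code: put_page() right away */
	p->refcount--;                 /* reclaim drops the mapping's reference */
	toy_alloc_frame(p, other_pid); /* process Y tries to allocate a frame */
	return p->owner;               /* memory_failure() acts on the current owner */
}
```

Starting from a frame mapped once by process X (refcount 1, owner 1), keep_pin == false lets the frame reach zero and be handed to process Y before memory_failure() runs; keep_pin == true leaves the extra reference in place, so the frame stays out of circulation.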
     

10 Dec, 2020

1 commit


09 Dec, 2020

1 commit

  • Jann spotted a security hole caused by a race in the mm ownership check.

    If the task is sharing the mm_struct but goes through execve() before
    mm_access(), it could skip the process_madvise_behavior_valid() check.
    That allows *any advice hint* to reach into the remote process.

    This patch removes the mm ownership check. As a result, a local process
    loses the ability to give *any* advice hint through the vector
    interface for some reason (e.g., performance). Since there is no
    concrete example of such a user upstream yet, it is better to remove the
    ability for now and review it again when such new advice comes up.

    Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
    Reported-by: Jann Horn
    Suggested-by: Jann Horn
    Signed-off-by: Minchan Kim
    Signed-off-by: Linus Torvalds

    Minchan Kim
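After this change every caller goes through the same allow-list, whether or not it shares the target's mm. A standalone sketch of such a check follows; only the function name process_madvise_behavior_valid() comes from the changelog, and the body is assumed from the hints this history says process_madvise accepts (MADV_COLD and MADV_PAGEOUT).

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Fallbacks for older userspace headers; values match the Linux UAPI. */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/*
 * Allow-list applied to every caller once the mm ownership shortcut is
 * gone: only the reclaim hints are accepted, whoever the target is.
 */
static bool process_madvise_behavior_valid(int behavior)
{
	switch (behavior) {
	case MADV_COLD:
	case MADV_PAGEOUT:
		return true;
	default:
		return false;
	}
}
```

Because the decision depends only on the advice value, there is no ownership state left that an execve() could change between the check and mm_access().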
     

23 Nov, 2020

3 commits

  • Linux 5.10-rc5

    Signed-off-by: Greg Kroah-Hartman
    Change-Id: Ia5b23cceb3e0212c1c841f1297ecfab65cc9aaa6

    Greg Kroah-Hartman
     
  • The calculation of the end page index was incorrect, leading to a
    regression of 70% when running stress-ng.

    With this fix, we instead see a performance improvement of 3%.

    Fixes: e6e88712e43b ("mm: optimise madvise WILLNEED")
    Reported-by: kernel test robot
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Tested-by: Xing Zhengjun
    Acked-by: Johannes Weiner
    Cc: William Kucharski
    Cc: Feng Tang
    Cc: "Chen, Rong A"
    Link: https://lkml.kernel.org/r/20201109134851.29692-1-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
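The changelog does not spell out the bug, but the kind of mistake involved can be sketched: a page-cache index has to be computed relative to the VMA (address minus vm_start, shifted by the page size, plus vm_pgoff), not by dividing a raw virtual address by the page size. Treating the WILLNEED fix as an instance of exactly this is an assumption; struct toy_vma and the helpers below are hypothetical, and 4 KiB pages are assumed.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE (UINT64_C(1) << TOY_PAGE_SHIFT)

/* Minimal stand-in for the two vm_area_struct fields the index math needs. */
struct toy_vma {
	uint64_t vm_start; /* first virtual address of the mapping */
	uint64_t vm_pgoff; /* file page offset mapped at vm_start */
};

/* Same shape as linear_page_index() for non-hugetlb mappings. */
static uint64_t toy_linear_page_index(const struct toy_vma *vma, uint64_t addr)
{
	return ((addr - vma->vm_start) >> TOY_PAGE_SHIFT) + vma->vm_pgoff;
}

/* Suspect pattern: treats a virtual address as if it were a file offset. */
static uint64_t naive_end_index(uint64_t end)
{
	return end / TOY_PAGE_SIZE;
}

/* Index bound relative to the mapping; a partial last page rounds up. */
static uint64_t mapped_end_index(const struct toy_vma *vma, uint64_t end)
{
	return toy_linear_page_index(vma, end + TOY_PAGE_SIZE - 1);
}
```

For a 16-page mapping at 0x7f0000000000 with vm_pgoff 0, the relative bound is 16, while dividing the address gives a bound in the tens of billions, so a loop limited by it can scan page-cache entries far beyond the advised range.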
     
  • The early return in process_madvise() will produce a memory leak.

    Fix it.

    Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
    Signed-off-by: Eric Dumazet
    Signed-off-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20201116155132.GA3805951@google.com
    Signed-off-by: Linus Torvalds

    Eric Dumazet
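The usual kernel idiom for this class of bug is a single cleanup label that every exit after the allocation passes through. The sketch below is hypothetical userspace C: toy_process_madvise() and the tracked allocator exist only to make the leak observable, and the allocation presumably corresponds to the iovec array the syscall imports.

```c
#include <errno.h>
#include <stdlib.h>

static int live_allocs; /* hypothetical leak counter for this sketch */

static void *tracked_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/*
 * Hypothetical stand-in for process_madvise(): the vector is allocated
 * first, so every later exit must go through free_iov.
 */
static long toy_process_madvise(size_t vlen, int behavior_ok)
{
	long ret;
	void *iov = tracked_alloc(vlen * 16);

	if (!iov)
		return -ENOMEM;  /* nothing allocated yet, a direct return is safe */

	if (!behavior_ok) {
		ret = -EINVAL;
		goto free_iov;   /* a plain "return -EINVAL;" here would leak iov */
	}

	ret = (long)vlen;        /* placeholder for the real work */
free_iov:
	tracked_free(iov);
	return ret;
}
```

Keeping one exit path after the allocation means a later validation step cannot forget the free.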
     

26 Oct, 2020

2 commits


25 Oct, 2020

1 commit


19 Oct, 2020

2 commits

  • There is a use case where System Management Software (SMS) wants to give
    a memory hint like MADV_[COLD|PAGEOUT] to other processes; in the case of
    Android, that software is the ActivityManagerService.

    The information required to make the reclaim decision is not known to the
    app. Instead, it is known to the centralized userspace daemon
    (ActivityManagerService), and that daemon must be able to initiate
    reclaim on its own without any app involvement.

    To solve the issue, this patch introduces a new syscall,
    process_madvise(2). It uses a pidfd of an external process to give the
    hint. It also supports a vector of address ranges, because an Android
    app has thousands of VMAs due to zygote, so it would be a waste of CPU
    and power to call the syscall once per VMA. (In testing, a single
    vector syscall showed a 15% performance improvement over 2000 per-VMA
    syscalls. I think the gain would be bigger in real practice because the
    test ran in a very cache-friendly environment.)

    Another potential use case for the vector range is to amortize the cost
    of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
    benefit users like TCP receive zerocopy and malloc implementations. In
    the future we may find more use cases for other advice values, so let's
    build vector support into the API now, since we are introducing a new
    syscall anyway. With that, existing madvise(2) users could switch to
    process_madvise(2) on their own pid if they want batched address range
    support.

    Since it could affect another process's address range, only a privileged
    process (PTRACE_MODE_ATTACH_FSCREDS), or one that otherwise has the right
    to ptrace the target (e.g., by having the same UID), can use it
    successfully. The flags argument is reserved for future use if we need to
    extend the API.

    I think supporting in process_madvise every hint that madvise supports,
    now or in the future, is rather risky, because we are not sure all hints
    make sense from an external process, and the implementation of a hint may
    rely on the caller being in the current context, so it could be
    error-prone. Thus, I limited the hints to MADV_[COLD|PAGEOUT] in this
    patch.

    If someone wants to add other hints, we can hear the use case and review
    each hint individually. That is safer for maintenance than introducing a
    buggy syscall that is hard to fix later.

    So finally, the API is as follows:

      ssize_t process_madvise(int pidfd, const struct iovec *iovec,
                              unsigned long vlen, int advice, unsigned int flags);

    DESCRIPTION
    The process_madvise() system call is used to give advice or directions
    to the kernel about the address ranges of an external process as well as
    the local process. It provides the advice for the address ranges of the
    process described by iovec and vlen. The goal of such advice is to
    improve system or application performance.

    The pidfd selects the process referred to by the PID file descriptor
    specified in pidfd. (See pidfd_open(2) for further information.)

    The pointer iovec points to an array of iovec structures, defined in
    as:

      struct iovec {
              void  *iov_base;    /* starting address */
              size_t iov_len;     /* number of bytes to be advised */
      };

    Each iovec describes an address range beginning at iov_base and
    spanning iov_len bytes.

    The vlen represents the number of elements in iovec.

    The advice is indicated in the advice argument, which is one of the
    following at this moment if the target process specified by pidfd is
    external.

    MADV_COLD
    MADV_PAGEOUT

    Permission to provide a hint to external process is governed by a
    ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

    process_madvise() supports every advice value that madvise(2) has if the
    target process is in the same thread group as the calling process, so
    users can use process_madvise(2) to extend existing madvise(2) usage to
    vector address ranges.

    RETURN VALUE
    On success, process_madvise() returns the number of bytes advised.
    This return value may be less than the total number of requested
    bytes if an error occurred. The caller should check the return value
    to determine whether a partial advice occurred.

    FAQ:

    Q.1 - Why does any external entity have better knowledge?

    Quote from Sandeep

    "For Android, every application (including the special SystemServer)
    are forked from Zygote. The reason of course is to share as many
    libraries and classes between the two as possible to benefit from the
    preloading during boot.

    After applications start, (almost) all of the APIs end up calling into
    this SystemServer process over IPC (binder) and back to the
    application.

    In a fully running system, the SystemServer monitors every single
    process periodically to calculate their PSS / RSS and also decides
    which process is "important" to the user for interactivity.

    So, because of how these processes start _and_ the fact that the
    SystemServer is looping to monitor each process, it does tend to *know*
    which address range of the application is not used / useful.

    Besides, we can never rely on applications to clean things up
    themselves. We've had the "hey app1, the system is low on memory,
    please trim your memory usage down" notifications for a long time[1].
    They rely on applications honoring the broadcasts and very few do.

    So, if we want to avoid the inevitable killing of the application and
    restarting it, some way to be able to tell the OS about unimportant
    memory in these applications will be useful.

    - ssp

    Q.2 - How is the race (i.e., object validation) between giving a hint
    from an external process and the target process receiving it handled?

    process_madvise operates on the target process's address space as it
    exists at the instant that process_madvise is called. If the target
    process can run between the time the calling process inspects the target
    process's address space and the time that process_madvise is actually
    called, process_madvise may operate on memory regions that the calling
    process does not expect. It's the responsibility of the process calling
    process_madvise to close this race condition. For example, the calling
    process can suspend the target process with ptrace, SIGSTOP, or the
    freezer cgroup so that it doesn't have an opportunity to change its own
    address space before process_madvise is called. Another option is to
    operate on memory regions that the caller knows a priori will be
    unchanged in the target process. Yet another option is to accept the
    race for certain process_madvise calls after reasoning that mistargeting
    will do no harm. The suggested API itself does not provide
    synchronization. The same applies to other APIs like move_pages and
    process_vm_writev.

    The race isn't really a problem, though. Why is it so wrong to require
    that callers do their own synchronization in some manner? Nobody
    objects to write(2) merely because it's possible for two processes to
    open the same file and clobber each other's writes --- instead, we tell
    people to use flock or something. Think about mmap. It never
    guarantees that newly allocated address space is still valid when the
    user tries to access it, because other threads could unmap the memory
    right before. That's where we need synchronization through other APIs
    or through design on the user side. It shouldn't be part of the API
    itself. If someone needs finer-grained synchronization than the process
    level, two ideas were suggested: a cookie[2] and an anon-fd[3]. Both
    could be implemented through the last, reserved argument of the API,
    but I don't think that's necessary right now, since there are already
    ways to prevent the race, so I don't want to add the complexity of a
    more fine-grained synchronization model.

    To keep the API extensible, the last argument (flags) is reserved, so
    we could support such a model in the future if someone really needs it.

    Q.3 - Why doesn't ptrace work?

    Injecting an madvise in the target process using ptrace would not work
    for us because such an injected madvise would have to be executed by the
    target process, which means that process would have to be runnable, and
    that creates the risk of the abovementioned race and hinting a wrong
    VMA. Furthermore, we want to act on the hint in the caller's context, not
    the callee's, because the callee is usually limited by cpusets/cgroups or
    even frozen, so it can't act by itself quickly enough, which causes more
    thrashing/kills. It also doesn't work if the target process is already
    ptraced (e.g., by strace, a debugger, or minidump), because a process can
    have at most one ptracer.

    [1] https://developer.android.com/topic/performance/memory"

    [2] process_getinfo for getting the cookie which is updated whenever
    vma of process address layout are changed - Daniel Colascione -
    https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

    [3] anonymous fd which is used for the object(i.e., address range)
    validation - Michal Hocko -
    https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

    [minchan@kernel.org: fix process_madvise build break for arm64]
    Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
    [minchan@kernel.org: fix build error for mips of process_madvise]
    Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
    [akpm@linux-foundation.org: fix patch ordering issue]
    [akpm@linux-foundation.org: fix arm64 whoops]
    [minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
    [akpm@linux-foundation.org: fix i386 build]
    [sfr@canb.auug.org.au: fix syscall numbering]
    Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
    [sfr@canb.auug.org.au: madvise.c needs compat.h]
    Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
    [minchan@kernel.org: fix mips build]
    Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
    [yuehaibing@huawei.com: remove duplicate header which is included twice]
    Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
    [minchan@kernel.org: do not use helper functions for process_madvise]
    Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
    [akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
    [sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
    Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au

    Signed-off-by: Minchan Kim
    Signed-off-by: YueHaibing
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Reviewed-by: Suren Baghdasaryan
    Reviewed-by: Vlastimil Babka
    Acked-by: David Rientjes
    Cc: Alexander Duyck
    Cc: Brian Geffon
    Cc: Christian Brauner
    Cc: Daniel Colascione
    Cc: Jann Horn
    Cc: Jens Axboe
    Cc: Joel Fernandes
    Cc: Johannes Weiner
    Cc: John Dias
    Cc: Kirill Tkhai
    Cc: Michal Hocko
    Cc: Oleksandr Natalenko
    Cc: Sandeep Patil
    Cc: SeongJae Park
    Cc: SeongJae Park
    Cc: Shakeel Butt
    Cc: Sonny Rao
    Cc: Tim Murray
    Cc: Christian Brauner
    Cc: Florian Weimer
    Cc:
    Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
    Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
    Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
    Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
    Signed-off-by: Linus Torvalds

    Minchan Kim
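The RETURN VALUE section of the process_madvise() changelog above asks callers to detect partial advice themselves. A minimal sketch of that check, assuming the caller still has the iovec array it passed in; iovec_total_len() and advice_was_partial() are hypothetical helpers, and struct iovec comes from <sys/uio.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Sum of iov_len over the vector: the number of bytes the caller asked to advise. */
static size_t iovec_total_len(const struct iovec *iov, size_t vlen)
{
	size_t total = 0;

	for (size_t i = 0; i < vlen; i++)
		total += iov[i].iov_len;
	return total;
}

/*
 * Interpret a process_madvise() result: a non-negative value smaller than
 * the requested total means only part of the vector was advised.
 * A negative value is a plain error, not a partial result.
 */
static bool advice_was_partial(ssize_t ret, const struct iovec *iov, size_t vlen)
{
	return ret >= 0 && (size_t)ret < iovec_total_len(iov, vlen);
}
```

A caller would compare the syscall's result against iovec_total_len() before deciding whether to retry or report the remaining ranges.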
     
  • Patch series "introduce memory hinting API for external process", v9.

    Now, we have MADV_PAGEOUT and MADV_COLD as madvise hinting APIs. With
    them, an application can hint to the kernel which memory ranges are
    preferred for reclaim. However, on some platforms (e.g., Android), the
    information required to make the hinting decision is not known to the app.
    Instead, it is known to a centralized userspace daemon (e.g.,
    ActivityManagerService), and that daemon must be able to initiate reclaim
    on its own without any app involvement.

    To solve the concern, this patch introduces a new syscall,
    process_madvise(2). Basically, it is the same as the madvise(2) syscall,
    with some differences.

    1. It needs pidfd of target process to provide the hint

    2. It supports only MADV_{COLD|PAGEOUT|MERGEABLE|UNMERGEABLE} at this
    moment. Other madvise hints will be enabled when there are explicit
    requests from the community, to prevent unexpected bugs we could not
    support.

    3. Only privileged processes can operate on another process's address
    space.

    For more detail of the new API, please see "mm: introduce external memory
    hinting API" description in this patchset.

    This patch (of 3):

    In upcoming patches, do_madvise will be called from an external process
    context, so we shouldn't assume "current" is always the hinted process's
    task_struct.

    Furthermore, we must not access mm_struct via task->mm, but obtain it via
    mm_access() once (in the following patch) and only use that pointer [1],
    so pass it to do_madvise() as well. Note the vma->vm_mm pointers are
    safe, so we can use them further down the call stack.

    For now, pass current->mm as the argument to do_madvise, so existing
    behavior does not change but the next patch is easier to review.

    [vbabka@suse.cz: changelog tweak]
    [minchan@kernel.org: use current->mm for io_uring]
    Link: http://lkml.kernel.org/r/20200423145215.72666-1-minchan@kernel.org
    [akpm@linux-foundation.org: fix it for upstream changes]
    [akpm@linux-foundation.org: whoops]
    [rdunlap@infradead.org: add missing includes]

    Signed-off-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Reviewed-by: Suren Baghdasaryan
    Reviewed-by: Vlastimil Babka
    Acked-by: David Rientjes
    Cc: Jens Axboe
    Cc: Jann Horn
    Cc: Tim Murray
    Cc: Daniel Colascione
    Cc: Sandeep Patil
    Cc: Sonny Rao
    Cc: Brian Geffon
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Shakeel Butt
    Cc: John Dias
    Cc: Joel Fernandes
    Cc: Alexander Duyck
    Cc: SeongJae Park
    Cc: Christian Brauner
    Cc: Kirill Tkhai
    Cc: Oleksandr Natalenko
    Cc: SeongJae Park
    Cc: Christian Brauner
    Cc: Florian Weimer
    Cc:
    Link: https://lkml.kernel.org/r/20200901000633.1920247-1-minchan@kernel.org
    Link: http://lkml.kernel.org/r/20200622192900.22757-1-minchan@kernel.org
    Link: http://lkml.kernel.org/r/20200302193630.68771-2-minchan@kernel.org
    Link: http://lkml.kernel.org/r/20200622192900.22757-2-minchan@kernel.org
    Link: https://lkml.kernel.org/r/20200901000633.1920247-2-minchan@kernel.org
    Signed-off-by: Linus Torvalds

    Minchan Kim
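The do_madvise() change above amounts to making the target address space an explicit parameter instead of reading it from the current task. The toy types and functions below are hypothetical stand-ins (toy_mm for struct mm_struct, toy_current_mm for current->mm) that only illustrate the calling convention, not the real madvise logic.

```c
/* Hypothetical toy address space; the real code uses struct mm_struct. */
struct toy_mm {
	int id;
	int advised; /* how many hints were applied to this address space */
};

static struct toy_mm self_mm = { 1, 0 };
static struct toy_mm *toy_current_mm = &self_mm; /* stands in for current->mm */

/* After the refactor: the target mm is an explicit argument, never implied. */
static int toy_do_madvise(struct toy_mm *mm, int behavior)
{
	(void)behavior;  /* the advice value is irrelevant to this sketch */
	mm->advised++;
	return 0;
}

/* madvise(2) keeps its old behaviour by passing the caller's own mm. */
static int toy_sys_madvise(int behavior)
{
	return toy_do_madvise(toy_current_mm, behavior);
}
```

The existing syscall path is unchanged in effect, while a future external caller can pass the mm it obtained for another process.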
     

17 Oct, 2020

3 commits

  • The preceding patches have ensured that core dumping properly takes the
    mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all
    its users.

    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     
  • Currently, there is an inconsistency when calling soft-offline from
    different paths on a page that is already poisoned.

    1) madvise:

    madvise_inject_error skips any poisoned page and continues
    the loop.
    If that was the only page to madvise, it returns 0.

    2) /sys/devices/system/memory/:

    When calling soft_offline_page_store()->soft_offline_page(),
    we return -EBUSY in case the page is already poisoned.
    This is inconsistent with a) the above example and b)
    memory_failure, where we return 0 if the page was poisoned.

    Fix this by dropping the PageHWPoison() check in madvise_inject_error, and
    let soft_offline_page return 0 if it finds the page already poisoned.

    Please note that this is a user API change, since the error returned
    when calling soft_offline_page_store()->soft_offline_page() will now be
    different.

    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-12-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     
  • Make a proper if-else condition for {hard,soft}-offline.

    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Cc: Michal Hocko
    Cc: Qian Cai
    Cc: Tony Luck
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Mike Kravetz
    Link: https://lkml.kernel.org/r/20200908075626.11976-3-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     

14 Oct, 2020

1 commit

  • Instead of calling find_get_entry() for every page index, use an XArray
    iterator to skip over NULL entries, and avoid calling get_page(),
    because we only want the swap entries.

    [willy@infradead.org: fix LTP soft lockups]
    Link: https://lkml.kernel.org/r/20200914165032.GS6583@casper.infradead.org

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Cc: Alexey Dobriyan
    Cc: Chris Wilson
    Cc: Huang Ying
    Cc: Hugh Dickins
    Cc: Jani Nikula
    Cc: Matthew Auld
    Cc: William Kucharski
    Cc: Qian Cai
    Link: https://lkml.kernel.org/r/20200910183318.20139-4-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

27 Sep, 2020

2 commits

  • …x/kernel/git/jejb/scsi") into 'android-mainline'

    Fixes up a merge issue in:
    net/ipv6/route.c
    on the way to a 5.9-rc7 release.

    Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
    Change-Id: I4eb508eb3761b95ad8f39dd79f03b3352481ceaf

    Greg Kroah-Hartman
     
  • syzbot reported the following KASAN splat:

    general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
    KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
    CPU: 1 PID: 6826 Comm: syz-executor142 Not tainted 5.9.0-rc4-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__lock_acquire+0x84/0x2ae0 kernel/locking/lockdep.c:4296
    Code: ff df 8a 04 30 84 c0 0f 85 e3 16 00 00 83 3d 56 58 35 08 00 0f 84 0e 17 00 00 83 3d 25 c7 f5 07 00 74 2c 4c 89 e8 48 c1 e8 03 3c 30 00 74 12 4c 89 ef e8 3e d1 5a 00 48 be 00 00 00 00 00 fc
    RSP: 0018:ffffc90004b9f850 EFLAGS: 00010006
    Call Trace:
    lock_acquire+0x140/0x6f0 kernel/locking/lockdep.c:5006
    __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
    _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
    spin_lock include/linux/spinlock.h:354 [inline]
    madvise_cold_or_pageout_pte_range+0x52f/0x25c0 mm/madvise.c:389
    walk_pmd_range mm/pagewalk.c:89 [inline]
    walk_pud_range mm/pagewalk.c:160 [inline]
    walk_p4d_range mm/pagewalk.c:193 [inline]
    walk_pgd_range mm/pagewalk.c:229 [inline]
    __walk_page_range+0xe7b/0x1da0 mm/pagewalk.c:331
    walk_page_range+0x2c3/0x5c0 mm/pagewalk.c:427
    madvise_pageout_page_range mm/madvise.c:521 [inline]
    madvise_pageout mm/madvise.c:557 [inline]
    madvise_vma mm/madvise.c:946 [inline]
    do_madvise+0x12d0/0x2090 mm/madvise.c:1145
    __do_sys_madvise mm/madvise.c:1171 [inline]
    __se_sys_madvise mm/madvise.c:1169 [inline]
    __x64_sys_madvise+0x76/0x80 mm/madvise.c:1169
    do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The backing vma was shmem.

    When a file-backed THP is split, madvise zaps the pmd instead of
    remapping the sub-pages, so we need to check the pmd's validity after
    the split.

    Reported-by: syzbot+ecf80462cb7d5d552bc7@syzkaller.appspotmail.com
    Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
    Signed-off-by: Minchan Kim
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

07 Sep, 2020

1 commit


06 Sep, 2020

1 commit

  • The syzbot reported the below use-after-free:

    BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline]
    BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline]
    BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
    Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996

    CPU: 0 PID: 9996 Comm: syz-executor.0 Not tainted 5.9.0-rc1-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x18f/0x20d lib/dump_stack.c:118
    print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
    __kasan_report mm/kasan/report.c:513 [inline]
    kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
    madvise_willneed mm/madvise.c:293 [inline]
    madvise_vma mm/madvise.c:942 [inline]
    do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
    do_madvise mm/madvise.c:1169 [inline]
    __do_sys_madvise mm/madvise.c:1171 [inline]
    __se_sys_madvise mm/madvise.c:1169 [inline]
    __x64_sys_madvise+0xd9/0x110 mm/madvise.c:1169
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Allocated by task 9992:
    kmem_cache_alloc+0x138/0x3a0 mm/slab.c:3482
    vm_area_alloc+0x1c/0x110 kernel/fork.c:347
    mmap_region+0x8e5/0x1780 mm/mmap.c:1743
    do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
    vm_mmap_pgoff+0x195/0x200 mm/util.c:506
    ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Freed by task 9992:
    kmem_cache_free.part.0+0x67/0x1f0 mm/slab.c:3693
    remove_vma+0x132/0x170 mm/mmap.c:184
    remove_vma_list mm/mmap.c:2613 [inline]
    __do_munmap+0x743/0x1170 mm/mmap.c:2869
    do_munmap mm/mmap.c:2877 [inline]
    mmap_region+0x257/0x1780 mm/mmap.c:1716
    do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
    vm_mmap_pgoff+0x195/0x200 mm/util.c:506
    ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    This happens because the vma is accessed after mmap_lock has been
    released. By that time someone else may have acquired mmap_lock, and
    the vma may be gone.

    Releasing mmap_lock only after the last access to the vma should fix the
    problem.
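    The fix follows a general rule: read everything you need from a
    lock-protected object before you drop the lock. Below is a hedged
    userspace analogue that uses a pthread mutex. `struct region`,
    `map_lock` and `region_offset()` are hypothetical stand-ins for the
    vma fields (vm_start, vm_pgoff) that madvise_willneed() reads. This is
    not kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vma that another thread may free after unlinking it. */
struct region {
	unsigned long start; /* plays the role of vma->vm_start */
	unsigned long pgoff; /* plays the role of vma->vm_pgoff, in pages */
};

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages for the example */

static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Compute the file offset for addr. The derived value is computed while
 * map_lock is still held. A concurrent unmapper that also takes map_lock
 * therefore cannot free *r underneath us.
 */
static long long region_offset(struct region *r, unsigned long addr)
{
	long long offset;

	pthread_mutex_lock(&map_lock);
	offset = (long long)(addr - r->start) +
		 ((long long)r->pgoff << SKETCH_PAGE_SHIFT);
	pthread_mutex_unlock(&map_lock);

	/* From here on only the local copy is used; *r is never dereferenced. */
	return offset;
}
```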

    Fixes: 692fe62433d4c ("mm: Handle MADV_WILLNEED through vfs_fadvise()")
    Reported-by: syzbot+b90df26038d1d5d85c97@syzkaller.appspotmail.com
    Signed-off-by: Yang Shi
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Jan Kara
    Cc: [5.4+]
    Link: https://lkml.kernel.org/r/20200816141204.162624-1-shy828301@gmail.com
    Signed-off-by: Linus Torvalds

    Yang Shi
     

24 Jun, 2020

1 commit


10 Jun, 2020

2 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)
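    The point of the conversion is that callers no longer touch the rwsem
    directly. Every access goes through mmap_* helpers, so the lock can
    later be changed in one place. The following is a userspace
    re-creation of that wrapper idea on top of pthread_rwlock_t. The
    helper names copy the kernel API for illustration only, and
    `struct mm_sketch` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace mm: its lock is only reached through the helpers. */
struct mm_sketch {
	pthread_rwlock_t mmap_lock;
};

static inline void mmap_init_lock(struct mm_sketch *mm)
{
	pthread_rwlock_init(&mm->mmap_lock, NULL);
}

static inline void mmap_read_lock(struct mm_sketch *mm)
{
	pthread_rwlock_rdlock(&mm->mmap_lock);
}

static inline void mmap_read_unlock(struct mm_sketch *mm)
{
	pthread_rwlock_unlock(&mm->mmap_lock);
}

static inline void mmap_write_lock(struct mm_sketch *mm)
{
	pthread_rwlock_wrlock(&mm->mmap_lock);
}

/* Kernel trylock helpers return 1 on success; mirror that convention. */
static inline int mmap_write_trylock(struct mm_sketch *mm)
{
	return pthread_rwlock_trywrlock(&mm->mmap_lock) == 0;
}

static inline void mmap_write_unlock(struct mm_sketch *mm)
{
	pthread_rwlock_unlock(&mm->mmap_lock);
}
```

    If only the helpers change, every caller picks up the new lock
    implementation. That is why the mechanical coccinelle rewrite is done
    first.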

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

27 Apr, 2020

1 commit


25 Apr, 2020

1 commit

  • IORING_OP_MADVISE can end up basically doing mprotect() on the VM of
    another process, which means that it can race with our crazy core dump
    handling which accesses the VM state without holding the mmap_sem
    (because it incorrectly thinks that it is the final user).

    This is clearly a core dumping problem, but we've never fixed it the
    right way, and instead have the notion of "check that the mm is still
    ok" using mmget_still_valid() after getting the mmap_sem for writing in
    any situation where we're not the original VM thread.

    See commit 04f5866e41fb ("coredump: fix race condition between
    mmget_not_zero()/get_task_mm() and core dumping") for more background on
    this whole mmget_still_valid() thing. You might want to have a barf bag
    handy when you do.

    We're discussing just fixing this properly in the only remaining core
    dumping routines. But even if we do that, let's make do_madvise() do
    the right thing, and then when we fix core dumping, we can remove all
    these mmget_still_valid() checks.

    Reported-and-tested-by: Jann Horn
    Fixes: c1ca757bd6f4 ("io_uring: add IORING_OP_MADVISE")
    Acked-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Mar, 2020

1 commit


22 Mar, 2020

1 commit

    Jann has brought up a very interesting point [1]. While shared pages
    are normally excluded from MADV_PAGEOUT, CoW pages can easily be
    reclaimed that way. This can lead to all sorts of hard-to-debug
    problems, e.g. the performance problems outlined by Daniel [2].

    There are runtime environments where a substantial amount of memory is
    shared among security domains via CoW. An easy way to reclaim that
    memory, which MADV_{COLD,PAGEOUT} offers, can lead either to
    performance degradation for the parent process, which might be more
    privileged, or even to side-channel attacks.

    The feasibility of the latter is not really clear to me TBH, but there
    is no real reason for the exposure at this stage. There seems to be no
    real use case that depends on reclaiming CoW memory via madvise, so it
    is much easier to simply disallow it, and this is what this patch does.
    Put simply, MADV_{PAGEOUT,COLD} can operate only on exclusively owned
    memory, which is a straightforward semantic.

    [1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com
    [2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com

    Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD")
    Reported-by: Jann Horn
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Daniel Colascione
    Cc: Dave Hansen
    Cc: "Joel Fernandes (Google)"
    Cc:
    Link: http://lkml.kernel.org/r/20200312082248.GS23944@dhcp22.suse.cz
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

03 Feb, 2020

1 commit


21 Jan, 2020

1 commit

  • This is in preparation for enabling this functionality through io_uring.
    Add a helper that is just exporting what sys_madvise() does, and have the
    system call use it.

    No functional changes in this patch.

    Reviewed-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Jens Axboe
     

09 Dec, 2019

1 commit


02 Dec, 2019

3 commits

  • Improve readability, no functional change.

    Link: http://lkml.kernel.org/r/20191118032857.22683-1-richardw.yang@linux.intel.com
    Signed-off-by: Wei Yang
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • page_size() is supported after the commit a50b854e073c ("mm: introduce
    page_size()").

    Use page_size() in madvise_inject_error() for readability.

    [akpm@linux-foundation.org: use ulong for `size', per David]
    Link: http://lkml.kernel.org/r/29dce60c-38d6-0220-f292-e298f0c78c4d@huawei.com
    Signed-off-by: Yunfeng Ye
    Reviewed-by: Andrew Morton
    Acked-by: David Rientjes
    Cc: Jason Gunthorpe
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Peter Zijlstra
    Cc: Jan Kara
    Cc: Mike Rapoport
    Cc: Hu Shiyuan
    Cc: Feilong Lin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yunfeng Ye
     
  • Currently soft_offline_page() receives a struct page, while its sibling
    memory_failure() receives a pfn. This discrepancy looks weird and makes
    the precheck on pfn validity tricky. So let's align them.

    Link: http://lkml.kernel.org/r/20191016234706.GA5493@www9186uo.sakura.ne.jp
    Signed-off-by: Naoya Horiguchi
    Acked-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Cc: Oscar Salvador
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

18 Nov, 2019

1 commit


16 Nov, 2019

1 commit

  • Recently, I hit the following issue when running upstream.

    kernel BUG at mm/vmscan.c:1521!
    invalid opcode: 0000 [#1] SMP KASAN PTI
    CPU: 0 PID: 23385 Comm: syz-executor.6 Not tainted 5.4.0-rc4+ #1
    RIP: 0010:shrink_page_list+0x12b6/0x3530 mm/vmscan.c:1521
    Call Trace:
    reclaim_pages+0x499/0x800 mm/vmscan.c:2188
    madvise_cold_or_pageout_pte_range+0x58a/0x710 mm/madvise.c:453
    walk_pmd_range mm/pagewalk.c:53 [inline]
    walk_pud_range mm/pagewalk.c:112 [inline]
    walk_p4d_range mm/pagewalk.c:139 [inline]
    walk_pgd_range mm/pagewalk.c:166 [inline]
    __walk_page_range+0x45a/0xc20 mm/pagewalk.c:261
    walk_page_range+0x179/0x310 mm/pagewalk.c:349
    madvise_pageout_page_range mm/madvise.c:506 [inline]
    madvise_pageout+0x1f0/0x330 mm/madvise.c:542
    madvise_vma mm/madvise.c:931 [inline]
    __do_sys_madvise+0x7d2/0x1600 mm/madvise.c:1113
    do_syscall_64+0x9f/0x4c0 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    madvise_pageout() walks the specified range of the vma, isolates the
    pages it finds and then runs shrink_page_list() to reclaim their memory.
    However, it also isolates unevictable pages for reclaim, and
    shrink_page_list() catches those cases with the BUG above.

    The root cause is that we scan the page tables instead of a specific LRU
    list, so we need to filter out the unevictable LRU pages on our end.
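    The filtering step can be sketched as follows. `struct page_sketch`
    and `collect_reclaimable()` are hypothetical, and the model covers only
    the decision described above. The page-table walk may encounter pages
    that cannot be isolated or are unevictable, and only the remaining
    pages are passed on to reclaim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: just the two properties the filter looks at. */
struct page_sketch {
	bool on_lru;      /* could be isolated from an LRU list */
	bool unevictable; /* e.g. mlocked memory */
};

/*
 * Collect the pages that may be handed to reclaim. Pages are found by
 * walking page tables, not a specific LRU list, so unevictable pages show
 * up too and must be skipped here. In the kernel they would go back to
 * their LRU list instead. Returns the number of entries written to out[].
 */
static size_t collect_reclaimable(struct page_sketch *pages, size_t n,
				  struct page_sketch **out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].on_lru)
			continue; /* isolation would fail */
		if (pages[i].unevictable)
			continue; /* never feed these to reclaim */
		out[count++] = &pages[i];
	}
	return count;
}
```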

    Link: http://lkml.kernel.org/r/1572616245-18946-1-git-send-email-zhongjiang@huawei.com
    Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
    Signed-off-by: zhong jiang
    Suggested-by: Johannes Weiner
    Acked-by: Johannes Weiner
    Acked-by: Minchan Kim
    Acked-by: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhong jiang
     

02 Oct, 2019

1 commit


26 Sep, 2019

4 commits

    MADV_COLD and MADV_PAGEOUT have many parts in common.
    This patch factors them out to avoid code duplication.

    Link: http://lkml.kernel.org/r/20190726023435.214162-6-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Suggested-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Chris Zankel
    Cc: Daniel Colascione
    Cc: Dave Hansen
    Cc: Hillf Danton
    Cc: James E.J. Bottomley
    Cc: Joel Fernandes (Google)
    Cc: kbuild test robot
    Cc: Kirill A. Shutemov
    Cc: Oleksandr Natalenko
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Shakeel Butt
    Cc: Sonny Rao
    Cc: Suren Baghdasaryan
    Cc: Tim Murray
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
    When a process expects no accesses to a certain memory range for a long
    time, it could hint to the kernel that the pages can be reclaimed
    instantly but that the data should be preserved for future use. This
    could reduce workingset eviction and so improve performance.

    This patch introduces the new MADV_PAGEOUT hint to the madvise(2)
    syscall. MADV_PAGEOUT can be used by a process to mark a memory range as
    not expected to be used for a long time, so that the kernel reclaims
    *any LRU* pages instantly. The hint can help the kernel decide which
    pages to evict proactively.

    Note: it intentionally doesn't apply the SWAP_CLUSTER_MAX LRU page
    isolation limit, because isolation is automatically bounded by the PMD
    size. If the PMD size (e.g., 256) causes trouble, we could fix it later
    by limiting it to SWAP_CLUSTER_MAX [1].

    - man-page material

    MADV_PAGEOUT (since Linux x.x)

    Do not expect access in the near future, so pages in the specified
    regions could be reclaimed instantly regardless of memory pressure.
    Thus, accessing the range after a successful operation could cause a
    major page fault, but, unlike MADV_DONTNEED, the up-to-date contents
    are never lost. Pages belonging to a shared mapping are only processed
    if a write access is allowed for the calling process.

    MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or
    VM_PFNMAP pages.
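    The sketch below exercises the hint from userspace and checks the
    non-destructive property described above. It makes some assumptions.
    The fallback value 21 is the asm-generic UAPI number, used only if the
    libc headers do not define MADV_PAGEOUT. EINVAL is tolerated because
    kernels without the hint reject it. `pageout_roundtrip()` is a
    hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* asm-generic UAPI value, assumed if libc lacks it */
#endif

/*
 * Map anonymous memory, fill it, ask the kernel to page it out, then check
 * that every byte survives. Returns 0 on success, -1 on failure.
 */
static int pageout_roundtrip(size_t npages)
{
	size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ret = 0;

	if (buf == MAP_FAILED)
		return -1;
	memset(buf, 0xA5, len);

	/* EINVAL: the running kernel predates MADV_PAGEOUT; data must be intact anyway. */
	if (madvise(buf, len, MADV_PAGEOUT) != 0 && errno != EINVAL)
		ret = -1;

	/* Non-destructive: touching the range faults the contents back in. */
	for (size_t i = 0; ret == 0 && i < len; i++)
		if (buf[i] != 0xA5)
			ret = -1;

	munmap(buf, len);
	return ret;
}
```

    Without swap, the anonymous pages may simply stay resident. The example
    checks only that the hint is safe and preserves the data. It does not
    check that reclaim actually happened.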

    [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/

    [minchan@kernel.org: clear PG_active on MADV_PAGEOUT]
    Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com
    [akpm@linux-foundation.org: resolve conflicts with hmm.git]
    Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reported-by: kbuild test robot
    Acked-by: Michal Hocko
    Cc: James E.J. Bottomley
    Cc: Richard Henderson
    Cc: Ralf Baechle
    Cc: Chris Zankel
    Cc: Daniel Colascione
    Cc: Dave Hansen
    Cc: Hillf Danton
    Cc: Joel Fernandes (Google)
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Oleksandr Natalenko
    Cc: Shakeel Butt
    Cc: Sonny Rao
    Cc: Suren Baghdasaryan
    Cc: Tim Murray
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.

    - Background

    The Android terminology used for forking a new process and starting an app
    from scratch is a cold start, while resuming an existing app is a hot
    start. While we continually try to improve the performance of cold
    starts, hot starts will always be significantly less power hungry as well
    as faster so we are trying to make hot start more likely than cold start.

    To increase hot start, Android userspace manages the order that apps
    should be killed in a process called ActivityManagerService.
    ActivityManagerService tracks every Android app or service that the user
    could be interacting with at any time and translates that into a ranked
    list for lmkd (low memory killer daemon). They are likely to be killed by
    lmkd if the system has to reclaim memory. In that sense they are similar
    to entries in any other cache. Those apps are kept alive for
    opportunistic performance improvements but those performance improvements
    will vary based on the memory requirements of individual workloads.

    - Problem

    Naturally, cached apps were dominant consumers of memory on the system.
    However, they were not significant consumers of swap, even though they
    are good candidates for swap. On investigation, we found that swapping
    out only begins once the low zone watermark is hit and kswapd wakes up,
    but the overall allocation rate in the system might trip lmkd thresholds
    and cause a cached process to be killed. (We measured the performance
    of swapping out vs. zapping the memory by killing a process.
    Unsurprisingly, zapping is 10x faster, even though we use zram, which is
    much faster than real storage.) So a kill from lmkd will often satisfy
    the high zone watermark, resulting in very few pages actually being
    moved to swap.

    - Approach

    The approach we chose was to use a new interface to allow userspace to
    proactively reclaim entire processes by leveraging platform information.
    This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
    that are known to be cold from userspace and to avoid races with lmkd by
    reclaiming apps as soon as they entered the cached state. Additionally,
    it gives the platform many opportunities to use the information it has
    to optimize memory efficiency.

    To achieve the goal, the patchset introduces two new options for madvise.
    One is MADV_COLD which will deactivate activated pages and the other is
    MADV_PAGEOUT which will reclaim private pages instantly. These new
    options complement MADV_DONTNEED and MADV_FREE by adding non-destructive
    ways to gain some free memory space. MADV_PAGEOUT is similar to
    MADV_DONTNEED in a way that it hints the kernel that memory region is not
    currently needed and should be reclaimed immediately; MADV_COLD is similar
    to MADV_FREE in a way that it hints the kernel that memory region is not
    currently needed and should be reclaimed when memory pressure rises.

    This patch (of 5):

    When a process expects no accesses to a certain memory range, it could
    give a hint to the kernel that the pages can be reclaimed when memory
    pressure happens but that the data should be preserved for future use.
    This could reduce workingset eviction and so improve performance.

    This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
    MADV_COLD can be used by a process to mark a memory range as not expected
    to be used in the near future. The hint can help the kernel decide which
    pages to evict early during memory pressure.

    It works for all LRU pages, like MADV_[DONTNEED|FREE]. IOW, it moves

    active file page -> inactive file LRU
    active anon page -> inactive anon LRU

    Unlike MADV_FREE, it doesn't move active anonymous pages to the head of
    the inactive file LRU, because MADV_COLD has slightly different
    semantics. MADV_FREE means it's okay to discard the page under memory
    pressure because its content is *garbage*. Freeing such pages has almost
    zero overhead, since we don't need to swap them out and a later access
    causes just a minor fault. Thus, it makes sense to put those freeable
    pages on the inactive file LRU to compete with other used-once pages. It
    makes sense from an implementation point of view, too, because such
    memory is no longer swap-backed until it is re-dirtied. It could even
    give a bonus by letting those pages be reclaimed on a swapless system.
    However, MADV_COLD doesn't mean garbage, so reclaiming those pages
    requires swap-out/in in the end, which is a bigger cost. Since we have
    designed VM LRU aging based on a cost model, anonymous cold pages are
    better positioned on the inactive anon LRU list, not the file LRU.
    Furthermore, this helps to avoid unnecessary scanning if the system
    doesn't have a swap device. Let's start with the simpler way, without
    adding complexity at this moment. However, keep in mind the caveat that
    workloads with a lot of page cache are likely to ignore MADV_COLD on
    anonymous memory, because we rarely age the anonymous LRU lists.
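    The LRU moves above reduce to simple bit arithmetic if the LRU lists
    are laid out as a base index plus "active" and "file" bits. The enum
    below is modelled on the kernel's enum lru_list from that period and is
    reproduced here as an assumption. The two target functions are
    hypothetical helpers that show where each hint sends a page.

```c
#include <assert.h>

/* Layout modelled on the kernel's enum lru_list (assumed, not included from it). */
#define LRU_BASE   0
#define LRU_ACTIVE 1
#define LRU_FILE   2

enum lru_list {
	LRU_INACTIVE_ANON = LRU_BASE,
	LRU_ACTIVE_ANON   = LRU_BASE + LRU_ACTIVE,
	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
	LRU_ACTIVE_FILE   = LRU_BASE + LRU_FILE + LRU_ACTIVE,
};

/* MADV_COLD: keep the page's type (anon or file), just clear the active bit. */
static enum lru_list madv_cold_target(enum lru_list lru)
{
	return (enum lru_list)(lru & ~LRU_ACTIVE);
}

/*
 * MADV_FREE on anonymous memory: the content is garbage, so the page is
 * treated like a clean, used-once page on the inactive file list.
 */
static enum lru_list madv_free_anon_target(void)
{
	return LRU_INACTIVE_FILE;
}
```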

    * man-page material

    MADV_COLD (since Linux x.x)

    Pages in the specified regions will be treated as less-recently-accessed
    compared to pages in the system with similar access frequencies. In
    contrast to MADV_FREE, the contents of the region are preserved regardless
    of subsequent writes to pages.

    MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
    pages.
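    A userspace sketch of the documented behaviour follows. After
    MADV_COLD the contents stay intact, and later writes behave normally.
    Assumptions: the fallback value 20 is the asm-generic UAPI number, used
    only if the libc headers lack MADV_COLD, and EINVAL from an older
    kernel is tolerated. `cold_keeps_data()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLD
#define MADV_COLD 20 /* asm-generic UAPI value, assumed if libc lacks it */
#endif

/*
 * Returns 0 if the data is intact after MADV_COLD and a later partial
 * overwrite, -1 otherwise.
 */
static int cold_keeps_data(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE) * 2;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ret = 0;

	if (buf == MAP_FAILED)
		return -1;
	memset(buf, 'x', len);

	/* Only a hint: the pages are deactivated, not discarded. */
	if (madvise(buf, len, MADV_COLD) != 0 && errno != EINVAL)
		ret = -1;

	/* Unlike MADV_FREE, old bytes survive and new writes stick. */
	buf[0] = 'y';
	if (buf[0] != 'y' || buf[len - 1] != 'x')
		ret = -1;

	munmap(buf, len);
	return ret;
}
```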

    [akpm@linux-foundation.org: resolve conflicts with hmm.git]
    Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reported-by: kbuild test robot
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Cc: James E.J. Bottomley
    Cc: Richard Henderson
    Cc: Ralf Baechle
    Cc: Chris Zankel
    Cc: Johannes Weiner
    Cc: Daniel Colascione
    Cc: Dave Hansen
    Cc: Hillf Danton
    Cc: Joel Fernandes (Google)
    Cc: Kirill A. Shutemov
    Cc: Oleksandr Natalenko
    Cc: Shakeel Butt
    Cc: Sonny Rao
    Cc: Suren Baghdasaryan
    Cc: Tim Murray
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch is a part of a series that extends the kernel ABI to allow
    passing tagged user pointers (with the top byte set to something other
    than 0x00) as syscall arguments.

    This patch allows tagged pointers to be passed to the following memory
    syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect,
    mremap, msync, munlock, move_pages.

    The mmap and mremap syscalls do not currently accept tagged addresses.
    Architectures may interpret the tag as a background colour for the
    corresponding vma.
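    Before such an address is compared against vma bounds, the tag has to
    be stripped. The following is a hedged userspace model of that step.
    It assumes arm64-style top-byte tagging, where untagging sign-extends
    from bit 55, as the kernel's untagged_addr() did on arm64 at the time.
    `tag_sketch()` and `untag_sketch()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Place a tag in bits 63..56, as a userspace allocator might. */
static uint64_t tag_sketch(uint64_t addr, uint8_t tag)
{
	return (addr & ~(UINT64_C(0xff) << 56)) | ((uint64_t)tag << 56);
}

/*
 * Sign-extend from bit 55. For user addresses bit 55 is 0, so this simply
 * clears the top byte. (Right-shifting a negative value is arithmetic on
 * gcc/clang; that case does not arise for user addresses.)
 */
static uint64_t untag_sketch(uint64_t addr)
{
	const int shift = 63 - 55;

	return (uint64_t)((int64_t)(addr << shift) >> shift);
}
```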

    Link: http://lkml.kernel.org/r/aaf0c0969d46b2feb9017f3e1b3ef3970b633d91.1563904656.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Khalid Aziz
    Reviewed-by: Vincenzo Frascino
    Reviewed-by: Catalin Marinas
    Reviewed-by: Kees Cook
    Cc: Al Viro
    Cc: Dave Hansen
    Cc: Eric Auger
    Cc: Felix Kuehling
    Cc: Jens Wiklander
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov