Eric Lee / smarc-fsl-linux-kernel

08 Aug, 2020

40 commits

d57b2b5bc Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull mount leak fix from Al Viro:
"Regression fix for the syscalls-for-init series - fix a leak of a 'struct path'"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: fix a struct path leak in path_umount

Linus Torvalds
2020-08-08 12:03:25 +0800
049eb096d Merge tag 'pci-v5.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci ... Browse Code »

Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Fix pci_cfg_wait queue locking problem (Bjorn Helgaas)
- Convert PCIe capability PCIBIOS errors to errno (Bolarinwa Olayemi
Saheed)
- Align PCIe capability and PCI accessor return values (Bolarinwa
Olayemi Saheed)
- Fix pci_create_slot() reference count leak (Qiushi Wu)
- Announce device after early fixups (Tiezhu Yang)

PCI device hotplug:
- Make rpadlpar functions static (Wei Yongjun)

Driver binding:
- Add device even if driver attach failed (Rajat Jain)

Virtualization:
- xen: Remove redundant initialization of irq (Colin Ian King)

IOMMU:
- Add pci_pri_supported() to check device or associated PF (Ashok Raj)
- Release IVRS table in AMD ACS quirk (Hanjun Guo)
- Mark AMD Navi10 GPU rev 0x00 ATS as broken (Kai-Heng Feng)
- Treat "external-facing" devices themselves as internal (Rajat Jain)

MSI:
- Forward MSI-X error code in pci_alloc_irq_vectors_affinity() (Piotr
Stankiewicz)

Error handling:
- Clear PCIe Device Status errors only if OS owns AER (Jonathan
Cameron)
- Log correctable errors as warning, not error (Matt Jolly)
- Use 'pci_channel_state_t' instead of 'enum pci_channel_state' (Luc
Van Oostenryck)

Peer-to-peer DMA:
- Allow P2PDMA on AMD Zen and newer CPUs (Logan Gunthorpe)

ASPM:
- Add missing newline in sysfs 'policy' (Xiongfeng Wang)

Native PCIe controllers:
- Convert to devm_platform_ioremap_resource_byname() (Dejin Zheng)
- Convert to devm_platform_ioremap_resource() (Dejin Zheng)
- Remove duplicate error message from devm_pci_remap_cfg_resource()
callers (Dejin Zheng)
- Fix runtime PM imbalance on error (Dinghao Liu)
- Remove dev_err() when handing an error from platform_get_irq()
(Krzysztof Wilczyński)
- Use pci_host_bridge.windows list directly instead of splicing in a
temporary list for cadence, mvebu, host-common (Rob Herring)
- Use pci_host_probe() instead of open-coding all the pieces for
altera, brcmstb, iproc, mobiveil, rcar, rockchip, tegra, v3,
versatile, xgene, xilinx, xilinx-nwl (Rob Herring)
- Default host bridge parent device to the platform device (Rob
Herring)
- Use pci_is_root_bus() instead of tracking root bus number
separately in aardvark, designware (imx6, keystone,
designware-host), mobiveil, xilinx-nwl, xilinx, rockchip, rcar (Rob
Herring)
- Set host bridge bus number in pci_scan_root_bus_bridge() instead of
each driver for aardvark, designware-host, host-common, mediatek,
rcar, tegra, v3-semi (Rob Herring)
- Move DT resource setup into devm_pci_alloc_host_bridge() (Rob
Herring)
- Set bridge map_irq and swizzle_irq to default functions; drivers
that don't support legacy IRQs (iproc) need to undo this (Rob
Herring)

ARM Versatile PCIe controller driver:
- Drop flag PCI_ENABLE_PROC_DOMAINS (Rob Herring)

Cadence PCIe controller driver:
- Use "dma-ranges" instead of "cdns,no-bar-match-nbits" property
(Kishon Vijay Abraham I)
- Remove "mem" from reg binding (Kishon Vijay Abraham I)
- Fix cdns_pcie_{host|ep}_setup() error path (Kishon Vijay Abraham I)
- Convert all r/w accessors to perform only 32-bit accesses (Kishon
Vijay Abraham I)
- Add support to start link and verify link status (Kishon Vijay
Abraham I)
- Allow pci_host_bridge to have custom pci_ops (Kishon Vijay Abraham I)
- Add new *ops* for CPU addr fixup (Kishon Vijay Abraham I)
- Fix updating Vendor ID and Subsystem Vendor ID register (Kishon
Vijay Abraham I)
- Use bridge resources for outbound window setup (Rob Herring)
- Remove private bus number and range storage (Rob Herring)

Cadence PCIe endpoint driver:
- Add MSI-X support (Alan Douglas)

HiSilicon PCIe controller driver:
- Remove non-ECAM HiSilicon hip05/hip06 driver (Rob Herring)

Intel VMD host bridge driver:
- Use Shadow MEMBAR registers for QEMU/KVM guests (Jon Derrick)

Loongson PCIe controller driver:
- Use DECLARE_PCI_FIXUP_EARLY for bridge_class_quirk() (Tiezhu Yang)

Marvell Aardvark PCIe controller driver:
- Indicate error in 'val' when config read fails (Pali Rohár)
- Don't touch PCIe registers if no card connected (Pali Rohár)

Marvell MVEBU PCIe controller driver:
- Setup BAR0 in order to fix MSI (Shmuel Hazan)

Microsoft Hyper-V host bridge driver:
- Fix a timing issue which causes kdump to fail occasionally (Wei Hu)
- Make some functions static (Wei Yongjun)

NVIDIA Tegra PCIe controller driver:
- Revert tegra124 raw_violation_fixup (Nicolas Chauvet)
- Remove PLL power supplies (Thierry Reding)

Qualcomm PCIe controller driver:
- Change duplicate PCI reset to phy reset (Abhishek Sahu)
- Add missing ipq806x clocks in PCIe driver (Ansuel Smith)
- Add missing reset for ipq806x (Ansuel Smith)
- Add ext reset (Ansuel Smith)
- Use bulk clk API and assert on error (Ansuel Smith)
- Add support for tx term offset for rev 2.1.0 (Ansuel Smith)
- Define some PARF params needed for ipq8064 SoC (Ansuel Smith)
- Add ipq8064 rev2 variant (Ansuel Smith)
- Support PCI speed set for ipq806x (Sham Muthayyan)

Renesas R-Car PCIe controller driver:
- Use devm_pci_alloc_host_bridge() (Rob Herring)
- Use struct pci_host_bridge.windows list directly (Rob Herring)
- Convert rcar-gen2 to use modern host bridge probe functions (Rob
Herring)

TI J721E PCIe driver:
- Add TI J721E PCIe host and endpoint driver (Kishon Vijay Abraham I)

Xilinx Versal CPM PCIe controller driver:
- Add Versal CPM Root Port driver and YAML schema (Bharat Kumar
Gogada)

MicroSemi Switchtec management driver:
- Add missing __iomem and __user tags to fix sparse warnings (Logan
Gunthorpe)

Miscellaneous:
- Replace http:// links with https:// (Alexander A. Klimov)
- Replace lkml.org, spinics, gmane with lore.kernel.org (Bjorn
Helgaas)
- Remove unused pci_lost_interrupt() (Heiner Kallweit)
- Move PCI_VENDOR_ID_REDHAT definition to pci_ids.h (Huacai Chen)
- Fix kerneldoc warnings (Krzysztof Kozlowski)"

* tag 'pci-v5.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (113 commits)
PCI: Fix kerneldoc warnings
PCI: xilinx-cpm: Add Versal CPM Root Port driver
PCI: xilinx-cpm: Add YAML schemas for Versal CPM Root Port
PCI: Set bridge map_irq and swizzle_irq to default functions
PCI: Move DT resource setup into devm_pci_alloc_host_bridge()
PCI: rcar-gen2: Convert to use modern host bridge probe functions
PCI: Remove dev_err() when handing an error from platform_get_irq()
MAINTAINERS: Add Kishon Vijay Abraham I for TI J721E SoC PCIe
misc: pci_endpoint_test: Add J721E in pci_device_id table
PCI: j721e: Add TI J721E PCIe driver
PCI: switchtec: Add missing __iomem tag to fix sparse warnings
PCI: switchtec: Add missing __iomem and __user tags to fix sparse warnings
PCI: rpadlpar: Make functions static
PCI/P2PDMA: Allow P2PDMA on AMD Zen and newer CPUs
PCI: Release IVRS table in AMD ACS quirk
PCI: Announce device after early fixups
PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken
PCI: Remove unused pci_lost_interrupt()
dt-bindings: PCI: Add EP mode dt-bindings for TI's J721E SoC
dt-bindings: PCI: Add host mode dt-bindings for TI's J721E SoC
...

Linus Torvalds
2020-08-08 09:48:15 +0800
32663c78c Merge tag 'trace-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull tracing updates from Steven Rostedt:

- The biggest news in that the tracing ring buffer can now time events
that interrupted other ring buffer events.

Before this change, if an interrupt came in while recording another
event, and that interrupt also had an event, those events would all
have the same time stamp as the event it interrupted.

Now, with the new design, those events will have a unique time stamp
and rightfully display the time for those events that were recorded
while interrupting another event.

- Bootconfig how has an "override" operator that lets the users have a
default config, but then add options to override the default.

- A fix was made to properly filter function graph tracing to the
ftrace PIDs. This came in at the end of the -rc cycle, and needs to
be backported.

- Several clean ups, performance updates, and minor fixes as well.

* tag 'trace-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (39 commits)
tracing: Add trace_array_init_printk() to initialize instance trace_printk() buffers
kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACE
tracing: Use trace_sched_process_free() instead of exit() for pid tracing
bootconfig: Fix to find the initargs correctly
Documentation: bootconfig: Add bootconfig override operator
tools/bootconfig: Add testcases for value override operator
lib/bootconfig: Add override operator support
kprobes: Remove show_registers() function prototype
tracing/uprobe: Remove dead code in trace_uprobe_register()
kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler
ftrace: Fix ftrace_trace_task return value
tracepoint: Use __used attribute definitions from compiler_attributes.h
tracepoint: Mark __tracepoint_string's __used
trace : Have tracing buffer info use kvzalloc instead of kzalloc
tracing: Remove outdated comment in stack handling
ftrace: Do not let direct or IPMODIFY ftrace_ops be added to module and set trampolines
ftrace: Setup correct FTRACE_FL_REGS flags for module
tracing/hwlat: Honor the tracing_cpumask
tracing/hwlat: Drop the duplicate assignment in start_kthread()
tracing: Save one trace_event->type by using __TRACE_LAST_TYPE
...

Linus Torvalds
2020-08-08 09:29:15 +0800
7b9de9771 powerpc/ptrace: Fix build error in pkey_get() ... Browse Code »

The merge resolution in commit 25d8d4eecace left ret no longer used,
leading to:

arch/powerpc/kernel/ptrace/ptrace-view.c: In function ‘pkey_get’:
arch/powerpc/kernel/ptrace/ptrace-view.c:473:6: error: unused variable ‘ret’
473 | int ret;

Fix it by removing ret.

Fixes: 25d8d4eecace ("Merge tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux")
Signed-off-by: Michael Ellerman
Signed-off-by: Linus Torvalds

Michael Ellerman
2020-08-08 09:27:26 +0800
25ccd24ff fs: fix a struct path leak in path_umount ... Browse Code »

Make sure we also put the dentry and vfsmnt in the illegal flags
and !may_umount cases.

Fixes: 41525f56e256 ("fs: refactor ksys_umount")
Reported-by: Vikas Kumar
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2020-08-08 07:21:30 +0800
38ce2a9e3 tracing: Add trace_array_init_printk() to initialize instance trace_printk() buffers ... Browse Code »

As trace_array_printk() used with not global instances will not add noise to
the main buffer, they are OK to have in the kernel (unlike trace_printk()).
This require the subsystem to create their own tracing instance, and the
trace_array_printk() only writes into those instances.

Add trace_array_init_printk() to initialize the trace_printk() buffers
without printing out the WARNING message.

Reported-by: Sean Paul
Reviewed-by: Sean Paul
Signed-off-by: Steven Rostedt (VMware)

Steven Rostedt (VMware)
2020-08-08 05:05:01 +0800
30185b69a Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux ... Browse Code »

Pull clk updates from Stephen Boyd:
"It looks like a smaller batch of clk updates this time around.

In the core framework we just have some minor tweaks and a debugfs
feature, so not much to see there. The driver updates are fairly well
split between AT91 and Qualcomm clk support. Adding those two drivers
together equals about 50% of the diffstat.

Otherwise, the big amount of work this time was on supporting
Broadcom's Raspberry Pi firmware clks.

Highlights:

Core:
- Document clk_hw_round_rate() so it gets some more use
- Remove unused __clk_get_flags()
- Add a prepare/enable debugfs feature similar to rate setting

New Drivers:
- Add support for SAMA7G5 SoC clks
- Enable CPU clks on Qualcomm IPQ6018 SoCs
- Enable CPU clks on Qualcomm MSM8996 SoCs
- GPU clk support for Qualcomm SM8150 and SM8250 SoCs
- Audio clks on Qualcomm SC7180 SoCs
- Microchip Sparx5 DPLL clk
- Add support for the new Renesas RZ/G2H (R8A774E1) SoC

Updates:
- Make defines for bcm63xx-gate clks to use in DT
- Support BCM2711 SoC firmware clks
- Add HDMI clks for BCM2711 SoCs
- Add RTC related clks on Ingenic SoCs
- Support USB PHY clks on Ingenic SoCs
- Support gate clks on BCM6318 SoCs
- RMU and DMAC/GPIO clock support for Actions Semi S500 SoCs
- Use poll_timeout functions in Rockchip clk driver
- Support Rockchip rk3288w SoC variant
- Mark mac_lbtest critical on Rockchip rk3188
- Add CAAM clock support for i.MX vf610 driver
- Add MU root clock support for i.MX imx8mp driver
- Amlogic g12: add neural network accelerator clock sources
- Amlogic meson8: remove critical flag for main PLL divider
- Amlogic meson8: add video decoder clock gates
- Convert one more Renesas DT binding to json-schema
- Enhance critical clock handling on Renesas platforms to only
consider clocks that were enabled at boot time"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (79 commits)
clk: qcom: gcc: Make disp gpll0 branch aon for sc7180/sdm845
ipq806x: gcc: add support for child probe
clk: qcom: msm8996: Make symbol 'cpu_msm8996_clks' static
clk: qcom: ipq8074: Add correct index for PCIe clocks
clk: : drop a duplicated word
clk: renesas: cpg-mssr: Add r8a774e1 support
dt-bindings: clock: renesas,cpg-mssr: Document r8a774e1
clk: Drop duplicate selection in Kconfig
clk: qcom: smd: Add support for MSM8992/4 rpm clocks
clk: qcom: ipq8074: Add missing clocks for pcie
dt-bindings: clock: qcom: ipq8074: Add missing bindings for PCIe
Replace HTTP links with HTTPS ones: Common CLK framework
clk: qcom: Add CPU clock driver for msm8996
dt-bindings: clk: qcom: Add bindings for CPU clock for msm8996
soc: qcom: Separate kryo l2 accessors from PMU driver
clk: meson: meson8b: add the vclk2_en gate clock
clk: meson: meson8b: add the vclk_en gate clock
clk: qcom: Fix return value check in apss_ipq6018_probe()
clk: bcm: dvp: Add missing module informations
clk: meson: meson8b: Drop CLK_IS_CRITICAL from fclk_div2
...

Linus Torvalds
2020-08-08 04:35:51 +0800
0f43283be Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull fdpick coredump update from Al Viro:
"Switches fdpic coredumps away from original aout dumping primitives to
the same kind of regset use as regular elf coredumps do"

* 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
[elf-fdpic] switch coredump to regsets
[elf-fdpic] use elf_dump_thread_status() for the dumper thread as well
[elf-fdpic] move allocation of elf_thread_status into elf_dump_thread_status()
[elf-fdpic] coredump: don't bother with cyclic list for per-thread objects
kill elf_fpxregs_t
take fdpic-related parts of elf_prstatus out
unexport linux/elfcore.h

Linus Torvalds
2020-08-08 04:29:39 +0800
6ba0d2e4f Merge tag 'kallsyms_show_value-fix-v5.9-rc1' of git://git.kernel.org/pub/scm/lin… ... Browse Code »

…ux/kernel/git/kees/linux

Pull sysfs module section fix from Kees Cook:
"Fix sysfs module section output overflow.

About a month after my kallsyms_show_value() refactoring landed, 0day
noticed that there was a path through the kernfs binattr read handlers
that did not have PAGE_SIZEd buffers, and the module "sections" read
handler made a bad assumption about this, resulting in it stomping on
memory when reached through small-sized splice() calls.

I've added a set of tests to find these kinds of regressions more
quickly in the future as well"

Sefltests-acked-by: Shuah Khan <skhan@linuxfoundation.org>

* tag 'kallsyms_show_value-fix-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests: splice: Check behavior of full and short splices
module: Correctly truncate sysfs sections output

Linus Torvalds
2020-08-08 04:24:58 +0800
1fa2c0a0c Merge tag 'seccomp-v5.9-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux ... Browse Code »

Pull seccomp fix from Kees Cook:
"This fixes my typo in the SCM_RIGHTS refactoring that broke compat
handling.

Thanks to Thadeu Lima de Souza Cascardo for tracking it down, and to
Christian Zigotzky and Alex Xu for their reports"

* tag 'seccomp-v5.9-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
net/scm: Fix typo in SCM_RIGHTS compat refactoring

Linus Torvalds
2020-08-08 04:16:22 +0800
f6235eb18 Merge tag 'pm-5.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull more power management updates from Rafael Wysocki:
"These are mostly ARM cpufreq driver updates plus a cpufreq core
cleanup, an ARM-wide change to make schedutil the default scaling
governor, an intel_pstate driver fix and some runtime PM changes
regarding kerneldoc comments.

Specifics:

- Add adaptive voltage scaling (AVS) support to the brcmstb cpufreq
driver and clean it up (Florian Fainelli, Markus Mayer).

- Add a new Tegra cpufreq driver and clean up the existing one (Jon
Hunter, Sumit Gupta).

- Add bandwidth level support to the Qcom cpufreq driver along with
OPP changes (Sibi Sankar).

- Clean up the sti, cpufreq-dt, ap806, CPPC cpufreq drivers (Viresh
Kumar, Lee Jones, Ivan Kokshaysky, Sven Auhagen, Xin Hao).

- Make schedutil the default governor for ARM (Valentin Schneider).

- Fix dependency issues for the imx cpufreq driver (Walter Lozano).

- Clean up cached_resolved_idx handlihng in the cpufreq core (Viresh
Kumar).

- Fix the intel_pstate driver to use the correct maximum frequency
value when MSR_TURBO_RATIO_LIMIT is 0 (Srinivas Pandruvada).

- Provide kenrneldoc comments for multiple runtime PM helpers and
improve the pm_runtime_get_if_active() kerneldoc (Rafael Wysocki)"

* tag 'pm-5.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (22 commits)
cpufreq: intel_pstate: Fix cpuinfo_max_freq when MSR_TURBO_RATIO_LIMIT is 0
PM: runtime: Improve kerneldoc of pm_runtime_get_if_active()
PM: runtime: Add kerneldoc comments to multiple helpers
cpufreq: make schedutil the default for arm and arm64
cpufreq: cached_resolved_idx can not be negative
cpufreq: Add Tegra194 cpufreq driver
dt-bindings: arm: Add NVIDIA Tegra194 CPU Complex binding
cpufreq: imx: Select NVMEM_IMX_OCOTP
cpufreq: sti-cpufreq: Fix some formatting and misspelling issues
cpufreq: tegra186: Simplify probe return path
cpufreq: CPPC: Reuse caps variable in few routines
cpufreq: ap806: fix cpufreq driver needs ap cpu clk
cpufreq: cppc: Reorder code and remove apply_hisi_workaround variable
cpufreq: dt: fix oops on armada37xx
cpufreq: brcmstb-avs-cpufreq: send S2_ENTER / S2_EXIT commands to AVS
cpufreq: brcmstb-avs-cpufreq: Support polling AVS firmware
cpufreq: brcmstb-avs-cpufreq: more flexible interface for __issue_avs_command()
cpufreq: qcom: Disable fast switch when scaling DDR/L3
cpufreq: qcom: Update the bandwidth levels on frequency change
OPP: Add and export helper to set bandwidth
...

Linus Torvalds
2020-08-08 04:13:09 +0800
2f12d4408 Merge tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/… ... Browse Code »

…device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- DM multipath locking fixes around m->flags tests and improvements to
bio-based code so that it follows patterns established by
request-based code.

- Request-based DM core improvement to eliminate unnecessary call to
blk_mq_queue_stopped().

- Add "panic_on_corruption" error handling mode to DM verity target.

- DM bufio fix to to perform buffer cleanup from a workqueue rather
than wait for IO in reclaim context from shrinker.

- DM crypt improvement to optionally avoid async processing via
workqueues for reads and/or writes -- via "no_read_workqueue" and
"no_write_workqueue" features. This more direct IO processing
improves latency and throughput with faster storage. Avoiding
workqueue IO submission for writes (DM_CRYPT_NO_WRITE_WORKQUEUE) is a
requirement for adding zoned block device support to DM crypt.

- Add zoned block device support to DM crypt. Makes use of
DM_CRYPT_NO_WRITE_WORKQUEUE and a new optional feature
(DM_CRYPT_WRITE_INLINE) that allows write completion to wait for
encryption to complete. This allows write ordering to be preserved,
which is needed for zoned block devices.

- Fix DM ebs target's check for REQ_OP_FLUSH.

- Fix DM core's report zones support to not report more zones than were
requested.

- A few small compiler warning fixes.

- DM dust improvements to return output directly to the user rather
than require they scrape the system log for output.

* tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: don't call report zones for more than the user requested
dm ebs: Fix incorrect checking for REQ_OP_FLUSH
dm init: Set file local variable static
dm ioctl: Fix compilation warning
dm raid: Remove empty if statement
dm verity: Fix compilation warning
dm crypt: Enable zoned block device support
dm crypt: add flags to optionally bypass kcryptd workqueues
dm bufio: do buffer cleanup from a workqueue
dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue()
dm dust: add interface to list all badblocks
dm dust: report some message results directly back to user
dm verity: add "panic_on_corruption" error handling mode
dm mpath: use double checked locking in fast path
dm mpath: rename current_pgpath to pgpath in multipath_prepare_ioctl
dm mpath: rework __map_bio()
dm mpath: factor out multipath_queue_bio
dm mpath: push locking down to must_push_back_rq()
dm mpath: take m->lock spinlock when testing QUEUE_IF_NO_PATH
dm mpath: changes from initial m->flags locking audit

Linus Torvalds
2020-08-08 04:08:09 +0800
fa73e2123 Merge tag 'media/v5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media ... Browse Code »

Pull media updates from Mauro Carvalho Chehab:

- Legacy soc_camera driver was removed from staging

- New I2C sensor related drivers: dw9768, ch7322, max9271, rdacm20

- TI vpe driver code was re-organized and had new features added

- Added Xilinx MIPI CSI-2 Rx Subsystem driver

- Added support for Infrared Toy and IR Droid devices

- Lots of random driver fixes, new features and cleanups

* tag 'media/v5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (318 commits)
media: camss: fix memory leaks on error handling paths in probe
media: davinci: vpif_capture: fix potential double free
media: radio: remove redundant assignment to variable retval
media: allegro: fix potential null dereference on header
media: mtk-mdp: Fix a refcounting bug on error in init
media: allegro: fix an error pointer vs NULL check
media: meye: fix missing pm_mchip_mode field
media: cafe-driver: use generic power management
media: saa7164: use generic power management
media: v4l2-dev/ioctl: Fix document for VIDIOC_QUERYCAP
media: v4l2: Correct kernel-doc inconsistency
media: v4l2: Correct kernel-doc inconsistency
media: dvbdev.h: keep * together with the type
media: v4l2-subdev.h: keep * together with the type
media: videobuf2: Print videobuf2 buffer state by name
media: colorspaces-details.rst: fix V4L2_COLORSPACE_JPEG description
media: tw68: use generic power management
media: meye: use generic power management
media: cx88: use generic power management
media: cx25821: use generic power management
...

Linus Torvalds
2020-08-08 04:00:53 +0800
75dee3b6d Merge tag 'mailbox-v5.9' of git://git.linaro.org/landing-teams/working/fujitsu/integration ... Browse Code »

Pull mailbox updates from Jassi Brar:
"mediatek:
- add support for mt6779 gce
- shutdown cleanup and address shift support

qcom:
- add msm8994 apcs and sdm660 hmss compatibility

imx:
- mark PM funcs __maybe

pcc:
- put acpi table before bailout

misc:
- replace http with https links"

* tag 'mailbox-v5.9' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: mediatek: cmdq: clear task in channel before shutdown
mailbox: cmdq: support mt6779 gce platform definition
mailbox: cmdq: variablize address shift in platform
dt-binding: gce: add gce header file for mt6779
mailbox: qcom: Add msm8994 apcs compatible
mailbox: qcom: Add sdm660 hmss compatible
mailbox: imx: Mark PM functions as __maybe_unused
mailbox: pcc: Put the PCCT table for error path
mailbox: Replace HTTP links with HTTPS ones

Linus Torvalds
2020-08-08 03:58:11 +0800
16b89f695 net/scm: Fix typo in SCM_RIGHTS compat refactoring ... Browse Code »

When refactoring the SCM_RIGHTS code, I accidentally mis-merged my
native/compat diffs, which entirely broke using SCM_RIGHTS in compat
mode. Use the correct helper.

Reported-by: Christian Zigotzky
Link: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-August/216156.html
Reported-by: "Alex Xu (Hello71)"
Link: https://lore.kernel.org/lkml/1596812929.lz7fuo8r2w.none@localhost/
Suggested-by: Thadeu Lima de Souza Cascardo
Fixes: c0029de50982 ("net/scm: Regularize compat handling of scm_detach_fds()")
Tested-by: Alex Xu (Hello71)
Acked-by: Thadeu Lima de Souza Cascardo
Signed-off-by: Kees Cook

Kees Cook
2020-08-08 03:43:25 +0800
ce615f5c1 Merge tag 'dmaengine-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine ... Browse Code »

Pull dmaengine updates from Vinod Koul:
"Core:
- Support out of order dma completion
- Support for repeating transaction

New controllers:
- Support for Actions S700 DMA engine
- Renesas R8A774E1, r8a7742 controller binding
- New driver for Xilinx DPDMA controller

Other:
- Support of out of order dma completion in idxd driver
- W=1 warning cleanup of subsystem
- Updates to ti-k3-dma, dw, idxd drivers"

* tag 'dmaengine-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (68 commits)
dmaengine: dw: Don't include unneeded header to platform data header
dmaengine: Actions: Add support for S700 DMA engine
dmaengine: Actions: get rid of bit fields from dma descriptor
dt-bindings: dmaengine: convert Actions Semi Owl SoCs bindings to yaml
dmaengine: idxd: add missing invalid flags field to completion
dmaengine: dw: Initialize max_sg_burst capability
dmaengine: dw: Introduce max burst length hw config
dmaengine: dw: Initialize min and max burst DMA device capability
dmaengine: dw: Set DMA device max segment size parameter
dmaengine: dw: Take HC_LLP flag into account for noLLP auto-config
dmaengine: Introduce DMA-device device_caps callback
dmaengine: Introduce max SG burst capability
dmaengine: Introduce min burst length capability
dt-bindings: dma: dw: Add max burst transaction length property
dt-bindings: dma: dw: Convert DW DMAC to DT binding
dmaengine: ti: k3-udma: Query throughput level information from hardware
dmaengine: ti: k3-udma: Use defines for capabilities register parsing
dmaengine: xilinx: dpdma: Fix kerneldoc warning
dmaengine: xilinx: dpdma: add missing kernel doc
dmaengine: xilinx: dpdma: remove comparison of unsigned expression
...

Linus Torvalds
2020-08-08 03:41:36 +0800
81e11336d Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge misc updates from Andrew Morton:

- a few MM hotfixes

- kthread, tools, scripts, ntfs and ocfs2

- some of MM

Subsystems affected by this patch series: kthread, tools, scripts, ntfs,
ocfs2 and mm (hofixes, pagealloc, slab-generic, slab, slub, kcsan,
debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore,
sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan).

* emailed patches from Andrew Morton : (162 commits)
mm: vmscan: consistent update to pgrefill
mm/vmscan.c: fix typo
khugepaged: khugepaged_test_exit() check mmget_still_valid()
khugepaged: retract_page_tables() remember to test exit
khugepaged: collapse_pte_mapped_thp() protect the pmd lock
khugepaged: collapse_pte_mapped_thp() flush the right range
mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
mm: thp: replace HTTP links with HTTPS ones
mm/page_alloc: fix memalloc_nocma_{save/restore} APIs
mm/page_alloc.c: skip setting nodemask when we are in interrupt
mm/page_alloc: fallbacks at most has 3 elements
mm/page_alloc: silence a KASAN false positive
mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()
mm/page_alloc.c: simplify pageblock bitmap access
mm/page_alloc.c: extract the common part in pfn_to_bitidx()
mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
mm/shuffle: remove dynamic reconfiguration
mm/memory_hotplug: document why shuffle_zone() is relevant
mm/page_alloc: remove nr_free_pagecache_pages()
mm: remove vm_total_pages
...

Linus Torvalds
2020-08-08 02:39:33 +0800
912c05720 mm: vmscan: consistent update to pgrefill ... Browse Code »

The vmstat pgrefill is useful together with pgscan and pgsteal stats to
measure the reclaim efficiency. However vmstat's pgrefill is not updated
consistently at system level. It gets updated for both global and memcg
reclaim however pgscan and pgsteal are updated for only global reclaim.
So, update pgrefill only for global reclaim. If someone is interested in
the stats representing both system level as well as memcg level reclaim,
then consult the root memcg's memory.stat instead of /proc/vmstat.

Signed-off-by: Shakeel Butt
Signed-off-by: Andrew Morton
Acked-by: Yafang Shao
Acked-by: Roman Gushchin
Acked-by: Chris Down
Cc: Johannes Weiner
Cc: Michal Hocko
Link: http://lkml.kernel.org/r/20200711011459.1159929-1-shakeelb@google.com
Signed-off-by: Linus Torvalds

Shakeel Butt
2020-08-08 02:33:29 +0800
238c30468 mm/vmscan.c: fix typo ... Browse Code »

Change "optizimation" to "optimization".

Signed-off-by: dylan-meiners
Signed-off-by: Andrew Morton
Reviewed-by: David Hildenbrand
Link: http://lkml.kernel.org/r/20200609185144.10049-1-spacct.spacct@gmail.com
Signed-off-by: Linus Torvalds

dylan-meiners
2020-08-08 02:33:29 +0800
bbe98f9ca khugepaged: khugepaged_test_exit() check mmget_still_valid() ... Browse Code »

Move collapse_huge_page()'s mmget_still_valid() check into
khugepaged_test_exit() itself. collapse_huge_page() is used for anon THP
only, and earned its mmget_still_valid() check because it inserts a huge
pmd entry in place of the page table's pmd entry; whereas
collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
merely clears the page table's pmd entry. But core dumping without mmap
lock must have been as open to mistaking a racily cleared pmd entry for a
page table at physical page 0, as exit_mmap() was. And we certainly have
no interest in mapping as a THP once dumping core.

Fixes: 59ea6d06cfa9 ("coredump: fix race condition between collapse_huge_page() and core dumping")
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Cc: Andrea Arcangeli
Cc: Song Liu
Cc: Mike Kravetz
Cc: Kirill A. Shutemov
Cc: [4.8+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvils
Signed-off-by: Linus Torvalds

Hugh Dickins
2020-08-08 02:33:29 +0800
18e77600f khugepaged: retract_page_tables() remember to test exit ... Browse Code »

Only once have I seen this scenario (and forgot even to notice what forced
the eventual crash): a sequence of "BUG: Bad page map" alerts from
vm_normal_page(), from zap_pte_range() servicing exit_mmap();
pmd:00000000, pte values corresponding to data in physical page 0.

The pte mappings being zapped in this case were supposed to be from a huge
page of ext4 text (but could as well have been shmem): my belief is that
it was racing with collapse_file()'s retract_page_tables(), found *pmd
pointing to a page table, locked it, but *pmd had become 0 by the time
start_pte was decided.

In most cases, that possibility is excluded by holding mmap lock; but
exit_mmap() proceeds without mmap lock. Most of what's run by khugepaged
checks khugepaged_test_exit() after acquiring mmap lock:
khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so,
for example. But retract_page_tables() did not: fix that.

The fix is for retract_page_tables() to check khugepaged_test_exit(),
after acquiring mmap lock, before doing anything to the page table.
Getting the mmap lock serializes with __mmput(), which briefly takes and
drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on
mm_users makes sure we don't touch the page table once exit_mmap() might
reach it, since exit_mmap() will be proceeding without mmap lock, not
expecting anyone to be racing with it.

Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Acked-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Mike Kravetz
Cc: Song Liu
Cc: [4.8+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils
Signed-off-by: Linus Torvalds

Hugh Dickins
2020-08-08 02:33:29 +0800
119a5fc16 khugepaged: collapse_pte_mapped_thp() protect the pmd lock ... Browse Code »

When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.

That's not enough. One machine has twice crashed under load, with "BUG:
spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second
crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed by
the time it comes to unlock.

Follow the example of retract_page_tables(), but we only need one of huge
page lock or i_mmap_lock_write to secure against this: because it's the
narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
the hpage earlier, choose to rely on huge page lock here.

Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Acked-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Mike Kravetz
Cc: Song Liu
Cc: [5.4+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
Signed-off-by: Linus Torvalds

Hugh Dickins
2020-08-08 02:33:29 +0800
723a80daf khugepaged: collapse_pte_mapped_thp() flush the right range ... Browse Code »

pmdp_collapse_flush() should be given the start address at which the huge
page is mapped, haddr: it was given addr, which at that point has been
used as a local variable, incremented to the end address of the extent.

Found by source inspection while chasing a hugepage locking bug, which I
then could not explain by this. At first I thought this was very bad;
then saw that all of the page translations that were not flushed would
actually still point to the right pages afterwards, so harmless; then
realized that I know nothing of how different architectures and models
cache intermediate paging structures, so maybe it matters after all -
particularly since the page table concerned is immediately freed.

Much easier to fix than to think about.

Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Acked-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Mike Kravetz
Cc: Song Liu
Cc: [5.4+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021204390.27773@eggly.anvils
Signed-off-by: Linus Torvalds

Hugh Dickins
2020-08-08 02:33:29 +0800
75802ca66 mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible ... Browse Code »

This is found by code observation only.

Firstly, the worst case scenario should assume the whole range was covered
by pmd sharing. The old algorithm might not work as expected for ranges
like (1g-2m, 1g+2m), where the adjusted range should be (0, 1g+2m) but the
expected range should be (0, 2g).

Since at it, remove the loop since it should not be required. With that,
the new code should be faster too when the invalidating range is huge.

Mike said:

: With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only
: adjust to (0, 1g+2m) which is incorrect.
:
: We should cc stable. The original reason for adjusting the range was to
: prevent data corruption (getting wrong page). Since the range is not
: always adjusted correctly, the potential for corruption still exists.
:
: However, I am fairly confident that adjust_range_if_pmd_sharing_possible
: is only gong to be called in two cases:
:
: 1) for a single page
: 2) for range == entire vma
:
: In those cases, the current code should produce the correct results.
:
: To be safe, let's just cc stable.

Fixes: 017b1660df89 ("mm: migration: fix migration of huge PMD shared pages")
Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Reviewed-by: Mike Kravetz
Cc: Andrea Arcangeli
Cc: Matthew Wilcox
Cc:
Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.com
Signed-off-by: Linus Torvalds

Peter Xu
2020-08-08 02:33:29 +0800
42742d9bd mm: thp: replace HTTP links with HTTPS ones ... Browse Code »

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `xmlns`:
For each link, `http://[^# ]*(?:\w|/)`:
If neither `gnu\.org/license`, nor `mozilla\.org/MPL`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.

[akpm@linux-foundation.org: fix amd.com URL, per Vlastimil]

Signed-off-by: Alexander A. Klimov
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Cc: Vlastimil Babka
Link: http://lkml.kernel.org/r/20200713164345.36088-1-grandmaster@al2klimov.de
Signed-off-by: Linus Torvalds

Alexander A. Klimov
2020-08-08 02:33:29 +0800
8510e69c8 mm/page_alloc: fix memalloc_nocma_{save/restore} APIs ... Browse Code »

Currently, memalloc_nocma_{save/restore} API that prevents CMA area
in page allocation is implemented by using current_gfp_context(). However,
there are two problems of this implementation.

First, this doesn't work for allocation fastpath. In the fastpath,
original gfp_mask is used since current_gfp_context() is introduced in
order to control reclaim and it is on slowpath. So, CMA area can be
allocated through the allocation fastpath even if
memalloc_nocma_{save/restore} APIs are used. Currently, there is just
one user for these APIs and it has a fallback method to prevent actual
problem.
Second, clearing __GFP_MOVABLE in current_gfp_context() has a side effect
to exclude the memory on the ZONE_MOVABLE for allocation target.

To fix these problems, this patch changes the implementation to exclude
CMA area in page allocation. Main point of this change is using the
alloc_flags. alloc_flags is mainly used to control allocation so it fits
for excluding CMA area in allocation.

Fixes: d7fefcc8de91 (mm/cma: add PF flag to force non cma alloc)
Signed-off-by: Joonsoo Kim
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Cc: Christoph Hellwig
Cc: Roman Gushchin
Cc: Mike Kravetz
Cc: Naoya Horiguchi
Cc: Michal Hocko
Cc: "Aneesh Kumar K . V"
Link: http://lkml.kernel.org/r/1595468942-29687-1-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds

Joonsoo Kim
2020-08-08 02:33:29 +0800
182f3d7a0 mm/page_alloc.c: skip setting nodemask when we are in interrupt ... Browse Code »

When we are in the interrupt context, it is irrelevant to the current task
context. If we use current task's mems_allowed, we can be fair to alloc
pages in the fast path and fall back to slow path memory allocation when
the current node(which is the current task mems_allowed) does not have
enough memory to allocate. In this case, it slows down the memory
allocation speed of interrupt context. So we can skip setting the
nodemask to allow any node to allocate memory, so that fast path
allocation can success.

Signed-off-by: Muchun Song
Signed-off-by: Andrew Morton
Reviewed-by: Pekka Enberg
Cc: David Hildenbrand
Link: http://lkml.kernel.org/r/20200706025921.53683-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds

Muchun Song
2020-08-08 02:33:29 +0800
da4156639 mm/page_alloc: fallbacks at most has 3 elements ... Browse Code »

MIGRAGE_TYPES is used to be the mark of end and there are at most 3
elements for the one dimension array.

Reduce to 3 to save little memory.

Signed-off-by: Wei Yang
Signed-off-by: Andrew Morton
Reviewed-by: David Hildenbrand
Link: http://lkml.kernel.org/r/20200625231022.18784-1-richard.weiyang@linux.alibaba.com
Signed-off-by: Linus Torvalds

Wei Yang
2020-08-08 02:33:29 +0800
9e15afa5a mm/page_alloc: silence a KASAN false positive ... Browse Code »

kernel_init_free_pages() will use memset() on s390 to clear all pages from
kmalloc_order() which will override KASAN redzones because a redzone was
setup from the end of the allocation size to the end of the last page.
Silence it by not reporting it there. An example of the report is,

BUG: KASAN: slab-out-of-bounds in __free_pages_ok
Write of size 4096 at addr 000000014beaa000
Call Trace:
show_stack+0x152/0x210
dump_stack+0x1f8/0x248
print_address_description.isra.13+0x5e/0x4d0
kasan_report+0x130/0x178
check_memory_region+0x190/0x218
memset+0x34/0x60
__free_pages_ok+0x894/0x12f0
kfree+0x4f2/0x5e0
unpack_to_rootfs+0x60e/0x650
populate_rootfs+0x56/0x358
do_one_initcall+0x1f4/0xa20
kernel_init_freeable+0x758/0x7e8
kernel_init+0x1c/0x170
ret_from_fork+0x24/0x28
Memory state around the buggy address:
000000014bea9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000014bea9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>000000014beaa000: 03 fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
^
000000014beaa080: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
000000014beaa100: fe fe fe fe fe fe fe fe fe fe fe fe fe fe

Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Signed-off-by: Qian Cai
Signed-off-by: Andrew Morton
Tested-by: Vasily Gorbik
Acked-by: Vasily Gorbik
Cc: Dmitry Vyukov
Cc: Christian Borntraeger
Cc: Alexander Potapenko
Cc: Kees Cook
Cc: Heiko Carstens
Link: http://lkml.kernel.org/r/20200610052154.5180-1-cai@lca.pw
Signed-off-by: Linus Torvalds

Qian Cai
2020-08-08 02:33:29 +0800
535b81e20 mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask() ... Browse Code »

After previous cleanup, the end_bitidx is not necessary any more.

Signed-off-by: Wei Yang
Signed-off-by: Andrew Morton
Cc: Mel Gorman
Link: http://lkml.kernel.org/r/20200623124201.8199-4-richard.weiyang@linux.alibaba.com
Signed-off-by: Linus Torvalds

Wei Yang
2020-08-08 02:33:29 +0800
d93d5ab9c mm/page_alloc.c: simplify pageblock bitmap access ... Browse Code »

Due to commit e58469bafd05 ("mm: page_alloc: use word-based accesses for
get/set pageblock bitmaps"), pageblock bitmap is accessed with word-based
access. This operation could be simplified a little.

Intuitively, if we want to get a bit range [start_idx, end_idx] in a word,
we can do like this:

mask = (1 << (end_bitidx - start_bitidx + 1)) - 1;
ret = (word >> start_idx) & mask;

And also if we want to set a bit range [start_idx, end_idx] with flags, we
can do the same by just shift start_bitidx.

By doing so we reduce some instructions for these two helper functions:

Before Patched
set_pfnblock_flags_mask 209 198(-5%)
get_pfnblock_flags_mask 101 87(-13%)

Since the syntax is changed a little, we need to check the whole 4-bit
migrate_type instead of part of it.

Signed-off-by: Wei Yang
Signed-off-by: Andrew Morton
Cc: Mel Gorman
Link: http://lkml.kernel.org/r/20200623124201.8199-3-richard.weiyang@linux.alibaba.com
Signed-off-by: Linus Torvalds

Wei Yang
2020-08-08 02:33:29 +0800
399b795b7 mm/page_alloc.c: extract the common part in pfn_to_bitidx() ... Browse Code »

The return value calculation is the same both for SPARSEMEM or not.

Just take it out.

Signed-off-by: Wei Yang
Signed-off-by: Andrew Morton
Cc: Mel Gorman
Link: http://lkml.kernel.org/r/20200623124201.8199-2-richard.weiyang@linux.alibaba.com
Signed-off-by: Linus Torvalds

Wei Yang
2020-08-08 02:33:29 +0800
d38ac97f8 mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits ... Browse Code »

We already have the definition of PB_migratetype_bits and current
NR_MIGRATETYPE_BITS looks like a cyclic definition.

Just use PB_migratetype_bits is enough.

Signed-off-by: Wei Yang
Signed-off-by: Andrew Morton
Cc: Mel Gorman
Link: http://lkml.kernel.org/r/20200623124201.8199-1-richard.weiyang@linux.alibaba.com
Signed-off-by: Linus Torvalds

Wei Yang
2020-08-08 02:33:29 +0800
839195352 mm/shuffle: remove dynamic reconfiguration ... Browse Code »

Commit e900a918b098 ("mm: shuffle initial free memory to improve
memory-side-cache utilization") promised "autodetection of a
memory-side-cache (to be added in a follow-on patch)" over a year ago.

The original series included patches [1], however, they were dropped
during review [2] to be followed-up later.

Due to lack of platforms that publish an HMAT, autodetection is currently
not implemented. However, manual activation is actively used [3]. Let's
simplify for now and re-add when really (ever?) needed.

[1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com
[2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com
[3] https://lkml.kernel.org/r/CAPcyv4irwGUU2x+c6b4L=KbB1dnasNKaaZd6oSpYjL9kfsnROQ@mail.gmail.com

Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Wei Yang
Acked-by: Dan Williams
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Huang Ying
Cc: Wei Yang
Cc: Mel Gorman
Cc: Dan Williams
Link: http://lkml.kernel.org/r/20200624094741.9918-4-david@redhat.com
Signed-off-by: Linus Torvalds

David Hildenbrand
2020-08-08 02:33:29 +0800
93146d98c mm/memory_hotplug: document why shuffle_zone() is relevant ... Browse Code »

It's not completely obvious why we have to shuffle the complete zone -
introduced in commit e900a918b098 ("mm: shuffle initial free memory to
improve memory-side-cache utilization") - because some sort of shuffling
is already performed when onlining pages via __free_one_page(), placing
MAX_ORDER-1 pages either to the head or the tail of the freelist. Let's
document why we have to shuffle the complete zone when exposing larger,
contiguous physical memory areas to the buddy.

Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Acked-by: Dan Williams
Acked-by: Michal Hocko
Cc: Alexander Duyck
Cc: Dan Williams
Cc: Michal Hocko
Link: http://lkml.kernel.org/r/20200624094741.9918-3-david@redhat.com
Signed-off-by: Linus Torvalds

David Hildenbrand
2020-08-08 02:33:29 +0800
56b9413bc mm/page_alloc: remove nr_free_pagecache_pages() ... Browse Code »

nr_free_pagecache_pages() isn't used outside page_alloc.c anymore - and
the name does not really help to understand what's going on. Let's
open-code it instead and add a comment.

Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Wei Yang
Reviewed-by: Pankaj Gupta
Reviewed-by: Mike Rapoport
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: Minchan Kim
Cc: Huang Ying
Link: http://lkml.kernel.org/r/20200619132410.23859-3-david@redhat.com
Signed-off-by: Linus Torvalds

David Hildenbrand
2020-08-08 02:33:29 +0800
0a18e6078 mm: remove vm_total_pages ... Browse Code »

The global variable "vm_total_pages" is a relic from older days. There is
only a single user that reads the variable - build_all_zonelists() - and
the first thing it does is update it.

Use a local variable in build_all_zonelists() instead and remove the
global variable.

Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Wei Yang
Reviewed-by: Pankaj Gupta
Reviewed-by: Mike Rapoport
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: Huang Ying
Cc: Minchan Kim
Link: http://lkml.kernel.org/r/20200619132410.23859-2-david@redhat.com
Signed-off-by: Linus Torvalds

David Hildenbrand
2020-08-08 02:33:28 +0800
f80b08fc4 mm, page_alloc: skip ->waternark_boost for atomic order-0 allocations ... Browse Code »

When boosting is enabled, it is observed that rate of atomic order-0
allocation failures are high due to the fact that free levels in the
system are checked with ->watermark_boost offset. This is not a problem
for sleepable allocations but for atomic allocations which looks like
regression.

This problem is seen frequently on system setup of Android kernel running
on Snapdragon hardware with 4GB RAM size. When no extfrag event occurred
in the system, ->watermark_boost factor is zero, thus the watermark
configurations in the system are:

_watermark = (
[WMARK_MIN] = 1272, --> ~5MB
[WMARK_LOW] = 9067, --> ~36MB
[WMARK_HIGH] = 9385), --> ~38MB
watermark_boost = 0

After launching some memory hungry applications in Android which can cause
extfrag events in the system to an extent that ->watermark_boost can be
set to max i.e. default boost factor makes it to 150% of high watermark.

_watermark = (
[WMARK_MIN] = 1272, --> ~5MB
[WMARK_LOW] = 9067, --> ~36MB
[WMARK_HIGH] = 9385), --> ~38MB
watermark_boost = 14077, -->~57MB

With default system configuration, for an atomic order-0 allocation to
succeed, having free memory of ~2MB will suffice. But boosting makes the
min_wmark to ~61MB thus for an atomic order-0 allocation to be successful
system should have minimum of ~23MB of free memory(from calculations of
zone_watermark_ok(), min = 3/4(min/2)). But failures are observed despite
system is having ~20MB of free memory. In the testing, this is
reproducible as early as first 300secs since boot and with furtherlowram
configurations(watermark_boost in
watermark caluculations for atomic order-0 allocations.

[akpm@linux-foundation.org: fix comment grammar, reflow comment]
[charante@codeaurora.org: fix suggested by Mel Gorman]
Link: http://lkml.kernel.org/r/31556793-57b1-1c21-1a9d-22674d9bd938@codeaurora.org

Signed-off-by: Charan Teja Reddy
Signed-off-by: Andrew Morton
Acked-by: Vlastimil Babka
Cc: Vinayak Menon
Cc: Mel Gorman
Link: http://lkml.kernel.org/r/1589882284-21010-1-git-send-email-charante@codeaurora.org
Signed-off-by: Linus Torvalds

Charan Teja Reddy
2020-08-08 02:33:28 +0800
f27ce0e14 page_alloc: consider highatomic reserve in watermark fast ... Browse Code »

zone_watermark_fast was introduced by commit 48ee5f3696f6 ("mm,
page_alloc: shortcut watermark checks for order-0 pages"). The commit
simply checks if free pages is bigger than watermark without additional
calculation such like reducing watermark.

It considered free cma pages but it did not consider highatomic reserved.
This may incur exhaustion of free pages except high order atomic free
pages.

Assume that reserved_highatomic pageblock is bigger than watermark min,
and there are only few free pages except high order atomic free. Because
zone_watermark_fast passes the allocation without considering high order
atomic free, normal reclaimable allocation like GFP_HIGHUSER will consume
all the free pages. Then finally order-0 atomic allocation may fail on
allocation.

This means watermark min is not protected against non-atomic allocation.
The order-0 atomic allocation with ALLOC_HARDER unwantedly can be failed.
Additionally the __GFP_MEMALLOC allocation with ALLOC_NO_WATERMARKS also
can be failed.

To avoid the problem, zone_watermark_fast should consider highatomic
reserve. If the actual size of high atomic free is counted accurately
like cma free, we may use it. On this patch just use
nr_reserved_highatomic. Additionally introduce
__zone_watermark_unusable_free to factor out common parts between
zone_watermark_fast and __zone_watermark_ok.

This is an example of ALLOC_HARDER allocation failure using v4.19 based
kernel.

Binder:9343_3: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
Call trace:
[] dump_stack+0xb8/0xf0
[] warn_alloc+0xd8/0x12c
[] __alloc_pages_nodemask+0x120c/0x1250
[] new_slab+0x128/0x604
[] ___slab_alloc+0x508/0x670
[] __kmalloc+0x2f8/0x310
[] context_struct_to_string+0x104/0x1cc
[] security_sid_to_context_core+0x74/0x144
[] security_sid_to_context+0x10/0x18
[] selinux_secid_to_secctx+0x20/0x28
[] security_secid_to_secctx+0x3c/0x70
[] binder_transaction+0xe68/0x454c
Mem-Info:
active_anon:102061 inactive_anon:81551 isolated_anon:0
active_file:59102 inactive_file:68924 isolated_file:64
unevictable:611 dirty:63 writeback:0 unstable:0
slab_reclaimable:13324 slab_unreclaimable:44354
mapped:83015 shmem:4858 pagetables:26316 bounce:0
free:2727 free_pcp:1035 free_cma:178
Node 0 active_anon:408244kB inactive_anon:326204kB active_file:236408kB inactive_file:275696kB unevictable:2444kB isolated(anon):0kB isolated(file):256kB mapped:332060kB dirty:252kB writeback:0kB shmem:19432kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Normal free:10908kB min:6192kB low:44388kB high:47060kB active_anon:409160kB inactive_anon:325924kB active_file:235820kB inactive_file:276628kB unevictable:2444kB writepending:252kB present:3076096kB managed:2673676kB mlocked:2444kB kernel_stack:62512kB pagetables:105264kB bounce:0kB free_pcp:4140kB local_pcp:40kB free_cma:712kB
lowmem_reserve[]: 0 0
Normal: 505*4kB (H) 357*8kB (H) 201*16kB (H) 65*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 10236kB
138826 total pagecache pages
5460 pages in swap cache
Swap cache stats: add 8273090, delete 8267506, find 1004381/4060142

This is an example of ALLOC_NO_WATERMARKS allocation failure using v4.14
based kernel.

kswapd0: page allocation failure: order:0, mode:0x140000a(GFP_NOIO|__GFP_HIGHMEM|__GFP_MOVABLE), nodemask=(null)
kswapd0 cpuset=/ mems_allowed=0
CPU: 4 PID: 1221 Comm: kswapd0 Not tainted 4.14.113-18770262-userdebug #1
Call trace:
[] dump_backtrace+0x0/0x248
[] show_stack+0x18/0x20
[] __dump_stack+0x20/0x28
[] dump_stack+0x68/0x90
[] warn_alloc+0x104/0x198
[] __alloc_pages_nodemask+0xdc0/0xdf0
[] zs_malloc+0x148/0x3d0
[] zram_bvec_rw+0x410/0x798
[] zram_rw_page+0x88/0xdc
[] bdev_write_page+0x70/0xbc
[] __swap_writepage+0x58/0x37c
[] swap_writepage+0x40/0x4c
[] shrink_page_list+0xc30/0xf48
[] shrink_inactive_list+0x2b0/0x61c
[] shrink_node_memcg+0x23c/0x618
[] shrink_node+0x1c8/0x304
[] kswapd+0x680/0x7c4
[] kthread+0x110/0x120
[] ret_from_fork+0x10/0x18
Mem-Info:
active_anon:111826 inactive_anon:65557 isolated_anon:0\x0a active_file:44260 inactive_file:83422 isolated_file:0\x0a unevictable:4158 dirty:117 writeback:0 unstable:0\x0a slab_reclaimable:13943 slab_unreclaimable:43315\x0a mapped:102511 shmem:3299 pagetables:19566 bounce:0\x0a free:3510 free_pcp:553 free_cma:0
Node 0 active_anon:447304kB inactive_anon:262228kB active_file:177040kB inactive_file:333688kB unevictable:16632kB isolated(anon):0kB isolated(file):0kB mapped:410044kB d irty:468kB writeback:0kB shmem:13196kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Normal free:14040kB min:7440kB low:94500kB high:98136kB reserved_highatomic:32768KB active_anon:447336kB inactive_anon:261668kB active_file:177572kB inactive_file:333768k B unevictable:16632kB writepending:480kB present:4081664kB managed:3637088kB mlocked:16632kB kernel_stack:47072kB pagetables:78264kB bounce:0kB free_pcp:2280kB local_pcp:720kB free_cma:0kB [ 4738.329607] lowmem_reserve[]: 0 0
Normal: 860*4kB (H) 453*8kB (H) 180*16kB (H) 26*32kB (H) 34*64kB (H) 6*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 14232kB

This is trace log which shows GFP_HIGHUSER consumes free pages right
before ALLOC_NO_WATERMARKS.

-22275 [006] .... 889.213383: mm_page_alloc: page=00000000d2be5665 pfn=970744 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
-22275 [006] .... 889.213385: mm_page_alloc: page=000000004b2335c2 pfn=970745 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
-22275 [006] .... 889.213387: mm_page_alloc: page=00000000017272e1 pfn=970278 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
-22275 [006] .... 889.213389: mm_page_alloc: page=00000000c4be79fb pfn=970279 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
-22275 [006] .... 889.213391: mm_page_alloc: page=00000000f8a51d4f pfn=970260 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
-22275 [006] .... 889.213393: mm_page_alloc: page=000000006ba8f5ac pfn=970261 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
-22275 [006] .... 889.213395: mm_page_alloc: page=00000000819f1cd3 pfn=970196 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
-22275 [006] .... 889.213396: mm_page_alloc: page=00000000f6b72a64 pfn=970197 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
kswapd0-1207 [005] ...1 889.213398: mm_page_alloc: page= (null) pfn=0 order=0 migratetype=1 nr_free=3650 gfp_flags=GFP_NOWAIT|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_MOVABLE

[jaewon31.kim@samsung.com: remove redundant code for high-order]
Link: http://lkml.kernel.org/r/20200623035242.27232-1-jaewon31.kim@samsung.com

Reported-by: Yong-Taek Lee
Suggested-by: Minchan Kim
Signed-off-by: Jaewon Kim
Signed-off-by: Andrew Morton
Reviewed-by: Baoquan He
Acked-by: Vlastimil Babka
Acked-by: Mel Gorman
Cc: Johannes Weiner
Cc: Yong-Taek Lee
Cc: Michal Hocko
Link: http://lkml.kernel.org/r/20200619235958.11283-1-jaewon31.kim@samsung.com
Signed-off-by: Linus Torvalds

Jaewon Kim
2020-08-08 02:33:28 +0800
deba04872 mm, page_alloc: use unlikely() in task_capc() ... Browse Code »

Hugh noted that task_capc() could use unlikely(), as most of the time
there is no capture in progress and we are in page freeing hot path.
Indeed adding unlikely() produces assembly that better matches the
assumption and moves all the tests away from the hot path.

I have also noticed that we don't need to test for cc->direct_compaction
as the only place we set current->task_capture is compact_zone_order()
which also always sets cc->direct_compaction true.

Suggested-by: Hugh Dickins
Signed-off-by: Vlastimil Babka
Signed-off-by: Andrew Morton
Acked-by: Hugh Dickins
Acked-by: Mel Gorman
Cc: Alex Shi
Cc: Li Wang
Link: http://lkml.kernel.org/r/4a24f7af-3aa5-6e80-4ae6-8f253b562039@suse.cz
Signed-off-by: Linus Torvalds

Vlastimil Babka
2020-08-08 02:33:28 +0800