Eric Lee / smarc-fsl-linux-kernel

08 Oct, 2020

1 commit

2f68e5475 Merge tag 'v5.4.70' into imx_5.4.y

* tag 'v5.4.70': (3051 commits)
Linux 5.4.70
netfilter: ctnetlink: add a range check for l3/l4 protonum
ep_create_wakeup_source(): dentry name can change under you...
...

Conflicts:
arch/arm/mach-imx/pm-imx6.c
arch/arm64/boot/dts/freescale/imx8mm-evk.dts
arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts
drivers/crypto/caam/caamalg.c
drivers/gpu/drm/imx/dw_hdmi-imx.c
drivers/gpu/drm/imx/imx-ldb.c
drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c
drivers/mmc/host/sdhci-esdhc-imx.c
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
drivers/net/ethernet/freescale/enetc/enetc.c
drivers/net/ethernet/freescale/enetc/enetc_pf.c
drivers/thermal/imx_thermal.c
drivers/usb/cdns3/ep0.c
drivers/xen/swiotlb-xen.c
sound/soc/fsl/fsl_esai.c
sound/soc/fsl/fsl_sai.c

Signed-off-by: Jason Liu

Jason Liu
2020-10-08 17:46:51 +0800

07 Oct, 2020

1 commit

2334b2d5a block/diskstats: more accurate approximation of io_ticks for slow disks

commit 2b8bd423614c595540eaadcfbc702afe8e155e50 upstream.

Currently io_ticks is approximated by adding one at each start and end of
requests if jiffies counter has changed. This works perfectly for requests
shorter than a jiffy or if one of requests starts/ends at each jiffy.

If disk executes just one request at a time and they are longer than two
jiffies then only first and last jiffies will be accounted.

Fix is simple: at the end of request add up into io_ticks jiffies passed
since last update rather than just one jiffy.

Example: common HDD executes random read 4k requests around 12ms.

fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
iostat -x 10 sdb

Note changes of iostat's "%util" 8,43% -> 99,99% before/after patch:

Before:

Device:  rrqm/s  wrqm/s    r/s   w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0,00    0,00  82,60  0,00  330,40   0,00      8,00      0,96  12,09    12,09     0,00   1,02   8,43

After:

Device:  rrqm/s  wrqm/s    r/s   w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0,00    0,00  82,50  0,00  330,00   0,00      8,00      1,00  12,10    12,10     0,00  12,12  99,99

Now io_ticks does not lose time between start and end of requests, but
for queue-depth > 1 some I/O time between adjacent starts might be lost.

For load estimation "%util" is not as useful as average queue length,
but it clearly shows how often disk queue is completely empty.
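
The accounting change described above can be modelled in a few lines. This is a minimal sketch, not the kernel implementation: the function name io_ticks, the event list, and the 12-jiffy request length (12 ms at an assumed HZ=1000) are illustrative assumptions. It replays ten back-to-back requests at queue depth 1 and compares the two accounting rules.

```python
def io_ticks(events, patched):
    """Replay (jiffy, is_end) events and return the accumulated io_ticks.

    Unpatched rule: add one tick at any request start or end if the
    jiffy counter changed since the last update.
    Patched rule: at a request end, add all jiffies passed since the
    last update instead of just one.
    """
    stamp = 0  # jiffy of the last io_ticks update
    ticks = 0
    for now, is_end in events:
        if now != stamp:
            ticks += (now - stamp) if (patched and is_end) else 1
            stamp = now
    return ticks


# Ten serial requests, each 12 jiffies long, with no idle gaps in between.
events = []
for i in range(10):
    events.append((12 * i, False))      # request start
    events.append((12 * i + 12, True))  # request end

before = io_ticks(events, patched=False)
after = io_ticks(events, patched=True)
print(f"before: {before}/120 jiffies busy ({100 * before / 120:.1f}% util)")
print(f"after:  {after}/120 jiffies busy ({100 * after / 120:.1f}% util)")
```

In this model the unpatched rule counts one jiffy per request, roughly 8% of wall time, which is in line with the "%util" figure reported before the patch; the patched rule accounts for the whole busy interval.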

Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: Konstantin Khlebnikov
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe
From: "Banerjee, Debabrata"
Signed-off-by: Greg Kroah-Hartman

Konstantin Khlebnikov
2020-10-07 14:01:29 +0800

19 Jun, 2020

1 commit

5691e2271 Merge tag 'v5.4.47' into imx_5.4.y

* tag 'v5.4.47': (2193 commits)
Linux 5.4.47
KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
...

Conflicts:
arch/arm/boot/dts/imx6qdl.dtsi
arch/arm/mach-imx/Kconfig
arch/arm/mach-imx/common.h
arch/arm/mach-imx/suspend-imx6.S
arch/arm64/boot/dts/freescale/imx8qxp-mek.dts
arch/powerpc/include/asm/cacheflush.h
drivers/cpufreq/imx6q-cpufreq.c
drivers/dma/imx-sdma.c
drivers/edac/synopsys_edac.c
drivers/firmware/imx/imx-scu.c
drivers/net/ethernet/freescale/fec.h
drivers/net/ethernet/freescale/fec_main.c
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
drivers/net/phy/phy_device.c
drivers/perf/fsl_imx8_ddr_perf.c
drivers/usb/cdns3/gadget.c
drivers/usb/dwc3/gadget.c
include/uapi/linux/dma-buf.h

Signed-off-by: Jason Liu

Jason Liu
2020-06-19 17:32:49 +0800

11 Jun, 2020

3 commits

590459086 x86/speculation: Add Ivy Bridge to affected list

commit 3798cc4d106e91382bfe016caa2edada27c2bb3f upstream

Make the docs match the code.

Signed-off-by: Josh Poimboeuf
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Josh Poimboeuf
2020-06-11 02:24:58 +0800

faf187abd x86/speculation: Add SRBDS vulnerability and mitigation documentation

commit 7222a1b5b87417f22265c92deea76a6aecd0fb0f upstream

Add documentation for the SRBDS vulnerability and its mitigation.

[ bp: Massage.
jpoimboe: sysfs table strings. ]

Signed-off-by: Mark Gross
Signed-off-by: Borislav Petkov
Reviewed-by: Tony Luck
Reviewed-by: Josh Poimboeuf
Signed-off-by: Greg Kroah-Hartman

Mark Gross
2020-06-11 02:24:57 +0800

b0f61a050 x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation

commit 7e5b3c267d256822407a22fdce6afdf9cd13f9fb upstream

SRBDS is an MDS-like speculative side channel that can leak bits from the
random number generator (RNG) across cores and threads. New microcode
serializes the processor access during the execution of RDRAND and
RDSEED. This ensures that the shared buffer is overwritten before it is
released for reuse.

While it is present on all affected CPU models, the microcode mitigation
is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the
cases where TSX is not supported or has been disabled with TSX_CTRL.

The mitigation is activated by default on affected processors and it
increases latency for RDRAND and RDSEED instructions. Among other
effects this will reduce throughput from /dev/urandom.

* Enable administrator to configure the mitigation off when desired using
either mitigations=off or srbds=off.

* Export vulnerability status via sysfs

* Rename file-scoped macros to apply for non-whitelist table initializations.
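
As a rough illustration of the two switches named in the list above, the sketch below checks whether a kernel command line turns the SRBDS mitigation off. The function name is hypothetical, and the model is deliberately narrow: it only looks for the two parameters mentioned in this message and does not consider whether the CPU is affected or whether the microcode is present.

```python
def srbds_mitigation_disabled(cmdline):
    """Return True if the command line switches the SRBDS mitigation off.

    Only the two switches named in the commit message are modelled:
    the global mitigations=off and the specific srbds=off.
    """
    params = cmdline.split()
    return "mitigations=off" in params or "srbds=off" in params


for line in ("ro quiet", "ro quiet srbds=off", "ro mitigations=off"):
    state = "off" if srbds_mitigation_disabled(line) else "default (on if affected)"
    print(f"{line!r}: SRBDS mitigation {state}")
```

On a running system the same check could be applied to the contents of /proc/cmdline.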

[ bp: Massage,
- s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g,
- do not read arch cap MSR a second time in tsx_fused_off() - just pass it in,
- flip check in cpu_set_bug_bits() to save an indentation level,
- reflow comments.
jpoimboe: s/Mitigated/Mitigation/ in user-visible strings
tglx: Dropped the fused off magic for now
]

Signed-off-by: Mark Gross
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner
Reviewed-by: Tony Luck
Reviewed-by: Pawan Gupta
Reviewed-by: Josh Poimboeuf
Tested-by: Neelima Krishnan
Signed-off-by: Greg Kroah-Hartman

Mark Gross
2020-06-11 02:24:57 +0800

29 Apr, 2020

1 commit

4fbf19bbb USB: hub: Revert commit bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices")

commit 3155f4f40811c5d7e3c686215051acf504e05565 upstream.

Commit bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for
high speed devices") changed the way the hub driver enumerates
high-speed devices. Instead of using the "new" enumeration scheme
first and switching to the "old" scheme if that doesn't work, we start
with the "old" scheme. In theory this is better because the "old"
scheme is slightly faster -- it involves resetting the device only
once instead of twice.

However, for a long time Windows used only the "new" scheme. Zeng Tao
said that Windows 8 and later use the "old" scheme for high-speed
devices, but apparently there are some devices that don't like it.
William Bader reports that the Ricoh webcam built into his Sony Vaio
laptop not only doesn't enumerate under the "old" scheme, it gets hung
up so badly that it won't then enumerate under the "new" scheme! Only
a cold reset will fix it.

Therefore we will revert the commit and go back to trying the "new"
scheme first for high-speed devices.
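
The effect of the ordering can be shown with a toy model. This is only a sketch of the behaviour reported above, not hub driver code: the class FragileDevice and the function enumerate_device are hypothetical names, and the device model simply assumes that an attempt with the "old" scheme hangs the device until a cold reset.

```python
class FragileDevice:
    """Toy model of a device that hangs if the "old" scheme is tried."""

    def __init__(self):
        self.hung = False

    def try_scheme(self, scheme):
        if self.hung:
            return False          # nothing works until a cold reset
        if scheme == "old":
            self.hung = True      # the old scheme wedges this device
            return False
        return True               # the new scheme works on a healthy device


def enumerate_device(device, order):
    """Try each scheme in order; return the one that worked, or None."""
    for scheme in order:
        if device.try_scheme(scheme):
            return scheme
    return None


print(enumerate_device(FragileDevice(), ["new", "old"]))  # after the revert
print(enumerate_device(FragileDevice(), ["old", "new"]))  # before the revert
```

With the "new" scheme first the modelled device enumerates; with the "old" scheme first the fallback never gets a chance, because the first attempt has already hung the device.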

Reported-and-tested-by: William Bader <williambader@hotmail.com>
Ref: https://bugzilla.kernel.org/show_bug.cgi?id=207219
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices")
CC: Zeng Tao <prime.zeng@hisilicon.com>
CC: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.2004221611230.11262-100000@iolanthe.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Alan Stern
2020-04-29 22:33:14 +0800

23 Apr, 2020

1 commit

7d4adb1d3 docs: Fix path to MTD command line partition parser

commit fb2511247dc4061fd122d0195838278a4a0b7b59 upstream.

cmdlinepart.c has been moved to drivers/mtd/parsers/.

Fixes: a3f12a35c91d ("mtd: parsers: Move CMDLINE parser")
Signed-off-by: Jonathan Neuschäfer
Signed-off-by: Jonathan Corbet
Signed-off-by: Greg Kroah-Hartman

Jonathan Neuschäfer
2020-04-23 16:36:45 +0800

21 Mar, 2020

1 commit

20eed7692 ACPI: watchdog: Allow disabling WDAT at boot

[ Upstream commit 3f9e12e0df012c4a9a7fd7eb0d3ae69b459d6b2c ]

In case the WDAT interface is broken, give the user an option to
ignore it to let a native driver bind to the watchdog device instead.

Signed-off-by: Jean Delvare
Acked-by: Mika Westerberg
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Sasha Levin

Jean Delvare
2020-03-21 15:11:47 +0800

08 Mar, 2020

1 commit

335d2828a Merge tag 'v5.4.24' into imx_5.4.y

Merge Linux stable release v5.4.24 into imx_5.4.y

* tag 'v5.4.24': (3306 commits)
Linux 5.4.24
blktrace: Protect q->blk_trace with RCU
kvm: nVMX: VMWRITE checks unsupported field before read-only field
...

Signed-off-by: Jason Liu

Conflicts:
arch/arm/boot/dts/imx6sll-evk.dts
arch/arm/boot/dts/imx7ulp.dtsi
arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
drivers/clk/imx/clk-composite-8m.c
drivers/gpio/gpio-mxc.c
drivers/irqchip/Kconfig
drivers/mmc/host/sdhci-of-esdhc.c
drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
drivers/net/can/flexcan.c
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
drivers/net/ethernet/mscc/ocelot.c
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
drivers/net/phy/realtek.c
drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
drivers/perf/fsl_imx8_ddr_perf.c
drivers/tee/optee/shm_pool.c
drivers/usb/cdns3/gadget.c
kernel/sched/cpufreq.c
net/core/xdp.c
sound/soc/fsl/fsl_esai.c
sound/soc/fsl/fsl_sai.c
sound/soc/sof/core.c
sound/soc/sof/imx/Kconfig
sound/soc/sof/loader.c

Jason Liu
2020-03-08 18:57:18 +0800

04 Mar, 2020

2 commits

6c8278b52 MLK-23418-2 docs/perf: update ddr perf guide for PMU in DB

Update ddr perf guide for PMU in DRAM Block (DB).

Reviewed-by: Fugang Duan
Signed-off-by: Joakim Zhang

Joakim Zhang
2020-03-04 09:06:04 +0800

ca12e6a92 MLK-23417-2 docs/perf: Add explanation for DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER quirk

Add explanation for DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER quirk.

Reviewed-by: Fugang Duan
Signed-off-by: Joakim Zhang

Joakim Zhang
2020-03-04 09:06:04 +0800

18 Jan, 2020

1 commit

2ed4cb645 dm: add dm-clone to the documentation index

commit 484e0d2b11e1fdd0d17702b282eb2ed56148385f upstream.

Fixes: 7431b7835f554 ("dm: add clone target")
Signed-off-by: Diego Calleja
Signed-off-by: Nikos Tsironis
Signed-off-by: Mike Snitzer
Signed-off-by: Greg Kroah-Hartman

Diego Calleja
2020-01-18 02:48:45 +0800

09 Jan, 2020

1 commit

585017928 ACPI: sysfs: Change ACPI_MASKABLE_GPE_MAX to 0x100

commit a7583e72a5f22470d3e6fd3b6ba912892242339f upstream.

The commit 0f27cff8597d ("ACPI: sysfs: Make ACPI GPE mask kernel
parameter cover all GPEs") says:
"Use a bitmap of size 0xFF instead of a u64 for the GPE mask so 256
GPEs can be masked"

But the masking of GPE 0xFF is not supported and the check condition
"gpe > ACPI_MASKABLE_GPE_MAX" is not valid because the type of gpe is
u8.

So modify the macro ACPI_MASKABLE_GPE_MAX to 0x100, and drop the "gpe >
ACPI_MASKABLE_GPE_MAX" check. In addition, update the docs "Format" for
acpi_mask_gpe parameter.
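
The two problems described above are easy to see with plain integer arithmetic. The sketch below is an illustration only: the helper names as_u8 and bitmap_can_hold are hypothetical, and the bitmap is modelled as "valid bit indices are 0 to size-1" rather than with the kernel's bitmap API.

```python
OLD_MAX = 0xFF    # previous ACPI_MASKABLE_GPE_MAX
NEW_MAX = 0x100   # value after this change


def as_u8(value):
    """Model storing a value in a u8: only the low 8 bits survive."""
    return value & 0xFF


def bitmap_can_hold(gpe, nbits):
    """A bitmap of nbits bits has valid indices 0 .. nbits - 1."""
    return 0 <= gpe < nbits


# A u8 can never exceed 0xFF, so "gpe > ACPI_MASKABLE_GPE_MAX" never fires.
check_ever_true = any(as_u8(g) > OLD_MAX for g in range(0x200))
print("old range check can trigger:", check_ever_true)

# A bitmap sized 0xFF has no bit for GPE 0xFF; one sized 0x100 does.
print("GPE 0xFF fits in 0xFF-bit bitmap: ", bitmap_can_hold(0xFF, OLD_MAX))
print("GPE 0xFF fits in 0x100-bit bitmap:", bitmap_can_hold(0xFF, NEW_MAX))
```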

Fixes: 0f27cff8597d ("ACPI: sysfs: Make ACPI GPE mask kernel parameter cover all GPEs")
Signed-off-by: Yunfeng Ye
[ rjw: Use u16 as gpe data type in acpi_gpe_apply_masked_gpes() ]
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Greg Kroah-Hartman

Yunfeng Ye
2020-01-09 17:20:02 +0800

18 Dec, 2019

1 commit

d8fc2266c USB: documentation: flags on usb-storage versus UAS

commit 65cc8bf99349f651a0a2cee69333525fe581f306 upstream.

Document which flags work with usb-storage, UAS, or both.

Signed-off-by: Oliver Neukum
Cc: stable
Link: https://lore.kernel.org/r/20191114112758.32747-4-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman

Oliver Neukum
2019-12-18 02:55:32 +0800

16 Dec, 2019

1 commit

622141309 Merge linux-5.4.y tag 'v5.4.3' into lf-5.4.y

This is the 5.4.3 stable release

Conflicts:
drivers/cpufreq/imx-cpufreq-dt.c
drivers/spi/spi-fsl-qspi.c

The conflicts are very minor and were fixed during the merge. The conflict in
imx-cpufreq-dt.c is a one-line code-style change; the upstream version is used,
with no functional change.

spi-fsl-qspi.c has minor conflicts from merging the upstream fix c69b17da53b2
("spi: spi-fsl-qspi: Clear TDH bits in FLSHCR register").

After the merge, a basic boot sanity test and a basic qspi test were done on i.MX.

Signed-off-by: Jason Liu

Jason Liu
2019-12-16 14:38:10 +0800

29 Nov, 2019

1 commit

75cad94d0 x86/speculation: Fix incorrect MDS/TAA mitigation status

commit 64870ed1b12e235cfca3f6c6da75b542c973ff78 upstream.

For MDS vulnerable processors with TSX support, enabling either MDS or
TAA mitigations will enable the use of VERW to flush internal processor
buffers at the right code path. IOW, they are either both mitigated
or both not. However, if the command line options are inconsistent,
the vulnerabilities sysfs files may not report the mitigation status
correctly.

For example, with only the "mds=off" option:

vulnerabilities/mds:Vulnerable; SMT vulnerable
vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable

The mds vulnerabilities file has wrong status in this case. Similarly,
the taa vulnerability file will be wrong with mds mitigation on, but
taa off.

Change taa_select_mitigation() to sync up the two mitigation status
and have them turned off if both "mds=off" and "tsx_async_abort=off"
are present.

Update documentation to emphasize the fact that both "mds=off" and
"tsx_async_abort=off" have to be specified together for processors that
are affected by both TAA and MDS to be effective.
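
The synchronised behaviour can be summarised in a small model. This is a sketch under stated assumptions, not the logic of taa_select_mitigation() itself: it assumes a processor affected by both MDS and TAA, the function name mitigation_status is hypothetical, and only the two "=off" switches from this message are modelled.

```python
def mitigation_status(cmdline):
    """Model the synced MDS/TAA state on a CPU affected by both.

    The buffer clearing is shared, so the mitigations are only turned
    off when both mds=off and tsx_async_abort=off are present.
    """
    params = cmdline.split()
    mds_off = "mds=off" in params
    taa_off = "tsx_async_abort=off" in params
    mitigated = not (mds_off and taa_off)
    return {"mds": mitigated, "tsx_async_abort": mitigated}


for line in ("", "mds=off", "tsx_async_abort=off", "mds=off tsx_async_abort=off"):
    print(f"{line!r:32} -> {mitigation_status(line)}")
```

In this model a lone "mds=off" leaves both entries mitigated, so the two sysfs files can no longer disagree.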

[ bp: Massage and add kernel-parameters.txt change too. ]

Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort")
Signed-off-by: Waiman Long
Signed-off-by: Borislav Petkov
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jiri Kosina
Cc: Jonathan Corbet
Cc: Josh Poimboeuf
Cc: linux-doc@vger.kernel.org
Cc: Mark Gross
Cc: Pawan Gupta
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Tim Chen
Cc: Tony Luck
Cc: Tyler Hicks
Cc: x86-ml
Link: https://lkml.kernel.org/r/20191115161445.30809-2-longman@redhat.com
Signed-off-by: Greg Kroah-Hartman

Waiman Long
2019-11-29 17:09:46 +0800

28 Nov, 2019

3 commits

3d2d4538b docs: perf: Add imx-ddr to documentation index

Sphinx is currently outputting a warning where
the file 'imx-ddr.rst' is not included in the
documentation index. Additionally, the code
highlighting and doc formatting can be slightly
improved.

Signed-off-by: Adam Zerella
Signed-off-by: Jonathan Corbet

Adam Zerella
2019-11-28 12:08:34 +0800

9869cf739 docs/perf: Add AXI ID filter capabilities information

Add capabilities information for AXI ID filter.

Signed-off-by: Joakim Zhang
Signed-off-by: Will Deacon

Joakim Zhang
2019-11-28 12:08:34 +0800

d094d32b0 docs/perf: Add explanation for DDR_CAP_AXI_ID_FILTER_ENHANCED quirk

Add explanation for DDR_CAP_AXI_ID_FILTER_ENHANCED quirk.

Signed-off-by: Joakim Zhang
[will: Simplified wording]
Signed-off-by: Will Deacon

Joakim Zhang
2019-11-28 12:08:34 +0800

05 Nov, 2019

2 commits

7f00cc8d4 Documentation: Add ITLB_MULTIHIT documentation

Add the initial ITLB_MULTIHIT documentation.

[ tglx: Add it to the index so it gets actually built. ]

Signed-off-by: Antonio Gomez Iglesias
Signed-off-by: Nelson D'Souza
Signed-off-by: Paolo Bonzini
Signed-off-by: Thomas Gleixner

Gomez Iglesias, Antonio
2019-11-05 03:26:00 +0800

1aa9b9572 kvm: x86: mmu: Recovery of shattered NX large pages

The page table pages corresponding to broken down large pages are zapped in
FIFO order, so that the large page can potentially be recovered, if it is
no longer being used for execution. This removes the performance penalty
for walking deeper EPT page tables.

By default, one large page will last about one hour once the guest
reaches a steady state.

Signed-off-by: Junaid Shahid
Signed-off-by: Paolo Bonzini
Signed-off-by: Thomas Gleixner

Junaid Shahid
2019-11-05 03:26:00 +0800

04 Nov, 2019

1 commit

b8e8c8303 kvm: mmu: ITLB_MULTIHIT mitigation

With some Intel processors, putting the same virtual address in the TLB
as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit
and cause the processor to issue a machine check resulting in a CPU lockup.

Unfortunately when EPT page tables use huge pages, it is possible for a
malicious guest to cause this situation.

Add a knob to mark huge pages as non-executable. When the nx_huge_pages
parameter is enabled (and we are using EPT), all huge pages are marked as
NX. If the guest attempts to execute in one of those pages, the page is
broken down into 4K pages, which are then marked executable.
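
The knob's effect can be pictured with a toy page-table model. This is a simplified sketch, not KVM MMU code: the dictionary layout and the function names map_huge_page and handle_exec_fault are hypothetical, and the split is done eagerly for the whole 2 MiB region purely to keep the example short.

```python
HUGE_PAGE_SIZE = 2 * 1024 * 1024   # 2 MiB
SMALL_PAGE_SIZE = 4 * 1024         # 4 KiB


def map_huge_page(base, nx_huge_pages):
    """Create one huge mapping; it is non-executable when the knob is on."""
    return {"addr": base, "size": HUGE_PAGE_SIZE, "exec": not nx_huge_pages}


def handle_exec_fault(mapping):
    """Break a non-executable huge page into executable 4K pages."""
    return [
        {"addr": mapping["addr"] + offset, "size": SMALL_PAGE_SIZE, "exec": True}
        for offset in range(0, mapping["size"], SMALL_PAGE_SIZE)
    ]


huge = map_huge_page(0x40000000, nx_huge_pages=True)
small_pages = handle_exec_fault(huge)
print("huge page executable:", huge["exec"])
print("4K pages after split:", len(small_pages))
```

Because no executable huge mapping exists in this model, the same address can never be cached as both a 4 KiB and a 2 MiB executable translation.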

This is not an issue for shadow paging (except nested EPT), because then
the host is in control of TLB flushes and the problematic situation cannot
happen. With nested EPT, the nested guest can again cause problems, so shadow
and direct EPT are treated in the same way.

[ tglx: Fixup default to auto and massage wording a bit ]

Originally-by: Junaid Shahid
Signed-off-by: Paolo Bonzini
Signed-off-by: Thomas Gleixner

Paolo Bonzini
2019-11-04 19:22:02 +0800

28 Oct, 2019

3 commits

a7a248c59 x86/speculation/taa: Add documentation for TSX Async Abort

Add the documentation for TSX Async Abort. Include the description of
the issue, how to check the mitigation state, how to control the mitigation,
and guidance for system administrators.

[ bp: Add proper SPDX tags, touch ups by Josh and me. ]

Co-developed-by: Antonio Gomez Iglesias

Signed-off-by: Pawan Gupta
Signed-off-by: Antonio Gomez Iglesias
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner
Reviewed-by: Mark Gross
Reviewed-by: Tony Luck
Reviewed-by: Josh Poimboeuf

Pawan Gupta
2019-10-28 15:37:00 +0800

7531a3596 x86/tsx: Add "auto" option to the tsx= cmdline parameter

Platforms which are not affected by X86_BUG_TAA may want the TSX feature
enabled. Add "auto" option to the TSX cmdline parameter. With tsx=auto,
TSX is disabled when X86_BUG_TAA is present and enabled otherwise.
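
The resulting decision table for the tsx= parameter can be sketched as follows. This is an illustration rather than the kernel's parser: the function name tsx_enabled is hypothetical, the "on", "off" and default-off semantics are taken from the commit that introduced tsx=, letting the last tsx= occurrence win is an assumption, and unknown values raise an error here simply because they are not modelled. It applies only to CPUs that support TSX control.

```python
def tsx_enabled(cmdline, has_bug_taa):
    """Resolve whether TSX ends up enabled for a given command line."""
    value = "off"  # not specifying tsx= is equivalent to tsx=off
    for param in cmdline.split():
        if param.startswith("tsx="):
            value = param[len("tsx="):]  # assumption: last occurrence wins
    if value == "on":
        return True
    if value == "off":
        return False
    if value == "auto":
        return not has_bug_taa  # disable only on affected platforms
    raise ValueError(f"unmodelled tsx= value: {value!r}")


for line in ("", "tsx=on", "tsx=off", "tsx=auto"):
    for affected in (True, False):
        print(f"{line!r:10} X86_BUG_TAA={affected!s:5} -> TSX enabled: "
              f"{tsx_enabled(line, affected)}")
```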

More details on X86_BUG_TAA can be found here:
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html

[ bp: Extend the arg buffer to accommodate "auto\0". ]

Signed-off-by: Pawan Gupta
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner
Reviewed-by: Tony Luck
Reviewed-by: Josh Poimboeuf

Pawan Gupta
2019-10-28 15:37:00 +0800

95c5824f7 x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default

Add a kernel cmdline parameter "tsx" to control the Transactional
Synchronization Extensions (TSX) feature. On CPUs that support TSX
control, use "tsx=on|off" to enable or disable TSX. Not specifying this
option is equivalent to "tsx=off". This is because on certain processors
TSX may be used as a part of a speculative side channel attack.

Carve out the TSX controlling functionality into a separate compilation
unit because TSX is a CPU feature while the TSX async abort control
machinery will go to cpu/bugs.c.

[ bp: - Massage, shorten and clear the arg buffer.
- Clarifications of the tsx= possible options - Josh.
- Expand on TSX_CTRL availability - Pawan. ]

Signed-off-by: Pawan Gupta
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner
Reviewed-by: Josh Poimboeuf

Pawan Gupta
2019-10-28 15:36:58 +0800

13 Oct, 2019

1 commit

680b5b3c5 Merge tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

- correct panic handling when running as a Xen guest

- cleanup the Xen grant driver to remove printing a pointer being
always NULL

- remove a soon to be wrong call of of_dma_configure()

* tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Stop abusing DT of_dma_configure API
xen/grant-table: remove unnecessary printing
x86/xen: Return from panic notifier

Linus Torvalds
2019-10-13 05:11:21 +0800

08 Oct, 2019

2 commits

9783aa991 mm, memcg: proportional memory.{low,min} reclaim

cgroup v2 introduces two memory protection thresholds: memory.low
(best-effort) and memory.min (hard protection). While they generally do
what they say on the tin, there is a limitation in their implementation
that makes them difficult to use effectively: that cliff behaviour often
manifests when they become eligible for reclaim. This patch implements
more intuitive and usable behaviour, where we gradually mount more
reclaim pressure as cgroups further and further exceed their protection
thresholds.

This cliff edge behaviour happens because we only choose whether or not
to reclaim based on whether the memcg is within its protection limits
(see the use of mem_cgroup_protected in shrink_node), but we don't vary
our reclaim behaviour based on this information. Imagine the following
timeline, with the numbers the lruvec size in this zone:

1. memory.low=1000000, memory.current=999999. 0 pages may be scanned.
2. memory.low=1000000, memory.current=1000000. 0 pages may be scanned.
3. memory.low=1000000, memory.current=1000001. 1000001* pages may be
scanned. (?!)

* Of course, we won't usually scan all available pages in the zone even
without this patch because of scan control priority, over-reclaim
protection, etc. However, as shown by the tests at the end, these
techniques don't sufficiently throttle such an extreme change in input,
so cliff-like behaviour isn't really averted by their existence alone.
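
The difference between the cliff and a gradual ramp can be shown on the timeline above. The sketch below is illustrative only: both function names are hypothetical, and the linear rule in proportional_scan_target is one possible proportional formula chosen for clarity, not necessarily the exact calculation used by the patch. As in the timeline, the lruvec size is taken to equal memory.current.

```python
def binary_scan_target(lruvec_size, low, current):
    """Cliff behaviour: nothing is scanned up to memory.low, everything above it."""
    return 0 if current <= low else lruvec_size


def proportional_scan_target(lruvec_size, low, current):
    """Illustrative linear ramp: scan only the share of usage above memory.low."""
    if current <= low:
        return 0
    return lruvec_size - lruvec_size * low // current


low = 1000000
for current in (999999, 1000000, 1000001, 2000000):
    print(f"memory.current={current}: "
          f"binary={binary_scan_target(current, low, current)}, "
          f"proportional={proportional_scan_target(current, low, current)}")
```

Going one page over memory.low makes the whole lruvec eligible under the binary rule, while the linear ramp in this sketch makes only a single page eligible.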

Here's an example of how this plays out in practice. At Facebook, we are
trying to protect various workloads from "system" software, like
configuration management tools, metric collectors, etc (see this[0] case
study). In order to find a suitable memory.low value, we start by
determining the expected memory range within which the workload will be
comfortable operating. This isn't an exact science -- memory usage deemed
"comfortable" will vary over time due to user behaviour, differences in
composition of work, etc, etc. As such we need to ballpark memory.low,
but doing this is currently problematic:

1. If we end up setting it too low for the workload, it won't have
*any* effect (see discussion above). The group will receive the full
weight of reclaim and won't have any priority while competing with the
less important system software, as if we had no memory.low configured
at all.

2. Because of this behaviour, we end up erring on the side of setting
it too high, such that the comfort range is reliably covered. However,
protected memory is completely unavailable to the rest of the system,
so we might cause undue memory and IO pressure there when we *know* we
have some elasticity in the workload.

3. Even if we get the value totally right, smack in the middle of the
comfort zone, we get extreme jumps between no pressure and full
pressure that cause unpredictable pressure spikes in the workload due
to the current binary reclaim behaviour.

With this patch, we can set it to our ballpark estimation without too much
worry. Any undesirable behaviour, such as too much or too little reclaim
pressure on the workload or system will be proportional to how far our
estimation is off. This means we can set memory.low much more
conservatively and thus waste less resources *without* the risk of the
workload falling off a cliff if we overshoot.

As a more abstract technical description, this unintuitive behaviour
results in having to give high-priority workloads a large protection
buffer on top of their expected usage to function reliably, as otherwise
we have abrupt periods of dramatically increased memory pressure which
hamper performance. Having to set these thresholds so high wastes
resources and generally works against the principle of work conservation.
In addition, having proportional memory reclaim behaviour has other
benefits. Most notably, before this patch it's basically mandatory to set
memory.low to a higher than desirable value because otherwise as soon as
you exceed memory.low, all protection is lost, and all pages are eligible
to scan again. By contrast, having a gradual ramp in reclaim pressure
means that you now still get some protection when thresholds are exceeded,
which means that one can now be more comfortable setting memory.low to
lower values without worrying that all protection will be lost. This is
important because workingset size is really hard to know exactly,
especially with variable workloads, so at least getting *some* protection
if your workingset size grows larger than you expect increases user
confidence in setting memory.low without a huge buffer on top being
needed.

Thanks a lot to Johannes Weiner and Tejun Heo for their advice and
assistance in thinking about how to make this work better.

In testing these changes, I intended to verify that:

1. Changes in page scanning become gradual and proportional instead of
binary.

To test this, I experimented stepping further and further down
memory.low protection on a workload that floats around 19G workingset
when under memory.low protection, watching page scan rates for the
workload cgroup:

+------------+-----------------+--------------------+--------------+
| memory.low | test (pgscan/s) | control (pgscan/s) | % of control |
+------------+-----------------+--------------------+--------------+
|        21G |               0 |                  0 |          N/A |
|        17G |             867 |               3799 |          23% |
|        12G |            1203 |               3543 |          34% |
|         8G |            2534 |               3979 |          64% |
|         4G |            3980 |               4147 |          96% |
|          0 |            3799 |               3980 |          95% |
+------------+-----------------+--------------------+--------------+

As you can see, the test kernel (with a kernel containing this
patch) ramps up page scanning significantly more gradually than the
control kernel (without this patch).

2. More gradual ramp up in reclaim aggression doesn't result in
premature OOMs.

To test this, I wrote a script that slowly increments the number of
pages held by stress(1)'s --vm-keep mode until a production system
entered severe overall memory contention. This script runs in a highly
protected slice taking up the majority of available system memory.
Watching vmstat revealed that page scanning continued essentially
nominally between test and control, without causing forward reclaim
progress to become arrested.

[0]: https://facebookmicrosites.github.io/cgroup2/docs/overview.html#case-study-the-fbtax2-project

[akpm@linux-foundation.org: reflow block comments to fit in 80 cols]
[chris@chrisdown.name: handle cgroup_disable=memory when getting memcg protection]
Link: http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name
Link: http://lkml.kernel.org/r/20190124014455.GA6396@chrisdown.name
Signed-off-by: Chris Down
Acked-by: Johannes Weiner
Reviewed-by: Roman Gushchin
Cc: Michal Hocko
Cc: Tejun Heo
Cc: Dennis Zhou
Cc: Tetsuo Handa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chris Down
2019-10-08 06:47:20 +0800

c6875f3aa x86/xen: Return from panic notifier

Currently execution of panic() continues until Xen's panic notifier
(xen_panic_event()) is called at which point we make a hypercall that
never returns.

This means that any notifier that is supposed to be called later as
well as significant part of panic() code (such as pstore writes from
kmsg_dump()) is never executed.

There is no reason for xen_panic_event() to be this last point in
execution since panic()'s emergency_restart() will call into
xen_emergency_restart() from where we can perform our hypercall.

Nevertheless, we will provide xen_legacy_crash boot option that will
preserve original behavior during crash. This option could be used,
for example, if running kernel dumper (which happens after panic
notifiers) is undesirable.

Reported-by: James Dingwall
Signed-off-by: Boris Ostrovsky
Reviewed-by: Juergen Gross

Boris Ostrovsky
2019-10-08 05:53:30 +0800

28 Sep, 2019

1 commit

aefcf2f4b Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull kernel lockdown mode from James Morris:
"This is the latest iteration of the kernel lockdown patchset, from
Matthew Garrett, David Howells and others.

From the original description:

This patchset introduces an optional kernel lockdown feature,
intended to strengthen the boundary between UID 0 and the kernel.
When enabled, various pieces of kernel functionality are restricted.
Applications that rely on low-level access to either hardware or the
kernel may cease working as a result - therefore this should not be
enabled without appropriate evaluation beforehand.

The majority of mainstream distributions have been carrying variants
of this patchset for many years now, so there's value in providing a
doesn't meet every distribution requirement, but gets us much closer
to not requiring external patches.

There are two major changes since this was last proposed for mainline:

- Separating lockdown from EFI secure boot. Background discussion is
covered here: https://lwn.net/Articles/751061/

- Implementation as an LSM, with a default stackable lockdown LSM
module. This allows the lockdown feature to be policy-driven,
rather than encoding an implicit policy within the mechanism.

The new locked_down LSM hook is provided to allow LSMs to make a
policy decision around whether kernel functionality that would allow
tampering with or examining the runtime state of the kernel should be
permitted.

The included lockdown LSM provides an implementation with a simple
policy intended for general purpose use. This policy provides a coarse
level of granularity, controllable via the kernel command line:

lockdown={integrity|confidentiality}

Enable the kernel lockdown feature. If set to integrity, kernel features
that allow userland to modify the running kernel are disabled. If set to
confidentiality, kernel features that allow userland to extract
confidential information from the kernel are also disabled.

This may also be controlled via /sys/kernel/security/lockdown and
overridden by kernel configuration.
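
A quick way to inspect the active mode from userspace is to read the securityfs file mentioned above. The sketch below is hedged in one important respect: the merge message does not describe the file's format, so the parser assumes that the file lists the available modes on one line with the active one in square brackets (for example "none [integrity] confidentiality"). The function names are hypothetical.

```python
def current_lockdown_mode(text):
    """Return the bracketed entry from the file contents, or None.

    Assumed format (not stated in the merge message): all modes on one
    line, with the active mode wrapped in square brackets.
    """
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_lockdown_mode(path="/sys/kernel/security/lockdown"):
    """Read and parse the lockdown file; requires securityfs to be mounted."""
    with open(path) as handle:
        return current_lockdown_mode(handle.read())


print(current_lockdown_mode("none [integrity] confidentiality\n"))
```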

New or existing LSMs may implement finer-grained controls of the
lockdown features. Refer to the lockdown_reason documentation in
include/linux/security.h for details.

The lockdown feature has had significant design feedback and review
across many subsystems. This code has been in linux-next for some
weeks, with a few fixes applied along the way.

Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf
when kernel lockdown is in confidentiality mode") is missing a
Signed-off-by from its author. Matthew responded that he is providing
this under category (c) of the DCO"

* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
kexec: Fix file verification on S390
security: constify some arrays in lockdown LSM
lockdown: Print current->comm in restriction messages
efi: Restrict efivar_ssdt_load when the kernel is locked down
tracefs: Restrict tracefs when the kernel is locked down
debugfs: Restrict debugfs when the kernel is locked down
kexec: Allow kexec_file() with appropriate IMA policy when locked down
lockdown: Lock down perf when in confidentiality mode
bpf: Restrict bpf when kernel lockdown is in confidentiality mode
lockdown: Lock down tracing and perf kprobes when in confidentiality mode
lockdown: Lock down /proc/kcore
x86/mmiotrace: Lock down the testmmiotrace module
lockdown: Lock down module params that specify hardware parameters (eg. ioport)
lockdown: Lock down TIOCSSERIAL
lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
acpi: Disable ACPI table override if the kernel is locked down
acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
ACPI: Limit access to custom_method when the kernel is locked down
x86/msr: Restrict MSR access when the kernel is locked down
x86: Lock down IO port access when the kernel is locked down
...

Linus Torvalds
2019-09-28 23:14:15 +0800

25 Sep, 2019

3 commits

9c9fa97a8 Merge branch 'akpm' (patches from Andrew)

Merge updates from Andrew Morton:

- a few hot fixes

- ocfs2 updates

- almost all of -mm (slab-generic, slab, slub, kmemleak, kasan,
cleanups, debug, pagecache, memcg, gup, pagemap, memory-hotplug,
sparsemem, vmalloc, initialization, z3fold, compaction, mempolicy,
oom-kill, hugetlb, migration, thp, mmap, madvise, shmem, zswap,
zsmalloc)

* emailed patches from Andrew Morton : (132 commits)
mm/zsmalloc.c: fix a -Wunused-function warning
zswap: do not map same object twice
zswap: use movable memory if zpool support allocate movable memory
zpool: add malloc_support_movable to zpool_driver
shmem: fix obsolete comment in shmem_getpage_gfp()
mm/madvise: reduce code duplication in error handling paths
mm: mmap: increase sockets maximum memory size pgoff for 32bits
mm/mmap.c: refine find_vma_prev() with rb_last()
riscv: make mmap allocation top-down by default
mips: use generic mmap top-down layout and brk randomization
mips: replace arch specific way to determine 32bit task with generic version
mips: adjust brk randomization offset to fit generic version
mips: use STACK_TOP when computing mmap base address
mips: properly account for stack randomization and stack guard gap
arm: use generic mmap top-down layout and brk randomization
arm: use STACK_TOP when computing mmap base address
arm: properly account for stack randomization and stack guard gap
arm64, mm: make randomization selected by generic topdown mmap layout
arm64, mm: move generic mmap layout functions to mm
arm64: consider stack randomization for mmap base only when necessary
...

Linus Torvalds
2019-09-25 07:10:23 +0800
0158115f7 memcg, kmem: deprecate kmem.limit_in_bytes

Cgroup v1 memcg controller has exposed a dedicated kmem limit to users
which turned out to be really a bad idea because there are paths which
cannot shrink the kernel memory usage enough to get below the limit (e.g.
because the accounted memory is not reclaimable). There are cases when
the failure is not even allowed (e.g. __GFP_NOFAIL). This means that the
kmem limit is in excess to the hard limit without any way to shrink and
thus completely useless. OOM killer cannot be invoked to handle the
situation because that would lead to a premature oom killing.

As a result many places might see ENOMEM returning from kmalloc and result
in unexpected errors. E.g. a global OOM killer when there is a lot of
free memory because ENOMEM is translated into VM_FAULT_OOM in #PF path and
therefore pagefault_out_of_memory would result in OOM killer.

Please note that the kernel memory is still accounted to the overall limit
along with the user memory so removing the kmem specific limit should
still allow to contain kernel memory consumption. Unlike the kmem one,
though, it invokes memory reclaim and targeted memcg oom killing if
necessary.

Start the deprecation process by crying to the kernel log. Let's see
whether there are relevant usecases and simply return EINVAL in the
second stage if nobody complains in a few releases.
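
To find out whether a system would be affected by the deprecation, one can look for cgroup v1 groups that have a kmem limit configured. The sketch below is only an illustration: it assumes the v1 memory controller is mounted at /sys/fs/cgroup/memory, that the per-group file is named memory.kmem.limit_in_bytes, and that "unlimited" is reported as the largest page-aligned value that fits in a signed 64-bit integer (4 KiB pages). The function names are hypothetical.

```python
from pathlib import Path

# Assumed mount point of the cgroup v1 memory controller.
CGROUP_V1_MEMORY = Path("/sys/fs/cgroup/memory")


def kmem_limit_is_set(text, page_size=4096):
    """True if the file contents describe a real limit rather than 'unlimited'."""
    limit = int(text.strip())
    # Assumed sentinel: LONG_MAX rounded down to a page boundary.
    unlimited = ((2**63 - 1) // page_size) * page_size
    return limit < unlimited


def groups_with_kmem_limit(root=CGROUP_V1_MEMORY):
    """Yield paths of cgroups whose kmem limit is below the sentinel."""
    if not root.is_dir():
        return
    for path in root.rglob("memory.kmem.limit_in_bytes"):
        try:
            if kmem_limit_is_set(path.read_text()):
                yield path.parent
        except (OSError, ValueError):
            continue  # unreadable or unexpected contents; skip the group


if __name__ == "__main__":
    for group in groups_with_kmem_limit():
        print(group)
```

Any group printed here sets the limit that this commit starts warning about.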

[akpm@linux-foundation.org: tweak documentation text]
Link: http://lkml.kernel.org/r/20190911151612.GI4023@dhcp22.suse.cz
Signed-off-by: Michal Hocko
Reviewed-by: Shakeel Butt
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: Andrey Ryabinin
Cc: Thomas Lindroth
Cc: Tetsuo Handa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2019-09-25 06:54:10 +0800
8974558f4 mm, page_owner, debug_pagealloc: save and dump freeing stack trace

The debug_pagealloc functionality is useful to catch buggy page allocator
users that cause e.g. use after free or double free. When page
inconsistency is detected, debugging is often simpler by knowing the call
stack of the process that last allocated and freed the page. When page_owner
is also enabled, we record the allocation stack trace, but not freeing.

This patch therefore adds recording of freeing process stack trace to page
owner info, if both page_owner and debug_pagealloc are configured and
enabled. With only page_owner enabled, this info is not useful for the
memory leak debugging use case. dump_page() is adjusted to print the
info. An example result of calling __free_pages() twice may look like
this (note the page last free stack trace):

BUG: Bad page state in process bash pfn:13d8f8
page:ffffc31984f63e00 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x1affff800000000()
raw: 01affff800000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount
page_owner tracks the page as freed
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL)
prep_new_page+0x143/0x150
get_page_from_freelist+0x289/0x380
__alloc_pages_nodemask+0x13c/0x2d0
khugepaged+0x6e/0xc10
kthread+0xf9/0x130
ret_from_fork+0x3a/0x50
page last free stack trace:
free_pcp_prepare+0x134/0x1e0
free_unref_page+0x18/0x90
khugepaged+0x7b/0xc10
kthread+0xf9/0x130
ret_from_fork+0x3a/0x50
Modules linked in:
CPU: 3 PID: 271 Comm: bash Not tainted 5.3.0-rc4-2.g07a1a73-default+ #57
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x85/0xc0
bad_page.cold+0xba/0xbf
rmqueue_pcplist.isra.0+0x6c5/0x6d0
rmqueue+0x2d/0x810
get_page_from_freelist+0x191/0x380
__alloc_pages_nodemask+0x13c/0x2d0
__get_free_pages+0xd/0x30
__pud_alloc+0x2c/0x110
copy_page_range+0x4f9/0x630
dup_mmap+0x362/0x480
dup_mm+0x68/0x110
copy_process+0x19e1/0x1b40
_do_fork+0x73/0x310
__x64_sys_clone+0x75/0x80
do_syscall_64+0x6e/0x1e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f10af854a10
...
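
When triaging reports like the one above, it can help to pull out just the freeing stack. The following is a rough sketch, not part of the patch: it assumes the dump has been saved as plain text without dmesg timestamps, as in the excerpt, and that frames have the "symbol+0xoffset/0xsize" shape shown there. The function name is hypothetical.

```python
import re

# One stack frame as printed in the excerpt, e.g. "free_unref_page+0x18/0x90".
FRAME = re.compile(r"^\s*([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


def last_free_stack(dump):
    """Return the symbol names listed under 'page last free stack trace:'."""
    frames = []
    in_section = False
    for line in dump.splitlines():
        if line.strip() == "page last free stack trace:":
            in_section = True
            continue
        if in_section:
            match = FRAME.match(line)
            if not match:
                break  # first non-frame line ends the section
            frames.append(match.group(1))
    return frames
```

Applied to the report above, this would return the five frames from free_pcp_prepare down to ret_from_fork.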

Link: http://lkml.kernel.org/r/20190820131828.22684-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Cc: Kirill A. Shutemov
Cc: Matthew Wilcox
Cc: Mel Gorman
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vlastimil Babka
2019-09-25 06:54:08 +0800

24 Sep, 2019

1 commit

299d14d4c Merge tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
"Enumeration:

- Consolidate _HPP/_HPX stuff in pci-acpi.c and simplify it
(Krzysztof Wilczynski)

- Fix incorrect PCIe device types and remove dev->has_secondary_link
to simplify code that deals with upstream/downstream ports (Mika
Westerberg)

- After suspend, restore Resizable BAR size bits correctly for 1MB
BARs (Sumit Saxena)

- Enable PCI_MSI_IRQ_DOMAIN support for RISC-V (Wesley Terpstra)

Virtualization:

- Add ACS quirks for iProc PAXB (Abhinav Ratna), Amazon Annapurna
Labs (Ali Saidi)

- Move sysfs SR-IOV functions to iov.c (Kelsey Skunberg)

- Remove group write permissions from sysfs sriov_numvfs,
sriov_drivers_autoprobe (Kelsey Skunberg)

Hotplug:

- Simplify pciehp indicator control (Denis Efremov)

Peer-to-peer DMA:

- Allow P2P DMA between root ports for whitelisted bridges (Logan
Gunthorpe)

- Whitelist some Intel host bridges for P2P DMA (Logan Gunthorpe)

- DMA map P2P DMA requests that traverse host bridge (Logan
Gunthorpe)

Amazon Annapurna Labs host bridge driver:

- Add DT binding and controller driver (Jonathan Chocron)

Hyper-V host bridge driver:

- Fix hv_pci_dev->pci_slot use-after-free (Dexuan Cui)

- Fix PCI domain number collisions (Haiyang Zhang)

- Use instance ID bytes 4 & 5 as PCI domain numbers (Haiyang Zhang)

- Fix build errors on non-SYSFS config (Randy Dunlap)

i.MX6 host bridge driver:

- Limit DBI register length (Stefan Agner)

Intel VMD host bridge driver:

- Fix config addressing issues (Jon Derrick)

Layerscape host bridge driver:

- Add bar_fixed_64bit property to endpoint driver (Xiaowei Bao)

- Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC drivers separately
(Xiaowei Bao)

Mediatek host bridge driver:

- Add MT7629 controller support (Jianjun Wang)

Mobiveil host bridge driver:

- Fix CPU base address setup (Hou Zhiqiang)

- Make "num-lanes" property optional (Hou Zhiqiang)

Tegra host bridge driver:

- Fix OF node reference leak (Nishka Dasgupta)

- Disable MSI for root ports to work around design problem (Vidya
Sagar)

- Add Tegra194 DT binding and controller support (Vidya Sagar)

- Add support for sideband pins and slot regulators (Vidya Sagar)

- Add PIPE2UPHY support (Vidya Sagar)

Misc:

- Remove unused pci_block_cfg_access() et al (Kelsey Skunberg)

- Unexport pci_bus_get(), etc (Kelsey Skunberg)

- Hide PM, VC, link speed, ATS, ECRC, PTM constants and interfaces in
the PCI core (Kelsey Skunberg)

- Clean up sysfs DEVICE_ATTR() usage (Kelsey Skunberg)

- Mark expected switch fall-through (Gustavo A. R. Silva)

- Propagate errors for optional regulators and PHYs (Thierry Reding)

- Fix kernel command line resource_alignment parameter issues (Logan
Gunthorpe)"

* tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (112 commits)
PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI
arm64: tegra: Add PCIe slot supply information in p2972-0000 platform
arm64: tegra: Add configuration for PCIe C5 sideband signals
PCI: tegra: Add support to enable slot regulators
PCI: tegra: Add support to configure sideband pins
PCI: vmd: Fix shadow offsets to reflect spec changes
PCI: vmd: Fix config addressing when using bus offsets
PCI: dwc: Add validation that PCIe core is set to correct mode
PCI: dwc: al: Add Amazon Annapurna Labs PCIe controller driver
dt-bindings: PCI: Add Amazon's Annapurna Labs PCIe host bridge binding
PCI: Add quirk to disable MSI-X support for Amazon's Annapurna Labs Root Port
PCI/VPD: Prevent VPD access for Amazon's Annapurna Labs Root Port
PCI: Add ACS quirk for Amazon Annapurna Labs root ports
PCI: Add Amazon's Annapurna Labs vendor ID
MAINTAINERS: Add PCI native host/endpoint controllers designated reviewer
PCI: hv: Use bytes 4 and 5 from instance ID as the PCI domain numbers
dt-bindings: PCI: tegra: Add PCIe slot supplies regulator entries
dt-bindings: PCI: tegra: Add sideband pins configuration entries
PCI: tegra: Add Tegra194 PCIe support
PCI: Get rid of dev->has_secondary_link flag
...

Linus Torvalds
2019-09-24 10:16:01 +0800

22 Sep, 2019

1 commit

3e414b5bd Merge tag 'for-5.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- crypto and DM crypt advances that allow the crypto API to reclaim
implementation details that do not belong in DM crypt. The wrapper
template for ESSIV generation that was factored out will also be used
by fscrypt in the future.

- Add root hash pkcs#7 signature verification to the DM verity target.

- Add a new "clone" DM target that allows for efficient remote
replication of a device.

- Enhance DM bufio's cache to be tailored to each client based on use.
Clients that make heavy use of the cache get more of it, and those
that use less have reduced cache usage.

- Add a new DM_GET_TARGET_VERSION ioctl to allow userspace to query the
version number of a DM target (even if the associated module isn't
yet loaded).

- Fix invalid memory access in DM zoned target.

- Fix the max_discard_sectors limit advertised by the DM raid target;
it was mistakenly storing the limit in bytes rather than sectors.

- Small optimizations and cleanups in DM writecache target.

- Various fixes and cleanups in DM core, DM raid1 and space map portion
of DM persistent data library.
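
The max_discard_sectors item above is a unit mix-up between bytes and sectors. The sketch below is not the DM raid code; it only illustrates the conversion, assuming the kernel's 512-byte sector unit and a hypothetical 64 KiB value.

```python
SECTOR_SHIFT = 9  # block-layer sectors are 512 bytes


def bytes_to_sectors(nbytes):
    """Convert a byte count to 512-byte sectors."""
    return nbytes >> SECTOR_SHIFT


chunk_bytes = 64 * 1024  # hypothetical size, in bytes

# Storing the byte count in a field that counts sectors overstates the
# limit by a factor of 512.
print(chunk_bytes)                    # 65536
print(bytes_to_sectors(chunk_bytes))  # 128
```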

* tag 'for-5.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
dm: introduce DM_GET_TARGET_VERSION
dm bufio: introduce a global cache replacement
dm bufio: remove old-style buffer cleanup
dm bufio: introduce a global queue
dm bufio: refactor adjust_total_allocated
dm bufio: call adjust_total_allocated from __link_buffer and __unlink_buffer
dm: add clone target
dm raid: fix updating of max_discard_sectors limit
dm writecache: skip writecache_wait for pmem mode
dm stats: use struct_size() helper
dm crypt: omit parsing of the encapsulated cipher
dm crypt: switch to ESSIV crypto API template
crypto: essiv - create wrapper template for ESSIV generation
dm space map common: remove check for impossible sm_find_free() return value
dm raid1: use struct_size() with kzalloc()
dm writecache: optimize performance by sorting the blocks for writeback_all
dm writecache: add unlikely for getting two block with same LBA
dm writecache: remove unused member pointer in writeback_struct
dm zoned: fix invalid memory access
dm verity: add root hash pkcs#7 signature verification
...

Linus Torvalds
2019-09-22 01:40:37 +0800

21 Sep, 2019

1 commit

45824fc0d Merge tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
"This is a bit late, partly due to me travelling, and partly due to a
power outage knocking out some of my test systems *while* I was
travelling.

- Initial support for running on a system with an Ultravisor, which
is software that runs below the hypervisor and protects guests
against some attacks by the hypervisor.

- Support for building the kernel to run as a "Secure Virtual
Machine", ie. as a guest capable of running on a system with an
Ultravisor.

- Some changes to our DMA code on bare metal, to allow devices with
medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of
DMA space.

- Support for firmware assisted crash dumps on bare metal (powernv).

- Two series fixing bugs in and refactoring our PCI EEH code.

- A large series refactoring our exception entry code to use gas
macros, both to make it more readable and also enable some future
optimisations.

As well as many cleanups and other minor features & fixups.

Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew
Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual,
Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe
JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig,
Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand,
Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg
Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim
Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo
Jiang, Madhavan Srinivasan, Mahesh Salgaonkar, Mahesh Salgaonkar,
Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan
Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm,
Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu,
Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom
Lendacky, Vasant Hegde"

* tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits)
powerpc/mm/mce: Keep irqs disabled during lockless page table walk
powerpc: Use ftrace_graph_ret_addr() when unwinding
powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
ftrace: Look up the address of return_to_handler() using helpers
powerpc: dump kernel log before carrying out fadump or kdump
docs: powerpc: Add missing documentation reference
powerpc/xmon: Fix output of XIVE IPI
powerpc/xmon: Improve output of XIVE interrupts
powerpc/mm/radix: remove useless kernel messages
powerpc/fadump: support holes in kernel boot memory area
powerpc/fadump: remove RMA_START and RMA_END macros
powerpc/fadump: update documentation about option to release opalcore
powerpc/fadump: consider f/w load area
powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file
powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes
powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel
powerpc/fadump: improve how crashed kernel's memory is reserved
powerpc/fadump: consider reserved ranges while releasing memory
powerpc/fadump: make crash memory ranges array allocation generic
...

Linus Torvalds
2019-09-21 02:48:06 +0800

19 Sep, 2019

1 commit

e444d51b1 Merge tag 'tty-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
"Even in this age, people are still making new serial port silicon,
why...

Anyway, here's the TTY and Serial driver update for 5.4-rc1. Lots of
changes in here for a number of embedded serial port devices that are
being worked on because people really like to see those console
logs...

Other than that, nothing major here, no core tty changes that anyone
should care about.

All of these have been in linux-next for a while with no reported
issues"

* tag 'tty-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (125 commits)
serial: tegra: Add PIO mode support
serial: tegra: report clk rate errors
serial: tegra: add support to adjust baud rate
serial: tegra: DT for Adjusted baud rates
serial: tegra: add support to use 8 bytes trigger
serial: tegra: set maximum num of uart ports to 8
serial: tegra: check for FIFO mode enabled status
dt-binding: serial: tegra: add new chips
serial: tegra: report error to upper tty layer
serial: tegra: flush the RX fifo on frame error
serial: tegra: avoid reg access when clk disabled
serial: tegra: add support to ignore read
serial: sprd: correct the wrong sequence of arguments
dt-bindings: serial: Convert riscv,sifive-serial to json-schema
serial: max310x: turn off transmitter before activating AutoCTS or auto transmitter flow control
serial: max310x: Properly set flags in AutoCTS mode
tty: serial: fix platform_no_drv_owner.cocci warnings
dt-bindings: serial: Document Freescale LINFlexD UART
serial: fsl_linflexuart: Update compatible string
tty: n_gsm: avoid recursive locking with async port hangup
...

Linus Torvalds
2019-09-19 01:50:47 +0800

18 Sep, 2019

2 commits

7ad67ca55 Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

- Two NVMe pull requests:
- ana log parse fix from Anton
- nvme quirks support for Apple devices from Ben
- fix missing bio completion tracing for multipath stack devices
from Hannes and Mikhail
- IP TOS settings for nvme rdma and tcp transports from Israel
- rq_dma_dir cleanups from Israel
- tracing for Get LBA Status command from Minwoo
- Some nvme-tcp cleanups from Minwoo, Potnuri and myself
- Some consolidation between the fabrics transports for handling
the CAP register
- reset race with ns scanning fix for fabrics (move fabrics
commands to a dedicated request queue with a different lifetime
from the admin request queue)
- controller reset and namespace scan races fixes
- nvme discovery log change uevent support
- naming improvements from Keith
- multiple discovery controllers reject fix from James
- some regular cleanups from various people

- Series fixing (and re-fixing) null_blk debug printing and nr_devices
checks (André)

- A few pull requests from Song, with fixes from Andy, Guoqing,
Guilherme, Neil, Nigel, and Yufen.

- REQ_OP_ZONE_RESET_ALL support (Chaitanya)

- Bio merge handling unification (Christoph)

- Pick default elevator correctly for devices with special needs
(Damien)

- Block stats fixes (Hou)

- Timeout and support devices nbd fixes (Mike)

- Series fixing races around elevator switching and device add/remove
(Ming)

- sed-opal cleanups (Revanth)

- Per device weight support for BFQ (Fam)

- Support for blk-iocost, a new model that can properly account cost of
IO workloads. (Tejun)

- blk-cgroup writeback fixes (Tejun)

- paride queue init fixes (zhengbin)

- blk_set_runtime_active() cleanup (Stanley)

- Block segment mapping optimizations (Bart)

- lightnvm fixes (Hans/Minwoo/YueHaibing)

- Various little fixes and cleanups

* tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
null_blk: format pr_* logs with pr_fmt
null_blk: match the type of parameter nr_devices
null_blk: do not fail the module load with zero devices
block: also check RQF_STATS in blk_mq_need_time_stamp()
block: make rq sector size accessible for block stats
bfq: Fix bfq linkage error
raid5: use bio_end_sector in r5_next_bio
raid5: remove STRIPE_OPS_REQ_PENDING
md: add feature flag MD_FEATURE_RAID0_LAYOUT
md/raid0: avoid RAID0 data corruption due to layout confusion.
raid5: don't set STRIPE_HANDLE to stripe which is in batch list
raid5: don't increment read_errors on EILSEQ return
nvmet: fix a wrong error status returned in error log page
nvme: send discovery log page change events to userspace
nvme: add uevent variables for controller devices
nvme: enable aen regardless of the presence of I/O queues
nvme-fabrics: allow discovery subsystems accept a kato
nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
nvme: Remove redundant assignment of cq vector
nvme: Assign subsys instance from first ctrl
...

Linus Torvalds
2019-09-18 07:57:47 +0800
7c672abc1 Merge tag 'docs-5.4' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
"It's a somewhat calmer cycle for docs this time, as the churn of the
mass RST conversion is happily mostly behind us.

- A new document on reproducible builds.

- We finally got around to zapping the documentation for hardware
support that was removed in 2004; one doesn't want to rush these
things.

- The usual assortment of fixes, typo corrections, etc"

* tag 'docs-5.4' of git://git.lwn.net/linux: (67 commits)
Documentation: kbuild: Add document about reproducible builds
docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]
Documentation: Add "earlycon=sbi" to the admin guide
doc:lock: remove reference to clever use of read-write lock
devices.txt: improve entry for comedi (char major 98)
docs: mtd: Update spi nor reference driver
doc: arm64: fix grammar dtb placed in no attributes region
Documentation: sysrq: don't recommend 'S' 'U' before 'B'
mailmap: Update email address for Quentin Perret
docs: ftrace: clarify when tracing is disabled by the trace file
docs: process: fix broken link
Documentation/arm/samsung-s3c24xx: Remove stray U+FEFF character to fix title
Documentation/arm/sa1100/assabet: Fix 'make assabet_defconfig' command
Documentation/arm/sa1100: Remove some obsolete documentation
docs/zh_CN: update Chinese howto.rst for latexdocs making
Documentation: virt: Fix broken reference to virt tree's index
docs: Fix typo on pull requests guide
kernel-doc: Allow anonymous enum
Documentation: sphinx: Don't parse socket() as identifier reference
Documentation: sphinx: Add missing comma to list of strings
...

Linus Torvalds
2019-09-18 07:22:26 +0800

17 Sep, 2019

1 commit

ad0621957 Merge tag 'platform-drivers-x86-v5.4-1' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform-drivers updates from Andy Shevchenko:

- ASUS WMI driver got a couple of updates, i.e. support of FAN is fixed
for recent products and the charge threshold support has been added

- Two unknown key events for Dell laptops are being ignored now to avoid
spamming users with harmless messages

- HP ZBook 17 G5 and ASUS Zenbook UX430UNR got accelerometer support.

- Intel CherryTrail platforms had a regression with wake up. Now it's
fixed

- Intel PMC driver got fixed in order to work nicely in Xen
environment

- Intel Speed Select driver provides bucket vs core count relationship.
Besides that the tool has been updated for better output

- The PrivacyGuard is enabled on Lenovo ThinkPad laptops

- Three tablets - Trekstor Primebook C11B 2-in-1, Irbis TW90 and Chuwi
Surbook Mini - got touchscreen support

* tag 'platform-drivers-x86-v5.4-1' of git://git.infradead.org/linux-platform-drivers-x86: (53 commits)
MAINTAINERS: Switch PDx86 subsystem status to Odd Fixes
platform/x86: asus-wmi: Refactor charge threshold to use the battery hooking API
platform/x86: asus-wmi: Rename CHARGE_THRESHOLD to RSOC
platform/x86: asus-wmi: Reorder ASUS_WMI_CHARGE_THRESHOLD
tools/power/x86/intel-speed-select: Display core count for bucket
platform/x86: ISST: Allow additional TRL MSRs
tools/power/x86/intel-speed-select: Fix memory leak
tools/power/x86/intel-speed-select: Output success/failed for command output
tools/power/x86/intel-speed-select: Output human readable CPU list
tools/power/x86/intel-speed-select: Change turbo ratio output to maximum turbo frequency
tools/power/x86/intel-speed-select: Switch output to MHz
tools/power/x86/intel-speed-select: Simplify output for turbo-freq and base-freq
tools/power/x86/intel-speed-select: Fix cpu-count output
tools/power/x86/intel-speed-select: Fix help option typo
tools/power/x86/intel-speed-select: Fix package typo
tools/power/x86/intel-speed-select: Fix a read overflow in isst_set_tdp_level_msr()
platform/x86: intel_int0002_vgpio: Use device_init_wakeup
platform/x86: intel_int0002_vgpio: Fix wakeups not working on Cherry Trail
platform/x86: compal-laptop: Initialize "value" in ec_read_u8()
platform/x86: touchscreen_dmi: Add info for the Trekstor Primebook C11B 2-in-1
...

Linus Torvalds
2019-09-17 10:59:10 +0800