26 Oct, 2020

1 commit

  • Use a more generic form for __section that requires quotes to avoid
    complications with clang and gcc differences.

    Remove the quote operator # from compiler_attributes.h __section macro.

    Convert all unquoted __section(foo) uses to quoted __section("foo").
    Also convert __attribute__((section("foo"))) uses to __section("foo")
    even if the __attribute__ has multiple list entry forms.

    Conversion done using the script at:

    https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
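
    For reference, the compiler_attributes.h change amounts to dropping the
    stringification operator (a sketch, not the verbatim diff):

    /* before: the macro quoted its argument itself */
    #define __section(S) __attribute__((__section__(#S)))
    /* usage: __section(.data..ro_after_init) */

    /* after: callers pass a string literal directly */
    #define __section(section) __attribute__((__section__(section)))
    /* usage: __section(".data..ro_after_init") */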

    Signed-off-by: Joe Perches
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Miguel Ojeda
    Signed-off-by: Linus Torvalds

    Joe Perches
     

24 Oct, 2020

1 commit

  • Pull virtio updates from Michael Tsirkin:
    "vhost, vdpa, and virtio cleanups and fixes

    A very quiet cycle, no new features"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    MAINTAINERS: add URL for virtio-mem
    vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call
    vringh: fix __vringh_iov() when riov and wiov are different
    vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
    s390: virtio: PV needs VIRTIO I/O device protection
    virtio: let arch advertise guest's memory access restrictions
    vhost_vdpa: Fix duplicate included kernel.h
    vhost: reduce stack usage in log_used
    virtio-mem: Constify mem_id_table
    virtio_input: Constify id_table
    virtio-balloon: Constify id_table
    vdpa/mlx5: Fix failure to bring link up
    vdpa/mlx5: Make use of a specific 16 bit endianness API

    Linus Torvalds
     

21 Oct, 2020

1 commit

  • If protected virtualization is active on s390, VIRTIO has only restricted
    access to the guest memory.
    Define CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS and export
    arch_has_restricted_virtio_memory_access to advertise this to VIRTIO if
    that's the case.
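
    The s390 implementation is a one-liner keyed off protected virtualization
    (a sketch, relying on the existing is_prot_virt_guest() helper):

    int arch_has_restricted_virtio_memory_access(void)
    {
            return is_prot_virt_guest();
    }
    EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);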

    Signed-off-by: Pierre Morel
    Reviewed-by: Cornelia Huck
    Reviewed-by: Halil Pasic
    Link: https://lore.kernel.org/r/1599728030-17085-3-git-send-email-pmorel@linux.ibm.com
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Christian Borntraeger

    Pierre Morel
     

17 Oct, 2020

1 commit

  • Pull s390 updates from Vasily Gorbik:

    - Remove address space overrides using set_fs()

    - Convert to generic vDSO

    - Convert to generic page table dumper

    - Add ARCH_HAS_DEBUG_WX support

    - Add leap seconds handling support

    - Add NVMe firmware-assisted kernel dump support

    - Extend NVMe boot support with memory clearing control and addition of
    kernel parameters

    - AP bus and zcrypt api code rework. Add adapter configure/deconfigure
    interface. Extend debug features. Add failure injection support

    - Add ECC secure private keys support

    - Add KASan support for running protected virtualization host with
    4-level paging

    - Utilize destroy page ultravisor call to speed up secure guests
    shutdown

    - Implement ioremap_wc() and ioremap_prot() with MIO in PCI code

    - Various checksum improvements

    - Other small various fixes and improvements all over the code

    * tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits)
    s390/uaccess: fix indentation
    s390/uaccess: add default cases for __put_user_fn()/__get_user_fn()
    s390/zcrypt: fix wrong format specifications
    s390/kprobes: move insn_page to text segment
    s390/sie: fix typo in SIGP code description
    s390/lib: fix kernel doc for memcmp()
    s390/zcrypt: Introduce Failure Injection feature
    s390/zcrypt: move ap_msg param one level up the call chain
    s390/ap/zcrypt: revisit ap and zcrypt error handling
    s390/ap: Support AP card SCLP config and deconfig operations
    s390/sclp: Add support for SCLP AP adapter config/deconfig
    s390/ap: add card/queue deconfig state
    s390/ap: add error response code field for ap queue devices
    s390/ap: split ap queue state machine state from device state
    s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG
    s390/zcrypt: introduce msg tracking in zcrypt functions
    s390/startup: correct early pgm check info formatting
    s390: remove orphaned extern variables declarations
    s390/kasan: make sure int handler always run with DAT on
    s390/ipl: add support to control memory clearing for nvme re-IPL
    ...

    Linus Torvalds
     

14 Oct, 2020

2 commits

  • There are several occurrences of the following pattern:

    for_each_memblock(memory, reg) {
            start = __pfn_to_phys(memblock_region_memory_base_pfn(reg));
            end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));

            /* do something with start and end */
    }

    Using the for_each_mem_range() iterator is more appropriate in such cases
    and allows for simpler and cleaner code, as shown below.
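
    With the reduced-parameter iterator from this series, the same loop
    becomes roughly:

    phys_addr_t start, end;
    u64 i;

    for_each_mem_range(i, &start, &end) {
            /* do something with start and end */
    }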

    [akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build]
    [rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring]
    Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Daniel Axtens
    Cc: Dave Hansen
    Cc: Emil Renner Berthing
    Cc: Hari Bathini
    Cc: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Jonathan Cameron
    Cc: Marek Szyprowski
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Miguel Ojeda
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • There are several occurrences of the following pattern:

    for_each_memblock(memory, reg) {
            start_pfn = memblock_region_memory_base_pfn(reg);
            end_pfn = memblock_region_memory_end_pfn(reg);

            /* do something with start_pfn and end_pfn */
    }

    Rather than iterate over all memblock.memory regions and each time query
    for their start and end PFNs, use the for_each_mem_pfn_range() iterator
    to get simpler and clearer code, as sketched below.
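
    That is, roughly:

    unsigned long start_pfn, end_pfn;
    int i, nid;

    for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
            /* do something with start_pfn and end_pfn */
    }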

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Acked-by: Miguel Ojeda [.clang-format]
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Daniel Axtens
    Cc: Dave Hansen
    Cc: Emil Renner Berthing
    Cc: Hari Bathini
    Cc: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Jonathan Cameron
    Cc: Marek Szyprowski
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: https://lkml.kernel.org/r/20200818151634.14343-12-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

16 Sep, 2020

3 commits

  • Currently the kernel crashes in Kasan instrumentation code if
    CONFIG_KASAN_S390_4_LEVEL_PAGING is used on a protected virtualization
    capable machine where the ultravisor imposes addressing limitations on
    the host and those limitations are lower than KASAN_SHADOW_OFFSET.

    The problem is that Kasan has to know in advance where vmalloc/modules
    areas would be. With protected virtualization enabled vmalloc/modules
    areas are moved down to the ultravisor secure storage limit while kasan
    still expects them at the very end of 4-level paging address space.

    To fix that, make Kasan recognize when protected virtualization is enabled
    and predefine the vmalloc/modules area positions so that they comply with
    the ultravisor secure storage limit.

    Kasan shadow itself stays in place and might reside above that ultravisor
    secure storage limit.

    One slight difference compared to a kernel without Kasan enabled is that
    the vmalloc/modules area position is not reverted to the default if
    ultravisor initialization fails. It would still be below the ultravisor
    secure storage limit.

    Kernel layout with kasan, 4-level paging and protected virtualization
    enabled (ultravisor secure storage limit is at 0x0000800000000000):
    ---[ vmemmap Area Start ]---
    0x0000400000000000-0x0000400080000000
    ---[ vmemmap Area End ]---
    ---[ vmalloc Area Start ]---
    0x00007fe000000000-0x00007fff80000000
    ---[ vmalloc Area End ]---
    ---[ Modules Area Start ]---
    0x00007fff80000000-0x0000800000000000
    ---[ Modules Area End ]---
    ---[ Kasan Shadow Start ]---
    0x0018000000000000-0x001c000000000000
    ---[ Kasan Shadow End ]---
    0x001c000000000000-0x0020000000000000 1P PGD I

    Kernel layout with kasan, 4-level paging and protected virtualization
    disabled/unsupported:
    ---[ vmemmap Area Start ]---
    0x0000400000000000-0x0000400060000000
    ---[ vmemmap Area End ]---
    ---[ Kasan Shadow Start ]---
    0x0018000000000000-0x001c000000000000
    ---[ Kasan Shadow End ]---
    ---[ vmalloc Area Start ]---
    0x001fffe000000000-0x001fffff80000000
    ---[ vmalloc Area End ]---
    ---[ Modules Area Start ]---
    0x001fffff80000000-0x0020000000000000
    ---[ Modules Area End ]---

    Signed-off-by: Vasily Gorbik

    Vasily Gorbik
     
  • Kasan configuration options and the size of the physical memory present
    can affect the kernel memory layout. In particular, vmemmap, vmalloc and
    modules might come before the kasan shadow or after it. For ptdump to
    output markers in the correct order, the markers have to be sorted.

    To preserve the original order of markers with the same start address,
    avoid sort() from lib/sort.c (which is not a stable sorting algorithm)
    and sort the markers in place, as sketched below.
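
    A stable insertion sort over the marker array is enough; a sketch,
    assuming the usual address_markers[] array with a terminating entry:

    static void sort_address_markers(void)
    {
            struct addr_marker tmp;
            int i, j;

            /* markers sharing a start address keep their original order */
            for (i = 1; i < ARRAY_SIZE(address_markers) - 1; i++) {
                    tmp = address_markers[i];
                    for (j = i - 1; j >= 0 &&
                         address_markers[j].start_address > tmp.start_address; j--)
                            address_markers[j + 1] = address_markers[j];
                    address_markers[j + 1] = tmp;
            }
    }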

    Reviewed-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Vasily Gorbik
     
  • Use ifdefs instead of IS_ENABLED() to avoid a compile error
    for !PTDUMP_DEBUGFS:

    arch/s390/mm/dump_pagetables.c: In function ‘pt_dump_init’:
    arch/s390/mm/dump_pagetables.c:248:64: error: ‘ptdump_fops’ undeclared (first use in this function); did you mean ‘pidfd_fops’?
    debugfs_create_file("kernel_page_tables", 0400, NULL, NULL, &ptdump_fops);
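
    For context, a minimal sketch of the difference (not the full patch):
    IS_ENABLED() leaves the branch visible to the compiler, so ptdump_fops
    must be declared even for !PTDUMP_DEBUGFS, while an #ifdef removes the
    reference entirely:

    /* broken for !PTDUMP_DEBUGFS: the symbol is referenced either way */
    if (IS_ENABLED(CONFIG_PTDUMP_DEBUGFS))
            debugfs_create_file("kernel_page_tables", 0400, NULL, NULL, &ptdump_fops);

    /* works: no reference to ptdump_fops when the option is off */
    #ifdef CONFIG_PTDUMP_DEBUGFS
            debugfs_create_file("kernel_page_tables", 0400, NULL, NULL, &ptdump_fops);
    #endif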

    Reported-by: Julian Wiedmann
    Fixes: 08c8e685c7c9 ("s390: add ARCH_HAS_DEBUG_WX support")
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     

14 Sep, 2020

10 commits

  • We don't need to export pages if we destroy the VM configuration
    afterwards anyway. Instead we can destroy the page which will zero it
    and then make it accessible to the host.

    Destroying is about twice as fast as the export.

    Signed-off-by: Janosch Frank
    Reviewed-by: Claudio Imbrenda
    Reviewed-by: Thomas Huth
    Reviewed-by: Cornelia Huck
    Link: https://lore.kernel.org/kvm/20200907124700.10374-2-frankja@linux.ibm.com/
    Signed-off-by: Janosch Frank
    Signed-off-by: Vasily Gorbik

    Janosch Frank
     
  • Signed-off-by: Vasily Gorbik
    [hca@linux.ibm.com: add more markers, rename some markers]
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Vasily Gorbik
     
  • ARCH_HAS_DEBUG_WX feature support brought attention to the fact that the
    initial kasan shadow memory is currently mapped without the noexec flag.
    So fix that.

    The temporary initial identity mapping is still created without noexec,
    but it is replaced by properly set up paging later.

    Signed-off-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Vasily Gorbik
     
  • Checks the whole kernel address space for W+X mappings. Note that
    currently the first lowcore page unfortunately has to be mapped
    W+X. Therefore it is not reported as an insecure mapping.

    For the very same reason the wording is also different from other
    architectures if the test passes:

    On s390 it is "no unexpected W+X pages found" instead of
    "no W+X pages found".

    Tested-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     
  • s390 version of ae5d1cf358a5 ("arm64: dump: Make the page table
    dumping seq_file optional").

    Tested-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     
  • This currently only prevents outdated information from being provided to
    user space. A concurrent split of huge/large pages does modify the
    kernel page tables; however, either the huge/large mapping is reported
    or the split area is being walked.

    This also "fixes" only a potential future bug, since split pages could
    be merged again if page permissions are the same for larger memory
    areas.

    Reviewed-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     
  • This is the s390 variant of commit bf2b59f60ee1 ("arm64/mm: Hold
    memory hotplug lock while walking for kernel page table dump").

    Right now this doesn't fix any real bug, however as soon as kvm
    patches that make use of memory removal get merged, we might end up
    dereferencing/accessing freed page tables.

    Therefore fix this potential bug already now.

    Reviewed-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     
  • Make use of generic ptdump infrastructure.

    Reviewed-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     
  • With our current support for the new MIO PCI instructions, write
    combining/write back MMIO memory can be obtained via the pci_iomap_wc()
    and pci_iomap_wc_range() functions.
    This is achieved by using the write back address for a specific BAR as
    provided by clp_store_query_pci_fn().

    These functions are however not widely used; instead, drivers often rely
    on ioremap_wc() and ioremap_prot(), which on other platforms enable
    write combining using a PTE flag set through the pgprot value.

    While we do not have a write combining flag in the low order flag bits
    of the PTE like x86_64 does, with MIO support there is a write back bit
    in the physical address (bit 1 on z15) and thus also in the PTE.
    Which bit is used to toggle write back, and whether it is available at
    all, is however not fixed in the architecture. Instead, we get this
    information from the CLP Store Logical Processor Characteristics for PCI
    command. When the write back bit is not provided, we fall back to the
    existing behavior.
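
    A sketch of the resulting mechanism (the helper below mirrors the patch
    as understood; the exact names and mask derivation are assumptions):

    /* zero when CLP does not advertise a write back bit */
    extern unsigned long mio_wb_bit_mask;

    #define pgprot_writecombine pgprot_writecombine
    static inline pgprot_t pgprot_writecombine(pgprot_t prot)
    {
            /* with mio_wb_bit_mask == 0 this degrades to the default mapping */
            return __pgprot(pgprot_val(prot) | mio_wb_bit_mask);
    }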

    Signed-off-by: Niklas Schnelle
    Reviewed-by: Pierre Morel
    Reviewed-by: Gerald Schaefer
    Signed-off-by: Vasily Gorbik

    Niklas Schnelle
     
  • Program exception 3f (secure storage violation) can only be detected
    when the CPU is running in SIE with a format 4 state description,
    e.g. running a protected guest. Because of this and because user
    space partly controls the guest memory mapping and can trigger this
    exception, we want to send a SIGSEGV to the process running the guest
    and not panic the kernel.

    Signed-off-by: Janosch Frank
    Cc: # 5.7
    Fixes: 084ea4d611a3 ("s390/mm: add (non)secure page access exceptions handlers")
    Reviewed-by: Claudio Imbrenda
    Reviewed-by: Cornelia Huck
    Acked-by: Christian Borntraeger
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Janosch Frank
     

14 Aug, 2020

1 commit

  • Pull more s390 updates from Heiko Carstens:

    - Allow the s390 debug feature to finally handle more than 256 CPU
    numbers, instead of truncating the most significant bits.

    - Improve THP splitting required by qemu processes by making use of
    walk_page_vma() instead of calling follow_page() for every single
    page within each vma.

    - Add missing ZCRYPT dependency to VFIO_AP to fix potential compile
    problems.

    - Remove the not required select of CLOCKSOURCE_VALIDATE_LAST_CYCLE again.

    - Set node distance to LOCAL_DISTANCE instead of 0, since e.g. libnuma
    translates a node distance of 0 to "no NUMA support available".

    - A couple of other minor fixes and improvements.

    * tag 's390-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/numa: move code to arch/s390/kernel
    s390/time: remove select CLOCKSOURCE_VALIDATE_LAST_CYCLE again
    s390/debug: debug feature version 3
    s390/Kconfig: add missing ZCRYPT dependency to VFIO_AP
    s390/numa: set node distance to LOCAL_DISTANCE
    s390/pkey: remove redundant variable initialization
    s390/test_unwind: fix possible memleak in test_unwind()
    s390/gmap: improve THP splitting
    s390/atomic: circumvent gcc 10 build regression

    Linus Torvalds
     

13 Aug, 2020

3 commits

  • After the cleanup of page fault accounting, gup does not need to pass
    task_struct around any more. Remove that parameter in the whole gup
    stack.

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Use the general page fault accounting by passing regs into
    handle_mm_fault(). It naturally solves the issue of multiple page fault
    accounting when a page fault retry happens.
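
    The interface change itself is small; roughly:

    /* before: core mm cannot attribute the fault to perf events */
    fault = handle_mm_fault(vma, address, flags);

    /* after: with regs available, handle_mm_fault() accounts the fault
     * exactly once, when it completes, rather than once per retry */
    fault = handle_mm_fault(vma, address, flags, regs);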

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Gerald Schaefer
    Acked-by: Gerald Schaefer
    Cc: Alexander Gordeev
    Cc: Heiko Carstens
    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Link: http://lkml.kernel.org/r/20200707225021.200906-19-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Patch series "mm: Page fault accounting cleanups", v5.

    This is v5 of the pf accounting cleanup series. It originates from Gerald
    Schaefer's report on an issue a week ago regarding incorrect page fault
    accounting for retried page faults after commit 4064b9827063 ("mm: allow
    VM_FAULT_RETRY for multiple times"):

    https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

    What this series did:

    - Correct page fault accounting: we do accounting for a page fault
    (no matter whether it's from #PF handling, or gup, or anything else)
    only with the one that completed the fault. For example, page fault
    retries should not be counted in page fault counters. The same applies
    to the perf events.

    - Unify the definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
    event is used in an ad hoc way across different archs.

    Case (1): for many archs it's done at the entry of a page fault
    handler, so that it will also cover e.g. erroneous faults.

    Case (2): for some other archs, it is only accounted when the page
    fault is resolved successfully.

    Case (3): there are still quite a few archs that have not enabled
    this perf event.

    Since this series touches nearly all the archs, we unify this
    perf event to always follow case (1), which is the one that makes the
    most sense. And since we moved the accounting into handle_mm_fault(),
    the other two MAJ/MIN perf events are well taken care of naturally.

    - Unify definition of "major faults": the definition of "major
    fault" is slightly changed when used in accounting (not
    VM_FAULT_MAJOR). More information in patch 1.

    - Always account the page fault onto the one that triggered the page
    fault. This does not matter much for #PF handling, but it does for
    gup. More information on this in patch 25.

    Patchset layout:

    Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
    Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
    Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
    Patch 25: Cleanup GUP task_struct pointer since it's not needed any more

    This patch (of 25):

    This is a preparation patch to move page fault accountings into the
    general code in handle_mm_fault(). This includes both the per task
    flt_maj/flt_min counters, and the major/minor page fault perf events. To
    do this, the pt_regs pointer is passed into handle_mm_fault().

    PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
    handlers.

    So far, all the pt_regs pointers passed into handle_mm_fault() are
    NULL, which means this patch should have no intended functional change.

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Albert Ou
    Cc: Alexander Gordeev
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Geert Uytterhoeven
    Cc: Gerald Schaefer
    Cc: Greentime Hu
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: John Hubbard
    Cc: Jonas Bonn
    Cc: Ley Foon Tan
    Cc: "Luck, Tony"
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
    Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     

12 Aug, 2020

1 commit

  • During s390_enable_sie(), we need to take care of splitting all qemu user
    process THP mappings. This is currently done with follow_page(FOLL_SPLIT),
    by simply iterating over all vma ranges with a PAGE_SIZE increment.

    This logic is sub-optimal and can result in a lot of unnecessary overhead,
    especially when using qemu and ASAN with a large shadow map. Ilya reported
    a significant system slow-down with one CPU busy for a long time and
    overall unresponsiveness.

    Fix this by using walk_page_vma() and directly calling split_huge_pmd()
    only for present pmds, which greatly reduces overhead.
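
    A sketch of the walk_page_vma() approach (modelled on the patch; treat
    the details as assumptions):

    static int thp_split_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
                                        unsigned long next, struct mm_walk *walk)
    {
            /* the page walker only hands us populated pmds */
            split_huge_pmd(walk->vma, pmd, addr);
            return 0;
    }

    static const struct mm_walk_ops thp_split_walk_ops = {
            .pmd_entry = thp_split_walk_pmd_entry,
    };

    /* one walk per vma instead of a follow_page() call per PAGE_SIZE */
    walk_page_vma(vma, &thp_split_walk_ops, NULL);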

    Cc: # v5.4+
    Reported-by: Ilya Leoshkevich
    Tested-by: Ilya Leoshkevich
    Acked-by: Christian Borntraeger
    Signed-off-by: Gerald Schaefer
    Signed-off-by: Heiko Carstens

    Gerald Schaefer
     

08 Aug, 2020

2 commits

  • After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
    functions that call memory_present() for each region in memblock.memory:
    sparse_memory_present_with_active_regions() and memblocks_present().

    Moreover, all architectures have a call to either of these functions
    preceding the call to sparse_init() and in the most cases they are called
    one after the other.

    Mark the regions from memblock.memory as present during sparse_init() by
    making sparse_init() call memblocks_present(), make the
    memblocks_present() and memory_present() functions static and remove the
    redundant sparse_memory_present_with_active_regions() function.

    Also remove the no longer required HAVE_MEMORY_PRESENT configuration
    option; a sketch of the resulting flow follows below.
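
    In effect (a sketch, not the verbatim patch):

    void __init sparse_init(void)
    {
            /* formerly each arch called memory_present()/memblocks_present()
             * (or sparse_memory_present_with_active_regions()) before this */
            memblocks_present();
            /* ... existing sparse section setup continues here ... */
    }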

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "mm: cleanup usage of <asm/pgalloc.h>".

    Most architectures have very similar versions of pXd_alloc_one() and
    pXd_free_one() for intermediate levels of page table. These patches add
    generic versions of these functions in <asm-generic/pgalloc.h> and enable
    use of the generic functions where appropriate.

    In addition, functions declared and defined in <asm/pgalloc.h> headers
    are used mostly by core mm and early mm initialization in arch and there
    is no actual reason to have <asm/pgalloc.h> included all over the place.
    The first patch in this series removes unneeded includes of
    <asm/pgalloc.h>.

    In the end it didn't work out as neatly as I hoped and moving
    pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
    unnecessary changes to arches that have custom page table allocations, so
    I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
    to mm/.

    This patch (of 8):

    In most cases the <asm/pgalloc.h> header is required only for allocations
    of page table memory. Most of the .c files that include that header do
    not use symbols declared in <asm/pgalloc.h> and do not require that
    header.

    As for the other header files that used to include <asm/pgalloc.h>, it is
    possible to move that include into the .c file that actually uses symbols
    from <asm/pgalloc.h> and drop the include from the header file.

    The process was somewhat automated using

    sed -i -E '/[<"]asm\/pgalloc\.h/d' \
            $(grep -L -w -f /tmp/xx \
                    $(git grep -E -l '[<"]asm\/pgalloc\.h'))

    where /tmp/xx contains all the symbols defined in
    arch/*/include/asm/pgalloc.h.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Acked-by: Geert Uytterhoeven [m68k]
    Cc: Abdul Haleem
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Christophe Leroy
    Cc: Joerg Roedel
    Cc: Max Filippov
    Cc: Peter Zijlstra
    Cc: Satheesh Rajendran
    Cc: Stafford Horne
    Cc: Stephen Rothwell
    Cc: Steven Rostedt
    Cc: Joerg Roedel
    Cc: Matthew Wilcox
    Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

04 Aug, 2020

1 commit

  • Pull s390 updates from Heiko Carstens:

    - Add support for function error injection.

    - Add support for custom exception handlers, as required by
    BPF_PROBE_MEM.

    - Add support for BPF_PROBE_MEM.

    - Add trace events for idle enter / exit for the s390 specific idle
    implementation.

    - Remove unused zcore memmap device.

    - Remove unused "raw view" from s390 debug feature.

    - AP bus + zcrypt device driver code refactoring.

    - Provide cex4 cca sysfs attributes for cex3 for zcrypt device driver.

    - Expose only minimal interface to walk physmem for mm/memblock. This
    is a common code change and it has been agreed on with Mike Rapoport
    and Andrew Morton that this can go upstream via the s390 tree.

    - Rework of the s390 vmem/vmemmap code to allow for future memory hot
    remove.

    - Get rid of FORCE_MAX_ZONEORDER to finally allow for order-10
    allocations again, instead of only order-8 allocations.

    - Various small improvements and fixes.

    * tag 's390-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
    s390/vmemmap: coding style updates
    s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections
    s390/vmemmap: remember unused sub-pmd ranges
    s390/vmemmap: fallback to PTEs if mapping large PMD fails
    s390/vmem: cleanup empty page tables
    s390/vmemmap: take the vmem_mutex when populating/freeing
    s390/vmemmap: cleanup when vmemmap_populate() fails
    s390/vmemmap: extend modify_pagetable() to handle vmemmap
    s390/vmem: consolidate vmem_add_range() and vmem_remove_range()
    s390/vmem: rename vmem_add_mem() to vmem_add_range()
    s390: enable HAVE_FUNCTION_ERROR_INJECTION
    s390/pci: clarify comment in s390_mmio_read/write
    s390/time: improve comparison for tod steering
    s390/time: select CLOCKSOURCE_VALIDATE_LAST_CYCLE
    s390/time: use CLOCKSOURCE_MASK
    s390/bpf: implement BPF_PROBE_MEM
    s390/kernel: expand exception table logic to allow new handling options
    s390/kernel: unify EX_TABLE* implementations
    s390/mm: allow order 10 allocations
    s390/mm: avoid trimming to MAX_ORDER
    ...

    Linus Torvalds
     

27 Jul, 2020

10 commits

  • Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Let's avoid memset(PAGE_UNUSED) when adding consecutive sections,
    whereby the vmemmap of a single section does not span full PMDs.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • With a memmap size of 56 bytes or 72 bytes per page, the memmap for a
    256 MB section won't span full PMDs. As we populate single sections and
    depopulate single sections, the depopulation step would not be able to
    free all vmemmap pmds anymore.

    Do it similarly to x86, marking the unused memmap ranges in a special way
    (pad them with 0xFD).

    This allows us to add/remove sections, cleaning up all allocated
    vmemmap pages even if the memmap size is not a multiple of 16 bytes per
    page.

    A 56 byte memmap can, for example, be created with !CONFIG_MEMCG and
    !CONFIG_SLUB.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Let's fall back to single pages if short on huge pages. No need to stop
    memory hotplug.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Let's clean up empty page tables. Consider only page tables that fully
    fall into the identity mapping and the vmemmap range.

    As there are no valid accesses to vmem/vmemmap within non-populated ranges,
    the single tlb flush at the end should be sufficient.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Let's synchronize all accesses to the 1:1 and vmemmap mappings. This will
    be especially relevant when wanting to cleanup empty page tables that could
    be shared by both. Avoid races when removing tables that might be just
    about to get reused.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Cleanup what we partially added in case vmemmap_populate() fails. For
    vmem, this is already handled by vmem_add_mapping().

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Extend our shiny new modify_pagetable() to handle !direct (vmemmap)
    mappings. Convert vmemmap_populate() and implement vmemmap_free().

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • We want to have only a single pagetable walker and reuse the same
    functionality for vmemmap handling. Let's start by consolidating
    vmem_add_range() and vmem_remove_range(), converting it into a
    recursive implementation.

    A recursive implementation makes it easier to expand individual cases
    without harming readability. In addition, we minimize traversing the
    whole hierarchy over and over again.

    One change is that we don't unmap large PMDs/PUDs when they are not
    completely covered by the request, something that should never happen
    with direct mappings, unless one would be removing in a different
    granularity than was added, which would be broken already.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Let's match the name to vmem_remove_range().

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     

20 Jul, 2020

1 commit

  • This is an s390 port of commit 548acf19234d ("x86/mm: Expand the
    exception table logic to allow new handling options"), which is needed
    for implementing BPF_PROBE_MEM on s390.

    The new handler field is made 64-bit in order to allow pointing from
    dynamically allocated entries to handlers in kernel text. Unlike on x86,
    NULL is used instead of ex_handler_default. This is because exception
    tables are used by boot/text_dma.S, and it would be a pain to preserve
    ex_handler_default.

    The new infrastructure is ignored in early_pgm_check_handler, since
    there is no pt_regs.
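
    The resulting entry layout, per the description above (a sketch; field
    names are assumptions):

    struct exception_table_entry
    {
            int insn;      /* relative address of the faulting instruction */
            int fixup;     /* relative address of the fixup code */
            long handler;  /* 64-bit relative handler address; 0 selects the
                            * default fixup path instead of ex_handler_default */
    };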

    Signed-off-by: Ilya Leoshkevich
    Reviewed-by: Heiko Carstens
    Signed-off-by: Heiko Carstens

    Ilya Leoshkevich