26 Oct, 2020

1 commit

  • Use a more generic form for __section that requires quotes to avoid
    complications with clang and gcc differences.

    Remove the quote operator # from compiler_attributes.h __section macro.

    Convert all unquoted __section(foo) uses to quoted __section("foo").
    Also convert __attribute__((section("foo"))) uses to __section("foo")
    even if the __attribute__ has multiple list entry forms.

    Conversion done using the script at:

    https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
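
    For reference, the compiler_attributes.h change amounts to dropping the
    stringification operator (a sketch, not the verbatim diff):

    /* before: the macro quoted its argument itself */
    #define __section(S) __attribute__((__section__(#S)))
    /* usage: __section(.data..ro_after_init) */

    /* after: callers pass a string literal directly */
    #define __section(section) __attribute__((__section__(section)))
    /* usage: __section(".data..ro_after_init") */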

    Signed-off-by: Joe Perches
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Miguel Ojeda
    Signed-off-by: Linus Torvalds

    Joe Perches
     

24 Oct, 2020

1 commit

  • Pull virtio updates from Michael Tsirkin:
    "vhost, vdpa, and virtio cleanups and fixes

    A very quiet cycle, no new features"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    MAINTAINERS: add URL for virtio-mem
    vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call
    vringh: fix __vringh_iov() when riov and wiov are different
    vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
    s390: virtio: PV needs VIRTIO I/O device protection
    virtio: let arch advertise guest's memory access restrictions
    vhost_vdpa: Fix duplicate included kernel.h
    vhost: reduce stack usage in log_used
    virtio-mem: Constify mem_id_table
    virtio_input: Constify id_table
    virtio-balloon: Constify id_table
    vdpa/mlx5: Fix failure to bring link up
    vdpa/mlx5: Make use of a specific 16 bit endianness API

    Linus Torvalds
     

21 Oct, 2020

1 commit

  • If protected virtualization is active on s390, VIRTIO has only restricted
    access to the guest memory.
    Define CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS and export
    arch_has_restricted_virtio_memory_access to advertise this to VIRTIO if
    that's the case.
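
    The s390 implementation is a one-liner keyed off protected virtualization
    (a sketch, relying on the existing is_prot_virt_guest() helper):

    int arch_has_restricted_virtio_memory_access(void)
    {
            return is_prot_virt_guest();
    }
    EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);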

    Signed-off-by: Pierre Morel
    Reviewed-by: Cornelia Huck
    Reviewed-by: Halil Pasic
    Link: https://lore.kernel.org/r/1599728030-17085-3-git-send-email-pmorel@linux.ibm.com
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Christian Borntraeger

    Pierre Morel
     

17 Oct, 2020

1 commit

  • Pull s390 updates from Vasily Gorbik:

    - Remove address space overrides using set_fs()

    - Convert to generic vDSO

    - Convert to generic page table dumper

    - Add ARCH_HAS_DEBUG_WX support

    - Add leap seconds handling support

    - Add NVMe firmware-assisted kernel dump support

    - Extend NVMe boot support with memory clearing control and addition of
    kernel parameters

    - AP bus and zcrypt api code rework. Add adapter configure/deconfigure
    interface. Extend debug features. Add failure injection support

    - Add ECC secure private keys support

    - Add KASan support for running protected virtualization host with
    4-level paging

    - Utilize destroy page ultravisor call to speed up secure guests
    shutdown

    - Implement ioremap_wc() and ioremap_prot() with MIO in PCI code

    - Various checksum improvements

    - Other small various fixes and improvements all over the code

    * tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits)
    s390/uaccess: fix indentation
    s390/uaccess: add default cases for __put_user_fn()/__get_user_fn()
    s390/zcrypt: fix wrong format specifications
    s390/kprobes: move insn_page to text segment
    s390/sie: fix typo in SIGP code description
    s390/lib: fix kernel doc for memcmp()
    s390/zcrypt: Introduce Failure Injection feature
    s390/zcrypt: move ap_msg param one level up the call chain
    s390/ap/zcrypt: revisit ap and zcrypt error handling
    s390/ap: Support AP card SCLP config and deconfig operations
    s390/sclp: Add support for SCLP AP adapter config/deconfig
    s390/ap: add card/queue deconfig state
    s390/ap: add error response code field for ap queue devices
    s390/ap: split ap queue state machine state from device state
    s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG
    s390/zcrypt: introduce msg tracking in zcrypt functions
    s390/startup: correct early pgm check info formatting
    s390: remove orphaned extern variables declarations
    s390/kasan: make sure int handler always run with DAT on
    s390/ipl: add support to control memory clearing for nvme re-IPL
    ...

    Linus Torvalds
     

14 Oct, 2020

2 commits

  • There are several occurrences of the following pattern:

    for_each_memblock(memory, reg) {
            start = __pfn_to_phys(memblock_region_memory_base_pfn(reg));
            end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));

            /* do something with start and end */
    }

    Using the for_each_mem_range() iterator is more appropriate in such cases
    and allows for simpler and cleaner code, as shown below.
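
    With the reduced-parameter iterator from this series, the same loop
    becomes roughly:

    phys_addr_t start, end;
    u64 i;

    for_each_mem_range(i, &start, &end) {
            /* do something with start and end */
    }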

    [akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build]
    [rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring]
    Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Daniel Axtens
    Cc: Dave Hansen
    Cc: Emil Renner Berthing
    Cc: Hari Bathini
    Cc: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Jonathan Cameron
    Cc: Marek Szyprowski
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Miguel Ojeda
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • There are several occurrences of the following pattern:

    for_each_memblock(memory, reg) {
            start_pfn = memblock_region_memory_base_pfn(reg);
            end_pfn = memblock_region_memory_end_pfn(reg);

            /* do something with start_pfn and end_pfn */
    }

    Rather than iterate over all memblock.memory regions and each time query
    for their start and end PFNs, use the for_each_mem_pfn_range() iterator
    to get simpler and clearer code, as sketched below.
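
    That is, roughly:

    unsigned long start_pfn, end_pfn;
    int i, nid;

    for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
            /* do something with start_pfn and end_pfn */
    }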

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Acked-by: Miguel Ojeda [.clang-format]
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Daniel Axtens
    Cc: Dave Hansen
    Cc: Emil Renner Berthing
    Cc: Hari Bathini
    Cc: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Jonathan Cameron
    Cc: Marek Szyprowski
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: https://lkml.kernel.org/r/20200818151634.14343-12-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

16 Sep, 2020

3 commits

  • Currently the kernel crashes in Kasan instrumentation code if
    CONFIG_KASAN_S390_4_LEVEL_PAGING is used on a protected virtualization
    capable machine where the ultravisor imposes addressing limitations on
    the host and those limitations are lower than KASAN_SHADOW_OFFSET.

    The problem is that Kasan has to know in advance where vmalloc/modules
    areas would be. With protected virtualization enabled vmalloc/modules
    areas are moved down to the ultravisor secure storage limit while kasan
    still expects them at the very end of 4-level paging address space.

    To fix that, make Kasan recognize when protected virtualization is enabled
    and predefine the vmalloc/modules area positions so that they comply with
    the ultravisor secure storage limit.

    Kasan shadow itself stays in place and might reside above that ultravisor
    secure storage limit.

    One slight difference compared to a kernel without Kasan enabled is that
    the vmalloc/modules area position is not reverted to the default if
    ultravisor initialization fails. It would still be below the ultravisor
    secure storage limit.

    Kernel layout with kasan, 4-level paging and protected virtualization
    enabled (ultravisor secure storage limit is at 0x0000800000000000):
    ---[ vmemmap Area Start ]---
    0x0000400000000000-0x0000400080000000
    ---[ vmemmap Area End ]---
    ---[ vmalloc Area Start ]---
    0x00007fe000000000-0x00007fff80000000
    ---[ vmalloc Area End ]---
    ---[ Modules Area Start ]---
    0x00007fff80000000-0x0000800000000000
    ---[ Modules Area End ]---
    ---[ Kasan Shadow Start ]---
    0x0018000000000000-0x001c000000000000
    ---[ Kasan Shadow End ]---
    0x001c000000000000-0x0020000000000000 1P PGD I

    Kernel layout with kasan, 4-level paging and protected virtualization
    disabled/unsupported:
    ---[ vmemmap Area Start ]---
    0x0000400000000000-0x0000400060000000
    ---[ vmemmap Area End ]---
    ---[ Kasan Shadow Start ]---
    0x0018000000000000-0x001c000000000000
    ---[ Kasan Shadow End ]---
    ---[ vmalloc Area Start ]---
    0x001fffe000000000-0x001fffff80000000
    ---[ vmalloc Area End ]---
    ---[ Modules Area Start ]---
    0x001fffff80000000-0x0020000000000000
    ---[ Modules Area End ]---

    Signed-off-by: Vasily Gorbik

    Vasily Gorbik
     
  • Kasan configuration options and the size of the physical memory present
    can affect the kernel memory layout. In particular, vmemmap, vmalloc and
    modules might come before the kasan shadow or after it. For ptdump to
    output markers in the correct order, the markers have to be sorted.

    To preserve the original order of markers with the same start address,
    avoid sort() from lib/sort.c (which is not a stable sorting algorithm)
    and sort the markers in place, as sketched below.
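
    A stable insertion sort over the marker array is enough; a sketch,
    assuming the usual address_markers[] array with a terminating entry:

    static void sort_address_markers(void)
    {
            struct addr_marker tmp;
            int i, j;

            /* markers sharing a start address keep their original order */
            for (i = 1; i < ARRAY_SIZE(address_markers) - 1; i++) {
                    tmp = address_markers[i];
                    for (j = i - 1; j >= 0 &&
                         address_markers[j].start_address > tmp.start_address; j--)
                            address_markers[j + 1] = address_markers[j];
                    address_markers[j + 1] = tmp;
            }
    }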

    Reviewed-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Vasily Gorbik
     
  • Use ifdefs instead of IS_ENABLED() to avoid a compile error
    for !PTDUMP_DEBUGFS:

    arch/s390/mm/dump_pagetables.c: In function ‘pt_dump_init’:
    arch/s390/mm/dump_pagetables.c:248:64: error: ‘ptdump_fops’ undeclared (first use in this function); did you mean ‘pidfd_fops’?
    debugfs_create_file("kernel_page_tables", 0400, NULL, NULL, &ptdump_fops);
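
    For context, a minimal sketch of the difference (not the full patch):
    IS_ENABLED() leaves the branch visible to the compiler, so ptdump_fops
    must be declared even for !PTDUMP_DEBUGFS, while an #ifdef removes the
    reference entirely:

    /* broken for !PTDUMP_DEBUGFS: the symbol is referenced either way */
    if (IS_ENABLED(CONFIG_PTDUMP_DEBUGFS))
            debugfs_create_file("kernel_page_tables", 0400, NULL, NULL, &ptdump_fops);

    /* works: no reference to ptdump_fops when the option is off */
    #ifdef CONFIG_PTDUMP_DEBUGFS
            debugfs_create_file("kernel_page_tables", 0400, NULL, NULL, &ptdump_fops);
    #endif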

    Reported-by: Julian Wiedmann
    Fixes: 08c8e685c7c9 ("s390: add ARCH_HAS_DEBUG_WX support")
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     

14 Sep, 2020

10 commits

  • We don't need to export pages if we destroy the VM configuration
    afterwards anyway. Instead we can destroy the page which will zero it
    and then make it accessible to the host.

    Destroying is about twice as fast as the export.

    Signed-off-by: Janosch Frank
    Reviewed-by: Claudio Imbrenda
    Reviewed-by: Thomas Huth
    Reviewed-by: Cornelia Huck
    Link: https://lore.kernel.org/kvm/20200907124700.10374-2-frankja@linux.ibm.com/
    Signed-off-by: Janosch Frank
    Signed-off-by: Vasily Gorbik

    Janosch Frank
     
  • Signed-off-by: Vasily Gorbik
    [hca@linux.ibm.com: add more markers, rename some markers]
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Vasily Gorbik
     
  • ARCH_HAS_DEBUG_WX feature support brought attention to the fact that the
    initial kasan shadow memory is currently mapped without the noexec flag.
    So fix that.

    The temporary initial identity mapping is still created without noexec,
    but it is replaced by properly set up paging later.

    Signed-off-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Vasily Gorbik
     
  • Checks the whole kernel address space for W+X mappings. Note that
    currently the first lowcore page unfortunately has to be mapped
    W+X. Therefore it is not reported as an insecure mapping.

    For the very same reason the wording is also different from other
    architectures if the test passes:

    On s390 it is "no unexpected W+X pages found" instead of
    "no W+X pages found".

    Tested-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     
  • s390 version of ae5d1cf358a5 ("arm64: dump: Make the page table
    dumping seq_file optional").

    Tested-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     
  • This currently only prevents outdated information from being provided to
    user space. A concurrent split of huge/large pages does modify the
    kernel page tables; however, either the huge/large mapping is reported
    or the split area is being walked.

    This also "fixes" only a potential future bug, since split pages could
    be merged again if page permissions are the same for larger memory
    areas.

    Reviewed-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     
  • This is the s390 variant of commit bf2b59f60ee1 ("arm64/mm: Hold
    memory hotplug lock while walking for kernel page table dump").

    Right now this doesn't fix any real bug, however as soon as kvm
    patches that make use of memory removal get merged, we might end up
    dereferencing/accessing freed page tables.

    Therefore fix this potential bug already now.

    Reviewed-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     
  • Make use of generic ptdump infrastructure.

    Reviewed-by: Vasily Gorbik
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     
  • With our current support for the new MIO PCI instructions, write
    combining/write back MMIO memory can be obtained via the pci_iomap_wc()
    and pci_iomap_wc_range() functions.
    This is achieved by using the write back address for a specific BAR as
    provided by clp_store_query_pci_fn().

    These functions are however not widely used; instead, drivers often rely
    on ioremap_wc() and ioremap_prot(), which on other platforms enable
    write combining using a PTE flag set through the pgprot value.

    While we do not have a write combining flag in the low order flag bits
    of the PTE like x86_64 does, with MIO support there is a write back bit
    in the physical address (bit 1 on z15) and thus also in the PTE.
    Which bit is used to toggle write back, and whether it is available at
    all, is however not fixed in the architecture. Instead, we get this
    information from the CLP Store Logical Processor Characteristics for PCI
    command. When the write back bit is not provided, we fall back to the
    existing behavior.
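
    A sketch of the resulting mechanism (the helper below mirrors the patch
    as understood; the exact names and mask derivation are assumptions):

    /* zero when CLP does not advertise a write back bit */
    extern unsigned long mio_wb_bit_mask;

    #define pgprot_writecombine pgprot_writecombine
    static inline pgprot_t pgprot_writecombine(pgprot_t prot)
    {
            /* with mio_wb_bit_mask == 0 this degrades to the default mapping */
            return __pgprot(pgprot_val(prot) | mio_wb_bit_mask);
    }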

    Signed-off-by: Niklas Schnelle
    Reviewed-by: Pierre Morel
    Reviewed-by: Gerald Schaefer
    Signed-off-by: Vasily Gorbik

    Niklas Schnelle
     
  • Program exception 3f (secure storage violation) can only be detected
    when the CPU is running in SIE with a format 4 state description,
    e.g. running a protected guest. Because of this and because user
    space partly controls the guest memory mapping and can trigger this
    exception, we want to send a SIGSEGV to the process running the guest
    and not panic the kernel.

    Signed-off-by: Janosch Frank
    Cc: # 5.7
    Fixes: 084ea4d611a3 ("s390/mm: add (non)secure page access exceptions handlers")
    Reviewed-by: Claudio Imbrenda
    Reviewed-by: Cornelia Huck
    Acked-by: Christian Borntraeger
    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Janosch Frank
     

14 Aug, 2020

1 commit

  • Pull more s390 updates from Heiko Carstens:

    - Allow the s390 debug feature to finally handle more than 256 CPU
    numbers, instead of truncating the most significant bits.

    - Improve THP splitting required by qemu processes by making use of
    walk_page_vma() instead of calling follow_page() for every single
    page within each vma.

    - Add missing ZCRYPT dependency to VFIO_AP to fix potential compile
    problems.

    - Remove the not required select of CLOCKSOURCE_VALIDATE_LAST_CYCLE again.

    - Set node distance to LOCAL_DISTANCE instead of 0, since e.g. libnuma
    translates a node distance of 0 to "no NUMA support available".

    - A couple of other minor fixes and improvements.

    * tag 's390-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/numa: move code to arch/s390/kernel
    s390/time: remove select CLOCKSOURCE_VALIDATE_LAST_CYCLE again
    s390/debug: debug feature version 3
    s390/Kconfig: add missing ZCRYPT dependency to VFIO_AP
    s390/numa: set node distance to LOCAL_DISTANCE
    s390/pkey: remove redundant variable initialization
    s390/test_unwind: fix possible memleak in test_unwind()
    s390/gmap: improve THP splitting
    s390/atomic: circumvent gcc 10 build regression

    Linus Torvalds
     

13 Aug, 2020

3 commits

  • After the cleanup of page fault accounting, gup does not need to pass
    task_struct around any more. Remove that parameter in the whole gup
    stack.

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Use the general page fault accounting by passing regs into
    handle_mm_fault(). It naturally solves the issue of multiple page fault
    accounting when a page fault retry happens.
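
    The interface change itself is small; roughly:

    /* before: core mm cannot attribute the fault to perf events */
    fault = handle_mm_fault(vma, address, flags);

    /* after: with regs available, handle_mm_fault() accounts the fault
     * exactly once, when it completes, rather than once per retry */
    fault = handle_mm_fault(vma, address, flags, regs);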

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Gerald Schaefer
    Acked-by: Gerald Schaefer
    Cc: Alexander Gordeev
    Cc: Heiko Carstens
    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Link: http://lkml.kernel.org/r/20200707225021.200906-19-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Patch series "mm: Page fault accounting cleanups", v5.

    This is v5 of the pf accounting cleanup series. It originates from Gerald
    Schaefer's report on an issue a week ago regarding incorrect page fault
    accounting for retried page faults after commit 4064b9827063 ("mm: allow
    VM_FAULT_RETRY for multiple times"):

    https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

    What this series did:

    - Correct page fault accounting: we do accounting for a page fault
    (no matter whether it's from #PF handling, or gup, or anything else)
    only with the one that completed the fault. For example, page fault
    retries should not be counted in page fault counters. The same applies
    to the perf events.

    - Unify the definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
    event is used in an ad hoc way across different archs.

    Case (1): for many archs it's done at the entry of a page fault
    handler, so that it will also cover e.g. erroneous faults.

    Case (2): for some other archs, it is only accounted when the page
    fault is resolved successfully.

    Case (3): there are still quite a few archs that have not enabled
    this perf event.

    Since this series touches nearly all the archs, we unify this
    perf event to always follow case (1), which is the one that makes the
    most sense. And since we moved the accounting into handle_mm_fault(),
    the other two MAJ/MIN perf events are well taken care of naturally.

    - Unify definition of "major faults": the definition of "major
    fault" is slightly changed when used in accounting (not
    VM_FAULT_MAJOR). More information in patch 1.

    - Always account the page fault onto the one that triggered the page
    fault. This does not matter much for #PF handling, but it does for
    gup. More information on this in patch 25.

    Patchset layout:

    Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
    Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
    Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
    Patch 25: Cleanup GUP task_struct pointer since it's not needed any more

    This patch (of 25):

    This is a preparation patch to move page fault accountings into the
    general code in handle_mm_fault(). This includes both the per task
    flt_maj/flt_min counters, and the major/minor page fault perf events. To
    do this, the pt_regs pointer is passed into handle_mm_fault().

    PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
    handlers.

    So far, all the pt_regs pointers passed into handle_mm_fault() are
    NULL, which means this patch should have no intended functional change.

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Albert Ou
    Cc: Alexander Gordeev
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Geert Uytterhoeven
    Cc: Gerald Schaefer
    Cc: Greentime Hu
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: John Hubbard
    Cc: Jonas Bonn
    Cc: Ley Foon Tan
    Cc: "Luck, Tony"
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
    Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     

12 Aug, 2020

1 commit

  • During s390_enable_sie(), we need to take care of splitting all qemu user
    process THP mappings. This is currently done with follow_page(FOLL_SPLIT),
    by simply iterating over all vma ranges with a PAGE_SIZE increment.

    This logic is sub-optimal and can result in a lot of unnecessary overhead,
    especially when using qemu and ASAN with a large shadow map. Ilya reported
    a significant system slow-down with one CPU busy for a long time and
    overall unresponsiveness.

    Fix this by using walk_page_vma() and directly calling split_huge_pmd()
    only for present pmds, which greatly reduces overhead.
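
    A sketch of the walk_page_vma() approach (modelled on the patch; treat
    the details as assumptions):

    static int thp_split_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
                                        unsigned long next, struct mm_walk *walk)
    {
            /* the page walker only hands us populated pmds */
            split_huge_pmd(walk->vma, pmd, addr);
            return 0;
    }

    static const struct mm_walk_ops thp_split_walk_ops = {
            .pmd_entry = thp_split_walk_pmd_entry,
    };

    /* one walk per vma instead of a follow_page() call per PAGE_SIZE */
    walk_page_vma(vma, &thp_split_walk_ops, NULL);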

    Cc: # v5.4+
    Reported-by: Ilya Leoshkevich
    Tested-by: Ilya Leoshkevich
    Acked-by: Christian Borntraeger
    Signed-off-by: Gerald Schaefer
    Signed-off-by: Heiko Carstens

    Gerald Schaefer
     

08 Aug, 2020

2 commits

  • After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
    functions that call memory_present() for each region in memblock.memory:
    sparse_memory_present_with_active_regions() and memblocks_present().

    Moreover, all architectures have a call to either of these functions
    preceding the call to sparse_init() and in the most cases they are called
    one after the other.

    Mark the regions from memblock.memory as present during sparse_init() by
    making sparse_init() call memblocks_present(), make the
    memblocks_present() and memory_present() functions static and remove the
    redundant sparse_memory_present_with_active_regions() function.

    Also remove the no longer required HAVE_MEMORY_PRESENT configuration
    option; a sketch of the resulting flow follows below.
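
    In effect (a sketch, not the verbatim patch):

    void __init sparse_init(void)
    {
            /* formerly each arch called memory_present()/memblocks_present()
             * (or sparse_memory_present_with_active_regions()) before this */
            memblocks_present();
            /* ... existing sparse section setup continues here ... */
    }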

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "mm: cleanup usage of <asm/pgalloc.h>".

    Most architectures have very similar versions of pXd_alloc_one() and
    pXd_free_one() for intermediate levels of page table. These patches add
    generic versions of these functions in <asm-generic/pgalloc.h> and enable
    use of the generic functions where appropriate.

    In addition, functions declared and defined in <asm/pgalloc.h> headers
    are used mostly by core mm and early mm initialization in arch and there
    is no actual reason to have <asm/pgalloc.h> included all over the place.
    The first patch in this series removes unneeded includes of
    <asm/pgalloc.h>.

    In the end it didn't work out as neatly as I hoped and moving
    pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
    unnecessary changes to arches that have custom page table allocations, so
    I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
    to mm/.

    This patch (of 8):

    In most cases the <asm/pgalloc.h> header is required only for allocations
    of page table memory. Most of the .c files that include that header do
    not use symbols declared in <asm/pgalloc.h> and do not require that
    header.

    As for the other header files that used to include <asm/pgalloc.h>, it is
    possible to move that include into the .c file that actually uses symbols
    from <asm/pgalloc.h> and drop the include from the header file.

    The process was somewhat automated using

    sed -i -E '/[<"]asm\/pgalloc\.h/d' \
            $(grep -L -w -f /tmp/xx \
                    $(git grep -E -l '[<"]asm\/pgalloc\.h'))

    where /tmp/xx contains all the symbols defined in
    arch/*/include/asm/pgalloc.h.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Acked-by: Geert Uytterhoeven [m68k]
    Cc: Abdul Haleem
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Christophe Leroy
    Cc: Joerg Roedel
    Cc: Max Filippov
    Cc: Peter Zijlstra
    Cc: Satheesh Rajendran
    Cc: Stafford Horne
    Cc: Stephen Rothwell
    Cc: Steven Rostedt
    Cc: Joerg Roedel
    Cc: Matthew Wilcox
    Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

04 Aug, 2020

1 commit

  • Pull s390 updates from Heiko Carstens:

    - Add support for function error injection.

    - Add support for custom exception handlers, as required by
    BPF_PROBE_MEM.

    - Add support for BPF_PROBE_MEM.

    - Add trace events for idle enter / exit for the s390 specific idle
    implementation.

    - Remove unused zcore memmap device.

    - Remove unused "raw view" from s390 debug feature.

    - AP bus + zcrypt device driver code refactoring.

    - Provide cex4 cca sysfs attributes for cex3 for zcrypt device driver.

    - Expose only minimal interface to walk physmem for mm/memblock. This
    is a common code change and it has been agreed on with Mike Rapoport
    and Andrew Morton that this can go upstream via the s390 tree.

    - Rework of the s390 vmem/vmemmap code to allow for future memory hot
    remove.

    - Get rid of FORCE_MAX_ZONEORDER to finally allow for order-10
    allocations again, instead of only order-8 allocations.

    - Various small improvements and fixes.

    * tag 's390-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
    s390/vmemmap: coding style updates
    s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections
    s390/vmemmap: remember unused sub-pmd ranges
    s390/vmemmap: fallback to PTEs if mapping large PMD fails
    s390/vmem: cleanup empty page tables
    s390/vmemmap: take the vmem_mutex when populating/freeing
    s390/vmemmap: cleanup when vmemmap_populate() fails
    s390/vmemmap: extend modify_pagetable() to handle vmemmap
    s390/vmem: consolidate vmem_add_range() and vmem_remove_range()
    s390/vmem: rename vmem_add_mem() to vmem_add_range()
    s390: enable HAVE_FUNCTION_ERROR_INJECTION
    s390/pci: clarify comment in s390_mmio_read/write
    s390/time: improve comparison for tod steering
    s390/time: select CLOCKSOURCE_VALIDATE_LAST_CYCLE
    s390/time: use CLOCKSOURCE_MASK
    s390/bpf: implement BPF_PROBE_MEM
    s390/kernel: expand exception table logic to allow new handling options
    s390/kernel: unify EX_TABLE* implementations
    s390/mm: allow order 10 allocations
    s390/mm: avoid trimming to MAX_ORDER
    ...

    Linus Torvalds
     

27 Jul, 2020

10 commits

  • Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Let's avoid memset(PAGE_UNUSED) when adding consecutive sections,
    whereby the vmemmap of a single section does not span full PMDs.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • With a memmap size of 56 bytes or 72 bytes per page, the memmap for a
    256 MB section won't span full PMDs. As we populate single sections and
    depopulate single sections, the depopulation step would not be able to
    free all vmemmap pmds anymore.

    Do it similarly to x86, marking the unused memmap ranges in a special way
    (pad them with 0xFD).

    This allows us to add/remove sections, cleaning up all allocated
    vmemmap pages even if the memmap size is not a multiple of 16 bytes per
    page.

    A 56 byte memmap can, for example, be created with !CONFIG_MEMCG and
    !CONFIG_SLUB.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Let's fall back to single pages if short on huge pages. No need to stop
    memory hotplug.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Let's clean up empty page tables. Consider only page tables that fully
    fall into the identity mapping and the vmemmap range.

    As there are no valid accesses to vmem/vmemmap within non-populated ranges,
    the single tlb flush at the end should be sufficient.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Let's synchronize all accesses to the 1:1 and vmemmap mappings. This will
    be especially relevant when wanting to cleanup empty page tables that could
    be shared by both. Avoid races when removing tables that might be just
    about to get reused.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Cleanup what we partially added in case vmemmap_populate() fails. For
    vmem, this is already handled by vmem_add_mapping().

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Extend our shiny new modify_pagetable() to handle !direct (vmemmap)
    mappings. Convert vmemmap_populate() and implement vmemmap_free().

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • We want to have only a single pagetable walker and reuse the same
    functionality for vmemmap handling. Let's start by consolidating
    vmem_add_range() and vmem_remove_range(), converting it into a
    recursive implementation.

    A recursive implementation makes it easier to expand individual cases
    without harming readability. In addition, we minimize traversing the
    whole hierarchy over and over again.

    One change is that we don't unmap large PMDs/PUDs when they are not
    completely covered by the request, something that should never happen
    with direct mappings, unless one would be removing in a different
    granularity than was added, which would be broken already.

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     
  • Let's match the name to vmem_remove_range().

    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: David Hildenbrand
    Message-Id:
    Signed-off-by: Heiko Carstens

    David Hildenbrand
     

20 Jul, 2020

1 commit

  • This is an s390 port of commit 548acf19234d ("x86/mm: Expand the
    exception table logic to allow new handling options"), which is needed
    for implementing BPF_PROBE_MEM on s390.

    The new handler field is made 64-bit in order to allow pointing from
    dynamically allocated entries to handlers in kernel text. Unlike on x86,
    NULL is used instead of ex_handler_default. This is because exception
    tables are used by boot/text_dma.S, and it would be a pain to preserve
    ex_handler_default.

    The new infrastructure is ignored in early_pgm_check_handler, since
    there is no pt_regs.
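
    The resulting entry layout, per the description above (a sketch; field
    names are assumptions):

    struct exception_table_entry
    {
            int insn;      /* relative address of the faulting instruction */
            int fixup;     /* relative address of the fixup code */
            long handler;  /* 64-bit relative handler address; 0 selects the
                            * default fixup path instead of ex_handler_default */
    };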

    Signed-off-by: Ilya Leoshkevich
    Reviewed-by: Heiko Carstens
    Signed-off-by: Heiko Carstens

    Ilya Leoshkevich