16 Nov, 2020

1 commit

  • Stefan Agner reported a bug when using zram on 32-bit Arm machines
    with RAM above the 4GB address boundary:

    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    pgd = a27bd01c
    [00000000] *pgd=236a0003, *pmd=1ffa64003
    Internal error: Oops: 207 [#1] SMP ARM
    Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
    CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
    Hardware name: BCM2711
    PC is at zs_map_object+0x94/0x338
    LR is at zram_bvec_rw.constprop.0+0x330/0xa64
    pc : [] lr : [] psr: 60000013
    sp : e376bbe0 ip : 00000000 fp : c1e2921c
    r10: 00000002 r9 : c1dda730 r8 : 00000000
    r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000
    r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000
    Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
    Control: 30c5383d Table: 235c2a80 DAC: fffffffd
    Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
    Stack: (0xe376bbe0 to 0xe376c000)

    As it turns out, zsmalloc needs to know the maximum memory size, which
    is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
    MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.

    The same problem will be hit on all 32-bit architectures that have a
    physical address space larger than 4GB and happen not to enable
    sparsemem, and hence never pick up MAX_PHYSMEM_BITS from
    asm/sparsemem.h via asm/pgtable.h.

    After the initial discussion, I suggested just always defining
    MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
    set, or provoking a build error otherwise. This addresses all
    configurations that can currently have this runtime bug, but
    leaves all other configurations unchanged.

    I looked up the possible number of bits in source code and
    datasheets, here is what I found:

    - On ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used.
    - On ARM, CONFIG_LPAE enables 40-bit addressing; without it we never
      support more than 32 bits, even though supersections in theory allow
      up to 40 bits as well.
    - On MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
      XPA supports up to 60 bits in theory, but 40 bits are more than
      anyone will ever ship.
    - On PowerPC, there are three different implementations of 36-bit
      addressing, but 32-bit addressing is used without CONFIG_PTE_64BIT.
    - On RISC-V, the normal page table format can support 34-bit
      addressing. There is no highmem support on RISC-V, so anything
      above 2GB is unused, but it might be useful to eventually support
      CONFIG_ZRAM for high pages.
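
    For illustration, the resulting per-architecture definitions end up
    looking roughly like the 32-bit Arm case sketched below. In the real
    tree the two values live in separate pgtable-2level.h/pgtable-3level.h
    headers selected by CONFIG_ARM_LPAE; the single #ifdef form here is a
    simplified sketch:

    #ifdef CONFIG_ARM_LPAE
    /* LPAE page tables can address up to 40 bits of physical memory */
    #define MAX_POSSIBLE_PHYSMEM_BITS 40
    #else
    /* classic 2-level page tables never address more than 32 bits */
    #define MAX_POSSIBLE_PHYSMEM_BITS 32
    #endif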

    Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
    Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
    Acked-by: Thomas Bogendoerfer
    Reviewed-by: Stefan Agner
    Tested-by: Stefan Agner
    Acked-by: Mike Rapoport
    Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

03 Nov, 2020

1 commit

  • The purpose of io_remap_pfn_range() is to map IO memory, such as a
    memory mapped IO exposed through a PCI BAR. IO devices do not
    understand encryption, so this memory must always be decrypted.
    Automatically call pgprot_decrypted() as part of the generic
    implementation.

    This fixes a bug where enabling AMD SME causes subsystems, such as RDMA,
    using io_remap_pfn_range() to expose BAR pages to user space to fail.
    The CPU will encrypt access to those BAR pages instead of passing
    unencrypted IO directly to the device.

    Places not mapping IO should use remap_pfn_range().
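
    Concretely, the generic implementation becomes a thin wrapper that
    folds pgprot_decrypted() into the protection bits, roughly along
    these lines (a sketch of the generic fallback; exact header placement
    may differ):

    #ifndef io_remap_pfn_range
    static inline int io_remap_pfn_range(struct vm_area_struct *vma,
                                         unsigned long addr, unsigned long pfn,
                                         unsigned long size, pgprot_t prot)
    {
            /* IO memory must never be mapped encrypted */
            return remap_pfn_range(vma, addr, pfn, size,
                                   pgprot_decrypted(prot));
    }
    #endif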

    Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption")
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Tom Lendacky
    Cc: Thomas Gleixner
    Cc: Andrey Ryabinin
    Cc: Borislav Petkov
    Cc: Brijesh Singh
    Cc: Jonathan Corbet
    Cc: Dmitry Vyukov
    Cc: "Dave Young"
    Cc: Alexander Potapenko
    Cc: Konrad Rzeszutek Wilk
    Cc: Andy Lutomirski
    Cc: Larry Woodman
    Cc: Matt Fleming
    Cc: Ingo Molnar
    Cc: "Michael S. Tsirkin"
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Toshimitsu Kani
    Cc:
    Link: https://lkml.kernel.org/r/0-v1-025d64bdf6c4+e-amd_sme_fix_jgg@nvidia.com
    Signed-off-by: Linus Torvalds

    Jason Gunthorpe
     

13 Oct, 2020

1 commit

  • Pull arm64 updates from Will Deacon:
    "There's quite a lot of code here, but much of it is due to the
    addition of a new PMU driver as well as some arm64-specific selftests
    which is an area where we've traditionally been lagging a bit.

    In terms of exciting features, this includes support for the Memory
    Tagging Extension which narrowly missed 5.9, hopefully allowing
    userspace to run with use-after-free detection in production on CPUs
    that support it. Work is ongoing to integrate the feature with KASAN
    for 5.11.

    Another change that I'm excited about (assuming they get the hardware
    right) is preparing the ASID allocator for sharing the CPU page-table
    with the SMMU. Those changes will also come in via Joerg with the
    IOMMU pull.

    We do stray outside of our usual directories in a few places, mostly
    due to core changes required by MTE. Although much of this has been
    Acked, there were a couple of places where we unfortunately didn't get
    any review feedback.

    Other than that, we ran into a handful of minor conflicts in -next,
    but nothing that should pose any issues.

    Summary:

    - Userspace support for the Memory Tagging Extension introduced by
    Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.

    - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
    switching.

    - Fix and subsequent rewrite of our Spectre mitigations, including
    the addition of support for PR_SPEC_DISABLE_NOEXEC.

    - Support for the Armv8.3 Pointer Authentication enhancements.

    - Support for ASID pinning, which is required when sharing
    page-tables with the SMMU.

    - MM updates, including treating flush_tlb_fix_spurious_fault() as a
    no-op.

    - Perf/PMU driver updates, including addition of the ARM CMN PMU
    driver and also support to handle CPU PMU IRQs as NMIs.

    - Allow prefetchable PCI BARs to be exposed to userspace using normal
    non-cacheable mappings.

    - Implementation of ARCH_STACKWALK for unwinding.

    - Improve reporting of unexpected kernel traps due to BPF JIT
    failure.

    - Improve robustness of user-visible HWCAP strings and their
    corresponding numerical constants.

    - Removal of TEXT_OFFSET.

    - Removal of some unused functions, parameters and prototypes.

    - Removal of MPIDR-based topology detection in favour of firmware
    description.

    - Cleanups to handling of SVE and FPSIMD register state in
    preparation for potential future optimisation of handling across
    syscalls.

    - Cleanups to the SDEI driver in preparation for support in KVM.

    - Miscellaneous cleanups and refactoring work"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
    Revert "arm64: initialize per-cpu offsets earlier"
    arm64: random: Remove no longer needed prototypes
    arm64: initialize per-cpu offsets earlier
    kselftest/arm64: Check mte tagged user address in kernel
    kselftest/arm64: Verify KSM page merge for MTE pages
    kselftest/arm64: Verify all different mmap MTE options
    kselftest/arm64: Check forked child mte memory accessibility
    kselftest/arm64: Verify mte tag inclusion via prctl
    kselftest/arm64: Add utilities and a test to validate mte memory
    perf: arm-cmn: Fix conversion specifiers for node type
    perf: arm-cmn: Fix unsigned comparison to less than zero
    arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
    arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
    arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
    arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
    KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
    arm64: Get rid of arm64_ssbd_state
    KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
    KVM: arm64: Get rid of kvm_arm_have_ssbd()
    KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
    ...

    Linus Torvalds
     

27 Sep, 2020

1 commit

  • Currently, to make sure that every page table entry is read just
    once, the gup_fast walk performs a READ_ONCE() and passes the pXd
    value down to the next gup_pXd_range function by value, e.g.:

    static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
                             unsigned int flags, struct page **pages, int *nr)
    {
            ...
            pudp = pud_offset(&p4d, addr);

    This function passes a pointer to that local value copy to pXd_offset,
    and might get the very same pointer back. This happens when the level
    is folded (as it is on most architectures), and such a pointer must
    not be iterated.

    On s390, each task might use 5-, 4- or 3-level address translation,
    with a correspondingly different set of folded levels, so the logic is
    more complex, and a non-iterable pointer to a local copy leads to
    severe problems.

    Here is an example of what happens with gup_fast on s390, for a task
    with 3-level paging, crossing a 2 GB pud boundary:

    // addr = 0x1007ffff000, end = 0x10080001000
    static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
                             unsigned int flags, struct page **pages, int *nr)
    {
            unsigned long next;
            pud_t *pudp;

            // pud_offset returns &p4d itself (a pointer to a value on the stack)
            pudp = pud_offset(&p4d, addr);
            do {
                    // on the second iteration this reads a "random" stack value
                    pud_t pud = READ_ONCE(*pudp);

                    // next = 0x10080000000, since PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390
                    next = pud_addr_end(addr, end);
                    ...
            } while (pudp++, addr = next, addr != end); // pudp++ iterates over the stack

            return 1;
    }

    This happens since s390 moved to common gup code with commit
    d1874a0c2805 ("s390/mm: make the pxd_offset functions more robust") and
    commit 1a42010cdc26 ("s390/mm: convert to the generic
    get_user_pages_fast code").

    s390 tried to mimic static level folding by changing the pXd_offset
    primitives to always calculate the top-level page table offset in
    pgd_offset and to simply return the value passed in whenever a
    pXd_offset has to act as folded.

    What is crucial for gup_fast and what has been overlooked is that
    PxD_SIZE/MASK and thus pXd_addr_end should also change correspondingly.
    And the latter is not possible with dynamic folding.

    To fix the issue, pass the original pXdp pointers down to the
    gup_pXd_range functions in addition to the pXd values, and introduce
    pXd_offset_lockless helpers which take an additional pXd entry value
    parameter. This has already been discussed in

    https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
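
    For architectures with static level folding nothing changes: the
    generic fallbacks can simply ignore the extra pointer, roughly like
    this (a sketch, shown for the pud level only):

    #ifndef pud_offset_lockless
    /* With static folding the on-stack entry value copy is always valid */
    #define pud_offset_lockless(p4dp, p4d, address) pud_offset(&(p4d), address)
    #endif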

    Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code")
    Signed-off-by: Vasily Gorbik
    Signed-off-by: Andrew Morton
    Reviewed-by: Gerald Schaefer
    Reviewed-by: Alexander Gordeev
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Mike Rapoport
    Reviewed-by: John Hubbard
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Dave Hansen
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Arnd Bergmann
    Cc: Andrey Ryabinin
    Cc: Heiko Carstens
    Cc: Christian Borntraeger
    Cc: Claudio Imbrenda
    Cc: [5.2+]
    Link: https://lkml.kernel.org/r/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours
    Signed-off-by: Linus Torvalds

    Vasily Gorbik
     

04 Sep, 2020

1 commit

  • Arm's Memory Tagging Extension (MTE) adds some metadata (tags) to
    every physical page. When swapping pages out to disk, it is necessary
    to save these tags, and later to restore them when the pages are read
    back in.

    Add some hooks along with dummy implementations to enable the
    arch code to handle this.

    Three new hooks are added to the swap code:
    * arch_prepare_to_swap()
    * arch_swap_invalidate_page()
    * arch_swap_invalidate_area()
    One new hook is added to shmem:
    * arch_swap_restore()
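
    The dummy implementations are empty, so architectures without tagged
    memory compile unchanged; a sketch of the no-op fallbacks, guarded by
    the usual __HAVE_ARCH_* convention:

    #ifndef __HAVE_ARCH_PREPARE_TO_SWAP
    static inline int arch_prepare_to_swap(struct page *page)
    {
            return 0;       /* nothing to save */
    }
    #endif

    #ifndef __HAVE_ARCH_SWAP_INVALIDATE
    static inline void arch_swap_invalidate_page(int type, pgoff_t offset)
    {
    }

    static inline void arch_swap_invalidate_area(int type)
    {
    }
    #endif

    #ifndef __HAVE_ARCH_SWAP_RESTORE
    static inline void arch_swap_restore(swp_entry_t entry, struct page *page)
    {
    }
    #endif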

    Signed-off-by: Steven Price
    [catalin.marinas@arm.com: add unlock_page() on the error path]
    [catalin.marinas@arm.com: dropped the _tags suffix]
    Signed-off-by: Catalin Marinas
    Acked-by: Andrew Morton

    Steven Price
     

18 Aug, 2020

1 commit

  • IA-64 is special and treats pgd_offset_k() differently to pgd_offset(),
    using different formulae to calculate the indices into the kernel and user
    PGDs. The index into the user PGDs takes into account the region number,
    but the index into the kernel (init_mm) PGD always assumes a predefined
    kernel region number. Commit 974b9b2c68f3 ("mm: consolidate pte_index() and
    pte_offset_*() definitions") made IA-64 use a generic pgd_offset_k() which
    incorrectly used pgd_index() for kernel page tables. As a result, the
    index into the kernel PGD was going out of bounds and the kernel hung
    during early boot.

    Allow overrides of pgd_offset_k() and override it on IA-64 with the old
    implementation that will correctly index the kernel PGD.
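
    With the override allowed, the generic definition reduces to a
    one-line default that IA-64 can replace (sketch):

    /* Generic default, now overridable by the architecture */
    #ifndef pgd_offset_k
    #define pgd_offset_k(address)   pgd_offset(&init_mm, (address))
    #endif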

    Fixes: 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions")
    Reported-by: John Paul Adrian Glaubitz
    Signed-off-by: Jessica Clarke
    Tested-by: John Paul Adrian Glaubitz
    Acked-by: Tony Luck
    Signed-off-by: Mike Rapoport

    Jessica Clarke
     

13 Aug, 2020

1 commit

  • Drop the doubled words "used" and "by".

    Drop the repeated acronym "TLB" and make several other fixes around it
    (capital letters, spelling errors).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Reviewed-by: SeongJae Park
    Link: http://lkml.kernel.org/r/2bb6e13e-44df-4920-52d9-4d3539945f73@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

31 Jul, 2020

1 commit

  • The <linux/pgtable.h> header defines some generic pgprot_*
    implementations, but they are only available when CONFIG_MMU is
    enabled. The RISC-V architecture, for example, therefore has to
    define some of these pgprot_* macros itself for !MMU configurations.

    Let's make the generic pgprot_* implementations available even for
    !MMU so we can remove the RISC-V specific definitions.

    Compile-tested with the x86 defconfig, and with the riscv defconfig
    and !MMU defconfig.
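
    The effect is that no-op fallbacks like the ones below become visible
    regardless of CONFIG_MMU (an illustrative sketch; only two of the
    pgprot_* macros are shown):

    #ifndef pgprot_noncached
    #define pgprot_noncached(prot)          (prot)
    #endif

    #ifndef pgprot_writecombine
    #define pgprot_writecombine pgprot_noncached
    #endif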

    Suggested-by: Palmer Dabbelt
    Reviewed-by: Mike Rapoport
    Acked-by: David Rientjes
    Signed-off-by: Pekka Enberg
    Signed-off-by: Palmer Dabbelt

    Pekka Enberg
     

20 Jun, 2020

1 commit

  • Since commit 9e343b467c70 ("READ_ONCE: Enforce atomicity for
    {READ,WRITE}_ONCE() memory accesses") it is no longer possible to
    use READ_ONCE() to access complex page table entries like the ones
    defined for powerpc 8xx with 16k pages.

    Define a ptep_get() helper that architectures can override instead
    of performing a READ_ONCE() on the page table entry pointer directly.
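
    The generic helper is just a READ_ONCE() wrapper that architectures
    with complex page table entries can override (a sketch of the
    default):

    #ifndef ptep_get
    static inline pte_t ptep_get(pte_t *ptep)
    {
            return READ_ONCE(*ptep);
    }
    #endif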

    Fixes: 9e343b467c70 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
    Signed-off-by: Christophe Leroy
    Acked-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/087fa12b6e920e32315136b998aa834f99242695.1592225558.git.christophe.leroy@csgroup.eu

    Christophe Leroy
     

10 Jun, 2020

4 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • All architectures define pte_index() as

    (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)

    and all architectures define pte_offset_kernel() as an entry in the
    array of PTEs indexed by pte_index().

    For most architectures the pte_offset_kernel() implementation relies
    on the availability of pmd_page_vaddr() that converts a PMD entry value to
    the virtual address of the page containing PTEs array.

    Let's move the x86 definitions of the PTE accessors to the generic
    place in <linux/pgtable.h> and then simply drop the respective
    definitions from the other architectures.

    The architectures that didn't provide pmd_page_vaddr() are updated to have
    that defined.

    The generic implementation of pte_offset_kernel() can be overridden by an
    architecture and alpha makes use of this because it has special ordering
    requirements for its version of pte_offset_kernel().
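
    Put together, the consolidated generic accessors look roughly like
    this (sketch):

    static inline unsigned long pte_index(unsigned long address)
    {
            return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
    }

    #ifndef pte_offset_kernel
    static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
    {
            return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);
    }
    #define pte_offset_kernel pte_offset_kernel
    #endif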

    [rppt@linux.ibm.com: v2]
    Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org
    [rppt@linux.ibm.com: update]
    Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org
    [rppt@linux.ibm.com: update]
    Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org
    [akpm@linux-foundation.org: fix x86 warning]
    [sfr@canb.auug.org.au: fix powerpc build]
    Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com

    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The powerpc 32-bit implementation of pgtable has nice shortcuts for
    accessing kernel PMD and PTE for a given virtual address. Make these
    helpers available for all architectures.
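
    The helpers simply walk all the (possibly folded) upper levels
    starting from the kernel PGD, roughly as follows (sketch):

    static inline pmd_t *pmd_off_k(unsigned long va)
    {
            return pmd_offset(pud_offset(p4d_offset(pgd_offset_k(va), va), va), va);
    }

    static inline pte_t *virt_to_kpte(unsigned long vaddr)
    {
            pmd_t *pmd = pmd_off_k(vaddr);

            return pmd_none(*pmd) ? NULL : pte_offset_kernel(pmd, vaddr);
    }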

    [rppt@linux.ibm.com: microblaze: fix page table traversal in setup_rt_frame()]
    Link: http://lkml.kernel.org/r/20200518191511.GD1118872@kernel.org
    [akpm@linux-foundation.org: s/pmd_ptr_k/pmd_off_k/ in various powerpc places]

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-9-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The include/linux/pgtable.h is going to be the home of generic page table
    manipulation functions.

    Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
    make the latter include asm/pgtable.h.
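
    After the move, the new header starts out as little more than a
    relocation shim (a sketch of the resulting include chain):

    /* include/linux/pgtable.h -- new home of the generic helpers */
    #ifndef _LINUX_PGTABLE_H
    #define _LINUX_PGTABLE_H

    #include <asm/pgtable.h>        /* architecture definitions come first */

    /* generic page table manipulation helpers follow here */

    #endif /* _LINUX_PGTABLE_H */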

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport