10 Jun, 2020

2 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)
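
    As a hedged illustration (not taken from the patch itself), a typical
    call site reads as follows before and after the conversion, using the
    wrapper names introduced by the mmap locking API:

    /* before: raw rwsem operations on mm->mmap_sem */
    down_read(&mm->mmap_sem);
    /* ... inspect or walk the vma tree ... */
    up_read(&mm->mmap_sem);

    /* after: the mmap locking API wrappers */
    mmap_read_lock(mm);
    /* ... inspect or walk the vma tree ... */
    mmap_read_unlock(mm);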

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

03 Jun, 2020

2 commits

  • Merge updates from Andrew Morton:
    "A few little subsystems and a start of a lot of MM patches.

    Subsystems affected by this patch series: squashfs, ocfs2, parisc,
    vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
    swap, memcg, pagemap, memory-failure, vmalloc, kasan"

    * emailed patches from Andrew Morton: (128 commits)
    kasan: move kasan_report() into report.c
    mm/mm_init.c: report kasan-tag information stored in page->flags
    ubsan: entirely disable alignment checks under UBSAN_TRAP
    kasan: fix clang compilation warning due to stack protector
    x86/mm: remove vmalloc faulting
    mm: remove vmalloc_sync_(un)mappings()
    x86/mm/32: implement arch_sync_kernel_mappings()
    x86/mm/64: implement arch_sync_kernel_mappings()
    mm/ioremap: track which page-table levels were modified
    mm/vmalloc: track which page-table levels were modified
    mm: add functions to track page directory modifications
    s390: use __vmalloc_node in stack_alloc
    powerpc: use __vmalloc_node in alloc_vm_stack
    arm64: use __vmalloc_node in arch_alloc_vmap_stack
    mm: remove vmalloc_user_node_flags
    mm: switch the test_vmalloc module to use __vmalloc_node
    mm: remove __vmalloc_node_flags_caller
    mm: remove both instances of __vmalloc_node_flags
    mm: remove the prot argument to __vmalloc_node
    mm: remove the pgprot argument to __vmalloc
    ...

    Linus Torvalds
     
  • Now, when reading /proc/PID/smaps, PMD migration entries in the page
    table are simply ignored. To improve the accuracy of /proc/PID/smaps,
    add parsing and processing of them.
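
    A minimal sketch of the kind of handling this adds to the smaps PMD
    walker (helper names follow the kernel's migration-entry API of that
    era; the real smaps_pmd_entry() accounting is more involved):

    struct page *page = NULL;

    if (pmd_present(*pmd)) {
        page = pmd_page(*pmd);
    } else if (unlikely(is_swap_pmd(*pmd))) {
        swp_entry_t entry = pmd_to_swp_entry(*pmd);

        /* account the THP even while it is under migration */
        if (is_migration_entry(entry))
            page = migration_entry_to_page(entry);
    }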

    To test the patch, we run pmbench to eat 400 MB memory in background,
    then run /usr/bin/migratepages and `cat /proc/PID/smaps` every second.
    The issue as follows can be reproduced within 60 seconds.

    Before the patch, for the fully populated 400 MB anonymous VMA, some THP
    pages under migration may be lost as below.

    7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
    Size: 409600 kB
    KernelPageSize: 4 kB
    MMUPageSize: 4 kB
    Rss: 407552 kB
    Pss: 407552 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 0 kB
    Private_Dirty: 407552 kB
    Referenced: 301056 kB
    Anonymous: 407552 kB
    LazyFree: 0 kB
    AnonHugePages: 405504 kB
    ShmemPmdMapped: 0 kB
    FilePmdMapped: 0 kB
    Shared_Hugetlb: 0 kB
    Private_Hugetlb: 0 kB
    Swap: 0 kB
    SwapPss: 0 kB
    Locked: 0 kB
    THPeligible: 1
    VmFlags: rd wr mr mw me ac

    After the patch, it will be always,

    7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
    Size: 409600 kB
    KernelPageSize: 4 kB
    MMUPageSize: 4 kB
    Rss: 409600 kB
    Pss: 409600 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 0 kB
    Private_Dirty: 409600 kB
    Referenced: 294912 kB
    Anonymous: 409600 kB
    LazyFree: 0 kB
    AnonHugePages: 407552 kB
    ShmemPmdMapped: 0 kB
    FilePmdMapped: 0 kB
    Shared_Hugetlb: 0 kB
    Private_Hugetlb: 0 kB
    Swap: 0 kB
    SwapPss: 0 kB
    Locked: 0 kB
    THPeligible: 1
    VmFlags: rd wr mr mw me ac

    Signed-off-by: "Huang, Ying"
    Signed-off-by: Andrew Morton
    Reviewed-by: Zi Yan
    Acked-by: Michal Hocko
    Acked-by: Kirill A. Shutemov
    Acked-by: Vlastimil Babka
    Cc: Andrea Arcangeli
    Cc: Alexey Dobriyan
    Cc: Konstantin Khlebnikov
    Cc: "Jérôme Glisse"
    Cc: Yang Shi
    Link: http://lkml.kernel.org/r/20200403123059.1846960-1-ying.huang@intel.com
    Signed-off-by: Linus Torvalds

    Huang Ying
     

02 Jun, 2020

1 commit

  • Pull arm64 updates from Will Deacon:
    "A sizeable pile of arm64 updates for 5.8.

    Summary below, but the big two features are support for Branch Target
    Identification and Clang's Shadow Call stack. The latter is currently
    arm64-only, but the high-level parts are all in core code so it could
    easily be adopted by other architectures pending toolchain support.

    Branch Target Identification (BTI):

    - Support for ARMv8.5-BTI in both user- and kernel-space. This allows
    branch targets to limit the types of branch from which they can be
    called and additionally prevents branching to arbitrary code,
    although kernel support requires a very recent toolchain.

    - Function annotation via SYM_FUNC_START() so that assembly functions
    are wrapped with the relevant "landing pad" instructions.

    - BPF and vDSO updates to use the new instructions.

    - Addition of a new HWCAP and exposure of BTI capability to userspace
    via ID register emulation, along with ELF loader support for the
    BTI feature in .note.gnu.property.

    - Non-critical fixes to CFI unwind annotations in the sigreturn
    trampoline.

    Shadow Call Stack (SCS):

    - Support for Clang's Shadow Call Stack feature, which reserves
    platform register x18 to point at a separate stack for each task
    that holds only return addresses. This protects function return
    control flow from buffer overruns on the main stack.

    - Save/restore of x18 across problematic boundaries (user-mode,
    hypervisor, EFI, suspend, etc).

    - Core support for SCS, should other architectures want to use it
    too.

    - SCS overflow checking on context-switch as part of the existing
    stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.

    CPU feature detection:

    - Removed numerous "SANITY CHECK" errors when running on a system
    with mismatched AArch32 support at EL1. This is primarily a concern
    for KVM, which disabled support for 32-bit guests on such a system.

    - Addition of new ID registers and fields as the architecture has
    been extended.

    Perf and PMU drivers:

    - Minor fixes and cleanups to system PMU drivers.

    Hardware errata:

    - Unify KVM workarounds for VHE and nVHE configurations.

    - Sort vendor errata entries in Kconfig.

    Secure Monitor Call Calling Convention (SMCCC):

    - Update to the latest specification from Arm (v1.2).

    - Allow PSCI code to query the SMCCC version.

    Software Delegated Exception Interface (SDEI):

    - Unexport a bunch of unused symbols.

    - Minor fixes to handling of firmware data.

    Pointer authentication:

    - Add support for dumping the kernel PAC mask in vmcoreinfo so that
    the stack can be unwound by tools such as kdump.

    - Simplification of key initialisation during CPU bringup.

    BPF backend:

    - Improve immediate generation for logical and add/sub instructions.

    vDSO:

    - Minor fixes to the linker flags for consistency with other
    architectures and support for LLVM's unwinder.

    - Clean up logic to initialise and map the vDSO into userspace.

    ACPI:

    - Work around for an ambiguity in the IORT specification relating to
    the "num_ids" field.

    - Support _DMA method for all named components rather than only PCIe
    root complexes.

    - Minor other IORT-related fixes.

    Miscellaneous:

    - Initialise debug traps early for KGDB and fix KDB cacheflushing
    deadlock.

    - Minor tweaks to early boot state (documentation update, set
    TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).

    - Refactoring and cleanup"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
    KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
    KVM: arm64: Check advertised Stage-2 page size capability
    arm64/cpufeature: Add get_arm64_ftr_reg_nowarn()
    ACPI/IORT: Remove the unused __get_pci_rid()
    arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context
    arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register
    arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register
    arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register
    arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register
    arm64/cpufeature: Add remaining feature bits in ID_PFR0 register
    arm64/cpufeature: Introduce ID_MMFR5 CPU register
    arm64/cpufeature: Introduce ID_DFR1 CPU register
    arm64/cpufeature: Introduce ID_PFR2 CPU register
    arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0
    arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
    arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register
    arm64: mm: Add asid_gen_match() helper
    firmware: smccc: Fix missing prototype warning for arm_smccc_version_init
    arm64: vdso: Fix CFI directives in sigreturn trampoline
    arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
    ...

    Linus Torvalds
     

05 May, 2020

1 commit

  • Merge in user support for Branch Target Identification, which narrowly
    missed the cut for 5.7 after a late ABI concern.

    * for-next/bti-user:
    arm64: bti: Document behaviour for dynamically linked binaries
    arm64: elf: Fix allnoconfig kernel build with !ARCH_USE_GNU_PROPERTY
    arm64: BTI: Add Kconfig entry for userspace BTI
    mm: smaps: Report arm64 guarded pages in smaps
    arm64: mm: Display guarded pages in ptdump
    KVM: arm64: BTI: Reset BTYPE when skipping emulated instructions
    arm64: BTI: Reset BTYPE when skipping emulated instructions
    arm64: traps: Shuffle code to eliminate forward declarations
    arm64: unify native/compat instruction skipping
    arm64: BTI: Decode BYTPE bits when printing PSTATE
    arm64: elf: Enable BTI at exec based on ELF program properties
    elf: Allow arch to tweak initial mmap prot flags
    arm64: Basic Branch Target Identification support
    ELF: Add ELF program property parsing support
    ELF: UAPI and Kconfig additions for ELF program properties

    Will Deacon
     

23 Apr, 2020

1 commit

  • Remove MPX leftovers in generic code.

    Fixes: 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
    Signed-off-by: Jimmy Assarsson
    Signed-off-by: Borislav Petkov
    Acked-by: Dave Hansen
    Link: https://lkml.kernel.org/r/20200402172507.2786-1-jimmyassarsson@gmail.com

    Jimmy Assarsson
     

08 Apr, 2020

4 commits

  • It's clearer to just put this inline.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200317193201.9924-5-adobriyan@gmail.com
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • The ppos is a private cursor, just like m->version. Use the canonical
    cursor, not a special one.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200317193201.9924-3-adobriyan@gmail.com
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Instead of setting m->version in the show method, set it in m_next(),
    where it should be. Also remove the fallback code for failing to find a
    vma, or version being zero.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200317193201.9924-2-adobriyan@gmail.com
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Instead of calling vma_stop() from m_start() and m_next(), do its work
    in m_stop().

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200317193201.9924-1-adobriyan@gmail.com
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

17 Mar, 2020

1 commit

  • The arm64 Branch Target Identification support is activated by marking
    executable pages as guarded pages. Report pages mapped this way in
    smaps to aid diagnostics.
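
    A sketch of the change, assuming the VM_ARM64_BTI vma flag: an entry is
    added to the VmFlags mnemonics table in show_smap_vma_flags() in
    fs/proc/task_mmu.c so that guarded mappings show up in the VmFlags line:

    #ifdef CONFIG_ARM64_BTI
        [ilog2(VM_ARM64_BTI)] = "bt",
    #endif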

    Signed-off-by: Mark Brown
    Signed-off-by: Daniel Kiss
    Reviewed-by: Kees Cook
    Signed-off-by: Catalin Marinas

    Daniel Kiss
     

04 Feb, 2020

1 commit

  • The pte_hole() callback is called at multiple levels of the page tables.
    Code dumping the kernel page tables needs to know at what depth the
    missing entry is. Add this as an extra parameter to pte_hole(). When the
    depth isn't known (e.g. when processing a vma), -1 is passed.

    The depth that is reported is the actual level where the entry is missing
    (ignoring any folding that is in place), i.e. any levels where
    PTRS_PER_P?D is set to 1 are ignored.

    Note that depth starts at 0 for a PGD so that PUD/PMD/PTE retain their
    natural numbers as levels 2/3/4.
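
    A hedged sketch of the resulting callback shape (my_pte_hole is an
    illustrative name, not from the patch):

    /* depth: 0 = PGD, 1 = P4D, 2 = PUD, 3 = PMD, 4 = PTE, -1 = unknown */
    static int my_pte_hole(unsigned long addr, unsigned long next,
                           int depth, struct mm_walk *walk)
    {
        if (depth < 0)
            return 0;   /* e.g. a hole covering a whole vma */
        /* record a missing entry at this level of the page tables */
        return 0;
    }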

    Link: http://lkml.kernel.org/r/20191218162402.45610-16-steven.price@arm.com
    Signed-off-by: Steven Price
    Tested-by: Zong Li
    Cc: Albert Ou
    Cc: Alexandre Ghiti
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Heiko Carstens
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: James Morse
    Cc: Jerome Glisse
    Cc: "Liang, Kan"
    Cc: Mark Rutland
    Cc: Michael Ellerman
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vineet Gupta
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Price
     

25 Sep, 2019

2 commits

  • In preparation for non-shmem THP, this patch adds a few stats and exposes
    them in /proc/meminfo, /sys/bus/node/devices/<node>/meminfo, and
    /proc/<pid>/task/<tid>/smaps.
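
    For reference, a hedged sketch of how the new counters appear; the field
    names are the ones introduced by this series, the values are illustrative:

    /proc/meminfo:
    FileHugePages:      2048 kB
    FilePmdMapped:      2048 kB

    per-vma smaps:
    FilePmdMapped:      2048 kB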

    This patch is mostly a rewrite of Kirill A. Shutemov's earlier version:
    https://lkml.kernel.org/r/20170126115819.58875-5-kirill.shutemov@linux.intel.com/

    Link: http://lkml.kernel.org/r/20190801184244.3169074-5-songliubraving@fb.com
    Signed-off-by: Song Liu
    Acked-by: Rik van Riel
    Acked-by: Kirill A. Shutemov
    Acked-by: Johannes Weiner
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: William Kucharski
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • Replace 1 << compound_order(page) with compound_nr(page). Minor
    improvements in readability.
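
    A one-line illustration of the substitution (compound_nr() returns the
    number of base pages that make up the compound page):

    -       nr_pages = 1 << compound_order(page);
    +       nr_pages = compound_nr(page);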

    Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

07 Sep, 2019

2 commits

  • The mm_walk structure currently mixes data and code. Split out the
    operations vectors into a new mm_walk_ops structure, and while we are
    changing the API also declare the mm_walk structure inside the
    walk_page_range and walk_page_vma functions.

    Based on patch from Linus Torvalds.
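
    A hedged sketch of what a caller looks like after the split (callback and
    variable names here are illustrative): the operations live in a const
    mm_walk_ops structure and walk_page_range() builds the mm_walk internally:

    static const struct mm_walk_ops my_walk_ops = {
        .pmd_entry = my_pmd_entry,
        .pte_hole  = my_pte_hole,
    };

    /* called with mmap_sem held, for the range of interest */
    err = walk_page_range(mm, start, end, &my_walk_ops, &my_private);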

    Link: https://lore.kernel.org/r/20190828141955.22210-3-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Thomas Hellstrom
    Reviewed-by: Steven Price
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • Add a new header for the handful of users of the walk_page_range /
    walk_page_vma interface instead of polluting all users of mm.h with it.

    Link: https://lore.kernel.org/r/20190828141955.22210-2-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Thomas Hellstrom
    Reviewed-by: Steven Price
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

19 Jul, 2019

1 commit

  • Commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
    vma") introduced THPeligible bit for processes' smaps. But, when
    checking the eligibility for shmem vma, __transparent_hugepage_enabled()
    is called to override the result from shmem_huge_enabled(). It may
    result in the anonymous vma's THP flag overriding shmem's. For example,
    running a simple test which creates THP for shmem, but with anonymous THP
    disabled, when reading the process's smaps it may show:

    7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
    Size: 4096 kB
    ...
    [snip]
    ...
    ShmemPmdMapped: 4096 kB
    ...
    [snip]
    ...
    THPeligible: 0

    And, /proc/meminfo does show THP allocated and PMD mapped too:

    ShmemHugePages: 4096 kB
    ShmemPmdMapped: 4096 kB

    This doesn't make too much sense. The shmem objects should be treated
    separately from anonymous THP. Calling shmem_huge_enabled(), combined with
    checking MMF_DISABLE_THP, sounds good enough. And we can skip the stack
    and dax vma checks since we have already checked that the vma is shmem.

    Also check if vma is suitable for THP by calling
    transhuge_vma_suitable().

    And minor fix to smaps output format and documentation.

    Link: http://lkml.kernel.org/r/1560401041-32207-3-git-send-email-yang.shi@linux.alibaba.com
    Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
    Signed-off-by: Yang Shi
    Acked-by: Hugh Dickins
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: David Rientjes
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

15 Jul, 2019

1 commit

  • Pull HMM updates from Jason Gunthorpe:
    "Improvements and bug fixes for the hmm interface in the kernel:

    - Improve clarity, locking and APIs related to the 'hmm mirror'
    feature merged last cycle. In linux-next we now see AMDGPU and
    nouveau using this API.

    - Remove old or transitional hmm APIs. These are holdovers from the
    past with no users, or APIs that existed only to manage cross tree
    conflicts. There are still a few more of these cleanups that didn't
    make the merge window cut off.

    - Improve some core mm APIs:
    - export alloc_pages_vma() for driver use
    - refactor into devm_request_free_mem_region() to manage
    DEVICE_PRIVATE resource reservations
    - refactor duplicative driver code into the core dev_pagemap
    struct

    - Remove hmm wrappers of improved core mm APIs, instead have drivers
    use the simplified API directly

    - Remove DEVICE_PUBLIC

    - Simplify the kconfig flow for the hmm users and core code"

    * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
    mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
    mm: remove the HMM config option
    mm: sort out the DEVICE_PRIVATE Kconfig mess
    mm: simplify ZONE_DEVICE page private data
    mm: remove hmm_devmem_add
    mm: remove hmm_vma_alloc_locked_page
    nouveau: use devm_memremap_pages directly
    nouveau: use alloc_page_vma directly
    PCI/P2PDMA: use the dev_pagemap internal refcount
    device-dax: use the dev_pagemap internal refcount
    memremap: provide an optional internal refcount in struct dev_pagemap
    memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
    memremap: remove the data field in struct dev_pagemap
    memremap: add a migrate_to_ram method to struct dev_pagemap_ops
    memremap: lift the devmap_enable manipulation into devm_memremap_pages
    memremap: pass a struct dev_pagemap to ->kill and ->cleanup
    memremap: move dev_pagemap callbacks into a separate structure
    memremap: validate the pagemap type passed to devm_memremap_pages
    mm: factor out a devm_request_free_mem_region helper
    mm: export alloc_pages_vma
    ...

    Linus Torvalds
     

13 Jul, 2019

5 commits

  • Report separate components (anon, file, and shmem) for PSS in
    smaps_rollup.

    This helps understand and tune the memory manager behavior in consumer
    devices, particularly mobile devices. Many of them (e.g. chromebooks and
    Android-based devices) use zram for anon memory, and perform disk reads
    for discarded file pages. The difference in latency is large (e.g.
    reading a single page from SSD is 30 times slower than decompressing a
    zram page on one popular device), thus it is useful to know how much of
    the PSS is anon vs. file.
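
    With the patch, smaps_rollup grows extra lines next to Pss; the field
    names below are the ones added by the patch, the values are only
    illustrative:

    Pss:                1000 kB
    Pss_Anon:            600 kB
    Pss_File:            300 kB
    Pss_Shmem:           100 kB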

    All the information is already present in /proc/pid/smaps, but much more
    expensive to obtain because of the large size of that procfs entry.

    This patch also removes a small code duplication in smaps_account, which
    would have gotten worse otherwise.

    Also updated Documentation/filesystems/proc.txt (the smaps section was a
    bit stale, and I added a smaps_rollup section) and
    Documentation/ABI/testing/procfs-smaps_rollup.

    [semenzato@chromium.org: v5]
    Link: http://lkml.kernel.org/r/20190626234333.44608-1-semenzato@chromium.org
    Link: http://lkml.kernel.org/r/20190626180429.174569-1-semenzato@chromium.org
    Signed-off-by: Luigi Semenzato
    Acked-by: Yu Zhao
    Cc: Sonny Rao
    Cc: Yu Zhao
    Cc: Brian Geffon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luigi Semenzato
     
  • Do not remain stuck forever if something goes wrong. Using a killable
    lock permits cleanup of stuck tasks and simplifies investigation.

    Replace the only unkillable mmap_sem lock in clear_refs_write().
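
    A generic sketch of the killable-lock pattern these patches apply (the
    error label and return value are illustrative; at the time the lock was
    still called mmap_sem):

    if (down_read_killable(&mm->mmap_sem)) {
        ret = -EINTR;
        goto out;
    }
    /* ... walk the address space ... */
    up_read(&mm->mmap_sem);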

    Link: http://lkml.kernel.org/r/156007493826.3335.5424884725467456239.stgit@buzz
    Signed-off-by: Konstantin Khlebnikov
    Reviewed-by: Roman Gushchin
    Reviewed-by: Cyrill Gorcunov
    Reviewed-by: Kirill Tkhai
    Acked-by: Michal Hocko
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Michal Koutný
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Do not remain stuck forever if something goes wrong. Using a killable
    lock permits cleanup of stuck tasks and simplifies investigation.

    Link: http://lkml.kernel.org/r/156007493638.3335.4872164955523928492.stgit@buzz
    Signed-off-by: Konstantin Khlebnikov
    Reviewed-by: Roman Gushchin
    Reviewed-by: Cyrill Gorcunov
    Reviewed-by: Kirill Tkhai
    Acked-by: Michal Hocko
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Michal Koutný
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Do not remain stuck forever if something goes wrong. Using a killable
    lock permits cleanup of stuck tasks and simplifies investigation.

    Link: http://lkml.kernel.org/r/156007493429.3335.14666825072272692455.stgit@buzz
    Signed-off-by: Konstantin Khlebnikov
    Reviewed-by: Roman Gushchin
    Reviewed-by: Cyrill Gorcunov
    Reviewed-by: Kirill Tkhai
    Acked-by: Michal Hocko
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Michal Koutný
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Do not remain stuck forever if something goes wrong. Using a killable
    lock permits cleanup of stuck tasks and simplifies investigation.

    This function is also used for /proc/pid/smaps.

    Link: http://lkml.kernel.org/r/156007493160.3335.14447544314127417266.stgit@buzz
    Signed-off-by: Konstantin Khlebnikov
    Reviewed-by: Roman Gushchin
    Reviewed-by: Cyrill Gorcunov
    Reviewed-by: Kirill Tkhai
    Acked-by: Michal Hocko
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Michal Koutný
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

03 Jul, 2019

1 commit

  • The code hasn't been used since it was added to the tree, and doesn't
    appear to actually be usable.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Acked-by: Michal Hocko
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

15 May, 2019

2 commits

  • This updates each existing invalidation to use the correct mmu notifier
    event that represents what is happening to the CPU page table. See the
    patch which introduced the events for the rationale behind this.

    Link: http://lkml.kernel.org/r/20190326164747.24405-7-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Ralph Campbell
    Reviewed-by: Ira Weiny
    Cc: Christian König
    Cc: Joonas Lahtinen
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Jan Kara
    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Cc: Felix Kuehling
    Cc: Jason Gunthorpe
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: John Hubbard
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • CPU page table updates can happen for many reasons, not only as a result
    of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as
    a result of kernel activities (memory compression, reclaim, migration,
    ...).

    Users of the mmu notifier API track changes to the CPU page table and take
    specific actions for them. However, the current API only provides the range
    of virtual addresses affected by the change, not why the change is happening.

    This patchset does the initial mechanical conversion of all the places that
    call mmu_notifier_range_init to also provide the default MMU_NOTIFY_UNMAP
    event as well as the vma if it is known (most invalidation happens against
    a given vma). Passing down the vma allows the users of mmu notifier to
    inspect the new vma page protection.

    MMU_NOTIFY_UNMAP is always the safe default, as users of mmu notifier
    should assume that every mapping in the range is going away when that event
    happens. A later patch converts the mm call paths to use more appropriate
    events for each call.

    This is done as two patches so that no call site is forgotten, especially
    as it uses the following coccinelle patch:

    %<----------------------------------------------------------------------
    @@
    identifier I1, I2, I3, I4;
    @@
    static inline void mmu_notifier_range_init(struct mmu_notifier_range *I1,
    +enum mmu_notifier_event event,
    +unsigned flags,
    +struct vm_area_struct *vma,
    struct mm_struct *I2, unsigned long I3, unsigned long I4) { ... }

    @@
    @@
    -#define mmu_notifier_range_init(range, mm, start, end)
    +#define mmu_notifier_range_init(range, event, flags, vma, mm, start, end)

    @@
    expression E1, E3, E4;
    identifier I1;
    @@
    <...
    mmu_notifier_range_init(E1,
    +MMU_NOTIFY_UNMAP, 0, I1,
    I1->vm_mm, E3, E4)
    ...>

    @@
    expression E1, E2, E3, E4;
    identifier FN, VMA;
    @@
    FN(..., struct vm_area_struct *VMA, ...) {
    <...
    mmu_notifier_range_init(E1,
    +MMU_NOTIFY_UNMAP, 0, VMA,
    E2, E3, E4)
    ...>
    }

    @@
    expression E1, E2, E3, E4;
    identifier FN, VMA;
    @@
    FN(...) {
    struct vm_area_struct *VMA;
    <...
    mmu_notifier_range_init(E1,
    +MMU_NOTIFY_UNMAP, 0, VMA,
    E2, E3, E4)
    ...>
    }

    @@
    expression E1, E2, E3, E4;
    identifier FN;
    @@
    FN(...) {
    <...
    mmu_notifier_range_init(E1,
    +MMU_NOTIFY_UNMAP, 0, NULL,
    E2, E3, E4)
    ...>
    }
    ---------------------------------------------------------------------->%

    Applied with:
    spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
    spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
    spatch --sp-file mmu-notifier.spatch --dir mm --in-place

    Link: http://lkml.kernel.org/r/20190326164747.24405-6-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Ralph Campbell
    Reviewed-by: Ira Weiny
    Cc: Christian König
    Cc: Joonas Lahtinen
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Jan Kara
    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Cc: Felix Kuehling
    Cc: Jason Gunthorpe
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: John Hubbard
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

20 Apr, 2019

1 commit

  • The core dumping code has always run without holding the mmap_sem for
    writing, despite that being the only way to ensure that the entire vma
    layout will not change from under it. Only using some signal
    serialization on the processes belonging to the mm is not nearly enough.
    This was pointed out earlier. For example in Hugh's post from Jul 2017:

    https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

    "Not strictly relevant here, but a related note: I was very surprised
    to discover, only quite recently, how handle_mm_fault() may be called
    without down_read(mmap_sem) - when core dumping. That seems a
    misguided optimization to me, which would also be nice to correct"

    In particular, because growsdown and growsup can move the
    vm_start/vm_end, the various loops the core dump does around the vma will
    not be consistent if page faults can happen concurrently.

    Pretty much all users calling mmget_not_zero()/get_task_mm() and then
    taking the mmap_sem had the potential to introduce unexpected side
    effects in the core dumping code.

    Adding mmap_sem for writing around the ->core_dump invocation is a
    viable long term fix, but it requires removing all copy-user and page
    faults and replacing them with get_dump_page() for all binary formats,
    which is not suitable as a short term fix.

    For the time being this solution manually covers the places that can
    confuse the core dump either by altering the vma layout or the vma flags
    while it runs. Once ->core_dump runs under mmap_sem for writing the
    function mmget_still_valid() can be dropped.

    Allowing mmap_sem protected sections to run in parallel with the
    coredump provides some minor parallelism advantage to the swapoff code
    (which seems to be safe enough by never mangling any vma field and can
    keep doing swapins in parallel to the core dumping) and to some other
    corner case.

    In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
    however the side effect of this same race condition in /proc/pid/mem
    should be reproducible since before 2.6.12-rc2 so I couldn't add any
    other "Fixes:" because there's no hash beyond the git genesis commit.

    Because find_extend_vma() is the only location outside of the process
    context that could modify the "mm" structures under mmap_sem for
    reading, by adding the mmget_still_valid() check to it, all other cases
    that take the mmap_sem for reading don't need the new check after
    mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
    context also doesn't need the new check, because all tasks under core
    dumping are frozen.

    Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
    Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
    Signed-off-by: Andrea Arcangeli
    Reported-by: Jann Horn
    Suggested-by: Oleg Nesterov
    Acked-by: Peter Xu
    Reviewed-by: Mike Rapoport
    Reviewed-by: Oleg Nesterov
    Reviewed-by: Jann Horn
    Acked-by: Jason Gunthorpe
    Acked-by: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

10 Mar, 2019

1 commit

  • Pull rdma updates from Jason Gunthorpe:
    "This has been a slightly more active cycle than normal with ongoing
    core changes and quite a lot of collected driver updates.

    - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe

    - A new data transfer mode for HFI1 giving higher performance

    - Significant functional and bug fix update to the mlx5
    On-Demand-Paging MR feature

    - A chip hang reset recovery system for hns

    - Change mm->pinned_vm to an atomic64

    - Update bnxt_re to support a new 57500 chip

    - A sane netlink 'rdma link add' method for creating rxe devices and
    fixing the various unregistration race conditions in rxe's
    unregister flow

    - Allow lookup up objects by an ID over netlink

    - Various reworking of the core to driver interface:
    - drivers should not assume umem SGLs are in PAGE_SIZE chunks
    - ucontext is accessed via udata not other means
    - start to make the core code responsible for object memory
    allocation
    - drivers should convert struct device to struct ib_device via a
    helper
    - drivers have more tools to avoid use after unregister problems"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits)
    net/mlx5: ODP support for XRC transport is not enabled by default in FW
    IB/hfi1: Close race condition on user context disable and close
    RDMA/umem: Revert broken 'off by one' fix
    RDMA/umem: minor bug fix in error handling path
    RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
    cxgb4: kfree mhp after the debug print
    IB/rdmavt: Fix concurrency panics in QP post_send and modify to error
    IB/rdmavt: Fix loopback send with invalidate ordering
    IB/iser: Fix dma_nents type definition
    IB/mlx5: Set correct write permissions for implicit ODP MR
    bnxt_re: Clean cq for kernel consumers only
    RDMA/uverbs: Don't do double free of allocated PD
    RDMA: Handle ucontext allocations by IB/core
    RDMA/core: Fix a WARN() message
    bnxt_re: fix the regression due to changes in alloc_pbl
    IB/mlx4: Increase the timeout for CM cache
    IB/core: Abort page fault handler silently during owning process exit
    IB/mlx5: Validate correct PD before prefetch MR
    IB/mlx5: Protect against prefetch of invalid MR
    RDMA/uverbs: Store PR pointer before it is overwritten
    ...

    Linus Torvalds
     

06 Mar, 2019

2 commits

  • Architectures like ppc64 need to do a conditional TLB flush based on
    the old and new values of the pte. Enable that by passing the old pte
    value as an argument.
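
    After the change, an mprotect-style update passes the old value through
    to the commit helper, roughly as in change_pte_range():

    pte_t oldpte, ptent;

    oldpte = ptep_modify_prot_start(vma, addr, pte);
    ptent = pte_modify(oldpte, newprot);
    /* oldpte lets ppc64 decide whether a TLB flush is really needed */
    ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);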

    Link: http://lkml.kernel.org/r/20190116085035.29729-3-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Heiko Carstens
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Mackerras
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • Patch series "NestMMU pte upgrade workaround for mprotect", v5.

    We can upgrade pte access (R -> RW transition) via mprotect. We need to
    make sure we follow the recommended pte update sequence as outlined in
    commit bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to
    handle nest MMU hang") for such updates. This patch series does that.

    This patch (of 5):

    Some architectures may want to call flush_tlb_range from these helpers.

    Link: http://lkml.kernel.org/r/20190116085035.29729-2-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Nicholas Piggin
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     

13 Feb, 2019

1 commit

  • The 'pss_locked' field of smaps_rollup was being calculated incorrectly:
    it accumulated the current pss every time a locked VMA was found. Fix
    that by adding to 'pss_locked' at the same time as 'pss', if the vma
    being walked is locked.

    Link: http://lkml.kernel.org/r/20190203065425.14650-1-sspatil@android.com
    Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
    Signed-off-by: Sandeep Patil
    Acked-by: Vlastimil Babka
    Reviewed-by: Joel Fernandes (Google)
    Cc: Alexey Dobriyan
    Cc: Daniel Colascione
    Cc: [4.14.x, 4.19.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sandeep Patil
     

08 Feb, 2019

1 commit

  • Taking a sleeping lock to _only_ increment a variable is quite the
    overkill, and pretty much all users do this. Furthermore, some drivers
    (i.e. infiniband and scif) that need pinned semantics can go to quite
    some trouble to delay the (un)accounting of pinned pages via a workqueue
    when it is not possible to acquire the lock.

    By making the counter atomic we no longer need to hold the mmap_sem and
    can simplify some code around it for pinned_vm users. The counter is 64-bit
    so that we need not worry about overflows, e.g. from rdma pin counts that
    are controlled from userspace.
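
    A hedged before/after sketch of what this means at a typical pinning
    accounting site:

    /* before: a sleeping lock taken only to bump the counter */
    down_write(&mm->mmap_sem);
    mm->pinned_vm += npages;
    up_write(&mm->mmap_sem);

    /* after: mm->pinned_vm is an atomic64_t */
    atomic64_add(npages, &mm->pinned_vm);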

    Reviewed-by: Ira Weiny
    Reviewed-by: Christoph Lameter
    Reviewed-by: Daniel Jordan
    Reviewed-by: Jan Kara
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Jason Gunthorpe

    Davidlohr Bueso
     

29 Dec, 2018

2 commits

  • Userspace falls short when trying to find out whether a specific memory
    range is eligible for THP. There are usecases that would like to know
    that (see
    http://lkml.kernel.org/r/alpine.DEB.2.21.1809251248450.50347@chino.kir.corp.google.com):
    : This is used to identify heap mappings that should be able to fault thp
    : but do not, and they normally point to a low-on-memory or fragmentation
    : issue.

    The only way to deduce this now is to query for the hg resp. nh flags and
    compare that state with the global setting. Except that there is also
    PR_SET_THP_DISABLE that might change the picture. So the final logic is
    not trivial. Moreover the eligibility of the vma depends on the type of
    VMA as well. In the past we have supported only anonymous memory VMAs
    but things have changed and shmem based vmas are supported as well these
    days and the query logic gets even more complicated because the
    eligibility depends on the mount option and another global configuration
    knob.

    Simplify the current state and report the THP eligibility in
    /proc/<pid>/smaps for each existing vma. Reuse
    transparent_hugepage_enabled for this purpose. The original
    implementation of this function assumes that the caller knows that the vma
    itself is supported for THP, so make the core checks into
    __transparent_hugepage_enabled and use it for existing callers.
    __show_smap just uses the new transparent_hugepage_enabled, which also
    checks the vma support status (please note that this one has to be out of
    line due to include dependency issues).
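
    In show_smap() this boils down to one extra output line, roughly
    (formatting approximate):

    seq_printf(m, "THPeligible:    %d\n",
               transparent_hugepage_enabled(vma));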

    [mhocko@kernel.org: fix oops with NULL ->f_mapping]
    Link: http://lkml.kernel.org/r/20181224185106.GC16738@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/20181211143641.3503-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Dan Williams
    Cc: David Rientjes
    Cc: Jan Kara
    Cc: Mike Rapoport
    Cc: Paul Oppenheimer
    Cc: William Kucharski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • To avoid having to change many call sites every time we want to add a
    parameter, use a structure to group all parameters for the mmu_notifier
    invalidate_range_start/end calls. No functional changes with this patch.

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/20181205053628.3210-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Acked-by: Christian König
    Acked-by: Jan Kara
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Felix Kuehling
    Cc: Ralph Campbell
    Cc: John Hubbard
    From: Jérôme Glisse
    Subject: mm/mmu_notifier: use structure for invalidate_range_start/end calls v3

    fix build warning in migrate.c when CONFIG_MMU_NOTIFIER=n

    Link: http://lkml.kernel.org/r/20181213171330.8489-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

29 Oct, 2018

1 commit

  • Pull XArray conversion from Matthew Wilcox:
    "The XArray provides an improved interface to the radix tree data
    structure, providing locking as part of the API, specifying GFP flags
    at allocation time, eliminating preloading, less re-walking the tree,
    more efficient iterations and not exposing RCU-protected pointers to
    its users.

    This patch set

    1. Introduces the XArray implementation

    2. Converts the pagecache to use it

    3. Converts memremap to use it

    The page cache is the most complex and important user of the radix
    tree, so converting it was most important. Converting the memremap
    code removes the only other user of the multiorder code, which allows
    us to remove the radix tree code that supported it.

    I have 40+ followup patches to convert many other users of the radix
    tree over to the XArray, but I'd like to get this part in first. The
    other conversions haven't been in linux-next and aren't suitable for
    applying yet, but you can see them in the xarray-conv branch if you're
    interested"

    * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
    radix tree: Remove multiorder support
    radix tree test: Convert multiorder tests to XArray
    radix tree tests: Convert item_delete_rcu to XArray
    radix tree tests: Convert item_kill_tree to XArray
    radix tree tests: Move item_insert_order
    radix tree test suite: Remove multiorder benchmarking
    radix tree test suite: Remove __item_insert
    memremap: Convert to XArray
    xarray: Add range store functionality
    xarray: Move multiorder_check to in-kernel tests
    xarray: Move multiorder_shrink to kernel tests
    xarray: Move multiorder account test in-kernel
    radix tree test suite: Convert iteration test to XArray
    radix tree test suite: Convert tag_tagged_items to XArray
    radix tree: Remove radix_tree_clear_tags
    radix tree: Remove radix_tree_maybe_preload_order
    radix tree: Remove split/join code
    radix tree: Remove radix_tree_update_node_t
    page cache: Finish XArray conversion
    dax: Convert page fault handlers to XArray
    ...

    Linus Torvalds
     

27 Oct, 2018

1 commit

  • Leonardo reports an apparent regression in 4.19-rc7:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP PTI
    CPU: 3 PID: 6032 Comm: python Not tainted 4.19.0-041900rc7-lowlatency #201810071631
    Hardware name: LENOVO 80UG/Toronto 4A2, BIOS 0XCN45WW 08/09/2018
    RIP: 0010:smaps_pte_range+0x32d/0x540
    Code: 80 00 00 00 00 74 a9 48 89 de 41 f6 40 52 40 0f 85 04 02 00 00 49 2b 30 48 c1 ee 0c 49 03 b0 98 00 00 00 49 8b 80 a0 00 00 00 8b b8 f0 00 00 00 e8 b7 ef ec ff 48 85 c0 0f 84 71 ff ff ff a8
    RSP: 0018:ffffb0cbc484fb88 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: 0000560ddb9e9000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000560ddb9e9 RDI: 0000000000000001
    RBP: ffffb0cbc484fbc0 R08: ffff94a5a227a578 R09: ffff94a5a227a578
    R10: 0000000000000000 R11: 0000560ddbbe7000 R12: ffffe903098ba728
    R13: ffffb0cbc484fc78 R14: ffffb0cbc484fcf8 R15: ffff94a5a2e9cf48
    FS: 00007f6dfb683740(0000) GS:ffff94a5aaf80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000f0 CR3: 000000011c118001 CR4: 00000000003606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    __walk_page_range+0x3c2/0x6f0
    walk_page_vma+0x42/0x60
    smap_gather_stats+0x79/0xe0
    ? gather_pte_stats+0x320/0x320
    ? gather_hugetlb_stats+0x70/0x70
    show_smaps_rollup+0xcd/0x1c0
    seq_read+0x157/0x400
    __vfs_read+0x3a/0x180
    ? security_file_permission+0x93/0xc0
    ? security_file_permission+0x93/0xc0
    vfs_read+0x8f/0x140
    ksys_read+0x55/0xc0
    __x64_sys_read+0x1a/0x20
    do_syscall_64+0x5a/0x110
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Decoded code matched to local compilation+disassembly points to
    smaps_pte_entry():

    } else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
    && pte_none(*pte))) {
    page = find_get_entry(vma->vm_file->f_mapping,
    linear_page_index(vma, addr));

    Here, vma->vm_file is NULL. mss->check_shmem_swap should be false in that
    case, however for smaps_rollup, smap_gather_stats() can set the flag true
    for one vma and leave it true for subsequent vma's where it should be
    false.

    To fix, reset the check_shmem_swap flag to false. There's also related
    bug which sets mss->swap to shmem_swapped, which in the context of
    smaps_rollup overwrites any value accumulated from previous vma's. Fix
    that as well.

    Note that the report suggests a regression between 4.17.19 and 4.19-rc7,
    which makes the 4.19 series ending with commit 258f669e7e88 ("mm:
    /proc/pid/smaps_rollup: convert to single value seq_file") suspicious.
    But the mss was reused for rollup since 493b0e9d945f ("mm: add
    /proc/pid/smaps_rollup") so let's play it safe with the stable backport.

    Link: http://lkml.kernel.org/r/555fbd1f-4ac9-0b58-dcd4-5dc4380ff7ca@suse.cz
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=201377
    Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
    Signed-off-by: Vlastimil Babka
    Reported-by: Leonardo Soares Müller
    Tested-by: Leonardo Soares Müller
    Cc: Greg Kroah-Hartman
    Cc: Daniel Colascione
    Cc: Alexey Dobriyan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

30 Sep, 2018

1 commit

  • Introduce xarray value entries and tagged pointers to replace radix
    tree exceptional entries. This is a slight change in encoding to allow
    the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
    value entry). It is also a change in emphasis; exceptional entries are
    intimidating and different. As the comment explains, you can choose
    to store values or pointers in the xarray and they are both first-class
    citizens.
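
    The helpers introduced here make the encoding explicit; a small usage
    sketch (the stored value is illustrative):

    void *entry = xa_mk_value(offset);   /* encode an unsigned long as a value entry */

    if (xa_is_value(entry))
        offset = xa_to_value(entry);     /* recover the original value */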

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Josef Bacik

    Matthew Wilcox
     

23 Aug, 2018

2 commits

  • The /proc/pid/smaps_rollup file is currently implemented via the
    m_start/m_next/m_stop seq_file iterators shared with the other maps files,
    which iterate over vmas. However, the rollup file doesn't print anything
    for each vma, it only accumulates the stats.

    There are some issues with the current code as reported in [1] - the
    accumulated stats can get skewed if seq_file start()/stop() op is called
    multiple times, if show() is called multiple times, and after seeks to
    non-zero position.

    Patch [1] fixed those within existing design, but I believe it is
    fundamentally wrong to expose the vma iterators to the seq_file mechanism
    when smaps_rollup shows logically a single set of values for the whole
    address space.

    This patch thus refactors the code to provide a single "value" at offset
    0, with vma iteration to gather the stats done internally. This fixes the
    situations where results are skewed, and simplifies the code, especially
    in show_smap(), at the expense of somewhat less code reuse.

    [1] https://marc.info/?l=linux-mm&m=151927723128134&w=2

    [vbabka@suse.cz: use seq_file infrastructure]
    Link: http://lkml.kernel.org/r/bf4525b0-fd5b-4c4c-2cb3-adee3dd95a48@suse.cz
    Link: http://lkml.kernel.org/r/20180723111933.15443-5-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Reported-by: Daniel Colascione
    Reviewed-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • To prepare for handling /proc/pid/smaps_rollup differently from
    /proc/pid/smaps, factor out from show_smap() the printing of the parts of
    the output that are common to both variants, which is the bulk of the gathered
    memory stats.

    [vbabka@suse.cz: add const, per Alexey]
    Link: http://lkml.kernel.org/r/b45f319f-cd04-337b-37f8-77f99786aa8a@suse.cz
    Link: http://lkml.kernel.org/r/20180723111933.15443-4-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Reviewed-by: Alexey Dobriyan
    Cc: Daniel Colascione
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka