07 Jan, 2021

1 commit

  • Collect the time for each allocation recorded in page owner so that
    allocation "surges" can be measured.

    Record the pid for each allocation recorded in page owner so that the
    source of allocation "surges" can be better identified.

    The above is very useful when doing memory analysis. On a crash for
    example, we can get this information from kdump (or ramdump) and parse it
    to figure out memory allocation problems.

    Please note that on x86_64 this increases the size of struct page_owner
    from 16 bytes to 32.
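
    For reference, the extended structure looks roughly like this (a sketch
    based on the cherry-picked commit, not a verbatim copy):

    ----
    struct page_owner {
        unsigned short order;
        short last_migrate_reason;
        gfp_t gfp_mask;
        depot_stack_handle_t handle;
        depot_stack_handle_t free_handle;
        u64 ts_nsec;    /* allocation timestamp (new) */
        pid_t pid;      /* allocating task (new) */
    };
    ----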

    Vlastimil: this is not functionality intended for production, so unless
    somebody says they need to enable page_owner for debugging and this
    increase prevents them from fitting into available memory, let's not
    complicate things by making this optional.

    [lmark@codeaurora.org: v3]
    Link: https://lkml.kernel.org/r/20201210160357.27779-1-georgi.djakov@linaro.org

    Link: https://lkml.kernel.org/r/20201209125153.10533-1-georgi.djakov@linaro.org
    Signed-off-by: Liam Mark
    Signed-off-by: Georgi Djakov
    Acked-by: Vlastimil Babka
    Acked-by: Joonsoo Kim
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    (cherry picked from commit 9cc7e96aa846f9086431d6c2d33ff9ab42d72b2d)

    Bug: 175129313
    Signed-off-by: Suren Baghdasaryan
    Change-Id: I5e246ea009c7e9e34c1cc608bcd3196fc0e623b4

    Liam Mark
     

24 Oct, 2020

1 commit

  • Pull documentation fixes from Jonathan Corbet:
    "A handful of late-arriving documentation fixes"

    * tag 'docs-5.10-2' of git://git.lwn.net/linux:
    docs: Add two missing entries in vm sysctl index
    docs/vm: trivial fixes to several spelling mistakes
    docs: submitting-patches: describe preserving review/test tags
    Documentation: Chinese translation of Documentation/arm64/hugetlbpage.rst
    Documentation: x86: fix a missing word in x86_64/mm.rst.
    docs: driver-api: remove a duplicated index entry
    docs: lkdtm: Modernize and improve details
    docs: deprecated.rst: Expand str*cpy() replacement notes
    docs/cpu-load: format the example code.

    Linus Torvalds
     

16 Oct, 2020

1 commit

  • Literal blocks with :: markup should be indented, as otherwise
    Sphinx complains:

    Documentation/vm/hmm.rst:363: WARNING: Literal block expected; none found.
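
    A hedged illustration of the fix (content invented for the example): the
    literal block must be indented relative to the `::` marker, i.e.

    ----
    The migration flow is roughly::

        migrate_vma_setup(&args);
        migrate_vma_pages(&args);
    ----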

    Fixes: f7ebd9ed7767 ("mm/doc: add usage description for migrate_vma_*()")
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

15 Oct, 2020

1 commit

  • The :c:type:`foo` markup only works properly with structs before
    Sphinx 3.x.

    On Sphinx 3.x, structs should now be declared using the
    .. c:struct directive and referenced via the :c:struct tag.

    As we now have the automarkup.py extension, which automatically
    converts "struct foo" into a cross-reference, let's get rid of
    :c:type: here, solving several warnings when building the docs
    with Sphinx 3.x.
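
    A short sketch of the markup change (identifier chosen for illustration):

    ----
    Sphinx 2.x reference:    see :c:type:`foo`
    Sphinx 3.x declaration:  .. c:struct:: foo
    Sphinx 3.x reference:    see :c:struct:`foo`
    With automarkup.py:      see struct foo   (plain text, auto-linked)
    ----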

    Reviewed-by: André Almeida # blk-mq.rst
    Reviewed-by: Takashi Iwai # sound
    Reviewed-by: Mike Rapoport
    Reviewed-by: Greg Kroah-Hartman
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

14 Oct, 2020

1 commit

  • In the context of the anonymous address space lifespan description, the
    'mm_users' reference counter is confused with 'mm_count'. That is, a
    "zombie" mm gets released when "mm_count" becomes zero, not "mm_users".
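
    A minimal sketch of the distinction (not part of the patch, which only
    touches documentation):

    ----
    mmput(mm);   /* drops mm_users: users of the address space */
    mmdrop(mm);  /* drops mm_count: pins on struct mm_struct itself;
                  * the final mmdrop() is what frees a "zombie" mm */
    ----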

    Signed-off-by: Alexander Gordeev
    Signed-off-by: Andrew Morton
    Cc: Jonathan Corbet
    Link: https://lkml.kernel.org/r/1597040695-32633-1-git-send-email-agordeev@linux.ibm.com
    Signed-off-by: Linus Torvalds

    Alexander Gordeev
     

17 Sep, 2020

1 commit

  • Usage of the migrate_vma_setup(), migrate_vma_pages(), and
    migrate_vma_finalize() APIs by device drivers is not well documented.
    Add a description of how device drivers are expected to use them.
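
    A hedged sketch of the documented call sequence (driver-side setup and
    error handling abbreviated):

    ----
    struct migrate_vma args = {
        .vma   = vma,
        .start = start,
        .end   = end,
        .src   = src_pfns,   /* caller-provided arrays, one slot per page */
        .dst   = dst_pfns,
    };

    if (migrate_vma_setup(&args))
        return;
    /* allocate destination pages, fill args.dst[] via migrate_pfn() */
    migrate_vma_pages(&args);
    /* copy data for entries still flagged MIGRATE_PFN_MIGRATE */
    migrate_vma_finalize(&args);
    ----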

    Signed-off-by: Ralph Campbell
    Reviewed-by: Alistair Popple
    Link: https://lore.kernel.org/r/20200909212956.20104-1-rcampbell@nvidia.com
    Signed-off-by: Jonathan Corbet

    Ralph Campbell
     

10 Sep, 2020

1 commit

  • Add Sphinx reference links to HMM and CPUSETS, and numerous small
    editorial changes to make the page_migration.rst document more readable.

    Signed-off-by: Ralph Campbell
    Reviewed-by: Randy Dunlap
    Link: https://lore.kernel.org/r/20200902225247.15213-1-rcampbell@nvidia.com
    Signed-off-by: Jonathan Corbet

    Ralph Campbell
     

13 Aug, 2020

1 commit

  • Add the following new vmstat events, which will help in validating THP
    migration without splitting. Statistics reported through these new VM
    events will help in performance debugging.

    1. THP_MIGRATION_SUCCESS
    2. THP_MIGRATION_FAILURE
    3. THP_MIGRATION_SPLIT

    In addition, these new events also update the normal page migration
    statistics appropriately via PGMIGRATE_SUCCESS and PGMIGRATE_FAILURE.
    While here, the current trace event 'mm_migrate_pages' is updated to
    accommodate the now-available THP statistics.
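
    A hedged sketch of the accounting shape (event names as listed above;
    the in-tree enum spellings may differ):

    ----
    count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
    if (nr_thp_succeeded)
        count_vm_events(THP_MIGRATION_SUCCESS, nr_thp_succeeded);
    ----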

    [akpm@linux-foundation.org: s/hpage_nr_pages/thp_nr_pages/]
    [ziy@nvidia.com: v2]
    Link: http://lkml.kernel.org/r/C5E3C65C-8253-4638-9D3C-71A61858BB8B@nvidia.com
    [anshuman.khandual@arm.com: s/thp_nr_pages/hpage_nr_pages/]
    Link: http://lkml.kernel.org/r/1594287583-16568-1-git-send-email-anshuman.khandual@arm.com

    Signed-off-by: Anshuman Khandual
    Signed-off-by: Zi Yan
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Cc: Hugh Dickins
    Cc: Matthew Wilcox
    Cc: Zi Yan
    Cc: John Hubbard
    Cc: Naoya Horiguchi
    Link: http://lkml.kernel.org/r/1594080415-27924-1-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

08 Aug, 2020

6 commits

  • After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
    functions that call memory_present() for each region in memblock.memory:
    sparse_memory_present_with_active_regions() and memblocks_present().

    Moreover, all architectures have a call to either of these functions
    preceding the call to sparse_init(), and in most cases they are called
    one after the other.

    Mark the regions from memblock.memory as present during sparse_init() by
    making sparse_init() call memblocks_present(), make the memblocks_present()
    and memory_present() functions static, and remove the redundant
    sparse_memory_present_with_active_regions() function.

    Also remove the no longer required HAVE_MEMORY_PRESENT configuration
    option.
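
    In effect, sparse_init() now begins with the presence marking itself (a
    sketch of the resulting shape):

    ----
    void __init sparse_init(void)
    {
        memblocks_present();   /* was: every caller ran memory_present() first */
        /* ... */
    }
    ----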

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • There are many instances where vmemmap allocation is switched between
    regular memory and device memory just based on whether an altmap is
    available or not. vmemmap_alloc_block_buf() is used on various platforms
    to allocate vmemmap mappings. Let's also enable it to handle altmap-based
    device memory allocation along with the existing regular memory
    allocations. This will help avoid the altmap-based allocation switch in
    many places. To summarize, there are two different ways to call
    vmemmap_alloc_block_buf():

    vmemmap_alloc_block_buf(size, node, NULL) /* Allocate from system RAM */
    vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */

    This converts altmap_alloc_block_buf() into a static function, drops its
    entry from the header and updates Documentation/vm/memory-model.rst.
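
    The point of the change, as a sketch of a typical caller (assumed shape):

    ----
    /* before: callers switched on altmap availability */
    if (altmap)
        p = altmap_alloc_block_buf(size, altmap);
    else
        p = vmemmap_alloc_block_buf(size, node);

    /* after: one call handles both; a NULL altmap means system RAM */
    p = vmemmap_alloc_block_buf(size, node, altmap);
    ----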

    Suggested-by: Robin Murphy
    Signed-off-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Tested-by: Jia He
    Reviewed-by: Catalin Marinas
    Cc: Jonathan Corbet
    Cc: Will Deacon
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Dan Williams
    Cc: David Hildenbrand
    Cc: Fenghua Yu
    Cc: Hsin-Yi Wang
    Cc: "Kirill A. Shutemov"
    Cc: Mark Rutland
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Michal Hocko
    Cc: Mike Rapoport
    Cc: Palmer Dabbelt
    Cc: Paul Walmsley
    Cc: Pavel Tatashin
    Cc: Steve Capper
    Cc: Tony Luck
    Cc: Yu Zhao
    Link: http://lkml.kernel.org/r/1594004178-8861-3-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     
  • This adds a description file for all arch page table helpers which is in
    sync with the semantics being tested via CONFIG_DEBUG_VM_PGTABLE. Any
    future change, either to these descriptions or to the debug test, should
    keep the two in sync.

    [anshuman.khandual@arm.com: fold in Mike's patch for the rst document, fix typos in the rst document]
    Link: http://lkml.kernel.org/r/1594610587-4172-5-git-send-email-anshuman.khandual@arm.com

    Suggested-by: Mike Rapoport
    Signed-off-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Acked-by: Mike Rapoport
    Cc: Jonathan Corbet
    Cc: Mike Rapoport
    Cc: Vineet Gupta
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Kirill A. Shutemov
    Cc: Paul Walmsley
    Cc: Palmer Dabbelt
    Cc: Zi Yan
    Link: http://lkml.kernel.org/r/1593996516-7186-5-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     
  • SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can
    be read to check if the respective debugging options are enabled for a
    given cache. Some options, namely sanity_checks, trace, and failslab, can
    also be enabled and disabled at runtime by writing into the files.

    The runtime toggling is racy. Some options disable __CMPXCHG_DOUBLE when
    enabled, which means that in the case of concurrent allocations, some can
    still use __CMPXCHG_DOUBLE and some not, leading to potential corruption.
    The s->flags field is also not updated or checked atomically. The
    simplest solution is to remove the runtime toggling. The extended
    slub_debug boot parameter syntax introduced by an earlier patch allows
    fine-tuning the debugging configuration during boot with the same
    granularity.

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Acked-by: Roman Gushchin
    Cc: Christoph Lameter
    Cc: Jann Horn
    Cc: Vijayanand Jitta
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200610163135.17364-5-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can
    be read to check if the respective debugging options are enabled for a
    given cache. The options can also be toggled at runtime by writing into
    the files. Some of those, namely red_zone, poison, and store_user, can be
    toggled only when no objects yet exist in the cache.

    Vijayanand reports [1] that there is a problem with freelist randomization
    if changing the debugging option's state results in a different number of
    objects per page, and the random sequence cache thus needs to be
    recomputed.

    However, another problem is that the check for "no objects yet exist in
    the cache" is racy, as noted by Jann [2], and fixing that would add
    overhead or otherwise complicate the allocation/freeing paths. Thus it
    would be much simpler just to remove the runtime toggling support. The
    documentation describes it as "In case you forgot to enable debugging on
    the kernel command line", but the necessity of having no objects limits
    its usefulness for many caches anyway.

    Vijayanand describes a use case [3] where debugging is enabled for all
    but the zram caches for memory overhead reasons, and using the runtime
    toggles was the only way to achieve such a configuration. After the
    previous patch it's now possible to do that directly from the kernel boot
    option, so we can remove the dangerous runtime toggles by making the /sys
    attribute files read-only.

    While updating it, also improve the documentation of the debugging /sys files.

    [1] https://lkml.kernel.org/r/1580379523-32272-1-git-send-email-vjitta@codeaurora.org
    [2] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com
    [3] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org

    Reported-by: Vijayanand Jitta
    Reported-by: Jann Horn
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Acked-by: Roman Gushchin
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200610163135.17364-3-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Patch series "slub_debug fixes and improvements".

    The slub_debug kernel boot parameter can either apply a single set of
    options to all caches or a list of caches. There is a use case where
    debugging is applied for all caches and then disabled at runtime for
    specific caches, for performance and memory consumption reasons [1]. As
    runtime changes are dangerous, extend the boot parameter syntax so that
    multiple blocks of either global or slab-specific options can be
    specified, with blocks delimited by ';'. This will also support the use
    case of [1] without runtime changes.

    For details see the updated Documentation/vm/slub.rst
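
    For example (cache names illustrative), a single boot option can now
    enable sanity checks and red zoning everywhere except two caches:

    ----
    slub_debug=FZ;-,zs_handle,zspage
    ----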

    [1] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org

    [weiyongjun1@huawei.com: make parse_slub_debug_flags() static]
    Link: http://lkml.kernel.org/r/20200702150522.4940-1-weiyongjun1@huawei.com

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Jann Horn
    Cc: Roman Gushchin
    Cc: Vijayanand Jitta
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200610163135.17364-2-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

11 Jun, 2020

1 commit

  • Pull more documentation updates from Jonathan Corbet:
    "A handful of late-arriving docs fixes, along with a patch changing a
    lot of HTTP links to HTTPS that had to be yanked and redone before the
    first pull"

    * tag 'docs-5.8-2' of git://git.lwn.net/linux:
    docs/memory-barriers.txt/kokr: smp_mb__{before,after}_atomic(): update Documentation
    Documentation: devres: add missing entry for devm_platform_get_and_ioremap_resource()
    Replace HTTP links with HTTPS ones: documentation
    docs: it_IT: address invalid reference warnings
    doc: zh_CN: use doc reference to resolve undefined label warning
    docs: Update the location of the LF NDA program
    docs: dev-tools: coccinelle: underlines

    Linus Torvalds
     

10 Jun, 2020

2 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Convert comments that reference old mmap_sem APIs to reference
    corresponding new mmap locking APIs instead.
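
    For instance (comment text invented for illustration), a comment such as
    "must be called with down_read(&mm->mmap_sem)" becomes "must be called
    with mmap_read_lock(mm)".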

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

08 Jun, 2020

1 commit

  • Rationale:
    Reduces the attack surface for MITM against kernel devs opening the
    links, as HTTPS traffic is much harder to manipulate.

    Deterministic algorithm:
    For each file:
      For each line:
        If doesn't contain `\bxmlns\b`:
          For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

    Signed-off-by: Alexander A. Klimov
    Link: https://lore.kernel.org/r/20200526060544.25127-1-grandmaster@al2klimov.de
    Signed-off-by: Jonathan Corbet

    Alexander A. Klimov
     

04 Jun, 2020

2 commits

  • To see a sorted result from page_owner, there must be a tiresome
    preprocessing step before running page_owner_sort. This patch simply
    filters out lines which start with "PFN" while reading the page owner
    report.
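
    A hedged sketch of the filter (helper name invented):

    ----
    #include <string.h>

    /* skip the "PFN ..." preamble lines, keep the stack traces */
    static int is_pfn_line(const char *line)
    {
        return strncmp(line, "PFN", 3) == 0;
    }
    ----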

    Signed-off-by: Changhee Han
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Vlastimil Babka
    Cc: Joonsoo Kim
    Cc: Jonathan Corbet
    Link: http://lkml.kernel.org/r/20200429052940.16968-1-ch0.han@lge.com
    Signed-off-by: Linus Torvalds

    Changhee Han
     
  • To reflect the updates to the free_area_init() family of functions.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Tested-by: Hoan Tran [arm64]
    Cc: Baoquan He
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200412194859.12663-22-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

03 Jun, 2020

3 commits

  • Pull hmm updates from Jason Gunthorpe:
    "This series adds a selftest for hmm_range_fault() and several of the
    DEVICE_PRIVATE migration related actions, and another simplification
    for hmm_range_fault()'s API.

    - Simplify hmm_range_fault() with a simpler return code, no
    HMM_PFN_SPECIAL, and no customizable output PFN format

    - Add a selftest for hmm_range_fault() and DEVICE_PRIVATE related
    functionality"

    * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
    MAINTAINERS: add HMM selftests
    mm/hmm/test: add selftests for HMM
    mm/hmm/test: add selftest driver for HMM
    mm/hmm: remove the customizable pfn format from hmm_range_fault
    mm/hmm: remove HMM_PFN_SPECIAL
    drm/amdgpu: remove dead code after hmm_range_fault()
    mm/hmm: make hmm_range_fault return 0 or -1

    Linus Torvalds
     
  • Merge updates from Andrew Morton:
    "A few little subsystems and a start of a lot of MM patches.

    Subsystems affected by this patch series: squashfs, ocfs2, parisc,
    vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
    swap, memcg, pagemap, memory-failure, vmalloc, kasan"

    * emailed patches from Andrew Morton : (128 commits)
    kasan: move kasan_report() into report.c
    mm/mm_init.c: report kasan-tag information stored in page->flags
    ubsan: entirely disable alignment checks under UBSAN_TRAP
    kasan: fix clang compilation warning due to stack protector
    x86/mm: remove vmalloc faulting
    mm: remove vmalloc_sync_(un)mappings()
    x86/mm/32: implement arch_sync_kernel_mappings()
    x86/mm/64: implement arch_sync_kernel_mappings()
    mm/ioremap: track which page-table levels were modified
    mm/vmalloc: track which page-table levels were modified
    mm: add functions to track page directory modifications
    s390: use __vmalloc_node in stack_alloc
    powerpc: use __vmalloc_node in alloc_vm_stack
    arm64: use __vmalloc_node in arch_alloc_vmap_stack
    mm: remove vmalloc_user_node_flags
    mm: switch the test_vmalloc module to use __vmalloc_node
    mm: remove __vmalloc_node_flags_caller
    mm: remove both instances of __vmalloc_node_flags
    mm: remove the prot argument to __vmalloc_node
    mm: remove the pgprot argument to __vmalloc
    ...

    Linus Torvalds
     
  • "toggle" means to change a boolean thing's state. This operation
    doesn't do that - it sets it to "true".

    Signed-off-by: Andrew Morton
    Acked-by: Rafael Aquini
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

01 Jun, 2020

1 commit

  • Recently, I switched over from a swap file to zramswap.

    When reading the Documentation/vm/zswap.rst file I stumbled over this
    typo: the parameter is called accept_threshold_percent, not
    accept_threhsold_percent, in the /sys/module/zswap/parameters/ directory.

    Fixes: 45190f01dd402 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
    Cc: Vitaly Wool
    Signed-off-by: Sedat Dilek
    Link: https://lore.kernel.org/r/20200601005911.31222-1-sedat.dilek@gmail.com
    Signed-off-by: Jonathan Corbet

    Sedat Dilek
     

11 May, 2020

2 commits

  • Presumably the intent here was that hmm_range_fault() could put the data
    into some HW specific format and thus avoid some work. However, nothing
    actually does that, and it isn't clear how anything actually could do that
    as hmm_range_fault() provides CPU addresses which must be DMA mapped.

    Perhaps there is some special HW that does not need DMA mapping, but we
    don't have any examples of this, and the theoretical performance win of
    avoiding an extra scan over the pfns array doesn't seem worth the
    complexity. Plus, the pfns array needs to be scanned anyhow to sort out
    any DEVICE_PRIVATE pages.

    This version replaces the uint64_t with an unsigned long containing a pfn
    and fixed flags. On input, the flags are filled with the HMM_PFN_REQ_*
    values; on successful output, they are filled with HMM_PFN_* values
    describing the state of the pages.
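
    A hedged sketch of consuming the new format:

    ----
    unsigned long entry = range->hmm_pfns[i];

    if (entry & HMM_PFN_VALID) {
        struct page *page = hmm_pfn_to_page(entry);
        bool writable = entry & HMM_PFN_WRITE;
        /* ... DMA map the page ... */
    }
    ----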

    amdgpu is simple to convert: it doesn't use snapshot and doesn't use
    per-page flags.

    nouveau uses only 16 hmm_pte entries at most (i.e. fits in a few cache
    lines), and it sweeps over its pfns array a couple of times anyhow. It
    also has a nasty call chain before it reaches the dma map and hardware,
    suggesting performance isn't important:

    nouveau_svm_fault():
      args.i.m.method = NVIF_VMM_V0_PFNMAP
      nouveau_range_fault()
        nvif_object_ioctl()
          client->driver->ioctl()
            struct nvif_driver nvif_driver_nvkm:
              .ioctl = nvkm_client_ioctl
            nvkm_ioctl()
              nvkm_ioctl_path()
                nvkm_ioctl_v0[type].func(..)
                  nvkm_ioctl_mthd()
                    nvkm_object_mthd()
                      struct nvkm_object_func nvkm_uvmm:
                        .mthd = nvkm_uvmm_mthd
                      nvkm_uvmm_mthd()
                        nvkm_uvmm_mthd_pfnmap()
                          nvkm_vmm_pfn_map()
                            nvkm_vmm_ptes_get_map()
                              func == gp100_vmm_pgt_pfn
                                struct nvkm_vmm_desc_func gp100_vmm_desc_spt:
                                  .pfn = gp100_vmm_pgt_pfn
                                nvkm_vmm_iter()
                                  REF_PTES == func == gp100_vmm_pgt_pfn()
                                    dma_map_page()

    Link: https://lore.kernel.org/r/5-v2-b4e84f444c7d+24f57-hmm_no_flags_jgg@mellanox.com
    Acked-by: Felix Kuehling
    Tested-by: Ralph Campbell
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • hmm_vma_walk->last is supposed to be updated after every write to the
    pfns, so that it can be returned by hmm_range_fault(). However, this is
    not done consistently. Fortunately nothing checks the return code of
    hmm_range_fault() for anything other than error.

    More importantly, last must be set before returning -EBUSY as it is used
    to prevent reading an output pfn as input flags when the loop restarts.

    For clarity and simplicity, make hmm_range_fault() return 0 or -ERRNO,
    and only set last when returning -EBUSY.
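
    A hedged sketch of the simplified calling convention (locking and
    notifier revalidation omitted):

    ----
    int ret;

    do {
        /* retake locks / revalidate the range here */
        ret = hmm_range_fault(&range);
    } while (ret == -EBUSY);

    if (ret)
        return ret;   /* 0 on success, -ERRNO on failure */
    ----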

    Link: https://lore.kernel.org/r/2-v2-b4e84f444c7d+24f57-hmm_no_flags_jgg@mellanox.com
    Acked-by: Felix Kuehling
    Tested-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

08 Apr, 2020

2 commits

  • The compressed cache for swap pages (zswap) currently needs from 1 to 3
    extra kernel command line parameters in order to make it work: it has to
    be enabled by adding a "zswap.enabled=1" command line parameter and if one
    wants a different compressor or pool allocator than the default lzo / zbud
    combination then these choices also need to be specified on the kernel
    command line in additional parameters.

    Using a different compressor and allocator for zswap is actually pretty
    common, as guides often recommend using the lz4 / z3fold pair instead of
    the default one. In such a case it is also necessary to remember to
    enable the appropriate compression algorithm and pool allocator in the
    kernel config manually.

    Let's avoid the need for adding these kernel command line parameters and
    automatically pull in the dependencies for the selected compressor
    algorithm and pool allocator by adding appropriate default switches to
    Kconfig.

    The default values for these options match what the code was using
    previously as its defaults.
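
    A .config fragment for the common lz4 / z3fold setup might then look
    like this (option names as introduced by this change):

    ----
    CONFIG_ZSWAP_DEFAULT_ON=y
    CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4=y
    CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD=y
    ----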

    Signed-off-by: Maciej S. Szmigiero
    Signed-off-by: Andrew Morton
    Reviewed-by: Vitaly Wool
    Link: http://lkml.kernel.org/r/20200202000112.456103-1-mail@maciej.szmigiero.name
    Signed-off-by: Linus Torvalds

    Maciej S. Szmigiero
     
  • Add documentation for free page reporting. Currently the only consumer
    is virtio-balloon, however it is possible that other drivers might make
    use of this, so it is best to add a bit of documentation explaining at a
    high level how to use the API.
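
    A hedged sketch of the consumer side, mirroring what virtio-balloon does
    (my_report and pr_dev_info are placeholder names; callback body omitted):

    ----
    static int my_report(struct page_reporting_dev_info *prdev,
                         struct scatterlist *sgl, unsigned int nents)
    {
        /* hand the scatterlist of free page ranges to the device here */
        return 0;
    }

    static struct page_reporting_dev_info pr_dev_info = {
        .report = my_report,
    };

    /* err = page_reporting_register(&pr_dev_info);
       ...
       page_reporting_unregister(&pr_dev_info); */
    ----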

    Signed-off-by: Alexander Duyck
    Signed-off-by: Andrew Morton
    Cc: Andrea Arcangeli
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Konrad Rzeszutek Wilk
    Cc: Luiz Capitulino
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michael S. Tsirkin
    Cc: Michal Hocko
    Cc: Nitesh Narayan Lal
    Cc: Oscar Salvador
    Cc: Pankaj Gupta
    Cc: Paolo Bonzini
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Wei Wang
    Cc: Yang Zhang
    Cc: wei qi
    Link: http://lkml.kernel.org/r/20200211224730.29318.43815.stgit@localhost.localdomain
    Signed-off-by: Linus Torvalds

    Alexander Duyck
     

04 Apr, 2020

1 commit

  • Pull SPDX updates from Greg KH:
    "Here are three SPDX patches for 5.7-rc1.

    One fixes up the SPDX tag for a single driver, while the other two go
    through the tree and add SPDX tags for all of the .gitignore files as
    needed.

    Nothing too complex, but you will get a merge conflict with your
    current tree, that should be trivial to handle (one file modified by
    two things, one file deleted.)

    All three of these have been in linux-next for a while, with no
    reported issues other than the merge conflict"

    * tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
    ASoC: MT6660: make spdxcheck.py happy
    .gitignore: add SPDX License Identifier
    .gitignore: remove too obvious comments

    Linus Torvalds
     

28 Mar, 2020

1 commit

  • Now that flags are handled on a fine-grained per-page basis this global
    flag is redundant and has a confusing overlap with the pfn_flags_mask and
    default_flags.

    Normalize the HMM_FAULT_SNAPSHOT behavior into one place. Callers needing
    the SNAPSHOT behavior should set a pfn_flags_mask and default_flags that
    always results in a cleared HMM_PFN_VALID. Then no pages will be faulted,
    and HMM_FAULT_SNAPSHOT is not a special flow that overrides the masking
    mechanism.

    As this is the last flag, also remove the flags argument. If future flags
    are needed they can be part of the struct hmm_range function arguments.
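
    A hedged sketch of a snapshot-style (no-fault) walk after this change:

    ----
    range.default_flags = 0;   /* HMM_PFN_VALID never requested ... */
    range.pfn_flags_mask = 0;  /* ... so no pages are faulted in    */
    ret = hmm_range_fault(&range);
    ----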

    Link: https://lore.kernel.org/r/20200327200021.29372-5-jgg@ziepe.ca
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

01 Feb, 2020

1 commit

  • zswap will always try to shrink the pool when zswap is full. If there is
    high pressure on zswap, it will result in flipping pages in and out of
    the zswap pool without any real benefit, and the overall system
    performance will drop. The previous discussion on this subject [1] ended
    up with a suggestion to implement a sort of hysteresis that refuses to
    take pages into the zswap pool until it has sufficient space, if the
    limit has been hit. This is my take on this.

    Hysteresis is controlled with a sysfs-configurable parameter (namely,
    /sys/module/zswap/parameters/accept_threshold_percent). It specifies the
    threshold at which zswap would start accepting pages again after it
    became full. Setting this parameter to 100 disables the hysteresis and
    sets the zswap behavior to the pre-hysteresis state.
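
    For example, re-enabling page acceptance once pool usage drops below 90%
    of the limit could be configured with (path per the parameter above):

    ----
    echo 90 > /sys/module/zswap/parameters/accept_threshold_percent
    ----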

    [1] https://lkml.org/lkml/2019/11/8/949

    Link: http://lkml.kernel.org/r/20200108200118.15563-1-vitaly.wool@konsulko.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

24 Nov, 2019

1 commit

  • The only two users of this are now converted to use
    mmu_interval_notifier, so delete all the code and update hmm.rst.
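
    A hedged sketch of the replacement API mentioned above (callback and
    variable names illustrative):

    ----
    static bool my_invalidate(struct mmu_interval_notifier *mni,
                              const struct mmu_notifier_range *range,
                              unsigned long cur_seq)
    {
        mmu_interval_set_seq(mni, cur_seq);
        /* invalidate device mappings covering the range here */
        return true;
    }

    static const struct mmu_interval_notifier_ops my_ops = {
        .invalidate = my_invalidate,
    };

    /* ret = mmu_interval_notifier_insert(&mni, mm, start, length, &my_ops); */
    ----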

    Link: https://lore.kernel.org/r/20191112202231.3856-14-jgg@ziepe.ca
    Reviewed-by: Jérôme Glisse
    Tested-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

27 Sep, 2019

1 commit

  • The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
    people, and until recently arm64 used these erroneously/pointlessly for
    other levels of page table.

    To make it incredibly clear that these only apply to the PTE level, and to
    align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them
    to pgtable_pte_page_{ctor,dtor}().

    These changes were generated with the following shell script:

    ----
    git grep -lw 'pgtable_page_.tor' | while read FILE; do
        sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
        sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
    done
    ----

    ... with the documentation re-flowed to remain under 80 columns, and
    whitespace fixed up in macros to keep backslashes aligned.

    There should be no functional change as a result of this patch.

    Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com
    Signed-off-by: Mark Rutland
    Reviewed-by: Mike Rapoport
    Acked-by: Geert Uytterhoeven [m68k]
    Cc: Anshuman Khandual
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Yu Zhao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Rutland