11 Jun, 2020

1 commit

  • Switch the function documentation to kerneldoc comments, and add
    WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
    not have ->mm set (or has ->mm set in the case of unuse_mm).

    Also give the functions a kthread_ prefix to better document the use case.

    [hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
    Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
    [sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
    Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au
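
    A minimal sketch of the renamed helper and its new asserts (the body
    that actually adopts the mm is elided; tsk->flags & PF_KTHREAD is the
    standard kernel-thread test):

    /**
     * kthread_use_mm - make the calling kernel thread operate on an
     *                  address space
     * @mm: address space to operate on
     */
    void kthread_use_mm(struct mm_struct *mm)
    {
            /* Callers must be kernel threads without an mm of their own. */
            WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
            WARN_ON_ONCE(current->mm);

            /* ... switch current->mm and active_mm over to @mm ... */
    }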

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Felix Kuehling
    Acked-by: Greg Kroah-Hartman [usb]
    Acked-by: Haren Myneni
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

10 Jun, 2020

1 commit

  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definitions of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
            return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

    These definitions can be shared among 90% of the arches provided that
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version, there is always the
    possibility of overriding the generic version with the usual ifdef magic,
    as sketched below.
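
    A sketch of that override pattern as used by generic headers (the exact
    guard names in the merged include/linux/pgtable.h may differ):

    /* Generic fallback: an architecture that provides its own pmd_index
     * simply defines it (and the matching macro) before this point. */
    #ifndef pmd_index
    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }
    #define pmd_index pmd_index
    #endif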

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
    functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
    in the files that include <linux/mm.h>.

    The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <linux/mm.h>") ; do
            sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

14 Sep, 2018

1 commit

  • Jann Horn points out that the vmacache_flush_all() function is not only
    potentially expensive, it's buggy too. It also happens to be entirely
    unnecessary, because the sequence number overflow case can be avoided by
    simply making the sequence number be 64-bit. That doesn't even grow the
    data structures in question, because the other adjacent fields are
    already 64-bit.

    So simplify the whole thing by just making the sequence number overflow
    case go away entirely, which gets rid of all the complications and makes
    the code faster too. Win-win.

    [ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistic
    also just goes away entirely with this ]
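
    A sketch of the gist, assuming the struct vmacache layout from the
    mm_types headers (field order illustrative; the adjacent 64-bit fields
    and alignment padding are why nothing grows):

    struct vmacache {
            u64 seqnum;     /* widened from u32; wraparound is no longer a
                             * practical concern, so the system-wide flush
                             * path can simply be deleted */
            struct vm_area_struct *vmas[VMACACHE_SIZE];
    };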

    Reported-by: Jann Horn
    Suggested-by: Will Deacon
    Acked-by: Davidlohr Bueso
    Cc: Oleg Nesterov
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

18 Aug, 2018

1 commit

  • When perf profiling a wide variety of workloads, it was found that
    vmacache_find() had a higher than expected cost: up to 0.08% of CPU
    utilization in some cases, rivaling other core VM functions such as
    alloc_pages_vma() with THP enabled and the default mempolicy, and the
    conditionals in __get_vma_policy().

    VMACACHE_HASH() determines which of the four per-task_struct slots
    caches the vma for a particular address. This currently depends on the
    pfn, so pfn 5212 occupies a different vmacache slot than its neighboring
    pfn 5213.

    vmacache_find() iterates through all four of current's vmacache slots
    when looking up an address. Hashing based on pfn, an address has
    ~1/VMACACHE_SIZE chance of being cached in the first vmacache slot, or
    about 25%, *if* the vma is cached.

    This patch hashes an address by its pmd instead of pte to optimize for
    workloads with good spatial locality. This results in a higher
    probability of vmas being cached in the first slot that is checked:
    normally ~70% on the same workloads instead of 25%.
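
    A sketch contrasting the two hash schemes (VMACACHE_SIZE is the
    four-slot cache; the helper names here are illustrative):

    #define VMACACHE_MASK   (VMACACHE_SIZE - 1)

    /* Old: hash by pfn -- adjacent 4KB pages land in different slots. */
    static inline unsigned int vmacache_hash_pfn(unsigned long addr)
    {
            return (addr >> PAGE_SHIFT) & VMACACHE_MASK;
    }

    /* New: hash by pmd -- all addresses within one pmd span (2MB with 4KB
     * pages on x86-64) share a slot, so workloads with spatial locality
     * hit the first slot checked far more often (~70% vs ~25%). */
    static inline unsigned int vmacache_hash_pmd(unsigned long addr)
    {
            return (addr >> PMD_SHIFT) & VMACACHE_MASK;
    }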

    [rientjes@google.com: various updates]
    Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1807231532290.109445@chino.kir.corp.google.com
    Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1807091749150.114630@chino.kir.corp.google.com
    Signed-off-by: David Rientjes
    Reviewed-by: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
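
    In a C source file the identifier is a single comment on the first
    line, for example:

    // SPDX-License-Identifier: GPL-2.0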

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
      lines of source.
    - File already had some variant of a license header in it (even if
      <filename> tag was missing).

    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

02 Mar, 2017

3 commits


08 Oct, 2016

1 commit

  • Current code doesn't count the first FIND operation after a VMA cache
    flush (which happens surprisingly often), artificially increasing the
    cache hit ratio.

    On my regular setup the difference is:

    Before                                  After
    ==========================================================

    * boot, login into KDE

    vmacache_find_calls   446216            vmacache_find_calls   492741
    vmacache_find_hits    277596            vmacache_find_hits    276096
                          ~62.2%                                  ~56.0%

    * rebuild kernel (no changes to code, usual config)

    vmacache_find_calls  1943007            vmacache_find_calls  2083718
    vmacache_find_hits   1246123            vmacache_find_hits   1244146
                          ~64.1%                                  ~59.7%

    * rebuild kernel (full rebuild, usual config)

    vmacache_find_calls 32163155            vmacache_find_calls 33677183
    vmacache_find_hits  27889956            vmacache_find_hits  27877591
                          ~88.2%                                  ~84.3%

    Total: roughly 4-6 percentage points of cache hit ratio misreporting.

    If someone is counting _relative_ cache _miss_ ratio, misreporting is much
    higher.
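
    A sketch of the fix: bump the FIND counter before the validity check,
    so the guaranteed miss right after a flush is counted as a call too
    (the slot scan below is simplified from the vmacache code of that era):

    struct vm_area_struct *vmacache_find(struct mm_struct *mm,
                                         unsigned long addr)
    {
            int i;

            /* Count every lookup, including the first one after a flush,
             * which previously escaped the statistics entirely. */
            count_vm_vmacache_event(VMACACHE_FIND_CALLS);

            if (!vmacache_valid(mm))
                    return NULL;

            for (i = 0; i < VMACACHE_SIZE; i++) {
                    struct vm_area_struct *vma = current->vmacache[i];

                    if (vma && vma->vm_start <= addr && vma->vm_end > addr) {
                            count_vm_vmacache_event(VMACACHE_FIND_HITS);
                            return vma;
                    }
            }
            return NULL;
    }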

    Link: http://lkml.kernel.org/r/20160822225009.GA3934@p183.telecom.by
    Signed-off-by: Alexey Dobriyan
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

06 Nov, 2015

1 commit

  • This function is called in very hot paths and merely does a few loads
    for a validity check. Let's inline it, so that we can save the function
    call overhead.

    (akpm: this is cosmetic - the compiler already inlines vmacache_valid_mm())
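
    A sketch of the function being inlined; this is essentially the whole
    body, which is also why the compiler was inlining it already:

    static inline bool vmacache_valid_mm(struct mm_struct *mm)
    {
            /* The cache is only meaningful if current owns @mm and is not
             * a kernel thread that merely borrowed it. */
            return current->mm == mm && !(current->flags & PF_KTHREAD);
    }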

    Signed-off-by: Davidlohr Bueso
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

14 Dec, 2014

1 commit

  • These flushes deal with sequence number overflows, such as for long-lived
    threads. These are rare, but interesting from a debugging PoV. As such,
    display the number of flushes when vmacache debugging is enabled.
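
    A sketch of the plumbing, following the existing vmacache debug events
    (enum placement illustrative):

    /* include/linux/vm_event_item.h */
    #ifdef CONFIG_DEBUG_VM_VMACACHE
            VMACACHE_FIND_CALLS,
            VMACACHE_FIND_HITS,
            VMACACHE_FULL_FLUSHES,          /* new */
    #endif

    /* mm/vmacache.c, on the seqnum-overflow flush path */
    count_vm_vmacache_event(VMACACHE_FULL_FLUSHES);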

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

05 Jun, 2014

2 commits

  • For single-threaded workloads, we can avoid flushing and iterating
    through the entire list of tasks, making the whole function a lot
    faster; only a single atomic read of the mm_users count is needed, as
    the sketch below shows.
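
    A sketch of the early return, assuming it sits at the top of
    vmacache_flush_all() (the comment paraphrases the reasoning above; the
    task walk is the pre-existing slow path):

    void vmacache_flush_all(struct mm_struct *mm)
    {
            struct task_struct *g, *p;

            /* Single-threaded: no other task can cache vmas of this mm,
             * and current's own cache is invalidated lazily on its next
             * lookup via the bumped seqnum, so nothing to flush here. */
            if (atomic_read(&mm->mm_users) == 1)
                    return;

            rcu_read_lock();
            for_each_process_thread(g, p) {
                    if (mm == p->mm)
                            vmacache_flush(p);
            }
            rcu_read_unlock();
    }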

    Signed-off-by: Davidlohr Bueso
    Suggested-by: Oleg Nesterov
    Cc: Aswin Chandramouleeswaran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Introduce a CONFIG_DEBUG_VM_VMACACHE option to enable counting the cache
    hit rate -- exported in /proc/vmstat.

    Any update to the caching scheme needs this kind of data, so keeping the
    counting in place saves re-implementing it every time.
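
    A sketch of the counting macro, assuming the usual vmstat event
    plumbing (the empty fallback keeps production builds at zero overhead):

    /* include/linux/vmstat.h */
    #ifdef CONFIG_DEBUG_VM_VMACACHE
    #define count_vm_vmacache_event(x) count_vm_event(x)
    #else
    #define count_vm_vmacache_event(x) do {} while (0)
    #endif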

    Signed-off-by: Davidlohr Bueso
    Cc: Aswin Chandramouleeswaran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

29 Apr, 2014

1 commit

  • BUG_ON() is a big hammer, and should be used _only_ if there is some
    major corruption that you cannot possibly recover from, making it
    imperative that the current process (and possibly the whole machine) be
    terminated with extreme prejudice.

    The trivial sanity check in the vmacache code is *not* such a fatal
    error. Recovering from it is absolutely trivial, and using BUG_ON()
    just makes it harder to debug for no actual advantage.

    To make matters worse, the placement of the BUG_ON() (only if the range
    check matched) actually makes it harder to hit the sanity check to begin
    with, so _if_ there is a bug (and we just got a report from Srivatsa
    Bhat that this can indeed trigger), it is harder to debug not just
    because the machine is possibly dead, but because we don't have better
    coverage.

    BUG_ON() must *die*. Maybe we should add a checkpatch warning for it,
    because it is simply just about the worst thing you can ever do if you
    hit some "this cannot happen" situation.

    Reported-by: Srivatsa S. Bhat
    Cc: Davidlohr Bueso
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Apr, 2014

1 commit

  • This patch is a continuation of efforts to optimize find_vma(),
    avoiding potentially expensive rbtree walks to locate a vma upon faults.
    The original approach (https://lkml.org/lkml/2013/11/1/410), where the
    largest vma was also cached, ended up being too specific and random, so
    further comparisons with other approaches were needed. There are two
    things to consider when dealing with this: the cache hit rate and the
    latency of find_vma(). Improving the hit rate does not necessarily
    translate into finding the vma any faster, as the overhead of any fancy
    caching scheme can be too high to consider.

    We currently cache the last used vma for the whole address space, which
    provides a nice optimization, reducing the total cycles in find_vma() by
    up to 250%, for workloads with good locality. On the other hand, this
    simple scheme is pretty much useless for workloads with poor locality.
    Analyzing ebizzy runs shows that, no matter how many threads are
    running, the mmap_cache hit rate is less than 2%, and in many situations
    below 1%.

    The proposed approach is to replace this scheme with a small per-thread
    cache, maximizing hit rates at a very low maintenance cost.
    Invalidations are performed by simply bumping up a 32-bit sequence
    number. The only expensive operation is in the rare case of a sequence
    number overflow, where all caches that share the same address space are
    flushed. Upon a miss, the proposed replacement policy is based on the
    page number that contains the virtual address in question (see the
    sketch after the results below). Concretely, the following results are
    seen on an 80 core, 8 socket x86-64 box:

    1) System bootup: Most programs are single threaded, so the per-thread
    scheme improves on the ~50% baseline hit rate just by adding a few more
    slots to the cache.

    +----------------+----------+------------------+
    | caching scheme | hit-rate | cycles (billion) |
    +----------------+----------+------------------+
    | baseline       | 50.61%   | 19.90            |
    | patched        | 73.45%   | 13.58            |
    +----------------+----------+------------------+

    2) Kernel build: This one is already pretty good with the current
    approach as we're dealing with good locality.

    +----------------+----------+------------------+
    | caching scheme | hit-rate | cycles (billion) |
    +----------------+----------+------------------+
    | baseline       | 75.28%   | 11.03            |
    | patched        | 88.09%   | 9.31             |
    +----------------+----------+------------------+

    3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

    +----------------+----------+------------------+
    | caching scheme | hit-rate | cycles (billion) |
    +----------------+----------+------------------+
    | baseline       | 70.66%   | 17.14            |
    | patched        | 91.15%   | 12.57            |
    +----------------+----------+------------------+

    4) Ebizzy: There's a fair amount of variation from run to run, but this
    approach always shows nearly perfect hit rates, while the baseline's are
    just about non-existent. Cycle counts for the baseline scheme fluctuate
    anywhere from ~60 to ~116 billion, while this approach reduces them
    considerably. For instance, with 80 threads:

    +----------------+----------+------------------+
    | caching scheme | hit-rate | cycles (billion) |
    +----------------+----------+------------------+
    | baseline       | 1.06%    | 91.54            |
    | patched        | 99.97%   | 14.18            |
    +----------------+----------+------------------+
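
    A hedged sketch of the scheme described above (names follow the
    vmacache code; details are illustrative):

    #define VMACACHE_BITS   2
    #define VMACACHE_SIZE   (1U << VMACACHE_BITS)          /* four slots */
    #define VMACACHE_MASK   (VMACACHE_SIZE - 1)
    #define VMACACHE_HASH(addr) (((addr) >> PAGE_SHIFT) & VMACACHE_MASK)

    /* Invalidation is O(1): bump the address space's sequence number. */
    static inline void vmacache_invalidate(struct mm_struct *mm)
    {
            mm->vmacache_seqnum++;

            /* Rare, expensive case: on 32-bit wraparound, every cache
             * sharing this address space must be flushed system-wide. */
            if (unlikely(mm->vmacache_seqnum == 0))
                    vmacache_flush_all(mm);
    }

    /* A thread trusts its slots only while its copy of the sequence
     * number matches the mm's; a mismatch clears the cache. */
    static bool vmacache_valid(struct mm_struct *mm)
    {
            if (current->vmacache_seqnum != mm->vmacache_seqnum) {
                    current->vmacache_seqnum = mm->vmacache_seqnum;
                    memset(current->vmacache, 0, sizeof(current->vmacache));
                    return false;
            }
            return true;
    }

    /* Replacement: the slot is picked from the page number containing the
     * faulting address, i.e. the policy described above. */
    static inline void vmacache_update(unsigned long addr,
                                       struct vm_area_struct *newvma)
    {
            current->vmacache[VMACACHE_HASH(addr)] = newvma;
    }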

    [akpm@linux-foundation.org: fix nommu build, per Davidlohr]
    [akpm@linux-foundation.org: document vmacache_valid() logic]
    [akpm@linux-foundation.org: attempt to untangle header files]
    [akpm@linux-foundation.org: add vmacache_find() BUG_ON]
    [hughd@google.com: add vmacache_valid_mm() (from Oleg)]
    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: adjust and enhance comments]
    Signed-off-by: Davidlohr Bueso
    Reviewed-by: Rik van Riel
    Acked-by: Linus Torvalds
    Reviewed-by: Michel Lespinasse
    Cc: Oleg Nesterov
    Tested-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso