11 May, 2017

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main changes are:

    - Debloat RCU headers

    - Parallelize SRCU callback handling (plus overlapping patches)

    - Improve the performance of Tree SRCU on a CPU-hotplug stress test

    - Documentation updates

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
    rcu: Open-code the rcu_cblist_n_lazy_cbs() function
    rcu: Open-code the rcu_cblist_n_cbs() function
    rcu: Open-code the rcu_cblist_empty() function
    rcu: Separately compile large rcu_segcblist functions
    srcu: Debloat the header
    srcu: Adjust default auto-expediting holdoff
    srcu: Specify auto-expedite holdoff time
    srcu: Expedite first synchronize_srcu() when idle
    srcu: Expedited grace periods with reduced memory contention
    srcu: Make rcutorture writer stalls print SRCU GP state
    srcu: Exact tracking of srcu_data structures containing callbacks
    srcu: Make SRCU be built by default
    srcu: Fix Kconfig botch when SRCU not selected
    rcu: Make non-preemptive schedule be Tasks RCU quiescent state
    srcu: Expedite srcu_schedule_cbs_snp() callback invocation
    srcu: Parallelize callback handling
    kvm: Move srcu_struct fields to end of struct kvm
    rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
    rcu: Use true/false in assignment to bool
    rcu: Use bool value directly
    ...

    Linus Torvalds
     

04 May, 2017

16 commits

  • The memory controller's stat function names are awkwardly long and
    arbitrarily different from the zone and node stat functions.

    The current interface is named:

    mem_cgroup_read_stat()
    mem_cgroup_update_stat()
    mem_cgroup_inc_stat()
    mem_cgroup_dec_stat()
    mem_cgroup_update_page_stat()
    mem_cgroup_inc_page_stat()
    mem_cgroup_dec_page_stat()

    This patch renames them to match the corresponding node stat functions
    (a call-site sketch follows the list):

    memcg_page_state() [node_page_state()]
    mod_memcg_state() [mod_node_state()]
    inc_memcg_state() [inc_node_state()]
    dec_memcg_state() [dec_node_state()]
    mod_memcg_page_state() [mod_node_page_state()]
    inc_memcg_page_state() [inc_node_page_state()]
    dec_memcg_page_state() [dec_node_page_state()]
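
    As a hypothetical call-site illustration (the stat item shown is only an
    example, not necessarily one touched by this patch):

        /* before the rename */
        mem_cgroup_inc_page_stat(page, NR_FILE_MAPPED);

        /* after, mirroring inc_node_page_state() */
        inc_memcg_page_state(page, NR_FILE_MAPPED);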

    Link: http://lkml.kernel.org/r/20170404220148.28338-4-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Acked-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The current duplication is a high-maintenance mess, and it's painful to
    add new items or query memcg state from the rest of the VM.

    This increases the size of the stat array marginally, but we should aim
    to track all these stats on a per-cgroup level anyway.

    Link: http://lkml.kernel.org/r/20170404220148.28338-3-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Acked-by: Vladimir Davydov
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • There is no user for it. Remove it.

    [minchan@kernel.org: use false instead of SWAP_FAIL]
    Link: http://lkml.kernel.org/r/20170316053313.GA19241@bbox
    Link: http://lkml.kernel.org/r/1489555493-14659-11-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Anshuman Khandual
    Cc: Hillf Danton
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Naoya Horiguchi
    Cc: Vlastimil Babka
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • rmap_one's return value controls whether rmap_walk() should continue to
    scan other PTEs, so it is a natural target for conversion to boolean:
    return true if the scan should continue, or false to stop it.

    This patch makes rmap_one's return value boolean; a sketch of a
    conforming callback follows.
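
    A sketch of what a callback now looks like (hypothetical callback,
    simplified):

        static bool count_mappings_one(struct page *page,
                                       struct vm_area_struct *vma,
                                       unsigned long address, void *arg)
        {
                int *left = arg;

                if (--(*left) <= 0)
                        return false;   /* stop rmap_walk() early */
                return true;            /* keep scanning other mappings */
        }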

    Link: http://lkml.kernel.org/r/1489555493-14659-10-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Anshuman Khandual
    Cc: Hillf Danton
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Naoya Horiguchi
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • There is no user of the return value from rmap_walk() and friends so
    this patch makes them void-returning functions.

    Link: http://lkml.kernel.org/r/1489555493-14659-9-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Anshuman Khandual
    Cc: Hillf Danton
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Naoya Horiguchi
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • try_to_unmap() returns SWAP_SUCCESS or SWAP_FAIL, so it is suitable for
    a boolean return value. This patch changes it accordingly.
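
    A hedged, excerpt-style sketch of the caller-side effect (modeled on the
    shrink_page_list() call site):

        /* before: switch (try_to_unmap(page, flags)) { case SWAP_FAIL: ... } */
        if (!try_to_unmap(page, flags))
                goto activate_locked;   /* unmap failed; keep the page */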

    Link: http://lkml.kernel.org/r/1489555493-14659-8-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Anshuman Khandual
    Cc: Hillf Danton
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • In 2002, [1] introduced SWAP_AGAIN. At that time, try_to_unmap_one()
    used spin_trylock(&mm->page_table_lock), so contention made it easy to
    fail to take the lock, and returning SWAP_AGAIN to preserve the page's
    LRU status made sense.

    However, we have since moved to a mutex-based lock that can block rather
    than skip the PTE, so only a tiny window remains in which SWAP_AGAIN
    could be returned. Remove SWAP_AGAIN and just return SWAP_FAIL.

    [1] c48c43e, minimal rmap

    Link: http://lkml.kernel.org/r/1489555493-14659-7-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Anshuman Khandual
    Cc: Hillf Danton
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Naoya Horiguchi
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • ttu (try_to_unmap) doesn't need to return SWAP_MLOCK. Instead, just
    return SWAP_FAIL, because it means the page is not swappable and should
    move to another LRU list (active or unevictable). The putback functions
    will move it to the right list depending on the page's LRU flags.

    Link: http://lkml.kernel.org/r/1489555493-14659-6-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Anshuman Khandual
    Cc: Hillf Danton
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Naoya Horiguchi
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • try_to_munlock() returns SWAP_MLOCK if one of the VMAs mapping the page
    has the VM_LOCKED flag set. At that point, the VM sets PG_mlocked on
    the page, unless the page is a PTE-mapped THP, which cannot be mlocked.

    With that, __munlock_isolated_page() can use PageMlocked() to check
    whether try_to_munlock() succeeded, without relying on its return value
    (see the sketch below). This helps keep try_to_unmap()/try_to_unmap_one()
    simple for the upcoming patches.
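
    A sketch of the resulting check (simplified):

        static void __munlock_isolated_page(struct page *page)
        {
                /*
                 * If the page was mapped just once, that is the only VMA
                 * that could have mlocked it; no rmap walk is needed.
                 */
                if (page_mapcount(page) > 1)
                        try_to_munlock(page);

                /* try_to_munlock() re-set PG_mlocked if a VM_LOCKED VMA remains */
                if (!PageMlocked(page))
                        count_vm_event(UNEVICTABLE_PGMUNLOCKED);

                putback_lru_page(page);
        }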

    [minchan@kernel.org: remove PG_Mlocked VM_BUG_ON check]
    Link: http://lkml.kernel.org/r/20170411025615.GA6545@bbox
    Link: http://lkml.kernel.org/r/1489555493-14659-5-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Acked-by: Kirill A. Shutemov
    Acked-by: Vlastimil Babka
    Cc: Anshuman Khandual
    Cc: Hillf Danton
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Naoya Horiguchi
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • If the page is mapped and rescued in try_to_unmap_one(), its
    page_mapcount() cannot be zero, so the page_mapcount() check in
    try_to_unmap() is enough to decide on SWAP_SUCCESS. IOW, the SWAP_MLOCK
    check is redundant, so remove it.

    Link: http://lkml.kernel.org/r/1489555493-14659-4-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Anshuman Khandual
    Cc: Hillf Danton
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Naoya Horiguchi
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • If we find that a lazyfree page is dirty, try_to_unmap_one() can simply
    SetPageSwapBacked() right there, as for a PG_mlocked page, and return
    SWAP_FAIL. That is quite natural, because the page is not swappable
    right now, and this way vmscan can activate it. There is no point in
    introducing a new SWAP_DIRTY return value in try_to_unmap() at the
    moment.
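
    A sketch of the intended behavior inside try_to_unmap_one() (simplified;
    surrounding locking omitted):

        if (PageAnon(page) && !PageSwapBacked(page) && PageDirty(page)) {
                /*
                 * Lazyfree page dirtied after MADV_FREE: not swappable
                 * right now, so put it back on the anonymous LRU and
                 * fail the unmap so vmscan can activate it.
                 */
                SetPageSwapBacked(page);
                return SWAP_FAIL;
        }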

    Link: http://lkml.kernel.org/r/1489555493-14659-3-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Acked-by: Hillf Danton
    Acked-by: Kirill A. Shutemov
    Cc: Anshuman Khandual
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Naoya Horiguchi
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Nobody uses the ret variable. Remove it.

    Link: http://lkml.kernel.org/r/1489555493-14659-2-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Acked-by: Hillf Danton
    Acked-by: Kirill A. Shutemov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Kirill A. Shutemov
    Cc: Anshuman Khandual
    Cc: Vlastimil Babka
    Cc: Naoya Horiguchi

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • If a page is swapbacked, it should be in the swap cache on
    try_to_unmap_one's path.

    If a page is !swapbacked, it should not be in the swap cache on that
    path.

    Check both cases at once and, if the check fails, warn and return
    SWAP_FAIL. Such a bug never means we should shut down the kernel.
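
    A sketch of the combined check (the bracketed note below explains why
    the warning is not used as the if condition itself):

        if (unlikely(PageSwapBacked(page) != PageSwapCache(page))) {
                WARN_ON_ONCE(1);
                return SWAP_FAIL;   /* warn, but never bring the kernel down */
        }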

    [minchan@kernel.org: do not use VM_WARN_ON_ONCE as if condition]
    Link: http://lkml.kernel.org/r/20170309060226.GB854@bbox
    Link: http://lkml.kernel.org/r/20170307055551.GC29458@bbox
    Signed-off-by: Minchan Kim
    Suggested-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Cc: Shaohua Li
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • When memory pressure is high, we free MADV_FREE pages. If the pages are
    not dirty in the PTE, they can be freed immediately. Otherwise we can't
    reclaim them as-is, so we put the pages back on the anonymous LRU list
    (by setting the SwapBacked flag), and they will be reclaimed via the
    normal swapout path.

    We use the normal page reclaim policy. Since MADV_FREE pages are put on
    the inactive file list, such pages and inactive file pages are reclaimed
    according to their age. This is expected, because we don't want to
    reclaim too many MADV_FREE pages ahead of used-once file pages.

    Based on Minchan's original patch

    [minchan@kernel.org: clean up lazyfree page handling]
    Link: http://lkml.kernel.org/r/20170303025237.GB3503@bbox
    Link: http://lkml.kernel.org/r/14b8eb1d3f6bf6cc492833f183ac8c304e560484.1487965799.git.shli@fb.com
    Signed-off-by: Shaohua Li
    Signed-off-by: Minchan Kim
    Acked-by: Minchan Kim
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Acked-by: Hillf Danton
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • There are a few places the code assumes anonymous pages should have
    SwapBacked flag set. MADV_FREE pages are anonymous pages but we are
    going to add them to LRU_INACTIVE_FILE list and clear SwapBacked flag
    for them. The assumption doesn't hold any more, so fix them.

    Link: http://lkml.kernel.org/r/3945232c0df3dd6c4ef001976f35a95f18dcb407.1487965799.git.shli@fb.com
    Signed-off-by: Shaohua Li
    Acked-by: Johannes Weiner
    Acked-by: Hillf Danton
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • Patch series "mm: fix some MADV_FREE issues", v5.

    We are trying to use MADV_FREE in jemalloc, and several issues were
    found. Without solving them, jemalloc can't use the MADV_FREE feature.

    - It doesn't support systems without swap enabled: if swap is off, we
    can't (or can't efficiently) age anonymous pages, and since MADV_FREE
    pages are mixed with other anonymous pages, we can't reclaim them. In
    the current implementation, MADV_FREE falls back to MADV_DONTNEED when
    swap is not enabled. But in our environment a lot of machines don't
    enable swap, which prevents our setup from using MADV_FREE.

    - It increases memory pressure. Page reclaim is biased toward
    reclaiming file pages over anonymous pages, which doesn't make sense
    for MADV_FREE pages: those pages can be freed easily and refilled with
    only a slight penalty. Even if page reclaim weren't biased toward file
    pages, there would still be an issue, because MADV_FREE pages and other
    anonymous pages are mixed together; to reclaim a MADV_FREE page, we
    probably have to scan a lot of other anonymous pages, which is
    inefficient. In our tests, we usually see OOMs with MADV_FREE enabled
    and none without it.

    - Accounting. There are two accounting problems. We don't have global
    accounting: if the system misbehaves, we don't know whether the problem
    comes from the MADV_FREE side. The other problem is RSS accounting:
    MADV_FREE pages are accounted as normal anon pages and reclaimed
    lazily, so the application's RSS grows. This confuses our workloads; we
    have a monitoring daemon running, and if it finds an application's RSS
    becoming abnormal, it kills the application, even though the kernel can
    reclaim the memory easily.

    To address the first two issues, we can either put MADV_FREE pages into
    a separate LRU list (Minchan's previous patches and the V1 patches) or
    put them into the LRU_INACTIVE_FILE list (suggested by Johannes). This
    patchset uses the second idea. The reason is that the LRU_INACTIVE_FILE
    list is tiny nowadays and should be full of used-once file pages, so we
    can still reclaim MADV_FREE pages there efficiently, without interfering
    with other anon and active file pages. Putting the pages on the inactive
    file list also has the advantage of letting page reclaim prioritize
    MADV_FREE pages alongside used-once file pages. MADV_FREE pages are put
    on that LRU list with the SwapBacked flag cleared, so
    PageAnon(page) && !PageSwapBacked(page) indicates a MADV_FREE page (see
    the sketch below). Such pages are freed directly, without pageout, if
    they are clean; otherwise normal swap reclaims them.
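
    As a sketch, the lazyfree test this enables (hypothetical helper name):

        static inline bool page_is_lazyfree(struct page *page)
        {
                /* anon page living on the file LRU with SwapBacked cleared */
                return PageAnon(page) && !PageSwapBacked(page);
        }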

    For the third issue, the previous post added global accounting and a
    separate RSS count for MADV_FREE pages. The problem is that we never
    get accurate accounting for MADV_FREE pages: the pages are mapped into
    userspace and can be dirtied without notice from the kernel side. To
    get accurate accounting we could write-protect the page, but then there
    is extra page-fault overhead, which people don't want to pay. The
    jemalloc folks have concerns about the inaccurate accounting, so this
    post drops the accounting patches temporarily. The info exported to
    /proc/pid/smaps for MADV_FREE pages is kept, as that is the only place
    we can get accurate accounting right now.

    This patch (of 6):

    Johannes pointed out that TTU_LZFREE is unnecessary. That's true,
    because we always have the flag set when we want to do an unmap; in the
    cases where we don't unmap, the TTU_LZFREE part of the code should
    never run.

    TTU_UNMAP is also unnecessary: if no other flag (for example,
    TTU_MIGRATION) is set, an unmap is implied.

    The patch includes Johannes's cleanup and the removal of the dead
    TTU_ACTION macro.
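
    A hedged before/after illustration of a call site:

        /* before: the unmap intent had to be spelled out explicitly */
        try_to_unmap(page, TTU_UNMAP | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS);

        /* after: a plain unmap is implied when no mode flag is set */
        try_to_unmap(page, TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS);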

    Link: http://lkml.kernel.org/r/4be3ea1bc56b26fd98a54d0a6f70bec63f6d8980.1487965799.git.shli@fb.com
    Signed-off-by: Shaohua Li
    Suggested-by: Johannes Weiner
    Acked-by: Johannes Weiner
    Acked-by: Minchan Kim
    Acked-by: Hillf Danton
    Acked-by: Michal Hocko
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

19 Apr, 2017

1 commit

  • A group of Linux kernel hackers reported chasing a bug that resulted
    from their assumption that SLAB_DESTROY_BY_RCU provided an existence
    guarantee, that is, that no block from such a slab would be reallocated
    during an RCU read-side critical section. Of course, that is not the
    case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
    slab of blocks.

    However, there is a phrase for this, namely "type safety". This commit
    therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
    to avoid future instances of this sort of confusion.
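
    A sketch of the pattern the new name is meant to convey (lookup()/use()
    and struct foo are hypothetical):

        cache = kmem_cache_create("foo", sizeof(struct foo), 0,
                                  SLAB_TYPESAFE_BY_RCU, NULL);

        rcu_read_lock();
        obj = lookup(key);
        /*
         * The memory cannot change type or be returned to the page
         * allocator while we are in the read-side critical section, but
         * the object CAN be freed and reused as another instance of the
         * same type -- so its identity must be revalidated before use.
         */
        if (obj && obj->key == key)
                use(obj);
        rcu_read_unlock();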

    Signed-off-by: Paul E. McKenney
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    [ paulmck: Add comments mentioning the old name, as requested by Eric
    Dumazet, in order to help people familiar with the old name find
    the new one. ]
    Acked-by: David Rientjes

    Paul E. McKenney
     

01 Apr, 2017

1 commit

  • Huge pages are accounted as single units in the memcg's "file_mapped"
    counter. Account the correct number of base pages, like we do in the
    corresponding node counter.
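
    A hedged sketch of the shape of the fix (shown with the renamed helpers
    from the May series above; the original patch predates the rename):

        /* before: one 2MB huge page bumped the counter by one */
        mod_memcg_page_state(page, NR_FILE_MAPPED, 1);

        /* after: account all base pages, matching the node counter */
        mod_memcg_page_state(page, NR_FILE_MAPPED, hpage_nr_pages(page));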

    Link: http://lkml.kernel.org/r/20170322005111.3156-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Reviewed-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: [4.8+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

11 Mar, 2017

1 commit

  • Merge 5-level page table prep from Kirill Shutemov:
    "Here's the relatively low-risk part of the 5-level paging patchset.
    Merging it now will make enabling x86 5-level paging in v4.12 easier.

    The first patch is actually x86-specific: detect 5-level paging
    support. It boils down to a single define.

    The rest of the patchset converts the Linux MMU abstraction from 4- to
    5-level paging.

    Enabling the new abstraction in most cases requires adding a single
    line of code in arch-specific code. The rest is taken care of by
    asm-generic/.

    Changes to mm/ code are mostly mechanical: add support for the new page
    table level -- p4d_t -- where we deal with pud_t now.

    v2:
    - fix build on microblaze (Michal);
    - comment for __ARCH_HAS_5LEVEL_HACK in kasan_populate_zero_shadow();
    - acks from Michal"

    * emailed patches from Kirill A. Shutemov:
    mm: introduce __p4d_alloc()
    mm: convert generic code to 5-level paging
    asm-generic: introduce <asm-generic/pgtable-nop4d.h>
    arch, mm: convert all architectures to use 5level-fixup.h
    asm-generic: introduce __ARCH_USE_5LEVEL_HACK
    asm-generic: introduce 5level-fixup.h
    x86/cpufeature: Add 5-level paging detection

    Linus Torvalds
     

10 Mar, 2017

2 commits

  • The following test case triggers a NULL-pointer dereference in
    try_to_unmap_one():

        #include <fcntl.h>
        #include <stdlib.h>
        #include <unistd.h>
        #include <sys/mman.h>

        int main(int argc, char *argv[])
        {
                int fd;

                system("mount -t tmpfs -o huge=always none /mnt");
                fd = open("/mnt/test", O_CREAT | O_RDWR);
                ftruncate(fd, 2UL << 20);
                mmap(NULL, 2UL << 20, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_FIXED | MAP_LOCKED, fd, 0);
                mmap(NULL, 2UL << 20, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_LOCKED, fd, 0);
                munlockall();
                return 0;
        }

    Apparently, there's a case when we call try_to_unmap() on huge PMDs:
    it's TTU_MUNLOCK.

    Let's handle this case correctly.

    Fixes: c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()")
    Link: http://lkml.kernel.org/r/20170302151159.30592-1-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Convert all non-architecture-specific code to 5-level paging.

    It's mostly mechanical: add handling of one more page table level in
    the places where we now deal with pud_t.
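
    As an illustration, a generic page-table walk gains one lookup between
    pgd and pud (sketch; the *_none()/*_bad() checks a real walk needs are
    omitted):

        pgd_t *pgd = pgd_offset(mm, addr);
        p4d_t *p4d = p4d_offset(pgd, addr);  /* new level; folds away on
                                                4-level architectures */
        pud_t *pud = pud_offset(p4d, addr);  /* previously took pgd directly */
        pmd_t *pmd = pmd_offset(pud, addr);
        pte_t *pte = pte_offset_map(pmd, addr);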

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

02 Mar, 2017

2 commits

  • We are going to split <linux/sched/task.h> out of <linux/sched.h>,
    which will have to be picked up from other headers and a couple of .c
    files.

    Create a trivial placeholder <linux/sched/task.h> file that just maps
    to <linux/sched.h> to make this patch obviously correct and bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder <linux/sched/mm.h> file that just maps to
    <linux/sched.h> to make this patch obviously correct and bisectable.

    The APIs that are going to be moved first are:

    mm_alloc()
    __mmdrop()
    mmdrop()
    mmdrop_async_fn()
    mmdrop_async()
    mmget_not_zero()
    mmput()
    mmput_async()
    get_task_mm()
    mm_access()
    mm_release()

    Include the new header in the files that are going to need it.
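
    A minimal sketch of what such a placeholder header looks like:

        /* include/linux/sched/mm.h -- transitional placeholder */
        #ifndef _LINUX_SCHED_MM_H
        #define _LINUX_SCHED_MM_H

        #include <linux/sched.h>

        #endif /* _LINUX_SCHED_MM_H */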

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

25 Feb, 2017

6 commits

  • All users are gone. Let's drop them.

    Link: http://lkml.kernel.org/r/20170129173858.45174-12-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • For consistency, it is worth converting all page_check_address() calls
    to page_vma_mapped_walk(), so that we can drop the former; the common
    pattern is sketched below.
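
    The target pattern (a sketch; fields simplified):

        struct page_vma_mapped_walk pvmw = {
                .page = page,
                .vma = vma,
                .address = address,
        };

        while (page_vma_mapped_walk(&pvmw)) {
                /*
                 * Each iteration positions pvmw.pte (or pvmw.pmd for a
                 * PMD-mapped THP) on one mapping of the page in this VMA.
                 */
        }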

    Link: http://lkml.kernel.org/r/20170129173858.45174-11-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Acked-by: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • For consistency, it is worth converting all page_check_address() calls
    to page_vma_mapped_walk(), so that we can drop the former.

    It also helps freeze_page(), as we walk through the rmap only once.

    Link: http://lkml.kernel.org/r/20170129173858.45174-8-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • For consistency, it is worth converting all page_check_address() calls
    to page_vma_mapped_walk(), so that we can drop the former.

    The PMD handling here is future-proofing; we don't have users yet. ext4
    with huge pages will be the first.

    Link: http://lkml.kernel.org/r/20170129173858.45174-7-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • The current rmap code can miss a VMA that maps a PTE-mapped THP if the
    first subpage of the THP was unmapped from that VMA.

    We need to walk the rmap for the whole range of offsets that the THP
    covers, not only the first one.

    vma_address() also needs to be corrected to check the range instead of
    only the first subpage.

    Link: http://lkml.kernel.org/r/20170129173858.45174-6-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Acked-by: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • For PTE-mapped THP, page_check_address_transhuge() is not adequate: it
    can find only the first relevant PTE, not all of them. That means we
    can miss some references to the page, which can result in suboptimal
    decisions by vmscan.

    Let's switch it to page_vma_mapped_walk().

    I don't think this is a subject for stable@: it's not fatal. The only
    side effect is that a THP can be swapped out when it shouldn't be.

    Link: http://lkml.kernel.org/r/20170129173858.45174-4-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

13 Dec, 2016

1 commit

  • anon_vma_prepare() is mostly a large "if (unlikely(...))" block, as the
    expected common case is that an anon_vma already exists. We could turn
    the condition around and return 0, but it also makes sense to do it
    inline and avoid a call for the common case.

    Bloat-o-meter naturally shows that inlining the check has some code size
    costs:

    add/remove: 1/1 grow/shrink: 4/0 up/down: 475/-373 (102)
    function               old     new   delta
    __anon_vma_prepare       -     359    +359
    handle_mm_fault       2744    2796     +52
    hugetlb_cow           1146    1170     +24
    hugetlb_fault         2123    2145     +22
    wp_page_copy          1469    1487     +18
    anon_vma_prepare       373       -    -373

    Checking the asm, however, confirms that the hot paths now avoid a
    call, which has been moved out of line; a sketch of the resulting split
    follows.
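
    The resulting split, sketched (consistent with the bloat-o-meter output
    above):

        /* out of line: the rare "no anon_vma yet" slow path */
        int __anon_vma_prepare(struct vm_area_struct *vma);

        static inline int anon_vma_prepare(struct vm_area_struct *vma)
        {
                if (likely(vma->anon_vma))
                        return 0;       /* common case: no call at all */

                return __anon_vma_prepare(vma);
        }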

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20161116074005.22768-1-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: "Kirill A. Shutemov"
    Cc: Johannes Weiner
    Cc: Konstantin Khlebnikov
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

11 Aug, 2016

2 commits

  • In page_remove_file_rmap() we have the following check:

    VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);

    This is meant to check for either HugeTLB pages or THP when a compound
    page is passed in.

    Unfortunately, if one disables CONFIG_TRANSPARENT_HUGEPAGE, then
    PageTransHuge() will always return false, provoking BUGs when one runs
    the libhugetlbfs test suite.

    This patch replaces PageTransHuge() with PageHead(), which works for
    both HugeTLB and THP.
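
    The check then becomes (sketch):

        /*
         * PageHead() covers both HugeTLB and THP head pages and does not
         * depend on CONFIG_TRANSPARENT_HUGEPAGE.
         */
        VM_BUG_ON_PAGE(compound && !PageHead(page), page);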

    Fixes: dd78fedde4b9 ("rmap: support file thp")
    Link: http://lkml.kernel.org/r/1470838217-5889-1-git-send-email-steve.capper@arm.com
    Signed-off-by: Steve Capper
    Acked-by: Kirill A. Shutemov
    Cc: Huang Shijie
    Cc: Will Deacon
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steve Capper
     
  • PageTransCompound() doesn't distinguish THP from any other type of
    compound page. This can lead to a false-positive VM_BUG_ON() in
    page_add_file_rmap() if it is called on a compound page from a
    driver [1].

    I think we can exclude such cases by checking whether the page belongs
    to a mapping.

    The VM_BUG_ON_PAGE() is downgraded to VM_WARN_ON_ONCE(). This path
    should not cause any harm to non-THP pages, but it's good to know if we
    step on anything else.

    [1] http://lkml.kernel.org/r/c711e067-0bff-a6cb-3c37-04dfe77d2db1@redhat.com

    Link: http://lkml.kernel.org/r/20160810161345.GA67522@black.fi.intel.com
    Signed-off-by: Kirill A. Shutemov
    Reported-by: Laura Abbott
    Tested-by: Laura Abbott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

29 Jul, 2016

4 commits

  • There are now a number of accounting oddities, such as mapped file
    pages being accounted for on the node while the total number of file
    pages is accounted on the zone. This can be coped with to some extent,
    but it's confusing, so this patch moves the relevant file-based
    accounting to the node. Due to throttling logic in the page allocator
    for reliable OOM detection, it is still necessary to track dirty and
    writeback pages on a per-zone basis.

    [mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
    Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
    Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Hillf Danton
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • NR_FILE_PAGES is the number of file pages.
    NR_FILE_MAPPED is the number of mapped file pages.
    NR_ANON_PAGES is the number of mapped anon pages.

    This is unhelpful naming as it's easy to confuse NR_FILE_MAPPED and
    NR_ANON_PAGES for mapped pages. This patch renames NR_ANON_PAGES so we
    have

    NR_FILE_PAGES is the number of file pages.
    NR_FILE_MAPPED is the number of mapped file pages.
    NR_ANON_MAPPED is the number of mapped anon pages.

    Link: http://lkml.kernel.org/r/1467970510-21195-19-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Hillf Danton
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Reclaim makes decisions based on the number of pages that are mapped but
    it's mixing node and zone information. Account NR_FILE_MAPPED and
    NR_ANON_PAGES pages on the node.

    Link: http://lkml.kernel.org/r/1467970510-21195-18-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Hillf Danton
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Node-based reclaim requires node-based LRUs and locking. This is a
    preparation patch that just moves the lru_lock to the node so later
    patches are easier to review. It is a mechanical change but note this
    patch makes contention worse because the LRU lock is hotter and direct
    reclaim and kswapd can contend on the same lock even when reclaiming
    from different zones.
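
    A hedged sketch of the accessor this implies (the patch's actual helper
    name may differ):

        static inline spinlock_t *zone_lru_lock(struct zone *zone)
        {
                /* every zone of a node now shares the node's LRU lock */
                return &zone->zone_pgdat->lru_lock;
        }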

    Link: http://lkml.kernel.org/r/1467970510-21195-3-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Cc: Hillf Danton
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

27 Jul, 2016

2 commits

  • Let's add ShmemHugePages and ShmemPmdMapped fields to meminfo and
    smaps. They indicate how many times we allocate and map shmem THP.

    NR_ANON_TRANSPARENT_HUGEPAGES is renamed to NR_ANON_THPS.

    Link: http://lkml.kernel.org/r/1466021202-61880-27-git-send-email-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • As with anon THP, we only mlock file huge pages if we can prove that
    the page is not mapped with PTEs. This way we can avoid an mlock leak
    into a non-mlocked VMA on split.

    We rely on PageDoubleMap(), under lock_page(), to check whether the
    page may be PTE-mapped: PG_double_map is set by page_add_file_rmap()
    when the page is mapped with PTEs.
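
    A rough sketch of the resulting decision at mlock time (simplified from
    a follow-page style of check; not the patch's literal code):

        if (page->mapping && trylock_page(page)) {
                /* a set PG_double_map means PTE mappings may exist */
                if (!PageDoubleMap(page))
                        mlock_vma_page(page);
                unlock_page(page);
        }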

    Link: http://lkml.kernel.org/r/1466021202-61880-21-git-send-email-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov