10 Jun, 2020

1 commit

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

04 Feb, 2020

5 commits

  • As described in the comment, the correct order for freeing pages is:

    1) unhook page
    2) TLB invalidate page
    3) free page

    This order equally applies to page directories.

    Currently there are two correct options:

    - use tlb_remove_page(), when all page directories are full pages and
    there are no further constraints placed by things like software
    walkers (HAVE_FAST_GUP).

    - use MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table() when the
    architecture does not do IPI based TLB invalidate and has
    HAVE_FAST_GUP (or software TLB fill).

    This however leaves architectures that don't have page based directories
    but don't need RCU in a bind. For those, provide MMU_GATHER_TABLE_FREE,
    which provides the independent batching for directories without the
    additional RCU freeing.
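
    For illustration, the choice shows up in an architecture's page-directory
    freeing hook. The sketch below is hypothetical (the arch_free_pte_table()
    wrapper is invented for this example); tlb_remove_page(), tlb_remove_table()
    and the MMU_GATHER_* options are the ones named above:

        #include <asm/tlb.h>

        /* Sketch only: keep the required order unhook -> TLB invalidate
         * -> free by routing directory pages through the mmu_gather. */
        static inline void arch_free_pte_table(struct mmu_gather *tlb,
                                               struct page *pte)
        {
        #ifdef CONFIG_MMU_GATHER_TABLE_FREE
                /*
                 * Directory pages get their own batching, independent of
                 * data pages.  With CONFIG_MMU_GATHER_RCU_TABLE_FREE on
                 * top, the free is additionally deferred by RCU for the
                 * sake of software walkers (HAVE_FAST_GUP) on
                 * architectures without IPI-based TLB invalidation.
                 */
                tlb_remove_table(tlb, pte);
        #else
                /* Directories are plain full pages and no software
                 * walkers constrain us: treat them like any other page. */
                tlb_remove_page(tlb, pte);
        #endif
        }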

    Link: http://lkml.kernel.org/r/20200116064531.483522-10-aneesh.kumar@linux.ibm.com
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Aneesh Kumar K.V
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Towards a more consistent naming scheme.

    Link: http://lkml.kernel.org/r/20200116064531.483522-9-aneesh.kumar@linux.ibm.com
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Aneesh Kumar K.V
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Towards a more consistent naming scheme.

    Link: http://lkml.kernel.org/r/20200116064531.483522-8-aneesh.kumar@linux.ibm.com
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Aneesh Kumar K.V
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Towards a more consistent naming scheme.

    [akpm@linux-foundation.org: fix sparc64 Kconfig]
    Link: http://lkml.kernel.org/r/20200116064531.483522-7-aneesh.kumar@linux.ibm.com
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Aneesh Kumar K.V
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Architectures that have hardware walkers of the Linux page table should
    flush the TLB on mmu_gather batch allocation failures and batch flush.
    Some architectures, like POWER, support multiple translation modes (hash
    and radix), and in the case of POWER only the radix translation mode
    needs the above TLBI. For hash translation mode the kernel wants to
    avoid this extra flush, since there are no hardware walkers of the Linux
    page table. With radix translation the hardware also walks the Linux
    page table, so the kernel needs to invalidate the page-walk cache in the
    TLB before page table pages are freed.

    More details in commit d86564a2f085 ("mm/tlb, x86/mm: Support invalidating
    TLB caches for RCU_TABLE_FREE")

    The changes to sparc are to make sure we keep the old behavior, since we
    are now removing HAVE_RCU_TABLE_NO_INVALIDATE. The default for
    tlb_needs_table_invalidate is to always force an invalidate, but sparc
    can avoid the table invalidate. Hence we define tlb_needs_table_invalidate
    to false for the sparc architecture.
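
    Concretely, this boils down to an overridable predicate plus its user in
    the generic gather code; a sketch following the names above (the comments
    note which header each piece would live in, file placement is illustrative):

        /* asm-generic default: the safe choice, always invalidate */
        #ifndef tlb_needs_table_invalidate
        #define tlb_needs_table_invalidate() (true)
        #endif

        /* sparc override (in the arch header, seen instead of the default):
         * no hardware walker of the Linux page table, so the table
         * invalidate can be skipped */
        #define tlb_needs_table_invalidate() (false)

        /* generic user, e.g. before a table page is freed because a
         * gather batch could not be allocated */
        static inline void tlb_table_invalidate(struct mmu_gather *tlb)
        {
                if (tlb_needs_table_invalidate())
                        tlb_flush_mmu_tlbonly(tlb);
        }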

    Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.kumar@linux.ibm.com
    Fixes: a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes")
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Michael Ellerman [powerpc]
    Cc: [4.14+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

25 Sep, 2019

1 commit

  • Patch series "mm: remove quicklist page table caches".

    A while ago Nicholas proposed to remove quicklist page table caches [1].

    I've rebased his patch on the current upstream and switched ia64 and sh
    to use generic versions of PTE allocation.

    [1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com

    This patch (of 3):

    Remove page table allocator "quicklists". These have been around for a
    long time, but have not got much traction in the last decade and are only
    used on ia64 and sh architectures.

    The numbers in the initial commit look interesting but probably don't
    apply anymore. If anybody wants to resurrect this it's in the git
    history, but it's unhelpful to have this code and divergent allocator
    behaviour for minor archs.

    Also, it might be better to instead make more general improvements to the
    page allocator if this is still so slow.

    Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Mike Rapoport
    Cc: Tony Luck
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     

14 Jun, 2019

1 commit

  • A few new fields were added to mmu_gather to make TLB flush smarter for
    huge pages by telling which level of the page table has changed.

    __tlb_reset_range() is used to reset all this page table state to
    unchanged; it is called by the TLB flush for parallel mapping changes to
    the same range under a non-exclusive lock (i.e. read mmap_sem).

    Before commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
    munmap"), the syscalls (e.g. MADV_DONTNEED, MADV_FREE) which may update
    PTEs in parallel did not remove page tables. But the aforementioned
    commit may do munmap() under read mmap_sem and free page tables. This
    may result in a program hang on aarch64, as reported by Jan Stancek. The
    problem can be reproduced by his test program, slightly modified, below.

    ---8<---
    void *map_write_unmap(void *ptr)
    {
            unsigned char *map_address;
            int i, j;

            for (i = 0; i < num_iter; i++) {
                    map_address = mmap(distant_area, (size_t) map_size,
                                       PROT_WRITE | PROT_READ,
                                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                    if (map_address == MAP_FAILED) {
                            perror("mmap");
                            exit(1);
                    }

                    for (j = 0; j < map_size; j++)
                            map_address[j] = 'b';

                    if (munmap(map_address, map_size) == -1) {
                            perror("munmap");
                            exit(1);
                    }
            }

            return NULL;
    }

    void *dummy(void *ptr)
    {
            return NULL;
    }

    int main(void)
    {
            pthread_t thid[2];

            /* hint for mmap in map_write_unmap() */
            distant_area = mmap(0, DISTANT_MMAP_SIZE, PROT_WRITE | PROT_READ,
                                MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
            munmap(distant_area, (size_t)DISTANT_MMAP_SIZE);
            distant_area += DISTANT_MMAP_SIZE / 2;

            while (1) {
                    pthread_create(&thid[0], NULL, map_write_unmap, NULL);
                    pthread_create(&thid[1], NULL, dummy, NULL);

                    pthread_join(thid[0], NULL);
                    pthread_join(thid[1], NULL);
            }
    }
    ---8<---

    The two threads may then interleave like this:

        t1 (munmap)                          t2 (exiting thread)
    downgrade_write(&mm->mmap_sem);
    unmap_region()
      tlb_gather_mmu()
        inc_tlb_flush_pending(tlb->mm);
      free_pgtables()
        tlb->freed_tables = 1
        tlb->cleared_pmds = 1

                                         pthread_exit()
                                         madvise(thread_stack, 8M, MADV_DONTNEED)
                                           zap_page_range()
                                             tlb_gather_mmu()
                                               inc_tlb_flush_pending(tlb->mm);

    tlb_finish_mmu()
      if (mm_tlb_flush_nested(tlb->mm))
        __tlb_reset_range()

    __tlb_reset_range() would reset the freed_tables and cleared_* bits, but
    this may cause inconsistency for munmap(), which does free page tables.
    As a result some architectures, e.g. aarch64, may not flush the TLB
    completely as expected, leaving stale TLB entries behind.

    Use a fullmm flush, since it yields much better performance on aarch64
    and non-fullmm doesn't yield a significant difference on x86.
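
    A sketch of the resulting check in tlb_finish_mmu(), using the helpers
    named above (mm_tlb_flush_nested() and __tlb_reset_range()); the
    surrounding code is condensed for illustration:

        void tlb_finish_mmu(struct mmu_gather *tlb,
                            unsigned long start, unsigned long end)
        {
                /*
                 * Another thread raced with us under the read-locked
                 * mmap_sem: do not shrink the flush range, fall back to
                 * a fullmm flush so freed page tables are always covered.
                 */
                if (mm_tlb_flush_nested(tlb->mm)) {
                        tlb->fullmm = 1;
                        __tlb_reset_range(tlb);
                        tlb->freed_tables = 1;
                }

                tlb_flush_mmu(tlb);
                /* ... remainder of the normal teardown ... */
        }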

    The original proposed fix came from Jan Stancek, who mainly debugged this
    issue; I just wrapped everything up together.

    Jan's testing results:

    v5.2-rc2-24-gbec7550cca10
    -------------------------
               mean    stddev
    real     37.382     2.780
    user      1.420     0.078
    sys      54.658     1.855

    v5.2-rc2-24-gbec7550cca10 + "mm: mmu_gather: remove __tlb_reset_range() for force flush"
    -----------------------------------------------------------------------------------------
               mean    stddev
    real     37.119     2.105
    user      1.548     0.087
    sys      55.698     1.357

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1558322252-113575-1-git-send-email-yang.shi@linux.alibaba.com
    Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
    Signed-off-by: Yang Shi
    Signed-off-by: Jan Stancek
    Reported-by: Jan Stancek
    Tested-by: Jan Stancek
    Suggested-by: Will Deacon
    Tested-by: Will Deacon
    Acked-by: Will Deacon
    Cc: Peter Zijlstra
    Cc: Nick Piggin
    Cc: "Aneesh Kumar K.V"
    Cc: Nadav Amit
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: [4.20+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

03 Apr, 2019

7 commits

  • There are no external users of this API (nor should there be); remove it.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • As the comment notes, it is a potentially dangerous operation. Just
    use tlb_flush_mmu(), which will skip the (double) TLB invalidate if
    it really isn't needed anyway.

    No change in behavior intended.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since all architectures are now using it, it is redundant.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Now that all architectures are converted to the generic code, remove
    the arch hooks.

    No change in behavior intended.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Add the Kconfig option HAVE_MMU_GATHER_NO_GATHER to the generic
    mmu_gather code. If the option is set the mmu_gather will not
    track individual pages for delayed page free anymore. A platform
    that enables the option needs to provide its own implementation
    of the __tlb_remove_page_size() function to free pages.

    No change in behavior intended.
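
    For illustration, a platform selecting the option could simply free the
    page right away instead of batching it. The declaration of
    __tlb_remove_page_size() is the generic one; the body below is a
    hypothetical sketch, not any particular architecture's code:

        /*
         * With HAVE_MMU_GATHER_NO_GATHER the generic mmu_gather keeps no
         * page list, so the architecture supplies this function itself.
         * Returning false means the gather does not need to be flushed
         * and restarted.
         */
        bool __tlb_remove_page_size(struct mmu_gather *tlb,
                                    struct page *page, int page_size)
        {
                free_page_and_swap_cache(page);   /* free immediately */
                return false;
        }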

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: aneesh.kumar@linux.vnet.ibm.com
    Cc: heiko.carstens@de.ibm.com
    Cc: linux@armlinux.org.uk
    Cc: npiggin@gmail.com
    Link: http://lkml.kernel.org/r/20180918125151.31744-2-schwidefsky@de.ibm.com
    Signed-off-by: Ingo Molnar

    Martin Schwidefsky
     
  • Make issuing a TLB invalidate for page-table pages the normal case.

    The reason is twofold:

    - too many invalidates is safer than too few,
    - most architectures use the linux page-tables natively
    and would thus require this.

    Make it an opt-out, instead of an opt-in.

    No change in behavior intended.
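
    The opt-out roughly takes this shape in the generic code (sketch; the
    inverted Kconfig symbol follows the series naming):

        static inline void tlb_table_invalidate(struct mmu_gather *tlb)
        {
        #ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
                /*
                 * Invalidate page-table caches by default; an architecture
                 * now has to opt out of the invalidate rather than opt in.
                 */
                tlb_flush_mmu_tlbonly(tlb);
        #endif
        }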

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Move the mmu_gather::page_size things into the generic code instead of
    PowerPC specific bits.

    No change in behavior intended.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Aneesh Kumar K.V
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

28 Nov, 2018

1 commit

  • Now that call_rcu()'s callback is not invoked until after all
    preempt-disable regions of code have completed (in addition to explicitly
    marked RCU read-side critical sections), call_rcu() can be used in place
    of call_rcu_sched(). This commit therefore makes that change.
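
    In the mmu_gather table-freeing path this amounts to swapping the RCU
    flavor used to defer the batch free; a sketch based on the existing
    batch structure and callback:

        static void tlb_remove_table_rcu(struct rcu_head *head);  /* existing callback */

        static void tlb_table_flush(struct mmu_gather *tlb)
        {
                struct mmu_table_batch **batch = &tlb->batch;

                if (*batch) {
                        /* was: call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu); */
                        call_rcu(&(*batch)->rcu, tlb_remove_table_rcu);
                        *batch = NULL;
                }
        }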

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

07 Sep, 2018

1 commit