19 Dec, 2013

2 commits

  • There are a few subtle races between change_protection_range (used by
    mprotect and change_prot_numa) on one side, and NUMA page migration and
    compaction on the other side.

    The basic race is that there is a time window between when the PTE is
    made non-present (PROT_NONE or NUMA) and when the TLB is flushed.

    During that time, a CPU may continue writing to the page.

    This is fine most of the time; however, compaction or the NUMA migration
    code may come in and migrate the page away.

    When that happens, the CPU may continue writing, through the cached
    translation, to what is no longer the current memory location of the
    process.

    This only affects x86, which has a somewhat optimistic pte_accessible.
    All other architectures appear to be safe, and will either always flush,
    or flush whenever there is a valid mapping, even with no permissions
    (SPARC).

    The basic race looks like this:

    CPU A                     CPU B                       CPU C

                                                          load TLB entry
    make entry PTE/PMD_NUMA
                              fault on entry
                                                          read/write old page
                              start migrating page
                              change PTE/PMD to new page
                                                          read/write old page [*]
    flush TLB
                                                          reload TLB from new entry
                                                          read/write new page
                                                          lose data

    [*] the old page may belong to a new user at this point!

    The obvious fix is to flush remote TLB entries, by making pte_accessible
    aware that PROT_NONE and PROT_NUMA memory may still be accessible if
    there is a TLB flush pending for the mm.

    This should fix both NUMA migration and compaction.
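
    A minimal sketch of the shape of the x86 fix (the patch also adds a
    tlb_flush_pending flag to mm_struct; mm_tlb_flush_pending() is assumed
    to read it, set from change_protection_range() before the PTEs are
    modified and cleared after the flush):

        static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
        {
                if (pte_flags(a) & _PAGE_PRESENT)
                        return true;

                /* A PROT_NONE/NUMA pte may still be cached in remote
                 * TLBs while a flush for this mm is pending, so report
                 * it accessible and let callers such as
                 * ptep_clear_flush() do a real flush. */
                if ((pte_flags(a) & (_PAGE_PROTNONE | _PAGE_NUMA)) &&
                                mm_tlb_flush_pending(mm))
                        return true;

                return false;
        }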

    [mgorman@suse.de: fix build]
    Signed-off-by: Rik van Riel
    Signed-off-by: Mel Gorman
    Cc: Alex Thorlton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • On x86, NUMA hinting PMD entries are marked similarly to _PAGE_PROTNONE
    protection and are handled as NUMA hinting faults. The following two
    page table protection bits are what define them:

    _PAGE_NUMA: set    _PAGE_PRESENT: clear

    A PMD is considered present if any of the _PAGE_PRESENT, _PAGE_PROTNONE,
    _PAGE_PSE or _PAGE_NUMA bits are set. If pmdp_invalidate encounters a
    pmd_numa, it clears the present bit, leaving _PAGE_NUMA behind: the
    entry is then considered not present by the CPU but present by
    pmd_present. The existing caller of pmdp_invalidate should handle it,
    but it's an inconsistent state for a PMD. This patch keeps the state
    consistent when calling pmdp_invalidate.
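
    Roughly how the generic pmdp_invalidate() looks with the fix applied
    (pmd_mknonnuma() clears _PAGE_NUMA and restores _PAGE_PRESENT before the
    present bit is dropped again, so no _PAGE_NUMA-but-not-present entry is
    ever written):

        void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
                             pmd_t *pmdp)
        {
                pmd_t entry = *pmdp;

                /* clear _PAGE_NUMA first so the not-present entry below
                 * is a consistent state rather than a stray pmd_numa */
                if (pmd_numa(entry))
                        entry = pmd_mknonnuma(entry);
                set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(entry));
                flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
        }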

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Alex Thorlton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

15 Nov, 2013

2 commits

  • Only trivial cases are left. Let's convert them all together.
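
    The "new page table lock API" here is the split page table lock
    interface; a hedged sketch of the typical conversion for a pmd-level
    caller:

        spinlock_t *ptl;

        /* before: the single per-mm lock */
        spin_lock(&mm->page_table_lock);
        /* ... touch the pmd ... */
        spin_unlock(&mm->page_table_lock);

        /* after: a lock scoped to this pmd */
        ptl = pmd_lock(mm, pmd);
        /* ... touch the pmd ... */
        spin_unlock(ptl);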

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Currently mm->pmd_huge_pte is protected by the page table lock. That
    will not work with split page table locks; we have to have a per-pmd
    pmd_huge_pte for proper access serialization.

    For now, let's just introduce a wrapper to access mm->pmd_huge_pte.
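
    A minimal sketch of such a wrapper; the pmd argument is unused for now,
    but gives a later patch a place to hang per-pmd storage without touching
    every call site:

        #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)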

    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Alex Thorlton
    Cc: Ingo Molnar
    Cc: Naoya Horiguchi
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

12 Sep, 2013

1 commit

  • pgtable-related functions are mostly in pgtable-generic.c, so move the
    remaining functions from memory.c to pgtable-generic.c.

    Signed-off-by: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

20 Jun, 2013

1 commit

  • This will later be used by powerpc THP support. On powerpc we want to
    use the pgtable for storing the hash index values, so instead of adding
    them to the mm_context list, we would like to store them in the second
    half of the pmd.
    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: Andrea Arcangeli
    Reviewed-by: David Gibson
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     

11 Dec, 2012

2 commits

  • If ptep_clear_flush() is called to clear a page table entry that is
    accessible anyway by the CPU, eg. a _PAGE_PROTNONE page table entry,
    there is no need to flush the TLB on remote CPUs.
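
    In the generic version that turns the unconditional flush into one
    guarded by pte_accessible() (which at this point takes only the pte; the
    mm argument arrives later with the Dec 2013 race fix above):

        pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
                               pte_t *ptep)
        {
                struct mm_struct *mm = (vma)->vm_mm;
                pte_t pte;

                pte = ptep_get_and_clear(mm, address, ptep);
                /* no CPU can have a TLB entry for a pte that was never
                 * accessible, so the flush can be skipped entirely */
                if (pte_accessible(pte))
                        flush_tlb_page(vma, address);
                return pte;
        }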

    Signed-off-by: Rik van Riel
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/n/tip-vm3rkzevahelwhejx5uwm8ex@git.kernel.org
    Signed-off-by: Ingo Molnar

    Rik van Riel
     
  • The function ptep_set_access_flags is only ever used to upgrade
    access permissions to a page. That means the only negative side
    effect of not flushing remote TLBs is that other CPUs may incur
    spurious page faults, if they happen to access the same address,
    and still have a PTE with the old permissions cached in their
    TLB.

    Having another CPU occasionally incur a spurious page fault is cheaper
    than always paying the cost of a remote TLB flush, so replace the
    remote TLB flush with a purely local one.

    This should be safe on every architecture that correctly
    implements flush_tlb_fix_spurious_fault() to actually invalidate
    the local TLB entry that caused a page fault, as well as on
    architectures where the hardware invalidates TLB entries that
    cause page faults.

    In the unlikely event that you are hitting what appears to be
    an infinite loop of page faults, and 'git bisect' took you to
    this changeset, your architecture needs to implement
    flush_tlb_fix_spurious_fault to actually flush the TLB entry.
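
    A sketch of the resulting generic helper, with
    flush_tlb_fix_spurious_fault() replacing the old flush_tlb_page() call:

        int ptep_set_access_flags(struct vm_area_struct *vma,
                                  unsigned long address, pte_t *ptep,
                                  pte_t entry, int dirty)
        {
                int changed = !pte_same(*ptep, entry);

                if (changed) {
                        set_pte_at(vma->vm_mm, address, ptep, entry);
                        /* local-only: remote CPUs take a spurious fault
                         * and fix up their own stale TLB entry */
                        flush_tlb_fix_spurious_fault(vma, address);
                }
                return changed;
        }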

    Signed-off-by: Rik van Riel
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Michel Lespinasse
    Cc: Ingo Molnar

    Rik van Riel
     

09 Oct, 2012

2 commits

  • On s390, a valid page table entry must not be changed while it is attached
    to any CPU. So instead of pmd_mknotpresent() and set_pmd_at(), an IDTE
    operation would be necessary there. This patch introduces the
    pmdp_invalidate() function, to allow architecture-specific
    implementations.
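
    A sketch of the hook this creates: a generic definition that an
    architecture (here s390, with IDTE) can replace by defining
    __HAVE_ARCH_PMDP_INVALIDATE:

        /* include/asm-generic/pgtable.h (sketch) */
        #ifndef __HAVE_ARCH_PMDP_INVALIDATE
        extern void pmdp_invalidate(struct vm_area_struct *vma,
                                    unsigned long address, pmd_t *pmdp);
        #endif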

    Signed-off-by: Gerald Schaefer
    Cc: Andrea Arcangeli
    Cc: Andi Kleen
    Cc: Hugh Dickins
    Cc: Hillf Danton
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
     
  • The thp page table pre-allocation code currently assumes that pgtable_t
    is of type "struct page *". This may not be true for all architectures,
    so this patch removes that assumption by replacing the functions
    prepare_pmd_huge_pte() and get_pmd_huge_pte() with two new functions
    whose definitions can be architecture-specific.

    It also removes two VM_BUG_ON checks for page_count() and page_mapcount()
    operating on a pgtable_t. Apart from the VM_BUG_ON removal, there will be
    no functional change introduced by this patch.
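
    A sketch of the generic deposit side, which keeps treating pgtable_t as
    a struct page * but now does so only in code an architecture can opt
    out of:

        void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable)
        {
                assert_spin_locked(&mm->page_table_lock);

                /* FIFO: chain pre-allocated page tables off the first one */
                if (!mm->pmd_huge_pte)
                        INIT_LIST_HEAD(&pgtable->lru);
                else
                        list_add(&pgtable->lru, &mm->pmd_huge_pte->lru);
                mm->pmd_huge_pte = pgtable;
        }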

    Signed-off-by: Gerald Schaefer
    Cc: Andrea Arcangeli
    Cc: Andi Kleen
    Cc: Hugh Dickins
    Cc: Hillf Danton
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
     

26 May, 2012

1 commit

  • The change adds some infrastructure for managing tile pmds more generally,
    using pte_pmd() and pmd_pte() methods to translate pmd values to and
    from ptes, since on TILEPro a pmd is really just a nested structure
    holding a pgd (aka pte). Several existing pmd methods are moved into
    this framework, and a whole raft of additional pmd accessors are defined
    that are used by the transparent hugepage framework.

    The tile PTE now has a "client2" bit. The bit is used to indicate that a
    transparent huge page is in the process of being split into subpages.

    This change also fixes a generic bug where the return value of the
    generic pmdp_splitting_flush() was incorrect.
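
    A hedged sketch of the translation pattern the commit describes: each
    pmd accessor is the pte accessor round-tripped through the
    pte_pmd()/pmd_pte() converters (the exact tile type layout is assumed):

        #define pmd_mkdirty(pmd)    pte_pmd(pte_mkdirty(pmd_pte(pmd)))
        #define pmd_mkyoung(pmd)    pte_pmd(pte_mkyoung(pmd_pte(pmd)))
        #define pmd_mkwrite(pmd)    pte_pmd(pte_mkwrite(pmd_pte(pmd)))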

    Signed-off-by: Chris Metcalf

    Chris Metcalf
     

22 Mar, 2012

1 commit

  • These macros will be used in a later patch, where all usages are expected
    to be optimized away without #ifdef CONFIG_TRANSPARENT_HUGEPAGE. But to
    detect unexpected usages, we convert the existing BUG() to BUILD_BUG().
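
    A sketch of the idea: the stub costs nothing when every use is
    optimized away, but any surviving reference fails the build instead of
    trapping at run time:

        /* without CONFIG_TRANSPARENT_HUGEPAGE all users should be
         * eliminated as dead code; if one is not, fail the build */
        #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
        #define HPAGE_PMD_SIZE  ({ BUILD_BUG(); 0; })
        #define HPAGE_PMD_MASK  ({ BUILD_BUG(); 0; })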

    [akpm@linux-foundation.org: fix build in mm/pgtable-generic.c]
    Signed-off-by: Naoya Horiguchi
    Acked-by: Hillf Danton
    Reviewed-by: Andrea Arcangeli
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: David Rientjes
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

26 Jan, 2011

1 commit

  • mips (and sparc32):

    In file included from arch/mips/include/asm/tlb.h:21,
    from mm/pgtable-generic.c:9:
    include/asm-generic/tlb.h: In function `tlb_flush_mmu':
    include/asm-generic/tlb.h:76: error: implicit declaration of function `release_pages'
    include/asm-generic/tlb.h: In function `tlb_remove_page':
    include/asm-generic/tlb.h:105: error: implicit declaration of function `page_cache_release'

    free_pages_and_swap_cache() and free_page_and_swap_cache() are macros
    which call release_pages() and page_cache_release(). The obvious fix is
    to include pagemap.h in swap.h, where those macros are defined. But that
    breaks sparc for weird reasons.

    So fix it within mm/pgtable-generic.c instead.
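
    The fix is then just a local include, ahead of the asm/tlb.h that
    expands those macros (a sketch, assuming pagemap.h declares both
    helpers):

        /* mm/pgtable-generic.c */
        #include <linux/pagemap.h>  /* release_pages(), page_cache_release() */
        #include <asm/tlb.h>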

    Reported-by: Yoichi Yuasa
    Cc: Geert Uytterhoeven
    Acked-by: Sam Ravnborg
    Cc: Sergei Shtylyov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

17 Jan, 2011

1 commit

  • pmdp_get_and_clear/pmdp_clear_flush/pmdp_splitting_flush were trapped as
    BUG() and were defined only to diminish the risk of build issues on
    non-x86 archs and to be consistent with the generic pte methods
    previously defined in include/asm-generic/pgtable.h.

    But they are causing more trouble than they were supposed to solve, so
    it's simpler not to define them when THP is off.

    This is also correcting the export of pmdp_splitting_flush which is
    currently unused (x86 isn't using the generic implementation in
    mm/pgtable-generic.c and no other arch needs that [yet]).
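
    A sketch of the resulting shape of mm/pgtable-generic.c, with the pmd
    helpers only defined (and exported) under the THP guard:

        #ifdef CONFIG_TRANSPARENT_HUGEPAGE
        pmd_t pmdp_clear_flush(struct vm_area_struct *vma, unsigned long address,
                               pmd_t *pmdp)
        {
                pmd_t pmd;

                VM_BUG_ON(address & ~HPAGE_PMD_MASK);
                pmd = pmdp_get_and_clear(vma->vm_mm, address, pmdp);
                flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
                return pmd;
        }
        #endif /* CONFIG_TRANSPARENT_HUGEPAGE */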

    Signed-off-by: Andrea Arcangeli
    Sam Ravnborg
    Cc: Stephen Rothwell
    Cc: "David S. Miller"
    Cc: Benjamin Herrenschmidt
    Cc: "Luck, Tony"
    Cc: James Bottomley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

14 Jan, 2011

1 commit

  • Some are needed to build but are not actually used on archs that do not
    support transparent hugepages. Others, like pmdp_clear_flush, are used
    by x86 too.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli