14 Nov, 2018

1 commit

  • commit aab8d0520e6e7c2a61f71195e6ce7007a4843afb upstream.

    Private ZONE_DEVICE pages use a special pte entry and thus are not
    present. Properly handle this case in map_pte(); it is already handled
    in check_pte(), but the map_pte() part was most probably lost in a
    rebase.

    Without this patch the slow migration path cannot migrate private
    ZONE_DEVICE memory back to regular memory. This was found after stress
    testing migration back to system memory, and it can ultimately leave
    the CPU constantly page faulting in a loop on the special swap entry.
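
    A minimal sketch of the idea, assuming the kernel's device-private swap
    helpers (the helper name below is hypothetical, not the upstream diff):
    a non-present pte still maps the page when it encodes a device-private
    swap entry, and map_pte() must treat it as such:

    #include <linux/mm.h>
    #include <linux/swapops.h>

    /* Hypothetical helper sketching the check map_pte() needs to make. */
    static bool pte_maps_zone_device_page(pte_t pte)
    {
            if (pte_present(pte))
                    return true;                    /* ordinary mapping */

            if (is_swap_pte(pte)) {
                    swp_entry_t entry = pte_to_swp_entry(pte);

                    /* private ZONE_DEVICE pages hide behind special swap entries */
                    if (is_device_private_entry(entry))
                            return true;
            }

            return false;                           /* genuinely unmapped */
    }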

    Link: http://lkml.kernel.org/r/20181019160442.18723-3-jglisse@redhat.com
    Signed-off-by: Ralph Campbell
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Balbir Singh
    Cc: Andrew Morton
    Cc: Kirill A. Shutemov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Ralph Campbell
     

24 Jan, 2018

1 commit

  • commit 0d665e7b109d512b7cae3ccef6e8654714887844 upstream.

    Tetsuo reported random crashes under memory pressure on a 32-bit x86
    system and tracked them down to the change that introduced
    page_vma_mapped_walk().

    The root cause of the issue is faulty pointer math in check_pte(). As
    ->pte may point to an arbitrary page, we have to check that both pages
    belong to the same section before doing the math; otherwise it may lead
    to weird results.

    It wasn't noticed until now as mem_map[] is virtually contiguous on
    flatmem or vmemmap sparsemem. Pointer arithmetic just works against all
    'struct page' pointers. But with classic sparsemem, it doesn't because
    each section's memmap is allocated separately, and so consecutive pfns
    crossing two sections might have struct pages at completely unrelated
    addresses.

    Let's restructure the code a bit and replace the pointer arithmetic
    with operations on pfns.
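
    A simplified illustration of the pfn-based check (close in spirit to
    the fix, though the exact upstream code may differ): compare page frame
    numbers, which stay meaningful across section boundaries, instead of
    struct page pointers:

    #include <linux/mm.h>
    #include <linux/huge_mm.h>

    /* Does pfn fall inside the physically contiguous huge page? */
    static bool pfn_in_hpage(struct page *hpage, unsigned long pfn)
    {
            unsigned long hpage_pfn = page_to_pfn(hpage);

            /* a THP is physically contiguous, so a pfn range check is exact */
            return pfn >= hpage_pfn && pfn - hpage_pfn < hpage_nr_pages(hpage);
    }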

    Signed-off-by: Kirill A. Shutemov
    Reported-and-tested-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand that can be used instead of the full boilerplate text.
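
    For reference, the identifier is a single comment at the very top of
    each file; C source files conventionally take the // form (first line
    below) while headers keep the /* */ form (second line):

    // SPDX-License-Identifier: GPL-2.0
    /* SPDX-License-Identifier: GPL-2.0 */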

    This patch is based on work done by Thomas Gleixner, Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - the file had no licensing information in it,
    - the file was a */uapi/* one with no licensing information in it,
    - the file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be
    applied to a file was done in a spreadsheet of side-by-side results
    from the output of two independent scanners (ScanCode & Windriver)
    producing SPDX tag:value files, created by Philippe Ombredanne.
    Philippe prepared the base worksheet and did an initial spot review of
    a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file-by-file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    The criteria used to select files for SPDX license identifier tagging
    were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained
      >5 lines of source.
    - The file already had some variant of a license header in it (even if
      <5 lines).
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

14 Oct, 2017

2 commits

  • Loading the pmd without holding the pmd_lock exposes us to races with
    concurrent updaters of the page tables but, worse still, it also allows
    the compiler to cache the pmd value in a register and reuse it later on,
    even if we've performed a READ_ONCE in between and seen a more recent
    value.

    In the case of page_vma_mapped_walk, this leads to the following crash
    when the pmd loaded for the initial pmd_trans_huge check is all zeroes
    and a subsequent valid table entry is loaded by check_pmd. We then
    proceed into map_pte, but the compiler re-uses the zero entry inside
    pte_offset_map, resulting in a junk pointer being installed in
    pvmw->pte:

    PC is at check_pte+0x20/0x170
    LR is at page_vma_mapped_walk+0x2e0/0x540
    [...]
    Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
    Call trace:
    check_pte+0x20/0x170
    page_vma_mapped_walk+0x2e0/0x540
    page_mkclean_one+0xac/0x278
    rmap_walk_file+0xf0/0x238
    rmap_walk+0x64/0xa0
    page_mkclean+0x90/0xa8
    clear_page_dirty_for_io+0x84/0x2a8
    mpage_submit_page+0x34/0x98
    mpage_process_page_bufs+0x164/0x170
    mpage_prepare_extent_to_map+0x134/0x2b8
    ext4_writepages+0x484/0xe30
    do_writepages+0x44/0xe8
    __filemap_fdatawrite_range+0xbc/0x110
    file_write_and_wait_range+0x48/0xd8
    ext4_sync_file+0x80/0x4b8
    vfs_fsync_range+0x64/0xc0
    SyS_msync+0x194/0x1e8

    This patch fixes the problem by ensuring that READ_ONCE is used before
    the initial checks on the pmd, and this value is subsequently used when
    checking whether or not the pmd is present. pmd_check is removed and
    the pmd_present check is inlined directly.
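
    A minimal sketch of the pattern the fix enforces (the helper below is
    hypothetical, not the upstream function): take one READ_ONCE() snapshot
    of the pmd and base every subsequent check on that snapshot, never on a
    fresh dereference that the compiler might have cached or reloaded:

    #include <linux/mm.h>

    /* Hypothetical helper: is it safe to walk the pte table under this pmd? */
    static bool pmd_points_to_pte_table(pmd_t *pmdp)
    {
            pmd_t pmde = READ_ONCE(*pmdp);  /* single snapshot of the entry */

            if (pmd_none(pmde) || pmd_trans_huge(pmde))
                    return false;           /* empty, or a huge mapping */

            /* only a present table entry may be fed to pte_offset_map() */
            return pmd_present(pmde);
    }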

    Link: http://lkml.kernel.org/r/1507222630-5839-1-git-send-email-will.deacon@arm.com
    Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
    Signed-off-by: Will Deacon
    Tested-by: Yury Norov
    Tested-by: Richard Ruigrok
    Acked-by: Kirill A. Shutemov
    Cc: "Paul E. McKenney"
    Cc: Peter Zijlstra
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     
    A non-present pmd entry can appear after pmd_lock is taken in
    page_vma_mapped_walk(), even if THP migration is not enabled. The
    WARN_ONCE is unnecessary.

    Link: http://lkml.kernel.org/r/20171003142606.12324-1-zi.yan@sent.com
    Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
    Signed-off-by: Zi Yan
    Reported-by: Abdul Haleem
    Tested-by: Abdul Haleem
    Acked-by: Kirill A. Shutemov
    Cc: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zi Yan
     

09 Sep, 2017

2 commits

    Allow unmapping and restoring the special swap entry of un-addressable
    ZONE_DEVICE memory.
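
    A rough sketch of the unmap side, assuming the device-private swap
    helpers (the function below is illustrative, not a hunk from the
    patch): the present pte is replaced by a special swap entry that a
    later walk can recognize and restore:

    #include <linux/mm.h>
    #include <linux/swapops.h>

    /* Illustrative: hide an un-addressable ZONE_DEVICE page behind a
     * device-private swap entry at the given pte. */
    static void encode_device_private(struct mm_struct *mm, unsigned long addr,
                                      pte_t *ptep, struct page *page, bool write)
    {
            swp_entry_t entry = make_device_private_entry(page, write);

            set_pte_at(mm, addr, ptep, swp_entry_to_pte(entry));
    }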

    Link: http://lkml.kernel.org/r/20170817000548.32038-17-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Kirill A. Shutemov
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Evgeny Baskakov
    Cc: Johannes Weiner
    Cc: John Hubbard
    Cc: Mark Hairgrove
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Sherry Cheung
    Cc: Subhash Gutti
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • Add thp migration's core code, including conversions between a PMD entry
    and a swap entry, setting PMD migration entry, removing PMD migration
    entry, and waiting on PMD migration entries.

    This patch makes it possible to support thp migration. If you fail to
    allocate a destination page as a thp, you just split the source thp as
    we do now and then enter the normal page migration. If you succeed in
    allocating a destination thp, you enter thp migration. Subsequent patches
    actually enable thp migration for each caller of page migration by
    allowing its get_new_page() callback to allocate thps.
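
    A hedged sketch of the decode direction (simplified; the real helpers
    sit behind CONFIG_ARCH_ENABLE_THP_MIGRATION): a migrating THP is kept
    in the page table as a pmd-level migration swap entry, which can be
    converted back to the page under migration:

    #include <linux/swapops.h>

    /* Illustrative: recover the page a pmd-level migration entry refers to. */
    static struct page *pmd_migration_page(pmd_t pmdval)
    {
            swp_entry_t entry;

            if (!is_pmd_migration_entry(pmdval))
                    return NULL;                    /* not a migrating THP */

            entry = pmd_to_swp_entry(pmdval);       /* pmd -> swap entry */
            return migration_entry_to_page(entry);
    }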

    [zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
    Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
    [akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
    Signed-off-by: Zi Yan
    Acked-by: Kirill A. Shutemov
    Cc: "H. Peter Anvin"
    Cc: Anshuman Khandual
    Cc: Dave Hansen
    Cc: David Nellans
    Cc: Ingo Molnar
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Cc: Andrea Arcangeli
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zi Yan
     

07 Jul, 2017

1 commit

  • A poisoned or migrated hugepage is stored as a swap entry in the page
    tables. On architectures that support hugepages consisting of
    contiguous page table entries (such as on arm64) this leads to ambiguity
    in determining the page table entry to return in huge_pte_offset() when
    a poisoned entry is encountered.

    Let's remove the ambiguity by adding a size parameter to convey
    additional information about the requested address. Also fix up the
    definition and usage of huge_pte_offset() throughout the tree.
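
    Roughly, the interface change looks like this (the call site below is
    an illustrative pattern, not a specific hunk from the patch):

    /* new prototype: the mapping size disambiguates contiguous hugepages */
    pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
                           unsigned long sz);

    /* typical caller, passing the vma's hugepage size */
    pte_t *ptep = huge_pte_offset(mm, address,
                                  huge_page_size(hstate_vma(vma)));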

    Link: http://lkml.kernel.org/r/20170522133604.11392-4-punit.agrawal@arm.com
    Signed-off-by: Punit Agrawal
    Acked-by: Steve Capper
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: James Hogan (odd fixer:METAG ARCHITECTURE)
    Cc: Ralf Baechle (supporter:MIPS)
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Alexander Viro
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: "Kirill A. Shutemov"
    Cc: Hillf Danton
    Cc: Mark Rutland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Punit Agrawal
     

08 Apr, 2017

1 commit

    Doug Smythies reports an oops with KSM in this backtrace; I've been
    seeing the same:

    page_vma_mapped_walk+0xe6/0x5b0
    page_referenced_one+0x91/0x1a0
    rmap_walk_ksm+0x100/0x190
    rmap_walk+0x4f/0x60
    page_referenced+0x149/0x170
    shrink_active_list+0x1c2/0x430
    shrink_node_memcg+0x67a/0x7a0
    shrink_node+0xe1/0x320
    kswapd+0x34b/0x720

    Just as observed in commit 4b0ece6fa016 ("mm: migrate: fix
    remove_migration_pte() for ksm pages"), you cannot use page->index
    calculations on ksm pages.

    page_vma_mapped_walk() is relying on __vma_address(), where a ksm page
    can lead it off the end of the page table, and into whatever nonsense is
    in the next page, ending as an oops inside check_pte()'s pte_page().

    KSM tells page_vma_mapped_walk() exactly where to look for the page; it
    does not need any page->index calculation, and that is also true for
    all the normal file and anon pages - just not for THPs and their
    subpages. Get out early in most cases: instead of a PageKsm test, move
    down the earlier not-THP-page test, as suggested by Kirill.

    I'm also slightly worried that this loop can stray into other vmas, so
    I added a vm_end test to prevent surprises, though I have not imagined
    anything worse than a very contrived case, in which a page mlocked in
    the next vma might be reclaimed because it is not mlocked in this vma.
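
    A hedged sketch of the shape of the fix (not the actual hunk; map_pte()
    and check_pte() are the helpers this file already uses): a page that is
    not a THP has exactly one candidate address, supplied by the caller, so
    check that single pte and stop, and never let the walk run past
    vma->vm_end:

    /* single-page fast path inside page_vma_mapped_walk(), sketched */
    if (!PageTransCompound(pvmw->page)) {
            if (pvmw->address >= pvmw->vma->vm_end)
                    return false;           /* would stray into the next vma */
            if (!map_pte(pvmw))
                    return false;           /* nothing mapped at this address */
            return check_pte(pvmw);         /* true iff it maps pvmw->page */
    }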

    Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1704031104400.1118@eggly.anvils
    Signed-off-by: Hugh Dickins
    Reported-by: Doug Smythies
    Tested-by: Doug Smythies
    Reviewed-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

10 Mar, 2017

1 commit


25 Feb, 2017

2 commits

    For consistency, it is worth converting all page_check_address() calls
    to page_vma_mapped_walk(), so we can drop the former.

    Link: http://lkml.kernel.org/r/20170129173858.45174-11-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Acked-by: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Introduce a new interface to check if a page is mapped into a vma. It
    aims to address shortcomings of page_check_address{,_transhuge}.

    The existing interface is not able to handle PTE-mapped THPs: it only
    finds the first PTE; the rest are left unnoticed.

    page_vma_mapped_walk() iterates over all possible mappings of the page
    in the vma.
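
    A hedged usage sketch from the caller's side (the struct fields set
    below are the interface's inputs; the young-pte counting is just an
    illustrative payload):

    #include <linux/mm.h>
    #include <linux/rmap.h>

    /* Count how many ptes in 'vma' currently map 'page' and are young. */
    static int count_young_mappings(struct page *page,
                                    struct vm_area_struct *vma,
                                    unsigned long address)
    {
            struct page_vma_mapped_walk pvmw = {
                    .page = page,
                    .vma = vma,
                    .address = address,
            };
            int young = 0;

            while (page_vma_mapped_walk(&pvmw)) {
                    /* one mapping per iteration: pvmw.pte, or pvmw.pmd if huge */
                    if (pvmw.pte && pte_young(*pvmw.pte))
                            young++;
            }
            return young;
    }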

    Link: http://lkml.kernel.org/r/20170129173858.45174-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov