10 Sep, 2020

1 commit

  • commit e5a59d308f52bb0052af5790c22173651b187465 upstream.

    collapse_file() in khugepaged passes PAGE_SIZE as the number of pages to
    be read to page_cache_sync_readahead(). The intent was probably to read
    a single page. Fix it to use the number of pages to the end of the
    window instead.
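
    A minimal sketch of the corrected call, with collapse_file() working
    on the window [index, end):

    page_cache_sync_readahead(mapping, &file->f_ra, file,
                              index, end - index);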

    Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
    Signed-off-by: David Howells
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Matthew Wilcox (Oracle)
    Acked-by: Song Liu
    Acked-by: Yang Shi
    Acked-by: Pankaj Gupta
    Cc: Eric Biggers
    Link: https://lkml.kernel.org/r/20200903140844.14194-2-willy@infradead.org
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

26 Aug, 2020

2 commits

  • [ Upstream commit f3f99d63a8156c7a4a6b20aac22b53c5579c7dc1 ]

    syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in
    __khugepaged_enter(): yes, when one thread is about to dump core, has set
    core_state, and is waiting for others, another might do something calling
    __khugepaged_enter(), which now crashes because I lumped the core_state
    test (known as "mmget_still_valid") into khugepaged_test_exit(). I still
    think it's best to lump them together, so just in this exceptional case,
    check mm->mm_users directly instead of khugepaged_test_exit().
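
    In that exceptional case the test collapses to the raw mm_users
    check; roughly, in __khugepaged_enter():

    /* __khugepaged_exit() must not run from under us */
    VM_BUG_ON_MM(atomic_read(&mm->mm_users) == 0, mm);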

    Fixes: bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
    Reported-by: syzbot
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Acked-by: Yang Shi
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc: Song Liu
    Cc: Mike Kravetz
    Cc: Eric Dumazet
    Cc: [4.8+]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008141503370.18085@eggly.anvils
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Hugh Dickins
     
  • [ Upstream commit bbe98f9cadff58cdd6a4acaeba0efa8565dabe65 ]

    Move collapse_huge_page()'s mmget_still_valid() check into
    khugepaged_test_exit() itself. collapse_huge_page() is used for anon THP
    only, and earned its mmget_still_valid() check because it inserts a huge
    pmd entry in place of the page table's pmd entry; whereas
    collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
    merely clears the page table's pmd entry. But core dumping without mmap
    lock must have been as open to mistaking a racily cleared pmd entry for a
    page table at physical page 0, as exit_mmap() was. And we certainly have
    no interest in mapping as a THP once dumping core.
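
    After the change, khugepaged_test_exit() reads roughly:

    static inline int khugepaged_test_exit(struct mm_struct *mm)
    {
        return atomic_read(&mm->mm_users) == 0 || !mmget_still_valid(mm);
    }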

    Fixes: 59ea6d06cfa9 ("coredump: fix race condition between collapse_huge_page() and core dumping")
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Cc: Andrea Arcangeli
    Cc: Song Liu
    Cc: Mike Kravetz
    Cc: Kirill A. Shutemov
    Cc: [4.8+]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvils
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Hugh Dickins
     

21 Aug, 2020

3 commits

  • commit 18e77600f7a1ed69f8ce46c9e11cad0985712dfa upstream.

    Only once have I seen this scenario (and forgot even to notice what forced
    the eventual crash): a sequence of "BUG: Bad page map" alerts from
    vm_normal_page(), from zap_pte_range() servicing exit_mmap();
    pmd:00000000, pte values corresponding to data in physical page 0.

    The pte mappings being zapped in this case were supposed to be from a huge
    page of ext4 text (but could as well have been shmem): my belief is that
    it was racing with collapse_file()'s retract_page_tables(), found *pmd
    pointing to a page table, locked it, but *pmd had become 0 by the time
    start_pte was decided.

    In most cases, that possibility is excluded by holding mmap lock; but
    exit_mmap() proceeds without mmap lock. Most of what's run by khugepaged
    checks khugepaged_test_exit() after acquiring mmap lock:
    khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so,
    for example. But retract_page_tables() did not: fix that.

    The fix is for retract_page_tables() to check khugepaged_test_exit(),
    after acquiring mmap lock, before doing anything to the page table.
    Getting the mmap lock serializes with __mmput(), which briefly takes and
    drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on
    mm_users makes sure we don't touch the page table once exit_mmap() might
    reach it, since exit_mmap() will be proceeding without mmap lock, not
    expecting anyone to be racing with it.
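
    A sketch of the resulting order in retract_page_tables(), with the
    pmd-clearing details elided:

    if (mmap_write_trylock(mm)) {
        /* re-check only once the mmap lock is held */
        if (!khugepaged_test_exit(mm)) {
            /* ... clear the pmd and free the page table ... */
        }
        mmap_write_unlock(mm);
    }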

    Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Acked-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Mike Kravetz
    Cc: Song Liu
    Cc: [4.8+]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Hugh Dickins
     
  • commit 119a5fc16105b2b9383a6e2a7800b2ef861b2975 upstream.

    When retract_page_tables() removes a page table to make way for a huge
    pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
    pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
    case when the original mmap_write_trylock had failed), only
    mmap_write_trylock and pmd lock are held.

    That's not enough. One machine has twice crashed under load, with "BUG:
    spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second
    crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
    page_referenced() on a file THP, that had found a page table at *pmd)
    discovers that the page table page and its lock have already been freed by
    the time it comes to unlock.

    Follow the example of retract_page_tables(), but we only need one of huge
    page lock or i_mmap_lock_write to secure against this: because it's the
    narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
    the hpage earlier, choose to rely on huge page lock here.
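
    Roughly, collapse_pte_mapped_thp() now finds and locks the hpage up
    front, assuming a file-backed vma:

    hpage = find_lock_page(vma->vm_file->f_mapping,
                           linear_page_index(vma, haddr));
    if (!hpage)
        return;
    /* ... collapse the pte-mapped THP under the page lock ... */
    unlock_page(hpage);
    put_page(hpage);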

    Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Acked-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Mike Kravetz
    Cc: Song Liu
    Cc: [5.4+]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Hugh Dickins
     
  • commit 723a80dafed5c95889d48baab9aa433a6ffa0b4e upstream.

    pmdp_collapse_flush() should be given the start address at which the huge
    page is mapped, haddr: it was given addr, which at that point has been
    used as a local variable, incremented to the end address of the extent.

    Found by source inspection while chasing a hugepage locking bug, which I
    then could not explain by this. At first I thought this was very bad;
    then saw that all of the page translations that were not flushed would
    actually still point to the right pages afterwards, so harmless; then
    realized that I know nothing of how different architectures and models
    cache intermediate paging structures, so maybe it matters after all -
    particularly since the page table concerned is immediately freed.

    Much easier to fix than to think about.
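
    The fix itself is one line in collapse_pte_mapped_thp(); roughly:

    /* flush for the huge page's start address, not the loop cursor */
    _pmd = pmdp_collapse_flush(vma, haddr, pmd);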

    Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Acked-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Mike Kravetz
    Cc: Song Liu
    Cc: [5.4+]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021204390.27773@eggly.anvils
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Hugh Dickins
     

29 Jul, 2020

1 commit

  • commit 594cced14ad3903166c8b091ff96adac7552f0b3 upstream.

    khugepaged has to drop mmap lock several times while collapsing a page.
    The situation can change while the lock is dropped and we need to
    re-validate that the VMA is still in place and the PMD is still subject
    for collapse.

    But we miss one corner case: while collapsing anonymous pages, the VMA
    could be replaced with a file VMA. If the file VMA doesn't have any
    private pages we get a NULL pointer dereference:

    general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
    KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
    anon_vma_lock_write include/linux/rmap.h:120 [inline]
    collapse_huge_page mm/khugepaged.c:1110 [inline]
    khugepaged_scan_pmd mm/khugepaged.c:1349 [inline]
    khugepaged_scan_mm_slot mm/khugepaged.c:2110 [inline]
    khugepaged_do_scan mm/khugepaged.c:2193 [inline]
    khugepaged+0x3bba/0x5a10 mm/khugepaged.c:2238

    The fix is to make sure that the VMA is anonymous in
    hugepage_vma_revalidate(). The helper is only used for collapsing
    anonymous pages.
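
    Roughly, the check added to hugepage_vma_revalidate():

    /* the vma must still be anonymous, not a freshly mapped file vma */
    if (!vma->anon_vma || vma->vm_ops)
        return SCAN_VMA_CHECK;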

    Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
    Reported-by: syzbot+ed318e8b790ca72c5ad0@syzkaller.appspotmail.com
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Acked-by: Yang Shi
    Cc:
    Link: http://lkml.kernel.org/r/20200722121439.44328-1-kirill.shutemov@linux.intel.com
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
     

03 Jun, 2020

1 commit

  • [ Upstream commit 2f33a706027c94cd4f70fcd3e3f4a17c1ce4ea4b ]

    When collapse_file() calls try_to_release_page(), it has already isolated
    the page: so if releasing buffers happens to fail (as it sometimes does),
    remember to putback_lru_page(): otherwise that page is left unreclaimable
    and unfreeable, and the file extent uncollapsible.
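
    The fixed error path in collapse_file(), roughly:

    if (page_has_private(page) &&
        !try_to_release_page(page, GFP_KERNEL)) {
        result = SCAN_PAGE_HAS_PRIVATE;
        putback_lru_page(page);
        goto out_unlock;
    }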

    Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Acked-by: Song Liu
    Acked-by: Kirill A. Shutemov
    Acked-by: Johannes Weiner
    Cc: Rik van Riel
    Cc: [5.4+]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2005231837500.1766@eggly.anvils
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Hugh Dickins
     

16 Nov, 2019

1 commit

  • In collapse_file(), for !is_shmem case, current check cannot guarantee
    the locked page is up-to-date. Specifically, xas_unlock_irq() should
    not be called before lock_page() and get_page(); and it is necessary to
    recheck PageUptodate() after locking the page.

    With this bug and CONFIG_READ_ONLY_THP_FOR_FS=y, madvise(HUGE)'ed .text
    may contain corrupted data. This is because khugepaged mistakenly
    collapses some not up-to-date sub pages into a huge page, and assumes
    the huge page is up-to-date. This will NOT corrupt data in the disk,
    because the page is read-only and never written back. Fix this by
    properly checking PageUptodate() after locking the page. This check
    replaces "VM_BUG_ON_PAGE(!PageUptodate(page), page);".

    Also, move PageDirty() check after locking the page. Current khugepaged
    should not try to collapse dirty file THP, because it is limited to
    read-only .text. The only case we hit a dirty page here is when the
    page hasn't been written since write. Bail out and retry when this
    happens.
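
    A condensed sketch of the re-check (result codes assumed; the real
    code distinguishes a few more cases):

    lock_page(page);
    if (unlikely(!PageUptodate(page))) {
        result = SCAN_FAIL;
        goto out_unlock;
    }
    /* khugepaged is limited to read-only .text: retry if still dirty */
    if (PageDirty(page)) {
        result = SCAN_FAIL;
        goto out_unlock;
    }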

    syzbot reported bug on previous version of this patch.

    Link: http://lkml.kernel.org/r/20191106060930.2571389-2-songliubraving@fb.com
    Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
    Signed-off-by: Song Liu
    Reported-by: syzbot+efb9e48b9fbdc49bb34a@syzkaller.appspotmail.com
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Hugh Dickins
    Cc: William Kucharski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     

07 Nov, 2019

1 commit

  • I got some khugepaged spew on a 32bit x86:

    BUG: sleeping function called from invalid context at include/linux/mmu_notifier.h:346
    in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 25, name: khugepaged
    INFO: lockdep is turned off.
    CPU: 1 PID: 25 Comm: khugepaged Not tainted 5.4.0-rc5-elk+ #206
    Hardware name: System manufacturer P5Q-EM/P5Q-EM, BIOS 2203 07/08/2009
    Call Trace:
    dump_stack+0x66/0x8e
    ___might_sleep.cold.96+0x95/0xa6
    __might_sleep+0x2e/0x80
    collapse_huge_page.isra.51+0x5ac/0x1360
    khugepaged+0x9a9/0x20f0
    kthread+0xf5/0x110
    ret_from_fork+0x2e/0x38

    Looks like it's due to CONFIG_HIGHPTE=y pte_offset_map()->kmap_atomic()
    vs. mmu_notifier_invalidate_range_start(). Let's do the naive approach
    and just reorder the two operations.
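
    Roughly, the reorder in collapse_huge_page(): under CONFIG_HIGHPTE,
    pte_offset_map() takes a kmap_atomic(), so the sleepable notifier
    call must come first:

    mmu_notifier_invalidate_range_start(&range);

    pte = pte_offset_map(pmd, address); /* atomic from here on */
    pte_ptl = pte_lockptr(mm, pmd);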

    Link: http://lkml.kernel.org/r/20191029201513.GG1208@intel.com
    Fixes: 810e24e009cf71 ("mm/mmu_notifiers: annotate with might_sleep()")
    Signed-off-by: Ville Syrjälä
    Reviewed-by: Andrew Morton
    Acked-by: Kirill A. Shutemov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Jérôme Glisse
    Cc: Ralph Campbell
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Daniel Vetter
    Cc: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ville Syrjälä
     

25 Sep, 2019

5 commits

  • khugepaged needs exclusive mmap_sem to access page table. When it fails
    to lock mmap_sem, the page will fault in as pte-mapped THP. As the page
    is already a THP, khugepaged will not handle this pmd again.

    This patch enables khugepaged to retry collapsing the page table.

    struct mm_slot (in khugepaged.c) is extended with an array, containing
    addresses of pte-mapped THPs. We use array here for simplicity. We can
    easily replace it with more advanced data structures when needed.

    In khugepaged_scan_mm_slot(), if the mm contains pte-mapped THP, we try to
    collapse the page table.

    Since collapse may happen at a later time, some pages may already have
    faulted in. collapse_pte_mapped_thp() is added to properly handle these pages.
    collapse_pte_mapped_thp() also double checks whether all ptes in this pmd
    are mapping to the same THP. This is necessary because some subpage of
    the THP may be replaced, for example by uprobe. In such cases, it is not
    possible to collapse the pmd.
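
    Roughly, the extension in mm/khugepaged.c:

    #define MAX_PTE_MAPPED_THP 8

    struct mm_slot {
        struct hlist_node hash;
        struct list_head mm_node;
        struct mm_struct *mm;

        /* pte-mapped THP in this mm */
        int nr_pte_mapped_thp;
        unsigned long pte_mapped_thp[MAX_PTE_MAPPED_THP];
    };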

    [kirill.shutemov@linux.intel.com: add comments for retract_page_tables()]
    Link: http://lkml.kernel.org/r/20190816145443.6ard3iilytc6jlgv@box
    Link: http://lkml.kernel.org/r/20190815164525.1848545-6-songliubraving@fb.com
    Signed-off-by: Song Liu
    Signed-off-by: Kirill A. Shutemov
    Acked-by: Kirill A. Shutemov
    Suggested-by: Johannes Weiner
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
    In the previous patch, an application could put part of its text section in
    THP via madvise(). These THPs will be protected from writes when the
    application is still running (TXTBSY). However, after the application
    exits, the file is available for writes.

    This patch avoids writes to file THP by dropping page cache for the file
    when the file is open for write. A new counter nr_thps is added to struct
    address_space. In do_dentry_open(), if the file is open for write and
    nr_thps is non-zero, we drop page cache for the whole file.
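
    A sketch of the check in do_dentry_open(), assuming the new
    filemap_nr_thps() accessor for the counter:

    /* huge page cache doesn't support writes yet: drop it all */
    if ((f->f_mode & FMODE_WRITE) &&
        filemap_nr_thps(inode->i_mapping))
        truncate_pagecache(inode, 0);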

    Link: http://lkml.kernel.org/r/20190801184244.3169074-8-songliubraving@fb.com
    Signed-off-by: Song Liu
    Reported-by: kbuild test robot
    Acked-by: Rik van Riel
    Acked-by: Kirill A. Shutemov
    Acked-by: Johannes Weiner
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: William Kucharski
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • This patch is (hopefully) the first step to enable THP for non-shmem
    filesystems.

    This patch enables an application to put part of its text sections to THP
    via madvise, for example:

    madvise((void *)0x600000, 0x200000, MADV_HUGEPAGE);

    We tried to reuse the logic for THP on tmpfs.

    Currently, write is not supported for non-shmem THP. khugepaged will only
    process vma with VM_DENYWRITE. sys_mmap() ignores VM_DENYWRITE requests
    (see ksys_mmap_pgoff). The only way to create vma with VM_DENYWRITE is
    execve(). This requirement limits non-shmem THP to text sections.

    The next patch will handle writes, which would only happen when all the
    vmas with VM_DENYWRITE are unmapped.

    An EXPERIMENTAL config, READ_ONLY_THP_FOR_FS, is added to gate this
    feature.

    [songliubraving@fb.com: fix build without CONFIG_SHMEM]
    Link: http://lkml.kernel.org/r/F53407FB-96CC-42E8-9862-105C92CC2B98@fb.com
    [songliubraving@fb.com: fix double unlock in collapse_file()]
    Link: http://lkml.kernel.org/r/B960CBFA-8EFC-4DA4-ABC5-1977FFF2CA57@fb.com
    Link: http://lkml.kernel.org/r/20190801184244.3169074-7-songliubraving@fb.com
    Signed-off-by: Song Liu
    Acked-by: Rik van Riel
    Acked-by: Kirill A. Shutemov
    Acked-by: Johannes Weiner
    Cc: Stephen Rothwell
    Cc: Dan Carpenter
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: William Kucharski
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • Next patch will add khugepaged support of non-shmem files. This patch
    renames these two functions to reflect the new functionality:

    collapse_shmem() => collapse_file()
    khugepaged_scan_shmem() => khugepaged_scan_file()

    Link: http://lkml.kernel.org/r/20190801184244.3169074-6-songliubraving@fb.com
    Signed-off-by: Song Liu
    Acked-by: Rik van Riel
    Acked-by: Kirill A. Shutemov
    Acked-by: Johannes Weiner
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: William Kucharski
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • Transparent Huge Pages are currently stored in i_pages as pointers to
    consecutive subpages. This patch changes that to storing consecutive
    pointers to the head page in preparation for storing huge pages more
    efficiently in i_pages.

    Large parts of this are "inspired" by Kirill's patch
    https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/

    Kirill and Huang Ying contributed several fixes.
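
    With only head pages stored, lookups translate an index to the right
    subpage; roughly the find_subpage() helper:

    static inline struct page *find_subpage(struct page *head, pgoff_t index)
    {
        /* HugeTLBfs wants to return the page it's been given */
        if (PageHuge(head))
            return head;

        return head + (index & (compound_nr(head) - 1));
    }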

    [willy@infradead.org: use compound_nr, squish uninit-var warning]
    Link: http://lkml.kernel.org/r/20190731210400.7419-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jan Kara
    Reviewed-by: Kirill Shutemov
    Reviewed-by: Song Liu
    Tested-by: Song Liu
    Tested-by: William Kucharski
    Reviewed-by: William Kucharski
    Tested-by: Qian Cai
    Tested-by: Mikhail Gavrilov
    Cc: Hugh Dickins
    Cc: Chris Wilson
    Cc: Song Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

03 Sep, 2019

1 commit

  • SD_BALANCE_{FORK,EXEC} and SD_WAKE_AFFINE are stripped in sd_init()
    for any sched domains with a NUMA distance greater than 2 hops
    (RECLAIM_DISTANCE). The idea being that it's expensive to balance
    across domains that far apart.

    However, as is rather unfortunately explained in:

    commit 32e45ff43eaf ("mm: increase RECLAIM_DISTANCE to 30")

    the value for RECLAIM_DISTANCE is based on node distance tables from
    2011-era hardware.

    Current AMD EPYC machines have the following NUMA node distances:

    node distances:
    node 0 1 2 3 4 5 6 7
    0: 10 16 16 16 32 32 32 32
    1: 16 10 16 16 32 32 32 32
    2: 16 16 10 16 32 32 32 32
    3: 16 16 16 10 32 32 32 32
    4: 32 32 32 32 10 16 16 16
    5: 32 32 32 32 16 10 16 16
    6: 32 32 32 32 16 16 10 16
    7: 32 32 32 32 16 16 16 10

    where 2 hops is 32.

    The result is that the scheduler fails to load balance properly across
    NUMA nodes on different sockets -- 2 hops apart.

    For example, pinning 16 busy threads to NUMA nodes 0 (CPUs 0-7) and 4
    (CPUs 32-39) like so,

    $ numactl -C 0-7,32-39 ./spinner 16

    causes all threads to fork and remain on node 0 until the active
    balancer kicks in after a few seconds and forcibly moves some threads
    to node 4.

    Override node_reclaim_distance for AMD Zen.
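
    Roughly, the override lands in the Zen init path, with
    node_reclaim_distance as a new knob replacing hardcoded uses of
    RECLAIM_DISTANCE in the scheduler topology code:

    /* in the x86 AMD Zen init path */
    #ifdef CONFIG_NUMA
        node_reclaim_distance = 32;
    #endif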

    Signed-off-by: Matt Fleming
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Mel Gorman
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Suravee.Suthikulpanit@amd.com
    Cc: Thomas Gleixner
    Cc: Thomas.Lendacky@amd.com
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20190808195301.13222-3-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Matt Fleming
     

06 Jul, 2019

1 commit

  • This reverts commit 5fd4ca2d84b249f0858ce28cf637cf25b61a398f.

    Mikhail Gavrilov reports that it causes the VM_BUG_ON_PAGE() in
    __delete_from_swap_cache() to trigger:

    page:ffffd6d34dff0000 refcount:1 mapcount:1 mapping:ffff97812323a689 index:0xfecec363
    anon
    flags: 0x17fffe00080034(uptodate|lru|active|swapbacked)
    raw: 0017fffe00080034 ffffd6d34c67c508 ffffd6d3504b8d48 ffff97812323a689
    raw: 00000000fecec363 0000000000000000 0000000100000000 ffff978433ace000
    page dumped because: VM_BUG_ON_PAGE(entry != page)
    page->mem_cgroup:ffff978433ace000
    ------------[ cut here ]------------
    kernel BUG at mm/swap_state.c:170!
    invalid opcode: 0000 [#1] SMP NOPTI
    CPU: 1 PID: 221 Comm: kswapd0 Not tainted 5.2.0-0.rc2.git0.1.fc31.x86_64 #1
    Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 2202 04/11/2019
    RIP: 0010:__delete_from_swap_cache+0x20d/0x240
    Code: 30 65 48 33 04 25 28 00 00 00 75 4a 48 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 2f dc 0f 8a 48 89 c7 e8 93 1b fd ff <0f> 0b 48 c7 c6 a8 74 0f 8a e8 85 1b fd ff 0f 0b 48 c7 c6 a8 7d 0f
    RSP: 0018:ffffa982036e7980 EFLAGS: 00010046
    RAX: 0000000000000021 RBX: 0000000000000040 RCX: 0000000000000006
    RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff97843d657900
    RBP: 0000000000000001 R08: ffffa982036e7835 R09: 0000000000000535
    R10: ffff97845e21a46c R11: ffffa982036e7835 R12: ffff978426387120
    R13: 0000000000000000 R14: ffffd6d34dff0040 R15: ffffd6d34dff0000
    FS: 0000000000000000(0000) GS:ffff97843d640000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00002cba88ef5000 CR3: 000000078a97c000 CR4: 00000000003406e0
    Call Trace:
    delete_from_swap_cache+0x46/0xa0
    try_to_free_swap+0xbc/0x110
    swap_writepage+0x13/0x70
    pageout.isra.0+0x13c/0x350
    shrink_page_list+0xc14/0xdf0
    shrink_inactive_list+0x1e5/0x3c0
    shrink_node_memcg+0x202/0x760
    shrink_node+0xe0/0x470
    balance_pgdat+0x2d1/0x510
    kswapd+0x220/0x420
    kthread+0xfb/0x130
    ret_from_fork+0x22/0x40

    and it's not immediately obvious why it happens. It's too late in the
    rc cycle to do anything but revert for now.

    Link: https://lore.kernel.org/lkml/CABXGCsN9mYmBD-4GaaeW_NrDu+FDXLzr_6x+XNxfmFV6QkYCDg@mail.gmail.com/
    Reported-and-bisected-by: Mikhail Gavrilov
    Suggested-by: Jan Kara
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Matthew Wilcox
    Cc: Kirill Shutemov
    Cc: William Kucharski
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

14 Jun, 2019

1 commit

  • When fixing the race conditions between the coredump and the mmap_sem
    holders outside the context of the process, we focused on
    mmget_not_zero()/get_task_mm() callers in 04f5866e41fb70 ("coredump: fix
    race condition between mmget_not_zero()/get_task_mm() and core
    dumping"), but those aren't the only cases where the mmap_sem can be
    taken outside of the context of the process as Michal Hocko noticed
    while backporting that commit to older -stable kernels.

    If mmgrab() is called in the context of the process, but then the
    mm_count reference is transferred outside the context of the process,
    that can also be a problem if the mmap_sem has to be taken for writing
    through that mm_count reference.

    khugepaged registration calls mmgrab() in the context of the process,
    but the mmap_sem for writing is taken later in the context of the
    khugepaged kernel thread.

    collapse_huge_page() after taking the mmap_sem for writing doesn't
    modify any vma, so it's not obvious that it could cause a problem to the
    coredump, but it happens to modify the pmd in a way that breaks an
    invariant that pmd_trans_huge_lock() relies upon. collapse_huge_page()
    needs the mmap_sem for writing just to block concurrent page faults that
    call pmd_trans_huge_lock().

    Specifically the invariant that "!pmd_trans_huge()" cannot become a
    "pmd_trans_huge()" doesn't hold while collapse_huge_page() runs.

    The coredump will call __get_user_pages() without mmap_sem for reading,
    which eventually can invoke a lockless page fault which will need a
    functional pmd_trans_huge_lock().

    So collapse_huge_page() needs to use mmget_still_valid() to check it's
    not running concurrently with the coredump... as long as the coredump
    can invoke page faults without holding the mmap_sem for reading.
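
    A sketch of the added check, right after collapse_huge_page() takes
    the mmap_sem for writing:

    down_write(&mm->mmap_sem);
    result = SCAN_ANY_PROCESS;
    if (!mmget_still_valid(mm))
        goto out;
    result = hugepage_vma_revalidate(mm, address, &vma);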

    This has "Fixes: khugepaged" to facilitate backporting, but in my view
    it's more a bug in the coredump code that will eventually have to be
    rewritten to stop invoking page faults without the mmap_sem for reading.
    So the long term plan is still to drop all mmget_still_valid().

    Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com
    Fixes: ba76149f47d8 ("thp: khugepaged")
    Signed-off-by: Andrea Arcangeli
    Reported-by: Michal Hocko
    Acked-by: Michal Hocko
    Acked-by: Kirill A. Shutemov
    Cc: Oleg Nesterov
    Cc: Jann Horn
    Cc: Hugh Dickins
    Cc: Mike Rapoport
    Cc: Mike Kravetz
    Cc: Peter Xu
    Cc: Jason Gunthorpe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

15 May, 2019

3 commits

    This updates each existing invalidation to use the correct mmu notifier
    event that represents what is happening to the CPU page table. See the
    patch which introduced the events for the rationale behind this.
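
    In khugepaged, for instance, the collapse invalidation becomes a
    MMU_NOTIFY_CLEAR rather than the blanket MMU_NOTIFY_UNMAP default;
    roughly:

    mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
                            address, address + HPAGE_PMD_SIZE);
    mmu_notifier_invalidate_range_start(&range);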

    Link: http://lkml.kernel.org/r/20190326164747.24405-7-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Ralph Campbell
    Reviewed-by: Ira Weiny
    Cc: Christian König
    Cc: Joonas Lahtinen
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Jan Kara
    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Cc: Felix Kuehling
    Cc: Jason Gunthorpe
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: John Hubbard
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
    CPU page table updates can happen for many reasons, not only as a result
    of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as
    a result of kernel activities (memory compression, reclaim, migration,
    ...).

    Users of the mmu notifier API track changes to the CPU page table and
    take specific action for them. The current API only provides the range
    of virtual addresses affected by the change, not why the change is
    happening.

    This patchset does the initial mechanical conversion of all the places
    that call mmu_notifier_range_init to also provide the default
    MMU_NOTIFY_UNMAP event as well as the vma if it is known (most
    invalidation happens against a given vma). Passing down the vma allows
    the users of mmu notifier to inspect the new vma page protection.

    The MMU_NOTIFY_UNMAP is always the safe default as users of mmu notifier
    should assume that everything in the range is going away when that event
    happens. A later patch converts the mm call paths to use a more
    appropriate event for each call.

    This is done as 2 patches so that no call site is forgotten, especially
    as it uses the following coccinelle patch:

    %<----------------------------------------------------------------------
    @@
    expression E1, E3, E4;
    identifier I1;
    @@
    <...
    mmu_notifier_range_init(E1,
    +MMU_NOTIFY_UNMAP, 0, I1,
    I1->vm_mm, E3, E4)
    ...>

    @@
    expression E1, E2, E3, E4;
    identifier FN, VMA;
    @@
    FN(..., struct vm_area_struct *VMA, ...) {
    <...
    mmu_notifier_range_init(E1,
    +MMU_NOTIFY_UNMAP, 0, VMA,
    E2, E3, E4)
    ...>
    }

    @@
    expression E1, E2, E3, E4;
    identifier FN, VMA;
    @@
    FN(...) {
    struct vm_area_struct *VMA;
    <...
    mmu_notifier_range_init(E1,
    +MMU_NOTIFY_UNMAP, 0, VMA,
    E2, E3, E4)
    ...>
    }

    @@
    expression E1, E2, E3, E4;
    identifier FN;
    @@
    FN(...) {
    <...
    mmu_notifier_range_init(E1,
    +MMU_NOTIFY_UNMAP, 0, NULL,
    E2, E3, E4)
    ...>
    }
    ---------------------------------------------------------------------->%

    Applied with:
    spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
    spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
    spatch --sp-file mmu-notifier.spatch --dir mm --in-place

    Link: http://lkml.kernel.org/r/20190326164747.24405-6-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Ralph Campbell
    Reviewed-by: Ira Weiny
    Cc: Christian König
    Cc: Joonas Lahtinen
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Jan Kara
    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Cc: Felix Kuehling
    Cc: Jason Gunthorpe
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: John Hubbard
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • Transparent Huge Pages are currently stored in i_pages as pointers to
    consecutive subpages. This patch changes that to storing consecutive
    pointers to the head page in preparation for storing huge pages more
    efficiently in i_pages.

    Large parts of this are "inspired" by Kirill's patch
    https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/

    [willy@infradead.org: fix swapcache pages]
    Link: http://lkml.kernel.org/r/20190324155441.GF10344@bombadil.infradead.org
    [kirill@shutemov.name: hugetlb stores pages in page cache differently]
    Link: http://lkml.kernel.org/r/20190404134553.vuvhgmghlkiw2hgl@kshutemo-mobl1
    Link: http://lkml.kernel.org/r/20190307153051.18815-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jan Kara
    Reviewed-by: Kirill Shutemov
    Reviewed-and-tested-by: Song Liu
    Tested-by: William Kucharski
    Reviewed-by: William Kucharski
    Tested-by: Qian Cai
    Cc: Hugh Dickins
    Cc: Song Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

06 Mar, 2019

1 commit

  • Currently THP allocation events data is fairly opaque, since you can
    only get it system-wide. This patch makes it easier to reason about
    transparent hugepage behaviour on a per-memcg basis.

    For anonymous THP-backed pages, we already have MEMCG_RSS_HUGE in v1,
    which is used for v1's rss_huge [sic]. This is reused here as it's
    fairly involved to untangle NR_ANON_THPS right now to make it per-memcg,
    since right now some of this is delegated to rmap before we have any
    memcg actually assigned to the page. It's a good idea to rework that,
    but let's leave untangling THP allocation for a future patch.

    [akpm@linux-foundation.org: fix build]
    [chris@chrisdown.name: fix memcontrol build when THP is disabled]
    Link: http://lkml.kernel.org/r/20190131160802.GA5777@chrisdown.name
    Link: http://lkml.kernel.org/r/20190129205852.GA7310@chrisdown.name
    Signed-off-by: Chris Down
    Acked-by: Johannes Weiner
    Cc: Tejun Heo
    Cc: Roman Gushchin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Down
     

29 Dec, 2018

1 commit

    To avoid having to change many call sites every time we want to add a
    parameter, use a structure to group all parameters for the mmu_notifier
    invalidate_range_start/end calls. No functional changes with this patch.
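
    Roughly the shape of the new structure and a converted call site
    (fields as introduced by this series; more were added later):

    struct mmu_notifier_range {
        struct mm_struct *mm;
        unsigned long start;
        unsigned long end;
        bool blockable;
    };

    mmu_notifier_range_init(&range, mm, start, end);
    mmu_notifier_invalidate_range_start(&range);
    /* ... update the page table ... */
    mmu_notifier_invalidate_range_end(&range);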

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/20181205053628.3210-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Acked-by: Christian König
    Acked-by: Jan Kara
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Felix Kuehling
    Cc: Ralph Campbell
    Cc: John Hubbard
    From: Jérôme Glisse
    Subject: mm/mmu_notifier: use structure for invalidate_range_start/end calls v3

    fix build warning in migrate.c when CONFIG_MMU_NOTIFIER=n

    Link: http://lkml.kernel.org/r/20181213171330.8489-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

04 Dec, 2018

1 commit

  • …k/linux-rcu into core/rcu

    Pull RCU changes from Paul E. McKenney:

    - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

    - Replace calls of RCU-bh and RCU-sched update-side functions
    to their vanilla RCU counterparts. This series is a step
    towards complete removal of the RCU-bh and RCU-sched update-side
    functions.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - Documentation updates, including a number of flavor-consolidation
    updates from Joel Fernandes.

    - Miscellaneous fixes.

    - Automate generation of the initrd filesystem used for
    rcutorture testing.

    - Convert spin_is_locked() assertions to instead use lockdep.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - SRCU updates, especially including a fix from Dennis Krein
    for a bag-on-head-class bug.

    - RCU torture-test updates.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

01 Dec, 2018

7 commits

  • collapse_shmem()'s xas_nomem() is very unlikely to fail, but it is
    rightly given a failure path, so move the whole xas_create_range() block
    up before __SetPageLocked(new_page): so that it does not need to
    remember to unlock_page(new_page).

    Add the missing mem_cgroup_cancel_charge(), and set (currently unused)
    result to SCAN_FAIL rather than SCAN_SUCCEED.
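
    The reshuffled allocation, roughly, now ahead of
    __SetPageLocked(new_page), with the charge cancelled on failure:

    do {
        xas_lock_irq(&xas);
        xas_create_range(&xas);
        if (!xas_error(&xas))
            break;
        xas_unlock_irq(&xas);
        if (!xas_nomem(&xas, GFP_KERNEL)) {
            mem_cgroup_cancel_charge(new_page, memcg, true);
            result = SCAN_FAIL;
            goto out;
        }
    } while (1);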

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261531200.2275@eggly.anvils
    Fixes: 77da9389b9d5 ("mm: Convert collapse_shmem to XArray")
    Signed-off-by: Hugh Dickins
    Cc: Matthew Wilcox
    Cc: Kirill A. Shutemov
    Cc: Jerome Glisse
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • collapse_shmem()'s VM_BUG_ON_PAGE(PageTransCompound) was unsafe: before
    it holds page lock of the first page, racing truncation then extension
    might conceivably have inserted a hugepage there already. Fail with the
    SCAN_PAGE_COMPOUND result, instead of crashing (CONFIG_DEBUG_VM=y) or
    otherwise mishandling the unexpected hugepage - though later we might
    code up a more constructive way of handling it, with SCAN_SUCCESS.
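
    Roughly, the BUG becomes a graceful failure:

    if (PageTransCompound(page)) {
        result = SCAN_PAGE_COMPOUND;
        goto out_unlock;
    }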

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261529310.2275@eggly.anvils
    Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
    Signed-off-by: Hugh Dickins
    Cc: Kirill A. Shutemov
    Cc: Jerome Glisse
    Cc: Konstantin Khlebnikov
    Cc: Matthew Wilcox
    Cc: [4.8+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • khugepaged's collapse_shmem() does almost all of its work, to assemble
    the huge new_page from 512 scattered old pages, with the new_page's
    refcount frozen to 0 (and refcounts of all old pages so far also frozen
    to 0). Including shmem_getpage() to read in any which were out on swap,
    memory reclaim if necessary to allocate their intermediate pages, and
    copying over all the data from old to new.

    Imagine the frozen refcount as a spinlock held, but without any lock
    debugging to highlight the abuse: it's not good, and under serious load
    heads into lockups - speculative getters of the page are not expecting
    to spin while khugepaged is rescheduled.

    One can get a little further under load by hacking around elsewhere; but
    fortunately, freezing the new_page turns out to have been entirely
    unnecessary, with no hacks needed elsewhere.

    The huge new_page lock is already held throughout, and guards all its
    subpages as they are brought one by one into the page cache tree; and
    anything reading the data in that page, without the lock, before it has
    been marked PageUptodate, would already be in the wrong. So simply
    eliminate the freezing of the new_page.

    Each of the old pages remains frozen with refcount 0 after it has been
    replaced by a new_page subpage in the page cache tree, until they are
    all unfrozen on success or failure: just as before. They could be
    unfrozen sooner, but cause no problem once no longer visible to
    find_get_entry(), filemap_map_pages() and other speculative lookups.

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
    Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Jerome Glisse
    Cc: Konstantin Khlebnikov
    Cc: Matthew Wilcox
    Cc: [4.8+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Several cleanups in collapse_shmem(): most of which probably do not
    really matter, beyond doing things in a more familiar and reassuring
    order. Simplify the failure gotos in the main loop, and on success
    update stats while interrupts still disabled from the last iteration.

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261526400.2275@eggly.anvils
    Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Jerome Glisse
    Cc: Konstantin Khlebnikov
    Cc: Matthew Wilcox
    Cc: [4.8+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Huge tmpfs testing reminds us that there is no __GFP_ZERO in the gfp
    flags khugepaged uses to allocate a huge page - in all common cases it
    would just be a waste of effort - so collapse_shmem() must remember to
    clear out any holes that it instantiates.

    The obvious place to do so, where they are put into the page cache tree,
    is not a good choice: because interrupts are disabled there. Leave it
    until further down, once success is assured, where the other pages are
    copied (before setting PageUptodate).
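
    A sketch of the copy loop with hole-clearing, per the description
    above (identifiers as in collapse_shmem()):

    index = start;
    list_for_each_entry_safe(page, tmp, &pagelist, lru) {
        while (index < page->index) {
            clear_highpage(new_page + (index % HPAGE_PMD_NR));
            index++;
        }
        copy_highpage(new_page + (page->index % HPAGE_PMD_NR), page);
        /* ... unfreeze and release the old page ... */
        index++;
    }
    while (index < end) {
        clear_highpage(new_page + (index % HPAGE_PMD_NR));
        index++;
    }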

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261525080.2275@eggly.anvils
    Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Jerome Glisse
    Cc: Konstantin Khlebnikov
    Cc: Matthew Wilcox
    Cc: [4.8+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Huge tmpfs testing on a shortish file mapped into a pmd-rounded extent
    hit shmem_evict_inode()'s WARN_ON(inode->i_blocks) followed by
    clear_inode()'s BUG_ON(inode->i_data.nrpages) when the file was later
    closed and unlinked.

    khugepaged's collapse_shmem() was forgetting to update mapping->nrpages
    on the rollback path, after it had added but then needs to undo some
    holes.

    There is indeed an irritating asymmetry between shmem_charge(), whose
    callers want it to increment nrpages after successfully accounting
    blocks, and shmem_uncharge(), when __delete_from_page_cache() already
    decremented nrpages itself: oh well, just add a comment on that to them
    both.

    And shmem_recalc_inode() is supposed to be called when the accounting is
    expected to be in balance (so it can deduce from imbalance that reclaim
    discarded some pages): so change shmem_charge() to update nrpages
    earlier (though it's rare for the difference to matter at all).

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261523450.2275@eggly.anvils
    Fixes: 800d8c63b2e98 ("shmem: add huge pages support")
    Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Jerome Glisse
    Cc: Konstantin Khlebnikov
    Cc: Matthew Wilcox
    Cc: [4.8+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Huge tmpfs testing showed that although collapse_shmem() recognizes a
    concurrently truncated or hole-punched page correctly, its handling of
    holes was liable to refill an emptied extent. Add check to stop that.

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261522040.2275@eggly.anvils
    Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
    Signed-off-by: Hugh Dickins
    Reviewed-by: Matthew Wilcox
    Cc: Kirill A. Shutemov
    Cc: Jerome Glisse
    Cc: Konstantin Khlebnikov
    Cc: [4.8+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

13 Nov, 2018

1 commit

  • lockdep_assert_held() is better suited to checking locking requirements,
    since it only checks if the current thread holds the lock regardless of
    whether someone else does. This is also a step towards possibly removing
    spin_is_locked().
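
    In khugepaged this turns the NR_CPUS-guarded spin_is_locked()
    assertion into, roughly:

    lockdep_assert_held(&khugepaged_mm_lock);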

    Signed-off-by: Lance Roy
    Cc: Andrew Morton
    Cc: "Kirill A. Shutemov"
    Cc: Yang Shi
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Jan Kara
    Cc: Shakeel Butt
    Cc:
    Signed-off-by: Paul E. McKenney

    Lance Roy
     

30 Sep, 2018

1 commit

  • Introduce xarray value entries and tagged pointers to replace radix
    tree exceptional entries. This is a slight change in encoding to allow
    the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
    value entry). It is also a change in emphasis; exceptional entries are
    intimidating and different. As the comment explains, you can choose
    to store values or pointers in the xarray and they are both first-class
    citizens.
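
    The encoding itself is a shifted integer with the bottom bit set;
    roughly:

    static inline void *xa_mk_value(unsigned long v)
    {
        WARN_ON((long)v < 0);
        return (void *)((v << 1) | 1);
    }

    static inline unsigned long xa_to_value(const void *entry)
    {
        return (unsigned long)entry >> 1;
    }

    static inline bool xa_is_value(const void *entry)
    {
        return (unsigned long)entry & 1;
    }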

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Josef Bacik

    Matthew Wilcox
     

24 Aug, 2018

1 commit

  • Use new return type vm_fault_t for fault handler. For now, this is just
    documenting that the function returns a VM_FAULT value rather than an
    errno. Once all instances are converted, vm_fault_t will become a
    distinct type.

    Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")

    The aim is to change the return type of finish_fault() and
    handle_mm_fault() to vm_fault_t type. As part of that clean up return
    type of all other recursively called functions have been changed to
    vm_fault_t type.

    The places from where handle_mm_fault() is getting invoked will be
    change to vm_fault_t type but in a separate patch.

    vmf_error() is the newly introduced inline function in 4.17-rc6.
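
    Roughly the type and the helper (vm_fault_t starts life as a plain
    typedef and only later becomes a distinct __bitwise type):

    typedef int vm_fault_t;

    static inline vm_fault_t vmf_error(int err)
    {
        if (err == -ENOMEM)
            return VM_FAULT_OOM;
        return VM_FAULT_SIGBUS;
    }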

    [akpm@linux-foundation.org: don't shadow outer local `ret' in __do_huge_pmd_anonymous_page()]
    Link: http://lkml.kernel.org/r/20180604171727.GA20279@jordon-HP-15-Notebook-PC
    Signed-off-by: Souptick Joarder
    Reviewed-by: Matthew Wilcox
    Reviewed-by: Andrew Morton
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Souptick Joarder
     

18 Aug, 2018

3 commits

  • khugepaged_enter_vma_merge() passes a stale vma->vm_flags to
    hugepage_vma_check(). The argument vm_flags contains the latest value.
    Therefore, it is necessary to pass this vm_flags into
    hugepage_vma_check().
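
    Roughly, the fixed helper takes the latest flags explicitly:

    static bool hugepage_vma_check(struct vm_area_struct *vma,
                                   unsigned long vm_flags)
    {
        /* test the caller's fresh flags, not the stale vma->vm_flags */
        if ((!(vm_flags & VM_HUGEPAGE) && !khugepaged_always()) ||
            (vm_flags & VM_NOHUGEPAGE))
            return false;
        /* ... remaining checks unchanged ... */
        return true;
    }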

    With this bug, madvise(MADV_HUGEPAGE) for mmap files in shmem fails to
    put memory in huge pages. Here is an example of failed madvise():

    /* mount /dev/shm with huge=advise:
    * mount -o remount,huge=advise /dev/shm */
    /* create file /dev/shm/huge */
    #define HUGE_FILE "/dev/shm/huge"

    fd = open(HUGE_FILE, O_RDONLY);
    ptr = mmap(NULL, FILE_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
    ret = madvise(ptr, FILE_SIZE, MADV_HUGEPAGE);

    madvise() will return 0, but this memory region is never put in huge
    page (check from /proc/meminfo: ShmemHugePages).

    Link: http://lkml.kernel.org/r/20180629181752.792831-1-songliubraving@fb.com
    Fixes: 02b75dc8160d ("mm: thp: register mm for khugepaged when merging vma for shmem")
    Signed-off-by: Song Liu
    Reviewed-by: Rik van Riel
    Reviewed-by: Yang Shi
    Cc: Kirill A. Shutemov
    Cc: Hugh Dickins
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed is used
    to record the counter of collapsed THP, but it just gets inc'ed in
    anonymous THP collapse path, do this for shmem THP collapse too.

    Link: http://lkml.kernel.org/r/1529622949-75504-2-git-send-email-yang.shi@linux.alibaba.com
    Signed-off-by: Yang Shi
    Acked-by: Kirill A. Shutemov
    Cc: Hugh Dickins
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     
  • When merging anonymous page vma, if the size of the vma can fit in at
    least one hugepage, the mm will be registered for khugepaged for
    collapsing THP in the future.

    But it skips shmem vmas. Do so for shmem also, but not for file-private
    mappings when merging a vma in order to increase the odds of collapsing
    a hugepage via khugepaged.

    hugepage_vma_check() sounds like a good fit to do the check. And move
    the definition of it before khugepaged_enter_vma_merge() to avoid a
    build error.

    Link: http://lkml.kernel.org/r/1529697791-6950-1-git-send-email-yang.shi@linux.alibaba.com
    Signed-off-by: Yang Shi
    Acked-by: Kirill A. Shutemov
    Cc: Hugh Dickins
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

12 Apr, 2018

1 commit

  • Remove the address_space ->tree_lock and use the xa_lock newly added to
    the radix_tree_root. Rename the address_space ->page_tree to ->i_pages,
    since we don't really care that it's a tree.
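
    Call sites change mechanically; roughly:

    xa_lock_irq(&mapping->i_pages);   /* was spin_lock_irq(&mapping->tree_lock) */
    /* ... update the page cache ... */
    xa_unlock_irq(&mapping->i_pages);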

    [willy@infradead.org: fix nds32, fs/dax.c]
    Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.org
    Link: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jeff Layton
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Ryusuke Konishi
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox