13 Aug, 2020

1 commit

  • In the current implementation, newly created or swapped-in anonymous
    pages start out on the active list. Growing the active list triggers
    rebalancing of the active/inactive lists, so old pages on the active
    list are demoted to the inactive list. Hence, pages on the active list
    aren't protected at all.

    The following is an example of this situation.

    Assume there are 50 hot pages on the active list. The numbers denote the
    number of pages on the active/inactive lists (active | inactive).

    1. 50 hot pages on active list
    50(h) | 0

    2. workload: 50 newly created (used-once) pages
    50(uo) | 50(h)

    3. workload: another 50 newly created (used-once) pages
    50(uo) | 50(uo), swap-out 50(h)

    This patch fixes the issue. As with the file LRU, newly created or
    swapped-in anonymous pages are inserted on the inactive list. They are
    promoted to the active list if enough references happen. This simple
    modification changes the above example as follows.

    1. 50 hot pages on active list
    50(h) | 0

    2. workload: 50 newly created (used-once) pages
    50(h) | 50(uo)

    3. workload: another 50 newly created (used-once) pages
    50(h) | 50(uo), swap-out 50(uo)

    As you can see, the hot pages on the active list are now protected.

    Note that this implementation has a drawback: a page cannot be promoted
    and will be swapped out if its re-access interval is greater than the
    size of the inactive list but less than the size of the total
    (active + inactive). To solve this potential issue, a following patch
    will apply workingset detection similar to the one that's already
    applied to the file LRU.
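
    To make the two policies concrete, below is a hedged userspace toy model
    (plain C, not kernel code; the list size and the 'h'/'u' tags follow the
    example above) that replays the scenario and reports what gets swapped
    out under each policy:

    #include <stdio.h>

    #define CAP 50                     /* each LRU list holds 50 pages */

    struct lru { char tag[CAP]; int n; };

    /* Push a tag at the head; return the tail tag if one was evicted. */
    static char push_head(struct lru *l, char tag)
    {
        char evicted = 0;
        if (l->n == CAP)
            evicted = l->tag[--l->n];  /* list full: drop the tail */
        for (int i = l->n; i > 0; i--)
            l->tag[i] = l->tag[i - 1];
        l->tag[0] = tag;
        l->n++;
        return evicted;
    }

    /* Fault in one used-once page under the old or the patched policy. */
    static char fault(struct lru *act, struct lru *inact, int new_on_inactive)
    {
        if (new_on_inactive)           /* patched: new anon starts inactive */
            return push_head(inact, 'u');
        /* old: new anon starts active, demoting the active tail */
        char demoted = push_head(act, 'u');
        return demoted ? push_head(inact, demoted) : 0;
    }

    int main(void)
    {
        for (int policy = 0; policy <= 1; policy++) {
            struct lru act = { .n = 0 }, inact = { .n = 0 };
            int hot = 0, once = 0;

            for (int i = 0; i < CAP; i++)       /* 50 hot pages, active */
                push_head(&act, 'h');
            for (int i = 0; i < 2 * CAP; i++) { /* 100 used-once faults */
                char out = fault(&act, &inact, policy);
                if (out == 'h') hot++;
                if (out == 'u') once++;
            }
            printf("%s: swapped out %d hot, %d used-once\n",
                   policy ? "new-on-inactive" : "new-on-active", hot, once);
        }
        return 0;
    }

    Running it shows the old policy swapping out all 50 hot pages while the
    patched policy swaps out only the used-once pages.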

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Cc: Hugh Dickins
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Link: http://lkml.kernel.org/r/1595490560-15117-3-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

10 Jun, 2020

2 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)
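
    For illustration, the kind of call site this rule rewrites (a fragment,
    not a standalone program; 'mm' is the usual struct mm_struct pointer):

    /* before: open-coded rwsem operations on the mmap semaphore */
    down_read(&mm->mmap_sem);
    vma = find_vma(mm, addr);
    up_read(&mm->mmap_sem);

    /* after: the equivalent calls through the mmap locking API */
    mmap_read_lock(mm);
    vma = find_vma(mm, addr);
    mmap_read_unlock(mm);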

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

04 Jun, 2020

4 commits

  • Swapin faults were the last event to charge pages after they had already
    been put on the LRU list. Now that we charge directly on swapin, the
    lrucare portion of the charge code is unused.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Reviewed-by: Joonsoo Kim
    Cc: Alex Shi
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Balbir Singh
    Cc: Shakeel Butt
    Link: http://lkml.kernel.org/r/20200508183105.225460-19-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • With the page->mapping requirement gone from memcg, we can charge anon and
    file-thp pages in one single step, right after they're allocated.

    This removes two out of three API calls - especially the tricky commit
    step that needed to happen at just the right time between when the page is
    "set up" and when it's "published" - somewhat vague and fluid concepts
    that varied by page type. All we need is a freshly allocated page and a
    memcg context to charge.
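
    A hedged sketch of the resulting single-step flow for a freshly
    allocated anonymous page (assuming the three-argument
    mem_cgroup_charge() signature this series introduces):

    page = alloc_zeroed_user_highpage_movable(vma, vmf->address);
    if (!page)
        return VM_FAULT_OOM;
    /* one call right after allocation: no prepare/commit/cancel dance */
    if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL)) {
        put_page(page);
        return VM_FAULT_OOM;
    }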

    v2: prevent double charges on pre-allocated hugepages in khugepaged

    [hannes@cmpxchg.org: Fix crash - *hpage could be ERR_PTR instead of NULL]
    Link: http://lkml.kernel.org/r/20200512215813.GA487759@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Reviewed-by: Joonsoo Kim
    Cc: Alex Shi
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Shakeel Butt
    Cc: Balbir Singh
    Cc: Qian Cai
    Link: http://lkml.kernel.org/r/20200508183105.225460-13-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Memcg maintains a private MEMCG_RSS counter. This divergence from the
    generic VM accounting means unnecessary code overhead, and creates a
    dependency for memcg that page->mapping is set up at the time of charging,
    so that page types can be told apart.

    Convert the generic accounting sites to mod_lruvec_page_state and friends
    to maintain the per-cgroup vmstat counter of NR_ANON_MAPPED. We use
    lock_page_memcg() to stabilize page->mem_cgroup during rmap changes, the
    same way we do for NR_FILE_MAPPED.
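
    A hedged fragment of what the converted rmap-side accounting looks like
    under this scheme (names per the generic per-cgroup vmstat API):

    lock_page_memcg(page);           /* stabilize page->mem_cgroup */
    __mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
    unlock_page_memcg(page);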

    With the previous patch removing MEMCG_CACHE and the private NR_SHMEM
    counter, this patch finally eliminates the need to have page->mapping set
    up at charge time. However, we need to have page->mem_cgroup set up by
    the time rmap runs and does the accounting, so switch the commit and the
    rmap callbacks around.

    v2: fix temporary accounting bug by switching the rmap/commit order (Joonsoo)

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Cc: Alex Shi
    Cc: Hugh Dickins
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Shakeel Butt
    Cc: Balbir Singh
    Link: http://lkml.kernel.org/r/20200508183105.225460-11-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The memcg charging API carries a boolean @compound parameter that tells
    whether the page we're dealing with is a hugepage.
    mem_cgroup_commit_charge() has another boolean @lrucare that indicates
    whether the page needs LRU locking or not while charging. The majority of
    callsites know those parameters at compile time, which results in a lot of
    naked "false, false" argument lists. This makes for cryptic code and is a
    breeding ground for subtle mistakes.

    Thankfully, the huge page state can be inferred from the page itself and
    doesn't need to be passed along. This is safe because charging completes
    before the page is published, i.e. before anybody could split it.

    Simplify the callsites by removing @compound, and let memcg infer the
    state by using hpage_nr_pages() unconditionally. That function does
    PageTransHuge() to identify huge pages, which also helpfully asserts that
    nobody passes in tail pages by accident.
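
    For reference, a sketch mirroring the helper's behavior in kernels of
    this era:

    static inline int hpage_nr_pages(struct page *page)
    {
        if (unlikely(PageTransHuge(page)))  /* also asserts !PageTail */
            return HPAGE_PMD_NR;
        return 1;
    }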

    The following patches will introduce a new charging API, best not to carry
    over unnecessary weight.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Reviewed-by: Alex Shi
    Reviewed-by: Joonsoo Kim
    Reviewed-by: Shakeel Butt
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Balbir Singh
    Link: http://lkml.kernel.org/r/20200508183105.225460-4-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

08 Apr, 2020

3 commits

  • Add an API to enable/disable write protection on a vma range. Unlike
    mprotect, this doesn't split/merge vmas.
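
    For context, a hedged userspace sketch of driving this path through the
    UFFDIO_WRITEPROTECT ioctl wired up by this series (uffd, addr and len
    are assumed to be set up by the caller):

    struct uffdio_writeprotect wp = {
        .range = { .start = (unsigned long)addr, .len = len },
        .mode  = UFFDIO_WRITEPROTECT_MODE_WP,  /* 0 resolves/unprotects */
    };
    if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) == -1)
        perror("UFFDIO_WRITEPROTECT");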

    [peterx@redhat.com:
    - use the helper to find VMA;
    - return -ENOENT if not found to match mcopy case;
    - use the new MM_CP_UFFD_WP* flags for change_protection
    - check against mmap_changing for failures
    - replace find_dst_vma with vma_find_uffd]
    Signed-off-by: Shaohua Li
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Jerome Glisse
    Reviewed-by: Mike Rapoport
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Kirill A. Shutemov
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Bobby Powers
    Cc: Brian Geffon
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Maya Gokhale
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Link: http://lkml.kernel.org/r/20200220163112.11409-13-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • Firstly, introduce two new flags MM_CP_UFFD_WP[_RESOLVE] for
    change_protection() when used with uffd-wp, and make sure the two new
    flags are mutually exclusive. Then:

    - For MM_CP_UFFD_WP: apply the _PAGE_UFFD_WP bit and remove _PAGE_RW
    when a range of memory is write protected by uffd

    - For MM_CP_UFFD_WP_RESOLVE: remove the _PAGE_UFFD_WP bit and recover
    _PAGE_RW when write protection is resolved from userspace

    And use this new interface in mwriteprotect_range() to replace the old
    MM_CP_DIRTY_ACCT.
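
    Expressed on a single PTE, the two operations amount to roughly the
    following (a hedged fragment using the generic pte helpers):

    /* MM_CP_UFFD_WP: arm userfaultfd write protection */
    pte = pte_mkuffd_wp(pte_wrprotect(pte));

    /* MM_CP_UFFD_WP_RESOLVE: userspace resolved the write protection */
    pte = pte_clear_uffd_wp(pte);
    /* the write bit is restored separately, only where the vma allows */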

    Do this change for both PTEs and huge PMDs. Then we can start to identify
    which PTE/PMD is write protected for general reasons (e.g., COW or soft
    dirty tracking), and which is write protected for userfaultfd-wp.

    Since we should keep the _PAGE_UFFD_WP when doing pte_modify(), add it
    into _PAGE_CHG_MASK as well. Meanwhile, since we have this new bit, we
    can be even more strict when detecting uffd-wp page faults in either
    do_wp_page() or wp_huge_pmd().

    With _PAGE_UFFD_WP in place, a special case is a page that is protected
    both by the general COW logic and by userfault-wp. Here userfault-wp has
    the higher priority and is handled first. Only after the uffd-wp bit is
    cleared on the PTE/PMD do we continue to handle the general COW. These
    are the steps of what happens with such a page:

    1. The CPU accesses a write protected shared page (protected by both
    general COW and uffd-wp) and is blocked by uffd-wp first, because
    do_wp_page handles uffd-wp first: it has higher priority than
    general COW.

    2. The uffd service thread receives the request and does
    UFFDIO_WRITEPROTECT to remove the uffd-wp bit from the PTE/PMD.
    However, the write bit is still kept cleared here. Notify the
    blocked CPU.

    3. The blocked CPU resumes the page fault process with a fault
    retry. During the retry it notices that the uffd-wp bit is no
    longer set but the page is still write protected by general COW,
    so it goes through the COW path in the fault handler, copies the
    page, applies the write bit where necessary, and retries again.

    4. The CPU will be able to access this page with write bit set.

    Suggested-by: Andrea Arcangeli
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Brian Geffon
    Cc: Pavel Emelyanov
    Cc: Mike Kravetz
    Cc: David Hildenbrand
    Cc: Martin Cracauer
    Cc: Mel Gorman
    Cc: Bobby Powers
    Cc: Mike Rapoport
    Cc: "Kirill A . Shutemov"
    Cc: Maya Gokhale
    Cc: Johannes Weiner
    Cc: Marty McFadden
    Cc: Denis Plotnikov
    Cc: Hugh Dickins
    Cc: "Dr . David Alan Gilbert"
    Cc: Jerome Glisse
    Cc: Rik van Riel
    Cc: Shaohua Li
    Link: http://lkml.kernel.org/r/20200220163112.11409-8-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • This allows UFFDIO_COPY to map pages write-protected.
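
    From userspace this is requested with the UFFDIO_COPY_MODE_WP flag; a
    hedged sketch (uffd, dst_addr, src_buf and page_size are assumed to be
    set up by the caller):

    struct uffdio_copy copy = {
        .dst  = dst_addr,
        .src  = (unsigned long)src_buf,
        .len  = page_size,
        .mode = UFFDIO_COPY_MODE_WP, /* map the copied page write-protected */
    };
    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
        perror("UFFDIO_COPY");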

    [peterx@redhat.com: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets
    around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and
    commit messages]
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Jerome Glisse
    Reviewed-by: Mike Rapoport
    Cc: Bobby Powers
    Cc: Brian Geffon
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Cc: Rik van Riel
    Cc: Shaohua Li
    Link: http://lkml.kernel.org/r/20200220163112.11409-6-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

03 Apr, 2020

1 commit

  • Patch series "hugetlbfs: use i_mmap_rwsem for more synchronization", v2.

    While discussing the issue with huge_pte_offset [1], I remembered that
    there were more outstanding hugetlb races. These issues are:

    1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can
       become invalid via a call to huge_pmd_unshare by another thread.
    2) hugetlbfs page faults can race with truncation, causing invalid
       global reserve counts and state.

    A previous attempt was made to use i_mmap_rwsem in this manner as
    described at [2]. However, those patches were reverted starting with [3]
    due to locking issues.

    To effectively use i_mmap_rwsem to address the above issues it needs to be
    held (in read mode) during page fault processing. However, during fault
    processing we need to lock the page we will be adding. Lock ordering
    requires we take page lock before i_mmap_rwsem. Waiting until after
    taking the page lock is too late in the fault process for the
    synchronization we want to do.

    To address this lock ordering issue, the following patches change the
    lock ordering for hugetlb pages. This is not too invasive, as hugetlbfs
    processing is done separately from core mm in many places. However, I
    don't really like this idea. Much ugliness is contained in the new
    routine hugetlb_page_mapping_lock_write() of patch 1.

    The only other way I can think of to address these issues is by catching
    all the races. After catching a race, cleanup, backout, retry ... etc,
    as needed. This can get really ugly, especially for huge page
    reservations. At one time, I started writing some of the reservation
    backout code for page faults and it got so ugly and complicated I went
    down the path of adding synchronization to avoid the races. Any other
    suggestions would be welcome.

    [1] https://lore.kernel.org/linux-mm/1582342427-230392-1-git-send-email-longpeng2@huawei.com/
    [2] https://lore.kernel.org/linux-mm/20181222223013.22193-1-mike.kravetz@oracle.com/
    [3] https://lore.kernel.org/linux-mm/20190103235452.29335-1-mike.kravetz@oracle.com
    [4] https://lore.kernel.org/linux-mm/1584028670.7365.182.camel@lca.pw/
    [5] https://lore.kernel.org/lkml/20200312183142.108df9ac@canb.auug.org.au/

    This patch (of 2):

    While looking at BUGs associated with invalid huge page map counts, it was
    discovered and observed that a huge pte pointer could become 'invalid' and
    point to another task's page table. Consider the following:

    A task takes a page fault on a shared hugetlbfs file and calls
    huge_pte_alloc to get a ptep. Suppose the returned ptep points to a
    shared pmd.

    Now, another task truncates the hugetlbfs file. As part of truncation, it
    unmaps everyone who has the file mapped. If the range being truncated is
    covered by a shared pmd, huge_pmd_unshare will be called. For all but the
    last user of the shared pmd, huge_pmd_unshare will clear the pud pointing
    to the pmd. If the task in the middle of the page fault is not the last
    user, the ptep returned by huge_pte_alloc now points to another task's
    page table or worse. This leads to bad things such as incorrect page
    map/reference counts or invalid memory references.

    To fix, expand the use of i_mmap_rwsem as follows:
    - i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
      huge_pmd_share is only called via huge_pte_alloc, so callers of
      huge_pte_alloc take i_mmap_rwsem before calling. In addition, callers
      of huge_pte_alloc continue to hold the semaphore until finished with
      the ptep.
    - i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.

    One problem with this scheme is that it requires taking i_mmap_rwsem
    before taking the page lock during page faults. This is not the order
    specified in the rest of mm code. Handling of hugetlbfs pages is mostly
    isolated today. Therefore, we use this alternative locking order for
    PageHuge() pages.

    mapping->i_mmap_rwsem
      hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
        page->flags PG_locked (lock_page)
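
    In code, taking the locks in that order looks roughly like this (a
    hedged fragment; the fault mutex hash computation is elided):

    i_mmap_lock_read(mapping);                     /* i_mmap_rwsem, read */
    mutex_lock(&hugetlb_fault_mutex_table[hash]);  /* hugetlb fault mutex */
    lock_page(page);                               /* PG_locked last */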

    To help with lock ordering issues, hugetlb_page_mapping_lock_write() is
    introduced to write lock the i_mmap_rwsem associated with a page.

    In most cases it is easy to get address_space via vma->vm_file->f_mapping.
    However, in the case of migration or memory errors for anon pages we do
    not have an associated vma. A new routine _get_hugetlb_page_mapping()
    will use anon_vma to get address_space in these cases.

    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: Naoya Horiguchi
    Cc: "Aneesh Kumar K . V"
    Cc: Andrea Arcangeli
    Cc: "Kirill A . Shutemov"
    Cc: Davidlohr Bueso
    Cc: Prakash Sangappa
    Link: http://lkml.kernel.org/r/20200316205756.146666-2-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

02 Dec, 2019

6 commits

  • There are several places that emphasise the effect of
    __SetPageUptodate(), but the comment has a typo in two places.

    Link: http://lkml.kernel.org/r/20190926023705.7226-1-richardw.yang@linux.intel.com
    Signed-off-by: Wei Yang
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • When doing UFFDIO_COPY, it is necessary to find the correct destination
    vma and make sure the fault range is within it.

    Since two places need to do the same task, wrap the common checks into
    an inline function.

    Link: http://lkml.kernel.org/r/20190927070032.2129-3-richardw.yang@linux.intel.com
    Signed-off-by: Wei Yang
    Reviewed-by: Mike Kravetz
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • These warnings are here to make sure the address (dst_addr) and length
    (len - copied) are huge page size aligned.

    This is already ensured because:

    dst_start and len are huge page size aligned
    dst_addr starts at dst_start and increases by huge page size each time
    copied increases by huge page size each time

    This means these warnings can never be triggered.

    Link: http://lkml.kernel.org/r/20190927070032.2129-2-richardw.yang@linux.intel.com
    Signed-off-by: Wei Yang
    Reviewed-by: Mike Kravetz
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • In __mcopy_atomic_hugetlb() we use two variables to deal with huge page
    size: vma_hpagesize and huge_page_size.

    Since they are the same, it is not necessary to use two different
    variables. This patch makes the code consistent by using vma_hpagesize
    everywhere.

    Link: http://lkml.kernel.org/r/20190927070032.2129-1-richardw.yang@linux.intel.com
    Signed-off-by: Wei Yang
    Reviewed-by: Mike Kravetz
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • The first parameter hstate in function hugetlb_fault_mutex_hash() is not
    used anymore.

    This patch removes it.

    [akpm@linux-foundation.org: various build fixes]
    [cai@lca.pw: fix a GCC compilation warning]
    Link: http://lkml.kernel.org/r/1570544108-32331-1-git-send-email-cai@lca.pw
    Link: http://lkml.kernel.org/r/20191005003302.785-1-richardw.yang@linux.intel.com
    Signed-off-by: Wei Yang
    Signed-off-by: Qian Cai
    Suggested-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Mike Kravetz
    Cc: Hugh Dickins
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • A new clang diagnostic (-Wsizeof-array-div) warns about the calculation
    to determine the number of u32's in an array of unsigned longs.
    Suppress warning by adding parentheses.
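
    A hedged illustration of the warning and the fix (the hash computation
    is of this shape; the exact code may differ):

    unsigned long key[2];

    /* before: -Wsizeof-array-div warns, u32 is not key's element type */
    hash = jhash2((u32 *)&key, sizeof(key) / sizeof(u32), 0);

    /* after: parentheses make the intent explicit and silence clang */
    hash = jhash2((u32 *)&key, sizeof(key) / (sizeof(u32)), 0);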

    While looking at the above issue, I noticed that the 'address' parameter
    to hugetlb_fault_mutex_hash is no longer used. So, remove it from the
    definition and all callers.

    No functional change.

    Link: http://lkml.kernel.org/r/20190919011847.18400-1-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reported-by: Nathan Chancellor
    Reviewed-by: Nathan Chancellor
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Andrew Morton
    Cc: Nick Desaulniers
    Cc: Ilie Halip
    Cc: David Bolvansky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

19 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this work is licensed under the terms of the gnu gpl version 2 see
    the copying file in the top level directory

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 35 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kate Stewart
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.797835076@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

1 commit

  • hugetlb uses a fault mutex hash table to prevent concurrent page faults
    on the same pages. The key for shared and private mappings is different.
    Shared mappings key off the address_space and file index; private
    mappings key off the mm and virtual address. Consider a private mapping
    of a populated hugetlbfs file. A fault will map the page from the file
    and, if needed, do a COW to map a writable page.

    Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
    pages. It uses the address_space file index key. However, private
    mappings will use a different key and could race with this code to map
    the file page. This causes problems (BUG) for the page cache remove
    code as it expects the page to be unmapped. A sample stack is:

    page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
    kernel BUG at mm/filemap.c:169!
    ...
    RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
    ...
    Call Trace:
    __delete_from_page_cache+0x39/0x220
    delete_from_page_cache+0x45/0x70
    remove_inode_hugepages+0x13c/0x380
    ? __add_to_page_cache_locked+0x162/0x380
    hugetlbfs_fallocate+0x403/0x540
    ? _cond_resched+0x15/0x30
    ? __inode_security_revalidate+0x5d/0x70
    ? selinux_file_permission+0x100/0x130
    vfs_fallocate+0x13f/0x270
    ksys_fallocate+0x3c/0x80
    __x64_sys_fallocate+0x1a/0x20
    do_syscall_64+0x5b/0x180
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    There seems to be another potential COW issue/race with this approach
    of different private and shared keys as noted in commit 8382d914ebf7
    ("mm, hugetlb: improve page-fault scalability").

    Since every hugetlb mapping (even anon and private) is actually a file
    mapping, just use the address_space index key for all mappings. This
    results in potentially more hash collisions. However, this should not
    be the common case.

    Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
    Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
    Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
    Signed-off-by: Mike Kravetz
    Reviewed-by: Naoya Horiguchi
    Reviewed-by: Davidlohr Bueso
    Cc: Joonsoo Kim
    Cc: "Kirill A . Shutemov"
    Cc: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

09 Jan, 2019

1 commit

  • This reverts commit b43a9990055958e70347c56f90ea2ae32c67334c.

    The reverted commit caused issues with migration and poisoning of anon
    huge pages. The LTP move_pages12 test triggers an "unable to handle
    kernel NULL pointer" BUG with a stack similar to:

    RIP: 0010:down_write+0x1b/0x40
    Call Trace:
    migrate_pages+0x81f/0xb90
    __ia32_compat_sys_migrate_pages+0x190/0x190
    do_move_pages_to_node.isra.53.part.54+0x2a/0x50
    kernel_move_pages+0x566/0x7b0
    __x64_sys_move_pages+0x24/0x30
    do_syscall_64+0x5b/0x180
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The purpose of the reverted patch was to fix some long existing races
    with huge pmd sharing. It used i_mmap_rwsem for this purpose, with the
    idea that this could also be used to address truncate/page fault races
    in another patch. Further analysis has determined that i_mmap_rwsem
    cannot be used to address all of these hugetlbfs synchronization issues.
    Therefore, revert this patch while working on another approach to the
    underlying issues.

    Link: http://lkml.kernel.org/r/20190103235452.29335-2-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reported-by: Jan Stancek
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: Naoya Horiguchi
    Cc: "Aneesh Kumar K . V"
    Cc: Andrea Arcangeli
    Cc: "Kirill A . Shutemov"
    Cc: Davidlohr Bueso
    Cc: Prakash Sangappa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

05 Jan, 2019

1 commit

  • Patch series "Add support for fast mremap".

    This series speeds up the mremap(2) syscall by copying page tables at
    the PMD level even for non-THP systems. There is concern that the extra
    'address' argument that mremap passes to pte_alloc may do something
    subtle and architecture-related in the future that may make the scheme
    not work. We also find that there is no point in passing the 'address'
    to pte_alloc since it's unused. This patch therefore removes this
    argument tree-wide, resulting in a nice negative diff as well. Along
    the way we also ensured that the enabled architectures do not do
    anything funky with the 'address' argument that goes unnoticed by the
    optimization.

    Build and boot tested on x86-64. Build tested on arm64. The config
    enablement patch for arm64 will be posted in the future after more
    testing.

    The changes were obtained by applying the following Coccinelle script
    (thanks Julia for answering all Coccinelle questions!). The following
    fixups were done manually:
    * Removal of address argument from pte_fragment_alloc
    * Removal of pte_alloc_one_fast definitions from m68k and microblaze.

    // Options: --include-headers --no-includes
    // Note: I split the 'identifier fn' line, so if you are manually
    // running it, please unsplit it so it runs for you.

    virtual patch

    @pte_alloc_func_def depends on patch exists@
    identifier E2;
    identifier fn =~
    "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
    type T2;
    @@

    fn(...
    - , T2 E2
    )
    { ... }

    @pte_alloc_func_proto_noarg depends on patch exists@
    type T1, T2, T3, T4;
    identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
    @@

    (
    - T3 fn(T1, T2);
    + T3 fn(T1);
    |
    - T3 fn(T1, T2, T4);
    + T3 fn(T1, T2);
    )

    @pte_alloc_func_proto depends on patch exists@
    identifier E1, E2, E4;
    type T1, T2, T3, T4;
    identifier fn =~
    "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
    @@

    (
    - T3 fn(T1 E1, T2 E2);
    + T3 fn(T1 E1);
    |
    - T3 fn(T1 E1, T2 E2, T4 E4);
    + T3 fn(T1 E1, T2 E2);
    )

    @pte_alloc_func_call depends on patch exists@
    expression E2;
    identifier fn =~
    "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
    @@

    fn(...
    -, E2
    )

    @pte_alloc_macro depends on patch exists@
    identifier fn =~
    "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
    identifier a, b, c;
    expression e;
    position p;
    @@

    (
    - #define fn(a, b, c) e
    + #define fn(a, b) e
    |
    - #define fn(a, b) e
    + #define fn(a) e
    )
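
    For illustration, the net effect at a typical call site (the rules
    simply drop the trailing address argument):

    /* before */
    pte = pte_alloc_map(mm, pmd, address);

    /* after: the unused 'address' argument is gone tree-wide */
    pte = pte_alloc_map(mm, pmd);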

    Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
    Signed-off-by: Joel Fernandes (Google)
    Suggested-by: Kirill A. Shutemov
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Julia Lawall
    Cc: Kirill A. Shutemov
    Cc: William Kucharski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joel Fernandes (Google)
     

29 Dec, 2018

1 commit

  • While looking at BUGs associated with invalid huge page map counts, it was
    discovered and observed that a huge pte pointer could become 'invalid' and
    point to another task's page table. Consider the following:

    A task takes a page fault on a shared hugetlbfs file and calls
    huge_pte_alloc to get a ptep. Suppose the returned ptep points to a
    shared pmd.

    Now, another task truncates the hugetlbfs file. As part of truncation, it
    unmaps everyone who has the file mapped. If the range being truncated is
    covered by a shared pmd, huge_pmd_unshare will be called. For all but the
    last user of the shared pmd, huge_pmd_unshare will clear the pud pointing
    to the pmd. If the task in the middle of the page fault is not the last
    user, the ptep returned by huge_pte_alloc now points to another task's
    page table or worse. This leads to bad things such as incorrect page
    map/reference counts or invalid memory references.

    To fix, expand the use of i_mmap_rwsem as follows:

    - i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
      huge_pmd_share is only called via huge_pte_alloc, so callers of
      huge_pte_alloc take i_mmap_rwsem before calling. In addition, callers
      of huge_pte_alloc continue to hold the semaphore until finished with
      the ptep.

    - i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is
      called.

    [mike.kravetz@oracle.com: add explicit check for mapping != null]
    Link: http://lkml.kernel.org/r/20181218223557.5202-2-mike.kravetz@oracle.com
    Fixes: 39dde65c9940 ("shared page table for hugetlb page")
    Signed-off-by: Mike Kravetz
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: Naoya Horiguchi
    Cc: "Aneesh Kumar K . V"
    Cc: Andrea Arcangeli
    Cc: Davidlohr Bueso
    Cc: Prakash Sangappa
    Cc: Colin Ian King
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

01 Dec, 2018

4 commits

  • With MAP_SHARED: recheck the i_size after taking the PT lock, to
    serialize against truncate with the PT lock. Delete the page from the
    pagecache if the i_size_read check fails.

    With MAP_PRIVATE: check the i_size after the PT lock before mapping
    anonymous memory or zeropages into the MAP_PRIVATE shmem mapping.

    A mostly irrelevant cleanup: as we already do the
    delete_from_page_cache() pagecache removal after dropping the PT lock,
    also drop the PT lock (a spinlock) before taking the sleepable page
    lock.

    Link: http://lkml.kernel.org/r/20181126173452.26955-5-aarcange@redhat.com
    Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Mike Rapoport
    Reviewed-by: Hugh Dickins
    Reported-by: Jann Horn
    Cc:
    Cc: "Dr. David Alan Gilbert"
    Cc: Mike Kravetz
    Cc: Peter Xu
    Cc: stable@vger.kernel.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • After the VMA to register the uffd onto is found, check that it has
    VM_MAYWRITE set before allowing registration. This way we inherit all
    common code checks before allowing file holes in shmem and hugetlbfs to
    be filled with UFFDIO_COPY.

    The userfaultfd memory model is not applicable for readonly files unless
    it's a MAP_PRIVATE.

    Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com
    Fixes: ff62a3421044 ("hugetlb: implement memfd sealing")
    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Mike Rapoport
    Reviewed-by: Hugh Dickins
    Reported-by: Jann Horn
    Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
    Cc:
    Cc: "Dr. David Alan Gilbert"
    Cc: Mike Kravetz
    Cc: Peter Xu
    Cc: stable@vger.kernel.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Userfaultfd did not create private memory when UFFDIO_COPY was invoked
    on a MAP_PRIVATE shmem mapping. Instead it wrote to the shmem file,
    even when that had not been opened for writing. Though, fortunately,
    that could only happen where there was a hole in the file.

    Fix the shmem-backed implementation of UFFDIO_COPY to create private
    memory for MAP_PRIVATE mappings. The hugetlbfs-backed implementation
    was already correct.

    This change is visible to userland, if userfaultfd has been used in
    unintended ways: so it introduces a small risk of incompatibility, but
    is necessary in order to respect file permissions.

    An app that uses UFFDIO_COPY for anything like postcopy live migration
    won't notice the difference, and in fact it'll run faster because there
    will be no copy-on-write and memory waste in the tmpfs pagecache
    anymore.

    Userfaults on MAP_PRIVATE shmem keep triggering only on file holes like
    before.

    The real zeropage can also be built on a MAP_PRIVATE shmem mapping
    through UFFDIO_ZEROPAGE and that's safe because the zeropage pte is
    never dirty, in turn even an mprotect upgrading the vma permission from
    PROT_READ to PROT_READ|PROT_WRITE won't make the zeropage pte writable.

    Link: http://lkml.kernel.org/r/20181126173452.26955-3-aarcange@redhat.com
    Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
    Signed-off-by: Andrea Arcangeli
    Reported-by: Mike Rapoport
    Reviewed-by: Hugh Dickins
    Cc:
    Cc: "Dr. David Alan Gilbert"
    Cc: Jann Horn
    Cc: Mike Kravetz
    Cc: Peter Xu
    Cc: stable@vger.kernel.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Patch series "userfaultfd shmem updates".

    Jann found two bugs in the userfaultfd shmem MAP_SHARED backend: the
    lack of the VM_MAYWRITE check and the lack of i_size checks.

    Then looking into the above we also fixed the MAP_PRIVATE case.

    Hugh, by source review, also found a data loss source if UFFDIO_COPY
    is used on shmem MAP_SHARED PROT_READ mappings (the production usages
    incidentally run with PROT_READ|PROT_WRITE, so the data loss couldn't
    happen in those production usages, like with QEMU).

    The whole patchset is marked for stable.

    We verified that QEMU postcopy live migration, with the guest running on
    shmem MAP_PRIVATE, runs as well after the fix of shmem MAP_PRIVATE as it
    did before. Regardless of whether it's shmem or hugetlbfs, MAP_PRIVATE
    or MAP_SHARED, QEMU unconditionally invokes a punch hole if the guest
    mapping is file-backed, and a MADV_DONTNEED too (needed to get rid of
    the MAP_PRIVATE COWs and for the anon backend).

    This patch (of 5):

    We internally used EFAULT to communicate with the caller; switch to
    ENOENT so that EFAULT can be used as a non-internal retval.

    Link: http://lkml.kernel.org/r/20181126173452.26955-2-aarcange@redhat.com
    Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Mike Rapoport
    Reviewed-by: Hugh Dickins
    Cc: Mike Kravetz
    Cc: Jann Horn
    Cc: Peter Xu
    Cc: "Dr. David Alan Gilbert"
    Cc:
    Cc: stable@vger.kernel.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

08 Jun, 2018

1 commit

  • If a process monitored with userfaultfd changes its memory mappings or
    forks() at the same time as the uffd monitor fills the process memory
    with UFFDIO_COPY, the actual creation of page table entries and copying
    of the data in mcopy_atomic may happen either before or after the memory
    mapping modifications, and there is no way for the uffd monitor to
    maintain a consistent view of the process memory layout.

    For instance, let's consider fork() running in parallel with
    userfaultfd_copy():

    process                          |  uffd monitor
    ---------------------------------+------------------------------
    fork()                           |  userfaultfd_copy()
    ...                              |  ...
    dup_mmap()                       |  down_read(mmap_sem)
    down_write(mmap_sem)             |  /* create PTEs, copy data */
    dup_uffd()                       |  up_read(mmap_sem)
    copy_page_range()                |
    up_write(mmap_sem)               |
    dup_uffd_complete()              |
      /* notify monitor */           |

    If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will
    be present by the time copy_page_range() is called and they will appear
    in the child's memory mappings. However, if the fork() is the first to
    take the mmap_sem, the new pages won't be mapped in the child's address
    space.

    If the pages are not present and the child tries to access them, the
    monitor will get a page fault notification and everything is fine.
    However, if the pages *are present*, the child can access them without
    uffd noticing, and if we then copy them into the child it'll see the
    wrong data. Since we are talking about a background copy, we'd need to
    decide whether the pages should be copied or not regardless of #PF
    notifications.

    Since the userfaultfd monitor has no way to determine the order, let's
    disallow userfaultfd_copy in parallel with the non-cooperative events.
    In such a case we return -EAGAIN, and the uffd monitor can understand
    that userfaultfd_copy() clashed with a non-cooperative event and take
    an appropriate action.
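
    On the monitor side, the contract this adds looks roughly like the
    following (hedged; the error is reported through the ioctl's copy
    field):

    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1 && copy.copy == -EAGAIN) {
        /* raced with fork()/mremap()/...: drain the event queue,
         * rebuild the view of the address space, then retry the copy */
    }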

    Link: http://lkml.kernel.org/r/1527061324-19949-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Pavel Emelyanov
    Cc: Andrea Arcangeli
    Cc: Mike Kravetz
    Cc: Andrei Vagin
    Signed-off-by: Andrew Morton

    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

07 Feb, 2018

1 commit

  • These duplicate includes have been found with scripts/checkincludes.pl but
    they have been removed manually to avoid removing false positives.

    Link: http://lkml.kernel.org/r/1512580957-6071-1-git-send-email-pravin.shedge4linux@gmail.com
    Signed-off-by: Pravin Shedge
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pravin Shedge
     

07 Sep, 2017

2 commits

  • For shmem VMAs we can use shmem_mfill_zeropage_pte for UFFDIO_ZEROPAGE

    Link: http://lkml.kernel.org/r/1497939652-16528-6-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Shuffle the code a bit to improve readability.

    Link: http://lkml.kernel.org/r/1497939652-16528-5-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

25 Feb, 2017

1 commit

  • The memory mapping of a process may change between the #PF event and the
    call to mcopy_atomic that comes to resolve the page fault. In such a
    case, there will be no VMA covering the range passed to mcopy_atomic, or
    the VMA will not have userfaultfd context.

    To allow the uffd monitor to distinguish these cases from other errors,
    let's return -ENOENT instead of -EINVAL.

    Note that despite the availability of UFFD_EVENT_UNMAP, there still
    might be a race between the processing of UFFD_EVENT_UNMAP and an
    outstanding mcopy_atomic in the case of non-cooperative uffd usage.
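
    A hedged sketch of how a non-cooperative monitor can use this:

    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1 && copy.copy == -ENOENT) {
        /* the VMA vanished or lost its uffd context (e.g. an unmap or
         * remap raced with us): stop resolving this range and wait for
         * the corresponding event instead of treating it as an error */
    }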

    [rppt@linux.vnet.ibm.com: update cases returning -ENOENT]
    Link: http://lkml.kernel.org/r/20170207150249.GA6709@rapoport-lnx
    [aarcange@redhat.com: merge fix]
    [akpm@linux-foundation.org: fix the merge fix]
    Link: http://lkml.kernel.org/r/1485542673-24387-5-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

23 Feb, 2017

6 commits

  • When userfaultfd hugetlbfs support was originally added, it followed the
    pattern of anon mappings and did not support any vmas marked VM_SHARED.
    As such, support was only added for private mappings.

    Remove this limitation and support shared mappings. The primary
    functional change required is adding pages to the page cache. More subtle
    changes are required for huge page reservation handling in error paths. A
    lengthy comment in the code describes the reservation handling.

    [mike.kravetz@oracle.com: update]
    Link: http://lkml.kernel.org/r/c9c8cafe-baa7-05b4-34ea-1dfa5523a85f@oracle.com
    Link: http://lkml.kernel.org/r/1487195210-12839-1-git-send-email-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reviewed-by: Andrea Arcangeli
    Cc: Andrew Morton
    Cc: Mike Rapoport
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • The shmem_mcopy_atomic_pte implements the low-level part of the
    UFFDIO_COPY operation for shared memory VMAs. It's based on
    mcopy_atomic_pte with adjustments necessary for shared memory pages.

    Link: http://lkml.kernel.org/r/20161216144821.5183-32-aarcange@redhat.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • If __mcopy_atomic_hugetlb exits with an error, put_page will be called
    if a huge page was allocated and needs to be freed. If a reservation
    was associated with the huge page, the PagePrivate flag will be set.
    Clear PagePrivate before calling put_page/free_huge_page so that the
    global reservation count is not incremented.
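
    A hedged fragment of the error-path ordering this describes:

    if (page) {
        ClearPagePrivate(page); /* don't let free_huge_page() restore
                                 * the global reservation count */
        put_page(page);
    }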

    Link: http://lkml.kernel.org/r/20161216144821.5183-26-aarcange@redhat.com
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • The new routine copy_huge_page_from_user() uses kmap_atomic() to map
    PAGE_SIZE pages. However, this prevents page faults in the subsequent
    call to copy_from_user(). This is OK in the case where the routine is
    called with mmap_sem held. However, in another case we want to allow
    page faults. So, add a new argument allow_pagefault to indicate whether
    the routine should allow page faults.
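
    A hedged fragment of the distinction the new argument introduces:

    if (allow_pagefault)
        page_kaddr = kmap(dst_page);        /* may sleep; faults allowed */
    else
        page_kaddr = kmap_atomic(dst_page); /* atomic: faults disabled */
    rc = copy_from_user(page_kaddr, usr_src, PAGE_SIZE);
    if (allow_pagefault)
        kunmap(dst_page);                   /* kunmap() takes the page */
    else
        kunmap_atomic(page_kaddr);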

    [dan.carpenter@oracle.com: unmap the correct pointer]
    Link: http://lkml.kernel.org/r/20170113082608.GA3548@mwanda
    [akpm@linux-foundation.org: kunmap() takes a page*, per Hugh]
    Link: http://lkml.kernel.org/r/20161216144821.5183-20-aarcange@redhat.com
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Dan Carpenter
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Cc: Hugh Dickins
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • __mcopy_atomic_hugetlb performs the UFFDIO_COPY operation for huge
    pages. It is based on the existing __mcopy_atomic routine for normal
    pages. Unlike normal pages, there is no huge page support for the
    UFFDIO_ZEROPAGE operation.

    Link: http://lkml.kernel.org/r/20161216144821.5183-19-aarcange@redhat.com
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • Cleanup the vma->vm_ops usage.

    Side note: it would be more robust if vma_is_anonymous() also checked
    that vm_flags doesn't have VM_PFNMAP set.

    Link: http://lkml.kernel.org/r/20161216144821.5183-5-aarcange@redhat.com
    Signed-off-by: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files; I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)
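
    The net effect at a typical call site, per the rules above (an
    illustrative fragment):

    /* before */
    offset = pos >> PAGE_CACHE_SHIFT;
    page_cache_get(page);
    page_cache_release(page);

    /* after */
    offset = pos >> PAGE_SHIFT;
    get_page(page);
    put_page(page);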

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov