07 Sep, 2017

2 commits

  • For shmem VMAs we can use shmem_mfill_zeropage_pte for UFFDIO_ZEROPAGE

    Link: http://lkml.kernel.org/r/1497939652-16528-6-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
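
    As an illustration of the dispatch this enables, here is a small
    stand-alone sketch; the two helpers are stand-ins for the kernel
    functions of the same name (the real ones take mm, VMA, pmd and address
    arguments), and only the shmem-vs-anonymous split reflects the commit.

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-ins for the kernel helpers named above. */
    static int mfill_zeropage_pte(void)
    {
            puts("zero page installed in an anonymous VMA");
            return 0;
    }

    static int shmem_mfill_zeropage_pte(void)
    {
            puts("zero page installed in a shmem VMA");
            return 0;
    }

    /* UFFDIO_ZEROPAGE path: shmem-backed VMAs get the shmem helper. */
    static int mfill_zeropage(bool vma_is_shmem)
    {
            return vma_is_shmem ? shmem_mfill_zeropage_pte()
                                : mfill_zeropage_pte();
    }

    int main(void)
    {
            mfill_zeropage(true);
            return mfill_zeropage(false);
    }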
     
  • Shuffle the code a bit to improve readability.

    Link: http://lkml.kernel.org/r/1497939652-16528-5-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

25 Feb, 2017

1 commit

  • The memory mapping of a process may change between the #PF event and
    the call to mcopy_atomic that comes to resolve the page fault. In such
    a case, there will be no VMA covering the range passed to mcopy_atomic,
    or the VMA will not have a userfaultfd context.

    To allow the uffd monitor to distinguish these cases from other errors,
    let's return -ENOENT instead of -EINVAL (see the monitor-side sketch
    below).

    Note that despite the availability of UFFD_EVENT_UNMAP, there still
    might be a race between the processing of UFFD_EVENT_UNMAP and an
    outstanding mcopy_atomic in case of non-cooperative uffd usage.

    [rppt@linux.vnet.ibm.com: update cases returning -ENOENT]
    Link: http://lkml.kernel.org/r/20170207150249.GA6709@rapoport-lnx
    [aarcange@redhat.com: merge fix]
    [akpm@linux-foundation.org: fix the merge fix]
    Link: http://lkml.kernel.org/r/1485542673-24387-5-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
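
    A hedged monitor-side sketch of what the new return value allows: treat
    ENOENT from UFFDIO_COPY as "the mapping went away, skip this fault"
    rather than a hard failure. The helper name and the surrounding policy
    are illustrative, not taken from any particular monitor.

    #include <errno.h>
    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>

    /* Returns 0 when the fault was resolved or the range vanished (ENOENT),
     * -1 on any other error. */
    static int uffd_copy_or_skip(int uffd, unsigned long dst,
                                 unsigned long src, unsigned long len)
    {
            struct uffdio_copy copy = {
                    .dst  = dst,
                    .src  = src,
                    .len  = len,
                    .mode = 0,
            };

            if (ioctl(uffd, UFFDIO_COPY, &copy) == 0)
                    return 0;
            if (errno == ENOENT)        /* VMA gone or uffd context removed */
                    return 0;
            return -1;
    }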
     

23 Feb, 2017

6 commits

  • When userfaultfd hugetlbfs support was originally added, it followed the
    pattern of anon mappings and did not support any vmas marked VM_SHARED.
    As such, support was only added for private mappings.

    Remove this limitation and support shared mappings. The primary
    functional change required is adding pages to the page cache. More subtle
    changes are required for huge page reservation handling in error paths. A
    lengthy comment in the code describes the reservation handling.

    [mike.kravetz@oracle.com: update]
    Link: http://lkml.kernel.org/r/c9c8cafe-baa7-05b4-34ea-1dfa5523a85f@oracle.com
    Link: http://lkml.kernel.org/r/1487195210-12839-1-git-send-email-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reviewed-by: Andrea Arcangeli
    Cc: Andrew Morton
    Cc: Mike Rapoport
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
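
    With shared mappings supported, a monitor can register a MAP_SHARED
    hugetlb area the same way as a private one. A hedged sketch, assuming
    2MB huge pages are available (vm.nr_hugepages preallocated) and leaving
    out the fault-handling loop.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/userfaultfd.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define HPAGE_SIZE (2UL << 20)          /* assumes 2MB huge pages */

    int main(void)
    {
            int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
            struct uffdio_api api = { .api = UFFD_API };

            if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api)) {
                    perror("userfaultfd");
                    return 1;
            }

            /* A *shared* hugetlb mapping, now acceptable to UFFDIO_REGISTER;
             * needs preallocated huge pages to succeed. */
            void *area = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (area == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }

            struct uffdio_register reg = {
                    .range = { .start = (unsigned long)area,
                               .len   = HPAGE_SIZE },
                    .mode  = UFFDIO_REGISTER_MODE_MISSING,
            };
            if (ioctl(uffd, UFFDIO_REGISTER, &reg)) {
                    perror("UFFDIO_REGISTER");
                    return 1;
            }

            /* ... read uffd_msg events and resolve them with UFFDIO_COPY ... */
            return 0;
    }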
     
  • The shmem_mcopy_atomic_pte function implements the low-level part of
    the UFFDIO_COPY operation for shared memory VMAs. It is based on
    mcopy_atomic_pte, with the adjustments necessary for shared memory
    pages.

    Link: http://lkml.kernel.org/r/20161216144821.5183-32-aarcange@redhat.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • If __mcopy_atomic_hugetlb exits with an error, put_page will be called
    if a huge page was allocated and needs to be freed. If a reservation
    was associated with the huge page, the PagePrivate flag will be set.
    Clear PagePrivate before calling put_page/free_huge_page so that the
    global reservation count is not incremented.

    Link: http://lkml.kernel.org/r/20161216144821.5183-26-aarcange@redhat.com
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • The new routine copy_huge_page_from_user() uses kmap_atomic() to map
    PAGE_SIZE pages. However, this prevents page faults in the subsequent
    call to copy_from_user(). This is OK when the routine is called with
    mmap_sem held, but in another case we want to allow page faults. So,
    add a new argument, allow_pagefault, to indicate whether the routine
    should allow page faults.

    [dan.carpenter@oracle.com: unmap the correct pointer]
    Link: http://lkml.kernel.org/r/20170113082608.GA3548@mwanda
    [akpm@linux-foundation.org: kunmap() takes a page*, per Hugh]
    Link: http://lkml.kernel.org/r/20161216144821.5183-20-aarcange@redhat.com
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Dan Carpenter
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • __mcopy_atomic_hugetlb performs the UFFDIO_COPY operation for huge
    pages. It is based on the existing __mcopy_atomic routine for normal
    pages. Unlike for normal pages, there is no support for the
    UFFDIO_ZEROPAGE operation on huge pages.

    Link: http://lkml.kernel.org/r/20161216144821.5183-19-aarcange@redhat.com
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • Clean up the vma->vm_ops usage.

    Side note: it would be more robust if vma_is_anonymous() also checked
    that vm_flags does not have VM_PFNMAP set (a toy sketch of that
    stricter check follows this entry).

    Link: http://lkml.kernel.org/r/20161216144821.5183-5-aarcange@redhat.com
    Signed-off-by: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
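
    To make the side note concrete, here is a toy model of the stricter
    predicate it suggests; struct vm_area and the flag value are reduced
    stand-ins, not the kernel's real definitions.

    #include <stdbool.h>
    #include <stddef.h>

    #define VM_PFNMAP 0x00000400UL              /* stand-in flag bit */

    struct vm_area {                            /* stand-in for vm_area_struct */
            const void   *vm_ops;               /* NULL for anonymous mappings */
            unsigned long vm_flags;
    };

    /* Today: anonymous == "no vm_ops".  The suggested, more robust variant
     * would additionally refuse PFN-mapped areas. */
    static bool vma_is_anonymous_strict(const struct vm_area *vma)
    {
            return vma->vm_ops == NULL && !(vma->vm_flags & VM_PFNMAP);
    }

    int main(void)
    {
            struct vm_area anon = { .vm_ops = NULL, .vm_flags = 0 };
            struct vm_area pfn  = { .vm_ops = NULL, .vm_flags = VM_PFNMAP };

            return vma_is_anonymous_strict(&anon) &&
                   !vma_is_anonymous_strict(&pfn) ? 0 : 1;
    }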
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion whether the
    PAGE_CACHE_* or the PAGE_* constant should be used in a particular
    case, especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straightforward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files. I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

18 Mar, 2016

1 commit

  • There are a few things about the *pte_alloc*() helpers worth cleaning
    up:

    - the 'vma' argument is unused, let's drop it;

    - most __pte_alloc() callers do a speculative check for pmd_none()
    before taking the ptl: let's introduce a pte_alloc() macro which does
    the check (a toy sketch of the idea follows this entry).

    The only direct user of __pte_alloc left is userfaultfd, which has a
    different expectation about atomicity wrt the pmd.

    - pte_alloc_map() and pte_alloc_map_lock() are redefined using
    pte_alloc().

    [sudeep.holla@arm.com: fix build for arm64 hugetlbpage]
    [sfr@canb.auug.org.au: fix arch/arm/mm/mmu.c some more]
    Signed-off-by: Kirill A. Shutemov
    Cc: Dave Hansen
    Signed-off-by: Sudeep Holla
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
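
    A stand-alone toy of the speculative-check idea behind the pte_alloc()
    macro mentioned above: only fall into the allocating slow path when the
    pmd still looks empty. The types and helpers are stand-ins, and the
    nonzero-on-failure convention is an assumption modelled on the kernel's.

    #include <stdbool.h>
    #include <stdlib.h>

    struct pmd { void *pte_table; };            /* stand-in for pmd_t */

    static bool pmd_none(const struct pmd *pmd)
    {
            return pmd->pte_table == NULL;
    }

    /* Slow path: in the kernel this takes locks; here it just allocates.
     * Returns nonzero on failure. */
    static int __pte_alloc(struct pmd *pmd)
    {
            pmd->pte_table = calloc(512, sizeof(void *));
            return pmd->pte_table == NULL;
    }

    /* pte_alloc(): evaluates to nonzero only when an allocation was needed
     * *and* it failed, so callers can write "if (pte_alloc(pmd)) goto oom;". */
    #define pte_alloc(pmd) (pmd_none(pmd) && __pte_alloc(pmd))

    int main(void)
    {
            struct pmd pmd = { 0 };
            return pte_alloc(&pmd);             /* 0 unless allocation fails */
    }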
     

16 Jan, 2016

2 commits

  • As with rmap, with the new refcounting we cannot rely on
    PageTransHuge() to check whether we need to charge the size of a huge
    page to the cgroup. We need to get the information from the caller to
    know whether it was mapped with a PMD or a PTE.

    We uncharge when the last reference on the page is gone. At that point,
    if we see PageTransHuge() it means we need to uncharge the whole huge
    page.

    The tricky part is partial unmap -- when we try to unmap part of a huge
    page. We don't do any special handling of this situation, meaning we
    don't uncharge the part of the huge page unless the last user is gone
    or split_huge_page() is triggered. If cgroup memory pressure happens,
    the partially unmapped page will be split through the shrinker. This
    should be good enough.

    Signed-off-by: Kirill A. Shutemov
    Tested-by: Sasha Levin
    Tested-by: Aneesh Kumar K.V
    Acked-by: Vlastimil Babka
    Acked-by: Jerome Marchand
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Christoph Lameter
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • We're going to allow mapping of individual 4k pages of a THP compound
    page. It means we cannot rely on the PageTransHuge() check to decide
    whether to map/unmap a small page or the whole THP.

    The patch adds a new argument to the rmap functions to indicate whether
    we want to operate on the whole compound page or only on the small page
    (see the sketch after this entry).

    [n-horiguchi@ah.jp.nec.com: fix mapcount mismatch in hugepage migration]
    Signed-off-by: Kirill A. Shutemov
    Tested-by: Sasha Levin
    Tested-by: Aneesh Kumar K.V
    Acked-by: Vlastimil Babka
    Acked-by: Jerome Marchand
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Christoph Lameter
    Cc: David Rientjes
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
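
    A toy model of the new calling convention: the caller states whether
    the whole compound page or a single subpage is being mapped, instead of
    the code inferring it from PageTransHuge(). The structure and counters
    are simplified stand-ins for the real struct page fields.

    #include <stdbool.h>

    struct toy_page {
            int mapcount;                /* per-subpage mapcount (stand-in) */
            int compound_mapcount;       /* whole-THP mapcount (stand-in)   */
    };

    /* page_add_anon_rmap()-style entry point after this change: 'compound'
     * is supplied by the caller rather than derived from the page itself. */
    static void add_anon_rmap(struct toy_page *page, bool compound)
    {
            if (compound)
                    page->compound_mapcount++;
            else
                    page->mapcount++;
    }

    int main(void)
    {
            struct toy_page thp = { 0, 0 };

            add_anon_rmap(&thp, true);   /* map the whole THP via a PMD */
            add_anon_rmap(&thp, false);  /* map one 4k subpage via a PTE */
            return 0;
    }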
     

05 Sep, 2015

2 commits

  • If the rwsem starves writers it wasn't strictly a bug, but lockdep
    doesn't like it and this avoids depending on low-level implementation
    details of the lock.

    [akpm@linux-foundation.org: delete weird BUILD_BUG_ON()]
    Signed-off-by: Andrea Arcangeli
    Acked-by: Pavel Emelyanov
    Cc: Sanidhya Kashyap
    Cc: zhang.zhanghailiang@huawei.com
    Cc: "Kirill A. Shutemov"
    Cc: Andres Lagar-Cavilla
    Cc: Dave Hansen
    Cc: Paolo Bonzini
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Andy Lutomirski
    Cc: Hugh Dickins
    Cc: Peter Feiner
    Cc: "Dr. David Alan Gilbert"
    Cc: Johannes Weiner
    Cc: "Huangpeng (Peter)"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • This implements mcopy_atomic and mfill_zeropage, the low-level VM
    methods invoked by the UFFDIO_COPY and UFFDIO_ZEROPAGE userfaultfd
    commands respectively (a userspace usage sketch follows this entry).

    Signed-off-by: Andrea Arcangeli
    Acked-by: Pavel Emelyanov
    Cc: Sanidhya Kashyap
    Cc: zhang.zhanghailiang@huawei.com
    Cc: "Kirill A. Shutemov"
    Cc: Andres Lagar-Cavilla
    Cc: Dave Hansen
    Cc: Paolo Bonzini
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Andy Lutomirski
    Cc: Hugh Dickins
    Cc: Peter Feiner
    Cc: "Dr. David Alan Gilbert"
    Cc: Johannes Weiner
    Cc: "Huangpeng (Peter)"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
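
    From userspace, mfill_zeropage is reached through the UFFDIO_ZEROPAGE
    ioctl. A hedged sketch, assuming 'uffd' is a userfaultfd descriptor
    with the target range already registered; the helper name is
    illustrative.

    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>

    /* Resolve a missing-page fault at 'addr' by mapping the zero page over
     * 'len' bytes.  Returns 0 on success, -1 with errno set on failure. */
    static int uffd_zeropage(int uffd, unsigned long addr, unsigned long len)
    {
            struct uffdio_zeropage zp = {
                    .range = { .start = addr, .len = len },
                    .mode  = 0,
            };

            return ioctl(uffd, UFFDIO_ZEROPAGE, &zp);
    }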