17 Jul, 2018

1 commit

  • commit bb177a732c4369bb58a1fe1df8f552b6f0f7db5f upstream.

    syzbot has noticed that a specially crafted library can easily hit
    VM_BUG_ON in __mm_populate

    kernel BUG at mm/gup.c:1242!
    invalid opcode: 0000 [#1] SMP
    CPU: 2 PID: 9667 Comm: a.out Not tainted 4.18.0-rc3 #644
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
    RIP: 0010:__mm_populate+0x1e2/0x1f0
    Code: 55 d0 65 48 33 14 25 28 00 00 00 89 d8 75 21 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 75 18 f1 ff 0f 0b e8 6e 18 f1 ff 0b 31 db eb c9 e8 93 06 e0 ff 0f 1f 00 55 48 89 e5 53 48 89 fb
    Call Trace:
    vm_brk_flags+0xc3/0x100
    vm_brk+0x1f/0x30
    load_elf_library+0x281/0x2e0
    __ia32_sys_uselib+0x170/0x1e0
    do_fast_syscall_32+0xca/0x420
    entry_SYSENTER_compat+0x70/0x7f

    The reason is that the length of the new brk is not page aligned when we
    try to populate it. There is no reason to bug on that though.
    do_brk_flags already aligns the length properly so the mapping is
    expanded as it should. All we need is to tell mm_populate about it.
    Besides that there is absolutely no reason to bug_on in the first
    place. The worst thing that could happen is that the last page wouldn't
    get populated and that is far from putting the system into an
    inconsistent state.

    Fix the issue by moving the length sanitization code from do_brk_flags
    up to vm_brk_flags. The only other caller of do_brk_flags is the brk
    syscall entry and it makes sure to provide the proper length, so there
    is no need for sanitization and we can use do_brk_flags without it.

    Also remove the bogus BUG_ONs.

    [osalvador@techadventures.net: fix up vm_brk_flags s@request@len@]
    Link: http://lkml.kernel.org/r/20180706090217.GI32658@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Reported-by: syzbot
    Tested-by: Tetsuo Handa
    Reviewed-by: Oscar Salvador
    Cc: Zi Yan
    Cc: "Aneesh Kumar K.V"
    Cc: Dan Williams
    Cc: "Kirill A. Shutemov"
    Cc: Michael S. Tsirkin
    Cc: Al Viro
    Cc: "Huang, Ying"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Michal Hocko
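
    For illustration, a condensed sketch of the shape of the fix described
    above (not the verbatim upstream diff; the mmap_sem handling and the
    userfaultfd unmap list are elided):

        /*
         * vm_brk_flags() now page-aligns the requested length itself, so
         * do_brk_flags() and mm_populate() both see the same aligned size
         * and __mm_populate() no longer needs a BUG_ON for odd lengths.
         */
        int vm_brk_flags(unsigned long addr, unsigned long request,
                         unsigned long flags)
        {
                unsigned long len = PAGE_ALIGN(request);
                int ret;

                if (len < request)              /* rounding up overflowed */
                        return -ENOMEM;
                if (!len)
                        return 0;

                ret = do_brk_flags(addr, len, flags);   /* wants an aligned len */
                if (!ret && (current->mm->def_flags & VM_LOCKED))
                        mm_populate(addr, len);         /* aligned range, no BUG_ON */
                return ret;
        }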
     

03 Jul, 2018

1 commit

  • commit a9b6de77b1a3ff729f7bfc54b2e17711776a416c upstream.

    get_user_pages_fast() for device pages is missing the typical validation
    that all page references have been taken while the mapping was valid.
    Without this validation truncate operations can not reliably coordinate
    against new page reference events like O_DIRECT.

    Cc:
    Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
    Reported-by: Jan Kara
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     

19 May, 2018

1 commit

  • commit 7f7ccc2ccc2e70c6054685f5e3522efa81556830 upstream.

    proc_pid_cmdline_read() and environ_read() directly access the target
    process' VM to retrieve the command line and environment. If this
    process remaps these areas onto a file via mmap(), the requesting
    process may experience various issues such as extra delays if the
    underlying device is slow to respond.

    Let's simply refuse to access file-backed areas in these functions.
    For this we add a new FOLL_ANON gup flag that is passed to all calls
    to access_remote_vm(). The code already takes care of such failures
    (including unmapped areas). Accesses via /proc/pid/mem were not
    changed though.

    This was assigned CVE-2018-1120.

    Note for stable backports: the patch may apply to kernels prior to 4.11
    but silently miss one location; it must be checked that no call to
    access_remote_vm() keeps zero as the last argument.

    Reported-by: Qualys Security Advisory
    Cc: Linus Torvalds
    Cc: Andy Lutomirski
    Cc: Oleg Nesterov
    Cc: stable@vger.kernel.org
    Signed-off-by: Willy Tarreau
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Willy Tarreau
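
    Roughly, the change has two halves (a sketch based on the description
    above, not the verbatim diff; the local variable names in the /proc
    snippet are illustrative): a new FOLL_ANON gup flag that makes the VMA
    checks in mm/gup.c reject file-backed areas, and /proc callers that pass
    it via access_remote_vm().

        /* mm/gup.c, check_vma_flags(), abridged: fail file-backed VMAs
         * when the caller asked for anonymous memory only. */
        if ((gup_flags & FOLL_ANON) && !vma_is_anonymous(vma))
                return -EFAULT;

        /* fs/proc/base.c, e.g. environ_read()/cmdline read, abridged: */
        retval = access_remote_vm(mm, src, page, this_len, FOLL_ANON);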
     

19 Apr, 2018

1 commit

  • commit c61611f70958d86f659bca25c02ae69413747a8d upstream.

    get_user_pages_fast is supposed to be a faster drop-in equivalent of
    get_user_pages. As such, callers expect it to return a negative return
    code when passed an invalid address, and never expect it to return 0
    when passed a positive number of pages, since its documentation says:

    * Returns number of pages pinned. This may be fewer than the number
    * requested. If nr_pages is 0 or negative, returns 0. If no pages
    * were pinned, returns -errno.

    When get_user_pages_fast falls back on get_user_pages this is exactly
    what happens. Unfortunately the implementation is inconsistent: it
    returns 0 if passed a kernel address, confusing callers: for example,
    the following is pretty common but does not appear to do the right thing
    with a kernel address:

        ret = get_user_pages_fast(addr, 1, writeable, &page);
        if (ret < 0)
                return ret;

    Change get_user_pages_fast to return -EFAULT when supplied a kernel
    address to make it match expectations.

    All callers have been audited for consistency with the documented
    semantics.

    Link: http://lkml.kernel.org/r/1522962072-182137-4-git-send-email-mst@redhat.com
    Fixes: 5b65c4677a57 ("mm, x86/mm: Fix performance regression in get_user_pages_fast()")
    Signed-off-by: Michael S. Tsirkin
    Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com
    Reviewed-by: Andrew Morton
    Cc: Kirill A. Shutemov
    Cc: Huang Ying
    Cc: Jonathan Corbet
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Thorsten Leemhuis
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Michael S. Tsirkin
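
    Sketched against the description above (not the verbatim diff; note
    that access_ok() still took a VERIFY_* argument in this era), the top
    of the generic get_user_pages_fast() now fails early instead of
    returning 0:

        int get_user_pages_fast(unsigned long start, int nr_pages, int write,
                                struct page **pages)
        {
                unsigned long len = (unsigned long)nr_pages << PAGE_SHIFT;

                start &= PAGE_MASK;
                if (nr_pages <= 0)
                        return 0;

                /* a kernel address now yields -EFAULT rather than 0 */
                if (unlikely(!access_ok(VERIFY_WRITE, (void __user *)start, len)))
                        return -EFAULT;

                /* ... IRQ-disabled fast walk elided; fall back to the slow path ... */
                return get_user_pages_unlocked(start, nr_pages, pages,
                                               write ? FOLL_WRITE : 0);
        }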
     

10 Dec, 2017

1 commit

  • [ Upstream commit 5b65c4677a57a1d4414212f9995aa0e46a21ff80 ]

    The 0-day test bot found a performance regression that was tracked down to
    switching x86 to the generic get_user_pages_fast() implementation:

    http://lkml.kernel.org/r/20170710024020.GA26389@yexl-desktop

    The regression was caused by the fact that we now use local_irq_save() +
    local_irq_restore() in get_user_pages_fast() to disable interrupts.
    The x86 implementation used local_irq_disable() + local_irq_enable().

    The fix is to make get_user_pages_fast() use local_irq_disable(),
    leaving local_irq_save() for __get_user_pages_fast() that can be called
    with interrupts disabled.

    Numbers for pinning a gigabyte of memory, one page a time, 20 repeats:

    Before: Average: 14.91 ms, stddev: 0.45 ms
    After: Average: 10.76 ms, stddev: 0.18 ms

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Huang Ying
    Cc: Jonathan Corbet
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Thorsten Leemhuis
    Cc: linux-mm@kvack.org
    Fixes: e585513b76f7 ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation")
    Link: http://lkml.kernel.org/r/20170908215603.9189-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
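
    The distinction described above, side by side (a sketch; gup_pgd_range()
    is the common page-table walker both entry points share after this
    change):

        /* __get_user_pages_fast(): may be called with IRQs already off,
         * so it must save and restore the previous IRQ state. */
        local_irq_save(flags);
        gup_pgd_range(addr, end, write, pages, &nr);
        local_irq_restore(flags);

        /* get_user_pages_fast(): only called with IRQs on, so the cheaper
         * disable/enable pair is sufficient. */
        local_irq_disable();
        gup_pgd_range(addr, end, write, pages, &nr);
        local_irq_enable();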
     

05 Dec, 2017

1 commit

  • commit 2bb6d2837083de722bfdc369cb0d76ce188dd9b4 upstream.

    Patch series "introduce get_user_pages_longterm()", v2.

    Here is a new get_user_pages api for cases where a driver intends to
    keep an elevated page count indefinitely. This is distinct from usages
    like iov_iter_get_pages where the elevated page counts are transient.
    The iov_iter_get_pages cases immediately turn around and submit the
    pages to a device driver which will put_page when the i/o operation
    completes (under kernel control).

    In the longterm case userspace is responsible for dropping the page
    reference at some undefined point in the future. This is untenable for
    filesystem-dax case where the filesystem is in control of the lifetime
    of the block / page and needs reasonable limits on how long it can wait
    for pages in a mapping to become idle.

    Fixing filesystems to actually wait for dax pages to be idle before
    blocks from a truncate/hole-punch operation are repurposed is saved for
    a later patch series.

    Also, allowing longterm registration of dax mappings is a future patch
    series that introduces a "map with lease" semantic where the kernel can
    revoke a lease and force userspace to drop its page references.

    I have also tagged these for -stable to purposely break cases that might
    assume that longterm memory registrations for filesystem-dax mappings
    were supported by the kernel. The behavior regression this policy
    change implies is one of the reasons we maintain the "dax enabled.
    Warning: EXPERIMENTAL, use at your own risk" notification when mounting
    a filesystem in dax mode.

    It is worth noting the device-dax interface does not suffer the same
    constraints since it does not support file space management operations
    like hole-punch.

    This patch (of 4):

    Until there is a solution to the dma-to-dax vs truncate problem it is
    not safe to allow long-standing memory registrations against
    filesystem-dax vmas. Device-dax vmas do not have this problem and are
    explicitly allowed.

    This is temporary until a "memory registration with layout-lease"
    mechanism can be implemented for the affected sub-systems (RDMA and
    V4L2).

    [akpm@linux-foundation.org: use kcalloc()]
    Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
    Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
    Signed-off-by: Dan Williams
    Suggested-by: Christoph Hellwig
    Cc: Doug Ledford
    Cc: Hal Rosenstock
    Cc: Inki Dae
    Cc: Jan Kara
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Joonyoung Shim
    Cc: Kyungmin Park
    Cc: Mauro Carvalho Chehab
    Cc: Mel Gorman
    Cc: Ross Zwisler
    Cc: Sean Hefty
    Cc: Seung-Woo Kim
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
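
    A condensed sketch of the new helper's behaviour as described above
    (simplified: the real function also handles a NULL vmas array by
    allocating a temporary one with kcalloc(), per the fixup note):

        long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
                                     unsigned int gup_flags, struct page **pages,
                                     struct vm_area_struct **vmas)
        {
                long rc, i;

                rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
                for (i = 0; i < rc; i++) {
                        if (!vma_is_fsdax(vmas[i]))
                                continue;
                        /* filesystem-dax vma: drop every reference taken
                         * and refuse the long-term registration. */
                        while (rc > 0)
                                put_page(pages[--rc]);
                        return -EOPNOTSUPP;
                }
                return rc;
        }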
     

09 Sep, 2017

2 commits

  • Platforms with an advanced system bus (like CAPI or CCIX) allow device
    memory to be accessible from the CPU in a cache-coherent fashion. Add a
    new type of ZONE_DEVICE memory to represent it. The use cases are the
    same as for un-addressable device memory but without all the corner
    cases.

    Link: http://lkml.kernel.org/r/20170817000548.32038-19-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Aneesh Kumar
    Cc: Paul E. McKenney
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: Ross Zwisler
    Cc: Balbir Singh
    Cc: David Nellans
    Cc: Evgeny Baskakov
    Cc: Johannes Weiner
    Cc: John Hubbard
    Cc: Kirill A. Shutemov
    Cc: Mark Hairgrove
    Cc: Michal Hocko
    Cc: Sherry Cheung
    Cc: Subhash Gutti
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • When THP migration is being used, memory management code needs to handle
    pmd migration entries properly. This patch uses !pmd_present() or
    is_swap_pmd() (depending on whether pmd_none() needs separate code or
    not) to check pmd migration entries at the places where a pmd entry is
    present.

    Since pmd-related code uses split_huge_page(), split_huge_pmd(),
    pmd_trans_huge(), pmd_trans_unstable(), or
    pmd_none_or_trans_huge_or_clear_bad(), this patch:

    1. adds pmd migration entry split code in split_huge_pmd(),

    2. takes care of pmd migration entries whenever pmd_trans_huge() is present,

    3. makes pmd_none_or_trans_huge_or_clear_bad() pmd migration entry aware.

    Since split_huge_page() uses split_huge_pmd() and pmd_trans_unstable()
    is equivalent to pmd_none_or_trans_huge_or_clear_bad(), we do not change
    them.

    Until this commit, a pmd entry should be:
    1. pointing to a pte page,
    2. is_swap_pmd(),
    3. pmd_trans_huge(),
    4. pmd_devmap(), or
    5. pmd_none().

    Signed-off-by: Zi Yan
    Cc: Kirill A. Shutemov
    Cc: "H. Peter Anvin"
    Cc: Anshuman Khandual
    Cc: Dave Hansen
    Cc: David Nellans
    Cc: Ingo Molnar
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Cc: Andrea Arcangeli
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zi Yan
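
    The helper this series leans on is tiny (a sketch of the is_swap_pmd()
    definition and a typical call-site pattern; the exact call sites vary
    per path):

        static inline int is_swap_pmd(pmd_t pmd)
        {
                /* present means THP/devmap/pte-page; none means empty;
                 * everything else is a (migration) swap entry. */
                return !pmd_none(pmd) && !pmd_present(pmd);
        }

        /* typical use: take the pmd lock for anything huge or migrating */
        if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
                /* ... handle under pmd_lock(), splitting or waiting ... */ ;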
     

07 Sep, 2017

1 commit

  • These functions are the only bits of generic code that use
    {pud,pmd}_pfn() without checking for CONFIG_TRANSPARENT_HUGEPAGE. This
    works fine on x86, the only arch with devmap support, since the *_pfn()
    functions are always defined there, but this isn't true for every
    architecture.

    Link: http://lkml.kernel.org/r/20170626063833.11094-1-oohall@gmail.com
    Signed-off-by: Oliver O'Halloran
    Cc: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oliver O'Halloran
     

07 Jul, 2017

5 commits

  • When speculatively taking references to a hugepage using
    page_cache_add_speculative() in gup_huge_pmd(), it is assumed that the
    page returned by pmd_page() is the head page. Although normally true,
    this assumption doesn't hold when the hugepage comprises successive
    page table entries, such as when using the contiguous bit on arm64 at
    the PTE or PMD level.

    This can be addressed by ensuring that the page passed to
    page_cache_add_speculative() is the real head or by de-referencing the
    head page within the function.

    We take the first approach to keep the usage pattern aligned with
    page_cache_get_speculative() where users already pass the appropriate
    page, i.e., the de-referenced head.

    Apply the same logic to fix gup_huge_[pud|pgd]() as well.

    [punit.agrawal@arm.com: fix arm64 ltp failure]
    Link: http://lkml.kernel.org/r/20170619170145.25577-5-punit.agrawal@arm.com
    Link: http://lkml.kernel.org/r/20170522133604.11392-3-punit.agrawal@arm.com
    Signed-off-by: Punit Agrawal
    Acked-by: Steve Capper
    Cc: Michal Hocko
    Cc: "Kirill A. Shutemov"
    Cc: Aneesh Kumar K.V
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Naoya Horiguchi
    Cc: Mark Rutland
    Cc: Hillf Danton
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Punit Agrawal
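
    Abridged sketch of the fixed gup_huge_pmd() flow described above (the
    gup_huge_pud()/gup_huge_pgd() variants follow the same pattern):

        page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
        /* ... record 'refs' tail pages into pages[] ... */

        /* take the speculative reference on the real compound head rather
         * than assuming pmd_page() already points at it */
        head = compound_head(pmd_page(orig));
        if (!page_cache_add_speculative(head, refs)) {
                *nr -= refs;
                return 0;
        }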
     
  • When operating on hugepages with DEBUG_VM enabled, the GUP code checks
    the compound head for each tail page prior to calling
    page_cache_add_speculative. This is broken, because on the fast-GUP
    path (where we don't hold any page table locks) we can be racing with a
    concurrent invocation of split_huge_page_to_list.

    split_huge_page_to_list deals with this race by using page_ref_freeze to
    freeze the page and force concurrent GUPs to fail whilst the component
    pages are modified. This modification includes clearing the
    compound_head field for the tail pages, so checking this prior to a
    successful call to page_cache_add_speculative can lead to false
    positives: In fact, page_cache_add_speculative *already* has this check
    once the page refcount has been successfully updated, so we can simply
    remove the broken calls to VM_BUG_ON_PAGE.

    Link: http://lkml.kernel.org/r/20170522133604.11392-2-punit.agrawal@arm.com
    Signed-off-by: Will Deacon
    Signed-off-by: Punit Agrawal
    Acked-by: Steve Capper
    Acked-by: Kirill A. Shutemov
    Cc: Aneesh Kumar K.V
    Cc: Catalin Marinas
    Cc: Naoya Horiguchi
    Cc: Mark Rutland
    Cc: Hillf Danton
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     
  • Architectures like ppc64 support hugepage sizes that are not mapped to
    any of the page table levels. Instead they add an alternate page
    table entry format called hugepage directory (hugepd). hugepd indicates
    that the page table entry maps to a set of hugetlb pages. Add support
    for this in the generic follow_page_mask code. We already support this
    format in the generic gup code.

    The default implementation prints a warning and returns NULL. We will
    add ppc64 support in later patches.

    Link: http://lkml.kernel.org/r/1494926612-23928-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Anshuman Khandual
    Cc: Naoya Horiguchi
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • ppc64 supports pgd hugetlb entries. Add code to handle hugetlb pgd
    entries to follow_page_mask so that ppc64 can switch to it to handle
    hugetlb entries.

    Link: http://lkml.kernel.org/r/1494926612-23928-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Anshuman Khandual
    Signed-off-by: Aneesh Kumar K.V
    Cc: Naoya Horiguchi
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     
  • This makes the code easier to read. No functional changes in this
    patch. In a followup patch, we will be updating follow_page_mask to
    handle the hugetlb hugepd format so that archs like ppc64 can switch to
    the generic version. This split helps in doing that nicely.

    Link: http://lkml.kernel.org/r/1494926612-23928-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: Naoya Horiguchi
    Cc: Anshuman Khandual
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     

22 Jun, 2017

1 commit


19 Jun, 2017

1 commit

  • Stack guard page is a useful feature to reduce a risk of stack smashing
    into a different mapping. We have been using a single page gap which
    is sufficient to prevent having stack adjacent to a different mapping.
    But this seems to be insufficient in the light of the stack usage in
    userspace. E.g. glibc uses alloca() as large as 64kB in many commonly
    used functions. Others use constructs like gid_t buffer[NGROUPS_MAX],
    which is 256kB, or stack strings with MAX_ARG_STRLEN.

    This will become especially dangerous for suid binaries with the default
    unlimited stack size, because those applications can be tricked into
    consuming a large portion of the stack and a single glibc call could
    then jump over the guard page. These attacks are not theoretical,
    unfortunately.

    Make those attacks less probable by increasing the stack guard gap
    to 1MB (on systems with 4k pages; but make it depend on the page size
    because systems with larger base pages might cap stack allocations in
    the PAGE_SIZE units) which should cover larger alloca() and VLA stack
    allocations. It is obviously not a full fix because the problem is
    somehow inherent, but it should reduce attack space a lot.

    One could argue that the gap size should be configurable from userspace,
    but that can be done later when somebody finds that the new 1MB is wrong
    for some special case applications. For now, add a kernel command line
    option (stack_guard_gap) to specify the stack gap size (in page units).

    Implementation wise, first delete all the old code for stack guard page:
    because although we could get away with accounting one extra page in a
    stack vma, accounting a larger gap can break userspace - case in point,
    a program run with "ulimit -S -v 20000" failed when the 1MB gap was
    counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
    and strict non-overcommit mode.

    Instead of keeping the gap inside the stack vma, maintain the stack
    guard gap as a gap between vmas: using vm_start_gap() in place of
    vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just
    those few places which need to respect the gap - mainly
    arch_get_unmapped_area(), and the vma tree's subtree_gap support for
    that.

    Original-patch-by: Oleg Nesterov
    Original-patch-by: Michal Hocko
    Signed-off-by: Hugh Dickins
    Acked-by: Michal Hocko
    Tested-by: Helge Deller # parisc
    Signed-off-by: Linus Torvalds

    Hugh Dickins
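
    The vm_start_gap() helper mentioned above ends up looking roughly like
    this (a sketch; stack_guard_gap defaults to 256 pages, i.e. 1MB with
    4kB pages, and is overridable via the stack_guard_gap= command line
    option):

        static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
        {
                unsigned long vm_start = vma->vm_start;

                if (vma->vm_flags & VM_GROWSDOWN) {
                        vm_start -= stack_guard_gap;
                        if (vm_start > vma->vm_start)   /* underflow wrapped */
                                vm_start = 0;
                }
                return vm_start;
        }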
     

13 Jun, 2017

1 commit

  • This patch provides all the callbacks required by the generic
    get_user_pages_fast() code and switches x86 over - and removes
    the platform specific implementation.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170606113133.22974-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     

03 Jun, 2017

1 commit

  • KVM uses get_user_pages() to resolve its stage2 faults. KVM sets the
    FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
    finds a VM_FAULT_HWPOISON. KVM handles these hwpoison pages as a
    special case. (check_user_page_hwpoison())

    When huge pages are involved, this doesn't work so well.
    get_user_pages() calls follow_hugetlb_page(), which stops early if it
    receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
    -EFAULT to the caller. The step to map this to -EHWPOISON based on the
    FOLL_ flags is missing. The hwpoison special case is skipped, and
    -EFAULT is returned to user-space, causing Qemu or kvmtool to exit.

    Instead, move this VM_FAULT_ to errno mapping code into a header file
    and use it from faultin_page() and follow_hugetlb_page().

    With this, KVM works as expected.

    This isn't a problem for arm64 today as we haven't enabled
    MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
    too, so I think this should be a fix. This doesn't apply earlier than
    stable's v4.11.1 due to all sorts of cleanup.

    [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
    suggested.
    Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com
    Signed-off-by: James Morse
    Acked-by: Punit Agrawal
    Acked-by: Naoya Horiguchi
    Cc: "Kirill A . Shutemov"
    Cc: [4.11.1+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Morse
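
    The shared VM_FAULT_* to errno mapping reads roughly as follows (a
    sketch of the vm_fault_to_errno() helper as described above):

        static inline int vm_fault_to_errno(int vm_fault, int foll_flags)
        {
                if (vm_fault & VM_FAULT_OOM)
                        return -ENOMEM;
                if (vm_fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
                        return (foll_flags & FOLL_HWPOISON) ?
                                -EHWPOISON : -EFAULT;
                if (vm_fault & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
                        return -EFAULT;
                return 0;
        }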
     

04 May, 2017

1 commit

  • MIPS just got changed to only accept a pointer argument for access_ok(),
    causing one warning in drivers/scsi/pmcraid.c. I tried changing x86 the
    same way and found the same warning in __get_user_pages_fast() and
    nowhere else in the kernel during randconfig testing:

    mm/gup.c: In function '__get_user_pages_fast':
    mm/gup.c:1578:6: error: passing argument 1 of '__chk_range_not_ok' makes pointer from integer without a cast [-Werror=int-conversion]

    It would probably be a good idea to enforce type-safety in general, so
    let's change this file to not cause a warning if we do that.

    I don't know why the warning did not appear on MIPS.

    Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
    Link: http://lkml.kernel.org/r/20170421162659.3314521-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann
    Cc: Alexander Viro
    Acked-by: Ingo Molnar
    Cc: Michal Hocko
    Cc: "Kirill A. Shutemov"
    Cc: Lorenzo Stoakes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

23 Apr, 2017

1 commit

  • This reverts commit 2947ba054a4dabbd82848728d765346886050029.

    Dan Williams reported dax-pmem kernel warnings with the following signature:

    WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
    percpu ref (dax_pmem_percpu_release [dax_pmem])
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Michal Hocko
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: aneesh.kumar@linux.vnet.ibm.com
    Cc: dann.frazier@canonical.com
    Cc: dave.hansen@intel.com
    Cc: steve.capper@linaro.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

18 Mar, 2017

7 commits

  • This patch provides all the callbacks required by the generic
    get_user_pages_fast() code and switches x86 over - and removes
    the platform specific implementation.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316213906.89528-1-kirill.shutemov@linux.intel.com
    [ Minor readability edits. ]
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     
  • This is a preparation patch for the transition of x86 to the generic GUP_fast()
    implementation.

    On x86, get_user_pages_fast() does a couple of sanity checks to see if we can
    call __get_user_pages_fast() for the range.

    This kind of wrapping protection should be useful for the generic code too.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-7-kirill.shutemov@linux.intel.com
    [ Small readability edits. ]
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     
  • This is a preparation patch for the transition of x86 to the generic GUP_fast()
    implementation.

    Prepare generic GUP_fast() to handle dev_pagemap(). At the moment, it's
    only implemented on x86. On non-x86, the new code will be compiled out.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dan Williams
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-6-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     
  • This is a preparation patch for the transition of x86 to the generic GUP_fast()
    implementation.

    Unlike generic GUP_fast(), the x86 version makes all pages it touches
    referenced. It seems required for GRU and EPT.

    See the following commit:

    8ee53820edfd ("thp: mmu_notifier_test_young")

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-5-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     
  • This is a preparation patch for the transition of x86 to the generic GUP_fast()
    implementation.

    On x86 PAE, a page table entry is larger than sizeof(long), so we would
    need to provide a helper that can read the entry atomically.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-4-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
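
    The helper ends up split on a config symbol so that PAE can read the
    two halves of the entry with ordering barriers (a sketch; the config
    symbol and the pte_low/pte_high field names here are taken from x86 PAE
    and are an assumption about the exact upstream naming):

        #ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
        static inline pte_t gup_get_pte(pte_t *ptep)
        {
                pte_t pte;

                do {
                        pte.pte_low  = ptep->pte_low;
                        smp_rmb();
                        pte.pte_high = ptep->pte_high;
                        smp_rmb();
                } while (unlikely(pte.pte_low != ptep->pte_low));

                return pte;
        }
        #else
        static inline pte_t gup_get_pte(pte_t *ptep)
        {
                return READ_ONCE(*ptep);
        }
        #endif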
     
  • This is a preparation patch for the transition of x86 to the generic GUP_fast()
    implementation.

    On x86, we would need to do additional permission checks to determine if
    access is allowed.

    Let's abstract it out into separate helpers.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     
  • The only arch that defines it to something meaningful is x86.
    But x86 doesn't use the generic GUP_fast() implementation -- the
    only place where the callback is called.

    Let's drop it.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     

13 Mar, 2017

1 commit

  • gup_p4d_range() should call gup_pud_range(), not itself.

    [ This was not noticed on x86: this is the HAVE_GENERIC_RCU_GUP code
    used by arm[64] and powerpc - Linus ]

    Fixes: c2febafc6773 ("mm: convert generic code to 5-level paging")
    Signed-off-by: Kirill A. Shutemov
    Reported-by: Chris Packham
    Reported-by: Anton Blanchard
    Acked-by: Michal Hocko
    Acked-by: Mark Rutland
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
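
    The bug and fix in context (condensed sketch of gup_p4d_range(); the
    surrounding helpers are abridged):

        static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end,
                                 int write, struct page **pages, int *nr)
        {
                unsigned long next;
                p4d_t *p4dp = p4d_offset(&pgd, addr);

                do {
                        p4d_t p4d = READ_ONCE(*p4dp);

                        next = p4d_addr_end(addr, end);
                        if (p4d_none(p4d))
                                return 0;
                        /* descend one level: the bug was recursing into
                         * gup_p4d_range() here instead of gup_pud_range() */
                        if (!gup_pud_range(p4d, addr, next, write, pages, nr))
                                return 0;
                } while (p4dp++, addr = next, addr != end);

                return 1;
        }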
     

10 Mar, 2017

1 commit


02 Mar, 2017

1 commit


25 Feb, 2017

2 commits

  • Do the prot_none/FOLL_NUMA check after we are sure this is a THP pte.
    Archs can implement prot_none such that it can return true for regular
    pmd entries.

    Link: http://lkml.kernel.org/r/1487498326-8734-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Hillf Danton
    Cc: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
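
    In follow_page_mask() terms the reordering looks roughly like this (a
    sketch; only the relevant checks are shown):

        /* only treat pmd_protnone() as a NUMA hinting fault once the entry
         * is known to be a transparent huge pmd; some architectures can
         * report prot_none for regular pmd entries as well */
        if (likely(!pmd_trans_huge(*pmd)))
                return follow_page_pte(vma, address, pmd, flags);

        if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
                return no_page_table(vma, flags);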
     
  • The current transparent hugepage code only supports PMDs. This patch
    adds support for transparent use of PUDs with DAX. It does not include
    support for anonymous pages. x86 support code also added.

    Most of this patch simply parallels the work that was done for huge
    PMDs. The only major difference is how the new ->pud_entry method in
    mm_walk works. The ->pmd_entry method replaces the ->pte_entry method,
    whereas the ->pud_entry method works along with either ->pmd_entry or
    ->pte_entry. The pagewalk code takes care of locking the PUD before
    calling ->pud_walk, so handlers do not need to worry whether the PUD is
    stable.

    [dave.jiang@intel.com: fix SMP x86 32bit build for native_pud_clear()]
    Link: http://lkml.kernel.org/r/148719066814.31111.3239231168815337012.stgit@djiang5-desk3.ch.intel.com
    [dave.jiang@intel.com: native_pud_clear missing on i386 build]
    Link: http://lkml.kernel.org/r/148640375195.69754.3315433724330910314.stgit@djiang5-desk3.ch.intel.com
    Link: http://lkml.kernel.org/r/148545059381.17912.8602162635537598445.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Matthew Wilcox
    Signed-off-by: Dave Jiang
    Tested-by: Alexander Kapshuk
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Dan Williams
    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Nilesh Choudhury
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

23 Feb, 2017

1 commit

  • Add support for VM_FAULT_RETRY to follow_hugetlb_page() so that
    get_user_pages_unlocked/locked and "nonblocking/FOLL_NOWAIT" features
    will work on hugetlbfs.

    This is required for fully functional userfaultfd non-present support on
    hugetlbfs.

    Link: http://lkml.kernel.org/r/20161216144821.5183-25-aarcange@redhat.com
    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Mike Kravetz
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

15 Dec, 2016

2 commits

  • Unexport the low-level __get_user_pages_unlocked() function and replace
    invocations with calls to more appropriate higher-level functions.

    In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked()
    with get_user_pages_unlocked() since we can now pass gup_flags.

    In async_pf_execute() and process_vm_rw_single_vec() we need to pass
    different tsk, mm arguments so get_user_pages_remote() is the sane
    replacement in these cases (having added manual acquisition and release
    of mmap_sem.)

    Additionally get_user_pages_remote() reintroduces use of the FOLL_TOUCH
    flag. However, this flag was originally silently dropped by commit
    1e9877902dc7 ("mm/gup: Introduce get_user_pages_remote()"), so this
    appears to have been unintentional and reintroducing it is therefore not
    an issue.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20161027095141.2569-3-lstoakes@gmail.com
    Signed-off-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Cc: Jan Kara
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     
  • Patch series "mm: unexport __get_user_pages_unlocked()".

    This patch series continues the cleanup of get_user_pages*() functions
    taking advantage of the fact we can now pass gup_flags as we please.

    It firstly adds an additional 'locked' parameter to
    get_user_pages_remote() to allow for its callers to utilise
    VM_FAULT_RETRY functionality. This is necessary as the invocation of
    __get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of
    this and no other existing higher level function would allow it to do
    so.

    Secondly existing callers of __get_user_pages_unlocked() are replaced
    with the appropriate higher-level replacement -
    get_user_pages_unlocked() if the current task and memory descriptor are
    referenced, or get_user_pages_remote() if other task/memory descriptors
    are referenced (having acquiring mmap_sem.)

    This patch (of 2):

    Add an int *locked parameter to get_user_pages_remote() to allow
    VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().

    Taking into account the previous adjustments to get_user_pages*()
    functions allowing for the passing of gup_flags, we are now in a
    position where __get_user_pages_unlocked() need only be exported for its
    ability to allow VM_FAULT_RETRY behaviour. This adjustment allows us to
    subsequently unexport __get_user_pages_unlocked() as well as allowing
    for future flexibility in the use of get_user_pages_remote().

    [sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change]
    Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com
    Signed-off-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Cc: Jan Kara
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     

13 Dec, 2016

2 commits

  • In the previous round of get_user_pages* changes, the comments attached
    to __get_user_pages_unlocked() and get_user_pages_unlocked() were
    rendered incorrect; this patch corrects them.

    In addition the get_user_pages_unlocked() comment seems to have already
    been outdated as it referred to tsk, mm parameters which were removed in
    c12d2da5 ("mm/gup: Remove the macro overload API migration helpers from
    the get_user*() APIs"), this patch fixes this also.

    Link: http://lkml.kernel.org/r/20161025233435.5338-1-lstoakes@gmail.com
    Signed-off-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     
  • Make vma_permits_fault() static as it is only used in mm/gup.c

    This fixes a sparse warning.

    Link: http://lkml.kernel.org/r/20161017122353.31598-1-tklauser@distanz.ch
    Signed-off-by: Tobias Klauser
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     

25 Oct, 2016

1 commit

  • This patch unexports the low-level __get_user_pages() function.

    Recent refactoring of the get_user_pages* functions allows flags to be
    passed through get_user_pages(), which eliminates the need for access to
    this function from its one user, kvm.

    We can see that the two calls to get_user_pages() which replace
    __get_user_pages() in kvm_main.c are equivalent by examining their call
    stacks:

    get_user_page_nowait():
      get_user_pages(start, 1, flags, page, NULL)
        __get_user_pages_locked(current, current->mm, start, 1, page,
                                NULL, NULL, false, flags | FOLL_TOUCH)
          __get_user_pages(current, current->mm, start, 1,
                           flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)

    check_user_page_hwpoison():
      get_user_pages(addr, 1, flags, NULL, NULL)
        __get_user_pages_locked(current, current->mm, addr, 1, NULL,
                                NULL, NULL, false, flags | FOLL_TOUCH)
          __get_user_pages(current, current->mm, addr, 1,
                           flags | FOLL_TOUCH, NULL, NULL, NULL)

    Signed-off-by: Lorenzo Stoakes
    Acked-by: Paolo Bonzini
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     

19 Oct, 2016

2 commits

  • This removes the 'write' and 'force' from get_user_pages_remote() and
    replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
    callers as use of this flag can result in surprising behaviour (and
    hence bugs) within the mm subsystem.

    Signed-off-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Reviewed-by: Jan Kara
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     
  • This removes the 'write' and 'force' from get_user_pages() and replaces
    them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
    as use of this flag can result in surprising behaviour (and hence bugs)
    within the mm subsystem.

    Signed-off-by: Lorenzo Stoakes
    Acked-by: Christian König
    Acked-by: Jesper Nilsson
    Acked-by: Michal Hocko
    Reviewed-by: Jan Kara
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
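
    The resulting API change, in prototype form (a sketch of the
    before/after get_user_pages() signature from this era;
    get_user_pages_remote() gets the same treatment with its leading tsk/mm
    arguments kept):

        /* before: separate write/force ints */
        long get_user_pages(unsigned long start, unsigned long nr_pages,
                            int write, int force, struct page **pages,
                            struct vm_area_struct **vmas);

        /* after: a single gup_flags word, with FOLL_WRITE / FOLL_FORCE
         * spelled out explicitly by each caller */
        long get_user_pages(unsigned long start, unsigned long nr_pages,
                            unsigned int gup_flags, struct page **pages,
                            struct vm_area_struct **vmas);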