05 Mar, 2020

1 commit

  • commit f4000fdf435b8301a11cf85237c561047f8c4c72 upstream.

    Commit 817be129e6f2 ("mm: validate get_user_pages_fast flags") allowed
    only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast().
    This, combined with the fact that get_user_pages_fast() falls back to
    "slow gup", which *does* accept FOLL_FORCE, leads to an odd situation:
    if you need FOLL_FORCE, you cannot call get_user_pages_fast().

    There does not appear to be any reason for filtering out FOLL_FORCE.
    There is nothing in the _fast() implementation that requires that we
    avoid writing to the pages. So it appears to have been an oversight.

    Fix by allowing FOLL_FORCE to be set for get_user_pages_fast().
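
    A minimal sketch of the relaxed check in internal_get_user_pages_fast()
    (illustrative, not the verbatim diff):

        if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
                                       FOLL_FORCE)))
                return -EINVAL;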

    Link: http://lkml.kernel.org/r/20200107224558.2362728-9-jhubbard@nvidia.com
    Fixes: 817be129e6f2 ("mm: validate get_user_pages_fast flags")
    Signed-off-by: John Hubbard
    Reviewed-by: Leon Romanovsky
    Reviewed-by: Jan Kara
    Cc: Christoph Hellwig
    Cc: Alex Williamson
    Cc: Aneesh Kumar K.V
    Cc: Björn Töpel
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Hans Verkuil
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Gunthorpe
    Cc: Jens Axboe
    Cc: Jerome Glisse
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    John Hubbard
     

19 Oct, 2019

1 commit

  • In several routines, the "flags" argument is incorrectly named "write".
    Change it to "flags".

    Also, in one place, the misnaming led to an actual bug:
    "flags & FOLL_WRITE" is required, rather than just "flags".
    (That problem was flagged by krobot, in v1 of this patch.)

    Also, change the flags argument from int, to unsigned int.

    You can see that this was a simple oversight, because the
    calling code passes "flags" to the fifth argument:

    gup_pgd_range():
    ...
            if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr,
                             PGDIR_SHIFT, next, flags, pages, nr))

    ...which, until this patch, the callees referred to as "write".

    Also, change two lines to avoid checkpatch line length
    complaints, and another line to fix another oversight
    that checkpatch called out: missing "int" on pdshift.
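
    The related one-line bug fix, sketched against gup_hugepte() (illustrative):

        /* the permission check must test the FOLL_WRITE bit, not the whole mask */
        if (!pte_access_permitted(pte, flags & FOLL_WRITE))
                return 0;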

    Link: http://lkml.kernel.org/r/20191014184639.1512873-3-jhubbard@nvidia.com
    Fixes: b798bec4741b ("mm/gup: change write parameter to flags in fast walk")
    Signed-off-by: John Hubbard
    Reported-by: kbuild test robot
    Suggested-by: Kirill A. Shutemov
    Suggested-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Ira Weiny
    Cc: Christoph Hellwig
    Cc: Aneesh Kumar K.V
    Cc: Keith Busch
    Cc: Shuah Khan
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard
     

26 Sep, 2019

1 commit

  • This patch is a part of a series that extends the kernel ABI to allow
    passing tagged user pointers (with the top byte set to something other
    than 0x00) as syscall arguments.

    mm/gup.c provides a kernel interface that accepts user addresses and
    manipulates user pages directly (for example get_user_pages, which is used
    by the futex syscall). Since a user can provide tagged addresses, we
    need to handle this case.

    Add untagging to gup.c functions that use user addresses for vma lookups.
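
    The conversions follow a simple pattern; a hedged sketch of what the
    affected lookups do before touching the vma:

        start = untagged_addr(start);
        vma = find_vma(mm, start);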

    Link: http://lkml.kernel.org/r/4731bddba3c938658c10ff4ed55cc01c60f4c8f8.1563904656.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Khalid Aziz
    Reviewed-by: Vincenzo Frascino
    Reviewed-by: Kees Cook
    Reviewed-by: Catalin Marinas
    Cc: Al Viro
    Cc: Dave Hansen
    Cc: Eric Auger
    Cc: Felix Kuehling
    Cc: Jens Wiklander
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     

25 Sep, 2019

3 commits

  • Introduce a new foll_flag: FOLL_SPLIT_PMD. As the name says,
    FOLL_SPLIT_PMD splits the huge pmd for the given mm_struct; the underlying
    huge page stays as-is.

    FOLL_SPLIT_PMD is useful for cases where we need to use regular pages, but
    would like to switch back to the huge page and huge pmd later. One such
    example is uprobe. The following patches use FOLL_SPLIT_PMD in uprobe.
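
    A rough sketch of how follow_pmd_mask() handles the new flag (simplified;
    locking and error handling omitted):

        if (flags & FOLL_SPLIT_PMD) {
                spin_unlock(ptl);
                split_huge_pmd(vma, pmd, address);      /* the pmd is split ... */
                return follow_page_pte(vma, address, pmd, flags,
                                       &ctx->pgmap);    /* ... the huge page stays */
        }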

    Link: http://lkml.kernel.org/r/20190815164525.1848545-4-songliubraving@fb.com
    Signed-off-by: Song Liu
    Reviewed-by: Oleg Nesterov
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • From: John Hubbard
    Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

    Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()",
    v3.

    There are about 50+ patches in my tree [2], and I'll be sending out the
    remaining ones in a few more groups:

    * The block/bio related changes (Jerome mostly wrote those, but I've had
    to move stuff around extensively, and add a little code)

    * mm/ changes

    * other subsystem patches

    * an RFC that shows the current state of the tracking patch set. That
    can only be applied after all call sites are converted, but it's good to
    get an early look at it.

    This is part of a tree-wide conversion, as described in fc1d8e7cca2d ("mm:
    introduce put_user_page*(), placeholder versions").

    This patch (of 3):

    Provide more capable variation of put_user_pages_dirty_lock(), and delete
    put_user_pages_dirty(). This is based on the following:

    1. Lots of call sites become simpler if a bool is passed into
    put_user_page*(), instead of making the call site choose which
    put_user_page*() variant to call.

    2. Christoph Hellwig's observation that set_page_dirty_lock() is
    usually correct, and set_page_dirty() is usually a bug, or at least
    questionable, within a put_user_page*() calling chain.

    This leads to the following API choices:

    * put_user_pages_dirty_lock(page, npages, make_dirty)

    * There is no put_user_pages_dirty(). You have to
    hand code that, in the rare case that it's
    required.
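
    The resulting interface, in sketch form:

        void put_user_pages_dirty_lock(struct page **pages, unsigned long npages,
                                       bool make_dirty);

        /* a typical converted call site (names illustrative) */
        put_user_pages_dirty_lock(pages, npages, dirty);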

    [jhubbard@nvidia.com: remove unused variable in siw_free_plist()]
    Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com
    Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com
    Signed-off-by: John Hubbard
    Cc: Matthew Wilcox
    Cc: Jan Kara
    Cc: Christoph Hellwig
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
     
  • Replace 1 << compound_order(page) with compound_nr(page). Minor
    improvements in readability.
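
    The substitution, side by side (sketch; the local variable name is
    illustrative):

        refs = 1 << compound_order(page);       /* before */
        refs = compound_nr(page);               /* after  */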

    Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

17 Jul, 2019

1 commit

  • ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
    with the long-out-of-date comment can lead to the impression than an
    architecture may just enable it (since __add_pages() now "comprehends
    device memory" for itself) and expect things to work.

    In practice, however, ZONE_DEVICE users have little chance of
    functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
    that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
    dependency so the real situation is clearer.

    Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
    Signed-off-by: Robin Murphy
    Acked-by: Dan Williams
    Reviewed-by: Ira Weiny
    Acked-by: Oliver O'Halloran
    Reviewed-by: Anshuman Khandual
    Cc: Michael Ellerman
    Cc: Catalin Marinas
    Cc: David Hildenbrand
    Cc: Jerome Glisse
    Cc: Michal Hocko
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Murphy
     

15 Jul, 2019

1 commit

  • Pull HMM updates from Jason Gunthorpe:
    "Improvements and bug fixes for the hmm interface in the kernel:

    - Improve clarity, locking and APIs related to the 'hmm mirror'
    feature merged last cycle. In linux-next we now see AMDGPU and
    nouveau to be using this API.

    - Remove old or transitional hmm APIs. These are hold overs from the
    past with no users, or APIs that existed only to manage cross tree
    conflicts. There are still a few more of these cleanups that didn't
    make the merge window cut off.

    - Improve some core mm APIs:
    - export alloc_pages_vma() for driver use
    - refactor into devm_request_free_mem_region() to manage
    DEVICE_PRIVATE resource reservations
    - refactor duplicative driver code into the core dev_pagemap
    struct

    - Remove hmm wrappers of improved core mm APIs, instead have drivers
    use the simplified API directly

    - Remove DEVICE_PUBLIC

    - Simplify the kconfig flow for the hmm users and core code"

    * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
    mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
    mm: remove the HMM config option
    mm: sort out the DEVICE_PRIVATE Kconfig mess
    mm: simplify ZONE_DEVICE page private data
    mm: remove hmm_devmem_add
    mm: remove hmm_vma_alloc_locked_page
    nouveau: use devm_memremap_pages directly
    nouveau: use alloc_page_vma directly
    PCI/P2PDMA: use the dev_pagemap internal refcount
    device-dax: use the dev_pagemap internal refcount
    memremap: provide an optional internal refcount in struct dev_pagemap
    memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
    memremap: remove the data field in struct dev_pagemap
    memremap: add a migrate_to_ram method to struct dev_pagemap_ops
    memremap: lift the devmap_enable manipulation into devm_memremap_pages
    memremap: pass a struct dev_pagemap to ->kill and ->cleanup
    memremap: move dev_pagemap callbacks into a separate structure
    memremap: validate the pagemap type passed to devm_memremap_pages
    mm: factor out a devm_request_free_mem_region helper
    mm: export alloc_pages_vma
    ...

    Linus Torvalds
     

13 Jul, 2019

14 commits

  • Several mips builds generate the following build warning.

    mm/gup.c:1788:13: warning: 'undo_dev_pagemap' defined but not used

    The function is declared unconditionally but only called from behind
    various ifdefs. Mark it __maybe_unused.
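
    The fix, sketched:

        static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
                                                    struct page **pages)
        {
                /* body unchanged; the annotation only silences the warning on
                 * configurations that never call this helper */
        }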

    Link: http://lkml.kernel.org/r/1562072523-22311-1-git-send-email-linux@roeck-us.net
    Signed-off-by: Guenter Roeck
    Reviewed-by: Andrew Morton
    Cc: Stephen Rothwell
    Cc: Robin Murphy
    Cc: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guenter Roeck
     
  • If we end up without a PGD or PUD entry backing the gate area, don't BUG
    -- just fail gracefully.

    It's not entirely implausible that this could happen some day on x86. It
    doesn't right now even with an execute-only emulated vsyscall page because
    the fixmap shares the PUD, but the core mm code shouldn't rely on that
    particular detail to avoid OOPSing.
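
    A sketch of the idea in get_gate_page() (abbreviated):

        pgd = pgd_offset_gate(mm, address);
        if (pgd_none(*pgd))
                return -EFAULT;         /* was: BUG_ON(pgd_none(*pgd)) */
        p4d = p4d_offset(pgd, address);
        if (p4d_none(*p4d))
                return -EFAULT;         /* was: BUG_ON(p4d_none(*p4d)) */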

    Link: http://lkml.kernel.org/r/a1d9f4efb75b9d464e59fd6af00104b21c58f6f7.1561610798.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski
    Reviewed-by: Kees Cook
    Reviewed-by: Andrew Morton
    Cc: Florian Weimer
    Cc: Jann Horn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • Both hugetlb and thp pages live on the same migration type of pageblock,
    since they are allocated from a free_list[]. Based on this fact, it is
    enough to check a single subpage to decide the migration type of the whole
    huge page. This saves (2M/4K - 1) loop iterations for a pmd huge page on
    x86, and similarly on other archs.

    Furthermore, when executing isolate_huge_page(), it avoids taking the
    global hugetlb_lock many times, and avoids needlessly removing/adding the
    page on the local cma_page_list.
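
    The shape of the loop after the change (hedged sketch; isolation and error
    handling omitted):

        for (i = 0; i < nr_pages; i += step) {
                struct page *head = compound_head(pages[i]);

                /* one migratetype check per huge page, then skip its tails */
                step = (1 << compound_order(head)) - (pages[i] - head);
                if (!is_migrate_cma_page(head))
                        continue;
                /* isolate the whole huge page once */
        }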

    [akpm@linux-foundation.org: make `i' and `step' unsigned]
    Link: http://lkml.kernel.org/r/1561612545-28997-1-git-send-email-kernelfans@gmail.com
    Signed-off-by: Pingfan Liu
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Cc: Mike Rapoport
    Cc: "Kirill A. Shutemov"
    Cc: Thomas Gleixner
    Cc: John Hubbard
    Cc: "Aneesh Kumar K.V"
    Cc: Christoph Hellwig
    Cc: Keith Busch
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pingfan Liu
     
  • All other get_user_page_fast cases mark the page referenced, so do this
    here as well.
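
    The change is effectively a one-liner in gup_hugepte() (sketch):

        SetPageReferenced(head);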

    Link: http://lkml.kernel.org/r/20190625143715.1689-17-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This applies the overflow fixes from 8fde12ca79aff ("mm: prevent
    get_user_pages() from overflowing page refcount") to the powerpc hugepd
    code and brings it back in sync with the other GUP cases.
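
    In sketch form, the hugepd walker now takes its references through the
    overflow-aware helper introduced by 8fde12ca79aff (illustrative):

        head = try_get_compound_head(pte_page(pte), refs);
        if (!head)
                return 0;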

    Link: http://lkml.kernel.org/r/20190625143715.1689-16-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • While only powerpc supports the hugepd case, the code is pretty generic
    and I'd like to keep all GUP internals in one place.

    Link: http://lkml.kernel.org/r/20190625143715.1689-15-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • We can only deal with FOLL_WRITE and/or FOLL_LONGTERM in
    get_user_pages_fast, so reject all other flags.
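
    The added check, roughly (sketch):

        if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM)))
                return -EINVAL;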

    Link: http://lkml.kernel.org/r/20190625143715.1689-14-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Always build mm/gup.c so that we don't have to provide separate nommu
    stubs. Also merge the get_user_pages_fast and __get_user_pages_fast stubs
    used when HAVE_FAST_GUP is not set into the main implementations, which
    then never call the fast path if HAVE_FAST_GUP is not set.

    This also ensures the new put_user_pages* helpers are available for nommu,
    as those are currently missing, which would create a problem as soon as we
    actually grew users for it.
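
    A sketch of the merged fast-path call (simplified):

        if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
            gup_fast_permitted(start, end)) {
                local_irq_disable();
                gup_pgd_range(addr, end, gup_flags, pages, &nr);
                local_irq_enable();
                ret = nr;
        }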

    Link: http://lkml.kernel.org/r/20190625143715.1689-13-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This moves the actually exported functions towards the end of the file,
    and reorders some functions to be in more logical blocks as a preparation
    for moving various stubs inline into the main functionality using
    IS_ENABLED().

    Link: http://lkml.kernel.org/r/20190625143715.1689-12-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • We only support the generic GUP now, so rename the config option to
    be more clear, and always use the mm/Kconfig definition of the
    symbol and select it from the arch Kconfigs.

    Link: http://lkml.kernel.org/r/20190625143715.1689-11-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Khalid Aziz
    Reviewed-by: Jason Gunthorpe
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • The split low/high access is the only non-READ_ONCE version of gup_get_pte
    that did show up in the various arch implementations. Lift it to common
    code and drop the ifdef based arch override.
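
    The lifted variant, roughly as it lands in mm/gup.c behind
    CONFIG_GUP_GET_PTE_LOW_HIGH (sketch):

        static inline pte_t gup_get_pte(pte_t *ptep)
        {
                pte_t pte;

                do {
                        pte.pte_low = ptep->pte_low;
                        smp_rmb();
                        pte.pte_high = ptep->pte_high;
                        smp_rmb();
                } while (unlikely(pte.pte_low != ptep->pte_low));

                return pte;
        }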

    Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Pass in the already calculated end value instead of recomputing it, and
    leave the end > start check in the callers instead of duplicating them in
    the arch code.

    Link: http://lkml.kernel.org/r/20190625143715.1689-3-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Patch series "switch the remaining architectures to use generic GUP", v4.

    A series to switch mips, sh and sparc64 to use the generic GUP code so
    that we only have one codebase to touch for further improvements to this
    code.

    This patch (of 16):

    This will allow sparc64, or any future architecture with memory tagging to
    override its tags for get_user_pages and get_user_pages_fast.
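
    The hook is a no-op unless an architecture overrides it; a sketch of the
    default definition:

        #ifndef untagged_addr
        #define untagged_addr(addr) (addr)
        #endif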

    Link: http://lkml.kernel.org/r/20190625143715.1689-2-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Khalid Aziz
    Reviewed-by: Jason Gunthorpe
    Cc: Paul Burton
    Cc: James Hogan
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: David Miller
    Cc: Nicholas Piggin
    Cc: Khalid Aziz
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • follow_page_mask() is only used in gup.c, make it static.

    Link: http://lkml.kernel.org/r/20190510190831.GA4061@bharath12345-Inspiron-5559
    Signed-off-by: Bharath Vedartham
    Reviewed-by: Ira Weiny
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bharath Vedartham
     

03 Jul, 2019

1 commit

  • The code hasn't been used since it was added to the tree, and doesn't
    appear to actually be usable.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Acked-by: Michal Hocko
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

02 Jun, 2019

1 commit

  • When get_user_pages*() is called with pages = NULL, the processing of
    VM_FAULT_RETRY terminates early without actually retrying to fault-in all
    the pages.

    If the pages in the requested range belong to a VMA that has userfaultfd
    registered, handle_userfault() returns VM_FAULT_RETRY *after* user space
    has populated the page, but for the gup pre-fault case there's no actual
    retry and the caller will get no pages although they are present.

    This issue was uncovered when running post-copy memory restore in CRIU
    after d9c9ce34ed5c ("x86/fpu: Fault-in user stack if
    copy_fpstate_to_sigframe() fails").

    After this change, the copying of FPU state to the sigframe switched from
    copy_to_user() variants which caused a real page fault to get_user_pages()
    with pages parameter set to NULL.

    In post-copy mode of CRIU, the destination memory is managed with
    userfaultfd and lack of the retry for pre-fault case in get_user_pages()
    causes a crash of the restored process.

    Making the pre-fault behavior of get_user_pages() the same as the "normal"
    one fixes the issue.
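
    The essence of the fix, as a hedged sketch: the retry path no longer bails
    out early when pages == NULL; only the pointer bookkeeping is skipped:

        if (likely(pages))
                pages += ret;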

    Link: http://lkml.kernel.org/r/1557844195-18882-1-git-send-email-rppt@linux.ibm.com
    Fixes: d9c9ce34ed5c ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
    Signed-off-by: Mike Rapoport
    Tested-by: Andrei Vagin [https://travis-ci.org/avagin/linux/builds/533184940]
    Tested-by: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: Sebastian Andrzej Siewior
    Cc: Borislav Petkov
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only
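
    For a file like mm/gup.c this amounts to a single new header line:

        // SPDX-License-Identifier: GPL-2.0-only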

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

5 commits

  • A discussion of the overall problem is below.

    As mentioned in patch 0001, the steps to fix the problem are:

    1) Provide put_user_page*() routines, intended to be used
    for releasing pages that were pinned via get_user_pages*().

    2) Convert all of the call sites for get_user_pages*(), to
    invoke put_user_page*(), instead of put_page(). This involves dozens of
    call sites, and will take some time.

    3) After (2) is complete, use get_user_pages*() and put_user_page*() to
    implement tracking of these pages. This tracking will be separate from
    the existing struct page refcounting.

    4) Use the tracking and identification of these pages, to implement
    special handling (especially in writeback paths) when the pages are
    backed by a filesystem.

    Overview
    ========

    Some kernel components (file systems, device drivers) need to access
    memory that is specified via process virtual address. For a long time,
    the API to achieve that was get_user_pages ("GUP") and its variations.
    However, GUP has critical limitations that have been overlooked; in
    particular, GUP does not interact correctly with filesystems in all
    situations. That means that file-backed memory + GUP is a recipe for
    potential problems, some of which have already occurred in the field.

    GUP was first introduced for Direct IO (O_DIRECT), allowing filesystem
    code to get the struct page behind a virtual address and to let storage
    hardware perform a direct copy to or from that page. This is a
    short-lived access pattern, and as such, the window for a concurrent
    writeback of GUP'd page was small enough that there were not (we think)
    any reported problems. Also, userspace was expected to understand and
    accept that Direct IO was not synchronized with memory-mapped access to
    that data, nor with any process address space changes such as munmap(),
    mremap(), etc.

    Over the years, more GUP uses have appeared (virtualization, device
    drivers, RDMA) that can keep the pages they get via GUP for a long period
    of time (seconds, minutes, hours, days, ...). This long-term pinning
    makes an underlying design problem more obvious.

    In fact, there are a number of key problems inherent to GUP:

    Interactions with file systems
    ==============================

    File systems expect to be able to write back data, both to reclaim pages,
    and for data integrity. Allowing other hardware (NICs, GPUs, etc) to gain
    write access to the file memory pages means that such hardware can dirty
    the pages, without the filesystem being aware. This can, in some cases
    (depending on filesystem, filesystem options, block device, block device
    options, and other variables), lead to data corruption, and also to kernel
    bugs of the form:

    kernel BUG at /build/linux-fQ94TU/linux-4.4.0/fs/ext4/inode.c:1899!
    backtrace:
    ext4_writepage
    __writepage
    write_cache_pages
    ext4_writepages
    do_writepages
    __writeback_single_inode
    writeback_sb_inodes
    __writeback_inodes_wb
    wb_writeback
    wb_workfn
    process_one_work
    worker_thread
    kthread
    ret_from_fork

    ...which is due to the file system asserting that there are still buffer
    heads attached:

    ({ \
    BUG_ON(!PagePrivate(page)); \
    ((struct buffer_head *)page_private(page)); \
    })

    Dave Chinner's description of this is very clear:

    "The fundamental issue is that ->page_mkwrite must be called on every
    write access to a clean file backed page, not just the first one.
    How long the GUP reference lasts is irrelevant, if the page is clean
    and you need to dirty it, you must call ->page_mkwrite before it is
    marked writeable and dirtied. Every. Time."

    This is just one symptom of the larger design problem: real filesystems
    that actually write to a backing device, do not actually support
    get_user_pages() being called on their pages, and letting hardware write
    directly to those pages--even though that pattern has been going on since
    about 2005 or so.

    Long term GUP
    =============

    Long term GUP is an issue when FOLL_WRITE is specified to GUP (so, a
    writeable mapping is created), and the pages are file-backed. That can
    lead to filesystem corruption. What happens is that when a file-backed
    page is being written back, it is first mapped read-only in all of the CPU
    page tables; the file system then assumes that nobody can write to the
    page, and that the page content is therefore stable. Unfortunately, the
    GUP callers generally do not monitor changes to the CPU page tables; they
    instead assume that the following pattern is safe (it's not):

    get_user_pages()

    Hardware can keep a reference to those pages for a very long time,
    and write to it at any time. Because "hardware" here means "devices
    that are not a CPU", this activity occurs without any interaction with
    the kernel's file system code.

    for each page
    set_page_dirty
    put_page()

    In fact, the GUP documentation even recommends that pattern.

    Anyway, the file system assumes that the page is stable (nothing is
    writing to the page), and that is a problem: stable page content is
    necessary for many filesystem actions during writeback, such as checksum,
    encryption, RAID striping, etc. Furthermore, filesystem features like COW
    (copy on write) or snapshot also rely on being able to use a new page for
    that memory range inside the file.

    Corruption during write back is clearly possible here. To solve that, one
    idea is to identify pages that have active GUP, so that we can use a
    bounce page to write stable data to the filesystem. The filesystem would
    work on the bounce page, while any of the active GUP might write to the
    original page. This would avoid the stable page violation problem, but
    note that it is only part of the overall solution, because other problems
    remain.

    Other filesystem features that need to replace the page with a new one can
    be inhibited for pages that are GUP-pinned. This will, however, alter and
    limit some of those filesystem features. The only fix for that would be
    to require GUP users to monitor and respond to CPU page table updates.
    Subsystems such as ODP and HMM do this, for example. This aspect of the
    problem is still under discussion.

    Direct IO
    =========

    Direct IO can cause corruption, if userspace does Direct-IO that writes to
    a range of virtual addresses that are mmap'd to a file. The pages written
    to are file-backed pages that can be under write back, while the Direct IO
    is taking place. Here, Direct IO races with a write back: it calls GUP
    before page_mkclean() has replaced the CPU pte with a read-only entry.
    The race window is pretty small, which is probably why years have gone by
    before we noticed this problem: Direct IO is generally very quick, and
    tends to finish up before the filesystem gets around to doing anything with
    the page contents. However, it's still a real problem. The solution is
    to never let GUP return pages that are under write back, but instead,
    force GUP to take a write fault on those pages. That way, GUP will
    properly synchronize with the active write back. This does not change the
    required GUP behavior, it just avoids that race.

    Details
    =======

    Introduces put_user_page(), which simply calls put_page(). This provides
    a way to update all get_user_pages*() callers, so that they call
    put_user_page(), instead of put_page().

    Also introduces put_user_pages(), and a few dirty/locked variations, as a
    replacement for release_pages(), and also as a replacement for open-coded
    loops that release multiple pages. These may be used for subsequent
    performance improvements, via batching of pages to be released.
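
    A sketch of the helpers as introduced (put_user_page() is a thin wrapper
    for now; the real tracking comes later in the series):

        static inline void put_user_page(struct page *page)
        {
                put_page(page);
        }

        void put_user_pages(struct page **pages, unsigned long npages);
        void put_user_pages_dirty(struct page **pages, unsigned long npages);
        void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);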

    This is the first step of fixing a problem (also described in [1] and [2])
    with interactions between get_user_pages ("gup") and filesystems.

    Problem description: let's start with a bug report. Below, is what
    happens sometimes, under memory pressure, when a driver pins some pages
    via gup, and then marks those pages dirty, and releases them. Note that
    the gup documentation actually recommends that pattern. The problem is
    that the filesystem may do a writeback while the pages were gup-pinned,
    and then the filesystem believes that the pages are clean. So, when the
    driver later marks the pages as dirty, that conflicts with the
    filesystem's page tracking and results in a BUG(), like this one that I
    experienced:

    kernel BUG at /build/linux-fQ94TU/linux-4.4.0/fs/ext4/inode.c:1899!
    backtrace:
    ext4_writepage
    __writepage
    write_cache_pages
    ext4_writepages
    do_writepages
    __writeback_single_inode
    writeback_sb_inodes
    __writeback_inodes_wb
    wb_writeback
    wb_workfn
    process_one_work
    worker_thread
    kthread
    ret_from_fork

    ...which is due to the file system asserting that there are still buffer
    heads attached:

    ({ \
    BUG_ON(!PagePrivate(page)); \
    ((struct buffer_head *)page_private(page)); \
    })

    Dave Chinner's description of this is very clear:

    "The fundamental issue is that ->page_mkwrite must be called on
    every write access to a clean file backed page, not just the first
    one. How long the GUP reference lasts is irrelevant, if the page is
    clean and you need to dirty it, you must call ->page_mkwrite before it
    is marked writeable and dirtied. Every. Time."

    This is just one symptom of the larger design problem: real filesystems
    that actually write to a backing device, do not actually support
    get_user_pages() being called on their pages, and letting hardware write
    directly to those pages--even though that pattern has been going on since
    about 2005 or so.

    The steps to fix it are:

    1) (This patch): provide put_user_page*() routines, intended to be used
    for releasing pages that were pinned via get_user_pages*().

    2) Convert all of the call sites for get_user_pages*(), to
    invoke put_user_page*(), instead of put_page(). This involves dozens of
    call sites, and will take some time.

    3) After (2) is complete, use get_user_pages*() and put_user_page*() to
    implement tracking of these pages. This tracking will be separate from
    the existing struct page refcounting.

    4) Use the tracking and identification of these pages, to implement
    special handling (especially in writeback paths) when the pages are
    backed by a filesystem.

    [1] https://lwn.net/Articles/774411/ : "DMA and get_user_pages()"
    [2] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

    Link: http://lkml.kernel.org/r/20190327023632.13307-2-jhubbard@nvidia.com
    Signed-off-by: John Hubbard
    Reviewed-by: Jan Kara
    Reviewed-by: Mike Rapoport [docs]
    Reviewed-by: Ira Weiny
    Reviewed-by: Jérôme Glisse
    Reviewed-by: Christoph Lameter
    Tested-by: Ira Weiny
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jason Gunthorpe
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Ralph Campbell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • DAX pages were previously unprotected from longterm pins when users called
    get_user_pages_fast().

    Use the new FOLL_LONGTERM flag to check for DEVMAP pages and fall back to
    regular GUP processing if a DEVMAP page is encountered.
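
    A sketch of the check in the fast pte walk (simplified):

        if (pte_devmap(pte)) {
                if (unlikely(flags & FOLL_LONGTERM))
                        goto pte_unmap;         /* fall back to slow GUP */
                pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
        }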

    [ira.weiny@intel.com: v3]
    Link: http://lkml.kernel.org/r/20190328084422.29911-5-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190328084422.29911-5-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-5-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Andrew Morton
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Cc: Mike Marshall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • To facilitate additional options to get_user_pages_fast() change the
    singular write parameter to be gup_flags.

    This patch does not change any functionality. New functionality will
    follow in subsequent patches.

    Some of the get_user_pages_fast() call sites were unchanged because they
    already passed FOLL_WRITE or 0 for the write parameter.
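
    A typical call-site conversion looks like this (sketch):

        get_user_pages_fast(start, nr_pages, 1, pages);             /* before */
        get_user_pages_fast(start, nr_pages, FOLL_WRITE, pages);    /* after  */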

    NOTE: It was suggested to change the ordering of the get_user_pages_fast()
    arguments to ensure that callers were converted. This breaks the current
    GUP call site convention of having the returned pages be the final
    parameter. So the suggestion was rejected.

    Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Mike Marshall
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • In order to support more options in the GUP fast walk, change the write
    parameter to flags throughout the call stack.

    This patch does not change functionality and passes FOLL_WRITE where write
    was previously used.

    Link: http://lkml.kernel.org/r/20190328084422.29911-3-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-3-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Dan Williams
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Cc: Mike Marshall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • Patch series "Add FOLL_LONGTERM to GUP fast and use it".

    HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
    advantages. These pages can be held for a significant time. But
    get_user_pages_fast() does not protect against mapping FS DAX pages.

    Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
    retains the performance while also adding the FS DAX checks. XDP has also
    shown interest in using this functionality.[1]

    In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
    and remove the specialized get_user_pages_longterm call.

    [1] https://lkml.org/lkml/2019/3/19/939

    "longterm" is a relative thing and at this point is probably a misnomer.
    This is really flagging a pin which is going to be given to hardware and
    can't move. I've thought of a couple of alternative names but I think we
    have to settle on if we are going to use FL_LAYOUT or something else to
    solve the "longterm" problem. Then I think we can change the flag to a
    better name.

    Secondly, it depends on how often you are registering memory. I have
    spoken with some RDMA users who consider MR in the performance path...
    For the overall application performance. I don't have the numbers as the
    tests for HFI1 were done a long time ago. But there was a significant
    advantage. Some of which is probably due to the fact that you don't have
    to hold mmap_sem.

    Finally, architecturally I think it would be good for everyone to use
    *_fast. There are patches submitted to the RDMA list which would allow
    the use of *_fast (they reworking the use of mmap_sem) and as soon as they
    are accepted I'll submit a patch to convert the RDMA core as well. Also
    to this point others are looking to use *_fast.

    As an aside, Jason pointed out in my previous submission that *_fast and
    *_unlocked look very much the same. I agree and I think further cleanup
    will be coming. But I'm focused on getting the final solution for DAX at
    the moment.

    This patch (of 7):

    This patch starts a series which aims to support FOLL_LONGTERM in
    get_user_pages_fast(). Some callers would like to do a longterm (user
    controlled) pin of pages with the fast variant of GUP for performance
    purposes.

    Rather than have a separate get_user_pages_longterm() call, introduce
    FOLL_LONGTERM and change the longterm callers to use it.
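
    A call-site conversion then looks roughly like this (sketch):

        ret = get_user_pages_longterm(start, npages, gup_flags, pages, vmas);
        /* becomes */
        ret = get_user_pages(start, npages, gup_flags | FOLL_LONGTERM, pages, vmas);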

    This patch does not change any functionality. In the short term
    "longterm" or user controlled pins are unsafe for Filesystems and FS DAX
    in particular has been blocked. However, callers of get_user_pages_fast()
    were not "protected".

    FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
    requires vmas to determine if DAX is in use.

    NOTE: In merging with the CMA changes we opt to change the
    get_user_pages() call in check_and_migrate_cma_pages() to a call of
    __get_user_pages_locked() on the newly migrated pages. This makes the
    code read better in that we are calling __get_user_pages_locked() on the
    pages before and after a potential migration.

    As a side effect some of the interfaces are cleaned up, but this is not the
    primary purpose of the series.

    In review[1] it was asked:

    > This I don't get - if you do lock down long term mappings performance
    > of the actual get_user_pages call shouldn't matter to start with.
    >
    > What do I miss?

    A couple of points.

    First "longterm" is a relative thing and at this point is probably a
    misnomer. This is really flagging a pin which is going to be given to
    hardware and can't move. I've thought of a couple of alternative names
    but I think we have to settle on if we are going to use FL_LAYOUT or
    something else to solve the "longterm" problem. Then I think we can
    change the flag to a better name.

    Second, it depends on how often you are registering memory. I have spoken
    with some RDMA users who consider MR in the performance path... For the
    overall application performance. I don't have the numbers as the tests
    for HFI1 were done a long time ago. But there was a significant
    advantage. Some of which is probably due to the fact that you don't have
    to hold mmap_sem.

    Finally, architecturally I think it would be good for everyone to use
    *_fast. There are patches submitted to the RDMA list which would allow
    the use of *_fast (they reworking the use of mmap_sem) and as soon as they
    are accepted I'll submit a patch to convert the RDMA core as well. Also
    to this point others are looking to use *_fast.

    As an aside, Jason pointed out in my previous submission that *_fast and
    *_unlocked look very much the same. I agree and I think further cleanup
    will be coming. But I'm focused on getting the final solution for DAX at
    the moment.

    [1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965

    [ira.weiny@intel.com: v3]
    Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Andrew Morton
    Cc: Aneesh Kumar K.V
    Cc: Michal Hocko
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Peter Zijlstra
    Cc: Jason Gunthorpe
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "David S. Miller"
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Ralf Baechle
    Cc: James Hogan
    Cc: Dan Williams
    Cc: Mike Marshall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     

15 Apr, 2019

2 commits

  • Merge page ref overflow branch.

    Jann Horn reported that he can overflow the page ref count with
    sufficient memory (and a filesystem that is intentionally extremely
    slow).

    Admittedly it's not exactly easy. To have more than four billion
    references to a page requires a minimum of 32GB of kernel memory just
    for the pointers to the pages, much less any metadata to keep track of
    those pointers. Jann needed a total of 140GB of memory and a specially
    crafted filesystem that leaves all reads pending (in order to not ever
    free the page references and just keep adding more).

    Still, we have a fairly straightforward way to limit the two obvious
    user-controllable sources of page references: direct-IO like page
    references gotten through get_user_pages(), and the splice pipe page
    duplication. So let's just do that.

    * branch page-refs:
    fs: prevent page refcount overflow in pipe_buf_get
    mm: prevent get_user_pages() from overflowing page refcount
    mm: add 'try_get_page()' helper function
    mm: make page ref count overflow check tighter and more explicit

    Linus Torvalds
     
  • If the page refcount wraps around past zero, it will be freed while
    there are still four billion references to it. One of the possible
    avenues for an attacker to try to make this happen is by doing direct IO
    on a page multiple times. This patch makes get_user_pages() refuse to
    take a new page reference if there are already more than two billion
    references to the page.
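
    The guard, in sketch form (see the try_get_page() helper added on the same
    branch):

        static inline __must_check bool try_get_page(struct page *page)
        {
                page = compound_head(page);
                if (WARN_ON_ONCE(page_ref_count(page) <= 0))
                        return false;
                page_ref_inc(page);
                return true;
        }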

    Reported-by: Jann Horn
    Acked-by: Matthew Wilcox
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     


06 Mar, 2019

1 commit

  • This patch updates get_user_pages_longterm to migrate pages allocated
    out of CMA region. This makes sure that we don't keep non-movable pages
    (due to page reference count) in the CMA area.

    This will be used by ppc64 in a later patch to avoid pinning pages in the
    CMA region. ppc64 uses the CMA region for allocating the hardware page
    table (hash page table), and not being able to migrate pages out of the
    CMA region results in page table allocation failures.

    One case where we hit this easily is when a guest uses a VFIO passthrough
    device. VFIO locks all of the guest's memory, and if the guest memory is
    backed by the CMA region it becomes unmovable, fragmenting the CMA and
    possibly preventing other guests from allocating a large enough hash page
    table.

    NOTE: We allocate the new page without using __GFP_THISNODE
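
    A rough sketch of where this hooks in; the helper name is from the patch,
    but the exact call shape here is illustrative:

        /* in get_user_pages_longterm(), after the pages have been pinned */
        nr_pages = check_and_migrate_cma_pages(start, nr_pages, pages, vmas);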

    Link: http://lkml.kernel.org/r/20190114095438.32470-3-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Alexey Kardashevskiy
    Cc: Andrea Arcangeli
    Cc: David Gibson
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     

13 Feb, 2019

1 commit

  • For dax pmd, pmd_trans_huge() returns false but pmd_huge() returns true
    on x86. So the function works as long as hugetlb is configured.
    However, dax doesn't depend on hugetlb.

    Link: http://lkml.kernel.org/r/20190111034033.601-1-yuzhao@google.com
    Signed-off-by: Yu Zhao
    Reviewed-by: Jan Kara
    Cc: Dan Williams
    Cc: Huang Ying
    Cc: Matthew Wilcox
    Cc: Keith Busch
    Cc: "Michael S . Tsirkin"
    Cc: John Hubbard
    Cc: Wei Yang
    Cc: Mike Rapoport
    Cc: Andrea Arcangeli
    Cc: "Kirill A . Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yu Zhao
     

11 Feb, 2019

1 commit

  • The 'write' parameter is unused in gup_fast_permitted() so remove it.
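
    The prototype change, sketched:

        /* before */
        static inline bool gup_fast_permitted(unsigned long start, int nr_pages,
                                              int write);
        /* after */
        static inline bool gup_fast_permitted(unsigned long start, int nr_pages);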

    Signed-off-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20190210223424.13934-1-ira.weiny@intel.com
    Signed-off-by: Ingo Molnar

    Ira Weiny
     

06 Jan, 2019

1 commit

  • Merge more updates from Andrew Morton:

    - procfs updates

    - various misc bits

    - lib/ updates

    - epoll updates

    - autofs

    - fatfs

    - a few more MM bits

    * emailed patches from Andrew Morton : (58 commits)
    mm/page_io.c: fix polled swap page in
    checkpatch: add Co-developed-by to signature tags
    docs: fix Co-Developed-by docs
    drivers/base/platform.c: kmemleak ignore a known leak
    fs: don't open code lru_to_page()
    fs/: remove caller signal_pending branch predictions
    mm/: remove caller signal_pending branch predictions
    arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
    kernel/sched/: remove caller signal_pending branch predictions
    kernel/locking/mutex.c: remove caller signal_pending branch predictions
    mm: select HAVE_MOVE_PMD on x86 for faster mremap
    mm: speed up mremap by 20x on large regions
    mm: treewide: remove unused address argument from pte_alloc functions
    initramfs: cleanup incomplete rootfs
    scripts/gdb: fix lx-version string output
    kernel/kcov.c: mark write_comp_data() as notrace
    kernel/sysctl: add panic_print into sysctl
    panic: add options to print system info when panic happens
    bfs: extra sanity checking and static inode bitmap
    exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
    ...

    Linus Torvalds
     


04 Jan, 2019

1 commit

  • Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
    of the user address range verification function since we got rid of the
    old racy i386-only code to walk page tables by hand.

    It existed because the original 80386 would not honor the write protect
    bit when in kernel mode, so you had to do COW by hand before doing any
    user access. But we haven't supported that in a long time, and these
    days the 'type' argument is a purely historical artifact.

    A discussion about extending 'user_access_begin()' to do the range
    checking resulted in this patch, because there is no way we're going to
    move the old VERIFY_xyz interface to that model. And it's best done at
    the end of the merge window when I've done most of my merges, so let's
    just get this done once and for all.

    This patch was mostly done with a sed-script, with manual fix-ups for
    the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
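
    The bulk of the change is mechanical (sketch):

        /* before */
        if (!access_ok(VERIFY_WRITE, buf, count))
                return -EFAULT;
        /* after */
        if (!access_ok(buf, count))
                return -EFAULT;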

    There were a couple of notable cases:

    - csky still had the old "verify_area()" name as an alias.

    - the iter_iov code had magical hardcoded knowledge of the actual
    values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
    really used it)

    - microblaze used the type argument for a debug printout

    but other than those oddities this should be a total no-op patch.

    I tried to fix up all architectures, did fairly extensive grepping for
    access_ok() uses, and the changes are trivial, but I may have missed
    something. Any missed conversion should be trivially fixable, though.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Dec, 2018

1 commit

  • Commit df06b37ffe5a ("mm/gup: cache dev_pagemap while pinning pages")
    attempted to operate on each page that get_user_pages had retrieved. In
    order to do that, it created a common exit point from the routine.
    However, one case was missed, which this patch fixes up.

    Also, there was still an unnecessary shadow declaration (with a
    different type) of the "ret" variable, which this patch removes.

    Keith's description of the situation is:

    This also fixes a potentially leaked dev_pagemap reference count if a
    failure occurs when an iteration crosses a vma boundary. I don't think
    it's normal to have different vma's on a user's mapped zone device
    memory, but good to fix anyway.

    I actually thought that this code:

    /* first iteration or cross vma bound */
    if (!vma || start >= vma->vm_end) {
            vma = find_extend_vma(mm, start);
            if (!vma && in_gate_area(mm, start)) {
                    ret = get_gate_page(mm, start & PAGE_MASK,
                                        gup_flags, &vma,
                                        pages ? &pages[i] : NULL);
                    if (ret)
                            goto out;

    dealt with the "you're trying to pin the gate page, as part of this
    call", rather than the generic case of crossing a vma boundary. (I
    think there's a fine point that I must be overlooking.) But it's still a
    valid case, either way.
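
    The common exit point that the missed case is now routed through looks
    roughly like this (sketch):

        out:
                if (ctx.pgmap)
                        put_dev_pagemap(ctx.pgmap);
                return i ? i : ret;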

    Link: http://lkml.kernel.org/r/20181121081402.29641-2-jhubbard@nvidia.com
    Fixes: df06b37ffe5a4 ("mm/gup: cache dev_pagemap while pinning pages")
    Signed-off-by: John Hubbard
    Reviewed-by: Keith Busch
    Cc: Dan Williams
    Cc: Kirill A. Shutemov
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard