14 Feb, 2023

14 commits

  • Every caller of restore_reserve_on_error() is now passing in &folio->page;
    change the function to take in a folio directly and clean up the call
    sites.
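
    A minimal call-site sketch of the cleanup (the local variable names are
    illustrative):

        /* before: restore_reserve_on_error(h, vma, address, &folio->page); */
        restore_reserve_on_error(h, vma, address, folio);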

    Link: https://lkml.kernel.org/r/20230125170537.96973-6-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar
    Cc: Gerald Schaefer
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Mike Kravetz
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • Change alloc_huge_page() to alloc_hugetlb_folio() by changing all callers
    to handle the function's new folio return type. In this conversion,
    alloc_huge_page_vma() is also changed to alloc_hugetlb_folio_vma() and
    hugepage_add_new_anon_rmap() is changed to take in a folio directly. Many
    additions of '&folio->page' are cleaned up in subsequent patches.

    hugetlbfs_fallocate() is also refactored to use the RCU +
    page_cache_next_miss() API.
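
    A hedged sketch of what a converted call site looks like ('haddr' and the
    error handling are illustrative, not taken from an actual caller):

        struct folio *folio = alloc_hugetlb_folio(vma, haddr, 0);

        if (IS_ERR(folio))
                return PTR_ERR(folio);
        /* work on the folio directly; no intermediate struct page needed */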

    Link: https://lkml.kernel.org/r/20230125170537.96973-5-sidhartha.kumar@oracle.com
    Suggested-by: Mike Kravetz
    Reported-by: kernel test robot
    Signed-off-by: Sidhartha Kumar
    Cc: Gerald Schaefer
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • Convert putback_active_hugepage() to folio_putback_active_hugetlb(); this
    removes one user of the Huge Page macros which take in a page. The
    callers in migrate.c are also cleaned up by being able to directly use the
    src and dst folio variables.

    Link: https://lkml.kernel.org/r/20230125170537.96973-4-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar
    Reviewed-by: Mike Kravetz
    Cc: Gerald Schaefer
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • Refactor hugetlbfs_pagecache_present() to avoid getting and dropping a
    refcount on a page. Use RCU and page_cache_next_miss() instead.
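
    The refcount-free check reduces to the pattern below (a hedged sketch;
    the wrapper name and parameters are illustrative):

        static bool folio_cached(struct address_space *mapping, pgoff_t idx)
        {
                bool present;

                rcu_read_lock();
                /* page_cache_next_miss() returns idx itself if nothing is cached there */
                present = page_cache_next_miss(mapping, idx, 1) != idx;
                rcu_read_unlock();

                return present;
        }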

    Link: https://lkml.kernel.org/r/20230125170537.96973-3-sidhartha.kumar@oracle.com
    Suggested-by: Matthew Wilcox
    Signed-off-by: Sidhartha Kumar
    Cc: Gerald Schaefer
    Cc: John Hubbard
    Cc: kernel test robot
    Cc: Mike Kravetz
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • Patch series "convert hugetlb fault functions to folios", v2.

    This series converts the hugetlb page faulting functions to operate on
    folios. These include hugetlb_no_page(), hugetlb_wp(),
    copy_hugetlb_page_range(), and hugetlb_mcopy_atomic_pte().

    This patch (of 8):

    Change hugetlb_install_page() to hugetlb_install_folio(). This removes
    one user of the Huge Page flag macros which take in a page.

    Link: https://lkml.kernel.org/r/20230125170537.96973-1-sidhartha.kumar@oracle.com
    Link: https://lkml.kernel.org/r/20230125170537.96973-2-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar
    Reviewed-by: Mike Kravetz
    Cc: Gerald Schaefer
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • Change demote_free_huge_page() to demote_free_hugetlb_folio() and change
    demote_pool_huge_page() to pass in a folio.

    Link: https://lkml.kernel.org/r/20230113223057.173292-9-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Mike Kravetz
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • Use the hugetlb folio flag macros inside restore_reserve_on_error() and
    update the comments to reflect the use of folios.

    Link: https://lkml.kernel.org/r/20230113223057.173292-8-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar
    Reviewed-by: Mike Kravetz
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • Change alloc_huge_page_nodemask() to alloc_hugetlb_folio_nodemask() and
    alloc_migrate_huge_page() to alloc_migrate_hugetlb_folio(). Both
    functions now return a folio rather than a page.

    Link: https://lkml.kernel.org/r/20230113223057.173292-7-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar
    Reviewed-by: Mike Kravetz
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • Change hugetlb_cgroup_commit_charge{,_rsvd}(), dequeue_huge_page_vma() and
    alloc_buddy_huge_page_with_mpol() to use folios, so that alloc_huge_page()
    is cleaned up to operate on folios until its return.

    Link: https://lkml.kernel.org/r/20230113223057.173292-6-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar
    Reviewed-by: Mike Kravetz
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • Change alloc_surplus_huge_page() to alloc_surplus_hugetlb_folio() and
    update its callers.

    Link: https://lkml.kernel.org/r/20230113223057.173292-5-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar
    Reviewed-by: Mike Kravetz
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • dequeue_huge_page_node_exact() is changed to
    dequeue_hugetlb_folio_node_exact() and dequeue_huge_page_nodemask() is
    changed to dequeue_hugetlb_folio_nodemask(). Update their callers to pass
    in a folio.

    Link: https://lkml.kernel.org/r/20230113223057.173292-4-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Mike Kravetz
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • Change __update_and_free_page() to __update_and_free_hugetlb_folio() by
    changing its callers to pass in a folio.

    Link: https://lkml.kernel.org/r/20230113223057.173292-3-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar
    Reviewed-by: Mike Kravetz
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • Patch series "continue hugetlb folio conversion", v3.

    This series continues the conversion of core hugetlb functions to use
    folios. It converts many helper functions in the hugetlb fault path, in
    preparation for another series that will convert the hugetlb fault code
    paths themselves to operate on folios.

    This patch (of 8):

    Convert isolate_hugetlb() to take in a folio and convert its callers to
    pass a folio. Using page_folio() to convert the callers is safe, as
    isolate_hugetlb() only operates on a head page.
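
    A hedged call-site fragment ('page', 'ret' and 'pagelist' are assumed
    locals); the conversion is a straight wrap with page_folio(), which
    resolves to the folio of the head page:

        ret = isolate_hugetlb(page_folio(page), &pagelist);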

    Link: https://lkml.kernel.org/r/20230113223057.173292-1-sidhartha.kumar@oracle.com
    Link: https://lkml.kernel.org/r/20230113223057.173292-2-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar
    Reviewed-by: Mike Kravetz
    Cc: John Hubbard
    Cc: Matthew Wilcox
    Cc: Mike Kravetz
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

    Sidhartha Kumar
     
  • release_pte_pages() converts from a pfn to a folio by using pfn_folio().
    If the pte is not mapped, pfn_folio() will result in undefined behavior
    which ends up causing a kernel panic[1].

    To fix the issue, only call pfn_folio() once we have validated that the
    pte is both valid and mapped.

    [1] https://lore.kernel.org/linux-mm/ff300770-afe9-908d-23ed-d23e0796e899@samsung.com/
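
    A hedged sketch of the fixed ordering, simplified from the loop in
    release_pte_pages():

        pte_t pteval = *pte;

        /* validate first: unmapped or zero-page ptes never reach pfn_folio() */
        if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval)))
                continue;

        /* only now is the pfn-to-folio translation well defined */
        folio = pfn_folio(pte_pfn(pteval));
        /* ... unlock and put the folio back as before ... */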

    Link: https://lkml.kernel.org/r/20230213214324.34215-1-vishal.moola@gmail.com
    Signed-off-by: Vishal Moola (Oracle)
    Fixes: 9bdfeea46f49 ("mm/khugepaged: convert release_pte_pages() to use folios")
    Reported-by: Marek Szyprowski
    Tested-by: Marek Szyprowski
    Debugged-by: Alexandre Ghiti
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton

    Vishal Moola (Oracle)
     

10 Feb, 2023

25 commits

  • Commit a4574f63edc6 ("mm/memremap_pages: convert to 'struct range'")
    converted res to range; update the comment correspondingly.

    Link: https://lkml.kernel.org/r/1675751220-2-1-git-send-email-lizhijian@fujitsu.com
    Signed-off-by: Li Zhijian
    Cc: Dan Williams
    Signed-off-by: Andrew Morton

    Li Zhijian
     
  • Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the
    driver core allows the usage of const struct kobj_type.

    Take advantage of this to constify the structure definitions to prevent
    modification at runtime.

    Link: https://lkml.kernel.org/r/20230207-kobj_type-damon-v1-1-9d4fea6a465b@weissschuh.net
    Signed-off-by: Thomas Weißschuh
    Reviewed-by: SeongJae Park
    Signed-off-by: Andrew Morton

    Thomas Weißschuh
     
  • Move the flags that should not/are not used outside gup.c and related into
    mm/internal.h to discourage driver abuse.

    To make this more maintainable going forward, compact the two FOLL ranges
    with new bit numbers from 0 to 11 and 16 to 21, using shifts so the
    layout is explicit.

    Switch to an enum so the whole thing is easier to read.
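
    An abridged, illustrative view of the resulting layout (only a few flags
    shown; the values here are assumptions, see the patch for the real
    definitions):

        enum {
                /* flags still available to code outside gup.c */
                FOLL_WRITE      = 1 << 0,
                FOLL_GET        = 1 << 1,
                /* ... bits 2..11 ... */
        };

        enum {
                /* gup-internal flags, now in mm/internal.h */
                FOLL_TOUCH      = 1 << 16,
                /* ... bits 17..21, including FOLL_PIN ... */
        };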

    Link: https://lkml.kernel.org/r/13-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Reviewed-by: John Hubbard
    Acked-by: David Hildenbrand
    Cc: David Howells
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: Alistair Popple
    Cc: Mike Rapoport (IBM)
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • This function is only used in gup.c and closely related code. It touches
    FOLL_PIN, so it must be moved before the next patch.

    Link: https://lkml.kernel.org/r/12-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Reviewed-by: John Hubbard
    Reviewed-by: David Hildenbrand
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: David Howells
    Cc: Mike Rapoport (IBM)
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • There are only two callers, and both can handle the common return code:

    - get_user_page_fast_only() checks == 1

    - gfn_to_page_many_atomic() already returns -1, and the only caller
    checks for negative return values

    Remove the restriction against returning negative values.

    Link: https://lkml.kernel.org/r/11-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Acked-by: Mike Rapoport (IBM)
    Reviewed-by: John Hubbard
    Reviewed-by: David Hildenbrand
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: David Howells
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • Commit ed29c2691188 ("drm/i915: Fix userptr so we do not have to worry
    about obj->mm.lock, v7.") removed the only caller; remove this dead code
    too.

    Link: https://lkml.kernel.org/r/10-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Acked-by: Mike Rapoport (IBM)
    Reviewed-by: John Hubbard
    Reviewed-by: David Hildenbrand
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: David Howells
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • Now that a NULL locked no longer has a special meaning, we can just make
    it non-NULL in all cases and remove the special tests.

    get_user_pages() and pin_user_pages() can safely pass in locked = 1.

    get_user_pages_remote() and pin_user_pages_remote() can swap in a local
    variable for locked if NULL is passed.

    Remove all the NULL checks.
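
    A hedged fragment showing the substitution in the *_remote() wrappers
    (surrounding code omitted):

        int local_locked = 1;

        /* callers passing NULL just get a throwaway "already locked" flag */
        if (!locked)
                locked = &local_locked;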

    Link: https://lkml.kernel.org/r/9-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Acked-by: Mike Rapoport (IBM)
    Reviewed-by: John Hubbard
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: David Hildenbrand
    Cc: David Howells
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • Setting FOLL_UNLOCKABLE allows GUP to lock/unlock the mmap lock on its
    own. It is a more explicit replacement for locked != NULL. This clears
    the way for passing in locked = 1 without intending that the lock can be
    unlocked.

    Set the flag in all cases where it is used, e.g. where locked is present
    in the external interface or locked is used internally with locked = 0.

    Link: https://lkml.kernel.org/r/8-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Acked-by: Mike Rapoport (IBM)
    Reviewed-by: John Hubbard
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: David Hildenbrand
    Cc: David Howells
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • The only caller of this function always passes in a non-NULL locked, so
    just remove this obsolete comment.

    Link: https://lkml.kernel.org/r/7-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Reviewed-by: John Hubbard
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: David Hildenbrand
    Cc: David Howells
    Cc: Mike Rapoport (IBM)
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • Since commit 5b78ed24e8ec ("mm/pagemap: add mmap_assert_locked()
    annotations to find_vma*()") we already have this assertion; it is just
    buried in find_vma():

    __get_user_pages_locked()
      __get_user_pages()
        find_extend_vma()
          find_vma()

    Also check it at the top of __get_user_pages_locked() as a form of
    documentation.
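
    A minimal sketch of the documentation-style check (its exact placement in
    __get_user_pages_locked() is simplified here):

        if (*locked)
                mmap_assert_locked(mm);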

    Link: https://lkml.kernel.org/r/6-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Reviewed-by: John Hubbard
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: David Hildenbrand
    Cc: David Howells
    Cc: Mike Rapoport (IBM)
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • The GUP family of functions has a complex, but fairly well defined, set
    of invariants for its arguments. Currently these are sprinkled about,
    sometimes in duplicate, through many functions.

    Internally we don't follow all the invariants that the external interface
    has to follow, so place these checks directly at the exported interface.
    This ensures the internal functions never reach a violated invariant.

    Remove the duplicated invariant checks.

    The end result is to make these functions fully internal:
    __get_user_pages_locked()
    internal_get_user_pages_fast()
    __gup_longterm_locked()

    And all the other functions call directly into one of these.
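
    One representative check, as a hedged fragment (the real code groups
    several such tests at each exported entry point):

        /* FOLL_PIN belongs to pin_user_pages*(); reject it at the exported
         * boundary so the internal helpers never see a violated invariant */
        if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
                return -EINVAL;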

    Link: https://lkml.kernel.org/r/5-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Suggested-by: John Hubbard
    Reviewed-by: John Hubbard
    Acked-by: Mike Rapoport (IBM)
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: David Hildenbrand
    Cc: David Howells
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • This is part of the internal machinery of gup.c and is only non-static so
    that the parts of gup.c that live in huge_memory.c and hugetlb.c can call
    it.

    Put it in internal.h beside the similarly purposed try_grab_folio().

    Link: https://lkml.kernel.org/r/4-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Reviewed-by: John Hubbard
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: David Hildenbrand
    Cc: David Howells
    Cc: Mike Rapoport (IBM)
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • get_user_pages_remote(), get_user_pages_unlocked() and get_user_pages()
    are never called with FOLL_LONGTERM, so directly call
    __get_user_pages_locked().

    The next patch will add an assertion for this.

    Link: https://lkml.kernel.org/r/3-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Suggested-by: John Hubbard
    Reviewed-by: John Hubbard
    Acked-by: Mike Rapoport (IBM)
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: David Hildenbrand
    Cc: David Howells
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • These days FOLL_LONGTERM is not allowed at all on any get_user_pages*()
    functions; it must only be used with pin_user_pages*(), and it now has
    universal support across all the pin_user_pages*() functions.

    Link: https://lkml.kernel.org/r/2-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Reviewed-by: John Hubbard
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: Claudio Imbrenda
    Cc: David Hildenbrand
    Cc: David Howells
    Cc: Mike Rapoport (IBM)
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • Patch series "Simplify the external interface for GUP", v2.

    It is quite a maze of EXPORTED symbols leading up to the three actual
    worker functions of GUP. Simplify this by reorganizing some of the code so
    the EXPORTED symbols directly call the correct internal function with
    validated and consistent arguments.

    Consolidate all the assertions into one place at the top of the call
    chains.

    Remove some dead code.

    Move more things into the mm/internal.h header.

    This patch (of 13):

    __get_user_pages_locked() and __gup_longterm_locked() both require the
    mmap lock to be held. They have a slightly unusual locked parameter that
    is used to allow these functions to unlock and relock the mmap lock and
    convey that fact to the caller.

    Several places wrap these functions with a simple mmap_read_lock() just so
    they can follow the optimized locked protocol.

    Consolidate this internally to the functions. Allow internal callers to
    set locked = 0 to cause the functions to acquire and release the lock on
    their own.

    Reorganize __gup_longterm_locked() to use the autolocking in
    __get_user_pages_locked().

    Replace all the places obtaining the mmap_read_lock() just to call
    __get_user_pages_locked() with the new mechanism. Replace all the
    internal callers of get_user_pages_unlocked() with direct calls to
    __gup_longterm_locked() using the new mechanism.

    A following patch will add assertions ensuring the external interface
    continues to always pass in locked = 1.
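
    A simplified, hedged sketch of the resulting control flow inside
    __get_user_pages_locked() (error handling trimmed):

        int must_unlock = 0;

        if (!*locked) {
                /* internal caller asked us to manage the mmap lock */
                mmap_read_lock(mm);
                must_unlock = 1;
                *locked = 1;
        }

        /* ... fault in and pin the pages ... */

        if (must_unlock && *locked) {
                mmap_read_unlock(mm);
                *locked = 0;
        }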

    Link: https://lkml.kernel.org/r/0-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Link: https://lkml.kernel.org/r/1-v2-987e91b59705+36b-gup_tidy_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe
    Acked-by: Mike Rapoport (IBM)
    Reviewed-by: John Hubbard
    Cc: Alistair Popple
    Cc: Christoph Hellwig
    Cc: David Hildenbrand
    Cc: David Howells
    Cc: Claudio Imbrenda
    Signed-off-by: Andrew Morton

    Jason Gunthorpe
     
  • Currently, for vmalloc areas with flag VM_IOREMAP set, apart from the
    specific alignment clamping in __get_vm_area_node(), they will be:

    1) Shown as ioremap in /proc/vmallocinfo;

    2) Ignored by /proc/kcore reading via vread().

    So for the ioremap in __sq_remap() of sh, we should set VM_IOREMAP in the
    flags so that it is handled correctly as above.
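
    A hedged, generic fragment of the kind of change involved (the call shown
    is illustrative, not the exact sh code): reserve the area with VM_IOREMAP
    instead of VM_ALLOC so the two rules above apply to it.

        area = __get_vm_area_caller(size, VM_IOREMAP, VMALLOC_START,
                                    VMALLOC_END, __builtin_return_address(0));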

    Link: https://lkml.kernel.org/r/20230206084020.174506-8-bhe@redhat.com
    Signed-off-by: Baoquan He
    Reviewed-by: Lorenzo Stoakes
    Reviewed-by: Uladzislau Rezki (Sony)
    Cc: Dan Carpenter
    Cc: Stephen Brennan
    Signed-off-by: Andrew Morton

    Baoquan He
     
  • Currently, for vmalloc areas with flag VM_IOREMAP set, apart from the
    specific alignment clamping in __get_vm_area_node(), they will be:

    1) Shown as ioremap in /proc/vmallocinfo;

    2) Ignored by /proc/kcore reading via vread().

    So for the io mapping in ioremap_phb() of ppc, we should set VM_IOREMAP in
    the flags so that it is handled correctly as above.

    Link: https://lkml.kernel.org/r/20230206084020.174506-7-bhe@redhat.com
    Signed-off-by: Baoquan He
    Reviewed-by: Lorenzo Stoakes
    Reviewed-by: Uladzislau Rezki (Sony)
    Cc: Dan Carpenter
    Cc: Stephen Brennan
    Signed-off-by: Andrew Morton

    Baoquan He
     
  • For areas allocated via the vmalloc_xxx() APIs, an unmapped area is
    searched for and reserved, and new pages are allocated and mapped into
    it; please see function __vmalloc_node_range(). During the process, flag
    VM_UNINITIALIZED is set in vm->flags to indicate that the page allocation
    and mapping haven't been done, until clear_vm_uninitialized_flag() is
    called to clear VM_UNINITIALIZED.

    For this kind of area, if VM_UNINITIALIZED is still set, ignore it in
    vread(), because pages newly allocated and being mapped in that area only
    contain zero data; reading them out via aligned_vread() is a waste of
    time.
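
    The resulting check in vread()'s area loop is essentially the fragment
    below (a hedged sketch; loop bookkeeping omitted):

        vm = va->vm;
        if (vm && (vm->flags & VM_UNINITIALIZED))
                continue;       /* not mapped yet: would only read back zeroes */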

    Link: https://lkml.kernel.org/r/20230206084020.174506-6-bhe@redhat.com
    Signed-off-by: Baoquan He
    Reviewed-by: Lorenzo Stoakes
    Reviewed-by: Uladzislau Rezki (Sony)
    Cc: Dan Carpenter
    Cc: Stephen Brennan
    Signed-off-by: Andrew Morton

    Baoquan He
     
  • Now, by marking VMAP_RAM in vmap_area->flags for vm_map_ram areas, we can
    clearly differentiate them from other vmalloc areas. So identify
    vm_map_ram areas by checking VMAP_RAM in vmap_area->flags when they are
    shown in /proc/vmallocinfo.

    Meanwhile, the code comment above the vm_map_ram area check in s_show()
    is no longer needed; remove it here.
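
    A hedged sketch of the s_show() identification (the format string is
    illustrative):

        if (!va->vm && (va->flags & VMAP_RAM))
                seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
                           (void *)va->va_start, (void *)va->va_end,
                           va->va_end - va->va_start);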

    Link: https://lkml.kernel.org/r/20230206084020.174506-5-bhe@redhat.com
    Signed-off-by: Baoquan He
    Reviewed-by: Lorenzo Stoakes
    Cc: Dan Carpenter
    Cc: Stephen Brennan
    Cc: Uladzislau Rezki (Sony)
    Signed-off-by: Andrew Morton

    Baoquan He
     
  • Currently, vread() can read out vmalloc areas which are associated with a
    vm_struct. This doesn't work for areas created by the vm_map_ram()
    interface, because they have no associated vm_struct, so in vread() these
    areas are all skipped.

    Here, add a new function vmap_ram_vread() to read out vm_map_ram areas.
    An area created directly through the vm_map_ram() interface can be
    handled like the other normal vmap areas with aligned_vread(), while
    areas which are further subdivided and managed with vmap_block need to be
    carefully read out as page-aligned small regions, with holes zero-filled.

    Link: https://lkml.kernel.org/r/20230206084020.174506-4-bhe@redhat.com
    Reported-by: Stephen Brennan
    Signed-off-by: Baoquan He
    Reviewed-by: Lorenzo Stoakes
    Tested-by: Stephen Brennan
    Cc: Dan Carpenter
    Cc: Uladzislau Rezki (Sony)
    Signed-off-by: Andrew Morton

    Baoquan He
     
  • Through the vmalloc API, a virtual kernel area is reserved for physical
    address mapping. A vmap_area is used to track it, while a vm_struct is
    allocated and associated with the vmap_area to store more information and
    be passed out.

    However, an area reserved via vm_map_ram() is an exception: it has no
    vm_struct associated with its vmap_area. And we can't recognize such a
    vmap_area by '->vm == NULL', because the normal freeing path sets
    va->vm = NULL before unmapping; please see function remove_vm_area().

    Meanwhile, there are two kinds of vm_map_ram areas. In one, the whole
    vmap_area is reserved and mapped at one time through the vm_map_ram()
    interface; in the other, a whole vmap_area of VMAP_BLOCK_SIZE is
    reserved and then mapped out in smaller split regions via vb_alloc().

    To mark areas reserved through vm_map_ram(), add a flags field to struct
    vmap_area. Bit 0 indicates a vm_map_ram area created through the
    vm_map_ram() interface, while bit 1 marks the type of vm_map_ram area
    which makes use of vmap_block to manage split regions via vb_alloc/free().

    This is a preparation for later use.
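
    A sketch of the new field and the two bits described above (definitions
    abridged; comments added here):

        #define VMAP_RAM        0x1     /* area reserved through vm_map_ram() */
        #define VMAP_BLOCK      0x2     /* subdivided and managed via vmap_block */

        struct vmap_area {
                /* ... existing fields ... */
                unsigned long flags;    /* marks the kind of vm_map_ram area */
        };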

    Link: https://lkml.kernel.org/r/20230206084020.174506-3-bhe@redhat.com
    Signed-off-by: Baoquan He
    Reviewed-by: Lorenzo Stoakes
    Reviewed-by: Uladzislau Rezki (Sony)
    Cc: Dan Carpenter
    Cc: Stephen Brennan
    Signed-off-by: Andrew Morton

    Baoquan He
     
  • Patch series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas", v5.

    Problem:
    ***

    Stephen reported that vread() will skip vm_map_ram areas when reading out
    /proc/kcore with the drgn utility. Please see the link below for more
    details.

    /proc/kcore reads 0's for vmap_block
    https://lore.kernel.org/all/87ilk6gos2.fsf@oracle.com/T/#u

    Root cause:
    ***

    The normal vmalloc API uses struct vmap_area to manage the virtual kernel
    area allocated, and associates a vm_struct with it to store more
    information and pass out. However, an area reserved through the
    vm_map_ram() interface doesn't allocate a vm_struct to associate with, so
    the current code in vread() skips vm_map_ram areas via the 'if (!va->vm)'
    check.

    Solution:
    ***

    To mark an area reserved through the vm_map_ram() interface, add a field
    'flags' into struct vmap_area. Bit 0 indicates a vm_map_ram area created
    through the vm_map_ram() interface, while bit 1 marks the type of
    vm_map_ram area which makes use of vmap_block to manage split regions via
    vb_alloc/free().

    And also add a bitmap field 'used_map' into struct vmap_block to mark
    those further subdivided regions being used, to differentiate them from
    dirty and free regions in the vmap_block.

    With the help of the above vmap_area->flags and vmap_block->used_map, we
    can recognize and handle vm_map_ram areas successfully. All of this is
    done in patches 1~3.

    Meanwhile, make some improvements to code related to vm_map_ram areas in
    patches 4 and 5, and change the area flag from VM_ALLOC to VM_IOREMAP in
    patches 6 and 7, because this will show those areas as 'ioremap' in
    /proc/vmallocinfo and exclude them from /proc/kcore.

    This patch (of 7):

    In one vmap_block area, there can be three types of regions: regions
    being used, which are allocated through vb_alloc(); dirty regions, which
    have been freed via vb_free(); and free regions. Among them, only used
    regions hold available data, yet there is currently no way to track them.

    Here, add a bitmap field used_map into vmap_block, and set/clear it when
    allocating or freeing regions of a vmap_block area.

    This is a preparation for later use.
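
    A hedged sketch of the bookkeeping added by this patch (field and calls
    simplified; 'pages_off', 'offset' and 'order' are the allocation offsets
    and size order used by vb_alloc()/vb_free()):

        struct vmap_block {
                /* ... existing fields ... */
                DECLARE_BITMAP(used_map, VMAP_BBMAP_BITS); /* vb_alloc()ed regions */
        };

        /* vb_alloc(): remember the handed-out region as used */
        bitmap_set(vb->used_map, pages_off, (1UL << order));

        /* vb_free(): forget it again */
        bitmap_clear(vb->used_map, offset, (1UL << order));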

    Link: https://lkml.kernel.org/r/20230206084020.174506-1-bhe@redhat.com
    Link: https://lkml.kernel.org/r/20230206084020.174506-2-bhe@redhat.com
    Signed-off-by: Baoquan He
    Reviewed-by: Lorenzo Stoakes
    Reviewed-by: Uladzislau Rezki (Sony)
    Cc: Dan Carpenter
    Cc: Stephen Brennan
    Cc: Uladzislau Rezki (Sony)
    Signed-off-by: Andrew Morton

    Baoquan He
     
  • With W=1 and CONFIG_SHMEM=n, shmem.c functions have no prototypes so the
    compiler emits warnings.

    Link: https://lkml.kernel.org/r/20230206190850.4054983-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Cc: Mark Hemment
    Cc: Charan Teja Kalla
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Cc: Pavankumar Kondeti
    Cc: Shakeel Butt
    Cc: Suren Baghdasaryan
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton

    Matthew Wilcox (Oracle)
     
  • These are the folio replacements for shmem_read_mapping_page() and
    shmem_read_mapping_page_gfp().
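
    A hedged usage sketch of the new helpers ('mapping' and 'index' are
    assumed locals; error handling abridged):

        struct folio *folio;

        folio = shmem_read_folio(mapping, index);
        if (IS_ERR(folio))
                return PTR_ERR(folio);
        /* ... use the folio ... */
        folio_put(folio);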

    [akpm@linux-foundation.org: fix shmem_read_mapping_page_gfp(), per Matthew]
    Link: https://lkml.kernel.org/r/Y+QdJTuzxeBYejw2@casper.infradead.org
    Link: https://lkml.kernel.org/r/20230206162520.4029022-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Cc: Mark Hemment
    Cc: Charan Teja Kalla
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Cc: Pavankumar Kondeti
    Cc: Shakeel Butt
    Cc: Suren Baghdasaryan
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton

    Matthew Wilcox (Oracle)
     
  • This is like read_cache_page_gfp() except it returns the folio instead
    of the precise page.

    Link: https://lkml.kernel.org/r/20230206162520.4029022-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Cc: Charan Teja Kalla
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Mark Hemment
    Cc: Michal Hocko
    Cc: Pavankumar Kondeti
    Cc: Shakeel Butt
    Cc: Suren Baghdasaryan
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton

    Matthew Wilcox (Oracle)