20 Jan, 2021

1 commit

  • commit 0eb98f1588c2cc7a79816d84ab18a55d254f481c upstream.

    The huge page size is encoded for VM_FAULT_HWPOISON errors only. So if
    we return plain VM_FAULT_HWPOISON, the huge page size would just be
    ignored.

    Link: https://lkml.kernel.org/r/20210107123449.38481-1-linmiaohe@huawei.com
    Fixes: aa50d3a7aa81 ("Encode huge page size for VM_FAULT_HWPOISON errors")
    Signed-off-by: Miaohe Lin
    Reviewed-by: Mike Kravetz
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Miaohe Lin
     

06 Jan, 2021

1 commit

  • commit e7dd91c456a8cdbcd7066997d15e36d14276a949 upstream.

    syzbot reported the deadlock here [1]. The issue is in hugetlb cow
    error handling when there are not enough huge pages for the faulting
    task which took the original reservation. It is possible that other
    (child) tasks could have consumed pages associated with the reservation.
    In this case, we want the task which took the original reservation to
    succeed. So, we unmap any associated pages in children so that they can
    be used by the faulting task that owns the reservation.

    The unmapping code needs to hold i_mmap_rwsem in write mode. However,
    due to commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd
    sharing synchronization") we are already holding i_mmap_rwsem in read
    mode when hugetlb_cow is called.

    Technically, i_mmap_rwsem does not need to be held in read mode for COW
    mappings as they can not share pmd's. Modifying the fault code to not
    take i_mmap_rwsem in read mode for COW (and other non-sharable) mappings
    is too involved for a stable fix.

    Instead, we simply drop the hugetlb_fault_mutex and i_mmap_rwsem before
    unmapping. This is OK as it is technically not needed. They are
    reacquired after unmapping as expected by calling code. Since this is
    done in an uncommon error path, the overhead of dropping and reacquiring
    mutexes is acceptable.

    While making changes, remove redundant BUG_ON after unmap_ref_private.

    [1] https://lkml.kernel.org/r/000000000000b73ccc05b5cf8558@google.com

    Link: https://lkml.kernel.org/r/4c5781b8-3b00-761e-c0c7-c5edebb6ec1a@oracle.com
    Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
    Signed-off-by: Mike Kravetz
    Reported-by: syzbot+5eee4145df3c15e96625@syzkaller.appspotmail.com
    Cc: Naoya Horiguchi
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: "Aneesh Kumar K . V"
    Cc: Davidlohr Bueso
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Mike Kravetz
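
    A minimal sketch of the drop/unmap/reacquire sequence described above, in
    the hugetlb_cow() out-of-pages error path. Helper names follow the
    mm/hugetlb.c conventions of that era; treat exact signatures as
    assumptions rather than a verbatim excerpt of the patch:

        /* hugetlb_fault_mutex and i_mmap_rwsem (read) are held on entry */
        struct address_space *mapping = vma->vm_file->f_mapping;
        pgoff_t idx = vma_hugecache_offset(h, vma, haddr);
        u32 hash = hugetlb_fault_mutex_hash(mapping, idx);

        /* drop both locks so the unmap can take i_mmap_rwsem in write mode */
        mutex_unlock(&hugetlb_fault_mutex_table[hash]);
        i_mmap_unlock_read(mapping);

        unmap_ref_private(mm, vma, old_page, haddr);

        /* reacquire in the order expected by the calling code */
        i_mmap_lock_read(mapping);
        mutex_lock(&hugetlb_fault_mutex_table[hash]);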
     

30 Dec, 2020

1 commit

  • [ Upstream commit 7fc2513aa237e2ce239ab54d7b04d1d79b317110 ]

    Preserve the error code from region_add() instead of returning success.

    Link: https://lkml.kernel.org/r/X9NGZWnZl5/Mt99R@mwanda
    Fixes: 0db9d74ed884 ("hugetlb: disable region_add file_region coalescing")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Mike Kravetz
    Reviewed-by: David Hildenbrand
    Cc: Mina Almasry
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Dan Carpenter
     

12 Dec, 2020

1 commit

  • Commit 1378a5ee451a ("mm: store compound_nr as well as compound_order")
    added compound_nr counter to first tail struct page, overlaying with
    page->mapping. The overlay itself is fine, but while freeing gigantic
    hugepages via free_contig_range(), a "bad page" check will trigger for
    non-NULL page->mapping on the first tail page:

    BUG: Bad page state in process bash pfn:380001
    page:00000000c35f0856 refcount:0 mapcount:0 mapping:00000000126b68aa index:0x0 pfn:0x380001
    aops:0x0
    flags: 0x3ffff00000000000()
    raw: 3ffff00000000000 0000000000000100 0000000000000122 0000000100000000
    raw: 0000000000000000 0000000000000000 ffffffff00000000 0000000000000000
    page dumped because: non-NULL mapping
    Modules linked in:
    CPU: 6 PID: 616 Comm: bash Not tainted 5.10.0-rc7-next-20201208 #1
    Hardware name: IBM 3906 M03 703 (LPAR)
    Call Trace:
    show_stack+0x6e/0xe8
    dump_stack+0x90/0xc8
    bad_page+0xd6/0x130
    free_pcppages_bulk+0x26a/0x800
    free_unref_page+0x6e/0x90
    free_contig_range+0x94/0xe8
    update_and_free_page+0x1c4/0x2c8
    free_pool_huge_page+0x11e/0x138
    set_max_huge_pages+0x228/0x300
    nr_hugepages_store_common+0xb8/0x130
    kernfs_fop_write+0xd2/0x218
    vfs_write+0xb0/0x2b8
    ksys_write+0xac/0xe0
    system_call+0xe6/0x288
    Disabling lock debugging due to kernel taint

    This is because only the compound_order is cleared in
    destroy_compound_gigantic_page(), and compound_nr is set to
    1U << order == 1 for order 0 in set_compound_order(page, 0).

    Fix this by explicitly clearing compound_nr for first tail page after
    calling set_compound_order(page, 0).

    Link: https://lkml.kernel.org/r/20201208182813.66391-2-gerald.schaefer@linux.ibm.com
    Fixes: 1378a5ee451a ("mm: store compound_nr as well as compound_order")
    Signed-off-by: Gerald Schaefer
    Reviewed-by: Matthew Wilcox (Oracle)
    Cc: Heiko Carstens
    Cc: Mike Kravetz
    Cc: Christian Borntraeger
    Cc: [5.9+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
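
    A minimal sketch of the fix described above in
    destroy_compound_gigantic_page(); the field and helper names mirror
    mm/hugetlb.c of that period and should be read as an illustration:

        static void destroy_compound_gigantic_page(struct page *page,
                                                   unsigned int order)
        {
                /* ... clear tail page state ... */

                set_compound_order(page, 0);
                /*
                 * compound_nr overlays page->mapping on the first tail page,
                 * and set_compound_order(page, 0) leaves it at 1U << 0 == 1,
                 * so clear it explicitly to satisfy the "non-NULL mapping"
                 * bad page check on free.
                 */
                page[1].compound_nr = 0;
                __ClearPageHead(page);
        }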
     

15 Nov, 2020

1 commit

  • Qian Cai reported the following BUG in [1]

    LTP: starting move_pages12
    BUG: unable to handle page fault for address: ffffffffffffffe0
    ...
    RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170 avc_start_pgoff at mm/interval_tree.c:63
    Call Trace:
    rmap_walk_anon+0x141/0xa30 rmap_walk_anon at mm/rmap.c:1864
    try_to_unmap+0x209/0x2d0 try_to_unmap at mm/rmap.c:1763
    migrate_pages+0x1005/0x1fb0
    move_pages_and_store_status.isra.47+0xd7/0x1a0
    __x64_sys_move_pages+0xa5c/0x1100
    do_syscall_64+0x5f/0x310
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Hugh Dickins diagnosed this as a migration bug caused by code introduced
    to use i_mmap_rwsem for pmd sharing synchronization. Specifically, the
    routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED
    flag to try_to_unmap() while holding i_mmap_rwsem. This is wrong for
    anon pages as the anon_vma_lock should be held in this case. Further
    analysis suggested that i_mmap_rwsem was not required to be held at all
    when calling try_to_unmap for anon pages as an anon page could never be
    part of a shared pmd mapping.

    Discussion also revealed that the hack in hugetlb_page_mapping_lock_write
    to drop page lock and acquire i_mmap_rwsem is wrong. There is no way to
    keep mapping valid while dropping page lock.

    This patch does the following:

    - Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when
    calling try_to_unmap.

    - Remove the hacky code in hugetlb_page_mapping_lock_write. The routine
    will now simply do a 'trylock' while still holding the page lock. If
    the trylock fails, it will return NULL. This could impact the
    callers:

    - migration calling code will receive -EAGAIN and retry up to the
    hard coded limit (10).

    - memory error handling code will treat the page as BUSY. This will
    force the killing (SIGKILL) of mapping tasks instead of sending SIGBUS.

    Do note that this change in behavior only happens when there is a
    race. None of the standard kernel testing suites actually hit this
    race, but it is possible.

    [1] https://lore.kernel.org/lkml/20200708012044.GC992@lca.pw/
    [2] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/

    Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
    Reported-by: Qian Cai
    Suggested-by: Hugh Dickins
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Cc:
    Link: https://lkml.kernel.org/r/20201105195058.78401-1-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds

    Mike Kravetz
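
    A minimal sketch of the resulting logic in unmap_and_move_huge_page():
    only file-backed huge pages take i_mmap_rwsem and pass TTU_RMAP_LOCKED.
    Names follow the 5.10-era migrate.c; the "unlock" label and the
    surrounding function context are abridged assumptions, not a verbatim
    excerpt:

        if (page_mapped(hpage)) {
                enum ttu_flags ttu = TTU_MIGRATION | TTU_IGNORE_MLOCK;
                struct address_space *mapping = NULL;

                if (!PageAnon(hpage)) {
                        /* trylock under the page lock; may fail and return NULL */
                        mapping = hugetlb_page_mapping_lock_write(hpage);
                        if (!mapping)
                                goto unlock;    /* caller retries / treats page as busy */
                        ttu |= TTU_RMAP_LOCKED;
                }

                try_to_unmap(hpage, ttu);
                page_was_mapped = 1;

                if (mapping)
                        i_mmap_unlock_write(mapping);
        }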
     

03 Nov, 2020

1 commit

  • Michal Privoznik was using "free page reporting" in QEMU/virtio-balloon
    with hugetlbfs and hit the warning below. QEMU with free page hinting
    uses fallocate(FALLOC_FL_PUNCH_HOLE) to discard pages that are reported
    as free by a VM. Reporting is done at pageblock granularity. So when
    the guest reports 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE)
    one huge page in QEMU.

    WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
    Modules linked in: ...
    CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
    Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
    RIP: 0010:page_counter_uncharge+0x4b/0x50
    ...
    Call Trace:
    hugetlb_cgroup_uncharge_file_region+0x4b/0x80
    region_del+0x1d3/0x300
    hugetlb_unreserve_pages+0x39/0xb0
    remove_inode_hugepages+0x1a8/0x3d0
    hugetlbfs_fallocate+0x3c4/0x5c0
    vfs_fallocate+0x146/0x290
    __x64_sys_fallocate+0x3e/0x70
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Investigation of the issue uncovered bugs in hugetlb cgroup reservation
    accounting. This patch addresses the found issues.

    Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings")
    Reported-by: Michal Privoznik
    Co-developed-by: David Hildenbrand
    Signed-off-by: David Hildenbrand
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Tested-by: Michal Privoznik
    Reviewed-by: Mina Almasry
    Acked-by: Michael S. Tsirkin
    Cc:
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Cc: Muchun Song
    Cc: "Aneesh Kumar K . V"
    Cc: Tejun Heo
    Link: https://lkml.kernel.org/r/20201021204426.36069-1-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

16 Oct, 2020

1 commit

  • Pull dma-mapping updates from Christoph Hellwig:

    - rework the non-coherent DMA allocator

    - move private definitions out of

    - lower CMA_ALIGNMENT (Paul Cercueil)

    - remove the omap1 dma address translation in favor of the common code

    - make dma-direct aware of multiple dma offset ranges (Jim Quinlan)

    - support per-node DMA CMA areas (Barry Song)

    - increase the default seg boundary limit (Nicolin Chen)

    - misc fixes (Robin Murphy, Thomas Tai, Xu Wang)

    - various cleanups

    * tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
    ARM/ixp4xx: add a missing include of dma-map-ops.h
    dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
    dma-direct: factor out a dma_direct_alloc_from_pool helper
    dma-direct check for highmem pages in dma_direct_alloc_pages
    dma-mapping: merge into
    dma-mapping: move large parts of to kernel/dma
    dma-mapping: move dma-debug.h to kernel/dma/
    dma-mapping: remove
    dma-mapping: merge into
    dma-contiguous: remove dma_contiguous_set_default
    dma-contiguous: remove dev_set_cma_area
    dma-contiguous: remove dma_declare_contiguous
    dma-mapping: split
    cma: decrease CMA_ALIGNMENT lower limit to 2
    firewire-ohci: use dma_alloc_pages
    dma-iommu: implement ->alloc_noncoherent
    dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
    dma-mapping: add a new dma_alloc_pages API
    dma-mapping: remove dma_cache_sync
    53c700: convert to dma_alloc_noncoherent
    ...

    Linus Torvalds
     

15 Oct, 2020

1 commit

  • Pull driver core updates from Greg KH:
    "Here is the "big" set of driver core patches for 5.10-rc1

    They include a lot of different things, all related to the driver core
    and/or some driver logic:

    - sysfs common write functions to make it easier to audit sysfs
    attributes

    - device connection cleanups and fixes

    - devm helpers for a few functions

    - NOIO allocations for when devices are being removed

    - minor cleanups and fixes

    All have been in linux-next for a while with no reported issues"

    * tag 'driver-core-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
    regmap: debugfs: use semicolons rather than commas to separate statements
    platform/x86: intel_pmc_core: do not create a static struct device
    drivers core: node: Use a more typical macro definition style for ACCESS_ATTR
    drivers core: Use sysfs_emit for shared_cpu_map_show and shared_cpu_list_show
    mm: and drivers core: Convert hugetlb_report_node_meminfo to sysfs_emit
    drivers core: Miscellaneous changes for sysfs_emit
    drivers core: Reindent a couple uses around sysfs_emit
    drivers core: Remove strcat uses around sysfs_emit and neaten
    drivers core: Use sysfs_emit and sysfs_emit_at for show(device *...) functions
    sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output
    dyndbg: use keyword, arg varnames for query term pairs
    driver core: force NOIO allocations during unplug
    platform_device: switch to simpler IDA interface
    driver core: platform: Document return type of more functions
    Revert "driver core: Annotate dev_err_probe() with __must_check"
    Revert "test_firmware: Test platform fw loading on non-EFI systems"
    iio: adc: xilinx-xadc: use devm_krealloc()
    hwmon: pmbus: use more devres helpers
    devres: provide devm_krealloc()
    syscore: Use pm_pr_dbg() for syscore_{suspend,resume}()
    ...

    Linus Torvalds
     

14 Oct, 2020

10 commits

  • As a debugging aid, huge_pmd_share should make sure i_mmap_rwsem is held
    if necessary. To clarify the 'if necessary', expand the comment block at
    the beginning of huge_pmd_share.

    No functional change. The added i_mmap_assert_locked() call is only
    enabled if CONFIG_LOCKDEP.

    Ideally, this should have been included with commit 34ae204f1851
    ("hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem").

    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: "Kirill A . Shutemov"
    Cc: Davidlohr Bueso
    Link: https://lkml.kernel.org/r/20200911201248.88537-1-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds

    Mike Kravetz
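
    A minimal sketch of the debugging aid described above; the assertion
    compiles away unless CONFIG_LOCKDEP is enabled, and the surrounding code
    is abridged (an illustration, not the full huge_pmd_share()):

        pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
        {
                struct vm_area_struct *vma = find_vma(mm, addr);
                struct address_space *mapping = vma->vm_file->f_mapping;

                /*
                 * Callers must already hold i_mmap_rwsem; the walk of
                 * mapping->i_mmap below relies on it.  No-op without
                 * CONFIG_LOCKDEP.
                 */
                i_mmap_assert_locked(mapping);

                /* ... scan mapping->i_mmap for a shareable pmd ... */
        }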
     
  • Function dequeue_huge_page_node_exact() iterates the free list and
    returns the first valid free hpage.

    Instead of breaking out of the loop and then checking the loop variable,
    we can return from within the loop directly. This removes a redundant
    check.

    [mike.kravetz@oracle.com: points out a logic error]
    [richard.weiyang@linux.alibaba.com: v4]
    Link: https://lkml.kernel.org/r/20200901014636.29737-8-richard.weiyang@linux.alibaba.com

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Cc: Baoquan He
    Cc: Mike Kravetz
    Cc: Vlastimil Babka
    Link: https://lkml.kernel.org/r/20200831022351.20916-8-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
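
    A minimal sketch of the simplified loop shape: the function returns as
    soon as a usable page is found instead of breaking out and re-checking
    the loop variable afterwards (a sketch of
    dequeue_huge_page_node_exact(), not a verbatim excerpt):

        static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
        {
                struct page *page;

                list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
                        if (PageHWPoison(page))
                                continue;

                        list_move(&page->lru, &h->hugepage_activelist);
                        set_page_refcounted(page);
                        h->free_huge_pages--;
                        h->free_huge_pages_node[nid]--;
                        return page;    /* no break-and-check needed */
                }

                return NULL;
        }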
     
  • set_hugetlb_cgroup_[rsvd]() just manipulates page-local data, which does
    not need to be protected by hugetlb_lock.

    Let's take these calls out of the lock.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Reviewed-by: Mike Kravetz
    Cc: Vlastimil Babka
    Link: https://lkml.kernel.org/r/20200831022351.20916-7-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • A page just allocated from the buddy allocator is not on any list, so
    simply using list_add() is enough.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Reviewed-by: Mike Kravetz
    Cc: Vlastimil Babka
    Link: https://lkml.kernel.org/r/20200831022351.20916-6-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • There are only two cases for function add_reservation_in_range():

    * count file_region and return the number in regions_needed
    * do the real list operation without counting

    This means it is not necessary to have two parameters to classify these
    two cases.

    Just use regions_needed to separate them.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Reviewed-by: Mike Kravetz
    Cc: Vlastimil Babka
    Link: https://lkml.kernel.org/r/20200831022351.20916-5-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • Instead of adding the allocated file_region entries one by one to
    region_cache, we can use list_splice() to merge the two lists at once.

    Also, since we know the number of entries in the list, we can increase
    the count directly.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Reviewed-by: Mike Kravetz
    Cc: Vlastimil Babka
    Link: https://lkml.kernel.org/r/20200831022351.20916-4-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
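
    A minimal sketch of the idea: instead of list_add()-ing each freshly
    allocated file_region to the cache in a loop, splice the whole local
    list in one step and bump the counter once (names mirror the resv_map
    fields in mm/hugetlb.c; treat as an illustration):

        /* allocated_regions holds 'to_allocate' freshly allocated file_regions */
        list_splice(&allocated_regions, &resv->region_cache);
        resv->region_cache_count += to_allocate;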
     
  • We are sure to get a valid file_region, otherwise the
    VM_BUG_ON(resv->region_cache_count <= 0) at the beginning of the
    function would already have been triggered. Let's remove the redundant
    check.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Cc: Baoquan He
    Cc: Vlastimil Babka
    Link: https://lkml.kernel.org/r/20200831022351.20916-3-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • Patch series "mm/hugetlb: code refine and simplification", v4.

    Following are some cleanups for hugetlb. Simple testing with
    tools/testing/selftests/vm/map_hugetlb passes.

    This patch (of 7):

    Per my understanding, we keep the regions ordered and would always
    coalesce regions properly. So the task to keep this property is just to
    coalesce its neighbour.

    Let's simplify this.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Reviewed-by: Mike Kravetz
    Cc: Vlastimil Babka
    Link: https://lkml.kernel.org/r/20200901014636.29737-1-richard.weiyang@linux.alibaba.com
    Link: https://lkml.kernel.org/r/20200831022351.20916-1-richard.weiyang@linux.alibaba.com
    Link: https://lkml.kernel.org/r/20200831022351.20916-2-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • If a swap entry tests positive for either is_[migration|hwpoison]_entry(),
    then its swap_type() is among SWP_MIGRATION_READ, SWP_MIGRATION_WRITE and
    SWP_HWPOISON. All of these types are >= MAX_SWAPFILES, which is exactly
    what non_swap_entry() asserts.

    So the non_swap_entry() check in is_hugetlb_entry_migration() and
    is_hugetlb_entry_hwpoisoned() is redundant.

    Let's remove it to simplify the code.

    Signed-off-by: Baoquan He
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Reviewed-by: David Hildenbrand
    Reviewed-by: Anshuman Khandual
    Link: https://lkml.kernel.org/r/20200723032248.24772-3-bhe@redhat.com
    Signed-off-by: Linus Torvalds

    Baoquan He
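
    A minimal sketch of is_hugetlb_entry_migration() after dropping the
    redundant check; is_migration_entry() already implies a non-swap entry
    type >= MAX_SWAPFILES (a sketch, not a verbatim excerpt):

        static bool is_hugetlb_entry_migration(pte_t pte)
        {
                swp_entry_t swp;

                if (huge_pte_none(pte) || pte_present(pte))
                        return false;

                swp = pte_to_swp_entry(pte);
                /* non_swap_entry(swp) is implied by is_migration_entry(swp) */
                return is_migration_entry(swp);
        }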
     
  • Patch series "mm/hugetlb: Small cleanup and improvement", v2.

    This patch (of 3):

    Just like its neighbour is_hugetlb_entry_migration() has done.

    Signed-off-by: Baoquan He
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Reviewed-by: David Hildenbrand
    Reviewed-by: Anshuman Khandual
    Link: https://lkml.kernel.org/r/20200723032248.24772-1-bhe@redhat.com
    Link: https://lkml.kernel.org/r/20200723032248.24772-2-bhe@redhat.com
    Signed-off-by: Linus Torvalds

    Baoquan He
     

06 Sep, 2020

2 commits

  • There is a race between the assignment of `table->data` and the write of
    a value through that pointer in __do_proc_doulongvec_minmax() running on
    another thread:

    CPU0:                               CPU1:
                                        proc_sys_write
    hugetlb_sysctl_handler                proc_sys_call_handler
    hugetlb_sysctl_handler_common         hugetlb_sysctl_handler
      table->data = &tmp;                   hugetlb_sysctl_handler_common
                                              table->data = &tmp;
    proc_doulongvec_minmax
     do_proc_doulongvec_minmax           sysctl_head_finish
      __do_proc_doulongvec_minmax          unuse_table
       i = table->data;
       *i = val; // corrupt CPU1's stack

    Fix this by duplicating the `table`, and only updating the duplicate.
    Also introduce a helper, proc_hugetlb_doulongvec_minmax(), to simplify
    the code.

    The following oops was seen:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor instruction fetch in kernel mode
    #PF: error_code(0x0010) - not-present page
    Code: Bad RIP value.
    ...
    Call Trace:
    ? set_max_huge_pages+0x3da/0x4f0
    ? alloc_pool_huge_page+0x150/0x150
    ? proc_doulongvec_minmax+0x46/0x60
    ? hugetlb_sysctl_handler_common+0x1c7/0x200
    ? nr_hugepages_store+0x20/0x20
    ? copy_fd_bitmaps+0x170/0x170
    ? hugetlb_sysctl_handler+0x1e/0x20
    ? proc_sys_call_handler+0x2f1/0x300
    ? unregister_sysctl_table+0xb0/0xb0
    ? __fd_install+0x78/0x100
    ? proc_sys_write+0x14/0x20
    ? __vfs_write+0x4d/0x90
    ? vfs_write+0xef/0x240
    ? ksys_write+0xc0/0x160
    ? __ia32_sys_read+0x50/0x50
    ? __close_fd+0x129/0x150
    ? __x64_sys_write+0x43/0x50
    ? do_syscall_64+0x6c/0x200
    ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
    Signed-off-by: Muchun Song
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Cc: Andi Kleen
    Link: http://lkml.kernel.org/r/20200828031146.43035-1-songmuchun@bytedance.com
    Signed-off-by: Linus Torvalds

    Muchun Song
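
    A minimal sketch of the helper described above: work on a stack copy of
    the ctl_table so the shared table->data pointer is never modified (the
    proc_doulongvec_minmax() signature shown matches that kernel era; treat
    details as assumptions):

        static int proc_hugetlb_doulongvec_minmax(struct ctl_table *table, int write,
                                                  void *buffer, size_t *length,
                                                  loff_t *ppos, unsigned long *out)
        {
                struct ctl_table dup_table;

                /*
                 * Duplicate the table and point only the duplicate's ->data
                 * at the caller's temporary, so concurrent handlers never
                 * race on the shared table->data.
                 */
                dup_table = *table;
                dup_table.data = out;

                return proc_doulongvec_minmax(&dup_table, write, buffer, length, ppos);
        }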
     
  • Since commit cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic
    hugepages using cma"), a gigantic page may be allocated from a node
    other than the preferred node, even though pages are available on the
    preferred node. The reason is that the nid parameter has been ignored
    in alloc_gigantic_page().

    Besides, __GFP_THISNODE also needs to be checked if the user requires
    allocation only from the preferred node.

    After this patch, the preferred node is tried first before the other
    allowed nodes, and no other node is tried if __GFP_THISNODE is
    specified. If the user does not specify a preferred node, the current
    node is used as the preferred node, which ensures consistent behavior
    between gigantic and non-gigantic hugetlb page allocation.

    Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
    Signed-off-by: Li Xinhai
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Acked-by: Michal Hocko
    Cc: Roman Gushchin
    Link: https://lkml.kernel.org/r/20200902025016.697260-1-lixinhai.lxh@gmail.com
    Signed-off-by: Linus Torvalds

    Li Xinhai
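
    A minimal sketch of the allocation order in alloc_gigantic_page() for
    the CMA case: the preferred node first, then the other allowed nodes
    only when __GFP_THISNODE is not set. Surrounding declarations are
    omitted and the cma_alloc() arguments follow the 5.9-era API; treat the
    details as assumptions:

        if (hugetlb_cma[nid]) {
                page = cma_alloc(hugetlb_cma[nid], nr_pages, huge_page_order(h), true);
                if (page)
                        return page;
        }

        if (!(gfp_mask & __GFP_THISNODE)) {
                for_each_node_mask(node, *nodemask) {
                        if (node == nid || !hugetlb_cma[node])
                                continue;

                        page = cma_alloc(hugetlb_cma[node], nr_pages,
                                         huge_page_order(h), true);
                        if (page)
                                return page;
                }
        }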
     

01 Sep, 2020

1 commit

  • CMA_MAX_NAME should be visible to CMA's users as they might need it to
    set the name of CMA areas and avoid hardcoding the size locally.
    So this patch moves CMA_MAX_NAME from the local header file to the
    include/linux header file and removes the hardcoded values in both
    hugetlb.c and contiguous.c.

    Signed-off-by: Barry Song
    Signed-off-by: Christoph Hellwig

    Barry Song
     

13 Aug, 2020

6 commits

  • new_non_cma_page() in gup.c needs to allocate a new page that is not in
    a CMA area. new_non_cma_page() implements this by using the allocation
    scope APIs.

    However, there is a work-around for hugetlb. The normal hugetlb page
    allocation API for migration is alloc_huge_page_nodemask(). It consists
    of two steps. The first is dequeuing from the pool. The second is, if
    there is no available page on the queue, allocating by using the page
    allocator.

    new_non_cma_page() can't use this API since the first step (dequeue)
    isn't aware of the scope API that excludes CMA areas. So,
    new_non_cma_page() exports the hugetlb-internal function for the second
    step, alloc_migrate_huge_page(), to global scope and uses it directly.
    This is suboptimal since hugetlb pages on the queue cannot be utilized.

    This patch tries to fix this situation by making the hugetlb dequeue
    function CMA aware. In the dequeue function, CMA pages are skipped if
    the PF_MEMALLOC_NOCMA flag is set.

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Acked-by: Mike Kravetz
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: "Aneesh Kumar K . V"
    Cc: Christoph Hellwig
    Cc: Naoya Horiguchi
    Cc: Roman Gushchin
    Link: http://lkml.kernel.org/r/1596180906-8442-2-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
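
    A minimal sketch of the dequeue-side change: pages sitting in a CMA area
    are skipped when the caller runs inside the PF_MEMALLOC_NOCMA allocation
    scope. The rest of the free-list walk is elided (an illustration, not a
    verbatim excerpt):

        bool nocma = !!(current->flags & PF_MEMALLOC_NOCMA);

        list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
                /* honour the "no CMA" scope set up via memalloc_nocma_save() */
                if (nocma && is_migrate_cma_page(page))
                        continue;

                if (PageHWPoison(page))
                        continue;

                /* otherwise dequeue the page exactly as before and return it */
        }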
     
  • There is no difference between two migration callback functions,
    alloc_huge_page_node() and alloc_huge_page_nodemask(), except
    __GFP_THISNODE handling. It's redundant to have two almost similar
    functions in order to handle this flag. So, this patch tries to remove
    one by introducing a new argument, gfp_mask, to
    alloc_huge_page_nodemask().

    After introducing the gfp_mask argument, it is the caller's job to
    provide the correct gfp_mask. So, every call site of
    alloc_huge_page_nodemask() is changed to provide gfp_mask.

    Note that it's safe to remove a node id check in alloc_huge_page_node()
    since there is no caller passing NUMA_NO_NODE as a node id.

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Reviewed-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Christoph Hellwig
    Cc: Naoya Horiguchi
    Cc: Roman Gushchin
    Link: http://lkml.kernel.org/r/1594622517-20681-4-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
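
    A minimal sketch of what the unified call looks like from a caller's
    point of view after this change: the caller builds gfp_mask itself and
    adds __GFP_THISNODE where alloc_huge_page_node() used to be called. The
    want_this_node_only condition and the other variables are illustrative
    placeholders, not real kernel identifiers:

        struct hstate *h = page_hstate(compound_head(page));
        gfp_t gfp_mask = htlb_alloc_mask(h);

        if (want_this_node_only)        /* illustrative condition */
                gfp_mask |= __GFP_THISNODE;

        new_page = alloc_huge_page_nodemask(h, preferred_nid, nodemask, gfp_mask);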
     
  • Drop the repeated word "the" in two places.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Reviewed-by: Zi Yan
    Link: http://lkml.kernel.org/r/20200801173822.14973-5-rdunlap@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Once we enable CMA_DEBUGFS, we will get the below errors: directory
    'cma-hugetlb' with parent 'cma' already present.

    We should have different names for different CMA areas.

    Signed-off-by: Barry Song
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Acked-by: Roman Gushchin
    Link: http://lkml.kernel.org/r/20200616223131.33828-3-song.bao.hua@hisilicon.com
    Signed-off-by: Linus Torvalds

    Barry Song
     
  • Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
    synchronization") requires callers of huge_pte_alloc to hold i_mmap_rwsem
    in at least read mode. This is because the explicit locking in
    huge_pmd_share (called by huge_pte_alloc) was removed. When restructuring
    the code, the call to huge_pte_alloc in the else block at the beginning of
    hugetlb_fault was missed.

    Unfortunately, that else clause is exercised when there is no page table
    entry. This will likely lead to a call to huge_pmd_share. If
    huge_pmd_share thinks pmd sharing is possible, it will traverse the
    mapping tree (i_mmap) without holding i_mmap_rwsem. If someone else is
    modifying the tree, bad things such as addressing exceptions or worse
    could happen.

    Simply remove the else clause. It should have been removed previously.
    The code following the else will call huge_pte_alloc with the appropriate
    locking.

    To prevent this type of issue in the future, add routines to assert that
    i_mmap_rwsem is held, and call these routines in huge pmd sharing
    routines.

    Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
    Suggested-by: Matthew Wilcox
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: Andrea Arcangeli
    Cc: "Kirill A.Shutemov"
    Cc: Davidlohr Bueso
    Cc: Prakash Sangappa
    Cc:
    Link: http://lkml.kernel.org/r/e670f327-5cf9-1959-96e4-6dc7cc30d3d5@oracle.com
    Signed-off-by: Linus Torvalds

    Mike Kravetz
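
    The assertion routines referred to above boil down to lockdep
    annotations on the mapping's rwsem; a minimal sketch (these helpers live
    in include/linux/fs.h in kernels of that era, but treat the exact form
    here as an approximation):

        static inline void i_mmap_assert_locked(struct address_space *mapping)
        {
                lockdep_assert_held(&mapping->i_mmap_rwsem);
        }

        static inline void i_mmap_assert_write_locked(struct address_space *mapping)
        {
                lockdep_assert_held_write(&mapping->i_mmap_rwsem);
        }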
     
  • In the reservation routine, we only check whether the cpuset meets the
    memory allocation requirements, but we ignore the mempolicy of the
    MPOL_BIND case. So an mmap of hugetlb memory may succeed while the
    subsequent memory allocation fails due to mempolicy restrictions, and
    the task receives the SIGBUS signal. This can be reproduced by the
    following steps.

    1) Compile the test case.
    cd tools/testing/selftests/vm/
    gcc map_hugetlb.c -o map_hugetlb

    2) Pre-allocate huge pages. Suppose there are 2 numa nodes in the
    system. Each node will pre-allocate one huge page.
    echo 2 > /proc/sys/vm/nr_hugepages

    3) Run the test case (mmap 4MB). We receive the SIGBUS signal.
    numactl --membind=0 ./map_hugetlb 4

    With this patch applied, the mmap will fail in the step 3) and throw
    "mmap: Cannot allocate memory".

    [akpm@linux-foundation.org: include sched.h for `current']

    Reported-by: Jianchao Guo
    Suggested-by: Michal Hocko
    Signed-off-by: Muchun Song
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Baoquan He
    Link: http://lkml.kernel.org/r/20200728034938.14993-1-songmuchun@bytedance.com
    Signed-off-by: Linus Torvalds

    Muchun Song
     

08 Aug, 2020

2 commits

  • This was found by code observation only.

    Firstly, the worst case scenario should assume the whole range was
    covered by pmd sharing. The old algorithm might not work as expected for
    ranges like (1g-2m, 1g+2m): it would only adjust the range to
    (0, 1g+2m), while the expected range is (0, 2g).

    While at it, remove the loop since it should not be required. With that,
    the new code should also be faster when the invalidating range is huge.

    Mike said:

    : With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only
    : adjust to (0, 1g+2m) which is incorrect.
    :
    : We should cc stable. The original reason for adjusting the range was to
    : prevent data corruption (getting wrong page). Since the range is not
    : always adjusted correctly, the potential for corruption still exists.
    :
    : However, I am fairly confident that adjust_range_if_pmd_sharing_possible
    : is only going to be called in two cases:
    :
    : 1) for a single page
    : 2) for range == entire vma
    :
    : In those cases, the current code should produce the correct results.
    :
    : To be safe, let's just cc stable.

    Fixes: 017b1660df89 ("mm: migration: fix migration of huge PMD shared pages")
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Cc: Andrea Arcangeli
    Cc: Matthew Wilcox
    Cc:
    Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
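
    A minimal sketch of the worst-case rounding described above: when pmd
    sharing is possible, extend both ends of the invalidation range out to
    PUD_SIZE boundaries within the vma (a sketch of the post-fix
    adjust_range_if_pmd_sharing_possible(), with details treated as
    assumptions):

        void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
                                                  unsigned long *start, unsigned long *end)
        {
                unsigned long v_start = ALIGN(vma->vm_start, PUD_SIZE);
                unsigned long v_end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);

                /* sharing needs a sharable mapping spanning at least one full PUD */
                if (!(vma->vm_flags & VM_MAYSHARE) || v_end <= v_start ||
                    *end <= v_start || *start >= v_end)
                        return;

                /* worst case: assume the whole range could be covered by shared pmds */
                if (*start > v_start)
                        *start = ALIGN_DOWN(*start, PUD_SIZE);
                if (*end < v_end)
                        *end = ALIGN(*end, PUD_SIZE);
        }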
     
  • Patch series "mm: cleanup usage of "

    Most architectures have very similar versions of pXd_alloc_one() and
    pXd_free_one() for intermediate levels of page table. These patches add
    generic versions of these functions in and enable
    use of the generic functions where appropriate.

    In addition, functions declared and defined in headers are
    used mostly by core mm and early mm initialization in arch and there is no
    actual reason to have the included all over the place.
    The first patch in this series removes unneeded includes of

    In the end it didn't work out as neatly as I hoped and moving
    pXd_alloc_track() definitions to would require
    unnecessary changes to arches that have custom page table allocations, so
    I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
    to mm/.

    This patch (of 8):

    In most cases header is required only for allocations of
    page table memory. Most of the .c files that include that header do not
    use symbols declared in and do not require that header.

    As for the other header files that used to include , it is
    possible to move that include into the .c file that actually uses symbols
    from and drop the include from the header file.

    The process was somewhat automated using

    sed -i -E '/[
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Acked-by: Geert Uytterhoeven [m68k]
    Cc: Abdul Haleem
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Christophe Leroy
    Cc: Joerg Roedel
    Cc: Max Filippov
    Cc: Peter Zijlstra
    Cc: Satheesh Rajendran
    Cc: Stafford Horne
    Cc: Stephen Rothwell
    Cc: Steven Rostedt
    Cc: Joerg Roedel
    Cc: Matthew Wilcox
    Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

25 Jul, 2020

1 commit

  • hugetlb_cma[0] can be NULL for various reasons, for example when node 0
    has no memory. So a NULL hugetlb_cma[0] doesn't necessarily mean CMA is
    not enabled; gigantic pages might have been reserved on other nodes.
    This patch fixes a possible double reservation and CMA leak.

    [akpm@linux-foundation.org: fix CONFIG_CMA=n warning]
    [sfr@canb.auug.org.au: better checks before using hugetlb_cma]
    Link: http://lkml.kernel.org/r/20200721205716.6dbaa56b@canb.auug.org.au

    Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
    Signed-off-by: Barry Song
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Acked-by: Roman Gushchin
    Cc: Jonathan Cameron
    Cc:
    Link: http://lkml.kernel.org/r/20200710005726.36068-1-song.bao.hua@hisilicon.com
    Signed-off-by: Linus Torvalds

    Barry Song
     

04 Jul, 2020

1 commit

  • The routine hpage_nr_pages() was incorrectly used to calculate the number
    of base pages in a hugetlb page. hpage_nr_pages is designed to be called
    for THP pages and will return HPAGE_PMD_NR for hugetlb pages of any size.

    Due to the context in which hpage_nr_pages was called, it is unlikely to
    produce a user visible error. The routine with the incorrect call is only
    exercised in the case of hugetlb memory error or migration. In addition,
    this would need to be on an architecture which supports huge page sizes
    less than PMD_SIZE. And, the vma containing the huge page would also need
    to be smaller than PMD_SIZE.

    Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
    Reported-by: Matthew Wilcox (Oracle)
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Reviewed-by: Matthew Wilcox (Oracle)
    Cc: Michal Hocko
    Cc: "Kirill A . Shutemov"
    Cc:
    Link: http://lkml.kernel.org/r/20200629185003.97202-1-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds

    Mike Kravetz
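
    A minimal sketch of the distinction: for a hugetlb page the base-page
    count must come from its hstate, not from the THP helper (illustrative
    only):

        struct hstate *h = page_hstate(hpage);

        /* correct for any hugetlb size, including sizes smaller than PMD_SIZE */
        unsigned long nr_base_pages = pages_per_huge_page(h);

        /*
         * hpage_nr_pages(hpage) would report HPAGE_PMD_NR here, which is only
         * meaningful for THP and wrong for non-PMD-sized hugetlb pages.
         */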
     

10 Jun, 2020

2 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definition of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
    return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
    return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

    These definitions can be shared among 90% of the arches provided
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version there is always
    possibility to override the generic version with the usual ifdefs magic.

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
    functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
    in the files that include <linux/mm.h>.

    The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <asm/pgtable.h>") ; do
            sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

05 Jun, 2020

1 commit

  • [akpm@linux-foundation.org: coding style fixes]
    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Ralph Campbell
    Link: http://lkml.kernel.org/r/20200410163714.14085-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     

04 Jun, 2020

4 commits

  • Merge more updates from Andrew Morton:
    "More mm/ work, plenty more to come

    Subsystems affected by this patch series: slub, memcg, gup, kasan,
    pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
    thp, mmap, kconfig"

    * akpm: (131 commits)
    arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
    x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
    riscv: support DEBUG_WX
    mm: add DEBUG_WX support
    drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
    mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
    powerpc/mm: drop platform defined pmd_mknotpresent()
    mm: thp: don't need to drain lru cache when splitting and mlocking THP
    hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
    sparc32: register memory occupied by kernel as memblock.memory
    include/linux/memblock.h: fix minor typo and unclear comment
    mm, mempolicy: fix up gup usage in lookup_node
    tools/vm/page_owner_sort.c: filter out unneeded line
    mm: swap: memcg: fix memcg stats for huge pages
    mm: swap: fix vmstats for huge pages
    mm: vmscan: limit the range of LRU type balancing
    mm: vmscan: reclaim writepage is IO cost
    mm: vmscan: determine anon/file pressure balance at the reclaim root
    mm: balance LRU lists based on relative thrashing
    mm: only count actual rotations as LRU reclaim cost
    ...

    Linus Torvalds
     
  • When huge_pte_offset() is called, the parameter sz can only be PUD_SIZE or
    PMD_SIZE. If sz is PUD_SIZE and code can reach pud, then *pud must be
    none, or normal hugetlb entry, or non-present (migration or hwpoisoned)
    hugetlb entry, and we can directly return pud. When sz is PMD_SIZE, pud
    must be none or present, and if code can reach pmd, we can directly return
    pmd.

    So after this patch the code is simplified by first checking the
    parameter sz, which avoids the unnecessary checks in the current code.
    The semantics of the existing code are maintained.

    More details about relevant commits:
    commit 9b19df292c66 ("mm/hugetlb.c: make huge_pte_offset() consistent
    and document behaviour") changed the code path for pud and pmd handling,
    see comments about why this patch intends to change it.
    ...
    pud = pud_offset(p4d, addr);
    if (sz != PUD_SIZE && pud_none(*pud))        // [1]
            return NULL;
    /* hugepage or swap? */
    if (pud_huge(*pud) || !pud_present(*pud))    // [2]
            return (pte_t *)pud;

    pmd = pmd_offset(pud, addr);
    if (sz != PMD_SIZE && pmd_none(*pmd))        // [3]
            return NULL;
    /* hugepage or swap? */
    if (pmd_huge(*pmd) || !pmd_present(*pmd))    // [4]
            return (pte_t *)pmd;

    return NULL;                                 // [5]
    ...
    [1]: this is necessary, return NULL for sz == PMD_SIZE;
    [2]: if sz == PUD_SIZE, all valid values of pud entry will cause return;
    [3]: dead code, sz != PMD_SIZE never true;
    [4]: all valid values of pmd entry will cause return;
    [5]: dead code, because of check in [4].

    Now, this patch combines [1] and [2] for pud, and combines [3], [4] and
    [5] for pmd, so avoid unnecessary checks.

    I don't try to catch any invalid values in the page table entry, as those
    will be checked by the caller, and this avoids an extra branch in this
    function. Also there is no assert that sz must equal PUD_SIZE or
    PMD_SIZE, since this function is only called for hugetlb mappings.

    For commit 3c1d7e6ccb64 ("mm/hugetlb: fix a addressing exception caused by
    huge_pte_offset"), since we don't read the entry more than once now,
    variable pud_entry and pmd_entry are not needed.

    Signed-off-by: Li Xinhai
    Signed-off-by: Andrew Morton
    Cc: Mike Kravetz
    Cc: Jason Gunthorpe
    Cc: Punit Agrawal
    Cc: Longpeng
    Link: http://lkml.kernel.org/r/1587794313-16849-1-git-send-email-lixinhai.lxh@gmail.com
    Signed-off-by: Linus Torvalds

    Li Xinhai
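
    For comparison with the annotated old code quoted above, the simplified
    tail of huge_pte_offset() after this change looks roughly like the
    following (a sketch, not a verbatim excerpt):

        pud = pud_offset(p4d, addr);
        if (sz == PUD_SIZE)
                /* must be pud huge, non-present or none */
                return (pte_t *)pud;
        if (!pud_present(*pud))
                return NULL;
        /* must have a valid entry and size to go further */

        pmd = pmd_offset(pud, addr);
        /* must be pmd huge, non-present or none */
        return (pte_t *)pmd;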
     
  • Previously, a check for hugepages_supported was added before processing
    hugetlb command line parameters. On some architectures such as powerpc,
    hugepages_supported() is not set to true until after command line
    processing. Therefore, no hugetlb command line parameters would be
    accepted.

    Remove the additional checks for hugepages_supported. In hugetlb_init,
    print a warning if !hugepages_supported and command line parameters were
    specified.

    Reported-by: Sandipan Das
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/b1f04f9f-fa46-c2a0-7693-4a0679d2a1ee@oracle.com
    Signed-off-by: Linus Torvalds

    Mike Kravetz
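
    A minimal sketch of the warning added to hugetlb_init(); the variable
    names mirror mm/hugetlb.c of that era and the message text is
    approximate:

        static int __init hugetlb_init(void)
        {
                if (!hugepages_supported()) {
                        if (hugetlb_max_hstate || default_hstate_max_huge_pages)
                                pr_warn("HugeTLB: huge pages not supported, ignoring associated command-line parameters\n");
                        return 0;
                }

                /* ... normal hstate setup continues here ... */
        }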
     
  • With all hugetlb page processing done in a single file, clean up the
    code:

    - Make code match desired semantics
    - Update documentation with semantics
    - Make all warning and error messages start with 'HugeTLB:'.
    - Consistently name command line parsing routines.
    - Warn if !hugepages_supported() and command line parameters have
    been specified.
    - Add comments to code
    - Describe some of the subtle interactions
    - Describe semantics of command line arguments

    This patch also fixes issues with implicitly setting the number of
    gigantic huge pages to preallocate. Previously on X86 command line,

    hugepages=2 default_hugepagesz=1G

    would result in zero 1G pages being preallocated and,

    # grep HugePages_Total /proc/meminfo
    HugePages_Total: 0
    # sysctl -a | grep nr_hugepages
    vm.nr_hugepages = 2
    vm.nr_hugepages_mempolicy = 2
    # cat /proc/sys/vm/nr_hugepages
    2

    After this patch 2 gigantic pages will be preallocated and all the proc,
    sysfs, sysctl and meminfo files will accurately reflect this.

    To address the issue with gigantic pages, a small change in behavior was
    made to command line processing. Previously the command line,

    hugepages=128 default_hugepagesz=2M hugepagesz=2M hugepages=256

    would result in the allocation of 256 2M huge pages. The value 128 would
    be ignored without any warning. After this patch, 128 2M pages will be
    allocated and a warning message will be displayed indicating the value of
    256 is ignored. This change in behavior is required because allocation of
    implicitly specified gigantic pages must be done when the
    default_hugepagesz= is encountered for gigantic pages. Previously the
    code waited until later in the boot process (hugetlb_init), to allocate
    pages of default size. However the bootmem allocator required for
    gigantic allocations is not available at this time.

    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Tested-by: Sandipan Das
    Acked-by: Gerald Schaefer [s390]
    Acked-by: Will Deacon
    Cc: Albert Ou
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Christophe Leroy
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Jonathan Corbet
    Cc: Longpeng
    Cc: Mina Almasry
    Cc: Nitesh Narayan Lal
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Xu
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Anders Roxell
    Cc: "Aneesh Kumar K.V"
    Cc: Qian Cai
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/20200417185049.275845-5-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds

    Mike Kravetz