01 Jul, 2021

4 commits

  • All the infrastructure is now in place, so introduce the
    nr_free_vmemmap_pages field in the hstate to indicate how many vmemmap
    pages associated with a HugeTLB page can be freed to the buddy
    allocator, and initialize it in hugetlb_vmemmap_init(). This patch is
    the actual enablement of the feature (a sketch of the initialization
    follows at the end of this entry).

    Only (RESERVE_VMEMMAP_SIZE / sizeof(struct page)) struct page structs
    can be used when CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is enabled, so add a
    BUILD_BUG_ON() to catch invalid usage of the tail struct pages.

    Link: https://lkml.kernel.org/r/20210510030027.56044-10-songmuchun@bytedance.com
    Signed-off-by: Muchun Song
    Acked-by: Mike Kravetz
    Reviewed-by: Oscar Salvador
    Reviewed-by: Miaohe Lin
    Tested-by: Chen Huang
    Tested-by: Bodeddula Balasubramaniam
    Cc: Alexander Viro
    Cc: Andy Lutomirski
    Cc: Anshuman Khandual
    Cc: Balbir Singh
    Cc: Barry Song
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: David Rientjes
    Cc: HORIGUCHI NAOYA
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Joao Martins
    Cc: Joerg Roedel
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Mina Almasry
    Cc: Oliver Neukum
    Cc: Paul E. McKenney
    Cc: Pawan Gupta
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Xiongchun Duan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Muchun Song
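
    As an illustration, a minimal kernel-style C sketch of what the
    initialization could look like. Apart from nr_free_vmemmap_pages,
    hugetlb_vmemmap_init(), RESERVE_VMEMMAP_SIZE, __NR_USED_SUBPAGE and
    the BUILD_BUG_ON(), the names used here (RESERVE_VMEMMAP_NR,
    hugetlb_free_vmemmap_enabled, pages_per_huge_page()) are assumptions,
    not a copy of the upstream code:

        /* Sketch only: two vmemmap pages per HugeTLB page stay resident. */
        #define RESERVE_VMEMMAP_NR   2U                 /* assumed constant */
        #define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT)

        void __init hugetlb_vmemmap_init(struct hstate *h)
        {
                unsigned int nr_pages = pages_per_huge_page(h);
                unsigned int vmemmap_pages;

                /* Only struct pages within the reserved vmemmap area may
                 * still be used once the rest has been freed. */
                BUILD_BUG_ON(__NR_USED_SUBPAGE >=
                             RESERVE_VMEMMAP_SIZE / sizeof(struct page));

                if (!hugetlb_free_vmemmap_enabled)      /* assumed switch */
                        return;

                /* vmemmap pages backing this huge page's struct pages. */
                vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;

                /* Everything beyond the reserved pages can go to buddy. */
                if (vmemmap_pages > RESERVE_VMEMMAP_NR)
                        h->nr_free_vmemmap_pages =
                                vmemmap_pages - RESERVE_VMEMMAP_NR;
        }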
     
  • When we free a HugeTLB page to the buddy allocator, we need to allocate
    the vmemmap pages associated with it. However, we may not be able to
    allocate the vmemmap pages when the system is under memory pressure. In
    this case, we just refuse to free the HugeTLB page. This changes behavior
    in some corner cases as listed below:

    1) Failing to free a huge page triggered by the user (decrease
    nr_hugepages).

    The user needs to try again later.

    2) Failing to free a surplus huge page when freed by the application.

    Freeing will be attempted again the next time a huge page is freed.

    3) Failing to dissolve a free huge page on ZONE_MOVABLE via
    offline_pages().

    This can happen when we have plenty of ZONE_MOVABLE memory, but
    not enough kernel memory to allocate vmemmap pages. We may even
    be able to migrate huge page contents, but will not be able to
    dissolve the source huge page. This will prevent an offline
    operation and is unfortunate as memory offlining is expected to
    succeed on movable zones. Users that depend on memory hotplug
    to succeed for movable zones should carefully consider whether the
    memory savings gained from this feature are worth the risk of
    possibly not being able to offline memory in certain situations.

    4) Failing to dissolve a huge page on CMA/ZONE_MOVABLE via
    alloc_contig_range() - once we have that handling in place. Mainly
    affects CMA and virtio-mem.

    Similar to 3). virtio-mem will handle migration errors gracefully. CMA
    might be able to fall back on other free areas within the CMA region.

    Vmemmap pages are allocated from the page-freeing context. In order for
    those allocations not to be disruptive (e.g. by triggering the OOM
    killer), __GFP_NORETRY is used. hugetlb_lock is dropped for the
    allocation because a non-sleeping allocation would be too fragile and
    could fail too easily under memory pressure. GFP_ATOMIC or other modes
    that access memory reserves are not used because we want to prevent
    consuming reserves under heavy hugetlb freeing (a sketch of this path
    follows at the end of this entry).

    [mike.kravetz@oracle.com: fix dissolve_free_huge_page use of tail/head page]
    Link: https://lkml.kernel.org/r/20210527231225.226987-1-mike.kravetz@oracle.com
    [willy@infradead.org: fix alloc_vmemmap_page_list documentation warning]
    Link: https://lkml.kernel.org/r/20210615200242.1716568-6-willy@infradead.org

    Link: https://lkml.kernel.org/r/20210510030027.56044-7-songmuchun@bytedance.com
    Signed-off-by: Muchun Song
    Signed-off-by: Mike Kravetz
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Mike Kravetz
    Reviewed-by: Oscar Salvador
    Cc: Alexander Viro
    Cc: Andy Lutomirski
    Cc: Anshuman Khandual
    Cc: Balbir Singh
    Cc: Barry Song
    Cc: Bodeddula Balasubramaniam
    Cc: Borislav Petkov
    Cc: Chen Huang
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: David Rientjes
    Cc: HORIGUCHI NAOYA
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Joao Martins
    Cc: Joerg Roedel
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: Miaohe Lin
    Cc: Michal Hocko
    Cc: Mina Almasry
    Cc: Oliver Neukum
    Cc: Paul E. McKenney
    Cc: Pawan Gupta
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Xiongchun Duan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Muchun Song
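
    A minimal sketch of the freeing path described above. The helper names
    alloc_huge_page_vmemmap() and add_hugetlb_page() are assumptions used
    for illustration, and the locking is simplified (callers drop
    hugetlb_lock before getting here):

        /* Sketch only: runs with hugetlb_lock *not* held, so the vmemmap
         * allocation below is allowed to sleep. */
        static void __update_and_free_page(struct hstate *h, struct page *page)
        {
                /* GFP_KERNEL | __GFP_NORETRY: may sleep, fails instead of
                 * triggering the OOM killer, never taps memory reserves. */
                if (alloc_huge_page_vmemmap(h, page)) {
                        /* Could not allocate the vmemmap pages: refuse to free
                         * and put the page back on the hugetlb free list. */
                        spin_lock_irq(&hugetlb_lock);
                        add_hugetlb_page(h, page, false);
                        spin_unlock_irq(&hugetlb_lock);
                        return;
                }

                /* Subpage flags would be cleared here, then the huge page is
                 * handed back to the buddy allocator. */
                __free_pages(page, huge_page_order(h));
        }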
     
  • In a subsequent patch, we will allocate the vmemmap pages when freeing
    a HugeTLB page. But update_and_free_page() can be called from any
    context, so we cannot use GFP_KERNEL there to allocate vmemmap pages.
    Instead, we can defer the actual freeing to a kworker so that
    GFP_ATOMIC is not needed to allocate the vmemmap pages (a sketch
    follows at the end of this entry).

    __update_and_free_page() is where the call to allocate vmemmap pages
    will be inserted.

    Link: https://lkml.kernel.org/r/20210510030027.56044-6-songmuchun@bytedance.com
    Signed-off-by: Muchun Song
    Reviewed-by: Mike Kravetz
    Reviewed-by: Oscar Salvador
    Cc: Alexander Viro
    Cc: Andy Lutomirski
    Cc: Anshuman Khandual
    Cc: Balbir Singh
    Cc: Barry Song
    Cc: Bodeddula Balasubramaniam
    Cc: Borislav Petkov
    Cc: Chen Huang
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: David Rientjes
    Cc: HORIGUCHI NAOYA
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Joao Martins
    Cc: Joerg Roedel
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: Miaohe Lin
    Cc: Michal Hocko
    Cc: Mina Almasry
    Cc: Oliver Neukum
    Cc: Paul E. McKenney
    Cc: Pawan Gupta
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Xiongchun Duan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Muchun Song
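
    A minimal sketch of the deferral: pages that cannot be freed in the
    current context are pushed onto a lock-free list and drained from a
    kworker, where sleeping (and therefore GFP_KERNEL) is allowed. The
    list and work-item names are assumptions:

        /* Sketch only: reuse page->mapping as an llist_node while the page
         * is being torn down, since the field is unused at that point. */
        static LLIST_HEAD(hpage_freelist);

        static void free_hpage_workfn(struct work_struct *work)
        {
                struct llist_node *node = llist_del_all(&hpage_freelist);

                while (node) {
                        struct page *page = container_of((void *)node,
                                                         struct page, mapping);

                        node = node->next;
                        page->mapping = NULL;
                        /* Process context: GFP_KERNEL is fine in here. */
                        __update_and_free_page(page_hstate(page), page);
                        cond_resched();
                }
        }
        static DECLARE_WORK(free_hpage_work, free_hpage_workfn);

        static void update_and_free_page(struct hstate *h, struct page *page,
                                         bool atomic)
        {
                if (!atomic) {
                        __update_and_free_page(h, page);
                        return;
                }

                /* Cannot sleep: queue the page and let the kworker free it. */
                if (llist_add((struct llist_node *)&page->mapping,
                              &hpage_freelist))
                        schedule_work(&free_hpage_work);
        }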
     
  • Every HugeTLB page has more than one struct page structure. We
    __know__ that only the first 4 (__NR_USED_SUBPAGE) struct page
    structures are used to store metadata associated with each HugeTLB
    page.

    There are many struct page structures associated with each HugeTLB
    page. For the tail pages, the value of compound_head is the same, so
    we can reuse the first page of the tail page structures: the virtual
    addresses of the remaining pages of tail page structures are remapped
    to that first tail page, and the backing page frames are then freed.
    Therefore, we only need to reserve two pages as vmemmap areas: the
    page holding the head struct page and the first page of tail struct
    pages, which the remaining ones are remapped to.

    When we allocate a HugeTLB page from the buddy allocator, we can free
    some of the vmemmap pages associated with it. It is most appropriate
    to do this in prep_new_huge_page().

    free_vmemmap_pages_per_hpage(), which indicates how many vmemmap pages
    associated with a HugeTLB page can be freed, returns zero for now,
    meaning the feature is disabled. It will be enabled once all the
    infrastructure is in place (a sketch follows at the end of this
    entry).

    [willy@infradead.org: fix documentation warning]
    Link: https://lkml.kernel.org/r/20210615200242.1716568-5-willy@infradead.org

    Link: https://lkml.kernel.org/r/20210510030027.56044-5-songmuchun@bytedance.com
    Signed-off-by: Muchun Song
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Oscar Salvador
    Tested-by: Chen Huang
    Tested-by: Bodeddula Balasubramaniam
    Acked-by: Michal Hocko
    Reviewed-by: Mike Kravetz
    Cc: Alexander Viro
    Cc: Andy Lutomirski
    Cc: Anshuman Khandual
    Cc: Balbir Singh
    Cc: Barry Song
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: David Rientjes
    Cc: HORIGUCHI NAOYA
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Joao Martins
    Cc: Joerg Roedel
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: Miaohe Lin
    Cc: Mina Almasry
    Cc: Oliver Neukum
    Cc: Paul E. McKenney
    Cc: Pawan Gupta
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Xiongchun Duan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Muchun Song
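
    To make the layout concrete, a small sketch with the numbers for a
    2 MB HugeTLB page on x86-64 and the stub that keeps the feature
    disabled for now; the hook into prep_new_huge_page() is shown in
    simplified form and free_huge_page_vmemmap() is assumed:

        /* Sketch only. For a 2 MB HugeTLB page with 4 KB base pages:
         *   512 struct pages * 64 bytes = 32 KB of vmemmap = 8 pages.
         * Two vmemmap pages stay resident (the head struct page plus the
         * first tail page, which the others are remapped to); the other
         * six page frames can be returned to the buddy allocator.
         */
        static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h)
        {
                /* Returns 0 until the rest of the infrastructure lands. */
                return 0;
        }

        static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
        {
                /* A no-op while the stub above returns 0. */
                free_huge_page_vmemmap(h, page);
                /* ... existing initialization of the new huge page ... */
        }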