30 Jun, 2021

3 commits

  • The page reporting order (threshold) is fixed at @pageblock_order by
    default. Page reporting can fail to trigger at all because freeing a
    page rarely produces a free area that large. The situation becomes
    worse when the system memory becomes heavily fragmented.

    For example, the following configuration is used on ARM64 when the 64KB
    base page size is enabled. In this specific case, page reporting won't
    be triggered until freeing a page produces a 512MB free area. That
    condition is hard to meet, especially when the system memory becomes
    heavily fragmented.

    PAGE_SIZE: 64KB
    HPAGE_SIZE: 512MB
    pageblock_order: 13 (512MB)
    MAX_ORDER: 14

    This allows drivers to specify the page reporting order when the page
    reporting device is registered. It falls back to @pageblock_order if the
    driver doesn't specify one. The existing users (hv_balloon and
    virtio_balloon) don't specify it, so @pageblock_order is still taken as
    their page reporting order and this shouldn't introduce any functional
    changes.

    Link: https://lkml.kernel.org/r/20210625014710.42954-4-gshan@redhat.com
    Signed-off-by: Gavin Shan
    Reviewed-by: Alexander Duyck
    Cc: Anshuman Khandual
    Cc: Catalin Marinas
    Cc: David Hildenbrand
    Cc: "Michael S. Tsirkin"
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
  • The macro PAGE_REPORTING_MIN_ORDER is defined as the page reporting
    threshold. It can't be adjusted at runtime.

    This introduces a variable (@page_reporting_order) to replace the macro
    (PAGE_REPORTING_MIN_ORDER). MAX_ORDER is assigned to it initially,
    meaning that page reporting is disabled. The driver sets it if it
    provides a valid order. Otherwise, it falls back to @pageblock_order.
    It's also exported so that the page reporting order can be adjusted at
    runtime.

    Link: https://lkml.kernel.org/r/20210625014710.42954-3-gshan@redhat.com
    Signed-off-by: Gavin Shan
    Suggested-by: David Hildenbrand
    Reviewed-by: Alexander Duyck
    Cc: Anshuman Khandual
    Cc: Catalin Marinas
    Cc: "Michael S. Tsirkin"
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
  • Patch series "mm/page_reporting: Make page reporting work on arm64 with 64KB page size", v4.

    The page reporting threshold is currently equal to @pageblock_order, which
    is 13 (512MB) on arm64 with 64KB base page size selected. Page reporting
    won't be triggered if freeing a page can't produce a free area that
    large. That condition is hard to meet, especially when the system
    memory becomes fragmented.

    This series intends to solve the issue by setting the page reporting
    threshold to 5 (2MB) on arm64 with 64KB base page size. The patches are
    organized as:

    PATCH[1/4] Fixes some coding style in __page_reporting_request().
    PATCH[2/4] Represents the page reporting order with a variable so that
    it can be exported as a module parameter.
    PATCH[3/4] Allows the device driver (e.g. virtio_balloon) to specify
    the page reporting order when the device info is registered.
    PATCH[4/4] Sets the page reporting order to 5, corresponding to
    2MB in size, on ARM64 when 64KB base page size is used.

    This patch (of 4):

    Comment lines should start with one space instead of two. This
    corrects the style.

    Link: https://lkml.kernel.org/r/20210625014710.42954-1-gshan@redhat.com
    Link: https://lkml.kernel.org/r/20210625014710.42954-2-gshan@redhat.com
    Signed-off-by: Gavin Shan
    Reviewed-by: Alexander Duyck
    Cc: David Hildenbrand
    Cc: "Michael S. Tsirkin"
    Cc: Anshuman Khandual
    Cc: Catalin Marinas
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
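
The order and size figures quoted in this series, and the fallback to @pageblock_order, can be checked with a small userspace sketch. This is not the kernel code: the SKETCH_* constants and both helper names are hypothetical, and treating an order of 0 as "not specified by the driver" is an assumption made here for illustration.

```c
#include <assert.h>

/* Values quoted in the commit text for arm64 with 64KB base pages. */
#define SKETCH_PAGE_SHIFT      16   /* PAGE_SIZE = 64KB = 2^16 bytes */
#define SKETCH_PAGEBLOCK_ORDER 13   /* 512MB */
#define SKETCH_MAX_ORDER       14

/* Size in bytes of a free area of the given order. */
static unsigned long long order_to_bytes(unsigned int order)
{
    assert(order < SKETCH_MAX_ORDER);
    return 1ULL << (SKETCH_PAGE_SHIFT + order);
}

/*
 * Hypothetical selection helper: a driver-supplied order is used when it
 * is in range; otherwise the pageblock order is the fallback. Treating 0
 * as "not specified" is an assumption of this sketch.
 */
static unsigned int pick_reporting_order(unsigned int driver_order)
{
    if (driver_order > 0 && driver_order < SKETCH_MAX_ORDER)
        return driver_order;
    return SKETCH_PAGEBLOCK_ORDER;
}
```

With these values, order 13 corresponds to 512MB and order 5 to 2MB, which matches the figures in the series description.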
     

25 Feb, 2021

1 commit


17 Oct, 2020

2 commits

  • The current page_order() can only be called on pages in the buddy
    allocator. For compound pages, you have to use compound_order(). This is
    confusing and led to a bug, so rename page_order() to buddy_order().

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20201001152259.14932-2-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • list_for_each_entry_safe() guarantees that we will never stumble over the
    list head; "&page->lru != list" will always evaluate to true. Let's
    simplify.

    [david@redhat.com: Changelog refinements]

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Reviewed-by: Alexander Duyck
    Link: http://lkml.kernel.org/r/20200818084448.33969-1-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
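
The claim about list_for_each_entry_safe() can be illustrated outside the kernel. The sketch below uses simplified userspace stand-ins for the kernel list helpers (written for this example, relying on the GCC/Clang `__typeof__` extension) and a hypothetical `struct fake_page`. Because the loop condition already stops at the list head, a check such as "&page->lru != list" inside the body is always true.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's list helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_entry_safe(pos, n, head, member)                   \
    for (pos = container_of((head)->next, __typeof__(*pos), member),     \
         n = container_of(pos->member.next, __typeof__(*pos), member);   \
         &pos->member != (head);                                         \
         pos = n, n = container_of(n->member.next, __typeof__(*n), member))

/* Hypothetical element type standing in for struct page. */
struct fake_page { int id; struct list_head lru; };

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Count the entries and record how often the body saw the list head. */
static int count_entries(struct list_head *list, int *saw_head)
{
    struct fake_page *page, *next;
    int count = 0;

    *saw_head = 0;
    list_for_each_entry_safe(page, next, list, lru) {
        if (&page->lru == list)   /* never true inside the loop body */
            (*saw_head)++;
        count++;
    }
    return count;
}
```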
     

08 Apr, 2020

3 commits

  • In order to keep ourselves from reporting pages that are just going to be
    reused again in the case of heavy churn we can put a limit on how many
    total pages we will process per pass. Doing this will allow the worker
    thread to go into idle much more quickly so that we avoid competing with
    other threads that might be allocating or freeing pages.

    The logic added here will limit the worker thread to no more than one
    sixteenth of the total free pages in a given area per list. Once that
    limit is reached it will update the state so that at the end of the pass
    we will reschedule the worker to try again in 2 seconds when the memory
    churn has hopefully settled down.

    Again this optimization doesn't show much of a benefit in the standard
    case as the memory churn is minimal. However with page allocator shuffling
    enabled the gain is quite noticeable. Below are the results with a THP
    enabled version of the will-it-scale page_fault1 test showing the
    improvement in iterations for 16 processes or threads.

    Without:
    tasks   processes    processes_idle   threads      threads_idle
    16      8283274.75   0.17             5594261.00   38.15

    With:
    tasks   processes    processes_idle   threads      threads_idle
    16      8767010.50   0.21             5791312.75   36.98

    Signed-off-by: Alexander Duyck
    Signed-off-by: Andrew Morton
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Konrad Rzeszutek Wilk
    Cc: Luiz Capitulino
    Cc: Matthew Wilcox
    Cc: Michael S. Tsirkin
    Cc: Michal Hocko
    Cc: Nitesh Narayan Lal
    Cc: Oscar Salvador
    Cc: Pankaj Gupta
    Cc: Paolo Bonzini
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Wei Wang
    Cc: Yang Zhang
    Cc: wei qi
    Link: http://lkml.kernel.org/r/20200211224719.29318.72113.stgit@localhost.localdomain
    Signed-off-by: Linus Torvalds

    Alexander Duyck
     
  • Rather than walking over the same pages again and again to get to the
    pages that have yet to be reported we can save ourselves a significant
    amount of time by simply rotating the list so that when we have a full
    list of reported pages the head of the list is pointing to the next
    non-reported page. Doing this should save us some significant time when
    processing each free list.

    This doesn't gain us much in the standard case as all of the non-reported
    pages should be near the top of the list already. However in the case of
    page shuffling this results in a noticeable improvement. Below are the
    will-it-scale page_fault1 w/ THP numbers for 16 tasks with and without
    this patch.

    Without:
    tasks   processes    processes_idle   threads      threads_idle
    16      8093776.25   0.17             5393242.00   38.20

    With:
    tasks   processes    processes_idle   threads      threads_idle
    16      8283274.75   0.17             5594261.00   38.15

    Signed-off-by: Alexander Duyck
    Signed-off-by: Andrew Morton
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Konrad Rzeszutek Wilk
    Cc: Luiz Capitulino
    Cc: Matthew Wilcox
    Cc: Michael S. Tsirkin
    Cc: Michal Hocko
    Cc: Nitesh Narayan Lal
    Cc: Oscar Salvador
    Cc: Pankaj Gupta
    Cc: Paolo Bonzini
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Wei Wang
    Cc: Yang Zhang
    Cc: wei qi
    Link: http://lkml.kernel.org/r/20200211224708.29318.16862.stgit@localhost.localdomain
    Signed-off-by: Linus Torvalds

    Alexander Duyck
     
  • In order to pave the way for free page reporting in virtualized
    environments we will need a way to get pages out of the free lists and
    identify those pages after they have been returned. To accomplish this,
    this patch adds the concept of a Reported Buddy, which is essentially
    meant to just be the Uptodate flag used in conjunction with the Buddy page
    type.

    To prevent the reported pages from leaking outside of the buddy lists I
    added a check to clear the PageReported bit in the del_page_from_free_list
    function. As a result any reported page that is split, merged, or
    allocated will have the flag cleared prior to the PageBuddy value being
    cleared.

    The process for reporting pages is fairly simple. Once we free a page
    that meets the minimum order for page reporting we will schedule a worker
    thread to start 2s or more in the future. That worker thread will begin
    working from the lowest supported page reporting order up to MAX_ORDER - 1
    pulling unreported pages from the free list and storing them in the
    scatterlist.

    When processing each individual free list it is necessary for the worker
    thread to release the zone lock when it needs to stop and report the full
    scatterlist of pages. To reduce the work of the next iteration the worker
    thread will rotate the free list so that the first unreported page in the
    free list becomes the first entry in the list.

    It will then call a reporting function providing information on how many
    entries are in the scatterlist. Once the function completes it will
    return the pages to the free area from which they were allocated and start
    over pulling more pages from the free areas until there are no longer
    enough pages to report on to keep the worker busy, or we have processed as
    many pages as were contained in the free area when we started processing
    the list.

    The worker thread will work in a round-robin fashion making its way through
    each zone requesting reporting, and through each reportable free list
    within that zone. Once all free areas within the zone have been processed
    it will check to see if there have been any requests for reporting while
    it was processing. If so it will reschedule the worker thread to start up
    again in roughly 2s and exit.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Andrew Morton
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Konrad Rzeszutek Wilk
    Cc: Luiz Capitulino
    Cc: Matthew Wilcox
    Cc: Michael S. Tsirkin
    Cc: Michal Hocko
    Cc: Nitesh Narayan Lal
    Cc: Oscar Salvador
    Cc: Pankaj Gupta
    Cc: Paolo Bonzini
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Wei Wang
    Cc: Yang Zhang
    Cc: wei qi
    Link: http://lkml.kernel.org/r/20200211224635.29318.19750.stgit@localhost.localdomain
    Signed-off-by: Linus Torvalds

    Alexander Duyck
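
The per-pass limit described in the first commit of this group (no more than one sixteenth of the free pages in an area per list, with a retry in 2 seconds once the limit is hit) can be sketched as a small userspace model. This is an illustration only and not the kernel implementation: `struct pass_result`, `div_round_up()`, and `run_pass()` are hypothetical names, the model counts pages rather than calls to the reporting device, and rounding the budget up is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define BUDGET_DIVISOR      16   /* "one sixteenth", from the commit text */
#define RETRY_DELAY_SECONDS 2    /* retry delay, from the commit text */

struct pass_result {
    unsigned long processed;    /* pages handled in this pass */
    bool reschedule;            /* true if unreported pages remain */
    unsigned int retry_delay;   /* seconds until the next pass, 0 if none */
};

/* Round-up integer division (assumed rounding mode for the budget). */
static unsigned long div_round_up(unsigned long n, unsigned long d)
{
    return (n + d - 1) / d;
}

/*
 * Model one pass over a single free list holding nr_free pages, of which
 * nr_unreported have not been reported yet.
 */
static struct pass_result run_pass(unsigned long nr_free,
                                   unsigned long nr_unreported)
{
    unsigned long budget = div_round_up(nr_free, BUDGET_DIVISOR);
    struct pass_result res;

    assert(nr_unreported <= nr_free);

    /* Process at most 'budget' pages, then stop. */
    res.processed = nr_unreported < budget ? nr_unreported : budget;

    /* Anything left over is deferred to a later pass. */
    res.reschedule = nr_unreported > res.processed;
    res.retry_delay = res.reschedule ? RETRY_DELAY_SECONDS : 0;
    return res;
}
```

In this model a list with 1600 free, unreported pages has 100 of them processed per pass and asks to be rescheduled, while a list with only 40 unreported pages finishes in a single pass.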