17 Oct, 2020

40 commits

  • Fix kernel-doc notation to use the documented Returns: syntax and place
    the function description for acct_process() on the first line where it
    should be.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Cc: Alexander Viro
    Link: https://lkml.kernel.org/r/b4c33e5d-98e8-0c47-77b6-ac1859f94d7f@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Fix multiple occurrences of duplicated words in kernel/.

    Fix one typo/spello on the same line as a duplicate word. Change one
    instance of "the the" to "that the". Otherwise just drop one of the
    repeated words.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Replace do_brk with do_brk_flags in the comment of prctl_set_mm_map(),
    since do_brk was removed by the commit referenced in the Fixes: tag below.

    Fixes: bb177a732c4369 ("mm: do not bug_on on incorrect length in __mm_populate()")
    Signed-off-by: Liao Pingfang
    Signed-off-by: Yi Wang
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/1600650751-43127-1-git-send-email-wang.yi59@zte.com.cn
    Signed-off-by: Linus Torvalds

    Liao Pingfang
     
  • kernel.h has been used as a dumping ground for all kinds of stuff for a
    long time. Here is an attempt to start cleaning it up by splitting out
    the min()/max() et al. helpers.

    At the same time, convert users in the header and lib folders to use the
    new header. For the time being, though, include the new header back into
    kernel.h to avoid twisted indirect includes for other existing users.
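
    As a rough sketch of what this buys users (the header name
    linux/minmax.h is the one introduced by this series; the call site is
    illustrative):

      /* Before: min()/max() arrived via the kitchen-sink header. */
      #include <linux/kernel.h>

      /* After: they can be pulled in directly, with far fewer dependencies,
       * while kernel.h keeps including the new header for existing users. */
      #include <linux/minmax.h>

      static inline unsigned long cap_len(unsigned long len)
      {
              return min(len, 4096UL);        /* type-checked, as before */
      }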

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Cc: "Rafael J. Wysocki"
    Cc: Steven Rostedt
    Cc: Rasmus Villemoes
    Cc: Joe Perches
    Cc: Linus Torvalds
    Link: https://lkml.kernel.org/r/20200910164152.GA1891694@smile.fi.intel.com
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • Drop duplicated words {the, that} in comments.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Cc: Joel Becker
    Cc: Christoph Hellwig
    Link: https://lkml.kernel.org/r/20200811021826.25032-1-rdunlap@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • The current page_order() can only be called on pages in the buddy
    allocator. For compound pages, you have to use compound_order(). This is
    confusing and led to a bug, so rename page_order() to buddy_order().
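
    To illustrate the distinction driving the rename (helper bodies
    simplified; buddy_order() is the new name, compound_order() is
    pre-existing):

      /* Only valid for free pages sitting in the buddy allocator, where
       * the order is stashed in page->private: */
      static inline unsigned int buddy_order(struct page *page)
      {
              return page_private(page);      /* formerly page_order() */
      }

      /* For compound pages (e.g., THP), compound_order() must be used. */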

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20201001152259.14932-2-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • The early_pfn_valid() macro is defined but it is never used. Remove it.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Acked-by: David Hildenbrand
    Link: https://lkml.kernel.org/r/20200923162915.26935-1-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • In commit 1da177e4c3f4 ("Linux-2.6.12-rc2"), the helper put_write_access()
    came with the atomic_dec operation on the i_writecount field, but it was
    never used in __vma_link_file() and dup_mmap(), which open-code the
    decrement. Use the helper there.
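
    For reference, the helper is tiny - roughly (its real home is
    include/linux/fs.h):

      static inline void put_write_access(struct inode *inode)
      {
              /* pairs with the atomic_inc in get_write_access() */
              atomic_dec(&inode->i_writecount);
      }

    The fix replaces the open-coded atomic_dec(&inode->i_writecount) at the
    two call sites with this helper.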

    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200924115235.5111-1-linmiaohe@huawei.com
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     
  • Fix the following warnings, caused by a mismatch between function
    parameters and comments.

    mm/workingset.c:228: warning: Function parameter or member 'lruvec' not described in 'workingset_age_nonresident'
    mm/workingset.c:228: warning: Excess function parameter 'memcg' description in 'workingset_age_nonresident'

    Signed-off-by: Xiaofei Tan
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/1600485913-11192-1-git-send-email-tanxiaofei@huawei.com
    Signed-off-by: Linus Torvalds

    Xiaofei Tan
     
  • Correct the function name "get_partials" to "get_partial", and update
    the old struct name list3 to kmem_cache_node.

    Signed-off-by: Chen Tao
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Rapoport
    Link: https://lkml.kernel.org/r/Message-ID:
    Signed-off-by: Linus Torvalds

    Chen Tao
     
  • Fix some broken comments including typo, grammar error and wrong function
    name.

    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200913095456.54873-1-linmiaohe@huawei.com
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     
  • Signed-off-by: Yu Zhao
    Signed-off-by: Andrew Morton
    Cc: Alex Shi
    Link: http://lkml.kernel.org/r/20200831175042.3527153-2-yuzhao@google.com
    Signed-off-by: Linus Torvalds

    Yu Zhao
     
  • The #endif at the end of the file matches up with the '#if
    defined(HASHED_PAGE_VIRTUAL)' on line 374, not with the CONFIG_HIGHMEM
    #if earlier.

    Fix the comments on both #endifs to indicate the correct end of block
    for each.

    Signed-off-by: Ira Weiny
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Mike Rapoport
    Link: https://lkml.kernel.org/r/20200819184635.112579-1-ira.weiny@intel.com
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • list_for_each_entry_safe() guarantees that we will never stumble over the
    list head; "&page->lru != list" will always evaluate to true. Let's
    simplify.

    [david@redhat.com: Changelog refinements]
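
    A minimal sketch of the pattern being simplified (variable names
    illustrative):

      struct page *page, *next;

      list_for_each_entry_safe(page, next, list, lru) {
              /* Inside the loop body the cursor is never the list head,
               * so the old guard "if (&page->lru != list)" was dead code
               * and can simply be dropped. */
              list_del(&page->lru);
      }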

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Reviewed-by: Alexander Duyck
    Link: http://lkml.kernel.org/r/20200818084448.33969-1-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • Remove duplicate header which is included twice.

    Signed-off-by: YueHaibing
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200818114323.58156-1-yuehaibing@huawei.com
    Signed-off-by: Linus Torvalds

    YueHaibing
     
  • If we fail to decompress in zram it's a pretty serious problem. We were
    entrusted to be able to decompress the old data but we failed. Either
    we've got some crazy bug in the compression code or we've got memory
    corruption.

    At the moment, when this happens the log looks like this:

    ERR kernel: [ 1833.099861] zram: Decompression failed! err=-22, page=336112
    ERR kernel: [ 1833.099881] zram: Decompression failed! err=-22, page=336112
    ALERT kernel: [ 1833.099886] Read-error on swap-device (253:0:2688896)

    It is true that we have an "ALERT" level log in there, but (at least to
    me) it feels like even this isn't enough to impart the seriousness of this
    error. Let's convert to a WARN_ON. Note that WARN_ON is automatically
    "unlikely" so we can simply replace the old annotation with the new one.

    Signed-off-by: Douglas Anderson
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Sonny Rao
    Cc: Jens Axboe
    Link: https://lkml.kernel.org/r/20200917174059.1.If09c882545dbe432268f7a67a4d4cfcb6caace4f@changeid
    Signed-off-by: Linus Torvalds

    Douglas Anderson
     
  • As we no longer shuffle via generic_online_page() and when undoing
    isolation, we can simplify the comment.

    We now effectively shuffle only once (properly) when onlining new memory.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Wei Yang
    Acked-by: Michal Hocko
    Cc: Alexander Duyck
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Cc: Oscar Salvador
    Cc: Mike Rapoport
    Cc: Pankaj Gupta
    Cc: Haiyang Zhang
    Cc: "K. Y. Srinivasan"
    Cc: Matthew Wilcox
    Cc: Michael Ellerman
    Cc: Scott Cheloha
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20201005121534.15649-6-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • __free_pages_core() is used when exposing fresh memory to the buddy during
    system boot and when onlining memory in generic_online_page().

    generic_online_page() is used in two cases:

    1. Direct memory onlining in online_pages().
    2. Deferred memory onlining in memory-ballooning-like mechanisms (HyperV
    balloon and virtio-mem), when parts of a section are kept
    fake-offline to be fake-onlined later on.

    In 1, we already place pages to the tail of the freelist. Pages will be
    freed to MIGRATE_ISOLATE lists first and moved to the tail of the
    freelists via undo_isolate_page_range().

    In 2, we currently don't implement a proper rule. In case of virtio-mem,
    where we currently always online MAX_ORDER - 1 pages, the pages will be
    placed to the HEAD of the freelist - undesirable. While the Hyper-V
    balloon calls generic_online_page() with single pages, usually it will
    call it on successive single pages in a larger block.

    The pages are fresh, so place them to the tail of the freelist and avoid
    the PCP. In __free_pages_core(), remove the now superfluous call to
    set_page_refcounted() and add a comment regarding page initialization and
    the refcount.

    Note: In 2. we currently don't shuffle. If ever relevant (page shuffling
    is usually of limited use in virtualized environments), we might want to
    shuffle after a sequence of generic_online_page() calls in the relevant
    callers.
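
    Sketched in terms of the fpi_t flags introduced earlier in this series
    (FPI_TO_TAIL is the flag name used by the series; surrounding code
    simplified):

      void __free_pages_core(struct page *page, unsigned int order)
      {
              /* ... page init: tail page refcounts are already zero, only
               * the head page's refcount needs dropping ... */

              /* Fresh pages: bypass the PCP lists, queue at the tail. */
              __free_pages_ok(page, order, FPI_TO_TAIL);
      }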

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Oscar Salvador
    Reviewed-by: Wei Yang
    Acked-by: Pankaj Gupta
    Acked-by: Michal Hocko
    Cc: Alexander Duyck
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: Mike Rapoport
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Matthew Wilcox
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Scott Cheloha
    Link: https://lkml.kernel.org/r/20201005121534.15649-5-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Whenever we move pages between freelists via move_to_free_list()/
    move_freepages_block(), we don't actually touch the pages:
    1. Page isolation doesn't actually touch the pages, it simply isolates
    pageblocks and moves all free pages to the MIGRATE_ISOLATE freelist.
    When undoing isolation, we move the pages back to the target list.
    2. Page stealing (steal_suitable_fallback()) moves free pages directly
    between lists without touching them.
    3. reserve_highatomic_pageblock()/unreserve_highatomic_pageblock() moves
    free pages directly between freelists without touching them.

    We already place pages to the tail of the freelists when undoing isolation
    via __putback_isolated_page(); let's do it in any case.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Reviewed-by: Wei Yang
    Acked-by: Pankaj Gupta
    Acked-by: Michal Hocko
    Cc: Alexander Duyck
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Mike Rapoport
    Cc: Scott Cheloha
    Cc: Michael Ellerman
    Cc: Haiyang Zhang
    Cc: "K. Y. Srinivasan"
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20201005121534.15649-4-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • __putback_isolated_page() already documents that pages will be placed to
    the tail of the freelist - this is, however, not the case for "order >=
    MAX_ORDER - 2" (see buddy_merge_likely()) - which should be the case for
    all existing users.
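
    Roughly, the function becomes (flag names from this series; body
    simplified):

      void __putback_isolated_page(struct page *page, unsigned int order,
                                   int mt)
      {
              /* Return the isolated page directly to the buddy, at the
               * tail of the freelist, and skip free-page reporting. */
              __free_one_page(page, page_to_pfn(page), page_zone(page),
                              order, mt, FPI_SKIP_REPORT_NOTIFY | FPI_TO_TAIL);
      }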

    This change affects two users:
    - free page reporting
    - page isolation, when undoing the isolation (including memory onlining).

    This behavior is desirable for pages that haven't really been touched
    lately, so exactly the two users that don't actually read/write page
    content, but rather move untouched pages.

    The new behavior is especially desirable for memory onlining, where we
    allow allocation of newly onlined pages via undo_isolate_page_range() in
    online_pages(). Right now, we always place them to the head of the
    freelist, resulting in undesirable behavior: Assume we add individual
    memory chunks via add_memory() and online them right away to the NORMAL
    zone. We create a dependency chain of unmovable allocations e.g., via the
    memmap. The memmap of the next chunk will be placed onto previous chunks
    - if the last block cannot get offlined+removed, all dependent ones cannot
    get offlined+removed. While this can already be observed with individual
    DIMMs, it's more of an issue for virtio-mem (and I suspect also ppc
    DLPAR).

    Document that this should only be used for optimizations, and that no code
    should rely on this behavior for correctness (in case the order of the
    freelists ever changes).

    We won't care about page shuffling: memory onlining already properly
    shuffles after onlining. free page reporting doesn't care about
    physically contiguous ranges, and there are already cases where page
    isolation will simply move (physically close) free pages to (currently)
    the head of the freelists via move_freepages_block() instead of shuffling.
    If this becomes ever relevant, we should shuffle the whole zone when
    undoing isolation of larger ranges, and after free_contig_range().

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Alexander Duyck
    Reviewed-by: Oscar Salvador
    Reviewed-by: Wei Yang
    Reviewed-by: Pankaj Gupta
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Mike Rapoport
    Cc: Scott Cheloha
    Cc: Michael Ellerman
    Cc: Haiyang Zhang
    Cc: "K. Y. Srinivasan"
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20201005121534.15649-3-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Patch series "mm: place pages to the freelist tail when onlining and undoing isolation", v2.

    When adding separate memory blocks via add_memory*() and onlining them
    immediately, the metadata (especially the memmap) of the next block will
    be placed onto one of the just added+onlined blocks. This creates a chain
    of unmovable allocations: if the last memory block cannot get
    offlined+removed, neither can any of the blocks that depend on it. We
    directly have unmovable allocations all over the place.

    This can be observed quite easily using virtio-mem, however, it can also
    be observed when using DIMMs. The freshly onlined pages will usually be
    placed to the head of the freelists, meaning they will be allocated next,
    usually turning the just-added memory immediately un-removable. The fresh
    pages are cold, and preferring to allocate others (that might be hot) also
    feels like the natural thing to do.

    It also applies to the Hyper-V balloon, the Xen balloon, and ppc64 dlpar:
    when adding separate, successive memory blocks, each memory block will
    have unmovable allocations on it - for example, gigantic pages will fail
    to allocate.

    While the ZONE_NORMAL doesn't provide any guarantees that memory can get
    offlined+removed again (any kind of fragmentation with unmovable
    allocations is possible), there are many scenarios (hotplugging a lot of
    memory, running a workload, then hotunplugging as much memory as possible)
    where
    we can offline+remove quite a lot with this patchset.

    a) To visualize the problem, a very simple example:

    Start a VM with 4GB and 8GB of virtio-mem memory:

    [root@localhost ~]# lsmem
    RANGE SIZE STATE REMOVABLE BLOCK
    0x0000000000000000-0x00000000bfffffff 3G online yes 0-23
    0x0000000100000000-0x000000033fffffff 9G online yes 32-103

    Memory block size: 128M
    Total online memory: 12G
    Total offline memory: 0B

    Then try to unplug as much as possible using virtio-mem. Observe which
    memory blocks are still around. Without this patch set:

    [root@localhost ~]# lsmem
    RANGE SIZE STATE REMOVABLE BLOCK
    0x0000000000000000-0x00000000bfffffff 3G online yes 0-23
    0x0000000100000000-0x000000013fffffff 1G online yes 32-39
    0x0000000148000000-0x000000014fffffff 128M online yes 41
    0x0000000158000000-0x000000015fffffff 128M online yes 43
    0x0000000168000000-0x000000016fffffff 128M online yes 45
    0x0000000178000000-0x000000017fffffff 128M online yes 47
    0x0000000188000000-0x0000000197ffffff 256M online yes 49-50
    0x00000001a0000000-0x00000001a7ffffff 128M online yes 52
    0x00000001b0000000-0x00000001b7ffffff 128M online yes 54
    0x00000001c0000000-0x00000001c7ffffff 128M online yes 56
    0x00000001d0000000-0x00000001d7ffffff 128M online yes 58
    0x00000001e0000000-0x00000001e7ffffff 128M online yes 60
    0x00000001f0000000-0x00000001f7ffffff 128M online yes 62
    0x0000000200000000-0x0000000207ffffff 128M online yes 64
    0x0000000210000000-0x0000000217ffffff 128M online yes 66
    0x0000000220000000-0x0000000227ffffff 128M online yes 68
    0x0000000230000000-0x0000000237ffffff 128M online yes 70
    0x0000000240000000-0x0000000247ffffff 128M online yes 72
    0x0000000250000000-0x0000000257ffffff 128M online yes 74
    0x0000000260000000-0x0000000267ffffff 128M online yes 76
    0x0000000270000000-0x0000000277ffffff 128M online yes 78
    0x0000000280000000-0x0000000287ffffff 128M online yes 80
    0x0000000290000000-0x0000000297ffffff 128M online yes 82
    0x00000002a0000000-0x00000002a7ffffff 128M online yes 84
    0x00000002b0000000-0x00000002b7ffffff 128M online yes 86
    0x00000002c0000000-0x00000002c7ffffff 128M online yes 88
    0x00000002d0000000-0x00000002d7ffffff 128M online yes 90
    0x00000002e0000000-0x00000002e7ffffff 128M online yes 92
    0x00000002f0000000-0x00000002f7ffffff 128M online yes 94
    0x0000000300000000-0x0000000307ffffff 128M online yes 96
    0x0000000310000000-0x0000000317ffffff 128M online yes 98
    0x0000000320000000-0x0000000327ffffff 128M online yes 100
    0x0000000330000000-0x000000033fffffff 256M online yes 102-103

    Memory block size: 128M
    Total online memory: 8.1G
    Total offline memory: 0B

    With this patch set:

    [root@localhost ~]# lsmem
    RANGE SIZE STATE REMOVABLE BLOCK
    0x0000000000000000-0x00000000bfffffff 3G online yes 0-23
    0x0000000100000000-0x000000013fffffff 1G online yes 32-39

    Memory block size: 128M
    Total online memory: 4G
    Total offline memory: 0B

    All memory can get unplugged and all memory blocks can get removed. Of
    course, no workload ran and the system was basically idle, but it
    highlights the issue - the fairly deterministic chain of unmovable
    allocations. When a huge page for the 2MB memmap is needed, a
    just-onlined 4MB page will be split. The remaining 2MB page will be used
    for the memmap of the next memory block. So one memory block will hold
    the memmap of the two following memory blocks. Finally the pages of the
    last-onlined memory block will get used for the next bigger allocations -
    if any allocation is unmovable, all dependent memory blocks cannot get
    unplugged and removed until that allocation is gone.

    Note that with bigger memory blocks (e.g., 256MB), *all* memory
    blocks are dependent and none can get unplugged again!

    b) Experiment with memory intensive workload

    I performed an experiment with an older version of this patch set (before
    we used undo_isolate_page_range() in online_pages()): Hotplug 56GB to a VM
    with an initial 4GB, onlining all memory to ZONE_NORMAL right from the
    kernel when adding it. I then ran various memory-intensive workloads that
    consumed most system memory for a total of 45 minutes. Once finished, I
    tried to unplug as much memory as possible.

    With this change, I am able to remove via virtio-mem (adding individual
    128MB memory blocks) 413 out of 448 added memory blocks. Via individual
    (256MB) DIMMs 380 out of 448 added memory blocks. (I don't have any
    numbers without this patchset, but looking at the above example, it's at
    most half of the 448 memory blocks for virtio-mem, and most probably none
    for DIMMs).

    Again, there are workloads that might behave very differently due to the
    nature of ZONE_NORMAL.

    This change also affects (besides memory onlining):
    - Other users of undo_isolate_page_range(): Pages are always placed to the
    tail.
    -- When memory offlining fails
    -- When memory isolation fails after having isolated some pageblocks
    -- When alloc_contig_range() either succeeds or fails
    - Other users of __putback_isolated_page(): Pages are always placed to the
    tail.
    -- Free page reporting
    - Other users of __free_pages_core()
    -- AFAIKs, any memory that is getting exposed to the buddy during boot.
    IIUC we will now usually allocate memory from lower addresses within
    a zone first (especially during boot).
    - Other users of generic_online_page()
    -- Hyper-V balloon

    This patch (of 5):

    Let's prepare for additional flags and avoid long parameter lists of
    bools. Follow-up patches will also make use of the flags in
    __free_pages_ok().
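
    A sketch of the flag type this patch introduces (FPI_NONE and
    FPI_SKIP_REPORT_NOTIFY are the names used by the series):

      /* Free Page Internal flags, private to mm/page_alloc.c: */
      typedef int __bitwise fpi_t;

      #define FPI_NONE                ((__force fpi_t)0)

      /* Skip the free-page-reporting notification for this free. */
      #define FPI_SKIP_REPORT_NOTIFY  ((__force fpi_t)BIT(0))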

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Alexander Duyck
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Oscar Salvador
    Reviewed-by: Wei Yang
    Reviewed-by: Pankaj Gupta
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: Mike Rapoport
    Cc: Matthew Wilcox
    Cc: Haiyang Zhang
    Cc: "K. Y. Srinivasan"
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Scott Cheloha
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Michal Hocko
    Link: https://lkml.kernel.org/r/20201005121534.15649-1-david@redhat.com
    Link: https://lkml.kernel.org/r/20201005121534.15649-2-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • At boot time, or when doing memory hot-add operations, if the links in
    sysfs can't be created, the system is still able to run, so just report
    the error in the kernel log rather than BUG_ON and potentially make system
    unusable because the callpath can be called with locks held.

    Since the number of memory blocks managed could be high, the messages are
    rate limited.

    As a consequence, link_mem_sections() has no status to report anymore.

    Signed-off-by: Laurent Dufour
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Acked-by: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: Fenghua Yu
    Cc: Nathan Lynch
    Cc: "Rafael J . Wysocki"
    Cc: Scott Cheloha
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200915094143.79181-4-ldufour@linux.ibm.com
    Signed-off-by: Linus Torvalds

    Laurent Dufour
     
  • "mem" in the name already indicates the root, similar to
    release_mem_region() and devm_request_mem_region(). Make it implicit.
    The only single caller always passes iomem_resource, other parents are not
    applicable.
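
    At call sites the change looks roughly like:

      /* before: the parent had to be spelled out */
      release_mem_region_adjustable(&iomem_resource, start, size);

      /* after: iomem_resource is implied */
      release_mem_region_adjustable(start, size);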

    Suggested-by: Wei Yang
    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Wei Yang
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Kees Cook
    Cc: Ard Biesheuvel
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Link: https://lkml.kernel.org/r/20200916073041.10355-1-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Let's try to merge system ram resources we add, to minimize the number of
    resources in /proc/iomem. We don't care about the boundaries of
    individual chunks we added.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Wei Liu
    Cc: Michal Hocko
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Boris Ostrovsky
    Cc: Christian Borntraeger
    Cc: Dan Williams
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Heiko Carstens
    Cc: Jason Gunthorpe
    Cc: Jason Wang
    Cc: Juergen Gross
    Cc: Julien Grall
    Cc: Kees Cook
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Roger Pau Monné
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Link: https://lkml.kernel.org/r/20200911103459.10306-9-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Let's try to merge system ram resources we add, to minimize the number of
    resources in /proc/iomem. We don't care about the boundaries of
    individual chunks we added.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Juergen Gross
    Cc: Michal Hocko
    Cc: Boris Ostrovsky
    Cc: Stefano Stabellini
    Cc: Roger Pau Monné
    Cc: Julien Grall
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Christian Borntraeger
    Cc: Dan Williams
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Heiko Carstens
    Cc: Jason Gunthorpe
    Cc: Jason Wang
    Cc: Kees Cook
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Stephen Hemminger
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20200911103459.10306-8-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • virtio-mem adds memory in memory block granularity, to be able to remove
    it in the same granularity again later, and to grow slowly on demand.
    This, however, results in quite a lot of resources when adding a lot of
    memory. Resources are effectively stored in a list-based tree. Having a
    lot of resources not only wastes memory, it also makes traversing that
    tree more expensive, and makes /proc/iomem explode in size (e.g.,
    requiring kexec-tools to manually merge resources later when trying
    to create a kdump header).

    Before this patch, we get (/proc/iomem) when hotplugging 2G via virtio-mem
    on x86-64:
    [...]
    100000000-13fffffff : System RAM
    140000000-33fffffff : virtio0
    140000000-147ffffff : System RAM (virtio_mem)
    148000000-14fffffff : System RAM (virtio_mem)
    150000000-157ffffff : System RAM (virtio_mem)
    158000000-15fffffff : System RAM (virtio_mem)
    160000000-167ffffff : System RAM (virtio_mem)
    168000000-16fffffff : System RAM (virtio_mem)
    170000000-177ffffff : System RAM (virtio_mem)
    178000000-17fffffff : System RAM (virtio_mem)
    180000000-187ffffff : System RAM (virtio_mem)
    188000000-18fffffff : System RAM (virtio_mem)
    190000000-197ffffff : System RAM (virtio_mem)
    198000000-19fffffff : System RAM (virtio_mem)
    1a0000000-1a7ffffff : System RAM (virtio_mem)
    1a8000000-1afffffff : System RAM (virtio_mem)
    1b0000000-1b7ffffff : System RAM (virtio_mem)
    1b8000000-1bfffffff : System RAM (virtio_mem)
    3280000000-32ffffffff : PCI Bus 0000:00

    With this patch, we get (/proc/iomem):
    [...]
    fffc0000-ffffffff : Reserved
    100000000-13fffffff : System RAM
    140000000-33fffffff : virtio0
    140000000-1bfffffff : System RAM (virtio_mem)
    3280000000-32ffffffff : PCI Bus 0000:00

    Of course, with more hotplugged memory, it gets worse. When unplugging
    memory blocks again, try_remove_memory() (via offline_and_remove_memory())
    will properly split the resource up again.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Pankaj Gupta
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Michael S. Tsirkin
    Cc: Jason Wang
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Boris Ostrovsky
    Cc: Christian Borntraeger
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Heiko Carstens
    Cc: Jason Gunthorpe
    Cc: Juergen Gross
    Cc: Julien Grall
    Cc: Kees Cook
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Roger Pau Monné
    Cc: Stefano Stabellini
    Cc: Stephen Hemminger
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20200911103459.10306-7-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Some add_memory*() users add memory in small, contiguous memory blocks.
    Examples include virtio-mem, hyper-v balloon, and the XEN balloon.

    This can quickly result in a lot of memory resources, whereby the actual
    resource boundaries are not of interest (e.g., it might be relevant for
    DIMMs, exposed via /proc/iomem to user space). We really want to merge
    added resources in this scenario where possible.

    Let's provide a flag (MEMHP_MERGE_RESOURCE) to specify that a resource
    either created within add_memory*() or passed via add_memory_resource()
    shall be marked mergeable and merged with applicable siblings.

    To implement that, we need a kernel/resource interface to mark selected
    System RAM resources mergeable (IORESOURCE_SYSRAM_MERGEABLE) and trigger
    merging.

    Note: We really want to merge after the whole operation succeeded, not
    directly when adding a resource to the resource tree (it would break
    add_memory_resource() and require splitting resources again when the
    operation failed - e.g., due to -ENOMEM).
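
    A hedged usage sketch (MEMHP_MERGE_RESOURCE and the flags parameter are
    what this series introduces; the call below mirrors a virtio-mem-style
    user):

      rc = add_memory_driver_managed(nid, addr, memory_block_size_bytes(),
                                     "System RAM (virtio_mem)",
                                     MEMHP_MERGE_RESOURCE);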

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Pankaj Gupta
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Kees Cook
    Cc: Ard Biesheuvel
    Cc: Thomas Gleixner
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: Roger Pau Monné
    Cc: Julien Grall
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: Christian Borntraeger
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Heiko Carstens
    Cc: Jason Wang
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Link: https://lkml.kernel.org/r/20200911103459.10306-6-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • We soon want to pass flags, e.g., to mark added System RAM resources
    mergeable. Prepare for that.

    This patch is based on a similar patch by Oscar Salvador:

    https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.de
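
    A sketch of the new type being threaded through (mhp_t and MHP_NONE are
    the names used by this series):

      /* include/linux/memory_hotplug.h */
      typedef int __bitwise mhp_t;

      #define MHP_NONE        ((__force mhp_t)0)  /* no special request */

      int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);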

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Juergen Gross # Xen related part
    Reviewed-by: Pankaj Gupta
    Acked-by: Wei Liu
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Baoquan He
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Rafael J. Wysocki"
    Cc: Len Brown
    Cc: Greg Kroah-Hartman
    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Heiko Carstens
    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: David Hildenbrand
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Boris Ostrovsky
    Cc: Stefano Stabellini
    Cc: "Oliver O'Halloran"
    Cc: Pingfan Liu
    Cc: Nathan Lynch
    Cc: Libor Pechacek
    Cc: Anton Blanchard
    Cc: Leonardo Bras
    Cc: Ard Biesheuvel
    Cc: Eric Biederman
    Cc: Julien Grall
    Cc: Kees Cook
    Cc: Roger Pau Monné
    Cc: Thomas Gleixner
    Cc: Wei Yang
    Link: https://lkml.kernel.org/r/20200911103459.10306-5-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • We soon want to pass flags via a new type to add_memory() and friends.
    That revealed that we currently don't guard some declarations by
    CONFIG_MEMORY_HOTPLUG.

    While some definitions could be moved to different places, let's keep it
    minimal for now and guard with CONFIG_MEMORY_HOTPLUG all functions that
    are only compiled with CONFIG_MEMORY_HOTPLUG.

    Wrap sparse_decode_mem_map() into CONFIG_MEMORY_HOTPLUG, it's only called
    from CONFIG_MEMORY_HOTPLUG code.

    While at it, remove allow_online_pfn_range(), which is no longer around,
    and mhp_notimplemented(), which is unused.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Boris Ostrovsky
    Cc: Christian Borntraeger
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Heiko Carstens
    Cc: Jason Gunthorpe
    Cc: Jason Wang
    Cc: Juergen Gross
    Cc: Julien Grall
    Cc: Kees Cook
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Roger Pau Monné
    Cc: Stefano Stabellini
    Cc: Stephen Hemminger
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20200911103459.10306-4-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
    always set to 0 by hardware. This is far from beautiful (and confusing),
    and the bit only applies to SYSRAM. So let's move it out of the
    bus-specific (PnP) defined bits.

    We'll add another SYSRAM specific bit soon. If we ever need more bits for
    other purposes, we can steal some from "desc", or reshuffle/regroup what
    we have.
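
    Sketched against include/linux/ioport.h (bit names from this series;
    treat the exact values as illustrative):

      #define IORESOURCE_SYSRAM                0x01000000 /* System RAM */
      /* moved out of the PnP-specific bits by this patch: */
      #define IORESOURCE_SYSRAM_DRIVER_MANAGED 0x02000000
      /* added later in this series for selective merging: */
      #define IORESOURCE_SYSRAM_MERGEABLE      0x04000000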

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Kees Cook
    Cc: Ard Biesheuvel
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Eric Biederman
    Cc: Thomas Gleixner
    Cc: Greg Kroah-Hartman
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: Boris Ostrovsky
    Cc: Christian Borntraeger
    Cc: Dave Jiang
    Cc: Haiyang Zhang
    Cc: Heiko Carstens
    Cc: Jason Wang
    Cc: Juergen Gross
    Cc: Julien Grall
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Roger Pau Monné
    Cc: Stefano Stabellini
    Cc: Stephen Hemminger
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20200911103459.10306-3-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Patch series "selective merging of system ram resources", v4.

    Some add_memory*() users add memory in small, contiguous memory blocks.
    Examples include virtio-mem, hyper-v balloon, and the XEN balloon.

    This can quickly result in a lot of memory resources, whereby the actual
    resource boundaries are not of interest (e.g., it might be relevant for
    DIMMs, exposed via /proc/iomem to user space). We really want to merge
    added resources in this scenario where possible.

    Resources are effectively stored in a list-based tree. Having a lot of
    resources not only wastes memory, it also makes traversing that tree more
    expensive, and makes /proc/iomem explode in size (e.g., requiring
    kexec-tools to manually merge resources when creating a kdump header. The
    current kexec-tools resource count limit does not allow for more than
    ~100GB of memory with a memory block size of 128MB on x86-64).

    Let's allow selective merging of system ram resources by specifying a new
    flag for add_memory*(). Patch #5 contains a /proc/iomem example. Only
    tested with virtio-mem.

    This patch (of 8):

    Let's make sure splitting a resource on memory hotunplug will never fail.
    This will become more relevant once we merge selected System RAM resources
    - then, we'll trigger that case more often on memory hotunplug.

    In general, this function is already unlikely to fail. When we remove
    memory, we free up quite a lot of metadata (memmap, page tables, memory
    block device, etc.). The only reason it could really fail would be when
    injecting allocation errors.

    All other error cases inside release_mem_region_adjustable() seem to be
    sanity checks in case the function is abused in a different context -
    let's add WARN_ON_ONCE() in these cases so we can catch them.

    [natechancellor@gmail.com: fix use of ternary condition in release_mem_region_adjustable]
    Link: https://lkml.kernel.org/r/20200922060748.2452056-1-natechancellor@gmail.com
    Link: https://github.com/ClangBuiltLinux/linux/issues/1159

    Signed-off-by: David Hildenbrand
    Signed-off-by: Nathan Chancellor
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Kees Cook
    Cc: Ard Biesheuvel
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: Boris Ostrovsky
    Cc: Christian Borntraeger
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Heiko Carstens
    Cc: Jason Wang
    Cc: Juergen Gross
    Cc: Julien Grall
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Roger Pau Monné
    Cc: Stefano Stabellini
    Cc: Stephen Hemminger
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20200911103459.10306-2-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Currently, it can happen that pages are allocated (and freed) via the
    buddy before we finished basic memory onlining.

    For example, pages are exposed to the buddy and can be allocated before we
    actually mark the sections online. Allocated pages could suddenly fail
    pfn_to_online_page() checks. We had similar issues with pcp handling,
    when pages are allocated+freed before we reach zone_pcp_update() in
    online_pages() [1].

    Instead, mark all pageblocks MIGRATE_ISOLATE, such that allocations are
    impossible. Once done with the heavy lifting, use
    undo_isolate_page_range() to move the pages to the MIGRATE_MOVABLE
    freelist, marking them ready for allocation. Similar to offline_pages(),
    we have to manually adjust zone->nr_isolate_pageblock.

    [1] https://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@codeaurora.org
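
    The reworked flow, condensed (function names as used in
    mm/memory_hotplug.c by this series):

      /* 1. Associate pages with the zone; pageblocks start out isolated,
       *    so nothing can be allocated while onlining is in progress. */
      move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);

      /* ... heavy lifting: mark sections online, expose to the buddy ... */

      /* 2. Un-isolate, moving free pages to the MIGRATE_MOVABLE lists
       *    and adjusting zone->nr_isolate_pageblock as we go. */
      undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE);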

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-11-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • On the memory onlining path, we want to start with MIGRATE_ISOLATE, to
    un-isolate the pages after memory onlining is complete. Let's allow
    passing in the migratetype.
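
    The touched signature, roughly (per this patch):

      void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
                                  unsigned long nr_pages,
                                  struct vmem_altmap *altmap,
                                  int migratetype);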

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: Dan Williams
    Cc: Mike Rapoport
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Michel Lespinasse
    Cc: Charan Teja Reddy
    Cc: Mel Gorman
    Link: https://lkml.kernel.org/r/20200819175957.28465-10-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Commit ac5d2539b238 ("mm: meminit: reduce number of times pageblocks are
    set during struct page init") moved the actual zone range check, leaving
    only the alignment check for pageblocks.

    Let's drop the stale comment and make the pageblock check easier to read.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Mel Gorman
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-9-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • We don't allow offlining memory with holes, all boot memory is online,
    and all hotplugged memory cannot have holes.

    We can now simplify onlining of pages. As we only allow onlining/offlining
    full sections and sections always span full MAX_ORDER_NR_PAGES, we can
    just process MAX_ORDER - 1 pages without further special handling.

    The number of onlined pages simply corresponds to the number of pages we
    were requested to online.

    While at it, refine the comment regarding the callback not exposing all
    pages to the buddy.
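
    The simplified loop, roughly (online_pages_range() per this patch):

      for (pfn = start_pfn; pfn < end_pfn; pfn += MAX_ORDER_NR_PAGES)
              (*online_page_callback)(pfn_to_page(pfn), MAX_ORDER - 1);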

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-8-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Callers no longer need the number of isolated pageblocks. Let's simplify.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-7-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • We make sure that we cannot have any memory holes right at the beginning
    of offline_pages() and we only support onlining/offlining full sections.
    Both sections and pageblocks are a power of two in size, and sections
    always span full pageblocks.

    We can directly calculate the number of isolated pageblocks from nr_pages.
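
    In code this boils down to simple arithmetic (sketch; full-section
    ranges always cover whole pageblocks):

      unsigned long nr_isolate_pageblock = nr_pages / pageblock_nr_pages;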

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-6-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • offline_pages() is the only user. __offline_isolated_pages() never gets
    called with ranges that contain memory holes and we no longer care about
    the return value. Drop the return value handling and all pfn_valid()
    checks.

    Update the documentation.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-5-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • We make sure that we cannot have any memory holes right at the beginning
    of offline_pages(). We no longer need walk_system_ram_range() and can
    call test_pages_isolated() and __offline_isolated_pages() directly.

    offlined_pages always corresponds to nr_pages, so we can simplify that.

    [akpm@linux-foundation.org: patch conflict resolution]

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-4-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Already two people (including me) tried to offline subsections, because
    the function looks like it can deal with it. But we really can only
    online/offline full sections that are properly aligned (e.g., we can only
    mark full sections online/offline via SECTION_IS_ONLINE).

    Add a simple safety net to document the restriction now. Current users
    (core and powernv/memtrace) respect these restrictions.
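
    The safety net, roughly as added to online_pages()/offline_pages()
    (per this patch):

      if (WARN_ON_ONCE(!nr_pages ||
                       !IS_ALIGNED(start_pfn | nr_pages, PAGES_PER_SECTION)))
              return -EINVAL;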

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-3-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand