25 May, 2019

1 commit

  • Introduce interfaces for ballooning enqueueing and dequeueing of a list
    of pages. These interfaces reduce the overhead of storing and restoring
    IRQs by batching the operations. In addition they do not panic if the
    list of pages is empty.

    Cc: Jason Wang
    Cc: linux-mm@kvack.org
    Cc: virtualization@lists.linux-foundation.org
    Acked-by: Michael S. Tsirkin
    Reviewed-by: Xavier Deguillard
    Signed-off-by: Nadav Amit
    Signed-off-by: Greg Kroah-Hartman

    Nadav Amit
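
The batching idea in this commit can be sketched in user space. This is a minimal model, not the kernel code: `struct balloon_model`, `struct page_node`, `model_lock()` and the `model_*_list()` helpers are hypothetical names, and the lock is replaced by a counter that records how many save/restore round trips a caller pays.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel's types. */
struct page_node {
    struct page_node *next;
};

struct balloon_model {
    struct page_node *pages;        /* pages currently held by the balloon */
    size_t nr_pages;
    unsigned long lock_round_trips; /* simulated irqsave/irqrestore pairs */
};

static void model_lock(struct balloon_model *b)   { b->lock_round_trips++; }
static void model_unlock(struct balloon_model *b) { (void)b; }

/*
 * Move every node from *list into the balloon under a single lock round
 * trip. An empty list is not an error: the function simply returns 0.
 */
size_t model_enqueue_list(struct balloon_model *b, struct page_node **list)
{
    size_t n = 0;

    model_lock(b);
    while (*list) {
        struct page_node *p = *list;

        *list = p->next;
        p->next = b->pages;
        b->pages = p;
        b->nr_pages++;
        n++;
    }
    model_unlock(b);
    return n;
}

/* Move up to n_req nodes from the balloon onto *list; returns the count. */
size_t model_dequeue_list(struct balloon_model *b, struct page_node **list,
                          size_t n_req)
{
    size_t n = 0;

    model_lock(b);
    while (n < n_req && b->pages) {
        struct page_node *p = b->pages;

        b->pages = p->next;
        p->next = *list;
        *list = p;
        b->nr_pages--;
        n++;
    }
    model_unlock(b);
    return n;
}
```

Enqueueing three pages through `model_enqueue_list()` costs one round trip in this model, where a per-page interface would cost three.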
     

15 May, 2019

1 commit


06 Mar, 2019

2 commits

  • PG_balloon was introduced to implement page migration/compaction for
    pages inflated in virtio-balloon. Nowadays, it is only a marker that a
    page is part of virtio-balloon and therefore logically offline.

    We also want to make use of this flag in other balloon drivers - for
    inflated pages or when onlining a section but keeping some pages offline
    (e.g. used right now by XEN and Hyper-V via set_online_page_callback()).

    We are going to expose this flag to dump tools like makedumpfile. But
    instead of exposing PG_balloon, let's generalize the concept of marking
    pages as logically offline, so it can be reused for other purposes later
    on.

    Rename PG_balloon to PG_offline. This is an indicator that the page is
    logically offline, the content stale and that it should not be touched
    (e.g. a hypervisor would have to allocate backing storage in order for
    the guest to dump an unused page). We can then e.g. exclude such pages
    from dumps.

    We replace and reuse KPF_BALLOON (23), as this shouldn't really harm
    (and for now the semantics stay the same). In following patches, we
    will make use of this bit also in other balloon drivers. While at it,
    document PGTABLE.

    [akpm@linux-foundation.org: fix comment text, per David]
    Link: http://lkml.kernel.org/r/20181119101616.8901-3-david@redhat.com
    Signed-off-by: David Hildenbrand
    Acked-by: Konstantin Khlebnikov
    Acked-by: Michael S. Tsirkin
    Acked-by: Pankaj gupta
    Cc: Jonathan Corbet
    Cc: Alexey Dobriyan
    Cc: Mike Rapoport
    Cc: Christian Hansen
    Cc: Vlastimil Babka
    Cc: "Kirill A. Shutemov"
    Cc: Stephen Rothwell
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Pavel Tatashin
    Cc: Alexander Duyck
    Cc: Naoya Horiguchi
    Cc: Miles Chen
    Cc: David Rientjes
    Cc: Kazuhito Hagio
    Cc: Arnd Bergmann
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Dave Young
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Juergen Gross
    Cc: Julien Freche
    Cc: Kairui Song
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Lianbo Jiang
    Cc: Michal Hocko
    Cc: Nadav Amit
    Cc: Omar Sandoval
    Cc: Pavel Machek
    Cc: Rafael J. Wysocki
    Cc: "Rafael J. Wysocki"
    Cc: Stefano Stabellini
    Cc: Stephen Hemminger
    Cc: Vitaly Kuznetsov
    Cc: Xavier Deguillard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
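
As an illustration of how a tool might consume the reused flag bit, the sketch below tests bit 23 in 64-bit flag words such as those exposed per page frame by /proc/kpageflags. The bit number comes from the commit text; the macro and function names are hypothetical, and this is not makedumpfile's actual logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit number from the commit text (formerly KPF_BALLOON); name is ours. */
#define MODEL_KPF_OFFLINE_BIT 23

/* True when the flag word marks the page as logically offline. */
bool model_kpf_is_offline(uint64_t kpageflags_word)
{
    return (kpageflags_word >> MODEL_KPF_OFFLINE_BIT) & 1u;
}

/* Count the pages a dump tool would keep if it skipped offline pages. */
size_t model_count_dumpable(const uint64_t *flags, size_t nr_pages)
{
    size_t keep = 0;

    for (size_t i = 0; i < nr_pages; i++)
        if (!model_kpf_is_offline(flags[i]))
            keep++;
    return keep;
}
```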
     
  • Patch series "mm/kdump: allow to exclude pages that are logically
    offline"

    Right now, pages inflated as part of a balloon driver will be dumped by
    dump tools like makedumpfile. While XEN is able to check in the crash
    kernel whether a certain pfn is actually backed by memory in the
    hypervisor (see xen_oldmem_pfn_is_ram) and optimize this case, dumps of
    virtio-balloon, hv-balloon and VMWare balloon inflated memory will
    essentially result in zero pages getting allocated by the hypervisor and
    the dump getting filled with this data.

    The allocation and reading of zero pages can directly be avoided if a
    dumping tool could know which pages only contain stale information not
    to be dumped.

    Also for XEN, calling into the kernel and asking the hypervisor if a pfn
    is backed can be avoided if the dumping tool would skip such pages right
    from the beginning.

    Dumping tools have no idea whether a given page is part of a balloon
    driver and shall not be dumped. Esp. PG_reserved cannot be used for
    that purpose as all memory allocated during early boot is also
    PG_reserved, see discussion at [1]. So some other way of indication is
    required and a new page flag is frowned upon.

    We have PG_balloon (MAPCOUNT value), which is essentially unused now. I
    suggest renaming it to something more generic (PG_offline) to mark pages
    as logically offline. This flag can then e.g. also be used by
    virtio-mem in the future to mark subsections as offline. Or by other
    code that wants to put pages logically offline (e.g. later maybe
    poisoned pages that shall no longer be used).

    This series converts PG_balloon to PG_offline, allows dumping tools to
    query the value to detect such pages and marks pages in the hv-balloon
    and XEN balloon properly as PG_offline. Note that virtio-balloon
    already set pages to PG_balloon (and now PG_offline).

    Please note that this is also helpful for a problem we were seeing under
    Hyper-V: Dumping logically offline memory (pages kept fake offline while
    onlining a section via online_page_callback) would under some conditions
    result in a kernel panic when dumping them.

    As I have access to neither XEN nor Hyper-V nor VMWare
    installations, this was only tested with the virtio-balloon and pages
    were properly skipped when dumping. I'll also attach the makedumpfile
    patch to this series.

    [1] https://lkml.org/lkml/2018/7/20/566

    This patch (of 8):

    Commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page
    feature") reworked balloon handling to make use of the general non-lru
    movable page feature. The big comment block in balloon_compaction.h
    contains quite some outdated information. Let's fix this.

    Link: http://lkml.kernel.org/r/20181119101616.8901-2-david@redhat.com
    Signed-off-by: David Hildenbrand
    Acked-by: Michael S. Tsirkin
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Alexander Duyck
    Cc: Alexey Dobriyan
    Cc: Arnd Bergmann
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Christian Hansen
    Cc: Dave Young
    Cc: David Rientjes
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Jonathan Corbet
    Cc: Juergen Gross
    Cc: Julien Freche
    Cc: Kairui Song
    Cc: Kazuhito Hagio
    Cc: "Kirill A. Shutemov"
    Cc: Konstantin Khlebnikov
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Lianbo Jiang
    Cc: Michal Hocko
    Cc: Mike Rapoport
    Cc: Miles Chen
    Cc: Nadav Amit
    Cc: Naoya Horiguchi
    Cc: Omar Sandoval
    Cc: Pankaj gupta
    Cc: Pavel Machek
    Cc: Pavel Tatashin
    Cc: Rafael J. Wysocki
    Cc: "Rafael J. Wysocki"
    Cc: Stefano Stabellini
    Cc: Stephen Hemminger
    Cc: Stephen Rothwell
    Cc: Vitaly Kuznetsov
    Cc: Vlastimil Babka
    Cc: Xavier Deguillard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

15 Nov, 2017

1 commit

  • fill_balloon doing memory allocations under balloon_lock
    can cause a deadlock when leak_balloon is called from
    virtballoon_oom_notify and tries to take the same lock.

    To fix, split page allocation and enqueue and do allocations outside the lock.

    Here's a detailed analysis of the deadlock by Tetsuo Handa:

    In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
    serialize against fill_balloon(). But in fill_balloon(),
    alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
    called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
    implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, even though __GFP_NORETRY
    is specified, this allocation attempt might indirectly depend on somebody
    else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect
    __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via
    virtballoon_oom_notify() via blocking_notifier_call_chain() callback via
    out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock
    mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it
    will cause OOM lockup.

    Thread1                                   Thread2
    fill_balloon()
      takes a balloon_lock
      balloon_page_enqueue()
        alloc_page(GFP_HIGHUSER_MOVABLE)
          direct reclaim (__GFP_FS context)   takes a fs lock
            waits for that fs lock              alloc_page(GFP_NOFS)
                                                  __alloc_pages_may_oom()
                                                    takes the oom_lock
                                                    out_of_memory()
                                                      blocking_notifier_call_chain()
                                                        leak_balloon()
                                                          tries to take that balloon_lock and deadlocks

    Reported-by: Tetsuo Handa
    Cc: Michal Hocko
    Cc: Wei Wang
    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
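
The shape of the fix (allocate first, lock only for the enqueue) can be sketched in user space. Everything here is a hypothetical model: `struct vb_model`, `model_alloc_page()` and `model_fill_balloon()` are illustrative names, the mutex is a boolean flag, and `malloc()` stands in for `alloc_page()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_BATCH 16

/* Hypothetical stand-in for the driver state; not struct virtio_balloon. */
struct vb_model {
    bool balloon_lock_held;     /* models vb->balloon_lock */
    size_t num_pages;           /* pages accounted to the balloon */
    bool alloc_seen_under_lock; /* set if an allocation ran with the lock held */
};

/* Models alloc_page(); records whether it was called under the lock. */
static void *model_alloc_page(struct vb_model *vb)
{
    if (vb->balloon_lock_held)
        vb->alloc_seen_under_lock = true;
    return malloc(4096);
}

/* Allocate up to 'want' pages without the lock, then enqueue under it. */
size_t model_fill_balloon(struct vb_model *vb, size_t want)
{
    void *pages[MODEL_BATCH];
    size_t got = 0;

    if (want > MODEL_BATCH)
        want = MODEL_BATCH;

    /* Phase 1: allocations may sleep or trigger reclaim; no lock is held. */
    for (size_t i = 0; i < want; i++) {
        void *p = model_alloc_page(vb);

        if (!p)
            break;
        pages[got++] = p;
    }

    /* Phase 2: the lock covers only the bookkeeping. */
    vb->balloon_lock_held = true;
    for (size_t i = 0; i < got; i++) {
        vb->num_pages++;
        free(pages[i]); /* the model keeps only the count, not the memory */
    }
    vb->balloon_lock_held = false;

    return got;
}
```

Because no allocation happens in phase 2, an OOM notifier that needs the same lock can no longer be blocked behind an allocation made by the lock holder, which is the cycle described in the analysis above.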
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

27 Jul, 2016

2 commits

  • Randy reported the build error below.

    > In file included from ../include/linux/balloon_compaction.h:48:0,
    > from ../mm/balloon_compaction.c:11:
    > ../include/linux/compaction.h:237:51: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline int compaction_register_node(struct node *node)
    > ../include/linux/compaction.h:237:51: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
    > ../include/linux/compaction.h:242:54: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline void compaction_unregister_node(struct node *node)
    >

    It was caused by non-lru page migration which needs compaction.h but
    compaction.h doesn't include any header to be standalone.

    I think the proper header for non-lru page migration is migrate.h rather
    than compaction.h, because migrate.h already pulls in, indirectly, the
    definitions non-lru page migration needs, such as isolate_mode_t,
    migrate_mode and MIGRATEPAGE_SUCCESS.

    [akpm@linux-foundation.org: revert mm-balloon-use-general-non-lru-movable-page-feature-fix.patch temp fix]
    Link: http://lkml.kernel.org/r/20160610003304.GE29779@bbox
    Signed-off-by: Minchan Kim
    Reported-by: Randy Dunlap
    Cc: Konstantin Khlebnikov
    Cc: Vlastimil Babka
    Cc: Gioh Kim
    Cc: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
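
The quoted warning arises whenever a struct type is first mentioned inside a parameter list. The sketch below shows the general remedy of a file-scope forward declaration; it is only an illustration of the warning's cause, since the commit itself resolved the problem by switching the include to migrate.h. The function name is a hypothetical stand-in for the stub in compaction.h.

```c
/*
 * Forward declaration at file scope. Without it, the first mention of
 * "struct node" would be inside the parameter list below, giving the type
 * prototype scope only and triggering the warning quoted in the report.
 */
struct node;

/* Hypothetical stand-in for the no-op stub shown in the warning. */
static inline int model_compaction_register_node(struct node *node)
{
    (void)node; /* the stub ignores its argument */
    return 0;
}
```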
     
  • Now, VM has a feature to migrate non-lru movable pages so balloon
    doesn't need custom migration hooks in migrate.c and compaction.c.

    Instead, this patch implements the page->mapping->a_ops->
    {isolate|migrate|putback} functions.

    With that, we could remove hooks for ballooning in general migration
    functions and make balloon compaction simple.

    [akpm@linux-foundation.org: compaction.h requires that the includer first include node.h]
    Link: http://lkml.kernel.org/r/1464736881-24886-4-git-send-email-minchan@kernel.org
    Signed-off-by: Gioh Kim
    Signed-off-by: Minchan Kim
    Acked-by: Vlastimil Babka
    Cc: Rafael Aquini
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
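
The callback arrangement described in this commit can be modelled as a table of three function pointers that a generic migration routine calls without knowing anything about balloons. This is a sketch under stated assumptions: `struct movable_ops`, `struct mpage` and `model_migrate_one()` are hypothetical names, not the kernel's address_space_operations.

```c
#include <stdbool.h>
#include <stddef.h>

struct mpage;

/* Hypothetical model of the {isolate|migrate|putback} callback set. */
struct movable_ops {
    bool (*isolate)(struct mpage *page);
    int  (*migrate)(struct mpage *newpage, struct mpage *page);
    void (*putback)(struct mpage *page);
};

struct mpage {
    const struct movable_ops *ops; /* stands in for page->mapping->a_ops */
    bool isolated;
    int payload;
};

/* Balloon-side callbacks. */
static bool balloon_model_isolate(struct mpage *page)
{
    if (page->isolated)
        return false; /* already taken off the balloon's list */
    page->isolated = true;
    return true;
}

static int balloon_model_migrate(struct mpage *newpage, struct mpage *page)
{
    newpage->ops = page->ops;
    newpage->payload = page->payload;
    newpage->isolated = false;
    return 0;
}

static void balloon_model_putback(struct mpage *page)
{
    page->isolated = false;
}

const struct movable_ops balloon_model_ops = {
    .isolate = balloon_model_isolate,
    .migrate = balloon_model_migrate,
    .putback = balloon_model_putback,
};

/* Generic migration core: contains no balloon-specific branches. */
int model_migrate_one(struct mpage *page, struct mpage *newpage)
{
    if (!page->ops || !page->ops->isolate(page))
        return -1;
    if (page->ops->migrate(newpage, page) != 0) {
        page->ops->putback(page); /* migration failed: return the page */
        return -1;
    }
    return 0;
}
```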
     

10 Oct, 2014

3 commits

  • Always mark pages with PageBalloon even if balloon compaction is disabled
    and expose this mark in /proc/kpageflags as KPF_BALLOON.

    Also this patch adds three counters into /proc/vmstat: "balloon_inflate",
    "balloon_deflate" and "balloon_migrate". They accumulate balloon
    activity. Current size of balloon is (balloon_inflate - balloon_deflate)
    pages.

    All generic balloon code is now gathered under option CONFIG_MEMORY_BALLOON.
    It should be selected by any ballooning driver which wants to use this
    feature. Currently virtio-balloon is the only user.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Rafael Aquini
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
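
The size formula from this commit (balloon_inflate - balloon_deflate) can be applied to text in the "name value" per-line layout of /proc/vmstat. The sketch below is a hypothetical helper, not an existing tool; it parses a string rather than opening the file so that it stays self-contained.

```c
#include <stdio.h>
#include <string.h>

/*
 * Compute the current balloon size in pages from vmstat-style text.
 * Returns 0 on success, -1 if either counter is missing or the counters
 * are inconsistent (deflate larger than inflate).
 */
int model_balloon_size(const char *text, unsigned long long *size_pages)
{
    unsigned long long inflate = 0, deflate = 0, value;
    int have_inflate = 0, have_deflate = 0;
    char name[64];
    const char *line = text;

    while (line && *line) {
        if (sscanf(line, "%63s %llu", name, &value) == 2) {
            if (strcmp(name, "balloon_inflate") == 0) {
                inflate = value;
                have_inflate = 1;
            } else if (strcmp(name, "balloon_deflate") == 0) {
                deflate = value;
                have_deflate = 1;
            }
        }
        line = strchr(line, '\n'); /* advance to the next line, if any */
        if (line)
            line++;
    }

    if (!have_inflate || !have_deflate || deflate > inflate)
        return -1;
    *size_pages = inflate - deflate;
    return 0;
}
```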
     
  • Now ballooned pages are detected using PageBalloon(). Fake mapping is no
    longer required. This patch links ballooned pages to balloon device using
    field page->private instead of page->mapping. Also this patch embeds
    balloon_dev_info directly into struct virtio_balloon.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Rafael Aquini
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
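
Embedding the device info in the driver structure and storing a pointer to it in the page's private field lets the driver recover its own state with pointer arithmetic. The sketch below models that; all type and function names are hypothetical, `priv` stands in for page->private, and `model_container_of` is a simplified version of the usual container_of idiom.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the structures named in the commit text. */
struct balloon_dev_info_model {
    unsigned long isolated_pages;
};

struct virtio_balloon_model {
    unsigned int num_pages;
    struct balloon_dev_info_model vb_dev_info; /* embedded, not a pointer */
};

struct page_model {
    uintptr_t priv; /* stands in for page->private */
};

/* Link a ballooned page to its balloon device. */
void model_link_page(struct page_model *page,
                     struct balloon_dev_info_model *info)
{
    page->priv = (uintptr_t)info;
}

/* Recover the driver structure from a ballooned page. */
struct virtio_balloon_model *model_page_to_vb(struct page_model *page)
{
    struct balloon_dev_info_model *info =
        (struct balloon_dev_info_model *)page->priv;

    return model_container_of(info, struct virtio_balloon_model, vb_dev_info);
}
```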
     
  • Sasha Levin reported KASAN splash inside isolate_migratepages_range().
    Problem is in the function __is_movable_balloon_page() which tests
    AS_BALLOON_MAP in page->mapping->flags. This function has no protection
    against anonymous pages. As a result it tried to check address space flags
    inside struct anon_vma.

    Further investigation shows more problems in current implementation:

    * Special branch in __unmap_and_move() never works:
    balloon_page_movable() checks page flags and page_count. In
    __unmap_and_move() page is locked, reference counter is elevated, thus
    balloon_page_movable() always fails. As a result execution goes to the
    normal migration path. virtballoon_migratepage() returns
    MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS,
    move_to_new_page() thinks this is an error code and assigns
    newpage->mapping to NULL. The newly migrated page loses its connection to
    the balloon and all ability for further migration.

    * lru_lock is erroneously required in isolate_migratepages_range() for
    isolating a ballooned page. This function releases lru_lock periodically,
    which makes migration mostly impossible for some pages.

    * balloon_page_dequeue has a tight race with balloon_page_isolate:
    balloon_page_isolate could be executed in parallel with dequeue between
    picking page from list and locking page_lock. Race is rare because they
    use trylock_page() for locking.

    This patch fixes all of them.

    Instead of fake mapping with special flag this patch uses special state of
    page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256. Buddy allocator uses
    PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose. Storing mark
    directly in struct page makes everything safer and easier.

    PagePrivate is used to mark pages present in page list (i.e. not
    isolated, like PageLRU for normal pages). It replaces special rules for
    reference counter and makes balloon migration similar to migration of
    normal pages. This flag is protected by page_lock together with link to
    the balloon device.

    Signed-off-by: Konstantin Khlebnikov
    Reported-by: Sasha Levin
    Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com
    Cc: Rafael Aquini
    Cc: Andrey Ryabinin
    Cc: [3.8+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
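
The marking scheme described in this commit, a reserved value stored in the page's mapcount field, can be modelled as follows. The two constants are taken from the commit text; the struct and helper names are hypothetical, and the convention that -1 means "no mapping" is an assumption of the model.

```c
#include <stdbool.h>

/* Values quoted in the commit text. */
#define MODEL_PAGE_BUDDY_MAPCOUNT_VALUE   (-128)
#define MODEL_PAGE_BALLOON_MAPCOUNT_VALUE (-256)

/* Assumed idle value of the field: the page is not mapped anywhere. */
#define MODEL_MAPCOUNT_UNMAPPED (-1)

/* Hypothetical stand-in for struct page. */
struct marked_page {
    int mapcount; /* stands in for page->_mapcount */
};

/* The mark lives in the page itself, so no mapping is dereferenced. */
bool model_page_is_balloon(const struct marked_page *page)
{
    return page->mapcount == MODEL_PAGE_BALLOON_MAPCOUNT_VALUE;
}

/* Mark a page as ballooned; refuses pages that are mapped or marked. */
bool model_set_page_balloon(struct marked_page *page)
{
    if (page->mapcount != MODEL_MAPCOUNT_UNMAPPED)
        return false;
    page->mapcount = MODEL_PAGE_BALLOON_MAPCOUNT_VALUE;
    return true;
}

/* Remove the mark when the page leaves the balloon. */
void model_clear_page_balloon(struct marked_page *page)
{
    if (model_page_is_balloon(page))
        page->mapcount = MODEL_MAPCOUNT_UNMAPPED;
}
```

Since the test reads only a field of the page, it is safe for any page, including anonymous pages whose mapping pointer does not refer to an address space.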
     

01 Oct, 2013

1 commit

  • Isolated balloon pages can wrongly end up in LRU lists when
    migrate_pages() finishes its round without draining all the isolated
    page list.

    The same issue can happen when reclaim_clean_pages_from_list() tries to
    reclaim pages from an isolated page list, before migration, in the CMA
    path. Such balloon page leak opens a race window against LRU lists
    shrinkers that leads us to the following kernel panic:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
    IP: [] shrink_page_list+0x24e/0x897
    PGD 3cda2067 PUD 3d713067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 0 PID: 340 Comm: kswapd0 Not tainted 3.12.0-rc1-22626-g4367597 #87
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    RIP: shrink_page_list+0x24e/0x897
    RSP: 0000:ffff88003da499b8 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff88003e82bd60 RCX: 00000000000657d5
    RDX: 0000000000000000 RSI: 000000000000031f RDI: ffff88003e82bd40
    RBP: ffff88003da49ab0 R08: 0000000000000001 R09: 0000000081121a45
    R10: ffffffff81121a45 R11: ffff88003c4a9a28 R12: ffff88003e82bd40
    R13: ffff88003da0e800 R14: 0000000000000001 R15: ffff88003da49d58
    FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000067d9000 CR3: 000000003ace5000 CR4: 00000000000407b0
    Call Trace:
    shrink_inactive_list+0x240/0x3de
    shrink_lruvec+0x3e0/0x566
    __shrink_zone+0x94/0x178
    shrink_zone+0x3a/0x82
    balance_pgdat+0x32a/0x4c2
    kswapd+0x2f0/0x372
    kthread+0xa2/0xaa
    ret_from_fork+0x7c/0xb0
    Code: 80 7d 8f 01 48 83 95 68 ff ff ff 00 4c 89 e7 e8 5a 7b 00 00 48 85 c0 49 89 c5 75 08 80 7d 8f 00 74 3e eb 31 48 8b 80 18 01 00 00 8b 74 0d 48 8b 78 30 be 02 00 00 00 ff d2 eb
    RIP [] shrink_page_list+0x24e/0x897
    RSP
    CR2: 0000000000000028
    ---[ end trace 703d2451af6ffbfd ]---
    Kernel panic - not syncing: Fatal exception

    This patch fixes the issue, by assuring the proper tests are made at
    putback_movable_pages() & reclaim_clean_pages_from_list() to avoid
    isolated balloon pages being wrongly reinserted in LRU lists.

    [akpm@linux-foundation.org: clarify awkward comment text]
    Signed-off-by: Rafael Aquini
    Reported-by: Luiz Capitulino
    Tested-by: Luiz Capitulino
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     

12 Dec, 2012

1 commit

  • Memory fragmentation introduced by ballooning might reduce significantly
    the number of 2MB contiguous memory blocks that can be used within a guest,
    thus imposing performance penalties associated with the reduced number of
    transparent huge pages that could be used by the guest workload.

    This patch introduces a common interface that helps a balloon driver
    make its page set movable by compaction, thus allowing the system to
    better leverage compaction efforts for memory defragmentation.

    [akpm@linux-foundation.org: use PAGE_FLAGS_CHECK_AT_PREP, s/__balloon_page_flags/page_flags_cleared/, small cleanups]
    [rientjes@google.com: allow balloon compaction for any system with memory compaction enabled, which is the defconfig]
    Signed-off-by: Rafael Aquini
    Acked-by: Mel Gorman
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Konrad Rzeszutek Wilk
    Cc: Minchan Kim
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
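
The fragmentation argument can be made concrete with a small counting model. It assumes 4 KiB base pages, so one 2MB block spans 512 page frames; the function name and the boolean "pinned" map are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumes 4 KiB base pages: 2 MiB / 4 KiB = 512 frames per block. */
#define MODEL_PAGES_PER_2MB 512

/*
 * pinned[i] is true when page frame i holds a page that cannot be moved.
 * Returns the number of aligned 2MB blocks that contain no pinned page
 * and could therefore back a huge page.
 */
size_t model_count_clean_2mb_blocks(const bool *pinned, size_t nr_pages)
{
    size_t clean = 0;

    for (size_t start = 0; start + MODEL_PAGES_PER_2MB <= nr_pages;
         start += MODEL_PAGES_PER_2MB) {
        bool block_ok = true;

        for (size_t i = 0; i < MODEL_PAGES_PER_2MB; i++) {
            if (pinned[start + i]) {
                block_ok = false;
                break;
            }
        }
        if (block_ok)
            clean++;
    }
    return clean;
}
```

In this model, four pinned pages spread over four blocks leave no clean block, while the same four pages gathered into one block leave three. Making balloon pages movable is what allows compaction to perform that gathering.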