08 Apr, 2020

1 commit

  • Patch series "mm: drop superfluous section checks when onlining/offlining".

    Let's drop some superfluous section checks on the onlining/offlining path.

    This patch (of 3):

    Since commit c5e79ef561b0 ("mm/memory_hotplug.c: don't allow to
    online/offline memory blocks with holes") we have a generic check in
    offline_pages() that disallows offlining memory blocks with holes.

    Memory blocks with missing sections are just another variant of this type
    of block. We can stop checking (and especially storing) present
    sections. A proper error message now explains why offlining failed.

    section_count was initially introduced in commit 07681215975e ("Driver
    core: Add section count to memory_block struct") in order to detect when
    it is okay to remove a memory block. It was used in commit 26bbe7ef6d5c
    ("drivers/base/memory.c: prohibit offlining of memory blocks with missing
    sections") to disallow offlining memory blocks with missing sections. As
    we refactored creation/removal of memory devices and have a proper check
    for holes in place, we can drop the section_count.

    This also removes a leftover comment regarding the mem_sysfs_mutex, which
    was removed in commit 848e19ad3c33 ("drivers/base/memory.c: drop the
    mem_sysfs_mutex").

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Pavel Tatashin
    Cc: Anshuman Khandual
    Link: http://lkml.kernel.org/r/20200127110424.5757-2-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

01 Feb, 2020

2 commits

  • memory_block structure elements 'hw' and 'phys_callback' are not used.
    They were originally added with commit 3947be1969a9 ("[PATCH]
    memory hotplug: sysfs and add/remove functions") but never seem to have
    been used. Just drop them now.

    Link: http://lkml.kernel.org/r/1576728650-13867-1-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual
    Reviewed-by: Dan Williams
    Reviewed-by: David Hildenbrand
    Cc: Michal Hocko
    Cc: Pavel Tatashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     
  • Luckily, we have no users left, so we can get rid of it. Clean up
    set_migratetype_isolate() a little bit.

    Link: http://lkml.kernel.org/r/20191114131911.11783-2-david@redhat.com
    Signed-off-by: David Hildenbrand
    Reviewed-by: Greg Kroah-Hartman
    Acked-by: Michal Hocko
    Cc: "Rafael J. Wysocki"
    Cc: Pavel Tatashin
    Cc: Dan Williams
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Anshuman Khandual
    Cc: Pingfan Liu
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

16 Nov, 2019

1 commit

  • try_offline_node() is pretty much broken right now:

    - The node span is updated when onlining memory, not when adding it. We
    ignore memory that was never onlined. Bad.

    - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
    trigger a kernel panic. Bad for memory that is offline but also bad
    for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
    first PFN of a section might contain garbage.

    - Sections belonging to mixed nodes are not properly considered.

    As memory blocks might belong to multiple nodes, we would have to walk
    all pageblocks (or at least subsections) within present sections.
    However, we don't have a way to identify whether a memmap that is not
    online was initialized (relevant for ZONE_DEVICE). This makes things
    more complicated.

    Luckily, we can piggyback on the node span and the nid stored in memory
    blocks. Currently, the node span is grown when calling
    move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
    removing memory, before calling try_offline_node(). Sysfs links are
    created via link_mem_sections(), e.g., during boot or when adding
    memory.

    If the node still spans memory or if any memory block belongs to the
    nid, we don't set the node offline. As memory blocks that span multiple
    nodes cannot get offlined, the nid stored in memory blocks is reliable
    enough (for such online memory blocks, the node still spans the memory).
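
    The decision rule above can be sketched in plain C. This is only an
    illustration, not the kernel's try_offline_node(): the ex_node and
    ex_memory_block types and ex_may_offline_node() are hypothetical
    stand-ins that keep only the two facts the rule relies on (the node
    span and the nid stored in each memory block).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's node and memory block. */
struct ex_node {
    int nid;
    unsigned long spanned_pages; /* non-zero while the node still spans memory */
};

struct ex_memory_block {
    int nid; /* node id recorded for this memory block */
};

/*
 * A node may be set offline only if it no longer spans any memory and no
 * memory block still belongs to its nid.
 */
static bool ex_may_offline_node(const struct ex_node *node,
                                const struct ex_memory_block *blocks,
                                size_t nr_blocks)
{
    if (node->spanned_pages)
        return false;

    for (size_t i = 0; i < nr_blocks; i++) {
        if (blocks[i].nid == node->nid)
            return false;
    }
    return true;
}
```

    Walking every block is the part that for_each_memory_block() is meant
    to make efficient in the real code; the array walk here merely stands
    in for it.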

    Introduce for_each_memory_block() to efficiently walk all memory blocks.

    Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
    when removing ZONE_DEVICE memory to fix similar issues (access of
    garbage memmaps) - until we have a reliable way to identify whether
    these memmaps were properly initialized. This implies that later, once
    a node has had ZONE_DEVICE memory, we won't be able to set that node
    offline - which should be acceptable.

    Since commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate
    hotadded memory to zones until online") memory that is added is not
    associated with a zone/node (memmap not initialized). The introducing
    commit 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
    already missed that we could have multiple nodes for a section and that
    the zone/node span is updated when onlining pages, not when adding them.

    I tested this by hotplugging two DIMMs to a memory-less and cpu-less
    NUMA node. The node is properly onlined when adding the DIMMs. When
    removing the DIMMs, the node is properly offlined.

    Masayoshi Mizuma reported:

    : Without this patch, memory hotplug fails as panic:
    :
    : BUG: kernel NULL pointer dereference, address: 0000000000000000
    : ...
    : Call Trace:
    : remove_memory_block_devices+0x81/0xc0
    : try_remove_memory+0xb4/0x130
    : __remove_memory+0xa/0x20
    : acpi_memory_device_remove+0x84/0x100
    : acpi_bus_trim+0x57/0x90
    : acpi_bus_trim+0x2e/0x90
    : acpi_device_hotplug+0x2b2/0x4d0
    : acpi_hotplug_work_fn+0x1a/0x30
    : process_one_work+0x171/0x380
    : worker_thread+0x49/0x3f0
    : kthread+0xf8/0x130
    : ret_from_fork+0x35/0x40

    [david@redhat.com: v3]
    Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
    Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
    Fixes: 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
    Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319
    Signed-off-by: David Hildenbrand
    Tested-by: Masayoshi Mizuma
    Cc: Tang Chen
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Keith Busch
    Cc: Jiri Olsa
    Cc: "Peter Zijlstra (Intel)"
    Cc: Jani Nikula
    Cc: Nayna Jain
    Cc: Michal Hocko
    Cc: Oscar Salvador
    Cc: Stephen Rothwell
    Cc: Dan Williams
    Cc: Pavel Tatashin
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

25 Sep, 2019

2 commits

  • Each memory block spans the same amount of sections/pages/bytes. The size
    is determined before the first memory block is created. No need to store
    what we can easily calculate - and the calculations even look simpler now.
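
    A minimal sketch of the kind of calculation this refers to, assuming
    example constants (128 MiB sections and 4 KiB pages, as is common on
    x86-64). The EX_* macros and ex_* helpers are hypothetical names for
    illustration and are not the kernel's own.

```c
#include <stdint.h>

/* Assumed example values; the real ones are architecture-specific. */
#define EX_SECTION_SIZE_BITS 27 /* 2^27 bytes = 128 MiB per section */
#define EX_PAGE_SHIFT        12 /* 2^12 bytes = 4 KiB per page */

/* Number of sections covered by one memory block of the given size. */
static uint64_t ex_sections_per_block(uint64_t block_size_bytes)
{
    return block_size_bytes >> EX_SECTION_SIZE_BITS;
}

/* Number of pages covered by one memory block of the given size. */
static uint64_t ex_pages_per_block(uint64_t block_size_bytes)
{
    return block_size_bytes >> EX_PAGE_SHIFT;
}

/* Id of the memory block that contains the given section number. */
static uint64_t ex_block_id_of_section(uint64_t section_nr,
                                       uint64_t block_size_bytes)
{
    return section_nr / ex_sections_per_block(block_size_bytes);
}
```

    With a 2 GiB block size, for example, each block covers 16 sections,
    so section 35 falls into block 2. Because every block has the same
    size, none of these values has to be stored per block.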

    Michal brought up the idea of variable-sized memory blocks. However, if
    we ever implement something like this, we will need an API compatibility
    switch and reworks at various places (most code assumes a fixed memory
    block size). So let's clean up what we have right now.

    While at it, fix the variable naming in register_mem_sect_under_node() -
    we no longer talk about a single section.

    Link: http://lkml.kernel.org/r/20190809110200.2746-1-david@redhat.com
    Signed-off-by: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Pavel Tatashin
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Oscar Salvador
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Let's validate the memory block size early, when initializing the memory
    device infrastructure. Fail hard in case the value is not suitable.

    As nobody checks the return value of memory_dev_init(), turn it into a
    void function and fail with a panic in all scenarios instead. Otherwise,
    we'll crash later during boot when core/drivers expect that the memory
    device infrastructure (including memory_block_size_bytes()) works as
    expected.
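
    The message does not spell out what makes a block size "suitable". A
    plausible reading, shown here purely as an assumption, is that the
    size must be a power of two and at least as large as one memory
    section. EX_MIN_BLOCK_SIZE and the ex_* helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed minimum: one 128 MiB section (an example value, arch-specific). */
#define EX_MIN_BLOCK_SIZE (1ULL << 27)

/* True if exactly one bit is set. */
static bool ex_is_power_of_2(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Assumed suitability check: power of two and not smaller than a section.
 * In the kernel a failure here would end in a panic; this sketch only
 * reports the result.
 */
static bool ex_block_size_is_valid(uint64_t block_sz)
{
    return ex_is_power_of_2(block_sz) && block_sz >= EX_MIN_BLOCK_SIZE;
}
```

    Doing the check once at initialization means later code can divide and
    shift by the block size without re-validating it.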

    I think long term, we should move the whole memory block size
    configuration (set_memory_block_size_order() and
    memory_block_size_bytes()) into drivers/base/memory.c.

    Link: http://lkml.kernel.org/r/20190806090142.22709-1-david@redhat.com
    Signed-off-by: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Pavel Tatashin
    Cc: Michal Hocko
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

19 Jul, 2019

5 commits

  • No longer needed, let's remove it. Also, drop the "hint" parameter
    completely from "find_memory_block_by_id", as nobody needs it anymore.

    [david@redhat.com: v3]
    Link: http://lkml.kernel.org/r/20190620183139.4352-7-david@redhat.com
    [david@redhat.com: handle zero-length walks]
    Link: http://lkml.kernel.org/r/1c2edc22-afd7-2211-c4c7-40e54e5007e8@redhat.com
    Link: http://lkml.kernel.org/r/20190614100114.311-7-david@redhat.com
    Signed-off-by: David Hildenbrand
    Reviewed-by: Andrew Morton
    Tested-by: Qian Cai
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: David Hildenbrand
    Cc: Stephen Rothwell
    Cc: Pavel Tatashin
    Cc: Andrew Banman
    Cc: Mike Travis
    Cc: Oscar Salvador
    Cc: Michal Hocko
    Cc: Wei Yang
    Cc: Arun KS
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Let's move walk_memory_blocks() to the place where memory block logic
    resides and simplify it. While at it, add a type for the callback
    function.

    Link: http://lkml.kernel.org/r/20190614100114.311-6-david@redhat.com
    Signed-off-by: David Hildenbrand
    Reviewed-by: Andrew Morton
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: David Hildenbrand
    Cc: Stephen Rothwell
    Cc: Pavel Tatashin
    Cc: Andrew Banman
    Cc: Mike Travis
    Cc: Oscar Salvador
    Cc: Michal Hocko
    Cc: Wei Yang
    Cc: Arun KS
    Cc: Qian Cai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Let's factor out removing of memory block devices, which is only
    necessary for memory added via add_memory() and friends that created
    memory block devices. Remove the devices before calling
    arch_remove_memory().

    This finishes factoring out memory block device handling from
    arch_add_memory() and arch_remove_memory().

    Link: http://lkml.kernel.org/r/20190527111152.16324-10-david@redhat.com
    Signed-off-by: David Hildenbrand
    Reviewed-by: Dan Williams
    Acked-by: Michal Hocko
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: David Hildenbrand
    Cc: "mike.travis@hpe.com"
    Cc: Andrew Banman
    Cc: Ingo Molnar
    Cc: Alex Deucher
    Cc: "David S. Miller"
    Cc: Mark Brown
    Cc: Chris Wilson
    Cc: Oscar Salvador
    Cc: Jonathan Cameron
    Cc: Arun KS
    Cc: Mathieu Malaterre
    Cc: Andy Lutomirski
    Cc: Anshuman Khandual
    Cc: Ard Biesheuvel
    Cc: Baoquan He
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Chintan Pandya
    Cc: Christophe Leroy
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: "H. Peter Anvin"
    Cc: Joonsoo Kim
    Cc: Jun Yao
    Cc: "Kirill A. Shutemov"
    Cc: Logan Gunthorpe
    Cc: Mark Rutland
    Cc: Masahiro Yamada
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Nicholas Piggin
    Cc: Oscar Salvador
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Qian Cai
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Robin Murphy
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vasily Gorbik
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Cc: Yu Zhao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Only memory to be added to the buddy and to be onlined/offlined by user
    space using /sys/devices/system/memory/... needs (and should have!)
    memory block devices.

    Factor out creation of memory block devices. Create all devices after
    arch_add_memory() succeeded. We can later drop the want_memblock
    parameter, because it is now effectively stale.

    Only after memory block devices have been added can memory be onlined
    by user space. This implies that memory is not visible to user space
    at all before arch_add_memory() succeeded.

    While at it:
    - use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
    - introduce find_memory_block_by_id() to search via block id
    - use find_memory_block_by_id() in init_memory_block() to catch
      duplicates

    Link: http://lkml.kernel.org/r/20190527111152.16324-8-david@redhat.com
    Signed-off-by: David Hildenbrand
    Reviewed-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: David Hildenbrand
    Cc: "mike.travis@hpe.com"
    Cc: Ingo Molnar
    Cc: Andrew Banman
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Wei Yang
    Cc: Arun KS
    Cc: Mathieu Malaterre
    Cc: Alex Deucher
    Cc: Andy Lutomirski
    Cc: Anshuman Khandual
    Cc: Ard Biesheuvel
    Cc: Baoquan He
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Chintan Pandya
    Cc: Christophe Leroy
    Cc: Chris Wilson
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: "David S. Miller"
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: "H. Peter Anvin"
    Cc: Jonathan Cameron
    Cc: Joonsoo Kim
    Cc: Jun Yao
    Cc: "Kirill A. Shutemov"
    Cc: Logan Gunthorpe
    Cc: Mark Brown
    Cc: Mark Rutland
    Cc: Masahiro Yamada
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Nicholas Piggin
    Cc: Oscar Salvador
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Robin Murphy
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vasily Gorbik
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Cc: Yu Zhao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • We want to improve error handling while adding memory by allowing
    arch_remove_memory() and __remove_pages() to be used even if
    CONFIG_MEMORY_HOTREMOVE is not set, e.g., to implement something like:

    arch_add_memory();
    rc = do_something();
    if (rc) {
        arch_remove_memory();
    }

    We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
    quite some dependencies for memory offlining.

    Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com
    Signed-off-by: David Hildenbrand
    Reviewed-by: Pavel Tatashin
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Michal Hocko
    Cc: David Hildenbrand
    Cc: Oscar Salvador
    Cc: "Kirill A. Shutemov"
    Cc: Alex Deucher
    Cc: "David S. Miller"
    Cc: Mark Brown
    Cc: Chris Wilson
    Cc: Christophe Leroy
    Cc: Nicholas Piggin
    Cc: Vasily Gorbik
    Cc: Rob Herring
    Cc: Masahiro Yamada
    Cc: "mike.travis@hpe.com"
    Cc: Andrew Banman
    Cc: Arun KS
    Cc: Qian Cai
    Cc: Mathieu Malaterre
    Cc: Baoquan He
    Cc: Logan Gunthorpe
    Cc: Anshuman Khandual
    Cc: Ard Biesheuvel
    Cc: Catalin Marinas
    Cc: Chintan Pandya
    Cc: Dan Williams
    Cc: Ingo Molnar
    Cc: Jonathan Cameron
    Cc: Joonsoo Kim
    Cc: Jun Yao
    Cc: Mark Rutland
    Cc: Mike Rapoport
    Cc: Oscar Salvador
    Cc: Robin Murphy
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Yu Zhao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

15 May, 2019

1 commit

  • Failing while removing memory is mostly ignored and cannot really be
    handled. Let's treat errors in unregister_memory_section() in a nice way,
    warning, but continuing.

    Link: http://lkml.kernel.org/r/20190409100148.24703-3-david@redhat.com
    Signed-off-by: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Ingo Molnar
    Cc: Andrew Banman
    Cc: Mike Travis
    Cc: David Hildenbrand
    Cc: Oscar Salvador
    Cc: Michal Hocko
    Cc: Pavel Tatashin
    Cc: Qian Cai
    Cc: Wei Yang
    Cc: Arun KS
    Cc: Mathieu Malaterre
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Christophe Leroy
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Heiko Carstens
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Masahiro Yamada
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Nicholas Piggin
    Cc: Oscar Salvador
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Stefan Agner
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vasily Gorbik
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

21 Jun, 2018

1 commit

  • Add a new function to "adjust" the current fixed UV memory block size
    of 2GB so it can be changed to a different physical boundary. This is
    out of necessity so arch dependent code can accommodate specific BIOS
    requirements which can align these new PMEM modules at less than the
    default boundaries.

    A "set order" type of function was used to ensure that the memory block
    size will be a power of two value without requiring a validity check.
    64GB was chosen as the upper limit for memory block size values to
    accommodate upcoming 4PB systems which have 6 more bits of physical
    address space (46 becoming 52).
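
    To illustrate why an order-based setter needs no power-of-two check,
    here is a hedged sketch. ex_set_block_size_order() and EX_MAX_ORDER
    are hypothetical and only mirror the description above (64GB upper
    limit, i.e. order 36); they are not the kernel's implementation.

```c
#include <stdint.h>

/* 2^36 bytes = 64 GiB, the upper limit named in the commit message. */
#define EX_MAX_ORDER 36

/*
 * Store a block size of 2^order bytes in *block_size.
 * Returns 0 on success and -1 if the order exceeds the upper limit.
 * Any accepted value is a power of two by construction.
 */
static int ex_set_block_size_order(unsigned int order, uint64_t *block_size)
{
    if (order > EX_MAX_ORDER)
        return -1;

    *block_size = 1ULL << order;
    return 0;
}
```

    Order 31 gives the previous fixed size of 2 GiB. The address-space
    figures are consistent as well: 52 bits address 2^52 bytes = 4 PiB,
    which is 2^6 = 64 times what 46 bits can address.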

    Signed-off-by: Mike Travis
    Reviewed-by: Andrew Banman
    Cc: Andrew Morton
    Cc: Dimitri Sivanich
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Russ Anderson
    Cc: Thomas Gleixner
    Cc: dan.j.williams@intel.com
    Cc: jgross@suse.com
    Cc: kirill.shutemov@linux.intel.com
    Cc: mhocko@suse.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/lkml/20180524201711.609546602@stormcage.americas.sgi.com
    Signed-off-by: Ingo Molnar

    mike.travis@hpe.com
     

06 Apr, 2018

2 commits

  • During memory hotplugging we traverse struct pages three times:

    1. memset(0) in sparse_add_one_section()
    2. loop in __add_section() to do: set_page_node(page, nid); and
    SetPageReserved(page);
    3. loop in memmap_init_zone() to call __init_single_pfn()

    This patch removes the first two loops, and leaves only loop 3. All
    struct pages are initialized in one place, the same as it is done during
    boot.

    The benefits:

    - We improve memory hotplug performance because we are not evicting the
    cache several times and also reduce loop branching overhead.

    - Remove condition from hotpath in __init_single_pfn(), that was added
    in order to fix the problem that was reported by Bharata in the above
    email thread, thus also improve performance during normal boot.

    - Make memory hotplug more similar to the boot memory initialization
    path because we zero and initialize struct pages only in one
    function.

    - Simplifies memory hotplug struct page initialization code, and thus
    enables future improvements, such as multi-threading the
    initialization of struct pages in order to improve hotplug
    performance even further on larger machines.

    [pasha.tatashin@oracle.com: v5]
    Link: http://lkml.kernel.org/r/20180228030308.1116-7-pasha.tatashin@oracle.com
    Link: http://lkml.kernel.org/r/20180215165920.8570-7-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Ingo Molnar
    Cc: Michal Hocko
    Cc: Baoquan He
    Cc: Bharata B Rao
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Kirill A. Shutemov
    Cc: Mel Gorman
    Cc: Steven Sistare
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • During memory hotplugging the probe routine will leave struct pages
    uninitialized, the same as it is currently done during boot. Therefore,
    we do not want to access the inside of struct pages before
    __init_single_page() is called during onlining.

    Because during hotplug we know that pages in one memory block belong to
    the same numa node, we can skip the checking. We should keep checking
    for the boot case.

    [pasha.tatashin@oracle.com: s/register_new_memory()/hotplug_memory_register()]
    Link: http://lkml.kernel.org/r/20180228030308.1116-6-pasha.tatashin@oracle.com
    Link: http://lkml.kernel.org/r/20180215165920.8570-6-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Reviewed-by: Ingo Molnar
    Cc: Baoquan He
    Cc: Bharata B Rao
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Kirill A. Shutemov
    Cc: Mel Gorman
    Cc: Steven Sistare
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

25 Feb, 2017

1 commit

  • Commit 31bc3858ea3e ("add automatic onlining policy for the newly added
    memory") provides the capability to have added memory automatically
    onlined during add, but this appears to be slightly broken.

    The current implementation uses walk_memory_range() to call
    online_memory_block, which uses memory_block_change_state() to online
    the memory. Instead, we should be calling device_online() for the
    memory block in online_memory_block(). This would online the memory
    (the memory bus online routine memory_subsys_online() called from
    device_online calls memory_block_change_state()) and properly update the
    device struct offline flag.

    As a result of the current implementation, attempting to remove a memory
    block after adding it using auto online fails. This is because doing a
    remove, for instance

    echo offline > /sys/devices/system/memory/memoryXXX/state

    uses device_offline() which checks the dev->offline flag.

    Link: http://lkml.kernel.org/r/20170222220744.8119.19687.stgit@ltcalpine2-lp14.aus.stglabs.ibm.com
    Signed-off-by: Nathan Fontenot
    Cc: Michael Ellerman
    Cc: Michael Roth
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Fontenot
     

18 Mar, 2016

1 commit

  • Pull char/misc updates from Greg KH:
    "Here is the big char/misc driver update for 4.6-rc1.

    The majority of the patches here is hwtracing and some new mic
    drivers, but there's a lot of other driver updates as well. Full
    details in the shortlog.

    All have been in linux-next for a while with no reported issues"

    * tag 'char-misc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (238 commits)
    goldfish: Fix build error of missing ioremap on UM
    nvmem: mediatek: Fix later provider initialization
    nvmem: imx-ocotp: Fix return value of imx_ocotp_read
    nvmem: Fix dependencies for !HAS_IOMEM archs
    char: genrtc: replace blacklist with whitelist
    drivers/hwtracing: make coresight-etm-perf.c explicitly non-modular
    drivers: char: mem: fix IS_ERROR_VALUE usage
    char: xillybus: Fix internal data structure initialization
    pch_phub: return -ENODATA if ROM can't be mapped
    Drivers: hv: vmbus: Support kexec on ws2012 r2 and above
    Drivers: hv: vmbus: Support handling messages on multiple CPUs
    Drivers: hv: utils: Remove util transport handler from list if registration fails
    Drivers: hv: util: Pass the channel information during the init call
    Drivers: hv: vmbus: avoid unneeded compiler optimizations in vmbus_wait_for_unload()
    Drivers: hv: vmbus: remove code duplication in message handling
    Drivers: hv: vmbus: avoid wait_for_completion() on crash
    Drivers: hv: vmbus: don't loose HVMSG_TIMER_EXPIRED messages
    misc: at24: replace memory_accessor with nvmem_device_read
    eeprom: 93xx46: extend driver to plug into the NVMEM framework
    eeprom: at25: extend driver to plug into the NVMEM framework
    ...

    Linus Torvalds
     

16 Mar, 2016

1 commit

  • Currently, all newly added memory blocks remain in 'offline' state
    unless someone onlines them. Some Linux distributions carry special udev
    rules like:

    SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

    to make this happen automatically. This is not a great solution for
    virtual machines where memory hotplug is being used to address high
    memory pressure situations as such onlining is slow and a userspace
    process doing this (udev) has a chance of being killed by the OOM killer
    as it will probably need to allocate some memory.

    Introduce a default policy for newly added memory blocks in the
    /sys/devices/system/memory/auto_online_blocks file with two possible
    values: "offline" which preserves the current behavior and "online"
    which causes all newly added memory blocks to go online as soon as
    they're added. The default is "offline".

    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Daniel Kiper
    Cc: Jonathan Corbet
    Cc: Greg Kroah-Hartman
    Cc: Daniel Kiper
    Cc: Dan Williams
    Cc: Tang Chen
    Cc: David Vrabel
    Acked-by: David Rientjes
    Cc: Naoya Horiguchi
    Cc: Xishi Qiu
    Cc: Mel Gorman
    Cc: "K. Y. Srinivasan"
    Cc: Igor Mammedov
    Cc: Kay Sievers
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Kuznetsov
     

02 Mar, 2016

1 commit


23 Oct, 2014

1 commit

  • drivers/base/memory.c provides a default memory_block_size_bytes()
    definition explicitly marked "weak". Several architectures provide their
    own definitions intended to override the default, but the "weak" attribute
    on the declaration applied to the arch definitions as well, so the linker
    chose one based on link order (see 10629d711ed7 ("PCI: Remove __weak
    annotation from pcibios_get_phb_of_node decl")).

    Remove the "weak" attribute from the declaration so we always prefer a
    non-weak definition over the weak one, independent of link order.

    Fixes: 41f107266b19 ("drivers: base: Add prototype declaration to the header file")
    Signed-off-by: Bjorn Helgaas
    Acked-by: Andrew Morton
    CC: Rashika Kheria
    CC: Nathan Fontenot
    CC: Anton Blanchard
    CC: Heiko Carstens
    CC: Yinghai Lu

    Bjorn Helgaas
     

21 Dec, 2013

1 commit

  • Add prototype declaration of function memory_block_size_bytes() to
    the header file include/linux/memory.h.

    This eliminates the following warning in memory.c:
    drivers/base/memory.c:87:1: warning: no previous prototype for ‘memory_block_size_bytes’ [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Signed-off-by: Greg Kroah-Hartman

    Rashika Kheria
     

22 Aug, 2013

2 commits

  • There are two ways to set the online/offline state for a memory block:
    echo 0|1 > online and echo online|online_kernel|online_movable|offline >
    state.

    The state attribute can online a memory block with extra data, the
    "online type", where the online attribute uses a default online type of
    ONLINE_KEEP, same as echo online > state.
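
    As a rough sketch of the mapping described above, the strings written
    to the state attribute can be translated into an online type. The
    ex_* names and enum values are hypothetical; only the accepted strings
    and the "online means ONLINE_KEEP" default come from the text. A
    single trailing newline is tolerated because echo appends one.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type for this illustration. */
enum ex_online_type {
    EX_INVALID = -1,
    EX_OFFLINE,
    EX_ONLINE_KEEP,    /* plain "online": keep the default zone choice */
    EX_ONLINE_KERNEL,
    EX_ONLINE_MOVABLE,
};

/* Compare input with word, allowing one trailing newline in input. */
static bool ex_streq(const char *input, const char *word)
{
    size_t n = strlen(word);

    if (strncmp(input, word, n) != 0)
        return false;
    return input[n] == '\0' || (input[n] == '\n' && input[n + 1] == '\0');
}

/* Map a string written to the "state" attribute to an online type. */
static enum ex_online_type ex_parse_state(const char *buf)
{
    if (ex_streq(buf, "online_kernel"))
        return EX_ONLINE_KERNEL;
    if (ex_streq(buf, "online_movable"))
        return EX_ONLINE_MOVABLE;
    if (ex_streq(buf, "online"))
        return EX_ONLINE_KEEP;
    if (ex_streq(buf, "offline"))
        return EX_OFFLINE;
    return EX_INVALID;
}
```

    Writing 1 to the online attribute would correspond to EX_ONLINE_KEEP
    in this sketch, and writing 0 to EX_OFFLINE.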

    Currently there is a state_mutex that provides consistency between the
    memory block state and the underlying memory.

    The problem is that this code does a lot of things that the common
    device layer can do for us, such as the serialization of the
    online/offline handlers using the device lock, setting the dev->offline
    field, and calling kobject_uevent().

    This patch refactors the online/offline code to allow the common
    device_[online|offline] functions to be used. The result is a simpler
    and more common code path for the two state setting mechanisms. It also
    removes the state_mutex from the struct memory_block as the memory block
    device lock provides the state consistency.

    No functional change is intended by this patch.

    Signed-off-by: Seth Jennings
    Signed-off-by: Greg Kroah-Hartman

    Seth Jennings
     
  • Now that add_memory_section() is only called from boot time, reduce
    the logic and remove the enum.

    Signed-off-by: Seth Jennings
    Signed-off-by: Greg Kroah-Hartman

    Seth Jennings
     

01 May, 2013

1 commit

  • Fix the following compilation warnings:

    mm/slab.c: In function `kmem_cache_init_late':
    mm/slab.c:1778:2: warning: statement with no effect [-Wunused-value]

    mm/page_cgroup.c: In function `page_cgroup_init':
    mm/page_cgroup.c:305:2: warning: statement with no effect [-Wunused-value]

    Signed-off-by: Vincent Stehlé
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vincent Stehlé
     

30 Apr, 2013

2 commits

  • __remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC
    pseries will return -EOPNOTSUPP if unsupported.

    Adding an #ifdef causes several other functions it depends on to also
    become unnecessary, which saves in .text when disabled (it's disabled in
    most defconfigs besides powerpc, including x86). remove_memory_block()
    becomes static since it is not referenced outside of
    drivers/base/memory.c.

    Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
    and disabled.

    Signed-off-by: David Rientjes
    Acked-by: Toshi Kani
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Greg Kroah-Hartman
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • When CONFIG_MEMORY_HOTPLUG=n, we don't want the memory-hotplug notifier
    handlers to be included in the .o files, for space reasons.

    The existing hotplug_memory_notifier() tries to handle this but testing
    with gcc-4.4.4 shows that it doesn't work - the hotplug functions are
    still present in the .o files.

    So implement a new register_hotmemory_notifier() which is a copy of
    register_hotcpu_notifier(), and which actually works as desired.
    hotplug_memory_notifier() and register_memory_notifier() callsites
    should be converted to use this new register_hotmemory_notifier().

    While we're there, let's repair the existing hotplug_memory_notifier():
    it simply stomps on the register_memory_notifier() return value, so
    well-behaved code cannot check for errors. Apparently none of the
    existing callers were well-behaved :(

    Cc: Andrew Shewmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

13 Dec, 2012

1 commit

  • Update nodemasks management for N_MEMORY.

    [lliubbo@gmail.com: fix build]
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Wen Congyang
    Cc: Christoph Lameter
    Cc: Hillf Danton
    Cc: Lin Feng
    Cc: David Rientjes
    Signed-off-by: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     

12 Dec, 2012

1 commit

  • Currently memory_hotplug only manages node_states[N_HIGH_MEMORY]; it
    forgets to manage node_states[N_NORMAL_MEMORY]. This may cause
    node_states[N_NORMAL_MEMORY] to become incorrect.

    For example, if a node is empty before onlining and we online memory
    that is in ZONE_NORMAL, then after onlining node_states[N_HIGH_MEMORY]
    is correct but node_states[N_NORMAL_MEMORY] is not: the online code
    doesn't set the newly onlined node in node_states[N_NORMAL_MEMORY].

    The same thing will happen when offlining (the offline code doesn't clear
    the node from node_states[N_NORMAL_MEMORY] when needed). Some memory
    management code depends on node_states[N_NORMAL_MEMORY], so we have to
    fix up node_states[N_NORMAL_MEMORY].

    We add node_states_check_changes_online() and
    node_states_check_changes_offline() to detect whether
    node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY] are changed
    while hotplugging.

    Also add @status_change_nid_normal to struct memory_notify, so that the
    memory hotplug callbacks know whether node_states[N_NORMAL_MEMORY] has
    changed. (We could add a @flags and reuse @status_change_nid instead of
    introducing @status_change_nid_normal, but that would add much more
    complexity to the memory hotplug callback in every subsystem. So
    introducing @status_change_nid_normal is better, and it doesn't change
    the semantics of @status_change_nid.)

    Signed-off-by: Lai Jiangshan
    Cc: David Rientjes
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Yasuaki Ishimatsu
    Cc: Rob Landley
    Cc: Jiang Liu
    Cc: Kay Sievers
    Cc: Greg Kroah-Hartman
    Cc: Mel Gorman
    Cc: Wen Congyang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
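
A hotplug callback could consume the new field roughly as sketched below. This is a sketch under stated assumptions, not the kernel definition: the struct is a hypothetical two-field stand-in for struct memory_notify, and it assumes a negative value means "no change", which the commit text does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields discussed in the commit. */
struct memory_notify_sketch {
    int status_change_nid_normal; /* node whose N_NORMAL_MEMORY state changes */
    int status_change_nid;        /* node whose overall memory state changes */
};

/* Assumption for this sketch: a negative node id means "no change". */
static int normal_memory_changes(const struct memory_notify_sketch *arg)
{
    assert(arg != NULL);
    return arg->status_change_nid_normal >= 0;
}

/* Existing callers keep reading status_change_nid with unchanged meaning. */
static int any_memory_changes(const struct memory_notify_sketch *arg)
{
    assert(arg != NULL);
    return arg->status_change_nid >= 0;
}
```

Keeping the two fields separate means a subsystem that only cares about one node state reads one integer and needs no flag decoding.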
     

18 Sep, 2012

1 commit

  • I found the following definition in include/linux/memory.h. On my IA64
    platform, SECTION_SIZE_BITS is equal to 32, so MIN_MEMORY_BLOCK_SIZE
    will be 0.

    #define MIN_MEMORY_BLOCK_SIZE (1 << SECTION_SIZE_BITS)

    Because MIN_MEMORY_BLOCK_SIZE is of type int, which is 32 bits wide,
    MIN_MEMORY_BLOCK_SIZE (1 << 32) will be equal to 0.
    In fact, whenever SECTION_SIZE_BITS >= 31, MIN_MEMORY_BLOCK_SIZE will be
    wrong. This causes wrong system memory information in sysfs.
    I think it should be:

    #define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)

    And "echo offline > memory0/state" will cause the following call trace:

    kernel BUG at mm/memory_hotplug.c:885!
    sh[6455]: bugcheck! 0 [1]
    Pid: 6455, CPU 0, comm: sh
    psr : 0000101008526030 ifs : 8000000000000fa4 ip : [] Not tainted (3.6.0-rc1)
    ip is at offline_pages+0x210/0xee0
    Call Trace:
    show_stack+0x80/0xa0
    show_regs+0x640/0x920
    die+0x190/0x2c0
    die_if_kernel+0x50/0x80
    ia64_bad_break+0x3d0/0x6e0
    ia64_native_leave_kernel+0x0/0x270
    offline_pages+0x210/0xee0
    alloc_pages_current+0x180/0x2a0

    Signed-off-by: Jianguo Wu
    Signed-off-by: Jiang Liu
    Cc: "Luck, Tony"
    Reviewed-by: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jianguo Wu
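
The truncation described in the commit above can be reproduced safely in userspace. Shifting a 32-bit int by 32 is undefined behaviour in C, so this sketch emulates the reported result instead: it computes the shift in 64 bits and keeps only the low 32 bits. The function names are hypothetical, and uint64_t stands in for the kernel's unsigned long on a 64-bit platform.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates the 32-bit result the commit reports: do the shift in 64 bits,
 * then keep only the low 32 bits, as a 32-bit-wide constant would. */
static uint32_t block_size_32bit(unsigned int section_size_bits)
{
    assert(section_size_bits < 64);
    return (uint32_t)(UINT64_C(1) << section_size_bits);
}

/* Mirrors the fixed macro, where the constant 1 is already 64 bits wide. */
static uint64_t block_size_64bit(unsigned int section_size_bits)
{
    assert(section_size_bits < 64);
    return UINT64_C(1) << section_size_bits;
}
```

With a shift count of 32 the narrow version yields 0 while the wide version yields 4 GiB. For smaller section sizes such as 27 bits the two agree, which is why the bug only shows up on platforms with large sections.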
     

22 Dec, 2011

1 commit

  • This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
    and converts the devices to regular devices. The sysdev drivers are
    implemented as subsystem interfaces now.

    After all sysdev classes are ported to regular driver core entities, the
    sysdev implementation will be entirely removed from the kernel.

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

12 Jul, 2011

1 commit


04 Feb, 2011

1 commit

  • Update the 'phys_index' property of the memory_block struct to be
    called start_section_nr, and add an end_section_nr property. The
    data tracked here is the same but the updated naming is more in line
    with what is stored here, namely the first and last section number
    that the memory block spans.

    The names presented to userspace remain the same, phys_index for
    start_section_nr and end_phys_index for end_section_nr, to avoid breaking
    anything in userspace.

    This also updates the node sysfs code to be aware of the new capability for
    a memory block to contain multiple memory sections and be aware of the memory
    block structure name changes (start_section_nr). This requires an additional
    parameter to unregister_mem_sect_under_nodes so that we know which memory
    section of the memory block to unregister.

    Signed-off-by: Nathan Fontenot
    Reviewed-by: Robin Holt
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Greg Kroah-Hartman

    Nathan Fontenot
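
The first/last section bookkeeping described in the commit above can be sketched as follows. This is a sketch, not the kernel structure: the struct and helpers are hypothetical, and it assumes a block covers a contiguous run of sections_per_block sections, so the last section number is derived from the first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the two renamed fields. */
struct mem_block_sketch {
    unsigned long start_section_nr; /* first section the block spans */
    unsigned long end_section_nr;   /* last section the block spans */
};

/* Builds a block covering sections_per_block contiguous sections. */
static struct mem_block_sketch make_block(unsigned long start_section_nr,
                                          unsigned long sections_per_block)
{
    struct mem_block_sketch b;

    assert(sections_per_block > 0);
    b.start_section_nr = start_section_nr;
    b.end_section_nr = start_section_nr + sections_per_block - 1;
    return b;
}

/* Returns 1 if section number nr lies inside the block, else 0. */
static int block_contains_section(const struct mem_block_sketch *b,
                                  unsigned long nr)
{
    assert(b != NULL);
    return nr >= b->start_section_nr && nr <= b->end_section_nr;
}
```

Because a block can hold several sections, code that unregisters a section needs the section number as well as the block, which matches the extra parameter the commit adds to unregister_mem_sect_under_nodes.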
     

23 Oct, 2010

2 commits


18 Mar, 2010

1 commit

  • /sys/devices/system/memory/memoryX/phys_device is supposed to contain the
    number of the physical device that the corresponding piece of memory
    belongs to.

    In case a physical device should be replaced or taken offline for whatever
    reason it is necessary to set all corresponding memory pieces offline.
    The current implementation always sets phys_device to '0' and there is no
    way or hook to change that. Seems like there was a plan to implement that
    but it wasn't finished for whatever reason.

    So add a weak function which architectures can override to actually set
    the phys_device from within add_memory_block().

    Signed-off-by: Heiko Carstens
    Cc: Dave Hansen
    Cc: Gerald Schaefer
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
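
The weak-function hook described in the commit above can be sketched in userspace with the GCC/Clang weak attribute. The function and struct names here are hypothetical and chosen for illustration; they are not taken from the commit text. The default returns 0, matching the previous behaviour of always reporting phys_device as '0', and a strong definition elsewhere would replace it at link time.

```c
#include <assert.h>
#include <stddef.h>

/* Default used when no architecture supplies a strong definition.
 * The weak attribute is a GCC/Clang extension. */
__attribute__((weak)) int arch_memory_phys_device(unsigned long start_pfn)
{
    (void)start_pfn; /* the default ignores the address */
    return 0;
}

/* Hypothetical stand-in for the fields this sketch needs. */
struct memory_block_sketch {
    unsigned long start_pfn;
    int phys_device;
};

/* Fills in phys_device when the block is created. */
static void init_phys_device(struct memory_block_sketch *mem)
{
    assert(mem != NULL);
    mem->phys_device = arch_memory_phys_device(mem->start_pfn);
}
```

An architecture that knows its physical layout would define a non-weak function with the same name in its own source file, and no caller would need to change.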
     

18 Dec, 2009

1 commit

  • Memory balloon drivers can allocate a large amount of memory which is not
    movable but could be freed to accommodate memory hotplug remove.

    Prior to calling the memory hotplug notifier chain the memory in the
    pageblock is isolated. Currently, if the migrate type is not
    MIGRATE_MOVABLE the isolation will not proceed, causing the memory removal
    for that page range to fail.

    Rather than failing pageblock isolation if the migratetype is not
    MIGRATE_MOVABLE, this patch checks if all of the pages in the pageblock,
    and not on the LRU, are owned by a registered balloon driver (or other
    entity) using a notifier chain. If all of the non-movable pages are owned
    by a balloon, they can be freed later through the memory notifier chain
    and the range can still be isolated in set_migratetype_isolate().

    Signed-off-by: Robert Jennings
    Cc: Mel Gorman
    Cc: Ingo Molnar
    Cc: Brian King
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Gerald Schaefer
    Cc: KAMEZAWA Hiroyuki
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Benjamin Herrenschmidt

    Robert Jennings
     

06 Apr, 2009

1 commit

  • * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
    tracing, net: fix net tree and tracing tree merge interaction
    tracing, powerpc: fix powerpc tree and tracing tree interaction
    ring-buffer: do not remove reader page from list on ring buffer free
    function-graph: allow unregistering twice
    trace: make argument 'mem' of trace_seq_putmem() const
    tracing: add missing 'extern' keywords to trace_output.h
    tracing: provide trace_seq_reserve()
    blktrace: print out BLK_TN_MESSAGE properly
    blktrace: extract duplidate code
    blktrace: fix memory leak when freeing struct blk_io_trace
    blktrace: fix blk_probes_ref chaos
    blktrace: make classic output more classic
    blktrace: fix off-by-one bug
    blktrace: fix the original blktrace
    blktrace: fix a race when creating blk_tree_root in debugfs
    blktrace: fix timestamp in binary output
    tracing, Text Edit Lock: cleanup
    tracing: filter fix for TRACE_EVENT_FORMAT events
    ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
    x86: kretprobe-booster interrupt emulation code fix
    ...

    Fix up trivial conflicts in
    arch/parisc/include/asm/ftrace.h
    include/linux/memory.h
    kernel/extable.c
    kernel/module.c

    Linus Torvalds
     

03 Apr, 2009

1 commit

  • Add an interface by which other kernel code can read/write persistent
    memory such as I2C or SPI EEPROMs, or devices which provide NVRAM. Use
    cases include storage of board-specific configuration data like Ethernet
    addresses and sensor calibrations.

    Original idea, review and improvement suggestions by David Brownell.

    Acked-by: David Brownell
    Signed-off-by: Kevin Hilman
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kevin Hilman
     

06 Mar, 2009

1 commit

  • This is an architecture independent synchronization around kernel text
    modifications through use of a global mutex.

    A mutex has been chosen so that kprobes, the main user of this, can sleep
    during memory allocation between the memory read of the instructions it
    must replace and the memory write of the breakpoint.

    Another user of this interface: immediate values.

    Paravirt and alternatives are always done when SMP is inactive, so there
    is no need to use locks.

    Signed-off-by: Mathieu Desnoyers
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mathieu Desnoyers