20 May, 2016

1 commit

  • This patchset continues the work I started with commit 31bc3858ea3e
    ("memory-hotplug: add automatic onlining policy for the newly added
    memory").

    Initially I was going to stop there and bring the policy setting logic
    to userspace. I ran into two issues along the way:

    1) It is possible to have memory hotplugged at boot (e.g. with QEMU).
    These blocks stay offlined if we turn the onlining policy on by
    userspace.

    2) My attempt to bring this policy setting to systemd failed; the systemd
    maintainers suggested changing the default in the kernel or ... using
    tmpfiles.d to alter the policy (which looks like a hack to me):
    https://github.com/systemd/systemd/pull/2938

    Here I suggest adding a config option to set the default value for the
    policy and a kernel command line parameter to override it.

    This patch (of 2):

    Introduce a config option to set the default value for the memory hotplug
    onlining policy (/sys/devices/system/memory/auto_online_blocks). The
    reasons one would want to turn this option on are to have early onlining
    for hotpluggable memory available at boot and to not require any
    userspace actions to make memory hotplug work.

    [akpm@linux-foundation.org: tweak Kconfig text]
    Signed-off-by: Vitaly Kuznetsov
    Cc: Jonathan Corbet
    Cc: Dan Williams
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: David Vrabel
    Cc: David Rientjes
    Cc: Igor Mammedov
    Cc: Lennart Poettering
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Kuznetsov
     

16 Mar, 2016

1 commit

  • Currently, all newly added memory blocks remain in 'offline' state
    unless someone onlines them. Some Linux distributions carry special udev
    rules like:

    SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

    to make this happen automatically. This is not a great solution for
    virtual machines where memory hotplug is used to address high memory
    pressure situations: such onlining is slow, and the userspace process
    doing it (udev) has a chance of being killed by the OOM killer because
    it will probably need to allocate some memory.

    Introduce a default policy for newly added memory blocks in the
    /sys/devices/system/memory/auto_online_blocks file with two possible
    values: "offline" which preserves the current behavior and "online"
    which causes all newly added memory blocks to go online as soon as
    they're added. The default is "offline".
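
    The policy file described above can be read and changed from a script.
    The following is a minimal sketch, not part of the patch: the helper
    names are hypothetical, only the two values named in this message are
    accepted, and the `root` parameter exists so the helpers can be pointed
    at a scratch directory instead of the real sysfs tree.

```python
from pathlib import Path

# Default location, taken from the commit message above.
SYSFS_MEMORY = Path("/sys/devices/system/memory")
# The two values the commit message documents.
VALID_POLICIES = ("online", "offline")


def get_auto_online_policy(root: Path = SYSFS_MEMORY) -> str:
    """Return the current policy string with trailing whitespace removed."""
    return (root / "auto_online_blocks").read_text().strip()


def set_auto_online_policy(policy: str, root: Path = SYSFS_MEMORY) -> None:
    """Write a new policy; on a real system this needs root privileges."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"policy must be one of {VALID_POLICIES}, got {policy!r}")
    (root / "auto_online_blocks").write_text(policy)
```

    Validating the value before writing keeps a typo from reaching the
    kernel interface at all.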

    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Daniel Kiper
    Cc: Jonathan Corbet
    Cc: Greg Kroah-Hartman
    Cc: Daniel Kiper
    Cc: Dan Williams
    Cc: Tang Chen
    Cc: David Vrabel
    Acked-by: David Rientjes
    Cc: Naoya Horiguchi
    Cc: Xishi Qiu
    Cc: Mel Gorman
    Cc: "K. Y. Srinivasan"
    Cc: Igor Mammedov
    Cc: Kay Sievers
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Kuznetsov
     

20 Mar, 2015

1 commit


28 Feb, 2015

1 commit

  • Documentation/memory-hotplug.txt describes that a callback function can
    be added to the notification chain by calling hotplug_memory_notifier().
    The function prototype of the callback function is missing. This patch
    adds the missing information.

    The description of the arguments of the callback function is
    reworked.

    The constants for the event types are corrected.

    The possible return values are explained.

    Signed-off-by: Heinrich Schuchardt
    Signed-off-by: Jonathan Corbet

    Heinrich Schuchardt
     

10 Oct, 2014

1 commit

  • Currently memory-hotplug has two limits:

    1. If the memory block is in ZONE_NORMAL, you can change it to
    ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.

    2. If the memory block is in ZONE_MOVABLE, you can change it to
    ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.

    With this patch, we can easily find out which zone a memory block can be
    onlined to, without needing to know the two limits above.

    Updated the related Documentation.
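
    The two limits can be illustrated with a toy model. This is only a
    sketch of the rule as worded in this message, not the kernel's
    implementation: `allowed_zones` is a hypothetical helper, and the layout
    is simply a list giving the current zone of each memory block in
    physical address order.

```python
def allowed_zones(zones: list[str], index: int) -> list[str]:
    """Zones that block `index` could be onlined to under the two limits.

    `zones` holds the current zone of every block in address order
    (toy model; the kernel tracks zone boundaries, not a list).
    """
    current = zones[index]
    other = "ZONE_MOVABLE" if current == "ZONE_NORMAL" else "ZONE_NORMAL"
    # Blocks directly before and after this one, where they exist.
    neighbours = zones[max(index - 1, 0):index] + zones[index + 1:index + 2]
    allowed = [current]
    # A zone change is only possible when a neighbour is in the other zone.
    if other in neighbours:
        allowed.append(other)
    return allowed


layout = ["ZONE_NORMAL", "ZONE_NORMAL", "ZONE_MOVABLE", "ZONE_MOVABLE"]
for i in range(len(layout)):
    print(i, allowed_zones(layout, i))
```

    In this layout only the two blocks at the zone boundary (indexes 1 and
    2) can change zone; the outer blocks can only stay where they are.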

    [akpm@linux-foundation.org: use conventional comment layout]
    [akpm@linux-foundation.org: fix build with CONFIG_MEMORY_HOTREMOVE=n]
    [akpm@linux-foundation.org: remove unused local zone_prev]
    Signed-off-by: Zhang Zhen
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Toshi Kani
    Cc: Yasuaki Ishimatsu
    Cc: Naoya Horiguchi
    Cc: Wang Nan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Zhen
     

24 Jun, 2014

1 commit

  • Documentation/memory-hotplug.txt incorrectly states that the memory
    driver "probe" interface is only supported on powerpc and is vague about
    its application on x86. Clarify the platforms that make this interface
    available if memory hotplug is enabled.

    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

05 Jun, 2014

1 commit

  • Seems we all agree that information about SECTION, e.g. section size,
    sections per memory block should be kept as kernel internals, and not
    exposed to userspace.

    This patch updates Documentation/memory-hotplug.txt to refer to memory
    blocks instead of memory sections where appropriate and adds a
    paragraph to explain that memory blocks are made of memory sections.
    The documentation update is mostly provided by Nathan.

    Also, end_phys_index in the code is actually not the end section id but
    the end memory block id, which should always be the same as phys_index,
    so it is removed here.

    Signed-off-by: Li Zhong
    Reviewed-by: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zhong
     

07 Sep, 2013

1 commit

  • Pull trivial tree from Jiri Kosina:
    "The usual trivial updates all over the tree -- mostly typo fixes and
    documentation updates"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
    doc: Documentation/cputopology.txt fix typo
    treewide: Convert retrun typos to return
    Fix comment typo for init_cma_reserved_pageblock
    Documentation/trace: Correcting and extending tracepoint documentation
    mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
    power: Documentation: Update s2ram link
    doc: fix a typo in Documentation/00-INDEX
    Documentation/printk-formats.txt: No casts needed for u64/s64
    doc: Fix typo "is is" in Documentations
    treewide: Fix printks with 0x%#
    zram: doc fixes
    Documentation/kmemcheck: update kmemcheck documentation
    doc: documentation/hwspinlock.txt fix typo
    PM / Hibernate: add section for resume options
    doc: filesystems : Fix typo in Documentations/filesystems
    scsi/megaraid fixed several typos in comments
    ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
    treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
    page_isolation: Fix a comment typo in test_pages_isolated()
    doc: fix a typo about irq affinity
    ...

    Linus Torvalds
     

27 Aug, 2013

1 commit


22 Jul, 2013

1 commit

  • CONFIG_ARCH_MEMORY_PROBE enables the
    /sys/devices/system/memory/probe interface, which allows a given
    memory address to be hot-added as follows:

    # echo start_address_of_new_memory > /sys/devices/system/memory/probe

    (See Documentation/memory-hotplug.txt for more details.)

    This probe interface is required on powerpc. On x86, however,
    ACPI notifies a memory hotplug event to the kernel, which
    performs its hotplug operation as the result.

    Therefore, regular users do not need this interface on x86. This probe
    interface is also error-prone and misleading, in that the kernel blindly
    adds a given memory address without checking whether the memory is
    present on the system; no probing is done, despite its name.

    The kernel crashes when a user requests to online a memory block
    that is not present on the system. This interface is currently
    used for testing as it can fake a hotplug event.

    This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86,
    adds its Kconfig menu entry on x86, and clarifies its use in
    Documentation/memory-hotplug.txt.

    Signed-off-by: Toshi Kani
    Acked-by: KOSAKI Motohiro
    Cc: linux-mm@kvack.org
    Cc: dave@sr71.net
    Cc: isimatu.yasuaki@jp.fujitsu.com
    Cc: tangchen@cn.fujitsu.com
    Cc: vasilis.liaskovitis@profitbricks.com
    Link: http://lkml.kernel.org/r/1374256068-26016-1-git-send-email-toshi.kani@hp.com
    [ Edited it slightly. ]
    Signed-off-by: Ingo Molnar

    Toshi Kani
     

13 Dec, 2012

1 commit

  • Update nodemasks management for N_MEMORY.

    [lliubbo@gmail.com: fix build]
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Wen Congyang
    Cc: Christoph Lameter
    Cc: Hillf Danton
    Cc: Lin Feng
    Cc: David Rientjes
    Signed-off-by: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     

12 Dec, 2012

2 commits

  • Add online_movable and online_kernel for logical memory hotplug. This is
    the dynamic version of "movablecore" & "kernelcore".

    The motivation is the same as for "movablecore" & "kernelcore", but this
    version is dynamic and applies at run time:

    o We can configure memory as kernelcore or movablecore after boot.

    When the userspace workload increases and we need more hugepages, we can
    use "online_movable" to add memory and let the system use more THP
    (transparent huge pages), and vice versa when the kernel workload
    increases.

    This also helps virtualization to dynamically configure host/guest
    memory and so save memory (reduce waste).

    Memory capacity on Demand

    o When a new node is physically online after boot, we need to use
    "online_movable" or "online_kernel" to configure/partition it as
    expected when we logically online it.

    This configuration also helps with physical memory migration.

    o All the same benefits as the existing "movablecore" & "kernelcore".

    o Preparing for movable-node, which is very important for power-saving,
    hardware partitioning and high-availability systems (hardware fault
    management).

    (Note, we don't introduce movable-node here.)

    Action behavior:
    When a memoryblock/memorysection is onlined by "online_movable", the kernel
    will not hold direct references to the pages of the memoryblock,
    thus we can remove that memory at any time when needed.

    When it is onlined by "online_kernel", the kernel can use it.
    When it is onlined by "online", the zone type doesn't change.

    Current constraints:
    Only a memoryblock which is adjacent to ZONE_MOVABLE
    can be onlined from ZONE_NORMAL to ZONE_MOVABLE.
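
    The three onlining modes are selected by writing a string to a memory
    block's `state` file. A minimal sketch, assuming the
    /sys/devices/system/memory/memoryN/state layout that other entries in
    this log refer to; `online_block` is a hypothetical helper and `root`
    lets it be exercised against a scratch directory.

```python
from pathlib import Path

SYSFS_MEMORY = Path("/sys/devices/system/memory")
# The plain mode plus the two modes this commit introduces.
ONLINE_MODES = ("online", "online_kernel", "online_movable")


def online_block(block: int, mode: str = "online",
                 root: Path = SYSFS_MEMORY) -> None:
    """Request that memory block `block` be onlined in the given mode."""
    if mode not in ONLINE_MODES:
        raise ValueError(f"mode must be one of {ONLINE_MODES}, got {mode!r}")
    # On a real system the write needs root privileges and may be refused,
    # for example when the adjacency constraint above is not met.
    (root / f"memory{block}" / "state").write_text(mode)
```

    For example, `online_block(135, "online_movable")` would ask for block
    135 to be onlined as movable memory.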

    [akpm@linux-foundation.org: use min_t, cleanups]
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Lai Jiangshan
    Cc: Jiang Liu
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Yinghai Lu
    Cc: Rusty Russell
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY], it
    forgets to manage node_states[N_NORMAL_MEMORY]. This may cause
    node_states[N_NORMAL_MEMORY] to become incorrect.

    For example, if a node is empty before onlining and we online memory
    that is in ZONE_NORMAL, then after onlining node_states[N_HIGH_MEMORY]
    is correct but node_states[N_NORMAL_MEMORY] is not: the online code
    doesn't set the newly onlined node in node_states[N_NORMAL_MEMORY].

    The same thing will happen when offlining (the offline code doesn't clear
    the node from node_states[N_NORMAL_MEMORY] when needed). Some memory
    management code depends on node_states[N_NORMAL_MEMORY], so we have to
    fix up node_states[N_NORMAL_MEMORY].

    We add node_states_check_changes_online() and
    node_states_check_changes_offline() to detect whether
    node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY] are changed
    while hotplugging.

    Also add @status_change_nid_normal to struct memory_notify, so that the
    memory hotplug callbacks know whether node_states[N_NORMAL_MEMORY] has
    changed. (We could add a @flags and reuse @status_change_nid instead of
    introducing @status_change_nid_normal, but that would add much more
    complexity to the memory hotplug callback in every subsystem. So
    introducing @status_change_nid_normal is better, and it doesn't change
    the semantics of @status_change_nid.)

    Signed-off-by: Lai Jiangshan
    Cc: David Rientjes
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Yasuaki Ishimatsu
    Cc: Rob Landley
    Cc: Jiang Liu
    Cc: Kay Sievers
    Cc: Greg Kroah-Hartman
    Cc: Mel Gorman
    Cc: Wen Congyang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     

16 Apr, 2012

1 commit


04 Feb, 2011

1 commit

  • Update the memory sysfs code such that each sysfs memory directory is now
    considered a memory block that can span multiple memory sections. The
    default size of each memory block is one section (1 << SECTION_SIZE_BITS
    bytes) to maintain the current behavior of having a single memory section
    per memory block (i.e. one sysfs directory per memory section).

    Architectures that want memory blocks to span multiple memory sections
    need only define their own memory_block_size_bytes() routine.

    Update the memory hotplug documentation to reflect the new behaviors of
    memory blocks reflected in sysfs.
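
    The relationship between sections and blocks is plain arithmetic. The
    sketch below uses hypothetical helper names, and the example value of 27
    for SECTION_SIZE_BITS (128 MiB sections) is an assumption chosen for
    illustration; the real value is architecture-specific.

```python
def sections_per_block(block_size_bytes: int, section_size_bits: int) -> int:
    """Number of memory sections covered by one memory block."""
    section_size = 1 << section_size_bits
    if block_size_bytes % section_size:
        raise ValueError("block size must be a multiple of the section size")
    return block_size_bytes // section_size


def block_index(phys_addr: int, block_size_bytes: int) -> int:
    """Index of the memory block that contains physical address `phys_addr`."""
    return phys_addr // block_size_bytes


SECTION_SIZE_BITS = 27                    # assumed: 128 MiB sections
default_block = 1 << SECTION_SIZE_BITS    # default: one section per block
large_block = 2 << 30                     # a hypothetical 2 GiB block

print(sections_per_block(default_block, SECTION_SIZE_BITS))
print(sections_per_block(large_block, SECTION_SIZE_BITS))
print(block_index(0x8000000, default_block))
```

    With the default block size there is exactly one section per block,
    which matches the previous one-directory-per-section behavior.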

    Signed-off-by: Nathan Fontenot
    Reviewed-by: Robin Holt
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Greg Kroah-Hartman

    Nathan Fontenot
     

16 Dec, 2009

1 commit

  • Commit c04fc586c (mm: show node to memory section relationship with
    symlinks in sysfs) created symlinks from nodes to memory sections, e.g.

    /sys/devices/system/node/node1/memory135 -> ../../memory/memory135

    If you're examining the memory section though and are wondering what node
    it might belong to, you can find it by grovelling around in sysfs, but
    it's a little cumbersome.

    Add a reverse symlink for each memory section that points back to the
    node to which it belongs.
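
    With the reverse link in place, finding the node of a memory section is
    a directory scan. A sketch under one assumption: the message above does
    not name the new link, so the `nodeN` entry name used here mirrors the
    forward link pattern and should be treated as illustrative.
    `node_of_block` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

SYSFS_SYSTEM = Path("/sys/devices/system")


def node_of_block(block: int, root: Path = SYSFS_SYSTEM) -> Optional[int]:
    """Return the node number a memory directory links back to, if any."""
    block_dir = root / "memory" / f"memory{block}"
    for entry in block_dir.iterdir():
        match = re.fullmatch(r"node(\d+)", entry.name)
        # Assumed naming: a symlink called nodeN inside the memory directory.
        if match and entry.is_symlink():
            return int(match.group(1))
    return None
```

    Returning None rather than raising keeps the helper usable on kernels
    that predate the reverse link.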

    Signed-off-by: Alex Chiang
    Cc: Gary Hade
    Cc: Badari Pulavarty
    Cc: Ingo Molnar
    Acked-by: David Rientjes
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: David Rientjes
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Chiang
     

13 Jun, 2009

1 commit


07 Jan, 2009

1 commit

  • Show node to memory section relationship with symlinks in sysfs

    Add /sys/devices/system/node/nodeX/memoryY symlinks for all
    the memory sections located on nodeX. For example:
    /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
    indicates that memory section 135 resides on node1.
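
    Given these symlinks, listing the memory sections on a node is a matter
    of matching names in the node directory. A minimal sketch with a
    hypothetical helper name; `root` can point at a scratch tree for testing.

```python
import re
from pathlib import Path

SYSFS_SYSTEM = Path("/sys/devices/system")


def blocks_on_node(node: int, root: Path = SYSFS_SYSTEM) -> list[int]:
    """Return the sorted memory section numbers linked from nodeX."""
    node_dir = root / "node" / f"node{node}"
    blocks = []
    for entry in node_dir.glob("memory*"):
        # Keep only entries of the exact form memoryY; skip anything else.
        match = re.fullmatch(r"memory(\d+)", entry.name)
        if match:
            blocks.append(int(match.group(1)))
    return sorted(blocks)
```

    A hot-add assist script could use such a list to online only the
    sections that belong to the newly added node.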

    Also revises documentation to cover this change as well as updating
    Documentation/ABI/testing/sysfs-devices-memory to include descriptions
    of memory hotremove files 'phys_device', 'phys_index', and 'state'
    that were previously not described there.

    In addition to it always being a good policy to provide users with
    the maximum possible amount of physical location information for
    resources that can be hot-added and/or hot-removed, the following
    are some (but likely not all) of the user benefits provided by
    this change.
    Immediate:
    - Provides information needed to determine the specific node
    on which a defective DIMM is located. This will reduce system
    downtime when the node or defective DIMM is swapped out.
    - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM. This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node. The consequences of reintroducing the defective memory
    could be ugly.
    - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
    Future:
    - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

    Symlink creation during boot was tested on 2-node x86_64, 2-node
    ppc64, and 2-node ia64 systems. Symlink creation during physical
    memory hot-add was tested on a 2-node x86_64 system.

    Signed-off-by: Gary Hade
    Signed-off-by: Badari Pulavarty
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gary Hade
     

22 Oct, 2007

1 commit


12 Aug, 2007

1 commit

  • This adds a document for memory hotplug that describes "How to use" and
    "Current status".

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto