17 Jul, 2019

2 commits

  • It is now possible to use persistent memory like regular RAM, but
    currently there is no way to remove this memory until the machine is
    rebooted.

    This work expands the functionality to also allow hotremoving
    previously hotplugged persistent memory, and to recover the device
    for other purposes.

    To hotremove persistent memory, the management software must first
    offline all memory blocks of the dax region, and then unbind it from
    the device-dax/kmem driver. So the operations should look like this:

    echo offline > /sys/devices/system/memory/memoryN/state
    ...
    echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind

    Note: if the unbind is done without offlining the memory beforehand,
    it won't be possible to hotremove dax0.0, and the dax memory will
    remain part of System RAM until reboot.

    Link: http://lkml.kernel.org/r/20190517215438.6487-4-pasha.tatashin@soleen.com
    Signed-off-by: Pavel Tatashin
    Reviewed-by: David Hildenbrand
    Cc: James Morris
    Cc: Sasha Levin
    Cc: Michal Hocko
    Cc: Dave Hansen
    Cc: Dan Williams
    Cc: Keith Busch
    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: Ross Zwisler
    Cc: Tom Lendacky
    Cc: Huang Ying
    Cc: Fengguang Wu
    Cc: Borislav Petkov
    Cc: Bjorn Helgaas
    Cc: Yaowei Bai
    Cc: Takashi Iwai
    Cc: Jérôme Glisse
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • Patch series ""Hotremove" persistent memory", v6.

    Recently, support for using persistent memory like regular RAM was
    added to Linux. This work extends that functionality to also allow
    hotremoving persistent memory.

    We (Microsoft) have an important use case for this functionality.

    The requirement is for physical machines with a small amount of RAM
    (~8G) to be able to reboot in a very short period of time (<1s), yet
    there is userland state that is expensive to recreate.

    The solution is to reserve part of the memory as persistent memory:
    the state is saved there before reboot, and after boot the region is
    hotplugged back into System RAM once the state has been restored.
    The series of operations looks like this:

    1. After boot, restore the state from /dev/pmem0 to a ramdisk to be
    consumed by apps, and free the ramdisk.
    2. Convert the raw pmem0 device to devdax
    ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
    3. Hotadd the device-dax memory to System RAM
    echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
    echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
    echo online_movable > /sys/devices/system/memoryXXX/state
    4. Before reboot, hotremove the device-dax memory from System RAM
    echo offline > /sys/devices/system/memoryXXX/state
    echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
    5. Create the raw pmem0 device
    ndctl create-namespace --mode raw -e namespace0.0 -f
    6. Copy the state that was stored by apps in the ramdisk to the pmem
    device
    7. Do a kexec reboot, or reboot through firmware if the firmware does
    not zero memory in the pmem0 region (these machines have only regular
    volatile memory). So to have the pmem0 device, either the memmap
    kernel parameter is used, or device nodes in the dtb are specified.
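
    As an illustration only (not part of the series), steps 4 and 5 can
    be combined into one pre-reboot script. The device and namespace
    names are the same illustrative ones used above, and the memory
    block numbers backing the dax region are passed as arguments because
    they are system-specific.

    #!/bin/sh
    # Usage: pre-reboot.sh <memory block numbers backing the dax region>
    # e.g.:  pre-reboot.sh 32 33 34
    DAXDEV=dax0.0       # illustrative device-dax instance
    NS=namespace0.0     # illustrative namespace

    # Step 4: offline the blocks and unbind from kmem.
    for block in "$@"; do
        echo offline > /sys/devices/system/memory/memory${block}/state
    done
    echo "$DAXDEV" > /sys/bus/dax/drivers/kmem/unbind

    # Step 5: recreate the raw pmem device.
    ndctl create-namespace --mode raw -e "$NS" -f

    # Step 6 is application-specific: copy the preserved state from the
    # ramdisk to /dev/pmem0 before rebooting.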

    This patch (of 3):

    When add_memory() fails, the resource and the memory should be freed.

    Link: http://lkml.kernel.org/r/20190517215438.6487-2-pasha.tatashin@soleen.com
    Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM")
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Dave Hansen
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Hildenbrand
    Cc: Fengguang Wu
    Cc: Huang Ying
    Cc: James Morris
    Cc: Jérôme Glisse
    Cc: Keith Busch
    Cc: Michal Hocko
    Cc: Ross Zwisler
    Cc: Sasha Levin
    Cc: Takashi Iwai
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Yaowei Bai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     

01 Mar, 2019

1 commit

  • This is intended for use with NVDIMMs that are physically persistent
    (physically like flash) so that they can be used as a cost-effective
    RAM replacement. Intel Optane DC persistent memory is one
    implementation of this kind of NVDIMM.

    Currently, a persistent memory region is "owned" by a device driver,
    either the "Direct DAX" or "Filesystem DAX" drivers. These drivers
    allow applications to explicitly use persistent memory, generally
    by being modified to use special, new libraries. (DIMM-based
    persistent memory hardware/software is described in great detail
    here: Documentation/nvdimm/nvdimm.txt).

    However, this limits persistent memory use to applications which
    *have* been modified. To make it more broadly usable, this driver
    "hotplugs" memory into the kernel, to be managed and used just like
    normal RAM would be.

    To make this work, management software must remove the device from
    being controlled by the "Device DAX" infrastructure:

    echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind

    and then tell the new driver that it can bind to the device:

    echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id

    After this, there will be a number of new memory sections visible
    in sysfs that can be onlined, or that may get onlined by existing
    udev-initiated memory hotplug rules.
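
    For illustration, the rebinding and onlining can be combined into a
    short script. This is not from the patch; dax0.0 is a placeholder,
    and on many systems a udev rule will already online the new memory
    blocks automatically.

    #!/bin/sh
    # Move the dax device from device_dax to kmem, then online any memory
    # blocks that are still offline (which includes the newly added ones;
    # blocks that cannot be onlined simply report a write error).
    DAXDEV=dax0.0

    echo "$DAXDEV" > /sys/bus/dax/drivers/device_dax/unbind
    echo "$DAXDEV" > /sys/bus/dax/drivers/kmem/new_id

    for state in /sys/devices/system/memory/memory*/state; do
        if [ "$(cat "$state")" = "offline" ]; then
            echo online > "$state"
        fi
    done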

    This rebinding procedure is currently a one-way trip. Once memory
    is bound to "kmem", it's there permanently and can not be
    unbound and assigned back to device_dax.

    The kmem driver will never bind to a dax device unless the device
    is *explicitly* bound to the driver. There are two reasons for
    this: One, since it is a one-way trip, it can not be undone if
    bound incorrectly. Two, the kmem driver destroys data on the
    device. Imagine you had good data on a pmem device: it would be
    catastrophic if you compiled in "kmem" but left out the "device_dax"
    driver. kmem would take over the device and write volatile data all
    over your good data.

    This inherits any existing NUMA information for the newly-added
    memory from the persistent memory device that came from the
    firmware. On Intel platforms, the firmware has guarantees that
    require each socket's persistent memory to be in a separate
    memory-only NUMA node. That means that this patch is not expected
    to create NUMA nodes, but will simply hotplug memory into existing
    nodes.

    Because the persistent memory ends up in memory-only NUMA nodes, the
    existing NUMA APIs and tools are sufficient to create policies that
    give applications or memory areas an affinity for, or an aversion
    to, this memory.
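
    For example, standard tooling such as numactl can already express
    such policies; the node numbers and program names below are
    placeholders for whatever the pmem-backed node and workloads happen
    to be on a given system.

    # Prefer the (assumed) pmem-backed node 1 for a large, cache-like
    # workload, falling back to other nodes if it fills up:
    numactl --preferred=1 ./big-cache-app

    # Keep a latency-sensitive workload on DRAM node 0 only:
    numactl --membind=0 ./latency-sensitive-app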

    There is currently some metadata at the beginning of pmem regions.
    The section-size memory hotplug restrictions, plus this small
    reserved area, can cause the "loss" of a section or two of capacity.
    This should be fixable in follow-on patches. But, as a first step,
    losing 256MB of memory (worst case) out of hundreds of gigabytes
    is a good tradeoff vs. the required code to fix this up precisely.
    This calculation is also the reason we export
    memory_block_size_bytes().
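
    As a rough illustration (not from the patch), the worst case can be
    checked from userspace, since the hotplug block size is exported in
    sysfs:

    # Memory block size in bytes, printed in hexadecimal:
    cat /sys/devices/system/memory/block_size_bytes

    # With a common 128MB (0x8000000) block size, a pmem region whose
    # usable range starts and ends mid-block can lose up to two blocks,
    # i.e. 2 * 128MB = 256MB.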

    Signed-off-by: Dave Hansen
    Reviewed-by: Dan Williams
    Reviewed-by: Keith Busch
    Cc: Dave Jiang
    Cc: Ross Zwisler
    Cc: Vishal Verma
    Cc: Tom Lendacky
    Cc: Andrew Morton
    Cc: Michal Hocko
    Cc: linux-nvdimm@lists.01.org
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mm@kvack.org
    Cc: Huang Ying
    Cc: Fengguang Wu
    Cc: Borislav Petkov
    Cc: Bjorn Helgaas
    Cc: Yaowei Bai
    Cc: Takashi Iwai
    Cc: Jerome Glisse
    Reviewed-by: Vishal Verma
    Signed-off-by: Dan Williams

    Dave Hansen