31 May, 2019

1 commit

  • Based on 3 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version this program is distributed in the
    hope that it will be useful but without any warranty without even
    the implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version [author] [kishon] [vijay] [abraham]
    [i] [kishon]@[ti] [com] this program is distributed in the hope that
    it will be useful but without any warranty without even the implied
    warranty of merchantability or fitness for a particular purpose see
    the gnu general public license for more details

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version [author] [graeme] [gregory]
    [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
    [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
    [hk] [hemahk]@[ti] [com] this program is distributed in the hope
    that it will be useful but without any warranty without even the
    implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 1105 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

05 Apr, 2019

1 commit

  • Parsing entries in an ACPI table had assumed a generic header
    structure. There is no standard ACPI header, though, so less common
    layouts with different field sizes required custom parsers to go through
    their subtable entry list.

    Create the infrastructure for adding different table types so that the
    code parsing the entries array can be reused for all ACPI system tables
    and the common code doesn't need to be duplicated.

    Reviewed-by: Rafael J. Wysocki
    Acked-by: Jonathan Cameron
    Tested-by: Jonathan Cameron
    Signed-off-by: Keith Busch
    Tested-by: Brice Goglin
    Signed-off-by: Greg Kroah-Hartman

    Keith Busch
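
A minimal userspace sketch of the idea described in this commit, assuming two hypothetical header layouts (a 2-byte "common" header and an 8-byte "wide" header). The names `header_kind`, `entry_length()` and `count_entries()` are illustrative and are not the kernel's API; the length field is read in host byte order for simplicity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layouts:
 *   HEADER_COMMON: u8 type, u8 length                  (2 bytes)
 *   HEADER_WIDE:   u16 type, u16 reserved, u32 length  (8 bytes)
 */
enum header_kind { HEADER_COMMON, HEADER_WIDE };

static size_t header_size(enum header_kind kind)
{
    return kind == HEADER_WIDE ? 8 : 2;
}

/* Read the length field for the given layout; 0 if the header does not fit. */
static uint32_t entry_length(const uint8_t *entry, size_t avail,
                             enum header_kind kind)
{
    uint32_t len;

    if (avail < header_size(kind))
        return 0;
    if (kind == HEADER_COMMON)
        return entry[1];
    memcpy(&len, entry + 4, sizeof(len)); /* host byte order (assumption) */
    return len;
}

/* One walker shared by every layout: only entry_length() differs.
 * Returns the number of entries, or -1 if an entry is malformed. */
static int count_entries(const uint8_t *buf, size_t size,
                         enum header_kind kind)
{
    size_t off = 0;
    int count = 0;

    while (off < size) {
        uint32_t len = entry_length(buf + off, size - off, kind);

        if (len < header_size(kind) || len > size - off)
            return -1;
        count++;
        off += len;
    }
    return count;
}
```

The point of the sketch is that the loop is written once and the per-layout knowledge lives in a small accessor, which is the kind of reuse the commit message describes.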
     

17 Mar, 2019

1 commit

  • Pull device-dax updates from Dan Williams:
    "New device-dax infrastructure to allow persistent memory and other
    "reserved" / performance differentiated memories, to be assigned to
    the core-mm as "System RAM".

    Some users want to use persistent memory as additional volatile
    memory. They are willing to cope with potential performance
    differences, for example between DRAM and 3D Xpoint, and want to use
    typical Linux memory management apis rather than a userspace memory
    allocator layered over an mmap() of a dax file. The administration
    model is to decide how much Persistent Memory (pmem) to use as System
    RAM, create a device-dax-mode namespace of that size, and then assign
    it to the core-mm. The rationale for device-dax is that it is a
    generic memory-mapping driver that can be layered over any "special
    purpose" memory, not just pmem. On subsequent boots udev rules can be
    used to restore the memory assignment.

    One implication of using pmem as RAM is that mlock() no longer keeps
    data off persistent media. For this reason it is recommended to enable
    NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
    at rest. We considered making this recommendation an actively enforced
    requirement, but in the end decided to leave it as a distribution /
    administrator policy to allow for emulation and test environments that
    lack security capable NVDIMMs.

    Summary:

    - Replace the /sys/class/dax device model with /sys/bus/dax, and
    include a compat driver so distributions can opt-in to the new ABI.

    - Allow for an alternative driver for the device-dax address-range

    - Introduce the 'kmem' driver to hotplug / assign a device-dax
    address-range to the core-mm.

    - Arrange for the device-dax target-node to be onlined so that the
    newly added memory range can be uniquely referenced by numa apis"

    NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
    we currently have special - and very annoying rules in the kernel about
    accessing PMEM only with the "MC safe" accessors, because machine checks
    inside the regular repeat string copy functions can be fatal in some
    (not described) circumstances.

    And apparently the PMEM modules can cause that a lot more than regular
    RAM. The argument is that this happens because PMEM doesn't necessarily
    get scrubbed at boot like RAM does, but that is planned to be added for
    the user space tooling.

    Quoting Dan from another email:
    "The exposure can be reduced in the volatile-RAM case by scanning for
    and clearing errors before it is onlined as RAM. The userspace tooling
    for that can be in place before v5.1-final. There's also runtime
    notifications of errors via acpi_nfit_uc_error_notify() from
    background scrubbers on the DIMM devices. With that mechanism the
    kernel could proactively clear newly discovered poison in the volatile
    case, but that would be additional development more suitable for v5.2.

    I understand the concern, and the need to highlight this issue by
    tapping the brakes on feature development, but I don't see PMEM as RAM
    making the situation worse when the exposure is also there via DAX in
    the PMEM case. Volatile-RAM is arguably a safer use case since it's
    possible to repair pages where the persistent case needs active
    application coordination"

    * tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    device-dax: "Hotplug" persistent memory for use like normal RAM
    mm/resource: Let walk_system_ram_range() search child resources
    mm/memory-hotplug: Allow memory resources to be children
    mm/resource: Move HMM pr_debug() deeper into resource code
    mm/resource: Return real error codes from walk failures
    device-dax: Add a 'modalias' attribute to DAX 'bus' devices
    device-dax: Add a 'target_node' attribute
    device-dax: Auto-bind device after successful new_id
    acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
    device-dax: Add /sys/class/dax backwards compatibility
    device-dax: Add support for a dax override driver
    device-dax: Move resource pinning+mapping into the common driver
    device-dax: Introduce bus + driver model
    device-dax: Start defining a dax bus model
    device-dax: Remove multi-resource infrastructure
    device-dax: Kill dax_region base
    device-dax: Kill dax_region ida

    Linus Torvalds
     

07 Jan, 2019

1 commit

  • Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
    Interface Table), is the first known instance of a memory range
    described by a unique "target" proximity domain. The split between
    "initiator" and "target" proximity domains is the approach that the ACPI
    HMAT (Heterogeneous Memory Attributes Table) uses to describe the unique
    performance properties of a memory range relative to a given initiator
    (e.g. CPU or DMA device).

    Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
    char-device follows the traditional notion of 'numa-node' where the
    attribute conveys the closest online numa-node. That numa-node attribute
    is useful for cpu-binding and memory-binding processes *near* the
    device. However, when the memory range backing a 'pmem', or 'dax' device
    is onlined (memory hot-add) the memory-only-numa-node representing that
    address needs to be differentiated from the set of online nodes. In
    other words, the numa-node association of the device depends on whether
    you can bind processes *near* the cpu-numa-node in the offline
    device-case, or bind processes *on* the memory-range directly after the
    backing address range is onlined.

    Allow for the case that platform firmware describes persistent memory
    with a unique proximity domain, i.e. when it is distinct from the
    proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
    numa-node translation of that proximity through the libnvdimm region
    device to namespaces that are in device-dax mode. With this in place the
    proposed kmem driver [1] can optionally discover a unique numa-node
    number for the address range as it transitions the memory from an
    offline state managed by a device-driver to an online memory range
    managed by the core-mm.

    [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com

    Reported-by: Fan Du
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Dave Hansen
    Cc: Jérôme Glisse
    Reviewed-by: Yang Shi
    Signed-off-by: Dan Williams

    Dan Williams
     

03 Jan, 2019

1 commit

  • The addresses of NUMA nodes are not printed correctly on i386-PAE
    which is misleading.

    Here is a debian9-32bit with PAE in a QEMU guest having more than 4G
    of memory:

    qemu-system-i386 \
    -hda /var/lib/libvirt/images/debian32.qcow2 \
    -m 5G \
    -enable-kvm \
    -smp 10 \
    -numa node,mem=512M,nodeid=0,cpus=0 \
    -numa node,mem=512M,nodeid=1,cpus=1 \
    -numa node,mem=512M,nodeid=2,cpus=2 \
    -numa node,mem=512M,nodeid=3,cpus=3 \
    -numa node,mem=512M,nodeid=4,cpus=4 \
    -numa node,mem=512M,nodeid=5,cpus=5 \
    -numa node,mem=512M,nodeid=6,cpus=6 \
    -numa node,mem=512M,nodeid=7,cpus=7 \
    -numa node,mem=512M,nodeid=8,cpus=8 \
    -numa node,mem=512M,nodeid=9,cpus=9 \
    -serial stdio

    Because of the wrong value type, it prints as below:

    [ 0.021049] ACPI: SRAT Memory (0x0 length 0xa0000) in proximity domain 0 enabled
    [ 0.021740] ACPI: SRAT Memory (0x100000 length 0x1ff00000) in proximity domain 0 enabled
    [ 0.022425] ACPI: SRAT Memory (0x20000000 length 0x20000000) in proximity domain 1 enabled
    [ 0.023092] ACPI: SRAT Memory (0x40000000 length 0x20000000) in proximity domain 2 enabled
    [ 0.023764] ACPI: SRAT Memory (0x60000000 length 0x20000000) in proximity domain 3 enabled
    [ 0.024431] ACPI: SRAT Memory (0x80000000 length 0x20000000) in proximity domain 4 enabled
    [ 0.025104] ACPI: SRAT Memory (0xa0000000 length 0x20000000) in proximity domain 5 enabled
    [ 0.025791] ACPI: SRAT Memory (0x0 length 0x20000000) in proximity domain 6 enabled
    [ 0.026412] ACPI: SRAT Memory (0x20000000 length 0x20000000) in proximity domain 7 enabled
    [ 0.027118] ACPI: SRAT Memory (0x40000000 length 0x20000000) in proximity domain 8 enabled
    [ 0.027802] ACPI: SRAT Memory (0x60000000 length 0x20000000) in proximity domain 9 enabled

    The upper half of the start address of the NUMA domains between 6
    and 9 inclusive was cut, so the printed values are incorrect.

    Fix the value type, to get the correct values in the log as follows:

    [ 0.023698] ACPI: SRAT Memory (0x0 length 0xa0000) in proximity domain 0 enabled
    [ 0.024325] ACPI: SRAT Memory (0x100000 length 0x1ff00000) in proximity domain 0 enabled
    [ 0.024981] ACPI: SRAT Memory (0x20000000 length 0x20000000) in proximity domain 1 enabled
    [ 0.025659] ACPI: SRAT Memory (0x40000000 length 0x20000000) in proximity domain 2 enabled
    [ 0.026317] ACPI: SRAT Memory (0x60000000 length 0x20000000) in proximity domain 3 enabled
    [ 0.026980] ACPI: SRAT Memory (0x80000000 length 0x20000000) in proximity domain 4 enabled
    [ 0.027635] ACPI: SRAT Memory (0xa0000000 length 0x20000000) in proximity domain 5 enabled
    [ 0.028311] ACPI: SRAT Memory (0x100000000 length 0x20000000) in proximity domain 6 enabled
    [ 0.028985] ACPI: SRAT Memory (0x120000000 length 0x20000000) in proximity domain 7 enabled
    [ 0.029667] ACPI: SRAT Memory (0x140000000 length 0x20000000) in proximity domain 8 enabled
    [ 0.030334] ACPI: SRAT Memory (0x160000000 length 0x20000000) in proximity domain 9 enabled

    Signed-off-by: Chao Fan
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Chao Fan
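
The truncation described in this commit can be reproduced in userspace. The sketch below is an illustration, not the kernel code: `uint32_t` stands in for a 32-bit `unsigned long` as found on i386, and the helper names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Buggy variant: the 64-bit address is narrowed to 32 bits before
 * printing, so the upper half is silently dropped. */
static int format_truncated(char *out, size_t n, uint64_t base)
{
    return snprintf(out, n, "0x%" PRIx32, (uint32_t)base);
}

/* Fixed variant: print the full 64-bit value. */
static int format_full(char *out, size_t n, uint64_t base)
{
    return snprintf(out, n, "0x%" PRIx64, base);
}
```

With the start address of proximity domain 6 from the logs above, `0x100000000`, the truncated variant yields `0x0` while the full variant yields `0x100000000`.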
     

31 Oct, 2018

1 commit

  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include <linux/memblock.h>'

    @@
    @@
    - #include <linux/bootmem.h>
    + #include <linux/memblock.h>

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

16 Mar, 2018

1 commit

  • Commit 99759869faf1 "acpi: Add acpi_map_pxm_to_online_node()" added
    support for mapping a given proximity to its nearest, by SLIT distance,
    online node. However, it sometimes returns unexpected results due to the
    fact that it switches from comparing the PXM node to the last node that
    was closer than the current max.

    for_each_online_node(n) {
            dist = node_distance(node, n);
            if (dist < min_dist) {
                    min_dist = dist;
                    node = n;
    Reviewed-by: Toshi Kani
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Dan Williams

    Dan Williams
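
The problem described in this commit is that the loop overwrites the node it is measuring distances from. The following self-contained sketch contrasts that behavior with a variant that keeps the reference node fixed. The distance table, the online mask and all function names are made up for illustration; this is not the kernel implementation.

```c
#include <limits.h>

#define NR_NODES 4

/* Hypothetical SLIT-like distance table; node 3 is the offline PXM node. */
static const int example_dist[NR_NODES][NR_NODES] = {
    { 10, 20, 40, 20 },
    { 20, 10, 20, 30 },
    { 40, 20, 10, 15 },
    { 20, 30, 15, 10 },
};
static const int example_online[NR_NODES] = { 1, 1, 1, 0 };

/* Buggy: 'node' is reassigned inside the loop, so later distances are
 * measured from the last candidate instead of from the original node. */
static int nearest_online_buggy(int node, const int (*dist)[NR_NODES],
                                const int *online)
{
    int min_dist = INT_MAX;

    if (online[node])
        return node;
    for (int n = 0; n < NR_NODES; n++) {
        if (!online[n])
            continue;
        if (dist[node][n] < min_dist) {
            min_dist = dist[node][n];
            node = n;
        }
    }
    return node;
}

/* Fixed: track the best candidate separately and keep 'node' unchanged. */
static int nearest_online_fixed(int node, const int (*dist)[NR_NODES],
                                const int *online)
{
    int min_dist = INT_MAX;
    int min_node = node;

    if (online[node])
        return node;
    for (int n = 0; n < NR_NODES; n++) {
        if (!online[n])
            continue;
        if (dist[node][n] < min_dist) {
            min_dist = dist[node][n];
            min_node = n;
        }
    }
    return min_node;
}
```

For node 3 in this table, the nearest online node by distance is node 2 (distance 15). The buggy variant settles on node 0 after the first iteration and then compares distances from node 0, so it never considers the 3-to-2 distance.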
     

27 Nov, 2017

1 commit

  • In the current implementation, parsing of the SRAT Memory Affinity
    Structure table is limited to the maximum number of memblocks allowed
    (NR_NODE_MEMBLKS). However, NR_NODE_MEMBLKS is defined individually
    per architecture. Hence, remove the restriction on SRAT Memory
    Affinity Structure parsing from the ACPI driver code and let
    architecture code check the allowed memblock count.

    This check is already there in the x86 code, so do the same on ia64.

    Signed-off-by: Ganapatrao Kulkarni
    Acked-by: Tony Luck
    Signed-off-by: Rafael J. Wysocki

    Ganapatrao Kulkarni
     

25 Jul, 2017

1 commit


15 Dec, 2016

1 commit

  • acpi_map_pxm_to_node() unconditionally maps nodes even when NUMA is turned
    off. So acpi_get_node() might return a node > 0, which is fatal when NUMA
    is disabled as the rest of the kernel assumes that only node 0 exists.

    Expose numa_off to the acpi code and return NUMA_NO_NODE when it's set.

    Signed-off-by: Boris Ostrovsky
    Cc: fenghua.yu@intel.com
    Cc: tony.luck@intel.com
    Cc: linux-ia64@vger.kernel.org
    Cc: catalin.marinas@arm.com
    Cc: rjw@rjwysocki.net
    Cc: will.deacon@arm.com
    Cc: linux-acpi@vger.kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: lenb@kernel.org
    Link: http://lkml.kernel.org/r/1481602709-18260-1-git-send-email-boris.ostrovsky@oracle.com
    Signed-off-by: Thomas Gleixner

    Boris Ostrovsky
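
A simplified sketch of the guard this commit describes, assuming a small static PXM-to-node table. The table contents, `MAX_PXM` and the function name `map_pxm_to_node()` are illustrative only and do not reproduce the kernel's data structures.

```c
#define NUMA_NO_NODE (-1)
#define MAX_PXM 4

/* Set to nonzero when NUMA is disabled (e.g. by a boot option). */
static int numa_off;

/* Hypothetical mapping table: PXM i maps to node i. */
static const int pxm_to_node_map[MAX_PXM] = { 0, 1, 2, 3 };

/* Refuse to hand out any node when NUMA is off, so callers never see
 * a node other than the implicit node 0. */
static int map_pxm_to_node(int pxm)
{
    if (pxm < 0 || pxm >= MAX_PXM || numa_off)
        return NUMA_NO_NODE;
    return pxm_to_node_map[pxm];
}
```

Checking `numa_off` at the mapping layer means every caller that goes through the mapping gets the same answer, rather than each call site having to test for the disabled case.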
     

22 Jun, 2016

1 commit

  • Add function needed for cpu to node mapping, and enable ACPI based
    NUMA for ARM64 in Kconfig

    Signed-off-by: Hanjun Guo
    Signed-off-by: Robert Richter
    [david.daney@cavium.com added ACPI_NUMA default to y for ARM64]
    Signed-off-by: David Daney
    Acked-by: Catalin Marinas
    Signed-off-by: Rafael J. Wysocki

    Hanjun Guo
     

30 May, 2016

8 commits

  • Loosely based on code from Robert Richter and Hanjun Guo.

    Improve out of range node detection as well as allow for Larger SRAT
    entities.

    Add printing of nice messages.

    Signed-off-by: David Daney
    Signed-off-by: Rafael J. Wysocki

    David Daney
     
  • acpi_numa_memory_affinity_init() will be reused by arm64. Move it to
    drivers/acpi/numa.c to facilitate reuse.

    No code change.

    Signed-off-by: Hanjun Guo
    Signed-off-by: Robert Richter
    Signed-off-by: David Daney
    Signed-off-by: Rafael J. Wysocki

    Hanjun Guo
     
  • bad_srat() and srat_disabled() are shared by x86 and follow-on arm64
    patches. Move them to drivers/acpi/numa.c in preparation for arm64
    support.

    Signed-off-by: Hanjun Guo
    Signed-off-by: Robert Richter
    [david.daney@cavium.com moved definitions to drivers/acpi/numa.c]
    Signed-off-by: David Daney
    Signed-off-by: Rafael J. Wysocki

    David Daney
     
  • Identical implementations of acpi_numa_slit_init() are used by both
    x86 and follow-on arm64 support. Move it to drivers/acpi/numa.c, and
    guard with CONFIG_X86 || CONFIG_ARM64 because ia64 has its own
    architecture specific implementation.

    No code change.

    Signed-off-by: Hanjun Guo
    Signed-off-by: Robert Richter
    Signed-off-by: David Daney
    Signed-off-by: Rafael J. Wysocki

    Hanjun Guo
     
  • Since acpi_numa_arch_fixup() is only used in arch ia64, move it there
    to make a generic interface easier. This avoids empty function stubs
    or some complex kconfig options for x86 and arm64.

    Signed-off-by: Robert Richter
    Reviewed-by: Hanjun Guo
    Signed-off-by: David Daney
    Signed-off-by: Rafael J. Wysocki

    Robert Richter
     
  • The argument "header" for acpi_table_print_srat_entry()
    is always checked before the function is called, so checking
    it again is redundant. Remove the duplicate check.

    Signed-off-by: Hanjun Guo
    Signed-off-by: Robert Richter
    Signed-off-by: David Daney
    Signed-off-by: Rafael J. Wysocki

    Hanjun Guo
     
  • ACPI_DEBUG_PRINT is a bit fragile in acpi/numa.c. First, the component
    ACPI_NUMA (0x80000000) is not described in
    Documentation/acpi/debug.txt, and it is not even defined in the struct
    acpi_dlayer acpi_debug_layers table, so we cannot dynamically
    enable/disable it with /sys/modules/acpi/parameters/debug_layer. Second,
    ACPI_DEBUG_OUTPUT is controlled by ACPICA, which does not coordinate
    well with ACPI drivers.

    Replace ACPI_DEBUG_PRINT() with pr_debug() in this patch, as pr_debug()
    does the same thing for debugging purposes and makes the code much
    cleaner. Also remove the related code that is no longer needed once
    ACPI_DEBUG_PRINT() is gone.

    Signed-off-by: Hanjun Guo
    Signed-off-by: Robert Richter
    Signed-off-by: David Daney
    Signed-off-by: Rafael J. Wysocki

    Hanjun Guo
     
  • Just do some cleanups to replace printk with pr_fmt().

    Signed-off-by: Hanjun Guo
    Signed-off-by: Robert Richter
    Signed-off-by: David Daney
    Signed-off-by: Rafael J. Wysocki

    Hanjun Guo
     

22 Apr, 2016

1 commit

  • SRAT maps APIC IDs to proximity domain ids (PXM). Mapping from PXM to
    NUMA node ids is based on the order of entries in the SRAT table.
    The SRAT table has either just LAPIC entries or a mix of LAPIC and
    X2APIC entries. As long as there are only LAPIC entries, mapping from
    proximity domain id to NUMA node id is as assumed by BIOS. However, once
    APIC entries are mixed, X2APIC entries would be mapped first, which
    causes unexpected NUMA node mapping.

    To fix that, change parsing to check each entry against both LAPIC and
    X2APIC so mapping is in the SRAT/PXM order.

    This is a supplemental change to the fix made by commit d81056b5278
    (Handle apic/x2apic entries in MADT in correct order), using the
    mechanism introduced by 9b3fedd (ACPI / tables: Add acpi_subtable_proc
    to ACPI table parsers).

    Fixes: d81056b5278 (Handle apic/x2apic entries in MADT in correct order)
    Signed-off-by: Lukasz Anaczkowski
    [ rjw : Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Lukasz Anaczkowski
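
To illustrate why the parse order matters, here is a small sketch in which node ids are handed out in first-seen order of PXM. One parser makes a separate pass per entry type (X2APIC first), the other makes a single pass in table order. The entry structure and all names are hypothetical and much simpler than the real SRAT structures.

```c
#define MAX_PXM 8
#define NO_NODE (-1)

enum entry_type { ENTRY_LAPIC, ENTRY_X2APIC };

struct srat_entry {
    enum entry_type type;
    int pxm;
};

struct pxm_map {
    int node_of[MAX_PXM];
    int next_node;
};

static void map_init(struct pxm_map *m)
{
    for (int i = 0; i < MAX_PXM; i++)
        m->node_of[i] = NO_NODE;
    m->next_node = 0;
}

/* Assign the next free node id the first time a PXM is seen. */
static void map_pxm(struct pxm_map *m, int pxm)
{
    if (pxm < 0 || pxm >= MAX_PXM)
        return;
    if (m->node_of[pxm] == NO_NODE)
        m->node_of[pxm] = m->next_node++;
}

/* One pass per type, X2APIC first: node ids follow type order. */
static void parse_per_type(struct pxm_map *m,
                           const struct srat_entry *e, int n)
{
    static const enum entry_type order[] = { ENTRY_X2APIC, ENTRY_LAPIC };

    for (int t = 0; t < 2; t++)
        for (int i = 0; i < n; i++)
            if (e[i].type == order[t])
                map_pxm(m, e[i].pxm);
}

/* Single pass, each entry matched against both types: node ids
 * follow the order of entries in the table. */
static void parse_in_table_order(struct pxm_map *m,
                                 const struct srat_entry *e, int n)
{
    for (int i = 0; i < n; i++)
        if (e[i].type == ENTRY_LAPIC || e[i].type == ENTRY_X2APIC)
            map_pxm(m, e[i].pxm);
}
```

For a table containing LAPIC (PXM 0), X2APIC (PXM 1), LAPIC (PXM 2), the per-type parser gives PXM 1 node 0 and PXM 0 node 1, while the table-order parser maps PXM 0, 1, 2 to nodes 0, 1, 2.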
     

08 Jul, 2015

1 commit


26 Jun, 2015

1 commit

  • The kernel initializes CPU & memory's NUMA topology from ACPI
    SRAT table. Some other ACPI tables, such as NFIT and DMAR, also
    contain proximity IDs for their device's NUMA topology. This
    information can be used to improve performance of these devices.

    This patch introduces acpi_map_pxm_to_online_node(), which is
    similar to acpi_map_pxm_to_node(), but always returns an online
    node. When the mapped node from a given proximity ID is offline,
    it looks up the node distance table and returns the nearest
    online node.

    ACPI device drivers, which are called after the NUMA initialization
    has completed in the kernel, can call this interface to obtain their
    device NUMA topology from ACPI tables. Such drivers do not have to
    deal with offline nodes. A node may be offline when a device
    proximity ID is unique, SRAT memory entry does not exist, or NUMA is
    disabled, ex. "numa=off" on x86.

    This patch also moves the pxm range check from acpi_get_node() to
    acpi_map_pxm_to_node().

    Signed-off-by: Toshi Kani
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Dan Williams

    Toshi Kani
     

06 Feb, 2015

1 commit


04 Feb, 2014

4 commits


07 Dec, 2013

1 commit

  • Replace direct inclusions of <acpi/acpi.h>, <acpi/acpi_bus.h> and
    <acpi/acpi_drivers.h>, which are incorrect, with <linux/acpi.h>
    inclusions and remove some inclusions of those files that aren't
    necessary.

    First of all, <acpi/acpi.h>, <acpi/acpi_bus.h> and <acpi/acpi_drivers.h>
    should not be included directly from any files that are built for
    CONFIG_ACPI unset, because that generally leads to build warnings about
    undefined symbols in !CONFIG_ACPI builds. For CONFIG_ACPI set,
    <linux/acpi.h> includes those files and for CONFIG_ACPI unset it
    provides stub ACPI symbols to be used in that case.

    Second, there are ordering dependencies between those files that always
    have to be met. Namely, it is required that <acpi/acpi_bus.h> be included
    prior to <acpi/acpi_drivers.h> so that the acpi_pci_root declarations the
    latter depends on are always there. And <acpi/acpi.h>, which provides
    basic ACPICA type declarations, should always be included prior to any other
    ACPI headers in CONFIG_ACPI builds. That also is taken care of by including
    <linux/acpi.h> as appropriate.

    Signed-off-by: Lv Zheng
    Cc: Greg Kroah-Hartman
    Cc: Matthew Garrett
    Cc: Tony Luck
    Cc: "H. Peter Anvin"
    Acked-by: Bjorn Helgaas (drivers/pci stuff)
    Acked-by: Konrad Rzeszutek Wilk (Xen stuff)
    Signed-off-by: Rafael J. Wysocki

    Lv Zheng
     

24 Sep, 2013

1 commit


13 Aug, 2013

1 commit


03 Mar, 2013

1 commit

  • Tim found:

    WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
    Hardware name: S2600CP
    sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
    smpboot: Booting Node 1, Processors #1
    Modules linked in:
    Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
    Call Trace:
    set_cpu_sibling_map+0x279/0x449
    start_secondary+0x11d/0x1e5

    Don Morris reproduced on a HP z620 workstation, and bisected it to
    commit e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock
    is ready")

    It turns out movable_map has some problems, and it breaks several things

    1. numa_init is called several times, NOT just for srat. so those
    nodes_clear(numa_nodes_parsed)
    memset(&numa_meminfo, 0, sizeof(numa_meminfo))
    can not be just removed. Need to consider sequence is: numaq, srat, amd, dummy.
    and make fall back path working.

    2. simply split acpi_numa_init to early_parse_srat.
    a. that early_parse_srat is NOT called for ia64, so you break ia64.
    b. for (i = 0; i < MAX_LOCAL_APIC; i++)
    set_apicid_to_node(i, NUMA_NO_NODE)
    still left in numa_init. So it will just clear result from early_parse_srat.
    it should be moved before that....
    c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
    early before override from INITRD is settled.

    3. that patch TITLE is totally misleading, there is NO x86 in the title,
    but it changes critical x86 code. As a result, the x86 guys did not
    pay attention and find the problem early. Those patches really should
    be routed via tip/x86/mm.

    4. after that commit, following range can not use movable ram:
    a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
    b. initrd... it will be freed after booting, so it could be on movable...
    c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
    anymore.
    d. init_mem_mapping: can not put page table high anymore.
    e. initmem_init: vmemmap can not be high local node anymore. That is
    not good.

    If a node is hotpluggable, the mem related ranges like page table and
    vmemmap could be on that node without problem and should be on that
    node.

    We have a workaround patch that could fix some problems, but some cannot
    be fixed.

    So just remove that offending commit and related ones including:

    f7210e6c4ac7 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
    protect movablecore_map in memblock_overlaps_region().")

    01a178a94e8e ("acpi, memory-hotplug: support getting hotplug info from
    SRAT")

    27168d38fa20 ("acpi, memory-hotplug: extend movablemem_map ranges to
    the end of node")

    e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is
    ready")

    fb06bc8e5f42 ("page_alloc: bootmem limit with movablecore_map")

    42f47e27e761 ("page_alloc: make movablemem_map have higher priority")

    6981ec31146c ("page_alloc: introduce zone_movable_limit[] to keep
    movable limit for nodes")

    34b71f1e04fc ("page_alloc: add movable_memmap kernel parameter")

    4d59a75125d5 ("x86: get pg_data_t's memory from other node")

    Later we should have patches that will make sure the kernel puts page
    table and vmemmap on local node ram instead of pushing them down to node0.
    Also need to find a way to put other kernel used ram on local node ram.

    Reported-by: Tim Gardner
    Reported-by: Don Morris
    Bisected-by: Don Morris
    Tested-by: Don Morris
    Signed-off-by: Yinghai Lu
    Cc: Tony Luck
    Cc: Thomas Renninger
    Cc: Tejun Heo
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

24 Feb, 2013

1 commit

  • On Linux, pages used by the kernel cannot be migrated. As a result,
    if a memory range is used by the kernel, it cannot be hot-removed. So if
    we want to hot-remove memory, we should prevent the kernel from using it.

    The way currently used to prevent this is to specify a memory range with
    the movablemem_map boot option and set it as ZONE_MOVABLE.

    But while the system is booting, memblock allocates memory and reserves
    it for the kernel. memblock is already working before we parse SRAT and
    know the node memory ranges, so it may allocate memory in ranges that are
    to be set as ZONE_MOVABLE. This memory can be used by the kernel and will
    never be freed.

    So, let's parse SRAT before memblock is called for the first time, which
    is early enough.

    The first call of memblock_find_in_range_node() is in:

    setup_arch()
    |-->setup_real_mode()

    so this patch adds a function early_parse_srat() to parse SRAT, and calls
    it before setup_real_mode() is called.

    NOTE:

    1) early_parse_srat() is called before numa_init(), and has initialized
    numa_meminfo. So DO NOT clear numa_nodes_parsed in numa_init() and DO
    NOT zero numa_meminfo in numa_init(), otherwise we will lose memory
    numa info.

    2) I don't know why the original acpi_numa_init() uses the count of
    memory affinities parsed from SRAT as its return value. So I add a static
    variable srat_mem_cnt to remember this count and use it as the return
    value of the new acpi_numa_init()

    [mhocko@suse.cz: parse SRAT before memblock is ready fix]
    Signed-off-by: Tang Chen
    Reviewed-by: Wen Congyang
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Len Brown
    Cc: "Brown, Len"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     

15 Feb, 2013

1 commit

  • * acpi-assorted:
    ACPI: Add DMI entry for Sony VGN-FW41E_H
    ACPI: fix obsolete comment in custom_method.c
    ACPI / thermal: Use mode to enable/disable kernel thermal processing
    ACPI thermal: remove unnecessary newline from exception message
    ACPI sysfs: remove unnecessary newline from exception
    ACPI video: remove unnecessary newline from error messages
    ACPI: SRAT: report non-volatile memory in debug
    ACPI: Rework acpi_get_child() to be more efficient

    Rafael J. Wysocki
     

26 Jan, 2013

1 commit


11 Jan, 2013

1 commit

  • This is a cosmetic patch only. Comparison of the resulting binary showed
    only line number differences.

    This patch does not affect the generation of the Linux binary.
    This patch decreases 44 lines of 20121114 divergence.diff.

    There are naming conflicts between Linux and ACPICA on table handlers. This
    patch cleans up these conflicts to reduce the source code diff between Linux
    and ACPICA.

    Signed-off-by: Lv Zheng
    Signed-off-by: Rafael J. Wysocki

    Lv Zheng
     

03 Aug, 2012

2 commits


17 Jan, 2012

1 commit

  • In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2 provides
    32 bits for these. The new fields were reserved before.
    According to the ACPI spec, the OS must disregard reserved fields.
    In order to know whether or not the fields are reserved, we must know
    which version the SRAT table has.

    This patch stores the SRAT table revision for later consumption
    by arch specific __init functions.

    Signed-off-by: Kurt Garloff
    Signed-off-by: Len Brown

    Kurt Garloff
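
A sketch of how a consumer might use the stored revision, assuming the proximity domain is read as a single 32-bit field whose upper 24 bits were reserved in SRAT v1. The function name `effective_pxm()` and the way the field is passed in are assumptions for illustration, not the arch code that this commit prepares for.

```c
#include <stdint.h>

/* Return the proximity domain an OS should use, given the raw 32-bit
 * field and the SRAT table revision. For revision 1 only the low
 * 8 bits are defined; the rest is reserved and must be ignored. */
static uint32_t effective_pxm(uint32_t raw_field, uint8_t srat_revision)
{
    if (srat_revision <= 1)
        return raw_field & 0xff;
    return raw_field;
}
```

Without knowing the revision, firmware that leaves garbage in the reserved bytes of a v1 table would produce bogus proximity domains, which is why the revision has to be recorded at parse time.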
     

16 Feb, 2011

1 commit

  • The functions used during NUMA initialization - *_numa_init() and
    *_scan_nodes() - have different arguments and return values. Unify
    them such that they all take no argument and return 0 on success and
    -errno on failure. This is in preparation for further NUMA init
    cleanups.

    Signed-off-by: Tejun Heo
    Cc: Yinghai Lu
    Cc: Brian Gerst
    Cc: Cyrill Gorcunov
    Cc: Shaohui Zheng
    Cc: David Rientjes
    Cc: Ingo Molnar
    Cc: H. Peter Anvin

    Tejun Heo
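
Once every init method takes no argument and returns 0 or -errno, a caller can treat them uniformly. The sketch below shows one way that might look, with a hypothetical `try_numa_init()` that walks a list of methods and falls back to the next one on failure. The stub methods exist only so the example is self-contained.

```c
#include <errno.h>
#include <stddef.h>

/* Common signature: no arguments, 0 on success, -errno on failure. */
typedef int (*numa_init_fn)(void);

static int method_calls;

/* Stub that always fails, standing in for a method whose table is absent. */
static int failing_method(void)
{
    method_calls++;
    return -EINVAL;
}

/* Stub that always succeeds, standing in for a last-resort fallback. */
static int dummy_method(void)
{
    method_calls++;
    return 0;
}

/* Try each method in order; stop at the first success.
 * Returns 0 on success or the last error seen. */
static int try_numa_init(const numa_init_fn *methods, size_t n)
{
    int ret = -ENOENT;

    for (size_t i = 0; i < n; i++) {
        ret = methods[i]();
        if (ret == 0)
            return 0;
    }
    return ret;
}
```

Because the signature is shared, adding or reordering methods only changes the array passed to the helper, not the calling code.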
     

12 Jan, 2011

1 commit

  • As pointed out by Linus, CONFIG_X86 in drivers/acpi/numa.c is
    ugly.

    Builds and boots on ia64 (both normally and with maxcpus=8 to limit
    the number of cpus).

    Signed-off-by: Tony Luck
    Acked-by: Yinghai Lu
    Cc: Linus Torvalds
    Cc: Wu Fengguang
    Cc: Bjorn Helgaas
    Cc: Len Brown
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tony Luck