31 May, 2019
1 commit
-
Based on 3 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version [author] [kishon] [vijay] [abraham]
[i] [kishon]@[ti] [com] this program is distributed in the hope that
it will be useful but without any warranty without even the implied
warranty of merchantability or fitness for a particular purpose see
the gnu general public license for more details

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version [author] [graeme] [gregory]
[gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
[kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
[hk] [hemahk]@[ti] [com] this program is distributed in the hope
that it will be useful but without any warranty without even the
implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details

extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1105 file(s).
Signed-off-by: Thomas Gleixner
Reviewed-by: Allison Randal
Reviewed-by: Richard Fontana
Reviewed-by: Kate Stewart
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de
Signed-off-by: Greg Kroah-Hartman
05 Apr, 2019
1 commit
-
Parsing entries in an ACPI table had assumed a generic header
structure. There is no standard ACPI header, though, so less common
layouts with different field sizes required custom parsers to go through
their subtable entry list.

Create the infrastructure for adding different table types so that the
code parsing the entries array can be reused for all ACPI system tables
and the common code doesn't need to be duplicated.

Reviewed-by: Rafael J. Wysocki
Acked-by: Jonathan Cameron
Tested-by: Jonathan Cameron
Signed-off-by: Keith Busch
Tested-by: Brice Goglin
Signed-off-by: Greg Kroah-Hartman
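
The idea behind this change can be sketched in plain C. This is an illustrative model only, not the kernel code: the names `hdr_common`, `hdr_wide`, `subtable_kind`, `subtable_entry` and `count_entries()` are invented for the example, and the two header layouts are assumptions chosen to show differing field sizes. A tag on each entry selects how its header is decoded, so a single walker serves both layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical subtable header layouts with different field sizes. */
struct hdr_common {
    uint8_t type;
    uint8_t length;
};

struct hdr_wide {
    uint16_t type;
    uint16_t reserved;
    uint32_t length;
};

enum subtable_kind { SUBTABLE_COMMON, SUBTABLE_WIDE };

/* One entry plus a tag that says how its header must be decoded. */
struct subtable_entry {
    const uint8_t *raw;
    enum subtable_kind kind;
};

static size_t header_size(enum subtable_kind kind)
{
    return kind == SUBTABLE_COMMON ? sizeof(struct hdr_common)
                                   : sizeof(struct hdr_wide);
}

static uint32_t entry_type(const struct subtable_entry *e)
{
    if (e->kind == SUBTABLE_COMMON) {
        struct hdr_common h;
        memcpy(&h, e->raw, sizeof(h));
        return h.type;
    } else {
        struct hdr_wide h;
        memcpy(&h, e->raw, sizeof(h));
        return h.type;
    }
}

static uint32_t entry_length(const struct subtable_entry *e)
{
    if (e->kind == SUBTABLE_COMMON) {
        struct hdr_common h;
        memcpy(&h, e->raw, sizeof(h));
        return h.length;
    } else {
        struct hdr_wide h;
        memcpy(&h, e->raw, sizeof(h));
        return h.length;
    }
}

/*
 * Shared walker: counts entries of wanted_type in buf.
 * Returns -1 if an entry length is shorter than its header or
 * runs past the end of the buffer.
 */
static int count_entries(const uint8_t *buf, size_t size,
                         enum subtable_kind kind, uint32_t wanted_type)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL);
    while (size - off >= header_size(kind)) {
        struct subtable_entry e = { buf + off, kind };
        uint32_t len = entry_length(&e);

        if (len < header_size(kind) || len > size - off)
            return -1;
        if (entry_type(&e) == wanted_type)
            count++;
        off += len;
    }
    return count;
}
```

Only `entry_type()` and `entry_length()` know about the layouts; the loop in `count_entries()` is written once, which is the kind of reuse the commit message describes.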
17 Mar, 2019
1 commit
-
Pull device-dax updates from Dan Williams:
"New device-dax infrastructure to allow persistent memory and other
"reserved" / performance differentiated memories, to be assigned to
the core-mm as "System RAM".

Some users want to use persistent memory as additional volatile
memory. They are willing to cope with potential performance
differences, for example between DRAM and 3D Xpoint, and want to use
typical Linux memory management apis rather than a userspace memory
allocator layered over an mmap() of a dax file. The administration
model is to decide how much Persistent Memory (pmem) to use as System
RAM, create a device-dax-mode namespace of that size, and then assign
it to the core-mm. The rationale for device-dax is that it is a
generic memory-mapping driver that can be layered over any "special
purpose" memory, not just pmem. On subsequent boots udev rules can be
used to restore the memory assignment.

One implication of using pmem as RAM is that mlock() no longer keeps
data off persistent media. For this reason it is recommended to enable
NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
at rest. We considered making this recommendation an actively enforced
requirement, but in the end decided to leave it as a distribution /
administrator policy to allow for emulation and test environments that
lack security capable NVDIMMs.

Summary:

- Replace the /sys/class/dax device model with /sys/bus/dax, and
  include a compat driver so distributions can opt-in to the new ABI.

- Allow for an alternative driver for the device-dax address-range

- Introduce the 'kmem' driver to hotplug / assign a device-dax
  address-range to the core-mm.

- Arrange for the device-dax target-node to be onlined so that the
  newly added memory range can be uniquely referenced by numa apis"

NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.

And apparently the PMEM modules can cause that a lot more than regular
RAM. The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.

Quoting Dan from another email:
"The exposure can be reduced in the volatile-RAM case by scanning for
and clearing errors before it is onlined as RAM. The userspace tooling
for that can be in place before v5.1-final. There's also runtime
notifications of errors via acpi_nfit_uc_error_notify() from
background scrubbers on the DIMM devices. With that mechanism the
kernel could proactively clear newly discovered poison in the volatile
case, but that would be additional development more suitable for v5.2.

I understand the concern, and the need to highlight this issue by
tapping the brakes on feature development, but I don't see PMEM as RAM
making the situation worse when the exposure is also there via DAX in
the PMEM case. Volatile-RAM is arguably a safer use case since it's
possible to repair pages where the persistent case needs active
application coordination"

* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: "Hotplug" persistent memory for use like normal RAM
mm/resource: Let walk_system_ram_range() search child resources
mm/memory-hotplug: Allow memory resources to be children
mm/resource: Move HMM pr_debug() deeper into resource code
mm/resource: Return real error codes from walk failures
device-dax: Add a 'modalias' attribute to DAX 'bus' devices
device-dax: Add a 'target_node' attribute
device-dax: Auto-bind device after successful new_id
acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
device-dax: Add /sys/class/dax backwards compatibility
device-dax: Add support for a dax override driver
device-dax: Move resource pinning+mapping into the common driver
device-dax: Introduce bus + driver model
device-dax: Start defining a dax bus model
device-dax: Remove multi-resource infrastructure
device-dax: Kill dax_region base
device-dax: Kill dax_region ida
07 Jan, 2019
1 commit
-
Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
Interface Table), is the first known instance of a memory range
described by a unique "target" proximity domain. "Initiator" and
"target" proximity domains are the approach that the ACPI HMAT
(Heterogeneous Memory Attributes Table) uses to describe the unique
performance properties of a memory range relative to a given initiator
(e.g. CPU or DMA device).

Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
char-device follows the traditional notion of 'numa-node' where the
attribute conveys the closest online numa-node. That numa-node attribute
is useful for cpu-binding and memory-binding processes *near* the
device. However, when the memory range backing a 'pmem', or 'dax' device
is onlined (memory hot-add) the memory-only-numa-node representing that
address needs to be differentiated from the set of online nodes. In
other words, the numa-node association of the device depends on whether
you can bind processes *near* the cpu-numa-node in the offline
device-case, or bind process *on* the memory-range directly after the
backing address range is onlined.

Allow for the case that platform firmware describes persistent memory
with a unique proximity domain, i.e. when it is distinct from the
proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
numa-node translation of that proximity through the libnvdimm region
device to namespaces that are in device-dax mode. With this in place the
proposed kmem driver [1] can optionally discover a unique numa-node
number for the address range as it transitions the memory from an
offline state managed by a device-driver to an online memory range
managed by the core-mm.

[1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com
Reported-by: Fan Du
Cc: Michael Ellerman
Cc: "Oliver O'Halloran"
Cc: Dave Hansen
Cc: Jérôme Glisse
Reviewed-by: Yang Shi
Signed-off-by: Dan Williams
03 Jan, 2019
1 commit
-
The addresses of NUMA nodes are not printed correctly on i386-PAE,
which is misleading.

Here is a debian9-32bit with PAE in a QEMU guest having more than 4G
of memory:

qemu-system-i386 \
-hda /var/lib/libvirt/images/debian32.qcow2 \
-m 5G \
-enable-kvm \
-smp 10 \
-numa node,mem=512M,nodeid=0,cpus=0 \
-numa node,mem=512M,nodeid=1,cpus=1 \
-numa node,mem=512M,nodeid=2,cpus=2 \
-numa node,mem=512M,nodeid=3,cpus=3 \
-numa node,mem=512M,nodeid=4,cpus=4 \
-numa node,mem=512M,nodeid=5,cpus=5 \
-numa node,mem=512M,nodeid=6,cpus=6 \
-numa node,mem=512M,nodeid=7,cpus=7 \
-numa node,mem=512M,nodeid=8,cpus=8 \
-numa node,mem=512M,nodeid=9,cpus=9 \
-serial stdio

Because of the wrong value type, it prints as below:
[ 0.021049] ACPI: SRAT Memory (0x0 length 0xa0000) in proximity domain 0 enabled
[ 0.021740] ACPI: SRAT Memory (0x100000 length 0x1ff00000) in proximity domain 0 enabled
[ 0.022425] ACPI: SRAT Memory (0x20000000 length 0x20000000) in proximity domain 1 enabled
[ 0.023092] ACPI: SRAT Memory (0x40000000 length 0x20000000) in proximity domain 2 enabled
[ 0.023764] ACPI: SRAT Memory (0x60000000 length 0x20000000) in proximity domain 3 enabled
[ 0.024431] ACPI: SRAT Memory (0x80000000 length 0x20000000) in proximity domain 4 enabled
[ 0.025104] ACPI: SRAT Memory (0xa0000000 length 0x20000000) in proximity domain 5 enabled
[ 0.025791] ACPI: SRAT Memory (0x0 length 0x20000000) in proximity domain 6 enabled
[ 0.026412] ACPI: SRAT Memory (0x20000000 length 0x20000000) in proximity domain 7 enabled
[ 0.027118] ACPI: SRAT Memory (0x40000000 length 0x20000000) in proximity domain 8 enabled
[ 0.027802] ACPI: SRAT Memory (0x60000000 length 0x20000000) in proximity domain 9 enabled

The upper half of the start address of the NUMA domains between 6
and 9 inclusive was cut, so the printed values are incorrect.

Fix the value type, to get the correct values in the log as follows:
[ 0.023698] ACPI: SRAT Memory (0x0 length 0xa0000) in proximity domain 0 enabled
[ 0.024325] ACPI: SRAT Memory (0x100000 length 0x1ff00000) in proximity domain 0 enabled
[ 0.024981] ACPI: SRAT Memory (0x20000000 length 0x20000000) in proximity domain 1 enabled
[ 0.025659] ACPI: SRAT Memory (0x40000000 length 0x20000000) in proximity domain 2 enabled
[ 0.026317] ACPI: SRAT Memory (0x60000000 length 0x20000000) in proximity domain 3 enabled
[ 0.026980] ACPI: SRAT Memory (0x80000000 length 0x20000000) in proximity domain 4 enabled
[ 0.027635] ACPI: SRAT Memory (0xa0000000 length 0x20000000) in proximity domain 5 enabled
[ 0.028311] ACPI: SRAT Memory (0x100000000 length 0x20000000) in proximity domain 6 enabled
[ 0.028985] ACPI: SRAT Memory (0x120000000 length 0x20000000) in proximity domain 7 enabled
[ 0.029667] ACPI: SRAT Memory (0x140000000 length 0x20000000) in proximity domain 8 enabled
[ 0.030334] ACPI: SRAT Memory (0x160000000 length 0x20000000) in proximity domain 9 enabled

Signed-off-by: Chao Fan
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki
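
The truncation described above can be reproduced outside the kernel. The sketch below is an assumption-laden model rather than the patched code: `narrow_to_32bit()` stands in for a 32-bit `unsigned long` on i386, and the two formatting helpers are hypothetical names used only to contrast the log output before and after the fix.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Models a 32-bit unsigned long on i386: the upper 32 bits are dropped. */
static uint32_t narrow_to_32bit(uint64_t addr)
{
    return (uint32_t)addr;
}

/* Formats a start address the way the log looked before the fix. */
static int format_start_truncated(char *buf, size_t len, uint64_t start)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x%" PRIx32, narrow_to_32bit(start));
}

/* Formats a start address with a 64-bit type, as after the fix. */
static int format_start_full(char *buf, size_t len, uint64_t start)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x%" PRIx64, start);
}
```

With the start address of proximity domain 7 from the logs above (0x120000000), the truncated helper yields 0x20000000 and the full-width helper yields 0x120000000, matching the two log excerpts.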
31 Oct, 2018
1 commit
-
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>'

@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>

[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport
Signed-off-by: Stephen Rothwell
Acked-by: Michal Hocko
Cc: Catalin Marinas
Cc: Chris Zankel
Cc: "David S. Miller"
Cc: Geert Uytterhoeven
Cc: Greentime Hu
Cc: Greg Kroah-Hartman
Cc: Guan Xuetao
Cc: Ingo Molnar
Cc: "James E.J. Bottomley"
Cc: Jonas Bonn
Cc: Jonathan Corbet
Cc: Ley Foon Tan
Cc: Mark Salter
Cc: Martin Schwidefsky
Cc: Matt Turner
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Palmer Dabbelt
Cc: Paul Burton
Cc: Richard Kuo
Cc: Richard Weinberger
Cc: Rich Felker
Cc: Russell King
Cc: Serge Semin
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Vineet Gupta
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Mar, 2018
1 commit
-
Commit 99759869faf1 "acpi: Add acpi_map_pxm_to_online_node()" added
support for mapping a given proximity to its nearest, by SLIT distance,
online node. However, it sometimes returns unexpected results due to the
fact that it switches from comparing the PXM node to the last node that
was closer than the current max.

    for_each_online_node(n) {
            dist = node_distance(node, n);
            if (dist < min_dist) {
                    min_dist = dist;
                    node = n;
            }
    }

Reviewed-by: Toshi Kani
Acked-by: Rafael J. Wysocki
Signed-off-by: Dan Williams
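
The failure mode can be modelled with a small standalone program. The excerpt above does not show the actual fix, so the corrected variant here is an assumption: it keeps the original node as the fixed reference point and tracks the best candidate in a separate variable. The distance table, the online mask and both function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 4

/* Hypothetical SLIT-style distances; node 0 is the offline PXM node. */
static const int node_dist[MAX_NODES][MAX_NODES] = {
    { 10, 30, 20, 25 },
    { 30, 10, 40, 15 },
    { 20, 40, 10, 40 },
    { 25, 15, 40, 10 },
};
static const bool node_online[MAX_NODES] = { false, true, true, true };

/* Mirrors the loop quoted above: the reference node changes mid-search. */
static int nearest_online_buggy(int node)
{
    int min_dist = INT_MAX;
    int n;

    assert(node >= 0 && node < MAX_NODES);
    if (node_online[node])
        return node;
    for (n = 0; n < MAX_NODES; n++) {
        int dist;

        if (!node_online[n])
            continue;
        dist = node_dist[node][n];
        if (dist < min_dist) {
            min_dist = dist;
            node = n;   /* later distances are measured from n */
        }
    }
    return node;
}

/* Assumed correction: always measure from the original node. */
static int nearest_online_fixed(int node)
{
    int min_dist = INT_MAX;
    int min_node = node;
    int n;

    assert(node >= 0 && node < MAX_NODES);
    if (node_online[node])
        return node;
    for (n = 0; n < MAX_NODES; n++) {
        int dist;

        if (!node_online[n])
            continue;
        dist = node_dist[node][n];
        if (dist < min_dist) {
            min_dist = dist;
            min_node = n;
        }
    }
    return min_node;
}
```

With this table, node 2 is the closest online node to node 0 (distance 20). The buggy loop first moves its reference to node 1 and then settles on node 3, because node 3 is close to node 1 rather than to node 0.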
27 Nov, 2017
1 commit
-
In the current implementation, SRAT Memory Affinity Structure table
parsing is restricted to the maximum number of memblocks allowed
(NR_NODE_MEMBLKS). However, NR_NODE_MEMBLKS is defined individually
as per architecture requirements. Hence remove the restriction on
SRAT Memory Affinity Structure parsing from the ACPI driver code and
let architecture code check the allowed memblocks count.

This check is already there in the x86 code, so do the same on ia64.
Signed-off-by: Ganapatrao Kulkarni
Acked-by: Tony Luck
Signed-off-by: Rafael J. Wysocki
25 Jul, 2017
1 commit
-
To save someone the time of searching the ACPI spec for
"Static Resource Affinity Table".

Signed-off-by: Ross Zwisler
Signed-off-by: Rafael J. Wysocki
15 Dec, 2016
1 commit
-
acpi_map_pxm_to_node() unconditionally maps nodes even when NUMA is turned
off. So acpi_get_node() might return a node > 0, which is fatal when NUMA
is disabled as the rest of the kernel assumes that only node 0 exists.

Expose numa_off to the acpi code and return NUMA_NO_NODE when it's set.
Signed-off-by: Boris Ostrovsky
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: linux-ia64@vger.kernel.org
Cc: catalin.marinas@arm.com
Cc: rjw@rjwysocki.net
Cc: will.deacon@arm.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1481602709-18260-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Thomas Gleixner
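
A minimal sketch of the guard described above, assuming a flat lookup table. The variable `numa_off` here is a local stand-in for the architecture flag, and the table-based `map_pxm_to_node()` helper is hypothetical; the real mapping code is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Stand-in for the flag that the architecture code exposes. */
static bool numa_off;

/*
 * Hypothetical lookup: table[pxm] holds the node for a proximity domain.
 * When NUMA is off, no mapping is handed out at all.
 */
static int map_pxm_to_node(const int *table, int nr_pxm, int pxm)
{
    assert(table != NULL);
    if (numa_off || pxm < 0 || pxm >= nr_pxm)
        return NUMA_NO_NODE;
    return table[pxm];
}
```

Callers that already treat NUMA_NO_NODE as "no affinity information" then need no extra handling for the NUMA-disabled case.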
22 Jun, 2016
1 commit
-
Add function needed for cpu to node mapping, and enable ACPI based
NUMA for ARM64 in Kconfig

Signed-off-by: Hanjun Guo
Signed-off-by: Robert Richter
[david.daney@cavium.com added ACPI_NUMA default to y for ARM64]
Signed-off-by: David Daney
Acked-by: Catalin Marinas
Signed-off-by: Rafael J. Wysocki
30 May, 2016
8 commits
-
Loosely based on code from Robert Richter and Hanjun Guo.
Improve out of range node detection as well as allow for larger SRAT
entities.

Add printing of nice messages.
Signed-off-by: David Daney
Signed-off-by: Rafael J. Wysocki
-
acpi_numa_memory_affinity_init() will be reused by arm64. Move it to
drivers/acpi/numa.c to facilitate reuse.

No code change.
Signed-off-by: Hanjun Guo
Signed-off-by: Robert Richter
Signed-off-by: David Daney
Signed-off-by: Rafael J. Wysocki
-
bad_srat() and srat_disabled() are shared by x86 and follow-on arm64
patches. Move them to drivers/acpi/numa.c in preparation for arm64
support.

Signed-off-by: Hanjun Guo
Signed-off-by: Robert Richter
[david.daney@cavium.com moved definitions to drivers/acpi/numa.c]
Signed-off-by: David Daney
Signed-off-by: Rafael J. Wysocki
-
Identical implementations of acpi_numa_slit_init() are used by both
x86 and follow-on arm64 support. Move it to drivers/acpi/numa.c, and
guard with CONFIG_X86 || CONFIG_ARM64 because ia64 has its own
architecture specific implementation.

No code change.
Signed-off-by: Hanjun Guo
Signed-off-by: Robert Richter
Signed-off-by: David Daney
Signed-off-by: Rafael J. Wysocki
-
Since acpi_numa_arch_fixup() is only used in arch ia64, move it there
to make a generic interface easier. This avoids empty function stubs
or some complex kconfig options for x86 and arm64.

Signed-off-by: Robert Richter
Reviewed-by: Hanjun Guo
Signed-off-by: David Daney
Signed-off-by: Rafael J. Wysocki
-
The argument "header" of acpi_table_print_srat_entry() is always
checked before the function is called, so checking it again is
redundant. Remove the duplicate check.

Signed-off-by: Hanjun Guo
Signed-off-by: Robert Richter
Signed-off-by: David Daney
Signed-off-by: Rafael J. Wysocki
-
ACPI_DEBUG_PRINT is a bit fragile in acpi/numa.c. First, the
component ACPI_NUMA (0x80000000) is not described in
Documentation/acpi/debug.txt and is not even defined in the struct
acpi_dlayer acpi_debug_layers array, so it cannot be dynamically
enabled/disabled with /sys/module/acpi/parameters/debug_layer. Second,
ACPI_DEBUG_OUTPUT is controlled by ACPICA, which does not coordinate
well with ACPI drivers.

Replace ACPI_DEBUG_PRINT() with pr_debug() in this patch, as pr_debug()
does the same thing for debug purposes and makes the code much
cleaner. Also remove the related code that is no longer needed once
ACPI_DEBUG_PRINT() is gone.

Signed-off-by: Hanjun Guo
Signed-off-by: Robert Richter
Signed-off-by: David Daney
Signed-off-by: Rafael J. Wysocki
-
Just do some cleanups to replace printk with pr_fmt().
Signed-off-by: Hanjun Guo
Signed-off-by: Robert Richter
Signed-off-by: David Daney
Signed-off-by: Rafael J. Wysocki
22 Apr, 2016
1 commit
-
SRAT maps APIC IDs to proximity domain ids (PXM). Mapping from PXM to
NUMA node ids is based on the order of entries in the SRAT table.
The SRAT table has either just LAPIC entries or a mix of LAPIC and
X2APIC entries. As long as there are only LAPIC entries, mapping from
proximity domain id to NUMA node id is as assumed by BIOS. However, once
APIC entries are mixed, X2APIC entries would be mapped first, which
causes unexpected NUMA node mapping.

To fix that, change parsing to check each entry against both LAPIC and
X2APIC so mapping is in the SRAT/PXM order.

This is a supplemental change to the fix made by commit d81056b5278
(Handle apic/x2apic entries in MADT in correct order) and uses the
mechanism introduced by 9b3fedd (ACPI / tables: Add acpi_subtable_proc
to ACPI table parsers).

Fixes: d81056b5278 (Handle apic/x2apic entries in MADT in correct order)
Signed-off-by: Lukasz Anaczkowski
[ rjw : Subject & changelog ]
Signed-off-by: Rafael J. Wysocki
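
The ordering problem can be illustrated with a toy model in which node ids are handed out in the order proximity domains are first seen. Everything here is an assumption made for the example (the `affinity_entry` and `pxm_map` types and both parse functions); it only contrasts one pass per entry kind, X2APIC first as described above, with a single pass in table order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PXM 8

enum affinity_kind { AFFINITY_LAPIC, AFFINITY_X2APIC };

/* Simplified SRAT processor affinity entry: kind plus proximity domain. */
struct affinity_entry {
    enum affinity_kind kind;
    int pxm;
};

/* Node ids are assigned in the order proximity domains are first seen. */
struct pxm_map {
    int node_of_pxm[MAX_PXM];
    int next_node;
};

static void pxm_map_init(struct pxm_map *m)
{
    int i;

    for (i = 0; i < MAX_PXM; i++)
        m->node_of_pxm[i] = -1;
    m->next_node = 0;
}

static int pxm_map_node(struct pxm_map *m, int pxm)
{
    assert(pxm >= 0 && pxm < MAX_PXM);
    if (m->node_of_pxm[pxm] < 0)
        m->node_of_pxm[pxm] = m->next_node++;
    return m->node_of_pxm[pxm];
}

/* Old behaviour as described: X2APIC entries are mapped first. */
static void parse_one_kind_at_a_time(struct pxm_map *m,
                                     const struct affinity_entry *e, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (e[i].kind == AFFINITY_X2APIC)
            pxm_map_node(m, e[i].pxm);
    for (i = 0; i < n; i++)
        if (e[i].kind == AFFINITY_LAPIC)
            pxm_map_node(m, e[i].pxm);
}

/* New behaviour: every entry is handled in table order, whatever its kind. */
static void parse_in_table_order(struct pxm_map *m,
                                 const struct affinity_entry *e, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        pxm_map_node(m, e[i].pxm);
}
```

For a table listing LAPIC entries for PXM 0 and 1 followed by X2APIC entries for PXM 2 and 3, the first function maps PXM 2 to node 0, while the second maps PXM 0 to node 0, which is the order the firmware table implies.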
08 Jul, 2015
1 commit
-
There is no need to carry potentially outdated Free Software Foundation
mailing address in file headers since the COPYING file includes it.

Signed-off-by: Jarkko Nikula
Signed-off-by: Rafael J. Wysocki
26 Jun, 2015
1 commit
-
The kernel initializes CPU & memory's NUMA topology from ACPI
SRAT table. Some other ACPI tables, such as NFIT and DMAR, also
contain proximity IDs for their device's NUMA topology. This
information can be used to improve performance of these devices.

This patch introduces acpi_map_pxm_to_online_node(), which is
similar to acpi_map_pxm_to_node(), but always returns an online
node. When the mapped node from a given proximity ID is offline,
it looks up the node distance table and returns the nearest
online node.

ACPI device drivers, which are called after the NUMA initialization
has completed in the kernel, can call this interface to obtain their
device NUMA topology from ACPI tables. Such drivers do not have to
deal with offline nodes. A node may be offline when a device
proximity ID is unique, SRAT memory entry does not exist, or NUMA is
disabled, ex. "numa=off" on x86.

This patch also moves the pxm range check from acpi_get_node() to
acpi_map_pxm_to_node().

Signed-off-by: Toshi Kani
Acked-by: Rafael J. Wysocki
Signed-off-by: Dan Williams
06 Feb, 2015
1 commit
-
In acpi_table_parse(), the pointer to the table passed to handler() is
checked before handler() is called, so remove all the duplicate NULL
checks in the handler functions.

CC: Tony Luck
CC: Thomas Gleixner
Signed-off-by: Hanjun Guo
Signed-off-by: Rafael J. Wysocki
04 Feb, 2014
4 commits
-
Use "__weak" instead of the gcc-specific "__attribute__ ((weak))".
Signed-off-by: Bjorn Helgaas
Acked-by: Rafael J. Wysocki
-
__acpi_map_pxm_to_node() and acpi_get_pxm() are only used within
drivers/acpi/numa.c. This makes them static and removes their
declarations.

Signed-off-by: Bjorn Helgaas
Acked-by: Rafael J. Wysocki
-
Simplify control flow by removing local variable initialization and
returning a constant as soon as possible. No functional change.

Signed-off-by: Bjorn Helgaas
Acked-by: Rafael J. Wysocki
-
acpi_get_node() takes an acpi_handle, not an "acpi_handle *". This
fixes the prototype and the definitions.

Signed-off-by: Bjorn Helgaas
Acked-by: Rafael J. Wysocki
07 Dec, 2013
1 commit
-
Replace direct inclusions of , and
, which are incorrect, with
inclusions and remove some inclusions of those files that aren't
necessary.

First of all, , and
should not be included directly from any files that are built for
CONFIG_ACPI unset, because that generally leads to build warnings about
undefined symbols in !CONFIG_ACPI builds. For CONFIG_ACPI set,
includes those files and for CONFIG_ACPI unset it
provides stub ACPI symbols to be used in that case.

Second, there are ordering dependencies between those files that always
have to be met. Namely, it is required that be included
prior to so that the acpi_pci_root declarations the
latter depends on are always there. And which provides
basic ACPICA type declarations should always be included prior to any other
ACPI headers in CONFIG_ACPI builds. That also is taken care of including
as appropriate.

Signed-off-by: Lv Zheng
Cc: Greg Kroah-Hartman
Cc: Matthew Garrett
Cc: Tony Luck
Cc: "H. Peter Anvin"
Acked-by: Bjorn Helgaas (drivers/pci stuff)
Acked-by: Konrad Rzeszutek Wilk (Xen stuff)
Signed-off-by: Rafael J. Wysocki
24 Sep, 2013
1 commit
-
Use more appropriate NUMA_NO_NODE instead of -1
Signed-off-by: Jianguo Wu
Acked-by: David Rientjes
Signed-off-by: Rafael J. Wysocki
13 Aug, 2013
1 commit
-
__init belongs after the return type on functions, not before it.
Signed-off-by: Hanjun Guo
Signed-off-by: Rafael J. Wysocki
03 Mar, 2013
1 commit
-
Tim found:
WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
Hardware name: S2600CP
sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
smpboot: Booting Node 1, Processors #1
Modules linked in:
Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
Call Trace:
set_cpu_sibling_map+0x279/0x449
start_secondary+0x11d/0x1e5

Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")

It turns out movable_map has some problems, and it breaks several things
1. numa_init is called several times, NOT just for srat. so those
       nodes_clear(numa_nodes_parsed)
       memset(&numa_meminfo, 0, sizeof(numa_meminfo))
   can not be just removed. Need to consider sequence is: numaq, srat, amd, dummy.
   and make fall back path working.

2. simply split acpi_numa_init to early_parse_srat.
   a. that early_parse_srat is NOT called for ia64, so you break ia64.
   b. for (i = 0; i < MAX_LOCAL_APIC; i++)
          set_apicid_to_node(i, NUMA_NO_NODE)
      still left in numa_init. So it will just clear result from early_parse_srat.
      it should be moved before that....
   c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
      early before override from INITRD is settled.

3. that patch TITLE is totally misleading, there is NO x86 in the title,
   but it changes critical x86 code. It caused x86 guys did not
   pay attention to find the problem early. Those patches really should
   be routed via tip/x86/mm.

4. after that commit, following range can not use movable ram:
   a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
   b. initrd... it will be freed after booting, so it could be on movable...
   c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      anymore.
   d. init_mem_mapping: can not put page table high anymore.
   e. initmem_init: vmemmap can not be high local node anymore. That is
      not good.

If node is hotplugable, the mem related range like page table and
vmemmap could be on that node without problem and should be on that
node.

We have workaround patch that could fix some problems, but some can not
be fixed.

So just remove that offending commit and related ones including:

f7210e6c4ac7 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
protect movablecore_map in memblock_overlaps_region().")

01a178a94e8e ("acpi, memory-hotplug: support getting hotplug info from
SRAT")

27168d38fa20 ("acpi, memory-hotplug: extend movablemem_map ranges to
the end of node")

e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is
ready")

fb06bc8e5f42 ("page_alloc: bootmem limit with movablecore_map")

42f47e27e761 ("page_alloc: make movablemem_map have higher priority")

6981ec31146c ("page_alloc: introduce zone_movable_limit[] to keep
movable limit for nodes")

34b71f1e04fc ("page_alloc: add movable_memmap kernel parameter")

4d59a75125d5 ("x86: get pg_data_t's memory from other node")

Later we should have patches that will make sure kernel put page table
and vmemmap on local node ram instead of push them down to node0. Also
need to find way to put other kernel used ram to local node ram.

Reported-by: Tim Gardner
Reported-by: Don Morris
Bisected-by: Don Morris
Tested-by: Don Morris
Signed-off-by: Yinghai Lu
Cc: Tony Luck
Cc: Thomas Renninger
Cc: Tejun Heo
Cc: Tang Chen
Cc: Yasuaki Ishimatsu
Signed-off-by: Linus Torvalds
24 Feb, 2013
1 commit
-
On linux, the pages used by kernel could not be migrated. As a result,
if a memory range is used by kernel, it cannot be hot-removed. So if we
want to hot-remove memory, we should prevent kernel from using it.

The way now used to prevent this is to specify a memory range by the
movablemem_map boot option and set it as ZONE_MOVABLE.

But when the system is booting, memblock will allocate memory, and
reserve the memory for kernel. And before we parse SRAT, and know the
node memory ranges, memblock is working. And it may allocate memory in
ranges to be set as ZONE_MOVABLE. This memory can be used by kernel,
and never be freed.

So, let's parse SRAT before memblock is called first. And it is early
enough.

The first call of memblock_find_in_range_node() is in:

setup_arch()
|-->setup_real_mode()

so, this patch adds a function early_parse_srat() to parse SRAT, and calls
it before setup_real_mode() is called.

NOTE:

1) early_parse_srat() is called before numa_init(), and has initialized
   numa_meminfo. So DO NOT clear numa_nodes_parsed in numa_init() and DO
   NOT zero numa_meminfo in numa_init(), otherwise we will lose memory
   numa info.

2) I don't know why using count of memory affinities parsed from SRAT
   as a return value in original acpi_numa_init(). So I add a static
   variable srat_mem_cnt to remember this count and use it as the return
   value of the new acpi_numa_init()

[mhocko@suse.cz: parse SRAT before memblock is ready fix]
Signed-off-by: Tang Chen
Reviewed-by: Wen Congyang
Cc: KOSAKI Motohiro
Cc: Jiang Liu
Cc: Jianguo Wu
Cc: Kamezawa Hiroyuki
Cc: Lai Jiangshan
Cc: Wu Jianguo
Cc: Yasuaki Ishimatsu
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc: Len Brown
Cc: "Brown, Len"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Feb, 2013
1 commit
-
* acpi-assorted:
ACPI: Add DMI entry for Sony VGN-FW41E_H
ACPI: fix obsolete comment in custom_method.c
ACPI / thermal: Use mode to enable/disable kernel thermal processing
ACPI thermal: remove unnecessary newline from exception message
ACPI sysfs: remove unnecessary newline from exception
ACPI video: remove unnecessary newline from error messages
ACPI: SRAT: report non-volatile memory in debug
ACPI: Rework acpi_get_child() to be more efficient
26 Jan, 2013
1 commit
-
Just as with the other memory affinity flags, report
non-volatile memory with ACPI debug.

Signed-off-by: Davidlohr Bueso
Signed-off-by: Rafael J. Wysocki
11 Jan, 2013
1 commit
-
This is a cosmetic patch only. Comparison of the resulting binary showed
only line number differences.

This patch does not affect the generation of the Linux binary.
This patch decreases 44 lines of 20121114 divergence.diff.

There are naming conflicts between Linux and ACPICA on table handlers. This
patch cleans up these conflicts to reduce the source code diff between Linux
and ACPICA.

Signed-off-by: Lv Zheng
Signed-off-by: Rafael J. Wysocki
03 Aug, 2012
2 commits
-
Otherwise you could run into:
WARN_ON in numa_register_memblks(), because node_possible_map is zero

References: https://bugzilla.novell.com/show_bug.cgi?id=757888

On this machine (ProLiant ML570 G3) the SRAT table contains:
- No processor affinities
- One memory affinity structure (which is set disabled)

CC: Per Jessen
CC: Andi Kleen
Signed-off-by: Thomas Renninger
Signed-off-by: Len Brown
-
No functional change.
Signed-off-by: Thomas Renninger
Signed-off-by: Len Brown
17 Jan, 2012
1 commit
-
In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides
32bits for these. The new fields were reserved before.
According to the ACPI spec, the OS must disregard reserved fields.
In order to know whether or not to use them, we must know what version
the SRAT table has.

This patch stores the SRAT table revision for later consumption
by arch specific __init functions.

Signed-off-by: Kurt Garloff
Signed-off-by: Len Brown
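
How a stored table revision might be consumed can be sketched as follows. This is a simplified model under stated assumptions: the struct `cpu_affinity_model` (one low byte plus three bytes that were reserved in v1) and the helper `srat_pxm()` are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of a processor affinity entry: one low byte plus
 * three bytes that were reserved in SRAT v1 and carry PXM bits in v2.
 */
struct cpu_affinity_model {
    uint8_t proximity_lo;
    uint8_t proximity_hi[3];
};

/* Builds the proximity domain, ignoring the v1-reserved bytes for v1 tables. */
static uint32_t srat_pxm(const struct cpu_affinity_model *e,
                         uint8_t srat_revision)
{
    uint32_t pxm;

    assert(e != NULL);
    pxm = e->proximity_lo;
    if (srat_revision >= 2) {
        pxm |= (uint32_t)e->proximity_hi[0] << 8;
        pxm |= (uint32_t)e->proximity_hi[1] << 16;
        pxm |= (uint32_t)e->proximity_hi[2] << 24;
    }
    return pxm;
}
```

For a v1 table any garbage in the reserved bytes is disregarded, as the spec requires; for v2 the same bytes extend the domain to 32 bits.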
16 Feb, 2011
1 commit
-
The functions used during NUMA initialization - *_numa_init() and
*_scan_nodes() - have different arguments and return values. Unify
them such that they all take no argument and return 0 on success and
-errno on failure. This is in preparation for further NUMA init
cleanups.

Signed-off-by: Tejun Heo
Cc: Yinghai Lu
Cc: Brian Gerst
Cc: Cyrill Gorcunov
Cc: Shaohui Zheng
Cc: David Rientjes
Cc: Ingo Molnar
Cc: H. Peter Anvin
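
One thing a uniform "no argument, 0 or -errno" signature makes possible is a generic caller that tries each method in turn. The sketch below is only an assumption about what such a caller could look like; `numa_init_fn`, `numa_init_first_success()` and the two stub methods are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Uniform shape: no argument, 0 on success, -errno on failure. */
typedef int (*numa_init_fn)(void);

/* Hypothetical stand-ins for real init methods. */
static int failing_init(void)
{
    return -EINVAL;
}

static int dummy_init(void)
{
    return 0;
}

/*
 * Tries each method in order and stops at the first success.
 * Returns 0 on success, otherwise the last error seen
 * (-ENOENT if the list is empty).
 */
static int numa_init_first_success(const numa_init_fn *methods, size_t count)
{
    int ret = -ENOENT;
    size_t i;

    assert(methods != NULL);
    for (i = 0; i < count; i++) {
        ret = methods[i]();
        if (ret == 0)
            return 0;
    }
    return ret;
}
```

Because every method has the same type, adding or reordering fallbacks is a change to the array rather than to the calling code.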
12 Jan, 2011
1 commit
-
As pointed out by Linus CONFIG_X86 in drivers/acpi/numa.c is
ugly.

Builds and boots on ia64 (both normally and with maxcpus=8 to limit
the number of cpus).

Signed-off-by: Tony Luck
Acked-by: Yinghai Lu
Cc: Linus Torvalds
Cc: Wu Fengguang
Cc: Bjorn Helgaas
Cc: Len Brown
LKML-Reference:
Signed-off-by: Ingo Molnar