Commit ad2c8144418c6a81cefe65379fd47bbe8344cef2

Authored by Joonsoo Kim
Committed by Linus Torvalds
1 parent c9e16131d6

topology: add support for node_to_mem_node() to determine the fallback node

Anton noticed (http://www.spinics.net/lists/linux-mm/msg67489.html) that
on ppc LPARs with memoryless nodes, a large amount of memory was consumed
by slabs and was marked unreclaimable.  He tracked it down to slab
deactivations in the SLUB core when we allocate remotely, leading to poor
efficiency whenever memoryless nodes are present.

After much discussion, Joonsoo provided a few patches that help
significantly.  They don't resolve the problem altogether:

 - memory hotplug still needs testing; that is, when a memoryless node
   gains memory, we want to do the right thing
 - there are other reasons for going off-node than memoryless nodes,
   e.g., fully exhausted local nodes

Neither case is resolved with this series, but I don't think that should
block their acceptance, as they can be explored/resolved with follow-on
patches.

The series consists of:

[1/3] topology: add support for node_to_mem_node() to determine the
      fallback node

[2/3] slub: fallback to node_to_mem_node() node if allocating on
      memoryless node

      - Joonsoo's patches to cache the nearest node with memory for each
        NUMA node

[3/3] Partial revert of 81c98869faa5 ("kthread: ensure locality of
      task_struct allocations")

      - At Tejun's request, keep the knowledge of memoryless-node fallback
        in the allocator core.

This patch (of 3):

We need to determine the fallback node in the slub allocator if the
allocation target node is a memoryless node.  Without it, SLUB wrongly
selects a node that has no memory and cannot use a partial slab, because
of the node mismatch.  The newly introduced function, node_to_mem_node(X),
returns a node Y with memory that is nearest to X: if X is a memoryless
node, it returns the nearest node with memory; if X is a normal node, it
returns X itself.

We will use this function in the following patch to determine the fallback
node.
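
To illustrate the intended use, here is a minimal sketch of the caller
side, modeled on the SLUB change in patch 2/3 of this series.  The helper
name choose_slab_search_node() is made up for illustration; NUMA_NO_NODE,
node_present_pages() and numa_mem_id() are existing kernel APIs:

	/*
	 * Sketch (hypothetical helper, not part of this patch): pick a
	 * node that actually has memory before searching for a partial
	 * slab, so SLUB does not deactivate slabs over a node mismatch.
	 */
	static int choose_slab_search_node(int node)
	{
		if (node == NUMA_NO_NODE)
			return numa_mem_id();		/* nearest memory node to this CPU */
		if (!node_present_pages(node))
			return node_to_mem_node(node);	/* fallback for a memoryless node */
		return node;
	}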

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anton Blanchard <anton@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 2 changed files with 18 additions and 0 deletions:

include/linux/topology.h
@@ -119,14 +119,23 @@
  * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
  */
 DECLARE_PER_CPU(int, _numa_mem_);
+extern int _node_numa_mem_[MAX_NUMNODES];
 
 #ifndef set_numa_mem
 static inline void set_numa_mem(int node)
 {
 	this_cpu_write(_numa_mem_, node);
+	_node_numa_mem_[numa_node_id()] = node;
 }
 #endif
 
+#ifndef node_to_mem_node
+static inline int node_to_mem_node(int node)
+{
+	return _node_numa_mem_[node];
+}
+#endif
+
 #ifndef numa_mem_id
 /* Returns the number of the nearest Node with memory */
 static inline int numa_mem_id(void)
@@ -146,6 +155,7 @@
 static inline void set_cpu_numa_mem(int cpu, int node)
 {
 	per_cpu(_numa_mem_, cpu) = node;
+	_node_numa_mem_[cpu_to_node(cpu)] = node;
 }
 #endif
 
@@ -156,6 +166,13 @@
 static inline int numa_mem_id(void)
 {
 	return numa_node_id();
+}
+#endif
+
+#ifndef node_to_mem_node
+static inline int node_to_mem_node(int node)
+{
+	return node;
 }
 #endif
 
mm/page_alloc.c

@@ -85,6 +85,7 @@
  */
 DEFINE_PER_CPU(int, _numa_mem_);	/* Kernel "local memory" node */
 EXPORT_PER_CPU_SYMBOL(_numa_mem_);
+int _node_numa_mem_[MAX_NUMNODES];
 #endif
 
 /*