Commit 38837fc75acb7fa9b0e111b0241fe4fe76c5d4b3

Authored by Paul Jackson
Committed by Linus Torvalds
1 parent af3ffa6758

[PATCH] cpuset: top_cpuset tracks hotplug changes to node_online_map

Change the list of memory nodes allowed to tasks in the top (root) cpuset
to dynamically track what memory nodes are online, using a call to a cpuset
hook from the memory hotplug code.  Make this top 'mems' file read-only.

On systems that have cpusets configured in their kernel, but that aren't
actively using cpusets (for some distros, this covers the majority of
systems) all tasks end up in the top cpuset.

If that system does support memory hotplug, then these tasks cannot make
use of memory nodes that are added after system boot, because the memory
nodes are not allowed in the top cpuset.  This is a surprising regression
over earlier kernels that didn't have cpusets enabled.

One key motivation for this change is to remain consistent with the
behaviour for the top_cpuset's 'cpus', which is also read-only, and which
automatically tracks the cpu_online_map.

This change also has the minor benefit that it fixes a long standing,
little noticed, minor bug in cpusets.  The cpuset performance tweak to
short circuit the cpuset_zone_allowed() check on systems with just a single
cpuset (see 'number_of_cpusets', in linux/cpuset.h) meant that simply
changing the 'mems' of the top_cpuset had no effect, even though the change
(the write system call) appeared to succeed.  With the following change,
that write to the 'mems' file fails with -EACCES, and the 'mems' file
stubbornly refuses to be changed via user space writes.  Thus no one should
be misled into thinking they've changed the top_cpuset's 'mems' when in
effect they haven't.
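
The short circuit described above can be modelled in user space.  This is
a minimal sketch, not the kernel's code: struct sketch_cpuset and every
sketch_ name are hypothetical stand-ins, and a node mask is reduced to a
single unsigned long.  It only illustrates why a 'mems' setting is never
consulted while just one cpuset exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a cpuset; bit n set means node n is allowed. */
struct sketch_cpuset {
	unsigned long mems_allowed;
};

/* Hypothetical counterpart of 'number_of_cpusets': only the top cpuset. */
static int sketch_number_of_cpusets = 1;

/* Full check: is the node present in the cpuset's allowed mask? */
static bool sketch_slow_zone_allowed(const struct sketch_cpuset *cs, int node)
{
	return (cs->mems_allowed >> node) & 1UL;
}

/*
 * Fast path modelled on the short circuit: with a single cpuset the
 * mask is never examined, so every node is treated as allowed.
 */
static bool sketch_zone_allowed(const struct sketch_cpuset *cs, int node)
{
	return sketch_number_of_cpusets <= 1 ||
	       sketch_slow_zone_allowed(cs, node);
}
```

In this model, a top cpuset whose mask holds only node 0 still passes the
check for node 1 until a second cpuset is created, which is the "write
succeeded but nothing changed" behaviour the paragraph describes.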

In order to keep the behaviour of cpusets consistent between systems
actively making use of them and systems not using them, this patch changes
the behaviour of the 'mems' file in the top (root) cpuset, making it read
only, and making it automatically track the value of node_online_map.  Thus
tasks in the top cpuset will have automatic use of hot plugged memory nodes
allowed by their cpuset.
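
The intended behaviour can be sketched as a small user-space model.  All
model_ names below are hypothetical, node masks are simplified to an
unsigned long, and the locking done by the real hook is omitted; this is an
illustration of the idea under those assumptions, not the patch itself.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a cpuset; bit n set means node n is allowed. */
struct model_cpuset {
	unsigned long mems_allowed;
};

/* Node 0 is online at "boot" in this model. */
static unsigned long model_node_online_map = 0x1UL;
static struct model_cpuset model_top_cpuset = { 0x1UL };

/* Stand-in for the hook: copy the online map into the top cpuset. */
static void model_track_online_nodes(void)
{
	model_top_cpuset.mems_allowed = model_node_online_map;
}

/* Stand-in for memory hotplug: online a node, then notify cpusets. */
static void model_online_node(int nid)
{
	model_node_online_map |= 1UL << nid;
	model_track_online_nodes();
}

/* Stand-in for a write to 'mems': the top cpuset refuses the update. */
static int model_update_nodemask(struct model_cpuset *cs, unsigned long mask)
{
	if (cs == &model_top_cpuset)
		return -EACCES;
	cs->mems_allowed = mask;
	return 0;
}
```

Onlining node 2 in this model makes it appear in the top cpuset's mask
without any write, while an explicit update of the top cpuset is rejected
and an update of any other cpuset goes through.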

[akpm@osdl.org: build fix]
[bunk@stusta.de: build fix]
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Showing 4 changed files with 37 additions and 8 deletions

Documentation/cpusets.txt
... ... @@ -217,11 +217,11 @@
217 217 to represent the cpuset hierarchy provides for a familiar permission
218 218 and name space for cpusets, with a minimum of additional kernel code.
219 219  
220   -The cpus file in the root (top_cpuset) cpuset is read-only.
221   -It automatically tracks the value of cpu_online_map, using a CPU
222   -hotplug notifier. If and when memory nodes can be hotplugged,
223   -we expect to make the mems file in the root cpuset read-only
224   -as well, and have it track the value of node_online_map.
  220 +The cpus and mems files in the root (top_cpuset) cpuset are
  221 +read-only. The cpus file automatically tracks the value of
  222 +cpu_online_map using a CPU hotplug notifier, and the mems file
  223 +automatically tracks the value of node_online_map using the
  224 +cpuset_track_online_nodes() hook.
225 225  
226 226  
227 227 1.4 What are exclusive cpusets ?
include/linux/cpuset.h
... ... @@ -63,6 +63,8 @@
63 63 return current->flags & PF_SPREAD_SLAB;
64 64 }
65 65  
  66 +extern void cpuset_track_online_nodes(void);
  67 +
66 68 #else /* !CONFIG_CPUSETS */
67 69  
68 70 static inline int cpuset_init_early(void) { return 0; }
... ... @@ -125,6 +127,8 @@
125 127 {
126 128 return 0;
127 129 }
  130 +
  131 +static inline void cpuset_track_online_nodes(void) {}
128 132  
129 133 #endif /* !CONFIG_CPUSETS */
130 134  
kernel/cpuset.c
... ... @@ -912,6 +912,10 @@
912 912 int fudge;
913 913 int retval;
914 914  
  915 + /* top_cpuset.mems_allowed tracks node_online_map; it's read-only */
  916 + if (cs == &top_cpuset)
  917 + return -EACCES;
  918 +
915 919 trialcs = *cs;
916 920 retval = nodelist_parse(buf, trialcs.mems_allowed);
917 921 if (retval < 0)
... ... @@ -2042,9 +2046,8 @@
2042 2046 * (of no affect) on systems that are actively using CPU hotplug
2043 2047 * but making no active use of cpusets.
2044 2048 *
2045   - * This handles CPU hotplug (cpuhp) events. If someday Memory
2046   - * Nodes can be hotplugged (dynamically changing node_online_map)
2047   - * then we should handle that too, perhaps in a similar way.
  2049 + * This routine ensures that top_cpuset.cpus_allowed tracks
  2050 + * cpu_online_map on each CPU hotplug (cpuhp) event.
2048 2051 */
2049 2052  
2050 2053 #ifdef CONFIG_HOTPLUG_CPU
... ... @@ -2060,6 +2063,25 @@
2060 2063 mutex_unlock(&manage_mutex);
2061 2064  
2062 2065 return 0;
  2066 +}
  2067 +#endif
  2068 +
  2069 +/*
  2070 + * Keep top_cpuset.mems_allowed tracking node_online_map.
  2071 + * Call this routine anytime after you change node_online_map.
  2072 + * See also the previous routine cpuset_handle_cpuhp().
  2073 + */
  2074 +
  2075 +#ifdef CONFIG_MEMORY_HOTPLUG
  2076 +void cpuset_track_online_nodes()
  2077 +{
  2078 + mutex_lock(&manage_mutex);
  2079 + mutex_lock(&callback_mutex);
  2080 +
  2081 + top_cpuset.mems_allowed = node_online_map;
  2082 +
  2083 + mutex_unlock(&callback_mutex);
  2084 + mutex_unlock(&manage_mutex);
2063 2085 }
2064 2086 #endif
2065 2087  
mm/memory_hotplug.c
... ... @@ -21,6 +21,7 @@
21 21 #include <linux/highmem.h>
22 22 #include <linux/vmalloc.h>
23 23 #include <linux/ioport.h>
  24 +#include <linux/cpuset.h>
24 25  
25 26 #include <asm/tlbflush.h>
26 27  
... ... @@ -282,6 +283,8 @@
282 283  
283 284 /* we online node here. we can't roll back from here. */
284 285 node_set_online(nid);
  286 +
  287 + cpuset_track_online_nodes();
285 288  
286 289 if (new_pgdat) {
287 290 ret = register_one_node(nid);
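
One way to observe the read-only 'mems' file from user space is a small
probe like the following.  This is a hedged sketch: probe_write_mems() is
a hypothetical helper, and the path passed to it is an assumption that
depends on where the cpuset filesystem is mounted (Documentation/cpusets.txt
uses /dev/cpuset in its examples).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to write a node list to a cpuset 'mems' file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int probe_write_mems(const char *path, const char *nodelist)
{
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	if (write(fd, nodelist, strlen(nodelist)) < 0)
		err = -errno;
	close(fd);
	return err;
}
```

With this patch applied and the cpuset filesystem mounted at /dev/cpuset,
probe_write_mems("/dev/cpuset/mems", "0") would be expected to return
-EACCES.  An open() that fails for ordinary permission reasons yields the
same value, so the probe is only meaningful when run with enough privilege
to open the file for writing.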