20 Apr, 2008
40 commits
-
provide a text based interface to the scheduler features; this saves the
'user' from setting bits using decimal arithmetic.
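For illustration, here is a minimal sketch of the kind of name-to-bit
translation such an interface performs. The feature names, the "no_" prefix
spelling and the parsing details are invented for the example; only the idea
of replacing decimal bit arithmetic with names comes from the patch.

    #include <stdio.h>
    #include <string.h>

    static const char *feat_names[] = { "feat_a", "feat_b", NULL };

    static unsigned long set_feat(unsigned long feats, const char *cmp)
    {
        int neg = 0, i;

        if (strncmp(cmp, "no_", 3) == 0) {  /* "no_<name>" clears the bit */
            neg = 1;
            cmp += 3;
        }
        for (i = 0; feat_names[i]; i++) {
            if (strcmp(cmp, feat_names[i]) == 0) {
                if (neg)
                    feats &= ~(1UL << i);
                else
                    feats |= (1UL << i);
                break;
            }
        }
        return feats;
    }

    int main(void)
    {
        unsigned long feats = set_feat(0, "feat_b");    /* sets bit 1 */

        printf("%#lx\n", set_feat(feats, "no_feat_b")); /* back to 0 */
        return 0;
    }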
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
unused at the moment.
Signed-off-by: Ingo Molnar
-
Print a tree of weights.
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
In order to level the hierarchy, we need to calculate load based on the
root view. That is, each task's load is in the same unit.

         A
        / \
       B   1
      / \
     2   3

To compute 1's load we do:

      weight(1)
    ------------
    rq_weight(A)

To compute 2's load we do:

      weight(2)      weight(B)
    ------------ * ------------
    rq_weight(B)   rq_weight(A)

This yields load fractions in comparable units.

The consequence is that it changes virtual time. We used to have:

                 time_{i}
    vtime_{i} = ----------
                weight_{i}

    vtime = \Sum vtime_{i} = time / rq_weight.

But with the new way of load calculation we get that vtime equals time.
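To make the arithmetic concrete, here is a toy C rendering of the root-view
calculation with made-up weights (1024 per task, so weight(B) = 2048 and
rq_weight(A) = 3072); the struct layout is invented for the example and is
not the kernel's:

    #include <stdio.h>

    struct entity {
        unsigned long weight;       /* weight on the rq this entity sits on */
        unsigned long rq_weight;    /* total weight of that rq */
        struct entity *parent;      /* group entity one level up, NULL at root */
    };

    /* root-view load: product of weight/rq_weight ratios up the tree */
    static double root_load(const struct entity *e)
    {
        double frac = 1.0;

        for (; e; e = e->parent)
            frac *= (double)e->weight / (double)e->rq_weight;
        return frac;
    }

    int main(void)
    {
        struct entity B  = { 2048, 3072, NULL };    /* group B on A's rq */
        struct entity t1 = { 1024, 3072, NULL };    /* task 1 on A's rq  */
        struct entity t2 = { 1024, 2048, &B  };     /* task 2 on B's rq  */

        /* both print 0.333: the loads are now in comparable units */
        printf("load(1)=%.3f load(2)=%.3f\n", root_load(&t1), root_load(&t2));
        return 0;
    }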
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
De-couple load-balancing from the rb-trees, so that I can change their
organization.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
Currently FAIR_GROUP sched grows the scheduler latency outside of
sysctl_sched_latency; invert this so it stays within.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
Now that the group hierarchy can have an arbitrary depth, the O(n^2) nature
of RT task dequeues will really hurt. Optimize this by providing space to
store the tree path, so we can walk it the other way.
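The shape of the fix, as a sketch (the type and the depth bound are
hypothetical; the real patch stores the path in the entities themselves):
record the parent chain while walking up, then dequeue by replaying it in
reverse, top-down, instead of re-searching from the root at every level.

    #define MAX_DEPTH   16          /* assumed bound on hierarchy depth */

    struct rt_entity {
        struct rt_entity *parent;   /* NULL at the root */
    };

    static void dequeue_entity_stub(struct rt_entity *e) { (void)e; }

    static void dequeue_top_down(struct rt_entity *e)
    {
        struct rt_entity *path[MAX_DEPTH];
        int depth = 0;

        for (; e; e = e->parent)    /* one O(depth) walk up...        */
            path[depth++] = e;

        while (depth--)             /* ...then replay it root-first   */
            dequeue_entity_stub(path[depth]);
    }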
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
Add some extra debug output so we can get a better overview of the
full hierarchy.

We print the cgroup path after each cfs_rq, so we can see what group
we're looking at.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
Implement SMP nice support for the full group hierarchy.

On each load-balance action, compile a sched_domain wide view of the full
task_group tree. We compute the domain wide view when walking down the
hierarchy, and readjust the weights when walking back up.

After collecting and readjusting the domain wide view, we try to balance
the tasks within the task_groups. The current approach is to naively
balance each task group until we've moved the targeted amount of load.

Inspired by Srivatsa Vaddagiri's previous code and Abhishek Chandra's
H-SMP paper.

XXX: there will be some numerical issues due to the limited nature of
SCHED_LOAD_SCALE wrt representing a task_group's influence on the
total weight. When the tree is deep enough, or the task weight small
enough, we'll run out of bits.
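The two-phase walk can be pictured with a small recursive sketch; the type
and the callback split are invented for the illustration (the real code
operates on the task_group tree per sched_domain):

    struct group {
        struct group *children;     /* first child */
        struct group *sibling;      /* next child of our parent */
    };

    /*
     * Visit every group: 'down' computes the domain-wide view while
     * descending, 'up' readjusts the weights on the way back.
     */
    static void walk_tg_tree_sketch(struct group *g,
                                    void (*down)(struct group *),
                                    void (*up)(struct group *))
    {
        struct group *child;

        down(g);
        for (child = g->children; child; child = child->sibling)
            walk_tg_tree_sketch(child, down, up);
        up(g);
    }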
Signed-off-by: Peter Zijlstra
CC: Abhishek Chandra
CC: Srivatsa Vaddagiri
Signed-off-by: Ingo Molnar
-
[rebased for sched-devel/latest]

- Add a new cpuset file, having levels:
    sched_relax_domain_level

- Modify partition_sched_domains() and build_sched_domains()
  to take an attributes parameter passed from cpuset.

- Fill newidle_idx for node domains, which is currently unused but
  might be required if sched_relax_domain_level becomes higher.

- We can change the default level with the boot option 'relax_domain_level='.

Signed-off-by: Hidetoshi Seto
Signed-off-by: Ingo Molnar
-
This patch introduces a new cpuset feature: sched domain customization.

This version provides a per-cpuset file 'sched_relax_domain_level' that
enables us to change the search range of the scheduler, which is used to
limit how many cpus the scheduler searches at certain scheduling events,
such as waking up a task and running out of a runqueue.

Signed-off-by: Hidetoshi Seto
Signed-off-by: Ingo Molnar
-
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
multi-level rt constraints
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
Add the full parent/child relation thing into task_groups as well.
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
UID grouping doesn't actually have a task_group representing the root of
the task_group tree. Add one.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
This patch makes the group scheduler multi-hierarchy aware.
[a.p.zijlstra@chello.nl: rt-parts and assorted fixes]
Signed-off-by: Dhaval Giani
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
This patch allows tasks and groups to exist in the same cfs_rq. With this
change the CFS group scheduling follows a 1/(M+N) model instead of a
1/(1+N) fairness model, where M tasks and N groups exist at the cfs_rq
level. For example, two tasks and one group on the same cfs_rq now each
receive a 1/3 share, rather than the two tasks first splitting a 1/2
share between them.

[a.p.zijlstra@chello.nl: rt bits and assorted fixes]
Signed-off-by: Dhaval Giani
Signed-off-by: Srivatsa Vaddagiri
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
Signed-off-by: Ingo Molnar
-
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
Add a new function that accepts a pointer to the "newly allowed cpus"
cpumask argument:

    int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

The current set_cpus_allowed() function is modified to use the above,
but this does not result in an ABI change. And with some compiler
optimization help, it may not introduce any additional overhead.

Additionally, to enforce the read-only nature of the new_mask arg, the
"const" property is migrated to sub-functions called by set_cpus_allowed().
This silences compiler warnings.
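Presumably the old entry point becomes a thin wrapper over the new one,
along the lines of this sketch (the stand-in types exist only so the
snippet reads outside the kernel; the actual implementation may differ):

    /* stand-ins so the sketch is self-contained outside the kernel */
    typedef struct { unsigned long bits[2]; } cpumask_t;
    struct task_struct;

    /* new workhorse: takes the mask by pointer, never copies it */
    extern int set_cpus_allowed_ptr(struct task_struct *p,
                                    const cpumask_t *new_mask);

    /* old interface keeps its by-value signature, so callers see no change */
    static inline int set_cpus_allowed(struct task_struct *p,
                                       cpumask_t new_mask)
    {
        return set_cpus_allowed_ptr(p, &new_mask);
    }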
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
Move the setting of nr_cpu_ids from sched_init() to start_kernel()
so that it's available as early as possible.

Note that an arch has the option of setting it even earlier if need be,
but it should not result in a different value than the setup_nr_cpu_ids()
function.

Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Remove another cpumask_t variable from stack that was missed in the
  last kernel_sched_c updates.

Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Add cpu_sysdev_class functions to display the following maps
  with cpulist_scnprintf():

    cpu_online_map
    cpu_present_map
    cpu_possible_map

* Small change to include/linux/sysdev.h to allow the attribute
  name and label to be different (to avoid collision with the
  "attr_online" entry for bringing cpus on- and off-line.)
Cc: H. Peter Anvin
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Cleaned up references to cpumask_scnprintf() and added new
  cpulist_scnprintf() interfaces where appropriate.

* Fix some small bugs (or code efficiency improvements) for various uses
  of cpumask_scnprintf.

* Clean up some checkpatch errors.
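To see why the list form reads better where it fits, here is a
self-contained toy comparing the two text formats for the same mask (the
formatting logic is reimplemented here just for the demo):

    #include <stdio.h>

    /* print a mask the cpulist way: comma-separated ranges such as "0-7" */
    static void print_as_list(unsigned long mask, int nbits)
    {
        int cpu, first = 1;

        for (cpu = 0; cpu < nbits; cpu++) {
            int start;

            if (!(mask & (1UL << cpu)))
                continue;
            start = cpu;
            while (cpu + 1 < nbits && (mask & (1UL << (cpu + 1))))
                cpu++;
            printf("%s%d", first ? "" : ",", start);
            if (cpu > start)
                printf("-%d", cpu);
            first = 0;
        }
        printf("\n");
    }

    int main(void)
    {
        unsigned long mask = 0xff;  /* cpus 0-7 */

        printf("%lx\n", mask);      /* cpumask style: "ff"  */
        print_as_list(mask, 64);    /* cpulist style: "0-7" */
        return 0;
    }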
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Removed kmalloc (or local array) in show_shared_cpu_map().
* Added show_shared_cpu_list() function.
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Here is a simple patch to use an allocated array of cpumasks to
  represent cpumask_of_cpu() instead of constructing one on the stack.
  It's based on the Kconfig option "HAVE_CPUMASK_OF_CPU_MAP" which is
  currently only set for x86_64 SMP. Otherwise the existing
  cpumask_of_cpu() is used, but it has been changed to produce an lvalue
  so a pointer to it can be used.

Cc: H. Peter Anvin
Signed-off-by: Christoph Lameter
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Add a static cpumask_t variable "CPU_MASK_ALL_PTR" to use as
  a pointer reference to CPU_MASK_ALL. This reduces where possible
  the instances where CPU_MASK_ALL allocates and fills a large
  array on the stack. Used only if NR_CPUS > BITS_PER_LONG.

* Change init/main.c to use new set_cpus_allowed_ptr().

Depends on:
  [sched-devel]: sched: add new set_cpus_allowed_ptr function
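A plausible shape for the definition, sketched from the description above
(the actual kernel macro may differ in detail):

    #if NR_CPUS > BITS_PER_LONG
    /* one static all-ones mask (e.g. defined in init/main.c) */
    extern cpumask_t cpu_mask_all;
    #define CPU_MASK_ALL_PTR    (&cpu_mask_all)
    #else
    /* small masks are cheap: take the address of the compound literal */
    #define CPU_MASK_ALL_PTR    (&CPU_MASK_ALL)
    #endif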
Cc: H. Peter Anvin
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Remove empty cpumask_t (and all non-zero/non-null) variables
  in SD_*_INIT macros. Use memset(0) to clear. Also, don't
  inline the initializer functions to save on stack space in
  build_sched_domains().

* Merge change to include/linux/topology.h that uses the new
  node_to_cpumask_ptr function in the nr_cpus_node macro into
  this patch.

Depends on:
  [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
  [sched-devel]: sched: add new set_cpus_allowed_ptr function

Cc: H. Peter Anvin
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Use new node_to_cpumask_ptr. This creates a pointer to the
  cpumask for a given node. This definition is in mm patch:

    asm-generic-add-node_to_cpumask_ptr-macro.patch

* Use new set_cpus_allowed_ptr function.

Depends on:
  [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
  [sched-devel]: sched: add new set_cpus_allowed_ptr function
  [x86/latest]: x86: add cpus_scnprintf function

Cc: Greg Kroah-Hartman
Cc: Greg Banks
Cc: H. Peter Anvin
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Modify sched_affinity functions to pass cpumask_t variables by reference
  instead of by value.

* Use new set_cpus_allowed_ptr function.

Depends on:
  [sched-devel]: sched: add new set_cpus_allowed_ptr function

Cc: Paul Jackson
Cc: Cliff Wickman
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Modify cpuset_cpus_allowed to return the currently allowed cpuset
  via a pointer argument instead of as the function return value.

* Use new set_cpus_allowed_ptr function.

* Cleanup CPU_MASK_ALL and NODE_MASK_ALL uses.

Depends on:
  [sched-devel]: sched: add new set_cpus_allowed_ptr function

Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Use new set_cpus_allowed_ptr() function added by previous patch,
  which instead of passing the "newly allowed cpus" cpumask_t arg
  by value, passes it by pointer:

    -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
    +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

* Modify CPU_MASK_ALL

Depends on:
  [sched-devel]: sched: add new set_cpus_allowed_ptr function

Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Use new set_cpus_allowed_ptr() function added by previous patch,
  which instead of passing the "newly allowed cpus" cpumask_t arg
  by value, passes it by pointer:

    -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
    +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

* Cleanup uses of CPU_MASK_ALL.

* Collapse other NR_CPUS changes to arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c.
  Use pointers to cpumask_t arguments whenever possible.

Depends on:
  [sched-devel]: sched: add new set_cpus_allowed_ptr function

Cc: Len Brown
Cc: Dave Jones
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Change fixed size arrays to per_cpu variables or dynamically allocated
  arrays in sched_init() and sched_init_smp().

  (1) static struct sched_entity *init_sched_entity_p[NR_CPUS];
  (1) static struct cfs_rq *init_cfs_rq_p[NR_CPUS];
  (1) static struct sched_rt_entity *init_sched_rt_entity_p[NR_CPUS];
  (1) static struct rt_rq *init_rt_rq_p[NR_CPUS];
      static struct sched_group **sched_group_nodes_bycpu[NR_CPUS];

  (1) - these arrays are allocated via alloc_bootmem_low()

* Change sched_domain_debug_one() to use cpulist_scnprintf instead of
  cpumask_scnprintf. This reduces the output buffer required and improves
  readability when large NR_CPUS count machines arrive.

* In sched_create_group() we allocate new arrays based on nr_cpu_ids.
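A sketch of the conversion pattern for the (1)-marked arrays, assuming they
are sized by nr_cpu_ids as the sched_create_group() note suggests (the
function name here is made up; the real changes live in sched_init()):

    /* before: static struct cfs_rq *init_cfs_rq_p[NR_CPUS]; */
    static struct cfs_rq **init_cfs_rq_p;

    static void __init sched_init_arrays_sketch(void)
    {
        /* sized by the possible-cpu count, not the NR_CPUS maximum */
        init_cfs_rq_p = alloc_bootmem_low(nr_cpu_ids *
                                          sizeof(struct cfs_rq *));
    }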
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
  NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
  and MAXNODES counts.

* In some cases, the cpumask variable was initialized but then overwritten
  with another value. This is the case for changes like this:

    - cpumask_t oldmask = CPU_MASK_ALL;
    + cpumask_t oldmask;

Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Move large array "struct bootnode nodes" from stack to _initdata
  section to reduce amount of stack space required.

Cc: H. Peter Anvin
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
Create a simple macro to always return a pointer to the node_to_cpumask(node)
value. This relies on compiler optimization to remove the extra indirection:

    #define node_to_cpumask_ptr(v, node)            \
        cpumask_t _##v = node_to_cpumask(node), *v = &_##v

For those systems with a large cpumask size, a true pointer
to the array element can be used:

    #define node_to_cpumask_ptr(v, node)            \
        cpumask_t *v = &(node_to_cpumask_map[node])

A node_to_cpumask_ptr_next() macro is provided to access another
node_to_cpumask value.

The other change is to always include asm-generic/topology.h, moving the
ifdef CONFIG_NUMA to this same file.

Note: there are no references to either of these new macros in this patch,
only the definition.
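A hypothetical caller, to show the intended usage pattern (do_something()
is a stand-in; as noted above, the patch itself adds no users):

    static void visit_node_cpus(int node)
    {
        node_to_cpumask_ptr(nodemask, node);  /* declares cpumask_t *nodemask */
        int cpu;

        for_each_cpu_mask(cpu, *nodemask)
            do_something(cpu);
    }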
Based on 2.6.25-rc5-mm1

# alpha
Cc: Richard Henderson

# fujitsu
Cc: David Howells

# ia64
Cc: Tony Luck

# powerpc
Cc: Paul Mackerras
Cc: Anton Blanchard

# sparc
Cc: David S. Miller
Cc: William L. Irwin

# x86
Cc: H. Peter Anvin

Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
Change the following arrays sized by NR_CPUS to be PERCPU variables:

    static struct op_msrs cpu_msrs[NR_CPUS];
    static unsigned long saved_lvtpc[NR_CPUS];

Also some minor complaints from checkpatch.pl fixed.

Based on:
    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git

All changes were transparent except for:

    static void nmi_shutdown(void)
    {
    +   struct op_msrs *msrs = &__get_cpu_var(cpu_msrs);
        nmi_enabled = 0;
        on_each_cpu(nmi_cpu_shutdown, NULL, 0, 1);
        unregister_die_notifier(&profile_exceptions_nb);
    -   model->shutdown(cpu_msrs);
    +   model->shutdown(msrs);
        free_msrs();
    }

The existing code passed a reference to cpu 0's instance of struct op_msrs
to model->shutdown, whilst the other functions are passed a reference to
this cpu's instance of a struct op_msrs. This seemed to be a bug to me,
even though as long as cpu 0 and this cpu are of the same type it would
have the same effect...?

Cc: Philippe Elie
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
* Change the following static arrays sized by NR_CPUS to
  per_cpu data variables:

    _cpuid4_info *cpuid4_info[NR_CPUS];
    _index_kobject *index_kobject[NR_CPUS];
    kobject * cache_kobject[NR_CPUS];

* Replace the local NR_CPUS array with a kmalloc'd region in
  show_shared_cpu_map().

Also some minor complaints from checkpatch.pl fixed.
Cc: H. Peter Anvin
Cc: Andi Kleen
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
-
Add a new function cpumask_scnprintf_len() to return the number of
characters needed to display "len" cpumask bits. The current method
of allocating NR_CPUS bytes is incorrect as what's really needed is
9 characters per 32-bit word of cpumask bits (8 hex digits plus the
separator [','] or the terminating NUL). This function provides the
caller the means to allocate the correct string length.
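The arithmetic, as a self-contained sketch (the real function's exact
rounding and name binding may differ):

    #include <stdio.h>

    /* 9 chars per 32-bit word: 8 hex digits plus a ',' or the final NUL */
    static int cpumask_scnprintf_len_sketch(int nbits)
    {
        int words = (nbits + 31) / 32;

        return words * 9;
    }

    int main(void)
    {
        /* 128 cpumask bits -> 4 words -> 36 bytes, vs. 128 NR_CPUS bytes */
        printf("%d\n", cpumask_scnprintf_len_sketch(128));
        return 0;
    }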
Cc: Paul Jackson
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar