20 Apr, 2008

40 commits

  • provide a text-based interface to the scheduler features; this saves
    the 'user' from setting bits using decimal arithmetic.
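
    A minimal userspace sketch of the idea in C (hypothetical feature
    names, not the kernel's actual code): map names to bit positions so
    the user writes a name rather than computing a mask by hand.

        #include <stdio.h>
        #include <string.h>

        /* Hypothetical feature names standing in for the real ones. */
        static const char *feat_names[] = { "FEAT_A", "FEAT_B", "FEAT_C" };

        /* Return mask with the named feature's bit set; unknown names
         * are silently ignored. */
        static unsigned int set_feat(unsigned int mask, const char *name)
        {
                unsigned int i;

                for (i = 0; i < sizeof(feat_names) / sizeof(feat_names[0]); i++)
                        if (!strcmp(name, feat_names[i]))
                                mask |= 1u << i;
                return mask;
        }

        int main(void)
        {
                /* "FEAT_B" instead of remembering FEAT_B is bit 1 == 2. */
                printf("mask = %#x\n", set_feat(0, "FEAT_B"));
                return 0;
        }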

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • unused at the moment.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Print a tree of weights.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • In order to level the hierarchy, we need to calculate load based on the
    root view. That is, each task's load is in the same unit.

          A
         / \
        B   1
       / \
      2   3

    To compute 1's load we do:

        weight(1) / rq_weight(A)

    To compute 2's load we do:

        (weight(2) / rq_weight(B)) * (weight(B) / rq_weight(A))

    This yields load fractions in comparable units.
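
    A quick worked example (weights chosen for illustration): if every
    entity has weight 1024, then rq_weight(A) = weight(B) + weight(1) =
    2048 and rq_weight(B) = 2048, so task 1's load is 1024/2048 = 1/2
    while tasks 2 and 3 each get (1024/2048) * (1024/2048) = 1/4; the
    three fractions sum to 1 and compare directly.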

    The consequence is that it changes virtual time. We used to have:

        vtime_{i} = time_{i} / weight_{i}

        vtime = \Sum vtime_{i} = time / rq_weight.

    But with the new way of load calculation we get that vtime equals time.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • De-couple load-balancing from the rb-trees, so that I can change their
    organization.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Currently FAIR_GROUP sched grows the scheduler latency outside of
    sysctl_sched_latency; invert this so it stays within.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Now that the group hierarchy can have an arbitrary depth the O(n^2) nature
    of RT task dequeues will really hurt. Optimize this by providing space to
    store the tree path, so we can walk it the other way.
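
    A sketch of the trick with made-up types (the real patch records the
    path of rt entities; this is an assumption-laden illustration):

        #include <stdio.h>

        struct node {
                struct node *parent;
                const char *name;
        };

        #define MAX_DEPTH 16    /* assumes the hierarchy is no deeper */

        static void visit(struct node *n)
        {
                printf("dequeue %s\n", n->name);
        }

        /* Record the leaf-to-root path once, then replay it root-to-leaf,
         * instead of re-walking to the root at every level (O(n^2)). */
        static void walk_top_down(struct node *leaf)
        {
                struct node *path[MAX_DEPTH];
                int depth = 0;
                struct node *n;

                for (n = leaf; n; n = n->parent)
                        path[depth++] = n;      /* leaf .. root */

                while (depth--)
                        visit(path[depth]);     /* root .. leaf */
        }

        int main(void)
        {
                struct node root = { NULL, "root" };
                struct node mid  = { &root, "mid" };
                struct node leaf = { &mid, "leaf" };

                walk_top_down(&leaf);
                return 0;
        }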

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Add some extra debug output so we can get a better overview of the
    full hierarchy.

    We print the cgroup path after each cfs_rq, so we can see what group
    we're looking at.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Implement SMP nice support for the full group hierarchy.

    On each load-balance action, compile a sched_domain wide view of the full
    task_group tree. We compute the domain wide view when walking down the
    hierarchy, and readjust the weights when walking back up.

    After collecting and readjusting the domain wide view, we try to
    balance the tasks within the task_groups. The current approach is to
    naively balance each task group until we've moved the targeted
    amount of load.

    Inspired by Srivatsa Vaddagiri's previous code and Abhishek Chandra's
    H-SMP paper.

    XXX: there will be some numerical issues due to the limited nature of
    SCHED_LOAD_SCALE wrt representing a task_group's influence on the
    total weight. When the tree is deep enough, or the task weight small
    enough, we'll run out of bits.
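
    To make the XXX concrete: SCHED_LOAD_SCALE is 1 << 10, so a group's
    share of the total weight carries roughly 10 bits of precision. A
    nice-0 task (weight 1024) under two nested groups that each hold
    1/32 of their parent contributes 1024 / 32 / 32 = 1; one level
    deeper, or a lighter task, and the contribution rounds to zero.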

    Signed-off-by: Peter Zijlstra
    CC: Abhishek Chandra
    CC: Srivatsa Vaddagiri
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • [rebased for sched-devel/latest]

    - Add a new cpuset file, 'sched_relax_domain_level'.

    - Modify partition_sched_domains() and build_sched_domains()
      to take an attributes parameter passed from cpuset.

    - Fill newidle_idx for node domains, which is currently unused but
      might be required if sched_relax_domain_level becomes higher.

    - The default level can be changed with the boot option
      'relax_domain_level='.

    Signed-off-by: Hidetoshi Seto
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
     
  • This patch introduces a new cpuset feature: sched domain
    customization.

    This version provides a per-cpuset file 'sched_relax_domain_level'
    that enables us to change the search range of the scheduler, which
    limits how many cpus the scheduler searches at certain scheduling
    events, such as waking up a task or pulling work when a runqueue
    runs empty.

    Signed-off-by: Hidetoshi Seto
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
     
  • Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • multi level rt constraints

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Add the full parent<->child relation into task_groups as well.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • UID grouping doesn't actually have a task_group representing the root of
    the task_group tree. Add one.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • This patch makes the group scheduler multi hierarchy aware.

    [a.p.zijlstra@chello.nl: rt-parts and assorted fixes]
    Signed-off-by: Dhaval Giani
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Dhaval Giani
     
  • This patch allows tasks and groups to exist in the same cfs_rq. With
    this change, CFS group scheduling moves from a 1/(1+N) to a 1/(M+N)
    fairness model, where M tasks and N groups exist at the cfs_rq
    level.
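
    For instance, at equal weight with M = 2 tasks and N = 1 group on a
    cfs_rq, each task and the group now receive 1/3 of the CPU apiece
    (the group's third is then divided among its own members), rather
    than the two tasks competing as a single implicit entity against
    the group.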

    [a.p.zijlstra@chello.nl: rt bits and assorted fixes]
    Signed-off-by: Dhaval Giani
    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Dhaval Giani
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Add a new function that accepts a pointer to the "newly allowed cpus"
    cpumask argument.

    int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

    The current set_cpus_allowed() function is modified to use the above
    but this does not result in an ABI change. And with some compiler
    optimization help, it may not introduce any additional overhead.

    Additionally, to enforce the read only nature of the new_mask arg, the
    "const" property is migrated to sub-functions called by set_cpus_allowed.
    This silences compiler warnings.
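
    Presumably the compatibility wrapper looks something like this
    sketch:

        /* Old by-value entry point kept as-is; it takes the address of
         * its local copy and calls the new by-reference function. */
        static inline int set_cpus_allowed(struct task_struct *p,
                                           cpumask_t new_mask)
        {
                return set_cpus_allowed_ptr(p, &new_mask);
        }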

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • Move the setting of nr_cpu_ids from sched_init() to start_kernel()
    so that it's available as early as possible.

    Note that an arch has the option of setting it even earlier if need
    be, but it should not result in a different value than what
    setup_nr_cpu_ids() computes.
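
    A sketch of what such a setup function might look like (assumed, not
    quoted from the patch):

        /* Derive nr_cpu_ids from the highest possible cpu, early enough
         * for later boot code to size allocations with it. */
        static void __init setup_nr_cpu_ids(void)
        {
                int cpu, highest_cpu = 0;

                for_each_possible_cpu(cpu)
                        highest_cpu = cpu;      /* ascending order */
                nr_cpu_ids = highest_cpu + 1;
        }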

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Remove another cpumask_t variable from stack that was missed in the
    last kernel/sched.c updates.

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Add cpu_sysdev_class functions to display the following maps
    with cpulist_scnprintf() (a sketch of such a show function follows
    below):

    cpu_online_map
    cpu_present_map
    cpu_possible_map

    * Small change to include/linux/sysdev.h to allow the attribute
    name and label to be different (to avoid collision with the
    "attr_online" entry for bringing cpus on- and off-line.)

    Cc: H. Peter Anvin
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Cleaned up references to cpumask_scnprintf() and added new
    cpulist_scnprintf() interfaces where appropriate.

    * Fix some small bugs (or code efficiency improvements) for various
    uses of cpumask_scnprintf.

    * Clean up some checkpatch errors.
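
    (For reference, the two output formats: cpumask_scnprintf() prints
    the hex mask form, e.g. "000000ff" for cpus 0-7, while
    cpulist_scnprintf() prints the range list "0-7", which stays
    readable on large machines.)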

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Removed kmalloc (or local array) in show_shared_cpu_map().

    * Added show_shared_cpu_list() function.

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Here is a simple patch to use an allocated array of cpumasks to
    represent cpumask_of_cpu() instead of constructing one on the stack.
    It's based on the Kconfig option "HAVE_CPUMASK_OF_CPU_MAP" which is
    currently only set for x86_64 SMP. Otherwise the existing
    cpumask_of_cpu() is used but has been changed to produce an lvalue
    so a pointer to it can be used.
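
    The shape of the change is presumably along these lines (sketch;
    exact declarations assumed):

        #ifdef CONFIG_HAVE_CPUMASK_OF_CPU_MAP
        extern const cpumask_t *cpumask_of_cpu_map;
        #define cpumask_of_cpu(cpu)     (cpumask_of_cpu_map[cpu])
        #endif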

    Cc: H. Peter Anvin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Add a pointer, CPU_MASK_ALL_PTR, referencing a static CPU_MASK_ALL
    cpumask_t (see the sketch below). This reduces, where possible, the
    instances where CPU_MASK_ALL allocates and fills a large array on
    the stack. Used only if NR_CPUS > BITS_PER_LONG.

    * Change init/main.c to use new set_cpus_allowed_ptr().

    Depends on:
    [sched-devel]: sched: add new set_cpus_allowed_ptr function
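
    A rough sketch of the pointer reference (variable name assumed):

        /* Only when NR_CPUS > BITS_PER_LONG is the mask large enough
         * for the on-stack copy to matter. */
        #if NR_CPUS > BITS_PER_LONG
        extern cpumask_t _cpu_mask_all;       /* statically CPU_MASK_ALL */
        #define CPU_MASK_ALL_PTR        (&_cpu_mask_all)
        #endif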

    Cc: H. Peter Anvin
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Remove empty cpumask_t (and all non-zero/non-null) variables
    in SD_*_INIT macros. Use memset(0) to clear. Also, don't
    inline the initializer functions to save on stack space in
    build_sched_domains().

    * Merge change to include/linux/topology.h that uses the new
    node_to_cpumask_ptr function in the nr_cpus_node macro into
    this patch.

    Depends on:
    [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
    [sched-devel]: sched: add new set_cpus_allowed_ptr function

    Cc: H. Peter Anvin
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Use new node_to_cpumask_ptr. This creates a pointer to the
    cpumask for a given node. This definition is in mm patch:

    asm-generic-add-node_to_cpumask_ptr-macro.patch

    * Use new set_cpus_allowed_ptr function.

    Depends on:
    [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
    [sched-devel]: sched: add new set_cpus_allowed_ptr function
    [x86/latest]: x86: add cpus_scnprintf function

    Cc: Greg Kroah-Hartman
    Cc: Greg Banks
    Cc: H. Peter Anvin
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Modify sched_affinity functions to pass cpumask_t variables by reference
    instead of by value.

    * Use new set_cpus_allowed_ptr function.

    Depends on:
    [sched-devel]: sched: add new set_cpus_allowed_ptr function

    Cc: Paul Jackson
    Cc: Cliff Wickman
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Modify cpuset_cpus_allowed to return the currently allowed cpuset
    via a pointer argument instead of as the function return value (see
    the sketch below).

    * Use new set_cpus_allowed_ptr function.

    * Cleanup CPU_MASK_ALL and NODE_MASK_ALL uses.
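
    The interface change in the first point presumably reads:

        -cpumask_t cpuset_cpus_allowed(struct task_struct *tsk);
        +void cpuset_cpus_allowed(struct task_struct *tsk, cpumask_t *mask);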

    Depends on:
    [sched-devel]: sched: add new set_cpus_allowed_ptr function

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Use new set_cpus_allowed_ptr() function added by previous patch,
    which instead of passing the "newly allowed cpus" cpumask_t arg
    by value, pass it by pointer:

    -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
    +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

    * Modify CPU_MASK_ALL

    Depends on:
    [sched-devel]: sched: add new set_cpus_allowed_ptr function

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Use new set_cpus_allowed_ptr() function added by previous patch,
    which instead of passing the "newly allowed cpus" cpumask_t arg
    by value, pass it by pointer:

    -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
    +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

    * Cleanup uses of CPU_MASK_ALL.

    * Collapse other NR_CPUS changes into
    arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c. Use pointers to
    cpumask_t arguments whenever possible.

    Depends on:
    [sched-devel]: sched: add new set_cpus_allowed_ptr function

    Cc: Len Brown
    Cc: Dave Jones
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Change fixed size arrays to per_cpu variables or dynamically allocated
    arrays in sched_init() and sched_init_smp().

    (1) static struct sched_entity *init_sched_entity_p[NR_CPUS];
    (1) static struct cfs_rq *init_cfs_rq_p[NR_CPUS];
    (1) static struct sched_rt_entity *init_sched_rt_entity_p[NR_CPUS];
    (1) static struct rt_rq *init_rt_rq_p[NR_CPUS];
    static struct sched_group **sched_group_nodes_bycpu[NR_CPUS];

    (1) - these arrays are allocated via alloc_bootmem_low()

    * Change sched_domain_debug_one() to use cpulist_scnprintf instead of
    cpumask_scnprintf. This reduces the output buffer required and
    improves readability when large NR_CPUS count machines arrive.

    * In sched_create_group() we allocate new arrays based on nr_cpu_ids.
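
    For the last point, the allocation pattern is presumably of this
    form (field names assumed):

        tg->se = kzalloc(sizeof(struct sched_entity *) * nr_cpu_ids,
                         GFP_KERNEL);
        tg->cfs_rq = kzalloc(sizeof(struct cfs_rq *) * nr_cpu_ids,
                             GFP_KERNEL);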

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
    NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
    and MAXNODES counts.

    * In some cases, the cpumask variable was initialized but then overwritten
    with another value. This is the case for changes like this:

    - cpumask_t oldmask = CPU_MASK_ALL;
    + cpumask_t oldmask;

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Move large array "struct bootnode nodes" from stack to __initdata
    section to reduce amount of stack space required.

    Cc: H. Peter Anvin
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • Create a simple macro to always return a pointer to the node_to_cpumask(node)
    value. This relies on compiler optimization to remove the extra indirection:

    #define node_to_cpumask_ptr(v, node) \
            cpumask_t _##v = node_to_cpumask(node), *v = &_##v

    For those systems with a large cpumask size, a true pointer to the
    array element can be used instead:

    #define node_to_cpumask_ptr(v, node) \
            cpumask_t *v = &(node_to_cpumask_map[node])

    A node_to_cpumask_ptr_next() macro is provided to access another
    node_to_cpumask value.

    The other change is to always include asm-generic/topology.h, moving
    the ifdef CONFIG_NUMA into that same file.

    Note: there are no references to either of these new macros in this patch,
    only the definition.
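
    For illustration only (as noted, the patch adds no users), a caller
    might look like:

        /* node_to_cpumask_ptr() declares and initializes 'mask'. */
        static int first_cpu_of_node(int node)
        {
                node_to_cpumask_ptr(mask, node);

                return first_cpu(*mask);
        }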

    Based on 2.6.25-rc5-mm1

    # alpha
    Cc: Richard Henderson

    # fujitsu
    Cc: David Howells

    # ia64
    Cc: Tony Luck

    # powerpc
    Cc: Paul Mackerras
    Cc: Anton Blanchard

    # sparc
    Cc: David S. Miller
    Cc: William L. Irwin

    # x86
    Cc: H. Peter Anvin

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • Change the following arrays sized by NR_CPUS to be PERCPU variables:

    static struct op_msrs cpu_msrs[NR_CPUS];
    static unsigned long saved_lvtpc[NR_CPUS];

    Also some minor complaints from checkpatch.pl fixed.

    Based on:
    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git

    All changes were transparent except for:

    static void nmi_shutdown(void)
    {
    +       struct op_msrs *msrs = &__get_cpu_var(cpu_msrs);
            nmi_enabled = 0;
            on_each_cpu(nmi_cpu_shutdown, NULL, 0, 1);
            unregister_die_notifier(&profile_exceptions_nb);
    -       model->shutdown(cpu_msrs);
    +       model->shutdown(msrs);
            free_msrs();
    }

    The existing code passed a reference to cpu 0's instance of struct
    op_msrs to model->shutdown, whilst the other functions are passed a
    reference to this cpu's instance of struct op_msrs. This seemed to
    be a bug to me, even though as long as cpu 0 and this cpu are of the
    same type it would have the same effect...?

    Cc: Philippe Elie
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • * Change the following static arrays sized by NR_CPUS to
    per_cpu data variables:

    _cpuid4_info *cpuid4_info[NR_CPUS];
    _index_kobject *index_kobject[NR_CPUS];
    kobject * cache_kobject[NR_CPUS];

    * Replace the local NR_CPUS-sized array with a kmalloc'd region in
    show_shared_cpu_map().

    Also some minor complaints from checkpatch.pl fixed.
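
    The conversion pattern for the first point, sketched as a diff (not
    quoted from the patch):

        -static struct _cpuid4_info *cpuid4_info[NR_CPUS];
        +static DEFINE_PER_CPU(struct _cpuid4_info *, cpuid4_info);

    with accesses changing from cpuid4_info[cpu] to
    per_cpu(cpuid4_info, cpu).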

    Cc: H. Peter Anvin
    Cc: Andi Kleen
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
  • Add a new function cpumask_scnprintf_len() to return the number of
    characters needed to display "len" cpumask bits. The current method
    of allocating NR_CPUS bytes is incorrect as what's really needed is
    9 characters per 32-bit word of cpumask bits (8 hex digits plus the
    separator [','] or the terminating NULL.) This function provides the
    caller the means to allocate the correct string length.
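
    A worked example: with NR_CPUS = 1024 the mask occupies 32 32-bit
    words, so 32 * 9 = 288 characters suffice, versus the 1024 bytes the
    old method would allocate.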

    Cc: Paul Jackson
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis