Doug / smarc-fsl-linux-kernel | Embedian Git Server

27 Jul, 2008

2 commits

85ba2d862 tracehook: wait_task_inactive

This extends wait_task_inactive() with a new argument so it can be used in
a "soft" mode where it will check for the task changing state unexpectedly
and back off. There is no change to existing callers. This lays the
groundwork to allow robust, noninvasive tracing that can try to sample a
blocked thread but back off safely if it wakes up.
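The commit text does not show the new prototype. As a rough, hedged illustration, here is a single-threaded user-space model of the "soft" mode. All names are hypothetical, not the kernel's. The extra argument, called match_state here, carries the state the caller just observed. If the task has left that state, the call backs off and returns 0. Passing 0 keeps the old unconditional wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a task; not struct task_struct. */
struct model_task {
    long state;           /* 0 = running, nonzero = some blocked state */
    bool on_cpu;          /* still executing on its CPU? */
    unsigned long nvcsw;  /* voluntary context-switch count */
};

/*
 * Sketch of the "soft" mode. match_state == 0 keeps the old behaviour:
 * wait unconditionally. A nonzero match_state is the state the caller
 * just observed; if the task has left that state (it may have woken up),
 * back off and return 0 instead of waiting.
 */
static unsigned long model_wait_task_inactive(struct model_task *t,
                                              long match_state)
{
    assert(t);
    if (match_state && t->state != match_state)
        return 0;
    /* A real implementation would spin or sleep here until the task is
     * off its CPU; this model simply records that the wait completed. */
    t->on_cpu = false;
    return t->nvcsw | 1UL;  /* any nonzero value signals success here */
}
```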

Signed-off-by: Roland McGrath
Cc: Oleg Nesterov
Reviewed-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2008-07-27 03:00:09 +0800
7babe8db9 Full conversion to early_initcall() interface, remove old interface

A previous patch added the early_initcall(), to allow a cleaner hooking of
pre-SMP initcalls. Now we remove the older interface, converting all
existing users to the new one.

[akpm@linux-foundation.org: cleanups]
[akpm@linux-foundation.org: build fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
Signed-off-by: Eduard - Gabriel Munteanu
Cc: Tom Zanussi
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eduard - Gabriel Munteanu
2008-07-27 03:00:04 +0800

26 Jul, 2008

1 commit

49b5cf347 accounting: account for user time when updating memory integrals

Adapt acct_update_integrals() to include user time when calculating the time
difference. The units of acct_rss_mem1 and acct_vm_mem1 are also changed from
pages-jiffies to pages-usecs to avoid calling jiffies_to_usecs() in
xacct_add_tsk(), where the conversion might overflow.
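Here is a sketch of why the unit change matters. It rests on two assumptions: HZ=1000, and a conversion helper that returns a 32-bit value while the integral is 64-bit. The names are hypothetical. Converting a large pages-jiffies integral at report time wraps. Accumulating pages-usecs directly does not.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for the sketch: HZ = 1000, so one jiffy is 1000 microseconds. */
#define MODEL_USEC_PER_JIFFY 1000u

/* Hypothetical conversion helper whose result is only 32 bits wide,
 * narrower than the 64-bit integral passed to it. */
static uint32_t model_jiffies_to_usecs32(uint64_t j)
{
    return (uint32_t)(j * MODEL_USEC_PER_JIFFY);
}

/* Old scheme: integral kept in pages-jiffies, converted when reported. */
static uint64_t report_old(uint64_t pages, uint64_t jiffies)
{
    uint64_t integral = pages * jiffies;        /* pages-jiffies */
    return model_jiffies_to_usecs32(integral);  /* truncated to 32 bits */
}

/* New scheme: accumulate pages-usecs directly; no conversion when reported. */
static uint64_t report_new(uint64_t pages, uint64_t usecs)
{
    return pages * usecs;                       /* pages-usecs */
}
```

Small integrals survive either way. Once the product of pages and time exceeds 32 bits after conversion, only the new scheme reports the right value.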

Signed-off-by: Jonathan Lim
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jonathan Lim
2008-07-26 01:53:46 +0800

24 Jul, 2008

2 commits

7f9dce383 Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: hrtick_enabled() should use cpu_active()
sched, x86: clean up hrtick implementation
sched: fix build error, provide partition_sched_domains() unconditionally
sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed
cpu hotplug: Make cpu_active_map synchronization dependency clear
cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
sched: rework of "prioritize non-migratable tasks over migratable ones"
sched: reduce stack size in isolated_cpu_setup()
Revert parts of "ftrace: do not trace scheduler functions"

Fixed up conflicts in include/asm-x86/thread_info.h (due to the
TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and
kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr()
introduction).

Linus Torvalds
2008-07-24 10:36:53 +0800
26dcce0fa Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
cpumask: Provide a generic set of CPUMASK_ALLOC macros
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
Revert "cpumask: introduce new APIs"
cpumask: make for_each_cpu_mask a bit smaller
net: Pass reference to cpumask variable in net/sunrpc/svc.c
...

Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually

Linus Torvalds
2008-07-24 09:37:44 +0800

22 Jul, 2008

1 commit

4a0b2b4db sysdev: Pass the attribute to the low level sysdev show/store function

This allows attributes to be generated dynamically and show/store
functions to be shared between attributes. Right now most attributes are
generated by special macros with lots of duplicated code. With the
attribute passed in, it is instead possible to attach some data to the
attribute and then use it in shared low-level functions to do different things.

I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
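A user-space sketch of the pattern described above follows. Hypothetical types stand in for the sysdev structures. Because the show callback now receives the attribute, one shared function can recover per-attribute data (a bank number here) through a container_of-style macro. The attributes themselves can then be generated in a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local container_of in the style of the kernel macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for a generic attribute: show() gets the attribute. */
struct model_attr {
    const char *name;
    int (*show)(struct model_attr *attr, char *buf, size_t len);
};

/* Per-bank attribute: extra data wrapped around the generic attribute. */
struct bank_attr {
    struct model_attr attr;
    int bank;
    char name[16];
};

/* One shared show function serves every bank. */
static int show_bank(struct model_attr *attr, char *buf, size_t len)
{
    struct bank_attr *ba = container_of(attr, struct bank_attr, attr);
    return snprintf(buf, len, "bank%d\n", ba->bank);
}

/* Generate n attributes at runtime instead of one macro per attribute. */
static void init_banks(struct bank_attr *banks, int n)
{
    for (int i = 0; i < n; i++) {
        banks[i].bank = i;
        snprintf(banks[i].name, sizeof banks[i].name, "bank%d", i);
        banks[i].attr.name = banks[i].name;
        banks[i].attr.show = show_bank;
    }
}
```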

I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.

Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32

Signed-off-by: Andi Kleen
Signed-off-by: Greg Kroah-Hartman

Andi Kleen
2008-07-22 12:55:02 +0800

20 Jul, 2008

3 commits

ba42059fb sched: hrtick_enabled() should use cpu_active()

Peter pointed out that hrtick_enabled() should use cpu_active().

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-07-20 17:02:06 +0800
d986434a7 Merge branch 'sched/urgent' into sched/devel

Ingo Molnar
2008-07-20 17:01:29 +0800
31656519e sched, x86: clean up hrtick implementation

random uvesafb failures were reported against Gentoo:

http://bugs.gentoo.org/show_bug.cgi?id=222799

and Mihai Moldovan bisected it back to:

> 8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit
> commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> Author: Peter Zijlstra
> Date: Fri Jan 25 21:08:29 2008 +0100
>
> sched: high-res preemption tick

Linus suspected it to be hrtick + vm86 interaction and observed:

> Btw, Peter, Ingo: I think that commit is doing bad things. They aren't
> _incorrect_ per se, but they are definitely bad.
>
> Why?
>
> Using random _TIF_WORK_MASK flags is really impolite for doing
> "scheduling" work. There's a reason that arch/x86/kernel/entry_32.S
> special-cases the _TIF_NEED_RESCHED flag: we don't want to exit out of
> vm86 mode unnecessarily.
>
> See the "work_notifysig_v86" label, and how it does that
> "save_v86_state()" thing etc etc.

Right, I never liked having to fiddle with those TIF flags. Initially I
needed it because the hrtimer base lock could not nest in the rq lock.
That however is fixed these days.

Currently the only reason left to fiddle with the TIF flags is remote
wakeups. We cannot program a remote cpu's hrtimer. I've been thinking
about using the new and improved IPI function call stuff to implement
hrtimer_start_on().

However that does require that smp_call_function_single(.wait=0) works
from interrupt context - /me looks at the latest series from Jens - Yes
that does seem to be supported, good.

Here's a stab at cleaning this stuff up ...

Mihai reported test success as well.

Signed-off-by: Peter Zijlstra
Tested-by: Mihai Moldovan
Cc: Michal Januszewski
Cc: Antonino Daplas
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-07-20 16:37:28 +0800

18 Jul, 2008

2 commits

e761b7725 cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)

This is based on Linus' idea of creating a cpu_active_map that prevents
the scheduler load balancer from migrating tasks to a cpu that is going
down.

It allows us to simplify the domain management code and avoid unnecessary
domain rebuilds during cpu hotplug event handling.

Please ignore the cpusets part for now. It needs some more work in order
to avoid crazy lock nesting. That said, I did simplify and unify the domain
reinitialization logic. We now simply call partition_sched_domains() in
all cases. This means that we're using exactly the same code paths as in
the cpusets case, and hence the tests below cover cpusets too.
Cpuset changes to make rebuild_sched_domains() callable from various
contexts are in a separate patch (the one right after this one).

This not only boots but also easily handles

    while true; do make clean; make -j 8; done

and

    while true; do on-off-cpu 1; done

at the same time.
(on-off-cpu 1 simply does the echo 0/1 > /sys/.../cpu1/online thing.)

Surprisingly, the box (dual-core Core2) is quite usable. In fact I'm typing
this right now in gnome-terminal and things are moving just fine.

Also, this is running with most of the debug features enabled (lockdep,
mutex, etc.), and there have been no BUG_ONs or lockdep complaints so far.

I believe I addressed all of Dmitry's comments on Linus' original
version. I changed both the fair and rt balancers to mask out non-active
cpus, and replaced cpu_is_offline() with !cpu_active() in the main
scheduler code where it made sense (to me).
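To make the idea concrete, here is a toy model of a balancer that only considers CPUs that are both online and active. It uses hypothetical names, supports at most 64 CPUs, and picks the lowest-numbered candidate. It is not the kernel's balancing code. It only shows why clearing a dying CPU from the active mask first keeps tasks from being migrated onto it while the CPU is still technically online.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit n set means CPU n is in the mask (up to 64 CPUs). */
typedef uint64_t model_cpumask;

/* Return the lowest-numbered CPU that is online, active and allowed,
 * or -1 if there is none. */
static int pick_target_cpu(model_cpumask online, model_cpumask active,
                           model_cpumask allowed)
{
    model_cpumask candidates = online & active & allowed;

    for (int cpu = 0; cpu < 64; cpu++)
        if (candidates & ((model_cpumask)1 << cpu))
            return cpu;
    return -1;
}
```

For example, with CPUs 0-3 online and CPU 1 going down (already cleared from the active mask), a task allowed only on CPUs 1 and 2 is sent to CPU 2.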

Signed-off-by: Max Krasnyanskiy
Acked-by: Linus Torvalds
Acked-by: Peter Zijlstra
Acked-by: Gregory Haskins
Cc: dmitry.adamushko@gmail.com
Cc: pj@sgi.com
Signed-off-by: Ingo Molnar

Max Krasnyansky
2008-07-18 19:22:25 +0800
13b40c1e4 sched: reduce stack size in isolated_cpu_setup()

* Remove 16k stack requirements in isolated_cpu_setup when NR_CPUS=4096.
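The "16k" figure matches one 4-byte int per possible CPU: 4096 × 4 = 16384 bytes on the stack. The sketch below uses hypothetical names, and we do not claim it matches the patch. It parses a CPU list such as "1,3-5" straight into a 512-byte bitmap instead of an intermediate int array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4096

/* One bit per CPU: 4096 / 8 = 512 bytes instead of 4096 ints. */
typedef struct { uint64_t bits[MODEL_NR_CPUS / 64]; } model_cpumask;

static void mask_set(model_cpumask *m, unsigned cpu)
{
    m->bits[cpu / 64] |= (uint64_t)1 << (cpu % 64);
}

static int mask_test(const model_cpumask *m, unsigned cpu)
{
    return (m->bits[cpu / 64] >> (cpu % 64)) & 1;
}

/* Parse "a,b-c,..." directly into the mask.
 * Returns 0 on success, -1 on malformed input or out-of-range CPUs. */
static int parse_cpulist(const char *s, model_cpumask *m)
{
    while (*s) {
        char *end;
        unsigned long a = strtoul(s, &end, 10), b = a;

        if (end == s)
            return -1;
        if (*end == '-') {               /* range "a-b" */
            s = end + 1;
            b = strtoul(s, &end, 10);
            if (end == s)
                return -1;
        }
        if (b < a || b >= MODEL_NR_CPUS)
            return -1;
        for (unsigned long c = a; c <= b; c++)
            mask_set(m, (unsigned)c);
        if (*end == ',')
            end++;
        else if (*end != '\0')
            return -1;
        s = end;
    }
    return 0;
}
```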

Signed-off-by: Mike Travis
Cc: Andrew Morton
Signed-off-by: Ingo Molnar

Mike Travis
2008-07-18 17:55:42 +0800

16 Jul, 2008

1 commit

82638844d Merge branch 'linus' into cpus4096

Conflicts:

arch/x86/xen/smp.c
kernel/sched_rt.c
net/iucv/iucv.c

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-07-16 06:29:07 +0800

15 Jul, 2008

2 commits

666484f02 Merge branch 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
softirq: remove irqs_disabled warning from local_bh_enable
softirq: remove initialization of static per-cpu variable
Remove argument from open_softirq which is always NULL

Linus Torvalds
2008-07-15 06:28:42 +0800
948769a5b Merge branch 'sched/new-API-sched_setscheduler' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched/new-API-sched_setscheduler' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: add new API sched_setscheduler_nocheck: add a flag to control access checks

Linus Torvalds
2008-07-15 05:50:49 +0800

14 Jul, 2008

2 commits

5806b81ac Merge branch 'auto-ftrace-next' into tracing/for-linus

Conflicts:

arch/x86/kernel/entry_32.S
arch/x86/kernel/process_32.c
arch/x86/kernel/process_64.c
arch/x86/lib/Makefile
include/asm-x86/irqflags.h
kernel/Makefile
kernel/sched.c

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-07-14 22:11:52 +0800
d14c8a680 Merge branch 'sched/for-linus' into tracing/for-linus

Ingo Molnar
2008-07-14 22:11:02 +0800

13 Jul, 2008

1 commit

54ef76f37 Merge branch 'linus' into sched/devel

Ingo Molnar
2008-07-13 14:50:13 +0800

12 Jul, 2008

1 commit

ae94b8075 Merge branch 'linus' into x86/core

Conflicts:

arch/x86/mm/ioremap.c

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-07-12 13:29:02 +0800

11 Jul, 2008

1 commit

b1e387348 sched: fix cpu hotplug, cleanup

Clean up __migrate_task() to just have separate "done" and "fail"
cases, instead of that "out" case with random error behavior.
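Here is a hypothetical sketch of the "done"/"fail" label structure. It is not the actual __migrate_task() body. Each early exit jumps to a label that fixes the return value, and both labels fall through to the same cleanup.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical task model; not struct task_struct. */
struct mig_task {
    int cpu;                      /* CPU the task currently lives on */
    bool allowed[MODEL_NR_CPUS];  /* affinity mask */
};

/*
 * Returns 1 when the task is no longer on src_cpu afterwards ("done"),
 * 0 when it could not be moved ("fail"). Both paths share the cleanup
 * after the fail label, so no exit path has an ambiguous return value.
 */
static int model_migrate_task(struct mig_task *p, int src_cpu, int dest_cpu,
                              int *unlocks)
{
    int ret = 0;

    assert(dest_cpu >= 0 && dest_cpu < MODEL_NR_CPUS);
    /* (runqueue locks would be taken here) */

    if (p->cpu != src_cpu)       /* already moved by someone else */
        goto done;
    if (!p->allowed[dest_cpu])   /* affinity changed: cannot move */
        goto fail;

    p->cpu = dest_cpu;           /* the actual move */
done:
    ret = 1;
fail:
    (*unlocks)++;                /* shared cleanup, e.g. dropping locks */
    return ret;
}
```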

Signed-off-by: Ingo Molnar

Linus Torvalds
2008-07-11 02:39:58 +0800

10 Jul, 2008

2 commits

bac0c9103 Merge branch 'tracing/ftrace' into auto-ftrace-next

Ingo Molnar
2008-07-10 17:43:00 +0800
dc7fab8b3 sched: fix cpu hotplug

I think we may have a race between try_to_wake_up() and
migrate_live_tasks() -> move_task_off_dead_cpu() in which the latter
may end up looping endlessly.

Interrupts are enabled on other CPUs when migration_call(CPU_DEAD, ...) is
called, so we may get a race between try_to_wake_up() and
migrate_live_tasks() -> move_task_off_dead_cpu(). The former may push
a task out of a dead CPU, causing the latter to loop endlessly.

Heiko Carstens observed:

| That's exactly what explains a dump I got yesterday. Thanks for fixing! :)

Signed-off-by: Dmitry Adamushko
Cc: miaox@cn.fujitsu.com
Cc: Lai Jiangshan
Cc: Heiko Carstens
Cc: Peter Zijlstra
Cc: Avi Kivity
Cc: Andrew Morton
Signed-off-by: Ingo Molnar

Dmitry Adamushko
2008-07-10 15:35:34 +0800

08 Jul, 2008

1 commit

076ac2af8 sched, numa: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c

* Replace usages of MAX_NUMNODES with nr_node_ids in kernel/sched.c,
where appropriate. This saves some allocated space as well as many
wasted cycles going through node entries that are non-existent.
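A minimal illustration with hypothetical names follows. Per-node arrays and loops are sized by the runtime node count (nr_node_ids in the commit) instead of a compile-time maximum. The MODEL_MAX_NUMNODES value is an assumption for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_MAX_NUMNODES 1024  /* assumed compile-time upper bound */

/* Allocate one slot per node that can actually exist at runtime. */
static long *alloc_per_node(int nr_node_ids)
{
    assert(nr_node_ids > 0 && nr_node_ids <= MODEL_MAX_NUMNODES);
    return calloc((size_t)nr_node_ids, sizeof(long));
}

/* Walk only the real nodes, not MODEL_MAX_NUMNODES entries. */
static long sum_nodes(const long *per_node, int nr_node_ids)
{
    long total = 0;

    for (int node = 0; node < nr_node_ids; node++)
        total += per_node[node];
    return total;
}
```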

Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar

Mike Travis
2008-07-08 17:31:30 +0800

07 Jul, 2008

1 commit

032f82786 Merge commit 'v2.6.26-rc9' into sched/devel

Ingo Molnar
2008-07-07 14:01:26 +0800

06 Jul, 2008

1 commit

68083e05d Merge commit 'v2.6.26-rc9' into cpus4096

Ingo Molnar
2008-07-06 20:23:39 +0800

04 Jul, 2008

3 commits

46ac22bab sched: fix accounting in task delay accounting & migration

On Thu, Jun 19, 2008 at 12:27:14PM +0200, Peter Zijlstra wrote:
> On Thu, 2008-06-05 at 10:50 +0530, Ankita Garg wrote:
>
> > Thanks Peter for the explanation...
> >
> > I agree with the above and that is the reason why I did not see weird
> > values with cpu_time. But run_delay would still suffer from skew as the end
> > points for delta could be taken on different cpus due to migration (more
> > so on RT kernel due to the push-pull operations). With the below patch,
> > I could not reproduce the issue I had seen earlier. After every dequeue,
> > we take the delta and start wait measurements from zero when moved to a
> > different rq.
>
> OK, so task delay accounting is broken because it doesn't take
> migration into account.
>
> What you've done is make it symmetric wrt enqueue, and account it like
>
>   cpu0          cpu1
>
>   enqueue
>    (wait d1)
>   dequeue
>                 enqueue
>                  (wait d2)
>                 run
>
> Where you add both d1 and d2 to the run_delay,.. right?
>

Thanks for reviewing the patch. The above is exactly what I have done.
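The toy model below illustrates the fix discussed in this thread. The names are hypothetical; model_close_wait() plays roughly the role the thread assigns to sched_info_dequeue(). Each CPU clock carries its own skew. Because every wait interval is closed on the CPU where it began, d1 and d2 are each measured on a single clock. The last test assertion shows the skewed value a cross-CPU delta would produce.

```c
#include <assert.h>

/* Hypothetical model: each CPU's clock = global time + a fixed per-CPU skew. */
static long long model_cpu_clock(const long long *skew, int cpu, long long t)
{
    return t + skew[cpu];
}

struct model_sched_info {
    long long last_queued;  /* clock reading when the current wait began */
    long long run_delay;    /* accumulated wait time */
};

/* Start a wait interval, read from the clock of the CPU being enqueued on. */
static void model_enqueue(struct model_sched_info *si, long long now)
{
    si->last_queued = now;
}

/* Close the wait interval using the same CPU's clock: called on dequeue
 * (before migrating) and when the task finally runs. */
static void model_close_wait(struct model_sched_info *si, long long now)
{
    assert(now >= si->last_queued);
    si->run_delay += now - si->last_queued;
}
```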

> This seems like a good fix; however, it looks like the patch will break
> compilation with !CONFIG_SCHEDSTATS && !CONFIG_TASK_DELAY_ACCT, because
> it fails to provide a stub for sched_info_dequeue() in that case.

Fixed. Please find the new patch below.

Signed-off-by: Ankita Garg
Acked-by: Peter Zijlstra
Cc: Gregory Haskins
Cc: rostedt@goodmis.org
Cc: suresh.b.siddha@intel.com
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: dhaval@linux.vnet.ibm.com
Cc: vatsa@linux.vnet.ibm.com
Cc: David Bahi
Signed-off-by: Ingo Molnar

Ankita Garg
2008-07-04 18:50:23 +0800
2087a1ad8 sched: add avg-overlap support to RT tasks

We have the notion of tracking process-coupling (a.k.a. buddy-wake) via
the p->se.last_wake / p->se.avg_overlap facilities, but it is only used
for cfs to cfs interactions. There is no reason why an rt to cfs
interaction cannot share in establishing a relationship in a similar
manner.

Because PREEMPT_RT runs many kernel threads at FIFO priority, we often
have heavy interaction between RT threads waking CFS applications.
This patch offers a substantial boost (50-60%+) in performance under those
circumstances.

Signed-off-by: Gregory Haskins
Cc: npiggin@suse.de
Cc: rostedt@goodmis.org
Acked-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Gregory Haskins
2008-07-04 18:50:22 +0800
c4acb2c06 sched: terminate newidle balancing once at least one task has moved over

Inspired by Peter Zijlstra.

Signed-off-by: Gregory Haskins
Cc: npiggin@suse.de
Cc: rostedt@goodmis.org
Acked-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Gregory Haskins
2008-07-04 18:50:21 +0800

01 Jul, 2008

1 commit

619b04880 sched: fix divide error when trying to configure rt_period to zero

Here is another little Oops we found while configuring invalid values
via cgroups:

echo 0 > /dev/cgroups/0/cpu.rt_period_us
or
echo 4294967296 > /dev/cgroups/0/cpu.rt_period_us

[ 205.509825] divide error: 0000 [#1]
[ 205.510151] Modules linked in:
[ 205.510151]
[ 205.510151] Pid: 2339, comm: bash Not tainted (2.6.26-rc8 #33)
[ 205.510151] EIP: 0060:[] EFLAGS: 00000293 CPU: 0
[ 205.510151] EIP is at div64_u64+0x5f/0x70
[ 205.510151] EAX: 0000389f EBX: 00000000 ECX: 00000000 EDX: 00000000
[ 205.510151] ESI: d9800000 EDI: 00000000 EBP: c6cede60 ESP: c6cede50
[ 205.510151] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 205.510151] Process bash (pid: 2339, ti=c6cec000 task=c79be370 task.ti=c6cec000)
[ 205.510151] Stack: d9800000 0000389f c05971a0 d9800000 c6cedeb4 c0214dbd 00000000 00000000
[ 205.510151] c6cede88 c0242bd8 c05377c0 c7a41b40 00000000 00000000 00000000 c05971a0
[ 205.510151] c780ed20 c7508494 c7a41b40 00000000 00000002 c6cedebc c05971a0 ffffffea
[ 205.510151] Call Trace:
[ 205.510151] [] ? __rt_schedulable+0x1cd/0x240
[ 205.510151] [] ? cgroup_file_open+0x18/0xe0
[ 205.510151] [] ? tg_set_bandwidth+0xa4/0xf0
[ 205.510151] [] ? sched_group_set_rt_period+0x36/0x50
[ 205.510151] [] ? cpu_rt_period_write_uint+0xe/0x10
[ 205.510151] [] ? cgroup_file_write+0x125/0x160
[ 205.510151] [] ? hrtimer_interrupt+0x155/0x190
[ 205.510151] [] ? security_file_permission+0xf/0x20
[ 205.510151] [] ? rw_verify_area+0x48/0xc0
[ 205.510151] [] ? dupfd+0x104/0x130
[ 205.510151] [] ? vfs_write+0x9c/0x160
[ 205.510151] [] ? cgroup_file_write+0x0/0x160
[ 205.510151] [] ? sys_write+0x3d/0x70
[ 205.510151] [] ? sysenter_past_esp+0x6a/0x91
[ 205.510151] =======================
[ 205.510151] Code: 0f 45 de 31 f6 0f ad d0 d3 ea f6 c1 20 0f 45 c2 0f 45 d6 89 45 f0 89 55 f4 8b 55 f4 31 c9 8b 45 f0 39 d3 89 c6 77 08 89 d0 31 d2 f3 89 c1 83 c4 08 89 f0 f7 f3 89 ca 5b 5e 5d c3 55 89 e5 56
[ 205.510151] EIP: [] div64_u64+0x5f/0x70 SS:ESP 0068:c6cede50

The attached patch solves the issue for me.

I check as early as possible that the period is not zero since, if
it is, going any further is useless. This way we also save a mutex_lock()
and a read_lock() compared with doing the check inside tg_set_bandwidth()
or __rt_schedulable().
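Here is a sketch of the early validation described above. It uses hypothetical names and an arbitrary fixed-point format. The setter rejects a zero period with an error code (-EINVAL in this sketch) before anything divides by it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a group's RT bandwidth settings. */
struct model_rt_bw {
    uint64_t period_us;
    uint64_t runtime_us;
};

/* Fixed-point runtime/period ratio (20 fractional bits, an arbitrary choice
 * for this sketch): the kind of division that traps when period_us == 0. */
static uint64_t model_rt_ratio(const struct model_rt_bw *bw)
{
    assert(bw->period_us != 0);
    return (bw->runtime_us << 20) / bw->period_us;
}

/* Validate first, before any locking or division happens. */
static int model_set_rt_period(struct model_rt_bw *bw, uint64_t period_us)
{
    if (period_us == 0)
        return -EINVAL;
    bw->period_us = period_us;
    return 0;
}
```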

Signed-off-by: Dario Faggioli
Signed-off-by: Michael Trimarchi
Signed-off-by: Ingo Molnar

Raistlin
2008-07-01 14:23:24 +0800

30 Jun, 2008

2 commits

30432094a sched: fix warning

This patch fixes the following warning:

kernel/sched.c:1667: warning: 'cfs_rq_set_shares' defined but not used

This seems the correct way to fix this; cfs_rq_set_shares() is only used
in a single place, which is also inside #ifdef CONFIG_FAIR_GROUP_SCHED.
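The pattern is sketched below with hypothetical names and a locally defined config macro. The static helper lives under the same #ifdef as its only caller, so no configuration compiles the helper without a user.

```c
#include <assert.h>

/* Local stand-in for the config option; remove it to mimic the option off. */
#define CONFIG_FAIR_GROUP_SCHED 1

#ifdef CONFIG_FAIR_GROUP_SCHED
/* Helper guarded by the same #ifdef as its single caller, so it is never
 * "defined but not used". */
static int model_cfs_rq_set_shares(int *shares, int value)
{
    *shares = value;
    return 0;
}

static int model_group_set_shares(int *shares, int value)
{
    return model_cfs_rq_set_shares(shares, value);
}
#endif
```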

Signed-off-by: Vegard Nossum
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar

Vegard Nossum
2008-06-30 14:37:32 +0800
34e83e850 sched: build fix

fix:

kernel/sched.c: In function ‘sched_group_set_shares’:
kernel/sched.c:8635: error: implicit declaration of function ‘cfs_rq_set_shares’

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-06-30 14:37:13 +0800

29 Jun, 2008

1 commit

79c537998 sched: fix cpu hotplug

the CPU hotplug problems (crashes under high-volume unplug+replug
tests) seem to be related to migrate_dead_tasks().

Firstly, I added traces to see all tasks being migrated with
migrate_live_tasks() and migrate_dead_tasks(). On my setup the problem
pops up (the one with "se == NULL" in the loop of
pick_next_task_fair()) shortly after the traces indicate that some task
has been migrated with migrate_dead_tasks(). btw., I can reproduce it
much faster now with just a plain cpu down/up loop.

[disclaimer] Well, unless I'm really missing something important in
this late hour [/disclaimer] pick_next_task() is not something
appropriate for migrate_dead_tasks() :-)

the following change seems to eliminate the problem on my setup
(although, I kept it running only for a few minutes to get a few
messages indicating migrate_dead_tasks() does move tasks and the
system is still ok)

Signed-off-by: Ingo Molnar

Dmitry Adamushko
2008-06-29 14:50:21 +0800

27 Jun, 2008

9 commits

f1d239f73 sched: incremental effective_load()

Increase the accuracy of the effective_load values.

Not only consider the current increment (as per the attempted wakeup), but
also consider the delta between when we last adjusted the shares and the
current situation.

Signed-off-by: Peter Zijlstra
Cc: Srivatsa Vaddagiri
Cc: Mike Galbraith
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-06-27 20:31:47 +0800
83378269a sched: correct wakeup weight calculations

rw_i = {2, 4, 1, 0}
s_i = {2/7, 4/7, 1/7, 0}

wakeup on cpu0, weight=1

rw'_i = {3, 4, 1, 0}
s'_i = {3/8, 4/8, 1/8, 0}

s_0 = S * rw_0 / \Sum rw_j ->
\Sum rw_j = S*rw_0/s_0 = 1*2*7/2 = 7 (correct)

s'_0 = S * (rw_0 + 1) / (\Sum rw_j + 1) =
1 * (2+1) / (7+1) = 3/8 (correct)

so we find that adding 1 to cpu0 gains 5/56 in weight;
if, say, the other cpu were cpu1, we'd also have to calculate its 4/56 loss
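Here is a small exact-fraction check of the arithmetic above. It takes the total share S = 1 and uses hypothetical helper names.

```c
#include <assert.h>

/* Minimal exact fraction, always stored in lowest terms with den > 0. */
struct frac { long num, den; };

static long gcd_l(long a, long b)
{
    if (a < 0)
        a = -a;
    while (b) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

static struct frac frac_make(long num, long den)
{
    assert(den > 0);
    long g = gcd_l(num, den);  /* g >= 1 because den > 0 */
    return (struct frac){ num / g, den / g };
}

static struct frac frac_sub(struct frac a, struct frac b)
{
    return frac_make(a.num * b.den - b.num * a.den, a.den * b.den);
}

/* Share of cpu i with S = 1: s_i = rw_i / \Sum rw_j. */
static struct frac share(const long *rw, int n, int i)
{
    long sum = 0;

    for (int j = 0; j < n; j++)
        sum += rw[j];
    return frac_make(rw[i], sum);
}
```

The losses on cpu1 (4/56) and cpu2 (1/56) add up to cpu0's 5/56 gain, as they must when the shares always sum to S.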

Signed-off-by: Peter Zijlstra
Cc: Srivatsa Vaddagiri
Cc: Mike Galbraith
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-06-27 20:31:46 +0800
2398f2c6d sched: update shares on wakeup

We found that the affine wakeup code needs rather accurate load figures
to be effective. The trouble is that updating the load figures is fairly
expensive with group scheduling. Therefore ratelimit the updating.
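Here is a hedged sketch of time-based ratelimiting. The names are hypothetical, and the interval and the placeholder computation are assumptions. The expensive recomputation runs at most once per interval, and callers reuse the cached value in between.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ratelimited updater state. */
struct ratelimited_load {
    uint64_t last_update_ns;
    uint64_t interval_ns;
    long cached;
    int recomputations;  /* how often the expensive path ran */
};

/* Placeholder for an expensive load computation. */
static long expensive_load(long sample)
{
    return sample;
}

/* Recompute on first use or once interval_ns has elapsed; otherwise
 * return the cached (possibly slightly stale) value. */
static long get_load(struct ratelimited_load *rl, uint64_t now_ns, long sample)
{
    if (rl->recomputations == 0 || now_ns - rl->last_update_ns >= rl->interval_ns) {
        rl->cached = expensive_load(sample);
        rl->last_update_ns = now_ns;
        rl->recomputations++;
    }
    return rl->cached;
}
```

The trade-off is accuracy for cost: a shorter interval gives the affine wakeup code fresher figures at a higher update price.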

Signed-off-by: Peter Zijlstra
Cc: Srivatsa Vaddagiri
Cc: Mike Galbraith
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-06-27 20:31:45 +0800
cd80917e4 sched: fix shares boost logic

In case the domain is empty, pretend there is a single task on each cpu, so
that together with the boost logic we end up giving 1/n shares to each
cpu.
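Here is a simplified sketch of proportional share distribution. It uses hypothetical names and folds the per-CPU boost into one rule. An all-idle domain is treated as one unit of weight per CPU, which yields an even 1/n split instead of a division by zero.

```c
#include <assert.h>

/* Split 'total' shares across n CPUs in proportion to rq_weight[i].
 * If the whole domain is idle, pretend each CPU carries one unit of weight. */
static void distribute_shares(const unsigned long *rq_weight, int n,
                              unsigned long total, unsigned long *out)
{
    unsigned long sum = 0;

    assert(n > 0);
    for (int i = 0; i < n; i++)
        sum += rq_weight[i];
    for (int i = 0; i < n; i++) {
        unsigned long w = sum ? rq_weight[i] : 1;
        unsigned long s = sum ? sum : (unsigned long)n;
        out[i] = total * w / s;
    }
}
```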

Signed-off-by: Peter Zijlstra
Cc: Srivatsa Vaddagiri
Cc: Mike Galbraith
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-06-27 20:31:44 +0800
93b75217d sched: disable source/target_load bias

The bias given by source/target_load functions can be very large, disable
it by default to get faster convergence.
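Here is a sketch of the kind of bias involved. It assumes source_load() takes the minimum, and target_load() the maximum, of a historical load and the current load. That is our reading, and the names below are hypothetical. With the bias on, a source that just got busy and a target that just went idle can look the wrong way round, so nothing moves.

```c
#include <assert.h>
#include <stdbool.h>

/* Bias on: under-estimate the source CPU's load (min of history and now). */
static unsigned long source_load_model(unsigned long hist, unsigned long now,
                                       bool bias)
{
    if (!bias)
        return now;
    return hist < now ? hist : now;
}

/* Bias on: over-estimate the target CPU's load (max of history and now). */
static unsigned long target_load_model(unsigned long hist, unsigned long now,
                                       bool bias)
{
    if (!bias)
        return now;
    return hist > now ? hist : now;
}
```

With a busy source (history 100, now 900) and an idle target (history 800, now 0), the biased view compares 100 against 800 and sees no reason to move load. The unbiased view compares 900 against 0.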

Signed-off-by: Peter Zijlstra
Cc: Srivatsa Vaddagiri
Cc: Mike Galbraith
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-06-27 20:31:44 +0800
051c67640 sched: remove prio preference from balance decisions

Priority loses much of its meaning in a hierarchical context. So don't
use it in balance decisions.

Signed-off-by: Peter Zijlstra
Cc: Srivatsa Vaddagiri
Cc: Mike Galbraith
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-06-27 20:31:42 +0800
408ed066b sched: hierarchical load vs find_busiest_group

find_busiest_group() has some assumptions about task weight being in the
NICE_0_LOAD range. Hierarchical task groups break this assumption; fix this
by replacing it with the average task weight, which adapts to the situation.

Signed-off-by: Peter Zijlstra
Cc: Srivatsa Vaddagiri
Cc: Mike Galbraith
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-06-27 20:31:40 +0800
a8a51d5e5 sched: persistent average load per task

Remove the fall-back to SCHED_LOAD_SCALE by remembering the previous value of
cpu_avg_load_per_task() - this is useful because of the hierarchical group
model in which task weight can be much smaller.
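Here is a sketch of the change with hypothetical names. MODEL_SCHED_LOAD_SCALE = 1024 is an assumed value for the old fallback.

```c
#include <assert.h>

#define MODEL_SCHED_LOAD_SCALE 1024UL  /* assumed nice-0 weight */

/* Hypothetical per-CPU runqueue state. */
struct model_rq {
    unsigned long load;               /* total weight of queued tasks */
    unsigned long nr_running;
    unsigned long avg_load_per_task;  /* remembered between calls */
};

/* Old behaviour: an empty runqueue falls back to a fixed constant. */
static unsigned long avg_load_old(const struct model_rq *rq)
{
    return rq->nr_running ? rq->load / rq->nr_running : MODEL_SCHED_LOAD_SCALE;
}

/* New behaviour: recompute when tasks are queued, otherwise keep the last
 * value, which may be far below the constant for small-weight group tasks. */
static unsigned long avg_load_new(struct model_rq *rq)
{
    if (rq->nr_running)
        rq->avg_load_per_task = rq->load / rq->nr_running;
    return rq->avg_load_per_task;
}
```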

Signed-off-by: Peter Zijlstra
Cc: Srivatsa Vaddagiri
Cc: Mike Galbraith
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-06-27 20:31:39 +0800
039a1c41b sched: fix sched_balance_self() smp group balancing

Finding the least idle cpu is more accurate when done with updated shares.

Signed-off-by: Peter Zijlstra
Cc: Srivatsa Vaddagiri
Cc: Mike Galbraith
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-06-27 20:31:38 +0800