09 Jan, 2009

8 commits

  • Impact: cleanups, use new cpumask API

    Final trivial cleanups: mainly s/cpumask_t/struct cpumask/; the
    conversion pattern is sketched at the end of this entry.

    Note there is a FIXME in generate_sched_domains(). A future patch will
    change struct cpumask *doms to struct cpumask *doms[].
    (I suppose Rusty will do this.)

    Signed-off-by: Li Zefan
    Cc: Ingo Molnar
    Cc: Rusty Russell
    Acked-by: Mike Travis
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
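
    A minimal sketch of the conversion pattern (hypothetical function
    names, not quoted from the patch):

        /* Old style: cpumask_t, often passed or copied by value. */
        void constrain_cpus_old(cpumask_t allowed);

        /* New style: struct cpumask accessed through a pointer. */
        void constrain_cpus_new(const struct cpumask *allowed);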
     
  • Impact: use new cpumask API

    This patch mainly does the following things (the allocation pattern
    is sketched at the end of this entry):
    - change cs->cpus_allowed from cpumask_t to cpumask_var_t
    - call alloc_bootmem_cpumask_var() for top_cpuset in cpuset_init_early()
    - call alloc_cpumask_var() for other cpusets
    - replace cpus_xxx() with cpumask_xxx()

    Signed-off-by: Li Zefan
    Cc: Ingo Molnar
    Cc: Rusty Russell
    Acked-by: Mike Travis
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
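
    A rough sketch of the allocation pattern (error handling abbreviated;
    the surrounding code is assumed, not quoted from the patch):

        /* the cpuset member changes from 'cpumask_t cpus_allowed;' to: */
        cpumask_var_t cpus_allowed;

        /* top_cpuset needs its mask before the slab allocator is up: */
        alloc_bootmem_cpumask_var(&top_cpuset.cpus_allowed);

        /* ordinary cpusets allocate at creation time, which may fail: */
        if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL)) {
                kfree(cs);
                return ERR_PTR(-ENOMEM);
        }

        /* operations switch from cpus_*() to cpumask_*(), e.g.: */
        cpumask_copy(cs->cpus_allowed, cpu_online_mask);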
     
  • Impact: cleanups, reduce stack usage

    This patch prepares for the next patch. When we convert
    cpuset.cpus_allowed to cpumask_var_t, the assignment (trialcs = *cs)
    no longer works.

    Another benefit is reduced stack usage for trialcs: sizeof(*cs) can
    be as large as 148 bytes on x86_64, so it is really not good to have
    it on the stack. A sketch of the heap-based approach follows this
    entry.

    Signed-off-by: Li Zefan
    Cc: Ingo Molnar
    Cc: Rusty Russell
    Acked-by: Mike Travis
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
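
    Before/after, sketched (the kmemdup()-based form is illustrative;
    the patch itself may structure the helpers differently):

        /* Before: a large struct copied onto the stack. */
        struct cpuset trialcs;
        trialcs = *cs;

        /* After: a heap-allocated duplicate, freed when done. */
        struct cpuset *trialcs = kmemdup(cs, sizeof(*cs), GFP_KERNEL);
        if (!trialcs)
                return -ENOMEM;
        /* ... modify and validate *trialcs ... */
        kfree(trialcs);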
     
  • Impact: reduce stack usage

    Allocate a global cpumask_var_t at boot and use it in cpuset_attach(),
    so that cpuset_attach() can never fail on a memory allocation
    (sketched below).

    Signed-off-by: Li Zefan
    Cc: Ingo Molnar
    Cc: Rusty Russell
    Acked-by: Mike Travis
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
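
    A sketch of the idea, assuming a scratch mask named cpus_attach
    (the name and init location are assumptions):

        /* allocated once at boot, so the attach path never allocates: */
        static cpumask_var_t cpus_attach;

        /* somewhere in cpuset init code: */
        alloc_cpumask_var(&cpus_attach, GFP_KERNEL);

        /* in cpuset_attach(): reuse the preallocated mask. */
        guarantee_online_cpus(cs, cpus_attach);
        set_cpus_allowed_ptr(tsk, cpus_attach);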
     
  • Impact: reduce stack usage

    Just use cs->cpus_allowed directly; there is no need to allocate a
    cpumask_var_t.

    Signed-off-by: Li Zefan
    Cc: Ingo Molnar
    Cc: Rusty Russell
    Acked-by: Mike Travis
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • This patchset converts cpuset to the new cpumask API, and thus
    removes on-stack cpumask_t usage to reduce stack consumption.

    Before:
    # cat kernel/cpuset.c include/linux/cpuset.h | grep -c cpumask_t
    21
    After:
    # cat kernel/cpuset.c include/linux/cpuset.h | grep -c cpumask_t
    0

    This patch:

    Impact: reduce stack usage

    It is safe to call cpulist_scnprintf() while holding callback_mutex,
    so we can simply drop the on-stack cpumask_t; there is no need to
    allocate a cpumask_var_t either (see the sketch below).

    Signed-off-by: Li Zefan
    Cc: Ingo Molnar
    Cc: Rusty Russell
    Acked-by: Mike Travis
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
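
    The resulting pattern, sketched (buffer names assumed):

        mutex_lock(&callback_mutex);
        ret = cpulist_scnprintf(page, PAGE_SIZE, cs->cpus_allowed);
        mutex_unlock(&callback_mutex);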
     
  • I found a bug on my dual-CPU box. I created a sub cpuset in the top
    cpuset and assigned 1 to its cpus. Then I attached some tasks to
    this sub cpuset. After this, I offlined CPU1, and the tasks in the
    sub cpuset were moved into the top cpuset automatically because no
    cpu was left in the sub cpuset. When I onlined CPU1 again, all the
    tasks that did not originally belong to the top cpuset still ran
    only on CPU0.

    We fix this bug by setting a task's cpus_allowed to cpu_possible_map
    when attaching it to the top cpuset (sketched below). This approach
    does not change the current behavior of cpusets on CPU hotplug, and
    all tasks in the top cpuset use cpu_possible_map to initialize their
    cpus_allowed.

    Signed-off-by: Miao Xie
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
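
    A sketch of the fix in the attach path (mask and helper names per
    the cpuset code of that time; placement assumed):

        if (cs == &top_cpuset)
                cpumask_copy(cpus_attach, cpu_possible_mask);
        else
                guarantee_online_cpus(cs, cpus_attach);
        set_cpus_allowed_ptr(tsk, cpus_attach);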
     
  • task_cs() calls task_subsys_state().

    We must use rcu_read_lock() to protect cgroup_subsys_state().

    It is correct that top_cpuset is never freed, but cgroup_subsys_state()
    accesses the task's css_set, and that css_set may be freed while
    task_cs() is running.

    So we use rcu_read_lock() to protect it (see the sketch below).

    Signed-off-by: Lai Jiangshan
    Acked-by: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
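
    The resulting usage pattern, sketched:

        rcu_read_lock();
        cs = task_cs(tsk);      /* task_subsys_state() -> css_set access */
        /* ... use cs only inside the read-side critical section ... */
        rcu_read_unlock();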
     

07 Jan, 2009

1 commit

  • When cpusets are enabled, it's necessary to print the triggering task's
    set of allowable nodes so the subsequently printed meminfo can be
    interpreted correctly.

    We also print the task's cpuset name for informational purposes.

    [rientjes@google.com: task lock current before dereferencing cpuset]
    Cc: Paul Menage
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

13 Dec, 2008

1 commit

  • …t_scnprintf to take pointers.

    Impact: change calling convention of existing cpumask APIs

    Most cpumask functions started with cpus_: these have been replaced by
    cpumask_ ones which take struct cpumask pointers as expected.

    These four functions don't have good replacement names; fortunately
    they're rarely used, so we just change them over (before/after is
    sketched below).

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: Mike Travis <travis@sgi.com>
    Acked-by: Ingo Molnar <mingo@elte.hu>
    Cc: paulus@samba.org
    Cc: mingo@redhat.com
    Cc: tony.luck@intel.com
    Cc: ralf@linux-mips.org
    Cc: Greg Kroah-Hartman <gregkh@suse.de>
    Cc: cl@linux-foundation.org
    Cc: srostedt@redhat.com

    Rusty Russell
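
    Before/after for one of the four, sketched (the old form was
    actually a macro over the mask itself):

        /* old: the mask is passed by value */
        int cpulist_scnprintf(char *buf, int len, cpumask_t mask);

        /* new: the mask is passed as a const pointer */
        int cpulist_scnprintf(char *buf, int len, const struct cpumask *mask);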
     

30 Nov, 2008

1 commit

  • this warning:

    kernel/cpuset.c: In function ‘generate_sched_domains’:
    kernel/cpuset.c:588: warning: ‘ndoms’ may be used uninitialized in this function

    triggers because GCC does not recognize that ndoms stays uninitialized
    only if doms is NULL - but that flow is covered at the end of
    generate_sched_domains().

    Help out GCC by initializing this variable to 0. (That is prudent
    anyway.)

    Also, this function needs a split-up and code-flow simplification:
    at 160 lines it is clearly too long.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

20 Nov, 2008

1 commit

  • After adding a node into the machine, top cpuset's mems isn't updated.

    By reviewing the code, we found that the update function

    cpuset_track_online_nodes()

    was invoked after node_states[N_ONLINE] changed. That is wrong,
    because N_ONLINE only means the node has a pgdat; a node that has
    (or has just gained) memory is tracked via N_HIGH_MEMORY. So we
    should invoke the update function after node_states[N_HIGH_MEMORY]
    changes, just as the original commit intended.

    This patch fixes it, and switches from calling
    cpuset_track_online_nodes() directly to using a memory-hotplug
    notifier (sketched below).

    Signed-off-by: Miao Xie
    Acked-by: Yasunori Goto
    Cc: David Rientjes
    Cc: Paul Menage
    Signed-off-by: Linus Torvalds

    Miao Xie
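
    A sketch of the notifier-based approach (the priority value and the
    exact body are assumptions):

        static int cpuset_track_online_nodes(struct notifier_block *self,
                                             unsigned long action, void *arg)
        {
                switch (action) {
                case MEM_ONLINE:
                case MEM_OFFLINE:
                        /* refresh from N_HIGH_MEMORY, not N_ONLINE */
                        top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
                        break;
                }
                return NOTIFY_OK;
        }

        /* registered during init: */
        hotplug_memory_notifier(cpuset_track_online_nodes, 10);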
     

18 Nov, 2008

1 commit

  • Impact: properly rebuild sched-domains on kmalloc() failure

    When cpuset fails to generate sched domains due to a kmalloc()
    failure, the scheduler should fall back to the single partition
    'fallback_doms' and rebuild the sched domains; currently it only
    destroys them without rebuilding (see the sketch below).

    The regression was introduced by:

    | commit dfb512ec4834116124da61d6c1ee10fd0aa32bd6
    | Author: Max Krasnyansky
    | Date: Fri Aug 29 13:11:41 2008 -0700
    |
    | sched: arch_reinit_sched_domains() must destroy domains to force rebuild

    After the above commit, partition_sched_domains(0, NULL, NULL) will
    only destroy sched domains and partition_sched_domains(1, NULL, NULL)
    will create the default sched domain.

    Signed-off-by: Li Zefan
    Cc: Max Krasnyansky
    Cc:
    Signed-off-by: Ingo Molnar

    Li Zefan
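
    In other words (sketch of the calling convention):

        /* destroy only -- what the broken path effectively did: */
        partition_sched_domains(0, NULL, NULL);

        /* destroy, then rebuild the default single domain -- the fix: */
        partition_sched_domains(1, NULL, NULL);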
     

20 Oct, 2008

2 commits

  • 1) seq_file expects that m->count == m->size when its buffer is
    full, so the current code causes bugs when the buffer overflows.

    2) It is not good that cpuset accesses struct seq_file's fields
    directly.

    Signed-off-by: Lai Jiangshan
    Cc: Alexey Dobriyan
    Acked-by: Paul Menage
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • Remove the unnecessary int cpus_nonempty variable from the
    update_flag() function.

    Signed-off-by: Md.Rakib H. Mullick
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rakib Mullick
     

03 Oct, 2008

1 commit

  • This fixes a warning on latest -tip:

    kernel/cpuset.c: In function 'scan_for_empty_cpusets':
    kernel/cpuset.c:1932: warning: passing argument 1 of 'list_add_tail' discards qualifiers from pointer target type

    The struct cpuset *root passed as a parameter to
    scan_for_empty_cpusets() is not supposed to be const, since an entry
    is added to the tail of its list. Just correct the qualifier
    (before/after below).

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
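
    Before/after, sketched:

        static void scan_for_empty_cpusets(const struct cpuset *root); /* old */
        static void scan_for_empty_cpusets(struct cpuset *root);       /* new */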
     

14 Sep, 2008

1 commit

  • After the patch:

    commit 0b2f630a28d53b5a2082a5275bc3334b10373508
    Author: Miao Xie
    Date: Fri Jul 25 01:47:21 2008 -0700

    cpusets: restructure the function update_cpumask() and update_nodemask()

    It might happen that 'echo 0 > /cpuset/sub/cpus' returns failure
    while 'cpus' has nevertheless been changed, because cpus was updated
    before the call to heap_init(), which may return -ENOMEM.

    This patch restores the original behavior (the corrected ordering is
    sketched below).

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Cc: Paul Jackson
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
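
    A sketch of the corrected ordering (names per the cpuset code of
    that era; surrounding logic abbreviated):

        /* allocate first; a failure here leaves 'cpus' untouched */
        retval = heap_init(&heap, PAGE_SIZE, GFP_KERNEL, NULL);
        if (retval)
                return retval;

        /* only now commit the new mask */
        mutex_lock(&callback_mutex);
        cs->cpus_allowed = trialcs.cpus_allowed;
        mutex_unlock(&callback_mutex);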
     

14 Aug, 2008

1 commit

  • This is an updated version of my previous cpuset patch on top of
    the latest mainline git.
    The patch fixes CPU hotplug handling issues in the current cpusets
    code, namely circular locking in rebuild_sched_domains() and unsafe
    access to the cpu_online_map in the cpuset cpu hotplug handler.

    This version includes changes suggested by Paul Jackson (naming, comments,
    style, etc). I also got rid of the separate workqueue thread because it is
    now safe to call get_online_cpus() from workqueue callbacks.

    Here are some more details:

    rebuild_sched_domains() is the only way to rebuild sched domains
    correctly based on the current cpuset settings. What this means
    is that we need to be able to call it from different contexts,
    like cpu hotplug for example.
    Also latest scheduler code in -tip now calls rebuild_sched_domains()
    directly from functions like arch_reinit_sched_domains().

    In order to support that properly we need to rework cpuset locking
    rules to avoid circular dependencies, which is what this patch does.
    New lock nesting rules are explained in the comments.
    We can now safely call rebuild_sched_domains() from virtually any
    context. The only requirement is that it needs to be called under
    get_online_cpus(). This allows cpu hotplug handlers and the scheduler
    to call rebuild_sched_domains() directly.
    The rest of the cpuset code now offloads sched domain rebuilds to a
    workqueue (async_rebuild_sched_domains(); sketched below).

    This version of the patch addresses comments from the previous
    review. I fixed all mis-formatted comments and trailing spaces.

    I also factored out the code that builds domain masks and split up CPU and
    memory hotplug handling. This was needed to simplify locking, to avoid unsafe
    access to the cpu_online_map from mem hotplug handler, and in general to make
    things cleaner.

    The patch passes moderate testing (building a kernel with -j 16,
    creating & removing domains, and bringing cpus off/online at the
    same time) on a quad-core Core2 based machine.

    It passes lockdep checks, even with preemptible RCU enabled. This
    time I also tested it with the suspend/resume path, and everything
    works as expected.

    Signed-off-by: Max Krasnyansky
    Acked-by: Paul Jackson
    Cc: menage@google.com
    Cc: a.p.zijlstra@chello.nl
    Cc: vegard.nossum@gmail.com
    Signed-off-by: Ingo Molnar

    Max Krasnyansky
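
    A sketch of the workqueue offload (generate_sched_domains() is the
    factored-out mask builder mentioned above; details abbreviated):

        static void do_rebuild_sched_domains(struct work_struct *unused)
        {
                struct sched_domain_attr *attr;
                cpumask_t *doms;
                int ndoms;

                get_online_cpus();
                ndoms = generate_sched_domains(&doms, &attr);
                partition_sched_domains(ndoms, doms, attr);
                put_online_cpus();
        }
        static DECLARE_WORK(rebuild_sched_domains_work, do_rebuild_sched_domains);

        static void async_rebuild_sched_domains(void)
        {
                schedule_work(&rebuild_sched_domains_work);
        }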
     

31 Jul, 2008

4 commits

  • Use cpuset.stack_list rather than a kfifo, so we avoid the memory
    allocation for the kfifo.

    Signed-off-by: Li Zefan
    Signed-off-by: Lai Jiangshan
    Cc: Paul Menage
    Cc: Cedric Le Goater
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • When multiple cpusets overlap in their 'cpus' and hence form a
    single sched domain, the largest sched_relax_domain_level among them
    should be used. But when top_cpuset's sched_load_balance is set, its
    sched_relax_domain_level is used regardless of the sub-cpusets'.

    This patch fixes it by walking the cpuset hierarchy to find the
    largest sched_relax_domain_level (the merge step is sketched below).

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Cc: Cedric Le Goater
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Reviewed-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
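
    The per-cpuset merge step looks roughly like this (the hierarchy
    walk around it is not shown):

        static void update_domain_attr(struct sched_domain_attr *dattr,
                                       struct cpuset *c)
        {
                if (dattr->relax_domain_level < c->relax_domain_level)
                        dattr->relax_domain_level = c->relax_domain_level;
        }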
     
  • All child cpusets contain a subset of the parent's cpus, so we can
    skip them when partitioning sched domains. This decreases 'csa'
    greatly for cpusets with a multi-level hierarchy.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Cc: Cedric Le Goater
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Reviewed-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • clean up hierarchy traversal code

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Cc: Cedric Le Goater
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Paul Jackson
    Cc: Cliff Wickman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

26 Jul, 2008

7 commits

  • In cpuset_update_task_memory_state() there is a local variable
    struct task_struct *tsk = current;

    This local variable tsk is used 14 times, and the statement
    task_cs(tsk) appears twice in this function, so using task_cs(tsk)
    instead of task_cs(current) is better for readability.

    Also, "(struct cgroup_scanner *)&scan" is not good for readability
    either. (container_of() is what is used in cpuset_do_move_task(),
    not "(cpuset_hotplug_scanner *)scan".)

    Signed-off-by: Lai Jiangshan
    Acked-by: Paul Menage
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • cgroup (cgroup_scan_tasks()) will initialize heap->gt for us, so
    this patch removes started_after() and its helper function.

    Signed-off-by: Lai Jiangshan
    Acked-by: Paul Menage
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • I created lots of empty cpusets (with empty cpumasks) and turned off
    "sched_load_balance" in the top cpuset.

    I found that all these empty cpumasks were passed to
    partition_sched_domains() in rebuild_sched_domains(); this is very
    time-consuming for partition_sched_domains() and is not needed (the
    filter is sketched below).

    This also reduces memory consumption and some of the work in
    rebuild_sched_domains().

    Signed-off-by: Lai Jiangshan
    Acked-by: Paul Menage
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
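
    The filter amounts to an early check in the loop that collects
    candidate cpusets (sketch; variable names assumed):

        /* while building the 'csa' array of candidate cpusets: */
        if (cpus_empty(cp->cpus_allowed))
                continue;       /* nothing to balance, skip this cpuset */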
     
  • When changing 'sched_relax_domain_level', don't rebuild sched domains if
    'cpus' is empty or 'sched_load_balance' is not set.

    Also make the comments of rebuild_sched_domains() more readable.

    Signed-off-by: Li Zefan
    Cc: Hidetoshi Seto
    Cc: Paul Jackson
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • The bug is that a task may run on a cpu/node that is not in its
    cpuset.cpus/cpuset.mems.

    It can be reproduced by the following commands:
    -----------------------------------
    # mkdir /dev/cpuset
    # mount -t cpuset xxx /dev/cpuset
    # mkdir /dev/cpuset/0
    # echo 0-1 > /dev/cpuset/0/cpus
    # echo 0 > /dev/cpuset/0/mems
    # echo $$ > /dev/cpuset/0/tasks
    # echo 0 > /sys/devices/system/cpu/cpu1/online
    # echo 1 > /sys/devices/system/cpu/cpu1/online
    -----------------------------------

    There is only CPU0 in cpuset.cpus, but the task in this cpuset runs
    on both CPU0 and CPU1.

    This is because the task's cpus_allowed did not get updated after
    the CPU offline/online manipulation. The same goes for mems_allowed.

    This patch fixes this bug except for the root cpuset, because there
    is an open question about the root cpuset: whether or not to update
    all the tasks in the root cpuset after cpu/node offline/online.

    If we update them, some kernel threads that are bound to a specific
    cpu will be unbound.

    If we do not update them, there is a bug in the root cpuset, also
    caused by offline/online manipulation. For example, on a dual-cpu
    machine, we create a sub cpuset in the root cpuset and assign 1 to
    its cpus, and then attach some tasks to this sub cpuset. After this,
    we offline CPU1. Now the tasks in this new cpuset are moved into the
    root cpuset automatically because there is no cpu left in the sub
    cpuset. Then we online CPU1 and find that all the tasks that did not
    originally belong to the root cpuset run only on CPU0.

    Maybe we need to add a flag to the task_struct to mark which tasks
    can't be unbound?

    Signed-off-by: Miao Xie
    Acked-by: Paul Jackson
    Cc: Li Zefan
    Cc: Paul Jackson
    Cc: Paul Menage
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     
  • Extract two functions from update_cpumask() and update_nodemask().
    They will be used later for updating tasks' cpus_allowed and
    mems_allowed after CPU/node offline/online.

    [lizf@cn.fujitsu.com: build fix]
    Signed-off-by: Miao Xie
    Acked-by: Paul Jackson
    Cc: David Rientjes
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     
  • This patch tweaks the signatures of the update_cpumask() and
    update_nodemask() functions so that they can be called directly as
    handlers for the new cgroups write_string() method.

    This allows cpuset_common_file_write() to be removed.

    Signed-off-by: Paul Menage
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: Serge Hallyn
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

24 Jul, 2008

1 commit

  • * 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: hrtick_enabled() should use cpu_active()
    sched, x86: clean up hrtick implementation
    sched: fix build error, provide partition_sched_domains() unconditionally
    sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed
    cpu hotplug: Make cpu_active_map synchronization dependency clear
    cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
    sched: rework of "prioritize non-migratable tasks over migratable ones"
    sched: reduce stack size in isolated_cpu_setup()
    Revert parts of "ftrace: do not trace scheduler functions"

    Fixed up conflicts in include/asm-x86/thread_info.h (due to the
    TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and
    kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr()
    introduction).

    Linus Torvalds
     

23 Jul, 2008

1 commit

  • Fix wrong domain attr updates, or we will always update the first sched
    domain attr.

    Signed-off-by: Miao Xie
    Cc: Hidetoshi Seto
    Cc: Paul Jackson
    Cc: Nick Piggin
    Cc: Ingo Molnar
    Cc: [2.6.26.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     

18 Jul, 2008

1 commit

  • This is based on Linus' idea of creating a cpu_active_map that
    prevents the scheduler load balancer from migrating tasks to a cpu
    that is going down.

    It allows us to simplify the domain management code and avoid
    unnecessary domain rebuilds during cpu hotplug event handling.

    Please ignore the cpusets part for now. It needs some more work in
    order to avoid crazy lock nesting, although I did simplify and unify
    the domain reinitialization logic. We now simply call
    partition_sched_domains() in all cases. This means that we're using
    exactly the same code paths as in the cpusets case, and hence the
    tests below cover cpusets too. The cpuset changes to make
    rebuild_sched_domains() callable from various contexts are in a
    separate patch (right after this one).

    This not only boots but also easily handles
    while true; do make clean; make -j 8; done
    and
    while true; do on-off-cpu 1; done
    at the same time.
    (on-off-cpu 1 simply does the echo 0/1 > /sys/.../cpu1/online thing).

    Surprisingly the box (dual-core Core2) is quite usable. In fact I'm
    typing this on it right now in gnome-terminal, and things are moving
    along just fine.

    Also, this is running with most of the debug features enabled
    (lockdep, mutex, etc.); no BUG_ONs or lockdep complaints so far.

    I believe I addressed all of Dmitry's comments on Linus' original
    version. I changed both the fair and rt balancers to mask out
    non-active cpus, and replaced cpu_is_offline() with !cpu_active() in
    the main scheduler code where it made sense (to me; the core test is
    sketched below).

    Signed-off-by: Max Krasnyanskiy
    Acked-by: Linus Torvalds
    Acked-by: Peter Zijlstra
    Acked-by: Gregory Haskins
    Cc: dmitry.adamushko@gmail.com
    Cc: pj@sgi.com
    Signed-off-by: Ingo Molnar

    Max Krasnyansky
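
    The core test, sketched (exact call sites vary):

        /* balancer side: never pick a CPU that is on its way down */
        if (!cpu_active(dest_cpu))
                return 0;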
     

14 Jul, 2008

1 commit


13 Jul, 2008

1 commit

  • Commit f18f982ab ("sched: CPU hotplug events must not destroy scheduler
    domains created by the cpusets") introduced a hotplug-related problem as
    described below:

    Upon CPU_DOWN_PREPARE,

    update_sched_domains() -> detach_destroy_domains(&cpu_online_map)

    does the following:

    /*
    * Force a reinitialization of the sched domains hierarchy. The domains
    * and groups cannot be updated in place without racing with the balancing
    * code, so we temporarily attach all running cpus to the NULL domain
    * which will prevent rebalancing while the sched domains are recalculated.
    */

    The sched domains should be rebuilt when a CPU_DOWN operation has
    completed, effectively either upon CPU_DEAD{_FROZEN} (on success) or
    CPU_DOWN_FAILED{_FROZEN} (on failure, to restore things to their
    initial state). That is what update_sched_domains() also does, but
    only for the !CPUSETS case.

    With f18f982ab, sched-domains' reinitialization is delegated to
    CPUSETS code:

    cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
    rebuild_sched_domains()

    Being called for CPU_UP_PREPARE (and if its callback is called after
    update_sched_domains()), it just negates all the work done by
    update_sched_domains() -- i.e. a soon-to-be-offline cpu is included
    in the sched domains, which makes it visible to the load balancer
    while the CPU_DOWN operation is in progress.

    __migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
    "offline" when this function is called).

    try_to_wake_up() is called for one of these tasks from another CPU ->
    the load-balancer (wake_idle()) picks up a "dead" CPU and places the
    task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
    -> oops.

    Signed-off-by: Dmitry Adamushko
    Tested-by: Vegard Nossum
    Cc: Paul Menage
    Cc: Max Krasnyansky
    Cc: Paul Jackson
    Cc: Peter Zijlstra
    Cc: miaox@cn.fujitsu.com
    Cc: rostedt@goodmis.org
    Cc: Linus Torvalds
    Signed-off-by: Ingo Molnar

    Dmitry Adamushko
     

23 Jun, 2008

1 commit


21 Jun, 2008

1 commit


19 Jun, 2008

2 commits

  • We allow the inputs to be in the range [-1 ... SD_LV_MAX), and
    return -EINVAL for inputs outside this range (the check is sketched
    below).

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Acked-by: Paul Jackson
    Acked-by: Hidetoshi Seto
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Li Zefan
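
    The check itself, sketched (function shape assumed):

        static int update_relax_domain_level(struct cpuset *cs, s64 val)
        {
                if (val < -1 || val >= SD_LV_MAX)
                        return -EINVAL;

                if (val != cs->relax_domain_level) {
                        cs->relax_domain_level = val;
                        rebuild_sched_domains();
                }
                return 0;
        }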
     
  • The first issue is not related to cpusets: we're simply leaking
    doms_cur. It's allocated in arch_init_sched_domains(), which is
    called for every hotplug event, so we just keep reallocating
    doms_cur without freeing it. I introduced a free_sched_domains()
    function that cleans things up (sketched below).

    The second issue is that sched domains created by the cpusets are
    completely destroyed by CPU hotplug events. For all CPU hotplug
    events the scheduler attaches all CPUs to the NULL domain and then
    puts them all into a single domain, thereby destroying the domains
    created by cpusets (partition_sched_domains).
    The solution is simple: when cpusets are enabled, the scheduler
    should not create the default domain and should instead let cpusets
    do that. Which is exactly what this patch does.

    Signed-off-by: Max Krasnyansky
    Cc: pj@sgi.com
    Cc: menage@google.com
    Cc: rostedt@goodmis.org
    Acked-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner

    Max Krasnyansky
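
    A sketch of the cleanup helper (assuming the doms_cur/fallback_doms
    globals in kernel/sched.c):

        static void free_sched_domains(void)
        {
                ndoms_cur = 0;
                if (doms_cur != &fallback_doms)
                        kfree(doms_cur);
                doms_cur = &fallback_doms;
        }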
     

16 Jun, 2008

1 commit


10 Jun, 2008

1 commit

  • Kthreads that have called kthread_bind() are bound to specific cpus,
    so other tasks should not be able to change their cpus_allowed out
    from under them. Otherwise, it is possible to move kthreads, such as
    the migration or software watchdog threads, so that they are no
    longer allowed access to the cpu they work on (the guard is sketched
    below).

    Cc: Peter Zijlstra
    Cc: Paul Menage
    Cc: Paul Jackson
    Signed-off-by: David Rientjes
    Signed-off-by: Ingo Molnar

    David Rientjes
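
    A sketch of the guard in the cpumask update path (the flag name
    follows the kernel's PF_THREAD_BOUND; exact placement assumed):

        /* in set_cpus_allowed_ptr() or similar: */
        if ((p->flags & PF_THREAD_BOUND) &&
            !cpus_equal(*new_mask, p->cpus_allowed)) {
                ret = -EINVAL;
                goto out;
        }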