10 Jul, 2009

1 commit

  • Fixes an easily triggerable BUG() when setting process affinities.

    Make sure to count the number of migratable tasks in the same place:
    the root rt_rq. Otherwise the number doesn't make sense and we'll hit
    the BUG in set_cpus_allowed_rt().

    Also, make sure we only count tasks, not groups (this is probably
    already taken care of by the fact that rt_se->nr_cpus_allowed will be 0
    for groups, but be more explicit). A simplified model of this counting
    follows the entry.

    Tested-by: Thomas Gleixner
    CC: stable@kernel.org
    Signed-off-by: Peter Zijlstra
    Acked-by: Gregory Haskins
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
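
    The counting rule above can be illustrated with a small userspace
    model. This is only a sketch: the struct and function names below
    (rt_rq, inc_rt_migration, ...) are simplified stand-ins, not the
    kernel code.

      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      /* Simplified model: only the root rt_rq carries the migratable count. */
      struct rt_rq { int rt_nr_migratory; };

      struct entity {
          bool is_task;          /* groups are entities but not tasks */
          int  nr_cpus_allowed;  /* 0 for groups, >= 1 for tasks      */
      };

      /* Count only real tasks that may run on more than one CPU. */
      static void inc_rt_migration(struct rt_rq *root, const struct entity *se)
      {
          if (se->is_task && se->nr_cpus_allowed > 1)
              root->rt_nr_migratory++;
      }

      static void dec_rt_migration(struct rt_rq *root, const struct entity *se)
      {
          if (se->is_task && se->nr_cpus_allowed > 1) {
              assert(root->rt_nr_migratory > 0); /* the BUG() analogue */
              root->rt_nr_migratory--;
          }
      }

      int main(void)
      {
          struct rt_rq root = { 0 };
          struct entity pinned  = { true, 1 };   /* affinity: one CPU   */
          struct entity movable = { true, 4 };   /* affinity: four CPUs */
          struct entity group   = { false, 0 };  /* group entity        */

          inc_rt_migration(&root, &pinned);
          inc_rt_migration(&root, &movable);
          inc_rt_migration(&root, &group);
          printf("migratable: %d\n", root.rt_nr_migratory); /* prints 1 */

          dec_rt_migration(&root, &movable);
          return 0;
      }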
     

09 Jun, 2009

1 commit


08 Apr, 2009

1 commit


01 Apr, 2009

1 commit


28 Mar, 2009

1 commit


09 Feb, 2009

1 commit


01 Feb, 2009

1 commit


16 Jan, 2009

1 commit

  • Ingo Molnar wrote:

    > here's a new build failure with tip/sched/rt:
    >
    > LD .tmp_vmlinux1
    > kernel/built-in.o: In function `set_curr_task_rt':
    > sched.c:(.text+0x3675): undefined reference to `plist_del'
    > kernel/built-in.o: In function `pick_next_task_rt':
    > sched.c:(.text+0x37ce): undefined reference to `plist_del'
    > kernel/built-in.o: In function `enqueue_pushable_task':
    > sched.c:(.text+0x381c): undefined reference to `plist_del'

    Eliminate the plist library Kconfig option and make plist available
    unconditionally, so the scheduler code can always link against
    plist_del() and friends. (A sketch of a priority-sorted list follows
    this entry.)

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
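
    For context, what the scheduler needs from plist is a list kept sorted
    by priority with cheap removal. The snippet below is a tiny userspace
    stand-in for that idea; it is not the kernel's plist API.

      #include <stdio.h>

      struct pnode {
          int prio;               /* lower value = higher priority */
          struct pnode *next;
      };

      /* Insert keeping the list sorted by ascending prio. */
      static void padd(struct pnode **head, struct pnode *n)
      {
          while (*head && (*head)->prio <= n->prio)
              head = &(*head)->next;
          n->next = *head;
          *head = n;
      }

      /* Unlink a node from the list (the plist_del() counterpart here). */
      static void pdel(struct pnode **head, struct pnode *n)
      {
          while (*head && *head != n)
              head = &(*head)->next;
          if (*head)
              *head = n->next;
      }

      int main(void)
      {
          struct pnode a = { 3, NULL }, b = { 1, NULL }, c = { 2, NULL };
          struct pnode *head = NULL;

          padd(&head, &a);
          padd(&head, &b);
          padd(&head, &c);
          pdel(&head, &c);
          for (struct pnode *p = head; p; p = p->next)
              printf("%d ", p->prio);      /* prints: 1 3 */
          printf("\n");
          return 0;
      }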
     

14 Jan, 2009

2 commits


12 Jan, 2009

1 commit


11 Jan, 2009

1 commit


04 Jan, 2009

2 commits

  • Impact: prevents panic from stack overflow on numa-capable machines.

    Some of the "removal of stack hogs" changes in kernel/sched.c that used
    node_to_cpumask_ptr were undone by the early cpumask API updates, causing
    a panic due to stack overflow. This patch restores them by using
    cpumask_of_node(), which returns a 'const struct cpumask *'.

    In addition, cpu_coregroup_map is replaced with cpu_coregroup_mask, further
    reducing stack usage. (Both of these updates removed 9 FIXMEs!) A small
    model of the pointer-vs-copy difference follows this entry.

    Also:
    Pick up some remaining changes from the old 'cpumask_t' functions to
    the new 'struct cpumask *' functions.

    Optimize memory traffic by allocating each percpu local_cpu_mask on the
    same node as the referring cpu.

    Signed-off-by: Mike Travis
    Acked-by: Rusty Russell
    Signed-off-by: Ingo Molnar

    Mike Travis
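
    A minimal model of the stack-usage difference between returning a
    cpumask by value and handing out a 'const struct cpumask *'. The
    NR_CPUS value and function names here are illustrative, not the
    kernel's.

      #include <stdio.h>

      #define NR_CPUS 4096
      struct cpumask { unsigned long bits[NR_CPUS / (8 * sizeof(unsigned long))]; };

      static struct cpumask node_masks[2];     /* per-node masks, set up once */

      /* Old style: by-value copy lands ~512 bytes on the caller's stack. */
      static struct cpumask node_cpumask_copy(int node)
      {
          return node_masks[node];
      }

      /* New style: a pointer to the shared, read-only mask -- no copy. */
      static const struct cpumask *node_cpumask_ptr(int node)
      {
          return &node_masks[node];
      }

      int main(void)
      {
          struct cpumask local = node_cpumask_copy(0);
          const struct cpumask *ref = node_cpumask_ptr(0);

          printf("by value: %zu bytes, by pointer: %zu bytes\n",
                 sizeof(local), sizeof(ref));
          return 0;
      }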
     
  • …ux-2.6-cpumask into merge-rr-cpumask

    Conflicts:
    arch/x86/kernel/io_apic.c
    kernel/rcuclassic.c
    kernel/sched.c
    kernel/time/tick-sched.c

    Signed-off-by: Mike Travis <travis@sgi.com>
    [ mingo@elte.hu: backmerged typo fix for io_apic.c ]
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

    Mike Travis
     

29 Dec, 2008

8 commits

  • A panic was discovered by Chirag Jog where a BUG_ON sanity check
    in the new "pushable_task" logic would trigger a panic under
    certain circumstances:

    http://lkml.org/lkml/2008/9/25/189

    Gilles Carry discovered that the root cause was attributed to the
    pushable_tasks list getting corrupted in the push_rt_task logic.
    This was the result of a dropped rq lock in double_lock_balance
    allowing a task in the process of being pushed to potentially migrate
    away, and thus corrupt the pushable_tasks() list.

    I traced the problem back to the pushable_tasks patch that went in
    recently. There is a "retry" path in push_rt_task()
    that actually had a compound conditional to decide whether to
    retry or exit. I missed the meaning behind the rationale for the
    virtual "if(!task) goto out;" portion of the compound statement and
    thus did not handle it properly. The new pushable_tasks logic
    actually creates three distinct conditions:

    1) an untouched and unpushable task should be dequeued
    2) a migrated task where more pushable tasks remain should be retried
    3) a migrated task where no more pushable tasks exist should exit

    The original logic mushed (1) and (3) together, resulting in the
    system dequeuing a migrated task (against an unlocked foreign run-queue
    nonetheless).

    To fix this, we get rid of the notion of "paranoid" and support the
    three distinct conditions properly. The paranoid feature is no longer
    relevant with the new pushable logic (since pushable naturally limits
    the loop) anyway, so let's just remove it. A small model of the
    three-way decision follows this entry.

    Reported-By: Chirag Jog
    Found-by: Gilles Carry
    Signed-off-by: Gregory Haskins

    Gregory Haskins
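
    The three cases above map to a simple decision. A compilable toy
    version follows; the names are made up for illustration and this is
    not push_rt_task() itself.

      #include <stdbool.h>
      #include <stdio.h>

      /* The three outcomes described above, in a simplified model of the
       * retry decision (illustrative names only). */
      enum push_result { DEQUEUE_TASK, RETRY_PUSH, GIVE_UP };

      static enum push_result classify(bool task_migrated_away, bool more_pushable)
      {
          if (!task_migrated_away)
              return DEQUEUE_TASK;   /* (1) untouched and unpushable: dequeue */
          if (more_pushable)
              return RETRY_PUSH;     /* (2) task moved, others remain: retry  */
          return GIVE_UP;            /* (3) task moved, nothing left: exit    */
      }

      int main(void)
      {
          /* The bug was treating case (3) like case (1): dequeuing a task
           * that had already migrated to another (unlocked) run-queue. */
          printf("%d %d %d\n",
                 classify(false, false),   /* 0: dequeue */
                 classify(true,  true),    /* 1: retry   */
                 classify(true,  false));  /* 2: give up */
          return 0;
      }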
     
  • The RT scheduler employs a "push/pull" design to actively balance tasks
    within the system (on a per disjoint cpuset basis). When a task is
    awoken, it is immediately determined if there are any lower priority
    cpus which should be preempted. This is opposed to the way normal
    SCHED_OTHER tasks behave, which will wait for a periodic rebalancing
    operation to occur before spreading out load.

    When a particular RQ has more than 1 active RT task, it is said to
    be in an "overloaded" state. Once this occurs, the system enters
    the active balancing mode, where it will try to push the task away,
    or persuade a different cpu to pull it over. The system will stay
    in this state until the RQ falls back to having at most one queued RT
    task.

    To make that balancing cheaper, this commit introduces a priority-sorted
    "pushable_tasks" list so the push logic can pick the next candidate
    directly instead of rescanning the whole queue. This reduces latency
    both by shortening the critical section of the rq->lock for certain
    workloads, and by making sure the algorithm considers all eligible
    tasks in the system. (A simplified model follows this entry.)

    [ rostedt: added a couple more BUG_ONs ]

    Signed-off-by: Gregory Haskins
    Acked-by: Steven Rostedt

    Gregory Haskins
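
    A toy illustration of the overload notion described above: entering
    and leaving the "more than one queued RT task" state. The names are
    invented for the sketch.

      #include <stdbool.h>
      #include <stdio.h>

      /* Toy run-queue: overload is simply "more than one queued RT task". */
      struct rq { int rt_nr_running; bool overloaded; };

      static void update_overload(struct rq *rq)
      {
          rq->overloaded = rq->rt_nr_running > 1;
      }

      static void enqueue_rt(struct rq *rq) { rq->rt_nr_running++; update_overload(rq); }
      static void dequeue_rt(struct rq *rq) { rq->rt_nr_running--; update_overload(rq); }

      int main(void)
      {
          struct rq rq = { 0, false };

          enqueue_rt(&rq);                 /* 1 task: not overloaded      */
          enqueue_rt(&rq);                 /* 2 tasks: push/pull kicks in */
          printf("overloaded: %d\n", rq.overloaded);
          dequeue_rt(&rq);                 /* back to 1: overload clears  */
          printf("overloaded: %d\n", rq.overloaded);
          return 0;
      }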
     
    We currently run class->post_schedule() outside of the rq->lock, which
    means that we need to test for the need to post_schedule outside of
    the lock to avoid a forced reacquisition. This is currently not a problem
    as we only look at rq->rt.overloaded. However, we want to enhance this
    going forward to look at more state, to reduce the need to post_schedule
    to a bare minimum set. Therefore, we introduce a new member function,
    needs_post_schedule(), which tests for the post_schedule condition
    without actually performing the work. It is therefore safe to call this
    function before the rq->lock is released, because we are guaranteed not
    to drop the lock at an intermediate point (such as what post_schedule()
    may do).

    We will use this later in the series. A sketch of the check-then-do
    split follows this entry.

    [ rostedt: removed paranoid BUG_ON ]

    Signed-off-by: Gregory Haskins

    Gregory Haskins
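
    A minimal sketch of the check-then-act split described above: test a
    cheap condition while still holding the lock, then do the heavier work
    after releasing it. The mutex and function names are illustrative only.

      #include <pthread.h>
      #include <stdbool.h>
      #include <stdio.h>

      static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;
      static bool rt_overloaded = true;     /* stand-in for rq state */

      /* Cheap test: safe under the lock, never drops it. */
      static bool needs_post_schedule(void)
      {
          return rt_overloaded;
      }

      /* Heavier work that may itself juggle locks, so run it unlocked. */
      static void post_schedule(void)
      {
          printf("balancing after schedule\n");
      }

      int main(void)
      {
          bool post;

          pthread_mutex_lock(&rq_lock);
          /* ... pick next task ... */
          post = needs_post_schedule();    /* decide while still locked  */
          pthread_mutex_unlock(&rq_lock);

          if (post)
              post_schedule();             /* act after the lock is gone */
          return 0;
      }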
     
    There is no sense in wasting time trying to push away a task that
    cannot move anywhere else. We gain no benefit from trying to push
    other tasks at this point, so if the task being woken up is
    non-migratable, just skip the whole operation. This reduces overhead
    in the wakeup path for certain tasks. (A sketch follows this entry.)

    Signed-off-by: Gregory Haskins

    Gregory Haskins
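
    The early-out described above amounts to one extra check on the wakeup
    path. A hedged sketch with illustrative names:

      #include <stdbool.h>
      #include <stdio.h>

      struct task { int nr_cpus_allowed; };

      static void push_rt_tasks(void) { printf("trying to push tasks\n"); }

      /* On wakeup: a task pinned to a single CPU cannot be pushed anywhere,
       * so skip the push machinery entirely. */
      static void task_woken(const struct task *p)
      {
          if (p->nr_cpus_allowed <= 1)
              return;
          push_rt_tasks();
      }

      int main(void)
      {
          struct task pinned = { 1 }, movable = { 4 };

          task_woken(&pinned);   /* prints nothing   */
          task_woken(&movable);  /* attempts a push  */
          return 0;
      }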
     
    We currently take the rq->lock for every cpu in an overload state during
    pull_rt_tasks(). However, we now have enough information via the
    highest_prio.[curr|next] fields to determine whether there are any tasks
    of interest to warrant the overhead of the rq->lock before we actually
    take it. So we use this information to reduce lock contention during the
    pull for the case where the source rq doesn't have tasks that preempt
    the current task. (A sketch of this early check follows this entry.)

    Signed-off-by: Gregory Haskins

    Gregory Haskins
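
    A sketch of the early check described above: look at the source CPU's
    cached highest priority and skip the expensive lock when it cannot
    preempt us anyway. Field and function names are stand-ins.

      #include <stdbool.h>
      #include <stdio.h>

      /* Lower number = higher priority, as in the kernel. */
      struct rq { int highest_prio_curr; };

      static bool worth_locking(const struct rq *src, const struct rq *dst)
      {
          /* Only take src's lock if src holds something that would preempt
           * what dst is currently running. */
          return src->highest_prio_curr < dst->highest_prio_curr;
      }

      int main(void)
      {
          struct rq dst       = { 10 };
          struct rq busy_low  = { 50 };  /* nothing interesting to pull */
          struct rq busy_high = { 2 };   /* holds a higher-prio task    */

          printf("lock low-prio src?  %d\n", worth_locking(&busy_low, &dst));
          printf("lock high-prio src? %d\n", worth_locking(&busy_high, &dst));
          return 0;
      }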
     
  • highest_prio.curr is actually a more accurate way to keep track of
    the pull_rt_task() threshold since it is always up to date, even
    if the "next" task migrates during double_lock. Therefore, stop
    looking at the "next" task object and simply use the highest_prio.curr.

    Signed-off-by: Gregory Haskins

    Gregory Haskins
     
    We will use this later in the series to reduce the amount of rq-lock
    contention during a pull operation.

    Signed-off-by: Gregory Haskins

    Gregory Haskins
     
    Move some common definitions up to the function prologue to simplify the
    body logic.

    Signed-off-by: Gregory Haskins

    Gregory Haskins
     

25 Dec, 2008

1 commit


17 Dec, 2008

1 commit


12 Dec, 2008

1 commit


29 Nov, 2008

1 commit

  • Move double_lock_balance()/double_unlock_balance() higher to fix the following
    with gcc-3.4.6:

    CC kernel/sched.o
    In file included from kernel/sched.c:1605:
    kernel/sched_rt.c: In function `find_lock_lowest_rq':
    kernel/sched_rt.c:914: sorry, unimplemented: inlining failed in call to 'double_unlock_balance': function body not available
    kernel/sched_rt.c:1077: sorry, unimplemented: called from here
    make[2]: *** [kernel/sched.o] Error 1

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Ingo Molnar

    Alexey Dobriyan
     

26 Nov, 2008

1 commit


25 Nov, 2008

5 commits

  • Impact: Trivial API conversion

    NR_CPUS -> nr_cpu_ids
    cpumask_t -> struct cpumask
    sizeof(cpumask_t) -> cpumask_size()
    cpumask_a = cpumask_b -> cpumask_copy(&cpumask_a, &cpumask_b)

    cpu_set() -> cpumask_set_cpu()
    first_cpu() -> cpumask_first()
    cpumask_of_cpu() -> cpumask_of()
    cpus_* -> cpumask_*

    There are some FIXMEs where we need all archs to complete the
    infrastructure (patches have been sent):

    cpu_coregroup_map -> cpu_coregroup_mask
    node_to_cpumask* -> cpumask_of_node

    There is also one FIXME where we pass an array of cpumasks to
    partition_sched_domains(): this implies knowing the definition of
    'struct cpumask' and the size of a cpumask. This will be fixed in a
    future patch.

    Signed-off-by: Rusty Russell
    Signed-off-by: Ingo Molnar

    Rusty Russell
     
  • Impact: (future) size reduction for large NR_CPUS.

    Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves
    space for small nr_cpu_ids but big CONFIG_NR_CPUS. cpumask_var_t
    is just a struct cpumask for !CONFIG_CPUMASK_OFFSTACK. (A simplified
    model of cpumask_var_t follows this entry.)

    Signed-off-by: Rusty Russell
    Signed-off-by: Ingo Molnar

    Rusty Russell
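
    A simplified model of the cpumask_var_t idea: with an "offstack" build
    option the mask lives in heap memory and must be allocated, otherwise
    it is a plain struct. The macro name and helpers below mimic the
    kernel's shape but are not its implementation (the real
    alloc_cpumask_var also takes gfp flags).

      #include <stdbool.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      #define NR_CPUS 4096
      struct cpumask { unsigned long bits[NR_CPUS / (8 * sizeof(unsigned long))]; };

      #ifdef CPUMASK_OFFSTACK
      typedef struct cpumask *cpumask_var_t;       /* pointer: needs allocation */

      static bool alloc_cpumask_var(cpumask_var_t *mask)
      {
          *mask = calloc(1, sizeof(struct cpumask));
          return *mask != NULL;
      }
      static void free_cpumask_var(cpumask_var_t mask) { free(mask); }
      #else
      typedef struct cpumask cpumask_var_t[1];     /* just a struct, no alloc */

      static bool alloc_cpumask_var(cpumask_var_t *mask)
      {
          memset(*mask, 0, sizeof(struct cpumask));
          return true;
      }
      static void free_cpumask_var(cpumask_var_t mask) { (void)mask; }
      #endif

      int main(void)
      {
          cpumask_var_t mask;

          if (!alloc_cpumask_var(&mask))
              return 1;                    /* callers must handle failure */
          printf("mask storage: %zu bytes\n", sizeof(struct cpumask));
          free_cpumask_var(mask);
          return 0;
      }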
     
  • Impact: stack reduction for large NR_CPUS

    Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves
    stack space.

    We simply return if the allocation fails: since we don't use it we
    could just pass NULL to cpupri_find and have it handle that.

    Signed-off-by: Rusty Russell
    Signed-off-by: Ingo Molnar

    Rusty Russell
     
  • Impact: (future) size reduction for large NR_CPUS.

    Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves
    space for small nr_cpu_ids but big CONFIG_NR_CPUS. cpumask_var_t
    is just a struct cpumask for !CONFIG_CPUMASK_OFFSTACK.

    def_root_domain is static, and so its masks are initialized with
    alloc_bootmem_cpumask_var. After that, alloc_cpumask_var is used.

    Signed-off-by: Rusty Russell
    Signed-off-by: Ingo Molnar

    Rusty Russell
     
  • Impact: trivial wrap of member accesses

    This eases the transition in the next patch.

    We also get rid of a temporary cpumask in find_idlest_cpu() thanks to
    for_each_cpu_and, and of one in sched_balance_self() by reading the
    weight before setting sd to NULL.

    Signed-off-by: Rusty Russell
    Signed-off-by: Ingo Molnar

    Rusty Russell
     

07 Nov, 2008

1 commit

    We have a test case which measures the variation in the amount of time
    needed to perform a fixed amount of work on the preempt_rt kernel. We
    started seeing deterioration in its performance recently. The test
    should never take more than 10 microseconds, but we started seeing a
    5-10% failure rate.

    Using the elimination method, we traced the problem to commit
    1b12bbc747560ea68bcc132c3d05699e52271da0 (lockdep: re-annotate
    scheduler runqueues).

    When LOCKDEP is disabled, this patch only adds an additional function
    call to double_unlock_balance(). Hence I inlined double_unlock_balance()
    and the problem went away. Here is a patch to make this change.

    Signed-off-by: Sripathi Kodi
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Sripathi Kodi
     

03 Nov, 2008

1 commit

  • Impact: micro-optimization to SCHED_FIFO/RR scheduling

    A very minor improvement, but might it be better to check
    sched_rt_runtime(rt_rq) before taking the rt_runtime_lock? (A sketch of
    this check-before-lock pattern follows this entry.)

    Peter Zijlstra observes:

    > Yes, I think its ok to do so.
    >
    > Like pointed out in the other thread, there are two races:
    >
    > - sched_rt_runtime() going to RUNTIME_INF, and that will be handled
    > properly by sched_rt_runtime_exceeded()
    >
    > - sched_rt_runtime() going to !RUNTIME_INF, and here we can miss an
    > accounting cycle, but I don't think that is something to worry too
    > much about.

    Signed-off-by: Dimitri Sivanich
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    --

    kernel/sched_rt.c | 4 ++--
    1 file changed, 2 insertions(+), 2 deletions(-)

    Dimitri Sivanich
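
    A sketch of the micro-optimization discussed above: read the cheap
    unlimited-runtime condition first and only take the lock when
    accounting is actually needed. The names and types are illustrative,
    not the kernel's.

      #include <pthread.h>
      #include <stdio.h>

      #define RUNTIME_INF (~0ULL)

      static pthread_mutex_t rt_runtime_lock = PTHREAD_MUTEX_INITIALIZER;
      static unsigned long long rt_runtime = RUNTIME_INF;  /* no throttling */
      static unsigned long long rt_time;

      static void account_rt_time(unsigned long long delta)
      {
          /* Cheap unlocked check first: with unlimited runtime there is
           * nothing to account, so skip the lock entirely. A racy read is
           * tolerable; at worst one accounting cycle is missed. */
          if (rt_runtime == RUNTIME_INF)
              return;

          pthread_mutex_lock(&rt_runtime_lock);
          rt_time += delta;
          pthread_mutex_unlock(&rt_runtime_lock);
      }

      int main(void)
      {
          account_rt_time(1000);                /* skips the lock           */
          rt_runtime = 950000;                  /* enable throttling        */
          account_rt_time(1000);                /* takes the lock, accounts */
          printf("accounted: %llu ns\n", rt_time);
          return 0;
      }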
     

24 Oct, 2008

1 commit


22 Oct, 2008

1 commit

  • a patch from Henrik Austad did this:

    >> Do not declare select_task_rq as part of sched_class when CONFIG_SMP is
    >> not set.

    Peter observed:

    > While a proper cleanup, could you do it by re-arranging the methods so
    > as to not create an additional ifdef?

    Do not declare select_task_rq and some other methods as part of
    sched_class when CONFIG_SMP is not set.

    Also gather those methods together to avoid a CONFIG_SMP #ifdef mess.
    (A sketch of the grouping follows this entry.)

    Idea-by: Henrik Austad
    Signed-off-by: Li Zefan
    Acked-by: Peter Zijlstra
    Acked-by: Henrik Austad
    Signed-off-by: Ingo Molnar

    Li Zefan
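
    The reorganization above is about grouping related members under a
    single conditional block instead of scattering several. A schematic:
    the struct below is a cut-down stand-in for sched_class, not the real
    definition.

      #include <stdio.h>

      struct rq;
      struct task_struct;

      /* Cut-down stand-in for sched_class: SMP-only hooks are gathered
       * under one conditional block rather than several scattered #ifdefs. */
      struct sched_class_model {
          void (*enqueue_task)(struct rq *rq, struct task_struct *p);
          void (*dequeue_task)(struct rq *rq, struct task_struct *p);

      #ifdef CONFIG_SMP
          int  (*select_task_rq)(struct task_struct *p, int flag);
          void (*set_cpus_allowed)(struct task_struct *p);
          void (*rq_online)(struct rq *rq);
          void (*rq_offline)(struct rq *rq);
      #endif
      };

      int main(void)
      {
          printf("sched_class model size: %zu bytes\n",
                 sizeof(struct sched_class_model));
          return 0;
      }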
     

20 Oct, 2008

1 commit


04 Oct, 2008

1 commit

  • While working on the new version of the code for SCHED_SPORADIC I
    noticed something strange in the present throttling mechanism. More
    specifically in the throttling timer handler in sched_rt.c
    (do_sched_rt_period_timer()) and in rt_rq_enqueue().

    The problem is that, when unthrottling a runqueue, rt_rq_enqueue() only
    asks for rescheduling if the runqueue has a sched_entity associated with
    it (i.e., rt_rq->rt_se != NULL).
    Now, if the runqueue is the root rq (which has rt_se == NULL),
    rescheduling does not take place, and it is delayed to some undefined
    instant in the future.

    This implies some random bandwidth usage by the RT tasks under
    throttling. For instance, setting rt_runtime_us/rt_period_us =
    950ms/1000ms, an RT task will get less than 95%. In our tests we got
    something varying between 70% and 95%.
    Using smaller time values, e.g., 95ms/100ms, things are even worse, and
    I can see values also going down to 20-25%!

    The tests we performed are simply running 'yes' as a SCHED_FIFO task,
    and checking the CPU usage with top, but we can investigate thoroughly
    if you think it is needed.

    Things go much better, for us, with the attached patch... Don't know if
    it is the best approach, but it solved the issue for us. (A simplified
    model of the fix follows this entry.)

    Signed-off-by: Dario Faggioli
    Signed-off-by: Michael Trimarchi
    Acked-by: Peter Zijlstra
    Cc:
    Signed-off-by: Ingo Molnar

    Dario Faggioli
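
    A toy model of the fix's intent as described above: when a throttled
    queue gets its bandwidth back, ask for a reschedule even when it is
    the root runqueue (rt_se == NULL), rather than only for child group
    queues. Everything below is a simplified stand-in, not the kernel code.

      #include <stdbool.h>
      #include <stdio.h>

      struct rt_rq {
          bool throttled;
          void *rt_se;            /* NULL for the root runqueue */
      };

      static bool need_resched_flag;

      static void rt_rq_enqueue(struct rt_rq *rt_rq)
      {
          rt_rq->throttled = false;

          /* Old behaviour: only child runqueues (rt_se != NULL) asked for a
           * reschedule, so the root rq kept running non-RT work for a while.
           * Fixed behaviour: always request a reschedule when bandwidth is
           * replenished. */
          need_resched_flag = true;
      }

      int main(void)
      {
          struct rt_rq root = { true, NULL };

          rt_rq_enqueue(&root);
          printf("resched requested: %d\n", need_resched_flag);  /* now 1 */
          return 0;
      }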
     

23 Sep, 2008

2 commits