Eric Lee / smarc-fsl-linux-kernel

26 Jan, 2011

2 commits

da7a735e5 sched: Fix switch_from_fair()

When a task is taken out of the fair class we must ensure the vruntime
is properly normalized because when we put it back in it is assumed
to be normalized.

The case that goes wrong is when changing away from the fair class
while sleeping. Sleeping tasks have non-normalized vruntime in order
to make sleeper-fairness work. So treat the switch away from fair as a
wakeup and preserve the relative vruntime.

Also update sysrq-n to call the ->switch_{to,from} methods.
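
The normalization described above can be sketched as follows. This is a minimal illustration, not the kernel's code: the toy_* types and functions are hypothetical stand-ins, and the only idea taken from the message is that vruntime is stored relative to the queue minimum while the task is outside the fair class.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel's cfs_rq and sched_entity. */
struct toy_cfs_rq { uint64_t min_vruntime; };
struct toy_entity { uint64_t vruntime; int on_rq; };

/* Leaving the fair class: a sleeping task still carries an absolute
 * vruntime, so make it relative to the queue's current minimum. */
static void toy_switched_from_fair(struct toy_cfs_rq *rq, struct toy_entity *se)
{
	if (!se->on_rq)
		se->vruntime -= rq->min_vruntime; /* unsigned wrap-around is intended */
}

/* Re-entering the fair class: rebase the relative value onto the
 * minimum of the queue the task is placed on. */
static void toy_enqueue_fair(struct toy_cfs_rq *rq, struct toy_entity *se)
{
	se->vruntime += rq->min_vruntime;
	se->on_rq = 1;
}
```

With this arrangement a task that was 200 units ahead of the minimum when it left is still 200 units ahead when it comes back, however far the queue has advanced in between.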

Reported-by: Onkalo Samu
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2011-01-26 19:33:22 +0800
a8941d7ec sched: Simplify the idle scheduling class

Since commit 48c5ccae88dcd (sched: Simplify cpu-hot-unplug task
migration) this should no longer happen, so remove the code.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2011-01-26 19:33:22 +0800

23 Apr, 2010

1 commit

74f5187ac sched: Cure load average vs NO_HZ woes

Chase reported that due to us decrementing calc_load_task prematurely
(before the next LOAD_FREQ sample), the load average could be skewed
by as much as the number of CPUs in the machine.

This patch, based on Chase's patch, cures the problem by keeping the
delta of the CPU going into NO_HZ idle separately and folding that in
on the next LOAD_FREQ update.

This restores the balance and we get strict LOAD_FREQ period samples.
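
A rough sketch of the idea, assuming two counters: one that is sampled every LOAD_FREQ period and one that collects the deltas of CPUs entering NO_HZ idle. All toy_* names are hypothetical and the real code uses atomic operations, which are left out here.

```c
/* Hypothetical counters modelled on the description above. */
static long toy_calc_load_tasks;      /* value used for the LOAD_FREQ sample */
static long toy_calc_load_tasks_idle; /* deltas parked by CPUs going NO_HZ idle */

/* A CPU going idle parks its delta instead of applying it right away. */
static void toy_account_idle(long delta)
{
	toy_calc_load_tasks_idle += delta;
}

/* On the next LOAD_FREQ update the parked deltas are folded in,
 * and only then is the sample taken. */
static long toy_load_freq_sample(void)
{
	toy_calc_load_tasks += toy_calc_load_tasks_idle;
	toy_calc_load_tasks_idle = 0;
	return toy_calc_load_tasks;
}
```

Between two updates the sampled counter does not move, so every sample reflects a full LOAD_FREQ period.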

Signed-off-by: Peter Zijlstra
Acked-by: Chase Douglas
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2010-04-23 17:02:02 +0800

03 Apr, 2010

2 commits

371fd7e7a sched: Add enqueue/dequeue flags

In order to reduce the dependency on TASK_WAKING rework the enqueue
interface to support a proper flags field.

Replace the int wakeup, bool head arguments with an int flags argument
and create the following flags:

  ENQUEUE_WAKEUP - the enqueue is a wakeup of a sleeping task,
  ENQUEUE_WAKING - the enqueue has relative vruntime due to
                   having sched_class::task_waking() called,
  ENQUEUE_HEAD   - the waking task should be placed on the head
                   of the priority queue (where appropriate).

For symmetry also convert sched_class::dequeue() to a flags scheme.
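
As an illustration of the flags scheme, a single int argument can carry several independent bits. The flag names below come from the list above; the bit values, the toy queue and toy_enqueue() are assumptions made for this sketch only.

```c
/* Flag names from the commit message; the bit values are illustrative. */
#define ENQUEUE_WAKEUP 1
#define ENQUEUE_WAKING 2
#define ENQUEUE_HEAD   4

#define TOY_QUEUE_MAX 8

/* Hypothetical fixed-size queue standing in for a priority list. */
struct toy_queue { int items[TOY_QUEUE_MAX]; int len; };

/* One flags argument replaces the old (int wakeup, bool head) pair. */
static void toy_enqueue(struct toy_queue *q, int item, int flags)
{
	if (q->len >= TOY_QUEUE_MAX)
		return;
	if (flags & ENQUEUE_HEAD) {
		/* Shift everything up and place the item at the head. */
		for (int i = q->len; i > 0; i--)
			q->items[i] = q->items[i - 1];
		q->items[0] = item;
	} else {
		q->items[q->len] = item;
	}
	q->len++;
}
```

A caller combines bits with |, for example toy_enqueue(&q, 3, ENQUEUE_WAKEUP | ENQUEUE_HEAD), and new flags can be added later without touching the function signature.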

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2010-04-03 02:12:05 +0800
0017d7350 sched: Fix TASK_WAKING vs fork deadlock

Oleg noticed a few races with the TASK_WAKING usage on fork.

- since TASK_WAKING is basically a spinlock, it should be IRQ safe
- since we set TASK_WAKING (*) without holding rq->lock, there could
  still be a rq->lock holder, thereby not actually providing full
  serialization.

(*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING.

Cure the second issue by not setting TASK_WAKING in sched_fork(), but
only temporarily in wake_up_new_task() while calling select_task_rq().

Cure the first by holding rq->lock around the select_task_rq() call,
which disables IRQs. This, however, requires that we push down the
rq->lock release into select_task_rq_fair()'s cgroup stuff.

Because select_task_rq_fair() still needs to drop the rq->lock we
cannot fully get rid of TASK_WAKING.

Reported-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2010-04-03 02:12:03 +0800

21 Jan, 2010

1 commit

3d45fd804 sched: Remove the sched_class load_balance methods

Take out the sched_class methods for load-balancing.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2010-01-21 20:40:09 +0800

17 Jan, 2010

1 commit

6d686f456 sched: Don't expose local functions

kernel/sched: don't expose local functions

The get_rr_interval_* functions are all class methods of
struct sched_class. They are not exported so make them
static.

Signed-off-by: H Hartley Sweeten
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

H Hartley Sweeten
2010-01-17 15:09:45 +0800

21 Dec, 2009

1 commit

3df0fc5b2 sched: Restore printk sanity

Revert the braindead pr_* crap. (Commit 663997d "sched: Use
pr_fmt() and pr_<level>()")

It's dumb and causes stupid "sched: " strings all over the place.

Signed-off-by: Peter Zijlstra
Acked-by: Mike Galbraith
Cc: Joe Perches
Cc: Linus Torvalds
Cc: Andrew Morton
LKML-Reference:
[ I don't mind the pr_*() patterns that much - but Peter dislikes them with a vengeance. ]
[ - v2: remove spurious diffstat from changelog :-/ ]
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-12-21 02:05:02 +0800

17 Dec, 2009

1 commit

ee1156c11 Merge branch 'linus' into sched/urgent

Conflicts:
kernel/sched_idletask.c

Merge reason: resolve the conflicts, pick up latest changes.

Signed-off-by: Ingo Molnar

Ingo Molnar
2009-12-17 01:33:49 +0800

15 Dec, 2009

1 commit

05fa785cf sched: Convert rq->lock to raw_spinlock

Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Acked-by: Ingo Molnar

Thomas Gleixner
2009-12-15 06:55:33 +0800

13 Dec, 2009

1 commit

663997d41 sched: Use pr_fmt() and pr_<level>()

- Convert printk(KERN_<LEVEL> to pr_<level> (not KERN_DEBUG)
- Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
- Coalesce long format strings
- Add missing \n to "ERROR: !SD_LOAD_BALANCE domain has parent"

Signed-off-by: Joe Perches
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Joe Perches
2009-12-13 15:13:55 +0800

09 Dec, 2009

1 commit

dba091b9e sched: Protect sched_rr_get_param() access to task->sched_class

sched_rr_get_param calls
task->sched_class->get_rr_interval(task) without protection
against a concurrent sched_setscheduler() call which modifies
task->sched_class.

Serialize the access with task_rq_lock(task) and hand the rq
pointer into get_rr_interval() as it's needed at least in the
sched_fair implementation.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Thomas Gleixner
2009-12-09 17:01:07 +0800

21 Sep, 2009

1 commit

0d721cead sched: Simplify sys_sched_rr_get_interval() system call

By removing the need for it to know details of scheduling classes.

This allows PlugSched to define orthogonal scheduling classes.

Signed-off-by: Peter Williams
Acked-by: Peter Zijlstra
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Williams
2009-09-21 15:53:55 +0800

15 Sep, 2009

3 commits

7d4787214 sched: Rename sync arguments

In order to extend the functions to have more than 1 flag (sync),
rename the argument to flags, and explicitly define a WF_ space for
individual flags.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:51:30 +0800
0763a660a sched: Rename select_task_rq() argument

In order to be able to rename the sync argument, we need to rename
the current flag argument.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:51:29 +0800
5f3edc1b1 sched: Hook sched_balance_self() into sched_class::select_task_rq()

Rather ugly patch to fully place the sched_balance_self() code
inside the fair class.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:01:04 +0800

15 May, 2009

1 commit

dce48a84a sched, timers: move calc_load() to scheduler

Dimitri Sivanich noticed that xtime_lock is held write locked across
calc_load() which iterates over all online CPUs. That can cause long
latencies for xtime_lock readers on large SMP systems.

The load average calculation is a rough estimate anyway so there is
no real need to protect the readers vs. the update. It's not a problem
when the avenrun array is updated while a reader copies the values.

Instead of iterating over all online CPUs let the scheduler_tick code
update the number of active tasks shortly before the avenrun update
happens. The avenrun update itself is handled by the CPU which calls
do_timer().
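
For context, the avenrun update is an exponentially decaying average in fixed-point arithmetic. The sketch below is an illustration only: the constants follow the values I recall from the kernel headers of that era (FSHIFT of 11 and a one-minute factor of 1884), which the commit message itself does not state, and toy_calc_load() is a hypothetical name.

```c
#define FSHIFT  11                 /* bits of fixed-point precision (assumed) */
#define FIXED_1 (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1   1884UL             /* assumed 1-minute decay factor */

/* One update step: new = old * exp + active * (1 - exp), all in fixed
 * point. 'active' is the number of active tasks scaled by FIXED_1. */
static unsigned long toy_calc_load(unsigned long load, unsigned long exp,
				   unsigned long active)
{
	load *= exp;
	load += active * (FIXED_1 - exp);
	return load >> FSHIFT;
}
```

Because each step only needs the previous average and the current number of active tasks, the CPU that performs the update does not have to iterate over all online CPUs, provided the active count is maintained elsewhere.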

[ Impact: reduce xtime_lock write locked section ]

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra

Thomas Gleixner
2009-05-15 21:32:45 +0800

22 Oct, 2008

1 commit

4ce72a2c0 sched: add CONFIG_SMP consistency

A patch from Henrik Austad did this:

>> Do not declare select_task_rq as part of sched_class when CONFIG_SMP is
>> not set.

Peter observed:

> While a proper cleanup, could you do it by re-arranging the methods so
> as to not create an additional ifdef?

Do not declare select_task_rq and some other methods as part of sched_class
when CONFIG_SMP is not set.

Also gather those methods to avoid CONFIG_SMP mess.
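
The re-arrangement can be pictured like this. The struct is a hypothetical, heavily reduced stand-in for sched_class; the member list is illustrative and only shows how gathering the SMP-only methods leaves a single conditional block.

```c
struct toy_task; /* hypothetical opaque task type */

struct toy_sched_class {
	void (*enqueue_task)(struct toy_task *p, int flags);
	void (*dequeue_task)(struct toy_task *p, int flags);
	void (*task_tick)(struct toy_task *p);

#ifdef CONFIG_SMP
	/* All SMP-only methods sit together under one conditional. */
	int  (*select_task_rq)(struct toy_task *p, int flags);
	void (*pre_schedule)(struct toy_task *p);
	void (*post_schedule)(struct toy_task *p);
#endif
};
```

On a build without CONFIG_SMP the structure simply ends after the common methods, and no further #ifdef is needed in the declaration.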

Idea-by: Henrik Austad
Signed-off-by: Li Zefan
Acked-by: Peter Zijlstra
Acked-by: Henrik Austad
Signed-off-by: Ingo Molnar

Li Zefan
2008-10-22 16:01:52 +0800

22 Sep, 2008

1 commit

15afe09bf sched: wakeup preempt when small overlap

Lin Ming reported a 10% OLTP regression against 2.6.27-rc4.

The difference seems to come from different preemption aggressiveness,
which affects the cache footprint of the workload and its effective
cache thrashing.

Aggressively preempt a task if its avg overlap is very small. This should
avoid the task going to sleep, so that we find it still running when we
schedule back to it - saving a wakeup.

Reported-by: Lin Ming
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-09-22 22:28:32 +0800

06 May, 2008

1 commit

2abdad0a4 sched: make rt_sched_class, idle_sched_class static

The C files are included directly in sched.c, so they are
effectively static.

Signed-off-by: Harvey Harrison
Acked-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Harvey Harrison
2008-05-06 05:56:17 +0800

26 Jan, 2008

3 commits

8f4d37ec0 sched: high-res preemption tick

Use HR-timers (when available) to deliver an accurate preemption tick.

The regular scheduler tick that runs at 1/HZ can be too coarse when nice
levels are used. The fairness system will still keep the CPU utilisation 'fair'
by then delaying the task that got an excessive amount of CPU time, but we try
to minimize this by delivering preemption points spot-on.

The average frequency of this extra interrupt is sched_latency / nr_latency,
which need not be higher than 1/HZ; it's just that the distribution within the
sched_latency period is important.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-01-26 04:08:29 +0800
cb4698450 sched: RT-balance, add new methods to sched_class

Dmitry Adamushko found that the current implementation of the RT
balancing code left out changes to the sched_setscheduler and
rt_mutex_setprio.

This patch addresses this issue by adding methods to the schedule classes
to handle being switched out of (switched_from) and being switched into
(switched_to) a sched_class. A method for handling priority changes
(prio_changed) is also added.
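
One way such hooks could be dispatched is sketched below. The method names switched_from, switched_to and prio_changed come from the message; everything prefixed toy_ (the types, the sample hooks that only record which callback ran, and the dispatcher) is hypothetical.

```c
#include <stddef.h>

struct toy_task;

struct toy_class {
	void (*switched_from)(struct toy_task *p);
	void (*switched_to)(struct toy_task *p);
	void (*prio_changed)(struct toy_task *p, int oldprio);
};

struct toy_task { const struct toy_class *cls; int prio; int events; };

/* Sample hooks: each one records a bit so the call order is observable. */
static void toy_from(struct toy_task *p) { p->events |= 1; }
static void toy_to(struct toy_task *p)   { p->events |= 2; }
static void toy_prio(struct toy_task *p, int oldprio)
{
	(void)oldprio;
	p->events |= 4;
}

static const struct toy_class toy_fair = { toy_from, toy_to, toy_prio };
static const struct toy_class toy_rt   = { toy_from, toy_to, toy_prio };

/* Called after p->cls and p->prio were updated: notify the old and new
 * class on a class change, otherwise report the priority change. */
static void toy_check_class_changed(struct toy_task *p,
				    const struct toy_class *prev, int oldprio)
{
	if (prev != p->cls) {
		if (prev->switched_from)
			prev->switched_from(p);
		if (p->cls->switched_to)
			p->cls->switched_to(p);
	} else if (p->cls->prio_changed) {
		p->cls->prio_changed(p, oldprio);
	}
}
```

Both callers mentioned above could share a helper of this shape, which is one way the duplicate logic between them can disappear.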

This patch also removes some duplicate logic between rt_mutex_setprio and
sched_setscheduler.

Signed-off-by: Steven Rostedt
Signed-off-by: Ingo Molnar

Steven Rostedt
2008-01-26 04:08:22 +0800
e7693a362 sched: de-SCHED_OTHER-ize the RT path

The current wake-up code path tries to determine if it can optimize the
wake-up to "this_cpu" by computing load calculations. The problem is that
these calculations are only relevant to SCHED_OTHER tasks where load is king.
For RT tasks, priority is king. So the load calculation is completely wasted
bandwidth.

Therefore, we create a new sched_class interface to help with
pre-wakeup routing decisions and move the load calculation as a function
of CFS task's class.

Signed-off-by: Gregory Haskins
Signed-off-by: Steven Rostedt
Signed-off-by: Ingo Molnar

Gregory Haskins
2008-01-26 04:08:09 +0800

25 Oct, 2007

2 commits

681f3e685 sched: isolate SMP balancing code a bit more

At the moment, a lot of load balancing code that is irrelevant to non
SMP systems gets included during non SMP builds.

This patch addresses this issue and reduces the binary size on non
SMP systems:

   text    data     bss     dec     hex filename
  10983      28    1192   12203    2fab sched.o.before
  10739      28    1192   11959    2eb7 sched.o.after

Signed-off-by: Peter Williams
Signed-off-by: Ingo Molnar

Peter Williams
2007-10-25 00:23:51 +0800
e1d1484f7 sched: reduce balance-tasks overhead

At the moment, balance_tasks() provides low level functionality for both
move_tasks() and move_one_task() (indirectly) via the load_balance()
function (in the sched_class interface) which also provides dual
functionality. This dual functionality complicates the interfaces and
internal mechanisms and adds to the run time overhead of operations that
are called with two run queue locks held.

This patch addresses this issue and reduces the overhead of these
operations.

Signed-off-by: Peter Williams
Signed-off-by: Ingo Molnar

Peter Williams
2007-10-25 00:23:51 +0800

15 Oct, 2007

4 commits

5522d5d5f sched: mark scheduling classes as const

mark scheduling classes as const. This speeds up the code
a bit and shrinks it:

   text    data     bss     dec     hex filename
  40027    4018     292   44337    ad31 sched.o.before
  40190    3842     292   44324    ad24 sched.o.after

Signed-off-by: Ingo Molnar
Reviewed-by: Thomas Gleixner

Ingo Molnar
2007-10-15 23:00:12 +0800
83b699ed2 sched: revert recent removal of set_curr_task()

Revert removal of set_curr_task.
Use put_prev_task/set_curr_task when changing groups/policies

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani
Signed-off-by: Ingo Molnar
Signed-off-by: Peter Zijlstra

Srivatsa Vaddagiri
2007-10-15 23:00:08 +0800
f6b53205e sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()

rework enqueue/dequeue_entity() to get rid of
sched_class::set_curr_task(). This simplifies sched_setscheduler(),
rt_mutex_setprio() and sched_move_tasks().

   text    data     bss     dec     hex filename
  24330    2734      20   27084    69cc sched.o.before
  24233    2730      20   26983    6967 sched.o.after

Signed-off-by: Dmitry Adamushko
Signed-off-by: Srivatsa Vaddagiri
Signed-off-by: Ingo Molnar
Signed-off-by: Peter Zijlstra
Reviewed-by: Thomas Gleixner

Dmitry Adamushko
2007-10-15 23:00:08 +0800
29f59db3a sched: group-scheduler core

Add interface to control cpu bandwidth allocation to task-groups.

(not yet configurable, due to missing CONFIG_CONTAINERS)

Signed-off-by: Srivatsa Vaddagiri
Signed-off-by: Dhaval Giani
Signed-off-by: Ingo Molnar
Signed-off-by: Peter Zijlstra

Srivatsa Vaddagiri
2007-10-15 23:00:07 +0800

09 Aug, 2007

5 commits

31ee529cc sched: remove the 'u64 now' parameter from ->put_prev_task()

remove the 'u64 now' parameter from ->put_prev_task().

( identity transformation that causes no change in functionality. )

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-08-09 17:16:49 +0800
fb8d47240 sched: remove the 'u64 now' parameter from ->pick_next_task()

remove the 'u64 now' parameter from ->pick_next_task().

( identity transformation that causes no change in functionality. )

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-08-09 17:16:48 +0800
f02231e51 sched: remove the 'u64 now' parameter from ->dequeue_task()

remove the 'u64 now' parameter from ->dequeue_task().

( identity transformation that causes no change in functionality. )

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-08-09 17:16:48 +0800
a4ac01c36 sched: fix bug in balance_tasks()

There are two problems with balance_tasks() and how it is used:

1. The variables best_prio and best_prio_seen (inherited from the old
move_tasks()) were only required to handle problems caused by the
active/expired arrays, the order in which they were processed and the
possibility that the task with the highest priority could be on either.
These issues are no longer present and the extra overhead associated
with their use is unnecessary (and possibly wrong).

2. In the absence of CONFIG_FAIR_GROUP_SCHED being set, the same
this_best_prio variable needs to be used by all scheduling classes or
there is a risk of moving too much load. E.g. if the highest priority
task on this at the beginning is a fairly low priority task and the rt
class migrates a task (during its turn) then that moved task becomes the
new highest priority task on this_rq but when the sched_fair class
initializes its copy of this_best_prio it will get the priority of the
original highest priority task as, due to the run queue locks being
held, the reschedule triggered by pull_task() will not have taken place.
This could result in inappropriate overriding of skip_for_load and
excessive load being moved.

The attached patch addresses these problems by deleting all reference to
best_prio and best_prio_seen and making this_best_prio a reference
parameter to the various functions involved.
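
The effect of passing this_best_prio by reference can be sketched as follows. The function name and the array-based run queue are hypothetical; the sketch only shows that every class updates the same variable, so a later class sees what an earlier class already moved. A lower value means a higher priority here.

```c
/* Hypothetical per-class balancing step: 'prios' holds the priorities of
 * the tasks this class moved; the shared best priority is updated in place. */
static void toy_balance_class(const int *prios, int n, int *this_best_prio)
{
	for (int i = 0; i < n; i++)
		if (prios[i] < *this_best_prio)
			*this_best_prio = prios[i];
}
```

If the rt class moves a task with priority 50 first, the fair class afterwards starts from 50 and no longer from the stale priority of the original highest priority task.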

load_balance_fair() has also been modified so that this_best_prio is
only reset (in the loop) if CONFIG_FAIR_GROUP_SCHED is set. This should
preserve the effect of helping spread groups' higher priority tasks
around the available CPUs while improving system performance when
CONFIG_FAIR_GROUP_SCHED isn't set.

Signed-off-by: Peter Williams
Signed-off-by: Ingo Molnar

Peter Williams
2007-08-09 17:16:46 +0800
430106592 sched: simplify move_tasks()

The move_tasks() function is currently multiplexed with two distinct
capabilities:

1. attempt to move a specified amount of weighted load from one run
queue to another; and
2. attempt to move a specified number of tasks from one run queue to
another.

The first of these capabilities is used in two places, load_balance()
and load_balance_idle(), and in both of these cases the return value of
move_tasks() is used purely to decide if tasks/load were moved and no
notice of the actual number of tasks moved is taken.

The second capability is used in exactly one place,
active_load_balance(), to attempt to move exactly one task and, as
before, the return value is only used as an indicator of success or failure.

This multiplexing of move_tasks() was introduced, by me, as part of the
smpnice patches and was motivated by the fact that the alternative, one
function to move specified load and one to move a single task, would
have led to two functions of roughly the same complexity as the old
move_tasks() (or the new balance_tasks()). However, the new modular
design of the new CFS scheduler allows a simpler solution to be adopted
and this patch addresses that solution by:

1. adding a new function, move_one_task(), to be used by
active_load_balance(); and
2. making move_tasks() a single purpose function that tries to move a
specified weighted load and returns 1 for success and 0 for failure.
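
The split into two single-purpose functions might look roughly like this. The run queue is modelled as a hypothetical array of task weights and all toy_* names are illustrative; only the contract (1 for success, 0 for failure) is taken from the text above.

```c
#define TOY_RQ_MAX 8

/* Hypothetical run queue: just the weights of the queued tasks. */
struct toy_rq { unsigned long weights[TOY_RQ_MAX]; int nr; };

/* Move exactly one task from src to dst; 1 on success, 0 on failure. */
static int toy_move_one_task(struct toy_rq *dst, struct toy_rq *src)
{
	if (src->nr == 0 || dst->nr == TOY_RQ_MAX)
		return 0;
	dst->weights[dst->nr++] = src->weights[--src->nr];
	return 1;
}

/* Try to move up to max_load of weighted load; 1 if any load was moved,
 * 0 otherwise. The number of tasks moved is deliberately not reported. */
static int toy_move_tasks(struct toy_rq *dst, struct toy_rq *src,
			  unsigned long max_load)
{
	unsigned long moved = 0;

	while (src->nr > 0 && dst->nr < TOY_RQ_MAX &&
	       moved + src->weights[src->nr - 1] <= max_load) {
		moved += src->weights[src->nr - 1];
		toy_move_one_task(dst, src);
	}
	return moved > 0;
}
```

Neither caller needs a task count, so the return value stays a plain success indicator in both functions.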

One of the consequences of these changes is that neither move_one_task()
nor the new move_tasks() cares how many tasks sched_class.load_balance()
moves and this enables its interface to be simplified by returning the
amount of load moved as its result and removing the load_moved pointer
from the argument list. This helps simplify the new move_tasks() and
slightly reduces the amount of work done in each of
sched_class.load_balance()'s implementations.

Further simplification, e.g. changes to balance_tasks(), are possible
but (slightly) complicated by the special needs of load_balance_fair()
so I've left them to a later patch (if this one gets accepted).

NB Since move_tasks() gets called with two run queue locks held even
small reductions in overhead are worthwhile.

[ mingo@elte.hu ]

this change also reduces code size nicely:

   text    data     bss     dec     hex filename
  39216    3618      24   42858    a76a sched.o.before
  39173    3618      24   42815    a73f sched.o.after

Signed-off-by: Peter Williams
Signed-off-by: Ingo Molnar

Peter Williams
2007-08-09 17:16:46 +0800

10 Jul, 2007

1 commit

fa72e9e48 sched: cfs core, kernel/sched_idletask.c

add kernel/sched_idletask.c - which implements the idle thread
scheduling class. This further simplifies sched.c (under CFS),
for example a number of 'if (p == rq->idle)' type of special-cases
can be removed from sched.c, and schedule() gets simpler too.
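
A sketch of why a dedicated idle class removes special cases: if the idle class sits last in the list of classes and always has a task to offer, the generic pick loop needs no test for the idle thread. The types and function names below are hypothetical and much simpler than the kernel's.

```c
#include <stddef.h>

struct toy_task { const char *name; };

/* Hypothetical run queue: at most one runnable fair task plus the idle task. */
struct toy_rq { struct toy_task *fair_next; struct toy_task idle; };

typedef struct toy_task *(*toy_pick_fn)(struct toy_rq *rq);

static struct toy_task *toy_pick_fair(struct toy_rq *rq) { return rq->fair_next; }

/* The idle class never fails: it always returns the idle task. */
static struct toy_task *toy_pick_idle(struct toy_rq *rq) { return &rq->idle; }

/* Generic loop over the classes in priority order, idle class last. */
static struct toy_task *toy_pick_next(struct toy_rq *rq)
{
	static const toy_pick_fn classes[] = { toy_pick_fair, toy_pick_idle };

	for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
		struct toy_task *p = classes[i](rq);
		if (p)
			return p;
	}
	return NULL; /* not reached while the idle class is in the list */
}
```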

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-07-10 00:51:58 +0800