20 Jul, 2007
4 commits
-
Implement the cpu_clock(cpu) interface for kernel-internal use:
a high-speed (but slightly incorrect) per-cpu clock constructed from
sched_clock().
This API, unused at the moment, will be used in the future by blktrace,
by the softlockup watchdog, by printk and by lockstat.
Signed-off-by: Ingo Molnar
-
nr_moved is not the correct check for triggering the all-pinned logic. Fix
the all-pinned logic in the case of load_balance_newidle().
Signed-off-by: Suresh Siddha
Signed-off-by: Ingo Molnar
-
In the presence of SMT, newly idle balance was never happening for
multi-core and SMP domains (even when both logical siblings are idle).
If thread 0 is already idle and thread 1 is about to go idle, newly
idle load balance always thinks that one of the threads is not idle
and skips doing the newly idle load balance for multi-core and SMP
domains.
This is because of the idle_cpu() macro, which checks whether the
current process on a cpu is the idle process. But this is not the case
for the thread doing the load_balance_newidle().
Fix this by using the runqueue's nr_running field instead of idle_cpu().
Also skip the 'only one idle cpu in the group will be doing load
balancing' logic during the newly idle case.
Signed-off-by: Suresh Siddha
Signed-off-by: Ingo Molnar
-
Currently most of the per-cpu data which is accessed by different cpus
has a ____cacheline_aligned_in_smp attribute. Move all this data to the
new per-cpu shared data section: .data.percpu.shared_aligned.
This will separate the per-cpu data which is referenced frequently by
other cpus from the local-only per-cpu data.
Signed-off-by: Fenghua Yu
Acked-by: Suresh Siddha
Cc: Rusty Russell
Cc: Christoph Lameter
Cc: "Luck, Tony"
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Jul, 2007
1 commit
-
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care about the freezing of tasks at all.
It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.
The patch causes all kernel threads to be nonfreezable by default (i.e. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki
Acked-by: Nigel Cunningham
Cc: Pavel Machek
Cc: Oleg Nesterov
Cc: Gautham R Shenoy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Jul, 2007
3 commits
-
prettify the prio_to_wmult[] array. (this could have saved us from the typos)
Signed-off-by: Ingo Molnar
-
document prio_to_wmult[].
Signed-off-by: Ingo Molnar
-
improve the comments around the wmult array (which controls the weight
of niced tasks). Clarify that to achieve a 10% difference in CPU
utilization, a weight multiplier of 1.25 has to be used.
Signed-off-by: Ingo Molnar
14 Jul, 2007
4 commits
-
Roman Zippel noticed another inconsistency of the wmult table.
wmult[16] has a missing digit.
Signed-off-by: Thomas Gleixner
Signed-off-by: Linus Torvalds
-
fix show_task()/show_tasks() output:
- there's no sibling info anymore
- the fields were not aligned properly with the description
- get rid of the lazy-TLB output: it's been quite some time since
we last had a bug there, and when we had a bug it wasn't helped a
bit by this debug output.
Signed-off-by: Ingo Molnar
Signed-off-by: Linus Torvalds
-
Allow granularity up to 100 msecs, instead of 10 msecs.
(needed on larger boxes)
Signed-off-by: Ingo Molnar
Signed-off-by: Linus Torvalds
-
There's a typo in the values in prio_to_wmult[] for nice level 1. While
it did not cause bad CPU distribution, it caused more rescheduling
between nice-0 and nice-1 tasks than necessary.
Signed-off-by: Ingo Molnar
Signed-off-by: Linus Torvalds
10 Jul, 2007
27 commits
-
add credits for recent major scheduler contributions:
Con Kolivas, for pioneering the fair-scheduling approach
Peter Williams, for smpnice
Mike Galbraith, for interactivity tuning of CFS
Srivatsa Vaddagiri, for group scheduling enhancements
Signed-off-by: Ingo Molnar
-
clean up the sleep_on() APIs:
- do not use fastcall
- replace fragile macro magic with proper inline functions
Signed-off-by: Ingo Molnar
-
4 small style cleanups to sched.c: checkpatch.pl is now happy about
the totality of sched.c [ignoring false positives] - yay! ;-)
Signed-off-by: Ingo Molnar
-
remove unused rq types from sched.c, now that we switched
over to CFS.
Signed-off-by: Ingo Molnar
-
remove the now-unused interactivity-heuristics related defines and
types of the old scheduler.
Signed-off-by: Ingo Molnar
-
clean up include files in sched.c, they were still old-style includes.
Signed-off-by: Ingo Molnar
-
make use of sched-clock-unstable events.
Signed-off-by: Ingo Molnar
-
track TSC-unstable events and propagate it to the scheduler code.
Also allow sched_clock() to be used when the TSC is unstable,
the rq_clock() wrapper creates a reliable clock out of it.
Signed-off-by: Ingo Molnar
-
apply the CFS core code.
this change switches over the scheduler core to CFS's modular
design and makes use of kernel/sched_fair/rt/idletask.c to implement
Linux's scheduling policies.
Thanks to Andrew Morton and Thomas Gleixner for lots of detailed review
feedback and for fixlets.
Signed-off-by: Ingo Molnar
Signed-off-by: Mike Galbraith
Signed-off-by: Dmitry Adamushko
Signed-off-by: Srivatsa Vaddagiri
-
remove the sleep-bonus interactivity code from the core scheduler.
scheduling policy is implemented in the policy modules, and CFS does
not need such heuristics.
Signed-off-by: Ingo Molnar
-
remove the expired_starving() heuristics from the core scheduler.
CFS does not need it, and this did not really work well in practice
anyway, due to the rq->nr_running multiplier to STARVATION_LIMIT.
Signed-off-by: Ingo Molnar
-
remove the sleep_type heuristics from the core scheduler - scheduling
policy is implemented in the scheduling-policy modules. (and CFS does
not use this type of sleep-type heuristics)
Signed-off-by: Ingo Molnar
-
add the new load-calculation methods of CFS.
Signed-off-by: Ingo Molnar
-
clean up: move __normal_prio() ahead of normal_prio().
no code changed.
Signed-off-by: Ingo Molnar
-
cleanup: move dequeue/enqueue_task() to a more logical place, to
not split up __normal_prio()/normal_prio().
Signed-off-by: Ingo Molnar
-
move resched_task()/resched_cpu() into the 'public interfaces'
section of sched.c, for use by kernel/sched_fair/rt/idletask.c.
Signed-off-by: Ingo Molnar
-
clean up the rt priority macros, pointed out by Andrew Morton.
Signed-off-by: Ingo Molnar
-
add the set_task_cfs_rq() abstraction needed by CONFIG_FAIR_GROUP_SCHED.
(not activated yet)
Signed-off-by: Ingo Molnar
-
update the posix-cpu-timers code to use CFS's CPU accounting information.
Signed-off-by: Ingo Molnar
-
add rq_clock()/__rq_clock(), a robust wrapper around sched_clock(),
used by CFS. It protects against common types of sched_clock() problems
(caused by hardware): time warps forwards and backwards.
Signed-off-by: Ingo Molnar
-
add the CFS rq data types to sched.c.
(the old scheduler fields are still intact, they are removed
by a later patch)
Signed-off-by: Ingo Molnar
-
create sched_stats.h and move sched.c schedstats code into it.
This cleans up sched.c a bit.
No code changes are caused by this patch.
Signed-off-by: Ingo Molnar
-
add the init_idle_bootup_task() callback to the bootup thread,
unused at the moment. (CFS will use it to switch the scheduling
class of the boot thread to the idle class)
Signed-off-by: Ingo Molnar
-
remove sched_exit(): the elaborate dance of us trying to recover
timeslices given to child tasks never really worked.
CFS does not need it either.
Signed-off-by: Ingo Molnar
-
uninline set_task_cpu(): CFS will add more code to it.
Signed-off-by: Ingo Molnar
-
the SMP load-balancer uses the boot-time migration-cost estimation
code to attempt to improve the quality of balancing. The reason for
this code is that the discrete priority queues do not preserve
the order of scheduling accurately, so the load-balancer skips
tasks that were running on a CPU 'recently'.
this code is fundamentally fragile: the boot-time migration cost detector
doesn't really work on systems that have large L3 caches, it caused boot
delays on large systems and the whole cache-hot concept made the
balancing code pretty nondeterministic as well.
(and hey, i wrote most of it, so i can say it out loud that it sucks ;-)
under CFS the same cache affinity can be achieved without any
cache-hot special-case: tasks are sorted in the 'timeline' tree and
the SMP balancer picks tasks from the left side of the tree, so the
most cache-cold task is balanced automatically.
Signed-off-by: Ingo Molnar
-
enum idle_type (used by the load-balancer) clashes with the
SCHED_IDLE name that we want to introduce. 'CPU_IDLE' instead
of 'SCHED_IDLE' is more descriptive as well.
Signed-off-by: Ingo Molnar
24 Jun, 2007
1 commit
-
The intervals of domains that do not have SD_BALANCE_NEWIDLE must be
considered for the calculation of the time of the next balance. Otherwise
we may defer rebalancing forever.
Siddha also spotted that the conversion of the balance interval
to jiffies is missing. Fix that too.
From: Srivatsa Vaddagiri
Also continue the loop if !(sd->flags & SD_LOAD_BALANCE).
Tested-by: Paul E. McKenney
It did in fact trigger under all three of mainline, CFS, and -rt including CFS
-- see below for a couple of emails from last Friday giving results for these
three on the AMD box (where it happened) and on a single-quad NUMA-Q system
(where it did not, at least not with such severity).
Signed-off-by: Christoph Lameter
Signed-off-by: Ingo Molnar
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds