15 Dec, 2009
1 commit
-
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Acked-by: Ingo Molnar
11 Dec, 2009
1 commit
-
This build warning:
kernel/sched.c: In function 'set_task_cpu':
kernel/sched.c:2070: warning: unused variable 'old_rq'

Made me realize that the forced2_migrations stat looks pretty
pointless (and a misnomer) - remove it.
Cc: Peter Zijlstra
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar
09 Dec, 2009
2 commits
-
As scaling now takes place on all kinds of CPU add/remove events, a user
who configures values via proc should be able to control whether his set
values are still rescaled or kept whatever happens.

As the comments state that log2 was just a second guess that worked, the
interface is not just designed for on/off, but to choose a scaling type.
Currently this allows none, log and linear, but more importantly it allows
us to keep the interface even if someone has an even better idea how to
scale the values.

Signed-off-by: Christian Ehrhardt
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
WAKEUP_RUNNING was an experiment, not sure why that ever ended up being
merged...
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
05 Nov, 2009
1 commit
-
Rate limit newidle to migration_cost. It's a win for all
stages of sysbench oltp tests.
Signed-off-by: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
17 Sep, 2009
1 commit
-
Create a new wakeup preemption mode, preempt towards tasks that run
shorter on avg. It sets next buddy to be sure we actually run the task
we preempted for.

Test results:

root@twins:~# while :; do :; done &
[1] 6537
root@twins:~# while :; do :; done &
[2] 6538
root@twins:~# while :; do :; done &
[3] 6539
root@twins:~# while :; do :; done &
[4] 6540

root@twins:/home/peter# ./latt -c4 sleep 4
Entries: 48 (clients=4)

Averages:
------------------------------
Max 4750 usec
Avg 497 usec
Stdev 737 usec

root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features
root@twins:/home/peter# ./latt -c4 sleep 4
Entries: 48 (clients=4)

Averages:
------------------------------
Max 14 usec
Avg 5 usec
Stdev 3 usec

Disabled by default - needs more testing.
Signed-off-by: Peter Zijlstra
Acked-by: Mike Galbraith
Signed-off-by: Ingo Molnar
LKML-Reference:
02 Sep, 2009
1 commit
-
For counting how long an application has been waiting for
(disk) IO, there currently is only the HZ sample driven
information available, while for all other counters in this
class, a high resolution version is available via
CONFIG_SCHEDSTATS.

In order to make an improved bootchart tool possible, we also
need a higher resolution version of the iowait time.

The patch below adds this scheduler statistic to the kernel.
Signed-off-by: Arjan van de Ven
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
18 Jun, 2009
1 commit
-
There are some places which refer to the per-cpu variable "runqueues" directly.
sched.c provides nice abstractions, such as cpu_rq() and this_rq(),
so we should use these macros when accessing runqueues.
Signed-off-by: Hitoshi Mitake
LKML-Reference:
Signed-off-by: Ingo Molnar
25 Mar, 2009
1 commit
-
Impact: cleanup, new schedstat ABI
Since they are only used in statistics and are always set to zero, the
following fields from struct rq have been removed: yld_exp_empty,
yld_act_empty and yld_both_empty.

Both the Sched Debug and SCHEDSTAT_VERSION versions have also been
incremented, since the ABIs have changed.

The schedtop tool has been updated to properly handle the new version of
schedstat:

http://rt.wiki.kernel.org/index.php/Schedtop_utility
Signed-off-by: Luis Henriques
Acked-by: Gregory Haskins
Acked-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
18 Mar, 2009
1 commit
-
The jiffies value was being printed for each CPU, which does not make
sense. Moved jiffies to the system section.
Signed-off-by: Luis Henriques
Acked-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
15 Jan, 2009
1 commit
-
Introduce a new avg_wakeup statistic.
avg_wakeup is a measure of how frequently a task wakes up other tasks; it
represents the average time between wakeups, with a limit of avg_runtime
for when it doesn't wake up anybody.
Signed-off-by: Peter Zijlstra
Signed-off-by: Mike Galbraith
Signed-off-by: Ingo Molnar
11 Jan, 2009
1 commit
-
Impact: avoid accessing NULL tg.css->cgroup
In commit 0a0db8f5c9d4bbb9bbfcc2b6cb6bce2d0ef4d73d, I removed the check for
NULL tg.css->cgroup, but I realized I was wrong when I found that reading
/proc/sched_debug can race with cgroup_create().
Signed-off-by: Li Zefan
Signed-off-by: Ingo Molnar
02 Dec, 2008
1 commit
-
Impact: extend information in /proc/sched_debug
This patch adds uid information in sched_debug for CONFIG_USER_SCHED
Signed-off-by: Arun R Bharadwaj
Acked-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
19 Nov, 2008
1 commit
-
Conflicts:
kernel/Makefile
16 Nov, 2008
1 commit
-
Luis Henriques reported that with CONFIG_PREEMPT=y + CONFIG_PREEMPT_DEBUG=y +
CONFIG_SCHED_DEBUG=y + CONFIG_LATENCYTOP=y enabled, the following warning
triggers when using latencytop:

> [ 775.663239] BUG: using smp_processor_id() in preemptible [00000000] code: latencytop/6585
> [ 775.663303] caller is native_sched_clock+0x3a/0x80
> [ 775.663314] Pid: 6585, comm: latencytop Tainted: G W 2.6.28-rc4-00355-g9c7c354 #1
> [ 775.663322] Call Trace:
> [ 775.663343] [] debug_smp_processor_id+0xe4/0xf0
> [ 775.663356] [] native_sched_clock+0x3a/0x80
> [ 775.663368] [] sched_clock+0x9/0x10
> [ 775.663381] [] proc_sched_show_task+0x8bd/0x10e0
> [ 775.663395] [] sched_show+0x3e/0x80
> [ 775.663408] [] seq_read+0xdb/0x350
> [ 775.663421] [] ? security_file_permission+0x16/0x20
> [ 775.663435] [] vfs_read+0xc8/0x170
> [ 775.663447] [] sys_read+0x55/0x90
> [ 775.663460] [] system_call_fastpath+0x16/0x1b
> ...

This breakage was caused by me via:
7cbaef9: sched: optimize sched_clock() a bit
Change the calls to cpu_clock().
Reported-by: Luis Henriques
11 Nov, 2008
1 commit
-
Impact: extend /proc/sched_debug info
Since the statistics of a group entity isn't exported directly from the
kernel, it becomes difficult to obtain some of the group statistics.
For example, the current method to obtain the exec time of a group entity
is not always accurate. One has to read the exec times of all
the tasks (/proc/<pid>/sched) in the group and add them. This method
fails (or becomes difficult) if we want to collect stats of a group over
a duration where tasks get created and terminated.

This patch makes it easier to obtain group stats by directly including
them in /proc/sched_debug. Stats like group exec time would help user
programs (like LTP) to accurately measure group fairness.

An example output of group stats from /proc/sched_debug:

cfs_rq[3]:/3/a/1
.exec_clock : 89.598007
.MIN_vruntime : 0.000001
.min_vruntime : 256300.970506
.max_vruntime : 0.000001
.spread : 0.000000
.spread0 : -25373.372248
.nr_running : 0
.load : 0
.yld_exp_empty : 0
.yld_act_empty : 0
.yld_both_empty : 0
.yld_count : 4474
.sched_switch : 0
.sched_count : 40507
.sched_goidle : 12686
.ttwu_count : 15114
.ttwu_local : 11950
.bkl_count : 67
.nr_spread_over : 0
.shares : 0
.se->exec_start : 113676.727170
.se->vruntime : 1592.612714
.se->sum_exec_runtime : 89.598007
.se->wait_start : 0.000000
.se->sleep_start : 0.000000
.se->block_start : 0.000000
.se->sleep_max : 0.000000
.se->block_max : 0.000000
.se->exec_max : 1.000282
.se->slice_max : 1.999750
.se->wait_max : 54.981093
.se->wait_sum : 217.610521
.se->wait_count : 50
.se->load.weight : 2
Signed-off-by: Bharata B Rao
Acked-by: Srivatsa Vaddagiri
Acked-by: Dhaval Giani
Acked-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
10 Nov, 2008
1 commit
-
Impact: clean up and fix debug info printout
While looking over the sched_debug code I noticed that we printed the rq
schedstats for every cfs_rq; amend this.

Also change nr_spread_over into an int, and fix a little buglet in
min_vruntime printing.
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
04 Nov, 2008
1 commit
-
Impact: cleanup
cfs->tg is initialized in init_tg_cfs_entry() with tg != NULL, and
will never be invalidated to NULL. And the underlying cgroup of a
valid task_group is always valid.

Same for rt->tg.
Signed-off-by: Li Zefan
Acked-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
30 Oct, 2008
1 commit
-
Impact: change /proc/sched/debug from rw-r--r-- to r--r--r--
/proc/sched_debug is read-only.
Signed-off-by: Li Zefan
Acked-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
10 Oct, 2008
1 commit
-
lock_task_sighand() makes sure task->sighand is protected,
so we do not need rcu_read_lock().
[ exec() will get task->sighand->siglock before changing task->sighand! ]

But code using rcu_read_lock() _just_ to protect lock_task_sighand()
only appears in procfs. (And some code in procfs uses lock_task_sighand()
without such redundant protection.)

Other subsystems may put lock_task_sighand() into an rcu_read_lock()
critical region, but those rcu_read_lock() calls are used for protecting
"for_each_process()", "find_task_by_vpid()" etc., not for protecting
lock_task_sighand().
Signed-off-by: Lai Jiangshan
[ok from Oleg]
Signed-off-by: Alexey Dobriyan
27 Jun, 2008
2 commits
-
show all the schedstats in /debug/sched_debug as well.
Signed-off-by: Peter Zijlstra
Cc: Srivatsa Vaddagiri
Cc: Mike Galbraith
Signed-off-by: Ingo Molnar
-
Try again..
Initial commit: 18d95a2832c1392a2d63227a7a6d433cb9f2037e
Revert: 6363ca57c76b7b83639ca8c83fc285fa26a7880e

Signed-off-by: Peter Zijlstra
Cc: Srivatsa Vaddagiri
Cc: Mike Galbraith
Signed-off-by: Ingo Molnar
20 Jun, 2008
1 commit
-
Signed-off-by: Peter Zijlstra
Cc: "Daniel K."
Signed-off-by: Ingo Molnar
29 May, 2008
1 commit
-
Yanmin Zhang reported:
Comparing with 2.6.25, volanoMark has big regression with kernel 2.6.26-rc1.
It's about 50% on my 8-core stoakley, 16-core tigerton, and Itanium Montecito.

With bisect, I located the following patch:
| 18d95a2832c1392a2d63227a7a6d433cb9f2037e is first bad commit
| commit 18d95a2832c1392a2d63227a7a6d433cb9f2037e
| Author: Peter Zijlstra
| Date: Sat Apr 19 19:45:00 2008 +0200
|
| sched: fair-group: SMP-nice for group scheduling

Revert it so that we get v2.6.25 behavior.
Bisected-by: Yanmin Zhang
Signed-off-by: Ingo Molnar
06 May, 2008
1 commit
-
This replaces the rq->clock stuff (and possibly cpu_clock()).

- architectures that have an 'imperfect' hardware clock can set
  CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

- the 'jiffie' window might be superfluous when we update tick_gtod
  before the __update_sched_clock() call in sched_clock_tick()

- cpu_clock() might be implemented as:
  sched_clock_cpu(smp_processor_id())
  if the accuracy proves good enough - how far can TSC drift in a
  single jiffie when considering the filtering and idle hooks?

[ mingo@elte.hu: various fixes and cleanups ]
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
01 May, 2008
1 commit
-
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the types of the divide. Move its definition
to math64.h as currently no architecture overrides the generic implementation.
They can still override it of course, but the duplicated declarations are
avoided.
Signed-off-by: Roman Zippel
Cc: Avi Kivity
Cc: Russell King
Cc: Geert Uytterhoeven
Cc: Ralf Baechle
Cc: David Howells
Cc: Jeff Dike
Cc: Ingo Molnar
Cc: "David S. Miller"
Cc: Patrick McHardy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Apr, 2008
1 commit
-
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
are set up before gluing the PDE to the main tree.
Signed-off-by: Denis V. Lunev
Cc: Alexey Dobriyan
Cc: "Eric W. Biederman"
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Apr, 2008
3 commits
-
Signed-off-by: Ingo Molnar
-
Add some extra debug output so we can get a better overview of the
full hierarchy.

We print the cgroup path after each cfs_rq, so we can see what group
we're looking at.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
-
it's unused.
Signed-off-by: Ingo Molnar
19 Mar, 2008
1 commit
-
improve affine wakeups. Maintain the 'overlap' metric based on CFS's
sum_exec_runtime - which means the amount of time a task executes
after it wakes up some other task.

Use the 'overlap' for the wakeup decisions: if the 'overlap' is short,
it means there's strong workload coupling between this task and the
woken up task. If the 'overlap' is large then the workload is decoupled
and the scheduler will move them to separate CPUs more easily.

( Also slightly move the preempt_check within try_to_wake_up() - this has
no effect on functionality but allows 'early wakeups' (for still-on-rq
tasks) to be correctly accounted as well. )

Signed-off-by: Ingo Molnar
26 Jan, 2008
2 commits
-
Right now, the linux kernel (with scheduler statistics enabled) keeps track
of the maximum time a process is waiting to be scheduled. While the maximum
is a very useful metric, tracking average and total is equally useful
(at least for latencytop) to figure out the accumulated effect of scheduler
delays. The accumulated effect is important to judge the performance impact
of scheduler tuning/behavior.
Signed-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar
-
We monitor clock overflows, let's also monitor clock underflows.
Signed-off-by: Guillaume Chazarain
Signed-off-by: Ingo Molnar
31 Dec, 2007
1 commit
-
Meelis Roos reported these warnings on sparc64:
CC kernel/sched.o
In file included from kernel/sched.c:879:
kernel/sched_debug.c: In function 'nsec_high':
kernel/sched_debug.c:38: warning: comparison of distinct pointer types lacks a cast

The debug check in do_div() is over-eager here, because the long long
is always positive in these places. Mark this by casting them to
unsigned long long.

No change in code output:

   text    data     bss     dec     hex filename
  51471    6582     376   58429    e43d sched.o.before
  51471    6582     376   58429    e43d sched.o.after

md5:
  7f7729c111f185bf3ccea4d542abc049 sched.o.before.asm
  7f7729c111f185bf3ccea4d542abc049 sched.o.after.asm

Signed-off-by: Ingo Molnar
28 Nov, 2007
1 commit
-
clean up overlong line in kernel/sched_debug.c.
Signed-off-by: Ingo Molnar
27 Nov, 2007
1 commit
-
bump version of kernel/sched_debug.c and remove CFS version
information from it.
Signed-off-by: Ingo Molnar
10 Nov, 2007
1 commit
-
we lost the sched_min_granularity tunable to a clever optimization
that uses the sched_latency/min_granularity ratio - but the ratio
is quite unintuitive to users and can also crash the kernel if the
ratio is set to 0. So reintroduce the min_granularity tunable,
while keeping the ratio maintained internally.

No functionality changed.
[ mingo@elte.hu: some fixlets. ]
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
25 Oct, 2007
1 commit
-
Lockdep noticed that this lock can also be taken from hardirq context, and can
thus not unconditionally disable/enable irqs.

WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()
[show_trace_log_lvl+26/48] show_trace_log_lvl+0x1a/0x30
[show_trace+18/32] show_trace+0x12/0x20
[dump_stack+22/32] dump_stack+0x16/0x20
[trace_hardirqs_on+405/416] trace_hardirqs_on+0x195/0x1a0
[_read_unlock_irq+34/48] _read_unlock_irq+0x22/0x30
[sched_debug_show+2615/4224] sched_debug_show+0xa37/0x1080
[show_state_filter+326/368] show_state_filter+0x146/0x170
[sysrq_handle_showstate+10/16] sysrq_handle_showstate+0xa/0x10
[__handle_sysrq+123/288] __handle_sysrq+0x7b/0x120
[handle_sysrq+40/64] handle_sysrq+0x28/0x40
[kbd_event+1045/1680] kbd_event+0x415/0x690
[input_pass_event+206/208] input_pass_event+0xce/0xd0
[input_handle_event+170/928] input_handle_event+0xaa/0x3a0
[input_event+95/112] input_event+0x5f/0x70
[atkbd_interrupt+434/1456] atkbd_interrupt+0x1b2/0x5b0
[serio_interrupt+59/128] serio_interrupt+0x3b/0x80
[i8042_interrupt+263/576] i8042_interrupt+0x107/0x240
[handle_IRQ_event+40/96] handle_IRQ_event+0x28/0x60
[handle_edge_irq+175/320] handle_edge_irq+0xaf/0x140
[do_IRQ+64/128] do_IRQ+0x40/0x80
[common_interrupt+46/52] common_interrupt+0x2e/0x34

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
19 Oct, 2007
1 commit
-
schedstat is useful in investigating CPU scheduler behavior. Ideally,
I think it is beneficial to have it on all the time. However, the
cost of turning it on in production system is quite high, largely due
to number of events it collects and also due to its large memory
footprint.

Most of the fields probably don't need to be a full 64 bits on 64-bit
arches. Rolling over 4 billion events will most likely take a long time
and user space tools can be made to accommodate that. I'm proposing the
kernel cut back most of the variable widths on 64-bit systems. (Note,
the following patch doesn't affect 32-bit systems.)
Signed-off-by: Ken Chen
Signed-off-by: Ingo Molnar
15 Oct, 2007
1 commit
-
In general, struct file_operations are const in the kernel, to not have
false cacheline sharing and to catch bugs at compiletime with accidental
writes to them. The new scheduler code introduces a new non-const one;
fix this up.
Signed-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar