08 Dec, 2009
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
security/tomoyo: Remove now unnecessary handling of security_sysctl.
security/tomoyo: Add a special case to handle accesses through the internal proc mount.
sysctl: Drop & in front of every proc_handler.
sysctl: Remove CTL_NONE and CTL_UNNUMBERED
sysctl: kill dead ctl_handler definitions.
sysctl: Remove the last of the generic binary sysctl support
sysctl net: Remove unused binary sysctl code
sysctl security/tomoyo: Don't look at ctl_name
sysctl arm: Remove binary sysctl support
sysctl x86: Remove dead binary sysctl support
sysctl sh: Remove dead binary sysctl support
sysctl powerpc: Remove dead binary sysctl support
sysctl ia64: Remove dead binary sysctl support
sysctl s390: Remove dead sysctl binary support
sysctl frv: Remove dead binary sysctl support
sysctl mips/lasat: Remove dead binary sysctl support
sysctl drivers: Remove dead binary sysctl support
sysctl crypto: Remove dead binary sysctl support
sysctl security/keys: Remove dead binary sysctl support
sysctl kernel: Remove binary sysctl logic
...
06 Dec, 2009
2 commits
-
…/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
sched, cputime: Introduce thread_group_times()
sched, cputime: Cleanups related to task_times()
Revert "sched, x86: Optimize branch hint in __switch_to()"
sched: Fix isolcpus boot option
sched: Revert 498657a478c60be092208422fefa9c7b248729c2
sched, time: Define nsecs_to_jiffies()
sched: Remove task_{u,s,g}time()
sched: Introduce task_times() to replace task_{u,s}time() pair
sched: Limit the number of scheduler debug messages
sched.c: Call debug_show_all_locks() when dumping all tasks
sched, x86: Optimize branch hint in __switch_to()
sched: Optimize branch hint in context_switch()
sched: Optimize branch hint in pick_next_task_fair()
sched_feat_write(): Update ppos instead of file->f_pos
sched: Sched_rt_periodic_timer vs cpu hotplug
sched, kvm: Fix race condition involving sched_in_preempt_notifers
sched: More generic WAKE_AFFINE vs select_idle_sibling()
sched: Cleanup select_task_rq_fair()
sched: Fix granularity of task_u/stime()
sched: Fix/add missing update_rq_clock() calls
... -
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
rcu: Make RCU's CPU-stall detector be default
rcu: Add expedited grace-period support for preemptible RCU
rcu: Enable fourth level of TREE_RCU hierarchy
rcu: Rename "quiet" functions
rcu: Re-arrange code to reduce #ifdef pain
rcu: Eliminate unneeded function wrapping
rcu: Fix grace-period-stall bug on large systems with CPU hotplug
rcu: Eliminate __rcu_pending() false positives
rcu: Further cleanups of use of lastcomp
rcu: Simplify association of forced quiescent states with grace periods
rcu: Accelerate callback processing on CPUs not detecting GP end
rcu: Mark init-time-only rcu_bootup_announce() as __init
rcu: Simplify association of quiescent states with grace periods
rcu: Rename dynticks_completed to completed_fqs
rcu: Enable synchronize_sched_expedited() fastpath
rcu: Remove inline from forward-referenced functions
rcu: Fix note_new_gpnum() uses of ->gpnum
rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter
rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter
rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls
...
03 Dec, 2009
3 commits
-
We don't need to build mutex_spin_on_owner() if we have
CONFIG_DEBUG_MUTEXES or CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES as
it won't be used under such configs.

Use CONFIG_MUTEX_SPIN_ON_OWNER as it gathers all the necessary
checks before building it.

Signed-off-by: Frederic Weisbecker
Acked-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
Cc: Peter Zijlstra -
This is a real fix for the problem of utime/stime values decreasing,
described in the thread:

http://lkml.org/lkml/2009/11/3/522

Now cputime is accounted in the following way:

- {u,s}time in task_struct are increased every time the thread
  is interrupted by a tick (timer interrupt).

- When a thread exits, its {u,s}time are added to signal->{u,s}time,
  after being adjusted by task_times().

- When all threads in a thread_group exit, the accumulated {u,s}time
  (and also c{u,s}time) in the signal struct are added to c{u,s}time
  in the signal struct of the group's parent.

So {u,s}time in task_struct are "raw" tick counts, while
{u,s}time and c{u,s}time in the signal struct are "adjusted" values.

And the accounted values are used by:

- task_times(), to get the cputime of a thread:
  This function returns adjusted values that originate from the raw
  {u,s}time, scaled by the sum_exec_runtime accounted by CFS.

- thread_group_cputime(), to get the cputime of a thread group:
  This function returns the sum of all {u,s}time of living threads in
  the group, plus {u,s}time in the signal struct, which is the sum of
  the adjusted cputimes of all exited threads that belonged to the group.

The problem is the return value of thread_group_cputime(),
because it is a mixed sum of "raw" and "adjusted" values:

  group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)

This misbehavior can break {u,s}time monotonicity.
Assume there is a thread that has raw values greater
than its adjusted values (e.g. interrupted by 1000Hz ticks 50 times
but only runs 45ms). If it exits, cputime will decrease (e.g.
-5ms).

To fix this, we could do:

  group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)

But task_times() contains hard divisions, so applying it to
every thread should be avoided.

This patch fixes the above problem in the following way:

- Modify thread's exit (= __exit_signal()) not to use task_times().
  It means {u,s}time in the signal struct accumulates raw values instead
  of adjusted values. As a result, thread_group_cputime()
  returns a pure sum of "raw" values.

- Introduce a new function thread_group_times(*task, *utime, *stime)
  that converts the "raw" values of thread_group_cputime() to "adjusted"
  values, using the same calculation procedure as task_times().

- Modify group's exit (= wait_task_zombie()) to use the newly introduced
  thread_group_times(). It makes c{u,s}time in the signal struct
  hold adjusted values, as before this patch.

- Replace some thread_group_cputime() calls by thread_group_times().
  These replacements are applied only where the "adjusted"
  cputime is conveyed to users, and where task_times() is already used nearby
  (i.e. sys_times(), getrusage(), and /proc/<pid>/stat).

This patch has a positive side effect:

- Before this patch, if a group contains many short-lived threads
  (e.g. each runs 0.9ms and is not interrupted by ticks), the group's
  cputime could be invisible since each thread's cputime was accumulated
  after being adjusted: imagine the adjustment function as adj(ticks, runtime),
  {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
  After this patch this will not happen because the adjustment is
  applied after accumulation.

v2:
- remove if()s, put new variables into signal_struct.

Signed-off-by: Hidetoshi Seto
Acked-by: Peter Zijlstra
Cc: Spencer Candland
Cc: Americo Wang
Cc: Oleg Nesterov
Cc: Balbir Singh
Cc: Stanislaw Gruszka
LKML-Reference:
Signed-off-by: Ingo Molnar -
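The monotonicity problem described in the thread_group_times() commit above can be reproduced with a small user-space model. This is only a sketch under stated assumptions: struct thread_acct and both helper functions are hypothetical names, times are plain milliseconds, and the "adjusted" value of a thread is taken to be its precise runtime (which is what the scaling yields when all of the time is user time).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread record, not a kernel structure. */
struct thread_acct {
    uint64_t raw_ms;      /* tick-sampled time ("raw") */
    uint64_t runtime_ms;  /* precise runtime, used as the "adjusted" value */
    int exited;           /* non-zero once the thread has exited */
};

/* Old scheme: live threads contribute raw values, exited threads
 * contribute adjusted values, so the group total mixes both kinds. */
static uint64_t group_time_mixed(const struct thread_acct *t, int n)
{
    uint64_t sum = 0;

    for (int i = 0; i < n; i++)
        sum += t[i].exited ? t[i].runtime_ms : t[i].raw_ms;
    return sum;
}

/* Patched scheme: every thread contributes its raw value; any
 * adjustment is applied once to the group total by the caller. */
static uint64_t group_time_raw(const struct thread_acct *t, int n)
{
    uint64_t sum = 0;

    for (int i = 0; i < n; i++)
        sum += t[i].raw_ms;
    return sum;
}
```

With the numbers from the commit message (50 ticks at 1000Hz, 45ms of real runtime), the mixed sum drops from 50 to 45 when the thread exits, while the raw sum stays at 50.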
- Remove if({u,s}t)s because no one calls it with NULL now.
- Use cputime_{add,sub}().
- Add ifndef-endif for prev_{u,s}time since they are used
  only when !VIRT_CPU_ACCOUNTING.

Signed-off-by: Hidetoshi Seto
Cc: Peter Zijlstra
Cc: Spencer Candland
Cc: Americo Wang
Cc: Oleg Nesterov
Cc: Balbir Singh
Cc: Stanislaw Gruszka
LKML-Reference:
Signed-off-by: Ingo Molnar
02 Dec, 2009
2 commits
-
Anton Blanchard wrote:

> We allocate and zero cpu_isolated_map after the isolcpus
> __setup option has run. This means cpu_isolated_map always
> ends up empty and if CPUMASK_OFFSTACK is enabled we write to a
> cpumask that hasn't been allocated.

I introduced this regression in 49557e620339cb13 (sched: Fix
boot crash by zalloc()ing most of the cpu masks).

Use the bootmem allocator if they set isolcpus=, otherwise
allocate and zero like normal.

Reported-by: Anton Blanchard
Signed-off-by: Rusty Russell
Cc: peterz@infradead.org
Cc: Linus Torvalds
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar
Tested-by: Anton Blanchard -
498657a478c60be092208422fefa9c7b248729c2 incorrectly assumed
that preempt wasn't disabled around context_switch() and thus
was fixing an imaginary problem. It also broke KVM because KVM
depends on ->sched_in() being called with irqs enabled so that
it can do smp calls from there.

Revert the incorrect commit and add a comment describing the different
contexts under which the two callbacks are invoked.

Avi: spotted transposed in/out in the added comment.
Signed-off-by: Tejun Heo
Acked-by: Avi Kivity
Cc: peterz@infradead.org
Cc: efault@gmx.de
Cc: rusty@rustcorp.com.au
LKML-Reference:
Signed-off-by: Ingo Molnar
26 Nov, 2009
5 commits
-
Use of msecs_to_jiffies() for nsecs_to_cputime() has some
problems:

- The type of msecs_to_jiffies()'s argument is unsigned int, so
  it cannot convert msecs greater than UINT_MAX = about 49.7 days.

- msecs_to_jiffies() returns MAX_JIFFY_OFFSET if the MSB of the argument
  is set, assuming that the input was a negative value. So it cannot
  convert msecs greater than INT_MAX = about 24.8 days either.

This patch defines a new function nsecs_to_jiffies() that can
deal with greater values, and that treats all incoming values as
unsigned.

Signed-off-by: Hidetoshi Seto
Acked-by: Peter Zijlstra
Cc: Stanislaw Gruszka
Cc: Spencer Candland
Cc: Oleg Nesterov
Cc: Balbir Singh
Cc: Americo Wang
Cc: Thomas Gleixner
Cc: John Stultz
LKML-Reference:
Signed-off-by: Ingo Molnar -
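The limits quoted in the nsecs_to_jiffies() commit above follow from simple arithmetic, sketched below. The helper names, the HZ value of 1000, and the conversion formula are assumptions made for illustration; the formula shown is only valid when NSEC_PER_SEC is an exact multiple of HZ and is not claimed to be the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define HZ            1000ULL         /* assumed tick rate for this sketch */
#define NSEC_PER_SEC  1000000000ULL
#define MSEC_PER_DAY  (24ULL * 60 * 60 * 1000)

/* Whole days covered by a given millisecond count. */
static uint64_t msecs_to_days(uint64_t msecs)
{
    return msecs / MSEC_PER_DAY;
}

/* 64-bit conversion sketch: every input is treated as unsigned, so
 * there is no "negative" range and no 32-bit ceiling. */
static uint64_t nsecs_to_jiffies_sketch(uint64_t nsecs)
{
    return nsecs / (NSEC_PER_SEC / HZ);
}
```

A 32-bit unsigned millisecond count runs out after 49 whole days and a signed one after 24, matching the "about 49.7" and "about 24.8" figures, whereas a 64-bit nanosecond count converts 100 days without trouble.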
Now all task_{u,s}time() pairs are replaced by task_times().
And task_gtime() is too simple to be an inline function.

Clean them all up.
Signed-off-by: Hidetoshi Seto
Acked-by: Peter Zijlstra
Cc: Stanislaw Gruszka
Cc: Spencer Candland
Cc: Oleg Nesterov
Cc: Balbir Singh
Cc: Americo Wang
LKML-Reference:
Signed-off-by: Ingo Molnar -
Functions task_{u,s}time() are called in pairs in almost all
cases. However task_stime() is implemented to call task_utime()
internally, so such paired calls run task_utime() twice.

It means we do heavy divisions (div_u64 + do_div) twice to get
utime and stime, which can be obtained at the same time by one set
of divisions.

This patch introduces a function task_times(*tsk, *utime,
*stime) to retrieve utime and stime at once in a better, optimized
way.

Signed-off-by: Hidetoshi Seto
Acked-by: Peter Zijlstra
Cc: Stanislaw Gruszka
Cc: Spencer Candland
Cc: Oleg Nesterov
Cc: Balbir Singh
Cc: Americo Wang
LKML-Reference:
Signed-off-by: Ingo Molnar -
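The idea behind task_times() in the commit above, getting both values from one set of divisions, can be sketched in user space as follows. struct task_sample and task_times_sketch() are hypothetical names, all values share one unit, and the sketch deliberately leaves out any clamping against previously reported values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inputs: raw tick counts plus the precise runtime. */
struct task_sample {
    uint64_t utime;   /* raw user ticks */
    uint64_t stime;   /* raw system ticks */
    uint64_t rtime;   /* precise total runtime */
};

/* Split rtime in the raw utime:stime ratio using a single division;
 * stime falls out by subtraction instead of a second division. */
static void task_times_sketch(const struct task_sample *s,
                              uint64_t *ut, uint64_t *st)
{
    uint64_t total = s->utime + s->stime;

    /* With no ticks recorded, attribute all runtime to user time. */
    *ut = total ? (s->rtime * s->utime) / total : s->rtime;
    *st = s->rtime - *ut;
}
```

For raw counts of 30 user and 20 system ticks with 45 units of real runtime, this yields 27 and 18, and the two results always add up to the runtime.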
Merge reason: Pick up fixes that did not make it into .32.0
Signed-off-by: Ingo Molnar
-
Remove the verbose scheduler debug messages unless the kernel
parameter "sched_debug" is set. /proc/sched_debug is unchanged.

Signed-off-by: Mike Travis
Cc: Heiko Carstens
Cc: Roland Dreier
Cc: Randy Dunlap
Cc: Tejun Heo
Cc: Andi Kleen
Cc: Greg Kroah-Hartman
Cc: Yinghai Lu
Cc: David Rientjes
Cc: Steven Rostedt
Cc: Rusty Russell
Cc: Hidetoshi Seto
Cc: Jack Steiner
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
25 Nov, 2009
1 commit
-
In commit v2.6.21-691-g39bc89f ("make SysRq-T show all tasks
again") the interface of show_state_filter() was changed: a zero
valued 'state_filter' specifies "dump all tasks" (instead of -1).

However, the condition for calling debug_show_all_locks() ("show
locks if all tasks are dumped") was not updated accordingly.

Signed-off-by: Shmulik Ladkani
Cc: peterz@infradead.org
LKML-Reference:
Signed-off-by: Ingo Molnar
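The mismatch described in the commit above comes down to two predicates that disagree about which filter value means "all tasks". The following is a sketch only: both function names are hypothetical, and the exact form of the stale check is an assumption inferred from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Stale check (assumed form): still treats -1 as "dump all tasks". */
static bool show_locks_old(unsigned long state_filter)
{
    return state_filter == (unsigned long)-1;
}

/* Updated check: a zero filter now means "dump all tasks". */
static bool show_locks_new(unsigned long state_filter)
{
    return state_filter == 0;
}
```

With a zero filter the old predicate never fires, so the lock dump is silently skipped even though every task is being shown.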
24 Nov, 2009
2 commits
-
Branch hint profiling on my nehalem machine showed over 90%
incorrect branch hints:

  10420275 170645395  94 context_switch  sched.c  3043
  10408421 171098521  94 context_switch  sched.c  3050

Signed-off-by: Tim Blechmann
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
sched_feat_write() should update ppos instead of file->f_pos.
(This reduces some BKL dependencies of this code.)
Signed-off-by: Jan Blunck
Cc: jkacur@redhat.com
Cc: Arnd Bergmann
Cc: Frederic Weisbecker
Cc: Jamie Lokier
Cc: Peter Zijlstra
Cc: Christoph Hellwig
Cc: Alan Cox
LKML-Reference:
Signed-off-by: Ingo Molnar
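The change described for sched_feat_write() above can be illustrated with a mock write handler. Everything here is a user-space stand-in: offset_t, struct mock_file and feat_write_sketch() are hypothetical names, and the handler does no parsing at all.

```c
#include <assert.h>
#include <stddef.h>

typedef long long offset_t;      /* stand-in for the kernel's loff_t */

struct mock_file {
    offset_t f_pos;              /* position stored in the file object */
};

/* Write handler sketch: consume cnt bytes and advance the position
 * the caller passed in, leaving filp->f_pos untouched. */
static long feat_write_sketch(struct mock_file *filp, const char *buf,
                              size_t cnt, offset_t *ppos)
{
    (void)filp;                  /* deliberately not used for the offset */
    (void)buf;                   /* parsing is omitted in this sketch */
    *ppos += (offset_t)cnt;
    return (long)cnt;
}
```

Because the handler only touches the offset it was handed, the caller decides where that offset lives and how access to it is serialized.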
17 Nov, 2009
1 commit
-
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
16 Nov, 2009
1 commit
-
Heiko reported a case where a timer interrupt managed to
reference a root_domain structure that was already freed by a
concurrent hot-un-plug operation.

Solve this the same way the regular sched_domain stuff is
synchronized, by adding a synchronize_sched() stmt to the free
path. This ensures that a root_domain stays present for any
atomic section that could have observed it.

Reported-by: Heiko Carstens
Signed-off-by: Peter Zijlstra
Acked-by: Heiko Carstens
Cc: Gregory Haskins
Cc: Siddha Suresh B
Cc: Martin Schwidefsky
LKML-Reference:
Signed-off-by: Ingo Molnar
15 Nov, 2009
1 commit
-
In finish_task_switch(), fire_sched_in_preempt_notifiers() is
called after finish_lock_switch().

However, depending on architecture, preemption can be enabled after
finish_lock_switch() which breaks the semantics of preempt
notifiers.

So move it before finish_arch_switch(). This also makes the in-
notifiers symmetric to out- notifiers in terms of locking - now
both are called under rq lock.

Signed-off-by: Tejun Heo
Acked-by: Avi Kivity
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
12 Nov, 2009
3 commits
-
Originally task_s/utime() were designed to return clock_t but
were later changed to return cputime_t by the following commit:

  commit efe567fc8281661524ffa75477a7c4ca9b466c63
  Author: Christian Borntraeger
  Date: Thu Aug 23 15:18:02 2007 +0200

It only changed the type of the return value, but not the
implementation. As a result the granularity of task_s/utime()
is still that of clock_t, not that of cputime_t.

So using task_s/utime() in __exit_signal() makes the values
accumulated in the signal struct rounded and coarse
grained.

This patch removes casts to clock_t in task_u/stime(), to keep
the granularity of cputime_t over the calculation.

v2:
  Use div_u64() to avoid error "undefined reference to `__udivdi3`"
  on some 32bit systems.

Signed-off-by: Hidetoshi Seto
Acked-by: Peter Zijlstra
Cc: xiyou.wangcong@gmail.com
Cc: Spencer Candland
Cc: Oleg Nesterov
Cc: Stanislaw Gruszka
LKML-Reference:
Signed-off-by: Ingo Molnar -
kthread_bind(), migrate_task() and sched_fork() were missing
updates, and try_to_wake_up() was updating after having already
used the stale clock.

Aside from preventing potential latency hits, there's a side
benefit in that early boot printk time stamps become monotonic.

Signed-off-by: Mike Galbraith
Acked-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
LKML-Reference: -
Now that sys_sysctl is a generic wrapper around /proc/sys, the .ctl_name
and .strategy members of sysctl tables are dead code. Remove them.

Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: David Howells
Signed-off-by: Eric W. Biederman
11 Nov, 2009
2 commits
-
This patch adds a counter increment to enable tasks to actually
take the synchronize_sched_expedited() function's fastpath.

Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference:
Signed-off-by: Ingo Molnar -
From the code in rt_mutex_setprio(), it is evident that the
intention is that tasks with a RT 'prio' value as a consequence
of receiving a PI boost also have their 'sched_class' field set
to '&rt_sched_class'.

However, Peter noticed that the code in __setscheduler() could
result in this intention being frustrated. Fix it.

Reported-by: Peter Williams
Signed-off-by: Peter Zijlstra
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar
10 Nov, 2009
1 commit
-
Commit 1b9508f, "Rate-limit newidle" has been confirmed to fix
the netperf UDP loopback regression reported by Alex Shi.

This is a cleanup and a fix:

- moved to a more out of the way spot
- fix to ensure that balancing doesn't try to balance
  runqueues which haven't gone online yet, which can
  mess up CPU enumeration during boot.

Reported-by: Alex Shi
Reported-by: Zhang, Yanmin
Signed-off-by: Mike Galbraith
Acked-by: Peter Zijlstra
Cc: # .32.x: a1f84a3: sched: Check for an idle shared cache
Cc: # .32.x: 1b9508f: sched: Rate-limit newidle
Cc: # .32.x: fd21073: sched: Fix affinity logic
Cc: # .32.x
LKML-Reference:
Signed-off-by: Ingo Molnar
08 Nov, 2009
3 commits
-
In 15934a37324f32e0fda633dc7984a671ea81cd75, the
field last_tick_seen was added to struct rq.
But it is unused now.

Signed-off-by: Lai Jiangshan
Cc: Guillaume Chazarain
LKML-Reference:
Signed-off-by: Ingo Molnar -
root_task_group_empty is used only with FAIR_GROUP_SCHED
so if we use other scheduler options we get:

  kernel/sched.c:314: warning: 'root_task_group_empty' defined but not used

So move CONFIG_FAIR_GROUP_SCHED up so that it covers
root_task_group_empty().

Signed-off-by: Cyrill Gorcunov
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Fix variable name in sched.c kernel-doc notation.

Fixes this DocBook warning:

  Warning(kernel/sched.c:2008): No description found for parameter 'p'
  Warning(kernel/sched.c:2008): Excess function parameter 'k' description in 'kthread_bind'

Signed-off-by: Randy Dunlap
LKML-Reference:
Signed-off-by: Ingo Molnar
06 Nov, 2009
1 commit
-
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix kthread_bind() by moving the body of kthread_bind() to sched.c
sched: Disable SD_PREFER_LOCAL at node level
sched: Fix boot crash by zalloc()ing most of the cpu masks
sched: Strengthen buddies and mitigate buddy induced latencies
05 Nov, 2009
1 commit
-
Rate limit newidle to migration_cost. It's a win for all
stages of sysbench oltp tests.

Signed-off-by: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
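One way to read "rate limit newidle to migration_cost" in the commit above is as a cost comparison made before balancing. The sketch below assumes a tracked average idle duration per runqueue; struct rq_sketch, its avg_idle_ns field and newidle_balance_allowed() are hypothetical names and are not taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-runqueue state for this sketch. */
struct rq_sketch {
    uint64_t avg_idle_ns;   /* assumed running average of idle periods */
};

/* Skip newidle balancing when the CPU is expected to stay idle for
 * less time than migrating a task would cost. */
static bool newidle_balance_allowed(const struct rq_sketch *rq,
                                    uint64_t migration_cost_ns)
{
    return rq->avg_idle_ns >= migration_cost_ns;
}
```

A CPU that typically idles for 0.1ms would then not pull tasks when the migration cost is 0.5ms, while one that idles for 0.6ms would.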
04 Nov, 2009
2 commits
-
Currently partition_sched_domains() takes a 'struct cpumask
*doms_new' which is a kmalloc'ed array of cpumask_t. You can't
have such an array if 'struct cpumask' is undefined, as we plan
for CONFIG_CPUMASK_OFFSTACK=y.

So, we make this an array of cpumask_var_t instead: this is the
same for the CONFIG_CPUMASK_OFFSTACK=n case, but requires
multiple allocations for the CONFIG_CPUMASK_OFFSTACK=y case.
Hence we add alloc_sched_domains() and free_sched_domains()
functions.

Signed-off-by: Rusty Russell
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
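The "multiple allocations" case mentioned in the commit above can be modelled in user space with an array of separately allocated masks. This is a sketch, not the kernel code: mask_var_t, MASK_WORDS and both *_sketch functions are hypothetical, and malloc/calloc stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for cpumask_var_t when masks live off-stack: each entry
 * points at its own allocation. */
typedef unsigned long *mask_var_t;
#define MASK_WORDS 4             /* hypothetical mask size in words */

/* Free the first ndoms masks, then the array that held them. */
static void free_domains_sketch(mask_var_t *doms, unsigned int ndoms)
{
    if (!doms)
        return;
    for (unsigned int i = 0; i < ndoms; i++)
        free(doms[i]);
    free(doms);
}

/* Allocate the array and one zeroed mask per domain; on failure,
 * release only the masks that were already allocated. */
static mask_var_t *alloc_domains_sketch(unsigned int ndoms)
{
    mask_var_t *doms = malloc(sizeof(*doms) * ndoms);

    if (!doms)
        return NULL;
    for (unsigned int i = 0; i < ndoms; i++) {
        doms[i] = calloc(MASK_WORDS, sizeof(unsigned long));
        if (!doms[i]) {
            free_domains_sketch(doms, i);
            return NULL;
        }
    }
    return doms;
}
```

Pairing the allocator with a matching free helper keeps the partial-failure path and the normal teardown path identical.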
cpu_nr_migrations() is not used, remove it.
Signed-off-by: Hiroshi Shimamoto
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
03 Nov, 2009
1 commit
-
Eric Paris reported that commit
f685ceacab07d3f6c236f04803e2f2f0dbcc5afb causes boot time
PREEMPT_DEBUG complaints.

  [ 4.590699] BUG: using smp_processor_id() in preemptible [00000000] code: rmmod/1314
  [ 4.593043] caller is task_hot+0x86/0xd0

Since kthread_bind() messes with scheduler internals, move the
body to sched.c, and lock the runqueue.

Reported-by: Eric Paris
Signed-off-by: Mike Galbraith
Tested-by: Eric Paris
Cc: Peter Zijlstra
LKML-Reference:
[ v2: fix !SMP build and clean up ]
Signed-off-by: Ingo Molnar
02 Nov, 2009
1 commit
-
I got a boot crash when forcing cpumasks offstack on 32 bit,
because find_new_ilb() returned 3 on my UP system (nohz.cpu_mask
wasn't zeroed).

AFAICT the others need to be zeroed too: only
nohz.ilb_grp_nohz_mask is initialized before use.

Signed-off-by: Rusty Russell
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
30 Oct, 2009
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
sched: move rq_weight data array out of .percpu
percpu: allow pcpu_alloc() to be called with IRQs off
28 Oct, 2009
1 commit
-
Commit 34d76c41 introduced the percpu array update_shares_data, the size of
which is proportional to NR_CPUS. Unfortunately this blows up ia64 for large
NR_CPUS configurations, as ia64 allows only 64k for the .percpu section.

Fix this by allocating this array dynamically and keeping only a pointer to it
percpu.

The per-cpu handling doesn't impose a significant performance penalty on the
potentially contended path in tg_shares_up().

...
ffffffff8104337c: 65 48 8b 14 25 20 cd mov %gs:0xcd20,%rdx
ffffffff81043383: 00 00
ffffffff81043385: 48 c7 c0 00 e1 00 00 mov $0xe100,%rax
ffffffff8104338c: 48 c7 45 a0 00 00 00 movq $0x0,-0x60(%rbp)
ffffffff81043393: 00
ffffffff81043394: 48 c7 45 a8 00 00 00 movq $0x0,-0x58(%rbp)
ffffffff8104339b: 00
ffffffff8104339c: 48 01 d0 add %rdx,%rax
ffffffff8104339f: 49 8d 94 24 08 01 00 lea 0x108(%r12),%rdx
ffffffff810433a6: 00
ffffffff810433a7: b9 ff ff ff ff mov $0xffffffff,%ecx
ffffffff810433ac: 48 89 45 b0 mov %rax,-0x50(%rbp)
ffffffff810433b0: bb 00 04 00 00 mov $0x400,%ebx
ffffffff810433b5: 48 89 55 c0 mov %rdx,-0x40(%rbp)
...

After:
...
ffffffff8104337c: 65 8b 04 25 28 cd 00 mov %gs:0xcd28,%eax
ffffffff81043383: 00
ffffffff81043384: 48 98 cltq
ffffffff81043386: 49 8d bc 24 08 01 00 lea 0x108(%r12),%rdi
ffffffff8104338d: 00
ffffffff8104338e: 48 8b 15 d3 7f 76 00 mov 0x767fd3(%rip),%rdx # ffffffff817ab368
ffffffff81043395: 48 8b 34 c5 00 ee 6d mov -0x7e921200(,%rax,8),%rsi
ffffffff8104339c: 81
ffffffff8104339d: 48 c7 45 a0 00 00 00 movq $0x0,-0x60(%rbp)
ffffffff810433a4: 00
ffffffff810433a5: b9 ff ff ff ff mov $0xffffffff,%ecx
ffffffff810433aa: 48 89 7d c0 mov %rdi,-0x40(%rbp)
ffffffff810433ae: 48 c7 45 a8 00 00 00 movq $0x0,-0x58(%rbp)
ffffffff810433b5: 00
ffffffff810433b6: bb 00 04 00 00 mov $0x400,%ebx
ffffffff810433bb: 48 01 f2 add %rsi,%rdx
ffffffff810433be: 48 89 55 b0 mov %rdx,-0x50(%rbp)
...

Signed-off-by: Jiri Kosina
Acked-by: Ingo Molnar
Signed-off-by: Tejun Heo
26 Oct, 2009
2 commits
-
CPU time of a guest is always accounted in 'user' time
without concern for the nice value of its counterpart
process although the guest is scheduled under the nice
value.

This patch fixes the defect and accounts cpu time of
a niced guest in 'nice' time, the same as a niced process.

The patch also adds 'guest_nice' to cpuacct. The
value provides niced guest cpu time; it is to 'guest' what
'nice' is to 'user'.

The original discussions can be found here:

  http://www.mail-archive.com/kvm@vger.kernel.org/msg23982.html
  http://www.mail-archive.com/kvm@vger.kernel.org/msg23860.html

Signed-off-by: Ryota Ozaki
Acked-by: Avi Kivity
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
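The accounting rule from the guest time commit above can be sketched as a single branch on the nice value. struct cpustat_sketch and account_guest_time_sketch() are hypothetical names; the counters are plain integers rather than kernel cputime types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the per-CPU statistics. */
struct cpustat_sketch {
    uint64_t user;
    uint64_t nice;
    uint64_t guest;
    uint64_t guest_nice;
};

/* Charge guest time to the niced counters when the task running the
 * guest has a positive nice value, otherwise to the normal ones. */
static void account_guest_time_sketch(struct cpustat_sketch *s,
                                      int nice_value, uint64_t cputime)
{
    if (nice_value > 0) {
        s->nice += cputime;
        s->guest_nice += cputime;
    } else {
        s->user += cputime;
        s->guest += cputime;
    }
}
```

In this model 'guest_nice' mirrors 'nice' in the same way that 'guest' mirrors 'user', so each pair moves together.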
Conflicts:
  fs/proc/array.c

Merge reason: resolve conflict and queue up dependent patch.
Signed-off-by: Ingo Molnar
24 Oct, 2009
1 commit
-
This patch restores the effectiveness of LAST_BUDDY in preventing
pgsql+oltp from collapsing due to wakeup preemption. It also
switches LAST_BUDDY to exclusively do what it does best, namely
mitigate the effects of aggressive wakeup preemption, which
improves vmark throughput markedly, and restores mysql+oltp
scalability.

Since buddies are about scalability, enable them beginning at the
point where we begin expanding sched_latency, namely
sched_nr_latency. Previously, buddies were cleared aggressively,
which seriously reduced their effectiveness. Not clearing
aggressively however, produces a small drop in mysql+oltp
throughput immediately after peak, indicating that LAST_BUDDY is
actually doing some harm. This is right at the point where X on the
desktop in competition with another load wants low latency service.
Ergo, do not enable until we need to scale.

To mitigate latency induced by buddies, or by a task just missing
wakeup preemption, check latency at tick time.

Last hunk prevents buddies from stymieing BALANCE_NEWIDLE via
CACHE_HOT_BUDDY.

Supporting performance tests:

tip   = v2.6.32-rc5-1497-ga525b32
tipx  = NO_GENTLE_FAIR_SLEEPERS NEXT_BUDDY granularity knobs = 31 knobs + 31 buddies
tip+x = NO_GENTLE_FAIR_SLEEPERS granularity knobs = 31 knobs

(Three run averages except where noted.)

vmark:
------
tip       108466 messages per second
tip+      125307 messages per second
tip+x     125335 messages per second
tipx      117781 messages per second
2.6.31.3  122729 messages per second

mysql+oltp:
-----------
clients            1         2         4         8        16        32        64       128       256
..........................................................................................
tip          9949.89  18690.20  34801.24  34460.04  32682.88  30765.97  28305.27  25059.64  19548.08
tip+        10013.90  18526.84  34900.38  34420.14  33069.83  32083.40  30578.30  28010.71  25605.47
tipx         9698.71  18002.70  34477.56  33420.01  32634.30  31657.27  29932.67  26827.52  21487.18
2.6.31.3     8243.11  18784.20  34404.83  33148.38  31900.32  31161.90  29663.81  25995.94  18058.86

pgsql+oltp:
-----------
clients            1         2         4         8        16        32        64       128       256
..........................................................................................
tip         13686.37  26609.25  51934.28  51347.81  49479.51  45312.65  36691.91  26851.57  24145.35
tip+ (1x)   13907.85  27135.87  52951.98  52514.04  51742.52  50705.43  49947.97  48374.19  46227.94
tip+x       13906.78  27065.81  52951.19  52542.59  52176.11  51815.94  50838.90  49439.46  46891.00
tipx        13742.46  26769.81  52351.99  51891.73  51320.79  50938.98  50248.65  48908.70  46553.84
2.6.31.3    13815.35  26906.46  52683.34  52061.31  51937.10  51376.80  50474.28  49394.47  47003.25

Signed-off-by: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
15 Oct, 2009
1 commit
-
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix missing kernel-doc notation
Revert "x86, timers: Check for pending timers after (device) interrupts"
sched: Update the clock of runqueue select_task_rq() selected