08 Dec, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
    security/tomoyo: Remove now unnecessary handling of security_sysctl.
    security/tomoyo: Add a special case to handle accesses through the internal proc mount.
    sysctl: Drop & in front of every proc_handler.
    sysctl: Remove CTL_NONE and CTL_UNNUMBERED
    sysctl: kill dead ctl_handler definitions.
    sysctl: Remove the last of the generic binary sysctl support
    sysctl net: Remove unused binary sysctl code
    sysctl security/tomoyo: Don't look at ctl_name
    sysctl arm: Remove binary sysctl support
    sysctl x86: Remove dead binary sysctl support
    sysctl sh: Remove dead binary sysctl support
    sysctl powerpc: Remove dead binary sysctl support
    sysctl ia64: Remove dead binary sysctl support
    sysctl s390: Remove dead sysctl binary support
    sysctl frv: Remove dead binary sysctl support
    sysctl mips/lasat: Remove dead binary sysctl support
    sysctl drivers: Remove dead binary sysctl support
    sysctl crypto: Remove dead binary sysctl support
    sysctl security/keys: Remove dead binary sysctl support
    sysctl kernel: Remove binary sysctl logic
    ...

    Linus Torvalds
     

06 Dec, 2009

2 commits

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
    sched, cputime: Introduce thread_group_times()
    sched, cputime: Cleanups related to task_times()
    Revert "sched, x86: Optimize branch hint in __switch_to()"
    sched: Fix isolcpus boot option
    sched: Revert 498657a478c60be092208422fefa9c7b248729c2
    sched, time: Define nsecs_to_jiffies()
    sched: Remove task_{u,s,g}time()
    sched: Introduce task_times() to replace task_{u,s}time() pair
    sched: Limit the number of scheduler debug messages
    sched.c: Call debug_show_all_locks() when dumping all tasks
    sched, x86: Optimize branch hint in __switch_to()
    sched: Optimize branch hint in context_switch()
    sched: Optimize branch hint in pick_next_task_fair()
    sched_feat_write(): Update ppos instead of file->f_pos
    sched: Sched_rt_periodic_timer vs cpu hotplug
    sched, kvm: Fix race condition involving sched_in_preempt_notifers
    sched: More generic WAKE_AFFINE vs select_idle_sibling()
    sched: Cleanup select_task_rq_fair()
    sched: Fix granularity of task_u/stime()
    sched: Fix/add missing update_rq_clock() calls
    ...

    Linus Torvalds
     
  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
    rcu: Make RCU's CPU-stall detector be default
    rcu: Add expedited grace-period support for preemptible RCU
    rcu: Enable fourth level of TREE_RCU hierarchy
    rcu: Rename "quiet" functions
    rcu: Re-arrange code to reduce #ifdef pain
    rcu: Eliminate unneeded function wrapping
    rcu: Fix grace-period-stall bug on large systems with CPU hotplug
    rcu: Eliminate __rcu_pending() false positives
    rcu: Further cleanups of use of lastcomp
    rcu: Simplify association of forced quiescent states with grace periods
    rcu: Accelerate callback processing on CPUs not detecting GP end
    rcu: Mark init-time-only rcu_bootup_announce() as __init
    rcu: Simplify association of quiescent states with grace periods
    rcu: Rename dynticks_completed to completed_fqs
    rcu: Enable synchronize_sched_expedited() fastpath
    rcu: Remove inline from forward-referenced functions
    rcu: Fix note_new_gpnum() uses of ->gpnum
    rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter
    rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter
    rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls
    ...

    Linus Torvalds
     

03 Dec, 2009

3 commits

  • We don't need to build mutex_spin_on_owner() if we have
    CONFIG_DEBUG_MUTEXES or CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES, as
    it won't be used under such configs.

    Use CONFIG_MUTEX_SPIN_ON_OWNER to guard its build, as that
    symbol already gathers all the necessary checks.
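
    A minimal sketch of the guard change; the declaration shape is
    assumed for illustration, not copied from the kernel diff:

        /* Before: the config checks are open-coded at the site. */
        #if !defined(CONFIG_DEBUG_MUTEXES) && \
            !defined(CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES)
        int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
        #endif

        /* After: one symbol that already encodes both checks. */
        #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
        int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
        #endif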

    Signed-off-by: Frederic Weisbecker
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Cc: Peter Zijlstra

    Frederic Weisbecker
     
  • This is a real fix for the problem of utime/stime values
    decreasing, described in this thread:

    http://lkml.org/lkml/2009/11/3/522

    Now cputime is accounted in the following way:

    - {u,s}time in task_struct are increased every time the thread
    is interrupted by a tick (timer interrupt).

    - When a thread exits, its {u,s}time are added to signal->{u,s}time,
    after being adjusted by task_times().

    - When all threads in a thread_group exit, the accumulated {u,s}time
    (and also c{u,s}time) in the signal struct are added to c{u,s}time
    in the signal struct of the group's parent.

    So {u,s}time in the task struct are "raw" tick counts, while
    {u,s}time and c{u,s}time in the signal struct are "adjusted" values.

    And the accounted values are used by:

    - task_times(), to get the cputime of a thread:
    This function returns adjusted values that originate from the
    raw {u,s}time, scaled by the sum_exec_runtime accounted by CFS.

    - thread_group_cputime(), to get the cputime of a thread group:
    This function returns the sum of all {u,s}time of living threads
    in the group, plus the {u,s}time in the signal struct, which is
    the sum of the adjusted cputimes of all exited threads that
    belonged to the group.

    The problem is the return value of thread_group_cputime(),
    because it is a mixed sum of "raw" and "adjusted" values:

    group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)

    This misbehavior can break {u,s}time monotonicity: if there is a
    thread whose raw values are greater than its adjusted values
    (e.g. it was interrupted by 1000Hz ticks 50 times but only ran
    for 45ms), the group's cputime will decrease when it exits (by
    5ms in this example).

    To fix this, we could do:

    group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)

    But task_times() involves heavy divisions, so applying it to
    every thread should be avoided.

    This patch fixes the above problem in the following way:

    - Modify the thread exit path (= __exit_signal()) not to use
    task_times(). This means {u,s}time in the signal struct accumulate
    raw values instead of adjusted values. As a result,
    thread_group_cputime() returns a pure sum of "raw" values.

    - Introduce a new function thread_group_times(*task, *utime, *stime)
    that converts the "raw" values of thread_group_cputime() to
    "adjusted" values, using the same calculation procedure as
    task_times().

    - Modify the group exit path (= wait_task_zombie()) to use the
    introduced thread_group_times(). This keeps c{u,s}time in the
    signal struct adjusted, as before this patch.

    - Replace some thread_group_cputime() calls by thread_group_times().
    These replacements are applied only where the "adjusted" cputime
    is conveyed to users, and where task_times() is already used
    nearby (i.e. sys_times(), getrusage(), and /proc/<pid>/stat).

    This patch has a positive side effect:

    - Before this patch, if a group contained many short-lived threads
    (e.g. each runs 0.9ms and is never interrupted by a tick), the
    group's cputime could be invisible, since each thread's cputime
    was accumulated after adjustment: imagining the adjustment
    function as adj(ticks, runtime),
    {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
    After this patch this no longer happens, because the adjustment
    is applied after accumulation.

    v2:
    - remove if()s, put new variables into signal_struct.
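
    To make the monotonicity break concrete, here is a small runnable
    C model of the mixed raw/adjusted sum; the scaling function and
    all names are illustrative, not the kernel's code:

        #include <stdio.h>
        #include <stdint.h>

        /* Toy task_times()-style adjustment: rescale the raw tick-based
         * utime/stime so their sum equals the CFS-accounted runtime. */
        static void task_times_toy(uint64_t ut_raw, uint64_t st_raw,
                                   uint64_t rtime, uint64_t *ut, uint64_t *st)
        {
                uint64_t total = ut_raw + st_raw;
                *ut = total ? rtime * ut_raw / total : rtime;
                *st = rtime - *ut;
        }

        int main(void)
        {
                /* A thread hit by 1000Hz ticks 50 times (raw = 50ms)
                 * that actually ran for only 45ms. */
                uint64_t ut_raw = 50, st_raw = 0, rtime = 45, ut, st;

                printf("group cputime before exit: %llu ms\n",
                       (unsigned long long)(ut_raw + st_raw)); /* raw: 50 */
                task_times_toy(ut_raw, st_raw, rtime, &ut, &st);
                printf("group cputime after exit:  %llu ms\n",
                       (unsigned long long)(ut + st));         /* adjusted: 45 */
                return 0;
        }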

    Signed-off-by: Hidetoshi Seto
    Acked-by: Peter Zijlstra
    Cc: Spencer Candland
    Cc: Americo Wang
    Cc: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Stanislaw Gruszka
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
     
  • - Remove the if({u,s}t)s because no one calls it with NULL now.
    - Use cputime_{add,sub}().
    - Add an #ifndef/#endif guard for prev_{u,s}time since they are
    used only when !VIRT_CPU_ACCOUNTING (see the sketch below).
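
    A sketch of that guard; the field placement is assumed for
    illustration:

        struct task_struct {
                /* ... */
        #ifndef CONFIG_VIRT_CPU_ACCOUNTING
                /* only the adjustment path uses these */
                cputime_t prev_utime, prev_stime;
        #endif
                /* ... */
        };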

    Signed-off-by: Hidetoshi Seto
    Cc: Peter Zijlstra
    Cc: Spencer Candland
    Cc: Americo Wang
    Cc: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Stanislaw Gruszka
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
     

02 Dec, 2009

2 commits

  • Anton Blanchard wrote:

    > We allocate and zero cpu_isolated_map after the isolcpus
    > __setup option has run. This means cpu_isolated_map always
    > ends up empty and if CPUMASK_OFFSTACK is enabled we write to a
    > cpumask that hasn't been allocated.

    I introduced this regression in 49557e620339cb13 (sched: Fix
    boot crash by zalloc()ing most of the cpu masks).

    Use the bootmem allocator if isolcpus= is set; otherwise,
    allocate and zero as normal.
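
    A sketch of the fix as described; the shape is assumed from the
    text above, not necessarily the exact diff:

        /* isolcpus= runs at __setup time, before the normal allocation
         * path, so allocate the mask from bootmem right here. */
        static int __init isolated_cpu_setup(char *str)
        {
                alloc_bootmem_cpumask_var(&cpu_isolated_map);
                cpulist_parse(str, cpu_isolated_map);
                return 1;
        }
        __setup("isolcpus=", isolated_cpu_setup);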

    Reported-by: Anton Blanchard
    Signed-off-by: Rusty Russell
    Cc: peterz@infradead.org
    Cc: Linus Torvalds
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Tested-by: Anton Blanchard

    Rusty Russell
     
  • 498657a478c60be092208422fefa9c7b248729c2 incorrectly assumed
    that preempt wasn't disabled around context_switch() and thus
    was fixing an imaginary problem. It also broke KVM, because KVM
    depended on ->sched_in() being called with irqs enabled so that
    it could do SMP calls from there.

    Revert the incorrect commit and add a comment describing the
    different contexts under which the two callbacks are invoked.

    Avi spotted the transposed in/out in the added comment.

    Signed-off-by: Tejun Heo
    Acked-by: Avi Kivity
    Cc: peterz@infradead.org
    Cc: efault@gmx.de
    Cc: rusty@rustcorp.com.au
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

26 Nov, 2009

5 commits

  • Use of msecs_to_jiffies() for nsecs_to_cputime() has some
    problems:

    - The type of msecs_to_jiffies()'s argument is unsigned int, so
    it cannot convert msecs greater than UINT_MAX = about 49.7 days.

    - msecs_to_jiffies() returns MAX_JIFFY_OFFSET if the MSB of the
    argument is set, assuming the input was a negative value. So it
    cannot convert msecs greater than INT_MAX = about 24.8 days
    either.

    This patch defines a new function nsecs_to_jiffies() that can
    handle greater values, and that treats all incoming values as
    unsigned.
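
    A runnable toy model of the limits and the new helper; HZ, the
    helper's name, and the direct division are illustrative
    simplifications, not the kernel implementation:

        #include <stdio.h>
        #include <stdint.h>

        #define HZ 1000
        #define NSEC_PER_SEC 1000000000ULL

        /* Toy nsecs_to_jiffies(): a u64 argument avoids both the
         * UINT_MAX and the INT_MAX/MSB limits described above. */
        static uint64_t nsecs_to_jiffies_toy(uint64_t n)
        {
                /* assumes HZ divides NSEC_PER_SEC evenly */
                return n / (NSEC_PER_SEC / HZ);
        }

        int main(void)
        {
                uint64_t thirty_days_ns = 30ULL * 24 * 3600 * NSEC_PER_SEC;

                /* 30 days in msecs exceeds INT_MAX, so msecs_to_jiffies()
                 * would saturate; the u64 nsecs path is fine. */
                printf("30 days = %llu jiffies at HZ=%d\n",
                       (unsigned long long)nsecs_to_jiffies_toy(thirty_days_ns),
                       HZ);
                return 0;
        }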

    Signed-off-by: Hidetoshi Seto
    Acked-by: Peter Zijlstra
    Cc: Stanislaw Gruszka
    Cc: Spencer Candland
    Cc: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Americo Wang
    Cc: Thomas Gleixner
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
     
  • Now all task_{u,s}time() pairs are replaced by task_times(), and
    task_gtime() is too simple to be an inline function.

    Clean them all up.

    Signed-off-by: Hidetoshi Seto
    Acked-by: Peter Zijlstra
    Cc: Stanislaw Gruszka
    Cc: Spencer Candland
    Cc: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Americo Wang
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
     
  • The functions task_{u,s}time() are called in pairs in almost all
    cases. However, task_stime() is implemented to call task_utime()
    internally, so such paired calls run task_utime() twice.

    This means we do the heavy divisions (div_u64 + do_div) twice to
    get utime and stime, which could be obtained at the same time by
    one set of divisions.

    This patch introduces a function task_times(*tsk, *utime,
    *stime) to retrieve utime and stime at once, in a better,
    optimized way.
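
    A sketch of the before/after call pattern; the caller shape is
    assumed for illustration:

        /* Before: two calls, each doing the full set of divisions. */
        utime = task_utime(p);
        stime = task_stime(p);

        /* After: one call fills both outputs in a single pass. */
        task_times(p, &utime, &stime);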

    Signed-off-by: Hidetoshi Seto
    Acked-by: Peter Zijlstra
    Cc: Stanislaw Gruszka
    Cc: Spencer Candland
    Cc: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Americo Wang
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
     
  • Merge reason: Pick up fixes that did not make it into .32.0

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Remove the verbose scheduler debug messages unless the kernel
    parameter "sched_debug" is set. /proc/sched_debug is unchanged.

    Signed-off-by: Mike Travis
    Cc: Heiko Carstens
    Cc: Roland Dreier
    Cc: Randy Dunlap
    Cc: Tejun Heo
    Cc: Andi Kleen
    Cc: Greg Kroah-Hartman
    Cc: Yinghai Lu
    Cc: David Rientjes
    Cc: Steven Rostedt
    Cc: Rusty Russell
    Cc: Hidetoshi Seto
    Cc: Jack Steiner
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Travis
     

25 Nov, 2009

1 commit

  • In commit v2.6.21-691-g39bc89f ("make SysRq-T show all tasks
    again") the interface of show_state_filter() was changed: a
    zero-valued 'state_filter' now specifies "dump all tasks"
    (instead of -1).

    However, the condition for calling debug_show_all_locks() ("show
    locks if all tasks are dumped") was not updated accordingly.
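
    A sketch of the corrected condition; the surrounding function
    body is elided and the exact hunk is assumed:

        void show_state_filter(unsigned long state_filter)
        {
                /* ... dump tasks matching state_filter ... */

                /* 0 now means "all tasks were dumped", so test for 0,
                 * not the old -1 sentinel. */
                if (!state_filter)
                        debug_show_all_locks();
        }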

    Signed-off-by: Shmulik Ladkani
    Cc: peterz@infradead.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Shmulik Ladkani
     

24 Nov, 2009

2 commits

  • Branch hint profiling on my Nehalem machine showed over 90%
    incorrect branch hints:

    10420275 170645395 94 context_switch sched.c 3043
    10408421 171098521 94 context_switch sched.c 3050
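
    A sketch of the kind of change this implies; the `!mm` test is an
    assumed example site in context_switch(), not the verified hunk:

        /* Before: the hint claims the branch is rarely taken. */
        if (unlikely(!mm)) {
                /* ... lazy-TLB handling ... */
        }

        /* After: the profile shows it is taken ~94% of the time. */
        if (likely(!mm)) {
                /* ... lazy-TLB handling ... */
        }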

    Signed-off-by: Tim Blechmann
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tim Blechmann
     
  • sched_feat_write() should update ppos instead of file->f_pos.

    (This reduces some BKL dependencies of this code.)
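
    A sketch of the pattern, with the handler body elided; the
    signature follows 2.6.32-era file_operations write handlers:

        static ssize_t
        sched_feat_write(struct file *filp, const char __user *ubuf,
                         size_t cnt, loff_t *ppos)
        {
                /* ... parse and apply the feature string ... */

                *ppos += cnt;           /* was: filp->f_pos += cnt; */
                return cnt;
        }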

    Signed-off-by: Jan Blunck
    Cc: jkacur@redhat.com
    Cc: Arnd Bergmann
    Cc: Frederic Weisbecker
    Cc: Jamie Lokier
    Cc: Peter Zijlstra
    Cc: Christoph Hellwig
    Cc: Alan Cox
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Blunck
     

16 Nov, 2009

1 commit

  • Heiko reported a case where a timer interrupt managed to
    reference a root_domain structure that was already freed by a
    concurrent hot-unplug operation.

    Solve this the same way the regular sched_domain stuff is
    synchronized: add a synchronize_sched() statement to the free
    path. This ensures that a root_domain stays present for any
    atomic section that could have observed it.
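
    A sketch of the free path with the added barrier; placing it
    inside free_rootdomain() is an assumption based on the
    description above:

        static void free_rootdomain(struct root_domain *rd)
        {
                /* Wait for all preempt-disabled (RCU-sched) sections;
                 * after this, no atomic context can still see rd. */
                synchronize_sched();

                free_cpumask_var(rd->rto_mask);
                free_cpumask_var(rd->online);
                free_cpumask_var(rd->span);
                kfree(rd);
        }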

    Reported-by: Heiko Carstens
    Signed-off-by: Peter Zijlstra
    Acked-by: Heiko Carstens
    Cc: Gregory Haskins
    Cc: Siddha Suresh B
    Cc: Martin Schwidefsky
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Nov, 2009

1 commit

  • In finish_task_switch(), fire_sched_in_preempt_notifiers() is
    called after finish_lock_switch().

    However, depending on architecture, preemption can be enabled after
    finish_lock_switch() which breaks the semantics of preempt
    notifiers.

    So move it before finish_arch_switch(). This also makes the in-
    notifiers symmetric to the out- notifiers in terms of locking:
    now both are called under the rq lock.
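
    A sketch of the reordering in finish_task_switch(); surrounding
    code is elided and the exact hunk is assumed:

        static void finish_task_switch(struct rq *rq, struct task_struct *prev)
        {
                /* ... */
                fire_sched_in_preempt_notifiers(current); /* moved up: rq lock held */
                finish_arch_switch(prev);
                finish_lock_switch(rq, prev); /* preemption may be enabled from here */
                /* ... */
        }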

    Signed-off-by: Tejun Heo
    Acked-by: Avi Kivity
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

12 Nov, 2009

3 commits

  • Originally task_s/utime() were designed to return clock_t, but
    were later changed to return cputime_t by the following commit:

    commit efe567fc8281661524ffa75477a7c4ca9b466c63
    Author: Christian Borntraeger
    Date: Thu Aug 23 15:18:02 2007 +0200

    It only changed the type of the return value, but not the
    implementation. As a result, the granularity of task_s/utime()
    is still that of clock_t, not that of cputime_t.

    So using task_s/utime() in __exit_signal() makes the values
    accumulated in the signal struct rounded and coarse-grained.

    This patch removes the casts to clock_t in task_u/stime(), to
    keep the granularity of cputime_t throughout the calculation.

    v2:
    Use div_u64() to avoid the error "undefined reference to
    `__udivdi3`" on some 32-bit systems.
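
    A sketch of the v2 point; the variable names are illustrative:

        /* A bare 64-bit division like "temp / total" is lowered to a
         * libgcc call (__udivdi3) on 32-bit, which the kernel doesn't
         * link; div_u64() uses the kernel's own helper instead. */
        u64 temp = (u64)rtime * utime;
        temp = div_u64(temp, total);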

    Signed-off-by: Hidetoshi Seto
    Acked-by: Peter Zijlstra
    Cc: xiyou.wangcong@gmail.com
    Cc: Spencer Candland
    Cc: Oleg Nesterov
    Cc: Stanislaw Gruszka
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
     
  • kthread_bind(), migrate_task() and sched_fork() were missing
    update_rq_clock() calls, and try_to_wake_up() was updating after
    having already used the stale clock.

    Aside from preventing potential latency hits, there's a side
    benefit in that early boot printk timestamps become monotonic.
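
    A sketch of the fixed pattern; the lock/flags plumbing is the
    2.6.32-era shape and the call site is assumed:

        rq = task_rq_lock(p, &flags);
        update_rq_clock(rq);            /* refresh before reading rq->clock */
        /* ... code that reads rq->clock ... */
        task_rq_unlock(rq, &flags);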

    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Now that sys_sysctl is a generic wrapper around /proc/sys, the
    .ctl_name and .strategy members of sysctl tables are dead code.
    Remove them.
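
    A sketch of what a table entry looks like after the removal; the
    "sched_example" knob and its backing variable are hypothetical
    placeholders:

        static int example_value;

        static struct ctl_table example_table[] = {
                {
                        /* .ctl_name and .strategy are gone along with
                         * the binary sysctl interface */
                        .procname       = "sched_example",  /* hypothetical */
                        .data           = &example_value,
                        .maxlen         = sizeof(int),
                        .mode           = 0644,
                        .proc_handler   = proc_dointvec,
                },
                {}
        };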

    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: David Howells
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

11 Nov, 2009

2 commits

  • This patch adds a counter increment to enable tasks to actually
    take the synchronize_sched_expedited() function's fastpath.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • From the code in rt_mutex_setprio(), it is evident that the
    intention is that tasks which have an RT 'prio' value as a
    consequence of receiving a PI boost also have their 'sched_class'
    field set to '&rt_sched_class'.

    However, Peter noticed that the code in __setscheduler() could
    result in this intention being frustrated. Fix it.
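
    A sketch of the invariant being restored; the exact placement in
    __setscheduler() is assumed:

        /* A PI-boosted task ends up with an RT prio, so it must also
         * run under the RT scheduling class. */
        if (rt_prio(p->prio))
                p->sched_class = &rt_sched_class;
        else
                p->sched_class = &fair_sched_class;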

    Reported-by: Peter Williams
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Nov, 2009

1 commit

    Commit 1b9508f ("Rate-limit newidle") has been confirmed to fix
    the netperf UDP loopback regression reported by Alex Shi.

    This is a cleanup and a fix:

    - the code is moved to a more out-of-the-way spot

    - a fix ensures that balancing doesn't try to balance
    runqueues which haven't gone online yet, which can
    mess up CPU enumeration during boot.

    Reported-by: Alex Shi
    Reported-by: Zhang, Yanmin
    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    Cc: # .32.x: a1f84a3: sched: Check for an idle shared cache
    Cc: # .32.x: 1b9508f: sched: Rate-limit newidle
    Cc: # .32.x: fd21073: sched: Fix affinity logic
    Cc: # .32.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

04 Nov, 2009

2 commits

  • Currently partition_sched_domains() takes a 'struct cpumask
    *doms_new', which is a kmalloc'ed array of cpumask_t. You can't
    have such an array if 'struct cpumask' is undefined, as we plan
    for CONFIG_CPUMASK_OFFSTACK=y.

    So we make this an array of cpumask_var_t instead: this is the
    same for the CONFIG_CPUMASK_OFFSTACK=n case, but requires
    multiple allocations for the CONFIG_CPUMASK_OFFSTACK=y case.
    Hence we add the alloc_sched_domains() and free_sched_domains()
    functions.
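
    A sketch of the new helpers as described; the error-handling
    shape is assumed:

        cpumask_var_t *alloc_sched_domains(unsigned int ndoms)
        {
                int i;
                cpumask_var_t *doms;

                doms = kmalloc(sizeof(*doms) * ndoms, GFP_KERNEL);
                if (!doms)
                        return NULL;
                for (i = 0; i < ndoms; i++) {
                        if (!alloc_cpumask_var(&doms[i], GFP_KERNEL)) {
                                free_sched_domains(doms, i); /* roll back */
                                return NULL;
                        }
                }
                return doms;
        }

        void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms)
        {
                unsigned int i;

                for (i = 0; i < ndoms; i++)
                        free_cpumask_var(doms[i]);
                kfree(doms);
        }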

    Signed-off-by: Rusty Russell
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Rusty Russell
     
  • cpu_nr_migrations() is not used, remove it.

    Signed-off-by: Hiroshi Shimamoto
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
     

03 Nov, 2009

1 commit

  • Eric Paris reported that commit
    f685ceacab07d3f6c236f04803e2f2f0dbcc5afb causes boot time
    PREEMPT_DEBUG complaints.

    [ 4.590699] BUG: using smp_processor_id() in preemptible [00000000] code: rmmod/1314
    [ 4.593043] caller is task_hot+0x86/0xd0

    Since kthread_bind() messes with scheduler internals, move the
    body to sched.c, and lock the runqueue.
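
    A sketch of the moved kthread_bind(), with the error checks
    elided; the shape is assumed from the description above:

        void kthread_bind(struct task_struct *p, unsigned int cpu)
        {
                struct rq *rq = cpu_rq(cpu);
                unsigned long flags;

                /* bind under the runqueue lock so task_hot() and
                 * friends see a consistent view */
                spin_lock_irqsave(&rq->lock, flags);
                set_task_cpu(p, cpu);
                p->cpus_allowed = cpumask_of_cpu(cpu);
                p->flags |= PF_THREAD_BOUND;
                spin_unlock_irqrestore(&rq->lock, flags);
        }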

    Reported-by: Eric Paris
    Signed-off-by: Mike Galbraith
    Tested-by: Eric Paris
    Cc: Peter Zijlstra
    LKML-Reference:
    [ v2: fix !SMP build and clean up ]
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

02 Nov, 2009

1 commit

  • I got a boot crash when forcing cpumasks offstack on 32-bit,
    because find_new_ilb() returned 3 on my UP system (nohz.cpu_mask
    wasn't zeroed).

    AFAICT the others need to be zeroed too: only
    nohz.ilb_grp_nohz_mask is initialized before use.
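
    A sketch of the zeroed allocation; which masks are converted is
    illustrative:

        /* zalloc_cpumask_var() allocates *and* zeroes the mask, so an
         * offstack nohz.cpu_mask can't report phantom idle CPUs. */
        zalloc_cpumask_var(&nohz.cpu_mask, GFP_NOWAIT);
        zalloc_cpumask_var(&nohz.ilb_grp_nohz_mask, GFP_NOWAIT);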

    Signed-off-by: Rusty Russell
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Rusty Russell
     

28 Oct, 2009

1 commit

  • Commit 34d76c41 introduced the percpu array update_shares_data,
    whose size is proportional to NR_CPUS. Unfortunately this blows
    up ia64 for large NR_CPUS configurations, as ia64 allows only
    64k for the .percpu section.

    Fix this by allocating the array dynamically and keeping only a
    percpu pointer to it (see the sketch below).
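
    A sketch of the shape of the change; both declarations are
    illustrative, not the exact diff:

        /* Before: a full NR_CPUS-sized array is placed in the .percpu
         * section, which ia64 caps at 64k. */
        static DEFINE_PER_CPU(unsigned long [NR_CPUS], update_shares_data);

        /* After: only a pointer stays percpu-addressed; the storage is
         * allocated dynamically at boot, e.g. via __alloc_percpu(). */
        static unsigned long *update_shares_data;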

    The per-cpu handling doesn't impose a significant performance
    penalty on the potentially contended path in tg_shares_up().

    Before:

    ...
    ffffffff8104337c: 65 48 8b 14 25 20 cd mov %gs:0xcd20,%rdx
    ffffffff81043383: 00 00
    ffffffff81043385: 48 c7 c0 00 e1 00 00 mov $0xe100,%rax
    ffffffff8104338c: 48 c7 45 a0 00 00 00 movq $0x0,-0x60(%rbp)
    ffffffff81043393: 00
    ffffffff81043394: 48 c7 45 a8 00 00 00 movq $0x0,-0x58(%rbp)
    ffffffff8104339b: 00
    ffffffff8104339c: 48 01 d0 add %rdx,%rax
    ffffffff8104339f: 49 8d 94 24 08 01 00 lea 0x108(%r12),%rdx
    ffffffff810433a6: 00
    ffffffff810433a7: b9 ff ff ff ff mov $0xffffffff,%ecx
    ffffffff810433ac: 48 89 45 b0 mov %rax,-0x50(%rbp)
    ffffffff810433b0: bb 00 04 00 00 mov $0x400,%ebx
    ffffffff810433b5: 48 89 55 c0 mov %rdx,-0x40(%rbp)
    ...

    After:

    ...
    ffffffff8104337c: 65 8b 04 25 28 cd 00 mov %gs:0xcd28,%eax
    ffffffff81043383: 00
    ffffffff81043384: 48 98 cltq
    ffffffff81043386: 49 8d bc 24 08 01 00 lea 0x108(%r12),%rdi
    ffffffff8104338d: 00
    ffffffff8104338e: 48 8b 15 d3 7f 76 00 mov 0x767fd3(%rip),%rdx # ffffffff817ab368
    ffffffff81043395: 48 8b 34 c5 00 ee 6d mov -0x7e921200(,%rax,8),%rsi
    ffffffff8104339c: 81
    ffffffff8104339d: 48 c7 45 a0 00 00 00 movq $0x0,-0x60(%rbp)
    ffffffff810433a4: 00
    ffffffff810433a5: b9 ff ff ff ff mov $0xffffffff,%ecx
    ffffffff810433aa: 48 89 7d c0 mov %rdi,-0x40(%rbp)
    ffffffff810433ae: 48 c7 45 a8 00 00 00 movq $0x0,-0x58(%rbp)
    ffffffff810433b5: 00
    ffffffff810433b6: bb 00 04 00 00 mov $0x400,%ebx
    ffffffff810433bb: 48 01 f2 add %rsi,%rdx
    ffffffff810433be: 48 89 55 b0 mov %rdx,-0x50(%rbp)
    ...

    Signed-off-by: Jiri Kosina
    Acked-by: Ingo Molnar
    Signed-off-by: Tejun Heo

    Jiri Kosina
     

26 Oct, 2009

2 commits

  • The CPU time of a guest is always accounted as 'user' time,
    without regard to the nice value of its counterpart process,
    even though the guest is scheduled under that nice value.

    This patch fixes the defect and accounts the cpu time of a
    niced guest as 'nice' time, the same as for a niced process.

    The patch also adds 'guest_nice' to cpuacct. The value provides
    the niced guest's cpu time, which is to 'guest' what 'nice' is
    to 'user'.

    The original discussions can be found here:

    http://www.mail-archive.com/kvm@vger.kernel.org/msg23982.html
    http://www.mail-archive.com/kvm@vger.kernel.org/msg23860.html
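
    A sketch of the accounting split described above; tmp is the
    cputime delta being charged, and the surrounding bookkeeping in
    account_guest_time() is elided:

        /* A niced guest's time goes to the nice buckets instead of
         * the user buckets. */
        if (task_nice(p) > 0) {
                cpustat->nice       = cputime64_add(cpustat->nice, tmp);
                cpustat->guest_nice = cputime64_add(cpustat->guest_nice, tmp);
        } else {
                cpustat->user  = cputime64_add(cpustat->user, tmp);
                cpustat->guest = cputime64_add(cpustat->guest, tmp);
        }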

    Signed-off-by: Ryota Ozaki
    Acked-by: Avi Kivity
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ryota Ozaki
     
  • Conflicts:
    fs/proc/array.c

    Merge reason: resolve conflict and queue up dependent patch.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

24 Oct, 2009

1 commit

  • This patch restores the effectiveness of LAST_BUDDY in preventing
    pgsql+oltp from collapsing due to wakeup preemption. It also
    switches LAST_BUDDY to exclusively do what it does best, namely
    mitigate the effects of aggressive wakeup preemption, which
    improves vmark throughput markedly, and restores mysql+oltp
    scalability.

    Since buddies are about scalability, enable them beginning at the
    point where we begin expanding sched_latency, namely
    sched_nr_latency. Previously, buddies were cleared aggressively,
    which seriously reduced their effectiveness. Not clearing
    aggressively however, produces a small drop in mysql+oltp
    throughput immediately after peak, indicating that LAST_BUDDY is
    actually doing some harm. This is right at the point where X on the
    desktop in competition with another load wants low latency service.
    Ergo, do not enable until we need to scale.

    To mitigate latency induced by buddies, or by a task just missing
    wakeup preemption, check latency at tick time.

    The last hunk prevents buddies from stymieing BALANCE_NEWIDLE
    via CACHE_HOT_BUDDY.
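
    A sketch of the gating idea; the exact hunk is assumed, not
    copied from the patch:

        /* Only engage buddies once the queue is wide enough that
         * sched_latency starts expanding. */
        int scale = cfs_rq->nr_running >= sched_nr_latency;

        if (sched_feat(LAST_BUDDY) && scale && entity_is_task(se))
                set_last_buddy(se);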

    Supporting performance tests:

    tip = v2.6.32-rc5-1497-ga525b32
    tipx = NO_GENTLE_FAIR_SLEEPERS NEXT_BUDDY granularity knobs = 31 knobs + 31 buddies
    tip+x = NO_GENTLE_FAIR_SLEEPERS granularity knobs = 31 knobs

    (Three run averages except where noted.)

    vmark:
    ------
    tip 108466 messages per second
    tip+ 125307 messages per second
    tip+x 125335 messages per second
    tipx 117781 messages per second
    2.6.31.3 122729 messages per second

    mysql+oltp:
    -----------
    clients 1 2 4 8 16 32 64 128 256
    ..........................................................................................
    tip 9949.89 18690.20 34801.24 34460.04 32682.88 30765.97 28305.27 25059.64 19548.08
    tip+ 10013.90 18526.84 34900.38 34420.14 33069.83 32083.40 30578.30 28010.71 25605.47
    tipx 9698.71 18002.70 34477.56 33420.01 32634.30 31657.27 29932.67 26827.52 21487.18
    2.6.31.3 8243.11 18784.20 34404.83 33148.38 31900.32 31161.90 29663.81 25995.94 18058.86

    pgsql+oltp:
    -----------
    clients 1 2 4 8 16 32 64 128 256
    ..........................................................................................
    tip 13686.37 26609.25 51934.28 51347.81 49479.51 45312.65 36691.91 26851.57 24145.35
    tip+ (1x) 13907.85 27135.87 52951.98 52514.04 51742.52 50705.43 49947.97 48374.19 46227.94
    tip+x 13906.78 27065.81 52951.19 52542.59 52176.11 51815.94 50838.90 49439.46 46891.00
    tipx 13742.46 26769.81 52351.99 51891.73 51320.79 50938.98 50248.65 48908.70 46553.84
    2.6.31.3 13815.35 26906.46 52683.34 52061.31 51937.10 51376.80 50474.28 49394.47 47003.25

    Signed-off-by: Mike Galbraith
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

15 Oct, 2009

1 commit