21 Jun, 2013

1 commit


19 Jun, 2013

1 commit

  • Dave Jones hit the following bug report:

    ===============================
    [ INFO: suspicious RCU usage. ]
    3.10.0-rc2+ #1 Not tainted
    -------------------------------
    include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
    other info that might help us debug this:
    RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
    RCU used illegally from extended quiescent state!
    2 locks held by cc1/63645:
    #0: (&rq->lock){-.-.-.}, at: [] __schedule+0xed/0x9b0
    #1: (rcu_read_lock){.+.+..}, at: [] cpuacct_charge+0x5/0x1f0

    CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
    Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
    0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
    ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
    0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
    Call Trace:
    [] dump_stack+0x19/0x1b
    [] lockdep_rcu_suspicious+0xfd/0x130
    [] cpuacct_charge+0x185/0x1f0
    [] ? cpuacct_charge+0x5/0x1f0
    [] update_curr+0xec/0x240
    [] put_prev_task_fair+0x228/0x480
    [] __schedule+0x161/0x9b0
    [] preempt_schedule+0x51/0x80
    [] ? __cond_resched_softirq+0x60/0x60
    [] ? retint_careful+0x12/0x2e
    [] ftrace_ops_control_func+0x1dc/0x210
    [] ftrace_call+0x5/0x2f
    [] ? retint_careful+0xb/0x2e
    [] ? schedule_user+0x5/0x70
    [] ? schedule_user+0x5/0x70
    [] ? retint_careful+0x12/0x2e
    ------------[ cut here ]------------

    What happened was that the function tracer traced the schedule_user() code
    that tells RCU that the system is coming back from userspace, and to
    add the CPU back to the RCU monitoring.

    Because the function tracer calls preempt_disable_notrace() and
    preempt_enable_notrace(), and preempt_enable_notrace() checks the
    NEED_RESCHED flag, preempt_schedule() is called if that flag is set. But
    this happens before the user_exit() function can inform the kernel that
    the CPU is no longer in user mode and needs to be accounted for by RCU.

    The fix is to create a new preempt_schedule_context() that checks if
    the kernel is still in user mode and, if so, switches it to kernel mode
    before calling schedule. It also switches back to user mode on return
    from schedule if need be.

    The only user of this currently is the preempt_enable_notrace(), which is
    only used by the tracing subsystem.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home
    Signed-off-by: Ingo Molnar

    Steven Rostedt
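
The fix described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every `sketch_` name below is hypothetical and is not a kernel API, and the real preempt_schedule_context() also has to deal with preemption counts and tracing, which are left out here. The model only shows the ordering idea: if the CPU is still marked as being in user mode, switch it to kernel mode before scheduling and restore user mode afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum sketch_ctx { SKETCH_IN_KERNEL, SKETCH_IN_USER };

struct sketch_cpu {
    enum sketch_ctx ctx;  /* what context tracking believes */
    bool rcu_watching;    /* false while in the userspace extended quiescent state */
    int schedule_calls;   /* how many times the scheduler model ran */
};

static void sketch_enter_kernel(struct sketch_cpu *cpu)
{
    cpu->ctx = SKETCH_IN_KERNEL;
    cpu->rcu_watching = true;
}

static void sketch_enter_user(struct sketch_cpu *cpu)
{
    cpu->ctx = SKETCH_IN_USER;
    cpu->rcu_watching = false;
}

/* Scheduler model: it relies on RCU, so RCU must be watching this CPU. */
static void sketch_schedule(struct sketch_cpu *cpu)
{
    assert(cpu->rcu_watching);
    cpu->schedule_calls++;
}

/* Context-aware preemption: fix up the tracked context around the schedule call. */
static void sketch_preempt_schedule_context(struct sketch_cpu *cpu)
{
    enum sketch_ctx prev = cpu->ctx;

    if (prev == SKETCH_IN_USER)
        sketch_enter_kernel(cpu);   /* tell RCU the CPU is back */

    sketch_schedule(cpu);

    if (prev == SKETCH_IN_USER)
        sketch_enter_user(cpu);     /* restore the state the caller expects */
}
```

Calling sketch_schedule() directly on a CPU that is still in SKETCH_IN_USER would trip the assertion, which loosely mirrors the "RCU used illegally" report above.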
     

31 May, 2013

1 commit

    The kvm_host.h header file does not cope well with being
    included on archs that don't support KVM.

    This results in build failures for such archs when they
    want to implement context tracking, because this subsystem
    includes kvm_host.h in order to implement the
    guest_enter/exit APIs but doesn't handle the KVM-off case.

    To fix this, move the guest_enter()/guest_exit()
    declarations and generic implementation to the context
    tracking headers. These generic APIs actually belong to
    this subsystem, alongside other domain boundary tracking
    such as user_enter() et al.

    KVM now properly becomes a user of this library, not the
    other buggy way around.

    Reported-by: Kevin Hilman
    Reviewed-by: Kevin Hilman
    Tested-by: Kevin Hilman
    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Kevin Hilman
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
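
The dependency direction described in this commit can be sketched in a few lines. This is a hedged illustration only: the `demo_` names and the structure fields are hypothetical, and the real guest_enter()/guest_exit() do more than toggle a flag. The point is that the generic implementation lives with the tracking code, and the hypervisor-like caller is just a user of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-task structure. */
struct demo_task {
    bool in_guest;          /* models a "running guest code" flag */
    unsigned long switches; /* counts guest boundary crossings */
};

/*
 * Generic implementation owned by the tracking library itself, so that
 * building this code never depends on a hypervisor header being usable.
 */
static inline void demo_guest_enter(struct demo_task *t)
{
    t->in_guest = true;
    t->switches++;
}

static inline void demo_guest_exit(struct demo_task *t)
{
    t->in_guest = false;
    t->switches++;
}

/* A hypervisor-like user of the library: it calls in, not the other way around. */
static inline void demo_run_guest_once(struct demo_task *t)
{
    demo_guest_enter(t);
    assert(t->in_guest);    /* guest code would run here */
    demo_guest_exit(t);
}
```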
     

20 Feb, 2013

1 commit

  • Pull scheduler changes from Ingo Molnar:
    "Main changes:

    - scheduler side full-dynticks (user-space execution is undisturbed
    and receives no timer IRQs) preparation changes that convert the
    cputime accounting code to be full-dynticks ready, from Frederic
    Weisbecker.

    - Initial sched.h split-up changes, by Clark Williams

    - select_idle_sibling() performance improvement by Mike Galbraith:

    " 1 tbench pair (worst case) in a 10 core + SMT package:

    pre 15.22 MB/sec 1 procs
    post 252.01 MB/sec 1 procs "

    - sched_rr_get_interval() ABI fix/change. We think this detail is not
    used by apps (so it's not an ABI in practice), but let's keep it
    under observation.

    - misc RT scheduling cleanups, optimizations"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    sched/rt: Add header to
    cputime: Remove irqsave from seqlock readers
    sched, powerpc: Fix sched.h split-up build failure
    cputime: Restore CPU_ACCOUNTING config defaults for PPC64
    sched/rt: Move rt specific bits into new header file
    sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice
    sched: Move sched.h sysctl bits into separate header
    sched: Fix signedness bug in yield_to()
    sched: Fix select_idle_sibling() bouncing cow syndrome
    sched/rt: Further simplify pick_rt_task()
    sched/rt: Do not account zero delta_exec in update_curr_rt()
    cputime: Safely read cputime of full dynticks CPUs
    kvm: Prepare to add generic guest entry/exit callbacks
    cputime: Use accessors to read task cputime stats
    cputime: Allow dynamic switch between tick/virtual based cputime accounting
    cputime: Generic on-demand virtual cputime accounting
    cputime: Move default nsecs_to_cputime() to jiffies based cputime file
    cputime: Librarize per nsecs resolution cputime definitions
    cputime: Avoid multiplication overflow on utime scaling
    context_tracking: Export context state for generic vtime
    ...

    Fix up conflict in kernel/context_tracking.c due to comment additions.

    Linus Torvalds
     

28 Jan, 2013

2 commits

  • While remotely reading the cputime of a task running on a
    full dynticks CPU, the values stored in the utime/stime fields
    of struct task_struct may be stale. They may be those
    of the last kernel/user transition time snapshot, and
    we need to add the tickless time spent since this snapshot.

    To fix this, flush the cputime of the dynticks CPUs on
    kernel/user transition and record the time / context
    where we did this. Then, based on this snapshot and the current
    time, perform the fixup on the reader side from the task_times()
    accessors.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    [fixed kvm module related build errors]
    Signed-off-by: Sedat Dilek

    Frederic Weisbecker
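
The snapshot-plus-fixup scheme described in this commit can be sketched as follows. This is a simplified model, not the kernel implementation: all `sketch_` names are hypothetical, time is a plain nanosecond counter passed in by the caller, and the synchronization a real remote reader needs to get a consistent view is deliberately omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; names and layout are assumptions, not kernel code. */
enum sketch_where { SKETCH_USER, SKETCH_KERNEL };

struct sketch_task_time {
    uint64_t utime;           /* flushed user time, in ns */
    uint64_t stime;           /* flushed system time, in ns */
    uint64_t snapshot;        /* timestamp of the last flush */
    enum sketch_where where;  /* context entered at that flush */
};

/*
 * Writer side: on a kernel/user transition, charge the elapsed time to the
 * context being left, then record when and where the flush happened.
 */
static void sketch_transition(struct sketch_task_time *t,
                              enum sketch_where entering, uint64_t now)
{
    assert(now >= t->snapshot);
    uint64_t delta = now - t->snapshot;

    if (t->where == SKETCH_USER)
        t->utime += delta;
    else
        t->stime += delta;

    t->snapshot = now;
    t->where = entering;
}

/*
 * Reader side: the stored counters may be stale, so add the time spent
 * since the snapshot to the counter matching the recorded context.
 */
static void sketch_task_times(const struct sketch_task_time *t, uint64_t now,
                              uint64_t *utime, uint64_t *stime)
{
    assert(now >= t->snapshot);
    uint64_t delta = now - t->snapshot;

    *utime = t->utime;
    *stime = t->stime;
    if (t->where == SKETCH_USER)
        *utime += delta;
    else
        *stime += delta;
}
```

The reader never modifies the task's counters; it only computes corrected values, so a remote CPU can read them without disturbing the tickless CPU.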
     
  • If we want to stop the tick beyond idle, we need to be
    able to account the cputime without using the tick.

    Virtual based cputime accounting solves that problem by
    hooking into kernel/user boundaries.

    However, implementing CONFIG_VIRT_CPU_ACCOUNTING requires
    low level hooks and involves more overhead. But we already
    have a generic context tracking subsystem that is required
    for RCU needs by archs which plan to shut down the tick
    outside idle.

    This patch implements a generic virtual based cputime
    accounting that relies on these generic kernel/user hooks.

    There are some upsides of doing this:

    - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
    if context tracking is already built (already necessary for RCU in full
    tickless mode).

    - We can rely on the generic context tracking subsystem to dynamically
    (de)activate the hooks, so that we can switch anytime between virtual
    and tick based accounting. This way we don't have the overhead
    of the virtual accounting when the tick is running periodically.

    And one downside:

    - There is probably more overhead than a native virtual based cputime
    accounting. But this relies on hooks that are already set anyway.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
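
A minimal sketch of the idea in this commit, assuming a toy model in which time is a counter supplied by the caller: the same accounting routine is driven either by a periodic tick or by the kernel/user boundary hooks, and a flag switches between the two at run time. All `sketch_` names are hypothetical and this is not how the kernel structures the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of switchable cputime accounting. */
struct sketch_acct {
    bool vtime_active;  /* true: boundary hooks account; false: the tick does */
    bool in_user;       /* context the CPU is currently running in */
    uint64_t utime;
    uint64_t stime;
    uint64_t last;      /* time of the last accounting point */
};

/* Charge the time since the last accounting point to the current context. */
static void sketch_account(struct sketch_acct *a, uint64_t now)
{
    assert(now >= a->last);
    uint64_t delta = now - a->last;

    if (a->in_user)
        a->utime += delta;
    else
        a->stime += delta;
    a->last = now;
}

/* Periodic tick: only accounts while the virtual hooks are switched off. */
static void sketch_tick(struct sketch_acct *a, uint64_t now)
{
    if (!a->vtime_active)
        sketch_account(a, now);
}

/* Boundary hooks: account precisely at the transition when active. */
static void sketch_user_enter(struct sketch_acct *a, uint64_t now)
{
    if (a->vtime_active)
        sketch_account(a, now);
    a->in_user = true;
}

static void sketch_user_exit(struct sketch_acct *a, uint64_t now)
{
    if (a->vtime_active)
        sketch_account(a, now);
    a->in_user = false;
}

/* Switch modes; flush first so no interval is charged twice or lost. */
static void sketch_set_vtime(struct sketch_acct *a, bool on, uint64_t now)
{
    sketch_account(a, now);
    a->vtime_active = on;
}
```

In tick mode the model charges the whole interval to whichever context is current when the tick fires, which is the usual sampling imprecision of tick-based accounting; in virtual mode every interval is charged exactly at the boundary.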
     

27 Jan, 2013

2 commits

  • This subsystem lacks explanations of its purpose and
    design. Add these missing comments.

    v4: Document function parameter to be more kernel-doc
    friendly, as per Namhyung's suggestion.

    Reported-by: Andrew Morton
    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Signed-off-by: Paul E. McKenney

    Frederic Weisbecker
     
  • Export the context state: whether we run in user or kernel
    context from the context tracking subsystem's point of view.

    This is going to be used by the generic virtual cputime
    accounting subsystem that is needed to implement full
    dynticks.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

01 Dec, 2012

1 commit

  • Create a new subsystem that probes on kernel boundaries
    to keep track of the transitions between level contexts
    with two basic initial contexts: user or kernel.

    This is an abstraction of some RCU code that uses such tracking
    to implement its userspace extended quiescent state.

    We need to pull this up from RCU into this new level of indirection
    because this tracking is also going to be used to implement an "on
    demand" generic virtual cputime accounting. A necessary step to
    shut down the tick while still accounting the cputime.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Li Zhong
    Cc: Gilad Ben-Yossef
    Reviewed-by: Steven Rostedt
    [ paulmck: fix whitespace error and email address. ]
    Signed-off-by: Paul E. McKenney

    Frederic Weisbecker
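
The subsystem introduced by this commit can be pictured as a small per-CPU state machine. The sketch below is a user-space model under stated assumptions: the `model_` names are hypothetical, and the counters merely stand in for the notifications that RCU would receive when the CPU enters or leaves its userspace extended quiescent state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU tracking state; not the kernel's data structure. */
enum model_state { MODEL_IN_KERNEL, MODEL_IN_USER };

struct model_tracking {
    bool active;             /* tracking can be switched off per CPU */
    enum model_state state;  /* last context seen at a boundary probe */
    int rcu_eqs_enters;      /* stand-in for "RCU, stop watching this CPU" */
    int rcu_eqs_exits;       /* stand-in for "RCU, watch this CPU again" */
};

/* Probe placed where the kernel is about to resume userspace. */
static void model_user_enter(struct model_tracking *t)
{
    assert(t);
    if (t->active && t->state != MODEL_IN_USER) {
        t->rcu_eqs_enters++;
        t->state = MODEL_IN_USER;
    }
}

/* Probe placed where the kernel is entered from userspace. */
static void model_user_exit(struct model_tracking *t)
{
    assert(t);
    if (t->active && t->state == MODEL_IN_USER) {
        t->rcu_eqs_exits++;
        t->state = MODEL_IN_KERNEL;
    }
}
```

Checking the current state before acting makes the probes idempotent, so calling the same probe twice in a row notifies the consumer only once. Other consumers, such as cputime accounting, could be driven from the same two probes.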