24 Nov, 2015

1 commit

  • This is much less error-prone than the old code.

    Signed-off-by: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/812df7e64f120c5c7c08481f36a8caa9f53b2199.1447361906.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

10 Nov, 2015

2 commits

  • guest_enter and guest_exit must be called with interrupts disabled,
    since they take the vtime_seqlock with write_seq{lock,unlock}.
    Therefore, it is not necessary to check for exceptions, nor to
    save/restore the IRQ state, when context tracking functions are
    called by guest_enter and guest_exit.

    Split the body of context_tracking_enter and context_tracking_exit
    out to __-prefixed functions, and use them from KVM.

    Rik van Riel has measured this to speed up a tight vmentry/vmexit
    loop by about 2%.
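
    As a rough, compilable userspace illustration of the split (the
    IRQ-flag model and names are stand-ins, not the kernel's actual
    code), the __-prefixed inner function assumes interrupts are
    already disabled, while the outer API keeps the save/restore
    dance for generic callers:

    #include <stdbool.h>
    #include <stdio.h>

    static bool irqs_enabled = true;        /* stand-in for the CPU IRQ flag */

    static unsigned long local_irq_save(void)
    {
            unsigned long flags = irqs_enabled;
            irqs_enabled = false;
            return flags;
    }

    static void local_irq_restore(unsigned long flags)
    {
            irqs_enabled = flags;
    }

    /* Inner helper: the caller guarantees interrupts are disabled,
     * so the write seqlock could be taken without saving IRQ state. */
    static void __context_tracking_enter(void)
    {
            printf("enter, irqs_enabled=%d\n", irqs_enabled);
    }

    /* Outer API: callable from any context. */
    static void context_tracking_enter(void)
    {
            unsigned long flags = local_irq_save();
            __context_tracking_enter();
            local_irq_restore(flags);
    }

    int main(void)
    {
            unsigned long flags;

            context_tracking_enter();       /* generic caller */

            flags = local_irq_save();       /* KVM-style caller: IRQs already off */
            __context_tracking_enter();
            local_irq_restore(flags);
            return 0;
    }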

    Cc: Andy Lutomirski
    Cc: Frederic Weisbecker
    Cc: Paul McKenney
    Reviewed-by: Rik van Riel
    Tested-by: Rik van Riel
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • All calls to context_tracking_enter and context_tracking_exit
    already check context_tracking_is_enabled, except for the
    context_tracking_user_enter and context_tracking_user_exit
    functions left in for the benefit of assembly callers.

    Pull the check up into those functions by making them simple
    wrappers around the user_enter and user_exit inline functions.
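
    A compilable sketch of the resulting shape, with an ordinary
    bool standing in for the kernel's context_tracking_is_enabled()
    check:

    #include <stdbool.h>

    static bool context_tracking_enabled;   /* a static key in the kernel */

    static void context_tracking_enter(void) { /* tracking work */ }
    static void context_tracking_exit(void)  { /* tracking work */ }

    /* Inline helpers used by C callers: the enabled check lives here now. */
    static inline void user_enter(void)
    {
            if (context_tracking_enabled)
                    context_tracking_enter();
    }

    static inline void user_exit(void)
    {
            if (context_tracking_enabled)
                    context_tracking_exit();
    }

    /* Kept for assembly callers; now simple wrappers. */
    void context_tracking_user_enter(void) { user_enter(); }
    void context_tracking_user_exit(void)  { user_exit(); }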

    Cc: Frederic Weisbecker
    Cc: Paul McKenney
    Reviewed-by: Rik van Riel
    Tested-by: Rik van Riel
    Acked-by: Andy Lutomirski
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

07 May, 2015

2 commits

  • TIF_NOHZ is used by context tracking to force the syscall
    slow path on every task in order to track userspace round trips.
    As such, it must be set on all running tasks.

    It's currently inherited explicitly through context switches.
    There is no need to do this in that fast path, though. The flag
    can simply be set once and for all, on all tasks, whether they
    are running or not.

    Let's do this by setting the flag on the init task at early boot
    and letting it propagate through fork inheritance.

    While at it, mark context_tracking_cpu_set() as init code; we
    only need it at early boot time.
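
    A minimal userspace model of the inheritance scheme (illustrative
    types and bit value; not kernel code): the flag is set once on the
    init task, and fork's copy of the parent naturally propagates it.

    #include <stdio.h>

    #define TIF_NOHZ (1u << 0)              /* illustrative bit value */

    struct task {
            unsigned int flags;
    };

    static struct task init_task;

    /* Early boot: set the flag once; no per-switch work needed. */
    static void context_tracking_cpu_set(void)
    {
            init_task.flags |= TIF_NOHZ;
    }

    /* fork: the child starts as a copy of its parent, flags included. */
    static struct task fork_task(const struct task *parent)
    {
            return *parent;
    }

    int main(void)
    {
            struct task child;

            context_tracking_cpu_set();
            child = fork_task(&init_task);
            printf("child has TIF_NOHZ: %d\n", !!(child.flags & TIF_NOHZ));
            return 0;
    }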

    Suggested-by: Oleg Nesterov
    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Rik van Riel
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Dave Jones
    Cc: H. Peter Anvin
    Cc: Martin Schwidefsky
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1430928266-24888-3-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Context tracking recursion can happen when an exception triggers
    in the middle of a call to a context tracking probe.

    This special case can be caused by vmalloc faults. If an access
    to a memory area allocated by vmalloc happens in the middle of
    context_tracking_enter(), we may run into an endless fault loop
    because the exception in turn calls context_tracking_enter()
    which faults on the same vmalloc'ed memory, triggering an
    exception again, etc...

    Some rare crashes have been reported, so let's protect against
    this with a recursion counter.
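
    A compilable userspace model of such a guard, patterned after
    (but not copied from) the kernel's approach, with a plain int
    standing in for the per-CPU counter:

    #include <stdio.h>

    static int ct_recursion;                /* per-CPU in the kernel */

    static int context_tracking_recursion_enter(void)
    {
            if (++ct_recursion == 1)
                    return 1;               /* first entry: proceed */

            /* Re-entered, e.g. via a fault taken inside the probe:
             * warn and back out instead of faulting forever. */
            fprintf(stderr, "context tracking recursion detected\n");
            ct_recursion--;
            return 0;
    }

    static void context_tracking_recursion_exit(void)
    {
            ct_recursion--;
    }

    void context_tracking_enter(void)
    {
            if (!context_tracking_recursion_enter())
                    return;                 /* break the endless fault loop */
            /* ... tracking work that could fault and re-enter ... */
            context_tracking_recursion_exit();
    }

    int main(void)
    {
            context_tracking_enter();
            return 0;
    }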

    Reported-by: Dave Jones
    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra (Intel)
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: H. Peter Anvin
    Cc: Martin Schwidefsky
    Cc: Mike Galbraith
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1430928266-24888-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

09 Mar, 2015

4 commits

  • Export context_tracking_user_enter/exit so they can be used by KVM.

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Rik van Riel
    Cc: Paul E. McKenney
    Cc: Andy Lutomirski
    Cc: Will Deacon
    Cc: Marcelo Tosatti
    Cc: Christian Borntraeger
    Cc: Luiz Capitulino
    Cc: Paolo Bonzini
    Signed-off-by: Frederic Weisbecker

    Rik van Riel
     
  • Only run vtime_user_enter, vtime_user_exit, and the user enter & exit
    trace points when we are entering or exiting user state, respectively.

    The KVM code in guest_enter and guest_exit already takes care of
    calling vtime_guest_enter and vtime_guest_exit, respectively.

    The RCU code only distinguishes between "idle" and "not idle or kernel".
    There should be no need to add an additional (unused) state there.

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Rik van Riel
    Cc: Paul E. McKenney
    Cc: Andy Lutomirski
    Cc: Will Deacon
    Cc: Marcelo Tosatti
    Cc: Christian Borntraeger
    Cc: Luiz Capitulino
    Cc: Paolo Bonzini
    Signed-off-by: Frederic Weisbecker

    Rik van Riel
     
  • Generalize the context tracking APIs to support contexts of
    various kinds. This is done by splitting the mechanism out of
    context_tracking_user_enter and context_tracking_user_exit into
    context_tracking_enter and context_tracking_exit.

    The kind of context we track is now described by a ctx_state
    parameter passed to these APIs, allowing the same functions to
    track not just kernel <-> user space switching, but also
    kernel <-> guest transitions.

    But leave the old functions in place to avoid breaking ARM, which
    calls them from assembler code and cannot easily pass C enum
    parameters.
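
    For illustration, a compilable sketch of the generalized API
    shape (illustrative, not a verbatim copy of the kernel code;
    CONTEXT_GUEST is shown as the transition kind that motivates
    the change):

    enum ctx_state {
            CONTEXT_KERNEL = 0,
            CONTEXT_USER,
            CONTEXT_GUEST,
    };

    void context_tracking_enter(enum ctx_state state)
    {
            /* switch the tracked state from CONTEXT_KERNEL to 'state' */
            (void)state;
    }

    void context_tracking_exit(enum ctx_state state)
    {
            /* switch the tracked state from 'state' back to CONTEXT_KERNEL */
            (void)state;
    }

    /* Old entry points kept: ARM assembly cannot easily pass C enums. */
    void context_tracking_user_enter(void)
    {
            context_tracking_enter(CONTEXT_USER);
    }

    void context_tracking_user_exit(void)
    {
            context_tracking_exit(CONTEXT_USER);
    }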

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Rik van Riel
    Cc: Paul E. McKenney
    Cc: Andy Lutomirski
    Cc: Will Deacon
    Cc: Marcelo Tosatti
    Cc: Christian Borntraeger
    Cc: Luiz Capitulino
    Cc: Paolo Bonzini
    Signed-off-by: Frederic Weisbecker

    Rik van Riel
     
  • The current context tracking symbols are designed to express the
    state we are residing in. As such they are prefixed with "IN_":
    IN_USER, IN_KERNEL.

    Now we are going to use these symbols to also express state
    transitions, such as context_tracking_enter(IN_USER) or
    context_tracking_exit(IN_USER). But while the "IN_" prefix works
    well to express entering a context, it's confusing when depicting
    a context exit: context_tracking_exit(IN_USER) could mean two
    things:

    1) We are exiting the current context to enter user context.
    2) We are exiting the user context.

    We want 2), but a reviewer may misread it as 1).

    So let's disambiguate these symbols and rename them to
    CONTEXT_USER and CONTEXT_KERNEL.

    Acked-by: Rik van Riel
    Cc: Paul E. McKenney
    Cc: Andy Lutomirski
    Cc: Will Deacon
    Cc: Marcelo Tosatti
    Cc: Christian Borntraeger
    Cc: Luiz Capitulino
    Cc: Paolo Bonzini
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     

28 Oct, 2014

1 commit

  • preempt_schedule_context() does preempt_enable_notrace() at the
    end, and this can call the same function again; exception_exit()
    is heavy and it is quite possible that need-resched is true again.

    1. Change this code to decrement preempt_count() and check
    need_resched() by hand.

    2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid
    the enable/disable dance around __schedule(). But in this case
    we need to move it into sched/core.c.

    3. Cosmetic, but x86 forgets to declare this function. This doesn't
    really matter because it is only called by asm helpers; still, it
    makes sense to add the declaration to asm/preempt.h to match
    preempt_schedule().
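
    A compilable userspace model of the reworked loop from points 1
    and 2 (the scheduler is faked; names mirror the kernel's, but
    this is a sketch, not the actual implementation):

    #include <stdbool.h>

    #define PREEMPT_ACTIVE (1 << 16)        /* illustrative bit value */

    static int preempt_count;
    static int resched_pending = 1;

    static bool need_resched(void)
    {
            return resched_pending != 0;
    }

    static int exception_enter(void)        /* heavy in the real kernel */
    {
            return 0;
    }

    static void exception_exit(int prev_ctx)
    {
            (void)prev_ctx;
    }

    static void __schedule(void)
    {
            resched_pending--;              /* pretend we scheduled */
    }

    void preempt_schedule_context(void)
    {
            int prev_ctx;

            do {
                    preempt_count += PREEMPT_ACTIVE;   /* like __preempt_count_add() */
                    prev_ctx = exception_enter();
                    __schedule();
                    exception_exit(prev_ctx);
                    preempt_count -= PREEMPT_ACTIVE;   /* like __preempt_count_sub() */
            } while (need_resched());       /* recheck by hand: no recursion */
    }

    int main(void)
    {
            preempt_schedule_context();
            return 0;
    }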

    Reported-by: Sasha Levin
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Graf
    Cc: Andrew Morton
    Cc: Christoph Lameter
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Steven Rostedt
    Cc: Peter Anvin
    Cc: Andy Lutomirski
    Cc: Denys Vlasenko
    Cc: Chuck Ebbert
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

14 Jun, 2014

1 commit

  • This essentially reverts commit:

    ecd50f714c42 ("kprobes, x86: Call exception_enter after kprobes handled")

    since it causes build errors with CONFIG_CONTEXT_TRACKING and
    was based on a misunderstanding: context_track_user_*() doesn't
    do much in interrupt context, it just returns if in_interrupt()
    is true.

    Instead of changing do_debug()/do_int3(), this just adds
    context_track_user_*() to the kprobes blacklist, since they can
    still be called right before kprobes handles the int3 and debug
    exceptions, and probing them would cause an infinite loop.

    Reported-by: Frederic Weisbecker
    Signed-off-by: Masami Hiramatsu
    Cc: Borislav Petkov
    Cc: Kees Cook
    Cc: Jiri Kosina
    Cc: Rusty Russell
    Cc: Steven Rostedt
    Cc: Seiji Aguchi
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20140614064711.7865.45957.stgit@kbuild-fedora.novalocal
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

27 Sep, 2013

1 commit

  • ad65782fba50 (context_tracking: Optimize main APIs off case
    with static key) converted the main context tracking APIs to
    inline functions and left the ARM asm callers behind.

    This can easily be fixed by making ARM call the post-static-key
    context tracking functions. We just need to replicate the
    static key checks there. We'll remove these later, once ARM
    supports the context tracking static keys.

    Reported-by: Guenter Roeck
    Reported-by: Russell King
    Signed-off-by: Frederic Weisbecker
    Tested-by: Kevin Hilman
    Cc: Nicolas Pitre
    Cc: Anil Kumar
    Cc: Tony Lindgren
    Cc: Benoit Cousson
    Cc: Guenter Roeck
    Cc: Russell King
    Cc: Kevin Hilman

    Frederic Weisbecker
     

25 Sep, 2013

1 commit

  • Rewrite the preempt_count macros in order to extract the 3 basic
    preempt_count value modifiers:

    __preempt_count_add()
    __preempt_count_sub()

    and the new:

    __preempt_count_dec_and_test()

    And since we're at it anyway, replace the unconventional
    $op_preempt_count names with the more conventional preempt_count_$op.

    Since these basic operators are equivalent to the previous _notrace()
    variants, do away with the _notrace() versions.
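
    Below is a compilable userspace sketch of the three basic
    modifiers named above, modeled on a plain int rather than the
    kernel's per-CPU or thread-info storage:

    #include <stdbool.h>
    #include <stdio.h>

    static int preempt_count_val;

    static inline void __preempt_count_add(int val)
    {
            preempt_count_val += val;
    }

    static inline void __preempt_count_sub(int val)
    {
            preempt_count_val -= val;
    }

    /* Decrement and report whether the count dropped to zero, i.e.
     * preemption just became possible again. */
    static inline bool __preempt_count_dec_and_test(void)
    {
            return --preempt_count_val == 0;
    }

    int main(void)
    {
            __preempt_count_add(1);         /* e.g. from preempt_count_add() */
            if (__preempt_count_dec_and_test())
                    printf("count hit zero: reschedule point\n");
            return 0;
    }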

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-ewbpdbupy9xpsjhg960zwbv8@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

14 Aug, 2013

5 commits

  • This can be useful for tracking all kernel/user round trips,
    and it's also helpful for debugging the context tracking subsystem.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • There is no need for the syscall slow path if no CPU runs full
    dynticks; nop it out in that case.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • Optimize the guest entry/exit APIs with static keys. This
    minimizes the overhead for those who enable CONFIG_NO_HZ_FULL
    without always using it. Passing no range to nohz_full= should
    result in minimal probe overhead.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • Optimize the user and exception entry/exit APIs with static
    keys. This minimizes the overhead for those who enable
    CONFIG_NO_HZ_FULL without always using it. Passing no range
    to nohz_full= should result in the probes being nopped out
    (at least we hope so...).

    If this proves not to be enough in the long term, we'll need
    to add an exception slow path by re-routing the exception
    handlers.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • Prepare for using a static key in the context tracking subsystem.
    This will help optimize the off case for its many users (see the
    sketch after the list):

    * user_enter, user_exit, exception_enter, exception_exit, guest_enter,
    guest_exit, vtime_*()
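
    As a rough, compilable illustration of what the off case looks
    like with a static-key-style check (an ordinary bool plus a
    branch hint stands in for the arch-patched jump label):

    #include <stdbool.h>

    #define unlikely(x) __builtin_expect(!!(x), 0)

    static bool context_tracking_key;       /* a static_key in the kernel */

    static inline bool context_tracking_is_enabled(void)
    {
            /* With a real static key this branch is patched out
             * entirely while the key is off. */
            return unlikely(context_tracking_key);
    }

    static void context_tracking_user_enter(void)
    {
            /* slow path: the actual tracking work */
    }

    /* What every fast-path user compiles down to: */
    static inline void user_enter(void)
    {
            if (context_tracking_is_enabled())
                    context_tracking_user_enter();
    }

    int main(void)
    {
            user_enter();                   /* key off: near-zero cost */
            return 0;
    }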

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     

13 Aug, 2013

4 commits

  • The context tracking subsystem has the ability to selectively
    enable tracking on any defined subset of CPUs. This means that
    we can define one CPU range that doesn't run the context tracking
    and another range that does.

    Now what we want in practice is to enable the tracking on full
    dynticks CPUs only. To do this, we just need to pass our full
    dynticks CPU range selection from the full dynticks subsystem
    to the context tracking code.

    This way we can spare the overhead of the RCU user extended
    quiescent state and of vtime maintenance on the CPUs that are
    outside the full dynticks range. Just keep in mind that the raw
    context tracking itself is still necessary everywhere.
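
    A minimal, compilable model of that hand-off, with illustrative
    names and types (the real code uses cpumasks and per-CPU
    variables; tick_nohz_pass_range() is a hypothetical name):

    #include <stdbool.h>

    #define NR_CPUS 8

    static bool context_tracking_active[NR_CPUS];  /* per-CPU flag */

    /* Context tracking side: mark one CPU as tracked. */
    void context_tracking_cpu_set(int cpu)
    {
            context_tracking_active[cpu] = true;
    }

    /* Full dynticks side: pass its CPU range along at boot.
     * (Hypothetical name for the caller in this sketch.) */
    void tick_nohz_pass_range(const bool full_dynticks[NR_CPUS])
    {
            for (int cpu = 0; cpu < NR_CPUS; cpu++)
                    if (full_dynticks[cpu])
                            context_tracking_cpu_set(cpu);
    }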

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • As long as context tracking is enabled on any CPU, even a
    single one, all other CPUs need to keep track of their
    user/kernel boundary crossings as well.

    This is because a task can sleep while servicing an exception
    that happened in the kernel or in userspace. When the task
    eventually wakes up and returns from the exception, the CPU needs
    to know whether we resume in userspace or in the kernel.
    exception_exit() gets this information from exception_enter(),
    which saved the previous state.

    If the CPU where the exception happened didn't keep track of
    this information, exception_exit() doesn't know which state to
    restore on the CPU where the task got migrated, and we may
    return to userspace with the context tracking subsystem thinking
    that we are in kernel mode.

    This can be fixed in the long term if we move our context tracking
    probes to the very low-level arch fast-path user/kernel boundary,
    although even that is worrisome, as an exception can still happen
    in the few instructions between the probe and the actual iret.

    Also, we are not yet ready to set these probes in the fast path,
    given the potential overhead problem it induces.

    So let's fix this by always enabling context tracking, even on
    CPUs that are not in the full dynticks range. OTOH we can spare
    the rcu_user_*() and vtime_user_*() calls there, because the tick
    runs on these CPUs and we can handle the RCU state machine and
    cputime accounting through it.
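
    A compilable model of the save/restore pattern this relies on
    (simplified to one global instead of per-CPU state; illustrative,
    not the kernel's code):

    enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER };

    static enum ctx_state cpu_state = CONTEXT_KERNEL;  /* per-CPU in the kernel */

    static enum ctx_state exception_enter(void)
    {
            enum ctx_state prev = cpu_state;
            cpu_state = CONTEXT_KERNEL;     /* we are in the kernel now */
            return prev;
    }

    static void exception_exit(enum ctx_state prev)
    {
            cpu_state = prev;               /* may run on a different CPU! */
    }

    void handle_exception(void)
    {
            enum ctx_state prev = exception_enter();
            /* ... may sleep; the task can wake up on another CPU ... */
            exception_exit(prev);           /* restores user vs. kernel */
    }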

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • 1) If context tracking is enabled along with native vtime
    accounting (a combo that is useless except for dev testing), we
    call vtime_guest_enter() and vtime_guest_exit() on host/guest
    switches. But those are stubs in this configuration. As a result,
    cputime is not correctly flushed on KVM context switches.

    2) If context tracking runs but is disabled on some CPUs, those
    CPUs end up calling __guest_enter/__guest_exit, which in turn
    call vtime_account_system(). We don't want to call this because
    we use tick-based accounting on these CPUs.

    Refactor the guest_enter/guest_exit code so that all combinations
    finally work.
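
    A hedged sketch of the resulting shape (the predicate, flag name,
    and bit value here are illustrative stand-ins; the real code
    differs in detail):

    #include <stdbool.h>

    #define PF_VCPU (1u << 0)               /* illustrative bit value */

    static bool vtime_accounting_enabled;   /* per-CPU decision in the kernel */
    static unsigned int current_flags;

    static void vtime_guest_enter(void)
    {
            /* flush accumulated system time, start accounting guest time */
    }

    void guest_enter(void)
    {
            if (vtime_accounting_enabled)
                    vtime_guest_enter();    /* vtime CPUs: flush here */

            /* Tick-based CPUs rely on this flag instead: the tick sees
             * it and accounts the elapsed time as guest time. */
            current_flags |= PF_VCPU;
    }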

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • preempt_schedule() and preempt_schedule_context() open-code
    their preemptability checks.

    Use the standard API instead for consolidation.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Borislav Petkov
    Cc: Alex Shi
    Cc: Paul Turner
    Cc: Mike Galbraith
    Cc: Vincent Guittot

    Frederic Weisbecker
     

19 Jun, 2013

1 commit

  • Dave Jones hit the following bug report:

    ===============================
    [ INFO: suspicious RCU usage. ]
    3.10.0-rc2+ #1 Not tainted
    -------------------------------
    include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
    other info that might help us debug this:
    RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
    RCU used illegally from extended quiescent state!
    2 locks held by cc1/63645:
    #0: (&rq->lock){-.-.-.}, at: [] __schedule+0xed/0x9b0
    #1: (rcu_read_lock){.+.+..}, at: [] cpuacct_charge+0x5/0x1f0

    CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
    Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
    0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
    ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
    0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
    Call Trace:
    [] dump_stack+0x19/0x1b
    [] lockdep_rcu_suspicious+0xfd/0x130
    [] cpuacct_charge+0x185/0x1f0
    [] ? cpuacct_charge+0x5/0x1f0
    [] update_curr+0xec/0x240
    [] put_prev_task_fair+0x228/0x480
    [] __schedule+0x161/0x9b0
    [] preempt_schedule+0x51/0x80
    [] ? __cond_resched_softirq+0x60/0x60
    [] ? retint_careful+0x12/0x2e
    [] ftrace_ops_control_func+0x1dc/0x210
    [] ftrace_call+0x5/0x2f
    [] ? retint_careful+0xb/0x2e
    [] ? schedule_user+0x5/0x70
    [] ? schedule_user+0x5/0x70
    [] ? retint_careful+0x12/0x2e
    ------------[ cut here ]------------

    What happened was that the function tracer traced the
    schedule_user() code that tells RCU the system is coming back
    from userspace, and to add the CPU back to RCU monitoring.

    Because the function tracer does preempt_disable/enable_notrace()
    calls, the preempt_enable_notrace() checks the NEED_RESCHED flag.
    If it is set, then preempt_schedule() is called. But this is called
    before the user_exit() function can inform the kernel that the CPU
    is no longer in user mode and needs to be accounted for by RCU.

    The fix is to create a new preempt_schedule_context() that checks
    if the kernel is still in user mode and, if so, switches it to
    kernel mode before calling schedule. It also switches back to
    user mode on the way back from schedule, if need be.

    The only user of this currently is preempt_enable_notrace(),
    which is only used by the tracing subsystem.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

31 May, 2013

1 commit

  • The kvm_host.h header file doesn't handle inclusion well when
    the arch doesn't support KVM.

    This results in build crashes for such archs when they want to
    implement context tracking, because this subsystem includes
    kvm_host.h in order to implement the guest_enter/exit APIs, but
    that header doesn't handle the KVM-off case.

    To fix this, move the guest_enter()/guest_exit() declarations
    and generic implementation to the context tracking headers.
    These generic APIs actually belong to this subsystem, alongside
    other domain-boundary tracking such as user_enter() et al.

    KVM now properly becomes a user of this library, not the buggy
    other way around.

    Reported-by: Kevin Hilman
    Reviewed-by: Kevin Hilman
    Tested-by: Kevin Hilman
    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Kevin Hilman
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

20 Feb, 2013

1 commit

  • Pull scheduler changes from Ingo Molnar:
    "Main changes:

    - scheduler side full-dynticks (user-space execution is undisturbed
    and receives no timer IRQs) preparation changes that convert the
    cputime accounting code to be full-dynticks ready, from Frederic
    Weisbecker.

    - Initial sched.h split-up changes, by Clark Williams

    - select_idle_sibling() performance improvement by Mike Galbraith:

    " 1 tbench pair (worst case) in a 10 core + SMT package:

    pre 15.22 MB/sec 1 procs
    post 252.01 MB/sec 1 procs "

    - sched_rr_get_interval() ABI fix/change. We think this detail is not
    used by apps (so it's not an ABI in practice), but let's keep it
    under observation.

    - misc RT scheduling cleanups, optimizations"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    sched/rt: Add header to
    cputime: Remove irqsave from seqlock readers
    sched, powerpc: Fix sched.h split-up build failure
    cputime: Restore CPU_ACCOUNTING config defaults for PPC64
    sched/rt: Move rt specific bits into new header file
    sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice
    sched: Move sched.h sysctl bits into separate header
    sched: Fix signedness bug in yield_to()
    sched: Fix select_idle_sibling() bouncing cow syndrome
    sched/rt: Further simplify pick_rt_task()
    sched/rt: Do not account zero delta_exec in update_curr_rt()
    cputime: Safely read cputime of full dynticks CPUs
    kvm: Prepare to add generic guest entry/exit callbacks
    cputime: Use accessors to read task cputime stats
    cputime: Allow dynamic switch between tick/virtual based cputime accounting
    cputime: Generic on-demand virtual cputime accounting
    cputime: Move default nsecs_to_cputime() to jiffies based cputime file
    cputime: Librarize per nsecs resolution cputime definitions
    cputime: Avoid multiplication overflow on utime scaling
    context_tracking: Export context state for generic vtime
    ...

    Fix up conflict in kernel/context_tracking.c due to comment additions.

    Linus Torvalds
     

28 Jan, 2013

2 commits

  • While remotely reading the cputime of a task running on a
    full dynticks CPU, the values stored in the utime/stime fields
    of struct task_struct may be stale. Their values may be those
    of the last kernel/user transition snapshot, and we need to add
    the tickless time spent since that snapshot.

    To fix this, flush the cputime of the dynticks CPUs on every
    kernel/user transition and record the time and context where we
    did this. Then, on top of this snapshot and the current time,
    perform the fixup on the reader side from the task_times()
    accessors.
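
    A compilable userspace model of the scheme (a seqcount-style
    counter stands in for the kernel's vtime seqlock, and the reader
    assumes, for simplicity, that the task has been in userspace
    since the snapshot; names are illustrative):

    #include <stdatomic.h>
    #include <stdio.h>
    #include <time.h>

    static _Atomic unsigned int seq;        /* plays the role of the seqlock */
    static unsigned long long utime_snap;   /* user time flushed so far */
    static unsigned long long snap_stamp;   /* when it was flushed */

    static unsigned long long now(void)
    {
            struct timespec ts;
            clock_gettime(CLOCK_MONOTONIC, &ts);
            return (unsigned long long)ts.tv_sec * 1000000000ull + ts.tv_nsec;
    }

    /* Writer: the dynticks CPU, at each kernel/user transition. */
    void vtime_flush_user(unsigned long long delta)
    {
            atomic_fetch_add(&seq, 1);      /* odd: write in progress */
            utime_snap += delta;
            snap_stamp = now();
            atomic_fetch_add(&seq, 1);      /* even: write done */
    }

    /* Reader-side fixup: snapshot + time elapsed since the snapshot. */
    unsigned long long task_utime(void)
    {
            unsigned int s;
            unsigned long long t;

            do {
                    while ((s = atomic_load(&seq)) & 1)
                            ;               /* writer active: wait */
                    t = utime_snap + (now() - snap_stamp);
            } while (atomic_load(&seq) != s);

            return t;
    }

    int main(void)
    {
            vtime_flush_user(1000);
            printf("utime now: %llu ns\n", task_utime());
            return 0;
    }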

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    [fixed kvm module related build errors]
    Signed-off-by: Sedat Dilek

    Frederic Weisbecker
     
  • If we want to stop the tick beyond idle, we need to be able to
    account the cputime without using the tick.

    Virtual-based cputime accounting solves that problem by hooking
    into kernel/user boundaries.

    However, implementing CONFIG_VIRT_CPU_ACCOUNTING requires
    low-level hooks and involves more overhead. But we already
    have a generic context tracking subsystem that is required
    for RCU by archs which plan to shut down the tick outside idle.

    This patch implements a generic virtual-based cputime
    accounting that relies on these generic kernel/user hooks.

    There are some upsides to doing this:

    - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
    if context tracking is already built (already necessary for RCU in full
    tickless mode).

    - We can rely on the generic context tracking subsystem to dynamically
    (de)activate the hooks, so that we can switch anytime between virtual
    and tick-based accounting. This way we don't have the overhead
    of the virtual accounting when the tick is running periodically.

    And one downside:

    - There is probably more overhead than with a native virtual-based
    cputime accounting. But this relies on hooks that are already set
    anyway.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

27 Jan, 2013

2 commits

  • This subsystem lacks explanation of its purpose and design. Add
    the missing comments.

    v4: Document function parameters to be more kernel-doc friendly,
    as per Namhyung's suggestion.

    Reported-by: Andrew Morton
    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Signed-off-by: Paul E. McKenney

    Frederic Weisbecker
     
  • Export the context state: whether we run in user or kernel mode,
    from the context tracking subsystem's point of view.

    This is going to be used by the generic virtual cputime
    accounting subsystem that is needed to implement full dynticks.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

01 Dec, 2012

1 commit

  • Create a new subsystem that probes on kernel boundaries to keep
    track of transitions between contexts, with two basic initial
    contexts: user and kernel.

    This is an abstraction of some RCU code that uses such tracking
    to implement its userspace extended quiescent state.

    We need to pull this up from RCU into this new level of
    indirection because the tracking is also going to be used to
    implement an "on demand" generic virtual cputime accounting, a
    necessary step toward shutting down the tick while still
    accounting the cputime.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Li Zhong
    Cc: Gilad Ben-Yossef
    Reviewed-by: Steven Rostedt
    [ paulmck: fix whitespace error and email address. ]
    Signed-off-by: Paul E. McKenney

    Frederic Weisbecker