15 Aug, 2014

1 commit

  • Benjamin Herrenschmidt pointed out that I further missed modifying
    update_vsyscall after the wall_to_mono value was changed to a
    timespec64. This causes issues on powerpc32, which expects a 32bit
    timespec.

    This patch fixes the problem by properly converting from a timespec64 to
    a timespec before passing the value on to the arch-specific vsyscall
    logic.
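
    As a rough illustration of the conversion described above, the sketch
    below narrows a 64-bit timespec to the native one in user-space C. The
    names ts64_sketch and ts64_to_native are hypothetical stand-ins and are
    not the kernel's own definitions.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a timespec with a 64-bit seconds field. */
struct ts64_sketch {
    int64_t tv_sec;
    long    tv_nsec;
};

/*
 * Narrow a 64-bit timespec to the native struct timespec.
 * Where time_t is only 32 bits wide, the cast truncates tv_sec.
 */
static struct timespec ts64_to_native(struct ts64_sketch ts64)
{
    struct timespec ts;

    ts.tv_sec = (time_t)ts64.tv_sec;
    ts.tv_nsec = ts64.tv_nsec;
    return ts;
}
```

    The point of an explicit helper is that the arch-specific code always
    receives the type it expects, whatever the core uses internally.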

    [ Thomas is currently on vacation, but reviewed it and wanted me to send
    this fix on to you directly. ]

    Cc: LKML
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Benjamin Herrenschmidt
    Reported-by: Benjamin Herrenschmidt
    Reviewed-by: Thomas Gleixner
    Signed-off-by: John Stultz
    Signed-off-by: Linus Torvalds

    John Stultz
     

06 Aug, 2014

1 commit

  • Pull timer and time updates from Thomas Gleixner:
    "A rather large update of timers, timekeeping & co

    - Core timekeeping code is year-2038 safe now for 32bit machines.
    Now we just need to fix all in kernel users and the gazillion of
    user space interfaces which rely on timespec/timeval :)

    - Better cache layout for the timekeeping internal data structures.

    - Proper nanosecond based interfaces for in kernel users.

    - Tree wide cleanup of code which wants nanoseconds but does hoops
    and loops to convert back and forth from timespecs. Some of it
    definitely belongs into the ugly code museum.

    - Consolidation of the timekeeping interface zoo.

    - A fast NMI safe accessor to clock monotonic for tracing. This is a
    long standing request to support correlated user/kernel space
    traces. With proper NTP frequency correction it's also suitable
    for correlation of traces across separate machines.

    - Checkpoint/restart support for timerfd.

    - A few NOHZ[_FULL] improvements in the [hr]timer code.

    - Code move from kernel to kernel/time of all time* related code.

    - New clocksource/event drivers from the ARM universe. I'm really
    impressed that despite an architected timer in the newer chips SoC
    manufacturers insist on inventing new and differently broken SoC
    specific timers.

    [ Ed. "Impressed"? I don't think that word means what you think it means ]

    - Another round of code move from arch to drivers. Looks like most
    of the legacy mess in ARM regarding timers is sorted out except for
    a few obnoxious strongholds.

    - The usual updates and fixlets all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
    timekeeping: Fixup typo in update_vsyscall_old definition
    clocksource: document some basic timekeeping concepts
    timekeeping: Use cached ntp_tick_length when accumulating error
    timekeeping: Rework frequency adjustments to work better w/ nohz
    timekeeping: Minor fixup for timespec64->timespec assignment
    ftrace: Provide trace clocks monotonic
    timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
    seqcount: Add raw_write_seqcount_latch()
    seqcount: Provide raw_read_seqcount()
    timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
    timekeeping: Create struct tk_read_base and use it in struct timekeeper
    timekeeping: Restructure the timekeeper some more
    clocksource: Get rid of cycle_last
    clocksource: Move cycle_last validation to core code
    clocksource: Make delta calculation a function
    wireless: ath9k: Get rid of timespec conversions
    drm: vmwgfx: Use nsec based interfaces
    drm: i915: Use nsec based interfaces
    timekeeping: Provide ktime_get_raw()
    hangcheck-timer: Use ktime_get_ns()
    ...

    Linus Torvalds
     

05 Aug, 2014

3 commits

  • Pull staging driver updates from Greg KH:
    "Here's the big pull request for the staging driver tree for 3.17-rc1.

    Lots of things in here, over 2000 patches, but the best part is this:
    1480 files changed, 39070 insertions(+), 254659 deletions(-)

    Thanks to the great work of Kristina Martšenko, 14 different staging
    drivers have been removed from the tree as they were obsolete and no
    one was willing to work on cleaning them up. Other than the driver
    removals, loads of cleanups are in here (comedi, lustre, etc.) as well
    as the usual IIO driver updates and additions.

    All of this has been in the linux-next tree for a while"

    * tag 'staging-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (2199 commits)
    staging: comedi: addi_apci_1564: remove diagnostic interrupt support code
    staging: comedi: addi_apci_1564: add subdevice to check diagnostic status
    staging: wlan-ng: coding style problem fix
    staging: wlan-ng: fixing coding style problems
    staging: comedi: ii_pci20kc: request and ioremap memory
    staging: lustre: bitwise vs logical typo
    staging: dgnc: Remove unneeded dgnc_trace.c and dgnc_trace.h
    staging: dgnc: rephrase comment
    staging: comedi: ni_tio: remove some dead code
    staging: rtl8723au: Fix static symbol sparse warning
    staging: rtl8723au: usb_dvobj_init(): Remove unused variable 'pdev_desc'
    staging: rtl8723au: Do not duplicate kernel provided USB macros
    staging: rtl8723au: Remove never set struct pwrctrl_priv.bHWPowerdown
    staging: rtl8723au: Remove two never set variables
    staging: rtl8723au: RSSI_test is never set
    staging:r8190: coding style: Fixed checkpatch reported Error
    staging:r8180: coding style: Fixed too long lines
    staging:r8180: coding style: Fixed commenting style
    staging: lustre: ptlrpc: lproc_ptlrpc.c - fix dereferenceing user space buffer
    staging: lustre: ldlm: ldlm_resource.c - fix dereferenceing user space buffer
    ...

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:

    - Move the nohz kick code out of the scheduler tick to a dedicated IPI,
    from Frederic Weisbecker.

    This necessitated quite a bit of background infrastructure rework,
    including:

    * Clean up some irq-work internals
    * Implement remote irq-work
    * Implement nohz kick on top of remote irq-work
    * Move full dynticks timer enqueue notification to new kick
    * Move multi-task notification to new kick
    * Remove unnecessary barriers on multi-task notification

    - Remove proliferation of wait_on_bit() action functions and allow
    wait_on_bit_action() functions to support a timeout. (Neil Brown)

    - Another round of sched/numa improvements, cleanups and fixes. (Rik
    van Riel)

    - Implement fast idling of CPUs when the system is partially loaded,
    for better scalability. (Tim Chen)

    - Restructure and fix the CPU hotplug handling code that may leave
    cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
    cpu. (Kirill Tkhai)

    - Robustify the sched topology setup code. (Peter Zijlstra)

    - Improve sched_feat() handling wrt. static_keys (Jason Baron)

    - Misc fixes.

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
    sched/fair: Fix 'make xmldocs' warning caused by missing description
    sched: Use macro for magic number of -1 for setparam
    sched: Robustify topology setup
    sched: Fix sched_setparam() policy == -1 logic
    sched: Allow wait_on_bit_action() functions to support a timeout
    sched: Remove proliferation of wait_on_bit() action functions
    sched/numa: Revert "Use effective_load() to balance NUMA loads"
    sched: Fix static_key race with sched_feat()
    sched: Remove extra static_key*() function indirection
    sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
    sched: Transform resched_task() into resched_curr()
    sched/deadline: Kill task_struct->pi_top_task
    sched: Rework check_for_tasks()
    sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
    sched/fair: Disable runtime_enabled on dying rq
    sched/numa: Change scan period code to match intent
    sched/numa: Rework best node setting in task_numa_migrate()
    sched/numa: Examine a task move when examining a task swap
    sched/numa: Simplify task_numa_compare()
    sched/numa: Use effective_load() to balance NUMA loads
    ...

    Linus Torvalds
     
  • Pull RCU changes from Ingo Molnar:
    "The main changes:

    - torture-test updates
    - callback-offloading changes
    - maintainership changes
    - update RCU documentation
    - miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
    rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing
    rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp()
    rcu: Fix a sparse warning in rcu_initiate_boost()
    rcu: Fix __rcu_reclaim() to use true/false for bool
    rcu: Remove CONFIG_PROVE_RCU_DELAY
    rcu: Use __this_cpu_read() instead of per_cpu_ptr()
    rcu: Don't use NMIs to dump other CPUs' stacks
    rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
    rcu: Simplify priority boosting by putting rt_mutex in rcu_node
    rcu: Check both root and current rcu_node when setting up future grace period
    rcu: Allow post-unlock reference for rt_mutex
    rcu: Loosen __call_rcu()'s rcu_head alignment constraint
    rcu: Eliminate read-modify-write ACCESS_ONCE() calls
    rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu
    rcu: Make rcu node arrays static const char * const
    signal: Explain local_irq_save() call
    rcu: Handle obsolete references to TINY_PREEMPT_RCU
    rcu: Document deadlock-avoidance information for rcu_read_unlock()
    scripts: Teach get_maintainer.pl about the new "R:" tag
    rcu: Update rcu torture maintainership filename patterns
    ...

    Linus Torvalds
     

01 Aug, 2014

1 commit

  • clockevents_increase_min_delta() calls printk() from under
    hrtimer_bases.lock. That causes lock inversion on scheduler locks because
    printk() can call into the scheduler. Lockdep puts it as:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    3.15.0-rc8-06195-g939f04b #2 Not tainted
    -------------------------------------------------------
    trinity-main/74 is trying to acquire lock:
    (&port_lock_key){-.....}, at: [] serial8250_console_write+0x8c/0x10c

    but task is already holding lock:
    (hrtimer_bases.lock){-.-...}, at: [] hrtimer_try_to_cancel+0x13/0x66

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #5 (hrtimer_bases.lock){-.-...}:
    [] lock_acquire+0x92/0x101
    [] _raw_spin_lock_irqsave+0x2e/0x3e
    [] __hrtimer_start_range_ns+0x1c/0x197
    [] perf_swevent_start_hrtimer.part.41+0x7a/0x85
    [] task_clock_event_start+0x3a/0x3f
    [] task_clock_event_add+0xd/0x14
    [] event_sched_in+0xb6/0x17a
    [] group_sched_in+0x44/0x122
    [] ctx_sched_in.isra.67+0x105/0x11f
    [] perf_event_sched_in.isra.70+0x47/0x4b
    [] __perf_install_in_context+0x8b/0xa3
    [] remote_function+0x12/0x2a
    [] smp_call_function_single+0x2d/0x53
    [] task_function_call+0x30/0x36
    [] perf_install_in_context+0x87/0xbb
    [] SYSC_perf_event_open+0x5c6/0x701
    [] SyS_perf_event_open+0x17/0x19
    [] syscall_call+0x7/0xb

    -> #4 (&ctx->lock){......}:
    [] lock_acquire+0x92/0x101
    [] _raw_spin_lock+0x21/0x30
    [] __perf_event_task_sched_out+0x1dc/0x34f
    [] __schedule+0x4c6/0x4cb
    [] schedule+0xf/0x11
    [] work_resched+0x5/0x30

    -> #3 (&rq->lock){-.-.-.}:
    [] lock_acquire+0x92/0x101
    [] _raw_spin_lock+0x21/0x30
    [] __task_rq_lock+0x33/0x3a
    [] wake_up_new_task+0x25/0xc2
    [] do_fork+0x15c/0x2a0
    [] kernel_thread+0x1a/0x1f
    [] rest_init+0x1a/0x10e
    [] start_kernel+0x303/0x308
    [] i386_start_kernel+0x79/0x7d

    -> #2 (&p->pi_lock){-.-...}:
    [] lock_acquire+0x92/0x101
    [] _raw_spin_lock_irqsave+0x2e/0x3e
    [] try_to_wake_up+0x1d/0xd6
    [] default_wake_function+0xb/0xd
    [] __wake_up_common+0x39/0x59
    [] __wake_up+0x29/0x3b
    [] tty_wakeup+0x49/0x51
    [] uart_write_wakeup+0x17/0x19
    [] serial8250_tx_chars+0xbc/0xfb
    [] serial8250_handle_irq+0x54/0x6a
    [] serial8250_default_handle_irq+0x19/0x1c
    [] serial8250_interrupt+0x38/0x9e
    [] handle_irq_event_percpu+0x5f/0x1e2
    [] handle_irq_event+0x2c/0x43
    [] handle_level_irq+0x57/0x80
    [] handle_irq+0x46/0x5c
    [] do_IRQ+0x32/0x89
    [] common_interrupt+0x2e/0x33
    [] _raw_spin_unlock_irqrestore+0x3f/0x49
    [] uart_start+0x2d/0x32
    [] uart_write+0xc7/0xd6
    [] n_tty_write+0xb8/0x35e
    [] tty_write+0x163/0x1e4
    [] redirected_tty_write+0x6d/0x75
    [] vfs_write+0x75/0xb0
    [] SyS_write+0x44/0x77
    [] syscall_call+0x7/0xb

    -> #1 (&tty->write_wait){-.....}:
    [] lock_acquire+0x92/0x101
    [] _raw_spin_lock_irqsave+0x2e/0x3e
    [] __wake_up+0x15/0x3b
    [] tty_wakeup+0x49/0x51
    [] uart_write_wakeup+0x17/0x19
    [] serial8250_tx_chars+0xbc/0xfb
    [] serial8250_handle_irq+0x54/0x6a
    [] serial8250_default_handle_irq+0x19/0x1c
    [] serial8250_interrupt+0x38/0x9e
    [] handle_irq_event_percpu+0x5f/0x1e2
    [] handle_irq_event+0x2c/0x43
    [] handle_level_irq+0x57/0x80
    [] handle_irq+0x46/0x5c
    [] do_IRQ+0x32/0x89
    [] common_interrupt+0x2e/0x33
    [] _raw_spin_unlock_irqrestore+0x3f/0x49
    [] uart_start+0x2d/0x32
    [] uart_write+0xc7/0xd6
    [] n_tty_write+0xb8/0x35e
    [] tty_write+0x163/0x1e4
    [] redirected_tty_write+0x6d/0x75
    [] vfs_write+0x75/0xb0
    [] SyS_write+0x44/0x77
    [] syscall_call+0x7/0xb

    -> #0 (&port_lock_key){-.....}:
    [] __lock_acquire+0x9ea/0xc6d
    [] lock_acquire+0x92/0x101
    [] _raw_spin_lock_irqsave+0x2e/0x3e
    [] serial8250_console_write+0x8c/0x10c
    [] call_console_drivers.constprop.31+0x87/0x118
    [] console_unlock+0x1d7/0x398
    [] vprintk_emit+0x3da/0x3e4
    [] printk+0x17/0x19
    [] clockevents_program_min_delta+0x104/0x116
    [] clockevents_program_event+0xe7/0xf3
    [] tick_program_event+0x1e/0x23
    [] hrtimer_force_reprogram+0x88/0x8f
    [] __remove_hrtimer+0x5b/0x79
    [] hrtimer_try_to_cancel+0x49/0x66
    [] hrtimer_cancel+0xd/0x18
    [] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
    [] task_clock_event_stop+0x20/0x64
    [] task_clock_event_del+0xd/0xf
    [] event_sched_out+0xab/0x11e
    [] group_sched_out+0x1d/0x66
    [] ctx_sched_out+0xaf/0xbf
    [] __perf_event_task_sched_out+0x1ed/0x34f
    [] __schedule+0x4c6/0x4cb
    [] schedule+0xf/0x11
    [] work_resched+0x5/0x30

    other info that might help us debug this:

    Chain exists of:
    &port_lock_key --> &ctx->lock --> hrtimer_bases.lock

    Possible unsafe locking scenario:

    CPU0                    CPU1
    ----                    ----
    lock(hrtimer_bases.lock);
                            lock(&ctx->lock);
                            lock(hrtimer_bases.lock);
    lock(&port_lock_key);

    *** DEADLOCK ***

    4 locks held by trinity-main/74:
    #0: (&rq->lock){-.-.-.}, at: [] __schedule+0xed/0x4cb
    #1: (&ctx->lock){......}, at: [] __perf_event_task_sched_out+0x1dc/0x34f
    #2: (hrtimer_bases.lock){-.-...}, at: [] hrtimer_try_to_cancel+0x13/0x66
    #3: (console_lock){+.+...}, at: [] vprintk_emit+0x3c7/0x3e4

    stack backtrace:
    CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
    00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
    8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
    8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
    Call Trace:
    [] dump_stack+0x16/0x18
    [] print_circular_bug+0x18f/0x19c
    [] __lock_acquire+0x9ea/0xc6d
    [] lock_acquire+0x92/0x101
    [] ? serial8250_console_write+0x8c/0x10c
    [] ? wait_for_xmitr+0x76/0x76
    [] _raw_spin_lock_irqsave+0x2e/0x3e
    [] ? serial8250_console_write+0x8c/0x10c
    [] serial8250_console_write+0x8c/0x10c
    [] ? lock_release+0x191/0x223
    [] ? wait_for_xmitr+0x76/0x76
    [] call_console_drivers.constprop.31+0x87/0x118
    [] console_unlock+0x1d7/0x398
    [] vprintk_emit+0x3da/0x3e4
    [] printk+0x17/0x19
    [] clockevents_program_min_delta+0x104/0x116
    [] tick_program_event+0x1e/0x23
    [] hrtimer_force_reprogram+0x88/0x8f
    [] __remove_hrtimer+0x5b/0x79
    [] hrtimer_try_to_cancel+0x49/0x66
    [] hrtimer_cancel+0xd/0x18
    [] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
    [] task_clock_event_stop+0x20/0x64
    [] task_clock_event_del+0xd/0xf
    [] event_sched_out+0xab/0x11e
    [] group_sched_out+0x1d/0x66
    [] ctx_sched_out+0xaf/0xbf
    [] __perf_event_task_sched_out+0x1ed/0x34f
    [] ? __dequeue_entity+0x23/0x27
    [] ? pick_next_task_fair+0xb1/0x120
    [] __schedule+0x4c6/0x4cb
    [] ? trace_hardirqs_off_caller+0xd7/0x108
    [] ? trace_hardirqs_off+0xb/0xd
    [] ? rcu_irq_exit+0x64/0x77

    Fix the problem by using printk_deferred() which does not call into the
    scheduler.

    Reported-by: Fengguang Wu
    Signed-off-by: Jan Kara
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner

    Jan Kara
     

28 Jul, 2014

1 commit


24 Jul, 2014

33 commits

  • During suspend we call sched_clock_poll() to update the epoch and
    accumulated time and reprogram the sched_clock_timer to fire
    before the next wrap-around time. Unfortunately,
    sched_clock_poll() doesn't restart the timer, instead it relies
    on the hrtimer layer to do that and during suspend we aren't
    calling that function from the hrtimer layer. Instead, we're
    reprogramming the expires time while the hrtimer is enqueued,
    which can cause the hrtimer tree to be corrupted. Furthermore, we
    restart the timer during suspend but we update the epoch during
    resume which seems counter-intuitive.

    Let's fix this by saving the accumulated state and canceling the
    timer during suspend. On resume we can update the epoch and
    restart the timer similar to what we would do if we were starting
    the clock for the first time.

    Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
    Signed-off-by: Stephen Boyd
    Signed-off-by: John Stultz
    Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org
    Cc: Ingo Molnar
    Cc: stable
    Signed-off-by: Thomas Gleixner

    Stephen Boyd
     
  • By caching the ntp_tick_length() when we correct the frequency error,
    and then using that cached value to accumulate error, we avoid large
    initial errors when the tick length is changed.

    This makes convergence happen much faster in the simulator, since the
    initial error doesn't have to be slowly whittled away.

    This initially seems like an accounting error, but Miroslav pointed out
    that ntp_tick_length() can change mid-tick, so when we apply it in the
    error accumulation, we are applying any recent change to the entire tick.

    This approach chooses to apply changes in the ntp_tick_length() only to
    the next tick, which allows us to calculate the freq correction before
    using the new tick length, which avoids accumulating error.

    Credit to Miroslav for pointing this out and providing the original patch
    this functionality has been pulled out from, along with the rationale.

    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Reported-by: Miroslav Lichvar
    Signed-off-by: John Stultz

    John Stultz
     
  • The existing timekeeping_adjust logic has always been complicated
    to understand. Further, since it was developed prior to NOHZ becoming
    common, it's not surprising that it performs poorly when NOHZ is enabled.

    Since Miroslav pointed out the problematic nature of the existing code
    in the NOHZ case, I've tried to refactor the code to perform better.

    The problem with the previous approach was that it tried to adjust
    for the total cumulative error using a scaled dampening factor. This
    resulted in large errors being corrected slowly, while small errors
    were corrected quickly. With NOHZ the timekeeping code doesn't know
    how far out the next tick will be, so this results in bad
    over-correction to small errors, and insufficient correction to large
    errors.

    Inspired by Miroslav's patch, I've refactored the code to try to
    address the correction in two steps.

    1) Check the future freq error for the next tick, and if the frequency
    error is large, try to make sure we correct it so it doesn't cause
    much accumulated error.

    2) Then make a small single unit adjustment to correct any cumulative
    error that has collected over time.

    This method performs fairly well in the simulator Miroslav created.

    Major credit to Miroslav for pointing out the issue, providing the
    original patch to resolve this, a simulator for testing, as well as
    helping debug and resolve issues in my implementation so that it
    performed closer to his original implementation.

    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Reported-by: Miroslav Lichvar
    Signed-off-by: John Stultz

    John Stultz
     
  • In the GENERIC_TIME_VSYSCALL_OLD update_vsyscall implementation,
    we take the tk_xtime() value, which returns a timespec64, and
    store it in a timespec.

    This luckily is ok, since the only architectures that use
    GENERIC_TIME_VSYSCALL_OLD are ia64 and ppc64, which are both
    64 bit systems where timespec64 is the same as a timespec.

    Even so, for cleanliness reasons, use the conversion function
    to assign the proper type.

    Signed-off-by: John Stultz

    John Stultz
     
  • Tracers want a correlated time between the kernel instrumentation and
    user space. We really do not want to export sched_clock() to user
    space, so we need to provide something sensible for this.

    Using separate data structures with a non-blocking sequence count
    based update mechanism allows us to do that. The data structure
    required for the readout has a sequence counter and two copies of the
    timekeeping data.

    On the update side:

        smp_wmb();
        tkf->seq++;
        smp_wmb();
        update(tkf->base[0], tk);
        smp_wmb();
        tkf->seq++;
        smp_wmb();
        update(tkf->base[1], tk);

    On the reader side:

        do {
                seq = tkf->seq;
                smp_rmb();
                idx = seq & 0x01;
                now = now(tkf->base[idx]);
                smp_rmb();
        } while (seq != tkf->seq);

    So if a NMI hits the update of base[0] it will use base[1] which is
    still consistent, but this timestamp is not guaranteed to be monotonic
    across an update.

    The timestamp is calculated by:

    now = base_mono + clock_delta * slope

    So if the update lowers the slope, readers who are forced to the
    not yet updated second array are still using the old steeper slope.

    tmono
    ^
    |    o  n
    |   o n
    |  u
    | o
    |o
    |12345678---> reader order

    o = old slope
    u = update
    n = new slope

    So reader 6 will observe time going backwards versus reader 5.

    While other CPUs are likely to be able to observe that, the only way
    for a CPU local observation is when an NMI hits in the middle of
    the update. Timestamps taken from that NMI context might be ahead
    of the following timestamps. Callers need to be aware of that and
    deal with it.
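
    The update and reader sequences above can be approximated in user space
    with C11 atomics. This is a minimal sketch under stated assumptions:
    latch_sketch, base_sketch, latch_update and latch_read are hypothetical
    names, sequentially consistent fences stand in for smp_wmb()/smp_rmb(),
    and the snapshot copies are plain struct assignments for brevity (a
    strictly conforming concurrent version would need race-free accesses
    to them).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical snapshot of the data a reader needs. */
struct base_sketch {
    uint64_t base_mono;   /* monotonic time at the last update, in ns */
    uint64_t cycle_last;  /* counter value at the last update */
    uint64_t mult;        /* slope: ns per counter cycle (no shift, for brevity) */
};

struct latch_sketch {
    atomic_uint seq;
    struct base_sketch base[2];
};

/* Writer: bump seq, update copy 0, bump seq again, update copy 1. */
static void latch_update(struct latch_sketch *l, const struct base_sketch *src)
{
    atomic_thread_fence(memory_order_seq_cst);
    atomic_fetch_add(&l->seq, 1);   /* seq is now odd: readers pick base[1] */
    atomic_thread_fence(memory_order_seq_cst);
    l->base[0] = *src;
    atomic_thread_fence(memory_order_seq_cst);
    atomic_fetch_add(&l->seq, 1);   /* seq is now even: readers pick base[0] */
    atomic_thread_fence(memory_order_seq_cst);
    l->base[1] = *src;
}

/* Reader: retry until seq was stable across the whole readout. */
static uint64_t latch_read(struct latch_sketch *l, uint64_t cycles_now)
{
    unsigned int seq;
    uint64_t now;

    do {
        seq = atomic_load(&l->seq);
        const struct base_sketch *b = &l->base[seq & 0x01];

        /* now = base_mono + clock_delta * slope */
        now = b->base_mono + (cycles_now - b->cycle_last) * b->mult;
        atomic_thread_fence(memory_order_seq_cst);
    } while (seq != atomic_load(&l->seq));

    return now;
}
```

    The reader never blocks: it always has one copy that the writer is not
    touching, at the cost of possibly using the older slope, which is the
    non-monotonic case described above.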

    V2: Got rid of clock monotonic raw and reorganized the data
    structures. Folded in the barrier fix from Mathieu.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • All the function needs is in the tk_read_base struct. No functional
    change for the current code, just a preparatory patch for the NMI safe
    accessor to clock monotonic which will use struct tk_read_base as well.

    Signed-off-by: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • The members of the new struct are the required ones for the new NMI
    safe accessor to clock monotonic. In order to reuse the existing
    timekeeping code and to make the update of the fast NMI safe
    timekeepers a simple memcpy use the struct for the timekeeper as well
    and convert all users.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Accessing time requires touching at least two cachelines:

    1) The timekeeper data structure

    2) The clocksource data structure

    The access to the clocksource data structure can be avoided as almost
    all clocksource implementations ignore the argument to the read
    callback, which is a pointer to the clocksource.

    But the core needs to touch it to access the members @read and @mask.

    So we are better off by copying the @read function pointer and the
    @mask from the clocksource to the core data structure itself.

    For the most used ktime_get() access all required data including the
    @read and @mask copies fits together with the sequence counter into a
    single 64 byte cacheline.

    For the other time access functions we touch in the current code three
    cache lines in the worst case. But with the clocksource data copies we
    can reduce that to two adjacent cachelines, which is more efficient
    than disjunct cache lines.
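
    To make the layout argument concrete, here is a rough sketch of a
    readout structure that carries its own copies of the read callback and
    mask next to the sequence counter. All names here (readout_sketch,
    readout_ns, fake_counter_read) and the exact field list are assumptions
    made for illustration; they are not the kernel's actual layout.

```c
#include <stdint.h>

/* Hypothetical hot-path layout; field names are illustrative only. */
struct readout_sketch {
    unsigned int seq;              /* sequence counter */
    uint32_t     mult;             /* cycles -> ns multiplier */
    uint32_t     shift;            /* cycles -> ns shift */
    uint64_t   (*read)(void *cs);  /* copy of the clocksource read callback */
    uint64_t     mask;             /* copy of the clocksource mask */
    uint64_t     cycle_last;       /* counter value at the last update */
    uint64_t     xtime_nsec;       /* shifted ns accumulated at the last update */
    int64_t      base_mono;        /* monotonic base time in ns */
};

/* Demo counter that ignores its argument, as most read callbacks do. */
static uint64_t fake_counter_read(void *cs)
{
    (void)cs;
    return 150;
}

/* Compute the current time using only fields of the readout structure. */
static uint64_t readout_ns(const struct readout_sketch *r, void *cs)
{
    uint64_t delta = (r->read(cs) - r->cycle_last) & r->mask;

    return (uint64_t)r->base_mono +
           ((delta * r->mult + r->xtime_nsec) >> r->shift);
}
```

    With this field list the structure stays within 64 bytes on common
    ABIs, so a readout does not have to dereference the clocksource at all.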

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • cycle_last was added to the clocksource to support the TSC
    validation. We moved that to the core code, so we can get rid of the
    extra copy.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • The only user of the cycle_last validation is the x86 TSC. In order to
    provide NMI safe accessor functions for clock monotonic and
    monotonic_raw we need to do that in the core.

    We can't do the TSC specific

        if (now < cycle_last)
                now = cycle_last;

    for the other wrapping around clocksources, but TSC has
    CLOCKSOURCE_MASK(64) which actually does not mask out anything so if
    now is less than cycle_last the subtraction will give a negative
    result. So we can check for that in clocksource_delta() and return 0
    for that case.
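
    A minimal user-space sketch of the check described above follows. The
    name delta_sketch is hypothetical, and the signed cast assumes the usual
    two's complement conversion.

```c
#include <stdint.h>

/*
 * Hypothetical helper: masked delta between two counter reads.
 * With a full 64-bit mask, a 'now' that is behind 'last' yields a value
 * that is negative when viewed as signed, and 0 is returned instead.
 * With a narrower mask the result is always non-negative as a signed
 * value, so ordinary counter wrap-around still works.
 */
static uint64_t delta_sketch(uint64_t now, uint64_t last, uint64_t mask)
{
    uint64_t ret = (now - last) & mask;

    return (int64_t)ret > 0 ? ret : 0;
}
```

    For example, delta_sketch(90, 100, UINT64_MAX) returns 0, while a
    32-bit counter that wrapped from 0xFFFFFFFB to 5 still reports a
    delta of 10.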

    Implement and enable it for x86

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • We want to move the TSC sanity check into core code to make NMI safe
    accessors to clock monotonic[_raw] possible. For this we need to
    sanity check the delta calculation. Create a helper function and
    convert all sites to use it.

    [ Build fix from jstultz ]

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Provide a ktime_t based interface for raw monotonic time.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • timekeeping_clocktai() is not used in fast paths, so the extra
    timespec conversion is not problematic.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • No more users. Remove it

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Subtracting plain nsec values and converting to timespec is simpler
    than the whole timespec math. Not really fastpath code, so the
    division is not an issue.
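
    The idea can be sketched in user-space C as follows: subtract two plain
    nanosecond values, then split the result once. The helper name
    ns_delta_to_timespec and the NSEC_PER_SEC_SKETCH constant are
    hypothetical and only serve the illustration.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC_SKETCH 1000000000LL

/* Subtract two nanosecond timestamps and convert the delta to a timespec. */
static struct timespec ns_delta_to_timespec(int64_t end_ns, int64_t start_ns)
{
    int64_t delta = end_ns - start_ns;
    struct timespec ts;

    ts.tv_sec = (time_t)(delta / NSEC_PER_SEC_SKETCH);
    ts.tv_nsec = (long)(delta % NSEC_PER_SEC_SKETCH);

    /* Keep tv_nsec within [0, 1e9) when the delta is negative. */
    if (ts.tv_nsec < 0) {
        ts.tv_sec -= 1;
        ts.tv_nsec += NSEC_PER_SEC_SKETCH;
    }
    return ts;
}
```

    The single division replaces the carry and borrow handling that
    field-by-field timespec arithmetic would need.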

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • get_monotonic_boottime() is not used in fast paths, so the extra
    timespec conversion is not problematic.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • No more users.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Required for moving drivers to the nanosecond based interfaces.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • No more users.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • ktime based conversion function to map a monotonic time stamp to a
    different CLOCK.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • No need to juggle with timespecs.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • No need to juggle with timespecs.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Speed up the readout.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Provide a helper function which lets us implement ktime_t based
    interfaces for real, boot and tai clocks.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Speed up ktime_get() by using ktime_t based data. Text size shrinks by
    64 bytes on x86_64.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • The ktime_t based interfaces are used a lot in performance critical
    code paths. Add ktime_t based data so the interfaces don't have to
    convert from the xtime/timespec based data.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • We already have a function which does the right thing, that also makes
    sure that the coming ktime_t based cached values are getting updated.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • struct timekeeper is quite badly sorted for the hot readout path. Most
    time access functions need to load two cache lines.

    Rearrange it so ktime_get() and getnstimeofday() are happy with a
    single cache line.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • No users outside of the core.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • To convert callers of the core code to timespec64 we need to provide
    the proper interfaces.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Right now we have time related prototypes in 3 different header
    files. Move it to a single timekeeping header file and move the core
    internal stuff into a core private header.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner