15 Oct, 2014

1 commit

  • Pull percpu consistent-ops changes from Tejun Heo:
    "Way back, before the current percpu allocator was implemented, static
    and dynamic percpu memory areas were allocated and handled separately
    and had their own accessors. The distinction has been gone for many
    years now; however, the now duplicate two sets of accessors remained
    with the pointer based ones - this_cpu_*() - evolving various other
    operations over time. During the process, we also accumulated other
    inconsistent operations.

    This pull request contains Christoph's patches to clean up the
    duplicate accessor situation. __get_cpu_var() uses are replaced with
    this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

    Unfortunately, the former sometimes is tricky thanks to C being a bit
    messy with the distinction between lvalues and pointers, which led to
    a rather ugly solution for cpumask_var_t involving the introduction of
    this_cpu_cpumask_var_ptr().

    This converts most of the uses but not all. Christoph will follow up
    with the remaining conversions in this merge window and hopefully
    remove the obsolete accessors"

    * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
    irqchip: Properly fetch the per cpu offset
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
    ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
    Revert "powerpc: Replace __get_cpu_var uses"
    percpu: Remove __this_cpu_ptr
    clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
    sparc: Replace __get_cpu_var uses
    avr32: Replace __get_cpu_var with __this_cpu_write
    blackfin: Replace __get_cpu_var uses
    tile: Use this_cpu_ptr() for hardware counters
    tile: Replace __get_cpu_var uses
    powerpc: Replace __get_cpu_var uses
    alpha: Replace __get_cpu_var
    ia64: Replace __get_cpu_var uses
    s390: cio driver &__get_cpu_var replacements
    s390: Replace __get_cpu_var uses
    mips: Replace __get_cpu_var uses
    MIPS: Replace __get_cpu_var uses in FPU emulator.
    arm: Replace __this_cpu_ptr with raw_cpu_ptr
    ...

    Linus Torvalds
     

14 Sep, 2014

1 commit

  • This way we unbloat main.c a bit and, more importantly, we initialize
    nohz full after init_IRQ(). This dependency will be needed in further
    patches because nohz full needs irq work to raise its own IRQ.
    Information about the support for this ability on ARM64 is obtained in
    init_IRQ(), which initializes the pointer to __smp_call_function.

    Since tick_init() is called right after init_IRQ(), this is a good place
    to call tick_nohz_init() and prepare for that dependency.

    Acked-by: Peter Zijlstra (Intel)
    Cc: Ingo Molnar
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     

27 Aug, 2014

1 commit

  • Convert uses of __get_cpu_var for creating an address from a percpu
    offset to this_cpu_ptr.

    The two cases where get_cpu_var is used to actually access a percpu
    variable are changed to use this_cpu_read/raw_cpu_read.

    Reviewed-by: Thomas Gleixner
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

16 Apr, 2014

1 commit

  • tick_check_replacement() returns whether a replacement of clock_event_device is
    possible. It does this as the first check:

    if (tick_check_percpu(curdev, newdev, smp_processor_id()))
            return false;

    That's wrong. tick_check_percpu() returns true when the device is
    useable. Check for false instead.
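
    The corrected check can be sketched in plain C as follows. This is a
    minimal userspace illustration, not the kernel code: struct clock_dev,
    its percpu_ok flag, and the check_*() helpers are hypothetical
    stand-ins for the kernel types and functions named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct clock_event_device (illustration only). */
struct clock_dev {
    int rating;
    bool percpu_ok;   /* hypothetical flag: device can serve this CPU */
};

/* Stand-in for tick_check_percpu(): true means the device is usable. */
static bool check_percpu(const struct clock_dev *cur,
                         const struct clock_dev *new_dev, int cpu)
{
    (void)cur;
    (void)cpu;
    return new_dev->percpu_ok;
}

/* Stand-in for the preference check: a higher rating wins. */
static bool check_preferred(const struct clock_dev *cur,
                            const struct clock_dev *new_dev)
{
    return cur == NULL || new_dev->rating > cur->rating;
}

/* Corrected logic: bail out when the per-CPU check FAILS. */
static bool check_replacement(const struct clock_dev *cur,
                              const struct clock_dev *new_dev, int cpu)
{
    assert(new_dev != NULL);
    if (!check_percpu(cur, new_dev, cpu))
        return false;
    return check_preferred(cur, new_dev);
}
```

    With the inverted test, a usable device would be rejected and an
    unusable one would fall through to the rating comparison.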

    [ tglx: Massaged changelog ]

    Signed-off-by: Viresh Kumar
    Cc: # v3.11+
    Cc: linaro-kernel@lists.linaro.org
    Cc: fweisbec@gmail.com
    Cc: Arvind.Chauhan@arm.com
    Cc: linaro-networking@linaro.org
    Link: http://lkml.kernel.org/r/486a02efe0246635aaba786e24b42d316438bf3b.1397537987.git.viresh.kumar@linaro.org
    Signed-off-by: Thomas Gleixner

    Viresh Kumar
     

26 Mar, 2014

2 commits

  • tick_handle_periodic() calls ktime_add() in two places: first before the
    infinite loop and then at the end of the infinite loop. We can rearrange the code a bit
    to remove the duplication.

    It looks quite simple and shouldn't break anything, I guess :)

    Signed-off-by: Viresh Kumar
    Cc: linaro-kernel@lists.linaro.org
    Cc: fweisbec@gmail.com
    Link: http://lkml.kernel.org/r/be3481e8f3f71df694a4b43623254fc93ca51b59.1395735873.git.viresh.kumar@linaro.org
    Signed-off-by: Thomas Gleixner

    Viresh Kumar
     
  • One of the comments in tick_handle_periodic() had 'when' instead of 'which' (My
    guess :)). Fix it.

    Also fix spelling mistake in 'Possible'.

    Signed-off-by: Viresh Kumar
    Cc: linaro-kernel@lists.linaro.org
    Cc: skarafotis@gmail.com
    Link: http://lkml.kernel.org/r/2b29ca4230c163e44179941d7c7a16c1474385c2.1395743878.git.viresh.kumar@linaro.org
    Signed-off-by: Thomas Gleixner

    Viresh Kumar
     

12 Jan, 2014

1 commit


24 Dec, 2013

1 commit

  • Since the xtime lock was split into the timekeeping lock and
    the jiffies lock, we no longer need to call update_wall_time()
    while holding the jiffies lock.

    Thus, this patch splits update_wall_time() out from do_timer().

    This allows us to get away from calling clock_was_set_delayed()
    in update_wall_time() and instead use the standard clock_was_set()
    call that previously would deadlock, as it causes the jiffies lock
    to be acquired.

    Cc: Sasha Levin
    Cc: Thomas Gleixner
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Ingo Molnar
    Signed-off-by: John Stultz

    John Stultz
     

19 Nov, 2013

1 commit


02 Jul, 2013

1 commit

  • The recent implementation of a generic dummy timer resulted in a
    different registration order of per cpu local timers which made the
    broadcast control logic go belly up.

    If the dummy timer is the first clock event device which is registered
    for a CPU, then it is installed, the broadcast timer is initialized
    and the CPU is marked as broadcast target.

    If a real clock event device is installed after that, we can fail to
    take the CPU out of the broadcast mask. In the worst case we end up
    with two periodic timer events firing for the same CPU. One from the
    per cpu hardware device and one from the broadcast.

    Now the problem is that we have no way to distinguish whether the
    system is in a state which makes broadcasting necessary or the
    broadcast bit was set due to the nonfunctional dummy timer
    installment.

    To solve this we need to keep track of the system state separately and
    provide a more detailed decision logic whether we keep the CPU in
    broadcast mode or not.

    The old decision logic only clears the broadcast mode, if the newly
    installed clock event device is not affected by power states.

    The new logic clears the broadcast mode if one of the following is
    true:

    - The new device is not affected by power states.

    - The system is not in a power state affected mode

    - The system has switched to oneshot mode. The oneshot broadcast is
    controlled from the deep idle state. The CPU is not in idle at
    this point, so it's safe to remove it from the mask.

    If we clear the broadcast bit for the CPU when a new device is
    installed, we also shutdown the broadcast device when this was the
    last CPU in the broadcast mask.

    If the broadcast bit is kept, then we leave the new device in shutdown
    state and rely on the broadcast to deliver the timer interrupts via
    the broadcast ipis.

    Reported-and-tested-by: Stehle Vincent-B46079
    Reviewed-by: Stephen Boyd
    Cc: John Stultz
    Cc: Mark Rutland
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

25 Jun, 2013

1 commit

  • On an SMP system with only one global clockevent and a dummy
    clockevent per CPU we run into problems. We want the dummy
    clockevents to be registered as the per CPU tick devices, but
    we can only achieve that if we register the dummy clockevents
    before the global clockevent or if we artificially inflate the
    rating of the dummy clockevents to be higher than the rating
    of the global clockevent. Failure to do so leads to boot
    hangs when the dummy timers are registered on all other CPUs
    besides the CPU that accepted the global clockevent as its tick
    device and there is no broadcast timer to poke the dummy
    devices.

    If we're registering multiple clockevents and one clockevent is
    global and the other is local to a particular CPU we should
    choose to use the local clockevent regardless of the rating of
    the device. This way, if the clockevent is a dummy it will take
    the tick device duty as long as there isn't a higher rated tick
    device and any global clockevent will be bumped out into
    broadcast mode, fixing the problem described above.
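
    A rough sketch of the selection rule, assuming a simplified device
    with only a rating and a "CPU local" flag (struct clk_dev and
    prefer_new() are hypothetical names, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified clockevent: only the properties the rule looks at. */
struct clk_dev {
    int rating;
    bool cpu_local;   /* true: bound to one CPU, false: global */
};

/* Return true when new_dev should replace cur as the tick device. */
static bool prefer_new(const struct clk_dev *cur, const struct clk_dev *new_dev)
{
    assert(new_dev != NULL);
    if (cur == NULL)
        return true;
    /* A CPU-local device beats a global one regardless of rating. */
    if (new_dev->cpu_local != cur->cpu_local)
        return new_dev->cpu_local;
    /* Same locality: fall back to the rating comparison. */
    return new_dev->rating > cur->rating;
}
```

    In this model a low-rated local dummy displaces a high-rated global
    device, which can then serve as the broadcast device.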

    Reported-and-tested-by: Mark Rutland
    Signed-off-by: Stephen Boyd
    Tested-by: soren.brinkmann@xilinx.com
    Cc: John Stultz
    Cc: Daniel Lezcano
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/20130613183950.GA32061@codeaurora.org
    Signed-off-by: Thomas Gleixner

    Stephen Boyd
     

16 May, 2013

6 commits

  • Provide a sysfs interface to allow unbinding of clockevent
    devices. The device is unbound if it is unused or if there is a
    replacement device available. Unbinding of broadcast devices is not
    supported as we don't want to foster that nonsense. If no replacement
    device is available the unbind returns -EBUSY. Unbind is available
    from the kernel and through sysfs, which is necessary to drop the
    module refcount.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130425143436.499216659@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Split out the clockevent device selection logic. Preparatory patch to
    allow unbinding active clockevent devices.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130425143436.431796247@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • We want to be able to remove clockevent modules as well. Add a
    refcount so we don't remove a module with an active clock event
    device.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130425143436.307435149@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • No need to call another function and have duplicated cases.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130425143436.235746557@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Now that the notifier chain is gone there are no other users and it's
    pointless to nest tick_device_lock inside of clockevents_lock because
    there is no other use case.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130425143436.162888472@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • 7+ years and still a single user. Kill it.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130425143436.098520211@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

06 May, 2013

1 commit

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.
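
    As a rough illustration of the three modes, the tick rate a CPU sees
    can be modeled as a function of the mode and the number of runnable
    tasks. The enum and function names below are made up for this sketch,
    and "tick off" is approximated as 0 Hz.

```c
#include <assert.h>

/* Hypothetical labels for the three Kconfig choices. */
enum tick_mode { MODE_HZ_PERIODIC, MODE_NO_HZ_IDLE, MODE_NO_HZ_FULL };

/* Approximate tick rate (in Hz) for a CPU with nr_running runnable tasks. */
static unsigned int tick_rate_hz(enum tick_mode mode, unsigned int hz,
                                 unsigned int nr_running)
{
    assert(hz > 0);
    switch (mode) {
    case MODE_HZ_PERIODIC:
        return hz;                        /* tick runs all the time */
    case MODE_NO_HZ_IDLE:
        return nr_running == 0 ? 0 : hz;  /* tick off only when idle */
    case MODE_NO_HZ_FULL:
        if (nr_running == 0)
            return 0;                     /* idle: tick off */
        if (nr_running == 1)
            return 1;                     /* single task: slowed to 1 Hz */
        return hz;
    }
    return hz;
}
```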

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like you to make input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     

25 Apr, 2013

1 commit

  • Vitaliy reported that a per cpu HPET timer interrupt crashes the
    system during hibernation. What happens is that the per cpu HPET timer
    gets shut down when the nonboot cpus are stopped. When the nonboot
    cpus are onlined again the HPET code sets up the MSI interrupt which
    fires before the clock event device is registered. The event handler
    is still set to hrtimer_interrupt, which then crashes the machine due
    to highres mode not being active.

    See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333

    There is no real good way to avoid that in the HPET code. The HPET
    code already has a mechanism to detect spurious interrupts when event
    handler == NULL for a similar reason.

    We can handle that in the clockevent/tick layer and replace the
    previous functional handler with a dummy handler like we do in
    tick_setup_new_device().

    The original clockevents code did this in clockevents_exchange_device(),
    but that got removed by commit 7c1e76897 (clockevents: prevent
    clockevent event_handler ending up handler_noop) which forgot to fix
    it up in tick_shutdown(). Same issue with the broadcast device.
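
    The idea of swapping in a dummy handler on shutdown can be sketched
    like this. It is a userspace toy under assumptions: struct evt_dev
    and the function names are hypothetical, and a counter stands in for
    the real handler's work.

```c
#include <assert.h>
#include <stddef.h>

/* Toy event device with a replaceable handler. */
struct evt_dev {
    void (*event_handler)(struct evt_dev *dev);
    int fired;   /* how often the functional handler ran */
};

/* Stands in for a functional handler such as the highres one. */
static void real_handler(struct evt_dev *dev)
{
    dev->fired++;
}

/* Dummy handler: swallows interrupts that arrive after shutdown. */
static void noop_handler(struct evt_dev *dev)
{
    (void)dev;
}

/* On shutdown, replace the functional handler with the dummy. */
static void shutdown_dev(struct evt_dev *dev)
{
    dev->event_handler = noop_handler;
}

/* Models an interrupt arriving at any point in the device lifetime. */
static void deliver_interrupt(struct evt_dev *dev)
{
    assert(dev->event_handler != NULL);
    dev->event_handler(dev);
}
```

    An interrupt that fires between shutdown and re-registration then
    reaches the dummy handler instead of a stale functional one.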

    Reported-by: Vitaliy Fillipov
    Cc: Ben Hutchings
    Cc: stable@vger.kernel.org
    Cc: 700333@bugs.debian.org
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

16 Apr, 2013

1 commit

  • "Extended nohz" was used as a naming base for the full dynticks
    API and Kconfig symbols. It reflects the fact the system tries
    to stop the tick in more places than just idle.

    But that "extended" name is a bit opaque and vague. Renaming it to
    "full" makes it clearer what the system tries to do under this
    config: try to shut down the tick anytime it can. The various
    constraints that prevent that from happening shouldn't be considered
    as fundamental properties of this feature but rather technical
    issues that may be solved in the future.

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

21 Mar, 2013

1 commit

  • This way the full nohz CPUs can safely run with the tick
    stopped with a guarantee that somebody else is taking
    care of the jiffies and GTOD progression.

    Once the duty is attributed to a CPU, it won't change. Also that
    CPU can't enter into dyntick idle mode or be hot unplugged.

    This may later be improved from a power consumption POV. At
    least we should be able to share the duty amongst all CPUs
    outside the full dynticks range. Then the duty could even be
    shared with full dynticks CPUs when those can't stop their
    tick for any reason.

    But let's start with that very simple approach first.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    [fix have_nohz_full_mask offcase]
    Signed-off-by: Steven Rostedt

    Frederic Weisbecker
     

07 Mar, 2013

1 commit


14 Nov, 2012

1 commit


08 Sep, 2011

1 commit

  • The automatic increase of the min_delta_ns of a clockevents device
    should be done in the clockevents code as the minimum delay is an
    attribute of the clockevents device.

    In addition, not all architectures want the automatic adjustment. On a
    massively virtualized system it can happen that the programming of a
    clock event fails several times in a row because the virtual cpu has
    been rescheduled quickly enough. In that case the minimum delay will
    erroneously be increased with no way back. The new config symbol
    GENERIC_CLOCKEVENTS_MIN_ADJUST is used to enable the automatic
    adjustment. The config option is selected only for x86.

    Signed-off-by: Martin Schwidefsky
    Cc: john stultz
    Link: http://lkml.kernel.org/r/20110823133142.494157493@de.ibm.com
    Signed-off-by: Thomas Gleixner

    Martin Schwidefsky
     

16 Mar, 2011

1 commit

  • …l/git/tip/linux-2.6-tip

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
    posix-clocks: Check write permissions in posix syscalls
    hrtimer: Remove empty hrtimer_init_hres_timer()
    hrtimer: Update hrtimer->state documentation
    hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
    timers: Export CLOCK_BOOTTIME via the posix timers interface
    timers: Add CLOCK_BOOTTIME hrtimer base
    time: Extend get_xtime_and_monotonic_offset() to also return sleep
    time: Introduce get_monotonic_boottime and ktime_get_boottime
    hrtimers: extend hrtimer base code to handle more then 2 clockids
    ntp: Remove redundant and incorrect parameter check
    mn10300: Switch do_timer() to xtimer_update()
    posix clocks: Introduce dynamic clocks
    posix-timers: Cleanup namespace
    posix-timers: Add support for fd based clocks
    x86: Add clock_adjtime for x86
    posix-timers: Introduce a syscall for clock tuning.
    time: Splitout compat timex accessors
    ntp: Add ADJ_SETOFFSET mode bit
    time: Introduce timekeeping_inject_offset
    posix-timer: Update comment
    ...

    Fix up new system-call-related conflicts in
    arch/x86/ia32/ia32entry.S
    arch/x86/include/asm/unistd_32.h
    arch/x86/include/asm/unistd_64.h
    arch/x86/kernel/syscall_table_32.S
    (name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
    due to movement of get_jiffies_64() in:
    kernel/time.c

    Linus Torvalds
     

26 Feb, 2011

1 commit

  • When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, we can only
    switch into oneshot mode when the backup broadcast device
    supports oneshot mode as well. Otherwise we would try to switch the
    broadcast device into an unsupported mode unconditionally. This went
    unnoticed so far as the current available broadcast devices support
    oneshot mode. Seth unearthed this problem while debugging and working
    around an hpet related BIOS wreckage.

    Add the necessary check to tick_is_oneshot_available().
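
    The added check can be sketched roughly as below. The struct and its
    boolean fields are simplified stand-ins for the feature flags, and the
    broadcast capability is passed in as a plain parameter for the sake of
    the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified per-CPU tick device: just the two relevant features. */
struct tick_dev {
    bool feat_oneshot;   /* device supports oneshot mode */
    bool feat_c3stop;    /* device stops in deep idle states */
};

/* May this CPU switch to oneshot mode? */
static bool oneshot_available(const struct tick_dev *dev,
                              bool broadcast_supports_oneshot)
{
    assert(dev != NULL);
    if (!dev->feat_oneshot)
        return false;
    /* A device that keeps running in deep idle needs no backup. */
    if (!dev->feat_c3stop)
        return true;
    /* Otherwise the backup broadcast device must support oneshot too. */
    return broadcast_supports_oneshot;
}
```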

    Reported-and-tested-by: Seth Forshee
    Signed-off-by: Thomas Gleixner
    LKML-Reference:
    Cc: stable@kernel.org # .21 ->

    Thomas Gleixner
     

01 Feb, 2011

1 commit

  • All callers of do_timer() are converted to xtime_update(). The only
    users of xtime_lock are in kernel/time/. Make both local to
    kernel/time/ and remove them from the global header files.

    [ tglx: Reuse tick-internal.h instead of creating another local header
    file. Massaged changelog ]

    Signed-off-by: Torben Hohn
    Cc: Peter Zijlstra
    Cc: johnstul@us.ibm.com
    Cc: yong.zhang0@gmail.com
    Cc: hch@infradead.org
    Signed-off-by: Thomas Gleixner

    Torben Hohn
     

17 Dec, 2010

1 commit

  • __get_cpu_var() can be replaced with this_cpu_read and will then use a
    single read instruction with implied address calculation to access the
    correct per cpu instance.

    However, the address of a per cpu variable passed to __this_cpu_read()
    cannot be determined (since it's an implied address conversion through
    segment prefixes). Therefore apply this only to uses of __get_cpu_var
    where the address of the variable is not used.

    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: Thomas Gleixner
    Acked-by: H. Peter Anvin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

15 Dec, 2009

2 commits


02 May, 2009

1 commit

  • tick_handle_periodic() can lock up hard when a one shot clock event
    device is used in combination with jiffies clocksource.

    Avoid an endless loop issue by requiring that a highres valid
    clocksource be installed before we call tick_periodic() in a loop when
    using ONESHOT mode. The result is we will only increment jiffies once
    per interrupt until a continuous hardware clocksource is available.

    Without this, we can run into an endless loop, where each cycle through
    the loop, jiffies is updated which increments time by tick_period or
    more (due to clock steering), which can cause the event programming to
    think the next event was before the newly incremented time and fail
    causing tick_periodic() to be called again and the whole process loops
    forever.
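
    The guarded loop can be modeled with a small simulation. Everything
    here is a toy: struct tick_sim and the sim_*() functions are
    hypothetical, and the "+ 1" is just one way to model clock steering
    pushing time past a full period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the periodic tick handler; all names are illustrative. */
struct tick_sim {
    uint64_t now;          /* current time as seen by the clocksource */
    uint64_t next_event;   /* expiry of the next periodic event */
    uint64_t period;       /* tick period */
    bool highres_valid;    /* false models the jiffies clocksource */
    int ticks;             /* number of sim_tick_periodic() calls */
};

static void sim_tick_periodic(struct tick_sim *s)
{
    s->ticks++;
    /*
     * With a jiffies clocksource, time only advances when the tick runs;
     * the extra 1 models steering pushing it past one period.
     */
    if (!s->highres_valid)
        s->now += s->period + 1;
}

/* Programming fails when the expiry is not in the future. */
static bool sim_program_event(const struct tick_sim *s, uint64_t expires)
{
    return expires > s->now;
}

static void sim_handle_periodic(struct tick_sim *s)
{
    assert(s->period > 0);
    sim_tick_periodic(s);
    for (;;) {
        s->next_event += s->period;
        if (sim_program_event(s, s->next_event))
            return;
        /* Guard: only tick again when time does not depend on the tick. */
        if (s->highres_valid)
            sim_tick_periodic(s);
    }
}
```

    Without the guard, the jiffies case would advance now by more than a
    period on every iteration, so next_event could never catch up.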

    [ Impact: prevent hard lock up ]

    Signed-off-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    john stultz
     

31 Jan, 2009

1 commit

  • Impact: fix CPU hotplug hang on Power6 testbox

    On architectures that support offlining all cpus (at least powerpc/pseries),
    hot-unplugging the tick_do_timer_cpu can result in a system hang.

    This comes from the fact that if the cpu going down happens to be the
    cpu doing the tick, then as the tick_do_timer_cpu handover happens after the
    cpu is dead (via the CPU_DEAD notification), we're left without ticks,
    jiffies are frozen and any task relying on timers (msleep, ...) is stuck.
    That's particularly the case for the cpu looping in __cpu_die() waiting
    for the dying cpu to be dead.

    This patch addresses this by having the tick_do_timer_cpu handover happen
    earlier during the CPU_DYING notification. For this, a new clockevent
    notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered
    in hrtimer_cpu_notify().
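
    The earlier handover might look roughly like this in a toy model. The
    fixed CPU count, the online array, and the function names are
    assumptions for the example; the rule of picking the first remaining
    online CPU is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NCPUS 4

/* Toy global state: which CPU runs do_timer, and which CPUs are online. */
static int sim_do_timer_cpu = 0;
static bool sim_cpu_online[SIM_NCPUS] = { true, true, true, true };

/* First CPU still online, or -1 if none is left. */
static int sim_first_online_cpu(void)
{
    for (int cpu = 0; cpu < SIM_NCPUS; cpu++) {
        if (sim_cpu_online[cpu])
            return cpu;
    }
    return -1;
}

/* Called while the CPU is dying, i.e. before it is reported dead. */
static void sim_handover_on_dying(int dying_cpu)
{
    assert(dying_cpu >= 0 && dying_cpu < SIM_NCPUS);
    sim_cpu_online[dying_cpu] = false;
    /* Hand the jiffies duty over right away so ticks keep flowing. */
    if (sim_do_timer_cpu == dying_cpu)
        sim_do_timer_cpu = sim_first_online_cpu();
}
```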

    Signed-off-by: Sebastien Dugue
    Cc:
    Signed-off-by: Ingo Molnar

    Sebastien Dugue
     

01 Jan, 2009

1 commit

  • Impact: Use new APIs

    Convert kernel/time functions to use struct cpumask *.

    Note the ugly bitmap declarations in tick-broadcast.c. These should
    be cpumask_var_t, but there was no obvious initialization function to
    put the alloc_cpumask_var() calls in. This was safe.

    (Eventually 'struct cpumask' will be undefined for CONFIG_CPUMASK_OFFSTACK,
    so we use a bitmap here to show we really mean it).

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis

    Rusty Russell
     

13 Dec, 2008

2 commits

  • Impact: change calling convention of existing clock_event APIs

    struct clock_event_timer's cpumask field gets changed to take pointer,
    as does the ->broadcast function.

    Another single-patch change. For safety, we BUG_ON() in
    clockevents_register_device() if it's not set.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar

    Rusty Russell
     
  • Impact: change existing irq_chip API

    Not much point with gentle transition here: the struct irq_chip's
    setaffinity method signature needs to change.

    Fortunately, not widely used code, but hits a few architectures.

    Note: In irq_select_affinity() I save a temporary in by mangling
    irq_desc[irq].affinity directly. Ingo, does this break anything?

    (Folded in fix from KOSAKI Motohiro)

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis
    Reviewed-by: Grant Grundler
    Acked-by: Ingo Molnar
    Cc: ralf@linux-mips.org
    Cc: grundler@parisc-linux.org
    Cc: jeremy@xensource.com
    Cc: KOSAKI Motohiro

    Rusty Russell
     

23 Sep, 2008

2 commits

  • Impact: timer hang on CPU online observed on AMD C1E systems

    When a CPU is brought online then the broadcast machinery can
    be in the one shot state already. Check this and setup the timer
    device of the new CPU in one shot mode so the broadcast code
    can pick up the next_event value correctly.

    Another AMD C1E oddity, as we switch to broadcast immediately and
    not after the full bring up via the ACPI cpu idle code.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Impact: rare hang which can be triggered on CPU online.

    tick_do_timer_cpu keeps track of the CPU which updates jiffies
    via do_timer. The value -1 is used to signal that currently no
    CPU is doing this. There are two cases where the variable can
    have this state:

    boot:
    necessary for systems where the boot cpu id can be != 0

    nohz long idle sleep:
    When the CPU which did the jiffies update last goes into
    a long idle sleep it drops the update jiffies duty so
    another CPU which is not idle can pick it up and keep
    jiffies going.

    Using the same value for both situations is wrong, as the CPU online
    code can see the -1 state when the timer of the newly onlined CPU is
    setup. The setup for a newly onlined CPU goes through periodic mode
    and can pick up the do_timer duty without being aware of the nohz /
    highres mode of the already running system.

    Use two separate states and make them constants to avoid magic
    numbers confusion.
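
    A sketch of the two-state idea, with hypothetical constant names and
    values chosen for this example:

```c
#include <assert.h>

/* Two distinct "no CPU" states instead of a single magic -1 (assumed values). */
#define SIM_DO_TIMER_NONE (-1)   /* duty dropped by a CPU in long idle sleep */
#define SIM_DO_TIMER_BOOT (-2)   /* nobody has taken the duty yet during boot */

static int sim_do_timer_cpu = SIM_DO_TIMER_BOOT;

/* Called when a CPU sets up its tick device in periodic mode. */
static void sim_setup_periodic(int cpu)
{
    assert(cpu >= 0);
    /*
     * Only claim the duty in the boot state. NONE means a running nohz
     * system dropped it on purpose, so a newly onlined CPU must not grab it.
     */
    if (sim_do_timer_cpu == SIM_DO_TIMER_BOOT)
        sim_do_timer_cpu = cpu;
}
```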

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

17 Sep, 2008

1 commit

  • The device shutdown does not clean up the next_event variable of the
    clock event device. So when the device is reactivated, the possibly
    stale next_event value can prevent the device from being reprogrammed, as it
    claims to be waiting on an event already.

    This is the root cause of the resurfacing suspend/resume problem,
    where systems need key press to come back to life.

    Fix this by setting next_event to KTIME_MAX when the device is shut
    down. Use a separate function for shutdown which takes care of that
    and only keep the direct set mode call in the broadcast code, where we
    can not touch the next_event value.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

05 Sep, 2008

1 commit

  • There is an ordering-related problem with the clockevents code, due to which
    clockevents_register_device() called after tickless/highres switch
    will not work. The new clockevent ends up with clockevents_handle_noop as
    event handler, resulting in no timer activity.

    The problematic path seems to be

    * old device already has hrtimer_interrupt as the event_handler
    * new clockevent device registers with a higher rating
    * tick_check_new_device() is called
    * clockevents_exchange_device() gets called
    * old->event_handler is set to clockevents_handle_noop
    * tick_setup_device() is called for the new device
    * which sets new->event_handler using the old->event_handler which is noop.

    Change the ordering so that new device inherits the proper handler.
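
    The ordering problem and its fix can be reproduced in a few lines.
    This is a simplified model: struct sim_dev, the handlers, and the
    replace_*() functions are hypothetical, and the counters exist only
    so that the two handlers differ.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*sim_handler_t)(void);

struct sim_dev {
    sim_handler_t event_handler;
};

static int sim_noop_calls;
static int sim_hres_calls;

static void sim_noop_handler(void) { sim_noop_calls++; }
static void sim_hres_handler(void) { sim_hres_calls++; }

/* Exchange step: the old device gets the noop handler. */
static void sim_exchange(struct sim_dev *old_dev)
{
    old_dev->event_handler = sim_noop_handler;
}

/* Setup step: the new device takes over the given handler. */
static void sim_setup(struct sim_dev *new_dev, sim_handler_t inherited)
{
    new_dev->event_handler = inherited;
}

/* Problematic order: the handler is read after it was overwritten. */
static void replace_buggy(struct sim_dev *old_dev, struct sim_dev *new_dev)
{
    assert(old_dev != NULL && new_dev != NULL);
    sim_exchange(old_dev);
    sim_setup(new_dev, old_dev->event_handler);
}

/* Fixed order: remember the handler first, then exchange. */
static void replace_fixed(struct sim_dev *old_dev, struct sim_dev *new_dev)
{
    assert(old_dev != NULL && new_dev != NULL);
    sim_handler_t saved = old_dev->event_handler;
    sim_exchange(old_dev);
    sim_setup(new_dev, saved);
}
```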

    This does not cause any issue in the normal case, as most likely all the clockevent
    devices are set up before the highres switch. But it can potentially affect
    some corner cases where HPET force detect happens after the highres switch.
    This was a problem with the HPET in MSI mode code that we have been experimenting
    with.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Shaohua Li
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
     

26 Jul, 2008

1 commit