23 May, 2018

1 commit

  • commit 5596fe34495cf0f645f417eb928ef224df3e3cb4 upstream.

    for_each_cpu() unintuitively reports CPU0 as set independent of the actual
    cpumask content on UP kernels. This causes an unexpected PIT interrupt
    storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as
    a result, the virtual machine can suffer from a strange random delay of 1~20
    minutes during boot-up, and sometimes it can hang forever.

    Protect it by checking whether the cpumask is empty before entering the
    for_each_cpu() loop.

    [ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ]

    Signed-off-by: Dexuan Cui
    Signed-off-by: Thomas Gleixner
    Cc: Josh Poulson
    Cc: "Michael Kelley (EOSG)"
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: stable@vger.kernel.org
    Cc: Rakib Mullick
    Cc: Jork Loeser
    Cc: Greg Kroah-Hartman
    Cc: Andrew Morton
    Cc: KY Srinivasan
    Cc: Linus Torvalds
    Cc: Alexey Dobriyan
    Cc: Dmitry Vyukov
    Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
    Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     

13 Jun, 2017

1 commit


21 Feb, 2017

1 commit

  • Pull timer updates from Thomas Gleixner:
    "Nothing exciting, just the usual pile of fixes, updates and cleanups:

    - A bunch of clocksource driver updates

    - Removal of CONFIG_TIMER_STATS and the related /proc file

    - More posix timer slim down work

    - A scalability enhancement in the tick broadcast code

    - Math cleanups"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    hrtimer: Catch invalid clockids again
    math64, tile: Fix build failure
    clocksource/drivers/arm_arch_timer:: Mark cyclecounter __ro_after_init
    timerfd: Protect the might cancel mechanism proper
    timer_list: Remove useless cast when printing
    time: Remove CONFIG_TIMER_STATS
    clocksource/drivers/arm_arch_timer: Work around Hisilicon erratum 161010101
    clocksource/drivers/arm_arch_timer: Introduce generic errata handling infrastructure
    clocksource/drivers/arm_arch_timer: Remove fsl-a008585 parameter
    clocksource/drivers/arm_arch_timer: Add dt binding for hisilicon-161010101 erratum
    clocksource/drivers/ostm: Add renesas-ostm timer driver
    clocksource/drivers/ostm: Document renesas-ostm timer DT bindings
    clocksource/drivers/tcb_clksrc: Use 32 bit tcb as sched_clock
    clocksource/drivers/gemini: Add driver for the Cortina Gemini
    clocksource: add DT bindings for Cortina Gemini
    clockevents: Add a clkevt-of mechanism like clksrc-of
    tick/broadcast: Reduce lock cacheline contention
    timers: Omit POSIX timer stuff from task_struct when disabled
    x86/timer: Make delay() work during early bootup
    delay: Add explanation of udelay() inaccuracy
    ...

    Linus Torvalds
     

13 Feb, 2017

1 commit

  • tick_broadcast_lock is taken from interrupt context, but the following call
    chain takes the lock without disabling interrupts:

    [ 12.703736] _raw_spin_lock+0x3b/0x50
    [ 12.703738] tick_broadcast_control+0x5a/0x1a0
    [ 12.703742] intel_idle_cpu_online+0x22/0x100
    [ 12.703744] cpuhp_invoke_callback+0x245/0x9d0
    [ 12.703752] cpuhp_thread_fun+0x52/0x110
    [ 12.703754] smpboot_thread_fn+0x276/0x320

    So the following deadlock can happen:

    lock(tick_broadcast_lock);
    <Interrupt>
      lock(tick_broadcast_lock);

    intel_idle_cpu_online() is the only place which violates the calling
    convention of tick_broadcast_control(). This was caused by the removal of
    the smp function call in the course of the cpu hotplug rework.

    Instead of slapping local_irq_disable/enable() at the call site, we can
    relax the calling convention and handle it in the core code, which makes
    the whole machinery more robust.

    Fixes: 29d7bbada98e ("intel_idle: Remove superfluous SMP fuction call")
    Reported-by: Gabriel C
    Signed-off-by: Mike Galbraith
    Cc: Ruslan Ruslichenko
    Cc: Jiri Slaby
    Cc: Greg KH
    Cc: Borislav Petkov
    Cc: lwn@lwn.net
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Anna-Maria Gleixner
    Cc: Sebastian Siewior
    Cc: stable
    Link: http://lkml.kernel.org/r/1486953115.5912.4.camel@gmx.de
    Signed-off-by: Thomas Gleixner

    Mike Galbraith
     

04 Feb, 2017

1 commit

  • It was observed that on an Intel x86 system without the ARAT (Always
    running APIC timer) feature and with fairly large number of CPUs as
    well as CPUs coming in and out of intel_idle frequently, the lock
    contention on the tick_broadcast_lock can become significant.

    To reduce contention, the lock is put into its own cacheline and all
    the cpumask_var_t variables are put into the __read_mostly section.

    Running the SP benchmark of the NAS Parallel Benchmarks on a 4-socket
    16-core 32-thread Nehalem system, the performance number improved
    from 3353.94 Mop/s to 3469.31 Mop/s when this patch was applied on
    a 4.9.6 kernel. This is a 3.4% improvement.

    Signed-off-by: Waiman Long
    Cc: "Peter Zijlstra (Intel)"
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1485799063-20857-1-git-send-email-longman@redhat.com
    Signed-off-by: Thomas Gleixner

    Waiman Long
     

26 Dec, 2016

1 commit

    ktime is a union because the initial implementation stored the time in
    scalar nanoseconds on 64-bit machines and in an endianness-optimized
    timespec variant on 32-bit machines. The Y2038 cleanup removed the
    timespec variant and switched everything to scalar nanoseconds. The
    union remained, but became completely pointless.

    Get rid of the union and just keep ktime_t as simple typedef of type s64.

    The conversion was done with coccinelle and some manual mopping up.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
     

15 Dec, 2016

1 commit

  • When a dysfunctional timer, e.g. dummy timer, is installed, the tick core
    tries to setup the broadcast timer.

    If no broadcast device is installed, the kernel crashes with a NULL pointer
    dereference in tick_broadcast_setup_oneshot() because the function has no
    sanity check.

    Reported-by: Mason
    Signed-off-by: Thomas Gleixner
    Cc: Mark Rutland
    Cc: Anna-Maria Gleixner
    Cc: Richard Cochran
    Cc: Sebastian Andrzej Siewior
    Cc: Daniel Lezcano
    Cc: Peter Zijlstra
    Cc: Sebastian Frias
    Cc: Thibaud Cornic
    Cc: Robin Murphy
    Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr

    Thomas Gleixner
     

14 Jul, 2015

1 commit


11 Jul, 2015

1 commit


08 Jul, 2015

9 commits

  • Andriy reported that on a virtual machine the warning about negative
    expiry time in the clock events programming code triggered:

    hpet: hpet0 irq 40 for MSI
    hpet: hpet1 irq 41 for MSI
    Switching to clocksource hpet
    WARNING: at kernel/time/clockevents.c:239

    [] clockevents_program_event+0xdb/0xf0
    [] tick_handle_periodic_broadcast+0x41/0x50
    [] timer_interrupt+0x15/0x20

    When the second hpet is installed as a per cpu timer the broadcast
    event is no longer required and stopped, which sets the next_evt of
    the broadcast device to KTIME_MAX.

    If after that a spurious interrupt happens on the broadcast device,
    then the current code blindly handles it and tries to reprogram the
    broadcast device afterwards, which adds the period to
    next_evt. KTIME_MAX + period results in a negative expiry value
    causing the WARN_ON in the clockevents code to trigger.

    Add a proper check for the state of the broadcast device into the
    interrupt handler and return if the interrupt is spurious.

    [ Folded in pointer fix from Sudeep ]

    Reported-by: Andriy Gapon
    Signed-off-by: Thomas Gleixner
    Cc: Sudeep Holla
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Link: http://lkml.kernel.org/r/20150705205221.802094647@linutronix.de

    Thomas Gleixner
     
  • If the current cpu is the one which has the hrtimer based broadcast
    queued then we better return busy immediately instead of going through
    loops and hoops to figure that out.

    [ Split out from a larger combo patch ]

    Tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • Tell the idle code not to go deep if the broadcast IPI is about to
    arrive.

    [ Split out from a larger combo patch ]

    Tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • If the system is in periodic mode and the broadcast device is hrtimer
    based, return busy as we have no proper handling for this.

    [ Split out from a larger combo patch ]

    Tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • We need to check more than the periodic mode for proper operation in
    all runtime combinations. To avoid code duplication move the check
    into the enter state handling.

    No functional change.

    [ Split out from a larger combo patch ]

    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • Add a check for an installed broadcast device to the oneshot control
    function and return busy if not.

    [ Split out from a larger combo patch ]

    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • Currently the broadcast busy check, which prevents the idle code from
    going into deep idle, works only in one shot mode.

    If NOHZ and HIGHRES are off (config or command line) there is no
    sanity check at all, so under certain conditions cpus are allowed to
    go into deep idle, where the local timer stops, and are not woken up
    again because there is no broadcast timer installed or a hrtimer based
    broadcast device is not evaluated.

    Move tick_broadcast_oneshot_control() into the common code and provide
    proper subfunctions for the various config combinations.

    The common check in tick_broadcast_oneshot_control() is for the C3STOP
    misfeature flag of the local clock event device. If it's not set, idle
    can proceed. If set, further checks are necessary.

    Provide checks for the trivial cases:

    - If broadcast is disabled in the config, then return busy

    - If oneshot mode (NOHZ/HIGHRES) is disabled in the config, return
    busy if the broadcast device is hrtimer based.

    - If oneshot mode is enabled in the config call the original
    tick_broadcast_oneshot_control() function. That function needs
    extra checks which will be implemented in separate patches.

    [ Split out from a larger combo patch ]

    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • The broadcast code shuts down the local clock event unconditionally
    even if no broadcast device is installed or if the broadcast device is
    hrtimer based.

    Add proper sanity checks.

    [ Split out from a larger combo patch ]

    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • The hrtimer based broadcast vehicle can cause a hrtimer recursion
    which went unnoticed until we changed the hrtimer expiry code to keep
    track of the currently running timer.

    local_timer_interrupt()
      local_handler()
        hrtimer_interrupt()
          expire_hrtimers()
            broadcast_hrtimer()
              send_ipis()
                local_handler()
                  hrtimer_interrupt()
                    ....

    Solution is simple: Prevent the local handler call from the broadcast
    code when the broadcast 'device' is hrtimer based.

    [ Split out from a larger combo patch ]

    Tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     

02 Jun, 2015

2 commits


05 May, 2015

2 commits

  • Simplify the oneshot logic by avoiding the reprogramming loops. That
    also allows calling the cpu local handler outside of the
    broadcast_lock held region.

    Tested-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • With the removal of the hrtimer softirq the switch to highres/nohz
    mode happens in the tick interrupt. That leads to a livelock when the
    per cpu event handler is directly called from the broadcast handler
    under broadcast lock because broadcast lock needs to be taken for the
    highres/nohz switch as well.

    Solve this by calling the cpu local handler outside the broadcast_lock
    held region.

    Fixes: c6eb3f70d448 "hrtimer: Get rid of hrtimer softirq"
    Reported-and-tested-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

03 Apr, 2015

3 commits

  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the cleanup function for a dead cpu and invoke it
    directly from the cpu down code. Make it conditional on
    CPU_HOTPLUG as well.

    Temporary change, will be refined in the future.

    Signed-off-by: Thomas Gleixner
    [ Rebased, added clockevents_notify() removal ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the broadcast oneshot control into a separate function
    and provide inline helpers. Switch clockevents_notify() over.
    This will go away once all callers are converted.

    This also gets rid of the nested locking of clockevents_lock and
    broadcast_lock. The broadcast oneshot control functions do not
    require clockevents_lock. Only the managing functions
    (setup/shutdown/suspend/resume of the broadcast device) require
    clockevents_lock.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Alexandre Courbot
    Cc: Daniel Lezcano
    Cc: Len Brown
    Cc: Peter Zijlstra
    Cc: Stephen Warren
    Cc: Thierry Reding
    Cc: Tony Lindgren
    Link: http://lkml.kernel.org/r/13000649.8qZuEDV0OA@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the broadcast control into a separate function and
    provide inline helpers. Switch clockevents_notify() over. This
    will go away once all callers are converted.

    This also gets rid of the nested locking of clockevents_lock and
    broadcast_lock. The broadcast control functions do not require
    clockevents_lock. Only the managing functions
    (setup/shutdown/suspend/resume of the broadcast device) require
    clockevents_lock.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Daniel Lezcano
    Cc: Len Brown
    Cc: Peter Zijlstra
    Cc: Tony Lindgren
    Link: http://lkml.kernel.org/r/8086559.ttsuS0n1Xr@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

02 Apr, 2015

1 commit

  • It was found when doing a hotplug stress test on POWER, that the
    machine either hit softlockups or rcu_sched stall warnings. The
    issue was traced to commit:

    7cba160ad789 ("powernv/cpuidle: Redesign idle states management")

    which exposed the cpu_down() race with hrtimer based broadcast mode:

    5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")

    The race is the following:

    Assume CPU1 is the CPU which holds the hrtimer broadcasting duty
    before it is taken down.

    CPU0                                CPU1

    cpu_down()                          take_cpu_down()
                                        disable_interrupts()

    cpu_die()

    while (CPU1 != CPU_DEAD) {
        msleep(100);
        switch_to_idle();
        stop_cpu_timer();
        schedule_broadcast();
    }

    tick_cleanup_cpu_dead()
        take_over_broadcast()

    So after CPU1 disabled interrupts it cannot handle the broadcast
    hrtimer anymore, so CPU0 will be stuck forever.

    Fix this by explicitly taking over broadcast duty before cpu_die().

    This is a temporary workaround. What we really want is a callback
    in the clockevent device which allows us to do that from the dying
    CPU by pushing the hrtimer onto a different cpu. That might involve
    an IPI and is definitely more complex than this immediate fix.

    Changelog was picked up from:

    https://lkml.org/lkml/2015/2/16/213

    Suggested-by: Thomas Gleixner
    Tested-by: Nicolas Pitre
    Signed-off-by: Preeti U. Murthy
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: mpe@ellerman.id.au
    Cc: nicolas.pitre@linaro.org
    Cc: peterz@infradead.org
    Cc: rjw@rjwysocki.net
    Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html
    Link: http://lkml.kernel.org/r/20150330092410.24979.59887.stgit@preeti.in.ibm.com
    [ Merged it to the latest timer tree, renamed the callback, tidied up the changelog. ]
    Signed-off-by: Ingo Molnar

    Preeti U Murthy
     

01 Apr, 2015

2 commits

  • Xen calls on every cpu into tick_resume() which is just wrong.
    tick_resume() is for the syscore global suspend/resume
    invocation. What XEN really wants is a per cpu local resume
    function.

    Provide a tick_resume_local() function and use it in XEN.

    Also provide a complementary tick_suspend_local() and modify
    tick_unfreeze() and tick_freeze(), respectively, to use the
    new local tick resume/suspend functions.

    Signed-off-by: Thomas Gleixner
    [ Combined two patches, rebased, modified subject/changelog. ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: Konrad Rzeszutek Wilk
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1698741.eezk9tnXtG@vostro.rjw.lan
    [ Merged to latest timers/core. ]
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Solely used in tick-broadcast.c and the return value is
    hardcoded 0. Make it static and void.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1689058.QkHYDJSRKu@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

27 Mar, 2015

2 commits

  • 'enum clock_event_mode' is used for two purposes today:

    - to pass mode to the driver of clockevent device::set_mode().

    - for managing state of the device for clockevents core.

    For supporting new modes/states we have moved away from the
    legacy set_mode() callback to new per-mode/state callbacks. New
    modes/states shouldn't be exposed to the legacy (now OBSOLETE)
    callbacks and so we shouldn't add new states to 'enum
    clock_event_mode'.

    Let's have separate enums for the two use cases mentioned above.
    Keep using the earlier enum for legacy set_mode() callback and
    mark it OBSOLETE. And add another enum to clearly specify the
    possible states of a clockevent device.

    This also renames the newly added per-mode callbacks to reflect
    state changes.

    We haven't got rid of 'mode' member of 'struct
    clock_event_device' as it is used by some of the clockevent
    drivers and it would automatically die down once we migrate
    those drivers to the new interface. It ('mode') is only updated
    now for the drivers using the legacy interface.

    Suggested-by: Peter Zijlstra
    Suggested-by: Ingo Molnar
    Signed-off-by: Viresh Kumar
    Acked-by: Peter Zijlstra
    Cc: Daniel Lezcano
    Cc: Frederic Weisbecker
    Cc: Kevin Hilman
    Cc: Preeti U Murthy
    Cc: linaro-kernel@lists.linaro.org
    Cc: linaro-networking@linaro.org
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/b6b0143a8a57bd58352ad35e08c25424c879c0cb.1425037853.git.viresh.kumar@linaro.org
    Signed-off-by: Ingo Molnar

    Viresh Kumar
     
  • Upcoming patch will redefine possible states of a clockevent
    device. The RESUME mode is a special case only for tick's
    clockevent devices. In future it can be replaced by ->resume()
    callback already available for clockevent devices.

    Let's handle it separately so that clockevents_set_mode() only
    handles states valid across all devices. This also renames
    set_mode_resume() to tick_resume() to make it more explicit.

    Signed-off-by: Viresh Kumar
    Acked-by: Peter Zijlstra
    Cc: Daniel Lezcano
    Cc: Frederic Weisbecker
    Cc: Kevin Hilman
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: linaro-kernel@lists.linaro.org
    Cc: linaro-networking@linaro.org
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/c1b0112410870f49e7bf06958e1483eac6c15e20.1425037853.git.viresh.kumar@linaro.org
    Signed-off-by: Ingo Molnar

    Viresh Kumar
     

27 Aug, 2014

1 commit

  • Convert uses of __get_cpu_var for creating an address from a percpu
    offset to this_cpu_ptr.

    The two cases where get_cpu_var is used to actually access a percpu
    variable are changed to use this_cpu_read/raw_cpu_read.

    Reviewed-by: Thomas Gleixner
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

02 Apr, 2014

1 commit

  • Pull timer changes from Thomas Gleixner:
    "This assorted collection provides:

    - A new timer based timer broadcast feature for systems which do not
    provide a global accessible timer device. That allows those
    systems to put CPUs into deep idle states where the per cpu timer
    device stops.

    - A few NOHZ_FULL related improvements to the timer wheel

    - The usual updates to timer devices found in ARM SoCs

    - Small improvements and updates all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
    tick: Remove code duplication in tick_handle_periodic()
    tick: Fix spelling mistake in tick_handle_periodic()
    x86: hpet: Use proper destructor for delayed work
    workqueue: Provide destroy_delayed_work_on_stack()
    clocksource: CMT, MTU2, TMU and STI should depend on GENERIC_CLOCKEVENTS
    timer: Remove code redundancy while calling get_nohz_timer_target()
    hrtimer: Rearrange comments in the order struct members are declared
    timer: Use variable head instead of &work_list in __run_timers()
    clocksource: exynos_mct: silence a static checker warning
    arm: zynq: Add support for cpufreq
    arm: zynq: Don't use arm_global_timer with cpufreq
    clocksource/cadence_ttc: Overhaul clocksource frequency adjustment
    clocksource/cadence_ttc: Call clockevents_update_freq() with IRQs enabled
    clocksource: Add Kconfig entries for CMT, MTU2, TMU and STI
    sh: Remove Kconfig entries for TMU, CMT and MTU2
    ARM: shmobile: Remove CMT, TMU and STI Kconfig entries
    clocksource: armada-370-xp: Use atomic access for shared registers
    clocksource: orion: Use atomic access for shared registers
    clocksource: timer-keystone: Delete unnecessary variable
    clocksource: timer-keystone: introduce clocksource driver for Keystone
    ...

    Linus Torvalds
     

14 Feb, 2014

1 commit

  • AMD systems which use the C1E workaround in the amd_e400_idle routine
    trigger the WARN_ON_ONCE in the broadcast code when onlining a CPU.

    The reason is that the idle routine of those AMD systems switches the
    cpu into forced broadcast mode early on before the newly brought up
    CPU can switch over to high resolution / NOHZ mode. The timer related
    CPU1 bringup looks like this:

    clockevent_register_device(local_apic);
    tick_setup(local_apic);
    ...
    idle()
      tick_broadcast_on_off(FORCE);
      tick_broadcast_oneshot_control(ENTER)
        cpumask_set(cpu, broadcast_oneshot_mask);
      halt();

    Now the broadcast interrupt on CPU0 sets CPU1 in the
    broadcast_pending_mask and wakes CPU1. So CPU1 continues:

    local_apic_timer_interrupt()
      tick_handle_periodic();
        softirq()
          tick_init_highres();
            cpumask_clr(cpu, broadcast_oneshot_mask);

    tick_broadcast_oneshot_control(ENTER)
      WARN_ON(cpumask_test(cpu, broadcast_pending_mask));

    So while we remove CPU1 from the broadcast_oneshot_mask when we switch
    over to highres mode, we do not clear the pending bit, which then
    triggers the warning when we go back to idle.

    The reason why this is only visible on C1E affected AMD systems is
    that the other machines enter the deep sleep states via
    acpi_idle/intel_idle and exit the broadcast mode before executing the
    remote triggered local_apic_timer_interrupt. So the pending bit is
    already cleared when the switch over to highres mode is clearing the
    oneshot mask.

    The solution is simple: Clear the pending bit together with the mask
    bit when we switch over to highres mode.

    Stanislaw came up independently with the same patch by enforcing the
    C1E workaround and debugging the fallout. I picked mine, because mine
    has a changelog :)

    Reported-by: poma
    Debugged-by: Stanislaw Gruszka
    Signed-off-by: Thomas Gleixner
    Cc: Olaf Hering
    Cc: Dave Jones
    Cc: Justin M. Forbes
    Cc: Josh Boyer
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402111434180.21991@ionos.tec.linutronix.de
    Cc: stable@vger.kernel.org # 3.10+
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

07 Feb, 2014

3 commits

  • On some architectures, in certain CPU deep idle states the local timers stop.
    An external clock device is used to wake up these CPUs. The kernel support for the
    wakeup of these CPUs is provided by the tick broadcast framework by using the
    external clock device as the wakeup source.

    However not all implementations of architectures provide such an external
    clock device. This patch includes support in the broadcast framework to handle
    the wakeup of the CPUs in deep idle states on such systems by queuing a hrtimer
    on one of the CPUs, which is meant to handle the wakeup of CPUs in deep idle states.

    This patchset introduces a pseudo clock device which can be registered by the
    archs as tick_broadcast_device in the absence of a real external clock
    device. Once registered, the broadcast framework will work as is for these
    architectures as long as the archs take care of the BROADCAST_ENTER
    notification failing for one of the CPUs. This CPU is made the standby CPU to
    handle wakeup of the CPUs in deep idle and it *must not enter deep idle states*.

    The CPU with the earliest wakeup is chosen to be this CPU. Hence the
    standby CPU dynamically moves around, and so does the hrtimer which is queued
    to trigger at the next earliest wakeup time. This is consistent with the case where
    an external clock device is present. The smp affinity of this clock device is
    set to the CPU with the earliest wakeup. This patchset also handles hotplug of
    the standby CPU by moving the hrtimer onto the CPU handling the CPU_DEAD
    notification.

    Originally-from: Thomas Gleixner
    Signed-off-by: Preeti U Murthy
    Cc: deepthi@linux.vnet.ibm.com
    Cc: paulmck@linux.vnet.ibm.com
    Cc: fweisbec@gmail.com
    Cc: paulus@samba.org
    Cc: srivatsa.bhat@linux.vnet.ibm.com
    Cc: svaidy@linux.vnet.ibm.com
    Cc: peterz@infradead.org
    Cc: benh@kernel.crashing.org
    Cc: rafael.j.wysocki@intel.com
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/20140207080632.17187.80532.stgit@preeti.in.ibm.com
    Signed-off-by: Thomas Gleixner

    Preeti U Murthy
     
  • The broadcast framework can also be used by archs which do not have an
    external clock device. In that case, one of the CPUs needs
    to handle the broadcasting of wakeup IPIs to the CPUs in deep idle. As a
    result its local timers must remain functional at all times. For such
    a CPU, the BROADCAST_ENTER notification has to fail indicating that its clock
    device cannot be shutdown. To make way for this support, change the return
    type of tick_broadcast_oneshot_control() and hence clockevents_notify() to
    indicate such scenarios.

    Signed-off-by: Preeti U Murthy
    Cc: deepthi@linux.vnet.ibm.com
    Cc: paulmck@linux.vnet.ibm.com
    Cc: fweisbec@gmail.com
    Cc: paulus@samba.org
    Cc: srivatsa.bhat@linux.vnet.ibm.com
    Cc: svaidy@linux.vnet.ibm.com
    Cc: peterz@infradead.org
    Cc: benh@kernel.crashing.org
    Cc: rafael.j.wysocki@intel.com
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/20140207080606.17187.78306.stgit@preeti.in.ibm.com
    Signed-off-by: Thomas Gleixner

    Preeti U Murthy
     
  • We can identify the broadcast device in the core and serialize all
    callers including interrupts on a different CPU against the update.
    Also, disabling interrupts is moved into the core allowing callers to
    leave interrupts enabled when calling clockevents_update_freq().

    Signed-off-by: Soren Brinkmann
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Soeren Brinkmann
    Cc: Daniel Lezcano
    Cc: Michal Simek
    Link: http://lkml.kernel.org/r/1391466877-28908-2-git-send-email-soren.brinkmann@xilinx.com
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

03 Dec, 2013

1 commit

  • A few functions use remote per CPU access APIs when they
    deal with local values.

    Just do the right conversion to improve performance, code
    readability and debug checks.

    While at it, let's extend some of these function names with *_this_cpu()
    suffix in order to display their purpose more clearly.

    Signed-off-by: Frederic Weisbecker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steven Rostedt

    Frederic Weisbecker
     

02 Oct, 2013

1 commit

  • On most ARM systems the per-cpu clockevents are truly per-cpu in
    the sense that they can't be controlled on any other CPU besides
    the CPU that they interrupt. If one of these clockevents were to
    become a broadcast source we will run into a lot of trouble
    because the broadcast source is enabled on the first CPU to go
    into deep idle (if that CPU suffers from FEAT_C3_STOP) and that
    could be a different CPU than what the clockevent is interrupting
    (or even worse the CPU that the clockevent interrupts could be
    offline).

    Theoretically it's possible to support per-cpu clockevents as the
    broadcast source but so far we haven't needed this and supporting
    it is rather complicated. Let's just deny the possibility for now
    until this becomes a reality (let's hope it never does!).

    Signed-off-by: Soren Brinkmann
    Signed-off-by: Daniel Lezcano
    Acked-by: Michal Simek

    Soren Brinkmann
     

12 Jul, 2013

1 commit

  • On ARM systems the dummy clockevent is registered with the cpu
    hotplug notifier chain before any other per-cpu clockevent. This
    has the side-effect of causing the dummy clockevent to be
    registered first in every hotplug sequence. Because the dummy is
    first, we'll try to turn the broadcast source on but the code in
    tick_device_uses_broadcast() assumes the broadcast source is in
    periodic mode and calls tick_broadcast_start_periodic()
    unconditionally.

    On boot this isn't a problem because we typically haven't
    switched into oneshot mode yet (if at all). During hotplug, if
    the broadcast source isn't in periodic mode we'll replace the
    broadcast oneshot handler with the broadcast periodic handler and
    start emulating oneshot mode when we shouldn't. Due to the way
    the broadcast oneshot handler programs the next_event it's
    possible for it to contain KTIME_MAX and cause us to hang the
    system when the periodic handler tries to program the next tick.
    Fix this by using the appropriate function to start the broadcast
    source.

    Reported-by: Stephen Warren
    Tested-by: Stephen Warren
    Signed-off-by: Stephen Boyd
    Cc: Mark Rutland
    Cc: Marc Zyngier
    Cc: ARM kernel mailing list
    Cc: John Stultz
    Cc: Joseph Lo
    Link: http://lkml.kernel.org/r/20130711140059.GA27430@codeaurora.org
    Signed-off-by: Thomas Gleixner

    Stephen Boyd
     

05 Jul, 2013

1 commit