20 Apr, 2012

2 commits

  • During resume, tick_resume_broadcast() programs the broadcast timer in
    oneshot mode unconditionally. On the platforms where broadcast timer
    is not really required, this will generate spurious broadcast timer
    ticks upon resume. For example, on the always running apic timer
    platforms with HPET, I see a spurious hpet tick once every ~5 minutes
    (which is the 32-bit hpet counter wraparound time).

    Similar to boot time, during resume make the oneshot mode setting of
    the broadcast clock event device conditional on the state of active
    broadcast users.

    Signed-off-by: Suresh Siddha
    Tested-by: Santosh Shilimkar
    Tested-by: svenjoac@gmx.de
    Cc: torvalds@linux-foundation.org
    Cc: rjw@sisk.pl
    Link: http://lkml.kernel.org/r/1334802459.28674.209.camel@sbsiddha-desk.sc.intel.com
    Signed-off-by: Thomas Gleixner

    Suresh Siddha
     
  • Santosh found another trap when we avoid initializing the broadcast
    device in the switch_to_oneshot code. The broadcast device might be
    still in SHUTDOWN state when we actually need to use it. That
    obviously breaks, as set_next_event() is called on a shutdown
    device. This did not break on x86, but Suresh analyzed it:

    From the review, most likely on Sven's system we are force enabling
    the hpet using the pci quirk's method very late. And in this case,
    hpet_clockevent (which will be global_clock_event) handler can be
    null, specifically as this platform might not be using deeper c-states
    and using the reliable APIC timer.

    Prior to commit 'fa4da365bc7772c', that handler will be set to
    'tick_handle_oneshot_broadcast' when we switch the broadcast timer to
    oneshot mode, even though we don't use it. Post commit
    'fa4da365bc7772c', we stopped switching the broadcast mode to oneshot
    as this is not really needed and his platform's global_clock_event's
    handler will remain null. While on my SNB laptop, same is set to
    'clockevents_handle_noop' because hpet gets enabled very early. (noop
    handler on my platform set when the early enabled hpet timer gets
    replaced by the lapic timer).

    But the commit 'fa4da365bc7772c' tracked the broadcast timer mode in
    the SW as oneshot, even though it didn't touch the HW timer. During
    resume however, tick_resume_broadcast() saw the SW broadcast mode as
    oneshot and actually programmed the broadcast device also into oneshot
    mode. So this triggered the null pointer de-reference after the hpet
    wraps around and depending on what the hpet counter is set to. On the
    normal platforms where hpet gets enabled early we should be seeing a
    spurious interrupt (in my SNB laptop I see one spurious interrupt
    after around 5 minutes ;) which is 32-bit hpet counter wraparound
    time), but that's a separate issue.

    Enforce the mode setting when trying to set an event.

    Reported-and-tested-by: Santosh Shilimkar
    Signed-off-by: Thomas Gleixner
    Acked-by: Suresh Siddha
    Cc: torvalds@linux-foundation.org
    Cc: svenjoac@gmx.de
    Cc: rjw@sisk.pl
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181723350.2542@ionos

    Thomas Gleixner
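
The last commit above enforces the device mode at the point where an event is set. A minimal sketch of that idea follows, assuming a toy device model; `struct clock_dev`, `dev_program()` and `broadcast_set_event()` are hypothetical names for illustration, not the kernel's actual structures or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device modes; the names are illustrative only. */
enum dev_mode { DEV_SHUTDOWN, DEV_PERIODIC, DEV_ONESHOT };

struct clock_dev {
    enum dev_mode mode;
    int64_t next_event;   /* programmed expiry, in ns */
    int programmed;       /* number of successful programmings */
};

/* Low-level programming: only valid on a device in oneshot mode. */
static int dev_program(struct clock_dev *dev, int64_t expires)
{
    if (dev->mode != DEV_ONESHOT)
        return -1;        /* arming a shut-down device is the bug described above */
    dev->next_event = expires;
    dev->programmed++;
    return 0;
}

/* Enforce the mode first, then arm the device. */
static int broadcast_set_event(struct clock_dev *dev, int64_t expires)
{
    assert(dev != NULL);
    if (dev->mode != DEV_ONESHOT)
        dev->mode = DEV_ONESHOT;   /* device may still be in shutdown mode */
    return dev_program(dev, expires);
}
```

With this shape the caller no longer needs to know whether an earlier code path left the device in shutdown mode; the check happens at the only place where it matters.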
     

18 Apr, 2012

1 commit

  • Sven Joachim reported, that suspend/resume on rc3 trips over a NULL
    pointer dereference. Linus spotted the clockevent handler being NULL.

    commit fa4da365b (clockevents: Track broadcast device mode change in
    tick_broadcast_switch_to_oneshot()) tried to fix a problem with the
    broadcast device setup, which was introduced in commit 77b0d60c5
    (clockevents: Leave the broadcast device in shutdown mode when not
    needed).

    The initial commit avoided setting up the broadcast device when no
    broadcast request bits were set, but that left the broadcast device
    dysfunctional. As a consequence, CPUs in deep idle states which need
    the broadcast device were not woken up.

    commit fa4da365b tried to fix that by initializing the state of the
    broadcast facility, but that missed the fact, that nothing initializes
    the event handler and some other state of the underlying clock event
    device.

    The fix is to revert both commits and make only the mode setting of
    the clock event device conditional on the state of active broadcast
    users.

    That initializes everything except the low level device mode, but this
    happens when the broadcast functionality is invoked by deep idle.

    Reported-and-tested-by: Sven Joachim
    Signed-off-by: Thomas Gleixner
    Cc: Rafael J. Wysocki
    Cc: Linus Torvalds
    Cc: Suresh Siddha
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181205540.2542@ionos

    Thomas Gleixner
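
The fix described above can be sketched as follows: all software state, including the event handler, is always initialized, and only the low-level device mode depends on whether any CPU currently requests broadcast. This is a simplified model under stated assumptions; `struct bc_dev`, `bc_setup_oneshot()` and the 64-bit mask are hypothetical and do not reproduce the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum dev_mode { DEV_SHUTDOWN, DEV_ONESHOT };

typedef void (*event_handler_t)(void);

struct bc_dev {
    enum dev_mode hw_mode;     /* mode of the hardware device */
    event_handler_t handler;   /* must never be left NULL */
    int sw_oneshot;            /* software view of the broadcast mode */
};

/* Placeholder for the oneshot broadcast handler. */
static void oneshot_broadcast_handler(void)
{
}

/* broadcast_mask: one bit per CPU that currently needs broadcast. */
static void bc_setup_oneshot(struct bc_dev *bc, uint64_t broadcast_mask)
{
    assert(bc != NULL);
    bc->handler = oneshot_broadcast_handler;   /* always initialized */
    bc->sw_oneshot = 1;                        /* always tracked */
    if (broadcast_mask != 0)
        bc->hw_mode = DEV_ONESHOT;             /* only when someone needs it */
}
```

When the mask is empty the hardware stays in shutdown mode, but a later interrupt or a later broadcast request still finds a valid handler.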
     

10 Apr, 2012

1 commit

  • In the commit 77b0d60c5adf39c74039e2142a1d3cd1e4d53799,
    "clockevents: Leave the broadcast device in shutdown mode when not needed",
    we were bailing out too quickly in tick_broadcast_switch_to_oneshot(),
    without tracking the broadcast device mode change to 'TICKDEV_MODE_ONESHOT'.

    This breaks the platforms which need broadcast device oneshot services during
    deep idle states. tick_broadcast_oneshot_control() thinks that it is
    in periodic mode and fails to take proper decisions based on the
    CLOCK_EVT_NOTIFY_BROADCAST_[ENTER, EXIT] notifications during deep
    idle entry/exit.

    Fix this by tracking the broadcast device mode as 'TICKDEV_MODE_ONESHOT',
    before leaving the broadcast HW device in shutdown mode if there are no active
    requests for the moment.

    Reported-and-tested-by: Santosh Shilimkar
    Signed-off-by: Suresh Siddha
    Cc: johnstul@us.ibm.com
    Link: http://lkml.kernel.org/r/1334011304.12400.81.camel@sbsiddha-desk.sc.intel.com
    Signed-off-by: Thomas Gleixner

    Suresh Siddha
     

15 Feb, 2012

1 commit

  • Platforms with an Always Running APIC Timer don't use the broadcast timer,
    but the kernel is leaving the broadcast timer (HPET in this case)
    in oneshot mode.

    On these platforms, before the switch to oneshot mode, broadcast device is
    actually in shutdown mode. Code checks for empty tick_broadcast_mask and
    avoids going into the periodic mode.

    During switch to oneshot mode, add the same tick_broadcast_mask checks in the
    tick_broadcast_switch_to_oneshot() and avoid the broadcast device going into
    the oneshot mode.

    Signed-off-by: Suresh Siddha
    Cc: john stultz
    Cc: venki@google.com
    Link: http://lkml.kernel.org/r/1320452301.15071.16.camel@sbsiddha-desk.sc.intel.com
    Signed-off-by: Thomas Gleixner

    Suresh Siddha
     

02 Dec, 2011

1 commit


08 Sep, 2011

1 commit

  • The automatic increase of the min_delta_ns of a clockevents device
    should be done in the clockevents code as the minimum delay is an
    attribute of the clockevents device.

    In addition, not all architectures want the automatic adjustment. On a
    massively virtualized system it can happen that the programming of a
    clock event fails several times in a row because the virtual cpu has
    been rescheduled quickly enough. In that case the minimum delay will
    erroneously be increased with no way back. The new config symbol
    GENERIC_CLOCKEVENTS_MIN_ADJUST is used to enable the automatic
    adjustment. The config option is selected only for x86.

    Signed-off-by: Martin Schwidefsky
    Cc: john stultz
    Link: http://lkml.kernel.org/r/20110823133142.494157493@de.ibm.com
    Signed-off-by: Thomas Gleixner

    Martin Schwidefsky
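
A rough sketch of an opt-in automatic adjustment of the minimum delta is shown below. It assumes the policy described in the 05 Sep, 2008 entry of this log (allow three failures, then double the minimum delta); the policy in any given kernel version may differ. The `MIN_ADJUST` macro stands in for `GENERIC_CLOCKEVENTS_MIN_ADJUST`, and `hw_latency_ns` is a hypothetical field that models the shortest delay the hardware accepts.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the GENERIC_CLOCKEVENTS_MIN_ADJUST config symbol. */
#ifndef MIN_ADJUST
#define MIN_ADJUST 1
#endif

struct clock_dev {
    int64_t min_delta_ns;    /* attribute of the device */
    int64_t hw_latency_ns;   /* hypothetical: shortest delay that can be programmed */
    int retries;             /* consecutive programming failures */
};

/* Simulated hardware: programming fails when the delta is too short. */
static int dev_program(const struct clock_dev *dev, int64_t delta_ns)
{
    return delta_ns >= dev->hw_latency_ns ? 0 : -1;
}

/* Force an event min_delta_ns from now, adjusting the minimum if enabled. */
static int program_min_delta(struct clock_dev *dev)
{
    assert(dev->min_delta_ns > 0);
    for (;;) {
        if (dev_program(dev, dev->min_delta_ns) == 0) {
            dev->retries = 0;
            return 0;
        }
#if MIN_ADJUST
        /* After three failures in a row, double the minimum delta. */
        if (++dev->retries >= 3) {
            dev->min_delta_ns *= 2;
            dev->retries = 0;
        }
#else
        return -1;   /* no adjustment: report the failure to the caller */
#endif
    }
}
```

Note that the increase is one-way, which is exactly the concern raised above for virtualized systems: a few unlucky failures raise the minimum permanently.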
     

21 May, 2011

1 commit


17 May, 2011

1 commit

  • The first cpu which switches from periodic to oneshot mode switches
    also the broadcast device into oneshot mode. The broadcast device
    serves as a backup for per cpu timers which stop in deeper
    C-states. To avoid starvation of the cpus which might be in idle and
    depend on broadcast mode it marks the other cpus as broadcast active
    and sets the broadcast expiry value of those cpus to the next tick.

    The oneshot mode broadcast bit for the other cpus is sticky and gets
    only cleared when those cpus exit idle. If a cpu was not idle when
    the bit got set, the bit consequently prevents the broadcast device
    from being armed on behalf of that cpu when it enters idle for the
    first time after it switched to oneshot mode.

    In most cases that goes unnoticed as one of the other cpus has usually
    a timer pending which keeps the broadcast device armed with a short
    timeout. Now if the only cpu which has a short timer active has the
    bit set then the broadcast device will not be armed on behalf of that
    cpu and will fire way after the expected timer expiry. In the case of
    Christian's bug report it took ~145 seconds which is about half of the
    wrap around time of HPET (the limit for that device) due to the fact
    that all other cpus had no timers armed which expired before the 145
    seconds timeframe.

    The solution is simply to clear the broadcast active bit
    unconditionally when a cpu switches to oneshot mode after the first
    cpu switched the broadcast device over. It's not idle at that point
    otherwise it would not be executing that code.

    [ I fundamentally hate that broadcast crap. Why the heck thought some
    folks that when going into deep idle it's a brilliant concept to
    switch off the last device which brings the cpu back from that
    state? ]

    Thanks to Christian for providing all the valuable debug information!

    Reported-and-tested-by: Christian Hoffmann
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1105161105170.3078%40ionos%3E
    Cc: stable@kernel.org
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
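
The sticky-bit problem and its fix can be modelled with a plain bitmask. The sketch below is an illustration under simplifying assumptions (at most 64 CPUs, no locking, no expiry values); `struct bc_state`, `cpu_switch_to_oneshot()` and `cpu_enter_idle()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct bc_state {
    uint64_t oneshot_mask;   /* bit n set: cpu n is marked broadcast active */
    int bc_is_oneshot;       /* broadcast device already switched over */
};

static void cpu_switch_to_oneshot(struct bc_state *bc, unsigned cpu,
                                  uint64_t online_mask)
{
    uint64_t bit;

    assert(cpu < 64);
    bit = UINT64_C(1) << cpu;
    if (!bc->bc_is_oneshot) {
        /* First cpu: switch the device and mark all other cpus active. */
        bc->bc_is_oneshot = 1;
        bc->oneshot_mask |= online_mask & ~bit;
    } else {
        /*
         * Later cpu: it is executing this code, so it is not idle.
         * Clear its bit unconditionally (the fix described above).
         */
        bc->oneshot_mask &= ~bit;
    }
}

/* Idle entry: returns 1 when the device gets armed on behalf of this cpu. */
static int cpu_enter_idle(struct bc_state *bc, unsigned cpu)
{
    uint64_t bit = UINT64_C(1) << cpu;

    if (bc->oneshot_mask & bit)
        return 0;   /* stale bit: the cpu looks as if it were already handled */
    bc->oneshot_mask |= bit;
    return 1;
}
```

Without the unconditional clear in the `else` branch, the first idle entry of a later CPU would take the early return and its timer would not be considered when arming the broadcast device.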
     

05 May, 2011

1 commit

  • Avoid taking broadcast_lock in the idle path for systems where the
    timer doesn't stop in C3.

    [ tglx: Removed the stale label and added comment ]

    Signed-off-by: Andi Kleen
    Cc: Dave Kleikamp
    Cc: Chris Mason
    Cc: Peter Zijlstra
    Cc: Tim Chen
    Cc: lenb@kernel.org
    Cc: paulmck@us.ibm.com
    Link: http://lkml.kernel.org/r/%3C20110504234806.GF2925%40one.firstfloor.org%3E
    Signed-off-by: Thomas Gleixner

    Andi Kleen
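
The optimization above amounts to checking the per-cpu timer's features before touching the shared lock. A minimal sketch, assuming a hypothetical flag value and a counter in place of a real spinlock (`FEAT_C3STOP`, `struct bc_lock` and `broadcast_idle_control()` are illustrative names):

```c
#include <assert.h>
#include <stddef.h>

#define FEAT_C3STOP 0x1u   /* hypothetical flag: timer stops in deep idle */

struct cpu_timer {
    unsigned features;
};

/* Counts lock round trips; stands in for a real spinlock. */
struct bc_lock {
    int acquisitions;
};

static void broadcast_idle_control(const struct cpu_timer *t,
                                   struct bc_lock *lock)
{
    assert(t != NULL && lock != NULL);

    /* Timer keeps running in idle: nothing to do, do not touch the lock. */
    if (!(t->features & FEAT_C3STOP))
        return;

    /* Stands in for: lock, update the broadcast masks, unlock. */
    lock->acquisitions++;
}
```

On systems where every CPU takes the early return, the idle path never contends on the shared lock.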
     

16 Mar, 2011

1 commit

  • …l/git/tip/linux-2.6-tip

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
    posix-clocks: Check write permissions in posix syscalls
    hrtimer: Remove empty hrtimer_init_hres_timer()
    hrtimer: Update hrtimer->state documentation
    hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
    timers: Export CLOCK_BOOTTIME via the posix timers interface
    timers: Add CLOCK_BOOTTIME hrtimer base
    time: Extend get_xtime_and_monotonic_offset() to also return sleep
    time: Introduce get_monotonic_boottime and ktime_get_boottime
    hrtimers: extend hrtimer base code to handle more then 2 clockids
    ntp: Remove redundant and incorrect parameter check
    mn10300: Switch do_timer() to xtimer_update()
    posix clocks: Introduce dynamic clocks
    posix-timers: Cleanup namespace
    posix-timers: Add support for fd based clocks
    x86: Add clock_adjtime for x86
    posix-timers: Introduce a syscall for clock tuning.
    time: Splitout compat timex accessors
    ntp: Add ADJ_SETOFFSET mode bit
    time: Introduce timekeeping_inject_offset
    posix-timer: Update comment
    ...

    Fix up new system-call-related conflicts in
    arch/x86/ia32/ia32entry.S
    arch/x86/include/asm/unistd_32.h
    arch/x86/include/asm/unistd_64.h
    arch/x86/kernel/syscall_table_32.S
    (name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
    due to movement of get_jiffies_64() in:
    kernel/time.c

    Linus Torvalds
     

26 Feb, 2011

1 commit

  • When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, then we only
    can switch into oneshot mode, when the backup broadcast device
    supports oneshot mode as well. Otherwise we would try to switch the
    broadcast device into an unsupported mode unconditionally. This went
    unnoticed so far as the current available broadcast devices support
    oneshot mode. Seth unearthed this problem while debugging and working
    around an hpet related BIOS wreckage.

    Add the necessary check to tick_is_oneshot_available().

    Reported-and-tested-by: Seth Forshee
    Signed-off-by: Thomas Gleixner
    LKML-Reference:
    Cc: stable@kernel.org # .21 ->

    Thomas Gleixner
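
The check added by this commit can be sketched as a small predicate. The flag values and the function below are assumptions made for illustration; they mirror the logic in the description, not the actual `tick_is_oneshot_available()` source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature flag values, for illustration only. */
#define FEAT_ONESHOT 0x1u
#define FEAT_C3STOP  0x2u

struct clock_dev {
    unsigned features;
};

/* bc_dev may be NULL when no broadcast device is registered. */
static bool oneshot_available(const struct clock_dev *cpu_dev,
                              const struct clock_dev *bc_dev)
{
    assert(cpu_dev != NULL);

    if (!(cpu_dev->features & FEAT_ONESHOT))
        return false;
    /* A timer that never stops does not depend on the broadcast device. */
    if (!(cpu_dev->features & FEAT_C3STOP))
        return true;
    /* Otherwise the backup device must support oneshot mode as well. */
    return bc_dev != NULL && (bc_dev->features & FEAT_ONESHOT) != 0;
}
```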
     

01 Feb, 2011

1 commit

  • All callers of do_timer() are converted to xtime_update(). The only
    users of xtime_lock are in kernel/time/. Make both local to
    kernel/time/ and remove them from the global header files.

    [ tglx: Reuse tick-internal.h instead of creating another local header
    file. Massaged changelog ]

    Signed-off-by: Torben Hohn
    Cc: Peter Zijlstra
    Cc: johnstul@us.ibm.com
    Cc: yong.zhang0@gmail.com
    Cc: hch@infradead.org
    Signed-off-by: Thomas Gleixner

    Torben Hohn
     

12 Jul, 2010

1 commit


15 Dec, 2009

1 commit


20 Aug, 2009

1 commit

  • Currently clockevents_notify() is called with interrupts enabled at
    some places and interrupts disabled at some other places.

    This results in a deadlock in this scenario.

    cpu A holds clockevents_lock in clockevents_notify() with irqs enabled
    cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled
    cpu C doing set_mtrr() which will try to rendezvous all the cpus.

    This will result in C and A coming to the rendezvous point and waiting
    for B. B is stuck forever waiting for the spinlock and thus not
    reaching the rendezvous point.

    Fix the clockevents code so that clockevents_lock is taken with
    interrupts disabled and thus avoid the above deadlock.

    Also call lapic_timer_propagate_broadcast() on the destination cpu so
    that we avoid calling smp_call_function() in the clockevents notifier
    chain.

    This issue left us wondering if we need to change the MTRR rendezvous
    logic to use stop machine logic (instead of smp_call_function) or add
    a check in spinlock debug code to see if there are other spinlocks
    which get taken under both interrupts enabled/disabled conditions.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Venkatesh Pallipadi
    Cc: "Pallipadi Venkatesh"
    Cc: "Brown Len"
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Suresh Siddha
     

02 May, 2009

1 commit


01 Jan, 2009

2 commits


13 Dec, 2008

1 commit


18 Oct, 2008

1 commit

  • We did not restart the tick device from irq_enter() to avoid double
    reprogramming and extra events in the case of an immediate return to idle.

    But long lasting softirqs can lead to a situation where jiffies become
    stale:

    idle()
    tick stopped (reprogrammed to next pending timer)
    halt()
    interrupt
    jiffies updated from irq_enter()
    interrupt handler
    softirq function 1 runs 20ms
    softirq function 2 arms a 10ms timer with a stale jiffies value
    jiffies updated from irq_exit()
    timer wheel has now an already expired timer
    (the one added in function 2)
    timer fires and timer softirq runs

    This was discovered when debugging a timer problem which happened only
    when the ath5k driver is active. The debugging proved that there is a
    softirq function running for more than 20ms, which is a bug by itself.

    To solve this we restart the tick timer right from irq_enter(), but do
    not go through the other functions which are necessary to return from
    idle when need_resched() is set.

    Reported-by: Elias Oltmanns
    Signed-off-by: Thomas Gleixner
    Tested-by: Elias Oltmanns

    Thomas Gleixner
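
The stale-jiffies timeline above can be replayed with a few lines of arithmetic. The sketch assumes one jiffy per millisecond and models "jiffies" as a cached copy of real time that only moves when explicitly updated; `struct timeline` and its helpers are hypothetical.

```c
#include <assert.h>

struct timeline {
    unsigned long real_ms;   /* real elapsed time in ms */
    unsigned long jiffies;   /* cached counter, 1 jiffy == 1 ms (assumption) */
};

/* Time passes (for example in a long softirq) without a jiffies update. */
static void run_for(struct timeline *t, unsigned long ms)
{
    t->real_ms += ms;
}

/* Stands in for the jiffies update done from irq_enter()/irq_exit(). */
static void update_jiffies(struct timeline *t)
{
    t->jiffies = t->real_ms;
}

/* Arm a relative timer; returns its absolute expiry in jiffies. */
static unsigned long arm_timer(const struct timeline *t, unsigned long delay_ms)
{
    assert(delay_ms > 0);
    return t->jiffies + delay_ms;
}
```

If a softirq runs for 20 ms and then arms a 10 ms timer without an intervening update, the expiry lands 10 ms in the past once jiffies catches up. Updating jiffies before the timer is armed, which is what keeping the tick running during the softirq achieves in this model, puts the expiry 10 ms in the future as intended.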
     

04 Oct, 2008

1 commit


23 Sep, 2008

2 commits

  • Impact: timer hang on CPU online observed on AMD C1E systems

    When a CPU is brought online then the broadcast machinery can
    be in the one shot state already. Check this and setup the timer
    device of the new CPU in one shot mode so the broadcast code
    can pick up the next_event value correctly.

    Another AMD C1E oddity, as we switch to broadcast immediately and
    not after the full bring up via the ACPI cpu idle code.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Impact: Possible hang on CPU online observed on AMD C1E machines.

    The broadcast setup code looks at the mode of the tick device to
    determine whether it needs to be shut down or setup. This is wrong
    when the broadcast mode is set to one shot already. This can happen
    when a CPU is brought online as it goes through the periodic setup
    first.

    The problem went unnoticed as sane systems do not call into that code
    before the switch to one shot for the clock event device happens.
    The AMD C1E idle routine switches over immediately and thereby shuts
    down the just setup device before the first interrupt happens.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

17 Sep, 2008

1 commit

  • The device shut down does not clean up the next_event variable of the
    clock event device. So when the device is reactivated, the possibly
    stale next_event value can prevent the device from being reprogrammed,
    as it claims to be waiting on an event already.

    This is the root cause of the resurfacing suspend/resume problem,
    where systems need key press to come back to life.

    Fix this by setting next_event to KTIME_MAX when the device is shut
    down. Use a separate function for shutdown which takes care of that
    and only keep the direct set mode call in the broadcast code, where we
    can not touch the next_event value.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
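
The effect of a stale next_event can be illustrated with a toy model. It assumes a caller that skips reprogramming whenever the device already reports an earlier or equal pending event; that rule, `EXPIRY_MAX` (standing in for KTIME_MAX) and all function names below are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXPIRY_MAX INT64_MAX   /* stand-in for KTIME_MAX */

struct clock_dev {
    int active;
    int64_t next_event;   /* expiry the device claims to be waiting for */
};

/* Returns 1 if the device was reprogrammed, 0 if the request was skipped. */
static int dev_arm(struct clock_dev *dev, int64_t expires)
{
    assert(dev != NULL);
    if (expires >= dev->next_event)
        return 0;   /* device claims an earlier event is already pending */
    dev->active = 1;
    dev->next_event = expires;
    return 1;
}

/* Old behaviour: next_event keeps its stale value across shutdown. */
static void dev_shutdown_stale(struct clock_dev *dev)
{
    dev->active = 0;
}

/* Fixed behaviour: forget the pending event on shutdown. */
static void dev_shutdown(struct clock_dev *dev)
{
    dev->active = 0;
    dev->next_event = EXPIRY_MAX;
}
```

After a stale shutdown, a request that expires later than the old value is silently dropped, so nothing wakes the system until some other interrupt (such as a key press) arrives.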
     

06 Sep, 2008

1 commit


05 Sep, 2008

3 commits

  • The C1E/HPET bug reports on AMDX2/RS690 systems were tracked down to a
    too small value of the HPET minimum delta for programming an event.

    The clockevents code needs to enforce an interrupt event on the clock event
    device in some cases. The enforcement code was stupid and naive, as it just
    added the minimum delta to the current time and tried to reprogram the device.
    When the minimum delta is too small, then this loops forever.

    Add a sanity check. Allow reprogramming to fail 3 times, then print a warning
    and double the minimum delta value to make sure, that this does not happen again.
    Use the same function for both tick-oneshot and tick-broadcast code.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • While chasing the C1E/HPET bugreports I went through the clock events
    code inch by inch and found that the broadcast device can be initialized
    and shutdown multiple times. Multiple shutdowns are not critical, but a
    useless waste of time. Multiple initializations are simply broken. Another
    CPU might have the device in use already after the first initialization and
    the second init could just render it unusable again.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The reprogramming of the periodic broadcast handler was broken
    when the first programming returned -ETIME. The clockevents code
    stores the new expiry value in the clock events device next_event field
    only when the programming time has not elapsed yet. The loop in
    question calculates the new expiry value from the next_event value,
    which therefore never increases.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
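
The periodic reprogramming problem in the last commit above can be sketched as follows. The model assumes a simulated device whose `next_event` is only updated on successful programming, as described; the fix shown is to advance a local copy of the expiry instead of re-reading `next_event` on each iteration. All names are hypothetical and -1 stands in for -ETIME.

```c
#include <assert.h>
#include <stdint.h>

struct clock_dev {
    int64_t now;          /* simulated current time */
    int64_t next_event;   /* updated only when programming succeeds */
};

/* Returns 0 on success, -1 (standing in for -ETIME) if already elapsed. */
static int dev_program(struct clock_dev *dev, int64_t expires)
{
    if (expires <= dev->now)
        return -1;
    dev->next_event = expires;
    return 0;
}

/*
 * Rearm the periodic event. Returns the number of periods skipped.
 * Recomputing "dev->next_event + period" inside the loop would retry
 * the same elapsed value forever, because a failed programming leaves
 * next_event unchanged.
 */
static int rearm_periodic(struct clock_dev *dev, int64_t period)
{
    int64_t next = dev->next_event;   /* local copy that really advances */
    int skipped = 0;

    assert(period > 0);
    for (;;) {
        next += period;
        if (dev_program(dev, next) == 0)
            return skipped;
        skipped++;
    }
}
```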
     

16 Jul, 2008

2 commits

  • Conflicts:

    arch/x86/xen/smp.c
    kernel/sched_rt.c
    net/iucv/iucv.c

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Conflicts:

    arch/powerpc/Kconfig
    arch/s390/kernel/time.c
    arch/x86/kernel/apic_32.c
    arch/x86/kernel/cpu/perfctr-watchdog.c
    arch/x86/kernel/i8259_64.c
    arch/x86/kernel/ldt.c
    arch/x86/kernel/nmi_64.c
    arch/x86/kernel/smpboot.c
    arch/x86/xen/smp.c
    include/asm-x86/hw_irq_32.h
    include/asm-x86/hw_irq_64.h
    include/asm-x86/mach-default/irq_vectors.h
    include/asm-x86/mach-voyager/irq_vectors.h
    include/asm-x86/smp.h
    kernel/Makefile

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

08 Jul, 2008

1 commit

  • C1E on AMD machines is like C3 but without control from the OS. Up to
    now we disabled the local apic timer for those machines as it stops
    when the CPU goes into C1E. This excludes those machines from high
    resolution timers / dynamic ticks, which hurts especially X2 based
    laptops.

    The current boot time C1E detection has another, more serious flaw
    as well: some BIOSes do not enable C1E until the ACPI processor module
    is loaded. This causes systems to stop working after that point.

    To work nicely with C1E enabled machines we use a separate idle
    function, which checks on idle entry whether C1E was enabled in the
    Interrupt Pending Message MSR. This allows us to do timer broadcasting
    for C1E and covers the late enablement of C1E as well.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

26 Jun, 2008

1 commit


24 May, 2008

1 commit


21 Apr, 2008

1 commit


17 Apr, 2008

1 commit

  • > Generic code is not supposed to include irq.h. Replace this include
    > by linux/hardirq.h instead and add/replace an include of linux/irq.h
    > in asm header files where necessary.
    > This change should only matter for architectures that make use of
    > GENERIC_CLOCKEVENTS.
    > Architectures in question are mips, x86, arm, sh, powerpc, uml and sparc64.
    >
    > I did some cross compile tests for mips, x86_64, arm, powerpc and sparc64.
    > This patch fixes also build breakages caused by the include replacement in
    > tick-common.h.

    I generally dislike adding optional linux/* includes in asm/* includes -
    I'm nervous about this causing include loops.

    However, there's a separate point to be discussed here.

    That is, what interfaces are expected of every architecture in the kernel.
    If generic code wants to be able to set the affinity of interrupts, then
    that needs to become part of the interfaces listed in linux/interrupt.h
    rather than linux/irq.h.

    So what I suggest is this approach instead (against Linus' tree of a
    couple of days ago) - we move irq_set_affinity() and irq_can_set_affinity()
    to linux/interrupt.h, change the linux/irq.h includes to linux/interrupt.h
    and include asm/irq_regs.h where needed (asm/irq_regs.h is supposed to be
    rarely used include since not much touches the stacked parent context
    registers.)

    Build tested on ARM PXA family kernels and ARM's Realview platform
    kernels which both use genirq.

    [ tglx@linutronix.de: add GENERIC_HARDIRQ dependencies ]

    Signed-off-by: Russell King
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Russell King
     

30 Jan, 2008

1 commit


19 Dec, 2007

1 commit

  • Resolve the following regression of a choppy, almost unusable laptop:

    http://lkml.org/lkml/2007/12/7/299
    http://bugzilla.kernel.org/show_bug.cgi?id=9525

    A previous version of the code did the reprogramming of the broadcast
    device in the return from idle code. This was removed, but the logic in
    tick_handle_oneshot_broadcast() was kept the same.

    When a broadcast interrupt happens we signal the expiry to all CPUs
    which have an expired event. If none of the CPUs has an expired event,
    which can happen in dyntick mode, then we reprogram the broadcast
    device. We do not reprogram otherwise, but this is only correct if all
    CPUs, which are in the idle broadcast state have been woken up.

    The code ignores, that there might be pending not yet expired events on
    other CPUs, which are in the idle broadcast state. So the delivery of
    those events can be delayed for quite a time.

    Change the tick_handle_oneshot_broadcast() function to check for CPUs,
    which are in broadcast state and are not woken up by the current event,
    and enforce the rearming of the broadcast device for those CPUs.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
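
The reworked handler logic can be sketched as one pass over the CPUs in broadcast state: expired CPUs are signalled, and the earliest expiry among the remaining ones is used to rearm the broadcast device. This is a simplified model (four CPUs, no locking, no IPIs); `struct bc_state` and `handle_oneshot_broadcast()` are hypothetical names, not the kernel's `tick_handle_oneshot_broadcast()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPUS 4
#define EXPIRY_MAX INT64_MAX   /* means "nothing pending" */

struct bc_state {
    uint64_t oneshot_mask;       /* CPUs in the idle broadcast state */
    int64_t next_event[NCPUS];   /* next expiry of each cpu */
    int64_t bc_next;             /* expiry programmed into the broadcast device */
};

/* Returns the mask of CPUs whose event has expired and must be woken. */
static uint64_t handle_oneshot_broadcast(struct bc_state *bc, int64_t now)
{
    uint64_t woken = 0;
    int64_t next = EXPIRY_MAX;
    unsigned cpu;

    assert(bc != NULL);
    for (cpu = 0; cpu < NCPUS; cpu++) {
        uint64_t bit = UINT64_C(1) << cpu;

        if (!(bc->oneshot_mask & bit))
            continue;                    /* cpu is not in broadcast state */
        if (bc->next_event[cpu] <= now)
            woken |= bit;                /* expired: signal this cpu */
        else if (bc->next_event[cpu] < next)
            next = bc->next_event[cpu];  /* earliest not yet expired event */
    }

    /* Rearm for CPUs that stay asleep, even if some CPUs were woken. */
    bc->bc_next = next;
    return woken;
}
```

The old logic only reprogrammed the device when no CPU was woken, which corresponds to skipping the final assignment whenever `woken` is non-zero and leaves the sleeping CPUs without a wakeup source.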
     

06 Nov, 2007

1 commit


18 Oct, 2007

1 commit