11 Aug, 2011

1 commit

  • Its possible to jam up the alarm timers by setting very small interval
    timers, which will cause the alarmtimer subsystem to spend all of its time
    firing and restarting timers. This can effectivly lock up a box.

    A deeper fix is needed, closely mimicking the hrtimer code, but for now
    just cap the interval to 100us to avoid userland hanging the system.

    CC: Thomas Gleixner
    CC: stable@kernel.org
    Signed-off-by: John Stultz

    John Stultz
     

10 Aug, 2011

2 commits


23 Jul, 2011

1 commit


21 Jul, 2011

1 commit

  • Terribly embarassing. Don't know how I committed this, but its
    KERN_WARNING not KERN_WARN.

    This fixes the following compile error:
    kernel/time/timekeeping.c: In function ‘__timekeeping_inject_sleeptime’:
    kernel/time/timekeeping.c:608: error: ‘KERN_WARN’ undeclared (first use in this function)
    kernel/time/timekeeping.c:608: error: (Each undeclared identifier is reported only once
    kernel/time/timekeeping.c:608: error: for each function it appears in.)
    kernel/time/timekeeping.c:608: error: expected ‘)’ before string constant
    make[2]: *** [kernel/time/timekeeping.o] Error 1

    Reported-by: Ingo Molnar
    Signed-off-by: John Stultz

    John Stultz
     

22 Jun, 2011

4 commits

  • Because the read_persistent_clock interface is usually backed by
    only a second granular interface, each time we read from the persistent
    clock for suspend/resume, we introduce a half second (on average) of error.

    In order to avoid this error accumulating as the system is suspended
    over and over, this patch measures the time delta between the persistent
    clock and the system CLOCK_REALTIME.

    If the delta is less then 2 seconds from the last suspend, we compensate
    by using the previous time delta (keeping it close). If it is larger
    then 2 seconds, we assume the clock was set or has been changed, so we
    do no correction and update the delta.

    Note: If NTP is running, ths could seem to "fight" with the NTP corrected
    time, where as if the system time was off by 1 second, and NTP slewed the
    value in, a suspend/resume cycle could undo this correction, by trying to
    restore the previous offset from the persistent clock. However, without
    this patch, since each read could cause almost a full second worth of
    error, its possible to get almost 2 seconds of error just from the
    suspend/resume cycle alone, so this about equal to any offset added by
    the compensation.

    Further on systems that suspend/resume frequently, this should keep time
    closer then NTP could compensate for if the errors were allowed to
    accumulate.

    Credits to Arve Hjønnevåg for suggesting this solution.

    CC: Arve Hjønnevåg
    CC: Thomas Gleixner
    Signed-off-by: John Stultz

    John Stultz
     
  • Arve suggested making sure we catch possible negative sleep time
    intervals that could be passed into timekeeping_inject_sleeptime.

    CC: Arve Hjønnevåg
    CC: Thomas Gleixner
    Signed-off-by: John Stultz

    John Stultz
     
  • Toralf Förster and Richard Weinberger noted that if there is
    no RTC device, the alarm timers core prints out an annoying
    "ALARM timers will not wake from suspend" message.

    This warning has been removed in a previous patch, however
    the issue still remains: The original idea was to support
    alarm timers even if there was no rtc device, as long as the
    system didn't go into suspend.

    However, after further consideration, communicating to the application
    that alarmtimers are not fully functional seems like the better
    solution.

    So this patch makes it so we return -ENOTSUPP to any posix _ALARM
    clockid calls if there is no backing RTC device on the system.

    Further this changes the behavior where when there is no rtc device
    we will check for one on clock_getres, clock_gettime, timer_create,
    and timer_nsleep instead of on suspend.

    CC: Toralf Förster
    CC: Richard Weinberger
    CC: Thomas Gleixner
    Reported-by: Toralf Förster
    Reported by: Richard Weinberger
    Signed-off-by: John Stultz

    John Stultz
     
  • The alarmtimers code currently picks a rtc device to use at
    late init time. However, if your rtc driver is loaded as a module,
    it may be registered after the alarmtimers late init code, leaving
    the alarmtimers nonfunctional.

    This patch moves the the rtcdevice selection to when we actually try
    to use it, allowing us to make use of rtc modules that may have been
    loaded at any point since bootup.

    CC: Thomas Gleixner
    CC: Meelis Roos
    Reported-by: Meelis Roos
    Signed-off-by: John Stultz

    John Stultz
     

17 Jun, 2011

1 commit

  • The clocksource watchdog code is interruptible and it has been
    observed that this can trigger false positives which disable the TSC.

    The reason is that an interrupt storm or a long running interrupt
    handler between the read of the watchdog source and the read of the
    TSC brings the two far enough apart that the delta is larger than the
    unstable treshold. Move both reads into a short interrupt disabled
    region to avoid that.

    Reported-and-tested-by: Vernon Mauery
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Thomas Gleixner
     

03 Jun, 2011

1 commit

  • For UP it's stupid to request an initialized cpumask for the clock
    event devices. Though we need the mask set even on UP to avoid a
    horrible ifdeffery especially in the broadcast code.

    For SMP we can at least try to survive with a warning and set the
    cpumask of the cpu we're running on. That gives a decent chance to
    bring the machine up and retrieve the debug info.

    Signed-off-by: Thomas Gleixner
    Cc: Linus Walleij
    Cc: Russell King - ARM Linux
    Cc: Stephen Boyd

    Thomas Gleixner
     

23 May, 2011

1 commit


21 May, 2011

1 commit


20 May, 2011

3 commits

  • unsigned long is not 64bit on 32bit machine.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • …l/git/tip/linux-2.6-tip

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    hrtimer: Make lookup table const
    RTC: Disable CONFIG_RTC_CLASS from being built as a module
    timers: Fix alarmtimer build issues when CONFIG_RTC_CLASS=n
    timers: Remove delayed irqwork from alarmtimers implementation
    timers: Improve alarmtimer comments and minor fixes
    timers: Posix interface for alarm-timers
    timers: Introduce in-kernel alarm-timer interface
    timers: Add rb_init_node() to allow for stack allocated rb nodes
    time: Add timekeeping_inject_sleeptime

    Linus Torvalds
     
  • …x/kernel/git/tip/linux-2.6-tip

    * 'timers-clockevents-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: hpet: Cleanup the clockevents init and register code
    x86: Convert PIT to clockevents_config_and_register()
    clockevents: Provide interface to reconfigure an active clock event device
    clockevents: Provide combined configure and register function
    clockevents: Restructure clock_event_device members
    clocksource: Get rid of the hardcoded 5 seconds sleep time limit
    clocksource: Restructure clocksource struct members

    Linus Torvalds
     

19 May, 2011

3 commits


17 May, 2011

1 commit

  • The first cpu which switches from periodic to oneshot mode switches
    also the broadcast device into oneshot mode. The broadcast device
    serves as a backup for per cpu timers which stop in deeper
    C-states. To avoid starvation of the cpus which might be in idle and
    depend on broadcast mode it marks the other cpus as broadcast active
    and sets the brodcast expiry value of those cpus to the next tick.

    The oneshot mode broadcast bit for the other cpus is sticky and gets
    only cleared when those cpus exit idle. If a cpu was not idle while
    the bit got set in consequence the bit prevents that the broadcast
    device is armed on behalf of that cpu when it enters idle for the
    first time after it switched to oneshot mode.

    In most cases that goes unnoticed as one of the other cpus has usually
    a timer pending which keeps the broadcast device armed with a short
    timeout. Now if the only cpu which has a short timer active has the
    bit set then the broadcast device will not be armed on behalf of that
    cpu and will fire way after the expected timer expiry. In the case of
    Christians bug report it took ~145 seconds which is about half of the
    wrap around time of HPET (the limit for that device) due to the fact
    that all other cpus had no timers armed which expired before the 145
    seconds timeframe.

    The solution is simply to clear the broadcast active bit
    unconditionally when a cpu switches to oneshot mode after the first
    cpu switched the broadcast device over. It's not idle at that point
    otherwise it would not be executing that code.

    [ I fundamentally hate that broadcast crap. Why the heck thought some
    folks that when going into deep idle it's a brilliant concept to
    switch off the last device which brings the cpu back from that
    state? ]

    Thanks to Christian for providing all the valuable debug information!

    Reported-and-tested-by: Christian Hoffmann
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1105161105170.3078%40ionos%3E
    Cc: stable@kernel.org
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

05 May, 2011

2 commits

  • Avoid taking broadcast_lock in the idle path for systems where the
    timer doesn't stop in C3.

    [ tglx: Removed the stale label and added comment ]

    Signed-off-by: Andi Kleen
    Cc: Dave Kleikamp
    Cc: Chris Mason
    Cc: Peter Zijlstra
    Cc: Tim Chen
    Cc: lenb@kernel.org
    Cc: paulmck@us.ibm.com
    Link: http://lkml.kernel.org/r/%3C20110504234806.GF2925%40one.firstfloor.org%3E
    Signed-off-by: Thomas Gleixner

    Andi Kleen
     
  • Christian Hoffmann reported that the command line clocksource override
    with acpi_pm timer fails:

    Kernel command line: clocksource=acpi_pm
    hpet clockevent registered
    Switching to clocksource hpet
    Override clocksource acpi_pm is not HRT compatible.
    Cannot switch while in HRT/NOHZ mode.

    The watchdog code is what enables CLOCK_SOURCE_VALID_FOR_HRES, but we
    actually end up selecting the clocksource before we enqueue it into
    the watchdog list, so that's why we see the warning and fail to switch
    to acpi_pm timer as requested. That's particularly bad when we want to
    debug timekeeping related problems in early boot.

    Put the selection call last.

    Reported-by: Christian Hoffmann
    Signed-off-by: John Stultz
    Cc: stable@kernel.org # 32...
    Link: http://lkml.kernel.org/r/%3C1304558210.2943.24.camel%40work-vm%3E
    Signed-off-by: Thomas Gleixner

    john stultz
     

04 May, 2011

2 commits

  • class_find_device() takes a refcount on the rtc device. rtc_open()
    takes another one, so we can drop it after the rtc_open() call.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz

    Thomas Gleixner
     
  • alarmtimer_late_init() uses class_find_device() to find a alarm
    capable rtc device. The match callback stores a pointer to the name in
    the char pointer handed in from the call site. alarmtimer_late_init()
    checks the char pointer for NULL, but the pointer is on the stack and
    not initialized to NULL before the call. So it can have random content
    when the match function did not identify a device, which leads to
    random access in the following rtc_open() call where the pointer is
    dereferenced

    Instead of relying on the char pointer, check the return value of
    class_find_device. If a device is found then the name pointer is valid
    as well.

    Reported-by: Ingo Molnar
    Cc: John Stultz
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

03 May, 2011

3 commits

  • Some applications must be aware of clock realtime being set
    backward. A simple example is a clock applet which arms a timer for
    the next minute display. If clock realtime is set backward then the
    applet displays a stale time for the amount of time which the clock
    was set backwards. Due to that applications poll the time because we
    don't have an interface.

    Extend the timerfd interface by adding a flag which puts the timer
    onto a different internal realtime clock. All timers on this clock are
    expired whenever the clock was set.

    The timerfd core records the monotonic offset when the timer is
    created. When the timer is armed, then the current offset is compared
    to the previous recorded offset. When it has changed, then
    timerfd_settime returns -ECANCELED. When a timer is read the offset is
    compared and if it changed -ECANCELED returned to user space. Periodic
    timers are not rearmed in the cancelation case.

    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Chris Friesen
    Tested-by: Kay Sievers
    Cc: "Kirill A. Shutemov"
    Cc: Peter Zijlstra
    Cc: Davide Libenzi
    Reviewed-by: Alexander Shishkin
    Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104271359580.3323%40ionos%3E
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Make clock_was_set() unconditional and rename hres_timers_resume to
    hrtimers_resume. This is a preparatory patch for hrtimers which are
    cancelled when clock realtime was set.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Ingo pointed out that the alarmtimers won't build if CONFIG_RTC_CLASS=n.
    This patch adds proper ifdefs to the alarmtimer code to disable the rtc
    usage if it is not built in.

    Reported-by: Ingo Molnar
    Signed-off-by: John Stultz
    Signed-off-by: Thomas Gleixner

    John Stultz
     

29 Apr, 2011

2 commits

  • Thomas asked about the delayed irq work in the alarmtimers code,
    and I realized that it was a legacy from when the alarmtimer base
    lock was a mutex (due to concerns that we'd be interacting with
    the RTC device, which is protected by mutexes).

    Since the alarmtimer base is now protected by a spinlock, we can
    simply execute alarmtimer functions directly from the hrtimer
    callback. Should any future alarmtimer functions sleep, they can
    simply manage scheduling any delayed work themselves.

    CC: Thomas Gleixner
    Signed-off-by: John Stultz

    John Stultz
     
  • This patch addresses a number of minor comment improvements and
    other minor issues from Thomas' review of the alarmtimers code.

    CC: Thomas Gleixner
    Signed-off-by: John Stultz

    John Stultz
     

27 Apr, 2011

3 commits

  • This patch exposes alarm-timers to userland via the posix clock
    and timers interface, using two new clockids: CLOCK_REALTIME_ALARM
    and CLOCK_BOOTTIME_ALARM. Both clockids behave identically to
    CLOCK_REALTIME and CLOCK_BOOTTIME, respectively, but timers
    set against the _ALARM suffixed clockids will wake the system if
    it is suspended.

    Some background can be found here:
    https://lwn.net/Articles/429925/

    The concept for Alarm-timers was inspired by the Android Alarm
    driver (by Arve Hjønnevåg) found in the Android kernel tree.

    See: http://android.git.kernel.org/?p=kernel/common.git;a=blob;f=drivers/rtc/alarm.c;h=1250edfbdf3302f5e4ea6194847c6ef4bb7beb1c;hb=android-2.6.36

    While the in-kernel interface is pretty similar between
    alarm-timers and Android alarm driver, the user-space interface
    for the Android alarm driver is via ioctls to a new char device.
    As mentioned above, I've instead chosen to export this functionality
    via the posix interface, as it seemed a little simpler and avoids
    creating duplicate interfaces to things like CLOCK_REALTIME and
    CLOCK_MONOTONIC under alternate names (ie:ANDROID_ALARM_RTC and
    ANDROID_ALARM_SYSTEMTIME).

    The semantics of the Android alarm driver are different from what
    this posix interface provides. For instance, threads other then
    the thread waiting on the Android alarm driver are able to modify
    the alarm being waited on. Also this interface does not allow
    the same wakelock semantics that the Android driver provides
    (ie: kernel takes a wakelock on RTC alarm-interupt, and holds it
    through process wakeup, and while the process runs, until the
    process either closes the char device or calls back in to wait
    on a new alarm).

    One potential way to implement similar semantics may be via
    the timerfd infrastructure, but this needs more research.

    There may also need to be some sort of sysfs system level policy
    hooks that allow alarm timers to be disabled to keep them
    from firing at inappropriate times (ie: laptop in a well insulated
    bag, mid-flight).

    CC: Arve Hjønnevåg
    CC: Thomas Gleixner
    CC: Alessandro Zummo
    Acked-by: Arnd Bergmann
    Signed-off-by: John Stultz

    John Stultz
     
  • This provides the in kernel interface and infrastructure for
    alarm-timers.

    Alarm-timers are a hybrid style timer, similar to hrtimers,
    but when the system is suspended, the RTC device is set to
    fire and wake the system for when the soonest alarm-timer
    expires.

    The concept for Alarm-timers was inspired by the Android Alarm
    driver (by Arve Hjønnevåg) found in the Android kernel tree.

    See: http://android.git.kernel.org/?p=kernel/common.git;a=blob;f=drivers/rtc/alarm.c;h=1250edfbdf3302f5e4ea6194847c6ef4bb7beb1c;hb=android-2.6.36

    This in-kernel interface should be fairly compatible with the
    Android alarm driver in-kernel interface, but has the advantage
    of utilizing the new RTC timerqueue code instead of doing direct
    RTC manipulation.

    CC: Arve Hjønnevåg
    CC: Thomas Gleixner
    CC: Alessandro Zummo
    Acked-by: Arnd Bergmann
    Signed-off-by: John Stultz

    John Stultz
     
  • Some platforms cannot implement read_persistent_clock, as
    their RTC devices are only accessible when interrupts are enabled.
    This keeps them from being used by the timekeeping code on resume
    to measure the time in suspend.

    The RTC layer tries to work around this, by calling do_settimeofday
    on resume after irqs are reenabled to set the time properly. However,
    this only corrects CLOCK_REALTIME, and does not properly adjust
    the sleep time value. This causes btime in /proc/stat to be incorrect
    as well as making the new CLOCK_BOTTTIME inaccurate.

    This patch resolves the issue by introducing a new timekeeping hook
    to allow the RTC layer to inject the sleep time on resume.

    The code also checks to make sure that read_persistent_clock is
    nonfunctional before setting the sleep time, so that should the RTC's
    HCTOSYS option be configured in on a system that does support
    read_persistent_clock we will not increase the total_sleep_time twice.

    CC: Arve Hjønnevåg
    CC: Thomas Gleixner
    Acked-by: Arnd Bergmann
    Signed-off-by: John Stultz

    John Stultz
     

18 Apr, 2011

1 commit

  • A dynamic posix clock is protected from asynchronous removal by a mutex.
    However, using a mutex has the unwanted effect that a long running clock
    operation in one process will unnecessarily block other processes.

    For example, one process might call read() to get an external time stamp
    coming in at one pulse per second. A second process calling clock_gettime
    would have to wait for almost a whole second.

    This patch fixes the issue by using a reader/writer semaphore instead of
    a mutex.

    Signed-off-by: Richard Cochran
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/%3C20110330132421.GA31771%40riccoc20.at.omicron.at%3E
    Signed-off-by: Thomas Gleixner

    Richard Cochran
     

08 Apr, 2011

1 commit


04 Apr, 2011

1 commit

  • The ADJ_SETOFFSET bit added in commit 094aa188 ("ntp: Add ADJ_SETOFFSET
    mode bit") also introduced a way for any user to change the system time.
    Sneaky or buggy calls to adjtimex() could set

    ADJ_OFFSET_SS_READ | ADJ_SETOFFSET

    which would result in a successful call to timekeeping_inject_offset().
    This patch fixes the issue by adding the capability check.

    Signed-off-by: Richard Cochran
    Signed-off-by: Linus Torvalds

    Richard Cochran
     

31 Mar, 2011

1 commit


24 Mar, 2011

1 commit

  • The timekeeping subsystem uses a sysdev class and a sysdev for
    executing timekeeping_suspend() after interrupts have been turned off
    on the boot CPU (during system suspend) and for executing
    timekeeping_resume() before turning on interrupts on the boot CPU
    (during system resume). However, since both of these functions
    ignore their arguments, the entire mechanism may be replaced with a
    struct syscore_ops object which is simpler.

    Signed-off-by: Rafael J. Wysocki
    Reviewed-by: Thomas Gleixner

    Rafael J. Wysocki
     

16 Mar, 2011

1 commit

  • …l/git/tip/linux-2.6-tip

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
    posix-clocks: Check write permissions in posix syscalls
    hrtimer: Remove empty hrtimer_init_hres_timer()
    hrtimer: Update hrtimer->state documentation
    hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
    timers: Export CLOCK_BOOTTIME via the posix timers interface
    timers: Add CLOCK_BOOTTIME hrtimer base
    time: Extend get_xtime_and_monotonic_offset() to also return sleep
    time: Introduce get_monotonic_boottime and ktime_get_boottime
    hrtimers: extend hrtimer base code to handle more then 2 clockids
    ntp: Remove redundant and incorrect parameter check
    mn10300: Switch do_timer() to xtimer_update()
    posix clocks: Introduce dynamic clocks
    posix-timers: Cleanup namespace
    posix-timers: Add support for fd based clocks
    x86: Add clock_adjtime for x86
    posix-timers: Introduce a syscall for clock tuning.
    time: Splitout compat timex accessors
    ntp: Add ADJ_SETOFFSET mode bit
    time: Introduce timekeeping_inject_offset
    posix-timer: Update comment
    ...

    Fix up new system-call-related conflicts in
    arch/x86/ia32/ia32entry.S
    arch/x86/include/asm/unistd_32.h
    arch/x86/include/asm/unistd_64.h
    arch/x86/kernel/syscall_table_32.S
    (name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
    due to movement of get_jiffies_64() in:
    kernel/time.c

    Linus Torvalds
     

13 Mar, 2011

1 commit

  • pc_clock_settime() and pc_clock_adjtime() do not check whether the fd
    was opened in write mode, so a clock can be set with a read only fd.

    [ tglx: We deliberately do not return -EPERM as we want this to be
    distingushable from the capability based permission check ]

    Signed-off-by: Torben Hohn
    LKML-Reference:
    Cc: Richard Cochran
    Cc: John Stultz
    Cc: Thomas Gleixner

    Torben Hohn
     

26 Feb, 2011

1 commit

  • When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, then we only
    can switch into oneshot mode, when the backup broadcast device
    supports oneshot mode as well. Otherwise we would try to switch the
    broadcast device into an unsupported mode unconditionally. This went
    unnoticed so far as the current available broadcast devices support
    oneshot mode. Seth unearthed this problem while debugging and working
    around an hpet related BIOS wreckage.

    Add the necessary check to tick_is_oneshot_available().

    Reported-and-tested-by: Seth Forshee
    Signed-off-by: Thomas Gleixner
    LKML-Reference:
    Cc: stable@kernel.org # .21 ->

    Thomas Gleixner