21 Apr, 2008

2 commits

  • The previous optimization did not take the case into account where a
    clock provides its own softirq_get_time() function.

    Check for the availablitiy of the clock get time function first and
    then check if we need to retrieve the time for both clocks via
    hrtimer_softirq_gettime() to avoid a double evaluation of time in that
    case as well.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • It seems that hrtimer_run_queues() is calling hrtimer_get_softirq_time() more
    often than it needs to. This can cause frequent contention on systems with
    large numbers of processors/cores.

    With this patch, hrtimer_run_queues only calls hrtimer_get_softirq_time() if
    there is a pending timer in one of the hrtimer bases, and only once.

    This also combines hrtimer_run_queues() and the inline run_hrtimer_queue()
    into one function.

    [ tglx@linutronix.de: coding style ]

    Signed-off-by: Dimitri Sivanich
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Dimitri Sivanich
     

17 Apr, 2008

2 commits

  • In order to avoid the false positive from lockdep, each per-cpu base->lock has
    the separate lock class and migrate_hrtimers() uses double_spin_lock().

    This is overcomplicated: except for migrate_hrtimers() we never take 2 locks
    at once, and migrate_hrtimers() can use spin_lock_nested().

    Signed-off-by: Oleg Nesterov
    Cc: Arjan van de Ven
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • Convert all the nanosleep related users of restart_block to the
    new nanosleep specific restart_block fields.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

15 Feb, 2008

2 commits

  • A CLOCK_REALTIME timer, which has an absolute expiry time less than
    the clock realtime offset calls with a negative delta into the clock
    events code and triggers the WARN_ON() there.

    This is a false positive and needs to be prevented. Check the result
    of timer->expires - timer->base->offset right away and return -ETIME
    right away.

    Thanks to Frans Pop, who reported the problem and tested the fixes.

    Signed-off-by: Thomas Gleixner
    Tested-by: Frans Pop

    Thomas Gleixner
     
  • Various user space callers ask for relative timeouts. While we fixed
    that overflow issue in hrtimer_start(), the sites which convert
    relative user space values to absolute timeouts themself were uncovered.

    Instead of putting overflow checks into each place add a function
    which does the sanity checking and convert all affected callers to use
    it.

    Thanks to Frans Pop, who reported the problem and tested the fixes.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Tested-by: Frans Pop

    Thomas Gleixner
     

10 Feb, 2008

2 commits

  • hrtimer_nanosleep_restart() clears/restores restart_block->fn. This is
    pointless and complicates its usage. Note that if sys_restart_syscall()
    doesn't actually happen, we have a bogus "pending" restart->fn anyway,
    this is harmless.

    Signed-off-by: Oleg Nesterov
    Cc: Alexey Dobriyan
    Cc: Pavel Emelyanov
    Cc: Peter Zijlstra
    Cc: Toyo Abe
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • Spotted by Pavel Emelyanov and Alexey Dobriyan.

    hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
    the local variable which lives in the caller's stack frame. This means that
    if sys_restart_syscall() actually happens and it is interrupted as well, we
    don't update the user-space variable, but write into the already dead stack
    frame.

    Introduced by commit 04c227140fed77587432667a574b14736a06dd7f
    hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier

    Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
    hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.

    Small problem remains. man 2 nanosleep states that *rtmp should be written if
    nanosleep() was interrupted (it says nothing whether it is OK to update *rmtp
    if nanosleep returns 0), but (with or without this patch) we can dirty *rem
    even if nanosleep() returns 0.

    NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
    bugs. Fixed by the next patch.

    Signed-off-by: Oleg Nesterov
    Cc: Alexey Dobriyan
    Cc: Michael Kerrisk
    Cc: Pavel Emelyanov
    Cc: Peter Zijlstra
    Cc: Toyo Abe
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    include/linux/hrtimer.h | 2 -
    kernel/hrtimer.c | 51 +++++++++++++++++++++++++-----------------------
    kernel/posix-timers.c | 14 +------------
    3 files changed, 30 insertions(+), 37 deletions(-)

    Oleg Nesterov
     

06 Feb, 2008

1 commit

  • This is the new timerfd API as it is implemented by the following patch:

    int timerfd_create(int clockid, int flags);
    int timerfd_settime(int ufd, int flags,
    const struct itimerspec *utmr,
    struct itimerspec *otmr);
    int timerfd_gettime(int ufd, struct itimerspec *otmr);

    The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
    parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.

    The timerfd_settime() API give new settings by the timerfd fd, by optionally
    retrieving the previous expiration time (in case the "otmr" parameter is not
    NULL).

    The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
    is set in the "flags" parameter. Otherwise it's a relative time.

    The timerfd_gettime() API returns the next expiration time of the timer, or
    {0, 0} if the timerfd has not been set yet.

    Like the previous timerfd API implementation, read(2) and poll(2) are
    supported (with the same interface). Here's a simple test program I used to
    exercise the new timerfd APIs:

    http://www.xmailserver.org/timerfd-test2.c

    [akpm@linux-foundation.org: coding-style cleanups]
    [akpm@linux-foundation.org: fix ia64 build]
    [akpm@linux-foundation.org: fix m68k build]
    [akpm@linux-foundation.org: fix mips build]
    [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
    [heiko.carstens@de.ibm.com: fix s390]
    [akpm@linux-foundation.org: fix powerpc build]
    [akpm@linux-foundation.org: fix sparc64 more]
    Signed-off-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Thomas Gleixner
    Cc: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Cc: Michael Kerrisk
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

02 Feb, 2008

1 commit

  • this patch:

    commit 37bb6cb4097e29ffee970065b74499cbf10603a3
    Author: Peter Zijlstra
    Date: Fri Jan 25 21:08:32 2008 +0100

    hrtimer: unlock hrtimer_wakeup

    Broke hrtimer_init_sleeper() users. It forgot to fix up the futex
    caller of this function to detect the failed queueing and messed up
    the do_nanosleep() caller in that it could leak a TASK_INTERRUPTIBLE
    state.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

26 Jan, 2008

3 commits


22 Jan, 2008

1 commit

  • Fix section mismatch in hrtimer.c:

    WARNING: vmlinux.o(.text+0x50c61): Section mismatch: reference to .init.text: (between 'hrtimer_cpu_notify' and 'down_read_trylock')

    Noticed by Johannes Berg and confirmed by Sam Ravnborg.

    Signed-off-by: Randy Dunlap
    Cc: Sam Ravnborg
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

08 Dec, 2007

1 commit

  • Relative hrtimers with a large timeout value might end up as negative
    timer values, when the current time is added in hrtimer_start().

    This in turn is causing the clockevents_set_next() function to set an
    huge timeout and sleep for quite a long time when we have a clock
    source which is capable of long sleeps like HPET. With PIT this almost
    goes unnoticed as the maximum delta is ~27ms. The non-hrt/nohz code
    sorts this out in the next timer interrupt, so we never noticed that
    problem which has been there since the first day of hrtimers.

    This bug became more apparent in 2.6.24 which activates HPET on more
    hardware.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

29 Oct, 2007

1 commit


20 Oct, 2007

2 commits


19 Oct, 2007

1 commit


11 Oct, 2007

1 commit


26 Jul, 2007

2 commits

  • This avoids xtime lag seen with dynticks, because while 'xtime' itself
    is still not updated often, we keep a 'xtime_cache' variable around that
    contains the approximate real-time that _is_ updated each time we do a
    'update_wall_time()', and is thus never off by more than one tick.

    IOW, this restores the original semantics for 'xtime' users, as long as
    you use the proper abstraction functions (ie 'current_kernel_time()' or
    'get_seconds()' depending on whether you want a timespec or just the
    seconds field).

    [ Updated Patch. As penance for my sins I've also yanked another #ifdef
    that was added to avoid the xtime lag w/ hrtimers. ]

    Signed-off-by: John Stultz
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    john stultz
     
  • This avoids use of the kernel-internal "xtime" variable directly outside
    of the actual time-related functions. Instead, use the helper functions
    that we already have available to us.

    This doesn't actually change any behaviour, but this will allow us to
    fix the fact that "xtime" isn't updated very often with CONFIG_NO_HZ
    (because much of the realtime information is maintained as separate
    offsets to 'xtime'), which has caused interfaces that use xtime directly
    to get a time that is out of sync with the real-time clock by up to a
    third of a second or so.

    Signed-off-by: John Stultz
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    john stultz
     

22 Jul, 2007

2 commits


17 Jul, 2007

1 commit


10 May, 2007

1 commit

  • Since nonboot CPUs are now disabled after tasks and devices have been
    frozen and the CPU hotplug infrastructure is used for this purpose, we need
    special CPU hotplug notifications that will help the CPU-hotplug-aware
    subsystems distinguish normal CPU hotplug events from CPU hotplug events
    related to a system-wide suspend or resume operation in progress. This
    patch introduces such notifications and causes them to be used during
    suspend and resume transitions. It also changes all of the
    CPU-hotplug-aware subsystems to take these notifications into consideration
    (for now they are handled in the same way as the corresponding "normal"
    ones).

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

09 May, 2007

1 commit

  • Other symbols of the hrtimers API are already exported.

    Signed-off-by: Stas Sergeev
    Acked-by: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stas Sergeev
     

28 Apr, 2007

1 commit


26 Apr, 2007

1 commit

  • Get rid of the manual clock source selection mess and use ktime. Also
    use a scalar representation, which allows to clean up pkt_sched.h a bit
    more and results in less ktime_to_ns() calls in most cases.

    The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite
    inefficient by this patch, following patches will convert all qdiscs
    to hrtimers and get rid of them entirely.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

08 Apr, 2007

1 commit

  • Soeren Sonnenburg reported that upon resume he is getting
    this backtrace:

    [] smp_apic_timer_interrupt+0x57/0x90
    [] retrigger_next_event+0x0/0xb0
    [] apic_timer_interrupt+0x28/0x30
    [] retrigger_next_event+0x0/0xb0
    [] __kfifo_put+0x8/0x90
    [] on_each_cpu+0x35/0x60
    [] clock_was_set+0x18/0x20
    [] timekeeping_resume+0x7c/0xa0
    [] __sysdev_resume+0x11/0x80
    [] sysdev_resume+0x47/0x80
    [] device_power_up+0x5/0x10

    it turns out that on resume we mistakenly re-enable interrupts too
    early. Do the timer retrigger only on the current CPU.

    Signed-off-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Acked-by: Soeren Sonnenburg
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

29 Mar, 2007

1 commit

  • hrtimer_start() incorrectly set the 'reprogram' flag to enqueue_hrtimer(),
    which should only be 1 if the hrtimer is queued to the current CPU.

    Doing otherwise could result in a reprogramming of the current CPU's
    clockevents device, with a timer that is not queued to it - resulting in a
    bogus next expiry value.

    Signed-off-by: Ingo Molnar
    Cc: Michal Piotrowski
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

17 Mar, 2007

2 commits

  • commit f4304ab21513b834c8fe3403927c60c2b81a72d7 (HZ free NTP) moved the
    access to wall_to_monotonic in hrtimer_get_softirq_time() out of the
    xtime_lock protection.

    Move it back into the seq_lock section.

    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • hrtimer_forward() does not check for the possible overflow of
    timer->expires. This can happen on 64 bit machines with large interval
    values and results currently in an endless loop in the softirq because the
    expiry value becomes negative and therefor the timer is expired all the
    time.

    Check for this condition and set the expiry value to the max. expiry time
    in the future. The fix should be applied to stable kernel series as well.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

07 Mar, 2007

1 commit

  • The TIMER_SOFTIRQ runs the hrtimers during bootup until a usable
    clocksource and clock event sources are registered. The switch to high
    resolution mode happens inside of the TIMER_SOFTIRQ, but runs the softirq
    afterwards. That way the tick emulation timer, which was set up in the
    switch to highres might be executed in the softirq context, which is a BUG.
    The rbtree has not to be touched by the softirq after the highres switch.

    This BUG was observed by Andres Salomon, who provided the information to
    debug it.

    Return early from the softirq, when the switch was sucessful.

    [dilinger@debian.org: add debug warning]
    [akpm@linux-foundation.org: make debug warning compile]
    Signed-off-by: Thomas Gleixner
    Cc: Andres Salomon
    Acked-by: Ingo Molnar
    Signed-off-by: Andres Salomon
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

05 Mar, 2007

1 commit

  • Doing something like this on a two cpu system

    # echo 0 > /sys/devices/system/cpu/cpu0/online
    # echo 1 > /sys/devices/system/cpu/cpu0/online
    # echo 0 > /sys/devices/system/cpu/cpu1/online

    will give me this:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.21-rc2-g562aa1d4-dirty #7
    -------------------------------------------------------
    bash/1282 is trying to acquire lock:
    (&cpu_base->lock_key){.+..}, at: [] hrtimer_cpu_notify+0xc6/0x240

    but task is already holding lock:
    (&cpu_base->lock_key#2){.+..}, at: [] hrtimer_cpu_notify+0xbc/0x240

    which lock already depends on the new lock.

    This happens because we have the following code in kernel/hrtimer.c:

    migrate_hrtimers(int cpu)
    [...]
    old_base = &per_cpu(hrtimer_bases, cpu);
    new_base = &get_cpu_var(hrtimer_bases);
    [...]
    spin_lock(&new_base->lock);
    spin_lock(&old_base->lock);

    Which means the spinlocks are taken in an order which depends on which cpu
    gets shut down from which other cpu. Therefore lockdep complains that there
    might be an ABBA deadlock. Since migrate_hrtimers() gets only called on
    cpu hotplug it's safe to assume that it isn't executed concurrently on a

    The same problem exists in kernel/timer.c: migrate_timers().

    As pointed out by Christian Borntraeger one possible solution to avoid
    the locking order complaints would be to make sure that the locks are
    always taken in the same order. E.g. by taking the lock of the cpu with
    the lower number first.

    To achieve this we introduce two new spinlock functions double_spin_lock
    and double_spin_unlock which lock or unlock two locks in a given order.

    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Roman Zippel
    Cc: John Stultz
    Cc: Christian Borntraeger
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

17 Feb, 2007

5 commits

  • Add /proc/timer_stats support: debugging feature to profile timer expiration.
    Both the starting site, process/PID and the expiration function is captured.
    This allows the quick identification of timer event sources in a system.

    Sample output:

    # echo 1 > /proc/timer_stats
    # cat /proc/timer_stats
    Timer Stats Version: v0.1
    Sample period: 4.010 s
    24, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
    11, 0 swapper sk_reset_timer (tcp_delack_timer)
    6, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
    2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    17, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
    2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    4, 2050 pcscd do_nanosleep (hrtimer_wakeup)
    5, 4179 sshd sk_reset_timer (tcp_write_timer)
    4, 2248 yum-updatesd schedule_timeout (process_timeout)
    18, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
    3, 0 swapper sk_reset_timer (tcp_delack_timer)
    1, 1 swapper neigh_table_init_no_netlink (neigh_periodic_timer)
    2, 1 swapper e1000_up (e1000_watchdog)
    1, 1 init schedule_timeout (process_timeout)
    100 total events, 25.24 events/sec

    [ cleanups and hrtimers support from Thomas Gleixner ]
    [bunk@stusta.de: nr_entries can become static]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Andi Kleen
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Implement high resolution timers on top of the hrtimers infrastructure and the
    clockevents / tick-management framework. This provides accurate timers for
    all hrtimer subsystem users.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • With Ingo Molnar

    Add functions to provide dynamic ticks and high resolution timers. The code
    which keeps track of jiffies and handles the long idle periods is shared
    between tick based and high resolution timer based dynticks. The dyntick
    functionality can be disabled on the kernel commandline. Provide also the
    infrastructure to support high resolution timers.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Architectures register their clock event devices, in the clock events core.
    Users of the clockevents core can get clock event devices for their use. The
    clockevents core code provides notification mechanisms for various clock
    related management events.

    This allows to control the clock event devices without the architectures
    having to worry about the details of function assignment. This is also a
    preliminary for high resolution timers and dynamic ticks to allow the core
    code to control the clock functionality without intrusive changes to the
    architecture code.

    [Fixes-by: Ingo Molnar ]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: Roman Zippel
    Cc: john stultz
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Reintroduce ktimers feature "optimized away" by the ktimers review process:
    remove the curr_timer pointer from the cpu-base and use the hrtimer state.

    No functional changes.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: Roman Zippel
    Cc: john stultz
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner