25 Nov, 2008

1 commit

  • Impact: cleanup, move all hrtimer processing into hardirq context

    This is an attempt at removing some of the hrtimer complexity by
    reducing the number of callback modes to 1.

    This means that all hrtimer callback functions will be run from HARD-irq
    context.

    I went through all the 30 odd hrtimer callback functions in the kernel
    and saw only one that I'm not quite sure of, which is the one in
    net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.

    Furthermore, the hrtimer core now calls callbacks directly with IRQs
    disabled in case you try to enqueue an expired timer. If this timer is a
    periodic timer (which should use hrtimer_forward() to advance its time)
    then it might be possible to end up in an infinite recursive loop due to the
    fact that hrtimer_forward() doesn't round up to the next timer
    granularity, and therefore keeps on calling the callback - obviously
    this needs a fix.

    Aside from that, this seems to compile and actually boot on my dual core
    test box - although I'm sure there are some bugs in, me not hitting any
    makes me certain :-)

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

12 Nov, 2008

1 commit


07 Nov, 2008

1 commit

  • Fix the hrtimer_add_expires_ns() function. It should take a 'u64 ns' argument,
    but instead takes an 'unsigned long ns' argument, which might be only 32 bits wide.

    On FRV, this results in the kernel locking up because hrtimer_forward() passes
    the result of a 64-bit multiplication to this function, for which the compiler
    discards the top 32 bits - something that didn't happen when ktime_add_ns() was
    called directly.

    Signed-off-by: David Howells
    Acked-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    David Howells
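
The truncation described in this entry can be reproduced in user space. What follows is a minimal sketch, not kernel code: `add_ns_narrow()`, `add_ns_wide()` and `five_seconds_ns()` are hypothetical names, and `uint32_t` stands in for an `unsigned long` on a 32-bit target such as FRV.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the buggy signature: the parameter is only
 * 32 bits wide, as 'unsigned long' is on a 32-bit target. */
static uint64_t add_ns_narrow(uint64_t expires_ns, uint32_t ns)
{
    return expires_ns + ns;
}

/* Hypothetical stand-in for the fixed signature: the full 64-bit
 * nanosecond value reaches the addition. */
static uint64_t add_ns_wide(uint64_t expires_ns, uint64_t ns)
{
    return expires_ns + ns;
}

/* An interval multiplied by an overrun count, as a periodic timer would
 * compute it. 5 s in nanoseconds does not fit in 32 bits. */
static uint64_t five_seconds_ns(void)
{
    const uint64_t interval_ns = 1000000000ULL;
    const uint64_t overruns = 5;

    assert(overruns <= UINT64_MAX / interval_ns); /* no 64-bit overflow */
    return interval_ns * overruns;
}
```

Calling `add_ns_narrow(0, (uint32_t)five_seconds_ns())` keeps only the low 32 bits of the product, so the timer advances by roughly 0.7 s instead of 5 s. In the original bug the same narrowing happened implicitly at the call site; the explicit cast here only makes it visible.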
     

22 Oct, 2008

1 commit


20 Oct, 2008

2 commits


18 Oct, 2008

1 commit


15 Oct, 2008

1 commit


08 Oct, 2008

1 commit


29 Sep, 2008

2 commits

  • Impact: per CPU hrtimers can be migrated from a dead CPU

    The hrtimer code has no knowledge about per CPU timers, but we need to
    prevent the migration of such timers and warn when such a timer is
    active at migration time.

    Explicitly mark the timers as per CPU and use a more understandable
    mode descriptor for the interrupt-safe unlocked callback mode, which
    is used by hrtimer_sleeper and the scheduler code.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Impact: during migration active hrtimers can be seen as inactive

    The migration code removes the hrtimers from the queues of the dead
    CPU and sets the state temporarily to INACTIVE. The enqueue code sets it
    to ACTIVE/PENDING again.

    Prevent the wrong state from being seen by using a separate migration
    state bit.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

24 Sep, 2008

1 commit


22 Sep, 2008

2 commits


11 Sep, 2008

1 commit

  • As part of going idle, we already look at the time of the next timer event to determine
    which C-state to select etc.

    This patch adds functionality that causes timers that are past their
    soft expire time to fire at this point, before we calculate the next wakeup
    time. This avoids wakeups by running timers before going idle rather than
    specially waking up for them.

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     

08 Sep, 2008

2 commits


07 Sep, 2008

1 commit


06 Sep, 2008

5 commits

  • in some randconfig configurations, hrtimers are used even though
    the hrtimer config is off; and it broke the build due to some of
    the new functions being on the wrong side of the ifdef.

    This patch moves the functions to the other side of the ifdef, fixing
    the build bug.

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     
  • this patch turns hrtimers into range timers; they have 2 expire points
    1) the soft expire point
    2) the hard expire point

    the kernel will do its regular best effort attempt to get the timer run
    at the hard expire point. However, if some other timer fires after the soft
    expire point, the kernel now has the freedom to fire this timer at this point,
    thus grouping the events and preventing a power-expensive wakeup in the
    future.

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
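
The soft/hard expiry rule in this entry can be modelled in a few lines of user-space C. This is a sketch under stated assumptions: `struct range_timer` and its helpers are hypothetical names, not the kernel's `struct hrtimer` or its accessors, and times are plain nanosecond counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a range timer: it may fire anywhere in
 * [soft_ns, hard_ns]. */
struct range_timer {
    int64_t soft_ns; /* earliest point at which firing is allowed */
    int64_t hard_ns; /* point at which a wakeup is forced */
};

/* Build a timer from a soft expiry and an allowed slack. */
static struct range_timer range_timer_make(int64_t soft_ns, int64_t slack_ns)
{
    struct range_timer t;

    assert(slack_ns >= 0);
    t.soft_ns = soft_ns;
    t.hard_ns = soft_ns + slack_ns;
    return t;
}

/* True if the timer may piggyback on a wakeup that happens at now_ns. */
static bool range_timer_may_fire(const struct range_timer *t, int64_t now_ns)
{
    return now_ns >= t->soft_ns;
}

/* True if the timer can no longer be deferred at now_ns. */
static bool range_timer_must_fire(const struct range_timer *t, int64_t now_ns)
{
    return now_ns >= t->hard_ns;
}
```

With `range_timer_make(1000, 500)`, a wakeup caused by another timer at 1200 ns is allowed to run this timer early, while no dedicated wakeup is required before 1500 ns.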
     
  • To catch code that still touches the "expires" memory directly, rename it
    to have the compiler complain rather than get nasty, hard to explain,
    runtime behavior

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     
  • In order to be able to turn hrtimers into range based, we need to provide
    accessor functions for getting to the "expires" ktime_t member of the
    struct hrtimer.

    This patch adds a set of accessors for this purpose:
    * hrtimer_set_expires
    * hrtimer_set_expires_tv64
    * hrtimer_add_expires
    * hrtimer_add_expires_ns
    * hrtimer_get_expires
    * hrtimer_get_expires_tv64
    * hrtimer_get_expires_ns
    * hrtimer_expires_remaining
    * hrtimer_start_expires

    No users of these new accessors are added yet; these follow in later patches.
    Hopefully this patch can even go into 2.6.27-rc so that the conversions will
    not have a bottleneck in -next

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     
  • This patch adds a schedule_hrtimeout() function, to be used by select() and
    poll() in a later patch. This function works similarly to schedule_timeout()
    in most ways, but takes a timespec rather than jiffies.

    With a lot of contributions/fixes from Thomas

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Thomas Gleixner

    Arjan van de Ven
     

04 May, 2008

1 commit

  • The helper function hrtimer_callback_running() is used in
    kernel/hrtimer.c as well as in the updated net/can/bcm.c which now
    supports hrtimers. Moving the helper function to hrtimer.h removes the
    duplicate definition in the C-files.

    Signed-off-by: Oliver Hartkopp
    Cc: David Miller
    Signed-off-by: Thomas Gleixner

    Oliver Hartkopp
     

30 Apr, 2008

1 commit

  • hrtimers now have dynamic users in the network code. Put them under
    debugobjects surveillance as well.

    Add calls to the generic object debugging infrastructure and provide fixup
    functions which allow the system to be kept alive when recoverable problems
    have been detected by the object debugging core code.

    Signed-off-by: Thomas Gleixner
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

17 Apr, 2008

1 commit

  • In order to avoid a false positive from lockdep, each per-cpu base->lock has
    a separate lock class and migrate_hrtimers() uses double_spin_lock().

    This is overcomplicated: except for migrate_hrtimers() we never take 2 locks
    at once, and migrate_hrtimers() can use spin_lock_nested().

    Signed-off-by: Oleg Nesterov
    Cc: Arjan van de Ven
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     

10 Feb, 2008

1 commit

  • Spotted by Pavel Emelyanov and Alexey Dobriyan.

    hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
    the local variable which lives in the caller's stack frame. This means that
    if sys_restart_syscall() actually happens and it is interrupted as well, we
    don't update the user-space variable, but write into the already dead stack
    frame.

    Introduced by commit 04c227140fed77587432667a574b14736a06dd7f
    hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier

    Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
    hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.

    A small problem remains. man 2 nanosleep states that *rmtp should be written if
    nanosleep() was interrupted (it says nothing about whether it is OK to update
    *rmtp if nanosleep returns 0), but (with or without this patch) we can dirty
    *rem even if nanosleep() returns 0.

    NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
    bugs. Fixed by the next patch.

    Signed-off-by: Oleg Nesterov
    Cc: Alexey Dobriyan
    Cc: Michael Kerrisk
    Cc: Pavel Emelyanov
    Cc: Peter Zijlstra
    Cc: Toyo Abe
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    include/linux/hrtimer.h |    2 -
    kernel/hrtimer.c        |   51 +++++++++++++++++++++++++-----------------------
    kernel/posix-timers.c   |   14 +------------
    3 files changed, 30 insertions(+), 37 deletions(-)

    Oleg Nesterov
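
From the caller's side, the remaining-time pointer discussed in this entry is what lets an interrupted sleep be resumed. The sketch below shows that common user-space idiom; `sleep_fully()` is a hypothetical helper name, and the code assumes a POSIX system that provides nanosleep().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Sleep for the whole requested interval, restarting after signals.
 * Returns 0 on success and -1 on any error other than EINTR. */
static int sleep_fully(const struct timespec *req)
{
    struct timespec remaining;
    struct timespec rem;

    assert(req != NULL);
    remaining = *req;

    /* nanosleep() fills 'rem' only when it was interrupted (EINTR). */
    while (nanosleep(&remaining, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        remaining = rem; /* continue with the time that is left */
    }
    return 0;
}
```

Because the helper reads `rem` only after an EINTR failure, it is not affected if the kernel also writes to the remaining-time buffer when nanosleep() returns 0.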
     

09 Feb, 2008

1 commit

  • Fix typo in comments.

    BTW: I have to fix coding style in arch/ia64/kernel/time.c also, otherwise
    checkpatch.pl will be complaining.

    Signed-off-by: Li Zefan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

08 Feb, 2008

1 commit

  • Commit ad7f71674ad7c3c4467e48f6ab9e85516dae2720 ("[POWERPC] Use a
    sensible default for clock_getres() in the VDSO") corrected the clock
    resolution reported by the VDSO clock_getres() but introduced another
    problem in that older versions of gcc (gcc-4.0 and earlier) fail to
    compile the new code in arch/powerpc/kernel/asm-offsets.c.

    This fixes it by introducing a new MONOTONIC_RES_NSEC define in the
    generic code which is equivalent to KTIME_MONOTONIC_RES but is just an
    integer constant, not a ktime union.

    Signed-off-by: Tony Breeds
    Signed-off-by: Paul Mackerras
    Signed-off-by: Linus Torvalds

    Tony Breeds
     

06 Feb, 2008

2 commits

  • This is the new timerfd API as it is implemented by the following patch:

    int timerfd_create(int clockid, int flags);
    int timerfd_settime(int ufd, int flags,
                        const struct itimerspec *utmr,
                        struct itimerspec *otmr);
    int timerfd_gettime(int ufd, struct itimerspec *otmr);

    The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
    parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.

    The timerfd_settime() API gives new settings to the timerfd fd, optionally
    retrieving the previous expiration time (in case the "otmr" parameter is not
    NULL).

    The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
    is set in the "flags" parameter. Otherwise it's a relative time.

    The timerfd_gettime() API returns the next expiration time of the timer, or
    {0, 0} if the timerfd has not been set yet.

    Like the previous timerfd API implementation, read(2) and poll(2) are
    supported (with the same interface). Here's a simple test program I used to
    exercise the new timerfd APIs:

    http://www.xmailserver.org/timerfd-test2.c

    [akpm@linux-foundation.org: coding-style cleanups]
    [akpm@linux-foundation.org: fix ia64 build]
    [akpm@linux-foundation.org: fix m68k build]
    [akpm@linux-foundation.org: fix mips build]
    [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
    [heiko.carstens@de.ibm.com: fix s390]
    [akpm@linux-foundation.org: fix powerpc build]
    [akpm@linux-foundation.org: fix sparc64 more]
    Signed-off-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Thomas Gleixner
    Cc: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Cc: Michael Kerrisk
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • I think that advancing the timer against the timer's current "now" can be a
    pretty common usage, so, w/out exposing hrtimer's internals, we add a new
    hrtimer_forward_now() function.

    Signed-off-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
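
The "advance against now" operation can be illustrated with a simplified user-space model. This is not the kernel implementation: `model_forward()` is a hypothetical name, times are plain nanosecond counts, and the caller supplies `now_ns`, whereas hrtimer_forward_now() takes the current time from the timer's own clock.

```c
#include <assert.h>
#include <stdint.h>

/* Advance *expires_ns in whole multiples of interval_ns until it lies
 * strictly after now_ns. Returns the number of intervals added, or 0 if
 * the timer had not expired yet. */
static uint64_t model_forward(int64_t *expires_ns, int64_t now_ns,
                              int64_t interval_ns)
{
    int64_t delta;
    uint64_t overruns;

    assert(interval_ns > 0);

    delta = now_ns - *expires_ns;
    if (delta < 0)
        return 0; /* expiry is still in the future: nothing to do */

    /* Intervals that fit into the overshoot, plus one more so that the
     * new expiry ends up after now_ns. */
    overruns = (uint64_t)(delta / interval_ns) + 1;
    *expires_ns += (int64_t)overruns * interval_ns;
    return overruns;
}
```

With an expiry of 100, an interval of 100 and a current time of 350, the model returns 3 and moves the expiry to 400. A periodic callback would typically use the returned count to account for missed periods.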
     

03 Feb, 2008

1 commit


26 Jan, 2008

2 commits

  • Currently all highres=off timers are run from softirq context, but
    HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timers expect to run from irq context.

    Fix this up by splitting it similar to the highres=on case.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Use HR-timers (when available) to deliver an accurate preemption tick.

    The regular scheduler tick that runs at 1/HZ can be too coarse when nice
    levels are used. The fairness system will still keep the cpu utilisation 'fair'
    by then delaying the task that got an excessive amount of CPU time, but we try
    to minimize this by delivering preemption points spot-on.

    The average frequency of this extra interrupt is sched_latency / nr_latency,
    which need not be higher than 1/HZ; it's just that the distribution within the
    sched_latency period is important.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

19 Oct, 2007

1 commit


17 Jul, 2007

1 commit

  • Add a flag in /proc/timer_stats to indicate deferrable timers. This will
    let developers/users differentiate between types of timers in
    /proc/timer_stats.

    A deferrable timer and a normal timer will appear in /proc/timer_stats as below.
    10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)

    Also, the version of timer_stats changes from v0.1 to v0.2.

    Signed-off-by: Venkatesh Pallipadi
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venki Pallipadi
     

08 Apr, 2007

1 commit

  • Soeren Sonnenburg reported that upon resume he is getting
    this backtrace:

    [] smp_apic_timer_interrupt+0x57/0x90
    [] retrigger_next_event+0x0/0xb0
    [] apic_timer_interrupt+0x28/0x30
    [] retrigger_next_event+0x0/0xb0
    [] __kfifo_put+0x8/0x90
    [] on_each_cpu+0x35/0x60
    [] clock_was_set+0x18/0x20
    [] timekeeping_resume+0x7c/0xa0
    [] __sysdev_resume+0x11/0x80
    [] sysdev_resume+0x47/0x80
    [] device_power_up+0x5/0x10

    it turns out that on resume we mistakenly re-enable interrupts too
    early. Do the timer retrigger only on the current CPU.

    Signed-off-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Acked-by: Soeren Sonnenburg
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

07 Mar, 2007

2 commits


02 Mar, 2007

1 commit