15 Jul, 2016

1 commit

  • When tearing down, call timers_dead_cpu() before notify_dead().
    There is a hidden dependency between:

    - timers
    - block multiqueue
    - rcutree

    If timers_dead_cpu() comes after blk_mq_queue_reinit_notify(), the
    latter function causes an RCU stall.
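
    A minimal sketch of the required ordering (a hypothetical rendering
    of the teardown path, not the actual hotplug code):

      void cpu_teardown(unsigned int cpu)
      {
              /* Migrate pending timers off the dying CPU first... */
              timers_dead_cpu(cpu);
              /* ...then run the notifier chain, which includes
               * blk_mq_queue_reinit_notify(). The reverse order lets
               * blk-mq wait on timers that can no longer fire. */
              notify_dead(cpu);
      }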

    Signed-off-by: Richard Cochran
    Signed-off-by: Anna-Maria Gleixner
    Reviewed-by: Sebastian Andrzej Siewior
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rasmus Villemoes
    Cc: Thomas Gleixner
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160713153337.566790058@linutronix.de
    Signed-off-by: Ingo Molnar

    Richard Cochran
     

07 Jul, 2016

5 commits

  • We now have implicit batching in the timer wheel. The slack API is no longer
    used, so remove it.

    Signed-off-by: Thomas Gleixner
    Cc: Alan Stern
    Cc: Andrew F. Davis
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: David S. Miller
    Cc: David Woodhouse
    Cc: Dmitry Eremin-Solenikov
    Cc: Eric Dumazet
    Cc: Frederic Weisbecker
    Cc: George Spelvin
    Cc: Greg Kroah-Hartman
    Cc: Jaehoon Chung
    Cc: Jens Axboe
    Cc: John Stultz
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Mathias Nyman
    Cc: Pali Rohár
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Sebastian Reichel
    Cc: Ulf Hansson
    Cc: linux-block@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mmc@vger.kernel.org
    Cc: linux-pm@vger.kernel.org
    Cc: linux-usb@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The current timer wheel has some drawbacks:

    1) Cascading:

    Cascading can be an unbounded operation and is completely pointless in
    most cases because the vast majority of the timer wheel timers are
    canceled or rearmed before expiration. (They are used as timeout
    safeguards, not as real timers to measure time.)

    2) No fast lookup of the next expiring timer:

    In NOHZ scenarios the first timer soft interrupt after a long NOHZ period
    must fast forward the base time to the current value of jiffies. As we
    have no way to find the next expiring timer fast, the code loops
    linearly, incrementing the base time one step at a time and checking
    for expired timers at each step. This causes unbounded overhead spikes
    exactly at the moment when we should wake up as fast as possible.

    After a thorough analysis of real world data gathered on laptops,
    workstations, webservers and other machines (thanks Chris!) I came to the
    conclusion that the current 'classic' timer wheel implementation can be
    modified to address the above issues.

    The vast majority of timer wheel timers are canceled or rearmed before
    expiry. Most of them are timeouts for networking and other I/O tasks. The
    nature of timeouts is to catch the exception from normal operation (TCP ack
    timed out, disk does not respond, etc.). For these kinds of timeouts the
    accuracy of the timeout is not really a concern. Timeouts are very often
    approximate worst-case values, and in case the timeout fires, we have
    already waited for a long time and performance is down the drain already.

    The few timers which actually expire can be split into two categories:

    1) Short expiry times which expect halfway-accurate expiry

    2) Long-term expiry times, which are already inaccurate today due to
    the batching done automatically for NOHZ and via the
    set_timer_slack() API.

    So for long-term expiry timers we can avoid the cascading property and just
    leave them in the less granular outer wheels until expiry or
    cancelation. Timers which are armed with a timeout larger than the wheel
    capacity are no longer cascaded. We expire them at the longest possible
    timeout (6+ days). We have not observed such timeouts in our data
    collection, but at least we handle them, applying the rule of least
    surprise.

    To avoid extending the wheel levels for HZ=1000, so that we can still
    accommodate the longest observed timeouts (5 days in the network conntrack
    code), we reduce the first-level granularity for HZ=1000 to 4 ms, which
    effectively is the same as the HZ=250 behaviour. From our data analysis
    there is nothing which relies on that 1 ms granularity, and as a side
    effect we get better batching and timer locality for the networking code
    as well.

    Unlike the classic wheel, the granularity of a wheel level is not the
    capacity of the previous level. In the currently chosen setting, each
    level's granularity is 8 times the granularity of the previous level.

    So for HZ=250 we end up with the following granularity levels:

    Level  Offset  Granularity         Range
      0       0    4 ms                0 ms - 252 ms
      1      64    32 ms               256 ms - 2044 ms (256ms - ~2s)
      2     128    256 ms              2048 ms - 16380 ms (~2s - ~16s)
      3     192    2048 ms (~2s)       16384 ms - 131068 ms (~16s - ~2m)
      4     256    16384 ms (~16s)     131072 ms - 1048572 ms (~2m - ~17m)
      5     320    131072 ms (~2m)     1048576 ms - 8388604 ms (~17m - ~2h)
      6     384    1048576 ms (~17m)   8388608 ms - 67108863 ms (~2h - ~18h)
      7     448    8388608 ms (~2h)    67108864 ms - 536870911 ms (~18h - ~6d)

    That's a worst case inaccuracy of 12.5% for the timers which are queued at the
    beginning of a level.
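
    To illustrate how a timeout maps to a level, here is a small
    user-space sketch assuming 64 buckets per level and a granularity
    step of 8 between levels, as in the table above (illustrative only,
    not the kernel's code):

      #include <stdio.h>

      #define LVL_SIZE   64           /* buckets per level        */
      #define LVL_SHIFT  3            /* each level is 8x coarser */
      #define MAX_LVL    7

      static int delta_to_level(unsigned long delta)
      {
              int lvl = 0;

              /* Climb until the delta fits into one level's buckets. */
              while (delta >= LVL_SIZE && lvl < MAX_LVL) {
                      delta >>= LVL_SHIFT;
                      lvl++;
              }
              return lvl;
      }

      int main(void)
      {
              /* HZ=250: 1 jiffy = 4 ms. 500 ms = 125 jiffies -> level 1. */
              printf("500 ms -> level %d\n", delta_to_level(125));
              /* 60 s = 15000 jiffies -> level 3 (~2s granularity). */
              printf("60 s   -> level %d\n", delta_to_level(15000));
              return 0;
      }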

    So the new wheel concept addresses the old issues:

    1) Cascading is avoided completely

    2) By keeping the timers in their bucket until expiry/cancelation we can
    track the buckets which have timers enqueued in a bitmap, and can
    therefore look up the next expiring timer very fast, in O(1).

    A further benefit of the concept is that the slack calculation which is done
    on every timer start is no longer necessary because the granularity levels
    provide natural batching already.

    Our extensive testing with various loads did not show any performance
    degradation vs. the current wheel implementation.

    This patch does not address the 'fast lookup' issue as we wanted to make sure
    that there is no regression introduced by the wheel redesign. The
    optimizations are in follow up patches.

    This patch contains fixes from Anna-Maria Gleixner and Richard Cochran.

    Signed-off-by: Thomas Gleixner
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: Eric Dumazet
    Cc: Frederic Weisbecker
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094342.108621834@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • We want to store the array index in the flags space. 256k CPUs should be
    enough for a while.
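
    A sketch of the packing this implies (the mask value is assumed from
    the 256k figure, 2^18 = 262144; not necessarily the kernel's exact
    constants):

      #include <stdint.h>

      #define TIMER_CPUMASK  0x0003FFFFu      /* low 18 bits: CPU index */

      static inline uint32_t timer_set_cpu(uint32_t flags, uint32_t cpu)
      {
              return (flags & ~TIMER_CPUMASK) | (cpu & TIMER_CPUMASK);
      }

      static inline uint32_t timer_get_cpu(uint32_t flags)
      {
              return flags & TIMER_CPUMASK;
      }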

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094342.030144293@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • We switched all users to initialize their timers as pinned and to call
    mod_timer(). Remove the now unused mod_timer_pinned() API function.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: Eric Dumazet
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094341.706205231@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • We want to move the timer migration logic from a 'push' to a 'pull' model.

    Under the current 'push' model pinned timers are handled via
    a runtime API variant: mod_timer_pinned().

    The 'pull' model requires us to store the pinned attribute of a timer
    in the timer_list structure itself, as a new TIMER_PINNED bit in
    timer->flags.

    This flag must be set at initialization time and the timer APIs
    recognize the flag.

    This patch:

    - implements the new flag and the associated new-style
      initialization methods,

    - makes mod_timer() recognize new-style pinned timers,

    - and adds a migration helper facility to allow step-by-step
      conversion of old-style to new-style pinned timers.
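
    A hedged before/after sketch of the conversion (assuming the
    init_timer_pinned() helper this patch introduces; the timer and
    callback names are made up):

      #include <linux/timer.h>

      static struct timer_list mytimer;

      static void mytimer_fn(unsigned long data)
      {
              /* Fires on the CPU the timer was armed on. */
      }

      static void arm_example(void)
      {
              /* Old style: pinning is decided at arming time. */
              init_timer(&mytimer);
              mytimer.function = mytimer_fn;
              mod_timer_pinned(&mytimer, jiffies + HZ);

              /* New style: pinning is a property of the timer itself,
               * so plain mod_timer() can be used. */
              init_timer_pinned(&mytimer);
              mytimer.function = mytimer_fn;
              mod_timer(&mytimer, jiffies + HZ);
      }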

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: Eric Dumazet
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094341.049338558@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

19 Jun, 2015

4 commits

  • Eric reported that the timer_migration sysctl is not really nice
    performance-wise, as it needs to check at every timer insertion whether
    the feature is enabled or not. Further, the check does not live in the
    timer code, so we have an extra function call which checks an extra
    cache line just to figure out that the feature is disabled.

    We can do better and store that information in the per-cpu (hr)timer
    bases. I pondered using a static key, but that's a nightmare to
    update from the nohz code, and the timer base cache line is hot anyway
    when we select a timer base.

    The old logic enabled the timer migration unconditionally if
    CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
    line.

    With this modification, we start off with migration disabled. The
    user-visible sysctl is still set to enabled. If the kernel switches to
    NOHZ, migration is enabled, provided the user did not disable it via
    the sysctl prior to the switch. If nohz=off is on the kernel command
    line, migration stays disabled no matter what.

    Before:
    47.76% hog [.] main
    14.84% [kernel] [k] _raw_spin_lock_irqsave
    9.55% [kernel] [k] _raw_spin_unlock_irqrestore
    6.71% [kernel] [k] mod_timer
    6.24% [kernel] [k] lock_timer_base.isra.38
    3.76% [kernel] [k] detach_if_pending
    3.71% [kernel] [k] del_timer
    2.50% [kernel] [k] internal_add_timer
    1.51% [kernel] [k] get_nohz_timer_target
    1.28% [kernel] [k] __internal_add_timer
    0.78% [kernel] [k] timerfn
    0.48% [kernel] [k] wake_up_nohz_cpu

    After:
    48.10% hog [.] main
    15.25% [kernel] [k] _raw_spin_lock_irqsave
    9.76% [kernel] [k] _raw_spin_unlock_irqrestore
    6.50% [kernel] [k] mod_timer
    6.44% [kernel] [k] lock_timer_base.isra.38
    3.87% [kernel] [k] detach_if_pending
    3.80% [kernel] [k] del_timer
    2.67% [kernel] [k] internal_add_timer
    1.33% [kernel] [k] __internal_add_timer
    0.73% [kernel] [k] timerfn
    0.54% [kernel] [k] wake_up_nohz_cpu
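
    A minimal sketch of the idea, with hypothetical field and helper
    names (the real change lives in the timer base selection path):

      struct tvec_base {
              spinlock_t lock;
              bool migration_enabled;  /* cached; updated from nohz code */
              /* ... */
      };

      static int select_target_cpu(struct tvec_base *base, int this_cpu)
      {
              /* Hot path: one flag test on an already-hot cache line
               * instead of an extra function call plus an extra cache
               * line for the sysctl. */
              if (base->migration_enabled)
                      return get_nohz_timer_target();
              return this_cpu;
      }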

    Reported-by: Eric Dumazet
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Cc: Frederic Weisbecker
    Cc: Viresh Kumar
    Cc: John Stultz
    Cc: Joonwoo Park
    Cc: Wenbo Wang
    Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Simplify the handling of the flag storage for the timer statistics. No
    intermediate storage anymore. Just hand over the flags field.

    I left the printout of 'deferrable' for now because changing this
    would be an ABI update and I have no idea how strong people feel about
    that. OTOH, I wonder whether we should kill the whole timer stats
    stuff because all of that information can be retrieved via ftrace/perf
    as well.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Cc: Frederic Weisbecker
    Cc: Eric Dumazet
    Cc: Viresh Kumar
    Cc: John Stultz
    Cc: Joonwoo Park
    Cc: Wenbo Wang
    Link: http://lkml.kernel.org/r/20150526224512.046626248@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Instead of storing a pointer to the per-cpu tvec_base we can simply
    cache a CPU index in the timer_list and use that to get hold of the
    correct per-cpu tvec_base. This is only used in lock_timer_base(), and
    the slightly larger code is peanuts versus the spinlock operation and
    the d-cache footprint of the timer wheel.

    Aside from that, this allows us to get rid of the following nuisances:

    - boot_tvec_base

    That statically allocated 4k of bss data is just kept around so that
    a timer has a home when it gets statically initialized. It serves no
    other purpose.

    With the CPU index we assign the timer to CPU0 at static
    initialization time and can therefore avoid the whole boot_tvec_base
    dance. That also simplifies the init code, which can just use the
    per-cpu base.

    Before:
      text   data  bss   dec    hex   filename
      17491  9201  4160  30852  7884  ../build/kernel/time/timer.o
    After:
      text   data  bss   dec    hex   filename
      17440  9193  0     26633  6809  ../build/kernel/time/timer.o

    - Overloading the base pointer with various flags

    The CPU index has enough space to hold the flags (deferrable,
    irqsafe) so we can get rid of the extra masking and bit fiddling
    with the base pointer.

    As a benefit we reduce the size of struct timer_list on 64-bit
    machines by 4 to 8 bytes, a size reduction of up to 15% per struct
    timer_list, which is a real win as we have tons of them embedded in
    other structs.

    This also changes the newly added deferrable printout of the timer
    start trace point to capture and print all of timer->flags, which
    allows us to decode the target CPU of the timer as well.

    We might have used bitfields for this, but that would have changed the
    static initializers and the init function for no benefit, just to
    accommodate big-endian bitfields.
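
    A sketch of the resulting lookup (close in spirit to the description
    above; the names and mask value are assumptions):

      static DEFINE_PER_CPU(struct tvec_base, tvec_bases);

      #define TIMER_CPUMASK  0x0003FFFFu   /* low bits: CPU index */

      static inline struct tvec_base *get_timer_base(u32 tflags)
      {
              /* Resolve the per-cpu base from the cached CPU index
               * instead of chasing (and unmasking) a stored pointer. */
              return per_cpu_ptr(&tvec_bases, tflags & TIMER_CPUMASK);
      }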

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Cc: Frederic Weisbecker
    Cc: Eric Dumazet
    Cc: Viresh Kumar
    Cc: John Stultz
    Cc: Joonwoo Park
    Cc: Wenbo Wang
    Cc: Steven Rostedt
    Cc: Badhri Jagan Sridharan
    Link: http://lkml.kernel.org/r/20150526224511.950084301@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • This reduces the size of struct tvec_base by 50% and results in
    slightly smaller code as well.

    Before:
      struct tvec_base: size: 8256, cachelines: 129

      text   data   bss   dec    hex   filename
      17698  13297  8256  39251  9953  ../build/kernel/time/timer.o

    After:
      struct tvec_base: size: 4160, cachelines: 65

      text   data   bss   dec    hex   filename
      17491  9201   4160  30852  7884  ../build/kernel/time/timer.o

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Viresh Kumar
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Cc: Frederic Weisbecker
    Cc: Eric Dumazet
    Cc: John Stultz
    Cc: Joonwoo Park
    Cc: Wenbo Wang
    Link: http://lkml.kernel.org/r/20150526224511.854731214@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

22 Apr, 2015

1 commit

  • The evaluation of the next timer in the nohz code is based on jiffies,
    while all the tick internals are nanosecond based. We also have to
    convert hrtimer nanoseconds to jiffies in the !highres case. That's
    just wrong and introduces interesting corner cases.

    Turn it around and convert the next timer wheel timer expiry and the
    RCU event to clock monotonic, and base all calculations on
    nanoseconds. That clearly identifies the case where no timer is
    pending by an absolute expiry value of KTIME_MAX.

    This makes the code more readable and gets rid of the jiffies magic
    in the nohz code.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Paul E. McKenney
    Acked-by: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Viresh Kumar
    Cc: Marcelo Tosatti
    Cc: Frederic Weisbecker
    Cc: Josh Triplett
    Cc: Lai Jiangshan
    Cc: John Stultz
    Cc: Marcelo Tosatti
    Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

21 Aug, 2012

4 commits

  • Timer internals are protected with irq-safe locks but timer execution
    isn't, so a timer being dequeued for execution and its execution
    aren't atomic against IRQs. This makes it impossible to wait for its
    completion from IRQ handlers and difficult to shoot down a timer from
    IRQ handlers.

    This caused problems for the delayed_work interface. Because there is
    no way to reliably shoot down delayed_work->timer from IRQ handlers,
    __cancel_delayed_work() can't share the logic to steal the target
    delayed_work with cancel_delayed_work_sync(), and can only steal
    delayed_works which are queued on the timer. Similarly, the pending
    mod_delayed_work() can't be used from IRQ handlers.

    This patch adds a new timer flag, TIMER_IRQSAFE, which makes the timer
    execute with IRQs disabled after dequeueing, such that its dequeueing
    and execution are atomic against IRQ handlers.

    This makes it safe to wait for the timer's completion from IRQ
    handlers, for example, using del_timer_sync(). It can never be
    executing on the local CPU and if executing on other CPUs it won't be
    interrupted until done.

    This will enable simplifying the delayed_work cancel/mod interface.
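
    A sketch of setting up and shooting down an irqsafe timer (the
    __setup_timer() form with a flags argument is assumed from the
    companion initializer cleanup; names are made up):

      #include <linux/interrupt.h>
      #include <linux/timer.h>

      static struct timer_list my_timer;

      static void my_timeout_fn(unsigned long data)
      {
              /* Runs with IRQs disabled: dequeue and execution are
               * atomic against IRQ handlers. */
      }

      static irqreturn_t my_irq_handler(int irq, void *dev_id)
      {
              /* Safe only because the timer is TIMER_IRQSAFE. */
              del_timer_sync(&my_timer);
              return IRQ_HANDLED;
      }

      static void my_init(void)
      {
              __setup_timer(&my_timer, my_timeout_fn, 0UL, TIMER_IRQSAFE);
      }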

    Signed-off-by: Tejun Heo
    Cc: torvalds@linux-foundation.org
    Cc: peterz@infradead.org
    Link: http://lkml.kernel.org/r/1344449428-24962-5-git-send-email-tj@kernel.org
    Signed-off-by: Thomas Gleixner

    Tejun Heo
     
  • Over time, timer initializers became messy, with unnecessarily
    duplicated code inconsistently spread across timer.h and timer.c.

    This patch cleans up timer initializers.

    * timer.c::__init_timer() is renamed to do_init_timer().

    * __TIMER_INITIALIZER() added. It takes @flags and all initializers
    are wrappers around it.

    * init_timer[_on_stack]_key() now take @flags.

    * __init_timer[_on_stack]() added. They take @flags and all init
    macros are wrappers around them.

    * __setup_timer[_on_stack]() added. They use __init_timer[_on_stack]()
    and take @flags. All setup macros are wrappers around the two.

    Note that this patch doesn't add missing init/setup combinations -
    e.g. init_timer_deferrable_on_stack(). Adding missing ones is
    trivial.
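
    A sketch of the resulting layering (simplified; actual macro bodies
    omitted, and TIMER_DEFERRABLE is assumed as a flag name):

      /* Everything funnels into the one flags-taking core. */
      #define TIMER_INITIALIZER(_fn, _expires, _data) \
              __TIMER_INITIALIZER((_fn), (_expires), (_data), 0)

      #define TIMER_DEFERRED_INITIALIZER(_fn, _expires, _data) \
              __TIMER_INITIALIZER((_fn), (_expires), (_data), TIMER_DEFERRABLE)

      #define init_timer(timer) \
              __init_timer((timer), 0)

      #define init_timer_deferrable(timer) \
              __init_timer((timer), TIMER_DEFERRABLE)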

    Signed-off-by: Tejun Heo
    Cc: torvalds@linux-foundation.org
    Cc: peterz@infradead.org
    Link: http://lkml.kernel.org/r/1344449428-24962-4-git-send-email-tj@kernel.org
    Signed-off-by: Thomas Gleixner

    Tejun Heo
     
  • init_timer_on_stack_key() is used by init macro definitions. Move
    init_timer_on_stack_key() and destroy_timer_on_stack() declarations
    above init macro defs. This will make the next init cleanup patch
    easier to read.

    Signed-off-by: Tejun Heo
    Cc: torvalds@linux-foundation.org
    Cc: peterz@infradead.org
    Link: http://lkml.kernel.org/r/1344449428-24962-3-git-send-email-tj@kernel.org
    Signed-off-by: Thomas Gleixner

    Tejun Heo
     
  • To prepare for addition of another flag, generalize timer->base flags
    handling.

    * Rename from TBASE_*_FLAG to TIMER_* and make them LU constants.

    * Define and use TIMER_FLAG_MASK for flags masking so that multiple
    flags can be handled correctly.

    * Don't dereference timer->base directly even if
    !tbase_get_deferrable(). Both such places are already passed @base,
    so use it instead.

    * Make sure tvec_base's alignment is large enough for timer->base
    flags using BUILD_BUG_ON().

    Signed-off-by: Tejun Heo
    Cc: torvalds@linux-foundation.org
    Cc: peterz@infradead.org
    Link: http://lkml.kernel.org/r/1344449428-24962-2-git-send-email-tj@kernel.org
    Signed-off-by: Thomas Gleixner

    Tejun Heo
     

22 Oct, 2010

1 commit

  • On UP try_to_del_timer_sync() is mapped to del_timer() which does not
    take the running timer callback into account, so it has different
    semantics.

    Remove the SMP dependency of try_to_del_timer_sync() by using
    base->running_timer in the UP case as well.

    [ tglx: Removed set_running_timer() inline and tweaked the changelog ]

    Signed-off-by: Yong Zhang
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Acked-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Yong Zhang
     

21 Oct, 2010

3 commits

  • Currently, you have to define a delayed_work uninitialised and then
    initialise it before first use. That's a tad clumsy. At the risk of
    playing mind games with the compiler, fooling it into doing pointer
    arithmetic with compile-time constants, this lets clients properly
    initialise delayed work with deferrable timers statically.

    This patch was inspired by the issues which led Artem Bityutskiy to
    commit 8eab945c5616fc984 ("sunrpc: make the cache cleaner workqueue
    deferrable").

    Signed-off-by: Phil Carmody
    Acked-by: Artem Bityutskiy
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Phil Carmody
     
  • TIMER_INITIALIZER() should initialize the field slack of timer_list as
    __init_timer() does.

    Signed-off-by: Changli Gao
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Changli Gao
     
  • Reorder struct timer_list to remove 8 bytes of alignment padding on 64
    bit builds when CONFIG_TIMER_STATS is selected.

    timer_list is widely used across the kernel so many structures will
    benefit and shrink in size.

    For example, with my config on x86_64, per_cpu_dm_data shrinks from
    136 to 128 bytes and ahci_port_priv shrinks from 1032 to 968 bytes.

    Signed-off-by: Richard Kennedy
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Richard Kennedy
     

07 Apr, 2010

1 commit

  • While HR timers have had the concept of timer slack for quite some time
    now, the legacy timers lacked this concept, and had to make do with
    round_jiffies() and friends.

    Timer slack is important for power management; grouping timers reduces the
    number of wakeups which in turn reduces power consumption.

    This patch introduces timer slack to the legacy timers using the following
    pieces:
    * A slack field in the timer struct
    * An API (set_timer_slack) that callers can use to set explicit timer slack
    * A default slack of 0.4% of the requested delay for callers that do not
    set any explicit slack
    * Rounding code, part of mod_timer(), that tries to group timers
    around jiffies values at every 'power of two' (so quick timers will
    group around every 2, but longer timers will group around every 4,
    8, 16, 32, etc.)
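
    A user-space sketch of the rounding step (simplified from the
    description above; not the kernel's code):

      #include <stdio.h>

      static unsigned long apply_slack(unsigned long now, unsigned long expires)
      {
              /* Default slack: 0.4% (~1/256) of the requested delay. */
              unsigned long limit = expires + (expires - now) / 256;
              unsigned long mask = expires ^ limit;
              unsigned int bit = 0;

              if (!mask)
                      return expires;
              while (mask >>= 1)   /* highest bit where they differ */
                      bit++;
              /* Round the limit down to that power-of-two boundary,
               * grouping nearby timers onto the same jiffies value. */
              return limit & ~((1UL << bit) - 1);
      }

      int main(void)
      {
              printf("%lu\n", apply_slack(0, 1000));  /* prints 1002 */
              return 0;
      }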

    Signed-off-by: Arjan van de Ven
    Cc: johnstul@us.ibm.com
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Arjan van de Ven
     

24 Jun, 2009

1 commit

  • When the kernel is configured with CONFIG_TIMER_STATS but timer
    stats are runtime disabled, we still get calls to
    __timer_stats_timer_set_start_info which initializes some
    fields in the corresponding struct timer_list.

    So add some quick checks to the timer stats setup functions
    to avoid function calls to __timer_stats_timer_set_start_info
    when timer stats are disabled.

    In an artificial workload that does nothing but play ping-pong with a
    single TCP packet via loopback, this decreases CPU consumption by
    1 - 1.5%.

    This is part of a modified function trace output on SLES11:

    perl-2497 [00] 28630647177732388 [+ 125]: sk_reset_timer
    Cc: Andrew Morton
    Cc: Martin Schwidefsky
    Cc: Mustafa Mesanovic
    Cc: Arjan van de Ven
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     

13 May, 2009

1 commit

  • * Arun R Bharadwaj [2009-04-16 12:11:36]:

    This patch creates a new framework for identifying cpu-pinned timers
    and hrtimers.

    This framework is needed because pinned timers are expected to fire on
    the same CPU on which they are queued. So it is essential to identify
    these and not migrate them, in case there are any.

    For regular timers, the currently existing add_timer_on() can be used
    to queue pinned timers, and subsequently mod_timer_pinned() can be
    used to modify the 'expires' field.

    For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are
    added to queue cpu-pinned hrtimers.

    [ tglx: use .._PINNED mode argument instead of creating tons of new
    functions ]

    Signed-off-by: Arun R Bharadwaj
    Signed-off-by: Thomas Gleixner

    Arun R Bharadwaj
     

31 Mar, 2009

1 commit

  • * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (33 commits)
    lockdep: fix deadlock in lockdep_trace_alloc
    lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB
    lockdep: annotate reclaim context (__GFP_NOFS), fix
    lockdep: build fix for !PROVE_LOCKING
    lockstat: warn about disabled lock debugging
    lockdep: use stringify.h
    lockdep: simplify check_prev_add_irq()
    lockdep: get_user_chars() redo
    lockdep: simplify get_user_chars()
    lockdep: add comments to mark_lock_irq()
    lockdep: remove macro usage from mark_held_locks()
    lockdep: fully reduce mark_lock_irq()
    lockdep: merge the !_READ mark_lock_irq() helpers
    lockdep: merge the _READ mark_lock_irq() helpers
    lockdep: simplify mark_lock_irq() helpers #3
    lockdep: further simplify mark_lock_irq() helpers
    lockdep: simplify the mark_lock_irq() helpers
    lockdep: split up mark_lock_irq()
    lockdep: generate usage strings
    lockdep: generate the state bit definitions
    ...

    Linus Torvalds
     

19 Feb, 2009

1 commit

  • Impact: new timer API

    Based on an idea from Martin Josefsson with the help of
    Patrick McHardy and Stephen Hemminger:

    introduce the mod_timer_pending() API which is a mod_timer()
    offspring that is an invariant on already removed timers.

    (regular mod_timer() re-activates non-pending timers.)

    This is useful for the networking code in that it can
    allow unserialized mod_timer_pending() timer-forwarding
    calls, but a single del_timer*() will stop the timer
    from being reactivated again.

    Also while at it:

    - optimize the regular mod_timer() path some more; the
    timer-stat and a debug check were needlessly duplicated
    in __mod_timer().

    - make the exports come straight after the function, as
    most other exports in timer.c already did.

    - eliminate __mod_timer() as an external API, change the
    users to mod_timer().

    The regular mod_timer() code path is not impacted
    significantly, due to inlining optimizations and due to
    the simplifications.
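
    A usage sketch (the watchdog timer is hypothetical):

      /* Forwarding path: may run unserialized. This forwards the timer
       * only while it is pending; it never resurrects a deleted timer. */
      mod_timer_pending(&watchdog_timer, jiffies + HZ);

      /* Teardown path: a single del_timer() stops the timer and keeps
       * the forwarding calls above from re-activating it. */
      del_timer(&watchdog_timer);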

    Based-on-patch-from: Stephen Hemminger
    Acked-by: Stephen Hemminger
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Cc: netdev@vger.kernel.org
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

06 Nov, 2008

1 commit

  • This patch (as1158b) adds round_jiffies_up() and friends. These
    routines work like the analogous round_jiffies() functions, except
    that they will never round down.

    The new routines will be useful for timeouts where we don't care
    exactly when the timer expires, provided it doesn't expire too soon.
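
    A short usage sketch (my_timer is hypothetical):

      /* round_jiffies() may move the expiry either way to a whole-second
       * boundary; round_jiffies_up() only moves it forward, so the
       * timeout can never fire too soon. */
      mod_timer(&my_timer, round_jiffies_up(jiffies + msecs_to_jiffies(1500)));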

    Signed-off-by: Alan Stern
    Signed-off-by: Jens Axboe

    Alan Stern
     

30 Apr, 2008

1 commit

  • Add calls to the generic object debugging infrastructure and provide
    fixup functions which allow us to keep the system alive when
    recoverable problems have been detected by the object debugging core
    code.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

17 Jul, 2007

2 commits

  • Add a flag in /proc/timer_stats to indicate deferrable timers. This will
    let developers/users differentiate between types of timers in
    /proc/timer_stats.

    A deferrable timer and a normal timer will appear in /proc/timer_stats as below:
    10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)

    Also, the version of timer_stats changes from v0.1 to v0.2.

    Signed-off-by: Venkatesh Pallipadi
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venki Pallipadi
     
  • Remove the obviously unnecessary includes under the include/linux/
    directory, and fix the couple of errors that are introduced as a
    result of that.

    Signed-off-by: Robert P. J. Day
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

30 May, 2007

1 commit

  • get_next_timer_interrupt() returns a delta of (LONG_MAX >> 1) in case
    there is no timer pending. On 64 bit machines this results in a
    multiplication overflow in tick_nohz_stop_sched_tick().

    Reported by: Dave Miller

    Make the return value a constant and limit it to a 32-bit value.

    When the max timeout value is returned, we can safely stop the tick
    timer device. The max jiffies delta results in roughly a 12 day
    timeout for HZ=1000 (2^30 jiffies at 1000 Hz is ~12.4 days).

    In the long term the get_next_timer_interrupt() code needs to be
    reworked to return ktime instead of jiffies, but we have to wait until
    the last users of the original NO_IDLE_HZ code are converted.

    Signed-off-by: Thomas Gleixner
    Acked-off-by: David S. Miller
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

09 May, 2007

1 commit

  • Introduce a new flag for timers - deferrable: timers that work normally
    when the system is busy, but will not cause the CPU to come out of idle
    (just to service this timer) when the CPU is idle. Instead, such a
    timer will be serviced when the CPU eventually wakes up with a
    subsequent non-deferrable timer.

    The main advantage of this is to avoid unnecessary timer interrupts
    when the CPU is idle. If the routine currently called by a timer can
    wait until the next event without any issues, this new timer can be
    used to set up the timer event for that routine. This, with dynticks,
    allows CPUs to be lazy, letting them stay idle for extended periods of
    time by reducing unnecessary wakeups and thereby reducing power
    consumption.

    This patch:

    Builds this new timer on top of the existing timer infrastructure. It
    uses the last bit in the 'base' pointer of the timer_list structure to
    store the deferrable timer flag. __next_timer_interrupt() skips over
    these deferrable timers when the CPU looks for the next timer event
    for which it has to wake up.

    This is exported via a new interface, init_timer_deferrable(), which
    can be called in place of the regular init_timer().
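
    A usage sketch (hypothetical housekeeping timer):

      static struct timer_list housekeeping_timer;

      static void housekeeping_fn(unsigned long data)
      {
              /* Not urgent: fine to run whenever the CPU wakes anyway. */
      }

      static void start_housekeeping(void)
      {
              /* Deferrable: will not pull an idle CPU out of idle. */
              init_timer_deferrable(&housekeeping_timer);
              housekeeping_timer.function = housekeeping_fn;
              housekeeping_timer.expires  = jiffies + 10 * HZ;
              add_timer(&housekeeping_timer);
      }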

    [akpm@linux-foundation.org: Privatise a #define]
    Signed-off-by: Venkatesh Pallipadi
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Oleg Nesterov
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venki Pallipadi
     

17 Feb, 2007

3 commits

  • Add /proc/timer_stats support: a debugging feature to profile timer
    expiration. Both the starting site (process/PID) and the expiration
    function are captured. This allows quick identification of timer event
    sources in a system.

    Sample output:

    # echo 1 > /proc/timer_stats
    # cat /proc/timer_stats
    Timer Stats Version: v0.1
    Sample period: 4.010 s
    24, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
    11, 0 swapper sk_reset_timer (tcp_delack_timer)
    6, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
    2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    17, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
    2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    4, 2050 pcscd do_nanosleep (hrtimer_wakeup)
    5, 4179 sshd sk_reset_timer (tcp_write_timer)
    4, 2248 yum-updatesd schedule_timeout (process_timeout)
    18, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
    3, 0 swapper sk_reset_timer (tcp_delack_timer)
    1, 1 swapper neigh_table_init_no_netlink (neigh_periodic_timer)
    2, 1 swapper e1000_up (e1000_watchdog)
    1, 1 init schedule_timeout (process_timeout)
    100 total events, 25.24 events/sec

    [ cleanups and hrtimers support from Thomas Gleixner ]
    [bunk@stusta.de: nr_entries can become static]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Andi Kleen
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • - hrtimers did not use the hrtimer_restart enum and relied on the
    implicit int representation. Fix the prototypes and the functions
    using the enums.
    - Use separate namespaces for the enumerations
    - Convert the hrtimer_restart macro to an inline function
    - Add comments

    No functional changes.

    [akpm@osdl.org: fix input driver]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Dmitry Torokhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • For CONFIG_NO_HZ we need to calculate the next timer wheel event based
    on a given jiffies value. Extend the existing code to allow the extra
    'now' argument. Provide a compatibility function so the existing
    implementations can call the function with now == jiffies. (This also
    solves the raciness of the original code vs. jiffies changing during
    the iteration.)

    No functional changes to existing users of this infrastructure.

    [ remove WARN_ON() that triggered on s390, by Carsten Otte ]
    [ made new helper static, Adrian Bunk ]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner