15 Jul, 2016

1 commit

  • When tearing down, call timers_dead_cpu() before notify_dead().
    There is a hidden dependency between:

    - timers
    - block multiqueue
    - rcutree

    If timers_dead_cpu() comes after blk_mq_queue_reinit_notify(), the
    latter function causes an RCU stall.
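
    A minimal sketch of the required ordering (a hypothetical rendering
    of the teardown path, not the actual hotplug code):

      void cpu_teardown(unsigned int cpu)
      {
              /* Migrate pending timers off the dying CPU first... */
              timers_dead_cpu(cpu);
              /* ...then run the notifier chain, which includes
               * blk_mq_queue_reinit_notify(). The reverse order lets
               * blk-mq wait on timers that can no longer fire. */
              notify_dead(cpu);
      }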

    Signed-off-by: Richard Cochran
    Signed-off-by: Anna-Maria Gleixner
    Reviewed-by: Sebastian Andrzej Siewior
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rasmus Villemoes
    Cc: Thomas Gleixner
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160713153337.566790058@linutronix.de
    Signed-off-by: Ingo Molnar

    Richard Cochran
     

07 Jul, 2016

5 commits

  • We now have implicit batching in the timer wheel. The slack API is no longer
    used, so remove it.

    Signed-off-by: Thomas Gleixner
    Cc: Alan Stern
    Cc: Andrew F. Davis
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: David S. Miller
    Cc: David Woodhouse
    Cc: Dmitry Eremin-Solenikov
    Cc: Eric Dumazet
    Cc: Frederic Weisbecker
    Cc: George Spelvin
    Cc: Greg Kroah-Hartman
    Cc: Jaehoon Chung
    Cc: Jens Axboe
    Cc: John Stultz
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Mathias Nyman
    Cc: Pali Rohár
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Sebastian Reichel
    Cc: Ulf Hansson
    Cc: linux-block@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mmc@vger.kernel.org
    Cc: linux-pm@vger.kernel.org
    Cc: linux-usb@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The current timer wheel has some drawbacks:

    1) Cascading:

    Cascading can be an unbounded operation and is completely pointless in
    most cases because the vast majority of the timer wheel timers are
    canceled or rearmed before expiration. (They are used as timeout
    safeguards, not as real timers to measure time.)

    2) No fast lookup of the next expiring timer:

    In NOHZ scenarios the first timer soft interrupt after a long NOHZ period
    must fast forward the base time to the current value of jiffies. As we
    have no way to find the next expiring timer fast, the code loops
    linearly, incrementing the base time one step at a time and checking
    for expired timers at each step. This causes unbounded overhead spikes
    exactly at the moment when we should wake up as fast as possible.

    After a thorough analysis of real world data gathered on laptops,
    workstations, webservers and other machines (thanks Chris!) I came to the
    conclusion that the current 'classic' timer wheel implementation can be
    modified to address the above issues.

    The vast majority of timer wheel timers are canceled or rearmed before
    expiry. Most of them are timeouts for networking and other I/O tasks. The
    nature of timeouts is to catch the exception from normal operation (TCP ack
    timed out, disk does not respond, etc.). For these kinds of timeouts the
    accuracy of the timeout is not really a concern. Timeouts are very often
    approximate worst-case values, and in case the timeout fires, we have
    already waited for a long time and performance is down the drain already.

    The few timers which actually expire can be split into two categories:

    1) Short expiry times which expect halfway-accurate expiry

    2) Long-term expiry times, which are already inaccurate today due to
    the batching done automatically for NOHZ and via the
    set_timer_slack() API.

    So for long-term expiry timers we can avoid the cascading property and just
    leave them in the less granular outer wheels until expiry or
    cancelation. Timers which are armed with a timeout larger than the wheel
    capacity are no longer cascaded. We expire them at the longest possible
    timeout (6+ days). We have not observed such timeouts in our data
    collection, but at least we handle them, applying the rule of least
    surprise.

    To avoid extending the wheel levels for HZ=1000, so that we can still
    accommodate the longest observed timeouts (5 days in the network conntrack
    code), we reduce the first-level granularity for HZ=1000 to 4 ms, which
    effectively is the same as the HZ=250 behaviour. From our data analysis
    there is nothing which relies on that 1 ms granularity, and as a side
    effect we get better batching and timer locality for the networking code
    as well.

    Unlike the classic wheel, the granularity of a wheel level is not the
    capacity of the previous level. In the currently chosen setting, each
    level's granularity is 8 times the granularity of the previous level.

    So for HZ=250 we end up with the following granularity levels:

    Level  Offset  Granularity         Range
      0       0    4 ms                0 ms - 252 ms
      1      64    32 ms               256 ms - 2044 ms (256ms - ~2s)
      2     128    256 ms              2048 ms - 16380 ms (~2s - ~16s)
      3     192    2048 ms (~2s)       16384 ms - 131068 ms (~16s - ~2m)
      4     256    16384 ms (~16s)     131072 ms - 1048572 ms (~2m - ~17m)
      5     320    131072 ms (~2m)     1048576 ms - 8388604 ms (~17m - ~2h)
      6     384    1048576 ms (~17m)   8388608 ms - 67108863 ms (~2h - ~18h)
      7     448    8388608 ms (~2h)    67108864 ms - 536870911 ms (~18h - ~6d)

    That's a worst case inaccuracy of 12.5% for the timers which are queued at the
    beginning of a level.
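
    To illustrate how a timeout maps to a level, here is a small
    user-space sketch assuming 64 buckets per level and a granularity
    step of 8 between levels, as in the table above (illustrative only,
    not the kernel's code):

      #include <stdio.h>

      #define LVL_SIZE   64           /* buckets per level        */
      #define LVL_SHIFT  3            /* each level is 8x coarser */
      #define MAX_LVL    7

      static int delta_to_level(unsigned long delta)
      {
              int lvl = 0;

              /* Climb until the delta fits into one level's buckets. */
              while (delta >= LVL_SIZE && lvl < MAX_LVL) {
                      delta >>= LVL_SHIFT;
                      lvl++;
              }
              return lvl;
      }

      int main(void)
      {
              /* HZ=250: 1 jiffy = 4 ms. 500 ms = 125 jiffies -> level 1. */
              printf("500 ms -> level %d\n", delta_to_level(125));
              /* 60 s = 15000 jiffies -> level 3 (~2s granularity). */
              printf("60 s   -> level %d\n", delta_to_level(15000));
              return 0;
      }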

    So the new wheel concept addresses the old issues:

    1) Cascading is avoided completely

    2) By keeping the timers in their bucket until expiry/cancelation we can
    track the buckets which have timers enqueued in a bitmap, and can
    therefore look up the next expiring timer very fast, in O(1).

    A further benefit of the concept is that the slack calculation which is done
    on every timer start is no longer necessary because the granularity levels
    provide natural batching already.

    Our extensive testing with various loads did not show any performance
    degradation vs. the current wheel implementation.

    This patch does not address the 'fast lookup' issue as we wanted to make sure
    that there is no regression introduced by the wheel redesign. The
    optimizations are in follow up patches.

    This patch contains fixes from Anna-Maria Gleixner and Richard Cochran.

    Signed-off-by: Thomas Gleixner
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: Eric Dumazet
    Cc: Frederic Weisbecker
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094342.108621834@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • We want to store the array index in the flags space. 256k CPUs should be
    enough for a while.
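
    A sketch of the packing this implies (the mask value is assumed from
    the 256k figure, 2^18 = 262144; not necessarily the kernel's exact
    constants):

      #include <stdint.h>

      #define TIMER_CPUMASK  0x0003FFFFu      /* low 18 bits: CPU index */

      static inline uint32_t timer_set_cpu(uint32_t flags, uint32_t cpu)
      {
              return (flags & ~TIMER_CPUMASK) | (cpu & TIMER_CPUMASK);
      }

      static inline uint32_t timer_get_cpu(uint32_t flags)
      {
              return flags & TIMER_CPUMASK;
      }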

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094342.030144293@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • We switched all users to initialize their timers as pinned and to call
    mod_timer(). Remove the now unused mod_timer_pinned() API function.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: Eric Dumazet
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094341.706205231@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • We want to move the timer migration logic from a 'push' to a 'pull' model.

    Under the current 'push' model pinned timers are handled via
    a runtime API variant: mod_timer_pinned().

    The 'pull' model requires us to store the pinned attribute of a timer
    in the timer_list structure itself, as a new TIMER_PINNED bit in
    timer->flags.

    This flag must be set at initialization time and the timer APIs
    recognize the flag.

    This patch:

    - implements the new flag and the associated new-style
      initialization methods,

    - makes mod_timer() recognize new-style pinned timers,

    - and adds a migration helper facility to allow step-by-step
      conversion of old-style to new-style pinned timers.
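
    A hedged before/after sketch of the conversion (assuming the
    init_timer_pinned() helper this patch introduces; the timer and
    callback names are made up):

      #include <linux/timer.h>

      static struct timer_list mytimer;

      static void mytimer_fn(unsigned long data)
      {
              /* Fires on the CPU the timer was armed on. */
      }

      static void arm_example(void)
      {
              /* Old style: pinning is decided at arming time. */
              init_timer(&mytimer);
              mytimer.function = mytimer_fn;
              mod_timer_pinned(&mytimer, jiffies + HZ);

              /* New style: pinning is a property of the timer itself,
               * so plain mod_timer() can be used. */
              init_timer_pinned(&mytimer);
              mytimer.function = mytimer_fn;
              mod_timer(&mytimer, jiffies + HZ);
      }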

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: Eric Dumazet
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094341.049338558@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

19 Jun, 2015

4 commits

  • Eric reported that the timer_migration sysctl is not really nice
    performance-wise, as it needs to check at every timer insertion whether
    the feature is enabled or not. Further, the check does not live in the
    timer code, so we have an extra function call which checks an extra
    cache line just to figure out that the feature is disabled.

    We can do better and store that information in the per-cpu (hr)timer
    bases. I pondered using a static key, but that's a nightmare to
    update from the nohz code, and the timer base cache line is hot anyway
    when we select a timer base.

    The old logic enabled the timer migration unconditionally if
    CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
    line.

    With this modification, we start off with migration disabled. The
    user-visible sysctl is still set to enabled. If the kernel switches to
    NOHZ, migration is enabled, provided the user did not disable it via
    the sysctl prior to the switch. If nohz=off is on the kernel command
    line, migration stays disabled no matter what.

    Before:
    47.76% hog [.] main
    14.84% [kernel] [k] _raw_spin_lock_irqsave
    9.55% [kernel] [k] _raw_spin_unlock_irqrestore
    6.71% [kernel] [k] mod_timer
    6.24% [kernel] [k] lock_timer_base.isra.38
    3.76% [kernel] [k] detach_if_pending
    3.71% [kernel] [k] del_timer
    2.50% [kernel] [k] internal_add_timer
    1.51% [kernel] [k] get_nohz_timer_target
    1.28% [kernel] [k] __internal_add_timer
    0.78% [kernel] [k] timerfn
    0.48% [kernel] [k] wake_up_nohz_cpu

    After:
    48.10% hog [.] main
    15.25% [kernel] [k] _raw_spin_lock_irqsave
    9.76% [kernel] [k] _raw_spin_unlock_irqrestore
    6.50% [kernel] [k] mod_timer
    6.44% [kernel] [k] lock_timer_base.isra.38
    3.87% [kernel] [k] detach_if_pending
    3.80% [kernel] [k] del_timer
    2.67% [kernel] [k] internal_add_timer
    1.33% [kernel] [k] __internal_add_timer
    0.73% [kernel] [k] timerfn
    0.54% [kernel] [k] wake_up_nohz_cpu
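
    A minimal sketch of the idea, with hypothetical field and helper
    names (the real change lives in the timer base selection path):

      struct tvec_base {
              spinlock_t lock;
              bool migration_enabled;  /* cached; updated from nohz code */
              /* ... */
      };

      static int select_target_cpu(struct tvec_base *base, int this_cpu)
      {
              /* Hot path: one flag test on an already-hot cache line
               * instead of an extra function call plus an extra cache
               * line for the sysctl. */
              if (base->migration_enabled)
                      return get_nohz_timer_target();
              return this_cpu;
      }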

    Reported-by: Eric Dumazet
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Cc: Frederic Weisbecker
    Cc: Viresh Kumar
    Cc: John Stultz
    Cc: Joonwoo Park
    Cc: Wenbo Wang
    Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Simplify the handling of the flag storage for the timer statistics. No
    intermediate storage anymore. Just hand over the flags field.

    I left the printout of 'deferrable' for now because changing this
    would be an ABI update and I have no idea how strong people feel about
    that. OTOH, I wonder whether we should kill the whole timer stats
    stuff because all of that information can be retrieved via ftrace/perf
    as well.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Cc: Frederic Weisbecker
    Cc: Eric Dumazet
    Cc: Viresh Kumar
    Cc: John Stultz
    Cc: Joonwoo Park
    Cc: Wenbo Wang
    Link: http://lkml.kernel.org/r/20150526224512.046626248@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Instead of storing a pointer to the per-cpu tvec_base we can simply
    cache a CPU index in the timer_list and use that to get hold of the
    correct per-cpu tvec_base. This is only used in lock_timer_base(), and
    the slightly larger code is peanuts versus the spinlock operation and
    the d-cache footprint of the timer wheel.

    Aside from that, this allows us to get rid of the following nuisances:

    - boot_tvec_base

    That statically allocated 4k of bss data is just kept around so that
    a timer has a home when it gets statically initialized. It serves no
    other purpose.

    With the CPU index we assign the timer to CPU0 at static
    initialization time and can therefore avoid the whole boot_tvec_base
    dance. That also simplifies the init code, which can just use the
    per-cpu base.

    Before:
      text   data  bss   dec    hex   filename
      17491  9201  4160  30852  7884  ../build/kernel/time/timer.o
    After:
      text   data  bss   dec    hex   filename
      17440  9193  0     26633  6809  ../build/kernel/time/timer.o

    - Overloading the base pointer with various flags

    The CPU index has enough space to hold the flags (deferrable,
    irqsafe) so we can get rid of the extra masking and bit fiddling
    with the base pointer.

    As a benefit we reduce the size of struct timer_list on 64-bit
    machines by 4 to 8 bytes, a size reduction of up to 15% per struct
    timer_list, which is a real win as we have tons of them embedded in
    other structs.

    This also changes the newly added deferrable printout of the timer
    start trace point to capture and print all of timer->flags, which
    allows us to decode the target CPU of the timer as well.

    We might have used bitfields for this, but that would have changed the
    static initializers and the init function for no benefit, just to
    accommodate big-endian bitfields.
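
    A sketch of the resulting lookup (close in spirit to the description
    above; the names and mask value are assumptions):

      static DEFINE_PER_CPU(struct tvec_base, tvec_bases);

      #define TIMER_CPUMASK  0x0003FFFFu   /* low bits: CPU index */

      static inline struct tvec_base *get_timer_base(u32 tflags)
      {
              /* Resolve the per-cpu base from the cached CPU index
               * instead of chasing (and unmasking) a stored pointer. */
              return per_cpu_ptr(&tvec_bases, tflags & TIMER_CPUMASK);
      }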

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Cc: Frederic Weisbecker
    Cc: Eric Dumazet
    Cc: Viresh Kumar
    Cc: John Stultz
    Cc: Joonwoo Park
    Cc: Wenbo Wang
    Cc: Steven Rostedt
    Cc: Badhri Jagan Sridharan
    Link: http://lkml.kernel.org/r/20150526224511.950084301@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • This reduces the size of struct tvec_base by 50% and results in
    slightly smaller code as well.

    Before:
      struct tvec_base: size: 8256, cachelines: 129

      text   data   bss   dec    hex   filename
      17698  13297  8256  39251  9953  ../build/kernel/time/timer.o

    After:
      struct tvec_base: size: 4160, cachelines: 65

      text   data   bss   dec    hex   filename
      17491  9201   4160  30852  7884  ../build/kernel/time/timer.o

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Viresh Kumar
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Cc: Frederic Weisbecker
    Cc: Eric Dumazet
    Cc: John Stultz
    Cc: Joonwoo Park
    Cc: Wenbo Wang
    Link: http://lkml.kernel.org/r/20150526224511.854731214@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

22 Apr, 2015

1 commit

  • The evaluation of the next timer in the nohz code is based on jiffies,
    while all the tick internals are nanosecond based. We also have to
    convert hrtimer nanoseconds to jiffies in the !highres case. That's
    just wrong and introduces interesting corner cases.

    Turn it around and convert the next timer wheel timer expiry and the
    RCU event to clock monotonic, and base all calculations on
    nanoseconds. That clearly identifies the case where no timer is
    pending by an absolute expiry value of KTIME_MAX.

    This makes the code more readable and gets rid of the jiffies magic
    in the nohz code.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Paul E. McKenney
    Acked-by: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Viresh Kumar
    Cc: Marcelo Tosatti
    Cc: Frederic Weisbecker
    Cc: Josh Triplett
    Cc: Lai Jiangshan
    Cc: John Stultz
    Cc: Marcelo Tosatti
    Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

21 Aug, 2012

4 commits

  • Timer internals are protected with irq-safe locks but timer execution
    isn't, so a timer being dequeued for execution and its execution
    aren't atomic against IRQs. This makes it impossible to wait for its
    completion from IRQ handlers and difficult to shoot down a timer from
    IRQ handlers.

    This caused problems for the delayed_work interface. Because there is
    no way to reliably shoot down delayed_work->timer from IRQ handlers,
    __cancel_delayed_work() can't share the logic to steal the target
    delayed_work with cancel_delayed_work_sync(), and can only steal
    delayed_works which are queued on the timer. Similarly, the pending
    mod_delayed_work() can't be used from IRQ handlers.

    This patch adds a new timer flag, TIMER_IRQSAFE, which makes the timer
    execute with IRQs disabled after dequeueing, such that its dequeueing
    and execution are atomic against IRQ handlers.

    This makes it safe to wait for the timer's completion from IRQ
    handlers, for example, using del_timer_sync(). It can never be
    executing on the local CPU and if executing on other CPUs it won't be
    interrupted until done.

    This will enable simplifying the delayed_work cancel/mod interface.
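
    A sketch of setting up and shooting down an irqsafe timer (the
    __setup_timer() form with a flags argument is assumed from the
    companion initializer cleanup; names are made up):

      #include <linux/interrupt.h>
      #include <linux/timer.h>

      static struct timer_list my_timer;

      static void my_timeout_fn(unsigned long data)
      {
              /* Runs with IRQs disabled: dequeue and execution are
               * atomic against IRQ handlers. */
      }

      static irqreturn_t my_irq_handler(int irq, void *dev_id)
      {
              /* Safe only because the timer is TIMER_IRQSAFE. */
              del_timer_sync(&my_timer);
              return IRQ_HANDLED;
      }

      static void my_init(void)
      {
              __setup_timer(&my_timer, my_timeout_fn, 0UL, TIMER_IRQSAFE);
      }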

    Signed-off-by: Tejun Heo
    Cc: torvalds@linux-foundation.org
    Cc: peterz@infradead.org
    Link: http://lkml.kernel.org/r/1344449428-24962-5-git-send-email-tj@kernel.org
    Signed-off-by: Thomas Gleixner

    Tejun Heo
     
  • Over time, timer initializers became messy, with unnecessarily
    duplicated code inconsistently spread across timer.h and timer.c.

    This patch cleans up timer initializers.

    * timer.c::__init_timer() is renamed to do_init_timer().

    * __TIMER_INITIALIZER() added. It takes @flags and all initializers
    are wrappers around it.

    * init_timer[_on_stack]_key() now take @flags.

    * __init_timer[_on_stack]() added. They take @flags and all init
    macros are wrappers around them.

    * __setup_timer[_on_stack]() added. They use __init_timer[_on_stack]()
    and take @flags. All setup macros are wrappers around the two.

    Note that this patch doesn't add missing init/setup combinations -
    e.g. init_timer_deferrable_on_stack(). Adding missing ones is
    trivial.
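
    A sketch of the resulting layering (simplified; actual macro bodies
    omitted, and TIMER_DEFERRABLE is assumed as a flag name):

      /* Everything funnels into the one flags-taking core. */
      #define TIMER_INITIALIZER(_fn, _expires, _data) \
              __TIMER_INITIALIZER((_fn), (_expires), (_data), 0)

      #define TIMER_DEFERRED_INITIALIZER(_fn, _expires, _data) \
              __TIMER_INITIALIZER((_fn), (_expires), (_data), TIMER_DEFERRABLE)

      #define init_timer(timer) \
              __init_timer((timer), 0)

      #define init_timer_deferrable(timer) \
              __init_timer((timer), TIMER_DEFERRABLE)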

    Signed-off-by: Tejun Heo
    Cc: torvalds@linux-foundation.org
    Cc: peterz@infradead.org
    Link: http://lkml.kernel.org/r/1344449428-24962-4-git-send-email-tj@kernel.org
    Signed-off-by: Thomas Gleixner

    Tejun Heo
     
  • init_timer_on_stack_key() is used by init macro definitions. Move
    init_timer_on_stack_key() and destroy_timer_on_stack() declarations
    above init macro defs. This will make the next init cleanup patch
    easier to read.

    Signed-off-by: Tejun Heo
    Cc: torvalds@linux-foundation.org
    Cc: peterz@infradead.org
    Link: http://lkml.kernel.org/r/1344449428-24962-3-git-send-email-tj@kernel.org
    Signed-off-by: Thomas Gleixner

    Tejun Heo
     
  • To prepare for addition of another flag, generalize timer->base flags
    handling.

    * Rename from TBASE_*_FLAG to TIMER_* and make them LU constants.

    * Define and use TIMER_FLAG_MASK for flags masking so that multiple
    flags can be handled correctly.

    * Don't dereference timer->base directly even if
    !tbase_get_deferrable(). Both such places are already passed @base,
    so use it instead.

    * Make sure tvec_base's alignment is large enough for timer->base
    flags using BUILD_BUG_ON().

    Signed-off-by: Tejun Heo
    Cc: torvalds@linux-foundation.org
    Cc: peterz@infradead.org
    Link: http://lkml.kernel.org/r/1344449428-24962-2-git-send-email-tj@kernel.org
    Signed-off-by: Thomas Gleixner

    Tejun Heo
     

22 Oct, 2010

1 commit

  • On UP try_to_del_timer_sync() is mapped to del_timer() which does not
    take the running timer callback into account, so it has different
    semantics.

    Remove the SMP dependency of try_to_del_timer_sync() by using
    base->running_timer in the UP case as well.

    [ tglx: Removed set_running_timer() inline and tweaked the changelog ]

    Signed-off-by: Yong Zhang
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Acked-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Yong Zhang
     

21 Oct, 2010

3 commits

  • Currently, you have to define a delayed_work uninitialised and then
    initialise it before first use. That's a tad clumsy. At the risk of
    playing mind games with the compiler, fooling it into doing pointer
    arithmetic with compile-time constants, this lets clients properly
    initialise delayed work with deferrable timers statically.

    This patch was inspired by the issues which led Artem Bityutskiy to
    commit 8eab945c5616fc984 ("sunrpc: make the cache cleaner workqueue
    deferrable").

    Signed-off-by: Phil Carmody
    Acked-by: Artem Bityutskiy
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Phil Carmody
     
  • TIMER_INITIALIZER() should initialize the field slack of timer_list as
    __init_timer() does.

    Signed-off-by: Changli Gao
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Changli Gao
     
  • Reorder struct timer_list to remove 8 bytes of alignment padding on 64
    bit builds when CONFIG_TIMER_STATS is selected.

    timer_list is widely used across the kernel so many structures will
    benefit and shrink in size.

    For example, with my config on x86_64, per_cpu_dm_data shrinks from
    136 to 128 bytes and ahci_port_priv shrinks from 1032 to 968 bytes.

    Signed-off-by: Richard Kennedy
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Richard Kennedy
     

07 Apr, 2010

1 commit

  • While HR timers have had the concept of timer slack for quite some time
    now, the legacy timers lacked this concept, and had to make do with
    round_jiffies() and friends.

    Timer slack is important for power management; grouping timers reduces the
    number of wakeups which in turn reduces power consumption.

    This patch introduces timer slack to the legacy timers using the following
    pieces:
    * A slack field in the timer struct
    * An API (set_timer_slack) that callers can use to set explicit timer slack
    * A default slack of 0.4% of the requested delay for callers that do not
    set any explicit slack
    * Rounding code, part of mod_timer(), that tries to group timers
    around jiffies values at every 'power of two' (so quick timers will
    group around every 2, but longer timers will group around every 4,
    8, 16, 32, etc.)
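
    A user-space sketch of the rounding step (simplified from the
    description above; not the kernel's code):

      #include <stdio.h>

      static unsigned long apply_slack(unsigned long now, unsigned long expires)
      {
              /* Default slack: 0.4% (~1/256) of the requested delay. */
              unsigned long limit = expires + (expires - now) / 256;
              unsigned long mask = expires ^ limit;
              unsigned int bit = 0;

              if (!mask)
                      return expires;
              while (mask >>= 1)   /* highest bit where they differ */
                      bit++;
              /* Round the limit down to that power-of-two boundary,
               * grouping nearby timers onto the same jiffies value. */
              return limit & ~((1UL << bit) - 1);
      }

      int main(void)
      {
              printf("%lu\n", apply_slack(0, 1000));  /* prints 1002 */
              return 0;
      }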

    Signed-off-by: Arjan van de Ven
    Cc: johnstul@us.ibm.com
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Arjan van de Ven
     

24 Jun, 2009

1 commit

  • When the kernel is configured with CONFIG_TIMER_STATS but timer
    stats are runtime disabled, we still get calls to
    __timer_stats_timer_set_start_info which initializes some
    fields in the corresponding struct timer_list.

    So add some quick checks to the timer stats setup functions
    to avoid function calls to __timer_stats_timer_set_start_info
    when timer stats are disabled.

    In an artificial workload that does nothing but play ping-pong with a
    single TCP packet via loopback, this decreases CPU consumption by
    1 - 1.5%.

    This is part of a modified function trace output on SLES11:

    perl-2497 [00] 28630647177732388 [+ 125]: sk_reset_timer
    Cc: Andrew Morton
    Cc: Martin Schwidefsky
    Cc: Mustafa Mesanovic
    Cc: Arjan van de Ven
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     

13 May, 2009

1 commit

  • * Arun R Bharadwaj [2009-04-16 12:11:36]:

    This patch creates a new framework for identifying cpu-pinned timers
    and hrtimers.

    This framework is needed because pinned timers are expected to fire on
    the same CPU on which they are queued. So it is essential to identify
    these and not migrate them, in case there are any.

    For regular timers, the currently existing add_timer_on() can be used
    to queue pinned timers, and subsequently mod_timer_pinned() can be
    used to modify the 'expires' field.

    For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are
    added to queue cpu-pinned hrtimers.

    [ tglx: use .._PINNED mode argument instead of creating tons of new
    functions ]

    Signed-off-by: Arun R Bharadwaj
    Signed-off-by: Thomas Gleixner

    Arun R Bharadwaj
     

31 Mar, 2009

1 commit

  • * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (33 commits)
    lockdep: fix deadlock in lockdep_trace_alloc
    lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB
    lockdep: annotate reclaim context (__GFP_NOFS), fix
    lockdep: build fix for !PROVE_LOCKING
    lockstat: warn about disabled lock debugging
    lockdep: use stringify.h
    lockdep: simplify check_prev_add_irq()
    lockdep: get_user_chars() redo
    lockdep: simplify get_user_chars()
    lockdep: add comments to mark_lock_irq()
    lockdep: remove macro usage from mark_held_locks()
    lockdep: fully reduce mark_lock_irq()
    lockdep: merge the !_READ mark_lock_irq() helpers
    lockdep: merge the _READ mark_lock_irq() helpers
    lockdep: simplify mark_lock_irq() helpers #3
    lockdep: further simplify mark_lock_irq() helpers
    lockdep: simplify the mark_lock_irq() helpers
    lockdep: split up mark_lock_irq()
    lockdep: generate usage strings
    lockdep: generate the state bit definitions
    ...

    Linus Torvalds
     

19 Feb, 2009

1 commit

  • Impact: new timer API

    Based on an idea from Martin Josefsson with the help of
    Patrick McHardy and Stephen Hemminger:

    introduce the mod_timer_pending() API which is a mod_timer()
    offspring that is an invariant on already removed timers.

    (regular mod_timer() re-activates non-pending timers.)

    This is useful for the networking code in that it can
    allow unserialized mod_timer_pending() timer-forwarding
    calls, but a single del_timer*() will stop the timer
    from being reactivated again.

    Also while at it:

    - optimize the regular mod_timer() path some more; the
    timer-stat and a debug check were needlessly duplicated
    in __mod_timer().

    - make the exports come straight after the function, as
    most other exports in timer.c already did.

    - eliminate __mod_timer() as an external API, change the
    users to mod_timer().

    The regular mod_timer() code path is not impacted
    significantly, due to inlining optimizations and due to
    the simplifications.
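
    A usage sketch (the watchdog timer is hypothetical):

      /* Forwarding path: may run unserialized. This forwards the timer
       * only while it is pending; it never resurrects a deleted timer. */
      mod_timer_pending(&watchdog_timer, jiffies + HZ);

      /* Teardown path: a single del_timer() stops the timer and keeps
       * the forwarding calls above from re-activating it. */
      del_timer(&watchdog_timer);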

    Based-on-patch-from: Stephen Hemminger
    Acked-by: Stephen Hemminger
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Cc: netdev@vger.kernel.org
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

06 Nov, 2008

1 commit

  • This patch (as1158b) adds round_jiffies_up() and friends. These
    routines work like the analogous round_jiffies() functions, except
    that they will never round down.

    The new routines will be useful for timeouts where we don't care
    exactly when the timer expires, provided it doesn't expire too soon.
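
    A short usage sketch (my_timer is hypothetical):

      /* round_jiffies() may move the expiry either way to a whole-second
       * boundary; round_jiffies_up() only moves it forward, so the
       * timeout can never fire too soon. */
      mod_timer(&my_timer, round_jiffies_up(jiffies + msecs_to_jiffies(1500)));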

    Signed-off-by: Alan Stern
    Signed-off-by: Jens Axboe

    Alan Stern
     

30 Apr, 2008

1 commit

  • Add calls to the generic object debugging infrastructure and provide
    fixup functions which allow us to keep the system alive when
    recoverable problems have been detected by the object debugging core
    code.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

17 Jul, 2007

2 commits

  • Add a flag in /proc/timer_stats to indicate deferrable timers. This will
    let developers/users differentiate between types of timers in
    /proc/timer_stats.

    A deferrable timer and a normal timer will appear in /proc/timer_stats as below:
    10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)

    Also, the version of timer_stats changes from v0.1 to v0.2.

    Signed-off-by: Venkatesh Pallipadi
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venki Pallipadi
     
  • Remove the obviously unnecessary includes under the include/linux/
    directory, and fix the couple of errors that are introduced as a
    result of that.

    Signed-off-by: Robert P. J. Day
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

30 May, 2007

1 commit

  • get_next_timer_interrupt() returns a delta of (LONG_MAX >> 1) in case
    there is no timer pending. On 64 bit machines this results in a
    multiplication overflow in tick_nohz_stop_sched_tick().

    Reported by: Dave Miller

    Make the return value a constant and limit it to a 32-bit value.

    When the max timeout value is returned, we can safely stop the tick
    timer device. The max jiffies delta results in roughly a 12 day
    timeout for HZ=1000 (2^30 jiffies at 1000 Hz is ~12.4 days).

    In the long term the get_next_timer_interrupt() code needs to be
    reworked to return ktime instead of jiffies, but we have to wait until
    the last users of the original NO_IDLE_HZ code are converted.

    Signed-off-by: Thomas Gleixner
    Acked-off-by: David S. Miller
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

09 May, 2007

1 commit

  • Introduce a new flag for timers - deferrable: timers that work normally
    when the system is busy, but will not cause the CPU to come out of idle
    (just to service this timer) when the CPU is idle. Instead, such a
    timer will be serviced when the CPU eventually wakes up with a
    subsequent non-deferrable timer.

    The main advantage of this is to avoid unnecessary timer interrupts
    when the CPU is idle. If the routine currently called by a timer can
    wait until the next event without any issues, this new timer can be
    used to set up the timer event for that routine. This, with dynticks,
    allows CPUs to be lazy, letting them stay idle for extended periods of
    time by reducing unnecessary wakeups and thereby reducing power
    consumption.

    This patch:

    Builds this new timer on top of the existing timer infrastructure. It
    uses the last bit in the 'base' pointer of the timer_list structure to
    store the deferrable timer flag. __next_timer_interrupt() skips over
    these deferrable timers when the CPU looks for the next timer event
    for which it has to wake up.

    This is exported via a new interface, init_timer_deferrable(), which
    can be called in place of the regular init_timer().
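
    A usage sketch (hypothetical housekeeping timer):

      static struct timer_list housekeeping_timer;

      static void housekeeping_fn(unsigned long data)
      {
              /* Not urgent: fine to run whenever the CPU wakes anyway. */
      }

      static void start_housekeeping(void)
      {
              /* Deferrable: will not pull an idle CPU out of idle. */
              init_timer_deferrable(&housekeeping_timer);
              housekeeping_timer.function = housekeeping_fn;
              housekeeping_timer.expires  = jiffies + 10 * HZ;
              add_timer(&housekeeping_timer);
      }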

    [akpm@linux-foundation.org: Privatise a #define]
    Signed-off-by: Venkatesh Pallipadi
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Oleg Nesterov
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venki Pallipadi
     

17 Feb, 2007

3 commits

  • Add /proc/timer_stats support: a debugging feature to profile timer
    expiration. Both the starting site (process/PID) and the expiration
    function are captured. This allows quick identification of timer event
    sources in a system.

    Sample output:

    # echo 1 > /proc/timer_stats
    # cat /proc/timer_stats
    Timer Stats Version: v0.1
    Sample period: 4.010 s
    24, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
    11, 0 swapper sk_reset_timer (tcp_delack_timer)
    6, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
    2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    17, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
    2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    4, 2050 pcscd do_nanosleep (hrtimer_wakeup)
    5, 4179 sshd sk_reset_timer (tcp_write_timer)
    4, 2248 yum-updatesd schedule_timeout (process_timeout)
    18, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
    3, 0 swapper sk_reset_timer (tcp_delack_timer)
    1, 1 swapper neigh_table_init_no_netlink (neigh_periodic_timer)
    2, 1 swapper e1000_up (e1000_watchdog)
    1, 1 init schedule_timeout (process_timeout)
    100 total events, 25.24 events/sec

    [ cleanups and hrtimers support from Thomas Gleixner ]
    [bunk@stusta.de: nr_entries can become static]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Andi Kleen
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • - hrtimers did not use the hrtimer_restart enum and relied on the
    implicit int representation. Fix the prototypes and the functions
    using the enums.
    - Use separate namespaces for the enumerations
    - Convert the hrtimer_restart macro to an inline function
    - Add comments

    No functional changes.

    [akpm@osdl.org: fix input driver]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Dmitry Torokhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • For CONFIG_NO_HZ we need to calculate the next timer wheel event based
    on a given jiffies value. Extend the existing code to allow the extra
    'now' argument. Provide a compatibility function so the existing
    implementations can call the function with now == jiffies. (This also
    solves the raciness of the original code vs. jiffies changing during
    the iteration.)

    No functional changes to existing users of this infrastructure.

    [ remove WARN_ON() that triggered on s390, by Carsten Otte ]
    [ made new helper static, Adrian Bunk ]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner