07 Aug, 2010

1 commit

  • …x/kernel/git/tip/linux-2.6-tip

    * 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    um: Fix read_persistent_clock fallout
    kgdb: Do not access xtime directly
    powerpc: Clean up obsolete code relating to decrementer and timebase
    powerpc: Rework VDSO gettimeofday to prevent time going backwards
    clocksource: Add __clocksource_updatefreq_hz/khz methods
    x86: Convert common clocksources to use clocksource_register_hz/khz
    timekeeping: Make xtime and wall_to_monotonic static
    hrtimer: Cleanup direct access to wall_to_monotonic
    um: Convert to use read_persistent_clock
    timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset
    powerpc: Cleanup xtime usage
    powerpc: Simplify update_vsyscall
    time: Kill off CONFIG_GENERIC_TIME
    time: Implement timespec_add
    x86: Fix vtime/file timestamp inconsistencies

    Trivial conflicts in Documentation/feature-removal-schedule.txt

    Much less trivial conflicts in arch/powerpc/kernel/time.c resolved as
    per Thomas' earlier merge commit 47916be4e28c ("Merge branch
    'powerpc.cherry-picks' into timers/clocksource")

    Linus Torvalds
     

27 Jul, 2010

1 commit


09 Jun, 2010

1 commit

  • In the new push model, all idle CPUs indeed go into nohz mode. There is
    still the concept of an idle load balancer (performing the load balancing
    on behalf of all the idle CPUs in the system). A busy CPU kicks the nohz
    balancer when any of the nohz CPUs need idle load balancing.
    The kickee CPU then does the idle load balancing on behalf of all idle
    CPUs instead of the normal idle balance.

    This addresses the following two problems in the current nohz ilb logic:
    * the idle load balancer continued to have periodic ticks during idle and
    woke up frequently, even though it did not have any rebalancing to do on
    behalf of any of the idle CPUs.
    * On x86, and on CPUs whose APIC timer stops when idle, this
    periodic wakeup can result in an additional periodic interrupt on the
    CPU doing the timer broadcast.

    Also, we are currently migrating the unpinned timers from an idle CPU to
    the CPU doing idle load balancing. (When all the CPUs in the system are
    idle, there is no idle load balancing CPU and timers get added to the
    same idle CPU where the request was made. So the existing optimization
    works only on a semi-idle system.)

    In a semi-idle system, we no longer have periodic ticks on the idle load
    balancer CPU. Using that CPU will add more delay to the timers than
    intended (as that CPU's timer base may not be up to date with respect to
    jiffies etc.). This was causing mysterious slowdowns during boot etc.

    For now, in the semi-idle case, use the nearest busy CPU for migrating
    timers from an idle CPU. This is good for power savings anyway.
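    The kick/kickee split described above can be modeled in a few lines of
    user-space C; the names (cpu_idle, nohz_kick, find_nohz_balancer) are
    illustrative only, not the kernel's actual identifiers:

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define NR_CPUS 4

    /* Illustrative model only: which CPUs are nohz-idle, and which CPU
     * (if any) has been kicked to do idle load balancing for all of them. */
    static bool cpu_idle[NR_CPUS];
    static int kicked_cpu = -1;

    /* Pick the first nohz-idle CPU as the idle load balancer. */
    static int find_nohz_balancer(void)
    {
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            if (cpu_idle[cpu])
                return cpu;
        return -1;
    }

    /* A busy CPU kicks the balancer; the kickee then balances on behalf of
     * all idle CPUs, so none of them needs to keep a periodic tick alive. */
    static void nohz_kick(void)
    {
        int ilb = find_nohz_balancer();
        if (ilb >= 0)
            kicked_cpu = ilb;
    }
    ```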

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
     

26 May, 2010

1 commit

  • hrtimer_get_softirq_time() has its own xtime lock protection, so it's
    safe to use plain __current_kernel_time() and avoid the double seqlock
    loop.

    Signed-off-by: Stanislaw Gruszka
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     

07 Apr, 2010

1 commit

  • The current version of schedule_hrtimeout() always uses the
    monotonic clock. Some system calls such as mq_timedsend()
    and mq_timedreceive(), however, require the use of the wall
    clock due to the definition of the system call.

    This patch provides the infrastructure to use schedule_hrtimeout()
    with a CLOCK_REALTIME timer.
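    A user-space analogue of such a wall-clock timed wait (not the
    kernel-internal schedule_hrtimeout() API) is clock_nanosleep() against an
    absolute CLOCK_REALTIME deadline; the helper name below is made up:

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <time.h>

    /* Sleep until an absolute CLOCK_REALTIME deadline, the way
     * mq_timedsend()/mq_timedreceive() are defined to. Returns 0 on
     * success, an errno value on failure. */
    static int sleep_until_realtime(long extra_ns)
    {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += extra_ns;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        return clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &deadline, NULL);
    }
    ```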

    Signed-off-by: Carsten Emde
    Tested-by: Pradyumna Sampath
    Cc: Andrew Morton
    Cc: Arjan van de Veen
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Carsten Emde
     

15 Dec, 2009

1 commit


10 Dec, 2009

2 commits

  • There is no reason to make timer_stats_hrtimer_set_start_info and
    friends visible to the rest of the kernel. So move all of them to
    hrtimer.c. Also make timer_stats_hrtimer_set_start_info a static
    inline function so it gets inlined and we avoid another function call.
    Based on a patch by Thomas Gleixner.

    Signed-off-by: Heiko Carstens
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Heiko Carstens
     
  • The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
    execution time of the hrtimer callbacks.

    This is error-prone for virtual machines, where a guest vcpu can be
    scheduled out during the execution of the callbacks (and the callbacks
    themselves can do operations that translate to blocking operations in
    the hypervisor), which can lead to a large min_delta_ns rendering the
    system unusable.

    Replace the current heuristics with something more reliable. Allow the
    interrupt code to try 3 times to catch up with the lost time. If that
    fails, use the total time spent in the interrupt handler to defer the
    next timer interrupt so the system can catch up with other things
    which got delayed. Limit that deferment to 100ms.

    The retry events and the maximum time spent in the interrupt handler
    are recorded and exposed via /proc/timer_list.

    Inspired by a patch from Marcelo.
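    The retry-then-defer policy might be sketched like this (a simplified
    model, not the actual hrtimer_interrupt() code; names are hypothetical):

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define MAX_RETRIES  3
    #define MAX_DEFER_NS 100000000LL   /* 100ms cap on the deferment */

    typedef int64_t ktime_ns;

    /* Up to 3 retries, fire again immediately; after that, defer the
     * next timer interrupt by the time spent in the handler, capped at
     * 100ms, so the rest of the system can catch up. */
    static ktime_ns next_expiry_after_hang(int retries, ktime_ns now,
                                           ktime_ns time_in_handler)
    {
        if (retries < MAX_RETRIES)
            return now;
        if (time_in_handler > MAX_DEFER_NS)
            time_in_handler = MAX_DEFER_NS;
        return now + time_in_handler;
    }
    ```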

    Reported-by: Michael Tokarev
    Signed-off-by: Thomas Gleixner
    Tested-by: Marcelo Tosatti
    Cc: kvm@vger.kernel.org

    Thomas Gleixner
     

09 Dec, 2009

1 commit

  • * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    timers, init: Limit the number of per cpu calibration bootup messages
    posix-cpu-timers: optimize and document timer_create callback
    clockevents: Add missing include to pacify sparse
    x86: vmiclock: Fix printk format
    x86: Fix printk format due to variable type change
    sparc: fix printk for change of variable type
    clocksource/events: Fix fallout of generic code changes
    nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
    nohz: Track last do_timer() cpu
    nohz: Prevent clocksource wrapping during idle
    nohz: Type cast printk argument
    mips: Use generic mult/shift factor calculation for clocks
    clocksource: Provide a generic mult/shift factor calculation
    clockevents: Use u32 for mult and shift factors
    nohz: Introduce arch_needs_cpu
    nohz: Reuse ktime in sub-functions of tick_check_idle.
    time: Remove xtime_cache
    time: Implement logarithmic time accumulation

    Linus Torvalds
     

14 Nov, 2009

1 commit

  • In the dynamic tick code, "max_delta_ns" (member of the
    "clock_event_device" structure) represents the maximum sleep time
    that can occur between timer events in nanoseconds.

    The variable, "max_delta_ns", is defined as an unsigned long,
    which is a 32-bit integer on 32-bit machines and a 64-bit
    integer on 64-bit machines (if the -m64 option is used with gcc).
    The value of max_delta_ns is set by calling the function
    "clockevent_delta2ns()", which returns a maximum value of LONG_MAX.
    For a 32-bit machine LONG_MAX is equal to 0x7fffffff, and in
    nanoseconds this equates to ~2.15 seconds. Hence, the maximum
    sleep time for a 32-bit machine is ~2.15 seconds, whereas for
    a 64-bit machine it will be many years.

    This patch changes the type of max_delta_ns to "u64" instead of
    "unsigned long" so that this variable is a 64-bit type for both 32-bit
    and 64-bit machines. It also changes the maximum value returned by
    clockevent_delta2ns() to KTIME_MAX. Hence this allows a 32-bit
    machine to sleep for longer than ~2.15 seconds. Please note that this
    patch also changes "min_delta_ns" to "u64" too; although this is
    unnecessary, it makes the patch simpler as it avoids fixing up all
    callers of clockevent_delta2ns().

    [ tglx: changed "unsigned long long" to u64 as we use this data type
    throughout the time code ]
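    The ~2.15 second figure falls straight out of the arithmetic (helper
    names below are made up for illustration):

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* On a 32-bit machine LONG_MAX is 0x7fffffff; treated as nanoseconds
     * that caps the sleep at ~2.147 seconds. With a u64 delta the cap
     * becomes KTIME_MAX, i.e. hundreds of years. */
    static uint64_t max_sleep_ns_32bit(void)
    {
        return 0x7fffffffULL;   /* LONG_MAX on a 32-bit machine */
    }

    static uint64_t ns_to_ms(uint64_t ns)
    {
        return ns / 1000000ULL;
    }
    ```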

    Signed-off-by: Jon Hunter
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Jon Hunter
     

06 Oct, 2009

1 commit


28 Sep, 2009

1 commit


26 Sep, 2009

1 commit


24 Sep, 2009

1 commit


19 Sep, 2009

1 commit

  • * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits)
    time: Prevent 32 bit overflow with set_normalized_timespec()
    clocksource: Delay clocksource down rating to late boot
    clocksource: clocksource_select must be called with mutex locked
    clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
    timers: Drop a function prototype
    clocksource: Resolve cpu hotplug dead lock with TSC unstable
    timer.c: Fix S/390 comments
    timekeeping: Fix invalid getboottime() value
    timekeeping: Fix up read_persistent_clock() breakage on sh
    timekeeping: Increase granularity of read_persistent_clock(), build fix
    time: Introduce CLOCK_REALTIME_COARSE
    x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown
    clocksource: Avoid clocksource watchdog circular locking dependency
    clocksource: Protect the watchdog rating changes with clocksource_mutex
    clocksource: Call clocksource_change_rating() outside of watchdog_lock
    timekeeping: Introduce read_boot_clock
    timekeeping: Increase granularity of read_persistent_clock()
    timekeeping: Update clocksource with stop_machine
    timekeeping: Add timekeeper read_clock helper functions
    timekeeping: Move NTP adjusted clock multiplier to struct timekeeper
    ...

    Fix trivial conflict due to MIPS lemote -> loongson renaming.

    Linus Torvalds
     

15 Sep, 2009

1 commit

  • On NOHZ systems the following timers,

    - tick_nohz_restart_sched_tick (tick_sched_timer)
    - hrtimer_start (tick_sched_timer)

    are reprogramming the clock events device far more often than needed.
    No specific test case was required to observe this effect. This
    occurred because there was no check to see whether the currently removed
    or restarted hrtimer was:

    1) the one which previously armed the clock events device.
    2) going to be replaced by another timer which has the same expiry time.

    Avoid the reprogramming in hrtimer_force_reprogram when the new expiry
    value which is evaluated from the clock bases is equal to
    cpu_base->expires_next. This results in ~4% faster application startup
    time.

    [ tglx: simplified initial solution ]
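    The essence of the fix is a single comparison (a sketch with invented
    names, not the actual hrtimer_force_reprogram() code):

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef int64_t ktime_ns;

    /* Reprogram the clock event device only when the newly evaluated next
     * expiry differs from what the device is already armed for. */
    static bool need_reprogram(ktime_ns new_expires_next, ktime_ns armed_for)
    {
        return new_expires_next != armed_for;
    }
    ```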

    Signed-off-by: Ashwin Chaugule
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Ashwin Chaugule
     

29 Aug, 2009

2 commits

  • Add tracepoints which cover the life cycle of a hrtimer. The
    tracepoints are integrated with the already existing debug_object
    debug points as far as possible.

    [ tglx: Fixed comments, made output consistent, easier to read and
    parse. Fixed output for 32bit archs which do not use the
    scalar representation of ktime_t. Hand current time to
    trace_hrtimer_expiry_entry instead of calling get_time()
    inside of the trace assignment. ]

    Signed-off-by: Xiao Guangrong
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Mathieu Desnoyers
    Cc: Anton Blanchard
    Cc: Peter Zijlstra
    Cc: KOSAKI Motohiro
    Cc: Zhaolei
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Xiao Guangrong
     
  • This changes how the pktgen thread spins/waits between
    packets if a delay is configured. It uses a high-resolution timer to
    wait for the time to arrive.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

14 Aug, 2009

1 commit


22 Jul, 2009

1 commit


10 Jul, 2009

2 commits

  • The timer migration expiry check should prevent the migration of a
    timer to another CPU when the timer expires before the next event is
    scheduled on the other CPU. Migrating the timer might delay it because
    we can not reprogram the clock event device on the other CPU. But the
    code implementing that check has two flaws:

    - for !HIGHRES the check compares the expiry value with the clock
    events device expiry value, which is wrong for CLOCK_REALTIME based
    timers.

    - the check is racy. It holds the hrtimer base lock of the target CPU,
    but the clock event device expiry value can be modified
    nevertheless, e.g. by a timer interrupt firing.

    The !HIGHRES case is easy to fix, as we can enqueue the timer on the
    CPU which was selected by the load balancer. It runs the idle
    balancing code once per jiffy anyway. So the maximum delay for the
    timer is the same as when we keep the tick on the current CPU going.

    In the HIGHRES case we can get the next expiry value from the hrtimer
    cpu_base of the target CPU and serialize the update with the cpu_base
    lock. This moves the lock section in hrtimer_interrupt() so we can set
    next_event to KTIME_MAX while we are handling the expired timers and
    set it to the next expiry value after we handled the timers under the
    base lock. While the expired timers are processed timer migration is
    blocked because the expiry time of the timer is always

    Thomas Gleixner
     
  • The timer migration code needs to check whether the expiry time of the
    timer is before the programmed clock event expiry time when the timer
    is enqueued on another CPU because we can not reprogram the timer
    device on the other CPU. The current logic checks the expiry time even
    if we enqueue on the current CPU when nohz_get_load_balancer() returns
    current CPU. This might lead to an endless loop in the expiry check
    code when the expiry time of the timer is before the current
    programmed next event.

    Check whether nohz_get_load_balancer() returns current CPU and skip
    the expiry check if this is the case.

    The bug was triggered from the networking code. The patch fixes the
    regression http://bugzilla.kernel.org/show_bug.cgi?id=13738
    (Soft-Lockup/Race in networking in 2.6.31-rc1+195)
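    The fix boils down to one guard (sketch with hypothetical names):

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Only check the remote clock event expiry when we actually enqueue
     * on a different CPU; enqueueing locally never needs a remote
     * reprogram, and checking anyway can loop forever when the timer
     * expires before the currently programmed next event. */
    static bool needs_expiry_check(int balancer_cpu, int this_cpu)
    {
        return balancer_cpu != this_cpu;
    }
    ```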

    Cc: Arun Bharadwaj
    Tested-by: Andres Freund
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

07 Jul, 2009

2 commits

  • The ktime_get() functions for GENERIC_TIME=n are still located in
    hrtimer.c. Move them to time/timekeeping.c where they belong.

    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The generic ktime_get function defined in kernel/hrtimer.c is suboptimal
    for GENERIC_TIME=y:

    0) | ktime_get() {
    0) | ktime_get_ts() {
    0) | getnstimeofday() {
    0) | read_tod_clock() {
    0) 0.601 us | }
    0) 1.938 us | }
    0) | set_normalized_timespec() {
    0) 0.602 us | }
    0) 4.375 us | }
    0) 5.523 us | }

    Overall there are two read_seqbegin/read_seqretry loops and a lot of
    unnecessary struct timespec calculations. ktime_get returns a nanosecond
    value which is the sum of xtime, wall_to_monotonic and the nanosecond
    delta from the clock source.

    ktime_get can be optimized for GENERIC_TIME=y. The new version only calls
    clocksource_read:

    0) | ktime_get() {
    0) | read_tod_clock() {
    0) 0.610 us | }
    0) 1.977 us | }

    It uses a single read_seqbegin/read_seqretry loop and just adds everything
    up to a nanosecond value.

    ktime_get_ts is optimized in a similar fashion.

    [ tglx: added WARN_ON(timekeeping_suspended) as in getnstimeofday() ]
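    The optimized shape — one seqlock retry loop, all arithmetic in
    nanoseconds — can be modeled in user space like this (simplified: the
    field names are illustrative, the writer side and memory barriers of a
    real seqlock are omitted):

    ```c
    #include <assert.h>
    #include <stdint.h>

    typedef int64_t s64;

    struct tk_model {
        unsigned seq;               /* seqcount: changes while a writer updates */
        s64 xtime_ns;               /* wall time */
        s64 wall_to_monotonic_ns;   /* offset to the monotonic clock */
        s64 clock_delta_ns;         /* ns since the last tick, from clocksource */
    };

    /* One retry loop, everything summed as nanoseconds, no intermediate
     * struct timespec. */
    static s64 ktime_get_model(const struct tk_model *t)
    {
        unsigned seq;
        s64 ns;
        do {
            seq = t->seq;            /* read_seqbegin() */
            ns = t->xtime_ns + t->wall_to_monotonic_ns + t->clock_delta_ns;
        } while (seq != t->seq);     /* read_seqretry() */
        return ns;
    }
    ```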

    Signed-off-by: Martin Schwidefsky
    Acked-by: john stultz
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Martin Schwidefsky
     

18 Jun, 2009

1 commit

  • * 'linux-next' of git://git.infradead.org/ubifs-2.6:
    UBIFS: start using hrtimers
    hrtimer: export ktime_add_safe
    UBIFS: do not forget to register BDI device
    UBIFS: allow sync option in rootflags
    UBIFS: remove dead code
    UBIFS: use anonymous device
    UBIFS: return proper error code if the compr is not present
    UBIFS: return error if link and unlink race
    UBIFS: reset no_space flag after inode deletion

    Linus Torvalds
     

08 Jun, 2009

1 commit

  • We want to use hrtimers in UBIFS (for the write-buffer write-back timer).
    We need 'hrtimer_set_expires_range_ns()', which is an inline
    function that uses 'ktime_add_safe()'.
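    'ktime_add_safe()' is a saturating addition: instead of letting a large
    relative timeout overflow into the past, the sum is clamped to
    KTIME_MAX. A minimal model (assuming a non-negative right-hand side, as
    for a timeout):

    ```c
    #include <assert.h>
    #include <stdint.h>

    typedef int64_t ktime_ns;
    #define KTIME_MAX INT64_MAX

    /* Saturating add: clamp to KTIME_MAX rather than overflow.
     * Assumes rhs >= 0, as for a relative timeout. */
    static ktime_ns ktime_add_safe_model(ktime_ns lhs, ktime_ns rhs)
    {
        if (lhs > KTIME_MAX - rhs)
            return KTIME_MAX;
        return lhs + rhs;
    }
    ```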

    Signed-off-by: Artem Bityutskiy
    Acked-by: Ingo Molnar

    Artem Bityutskiy
     

13 May, 2009

2 commits

  • * Arun R Bharadwaj [2009-04-16 12:11:36]:

    This patch migrates all non-pinned timers and hrtimers from all the idle
    CPUs to the current idle load balancer. Timers firing on busy CPUs
    are not migrated.

    While migrating hrtimers, care should be taken to check whether migrating
    an hrtimer would result in added latency. So we compare the expiry of the
    hrtimer with the next timer interrupt on the target CPU and migrate the
    hrtimer only if it expires *after* the next interrupt on the target CPU.
    To that end, a clockevents_get_next_event() helper function was added to
    return the next_event of the target CPU's clock_event_device.

    [ tglx: cleanups and simplifications ]
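    The migration decision reduces to comparing two expiry times (sketch;
    names invented):

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef int64_t ktime_ns;

    /* Migrate the hrtimer to the idle-load-balancer CPU only when it
     * expires after that CPU's next programmed clock event; otherwise it
     * would be delayed, since a remote clock event device cannot be
     * reprogrammed from here. */
    static bool can_migrate(ktime_ns expires, ktime_ns target_next_event)
    {
        return expires > target_next_event;
    }
    ```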

    Signed-off-by: Arun R Bharadwaj
    Signed-off-by: Thomas Gleixner

    Arun R Bharadwaj
     
  • * Arun R Bharadwaj [2009-04-16 12:11:36]:

    This patch creates a new framework for identifying cpu-pinned timers
    and hrtimers.

    This framework is needed because pinned timers are expected to fire on
    the same CPU on which they were queued. So it is essential to identify
    these and not migrate them, in case there are any.

    For regular timers, the already existing add_timer_on() can be used to
    queue pinned timers, and subsequently mod_timer_pinned() can be used
    to modify the 'expires' field.

    For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are
    added to queue cpu-pinned hrtimers.

    [ tglx: use .._PINNED mode argument instead of creating tons of new
    functions ]

    Signed-off-by: Arun R Bharadwaj
    Signed-off-by: Thomas Gleixner

    Arun R Bharadwaj
     

31 Mar, 2009

1 commit

  • It appears I inadvertently introduced rq->lock recursion to the
    hrtimer_start() path when I delegated running already expired
    timers to softirq context.

    This patch fixes it by introducing a __hrtimer_start_range_ns()
    method that will not use raise_softirq_irqoff() but
    __raise_softirq_irqoff(), which avoids the wakeup.

    It then also changes schedule() to check for pending softirqs and
    do the wakeup then. I'm not quite sure I like this last bit, nor
    am I convinced it's really needed.

    Signed-off-by: Peter Zijlstra
    Cc: Peter Zijlstra
    Cc: paulus@samba.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

31 Jan, 2009

3 commits

  • Impact: prevent false positive WARN_ON() in clockevents_program_event()

    clock_was_set() changes the base->offset of CLOCK_REALTIME and
    enforces the reprogramming of the clockevent device to expire timers
    which are based on CLOCK_REALTIME. If the clock change is large enough
    then the subtraction of the timer expiry value and base->offset can
    become negative which triggers the warning in
    clockevents_program_event().

    Check the subtraction result and set a negative value to 0.
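    The clamp can be sketched as follows (illustrative, not the exact
    kernel code):

    ```c
    #include <assert.h>
    #include <stdint.h>

    typedef int64_t ktime_ns;

    /* After clock_was_set() moves base->offset, expires - offset can go
     * negative for pending CLOCK_REALTIME timers; clamp to 0 so the
     * clockevents layer is never asked to program a time in the past. */
    static ktime_ns expiry_delta(ktime_ns expires, ktime_ns offset)
    {
        ktime_ns delta = expires - offset;
        return delta < 0 ? 0 : delta;
    }
    ```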

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Impact: fix CPU hotplug hang on Power6 testbox

    On architectures that support offlining all CPUs (at least
    powerpc/pseries), hot-unplugging the tick_do_timer_cpu can result in a
    system hang.

    This comes from the fact that if the CPU going down happens to be the
    CPU doing the tick, then as the tick_do_timer_cpu handover happens after
    the CPU is dead (via the CPU_DEAD notification), we're left without ticks,
    jiffies are frozen and any task relying on timers (msleep, ...) is stuck.
    That's particularly the case for the CPU looping in __cpu_die() waiting
    for the dying CPU to be dead.

    This patch addresses this by having the tick_do_timer_cpu handover happen
    earlier during the CPU_DYING notification. For this, a new clockevent
    notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered
    in hrtimer_cpu_notify().

    Signed-off-by: Sebastien Dugue
    Cc:
    Signed-off-by: Ingo Molnar

    Sebastien Dugue
     
  • Impact: avoid timer IRQ hanging slow systems

    While using the function graph tracer on a virtualized system, the
    hrtimer_interrupt can hang the system in an infinite loop.

    This can be caused in several situations:

    - the hardware is very slow and HZ is set too high

    - something intrusive is slowing the system down (tracing under emulation)

    ... and the next clock events to program are always before the current time.

    This patch implements a reasonable compromise: if such a situation is
    detected, we devote at most a quarter of the CPU's time to processing
    the hrtimer interrupts. This is enough to let the system run without
    serious starvation.

    It has been successfully tested under VirtualBox with 1000 HZ and 100 HZ
    with the function graph tracer launched. In both cases, the clock event
    intervals increased to about 25 ms periodic ticks, which means 40 Hz.

    So we change a hard-to-debug hang into a warning message and a system
    that still manages to limp along.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

27 Jan, 2009

1 commit


19 Jan, 2009

1 commit

  • Andrey Borzenkov reported this lockdep assert:

    > [17854.688347] =================================
    > [17854.688347] [ INFO: inconsistent lock state ]
    > [17854.688347] 2.6.29-rc2-1avb #1
    > [17854.688347] ---------------------------------
    > [17854.688347] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
    > [17854.688347] pm-suspend/18240 [HC0[0]:SC0[0]:HE1:SE1] takes:
    > [17854.688347] (&cpu_base->lock){++..}, at: [] retrigger_next_event+0x5c/0xa0
    > [17854.688347] {in-hardirq-W} state was registered at:
    > [17854.688347] [] __lock_acquire+0x79d/0x1930
    > [17854.688347] [] lock_acquire+0x5c/0x80
    > [17854.688347] [] _spin_lock+0x35/0x70
    > [17854.688347] [] hrtimer_run_queues+0x31/0x140
    > [17854.688347] [] run_local_timers+0x8/0x20
    > [17854.688347] [] update_process_times+0x23/0x60
    > [17854.688347] [] tick_periodic+0x24/0x80
    > [17854.688347] [] tick_handle_periodic+0x12/0x70
    > [17854.688347] [] timer_interrupt+0x14/0x20
    > [17854.688347] [] handle_IRQ_event+0x29/0x60
    > [17854.688347] [] handle_level_irq+0x69/0xe0
    > [17854.688347] [] 0xffffffff
    > [17854.688347] irq event stamp: 55771
    > [17854.688347] hardirqs last enabled at (55771): [] _spin_unlock_irqrestore+0x35/0x60
    > [17854.688347] hardirqs last disabled at (55770): [] _spin_lock_irqsave+0x19/0x80
    > [17854.688347] softirqs last enabled at (54836): [] __do_softirq+0xc4/0x110
    > [17854.688347] softirqs last disabled at (54831): [] do_softirq+0x8e/0xe0
    > [17854.688347]
    > [17854.688347] other info that might help us debug this:
    > [17854.688347] 3 locks held by pm-suspend/18240:
    > [17854.688347] #0: (&buffer->mutex){--..}, at: [] sysfs_write_file+0x25/0x100
    > [17854.688347] #1: (pm_mutex){--..}, at: [] enter_state+0x4f/0x140
    > [17854.688347] #2: (dpm_list_mtx){--..}, at: [] device_pm_lock+0xf/0x20
    > [17854.688347]
    > [17854.688347] stack backtrace:
    > [17854.688347] Pid: 18240, comm: pm-suspend Not tainted 2.6.29-rc2-1avb #1
    > [17854.688347] Call Trace:
    > [17854.688347] [] ? printk+0x18/0x20
    > [17854.688347] [] print_usage_bug+0x16c/0x1d0
    > [17854.688347] [] mark_lock+0x8bf/0xc90
    > [17854.688347] [] ? pit_next_event+0x2f/0x40
    > [17854.688347] [] __lock_acquire+0x580/0x1930
    > [17854.688347] [] ? _spin_unlock+0x1d/0x20
    > [17854.688347] [] ? pit_next_event+0x2f/0x40
    > [17854.688347] [] ? clockevents_program_event+0x98/0x160
    > [17854.688347] [] ? mark_held_locks+0x48/0x90
    > [17854.688347] [] ? _spin_unlock_irqrestore+0x35/0x60
    > [17854.688347] [] ? trace_hardirqs_on_caller+0x139/0x190
    > [17854.688347] [] ? trace_hardirqs_on+0xb/0x10
    > [17854.688347] [] lock_acquire+0x5c/0x80
    > [17854.688347] [] ? retrigger_next_event+0x5c/0xa0
    > [17854.688347] [] _spin_lock+0x35/0x70
    > [17854.688347] [] ? retrigger_next_event+0x5c/0xa0
    > [17854.688347] [] retrigger_next_event+0x5c/0xa0
    > [17854.688347] [] hres_timers_resume+0xa/0x10
    > [17854.688347] [] timekeeping_resume+0xee/0x150
    > [17854.688347] [] __sysdev_resume+0x14/0x50
    > [17854.688347] [] sysdev_resume+0x47/0x80
    > [17854.688347] [] device_power_up+0xb/0x20
    > [17854.688347] [] suspend_devices_and_enter+0xcf/0x150
    > [17854.688347] [] ? freeze_processes+0x3f/0x90
    > [17854.688347] [] enter_state+0xf4/0x140
    > [17854.688347] [] state_store+0x7d/0xc0
    > [17854.688347] [] ? state_store+0x0/0xc0
    > [17854.688347] [] kobj_attr_store+0x24/0x30
    > [17854.688347] [] sysfs_write_file+0x9c/0x100
    > [17854.688347] [] vfs_write+0x9c/0x160
    > [17854.688347] [] ? restore_nocheck_notrace+0x0/0xe
    > [17854.688347] [] ? sysfs_write_file+0x0/0x100
    > [17854.688347] [] sys_write+0x3d/0x70
    > [17854.688347] [] sysenter_do_call+0x12/0x31

    Andrey's analysis:

    > timekeeping_resume() is called via class ->resume
    > method; and according to comments in sysdev_resume() and
    > device_power_up(), they are called with interrupts disabled.
    >
    > Looking at suspend_enter, irqs *are* disabled at this point.
    >
    > So it actually looks like something (may be some driver)
    > unconditionally enabled irqs in resume path.

    Add a debug check to test this theory. If it triggers then it
    triggers because the resume code calls it with irqs enabled,
    which is a no-no not just for timekeeping_resume(), but also
    bad for a number of other resume handlers.

    Reported-by: Andrey Borzenkov
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

14 Jan, 2009

1 commit


05 Jan, 2009

5 commits

  • Impact: build fix on !CONFIG_HIGH_RES_TIMERS

    Fix:

    kernel/hrtimer.c:1586: error: implicit declaration of function '__hrtimer_peek_ahead_timers'

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Clean up the comments

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Impact: fix rare runtime deadlock

    There are a few sites that do:

    spin_lock_irq(&foo)
    hrtimer_start(&bar)
    __run_hrtimer(&bar)
    func()
    spin_lock(&foo)

    which obviously deadlocks. In order to avoid this, never call __run_hrtimer()
    from hrtimer_start*() context, but instead defer this to softirq context.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Impact: cleanup

    No need for a smp function call, which is likely to run on the same
    CPU anyway. We can just call hrtimers_peek_ahead() in the interrupts
    disabled section of migrate_hrtimers().

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Impact: cleanup

    kernel/hrtimer.c: In function 'hrtimer_cpu_notify':
    kernel/hrtimer.c:1574: warning: unused variable 'dcpu'

    Introduced by commit 37810659ea7d9572c5ac284ade272f806ef8f788
    ("hrtimer: removing all ur callback modes, fix hotplug") from the
    timers. dcpu is only used if CONFIG_HOTPLUG_CPU is set.

    Reported-by: Stephen Rothwell
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Thomas Gleixner