Doug / smarc-fsl-linux-kernel | Embedian Git Server

11 Nov, 2008

1 commit

ae99286b4 nohz: disable tick_nohz_kick_tick() for now

Impact: nohz powersavings and wakeup regression

commit fb02fbc14d17837b4b7b02dbb36142c16a7bf208 (NOHZ: restart tick
device from irq_enter()) causes a serious wakeup regression.

While the patch is correct it does not take into account that spurious
wakeups happen on x86. A fix for this issue is available, but we just
revert to the .27 behaviour and let long running softirqs screw
themselves.

Disable it for now.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-11-11 05:39:27 +0800

22 Oct, 2008

2 commits

268a3dcfe Merge branch 'timers/range-hrtimers' into v28-range-hrtimers-for-linus-v2

Conflicts:

kernel/time/tick-sched.c

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-10-22 15:48:06 +0800
c4bd822e7 NOHZ: fix thinko in the timer restart code path

commit fb02fbc14d17837b4b7b02dbb36142c16a7bf208 (NOHZ: restart tick
device from irq_enter())

solves the problem of stale jiffies when long running softirqs happen
in a long idle sleep period, but it has a major thinko in it:

When the interrupt which came in _is_ the timer interrupt which should
expire ts->sched_timer then we cancel and rearm the timer _before_ it
gets expired in hrtimer_interrupt() to the next period. That means the
callback function is not called. This game can go on forever :(

Prevent this by making sure to only rearm the timer when the expiry
time is more than one tick_period away. Otherwise keep it running as
it is either already expired or will expire at the right point to
update jiffies.

Signed-off-by: Thomas Gleixner
Tested-by: Venkatesh Pallipadi

Thomas Gleixner
2008-10-22 02:53:24 +0800
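
The rearm rule described in c4bd822e7 can be sketched in a few lines of C. This is only an illustration of the condition, not the kernel code: ns_t and should_rearm_tick() are hypothetical names, and times are plain nanosecond counts rather than ktime_t values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ktime_t: nanoseconds in a signed 64-bit value. */
typedef int64_t ns_t;

/*
 * Decide whether the tick timer should be cancelled and rearmed on
 * interrupt entry. It is rearmed only when the pending expiry is more
 * than one tick period away. Otherwise it is already expired or will
 * expire in time to update jiffies, so it is left running.
 */
static bool should_rearm_tick(ns_t now, ns_t expires, ns_t tick_period)
{
    assert(tick_period > 0);
    return expires - now > tick_period;
}
```

With a 1 ms tick period, a timer that expires 5 ms from now is rearmed, while one that is due within the next tick or is already overdue is left alone, so its callback still runs from the timer interrupt.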

20 Oct, 2008

4 commits

c465a76af Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus

Thomas Gleixner
2008-10-20 19:14:06 +0800
870e2a284 timer_list: add base address to clock base

The base address of a (per cpu) clock base is a useful debug info.
Add it and bump the version number of timer_lists.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-10-20 17:51:30 +0800
c5b77a3d3 timer_list: print cpu number of clockevents device

The per cpu clock events device output of timer_list lacks an
association of the device to the cpu which is annoying when looking at
the output of /proc/timer_list from a 128 way system.

Add the CPU number info and mark the broadcast device in the device
list printout.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-10-20 17:51:30 +0800
e67ef25a3 timer_list: print real timer address

The current timer_list output prints the address of the on stack copy
of the active hrtimer instead of the hrtimer itself.

Print the address of the real timer instead.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-10-20 17:51:30 +0800

18 Oct, 2008

4 commits

651dab426 Merge commit 'linus/master' into merge-linus

Conflicts:

arch/x86/kvm/i8254.c

Arjan van de Ven
2008-10-18 00:20:26 +0800
fb02fbc14 NOHZ: restart tick device from irq_enter()

We did not restart the tick device from irq_enter() to avoid double
reprogramming and extra events in the return immediate to idle case.

But long lasting softirqs can lead to a situation where jiffies become
stale:

idle()
  tick stopped (reprogrammed to next pending timer)
  halt()
    interrupt
      jiffies updated from irq_enter()
      interrupt handler
      softirq function 1 runs 20ms
      softirq function 2 arms a 10ms timer with a stale jiffies value
    jiffies updated from irq_exit()
    timer wheel has now an already expired timer
    (the one added in function 2)
    timer fires and timer softirq runs

This was discovered when debugging a timer problem which happened only
when the ath5k driver is active. The debugging proved that there is a
softirq function running for more than 20ms, which is a bug by itself.

To solve this we restart the tick timer right from irq_enter(), but do
not go through the other functions which are necessary to return from
idle when need_resched() is set.

Reported-by: Elias Oltmanns
Signed-off-by: Thomas Gleixner
Tested-by: Elias Oltmanns

Thomas Gleixner
2008-10-18 00:13:38 +0800
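
The stale jiffies trace in fb02fbc14 can be replayed with a small simulation. This is a sketch under stated assumptions: HZ is taken as 1000 so that one jiffy is one millisecond, and struct sim, update_jiffies() and timer_expired_at_irq_exit() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for the sketch: HZ = 1000, so one jiffy equals one millisecond. */
struct sim {
    unsigned long now_ms;   /* real elapsed time in ms */
    unsigned long jiffies;  /* value last published by the tick code */
};

/* Catch jiffies up with real time, as irq_enter()/irq_exit() do in the trace. */
static void update_jiffies(struct sim *s)
{
    s->jiffies = s->now_ms;
}

/*
 * Replay the trace: returns true when the timer armed by softirq
 * function 2 is already expired once irq_exit() updates jiffies.
 */
static bool timer_expired_at_irq_exit(unsigned long softirq1_ms,
                                      unsigned long timeout_ms)
{
    struct sim s = { .now_ms = 1000, .jiffies = 0 };
    unsigned long expires;

    assert(timeout_ms > 0);
    update_jiffies(&s);                 /* irq_enter() */
    s.now_ms += softirq1_ms;            /* softirq 1 runs, tick is stopped */
    expires = s.jiffies + timeout_ms;   /* softirq 2 arms with stale jiffies */
    update_jiffies(&s);                 /* irq_exit() */
    return (long)(s.jiffies - expires) >= 0;
}
```

With the numbers from the commit message (a 20 ms softirq followed by a 10 ms timeout) the function returns true: the timer is overdue the moment jiffies catches up. With a 1 ms softirq it returns false.
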
c34bec5a4 NOHZ: split tick_nohz_restart_sched_tick()

Split out the clock event device reprogramming. Preparatory
patch.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-10-18 00:13:38 +0800
719254faa NOHZ: unify the nohz function calls in irq_enter()

We have two separate nohz function calls in irq_enter() for no good
reason. Just call a single NOHZ function from irq_enter() and call
the bits in the tick code.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-10-18 00:13:38 +0800

17 Oct, 2008

2 commits

e533b2270 Merge branch 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  do_generic_file_read: s/EINTR/EIO/ if lock_page_killable() fails
  softirq, warning fix: correct a format to avoid a warning
  softirqs, debug: preemption check
  x86, pci-hotplug, calgary / rio: fix EBDA ioremap()
  IO resources, x86: ioremap sanity check to catch mapping requests exceeding, fix
  IO resources, x86: ioremap sanity check to catch mapping requests exceeding the BAR sizes
  softlockup: Documentation/sysctl/kernel.txt: fix softlockup_thresh description
  dmi scan: warn about too early calls to dmi_check_system()
  generic: redefine resource_size_t as phys_addr_t
  generic: make PFN_PHYS explicitly return phys_addr_t
  generic: add phys_addr_t for holding physical addresses
  softirq: allocate less vectors
  IO resources: fix/remove printk
  printk: robustify printk, update comment
  printk: robustify printk, fix #2
  printk: robustify printk, fix
  printk: robustify printk

Fixed up conflicts in:
arch/powerpc/include/asm/types.h
arch/powerpc/platforms/Kconfig.cputype
manually.

Linus Torvalds
2008-10-17 06:17:40 +0800
9ba16087d Kconfig: eliminate "def_bool n" constructs

Using "def_bool n" is pointless, simply using bool here appears more
appropriate.

Further, retaining such options that don't have a prompt and aren't
selected by anything seems also at least questionable.

Signed-off-by: Jan Beulich
Cc: Ingo Molnar
Cc: Tony Luck
Cc: Thomas Gleixner
Cc: Bartlomiej Zolnierkiewicz
Cc: Sam Ravnborg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Beulich
2008-10-17 02:21:31 +0800

15 Oct, 2008

1 commit

6b2ada821 Merge branches 'core/softlockup', 'core/softirq', 'core/resources', 'core/printk' and 'core/misc' into core-v28-for-linus

Ingo Molnar
2008-10-15 18:48:44 +0800

10 Oct, 2008

1 commit

8083e4ad9 [CPUFREQ][5/6] cpufreq: Changes to get_cpu_idle_time_us(), used by ondemand governor

Export get_cpu_idle_time_us() so that it can be used in the ondemand governor.
The last update time can be the current time when the CPU is currently non-idle,
accounting for the busy time since the last idle period.

Signed-off-by: Venkatesh Pallipadi
Signed-off-by: Dave Jones

venkatesh.pallipadi@intel.com
2008-10-10 01:52:44 +0800

04 Oct, 2008

1 commit

07454bfff clockevents: check broadcast tick device not the clock events device

Impact: jiffies increment too fast.

Hugh Dickins noted that with NOHZ=n and HIGHRES=n jiffies get
incremented too fast. The reason is a wrong check in the broadcast
enter/exit code, which keeps the local apic timer in periodic mode
when the switch happens.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-10-04 16:51:07 +0800

29 Sep, 2008

1 commit

ccc7dadf7 hrtimer: prevent migration of per CPU hrtimers

Impact: per CPU hrtimers can be migrated from a dead CPU

The hrtimer code has no knowledge about per CPU timers, but we need to
prevent the migration of such timers and warn when such a timer is
active at migration time.

Explicitly mark the timers as per CPU and use a more understandable
mode descriptor for the interrupt safe unlocked callback mode, which
is used by hrtimer_sleeper and the scheduler code.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-29 23:09:14 +0800

24 Sep, 2008

3 commits

d40e944c2 ntp: improve adjtimex frequency rounding

Change PPM_SCALE_INV_SHIFT so that it doesn't throw away any input bits
(19 is the number of factors of 2 in PPM_SCALE). The output frequency
can then be calculated back to its input value, as the inverse divide
produces a slightly larger value, which is then correctly rounded by the
final shift.

Reported-by: Martin Ziegler
Signed-off-by: Roman Zippel
Cc: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Roman Zippel
2008-09-24 23:33:13 +0800
5cd1c9c5c timekeeping: fix rounding problem during clock update

Due to a rounding problem during a clock update it's possible for readers
to observe the clock jumping back by 1nsec. The following simplified
example demonstrates the problem:

cycle      xtime
    0          0
 1000   999999.6
 2000  1999999.2
 3000  2999998.8
...

1500 = 1499999.4
     =       0.0 + 1499999.4
     =  999999.6 +  499999.8

When reading the clock only the full nanosecond part is used, while
timekeeping internally keeps nanosecond fractions. If the clock is now
updated at cycle 1500 here, a nanosecond is missing due to the truncation.

The simple fix is to round up the xtime value during the update. This also
changes the distance to the reference time, but the adjustment will
automatically take care that it stays under control.

Signed-off-by: Roman Zippel
Signed-off-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Roman Zippel
2008-09-24 23:33:13 +0800
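
The arithmetic in 5cd1c9c5c can be checked with integer fixed-point math. The sketch below assumes 999.9996 ns per cycle, as implied by the table in the message, and uses hypothetical helper names. It only reproduces the one-nanosecond step and the effect of rounding the base up. It does not model the kernel's follow-up adjustment of the reference time.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC 10000     /* fixed point: 1 ns == 10000 units */
#define RATE 9999996   /* 999.9996 ns per cycle, matching the table */

/* Exact time (in fixed-point units) accumulated up to a given cycle. */
static int64_t xtime_fp(int64_t cycle)
{
    return cycle * RATE;
}

/* Reader: whole-ns base plus the truncated interval since the base. */
static int64_t read_clock(int64_t base_ns, int64_t base_cycle, int64_t cycle)
{
    assert(cycle >= base_cycle);
    return base_ns + (cycle - base_cycle) * RATE / FRAC;
}

/* Old update: readers see only the truncated nanosecond part. */
static int64_t base_truncated(int64_t cycle)
{
    return xtime_fp(cycle) / FRAC;
}

/* Fixed update: round the base up to the next full nanosecond. */
static int64_t base_rounded_up(int64_t cycle)
{
    return (xtime_fp(cycle) + FRAC - 1) / FRAC;
}
```

Reading cycle 1500 against the base at cycle 0 gives 1499999 ns. After an update at cycle 1000 with a truncated base the same read gives 1499998 ns, which is the backward step. With the rounded-up base it gives 1499999 ns again.
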
eb3f938fd ntp: let update_persistent_clock() sleep

This change makes the 11-minute RTC update run in process context.
This is so that update_persistent_clock() can sleep, which may
be required for certain types of RTC hardware -- most notably I2C devices.

Signed-off-by: Maciej W. Rozycki
Cc: Roman Zippel
Cc: Rik van Riel
Cc: David Brownell
Acked-by: Alessandro Zummo
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Maciej W. Rozycki
2008-09-24 23:33:12 +0800

23 Sep, 2008

5 commits

f8e256c68 timers: fix build error in !oneshot case

kernel/time/tick-common.c: In function ‘tick_setup_periodic’:
kernel/time/tick-common.c:113: error: implicit declaration of function ‘tick_broadcast_oneshot_active’

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-09-23 18:57:00 +0800
27ce4cb4a clockevents: prevent mode mismatch on cpu online

Impact: timer hang on CPU online observed on AMD C1E systems

When a CPU is brought online, the broadcast machinery can
be in the one shot state already. Check this and set up the timer
device of the new CPU in one shot mode so the broadcast code
can pick up the next_event value correctly.

Another AMD C1E oddity, as we switch to broadcast immediately and
not after the full bring up via the ACPI cpu idle code.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-23 17:38:53 +0800
302745699 clockevents: check broadcast device not tick device

Impact: Possible hang on CPU online observed on AMD C1E machines.

The broadcast setup code looks at the mode of the tick device to
determine whether it needs to be shut down or set up. This is wrong
when the broadcast mode is set to one shot already. This can happen
when a CPU is brought online as it goes through the periodic setup
first.

The problem went unnoticed as sane systems do not call into that code
before the switch to one shot for the clock event device happens.
The AMD C1E idle routine switches over immediately and thereby shuts
down the just setup device before the first interrupt happens.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-23 17:38:53 +0800
49d670fb8 clockevents: prevent stale tick_next_period for onlining CPUs

Impact: possible hang on CPU onlining in timer one shot mode.

The tick_next_period variable is only used during boot on nohz/highres
enabled systems, but for CPU onlining it needs to be maintained when
the per cpu clock events device operates in one shot mode.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-23 17:38:53 +0800
6441402b1 clockevents: prevent cpu online to interfere with nohz

Impact: rare hang which can be triggered on CPU online.

tick_do_timer_cpu keeps track of the CPU which updates jiffies
via do_timer. The value -1 is used to signal that currently no
CPU is doing this. There are two cases where the variable can
have this state:

boot:
    necessary for systems where the boot cpu id can be != 0

nohz long idle sleep:
    When the CPU which did the jiffies update last goes into
    a long idle sleep it drops the update jiffies duty so
    another CPU which is not idle can pick it up and keep
    jiffies going.

Using the same value for both situations is wrong, as the CPU online
code can see the -1 state when the timer of the newly onlined CPU is
set up. The setup for a newly onlined CPU goes through periodic mode
and can pick up the do_timer duty without being aware of the nohz /
highres mode of the already running system.

Use two separate states and make them constants to avoid magic
number confusion.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-23 17:38:52 +0800
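
The two-state idea from 6441402b1 can be illustrated as follows. The constant names and values, and both helper functions, are assumptions made for this sketch. The commit message only says that two separate constants replace the single -1 value.

```c
#include <assert.h>
#include <stdbool.h>

/* Two distinct "no CPU" states. Names and values are assumptions of this sketch. */
enum {
    TICK_DO_TIMER_NONE = -1,  /* duty dropped for a nohz long idle sleep */
    TICK_DO_TIMER_BOOT = -2   /* nobody has taken the duty yet (early boot) */
};

/*
 * Periodic setup of a CPU's tick device. The do_timer duty is picked
 * up only in the boot state, so a CPU that comes online while the
 * duty is merely parked by a nohz sleep does not grab it.
 */
static bool periodic_setup_takes_duty(int *do_timer_cpu, int cpu)
{
    assert(cpu >= 0);
    if (*do_timer_cpu != TICK_DO_TIMER_BOOT)
        return false;
    *do_timer_cpu = cpu;
    return true;
}

/* The CPU holding the duty drops it before a long idle sleep. */
static void enter_long_idle(int *do_timer_cpu, int cpu)
{
    if (*do_timer_cpu == cpu)
        *do_timer_cpu = TICK_DO_TIMER_NONE;
}
```

In this model the boot CPU takes the duty once, a later long idle sleep parks it as NONE, and a CPU onlined at that point leaves it parked for the nohz code to hand out.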

17 Sep, 2008

1 commit

2344abbcb clockevents: make device shutdown robust

The device shutdown does not clean up the next_event variable of the
clock event device. So when the device is reactivated, the possibly
stale next_event value can prevent the device from being reprogrammed,
as it claims to be waiting for an event already.

This is the root cause of the resurfacing suspend/resume problem,
where systems need a key press to come back to life.

Fix this by setting next_event to KTIME_MAX when the device is shut
down. Use a separate function for shutdown which takes care of that
and only keep the direct set mode call in the broadcast code, where we
cannot touch the next_event value.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-17 04:47:02 +0800
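
A toy model of the stale next_event problem from 2344abbcb is shown below. The caller-side check in program_if_needed() is a hypothetical simplification of how a pending event can suppress reprogramming. Only the reset of next_event to KTIME_MAX on shutdown comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KTIME_MAX INT64_MAX  /* stand-in for the kernel constant of the same name */

struct ce_dev {
    int64_t next_event;  /* expiry the device claims to be waiting for */
    int programmed;      /* how often the hardware was actually programmed */
};

/* Hypothetical caller: skip programming when an earlier or equal event is pending. */
static bool program_if_needed(struct ce_dev *d, int64_t expires)
{
    assert(expires >= 0);
    if (d->next_event <= expires)
        return false;
    d->next_event = expires;
    d->programmed++;
    return true;
}

/* Shutdown with and without the fix of resetting next_event. */
static void shutdown_dev(struct ce_dev *d, bool reset_next_event)
{
    if (reset_next_event)
        d->next_event = KTIME_MAX;
}
```

Without the reset, a device that was shut down while waiting for time 100 refuses to be programmed for time 500 after reactivation. With the reset, the same request goes through.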

10 Sep, 2008

1 commit

61c22c34c clockevents: remove WARN_ON which was used to gather information

The issue of the endless reprogramming loop due to a too small
min_delta_ns was fixed with the previous updates of the clock events
code, but we had no information about the spread of this problem. I
added a WARN_ON to get automated information via kerneloops.org and to
get some direct reports, which allowed me to analyse the affected
machines.

The WARN_ON has served its purpose and would be annoying for a release
kernel. Remove it and just keep the information about the increase of
the min_delta_ns value.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-10 04:20:01 +0800

08 Sep, 2008

1 commit

704af52bd hrtimer: show the timer ranges in /proc/timer_list

To help debugging and improve the visibility of timer ranges, show them
in the existing timer list in /proc/timer_list.

Signed-off-by: Arjan van de Ven

Arjan van de Ven
2008-09-08 07:10:20 +0800

07 Sep, 2008

1 commit

f53252256 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource, acpi_pm.c: check for monotonicity
  clocksource, acpi_pm.c: use proper read function also in errata mode
  ntp: fix calculation of the next jiffie to trigger RTC sync
  x86: HPET: read back compare register before reading counter
  x86: HPET fix moronic 32/64bit thinko
  clockevents: broadcast fixup possible waiters
  HPET: make minimum reprogramming delta useful
  clockevents: prevent endless loop lockup
  clockevents: prevent multiple init/shutdown
  clockevents: enforce reprogram in oneshot setup
  clockevents: prevent endless loop in periodic broadcast handler
  clockevents: prevent clockevent event_handler ending up handler_noop

Linus Torvalds
2008-09-07 10:33:26 +0800

06 Sep, 2008

5 commits

4ff4b9e19 ntp: fix calculation of the next jiffie to trigger RTC sync

We have a bug in the calculation of the next jiffie to trigger the RTC
synchronisation. The aim here is to run sync_cmos_clock() as close as
possible to the middle of a second. Which means we want this function to
be called less than or equal to half a jiffie away from when now.tv_nsec
equals 5e8 (500000000).

If this is not the case for a given call to the function, for this purpose
instead of updating the RTC we calculate the offset in nanoseconds to the
next point in time where now.tv_nsec will be equal 5e8. The calculated
offset is then converted to jiffies as these are the unit used by the
timer.

However, timespec_to_jiffies() used here uses a ceil()-type rounding mode,
where the resulting value is rounded up. As a result the range of
now.tv_nsec when the timer will trigger is from 5e8 to 5e8 + TICK_NSEC
rather than the desired 5e8 - TICK_NSEC / 2 to 5e8 + TICK_NSEC / 2.

As a result, if for example sync_cmos_clock() happens to be called at a
time when now.tv_nsec is between 5e8 + TICK_NSEC / 2 and 5e8 +
TICK_NSEC, it will simply be rescheduled HZ jiffies later, falling in the
same range of now.tv_nsec again. The same holds for cases offset by an
integer multiple of TICK_NSEC.

This change addresses the problem by subtracting TICK_NSEC / 2 from the
nanosecond offset to the next point in time where now.tv_nsec will be
equal 5e8, effectively shifting the following rounding in
timespec_to_jiffies() so that it produces a rounded-to-nearest result.

Signed-off-by: Maciej W. Rozycki
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Maciej W. Rozycki
2008-09-06 21:31:48 +0800
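
The rounding argument in 4ff4b9e19 can be checked numerically. The sketch assumes HZ = 100 with an idealised TICK_NSEC of exactly 10 ms, and assumes the timer fires exactly on the computed jiffy. next_fire_nsec() is a hypothetical helper that models only the offset calculation and the ceil()-type conversion to jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
#define HZ           100                  /* assumption for the sketch */
#define TICK_NSEC    (NSEC_PER_SEC / HZ)  /* idealised: exactly 10 ms */

/*
 * Given now.tv_nsec, return the tv_nsec value at which the rescheduled
 * timer fires. With subtract_half_tick set, TICK_NSEC / 2 is taken off
 * the offset first, so the ceil() below acts as round-to-nearest.
 */
static int64_t next_fire_nsec(int64_t now_nsec, bool subtract_half_tick)
{
    int64_t offset, ticks;

    assert(now_nsec >= 0 && now_nsec < NSEC_PER_SEC);
    offset = NSEC_PER_SEC / 2 - now_nsec;
    if (subtract_half_tick)
        offset -= TICK_NSEC / 2;
    if (offset <= 0)
        offset += NSEC_PER_SEC;
    /* ceil()-type conversion to jiffies, as described for timespec_to_jiffies() */
    ticks = (offset + TICK_NSEC - 1) / TICK_NSEC;
    return (now_nsec + ticks * TICK_NSEC) % NSEC_PER_SEC;
}
```

Starting at now.tv_nsec = 507000000 (7 ms past the half second), the old calculation lands on 507000000 again, so the call never gets within half a tick of 5e8. With the half-tick subtraction it lands on 497000000, which is inside the desired window.
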
77dd3b3bd Merge branch 'linus' into timers/ntp

Ingo Molnar
2008-09-06 21:31:03 +0800
7300711e8 clockevents: broadcast fixup possible waiters

Until the C1E patches arrived there were no users of periodic broadcast
before switching to oneshot mode. Now we need to trigger a possible
waiter for a periodic broadcast when switching to oneshot mode.
Otherwise we can starve them forever.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-06 13:21:17 +0800
cc584b213 hrtimer: convert kernel/* to the new hrtimer apis

In order to be able to do range hrtimers we need to use accessor functions
to the "expire" member of the hrtimer struct.
This patch converts kernel/* to these accessors.

Signed-off-by: Arjan van de Ven

Arjan van de Ven
2008-09-06 12:35:13 +0800
56c7426b3 sched_clock: fix NOHZ interaction

If HLT stops the TSC, we'll fail to account idle time, thereby inflating the
actual process times. Fix this by re-calibrating the clock against GTOD when
leaving nohz mode.

Signed-off-by: Peter Zijlstra
Tested-by: Avi Kivity
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-09-06 00:14:08 +0800

05 Sep, 2008

5 commits

1fb9b7d29 clockevents: prevent endless loop lockup

The C1E/HPET bug reports on AMDX2/RS690 systems were tracked down to a
too small value of the HPET minimum delta for programming an event.

The clockevents code needs to enforce an interrupt event on the clock event
device in some cases. The enforcement code was stupid and naive, as it just
added the minimum delta to the current time and tried to reprogram the device.
When the minimum delta is too small, then this loops forever.

Add a sanity check. Allow reprogramming to fail 3 times, then print a warning
and double the minimum delta value to make sure that this does not happen again.
Use the same function for both tick-oneshot and tick-broadcast code.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2008-09-05 17:11:53 +0800
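
The sanity check described in 1fb9b7d29 can be sketched as a bounded retry loop. struct sim_dev, try_program() and force_program() are hypothetical. The hardware is modelled as rejecting any delta below a fixed latency, and the warning printout is left out.

```c
#include <assert.h>
#include <stdint.h>

struct sim_dev {
    uint64_t min_delta_ns;   /* smallest delta the code will request */
    uint64_t hw_latency_ns;  /* assumed: shorter deltas have passed before programming ends */
};

/* Model of one programming attempt: 0 on success, -1 if the event is already in the past. */
static int try_program(const struct sim_dev *d, uint64_t delta_ns)
{
    return delta_ns >= d->hw_latency_ns ? 0 : -1;
}

/*
 * Enforce an event min_delta_ns from now. After three failed attempts
 * the minimum delta is doubled, so the loop cannot spin forever on a
 * too small value. Returns the number of attempts that were needed.
 */
static unsigned force_program(struct sim_dev *d)
{
    unsigned attempts = 0, failures = 0;

    assert(d->min_delta_ns > 0);
    for (;;) {
        attempts++;
        if (try_program(d, d->min_delta_ns) == 0)
            return attempts;
        if (++failures >= 3) {
            d->min_delta_ns *= 2;
            failures = 0;
        }
    }
}
```

Starting from a 1000 ns minimum delta on hardware that needs 5000 ns, the loop fails nine times, raises the minimum delta to 8000 ns and succeeds on the tenth attempt.
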
9c17bcda9 clockevents: prevent multiple init/shutdown

While chasing the C1E/HPET bug reports I went through the clock events
code inch by inch and found that the broadcast device can be initialized
and shut down multiple times. Multiple shutdowns are not critical, but
a useless waste of time. Multiple initializations are simply broken. Another
CPU might have the device in use already after the first initialization and
the second init could just render it unusable again.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2008-09-05 17:11:52 +0800
7205656ab clockevents: enforce reprogram in oneshot setup

In tick_setup_oneshot() we program the device to the given next_event,
but we do not check the return value. We need to make sure that
programming the device is enforced so the interrupt handler engine starts
working. Split out the reprogramming function from tick_program_event()
and call it with the device which was handed in to tick_setup_oneshot().
Set the force argument, so the device fires an interrupt.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2008-09-05 17:11:52 +0800
d4496b395 clockevents: prevent endless loop in periodic broadcast handler

The reprogramming of the periodic broadcast handler was broken
when the first programming returned -ETIME. The clockevents code
stores the new expiry value in the clock events device next_event field
only when the programming time has not elapsed yet. The loop in
question calculates the new expiry value from the next_event value,
which therefore never increases.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2008-09-05 17:11:51 +0800
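
The loop problem from d4496b395 can be reproduced in a small model. struct bc_dev, program_event() and rearm_periodic() are hypothetical, -1 stands in for -ETIME, and an iteration cap is added so that the broken variant terminates in the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct bc_dev {
    int64_t next_event;  /* stored only when programming succeeds */
};

/* Returns 0 on success, -1 (standing in for -ETIME) if expires has already passed. */
static int program_event(struct bc_dev *d, int64_t expires, int64_t now)
{
    if (expires <= now)
        return -1;
    d->next_event = expires;
    return 0;
}

/*
 * Rearm the periodic event. With advance_local set, the expiry is
 * carried in a local variable and moves forward on every failure.
 * Without it, the expiry is recomputed from dev->next_event, which
 * never changes on failure. Returns the iterations used, or -1 when
 * the cap is reached.
 */
static int rearm_periodic(struct bc_dev *d, int64_t now, int64_t period,
                          bool advance_local, int max_iter)
{
    int64_t next = d->next_event;
    int i;

    assert(period > 0);
    for (i = 1; i <= max_iter; i++) {
        next = (advance_local ? next : d->next_event) + period;
        if (program_event(d, next, now) == 0)
            return i;
    }
    return -1;
}
```

With next_event at 0, a period of 1000 and the current time at 2500, the local-variable version succeeds on the third iteration with an expiry of 3000. The version that always restarts from next_event keeps requesting 1000 and never gets out.
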
7c1e76897 clockevents: prevent clockevent event_handler ending up handler_noop

There is an ordering related problem with the clockevents code, due to which
clockevents_register_device() called after the tickless/highres switch
will not work. The new clockevent ends up with clockevents_handle_noop as
event handler, resulting in no timer activity.

The problematic path seems to be

* old device already has hrtimer_interrupt as the event_handler
* new clockevent device registers with a higher rating
* tick_check_new_device() is called
* clockevents_exchange_device() gets called
* old->event_handler is set to clockevents_handle_noop
* tick_setup_device() is called for the new device
* which sets new->event_handler using the old->event_handler which is noop.

Change the ordering so that new device inherits the proper handler.

This is not an issue in the normal case, as most likely all the clockevent
devices are set up before the highres switch. But it can potentially affect
some corner cases where HPET force detect happens after the highres switch.
This was a problem with the HPET in MSI mode code that we have been
experimenting with.

Signed-off-by: Venkatesh Pallipadi
Signed-off-by: Shaohua Li
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Venkatesh Pallipadi
2008-09-05 17:11:51 +0800
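
The ordering problem listed in 7c1e76897 comes down to reading the old handler after it has been overwritten. The sketch below uses hypothetical types and function names that loosely follow the steps in the message. It is not the kernel's clockevents code.

```c
#include <assert.h>

typedef void (*handler_t)(void);

static int tick_count;

static void handle_noop(void) { }
static void tick_handler(void) { tick_count++; }  /* stands in for hrtimer_interrupt */

struct ce_dev {
    handler_t event_handler;
};

/* Releasing the old device parks its handler on the noop function. */
static void exchange_device(struct ce_dev *old)
{
    old->event_handler = handle_noop;
}

/* Problematic order: the handler is copied after the exchange, so it is noop. */
static void switch_broken(struct ce_dev *old, struct ce_dev *newdev)
{
    assert(old && newdev);
    exchange_device(old);
    newdev->event_handler = old->event_handler;
}

/* Changed order: remember the handler first, then exchange, then install it. */
static void switch_fixed(struct ce_dev *old, struct ce_dev *newdev)
{
    handler_t saved;

    assert(old && newdev);
    saved = old->event_handler;
    exchange_device(old);
    newdev->event_handler = saved;
}
```

In the first variant the new device ends up with the noop handler and no timer activity follows. In the second it inherits the real handler while the old device is still parked on noop.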

22 Aug, 2008

1 commit

916c7a855 ntp: fix ADJ_OFFSET_SS_READ bug and do_adjtimex() cleanup

Thanks to the review by Michael Kerrisk, a bug in the recent
ADJ_OFFSET_SS_READ option was discovered, where the ntp time_offset was
inadvertently set by it. Fix this by making the adjtime code
more separate from the ntp_adjtime code (both of which really want to
be separate syscalls).

Signed-off-by: Roman Zippel
Signed-off-by: Andrew Morton
Acked-by: John Stultz
Signed-off-by: Ingo Molnar

Roman Zippel
2008-08-22 12:40:18 +0800