Eric Lee / smarc-fsl-linux-kernel

07 Oct, 2008

1 commit

cc1e0f4f7 kgdb: call touch_softlockup_watchdog on resume ... Browse Code »

The softlockup watchdog needs to be touched when resuming the from the
kgdb stopped state to avoid the printk that a CPU is stuck if the
debugger was active for longer than the softlockup threshold.

Signed-off-by: Jason Wessel

Jason Wessel
2008-10-07 02:50:59 +0800

04 Oct, 2008

1 commit

07454bfff clockevents: check broadcast tick device not the clock events device ... Browse Code »

Impact: jiffies increment too fast.

Hugh Dickins noted that with NOHZ=n and HIGHRES=n jiffies get
incremented too fast. The reason is a wrong check in the broadcast
enter/exit code, which keeps the local apic timer in periodic mode
when the switch happens.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-10-04 16:51:07 +0800

03 Oct, 2008

1 commit

aa94fbd5c fix error-path NULL deref in alloc_posix_timer() ... Browse Code »

Found by static checker (http://repo.or.cz/w/smatch.git).

Signed-off-by: Dan Carpenter
Acked-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Carpenter
2008-10-03 06:53:13 +0800

30 Sep, 2008

1 commit

cf4b0b2c9 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kern… ... Browse Code »

…el/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
hrtimer: prevent migration of per CPU hrtimers
hrtimer: mark migration state
hrtimer: fix migration of CB_IRQSAFE_NO_SOFTIRQ hrtimers
hrtimer: migrate pending list on cpu offline

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Linus Torvalds
2008-09-30 23:39:28 +0800

29 Sep, 2008

5 commits

31a78f23b mm owner: fix race between swapoff and exit ... Browse Code »

There's a race between mm->owner assignment and swapoff, more easily
seen when task slab poisoning is turned on. The condition occurs when
try_to_unuse() runs in parallel with an exiting task. A similar race
can occur with callers of get_task_mm(), such as /proc//
or ptrace or page migration.

CPU0 CPU1
try_to_unuse
looks at mm = task0->mm
increments mm->mm_users
task 0 exits
mm->owner needs to be updated, but no
new owner is found (mm_users > 1, but
no other task has task->mm = task0->mm)
mm_update_next_owner() leaves
mmput(mm) decrements mm->mm_users
task0 freed
dereferencing mm->owner fails

The fix is to notify the subsystem via mm_owner_changed callback(),
if no new owner is found, by specifying the new task as NULL.

Jiri Slaby:
mm->owner was set to NULL prior to calling cgroup_mm_owner_callbacks(), but
must be set after that, so as not to pass NULL as old owner causing oops.

Daisuke Nishimura:
mm_update_next_owner() may set mm->owner to NULL, but mem_cgroup_from_task()
and its callers need to take account of this situation to avoid oops.

Hugh Dickins:
Lockdep warning and hang below exec_mmap() when testing these patches.
exit_mm() up_reads mmap_sem before calling mm_update_next_owner(),
so exec_mmap() now needs to do the same. And with that repositioning,
there's now no point in mm_need_new_owner() allowing for NULL mm.

Reported-by: Hugh Dickins
Signed-off-by: Balbir Singh
Signed-off-by: Jiri Slaby
Signed-off-by: Daisuke Nishimura
Signed-off-by: Hugh Dickins
Cc: KAMEZAWA Hiroyuki
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Balbir Singh
2008-09-29 23:41:47 +0800
ccc7dadf7 hrtimer: prevent migration of per CPU hrtimers ... Browse Code »

Impact: per CPU hrtimers can be migrated from a dead CPU

The hrtimer code has no knowledge about per CPU timers, but we need to
prevent the migration of such timers and warn when such a timer is
active at migration time.

Explicitely mark the timers as per CPU and use a more understandable
mode descriptor for the interrupts safe unlocked callback mode, which
is used by hrtimer_sleeper and the scheduler code.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-29 23:09:14 +0800
b00c1a99e hrtimer: mark migration state ... Browse Code »

Impact: during migration active hrtimers can be seen as inactive

The migration code removes the hrtimers from the queues of the dead
CPU and sets the state temporary to INACTIVE. The enqueue code sets it
to ACTIVE/PENDING again.

Prevent that the wrong state can be seen by using a separate migration
state bit.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-29 23:09:14 +0800
41e1022ea hrtimer: fix migration of CB_IRQSAFE_NO_SOFTIRQ hrtimers ... Browse Code »

Impact: Stale timers after a CPU went offline.

commit 37bb6cb4097e29ffee970065b74499cbf10603a3
hrtimer: unlock hrtimer_wakeup

changed the hrtimer sleeper callback mode to CB_IRQSAFE_NO_SOFTIRQ due
to locking problems. A result of this change is that when enqueue is
called for an already expired hrtimer the callback function is not
longer called directly from the enqueue code. The normal callers have
been fixed in the code, but the migration code which moves hrtimers
from a dead CPU to a live CPU was not made aware of this.

This can be fixed by checking the timer state after the call to
enqueue in the migration code.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-29 23:09:14 +0800
7659e3496 hrtimer: migrate pending list on cpu offline ... Browse Code »

Impact: hrtimers which are on the pending list are not migrated at cpu
offline and can be stale forever

Add the pending list migration when CONFIG_HIGH_RES_TIMERS is enabled

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-29 23:09:13 +0800

26 Sep, 2008

2 commits

d7161a653 kgdb, x86, arm, mips, powerpc: ignore user space single stepping ... Browse Code »

On the x86 arch, user space single step exceptions should be ignored
if they occur in the kernel space, such as ptrace stepping through a
system call.

First check if it is kgdb that is executing a single step, then ensure
it is not an accidental traversal into the user space, while in kgdb,
any other time the TIF_SINGLESTEP is set, kgdb should ignore the
exception.

On x86, arm, mips and powerpc, the kgdb_contthread usage was
inconsistent with the way single stepping is implemented in the kgdb
core. The arch specific stub should always set the
kgdb_cpu_doing_single_step correctly if it is single stepping. This
allows kgdb to correctly process an instruction steps if ptrace
happens to be requesting an instruction step over a system call.

Signed-off-by: Jason Wessel

Jason Wessel
2008-09-26 23:36:41 +0800
18d6522b8 kgdb: could not write to the last of valid memory with kgdb ... Browse Code »

On the ARM architecture, kgdb will crash the kernel if the last byte
of valid memory is written due to a flush_icache_range flushing
beyond the memory boundary.

Signed-off-by: Atsuo Igarashi
Signed-off-by: Jason Wessel

Atsuo Igarashi
2008-09-26 23:36:41 +0800

24 Sep, 2008

2 commits

8553f321e Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kern… ... Browse Code »

…el/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers: fix build error in !oneshot case
x86: c1e_idle: don't mark TSC unstable if CPU has invariant TSC
x86: prevent C-states hang on AMD C1E enabled machines
clockevents: prevent mode mismatch on cpu online
clockevents: check broadcast device not tick device
clockevents: prevent stale tick_next_period for onlining CPUs
x86: prevent stale state of c1e_mask across CPU offline/online
clockevents: prevent cpu online to interfere with nohz

Linus Torvalds
2008-09-24 05:57:36 +0800
be3be8905 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: fix init_hrtick() section mismatch warning

Linus Torvalds
2008-09-24 05:57:22 +0800

23 Sep, 2008

7 commits

f9092f358 kexec: fix segmentation fault in kimage_add_entry ... Browse Code »

A segmentation fault can occur in kimage_add_entry in kexec.c when loading
a kernel image into memory. The fault occurs because a page is requested
by calling kimage_alloc_page with gfp_mask GFP_KERNEL and the function may
actually return a page with gfp_mask GFP_HIGHUSER. The high mem page is
returned because it was swapped with the kernel page due to the kernel
page being a page that will shortly be copied to.

This patch ensures that kimage_alloc_page returns a page that was created
with the correct gfp flags.

I have verified the change and fixed the whitespace damage of the original
patch. Jonathan did a great job of tracking this down after he hit the
problem. -- Eric

Signed-off-by: Jonathan Steel
Signed-off-by: Eric W. Biederman
Acked-by: Simon Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jonathan Steel
2008-09-23 23:09:14 +0800
f8e256c68 timers: fix build error in !oneshot case ... Browse Code »

kernel/time/tick-common.c: In function ‘tick_setup_periodic’:
kernel/time/tick-common.c:113: error: implicit declaration of function ‘tick_broadcast_oneshot_active’

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-09-23 18:57:00 +0800
27ce4cb4a clockevents: prevent mode mismatch on cpu online ... Browse Code »

Impact: timer hang on CPU online observed on AMD C1E systems

When a CPU is brought online then the broadcast machinery can
be in the one shot state already. Check this and setup the timer
device of the new CPU in one shot mode so the broadcast code
can pick up the next_event value correctly.

Another AMD C1E oddity, as we switch to broadcast immediately and
not after the full bring up via the ACPI cpu idle code.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-23 17:38:53 +0800
302745699 clockevents: check broadcast device not tick device ... Browse Code »

Impact: Possible hang on CPU online observed on AMD C1E machines.

The broadcast setup code looks at the mode of the tick device to
determine whether it needs to be shut down or setup. This is wrong
when the broadcast mode is set to one shot already. This can happen
when a CPU is brought online as it goes through the periodic setup
first.

The problem went unnoticed as sane systems do not call into that code
before the switch to one shot for the clock event device happens.
The AMD C1E idle routine switches over immediately and thereby shuts
down the just setup device before the first interrupt happens.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-23 17:38:53 +0800
49d670fb8 clockevents: prevent stale tick_next_period for onlining CPUs ... Browse Code »

Impact: possible hang on CPU onlining in timer one shot mode.

The tick_next_period variable is only used during boot on nohz/highres
enabled systems, but for CPU onlining it needs to be maintained when
the per cpu clock events device operates in one shot mode.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-23 17:38:53 +0800
6441402b1 clockevents: prevent cpu online to interfere with nohz ... Browse Code »

Impact: rare hang which can be triggered on CPU online.

tick_do_timer_cpu keeps track of the CPU which updates jiffies
via do_timer. The value -1 is used to signal, that currently no
CPU is doing this. There are two cases, where the variable can
have this state:

boot:
necessary for systems where the boot cpu id can be != 0

nohz long idle sleep:
When the CPU which did the jiffies update last goes into
a long idle sleep it drops the update jiffies duty so
another CPU which is not idle can pick it up and keep
jiffies going.

Using the same value for both situations is wrong, as the CPU online
code can see the -1 state when the timer of the newly onlined CPU is
setup. The setup for a newly onlined CPU goes through periodic mode
and can pick up the do_timer duty without being aware of the nohz /
highres mode of the already running system.

Use two separate states and make them constants to avoid magic
numbers confusion.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-23 17:38:52 +0800
fa7482031 sched: fix init_hrtick() section mismatch warning ... Browse Code »

LD kernel/built-in.o
WARNING: kernel/built-in.o(.text+0x326): Section mismatch in reference
from the function init_hrtick() to the variable
.cpuinit.data:hotplug_hrtick_nb.8
The function init_hrtick() references
the variable __cpuinitdata hotplug_hrtick_nb.8.
This is often because init_hrtick lacks a __cpuinitdata
annotation or the annotation of hotplug_hrtick_nb.8 is wrong.

Signed-off-by: Md.Rakib H. Mullick
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Rakib Mullick
2008-09-23 17:02:13 +0800

20 Sep, 2008

2 commits

b4df9d88a Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: fix deadlock in setting scheduler parameter to zero
sched: fix 2.6.27-rc5 couldn't boot on tulsa machine randomly

Linus Torvalds
2008-09-20 07:17:12 +0800
902f2ac9d Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kern… ... Browse Code »

…el/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clockevents: make device shutdown robust
clocksource, acpi_pm.c: fix check for monotonicity
clockevents: remove WARN_ON which was used to gather information

Linus Torvalds
2008-09-20 07:16:50 +0800

17 Sep, 2008

1 commit

2344abbcb clockevents: make device shutdown robust ... Browse Code »

The device shut down does not cleanup the next_event variable of the
clock event device. So when the device is reactivated the possible
stale next_event value can prevent the device to be reprogrammed as it
claims to wait on a event already.

This is the root cause of the resurfacing suspend/resume problem,
where systems need key press to come back to life.

Fix this by setting next_event to KTIME_MAX when the device is shut
down. Use a separate function for shutdown which takes care of that
and only keep the direct set mode call in the broadcast code, where we
can not touch the next_event value.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-17 04:47:02 +0800

14 Sep, 2008

1 commit

4e74339af cpuset: avoid changing cpuset's cpus when -errno returned ... Browse Code »

After the patch:

commit 0b2f630a28d53b5a2082a5275bc3334b10373508
Author: Miao Xie
Date: Fri Jul 25 01:47:21 2008 -0700

cpusets: restructure the function update_cpumask() and update_nodemask()

It might happen that 'echo 0 > /cpuset/sub/cpus' returned failure but 'cpus'
has been changed, because cpus was changed before calling heap_init() which
may return -ENOMEM.

This patch restores the orginal behavior.

Signed-off-by: Li Zefan
Acked-by: Paul Menage
Cc: Paul Jackson
Cc: Miao Xie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-09-14 05:41:50 +0800

11 Sep, 2008

2 commits

ec5d49899 sched: fix deadlock in setting scheduler parameter to zero ... Browse Code »

Andrei Gusev wrote:

> I played witch scheduler settings. After doing something like:
> echo -n 1000000 >sched_rt_period_us
>
> command is locked. I found in kernel.log:
>
> Sep 11 00:39:34 zaratustra
> Sep 11 00:39:34 zaratustra Pid: 4495, comm: bash Tainted: G W
> (2.6.26.3 #12)
> Sep 11 00:39:34 zaratustra EIP: 0060:[] EFLAGS: 00210246 CPU: 0
> Sep 11 00:39:34 zaratustra EIP is at div64_u64+0x57/0x80
> Sep 11 00:39:34 zaratustra EAX: 0000389f EBX: 00000000 ECX: 00000000
> EDX: 00000000
> Sep 11 00:39:34 zaratustra ESI: d9800000 EDI: d9800000 EBP: 0000389f
> ESP: ea7a6edc
> Sep 11 00:39:34 zaratustra DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Sep 11 00:39:34 zaratustra Process bash (pid: 4495, ti=ea7a6000
> task=ea744000 task.ti=ea7a6000)
> Sep 11 00:39:34 zaratustra Stack: 00000000 000003e8 d9800000 0000389f
> c0119042 00000000 00000000 00000001
> Sep 11 00:39:34 zaratustra 00000000 00000000 ea7a6f54 00010000 00000000
> c04d2e80 00000001 000e7ef0
> Sep 11 00:39:34 zaratustra c01191a3 00000000 00000000 ea7a6fa0 00000001
> ffffffff c04d2e80 ea5b2480
> Sep 11 00:39:34 zaratustra Call Trace:
> Sep 11 00:39:34 zaratustra [] __rt_schedulable+0x52/0x130
> Sep 11 00:39:34 zaratustra [] sched_rt_handler+0x83/0x120
> Sep 11 00:39:34 zaratustra [] proc_sys_call_handler+0xb6/0xd0
> Sep 11 00:39:34 zaratustra [] proc_sys_write+0x0/0x20
> Sep 11 00:39:34 zaratustra [] proc_sys_write+0x19/0x20
> Sep 11 00:39:34 zaratustra [] vfs_write+0xa8/0x140
> Sep 11 00:39:34 zaratustra [] sys_write+0x41/0x80
> Sep 11 00:39:34 zaratustra [] sysenter_past_esp+0x6a/0x91
> Sep 11 00:39:34 zaratustra =======================
> Sep 11 00:39:34 zaratustra Code: c8 41 0f ad f3 d3 ee f6 c1 20 0f 45 de
> 31 f6 0f ad ef d3 ed f6 c1 20 0f 45 fd 0f 45 ee 31 c9 39 eb 89 fe 89 ea
> 77 08 89 e8 31 d2 f3 89 c1 89 f0 8b 7c 24 08 f7 f3 8b 74 24 04 89
> ca 8b 1c 24
> Sep 11 00:39:34 zaratustra EIP: [] div64_u64+0x57/0x80 SS:ESP
> 0068:ea7a6edc
> Sep 11 00:39:34 zaratustra ---[ end trace 4eaa2a86a8e2da22 ]---

fix the boundary condition.

sysctl_sched_rt_period=0 makes exception at to_ratio().

Signed-off-by: Hiroshi Shimamoto
Signed-off-by: Ingo Molnar

Hiroshi Shimamoto
2008-09-11 15:39:18 +0800
baf25731e sched: fix 2.6.27-rc5 couldn't boot on tulsa machine randomly ... Browse Code »

On my tulsa x86-64 machine, kernel 2.6.25-rc5 couldn't boot randomly.

Basically, function __enable_runtime forgets to reset rt_rq->rt_throttled
to 0. When every cpu is up, per-cpu migration_thread is created and it runs
very fast, sometimes to mark the corresponding rt_rq->rt_throttled to 1 very
quickly. After all cpus are up, with below calling chain:

sched_init_smp => arch_init_sched_domains => build_sched_domains => ...
=> cpu_attach_domain => rq_attach_root => set_rq_online => ...
=> _enable_runtime

_enable_runtime is called against every rt_rq again, so rt_rq->rt_time is
reset to 0, but rt_rq->rt_throttled might be still 1. Later on function
do_sched_rt_period_timer couldn't reset it, and all RT tasks couldn't be
scheduled to run on that cpu. here is RT task migration_thread which is
woken up when a task is migrated to another cpu.

Below patch fixes it against 2.6.27-rc5.

Signed-off-by: Zhang Yanmin
Signed-off-by: Ingo Molnar

Zhang, Yanmin
2008-09-11 15:34:28 +0800

10 Sep, 2008

1 commit

61c22c34c clockevents: remove WARN_ON which was used to gather information ... Browse Code »

The issue of the endless reprogramming loop due to a too small
min_delta_ns was fixed with the previous updates of the clock events
code, but we had no information about the spread of this problem. I
added a WARN_ON to get automated information via kerneloops.org and to
get some direct reports, which allowed me to analyse the affected
machines.

The WARN_ON has served its purpose and would be annoying for a release
kernel. Remove it and just keep the information about the increase of
the min_delta_ns value.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-10 04:20:01 +0800

09 Sep, 2008

1 commit

e1d7bf149 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: arch_reinit_sched_domains() must destroy domains to force rebuild
sched, cpuset: rework sched domains and CPU hotplug handling (v4)

Linus Torvalds
2008-09-09 06:47:21 +0800

07 Sep, 2008

3 commits

f53252256 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kern… ... Browse Code »

…el/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource, acpi_pm.c: check for monotonicity
clocksource, acpi_pm.c: use proper read function also in errata mode
ntp: fix calculation of the next jiffie to trigger RTC sync
x86: HPET: read back compare register before reading counter
x86: HPET fix moronic 32/64bit thinko
clockevents: broadcast fixup possible waiters
HPET: make minimum reprogramming delta useful
clockevents: prevent endless loop lockup
clockevents: prevent multiple init/shutdown
clockevents: enforce reprogram in oneshot setup
clockevents: prevent endless loop in periodic broadcast handler
clockevents: prevent clockevent event_handler ending up handler_noop

Linus Torvalds
2008-09-07 10:33:26 +0800
291c54ff7 Merge branch 'sched/cpuset' into sched/urgent Browse Code »

Ingo Molnar
2008-09-07 03:03:16 +0800
dfb512ec4 sched: arch_reinit_sched_domains() must destroy domains to force rebuild ... Browse Code »

What I realized recently is that calling rebuild_sched_domains() in
arch_reinit_sched_domains() by itself is not enough when cpusets are enabled.
partition_sched_domains() code is trying to avoid unnecessary domain rebuilds
and will not actually rebuild anything if new domain masks match the old ones.

What this means is that doing
echo 1 > /sys/devices/system/cpu/sched_mc_power_savings
on a system with cpusets enabled will not take affect untill something changes
in the cpuset setup (ie new sets created or deleted).

This patch fixes restore correct behaviour where domains must be rebuilt in
order to enable MC powersaving flags.

Test on quad-core Core2 box with both CONFIG_CPUSETS and !CONFIG_CPUSETS.
Also tested on dual-core Core2 laptop. Lockdep is happy and things are working
as expected.

Signed-off-by: Max Krasnyansky
Tested-by: Vaidyanathan Srinivasan
Signed-off-by: Ingo Molnar

Max Krasnyansky
2008-09-07 01:22:15 +0800

06 Sep, 2008

4 commits

4ff4b9e19 ntp: fix calculation of the next jiffie to trigger RTC sync ... Browse Code »

We have a bug in the calculation of the next jiffie to trigger the RTC
synchronisation. The aim here is to run sync_cmos_clock() as close as
possible to the middle of a second. Which means we want this function to
be called less than or equal to half a jiffie away from when now.tv_nsec
equals 5e8 (500000000).

If this is not the case for a given call to the function, for this purpose
instead of updating the RTC we calculate the offset in nanoseconds to the
next point in time where now.tv_nsec will be equal 5e8. The calculated
offset is then converted to jiffies as these are the unit used by the
timer.

Hovewer timespec_to_jiffies() used here uses a ceil()-type rounding mode,
where the resulting value is rounded up. As a result the range of
now.tv_nsec when the timer will trigger is from 5e8 to 5e8 + TICK_NSEC
rather than the desired 5e8 - TICK_NSEC / 2 to 5e8 + TICK_NSEC / 2.

As a result if for example sync_cmos_clock() happens to be called at the
time when now.tv_nsec is between 5e8 + TICK_NSEC / 2 and 5e8 to 5e8 +
TICK_NSEC, it will simply be rescheduled HZ jiffies later, falling in the
same range of now.tv_nsec again. Similarly for cases offsetted by an
integer multiple of TICK_NSEC.

This change addresses the problem by subtracting TICK_NSEC / 2 from the
nanosecond offset to the next point in time where now.tv_nsec will be
equal 5e8, effectively shifting the following rounding in
timespec_to_jiffies() so that it produces a rounded-to-nearest result.

Signed-off-by: Maciej W. Rozycki
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Maciej W. Rozycki
2008-09-06 21:31:48 +0800
7300711e8 clockevents: broadcast fixup possible waiters ... Browse Code »

Until the C1E patches arrived there where no users of periodic broadcast
before switching to oneshot mode. Now we need to trigger a possible
waiter for a periodic broadcast when switching to oneshot mode.
Otherwise we can starve them for ever.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2008-09-06 13:21:17 +0800
49048622e sched: fix process time monotonicity ... Browse Code »

Spencer reported a problem where utime and stime were going negative despite
the fixes in commit b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa. The suspected
reason for the problem is that signal_struct maintains it's own utime and
stime (of exited tasks), these are not updated using the new task_utime()
routine, hence sig->utime can go backwards and cause the same problem
to occur (sig->utime, adds tsk->utime and not task_utime()). This patch
fixes the problem

TODO: using max(task->prev_utime, derived utime) works for now, but a more
generic solution is to implement cputime_max() and use the cputime_gt()
function for comparison.

Reported-by: spencer@bluehost.com
Signed-off-by: Balbir Singh
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Balbir Singh
2008-09-06 00:14:35 +0800
56c7426b3 sched_clock: fix NOHZ interaction ... Browse Code »

If HLT stops the TSC, we'll fail to account idle time, thereby inflating the
actual process times. Fix this by re-calibrating the clock against GTOD when
leaving nohz mode.

Signed-off-by: Peter Zijlstra
Tested-by: Avi Kivity
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-09-06 00:14:08 +0800

05 Sep, 2008

5 commits

1fb9b7d29 clockevents: prevent endless loop lockup ... Browse Code »

The C1E/HPET bug reports on AMDX2/RS690 systems where tracked down to a
too small value of the HPET minumum delta for programming an event.

The clockevents code needs to enforce an interrupt event on the clock event
device in some cases. The enforcement code was stupid and naive, as it just
added the minimum delta to the current time and tried to reprogram the device.
When the minimum delta is too small, then this loops forever.

Add a sanity check. Allow reprogramming to fail 3 times, then print a warning
and double the minimum delta value to make sure, that this does not happen again.
Use the same function for both tick-oneshot and tick-broadcast code.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2008-09-05 17:11:53 +0800
9c17bcda9 clockevents: prevent multiple init/shutdown ... Browse Code »

While chasing the C1E/HPET bugreports I went through the clock events
code inch by inch and found that the broadcast device can be initialized
and shutdown multiple times. Multiple shutdowns are not critical, but
useless waste of time. Multiple initializations are simply broken. Another
CPU might have the device in use already after the first initialization and
the second init could just render it unusable again.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2008-09-05 17:11:52 +0800
7205656ab clockevents: enforce reprogram in oneshot setup ... Browse Code »

In tick_oneshot_setup we program the device to the given next_event,
but we do not check the return value. We need to make sure that the
device is programmed enforced so the interrupt handler engine starts
working. Split out the reprogramming function from tick_program_event()
and call it with the device, which was handed in to tick_setup_oneshot().
Set the force argument, so the devices is firing an interrupt.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2008-09-05 17:11:52 +0800
d4496b395 clockevents: prevent endless loop in periodic broadcast handler ... Browse Code »

The reprogramming of the periodic broadcast handler was broken,
when the first programming returned -ETIME. The clockevents code
stores the new expiry value in the clock events device next_event field
only when the programming time has not been elapsed yet. The loop in
question calculates the new expiry value from the next_event value
and therefor never increases.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2008-09-05 17:11:51 +0800
7c1e76897 clockevents: prevent clockevent event_handler ending up handler_noop ... Browse Code »

There is a ordering related problem with clockevents code, due to which
clockevents_register_device() called after tickless/highres switch
will not work. The new clockevent ends up with clockevents_handle_noop as
event handler, resulting in no timer activity.

The problematic path seems to be

* old device already has hrtimer_interrupt as the event_handler
* new clockevent device registers with a higher rating
* tick_check_new_device() is called
* clockevents_exchange_device() gets called
* old->event_handler is set to clockevents_handle_noop
* tick_setup_device() is called for the new device
* which sets new->event_handler using the old->event_handler which is noop.

Change the ordering so that new device inherits the proper handler.

This does not have any issue in normal case as most likely all the clockevent
devices are setup before the highres switch. But, can potentially be affecting
some corner case where HPET force detect happens after the highres switch.
This was a problem with HPET in MSI mode code that we have been experimenting
with.

Signed-off-by: Venkatesh Pallipadi
Signed-off-by: Shaohua Li
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Venkatesh Pallipadi
2008-09-05 17:11:51 +0800