25 Mar, 2008

6 commits

  • The printk() logic for when/how to take the console semaphore was
    unreadable; this splits the code up into a few helper functions and
    makes it easier to follow what is going on.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • In case we're accounting from a sub-namespace, the tgids reported will not
    refer to the right namespace.

    Save the pid_namespace we're accounting in on the acct_glbs and use it in
    do_acct_process.

    Two fewer :) places using the task_struct.tgid member.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This is minor, but dereferencing even current's real_parent is not safe on
    debug kernels, since the memory it points to can be unmapped; RCU
    protection is required.

    Besides, the tgid field is deprecated and is to be replaced with a
    task_tgid_xxx call (the 2nd patch), so RCU will be required anyway.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • As Paul pointed out, the ACCESS_ONCE() calls are not needed because we
    already have the explicit surrounding memory barriers.

    Signed-off-by: Mathieu Desnoyers
    Cc: Christoph Hellwig
    Cc: Mike Mason
    Cc: Dipankar Sarma
    Cc: David Smith
    Cc: "Paul E. McKenney"
    Cc: Steven Rostedt
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • Add comments requested by Andrew.

    Updated comments about synchronize_sched(). Since we use call_rcu and
    rcu_barrier now, these comments were out of sync with the code.

    Signed-off-by: Mathieu Desnoyers
    Cc: Christoph Hellwig
    Cc: Mike Mason
    Cc: Dipankar Sarma
    Cc: David Smith
    Cc: "Paul E. McKenney"
    Cc: Steven Rostedt
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • The printk() can deadlock because it can wake up klogd(), and
    task enqueueing will try to read the time in order to set a hrtimer.

    Reported-by: Marcin Slusarz
    Debugged-by: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 Mar, 2008

6 commits

  • Will be called each time the scheduling domains are rebuilt.
    Needed for architectures that don't have a static CPU topology.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     
  • Needed so it can be called from outside of sched.c.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     
  • Combine two unlikely() tests

    Signed-off-by: Roel Kluin
    Signed-off-by: Ingo Molnar

    Roel Kluin
     
  • TREE_AVG and APPROX_AVG are initial task placement policies that have been
    disabled for a long while; time to remove them.

    Signed-off-by: Peter Zijlstra
    CC: Srivatsa Vaddagiri
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
    [NET] ifb: set separate lockdep classes for queue locks
    [IPV6] KCONFIG: Fix description about IPV6_TUNNEL.
    [TCP]: Fix shrinking windows with window scaling
    netpoll: zap_completion_queue: adjust skb->users counter
    bridge: use time_before() in br_fdb_cleanup()
    [TG3]: Fix build warning on sparc32.
    MAINTAINERS: bluez-devel is subscribers-only
    audit: netlink socket can be auto-bound to pid other than current->pid (v2)
    [NET]: Fix permissions of /proc/net
    [SCTP]: Fix a race between module load and protosw access
    [NETFILTER]: ipt_recent: sanity check hit count
    [NETFILTER]: nf_conntrack_h323: logical-bitwise & confusion in process_setup()
    [RT2X00] drivers/net/wireless/rt2x00/rt2x00dev.c: remove dead code, fix warning
    [IPV4]: esp_output() misannotations
    [8021Q]: vlan_dev misannotations
    xfrm: ->eth_proto is __be16
    [IPV4]: ipv4_is_lbcast() misannotations
    [SUNRPC]: net/* NULL noise
    [SCTP]: fix misannotated __sctp_rcv_asconf_lookup()
    [PKT_SCHED]: annotate cls_u32
    ...

    Linus Torvalds
     
  • From: Pavel Emelyanov

    This patch is based on the one from Thomas.

    kauditd_thread() calls netlink_unicast(), passing it
    the audit_pid. The audit_pid, in turn, is received from
    user space, and the tool (I've checked audit v1.6.9)
    uses getpid() to pass one into the kernel. Moreover, this tool
    doesn't bind the netlink socket to this id, but simply creates
    it, allowing the kernel to auto-bind one.

    That's the preamble.

    The problem is that netlink_autobind() does _not_ guarantee
    that the socket will be auto-bound to the current pid. Instead
    it uses the current pid as a hint to start looking for a free
    id. So, in case of a conflict, the audit messages can be sent
    to the wrong socket. This can happen (it's unlikely, but possible)
    when some task opens more than one netlink socket and then
    the audit one starts: in this case the audit pid can be busy
    and its socket will be bound to another id.

    The proposal is to introduce an audit_nlk_pid in the audit subsystem,
    which will point to the netlink socket to send packets to. It
    will most often be equal to audit_pid. The socket id can be
    obtained from the skb's netlink CB right in audit_receive_msg().
    Resetting audit_nlk_pid to 0 is not required, since all the
    decisions are taken based on the audit_pid value only.

    Later, if the audit tools bind the socket themselves, the
    kernel will have to provide a way to set up audit_nlk_pid
    as well.

    A good side effect of this patch is that audit_pid can later
    be converted to struct pid, as it is no longer safe to use
    pid_t in the presence of pid namespaces. But the audit code still
    uses the tgid from task_struct in audit_signal_info and in
    audit_filter_syscall.

    Signed-off-by: Thomas Graf
    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Paris
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

20 Mar, 2008

1 commit

  • Revert commit 1ada5cba6a0318f90e45b38557e7b5206a9cba38 ("clocksource:
    make clocksource watchdog cycle through online CPUs") due to the
    regression reported by Gabriel C at

    http://lkml.org/lkml/2008/2/24/281

    (short version: it makes the TSC be marked as always unstable on his
    machine).

    Cc: Andi Kleen
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Robert Hancock
    Acked-by: Linus Torvalds
    Cc: "Rafael J. Wysocki"
    Cc: Gabriel C
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

19 Mar, 2008

6 commits

  • reduce wake-up granularity for better interactivity.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Wakeup-buddy tasks are cache-hot; this makes it a bit harder
    for the load-balancer to tear them apart (but it's still possible
    if the load is sufficiently asymmetric).

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • improve affine wakeups. Maintain the 'overlap' metric based on CFS's
    sum_exec_runtime - which means the amount of time a task executes
    after it wakes up some other task.

    Use the 'overlap' for the wakeup decisions: if the 'overlap' is short,
    it means there's strong workload coupling between this task and the
    woken up task. If the 'overlap' is large then the workload is decoupled
    and the scheduler will move them to separate CPUs more easily.

    ( Also slightly move the preempt_check within try_to_wake_up() - this has
    no effect on functionality but allows 'early wakeups' (for still-on-rq
    tasks) to be correctly accounted as well.)

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Clean up the code flow. No code changed:

    kernel/sched.o:

    text data bss dec hex filename
    42521 2858 232 45611 b22b sched.o.before
    42521 2858 232 45611 b22b sched.o.after

    md5:
    09b31c44e9aff8666f72773dc433e2df sched.o.before.asm
    09b31c44e9aff8666f72773dc433e2df sched.o.after.asm

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • rename 'cpu' to 'prev_cpu'. No code changed:

    kernel/sched.o:

    text data bss dec hex filename
    42521 2858 232 45611 b22b sched.o.before
    42521 2858 232 45611 b22b sched.o.after

    md5:
    09b31c44e9aff8666f72773dc433e2df sched.o.before.asm
    09b31c44e9aff8666f72773dc433e2df sched.o.after.asm

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • split out the affine-wakeup bits.

    No code changed:

    kernel/sched.o:

    text data bss dec hex filename
    42521 2858 232 45611 b22b sched.o.before
    42521 2858 232 45611 b22b sched.o.after

    md5:
    9d76738f1272aa82f0b7affd2f51df6b sched.o.before.asm
    09b31c44e9aff8666f72773dc433e2df sched.o.after.asm

    (the md5's changed because stack slots changed and some registers
    get scheduled by gcc in a different order - but otherwise the before
    and after assembly is instruction for instruction equivalent.)

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

17 Mar, 2008

1 commit


15 Mar, 2008

7 commits

  • Use the existing calc_delta_mine() calculation for sched_slice(). This
    saves a divide and simplifies the code because we share it with the
    other /cfs_rq->load users.

    It also improves code size:

    text data bss dec hex filename
    42659 2740 144 45543 b1e7 sched.o.before
    42093 2740 144 44977 afb1 sched.o.after

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Ingo Molnar
     
  • Fair sleepers need to scale their latency target down by runqueue
    weight. Otherwise busy systems will gain an ever larger sleep bonus.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Ingo Molnar
     
  • Currently we schedule the leftmost task in the runqueue. When the
    runtimes are very short because of some server/client ping-pong,
    especially in over-saturated workloads, this will cycle through all
    tasks, thrashing the cache.

    Reduce cache thrashing by keeping dependent tasks together, running
    newly woken tasks first. However, by not running the leftmost task first
    we could starve tasks, because the wakee can gain unlimited runtime.

    Therefore we only run the wakee if it's within a small
    (wakeup_granularity) window of the leftmost task. This preserves
    fairness, but does alternate server/client task groups.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • lw->weight can be 0 for a short time during bootup.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Ingo Molnar
     
  • Clear the cached inverse value when updating load. This is needed for
    calc_delta_mine() to work correctly when using the rq load.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Ingo Molnar
     
  • Current min_vruntime tracking is incorrect and will cause serious
    problems when we don't run the leftmost task for some reason.

    min_vruntime does two things: 1) it's used to determine the forward
    direction when the u64 vruntime wraps, 2) it's used to track the
    leftmost vruntime, from which newly enqueued tasks are positioned.

    The current logic advances min_vruntime whenever the current task's
    vruntime advances. Because the current task may pass the leftmost task
    still waiting, we fail the second goal. This causes new tasks to be
    placed too far ahead, which penalizes their runtime.

    Fix this by making min_vruntime the min vruntime of the waiting tasks,
    tracking it in enqueue/dequeue, and comparing against the current task's
    vruntime to obtain the absolute minimum when placing new tasks.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Fix a hard to trigger crash seen in the -rt kernel that also affects
    the vanilla scheduler.

    There is a race condition between schedule() and some dequeue/enqueue
    functions: rt_mutex_setprio(), __setscheduler() and sched_move_task().

    When scheduling to idle, idle_balance() is called to pull tasks from
    other busy processors. It might drop the rq lock, which means those 3
    functions can encounter on_rq=0 and running=1. The current task should
    be put when running.

    Here is a possible scenario:

    CPU0                          CPU1
                                  |  schedule()
                                  |  ->deactivate_task()
                                  |  ->idle_balance()
                                  |  -->load_balance_newidle()
    rt_mutex_setprio()            |
                                  |  --->double_lock_balance()
    *get lock                     *rel lock
    * on_rq=0, running=1          |
    * sched_class is changed      |
    *rel lock                     *get lock
       :                          |
                                  :
                                  ->put_prev_task_rt()
                                  ->pick_next_task_fair()
                                  => panic

    The current process on CPU1 (P1) is scheduling. P1 is deactivated, and
    the scheduler looks for another process on another CPU's runqueue,
    because CPU1 will be idle. idle_balance(), load_balance_newidle() and
    double_lock_balance() are called, and double_lock_balance() could drop
    the rq lock. On the other hand, CPU0 is trying to boost the priority of
    P1. As a result of the boosting, only P1's prio and sched_class are
    changed to RT. The sched entities of P1 and P1's group are never put.
    This makes the cfs_rq invalid, because the cfs_rq has a curr but no
    leaf; when pick_next_task_fair() is then called, the kernel panics.

    Signed-off-by: Hiroshi Shimamoto
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
     

13 Mar, 2008

2 commits


12 Mar, 2008

1 commit

  • There is a problem in the hibernation code that triggers on some NUMA
    systems on which pfn_valid() returns 'true' for some PFNs that don't
    belong to any zone. Namely, there is a BUG_ON() in
    memory_bm_find_bit() that triggers for PFNs not belonging to any
    zone yet passing the pfn_valid() test. On the affected systems it
    triggers when we mark PFNs reported by the platform as not saveable,
    because the PFNs in question belong to a region mapped directly using
    ioremap() (i.e. the ACPI data area) and they pass the pfn_valid()
    test.

    Modify memory_bm_find_bit() so that it returns an error if the given
    PFN doesn't belong to any zone instead of crashing the kernel, and
    ignore the result it returns in mark_nosave_pages() while marking the
    "nosave" memory regions.

    This doesn't affect the hibernation functionality, as we won't touch
    the PFNs in question anyway.

    http://bugzilla.kernel.org/show_bug.cgi?id=9966

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Rafael J. Wysocki
     

11 Mar, 2008

5 commits

  • It is possible for the root-domain cache of online cpus to become
    out of sync with the global cpu_online_map, because we currently
    trigger removal of cpus too early in the notifier chain.
    Other DOWN_PREPARE handlers may in fact run and reconfigure the
    root-domain topology, thereby stomping on our own offline handling.

    The end result is that rd->online may become out of sync with
    cpu_online_map, which results in potential task misrouting.

    So change the offline handling to be more tightly coupled with the
    global offline process by triggering on CPU_DYING instead of
    CPU_DOWN_PREPARE.

    Signed-off-by: Gregory Haskins
    Cc: Gautham R Shenoy
    Cc: "Siddha, Suresh B"
    Cc: "Rafael J. Wysocki"
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Gregory Haskins
     
  • This reverts commit 393d94d98b19089ec172566e23557997931b137e.

    Let's fix this right.

    Signed-off-by: Gregory Haskins
    Cc: Gautham R Shenoy
    Cc: "Siddha, Suresh B"
    Cc: "Rafael J. Wysocki"
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Gregory Haskins
     
  • The original preemptible-RCU patch put the choice between classic and
    preemptible RCU into kernel/Kconfig.preempt, which resulted in build failures
    on machines not supporting CONFIG_PREEMPT. This choice was therefore moved to
    init/Kconfig, which worked, but placed the choice between classic and
    preemptible RCU at the top level, a very obtuse choice indeed.

    This patch changes from the Kconfig "choice" mechanism to a pair of booleans,
    only one of which (CONFIG_PREEMPT_RCU) is user-visible, and is located in
    kernel/Kconfig.preempt, where one would expect it to be. The other
    (CONFIG_CLASSIC_RCU) is in init/Kconfig so that it is available to all
    architectures, hopefully avoiding build breakage. Thanks to Roman Zippel for
    suggesting this approach.

    Signed-off-by: Paul E. McKenney
    Cc: Ingo Molnar
    Acked-by: Steven Rostedt
    Cc: Dipankar Sarma
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Roman Zippel
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     
  • The return value convention of a module's init function is 0/-E.
    Sometimes, e.g. during forward-porting, mistakes happen and a buggy
    module is created in which the result of a comparison like
    "workqueue != NULL" is propagated all the way up to sys_init_module.
    What happened was that some other module had created the workqueue in
    question, our module created it again, and the module was still
    successfully loaded.

    Or it could be some other bug.

    Let's make such mistakes much more visible. In retrospect, such
    messages would have noticeably shortened some of my head-scratching
    sessions.

    Note that dump_stack() is just a way to get the user's attention.
    Sample message:

    sys_init_module: 'foo'->init suspiciously returned 1, it should follow 0/-E convention
    sys_init_module: loading module anyway...
    Pid: 4223, comm: modprobe Not tainted 2.6.24-25f666300625d894ebe04bac2b4b3aadb907c861 #5

    Call Trace:
    [] sys_init_module+0xe5/0x1d0
    [] system_call_after_swapgs+0x7b/0x80

    Signed-off-by: Alexey Dobriyan
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Commit c9a3ba55 (module: wait for dependent modules doing init.) didn't quite
    work because the waiter holds the module lock, meaning that the state of the
    module it's waiting for cannot change.

    Fortunately, it's fairly simple to update the state outside the lock and do
    the wakeup.

    Thanks to Jan Glauber for tracking this down and testing (qdio and qeth).

    Signed-off-by: Rusty Russell
    Cc: Jan Glauber
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     

10 Mar, 2008

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
    time: remove obsolete CLOCK_TICK_ADJUST
    time: don't touch an offlined CPU's ts->tick_stopped in tick_cancel_sched_timer()
    time: prevent the loop in timespec_add_ns() from being optimised away
    ntp: use unsigned input for do_div()

    Linus Torvalds
     
  • We currently set the root-domain online span automatically when the
    domain is added to the cpu if the cpu is already a member of
    cpu_online_map.

    This was done as a hack/bug-fix for s2ram, but it also causes a problem
    with hotplug CPU_DOWN transitioning. The right way to fix the original
    problem is to actually respond to CPU_UP events, instead of CPU_ONLINE,
    which is already too late.

    This solves the hung reboot regression reported by Andrew Morton and
    others.

    Signed-off-by: Gregory Haskins
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Gregory Haskins
     

09 Mar, 2008

3 commits

  • The first version of the ntp_interval/tick_length inconsistent usage patch was
    recently merged as bbe4d18ac2e058c56adb0cd71f49d9ed3216a405

    http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bbe4d18ac2e058c56adb0cd71f49d9ed3216a405

    While the fix did greatly improve the situation, Roman correctly
    pointed out that it has a small bug: if the user changes clocksources
    after the system has been running and NTP has made corrections, the
    corrections made against the old clocksource will be applied against
    the new clocksource, causing error.

    The second attempt, which corrects the issue in the NTP_INTERVAL_LENGTH
    definition, has also made it upstream as commit
    e13a2e61dd5152f5499d2003470acf9c838eab84

    http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e13a2e61dd5152f5499d2003470acf9c838eab84

    Roman has correctly pointed out that CLOCK_TICK_ADJUST is calculated
    based on the PIT's frequency, and isn't really relevant to non-PIT-driven
    clocksources (that is, clocksources other than jiffies and pit).

    This patch reverts both of those changes, and simply removes
    CLOCK_TICK_ADJUST.

    This does remove the granularity error correction for users of the PIT
    and jiffies clocksources, but for the majority of users the error
    should be within the 500ppm range NTP can accommodate.

    For systems with granularity errors greater than 500ppm, the
    "ntp_tick_adj=" boot option can be used to compensate.

    [johnstul@us.ibm.com: provided changelog]
    [mattilinnanvuori@yahoo.com: make ntp_tick_adj static]
    Signed-off-by: Roman Zippel
    Acked-by: john stultz
    Signed-off-by: Matti Linnanvuori
    Signed-off-by: Andrew Morton
    Cc: mingo@elte.hu
    Signed-off-by: Thomas Gleixner

    Roman Zippel
     
  • Silences WARN_ONs in rcu_enter_nohz() and rcu_exit_nohz() that
    previously appeared, caused by (repeated) calls to:
    $ echo 0 > /sys/devices/system/cpu/cpu1/online
    $ echo 1 > /sys/devices/system/cpu/cpu1/online

    Signed-off-by: Karsten Wiese
    Cc: johnstul@us.ibm.com
    Cc: Rafael Wysocki
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Acked-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Karsten Wiese
     
  • The kernel NTP code shouldn't hand 64-bit *signed* values to do_div(). Make it
    instead hand 64-bit unsigned values. This gets rid of a couple of warnings.

    Signed-off-by: David Howells
    Cc: Roman Zippel
    Cc: Ingo Molnar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    David Howells