15 Aug, 2006

1 commit

  • The sys_getppid() optimization can access freed memory. On kernels with
    DEBUG_SLAB turned on, this results in an Oops. As Dave Hansen noted, the
    optimization is also unsafe for memory hotplug.

    So this patch always takes the lock, to be safe.

    [oleg@tv-sign.ru: simplifications]
    Signed-off-by: Kirill Korotaev
    Cc: Dave Hansen
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Kirill Korotaev
     

01 Aug, 2006

3 commits

  • kernel/timer.c defines a (per-cpu) pointer to tvec_base_t, but initializes
    it using { &a_tvec_base_t }, which sparse warns about; change this to just
    &a_tvec_base_t, as sketched below.

    Signed-off-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
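
    A minimal sketch of the change; boot_tvec_bases stands in for the actual
    boot-time base the per-cpu pointer points at:

    /* before: sparse warns about braces around a scalar initializer */
    static DEFINE_PER_CPU(tvec_base_t *, tvec_bases) = { &boot_tvec_bases };

    /* after: initialize the pointer directly */
    static DEFINE_PER_CPU(tvec_base_t *, tvec_bases) = &boot_tvec_bases;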
     
  • We have

    #define INDEX(N) (base->timer_jiffies >> (TVR_BITS + N * TVN_BITS)) & TVN_MASK

    and it's used via

    list = varray[i + 1]->vec + (INDEX(i + 1));

    So, due to underparenthesisation, this INDEX(i + 1) expands to ... (TVR_BITS
    + i + 1 * TVN_BITS)) ...

    So this bugfix changes behaviour (the corrected macro is shown below). It
    worked before by sheer luck:

    "If i was anything but 0, it was broken. But this was only used by
    s390 and arm. Since it was for the next interrupt, could that next
    interrupt be a problem (going into the second cascade)? But it was
    probably seldom wrong. That is, this would fail if the next
    interrupt was in the second cascade, and was wrapped. Which may
    never have happened. Also if it did happen, it would have just missed
    the interrupt.

    If an interrupt was missed, and no one was there to miss it, was it
    really missed :-)"

    Signed-off-by: Steven Rostedt
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
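
    Per the description above, the fix is to parenthesise the macro argument:

    /* buggy: with N = i + 1, the shift count becomes TVR_BITS + i + 1 * TVN_BITS */
    #define INDEX(N) (base->timer_jiffies >> (TVR_BITS + N * TVN_BITS)) & TVN_MASK

    /* fixed */
    #define INDEX(N) ((base->timer_jiffies >> (TVR_BITS + (N) * TVN_BITS)) & TVN_MASK)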
     
  • A few of the callback functions and notifier blocks that are associated with
    cpu notifications incorrectly carry __devinit and __devinitdata. They should
    be __cpuinit and __cpuinitdata instead (see the sketch below).

    It makes no functional difference but wastes text area when CONFIG_HOTPLUG is
    enabled and CONFIG_HOTPLUG_CPU is not.

    This patch fixes all those instances.

    Signed-off-by: Chandra Seetharaman
    Cc: Ashok Raj
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chandra Seetharaman
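
    A sketch of the annotation change, using a hypothetical notifier;
    __cpuinit/__cpuinitdata sections are discarded when CONFIG_HOTPLUG_CPU is
    off, while __devinit/__devinitdata are kept whenever CONFIG_HOTPLUG is on:

    static int __cpuinit foo_cpu_callback(struct notifier_block *nb,
                                          unsigned long action, void *hcpu)
    {
            return NOTIFY_OK;       /* was marked __devinit */
    }

    static struct notifier_block foo_cpu_notifier __cpuinitdata = {
            .notifier_call = foo_cpu_callback,  /* was __devinitdata */
    };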
     

15 Jul, 2006

2 commits

  • Resolve problems seen with APM suspend.

    Due to resume initialization ordering, it's possible we could get a timer
    interrupt before the timekeeping resume() function is called. This patch
    ensures we don't do any timekeeping accounting before we're fully resumed.

    (akpm: fixes the machine-freezes-on-APM-resume bug)

    Signed-off-by: John Stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
     
  • Relax the CPU in the del_timer_sync() busywait loop (sketched below).

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
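
    A sketch of the resulting loop, assuming the try_to_del_timer_sync()
    helper from this era of kernel/timer.c:

    int del_timer_sync(struct timer_list *timer)
    {
            for (;;) {
                    int ret = try_to_del_timer_sync(timer);
                    if (ret >= 0)
                            return ret;
                    cpu_relax();    /* the added relax */
            }
    }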
     

11 Jul, 2006

1 commit

  • A large number of lost ticks can cause an overadjustment of the clock. To
    compensate for this we look at the current error: the larger the error
    already is, the more careful we are about adjusting it further. As a small
    extra fix, the error is reset when the clock is set.

    Signed-off-by: Roman Zippel
    Acked-by: john stultz
    Cc: Uwe Bugla
    Cc: James Bottomley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     

04 Jul, 2006

3 commits


28 Jun, 2006

2 commits

  • This patch reverts the notifier_block changes made in 2.6.17.

    Signed-off-by: Chandra Seetharaman
    Cc: Ashok Raj
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chandra Seetharaman
     
  • In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a
    band-aid solution to solve that problem. In the process, I undid all the
    changes you both were making to ensure that these notifiers were available
    only at init time (unless CONFIG_HOTPLUG_CPU is defined).

    We deferred the real fix to 2.6.18. Here is a set of patches that fixes the
    XFS problem cleanly and makes the cpu notifiers available only at init time
    (unless CONFIG_HOTPLUG_CPU is defined).

    If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
    time.

    This patch reverts the notifier_call changes made in 2.6.17.

    Signed-off-by: Chandra Seetharaman
    Cc: Ashok Raj
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chandra Seetharaman
     

27 Jun, 2006

6 commits

  • This fixes the clock source updates in update_wall_time() to correctly
    track the time coming in via current_tick_length(). Optimize the fast
    paths to be as short as possible to keep the overhead low.

    Signed-off-by: Roman Zippel
    Acked-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • As suggested by Roman Zippel, change clocksource functions to use
    clocksource_xyz rather than xyz_clocksource to avoid polluting the
    namespace.

    Signed-off-by: John Stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
     
  • Introduces clocksource switching code and the arch generic time accessor
    functions that use the clocksource infrastructure.

    Signed-off-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
     
  • Instead of incrementing xtime by tick_nsec + ntp adjustments, use the
    clocksource abstraction to increment and scale time. Using the clocksource
    abstraction allows other clocksources to be used consistently in the face of
    late or lost ticks, while preserving the existing behavior via the jiffies
    clocksource.

    This removes the need to keep time_phase adjustments as we just use the
    current_tick_length() function as the NTP interface and accumulate time using
    shifted nanoseconds.

    The basic design is by Roman Zippel; however, this is my own
    interpretation and implementation, so the credit should go to him and the
    blame to me.

    Signed-off-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
     
  • Change the current_tick_length() function so it takes an argument which
    specifies how much precision to return in shifted nanoseconds. This provides
    a simple way to convert between NTP's internal nanoseconds shifted by
    (SHIFT_SCALE - 10) and other shifted-nanosecond units that are used by the
    clocksource abstraction.

    Signed-off-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
     
  • Modify the update_wall_time function so it increments time using the
    clocksource abstraction instead of jiffies. Since the only clocksource driver
    currently provided is the jiffies clocksource, this should result in no
    functional change. Additionally, timekeeping_init and timekeeping_resume
    functions have been added to initialize and maintain some of the new
    timekeeping state.

    [hirofumi@mail.parknet.co.jp: fixlet]
    Signed-off-by: John Stultz
    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
     

26 Jun, 2006

1 commit

  • There are several instances of per_cpu(foo, raw_smp_processor_id()), which
    is semantically equivalent to __get_cpu_var(foo) but without the warning
    that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For
    those architectures with optimized per-cpu implementations, namely ia64,
    powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
    code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
    on those platforms.

    This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
    raw_smp_processor_id()) on architectures that use the generic per-cpu
    implementation, and into __get_cpu_var(x) on architectures that have an
    optimized per-cpu implementation (sketched below).

    Signed-off-by: Paul Mackerras
    Acked-by: David S. Miller
    Acked-by: Ingo Molnar
    Acked-by: Martin Schwidefsky
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mackerras
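
    A simplified sketch of the two expansions; the real definitions live in
    the per-arch percpu.h headers, and the guard used here is illustrative:

    #ifdef ARCH_HAS_OPTIMIZED_PER_CPU       /* illustrative guard */
    /* optimized per-cpu implementation: cheap direct access */
    #define __raw_get_cpu_var(var)  __get_cpu_var(var)
    #else
    /* generic implementation: index by CPU id, without the
     * CONFIG_DEBUG_PREEMPT warning smp_processor_id() could give */
    #define __raw_get_cpu_var(var)  per_cpu(var, raw_smp_processor_id())
    #endif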
     

23 Jun, 2006

2 commits

  • When CONFIG_BASE_SMALL=1, cascade() may enter an infinite loop. Because of
    CONFIG_BASE_SMALL=1 (TVR_BITS=6 and TVN_BITS=4), the list base->tv5 may
    cascade into base->tv5 itself. So, the kernel enters an infinite loop in
    the function cascade().

    I created a test module to verify this bug, and a patch to fix it.

    /* header names were stripped in the original message; these are
       plausible reconstructions */
    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/timer.h>
    #include <linux/jiffies.h>
    #if 0
    #include <linux/kdb.h>
    #else
    #define kdb_printf printk
    #endif

    #define TVN_BITS (CONFIG_BASE_SMALL ? 4 : 6)
    #define TVR_BITS (CONFIG_BASE_SMALL ? 6 : 8)
    #define TVN_SIZE (1 << TVN_BITS)
    #define TVR_SIZE (1 << TVR_BITS)
    #define TVN_MASK (TVN_SIZE - 1)
    #define TVR_MASK (TVR_SIZE - 1)

    #define TV_SIZE(N) (N*TVN_BITS + TVR_BITS)

    struct timer_list timer0;
    struct timer_list dummy_timer1;
    struct timer_list dummy_timer2;

    void dummy_timer_fun(unsigned long data)
    {
    }

    unsigned long j = 0;

    void check_timer_base(unsigned long data)
    {
            kdb_printf("check_timer_base %08lx\n", jiffies);
            mod_timer(&timer0, (jiffies & ~0xFFF) + 0x1FFF);
    }

    int init_module(void)
    {
            init_timer(&timer0);
            timer0.data = (unsigned long)0;
            timer0.function = check_timer_base;
            mod_timer(&timer0, jiffies + 1);

            init_timer(&dummy_timer1);
            dummy_timer1.data = (unsigned long)0;
            dummy_timer1.function = dummy_timer_fun;

            init_timer(&dummy_timer2);
            dummy_timer2.data = (unsigned long)0;
            dummy_timer2.function = dummy_timer_fun;

            j = jiffies;
            j &= (~((1 << ...

    [The remainder of the test module was truncated in the original message.]

    Cc: Matt Mackall
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Porpoise
     
  • list_splice_init(list, head) does unneeded work if it is known that
    list_empty(head) == 1. We can use list_replace_init() instead (see the
    sketch below).

    Signed-off-by: Oleg Nesterov
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
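
    A sketch of the substitution, wrapped in a hypothetical helper:

    /* pending is known to be empty on entry */
    static void grab_entries(struct list_head *source, struct list_head *pending)
    {
            /* was: list_splice_init(source, pending); -- extra work when
             * pending is already empty */
            list_replace_init(source, pending);
            /* pending now owns source's entries; source is reinitialized */
    }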
     

22 May, 2006

1 commit

  • Under certain timing conditions, a race during boot occurs where timer
    ticks are being processed on remote CPUs. The remote timer ticks can
    increment jiffies, and if this happens during a window when a timeout is
    very close to expiring but a local tick has not yet been delivered, you can
    end up with

    1) No softirq pending
    2) A local timer wheel which is not synced to jiffies
    3) No high resolution timer active
    4) A local timer which is supposed to fire before the current jiffies value.

    In this circumstance, the comparison in next_timer_interrupt overflows,
    because the base of the comparison for high resolution timers is jiffies,
    but for the softirq timer wheel, it is relative to the current base of the
    wheel (jiffies_base).

    Signed-off-by: Zachary Amsden
    Cc: Martin Schwidefsky
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     

26 Apr, 2006

2 commits

  • A few of the notifier_chain_register() callers use __init in the definition
    of notifier_call. This is incorrect, as the function must remain available
    after initialization (the callers do not unregister the notifiers during
    initialization).

    This patch fixes all such usages to _not_ have the notifier_call function
    in the __init section.

    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Linus Torvalds

    Chandra Seetharaman
     
  • A few of the notifier_chain_register() callers use __devinitdata in the
    definition of the notifier_block data structure. This is incorrect, as the
    data structure must remain available after initialization (the callers do
    not unregister the notifiers during initialization).

    This was leading to an oops when notifier_chain_register() was invoked
    for those callback chains after initialization.

    This patch fixes all such usages to _not_ have the notifier_block data
    structure in the init data section.

    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Linus Torvalds

    Chandra Seetharaman
     

11 Apr, 2006

1 commit

  • We need the boot CPU's tvec_bases[] entry to be initialised super-early in
    boot, for early_serial_setup(). That runs within setup_arch(), before even
    per-cpu areas are initialised.

    The patch changes tvec_bases to use compile-time initialisation, and adds a
    separate array `tvec_base_done' to keep track of which CPU has had its
    tvec_bases[] entry initialised (because we can no longer use the zeroness of
    that tvec_bases[] entry to determine whether it has been initialised).

    Thanks to Eugene Surovegin for diagnosing this.

    Cc: Eugene Surovegin
    Cc: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

10 Apr, 2006

1 commit

  • If the HPET timer is enabled, the clock can drift by ~3 seconds a day.
    This is due to the HPET timer not being initialized with the correct
    setting (still using PIT count).

    If HZ changes, this drift can become even more pronounced.

    The HPET patch initializes tick_nsec with the correct setting for the
    HPET timer.

    Vojtech comments:

    "It's not entirely correct (it assumes the HPET ticks totally
    exactly), but it's significantly better than assuming the PIT error
    there."

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Jordan Hargrave
     

02 Apr, 2006

1 commit


01 Apr, 2006

3 commits

  • Currently, count_active_tasks() calls both nr_running() &
    nr_uninterruptible(). Each of these functions does a "for_each_cpu" & reads
    values from the runqueue of each cpu. Although this is not a lot of
    instructions, each runqueue may be located on a different node. Depending
    on the architecture, a unique TLB entry may be required to access each
    runqueue.

    Since there may be more runqueues than cpu TLB entries, a scan of all
    runqueues can thrash the TLB. Each memory reference incurs a TLB miss &
    refill.

    In addition, the runqueue cacheline that contains nr_running &
    nr_uninterruptible may be evicted from the cache between the two passes.
    This causes unnecessary cache misses.

    Combining nr_running() & nr_uninterruptible() into a single function
    substantially reduces the TLB & cache misses on large systems (a sketch is
    below). This should have no measurable effect on smaller systems.

    On a 128p IA64 system running a memory stress workload, the new function
    reduced the overhead of calc_load() from 605 usec/call to 324 usec/call.

    Signed-off-by: Jack Steiner
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jack Steiner
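
    A sketch of the combined helper; cpu_rq() is internal to kernel/sched.c,
    so this is schematic rather than the exact patch:

    unsigned long nr_active(void)
    {
            unsigned long i, running = 0, uninterruptible = 0;

            /* one pass over the runqueues instead of two */
            for_each_online_cpu(i) {
                    running += cpu_rq(i)->nr_running;
                    uninterruptible += cpu_rq(i)->nr_uninterruptible;
            }
            return running + uninterruptible;
    }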
     
  • Since base and new_base are of the same type now, we can save one 'if'
    branch and simplify the code a bit.

    Signed-off-by: Oleg Nesterov
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Commit a4a6198b80cf82eb8160603c98da218d1bd5e104:
    [PATCH] tvec_bases too large for per-cpu data

    introduced "struct tvec_t_base_s boot_tvec_bases" which is visible at
    compile time. This means we can kill __init_timer_base and move
    timer_base_s's content into tvec_t_base_s.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

26 Mar, 2006

2 commits

  • This removes the support for pps. It's completely unused within the kernel
    and is basically in the way for further cleanups. It should be easier to
    re-add proper support for it after the rest has been converted to NTP4
    (where the pps mechanisms are quite different from NTP3 anyway).

    Signed-off-by: Roman Zippel
    Cc: Adrian Bunk
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • alarm() calls the kernel with an unsigned int timeout in seconds. The
    value is stored in the tv_sec field of a struct timeval to set up the
    itimer. The tv_sec field of struct timeval is of type long, which causes
    the tv_sec value to be negative on 32 bit machines if seconds > INT_MAX.

    Before the hrtimer merge (pre 2.6.16) such a negative value was converted
    to the maximum jiffies timeout by the timeval_to_jiffies conversion. It's
    not clear whether this was intended or just happened to be done by the
    timeval_to_jiffies code.

    hrtimers expect a timeval in canonical form and treat a negative timeout as
    already expired. This breaks the legitimate usage of alarm() with a
    timeout value > INT_MAX seconds.

    For 32 bit machines it is therefore necessary to limit the internal seconds
    value to avoid API breakage. Instead of doing this in all implementations
    of sys_alarm, the duplicated sys_alarm code is moved into a common function
    in itimer.c (sketched below).

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
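
    A simplified sketch of the consolidated helper, assuming the
    alarm_setitimer() name in kernel/itimer.c:

    unsigned int alarm_setitimer(unsigned int seconds)
    {
            struct itimerval it_new, it_old;
            unsigned int oldalarm;

    #if BITS_PER_LONG < 64
            /* tv_sec is a long: clamp so the value cannot go negative
             * on 32 bit machines */
            if (seconds > INT_MAX)
                    seconds = INT_MAX;
    #endif
            it_new.it_value.tv_sec = seconds;
            it_new.it_value.tv_usec = 0;
            it_new.it_interval.tv_sec = 0;
            it_new.it_interval.tv_usec = 0;

            do_setitimer(ITIMER_REAL, &it_new, &it_old);

            oldalarm = it_old.it_value.tv_sec;
            /* can't return 0 if an alarm is pending; round half a
             * second or more up to the next second */
            if ((!oldalarm && it_old.it_value.tv_usec) ||
                it_old.it_value.tv_usec >= 500000)
                    oldalarm++;
            return oldalarm;
    }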
     

24 Mar, 2006

2 commits

  • Make the softlockup detector purely timer-interrupt driven, removing
    softirq-context (timer) dependencies. This means that if the softlockup
    watchdog triggers, it has truly observed a longer than 10 seconds
    scheduling delay of a SCHED_FIFO prio 99 task.

    (the patch also turns off the softlockup detector during the initial bootup
    phase and does small style fixes)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • With internal Xen-enabled kernels we see the kernel's static per-cpu data
    area exceed the limit of 32k on x86-64, and even native x86-64 kernels get
    fairly close to that limit. I generally question whether it is reasonable
    to have data structures several kb in size allocated as per-cpu data when
    the space there is rather limited.

    The biggest arch-independent consumer is tvec_bases (over 4k on 32-bit
    archs, over 8k on 64-bit ones), which now gets converted to use dynamically
    allocated memory instead.

    Signed-off-by: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

17 Mar, 2006

1 commit

  • The pointer to the current time interpolator and the current list of time
    interpolators are typically only changed during bootup. Adding
    __read_mostly takes them away from possibly hot cachelines (see below).

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
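
    The annotation pattern described above, shown schematically for the two
    kernel/timer.c declarations:

    static struct time_interpolator *time_interpolator __read_mostly;
    static struct time_interpolator *time_interpolator_list __read_mostly;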
     

07 Mar, 2006

2 commits

  • Add a compiler barrier so that we don't read jiffies before updating
    jiffies_64 (sketched below).

    Signed-off-by: Atsushi Nemoto
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Atsushi Nemoto
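
    A sketch of what the description implies for do_timer(); on 32-bit
    machines jiffies aliases the low word of jiffies_64, and update_times()
    reads jiffies:

    void do_timer(struct pt_regs *regs)
    {
            jiffies_64++;
            /* prevent loading jiffies before storing the new jiffies_64 */
            barrier();
            update_times(regs);
    }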
     
  • Also from Thomas Gleixner

    Function next_timer_interrupt() got broken by a recent patch
    6ba1b91213e81aa92b5cf7539f7d2a94ff54947c when sys_nanosleep() was moved to
    hrtimer. This broke things because next_timer_interrupt() did not check
    the hrtimer tree for the next event.

    Function next_timer_interrupt() is needed with dyntick (CONFIG_NO_IDLE_HZ,
    VST) implementations, as the system can be in idle when next hrtimer event
    was supposed to happen. At least ARM and S390 currently use
    next_timer_interrupt().

    Signed-off-by: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Lindgren
     

03 Mar, 2006

1 commit

  • On some platforms readq performs additional work to make sure I/O is done
    in a coherent way. This is not needed for time retrieval as done by the
    time interpolator. So we can use readq_relaxed instead, which will improve
    performance (see the sketch below).

    It affects sparc64 and ia64 only. Apparently it makes a significant
    difference on ia64.

    Signed-off-by: Christoph Lameter
    Cc: john stultz
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
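
    The substitution, wrapped in a hypothetical helper for illustration:

    static u64 interpolator_read_cycles(void __iomem *addr)
    {
            /* time retrieval needs no I/O ordering guarantees, so skip
             * the coherence work that plain readq() performs */
            return readq_relaxed(addr);
    }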
     

18 Feb, 2006

1 commit

  • This provides an interface for arch code to find out how many
    nanoseconds are going to be added on to xtime by the next call to
    do_timer. The value returned is a fixed-point number in 52.12 format
    in nanoseconds. The reason for this format is that it gives the full
    precision that the timekeeping code is using internally (a worked example
    is below).

    The motivation for this is to fix a problem that has arisen on 32-bit
    powerpc in that the value returned by do_gettimeofday drifts apart
    from xtime if NTP is being used. PowerPC is now using a lockless
    do_gettimeofday based on reading the timebase register and performing
    some simple arithmetic. (This method of getting the time is also
    exported to userspace via the VDSO.) However, the factor and offset
    it uses were calculated based on the nominal tick length and weren't
    being adjusted when NTP varied the tick length.

    Note that 64-bit powerpc has had the lockless do_gettimeofday for a
    long time now. It also had an extremely hairy routine that got called
    from the 32-bit compat routine for adjtimex, which adjusted the
    factor and offset according to what it thought the timekeeping code
    was going to do. Not only was this only called if a 32-bit task did
    adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
    duplicating computations from kernel/timer.c and it wasn't clear that
    it was (still) correct.

    The simple solution is to ask the timekeeping code how long the
    current jiffy will be on each timer interrupt, after calling
    do_timer. If this jiffy will be a different length from the last one,
    we then need to compute new values for the factor and offset used in
    the lockless do_gettimeofday. In this way we can keep xtime and
    do_gettimeofday in sync, even when NTP is varying the tick length.

    Note that when adjtimex varies the tick length, it almost always
    introduces the variation from the next tick on. The only case I could
    see where adjtimex would vary the length of the current tick is when
    an old-style adjtime adjustment is being cancelled. (It's not clear
    to me why the adjustment has to be cancelled immediately rather than
    from the next tick on.) Thus I don't see any real need for a hook in
    adjtimex; the rare case of an old-style adjustment being cancelled can
    be fixed up at the next tick.

    Signed-off-by: Paul Mackerras
    Acked-by: john stultz
    Signed-off-by: Linus Torvalds

    Paul Mackerras
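
    A worked example of the 52.12 format, split by a hypothetical helper
    around the current_tick_length() interface this commit introduces:

    static inline void split_tick_length(u64 *whole_ns, u32 *frac)
    {
            u64 tick_len = current_tick_length();   /* ns << 12 */

            *whole_ns = tick_len >> 12;             /* integer nanoseconds */
            *frac = tick_len & ((1 << 12) - 1);     /* units of 2^-12 ns */
    }

    For a nominal 1 ms tick, tick_len is 1000000 << 12, so whole_ns comes back
    as 1000000 and frac as 0; NTP adjustments show up in both parts.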
     

08 Feb, 2006

1 commit