30 Sep, 2006
4 commits
-
Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390
timer interrupt handler with this change.Currently update_times() calculates ticks by "jiffies - wall_jiffies", but
callers of do_timer() should know how many ticks to update. Passing ticks
get rid of this redundant calculation. Also there are another redundancy
pointed out by Martin Schwidefsky.This cleanup make a barrier added by
5aee405c662ca644980c184774277fc6d0769a84 needless. So this patch removes
it.As a bonus, this cleanup make wall_jiffies can be removed easily, since now
wall_jiffies is always synced with jiffies. (This patch does not really
remove wall_jiffies. It would be another cleanup patch)Signed-off-by: Atsushi Nemoto
Cc: Martin Schwidefsky
Cc: "Eric W. Biederman"
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: john stultz
Cc: Andi Kleen
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Acked-by: Russell King
Cc: Ian Molton
Cc: Mikael Starvik
Acked-by: David Howells
Cc: Yoshinori Sato
Cc: Hirokazu Takata
Acked-by: Ralf Baechle
Cc: Kyle McMartin
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: Kazumoto Kojima
Cc: Richard Curnow
Cc: William Lee Irwin III
Cc: "David S. Miller"
Cc: Jeff Dike
Cc: Paolo 'Blaisorblade' Giarrusso
Cc: Miles Bader
Cc: Chris Zankel
Acked-by: "Luck, Tony"
Cc: Geert Uytterhoeven
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Spawing ksoftirqd, migration, or watchdog, and calling init_timers_cpu()
may fail with small memory. If it happens in initcalls, kernel NULL
pointer dereference happens later. This patch makes crash happen
immediately in such cases. It seems a bit better than getting kernel NULL
pointer dereference later.Cc: Ingo Molnar
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Some of the kerneldoc comments in this file are ignored since the lead-in
is malformed, using either "/*" or "/***" instead of "/**".[rdunlap@xenotime.net: kerneldoc fixes]
Signed-off-by: Rolf Eike Beer
Acked-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
lock_timer_base acquires a lock and returns with that lock held. Add a
lock annotation to this function so that sparse can check callers for lock
pairing, and so that sparse will not complain about this function since it
intentionally uses the lock in this manner.Signed-off-by: Josh Triplett
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Aug, 2006
1 commit
-
sys_getppid() optimization can access a freed memory. On kernels with
DEBUG_SLAB turned ON, this results in Oops. As Dave Hansen noted, this
optimization is also unsafe for memory hotplug.So this patch always takes the lock to be safe.
[oleg@tv-sign.ru: simplifications]
Signed-off-by: Kirill Korotaev
Cc:
Cc: Dave Hansen
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
01 Aug, 2006
3 commits
-
kernel/timer.c defines a (per-cpu) pointer to tvec_base_t, but initializes
it using { &a_tvec_base_t }, which sparse warns about; change this to just
&a_tvec_base_t.Signed-off-by: Josh Triplett
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We have
#define INDEX(N) (base->timer_jiffies >> (TVR_BITS + N * TVN_BITS)) & TVN_MASK
and it's used via
list = varray[i + 1]->vec + (INDEX(i + 1));
So, due to underparenthesisation, this INDEX(i+1) is now a ... (TVR_BITS + i
+ 1 * TVN_BITS)) ...So this bugfix changes behaviour. It worked before by sheer luck:
"If i was anything but 0, it was broken. But this was only used by
s390 and arm. Since it was for the next interrupt, could that next
interrupt be a problem (going into the second cascade)? But it was
probably seldom wrong. That is, this would fail if the next
interrupt was in the second cascade, and was wrapped. Which may
never of happened. Also if it did happen, it would have just missed
the interrupt.If an interrupt was missed, and no one was there to miss it, was it
really missed :-)"Signed-off-by: Steven Rostedt
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Few of the callback functions and notifier blocks that are associated with cpu
notifications incorrectly have __devinit and __devinitdata. They should be
__cpuinit and __cpuinitdata instead.It makes no functional difference but wastes text area when CONFIG_HOTPLUG is
enabled and CONFIG_HOTPLUG_CPU is not.This patch fixes all those instances.
Signed-off-by: Chandra Seetharaman
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Jul, 2006
2 commits
-
Resolve problems seen w/ APM suspend.
Due to resume initialization ordering, its possible we could get a timer
interrupt before the timekeeping resume() function is called. This patch
ensures we don't do any timekeeping accounting before we're fully resumed.(akpm: fixes the machine-freezes-on-APM-resume bug)
Signed-off-by: John Stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Relax the CPU in the del_timer_sync() busywait loop.
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Jul, 2006
1 commit
-
A large number of lost ticks can cause an overadjustment of the clock. To
compensate for this we look at the current error and the larger the error
already is the more careful we are at adjusting the error. As small extra
fix reset the error when the clock is set.Signed-off-by: Roman Zippel
Acked-by: john stultz
Cc: Uwe Bugla
Cc: James Bottomley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Jul, 2006
3 commits
-
cleanup: remove task_t and convert all the uses to struct task_struct. I
introduced it for the scheduler anno and it was a mistake.Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Split the per-CPU timer base locks up into separate lock classes, because they
are used recursively.Has no effect on non-lockdep kernels.
Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Locking init improvement:
- introduce and use __SPIN_LOCK_UNLOCKED for array initializations,
to pass in the name string of locks, used by debuggingSigned-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Jun, 2006
2 commits
-
This patch reverts notifier_block changes made in 2.6.17
Signed-off-by: Chandra Seetharaman
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a
band-aid solution to solve that problem. In the process, i undid all the
changes you both were making to ensure that these notifiers were available
only at init time (unless CONFIG_HOTPLUG_CPU is defined).We deferred the real fix to 2.6.18. Here is a set of patches that fixes the
XFS problem cleanly and makes the cpu notifiers available only at init time
(unless CONFIG_HOTPLUG_CPU is defined).If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
time.This patch reverts the notifier_call changes made in 2.6.17
Signed-off-by: Chandra Seetharaman
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Jun, 2006
6 commits
-
This fixes the clock source updates in update_wall_time() to correctly
track the time coming in via current_tick_length(). Optimize the fast
paths to be as short as possible to keep the overhead low.Signed-off-by: Roman Zippel
Acked-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As suggested by Roman Zippel, change clocksource functions to use
clocksource_xyz rather then xyz_clocksource to avoid polluting the
namespace.Signed-off-by: John Stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduces clocksource switching code and the arch generic time accessor
functions that use the clocksource infrastructure.Signed-off-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Instead of incrementing xtime by tick_nsec + ntp adjustments, use the
clocksource abstraction to increment and scale time. Using the clocksource
abstraction allows other clocksources to be used consistently in the face of
late or lost ticks, while preserving the existing behavior via the jiffies
clocksource.This removes the need to keep time_phase adjustments as we just use the
current_tick_length() function as the NTP interface and accumulate time using
shifted nanoseconds.The basics of this design was by Roman Zippel, however it is my own
interpretation and implementation, so the credit should go to him and the
blame to me.Signed-off-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change the current_tick_length() function so it takes an argument which
specifies how much precision to return in shifted nanoseconds. This provides
a simple way to convert between NTPs internal nanoseconds shifted by
(SHIFT_SCALE - 10) to other shifted nanosecond units that are used by the
clocksource abstraction.Signed-off-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Modify the update_wall_time function so it increments time using the
clocksource abstraction instead of jiffies. Since the only clocksource driver
currently provided is the jiffies clocksource, this should result in no
functional change. Additionally, a timekeeping_init and timekeeping_resume
function has been added to initialize and maintain some of the new timekeping
state.[hirofumi@mail.parknet.co.jp: fixlet]
Signed-off-by: John Stultz
Signed-off-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Jun, 2006
1 commit
-
There are several instances of per_cpu(foo, raw_smp_processor_id()), which
is semantically equivalent to __get_cpu_var(foo) but without the warning
that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For
those architectures with optimized per-cpu implementations, namely ia64,
powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
on those platforms.This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
raw_smp_processor_id()) on architectures that use the generic per-cpu
implementation, and turns into __get_cpu_var(x) on the architectures that
have an optimized per-cpu implementation.Signed-off-by: Paul Mackerras
Acked-by: David S. Miller
Acked-by: Ingo Molnar
Acked-by: Martin Schwidefsky
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Jun, 2006
2 commits
-
When CONFIG_BASE_SAMLL=1, cascade() in may enter the infinite loop.
Because of CONFIG_BASE_SMALL=1(TVR_BITS=6 and TVN_BITS=4), the list
base->tv5 may cascade into base->tv5. So, the kernel enters the infinite
loop in the function cascade().I created a test module to verify this bug, and a patch to fix it.
#include
#include
#include
#include
#if 0
#include
#else
#define kdb_printf printk
#endif#define TVN_BITS (CONFIG_BASE_SMALL ? 4 : 6)
#define TVR_BITS (CONFIG_BASE_SMALL ? 6 : 8)
#define TVN_SIZE (1 << TVN_BITS)
#define TVR_SIZE (1 << TVR_BITS)
#define TVN_MASK (TVN_SIZE - 1)
#define TVR_MASK (TVR_SIZE - 1)#define TV_SIZE(N) (N*TVN_BITS + TVR_BITS)
struct timer_list timer0;
struct timer_list dummy_timer1;
struct timer_list dummy_timer2;void dummy_timer_fun(unsigned long data) {
}
unsigned long j=0;
void check_timer_base(unsigned long data)
{
kdb_printf("check_timer_base %08x\n",jiffies);
mod_timer(&timer0,(jiffies & (~0xFFF)) + 0x1FFF);
}int init_module(void)
{
init_timer(&timer0);
timer0.data = (unsigned long)0;
timer0.function = check_timer_base;
mod_timer(&timer0,jiffies+1);init_timer(&dummy_timer1);
dummy_timer1.data = (unsigned long)0;
dummy_timer1.function = dummy_timer_fun;init_timer(&dummy_timer2);
dummy_timer2.data = (unsigned long)0;
dummy_timer2.function = dummy_timer_fun;j=jiffies;
j&=(~((1<<<
Cc: Matt Mackall
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
list_splice_init(list, head) does unneeded job if it is known that
list_empty(head) == 1. We can use list_replace_init() instead.Signed-off-by: Oleg Nesterov
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 May, 2006
1 commit
-
Under certain timing conditions, a race during boot occurs where timer
ticks are being processed on remote CPUs. The remote timer ticks can
increment jiffies, and if this happens during a window when a timeout is
very close to expiring but a local tick has not yet been delivered, you can
end up with1) No softirq pending
2) A local timer wheel which is not synced to jiffies
3) No high resolution timer active
4) A local timer which is supposed to fire before the current jiffies value.In this circumstance, the comparison in next_timer_interrupt overflows,
because the base of the comparison for high resolution timers is jiffies,
but for the softirq timer wheel, it is relative the the current base of the
wheel (jiffies_base).Signed-off-by: Zachary Amsden
Cc: Martin Schwidefsky
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Apr, 2006
2 commits
-
Few of the notifier_chain_register() callers use __init in the definition
of notifier_call. It is incorrect as the function definition should be
available after the initializations (they do not unregister them during
initializations).This patch fixes all such usages to _not_ have the notifier_call __init
section.Signed-off-by: Chandra Seetharaman
Signed-off-by: Linus Torvalds -
Few of the notifier_chain_register() callers use __devinitdata in the
definition of notifier_block data structure. It is incorrect as the
data structure should be available after the initializations (they do
not unregister them during initializations).This was leading to an oops when notifier_chain_register() call is
invoked for those callback chains after initialization.This patch fixes all such usages to _not_ have the notifier_block data
structure in the init data section.Signed-off-by: Chandra Seetharaman
Signed-off-by: Linus Torvalds
11 Apr, 2006
1 commit
-
We need the boot CPU's tvec_bases[] entry to be initialised super-early in
boot, for early_serial_setup(). That runs within setup_arch(), before even
per-cpu areas are initialised.The patch changes tvec_bases to use compile-time initialisation, and adds a
separate array `tvec_base_done' to keep track of which CPU has had its
tvec_bases[] entry initialised (because we can no longer use the zeroness of
that tvec_bases[] entry to determine whether it has been initialised).Thanks to Eugene Surovegin for diagnosing this.
Cc: Eugene Surovegin
Cc: Jan Beulich
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Apr, 2006
1 commit
-
If the HPET timer is enabled, the clock can drift by ~3 seconds a day.
This is due to the HPET timer not being initialized with the correct
setting (still using PIT count).If HZ changes, this drift can become even more pronounced.
HPET patch initializes tick_nsec with correct tick_nsec settings for
HPET timer.Vojtech comments:
"It's not entirely correct (it assumes the HPET ticks totally
exactly), but it's significantly better than assuming the PIT error
there."Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds
02 Apr, 2006
1 commit
-
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk
01 Apr, 2006
3 commits
-
Currently, count_active_tasks() calls both nr_running() &
nr_interruptible(). Each of these functions does a "for_each_cpu" & reads
values from the runqueue of each cpu. Although this is not a lot of
instructions, each runqueue may be located on different node. Depending on
the architecture, a unique TLB entry may be required to access each
runqueue.Since there may be more runqueues than cpu TLB entries, a scan of all
runqueues can trash the TLB. Each memory reference incurs a TLB miss &
refill.In addition, the runqueue cacheline that contains nr_running &
nr_uninterruptible may be evicted from the cache between the two passes.
This causes unnecessary cache misses.Combining nr_running() & nr_interruptible() into a single function
substantially reduces the TLB & cache misses on large systems. This should
have no measureable effect on smaller systems.On a 128p IA64 system running a memory stress workload, the new function
reduced the overhead of calc_load() from 605 usec/call to 324 usec/call.Signed-off-by: Jack Steiner
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since base and new_base are of the same type now, we can save one 'if'
branch and simplify the code a bit.Signed-off-by: Oleg Nesterov
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit a4a6198b80cf82eb8160603c98da218d1bd5e104:
[PATCH] tvec_bases too large for per-cpu dataintroduced "struct tvec_t_base_s boot_tvec_bases" which is visible at
compile time. This means we can kill __init_timer_base and move
timer_base_s's content into tvec_t_base_s.Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Mar, 2006
2 commits
-
This removes the support for pps. It's completely unused within the kernel
and is basically in the way for further cleanups. It should be easier to
readd proper support for it after the rest has been converted to NTP4
(where the pps mechanisms are quite different from NTP3 anyway).Signed-off-by: Roman Zippel
Cc: Adrian Bunk
Cc: john stultz
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
alarm() calls the kernel with an unsigend int timeout in seconds. The
value is stored in the tv_sec field of a struct timeval to setup the
itimer. The tv_sec field of struct timeval is of type long, which causes
the tv_sec value to be negative on 32 bit machines if seconds > INT_MAX.Before the hrtimer merge (pre 2.6.16) such a negative value was converted
to the maximum jiffies timeout by the timeval_to_jiffies conversion. It's
not clear whether this was intended or just happened to be done by the
timeval_to_jiffies code.hrtimers expect a timeval in canonical form and treat a negative timeout as
already expired. This breaks the legitimate usage of alarm() with a
timeout value > INT_MAX seconds.For 32 bit machines it is therefor necessary to limit the internal seconds
value to avoid API breakage. Instead of doing this in all implementations
of sys_alarm the duplicated sys_alarm code is moved into a common function
in itimer.cSigned-off-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Mar, 2006
2 commits
-
Make the softlockup detector purely timer-interrupt driven, removing
softirq-context (timer) dependencies. This means that if the softlockup
watchdog triggers, it has truly observed a longer than 10 seconds
scheduling delay of a SCHED_FIFO prio 99 task.(the patch also turns off the softlockup detector during the initial bootup
phase and does small style fixes)Signed-off-by: Ingo Molnar
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With internal Xen-enabled kernels we see the kernel's static per-cpu data
area exceed the limit of 32k on x86-64, and even native x86-64 kernels get
fairly close to that limit. I generally question whether it is reasonable
to have data structures several kb in size allocated as per-cpu data when
the space there is rather limited.The biggest arch-independent consumer is tvec_bases (over 4k on 32-bit
archs, over 8k on 64-bit ones), which now gets converted to use dynamically
allocated memory instead.Signed-off-by: Jan Beulich
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Mar, 2006
1 commit
-
The pointer to the current time interpolator and the current list of time
interpolators are typically only changed during bootup. Adding
__read_mostly takes them away from possibly hot cachelines.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Mar, 2006
1 commit
-
Add a compiler barrier so that we don't read jiffies before updating
jiffies_64.Signed-off-by: Atsushi Nemoto
Cc: Ralf Baechle
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds