15 Aug, 2006
1 commit
-
sys_getppid() optimization can access a freed memory. On kernels with
DEBUG_SLAB turned ON, this results in Oops. As Dave Hansen noted, this
optimization is also unsafe for memory hotplug.So this patch always takes the lock to be safe.
[oleg@tv-sign.ru: simplifications]
Signed-off-by: Kirill Korotaev
Cc:
Cc: Dave Hansen
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
01 Aug, 2006
3 commits
-
kernel/timer.c defines a (per-cpu) pointer to tvec_base_t, but initializes
it using { &a_tvec_base_t }, which sparse warns about; change this to just
&a_tvec_base_t.Signed-off-by: Josh Triplett
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We have
#define INDEX(N) (base->timer_jiffies >> (TVR_BITS + N * TVN_BITS)) & TVN_MASK
and it's used via
list = varray[i + 1]->vec + (INDEX(i + 1));
So, due to underparenthesisation, this INDEX(i+1) is now a ... (TVR_BITS + i
+ 1 * TVN_BITS)) ...So this bugfix changes behaviour. It worked before by sheer luck:
"If i was anything but 0, it was broken. But this was only used by
s390 and arm. Since it was for the next interrupt, could that next
interrupt be a problem (going into the second cascade)? But it was
probably seldom wrong. That is, this would fail if the next
interrupt was in the second cascade, and was wrapped. Which may
never of happened. Also if it did happen, it would have just missed
the interrupt.If an interrupt was missed, and no one was there to miss it, was it
really missed :-)"Signed-off-by: Steven Rostedt
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Few of the callback functions and notifier blocks that are associated with cpu
notifications incorrectly have __devinit and __devinitdata. They should be
__cpuinit and __cpuinitdata instead.It makes no functional difference but wastes text area when CONFIG_HOTPLUG is
enabled and CONFIG_HOTPLUG_CPU is not.This patch fixes all those instances.
Signed-off-by: Chandra Seetharaman
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Jul, 2006
2 commits
-
Resolve problems seen w/ APM suspend.
Due to resume initialization ordering, its possible we could get a timer
interrupt before the timekeeping resume() function is called. This patch
ensures we don't do any timekeeping accounting before we're fully resumed.(akpm: fixes the machine-freezes-on-APM-resume bug)
Signed-off-by: John Stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Relax the CPU in the del_timer_sync() busywait loop.
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Jul, 2006
1 commit
-
A large number of lost ticks can cause an overadjustment of the clock. To
compensate for this we look at the current error and the larger the error
already is the more careful we are at adjusting the error. As small extra
fix reset the error when the clock is set.Signed-off-by: Roman Zippel
Acked-by: john stultz
Cc: Uwe Bugla
Cc: James Bottomley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Jul, 2006
3 commits
-
cleanup: remove task_t and convert all the uses to struct task_struct. I
introduced it for the scheduler anno and it was a mistake.Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Split the per-CPU timer base locks up into separate lock classes, because they
are used recursively.Has no effect on non-lockdep kernels.
Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Locking init improvement:
- introduce and use __SPIN_LOCK_UNLOCKED for array initializations,
to pass in the name string of locks, used by debuggingSigned-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Jun, 2006
2 commits
-
This patch reverts notifier_block changes made in 2.6.17
Signed-off-by: Chandra Seetharaman
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a
band-aid solution to solve that problem. In the process, i undid all the
changes you both were making to ensure that these notifiers were available
only at init time (unless CONFIG_HOTPLUG_CPU is defined).We deferred the real fix to 2.6.18. Here is a set of patches that fixes the
XFS problem cleanly and makes the cpu notifiers available only at init time
(unless CONFIG_HOTPLUG_CPU is defined).If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
time.This patch reverts the notifier_call changes made in 2.6.17
Signed-off-by: Chandra Seetharaman
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Jun, 2006
6 commits
-
This fixes the clock source updates in update_wall_time() to correctly
track the time coming in via current_tick_length(). Optimize the fast
paths to be as short as possible to keep the overhead low.Signed-off-by: Roman Zippel
Acked-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As suggested by Roman Zippel, change clocksource functions to use
clocksource_xyz rather then xyz_clocksource to avoid polluting the
namespace.Signed-off-by: John Stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduces clocksource switching code and the arch generic time accessor
functions that use the clocksource infrastructure.Signed-off-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Instead of incrementing xtime by tick_nsec + ntp adjustments, use the
clocksource abstraction to increment and scale time. Using the clocksource
abstraction allows other clocksources to be used consistently in the face of
late or lost ticks, while preserving the existing behavior via the jiffies
clocksource.This removes the need to keep time_phase adjustments as we just use the
current_tick_length() function as the NTP interface and accumulate time using
shifted nanoseconds.The basics of this design was by Roman Zippel, however it is my own
interpretation and implementation, so the credit should go to him and the
blame to me.Signed-off-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change the current_tick_length() function so it takes an argument which
specifies how much precision to return in shifted nanoseconds. This provides
a simple way to convert between NTPs internal nanoseconds shifted by
(SHIFT_SCALE - 10) to other shifted nanosecond units that are used by the
clocksource abstraction.Signed-off-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Modify the update_wall_time function so it increments time using the
clocksource abstraction instead of jiffies. Since the only clocksource driver
currently provided is the jiffies clocksource, this should result in no
functional change. Additionally, a timekeeping_init and timekeeping_resume
function has been added to initialize and maintain some of the new timekeping
state.[hirofumi@mail.parknet.co.jp: fixlet]
Signed-off-by: John Stultz
Signed-off-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Jun, 2006
1 commit
-
There are several instances of per_cpu(foo, raw_smp_processor_id()), which
is semantically equivalent to __get_cpu_var(foo) but without the warning
that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For
those architectures with optimized per-cpu implementations, namely ia64,
powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
on those platforms.This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
raw_smp_processor_id()) on architectures that use the generic per-cpu
implementation, and turns into __get_cpu_var(x) on the architectures that
have an optimized per-cpu implementation.Signed-off-by: Paul Mackerras
Acked-by: David S. Miller
Acked-by: Ingo Molnar
Acked-by: Martin Schwidefsky
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Jun, 2006
2 commits
-
When CONFIG_BASE_SAMLL=1, cascade() in may enter the infinite loop.
Because of CONFIG_BASE_SMALL=1(TVR_BITS=6 and TVN_BITS=4), the list
base->tv5 may cascade into base->tv5. So, the kernel enters the infinite
loop in the function cascade().I created a test module to verify this bug, and a patch to fix it.
#include
#include
#include
#include
#if 0
#include
#else
#define kdb_printf printk
#endif#define TVN_BITS (CONFIG_BASE_SMALL ? 4 : 6)
#define TVR_BITS (CONFIG_BASE_SMALL ? 6 : 8)
#define TVN_SIZE (1 << TVN_BITS)
#define TVR_SIZE (1 << TVR_BITS)
#define TVN_MASK (TVN_SIZE - 1)
#define TVR_MASK (TVR_SIZE - 1)#define TV_SIZE(N) (N*TVN_BITS + TVR_BITS)
struct timer_list timer0;
struct timer_list dummy_timer1;
struct timer_list dummy_timer2;void dummy_timer_fun(unsigned long data) {
}
unsigned long j=0;
void check_timer_base(unsigned long data)
{
kdb_printf("check_timer_base %08x\n",jiffies);
mod_timer(&timer0,(jiffies & (~0xFFF)) + 0x1FFF);
}int init_module(void)
{
init_timer(&timer0);
timer0.data = (unsigned long)0;
timer0.function = check_timer_base;
mod_timer(&timer0,jiffies+1);init_timer(&dummy_timer1);
dummy_timer1.data = (unsigned long)0;
dummy_timer1.function = dummy_timer_fun;init_timer(&dummy_timer2);
dummy_timer2.data = (unsigned long)0;
dummy_timer2.function = dummy_timer_fun;j=jiffies;
j&=(~((1<<<
Cc: Matt Mackall
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
list_splice_init(list, head) does unneeded job if it is known that
list_empty(head) == 1. We can use list_replace_init() instead.Signed-off-by: Oleg Nesterov
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 May, 2006
1 commit
-
Under certain timing conditions, a race during boot occurs where timer
ticks are being processed on remote CPUs. The remote timer ticks can
increment jiffies, and if this happens during a window when a timeout is
very close to expiring but a local tick has not yet been delivered, you can
end up with1) No softirq pending
2) A local timer wheel which is not synced to jiffies
3) No high resolution timer active
4) A local timer which is supposed to fire before the current jiffies value.In this circumstance, the comparison in next_timer_interrupt overflows,
because the base of the comparison for high resolution timers is jiffies,
but for the softirq timer wheel, it is relative the the current base of the
wheel (jiffies_base).Signed-off-by: Zachary Amsden
Cc: Martin Schwidefsky
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Apr, 2006
2 commits
-
Few of the notifier_chain_register() callers use __init in the definition
of notifier_call. It is incorrect as the function definition should be
available after the initializations (they do not unregister them during
initializations).This patch fixes all such usages to _not_ have the notifier_call __init
section.Signed-off-by: Chandra Seetharaman
Signed-off-by: Linus Torvalds -
Few of the notifier_chain_register() callers use __devinitdata in the
definition of notifier_block data structure. It is incorrect as the
data structure should be available after the initializations (they do
not unregister them during initializations).This was leading to an oops when notifier_chain_register() call is
invoked for those callback chains after initialization.This patch fixes all such usages to _not_ have the notifier_block data
structure in the init data section.Signed-off-by: Chandra Seetharaman
Signed-off-by: Linus Torvalds
11 Apr, 2006
1 commit
-
We need the boot CPU's tvec_bases[] entry to be initialised super-early in
boot, for early_serial_setup(). That runs within setup_arch(), before even
per-cpu areas are initialised.The patch changes tvec_bases to use compile-time initialisation, and adds a
separate array `tvec_base_done' to keep track of which CPU has had its
tvec_bases[] entry initialised (because we can no longer use the zeroness of
that tvec_bases[] entry to determine whether it has been initialised).Thanks to Eugene Surovegin for diagnosing this.
Cc: Eugene Surovegin
Cc: Jan Beulich
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Apr, 2006
1 commit
-
If the HPET timer is enabled, the clock can drift by ~3 seconds a day.
This is due to the HPET timer not being initialized with the correct
setting (still using PIT count).If HZ changes, this drift can become even more pronounced.
HPET patch initializes tick_nsec with correct tick_nsec settings for
HPET timer.Vojtech comments:
"It's not entirely correct (it assumes the HPET ticks totally
exactly), but it's significantly better than assuming the PIT error
there."Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds
02 Apr, 2006
1 commit
-
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk
01 Apr, 2006
3 commits
-
Currently, count_active_tasks() calls both nr_running() &
nr_interruptible(). Each of these functions does a "for_each_cpu" & reads
values from the runqueue of each cpu. Although this is not a lot of
instructions, each runqueue may be located on different node. Depending on
the architecture, a unique TLB entry may be required to access each
runqueue.Since there may be more runqueues than cpu TLB entries, a scan of all
runqueues can trash the TLB. Each memory reference incurs a TLB miss &
refill.In addition, the runqueue cacheline that contains nr_running &
nr_uninterruptible may be evicted from the cache between the two passes.
This causes unnecessary cache misses.Combining nr_running() & nr_interruptible() into a single function
substantially reduces the TLB & cache misses on large systems. This should
have no measureable effect on smaller systems.On a 128p IA64 system running a memory stress workload, the new function
reduced the overhead of calc_load() from 605 usec/call to 324 usec/call.Signed-off-by: Jack Steiner
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since base and new_base are of the same type now, we can save one 'if'
branch and simplify the code a bit.Signed-off-by: Oleg Nesterov
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit a4a6198b80cf82eb8160603c98da218d1bd5e104:
[PATCH] tvec_bases too large for per-cpu dataintroduced "struct tvec_t_base_s boot_tvec_bases" which is visible at
compile time. This means we can kill __init_timer_base and move
timer_base_s's content into tvec_t_base_s.Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Mar, 2006
2 commits
-
This removes the support for pps. It's completely unused within the kernel
and is basically in the way for further cleanups. It should be easier to
readd proper support for it after the rest has been converted to NTP4
(where the pps mechanisms are quite different from NTP3 anyway).Signed-off-by: Roman Zippel
Cc: Adrian Bunk
Cc: john stultz
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
alarm() calls the kernel with an unsigend int timeout in seconds. The
value is stored in the tv_sec field of a struct timeval to setup the
itimer. The tv_sec field of struct timeval is of type long, which causes
the tv_sec value to be negative on 32 bit machines if seconds > INT_MAX.Before the hrtimer merge (pre 2.6.16) such a negative value was converted
to the maximum jiffies timeout by the timeval_to_jiffies conversion. It's
not clear whether this was intended or just happened to be done by the
timeval_to_jiffies code.hrtimers expect a timeval in canonical form and treat a negative timeout as
already expired. This breaks the legitimate usage of alarm() with a
timeout value > INT_MAX seconds.For 32 bit machines it is therefor necessary to limit the internal seconds
value to avoid API breakage. Instead of doing this in all implementations
of sys_alarm the duplicated sys_alarm code is moved into a common function
in itimer.cSigned-off-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Mar, 2006
2 commits
-
Make the softlockup detector purely timer-interrupt driven, removing
softirq-context (timer) dependencies. This means that if the softlockup
watchdog triggers, it has truly observed a longer than 10 seconds
scheduling delay of a SCHED_FIFO prio 99 task.(the patch also turns off the softlockup detector during the initial bootup
phase and does small style fixes)Signed-off-by: Ingo Molnar
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With internal Xen-enabled kernels we see the kernel's static per-cpu data
area exceed the limit of 32k on x86-64, and even native x86-64 kernels get
fairly close to that limit. I generally question whether it is reasonable
to have data structures several kb in size allocated as per-cpu data when
the space there is rather limited.The biggest arch-independent consumer is tvec_bases (over 4k on 32-bit
archs, over 8k on 64-bit ones), which now gets converted to use dynamically
allocated memory instead.Signed-off-by: Jan Beulich
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Mar, 2006
1 commit
-
The pointer to the current time interpolator and the current list of time
interpolators are typically only changed during bootup. Adding
__read_mostly takes them away from possibly hot cachelines.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Mar, 2006
2 commits
-
Add a compiler barrier so that we don't read jiffies before updating
jiffies_64.Signed-off-by: Atsushi Nemoto
Cc: Ralf Baechle
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Also from Thomas Gleixner
Function next_timer_interrupt() got broken with a recent patch
6ba1b91213e81aa92b5cf7539f7d2a94ff54947c as sys_nanosleep() was moved to
hrtimer. This broke things as next_timer_interrupt() did not check hrtimer
tree for next event.Function next_timer_interrupt() is needed with dyntick (CONFIG_NO_IDLE_HZ,
VST) implementations, as the system can be in idle when next hrtimer event
was supposed to happen. At least ARM and S390 currently use
next_timer_interrupt().Signed-off-by: Thomas Gleixner
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Mar, 2006
1 commit
-
On some platforms readq performs additional work to make sure I/O is done
in a coherent way. This is not needed for time retrieval as done by the
time interpolator. So we can use readq_relaxed instead which will improve
performance.It affects sparc64 and ia64 only. Apparently it makes a significant
difference on ia64.Signed-off-by: Christoph Lameter
Cc: john stultz
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Feb, 2006
1 commit
-
This provides an interface for arch code to find out how many
nanoseconds are going to be added on to xtime by the next call to
do_timer. The value returned is a fixed-point number in 52.12 format
in nanoseconds. The reason for this format is that it gives the
full precision that the timekeeping code is using internally.The motivation for this is to fix a problem that has arisen on 32-bit
powerpc in that the value returned by do_gettimeofday drifts apart
from xtime if NTP is being used. PowerPC is now using a lockless
do_gettimeofday based on reading the timebase register and performing
some simple arithmetic. (This method of getting the time is also
exported to userspace via the VDSO.) However, the factor and offset
it uses were calculated based on the nominal tick length and weren't
being adjusted when NTP varied the tick length.Note that 64-bit powerpc has had the lockless do_gettimeofday for a
long time now. It also had an extremely hairy routine that got called
from the 32-bit compat routine for adjtimex, which adjusted the
factor and offset according to what it thought the timekeeping code
was going to do. Not only was this only called if a 32-bit task did
adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
duplicating computations from kernel/timer.c and it wasn't clear that
it was (still) correct.The simple solution is to ask the timekeeping code how long the
current jiffy will be on each timer interrupt, after calling
do_timer. If this jiffy will be a different length from the last one,
we then need to compute new values for the factor and offset used in
the lockless do_gettimeofday. In this way we can keep xtime and
do_gettimeofday in sync, even when NTP is varying the tick length.Note that when adjtimex varies the tick length, it almost always
introduces the variation from the next tick on. The only case I could
see where adjtimex would vary the length of the current tick is when
an old-style adjtime adjustment is being cancelled. (It's not clear
to me why the adjustment has to be cancelled immediately rather than
from the next tick on.) Thus I don't see any real need for a hook in
adjtimex; the rare case of an old-style adjustment being cancelled can
be fixed up at the next tick.Signed-off-by: Paul Mackerras
Acked-by: john stultz
Signed-off-by: Linus Torvalds
08 Feb, 2006
1 commit
-
Signed-off-by: Al Viro