14 Jan, 2008
1 commit
-
task_ppid_nr_ns is called in three places. One of these should never
have called it. In the other two, using it broke the existing
semantics. This was presumably accidental. If the function had not
been there, it would have been much more obvious to the eye that those
patches were changing the behavior. We don't need this function.In task_state, the pid of the ptracer is not the ppid of the ptracer.
In do_task_stat, ppid is the tgid of the real_parent, not its pid.
I also moved the call outside of lock_task_sighand, since it doesn't
need it.In sys_getppid, ppid is the tgid of the real_parent, not its pid.
Signed-off-by: Roland McGrath
Signed-off-by: Linus Torvalds
19 Dec, 2007
1 commit
-
This patch fixes the following section mismatches with CONFIG_HOTPLUG=n,
CONFIG_HOTPLUG_CPU=y:...
WARNING: vmlinux.o(.text+0x41cd3): Section mismatch: reference to .init.data:tvec_base_done.22610 (between 'timer_cpu_notify' and 'run_timer_softirq')
WARNING: vmlinux.o(.text+0x41d67): Section mismatch: reference to .init.data:tvec_base_done.22610 (between 'timer_cpu_notify' and 'run_timer_softirq')
...Signed-off-by: Adrian Bunk
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
10 Nov, 2007
1 commit
-
Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the
deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been
broken on powerpc, because we end up counting user time twice: once in
timer_interrupt() and once in update_process_times().This fixes the problem by pulling the code in update_process_times
that updates utime and stime into a separate function called
account_process_tick. If CONFIG_VIRT_CPU_ACCOUNTING is not defined,
there is a version of account_process_tick in kernel/timer.c that
simply accounts a whole tick to either utime or stime as before. If
CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to
implement account_process_tick.This also lets us simplify the s390 code a bit; it means that the s390
timer interrupt can now call update_process_times even when
CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a
suitable account_process_tick().account_process_tick() now takes the task_struct * as an argument.
Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING.Signed-off-by: Paul Mackerras
Signed-off-by: Ingo Molnar
06 Nov, 2007
1 commit
-
Signed-off-by: Li Zefan
Cc: Thomas Gleixner
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Oct, 2007
1 commit
-
This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov
Signed-off-by: Alexey Dobriyan
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
19 Oct, 2007
2 commits
-
This adds items to the taststats struct to account for user and system
time based on scaling the CPU frequency and instruction issue rates.Adds account_(user|system)_time_scaled callbacks which architectures
can use to account for time using this mechanism.Signed-off-by: Michael Neuling
Cc: Balbir Singh
Cc: Jay Lan
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Daniel Walker
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
21 Jul, 2007
2 commits
-
Remove time_interpolator code (This is generic code, but
only user was ia64. It has been superseded by the
CONFIG_GENERIC_TIME code).Signed-off-by: Bob Picco
Signed-off-by: John Stultz
Signed-off-by: Peter Keilty
Signed-off-by: Tony Luck
20 Jul, 2007
1 commit
-
Signed-off-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Jul, 2007
1 commit
-
kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing
variant in the past. But with __GFP_ZERO it is possible now to do zeroing
while allocating.Use __GFP_ZERO to remove the explicit clearing of memory via memset whereever
we can.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jul, 2007
2 commits
-
Add a flag in /proc/timer_stats to indicate deferrable timers. This will
let developers/users to differentiate between types of tiemrs in
/proc/timer_stats.Deferrable timer and normal timer will appear in /proc/timer_stats as below.
10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)Also version of timer_stats changes from v0.1 to v0.2
Signed-off-by: Venkatesh Pallipadi
Acked-by: Ingo Molnar
Cc: Thomas Gleixner
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused uptime not to increase
during suspend. This may cause confusion so I restore the old behaviour by
using the boot based time instead of monotonic for uptime.Signed-off-by: Tomas Janousek
Acked-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 May, 2007
1 commit
-
get_next_timer_interrupt() returns a delta of (LONG_MAX > 1) in case
there is no timer pending. On 64 bit machines this results in a
multiplication overflow in tick_nohz_stop_sched_tick().Reported by: Dave Miller
Make the return value a constant and limit the return value to a 32 bit
value.When the max timeout value is returned, we can safely stop the tick
timer device. The max jiffies delta results in a 12 days timeout for
HZ=1000.In the long term the get_next_timer_interrupt() code needs to be
reworked to return ktime instead of jiffies, but we have to wait until
the last users of the original NO_IDLE_HZ code are converted.Signed-off-by: Thomas Gleixner
Acked-off-by: David S. Miller
Signed-off-by: Linus Torvalds
15 May, 2007
1 commit
-
The time keeping code move to kernel/time/timekeeping.c broke the
clocksource resume logic patch, which got applied to the old file by a
fuzzy application. Fix it up and move the clocksource_resume() call to
the appropriate place.Signed-off-by: Thomas Gleixner
[ tssk, tssk, everybody should use --fuzz=0 ]
Signed-off-by: Linus Torvalds
11 May, 2007
1 commit
-
On 09-05-2007 21:10, Pallipadi, Venkatesh wrote:
...
> On a 64 bit system, converting pointer to int causes unnecessary
> compiler warning, and intermediate long conversion was to avoid that.
> I will have to rephrase my comment to remove 32 bit value and use int,
> as that is what the function returns.So, this patch reverts all changes done by my previous patch.
I apologize for my wrong comment about "logical error" here.
Cc: "Pallipadi, Venkatesh"
Cc: Satyam Sharma
Cc: Oleg Nesterov
Signed-off-by: Jarek Poplawski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 May, 2007
3 commits
-
We need to make sure that the clocksources are resumed, when timekeeping is
resumed. The current resume logic does not guarantee this.Add a resume function pointer to the clocksource struct, so clocksource
drivers which need to reinitialize the clocksource can provide a resume
function.Add a resume function, which calls the maybe available clocksource resume
functions and resets the watchdog function, so a stable TSC can be used
accross suspend/resume.Signed-off-by: Thomas Gleixner
Cc: john stultz
Cc: Andi Kleen
Cc: Ingo Molnar
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki
Cc: Gautham R Shenoy
Cc: Pavel Machek
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Jarek Poplawski
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 May, 2007
3 commits
-
There are many places in the kernel where the construction like
foo = list_entry(head->next, struct foo_struct, list);
are used.
The code might look more descriptive and neat if using the macrolist_first_entry(head, type, member) \
list_entry((head)->next, type, member)Here is the macro itself and the examples of its usage in the generic code.
If it will turn out to be useful, I can prepare the set of patches to
inject in into arch-specific code, drivers, networking, etc.Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Cc: Randy Dunlap
Cc: Andi Kleen
Cc: Zach Brown
Cc: Davide Libenzi
Cc: John McCutchan
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: john stultz
Cc: Ram Pai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move the timekeeping code out of kernel/timer.c and into
kernel/time/timekeeping.c. I made no cleanups or other changes in transit.[akpm@linux-foundation.org: build fix]
Signed-off-by: John Stultz
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduce a new flag for timers - deferrable: Timers that work normally
when system is busy. But, will not cause CPU to come out of idle (just to
service this timer), when CPU is idle. Instead, this timer will be
serviced when CPU eventually wakes up with a subsequent non-deferrable
timer.The main advantage of this is to avoid unnecessary timer interrupts when
CPU is idle. If the routine currently called by a timer can wait until
next event without any issues, this new timer can be used to setup timer
event for that routine. This, with dynticks, allows CPUs to be lazy,
allowing them to stay in idle for extended period of time by reducing
unnecesary wakeup and thereby reducing the power consumption.This patch:
Builds this new timer on top of existing timer infrastructure. It uses
last bit in 'base' pointer of timer_list structure to store this deferrable
timer flag. __next_timer_interrupt() function skips over these deferrable
timers when CPU looks for next timer event for which it has to wake up.This is exported by a new interface init_timer_deferrable() that can be
called in place of regular init_timer().[akpm@linux-foundation.org: Privatise a #define]
Signed-off-by: Venkatesh Pallipadi
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Oleg Nesterov
Cc: Dave Jones
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Apr, 2007
1 commit
-
Export try_to_del_timer_sync() for use by the AF_RXRPC module.
Signed-off-by: David Howells
Signed-off-by: David S. Miller
08 Apr, 2007
1 commit
-
Soeren Sonnenburg reported that upon resume he is getting
this backtrace:[] smp_apic_timer_interrupt+0x57/0x90
[] retrigger_next_event+0x0/0xb0
[] apic_timer_interrupt+0x28/0x30
[] retrigger_next_event+0x0/0xb0
[] __kfifo_put+0x8/0x90
[] on_each_cpu+0x35/0x60
[] clock_was_set+0x18/0x20
[] timekeeping_resume+0x7c/0xa0
[] __sysdev_resume+0x11/0x80
[] sysdev_resume+0x47/0x80
[] device_power_up+0x5/0x10it turns out that on resume we mistakenly re-enable interrupts too
early. Do the timer retrigger only on the current CPU.Signed-off-by: Ingo Molnar
Acked-by: Thomas Gleixner
Acked-by: Soeren Sonnenburg
Signed-off-by: Linus Torvalds
26 Mar, 2007
1 commit
-
The rework of next_timer_interrupt() fixed the timer wheel bugs, but
invented a rounding error versus the next hrtimer event. This is caused
by the conversion of the hrtimer internal representation to relative
jiffies.This causes bug #8100:
http://bugzilla.kernel.org/show_bug.cgi?id=8100next_timer_interrupt() returns "now" in such a case and causes the code
in tick_nohz_stop_sched_tick() to trigger the timer softirq, which is
bogus as no timer is due for expiry. This results in an endless context
switching between idle and ksoftirqd until a timer is due for expiry.Modify the hrtimer evaluation so that, it returns now + 1, when the
conversion results in a delta < 1 jiffie.It's confirmed to resolve bug #8100
Reported-by: Emil Karlson
Signed-off-by: Thomas Gleixner
Signed-off-by: Linus Torvalds
07 Mar, 2007
2 commits
-
I've only seen this on x86_64.
The vsyscall state only gets updated when a timer interrupts comes in. So
if the time is set long before the next timer, there will be a period when
a gettimeofday() won't reflect the correct time.I added an explicit update_vsyscall() during the settimeofday(), that way
the vsyscall state doesn't get stale.Signed-off-by: Daniel Walker
Cc: Thomas Gleixner
Acked-by: Ingo Molnar
Acked-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The programming of periodic tick devices needs to be saved/restored
across suspend/resume - otherwise we might end up with a system coming
up that relies on getting a PIT (or HPET) interrupt, while those devices
default to 'no interrupts' after powerup. (To confuse things it worked
to a certain degree on some systems because the lapic gets initialized
as a side-effect of SMP bootup.)This suspend / resume thing was dropped unintentionally during the
last-minute -mm code reshuffling.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Signed-off-by: Linus Torvalds
05 Mar, 2007
1 commit
-
Doing something like this on a two cpu system
# echo 0 > /sys/devices/system/cpu/cpu0/online
# echo 1 > /sys/devices/system/cpu/cpu0/online
# echo 0 > /sys/devices/system/cpu/cpu1/onlinewill give me this:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.21-rc2-g562aa1d4-dirty #7
-------------------------------------------------------
bash/1282 is trying to acquire lock:
(&cpu_base->lock_key){.+..}, at: [] hrtimer_cpu_notify+0xc6/0x240but task is already holding lock:
(&cpu_base->lock_key#2){.+..}, at: [] hrtimer_cpu_notify+0xbc/0x240which lock already depends on the new lock.
This happens because we have the following code in kernel/hrtimer.c:
migrate_hrtimers(int cpu)
[...]
old_base = &per_cpu(hrtimer_bases, cpu);
new_base = &get_cpu_var(hrtimer_bases);
[...]
spin_lock(&new_base->lock);
spin_lock(&old_base->lock);Which means the spinlocks are taken in an order which depends on which cpu
gets shut down from which other cpu. Therefore lockdep complains that there
might be an ABBA deadlock. Since migrate_hrtimers() gets only called on
cpu hotplug it's safe to assume that it isn't executed concurrently on aThe same problem exists in kernel/timer.c: migrate_timers().
As pointed out by Christian Borntraeger one possible solution to avoid
the locking order complaints would be to make sure that the locks are
always taken in the same order. E.g. by taking the lock of the cpu with
the lower number first.To achieve this we introduce two new spinlock functions double_spin_lock
and double_spin_unlock which lock or unlock two locks in a given order.Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Roman Zippel
Cc: John Stultz
Cc: Christian Borntraeger
Cc: Martin Schwidefsky
Signed-off-by: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Mar, 2007
2 commits
-
Fix kernel-doc warnings in 2.6.20-git15 (lib/, mm/, kernel/, include/).
Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Daniel Walker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Feb, 2007
10 commits
-
Provides generic infrastructure for vsyscall-gtod.
[akpm@osdl.org: cleanup]
Signed-off-by: John Stultz
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Andi Kleen
Cc: Roman ZippelSigned-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add /proc/timer_stats support: debugging feature to profile timer expiration.
Both the starting site, process/PID and the expiration function is captured.
This allows the quick identification of timer event sources in a system.Sample output:
# echo 1 > /proc/timer_stats
# cat /proc/timer_stats
Timer Stats Version: v0.1
Sample period: 4.010 s
24, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
11, 0 swapper sk_reset_timer (tcp_delack_timer)
6, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
17, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
4, 2050 pcscd do_nanosleep (hrtimer_wakeup)
5, 4179 sshd sk_reset_timer (tcp_write_timer)
4, 2248 yum-updatesd schedule_timeout (process_timeout)
18, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
3, 0 swapper sk_reset_timer (tcp_delack_timer)
1, 1 swapper neigh_table_init_no_netlink (neigh_periodic_timer)
2, 1 swapper e1000_up (e1000_watchdog)
1, 1 init schedule_timeout (process_timeout)
100 total events, 25.24 events/sec[ cleanups and hrtimers support from Thomas Gleixner ]
[bunk@stusta.de: nr_entries can become static]
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
Cc: john stultz
Cc: Roman Zippel
Cc: Andi Kleen
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With Ingo Molnar
Add functions to provide dynamic ticks and high resolution timers. The code
which keeps track of jiffies and handles the long idle periods is shared
between tick based and high resolution timer based dynticks. The dyntick
functionality can be disabled on the kernel commandline. Provide also the
infrastructure to support high resolution timers.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Architectures register their clock event devices, in the clock events core.
Users of the clockevents core can get clock event devices for their use. The
clockevents core code provides notification mechanisms for various clock
related management events.This allows to control the clock event devices without the architectures
having to worry about the details of function assignment. This is also a
preliminary for high resolution timers and dynamic ticks to allow the core
code to control the clock functionality without intrusive changes to the
architecture code.[Fixes-by: Ingo Molnar ]
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: Roman Zippel
Cc: john stultz
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For CONFIG_NO_HZ we need to calculate the next timer wheel event based on a
given jiffie value. Extend the existing code to allow the extra 'now'
argument. Provide a compability function for the existing implementations to
call the function with now == jiffies. (This also solves the racyness of the
original code vs. jiffies changing during the iteration.)No functional changes to existing users of this infrastructure.
[ remove WARN_ON() that triggered on s390, by Carsten Otte ]
[ made new helper static, Adrian Bunk ]
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When searching for the next pending timer in the timer wheel we need to take
the cascade into account. The current code has several problems:1. it looks into the previous cascade
2. it ignores a pending cascade
3. it ignores multiple cascadesChange the cascade lookup, so it calculates the array index from the point of
the next cascade and always look at the cascade buckets, when the cascade is
pending, i.e. gets executed in the next timer softirq. When multiple
cascades are pending, then lookup the next buckets too.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The TSC needs to be verified against another clocksource. Instead of using
hardwired assumptions of available hardware, provide a generic verification
mechanism. The verification uses the best available clocksource and handles
the usability for high resolution timers / dynticks of the clocksource which
needs to be verified.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The clocksource code allows direct updates of the rating of a given
clocksource now. Change TSC unstable tracking to use this interface and
remove the update callback.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Using a flag filed allows to encode more than one information into a variable.
Preparatory patch for the generic clocksource verification.[mingo@elte.hu: convert vmitime.c to the new clocksource flag]
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Persistent clock support: do proper timekeeping across suspend/resume.
[bunk@stusta.de: cleanup]
Signed-off-by: John Stultz
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: Roman Zippel
Cc: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds