17 Feb, 2007
31 commits
-
Use mask_ack_irq() where possible.
Signed-off-by: Jan Beulich
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix kernel-doc warnings in IRQ management.
Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Never mask interrupts immediately upon request. Disabling interrupts in
high-performance codepaths is rare, and on the other hand this change could
recover lost edges (or even other types of lost interrupts) by conservatively
only masking interrupts after they happen. (NOTE: with this change the
highlevel irq-disable code still soft-disables this IRQ line - and if such an
interrupt happens then the IRQ flow handler keeps the IRQ masked.)Mark i8529A controllers as 'never loses an edge'.
Signed-off-by: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use RCU to avoid the need to acquire tasklist_lock in the single-threaded
case of clock_gettime(). It still acquires tasklist_lock when for a
(potentially multithreaded) process. This change allows realtime
applications to frequently monitor CPU consumption of individual tasks, as
requested (and now deployed) by some off-list users.This has been in Ingo Molnar's -rt patchset since late 2005 with no
problems reported, and tests successfully on 2.6.20-rc6, so I believe that
it is long-since ready for mainline adoption.[paulmck@linux.vnet.ibm.com: fix exit()/posix_cpu_clock_get() race spotted by Oleg]
Signed-off-by: Paul E. McKenney
Signed-off-by: Ingo Molnar
Cc: Thomas Gleixner
Cc: john stultz
Cc: Roman Zippel
Cc: Oleg Nesterov
Signed-off-by: Paul E. McKenney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In preparation for the x86_64 generic time conversion, this patch splits out
TSC and HPET related code from arch/x86_64/kernel/time.c into respective
hpet.c and tsc.c files.[akpm@osdl.org: fix printk timestamps]
[akpm@osdl.org: cleanup]
Signed-off-by: John Stultz
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Andi Kleen
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Provides generic infrastructure for vsyscall-gtod.
[akpm@osdl.org: cleanup]
Signed-off-by: John Stultz
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Andi Kleen
Cc: Roman ZippelSigned-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
add /proc/timer_list, which prints all currently pending (high-res) timers,
all clock-event sources and their parameters in a human-readable form.Sample output:
Timer List Version: v0.1
HRTIMER_MAX_CLOCK_BASES: 2
now at 4246046273872 nsecscpu: 0
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1273998312645738432 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: , hrtimer_sched_tick, hrtimer_stop_sched_tick, swapper/0
# expires at 4246432689566 nsecs [in 386415694 nsecs]
#1: , hrtimer_wakeup, do_nanosleep, pcscd/2050
# expires at 4247018194689 nsecs [in 971920817 nsecs]
#2: , hrtimer_wakeup, do_nanosleep, irqbalance/1909
# expires at 4247351358392 nsecs [in 1305084520 nsecs]
#3: , hrtimer_wakeup, do_nanosleep, crond/2157
# expires at 4249097614968 nsecs [in 3051341096 nsecs]
#4: , it_real_fn, do_setitimer, syslogd/1888
# expires at 4251329900926 nsecs [in 5283627054 nsecs]
.expires_next : 4246432689566 nsecs
.hres_active : 1
.check_clocks : 0
.nr_events : 31306
.idle_tick : 4246020791890 nsecs
.tick_stopped : 1
.idle_jiffies : 986504
.idle_calls : 40700
.idle_sleeps : 36014
.idle_entrytime : 4246019418883 nsecs
.idle_sleeptime : 4178181972709 nsecscpu: 1
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1273998312645738432 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: , hrtimer_sched_tick, hrtimer_restart_sched_tick, swapper/0
# expires at 4246050084568 nsecs [in 3810696 nsecs]
#1: , hrtimer_wakeup, do_nanosleep, atd/2227
# expires at 4261010635003 nsecs [in 14964361131 nsecs]
#2: , hrtimer_wakeup, do_nanosleep, smartd/2332
# expires at 5469485798970 nsecs [in 1223439525098 nsecs]
.expires_next : 4246050084568 nsecs
.hres_active : 1
.check_clocks : 0
.nr_events : 24043
.idle_tick : 4246046084568 nsecs
.tick_stopped : 0
.idle_jiffies : 986510
.idle_calls : 26360
.idle_sleeps : 22551
.idle_entrytime : 4246043874339 nsecs
.idle_sleeptime : 4170763761184 nsecstick_broadcast_mask: 00000003
event_broadcast_mask: 00000001CPU#0's local event device:
Clock Event Device: lapic
capabilities: 0000000e
max_delta_ns: 807385544
min_delta_ns: 1443
mult: 44624025
shift: 32
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt
.installed: 1
.expires: 4246432689566 nsecsCPU#1's local event device:
Clock Event Device: lapic
capabilities: 0000000e
max_delta_ns: 807385544
min_delta_ns: 1443
mult: 44624025
shift: 32
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt
.installed: 1
.expires: 4246050084568 nsecsClock Event Device: hpet
capabilities: 00000007
max_delta_ns: 2147483647
min_delta_ns: 3352
mult: 61496110
shift: 32
set_next_event: hpet_next_event
set_mode: hpet_set_mode
event_handler: handle_nextevt_broadcastSigned-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add /proc/timer_stats support: debugging feature to profile timer expiration.
Both the starting site, process/PID and the expiration function is captured.
This allows the quick identification of timer event sources in a system.Sample output:
# echo 1 > /proc/timer_stats
# cat /proc/timer_stats
Timer Stats Version: v0.1
Sample period: 4.010 s
24, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
11, 0 swapper sk_reset_timer (tcp_delack_timer)
6, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
17, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
4, 2050 pcscd do_nanosleep (hrtimer_wakeup)
5, 4179 sshd sk_reset_timer (tcp_write_timer)
4, 2248 yum-updatesd schedule_timeout (process_timeout)
18, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
3, 0 swapper sk_reset_timer (tcp_delack_timer)
1, 1 swapper neigh_table_init_no_netlink (neigh_periodic_timer)
2, 1 swapper e1000_up (e1000_watchdog)
1, 1 init schedule_timeout (process_timeout)
100 total events, 25.24 events/sec[ cleanups and hrtimers support from Thomas Gleixner ]
[bunk@stusta.de: nr_entries can become static]
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
Cc: john stultz
Cc: Roman Zippel
Cc: Andi Kleen
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix potential setitimer DoS with high-res timers by pushing itimer rearm
processing to process context.[Fixes from: Ingo Molnar ]
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Implement high resolution timers on top of the hrtimers infrastructure and the
clockevents / tick-management framework. This provides accurate timers for
all hrtimer subsystem users.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With Ingo Molnar
Add functions to provide dynamic ticks and high resolution timers. The code
which keeps track of jiffies and handles the long idle periods is shared
between tick based and high resolution timer based dynticks. The dyntick
functionality can be disabled on the kernel commandline. Provide also the
infrastructure to support high resolution timers.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With Ingo Molnar
Add broadcast functionality, so per cpu clock event devices can be registered
as dummy devices or switched from/to broadcast on demand. The broadcast
function distributes the events via the broadcast function of the clock event
device. This is primarily designed to replace the switch apic timer to / from
IPI in power states, where the apic stops.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With Ingo Molnar
The tick-management code is the first user of the clockevents layer. It takes
clock event devices from the clock events core and uses them to provide the
periodic tick.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Architectures register their clock event devices, in the clock events core.
Users of the clockevents core can get clock event devices for their use. The
clockevents core code provides notification mechanisms for various clock
related management events.This allows to control the clock event devices without the architectures
having to worry about the details of function assignment. This is also a
preliminary for high resolution timers and dynamic ticks to allow the core
code to control the clock functionality without intrusive changes to the
architecture code.[Fixes-by: Ingo Molnar ]
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: Roman Zippel
Cc: john stultz
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Reintroduce ktimers feature "optimized away" by the ktimers review process:
remove the curr_timer pointer from the cpu-base and use the hrtimer state.No functional changes.
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: Roman Zippel
Cc: john stultz
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Reintroduce ktimers feature "optimized away" by the ktimers review process:
multiple hrtimer states to enable the running of hrtimers without holding the
cpu-base-lock.(The "optimized" rbtree hack carried only 2 states worth of information and we
need 4 for high resolution timers and dynamic ticks.)No functional changes.
Build-fixes-from: Andrew Morton
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: Roman Zippel
Cc: john stultz
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Improve kernel/hrtimers.c locking: use a per-CPU base with a lock to control
locking of all clocks belonging to a CPU. This simplifies code that needs to
lock all clocks at once. This makes life easier for high-res timers and
dyntick.No functional changes.
[ optimization change from Andrew Morton ]
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
- hrtimers did not use the hrtimer_restart enum and relied on the implict
int representation. Fix the prototypes and the functions using the enums.
- Use seperate name spaces for the enumerations
- Convert hrtimer_restart macro to inline function
- Add commentsNo functional changes.
[akpm@osdl.org: fix input driver]
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Cc: Dmitry Torokhov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For CONFIG_NO_HZ we need to calculate the next timer wheel event based on a
given jiffie value. Extend the existing code to allow the extra 'now'
argument. Provide a compability function for the existing implementations to
call the function with now == jiffies. (This also solves the racyness of the
original code vs. jiffies changing during the iteration.)No functional changes to existing users of this infrastructure.
[ remove WARN_ON() that triggered on s390, by Carsten Otte ]
[ made new helper static, Adrian Bunk ]
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When searching for the next pending timer in the timer wheel we need to take
the cascade into account. The current code has several problems:1. it looks into the previous cascade
2. it ignores a pending cascade
3. it ignores multiple cascadesChange the cascade lookup, so it calculates the array index from the point of
the next cascade and always look at the cascade buckets, when the cascade is
pending, i.e. gets executed in the next timer softirq. When multiple
cascades are pending, then lookup the next buckets too.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Uninline irq_enter(). [dynticks adds more stuff to it]
No functional changes.
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The TSC needs to be verified against another clocksource. Instead of using
hardwired assumptions of available hardware, provide a generic verification
mechanism. The verification uses the best available clocksource and handles
the usability for high resolution timers / dynticks of the clocksource which
needs to be verified.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The clocksource code allows direct updates of the rating of a given
clocksource now. Change TSC unstable tracking to use this interface and
remove the update callback.Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Using a flag filed allows to encode more than one information into a variable.
Preparatory patch for the generic clocksource verification.[mingo@elte.hu: convert vmitime.c to the new clocksource flag]
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Enqueue clocksources in rating order to make selection of the clocksource
easier. Also check the match with an user override at enqueue time.Preparatory patch for the generic clocksource verification.
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Persistent clock support: do proper timekeeping across suspend/resume.
[bunk@stusta.de: cleanup]
Signed-off-by: John Stultz
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: Roman Zippel
Cc: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix multiple conversion bugs in msecs_to_jiffies().
The main problem is that this condition:
if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
overflows if HZ is smaller than 1000!
This change is user-visible: for HZ=250 SUS-compliant poll()-timeout
value of -20 is mistakenly converted to 'immediate timeout'.(The new dyntick code also triggered this, that's how we noticed.)
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There are loads of fat functions hidden in jiffies.h. Uninline them. No code
changes.[jeremy@goop.org: export fix]
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
Cc: john stultz
Cc: Roman Zippel
Cc: Jeremy Fitzhardinge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Distangle the NTP update from HZ. This is necessary for dynamic tick enabled
kernels.Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Provide funtions to:
- check, whether an interrupt can set the affinity
- pin the interrupt to a given cpuNecessary for the ability to setup clocksources more flexible (e.g. use the
different HPET channels per CPU)[akpm@osdl.org: alpha build fix]
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add a flag so we can prevent the irq balancing of an interrupt. Move the
bits, so we have room for more :)Necessary for the ability to setup clocksources more flexible (e.g. use the
different HPET channels per CPU)Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Feb, 2007
9 commits
-
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits)
[PATCH] x86-64: Remove mk_pte_phys()
[PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386
[PATCH] i386: fix 32-bit ioctls on x64_32
[PATCH] x86: Unify pcspeaker platform device code between i386/x86-64
[PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.
[PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c
[PATCH] i386: Move mce_disabled to asm/mce.h
[PATCH] i386: paravirt unhandled fallthrough
[PATCH] x86_64: Wire up compat epoll_pwait
[PATCH] x86: Don't require the vDSO for handling a.out signals
[PATCH] i386: Fix Cyrix MediaGX detection
[PATCH] i386: Fix warning in cpu initialization
[PATCH] i386: Fix warning in microcode.c
[PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
[PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo
[PATCH] i386: Remove fastcall in paravirt.[ch]
[PATCH] x86-64: Fix wrong gcc check in bitops.h
[PATCH] x86-64: survive having no irq mapping for a vector
[PATCH] i386: geode configuration fixes
[PATCH] i386: add option to show more code in oops reports
... -
Add a parent entry into the ctl_table so you can walk the list of parents and
find the entire path to a ctl_table entry.Signed-off-by: Eric W. Biederman
Cc: Stephen Smalley
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With this change the sysctl inodes can be cached and nothing needs to be done
when removing a sysctl table.For a cost of 2K code we will save about 4K of static tables (when we remove
de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
about half that on a 32bit arch.The speed feels about the same, even though we can now cache the sysctl
dentries :(We get the core advantage that we don't need to have a 1 to 1 mapping between
ctl table entries and proc files. Making it possible to have /proc/sys vary
depending on the namespace you are in. The currently merged namespaces don't
have an issue here but the network namespace under /proc/sys/net needs to have
different directories depending on which network adapters are visible. By
simply being a cache different directories being visible depending on who you
are is trivial to implement.[akpm@osdl.org: fix uninitialised var]
[akpm@osdl.org: fix ARM build]
[bunk@stusta.de: make things static]
Signed-off-by: Eric W. Biederman
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The current logic to walk through the list of sysctl table headers is slightly
painful and implement in a way it cannot be used by code outside sysctl.cI am in the process of implementing a version of the sysctl proc support that
instead of using the proc generic non-caching monster, just uses the existing
sysctl data structure as backing store for building the dcache entries and for
doing directory reads. To use the existing data structures however I need a
way to get at them.[akpm@osdl.org: warning fix]
Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.Signed-off-by: Eric W. Biederman
Acked-by: Ralf Baechle
Acked-by: Martin Schwidefsky
Cc: Russell King
Cc: David Howells
Cc: "Luck, Tony"
Cc: Ralf Baechle
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Andi Kleen
Cc: Jens Axboe
Cc: Corey Minyard
Cc: Neil Brown
Cc: "John W. Linville"
Cc: James Bottomley
Cc: Jan Kara
Cc: Trond Myklebust
Cc: Mark Fasheh
Cc: David Chinner
Cc: "David S. Miller"
Cc: Patrick McHardy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
parse_table has support for calling a strategy routine when descending into a
directory. To date no one has used this functionality and the /proc/sys
interface has no analog to it.So no one is using this functionality kill it and make the binary sysctl code
easier to follow.Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There are currently no users in the kernel for CTL_ANY and it only has effect
on the binary interface which is practically unused.So this complicates sysctl lookups for no good reason so just remove it.
Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
binfmt_misc has a mount point in the middle of the sysctl and that mount point
is created as a proc_generic directory.Doing it that way gets in the way of cleaning up the sysctl proc support as it
continues the existence of a horrible hack. So instead simply create the
directory as an ordinary sysctl directory. At least that removes the magic
special case.[akpm@osdl.org: warning fix]
Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds