02 Feb, 2006
9 commits
This function is neither used nor has any real contents.
Signed-off-by: Adrian Bunk
Acked-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
The expiry time for relative timers with SIGEV_NONE set was never
updated to the correct value. Pointed out by George Anzinger.
Signed-off-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
At some point we added credits to people who actively helped to bring
k/hr-timers along. This was lost in the big code revamp. Add it back.
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Clean up the interface to hrtimers by changing the init code to pass the mode
as well as the clock. This allows the init code to select the correct base and
eliminates extra timer re-init code in posix-timers. We also simplify the
restart interface that nanosleep uses.
Signed-off-by: George Anzinger
Signed-off-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
From: Steven Rostedt <rostedt@goodmis.org>
CPU0 expires a posix-timer and runs the callback function. The signal is
queued.
After releasing the posix-timer lock and before returning to hrtimer_run_queue,
CPU0 gets interrupted. CPU1 delivers the queued signal and rearms the timer.
CPU0 comes back to hrtimer_run_queue and sets the timer state to expired.
The next modification of the timer can result in an oops, because the state
information is wrong.
Keep track of state = RUNNING and check whether the state has been changed in the return
path of hrtimer_run_queue. In case the state has been changed, ignore a
restart request and do not touch the state variable.
Signed-off-by: Steven Rostedt
Signed-off-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
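The RUNNING-state check described above can be sketched as a single-threaded userspace model. This is not the hrtimer code: struct sketch_timer, run_expired_timer() and both callbacks are hypothetical names, and rearmed_meanwhile() merely simulates CPU1 rearming the timer while CPU0 is still in the callback path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timer states, loosely following the commit text. */
enum sketch_state { ST_INACTIVE, ST_PENDING, ST_RUNNING, ST_EXPIRED };

struct sketch_timer {
	enum sketch_state state;
	/* Returns true to request a restart of the timer. */
	bool (*callback)(struct sketch_timer *t);
};

/* Plain callback: does its work and asks for no restart. */
static bool plain_callback(struct sketch_timer *t)
{
	(void)t;
	return false;
}

/* Simulates another CPU rearming the timer while the runner is still
 * inside the callback path: the state changes behind its back. */
static bool rearmed_meanwhile(struct sketch_timer *t)
{
	t->state = ST_PENDING;
	return false;
}

/* Runner with the fix: mark the timer RUNNING, call the callback, and
 * update the state afterwards only if it is still RUNNING. If someone
 * else changed it, ignore any restart request and leave it alone. */
static void run_expired_timer(struct sketch_timer *t)
{
	bool restart;

	t->state = ST_RUNNING;
	restart = t->callback(t);

	if (t->state != ST_RUNNING)
		return;

	t->state = restart ? ST_PENDING : ST_EXPIRED;
}
```

Without the RUNNING check, the rearmed timer in the second case would be overwritten to "expired", which is the stale state the commit describes.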
This resolves bugzilla bug #5617. The old value of the timer was read after the
timer was cancelled, so the remaining time was always zero.
Signed-off-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
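A minimal sketch of the ordering fix, assuming a hypothetical one-shot timer (struct sk_itimer and sk_settime() are illustrative names, not the posix-timers API): the remaining time is read before the timer is cancelled, not after.

```c
#include <assert.h>

/* Hypothetical one-shot timer; times are plain nanosecond counts. */
struct sk_itimer {
	long long expires; /* absolute expiry, 0 when not armed */
};

static long long sk_remaining(const struct sk_itimer *t, long long now)
{
	if (t->expires == 0 || t->expires <= now)
		return 0;
	return t->expires - now;
}

static void sk_cancel(struct sk_itimer *t)
{
	t->expires = 0;
}

/* Re-arm the timer and report the old remaining time. Reading it after
 * sk_cancel() would always yield 0, which is the bug being fixed. */
static long long sk_settime(struct sk_itimer *t, long long now,
			    long long new_expires)
{
	long long old = sk_remaining(t, now); /* read first */

	sk_cancel(t);
	t->expires = new_expires;
	return old;
}
```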
Fix up the conversion of posix-timers to hrtimers.
Signed-off-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
The itimer conversion removed the locking which protects the timer and
variables in the shared signal structure. Steven Rostedt found the problem in
the latest -rt patches.
Signed-off-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Make swsusp use bytes as the image size units, which is needed for future
compatibility.
Signed-off-by: Rafael J. Wysocki
Acked-by: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Feb, 2006
5 commits
I get storms of warnings from local_bh_enable(). Better-tested patches,
please.
Cc: Ingo Molnar
Cc: "Paul E. McKenney"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
rcu_torture_lock is used in a softirq-unsafe manner, but it is also
taken by rcu_torture_cb(), which may execute in softirq-context,
resulting in potential deadlocks.
The fix is to acquire rcu_torture_lock in a softirq-safe manner. With
this fix applied, the rcu-torture code passes validation.
Signed-off-by: Ingo Molnar
Acked-by: Paul E. McKenney
Signed-off-by: Linus Torvalds
RCU task-struct freeing can call free_uid(), which takes
uidhash_lock, while other users of uidhash_lock are softirq-unsafe.
The fix is to always take uidhash_lock in a softirq-safe manner.
Signed-off-by: Ingo Molnar
Acked-by: Paul E. McKenney
Signed-off-by: Linus Torvalds
This reduces the time the migration cost calculations take
during bootup. Based on numbers by Tony Luck.
Signed-off-by: Ingo Molnar
settime() with a NULL timeval is silly but legal.
Noticed by Dave Jones
Signed-off-by: Linus Torvalds
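A sketch of the rule, assuming hypothetical types (struct sk_timespec, sk_settime_checked()): the time argument is validated only when it is non-NULL, and a NULL argument simply leaves the clock untouched instead of being treated as an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel timespec. */
struct sk_timespec {
	long tv_sec;
	long tv_nsec;
};

static int sk_timespec_valid(const struct sk_timespec *ts)
{
	return ts->tv_sec >= 0 && ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L;
}

/* NULL is silly but legal: validate only a non-NULL time, and only
 * a non-NULL time changes the clock (recorded in *clock_changed). */
static int sk_settime_checked(const struct sk_timespec *tv, int *clock_changed)
{
	*clock_changed = 0;
	if (tv != NULL && !sk_timespec_valid(tv))
		return -EINVAL;
	if (tv != NULL)
		*clock_changed = 1; /* a real implementation would set the clock here */
	return 0;
}
```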
19 Jan, 2006
4 commits
EDAC requires a way to scrub memory if an ECC error is found and the chipset
does not do the work automatically. That means rewriting memory locations
atomically with respect to all CPUs _and_ bus masters. That means we can't
use atomic_add(foo, 0) as it gets optimised for non-SMP.
This adds a function to include/asm-foo/atomic.h for the platforms currently
supported which implements a scrub of a mapped block.
It also adjusts the include order in a few other files where atomic.h is included
before types.h, as this now causes an error because atomic_scrub uses u32.
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
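A userspace analogy of the scrub, not the kernel's atomic_scrub(): scrub_block() is a hypothetical helper that rewrites each 32-bit word of a block with a C11 atomic read-modify-write that adds 0. The C11 call stands in for the locked instruction used in kernel code; whether it also serializes against bus masters on a given platform is an assumption this sketch does not check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Rewrite every word in place with an atomic RMW that leaves the value
 * unchanged, so the memory is written back without a torn update. */
static void scrub_block(_Atomic uint32_t *words, size_t count)
{
	for (size_t i = 0; i < count; i++)
		atomic_fetch_add(&words[i], 0u);
}
```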
The TIF_RESTORE_SIGMASK flag allows us to have a generic implementation of
sys_rt_sigsuspend() instead of duplicating it for each architecture. This
provides such an implementation and makes arch/powerpc use it.
It also tidies up the ppc32 sys_sigsuspend() to use TIF_RESTORE_SIGMASK.
Signed-off-by: David Woodhouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Currently, a negative policy argument passed into the
'sys_sched_setscheduler()' system call will return with success. However,
the manpage for 'sys_sched_setscheduler' says:
EINVAL The scheduling policy is not one of the recognized policies, or the
parameter p does not make sense for the policy.
Signed-off-by: Jason Baron
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
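A sketch of the check the fix calls for, assuming the Linux policy numbers (SCHED_NORMAL 0, SCHED_FIFO 1, SCHED_RR 2, and SCHED_BATCH 3 as added elsewhere in this log). sk_check_policy() is a hypothetical helper, not the kernel function: any value outside the recognized set, negative values included, yields EINVAL.

```c
#include <assert.h>
#include <errno.h>

/* Policy numbers as used by Linux; prefixed to avoid clashing with <sched.h>. */
enum { SK_SCHED_NORMAL = 0, SK_SCHED_FIFO = 1, SK_SCHED_RR = 2, SK_SCHED_BATCH = 3 };

/* Reject anything that is not a recognized policy, including negative values. */
static int sk_check_policy(int policy)
{
	if (policy != SK_SCHED_NORMAL && policy != SK_SCHED_FIFO &&
	    policy != SK_SCHED_RR && policy != SK_SCHED_BATCH)
		return -EINVAL;
	return 0;
}
```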
proc support for zone reclaim
This patch creates a proc entry /proc/sys/vm/zone_reclaim_mode that may be
used to override the zone reclaim mode that is determined automatically at
bootup.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jan, 2006
3 commits
Fix the following sparse warnings:
kernel/hrtimer.c:665:34: warning: incorrect type in argument 2 (different address spaces)
kernel/hrtimer.c:665:34: expected void const *from
kernel/hrtimer.c:665:34: got struct timespec [noderef] *
kernel/hrtimer.c:664:2: warning: dereference of noderef expression
Signed-off-by: Ingo Molnar
Signed-off-by: Linus Torvalds
Build kernel/intermodule.c only when required.
Signed-off-by: Adrian Bunk
Cc: Sam Ravnborg
Cc: David Woodhouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Fix a comment which missed an update cycle somewhere.
Signed-off-by: Jonathan Corbet
Signed-off-by: Linus Torvalds
16 Jan, 2006
1 commit
15 Jan, 2006
6 commits
The problem, reported in:
http://bugzilla.kernel.org/show_bug.cgi?id=5859
and by various other email messages and lkml posts, is that the cpuset hook
in the oom (out of memory) code can try to take a cpuset semaphore while
holding the tasklist_lock (a spinlock).
One must not sleep while holding a spinlock.
The fix seems easy enough: move the cpuset semaphore region outside the
tasklist_lock region.
This required a few lines of mechanism to implement. The oom code where
the locking needs to be changed does not have access to the cpuset locks,
which are internal to kernel/cpuset.c only. So I provided a couple more
cpuset interface routines, available to the rest of the kernel, which
simply take and drop the lock needed here (cpuset's callback_sem).
Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Remove useless spin_retry_counter and fix compilation for UP kernels.
Signed-off-by: Martin Schwidefsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30KB to 40KB.
Signed-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar
Acked-by: Jeff Garzik
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Add a new SCHED_BATCH (3) scheduling policy: such tasks are presumed
CPU-intensive, and will acquire a constant +5 priority level penalty. Such a
policy is useful for workloads that are non-interactive, but which do not
want to give up their nice levels. The policy is also useful for workloads
that want a deterministic scheduling policy without interactivity causing
extra preemptions (between that workload's tasks).
Signed-off-by: Ingo Molnar
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
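A rough model of the +5 penalty, assuming the 2.6 O(1) scheduler's priority layout (normal tasks use 100 to 139, nice 0 maps to 120, and a lower number means a higher priority). sk_effective_prio() is a hypothetical simplification: interactive tasks get a sleep-based bonus between -5 and +5, while batch tasks ignore it and always take the fixed +5, so nice still counts but interactivity does not.

```c
#include <assert.h>

/* Assumed 2.6-era constants, prefixed to mark them as illustrative. */
#define SK_MAX_RT_PRIO   100
#define SK_MAX_PRIO      140
#define SK_BATCH_PENALTY 5 /* the "+5 priority level penalty" */

static int sk_effective_prio(int nice, int interactivity_bonus, int is_batch)
{
	int prio = 120 + nice; /* static priority derived from nice */

	if (is_batch)
		prio += SK_BATCH_PENALTY;     /* constant penalty, no bonus */
	else
		prio -= interactivity_bonus;  /* bonus in [-5, +5] */

	/* Clamp to the non-realtime range. */
	if (prio < SK_MAX_RT_PRIO)
		prio = SK_MAX_RT_PRIO;
	if (prio > SK_MAX_PRIO - 1)
		prio = SK_MAX_PRIO - 1;
	return prio;
}
```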
I tried to send the forcedeth maintainer an email, but it came back with:
"The mail address manfreds@colorfullife.com is not read anymore.
Please resent your mail to manfred@ instead of manfreds@."
This patch fixes this.
Signed-off-by: Adrian Bunk
This patch fixes a typo in the dependencies of SOFTWARE_SUSPEND.
This patch is based on a report by
Jean-Luc Leger.
Signed-off-by: Adrian Bunk
Acked-by: Pavel Machek
13 Jan, 2006
3 commits
From: Nick Piggin
Track the last waker CPU, and only consider wakeup-balancing if there's a
match between current waker CPU and the previous waker CPU. This ensures
that there is some correlation between two subsequent wakeup events before
we move the task. Should help random-wakeup workloads on large SMP
systems, by reducing the migration attempts by a factor of nr_cpus.
Signed-off-by: Ingo Molnar
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
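A sketch of the waker-CPU filter, with a hypothetical per-task field (last_waker_cpu) and helper (sk_consider_wake_balance()); the real scheduler code differs. If wakeups come from uniformly random CPUs, two consecutive wakeups match with probability about 1/nr_cpus, which is where the reduction in migration attempts comes from.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task record: remembers which CPU woke it last time. */
struct sk_task {
	int last_waker_cpu;
};

/* Wakeup balancing is considered only when the current waker CPU matches
 * the previous one; the current waker is remembered either way. */
static bool sk_consider_wake_balance(struct sk_task *p, int this_cpu)
{
	bool match = (p->last_waker_cpu == this_cpu);

	p->last_waker_cpu = this_cpu;
	return match;
}
```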
From: Ingo Molnar
This is the latest version of the scheduler cache-hot-auto-tune patch.
The first problem was that detection time scaled with O(N^2), which is
unacceptable on larger SMP and NUMA systems. To solve this:
- I've added a 'domain distance' function, which is used to cache
measurement results. Each distance is only measured once. This means
that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
distances 0 and 1, and on SMP distance 0 is measured. The code walks
the domain tree to determine the distance, so it automatically follows
whatever hierarchy an architecture sets up. This cuts down on the boot
time significantly and removes the O(N^2) limit. The only assumption
is that migration costs can be expressed as a function of domain
distance - this covers the overwhelming majority of existing systems,
and is a good guess even for more asymmetric systems.
[ People hacking systems that have asymmetries that break this
assumption (e.g. different CPU speeds) should experiment a bit with
the cpu_distance() function. Adding a ->migration_distance factor to
the domain structure would be one possible solution, but let's first
see the problem systems, if they exist at all. Let's not overdesign. ]
Another problem was that only a single cache-size was used for measuring
the cost of migration, and most architectures didn't set that variable
up. Furthermore, a single cache-size does not fit NUMA hierarchies with
L3 caches and does not fit HT setups, where different CPUs will often
have different 'effective cache sizes'. To solve this problem:
- Instead of relying on a single cache-size provided by the platform and
sticking to it, the code now auto-detects the 'effective migration
cost' between two measured CPUs, via iterating through a wide range of
cachesizes. The code searches for the maximum migration cost, which
occurs when the working set of the test-workload falls just below the
'effective cache size'. I.e. real-life optimized search is done for
the maximum migration cost, between two real CPUs.
This, amongst other things, has the positive effect that if e.g. two
CPUs share an L2/L3 cache, a different (and accurate) migration cost
will be found than between two CPUs on the same system that don't share
any caches.
(The reliable measurement of migration costs is tricky - see the source
for details.)
Furthermore I've added various boot-time options to override/tune
migration behavior.
Firstly, there's a blanket override for autodetection:
migration_cost=1000,2000,3000
will override the depth 0/1/2 values with 1msec/2msec/3msec values.
Secondly, there's a global factor that can be used to increase (or
decrease) the autodetected values:
migration_factor=120
will increase the autodetected values by 20%. This option is useful to
tune things in a workload-dependent way - e.g. if a workload is
cache-insensitive then CPU utilization can be maximized by specifying
migration_factor=0.
I've tested the autodetection code quite extensively on x86, on 3
P3/Xeon/2MB, and the autodetected values look pretty good:
Dual Celeron (128K L2 cache):
---------------------
migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
---------------------
[00] [01]
[00]: - 1.7(1)
[01]: 1.7(1) -
---------------------
cacheflush times [2]: 0.0 (0) 1.7 (1784008)
---------------------
Here the slow memory subsystem dominates system performance, and even
though caches are small, the migration cost is 1.7 msecs.
Dual HT P4 (512K L2 cache):
---------------------
migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
---------------------
[00] [01] [02] [03]
[00]: - 0.4(1) 0.0(0) 0.4(1)
[01]: 0.4(1) - 0.4(1) 0.0(0)
[02]: 0.0(0) 0.4(1) - 0.4(1)
[03]: 0.4(1) 0.0(0) 0.4(1) -
---------------------
cacheflush times [2]: 0.0 (33900) 0.4 (448514)
---------------------
Here it can be seen that there is no migration cost between two HT
siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.
8-way P3/Xeon [2MB L2 cache]:
---------------------
migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
---------------------
[00] [01] [02] [03] [04] [05] [06] [07]
[00]: - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[01]: 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[02]: 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[03]: 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[04]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1)
[05]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1)
[06]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1)
[07]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) -
---------------------
cacheflush times [2]: 0.0 (0) 19.2 (19281756)
---------------------
This one has huge caches and a relatively slow memory subsystem - so the
migration cost is 19 msecs.
Signed-off-by: Ingo Molnar
Signed-off-by: Ashok Raj
Signed-off-by: Ken Chen
Cc:
Signed-off-by: John Hawkes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
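A sketch of two ideas from this patch: caching one measured cost per domain distance, so each distance is measured only once, and the migration_cost= / migration_factor= boot overrides. sk_cost_cache, sk_measure_cost() and sk_migration_cost() are hypothetical, and sk_measure_cost() returns made-up numbers in place of the real cache-flush measurement.

```c
#include <assert.h>

#define SK_MAX_DISTANCE 4

/* One cached cost (usecs) per domain distance; -1 means "not measured yet". */
static long sk_cost_cache[SK_MAX_DISTANCE] = { -1, -1, -1, -1 };
static int sk_measurements; /* how many expensive measurements actually ran */

/* Stand-in for the real cache-flush measurement; the numbers are made up. */
static long sk_measure_cost(int distance)
{
	sk_measurements++;
	return 1000L * (distance + 1);
}

/* override_usecs >= 0 models a migration_cost= entry for this distance;
 * factor_percent models migration_factor= (100 means unchanged) and is
 * applied to autodetected values only. */
static long sk_migration_cost(int distance, long override_usecs, long factor_percent)
{
	assert(distance >= 0 && distance < SK_MAX_DISTANCE);

	if (override_usecs >= 0)
		return override_usecs;
	if (sk_cost_cache[distance] < 0)
		sk_cost_cache[distance] = sk_measure_cost(distance); /* once per distance */
	return sk_cost_cache[distance] * factor_percent / 100;
}
```

In this model, querying every CPU pair that sits at the same distance triggers a single measurement, which is the property that removes the O(N^2) boot cost.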
12 Jan, 2006
9 commits
Roman Zippel pointed out that the missing lower limit of intervals
leads to an accounting error in the overrun count. Enforce the lower
limit of intervals (the timer resolution) in the timer forwarding code.
Signed-off-by: Thomas Gleixner
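A simplified model of the forwarding step with the new lower limit, using plain int64_t nanoseconds instead of ktime_t. forward_timer() is a hypothetical name and this is not the kernel's hrtimer_forward(). It shows why the interval is clamped to the resolution before the overrun count is computed: an interval below the resolution inflates the count, and an interval of 0 would even divide by zero here.

```c
#include <assert.h>
#include <stdint.h>

/* Advances *expires past now by whole intervals and returns the overrun
 * count: the number of expiry points in [*expires, now], or 0 if the
 * timer has not expired yet. */
static uint64_t forward_timer(int64_t *expires, int64_t now,
			      int64_t interval, int64_t resolution)
{
	int64_t delta = now - *expires;
	uint64_t orun;

	assert(resolution > 0);
	if (delta < 0)
		return 0;

	/* The fix: never forward by less than the resolution. */
	if (interval < resolution)
		interval = resolution;

	orun = (uint64_t)(delta / interval) + 1;
	*expires += (int64_t)orun * interval; /* first expiry strictly after now */
	return orun;
}
```

For example, a timer that expired at 0 with a 3 ns interval, forwarded at now = 10, reports 4 overruns (0, 3, 6, 9) and is rearmed for 12.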
Change the storage format of the per-base resolution to ktime_t to
make it more easily accessible in the hrtimers code.
Change the resolution from (NSEC_PER_SEC/HZ) to TICK_NSEC, as Roman
pointed out. TICK_NSEC is closer to the real resolution.
Signed-off-by: Thomas Gleixner
The list_head in the hrtimer structure was introduced for easy access
to the first timer with the further extensions of real high resolution
timers in mind, but it turned out in the course of development that
it is not necessary for the standard use case. Remove the list head
and access the first expiring timer through a data field in the timer base.
Signed-off-by: Thomas Gleixner
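A sketch of caching the first (earliest) timer in the base instead of keeping a separate list, using a plain unbalanced binary search tree in place of the red-black tree the hrtimer code uses. struct sk_base, struct sk_timer and sk_enqueue() are hypothetical, and removal, which would also have to update the cached pointer, is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timer node, ordered by expiry in a binary search tree. */
struct sk_timer {
	long long expires;
	struct sk_timer *left, *right;
};

/* Hypothetical base: tree root plus a cached earliest-expiring timer. */
struct sk_base {
	struct sk_timer *root;
	struct sk_timer *first;
};

static void sk_enqueue(struct sk_base *base, struct sk_timer *t)
{
	struct sk_timer **link = &base->root;
	int leftmost = 1;

	t->left = t->right = NULL;
	while (*link) {
		if (t->expires < (*link)->expires) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			leftmost = 0; /* went right once: cannot be the earliest */
		}
	}
	*link = t;
	if (leftmost)
		base->first = t; /* new earliest timer: update the cached pointer */
}
```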
vSMP-specific alignment patch to:
1. Define INTERNODE_CACHE_SHIFT for vSMP
2. Use this for alignment of critical structures
3. Use INTERNODE_CACHE_SHIFT for ARCH_MIN_TASKALIGN,
and let the slab align task_struct allocations to the internode cacheline size
4. Introduce and use ARCH_MIN_MMSTRUCT_ALIGN for mm_struct slab allocations.
Signed-off-by: Ravikiran Thirumalai
Signed-off-by: Shai Fultheim
Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds
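A userspace sketch of internode alignment with C11 alignas. The shift of 12 (a 4 KiB line) is an assumed value for illustration, and struct sk_hot_counter is hypothetical; the point is only that each aligned object occupies its own internode line, so two of them cannot falsely share one.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration: a 4 KiB internode line (shift 12). */
#define SK_INTERNODE_CACHE_SHIFT 12
#define SK_INTERNODE_CACHE_BYTES ((size_t)1 << SK_INTERNODE_CACHE_SHIFT)

/* A frequently written object, aligned (and therefore padded) to a full
 * internode line so neighbouring objects never share a line. */
struct sk_hot_counter {
	alignas(SK_INTERNODE_CACHE_BYTES) unsigned long value;
};

static struct sk_hot_counter sk_counters[2];
```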
They are referred to often, so avoid potential false sharing for them.
Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds
- Move capable() from sched.h to capability.h;
- Use where capable() is used
(in include/, block/, ipc/, kernel/, a few drivers/,
mm/, security/, & sound/;
many more drivers/ to go)
Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Uninline capable(). Saves 2K of kernel text on a generic .config, and 1K on a
tiny config. In addition it makes the use of capable() more consistent between
CONFIG_SECURITY and !CONFIG_SECURITY.
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
When a kprobes module is written in such a way that probes are inserted on
itself, unloading that module was not possible due to reference
counting on the same module.
The patch below adds a check and increments the module refcount only if
it is not a self-probed module.
We need to allow modules to probe themselves for kprobes performance
measurements.
This patch has been tested on several x86_64, ppc64 and IA64 systems.
Signed-off-by: Anil S Keshavamurthy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
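A sketch of the refcount rule, with a hypothetical struct sk_module and sk_register_probe() rather than the kprobes API: the probed module is pinned only when the probe belongs to a different module. A NULL target stands for a probe in core kernel text, where there is no module to pin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical module record with a reference count. */
struct sk_module {
	const char *name;
	int refcount;
};

/* Pin the probed module unless the probe lives in that same module;
 * a self-reference would prevent the module from ever being unloaded. */
static void sk_register_probe(struct sk_module *target,
			      const struct sk_module *probe_owner)
{
	if (target != NULL && target != probe_owner)
		target->refcount++;
}

static bool sk_can_unload(const struct sk_module *m)
{
	return m->refcount == 0;
}
```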
Let's switch mutex_debug_check_no_locks_freed() to take (addr, len) as
arguments instead, since all its callers were just calculating the 'to'
address for themselves anyway... (and sometimes doing so badly).
Signed-off-by: David Woodhouse
Acked-by: Ingo Molnar
Signed-off-by: Linus Torvalds
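A sketch of why (addr, len) is the convenient interface: the callee computes the exclusive end address once, instead of every caller deriving a 'to' address. sk_locks_in_freed_range() is hypothetical and simplifies the real check to comparing lock start addresses from a list of held locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Does any held lock start inside the block [addr, addr + len) being freed? */
static bool sk_locks_in_freed_range(const void *addr, size_t len,
				    const void *const *held, size_t nheld)
{
	uintptr_t from = (uintptr_t)addr;
	uintptr_t to = from + len; /* exclusive end, computed in one place */

	for (size_t i = 0; i < nheld; i++) {
		uintptr_t lock = (uintptr_t)held[i];

		if (lock >= from && lock < to)
			return true; /* freeing memory that contains a held lock */
	}
	return false;
}
```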