26 Jun, 2005

40 commits

  • Adds the core update_cpu_domains code and updated cpusets documentation

    Signed-off-by: Dinakar Guniguntala
    Acked-by: Paul Jackson
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dinakar Guniguntala
     
  • The following patches add the dynamic sched domains functionality that was
    extensively discussed on lkml and lse-tech. I would like to see this added
    to -mm.

    o The main advantage of this feature is that it ensures that the scheduler
    load balancing code only balances against the cpus that are in the sched
    domain as defined by an exclusive cpuset, and not all of the cpus in the
    system. This removes any overhead due to load balancing code trying to
    pull tasks outside of the cpu exclusive cpuset only to be prevented by
    the tasks' cpus_allowed mask.
    o cpu exclusive cpusets are useful for servers running orthogonal
    workloads such as RT applications requiring low latency and HPC
    applications that are throughput sensitive

    o It provides a new API, partition_sched_domains, in sched.c
    that makes dynamic sched domains possible.
    o cpu_exclusive cpusets are now associated with a sched domain,
    which means that users can dynamically modify the sched domains
    through the cpuset file system interface
    o ia64 sched domain code has been updated to support this feature as well
    o Currently, this does not support hotplug. (However some of my tests
    indicate hotplug+preempt is currently broken)
    o I have tested it extensively on x86.
    o This should have very minimal impact on performance as none of
    the fast paths are affected
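
    The effect of partitioning can be illustrated with a small userspace
    model (purely hypothetical names and structures, not the kernel code):
    each exclusive cpuset becomes a balancing partition, and the balancer
    only ever scans CPUs inside its own partition mask.

```c
#include <assert.h>

/* Hypothetical userspace model of sched-domain partitioning (not the
 * kernel's code): each exclusive cpuset is represented as a bitmask of
 * CPUs, and load balancing may only consider CPUs whose bit is set in
 * the same partition mask. */
typedef unsigned long cpumask_t;   /* one bit per CPU, up to 64 CPUs */

/* Return nonzero if 'cpu' belongs to 'partition' and may be balanced
 * against. */
static int cpu_in_partition(cpumask_t partition, int cpu)
{
    return (int)((partition >> cpu) & 1UL);
}

/* Count the CPUs the balancer would scan: only those in the partition,
 * instead of every online CPU in the system. */
static int cpus_to_scan(cpumask_t partition, int nr_cpus)
{
    int cpu, n = 0;

    for (cpu = 0; cpu < nr_cpus; cpu++)
        if (cpu_in_partition(partition, cpu))
            n++;
    return n;
}
```

    With a partition covering CPUs 0-3 on an 8-CPU box, only 4 CPUs are
    scanned, which is the overhead reduction the commit describes.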

    Signed-off-by: Dinakar Guniguntala
    Acked-by: Paul Jackson
    Acked-by: Nick Piggin
    Acked-by: Matthew Dobson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dinakar Guniguntala
     
  • Presently, a process without the CAP_SYS_NICE capability cannot change
    its own policy, which is OK.

    But it also cannot decrease its RT priority (if scheduled with policy
    SCHED_RR or SCHED_FIFO), which is what this patch changes.

    The rationale is the same as for the nice value: a process should be
    able to request a lower priority for itself. Increasing the priority is
    still not allowed.

    This is for example useful if you give a multithreaded user process an
    RT priority, and the process would like to organize its internal threads
    using priorities as well. Then you can give the process the highest
    priority needed, N, and the process starts its threads with lower
    priorities: N-1, N-2...

    The POSIX standard says that the permissions are implementation-specific,
    so I think we can do that.

    In a sense, it makes the permissions consistent whatever the policy is:
    with this patch, processes scheduled under SCHED_FIFO, SCHED_RR and
    SCHED_OTHER can all decrease their priority.
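
    The permission rule the patch introduces can be modelled as follows
    (an illustrative sketch only, not the kernel's actual check):

```c
#include <assert.h>

/* Hypothetical model of the new permission rule (not the kernel's
 * code): without CAP_SYS_NICE, an RT task may keep or lower its
 * priority but never raise it; with the capability, any change is
 * allowed. */
static int rt_prio_change_allowed(int has_cap_sys_nice,
                                  int old_prio, int new_prio)
{
    if (has_cap_sys_nice)
        return 1;                /* privileged: any change allowed */
    return new_prio <= old_prio; /* unprivileged: decrease only */
}
```

    So a process granted rtprio N once can start its worker threads at
    N-1, N-2, ... without needing CAP_SYS_NICE itself.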

    From: Ingo Molnar

    cleaned up and merged to -mm.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Olivier Croquette
     
  • micro-optimize task requeueing in schedule() & clean up recalc_task_prio().

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Shang
     
  • The maximum rebalance interval allowed by the multiprocessor balancing
    backoff is often not large enough to handle corner cases where there are
    lots of tasks pinned on a CPU. Suresh reported:

    I see system livelocks if, for example, I have 7000 processes
    pinned onto one cpu (this is on the fastest 8-way system I
    have access to).

    After this patch, the machine is reported to go well above this number.

    Signed-off-by: Nick Piggin
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Consolidate balance-on-exec with balance-on-fork. This is made easy by the
    sched-domains RCU patches.

    As well as the general goodness of code reduction, this allows the runqueues
    to be unlocked during balance-on-fork.

    schedstats is a problem. Maybe just have balance-on-event instead of
    distinguishing fork and exec?

    Signed-off-by: Nick Piggin
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • One of the problems with the multilevel balance-on-fork/exec is that it needs
    to jump through hoops to satisfy sched-domain's locking semantics (that is,
    you may traverse your own domain when not preemptable, and you may traverse
    others' domains when holding their runqueue lock).

    balance-on-exec had to potentially migrate between more than one CPU before
    finding a final CPU to migrate to, and balance-on-fork needed to potentially
    take multiple runqueue locks.

    So bite the bullet and make sched-domains go completely RCU. This actually
    simplifies the code quite a bit.

    From: Ingo Molnar

    schedstats RCU fix, and a nice comment on for_each_domain, from Ingo.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Nick Piggin
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The fundamental problem that Suresh has with balance on exec and fork is that
    it only tries to balance the top level domain with the flag set.

    This was worked around by removing degenerate domains, but is still a problem
    if people want to start using more complex sched-domains, especially
    multilevel NUMA that ia64 is already using.

    This patch makes balance on fork and exec try balancing over not just the top
    most domain with the flag set, but all the way down the domain tree.
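
    The walk can be sketched as a simple tree descent (field and flag names
    here are illustrative stand-ins, not the kernel's): instead of picking a
    CPU only at the highest domain with the balance flag set, refine the
    choice at every level on the way down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of multilevel balance-on-fork/exec (not the
 * kernel's code): each sched-domain level may refine the target CPU
 * chosen by the level above it. */
struct domain {
    struct domain *child;   /* next lower level, NULL at the bottom */
    int flag_set;           /* does this level balance on fork/exec? */
    int idlest_cpu;         /* stand-in for find_idlest_group/cpu */
};

static int find_fork_cpu(struct domain *top, int this_cpu)
{
    struct domain *sd;
    int cpu = this_cpu;

    /* Walk all the way down the domain tree, not just the top level. */
    for (sd = top; sd != NULL; sd = sd->child)
        if (sd->flag_set)
            cpu = sd->idlest_cpu;   /* refine the choice per level */
    return cpu;
}

/* Example: a two-level tree where the top level proposes CPU 2 and
 * the lower level refines that to CPU 3. */
static int demo_find_fork_cpu(void)
{
    struct domain base = { NULL,  1, 3 };
    struct domain top  = { &base, 1, 2 };

    return find_fork_cpu(&top, 0);
}
```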

    Signed-off-by: Nick Piggin
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Remove degenerate scheduler domains during the sched-domain init.

    For example on x86_64, we always have NUMA configured in. On Intel EM64T
    systems, the topmost sched domain will be the NUMA domain, with only one
    sched_group in it.

    With fork/exec balancing (Nick's recent fixes in the -mm tree), we always
    end up making wrong decisions because of this topmost domain (as it
    contains only one group, find_idlest_group always returns NULL). We end up
    loading the HT package completely first, then letting active load
    balancing kick in and correct it.

    In general, this patch also makes sense even without Nick's recent fixes
    in -mm.

    From: Nick Piggin

    Modified to account for more than just sched_groups when scanning for
    degenerate domains by Nick Piggin. And allow a runqueue's sd to go NULL
    rather than keep a single degenerate domain around (this happens when you run
    with maxcpus=1).

    Signed-off-by: Suresh Siddha
    Signed-off-by: Nick Piggin
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suresh Siddha
     
  • Fix the last 2 places that directly access a runqueue's sched-domain and
    assume it cannot be NULL.

    That allows the use of NULL for domain, instead of a dummy domain, to signify
    no balancing is to happen. No functional changes.

    Signed-off-by: Nick Piggin
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Instead of requiring architecture code to interact with the scheduler's
    locking implementation, provide a couple of defines that can be used by the
    architecture to request runqueue unlocked context switches, and ask for
    interrupts to be enabled over the context switch.

    Also replaces the "switch_lock" used by these architectures with an oncpu
    flag (note, not a potentially slow bitflag). This eliminates one bus
    locked memory operation when context switching, and simplifies the
    task_running function.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • From: "Chen, Kenneth W"

    uninline task_timeslice() - reduces code footprint noticeably, and it's
    slowpath code.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Do some basic initial tuning.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Add SCHEDSTAT statistics for sched-balance-fork.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Reimplement the balance on exec balancing to be sched-domains aware. Use this
    to also do balance on fork balancing. Make x86_64 do balance on fork over the
    NUMA domain.

    The problem with the non-sched-domains-aware balancing became apparent on
    dual-core, multi-socket Opterons. What we want is for the new tasks to be
    sent to a different socket, but more often than not, we would first load
    up our sibling core, or fill two cores of a single remote socket, before
    selecting a new one.

    This gives large improvements to STREAM on such systems.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Remove the very aggressive idle stuff that has recently gone into 2.6 - it is
    going against the direction we are trying to go. Hopefully we can regain
    performance through other methods.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Do less affine wakeups. We're trying to reduce dbt2-pgsql idle time
    regressions here... make sure we don't move tasks the wrong way in an
    imbalance condition. Also, remove the cache coldness requirement from the
    calculation - this seems to induce sharp cutoff points where behaviour will
    suddenly change on some workloads if the load creeps slightly over or under
    some point. It is good for periodic balancing because in that case we
    otherwise have no other context to determine which task to move.

    But also make a minor tweak to "wake balancing" - the imbalance tolerance
    is now set at half the domain's imbalance, so we get the opportunity to do
    wake balancing before the more random periodic rebalancing gets performed.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Do CPU load averaging over a number of different intervals. Allow each
    interval to be chosen by passing a parameter to source_load and
    target_load. 0 is instantaneous; idx > 0 returns a decaying average with
    the most recent sample weighted at 2^(idx-1), up to a maximum idx of 3
    (which could easily be increased).

    So generally a higher number will result in more conservative balancing.
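
    One way to picture this is the following sketch (an assumption based on
    the description above, not the kernel's exact code): for idx > 0 the old
    average keeps weight (scale-1)/scale and the new sample weight 1/scale,
    with scale = 2^idx, so a larger idx decays more slowly.

```c
#include <assert.h>

/* Illustrative decaying load average (a model, not the kernel's exact
 * formula): idx == 0 is the instantaneous load; for idx > 0 the new
 * sample contributes 1/2^idx of the result, so higher idx values react
 * more slowly and give more conservative balancing. */
static unsigned long decay_load(unsigned long old_avg,
                                unsigned long sample, int idx)
{
    unsigned long scale;

    if (idx == 0)
        return sample;                  /* instantaneous */
    scale = 1UL << idx;
    return (old_avg * (scale - 1) + sample) / scale;
}
```

    For example, a load that drops from 100 to 0 decays to 50 with idx 1
    but only to 75 with idx 2, illustrating why higher indices balance
    more conservatively.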

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Remove the special casing for idle CPU balancing. Things like this are
    hurting, for example, on SMT, where a single idle sibling doesn't really
    warrant an aggressive pull over the NUMA domain.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • These conditions should now be impossible, and we need to fix them if they
    happen.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • SMT balancing has a couple of problems. Firstly, active_load_balance is too
    complex - basically it should be a dumb helper for when the periodic balancer
    has determined there is an imbalance, but gets stuck because the task is
    running.

    So rip out all its "smarts", and just make it move one task to the target CPU.

    Second, the busy CPU's sched-domain tree was being used for active balancing.
    This means that it may not see that nr_balance_failed has reached a critical
    level. So use the target CPU's sched-domain tree for this. We can do this
    because we hold its runqueue lock.

    Lastly, reset nr_balance_failed to a point where we allow cache hot migration.
    This will help ensure active load balancing is successful.

    Thanks to Suresh Siddha for pointing out these issues.

    Signed-off-by: Nick Piggin
    Signed-off-by: Suresh Siddha
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Fix up active load balancing a bit so it doesn't get called when it
    shouldn't. Reset the nr_balance_failed counter at more points where we
    have found conditions to be balanced. This reduces the overly aggressive
    active balancing seen on some workloads.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • John Hawkes explained the problem best:

    A large number of processes that are pinned to a single CPU results
    in every other CPU's load_balance() seeing this overloaded CPU as
    "busiest", yet move_tasks() never finds a task to pull-migrate. This
    condition occurs during module unload, but can also occur as a
    denial-of-service using sys_sched_setaffinity(). Several hundred
    CPUs performing this fruitless load_balance() will livelock on the
    busiest CPU's runqueue lock. A smaller number of CPUs will livelock
    if the pinned task count gets high.

    Expanding slightly on John's patch, this one attempts to work out whether
    the balancing failure has been due to too many tasks pinned on the
    runqueue. This allows it to be basically invisible to the regular
    balancing paths (ie. when there are no pinned tasks). We can use this
    extra knowledge to shut down the balancing faster, and to ensure the
    migration threads don't start running, which is another problem observed
    in the wild.
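
    The check can be modelled as follows (a sketch with illustrative names,
    not the kernel's code): while scanning for tasks to pull, note whether
    every candidate was rejected solely because its cpus_allowed mask
    excludes the destination CPU.

```c
#include <assert.h>

/* Hypothetical model of the pinned-task check (not the kernel's code):
 * if no task on the busiest runqueue is allowed to run on the
 * destination CPU, the imbalance cannot be fixed by migration and the
 * balancer should back off instead of retrying forever. */
static int all_pinned(const unsigned long *cpus_allowed, int n, int dst_cpu)
{
    int i;

    for (i = 0; i < n; i++)
        if ((cpus_allowed[i] >> dst_cpu) & 1UL)
            return 0;   /* at least one task could move */
    return 1;           /* nothing movable: shut balancing down */
}

/* Example: two tasks pinned to CPU 0 cannot be pulled to CPU 1, but
 * adding a task allowed on CPU 1 makes migration possible again. */
static int demo_all_pinned(void)
{
    unsigned long masks[3] = { 1UL << 0, 1UL << 0, 1UL << 1 };

    return all_pinned(masks, 2, 1) && !all_pinned(masks, 3, 1);
}
```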

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • New sched-domains code means we don't get spans with offline CPUs in
    them.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • We don't need to use do_div() on a 32-bit quantity.
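
    For context: the kernel's do_div() helper exists for dividing a 64-bit
    dividend on 32-bit platforms, where 64-by-32 division is not a native
    operation. A 32-bit quantity needs no helper at all, as the sketch below
    illustrates in userspace; plain / and % compile to a native divide.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace illustration of the point above: for a 32-bit dividend,
 * ordinary C division and modulus suffice on every architecture, so
 * the kernel's do_div() (meant for 64-bit dividends) is unnecessary. */
static uint32_t div32(uint32_t dividend, uint32_t divisor,
                      uint32_t *remainder)
{
    *remainder = dividend % divisor;
    return dividend / divisor;
}

/* Example: 100 / 7 == 14 remainder 2. */
static int demo_div32(void)
{
    uint32_t rem;

    return div32(100, 7, &rem) == 14 && rem == 2;
}
```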

    Signed-off-by: Jon Smirl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jon Smirl
     
  • Reiserfs's readpage does not notice I/O errors. This patch makes
    reiserfs_readpage return -EIO when an I/O error occurs, so that reiserfs
    no longer ignores I/O errors on readpage.

    Signed-off-by: Qu Fuping
    Signed-off-by: Vladimir V. Saveliev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qu Fuping
     
  • This is failing in my cross-compilation environment (from a Solaris
    system) using gcc-3.4.1, as the compiler can't find a prototype for the
    setlocale() function.

    Signed-off-by: Jean-Christophe Dubois
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean-Christophe Dubois
     
  • Here is the fix for the problem described in

    http://bugzilla.kernel.org/show_bug.cgi?id=4721

    Basically, the problem is that generic_file_buffered_write() is accessing
    beyond the end of the iov[] vector after handling the last vector. If we
    happen to cross a page boundary, we get a fault.

    I think this simple patch is good enough. If we really don't want to
    depend on the "count", then we need to pass nr_segs to
    filemap_set_next_iovec() and decrement and check it.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Any filesystem which is using simple_dir_operations will return -EINVAL
    for fsync() on a directory. Make it return zero instead.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • I was using invalid memory for the miscdevice.name. This patch fixes the
    problem which was manifested by an ugly entry in /proc/misc.

    Signed-off-by: Kylene Hall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kylene Jo Hall
     
  • Fix parsing of the PUBEK for display which was leading to showing the wrong
    modulus length and modulus.

    Signed-off-by: Kylene Hall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kylene Jo Hall
     
  • This patch adds support for the new National TPMs with which problems
    were reported on the ThinkPad T43 and ThinkCentre S51. Thanks to Jens and
    Gang for their debugging work on these issues.

    Signed-off-by: Kylene Hall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kylene Jo Hall
     
  • 2.6.12-rc6-mm1 has a few remaining synchronize_kernel()s, some (but not
    all) in comments. This patch changes these synchronize_kernel() calls (and
    comments) to synchronize_rcu() or synchronize_sched() as follows:

    - arch/x86_64/kernel/mce.c mce_read(): change to synchronize_sched() to
    handle races with machine-check exceptions (synchronize_rcu() would not
    cut it given RCU implementations intended for hardcore realtime use).

    - drivers/input/serio/i8042.c i8042_stop(): change to synchronize_sched() to
    handle races with i8042_interrupt() interrupt handler. Again,
    synchronize_rcu() would not cut it given RCU implementations intended for
    hardcore realtime use.

    - include/*/kdebug.h comments: change to synchronize_sched() to handle races
    with NMIs. As before, synchronize_rcu() would not cut it...

    - include/linux/list.h comment: change to synchronize_rcu(), since this
    comment is for list_del_rcu().

    - security/keys/key.c unregister_key_type(): change to synchronize_rcu(),
    since this is interacting with RCU read side.

    - security/keys/process_keys.c install_session_keyring(): change to
    synchronize_rcu(), since this is interacting with RCU read side.

    Signed-off-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     
  • Fixes http://bugme.osdl.org/show_bug.cgi?id=4726

    Signed-off-by: Alexey Dobriyan
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The following patch fixes gcc 4 compilation errors in drivers/serial/mpsc.c.

    Signed-off-by: Lee Nicks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Nicks
     
  • This patch changes the memory allocation method for the s390 debug
    feature. Trace buffers had previously been allocated using the
    get_free_pages() function, so it was not possible to get big memory areas
    in a running system due to memory fragmentation. Now the trace buffers are
    subdivided into several page-sized subbuffers, so more memory can be
    allocated for the trace buffers and more trace records can be written.

    In addition to that, dynamic specification of the size of the trace buffers is
    implemented. It is now possible to change the size of a trace buffer using a
    new debugfs file instance. When writing a number into this file, the trace
    buffer size is changed to 'number * pagesize'.

    In the past all the traces could be obtained from userspace by accessing files
    in the "proc" filesystem. Now with debugfs we have a new filesystem which
    should be used for debugging purposes. This patch moves the debug feature
    from procfs to debugfs.

    Since the interface of debug_register() changed, all device drivers which
    use the debug feature had to be adjusted.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     
  • Add interface to issue VM control program commands.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christian Borntraeger
     
  • Improved machine check handling. The kernel is now able to receive machine
    checks while in kernel mode (system call, interrupt and program check
    handling). Register validation is now performed as well.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Fix compile breakage in the dcss block driver introduced by the attribute
    changes.

    Signed-off-by: Cornelia Huck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cornelia Huck
     
  • Convert the dasd driver to use the new klist interface.

    Signed-off-by: Cornelia Huck
    Cc: Greg KH
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cornelia Huck