01 Oct, 2006

6 commits

  • On systems running with virtual cpus there is optimization potential in
    regard to spinlocks and rw-locks. If the virtual cpu that holds a lock is
    known to the cpu that wants to acquire the same lock, it is beneficial to
    yield the timeslice of the waiting virtual cpu in favour of the one that
    holds the lock (a directed yield).

    With CONFIG_PREEMPT="n" this can be implemented by the architecture without
    common code changes. Powerpc already does this.

    With CONFIG_PREEMPT="y" the lock loops are coded with _raw_spin_trylock,
    _raw_read_trylock and _raw_write_trylock in kernel/spinlock.c. If the lock
    cannot be taken, cpu_relax is called. A directed yield is not possible
    because cpu_relax doesn't know anything about the lock. To be able to
    yield in favour of the current lock holder, variants of cpu_relax for
    spinlocks and rw-locks are needed. The new _raw_spin_relax,
    _raw_read_relax and _raw_write_relax primitives differ from cpu_relax
    in that they take an argument: a pointer to the lock structure.
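
    A userspace sketch of the shape of this change (the spinlock is modeled
    with a C11 atomic flag, and the relax hook only counts its invocations -
    the directed-yield policy itself is architecture code, so nothing here is
    the actual kernel implementation):

```c
#include <stdatomic.h>

/* Minimal model of the CONFIG_PREEMPT-style lock loop from
 * kernel/spinlock.c.  Unlike cpu_relax(), the new _raw_spin_relax()
 * receives the contended lock, so an architecture could look up the
 * holding virtual cpu and yield to it. */
typedef struct {
    atomic_flag locked;
} spinlock_t;

static spinlock_t test_lock = { ATOMIC_FLAG_INIT };
static int relax_calls;

static int _raw_spin_trylock(spinlock_t *lock)
{
    return !atomic_flag_test_and_set(&lock->locked);
}

/* lock-aware relax hook: a directed yield would go here */
static void _raw_spin_relax(spinlock_t *lock)
{
    (void)lock;          /* arch code would inspect the holder */
    relax_calls++;
}

static void spin_lock(spinlock_t *lock)
{
    while (!_raw_spin_trylock(lock))
        _raw_spin_relax(lock);   /* was: cpu_relax() */
}

static void spin_unlock(spinlock_t *lock)
{
    atomic_flag_clear(&lock->locked);
}
```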

    Signed-off-by: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Haavard Skinnemoen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     
  • Fix up kernel/sys.c to be consistent with CodingStyle and the rest of the
    file.

    Signed-off-by: Cal Peake
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cal Peake
     
  • Add infrastructure to track "maximum allowable latency" for power saving
    policies.

    The reason for adding this infrastructure is that power management in the
    idle loop needs to make a tradeoff between latency and power savings
    (deeper power save modes have a longer latency to running code again). The
    code that makes this tradeoff today uses a rather simple algorithm;
    however, this is not good enough: there are devices and use cases that
    require a lower latency than the deeper power-saving states provide. An
    example is audio playback; another is the ipw2100 wireless driver, which
    right now has a very direct and ugly acpi hook to disable some higher
    power states randomly when it gets certain types of errors.

    The proposed solution is to have an interface where drivers can

    * announce the maximum latency (in microseconds) that they can deal with
    * modify this latency
    * give up their constraint

    and a function where the code that decides on power saving strategy can
    query the current global desired maximum.
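
    The driver-facing side can be modeled in userspace like this (the log
    does not spell out the kernel function names, so the names, the
    handle-based registration, and the fixed-size table below are all
    illustrative assumptions, not the real interface):

```c
#include <limits.h>

#define MAX_CONSTRAINTS 16

static int latency_us[MAX_CONSTRAINTS];
static int used[MAX_CONSTRAINTS];

/* announce the maximum tolerable latency (microseconds);
 * returns a handle, or -1 if the table is full */
static int set_acceptable_latency(int usec)
{
    for (int h = 0; h < MAX_CONSTRAINTS; h++) {
        if (!used[h]) {
            used[h] = 1;
            latency_us[h] = usec;
            return h;
        }
    }
    return -1;
}

/* modify an existing constraint */
static void modify_acceptable_latency(int h, int usec)
{
    latency_us[h] = usec;
}

/* give up the constraint */
static void remove_acceptable_latency(int h)
{
    used[h] = 0;
}

/* query for the power-saving policy code: the tightest
 * (smallest) latency any registered driver can tolerate */
static int system_latency_constraint(void)
{
    int min = INT_MAX;

    for (int h = 0; h < MAX_CONSTRAINTS; h++)
        if (used[h] && latency_us[h] < min)
            min = latency_us[h];
    return min;
}
```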

    This patch has a user on each side: on the consumer side, ACPI is patched
    to use this; on the producer side, the ipw2100 driver is patched.

    A generic maximum latency of 2 timer ticks is also registered (any more
    and accurate time tracking is lost).

    While the existing users of the patch are x86 specific, the infrastructure
    is not. I'd like to ask the arch maintainers of other architectures if the
    infrastructure is generic enough for their use (assuming the architecture
    has such a tradeoff as a concept at all), and the sound/multimedia driver
    owners to look at the driver facing API to see if this is something they
    can use.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Acked-by: Jesse Barnes
    Cc: "Brown, Len"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Make it possible to disable the block layer. Not all embedded devices
    require it; some can make do with just JFFS2, NFS, ramfs, etc. - none of
    which require the block layer to be present.

    This patch does the following:

    (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
    support.

    (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
    an item that uses the block layer. This includes:

    (*) Block I/O tracing.

    (*) Disk partition code.

    (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

    (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
    block layer to do scheduling. Some drivers that use SCSI facilities -
    such as USB storage - end up disabled indirectly from this.

    (*) Various block-based device drivers, such as IDE and the old CDROM
    drivers.

    (*) MTD blockdev handling and FTL.

    (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
    taking a leaf out of JFFS2's book.

    (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
    linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
    however, still used in places, and so is still available.

    (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
    parts of linux/fs.h.

    (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

    (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

    (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
    is not enabled.

    (*) fs/no-block.c is created to hold out-of-line stubs and things that are
    required when CONFIG_BLOCK is not set:

    (*) Default blockdev file operations (to give error ENODEV on opening).

    (*) Makes some /proc changes:

    (*) /proc/devices does not list any blockdevs.

    (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

    (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

    (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
    given a command other than Q_SYNC or if a special device is specified.

    (*) In init/do_mounts.c, no reference is made to the blockdev routines if
    CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

    (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
    error ENOSYS by way of cond_syscall if so).

    (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
    CONFIG_BLOCK is not set, since they can't then happen.
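
    The stub pattern the series relies on can be illustrated with a minimal
    sketch (blkdev_open here is a made-up name standing in for a blockdev
    entry point; the real stubs live in fs/no-block.c and cover the default
    blockdev file operations):

```c
#include <errno.h>

/* Callers compile against the same prototype either way; with the
 * block layer configured out, the operation collapses to a stub
 * that fails with ENODEV, as fs/no-block.c does for opens. */
/* #define CONFIG_BLOCK */

#ifdef CONFIG_BLOCK
static int blkdev_open(void)
{
    return 0;               /* the real blockdev open path */
}
#else
static int blkdev_open(void)
{
    return -ENODEV;         /* no-block stub */
}
#endif
```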

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • Create a new header file, fs/internal.h, for common definitions local to the
    sources in the fs/ directory.

    Move extern definitions that should be in header files from fs/*.c to
    fs/internal.h or other main header files where they span directories.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • Remove the duplicate declaration of exit_io_context() from linux/sched.h.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

30 Sep, 2006

34 commits

  • Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Use prototypes in headers
    Don't define panic_on_unrecovered_nmi for all architectures

    Cc: dzickus@redhat.com

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Fix obscure race condition in kernel/cpuset.c attach_task() code.

    There is basically zero chance of anyone accidentally being harmed by this
    race.

    It requires a special 'micro-stress' load and special timing-loop hacks
    in the kernel to hit it in less than an hour, and even then you'd have to
    hit it hundreds or thousands of times, followed by some unusual and
    senseless cpuset configuration requests, including removing the top
    cpuset, to cause any visible harmful effects.

    One could, with perhaps a few days or weeks of such effort, get the
    reference count on the top cpuset below zero, and manage to crash the
    kernel by asking to remove the top cpuset.

    I found it by code inspection.

    The race was introduced when 'the_top_cpuset_hack' was introduced, and one
    piece of code was not updated. An old check for a possibly null task
    cpuset pointer needed to be changed to a check for a task marked
    PF_EXITING. The pointer can't be null anymore, thanks to
    the_top_cpuset_hack (documented in kernel/cpuset.c). But the task could
    have gone into PF_EXITING state after it was found in the task_list scan.

    If a task is PF_EXITING in this code, it is possible that its task->cpuset
    pointer is pointing to the top cpuset due to the_top_cpuset_hack, rather
    than because the top_cpuset was that task's last valid cpuset. In that
    case, the wrong cpuset reference counter would be decremented.

    The fix is trivial. Instead of failing the system call if the task's
    cpuset pointer is null here, fail it if the task is in PF_EXITING state.

    The code for 'the_top_cpuset_hack' that changes an exiting task's cpuset
    to the top_cpuset is done without locking, so it could happen at any time.
    But it is done during exit handling, after the PF_EXITING flag is set. So
    if we verify that a task is still not PF_EXITING after we copy out its
    cpuset pointer (into 'oldcs', below), we know that 'oldcs' is not one of
    these hack references to the top_cpuset.
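
    A toy model of the corrected check (may_attach is a made-up name standing
    in for the relevant fragment of attach_task(); the real code does this
    under task_lock() and returns -ESRCH):

```c
#define PF_EXITING 0x00000004   /* flag value from linux/sched.h */

struct task {
    unsigned int flags;
};

/* Reject exiting tasks before copying out their cpuset pointer:
 * a PF_EXITING task may already point at the top_cpuset via
 * the_top_cpuset_hack, so decrementing the count on that pointer
 * would hit the wrong cpuset. */
static int may_attach(const struct task *tsk)
{
    if (tsk->flags & PF_EXITING)
        return -1;              /* the kernel returns -ESRCH here */
    return 0;
}
```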

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • With CONFIG_DEBUG_LOCK_ALLOC turned off i was getting sporadic failures in
    the locking self-test:

    ------------>
    | Locking API testsuite:
    ----------------------------------------------------------------------
                                   | spin |wlock |rlock |mutex | wsem | rsem |
      --------------------------------------------------------------------
                     A-A deadlock:   ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                 A-B-B-A deadlock:   ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
             A-B-B-C-C-A deadlock:   ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
             A-B-C-A-B-C deadlock:   ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
         A-B-B-C-C-D-D-A deadlock:   ok  |FAILED|  ok  |  ok  |  ok  |  ok  |
         A-B-C-D-B-D-D-A deadlock:   ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
         A-B-C-D-B-C-D-A deadlock:   ok  |  ok  |  ok  |  ok  |  ok  |FAILED|

    After much debugging it turned out to be caused by accidental chain-hash
    key collisions. The current hash is:

        #define iterate_chain_key(key1, key2) \
            (((key1) << MAX_LOCKDEP_KEYS_BITS/2) ^ \
             ((key1) >> (64-MAX_LOCKDEP_KEYS_BITS/2)) ^ \
             (key2))

    where MAX_LOCKDEP_KEYS_BITS is 11. This hash is pretty good as it shifts
    by 5 bits in every iteration, where every new ID 'mixed' into the hash
    can have up to 11 bits. But because there was a 6-bit overlap between
    subsequent IDs, and their high bits tended to be similar, there was a
    chance of accidental chain-hash collisions for a low number of locks
    held.

    The solution is to shift by 11 bits:

        #define iterate_chain_key(key1, key2) \
            (((key1) << MAX_LOCKDEP_KEYS_BITS) ^ \
             ((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
             (key2))

    This keeps the hash perfect up to 5 locks held, but even above that the
    hash is still good because 11 is relatively prime to the total of 64
    bits, so a complete wrap-around only occurs after 64 held locks (which
    doesn't happen in Linux). Even beyond 5 locks held, the entropy of the 5
    IDs mixed into the hash is already good enough that the overlap doesn't
    generate a colliding hash ID.

    With this change the false positives went away.
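
    A small userspace check of the two mixing functions (the constants come
    from the commit; the chain IDs fed in below are arbitrary):

```c
#include <stdint.h>

#define MAX_LOCKDEP_KEYS_BITS 11

/* the old hash: 5-bit shift per mixed-in lock-class ID */
static uint64_t chain_key_old(uint64_t key1, uint64_t key2)
{
    return (key1 << (MAX_LOCKDEP_KEYS_BITS / 2)) ^
           (key1 >> (64 - MAX_LOCKDEP_KEYS_BITS / 2)) ^
           key2;
}

/* the fixed hash: full 11-bit shift, so up to 5 IDs occupy
 * disjoint bit ranges and short chains hash perfectly */
static uint64_t chain_key_new(uint64_t key1, uint64_t key2)
{
    return (key1 << MAX_LOCKDEP_KEYS_BITS) ^
           (key1 >> (64 - MAX_LOCKDEP_KEYS_BITS)) ^
           key2;
}
```

    With the 11-bit shift the order of held locks is fully reflected in the
    key for short chains, e.g. mixing IDs (3, 5) and (5, 3) can no longer
    collide by bit overlap alone.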

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Add tty locking around the audit and accounting code.

    The whole current->signal-> locking is all deeply strange but it's for
    someone else to sort out. Add rather than replace the lock for acct.c

    Signed-off-by: Alan Cox
    Acked-by: Arjan van de Ven
    Cc: Al Viro
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • I had to look back: this code was extracted from the module.c code in 2005.

    Signed-off-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • I've been using systemtap for some debugging and I noticed that it can't
    probe a lot of modules. It turns out the reason is kind of silly: the
    sections directory of /sys/module is limited to 32-byte filenames, and
    many of the actual sections are a bit longer than that.

    [akpm@osdl.org: rewrite to use dynamic allocation]
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian S. Nelson
     
  • The cpuset code handling hot unplug of CPUs or Memory Nodes was incorrect -
    it could remove a CPU or Node from the top cpuset, while leaving it still
    in some child cpusets.

    One basic rule of cpusets is that each cpuset's cpus and mems are subsets
    of its parent's. The cpuset hot unplug code violated this rule.

    So the cpuset hotunplug handler must walk down the tree, removing any
    removed CPU or Node from all cpusets.

    However, it is not allowed to make a cpuset's cpus or mems become empty.
    They can only transition from empty to non-empty, not back.

    So if the last CPU or Node would be removed from a cpuset by the above
    walk, we scan back up the cpuset hierarchy, finding the nearest ancestor
    that still has something online, and copy its CPU or Memory placement.

    Signed-off-by: Paul Jackson
    Cc: Nathan Lynch
    Cc: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Change the list of memory nodes allowed to tasks in the top (root) cpuset
    to dynamically track which memory nodes are online, using a call to a
    cpuset hook from the memory hotplug code. Make this top 'mems' file
    read-only.

    On systems that have cpusets configured in their kernel, but that aren't
    actively using cpusets (for some distros, this covers the majority of
    systems) all tasks end up in the top cpuset.

    If that system does support memory hotplug, then these tasks cannot make
    use of memory nodes that are added after system boot, because the memory
    nodes are not allowed in the top cpuset. This is a surprising regression
    over earlier kernels that didn't have cpusets enabled.

    One key motivation for this change is to remain consistent with the
    behaviour for the top_cpuset's 'cpus', which is also read-only, and which
    automatically tracks the cpu_online_map.

    This change also has the minor benefit that it fixes a long-standing,
    little-noticed, minor bug in cpusets. The cpuset performance tweak to
    short-circuit the cpuset_zone_allowed() check on systems with just a
    single cpuset (see 'number_of_cpusets', in linux/cpuset.h) meant that
    simply changing the 'mems' of the top_cpuset had no effect, even though
    the change (the write system call) appeared to succeed. With the
    following change, that write to the 'mems' file fails with -EACCES, and
    the 'mems' file stubbornly refuses to be changed via user-space writes.
    Thus no one should be misled into thinking they've changed the
    top_cpuset's 'mems' when in fact they haven't.

    In order to keep the behaviour of cpusets consistent between systems
    actively making use of them and systems not using them, this patch changes
    the behaviour of the 'mems' file in the top (root) cpuset, making it read
    only, and making it automatically track the value of node_online_map. Thus
    tasks in the top cpuset will have automatic use of hot plugged memory nodes
    allowed by their cpuset.

    [akpm@osdl.org: build fix]
    [bunk@stusta.de: build fix]
    Signed-off-by: Paul Jackson
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • I am not sure about this patch; I am asking Ingo to make a decision.

    task_struct->state == EXIT_DEAD is a very special case; to avoid confusion
    it makes sense to introduce a new state, TASK_DEAD, while EXIT_DEAD should
    live only in ->exit_state, as documented in sched.h.

    Note that this state is not visible to user-space, get_task_state() masks off
    unsuitable states.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • After the previous change, (->flags & PF_DEAD) implies
    (->state == EXIT_DEAD), so we don't need PF_DEAD any longer.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • schedule() checks PF_DEAD on every context switch and sets ->state =
    EXIT_DEAD to ensure that the exiting task will be deactivated. Note that
    this EXIT_DEAD is in fact a "random" value; we could use any bit except
    the normal TASK_XXX values.

    It is better to set this state in do_exit() along with the PF_DEAD flag,
    and to remove that check from schedule().

    We are safe wrt concurrent try_to_wake_up() (for example from ptrace or
    tkill): it cannot change the task's ->state, because the 'state' argument
    of try_to_wake_up() can't have the EXIT_DEAD bit. And in the case where
    try_to_wake_up() sees a stale value of ->state == TASK_RUNNING, it will
    do nothing.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Use RCU locks for find_task_by_pid().

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • It is ok to do find_task_by_pid() + get_task_struct() under
    rcu_read_lock(); we can drop tasklist_lock.

    Note that testing of ->exit_state is racy with or without tasklist anyway.

    Signed-off-by: Oleg Nesterov
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • copy_process:

        // holds tasklist_lock + ->siglock
        /*
         * inherit ioprio
         */
        p->ioprio = current->ioprio;

    Why? ->ioprio was already copied in dup_task_struct(). I guess this is
    needed to ensure that the child can't escape
    sys_ioprio_set(IOPRIO_WHO_{PGRP,USER}), yes?

    In that case we don't need ->siglock held, and the comment should be
    updated.

    Signed-off-by: Oleg Nesterov
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Remove open-coded has_rt_policy(), no changes in kernel/exit.o

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • I am not sure this patch is correct: I can't understand what the current
    code does, and I don't know what it was supposed to do.

    The comment says:

        /*
         * can't change policy, except between SCHED_NORMAL
         * and SCHED_BATCH:
         */

    The code:

        if (((policy != SCHED_NORMAL && p->policy != SCHED_BATCH) &&
             (policy != SCHED_BATCH && p->policy != SCHED_NORMAL)) &&

    But this is equivalent to:

        if ((is_rt_policy(policy) && has_rt_policy(p)) &&

    which means something different. We can't _decrease_ the current
    ->rt_priority with such a check (if rlim[RLIMIT_RTPRIO] == 0).

    Probably, it was supposed to be:

        if (!(policy == SCHED_NORMAL && p->policy == SCHED_BATCH) &&
            !(policy == SCHED_BATCH && p->policy == SCHED_NORMAL))

    This matches the comment, but is strange: it doesn't allow one to _drop_
    realtime priority when rlim[RLIMIT_RTPRIO] == 0.

    I think the right check would be:

        /* can't set/change rt policy */
        if (is_rt_policy(policy) &&
            policy != p->policy &&
            !rlim_rtprio)
                return -EPERM;
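
    The claimed equivalence can be confirmed by exhaustively enumerating the
    four scheduling policies of that era (the helper names follow the
    commit, but this is a standalone model, not kernel code):

```c
/* Policy values as in linux/sched.h at the time */
#define SCHED_NORMAL 0
#define SCHED_FIFO   1
#define SCHED_RR     2
#define SCHED_BATCH  3

static int is_rt_policy(int policy)
{
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* the check exactly as coded in sched_setscheduler() */
static int check_as_coded(int policy, int old_policy)
{
    return (policy != SCHED_NORMAL && old_policy != SCHED_BATCH) &&
           (policy != SCHED_BATCH && old_policy != SCHED_NORMAL);
}

/* the commit's claim: that check is really an rt-vs-rt test */
static int check_claimed(int policy, int old_policy)
{
    return is_rt_policy(policy) && is_rt_policy(old_policy);
}

/* exhaustively compare the two over every pair of policies */
static int checks_agree(void)
{
    for (int p = SCHED_NORMAL; p <= SCHED_BATCH; p++)
        for (int o = SCHED_NORMAL; o <= SCHED_BATCH; o++)
            if (check_as_coded(p, o) != check_claimed(p, o))
                return 0;
    return 1;
}
```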

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Imho, makes the code a bit easier to read.

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Use rcu locks instead. sched_setscheduler() now takes ->siglock
    before reading ->signal->rlim[].

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Get rid of an extraneous printk in kernel_restart().

    Signed-off-by: Cal Peake
    Acked-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cal Peake
     
  • If ____call_usermodehelper fails, we're not interested in the child
    process' exit value, but the real error, so let's stop wait_for_helper from
    overwriting it in that case.

    Issue discovered by Benedikt Böhm while working on a Linux-VServer usermode
    helper.

    Signed-off-by: Björn Steinbrink
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Björn Steinbrink
     
  • If we are going to BUG(), not panic(), here, then we should cover the
    case of BUG() being compiled out.

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • This fixes a couple of compiler warnings, and adds paranoia checks as well.

    Signed-off-by: Roland McGrath
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390
    timer interrupt handler with this change.

    Currently update_times() calculates ticks as "jiffies - wall_jiffies",
    but callers of do_timer() should know how many ticks to update. Passing
    ticks gets rid of this redundant calculation. There is also another
    redundancy, pointed out by Martin Schwidefsky.

    This cleanup makes the barrier added by
    5aee405c662ca644980c184774277fc6d0769a84 needless, so this patch removes
    it.

    As a bonus, this cleanup makes it easy to remove wall_jiffies, since now
    wall_jiffies is always synced with jiffies. (This patch does not actually
    remove wall_jiffies; that would be another cleanup patch.)

    Signed-off-by: Atsushi Nemoto
    Cc: Martin Schwidefsky
    Cc: "Eric W. Biederman"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Cc: Andi Kleen
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Acked-by: Russell King
    Cc: Ian Molton
    Cc: Mikael Starvik
    Acked-by: David Howells
    Cc: Yoshinori Sato
    Cc: Hirokazu Takata
    Acked-by: Ralf Baechle
    Cc: Kyle McMartin
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Paul Mundt
    Cc: Kazumoto Kojima
    Cc: Richard Curnow
    Cc: William Lee Irwin III
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Cc: Miles Bader
    Cc: Chris Zankel
    Acked-by: "Luck, Tony"
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Atsushi Nemoto
     
  • This tightens up __dequeue_signal a little. It also avoids doing
    recalc_sigpending twice in a row, instead doing it once in dequeue_signal.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This check has been obsolete since the introduction of TASK_TRACED. Now
    TASK_STOPPED always means job control stop.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • When a posix_cpu_nsleep() sleep is interrupted by a signal more than
    twice, it incorrectly reports the sleep time remaining to the user,
    because posix_cpu_nsleep() doesn't report back to the user when it is
    called from the restart function, due to wrong flags handling.

    This patch, which applies after the previous one, moves the sleep logic
    from posix_cpu_nsleep() to do_cpu_nanosleep() and cleans up the flags
    handling appropriately.

    Signed-off-by: Toyo Abe
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toyo Abe
     
  • The clock_nanosleep() function does not return the time remaining when the
    sleep is interrupted by a signal.

    This patch creates a new call out, compat_clock_nanosleep_restart(), which
    handles returning the remaining time after a sleep is interrupted. This
    patch revives clock_nanosleep_restart(). It is now accessed via the new
    call out. The compat_clock_nanosleep_restart() is used for compatibility
    access.

    Since this is implemented in compatibility mode the normal path is
    virtually unaffected - no real performance impact.

    Signed-off-by: Toyo Abe
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toyo Abe
     
  • Spawning ksoftirqd, migration, or watchdog, and calling init_timers_cpu(),
    may fail when memory is low. If that happens in initcalls, a kernel NULL
    pointer dereference happens later. This patch makes the crash happen
    immediately in such cases. That seems a bit better than getting a kernel
    NULL pointer dereference later.

    Cc: Ingo Molnar
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Both __kfifo_put() and __kfifo_get() have header comments stating that if
    there is but one concurrent reader and one concurrent writer, locking is not
    necessary. This is almost the case, but a couple of memory barriers are
    needed. Another option would be to change the header comments to remove
    the bit about locking not being needed, and to change those callers who
    currently don't use locking to add the required locking. The attachment
    analyzes this approach, but the patch below seems simpler.
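
    The ordering the patch enforces can be sketched in userspace with C11
    release/acquire operations standing in for the smp_wmb()/smp_rmb() pairs
    (a simplified single-producer/single-consumer analogue of
    __kfifo_put()/__kfifo_get(), not the kernel code itself):

```c
#include <stdatomic.h>

#define SIZE 8   /* power of two, as the real kfifo requires */

static unsigned char buf[SIZE];
static unsigned char scratch[SIZE];
static _Atomic unsigned int in, out;   /* free-running indices */

static unsigned int fifo_put(const unsigned char *src, unsigned int len)
{
    unsigned int i = atomic_load_explicit(&in, memory_order_relaxed);
    unsigned int o = atomic_load_explicit(&out, memory_order_acquire);
    unsigned int space = SIZE - (i - o);

    if (len > space)
        len = space;
    for (unsigned int k = 0; k < len; k++)
        buf[(i + k) & (SIZE - 1)] = src[k];
    /* release pairs with the consumer's acquire: the data stores
     * above must become visible before the new 'in' index does */
    atomic_store_explicit(&in, i + len, memory_order_release);
    return len;
}

static unsigned int fifo_get(unsigned char *dst, unsigned int len)
{
    unsigned int o = atomic_load_explicit(&out, memory_order_relaxed);
    unsigned int i = atomic_load_explicit(&in, memory_order_acquire);
    unsigned int avail = i - o;

    if (len > avail)
        len = avail;
    for (unsigned int k = 0; k < len; k++)
        dst[k] = buf[(o + k) & (SIZE - 1)];
    /* release: the reads above must complete before the slots are
     * handed back to the producer */
    atomic_store_explicit(&out, o + len, memory_order_release);
    return len;
}
```

    Without the ordering on the index updates, the consumer could observe the
    new 'in' before the data stores, which is exactly the hazard the added
    barriers close.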

    Signed-off-by: Paul E. McKenney
    Cc: Stelian Pop
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     
  • Let's do the same thing we do for oopses - print out the version in the
    report. It's an extra line of output, though. We could tack it on to the
    end of the INFO: lines, but that screws up Ingo's pretty output.

    Signed-off-by: Dave Jones
    Cc: Ingo Molnar
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • This is an updated version of Eric Biederman's is_init() patch.
    (http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and
    replaces a few more instances of ->pid == 1 with is_init().

    Further, is_init() checks pid and thus removes dependency on Eric's other
    patches for now.

    Eric's original description:

    There are a lot of places in the kernel where we test for init
    because we give it special properties. Most significantly, init
    must not die. This results in code all over the kernel testing
    ->pid == 1.

    Introduce is_init to capture this case.

    With multiple pid spaces for all of the cases affected we are
    looking for only the first process on the system, not some other
    process that has pid == 1.
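
    A minimal sketch of the helper being introduced (the real version takes
    a struct task_struct * and tests tsk->pid; the trimmed struct here is
    only for illustration):

```c
/* Captures the "is this the first process on the system?" test
 * in one place instead of open-coded ->pid == 1 checks. */
struct task {
    int pid;
};

static inline int is_init(const struct task *tsk)
{
    return tsk->pid == 1;
}
```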

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Sukadev Bhattiprolu
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Cedric Le Goater
    Cc:
    Acked-by: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Fix a race on put_files_struct() on exec with proc. Restoring files on
    current on the error path may leave proc with a pointer to an already
    kfree-d files_struct.

    The ->files changes in exit.c and kthread.c are safe, as exit_files()
    does everything under the lock.

    Found during OpenVZ stress testing.

    [akpm@osdl.org: add export]
    Signed-off-by: Pavel Emelianov
    Signed-off-by: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     
  • Fix "variable defined but not used" compiler warning in unwind.c when
    CONFIG_MODULES is not set.

    Signed-off-by: Chuck Ebbert
    Cc: Jan Beulich
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chuck Ebbert