Doug / smarc-fsl-linux-kernel | Embedian Git Server

31 Oct, 2005

40 commits

a241ec65a [PATCH] RCU torture-testing kernel module

This patch is a rewrite of the one submitted on October 1st, using modules
(http://marc.theaimsgroup.com/?l=linux-kernel&m=112819093522998&w=2).

This rewrite adds a tristate CONFIG_RCU_TORTURE_TEST, which enables an
intense torture test of the RCU infrastructure. This is needed due to the
continued changes to the RCU infrastructure to accommodate dynamic ticks,
CPU hotplug, realtime, and so on. Most of the code is in a separate file
that is compiled only if the CONFIG variable is set. Documentation on how
to run the test and interpret the output is also included.

This code has been tested on i386 and ppc64, and an earlier version of the
code has received extensive testing on a number of architectures as part of
the PREEMPT_RT patchset.

Signed-off-by: "Paul E. McKenney"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul E. McKenney
2005-10-31 09:37:27 +0800
b3099b48d [PATCH] fs/attr.c: remove BUG()

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2005-10-31 09:37:27 +0800
c0398ee6c [PATCH] include/linux/kernel.h:BUILD_BUG_ON(): fix a comment

Fix comment describing BUILD_BUG_ON: BUG_ON is not an assertion
(unfortunately).
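To illustrate the distinction the corrected comment draws, here is a userspace sketch: BUILD_BUG_ON is checked by the compiler, while BUG_ON is checked only when the code runs. The BUILD_BUG_ON definition is modeled on the negative-array-size trick that kernel.h used at the time; BUG_ON_SKETCH, struct header and header_size() are hypothetical names invented for the example.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Compile-time check: if condition is true the array size becomes
 * 1 - 2 = -1 and the build fails; if false it is char[1] and costs nothing.
 */
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2 * !!(condition)]))

/* Hypothetical runtime stand-in for BUG_ON: evaluated when the line runs. */
#define BUG_ON_SKETCH(condition)					\
	do {								\
		if (condition) {					\
			fprintf(stderr, "BUG at %s:%d\n",		\
				__FILE__, __LINE__);			\
			abort();					\
		}							\
	} while (0)

struct header {
	uint32_t magic;
	uint32_t len;
};

size_t header_size(size_t payload)
{
	/* Checked by the compiler; changing 8 to 4 here breaks the build. */
	BUILD_BUG_ON(sizeof(struct header) != 8);

	/* Checked only at run time, and only if this line executes. */
	BUG_ON_SKETCH(payload > 4096);

	return sizeof(struct header) + payload;
}
```

A false BUILD_BUG_ON condition leaves no trace in the generated code, whereas BUG_ON_SKETCH always costs a test and a branch at run time.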

Signed-off-by: Nikita Danilov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nikita Danilov
2005-10-31 09:37:26 +0800
ed8b39d0a [PATCH] watchdog: update .owner field of struct pci_driver

This updates the .owner field of struct pci_driver.

This allows sysfs to create the symlink from the driver to the module that
provides it.

Signed-off-by: Laurent Riffard
Acked-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Laurent Riffard
2005-10-31 09:37:26 +0800
413a42e2e [PATCH] SyncLink adapters: updates .owner field of struct pci_driver

This updates the .owner field of struct pci_driver.

This allows sysfs to create the symlink from the driver to the module that
provides it.

Signed-off-by: Laurent Riffard
Cc: Paul Fulghum
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Laurent Riffard
2005-10-31 09:37:26 +0800
8f04dd079 [PATCH] epca: update .owner field of struct pci_driver

This updates the .owner field of struct pci_driver.

This allows sysfs to create the symlink from the driver to the module that
provides it.

Signed-off-by: Laurent Riffard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Laurent Riffard
2005-10-31 09:37:26 +0800
1a66ddcb7 [PATCH] fix vgacon blanking

This patch fixes a long-standing vgacon bug: characters with the bright bit
set were left on the screen instead of being blacked out. All I did was look
up some examples on the net of setting the VGA palette and add the call that
was missing from the Linux kernel but present in all the others. It works
for me.

You can test this by writing something with the bright bit set to the
console, for example:
echo -e "\e[1;31mhello there\e[0m"
and then wait for the console to blank itself (by default, after 10 mins
of inactivity), maybe making it faster using
setterm -blank 1
so you only have to wait 1 minute.

Signed-off-by: Pozsar Balazs
Cc: James Simmons
Cc: "Antonino A. Daplas"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pozsar Balazs
2005-10-31 09:37:26 +0800
2973dfdb8 [PATCH] Test for sb_getblk return value

This patch adds tests for the return value of sb_getblk() in the ext2/3
filesystems. fs/buffer.c states that the getblk() function never fails.
However, it can return NULL in some situations due to I/O errors, which may
lead to NULL pointer dereferences.
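The caller-side pattern is simply to test the returned pointer before using it. A userspace sketch, assuming hypothetical names (struct buf_sketch, getblk_sketch(), zero_block_sketch()) and an -EIO return chosen for illustration; the real ext2/ext3 call sites may handle the failure differently.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a buffer head. */
struct buf_sketch {
	unsigned long blocknr;
	char data[512];
};

/* Hypothetical getblk-like helper; `fail` simulates the NULL return. */
static struct buf_sketch *getblk_sketch(unsigned long blocknr, int fail)
{
	struct buf_sketch *bh;

	if (fail)
		return NULL;
	bh = calloc(1, sizeof(*bh));
	if (bh)
		bh->blocknr = blocknr;
	return bh;
}

/* Caller pattern: check the pointer before dereferencing it. */
int zero_block_sketch(unsigned long blocknr, int fail)
{
	struct buf_sketch *bh = getblk_sketch(blocknr, fail);

	if (!bh)
		return -EIO;	/* error code chosen for illustration */
	memset(bh->data, 0, sizeof(bh->data));
	free(bh);
	return 0;
}
```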

Signed-off-by: Glauber de Oliveira Costa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Glauber de Oliveira Costa
2005-10-31 09:37:26 +0800
7f04c26d7 [PATCH] fix nr_unused accounting, and avoid recursing in iput with I_WILL_FREE set

		list_move(&inode->i_list, &inode_in_use);
	} else {
		list_move(&inode->i_list, &inode_unused);
+		inodes_stat.nr_unused++;
	}
}
wake_up_inode(inode);

Are you sure the above diff is correct? It was added somewhere between
2.6.5 and 2.6.8. I think it's wrong.

The only way I can imagine i_count being zero in the above path is if
I_WILL_FREE is set. And if I_WILL_FREE is set, then we must not increase
nr_unused. So I believe the above change is buggy: it will definitely
overstate the number of unused inodes, and it should be backed out.

Note that __writeback_single_inode, before calling __sync_single_inode, can
drop the spinlock, and we can have both the dirty and locked bitflags clear
here:

		spin_unlock(&inode_lock);
		__wait_on_inode(inode);
		iput(inode);
		XXXXXXX
		spin_lock(&inode_lock);
	}
	use inode again here

A construct like the above makes zero sense from a reference-counting
standpoint.

Either we don't ever use the inode again after the iput, or the
inode_lock should be taken _before_ executing the iput (i.e. a __iput
would be required). Taking the inode_lock after iput means the iget was
useless if we keep using the inode after the iput.

So the only way 2.6 could safely call __writeback_single_inode with
i_count == 0 is if I_WILL_FREE is set (I_WILL_FREE prevents the VM from
freeing the inode at XXXXX).

Potentially calling the above iput with I_WILL_FREE set was also wrong,
because it would recurse in iput_final (the second mainline bug).

The patch below (untested) fixes the nr_unused accounting, avoids recursing
in iput when I_WILL_FREE is set, and makes sure (with the BUG_ON) that we
don't corrupt memory and that all holders that don't set I_WILL_FREE keep
a reference on the inode.
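The invariant argued above can be modeled in a few lines. This is a simplified, hypothetical model (struct inode_sketch, I_WILL_FREE_SKETCH, enum inode_list_sketch and requeue_inode_sketch() are invented names, and assert() stands in for the BUG_ON), not the actual fs/fs-writeback.c code.

```c
#include <assert.h>

#define I_WILL_FREE_SKETCH 0x1	/* hypothetical flag bit */

/* Hypothetical stand-in for the relevant parts of struct inode. */
struct inode_sketch {
	int i_count;		/* reference count */
	unsigned int i_state;	/* state flags */
};

enum inode_list_sketch { LIST_IN_USE, LIST_UNUSED };

/*
 * Decide which list a clean inode goes back on after writeback.
 * A holder without a reference must have set I_WILL_FREE, and in that
 * case nr_unused is deliberately NOT incremented: the inode is about
 * to be freed, so counting it would overstate the unused inodes.
 */
enum inode_list_sketch requeue_inode_sketch(const struct inode_sketch *inode)
{
	if (inode->i_count)
		return LIST_IN_USE;
	assert(inode->i_state & I_WILL_FREE_SKETCH);	/* BUG_ON stand-in */
	return LIST_UNUSED;
}
```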

Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2005-10-31 09:37:26 +0800
52303e8b5 [PATCH] modules: fix sparse warning for every MODULE_PARM

sparse complains about every MODULE_PARM used in a module: warning: symbol
'__parm_foo' was not declared. Should it be static?

The fix is to split declaration and initialization. While MODULE_PARM is
obsolete, it's not something sparse should report.
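Splitting declaration and initialization means the macro first emits an extern declaration and then the initialized definition, so the non-static symbol no longer looks undeclared to sparse. A userspace sketch of the pattern; PARM_SKETCH and struct parm_sketch are hypothetical and only loosely modeled on MODULE_PARM.

```c
/* Hypothetical record describing a module parameter. */
struct parm_sketch {
	const char *name;	/* parameter name, as a string */
	const char *type;	/* type code, e.g. "i" for int */
	void *addr;		/* address of the variable */
};

/*
 * Declare first, then define. Without the extern line, sparse reports
 * "symbol 'parm_sketch_foo' was not declared. Should it be static?"
 * for the non-static definition that follows it.
 */
#define PARM_SKETCH(var, type)					\
	extern struct parm_sketch parm_sketch_##var;		\
	struct parm_sketch parm_sketch_##var = { #var, type, &var }

static int foo = 3;
PARM_SKETCH(foo, "i");
```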

Signed-off-by: Pavel Roskin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Roskin
2005-10-31 09:37:26 +0800
c4dd0e4c6 [PATCH] extable: remove needless declaration

They aren't used anywhere in that file.

Signed-off-by: Nicolas Pitre
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nicolas Pitre
2005-10-31 09:37:26 +0800
0b360adbd [PATCH] setkeys needs root

Because people can play games reprogramming keys and leaving traps for the
next user of the console.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2005-10-31 09:37:25 +0800
eb8e31799 [PATCH] firmware: fix all kernel-doc warnings

Convert existing function docs to kernel-doc format. Eliminate all
kernel-doc warnings. Fix some doc typos and a little whitespace cleanup.

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2005-10-31 09:37:25 +0800
ecea8d19c [PATCH] jiffies_64 cleanup

Define jiffies_64 in kernel/timer.c rather than duplicating the definition
in each of 24 architectures.

Signed-off-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Thomas Gleixner
2005-10-31 09:37:25 +0800
371e8c25b [PATCH] Remove orphaned TIOCGDEV compat ioctl

This ioctl doesn't exist for native i386.

Signed-off-by: Brian Gerst
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Brian Gerst
2005-10-31 09:37:25 +0800
381be2545 [PATCH] ext3: sparse fixes

Fix sparse warnings about undeclared functions that should either be
declared in a header file or be made static:

fs/ext2/bitmap.c:14:15: warning: symbol 'ext2_count_free' was not declared. Should it be static?
fs/ext2/namei.c:92:15: warning: symbol 'ext2_get_parent' was not declared. Should it be static?
fs/ext3/bitmap.c:15:15: warning: symbol 'ext3_count_free' was not declared. Should it be static?
fs/ext3/namei.c:1013:15: warning: symbol 'ext3_get_parent' was not declared. Should it be static?
fs/ext3/xattr.c:214:1: warning: symbol 'ext3_xattr_block_get' was not declared. Should it be static?
fs/ext3/xattr.c:358:1: warning: symbol 'ext3_xattr_block_list' was not declared. Should it be static?
fs/ext3/xattr.c:630:1: warning: symbol 'ext3_xattr_block_find' was not declared. Should it be static?
fs/ext3/xattr.c:863:1: warning: symbol 'ext3_xattr_ibody_find' was not declared. Should it be static?

Signed-off-by: Ben Dooks
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Dooks
2005-10-31 09:37:25 +0800
1a80ba882 [PATCH] Telecom Clock Driver for MPCBL0010 ATCA computer blade

Signed-off-by: Mark Gross
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mark Gross
2005-10-31 09:37:25 +0800
1291cf416 [PATCH] fix de_thread() vs do_coredump() deadlock

de_thread() sends SIGKILL to all sub-threads and waits for them to die in 'D'
state. It is possible that one of the threads has already dequeued a coredump
signal. When de_thread() unlocks ->sighand->lock, that thread can enter
do_coredump()->coredump_wait() and cause a deadlock.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2005-10-31 09:37:25 +0800
f7232056b [PATCH] Added a Receive_Abort to the Marvell serial driver

Added a Receive_Abort to the Marvell serial driver

Fix occasional input overrun errors on Marvell serial driver

- If the Marvell serial driver is repeatedly started and then stopped it
will occasionally report an input overrun error when started.

- Added a Receive_Abort to the Marvell serial driver to clear stale
receive errors when restarting reception

Acked-by: Mark A. Greer
Signed-off-by: Carlos Sanchez
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Carlos Sanchez
2005-10-31 09:37:25 +0800
6ea05db06 [PATCH] fuse: remove unused define

Setting ctime is implicit in all setattr cases, so the FATTR_CTIME
definition is unnecessary.

It is used by neither the kernel nor by userspace.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2005-10-31 09:37:24 +0800
1779381de [PATCH] fuse: spelling fixes

Correct some typos and inconsistent use of "initialise" vs "initialize" in
comments. Reported by Ioannis Barkas.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2005-10-31 09:37:24 +0800
2a38bccd0 [PATCH] Kconfig help text correction for CONFIG_FRAME_POINTER

Fix up the CONFIG_FRAME_POINTER help text language a bit.

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2005-10-31 09:37:24 +0800
7f2a52555 [PATCH] wait4 PTRACE_ATTACH race fix

Back about a year ago when I last fiddled heavily with the do_wait code, I
was thinking too hard about the wrong thing, and I now think I introduced a
bug while believing I was fixing its inverse.

Apparently no one was looking too hard over my shoulder at the time to call
out my bogus reasoning. In the race condition when PTRACE_ATTACH is
about to steal a child and then the child hits a tracing event (what
my_ptrace_child checks for), the real parent does need to set its flag
noting that it has some eligible live children. Otherwise a spurious ECHILD
error is possible, since the child in question is not yet on the
ptrace_children list.

Signed-off-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2005-10-31 09:37:24 +0800
396dc44bc [PATCH] ioc4 serial support - mostly cleanup

Various small mods for the Altix ioc4 serial driver - mostly cleanup:
- remove UIF_INITIALIZED usage
- use the 'lock' from uart_port
- better multiple card support

Signed-off-by: Patrick Gefre
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pat Gefre
2005-10-31 09:37:24 +0800
5b1168792 [PATCH] Locking problems while EXT3FS_DEBUG on

I noticed some problems while running ext3 with the debug flag set on.
More precisely, I was unable to umount the filesystem. Some investigation
took me to the patch that follows.

At first glance, the lock/unlock I've taken out really does not seem
necessary, as the main code (outside debug) does not lock the super. The
only additional dangerous operations that the debug code introduces seem to
be related to the bitmap, but bitmap operations tend to be atomic anyway.

I also took the opportunity to fix 2 spelling errors.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Glauber de Oliveira Costa
2005-10-31 09:37:23 +0800
2384f55f8 [PATCH] coredump_wait() cleanup

This patch deletes pointless code from coredump_wait().

1. It does a useless mm->core_waiters inc/dec under mm->mmap_sem,
but any changes to ->core_waiters have no effect until we drop
->mmap_sem.

2. It calls yield() for no known reason.

Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2005-10-31 09:37:23 +0800
7407251a0 [PATCH] PF_DEAD cleanup

The PF_DEAD setting doesn't belong in exit_notify(); move it to a proper
place.

Signed-off-by: Coywolf Qi Hunt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Coywolf Qi Hunt
2005-10-31 09:37:23 +0800
40dc56512 [PATCH] cleanup for kernel/printk.c

- Remove some trailing whitespace

- Break long lines and make other small changes to conform to CodingStyle

- Add explicit printk loglevels in two places.
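In kernels of this era the loglevel is a "<N>" string prefix glued onto the format string by literal concatenation, as in printk(KERN_INFO "..."), and a message without one gets a default level. A userspace sketch: the KERN_* values are redefined here to match the 2.6 definitions as we understand them, and message_level() is a hypothetical helper, not printk's actual parser.

```c
/* Loglevel prefixes, redefined for userspace to match the 2.6-era values. */
#define KERN_ERR	"<3>"	/* error conditions */
#define KERN_WARNING	"<4>"	/* warning conditions */
#define KERN_INFO	"<6>"	/* informational */

/*
 * Hypothetical helper: return the level encoded in a leading "<N>"
 * prefix, or default_level if the message has no explicit level.
 */
int message_level(const char *msg, int default_level)
{
	if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
		return msg[1] - '0';
	return default_level;
}
```

Because KERN_INFO "cpu up\n" is a single string literal after concatenation, the level travels with the message at no runtime cost; an explicit prefix avoids depending on whatever default the caller would otherwise inherit.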

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2005-10-31 09:37:23 +0800
2a91f3e54 [PATCH] ide-cd mini cleanup of casts

Remove some unneeded casts.
Avoid an assignment in the case of kmalloc failure.
Break a few instances of if (foo) whatever; into two lines.

Signed-off-by: Jesper Juhl
Acked-by: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2005-10-31 09:37:23 +0800
20e1129ab [PATCH] Keys: Get rid of warning in kmod.c if keys disabled

The attached patch gets rid of a "statement without effect" warning when
CONFIG_KEYS is disabled by making use of the return value of key_get().
The compiler will optimise all of this away when keys are disabled.

Signed-Off-By: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2005-10-31 09:37:23 +0800
29db91906 [PATCH] Keys: Add LSM hooks for key management [try #3]

The attached patch adds LSM hooks for key management facilities. The notable
changes are:

(1) The key struct now supports a security pointer for the use of security
modules. This will permit key labelling and restrictions on which
programs may access a key.

(2) Security modules get a chance to note (or abort) the allocation of a key.

(3) The key permission checking can now be enhanced by the security modules;
the permissions check consults LSM if all other checks bear out.

(4) The key permissions checking functions now return an error code rather
than a boolean value.

(5) An extra permission has been added to govern the modification of
attributes (UID, GID, permissions).

Note that there isn't an LSM hook specifically for each keyctl() operation,
but rather the permissions hook allows control of individual operations based
on the permission request bits.
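Points (3) and (4), together with the paragraph above, describe a permission check that returns an error code and consults the security module last, keyed on the requested permission bits. A userspace sketch of that shape; every name here (the PERM_*_SKETCH bits, struct key_sketch, key_perm_hook_sketch, key_permission_sketch(), deny_setattr_sketch()) is hypothetical, and the choice of -EACCES for a failed ordinary check is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical permission bits; the real key permission layout differs. */
#define PERM_VIEW_SKETCH	0x01
#define PERM_READ_SKETCH	0x02
#define PERM_SETATTR_SKETCH	0x20	/* akin to the new "modify attributes" bit */

/* Hypothetical key carrying a security pointer for a security module. */
struct key_sketch {
	unsigned int perm;	/* permission bits granted to the caller */
	void *security;		/* blob owned by the security module */
};

/* Hypothetical LSM-style hook: 0 to allow, negative errno to deny. */
typedef int (*key_perm_hook_sketch)(const struct key_sketch *key,
				    unsigned int want);

/*
 * Returns 0 or a negative error code instead of a boolean, so that a
 * security module's own error can be passed straight back to the caller.
 * The hook is consulted only once the ordinary checks have passed.
 */
int key_permission_sketch(const struct key_sketch *key, unsigned int want,
			  key_perm_hook_sketch hook)
{
	if ((key->perm & want) != want)
		return -EACCES;
	if (hook)
		return hook(key, want);
	return 0;
}

/* Example policy: refuse attribute changes, allow everything else. */
static int deny_setattr_sketch(const struct key_sketch *key, unsigned int want)
{
	(void)key;
	return (want & PERM_SETATTR_SKETCH) ? -EPERM : 0;
}
```

With a boolean result, the caller could only guess at an errno; returning the hook's own value lets a policy report -EPERM distinctly from an ordinary -EACCES.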

Key management access control through LSM is enabled automatically if both
CONFIG_KEYS and CONFIG_SECURITY are enabled.

This should be applied on top of the patch entitled:

[PATCH] Keys: Possessor permissions should be additive

Signed-Off-By: David Howells
Signed-off-by: Chris Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2005-10-31 09:37:23 +0800
2aa349f6e [PATCH] Keys: Export user-defined keyring operations

Export user-defined key operations so that those who wish to define their
own key type based on the user-defined key operations may do so (as has
been requested).

The header file created has been placed into include/keys/user-type.h, thus
creating a directory where other key types may also be placed. Any
objections to doing this?

Signed-Off-By: David Howells
Signed-Off-By: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2005-10-31 09:37:22 +0800
1426d7a81 [PATCH] vm: remove unused/broken page_pte[_prot] macros

This patch removes page_pte_prot and page_pte macros from all
architectures. Some architectures define both, some only page_pte (broken)
and others none. These macros are not used anywhere.

page_pte_prot(page, prot) is identical to mk_pte(page, prot) and
page_pte(page) is identical to page_pte_prot(page, __pgprot(0)).

* The following architectures define both page_pte_prot and page_pte

arm, arm26, ia64, sh64, sparc, sparc64

* The following architectures define only page_pte (broken)

frv, i386, m32r, mips, sh, x86-64

* All other architectures define neither

Signed-off-by: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2005-10-31 09:37:22 +0800
c7e9dd4dd [PATCH] vm: remove redundant assignment from __pagevec_release_nonlru()

This patch removes redundant assignment from __pagevec_release_nonlru().
pages_to_free.cold is set to pvec->cold by pagevec_init() call right above
the assignment.

Signed-off-by: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2005-10-31 09:37:22 +0800
39e88ca2c [PATCH] fs: error case fix in __generic_file_aio_read

When __generic_file_aio_read() hits an error during reading, it reports the
error iff nothing has successfully been read yet. This is the usual
convention: when an error occurs, if nothing has been read/written, report
the error code; otherwise, report the number of bytes successfully
transferred up to that point.

This corner case can be exposed by performing readv(2) with the following
iov.

iov[0] = len0 @ ptr0
iov[1] = len1 @ NULL (or any other invalid pointer)
iov[2] = len2 @ ptr2

When the file is large enough, performing the above readv(2) results in

len0 bytes from file_pos @ ptr0
len2 bytes from file_pos + len0 @ ptr2

And the return value is len0 + len2: the read skips the faulting iov[1]
and carries on, instead of stopping there and returning len0. A test
program is included below.

This patch makes __generic_file_aio_read()'s error handling identical to
other functions.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/uio.h>

int main(int argc, char **argv)
{
	const char *path;
	struct stat stbuf;
	size_t len0, len1;
	void *buf0, *buf1;
	struct iovec iov[3];
	int fd, i;
	ssize_t ret;

	if (argc < 2) {
		fprintf(stderr, "Usage: testreadv path (better be a "
			"small text file)\n");
		return 1;
	}
	path = argv[1];

	if (stat(path, &stbuf) < 0) {
		perror("stat");
		return 1;
	}

	/* Split the file into two halves, one per valid buffer. */
	len0 = stbuf.st_size / 2;
	len1 = stbuf.st_size - len0;

	if (!len0 || !len1) {
		fprintf(stderr, "Dude, file is too small\n");
		return 1;
	}

	if ((fd = open(path, O_RDONLY)) < 0) {
		perror("open");
		return 1;
	}

	/* One extra byte each so the buffers stay NUL-terminated for %s. */
	if (!(buf0 = malloc(len0 + 1)) || !(buf1 = malloc(len1 + 1))) {
		perror("malloc");
		return 1;
	}

	memset(buf0, 0, len0 + 1);
	memset(buf1, 0, len1 + 1);

	/* iov[1] has an invalid (NULL) base, so reading into it must fail. */
	iov[0].iov_base = buf0;
	iov[0].iov_len = len0;
	iov[1].iov_base = NULL;
	iov[1].iov_len = len1;
	iov[2].iov_base = buf1;
	iov[2].iov_len = len1;

	printf("vector ");
	for (i = 0; i < 3; i++)
		printf("%p:%zu ", iov[i].iov_base, iov[i].iov_len);
	printf("\n");

	/* With the fix, readv should stop at the bad segment and return len0. */
	ret = readv(fd, iov, 3);
	if (ret < 0)
		perror("readv");

	printf("readv returned %zd\nbuf0 = [%s]\nbuf1 = [%s]\n",
	       ret, (char *)buf0, (char *)buf1);

	close(fd);
	free(buf0);
	free(buf1);
	return 0;
}
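For comparison, the reporting convention itself can be modeled in userspace without a kernel: copy into segments in order, and on the first bad segment return the error if nothing was copied, otherwise the bytes copied so far, and stop. This is a hypothetical sketch (struct seg_sketch and copy_to_segs_sketch() are invented; a NULL base stands in for a faulting user pointer, and -EFAULT is the error assumed for it), not the kernel's read path.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical segment descriptor, mirroring struct iovec. */
struct seg_sketch {
	void *base;
	size_t len;
};

/*
 * Copy src into the segments in order. On the first invalid segment,
 * return -EFAULT if nothing was copied yet, otherwise the number of
 * bytes copied so far; never skip ahead to later segments.
 */
ssize_t copy_to_segs_sketch(const char *src, size_t srclen,
			    const struct seg_sketch *segs, int nsegs)
{
	size_t done = 0;
	int i;

	for (i = 0; i < nsegs && done < srclen; i++) {
		size_t n = segs[i].len;

		if (!segs[i].base)
			return done ? (ssize_t)done : -EFAULT;
		if (n > srclen - done)
			n = srclen - done;
		memcpy(segs[i].base, src + done, n);
		done += n;
	}
	return (ssize_t)done;
}
```

With the same shape as the test program's vector (valid, NULL, valid), this model returns the length of the first segment rather than the sum of the two valid ones.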

Signed-off-by: Tejun Heo
Cc: Benjamin LaHaise
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2005-10-31 09:37:22 +0800
30e0fca6c [PATCH] ptrace/coredump/exit_group deadlock

I could occasionally reproduce a deadlock with an unkillable task in T state
(TASK_STOPPED, not TASK_TRACED) by attaching an NPTL threaded program to
gdb, segfaulting the task, and triggering a core dump while some other
task is executing exit_group and while one task is in ptrace-attached
TASK_STOPPED state (not TASK_TRACED yet). This originated from a gdb
bug report (the fact that gdb was segfaulting the task wasn't a kernel bug),
but I incidentally noticed that the gdb bug triggered a real kernel bug as
well.

Most threads hang in exit_mm because the core dump is still in progress;
the core dump hangs because the stopped task doesn't exit; and the stopped
task can't wake up because it has SIGNAL_GROUP_EXIT set. Hence the deadlock.

To me it seems that the problem is that the force_sig_specific(SIGKILL) in
zap_threads is a no-op if the task has PF_PTRACED set (as in this case,
because gdb is attached). The __ptrace_unlink does nothing because
signal->flags is set to SIGNAL_GROUP_EXIT|SIGNAL_STOP_DEQUEUED (verified).

The above info also shows that the stopped task hit a race and got the stop
signal (presumably from the ptrace_attach; only the attach has happened, so
the state is still TASK_STOPPED, and gdb hangs waiting for the core before it
can set it to TASK_TRACED) after one of the threads invoked the core dump
(it's the core dump that sets signal->flags to SIGNAL_GROUP_EXIT).

So besides the fact that nobody would wake up the task in __ptrace_unlink
(the state is _not_ TASK_TRACED), there's a secondary problem in the signal
handling code: a task should ignore ptrace sigstops as long as
SIGNAL_GROUP_EXIT is set (otherwise the wakeup in the __ptrace_unlink path
wouldn't be enough).

So I wrote this patch, which seems to fix the problem. There were various
ways to fix it, and perhaps you prefer a different one; I just opted for the
one that looked safest to me.

I also removed the clearing of the stopped bits from zap_other_threads
(zap_other_threads was safe, unlike zap_threads). I don't like useless
code; this whole NPTL signal/ptrace thing is already unreadable enough and
full of corner cases without adding confusing, useless code that makes it
even less readable. And if this code is really needed, then you may want to
explain why it isn't done in the other paths that set SIGNAL_GROUP_EXIT,
at least.

Even after this patch I still wonder who serializes the read of
p->ptrace in zap_threads.

Patch is called ptrace-core_dump-exit_group-deadlock-1.

This was the trace I've got:

test T ffff81003e8118c0 0 14305 1 14311 14309 (NOTLB)
ffff810058ccdde8 0000000000000082 000001f4000037e1 ffff810000000013
00000000000000f8 ffff81003e811b00 ffff81003e8118c0 ffff810011362100
0000000000000012 ffff810017ca4180
Call Trace:{try_to_wake_up+893} {finish_stop+87}
{get_signal_to_deliver+1359} {do_signal+157}
{ptrace_check_attach+222} {sys_ptrace+2293}
{default_wake_function+0} {sys_ioctl+73}
{sysret_signal+28} {ptregscall_common+103}

test D ffff810011362100 0 14309 1 14305 14312 (NOTLB)
ffff810053c81cf8 0000000000000082 0000000000000286 0000000000000001
0000000000000195 ffff810011362340 ffff810011362100 ffff81002e338040
ffff810001e0ca80 0000000000000001
Call Trace:{try_to_wake_up+893} {wait_for_completion+173}
{default_wake_function+0} {exit_mm+149}
{do_exit+479} {do_group_exit+252}
{get_signal_to_deliver+1451} {do_signal+157}
{ptrace_check_attach+222} {specific_send_sig_info+2

{force_sig_info+186} {do_int3+112}
{retint_signal+61}
test D ffff81002e338040 0 14311 1 14716 14305 (NOTLB)
ffff81005ca8dcf8 0000000000000082 0000000000000286 0000000000000001
0000000000000120 ffff81002e338280 ffff81002e338040 ffff8100481cb740
ffff810001e0ca80 0000000000000001
Call Trace:{try_to_wake_up+893} {wait_for_completion+173}
{default_wake_function+0} {exit_mm+149}
{do_exit+479} {__dequeue_signal+558}
{do_group_exit+252} {get_signal_to_deliver+1451}
{do_signal+157} {ptrace_check_attach+222}
{specific_send_sig_info+208} {force_sig_info+186}
{do_int3+112} {retint_signal+61}

test D ffff810017ca4180 0 14312 1 14309 13882 (NOTLB)
ffff81005d15fcb8 0000000000000082 ffff81005d15fc58 ffffffff80130816
0000000000000897 ffff810017ca43c0 ffff810017ca4180 ffff81003e8118c0
0000000000000082 ffffffff801317ed
Call Trace:{activate_task+150} {try_to_wake_up+893}
{wait_for_completion+173} {default_wake_function+0}
{do_coredump+819} {thread_return+82}
{get_signal_to_deliver+1444} {do_signal+157}
{ptrace_check_attach+222} {specific_send_sig_info+2

{_spin_unlock_irqrestore+5} {force_sig_info+186}
{do_general_protection+159} {retint_signal+61}

Signed-off-by: Andrea Arcangeli
Cc: Roland McGrath
Cc: Ingo Molnar
Cc: Linus Torvalds
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2005-10-31 09:37:22 +0800
68860ec10 [PATCH] cpusets: automatic numa mempolicy rebinding

This patch automatically updates a task's NUMA mempolicy when its cpuset
memory placement changes. It does so within the context of the task,
without any need to support low-level external mempolicy manipulation.

If a system is not using cpusets, or if running on a system with just the
root (all-encompassing) cpuset, then this remap is a no-op. Only when a
task is moved between cpusets, or a cpuset's memory placement is changed,
does the following apply. Otherwise, the main routine below,
rebind_policy(), is not even called.

When mixing cpusets, scheduler affinity, and NUMA mempolicies, the
essential role of cpusets is to place jobs (several related tasks) on a set
of CPUs and Memory Nodes, the essential role of sched_setaffinity is to
manage a job's processor placement within its allowed cpuset, and the
essential role of NUMA mempolicy (mbind, set_mempolicy) is to manage a job's
memory placement within its allowed cpuset.

However, CPU affinity and NUMA memory placement are managed within the
kernel using absolute system-wide numbering, not cpuset-relative numbering.

This is OK until a job is migrated to a different cpuset or, equivalently,
a job's cpuset is moved to different CPUs and Memory Nodes.

Then the CPU affinity and NUMA memory placement of the tasks in the job
need to be updated, to preserve their cpuset-relative position. This can
be done for CPU affinity using sched_setaffinity() from user code, as one
task can modify another's CPU affinity. This cannot be done from an
external task for NUMA memory placement, as that can only be modified in
the context of the task using it.

However, it is easy enough to remap a task's NUMA mempolicy automatically
when a task is migrated, using the existing cpuset mechanism to trigger a
refresh of a task's memory placement after its cpuset has changed. All that
is needed is the old and new nodemask, and notice to the task that it needs
to rebind its mempolicy. The task's mems_allowed has the old mask, the
task's cpuset has the new mask, and the existing
cpuset_update_current_mems_allowed() mechanism provides the notice. The
bitmap/cpumask/nodemask remap operators provide the cpuset-relative
calculations.

This patch leaves open a couple of issues:

1) Updating vma and shmfs/tmpfs/hugetlbfs memory policies:

These mempolicies may reference nodes outside of those allowed to
the current task by its cpuset. Tasks are migrated as part of jobs,
which reside on what might be several cpusets in a subtree. When such
a job is migrated, all NUMA memory policy references to nodes within
that cpuset subtree should be translated, and references to any nodes
outside that subtree should be left untouched. A future patch will
provide the cpuset mechanism needed to mark such subtrees. With that
patch, we will be able to correctly migrate these other memory policies
across a job migration.

2) Updating cpuset, affinity and memory policies in user space:

This is harder. Any placement state stored in user space using
system-wide numbering will be invalidated across a migration. More
work will be required to provide user code with a migration-safe means
to manage its cpuset-relative placement, while preserving the current
APIs that pass system-wide numbers, not cpuset-relative numbers, across
the kernel-user boundary.

Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2005-10-31 09:37:22 +0800
fb5eeeee4 [PATCH] cpusets: bitmap and mask remap operators

In the forthcoming task migration support, a key calculation will be
mapping cpu and node numbers from the old set to the new set while
preserving cpuset-relative offset.

For example, if a task and its pages on nodes 8-11 are being migrated to
nodes 24-27, then pages on node 9 (the 2nd node in the old set) should be
moved to node 25 (the 2nd node in the new set).

As with other bitmap operations, the proper way to code this is to provide
the underlying calculation in lib/bitmap.c, and then to provide the usual
cpumask and nodemask wrappers.

This patch provides that. These operations are termed 'remap' operations.
Both remapping a single bit and remapping a set of bits are supported.
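The remap calculation can be sketched on 64-bit masks: find the ordinal of the old bit within the old set, then pick the set bit with the same ordinal in the new set. The function names below are hypothetical, the modulo wrap when the new set has fewer bits is this sketch's own choice, and the kernel's lib/bitmap.c helpers operate on arbitrary-length bitmaps and may differ in detail.

```c
#include <stdint.h>

/* Count set bits by repeatedly clearing the lowest one. */
static int popcount64(uint64_t x)
{
	int n = 0;

	while (x) {
		x &= x - 1;
		n++;
	}
	return n;
}

/* Bit position of the ord-th (0-based) set bit in mask, or -1. */
static int pos_of_ord(uint64_t mask, int ord)
{
	int pos;

	for (pos = 0; pos < 64; pos++) {
		if (!((mask >> pos) & 1))
			continue;
		if (ord == 0)
			return pos;
		ord--;
	}
	return -1;
}

/*
 * Map oldbit from old_set to new_set, preserving its ordinal within the
 * set. Bits outside old_set, or an empty new_set, map to themselves.
 * oldbit must be in the range 0..63.
 */
int remap_bit(int oldbit, uint64_t old_set, uint64_t new_set)
{
	int w = popcount64(new_set);
	int ord;

	if (!((old_set >> oldbit) & 1) || w == 0)
		return oldbit;
	ord = popcount64(old_set & ((UINT64_C(1) << oldbit) - 1));
	return pos_of_ord(new_set, ord % w);
}

/* Apply remap_bit() to every bit set in src. */
uint64_t remap_mask(uint64_t src, uint64_t old_set, uint64_t new_set)
{
	uint64_t dst = 0;
	int bit;

	for (bit = 0; bit < 64; bit++)
		if ((src >> bit) & 1)
			dst |= UINT64_C(1) << remap_bit(bit, old_set, new_set);
	return dst;
}
```

With nodes 8-11 as the old set (mask 0xF00) and nodes 24-27 as the new set (mask 0xF000000), remap_bit(9, ...) gives 25, matching the example in the message, while a bit outside the old set, such as 5, is left unchanged.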

Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2005-10-31 09:37:21 +0800
28a42b9ea [PATCH] cpusets: confine pdflush to its cpuset

This patch keeps pdflush daemons on the same cpuset as their parent, the
kthread daemon.

Some large NUMA configurations put as much as they can of kernel threads
and other classic Unix load in what's called a bootcpuset, keeping the rest
of the system free for dedicated jobs.

This effort is thwarted by pdflush, which dynamically destroys and
recreates pdflush daemons depending on load.

It's easy enough to force the originally created pdflush daemons into the
bootcpuset at system boot time. But the pdflush threads created later were
allowed to run freely across the system, due to the necessary line in their
startup kthread():

set_cpus_allowed(current, CPU_MASK_ALL);

By simply coding pdflush to start its threads with the cpus_allowed
restrictions of its cpuset (inherited from kthread, its parent) we can
ensure that dynamically created pdflush threads are also kept in the
bootcpuset.

On systems without cpusets, or without a bootcpuset implementation, the
following will have no effect, leaving pdflush to run on any CPU, as before.

Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2005-10-31 09:37:21 +0800
18a19cb30 [PATCH] cpusets: simple rename

Add support for renaming cpusets. Only allow simple rename of cpuset
directories in place. Don't allow moving cpusets elsewhere in the hierarchy
or renaming the special cpuset files in each cpuset directory.

The usefulness of this simple rename became apparent when developing task
migration facilities. It allows building a second cpuset hierarchy using
new names and containing new CPUs and Memory Nodes, moving tasks from the
old to the new cpusets, removing the old cpusets, and then renaming the new
cpusets to the old names, so that any knowledge that the tasks had of their
cpuset names remains valid.

Leaf node cpusets can be migrated to other CPUs or Memory Nodes by just
updating their 'cpus' and 'mems' files, but because no cpuset can contain
CPUs or Nodes not in its parent cpuset, one cannot do this in a cpuset
hierarchy without first expanding all the non-leaf cpusets to contain the
union of both the old and new CPUs and Nodes, which would obfuscate the
one-to-one migration of a task from one cpuset to another required to
correctly migrate the physical page frames currently allocated to that
task.

Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2005-10-31 09:37:21 +0800