Eric Lee / smarc-fsl-linux-kernel

17 May, 2007

1 commit

71ce92f3f make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core filename size ... Browse Code »

Make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core
filename size and change it to 128, so that extensive patterns such as
'/local/cores/%e-%h-%s-%t-%p.core' won't result in truncated filename
generation.

Signed-off-by: Dan Aloni
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Aloni
2007-05-17 20:23:05 +0800

12 May, 2007

1 commit

853da0022 Merge branch 'audit.b38' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current ... Browse Code »

* 'audit.b38' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[PATCH] Abnormal End of Processes
[PATCH] match audit name data
[PATCH] complete message queue auditing
[PATCH] audit inode for all xattr syscalls
[PATCH] initialize name osid
[PATCH] audit signal recipients
[PATCH] add SIGNAL syscall class (v3)
[PATCH] auditing ptrace

Linus Torvalds
2007-05-12 00:57:16 +0800

11 May, 2007

3 commits

fba2afaae signal/timer/event: signalfd core ... Browse Code »

This patch series implements the new signalfd() system call.

I took part of the original Linus code (and you know how badly it can be
broken :), and I added even more breakage ;) Signals are fetched from the same
signal queue used by the process, so signalfd will compete with standard
kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
seems to be working fine on my Dual Opteron machine. I made a quick test
program for it:

http://www.xmailserver.org/signafd-test.c

The signalfd() system call implements signal delivery into a file descriptor
receiver. The signalfd file descriptor if created with the following API:

int signalfd(int ufd, const sigset_t *mask, size_t masksize);

The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new
signalfd file.

The "mask" allows to specify the signal mask of signals that we are interested
in. The "masksize" parameter is the size of "mask".

The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
will return POLLIN when signals are available to be dequeued. As a direct
consequence of supporting the Linux poll subsystem, the signalfd fd can use
used together with epoll(2) too.

The read(2) system call will return a "struct signalfd_siginfo" structure in
the userspace supplied buffer. The return value is the number of bytes copied
in the supplied buffer, or -1 in case of error. The read(2) call can also
return 0, in case the sighand structure to which the signalfd was attached,
has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
return -EAGAIN in case no signal is available.

If the size of the buffer passed to read(2) is lower than sizeof(struct
signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
return -ERESTARTSYS in case a signal hits the process. The format of the
struct signalfd_siginfo is, and the valid fields depends of the (->code &
__SI_MASK) value, in the same way a struct siginfo would:

struct signalfd_siginfo {
__u32 signo; /* si_signo */
__s32 err; /* si_errno */
__s32 code; /* si_code */
__u32 pid; /* si_pid */
__u32 uid; /* si_uid */
__s32 fd; /* si_fd */
__u32 tid; /* si_fd */
__u32 band; /* si_band */
__u32 overrun; /* si_overrun */
__u32 trapno; /* si_trapno */
__s32 status; /* si_status */
__s32 svint; /* si_int */
__u64 svptr; /* si_ptr */
__u64 utime; /* si_utime */
__u64 stime; /* si_stime */
__u64 addr; /* si_addr */
};

[akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-11 23:29:36 +0800
e713d0dab attach_pid() with struct pid parameter ... Browse Code »

attach_pid() currently takes a pid_t and then uses find_pid() to find the
corresponding struct pid. Sometimes we already have the struct pid. We can
then skip find_pid() if attach_pid() were to take a struct pid parameter.

Signed-off-by: Sukadev Bhattiprolu
Cc: Cedric Le Goater
Cc: Dave Hansen
Cc: Serge Hallyn
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2007-05-11 23:29:35 +0800
0a4ff8c25 [PATCH] Abnormal End of Processes ... Browse Code »

Hi,

I have been working on some code that detects abnormal events based on audit
system events. One kind of event that we currently have no visibility for is
when a program terminates due to segfault - which should never happen on a
production machine. And if it did, you'd want to investigate it. Attached is a
patch that collects these events and sends them into the audit system.

Signed-off-by: Steve Grubb
Signed-off-by: Al Viro

Steve Grubb
2007-05-11 17:38:26 +0800

09 May, 2007

2 commits

98701d1b0 (re)register_binfmt returns with -EBUSY ... Browse Code »

When a binary format is unregistered and re-registered, register_binfmt
fails with -EBUSY. The reason is that unregister_binfmt does not set
fmt->next to NULL, and seeing (fmt->next != NULL), register_binfmt fails
with -EBUSY.

One can find his way around by explicitly setting fmt->next to NULL after
unregistering, but that is kind of unclean (one should better be using only
the interfaces, and not the interal members, isn't it?)

Attached one-liner can fix it.

Signed-off-by: Kalash Nainwal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

kalash nainwal
2007-05-09 02:15:08 +0800
4fc75ff48 exec: fix remove_arg_zero ... Browse Code »

Petr Tesarik discovered a problem in remove_arg_zero(). He writes:

When a script is loaded, load_script() replaces argv[0] with the
name of the interpreter and the filename passed to the exec syscall.
However, there is no guarantee that the length of the interpreter
name plus the length of the filename is greater than the length of
the original argv[0]. If the difference happens to cross a page boundary,
setup_arg_pages() will call put_dirty_page() [aka install_arg_page()]
with an address outside the VMA.

Therefore, remove_arg_zero() must free all pages which would be unused
after the argument is removed.

So, rewrite the remove_arg_zero function without gotos, with a few comments,
and with the commonly used explicit index/offset. This fixes the problem
and makes it easier to understand as well.

[a.p.zijlstra@chello.nl: add comment]
Signed-off-by: Nick Piggin
Cc: Petr Tesarik
Signed-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2007-05-09 02:15:00 +0800

18 Apr, 2007

1 commit

c4bbafda7 exec.c: fix coredump to pipe problem and obscure "security hole" ... Browse Code »

The patch checks for "|" in the pattern not the output and doesn't nail a
pid on to a piped name (as it is a program name not a file)

Also fixes a very very obscure security corner case. If you happen to have
decided on a core pattern that starts with the program name then the user
can run a program called "|myevilhack" as it stands. I doubt anyone does
this.

Signed-off-by: Alan Cox
Confirmed-by: Christopher S. Aker
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alan Cox
2007-04-18 07:36:26 +0800

12 Feb, 2007

1 commit

c37622296 [PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc(). ... Browse Code »

Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.

Signed-off-by: Robert P. J. Day
Cc: "Luck, Tony"
Cc: Andi Kleen
Cc: Roland McGrath
Cc: James Bottomley
Cc: Greg KH
Acked-by: Joel Becker
Cc: Steven Whitehouse
Cc: Jan Kara
Cc: Michael Halcrow
Cc: "David S. Miller"
Cc: Stephen Smalley
Cc: James Morris
Cc: Chris Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Robert P. J. Day
2007-02-12 02:51:27 +0800

11 Dec, 2006

1 commit

bbea9f696 [PATCH] fdtable: Make fdarray and fdsets equal in size ... Browse Code »

Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets. The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal. This
patch removes fdtable->max_fdset. As an added bonus, most of the supporting
code becomes simpler.

Signed-off-by: Vadim Lobanov
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Dipankar Sarma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vadim Lobanov
2006-12-11 01:57:22 +0800

09 Dec, 2006

2 commits

84d737866 [PATCH] add child reaper to pid_namespace ... Browse Code »

Add a per pid_namespace child-reaper. This is needed so processes are reaped
within the same pid space and do not spill over to the parent pid space. Its
also needed so containers preserve existing semantic that pid == 1 would reap
orphaned children.

This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285

Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Cedric Le Goater
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2006-12-09 00:28:52 +0800
0f7fc9e4d [PATCH] VFS: change struct file to use struct path ... Browse Code »

This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.

Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef "Jeff" Sipek
2006-12-09 00:28:41 +0800

08 Dec, 2006

2 commits

6d4df677f [PATCH] do_coredump() and not stopping rewrite attacks? ... Browse Code »

On Sat, Dec 02, 2006 at 11:47:44PM +0300, Alexey Dobriyan wrote:
> David Binderman compiled 2.6.19 with icc and grepped for "was set but never
> used". Many warnings are on
> http://coderock.org/kj/unused-2.6.19-fs

Heh, the very first line:
fs/exec.c(1465): remark #593: variable "flag" was set but never used

fs/exec.c:
1477 /*
1478 * We cannot trust fsuid as being the "true" uid of the
1479 * process nor do we know its entire history. We only know it
1480 * was tainted so we dump it as root in mode 2.
1481 */
1482 if (mm->dumpable == 2) { /* Setuid core dump mode */
1483 flag = O_EXCL; /* Stop rewrite attacks */
1484 current->fsuid = 0; /* Dump root private */
1485 }

And then filp_open follows with "flag" totally ignored.

(akpm: this restores the code to Alan's original version. Andi's "Support
piping into commands in /proc/sys/kernel/core_pattern" (cset d025c9db) broke
it).

Cc: Alan Cox
Cc:
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2006-12-08 00:39:46 +0800
e94b17660 [PATCH] slab: remove SLAB_KERNEL ... Browse Code »

SLAB_KERNEL is an alias of GFP_KERNEL.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:24 +0800

02 Oct, 2006

1 commit

e9ff3990f [PATCH] namespaces: utsname: switch to using uts namespaces ... Browse Code »

Replace references to system_utsname to the per-process uts namespace
where appropriate. This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Cedric Le Goater
Cc: Jeff Dike
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2006-10-02 22:57:21 +0800

01 Oct, 2006

2 commits

d025c9db7 [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern ... Browse Code »

Using the infrastructure created in previous patches implement support to
pipe core dumps into programs.

This is done by overloading the existing core_pattern sysctl
with a new syntax:

|program

When the first character of the pattern is a '|' the kernel will instead
threat the rest of the pattern as a command to run. The core dump will be
written to the standard input of that program instead of to a file.

This is useful for having automatic core dump analysis without filling up
disks. The program can do some simple analysis and save only a summary of
the core dump.

The core dump proces will run with the privileges and in the name space of
the process that caused the core dump.

I also increased the core pattern size to 128 bytes so that longer command
lines fit.

Most of the changes comes from allowing core dumps without seeks. They are
fairly straight forward though.

One small incompatibility is that if someone had a core pattern previously
that started with '|' they will get suddenly new behaviour. I think that's
unlikely to be a real problem though.

Additional background:

> Very nice, do you happen to have a program that can accept this kind of
> input for crash dumps? I'm guessing that the embedded people will
> really want this functionality.

I had a cheesy demo/prototype. Basically it wrote the dump to a file again,
ran gdb on it to get a backtrace and wrote the summary to a shared directory.
Then there was a simple CGI script to generate a "top 10" crashes HTML
listing.

Unfortunately this still had the disadvantage to needing full disk space for a
dump except for deleting it afterwards (in fact it was worse because over the
pipe holes didn't work so if you have a holey address map it would require
more space).

Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
cores (at least it worked with zsh's =(cat core) syntax), so it would be
likely possible to do it without temporary space with a simple wrapper that
calls it in the right way. I ran out of time before doing that though.

The demo prototype scripts weren't very good. If there is really interest I
can dig them out (they are currently on a laptop disk on the desk with the
laptop itself being in service), but I would recommend to rewrite them for any
serious application of this and fix the disk space problem.

Also to be really useful it should probably find a way to automatically fetch
the debuginfos (I cheated and just installed them in advance). If nobody else
does it I can probably do the rewrite myself again at some point.

My hope at some point was that desktops would support it in their builtin
crash reporters, but at least the KDE people I talked too seemed to be happy
with their user space only solution.

Alan sayeth:

I don't believe that piping as such as neccessarily the right model, but
the ability to intercept and processes core dumps from user space is asked
for by many enterprise users as well. They want to know about, capture,
analyse and process core dumps, often centrally and in automated form.

[akpm@osdl.org: loff_t != unsigned long]
Signed-off-by: Andi Kleen
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2006-10-01 15:39:33 +0800
8f0ab5147 [PATCH] csa: convert CONFIG tag for extended accounting routines ... Browse Code »

There were a few accounting data/macros that are used in CSA but are #ifdef'ed
inside CONFIG_BSD_PROCESS_ACCT. This patch is to change those ifdef's from
CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT. A few defines are moved from
kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
include/linux/tsacct_kern.h.

Signed-off-by: Jay Lan
Cc: Shailabh Nagar
Cc: Balbir Singh
Cc: Jes Sorensen
Cc: Chris Sturtivant
Cc: Tony Ernst
Cc: Guillaume Thouvenin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jay Lan
2006-10-01 15:39:29 +0800

30 Sep, 2006

1 commit

3b9b8ab65 [PATCH] Fix unserialized task->files changing ... Browse Code »

Fixed race on put_files_struct on exec with proc. Restoring files on
current on error path may lead to proc having a pointer to already kfree-d
files_struct.

->files changing at exit.c and khtread.c are safe as exit_files() makes all
things under lock.

Found during OpenVZ stress testing.

[akpm@osdl.org: add export]
Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-09-30 00:18:12 +0800

27 Sep, 2006

2 commits

aafe6c2a2 [PATCH] de_thread: Use tsk not current ... Browse Code »

Ingo Oeser pointed out that because current expands to an inline function
it is more space efficient and somewhat faster to simply keep a cached copy
of current in another variable. This patch implements that for the
de_thread function.

(akpm: saves nearly 100 bytes of text on x86)

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-09-27 23:26:20 +0800
c18258c6f [PATCH] pid: Implement transfer_pid and use it to simplify de_thread ... Browse Code »

In de_thread we move pids from one process to another, a rather ugly case.
The function transfer_pid makes it clear what we are doing, and makes the
action atomic. This is useful we ever want to atomically traverse the
process group and session lists, in a rcu safe manner.

Even if the atomic properties this change should be a win as transfer_pid
should be less code to execute than executing both attach_pid and
detach_pid, and this should make de_thread slightly smaller as only a
single function call needs to be emitted. The only downside is that the
code might be slower to execute as the odds are against transfer_pid being
in cache.

Signed-off-by: Eric W. Biederman
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-09-27 23:26:19 +0800

28 Aug, 2006

1 commit

513627d7f [PATCH] fix up lockdep trace in fs/exec.c ... Browse Code »

This fixes the locking error noticed by lockdep:

=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
init/1 is trying to acquire lock:
(&sighand->siglock){....}, at: [] flush_old_exec+0x3ae/0x859

but task is already holding lock:
(&sighand->siglock){....}, at: [] flush_old_exec+0x39e/0x859

other info that might help us debug this:
2 locks held by init/1:
#0: (tasklist_lock){..--}, at: [] flush_old_exec+0x38e/0x859
#1: (&sighand->siglock){....}, at: [] flush_old_exec+0x39e/0x859

stack backtrace:
[] show_trace_log_lvl+0x54/0xfd
[] show_trace+0xd/0x10
[] dump_stack+0x19/0x1b
[] __lock_acquire+0x773/0x997
[] lock_acquire+0x4b/0x6c
[] _spin_lock+0x19/0x28
[] flush_old_exec+0x3ae/0x859
[] load_elf_binary+0x4aa/0x1628
[] search_binary_handler+0xa7/0x24e
[] do_execve+0x15b/0x1f9
[] sys_execve+0x29/0x4d
[] syscall_call+0x7/0xb

Signed-off-by: Arjan van de Ven
Signed-off-by: Dave Jones
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Jones
2006-08-28 02:01:32 +0800

25 Aug, 2006

2 commits

a969fd5a4 VFS: Remove redundant open-coded mode bit checks in open_exec(). ... Browse Code »

The check in open_exec() for inode->i_mode & 0111 has been made
redundant by the fix to permission().

Signed-off-by: Trond Myklebust
(cherry picked from 1d3741c5d991686699f100b65b9956f7ee7ae0ae commit)

Trond Myklebust
2006-08-25 03:55:16 +0800
9167b0b9a VFS: Remove redundant open-coded mode bit check in prepare_binfmt(). ... Browse Code »

The check in prepare_binfmt() for inode->i_mode & 0111 is redundant,
since open_exec() will already have done that.

Signed-off-by: Trond Myklebust
(cherry picked from 822dec482ced07af32c378cd936d77345786572b commit)

Trond Myklebust
2006-08-25 03:55:06 +0800

01 Jul, 2006

1 commit

6ab3d5624 Remove obsolete #include <linux/config.h> ... Browse Code »

Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk

Jörn Engel
2006-07-01 01:25:36 +0800

27 Jun, 2006

8 commits

5debfa6da [PATCH] coredump: shutdown current process first ... Browse Code »

This patch optimizes zap_threads() for the case when there are no ->mm
users except the current's thread group. In that case we can avoid
'for_each_process()' loop.

It also adds a useful invariant: SIGNAL_GROUP_EXIT (if checked under
->siglock) always implies that all threads (except may be current) have
pending SIGKILL.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-06-27 00:58:27 +0800
dcf560c59 [PATCH] coredump: some code relocations ... Browse Code »

This is a preparation for the next patch. No functional changes.
Basically, this patch moves '->flags & SIGNAL_GROUP_EXIT' check into
zap_threads(), and 'complete(vfork_done)' into coredump_wait outside of
->mmap_sem protected area.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-06-27 00:58:27 +0800
7b1c6154f [PATCH] coredump: don't take tasklist_lock ... Browse Code »

This patch removes tasklist_lock from zap_threads().
This is safe wrt:

do_exit:
The caller holds mm->mmap_sem. This means that task which
shares the same ->mm can't pass exit_mm(), so it can't be
unhashed from init_task.tasks or ->thread_group lists.

fork:
None of sub-threads can fork after zap_process(leader). All
processes which were created before this point should be
visible to zap_threads() because copy_process() adds the new
process to the tail of init_task.tasks list, and ->siglock
lock/unlock provides a memory barrier.

de_thread:
It does list_replace_rcu(&leader->tasks, ¤t->tasks).
So zap_threads() will see either old or new leader, it does
not matter. However, it can change p->sighand, so we should
use lock_task_sighand() in zap_process().

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-06-27 00:58:27 +0800
d5f70c00a [PATCH] coredump: kill ptrace related stuff ... Browse Code »

With this patch zap_process() sets SIGNAL_GROUP_EXIT while sending SIGKILL to
the thread group. This means that a TASK_TRACED task

1. Will be awakened by signal_wake_up(1)

2. Can't sleep again via ptrace_notify()

3. Can't go to do_signal_stop() after return
from ptrace_stop() in get_signal_to_deliver()

So we can remove all ptrace related stuff from coredump path.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-06-27 00:58:27 +0800
281de339c [PATCH] coredump: speedup SIGKILL sending ... Browse Code »

With this patch a thread group is killed atomically under ->siglock. This is
faster because we can use sigaddset() instead of force_sig_info() and this is
used in further patches.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-06-27 00:58:27 +0800
aceecc041 [PATCH] coredump: optimize ->mm users traversal ... Browse Code »

zap_threads() iterates over all threads to find those ones which share
current->mm. All threads in the thread group share the same ->mm, so we can
skip entire thread group if it has another ->mm.

This patch shifts the killing of thread group into the newly added
zap_process() function. This looks as unnecessary complication, but it is
used in further patches.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-06-27 00:58:26 +0800
2ceb8693e [PATCH] de_thread: fix lockless do_each_thread ... Browse Code »

We should keep the value of old_leader->tasks.next in de_thread, otherwise
we can't do for_each_process/do_each_thread without tasklist_lock held.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-06-27 00:58:26 +0800
48e6484d4 [PATCH] proc: Rewrite the proc dentry flush on exit optimization ... Browse Code »

To keep the dcache from filling up with dead /proc entries we flush them on
process exit. However over the years that code has gotten hairy with a
dentry_pointer and a lock in task_struct and misdocumented as a correctness
feature.

I have rewritten this code to look and see if we have a corresponding entry in
the dcache and if so flush it on process exit. This removes the extra fields
in the task_struct and allows me to trivially handle the case of a
/proc//task/ entry as well as the current /proc/ entries.

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-06-27 00:58:24 +0800

23 Jun, 2006

1 commit

c89681ed7 [PATCH] remove steal_locks() ... Browse Code »

This patch removes the steal_locks() function.

steal_locks() doesn't work correctly with any filesystem that does it's own
lock management, including NFS, CIFS, etc.

In addition it has weird semantics on local filesystems in case tasks
sharing file-descriptor tables are doing POSIX locking operations in
parallel to execve().

The steal_locks() function has an effect on applications doing:

clone(CLONE_FILES)
/* in child */
lock
execve
lock

POSIX locks acquired before execve (by "child", "parent" or any further
task sharing files_struct) will after the execve be owned exclusively by
"child".

According to Chris Wright some LSB/LTP kind of suite triggers without the
stealing behavior, but there's no known real-world application that would
also fail.

Apps using NPTL are not affected, since all other threads are killed before
execve.

Apps using LinuxThreads are only affected if they

- have multiple threads during exec (LinuxThreads doesn't kill other
threads, the app may do it with pthread_kill_other_threads_np())
- rely on POSIX locks being inherited across exec

Both conditions are documented, but not their interaction.

Apps using clone() natively are affected if they

- use clone(CLONE_FILES)
- rely on POSIX locks being inherited across exec

The above scenarios are unlikely, but possible.

If the patch is vetoed, there's a plan B, that involves mostly keeping the
weird stealing semantics, but changing the way lock ownership is handled so
that network and local filesystems work consistently.

That would add more complexity though, so this solution seems to be
preferred by most people.

Signed-off-by: Miklos Szeredi
Cc: Trond Myklebust
Cc: Matthew Wilcox
Cc: Chris Wright
Cc: Christoph Hellwig
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2006-06-23 06:05:57 +0800

20 Jun, 2006

1 commit

473ae30bc [PATCH] execve argument logging ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2006-06-20 17:25:21 +0800

20 Apr, 2006

1 commit

5e85d4abe [PATCH] task: Make task list manipulations RCU safe ... Browse Code »

While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking the
entire task list.

We already have all of the other RCU guarantees so there is no cost in
doing this, this should be enough so that proc can stop taking the
tasklist lock during readdir.

prev_task was killed because it has no users, and using it will miss new
tasks when doing an rcu traversal.

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-04-20 00:13:49 +0800

14 Apr, 2006

1 commit

c06511d12 [PATCH] de_thread: Don't change our parents and ptrace flags. ... Browse Code »

This is two distinct changes.
- Not changing our real parents.
- Not changing our ptrace parents.

Not changing our real parents is trivially correct because both tasks
have the same real parents as they are part of a thread group. Now that
we demote the leader to a thread there is no longer any reason to change
it's parentage.

Not changing our ptrace parents is a user visible change if someone
looks hard enough. I don't think user space applications will care or
even notice.

In the practical and I think common case a debugger will have attached
to all of the threads using the same ptrace flags. From my quick skim
of strace and gdb that appears to be the case. Which if true means
debuggers will not notice a change.

Before this point we have already generated a ptrace event in do_exit
that reports the leaders pid has died so de_thread is visible to a
debugger. Which means attempting to hide this case by copying flags
around appears excessive.

By not doing anything it avoids all of the weird locking issues between
de_thread and ptrace attach, and removes one case from consideration for
fixing the ptrace locking.

This only addresses Oleg's first concern with ptrace_attach, that of the
problems caused by reparenting. Oleg's second concern is essentially a
race between ptrace_attach and release_task that causes an oops when we
get to force_sig_specific. There is nothing special about de_thread
with respect to that race.

Signed-off-by: Eric W. Biederman
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-04-14 23:49:19 +0800

11 Apr, 2006

2 commits

f5e902817 [PATCH] process accounting: take original leader's start_time in non-leader exec ... Browse Code »

The only record we have of the real-time age of a process, regardless of
execs it's done, is start_time. When a non-leader thread exec, the
original start_time of the process is lost. Things looking at the
real-time age of the process are fooled, for example the process accounting
record when the process finally dies. This change makes the oldest
start_time stick around with the process after a non-leader exec. This way
the association between PID and start_time is kept constant, which seems
correct to me.

Signed-off-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2006-04-11 21:18:42 +0800
de12a7878 [PATCH] de_thread: Don't confuse users do_each_thread. ... Browse Code »

Oleg Nesterov spotted two interesting bugs with the current de_thread
code. The simplest is a long standing double decrement of
__get_cpu_var(process_counts) in __unhash_process. Caused by
two processes exiting when only one was created.

The other is that since we no longer detach from the thread_group list
it is possible for do_each_thread when run under the tasklist_lock to
see the same task_struct twice. Once on the task list as a
thread_group_leader, and once on the thread list of another
thread.

The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.

To remedy those two problems this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.

Since de_thread doesn't change the pid of the exiting leader process
and instead shares it with the new leader process. I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids. This should also be
slightly cheaper then the existing thread_group_leader macro.

I performed a quick audit and I couldn't see any user of
thread_group_leader that cared about the difference.

Signed-off-by: Eric W. Biederman
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-04-11 07:36:50 +0800

01 Apr, 2006

1 commit

7dddb12c6 BUG_ON() Conversion in fs/exec.c ... Browse Code »

this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk

Eric Sesterhenn
2006-04-01 07:13:38 +0800

29 Mar, 2006

1 commit

aa1757f90 [PATCH] convert sighand_cache to use SLAB_DESTROY_BY_RCU ... Browse Code »

This patch borrows a clever Hugh's 'struct anon_vma' trick.

Without tasklist_lock held we can't trust task->sighand until we locked it
and re-checked that it is still the same.

But this means we don't need to defer 'kmem_cache_free(sighand)'. We can
return the memory to slab immediately, all we need is to be sure that
sighand->siglock can't dissapear inside rcu protected section.

To do so we need to initialize ->siglock inside ctor function,
SLAB_DESTROY_BY_RCU does the rest.

Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-03-29 10:36:42 +0800