Doug / smarc-fsl-linux-kernel | Embedian Git Server

09 Feb, 2008

40 commits

46f4f8f66 IRQ_NOPROBE helper functions

Probing non-ISA interrupts using the handle_percpu_irq as their handle_irq
method may crash the system because handle_percpu_irq does not check
IRQ_WAITING. This for example hits the MIPS Qemu configuration.

This patch provides two helper functions, set_irq_noprobe and set_irq_probe,
to set and clear the IRQ_NOPROBE flag, respectively. The only current caller
is MIPS code, but this really belongs in generic code.
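
As a rough illustration of what such a pair of helpers does, here is a
userspace model, not the kernel code. The descriptor table, the NR_IRQS
size, the flag value and the irq_may_be_probed() helper are all assumptions
made for the sketch; only the names set_irq_noprobe, set_irq_probe and
IRQ_NOPROBE come from the text above.

```c
#include <stdbool.h>

/* Userspace model only: table size and flag value are illustrative. */
#define NR_IRQS      16
#define IRQ_NOPROBE  0x0200u

struct irq_desc_model {
    unsigned int status;   /* bit flags, including IRQ_NOPROBE */
};

static struct irq_desc_model irq_desc[NR_IRQS];

/* Mark an interrupt as excluded from autoprobing. */
static void set_irq_noprobe(unsigned int irq)
{
    if (irq >= NR_IRQS)
        return;            /* ignore out-of-range numbers */
    irq_desc[irq].status |= IRQ_NOPROBE;
}

/* Allow the interrupt to be autoprobed again. */
static void set_irq_probe(unsigned int irq)
{
    if (irq >= NR_IRQS)
        return;
    irq_desc[irq].status &= ~IRQ_NOPROBE;
}

/* Hypothetical check a probing loop could use to skip flagged interrupts. */
static bool irq_may_be_probed(unsigned int irq)
{
    return irq < NR_IRQS && !(irq_desc[irq].status & IRQ_NOPROBE);
}
```

A platform would call set_irq_noprobe() once for each interrupt whose
handler cannot tolerate probing, and the probing code would then skip it.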

As an aside, interrupt probing these days has become a mostly obsolete, if
not dangerous, art. I think Linux interrupts should be changed to default to
non-probing, but that's not the subject of this patch.

Signed-off-by: Ralf Baechle
Acked-and-tested-by: Rob Landley
Cc: Alan Cox
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ralf Baechle
2008-02-09 01:22:42 +0800
06b2a76d2 Add new string functions strict_strto* and convert kernel params to use them

Currently, the callers of every sysfs node are responsible for implementing
the store operation, so many callers duplicate the same input validation and
repeat the same mistakes, because they call simple_strtol/ul/ll/ull. Module
params are a notable case: they are just numeric, yet you can echo values
such as 0x1234xxx, 07777888 and 1234aaa. In these cases the module params
store operation ignores the trailing invalid characters and converts the
leading part to a number, although the input is actually invalid.

This patch tries to fix the aforementioned issues by implementing the
strict_strtox series of functions. kernel/params.c uses them to strictly
validate input, so module params will reject values such as 0x1234xxxx and
return an error:

write error: Invalid argument

Any module which exports a numeric sysfs node can use strict_strtox instead
of simple_strtox to reject invalid input.
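
The difference between lenient and strict parsing can be sketched in
userspace C. This is only an illustration built on the C library's
strtoul(); strict_parse_ulong() is a hypothetical name and this is not the
kernel implementation. It assumes that a single trailing newline (as
appended by echo) is acceptable and that anything else after the number is
an error.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace sketch of strict numeric parsing (hypothetical helper).
 * Returns 0 and stores the value on success, or -EINVAL if the string is
 * empty, does not start with a digit, is out of range, or has anything
 * other than one trailing newline after the number.
 */
static int strict_parse_ulong(const char *s, unsigned int base,
                              unsigned long *res)
{
    char *end;
    unsigned long val;

    if (s == NULL || !isdigit((unsigned char)*s))
        return -EINVAL;      /* also rejects leading spaces and signs */

    errno = 0;
    val = strtoul(s, &end, (int)base);
    if (end == s || errno == ERANGE)
        return -EINVAL;      /* no digits consumed, or out of range */

    if (*end == '\n')        /* tolerate the newline added by echo */
        end++;
    if (*end != '\0')
        return -EINVAL;      /* trailing garbage such as "g" or "8" */

    *res = val;
    return 0;
}
```

With base 0 the prefix selects the radix, so "0x1000" and "010000" both
parse to 4096, while "0x1000g" and "0100008" are rejected because
characters remain after the last valid digit.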

Here are some test results:

Before applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#

After applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#

[akpm@linux-foundation.org: fix compiler warnings]
[akpm@linux-foundation.org: fix off-by-one found by tiwai@suse.de]
Signed-off-by: Yi Yang
Cc: Greg KH
Cc: "Randy.Dunlap"
Cc: Takashi Iwai
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yi Yang
2008-02-09 01:22:41 +0800
fa7303e22 cpu: fix section mismatch warnings for enable_nonboot_cpus

Fix the following warning:
WARNING: o-x86_64/kernel/built-in.o(.text+0x36d8b): Section mismatch in reference from the function enable_nonboot_cpus() to the function .cpuinit.text:_cpu_up()

enable_nonboot_cpus() is used solely with CONFIG_PM_SLEEP_SMP=y,
and PM_SLEEP_SMP implies HOTPLUG_CPU; therefore the reference
to _cpu_up() is valid.
Annotate enable_nonboot_cpus() with __ref to silence modpost.

Signed-off-by: Sam Ravnborg
Cc: Gautham R Shenoy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sam Ravnborg
2008-02-09 01:22:41 +0800
48d13e483 Don't operate with pid_t in rtmutex tester

The proper way to store a task's pid and find that task later is to keep the
struct pid pointer and get the task with the pid_task() call.

Do this for the rt_mutex_waiter->deadlock_task_pid field.

Signed-off-by: Pavel Emelyanov
Cc: "Eric W. Biederman"
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-02-09 01:22:41 +0800
8dc86af00 Use find_task_by_vpid in posix timers

All the functions that need to look up a task by pid in posix timers obtain
this pid from user space, and thus the value refers to a task in the same
namespace as the one the current task lives in.

So the proper behavior is to call find_task_by_vpid() here.

Signed-off-by: Pavel Emelyanov
Cc: "Eric W. Biederman"
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-02-09 01:22:41 +0800
bdc807871 avoid overflows in kernel/time.c

When the conversion factor between jiffies and milli- or microseconds is
not a single multiply or divide, as for the case of HZ == 300, we currently
do a multiply followed by a divide. The intervening result, however, is
subject to overflows, especially since the fraction is not simplified (for
HZ == 300, we multiply by 300 and divide by 1000).

This is exposed to the user when passing a large timeout to poll(), for
example.

This patch replaces the multiply-divide with a reciprocal multiplication on
32-bit platforms. When the input is an unsigned long, there is no portable
way to do this on 64-bit platforms, since it requires a 128-bit intermediate
result (which gcc does support on 64-bit platforms but may generate libgcc
calls, e.g. on 64-bit s390). However, since the output is a 32-bit integer
in the cases affected, just simplify the multiply-divide (*3/10 instead of
*300/1000).
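
The overflow and the effect of reducing the fraction can be shown with a
small userspace sketch. The function names are hypothetical, the arithmetic
is deliberately done in 32 bits, and results are truncated rather than
rounded, so this illustrates the problem rather than reproducing the code
in kernel/time.c.

```c
#include <stdint.h>

#define HZ 300u   /* the awkward case named in the text */

/*
 * Multiply-then-divide with the unreduced fraction 300/1000. The 32-bit
 * product wraps once m * 300 exceeds 2^32, i.e. for m above roughly
 * 14.3 million milliseconds (about four hours).
 */
static uint32_t msecs_to_jiffies_naive(uint32_t m)
{
    return (uint32_t)(m * HZ) / 1000u;
}

/*
 * Same conversion with the fraction reduced to 3/10. The product now
 * wraps only for m above roughly 1.43 billion milliseconds.
 */
static uint32_t msecs_to_jiffies_reduced(uint32_t m)
{
    return (uint32_t)(m * 3u) / 10u;
}
```

For a timeout of 20,000,000 ms the correct result is 6,000,000 jiffies. The
reduced form returns that value, while the unreduced form wraps and returns
1,705,032.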

The reciprocal multiply used can have off-by-one errors in the upper half
of the valid output range. This could be avoided at the expense of having
to deal with a potential 65-bit intermediate result. Since the intent is
to avoid overflow problems and most of the other time conversions are only
semiexact, the off-by-one errors were considered an acceptable tradeoff.
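
A minimal sketch of a reciprocal multiply for HZ == 300, assuming a 32-bit
scaled constant and truncating rounding. The constant name and both
function names are made up for this example; the constants and rounding
adjustments actually generated for the kernel may differ.

```c
#include <stdint.h>

/* ceil(300 * 2^32 / 1000): the scaled reciprocal for HZ == 300. */
#define MSEC_TO_HZ_MUL32_MODEL 1288490189u

/*
 * One 32x32 -> 64-bit multiply and a shift, with no divide. The 64-bit
 * product cannot overflow for any 32-bit input.
 */
static uint32_t msecs_to_jiffies_recip(uint32_t m)
{
    return (uint32_t)(((uint64_t)m * MSEC_TO_HZ_MUL32_MODEL) >> 32);
}

/* Exact reference: 64-bit multiply followed by a divide. */
static uint32_t msecs_to_jiffies_exact(uint32_t m)
{
    return (uint32_t)(((uint64_t)m * 300u) / 1000u);
}
```

Because the constant is rounded up, the result can exceed the exact value
by one for very large inputs. For m = 4294967293 the exact result is
1288490187 and the reciprocal form gives 1288490188; for small inputs the
two agree.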

At Ralf Baechle's suggestion, this version uses a Perl script to compute
the necessary constants. We already have dependencies on Perl for kernel
compiles. This does, however, require the Perl module Math::BigInt, which
is included in the standard Perl distribution starting with version 5.8.0.
In order to support older versions of Perl, include a table of canned
constants in the script itself, and structure the script so that
Math::BigInt isn't required if pulling values from said table.

Running the script requires that the HZ value is available from the
Makefile. Thus, this patch also adds the Kconfig variable CONFIG_HZ to the
architectures which didn't already have it (alpha, cris, frv, h8300, m32r,
m68k, m68knommu, sparc, v850, and xtensa.) It does *not* touch the sh or
sh64 architectures, since Paul Mundt has dealt with those separately in the
sh tree.

Signed-off-by: H. Peter Anvin
Cc: Ralf Baechle
Cc: Sam Ravnborg
Cc: Paul Mundt
Cc: Richard Henderson
Cc: Michael Starvik
Cc: David Howells
Cc: Yoshinori Sato
Cc: Hirokazu Takata
Cc: Geert Uytterhoeven
Cc: Roman Zippel
Cc: William L. Irwin
Cc: Chris Zankel
Cc: H. Peter Anvin
Cc: Jan Engelhardt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

H. Peter Anvin
2008-02-09 01:22:39 +0800
7ef3d2fd1 printk_ratelimit() functions should use CONFIG_PRINTK

Makes an embedded image a bit smaller.

Signed-off-by: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2008-02-09 01:22:39 +0800
6d141c3ff workqueue: make delayed_work_timer_fn() static

delayed_work_timer_fn() is a timer function, make it static.

Signed-off-by: Li Zefan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-02-09 01:22:37 +0800
a36219ac9 The scheduled 'time' option removal

The scheduled removal of the 'time' option.

Signed-off-by: Adrian Bunk
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-02-09 01:22:36 +0800
efae09f3e Nuke duplicate header from sysctl.c

Don't include linux/security.h twice in kernel/sysctl.c

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2008-02-09 01:22:34 +0800
f8db694e4 Nuke a duplicate include from profile.c

Remove duplicate inclusion of linux/profile.h from kernel/profile.c

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2008-02-09 01:22:34 +0800
2dc9c9131 Nuke duplicate include from printk.c

Remove the duplicate inclusion of linux/jiffies.h from kernel/printk.c

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2008-02-09 01:22:34 +0800
8b21985c9 constify tables in kernel/sysctl_check.c

The question remains whether it is intended that many, perhaps even large,
tables are compiled in without ever having a chance to get used, i.e.
whether #ifdef CONFIG_xxx shouldn't be added.

[akpm@linux-foundation.org: fix cut-n-paste error]
Signed-off-by: Jan Beulich
Acked-by: "Eric W. Biederman"
Cc: Dave Jones
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Beulich
2008-02-09 01:22:31 +0800
7ad5b3a50 kernel: remove fastcall in kernel/*

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Harvey Harrison
2008-02-09 01:22:31 +0800
3eb056764 time: fix typo in comments

Fix typo in comments.

BTW: I also have to fix the coding style in arch/ia64/kernel/time.c,
otherwise checkpatch.pl will complain.

Signed-off-by: Li Zefan
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-02-09 01:22:29 +0800
cf4fc6cb7 timekeeping: rename timekeeping_is_continuous to timekeeping_valid_for_hres

Function timekeeping_is_continuous() no longer checks the
CLOCK_IS_CONTINUOUS flag; it now checks CLOCK_SOURCE_VALID_FOR_HRES. So
rename the function accordingly.

Signed-off-by: Li Zefan
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-02-09 01:22:29 +0800
0b858e6ff clockevent: simplify list operations

list_for_each_safe() suffices here.
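
For readers unfamiliar with the macro, the sketch below shows the general
pattern in userspace: list_for_each_safe() remembers the next node before
the loop body runs, so the current node may be unlinked and freed during
iteration. The list helpers are minimal copies written for this example,
and struct item, push_value(), drop_odd() and count_items() are
hypothetical; none of this is the clockevents code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace copy of a circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* Iterate while caching the next node, so the current one may be removed. */
#define list_for_each_safe(pos, n, head) \
    for (pos = (head)->next, n = pos->next; pos != (head); \
         pos = n, n = pos->next)

struct item {                  /* hypothetical payload type */
    int value;
    struct list_head node;
};

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Allocate an item and append it; returns 0 on success, -1 on failure. */
static int push_value(struct list_head *head, int value)
{
    struct item *it = malloc(sizeof(*it));

    if (it == NULL)
        return -1;
    it->value = value;
    list_add_tail(&it->node, head);
    return 0;
}

/* Unlink and free every item with an odd value; returns how many. */
static int drop_odd(struct list_head *head)
{
    struct list_head *pos, *n;
    int removed = 0;

    list_for_each_safe(pos, n, head) {
        struct item *it =
            (struct item *)((char *)pos - offsetof(struct item, node));

        if (it->value % 2 != 0) {
            list_del(pos);     /* safe: n already points past pos */
            free(it);
            removed++;
        }
    }
    return removed;
}

/* Count the entries that remain on the list. */
static int count_items(const struct list_head *head)
{
    const struct list_head *pos;
    int count = 0;

    for (pos = head->next; pos != head; pos = pos->next)
        count++;
    return count;
}
```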

Signed-off-by: Li Zefan
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-02-09 01:22:29 +0800
818c35780 clocksource: remove redundant code

Flag CLOCK_SOURCE_WATCHDOG is cleared twice. Note clocksource_change_rating()
won't do anything with the cs flag.

Signed-off-by: Li Zefan
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-02-09 01:22:29 +0800
146a505d4 Get rid of the kill_pgrp_info() function

There's only one caller left - the kill_pgrp one - so merge these two
functions and forget the kill_pgrp_info one.

Signed-off-by: Pavel Emelyanov
Reviewed-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-02-09 01:22:29 +0800
d5df763b8 Clean up the kill_something_info

This is the first step (of two) in removing the kill_pgrp_info.

All the users of this function are in kernel/signal.c, but all they need is to
call __kill_pgrp_info() with the tasklist_lock read-locked.

Fortunately, one of its users is the kill_something_info(), which already
needs this lock in one of its branches, so clean these branches up and call
the __kill_pgrp_info() directly.

Based on Oleg's view of how this function should look.

Signed-off-by: Oleg Nesterov
Signed-off-by: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-02-09 01:22:29 +0800
6c5f3e7b4 Pidns: make full use of xxx_vnr() calls

Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were
_all_ converted to operate on the current pid namespace. After this each call
like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo)
one.

Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where
appropriate.

Signed-off-by: Pavel Emelyanov
Reviewed-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-02-09 01:22:29 +0800
fea9d1755 ITIMER_REAL: convert to use struct pid

signal_struct->tsk points to the ->group_leader and thus we have the nasty
code in de_thread() which has to change it and restart ->real_timer if the
leader is changed.

Use "struct pid *leader_pid" instead. This also allows us to kill now
unneeded send_group_sig_info().

Signed-off-by: Oleg Nesterov
Acked-by: "Eric W. Biederman"
Cc: Davide Libenzi
Cc: Pavel Emelyanov
Acked-by: Roland McGrath
Acked-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:29 +0800
d36174bc2 uglify kill_pid_info() to fix kill() vs exec() race

kill_pid_info()->pid_task() could be the old leader of the execing process.
In that case it is possible that the leader will be released before we take
siglock. This means that kill_pid_info() (and thus sys_kill()) can return a
false -ESRCH.

Change the code to retry when lock_task_sighand() fails. The endless loop is
not possible, __exit_signal() both clears ->sighand and does detach_pid().

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Davide Libenzi
Cc: Pavel Emelyanov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:28 +0800
ac9a8e3f0 sys_getsid: don't use ->nsproxy directly

With the new semantics of find_vpid() we don't need to play with ->nsproxy
explicitly; the _vxx() helpers do the right thing.

Also s/tasklist/rcu/.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:28 +0800
44c4e1b25 pid: Extend/Fix pid_vnr

pid_vnr returns the user space pid with respect to the pid namespace the
struct pid was allocated in. What we want before we return a pid to user
space is the user space pid with respect to the pid namespace of current.

pid_vnr is a very nice optimization but because it isn't quite what we want
it is easy to use pid_vnr at times when we aren't certain the struct pid
was allocated in our pid namespace.

Currently this describes at least tiocgpgrp and tiocgsid in ttyio.c, the
parent process reported in the core dumps, and the parent process in
get_signal_to_deliver.

So unless the performance impact is huge, having an interface that does what
we want instead of almost what we want should be much more reliable and
much less error prone.
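
The distinction between "the number in the namespace the pid was allocated
in" and "the number as seen from a given namespace" can be modelled in
userspace as follows. All types and function names here are simplified
stand-ins invented for the sketch (a fixed-depth array, int instead of
pid_t); they are not the kernel's definitions.

```c
#include <stddef.h>

#define MAX_LEVEL 4   /* arbitrary nesting limit for the model */

struct pid_ns_model {
    int level;                       /* 0 for the initial namespace */
};

struct upid_model {
    int nr;                          /* number seen at this level */
    const struct pid_ns_model *ns;   /* namespace the number belongs to */
};

struct pid_model {
    int level;                       /* level of the allocating namespace */
    struct upid_model numbers[MAX_LEVEL];
};

/* Number in the namespace where the pid was allocated. */
static int pid_nr_in_own_ns(const struct pid_model *pid)
{
    return pid ? pid->numbers[pid->level].nr : 0;
}

/* Number as seen from ns, or 0 if the pid is not visible from there. */
static int pid_nr_in_ns(const struct pid_model *pid,
                        const struct pid_ns_model *ns)
{
    if (pid && ns->level <= pid->level &&
        pid->numbers[ns->level].ns == ns)
        return pid->numbers[ns->level].nr;
    return 0;
}
```

In this model, a process created inside a child namespace might be number 1
in its own namespace and 1234 in the initial one. A caller in the initial
namespace wants 1234, which only the second function returns when passed
the caller's namespace.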

Signed-off-by: Eric W. Biederman
Cc: Oleg Nesterov
Acked-by: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2008-02-09 01:22:27 +0800
161550d74 pid: sys_wait... fixes

This modifies do_wait and eligible_child to take a pair of enum pid_type
and struct pid *pid to precisely specify what set of processes are eligible
to be waited for, instead of the raw pid_t value from sys_wait4.
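
One way to picture the (type, pid) pair is as a decoding of the familiar
waitpid() argument convention. The sketch below is a userspace model under
that assumption: the enum, the struct, and wait_target_from_upid() are
hypothetical names, and plain integers stand in for struct pid pointers.

```c
/* Which set of children a wait call selects (model of a pid type). */
enum wait_type_model {
    WAIT_ANY_CHILD,     /* pid == -1: any child */
    WAIT_PROCESS,       /* pid  >  0: exactly that process */
    WAIT_PROCESS_GROUP  /* pid ==  0 or pid < -1: a process group */
};

struct wait_target {
    enum wait_type_model type;
    int nr;             /* process or group number; unused for ANY */
};

/*
 * Decode the raw pid argument once, up front, so that later eligibility
 * checks compare against an explicit (type, nr) pair instead of
 * re-interpreting the sign of the raw value each time.
 */
static struct wait_target wait_target_from_upid(int upid, int caller_pgrp)
{
    struct wait_target t;

    if (upid == -1) {
        t.type = WAIT_ANY_CHILD;
        t.nr = 0;
    } else if (upid < 0) {
        t.type = WAIT_PROCESS_GROUP;
        t.nr = -upid;
    } else if (upid == 0) {
        t.type = WAIT_PROCESS_GROUP;
        t.nr = caller_pgrp;
    } else {
        t.type = WAIT_PROCESS;
        t.nr = upid;
    }
    return t;
}
```

With a raw value, "process group 1" would have to be written as -1, which
already means "any child"; an explicit pair has no such collision.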

This fixes a bug in sys_waitid where you could not wait for children in
just process group 1.

This fixes a pid namespace crossing case in eligible_child, allowing us to
wait for processes in our current process group even if our current
process group == 0.

This allows the no-child-with-this-pid case to be optimized. It also allows
the pid membership test in eligible_child to be optimized.

This even closes a theoretical pid wraparound race. In a threaded parent,
if two threads are waiting for the same child, one thread picks up the
child, and the pid numbers wrap around and generate another child with that
same pid before the other thread is scheduled (terribly, insanely unlikely),
we could end up waiting on the second child with the same pid# and not
discover that the specific child we were waiting for has exited.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eric W. Biederman
Cc: Oleg Nesterov
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2008-02-09 01:22:27 +0800
5dee1707d move the related code from exit_notify() to exit_signals()

The previous bugfix was not optimal, we shouldn't care about group stop
when we are the only thread or the group stop is in progress. In that case
nothing special is needed, just set PF_EXITING and return.

Also, take the related "TIF_SIGPENDING re-targeting" code from exit_notify().

So, from the performance POV the only difference is that we don't trust
!signal_pending() until we take ->siglock. But this in fact fixes another
___pure___ theoretical minor race. __group_complete_signal() finds the
task without PF_EXITING and chooses it as the target for signal_wake_up().
But nothing prevents this task from exiting in between without noticing the
pending signal and thus unpredictably delaying the actual delivery.

Signed-off-by: Oleg Nesterov
Cc: Davide Libenzi
Cc: Ingo Molnar
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
6806aac6d sys_setsid: remove now unneeded session != 1 check

Eric's "fix clone(CLONE_NEWPID)" eliminated the last reason for this hack.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
d12619b5f fix group stop with exit race

do_signal_stop() counts all sub-threads and sets ->group_stop_count
accordingly. Every thread should decrement ->group_stop_count and stop;
the last one should notify the parent.

However a sub-thread can exit before it notices the signal_pending(), or it
may be somewhere in do_exit() already. In that case the group stop never
finishes properly.

Note: this is a minimal fix, we can add some optimizations later. Say we
can return quickly if thread_group_empty(). Also, we can move some signal
related code from exit_notify() to exit_signals().

Signed-off-by: Oleg Nesterov
Acked-by: Davide Libenzi
Cc: Ingo Molnar
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
430c62312 start the global /sbin/init with 0,0 special pids

As Eric pointed out, there is no problem with init starting with sid == pgid
== 0, and this was historical linux behavior changed in 2.6.18.

Remove kernel_init()->__set_special_pids(), this is unneeded and complicates
the rules for sys_setsid().

This change and the previous change in daemonize() mean that /sbin/init does
not need the special "session != 1" hack in sys_setsid() any longer. We can't
remove this check yet, we should cleanup copy_process(CLONE_NEWPID) first, so
update the comment only.

Signed-off-by: Oleg Nesterov
Acked-by: "Eric W. Biederman"
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
297bd42b1 move daemonized kernel threads into the swapper's session

Daemonized kernel threads run in the init's session. This doesn't match the
behaviour of kthread_create()'ed threads, and this is one of the 2 reasons
why we need a special hack in sys_setsid().

Now that set_special_pids() was changed to use struct pid, not pid_t, we can
use init_struct_pid and set 0,0 special pids.

Signed-off-by: Oleg Nesterov
Acked-by: "Eric W. Biederman"
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
8520d7c7f teach set_special_pids() to use struct pid

Change set_special_pids() to work with struct pid, not pid_t from the global
namespace. This again speeds up and, imho, cleans up the code, and is also a
preparation for the next patch.

Signed-off-by: Oleg Nesterov
Acked-by: "Eric W. Biederman"
Acked-by: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
e4cc0a9c8 fix setsid() for sub-namespace /sbin/init

sys_setsid() still deals with pid_t's from the global namespace. This means
that the "session > 1" check can't help for sub-namespace init, setsid() can't
succeed because copy_process(CLONE_NEWPID) populates PIDTYPE_PGID/SID links.

Remove the usage of task_struct->pid and convert the code to use "struct pid".
This also simplifies and speeds up the code, and saves one find_pid().

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Acked-by: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
4e021306c sys_setpgid(): simplify pid/ns interaction

sys_setpgid() does unneeded conversions from pid_t to "struct pid" and vice
versa. Use "struct pid" more consistently. This saves one find_vpid() and
eliminates the explicit usage of ->nsproxy->pid_ns. Imho, it also cleans up
the code.

Also use the same_thread_group() helper.

Signed-off-by: Oleg Nesterov
Acked-by: Pavel Emelyanov
Acked-by: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
c543f1ee0 wait_task_zombie: remove ->exit_state/exit_signal checks for WNOWAIT

The first "p->exit_state != EXIT_ZOMBIE" check doesn't make too much sense.
The exit_state was EXIT_ZOMBIE when the function was called, and another
thread can change it to EXIT_DEAD right after the check.

The second condition is not possible, detached non-traced threads were already
filtered out by eligible_child(), we didn't drop tasklist since then.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
3a515e4a6 wait_task_continued/zombie: don't use task_pid_nr_ns() lockless

Surprise, the other two wait_task_*() functions also abuse the
task_pid_nr_ns() function, and may cause read-after-free or report nr == 0
in wait_task_continued(). wait_task_zombie() doesn't have this problem,
but it is still better to cache pid_t rather than call task_pid_nr_ns()
three times on the saved pid_namespace.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:26 +0800
f2cc3eb13 do_wait: fix security checks

Imho, the current usage of security_task_wait() is not logical.

Suppose we have a single child p, and security_task_wait(p) returns
-EANY. In that case waitpid(-1) returns this error. Why? Isn't it
better to return ECHILD? We don't really have reapable children.

Now suppose that child was stolen by gdb. In that case we find this
child on ->ptrace_children and set flag = 1, but we don't check that the
child was denied. So, do_wait(..., WNOHANG) returns 0, which doesn't
match the behaviour above. Without WNOHANG do_wait() blocks only to
return the error later, when the child is untraced. Imho, really
strange.

I think eligible_child() should return the error only if the child's pid
was requested explicitly, otherwise we should silently ignore the tasks
which were nacked by security_task_wait().
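
The proposed rule can be written down as a small decision function. This is
a userspace model with hypothetical names and plain integers; it only
captures the policy described above (an explicit pid request propagates the
error, a wildcard wait skips the denied child) and is not the kernel's
eligible_child().

```c
#include <errno.h>

/*
 * Model of the eligibility decision for one child.
 *
 * requested_pid: > 0 if the caller asked for one specific pid,
 *                otherwise a wildcard wait (e.g. waitpid(-1)).
 * child_pid:     pid of the child being examined.
 * security_err:  0 if the security hook allows the wait, or a negative
 *                errno if it denies it.
 *
 * Returns 1 if the child is eligible, 0 if it should be skipped
 * silently, or the negative errno when an explicitly requested child
 * was denied.
 */
static int eligible_child_model(int requested_pid, int child_pid,
                                int security_err)
{
    if (requested_pid > 0 && requested_pid != child_pid)
        return 0;                    /* not the child asked for */

    if (security_err == 0)
        return 1;                    /* allowed */

    if (requested_pid > 0)
        return security_err;         /* explicit request: report denial */

    return 0;                        /* wildcard wait: skip silently */
}
```

With this rule, a wildcard wait over a single denied child finds no
eligible children at all, so the caller can report ECHILD instead of the
security error.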

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Cc: Chris Wright
Cc: Eric Paris
Cc: James Morris
Cc: Stephen Smalley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:26 +0800
96fabbf55 do_wait: cleanup delay_group_leader() usage

eligible_child() == 2 means delay_group_leader(). With the previous patch
this only matters for EXIT_ZOMBIE task, we can move that special check to
the only place it is really needed.

Also, with this patch we don't skip security_task_wait() for the group
leaders in a non-empty thread group. I don't really understand the exact
semantics of security_task_wait(), but imho this change is a bugfix.

Also rearrange the code a bit to kill an ugly "check_continued" backdoor.

Signed-off-by: Oleg Nesterov
Cc: Eric Paris
Cc: James Morris
Cc: Roland McGrath
Cc: Stephen Smalley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:26 +0800
1bad95c3b wait_task_stopped(): remove unneeded delay_group_leader check

wait_task_stopped() doesn't need the "delay_group_leader" parameter. If
the child is not traced it must be a group leader. With or without
subthreads ->group_stop_count == 0 when the whole task is stopped.

Signed-off-by: Oleg Nesterov
Cc: Mika Penttila
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:26 +0800
20686a309 ptrace_stop: fix racy nonstop_code setting

If the tracer is gone and we are not going to stop, ptrace_stop() sets
->exit_code = nostop_code. However, the tracer could actually clear the
exit code before detaching. In that case get_signal_to_deliver() "resends"
the signal which was cancelled by the debugger. For example, it is
possible that a quick PTRACE_ATTACH + PTRACE_DETACH can leave the tracee in
STOPPED state.

Change the behaviour of ptrace_stop(). If the caller is ptrace_notify(),
we should always clear ->exit_code. If the caller is
get_signal_to_deliver(), we should not touch it at all. To do so, change
the nonstop_code parameter to "bool clear_code" and change the callers
accordingly.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:26 +0800