Eric Lee / smarc-fsl-linux-kernel

21 Aug, 2008

1 commit

2d70b68d4 fix setpriority(PRIO_PGRP) thread iterator breakage ... Browse Code »

When user calls sys_setpriority(PRIO_PGRP ...) on a NPTL style multi-LWP
process, only the task leader of the process is affected, all other
sibling LWP threads didn't receive the setting. The problem was that the
iterator used in sys_setpriority() only iteartes over one task for each
process, ignoring all other sibling thread.

Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk
each thread of a process. Convert 4 call sites in {set/get}priority and
ioprio_{set/get}.

Signed-off-by: Ken Chen
Cc: Oleg Nesterov
Cc: Roland McGrath
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken Chen
2008-08-21 06:40:32 +0800

15 Aug, 2008

1 commit

ca195b7f6 kexec jump: remove duplication of kexec_restart_prepare() ... Browse Code »

Call kernel_restart_prepare() in kernel_kexec() instead of duplicating the
code.

Signed-off-by: Huang Ying
Acked-by: Pavel Machek
Acked-by: Vivek Goyal
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:42 +0800

27 Jul, 2008

1 commit

3ab835213 kexec jump ... Browse Code »

This patch provides an enhancement to kexec/kdump. It implements the
following features:

- Backup/restore memory used by the original kernel before/after
kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call program in
physical mode (paging turning off). This can be used to call BIOS code under
Linux.

kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:

source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

Usage example of calling some physical mode code and return:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y

2. Build patched kexec-tool or download the pre-built one.

3. Build some physical mode executable named such as "phy_mode"

4. Boot kernel compiled in step 1.

5. Load physical mode executable with /sbin/kexec. The shell command
line can be as follow:

/sbin/kexec --load-preserve-context --args-none phy_mode

6. Call physical mode executable with following shell command line:

/sbin/kexec -e

Implementation point:

To support jumping without reserving memory. One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page). When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped. Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.

C ABI (calling convention) is used as communication protocol between
kernel and called code.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying
Acked-by: Vivek Goyal
Cc: "Eric W. Biederman"
Cc: Pavel Machek
Cc: Nigel Cunningham
Cc: "Rafael J. Wysocki"
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-07-27 03:00:04 +0800

26 Jul, 2008

2 commits

7394f0f6c unexport uts_sem ... Browse Code »

With the removal of the Solaris binary emulation the export of
uts_sem became unused.

Signed-off-by: Adrian Bunk
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-07-26 01:53:45 +0800
ac331d158 call_usermodehelper(): increase reliability ... Browse Code »

Presently call_usermodehelper_setup() uses GFP_ATOMIC. but it can return
NULL _very_ easily.

GFP_ATOMIC is needed only when we can't sleep. and, GFP_KERNEL is robust
and better.

thus, I add gfp_mask argument to call_usermodehelper_setup().

So, its callers pass the gfp_t as below:

call_usermodehelper() and call_usermodehelper_keys():
depend on 'wait' argument.
call_usermodehelper_pipe():
always GFP_KERNEL because always run under process context.
orderly_poweroff():
pass to GFP_ATOMIC because may run under interrupt context.

Signed-off-by: KOSAKI Motohiro
Cc: "Paul Menage"
Reviewed-by: Li Zefan
Acked-by: Jeremy Fitzhardinge
Cc: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2008-07-26 01:53:28 +0800

25 May, 2008

1 commit

7b26655f6 sys_prctl(): fix return of uninitialized value ... Browse Code »

If none of the switch cases match, the PR_SET_PDEATHSIG and
PR_SET_DUMPABLE cases of the switch statement will never write to local
variable `error'.

Signed-off-by: Shi Weihua
Cc: Andrew G. Morgan
Acked-by: "Serge E. Hallyn"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shi Weihua
2008-05-25 00:56:13 +0800

30 Apr, 2008

4 commits

12a3de0a9 pids: sys_getpgid: fix unsafe *pid usage, s/tasklist/rcu/ ... Browse Code »

1. sys_getpgid() needs rcu_read_lock() to derive the pgrp _nr, even if
the task is current, otherwise we can race with another thread which
does sys_setpgid().

2. Use rcu_read_lock() instead of tasklist_lock when pid != 0, make sure
that we don't use the NULL pid if the task exits right after successful
find_task_by_vpid().

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-04-30 23:29:49 +0800
1dd768c08 pids: sys_getsid: fix unsafe *pid usage, fix possible 0 instead of -ESRCH ... Browse Code »

1. sys_getsid() needs rcu_read_lock() to derive the session _nr, even if
the task is current, otherwise we can race with another thread which
does sys_setsid().

2. The task can exit between find_task_by_vpid() and task_session_vnr(),
in that unlikely case sys_getsid() returns 0 instead of -ESRCH.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-04-30 23:29:49 +0800
83beaf3c6 pids: sys_setpgid: use change_pid() helper ... Browse Code »

Use change_pid() instead of detach_pid() + attach_pid() in sys_setpgid().

This way task_pgrp() is not NULL in between.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-04-30 23:29:49 +0800
d6cf723a1 k_getrusage: don't take rcu_read_lock() ... Browse Code »

Just a trivial example, more to come.

k_getrusage() holds rcu_read_lock() because it was previously required by
lock_task_sighand(). Unneeded now.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: "Paul E. McKenney"
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-04-30 23:29:34 +0800

29 Apr, 2008

1 commit

679c9cd4a add RUSAGE_THREAD ... Browse Code »

Add the RUSAGE_THREAD option for the getrusage system call. This is
essentially Roland's patch from http://lkml.org/lkml/2008/1/18/589, but the
line about RUSAGE_LWP line has been removed, as suggested by Ulrich and
Christoph.

Signed-off-by: Roland McGrath
Signed-off-by: Sripathi Kodi
Cc: Ingo Molnar
Cc: Michael Kerrisk
Cc: Ulrich Drepper
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sripathi Kodi
2008-04-29 23:05:59 +0800

28 Apr, 2008

1 commit

3898b1b4e capabilities: implement per-process securebits ... Browse Code »

Filesystem capability support makes it possible to do away with (set)uid-0
based privilege and use capabilities instead. That is, with filesystem
support for capabilities but without this present patch, it is (conceptually)
possible to manage a system with capabilities alone and never need to obtain
privilege via (set)uid-0.

Of course, conceptually isn't quite the same as currently possible since few
user applications, certainly not enough to run a viable system, are currently
prepared to leverage capabilities to exercise privilege. Further, many
applications exist that may never get upgraded in this way, and the kernel
will continue to want to support their setuid-0 base privilege needs.

Where pure-capability applications evolve and replace setuid-0 binaries, it is
desirable that there be a mechanisms by which they can contain their
privilege. In addition to leveraging the per-process bounding and inheritable
sets, this should include suppressing the privilege of the uid-0 superuser
from the process' tree of children.

The feature added by this patch can be leveraged to suppress the privilege
associated with (set)uid-0. This suppression requires CAP_SETPCAP to
initiate, and only immediately affects the 'current' process (it is inherited
through fork()/exec()). This reimplementation differs significantly from the
historical support for securebits which was system-wide, unwieldy and which
has ultimately withered to a dead relic in the source of the modern kernel.

With this patch applied a process, that is capable(CAP_SETPCAP), can now drop
all legacy privilege (through uid=0) for itself and all subsequently
fork()'d/exec()'d children with:

prctl(PR_SET_SECUREBITS, 0x2f);

This patch represents a no-op unless CONFIG_SECURITY_FILE_CAPABILITIES is
enabled at configure time.

[akpm@linux-foundation.org: fix uninitialised var warning]
[serue@us.ibm.com: capabilities: use cap_task_prctl when !CONFIG_SECURITY]
Signed-off-by: Andrew G. Morgan
Acked-by: Serge Hallyn
Reviewed-by: James Morris
Cc: Stephen Smalley
Cc: Paul Moore
Signed-off-by: Serge E. Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew G. Morgan
2008-04-28 23:58:26 +0800

20 Apr, 2008

1 commit

8fb402bcc generic, x86: add prctl commands PR_GET_TSC and PR_SET_TSC ... Browse Code »

This patch adds prctl commands that make it possible
to deny the execution of timestamp counters in userspace.
If this is not implemented on a specific architecture,
prctl will return -EINVAL.

ned-off-by: Erik Bosman
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner

Erik Bosman
2008-04-20 01:19:55 +0800

09 Feb, 2008

7 commits

6c5f3e7b4 Pidns: make full use of xxx_vnr() calls ... Browse Code »

Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were
_all_ converted to operate on the current pid namespace. After this each call
like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo)
one.

Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where
appropriate.

Signed-off-by: Pavel Emelyanov
Reviewed-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-02-09 01:22:29 +0800
ac9a8e3f0 sys_getsid: don't use ->nsproxy directly ... Browse Code »

With the new semantics of find_vpid() we don't need to play with ->nsproxy
explicitely, _vxx() do the right things.

Also s/tasklist/rcu/.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:28 +0800
6806aac6d sys_setsid: remove now unneeded session != 1 check ... Browse Code »

Eric's "fix clone(CLONE_NEWPID)" eliminated the last reason for this hack.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
430c62312 start the global /sbin/init with 0,0 special pids ... Browse Code »

As Eric pointed out, there is no problem with init starting with sid == pgid
== 0, and this was historical linux behavior changed in 2.6.18.

Remove kernel_init()->__set_special_pids(), this is unneeded and complicates
the rules for sys_setsid().

This change and the previous change in daemonize() mean that /sbin/init does
not need the special "session != 1" hack in sys_setsid() any longer. We can't
remove this check yet, we should cleanup copy_process(CLONE_NEWPID) first, so
update the comment only.

Signed-off-by: Oleg Nesterov
Acked-by: "Eric W. Biederman"
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
8520d7c7f teach set_special_pids() to use struct pid ... Browse Code »

Change set_special_pids() to work with struct pid, not pid_t from global name
space. This again speedups and imho cleanups the code, also a preparation for
the next patch.

Signed-off-by: Oleg Nesterov
Acked-by: "Eric W. Biederman"
Acked-by: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
e4cc0a9c8 fix setsid() for sub-namespace /sbin/init ... Browse Code »

sys_setsid() still deals with pid_t's from the global namespace. This means
that the "session > 1" check can't help for sub-namespace init, setsid() can't
succeed because copy_process(CLONE_NEWPID) populates PIDTYPE_PGID/SID links.

Remove the usage of task_struct->pid and convert the code to use "struct pid".
This also simplifies and speedups the code, saves one find_pid().

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Acked-by: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800
4e021306c sys_setpgid(): simplify pid/ns interaction ... Browse Code »

sys_setpgid() does unneeded conversions from pid_t to "struct pid" and vice
versa. Use "struct pid" more consistently. Saves one find_vpid() and
eliminates the explicit usage of ->nsproxy->pid_ns. Imho, cleanups the
code.

Also use the same_thread_group() helper.

Signed-off-by: Oleg Nesterov
Acked-by: Pavel Emelyanov
Acked-by: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-02-09 01:22:27 +0800

07 Feb, 2008

2 commits

1bf47346d kernel/sys.c: get rid of expensive divides in groups_sort() ... Browse Code »

groups_sort() can be quite long if user loads a large gid table.

This is because GROUP_AT(group_info, some_integer) uses an integer divide.
So having to do XXX thousand divides during one syscall can lead to very
high latencies. (NGROUPS_MAX=65536)

In the past (25 Mar 2006), an analog problem was found in groups_search()
(commit d74beb9f33a5f16d2965f11b275e401f225c949d ) and at that time I
changed some variables to unsigned int.

I believe that a more generic fix is to make sure NGROUPS_PER_BLOCK is
unsigned.

Signed-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2008-02-07 02:41:09 +0800
9cfe015aa get rid of NR_OPEN and introduce a sysctl_nr_open ... Browse Code »

NR_OPEN (historically set to 1024*1024) actually forbids processes to open
more than 1024*1024 handles.

Unfortunatly some production servers hit the not so 'ridiculously high
value' of 1024*1024 file descriptors per process.

Changing NR_OPEN is not considered safe because of vmalloc space potential
exhaust.

This patch introduces a new sysctl (/proc/sys/fs/nr_open) wich defaults to
1024*1024, so that admins can decide to change this limit if their workload
needs it.

[akpm@linux-foundation.org: export it for sparc64]
Signed-off-by: Eric Dumazet
Cc: Alan Cox
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: "David S. Miller"
Cc: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2008-02-07 02:41:06 +0800

06 Feb, 2008

2 commits

4ef7229ff make kernel_shutdown_prepare() static ... Browse Code »

kernel_shutdown_prepare() can now become static.

Signed-off-by: Adrian Bunk
Acked-by: Pavel Machek
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-02-06 01:44:22 +0800
3b7391de6 capabilities: introduce per-process capability bounding set ... Browse Code »

The capability bounding set is a set beyond which capabilities cannot grow.
Currently cap_bset is per-system. It can be manipulated through sysctl,
but only init can add capabilities. Root can remove capabilities. By
default it includes all caps except CAP_SETPCAP.

This patch makes the bounding set per-process when file capabilities are
enabled. It is inherited at fork from parent. Noone can add elements,
CAP_SETPCAP is required to remove them.

One example use of this is to start a safer container. For instance, until
device namespaces or per-container device whitelists are introduced, it is
best to take CAP_MKNOD away from a container.

The bounding set will not affect pP and pE immediately. It will only
affect pP' and pE' after subsequent exec()s. It also does not affect pI,
and exec() does not constrain pI'. So to really start a shell with no way
of regain CAP_MKNOD, you would do

prctl(PR_CAPBSET_DROP, CAP_MKNOD);
cap_t cap = cap_get_proc();
cap_value_t caparray[1];
caparray[0] = CAP_MKNOD;
cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP);
cap_set_proc(cap);
cap_free(cap);

The following test program will get and set the bounding
set (but not pI). For instance

./bset get
(lists capabilities in bset)
./bset drop cap_net_raw
(starts shell with new bset)
(use capset, setuid binary, or binary with
file capabilities to try to increase caps)

************************************************************
cap_bound.c
************************************************************
#include
#include
#include
#include
#include
#include
#include

#ifndef PR_CAPBSET_READ
#define PR_CAPBSET_READ 23
#endif

#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24
#endif

int usage(char *me)
{
printf("Usage: %s get\n", me);
printf(" %s drop \n", me);
return 1;
}

#define numcaps 32
char *captable[numcaps] = {
"cap_chown",
"cap_dac_override",
"cap_dac_read_search",
"cap_fowner",
"cap_fsetid",
"cap_kill",
"cap_setgid",
"cap_setuid",
"cap_setpcap",
"cap_linux_immutable",
"cap_net_bind_service",
"cap_net_broadcast",
"cap_net_admin",
"cap_net_raw",
"cap_ipc_lock",
"cap_ipc_owner",
"cap_sys_module",
"cap_sys_rawio",
"cap_sys_chroot",
"cap_sys_ptrace",
"cap_sys_pacct",
"cap_sys_admin",
"cap_sys_boot",
"cap_sys_nice",
"cap_sys_resource",
"cap_sys_time",
"cap_sys_tty_config",
"cap_mknod",
"cap_lease",
"cap_audit_write",
"cap_audit_control",
"cap_setfcap"
};

int getbcap(void)
{
int comma=0;
unsigned long i;
int ret;

printf("i know of %d capabilities\n", numcaps);
printf("capability bounding set:");
for (i=0; i< 0)
perror("prctl");
else if (ret==1)
printf("%s%s", (comma++) ? ", " : " ", captable[i]);
}
printf("\n");
return 0;
}

int capdrop(char *str)
{
unsigned long i;

int found=0;
for (i=0; i
Signed-off-by: Andrew G. Morgan
Cc: Stephen Smalley
Cc: James Morris
Cc: Chris Wright
Cc: Casey Schaufler a
Signed-off-by: "Serge E. Hallyn"
Tested-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2008-02-06 01:44:20 +0800

17 Nov, 2007

1 commit

4307d1e5a x86: ignore the sys_getcpu() tcache parameter ... Browse Code »

dont use the vgetcpu tcache - it's causing problems with tasks
migrating, they'll see the old cache up to a jiffy after the
migration, further increasing the costs of the migration.

In the worst case they see a complete bogus information from
the tcache, when a sys_getcpu() call "invalidated" the cache
info by incrementing the jiffies _and_ the cpuid info in the
cache and the following vdso_getcpu() call happens after
vdso_jiffies have been incremented.

Signed-off-by: Ingo Molnar
Signed-off-by: Ulrich Drepper
Signed-off-by: Thomas Gleixner

Ingo Molnar
2007-11-17 23:27:00 +0800

20 Oct, 2007

5 commits

9a2e70572 Isolate the explicit usage of signal->pgrp ... Browse Code »

The pgrp field is not used widely around the kernel so it is now marked as
deprecated with appropriate comment.

The initialization of INIT_SIGNALS is trimmed because
a) they are set to 0 automatically;
b) gcc cannot properly initialize two anonymous (the second one
is the one with the session) unions. In this particular case
to make it compile we'd have to add some field initialized
right before the .pgrp.

This is the same patch as the 1ec320afdc9552c92191d5f89fcd1ebe588334ca one
(from Cedric), but for the pgrp field.

Some progress report:

We have to deprecate the pid, tgid, session and pgrp fields on struct
task_struct and struct signal_struct. The session and pgrp are already
deprecated. The tgid value is close to being such - the worst known usage
in in fs/locks.c and audit code. The pid field deprecation is mainly
blocked by numerous printk-s around the kernel that print the tsk->pid to
log.

Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Cedric Le Goater
Cc: Serge Hallyn
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:43 +0800
228ebcbe6 Uninline find_task_by_xxx set of functions ... Browse Code »

The find_task_by_something is a set of macros are used to find task by pid
depending on what kind of pid is proposed - global or virtual one. All of
them are wrappers above the most generic one - find_task_by_pid_type_ns() -
and just substitute some args for it.

It turned out, that dereferencing the current->nsproxy->pid_ns construction
and pushing one more argument on the stack inline cause kernel text size to
grow.

This patch moves all this stuff out-of-line into kernel/pid.c. Together
with the next patch it saves a bit less than 400 bytes from the .text
section.

Signed-off-by: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:40 +0800
b488893a3 pid namespaces: changes to show virtual ids to user ... Browse Code »

This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.

The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov
Signed-off-by: Alexey Dobriyan
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:40 +0800
a47afb0f9 pid namespaces: round up the API ... Browse Code »

The set of functions process_session, task_session, process_group and
task_pgrp is confusing, as the names can be mixed with each other when looking
at the code for a long time.

The proposals are to
* equip the functions that return the integer with _nr suffix to
represent that fact,
* and to make all functions work with task (not process) by making
the common prefix of the same name.

For monotony the routines signal_session() and set_signal_session() are
replaced with task_session_nr() and set_task_session(), especially since they
are only used with the explicit task->signal dereference.

Signed-off-by: Pavel Emelianov
Acked-by: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Cedric Le Goater
Cc: Herbert Poetzl
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelianov
2007-10-20 02:53:37 +0800
fe9d4f576 Add kernel/notifier.c ... Browse Code »

There is separate notifier header, but no separate notifier .c file.

Extract notifier code out of kernel/sys.c which will remain for
misc syscalls I hope. Merge kernel/die_notifier.c into kernel/notifier.c.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-10-20 02:53:34 +0800

19 Oct, 2007

1 commit

c3d42d752 unexport pm_power_off_prepare ... Browse Code »

This patch removes the unused EXPORT_SYMBOL(pm_power_off_prepare).

Signed-off-by: Adrian Bunk
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2007-10-19 05:37:19 +0800

01 Oct, 2007

1 commit

4047727e5 Fix SMP poweroff hangs ... Browse Code »

We need to disable all CPUs other than the boot CPU (usually 0) before
attempting to power-off modern SMP machines. This fixes the
hang-on-poweroff issue on my MythTV SMP box, and also on Thomas Gleixner's
new toybox.

Signed-off-by: Mark Lord
Acked-by: Thomas Gleixner
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mark Lord
2007-10-01 22:52:23 +0800

31 Aug, 2007

1 commit

b07e35f94 setpgid(child) fails if the child was forked by sub-thread ... Browse Code »

Spotted by Marcin Kowalczyk .

sys_setpgid(child) fails if the child was forked by sub-thread.

Fix the "is it our child" check. The previous commit
ee0acf90d320c29916ba8c5c1b2e908d81f5057d was not complete.

(this patch asks for the new same_thread_group() helper, but mainline doesn't
have it yet).

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc:
Tested-by: "Marcin 'Qrczak' Kowalczyk"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2007-08-31 16:42:22 +0800

30 Jul, 2007

1 commit

b0cb1a19d Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION ... Browse Code »

Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid
confusion (among other things, with CONFIG_SUSPEND introduced in the
next patch).

Signed-off-by: Rafael J. Wysocki
Signed-off-by: Linus Torvalds

Rafael J. Wysocki
2007-07-30 07:45:38 +0800

27 Jul, 2007

1 commit

58b3b71df Fix ThinkPad T42 poweroff failure introduced by by "PM: Introduce pm_power_off_prepare" ... Browse Code »

Commit bd804eba1c8597cbb7cd5a5f9fe886aae16a079a ("PM: Introduce
pm_power_off_prepare") caused problems in the poweroff path, as reported by
YOSHIFUJI Hideaki / 吉藤英明.

Generally, sysdev_shutdown() should be called after the ACPI preparation for
powering the system off. To make it happen, we can separate sysdev_shutdown()
from device_shutdown() and call it directly wherever necessary.

Signed-off-by: Rafael J. Wysocki
Tested-by: YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: Linus Torvalds

Rafael J. Wysocki
2007-07-27 03:13:06 +0800

20 Jul, 2007

2 commits

6c5d52382 coredump masking: reimplementation of dumpable using two flags ... Browse Code »

This patch changes mm_struct.dumpable to a pair of bit flags.

set_dumpable() converts three-value dumpable to two flags and stores it into
lower two bits of mm_struct.flags instead of mm_struct.dumpable.
get_dumpable() behaves in the opposite way.

[akpm@linux-foundation.org: export set_dumpable]
Signed-off-by: Hidehiro Kawai
Cc: Alan Cox
Cc: David Howells
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kawai, Hidehiro
2007-07-20 01:04:46 +0800
bd804eba1 PM: Introduce pm_power_off_prepare ... Browse Code »

Introduce the pm_power_off_prepare() callback that can be registered by the
interested platforms in analogy with pm_idle() and pm_power_off(), used for
preparing the system to power off (needed by ACPI).

This allows us to drop acpi_sysclass and device_acpi that are only defined in
order to register the ACPI power off preparation callback, which is needed by
pm_power_off() registered in a much different way.

Signed-off-by: Rafael J. Wysocki
Acked-by: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rafael J. Wysocki
2007-07-20 01:04:42 +0800

18 Jul, 2007

2 commits

86313c488 usermodehelper: Tidy up waiting ... Browse Code »

Rather than using a tri-state integer for the wait flag in
call_usermodehelper_exec, define a proper enum, and use that. I've
preserved the integer values so that any callers I've missed should
still work OK.

Signed-off-by: Jeremy Fitzhardinge
Cc: James Bottomley
Cc: Randy Dunlap
Cc: Christoph Hellwig
Cc: Andi Kleen
Cc: Paul Mackerras
Cc: Johannes Berg
Cc: Ralf Baechle
Cc: Bjorn Helgaas
Cc: Joel Becker
Cc: Tony Luck
Cc: Kay Sievers
Cc: Srivatsa Vaddagiri
Cc: Oleg Nesterov
Cc: David Howells

Jeremy Fitzhardinge
2007-07-18 23:47:40 +0800
10a0a8d4e Add common orderly_poweroff() ... Browse Code »

Various pieces of code around the kernel want to be able to trigger an
orderly poweroff. This pulls them together into a single
implementation.

By default the poweroff command is /sbin/poweroff, but it can be set
via sysctl: kernel/poweroff_cmd. This is split at whitespace, so it
can include command-line arguments.

This patch replaces four other instances of invoking either "poweroff"
or "shutdown -h now": two sbus drivers, and acpi thermal
management.

sparc64 has its own "powerd"; still need to determine whether it should
be replaced by orderly_poweroff().

Signed-off-by: Jeremy Fitzhardinge
Acked-by: Len Brown
Signed-off-by: Chris Wright
Cc: Andrew Morton
Cc: Randy Dunlap
Cc: Andi Kleen
Cc: Al Viro
Cc: Arnd Bergmann
Cc: David S. Miller

Jeremy Fitzhardinge
2007-07-18 23:47:40 +0800

17 Jul, 2007

1 commit

1d9d02fee move seccomp from /proc to a prctl ... Browse Code »

This reduces the memory footprint and it enforces that only the current
task can enable seccomp on itself (this is a requirement for a
strightforward [modulo preempt ;) ] TIF_NOTSC implementation).

Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2007-07-17 00:05:50 +0800