20 Oct, 2007
40 commits
-
The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov
Cc: Dave Airlie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The pgrp field is not used widely around the kernel so it is now marked as
deprecated with appropriate comment.The initialization of INIT_SIGNALS is trimmed because
a) they are set to 0 automatically;
b) gcc cannot properly initialize two anonymous (the second one
is the one with the session) unions. In this particular case
to make it compile we'd have to add some field initialized
right before the .pgrp.This is the same patch as the 1ec320afdc9552c92191d5f89fcd1ebe588334ca one
(from Cedric), but for the pgrp field.Some progress report:
We have to deprecate the pid, tgid, session and pgrp fields on struct
task_struct and struct signal_struct. The session and pgrp are already
deprecated. The tgid value is close to being such - the worst known usage
in in fs/locks.c and audit code. The pid field deprecation is mainly
blocked by numerous printk-s around the kernel that print the tsk->pid to
log.Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Cedric Le Goater
Cc: Serge Hallyn
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
tsk->exit_state can only be 0, EXIT_ZOMBIE, or EXIT_DEAD. A non-zero test
is the same as tsk->exit_state & (EXIT_ZOMBIE | EXIT_DEAD), so just testing
tsk->exit_state is sufficient.Signed-off-by: Eugene Teo
Cc: Roland McGrath
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently, there exists no method for a process to query the resource
limits of another process. They can be inferred via some mechanisms but
they cannot be explicitly determined. Given that this information can be
usefull to know during the debugging of an application, I've written this
patch which exports all of a processes limits via /proc//limits.Signed-off-by: Neil Horman
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
remove BITS_TO_TYPE macro
I realized, that it is actually the same as DIV_ROUND_UP, use it instead.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
FlashPoint, use BIT instead of BITW
BITW was an ushort variant of BIT, use BIT instead
Signed-off-by: Jiri Slaby
Cc: James Bottomley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
define global BIT macro
move all local BIT defines to the new globally define macro.
Signed-off-by: Jiri Slaby
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Kumar Gala
Cc: Dmitry Torokhov
Cc: Jeff Garzik
Cc: James Bottomley
Cc: "Antonino A. Daplas"
Cc: Russell King
Acked-by: Ralf Baechle
Cc: "John W. Linville"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
get rid of input BIT* duplicate defines
use newly global defined macros for input layer. Also remove includes of
input.h from non-input sources only for BIT macro definiton. Define the
macro temporarily in local manner, all those local definitons will be
removed further in this patchset (to not break bisecting).
BIT macro will be globally defined (1<
Cc:
Acked-by: Jiri Kosina
Cc:
Acked-by: Marcel Holtmann
Cc:
Acked-by: Mauro Carvalho Chehab
Cc:
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
define first set of BIT* macros
- move BITOP_MASK and BITOP_WORD from asm-generic/bitops/atomic.h to
include/linux/bitops.h and rename it to BIT_MASK and BIT_WORD
- move BITS_TO_LONGS and BITS_PER_BYTE to bitops.h too and allow easily
define another BITS_TO_something (e.g. in event.c) by BITS_TO_TYPE macro
Remaining (and common) BIT macro will be defined after all occurences and
conflicts will be sorted out in the patches.Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
amba-pl011, rename BIT macro
Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
s2io, rename BIT macro
BIT macro will be global definiton of (1<
Cc: Ramkrishna Vepa
Cc: Rastapur Santosh
Cc: Sivakumar Subramani
Cc: Sreenivasa Honnur
Cc: Jeff Garzik
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
i2c-pxa, rename BIT macro to PXA_BIT
BIT macro will be global definiton of (1 << x)
Signed-off-by: Jiri Slaby
Cc: Nicolas Pitre
Cc: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch fixes errors and warnings pointed out by the checkpatch.pl
script.Antonino Daplas replaced BIT with ENCODE_BIT.
Signed-off-by: Krzysztof Helt
Signed-off-by: Antonino Daplas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
cyber2000fb, rename BIT macro
BIT will be global macro for (1 << x)
Signed-off-by: Jiri Slaby
Cc: Russell King
Cc: "Antonino A. Daplas"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
forbid asm/bitops.h direct inclusion
Because of compile errors that may occur after bit changes if asm/bitops.h is
included directly without e.g. linux/kernel.h which includes linux/bitops.h,
forbid direct inclusion of asm/bitops.h. Thanks to Adrian Bunk.Signed-off-by: Jiri Slaby
Cc: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
remove asm/bitops.h includes
including asm/bitops directly may cause compile errors. don't include it
and include linux/bitops instead. next patch will deny including asm header
directly.Cc: Adrian Bunk
Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
fs/select, remove unused macros
this is due to preparation for global BIT macro
Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This new version guarantees amb_bit switch in small enough intervals, so that
the device won't stop working in the middle of a movement anymore. However it
preserves old (openhaptics) functionality.Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Wait after disabling device's interrupt until the handler finishes its work if
still in progress.Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Cause writes to cpuset "cpus" file to update cpus_allowed for member tasks:
- collect batches of tasks under tasklist_lock and then call
set_cpus_allowed() on them outside the lock (since this can sleep).- add a simple generic priority heap type to allow efficient collection
of batches of tasks to be processed without duplicating or missing any
tasks in subsequent batches.- make "cpus" file update a no-op if the mask hasn't changed
- fix race between update_cpumask() and sched_setaffinity() by making
sched_setaffinity() post-check that it's not running on any cpus outside
cpuset_cpus_allowed().[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Paul Menage
Cc: Paul Jackson
Cc: David Rientjes
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc: Balbir Singh
Cc: Cedric Le Goater
Cc: "Eric W. Biederman"
Cc: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Decrustify the kernel/cpuset.c 'cpus' and 'mems' updating code.
Other than subtle improvements in the consistency of identifying
white space at the beginning and end of passed in masks, this
doesn't make any visible difference in behaviour. But it's
one or two hundred kernel text bytes smaller, and easier to
understand.[akpm@linux-foundation.org: coding-style fix]
Signed-off-by: Paul Jackson
Reviewed-by: Paul Menage
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add a new per-cpuset flag called 'sched_load_balance'.
When enabled in a cpuset (the default value) it tells the kernel scheduler
that the scheduler should provide the normal load balancing on the CPUs in
that cpuset, sometimes moving tasks from one CPU to a second CPU if the
second CPU is less loaded and if that task is allowed to run there.When disabled (write "0" to the file) then it tells the kernel scheduler
that load balancing is not required for the CPUs in that cpuset.Now even if this flag is disabled for some cpuset, the kernel may still
have to load balance some or all the CPUs in that cpuset, if some
overlapping cpuset has its sched_load_balance flag enabled.If there are some CPUs that are not in any cpuset whose sched_load_balance
flag is enabled, the kernel scheduler will not load balance tasks to those
CPUs.Moreover the kernel will partition the 'sched domains' (non-overlapping
sets of CPUs over which load balancing is attempted) into the finest
granularity partition that it can find, while still keeping any two CPUs
that are in the same shed_load_balance enabled cpuset in the same element
of the partition.This serves two purposes:
1) It provides a mechanism for real time isolation of some CPUs, and
2) it can be used to improve performance on systems with many CPUs
by supporting configurations in which load balancing is not done
across all CPUs at once, but rather only done in several smaller
disjoint sets of CPUs.This mechanism replaces the earlier overloading of the per-cpuset
flag 'cpu_exclusive', which overloading was removed in an earlier
patch: cpuset-remove-sched-domain-hooks-from-cpusetsSee further the Documentation and comments in the code itself.
[akpm@linux-foundation.org: don't be weird]
Signed-off-by: Paul Jackson
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since these are expanded into call to pid_nr_ns() anyway, it's OK to move
the whole routine out-of-line. This is a cheap way to save ~100 bytes from
vmlinux. Together with the previous two patches, it saves half-a-kilo from
the vmlinux.Un-inline other (currently inlined) functions must be done with additional
performance testing.Signed-off-by: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The find_pid/_vpid/_pid_ns functions are used to find the struct pid by its
id, depending on whic id - global or virtual - is used.The find_vpid() is a macro that pushes the current->nsproxy->pid_ns on the
stack to call another function - find_pid_ns(). It turned out, that this
dereference together with the push itself cause the kernel text size to
grow too much.Move all these out-of-line. Together with the previous patch this saves a
bit less that 400 bytes from .text section.Signed-off-by: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With pid namespaces this field is now dangerous to use explicitly, so hide
it behind the helpers.Also the pid and pgrp fields o task_struct and signal_struct are to be
deprecated. Unfortunately this patch cannot be sent right now as this
leads to tons of warnings, so start isolating them, and deprecate later.Actually the p->tgid == pid has to be changed to has_group_leader_pid(),
but Oleg pointed out that in case of posix cpu timers this is the same, and
thread_group_leader() is more preferable.Signed-off-by: Pavel Emelyanov
Acked-by: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since we've switched from using pid->nr to pid->upids->nr some
fields on struct pid are no longer neededSigned-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The find_task_by_something is a set of macros are used to find task by pid
depending on what kind of pid is proposed - global or virtual one. All of
them are wrappers above the most generic one - find_task_by_pid_type_ns() -
and just substitute some args for it.It turned out, that dereferencing the current->nsproxy->pid_ns construction
and pushing one more argument on the stack inline cause kernel text size to
grow.This patch moves all this stuff out-of-line into kernel/pid.c. Together
with the next patch it saves a bit less than 400 bytes from the .text
section.Signed-off-by: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov
Signed-off-by: Alexey Dobriyan
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Terminate all processes in a namespace when the reaper of the namespace is
exiting. We do this by walking the pidmap of the namespace and sending
SIGKILL to all processes.Signed-off-by: Sukadev Bhattiprolu
Acked-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Only the global-init process must be special - any other cgroup-init
process must be killable to prevent run-away processes in the system.TODO: Ideally we should allow killing the cgroup-init only from parent
cgroup and prevent it being killed from within the cgroup.
But that is a more complex change and will be addressed by a follow-on
patch. For now allow the cgroup-init to be terminated by any process
with sufficient privileges.Signed-off-by: Sukadev Bhattiprolu
Acked-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This will help fixing memory leaks due to bad reference counting.
Signed-off-by: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The namespace's proc_mnt must be kern_mount-ed to make this pointer always
valid, independently of whether the user space mounted the proc or not. This
solves raced in proc_flush_task, etc. with the proc_mnt switching from NULL
to not-NULL.The initialization is done after the init's pid is created and hashed to make
proc_get_sb() finr it and get for root inode.Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the
superblock holds the namespace we must explicitly break this circle to destroy
all the stuff. This is done after the init of the namespace dies. Running a
few steps forward - when init exits it will kill all its children, so no
proc_mnt will be needed after its death.Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This means that proc_flush_task_mnt() is to be called for many proc mounts and
with different ids, depending on the namespace this pid is to be flushed from.Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When clone() is invoked with CLONE_NEWPID, create a new pid namespace and then
create a new struct pid for the new process. Allocate pid_t's for the new
process in the new pid namespace and all ancestor pid namespaces. Make the
newly cloned process the session and process group leader.Since the active pid namespace is special and expected to be the first entry
in pid->upid_list, preserve the order of pid namespaces.The size of 'struct pid' is dependent on the the number of pid namespaces the
process exists in, so we use multiple pid-caches'. Only one pid cache is
created during system startup and this used by processes that exist only in
init_pid_ns.When a process clones its pid namespace, we create additional pid caches as
necessary and use the pid cache to allocate 'struct pids' for that depth.Note, that with this patch the newly created namespace won't work, since the
rest of the kernel still uses global pids, but this is to be fixed soon. Init
pid namespace still works.[oleg@tv-sign.ru: merge fix]
Signed-off-by: Pavel Emelyanov
Signed-off-by: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
* remove pid.h from pid_namespaces.h;
* rework is_(cgroup|global)_init;
* optimize (get|put)_pid_ns for init_pid_ns;
* declare task_child_reaper to return actual reaper.Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Each pid namespace have to be visible through its own proc mount. Thus we
need to have per-namespace proc trees with their own superblocks.We cannot easily show different pid namespace via one global proc tree, since
each pid refers to different tasks in different namespaces. E.g. pid 1
refers to the init task in the initial namespace and to some other task when
seeing from another namespace. Moreover - pid, exisintg in one namespace may
not exist in the other.This approach has one move advantage is that the tasks from the init namespace
can see what tasks live in another namespace by reading entries from another
proc tree.Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When we create new namespace we will need to allocate the struct pid, that
will have one extra struct upid in array, comparing to the parent.Thus we need to know the new namespace (if any) in alloc_pid() to init this
struct upid properly, so move the alloc_pid() call lower in copy_process().Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When searching the task by numerical id on may need to find it using global
pid (as it is done now in kernel) or by its virtual id, e.g. when sending a
signal to a task from one namespace the sender will specify the task's virtual
id and we should find the task by this value.[akpm@linux-foundation.org: fix gfs2 linkage]
Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When showing pid to user or getting the pid numerical id for in-kernel use the
value of this id may differ depending on the namespace.This set of helpers is used to get the global pid nr, the virtual (i.e. seen
by task in its namespace) nr and the nr as it is seen from the specified
namespace.Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds