Eric Lee / smarc-fsl-linux-kernel

17 Jul, 2007

1 commit

213dd266d namespace: ensure clone_flags are always stored in an unsigned long ... Browse Code »

While working on unshare support for the network namespace I noticed we
were putting clone flags in an int. Which is weird because the syscall
uses unsigned long and we at least need an unsigned to properly hold all of
the unshare flags.

So to make the code consistent, this patch updates the code to use
unsigned long instead of int for the clone flags in those places
where we get it wrong today.

Signed-off-by: Eric W. Biederman
Acked-by: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-07-17 00:05:48 +0800

11 May, 2007

2 commits

820e45db2 statically initialize struct pid for swapper ... Browse Code »

Statically initialize a struct pid for the swapper process (pid_t == 0) and
attach it to init_task. This is needed so task_pid(), task_pgrp() and
task_session() interfaces work on the swapper process also.

Signed-off-by: Sukadev Bhattiprolu
Cc: Cedric Le Goater
Cc: Dave Hansen
Cc: Serge Hallyn
Cc: Eric Biederman
Cc: Herbert Poetzl
Cc:
Acked-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2007-05-11 23:29:35 +0800
e713d0dab attach_pid() with struct pid parameter ... Browse Code »

attach_pid() currently takes a pid_t and then uses find_pid() to find the
corresponding struct pid. Sometimes we already have the struct pid. We can
then skip find_pid() if attach_pid() were to take a struct pid parameter.

Signed-off-by: Sukadev Bhattiprolu
Cc: Cedric Le Goater
Cc: Dave Hansen
Cc: Serge Hallyn
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2007-05-11 23:29:35 +0800

09 May, 2007

1 commit

e3222c4ec Merge sys_clone()/sys_unshare() nsproxy and namespace handling ... Browse Code »

sys_clone() and sys_unshare() both makes copies of nsproxy and its associated
namespaces. But they have different code paths.

This patch merges all the nsproxy and its associated namespace copy/clone
handling (as much as possible). Posted on container list earlier for
feedback.

- Create a new nsproxy and its associated namespaces and pass it back to
caller to attach it to right process.

- Changed all copy_*_ns() routines to return a new copy of namespace
instead of attaching it to task->nsproxy.

- Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines.

- Removed unnessary !ns checks from copy_*_ns() and added BUG_ON()
just incase.

- Get rid of all individual unshare_*_ns() routines and make use of
copy_*_ns() instead.

[akpm@osdl.org: cleanups, warning fix]
[clg@fr.ibm.com: remove dup_namespaces() declaration]
[serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval]
[akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n]
Signed-off-by: Badari Pulavarty
Signed-off-by: Serge Hallyn
Cc: Cedric Le Goater
Cc: "Eric W. Biederman"
Cc:
Signed-off-by: Cedric Le Goater
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2007-05-09 02:15:00 +0800

08 May, 2007

1 commit

0a31bd5f2 KMEM_CACHE(): simplify slab cache creation ... Browse Code »

This patch provides a new macro

KMEM_CACHE(, )

to simplify slab creation. KMEM_CACHE creates a slab with the name of the
struct, with the size of the struct and with the alignment of the struct.
Additional slab flags may be specified if necessary.

Example

struct test_slab {
int a,b,c;
struct list_head;
} __cacheline_aligned_in_smp;

test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC)

will create a new slab named "test_slab" of the size sizeof(struct
test_slab) and aligned to the alignment of test slab. If it fails then we
panic.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-08 03:12:55 +0800

31 Jan, 2007

1 commit

0f2452855 [PATCH] namespaces: fix task exit disaster ... Browse Code »

This is based on a patch by Eric W. Biederman, who pointed out that pid
namespaces are still fake, and we only have one ever active.

So for the time being, we can modify any code which could access
tsk->nsproxy->pid_ns during task exit to just use &init_pid_ns instead,
and move the exit_task_namespaces call in do_exit() back above
exit_notify(), so that an exiting nfs server has a valid tsk->sighand to
work with.

Long term, pulling pid_ns out of nsproxy might be the cleanest solution.

Signed-off-by: Eric W. Biederman

[ Eric's patch fixed to take care of free_pid() too ]

Signed-off-by: Serge E. Hallyn
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2007-01-31 05:40:36 +0800

09 Dec, 2006

4 commits

84d737866 [PATCH] add child reaper to pid_namespace ... Browse Code »

Add a per pid_namespace child-reaper. This is needed so processes are reaped
within the same pid space and do not spill over to the parent pid space. Its
also needed so containers preserve existing semantic that pid == 1 would reap
orphaned children.

This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285

Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Cedric Le Goater
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2006-12-09 00:28:52 +0800
6cc1b22a4 [PATCH] use current->nsproxy->pid_ns ... Browse Code »

Signed-off-by: Cedric Le Goater
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2006-12-09 00:28:52 +0800
9a575a92d [PATCH] to nsproxy ... Browse Code »

Add the pid namespace framework to the nsproxy object. The copy of the pid
namespace only increases the refcount on the global pid namespace,
init_pid_ns, and unshare is not implemented.

There is no configuration option to activate or deactivate this feature
because this not relevant for the moment.

Signed-off-by: Cedric Le Goater
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2006-12-09 00:28:52 +0800
61a58c6c2 [PATCH] rename struct pspace to struct pid_namespace ... Browse Code »

Rename struct pspace to struct pid_namespace for consistency with other
namespaces (uts_namespace and ipc_namespace). Also rename
include/linux/pspace.h to include/linux/pid_namespace.h and variables from
pspace to pid_ns.

Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Cedric Le Goater
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2006-12-09 00:28:52 +0800

08 Dec, 2006

1 commit

e18b890bb [PATCH] slab: remove kmem_cache_t ... Browse Code »

Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#

set -e

for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done

The script was run like this

sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:25 +0800

02 Oct, 2006

8 commits

1a657f78d [PATCH] introduce get_task_pid() to fix unsafe get_pid() ... Browse Code »

proc_pid_make_inode:

ei->pid = get_pid(task_pid(task));

I think this is not safe. get_pid() can be preempted after checking "pid
!= NULL". Then the task exits, does detach_pid(), and RCU frees the pid.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-10-02 22:57:25 +0800
f40f50d3b [PATCH] Use struct pspace in next_pidmap and find_ge_pid ... Browse Code »

This updates my proc: readdir race fix (take 3) patch
to account for the changes made by: Sukadev Bhattiprolu
to introduce struct pspace.

Signed-off-by: Eric W. Biederman
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-10-02 22:57:15 +0800
3fbc96486 [PATCH] Define struct pspace ... Browse Code »

Define a per-container pid space object. And create one instance of this
object, init_pspace, to define the entire pid space. Subsequent patches
will provide/use interfaces to create/destroy pid spaces.

Its a subset/rework of Eric Biederman's patch
http://lkml.org/lkml/2006/2/6/285 .

Signed-off-by: Eric Biederman
Signed-off-by: Sukadev Bhattiprolu
Cc: Dave Hansen
Cc: Serge Hallyn
Cc: Cedric Le Goater
Cc: Kirill Korotaev
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2006-10-02 22:57:15 +0800
aa5a6662f [PATCH] Move pidmap to pspace.h ... Browse Code »

Move struct pidmap and PIDMAP_ENTRIES to a new file, include/linux/pspace.h
where it will be used in subsequent patches to define pid spaces.

Its a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/285

[akpm@osdl.org: cleanups]
Signed-off-by: Eric W. Biederman
Signed-off-by: Sukadev Bhattiprolu
Cc: Dave Hansen
Cc: Serge Hallyn
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2006-10-02 22:57:15 +0800
c88be3eb2 [PATCH] pids coding style use struct pidmap in next_pidmap ... Browse Code »

Use struct pidmap instead of pidmap_t.

This updates my proc: readdir race fix (take 3) patch
to account for the changes made by: Sukadev Bhattiprolu
to kill pidmap_t.

Signed-off-by: Eric W. Biederman
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-10-02 22:57:15 +0800
6a1f3b845 [PATCH] pids: coding style: use struct pidmap ... Browse Code »

Use struct pidmap instead of pidmap_t.

Its a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/271.

Signed-off-by: Eric W. Biederman
Signed-off-by: Sukadev Bhattiprolu
Cc: Dave Hansen
Cc: Serge Hallyn
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2006-10-02 22:57:14 +0800
bbf73147e [PATCH] pid: export the symbols needed to use struct pid * ... Browse Code »

pids aren't something that drivers should care about. However there are a lot
of helper layers in the kernel that do care, and are built as modules. Before
I can convert them to using struct pid instead of pid_t I need to export the
appropriate symbols so they can continue to be built.

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-10-02 22:57:13 +0800
0804ef4b0 [PATCH] proc: readdir race fix (take 3) ... Browse Code »

The problem: An opendir, readdir, closedir sequence can fail to report
process ids that are continually in use throughout the sequence of system
calls. For this race to trigger the process that proc_pid_readdir stops at
must exit before readdir is called again.

This can cause ps to fail to report processes, and it is in violation of
posix guarantees and normal application expectations with respect to
readdir.

Currently there is no way to work around this problem in user space short
of providing a gargantuan buffer to user space so the directory read all
happens in on system call.

This patch implements the normal directory semantics for proc, that
guarantee that a directory entry that is neither created nor destroyed
while reading the directory entry will be returned. For directory that are
either created or destroyed during the readdir you may or may not see them.
Furthermore you may seek to a directory offset you have previously seen.

These are the guarantee that ext[23] provides and that posix requires, and
more importantly that user space expects. Plus it is a simple semantic to
implement reliable service. It is just a matter of calling readdir a
second time if you are wondering if something new has show up.

These better semantics are implemented by scanning through the pids in
numerical order and by making the file offset a pid plus a fixed offset.

The pid scan happens on the pid bitmap, which when you look at it is
remarkably efficient for a brute force algorithm. Given that a typical
cache line is 64 bytes and thus covers space for 64*8 == 200 pids. There
are only 40 cache lines for the entire 32K pid space. A typical system
will have 100 pids or more so this is actually fewer cache lines we have to
look at to scan a linked list, and the worst case of having to scan the
entire pid bitmap is pretty reasonable.

If we need something more efficient we can go to a more efficient data
structure for indexing the pids, but for now what we have should be
sufficient.

In addition this takes no additional locks and is actually less code than
what we are doing now.

Also another very subtle bug in this area has been fixed. It is possible
to catch a task in the middle of de_thread where a thread is assuming the
thread of it's thread group leader. This patch carefully handles that case
so if we hit it we don't fail to return the pid, that is undergoing the
de_thread dance.

Thanks to KAMEZAWA Hiroyuki for
providing the first fix, pointing this out and working on it.

[oleg@tv-sign.ru: fix it]
Signed-off-by: Eric W. Biederman
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Oleg Nesterov
Cc: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-10-02 22:57:12 +0800

27 Sep, 2006

2 commits

65800ac77 [PATCH] pid: remove temporary debug code in attach_pid ... Browse Code »

With the patches flying between Oleg and myself somehow this temporary
debug code got left in pid.c. It was never intended to make it to the
stable kernel.

Signed-off-by: Eric W. Biederman
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-09-27 23:26:19 +0800
c18258c6f [PATCH] pid: Implement transfer_pid and use it to simplify de_thread ... Browse Code »

In de_thread we move pids from one process to another, a rather ugly case.
The function transfer_pid makes it clear what we are doing, and makes the
action atomic. This is useful we ever want to atomically traverse the
process group and session lists, in a rcu safe manner.

Even if the atomic properties this change should be a win as transfer_pid
should be less code to execute than executing both attach_pid and
detach_pid, and this should make de_thread slightly smaller as only a
single function call needs to be emitted. The only downside is that the
code might be slower to execute as the odds are against transfer_pid being
in cache.

Signed-off-by: Eric W. Biederman
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-09-27 23:26:19 +0800

04 Jul, 2006

1 commit

36c8b5868 [PATCH] sched: cleanup, remove task_t, convert to struct task_struct ... Browse Code »

cleanup: remove task_t and convert all the uses to struct task_struct. I
introduced it for the scheduler anno and it was a mistake.

Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.

Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-07-04 06:27:11 +0800

01 Apr, 2006

1 commit

92476d7fc [PATCH] pidhash: Refactor the pid hash table ... Browse Code »

Simplifies the code, reduces the need for 4 pid hash tables, and makes the
code more capable.

In the discussions I had with Oleg it was felt that to a large extent the
cleanup itself justified the work. With struct pid being dynamically
allocated meant we could create the hash table entry when the pid was
allocated and free the hash table entry when the pid was freed. Instead of
playing with the hash lists when ever a process would attach or detach to a
process.

For myself the fact that it gave what my previous task_ref patch gave for free
with simpler code was a big win. The problem is that if you hold a reference
to struct task_struct you lock in 10K of low memory. If you do that in a user
controllable way like /proc does, with an unprivileged but hostile user space
application with typical resource limits of 1000 fds and 100 processes I can
trigger the OOM killer by consuming all of low memory with task structs, on a
machine wight 1GB of low memory.

If I instead hold a reference to struct pid which holds a pointer to my
task_struct, I don't suffer from that problem because struct pid is 2 orders
of magnitude smaller. In fact struct pid is small enough that most other
kernel data structures dwarf it, so simply limiting the number of referring
data structures is enough to prevent exhaustion of low memory.

This splits the current struct pid into two structures, struct pid and struct
pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
struct pid_link is the per process linkage into the hash tables and lives in
struct task_struct. struct pid is given an indepedent lifetime, and holds
pointers to each of the pid types.

The independent life of struct pid simplifies attach_pid, and detach_pid,
because we are always manipulating the list of pids and not the hash table.
In addition in giving struct pid an indpendent life it makes the concept much
more powerful.

Kernel data structures can now embed a struct pid * instead of a pid_t and
not suffer from pid wrap around problems or from keeping unnecessarily
large amounts of memory allocated.

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-04-01 04:19:00 +0800

29 Mar, 2006

2 commits

73b9ebfe1 [PATCH] pidhash: don't count idle threads ... Browse Code »

fork_idle() does unhash_process() just after copy_process(). Contrary,
boot_cpu's idle thread explicitely registers itself for each pid_type with nr
= 0.

copy_process() already checks p->pid != 0 before process_counts++, I think we
can just skip attach_pid() calls and job control inits for idle threads and
kill unhash_process(). We don't need to cleanup ->proc_dentry in fork_idle()
because with this patch idle threads are never hashed in
kernel/pid.c:pid_hash[].

We don't need to hash pid == 0 in pidmap_init(). free_pidmap() is never
called with pid == 0 arg, so it will never be reused. So it is still possible
to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV.

However with this patch we don't hash pid == 0 for PIDTYPE_PID case. We still
have have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and
kernel threads which don't call daemonize().

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-03-29 10:36:41 +0800
d73d65293 [PATCH] pidhash: kill switch_exec_pids ... Browse Code »

switch_exec_pids is only called from de_thread by way of exec, and it is
only called when we are exec'ing from a non thread group leader.

Currently switch_exec_pids gives the leader the pid of the thread and
unhashes and rehashes all of the process groups. The leader is already in
the EXIT_DEAD state so no one cares about it's pids. The only concern for
the leader is that __unhash_process called from release_task will function
correctly. If we don't touch the leader at all we know that
__unhash_process will work fine so there is no need to touch the leader.

For the task becomming the thread group leader, we just need to give it the
pid of the old thread group leader, add it to the task list, and attach it
to the session and the process group of the thread group.

Currently de_thread is also adding the task to the task list which is just
silly.

Currently the only leader of __detach_pid besides detach_pid is
switch_exec_pids because of the ugly extra work that was being
performed.

So this patch removes switch_exec_pids because it is doing too much, it is
creating an unnecessary special case in pid.c, duing work duplicated in
de_thread, and generally obscuring what it is going on.

The necessary work is added to de_thread, and it seems to be a little
clearer there what is going on.

Signed-off-by: Eric W. Biederman
Cc: Oleg Nesterov
Cc: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-03-29 10:36:40 +0800

09 Jan, 2006

1 commit

e56d09031 [PATCH] RCU signal handling ... Browse Code »

RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked
instead of tasklist_lock read-locked. This is a scalability improvement on
SMP and a preemption-latency improvement under PREEMPT_RCU.

Signed-off-by: Paul E. McKenney
Signed-off-by: Ingo Molnar
Acked-by: William Irwin
Cc: Roland McGrath
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-01-09 12:13:40 +0800

17 Apr, 2005

1 commit

1da177e4c Linux-2.6.12-rc2 ... Browse Code »

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800