19 Feb, 2009

1 commit

  • In cgroup_kill_sb(), root is freed before sb is detached from the list, so
    another sget() may find this sb and call cgroup_test_super(), which will
    access the root that has been freed.

    Reported-by: Al Viro
    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

12 Feb, 2009

1 commit

  • I enabled all cgroup subsystems when compiling the kernel, and then:
    # mount -t cgroup -o net_cls xxx /mnt
    # mkdir /mnt/0

    This showed up immediately:
    BUG: MAX_LOCKDEP_SUBCLASSES too low!
    turning off the locking correctness validator.

    It's caused by the cgroup hierarchy lock:

        for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
                struct cgroup_subsys *ss = subsys[i];
                if (ss->root == root)
                        mutex_lock_nested(&ss->hierarchy_mutex, i);
        }

    Now we have 9 cgroup subsystems, and the above 'i' for net_cls is 8, but
    MAX_LOCKDEP_SUBCLASSES is 8.

    This patch uses different lockdep keys for different subsystems.

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

30 Jan, 2009

4 commits

  • root_count was being incremented in cgroup_get_sb() after all error
    checking was complete, but decremented in cgroup_kill_sb(), which can be
    called on a superblock that we gave up on due to an error. This patch
    changes cgroup_kill_sb() to only decrement root_count if the root was
    previously linked into the list of roots.

    Signed-off-by: Paul Menage
    Tested-by: Serge Hallyn
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • css_tryget() and cgroup_clear_css_refs() contain polling loops; these
    loops should have cpu_relax calls in them to reduce cross-cache traffic.

    Signed-off-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • I fixed a bug in cgroup_clone() in Linus' tree in commit 7b574b7
    ("cgroups: fix a race between cgroup_clone and umount") without noticing
    there was a cleanup patch in the -mm tree that should have been rebased
    (now commit 104cbd5, "cgroups: use task_lock() for access tsk->cgroups
    safe in cgroup_clone()"), which resulted in a lock inconsistency.

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • cgrp->sibling is now handled under the hierarchy mutex; the error path
    should do so, too.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

09 Jan, 2009

14 commits

  • Add css_tryget(), that obtains a counted reference on a CSS. It is used
    in situations where the caller has a "weak" reference to the CSS, i.e.
    one that does not protect the cgroup from removal via a reference count,
    but would instead be cleaned up by a destroy() callback.

    css_tryget() will return true on success, or false if the cgroup is being
    removed.

    This is similar to Kamezawa Hiroyuki's patch from a week or two ago, but
    with the difference that in the event of css_tryget() racing with a
    cgroup_rmdir(), css_tryget() will only return false if the cgroup really
    does get removed.

    This implementation is done by biasing css->refcnt, so that a refcnt of 1
    means "releasable" and 0 means "released or releasing". In the event of a
    race, css_tryget() distinguishes between "released" and "releasing" by
    checking for the CSS_REMOVED flag in css->flags.

    Signed-off-by: Paul Menage
    Tested-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • These patches introduce new locking/refcount support for cgroups to
    reduce the need for subsystems to call cgroup_lock(). This will
    ultimately allow the atomicity of cgroup_rmdir() (which was removed
    recently) to be restored.

    These three patches give:

    1/3 - introduce a per-subsystem hierarchy_mutex which a subsystem can
    use to prevent changes to its own cgroup tree

    2/3 - use hierarchy_mutex in place of calling cgroup_lock() in the
    memory controller

    3/3 - introduce a css_tryget() function similar to the one recently
    proposed by Kamezawa, but avoiding spurious refcount failures in
    the event of a race between a css_tryget() and an unsuccessful
    cgroup_rmdir()

    Future patches will likely involve:

    - using hierarchy mutex in place of cgroup_lock() in more subsystems
    where appropriate

    - restoring the atomicity of cgroup_rmdir() with respect to cgroup_create()

    This patch:

    Add a hierarchy_mutex to the cgroup_subsys object that protects changes to
    the hierarchy observed by that subsystem. It is taken by the cgroup
    subsystem (in addition to cgroup_mutex) for the following operations:

    - linking a cgroup into that subsystem's cgroup tree
    - unlinking a cgroup from that subsystem's cgroup tree
    - moving the subsystem to/from a hierarchy (including across the
    bind() callback)

    Thus if the subsystem holds its own hierarchy_mutex, it can safely
    traverse its own hierarchy.

    Signed-off-by: Paul Menage
    Tested-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Fix races between /proc/sched_debug and cgroup removal by freeing
    cgroup objects via an RCU callback. Thus any cgroup reference obtained
    from an RCU-safe source will remain valid during the RCU section. Since
    dentries are also RCU-safe, this allows us to traverse up the tree
    safely.

    Additionally, make cgroup_path() check for a NULL cgrp->dentry to avoid
    trying to report a path for a partially-created cgroup.

    [lizf@cn.fujitsu.com: call deactivate_super() in cgroup_diput()]
    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Tested-by: Li Zefan
    Cc: Peter Zijlstra
    Signed-off-by: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Once the tasks file is populated from the system namespace inside a
    cgroup, tasks belonging to other namespaces are listed as 0 when read
    from inside a container.

    Though this is expected behaviour from the container's end, there is no
    use in showing unwanted 0s.

    With this patch, we check whether a process is in the same namespace
    before loading it into the pid array.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Gowrishankar M
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gowrishankar M
     
  • Add a common function link_css_set() to link a css_set to a cgroup.

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • For an inactive hierarchy we have subsys->root == &rootnode, but
    rootnode's subsys_list is always empty.

    This conflicts with the code in find_css_set():

        for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
                ...
                if (ss->root->subsys_list.next == &ss->sibling) {
                        ...
                }
        }
        if (list_empty(&rootnode.subsys_list)) {
                ...
        }

    The above code assumes rootnode.subsys_list links all inactive
    hierarchies.

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Don't link rootnode to the root list, so root_list contains active
    hierarchies only as the comment indicates. And rename for_each_root() to
    for_each_active_root().

    Also remove redundant check in cgroup_kill_sb().

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • cgroup_iter_* do not need rcu_read_lock().

    In cgroup_enable_task_cg_lists(), do_each_thread() and
    while_each_thread() are protected by RCU, and that is already
    guaranteed: write_lock(&css_set_lock) implies rcu_read_lock() on a
    non-RT kernel.

    If we need explicit rcu_read_lock(), we should add rcu_read_lock() in
    cgroup_enable_task_cg_lists(), not cgroup_iter_*.

    Signed-off-by: Lai Jiangshan
    Acked-by: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • In cgroup_attach_task(), tsk may exit while we call find_css_set(), and
    find_css_set() will then access an invalid css_set.

    This patch increases the count before get_css_set() and decreases it
    after find_css_set().

    NOTE:

    A css_set's refcount is also the taskcount. After this patch is
    applied, the taskcount may be off by one WHEN cgroup_lock() is not
    held, but I reviewed the other code which uses taskcount and it is
    still correct. No regression was found by review and simple testing.

    So I do not use two counters in css_set (one counter for taskcount, the
    other for refcount, like struct mm_struct). If this fix causes a
    regression, we will switch to two counters in css_set.

    Signed-off-by: Lai Jiangshan
    Cc: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • Use task_lock() to protect tsk->cgroups and get_css_set(tsk->cgroups).

    Signed-off-by: Lai Jiangshan
    Acked-by: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • We don't access struct cgroupfs_root in any fast path, so we should not
    defer freeing struct cgroupfs_root with RCU.

    But the comment on struct cgroup_subsys.root confuses us.

    struct cgroup_subsys.root is used in these places:

    1. find_css_set():          if (ss->root->subsys_list.next == &ss->sibling)
    2. rebind_subsystems():     if (ss->root != &rootnode)
                                rcu_assign_pointer(ss->root, root);
                                rcu_assign_pointer(subsys[i]->root, &rootnode);
    3. cgroup_has_css_refs():   if (ss->root != cgrp->root)
    4. cgroup_init_subsys():    ss->root = &rootnode;
    5. proc_cgroupstats_show(): ss->name, ss->root->subsys_bits,
                                ss->root->number_of_cgroups, !ss->disabled);
    6. cgroup_clone():          root = subsys->root;
                                if ((root != subsys->root) ||

    In all these places we either hold cgroup_lock() or never dereference
    struct cgroupfs_root. This means we don't need RCU when using struct
    cgroup_subsys.root, and we should not free struct cgroupfs_root via
    RCU.

    Signed-off-by: Lai Jiangshan
    Reviewed-by: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • We access res->cgroups without task_lock(), so res->cgroups may change
    under us; it is unreliable, and "if (l == &res->cgroups->tasks)" may
    stay false forever.

    We don't need to add any lock to fix this bug; we just access the
    struct css_set via struct cg_cgroup_link instead of via struct
    task_struct.

    Since we hold css_set_lock, struct cg_cgroup_link is reliable.

    Signed-off-by: Lai Jiangshan
    Reviewed-by: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • When cgroup_post_fork() is called, the child is already visible to
    find_task_by_vpid(), so child->cgroups may have been changed
    concurrently, leaving the bookkeeping inconsistent:

    the old child->cgroups's refcnt is decreased,
    the new child->cgroups's refcnt is increased,
    but child->cg_list is added to the current child->cgroups's list.

    Signed-off-by: Lai Jiangshan
    Reviewed-by: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • - In cgroup_clone(), if vfs_mkdir() returns successfully,
    dentry->d_fsdata will be the pointer to the newly created
    cgroup and won't be NULL.

    - a cgroup file's dentry->d_fsdata won't be NULL, guaranteed
    by cgroup_add_file().

    - When walking through the subsystems of a cgroup_fs (using
    for_each_subsys), cgrp->subsys[ss->subsys_id] won't be NULL,
    guaranteed by cgroup_create().

    (Also remove 2 unused variables in cgroup_rmdir().)

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

07 Jan, 2009

1 commit

  • cgroup_mm_owner_callbacks() was brought in to support the memrlimit
    controller, but sneaked into mainline ahead of it. That controller has
    now been shelved, and the mm_owner_changed() args were inadequate for it
    anyway (they needed an mm pointer instead of a task pointer).

    Remove the dead code, and restore mm_update_next_owner() locking to how it
    was before: taking mmap_sem there does nothing for memcontrol.c, now the
    only user of mm->owner.

    Signed-off-by: Hugh Dickins
    Cc: Paul Menage
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

06 Jan, 2009

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    inotify: fix type errors in interfaces
    fix breakage in reiserfs_new_inode()
    fix the treatment of jfs special inodes
    vfs: remove duplicate code in get_fs_type()
    add a vfs_fsync helper
    sys_execve and sys_uselib do not call into fsnotify
    zero i_uid/i_gid on inode allocation
    inode->i_op is never NULL
    ntfs: don't NULL i_op
    isofs check for NULL ->i_op in root directory is dead code
    affs: do not zero ->i_op
    kill suid bit only for regular files
    vfs: lseek(fd, 0, SEEK_CUR) race condition

    Linus Torvalds
     
  • ... and don't bother in callers. Don't bother with zeroing i_blocks,
    while we are at it - it's already been zeroed.

    i_mode is not worth the effort; it has no common default value.

    Signed-off-by: Al Viro

    Al Viro
     

05 Jan, 2009

1 commit

  • The race is between calling cgroup_clone() and umounting the ns cgroup
    subsystem: cgroup_clone() might access an invalid cgroup_fs, or
    kill_sb() might be called after cgroup_clone() has created a new dir in
    it.

    The BUG I triggered is BUG_ON(root->number_of_cgroups != 1);

    ------------[ cut here ]------------
    kernel BUG at kernel/cgroup.c:1093!
    invalid opcode: 0000 [#1] SMP
    ...
    Process umount (pid: 5177, ti=e411e000 task=e40c4670 task.ti=e411e000)
    ...
    Call Trace:
    [] ? deactivate_super+0x3f/0x51
    [] ? mntput_no_expire+0xb3/0xdd
    [] ? sys_umount+0x265/0x2ac
    [] ? sys_oldumount+0xd/0xf
    [] ? sysenter_do_call+0x12/0x31
    ...
    EIP: [] cgroup_kill_sb+0x23/0xe0 SS:ESP 0068:e411ef2c
    ---[ end trace c766c1be3bf944ac ]---

    Cc: Serge E. Hallyn
    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Cc: "Serge E. Hallyn"
    Cc: Balbir Singh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

25 Dec, 2008

1 commit


24 Dec, 2008

2 commits

  • If cgroup_get_rootdir() failed, free_cg_links() will be called in the
    failure path, but tmp_cg_links hasn't been initialized at that time.

    I introduced this bug in the 2.6.27 merge window.

    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Remove spurious warning messages that are thrown onto the console during
    cgroup operations.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Sharyathi Nagesh
    Acked-by: Serge E. Hallyn
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sharyathi Nagesh
     

16 Dec, 2008

1 commit

  • When a cgroup is removed, it's unlinked from its parent's children list,
    but not actually freed until the last dentry on it is released (at which
    point cgrp->root->number_of_cgroups is decremented).

    Currently rebind_subsystems() checks for the top cgroup's child list
    being empty in order to rebind subsystems into or out of a hierarchy -
    this can result in the set of subsystems bound to a hierarchy changing
    while it still contains a removed-but-not-freed cgroup.

    The simplest fix for this is to forbid remounts that change the set of
    subsystems on a hierarchy that has removed-but-not-freed cgroups. This
    bug can be reproduced via:

    mkdir /mnt/cg
    mount -t cgroup -o ns,freezer cgroup /mnt/cg
    mkdir /mnt/cg/foo
    sleep 1h < /mnt/cg/foo &
    rmdir /mnt/cg/foo
    mount -t cgroup -o remount,ns,devices,freezer cgroup /mnt/cg
    kill $!

    The above causes an oops only in -mm, not in mainline, but the bug can
    cause a memory leak (and even an oops) in mainline.

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

04 Dec, 2008

1 commit


20 Nov, 2008

2 commits

  • Try this, and you'll get oops immediately:
    # cd Documentation/accounting/
    # gcc -o getdelays getdelays.c
    # mount -t cgroup -o debug xxx /mnt
    # ./getdelays -C /mnt/tasks

    Because a normal file's dentry->d_fsdata is a pointer to struct cftype,
    not struct cgroup.

    After the patch, it returns EINVAL if we try to get cgroupstats
    from a normal file.

    Cc: Balbir Singh
    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Cc: [2.6.25.x, 2.6.26.x, 2.6.27.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • As Balbir pointed out, memcg's pre_destroy handler has a potential
    deadlock.

    It has the following lock sequence:

        cgroup_mutex (cgroup_rmdir)
            -> pre_destroy -> mem_cgroup_pre_destroy -> force_empty
                -> cpu_hotplug.lock  (lru_add_drain_all ->
                                      schedule_work ->
                                      get_online_cpus)

    But cpuset has the following:

        cpu_hotplug.lock (call notifier)
            -> cgroup_mutex (within notifier)

    So this lock ordering should be fixed.

    Considering how pre_destroy works, it is not necessary to hold
    cgroup_mutex while calling it.

    As a side effect, we no longer have to wait on this mutex while memcg's
    force_empty runs (which can take long when there are tons of pages).

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

14 Nov, 2008

4 commits

  • Conflicts:
    security/keys/internal.h
    security/keys/process_keys.c
    security/keys/request_key.c

    Fixed conflicts above by using the non 'tsk' versions.

    Signed-off-by: James Morris

    James Morris
     
  • Use RCU to access another task's creds and to release a task's own creds.
    This means that it will be possible for the credentials of a task to be
    replaced without another task (a) requiring a full lock to read them, and (b)
    seeing deallocated memory.

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Separate the task security context from task_struct. At this point, the
    security data is temporarily embedded in the task_struct with two pointers
    pointing to it.

    Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
    entry.S via asm-offsets.

    With comment fixes Signed-off-by: Marc Dionne

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Al Viro
    Cc: linux-audit@redhat.com
    Cc: containers@lists.linux-foundation.org
    Cc: linux-mm@kvack.org
    Signed-off-by: James Morris

    David Howells
     

07 Nov, 2008

1 commit

  • This fixes an oops when reading /proc/sched_debug.

    A cgroup won't be removed completely until cgroup_diput() finishes, so
    we shouldn't invalidate cgrp->dentry in cgroup_rmdir(). Otherwise, if a
    group is being removed while cgroup_path() gets called, we may trigger
    a NULL dereference BUG.

    The bug can be reproduced:

    # cat test.sh
    #!/bin/sh
    mount -t cgroup -o cpu xxx /mnt
    for (( ; ; ))
    {
        mkdir /mnt/sub
        rmdir /mnt/sub
    }
    # ./test.sh &
    # cat /proc/sched_debug

    BUG: unable to handle kernel NULL pointer dereference at 00000038
    IP: [] cgroup_path+0x39/0x90
    ...
    Call Trace:
    [] ? print_cfs_rq+0x6e/0x75d
    [] ? sched_debug_show+0x72d/0xc1e
    ...

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: [2.6.26.x, 2.6.27.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

27 Oct, 2008

1 commit

  • /scratch/sfr/next/kernel/cgroup.c: In function 'cgroup_tasks_start':
    /scratch/sfr/next/kernel/cgroup.c:2107: warning: unused variable 'i'

    Introduced in commit cc31edceee04a7b87f2be48f9489ebb72d264844 "cgroups:
    convert tasks file to use a seq_file with shared pid array".

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     

20 Oct, 2008

2 commits

  • Rather than pre-generating the entire text for the "tasks" file each
    time the file is opened, we instead just generate/update the array of
    process ids and use a seq_file to report these to userspace. All open
    file handles on the same "tasks" file can share a pid array, which may
    be updated any time that no thread is actively reading the array. By
    sharing the array, the potential for userspace to DoS the system by
    opening many handles on the same "tasks" file is removed.

    [Based on a patch by Lai Jiangshan, extended to use seq_file]

    Signed-off-by: Paul Menage
    Reviewed-by: Lai Jiangshan
    Cc: Serge Hallyn
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • put_css_set_taskexit may be called while find_css_set runs on another
    cpu, and this race can occur:

    put_css_set_taskexit side                 find_css_set side
                                          |
    atomic_dec_and_test(&kref->refcount)  |
    /* kref->refcount = 0 */              |
    ....................................................................
                                          | read_lock(&css_set_lock)
                                          | find_existing_css_set
                                          | get_css_set
                                          | read_unlock(&css_set_lock);
    ....................................................................
    __release_css_set                     |
    ....................................................................
                                          | /* use a released css_set */
                                          |

    [put_css_set is the same, but in the current code all put_css_set calls
    are inside the cgroup_mutex critical region, the same as find_css_set.]

    [akpm@linux-foundation.org: repair comments]
    [menage@google.com: eliminate race in css_set refcounting]
    Signed-off-by: Lai Jiangshan
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     

17 Oct, 2008

1 commit

  • This patch adds an additional field to the mm_owner callbacks. This
    field is required to get to the mm that changed. Hold mmap_sem in write
    mode before calling the mm_owner_changed callback.

    [hugh@veritas.com: fix mmap_sem deadlock]
    Signed-off-by: Balbir Singh
    Cc: Sudhir Kumar
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Pavel Emelianov
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: David Rientjes
    Cc: Vivek Goyal
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh