Eric Lee / smarc-fsl-linux-kernel

05 Jun, 2010

1 commit

94b3dd0f7 cgroups: alloc_css_id() increments hierarchy depth ... Browse Code »

Child groups should have a greater depth than their parents. Prior to
this change, the parent would incorrectly report zero memory usage for
child cgroups when use_hierarchy is enabled.

test script:
mount -t cgroup none /cgroups -o memory
cd /cgroups
mkdir cg1

echo 1 > cg1/memory.use_hierarchy
mkdir cg1/cg11

echo $$ > cg1/cg11/tasks
dd if=/dev/zero of=/tmp/foo bs=1M count=1

echo
echo CHILD
grep cache cg1/cg11/memory.stat

echo
echo PARENT
grep cache cg1/memory.stat

echo $$ > tasks
rmdir cg1/cg11 cg1
cd /
umount /cgroups

Using fae9c79, a recent patch that changed alloc_css_id() depth computation,
the parent incorrectly reports zero usage:
root@ubuntu:~# ./test
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0151844 s, 69.1 MB/s

CHILD
cache 1048576
total_cache 1048576

PARENT
cache 0
total_cache 0

With this patch, the parent correctly includes child usage:
root@ubuntu:~# ./test
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0136827 s, 76.6 MB/s

CHILD
cache 1052672
total_cache 1052672

PARENT
cache 0
total_cache 1052672

Signed-off-by: Greg Thelen
Acked-by: Paul Menage
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Li Zefan
Cc: [2.6.34.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Thelen
2010-06-05 06:21:45 +0800

28 May, 2010

1 commit

907860ed3 cgroups: make cftype.unregister_event() void-returning ... Browse Code »

Since we are unable to handle an error returned by
cftype.unregister_event() properly, let's make the callback
void-returning.

mem_cgroup_unregister_event() has been rewritten to be a "never fail"
function. On mem_cgroup_usage_register_event() we save old buffer for
thresholds array and reuse it in mem_cgroup_usage_unregister_event() to
avoid allocation.

Signed-off-by: Kirill A. Shutemov
Acked-by: KAMEZAWA Hiroyuki
Cc: Phil Carmody
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: Paul Menage
Cc: Li Zefan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2010-05-28 00:12:44 +0800

21 May, 2010

1 commit

f39d01be4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits)
vlynq: make whole Kconfig-menu dependant on architecture
add descriptive comment for TIF_MEMDIE task flag declaration.
EEPROM: max6875: Header file cleanup
EEPROM: 93cx6: Header file cleanup
EEPROM: Header file cleanup
agp: use NULL instead of 0 when pointer is needed
rtc-v3020: make bitfield unsigned
PCI: make bitfield unsigned
jbd2: use NULL instead of 0 when pointer is needed
cciss: fix shadows sparse warning
doc: inode uses a mutex instead of a semaphore.
uml: i386: Avoid redefinition of NR_syscalls
fix "seperate" typos in comments
cocbalt_lcdfb: correct sections
doc: Change urls for sparse
Powerpc: wii: Fix typo in comment
i2o: cleanup some exit paths
Documentation/: it's -> its where appropriate
UML: Fix compiler warning due to missing task_struct declaration
UML: add kernel.h include to signal.c
...

Linus Torvalds
2010-05-21 00:20:59 +0800

18 May, 2010

1 commit

b8ae30ee2 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel… ... Browse Code »

…/git/tip/linux-2.6-tip

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
stop_machine: Move local variable closer to the usage site in cpu_stop_cpu_callback()
sched, wait: Use wrapper functions
sched: Remove a stale comment
ondemand: Make the iowait-is-busy time a sysfs tunable
ondemand: Solve a big performance issue by counting IOWAIT time as busy
sched: Intoduce get_cpu_iowait_time_us()
sched: Eliminate the ts->idle_lastupdate field
sched: Fold updating of the last_update_time_info into update_ts_time_stats()
sched: Update the idle statistics in get_cpu_idle_time_us()
sched: Introduce a function to update the idle statistics
sched: Add a comment to get_cpu_idle_time_us()
cpu_stop: add dummy implementation for UP
sched: Remove rq argument to the tracepoints
rcu: need barrier() in UP synchronize_sched_expedited()
sched: correctly place paranioa memory barriers in synchronize_sched_expedited()
sched: kill paranoia check in synchronize_sched_expedited()
sched: replace migration_thread with cpu_stop
stop_machine: reimplement using cpu_stop
cpu_stop: implement stop_cpu[s]()
sched: Fix select_idle_sibling() logic in select_task_rq_fair()
...

Linus Torvalds
2010-05-18 23:27:54 +0800

12 May, 2010

2 commits

747388d78 memcg: fix css_is_ancestor() RCU locking ... Browse Code »

Some callers (in memcontrol.c) calls css_is_ancestor() without
rcu_read_lock. Because css_is_ancestor() has to access RCU protected
data, it should be under rcu_read_lock().

This makes css_is_ancestor() itself does safe access to RCU protected
area. (At least, "root" can have refcnt==0 if it's not an ancestor of
"child". So, we need rcu_read_lock().)

Signed-off-by: KAMEZAWA Hiroyuki
Cc: "Paul E. McKenney"
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-05-12 08:33:42 +0800
7f0f15464 memcg: fix css_id() RCU locking for real ... Browse Code »

Commit ad4ba375373937817404fd92239ef4cadbded23b ("memcg: css_id() must be
called under rcu_read_lock()") modifies memcontol.c for fixing RCU check
message. But Andrew Morton pointed out that the fix doesn't seems sane
and it was just for hidining lockdep messages.

This is a patch for do proper things. Checking again, all places,
accessing without rcu_read_lock, that commit fixies was intentional....
all callers of css_id() has reference count on it. So, it's not necessary
to be under rcu_read_lock().

Considering again, we can use rcu_dereference_check for css_id(). We know
css->id is valid if css->refcnt > 0. (css->id never changes and freed
after css->refcnt going to be 0.)

This patch makes use of rcu_dereference_check() in css_id/depth and remove
unnecessary rcu-read-lock added by the commit.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: "Paul E. McKenney"
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-05-12 08:33:42 +0800

11 May, 2010

1 commit

a93d2f174 sched, wait: Use wrapper functions ... Browse Code »

epoll should not touch flags in wait_queue_t. This patch introduces a new
function __add_wait_queue_exclusive(), for the users, who use wait queue as a
LIFO queue.

__add_wait_queue_tail_exclusive() is introduced too instead of
add_wait_queue_exclusive_locked(). remove_wait_queue_locked() is removed, as
it is a duplicate of __remove_wait_queue(), disliked by users, and with less
users.

Signed-off-by: Changli Gao
Signed-off-by: Peter Zijlstra
Cc: Alexander Viro
Cc: Paul Menage
Cc: Li Zefan
Cc: Davide Libenzi
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar

Changli Gao
2010-05-11 23:43:58 +0800

05 May, 2010

2 commits

fae9c7917 cgroup: Fix an RCU warning in alloc_css_id() ... Browse Code »

With CONFIG_PROVE_RCU=y, a warning can be triggered:

# mount -t cgroup -o memory xxx /mnt
# mkdir /mnt/0

...
kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection!
...

This is a false-positive. It's safe to directly access parent_css->id.

Signed-off-by: Li Zefan
Signed-off-by: Paul E. McKenney

Li Zefan
2010-05-05 00:25:00 +0800
9a9686b63 cgroup: Fix an RCU warning in cgroup_path() ... Browse Code »

with CONFIG_PROVE_RCU=y, a warning can be triggered:

# mount -t cgroup -o debug xxx /mnt
# cat /proc/$$/cgroup

...
kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection!
...

This is a false-positive, because cgroup_path() can be called
with either rcu_read_lock() held or cgroup_mutex held.

Signed-off-by: Li Zefan
Signed-off-by: Paul E. McKenney

Li Zefan
2010-05-05 00:24:59 +0800

23 Apr, 2010

1 commit

6c9468e9e Merge branch 'master' into for-next Browse Code »

Jiri Kosina
2010-04-23 08:08:44 +0800

25 Mar, 2010

1 commit

9d34706f4 cgroups: remove duplicate include ... Browse Code »

commit e6a1105b ("cgroups: subsystem module loading interface") and commit
c50cc752 ("sched, cgroups: Fix module export") result in duplicate
including of module.h

Signed-off-by: Li Zefan
Acked-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2010-03-25 07:31:19 +0800

16 Mar, 2010

1 commit

883931612 Fix typos in comments ... Browse Code »

[Ss]ytem => [Ss]ystem
udpate => update
paramters => parameters
orginal => original

Signed-off-by: Thomas Weber
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina

Thomas Weber
2010-03-16 18:47:56 +0800

13 Mar, 2010

10 commits

a0a4db548 cgroups: remove events before destroying subsystem state objects ... Browse Code »

Events should be removed after rmdir of cgroup directory, but before
destroying subsystem state objects. Let's take reference to cgroup
directory dentry to do that.

Signed-off-by: Kirill A. Shutemov
Acked-by: KAMEZAWA Hiroyuki
Cc: Paul Menage
Acked-by: Li Zefan
Cc: Balbir Singh
Cc: Pavel Emelyanov
Cc: Dan Malek
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2010-03-13 07:52:37 +0800
4ab78683c cgroups: fix race between userspace and kernelspace ... Browse Code »

Notify userspace about cgroup removing only after rmdir of cgroup
directory to avoid race between userspace and kernelspace.

eventfd are used to notify about two types of event:
- control file-specific, like crossing memory threshold;
- cgroup removing.

To understand what really happen, userspace can check if the cgroup still
exists. To avoid race beetween userspace and kernelspace we have to
notify userspace about cgroup removing only after rmdir of cgroup
directory.

Signed-off-by: Kirill A. Shutemov
Reviewed-by: KAMEZAWA Hiroyuki
Cc: Paul Menage
Acked-by: Li Zefan
Cc: Balbir Singh
Cc: Pavel Emelyanov
Cc: Dan Malek
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2010-03-13 07:52:37 +0800
0dea11687 cgroup: implement eventfd-based generic API for notifications ... Browse Code »

This patchset introduces eventfd-based API for notifications in cgroups
and implements memory notifications on top of it.

It uses statistics in memory controler to track memory usage.

Output of time(1) on building kernel on tmpfs:

Root cgroup before changes:
make -j2 506.37 user 60.93s system 193% cpu 4:52.77 total
Non-root cgroup before changes:
make -j2 507.14 user 62.66s system 193% cpu 4:54.74 total
Root cgroup after changes (0 thresholds):
make -j2 507.13 user 62.20s system 193% cpu 4:53.55 total
Non-root cgroup after changes (0 thresholds):
make -j2 507.70 user 64.20s system 193% cpu 4:55.70 total
Root cgroup after changes (1 thresholds, never crossed):
make -j2 506.97 user 62.20s system 193% cpu 4:53.90 total
Non-root cgroup after changes (1 thresholds, never crossed):
make -j2 507.55 user 64.08s system 193% cpu 4:55.63 total

This patch:

Introduce the write-only file "cgroup.event_control" in every cgroup.

To register new notification handler you need:
- create an eventfd;
- open a control file to be monitored. Callbacks register_event() and
unregister_event() must be defined for the control file;
- write " " to cgroup.event_control.
Interpretation of args is defined by control file implementation;

eventfd will be woken up by control file implementation or when the
cgroup is removed.

To unregister notification handler just close eventfd.

If you need notification functionality for a control file you have to
implement callbacks register_event() and unregister_event() in the
struct cftype.

[kamezawa.hiroyu@jp.fujitsu.com: Kconfig fix]
Signed-off-by: Kirill A. Shutemov
Reviewed-by: KAMEZAWA Hiroyuki
Paul Menage
Cc: Li Zefan
Cc: Balbir Singh
Cc: Pavel Emelyanov
Cc: Dan Malek
Cc: Vladislav Buzov
Cc: Daisuke Nishimura
Cc: Alexander Shishkin
Cc: Davide Libenzi
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2010-03-13 07:52:37 +0800
b70cc5fdb cgroups: clean up cgroup_pidlist_find() a bit ... Browse Code »

Don't call get_pid_ns() before we locate/alloc the ns.

Signed-off-by: Li Zefan
Cc: Serge Hallyn
Acked-by: Paul Menage
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2010-03-13 07:52:36 +0800
67523c48a cgroups: blkio subsystem as module ... Browse Code »

Modify the Block I/O cgroup subsystem to be able to be built as a module.
As the CFQ disk scheduler optionally depends on blk-cgroup, config options
in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
enhanced to support the new module dependency.

Signed-off-by: Ben Blum
Cc: Li Zefan
Cc: Paul Menage
Cc: "David S. Miller"
Cc: KAMEZAWA Hiroyuki
Cc: Lai Jiangshan
Cc: Vivek Goyal
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2010-03-13 07:52:36 +0800
cf5d5941f cgroups: subsystem module unloading ... Browse Code »

Provides support for unloading modular subsystems.

This patch adds a new function cgroup_unload_subsys which is to be used
for removing a loaded subsystem during module deletion. Reference
counting of the subsystems' modules is moved from once (at load time) to
once per attached hierarchy (in parse_cgroupfs_options and
rebind_subsystems) (i.e., 0 or 1).

Signed-off-by: Ben Blum
Acked-by: Li Zefan
Cc: Paul Menage
Cc: "David S. Miller"
Cc: KAMEZAWA Hiroyuki
Cc: Lai Jiangshan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2010-03-13 07:52:36 +0800
e6a1105ba cgroups: subsystem module loading interface ... Browse Code »

Add interface between cgroups subsystem management and module loading

This patch implements rudimentary module-loading support for cgroups -
namely, a cgroup_load_subsys (similar to cgroup_init_subsys) for use as a
module initcall, and a struct module pointer in struct cgroup_subsys.

Several functions that might be wanted by modules have had EXPORT_SYMBOL
added to them, but it's unclear exactly which functions want it and which
won't.

Signed-off-by: Ben Blum
Acked-by: Li Zefan
Cc: Paul Menage
Cc: "David S. Miller"
Cc: KAMEZAWA Hiroyuki
Cc: Lai Jiangshan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2010-03-13 07:52:36 +0800
aae8aab40 cgroups: revamp subsys array ... Browse Code »

This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree. This is
mainly useful for classifiers and subsystems that hook into components
that are already modules. cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.

It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.

Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear. Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.

Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.

Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.

Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.

Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves. In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy). The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored. Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.

This patch:

Make subsys[] able to be dynamically populated to support modular
subsystems

This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.

Signed-off-by: Ben Blum
Acked-by: Li Zefan
Cc: Paul Menage
Cc: "David S. Miller"
Cc: KAMEZAWA Hiroyuki
Cc: Lai Jiangshan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2010-03-13 07:52:36 +0800
d7b9fff71 cgroup: introduce coalesce css_get() and css_put() ... Browse Code »

Current css_get() and css_put() increment/decrement css->refcnt one by
one.

This patch add a new function __css_get(), which takes "count" as a arg
and increment the css->refcnt by "count". And this patch also add a new
arg("count") to __css_put() and change the function to decrement the
css->refcnt by "count".

These coalesce version of __css_get()/__css_put() will be used to improve
performance of memcg's moving charge feature later, where instead of
calling css_get()/css_put() repeatedly, these new functions will be used.

No change is needed for current users of css_get()/css_put().

Signed-off-by: Daisuke Nishimura
Acked-by: Paul Menage
Cc: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Cc: Li Zefan
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke Nishimura
2010-03-13 07:52:36 +0800
2468c7234 cgroup: introduce cancel_attach() ... Browse Code »

Add cancel_attach() operation to struct cgroup_subsys. cancel_attach()
can be used when can_attach() operation prepares something for the subsys,
but we should rollback what can_attach() operation has prepared if attach
task fails after we've succeeded in can_attach().

Signed-off-by: Daisuke Nishimura
Acked-by: Li Zefan
Reviewed-by: Paul Menage
Cc: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke Nishimura
2010-03-13 07:52:35 +0800

25 Feb, 2010

2 commits

c50cc7527 sched, cgroups: Fix module export ... Browse Code »

I have exported it in d11c563 - but cgroups.c did not have module.h included ...

Cc: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2010-02-25 19:02:13 +0800
d11c563dd sched: Use lockdep-based checking on rcu_dereference() ... Browse Code »

Update the rcu_dereference() usages to take advantage of the new
lockdep-based checking.

Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference:
[ -v2: fix allmodconfig missing symbol export build failure on x86 ]
Signed-off-by: Ingo Molnar

Paul E. McKenney
2010-02-25 17:34:26 +0800

03 Feb, 2010

1 commit

4528fd059 cgroups: fix to return errno in a failure path ... Browse Code »

In cgroup_create(), if alloc_css_id() returns failure, the errno is not
propagated to userspace, so mkdir will fail silently.

To trigger this bug, we mount blkio (or memory subsystem), and create more
then 65534 cgroups. (The number of cgroups is limited to 65535 if a
subsystem has use_id == 1)

# mount -t cgroup -o blkio xxx /mnt
# for ((i = 0; i < 65534; i++)); do mkdir /mnt/$i; done
# mkdir /mnt/65534
(should return ENOSPC)
#

Signed-off-by: Li Zefan
Acked-by: Serge Hallyn
Acked-by: Paul Menage
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2010-02-03 10:11:22 +0800

12 Jan, 2010

1 commit

bd4f490a0 cgroups: fix 2.6.32 regression causing BUG_ON() in cgroup_diput() ... Browse Code »

The LTP cgroup test suite generates a "kernel BUG at kernel/cgroup.c:790!"
here in cgroup_diput():

/*
* if we're getting rid of the cgroup, refcount should ensure
* that there are no pidlists left.
*/
BUG_ON(!list_empty(&cgrp->pidlists));

The cgroup pidlist rework in 2.6.32 generates the BUG_ON, which is caused
when pidlist_array_load() calls cgroup_pidlist_find():

(1) if a matching cgroup_pidlist is found, it down_write's the mutex of the
pre-existing cgroup_pidlist, and increments its use_count.
(2) if no matching cgroup_pidlist is found, then a new one is allocated, it
down_write's its mutex, and the use_count is set to 0.
(3) the matching, or new, cgroup_pidlist gets returned back to pidlist_array_load(),
which increments its use_count -- regardless whether new or pre-existing --
and up_write's the mutex.

So if a matching list is ever encountered by cgroup_pidlist_find() during
the life of a cgroup directory, it results in an inflated use_count value,
preventing it from ever getting released by cgroup_release_pid_array().
Then if the directory is subsequently removed, cgroup_diput() hits the
BUG_ON() when it finds that the directory's cgroup is still populated with
a pidlist.

The patch simply removes the use_count increment when a matching pidlist
is found by cgroup_pidlist_find(), because it gets bumped by the calling
pidlist_array_load() function while still protected by the list's mutex.

Signed-off-by: Dave Anderson
Reviewed-by: Li Zefan
Acked-by: Ben Blum
Cc: Paul Menage
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Anderson
2010-01-12 01:34:05 +0800

29 Oct, 2009

1 commit

478988d3b cgroup: fix strstrip() misuse ... Browse Code »

cgroup_write_X64() and cgroup_write_string() ignore the return value of
strstrip(). it makes small inconsistent behavior.

example:
=========================
# cd /mnt/cgroup/hoge
# cat memory.swappiness
60
# echo "59 " > memory.swappiness
# cat memory.swappiness
59
# echo " 58" > memory.swappiness
bash: echo: write error: Invalid argument

This patch fixes it.

Cc: Li Zefan
Acked-by: Paul Menage
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-10-29 22:39:25 +0800

02 Oct, 2009

2 commits

3dece8347 cgroup: catch bad css refcnt at css_put ... Browse Code »

__css_put() doesn't check a bug as refcnt goes to minus.
I think it should be caught. This patch adds a check for it.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Paul Menage
Cc: Li Zefan
Cc: Balbir Singh
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-10-02 07:11:12 +0800
828c09509 const: constify remaining file_operations ... Browse Code »

[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-10-02 07:11:11 +0800

24 Sep, 2009

11 commits

be367d099 cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time ... Browse Code »

Alter the ss->can_attach and ss->attach functions to be able to deal with
a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a
pre-patch to cgroup-procs-writable.patch.)

Currently, new mode of the attach function can only tell the subsystem
about the old cgroup of the threadgroup leader. No subsystem currently
needs that information for each thread that's being moved, but if one were
to be added (for example, one that counts tasks within a group) this bit
would need to be reworked a bit to tell the subsystem the right
information.

[hidave.darkstar@gmail.com: fix build]
Signed-off-by: Ben Blum
Signed-off-by: Paul Menage
Acked-by: Li Zefan
Reviewed-by: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Dave Young
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2009-09-24 22:20:58 +0800
c378369d8 cgroups: change css_set freeing mechanism to be under RCU ... Browse Code »

Changes css_set freeing mechanism to be under RCU

This is a prepatch for making the procs file writable. In order to free the
old css_sets for each task to be moved as they're being moved, the freeing
mechanism must be RCU-protected, or else we would have to have a call to
synchronize_rcu() for each task before freeing its old css_set.

Signed-off-by: Ben Blum
Signed-off-by: Paul Menage
Cc: "Paul E. McKenney"
Acked-by: Li Zefan
Cc: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2009-09-24 22:20:58 +0800
d1d9fd330 cgroups: use vmalloc for large cgroups pidlist allocations ... Browse Code »

Separates all pidlist allocation requests to a separate function that
judges based on the requested size whether or not the array needs to be
vmalloced or can be gotten via kmalloc, and similar for kfree/vfree.

Signed-off-by: Ben Blum
Signed-off-by: Paul Menage
Acked-by: Li Zefan
Cc: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2009-09-24 22:20:58 +0800
72a8cb30d cgroups: ensure correct concurrent opening/reading of pidlists across pid namespaces ... Browse Code »

Previously there was the problem in which two processes from different pid
namespaces reading the tasks or procs file could result in one process
seeing results from the other's namespace. Rather than one pidlist for
each file in a cgroup, we now keep a list of pidlists keyed by namespace
and file type (tasks versus procs) in which entries are placed on demand.
Each pidlist has its own lock, and that the pidlists themselves are passed
around in the seq_file's private pointer means we don't have to touch the
cgroup or its master list except when creating and destroying entries.

Signed-off-by: Ben Blum
Signed-off-by: Paul Menage
Cc: Li Zefan
Cc: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2009-09-24 22:20:58 +0800
102a775e3 cgroups: add a read-only "procs" file similar to "tasks" that shows only unique tgids ... Browse Code »

struct cgroup used to have a bunch of fields for keeping track of the
pidlist for the tasks file. Those are now separated into a new struct
cgroup_pidlist, of which two are had, one for procs and one for tasks.
The way the seq_file operations are set up is changed so that just the
pidlist struct gets passed around as the private data.

Interface example: Suppose a multithreaded process has pid 1000 and other
threads with ids 1001, 1002, 1003:
$ cat tasks
1000
1001
1002
1003
$ cat cgroup.procs
1000
$

Signed-off-by: Ben Blum
Signed-off-by: Paul Menage
Acked-by: Li Zefan
Cc: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2009-09-24 22:20:58 +0800
8f3ff2086 cgroups: revert "cgroups: fix pid namespace bug" ... Browse Code »

The following series adds a "cgroup.procs" file to each cgroup that
reports unique tgids rather than pids, and allows all threads in a
threadgroup to be atomically moved to a new cgroup.

The subsystem "attach" interface is modified to support attaching whole
threadgroups at a time, which could introduce potential problems if any
subsystem were to need to access the old cgroup of every thread being
moved. The attach interface may need to be revised if this becomes the
case.

Also added is functionality for read/write locking all CLONE_THREAD
fork()ing within a threadgroup, by means of an rwsem that lives in the
sighand_struct, for per-threadgroup-ness and also for sharing a cacheline
with the sighand's atomic count. This scheme should introduce no extra
overhead in the fork path when there's no contention.

The final patch reveals potential for a race when forking before a
subsystem's attach function is called - one potential solution in case any
subsystem has this problem is to hang on to the group's fork mutex through
the attach() calls, though no subsystem yet demonstrates need for an
extended critical section.

This patch:

Revert

commit 096b7fe012d66ed55e98bc8022405ede0cc80e96
Author: Li Zefan
AuthorDate: Wed Jul 29 15:04:04 2009 -0700
Commit: Linus Torvalds
CommitDate: Wed Jul 29 19:10:35 2009 -0700

cgroups: fix pid namespace bug

This is in preparation for some clashing cgroups changes that subsume the
original commit's functionaliy.

The original commit fixed a pid namespace bug which Ben Blum fixed
independently (in the same way, but with different code) as part of a
series of patches. I played around with trying to reconcile Ben's patch
series with Li's patch, but concluded that it was simpler to just revert
Li's, given that Ben's patch series contained essentially the same fix.

Signed-off-by: Paul Menage
Cc: Li Zefan
Cc: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-09-24 22:20:58 +0800
2c6ab6d20 cgroups: allow cgroup hierarchies to be created with no bound subsystems ... Browse Code »

This patch removes the restriction that a cgroup hierarchy must have at
least one bound subsystem. The mount option "none" is treated as an
explicit request for no bound subsystems.

A hierarchy with no subsystems can be useful for plain task tracking, and
is also a step towards the support for multiply-bindable subsystems.

As part of this change, the hierarchy id is no longer calculated from the
bitmask of subsystems in the hierarchy (since this is not guaranteed to be
unique) but is allocated via an ida. Reference counts on cgroups from
css_set objects are now taken explicitly one per hierarchy, rather than
one per subsystem.

Example usage:

mount -t cgroup -o none,name=foo cgroup /mnt/cgroup

Based on the "no-op"/"none" subsystem concept proposed by
kamezawa.hiroyu@jp.fujitsu.com

Signed-off-by: Paul Menage
Reviewed-by: Li Zefan
Cc: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Dhaval Giani
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-09-24 22:20:58 +0800
7717f7ba9 cgroups: add a back-pointer from struct cg_cgroup_link to struct cgroup ... Browse Code »

Currently the cgroups code makes the assumption that the subsystem
pointers in a struct css_set uniquely identify the hierarchy->cgroup
mappings associated with the css_set; and there's no way to directly
identify the associated set of cgroups other than by indirecting through
the appropriate subsystem state pointers.

This patch removes the need for that assumption by adding a back-pointer
from struct cg_cgroup_link object to its associated cgroup; this allows
the set of cgroups to be determined by traversing the cg_links list in
the struct css_set.

Signed-off-by: Paul Menage
Reviewed-by: Li Zefan
Cc: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Dhaval Giani
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-09-24 22:20:58 +0800
fe6934354 cgroups: move the cgroup debug subsys into cgroup.c to access internal state ... Browse Code »

While it's architecturally clean to have the cgroup debug subsystem be
completely independent of the cgroups framework, it limits its usefulness
for debugging the contents of internal data structures. Move the debug
subsystem code into the scope of all the cgroups data structures to make
more detailed debugging possible.

Signed-off-by: Paul Menage
Reviewed-by: Li Zefan
Reviewed-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Dhaval Giani
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-09-24 22:20:57 +0800
c6d57f331 cgroups: support named cgroups hierarchies ... Browse Code »

To simplify referring to cgroup hierarchies in mount statements, and to
allow disambiguation in the presence of empty hierarchies and
multiply-bindable subsystems this patch adds support for naming a new
cgroup hierarchy via the "name=" mount option

A pre-existing hierarchy may be specified by either name or by subsystems;
a hierarchy's name cannot be changed by a remount operation.

Example usage:

# To create a hierarchy called "foo" containing the "cpu" subsystem
mount -t cgroup -oname=foo,cpu cgroup /mnt/cgroup1

# To mount the "foo" hierarchy on a second location
mount -t cgroup -oname=foo cgroup /mnt/cgroup2

Signed-off-by: Paul Menage
Reviewed-by: Li Zefan
Cc: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Dhaval Giani
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-09-24 22:20:57 +0800
34f77a90f cgroups: make unlock sequence in cgroup_get_sb consistent ... Browse Code »

Make the last unlock sequence consistent with previous unlock sequeue.

Acked-by: Balbir Singh
Acked-by: Paul Menage
Signed-off-by: Xiaotian Feng
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xiaotian Feng
2009-09-24 22:20:57 +0800