Doug / smarc-fsl-linux-kernel | Embedian Git Server

29 Oct, 2009

1 commit

478988d3b cgroup: fix strstrip() misuse ... Browse Code »

cgroup_write_X64() and cgroup_write_string() ignore the return value of
strstrip(). it makes small inconsistent behavior.

example:
=========================
# cd /mnt/cgroup/hoge
# cat memory.swappiness
60
# echo "59 " > memory.swappiness
# cat memory.swappiness
59
# echo " 58" > memory.swappiness
bash: echo: write error: Invalid argument

This patch fixes it.

Cc: Li Zefan
Acked-by: Paul Menage
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-10-29 22:39:25 +0800

02 Oct, 2009

2 commits

3dece8347 cgroup: catch bad css refcnt at css_put ... Browse Code »

__css_put() doesn't check a bug as refcnt goes to minus.
I think it should be caught. This patch adds a check for it.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Paul Menage
Cc: Li Zefan
Cc: Balbir Singh
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-10-02 07:11:12 +0800
828c09509 const: constify remaining file_operations ... Browse Code »

[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-10-02 07:11:11 +0800

24 Sep, 2009

11 commits

be367d099 cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time ... Browse Code »

Alter the ss->can_attach and ss->attach functions to be able to deal with
a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a
pre-patch to cgroup-procs-writable.patch.)

Currently, new mode of the attach function can only tell the subsystem
about the old cgroup of the threadgroup leader. No subsystem currently
needs that information for each thread that's being moved, but if one were
to be added (for example, one that counts tasks within a group) this bit
would need to be reworked a bit to tell the subsystem the right
information.

[hidave.darkstar@gmail.com: fix build]
Signed-off-by: Ben Blum
Signed-off-by: Paul Menage
Acked-by: Li Zefan
Reviewed-by: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Dave Young
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2009-09-24 22:20:58 +0800
c378369d8 cgroups: change css_set freeing mechanism to be under RCU ... Browse Code »

Changes css_set freeing mechanism to be under RCU

This is a prepatch for making the procs file writable. In order to free the
old css_sets for each task to be moved as they're being moved, the freeing
mechanism must be RCU-protected, or else we would have to have a call to
synchronize_rcu() for each task before freeing its old css_set.

Signed-off-by: Ben Blum
Signed-off-by: Paul Menage
Cc: "Paul E. McKenney"
Acked-by: Li Zefan
Cc: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2009-09-24 22:20:58 +0800
d1d9fd330 cgroups: use vmalloc for large cgroups pidlist allocations ... Browse Code »

Separates all pidlist allocation requests to a separate function that
judges based on the requested size whether or not the array needs to be
vmalloced or can be gotten via kmalloc, and similar for kfree/vfree.

Signed-off-by: Ben Blum
Signed-off-by: Paul Menage
Acked-by: Li Zefan
Cc: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2009-09-24 22:20:58 +0800
72a8cb30d cgroups: ensure correct concurrent opening/reading of pidlists across pid namespaces ... Browse Code »

Previously there was the problem in which two processes from different pid
namespaces reading the tasks or procs file could result in one process
seeing results from the other's namespace. Rather than one pidlist for
each file in a cgroup, we now keep a list of pidlists keyed by namespace
and file type (tasks versus procs) in which entries are placed on demand.
Each pidlist has its own lock, and that the pidlists themselves are passed
around in the seq_file's private pointer means we don't have to touch the
cgroup or its master list except when creating and destroying entries.

Signed-off-by: Ben Blum
Signed-off-by: Paul Menage
Cc: Li Zefan
Cc: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2009-09-24 22:20:58 +0800
102a775e3 cgroups: add a read-only "procs" file similar to "tasks" that shows only unique tgids ... Browse Code »

struct cgroup used to have a bunch of fields for keeping track of the
pidlist for the tasks file. Those are now separated into a new struct
cgroup_pidlist, of which two are had, one for procs and one for tasks.
The way the seq_file operations are set up is changed so that just the
pidlist struct gets passed around as the private data.

Interface example: Suppose a multithreaded process has pid 1000 and other
threads with ids 1001, 1002, 1003:
$ cat tasks
1000
1001
1002
1003
$ cat cgroup.procs
1000
$

Signed-off-by: Ben Blum
Signed-off-by: Paul Menage
Acked-by: Li Zefan
Cc: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2009-09-24 22:20:58 +0800
8f3ff2086 cgroups: revert "cgroups: fix pid namespace bug" ... Browse Code »

The following series adds a "cgroup.procs" file to each cgroup that
reports unique tgids rather than pids, and allows all threads in a
threadgroup to be atomically moved to a new cgroup.

The subsystem "attach" interface is modified to support attaching whole
threadgroups at a time, which could introduce potential problems if any
subsystem were to need to access the old cgroup of every thread being
moved. The attach interface may need to be revised if this becomes the
case.

Also added is functionality for read/write locking all CLONE_THREAD
fork()ing within a threadgroup, by means of an rwsem that lives in the
sighand_struct, for per-threadgroup-ness and also for sharing a cacheline
with the sighand's atomic count. This scheme should introduce no extra
overhead in the fork path when there's no contention.

The final patch reveals potential for a race when forking before a
subsystem's attach function is called - one potential solution in case any
subsystem has this problem is to hang on to the group's fork mutex through
the attach() calls, though no subsystem yet demonstrates need for an
extended critical section.

This patch:

Revert

commit 096b7fe012d66ed55e98bc8022405ede0cc80e96
Author: Li Zefan
AuthorDate: Wed Jul 29 15:04:04 2009 -0700
Commit: Linus Torvalds
CommitDate: Wed Jul 29 19:10:35 2009 -0700

cgroups: fix pid namespace bug

This is in preparation for some clashing cgroups changes that subsume the
original commit's functionaliy.

The original commit fixed a pid namespace bug which Ben Blum fixed
independently (in the same way, but with different code) as part of a
series of patches. I played around with trying to reconcile Ben's patch
series with Li's patch, but concluded that it was simpler to just revert
Li's, given that Ben's patch series contained essentially the same fix.

Signed-off-by: Paul Menage
Cc: Li Zefan
Cc: Matt Helsley
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-09-24 22:20:58 +0800
2c6ab6d20 cgroups: allow cgroup hierarchies to be created with no bound subsystems ... Browse Code »

This patch removes the restriction that a cgroup hierarchy must have at
least one bound subsystem. The mount option "none" is treated as an
explicit request for no bound subsystems.

A hierarchy with no subsystems can be useful for plain task tracking, and
is also a step towards the support for multiply-bindable subsystems.

As part of this change, the hierarchy id is no longer calculated from the
bitmask of subsystems in the hierarchy (since this is not guaranteed to be
unique) but is allocated via an ida. Reference counts on cgroups from
css_set objects are now taken explicitly one per hierarchy, rather than
one per subsystem.

Example usage:

mount -t cgroup -o none,name=foo cgroup /mnt/cgroup

Based on the "no-op"/"none" subsystem concept proposed by
kamezawa.hiroyu@jp.fujitsu.com

Signed-off-by: Paul Menage
Reviewed-by: Li Zefan
Cc: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Dhaval Giani
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-09-24 22:20:58 +0800
7717f7ba9 cgroups: add a back-pointer from struct cg_cgroup_link to struct cgroup ... Browse Code »

Currently the cgroups code makes the assumption that the subsystem
pointers in a struct css_set uniquely identify the hierarchy->cgroup
mappings associated with the css_set; and there's no way to directly
identify the associated set of cgroups other than by indirecting through
the appropriate subsystem state pointers.

This patch removes the need for that assumption by adding a back-pointer
from struct cg_cgroup_link object to its associated cgroup; this allows
the set of cgroups to be determined by traversing the cg_links list in
the struct css_set.

Signed-off-by: Paul Menage
Reviewed-by: Li Zefan
Cc: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Dhaval Giani
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-09-24 22:20:58 +0800
fe6934354 cgroups: move the cgroup debug subsys into cgroup.c to access internal state ... Browse Code »

While it's architecturally clean to have the cgroup debug subsystem be
completely independent of the cgroups framework, it limits its usefulness
for debugging the contents of internal data structures. Move the debug
subsystem code into the scope of all the cgroups data structures to make
more detailed debugging possible.

Signed-off-by: Paul Menage
Reviewed-by: Li Zefan
Reviewed-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Dhaval Giani
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-09-24 22:20:57 +0800
c6d57f331 cgroups: support named cgroups hierarchies ... Browse Code »

To simplify referring to cgroup hierarchies in mount statements, and to
allow disambiguation in the presence of empty hierarchies and
multiply-bindable subsystems this patch adds support for naming a new
cgroup hierarchy via the "name=" mount option

A pre-existing hierarchy may be specified by either name or by subsystems;
a hierarchy's name cannot be changed by a remount operation.

Example usage:

# To create a hierarchy called "foo" containing the "cpu" subsystem
mount -t cgroup -oname=foo,cpu cgroup /mnt/cgroup1

# To mount the "foo" hierarchy on a second location
mount -t cgroup -oname=foo cgroup /mnt/cgroup2

Signed-off-by: Paul Menage
Reviewed-by: Li Zefan
Cc: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Dhaval Giani
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-09-24 22:20:57 +0800
34f77a90f cgroups: make unlock sequence in cgroup_get_sb consistent ... Browse Code »

Make the last unlock sequence consistent with previous unlock sequeue.

Acked-by: Balbir Singh
Acked-by: Paul Menage
Signed-off-by: Xiaotian Feng
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xiaotian Feng
2009-09-24 22:20:57 +0800

23 Sep, 2009

1 commit

88e9d34c7 seq_file: constify seq_operations ... Browse Code »

Make all seq_operations structs const, to help mitigate against
revectoring user-triggerable function pointers.

This is derived from the grsecurity patch, although generated from scratch
because it's simpler than extracting the changes from there.

Signed-off-by: James Morris
Acked-by: Serge Hallyn
Acked-by: Casey Schaufler
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

James Morris
2009-09-23 22:39:29 +0800

22 Sep, 2009

2 commits

6e1d5dcc2 const: mark remaining inode_operations as const ... Browse Code »

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-09-22 22:17:24 +0800
b87221de6 const: mark remaining super_operations const ... Browse Code »

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-09-22 22:17:24 +0800

11 Sep, 2009

1 commit

d993831fa writeback: add name to backing_dev_info ... Browse Code »

This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.

Signed-off-by: Jens Axboe

Jens Axboe
2009-09-11 15:20:26 +0800

30 Jul, 2009

2 commits

887032670 cgroup avoid permanent sleep at rmdir ... Browse Code »

After commit ec64f51545fffbc4cb968f0cea56341a4b07e85a ("cgroup: fix
frequent -EBUSY at rmdir"), cgroup's rmdir (especially against memcg)
doesn't return -EBUSY by temporary ref counts. That commit expects all
refs after pre_destroy() is temporary but...it wasn't. Then, rmdir can
wait permanently. This patch tries to fix that and change followings.

- set CGRP_WAIT_ON_RMDIR flag before pre_destroy().
- clear CGRP_WAIT_ON_RMDIR flag when the subsys finds racy case.
if there are sleeping ones, wakes them up.
- rmdir() sleeps only when CGRP_WAIT_ON_RMDIR flag is set.

Tested-by: Daisuke Nishimura
Reported-by: Daisuke Nishimura
Reviewed-by: Paul Menage
Acked-by: Balbir Sigh
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-07-30 10:10:35 +0800
096b7fe01 cgroups: fix pid namespace bug ... Browse Code »

The bug was introduced by commit cc31edceee04a7b87f2be48f9489ebb72d264844
("cgroups: convert tasks file to use a seq_file with shared pid array").

We cache a pid array for all threads that are opening the same "tasks"
file, but the pids in the array are always from the namespace of the
last process that opened the file, so all other threads will read pids
from that namespace instead of their own namespaces.

To fix it, we maintain a list of pid arrays, which is keyed by pid_ns.
The list will be of length 1 at most time.

Reported-by: Paul Menage
Idea-by: Paul Menage
Signed-off-by: Li Zefan
Reviewed-by: Serge Hallyn
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2009-07-30 10:10:35 +0800

19 Jun, 2009

1 commit

f9ab5b5b0 cgroups: forbid noprefix if mounting more than just cpuset subsystem ... Browse Code »

The 'noprefix' option was introduced for backwards-compatibility of
cpuset, but actually it can be used when mounting other subsystems.

This results in possibility of name collision, and now the collision can
really happen, because we have 'stat' file in both memory and cpuacct
subsystem:

# mount -t cgroup -o noprefix,memory,cpuacct xxx /mnt

Cgroup will happily mount the 2 subsystems, but only 'stat' file of memory
subsys can be seen.

We don't want users to use nopreifx, and also want to avoid name
collision, so we change to allow noprefix only if mounting just the cpuset
subsystem.

[akpm@linux-foundation.org: fix shift for cpuset_subsys_id >= 32]
Signed-off-by: Li Zefan
Cc: Paul Menage
Acked-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Acked-by: Dhaval Giani
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2009-06-19 04:03:46 +0800

12 Jun, 2009

1 commit

337eb00a2 Push BKL down into ->remount_fs() ... Browse Code »

[xfs, btrfs, capifs, shmem don't need BKL, exempt]

Signed-off-by: Alessio Igor Bogani
Signed-off-by: Al Viro

Alessio Igor Bogani
2009-06-12 09:36:11 +0800

09 May, 2009

1 commit

6f5bbff9a Convert obvious places to deactivate_locked_super() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2009-05-09 22:49:40 +0800

03 Apr, 2009

7 commits

0b7f569e4 memcg: fix OOM killer under memcg ... Browse Code »

This patch tries to fix OOM Killer problems caused by hierarchy.
Now, memcg itself has OOM KILL function (in oom_kill.c) and tries to
kill a task in memcg.

But, when hierarchy is used, it's broken and correct task cannot
be killed. For example, in following cgroup

/groupA/ hierarchy=1, limit=1G,
01 nolimit
02 nolimit
All tasks' memory usage under /groupA, /groupA/01, groupA/02 is limited to
groupA's 1Gbytes but OOM Killer just kills tasks in groupA.

This patch provides makes the bad process be selected from all tasks
under hierarchy. BTW, currently, oom_jiffies is updated against groupA
in above case. oom_jiffies of tree should be updated.

To see how oom_jiffies is used, please check mem_cgroup_oom_called()
callers.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: const fix]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Paul Menage
Cc: Li Zefan
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-04-03 10:04:55 +0800
0670e08bd cgroups: don't change release_agent when remount failed ... Browse Code »

Remount can fail in either case:
- wrong mount options is specified, or option 'noprefix' is changed.
- a to-be-added subsys is already mounted/active.

When using remount to change 'release_agent', for the above former failure
case, remount will return errno with release_agent unchanged, but for the
latter case, remount will return EBUSY with relase_agent changed, which is
unexpected I think:

# mount -t cgroup -o cpu xxx /cgrp1
# mount -t cgroup -o cpuset,release_agent=agent1 yyy /cgrp2
# cat /cgrp2/release_agent
agent1
# mount -t cgroup -o remount,cpuset,noprefix,release_agent=agent2 yyy /cgrp2
mount: /cgrp2 not mounted already, or bad option
# cat /cgrp2/release_agent
agent1
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2009-04-03 10:04:54 +0800
099fca322 cgroups: show correct file mode ... Browse Code »

We have some read-only files and write-only files, but currently they are
all set to 0644, which is counter-intuitive and cause trouble for some
cgroup tools like libcgroup.

This patch adds 'mode' to struct cftype to allow cgroup subsys to set it's
own files' file mode, and for the most cases cft->mode can be default to 0
and cgroup will figure out proper mode.

Acked-by: Paul Menage
Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Li Zefan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2009-04-03 10:04:54 +0800
66bdc9cfc kernel/cgroup.c: kfree(NULL) is legal ... Browse Code »

Reduces object file size a bit:

Before:
$ size kernel/cgroup.o
text data bss dec hex filename
21593 7804 4924 34321 8611 kernel/cgroup.o
After:
$ size kernel/cgroup.o
text data bss dec hex filename
21537 7744 4924 34205 859d kernel/cgroup.o

Signed-off-by: Jesper Juhl
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2009-04-03 10:04:54 +0800
ec64f5154 cgroup: fix frequent -EBUSY at rmdir ... Browse Code »

In following situation, with memory subsystem,

/groupA use_hierarchy==1
/01 some tasks
/02 some tasks
/03 some tasks
/04 empty

When tasks under 01/02/03 hit limit on /groupA, hierarchical reclaim
is triggered and the kernel walks tree under groupA. In this case,
rmdir /groupA/04 fails with -EBUSY frequently because of temporal
refcnt from the kernel.

In general. cgroup can be rmdir'd if there are no children groups and
no tasks. Frequent fails of rmdir() is not useful to users.
(And the reason for -EBUSY is unknown to users.....in most cases)

This patch tries to modify above behavior, by
- retries if css_refcnt is got by someone.
- add "return value" to pre_destroy() and allows subsystem to
say "we're really busy!"

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Paul Menage
Cc: Li Zefan
Cc: Balbir Singh
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-04-03 10:04:54 +0800
38460b48d cgroup: CSS ID support ... Browse Code »

Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.

This patch attaches unique ID to each css and provides following.

- css_lookup(subsys, id)
returns pointer to struct cgroup_subysys_state of id.
- css_get_next(subsys, id, rootid, depth, foundid)
returns the next css under "root" by scanning

When cgroup_subsys->use_id is set, an id for css is maintained.

The cgroup framework only parepares
- css_id of root css for subsys
- id is automatically attached at creation of css.
- id is *not* freed automatically. Because the cgroup framework
don't know lifetime of cgroup_subsys_state.
free_css_id() function is provided. This must be called by subsys.

There are several reasons to develop this.
- Saving space .... For example, memcg's swap_cgroup is array of
pointers to cgroup. But it is not necessary to be very fast.
By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
reduce much amount of memory usage.

- Scanning without lock.
CSS_ID provides "scan id under this ROOT" function. By this, scanning
css under root can be written without locks.
ex)
do {
rcu_read_lock();
next = cgroup_get_next(subsys, id, root, &found);
/* check sanity of next here */
css_tryget();
rcu_read_unlock();
id = found + 1
} while(...)

Characteristics:
- Each css has unique ID under subsys.
- Lifetime of ID is controlled by subsys.
- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
- Allowed ID is 1-65535, ID 0 is UNUSED ID.

Design Choices:
- scan-by-ID v.s. scan-by-tree-walk.
As /proc's pid scan does, scan-by-ID is robust when scanning is done
by following kind of routine.
scan -> rest a while(release a lock) -> conitunue from interrupted
memcg's hierarchical reclaim does this.

- When subsys->use_id is set, # of css in the system is limited to
65535.

[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki
Acked-by: Paul Menage
Cc: Li Zefan
Cc: Balbir Singh
Cc: Daisuke Nishimura
Signed-off-by: Bharata B Rao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-04-03 10:04:53 +0800
313e924c0 cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups ... Browse Code »

The ns_proxy cgroup allows moving processes to child cgroups only one
level deep at a time. This commit relaxes this restriction and makes it
possible to attach tasks directly to grandchild cgroups, e.g.:

($pid is in the root cgroup)
echo $pid > /cgroup/CG1/CG2/tasks

Previously this operation would fail with -EPERM and would have to be
performed as two steps:
echo $pid > /cgroup/CG1/tasks
echo $pid > /cgroup/CG1/CG2/tasks

Also, the target cgroup no longer needs to be empty to move a task there.

Signed-off-by: Grzegorz Nosek
Acked-by: Serge Hallyn
Reviewed-by: Li Zefan
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Grzegorz Nosek
2009-04-03 10:04:53 +0800

28 Mar, 2009

2 commits

a3ec947c8 vfs: simple_set_mnt() should return void ... Browse Code »

simple_set_mnt() is defined as returning 'int' but always returns 0.
Callers assume simple_set_mnt() never fails and don't properly cleanup if
it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev()
should:

up_write(sb->s_unmount);
deactivate_super(sb);

if simple_set_mnt() fails.

Since simple_set_mnt() never fails, would be cleaner if it did not
return anything.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Serge Hallyn
Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Sukadev Bhattiprolu
2009-03-28 02:44:03 +0800
3ba13d179 constify dentry_operations: rest ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2009-03-28 02:44:03 +0800

19 Feb, 2009

1 commit

67e055d14 cgroups: fix possible use after free ... Browse Code »

In cgroup_kill_sb(), root is freed before sb is detached from the list, so
another sget() may find this sb and call cgroup_test_super(), which will
access the root that has been freed.

Reported-by: Al Viro
Signed-off-by: Li Zefan
Acked-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2009-02-19 07:37:54 +0800

12 Feb, 2009

1 commit

cfebe563b cgroups: fix lockdep subclasses overflow ... Browse Code »

I enabled all cgroup subsystems when compiling kernel, and then:
# mount -t cgroup -o net_cls xxx /mnt
# mkdir /mnt/0

This showed up immediately:
BUG: MAX_LOCKDEP_SUBCLASSES too low!
turning off the locking correctness validator.

It's caused by the cgroup hierarchy lock:
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
if (ss->root == root)
mutex_lock_nested(&ss->hierarchy_mutex, i);
}

Now we have 9 cgroup subsystems, and the above 'i' for net_cls is 8, but
MAX_LOCKDEP_SUBCLASSES is 8.

This patch uses different lockdep keys for different subsystems.

Signed-off-by: Li Zefan
Acked-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2009-02-12 06:25:36 +0800

30 Jan, 2009

4 commits

839ec5452 cgroup: fix root_count when mount fails due to busy subsystem ... Browse Code »

root_count was being incremented in cgroup_get_sb() after all error
checking was complete, but decremented in cgroup_kill_sb(), which can be
called on a superblock that we gave up on due to an error. This patch
changes cgroup_kill_sb() to only decrement root_count if the root was
previously linked into the list of roots.

Signed-off-by: Paul Menage
Tested-by: Serge Hallyn
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-01-30 10:04:45 +0800
804b3c28a cgroups: add cpu_relax() calls in css_tryget() and cgroup_clear_css_refs() ... Browse Code »

css_tryget() and cgroup_clear_css_refs() contain polling loops; these
loops should have cpu_relax calls in them to reduce cross-cache traffic.

Signed-off-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-01-30 10:04:45 +0800
1404f0656 cgroups: fix lock inconsistency in cgroup_clone() ... Browse Code »

I fixed a bug in cgroup_clone() in Linus' tree in commit 7b574b7
("cgroups: fix a race between cgroup_clone and umount") without noticing
there was a cleanup patch in -mm tree that should be rebased (now commit
104cbd5, "cgroups: use task_lock() for access tsk->cgroups safe in
cgroup_clone()"), thus resulted in lock inconsistency.

Signed-off-by: Li Zefan
Acked-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2009-01-30 10:04:45 +0800
baef99a08 cgroups: use hierarchy mutex in creation failure path ... Browse Code »

Now, cgrp->sibling is handled under hierarchy mutex.
error route should do so, too.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Li Zefan
Acked-by Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-01-30 10:04:43 +0800

09 Jan, 2009

2 commits

e7c5ec919 cgroups: add css_tryget() ... Browse Code »

Add css_tryget(), that obtains a counted reference on a CSS. It is used
in situations where the caller has a "weak" reference to the CSS, i.e.
one that does not protect the cgroup from removal via a reference count,
but would instead be cleaned up by a destroy() callback.

css_tryget() will return true on success, or false if the cgroup is being
removed.

This is similar to Kamezawa Hiroyuki's patch from a week or two ago, but
with the difference that in the event of css_tryget() racing with a
cgroup_rmdir(), css_tryget() will only return false if the cgroup really
does get removed.

This implementation is done by biasing css->refcnt, so that a refcnt of 1
means "releasable" and 0 means "released or releasing". In the event of a
race, css_tryget() distinguishes between "released" and "releasing" by
checking for the CSS_REMOVED flag in css->flags.

Signed-off-by: Paul Menage
Tested-by: KAMEZAWA Hiroyuki
Cc: Li Zefan
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-01-09 00:31:10 +0800
999cd8a45 cgroups: add a per-subsystem hierarchy_mutex ... Browse Code »

These patches introduce new locking/refcount support for cgroups to
reduce the need for subsystems to call cgroup_lock(). This will
ultimately allow the atomicity of cgroup_rmdir() (which was removed
recently) to be restored.

These three patches give:

1/3 - introduce a per-subsystem hierarchy_mutex which a subsystem can
use to prevent changes to its own cgroup tree

2/3 - use hierarchy_mutex in place of calling cgroup_lock() in the
memory controller

3/3 - introduce a css_tryget() function similar to the one recently
proposed by Kamezawa, but avoiding spurious refcount failures in
the event of a race between a css_tryget() and an unsuccessful
cgroup_rmdir()

Future patches will likely involve:

- using hierarchy mutex in place of cgroup_lock() in more subsystems
where appropriate

- restoring the atomicity of cgroup_rmdir() with respect to cgroup_create()

This patch:

Add a hierarchy_mutex to the cgroup_subsys object that protects changes to
the hierarchy observed by that subsystem. It is taken by the cgroup
subsystem (in addition to cgroup_mutex) for the following operations:

- linking a cgroup into that subsystem's cgroup tree
- unlinking a cgroup from that subsystem's cgroup tree
- moving the subsystem to/from a hierarchy (including across the
bind() callback)

Thus if the subsystem holds its own hierarchy_mutex, it can safely
traverse its own hierarchy.

Signed-off-by: Paul Menage
Tested-by: KAMEZAWA Hiroyuki
Cc: Li Zefan
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-01-09 00:31:10 +0800