Eric Lee / smarc-fsl-linux-kernel

22 Sep, 2015

1 commit

d3b428f03 fs: create and use seq_show_option for escaping

commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream.

Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g. new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else. This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
of "sudo" is something more sneaky:

$ BASE="ovl"
$ MNT="$BASE/mnt"
$ LOW="$BASE/lower"
$ UP="$BASE/upper"
$ WORK="$BASE/work/ 0 0
none /proc fuse.pwn user_id=1000"
$ mkdir -p "$LOW" "$UP" "$WORK"
$ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
$ cat /proc/mounts
none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
none /proc fuse.pwn user_id=1000 0 0
$ fusermount -u /proc
$ cat /proc/mounts
cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed. Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.
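
The escaping idea can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not the kernel helper itself: the names escape_append() and show_option() are hypothetical, and the escaped character sets (",= \t\n\\" for names, ", \t\n\\" for values) are assumed here for the example.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string in dst, writing each byte found
 * in esc as a backslash plus three octal digits. Returns the new length,
 * or (size_t)-1 if dst (cap bytes, including the NUL) is too small.
 */
size_t escape_append(char *dst, size_t len, size_t cap,
                     const char *src, const char *esc)
{
    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;

        if (strchr(esc, c)) {
            if (len + 4 >= cap)
                return (size_t)-1;
            snprintf(dst + len, cap - len, "\\%03o", (unsigned int)c);
            len += 4;
        } else {
            if (len + 1 >= cap)
                return (size_t)-1;
            dst[len++] = (char)c;
            dst[len] = '\0';
        }
    }
    return len;
}

/* Append ",name" or ",name=value" to dst with both parts escaped. */
int show_option(char *dst, size_t cap, const char *name, const char *value)
{
    size_t len = strlen(dst);

    if (len + 1 >= cap)
        return -1;
    dst[len++] = ',';
    dst[len] = '\0';
    len = escape_append(dst, len, cap, name, ",= \t\n\\");
    if (len == (size_t)-1)
        return -1;
    if (value) {
        if (len + 1 >= cap)
            return -1;
        dst[len++] = '=';
        dst[len] = '\0';
        len = escape_append(dst, len, cap, value, ", \t\n\\");
        if (len == (size_t)-1)
            return -1;
    }
    return 0;
}
```

With this kind of escaping, the embedded newline in the workdir value from the example above would be emitted as \012, so the spoofed second line stays part of the first entry.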

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook
Acked-by: Serge Hallyn
Acked-by: Jan Kara
Acked-by: Paul Moore
Cc: J. R. Okajima
Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Kees Cook
2015-09-22 01:05:45 +0800

22 Jul, 2015

1 commit

28dd1f346 sysfs: Create mountpoints with sysfs_create_mount_point

commit f9bb48825a6b5d02f4cabcc78967c75db903dcdc upstream.

This allows for better documentation in the code and
it allows for a simpler and fully correct version of
fs_fully_visible to be written.

The mount points converted and their filesystems are:
/sys/hypervisor/s390/ s390_hypfs
/sys/kernel/config/ configfs
/sys/kernel/debug/ debugfs
/sys/firmware/efi/efivars/ efivarfs
/sys/fs/fuse/connections/ fusectl
/sys/fs/pstore/ pstore
/sys/kernel/tracing/ tracefs
/sys/fs/cgroup/ cgroup
/sys/kernel/security/ securityfs
/sys/fs/selinux/ selinuxfs
/sys/fs/smackfs/ smackfs

Acked-by: Greg Kroah-Hartman
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman

Eric W. Biederman
2015-07-22 01:10:01 +0800

16 Apr, 2015

2 commits

94ff212d0 cgroup: remove use of seq_printf return value

The seq_printf return value, because it's frequently misused,
will eventually be converted to void.

See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
seq_has_overflowed() and make public")

Signed-off-by: Joe Perches
Acked-by: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2015-04-16 07:35:25 +0800
adbe427b9 memcg: zap mem_cgroup_lookup()

mem_cgroup_lookup() is a wrapper around mem_cgroup_from_id(), which
checks that id != 0 before issuing the function call. Today, there is
no point in this additional check apart from optimization, because there
is no css with id 0 for css_from_id to find.

Signed-off-by: Vladimir Davydov
Acked-by: Michal Hocko
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Davydov
2015-04-16 07:35:16 +0800

03 Mar, 2015

2 commits

587945147 cgroup: Use kvfree in pidlist_free()

The wrapper already calls the appropriate free
function; use it instead of spinning our own.

Signed-off-by: Bandan Das
Acked-by: Zefan Li
Signed-off-by: Tejun Heo

Bandan Das
2015-03-03 21:47:25 +0800
295458e67 cgroup: call cgroup_subsys->bind on cgroup subsys initialization

Currently, we call cgroup_subsys->bind only on unmount, remount, and
when creating a new root on mount. Since the default hierarchy root is
created in cgroup_init, we will not call cgroup_subsys->bind if the
default hierarchy is freshly mounted. As a result, some controllers will
behave incorrectly (most notably, the "memory" controller will not
enable hierarchy support). Fix this by calling cgroup_subsys->bind right
after initializing a cgroup subsystem.

Signed-off-by: Vladimir Davydov
Signed-off-by: Tejun Heo

Vladimir Davydov
2015-03-03 01:11:01 +0800

14 Feb, 2015

1 commit

dfeb0750b kernfs: remove KERNFS_STATIC_NAME

When a new kernfs node is created, KERNFS_STATIC_NAME is used to avoid
making a separate copy of its name. It's currently only used for sysfs
attributes whose filenames are required to stay accessible and unchanged.
There are rare exceptions where these names are allocated and formatted
dynamically but for the vast majority of cases they're consts in the
rodata section.

Now that kernfs is converted to use kstrdup_const() and kfree_const(),
there's little point in keeping KERNFS_STATIC_NAME around. Remove it.

Signed-off-by: Tejun Heo
Cc: Andrzej Hajda
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2015-02-14 13:21:36 +0800

13 Feb, 2015

1 commit

01e586598 cgroup: release css->id after css_free

Currently, we release css->id in css_release_work_fn, right before calling
css_free callback, so that when css_free is called, the id may have
already been reused for a new cgroup.

I am going to use css->id to create unique names for per memcg kmem
caches. Since kmem caches are destroyed only on css_free, I need css->id
to be freed after css_free was called to avoid name clashes. This patch
therefore moves css->id removal to css_free_work_fn. To prevent
css_from_id from returning a pointer to a stale css, it makes
css_release_work_fn replace the css ptr at css_idr:css->id with NULL.
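
The two-step lifetime described above can be sketched with a toy id table. Everything here (struct obj, id_alloc(), obj_release(), obj_free(), the fixed-size table) is hypothetical and only illustrates the ordering; the kernel uses an IDR, not arrays like these.

```c
#include <stddef.h>

#define MAX_IDS 8

struct obj { int id; };

struct obj *id_table[MAX_IDS];      /* id -> object, NULL once released */
unsigned char id_used[MAX_IDS];     /* id still reserved? */

/* Allocate the lowest free id (starting at 1) and publish o under it. */
int id_alloc(struct obj *o)
{
    for (int i = 1; i < MAX_IDS; i++) {
        if (!id_used[i]) {
            id_used[i] = 1;
            id_table[i] = o;
            o->id = i;
            return i;
        }
    }
    return -1;
}

/* Lookup: NULL for out-of-range ids and for already released objects. */
struct obj *obj_from_id(int id)
{
    if (id <= 0 || id >= MAX_IDS)
        return NULL;
    return id_table[id];
}

/* Release step: hide the object from lookups but keep the id reserved. */
void obj_release(struct obj *o)
{
    id_table[o->id] = NULL;
}

/* Free step: only now may the id be handed out again. */
void obj_free(struct obj *o)
{
    id_used[o->id] = 0;
    o->id = 0;
}
```

Between obj_release() and obj_free() the id can neither be looked up nor reused, which is the window in which names derived from the id stay unique.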

Signed-off-by: Vladimir Davydov
Cc: Johannes Weiner
Cc: Michal Hocko
Acked-by: Tejun Heo
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Dave Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Davydov
2015-02-13 10:54:09 +0800

22 Jan, 2015

1 commit

3c606d35f cgroup: prevent mount hang due to memory controller lifetime

Since b2052564e66d ("mm: memcontrol: continue cache reclaim from
offlined groups"), re-mounting the memory controller after using it is
very likely to hang.

The cgroup core assumes that any remaining references after deleting a
cgroup are temporary in nature, and synchronously waits for them, but
the above-mentioned commit has left-over page cache pin its css until
it is reclaimed naturally. That being said, swap entries and charged
kernel memory have been doing the same indefinite pinning forever, the
bug is just more likely to trigger with left-over page cache.

Reparenting kernel memory is highly impractical, which leaves changing
the cgroup assumptions to reflect this: once a controller has been
mounted and used, it has internal state that is independent from mount
and cgroup lifetime. It can be unmounted and remounted, but it can't
be reconfigured during subsequent mounts.

Don't offline the controller root as long as there are any children,
dead or alive. A remount will no longer wait for these old references
to drain, it will simply mount the persistent controller state again.

Reported-by: "Suzuki K. Poulose"
Reported-by: Will Deacon
Signed-off-by: Johannes Weiner
Signed-off-by: Tejun Heo

Johannes Weiner
2015-01-22 23:26:43 +0800

18 Nov, 2014

6 commits

eeecbd197 cgroup: implement cgroup_get_e_css()

Implement cgroup_get_e_css() which finds and gets the effective css
for the specified cgroup and subsystem combination. This function
always returns a valid pinned css. This will be used by cgroup
writeback support.

While at it, add comment to cgroup_e_css() to explain why that
function is different from cgroup_get_e_css() and has to test
cgrp->child_subsys_mask instead of cgroup_css(cgrp, ss).

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:52 +0800
56c807ba4 cgroup: add cgroup_subsys->css_e_css_changed()

Add a new cgroup_subsys operation ->css_e_css_changed(). This is
invoked if any of the effective csses seen from the css's cgroup may
have changed. This will be used to implement cgroup writeback
support.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:51 +0800
7d172cc89 cgroup: add cgroup_subsys->css_released()

Add a new cgroup subsys callback css_released(). This is called when
the reference count of the css (cgroup_subsys_state) reaches zero
before RCU scheduling free.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:51 +0800
db6e30534 cgroup: fix the async css offline wait logic in cgroup_subtree_control_write()

When a subsystem is offlined, its entry on @cgrp->subsys[] is cleared
asynchronously. If cgroup_subtree_control_write() is requested to
enable the subsystem again before the entry is cleared, it has to wait
for the previous offlining to finish and clear the @cgrp->subsys[]
entry before trying to enable the subsystem again.

This is currently done while verifying the input enable / disable
parameters. This used to be correct but f63070d350e3 ("cgroup: make
interface files visible iff enabled on cgroup->subtree_control")
breaks it. The commit is one of the commits implementing subsystem
dependency.

Through subsystem dependency, some subsystems may be enabled and
disabled implicitly in addition to the explicitly requested ones. The
actual subsystems to be enabled and disabled are determined during
@css_enable/disable calculation. The current offline wait logic skips
the ones which are already implicitly enabled and then waits for
subsystems in @enable; however, this misses the subsystems which may
be implicitly enabled through dependency from @enable. If such an
implicitly enabled subsystem hasn't finished offlining yet, the function
ends up trying to create a css when its @cgrp->subsys[] slot is
already occupied triggering BUG_ON() in init_and_link_css().

Fix it by moving the wait logic after @css_enable is calculated and
waiting for all the subsystems in @css_enable. This fixes the above
bug as the mask contains all subsystems which are to be enabled
including the ones enabled through dependencies.

Signed-off-by: Tejun Heo
Fixes: f63070d350e3 ("cgroup: make interface files visible iff enabled on cgroup->subtree_control")
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:51 +0800
755bf5ee8 cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write()

Make cgroup_subtree_control_write() first calculate new
subtree_control (new_sc), child_subsys_mask (new_ss) and
css_enable/disable masks before applying them to the cgroup. Also,
store the original subtree_control (old_sc) and child_subsys_mask
(old_ss) and use them to restore the original state after failure.

This patch shouldn't cause any behavior changes. This prepares for a
fix for a bug in the async css offline wait logic.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:50 +0800
0f060deb5 cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask()

cgroup_refresh_child_subsys_mask() calculates and updates the
effective @cgrp->child_subsys_mask according to the current
@cgrp->subtree_control. Separate out the calculation part into
cgroup_calc_child_subsys_mask(). This will be used to fix a bug in
the async css offline wait logic.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:50 +0800

10 Oct, 2014

2 commits

c798360cd Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

Pull percpu updates from Tejun Heo:
"A lot of activities on percpu front. Notable changes are...

- percpu allocator now can take @gfp. If @gfp doesn't contain
GFP_KERNEL, it tries to allocate from what's already available to
the allocator and a work item tries to keep the reserve around
certain level so that these atomic allocations usually succeed.

This will replace the ad-hoc percpu memory pool used by
blk-throttle and also be used by the planned blkcg support for
writeback IOs.

Please note that I noticed a bug in how @gfp is interpreted while
preparing this pull request and applied the fix 6ae833c7fe0c
("percpu: fix how @gfp is interpreted by the percpu allocator")
just now.

- percpu_ref now uses longs for percpu and global counters instead of
ints. It leads to more sparse packing of the percpu counters on
64bit machines but the overhead should be negligible and this
allows using percpu_ref for refcnting pages and in-memory objects
directly.

- The switching between percpu and single counter modes of a
percpu_ref is made independent of putting the base ref and a
percpu_ref can now optionally be initialized in single or killed
mode. This allows avoiding percpu shutdown latency for cases where
the refcounted objects may be synchronously created and destroyed
in rapid succession with only a fraction of them reaching fully
operational status (SCSI probing does this when combined with
blk-mq support). It's also planned to be used to implement forced
single mode to detect underflow more timely for debugging.

There's a separate branch percpu/for-3.18-consistent-ops which cleans
up the duplicate percpu accessors. That branch causes a number of
conflicts with s390 and other trees. I'll send a separate pull
request w/ resolutions once other branches are merged"

* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
percpu: fix how @gfp is interpreted by the percpu allocator
blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
percpu_ref: add PERCPU_REF_INIT_* flags
percpu_ref: decouple switching to percpu mode and reinit
percpu_ref: decouple switching to atomic mode and killing
percpu_ref: add PCPU_REF_DEAD
percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
percpu_ref: replace pcpu_ prefix with percpu_
percpu_ref: minor code and comment updates
percpu_ref: relocate percpu_ref_reinit()
Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
Revert "percpu: free percpu allocation info for uniprocessor system"
percpu-refcount: make percpu_ref based on longs instead of ints
percpu-refcount: improve WARN messages
percpu: fix locking regression in the failure path of pcpu_alloc()
percpu-refcount: add @gfp to percpu_ref_init()
proportions: add @gfp to init functions
percpu_counter: add @gfp to percpu_counter_init()
percpu_counter: make percpu_counters_lock irq-safe
...

Linus Torvalds
2014-10-10 19:26:02 +0800
b211e9d7c Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:
"Nothing too interesting. Just a handful of cleanup patches"

* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Revert "cgroup: remove redundant variable in cgroup_mount()"
cgroup: remove redundant variable in cgroup_mount()
cgroup: fix missing unlock in cgroup_release_agent()
cgroup: remove CGRP_RELEASABLE flag
perf/cgroup: Remove perf_put_cgroup()
cgroup: remove redundant check in cgroup_ino()
cpuset: simplify proc_cpuset_show()
cgroup: simplify proc_cgroup_show()
cgroup: use a per-cgroup work for release agent
cgroup: remove bogus comments
cgroup: remove redundant code in cgroup_rmdir()
cgroup: remove some useless forward declarations
cgroup: fix a typo in comment.

Linus Torvalds
2014-10-10 19:24:40 +0800

26 Sep, 2014

1 commit

e756c7b69 Revert "cgroup: remove redundant variable in cgroup_mount()"

This reverts commit 0c7bf3e8cab7900e17ce7f97104c39927d835469.

If there are child cgroups in the cgroupfs and then we umount it,
the superblock will be destroyed but the cgroup_root will be kept
around. When we mount it again, cgroup_mount() will find this
cgroup_root and allocate a new sb for it.

So with this commit we will be trapped in a dead loop in the case
described above, because kernfs_pin_sb() keeps returning NULL.

Currently I don't see how we can avoid using both pinned_sb and
new_sb, so just revert it.

Cc: Al Viro
Reported-by: Andrey Wagin
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-26 12:16:23 +0800

25 Sep, 2014

2 commits

2aad2a86f percpu_ref: add PERCPU_REF_INIT_* flags

With the recent addition of percpu_ref_reinit(), percpu_ref now can be
used as a persistent switch which can be turned on and off repeatedly
where turning off maps to killing the ref and waiting for it to drain;
however, there currently isn't a way to initialize a percpu_ref in its
off (killed and drained) state, which can be inconvenient for certain
persistent switch use cases.

Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic
selection of operation mode; however, currently a newly initialized
percpu_ref is always in percpu mode making it impossible to avoid the
latency overhead of switching to atomic mode.

This patch adds @flags to percpu_ref_init() and implements the
following flags.

* PERCPU_REF_INIT_ATOMIC : start ref in atomic mode
* PERCPU_REF_INIT_DEAD : start ref killed and drained

These flags should be able to serve the above two use cases.
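
As a rough illustration of how such init flags could behave, here is a toy single-threaded model. It is not the percpu_ref implementation: struct toy_ref, toy_ref_init() and the TOY_REF_INIT_* names are hypothetical, and the choice that the dead flag also implies atomic mode with a zero count is an assumption made for this sketch.

```c
#include <stdbool.h>

/* Hypothetical flags mirroring the two described above. */
enum {
    TOY_REF_INIT_ATOMIC = 1 << 0,   /* start in atomic mode */
    TOY_REF_INIT_DEAD   = 1 << 1,   /* start killed and drained */
};

struct toy_ref {
    long count;     /* outstanding references */
    bool atomic;    /* true: single shared counter, no per-CPU fast path */
    bool dead;      /* true: new tryget attempts fail */
};

void toy_ref_init(struct toy_ref *ref, unsigned int flags)
{
    ref->dead = (flags & TOY_REF_INIT_DEAD) != 0;
    /* Assumption: a dead ref has nothing to gain from per-CPU mode. */
    ref->atomic = ref->dead || (flags & TOY_REF_INIT_ATOMIC) != 0;
    /* A live ref starts with the base reference; a drained one with none. */
    ref->count = ref->dead ? 0 : 1;
}

/* Take a reference unless the ref has been killed. */
bool toy_ref_tryget(struct toy_ref *ref)
{
    if (ref->dead)
        return false;
    ref->count++;
    return true;
}
```

A ref initialized dead behaves like a switch that starts in the off position: nothing can take a reference until it is explicitly revived.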

v2: target_core_tpg.c conversion was missing. Fixed.

Signed-off-by: Tejun Heo
Reviewed-by: Kent Overstreet
Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Johannes Weiner

Tejun Heo
2014-09-25 01:31:50 +0800
d06efebf0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18

This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a
kludge for SCSI blk-mq stall during probe") which implements
__percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The
commit will be reverted and patches implementing a proper fix will be added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>

Tejun Heo
2014-09-25 01:00:21 +0800

21 Sep, 2014

2 commits

0c7bf3e8c cgroup: remove redundant variable in cgroup_mount()

Both pinned_sb and new_sb indicate if a new superblock is needed,
so we can just remove new_sb.

Note now we must check if kernfs_tryget_sb() returns NULL, because
when it returns NULL, kernfs_mount() may still re-use an existing
superblock, which is just allocated by another concurrent mount.

Suggested-by: Tejun Heo
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-21 01:09:35 +0800
3e2cd91ab cgroup: fix missing unlock in cgroup_release_agent()

The patch 971ff4935538: "cgroup: use a per-cgroup work for release
agent" from Sep 18, 2014, leads to the following static checker
warning:

kernel/cgroup.c:5310 cgroup_release_agent()
warn: 'mutex:&cgroup_mutex' is sometimes locked here and sometimes unlocked.

Reported-by: Dan Carpenter
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-21 00:23:35 +0800

19 Sep, 2014

4 commits

a25eb52e8 cgroup: remove CGRP_RELEASABLE flag

We call put_css_set() after setting CGRP_RELEASABLE flag in
cgroup_task_migrate(), but in other places we call it without setting
the flag. I don't see the necessity of this flag.

Moreover once the flag is set, it will never be cleared, unless writing
to the notify_on_release control file, so it can be quite confusing
if we look at the output of debug.releasable.

# mount -t cgroup -o debug xxx /cgroup
# mkdir /cgroup/child
# cat /cgroup/child/debug.releasable
0
# echo $$ > /cgroup/child/tasks
# cat /cgroup/child/debug.releasable
0
# echo $$ > /cgroup/tasks && echo $$ > /cgroup/child/tasks
# cat /cgroup/child/debug.releasable
1

Signed-off-by: Tejun Heo

Zefan Li
2014-09-19 21:29:32 +0800
006f4ac49 cgroup: simplify proc_cgroup_show()

Use the ONE macro instead of REG, and we can simplify proc_cgroup_show().

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-19 01:27:23 +0800
971ff4935 cgroup: use a per-cgroup work for release agent

Instead of using a global work to schedule release agent on removable
cgroups, we change to use a per-cgroup work to do this, which makes
the code much simpler.

v2: use a dedicated work instead of reusing css->destroy_work. (Tejun)

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-19 01:14:22 +0800
eb4aec84d cgroup: fix unbalanced locking

cgroup_pidlist_start() holds cgrp->pidlist_mutex and then calls
pidlist_array_load(), and cgroup_pidlist_stop() releases the mutex.

It is wrong that we release the mutex in the failure path in
pidlist_array_load(), because cgroup_pidlist_stop() will be called
no matter if cgroup_pidlist_start() returns errno or not.
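
The locking rule can be sketched with a counter standing in for the mutex. All names here (struct toy_ctx, toy_start(), toy_stop(), toy_load(), toy_iterate()) are hypothetical; the point is only that when stop always runs after start, the helper called from start must leave the lock held even when it fails.

```c
#include <errno.h>

/* A depth counter stands in for the mutex so imbalance is observable. */
struct toy_ctx { int lock_depth; };

void toy_lock(struct toy_ctx *c)   { c->lock_depth++; }
void toy_unlock(struct toy_ctx *c) { c->lock_depth--; }

/*
 * Helper called with the lock held. On failure it returns the error and
 * leaves the lock alone, because the matching stop call will drop it.
 */
int toy_load(struct toy_ctx *c, int fail)
{
    (void)c;
    return fail ? -ENOMEM : 0;
}

int toy_start(struct toy_ctx *c, int fail)
{
    toy_lock(c);
    return toy_load(c, fail);
}

void toy_stop(struct toy_ctx *c)
{
    toy_unlock(c);
}

/* Caller contract: stop runs after start whether or not start failed. */
int toy_iterate(struct toy_ctx *c, int fail)
{
    int ret = toy_start(c, fail);

    toy_stop(c);
    return ret;
}
```

Had toy_load() unlocked on its failure path, lock_depth would drop to -1 after toy_stop(), which is the double unlock the fix removes.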

Fixes: 4bac00d16a8760eae7205e41d2c246477d42a210
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo
Acked-by: Cong Wang

Zefan Li
2014-09-19 00:32:52 +0800

18 Sep, 2014

4 commits

0c8fc2c12 cgroup: remove bogus comments

We never grab cgroup mutex in fork and exit paths no matter whether
notify_on_release is set or not.

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Li Zefan
2014-09-18 05:34:16 +0800
244bb9a63 cgroup: remove redundant code in cgroup_rmdir()

We no longer clear kn->priv in cgroup_rmdir(), so we don't need
to get an extra refcnt.

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Li Zefan
2014-09-18 05:34:15 +0800
6213daab2 cgroup: remove some useless forward declarations

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Li Zefan
2014-09-18 05:34:15 +0800
9253b279f Merge branch 'for-3.17-fixes' of ra.kernel.org:/pub/scm/linux/kernel/git/tj/cgroup into for-3.18

Pull to receive a4189487da1b ("cgroup: delay the clearing of
cgrp->kn->priv") for the scheduled clean up patches.

Signed-off-by: Tejun Heo

Tejun Heo
2014-09-18 05:29:05 +0800

08 Sep, 2014

1 commit

a34375ef9 percpu-refcount: add @gfp to percpu_ref_init()

Percpu allocator now supports allocation mask. Add @gfp to
percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
with percpu_refs too.

This patch doesn't make any functional difference.

v2: blk-mq conversion was missing. Updated.

Signed-off-by: Tejun Heo
Cc: Kent Overstreet
Cc: Benjamin LaHaise
Cc: Li Zefan
Cc: Nicholas A. Bellinger
Cc: Jens Axboe

Tejun Heo
2014-09-08 08:51:30 +0800

05 Sep, 2014

2 commits

aa32362f0 cgroup: check cgroup liveliness before unbreaking kernfs

When cgroup_kn_lock_live() is called through some kernfs operation and
another thread is calling cgroup_rmdir(), we'll trigger the warning in
cgroup_get().

------------[ cut here ]------------
WARNING: CPU: 1 PID: 1228 at kernel/cgroup.c:1034 cgroup_get+0x89/0xa0()
...
Call Trace:
[] dump_stack+0x41/0x52
[] warn_slowpath_common+0x7f/0xa0
[] warn_slowpath_null+0x1d/0x20
[] cgroup_get+0x89/0xa0
[] cgroup_kn_lock_live+0x28/0x70
[] __cgroup_procs_write.isra.26+0x51/0x230
[] cgroup_tasks_write+0x12/0x20
[] cgroup_file_write+0x40/0x130
[] kernfs_fop_write+0xd1/0x160
[] vfs_write+0x98/0x1e0
[] SyS_write+0x4d/0xa0
[] sysenter_do_call+0x12/0x12
---[ end trace 6f2e0c38c2108a74 ]---

Fix this by calling css_tryget() instead of cgroup_get().

v2:
- move cgroup_tryget() right below cgroup_get() definition. (Tejun)

Cc: stable@vger.kernel.org # 3.15+
Reported-by: Toralf Förster
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Li Zefan
2014-09-05 00:36:19 +0800
a4189487d cgroup: delay the clearing of cgrp->kn->priv

Run these two scripts concurrently:

for ((; ;))
{
mkdir /cgroup/sub
rmdir /cgroup/sub
}

for ((; ;))
{
echo $$ > /cgroup/sub/cgroup.procs
echo $$ > /cgroup/cgroup.procs
}

A kernel bug will be triggered:

BUG: unable to handle kernel NULL pointer dereference at 00000038
IP: [] cgroup_put+0x9/0x80
...
Call Trace:
[] cgroup_kn_unlock+0x39/0x50
[] cgroup_kn_lock_live+0x61/0x70
[] __cgroup_procs_write.isra.26+0x51/0x230
[] cgroup_tasks_write+0x12/0x20
[] cgroup_file_write+0x40/0x130
[] kernfs_fop_write+0xd1/0x160
[] vfs_write+0x98/0x1e0
[] SyS_write+0x4d/0xa0
[] sysenter_do_call+0x12/0x12

We clear cgrp->kn->priv in the end of cgroup_rmdir(), but another
concurrent thread can access kn->priv after the clearing.

We should move the clearing to css_release_work_fn(). At that time
no one is holding reference to the cgroup and no one can gain a new
reference to access it.

v2:
- move RCU_INIT_POINTER() into the else block. (Tejun)
- remove the cgroup_parent() check. (Tejun)
- update the comment in css_tryget_online_from_dir().

Cc: stable@vger.kernel.org # 3.15+
Reported-by: Toralf Förster
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Li Zefan
2014-09-05 00:36:18 +0800

25 Aug, 2014

1 commit

251f8c036 cgroup: fix a typo in comment.

There is no function named cgroup_enable_task_cg_links().
Instead, the correct function name in this comment should
be cgroup_enabled_task_cg_lists().

Signed-off-by: Dongsheng Yang
Signed-off-by: Tejun Heo

Dongsheng Yang
2014-08-25 22:49:29 +0800

23 Aug, 2014

1 commit

fa8137be6 cgroup: Display legacy cgroup files on default hierarchy

Kernel command line parameter cgroup__DEVEL__legacy_files_on_dfl forces
legacy cgroup files to show up on default hierarchy if a subsystem does
not have any files defined for default hierarchy.

But this seems to be working only if legacy files are defined in
ss->legacy_cftypes. If one adds some cftypes later using
cgroup_add_legacy_cftypes(), these files don't show up on default
hierarchy. Update the function accordingly so that the dynamically
added legacy files also show up in the default hierarchy if the target
subsystem is also using the base legacy files for the default
hierarchy.

tj: Patch description and comment updates.

Signed-off-by: Vivek Goyal
Signed-off-by: Tejun Heo

Vivek Goyal
2014-08-23 01:20:40 +0800

18 Aug, 2014

1 commit

71b1fb5c4 cgroup: reject cgroup names with '\n'

/proc/<pid>/cgroup contains one cgroup path on each line. If cgroup names are
allowed to contain "\n", applications cannot parse /proc/<pid>/cgroup safely.
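
A minimal sketch of such a check, assuming a hypothetical helper name toy_check_cgroup_name() and -EINVAL as the rejection code (neither is taken from the patch text above):

```c
#include <errno.h>
#include <string.h>

/*
 * Reject names that would break line-oriented output: any name containing
 * a newline is refused before the directory is created or renamed.
 */
int toy_check_cgroup_name(const char *name)
{
    if (strchr(name, '\n'))
        return -EINVAL;
    return 0;
}
```

Rejecting the name at creation time keeps every reader of the per-process cgroup file simple, since no escaping has to be undone when parsing.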

Signed-off-by: Alban Crequy
Signed-off-by: Tejun Heo
Cc: stable@vger.kernel.org

Alban Crequy
2014-08-18 22:18:57 +0800

05 Aug, 2014

2 commits

47dfe4037 Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup changes from Tejun Heo:
"Mostly changes to get the v2 interface ready. The core features are
mostly ready now and I think it's reasonable to expect to drop the
devel mask in one or two devel cycles at least for a subset of
controllers.

- cgroup added a controller dependency mechanism so that block cgroup
can depend on memory cgroup. This will be used to finally support
IO provisioning on the writeback traffic, which is currently being
implemented.

- The v2 interface now uses a separate table so that the interface
files for the new interface are explicitly declared in one place.
Each controller will explicitly review and add the files for the
new interface.

- cpuset is getting ready for the hierarchical behavior which is in
the similar style with other controllers so that an ancestor's
configuration change doesn't change the descendants' configurations
irreversibly and processes aren't silently migrated when a CPU or
node goes down.

All the changes are to the new interface and no behavior changed for
the multiple hierarchies"

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits)
cpuset: fix the WARN_ON() in update_nodemasks_hier()
cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test
cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core
cgroup: distinguish the default and legacy hierarchies when handling cftypes
cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()
cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes
cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[]
cpuset: export effective masks to userspace
cpuset: allow writing offlined masks to cpuset.cpus/mems
cpuset: enable onlined cpu/node in effective masks
cpuset: refactor cpuset_hotplug_update_tasks()
cpuset: make cs->{cpus, mems}_allowed as user-configured masks
cpuset: apply cs->effective_{cpus,mems}
cpuset: initialize top_cpuset's configured masks at mount
cpuset: use effective cpumask to build sched domains
cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty
cpuset: update cs->effective_{cpus, mems} when config changes
cpuset: update cpuset->effective_{cpus,mems} at hotplug
cpuset: add cs->effective_cpus and cs->effective_mems
cgroup: clean up sane_behavior handling
...

Linus Torvalds
2014-08-05 01:11:28 +0800
f2a84170e Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

Pull percpu updates from Tejun Heo:

- Major reorganization of percpu header files which I think makes
things a lot more readable and logical than before.

- percpu-refcount is updated so that it requires explicit destruction
and can be reinitialized if necessary. This was pulled into the
block tree to replace the custom percpu refcnting implemented in
blk-mq.

- In the process, percpu and percpu-refcount got cleaned up a bit

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits)
percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
percpu-refcount: require percpu_ref to be exited explicitly
percpu-refcount: use unsigned long for pcpu_count pointer
percpu-refcount: add helpers for ->percpu_count accesses
percpu-refcount: one bit is enough for REF_STATUS
percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()
workqueue: stronger test in process_one_work()
workqueue: clear POOL_DISASSOCIATED in rebind_workers()
percpu: Use ALIGN macro instead of hand coding alignment calculation
percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations
percpu: preffity percpu header files
percpu: use raw_cpu_*() to define __this_cpu_*()
percpu: reorder macros in percpu header files
percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h
percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h
percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops
percpu: reorganize include/linux/percpu-defs.h
percpu: move accessors from include/linux/percpu.h to percpu-defs.h
percpu: include/asm-generic/percpu.h should contain only arch-overridable parts
percpu: introduce arch_raw_cpu_ptr()
...

Linus Torvalds
2014-08-05 01:09:27 +0800

15 Jul, 2014

2 commits

5de4fa13c cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test

cgrp_dfl_root_inhibit_ss_mask determines which subsystems are not
supported on the default hierarchy and is currently initialized
statically and just includes the debug subsystem. Now that there's
cgroup_subsys->dfl_files, we can easily tell which subsystems support
the default hierarchy or not.

Let's initialize cgrp_dfl_root_inhibit_ss_mask by testing whether
cgroup_subsys->dfl_files is NULL. After all, subsystems with NULL
->dfl_files aren't useable on the default hierarchy anyway.
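
The mask computation described here can be sketched as a simple loop. struct toy_subsys and toy_calc_inhibit_mask() are hypothetical stand-ins; only the idea of setting a subsystem's bit when its dfl_files pointer is NULL comes from the text above.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for a subsystem descriptor. */
struct toy_subsys {
    const char *name;
    const void *dfl_files;  /* NULL: no files for the default hierarchy */
};

/*
 * Build the inhibit mask at init time: bit i is set when subsystem i has
 * no default-hierarchy files and so cannot be used there.
 */
unsigned long toy_calc_inhibit_mask(const struct toy_subsys *ss, size_t n)
{
    unsigned long mask = 0;

    for (size_t i = 0; i < n; i++) {
        if (!ss[i].dfl_files)
            mask |= 1UL << i;
    }
    return mask;
}
```

Deriving the mask this way means a subsystem leaves the inhibited set as soon as it gains default-hierarchy files, with no static list to keep in sync.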

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2014-07-15 23:05:10 +0800
05ebb6e60 cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core

cgroup now distinguishes cftypes for the default and legacy
hierarchies more explicitly by using separate arrays and
CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE should be and are used only
inside cgroup core proper. Let's make it clear that the flags are
internal by prefixing them with double underscores.

CFTYPE_INSANE is renamed to __CFTYPE_NOT_ON_DFL for consistency. The
two flags are also collected and assigned bits >= 16 so that they
aren't mixed with the published flags.

v2: Convert the extra ones in cgroup_exit_cftypes() which are added by
revision to the previous patch.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2014-07-15 23:05:10 +0800