Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

18 Nov, 2014

6 commits

eeecbd197 cgroup: implement cgroup_get_e_css() ... Browse Code »

Implement cgroup_get_e_css() which finds and gets the effective css
for the specified cgroup and subsystem combination. This function
always returns a valid pinned css. This will be used by cgroup
writeback support.

While at it, add comment to cgroup_e_css() to explain why that
function is different from cgroup_get_e_css() and has to test
cgrp->child_subsys_mask instead of cgroup_css(cgrp, ss).

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:52 +0800
56c807ba4 cgroup: add cgroup_subsys->css_e_css_changed() ... Browse Code »
2

Add a new cgroup_subsys operatoin ->css_e_css_changed(). This is
invoked if any of the effective csses seen from the css's cgroup may
have changed. This will be used to implement cgroup writeback
support.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:51 +0800
7d172cc89 cgroup: add cgroup_subsys->css_released() ... Browse Code »

Add a new cgroup subsys callback css_released(). This is called when
the reference count of the css (cgroup_subsys_state) reaches zero
before RCU scheduling free.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:51 +0800
db6e30534 cgroup: fix the async css offline wait logic in cgroup_subtree_control_write() ... Browse Code »

When a subsystem is offlined, its entry on @cgrp->subsys[] is cleared
asynchronously. If cgroup_subtree_control_write() is requested to
enable the subsystem again before the entry is cleared, it has to wait
for the previous offlining to finish and clear the @cgrp->subsys[]
entry before trying to enable the subsystem again.

This is currently done while verifying the input enable / disable
parameters. This used to be correct but f63070d350e3 ("cgroup: make
interface files visible iff enabled on cgroup->subtree_control")
breaks it. The commit is one of the commits implementing subsystem
dependency.

Through subsystem dependency, some subsystems may be enabled and
disabled implicitly in addition to the explicitly requested ones. The
actual subsystems to be enabled and disabled are determined during
@css_enable/disable calculation. The current offline wait logic skips
the ones which are already implicitly enabled and then waits for
subsystems in @enable; however, this misses the subsystems which may
be implicitly enabled through dependency from @enable. If such
implicitly subsystem hasn't yet finished offlining yet, the function
ends up trying to create a css when its @cgrp->subsys[] slot is
already occupied triggering BUG_ON() in init_and_link_css().

Fix it by moving the wait logic after @css_enable is calculated and
waiting for all the subsystems in @css_enable. This fixes the above
bug as the mask contains all subsystems which are to be enabled
including the ones enabled through dependencies.

Signed-off-by: Tejun Heo
Fixes: f63070d350e3 ("cgroup: make interface files visible iff enabled on cgroup->subtree_control")
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:51 +0800
755bf5ee8 cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write() ... Browse Code »

Make cgroup_subtree_control_write() first calculate new
subtree_control (new_sc), child_subsys_mask (new_ss) and
css_enable/disable masks before applying them to the cgroup. Also,
store the original subtree_control (old_sc) and child_subsys_mask
(old_ss) and use them to restore the orignal state after failure.

This patch shouldn't cause any behavior changes. This prepares for a
fix for a bug in the async css offline wait logic.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:50 +0800
0f060deb5 cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask() ... Browse Code »

cgroup_refresh_child_subsys_mask() calculates and updates the
effective @cgrp->child_subsys_maks according to the current
@cgrp->subtree_control. Separate out the calculation part into
cgroup_calc_child_subsys_mask(). This will be used to fix a bug in
the async css offline wait logic.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:50 +0800

10 Oct, 2014

2 commits

c798360cd Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu ... Browse Code »

Pull percpu updates from Tejun Heo:
"A lot of activities on percpu front. Notable changes are...

- percpu allocator now can take @gfp. If @gfp doesn't contain
GFP_KERNEL, it tries to allocate from what's already available to
the allocator and a work item tries to keep the reserve around
certain level so that these atomic allocations usually succeed.

This will replace the ad-hoc percpu memory pool used by
blk-throttle and also be used by the planned blkcg support for
writeback IOs.

Please note that I noticed a bug in how @gfp is interpreted while
preparing this pull request and applied the fix 6ae833c7fe0c
("percpu: fix how @gfp is interpreted by the percpu allocator")
just now.

- percpu_ref now uses longs for percpu and global counters instead of
ints. It leads to more sparse packing of the percpu counters on
64bit machines but the overhead should be negligible and this
allows using percpu_ref for refcnting pages and in-memory objects
directly.

- The switching between percpu and single counter modes of a
percpu_ref is made independent of putting the base ref and a
percpu_ref can now optionally be initialized in single or killed
mode. This allows avoiding percpu shutdown latency for cases where
the refcounted objects may be synchronously created and destroyed
in rapid succession with only a fraction of them reaching fully
operational status (SCSI probing does this when combined with
blk-mq support). It's also planned to be used to implement forced
single mode to detect underflow more timely for debugging.

There's a separate branch percpu/for-3.18-consistent-ops which cleans
up the duplicate percpu accessors. That branch causes a number of
conflicts with s390 and other trees. I'll send a separate pull
request w/ resolutions once other branches are merged"

* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
percpu: fix how @gfp is interpreted by the percpu allocator
blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
percpu_ref: add PERCPU_REF_INIT_* flags
percpu_ref: decouple switching to percpu mode and reinit
percpu_ref: decouple switching to atomic mode and killing
percpu_ref: add PCPU_REF_DEAD
percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
percpu_ref: replace pcpu_ prefix with percpu_
percpu_ref: minor code and comment updates
percpu_ref: relocate percpu_ref_reinit()
Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
Revert "percpu: free percpu allocation info for uniprocessor system"
percpu-refcount: make percpu_ref based on longs instead of ints
percpu-refcount: improve WARN messages
percpu: fix locking regression in the failure path of pcpu_alloc()
percpu-refcount: add @gfp to percpu_ref_init()
proportions: add @gfp to init functions
percpu_counter: add @gfp to percpu_counter_init()
percpu_counter: make percpu_counters_lock irq-safe
...

Linus Torvalds
2014-10-10 19:26:02 +0800
b211e9d7c Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup updates from Tejun Heo:
"Nothing too interesting. Just a handful of cleanup patches"

* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Revert "cgroup: remove redundant variable in cgroup_mount()"
cgroup: remove redundant variable in cgroup_mount()
cgroup: fix missing unlock in cgroup_release_agent()
cgroup: remove CGRP_RELEASABLE flag
perf/cgroup: Remove perf_put_cgroup()
cgroup: remove redundant check in cgroup_ino()
cpuset: simplify proc_cpuset_show()
cgroup: simplify proc_cgroup_show()
cgroup: use a per-cgroup work for release agent
cgroup: remove bogus comments
cgroup: remove redundant code in cgroup_rmdir()
cgroup: remove some useless forward declarations
cgroup: fix a typo in comment.

Linus Torvalds
2014-10-10 19:24:40 +0800

26 Sep, 2014

1 commit

e756c7b69 Revert "cgroup: remove redundant variable in cgroup_mount()" ... Browse Code »

This reverts commit 0c7bf3e8cab7900e17ce7f97104c39927d835469.

If there are child cgroups in the cgroupfs and then we umount it,
the superblock will be destroyed but the cgroup_root will be kept
around. When we mount it again, cgroup_mount() will find this
cgroup_root and allocate a new sb for it.

So with this commit we will be trapped in a dead loop in the case
described above, because kernfs_pin_sb() keeps returning NULL.

Currently I don't see how we can avoid using both pinned_sb and
new_sb, so just revert it.

Cc: Al Viro
Reported-by: Andrey Wagin
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-26 12:16:23 +0800

25 Sep, 2014

2 commits

2aad2a86f percpu_ref: add PERCPU_REF_INIT_* flags ... Browse Code »

With the recent addition of percpu_ref_reinit(), percpu_ref now can be
used as a persistent switch which can be turned on and off repeatedly
where turning off maps to killing the ref and waiting for it to drain;
however, there currently isn't a way to initialize a percpu_ref in its
off (killed and drained) state, which can be inconvenient for certain
persistent switch use cases.

Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic
selection of operation mode; however, currently a newly initialized
percpu_ref is always in percpu mode making it impossible to avoid the
latency overhead of switching to atomic mode.

This patch adds @flags to percpu_ref_init() and implements the
following flags.

* PERCPU_REF_INIT_ATOMIC : start ref in atomic mode
* PERCPU_REF_INIT_DEAD : start ref killed and drained

These flags should be able to serve the above two use cases.

v2: target_core_tpg.c conversion was missing. Fixed.

Signed-off-by: Tejun Heo
Reviewed-by: Kent Overstreet
Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Johannes Weiner

Tejun Heo
2014-09-25 01:31:50 +0800
d06efebf0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/… ... Browse Code »

…linux-block into for-3.18

This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a
kludge for SCSI blk-mq stall during probe") which implements
__percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The
commit reverted and patches to implement proper fix will be added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>

Tejun Heo
2014-09-25 01:00:21 +0800

21 Sep, 2014

2 commits

0c7bf3e8c cgroup: remove redundant variable in cgroup_mount() ... Browse Code »
13

Both pinned_sb and new_sb indicate if a new superblock is needed,
so we can just remove new_sb.

Note now we must check if kernfs_tryget_sb() returns NULL, because
when it returns NULL, kernfs_mount() may still re-use an existing
superblock, which is just allocated by another concurent mount.

Suggested-by: Tejun Heo
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-21 01:09:35 +0800
3e2cd91ab cgroup: fix missing unlock in cgroup_release_agent() ... Browse Code »

The patch 971ff4935538: "cgroup: use a per-cgroup work for release
agent" from Sep 18, 2014, leads to the following static checker
warning:

kernel/cgroup.c:5310 cgroup_release_agent()
warn: 'mutex:&cgroup_mutex' is sometimes locked here and sometimes unlocked.

Reported-by: Dan Carpenter
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-21 00:23:35 +0800

19 Sep, 2014

4 commits

a25eb52e8 cgroup: remove CGRP_RELEASABLE flag ... Browse Code »

We call put_css_set() after setting CGRP_RELEASABLE flag in
cgroup_task_migrate(), but in other places we call it without setting
the flag. I don't see the necessity of this flag.

Moreover once the flag is set, it will never be cleared, unless writing
to the notify_on_release control file, so it can be quite confusing
if we look at the output of debug.releasable.

# mount -t cgroup -o debug xxx /cgroup
# mkdir /cgroup/child
# cat /cgroup/child/debug.releasable
0 /cgroup/child/tasks
# cat /cgroup/child/debug.releasable
0
# echo $$ > /cgroup/tasks && echo $$ > /cgroup/child/tasks
# cat /proc/child/debug.releasable
1
Signed-off-by: Tejun Heo

Zefan Li
2014-09-19 21:29:32 +0800
006f4ac49 cgroup: simplify proc_cgroup_show() ... Browse Code »

Use the ONE macro instead of REG, and we can simplify proc_cgroup_show().

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-19 01:27:23 +0800
971ff4935 cgroup: use a per-cgroup work for release agent ... Browse Code »
13

Instead of using a global work to schedule release agent on removable
cgroups, we change to use a per-cgroup work to do this, which makes
the code much simpler.

v2: use a dedicated work instead of reusing css->destroy_work. (Tejun)

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-19 01:14:22 +0800
eb4aec84d cgroup: fix unbalanced locking ... Browse Code »
5

cgroup_pidlist_start() holds cgrp->pidlist_mutex and then calls
pidlist_array_load(), and cgroup_pidlist_stop() releases the mutex.

It is wrong that we release the mutex in the failure path in
pidlist_array_load(), because cgroup_pidlist_stop() will be called
no matter if cgroup_pidlist_start() returns errno or not.

Fixes: 4bac00d16a8760eae7205e41d2c246477d42a210
Cc: # 3.14+
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo
Acked-by: Cong Wang

Zefan Li
2014-09-19 00:32:52 +0800

18 Sep, 2014

4 commits

0c8fc2c12 cgroup: remove bogus comments ... Browse Code »

We never grab cgroup mutex in fork and exit paths no matter whether
notify_on_release is set or not.

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Li Zefan
2014-09-18 05:34:16 +0800
244bb9a63 cgroup: remove redundant code in cgroup_rmdir() ... Browse Code »

We no longer clear kn->priv in cgroup_rmdir(), so we don't need
to get an extra refcnt.

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Li Zefan
2014-09-18 05:34:15 +0800
6213daab2 cgroup: remove some useless forward declarations ... Browse Code »

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Li Zefan
2014-09-18 05:34:15 +0800
9253b279f Merge branch 'for-3.17-fixes' of ra.kernel.org:/pub/scm/linux/kernel/git/tj/cgroup into for-3.18 ... Browse Code »

Pull to receive a4189487da1b ("cgroup: delay the clearing of
cgrp->kn->priv") for the scheduled clean up patches.

Signed-off-by: Tejun Heo

Tejun Heo
2014-09-18 05:29:05 +0800

08 Sep, 2014

1 commit

a34375ef9 percpu-refcount: add @gfp to percpu_ref_init() ... Browse Code »

Percpu allocator now supports allocation mask. Add @gfp to
percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
with percpu_refs too.

This patch doesn't make any functional difference.

v2: blk-mq conversion was missing. Updated.

Signed-off-by: Tejun Heo
Cc: Kent Overstreet
Cc: Benjamin LaHaise
Cc: Li Zefan
Cc: Nicholas A. Bellinger
Cc: Jens Axboe

Tejun Heo
2014-09-08 08:51:30 +0800

05 Sep, 2014

2 commits

aa32362f0 cgroup: check cgroup liveliness before unbreaking kernfs ... Browse Code »

When cgroup_kn_lock_live() is called through some kernfs operation and
another thread is calling cgroup_rmdir(), we'll trigger the warning in
cgroup_get().

------------[ cut here ]------------
WARNING: CPU: 1 PID: 1228 at kernel/cgroup.c:1034 cgroup_get+0x89/0xa0()
...
Call Trace:
[] dump_stack+0x41/0x52
[] warn_slowpath_common+0x7f/0xa0
[] warn_slowpath_null+0x1d/0x20
[] cgroup_get+0x89/0xa0
[] cgroup_kn_lock_live+0x28/0x70
[] __cgroup_procs_write.isra.26+0x51/0x230
[] cgroup_tasks_write+0x12/0x20
[] cgroup_file_write+0x40/0x130
[] kernfs_fop_write+0xd1/0x160
[] vfs_write+0x98/0x1e0
[] SyS_write+0x4d/0xa0
[] sysenter_do_call+0x12/0x12
---[ end trace 6f2e0c38c2108a74 ]---

Fix this by calling css_tryget() instead of cgroup_get().

v2:
- move cgroup_tryget() right below cgroup_get() definition. (Tejun)

Cc: # 3.15+
Reported-by: Toralf Förster
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Li Zefan
2014-09-05 00:36:19 +0800
a4189487d cgroup: delay the clearing of cgrp->kn->priv ... Browse Code »
13

Run these two scripts concurrently:

for ((; ;))
{
mkdir /cgroup/sub
rmdir /cgroup/sub
}

for ((; ;))
{
echo $$ > /cgroup/sub/cgroup.procs
echo $$ > /cgroup/cgroup.procs
}

A kernel bug will be triggered:

BUG: unable to handle kernel NULL pointer dereference at 00000038
IP: [] cgroup_put+0x9/0x80
...
Call Trace:
[] cgroup_kn_unlock+0x39/0x50
[] cgroup_kn_lock_live+0x61/0x70
[] __cgroup_procs_write.isra.26+0x51/0x230
[] cgroup_tasks_write+0x12/0x20
[] cgroup_file_write+0x40/0x130
[] kernfs_fop_write+0xd1/0x160
[] vfs_write+0x98/0x1e0
[] SyS_write+0x4d/0xa0
[] sysenter_do_call+0x12/0x12

We clear cgrp->kn->priv in the end of cgroup_rmdir(), but another
concurrent thread can access kn->priv after the clearing.

We should move the clearing to css_release_work_fn(). At that time
no one is holding reference to the cgroup and no one can gain a new
reference to access it.

v2:
- move RCU_INIT_POINTER() into the else block. (Tejun)
- remove the cgroup_parent() check. (Tejun)
- update the comment in css_tryget_online_from_dir().

Cc: # 3.15+
Reported-by: Toralf Förster
Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Li Zefan
2014-09-05 00:36:18 +0800

25 Aug, 2014

1 commit

251f8c036 cgroup: fix a typo in comment. ... Browse Code »

There is no function named cgroup_enable_task_cg_links().
Instead, the correct function name in this comment should
be cgroup_enabled_task_cg_lists().

Signed-off-by: Dongsheng Yang
Signed-off-by: Tejun Heo

Dongsheng Yang
2014-08-25 22:49:29 +0800

23 Aug, 2014

1 commit

fa8137be6 cgroup: Display legacy cgroup files on default hierarchy ... Browse Code »

Kernel command line parameter cgroup__DEVEL__legacy_files_on_dfl forces
legacy cgroup files to show up on default hierarhcy if susbsystem does
not have any files defined for default hierarchy.

But this seems to be working only if legacy files are defined in
ss->legacy_cftypes. If one adds some cftypes later using
cgroup_add_legacy_cftypes(), these files don't show up on default
hierarchy. Update the function accordingly so that the dynamically
added legacy files also show up in the default hierarchy if the target
subsystem is also using the base legacy files for the default
hierarchy.

tj: Patch description and comment updates.

Signed-off-by: Vivek Goyal
Signed-off-by: Tejun Heo

Vivek Goyal
2014-08-23 01:20:40 +0800

18 Aug, 2014

1 commit

71b1fb5c4 cgroup: reject cgroup names with '\n' ... Browse Code »

/proc//cgroup contains one cgroup path on each line. If cgroup names are
allowed to contain "\n", applications cannot parse /proc//cgroup safely.

Signed-off-by: Alban Crequy
Signed-off-by: Tejun Heo
Cc: stable@vger.kernel.org

Alban Crequy
2014-08-18 22:18:57 +0800

05 Aug, 2014

2 commits

47dfe4037 Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup changes from Tejun Heo:
"Mostly changes to get the v2 interface ready. The core features are
mostly ready now and I think it's reasonable to expect to drop the
devel mask in one or two devel cycles at least for a subset of
controllers.

- cgroup added a controller dependency mechanism so that block cgroup
can depend on memory cgroup. This will be used to finally support
IO provisioning on the writeback traffic, which is currently being
implemented.

- The v2 interface now uses a separate table so that the interface
files for the new interface are explicitly declared in one place.
Each controller will explicitly review and add the files for the
new interface.

- cpuset is getting ready for the hierarchical behavior which is in
the similar style with other controllers so that an ancestor's
configuration change doesn't change the descendants' configurations
irreversibly and processes aren't silently migrated when a CPU or
node goes down.

All the changes are to the new interface and no behavior changed for
the multiple hierarchies"

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits)
cpuset: fix the WARN_ON() in update_nodemasks_hier()
cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test
cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core
cgroup: distinguish the default and legacy hierarchies when handling cftypes
cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()
cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes
cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[]
cpuset: export effective masks to userspace
cpuset: allow writing offlined masks to cpuset.cpus/mems
cpuset: enable onlined cpu/node in effective masks
cpuset: refactor cpuset_hotplug_update_tasks()
cpuset: make cs->{cpus, mems}_allowed as user-configured masks
cpuset: apply cs->effective_{cpus,mems}
cpuset: initialize top_cpuset's configured masks at mount
cpuset: use effective cpumask to build sched domains
cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty
cpuset: update cs->effective_{cpus, mems} when config changes
cpuset: update cpuset->effective_{cpus,mems} at hotplug
cpuset: add cs->effective_cpus and cs->effective_mems
cgroup: clean up sane_behavior handling
...

Linus Torvalds
2014-08-05 01:11:28 +0800
f2a84170e Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu ... Browse Code »

Pull percpu updates from Tejun Heo:

- Major reorganization of percpu header files which I think makes
things a lot more readable and logical than before.

- percpu-refcount is updated so that it requires explicit destruction
and can be reinitialized if necessary. This was pulled into the
block tree to replace the custom percpu refcnting implemented in
blk-mq.

- In the process, percpu and percpu-refcount got cleaned up a bit

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits)
percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
percpu-refcount: require percpu_ref to be exited explicitly
percpu-refcount: use unsigned long for pcpu_count pointer
percpu-refcount: add helpers for ->percpu_count accesses
percpu-refcount: one bit is enough for REF_STATUS
percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()
workqueue: stronger test in process_one_work()
workqueue: clear POOL_DISASSOCIATED in rebind_workers()
percpu: Use ALIGN macro instead of hand coding alignment calculation
percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations
percpu: preffity percpu header files
percpu: use raw_cpu_*() to define __this_cpu_*()
percpu: reorder macros in percpu header files
percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h
percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h
percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops
percpu: reorganize include/linux/percpu-defs.h
percpu: move accessors from include/linux/percpu.h to percpu-defs.h
percpu: include/asm-generic/percpu.h should contain only arch-overridable parts
percpu: introduce arch_raw_cpu_ptr()
...

Linus Torvalds
2014-08-05 01:09:27 +0800

15 Jul, 2014

6 commits

5de4fa13c cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test ... Browse Code »

cgrp_dfl_root_inhibit_ss_mask determines which subsystems are not
supported on the default hierarchy and is currently initialized
statically and just includes the debug subsystem. Now that there's
cgroup_subsys->dfl_files, we can easily tell which subsystems support
the default hierarchy or not.

Let's initialize cgrp_dfl_root_inhibit_ss_mask by testing whether
cgroup_subsys->dfl_files is NULL. After all, subsystems with NULL
->dfl_files aren't useable on the default hierarchy anyway.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2014-07-15 23:05:10 +0800
05ebb6e60 cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core ... Browse Code »

cgroup now distinguishes cftypes for the default and legacy
hierarchies more explicitly by using separate arrays and
CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE should be and are used only
inside cgroup core proper. Let's make it clear that the flags are
internal by prefixing them with double underscores.

CFTYPE_INSANE is renamed to __CFTYPE_NOT_ON_DFL for consistency. The
two flags are also collected and assigned bits >= 16 so that they
aren't mixed with the published flags.

v2: Convert the extra ones in cgroup_exit_cftypes() which are added by
revision to the previous patch.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2014-07-15 23:05:10 +0800
a8ddc8215 cgroup: distinguish the default and legacy hierarchies when handling cftypes ... Browse Code »

Until now, cftype arrays carried files for both the default and legacy
hierarchies and the files which needed to be used on only one of them
were flagged with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE. This
gets confusing very quickly and we may end up exposing interface files
to the default hierarchy without thinking it through.

This patch makes cgroup core provide separate sets of interfaces for
cftype handling so that the cftypes for the default and legacy
hierarchies are clearly distinguished. The previous two patches
renamed the existing ones so that they clearly indicate that they're
for the legacy hierarchies. This patch adds the interface for the
default hierarchy and apply them selectively depending on the
hierarchy type.

* cftypes added through cgroup_subsys->dfl_cftypes and
cgroup_add_dfl_cftypes() only show up on the default hierarchy.

* cftypes added through cgroup_subsys->legacy_cftypes and
cgroup_add_legacy_cftypes() only show up on the legacy hierarchies.

* cgroup_subsys->dfl_cftypes and ->legacy_cftypes can point to the
same array for the cases where the interface files are identical on
both types of hierarchies.

* This makes all the existing subsystem interface files legacy-only by
default and all subsystems will have no interface file created when
enabled on the default hierarchy. Each subsystem should explicitly
review and compose the interface for the default hierarchy.

* A boot param "cgroup__DEVEL__legacy_files_on_dfl" is added which
makes subsystems which haven't decided the interface files for the
default hierarchy to present the legacy files on the default
hierarchy so that its behavior on the default hierarchy can be
tested. As the awkward name suggests, this is for development only.

* memcg's CFTYPE_INSANE on "use_hierarchy" is noop now as the whole
array isn't used on the default hierarchy. The flag is removed.

v2: Updated documentation for cgroup__DEVEL__legacy_files_on_dfl.

v3: Clear CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE when cfts are removed
as suggested by Li.

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Acked-by: Li Zefan
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vivek Goyal
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Aristeu Rozanski
Cc: Aneesh Kumar K.V

Tejun Heo
2014-07-15 23:05:10 +0800
2cf669a58 cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() ... Browse Code »

Currently, cftypes added by cgroup_add_cftypes() are used for both the
unified default hierarchy and legacy ones and subsystems can mark each
file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to
appear only on one of them. This is quite hairy and error-prone.
Also, we may end up exposing interface files to the default hierarchy
without thinking it through.

cgroup_subsys will grow two separate cftype addition functions and
apply each only on the hierarchies of the matching type. This will
allow organizing cftypes in a lot clearer way and encourage subsystems
to scrutinize the interface which is being exposed in the new default
hierarchy.

In preparation, this patch adds cgroup_add_legacy_cftypes() which
currently is a simple wrapper around cgroup_add_cftypes() and replaces
all cgroup_add_cftypes() usages with it.

While at it, this patch drops a completely spurious return from
__hugetlb_cgroup_file_init().

This patch doesn't introduce any functional differences.

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Acked-by: Li Zefan
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Aneesh Kumar K.V

Tejun Heo
2014-07-15 23:05:09 +0800
5577964e6 cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes ... Browse Code »

Currently, cgroup_subsys->base_cftypes is used for both the unified
default hierarchy and legacy ones and subsystems can mark each file
with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
only on one of them. This is quite hairy and error-prone. Also, we
may end up exposing interface files to the default hierarchy without
thinking it through.

cgroup_subsys will grow two separate cftype arrays and apply each only
on the hierarchies of the matching type. This will allow organizing
cftypes in a lot clearer way and encourage subsystems to scrutinize
the interface which is being exposed in the new default hierarchy.

In preparation, this patch renames cgroup_subsys->base_cftypes to
cgroup_subsys->legacy_cftypes. This patch is pure rename.

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Acked-by: Li Zefan
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vivek Goyal
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Aristeu Rozanski
Cc: Aneesh Kumar K.V

Tejun Heo
2014-07-15 23:05:09 +0800
a14c6874b cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[] ... Browse Code »

Currently cgroup_base_files[] contains the cgroup core interface files
for both legacy and default hierarchies with each file tagged with
CFTYPE_INSANE and CFTYPE_ONLY_ON_DFL. This is difficult to read.

Let's separate it out to two separate tables, cgroup_dfl_base_files[]
and cgroup_legacy_base_files[], and use the appropriate one in
cgroup_mkdir() depending on the hierarchy type. This makes tagging
each file unnecessary.

This patch doesn't introduce any behavior changes.

v2: cgroup_dfl_base_files[] was missing the termination entry
triggering WARN in cgroup_init_cftypes() for 0day kernel testing
robot. Fixed.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Cc: Jet Chen

Tejun Heo
2014-07-15 23:05:09 +0800

09 Jul, 2014

5 commits

7b9a6ba56 cgroup: clean up sane_behavior handling ... Browse Code »

After the previous patch to remove sane_behavior support from
non-default hierarchies, CGRP_ROOT_SANE_BEHAVIOR is used only to
indicate the default hierarchy while parsing mount options. This
patch makes the following cleanups around it.

* Don't show it in the mount option. Eventually the default hierarchy
will be assigned a different filesystem type.

* As sane_behavior is no longer effective on non-default hierarchies
and the default hierarchy doesn't accept any mount options,
parse_cgroupfs_options() can consider sane_behavior mount option as
indicating the default hierarchy and fail if any other options are
specified with it. While at it, remove one of the double blank
lines in the function.

* cgroup_mount() can now simply test CGRP_ROOT_SANE_BEHAVIOR to tell
whether to mount the default hierarchy or not.

* As CGROUP_ROOT_SANE_BEHAVIOR's only role now is indicating whether
to select the default hierarchy or not during mount, it doesn't need
to be set in the default hierarchy itself. cgroup_init_early()
updated accordingly.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2014-07-09 22:08:08 +0800
aa6ec29be cgroup: remove sane_behavior support on non-default hierarchies ... Browse Code »

sane_behavior has been used as a development vehicle for the default
unified hierarchy. Now that the default hierarchy is in place, the
flag became redundant and confusing as its usage is allowed on all
hierarchies. There are gonna be either the default hierarchy or
legacy ones. Let's make that clear by removing sane_behavior support
on non-default hierarchies.

This patch replaces cgroup_sane_behavior() with cgroup_on_dfl(). The
comment on top of CGRP_ROOT_SANE_BEHAVIOR is moved to on top of
cgroup_on_dfl() with sane_behavior specific part dropped.

On the default and legacy hierarchies w/o sane_behavior, this
shouldn't cause any behavior differences.

Signed-off-by: Tejun Heo
Acked-by: Vivek Goyal
Acked-by: Li Zefan
Cc: Johannes Weiner
Cc: Michal Hocko

Tejun Heo
2014-07-09 22:08:08 +0800
c1d5d42ef cgroup: make interface file "cgroup.sane_behavior" legacy-only ... Browse Code »

"cgroup.sane_behavior" is added to help distinguishing whether
sane_behavior is in effect or not. We now have the default hierarchy
where the flag is always in effect and are planning to remove
supporting sane behavior on the legacy hierarchies making this file on
the default hierarchy rather pointless. Let's make it legacy only and
thus always zero.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2014-07-09 22:08:08 +0800
7450e90bb cgroup: remove CGRP_ROOT_OPTION_MASK ... Browse Code »

cgroup_root->flags only contains CGRP_ROOT_* flags and there's no
reason to mask the flags. Remove CGRP_ROOT_OPTION_MASK.

This doesn't cause any behavior differences.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2014-07-09 22:08:07 +0800
af0ba6789 cgroup: implement cgroup_subsys->depends_on ... Browse Code »

Currently, the blkio subsystem attributes all of writeback IOs to the
root. One of the issues is that there's no way to tell who originated
a writeback IO from block layer. Those IOs are usually issued
asynchronously from a task which didn't have anything to do with
actually generating the dirty pages. The memory subsystem, when
enabled, already keeps track of the ownership of each dirty page and
it's desirable for blkio to piggyback instead of adding its own
per-page tag.

blkio piggybacking on memory is an implementation detail which
preferably should be handled automatically without requiring explicit
userland action. To achieve that, this patch implements
cgroup_subsys->depends_on which contains the mask of subsystems which
should be enabled together when the subsystem is enabled.

The previous patches already implemented the support for enabled but
invisible subsystems and cgroup_subsys->depends_on can be easily
implemented by updating cgroup_refresh_child_subsys_mask() so that it
calculates cgroup->child_subsys_mask considering
cgroup_subsys->depends_on of the explicitly enabled subsystems.

Documentation/cgroups/unified-hierarchy.txt is updated to explain that
subsystems may not become immediately available after being unused
from userland and that dependency could be a factor in it. As
subsystems may already keep residual references, this doesn't
significantly change how subsystem rebinding can be used.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Johannes Weiner

Tejun Heo
2014-07-09 06:02:57 +0800