24 Sep, 2013

1 commit

  • The only user of css_id was memcg, and it has been converted to use
    cgroup->id, so kill css_id.

    Signed-off-by: Li Zefan
    Reviewed-by: Michal Hocko
    Acked-by: Tejun Heo
    Signed-off-by: Tejun Heo

    Li Zefan
     

27 Aug, 2013

2 commits

  • When cgroup files are created, cgroup core automatically prepends the
    name of the subsystem as prefix. This patch adds CFTYPE_NO_ which
    disables the automatic prefix. This is to work around historical
    baggage and shouldn't be used for new files.

    This will be used to move "cgroup.event_control" from cgroup core to
    memcg.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Kirill A. Shutemov
    Cc: Glauber Costa

    Tejun Heo
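A minimal user-space sketch of the naming rule described in this commit. The flag DEMO_CFTYPE_NO_PREFIX, struct demo_cftype and demo_cft_filename() are hypothetical stand-ins, not kernel code. A file owned by a subsystem is named "<subsystem>.<name>" unless the opt-out flag is set, and a file without a subsystem keeps its name as-is.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag modeled on the commit's "disable the automatic prefix" flag. */
#define DEMO_CFTYPE_NO_PREFIX (1u << 0)

struct demo_cftype {
    const char *name;   /* base file name, e.g. "usage_in_bytes" */
    unsigned int flags;
};

/*
 * Build the cgroupfs file name for @cft into @buf. When the file belongs
 * to a subsystem (ss_name != NULL), "<ss_name>." is prepended unless
 * DEMO_CFTYPE_NO_PREFIX is set. Returns what snprintf() returns.
 */
static int demo_cft_filename(const char *ss_name, const struct demo_cftype *cft,
                             char *buf, size_t len)
{
    assert(cft && cft->name);
    if (ss_name && !(cft->flags & DEMO_CFTYPE_NO_PREFIX))
        return snprintf(buf, len, "%s.%s", ss_name, cft->name);
    return snprintf(buf, len, "%s", cft->name);
}
```

With the flag set, a file registered by the memory subsystem can keep the legacy name "cgroup.event_control" instead of becoming "memory.cgroup.event_control".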
     
  • cgroup_css_from_dir() will grow another user. In preparation, make
    the following changes.

    * All css functions are prefixed with just "css_", so rename it to
    css_from_dir().

    * Take dentry * instead of file * as dentry is what ultimately
    identifies a cgroup and file may not always be available. Note that
    the function now checks whether @dentry->d_inode is NULL as the
    caller now may specify a negative dentry.

    * Make it take cgroup_subsys * instead of integer subsys_id. This
    simplifies the function and allows specifying no subsystem for
    cgroup->dummy_css.

    * Make the return section a bit less verbose.

    This patch doesn't introduce any behavior changes.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Kirill A. Shutemov
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar

    Tejun Heo
     

19 Aug, 2013

1 commit

  • Now we want cgroup core to always provide the css to use to the
    subsystems, so change this API to css_from_id().

    Uninline css_from_id(), because it's getting bigger and cgroup_css()
    has been unexported.

    While at it, remove the #ifdef, and shuffle the order of the args.

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

14 Aug, 2013

3 commits

  • With the planned unified hierarchy, individual css's will be created
    and destroyed dynamically across the lifetime of a cgroup. To enable
    such usages, css destruction is being decoupled from cgroup
    destruction. Most of the destruction path has been decoupled but the
    actual free of css still depends on cgroup free path.

    When all css refs are drained, css_release() kicks off
    css_free_work_fn() which puts the cgroup. When the cgroup refcnt
    reaches zero, cgroup_diput() is invoked which in turn schedules RCU
    free of the cgroup. After a grace period, all css's are freed along
    with the cgroup itself.

    This patch moves the RCU grace period and css freeing from cgroup
    release path to css release path. css_release(), instead of kicking
    off css_free_work_fn() directly, schedules RCU callback
    css_free_rcu_fn() which in turn kicks off css_free_work_fn() after a
    RCU grace period. css_free_work_fn() is updated to free the css
    directly.

    The five-way punting - percpu ref kill confirmation, a work item,
    percpu ref release, RCU grace period, and again a work item - is quite
    hairy but the work items are there only to provide process context and
    the actual sequence is kill confirm -> release -> RCU free, which
    isn't simple but not too crazy.

    This removes cgroup_css() usage after offline_css(), allowing
    cgroup->subsys[] to be cleared from offline_css(), which makes it
    consistent with
    online_css() and brings it closer to proper lifetime management for
    individual css's.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • Currently, css (cgroup_subsys_state) lifetime is tied to that of the
    associated cgroup. css's are created when the associated cgroup is
    created and destroyed when it gets destroyed. Also, individual css's
    aren't RCU protected but the whole cgroup is. With the planned
    unified hierarchy, css's will need to be dynamically created and
    destroyed within the lifetime of a cgroup.

    To enable such usages, this patch decouples css destruction from
    cgroup destruction - offline_css() invocation and the final css_put()
    are moved from cgroup_destroy_css_killed() to css_killed_work_fn().
    Now each css is individually offlined and put as its reference count
    is killed instead of waiting for all css's attached to the cgroup to
    finish refcnt killing and then proceeding to offlining and putting
    them together.

    While this changes the order of destruction operations, the changes
    shouldn't be noticeable to cgroup subsystems or userland.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • Currently, css (cgroup_subsys_state) lifetime is tied to that of the
    associated cgroup. With the planned unified hierarchy, css's will be
    dynamically created and destroyed within the lifetime of a cgroup. To
    enable such usages, css's will be individually RCU protected instead
    of being tied to the cgroup.

    cgroup->css_kill_cnt is used during cgroup destruction to wait for css
    reference counts to be killed; however, this model doesn't work once css
    lifetimes are managed separately from cgroup's. This patch replaces
    it with cgroup->nr_css, which is a cgroup_mutex-protected integer
    counting the number of attached css's. The count is incremented from
    online_css() and decremented after refcnt kill is confirmed. If the
    count reaches zero and the cgroup is marked dead, the second stage of
    cgroup destruction is kicked off. If a cgroup doesn't have any css
    attached at the time of rmdir, cgroup_destroy_locked() now invokes the
    second stage directly as no css kill confirmation would happen.

    cgroup_offline_fn() - the second step of cgroup destruction - is
    renamed to cgroup_destroy_css_killed() and now expects to be called
    with cgroup_mutex held.

    While this patch changes how css destruction is punted to work items,
    it shouldn't change any visible behavior.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
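The nr_css bookkeeping can be modeled roughly as follows. This is a single-threaded sketch under stated assumptions: struct demo_cgroup and the demo_* functions are hypothetical, locking (cgroup_mutex in the kernel) and the work-item punting are omitted, and demo_destroy_css_killed() only records that the second stage ran.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not the kernel's struct cgroup. */
struct demo_cgroup {
    int nr_css;             /* number of attached css's */
    bool dead;              /* set by the rmdir path */
    bool second_stage_done; /* records that the second stage ran */
};

/* Stand-in for the second destruction stage. */
static void demo_destroy_css_killed(struct demo_cgroup *cgrp)
{
    assert(cgrp->dead && cgrp->nr_css == 0);
    cgrp->second_stage_done = true;
}

/* Like online_css(): one more css is attached. */
static void demo_online_css(struct demo_cgroup *cgrp)
{
    cgrp->nr_css++;
}

/* Called once the refcnt kill of one css has been confirmed. */
static void demo_css_kill_confirmed(struct demo_cgroup *cgrp)
{
    assert(cgrp->nr_css > 0);
    if (--cgrp->nr_css == 0 && cgrp->dead)
        demo_destroy_css_killed(cgrp);
}

/* rmdir path: mark dead; with no css attached, no confirmation will
 * ever arrive, so run the second stage directly. */
static void demo_destroy_locked(struct demo_cgroup *cgrp)
{
    cgrp->dead = true;
    if (cgrp->nr_css == 0)
        demo_destroy_css_killed(cgrp);
}
```

The second stage runs exactly once: either from the last kill confirmation or, for a cgroup with nothing attached, straight from the rmdir path.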
     

13 Aug, 2013

3 commits

  • For the planned unified hierarchy, each css (cgroup_subsys_state) will
    be RCU protected so that it can be created and destroyed individually
    while allowing RCU accesses. Previous changes ensured that all
    cgroup->subsys[] accesses use the cgroup_css() accessor. This patch
    adds the __rcu modifier to cgroup->subsys[], adds a matching RCU
    dereference in cgroup_css() and converts all assignments to either
    rcu_assign_pointer() or RCU_INIT_POINTER().

    This change prepares for the actual RCUfication of css's and doesn't
    introduce any visible behavior change. The conversion is verified
    with sparse and all accesses are properly RCU annotated.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • With the planned unified hierarchy, css's (cgroup_subsys_state) will
    be RCU protected and allowed to be attached and detached dynamically
    over the course of a cgroup's lifetime. This means that css's will
    stay accessible after being detached from their cgroup - the matching
    pointer in cgroup->subsys[] cleared - for ref draining and RCU grace
    period.

    cgroup core still wants to guarantee that the parent css is never
    destroyed before its children and css_parent() always returns the
    parent regardless of the state of the child css as long as it's
    accessible.

    This patch makes css's hold onto their parents and adds css->parent so
    that the parent css is never destroyed before its children and can be
    determined without consulting the cgroups.

    cgroup->dummy_css is also updated to point to the parent dummy_css;
    however, it doesn't need to worry about object lifetime as the parent
    cgroup is already pinned by the child.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
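A rough user-space model of parents being pinned by their children, assuming a plain integer refcount in place of the kernel's percpu_ref and no RCU. struct demo_css and the demo_* helpers are hypothetical. Dropping the last reference on a child frees it and then releases its pin on the parent, so a parent can never be freed first; the stored parent pointer also makes a css_parent()-style lookup trivial.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical css model with a plain (non-percpu) refcount. */
struct demo_css {
    struct demo_css *parent; /* NULL at the root */
    int refcnt;
    int *freed_counter;      /* test hook: bumped when a css is freed */
};

static struct demo_css *demo_css_create(struct demo_css *parent, int *freed_counter)
{
    struct demo_css *css = calloc(1, sizeof(*css));

    if (!css)
        return NULL;
    css->parent = parent;
    css->refcnt = 1;                 /* base reference owned by the creator */
    css->freed_counter = freed_counter;
    if (parent)
        parent->refcnt++;            /* the child pins its parent */
    return css;
}

/* Drop one reference. Freeing a css also drops its pin on the parent;
 * the loop walks upward instead of recursing. */
static void demo_css_put(struct demo_css *css)
{
    while (css) {
        struct demo_css *parent = css->parent;

        assert(css->refcnt > 0);
        if (--css->refcnt > 0)
            return;
        (*css->freed_counter)++;
        free(css);
        css = parent;
    }
}

/* With the parent stored in the css, the lookup needs no cgroup. */
static struct demo_css *demo_css_parent(const struct demo_css *css)
{
    return css->parent;
}
```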
     
  • css (cgroup_subsys_state) will become RCU protected and there will be
    two stages which require punting to a work item during release. To
    prepare for using the work item multiple times, rename
    css->dput_work to css->destroy_work and css_dput_fn() to
    css_free_work_fn() and move work item initialization from css init to
    right before the actual usage.

    This reorganization doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     

09 Aug, 2013

18 commits

  • Previously, all css descendant iterators didn't include the origin
    (root of subtree) css in the iteration. The reasons were maintaining
    consistency with css_for_each_child() and that at the time of
    introduction more use cases needed skipping the origin anyway;
    however, given that css_is_descendant() considers self to be a
    descendant, omitting the origin css has become more confusing and
    looking at the accumulated use cases rather clearly indicates that
    including origin would result in simpler code overall.

    While this is a change which can easily lead to subtle bugs, cgroup
    API including the iterators has recently gone through major
    restructuring and no out-of-tree changes will be applicable without
    adjustments making this a relatively acceptable opportunity for this
    type of change.

    The conversions are mostly straight-forward. If the iteration block
    had explicit origin handling before or after, it's moved inside the
    iteration. If not, "if (pos == origin) continue;" is added. Some
    conversions add extra reference get/put around origin handling by
    consolidating origin handling and the rest. While the extra ref
    operations aren't strictly necessary, this shouldn't cause any
    noticeable difference.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Acked-by: Michal Hocko
    Cc: Jens Axboe
    Cc: Matt Helsley
    Cc: Johannes Weiner
    Cc: Balbir Singh

    Tejun Heo
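The new iteration semantics can be sketched with a pre-order walk over a simple first-child/next-sibling tree. The node type, demo_next_descendant_pre() and the demo_for_each_descendant_pre() macro are hypothetical and ignore RCU and refcounting. The first call returns the origin itself, and a caller that wants the old behavior adds the "if (pos == origin) continue;" check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node standing in for a css. */
struct demo_node {
    int id;
    struct demo_node *parent;
    struct demo_node *first_child;
    struct demo_node *next_sibling;
};

/* Pre-order step within the subtree rooted at @root. pos == NULL starts
 * the walk and returns @root itself (the origin is now included). */
static struct demo_node *demo_next_descendant_pre(struct demo_node *pos,
                                                  struct demo_node *root)
{
    if (!pos)
        return root;
    if (pos->first_child)
        return pos->first_child;
    /* No children: climb until a next sibling exists, never leaving @root. */
    while (pos != root) {
        if (pos->next_sibling)
            return pos->next_sibling;
        pos = pos->parent;
    }
    return NULL;
}

#define demo_for_each_descendant_pre(pos, root)                     \
    for ((pos) = demo_next_descendant_pre(NULL, (root)); (pos);     \
         (pos) = demo_next_descendant_pre((pos), (root)))

/* Setup helper: append @child to the end of @parent's child list. */
static void demo_add_child(struct demo_node *parent, struct demo_node *child)
{
    struct demo_node **link = &parent->first_child;

    while (*link)
        link = &(*link)->next_sibling;
    *link = child;
    child->parent = parent;
}
```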
     
  • cgroup_css() no longer has any user left outside cgroup.c proper and
    we don't want subsystems to grow new usages of the function. cgroup
    core should always provide the css to use to the subsystems, which
    will make dynamic creation and destruction of css's across the
    lifetime of a cgroup much more manageable than exposing the cgroup
    directly to subsystems and letting them dereference css's from it.

    Make cgroup_css() a static function in cgroup.c.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • cgroup is in the process of converting to css (cgroup_subsys_state)
    from cgroup as the principal subsystem interface handle. This is
    mostly to prepare for the unified hierarchy support where css's will
    be created and destroyed dynamically but also helps clean up
    subsystem implementations as css is usually what they are interested
    in anyway.

    cgroup_taskset which is used by the subsystem attach methods is the
    last cgroup subsystem API which isn't using css as the handle. Update
    cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and
    cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp.

    The conversions are pretty mechanical. One exception is
    cpuset::cgroup_cs(), which lost its last user and got removed.

    This patch shouldn't introduce any functional changes.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Daniel Wagner
    Cc: Ingo Molnar
    Cc: Matt Helsley
    Cc: Steven Rostedt

    Tejun Heo
     
  • cgroup is in the process of converting to css (cgroup_subsys_state)
    from cgroup as the principal subsystem interface handle. This is
    mostly to prepare for the unified hierarchy support where css's will
    be created and destroyed dynamically but also helps clean up
    subsystem implementations as css is usually what they are interested
    in anyway.

    cftype->[un]register_event() is among the remaining couple of interfaces
    which still use struct cgroup. Convert it to cgroup_subsys_state.
    The conversion is mostly mechanical and removes the last users of
    mem_cgroup_from_cont() and cg_to_vmpressure(), which are removed.

    v2: indentation update as suggested by Li Zefan.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Balbir Singh

    Tejun Heo
     
  • cgroup is in the process of converting to css (cgroup_subsys_state)
    from cgroup as the principal subsystem interface handle. This is
    mostly to prepare for the unified hierarchy support where css's will
    be created and destroyed dynamically but also helps clean up
    subsystem implementations as css is usually what they are interested
    in anyway.

    This patch converts task iterators to deal with css instead of cgroup.
    Note that under unified hierarchy, different sets of tasks will be
    considered as belonging to a given cgroup depending on the subsystem in
    question, and making the iterators deal with css instead of cgroup
    provides them with enough information about the iteration.

    While at it, fix several function comment formats in cpuset.c.

    This patch doesn't introduce any behavior differences.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: Matt Helsley

    Tejun Heo
     
  • cgroup_scan_tasks() takes a pointer to struct cgroup_scanner as its
    sole argument and the only function of that struct is packing the
    arguments of the function call, which consist of five fields.
    It's not too unusual to pack parameters into a struct when the number
    of arguments gets excessive or the whole set needs to be passed around
    a lot, but neither holds here, making it just weird.

    Drop struct cgroup_scanner and pass the params directly to
    cgroup_scan_tasks(). Note that struct cpuset_change_nodemask_arg was
    added to cpuset.c to pass both ->cs and ->newmems pointers to
    cpuset_change_nodemask() using a single data pointer.

    This doesn't make any functional differences.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
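A sketch of the calling convention after this change: the scanner takes its parameters directly, and a callback that needs more than one value receives them through a small struct behind the single data pointer, in the spirit of struct cpuset_change_nodemask_arg. All types and functions here are hypothetical simplifications (an array of tasks instead of a cgroup, an int in place of a nodemask, no heap argument).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's task_struct or cpuset. */
struct demo_task { int pid; int mems; };
struct demo_cpuset { int id; };

/* Parameters are passed directly; @data is the only opaque argument. */
static void demo_scan_tasks(struct demo_task *tasks, size_t n,
                            bool (*test)(struct demo_task *, void *),
                            void (*process)(struct demo_task *, void *),
                            void *data)
{
    for (size_t i = 0; i < n; i++)
        if (!test || test(&tasks[i], data))
            process(&tasks[i], data);
}

/* Two values the callback needs, packed behind one data pointer. */
struct demo_change_nodemask_arg {
    struct demo_cpuset *cs;
    int *newmems;
};

static void demo_change_nodemask(struct demo_task *t, void *data)
{
    struct demo_change_nodemask_arg *arg = data;

    assert(arg->cs != NULL);
    t->mems = *arg->newmems;
}
```

The packing struct is local to the one caller that needs two values, rather than forcing every scanner user through a generic parameter struct.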
     
  • Currently all cgroup_task_iter functions require @cgrp to be passed
    in, which is superfluous and increases the chance of usage errors. Make
    cgroup_task_iter remember the cgroup being iterated and drop @cgrp
    argument from next and end functions.

    This patch doesn't introduce any behavior differences.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Cc: Matt Helsley
    Cc: Johannes Weiner
    Cc: Balbir Singh

    Tejun Heo
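A minimal sketch of an iterator that remembers what it is walking, so next() and end() need only the iterator. demo_task_iter and its functions are hypothetical; the real cgroup_task_iter walks css_sets under a lock, which this version omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: a cgroup owning a singly linked task list. */
struct demo_task { int pid; struct demo_task *next; };
struct demo_cgroup { struct demo_task *tasks; };

struct demo_task_iter {
    struct demo_cgroup *origin; /* remembered at start */
    struct demo_task *pos;
};

static void demo_task_iter_start(struct demo_cgroup *cgrp,
                                 struct demo_task_iter *it)
{
    assert(cgrp && it);
    it->origin = cgrp;
    it->pos = cgrp->tasks;
}

/* No @cgrp argument: the iterator already knows its cgroup. */
static struct demo_task *demo_task_iter_next(struct demo_task_iter *it)
{
    struct demo_task *t = it->pos;

    if (t)
        it->pos = t->next;
    return t;
}

/* A real implementation would release whatever start() acquired here. */
static void demo_task_iter_end(struct demo_task_iter *it)
{
    it->origin = NULL;
    it->pos = NULL;
}
```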
     
  • cgroup now has multiple iterators and it's quite confusing to have
    something which walks over tasks of a single cgroup named cgroup_iter.
    Let's rename it to cgroup_task_iter.

    While at it, reformat / update comments and replace the overview
    comment above the interface function decls with proper function
    comments. Such overview can be useful but function comments should be
    more than enough here.

    This is pure rename and doesn't introduce any functional changes.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Cc: Matt Helsley
    Cc: Johannes Weiner
    Cc: Balbir Singh

    Tejun Heo
     
  • cgroup is currently in the process of transitioning to using css
    (cgroup_subsys_state) as the primary handle instead of cgroup in
    subsystem API. For hierarchy iterators, this is beneficial because

    * In most cases, css is the only thing subsystems care about anyway.

    * On the planned unified hierarchy, iterations for different
    subsystems will need to skip over different subtrees of the
    hierarchy depending on which subsystems are enabled on each cgroup.
    Passing around css makes it unnecessary to explicitly specify the
    subsystem in question, as css is the intersection between cgroup and
    subsystem.

    * For the planned unified hierarchy, css's would need to be created
    and destroyed dynamically independent from cgroup hierarchy. Having
    cgroup core manage css iteration makes enforcing deref rules a lot
    easier.

    Most subsystem conversions are straight-forward. Noteworthy changes
    are

    * blkio: cgroup_to_blkcg() is no longer used. Removed.

    * freezer: cgroup_freezer() is no longer used. Removed.

    * devices: cgroup_to_devcgroup() is no longer used. Removed.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: Matt Helsley
    Cc: Jens Axboe

    Tejun Heo
     
  • There are several places where the children list is accessed directly.
    This patch converts those places to use cgroup_next_child(). This
    will help update the hierarchy iterators to use @css instead of
    @cgrp.

    While cgroup_next_child() can be heavy in pathological cases (e.g. a
    lot of dead children), this shouldn't cause any noticeable behavior
    differences.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • cgroup is transitioning to using css (cgroup_subsys_state) as the main
    subsys interface handle instead of cgroup and the iterators will be
    updated to use css too. The iterators need to walk the cgroup
    hierarchy and return the css's matching the origin css, which is a bit
    cumbersome to open code.

    This patch converts cgroup_next_sibling() to cgroup_next_child() so
    that it can handle all steps of direct child iteration. This will be
    used to update iterators to take @css instead of @cgrp. In addition
    to the new iteration init handling, cgroup_next_child() is
    restructured so that the different branches share the end of iteration
    condition check.

    This patch doesn't change any behavior.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
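A simplified model of the restructured child iterator: a NULL position starts the iteration, a live position follows its sibling link, and a position that has already been removed is resumed by finding the first remaining child with a larger creation serial number. The struct, the serial_nr field and demo_next_child() are assumptions for illustration; RCU and the kernel's list primitives are left out. The last branch is the linear scan that can get expensive with many dead children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node; children are kept in creation (serial) order. */
struct demo_cg {
    unsigned long serial_nr;  /* increases with each creation */
    bool dead;                /* set when removed from the child list */
    struct demo_cg *parent;
    struct demo_cg *children; /* first child */
    struct demo_cg *sibling;  /* next sibling */
};

static struct demo_cg *demo_next_child(struct demo_cg *pos, struct demo_cg *parent)
{
    struct demo_cg *next;

    if (!pos) {
        next = parent->children;      /* start of iteration */
    } else if (!pos->dead) {
        next = pos->sibling;          /* fast path */
    } else {
        /* @pos was unlinked: resume at the first child created after it. */
        for (next = parent->children; next; next = next->sibling)
            if (next->serial_nr > pos->serial_nr)
                break;
    }
    /* Shared end-of-iteration condition: NULL for every branch. */
    return next;
}

/* Setup helper: unlink @child and mark it dead. */
static void demo_remove_child(struct demo_cg *parent, struct demo_cg *child)
{
    struct demo_cg **link = &parent->children;

    while (*link && *link != child)
        link = &(*link)->sibling;
    if (*link)
        *link = child->sibling;
    child->dead = true;
}
```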
     
  • cgroup is currently in the process of transitioning to using struct
    cgroup_subsys_state * as the primary handle instead of struct cgroup.
    Please see the previous commit which converts the subsystem methods
    for rationale.

    This patch converts all cftype file operations to take @css instead of
    @cgroup. cftypes for the cgroup core files don't have their subsystem
    pointer set. These will automatically use the dummy_css added by the
    previous patch and can be converted the same way.

    Most subsystem conversions are straightforward but there are some
    interesting ones.

    * freezer: update_if_frozen() is also converted to take @css instead
    of @cgroup for consistency. This will make the code look simpler
    too once iterators are converted to use css.

    * memory/vmpressure: mem_cgroup_from_css() needs to be exported to
    vmpressure while mem_cgroup_from_cont() can be made static.
    Updated accordingly.

    * cpu: cgroup_tg() doesn't have any user left. Removed.

    * cpuacct: cgroup_ca() doesn't have any user left. Removed.

    * hugetlb: hugetlb_cgroup_from_cgroup() doesn't have any user left.
    Removed.

    * net_cls: cgrp_cls_state() doesn't have any user left. Removed.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Acked-by: Daniel Wagner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: Matt Helsley
    Cc: Jens Axboe
    Cc: Steven Rostedt

    Tejun Heo
     
  • cgroup subsystem API is being converted to use css
    (cgroup_subsys_state) as the main handle, which makes things a bit
    awkward for subsystem-agnostic core features - the "cgroup.*"
    interface files and various iterations - as they don't have a css to
    use.

    This patch adds cgroup->dummy_css which has NULL ->ss and whose only
    role is pointing back to the cgroup. This will be used to support
    subsystem agnostic features on the coming css based API.

    css_parent() is updated to handle dummy_css's. Note that css will
    soon grow its own ->parent field and css_parent() will be made
    trivial.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • cgroup is transitioning to using css (cgroup_subsys_state) instead of
    cgroup as the primary subsystem handle. The cgroupfs file interface
    will be converted to use css's which requires finding out the
    subsystem from cftype so that the matching css can be determined from
    the cgroup.

    This patch adds cftype->ss which points to the subsystem the file
    belongs to. The field is initialized while a cftype is being
    registered. This makes it unnecessary to explicitly specify the
    subsystem for other cftype handling functions, so the @ss argument is
    dropped from various cftype handling functions.

    This patch shouldn't introduce any behavior differences.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Vivek Goyal
    Cc: Jens Axboe

    Tejun Heo
     
  • cgroup is currently in the process of transitioning to using struct
    cgroup_subsys_state * as the primary handle instead of struct cgroup *
    in subsystem implementations for the following reasons.

    * With unified hierarchy, subsystems will be dynamically bound and
    unbound from cgroups and thus css's (cgroup_subsys_state) may be
    created and destroyed dynamically over the lifetime of a cgroup,
    which is different from the current state where all css's are
    allocated and destroyed together with the associated cgroup. This
    in turn means that cgroup_css() should be synchronized and may
    return NULL, making it more cumbersome to use.

    * Differing levels of per-subsystem granularity in the unified
    hierarchy means that the task and descendant iterators should behave
    differently depending on the specific subsystem the iteration is
    being performed for.

    * In the majority of cases, subsystems only care about their part in the
    cgroup hierarchy - i.e. the hierarchy of css's. Subsystem methods
    often obtain the matching css pointer from the cgroup and don't
    bother with the cgroup pointer itself. Passing around css fits
    much better.

    This patch converts all cgroup_subsys methods to take @css instead of
    @cgroup. The conversions are mostly straight-forward. A few
    noteworthy changes are

    * ->css_alloc() now takes css of the parent cgroup rather than the
    pointer to the new cgroup as the css for the new cgroup doesn't
    exist yet. Knowing the parent css is enough for all the existing
    subsystems.

    * In kernel/cgroup.c::offline_css(), unnecessary open coded css
    dereference is replaced with local variable access.

    This patch shouldn't cause any behavior differences.

    v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
    with local variable @css as suggested by Li Zefan.

    Rebased on top of new for-3.12 which includes for-3.11-fixes so
    that ->css_free() invocation added by da0a12caff ("cgroup: fix a
    leak when percpu_ref_init() fails") is converted too. Suggested
    by Li Zefan.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Acked-by: Daniel Wagner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: Matt Helsley
    Cc: Jens Axboe
    Cc: Steven Rostedt

    Tejun Heo
     
  • Currently, controllers have to explicitly follow the cgroup hierarchy
    to find the parent of a given css. cgroup is moving towards using
    cgroup_subsys_state as the main controller interface construct, so
    let's provide a way to climb the hierarchy using just csses.

    This patch implements css_parent() which, given a css, returns its
    parent. The function is guaranteed to return a valid non-NULL parent css as
    long as the target css is not at the top of the hierarchy.

    freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
    are converted to use css_parent() instead of accessing cgroup->parent
    directly.

    * __parent_ca() is dropped from cpuacct and its usage is replaced with
    parent_ca(). The only difference between the two was NULL test on
    cgroup->parent which is now embedded in css_parent() making the
    distinction moot. Note that eventually a css->parent field will be
    added to css and the NULL check in css_parent() will go away.

    This patch shouldn't cause any behavior differences.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • Currently, given a cgroup_subsys_state, there's no way to find out
    which subsystem the css is for, which we'll need to convert the cgroup
    controller API to primarily use @css instead of @cgroup. This patch
    adds cgroup_subsys_state->ss which points to the subsystem the @css
    belongs to.

    While at it, remove the comment about accessing @css->cgroup to
    determine the hierarchy. cgroup core will provide API to traverse
    hierarchy of css'es and we don't want subsystems to directly walk
    cgroup hierarchies anymore.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • The names of the two struct cgroup_subsys_state accessors -
    cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
    The former clashes with the type name and the latter doesn't even
    indicate it's somehow related to cgroup.

    We're about to revamp a large portion of the cgroup API, so let's rename
    them so that they're less awkward. Most per-controller usages of the
    accessors are localized in accessor wrappers and given the amount of
    scheduled changes, this isn't going to add any noticeable headache.

    Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
    to task_css(). This patch is pure rename.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     

03 Aug, 2013

1 commit

  • for-3.12 branch is about to receive invasive updates which are
    dependent on da0a12caff ("cgroup: fix a leak when percpu_ref_init()
    fails"). Given the amount of scheduled changes, I think it'd be less
    painful to pull in for-3.11-fixes as preparation. Pull in
    for-3.11-fixes into for-3.12.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

31 Jul, 2013

4 commits

  • This will be used as a replacement for css_lookup().

    There's a difference between cgroup id and css id: cgroup id starts
    with 0, while css id starts with 1.

    v4:
    - also check if cgroup_mutex is held.
    - make it an inline function.

    Signed-off-by: Li Zefan
    Reviewed-by: Michal Hocko
    Signed-off-by: Tejun Heo

    Li Zefan
     
  • As cgroup id has been used in netprio cgroup and will be used in memcg,
    it's important to make it clear how a cgroup id is allocated.

    For example, in netprio cgroup, the id is used as the index of an array.

    Signed-off-by: Li Zefan
    Reviewed-by: Michal Hocko
    Signed-off-by: Tejun Heo

    Li Zefan
     
  • This enables us to look up a cgroup by its id.

    v4:
    - add a comment for idr_remove() in cgroup_offline_fn().

    v3:
    - on success, idr_alloc() returns the id rather than 0, so fix the BUG_ON()
    in cgroup_init().
    - pass the right value to idr_alloc() so that the id for dummy cgroup is 0.

    Signed-off-by: Li Zefan
    Reviewed-by: Michal Hocko
    Signed-off-by: Tejun Heo

    Li Zefan
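A toy id table illustrating the allocation points in these commits, assuming a fixed-size array in place of the kernel's idr. demo_idr_alloc() mimics the idr_alloc() convention that success returns the allocated id, which may be 0, and failure returns a negative errno; that is why a check treating any nonzero return as failure is wrong. Reserving the range [0, 1) for the dummy cgroup gives it id 0. All names here are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_IDS 64

/* Toy id table: a NULL slot means the id is free. */
struct demo_idr {
    void *slots[DEMO_MAX_IDS];
};

/* Allocate the lowest free id in [start, end) for @ptr; end <= 0 means
 * "up to the table size". Returns the id (possibly 0) or a negative errno. */
static int demo_idr_alloc(struct demo_idr *idr, void *ptr, int start, int end)
{
    if (end <= 0 || end > DEMO_MAX_IDS)
        end = DEMO_MAX_IDS;
    if (start < 0 || !ptr)          /* NULL marks free slots in this toy */
        return -EINVAL;
    for (int id = start; id < end; id++) {
        if (!idr->slots[id]) {
            idr->slots[id] = ptr;
            return id;
        }
    }
    return -ENOSPC;
}

static void *demo_idr_find(const struct demo_idr *idr, int id)
{
    return (id >= 0 && id < DEMO_MAX_IDS) ? idr->slots[id] : NULL;
}

static void demo_idr_remove(struct demo_idr *idr, int id)
{
    if (id >= 0 && id < DEMO_MAX_IDS)
        idr->slots[id] = NULL;
}
```

In this toy, a removed id becomes free again and the next allocation takes the lowest free id.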
     
  • Consistently use @cset for css_set variables and @cgrp for cgroup
    variables.

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

13 Jul, 2013

1 commit

  • task_cgroup_path_from_hierarchy() was added for the planned new users
    and none of the currently planned users wants to know about multiple
    hierarchies. This patch drops the multiple hierarchy part and makes
    it always return the path in the first non-dummy hierarchy.

    As unified hierarchy will always have id 1, this is guaranteed to
    return the path for the unified hierarchy if mounted; otherwise, it
    will return the path from the hierarchy which happens to occupy the
    lowest hierarchy id, which will usually be the first hierarchy mounted
    after boot.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Cc: Lennart Poettering
    Cc: Kay Sievers
    Cc: Jan Kaluža

    Tejun Heo
     

12 Jul, 2013

1 commit

  • Pull core block IO updates from Jens Axboe:
    "Here are the core IO block bits for 3.11. It contains:

    - A tweak to the reserved tag logic from Jan, for weirdo devices with
    just 3 free tags. But for those it improves things substantially
    for random writes.

    - Periodic writeback fix from Jan. Marked for stable as well.

    - Fix for a race condition in IO scheduler switching from Jianpeng.

    - The hierarchical blk-cgroup support from Tejun. This is the grunt
    of the series.

    - blk-throttle fix from Vivek.

    Just a note that I'm in the middle of a relocation, whole family is
    flying out tomorrow. Hence I will be AWOL the remainder of this week,
    but back at work again on Monday the 15th. CC'ing Tejun, since any
    potential "surprises" will most likely be from the blk-cgroup work.
    But it's been brewing for a while and sitting in my tree and
    linux-next for a long time, so should be solid."

    * 'for-3.11/core' of git://git.kernel.dk/linux-block: (36 commits)
    elevator: Fix a race in elevator switching
    block: Reserve only one queue tag for sync IO if only 3 tags are available
    writeback: Fix periodic writeback after fs mount
    blk-throttle: implement proper hierarchy support
    blk-throttle: implement throtl_grp->has_rules[]
    blk-throttle: Account for child group's start time in parent while bio climbs up
    blk-throttle: add throtl_qnode for dispatch fairness
    blk-throttle: make throtl_pending_timer_fn() ready for hierarchy
    blk-throttle: make tg_dispatch_one_bio() ready for hierarchy
    blk-throttle: make blk_throtl_bio() ready for hierarchy
    blk-throttle: make blk_throtl_drain() ready for hierarchy
    blk-throttle: dispatch from throtl_pending_timer_fn()
    blk-throttle: implement dispatch looping
    blk-throttle: separate out throtl_service_queue->pending_timer from throtl_data->dispatch_work
    blk-throttle: set REQ_THROTTLED from throtl_charge_bio() and gate stats update with it
    blk-throttle: implement sq_to_tg(), sq_to_td() and throtl_log()
    blk-throttle: add throtl_service_queue->parent_sq
    blk-throttle: generalize update_disptime optimization in blk_throtl_bio()
    blk-throttle: dispatch to throtl_data->service_queue.bio_lists[]
    blk-throttle: move bio_lists[] and friends to throtl_service_queue
    ...

    Linus Torvalds
     

03 Jul, 2013

1 commit

  • Pull cpuset changes from Tejun Heo:
    "cpuset has always been rather odd about its configurations - a cgroup
    right after creation didn't allow any task executions before
    configuration, changing configuration in the parent modifies the
    descendants irreversibly and so on. These behaviors are inherently
    nasty and almost hostile against sharing the hierarchy with other
    controllers making it very difficult to use in unified hierarchy.

    Li is currently in the process of updating the behaviors for
    __DEVEL__sane_behavior which is the bulk of changes in this pull
    request. It isn't complete yet and the behaviors will change further
    but all changes are gated behind sane_behavior. In the process, the
    rather hairy work-item punting which was used to work around the
    limitations of cgroup descendant iterator was simplified."

    * 'for-3.11-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cpuset: rename @cont to @cgrp
    cpuset: fix to migrate mm correctly in a corner case
    cpuset: allow to move tasks to empty cpusets
    cpuset: allow to keep tasks in empty cpusets
    cpuset: introduce effective_{cpumask|nodemask}_cpuset()
    cpuset: record old_mems_allowed in struct cpuset
    cpuset: remove async hotplug propagation work
    cpuset: let hotplug propagation work wait for task attaching
    cpuset: re-structure update_cpumask() a bit
    cpuset: remove cpuset_test_cpumask()
    cpuset: remove unnecessary variable in cpuset_attach()
    cpuset: cleanup guarantee_online_{cpus|mems}()
    cpuset: remove redundant check in cpuset_cpus_allowed_fallback()

    Linus Torvalds
     

28 Jun, 2013

1 commit

  • 1672d04070 ("cgroup: fix cgroupfs_root early destruction path")
    introduced CGRP_ROOT_SUBSYS_BOUND, which is used to mark completion of
    subsys binding on a new root; however, this broke remounts.
    cgroup_remount() doesn't allow changing root options via remount, and
    CGRP_ROOT_SUBSYS_BOUND, which is set on all fully initialized roots,
    makes the function reject all remounts.

    Fix it by putting the options part in the lower 16 bits of root->flags
    and masking the comparisons. While at it, make cgroup_remount() emit
    an error message explaining why it's rejecting a remount request, so
    that it's less of a mystery.

    Signed-off-by: Tejun Heo

    Tejun Heo
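    As a rough userspace illustration of the masking idea, the sketch
    assumes a flag layout in which mount options occupy the lower 16 bits
    and internal state, such as a "subsystems bound" bit, sits above them.
    All names (ROOT_OPTS_MASK, ROOT_OPT_*, ROOT_SUBSYS_BOUND,
    remount_changes_opts()) are hypothetical stand-ins, not the actual
    cgroup definitions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flag layout (assumption for illustration): mount options
 * occupy the lower 16 bits of the root flags, internal state bits sit above.
 */
#define ROOT_OPTS_MASK      0xffffu     /* lower 16 bits: user-visible options */
#define ROOT_OPT_NOPREFIX   (1u << 0)   /* example option bit */
#define ROOT_OPT_XATTR      (1u << 1)   /* example option bit */
#define ROOT_SUBSYS_BOUND   (1u << 16)  /* internal "binding done" state bit */

/* State bits must never overlap the option bits. */
static_assert((ROOT_SUBSYS_BOUND & ROOT_OPTS_MASK) == 0,
	      "state bits must live outside the option mask");

/* True if a remount asking for @new_flags would change the root's options. */
static bool remount_changes_opts(unsigned int root_flags, unsigned int new_flags)
{
	/* Only option bits are compared; internal state bits are masked out. */
	return (root_flags & ROOT_OPTS_MASK) != (new_flags & ROOT_OPTS_MASK);
}
```

    With a root carrying ROOT_OPT_XATTR | ROOT_SUBSYS_BOUND, a raw
    comparison against a remount that requests only ROOT_OPT_XATTR reports
    a difference, while remount_changes_opts() correctly reports none.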
     

27 Jun, 2013

2 commits

  • task->cgroups is a RCU pointer pointing to struct css_set. A task
    switches to a different css_set on cgroup migration but a css_set
    doesn't change once created and its pointers to cgroup_subsys_states
    aren't RCU protected.

    task_subsys_state[_check]() is the macro to acquire the css given a
    task and subsys_id pair. It RCU-dereferences task->cgroups->subsys[],
    not task->cgroups, so the RCU pointer task->cgroups ends up being
    dereferenced without read_barrier_depends() after it. It's broken.

    Fix it by introducing task_css_set[_check](), which does the
    RCU-dereference on task->cgroups. task_subsys_state[_check]() is
    reimplemented to directly dereference ->subsys[] of the css_set
    returned from task_css_set[_check]().

    This removes some of the sparse RCU warnings in cgroup.

    v2: Fixed an unbalanced parenthesis; also, there's no need to use
    rcu_dereference_raw() when !CONFIG_PROVE_RCU. Both spotted by Li.

    Signed-off-by: Tejun Heo
    Reported-by: Fengguang Wu
    Acked-by: Li Zefan
    Cc: stable@vger.kernel.org

    Tejun Heo
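    To make the ordering issue concrete, here is a userspace sketch using
    C11 atomics. An acquire load stands in for rcu_dereference() (it is
    stronger than what RCU actually needs), and grace periods and freeing
    of old css_sets are not modeled at all. The struct and function names
    are hypothetical; the point is only that the protected load is applied
    to task->cgroups, the pointer that actually changes on migration, and
    that ->subsys[] is then read as plain data from the immutable css_set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SUBSYS 4                       /* hypothetical subsystem count */

struct css { int id; };                   /* stand-in for cgroup_subsys_state */

/* Immutable once published: its ->subsys[] pointers never change. */
struct css_set {
	struct css *subsys[NR_SUBSYS];
};

/* Only the task's css_set pointer is switched on migration. */
struct task {
	_Atomic(struct css_set *) cgroups;
};

/*
 * Analogue of task_css_set(): the ordered load is done on task->cgroups.
 * The acquire load is an assumed stand-in for rcu_dereference().
 */
static struct css_set *task_css_set_sketch(struct task *t)
{
	return atomic_load_explicit(&t->cgroups, memory_order_acquire);
}

/* Analogue of task_subsys_state(): plain index into the immutable css_set. */
static struct css *task_css_sketch(struct task *t, int subsys_id)
{
	assert(subsys_id >= 0 && subsys_id < NR_SUBSYS);
	return task_css_set_sketch(t)->subsys[subsys_id];
}

/* Migration: publish a fully initialized css_set with a release store. */
static void task_migrate_sketch(struct task *t, struct css_set *cset)
{
	atomic_store_explicit(&t->cgroups, cset, memory_order_release);
}
```

    Because a css_set never changes after creation, only the load of
    task->cgroups needs ordering; indexing ->subsys[] afterwards is an
    ordinary dependent read.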
     
  • cgroupfs_root used to have ->actual_subsys_mask in addition to
    ->subsys_mask. a8a648c4ac ("cgroup: remove
    cgroup->actual_subsys_mask") removed it, noting that the subsys_mask is
    essentially temporary and doesn't belong in cgroupfs_root; however,
    the patch made it impossible to tell whether a cgroupfs_root actually
    has the subsystems bound or merely has the bits set, leading to the
    following BUG when trying to mount with subsystems which are already
    mounted elsewhere.

    kernel BUG at kernel/cgroup.c:1038!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    ...
    CPU: 1 PID: 7973 Comm: mount Tainted: G W 3.10.0-rc7-next-20130625-sasha-00011-g1c1dc0e #1105
    task: ffff880fc0ae8000 ti: ffff880fc0b9a000 task.ti: ffff880fc0b9a000
    RIP: 0010:rebind_subsystems+0x409/0x5f0
    ...
    Call Trace:
    cgroup_kill_sb+0xff/0x210
    deactivate_locked_super+0x4f/0x90
    cgroup_mount+0x673/0x6e0
    cpuset_mount+0xd9/0x110
    mount_fs+0xb0/0x2d0
    vfs_kern_mount+0xbd/0x180
    do_new_mount+0x145/0x2c0
    do_mount+0x356/0x3c0
    SyS_mount+0xfd/0x140
    tracesys+0xdd/0xe2

    We still want rebind_subsystems() to take added/removed masks, so
    let's fix it by marking whether a cgroupfs_root has finished binding
    or not. Also, document what's going on around ->subsys_mask
    initialization so that similar mistakes aren't repeated.

    Signed-off-by: Tejun Heo
    Reported-by: Sasha Levin
    Acked-by: Li Zefan

    Tejun Heo
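    The failure mode and the fix can be modeled in a few lines. In this
    hypothetical sketch, a root records its requested mask before binding
    is attempted, and the teardown path only asks to unbind when a flag
    modeled on CGRP_ROOT_SUBSYS_BOUND shows that binding actually
    finished. The assert() in rebind_sketch() plays the role of the BUG
    that fired in rebind_subsystems(). None of these names are the real
    cgroup code.

```c
#include <assert.h>
#include <stdbool.h>

#define ROOT_SUBSYS_BOUND (1u << 16)   /* hypothetical "binding finished" bit */

/* Hypothetical model of a root: requested vs. actually bound subsystems. */
struct root_sketch {
	unsigned long subsys_mask;   /* requested subsystems (set early) */
	unsigned long bound_mask;    /* subsystems actually bound */
	unsigned int flags;
};

/* Stand-in for rebind_subsystems(): removing requires actually bound bits. */
static void rebind_sketch(struct root_sketch *r, unsigned long added,
			  unsigned long removed)
{
	/* Mirrors the BUG: unbinding something that was never bound. */
	assert((removed & ~r->bound_mask) == 0);
	r->bound_mask = (r->bound_mask | added) & ~removed;
}

/* Mount: record the request, then mark the root once binding completes. */
static bool mount_sketch(struct root_sketch *r, unsigned long mask, bool busy)
{
	r->subsys_mask = mask;          /* bits are set before binding is tried */
	if (busy)                       /* e.g. subsystems mounted elsewhere */
		return false;
	rebind_sketch(r, mask, 0);
	r->flags |= ROOT_SUBSYS_BOUND;
	return true;
}

/* Teardown (cf. cgroup_kill_sb()): unbind only if binding finished. */
static void kill_sketch(struct root_sketch *r)
{
	if (r->flags & ROOT_SUBSYS_BOUND)
		rebind_sketch(r, 0, r->subsys_mask);
}
```

    Without the flag check in kill_sketch(), tearing down a root whose
    mount failed would try to remove subsystems that were never bound and
    trip the assertion, which is the shape of the reported BUG.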
     

25 Jun, 2013

1 commit

  • cgroup curiously has two subsystem masks, ->subsys_mask and
    ->actual_subsys_mask. The latter only exists because the new target
    subsys_mask is passed into rebind_subsystems() via @root->subsys_mask.
    rebind_subsystems() needs to know what the current mask is to decide
    how to reach the target mask, so ->actual_subsys_mask is used as the
    temporary location to remember the current state.

    Adding a temporary field to a permanent data structure is rather silly
    and can be misleading. Update rebind_subsystems() to take @added_mask
    and @removed_mask instead and remove @root->actual_subsys_mask.

    This patch shouldn't introduce any behavior changes.

    v2: Comment and description updated as suggested by Li.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
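    The added/removed split is plain bit arithmetic on the current and
    target masks. The helper is a hypothetical sketch of how a caller
    might derive the two arguments; the struct and function names are
    illustrative only, and the subsystem bits are arbitrary.

```c
#include <assert.h>

/* Hypothetical pair of masks handed to a rebind operation. */
struct rebind_masks {
	unsigned long added;     /* bits in the target but not bound now */
	unsigned long removed;   /* bits bound now but not in the target */
};

static struct rebind_masks compute_rebind_masks(unsigned long cur,
						unsigned long target)
{
	struct rebind_masks m = {
		.added   = target & ~cur,
		.removed = cur & ~target,
	};

	/* A subsystem can't be both added and removed in one rebind. */
	assert((m.added & m.removed) == 0);
	return m;
}
```

    For example, moving from a current mask of 0x5 to a target of 0x6
    yields added = 0x2 and removed = 0x1, so no temporary copy of the old
    mask has to live in the root itself.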