04 Apr, 2014

1 commit

  • Pull cgroup updates from Tejun Heo:
    "A lot of updates for cgroup:

    - The biggest one is cgroup's conversion to kernfs. cgroup took
    after the long abandoned vfs-entangled sysfs implementation and
    made it even more convoluted over time. cgroup's internal objects
    were fused with vfs objects which also brought in vfs locking and
    object lifetime rules. Naturally, there are places where vfs rules
    don't fit and nasty hacks, such as credential switching or lock
    dance interleaving inode mutex and cgroup_mutex with object serial
    number comparison thrown in to decide whether the operation is
    actually necessary, needed to be employed.

    After conversion to kernfs, internal object lifetime and locking
    rules are mostly isolated from vfs interactions allowing shedding
    of several nasty hacks and overall simplification. This will also
    allow implementation of operations which may affect multiple cgroups
    which weren't possible before as it would have required nesting
    i_mutexes.

    - Various simplifications including dropping of module support,
    easier cgroup name/path handling, simplified cgroup file type
    handling and task_cg_lists optimization.

    - Preparatory changes for the planned unified hierarchy, which is still
    a patchset away from being actually operational. The dummy
    hierarchy is updated to serve as the default unified hierarchy.
    Controllers which aren't claimed by other hierarchies are
    associated with it, which BTW was what the dummy hierarchy was for
    anyway.

    - Various fixes from Li and others. This pull request includes some
    patches to add missing slab.h to various subsystems. This was
    triggered by the removal of the xattr.h include from cgroup.h.
    cgroup.h was indirectly included by a lot of files, and it
    brought in xattr.h, which in turn brought in slab.h.

    There are several merge commits - one to pull in kernfs updates
    necessary for converting cgroup (already in upstream through
    driver-core), others for interfering changes in the fixes branch"

    * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
    cgroup: remove useless argument from cgroup_exit()
    cgroup: fix spurious lockdep warning in cgroup_exit()
    cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
    cgroup: break kernfs active_ref protection in cgroup directory operations
    cgroup: fix cgroup_taskset walking order
    cgroup: implement CFTYPE_ONLY_ON_DFL
    cgroup: make cgrp_dfl_root mountable
    cgroup: drop const from @buffer of cftype->write_string()
    cgroup: rename cgroup_dummy_root and related names
    cgroup: move ->subsys_mask from cgroupfs_root to cgroup
    cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
    cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
    cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
    cgroup: reorganize cgroup bootstrapping
    cgroup: relocate setting of CGRP_DEAD
    cpuset: use rcu_read_lock() to protect task_cs()
    cgroup_freezer: document freezer_fork() subtleties
    cgroup: update cgroup_transfer_tasks() to either succeed or fail
    cgroup: drop task_lock() protection around task->cgroups
    cgroup: update how a newly forked task gets associated with css_set
    ...

    Linus Torvalds
     

15 Mar, 2014

1 commit

  • Replace the bh safe variant with the hard irq safe variant.

    We need a hard irq safe variant to deal with netpoll transmitting
    packets from hard irq context, and we need it in most if not all of
    the places using the bh safe variant.

    Except on 32-bit uniprocessor, the code is exactly the same, so don't
    bother with a bh variant, just have a hard irq safe variant that
    everyone can use.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

12 Feb, 2014

1 commit

  • cgroup->name handling became quite complicated over time involving
    dedicated struct cgroup_name for RCU protection. Now that cgroup is
    on kernfs, we can drop all of it and simply use kernfs_name/path() and
    friends. Replace cgroup->name and all related code with kernfs
    name/path constructs.

    * Reimplement cgroup_name() and cgroup_path() as thin wrappers on top
    of kernfs counterparts, which involves semantic changes.
    pr_cont_cgroup_name() and pr_cont_cgroup_path() added.

    * cgroup->name handling dropped from cgroup_rename().

    * All users of cgroup_name/path() updated to the new semantics. Users
    which were formatting the string just to printk them are converted
    to use pr_cont_cgroup_name/path() instead, which simplifies things
    quite a bit. As cgroup_name() no longer requires the RCU read lock
    around it, RCU locking which was protecting only cgroup_name() is
    removed.

    v2: Comment above oom_info_lock updated as suggested by Michal.

    v3: dummy_top doesn't have a kn associated and
    pr_cont_cgroup_name/path() ended up calling the matching kernfs
    functions with NULL kn leading to oops. Test for NULL kn and
    print "/" if so. This issue was reported by Fengguang Wu.

    v4: Rebased on top of 0ab02ca8f887 ("cgroup: protect modifications to
    cgroup_idr with cgroup_mutex").

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Acked-by: Michal Hocko
    Acked-by: Li Zefan
    Cc: Fengguang Wu
    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki

    Tejun Heo
     

08 Feb, 2014

1 commit

  • cgroup_subsys is a bit messier than it needs to be.

    * The name of a subsys can be different from its internal identifier
    defined in cgroup_subsys.h. Most subsystems use the matching name
    but three - cpu, memory and perf_event - use different ones.

    * cgroup_subsys_id enums are postfixed with _subsys_id and each
    cgroup_subsys is postfixed with _subsys. As cgroup.h is widely
    included throughout various subsystems, it doesn't and shouldn't
    have a claim on such generic names which don't have any qualifier
    indicating that they belong to cgroup.

    * cgroup_subsys->subsys_id should always equal the matching
    cgroup_subsys_id enum; however, we require each controller to
    initialize it and then BUG if they don't match, which is a bit
    silly.

    This patch cleans up cgroup_subsys names and initialization by doing
    the following.

    * cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
    cgroup_subsys with _cgrp_subsys.

    * With the above, renaming subsys identifiers to match the userland
    visible names doesn't cause any naming conflicts. All non-matching
    identifiers are renamed to match the official names.

    cpu_cgroup -> cpu
    mem_cgroup -> memory
    perf -> perf_event

    * controllers no longer need to initialize ->subsys_id and ->name.
    They're generated in cgroup core and set automatically during boot.

    * Redundant cgroup_subsys declarations removed.

    * While updating BUG_ON()s in cgroup_init_early(), convert them to
    WARN()s. BUGging that early during boot is stupid - the kernel
    can't print anything, even through serial console and the trap
    handler doesn't even link stack frames properly for back-tracing.

    This patch doesn't introduce any behavior changes.
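    The automatic id/name generation described above can be illustrated
    with an X-macro list. This is a minimal userland sketch, not the
    kernel's code: the controller list and the names SUBSYS_LIST, AS_ID,
    subsys_table, init_subsys_ids() and find_subsys() are hypothetical,
    though the _cgrp_id / _cgrp_subsys suffixes follow the commit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical controller list; the real list lives in cgroup_subsys.h. */
#define SUBSYS_LIST(X) \
	X(cpuset)      \
	X(cpu)         \
	X(memory)      \
	X(perf_event)

/* Generate <name>_cgrp_id enumerators plus a count. */
#define AS_ID(_x) _x##_cgrp_id,
enum cgroup_subsys_id { SUBSYS_LIST(AS_ID) CGROUP_SUBSYS_COUNT };
#undef AS_ID

struct cgroup_subsys {
	int id;           /* filled in by the core, not by the controller */
	const char *name; /* likewise */
};

/* Each controller provides <name>_cgrp_subsys without id/name. */
#define AS_DEF(_x) struct cgroup_subsys _x##_cgrp_subsys;
SUBSYS_LIST(AS_DEF)
#undef AS_DEF

/* Id-indexed table of controllers and their userland-visible names. */
#define AS_PTR(_x) [_x##_cgrp_id] = &_x##_cgrp_subsys,
static struct cgroup_subsys *subsys_table[] = { SUBSYS_LIST(AS_PTR) };
#undef AS_PTR

#define AS_NAME(_x) [_x##_cgrp_id] = #_x,
static const char *subsys_names[] = { SUBSYS_LIST(AS_NAME) };
#undef AS_NAME

/* Boot-time step: the core sets ->id and ->name for every controller. */
static void init_subsys_ids(void)
{
	for (int i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
		subsys_table[i]->id = i;
		subsys_table[i]->name = subsys_names[i];
	}
}

/* Look up a controller by its userland-visible name. */
static struct cgroup_subsys *find_subsys(const char *name)
{
	for (int i = 0; i < CGROUP_SUBSYS_COUNT; i++)
		if (strcmp(subsys_names[i], name) == 0)
			return subsys_table[i];
	return NULL;
}
```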

    v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
    classid handling into core").

    Signed-off-by: Tejun Heo
    Acked-by: Neil Horman
    Acked-by: "David S. Miller"
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra
    Acked-by: Aristeu Rozanski
    Acked-by: Ingo Molnar
    Acked-by: Li Zefan
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Serge E. Hallyn
    Cc: Vivek Goyal
    Cc: Thomas Graf

    Tejun Heo
     

21 Nov, 2013

1 commit


13 Nov, 2013

1 commit

  • Now that seqcounts are lockdep enabled objects, we need to explicitly
    initialize runtime allocated seqcounts so that lockdep can track them.

    Without this patch, Fengguang was seeing:

    [ 4.127282] INFO: trying to register non-static key.
    [ 4.128027] the code is fine but needs lockdep annotation.
    [ 4.128027] turning off the locking correctness validator.
    [ 4.128027] CPU: 0 PID: 96 Comm: kworker/u4:1 Not tainted 3.12.0-next-20131108-10601-gbad570d #2
    [ 4.128027] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ ... ]
    [ 4.128027] Call Trace:
    [ 4.128027] [] ? console_unlock+0x353/0x380
    [ 4.128027] [] dump_stack+0x48/0x60
    [ 4.128027] [] __lock_acquire.isra.26+0x7e3/0xceb
    [ 4.128027] [] lock_acquire+0x71/0x9a
    [ 4.128027] [] ? blk_throtl_bio+0x1c3/0x485
    [ 4.128027] [] throtl_update_dispatch_stats+0x7c/0x153
    [ 4.128027] [] ? blk_throtl_bio+0x1c3/0x485
    [ 4.128027] [] blk_throtl_bio+0x1c3/0x485
    ...

    Use u64_stats_init() for all affected data structures, which initializes
    the seqcount.
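    For readers unfamiliar with the seqcount behind u64_stats, here is a
    minimal single-threaded model of the write/read-retry protocol. All
    names (struct stat_sync, stat_add(), stat_read(), ...) are
    hypothetical; this is not the kernel's u64_stats API, it omits the
    memory barriers a concurrent version needs, and its init function
    only zeroes fields, whereas the point of u64_stats_init() here is
    that it also lets lockdep track the seqcount.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical seqcount-protected 64-bit counter (protocol model only,
 * no barriers, not safe for real concurrent use). */
struct stat_sync {
	unsigned seq;   /* odd while a writer is mid-update */
	uint64_t bytes; /* the protected 64-bit value */
};

/* Initialization step: even (idle) sequence, zero value. */
static void stat_sync_init(struct stat_sync *s)
{
	s->seq = 0;
	s->bytes = 0;
}

static void stat_add(struct stat_sync *s, uint64_t delta)
{
	s->seq++;       /* becomes odd: update in progress */
	s->bytes += delta;
	s->seq++;       /* even again: update complete */
}

static unsigned stat_fetch_begin(const struct stat_sync *s)
{
	return s->seq;
}

/* Nonzero if the reader must retry: a writer was active when the read
 * began (odd start) or completed an update in between (seq changed). */
static int stat_fetch_retry(const struct stat_sync *s, unsigned start)
{
	return (start & 1) || s->seq != start;
}

static uint64_t stat_read(const struct stat_sync *s)
{
	unsigned start;
	uint64_t v;

	do {
		start = stat_fetch_begin(s);
		v = s->bytes;
	} while (stat_fetch_retry(s, start));
	return v;
}
```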

    Reported-and-Tested-by: Fengguang Wu
    Cc: Vivek Goyal
    Cc: Jens Axboe
    Signed-off-by: Peter Zijlstra
    [ Folded in another fix from the mailing list as well as a fix to that fix. Tweaked commit message. ]
    Signed-off-by: John Stultz
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1384314134-6895-1-git-send-email-john.stultz@linaro.org
    [ So I actually think that the two SOBs from PeterZ are the right depiction of the patch route. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

09 Aug, 2013

5 commits

  • Previously, all css descendant iterators didn't include the origin
    (root of subtree) css in the iteration. The reasons were maintaining
    consistency with css_for_each_child() and that at the time of
    introduction more use cases needed skipping the origin anyway;
    however, given that css_is_descendant() considers self to be a
    descendant, omitting the origin css has become more confusing and
    looking at the accumulated use cases rather clearly indicates that
    including origin would result in simpler code overall.

    While this is a change which can easily lead to subtle bugs, cgroup
    API including the iterators has recently gone through major
    restructuring and no out-of-tree changes will be applicable without
    adjustments making this a relatively acceptable opportunity for this
    type of change.

    The conversions are mostly straightforward. If the iteration block
    had explicit origin handling before or after, it's moved inside the
    iteration. If not, "if (pos == origin) continue;" is added. Some
    conversions add extra reference get/put around origin handling by
    consolidating origin handling and the rest. While the extra ref
    operations aren't strictly necessary, this shouldn't cause any
    noticeable difference.
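    The new iteration semantics can be sketched as a pre-order walk that
    visits the origin first. This is a userland sketch under assumptions:
    struct node, next_descendant_pre(), for_each_descendant_pre() and
    add_child() are hypothetical stand-ins for the css iterators, and
    locking/RCU is omitted. Callers wanting the old behavior add the
    "if (pos == origin) continue;" test described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal tree node standing in for a css. */
struct node {
	int id;
	struct node *parent, *first_child, *next_sibling;
};

/*
 * Pre-order walk of the subtree rooted at @root. @pos == NULL starts
 * the walk and the first node returned is @root itself, i.e. the
 * origin is included.
 */
static struct node *next_descendant_pre(struct node *pos, struct node *root)
{
	if (!pos)
		return root;             /* visit the origin first */
	if (pos->first_child)
		return pos->first_child; /* descend */
	while (pos != root) {        /* climb until a sibling exists */
		if (pos->next_sibling)
			return pos->next_sibling;
		pos = pos->parent;
	}
	return NULL;                 /* subtree exhausted */
}

#define for_each_descendant_pre(pos, root)                    \
	for ((pos) = next_descendant_pre(NULL, (root)); (pos); \
	     (pos) = next_descendant_pre((pos), (root)))

/* Link @child as the last child of @parent. */
static void add_child(struct node *parent, struct node *child)
{
	struct node **link = &parent->first_child;

	while (*link)
		link = &(*link)->next_sibling;
	child->parent = parent;
	*link = child;
}
```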

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Acked-by: Michal Hocko
    Cc: Jens Axboe
    Cc: Matt Helsley
    Cc: Johannes Weiner
    Cc: Balbir Singh

    Tejun Heo
     
  • cgroup is currently in the process of transitioning to using css
    (cgroup_subsys_state) as the primary handle instead of cgroup in
    subsystem API. For hierarchy iterators, this is beneficial because

    * In most cases, css is the only thing subsystems care about anyway.

    * On the planned unified hierarchy, iterations for different
    subsystems will need to skip over different subtrees of the
    hierarchy depending on which subsystems are enabled on each cgroup.
    Passing around css makes it unnecessary to explicitly specify the
    subsystem in question, as a css is the intersection of a cgroup and
    a subsystem.

    * For the planned unified hierarchy, css's would need to be created
    and destroyed dynamically independent from cgroup hierarchy. Having
    cgroup core manage css iteration makes enforcing deref rules a lot
    easier.

    Most subsystem conversions are straightforward. Noteworthy changes
    are:

    * blkio: cgroup_to_blkcg() is no longer used. Removed.

    * freezer: cgroup_freezer() is no longer used. Removed.

    * devices: cgroup_to_devcgroup() is no longer used. Removed.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: Matt Helsley
    Cc: Jens Axboe

    Tejun Heo
     
  • Currently, controllers have to explicitly follow the cgroup hierarchy
    to find the parent of a given css. cgroup is moving towards using
    cgroup_subsys_state as the main controller interface construct, so
    let's provide a way to climb the hierarchy using just csses.

    This patch implements css_parent() which, given a css, returns its
    parent. The function is guaranteed to return a valid non-NULL parent
    css as long as the target css is not at the top of the hierarchy.

    freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
    are converted to use css_parent() instead of accessing cgroup->parent
    directly.

    * __parent_ca() is dropped from cpuacct and its usage is replaced with
    parent_ca(). The only difference between the two was NULL test on
    cgroup->parent which is now embedded in css_parent() making the
    distinction moot. Note that eventually a css->parent field will be
    added to css and the NULL check in css_parent() will go away.

    This patch shouldn't cause any behavior differences.
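    A minimal sketch of what css_parent() does at this stage, assuming
    simplified stand-in structures: struct css, a struct cgroup with a
    parent pointer and a per-controller subsys[] array, and NR_SUBSYS are
    hypothetical, and the real structures carry much more state.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SUBSYS 2 /* hypothetical number of controllers */

struct cgroup;

/* Hypothetical, heavily simplified cgroup_subsys_state. */
struct css {
	struct cgroup *cgroup; /* cgroup this css belongs to */
	int ss_id;             /* controller this css belongs to */
};

struct cgroup {
	struct cgroup *parent;         /* NULL for the hierarchy root */
	struct css *subsys[NR_SUBSYS]; /* per-controller states */
};

/*
 * Return @css's parent for the same controller, or NULL if @css sits
 * at the top of the hierarchy. The NULL test on cgroup->parent lives
 * here, so callers need not repeat it.
 */
static struct css *css_parent(struct css *css)
{
	struct cgroup *parent_cgrp = css->cgroup->parent;

	return parent_cgrp ? parent_cgrp->subsys[css->ss_id] : NULL;
}
```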

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • css (cgroup_subsys_state) is usually embedded in a subsys specific
    data structure. Subsystems either use container_of() directly to cast
    from css to such data structure or have an accessor function wrapping
    such cast. As cgroup as a whole is moving towards using css as the main
    interface handle, add and update such accessors to ease dealing with
    css's.

    All accessors explicitly handle NULL input and return NULL in those
    cases. While this looks like an extra branch in the code, as all
    controller-specific data structures have css as the first field, the
    casting doesn't involve any offsetting and the compiler can trivially
    optimize out the branch.

    * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
    accessor. Added.

    * memory, hugetlb and devices already had one but didn't explicitly
    handle NULL input. Updated.
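    The NULL-tolerant accessor pattern can be sketched in userland as
    follows. container_of() is re-defined with offsetof() because the
    kernel header isn't available; struct freezer, its state field and
    css_freezer() are illustrative names, not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Userland equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct cgroup_subsys_state {
	int refcnt; /* placeholder field */
};

/* Hypothetical controller state with the css as its first member. */
struct freezer {
	struct cgroup_subsys_state css; /* must stay first */
	int state;
};

/*
 * NULL in, NULL out. Because css is the first member, the offset is 0
 * and the cast adds nothing, so the compiler can fold the branch away.
 */
static inline struct freezer *css_freezer(struct cgroup_subsys_state *css)
{
	return css ? container_of(css, struct freezer, css) : NULL;
}
```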

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • The names of the two struct cgroup_subsys_state accessors -
    cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
    The former clashes with the type name and the latter doesn't even
    indicate it's somehow related to cgroup.

    We're about to revamp a large portion of the cgroup API, so let's rename
    them so that they're less awkward. Most per-controller usages of the
    accessors are localized in accessor wrappers and given the amount of
    scheduled changes, this isn't gonna add any noticeable headache.

    Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
    to task_css(). This patch is pure rename.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     

15 May, 2013

3 commits

    Currently, when the last reference of a blkcg_gq is put, all the
    release operations sans the actual freeing happen directly in
    blkg_put(). As blkg_put() may be called under queue_lock, all
    pd_exit_fn()s may be too. This makes it impossible for pd_exit_fn()s
    to use del_timer_sync() on timers which grab the queue_lock which is
    an irq-safe lock due to the deadlock possibility described in the
    comment on top of del_timer_sync().

    This can be easily avoided by performing the release operations in the
    RCU callback instead of directly from blkg_put(). This patch moves
    the blkcg_gq release operations to the RCU callback.

    As this leaves __blkg_release() with only a call_rcu() invocation,
    blkg_rcu_free() is renamed to __blkg_release_rcu(), exported and
    call_rcu() invocation is now done directly from blkg_put() instead of
    going through __blkg_release() which is removed.
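    The idea of moving release work out of blkg_put() and into a deferred
    callback can be modeled as below. This is a hypothetical
    single-threaded sketch: struct cb_head, defer_call() and
    run_deferred() stand in for rcu_head, call_rcu() and the end of an
    RCU grace period (no readers or grace-period waiting are modeled),
    and lock_held stands in for queue_lock. The assert in the callback
    checks that release never runs while the modeled lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

static struct cb_head *pending; /* queued callbacks */
static bool lock_held;          /* models queue_lock being held */

/* Stand-in for call_rcu(): only queue, never run the callback here. */
static void defer_call(struct cb_head *head, void (*func)(struct cb_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stand-in for grace-period completion: run queued callbacks later,
 * outside of any lock held by the caller of obj_put(). */
static void run_deferred(void)
{
	while (pending) {
		struct cb_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

struct obj {
	int refcnt;
	bool released;      /* set by the deferred release */
	struct cb_head rcu;
};

static void obj_release_cb(struct cb_head *head)
{
	struct obj *o = (struct obj *)((char *)head - offsetof(struct obj, rcu));

	assert(!lock_held); /* e.g. a del_timer_sync() would be safe here */
	o->released = true;
}

/* The final put may happen under the lock, so it only defers. */
static void obj_put(struct obj *o)
{
	if (--o->refcnt == 0)
		defer_call(&o->rcu, obj_release_cb);
}
```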

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • This will be used by blk-throttle hierarchy support.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • blk-throttle hierarchy support will make use of it. Move
    blkg_for_each_descendant_pre() from block/blk-cgroup.c to
    block/blk-cgroup.h.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     

05 Mar, 2013

1 commit

    rename() will change dentry->d_name while cgroup_path() may be
    reading it. The result of this race can be worse than seeing a
    partially rewritten name, but we might access
    a stale pointer because rename() will re-allocate memory to hold
    a longer name.

    As access to dentry->d_name must be protected by dentry->d_lock or
    the parent inode's i_mutex, while cgroup_path() can be called with
    some irq-safe spinlocks held, we can't generate the cgroup path
    using dentry->d_name.

    Instead, we make a copy of dentry->d_name and save it in
    cgrp->name when a cgroup is created, and update cgrp->name at
    rename().
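    A sketch of the copied-name approach, assuming a hypothetical struct
    cgroup_name with a length field plus a flexible array member (C does
    not allow a struct whose only member is a flexible array). Names are
    illustrative; in the kernel the pointer swap would be RCU-protected
    and the old copy freed with kfree_rcu(), while this single-threaded
    version simply calls free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name copy sized to fit via a flexible array member. */
struct cgroup_name {
	size_t len;  /* strlen(name) */
	char name[]; /* NUL-terminated copy */
};

struct cgroup {
	struct cgroup_name *name; /* copied at creation, replaced on rename */
};

static struct cgroup_name *alloc_name(const char *src)
{
	size_t len = strlen(src);
	struct cgroup_name *n = malloc(sizeof(*n) + len + 1);

	if (!n)
		return NULL;
	n->len = len;
	memcpy(n->name, src, len + 1);
	return n;
}

/*
 * Rename: build the new copy first, then swap the pointer. A longer
 * name gets a fresh allocation, so readers never see a resized buffer.
 * (Single-threaded sketch: the old copy is freed immediately.)
 */
static int cgroup_rename_name(struct cgroup *cgrp, const char *new_name)
{
	struct cgroup_name *n = alloc_name(new_name);
	struct cgroup_name *old = cgrp->name;

	if (!n)
		return -1;
	cgrp->name = n;
	free(old);
	return 0;
}
```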

    v5: use flexible array instead of zero-size array.
    v4: - allocate root_cgroup_name and all root_cgroup->name points to it.
    - add cgroup_name() wrapper.
    v3: use kfree_rcu() instead of synchronize_rcu() in user-visible path.
    v2: make cgrp->name RCU safe.

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

10 Jan, 2013

6 commits

  • Implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge().
    The former two collect the [rw]stats designated by the target policy
    data and offset from the pd's subtree. The latter two add one
    [rw]stat to another.

    Note that the recursive sum functions require the queue lock to be
    held on entry to make blkg online test reliable. This is necessary to
    properly handle stats of a dying blkg.

    These will be used to implement hierarchical stats.
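    The merge and recursive-sum operations can be sketched as below. The
    counter layout (READ/WRITE/SYNC/ASYNC), struct grp and the choice to
    skip an offline group's whole subtree are assumptions made for
    illustration; the locking that the commit requires for a reliable
    online test is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-direction counter layout. */
enum { RWSTAT_READ, RWSTAT_WRITE, RWSTAT_SYNC, RWSTAT_ASYNC, RWSTAT_NR };

struct rwstat {
	uint64_t cnt[RWSTAT_NR];
};

/* Hypothetical group with an online flag and child/sibling links. */
struct grp {
	int online;
	struct rwstat stat;
	struct grp *first_child, *next_sibling;
};

/* Merge: add @from into @to. */
static void rwstat_merge(struct rwstat *to, const struct rwstat *from)
{
	for (int i = 0; i < RWSTAT_NR; i++)
		to->cnt[i] += from->cnt[i];
}

/* Add @g's stat and those of its online descendants into @sum.
 * In this sketch an offline child's whole subtree is skipped. */
static void rwstat_recursive_sum(const struct grp *g, struct rwstat *sum)
{
	rwstat_merge(sum, &g->stat);
	for (const struct grp *c = g->first_child; c; c = c->next_sibling)
		if (c->online)
			rwstat_recursive_sum(c, sum);
}
```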

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • Rename blkg_rwstat_sum() to blkg_rwstat_total(). sum will be used for
    summing up stats from multiple blkgs.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • Add two blkcg_policy methods, ->online_pd_fn() and ->offline_pd_fn(),
    which are invoked as the policy_data gets activated and deactivated
    while holding both blkcg and q locks.

    Also, add blkcg_gq->online bool, which is set and cleared as the
    blkcg_gq gets activated and deactivated. This flag also is toggled
    while holding both blkcg and q locks.

    These will be used to implement hierarchical stats.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • Add pd->plid so that the policy a pd belongs to can be identified
    easily. This will be used to implement hierarchical blkg_[rw]stats.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • cfq blkcg is about to grow proper hierarchy handling, where a child
    blkg's weight would nest inside the parent's. This makes tasks in a
    blkg compete against both the tasks in sibling blkgs and the tasks
    of child blkgs.

    We're gonna use the existing weight as the group weight which decides
    the blkg's weight against its siblings. This patch introduces a new
    weight - leaf_weight - which decides the weight of a blkg against the
    child blkgs.

    It's named leaf_weight because another way to look at it is that each
    internal blkg node has a hidden child leaf node which contains all
    its tasks; leaf_weight is the weight of that leaf node and is handled
    the same as the weight of the child blkgs.

    This patch only adds leaf_weight fields and exposes it to userland.
    The new weight isn't actually used anywhere yet. Note that
    cfq-iosched currently officially supports only a single-level hierarchy
    and root blkgs compete with the first level blkgs, i.e. root weight is
    basically being used as leaf_weight. For root blkgs, the two weights
    are kept in sync for backward compatibility.
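    To make the "hidden leaf child" view concrete, the arithmetic can be
    written out. share_permille() is a hypothetical helper that only
    models proportional shares within one blkg: with leaf_weight 500 and
    two children weighted 500 and 1000, the blkg's own tasks get
    500 / 2000 = 25% of its share and the heavier child gets 50%.

```c
#include <assert.h>

/*
 * Hypothetical helper: share (parts per 1000) received inside one
 * internal blkg by an entity of weight @w, where the blkg's own tasks
 * count as a hidden leaf child of weight @leaf_weight and
 * @child_weights holds the weights of its @nr child blkgs.
 */
static unsigned share_permille(unsigned w, unsigned leaf_weight,
			       const unsigned *child_weights, int nr)
{
	unsigned long total = leaf_weight;

	for (int i = 0; i < nr; i++)
		total += child_weights[i];
	return total ? (unsigned)(1000UL * w / total) : 0;
}
```

    A deeper blkg's effective share would be the product of such
    fractions along its path from the root.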

    v2: cfqd->root_group->leaf_weight initialization was missing from
    cfq_init_queue() causing divide by zero when
    !CONFIG_CFQ_GROUP_IOSCHED. Fix it. Reported by Fengguang.

    Signed-off-by: Tejun Heo
    Cc: Fengguang Wu

    Tejun Heo
     
  • Currently a child blkg (blkcg_gq) can be created even if its parent
    doesn't exist, i.e. given a blkg, it's not guaranteed that its
    ancestors will exist. This makes it difficult to implement proper
    hierarchy support for blkcg policies.

    Always create blkgs recursively and make a child blkg hold a reference
    to its parent. blkg->parent is added so that finding the parent is
    easy. blkcg_parent() is also added in the process.

    This change can be visible to userland. For example, while issuing
    IO in a nested cgroup didn't affect the ancestors at all before, now
    it will initialize all ancestor blkgs, and zero stats for the
    request_queue will always appear on them. While this is userland
    visible, this shouldn't cause any functional difference.
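    Recursive creation with parent pinning might look like the following
    userland sketch. struct blkcg, struct blkg and blkg_lookup_create()
    here are heavily simplified stand-ins (one request_queue, a plain int
    refcount, no locking or RCU), not the kernel's definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a cgroup-side node and its per-queue group. */
struct blkcg {
	struct blkcg *parent; /* NULL at the root */
	struct blkg *blkg;    /* one queue only, for brevity */
};

struct blkg {
	struct blkcg *blkcg;
	struct blkg *parent;  /* holds a reference on the parent */
	int refcnt;
};

/*
 * Look up @blkcg's group, creating missing ancestors first so that a
 * group's ancestors always exist. Returns NULL on allocation failure.
 */
static struct blkg *blkg_lookup_create(struct blkcg *blkcg)
{
	struct blkg *parent = NULL, *blkg;

	if (blkcg->blkg)
		return blkcg->blkg;
	if (blkcg->parent) {
		parent = blkg_lookup_create(blkcg->parent); /* recurse upward */
		if (!parent)
			return NULL;
	}
	blkg = calloc(1, sizeof(*blkg));
	if (!blkg)
		return NULL;
	blkg->blkcg = blkcg;
	blkg->refcnt = 1;         /* the blkcg's own reference */
	blkg->parent = parent;
	if (parent)
		parent->refcnt++;     /* child pins its parent */
	blkcg->blkg = blkg;
	return blkg;
}
```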

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     

27 Jun, 2012

1 commit

  • Currently, request_queue has one request_list to allocate requests
    from regardless of blkcg of the IO being issued. When the unified
    request pool is used up, cfq proportional IO limits become meaningless
    - whoever grabs the next request being freed wins the race regardless
    of the configured weights.

    This can be easily demonstrated by creating a blkio cgroup with a very
    low weight, putting a program which can issue a lot of random direct
    IOs there, and running a sequential IO from a different cgroup. As soon as the
    request pool is used up, the sequential IO bandwidth crashes.

    This patch implements per-blkg request_list. Each blkg has its own
    request_list and any IO allocates its request from the matching blkg
    making blkcgs completely isolated in terms of request allocation.
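    A hypothetical sketch of per-group request pools: each group draws
    from its own pool, so exhausting one group's pool doesn't starve
    another, and the root group uses the queue's embedded pool. struct
    request_list is reduced to a free-slot count, groups are found by a
    linear scan, and falling back to root_rl when no group is found
    loosely mirrors the v3 fallback below; none of these names or
    layouts are the kernel's.

```c
#include <assert.h>

/* Hypothetical request pool: just a count of free slots. */
struct request_list {
	int free;
};

struct blkg {
	int blkcg_id;
	struct request_list rl;      /* per-group pool */
};

struct queue {
	struct request_list root_rl; /* root group uses the queue's own */
	struct blkg *groups;         /* non-root groups */
	int nr_groups;
};

/* Pick the pool for an IO issued from @blkcg_id (0 = root); fall back
 * to root_rl when no matching group is found. */
static struct request_list *get_rl(struct queue *q, int blkcg_id)
{
	if (blkcg_id == 0)
		return &q->root_rl;
	for (int i = 0; i < q->nr_groups; i++)
		if (q->groups[i].blkcg_id == blkcg_id)
			return &q->groups[i].rl;
	return &q->root_rl;
}

/* Returns 0 on success, -1 when that group's pool is exhausted. */
static int alloc_request(struct queue *q, int blkcg_id)
{
	struct request_list *rl = get_rl(q, blkcg_id);

	if (rl->free == 0)
		return -1;
	rl->free--;
	return 0;
}
```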

    * Root blkcg uses the request_list embedded in each request_queue,
    which was renamed to @q->root_rl from @q->rq. While making blkcg rl
    handling a bit hairier, this avoids most of the overhead for the root
    blkcg.

    * Queue fullness is properly per request_list but bdi isn't blkcg
    aware yet, so congestion state currently just follows the root
    blkcg. As writeback isn't aware of blkcg yet, this works okay for
    async congestion but readahead may get the wrong signals. It's
    better than blkcg completely collapsing with shared request_list but
    needs to be improved with future changes.

    * After this change, each block cgroup gets a full request pool making
    resource consumption of each cgroup higher. This makes allowing
    non-root users to create cgroups less desirable; however, note that
    allowing non-root users to directly manage cgroups is already
    severely broken regardless of this patch - each block cgroup
    consumes kernel memory and skews IO weight (IO weights are not
    hierarchical).

    v2: queue-sysfs.txt updated and patch description updated as suggested
    by Vivek.

    v3: blk_get_rl() wasn't checking error return from
    blkg_lookup_create() and may cause oops on lookup failure. Fix it
    by falling back to root_rl on blkg lookup failures. This problem
    was spotted by Rakesh Iyer.

    v4: Updated to accommodate 458f27a982 "block: Avoid missed wakeup in
    request waitqueue". blk_drain_queue() now wakes up waiters on all
    blkg->rl on the target queue.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Cc: Wu Fengguang
    Signed-off-by: Jens Axboe

    Tejun Heo
     

25 Jun, 2012

1 commit

  • Make bio_blkcg() and friends inline. They all are very simple and
    used only in a few places.

    This patch is to prepare for further updates to request allocation
    path.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     

20 Apr, 2012

14 commits

  • blkg lookup is currently performed by traversing linked list anchored
    at blkcg->blkg_list. This is very unscalable and with blk-throttle
    enabled and enough request queues on the system, this can get very
    ugly quickly (blk-throttle performs a lookup on every bio submission).

    This patch makes blkcg use a radix tree to index blkgs, combined with
    a simple last-looked-up hint. This is mostly identical to how icqs are
    indexed from ioc.

    Note that because __blkg_lookup() may be invoked without holding queue
    lock, hint is only updated from __blkg_lookup_create(). Due to cfq's
    cfqq caching, this makes hint updates overly lazy. This will be
    improved with scheduled blkcg aware request allocation.
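    The hint-then-index lookup can be sketched as below. A
    direct-indexed array bounded by a hypothetical MAX_QUEUE_ID stands in
    for the radix tree, and queue ids, struct layouts and function names
    are illustrative. As in the commit, only the create path refreshes
    the hint; locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUE_ID 64 /* hypothetical bound; the commit uses a radix tree */

struct blkg {
	int q_id;
};

struct blkcg {
	struct blkg *index[MAX_QUEUE_ID]; /* stand-in for the radix tree */
	struct blkg *hint;                /* last looked-up group */
};

/* Try the hint, then the index. Never updates the hint. */
static struct blkg *blkg_lookup(struct blkcg *blkcg, int q_id)
{
	struct blkg *blkg = blkcg->hint;

	if (blkg && blkg->q_id == q_id)
		return blkg;
	if (q_id < 0 || q_id >= MAX_QUEUE_ID)
		return NULL;
	return blkcg->index[q_id];
}

/* Lookup-or-insert; this path is allowed to refresh the hint. */
static struct blkg *blkg_lookup_create(struct blkcg *blkcg,
				       struct blkg *new_blkg)
{
	struct blkg *blkg = blkg_lookup(blkcg, new_blkg->q_id);

	if (!blkg) {
		if (new_blkg->q_id < 0 || new_blkg->q_id >= MAX_QUEUE_ID)
			return NULL;
		blkcg->index[new_blkg->q_id] = new_blkg;
		blkg = new_blkg;
	}
	blkcg->hint = blkg;
	return blkg;
}
```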

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • There's no reason to keep blkcg_policy_ops separate. Collapse it into
    blkcg_policy.

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently blkg_policy_data carries policy specific data as char flex
    array instead of being embedded in policy specific data. This was
    forced by oddities around blkg allocation which are all gone now.

    This patch makes blkg_policy_data embedded in policy specific data -
    throtl_grp and cfq_group so that it's more conventional and consistent
    with how io_cq is handled.

    * blkcg_policy->pdata_size is renamed to ->pd_size.

    * Functions which used to take void *pdata now take struct
    blkg_policy_data *pd.

    * blkg_to_pdata/pdata_to_blkg() updated to blkg_to_pd/pd_to_blkg().

    * Dummy struct blkg_policy_data definition added. Dummy
    pdata_to_blkg() definition was unused and inconsistent with the
    non-dummy version - correct dummy pd_to_blkg() added.

    * throtl and cfq updated accordingly.

    * As dummy blkg_to_pd/pd_to_blkg() are provided,
    blkg_to_cfqg/cfqg_to_blkg() don't need to be ifdef'd. Moved outside
    ifdef block.

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • During the recent blkcg cleanup, most of blkcg API has changed to such
    extent that mass renaming wouldn't cause any noticeable pain. Take
    the chance to clean up the naming.

    * Rename blkio_cgroup to blkcg.

    * Drop blkio / blkiocg prefixes and consistently use blkcg.

    * Rename blkio_group to blkcg_gq, which is consistent with io_cq but
    keep the blkg prefix / variable name.

    * Rename policy method type and field names to signify they're dealing
    with policy data.

    * Rename blkio_policy_type to blkcg_policy.

    This patch doesn't cause any functional change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • * Update indentation on struct field declarations.

    * Uniformly don't use "extern" on function declarations.

    * Merge the two #ifdef CONFIG_BLK_CGROUP blocks.

    All changes in this patch are cosmetic.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blkio_group->path[] stores the path of the associated cgroup and is
    used only for debug messages. Just format the path from blkg->cgroup
    when printing debug messages.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
    blkg_rwstat_read() in blk-cgroup.h was missing the inline modifier,
    causing a compile warning depending on configuration. Add it.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
    * all_q_list is unused. Drop all_q_{mutex|list}.

    * @for_root of blkg_lookup_create() is always %false when called from
    outside blk-cgroup.c proper. Factor out __blkg_lookup_create() so
    that it doesn't check whether @q is bypassing and use the
    underscored version for the @for_root callsite.

    * blkg_destroy_all() is used only from blkcg proper and @destroy_root
    is always %true. Make it static and drop @destroy_root.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • All blkcg policies were assumed to be enabled on all request_queues.
    Due to various implementation obstacles, during the recent blkcg core
    updates, this was temporarily implemented as shooting down all !root
    blkgs on elevator switch and policy [de]registration combined with
    half-broken in-place root blkg updates. In addition to being buggy
    and racy, this meant losing all blkcg configurations across those
    events.

    Now that blkcg is cleaned up enough, this patch replaces the temporary
    implementation with proper per-queue policy activation. Each blkcg
    policy should call the new blkcg_[de]activate_policy() to enable and
    disable the policy on a specific queue. blkcg_activate_policy()
    allocates and installs policy data for the policy for all existing
    blkgs. blkcg_deactivate_policy() does the reverse. If a policy is
    not enabled for a given queue, blkg printing / config functions skip
    the respective blkg for the queue.

    blkcg_activate_policy() also takes care of root blkg creation, and
    cfq_init_queue() and blk_throtl_init() are updated accordingly.
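    Per-queue activation can be sketched with a bitmap of enabled
    policies on each queue. Everything here (MAX_POLS, MAX_BLKGS, struct
    pd, policy_enabled(), activate_policy(), deactivate_policy()) is
    hypothetical, root blkg creation and locking are left out, and the
    all-or-nothing rollback on allocation failure is a design choice of
    this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_POLS 2  /* hypothetical */
#define MAX_BLKGS 4 /* hypothetical */

struct pd {
	int plid;   /* hypothetical policy data */
};

struct blkg {
	struct pd *pd[MAX_POLS];
};

struct queue {
	unsigned long enabled; /* one bit per active policy */
	struct blkg *blkgs[MAX_BLKGS];
	int nr_blkgs;
};

/* Printing/config paths would skip a blkg when this returns false. */
static bool policy_enabled(const struct queue *q, int plid)
{
	return q->enabled & (1UL << plid);
}

/* Install policy data on every existing blkg, then mark it enabled.
 * On failure, undo the partial installation. */
static int activate_policy(struct queue *q, int plid)
{
	int i;

	if (policy_enabled(q, plid))
		return 0;
	for (i = 0; i < q->nr_blkgs; i++) {
		struct pd *pd = calloc(1, sizeof(*pd));

		if (!pd)
			goto rollback;
		pd->plid = plid;
		q->blkgs[i]->pd[plid] = pd;
	}
	q->enabled |= 1UL << plid;
	return 0;
rollback:
	while (i--) {
		free(q->blkgs[i]->pd[plid]);
		q->blkgs[i]->pd[plid] = NULL;
	}
	return -1;
}

/* The reverse: clear the bit and free each blkg's policy data. */
static void deactivate_policy(struct queue *q, int plid)
{
	q->enabled &= ~(1UL << plid);
	for (int i = 0; i < q->nr_blkgs; i++) {
		free(q->blkgs[i]->pd[plid]);
		q->blkgs[i]->pd[plid] = NULL;
	}
}
```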

    This makes blkcg_bypass_{start|end}() and update_root_blkg_pd()
    unnecessary. Dropped.

    v2: cfq_init_queue() was returning uninitialized @ret on root_group
    alloc failure if !CONFIG_CFQ_GROUP_IOSCHED. Fixed.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Add @pol to blkg_conf_prep() and let it return with queue lock held
    (to be released by blkg_conf_finish()). Note that @pol isn't used
    yet.

    This is to prepare for per-queue policy activation and doesn't cause
    any visible difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate
    @pol->plid dynamically on registration. The maximum number of blkcg
    policies which can be registered at the same time is defined by
    BLKCG_MAX_POLS constant added to include/linux/blkdev.h.

    Note that blkio_policy_register() now may fail. Policy init functions
    updated accordingly and unnecessary ifdefs removed from cfq_init().
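    Dynamic plid assignment amounts to claiming the first free slot in a
    fixed-size table. A hypothetical sketch: BLKCG_MAX_POLS is set to 2
    only for the example, -ENOSPC as the failure code is an assumption,
    struct policy and the function names are illustrative, and the mutex
    protecting [un]registration (blkcg_pol_mutex in a related commit) is
    omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BLKCG_MAX_POLS 2 /* value assumed for the example */

struct policy {
	const char *name;
	int plid;  /* assigned at registration, not hard-coded */
};

static struct policy *policies[BLKCG_MAX_POLS];

/* Claim the first free slot. Registration can now fail, so callers
 * must check the return value. */
static int policy_register(struct policy *pol)
{
	for (int i = 0; i < BLKCG_MAX_POLS; i++) {
		if (!policies[i]) {
			pol->plid = i;
			policies[i] = pol;
			return 0;
		}
	}
	return -ENOSPC;
}

static void policy_unregister(struct policy *pol)
{
	if (policies[pol->plid] == pol)
		policies[pol->plid] = NULL;
}
```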

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • The two functions were taking "enum blkio_policy_id plid". Make them
    take "const struct blkio_policy_type *pol" instead.

    This is to prepare for per-queue policy activation and doesn't cause
    any functional difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
    With blkio_policy[], blkio_list is redundant and hinders
    per-queue policy activation. Remove it. Also, replace
    blkio_list_lock with a mutex blkcg_pol_mutex and let it protect the
    whole [un]registration.

    This is to prepare for per-queue policy activation and doesn't cause
    any functional difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • * CFQ_WEIGHT_* defined inside CONFIG_BLK_CGROUP causes cfq-iosched.c
    compile failure when the config is disabled. Move it outside the
    ifdef block.

    * Dummy cfqg_stats_*() definitions were lacking inline modifiers
    causing unused function warnings if !CONFIG_CFQ_GROUP_IOSCHED. Add
    them.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     

02 Apr, 2012

3 commits