20 Apr, 2012

14 commits

  • blkg lookup is currently performed by traversing a linked list
    anchored at blkcg->blkg_list. This doesn't scale, and with
    blk-throttle enabled and enough request queues on the system, it can
    get ugly quickly (blk-throttle performs a lookup on every bio
    submission).

    This patch makes blkcg use a radix tree to index blkgs, combined
    with a simple last-looked-up hint. This is mostly identical to how
    icqs are indexed from ioc.

    Note that because __blkg_lookup() may be invoked without holding the
    queue lock, the hint is only updated from __blkg_lookup_create().
    Due to cfq's cfqq caching, this makes hint updates overly lazy. This
    will be improved with the scheduled blkcg-aware request allocation.
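
    The lookup pattern can be modeled in user space; the sketch below
    uses a flat array in place of the kernel's radix tree and keeps the
    same hint-first fast path (names and sizes here are illustrative,
    not the kernel API):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUES 64   /* stand-in capacity for the radix tree */

struct blkg { int q_id; };

struct blkcg {
    struct blkg *tree[MAX_QUEUES];  /* models radix_tree_root */
    struct blkg *hint;              /* last successful lookup */
};

static struct blkg *blkg_lookup(struct blkcg *blkcg, int q_id)
{
    struct blkg *blkg = blkcg->hint;

    if (blkg && blkg->q_id == q_id)
        return blkg;                /* fast path: hint hit */

    blkg = blkcg->tree[q_id];       /* slow path: tree lookup */
    if (blkg)
        blkcg->hint = blkg;         /* the kernel updates the hint only
                                     * from __blkg_lookup_create() */
    return blkg;
}
```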

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • There's no reason to keep blkcg_policy_ops separate. Collapse it into
    blkcg_policy.

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • Currently blkg_policy_data carries policy specific data as a char
    flex array instead of being embedded in the policy specific data.
    This was forced by oddities around blkg allocation, which are all
    gone now.

    This patch makes blkg_policy_data embedded in policy specific data -
    throtl_grp and cfq_group so that it's more conventional and consistent
    with how io_cq is handled.

    * blkcg_policy->pdata_size is renamed to ->pd_size.

    * Functions which used to take void *pdata now take struct
    blkg_policy_data *pd.

    * blkg_to_pdata/pdata_to_blkg() updated to blkg_to_pd/pd_to_blkg().

    * Dummy struct blkg_policy_data definition added. Dummy
    pdata_to_blkg() definition was unused and inconsistent with the
    non-dummy version - correct dummy pd_to_blkg() added.

    * throtl and cfq updated accordingly.

    * As dummy blkg_to_pd/pd_to_blkg() are provided,
    blkg_to_cfqg/cfqg_to_blkg() don't need to be ifdef'd. Moved outside
    ifdef block.
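
    The blkg_to_pd()/pd_to_blkg() style conversion comes down to plain
    container_of() arithmetic once the policy data is embedded; a
    stand-alone user-space sketch (struct layout and field names
    simplified, not the kernel's exact definitions):

```c
#include <assert.h>
#include <stddef.h>

struct blkg_policy_data { int plid; };

/* Policy-specific struct embedding the common policy data. */
struct throtl_grp {
    struct blkg_policy_data pd;
    unsigned long long bps;
};

/* container_of-style conversion: subtract the offset of the embedded
 * member to recover the enclosing struct. */
#define pd_to_tg(pdp) \
    ((struct throtl_grp *)((char *)(pdp) - offsetof(struct throtl_grp, pd)))

static struct blkg_policy_data *tg_to_pd(struct throtl_grp *tg)
{
    return &tg->pd;
}
```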

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • During the recent blkcg cleanup, most of the blkcg API has changed
    to such an extent that a mass renaming wouldn't cause any noticeable
    pain. Take the chance and clean up the naming.

    * Rename blkio_cgroup to blkcg.

    * Drop blkio / blkiocg prefixes and consistently use blkcg.

    * Rename blkio_group to blkcg_gq, which is consistent with io_cq but
    keep the blkg prefix / variable name.

    * Rename policy method type and field names to signify they're dealing
    with policy data.

    * Rename blkio_policy_type to blkcg_policy.

    This patch doesn't cause any functional change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • * Update indentation on struct field declarations.

    * Uniformly don't use "extern" on function declarations.

    * Merge the two #ifdef CONFIG_BLK_CGROUP blocks.

    All changes in this patch are cosmetic.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • blkio_group->path[] stores the path of the associated cgroup and is
    used only for debug messages. Just format the path from blkg->cgroup
    when printing debug messages.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • blkg_rwstat_read() in blk-cgroup.h was missing the inline modifier,
    causing a compile warning depending on configuration. Add it.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • * all_q_list is unused. Drop all_q_{mutex|list}.

    * @for_root of blkg_lookup_create() is always %false when called from
    outside blk-cgroup.c proper. Factor out __blkg_lookup_create() so
    that it doesn't check whether @q is bypassing and use the
    underscored version for the @for_root callsite.

    * blkg_destroy_all() is used only from blkcg proper and @destroy_root
    is always %true. Make it static and drop @destroy_root.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • All blkcg policies were assumed to be enabled on all request_queues.
    Due to various implementation obstacles, during the recent blkcg core
    updates, this was temporarily implemented as shooting down all !root
    blkgs on elevator switch and policy [de]registration combined with
    half-broken in-place root blkg updates. In addition to being buggy
    and racy, this meant losing all blkcg configurations across those
    events.

    Now that blkcg is cleaned up enough, this patch replaces the temporary
    implementation with proper per-queue policy activation. Each blkcg
    policy should call the new blkcg_[de]activate_policy() to enable and
    disable the policy on a specific queue. blkcg_activate_policy()
    allocates and installs policy data for the policy for all existing
    blkgs. blkcg_deactivate_policy() does the reverse. If a policy is
    not enabled for a given queue, blkg printing / config functions skip
    the respective blkg for the queue.

    blkcg_activate_policy() also takes care of root blkg creation, and
    cfq_init_queue() and blk_throtl_init() are updated accordingly.

    This makes blkcg_bypass_{start|end}() and update_root_blkg_pd()
    unnecessary. Dropped.

    v2: cfq_init_queue() was returning uninitialized @ret on root_group
    alloc failure if !CONFIG_CFQ_GROUP_IOSCHED. Fixed.
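
    The activation state can be pictured as a per-queue flag set that
    the printing / config paths consult; a minimal user-space model (the
    real functions also allocate and free policy data for every blkg on
    the queue, which is omitted here):

```c
#include <assert.h>

/* Model: one bit per policy id on each queue. */
struct request_queue { unsigned int pol_enabled; };

static int blkcg_policy_enabled(struct request_queue *q, int plid)
{
    return (q->pol_enabled >> plid) & 1u;
}

static int blkcg_activate_policy(struct request_queue *q, int plid)
{
    /* kernel: allocate pd for all existing blkgs, create root blkg */
    q->pol_enabled |= 1u << plid;
    return 0;
}

static void blkcg_deactivate_policy(struct request_queue *q, int plid)
{
    /* kernel: free pd from all blkgs */
    q->pol_enabled &= ~(1u << plid);
}
```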

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • Add @pol to blkg_conf_prep() and let it return with queue lock held
    (to be released by blkg_conf_finish()). Note that @pol isn't used
    yet.

    This is to prepare for per-queue policy activation and doesn't cause
    any visible difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate
    @pol->plid dynamically on registration. The maximum number of blkcg
    policies which can be registered at the same time is defined by
    BLKCG_MAX_POLS constant added to include/linux/blkdev.h.

    Note that blkio_policy_register() now may fail. Policy init functions
    updated accordingly and unnecessary ifdefs removed from cfq_init().
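
    Dynamic plid assignment boils down to scanning a fixed slot array on
    registration; a hedged sketch (array name, error code, and the
    missing registration mutex are simplifications):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BLKCG_MAX_POLS 2   /* as added to include/linux/blkdev.h */

struct blkcg_policy { int plid; };

static struct blkcg_policy *blkcg_pols[BLKCG_MAX_POLS];

/* Assign the first free slot index as the policy id; fail when all
 * BLKCG_MAX_POLS slots are taken, which is why registration can now
 * return an error. */
static int blkcg_policy_register(struct blkcg_policy *pol)
{
    int i;

    for (i = 0; i < BLKCG_MAX_POLS; i++) {
        if (!blkcg_pols[i]) {
            pol->plid = i;
            blkcg_pols[i] = pol;
            return 0;
        }
    }
    return -ENOSPC;   /* illustrative errno */
}
```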

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • The two functions were taking "enum blkio_policy_id plid". Make them
    take "const struct blkio_policy_type *pol" instead.

    This is to prepare for per-queue policy activation and doesn't cause
    any functional difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • With blkio_policy[], blkio_list is redundant and gets in the way of
    per-queue policy activation. Remove it. Also, replace
    blkio_list_lock with a mutex blkcg_pol_mutex and let it protect the
    whole [un]registration.

    This is to prepare for per-queue policy activation and doesn't cause
    any functional difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • * CFQ_WEIGHT_* defined inside CONFIG_BLK_CGROUP causes cfq-iosched.c
    compile failure when the config is disabled. Move it outside the
    ifdef block.

    * Dummy cfqg_stats_*() definitions were lacking inline modifiers,
    causing unused-function warnings if !CONFIG_CFQ_GROUP_IOSCHED. Add
    them.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe


02 Apr, 2012

18 commits

  • Now that all stat handling code lives in policy implementations,
    there's no need to encode policy ID in cft->private.

    * Export blkcg_prfill_[rw]stat() from blkcg, remove
    blkcg_print_[rw]stat(), and implement cfqg_print_[rw]stat() which
    hard-code BLKIO_POLICY_PROP.

    * Use cft->private for offset of the target field directly and drop
    BLKCG_STAT_{PRIV|POL|OFF}().

    Signed-off-by: Tejun Heo

  • Now that all conf and stat fields are moved into policy specific
    blkio_policy_data->pdata areas, there's no reason to use
    blkio_policy_data itself in prfill functions. Pass around @pd->pdata
    instead of @pd.

    Signed-off-by: Tejun Heo

  • blkio_group_conf->iops and ->bps are owned by blk-throttle and have
    no reason to be defined in blkcg core. Drop them and let conf setting
    functions directly manipulate throtl_grp->bps[] and ->iops[].

    This makes blkio_group_conf empty. Drop it.

    Signed-off-by: Tejun Heo

  • blkio_group_conf->weight is owned by cfq and has no reason to be
    defined in blkcg core. Replace it with cfq_group->dev_weight and let
    conf setting functions directly set it. If dev_weight is zero, the
    cfqg doesn't have device specific weight configured.

    Also, rename BLKIO_WEIGHT_* constants to CFQ_WEIGHT_* and rename
    blkio_cgroup->weight to blkio_cgroup->cfq_weight. We eventually want
    per-policy storage in blkio_cgroup but just mark the ownership of the
    field for now.

    Signed-off-by: Tejun Heo

  • blkio_group_stats_cpu is used only by blk-throtl and has no reason to
    be defined in blkcg core.

    * Move blkio_group_stats_cpu to blk-throttle.c and rename it to
    tg_stats_cpu.

    * blkg_policy_data->stats_cpu is replaced with throtl_grp->stats_cpu.
    prfill functions updated accordingly.

    * All related macros / functions are renamed so that they have tg_
    prefix and the unnecessary @pol arguments are dropped.

    * Per-cpu stats allocation code is also moved from blk-cgroup.c to
    blk-throttle.c and gets simplified to only deal with
    BLKIO_POLICY_THROTL. percpu stat free is performed by the exit
    method throtl_exit_blkio_group().

    * throtl_reset_group_stats() implemented for
    blkio_reset_group_stats_fn method so that tg->stats_cpu can be
    reset.

    Signed-off-by: Tejun Heo

  • blkio_group_stats contains only fields used by cfq and has no reason
    to be defined in blkcg core.

    * Move blkio_group_stats to cfq-iosched.c and rename it to cfqg_stats.

    * blkg_policy_data->stats is replaced with cfq_group->stats.
    blkg_prfill_[rw]stat() are updated to use offset against pd->pdata
    instead.

    * All related macros / functions are renamed so that they have cfqg_
    prefix and the unnecessary @pol arguments are dropped.

    * All stat functions now take cfq_group * instead of blkio_group *.

    * lockdep assertion on queue lock dropped. Elevator runs under queue
    lock by default. There isn't much to be gained by adding lockdep
    assertions at stat function level.

    * cfqg_stats_reset() implemented for blkio_reset_group_stats_fn method
    so that cfqg->stats can be reset.

    Signed-off-by: Tejun Heo

  • Add blkio_policy_ops->blkio_exit_group_fn() and
    ->blkio_reset_group_stats_fn(). These will be used to further
    modularize blkcg policy implementation.

    Signed-off-by: Tejun Heo

  • blkio_group_stats_cpu is used to count dispatch stats using per-cpu
    counters. This is used by both blk-throtl and cfq-iosched but the
    sharing is rather silly.

    * cfq-iosched doesn't need per-cpu dispatch stats. cfq always updates
    those stats while holding queue_lock.

    * blk-throtl needs per-cpu dispatch stats but only service_bytes and
    serviced. It doesn't make use of sectors.

    This patch makes cfq add and use global stats for service_bytes,
    serviced and sectors, removes per-cpu sectors counter and moves
    per-cpu stat printing code to blk-throttle.c.

    Signed-off-by: Tejun Heo

  • As with the conf/stat file handling code, there's no reason for stat
    update code to live in blkcg core with policies calling in to update
    them. The current organization is both inflexible and complex.

    This patch moves stat update code to specific policies. All
    blkiocg_update_*_stats() functions which deal with BLKIO_POLICY_PROP
    stats are collapsed into their cfq_blkiocg_update_*_stats()
    counterparts. blkiocg_update_dispatch_stats() is used by both
    policies and duplicated as throtl_update_dispatch_stats() and
    cfq_blkiocg_update_dispatch_stats(). This will be cleaned up later.

    Signed-off-by: Tejun Heo

  • blkcg conf/stat handling is convoluted in that details which belong to
    specific policy implementations are all out in blkcg core and then
    policies hook into core layer to access and manipulate confs and
    stats. This sadly achieves both inflexibility (confs/stats can't be
    modified without messing with blkcg core) and complexity (all the
    call-ins and call-backs).

    The previous patches restructured conf and stat handling code such
    that they can be separated out. This patch relocates the file
    handling part. All conf/stat file handling code which belongs to
    BLKIO_POLICY_PROP is moved to cfq-iosched.c and all
    BLKIO_POLICY_THROTL code to blk-throttle.c.

    The move is verbatim except for the
    blkio_update_group_{weight|bps|iops}() callbacks, which relay conf
    changes to policies. The configuration
    settings are handled in policies themselves so the relaying isn't
    necessary. Conf setting functions are modified to directly call
    per-policy update functions and the relaying mechanism is dropped.

    Signed-off-by: Tejun Heo

  • Add blkiop->cftypes which is added and removed together with the
    policy. This will be used to move conf/stat handling to the policies.

    Signed-off-by: Tejun Heo

  • conf/stat handling is about to be moved to policy implementation from
    blkcg core. Export conf/stat helpers from blkcg core so that
    blk-throttle and cfq-iosched can use them.

    Signed-off-by: Tejun Heo

  • As part of userland interface restructuring, this patch updates
    per-blkio_group configuration setting. Instead of funneling
    everything through a master function which has hard-coded cases for
    each config file it may handle, the common part is factored into
    blkg_conf_prep() and blkg_conf_finish() and different configuration
    setters are implemented using the helpers.

    While this doesn't result in immediate LOC reduction, this enables
    further cleanups and more modular implementation.
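
    The prep/finish split lets each setter parse its own value while the
    boilerplate lives in the helpers; a user-space approximation of the
    bracket (the real prep also resolves the device to a blkg and
    returns with the queue lock held, which finish drops):

```c
#include <assert.h>
#include <stdio.h>

struct blkg_conf_ctx { int major, minor; unsigned long long v; };

/* Parse "MAJ:MIN VAL" as written to the cgroupfs file. */
static int blkg_conf_prep(const char *input, struct blkg_conf_ctx *ctx)
{
    if (sscanf(input, "%d:%d %llu", &ctx->major, &ctx->minor, &ctx->v) != 3)
        return -1;   /* kernel: -EINVAL */
    return 0;
}

static void blkg_conf_finish(struct blkg_conf_ctx *ctx)
{
    (void)ctx;       /* kernel: unlock queue, rcu_read_unlock() */
}
```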

    Signed-off-by: Tejun Heo

  • Similarly to the previous stat restructuring, this patch restructures
    conf printing code such that,

    * Conf printing uses the same helpers as stat.

    * Printing function doesn't require hardcoded switching on the config
    being printed. Note that this isn't complete yet for throttle
    confs. The next patch will convert setting for these confs and will
    complete the transition.

    * Printing uses read_seq_string callback (other methods will be phased
    out).

    Note that blkio_group_conf.iops[2] is changed to u64 so that it can
    be manipulated with the same functions. This is transitional and
    will go away later.

    After this patch, per-device configurations - weight, bps and iops -
    use __blkg_prfill_u64() for printing which uses white space as
    delimiter instead of tab.

    Signed-off-by: Tejun Heo

  • blkcg stats handling is a mess. None of the stats has much to do with
    blkcg core but they are all implemented in blkcg core. Code sharing
    is achieved by mixing common code with hard-coded cases for each stat
    counter.

    This patch restructures statistics printing such that

    * Common logic exists as helper functions and specific print functions
    use the helpers to implement specific cases.

    * Printing functions serving multiple counters don't require hardcoded
    switching on specific counters.

    * Printing uses read_seq_string callback (other methods will be phased
    out).

    This change enables further cleanups and relocating stats code to the
    policy implementation it belongs to.

    Signed-off-by: Tejun Heo

  • blkcg uses u64_stats_sync to avoid reading wrong u64 statistic values
    on 32bit archs and some stat counters have subtypes to distinguish
    read/writes and sync/async IOs. The stat code paths are confusing and
    involve a lot of going back and forth between blkcg core and specific
    policy implementations, and synchronization and subtype handling are
    open coded in blkcg core.

    This patch introduces struct blkg_stat and blkg_rwstat which, with
    accompanying operations, encapsulate stat updating and accessing with
    proper synchronization.

    blkg_stat is a simple u64 counter with 64bit read-access protection.
    blkg_rwstat is the one with rw and [a]sync subcounters and takes @rw
    flags to distinguish IO subtypes (%REQ_WRITE and %REQ_SYNC),
    replacing the stat_sub_type indexed arrays.

    All counters in blkio_group_stats and blkio_group_stats_cpu are
    replaced with either blkg_stat or blkg_rwstat along with all users.

    This does add one u64_stats_sync per counter and increases
    stats_sync operations, but they're empty/noops on 64bit archs and
    blkcg doesn't have too many counters, especially with
    DEBUG_BLK_CGROUP off.

    While the currently resulting code isn't necessarily simpler at the
    moment, this will enable further clean up of blkcg stats code.

    - BLKIO_STAT_{READ|WRITE|SYNC|ASYNC|TOTAL} renamed to
    BLKG_RWSTAT_{READ|WRITE|SYNC|ASYNC|TOTAL}.

    - blkg_stat_add() replaces blkio_add_stat() and
    blkio_check_and_dec_stat(). Note that BUG_ON() on underflow in the
    latter function no longer exists. It's *way* better to have
    underflowed stat counters than oopsing.

    - blkio_group_stats->dequeue is now a proper u64 stat counter instead
    of ulong.

    - reset_stats() updated to clear each stat counters individually and
    BLKG_STATS_DEBUG_CLEAR_{START|SIZE} are removed.

    - Some functions reconstruct rw flags from direction and sync
    booleans. This will be removed by future patches.
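
    The @rw-flag dispatch into the four subcounters can be sketched in
    user space as follows (the bit values and the missing u64_stats_sync
    are simplifications):

```c
#include <assert.h>

#define REQ_WRITE (1u << 0)   /* illustrative bit values */
#define REQ_SYNC  (1u << 1)

enum {
    BLKG_RWSTAT_READ,
    BLKG_RWSTAT_WRITE,
    BLKG_RWSTAT_SYNC,
    BLKG_RWSTAT_ASYNC,
    BLKG_RWSTAT_NR,
};

struct blkg_rwstat {
    unsigned long long cnt[BLKG_RWSTAT_NR];
};

/* Each event bumps exactly one of {READ,WRITE} and one of
 * {SYNC,ASYNC}, selected from the @rw flags. */
static void blkg_rwstat_add(struct blkg_rwstat *s, unsigned int rw,
                            unsigned long long val)
{
    s->cnt[(rw & REQ_WRITE) ? BLKG_RWSTAT_WRITE : BLKG_RWSTAT_READ] += val;
    s->cnt[(rw & REQ_SYNC) ? BLKG_RWSTAT_SYNC : BLKG_RWSTAT_ASYNC] += val;
}
```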

    Signed-off-by: Tejun Heo

  • BLKIO_STAT_CPU_SECTORS doesn't need read/write/sync/async subcounters
    and is counted by blkio_group_stats_cpu->sectors; however, it still
    holds a member in blkio_group_stats_cpu->stat_arr_cpu.

    Rearrange stat_type_cpu and define BLKIO_STAT_CPU_ARR_NR and use it
    for stat_arr_cpu[] size so that only SERVICE_BYTES and SERVICED have
    subcounters.

    Signed-off-by: Tejun Heo

  • @pol to blkg_to_pdata() and @plid to blkg_lookup_create() are no
    longer necessary. Drop them.

    Signed-off-by: Tejun Heo


20 Mar, 2012

6 commits

  • Add 64bit unique id to blkcg. This will be used by policies which
    want blkcg identity test to tell whether the associated blkcg has
    changed.
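
    The id only needs to be unique and monotonic; a one-line user-space
    model (the kernel would hand this out from an atomic counter):

```c
#include <assert.h>

static unsigned long long blkcg_id_seq;

/* Hand out a new 64bit id at blkcg creation; a policy caches the id
 * and compares it later to detect that the associated blkcg changed. */
static unsigned long long blkcg_alloc_id(void)
{
    return ++blkcg_id_seq;   /* kernel: atomic64_add_return() */
}
```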

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • With recent plug merge updates, all non-percpu stat updates happen
    under queue_lock making stats_lock unnecessary to synchronize stat
    updates. The only synchronization necessary is stat reading, which
    can be done using u64_stats_sync instead.

    This patch removes blkio_group->stats_lock and adds
    blkio_group_stats->syncp for reader synchronization.
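
    The reader-side retry loop behind u64_stats_sync looks roughly like
    a seqcount; a single-threaded user-space model (real kernels compile
    this away on 64bit, where 64bit loads can't tear):

```c
#include <assert.h>

struct stats {
    unsigned int seq;          /* models u64_stats_sync */
    unsigned long long bytes;
};

static void stats_update(struct stats *s, unsigned long long delta)
{
    s->seq++;                  /* odd: update in flight */
    s->bytes += delta;
    s->seq++;                  /* even: update done */
}

static unsigned long long stats_read(const struct stats *s)
{
    unsigned int start;
    unsigned long long v;

    do {
        start = s->seq;
        v = s->bytes;
    } while ((start & 1) || start != s->seq);   /* retry torn reads */
    return v;
}
```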

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • Restructure blkio_get_stat() to prepare for removal of stats_lock.

    * Define BLKIO_STAT_ARR_NR explicitly to denote which stats have
    subtypes instead of using BLKIO_STAT_QUEUED.

    * Separate out stat acquisition and printing. After this, there are
    only two users of blkio_fill_stat(). Just open code it.

    * The code was mixing MAX_KEY_LEN and MAX_KEY_LEN - 1. There's no
    need to subtract one. Use MAX_KEY_LEN consistently.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • blkiocg_reset_stats() implements stat reset for blkio.reset_stats
    cgroupfs file. This feature is very unconventional and something
    which shouldn't have been merged. It's only useful when there's only
    one user or tool looking at the stats. As soon as multiple users
    and/or tools are involved, it becomes useless as resetting disrupts
    other usages. There are very good reasons why all other stats expect
    readers to read values at the start and end of a period and subtract
    to determine delta over the period.

    The implementation is rather complex - some fields shouldn't be
    cleared, so it saves those fields, resets the whole struct and then
    restores them, for some reason. Reset of percpu stats is also racy.
    The comment points to
    64bit store atomicity for the reason but even without that stores for
    zero can simply race with other CPUs doing RMW and get clobbered.

    Simplify reset by

    * Clear selectively instead of resetting and restoring.

    * Grouping debug stat fields to be reset and using memset() over them.

    * Not caring about stats_lock.

    * Using memset() to reset percpu stats.
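
    Grouping the to-be-cleared fields contiguously is what makes the
    memset() approach work; a sketch with made-up field names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct group_stats {
    unsigned long long time;             /* survives reset */
    /* --- fields below are cleared together --- */
    unsigned long long unaccounted_time;
    unsigned long long avg_queue_size;
    unsigned long long dequeue;
};

#define CLEAR_START offsetof(struct group_stats, unaccounted_time)
#define CLEAR_SIZE  (sizeof(struct group_stats) - CLEAR_START)

/* Selective clear: one memset over the grouped range instead of
 * save / reset-everything / restore. */
static void stats_reset(struct group_stats *st)
{
    memset((char *)st + CLEAR_START, 0, CLEAR_SIZE);
}
```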

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • With recent plug merge updates, merged stats are no longer updated
    on plug merges and are now only updated while holding queue_lock.
    As stats_lock is scheduled to be removed, there's no reason to use
    percpu for merged stats. Don't use percpu for merged stats.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • The current per-cpu stat allocation assumes the GFP_KERNEL
    allocation flag, but in the IO path there are times when we want
    GFP_NOIO semantics. As there is no way to pass allocation flags to
    alloc_percpu(), this patch delays the allocation of stats using a
    worker thread.

    v2 -> Tejun suggested the following changes. Changed the patch accordingly.
    - move alloc_node location in structure
    - reduce the size of names of some of the fields
    - Reduce the scope of locking of alloc_list_lock
    - Simplified stat_alloc_fn() by allocating stats for all
    policies in one go and then assigning these to a group.

    v3 -> Andrew suggested putting some comments in the code. He also
    raised concerns about trying to allocate infinitely in case of allocation
    failure. I have changed the logic to sleep for 10ms before retrying.
    That should take care of non-preemptible UP kernels.

    v4 -> Tejun had more suggestions.
    - drop list_for_each_entry_all()
    - instead of msleep() use queue_delayed_work()
    - Some cleanups related to more compact coding.

    v5 -> Tejun suggested more cleanups leading to more compact code.

    tj: - Relocated pcpu_stats into blkio_stat_alloc_fn().
    - Minor comment update.
    - This also fixes suspicious RCU usage warning caused by invoking
    cgroup_path() from blkg_alloc() without holding RCU read lock.
    Now that blkg_alloc() doesn't require sleepable context, RCU
    read lock from blkg_lookup_create() is maintained throughout
    blkg_alloc().

    Signed-off-by: Vivek Goyal
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe


07 Mar, 2012

2 commits

  • Implement bio_blkio_cgroup() which returns the blkcg associated
    with the bio if it exists, or %current's blkcg otherwise, and use it
    in blk-throttle and cfq-iosched propio. This makes both cgroup
    policies honor task
    association for the bio instead of always assuming %current.

    As nobody is using bio_set_task() yet, this doesn't introduce any
    behavior change.
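
    The fallback logic is a two-step lookup; a user-space sketch (field
    and parameter names assumed, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

struct blkcg { int id; };
struct bio   { struct blkcg *bi_blkcg; };   /* assumed field name */

/* Prefer the blkcg the bio was explicitly associated with via
 * bio_set_task()-style tagging; otherwise fall back to the submitting
 * task's blkcg (passed in here in place of %current). */
static struct blkcg *bio_blkio_cgroup(struct bio *bio,
                                      struct blkcg *task_blkcg)
{
    if (bio && bio->bi_blkcg)
        return bio->bi_blkcg;
    return task_blkcg;
}
```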

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

  • Now that blkg additions / removals are always done under both q and
    blkcg locks, the only places RCU locking is necessary are
    blkg_lookup[_create]() for lookup w/o blkcg lock. This patch drops
    unnecessary RCU locking, replacing it with plain blkcg locking as
    necessary.

    * blkiocg_pre_destroy() already performs proper locking and doesn't
    need RCU. Dropped.

    * blkio_read_blkg_stats() now uses blkcg->lock instead of RCU read
    lock. This isn't a hot path.

    * The now-unnecessary synchronize_rcu() in queue exit paths is
    removed. This makes q->nr_blkgs unnecessary. Dropped.

    * RCU annotation on blkg->q removed.

    -v2: Vivek pointed out that blkg_lookup_create() still needs to be
    called under rcu_read_lock(). Updated.

    -v3: After the update, stats_lock locking in blkio_read_blkg_stats()
    shouldn't be using _irq variant as it otherwise ends up enabling
    irq while blkcg->lock is locked. Fixed.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe
