07 Mar, 2012

34 commits

  • Currently ioc->nr_tasks is used to decide two things - whether an ioc
    is done issuing IOs and whether it's shared by multiple tasks. This
    patch separates out the first into ioc->active_ref, which is acquired
    and released using {get|put}_io_context_active() respectively.

    This will be used to associate bio's with a given task. This patch
    doesn't introduce any visible behavior change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • ioc_task_link() is used to share %current's ioc on clone. If
    %current->io_context is set, %current is guaranteed to have refcount
    on the ioc and, thus, ioc_task_link() can't fail.

    Replace error checking in ioc_task_link() with WARN_ON_ONCE() and make
    it just increment refcount and nr_tasks.

    -v2: Description typo fix (Vivek).

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Make the following interface updates to prepare for future ioc related
    changes.

    * create_io_context() returning ioc only works for %current because it
    doesn't increment ref on the ioc. Drop @task parameter from it and
    always assume %current.

    * Make create_io_context_slowpath() return 0 or -errno and rename it
    to create_task_io_context().

    * Make ioc_create_icq() take @ioc as parameter instead of assuming
    that of %current. The caller, get_request(), is updated to create
    ioc explicitly and then pass it into ioc_create_icq().

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
get_request() is structured a bit unusually in that the failure path is
    inlined in the usual flow with goto labels atop and inside it.
    Relocate the error path to the end of the function.

    This is to prepare for icq handling changes in get_request() and
    doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Now that blkg additions / removals are always done under both q and
    blkcg locks, the only places RCU locking is necessary are
    blkg_lookup[_create]() for lookup w/o blkcg lock. This patch drops
unnecessary RCU locking, replacing it with plain blkcg locking as
    necessary.

* blkiocg_pre_destroy() already performs proper locking and doesn't need
    RCU. Dropped.

    * blkio_read_blkg_stats() now uses blkcg->lock instead of RCU read
    lock. This isn't a hot path.

    * Now unnecessary synchronize_rcu() from queue exit paths removed.
    This makes q->nr_blkgs unnecessary. Dropped.

    * RCU annotation on blkg->q removed.

    -v2: Vivek pointed out that blkg_lookup_create() still needs to be
    called under rcu_read_lock(). Updated.

    -v3: After the update, stats_lock locking in blkio_read_blkg_stats()
    shouldn't be using _irq variant as it otherwise ends up enabling
    irq while blkcg->lock is locked. Fixed.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blkgs are chained from both blkcgs and request_queues and thus
    subjected to two locks - blkcg->lock and q->queue_lock. As both blkcg
    and q can go away anytime, locking during removal is tricky. It's
    currently solved by wrapping removal inside RCU, which makes the
    synchronization complex. There are three locks to worry about - the
    outer RCU, q lock and blkcg lock, and it leads to nasty subtle
    complications like conditional synchronize_rcu() on queue exit paths.

    For all other paths, blkcg lock is naturally nested inside q lock and
    the only exception is blkcg removal path, which is a very cold path
    and can be implemented as clumsy but conceptually-simple reverse
    double lock dancing.

    This patch updates blkg removal path such that blkgs are removed while
    holding both q and blkcg locks, which is trivial for request queue
    exit path - blkg_destroy_all(). The blkcg removal path,
    blkiocg_pre_destroy(), implements reverse double lock dancing
    essentially identical to ioc_release_fn().

    This simplifies blkg locking - no half-dead blkgs to worry about. Now
    unnecessary RCU annotations will be removed by the next patch.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, blkg is per cgroup-queue-policy combination. This is
    unnatural and leads to various convolutions in partially used
    duplicate fields in blkg, config / stat access, and general management
    of blkgs.

This patch makes blkg's per cgroup-queue and lets them serve all
    policies. blkgs are now created and destroyed by blkcg core proper.
    This will allow further consolidation of common management logic into
    blkcg core and API with better defined semantics and layering.

    As a transitional step to untangle blkg management, elvswitch and
    policy [de]registration, all blkgs except the root blkg are being shot
    down during elvswitch and bypass. This patch adds blkg_root_update()
    to update root blkg in place on policy change. This is hacky and racy
    but should be good enough as interim step until we get locking
    simplified and switch over to proper in-place update for all blkgs.

    -v2: Root blkgs need to be updated on elvswitch too and blkg_alloc()
    comment wasn't updated according to the function change. Fixed.
    Both pointed out by Vivek.

    -v3: v2 updated blkg_destroy_all() to invoke update_root_blkg_pd() for
    all policies. This freed root pd during elvswitch before the
    last queue finished exiting and led to oops. Directly invoke
    update_root_blkg_pd() only on BLKIO_POLICY_PROP from
    cfq_exit_queue(). This also is closer to what will be done with
    proper in-place blkg update. Reported by Vivek.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • With the previous patch to move blkg list heads and counters to
request_queue and blkg, the logic to manage them in both policies is
    almost identical and can be moved to blkcg core.

    This patch moves blkg link logic into blkg_lookup_create(), implements
    common blkg unlink code in blkg_destroy(), and updates
blkg_destroy_all() so that it's policy specific and can skip the root
group. The updated blkg_destroy_all() is now used both to clear the
queue for bypassing and elv switching, and to release all blkgs on q
exit.

    This patch introduces a race window where policy [de]registration may
    race against queue blkg clearing. This can only be a problem on cfq
    unload and shouldn't be a real problem in practice (and we have many
    other places where this race already exists). Future patches will
    remove these unlikely races.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, specific policy implementations are responsible for
    maintaining list and number of blkgs. This duplicates code
    unnecessarily, and hinders factoring common code and providing blkcg
    API with better defined semantics.

    After this patch, request_queue hosts list heads and counters and blkg
    has list nodes for both policies. This patch only relocates the
    necessary fields and the next patch will actually move management code
    into blkcg core.

    Note that request_queue->blkg_list[] and ->nr_blkgs[] are hardcoded to
    have 2 elements. This is to avoid include dependency and will be
    removed by the next patch.

    This patch doesn't introduce any behavior change.

    -v2: Now unnecessary conditional on CONFIG_BLK_CGROUP_MODULE removed
    as pointed out by Vivek.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blkg is scheduled to be unified for all policies and thus there won't
    be one-to-one mapping from blkg to policy. Update stat related
    functions to take explicit @pol or @plid arguments and not use
    blkg->plid.

This is painful for now but most of the specific stat interface
functions will be replaced with a handful of generic helpers.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • To prepare for unifying blkgs for different policies, make blkg->pd an
    array with BLKIO_NR_POLICIES elements and move blkg->conf, ->stats,
    and ->stats_cpu into blkg_policy_data.

    This patch doesn't introduce any functional difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, blkcg policy implementations manage blkg refcnt duplicating
    mostly identical code in both policies. This patch moves refcnt to
blkg and lets blkcg core handle refcnt and freeing of blkgs.

    * cfq blkgs now also get freed via RCU.

    * cfq blkgs lose RB_EMPTY_ROOT() sanity check on blkg free. If
    necessary, we can add blkio_exit_group_fn() to resurrect this.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
• Currently, blkg's are embedded in blkcg policy private data
    structures and thus allocated and freed by policies. This leads to
    duplicate code in policies, hinders implementing common parts in
    blkcg core with strong semantics, and forces duplicate blkg's for the
    same cgroup-q association.

    This patch introduces struct blkg_policy_data which is a separate data
structure chained from blkg. A policy specifies the amount of private
    data it needs in its blkio_policy_type->pdata_size and blkcg core
    takes care of allocating them along with blkg which can be accessed
    using blkg_to_pdata(). blkg can be determined from pdata using
    pdata_to_blkg(). blkio_alloc_group_fn() method is accordingly updated
    to blkio_init_group_fn().

    For consistency, tg_of_blkg() and cfqg_of_blkg() are replaced with
    blkg_to_tg() and blkg_to_cfqg() respectively, and functions to map in
    the reverse direction are added.

    Except that policy specific data now lives in a separate data
    structure from blkg, this patch doesn't introduce any functional
    difference.

    This will be used to unify blkg's for different policies.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
• Keep track of all request_queues which have blkcg initialized and,
    before making changes to blkcg policies, turn on bypass and invoke
    blkcg_clear_queue() on all of them.

    This is to prepare for moving blkg management into blkcg core. Note
    that this uses more brute force than necessary. Finer grained shoot
    down will be implemented later and given that policy [un]registration
    almost never happens on running systems (blk-throtl can't be built as
    a module and cfq usually is the builtin default iosched), this
    shouldn't be a problem for the time being.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently block core calls directly into blk-throttle for init, drain
    and exit. This patch adds blkcg_{init|drain|exit}_queue() which wraps
    the blk-throttle functions. This is to give more control and
visibility to the blkcg core layer for proper layering. Further patches
    will add logic common to blkcg policies to the functions.

    While at it, collapse blk_throtl_release() into blk_throtl_exit().
    There's no reason to keep them separate.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, blkg points to the associated blkcg via its css_id. This
    unnecessarily complicates dereferencing blkcg. Let blkg hold a
    reference to the associated blkcg and point directly to it and disable
    css_id on blkio_subsys.

    This change requires splitting blkiocg_destroy() into
    blkiocg_pre_destroy() and blkiocg_destroy() so that all blkg's can be
    destroyed and all the blkcg references held by them dropped during
    cgroup removal.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blk-cgroup printing code currently assumes that there is a device/disk
associated with every queue in the system, but modules like floppy
    can instantiate request queues without registering a disk, which can
    lead to an oops.

Skip queues/blkgs which don't have a dev/disk associated with them.

    -tj: Factored out backing_dev_info check into blkg_dev_name().

    Signed-off-by: Vivek Goyal
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • blkg->dev is dev_t recording the device number of the block device for
    the associated request_queue. It is used to identify the associated
    block device when printing out configuration or stats.

This is redundant to begin with. A blkg is an association between a
    cgroup and a request_queue; it is of course possible to reach the
    request_queue from the blkg, and synchronization conventions are in
    place for safe q dereferencing, so this shouldn't have been necessary
    in the first place. Furthermore, it's initialized by sscanf()ing the
    device name of backing_dev_info. The mind boggles.

    Anyways, if blkg is visible under rcu lock, we *know* that the
    associated request_queue hasn't gone away yet and its bdi is
    registered and alive - blkg can't be created for request_queue which
    hasn't been fully initialized and it can't go away before blkg is
    removed.

    Let stat and conf read functions get device name from
    blkg->q->backing_dev_info.dev and pass it down to printing functions
    and remove blkg->dev.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Now that blkcg configuration lives in blkg's, blkio_policy_node is no
    longer necessary. Kill it.

blkio_policy_parse_and_set() now fails if invoked for a missing device
    and functions to print out configurations are updated to print from
    blkg's.

    cftype_blkg_same_policy() is dropped along with other policy functions
    for consistency. Its one line is open coded in the only user -
    blkio_read_blkg_stats().

    -v2: Update to reflect the retry-on-bypass logic change of the
    previous patch.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blkcg is very peculiar in that it allows setting and remembering
    configurations for non-existent devices by maintaining separate data
    structures for configuration.

    This behavior is completely out of the usual norms and outright
    confusing; furthermore, it uses dev_t number to match the
    configuration to devices, which is unpredictable to begin with and
becomes completely unusable if EXT_DEVT is fully used.

It is wholly unnecessary - we already have a fully functional userland
    mechanism to program devices being hotplugged which has full access to
    device identification, connection topology and filesystem information.

Add a new struct blkio_group_conf containing all blkcg
    configurations to blkio_group and let blkio_group, which can be
    created iff the associated device exists and is removed when the
    associated device goes away, carry all configurations.

    Note that, after this patch, all newly created blkg's will always have
    the default configuration (unlimited for throttling and blkcg's weight
    for propio).

    This patch makes blkio_policy_node meaningless but doesn't remove it.
    The next patch will.

    -v2: Updated to retry after short sleep if blkg lookup/creation failed
    due to the queue being temporarily bypassed as indicated by
    -EBUSY return. Pointed out by Vivek.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Cc: Kay Sievers
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently both blk-throttle and cfq-iosched implement their own
    blkio_group creation code in throtl_get_tg() and cfq_get_cfqg(). This
    patch factors out the common code into blkg_lookup_create(), which
    returns ERR_PTR value so that transitional failures due to queue
    bypass can be distinguished from other failures.

* New blkio_policy_ops methods blkio_alloc_group_fn() and
    blkio_link_group_fn() added. Both are transitional and will be
    removed once the blkg management code is fully moved into
    blk-cgroup.c.

    * blkio_alloc_group_fn() allocates policy-specific blkg which is
    usually a larger data structure with blkg as the first entry and
initializes it. Note that initialization of blkg proper, including
    percpu stats, is the responsibility of blk-cgroup proper.

    Note that default config (weight, bps...) initialization is done
    from this method; otherwise, we end up violating locking order
    between blkcg and q locks via blkcg_get_CONF() functions.

    * blkio_link_group_fn() is called under queue_lock and responsible for
    linking the blkg to the queue. blkcg side is handled by blk-cgroup
    proper.

    * The common blkg creation function is named blkg_lookup_create() and
    blkiocg_lookup_group() is renamed to blkg_lookup() for consistency.
    Also, throtl / cfq related functions are similarly [re]named for
    consistency.

    This simplifies blkcg policy implementations and enables further
    cleanup.

    -v2: Vivek noticed that blkg_lookup_create() incorrectly tested
blk_queue_dead() instead of blk_queue_bypass(), leading to a user
    of the function creating a new blkg on a bypassing queue.
    This is a bug introduced while relocating bypass patches before
    this one. Fixed.

    -v3: ERR_PTR patch folded into this one. @for_root added to
    blkg_lookup_create() to allow creating root group on a bypassed
    queue during elevator switch.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • For root blkg, blk_throtl_init() was using throtl_alloc_tg()
    explicitly and cfq_init_queue() was manually initializing embedded
    cfqd->root_group, adding unnecessarily different code paths to blkg
    handling.

    Make both use the usual blkio_group get functions - throtl_get_tg()
    and cfq_get_cfqg() - for the root blkio_group too. Note that
    blk_throtl_init() callsite is pushed downwards in
    blk_alloc_queue_node() so that @q is sufficiently initialized for
    throtl_get_tg().

    This simplifies root blkg handling noticeably for cfq and will allow
    further modularization of blkcg API.

    -v2: Vivek pointed out that using cfq_get_cfqg() won't work if
    CONFIG_CFQ_GROUP_IOSCHED is disabled. Fix it by factoring out
    initialization of base part of cfqg into cfq_init_cfqg_base() and
    alloc/init/free explicitly if !CONFIG_CFQ_GROUP_IOSCHED.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Block cgroup policies are maintained in a linked list and,
    theoretically, multiple policies sharing the same policy ID are
    allowed.

    This patch temporarily restricts one policy per plid and adds
    blkio_policy[] array which indexes registered policy types by plid.
    Both the restriction and blkio_policy[] array are transitional and
    will be removed once API cleanup is complete.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
• blkio_group is an association between a block cgroup and a queue for a
    given policy. Using opaque void * for association makes things
    confusing and hinders factoring of common code. Use request_queue *
    and, if necessary, policy id instead.

    This will help block cgroup API cleanup.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • In both blkg get functions - throtl_get_tg() and cfq_get_cfqg(),
    instead of obtaining blkcg of %current explicitly, let the caller
    specify the blkcg to use as parameter and make both functions hold on
    to the blkcg.

    This is part of block cgroup interface cleanup and will help making
    blkcg API more modular.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
• rcu_read_lock() in throtl_get_tg() and cfq_get_cfqg() holds onto
    @blkcg while looking up blkg. For API cleanup, the next patch will
    make the caller responsible for determining @blkcg to look up blkg
    from and let them specify it as a parameter. Move rcu read locking
    out to the callers to prepare for the change.

    -v2: Originally this patch was described as a fix for RCU read locking
    bug around @blkg, which Vivek pointed out to be incorrect. It
    was from misunderstanding the role of rcu locking as protecting
    @blkg not @blkcg. Patch description updated.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Elevator switch may involve changes to blkcg policies. Implement
    shoot down of blkio_groups.

    Combined with the previous bypass updates, the end goal is updating
    blkcg core such that it can ensure that blkcg's being affected become
    quiescent and don't have any per-blkg data hanging around before
    commencing any policy updates. Until queues are made aware of the
policies that apply to them, as an interim step, all per-policy blkg
    data will be shot down.

    * blk-throtl doesn't need this change as it can't be disabled for a
    live queue; however, update it anyway as the scheduled blkg
    unification requires this behavior change. This means that
    blk-throtl configuration will be unnecessarily lost over elevator
    switch. This oddity will be removed after blkcg learns to associate
    individual policies with request_queues.

* blk-throtl doesn't shoot down root_tg. This is to ease transition.
    Unified blkg will always have a persistent root group and not
    shooting down root_tg for now eases transition to that point by
    avoiding having to update td->root_tg and is safe as blk-throtl can
    never be disabled.

    -v2: Vivek pointed out that group list is not guaranteed to be empty
on return from the clear function if it raced cgroup removal and
    lost. Fix it by waiting a bit and retrying. This kludge will
    soon be removed once locking is updated such that blkg is never
    in limbo state between blkcg and request_queue locks.

    blk-throtl no longer shoots down root_tg to avoid breaking
    td->root_tg.

Also, nest queue_lock inside blkio_list_lock, not the other way
    around, to avoid introducing a possible deadlock via the blkcg lock.

    -v3: blkcg_clear_queue() repositioned and renamed to
    blkg_destroy_all() to increase consistency with later changes.
    cfq_clear_queue() updated to check q->elevator before
    dereferencing it to avoid NULL dereference on not fully
    initialized queues (used by later change).

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Extend queue bypassing such that dying queue is always bypassing and
    blk-throttle is drained on bypass. With blkcg policies updated to
    test blk_queue_bypass() instead of blk_queue_dead(), this ensures that
    no bio or request is held by or going through blkcg policies on a
    bypassing queue.

    This will be used to implement blkg cleanup on elevator switches and
    policy changes.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
• Rename and extend elv_quiesce_start/end() to
    blk_queue_bypass_start/end(), which are exported and support nesting
    via @q->bypass_depth. Also add blk_queue_bypass() to test bypass
    state.

    This will be further extended and used for blkio_group management.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • elevator_ops->elevator_init_fn() has a weird return value. It returns
    a void * which the caller should assign to q->elevator->elevator_data
    and %NULL return denotes init failure.

    Update such that it returns integer 0/-errno and sets elevator_data
    directly as necessary.

    This makes the interface more conventional and eases further cleanup.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
• Elevator switch tries hard to keep as much context as possible until
    the new elevator is ready so that it can revert to the original state if
    initializing the new elevator fails for some reason. Unfortunately,
    with more auxiliary contexts to manage, this makes elevator init and
    exit paths too complex and fragile.

    This patch makes elevator_switch() unregister the current elevator and
flush icq's before starting to initialize the new one. As we still keep
    the old elevator itself, the only difference is that we lose icq's on
rare occasions of switching failure, which isn't critical at all.

Note that this makes the explicit elevator parameter to
    elevator_init_queue() and __elv_register_queue() unnecessary as they
    always can use the current elevator.

    This patch enables block cgroup cleanups.

    -v2: blk_add_trace_msg() prints elevator name from @new_e instead of
    @e->type as the local variable no longer exists. This caused
    build failure on CONFIG_BLK_DEV_IO_TRACE.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
• cfq has been registering a zeroed blkio_policy_cfq if CFQ_GROUP_IOSCHED
    is disabled. This fortunately doesn't collide with blk-throtl as
    BLKIO_POLICY_PROP is zero but is unnecessary and risky. Just don't
    register it if not enabled.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Block cgroup core can be built as module; however, it isn't too useful
    as blk-throttle can only be built-in and cfq-iosched is usually the
    default built-in scheduler. Scheduled blkcg cleanup requires calling
    into blkcg from block core. To simplify that, disallow building blkcg
    as module by making CONFIG_BLK_CGROUP bool.

    If building blkcg core as module really matters, which I doubt, we can
    revisit it after blkcg API cleanup.

    -v2: Vivek pointed out that IOSCHED_CFQ was incorrectly updated to
    depend on BLK_CGROUP. Fixed.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, blk_cleanup_queue() doesn't call elv_drain_elevator() if
    q->elevator doesn't exist; however, bio based drivers don't have
    elevator initialized but can still use blk-throttle. This patch moves
    q->elevator test inside blk_drain_queue() such that only
    elv_drain_elevator() is skipped if !q->elevator.

    -v2: loop can have registered queue which has NULL request_fn. Make
    sure we don't call into __blk_run_queue() in such cases.

    Signed-off-by: Tejun Heo
    Reported-by: Vivek Goyal

    Fold in bug fix from Vivek.

    Signed-off-by: Jens Axboe

    Tejun Heo
     

04 Mar, 2012

1 commit


02 Mar, 2012

5 commits

  • This patch (as1519) fixes a bug in the block layer's disk-events
    polling. The polling is done by a work routine queued on the
    system_nrt_wq workqueue. Since that workqueue isn't freezable, the
    polling continues even in the middle of a system sleep transition.

    Obviously, polling a suspended drive for media changes and such isn't
    a good thing to do; in the case of USB mass-storage devices it can
    lead to real problems requiring device resets and even re-enumeration.

    The patch fixes things by creating a new system-wide, non-reentrant,
    freezable workqueue and using it for disk-events polling.

    Signed-off-by: Alan Stern
    CC:
    Acked-by: Tejun Heo
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Jens Axboe

    Alan Stern
     
• Zero CommandMailbox with memset before using it. Fix for:

    drivers/block/DAC960.c: In function ‘DAC960_V1_EnableMemoryMailboxInterface’:
    arch/x86/include/asm/io.h:61:1: warning: ‘CommandMailbox.Bytes[12]’
    may be used uninitialized in this function [-Wuninitialized]
    drivers/block/DAC960.c:1175:30: note: ‘CommandMailbox.Bytes[12]’
    was declared here

    Signed-off-by: Danny Kukawka
    Signed-off-by: Jens Axboe

    Danny Kukawka
     
  • Fixed compiler warning:

    comparison between ‘DAC960_V2_IOCTL_Opcode_T’ and ‘enum ’

    Renamed enum, added a new enum for SCSI_10.CommandOpcode in
    DAC960_V2_ProcessCompletedCommand().

    Signed-off-by: Danny Kukawka
    Signed-off-by: Jens Axboe

    Danny Kukawka
     
  • The following situation might occur:

    __blkdev_get:                       add_disk:

                                          register_disk()
      get_gendisk()

      disk_block_events()
        disk->ev == NULL

                                          disk_add_events()

      __disk_unblock_events()
        disk->ev != NULL
        --ev->block

Then we unblock events when they are supposed to be blocked. This can
    trigger event-related block/genhd.c warnings, but it can also crash in
    sd_check_events() or other places.

I'm able to reproduce crashes with the following script (with a
    usb dongle connected as the sdb disk).

    DEV=/dev/sdb
    ENABLE=/sys/bus/usb/devices/1-2/bConfigurationValue

    function stop_me()
    {
        for i in `jobs -p` ; do kill $i 2> /dev/null ; done
        exit
    }

    trap stop_me SIGHUP SIGINT SIGTERM

    for ((i = 0; i < 10; i++)) ; do
        while true; do fdisk -l $DEV 2>&1 > /dev/null ; done &
    done

    while true ; do
        echo 1 > $ENABLE
        sleep 1
        echo 0 > $ENABLE
    done

I use the script to verify the patch fixing the oops in sd_revalidate_disk
    http://marc.info/?l=linux-scsi&m=132935572512352&w=2
    Without Jun'ichi Nomura's patch titled "Fix NULL pointer dereference in
    sd_revalidate_disk" or this one, the script easily crashes the kernel
    within a few seconds. With both patches applied I do not observe a
    crash. Unfortunately, after some time (dozens of minutes), the script
    hangs in:

    [ 1563.906432] [] schedule_timeout_uninterruptible+0x15/0x20
    [ 1563.906437] [] msleep+0x15/0x20
    [ 1563.906443] [] blk_drain_queue+0x32/0xd0
    [ 1563.906447] [] blk_cleanup_queue+0xd0/0x170
    [ 1563.906454] [] scsi_free_queue+0x3f/0x60
    [ 1563.906459] [] __scsi_remove_device+0x6e/0xb0
    [ 1563.906463] [] scsi_forget_host+0x4f/0x60
    [ 1563.906468] [] scsi_remove_host+0x5a/0xf0
    [ 1563.906482] [] quiesce_and_remove_host+0x5b/0xa0 [usb_storage]
    [ 1563.906490] [] usb_stor_disconnect+0x13/0x20 [usb_storage]

Anyway, I think this patch is a step forward.

As a drawback, I do not tear down on sysfs file creation error, because
    I do not know how to nullify disk->ev (since it can be in use).
    However, add_disk error handling practically does not exist either,
    and things will work without this sysfs file, except that events will
    not be exported to user space.

    Signed-off-by: Stanislaw Gruszka
    Acked-by: Tejun Heo
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Stanislaw Gruszka
     
  • Fix setting bio flags in drivers (sd_dif/floppy).

    Signed-off-by: Muthukumar R
    Signed-off-by: Jens Axboe

    Muthukumar R