16 Jan, 2012

1 commit

  • * 'for-3.3/core' of git://git.kernel.dk/linux-block: (37 commits)
    Revert "block: recursive merge requests"
    block: Stop using macro stubs for the bio data integrity calls
    blockdev: convert some macros to static inlines
    fs: remove unneeded plug in mpage_readpages()
    block: Add BLKROTATIONAL ioctl
    block: Introduce blk_set_stacking_limits function
    block: remove WARN_ON_ONCE() in exit_io_context()
    block: an exiting task should be allowed to create io_context
    block: ioc_cgroup_changed() needs to be exported
    block: recursive merge requests
    block, cfq: fix empty queue crash caused by request merge
    block, cfq: move icq creation and rq->elv.icq association to block core
    block, cfq: restructure io_cq creation path for io_context interface cleanup
    block, cfq: move io_cq exit/release to blk-ioc.c
    block, cfq: move icq cache management to block core
    block, cfq: move io_cq lookup to blk-ioc.c
    block, cfq: move cfqd->icq_list to request_queue and add request->elv.icq
    block, cfq: reorganize cfq_io_context into generic and cfq specific parts
    block: remove elevator_queue->ops
    block: reorder elevator switch sequence
    ...

    Fix up conflicts in:
    - block/blk-cgroup.c: switch from can_attach_task to can_attach
    - block/cfq-iosched.c: conflict with the now-removed cic index
      changes (we now use q->id instead)

    Linus Torvalds
     

14 Dec, 2011

3 commits

  • A cic is the association between an io_context and a request_queue.
    A cic is linked from both the ioc and the q and should be destroyed
    when either one goes away. As the ioc and q both have their own
    locks, locking becomes a bit complex - both lock orders work for
    removal from one side but not from the other.

    Currently, cfq tries to circumvent this locking-order issue with
    RCU. ioc->lock nests inside queue_lock, but the radix tree and the
    cic's are also protected by RCU, allowing either side to walk its
    list without grabbing the lock.

    This rather unconventional use of RCU quickly devolves into an
    extremely fragile convolution. For example, the following oops is
    from a cfqd going away too soon after an ioc and q exit raced.

    general protection fault: 0000 [#1] PREEMPT SMP
    CPU 2
    Modules linked in:
    [ 88.503444]
    Pid: 599, comm: hexdump Not tainted 3.1.0-rc10-work+ #158 Bochs Bochs
    RIP: 0010:[] [] cfq_exit_single_io_context+0x58/0xf0
    ...
    Call Trace:
    [] call_for_each_cic+0x5a/0x90
    [] cfq_exit_io_context+0x15/0x20
    [] exit_io_context+0x100/0x140
    [] do_exit+0x579/0x850
    [] do_group_exit+0x5b/0xd0
    [] sys_exit_group+0x17/0x20
    [] system_call_fastpath+0x16/0x1b

    The only real hot path here is cic lookup during request
    initialization, and avoiding extra locking there requires very
    confined use of RCU. This patch makes cic removal from both the ioc
    and the request_queue perform double-locking and unlink immediately.

    * From the q side, the change is almost trivial as ioc->lock nests
    inside queue_lock. It just needs to grab each ioc->lock as it walks
    cic_list and unlink the cic.

    * From the ioc side, it's a bit more difficult because of the
    inverted lock order. The ioc needs its lock to walk its cic_list but
    can't grab the matching queue_lock, and so needs to perform an
    unlock-relock dance.

    Unlinking is now done wholly from put_io_context(), and the fast
    path is optimized by using the queue_lock the caller already holds,
    which is by far the most common case. If the ioc accessed multiple
    devices, it tries with trylock. In the unlikely case of fast-path
    failure, it falls back to the full double-locking dance from a
    workqueue, as sketched below.
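
    To make the scheme concrete, here is a minimal sketch of the unlink
    path from the ioc side. The field and helper names (cic->q,
    cic->cic_list, ioc->release_work) are assumptions for illustration,
    not the verbatim kernel code:

    /* Called with ioc->lock held; field names are hypothetical. */
    static void cic_unlink(struct io_context *ioc, struct cfq_io_context *cic)
    {
        struct request_queue *this_q = cic->q; /* sampled under ioc->lock */

        /* Fast path: opportunistically grab queue_lock without
         * dropping ioc->lock. */
        if (spin_trylock(this_q->queue_lock)) {
            hlist_del_init(&cic->cic_list);
            spin_unlock(this_q->queue_lock);
            return;
        }

        /* Slow path: taking queue_lock while holding ioc->lock would
         * invert the lock order, so defer to a workqueue that can take
         * queue_lock first and then ioc->lock under it. */
        schedule_work(&ioc->release_work);
    }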

    Double-locking isn't the prettiest thing in the world, but it's
    *far* simpler and more understandable than the RCU trick, without
    adding any meaningful overhead.

    This still leaves a lot of now-unnecessary RCU logic. Future patches
    will trim it.

    -v2: Vivek pointed out that cic->q was being dereferenced after
    cic->release() was called. Updated to use local variable @this_q
    instead.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • The ioprio/cgroup change used to be handled by marking the changed
    state in the ioc and, on the following access to the ioc, performing
    an RCU-protected iteration through all cic's, grabbing the matching
    queue_lock for each.

    This patch moves the changed state to each cic. When the ioprio or
    cgroup changes, the respective bit is set on all cic's of the ioc,
    and when each of those cic's (not the ioc) is accessed, the change
    is applied for that specific ioc-queue pair (see the sketch below).

    This also fixes the following two race conditions between the
    setting and clearing of the changed states.

    * A missing barrier between the assignment and load of ioprio and
    ioprio_changed allowed an old ioprio to be applied.

    * Change requests could arrive between the application of a change
    and the clearing of the changed variables.
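
    A minimal sketch of the per-cic changed-bit scheme; the flag and
    field names (cic->changed, changed_ioprio()) are assumptions, and
    the three-argument hlist_for_each_entry() form of later kernels is
    used for brevity:

    enum { CIC_IOPRIO_CHANGED, CIC_CGROUP_CHANGED };

    /* Setter side: mark every cic of the ioc (called under ioc->lock). */
    static void ioc_set_changed(struct io_context *ioc, int which)
    {
        struct cfq_io_context *cic;

        hlist_for_each_entry(cic, &ioc->cic_list, cic_list)
            set_bit(which, &cic->changed);
    }

    /* Access side: apply lazily, exactly once, for this specific
     * ioc-queue pair; the atomic test_and_clear_bit() avoids the
     * set-vs-clear races described above. */
    static void cic_apply_changes(struct cfq_io_context *cic)
    {
        if (test_and_clear_bit(CIC_IOPRIO_CHANGED, &cic->changed))
            changed_ioprio(cic); /* hypothetical apply helper */
    }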

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Ignoring copy_io() during fork, an io_context can be allocated from
    two places - current_io_context() and set_task_ioprio(). The former
    is always called from the local task while the latter can be called
    from a different task. The synchronization between them is peculiar
    and dubious.

    * current_io_context() doesn't grab task_lock() and assumes that if it
    saw %NULL ->io_context, it would stay that way until allocation and
    assignment is complete. It has smp_wmb() between alloc/init and
    assignment.

    * set_task_ioprio() grabs task_lock() for assignment and does
    smp_read_barrier_depends() between "ioc = task->io_context" and "if
    (ioc)". Unfortunately, this doesn't achieve anything - the latter
    is not a dependent load of the former. That is, if ioc itself were
    being dereferenced ("ioc->xxx"), it would mean something (though
    it's unclear what), but as the code currently stands, the dependent
    read barrier is a noop.

    As only one of the two test-assignment sequences is task_lock()
    protected, task_lock() can't do much about the race between the two.
    Nothing prevents current_io_context() and set_task_ioprio() from
    each allocating its own ioc for the same task and overwriting the
    other's.

    Also, set_task_ioprio() can race with an exiting task and create a
    new ioc after exit_io_context() has finished.

    ioc get/put doesn't have any reason to be complex. The only hot path
    is accessing the existing ioc of %current, which is simple to
    achieve given that ->io_context is never destroyed as long as the
    task is alive. All other paths can happily go through task_lock()
    like all other task substructures, without impacting anything.

    This patch updates ioc get/put so that it becomes more conventional
    (a usage sketch follows the list below).

    * alloc_io_context() is replaced with get_task_io_context(). This is
    the only interface which can acquire access to the ioc of another
    task. On return, the caller has an explicit reference to the object
    which should be put using put_io_context() afterwards.

    * The functionality of current_io_context() remains the same, but
    when creating a new ioc, it shares the code path with
    get_task_io_context() and always goes through task_lock().

    * get_io_context() now means incrementing the ref on an ioc which
    the caller already has access to (be that an explicit refcnt or the
    implicit %current one).

    * PF_EXITING inhibits creation of a new io_context, and once
    exit_io_context() is finished, it's guaranteed that both ioc
    acquisition functions return %NULL.

    * All users are updated. Most are trivial, but the
    smp_read_barrier_depends() removal from cfq_get_io_context() needs a
    bit of explanation. I suppose the original intention was to ensure
    ioc->ioprio is visible when set_task_ioprio() allocates a new
    io_context and installs it; however, this wouldn't have worked
    because set_task_ioprio() doesn't have a wmb between init and
    install. There are other problems with this which will be fixed in
    another patch.

    * While at it, use NUMA_NO_NODE instead of -1 for wildcard node
    specification.
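
    A usage sketch of the new conventions (the surrounding function is
    illustrative; get_task_io_context() takes the task, gfp flags, and
    a node, per the interface described above):

    static void example(struct task_struct *task)
    {
        struct io_context *ioc;

        /* Take an explicit reference on @task's ioc, creating one if
         * needed; returns NULL once the task has passed
         * exit_io_context(). */
        ioc = get_task_io_context(task, GFP_KERNEL, NUMA_NO_NODE);
        if (ioc) {
            /* ... inspect or modify the ioc ... */
            put_io_context(ioc); /* drop the explicit reference */
        }
    }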

    -v2: Vivek spotted contamination from debug patch. Removed.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     

13 Dec, 2011

1 commit

  • Now that subsys->can_attach() and attach() take @tset instead of
    @task, they can handle per-task operations. Convert
    ->can_attach_task() and ->attach_task() users to use ->can_attach()
    and attach() instead (a sketch of the converted iteration follows
    the list below). Most conversions are straightforward. Noteworthy
    changes are:

    * In cgroup_freezer, remove unnecessary NULL assignments to unused
    methods. They are useless and very prone to getting out of sync,
    which had already happened.

    * In cpuset, the PF_THREAD_BOUND test is now performed for each
    task. This doesn't make any practical difference but is conceptually
    cleaner.
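
    A sketch of a converted ->can_attach(); the exact callback parameter
    list varied across kernel versions, and the per-task check shown is
    only a stand-in (modeled on the cpuset PF_THREAD_BOUND test above):

    static int example_can_attach(struct cgroup *cgrp,
                                  struct cgroup_taskset *tset)
    {
        struct task_struct *task;

        /* Per-task checks that used to live in ->can_attach_task()
         * now iterate the taskset inside ->can_attach() itself. */
        for (task = cgroup_taskset_first(tset); task;
             task = cgroup_taskset_next(tset)) {
            if (task->flags & PF_THREAD_BOUND)
                return -EINVAL;
        }
        return 0;
    }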

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Frederic Weisbecker
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: James Morris
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

    Tejun Heo
     

19 Oct, 2011

1 commit

  • blkio_policy_parse_and_set() calls blkio_check_dev_num() to check
    whether the given dev_t is valid. blkio_check_dev_num() uses
    get_gendisk() for verification but never puts the returned genhd,
    leaking the reference.

    This patch collapses blkio_check_dev_num() into its caller and
    updates it such that the genhd is put before returning, as sketched
    below.
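
    The fixed pattern, sketched with an illustrative wrapper function;
    the point is simply that the reference taken by get_gendisk() is
    dropped with put_disk() before returning:

    static int example_check_dev(dev_t dev)
    {
        int part;
        struct gendisk *disk = get_gendisk(dev, &part);

        if (!disk || part)
            return -ENODEV; /* not a valid whole-disk dev_t */

        /* ... use the disk ... */
        put_disk(disk); /* balance the reference get_gendisk() took */
        return 0;
    }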

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

21 Sep, 2011

1 commit

  • The bug is that we're not able to remove a device from the blkio
    cgroup's per-device control files after it gets unplugged.

    To reproduce the bug:

    # mount -t cgroup -o blkio xxx /cgroup
    # cd /cgroup
    # echo "8:0 1000" > blkio.throttle.read_bps_device
    # unplug the device
    # cat blkio.throttle.read_bps_device
    8:0 1000
    # echo "8:0 0" > blkio.throttle.read_bps_device
    -bash: echo: write error: No such device

    After patching, the device removal will succeed.

    Thanks to Paul, Zefan, and Vivek for their comments.

    Signed-off-by: Wanlong Gao
    Cc: Li Zefan
    Cc: Paul Menage
    Acked-by: Vivek Goyal
    Cc: Jens Axboe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Wanlong Gao
     

27 May, 2011

1 commit

  • Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

    Add can_attach_task(), pre_attach(), and attach_task() as new
    callbacks for the cgroup subsystem interface (signatures are
    sketched below). Unlike can_attach and attach, these are for
    per-thread operations, to be called potentially many times when
    attaching an entire threadgroup.

    Also, the old "bool threadgroup" interface is removed, as it is
    replaced by this. All subsystems are modified for the new interface
    - of note is cpuset, which requires the from/to nodemasks for attach
    to be globally scoped (though per-cpuset would work too) to persist
    from its pre_attach to attach_task and attach.
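
    A sketch of the new per-thread callback signatures as described
    above, shown in a stand-in struct (the real ones live in struct
    cgroup_subsys, and exact signatures in this era may differ):

    struct cgroup;
    struct task_struct;

    struct cgroup_subsys_sketch {
        /* ... existing whole-group can_attach/attach callbacks ... */
        int  (*can_attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
        void (*pre_attach)(struct cgroup *cgrp);
        void (*attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
    };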

    This is a pre-patch for cgroup-procs-writable.patch.

    Signed-off-by: Ben Blum
    Cc: "Eric W. Biederman"
    Cc: Li Zefan
    Cc: Matt Helsley
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     

21 May, 2011

4 commits

  • Now the dispatch stats update is lock-free, but resetting these
    stats still takes blkg->stats_lock and depends on it. As the stats
    are per-cpu, we should be able to just reset them on each cpu
    without any locks (at least on 64-bit arches), as sketched below.

    On 32-bit arches there is a small race where 64-bit updates are not
    atomic. The result of this race can be that, in the presence of
    other writers, one might not get a 0 value after resetting a stat
    and might see something intermediate.

    One could write more complicated code to cover this race, like
    sending IPIs to other cpus to reset the stats and, for offline cpus,
    resetting them directly.

    Right now I am not taking that path because stats reset is more of a
    debug feature, the race can happen only on 32-bit arches, and the
    possibility of it happening is small. Will fix it if it becomes a
    real problem; for the time being, going for code simplicity.
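
    A sketch of the lockless reset, assuming a per-cpu stats structure
    along the lines described (names are illustrative):

    static void reset_stats_cpu(struct blkio_group *blkg)
    {
        int cpu;

        for_each_possible_cpu(cpu) {
            struct blkio_group_stats_cpu *sc =
                per_cpu_ptr(blkg->stats_cpu, cpu);

            /* Plain stores: a concurrent updater on a 32-bit arch may
             * see or leave a torn 64-bit value - the small race noted
             * above. */
            memset(sc, 0, sizeof(*sc));
        }
    }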

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • Some of the stats are 64-bit, and updating them is non-atomic on
    32-bit architectures. Use sequence counters on 32-bit arches to make
    reading of the stats safe, as sketched below.
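
    A sketch of the seqcount pattern (the kernel's u64_stats_sync
    helpers wrap the same idea; the structure and names here are
    illustrative):

    struct stat_cpu_sketch {
        seqcount_t seq;
        u64 sectors;
    };

    /* Writer (stat update path): */
    static void stat_add(struct stat_cpu_sketch *s, u64 n)
    {
        write_seqcount_begin(&s->seq);
        s->sectors += n;
        write_seqcount_end(&s->seq);
    }

    /* Reader: retry if a writer was active in between. */
    static u64 stat_read(struct stat_cpu_sketch *s)
    {
        unsigned int start;
        u64 v;

        do {
            start = read_seqcount_begin(&s->seq);
            v = s->sectors;
        } while (read_seqcount_retry(&s->seq, start));

        return v;
    }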

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • Currently we take the blkg stats lock even for updating the stats.
    So even if a group has no throttling rules (the common case for the
    root group), we end up taking blkg_lock just to update the stats.

    Make the dispatch stats per-cpu so that they can be updated without
    taking the blkg lock.

    If a cpu goes offline, these stats simply disappear. No protection
    has been provided for that yet. Do we really need anything for that?

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • The cgroup unaccounted_time file is created only if
    CONFIG_DEBUG_BLK_CGROUP=y, but some related fields are outside this
    config option. Fix that.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

16 May, 2011

1 commit

  • Currently we first map the task to its cgroup and then the cgroup to
    the blkio_cgroup. There is a more direct way to get to the
    blkio_cgroup from the task, using task_subsys_state(). Use that, as
    sketched below.
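
    A sketch of the direct lookup (the helper name is an assumption;
    task_subsys_state() and blkio_subsys_id are the interfaces of this
    era):

    static inline struct blkio_cgroup *task_to_blkio_cgroup(struct task_struct *tsk)
    {
        return container_of(task_subsys_state(tsk, blkio_subsys_id),
                            struct blkio_cgroup, css);
    }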

    The real reason for the fix is that it also avoids a race in the
    generic cgroup code. During remount/umount, rebind_subsystems() is
    called, and it can do the following without waiting for an rcu grace
    period:

    cgrp->subsys[i] = NULL;

    That means that if somebody got hold of a cgroup under rcu and then
    tried to go through cgroup->subsys[] to get to the blkio_cgroup,
    they would get NULL, which is wrong. I was running into this race
    condition with ltp running on an upstream-derived kernel, and it led
    to a crash.

    So ideally we should also fix the generic cgroup code to wait for an
    rcu grace period before setting the pointer to NULL. Li Zefan is not
    very keen on introducing synchronize_wait() as he thinks it will
    slow down mount/remount/umount operations.

    So for the time being, at least fix the kernel crash by taking a
    more direct route to the blkio_cgroup.

    One tester had reported a crash while running LTP on a derived
    kernel; with this fix, the crash is no longer seen and the test has
    been running for over 6 days.

    Signed-off-by: Vivek Goyal
    Reviewed-by: Li Zefan
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

12 Mar, 2011

1 commit

  • There are two kinds of time that tasks are not charged for: the
    first seek and the extra time slice used over the allocated
    timeslice. Both of these are exported as a new unaccounted_time
    stat.

    I think it would be good to have this reported in 'time' as well,
    but that is probably a separate discussion.

    Signed-off-by: Justin TerAvest
    Signed-off-by: Jens Axboe

    Justin TerAvest
     

16 Nov, 2010

1 commit

  • o Allow hierarchical cgroup creation for the blkio controller.

    o Currently we disallow it, as both io controller policies
    (throttling as well as proportional bandwidth) support neither
    hierarchical accounting nor control. But the flip side is that the
    blkio controller cannot be used with libvirt, as libvirt creates a
    cgroup hierarchy deeper than 1 level:

    //libvirt/qemu/

    o So this patch will allow creation of a cgroup hierarchy, but at
    the backend everything will be treated as flat. So if somebody
    creates a hierarchy like the following:

             root
            /    \
        test1    test2
          |
        test3

    CFQ and throttling will practically treat all the groups as being at
    the same level:

              pivot
           /  |   |   \
        root test1 test2 test3

    o Once we have actual support for hierarchical accounting and
    control, we can then introduce another cgroup tunable file,
    "blkio.use_hierarchy", which will be 0 by default but which can be
    set to 1 if the user wants to enforce hierarchical control. This way
    there should not be any ABI problems down the line.

    o The only not-so-pretty part is the introduction of the extra file
    "use_hierarchy" down the line. Kame-san had mentioned that
    hierarchical accounting is expensive in the memory controller, hence
    they keep it off by default. I suspect the same will be the case for
    the IO controller too, as for each IO completion we shall have to
    account the IO through the hierarchy up to the root. If so, then it
    is probably not a bad idea to introduce this extra file, so that it
    will be used only when somebody needs it, and some people might
    enable hierarchy only in part of the hierarchy.

    o This is basically how the memory controller also uses
    "use_hierarchy"; they too allowed creation of hierarchies before
    actual backend support was available.

    Signed-off-by: Vivek Goyal
    Acked-by: Balbir Singh
    Reviewed-by: Gui Jianfeng
    Reviewed-by: Ciju Rajan K
    Tested-by: Ciju Rajan K
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

23 Oct, 2010

1 commit

  • * 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
    cfq-iosched: Fix a gcc 4.5 warning and put some comments
    block: Turn bvec_k{un,}map_irq() into static inline functions
    block: fix accounting bug on cross partition merges
    block: Make the integrity mapped property a bio flag
    block: Fix double free in blk_integrity_unregister
    block: Ensure physical block size is unsigned int
    blkio-throttle: Fix possible multiplication overflow in iops calculations
    blkio-throttle: limit max iops value to UINT_MAX
    blkio-throttle: There is no need to convert jiffies to milli seconds
    blkio-throttle: Fix link failure failure on i386
    blkio: Recalculate the throttled bio dispatch time upon throttle limit change
    blkio: Add root group to td->tg_list
    blkio: deletion of a cgroup was causes oops
    blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
    block: set the bounce_pfn to the actual DMA limit rather than to max memory
    block: revert bad fix for memory hotplug causing bounces
    Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
    block: set the bounce_pfn to the actual DMA limit rather than to max memory
    block: Prevent hang_check firing during long I/O
    cfq: improve fsync performance for small files
    ...

    Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h

    Linus Torvalds
     

01 Oct, 2010

3 commits

  • o Currently, any cgroup throttle limit changes are processed
    asynchronously, and the change does not take effect till a new bio
    is dispatched from the same group.

    o It might happen that a user sets a ridiculously low limit on
    throttling. Say 1 byte per second on reads. In such cases, simple
    operations like mounting a disk can wait for a very long time.

    o Once a bio is throttled, there is no easy way to come out of that
    wait, even if the user increases the read limit later.

    o This patch fixes it. Now if a user changes the cgroup limits, we
    recalculate the bio dispatch time according to the new limits.

    o We can't take the queue lock under blkcg_lock, hence after the
    change I wake up the dispatch thread again, which recalculates the
    time. So there are some variables being synchronized across two
    threads without a lock, and I had to make use of barriers, as
    sketched below. I hope I have used the barriers correctly; any
    review, especially of the memory barrier code, will help.
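
    A sketch of the barrier pairing being described, with assumed field
    names: the writer publishes the new limit before setting the flag,
    and the reader checks the flag before reading the limit.

    struct tg_sketch {
        u64 bps_read;        /* throttle limit */
        bool limits_changed; /* flag checked by the dispatch thread */
    };

    /* Limit-change path (cannot take the queue lock here): */
    static void update_limit(struct tg_sketch *tg, u64 new_bps)
    {
        tg->bps_read = new_bps;
        smp_wmb(); /* publish the new limit before setting the flag */
        tg->limits_changed = true;
    }

    /* Dispatch thread, woken up after the change: */
    static void dispatch_recheck(struct tg_sketch *tg)
    {
        if (tg->limits_changed) {
            smp_rmb(); /* pairs with the smp_wmb() above */
            tg->limits_changed = false;
            /* ... recalculate bio dispatch time from tg->bps_read ... */
        }
    }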

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • o Now a cgroup's list of blkg elements can contain blkgs from
    multiple policies. Before sending an unlink event, make sure the
    blkg belongs to the policy. If the policy does not own the blkg, do
    not send an update for this blkg.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • Currently, throttling-related files are visible even if the user has
    disabled throttling via config options. The option switches off
    background throttling of bios but not the cgroup files. This patch
    fixes that.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

23 Aug, 2010

1 commit

  • If the cgroup hierarchy for blkio control groups is deeper than two
    levels, the kernel should not allow the creation of further levels.
    The mkdir system call does not expect EINVAL as a return value, so
    this patch replaces EINVAL with the more appropriate EPERM.

    Signed-off-by: Ciju Rajan K
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Jens Axboe

    Ciju Rajan K
     

07 May, 2010

1 commit

  • With CONFIG_PROVE_RCU=y, a warning can be triggered:

    # mount -t cgroup -o blkio xxx /mnt
    # mkdir /mnt/subgroup

    ...
    kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection!
    ...

    To fix this, we avoid calling css_depth() here, which is a bit
    simpler than the original code.

    Signed-off-by: Li Zefan
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Li Zefan
     

27 Apr, 2010

2 commits

  • This patch fixes a few usability and configurability issues.

    o All the cgroup-based controller options are configurable from the
    "General Setup/Control Group Support/" menu; blkio is the only
    exception. Hence, make this option visible in the above menu and
    configurable from there, to bring it in line with the rest of the
    cgroup-based controllers.

    o Get rid of CONFIG_DEBUG_CFQ_IOSCHED.

    This option currently does two things:

    - Enables printing of cgroup paths in blktrace
    - Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional
    stat files in the cgroup.

    If we are using group scheduling, blktrace data is not of much use
    when cgroup information is not present. To get this data, one
    currently also has to enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn
    brings the overhead of all the additional debug stat files, which is
    not desired.

    Hence, this patch moves the printing of cgroup paths under
    CONFIG_CFQ_GROUP_IOSCHED.

    This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely.
    Now all the debug stat files are controlled only by
    CONFIG_DEBUG_BLK_CGROUP, which can be enabled through the config
    menu.

    Signed-off-by: Vivek Goyal
    Acked-by: Divyesh Shah
    Reviewed-by: Gui Jianfeng
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • o Once in a while, I was hitting a BUG_ON() in the blkio code.
    empty_time was assuming that upon slice expiry, a group can't
    already be marked empty (except for forced dispatch).

    But this assumption is broken if a cfqq can move across groups
    (group_isolation=0) after receiving a request.

    Most likely what happened in this case is that we got a request in a
    cfqq and accounted the rq in one group; later, while adding the cfqq
    to the tree, we moved the queue to a different group which was
    already marked empty, and after dispatch from the slice we found the
    group already marked empty and raised the alarm.

    This patch does not error out if the group is already marked empty.
    This can introduce some empty_time stat error, but only in the
    group_isolation=0 case, which is better than crashing. In the
    group_isolation=1 case we should still get the same stats as before
    this patch.

    [ 222.308546] ------------[ cut here ]------------
    [ 222.309311] kernel BUG at block/blk-cgroup.c:236!
    [ 222.309311] invalid opcode: 0000 [#1] SMP
    [ 222.309311] last sysfs file: /sys/devices/virtual/block/dm-3/queue/scheduler
    [ 222.309311] CPU 1
    [ 222.309311] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
    [ 222.309311]
    [ 222.309311] Pid: 4780, comm: fio Not tainted 2.6.34-rc4-blkio-config #68 0A98h/HP xw8600 Workstation
    [ 222.309311] RIP: 0010:[] [] blkiocg_set_start_empty_time+0x50/0x83
    [ 222.309311] RSP: 0018:ffff8800ba6e79f8 EFLAGS: 00010002
    [ 222.309311] RAX: 0000000000000082 RBX: ffff8800a13b7990 RCX: ffff8800a13b7808
    [ 222.309311] RDX: 0000000000002121 RSI: 0000000000000082 RDI: ffff8800a13b7a30
    [ 222.309311] RBP: ffff8800ba6e7a18 R08: 0000000000000000 R09: 0000000000000001
    [ 222.309311] R10: 000000000002f8c8 R11: ffff8800ba6e7ad8 R12: ffff8800a13b78ff
    [ 222.309311] R13: ffff8800a13b7990 R14: 0000000000000001 R15: ffff8800a13b7808
    [ 222.309311] FS: 00007f3beec476f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
    [ 222.309311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 222.309311] CR2: 000000000040e7f0 CR3: 00000000a12d5000 CR4: 00000000000006e0
    [ 222.309311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 222.309311] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 222.309311] Process fio (pid: 4780, threadinfo ffff8800ba6e6000, task ffff8800b3d6bf00)
    [ 222.309311] Stack:
    [ 222.309311] 0000000000000001 ffff8800bab17a48 ffff8800bab17a48 ffff8800a13b7800
    [ 222.309311] ffff8800ba6e7a68 ffffffff8121da35 ffff880000000001 00ff8800ba5c5698
    [ 222.309311] ffff8800ba6e7a68 ffff8800a13b7800 0000000000000000 ffff8800bab17a48
    [ 222.309311] Call Trace:
    [ 222.309311] [] __cfq_slice_expired+0x2af/0x3ec
    [ 222.309311] [] cfq_dispatch_requests+0x2c8/0x8e8
    [ 222.309311] [] ? spin_unlock_irqrestore+0xe/0x10
    [ 222.309311] [] ? blk_insert_cloned_request+0x70/0x7b
    [ 222.309311] [] blk_peek_request+0x191/0x1a7
    [ 222.309311] [] dm_request_fn+0x38/0x14c [dm_mod]
    [ 222.309311] [] ? sync_page_killable+0x0/0x35
    [ 222.309311] [] __generic_unplug_device+0x32/0x37
    [ 222.309311] [] generic_unplug_device+0x2e/0x3c
    [ 222.309311] [] dm_unplug_all+0x42/0x5b [dm_mod]
    [ 222.309311] [] blk_unplug+0x29/0x2d
    [ 222.309311] [] blk_backing_dev_unplug+0x12/0x14
    [ 222.309311] [] block_sync_page+0x35/0x39
    [ 222.309311] [] sync_page+0x41/0x4a
    [ 222.309311] [] sync_page_killable+0xe/0x35
    [ 222.309311] [] __wait_on_bit_lock+0x46/0x8f
    [ 222.309311] [] __lock_page_killable+0x66/0x6d
    [ 222.309311] [] ? wake_bit_function+0x0/0x33
    [ 222.309311] [] lock_page_killable+0x2c/0x2e
    [ 222.309311] [] generic_file_aio_read+0x361/0x4f0
    [ 222.309311] [] do_sync_read+0xcb/0x108
    [ 222.309311] [] ? security_file_permission+0x16/0x18
    [ 222.309311] [] vfs_read+0xab/0x108
    [ 222.309311] [] sys_read+0x4a/0x6e
    [ 222.309311] [] system_call_fastpath+0x16/0x1b
    [ 222.309311] Code: 58 01 00 00 00 48 89 c6 75 0a 48 83 bb 60 01 00 00 00 74 09 48 8d bb a0 00 00 00 eb 35 41 fe cc 74 0d f6 83 c0 01 00 00 04 74 04 0b eb fe 48 89 75 e8 e8 be e0 de ff 66 83 8b c0 01 00 00 04
    [ 222.309311] RIP [] blkiocg_set_start_empty_time+0x50/0x83
    [ 222.309311] RSP
    [ 222.309311] ---[ end trace 32b4f71dffc15712 ]---

    Signed-off-by: Vivek Goyal
    Acked-by: Divyesh Shah
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

14 Apr, 2010

3 commits

  • Fixes compile errors in the blk-cgroup code for the empty_time stat,
    plus a merge fix in CFQ. The first error occurred when
    CONFIG_DEBUG_CFQ_IOSCHED is not set.

    Signed-off-by: Divyesh Shah
    Signed-off-by: Jens Axboe

    Divyesh Shah
     
  • Conflicts:
    block/blk-cgroup.c
    block/cfq-iosched.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Changelog from v1:
    o Call blkiocg_update_idle_time_stats() at cfq_rq_enqueued() instead of at
    dispatch time.

    Changelog from original patchset: (in response to Vivek Goyal's comments)
    o group blkiocg_update_blkio_group_dequeue_stats() with other DEBUG functions
    o rename blkiocg_update_set_active_queue_stats() to
    blkiocg_update_avg_queue_size_stats()
    o s/request/io/ in blkiocg_update_request_add_stats() and
    blkiocg_update_request_remove_stats()
    o Call cfq_del_timer() at request dispatch() instead of
    blkiocg_update_idle_time_stats()

    Signed-off-by: Divyesh Shah
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Divyesh Shah
     

13 Apr, 2010

1 commit

  • Currently, the IO controller makes use of blkio.weight to assign a
    weight for all devices. Here a new user interface,
    "blkio.weight_device", is introduced to assign different weights to
    different devices. blkio.weight becomes the default value for
    devices which are not configured via "blkio.weight_device".

    You can use the following format to assign a specific weight to a
    given device:

    # echo "major:minor weight" > blkio.weight_device

    major:minor represents the device number.

    And you can remove the weight for a given device as follows:

    # echo "major:minor 0" > blkio.weight_device

    V1->V2 changes:
    - use the user interface "weight_device" instead of "policy", as
    suggested by Vivek
    - rename some structs, as suggested by Vivek
    - rebase to the 2.6-block "for-linus" branch
    - remove a useless list_empty check pointed out by Li Zefan
    - some trivial typo fixes

    V2->V3 changes:
    - Move policy_*_node() functions up to get rid of forward declarations
    - rename related functions by adding prefix "blkio_"

    Signed-off-by: Gui Jianfeng
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Gui Jianfeng