23 Mar, 2011
3 commits
-
Lina reported that if throttle limits are initially very high and then
dropped, no new bio might be dispatched for a long time. The reason is
that after dropping the limits we don't reset the existing slice, so the
rate calculation is done with the new low rate while still accounting
the bios dispatched at the high rate. To fix it, reset the slice upon
rate change.
https://lkml.org/lkml/2011/3/10/298
Another problem with a very high limit is that we never queued the
bio on the throtl service tree. That means we kept on extending the
group slice but never trimmed it. Fix that as well by regularly
trimming the slice even if no bio is being queued up.
Reported-by: Lina Lu
Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe
-
This change moves unaccounted_time to only be reported when
CONFIG_DEBUG_BLK_CGROUP is true.
Signed-off-by: Justin TerAvest
Signed-off-by: Jens Axboe
-
Commit "Add unaccounted time to timeslice_used" changed the behavior of
cfq_preempt_queue to set cfqq active. Vivek pointed out that other
preemption rules might get involved, so we shouldn't manually set which
queue is active.
This cleans up the code to just clear the queue stats at preemption
time.
Signed-off-by: Justin TerAvest
Signed-off-by: Jens Axboe
22 Mar, 2011
1 commit
-
After the stack plugging introduction, these are called lockless.
Ensure that the counters are updated atomically.
Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe
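The lockless-counter problem above can be sketched in userspace with C11 atomics standing in for the kernel's atomic helpers (structure and field names are illustrative):

```c
/* Illustrative sketch, not the kernel code: once a counter can be
 * updated without the queue lock held, a plain "st->ios++" is a data
 * race. C11 atomics model the kernel's atomic operations here. */
#include <stdatomic.h>

struct disk_stats {
    atomic_ulong ios;      /* completed requests */
    atomic_ulong sectors;  /* sectors transferred */
};

static void stat_account(struct disk_stats *st, unsigned long sectors)
{
    /* Lockless callers must use atomic read-modify-write. */
    atomic_fetch_add(&st->ios, 1);
    atomic_fetch_add(&st->sectors, sectors);
}

static unsigned long stat_read_ios(struct disk_stats *st)
{
    return atomic_load(&st->ios);
}
```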
21 Mar, 2011
1 commit
-
One of the disadvantages of on-stack plugging is that we potentially
lose out on merging since all pending IO isn't always visible to
everybody. When we flush the on-stack plugs, right now we don't do
any checks to see if potential merge candidates could be utilized.
Correct this by adding a new insert variant, ELEVATOR_INSERT_SORT_MERGE.
It works just like ELEVATOR_INSERT_SORT, but first checks whether we can
merge with an existing request before doing the insertion (falling back
to a plain insertion if we fail merging).
This fixes a regression with multiple processes issuing IO that
can be merged.
Thanks to Shaohua Li for testing and fixing
an accounting bug.
Signed-off-by: Jens Axboe
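A simplified model of the merge-before-insert idea behind ELEVATOR_INSERT_SORT_MERGE (the structures and the head-insert fallback are illustrative, not the elevator code):

```c
/* Before inserting a flushed request into the queue, try to back-merge
 * it with an existing contiguous request; only insert if that fails. */
#include <stdbool.h>
#include <stddef.h>

struct request {
    unsigned long sector;   /* start sector */
    unsigned long nr;       /* length in sectors */
    struct request *next;
};

/* Back-merge: does 'rq' continue directly after an existing request? */
static bool try_merge(struct request *q, struct request *rq)
{
    for (struct request *r = q; r; r = r->next) {
        if (r->sector + r->nr == rq->sector) {
            r->nr += rq->nr;   /* merged: grow the existing request */
            return true;
        }
    }
    return false;
}

/* INSERT_SORT_MERGE: merge if possible, otherwise fall back to a plain
 * insertion (sorted insert simplified to a head insert here). */
static void insert_sort_merge(struct request **q, struct request *rq)
{
    if (*q && try_merge(*q, rq))
        return;
    rq->next = *q;
    *q = rq;
}
```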
19 Mar, 2011
1 commit
-
"disk" is always NULL when we goto out. There was a check for this
before, but it was removed in 69e02c59a7d9 "block: Don't check events
while open is in progress".
Signed-off-by: Dan Carpenter
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe
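The error-path shape behind the fix above can be sketched like this (a hypothetical, simplified function; names and the refcount field are illustrative, not the actual block-layer code):

```c
/* 'disk' is only assigned after a lookup that can be bypassed via
 * 'goto out', so the cleanup at 'out' must tolerate disk == NULL -
 * the check this commit restores. */
#include <stddef.h>

struct disk { int refcount; };

static void put_disk_ref(struct disk *disk)
{
    if (disk)              /* the restored NULL check */
        disk->refcount--;
}

static int open_device(struct disk *looked_up, int lookup_fails)
{
    struct disk *disk = NULL;

    if (lookup_fails)
        goto out;          /* disk is still NULL here */
    disk = looked_up;
    disk->refcount++;
    return 0;
out:
    put_disk_ref(disk);    /* safe even when disk == NULL */
    return -1;
}
```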
17 Mar, 2011
7 commits
-
Version 3 is updated to apply to for-2.6.39/core.
For version 2, I took Vivek's advice and made sure we update the group
weight from cfq_group_service_tree_add().
If a weight was updated while a group is on the service tree, the
calculation for the total weight of the service tree can be adjusted
improperly, which either leads to bad service tree weights, or
potentially crashes (if total_weight becomes 0).
This patch defers updates to the weight until a group is off the service
tree.
Signed-off-by: Justin TerAvest
Acked-by: Vivek Goyal
Signed-off-by: Jens Axboe
-
We don't have proper reference counting for this yet, so we run into
cases where the device is pulled and we OOPS on flushing the fs data.
This happens even though the dirty inodes have already been
migrated to the default_backing_dev_info.
Reported-by: Torsten Hilbrich
Tested-by: Torsten Hilbrich
Cc: stable@kernel.org
Signed-off-by: Jens Axboe
-
MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.
Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.
Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.
Signed-off-by: Martin K. Petersen
Reported-by: Zdenek Kabelac
Acked-by: Mike Snitzer
Signed-off-by: Jens Axboe
-
'write_op' was still used, even though it was always WRITE_SYNC now.
Add plugging around the cases where it submits IO, and flush them
before we end up waiting for that IO.
Signed-off-by: Jens Axboe
-
'write_op' was still used, even though it was always WRITE_SYNC now.
Add plugging around the cases where it submits IO, and flush them
before we end up waiting for that IO.
Signed-off-by: Jens Axboe
-
It used WRITE_SYNC_PLUG before and potentially submits a batch
of IO, so let's enable plugging for this case.
Signed-off-by: Jens Axboe
-
This recovers a performance regression caused by the removal
of the per-device plugging.
Signed-off-by: Jens Axboe
12 Mar, 2011
4 commits
-
There are two kinds of times that tasks are not charged for: the first
seek and the extra time slice used over the allocated timeslice. Both
of these are exported as a new unaccounted_time stat.
I think it would be good to have this reported in 'time' as well, but
that is probably a separate discussion.
Signed-off-by: Justin TerAvest
Signed-off-by: Jens Axboe
-
They used an older prototype, fix it up.
Reported-by: Randy Dunlap
Signed-off-by: Jens Axboe
-
Barrier support has already been removed, so remove the obsolete
comments in blkdev_issue_zeroout.
Cc: Jens Axboe
Signed-off-by: Tao Ma
Signed-off-by: Jens Axboe
-
In blk_add_trace_rq, we only chose the minor 2 bits from the
request's cmd_flags and did some checking for discard, so most of
the other flags (e.g. REQ_SYNC) are missing.
For example, with a sync write, after blkparse we get:
8,16 1 1 0.001776503 7509 A WS 1349632 + 1024
So pass the request's cmd_flags directly to __blk_add_trace.
With this patch, after a sync write we get:
8,16 1 1 0.001776900 5425 A WS 1189888 + 1024
Acked-by: Jeff Moyer
Signed-off-by: Jens Axboe
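The effect of passing the full cmd_flags can be modeled in a few lines (flag bit values and the fill function are illustrative, not blktrace's actual code):

```c
/* Build the blkparse-style action string from the full cmd_flags
 * instead of only the low bits, so the sync bit shows up as "WS"
 * rather than being dropped. Bit values here are illustrative. */
#include <string.h>

#define REQ_WRITE  (1u << 0)
#define REQ_SYNC   (1u << 1)

static void fill_rwbs(char *rwbs, unsigned int cmd_flags)
{
    int i = 0;
    rwbs[i++] = (cmd_flags & REQ_WRITE) ? 'W' : 'R';
    if (cmd_flags & REQ_SYNC)
        rwbs[i++] = 'S';   /* lost when only the low bits were passed */
    rwbs[i] = '\0';
}
```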
10 Mar, 2011
23 commits
-
Conflicts:
block/blk-core.c
block/blk-flush.c
drivers/md/raid1.c
drivers/md/raid10.c
drivers/md/raid5.c
fs/nilfs2/btnode.c
fs/nilfs2/mdt.c
Signed-off-by: Jens Axboe
-
Use plug in throttle dispatch also as we are dispatching a bunch of
bios in throttle context and some of them might merge.
Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe
-
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.
Signed-off-by: Jens Axboe
-
This should be useless now that we have on-stack plugging, so let's just
kill it.
Signed-off-by: Jens Axboe
-
Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe
-
Signed-off-by: Jens Axboe
-
Signed-off-by: Jens Axboe
-
Signed-off-by: Jens Axboe
-
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So let's kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe
-
This patch adds support for creating a queuing context outside
of the queue itself. This enables us to batch up pieces of IO
before grabbing the block device queue lock and submitting them to
the IO scheduler.
The context is created on the stack of the process and assigned in
the task structure, so that we can auto-unplug it if we hit a schedule
event.
The current queue plugging happens implicitly if IO is submitted to
an empty device, yet callers have to remember to unplug that IO when
they are going to wait for it. This is an ugly API and has caused bugs
in the past. Additionally, it requires hacks in the vm (->sync_page()
callback) to handle that logic. By switching to an explicit plugging
scheme we make the API a lot nicer and can get rid of the ->sync_page()
hack in the vm.
Signed-off-by: Jens Axboe
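A simplified userspace model of the explicit on-stack plugging described above (the structure and function names mirror the description, not the kernel source):

```c
/* A plug lives on the caller's stack, batches submitted requests, and
 * flushes them all at once when finished - instead of implicit
 * plugging plus a remember-to-unplug contract. */
#include <stddef.h>

struct request { struct request *next; };

struct blk_plug {
    struct request *list;   /* requests batched while plugged */
    int depth;
};

static void blk_start_plug_model(struct blk_plug *plug)
{
    plug->list = NULL;
    plug->depth = 0;
}

/* Queue a request into the on-stack plug instead of the device queue. */
static void submit_while_plugged(struct blk_plug *plug, struct request *rq)
{
    rq->next = plug->list;
    plug->list = rq;
    plug->depth++;
}

/* Flush the batch; returns how many requests were dispatched. The real
 * flush is also where merge checking against pending IO can happen. */
static int blk_finish_plug_model(struct blk_plug *plug)
{
    int n = plug->depth;
    plug->list = NULL;
    plug->depth = 0;
    return n;
}
```

Because the plug is on the submitter's stack and tracked in the task structure, the scheduler can auto-flush it on a schedule event, which is what removes the old "forgot to unplug" class of bugs.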
-
It was always abuse to reuse the plugging infrastructure for this,
convert it to the (new) real API for delaying queueing a bit. A
default delay of 3 msec is defined, to match the previous
behaviour.
Signed-off-by: Jens Axboe
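The delay-API idea replacing plugging-as-delay can be sketched as follows (a hypothetical model; the structure, names, and time handling are illustrative):

```c
/* Instead of plugging and later unplugging to postpone queue work, the
 * queue is simply asked to run again after a short delay - 3 msec by
 * default, matching the commit above. */
struct run_queue {
    unsigned long run_at_ms;   /* earliest time the queue may run */
};

#define DEFAULT_DELAY_MS 3

static void delay_queue_run(struct run_queue *q, unsigned long now_ms,
                            unsigned long delay_ms)
{
    q->run_at_ms = now_ms + delay_ms;
}

static int queue_may_run(const struct run_queue *q, unsigned long now_ms)
{
    return now_ms >= q->run_at_ms;
}
```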
-
It was always abuse to reuse the plugging infrastructure for this,
convert it to the (new) real API for delaying queueing a bit.
Signed-off-by: Jens Axboe
Acked-by: David S. Miller
-
Currently we use plugging for that, but as plugging is going away,
we need an alternative mechanism.
Signed-off-by: Jens Axboe
-
Convert two staging drivers - blkvsc_drv and cyasblkdev_block - from
->media_changed() to ->check_events(). The former always indicated
media changed while the latter always indicated media not changed.
Not sure what the drivers are trying to achieve but keep the original
behavior.
Signed-off-by: Tejun Heo
Acked-by: Greg Kroah-Hartman
Cc: Jens Axboe
Cc: Kay Sievers
-
Convert from ->media_changed() to ->check_events().
pktcdvd needs to forward all event related operations to the
underlying device. Forward ->check_events() instead of
->media_changed() and inherit disk->[async_]events.
Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Kay Sievers
Cc: Peter Osterlund
-
umem doesn't implement media changed detection and there's no need to
implement a dummy callback anymore. Remove it.
Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Kay Sievers
-
Convert from ->media_changed() to ->check_events().
s390/tape_block buffers media changed state and clears it on
revalidation. It will behave correctly with kernel event polling.
Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Kay Sievers
Cc: Martin Schwidefsky
Cc: Heiko Carstens
-
Convert from ->media_changed() to ->check_events().
i2o_block buffers media changed state and clears it after reporting.
It will behave correctly with kernel event polling.
Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Kay Sievers
Cc: Markus Lidel
-
Convert from ->media_changed() to ->check_events().
xsysace buffers media changed state and clears it on revalidation. It
will behave correctly with kernel event polling.
Signed-off-by: Tejun Heo
Acked-by: Grant Likely
Cc: Jens Axboe
Cc: Kay Sievers
-
Convert from ->media_changed() to ->check_events().
ub buffers media changed state and clears it on revalidation. It will
behave correctly with kernel event polling.
Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Kay Sievers
Cc: Pete Zaitcev
-
Convert from ->media_changed() to ->check_events().
Both swim and swim3 buffer media changed state and clear it on
revalidation. They will behave correctly with kernel event polling.
Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Kay Sievers
Cc: Laurent Vivier
Cc: Benjamin Herrenschmidt
-
Convert from ->media_changed() to ->check_events().
DAC960 media change notification seems to be one way (once set, never
cleared) and will generate spurious events when polled once the
condition triggers.
Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Kay Sievers
-
Convert paride drivers from ->media_changed() to ->check_events().
pcd and pd buffer and clear events after reporting; however, pf
unconditionally reports MEDIA_CHANGE and will generate spurious events
when polled.
Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Kay Sievers
Cc: Tim Waugh
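The ->media_changed() to ->check_events() conversions above all follow one pattern: the driver buffers a media-changed flag and clears it when the event is reported, so kernel event polling does not see the same event twice. A hedged sketch of that pattern (hypothetical driver, simplified signature, illustrative names):

```c
/* New-style callback: report and clear pending events in one step,
 * unlike ->media_changed(), which only returned a boolean and left
 * clearing semantics up to each driver. */
#define DISK_EVENT_MEDIA_CHANGE (1u << 0)

struct toy_drive {
    int media_changed;   /* set by the (hypothetical) interrupt path */
};

static unsigned int toy_check_events(struct toy_drive *drv,
                                     unsigned int clearing)
{
    unsigned int pending = 0;

    if (drv->media_changed)
        pending |= DISK_EVENT_MEDIA_CHANGE;
    if (clearing & DISK_EVENT_MEDIA_CHANGE)
        drv->media_changed = 0;
    return pending;
}
```

Drivers that never clear the flag (DAC960, pf above) generate spurious events under polling, which is exactly the caveat those commit messages note.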