22 Apr, 2011
2 commits
-
DISK_EVENT_MEDIA_CHANGE is used both for the userland-visible event and
for the internal event that triggers revalidation of removable devices.
Some legacy drivers don't implement proper event detection and
continuously generate events under certain circumstances. For example,
ide-cd generates media-changed events continuously if there's no media
in the drive, which can lead to an infinite loop of events bouncing
back and forth between the driver and the userland event handler.
This patch updates the disk event infrastructure such that it never
propagates events not listed in disk->events to userland. Those
events are processed the same for internal purposes but uevent
generation is suppressed.
This also ensures that userland only gets events which are advertised
in the @events sysfs node, lowering the risk of confusion.
Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe
-
The sort insert is the one that goes to the IO scheduler. With
the SORT_MERGE addition, we could bypass IO scheduler setup
but still ask the IO scheduler to insert the request. This would
cause an oops on switching IO schedulers through the sysfs
interface, unless the disk just happened to be idle while it
occurred.
Signed-off-by: Jens Axboe
19 Apr, 2011
6 commits
-
In queue_requests_store, the code looks like
if (rl->count[BLK_RW_SYNC] >= q->nr_requests) {
        blk_set_queue_full(q, BLK_RW_SYNC);
} else if (rl->count[BLK_RW_SYNC]+1 <= q->nr_requests) {
        blk_clear_queue_full(q, BLK_RW_SYNC);
        wake_up(&rl->wait[BLK_RW_SYNC]);
}
If the "if" condition isn't satisfied, we know that
rl->count[BLK_RW_SYNC] < q->nr_requests, which is equivalent to
rl->count[BLK_RW_SYNC]+1 <= q->nr_requests.
So everything that reaches the "else" branch already satisfies the
"else if" check, and the check isn't actually needed.
Signed-off-by: Tao Ma
Signed-off-by: Jens Axboe
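With the redundant check dropped, the code above presumably collapses to
a plain if/else along these lines (a sketch of the resulting shape, not
the verbatim patch):
if (rl->count[BLK_RW_SYNC] >= q->nr_requests) {
        blk_set_queue_full(q, BLK_RW_SYNC);
} else {
        blk_clear_queue_full(q, BLK_RW_SYNC);
        wake_up(&rl->wait[BLK_RW_SYNC]);
}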
-
We do not call blk_trace_remove_sysfs() in the error return path
when kobject_add() fails. This patch fixes it.
Cc: stable@kernel.org
Signed-off-by: Liu Yuan
Signed-off-by: Jens Axboe
-
We don't pass in a 'force_kblockd' anymore, so get rid of the
stale comment.
Reported-by: Mike Snitzer
Signed-off-by: Jens Axboe
-
We are currently using this flag to check whether it's safe
to call into ->request_fn(). If it is set, we punt to kblockd.
But we get a lot of false positives and excessive punts to
kblockd, which hurts performance.
The only real abuser of this infrastructure is SCSI. So export
the async queue run and convert SCSI over to use that. There's
room for improvement in that SCSI need not always use the async
call, but this fixes our performance issue and they can fix that
up in due time.
Signed-off-by: Jens Axboe
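As a rough illustration of the async run described above (a sketch; the
helper body and the exact SCSI call sites are assumptions based on the
description, not the verbatim patch):
/* Run the queue from kblockd instead of the caller's context. */
void blk_run_queue_async(struct request_queue *q)
{
        if (likely(!blk_queue_stopped(q)))
                queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
}
SCSI's queue-run paths would then call blk_run_queue_async() where they
previously relied on the reentrancy flag to punt to kblockd.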
-
For some configurations of CONFIG_PREEMPT that is not true. So
get rid of __call_for_each_cic() and always use the explicitly
rcu_read_lock() protected call_for_each_cic() instead.
This fixes a potential bug related to IO scheduler removal or
online switching.
Thanks to Paul McKenney for clarifying this.
Signed-off-by: Jens Axboe
-
With all drivers and file systems converted, we only have
in-core use of this function. So remove the export.
Reported-by: Christoph Hellwig
Signed-off-by: Jens Axboe
18 Apr, 2011
5 commits
-
Instead of overloading __blk_run_queue to force an offload to kblockd,
add a new blk_run_queue_async helper to do it explicitly. I've kept
the blk_queue_stopped check for now, but I suspect it's not needed
as the check we do when the workqueue item runs should be enough.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Reported-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
If we know we are going to punt to kblockd, we can drop the queue
lock before calling into __blk_run_queue() since it only does a
safe bit test and a workqueue call. Since kblockd needs to grab
this very lock as one of the first things it does, it's a good
optimization to drop the lock before waking kblockd.
Signed-off-by: Jens Axboe
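A sketch of the pattern described above, assuming a from_schedule flag
marks the kblockd punt and the async run helper from the neighbouring
commits (not the verbatim patch):
        if (from_schedule) {
                /*
                 * kblockd grabs the queue lock as one of the first
                 * things it does, so drop it before waking kblockd.
                 */
                spin_unlock(q->queue_lock);
                blk_run_queue_async(q);
        } else {
                __blk_run_queue(q);
                spin_unlock(q->queue_lock);
        }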
-
MD can't use this since it really requires us to be able to
keep more than a single piece of state for the unplug. Commit
048c9374 added the required support for MD, so get rid of this
now unused code.
This reverts commit f75664570d8b75469cc468f23c2b27220984983b.
Conflicts:
block/blk-core.c
Signed-off-by: Jens Axboe
-
md/raid requires an unplug callback, but as it does not use
requests, the current code cannot provide one.
So allow arbitrary callbacks to be attached to the blk_plug.
Signed-off-by: NeilBrown
Signed-off-by: Jens Axboe
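A minimal sketch of what per-plug callbacks can look like (the type and
field names are illustrative assumptions, not necessarily the merged
interface):
struct blk_plug_cb {
        struct list_head list;          /* linked on a list in struct blk_plug */
        void (*callback)(struct blk_plug_cb *cb);
};
A subsystem such as md/raid would attach one of these to the current
plug; when the plug is flushed (explicitly or on schedule()), the
callback runs and can, for example, flush pending bitmap writes.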
16 Apr, 2011
1 commit
-
It's a pretty close match to what we had before - the timer triggering
would mean that nobody unplugged the plug in due time; in the new
scheme this matches very closely what the schedule() unplug now is.
It's essentially the difference between an explicit unplug (IO unplug)
and an implicit unplug (timer unplug: we scheduled with pending IO
queued).
Signed-off-by: Jens Axboe
15 Apr, 2011
2 commits
-
For the explicit unplugging, we'd prefer to kick things off
immediately and not pay the penalty of the latency to switch
to kblockd. So let blk_finish_plug() do the run inline, while
the implicit-on-schedule-out unplug will punt to kblockd.
Signed-off-by: Jens Axboe
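Roughly, the split looks like this (a sketch assuming the flush path
takes a from_schedule flag; function and field names are approximations,
not a verbatim copy of the code):
void blk_finish_plug(struct blk_plug *plug)
{
        /* Explicit unplug: run the queues inline, no kblockd latency. */
        blk_flush_plug_list(plug, false);
        if (plug == current->plug)
                current->plug = NULL;
}

static inline void blk_schedule_flush_plug(struct task_struct *tsk)
{
        struct blk_plug *plug = tsk->plug;

        /* Implicit unplug on schedule(): punt the queue runs to kblockd. */
        if (plug)
                blk_flush_plug_list(plug, true);
}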
-
It's a bit of a mess currently. task->plug is being cleared
and reset in __blk_finish_plug(), and blk_finish_plug() is
testing for a NULL plug which cannot happen even from schedule()
anymore since it uses blk_needs_flush_plug() to determine
whether to call into this function at all.
So get rid of some of the cruft.
Signed-off-by: Jens Axboe
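For reference, a sketch of the blk_needs_flush_plug() guard mentioned
above (assuming the plug keeps its pending requests on a list; a
simplification, not the verbatim helper):
static inline bool blk_needs_flush_plug(struct task_struct *tsk)
{
        struct blk_plug *plug = tsk->plug;

        return plug && !list_empty(&plug->list);
}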
14 Apr, 2011
1 commit
-
In the function blk_register_queue(), the variable _dev_ is already
assigned by disk_to_dev(), so use it directly instead of calling
disk_to_dev() again.
Signed-off-by: Liu Yuan
Modified by me to delete an empty line in the same function while
in there anyway.
Signed-off-by: Jens Axboe
12 Apr, 2011
6 commits
-
There are worries that we are now consuming a lot more stack in
some cases, since we potentially call into IO dispatch from
schedule() or io_schedule(). We can reduce this problem by moving
the running of the queue to kblockd, like the old plugging scheme
did as well.
This may or may not be a good idea from a performance perspective,
depending on how many tasks have queue plugs running at the same
time. For even the slightly contended case, doing just a single
queue run from kblockd instead of multiple runs directly from the
unpluggers will be faster.
Signed-off-by: Jens Axboe
-
The original use for this dates back to when we had to track write
requests for serializing around barriers. That's not needed anymore,
so kill it.
Signed-off-by: Jens Axboe
-
This was removed with the queue plug state. But we can easily re-add
it by checking if this is the first request going to this queue. It's
good information to have when tracing to see how effective the
plugging is.
Signed-off-by: Jens Axboe
-
MD would like to know when a queue is unplugged, so it can flush
its bitmap writes. Add such a callback.
Signed-off-by: Jens Axboe
-
It's done at the top to avoid doing it for every queue we unplug.
Signed-off-by: Jens Axboe
-
It was removed with the on-stack plugging; re-add it and track the
depth of requests added when flushing the plug.
Signed-off-by: Jens Axboe
11 Apr, 2011
1 commit
-
If the request_fn ends up blocking, we could be re-entering
the plug flush. Since the list is protected by explicitly
not allowing schedule events, this isn't a terribly good idea.
Additionally, it can cause us to recurse. As the request_fn called by
__blk_run_queue is allowed to 'schedule()' (after dropping the queue
lock, of course), it is possible to get a recursive call:
schedule -> blk_flush_plug -> __blk_finish_plug -> flush_plug_list
-> __blk_run_queue -> request_fn -> schedule
We must make sure that the second schedule does not call into
blk_flush_plug again. So instead of leaving the list of requests on
blk_plug->list, move them to a separate list, leaving blk_plug->list
empty.
Signed-off-by: Jens Axboe
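A sketch of the splice-to-a-private-list idea described above (the local
variable name and surrounding dispatch code are assumptions; the point
is the pattern):
        LIST_HEAD(list);

        /*
         * Detach the plugged requests: blk_plug->list is now empty, so a
         * recursive schedule() -> blk_flush_plug sees nothing to flush.
         */
        list_splice_init(&plug->list, &list);

        /* ... sort and dispatch the requests from 'list' ... */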
08 Apr, 2011
1 commit
-
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
Fix common misspellings
06 Apr, 2011
6 commits
-
Comparison function for list_sort() must be anticommutative,
otherwise it is not sorting in the ordinary meaning.
But fortunately list_sort() always checks ((*cmp)(priv, a, b) <= 0).
Signed-off-by: Jens Axboe
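For illustration, an anticommutative comparator in the shape list_sort()
expects (the struct and names below are made up for the example; this is
not the comparator the commit refers to):
struct plug_entry {
        struct list_head list;
        unsigned long key;
};

/*
 * cmp(a, b) and cmp(b, a) always have opposite signs (or are both 0),
 * which is the anticommutativity list_sort() relies on.
 */
static int plug_entry_cmp(void *priv, struct list_head *a,
                          struct list_head *b)
{
        struct plug_entry *ea = list_entry(a, struct plug_entry, list);
        struct plug_entry *eb = list_entry(b, struct plug_entry, list);

        if (ea->key < eb->key)
                return -1;
        if (ea->key > eb->key)
                return 1;
        return 0;
}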
-
The current block integrity (DIF/DIX) support in DM is verifying that
all devices' integrity profiles match during DM device resume (which
is past the point of no return). To some degree that is unavoidable
(stacked DM devices force this late checking). But for most DM
devices (which aren't stacking on other DM devices) the ideal time to
verify all integrity profiles match is during table load.
Introduce the notion of an "initialized" integrity profile: a profile
that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
template. Add blk_integrity_is_initialized() to allow checking if a
profile was initialized.
Update DM integrity support to:
- check that all devices with _initialized_ integrity profiles match
during table load; uninitialized profiles (e.g. for underlying DM
device(s) of a stacked DM device) are ignored.
- disallow a table load that would result in an integrity profile that
conflicts with a DM device's existing (in-use) integrity profile
- avoid clearing an existing integrity profile
- validate all integrity profiles match during resume; but if they
don't, all we can do is report the mismatch (during resume we're past
the point of no return).
Signed-off-by: Mike Snitzer
Cc: Martin K. Petersen
Signed-off-by: Jens Axboe
-
xchg does not work portably with types smaller than 32 bits.
Signed-off-by: Andreas Schwab
Signed-off-by: Jens Axboe
-
It's not a preempt type request; in fact, we have to insert it
behind requests that do specify INSERT_FRONT.
Signed-off-by: Jens Axboe
-
Merge it with __elv_add_request(); it's pretty pointless to
have a function with only two callers. The main interface
is elv_add_request()/__elv_add_request().
Signed-off-by: Jens Axboe
-
Currently we just dump a non-informative 'request botched' message.
Let's actually try and print something sane to help debug issues
around this.
Signed-off-by: Jens Axboe
31 Mar, 2011
1 commit
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
26 Mar, 2011
2 commits
-
When the queue work handler was converted to delayed work, the
stopping was inadvertently made sync as well. Change this back
to being async stop, using __cancel_delayed_work() instead of
cancel_delayed_work().
Reported-by: Jeremy Fitzhardinge
Reported-by: Chris Mason
Signed-off-by: Jens Axboe
-
With the introduction of the on-stack plugging, we would assume
that any request being inserted was a normal file system request.
As flush/fua requires a special insert mode, this caused problems.
Fix this up by checking for this in flush_plug_list() and using
the appropriate insert mechanism.
Big thanks goes to Markus Tripplesdorf for tirelessly testing
patches, and to Sergey Senozhatsky for helping find the real
issue.
Reported-by: Markus Tripplesdorf
Signed-off-by: Jens Axboe
25 Mar, 2011
1 commit
-
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...
Fix up conflicts in fs/{aio.c,super.c}
23 Mar, 2011
5 commits
-
Remove the think time checking. A queue with a high think time might
mean the queue dispatches several requests and then goes away.
Limiting such a queue seems meaningless, and removing the check also
simplifies the code. This was suggested by Vivek.
Signed-off-by: Shaohua Li
Acked-by: Vivek Goyal
Signed-off-by: Jens Axboe
-
For v2, I added back lines to cfq_preempt_queue() that were removed
during updates for accounting unaccounted_time. Thanks for pointing out
that I'd missed these, Vivek.
Previous commit "cfq-iosched: Don't set active queue in preempt" wrongly
cleared stats for preempting queues when it shouldn't have, because when
we choose a queue to preempt, it still isn't necessarily scheduled next.
Thanks to Vivek Goyal for figuring this out and understanding how the
preemption code works.
Signed-off-by: Justin TerAvest
Signed-off-by: Jens Axboe
-
Lina reported that if throttle limits are initially very high and then
dropped, no new bio might be dispatched for a long time. The reason is
that after dropping the limits we don't reset the existing slice, so we
do the rate calculation with the new low rate while accounting the bios
dispatched at the high rate. To fix it, reset the slice upon rate
change.
https://lkml.org/lkml/2011/3/10/298
Another problem with very high limit is that we never queued the
bio on throtl service tree. That means we kept on extending the
group slice but never trimmed it. Fix that also by regularly
trimming the slice even if a bio is not being queued up.
Reported-by: Lina Lu
Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe
-
This change moves unaccounted_time to only be reported when
CONFIG_DEBUG_BLK_CGROUP is true.
Signed-off-by: Justin TerAvest
Signed-off-by: Jens Axboe
-
Commit "Add unaccounted time to timeslice_used" changed the behavior of
cfq_preempt_queue to set cfqq active. Vivek pointed out that other
preemption rules might get involved, so we shouldn't manually set which
queue is active.
This cleans up the code to just clear the queue stats at preemption
time.
Signed-off-by: Justin TerAvest
Signed-off-by: Jens Axboe