Eric Lee / smarc-fsl-linux-kernel

23 Dec, 2015

2 commits

bbc758ec0 block: remove REQ_NO_TIMEOUT flag

This was added for the 'magic' AEN requests in the NVMe driver that never
return. We now handle them purely inside the driver and don't need this
core hack any more.

Signed-off-by: Christoph Hellwig
Acked-by: Keith Busch
Signed-off-by: Jens Axboe

Christoph Hellwig
2015-12-23 00:38:34 +0800
287922eb0 block: defer timeouts to a workqueue

Timer context is not very useful for drivers to perform any meaningful abort
action from. So instead of calling the driver from this useless context
defer it to a workqueue as soon as possible.

Note that while a delayed_work item would seem the right thing here I didn't
dare to use it due to the magic in blk_add_timer that pokes deep into timer
internals. But maybe this encourages Tejun to add a sensible API for that to
the workqueue API and we'll all be fine in the end :)

Contains a major update from Keith Busch:

"This patch removes synchronizing the timeout work so that the timer can
start a freeze on its own queue. The timer enters the queue, so timer
context can only start a freeze, but not wait for frozen."

Signed-off-by: Christoph Hellwig
Acked-by: Keith Busch
Signed-off-by: Jens Axboe

Christoph Hellwig
2015-12-23 00:38:16 +0800

25 Nov, 2015

2 commits

3b627a3f9 block: clarify blk_add_timer() use case for blk-mq

Just a comment update on not needing queue_lock, and that we aren't
really adding the request to a timeout list for !mq.

Signed-off-by: Jens Axboe

Jens Axboe
2015-11-25 06:58:53 +0800
55ce0da1d block: fix blk_abort_request for blk-mq drivers

We only added the request to the request list for the !blk-mq case,
so we should only delete it in that case as well.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2015-11-25 06:24:10 +0800

08 Jan, 2015

1 commit

5b3f25fc3 blk-mq: Allow requests to never expire

Some types of requests may be started that are not guaranteed to ever
complete. This adds a request flag that a driver can use to mark the
request as such.

Signed-off-by: Keith Busch
Signed-off-by: Jens Axboe

Keith Busch
2015-01-08 23:59:01 +0800

23 Sep, 2014

3 commits

904158376 block: fix blk_abort_request on blk-mq

Signed-off-by: Christoph Hellwig

Moved blk_mq_rq_timed_out() definition to the private blk-mq.h header.

Signed-off-by: Jens Axboe

Christoph Hellwig
2014-09-23 02:00:08 +0800
5e940aaa5 blk-timeout: fix blk_add_timer

Commit 8cb34819cdd5d (blk-mq: unshared timeout handler) introduces
blk-mq's own timeout handler, and removes the following line:

blk_queue_rq_timed_out(q, blk_mq_rq_timed_out);

which then causes blk_add_timer() to bypass adding the timer,
since blk-mq no longer has q->rq_timed_out_fn defined.

This patch fixes the problem by bypassing the check for blk-mq,
so that both request deadlines are still set and the rolling
timer updated.

Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2014-09-23 02:00:08 +0800
46f92d42e blk-mq: unshared timeout handler

Duplicate the (small) timeout handler in blk-mq so that we can pass
arguments more easily to the driver timeout handler. This enables
the next patch.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2014-09-23 02:00:07 +0800

31 May, 2014

1 commit

c7bca4183 block: ensure that the timer is always added

Commit f793aa537866 relaxed the timer addition a little too much.
If the timer isn't pending, we always need to add it.

Signed-off-by: Jens Axboe

Jens Axboe
2014-05-31 05:41:39 +0800

14 May, 2014

1 commit

0d2602ca3 blk-mq: improve support for shared tags maps

This adds support for active queue tracking, meaning that the
blk-mq tagging maintains a count of active users of a tag set.
This allows us to maintain a notion of fairness between users,
so that we can distribute the tag depth evenly without starving
some users while allowing others to try unfair deep queues.

If sharing of a tag set is detected, each hardware queue will
track the depth of its own queue. And if this exceeds the total
depth divided by the number of active queues, the user is actively
throttled down.
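
The throttling rule described above can be sketched in user-space C. This is
only an illustration: hctx_may_queue() and its parameters are hypothetical
names, and rounding the fair share up is an assumption, not something the
commit message states.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether one hardware queue may take another
 * tag from a shared tag set.
 *
 * total_depth      - total number of tags in the shared set
 * active_queues    - number of queues currently marked active
 * queue_in_flight  - tags this queue already holds
 */
static bool hctx_may_queue(unsigned int total_depth,
                           unsigned int active_queues,
                           unsigned int queue_in_flight)
{
    unsigned int share;

    /* A queue asking for a tag is itself active, so this must be non-zero. */
    assert(active_queues > 0);

    /* Fair share: total depth divided by active queues, rounded up (assumed). */
    share = (total_depth + active_queues - 1) / active_queues;

    /* Throttle once the queue has used up its share. */
    return queue_in_flight < share;
}
```

With 64 tags shared by 4 active queues, each queue in this sketch is allowed
up to 16 tags in flight before it is throttled.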

The active queue count is done lazily to avoid bouncing that data
between submitter and completer. Each hardware queue gets marked
active when it allocates its first tag, and gets marked inactive
when 1) the last tag is cleared, and 2) the queue timeout grace
period has passed.

Signed-off-by: Jens Axboe

Jens Axboe
2014-05-14 05:10:52 +0800

25 Apr, 2014

1 commit

c4a634f43 block: fold __blk_add_timer into blk_add_timer

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2014-04-25 22:24:40 +0800

24 Apr, 2014

1 commit

87ee7b112 blk-mq: fix race with timeouts and requeue events

If a requeue event races with a timeout, we can get into the
situation where we attempt to complete a request from the
timeout handler when it's not started anymore. This causes a crash.
So have the timeout handler check that REQ_ATOM_STARTED is still
set on the request - if not, we ignore the event. If this happens,
the request has now been marked as complete. As a consequence, we
need to make sure we clear REQ_ATOM_COMPLETE in blk_mq_start_request(),
so as to maintain proper request state.

Signed-off-by: Jens Axboe

Jens Axboe
2014-04-24 22:51:47 +0800

17 Apr, 2014

1 commit

f793aa537 block: relax when to modify the timeout timer

Since we are now, by default, applying timer slack to expiry times,
the logic for when to modify a timer in the block code is suboptimal.
The block layer keeps a forward rolling timer per queue for all
requests, and modifies this timer if a request has a shorter timeout
than what the current expiry time is. However, this breaks down
when slack gets applied to our rounded timer values. Then each new
request ends up modifying the timer, since we're still a little
in front of the timer + slack.

Fix this by allowing a tolerance of HZ / 2; the timeout handling
doesn't need to be very precise. This drastically cuts down
the number of timer modifications we have to make.
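
A minimal user-space sketch of the tolerance check, not the kernel code
itself. The function name should_mod_timer(), the HZ value and the local
time_before() helper are assumptions made for illustration. The "not
pending" branch follows the later fix c7bca4183 listed above.

```c
#include <stdbool.h>

#define HZ 1000UL /* assumed tick rate, for illustration only */

/* Wrap-safe "a is earlier than b" comparison on tick counters. */
static bool time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * Hypothetical helper: should the per-queue timer be re-armed for a
 * request that expires at new_expiry?
 */
static bool should_mod_timer(bool pending,
                             unsigned long current_expiry,
                             unsigned long new_expiry)
{
    /* A timer that is not pending always has to be added. */
    if (!pending)
        return true;

    /* Only re-arm if the new expiry is more than HZ / 2 earlier. */
    return time_before(new_expiry, current_expiry - HZ / 2);
}
```

With the timer pending at tick 5000, a request expiring at 4900 leaves the
timer alone, while one expiring at 4000 re-arms it.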

Signed-off-by: Jens Axboe

Jens Axboe
2014-04-17 04:15:25 +0800

11 Feb, 2014

1 commit

30a91cb4e blk-mq: rework I/O completions

Rework I/O completions to work more like the old code path. blk_mq_end_io
now stays out of the business of deferring completions to other CPUs
and calling blk_mark_rq_complete. The latter is very important to allow
completing requests that have timed out and thus are already marked completed,
the former allows using the IPI callout even for driver specific completions
instead of having to reimplement them.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2014-02-11 00:27:31 +0800

09 Nov, 2013

2 commits

e37459b8e Merge branch 'blk-mq/core' into for-3.13/core

Signed-off-by: Jens Axboe

Conflicts:
block/blk-timeout.c

Jens Axboe
2013-11-09 00:08:12 +0800
8616ebb16 block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO

This patch fixes coccinelle error regarding usage of IS_ERR and
PTR_ERR instead of PTR_ERR_OR_ZERO.
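
For readers unfamiliar with the idiom, here is a user-space sketch of what
the three helpers do. The lowercase names are hypothetical stand-ins, not
the kernel macros, and the MAX_ERRNO value of 4095 is an assumption of this
sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095 /* assumed size of the error-code range */

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Decode the errno value carried by an error pointer. */
static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if ptr falls in the top MAX_ERRNO values of the address space. */
static inline bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* The combined form: the error code if ptr is an error pointer, else 0. */
static inline int ptr_err_or_zero(const void *ptr)
{
    return is_err(ptr) ? (int)ptr_err(ptr) : 0;
}
```

The combined helper replaces the open-coded pattern
"if (is_err(p)) return ptr_err(p); return 0;" with a single call.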

Signed-off-by: Duan Jiong
Signed-off-by: Jens Axboe

Duan Jiong
2013-11-09 00:05:30 +0800

08 Nov, 2013

1 commit

4912aa6c1 block: fix race between request completion and timeout handling

crocode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma dca be2net sg ses enclosure ext4 mbcache jbd2 sd_mod crc_t10dif ahci megaraid_sas(U) dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 491, comm: scsi_eh_0 Tainted: G W ---------------- 2.6.32-220.13.1.el6.x86_64 #1 IBM -[8722PAX]-/00D1461
RIP: 0010:[] [] blk_requeue_request+0x94/0xa0
RSP: 0018:ffff881057eefd60 EFLAGS: 00010012
RAX: ffff881d99e3e8a8 RBX: ffff881d99e3e780 RCX: ffff881d99e3e8a8
RDX: ffff881d99e3e8a8 RSI: ffff881d99e3e780 RDI: ffff881d99e3e780
RBP: ffff881057eefd80 R08: ffff881057eefe90 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff881057f92338
R13: 0000000000000000 R14: ffff881057f92338 R15: ffff883058188000
FS: 0000000000000000(0000) GS:ffff880040200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006d3ec0 CR3: 000000302cd7d000 CR4: 00000000000406b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_eh_0 (pid: 491, threadinfo ffff881057eee000, task ffff881057e29540)
Stack:
0000000000001057 0000000000000286 ffff8810275efdc0 ffff881057f16000
ffff881057eefdd0 ffffffff81362323 ffff881057eefe20 ffffffff8135f393
ffff881057e29af8 ffff8810275efdc0 ffff881057eefe78 ffff881057eefe90
Call Trace:
[] __scsi_queue_insert+0xa3/0x150
[] ? scsi_eh_ready_devs+0x5e3/0x850
[] scsi_queue_insert+0x13/0x20
[] scsi_eh_flush_done_q+0x104/0x160
[] scsi_error_handler+0x35b/0x660
[] ? scsi_error_handler+0x0/0x660
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
Code: 00 00 eb d1 4c 8b 2d 3c 8f 97 00 4d 85 ed 74 bf 49 8b 45 00 49 83 c5 08 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 eb eb a4 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
RIP [] blk_requeue_request+0x94/0xa0
RSP

The RIP is this line:
BUG_ON(blk_queued_rq(rq));

After digging through the code, I think there may be a race between the
request completion and the timer handler running.

A timer is started for each request put on the device's queue (see
blk_start_request->blk_add_timer). If the request does not complete
before the timer expires, the timer handler (blk_rq_timed_out_timer)
will mark the request complete atomically:

    static inline int blk_mark_rq_complete(struct request *rq)
    {
        return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
    }

and then call blk_rq_timed_out. The latter function will call
scsi_times_out, which will return one of BLK_EH_HANDLED,
BLK_EH_RESET_TIMER or BLK_EH_NOT_HANDLED. If BLK_EH_RESET_TIMER is
returned, blk_clear_rq_complete is called, and blk_add_timer is again
called to simply wait longer for the request to complete.
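
The "whoever sets the bit first owns the request" rule can be sketched in
user-space C11. This is an illustration only: struct req, mark_complete()
and clear_complete() are hypothetical names, and C11 atomics stand in for
the kernel's test_and_set_bit().

```c
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_COMPLETE_BIT 0x1u /* stand-in for REQ_ATOM_COMPLETE */

/* Hypothetical, stripped-down request carrying only the atomic flags. */
struct req {
    atomic_uint flags;
};

/*
 * Atomically set the complete bit. Returns true if it was already set,
 * meaning another path (completion or timeout) got there first and the
 * caller must back off.
 */
static bool mark_complete(struct req *rq)
{
    return (atomic_fetch_or(&rq->flags, REQ_COMPLETE_BIT) &
            REQ_COMPLETE_BIT) != 0;
}

/* Clear the bit again, as the timeout path does when it resets the timer. */
static void clear_complete(struct req *rq)
{
    atomic_fetch_and(&rq->flags, ~REQ_COMPLETE_BIT);
}
```

The first caller of mark_complete() sees false and proceeds. Any later
caller sees true until clear_complete() runs, which is exactly the window
the rest of this message is concerned with.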

Now, if the request happens to complete while this is going on, what
happens? Given that we know the completion handler will bail if it
finds the REQ_ATOM_COMPLETE bit set, we need to focus on the completion
handler running after that bit is cleared. So, from the above
paragraph, after the call to blk_clear_rq_complete. If the completion
sets REQ_ATOM_COMPLETE before the BUG_ON in blk_add_timer, we go boom
there (I haven't seen this in the cores). Next, if we get the
completion before the call to list_add_tail, then the timer will
eventually fire for an old req, which may either be freed or reallocated
(there is evidence that this might be the case). Finally, if the
completion comes in *after* the addition to the timeout list, I think
it's harmless. The request will be removed from the timeout list,
REQ_ATOM_COMPLETE will be set, and all will be well.

This will only actually explain the coredumps *IF* the request
structure was freed, reallocated *and* queued before the error handler
thread had a chance to process it. That is possible, but it may make
sense to keep digging for another race. I think that if this is what
was happening, we would see other instances of this problem showing up
as null pointer or garbage pointer dereferences, for example when the
request structure was not re-used. It looks like we actually do run
into that situation in other reports.

This patch moves the BUG_ON(test_bit(REQ_ATOM_COMPLETE,
&req->atomic_flags)); from blk_add_timer to the only caller that could
trip over it (blk_start_request). It then inverts the calls to
blk_clear_rq_complete and blk_add_timer in blk_rq_timed_out to address
the race. I've boot tested this patch, but nothing more.

Signed-off-by: Jeff Moyer
Acked-by: Hannes Reinecke
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Jeff Moyer
2013-11-08 23:59:04 +0800

25 Oct, 2013

1 commit

320ae51fe blk-mq: new multi-queue block IO queueing mechanism

Linux currently has two models for block devices:

- The classic request_fn based approach, where drivers use struct
request units for IO. The block layer provides various helper
functionalities to let drivers share code, things like tag
management, timeout handling, queueing, etc.

- The "stacked" approach, where a driver squeezes in between the
block layer and IO submitter. Since this bypasses the IO stack,
drivers generally have to manage everything themselves.

With drivers being written for new high IOPS devices, the classic
request_fn based driver doesn't work well enough. The design dates
back to when both SMP and high IOPS were rare. It has problems with
scaling to bigger machines, and runs into scaling issues even on
smaller machines when you have IOPS in the hundreds of thousands
per device.

The stacked approach is then most often selected as the model
for the driver. But this means that everybody has to re-invent
everything, and along with that we get all the problems again
that the shared approach solved.

This commit introduces blk-mq, block multi queue support. The
design is centered around per-cpu queues for queueing IO, which
then funnel down into x number of hardware submission queues.
We might have a 1:1 mapping between the two, or it might be
an N:M mapping. That all depends on what the hardware supports.

blk-mq provides various helper functions, which include:

- Scalable support for request tagging. Most devices need to
be able to uniquely identify a request both in the driver and
to the hardware. The tagging uses per-cpu caches for freed
tags, to enable cache hot reuse.

- Timeout handling without tracking requests on a per-device
basis. Basically the driver should be able to get a notification,
if a request happens to fail.

- Optional support for non 1:1 mappings between issue and
submission queues. blk-mq can redirect IO completions to the
desired location.

- Support for per-request payloads. Drivers almost always need
to associate a request structure with some driver private
command structure. Drivers can tell blk-mq this at init time,
and then any request handed to the driver will have the
required size of memory associated with it.

- Support for merging of IO, and plugging. The stacked model
gets neither of these. Even for high IOPS devices, merging
sequential IO reduces per-command overhead and thus
increases bandwidth.

For now, this is provided as a potential 3rd queueing model, with
the hope being that, as it matures, it can replace both the classic
and stacked model. That would get us back to having just 1 real
model for block devices, leaving the stacked approach to dm/md
devices (as it was originally intended).

Contributions in this patch from the following people:

Shaohua Li
Alexander Gordeev
Christoph Hellwig
Mike Christie
Matias Bjorling
Jeff Moyer

Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Jens Axboe
2013-10-25 18:56:00 +0800

01 Jul, 2013

1 commit

80bd7181b block: check for timeout function in blk_rq_timed_out()

rq_timed_out_fn might have been unset while the request
was in flight, so we need to check for it in blk_rq_timed_out().

Acked-by: Jens Axboe
Signed-off-by: Hannes Reinecke
Signed-off-by: Stefan Weinhuber
Signed-off-by: Martin Schwidefsky

Hannes Reinecke
2013-07-01 23:31:23 +0800

15 Jun, 2012

1 commit

76aaa5101 block: Drop dead function blk_abort_queue()

This function was only used by btrfs code in btrfs_abort_devices()
(seems in a wrong way).

It was removed in commit d07eb9117050c9ed3f78296ebcc06128b52693be,
so let's remove the dead code to avoid any confusion.

Changes in v2: update commit log, btrfs_abort_devices() was removed
already.

Cc: Jens Axboe
Cc: linux-kernel@vger.kernel.org
Cc: Chris Mason
Cc: linux-btrfs@vger.kernel.org
Cc: David Sterba
Signed-off-by: Asias He
Signed-off-by: Jens Axboe

Asias He
2012-06-15 14:46:23 +0800

04 Aug, 2011

1 commit

dd48c085c fault-injection: add ability to export fault_attr in arbitrary directory

init_fault_attr_dentries() is used to export fault_attr via debugfs.
But it can only export it in debugfs root directory.

Per Forlin is working on mmc_fail_request which adds support to inject
data errors after a completed host transfer in MMC subsystem.

The fault_attr for mmc_fail_request should be defined per mmc host and
exported in a per-host debugfs directory such as
/sys/kernel/debug/mmc0/mmc_fail_request.

init_fault_attr_dentries() doesn't help for mmc_fail_request. So this
introduces fault_create_debugfs_attr(), which can create the attribute
directory under an arbitrary parent directory, and replaces
init_fault_attr_dentries().

[akpm@linux-foundation.org: extraneous semicolon, per Randy]
Signed-off-by: Akinobu Mita
Tested-by: Per Forlin
Cc: Jens Axboe
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: Matt Mackall
Cc: Randy Dunlap
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2011-08-04 08:25:20 +0800

21 Apr, 2010

1 commit

a534dbe96 block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer

blk_rq_timed_out_timer() relied on blk_add_timer() never returning a
timer value of zero, but commit 7838c15b8dd18e78a523513749e5b54bda07b0cb
removed the code that bumped this value when it was zero.
Therefore when jiffies is near wrap we could get unlucky & not set the
timeout value correctly.

This patch uses a flag to indicate that the timeout value was set and so
handles jiffies wrap correctly, and it keeps all the logic in one
function so should be easier to maintain in the future.
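
A user-space sketch of the flag-based approach, under stated assumptions:
earliest_deadline() and the array of deadlines are hypothetical, and the
local time_before() helper is a wrap-safe comparison written for this
example. The point is that zero is a legitimate deadline near a wrap, so a
separate flag, not a zero sentinel, records whether a value was found.

```c
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe "a is earlier than b" comparison on tick counters. */
static bool time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * Hypothetical helper: find the earliest deadline among n pending
 * requests. Returns true and stores the result in *next if at least one
 * deadline was seen; returns false and leaves *next untouched otherwise.
 */
static bool earliest_deadline(const unsigned long *deadlines, size_t n,
                              unsigned long *next)
{
    bool next_set = false;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!next_set || time_before(deadlines[i], *next)) {
            *next = deadlines[i];
            next_set = true;
        }
    }
    return next_set;
}
```

A deadline of exactly 0 is returned like any other value here, which is the
case a zero sentinel would have missed.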

Signed-off-by: Richard Kennedy
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Richard Kennedy
2010-04-21 23:42:08 +0800

28 Apr, 2009

1 commit

2eef33e43 block: clean up misc stuff after block layer timeout conversion

* In blk_rq_timed_out_timer(), else { if } to else if

* In blk_add_timer(), simplify if/else block

[ Impact: cleanup ]

Signed-off-by: Tejun Heo

Tejun Heo
2009-04-28 13:37:34 +0800

24 Apr, 2009

1 commit

17d5c8ca7 block: fix intermittent dm timeout based oops

Very rarely under stress testing of dm, oopses are occurring as
something tampers with an old stack frame. This has been traced back
to blk_abort_queue() leaving a timeout_list pointing to the stack.
The reason is that sometimes blk_abort_request() won't delete the
timer (if the request is marked as complete but before the timer has
been removed, a small race window). Fix this by splicing back from
the usually empty list to the q->timeout_list.

Signed-off-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Hannes Reinecke
2009-04-24 14:54:21 +0800

22 Apr, 2009

1 commit

b75911349 block: make blk_abort_queue() ignore non-request based devices

There's nothing to do for those devices, since the timeout handling is
based on requests.

Signed-off-by: Jens Axboe

Jens Axboe
2009-04-22 14:35:10 +0800

18 Feb, 2009

1 commit

be987fdb5 block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list

blk_abort_queue() iterates the timeout list and aborts each request on the
list, but if the driver error handling readds a request to the timeout list
during this processing, we could be looping forever. Fix this by splicing
current entries to a local list and run over that list instead.
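
The splice-to-a-local-list pattern can be sketched in user-space C. This is
a simplified illustration with a singly linked list. All names here (struct
node, abort_queue(), readd_handler() and the list helpers) are hypothetical
and are not the kernel's list API.

```c
#include <stddef.h>

struct node {
    struct node *next;
    int id;
};

struct list {
    struct node *head;
    struct node *tail;
};

static void list_push_tail(struct list *l, struct node *n)
{
    n->next = NULL;
    if (l->tail)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
}

static struct node *list_pop_head(struct list *l)
{
    struct node *n = l->head;

    if (!n)
        return NULL;
    l->head = n->next;
    if (!l->head)
        l->tail = NULL;
    n->next = NULL;
    return n;
}

/* Move every entry from src into dst (overwriting dst) and empty src. */
static void list_splice_all(struct list *src, struct list *dst)
{
    *dst = *src;
    src->head = NULL;
    src->tail = NULL;
}

typedef void (*abort_fn)(struct list *timeout_list, struct node *rq);

/*
 * Abort everything that is on timeout_list right now. Entries the handler
 * re-adds land on timeout_list, not on the local list, so the loop always
 * terminates. Returns the number of requests handed to the handler.
 */
static int abort_queue(struct list *timeout_list, abort_fn handler)
{
    struct list local;
    struct node *rq;
    int aborted = 0;

    list_splice_all(timeout_list, &local);
    while ((rq = list_pop_head(&local)) != NULL) {
        handler(timeout_list, rq);
        aborted++;
    }
    return aborted;
}

/* Example handler that behaves like a driver re-adding the request. */
static void readd_handler(struct list *timeout_list, struct node *rq)
{
    list_push_tail(timeout_list, rq);
}
```

Iterating timeout_list directly with readd_handler() would never reach the
end of the list. Walking the local copy visits each original entry once.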

Signed-off-by: Jens Axboe

Hannes Reinecke
2009-02-18 17:34:16 +0800

29 Dec, 2008

3 commits

70ed28b92 block: leave the request timeout timer running even on an empty list

For sync IO, we'll often do them serialized. This means we'll be touching
the queue timer for every IO, as opposed to only occasionally like we
do for queued IO. Instead of deleting the timer when the last request
is removed, just let it continue running. If a new request comes up soon
we then don't have to re-add the timer. If no new requests arrive,
the timer will expire without side effect later.

This improves high iops sync IO by ~1%.

Signed-off-by: Jens Axboe

Jens Axboe
2008-12-29 15:28:42 +0800
65d3618cc block: add comment in blk_rq_timed_out() about why next can not be 0

Signed-off-by: Jens Axboe

Jens Axboe
2008-12-29 15:28:42 +0800
565e411d7 block: optimizations in blk_rq_timed_out_timer()

Now the rq->deadline can't be zero if the request is in the
timeout_list, so there is no need to have next_set. There is no need to
access a request's deadline field if blk_rq_timed_out is called on it.

Signed-off-by: Malahal Naineni
Signed-off-by: Jens Axboe

malahal@us.ibm.com
2008-12-29 15:28:42 +0800

06 Nov, 2008

1 commit

7838c15b8 Block: use round_jiffies_up()

This patch (as1159b) changes the timeout routines in the block core to
use round_jiffies_up(). There's no point in rounding the timer
deadline down, since if it expires too early we will have to restart
it.

The patch also removes some unnecessary tests when a request is
removed from the queue's timer list.
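
To illustrate the "never round down" idea, here is a deliberately
simplified user-space sketch. round_up_to_second() is a hypothetical name
and HZ is an assumed tick rate. The kernel's round_jiffies_up() does more
than this and is not reproduced here.

```c
#define HZ 1000UL /* assumed tick rate, for illustration only */

/*
 * Hypothetical helper: round a tick count up to the next whole second.
 * Values already on a second boundary are returned unchanged, so the
 * result is never earlier than the requested deadline.
 */
static unsigned long round_up_to_second(unsigned long j)
{
    unsigned long rem = j % HZ;

    return rem ? j + (HZ - rem) : j;
}
```

A deadline of 1001 ticks becomes 2000 in this sketch, so the timer cannot
fire before the request has actually had its full timeout.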

Signed-off-by: Alan Stern
Signed-off-by: Jens Axboe

Alan Stern
2008-11-06 15:42:49 +0800

09 Oct, 2008

4 commits

7ba1fbaa4 block: use rq complete marking in blk_abort_request()

We cannot abort a request if we raced with the timeout handler already,
or with the IO completion. So make blk_abort_request() mark the request
as complete, and only continue if we succeeded.

Found and suggested by Mike Anderson

Signed-off-by: Jens Axboe

Jens Axboe
2008-10-09 14:56:17 +0800
581d4e28d block: add fault injection mechanism for faking request timeouts

Only works for the generic request timer handling. Allows one to
sporadically ignore request completions, thus exercising the timeout
handling.

Signed-off-by: Jens Axboe

Jens Axboe
2008-10-09 14:56:17 +0800
11914a53d block: Add interface to abort queued requests

Signed-off-by: Mike Anderson
Signed-off-by: Jens Axboe

Mike Anderson
2008-10-09 14:56:13 +0800
242f9dcb8 block: unify request timeout handling

Right now SCSI and others do their own command timeout handling.
Move those bits to the block layer.

Instead of having a timer per command, we try to be a bit more clever
and simply have one per-queue. This avoids the overhead of having to
tear down and setup a timer for each command, so it will result in a lot
less timer fiddling.

Signed-off-by: Mike Anderson
Signed-off-by: Jens Axboe

Jens Axboe
2008-10-09 14:56:13 +0800