Eric Lee / smarc-fsl-linux-kernel

01 Aug, 2020

1 commit

d958e343b block: blk-timeout: delete duplicated word ... Browse Code »

Drop the repeated word "request".
Change to the correct kernel-doc notation for function name separtor.

Signed-off-by: Randy Dunlap
Cc: Jens Axboe
Cc: linux-block@vger.kernel.org
Signed-off-by: Jens Axboe

Randy Dunlap
2020-08-01 06:29:47 +0800

17 Jul, 2020

1 commit

943c4d907 block: make blk_timeout_init() static ... Browse Code »

The sparse tool complains as follows:

block/blk-timeout.c:93:12: warning:
symbol 'blk_timeout_init' was not declared. Should it be static?

Function blk_timeout_init() is not used outside of blk-timeout.c, so
mark it static.

Fixes: 9054650fac24 ("block: relax jiffies rounding for timeouts")
Reported-by: Hulk Robot
Signed-off-by: Wei Yongjun
Signed-off-by: Jens Axboe

Wei Yongjun
2020-07-17 21:13:42 +0800

15 Jul, 2020

1 commit

9054650fa block: relax jiffies rounding for timeouts ... Browse Code »

In doing high IOPS testing, blk-mq is generally pretty well optimized.
There are a few things that stuck out as using more CPU than what is
really warranted, and one thing is the round_jiffies_up() that we do
twice for each request. That accounts for about 0.8% of the CPU in
my testing.

We can make this cheaper by avoiding an integer division, by just adding
a rough HZ mask that we can AND with instead. The timeouts are only on a
second granularity already, we don't have to be that accurate here and
this patch barely changes that. All we care about is nice grouping.

Signed-off-by: Jens Axboe

Jens Axboe
2020-07-15 23:23:35 +0800

24 Jun, 2020

1 commit

15f73f5b3 blk-mq: move failure injection out of blk_mq_complete_request ... Browse Code »

Move the call to blk_should_fake_timeout out of blk_mq_complete_request
and into the drivers, skipping call sites that are obvious error
handlers, and remove the now superflous blk_mq_force_complete_rq helper.
This ensures we don't keep injecting errors into completions that just
terminate the Linux request after the hardware has been reset or the
command has been aborted.

Reviewed-by: Daniel Wagner
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-06-24 23:15:57 +0800

01 May, 2019

1 commit

3dcf60bcb block: add SPDX tags to block layer files missing licensing information ... Browse Code »

Various block layer files do not have any licensing information at all.
Add SPDX tags for the default kernel GPLv2 license to those.

Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-05-01 06:12:03 +0800

16 Nov, 2018

2 commits

39795d653 block: don't hold the queue_lock over blk_abort_request ... Browse Code »

There is nothing it could synchronize against, so don't go through
the pains of acquiring the lock.

Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-11-16 03:13:18 +0800
079076b34 block: remove deadline __deadline manipulation helpers ... Browse Code »

No users left since the removal of the legacy request interface, we can
remove all the magic bit stealing now and make it a normal field.

But use WRITE_ONCE/READ_ONCE on the new deadline field, given that we
don't seem to have any mechanism to guarantee a new value actually
gets seen by other threads.

Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-11-16 03:13:16 +0800

10 Nov, 2018

1 commit

9d037ad70 block: remove req->timeout_list ... Browse Code »

Unused now that the legacy request path is gone.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-11-10 02:44:10 +0800

08 Nov, 2018

1 commit

4316b79e4 block: kill legacy parts of timeout handling ... Browse Code »

The only user of legacy timing now is BSG, which is invoked
from the mq timeout handler. Kill the legacy code, and rename
the q->rq_timed_out_fn to q->bsg_job_timeout_fn.

Reviewed-by: Hannes Reinecke
Tested-by: Ming Lei
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:42:33 +0800

24 Jun, 2018

1 commit

f5e350f02 blk-mq: Fix timeout handling in case the timeout handler returns BLK_EH_DONE ... Browse Code »

Make sure that RQF_TIMED_OUT is cleared when a request is reused
after a block driver timeout handler has returned BLK_EH_DONE.

Fixes: da6612673988 ("blk-mq: don't time out requests again that are in the timeout handler")
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Jianchao Wang
Cc: Andrew Randrianasulu
Signed-off-by: Jens Axboe

Bart Van Assche
2018-06-24 00:25:45 +0800

29 May, 2018

3 commits

f6e7d48a7 block: remove BLK_EH_HANDLED ... Browse Code »

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
6600593cb block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONE ... Browse Code »

The BLK_EH_NOT_HANDLED implies nothing happen, but very often that
is not what is happening - instead the driver already completed the
command. Fix the symbolic name to reflect that a little better.

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
12f5b9314 blk-mq: Remove generation seqeunce ... Browse Code »

This patch simplifies the timeout handling by relying on the request
reference counting to ensure the iterator is operating on an inflight
and truly timed out request. Since the reference counting prevents the
tag from being reallocated, the block layer no longer needs to prevent
drivers from completing their requests while the timeout handler is
operating on it: a driver completing a request is allowed to proceed to
the next state without additional syncronization with the block layer.

This also removes any need for generation sequence numbers since the
request lifetime is prevented from being reallocated as a new sequence
while timeout handling is operating on it.

To enables this a refcount is added to struct request so that request
users can be sure they're operating on the same request without it
changing while they're processing it. The request's tag won't be
released for reuse until both the timeout handler and the completion
are done with it.

Signed-off-by: Keith Busch
[hch: slight cleanups, added back submission side hctx lock, use cmpxchg
for completions]
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Keith Busch
2018-05-29 22:59:21 +0800

03 Apr, 2018

1 commit

bc6d65e6d blk-mq: Directly schedule q->timeout_work when aborting a request ... Browse Code »

Request abortion is performed by overriding deadline to now and
scheduling timeout handling immediately. For the latter part, the
code was using mod_timer(timeout, 0) which can't guarantee that the
timer runs afterwards. Let's schedule the underlying work item
directly instead.

This fixes the hangs during probing reported by Sitsofe but it isn't
yet clear to me how the failure can happen reliably if it's just the
above described race condition.

Signed-off-by: Tejun Heo
Reported-by: Sitsofe Wheeler
Reported-by: Meelis Roos
Fixes: 358f70da49d7 ("blk-mq: make blk_abort_request() trigger timeout path")
Cc: stable@vger.kernel.org # v4.16
Link: http://lkml.kernel.org/r/CALjAwxh-PVYFnYFCJpGOja+m5SzZ8Sa4J7ohxdK=r8NyOF-EMA@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.LRH.2.21.1802261049140.4893@math.ut.ee
Signed-off-by: Jens Axboe

Tejun Heo
2018-04-03 06:36:13 +0800

09 Mar, 2018

1 commit

8814ce8a0 block: Introduce blk_queue_flag_{set,clear,test_and_{set,clear}}() ... Browse Code »

Introduce functions that modify the queue flags and that protect
these modifications with the request queue lock. Except for moving
one wake_up_all() call from inside to outside a critical section,
this patch does not change any functionality.

Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Ming Lei
Reviewed-by: Johannes Thumshirn
Reviewed-by: Martin K. Petersen
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe

Bart Van Assche
2018-03-09 05:13:48 +0800

11 Jan, 2018

1 commit

0a72e7f44 block: add accessors for setting/querying request deadline ... Browse Code »

We reduce the resolution of request expiry, but since we're already
using jiffies for this where resolution depends on the kernel
configuration and since the timeout resolution is coarse anyway,
that should be fine.

Reviewed-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-01-11 02:47:47 +0800

10 Jan, 2018

3 commits

634f9e463 blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq ... Browse Code »

After the recent updates to use generation number and state based
synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
to avoid firing the same timeout multiple times.

Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple
times. This removes atomic bitops from hot paths too.

v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().

v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.

Signed-off-by: Tejun Heo
Cc: "jianchao.wang"
Signed-off-by: Jens Axboe

Tejun Heo
2018-01-10 00:31:15 +0800
358f70da4 blk-mq: make blk_abort_request() trigger timeout path ... Browse Code »

With issue/complete and timeout paths now using the generation number
and state based synchronization, blk_abort_request() is the only one
which depends on REQ_ATOM_COMPLETE for arbitrating completion.

There's no reason for blk_abort_request() to be a completely separate
path. This patch makes blk_abort_request() piggyback on the timeout
path instead of trying to terminate the request directly.

This removes the last dependency on REQ_ATOM_COMPLETE in blk-mq.

Note that this makes blk_abort_request() asynchronous - it initiates
abortion but the actual termination will happen after a short while,
even when the caller owns the request. AFAICS, SCSI and ATA should be
fine with that and I think mtip32xx and dasd should be safe but not
completely sure. It'd be great if people who know the drivers take a
look.

v2: - Add comment explaining the lack of synchronization around
->deadline update as requested by Bart.

Signed-off-by: Tejun Heo
Cc: Asai Thambi SP
Cc: Stefan Haberland
Cc: Jan Hoeppner
Cc: Bart Van Assche
Signed-off-by: Jens Axboe

Tejun Heo
2018-01-10 00:31:15 +0800
1d9bd5161 blk-mq: replace timeout synchronization with a RCU and generation based scheme ... Browse Code »

Currently, blk-mq timeout path synchronizes against the usual
issue/completion path using a complex scheme involving atomic
bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
rules. Unfortunately, it contains quite a few holes.

There's a complex dancing around REQ_ATOM_STARTED and
REQ_ATOM_COMPLETE between issue/completion and timeout paths; however,
they don't have a synchronization point across request recycle
instances and it isn't clear what the barriers add.
blk_mq_check_expired() can easily read STARTED from N-2'th iteration,
deadline from N-1'th, blk_mark_rq_complete() against Nth instance.

In fact, it's pretty easy to make blk_mq_check_expired() terminate a
later instance of a request. If we induce 5 sec delay before
time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
2s, and issue back-to-back large IOs, blk-mq starts timing out
requests spuriously pretty quickly. Nothing actually timed out. It
just made the call on a recycle instance of a request and then
terminated a later instance long after the original instance finished.
The scenario isn't theoretical either.

This patch replaces the broken synchronization mechanism with a RCU
and generation number based one.

1. Each request has a u64 generation + state value, which can be
updated only by the request owner. Whenever a request becomes
in-flight, the generation number gets bumped up too. This provides
the basis for the timeout path to distinguish different recycle
instances of the request.

Also, marking a request in-flight and setting its deadline are
protected with a seqcount so that the timeout path can fetch both
values coherently.

2. The timeout path fetches the generation, state and deadline. If
the verdict is timeout, it records the generation into a dedicated
request abortion field and does RCU wait.

3. The completion path is also protected by RCU (from the previous
patch) and checks whether the current generation number and state
match the abortion field. If so, it skips completion.

4. The timeout path, after RCU wait, scans requests again and
terminates the ones whose generation and state still match the ones
requested for abortion.

By now, the timeout path knows that either the generation number
and state changed if it lost the race or the completion will yield
to it and can safely timeout the request.

While it's more lines of code, it's conceptually simpler, doesn't
depend on direct use of subtle memory ordering or coherence, and
hopefully doesn't terminate the wrong instance.

While this change makes REQ_ATOM_COMPLETE synchronization unnecessary
between issue/complete and timeout paths, REQ_ATOM_COMPLETE isn't
removed yet as it's still used in other places. Future patches will
move all state tracking to the new mechanism and remove all bitops in
the hot paths.

Note that this patch adds a comment explaining a race condition in
BLK_EH_RESET_TIMER path. The race has always been there and this
patch doesn't change it. It's just documenting the existing race.

v2: - Fixed BLK_EH_RESET_TIMER handling as pointed out by Jianchao.
- s/request->gstate_seqc/request->gstate_seq/ as suggested by Peter.
- READ_ONCE() added in blk_mq_rq_update_state() as suggested by Peter.

v3: - Fixed possible extended seqcount / u64_stats_sync read looping
spotted by Peter.
- MQ_RQ_IDLE was incorrectly being set in complete_request instead
of free_request. Fixed.

v4: - Rebased on top of hctx_lock() refactoring patch.
- Added comment explaining the use of hctx_lock() in completion path.

v5: - Added comments requested by Bart.
- Note the addition of BLK_EH_RESET_TIMER race condition in the
commit message.

Signed-off-by: Tejun Heo
Cc: "jianchao.wang"
Cc: Peter Zijlstra
Cc: Christoph Hellwig
Cc: Bart Van Assche
Signed-off-by: Jens Axboe

Tejun Heo
2018-01-10 00:31:15 +0800

31 Oct, 2017

1 commit

4e9b6f208 block: Fix a race between blk_cleanup_queue() and timeout handling ... Browse Code »

Make sure that if the timeout timer fires after a queue has been
marked "dying" that the affected requests are finished.

Reported-by: chenxiang (M)
Fixes: commit 287922eb0b18 ("block: defer timeouts to a workqueue")
Signed-off-by: Bart Van Assche
Tested-by: chenxiang (M)
Cc: Christoph Hellwig
Cc: Keith Busch
Cc: Hannes Reinecke
Cc: Ming Lei
Cc: Johannes Thumshirn
Cc:
Signed-off-by: Jens Axboe

Bart Van Assche
2017-10-31 03:28:10 +0800

05 Oct, 2017

1 commit

a7af0af32 blk-mq: attempt to fix atomic flag memory ordering ... Browse Code »

Attempt to untangle the ordering in blk-mq. The patch introducing the
single smp_mb__before_atomic() is obviously broken in that it doesn't
clearly specify a pairing barrier and an obtained guarantee.

The comment is further misleading in that it hints that the
deadline store and the COMPLETE store also need to be ordered, but
AFAICT there is no such dependency. However what does appear to be
important is the clear happening _after_ the store, and that worked by
pure accident.

This clarifies blk_mq_start_request() -- we should not get there with
STARTING set -- this simplifies the code and makes the barrier usage
sane (the old code could be read to allow not having _any_ atomic after
the barrier, in which case the barrier hasn't got anything to order). We
then also introduce the missing pairing barrier for it.

Also down-grade the barrier to smp_wmb(), this is cheaper for
PowerPC/ARM and doesn't cost anything extra on x86.

And it documents the STARTING vs COMPLETE ordering. Although I've not
been entirely successful in reverse engineering the blk-mq state
machine so there might still be more funnies around timeout vs
requeue.

If I got anything wrong, feel free to educate me by adding comments to
clarify things ;-)

Cc: Alan Stern
Cc: Will Deacon
Cc: Ming Lei
Cc: Christoph Hellwig
Cc: Andrea Parri
Cc: Boqun Feng
Cc: Bart Van Assche
Cc: "Paul E. McKenney"
Fixes: 538b75341835 ("blk-mq: request deadline must be visible before marking rq as started")
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Jens Axboe

Peter Zijlstra
2017-10-05 01:20:11 +0800

21 Jun, 2017

1 commit

2fff8a924 block: Check locking assumptions at runtime ... Browse Code »

Instead of documenting the locking assumptions of most block layer
functions as a comment, use lockdep_assert_held() to verify locking
assumptions at runtime.

Signed-off-by: Bart Van Assche
Reviewed-by: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Omar Sandoval
Cc: Ming Lei
Signed-off-by: Jens Axboe

Bart Van Assche
2017-06-21 09:27:14 +0800

21 Apr, 2017

1 commit

caf7df122 block: remove the errors field from struct request ... Browse Code »

Signed-off-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Acked-by: Roger Pau Monné
Reviewed-by: Konrad Rzeszutek Wilk
Signed-off-by: Jens Axboe

Christoph Hellwig
2017-04-21 02:16:10 +0800

23 Dec, 2015

2 commits

bbc758ec0 block: remove REQ_NO_TIMEOUT flag ... Browse Code »

This was added for the 'magic' AEN requests in the NVMe driver that never
return. We now handle them purely inside the driver and don't need this
core hack any more.

Signed-off-by: Christoph Hellwig
Acked-by: Keith Busch
Signed-off-by: Jens Axboe

Christoph Hellwig
2015-12-23 00:38:34 +0800
287922eb0 block: defer timeouts to a workqueue ... Browse Code »

Timer context is not very useful for drivers to perform any meaningful abort
action from. So instead of calling the driver from this useless context
defer it to a workqueue as soon as possible.

Note that while a delayed_work item would seem the right thing here I didn't
dare to use it due to the magic in blk_add_timer that pokes deep into timer
internals. But maybe this encourages Tejun to add a sensible API for that to
the workqueue API and we'll all be fine in the end :)

Contains a major update from Keith Bush:

"This patch removes synchronizing the timeout work so that the timer can
start a freeze on its own queue. The timer enters the queue, so timer
context can only start a freeze, but not wait for frozen."

Signed-off-by: Christoph Hellwig
Acked-by: Keith Busch
Signed-off-by: Jens Axboe

Christoph Hellwig
2015-12-23 00:38:16 +0800

25 Nov, 2015

2 commits

3b627a3f9 block: clarify blk_add_timer() use case for blk-mq ... Browse Code »

Just a comment update on not needing queue_lock, and that we aren't
really adding the request to a timeout list for !mq.

Signed-off-by: Jens Axboe

Jens Axboe
2015-11-25 06:58:53 +0800
55ce0da1d block: fix blk_abort_request for blk-mq drivers ... Browse Code »

We only added the request to the request list for the !blk-mq case,
so we should only delete it in that case as well.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2015-11-25 06:24:10 +0800

08 Jan, 2015

1 commit

5b3f25fc3 blk-mq: Allow requests to never expire ... Browse Code »

Some types of requests may be started that are not gauranteed to ever
complete. This adds a request flag that a driver can use so mark the
request as such.

Signed-off-by: Keith Busch
Signed-off-by: Jens Axboe

Keith Busch
2015-01-08 23:59:01 +0800

23 Sep, 2014

3 commits

904158376 block: fix blk_abort_request on blk-mq ... Browse Code »

Signed-off-by: Christoph Hellwig

Moved blk_mq_rq_timed_out() definition to the private blk-mq.h header.

Signed-off-by: Jens Axboe

Christoph Hellwig
2014-09-23 02:00:08 +0800
5e940aaa5 blk-timeout: fix blk_add_timer ... Browse Code »

Commit 8cb34819cdd5d(blk-mq: unshared timeout handler) introduces
blk-mq's own timeout handler, and removes following line:

blk_queue_rq_timed_out(q, blk_mq_rq_timed_out);

which then causes blk_add_timer() to bypass adding the timer,
since blk-mq no longer has q->rq_timed_out_fn defined.

This patch fixes the problem by bypassing the check for blk-mq,
so that both request deadlines are still set and the rolling
timer updated.

Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2014-09-23 02:00:08 +0800
46f92d42e blk-mq: unshared timeout handler ... Browse Code »

Duplicate the (small) timeout handler in blk-mq so that we can pass
arguments more easily to the driver timeout handler. This enables
the next patch.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2014-09-23 02:00:07 +0800

31 May, 2014

1 commit

c7bca4183 block: ensure that the timer is always added ... Browse Code »

Commit f793aa537866 relaxed the timer addition a little too much.
If the timer isn't pending, we always need to add it.

Signed-off-by: Jens Axboe

Jens Axboe
2014-05-31 05:41:39 +0800

14 May, 2014

1 commit

0d2602ca3 blk-mq: improve support for shared tags maps ... Browse Code »

This adds support for active queue tracking, meaning that the
blk-mq tagging maintains a count of active users of a tag set.
This allows us to maintain a notion of fairness between users,
so that we can distribute the tag depth evenly without starving
some users while allowing others to try unfair deep queues.

If sharing of a tag set is detected, each hardware queue will
track the depth of its own queue. And if this exceeds the total
depth divided by the number of active queues, the user is actively
throttled down.

The active queue count is done lazily to avoid bouncing that data
between submitter and completer. Each hardware queue gets marked
active when it allocates its first tag, and gets marked inactive
when 1) the last tag is cleared, and 2) the queue timeout grace
period has passed.

Signed-off-by: Jens Axboe

Jens Axboe
2014-05-14 05:10:52 +0800

25 Apr, 2014

1 commit

c4a634f43 block: fold __blk_add_timer into blk_add_timer ... Browse Code »

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2014-04-25 22:24:40 +0800

24 Apr, 2014

1 commit

87ee7b112 blk-mq: fix race with timeouts and requeue events ... Browse Code »

If a requeue event races with a timeout, we can get into the
situation where we attempt to complete a request from the
timeout handler when it's not start anymore. This causes a crash.
So have the timeout handler check that REQ_ATOM_STARTED is still
set on the request - if not, we ignore the event. If this happens,
the request has now been marked as complete. As a consequence, we
need to ensure to clear REQ_ATOM_COMPLETE in blk_mq_start_request(),
as to maintain proper request state.

Signed-off-by: Jens Axboe

Jens Axboe
2014-04-24 22:51:47 +0800

17 Apr, 2014

1 commit

f793aa537 block: relax when to modify the timeout timer ... Browse Code »

Since we are now, by default, applying timer slack to expiry times,
the logic for when to modify a timer in the block code is suboptimal.
The block layer keeps a forward rolling timer per queue for all
requests, and modifies this timer if a request has a shorter timeout
than what the current expiry time is. However, this breaks down
when our rounded timer values get applied slack. Then each new
request ends up modifying the timer, since we're still a little
in front of the timer + slack.

Fix this by allowing a tolerance of HZ / 2, the timeout handling
doesn't need to be very precise. This drastically cuts down
the number of timer modifications we have to make.

Signed-off-by: Jens Axboe

Jens Axboe
2014-04-17 04:15:25 +0800

11 Feb, 2014

1 commit

30a91cb4e blk-mq: rework I/O completions ... Browse Code »

Rework I/O completions to work more like the old code path. blk_mq_end_io
now stays out of the business of deferring completions to others CPUs
and calling blk_mark_rq_complete. The latter is very important to allow
completing requests that have timed out and thus are already marked completed,
the former allows using the IPI callout even for driver specific completions
instead of having to reimplement them.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2014-02-11 00:27:31 +0800

09 Nov, 2013

2 commits

e37459b8e Merge branch 'blk-mq/core' into for-3.13/core ... Browse Code »

Signed-off-by: Jens Axboe

Conflicts:
block/blk-timeout.c

Jens Axboe
2013-11-09 00:08:12 +0800
8616ebb16 block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO ... Browse Code »

This patch fixes coccinelle error regarding usage of IS_ERR and
PTR_ERR instead of PTR_ERR_OR_ZERO.

Signed-off-by: Duan Jiong
Signed-off-by: Jens Axboe

Duan Jiong
2013-11-09 00:05:30 +0800

08 Nov, 2013

1 commit

4912aa6c1 block: fix race between request completion and timeout handling ... Browse Code »

crocode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma dca be2net sg ses enclosure ext4 mbcache jbd2 sd_mod crc_t10dif ahci megaraid_sas(U) dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 491, comm: scsi_eh_0 Tainted: G W ---------------- 2.6.32-220.13.1.el6.x86_64 #1 IBM -[8722PAX]-/00D1461
RIP: 0010:[] [] blk_requeue_request+0x94/0xa0
RSP: 0018:ffff881057eefd60 EFLAGS: 00010012
RAX: ffff881d99e3e8a8 RBX: ffff881d99e3e780 RCX: ffff881d99e3e8a8
RDX: ffff881d99e3e8a8 RSI: ffff881d99e3e780 RDI: ffff881d99e3e780
RBP: ffff881057eefd80 R08: ffff881057eefe90 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff881057f92338
R13: 0000000000000000 R14: ffff881057f92338 R15: ffff883058188000
FS: 0000000000000000(0000) GS:ffff880040200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006d3ec0 CR3: 000000302cd7d000 CR4: 00000000000406b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_eh_0 (pid: 491, threadinfo ffff881057eee000, task ffff881057e29540)
Stack:
0000000000001057 0000000000000286 ffff8810275efdc0 ffff881057f16000
ffff881057eefdd0 ffffffff81362323 ffff881057eefe20 ffffffff8135f393
ffff881057e29af8 ffff8810275efdc0 ffff881057eefe78 ffff881057eefe90
Call Trace:
[] __scsi_queue_insert+0xa3/0x150
[] ? scsi_eh_ready_devs+0x5e3/0x850
[] scsi_queue_insert+0x13/0x20
[] scsi_eh_flush_done_q+0x104/0x160
[] scsi_error_handler+0x35b/0x660
[] ? scsi_error_handler+0x0/0x660
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
Code: 00 00 eb d1 4c 8b 2d 3c 8f 97 00 4d 85 ed 74 bf 49 8b 45 00 49 83 c5 08 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 eb eb a4 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
RIP [] blk_requeue_request+0x94/0xa0
RSP

The RIP is this line:
BUG_ON(blk_queued_rq(rq));

After digging through the code, I think there may be a race between the
request completion and the timer handler running.

A timer is started for each request put on the device's queue (see
blk_start_request->blk_add_timer). If the request does not complete
before the timer expires, the timer handler (blk_rq_timed_out_timer)
will mark the request complete atomically:

static inline int blk_mark_rq_complete(struct request *rq)
{
return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
}

and then call blk_rq_timed_out. The latter function will call
scsi_times_out, which will return one of BLK_EH_HANDLED,
BLK_EH_RESET_TIMER or BLK_EH_NOT_HANDLED. If BLK_EH_RESET_TIMER is
returned, blk_clear_rq_complete is called, and blk_add_timer is again
called to simply wait longer for the request to complete.

Now, if the request happens to complete while this is going on, what
happens? Given that we know the completion handler will bail if it
finds the REQ_ATOM_COMPLETE bit set, we need to focus on the completion
handler running after that bit is cleared. So, from the above
paragraph, after the call to blk_clear_rq_complete. If the completion
sets REQ_ATOM_COMPLETE before the BUG_ON in blk_add_timer, we go boom
there (I haven't seen this in the cores). Next, if we get the
completion before the call to list_add_tail, then the timer will
eventually fire for an old req, which may either be freed or reallocated
(there is evidence that this might be the case). Finally, if the
completion comes in *after* the addition to the timeout list, I think
it's harmless. The request will be removed from the timeout list,
req_atom_complete will be set, and all will be well.

This will only actually explain the coredumps *IF* the request
structure was freed, reallocated *and* queued before the error handler
thread had a chance to process it. That is possible, but it may make
sense to keep digging for another race. I think that if this is what
was happening, we would see other instances of this problem showing up
as null pointer or garbage pointer dereferences, for example when the
request structure was not re-used. It looks like we actually do run
into that situation in other reports.

This patch moves the BUG_ON(test_bit(REQ_ATOM_COMPLETE,
&req->atomic_flags)); from blk_add_timer to the only caller that could
trip over it (blk_start_request). It then inverts the calls to
blk_clear_rq_complete and blk_add_timer in blk_rq_timed_out to address
the race. I've boot tested this patch, but nothing more.

Signed-off-by: Jeff Moyer
Acked-by: Hannes Reinecke
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Jeff Moyer
2013-11-08 23:59:04 +0800