13 Feb, 2014

1 commit

  • When mkfs issues a full device discard and the device only
    supports discards of a smallish size, we can loop in
    blkdev_issue_discard() for a long time. If preemption isn't
    enabled, this can turn into a soft lockup situation and the
    kernel will start complaining.

    Add an explicit cond_resched() at the end of the loop to avoid
    that.

    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Jens Axboe
     

24 Nov, 2013

1 commit

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
     

09 Nov, 2013

1 commit

  • do_div() (called by sector_div() if CONFIG_LBDAF=y) is meant for
    division of a 64-bit number by a 32-bit number. Passing 64-bit
    divisor types caused issues in the past on 32-bit platforms, cf.
    commit ea077b1b96e073eac5c3c5590529e964767fc5f7 ("m68k: Truncate
    base in do_div()").

    As queue_limits.max_discard_sectors and .discard_granularity are unsigned
    int, max_discard_sectors and granularity should be unsigned int.
    As bdev_discard_alignment() returns int, alignment should be int.
    Now two calls to sector_div() can be replaced by 32-bit arithmetic:
      - The 64-bit modulo operation can become a 32-bit modulo
        operation,
      - The 64-bit division and multiplication can be replaced by a
        32-bit modulo operation and a subtraction.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Jens Axboe

    Geert Uytterhoeven
     

15 Feb, 2013

1 commit

  • Using wait_for_completion() for waiting for an I/O request to be
    executed results in wrong iowait time accounting. For example, a
    system whose only task is doing write() and fdatasync() on a block
    device can be reported as idle instead of iowaiting, because
    blkdev_issue_flush() calls wait_for_completion(), which in turn
    calls schedule(), which does not increment the iowait proc counter
    and thus does not turn on iowait time accounting.

    The patch makes block layer use wait_for_completion_io() instead of
    wait_for_completion() where appropriate to account iowait time
    correctly.

    Signed-off-by: Vladimir Davydov
    Signed-off-by: Jens Axboe

    Vladimir Davydov
     

15 Dec, 2012

2 commits

  • The last post of this patch appears to have been lost, so I am
    resending it.

    Now that discard merging works, add a plug in blkdev_issue_discard().
    This helps discard requests merge, especially in the raid0 case:
    there, a big discard request is split into small requests, and if
    the plug is set up correctly, those small requests can be merged
    in the lower layers.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • In the MD RAID case, the discard granularity might not be a power
    of 2; for example, a 4-disk raid5 has a discard granularity of
    3*chunk_size. Correct the calculation for such cases.

    Reported-by: Neil Brown
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

20 Sep, 2012

2 commits

  • If the device supports WRITE SAME, use that to optimize zeroing of
    blocks. If the device does not support WRITE SAME or if the operation
    fails, fall back to writing zeroes the old-fashioned way.

    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • The WRITE SAME command supported on some SCSI devices allows the same
    block to be efficiently replicated throughout a block range. Only a
    single logical block is transferred from the host and the storage device
    writes the same data to all blocks described by the I/O.

    This patch implements support for WRITE SAME in the block layer. The
    blkdev_issue_write_same() function can be used by filesystems and block
    drivers to replicate a buffer across a block range. This can be used to
    efficiently initialize software RAID devices, etc.

    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

02 Aug, 2012

2 commits

  • When a disk has large discard_granularity and small max_discard_sectors,
    discards are not split with optimal alignment. In the limit case of
    discard_granularity == max_discard_sectors, no request could be aligned
    correctly, so in fact you might end up with no discarded logical blocks
    at all.

    Another example that helps show the condition in the patch is
    discard_granularity == 64, max_discard_sectors == 128. A request
    submitted for 256 sectors, 2..257, will be split in two: 2..129,
    130..257. However, only 2 of the 3 aligned blocks are covered by
    the request; 128..191 may be left intact and not discarded. With
    this patch, the first request is truncated to ensure good
    alignment of what is left, and the split becomes 2..127, 128..255,
    256..257. The patch also takes discard_alignment into account.

    At most one extra request will be introduced, because the first request
    will be reduced by at most granularity-1 sectors, and granularity
    must be less than max_discard_sectors. Subsequent requests will run
    on round_down(max_discard_sectors, granularity) sectors, as in the
    current code.

    Signed-off-by: Paolo Bonzini
    Acked-by: Vivek Goyal
    Tested-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Paolo Bonzini
     
  • Mostly a preparation for the next patch.

    In principle this fixes an infinite loop if max_discard_sectors < granularity,
    but that really shouldn't happen.

    Signed-off-by: Paolo Bonzini
    Acked-by: Vivek Goyal
    Tested-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Paolo Bonzini
     

24 Jul, 2011

1 commit


07 Jul, 2011

1 commit

  • Due to the recently identified overflow in read_capacity_16() it was
    possible for max_discard_sectors to be zero but still have discards
    enabled on the associated device's queue.

    Eliminate the possibility of blkdev_issue_discard() looping
    forever.

    Interestingly this issue wasn't identified until a device, whose
    discard_granularity was 0 due to read_capacity_16 overflow, was consumed
    by blk_stack_limits() to construct limits for a higher-level DM
    multipath device. The multipath device's resulting limits never had the
    discard limits stacked because blk_stack_limits() will only do so if
    the bottom device's discard_granularity != 0. This resulted in the
    multipath device's limits.max_discard_sectors being 0.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

07 May, 2011

3 commits

  • Currently we return -EOPNOTSUPP from blkdev_issue_discard() if any
    of the bios fails because the underlying device does not support
    discard requests. However, if the device is, for example, a dm
    device composed of devices some of which support discard and some
    of which do not, it is ok for some bios to fail with EOPNOTSUPP;
    that does not mean discard is unsupported altogether.

    This commit removes the check for bios that failed with EOPNOTSUPP
    and changes blkdev_issue_discard() to return "operation not
    supported" if and only if the device does not actually support it,
    rather than just part of the device, as some bios might indicate.

    This change also fixes a problem with the BLKDISCARD ioctl(), which
    now works correctly on such dm devices.

    Signed-off-by: Lukas Czerner
    CC: Jens Axboe
    CC: Jeff Moyer
    Signed-off-by: Jens Axboe

    Lukas Czerner
     
  • In blkdev_issue_zeroout() we are submitting regular WRITE bios, so
    we do not need to check specifically for -EOPNOTSUPP on error.
    There is also no need for the submit: label, because there is no
    way to leave the while loop without an error, and we really want
    to exit rather than try again. Also remove the check for
    (sz == 0), since at that point sz can never be zero.

    Signed-off-by: Lukas Czerner
    Reviewed-by: Jeff Moyer
    CC: Dmitry Monakhov
    CC: Jens Axboe
    Signed-off-by: Jens Axboe

    Lukas Czerner
     
  • Currently we wait for every submitted REQ_DISCARD bio separately,
    which can have the unwanted consequence of repeatedly flushing the
    queue. Instead, submit bios in batches and wait for the entire
    batch, hence narrowing the window for other I/Os to slip in.

    Use bio_batch_end_io() and struct bio_batch for that purpose, the
    same as is used by blkdev_issue_zeroout(). Also change
    bio_batch_end_io() so we always clear BIO_UPTODATE in the case of
    error, and remove the check for bb, since we are the only user of
    this function and we always set it.

    Remove bio_get()/bio_put() from blkdev_issue_discard(), since
    bio_alloc() and bio_batch_end_io() already do the same thing, so
    it is no longer needed.

    I have done simple dd testing, with surprising results. The script
    I used is:

    for i in $(seq 10); do
            echo $i
            dd if=/dev/sdb1 of=/dev/sdc1 bs=4k &
            sleep 5
    done
    /usr/bin/time -f %e ./blkdiscard /dev/sdc1

    Running time of BLKDISCARD on the whole device:

        with patch      without patch
        0.95            15.58

    So we can see that in this artificial test the kernel with the patch
    applied is approx 16x faster in discarding the device.

    Signed-off-by: Lukas Czerner
    CC: Dmitry Monakhov
    CC: Jens Axboe
    CC: Jeff Moyer
    Signed-off-by: Jens Axboe

    Lukas Czerner
     

25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

12 Mar, 2011

1 commit


11 Mar, 2011

1 commit

  • BZ29402
    https://bugzilla.kernel.org/show_bug.cgi?id=29402

    We can hit serious mis-synchronization in the bio completion path
    of blkdev_issue_zeroout(), leading to a panic.

    The problem is that before going to wait_for_completion() in
    blkdev_issue_zeroout() we check whether bb.done equals issued (the
    number of submitted bios). If it does, we can skip the
    wait_for_completion() and just return from the function, since
    there is nothing to wait for. However, there is an ordering
    problem, because bio_batch_end_io() calls atomic_inc(&bb->done)
    before complete(), so it might appear to blkdev_issue_zeroout()
    that all bios have completed, and it exits. At the point when
    bio_batch_end_io() then calls complete(bb->wait), bb and wait no
    longer exist, since they were allocated on the stack in
    blkdev_issue_zeroout() ==> panic!

    (thread 1)                    (thread 2)
    bio_batch_end_io()            blkdev_issue_zeroout()
    if (bb) {                     ...
      if (bb->end_io)             ...
        bb->end_io(bio, err);     ...
      atomic_inc(&bb->done);      ...
    ...                           while (issued != atomic_read(&bb.done))
    ...                           (let issued == bb.done)
    ...                           (do the rest of the function)
    ...                           return ret;
    complete(bb->wait);
    ^^^^^^^^
    panic

    We can fix this easily by simplifying bio_batch and completion counting.

    Also remove bio_end_io_t *end_io since it is not used.

    Signed-off-by: Lukas Czerner
    Reported-by: Eric Whitney
    Tested-by: Eric Whitney
    Reviewed-by: Jeff Moyer
    CC: Dmitry Monakhov
    Signed-off-by: Jens Axboe

    Lukas Czerner
     

02 Mar, 2011

1 commit


17 Sep, 2010

1 commit

  • All the blkdev_issue_* helpers can only sanely be used by
    synchronous callers. To issue cache flushes or barriers
    asynchronously, the caller needs to set up a bio itself, with a
    completion callback to move the asynchronous state machine ahead.
    So drop the BLKDEV_IFL_WAIT flag that is always specified when
    calling blkdev_issue_*, and also remove the now unused flags
    argument to blkdev_issue_flush() and blkdev_issue_zeroout(). For
    blkdev_issue_discard() we need to keep it for the secure discard
    flag, which gains a more descriptive name and loses the bitops vs
    flag confusion.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

10 Sep, 2010

1 commit

  • Remove support for barriers on discards, which is unused now. Also
    remove the DISCARD_NOBARRIER I/O type in favour of just setting the
    rw flags up locally in blkdev_issue_discard.

    tj: Also remove DISCARD_SECURE and use REQ_SECURE directly.

    Signed-off-by: Christoph Hellwig
    Acked-by: Mike Snitzer
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

12 Aug, 2010

1 commit

  • Secure discard is the same as discard except that all copies of the
    discarded sectors (perhaps created by garbage collection) must also be
    erased.

    Signed-off-by: Adrian Hunter
    Acked-by: Jens Axboe
    Cc: Kyungmin Park
    Cc: Madhusudhan Chikkature
    Cc: Christoph Hellwig
    Cc: Ben Gardiner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Hunter
     

09 Aug, 2010

1 commit


08 Aug, 2010

2 commits

  • If the queue doesn't have a limit set, or it is just set to the
    UINT_MAX we default to, we could be sending down a discard request
    that isn't of the correct granularity when the block size is
    > 512 bytes.

    Fix this by adjusting max_discard_sectors down to the proper
    alignment.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Allocating a fixed payload for discard requests always was a
    horrible hack, and it's now coming to bite us when adding support
    for discard in DM/MD.

    So change the code to leave the allocation of a payload to the
    low-level driver. Unfortunately that means we'll need another
    hack, which allows us to update the various block layer length
    fields indicating that we have a payload. Instead of hiding this
    in sd.c, which we already partially do for UNMAP support, add a
    documented helper in the core block layer for it.

    Signed-off-by: Christoph Hellwig
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

29 Apr, 2010

3 commits