14 Apr, 2016

1 commit


13 Apr, 2016

2 commits

  • We don't have any drivers left using blk_queue_flush(), so kill it
    off. Update documentation to use the newer blk_queue_write_cache().

    Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Add an internal helper and flag for setting whether a queue has
    write back caching, or write through (or none). Add a sysfs file
    to show this as well, and make it changeable from user space.

    This will replace the (awkward) blk_queue_flush() interface that
    drivers currently use to inform the block layer of write cache state
    and capabilities.

    Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
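
    A sketch of the conversion this pair of commits enables (the driver
    function name is hypothetical; blk_queue_write_cache() takes the queue,
    a "write back cache present" flag, and a "FUA supported" flag):

        /* old interface, now on its way out:
         *     blk_queue_flush(q, REQ_FLUSH | REQ_FUA);
         */

        /* new interface: state the cache mode directly */
        static void mydrv_configure_cache(struct request_queue *q)
        {
                /* device has a volatile write back cache and supports FUA */
                blk_queue_write_cache(q, true, true);
        }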
     

05 Apr, 2016

1 commit

  • The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
    time ago with the promise that one day it would be possible to
    implement the page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion whether the
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too
    much breakage to be doable.

    Let's stop pretending that pages in the page cache are special. They
    are not.

    The changes are pretty straightforward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files. I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
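
    As a concrete illustration of the renames the script performs (a
    hypothetical filesystem fragment, not taken from the patch itself):

        /* before */
        pgoff_t index = pos >> PAGE_CACHE_SHIFT;
        unsigned offset = pos & ~PAGE_CACHE_MASK;
        page_cache_get(page);
        /* ... use the page ... */
        page_cache_release(page);

        /* after */
        pgoff_t index = pos >> PAGE_SHIFT;
        unsigned offset = pos & ~PAGE_MASK;
        get_page(page);
        /* ... use the page ... */
        put_page(page);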
     

12 Feb, 2016

1 commit

  • The new queue limit is not used by the majority of block drivers, and
    should be initialized to 0 for the driver's requested settings to be used.

    Signed-off-by: Keith Busch
    Acked-by: Martin K. Petersen
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
     

26 Nov, 2015

1 commit

  • Commit 4f258a46346c ("sd: Fix maximum I/O size for BLOCK_PC requests")
    had the unfortunate side-effect of removing an implicit clamp to
    BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer
    code. This caused problems for some SMR drives.

    Debugging this issue revealed a few problems with the existing
    infrastructure since the block layer didn't know how to deal with
    device-imposed limits, only limits set by the I/O controller.

    - Introduce a new queue limit, max_dev_sectors, which is used by the
    ULD to signal the maximum sectors for a REQ_TYPE_FS request.

    - Ensure that max_dev_sectors is correctly stacked and taken into
    account when overriding max_sectors through sysfs.

    - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer
    values for later processing.

    - In sd_revalidate() set the queue's max_dev_sectors based on the
    MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value
    is not reported, fall back to a cap based on the CDB TRANSFER LENGTH
    field size.

    - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits
    VPD--if reported and sane--to signal the preferred device transfer
    size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS.

    - blk_limits_max_hw_sectors() is no longer used and can be removed.

    Signed-off-by: Martin K. Petersen
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581
    Reviewed-by: Christoph Hellwig
    Tested-by: sweeneygj@gmx.com
    Tested-by: Arzeets
    Tested-by: David Eisner
    Tested-by: Mario Kicherer
    Signed-off-by: Martin K. Petersen

    Martin K. Petersen
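
    The resulting clamping logic can be sketched as follows (a
    simplification of what blk_queue_max_hw_sectors() ends up doing after
    this series; field names are from struct queue_limits):

        /* start from the I/O controller limit */
        unsigned int max_sectors = limits->max_hw_sectors;

        /* apply the device-imposed limit from the Block Limits VPD, if set */
        max_sectors = min_not_zero(max_sectors, limits->max_dev_sectors);

        /* and the default soft cap for REQ_TYPE_FS requests */
        max_sectors = min_t(unsigned int, max_sectors, BLK_DEF_MAX_SECTORS);

        limits->max_sectors = max_sectors;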
     

03 Sep, 2015

1 commit

  • Pull core block updates from Jens Axboe:
    "This first core part of the block IO changes contains:

    - Cleanup of the bio IO error signaling from Christoph. We used to
    rely on the uptodate bit and passing around of an error, now we
    store the error in the bio itself.

    - Improvement of the above from myself, by shrinking the bio size
    down again to fit in two cachelines on x86-64.

    - Revert of the max_hw_sectors cap removal from a revision again,
    from Jeff Moyer. This caused performance regressions in various
    tests. Reinstate the limit, bump it to a more reasonable size
    instead.

    - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
    Most devices have huge trim limits, which can cause nasty latencies
    when deleting files. Enable the admin to configure the size down.
    We will look into having a more sane default instead of UINT_MAX
    sectors.

    - Improvement of the SG gaps logic from Keith Busch.

    - Enable the block core to handle arbitrarily sized bios, which
    enables a nice simplification of bio_add_page() (which is an IO hot
    path). From Kent.

    - Improvements to the partition io stats accounting, making it
    faster. From Ming Lei.

    - Also from Ming Lei, a basic fixup for overflow of the sysfs pending
    file in blk-mq, as well as a fix for a blk-mq timeout race
    condition.

    - Ming Lin has been carrying Kent's above-mentioned patches forward
    for a while, and testing them. Ming also did a few fixes around
    that.

    - Sasha Levin found and fixed a use-after-free problem introduced by
    the bio->bi_error changes from Christoph.

    - Small blk cgroup cleanup from Viresh Kumar"

    * 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
    blk: Fix bio_io_vec index when checking bvec gaps
    block: Replace SG_GAPS with new queue limits mask
    block: bump BLK_DEF_MAX_SECTORS to 2560
    Revert "block: remove artifical max_hw_sectors cap"
    blk-mq: fix race between timeout and freeing request
    blk-mq: fix buffer overflow when reading sysfs file of 'pending'
    Documentation: update notes in biovecs about arbitrarily sized bios
    block: remove bio_get_nr_vecs()
    fs: use helper bio_add_page() instead of open coding on bi_io_vec
    block: kill merge_bvec_fn() completely
    md/raid5: get rid of bio_fits_rdev()
    md/raid5: split bio for chunk_aligned_read
    block: remove split code in blkdev_issue_{discard,write_same}
    btrfs: remove bio splitting and merge_bvec_fn() calls
    bcache: remove driver private bio splitting code
    block: simplify bio_add_page()
    block: make generic_make_request handle arbitrarily sized bios
    blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
    block: don't access bio->bi_error after bio_put()
    block: shrink struct bio down to 2 cache lines again
    ...

    Linus Torvalds
     

20 Aug, 2015

1 commit

  • The SG_GAPS queue flag caused checks for bio vector alignment against
    PAGE_SIZE, but the device may have different constraints. This patch
    adds a queue limit so a driver with such constraints can set it to
    allow requests that would otherwise have been unnecessarily split. The
    new gaps check takes the request_queue as a parameter to simplify the
    logic around invoking this function.

    This new limit makes the queue flag redundant, so remove it and all
    of its usage. Device-mappers will inherit the correct settings through
    blk_stack_limits().

    Signed-off-by: Keith Busch
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
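
    The new check reduces to testing bio vector boundaries against the
    device's mask instead of PAGE_SIZE. A sketch, close to the
    bvec_gap_to_prev() helper this patch reworks (queue_virt_boundary()
    returns the driver-set mask, 0 if unset):

        static inline bool bvec_gap_to_prev(struct request_queue *q,
                                            struct bio_vec *bprv,
                                            unsigned int offset)
        {
                if (!queue_virt_boundary(q))
                        return false;   /* driver imposed no constraint */

                /* a gap exists unless the next vector starts exactly where
                 * the previous one ended, relative to the boundary mask */
                return offset ||
                       ((bprv->bv_offset + bprv->bv_len) &
                        queue_virt_boundary(q));
        }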
     

19 Aug, 2015

1 commit

  • This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc.
    That commit caused performance regressions for streaming I/O
    workloads on a number of different storage devices, from
    SATA disks to external RAID arrays. It also managed to
    trip up some buggy firmware in at least one drive, causing
    data corruption.

    The next patch will bump the default max_sectors_kb value to
    1280, which will accommodate a 10-data-disk stripe write
    with chunk size 128k. In the testing I've done using iozone,
    fio, and aio-stress, a value of 1280 does not show a big
    performance difference from 512. This will hopefully still
    help the software RAID setup that Christoph saw the original
    performance gains with while still not regressing other
    storage configurations.

    Signed-off-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jeff Moyer
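
    The arithmetic behind the value chosen in the follow-up patch, as a
    worked check (BLK_DEF_MAX_SECTORS counts 512-byte sectors):

        /* 10 data disks x 128 KiB chunk = 1280 KiB per full stripe write */
        #define STRIPE_KB               (10 * 128)               /* 1280 */
        /* 1280 KiB / 512 B per sector = 2560 sectors */
        #define BLK_DEF_MAX_SECTORS     (STRIPE_KB * 1024 / 512) /* 2560 */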
     

14 Aug, 2015

1 commit

  • As generic_make_request() is now able to handle arbitrarily sized bios,
    it's no longer necessary for each individual block driver to define its
    own ->merge_bvec_fn() callback. Remove every invocation completely.

    Cc: Jens Axboe
    Cc: Lars Ellenberg
    Cc: drbd-user@lists.linbit.com
    Cc: Jiri Kosina
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Neil Brown
    Cc: linux-raid@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: "Martin K. Petersen"
    Acked-by: NeilBrown (for the 'md' bits)
    Acked-by: Mike Snitzer
    Signed-off-by: Kent Overstreet
    [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
    dm-era-target, and resolve merge conflicts]
    Signed-off-by: Dongsu Park
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

13 Aug, 2015

1 commit

  • Commit bcdb247c6b6a ("sd: Limit transfer length") clamped the maximum
    size of an I/O request to the MAXIMUM TRANSFER LENGTH field in the BLOCK
    LIMITS VPD. This had the unfortunate effect of also limiting the maximum
    size of non-filesystem requests sent to the device through sg/bsg.

    Avoid using blk_queue_max_hw_sectors() and set the max_sectors queue
    limit directly.

    Also update the comment in blk_limits_max_hw_sectors() to clarify that
    max_hw_sectors defines the limit for the I/O controller only.

    Signed-off-by: Martin K. Petersen
    Reported-by: Brian King
    Tested-by: Brian King
    Cc: stable@vger.kernel.org # 3.17+
    Signed-off-by: James Bottomley

    Martin K. Petersen
     

17 Jul, 2015

1 commit

  • Lots of devices support huge discard sizes these days. Depending
    on how the device handles them internally, huge discards can
    introduce massive latencies (hundreds of msec) on the device side.

    We have a sysfs file, discard_max_bytes, that advertises the max
    hardware supported discard size. Make this writeable, and split
    the settings into a soft and hard limit. This can be set from
    'discard_granularity' and up to the hardware limit.

    Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw
    set limit.

    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jens Axboe
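
    A sketch of what storing a new soft limit involves (a simplification,
    assuming, as elsewhere in queue_limits, that discard limits are kept
    internally in 512-byte sectors):

        /* user writes a byte count to discard_max_bytes */
        if (max_bytes & (q->limits.discard_granularity - 1))
                return -EINVAL;         /* keep it granularity-aligned */

        /* the soft limit may never exceed the hardware limit */
        q->limits.max_discard_sectors =
                min_t(u64, max_bytes >> 9,
                      q->limits.max_hw_discard_sectors);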
     

31 Mar, 2015

1 commit

  • Linux 3.19 commit 69c953c ("lib/lcm.c: lcm(n,0)=lcm(0,n) is 0, not n")
    caused blk_stack_limits() to not properly stack queue_limits for stacked
    devices (e.g. DM).

    Fix this regression by establishing lcm_not_zero() and switching
    blk_stack_limits() over to using it.

    DM uses blk_set_stacking_limits() to establish the initial top-level
    queue_limits that are then built up based on underlying devices' limits
    using blk_stack_limits(). In the case of optimal_io_size (io_opt)
    blk_set_stacking_limits() establishes a default value of 0. With commit
    69c953c, lcm(0, n) is no longer n, which compromises proper stacking of
    the underlying devices' io_opt.

    Test:
    $ modprobe scsi_debug dev_size_mb=10 num_tgts=1 opt_blks=1536
    $ cat /sys/block/sde/queue/optimal_io_size
    786432
    $ dmsetup create node --table "0 100 linear /dev/sde 0"

    Before this fix:
    $ cat /sys/block/dm-5/queue/optimal_io_size
    0

    After this fix:
    $ cat /sys/block/dm-5/queue/optimal_io_size
    786432

    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # 3.19+
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer
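
    The helper is tiny; conceptually it treats a zero argument as "no
    constraint" instead of letting it collapse the result to zero (a
    sketch matching the behavior described above):

        unsigned long lcm_not_zero(unsigned long a, unsigned long b)
        {
                unsigned long l = lcm(a, b);

                if (l)
                        return l;

                /* at least one argument was 0: fall back to the other */
                return (b ? : a);
        }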
     

22 Oct, 2014

1 commit

  • Set max_sectors to the value the driver provides as the hardware limit
    by default. Linux has had proper I/O throttling for a long time and
    doesn't rely on an artificially small maximum I/O size anymore. By not
    limiting the I/O size by default we remove an annoying tuning step
    required for most Linux installations.

    Note that both the user and, if absolutely required, the driver can
    still impose a limit for FS requests below max_hw_sectors_kb.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

09 Oct, 2014

1 commit

  • The math in both blk_stack_limits() and queue_limit_alignment_offset()
    assumes that a block device's io_min (aka minimum_io_size) is always a
    power-of-2. Fix the math so that it works for a non-power-of-2 io_min.

    This issue (of alignment_offset != 0) became apparent when testing
    dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
    1280K. Commit fdfb4c8c1 ("dm thin: set minimum_io_size to pool's data
    block size") unlocked the potential for alignment_offset != 0 due to
    the dm-thin-pool's io_min possibly being a non-power-of-2.

    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer
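
    The essence of the fix is replacing power-of-2 mask arithmetic with a
    true modulo. A sketch of the alignment-offset side (granularity is in
    bytes; sector_div() divides a sector_t in place and returns the
    remainder):

        static int queue_limit_alignment_offset(struct queue_limits *lim,
                                                sector_t sector)
        {
                unsigned int granularity = max(lim->physical_block_size,
                                               lim->io_min);
                unsigned int alignment;

                /* broken for non-power-of-2 io_min (assumes 2^n):
                 *     alignment = (sector << 9) & (granularity - 1);
                 */

                /* correct for any granularity: take an actual remainder */
                alignment = sector_div(sector, granularity >> 9) << 9;

                return (granularity + lim->alignment_offset - alignment)
                        % granularity;
        }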
     

11 Jun, 2014

1 commit

  • Commit 762380ad9322 added support for chunk sizes and no merging
    across them, but it broke the rule of always allowing the addition of
    a single page to an empty bio. So relax the restriction a bit to allow
    for that, similarly to what we have always done.

    This fixes a crash with mkfs.xfs and 512b sector sizes on NVMe.

    Reported-by: Keith Busch
    Signed-off-by: Jens Axboe

    Jens Axboe
     

06 Jun, 2014

1 commit

  • Some drivers have different limits on what size a request should
    optimally be, depending on the offset of the request. Similar to
    dividing a device into chunks. Add a setting that allows the driver
    to inform the block layer of such a chunk size. The block layer will
    then prevent merging across the chunks.

    This is needed to optimally support NVMe with a non-zero stripe size.

    Signed-off-by: Jens Axboe

    Jens Axboe
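
    With a chunk size configured, the largest allowed request at a given
    offset is simply the distance to the next chunk boundary. A sketch
    close to the blk_max_size_offset() helper introduced for this (the
    mask trick assumes chunk_sectors is a power of 2):

        static inline unsigned int blk_max_size_offset(struct request_queue *q,
                                                       sector_t offset)
        {
                if (!q->limits.chunk_sectors)
                        return q->limits.max_sectors;  /* no chunking set */

                /* sectors remaining until the next chunk boundary */
                return q->limits.chunk_sectors -
                       (offset & (q->limits.chunk_sectors - 1));
        }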
     

09 Jan, 2014

1 commit

  • Now that we've got code for raid5/6 stripe awareness, bcache just needs
    to know about the stripes and when writing partial stripes is expensive
    - we probably don't want to enable this optimization for raid1 or 10,
    even though they have stripes. So add a flag to queue_limits.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     

14 Nov, 2013

1 commit

  • Pull block IO core updates from Jens Axboe:
    "This is the pull request for the core changes in the block layer for
    3.13. It contains:

    - The new blk-mq request interface.

    This is a new and more scalable queueing model that marries the
    best part of the request based interface we currently have (which
    is fully featured, but scales poorly) and the bio based "interface"
    which the new drivers for high IOPS devices end up using because
    it's much faster than the request based one.

    The bio interface has no block layer support, since it taps into
    the stack much earlier. This means that drivers end up having to
    implement a lot of functionality on their own, like tagging,
    timeout handling, requeue, etc. The blk-mq interface provides all
    these. Some drivers even provide a switch to select bio or rq and
    have code to handle both, since things like merging only work in
    the rq model and hence are faster for some workloads. This is a
    huge mess. Conversion of these drivers nets us a substantial code
    reduction. Initial results on converting SCSI to this model even
    show an 8x improvement on single queue devices. So while the
    model was intended to work on the newer multiqueue devices, it has
    substantial improvements for "classic" hardware as well. This code
    has gone through extensive testing and development, it's now ready
    to go. A pull request to convert virtio-blk to this model will be
    coming as well, with more drivers scheduled for 3.14 conversion.

    - Two blktrace fixes from Jan and Chen Gang.

    - A plug merge fix from Alireza Haghdoost.

    - Conversion of __get_cpu_var() from Christoph Lameter.

    - Fix for sector_div() with 64-bit divider from Geert Uytterhoeven.

    - A fix for a race between request completion and the timeout
    handling from Jeff Moyer. This is what caused the merge conflict
    with blk-mq/core, in case you are looking at that.

    - A dm stacking fix from Mike Snitzer.

    - A code consolidation fix and duplicated code removal from Kent
    Overstreet.

    - A handful of block bug fixes from Mikulas Patocka, fixing a loop
    crash and memory corruption on blk cg.

    - Elevator switch bug fix from Tomoki Sekiyama.

    A heads-up that I had to rebase this branch. Initially the immutable
    bio_vecs had been queued up for inclusion, but a week later, it became
    clear that it wasn't fully cooked yet. So the decision was made to
    pull this out and postpone it until 3.14. It was a straight forward
    rebase, just pruning out the immutable series and the later fixes of
    problems with it. The rest of the patches applied directly and no
    further changes were made"

    * 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits)
    block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
    block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
    block: Do not call sector_div() with a 64-bit divisor
    kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()
    block: Consolidate duplicated bio_trim() implementations
    block: Use rw_copy_check_uvector()
    block: Enable sysfs nomerge control for I/O requests in the plug list
    block: properly stack underlying max_segment_size to DM device
    elevator: acquire q->sysfs_lock in elevator_change()
    elevator: Fix a race in elevator switching and md device initialization
    block: Replace __get_cpu_var uses
    bdi: test bdi_init failure
    block: fix a probe argument to blk_register_region
    loop: fix crash if blk_alloc_queue fails
    blk-core: Fix memory corruption if blkcg_init_queue fails
    block: fix race between request completion and timeout handling
    blktrace: Send BLK_TN_PROCESS events to all running traces
    blk-mq: don't disallow request merges for req->special being set
    blk-mq: mq plug list breakage
    blk-mq: fix for flush deadlock
    ...

    Linus Torvalds
     

09 Nov, 2013

1 commit

  • Without this patch all DM devices will default to BLK_MAX_SEGMENT_SIZE
    (65536) even if the underlying device(s) have a larger value -- this is
    due to blk_stack_limits() using min_not_zero() when stacking the
    max_segment_size limit.

    underlying device:
    1073741824

    DM device before patch:
    65536

    DM device after patch:
    1073741824

    Reported-by: Lukasz Flis
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # v3.3+
    Signed-off-by: Jens Axboe

    Mike Snitzer
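
    The pitfall and the fix in miniature (min_not_zero() returns the
    smaller of its arguments while treating 0 as "unset"; the numbers
    follow the example above, and the UINT_MAX starting point is what
    blk_set_stacking_limits() provides after the patch):

        /* stacking: t = top-level (DM) limits, b = bottom device limits */
        t->max_segment_size = min_not_zero(t->max_segment_size,
                                           b->max_segment_size);

        /* before: min_not_zero(65536, 1073741824)    == 65536 (too small) */
        /* after:  min_not_zero(UINT_MAX, 1073741824) == 1073741824        */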
     

31 Oct, 2013

1 commit


15 Dec, 2012

1 commit


20 Sep, 2012

1 commit

  • The WRITE SAME command supported on some SCSI devices allows the same
    block to be efficiently replicated throughout a block range. Only a
    single logical block is transferred from the host and the storage device
    writes the same data to all blocks described by the I/O.

    This patch implements support for WRITE SAME in the block layer. The
    blkdev_issue_write_same() function can be used by filesystems and block
    drivers to replicate a buffer across a block range. This can be used to
    efficiently initialize software RAID devices, etc.

    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
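
    A caller-side sketch (blkdev_issue_write_same() takes the device, a
    start sector and length in 512-byte sectors, an allocation mask, and
    the single page holding the block to replicate; bdev, page, and the
    range are assumed to be set up by the caller):

        int ret;

        /* replicate one logical block across the whole range */
        ret = blkdev_issue_write_same(bdev, start_sector, nr_sectors,
                                      GFP_KERNEL, page);
        if (ret)
                /* e.g. device lacks WRITE SAME: fall back to plain writes */
                return ret;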
     

01 Aug, 2012

1 commit

  • blk_set_stacking_limits is intended to allow stacking drivers to build
    up the limits of the stacked device based on the underlying devices'
    limits. But defaulting 'max_sectors' to BLK_DEF_MAX_SECTORS (1024)
    doesn't allow the stacking driver to inherit a max_sectors larger than
    1024 -- due to blk_stack_limits' use of min_not_zero.

    It is now clear that this artificial limit is getting in the way, so
    change blk_set_stacking_limits()'s max_sectors to UINT_MAX (which
    allows stacking drivers like dm-multipath to inherit 'max_sectors'
    from the underlying paths).

    Reported-by: Vijay Chauhan
    Tested-by: Vijay Chauhan
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

11 Jan, 2012

1 commit

  • Stacking driver queue limits are typically bounded exclusively by the
    capabilities of the low level devices, not by the stacking driver
    itself.

    This patch introduces blk_set_stacking_limits() which has more liberal
    metrics than the default queue limits function. This allows us to
    inherit topology parameters from bottom devices without manually
    tweaking the default limits in each driver prior to calling the stacking
    function.

    Since there is now a clear distinction between stacking and low-level
    devices, blk_set_default_limits() has been modified to carry the more
    conservative values that we used to manually set in
    blk_queue_make_request().

    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
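
    The split can be sketched as follows (an abbreviation of the two
    helpers; the exact field lists are elided):

        void blk_set_default_limits(struct queue_limits *lim)
        {
                /* conservative values for low-level drivers -- what used
                 * to be set manually in blk_queue_make_request() */
                lim->max_segments = BLK_MAX_SEGMENTS;
                lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
                /* ... */
        }

        void blk_set_stacking_limits(struct queue_limits *lim)
        {
                blk_set_default_limits(lim);

                /* then open everything up, so the stacked device inherits
                 * its limits from the devices below it */
                lim->max_segments = USHRT_MAX;
                lim->max_hw_sectors = UINT_MAX;
                /* ... */
        }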
     

18 May, 2011

1 commit

  • In some cases we would end up stacking discard_zeroes_data incorrectly.
    Fix this by enabling the feature by default for stacking drivers and
    clearing it for low-level drivers. Incorporating a device that does not
    support dzd will then cause the feature to be disabled in the stacking
    driver.

    Also ensure that the maximum discard value does not overflow when
    exported in sysfs and return 0 in the alignment and dzd fields for
    devices that don't support discard.

    Reported-by: Lukas Czerner
    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Martin K. Petersen
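
    The stacking rule for the flag reduces to a logical AND across all
    components (a one-line sketch of the blk_stack_limits() behavior
    described above; t is the stacking driver's limits, b the bottom
    device's):

        /* stacking drivers start with dzd enabled; any component that
         * cannot zero on discard clears it for the whole stack */
        t->discard_zeroes_data &= b->discard_zeroes_data;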
     

07 May, 2011

1 commit

  • Flush requests aren't queueable in some drives. Add a flag to let the
    driver notify the block layer about this. We can optimize flush
    performance with this knowledge.

    Stable: 2.6.39 only

    Cc: stable@kernel.org
    Signed-off-by: Shaohua Li
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    shaohua.li@intel.com
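
    A driver-side sketch of the new hint (blk_queue_flush_queueable() is
    the setter added for this flag; the call site is hypothetical):

        /* this device cannot queue a flush alongside other commands */
        blk_queue_flush_queueable(q, false);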
     

18 Apr, 2011

1 commit


12 Apr, 2011

1 commit


10 Mar, 2011

2 commits


03 Mar, 2011

1 commit

  • There does not seem to be a clear convention for whether q->queue_lock
    is initialized or not when blk_cleanup_queue() is called. In the past
    it was not necessary, but now blk_throtl_exit() takes the queue lock
    by default and needs the queue lock to be available.

    In fact the elevator_exit() code has a similar requirement, just less
    stringent in the sense that elevator_exit() is called only if the
    elevator is initialized.

    Two problems have been noticed because of this ambiguity about the
    spin lock status.

    - If a driver calls blk_alloc_queue() and then calls
    blk_cleanup_queue() almost immediately (because some other
    driver structure allocation failed or some other error happened),
    then blk_throtl_exit() will run into issues, as the queue lock is
    not initialized. The loop driver ran into this issue recently, and
    I noticed error paths in the md driver too. Similar error paths
    should exist in other drivers too.

    - If a driver provided an external spin lock and zapped the lock
    before blk_cleanup_queue(), that can also lead to issues.

    So this patch initializes the default queue lock at queue allocation
    time.

    The block throttling code is one of the users of the queue lock and
    is initialized at queue allocation time, so it makes sense to also
    initialize ->queue_lock to the internal lock there. A driver can
    override that lock later. This takes care of the issue, so a driver
    no longer has to worry about initializing the queue lock to the
    default before calling blk_cleanup_queue().

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
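
    The fix in miniature (a sketch of the allocation-time initialization;
    struct request_queue carries an internal __queue_lock for this
    purpose):

        /* in blk_alloc_queue(): default to the internal lock */
        spin_lock_init(&q->__queue_lock);
        q->queue_lock = &q->__queue_lock;

        /* a driver may still override q->queue_lock later, e.g. via
         * blk_init_queue(), with its own external lock */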
     

17 Dec, 2010

2 commits

  • Implement blk_limits_max_hw_sectors() and make
    blk_queue_max_hw_sectors() a wrapper around it.

    DM needs this to avoid setting queue_limits' max_hw_sectors and
    max_sectors directly. dm_set_device_limits() now leverages
    blk_limits_max_hw_sectors() logic to establish the appropriate
    max_hw_sectors minimum (PAGE_SIZE). This fixes an issue where DM was
    incorrectly setting max_sectors rather than max_hw_sectors (which
    caused dm_merge_bvec()'s max_hw_sectors check to be ineffective).

    Signed-off-by: Mike Snitzer
    Cc: stable@kernel.org
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer
     
  • When stacking devices, a request_queue is not always available. This
    forced us to have a no_cluster flag in the queue_limits that could be
    used as a carrier until the request_queue had been set up for a
    metadevice.

    There were several problems with that approach. First of all, it was
    up to the stacking device to remember to set the queue flag after
    stacking had completed. Also, the queue flag and the queue limits had to be kept in
    sync at all times. We got that wrong, which could lead to us issuing
    commands that went beyond the max scatterlist limit set by the driver.

    The proper fix is to avoid having two flags for tracking the same thing.
    We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
    block layer merging functions. The queue_limit 'no_cluster' is turned
    into 'cluster' to avoid double negatives and to ease stacking.
    Clustering defaults to being enabled as before. The queue flag logic is
    removed from the stacking function, and explicitly setting the cluster
    flag is no longer necessary in DM and MD.

    Reported-by: Ed Lin
    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

23 Oct, 2010

1 commit

  • * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
    xen-blkfront: disable barrier/flush write support
    Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
    block: remove BLKDEV_IFL_WAIT
    aic7xxx_old: removed unused 'req' variable
    block: remove the BH_Eopnotsupp flag
    block: remove the BLKDEV_IFL_BARRIER flag
    block: remove the WRITE_BARRIER flag
    swap: do not send discards as barriers
    fat: do not send discards as barriers
    ext4: do not send discards as barriers
    jbd2: replace barriers with explicit flush / FUA usage
    jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
    jbd: replace barriers with explicit flush / FUA usage
    nilfs2: replace barriers with explicit flush / FUA usage
    reiserfs: replace barriers with explicit flush / FUA usage
    gfs2: replace barriers with explicit flush / FUA usage
    btrfs: replace barriers with explicit flush / FUA usage
    xfs: replace barriers with explicit flush / FUA usage
    block: pass gfp_mask and flags to sb_issue_discard
    dm: convey that all flushes are processed as empty
    ...

    Linus Torvalds
     

14 Oct, 2010

1 commit


01 Oct, 2010

2 commits


25 Sep, 2010

1 commit

  • The bounce_pfn of the request queue on 64-bit systems is set to the
    current max_low_pfn. Adding more memory later makes this incorrect.
    Memory allocated beyond this boot-time max_low_pfn appears to require
    bounce buffers (bounce buffers are not actually allocated, but they
    are used in calculating segments, which may result in "over max
    segments limit" errors).

    Signed-off-by: Malahal Naineni
    Signed-off-by: Jens Axboe

    Malahal Naineni
     

11 Sep, 2010

1 commit

  • Some controllers have a hardware limit on the number of protection
    information scatter-gather list segments they can handle.

    Introduce a max_integrity_segments limit in the block layer and provide
    a new scsi_host_template setting that allows HBA drivers to provide a
    value suitable for the hardware.

    Add support for honoring the integrity segment limit when merging both
    bios and requests.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
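
    An HBA-side sketch of how the limit would be surfaced (the template is
    illustrative; sg_prot_tablesize is the scsi_host_template field this
    work adds, and blk_queue_max_integrity_segments() is the block-layer
    setter):

        /* the SCSI host template advertises the hardware's limit on
         * protection-information scatter-gather segments */
        static struct scsi_host_template mydrv_template = {
                /* ... */
                .sg_prot_tablesize = 64,        /* illustrative value */
        };

        /* the block layer then records it per queue: */
        blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);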