22 Apr, 2009
1 commit
- 
Impact: don't set GFP_DMA in q->bounce_gfp unnecessarily

 All DMA address limits are expressed in terms of the last addressable
 unit (byte or page) instead of one plus that. However, when
 determining bounce_gfp for 64bit machines in blk_queue_bounce_limit(),
 it compares the specified limit against 0x100000000UL to determine
 whether it's below 4G, ending up falsely setting GFP_DMA in
 q->bounce_gfp.

 As the DMA zone is very small on x86_64, this makes larger SG_IO
 transfers very eager to trigger the OOM killer. Fix it. While at it,
 rename the parameter to @dma_mask for clarity and convert the comment
 to proper winged style.

 Signed-off-by: Tejun Heo
 Signed-off-by: Jens Axboe
07 Apr, 2009
1 commit
- 
Fix a typo (this was in the original patch but was not merged with the
 code fixes for some reason).

 Signed-off-by: Alan Cox
 Signed-off-by: Jeff Garzik
29 Dec, 2008
1 commit
- 
Zero is invalid for max_phys_segments, max_hw_segments, and
 max_segment_size, so it's better to use min_not_zero instead of
 min. min() works though, because commit 0e435ac makes sure that
 these values are set to their non-zero defaults if a queue is
 initialized properly.

 With this patch, blk_queue_stack_limits does almost the same thing
 that dm's combine_restrictions_low() does. I think it should be easy
 to remove dm's combine_restrictions_low.

 Signed-off-by: FUJITA Tomonori
 Signed-off-by: Jens Axboe
03 Dec, 2008
1 commit
- 
Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
 devices.

 When stacking devices (LVM over MD over SCSI), some of the request
 queue parameters are not set up correctly by default in some cases,
 namely max_segment_size and seg_boundary mask.

 If you create an MD device over SCSI, these attributes are zeroed.
 The problem appears when there is another device-mapper mapping on
 top of this one - queue attributes are set in DM this way:

   request_queue  max_segment_size  seg_boundary_mask
   SCSI           65536             0xffffffff
   MD RAID1       0                 0
   LVM            65536             -1 (64bit)

 Unfortunately bio_add_page (resp. bio_phys_segments) calculates the
 number of physical segments according to these parameters.

 During generic_make_request() the segment count is recalculated and
 can increase bio->bi_phys_segments over the allowed limit (after
 bio_clone() in a stack operation).

 This is especially a problem in the CCISS driver, where it produces
 an OOPS here:

   BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);

 (MAXSGENTRIES is 31 by default.) Sometimes even this command is
 enough to cause the oops:

   dd iflag=direct if=/dev// of=/dev/null bs=128000 count=10

 This command generates bios with 250 sectors, allocated in 32
 4k-pages (the last page uses only 1024 bytes).

 For the LVM layer, it allocates a bio with 31 segments (still OK for
 CCISS); unfortunately on the lower layer it is recalculated to 32
 segments, which violates the CCISS restriction and triggers BUG_ON().

 The patch tries to fix it by:

 * initializing the attributes above in the queue request constructor
   blk_queue_make_request()
 * making sure that blk_queue_stack_limits() inherits the setting
   (DM uses its own function to set the limits because
   blk_queue_stack_limits() was introduced later. It should probably
   switch to the generic stack limit function too.)
 * setting the default seg_boundary value in one place (blkdev.h)
 * using this mask as the default in DM (instead of -1, which differs
   on 64bit)

 Bugs related to this:
 https://bugzilla.redhat.com/show_bug.cgi?id=471639
 http://bugzilla.kernel.org/show_bug.cgi?id=8672

 Signed-off-by: Milan Broz
 Reviewed-by: Alasdair G Kergon
 Cc: Neil Brown
 Cc: FUJITA Tomonori
 Cc: Tejun Heo
 Cc: Mike Miller
 Signed-off-by: Jens Axboe
17 Oct, 2008
1 commit
- 
modprobe loop; rmmod loop effectively creates a blk_queue and destroys
 it, which results in q->unplug_work being canceled without ever being
 initialized.

 Therefore, move the initialization of q->unplug_work from
 blk_queue_make_request() to blk_alloc_queue*().

 Reported-by: Alexey Dobriyan
 Signed-off-by: Peter Zijlstra
 Signed-off-by: Jens Axboe
09 Oct, 2008
6 commits
- 
This patch adds a new interface, blk_lld_busy(), to check an lld's
 busy state from the block layer. blk_lld_busy() calls down into
 low-level drivers to do the check if the driver has set
 q->lld_busy_fn() using blk_queue_lld_busy().

 This resolves the performance problem on request stacking devices
 described below.

 Some drivers, like the scsi mid layer, stop dispatching requests when
 they detect a busy state on the low-level device (host/target/device).
 This allows other requests to stay in the I/O scheduler's queue
 for a chance of merging.

 Request stacking drivers like request-based dm should follow
 the same logic. However, there is no generic interface for the
 stacked device to check if the underlying device(s) are busy.
 If the request stacking driver dispatches and submits requests to
 a busy underlying device, the requests will stay in the underlying
 device's queue without a chance of merging. This causes a
 performance problem under burst I/O load.

 With this patch, the busy state of the underlying device is exported
 via q->lld_busy_fn(), so the request stacking driver can check it
 and stop dispatching requests if busy.

 The underlying device driver must return the busy state appropriately:
 1: when the device driver can't process requests immediately.
 0: when the device driver can process requests immediately,
    including abnormal situations where the device driver needs
    to kill all requests.

 Signed-off-by: Kiyoshi Ueda
 Signed-off-by: Jun'ichi Nomura
 Cc: Andrew Morton
 Signed-off-by: Jens Axboe
- 
Right now SCSI and others do their own command timeout handling.
 Move those bits to the block layer.

 Instead of having a timer per command, we try to be a bit more clever
 and simply have one per-queue. This avoids the overhead of having to
 tear down and set up a timer for each command, so it will result in a
 lot less timer fiddling.

 Signed-off-by: Mike Anderson
 Signed-off-by: Jens Axboe
- 
Noticed by sparse: 
 block/blk-softirq.c:156:12: warning: symbol 'blk_softirq_init' was not declared. Should it be static?
 block/genhd.c:583:28: warning: function 'bdget_disk' with external linkage has definition
 block/genhd.c:659:17: warning: incorrect type in argument 1 (different base types)
 block/genhd.c:659:17: expected unsigned int [unsigned] [usertype] size
 block/genhd.c:659:17: got restricted gfp_t
 block/genhd.c:659:29: warning: incorrect type in argument 2 (different base types)
 block/genhd.c:659:29: expected restricted gfp_t [usertype] flags
 block/genhd.c:659:29: got unsigned int
 block: kmalloc args reversed

 Signed-off-by: Harvey Harrison
 Signed-off-by: Jens Axboe
- 
This patch adds support for controlling the IO completion CPU of
 either all requests on a queue, or on a per-request basis. We export
 a sysfs variable (rq_affinity) which, if set, migrates completions
 of requests to the CPU that originally submitted them. A bio helper
 (bio_set_completion_cpu()) is also added, so that queuers can ask
 for completion on a specific CPU.

 In testing, this has been shown to cut the system time by as much
 as 20-40% on synthetic workloads where CPU affinity is desired.

 This requires a little help from the architecture, so it'll only
 work as designed for archs that are using the new generic smp
 helper infrastructure.

 Signed-off-by: Jens Axboe
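 Assuming the usual /sys/block layout, the knob would be toggled like
 this (the device name is illustrative):

```shell
# Route request completions back to the submitting CPU (sda is illustrative)
echo 1 > /sys/block/sda/queue/rq_affinity

# Read the current setting back
cat /sys/block/sda/queue/rq_affinity
```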
- 
…in them as needed. Fix changed function parameter names. Fix
 typos/spellos. In comments, change REQ_SPECIAL to REQ_TYPE_SPECIAL
 and REQ_BLOCK_PC to REQ_TYPE_BLOCK_PC.

 Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
 Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
- 
Some block devices benefit from a hint that they can forget the
 contents of certain sectors. Add basic support for this to the block
 core, along with a 'blkdev_issue_discard()' helper function which
 issues such requests.

 The caller doesn't get to provide an end_io function, since
 blkdev_issue_discard() will automatically split the request up into
 multiple bios if appropriate. Neither does the function wait for
 completion -- it's expected that callers won't care about when, or
 even _if_, the request completes. It's only a hint to the device
 anyway. By definition, the file system doesn't _care_ about these
 sectors any more.

 [With feedback from OGAWA Hirofumi and Jens Axboe]

 Signed-off-by: Jens Axboe
04 Jul, 2008
1 commit
- 
This adds blk_queue_update_dma_pad to prevent LLDs from wrongly
 overwriting the dma pad mask (we added blk_queue_update_dma_alignment
 for the same reason).

 This also converts libata to use blk_queue_update_dma_pad instead of
 blk_queue_dma_pad.

 Signed-off-by: FUJITA Tomonori
 Cc: Tejun Heo
 Cc: Bartlomiej Zolnierkiewicz
 Cc: Thomas Bogendoerfer
 Cc: James Bottomley
 Signed-off-by: Andrew Morton
 Signed-off-by: Jens Axboe
15 May, 2008
1 commit
- 
As setting and clearing queue flags now requires that we hold a
 spinlock on the queue, and as blk_queue_stack_limits is called
 without that lock, take the lock inside blk_queue_stack_limits.

 For blk_queue_stack_limits to be able to find the right lock, each md
 personality needs to set q->queue_lock to point to the appropriate
 lock. Those personalities which didn't previously use a spin_lock
 use q->__queue_lock, so always initialise that lock when the queue is
 allocated.

 With this in place, setting/clearing of the QUEUE_FLAG_PLUGGED bit
 will no longer cause warnings, as it will be clear that the proper
 lock is held.

 Thanks to Dan Williams for review and fixing the silly bugs.

 Signed-off-by: NeilBrown
 Cc: Dan Williams
 Cc: Jens Axboe
 Cc: Alistair John Strachan
 Cc: Nick Piggin
 Cc: "Rafael J. Wysocki"
 Cc: Jacek Luczak
 Cc: Prakash Punnoor
 Signed-off-by: Andrew Morton
 Signed-off-by: Linus Torvalds
01 May, 2008
1 commit
- 
__FUNCTION__ is gcc specific, use __func__.

 Signed-off-by: Harvey Harrison
 Cc: Jens Axboe
 Signed-off-by: Andrew Morton
 Signed-off-by: Linus Torvalds
29 Apr, 2008
2 commits
- 
We can save some atomic ops in the IO path if we clearly define
 the rules of how to modify the queue flags.

 Signed-off-by: Jens Axboe
- 
blk_max_pfn can now be unexported.

 Signed-off-by: Adrian Bunk
 Signed-off-by: Jens Axboe
02 Apr, 2008
1 commit
- 
Looking a bit closer into this regression, the reason this can't be
 right is that the dma_addr common default is BLK_BOUNCE_HIGH and most
 machines have less than 4G. So if you do:

   if (b_pfn > PAGE_SHIFT)
       dma = 1;

 that will translate to:

   if (BLK_BOUNCE_HIGH < blk_max_low_pfn)

 I guess this is what you were looking for. I didn't verify, but as
 far as I can tell, this will stop the regression with ISA DMA
 operations at boot for 99% of blkdev/memory combinations out there,
 and I guess this fixes the setups with >4G of ram and 32bit pci cards
 as well (this also retains symmetry with the 32bit code).

 Signed-off-by: Andrea Arcangeli
 Signed-off-by: Jens Axboe
04 Mar, 2008
4 commits
- 
Introduced between 2.6.25-rc2 and -rc3:
 block/blk-settings.c:319:12: warning: function 'blk_queue_dma_drain' with external linkage has definition

 Signed-off-by: Harvey Harrison
 Signed-off-by: Jens Axboe
- 
For some non-x86 systems with 4GB or more than 4GB of memory, we need
 to increase the range of addresses that can be used for direct DMA in
 the 64-bit kernel.

 Signed-off-by: Yang Shi
 Signed-off-by: Jens Axboe
- 
Block layer alignment was used for two different purposes - memory
 alignment and padding. This causes problems in lower layers because
 drivers which only require memory alignment end up with an adjusted
 rq->data_len. Separate out padding such that padding occurs iff the
 driver explicitly requests it.

 Tomo: restore the code to update bio in blk_rq_map_user
 introduced by the commit 40b01b9bbdf51ae543a04744283bf2d56c4a6afa
 according to padding alignment.

 Signed-off-by: Tejun Heo
 Signed-off-by: FUJITA Tomonori
 Signed-off-by: Jens Axboe
- 
kernel-doc for block/:
 - add missing parameters
 - fix one function's parameter list (remove blank line)
 - add 2 source files to docbook for non-exported kernel-doc functions

 Signed-off-by: Randy Dunlap
 Signed-off-by: Jens Axboe
19 Feb, 2008
2 commits
- 
Draining shouldn't be done for commands where overflow may indicate
 data integrity issues. Add a dma_drain_needed callback to
 request_queue. The drain buffer is appended iff this function returns
 non-zero.

 Signed-off-by: Tejun Heo
 Cc: James Bottomley
 Signed-off-by: Jens Axboe
- 
blk_settings_init() can become static.

 Signed-off-by: Adrian Bunk
 Signed-off-by: Jens Axboe
01 Feb, 2008
1 commit
- 
Signed-off-by: Jens Axboe 
30 Jan, 2008
1 commit
- 
Adds files for barrier handling, rq execution, io context handling,
 mapping data to requests, and queue settings.

 Signed-off-by: Jens Axboe