02 May, 2017
1 commit
-
Pull block layer updates from Jens Axboe:
- Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
was initially a fork of CFQ, but subsequently changed to implement
fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
to be used on desktop type single drives, providing good fairness.
From Paolo.- Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
using a scalable token based algorithm that throttles IO based on
live completion IO stats, similary to blk-wbt. From Omar.- A series from Jan, moving users to separately allocated backing
devices. This continues the work of separating backing device life
times, solving various problems with hot removal.- A series of updates for lightnvm, mostly from Javier. Includes a
'pblk' target that exposes an open channel SSD as a physical block
device.- A series of fixes and improvements for nbd from Josef.
- A series from Omar, removing queue sharing between devices on mostly
legacy drivers. This helps us clean up other bits, if we know that a
queue only has a single device backing. This has been overdue for
more than a decade.- Fixes for the blk-stats, and improvements to unify the stats and user
windows. This both improves blk-wbt, and enables other users to
register a need to receive IO stats for a device. From Omar.- blk-throttle improvements from Shaohua. This provides a scalable
framework for implementing scalable priotization - particularly for
blk-mq, but applicable to any type of block device. The interface is
marked experimental for now.- Bucketized IO stats for IO polling from Stephen Bates. This improves
efficiency of polled workloads in the presence of mixed block size
IO.- A few fixes for opal, from Scott.
- A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
From a variety of folks, mostly Sagi and James Smart.- A series from Bart, improving our exposed info and capabilities from
the blk-mq debugfs support.- A series from Christoph, cleaning up how handle WRITE_ZEROES.
- A series from Christoph, cleaning up the block layer handling of how
we track errors in a request. On top of being a nice cleanup, it also
shrinks the size of struct request a bit.- Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
never used by platforms, and the latter has outlived it's usefulness.- Various little bug fixes and cleanups from a wide variety of folks.
* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
block: hide badblocks attribute by default
blk-mq: unify hctx delay_work and run_work
block: add kblock_mod_delayed_work_on()
blk-mq: unify hctx delayed_run_work and run_work
nbd: fix use after free on module unload
MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
blk-mq-sched: alloate reserved tags out of normal pool
mtip32xx: use runtime tag to initialize command header
scsi: Implement blk_mq_ops.show_rq()
blk-mq: Add blk_mq_ops.show_rq()
blk-mq: Show operation, cmd_flags and rq_flags names
blk-mq: Make blk_flags_show() callers append a newline character
blk-mq: Move the "state" debugfs attribute one level down
blk-mq: Unregister debugfs attributes earlier
blk-mq: Only unregister hctxs for which registration succeeded
blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
blk-mq: Let blk_mq_debugfs_register() look up the queue name
blk-mq: Register /queue/mq after having registered /queue
ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
ide-pm: always pass 0 error to __blk_end_request_all
..
27 Apr, 2017
1 commit
-
mtip32xx supposes that 'request_idx' passed to .init_request()
is tag of the request, and use that as request's tag to initialize
command header.After MQ IO scheduler is in, request tag assigned isn't same with
the request index anymore, so cause strange hardware failure on
mtip32xx, even whole system panic is triggered.This patch fixes the issue by initializing command header via
request's real tag.Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
21 Apr, 2017
3 commits
-
We need to get the command payload from the request before
we attempt to dereference it.Fixes: 4dda4735c581 ("mtip32xx: add a status field to struct mtip_cmd")
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Now that all drivers that call blk_mq_complete_requests have a
->complete callback we can remove the direct call to blk_mq_end_request,
as well as the error argument to blk_mq_complete_request.Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe -
Instead of using req->errors, which will go away.
Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
20 Apr, 2017
1 commit
-
The recent introduced MQ IO scheduler breaks mtip32xx in the
following way.mtip32xx use the 'request_index' passed to .init_request() as
hardware tag index for initializing hardware queue, and it
actually require that rq->tag is always same with 'request_index'
passed to .init_request(). Current blk-mq IO scheduler can't
guarantee this point, so this patch passes BLK_MQ_F_NO_SCHED
and at least make mtip32xx working.This patch fixes the following strange hardware failure. The
issue can be triggered easily when doing I/O with mq-deadline
enabled.[ 186.972578] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 32993
[ 186.972578] {1}[Hardware Error]: event severity: fatal
[ 186.972579] {1}[Hardware Error]: Error 0, type: fatal
[ 186.972580] {1}[Hardware Error]: section_type: PCIe error
[ 186.972580] {1}[Hardware Error]: port_type: 0, PCIe end point
[ 186.972581] {1}[Hardware Error]: version: 1.0
[ 186.972581] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[ 186.972582] {1}[Hardware Error]: device_id: 0000:07:00.0
[ 186.972582] {1}[Hardware Error]: slot: 4
[ 186.972583] {1}[Hardware Error]: secondary_bus: 0x00
[ 186.972583] {1}[Hardware Error]: vendor_id: 0x1344, device_id: 0x5150
[ 186.972584] {1}[Hardware Error]: class_code: 008001
[ 186.972585] Kernel panic - not syncing: Fatal hardware error!Reported-by: Jozef Mikovic
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
09 Apr, 2017
1 commit
-
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.Signed-off-by: Christoph Hellwig
Reviewed-by: Martin K. Petersen
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
31 Mar, 2017
1 commit
-
Constify all instances of blk_mq_ops, as they are never modified.
Signed-off-by: Eric Biggers
Signed-off-by: Jens Axboe
29 Mar, 2017
1 commit
-
As the .q_usage_counter is used by both legacy and
mq path, we need to block new I/O if queue becomes
dead in blk_queue_enter().So rename it and we can use this function in both
paths.Reviewed-by: Bart Van Assche
Reviewed-by: Hannes Reinecke
Signed-off-by: Ming Lei
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
01 Dec, 2016
1 commit
-
Fix bug https://bugzilla.kernel.org/show_bug.cgi?id=188531. In function
mtip_block_initialize(), variable rv takes the return value, and its
value should be negative on errors. rv is initialized as 0 and is not
reset when the call to ida_pre_get() fails. So 0 may be returned.
The return value 0 indicates that there is no error, which may be
inconsistent with the execution status. This patch fixes the bug by
explicitly assigning -ENOMEM to rv on the branch that ida_pre_get()
fails.Signed-off-by: Pan Bian
Signed-off-by: Jens Axboe
12 Nov, 2016
1 commit
-
There is no need to call kfree() if memdup_user() fails, as no memory
was allocated and the error in the error-valued pointer should be returned.Signed-off-by: Sachin Shukla
Signed-off-by: Jens Axboe
15 Sep, 2016
1 commit
-
All drivers use the default, so provide an inline version of it. If we
ever need other queue mapping we can add an optional method back,
although supporting will also require major changes to the queue setup
code.This provides better code generation, and better debugability as well.
Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe
29 Aug, 2016
1 commit
-
We get 1 warning when biuld kernel with W=1:
drivers/block/mtip32xx/mtip32xx.c:3689:6: warning: no previous prototype for
'mtip_block_release' [-Wmissing-prototypes]In fact, this function is only used in the file in which it is declared
and don't need a declaration, but can be made static.
so this patch marks it 'static'.Signed-off-by: Baoyou Xie
Signed-off-by: Jens Axboe
28 Jun, 2016
1 commit
-
For block drivers that specify a parent device, convert them to use
device_add_disk().This conversion was done with the following semantic patch:
@@
struct gendisk *disk;
expression E;
@@- disk->driverfs_dev = E;
...
- add_disk(disk);
+ device_add_disk(E, disk);@@
struct gendisk *disk;
expression E1, E2;
@@- disk->driverfs_dev = E1;
...
E2 = disk;
...
- add_disk(E2);
+ device_add_disk(E1, E2);...plus some manual fixups for a few missed conversions.
Cc: Jens Axboe
Cc: Keith Busch
Cc: Michael S. Tsirkin
Cc: David Woodhouse
Cc: David S. Miller
Cc: James Bottomley
Cc: Ross Zwisler
Cc: Konrad Rzeszutek Wilk
Cc: Martin K. Petersen
Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Dan Williams
08 Jun, 2016
1 commit
-
The req operation REQ_OP is separated from the rq_flag_bits
definition. This converts the block layer drivers to
use req_op to get the op from the request struct.Signed-off-by: Mike Christie
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
13 Apr, 2016
2 commits
-
The driver calls it with 0 for flags, since it doesn't have a writeback
cache. Just remove the call, as it's a no-op right now.Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig -
Only a single tags array anyway.
Signed-off-by: Keith Busch
Reviewed-by: Johannes Thumshirn
Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
19 Mar, 2016
1 commit
-
exec_drive_taskfile() checks for dma mapping errors by comparison
returned address with zero, while pci_dma_mapping_error() should be used.Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov
Signed-off-by: Jens Axboe
04 Mar, 2016
10 commits
-
We always return BLK_EH_RESET_TIMER, so no point in storing that in
an integer.Signed-off-by: Jens Axboe
-
Fail all pending requests after surprise removal of a drive.
Signed-off-by: Vignesh Gunasekaran
Signed-off-by: Selvan Mani
Signed-off-by: Asai Thambi S P
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe -
Added timeout handler. Replaced blk_mq_end_request() with
blk_mq_complete_request() to avoid double completion of a request.Signed-off-by: Selvan Mani
Signed-off-by: Rajesh Kumar Sambandam
Signed-off-by: Asai Thambi S P
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe -
Allow device initialization to finish gracefully when it is in
FTL rebuild failure state. Also, recover device out of this state
after successfully secure erasing it.Signed-off-by: Selvan Mani
Signed-off-by: Vignesh Gunasekaran
Signed-off-by: Asai Thambi S P
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe -
Flush inflight IOs using fsync_bdev() when the device is safely
removed. Also, block further IOs in device open function.Signed-off-by: Selvan Mani
Signed-off-by: Rajesh Kumar Sambandam
Signed-off-by: Asai Thambi S P
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe -
When FTL rebuild is in progress, alloc_disk() initializes the disk
but device node will be created by add_disk() only after successful
completion of FTL rebuild. So, skip deletion of device node in
removal path when FTL rebuild is in progress.Signed-off-by: Selvan Mani
Signed-off-by: Asai Thambi S P
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe -
Prevent standby immediate command from being issued in remove,
suspend and shutdown paths, while drive is in FTL rebuild process.Signed-off-by: Selvan Mani
Signed-off-by: Vignesh Gunasekaran
Signed-off-by: Asai Thambi S P
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe -
Print exact time when an internal command is interrupted.
Signed-off-by: Selvan Mani
Signed-off-by: Rajesh Kumar Sambandam
Signed-off-by: Asai Thambi S P
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe -
Remove setting and clearing MTIP_PF_EH_ACTIVE_BIT flag in
mtip_handle_tfe() as they are redundant. Also avoid waking
up service thread from mtip_handle_tfe() because it is
already woken up in case of taskfile error.Signed-off-by: Selvan Mani
Signed-off-by: Rajesh Kumar Sambandam
Signed-off-by: Asai Thambi S P
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe -
Service thread does not detect the need for taskfile error hanlding. Fixed the
flag condition to process taskfile error.Signed-off-by: Selvan Mani
Signed-off-by: Asai Thambi S P
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
22 Jan, 2016
1 commit
-
Pull block driver updates from Jens Axboe:
"This is the block driver pull request for 4.5, with the exception of
NVMe, which is in a separate branch and will be posted after this one.This pull request contains:
- A set of bcache stability fixes, which have been acked by Kent.
These have been used and tested for more than a year by the
community, so it's about time that they got in.- A set of drbd updates from the drbd team (Andreas, Lars, Philipp)
and Markus Elfring, Oleg Drokin.- A set of fixes for xen blkback/front from the usual suspects, (Bob,
Konrad) as well as community based fixes from Kiri, Julien, and
Peng.- A 2038 time fix for sx8 from Shraddha, with a fix from me.
- A small mtip32xx cleanup from Zhu Yanjun.
- A null_blk division fix from Arnd"
* 'for-4.5/drivers' of git://git.kernel.dk/linux-block: (71 commits)
null_blk: use sector_div instead of do_div
mtip32xx: restrict variables visible in current code module
xen/blkfront: Fix crash if backend doesn't follow the right states.
xen/blkback: Fix two memory leaks.
xen/blkback: make st_ statistics per ring
xen/blkfront: Handle non-indirect grant with 64KB pages
xen-blkfront: Introduce blkif_ring_get_request
xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule()
xen/blkback: Free resources if connect_ring failed.
xen/blocks: Return -EXX instead of -1
xen/blkback: make pool of persistent grants and free pages per-queue
xen/blkback: get the number of hardware queues/rings from blkfront
xen/blkback: pseudo support for multi hardware queues/rings
xen/blkback: separate ring information out of struct xen_blkif
xen/blkfront: correct setting for xen_blkif_max_ring_order
xen/blkfront: make persistent grants pool per-queue
xen/blkfront: Remove duplicate setting of ->xbdev.
xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors.
xen/blkfront: negotiate number of queues/rings to be used with backend
xen/blkfront: split per device io_lock
...
20 Jan, 2016
1 commit
-
Pull core block updates from Jens Axboe:
"We don't have a lot of core changes this time around, it's mostly in
drivers, which will come in a subsequent pull.The cores changes include:
- blk-mq
- Prep patch from Christoph, changing blk_mq_alloc_request() to
take flags instead of just using gfp_t for sleep/nosleep.
- Doc patch from me, clarifying the difference between legacy
and blk-mq for timer usage.
- Fixes from Raghavendra for memory-less numa nodes, and a reuse
of CPU masks.- Cleanup from Geliang Tang, using offset_in_page() instead of open
coding it.- From Ilya, rename request_queue slab to it reflects what it holds,
and a fix for proper use of bdgrab/put.- A real fix for the split across stripe boundaries from Keith. We
yanked a broken version of this from 4.4-rc final, this one works.- From Mike Krinkin, emit a trace message when we split.
- From Wei Tang, two small cleanups, not explicitly clearing memory
that is already cleared"* 'for-4.5/core' of git://git.kernel.dk/linux-block:
block: use bd{grab,put}() instead of open-coding
block: split bios to max possible length
block: add call to split trace point
blk-mq: Avoid memoryless numa node encoded in hctx numa_node
blk-mq: Reuse hardware context cpumask for tags
blk-mq: add a flags parameter to blk_mq_alloc_request
Revert "blk-flush: Queue through IO scheduler when flush not required"
block: clarify blk_add_timer() use case for blk-mq
bio: use offset_in_page macro
block: do not initialise statics to 0 or NULL
block: do not initialise globals to 0 or NULL
block: rename request_queue slab cache
09 Jan, 2016
1 commit
-
The modified variables are only used in the file mtip32xx.c.
As such, the static keyword is inserted to define that object
to be only visible to the current code module during compilation.Signed-off-by: Zhu Yanjun
Signed-off-by: Jens Axboe
06 Jan, 2016
1 commit
-
[folded a fix by Dan Carpenter]
Signed-off-by: Al Viro
02 Dec, 2015
1 commit
-
We already have the reserved flag, and a nowait flag awkwardly encoded as
a gfp_t. Add a real flags argument to make the scheme more extensible and
allow for a nicer calling convention.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
20 Nov, 2015
1 commit
-
kthread_create_on_node takes format+args, so there's no need to do the
pretty-printing in advance. Moreover, "mtip_svc_thd_99" (including its
'\0') only just fits in 16 bytes, so if index could ever go above 99
we'd have a stack buffer overflow.Signed-off-by: Rasmus Villemoes
Reviewed-by: Jeff Moyer
Signed-off-by: Jens Axboe
07 Nov, 2015
1 commit
-
__GFP_WAIT was used to signal that the caller was in atomic context and
could not sleep. Now it is possible to distinguish between true atomic
context and callers that are not willing to sleep. The latter should
clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing
__GFP_WAIT behaves differently, there is a risk that people will clear the
wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly
indicate what it does -- setting it allows all reclaim activity, clearing
them prevents it.[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Acked-by: Johannes Weiner
Cc: Christoph Lameter
Acked-by: David Rientjes
Cc: Vitaly Wool
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Aug, 2015
1 commit
-
Hi,
After commit f70ced091707 (blk-mq: support per-distpatch_queue flush
machinery), the mtip32xx driver may oops upon module load due to walking
off the end of an array in mtip_init_cmd. On initialization of the
flush_rq, init_request is called with request_index >= the maximum queue
depth the driver supports. For mtip32xx, this value is used to index
into an array. What this means is that the driver will walk off the end
of the array, and either oops or cause random memory corruption.The problem is easily reproduced by doing modprobe/rmmod of the mtip32xx
driver in a loop. I can typically reproduce the problem in about 30
seconds.Now, in the case of mtip32xx, it actually doesn't support flush/fua, so
I think we can simply return without doing anything. In addition, no
other mq-enabled driver does anything with the request_index passed into
init_request(), so no other driver is affected. However, I'm not really
sure what is expected of drivers. Ming, what did you envision drivers
would do when initializing the flush requests?Signed-off-by: Jeff Moyer
Signed-off-by: Jens Axboe
24 Jun, 2015
1 commit
-
In mtip_pci_remove(), driver data 'dd' is accessed after freeing it. This
is a residue of SRSI code cleanup in the patch 016a41c38821 "mtip32xx: fix
crash on surprise removal of the drive". Removed the bit flags
MTIP_DDF_REMOVE_DONE_BIT and MTIP_PF_SR_CLEANUP_BIT.Reported-by: Julia Lawall
Signed-off-by: Vignesh Gunasekaran
Signed-off-by: Selvan Mani
Signed-off-by: Asai Thambi S P
Signed-off-by: Jens Axboe
16 Jun, 2015
3 commits
-
In LUN failure conditions, device takes longer time to complete the hba reset.
Increased wait time from 1 second to 10 seconds.Signed-off-by: Sam Bradshaw
Signed-off-by: Asai Thambi S P
Signed-off-by: Jens Axboe -
When a device is surprise removed and inserted, it is assigned a new minor
number because driver use multiples of 'instance' number. Modified to use the
multiples of 'index' for minor number.Signed-off-by: Asai Thambi S P
Signed-off-by: Jens Axboe -
Signed-off-by: Asai Thambi S P
Signed-off-by: Jens Axboe