24 Nov, 2013
3 commits
-
The new bio_split() can split arbitrary bios - it's not restricted to
single page bios, like the old bio_split() (previously renamed to
bio_pair_split()). It also has different semantics - it doesn't allocate
a struct bio_pair, leaving it up to the caller to handle completions.Then convert the existing bio_pair_split() users to the new bio_split()
- and also nvme, which was open coding bio splitting.(We have to take that BUG_ON() out of bio_integrity_trim() because this
bio_split() needs to use it, and there's no reason it has to be used on
bios marked as cloned; BIO_CLONED doesn't seem to have clearly
documented semantics anyways.)Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: Martin K. Petersen
Cc: Matthew Wilcox
Cc: Keith Busch
Cc: Vishal Verma
Cc: Jiri Kosina
Cc: Neil Brown -
This is prep work for introducing a more general bio_split().
Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: NeilBrown
Cc: Alasdair Kergon
Cc: Lars Ellenberg
Cc: Peter Osterlund
Cc: Sage Weil -
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: Geert Uytterhoeven
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: "Ed L. Cashin"
Cc: Nick Piggin
Cc: Lars Ellenberg
Cc: Jiri Kosina
Cc: Matthew Wilcox
Cc: Geoff Levand
Cc: Yehuda Sadeh
Cc: Sage Weil
Cc: Alex Elder
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris
Cc: Philip Kelleher
Cc: Rusty Russell
Cc: "Michael S. Tsirkin"
Cc: Konrad Rzeszutek Wilk
Cc: Jeremy Fitzhardinge
Cc: Neil Brown
Cc: Alasdair Kergon
Cc: Mike Snitzer
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh
Cc: Benny Halevy
Cc: "James E.J. Bottomley"
Cc: Greg Kroah-Hartman
Cc: "Nicholas A. Bellinger"
Cc: Alexander Viro
Cc: Chris Mason
Cc: "Theodore Ts'o"
Cc: Andreas Dilger
Cc: Jaegeuk Kim
Cc: Steven Whitehouse
Cc: Dave Kleikamp
Cc: Joern Engel
Cc: Prasad Joshi
Cc: Trond Myklebust
Cc: KONISHI Ryusuke
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Ben Myers
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Len Brown
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: Herton Ronaldo Krzesinski
Cc: Ben Hutchings
Cc: Andrew Morton
Cc: Guo Chao
Cc: Tejun Heo
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Wei Yongjun
Cc: "Roger Pau Monné"
Cc: Jan Beulich
Cc: Stefano Stabellini
Cc: Ian Campbell
Cc: Sebastian Ott
Cc: Christian Borntraeger
Cc: Minchan Kim
Cc: Jiang Liu
Cc: Nitin Gupta
Cc: Jerome Marchand
Cc: Joe Perches
Cc: Peng Tao
Cc: Andy Adamson
Cc: fanchaoting
Cc: Jie Liu
Cc: Sunil Mushran
Cc: "Martin K. Petersen"
Cc: Namjae Jeon
Cc: Pankaj Kumar
Cc: Dan Magenheimer
Cc: Mel Gorman 6
24 Mar, 2013
1 commit
-
Just a little convenience macro - main reason to add it now is preparing
for immutable bio vecs, it'll reduce the size of the patch that puts
bi_sector/bi_size/bi_idx into a struct bvec_iter.Signed-off-by: Kent Overstreet
CC: Jens Axboe
CC: Lars Ellenberg
CC: Jiri Kosina
CC: Alasdair Kergon
CC: dm-devel@redhat.com
CC: Neil Brown
CC: Martin Schwidefsky
CC: Heiko Carstens
CC: linux-s390@vger.kernel.org
CC: Chris Mason
CC: Steven Whitehouse
Acked-by: Steven Whitehouse
11 Oct, 2012
2 commits
-
This makes md linear support TRIM.
Signed-off-by: Shaohua Li
Signed-off-by: NeilBrown -
According to the comment in linear_stop function
rcu_dereference in linear_start and linear_stop functions
occurs under reconfig_mutex. The patch represents this
agreement in code and prevents lockdep complaint.Found by Linux Driver Verification project (linuxtesting.org)
Signed-off-by: Denis Efremov
Signed-off-by: NeilBrown
02 Apr, 2012
1 commit
-
Signed-off-by: majianpeng
Signed-off-by: NeilBrown
19 Mar, 2012
2 commits
-
These personalities currently set a max request size of one page
when any member device has a merge_bvec_fn because they don't
bother to call that function.This causes extra works in splitting and combining requests.
So make the extra effort to call the merge_bvec_fn when it exists
so that we end up with larger requests out the bottom.Signed-off-by: NeilBrown
-
md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
mddev. However it uses the 'safe' version of list_for_each_entry,
and so requires the extra variable, but doesn't include 'safe' in the
name, which is useful documentation.Consequently some places use this safe version without needing it, and
many use an explicity list_for_each entry.So:
- rename rdev_for_each to rdev_for_each_safe
- create a new rdev_for_each which uses the plain
list_for_each_entry,
- use the 'safe' version only where needed, and convert all other
list_for_each_entry calls to use rdev_for_each.Signed-off-by: NeilBrown
23 Dec, 2011
1 commit
-
commit d70ed2e4fafdbef0800e73942482bb075c21578b
broke hot-add to a linear array.
After that commit, metadata if not written to devices until they
have been fully integrated into the array as determined by
saved_raid_disk. That patch arranged to clear that field after
a recovery completed.However for linear arrays, there is no recovery - the integration is
instantaneous. So we need to explicitly clear the saved_raid_disk
field.Signed-off-by: NeilBrown
07 Nov, 2011
1 commit
-
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include
net: sch_generic remove redundant use of
net: inet_timewait_sock doesnt need
...Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
05 Nov, 2011
1 commit
-
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
block: don't call blk_drain_queue() if elevator is not up
blk-throttle: use queue_is_locked() instead of lockdep_is_held()
blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
blk-throttle: Free up policy node associated with deleted rule
block: warn if tag is greater than real_max_depth.
block: make gendisk hold a reference to its queue
blk-flush: move the queue kick into
blk-flush: fix invalid BUG_ON in blk_insert_flush
block: Remove the control of complete cpu from bio.
block: fix a typo in the blk-cgroup.h file
block: initialize the bounce pool if high memory may be added later
block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
block: drop @tsk from attempt_plug_merge() and explain sync rules
block: make get_request[_wait]() fail if queue is dead
block: reorganize throtl_get_tg() and blk_throtl_bio()
block: reorganize queue draining
block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
block: pass around REQ_* flags instead of broken down booleans during request alloc/free
block: move blk_throtl prototypes to block/blk.h
block: fix genhd refcounting in blkio_policy_parse_and_set()
...Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
and making the request functions be of type "void" instead of "int" in
- drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
- drivers/staging/zram/zram_drv.c
01 Nov, 2011
1 commit
-
A pending cleanup will mean that module.h won't be implicitly
everywhere anymore. Make sure the modular drivers in md dir
are actually calling out for explicitly in advance.Signed-off-by: Paul Gortmaker
11 Oct, 2011
5 commits
-
"mdk" doesn't mean anything any more.
Signed-off-by: NeilBrown
-
Signed-off-by: NeilBrown
-
Signed-off-by: NeilBrown
-
Having mddev_t and 'struct mddev_s' is ugly and not preferred
Signed-off-by: NeilBrown
-
The typedefs are just annoying. 'mdk' probably refers to 'md_k.h'
which used to be an include file that defined this thing.Signed-off-by: NeilBrown
12 Sep, 2011
1 commit
-
There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.Signed-off-by: Christoph Hellwig
Acked-by: NeilBrown
Signed-off-by: Jens Axboe
21 Jul, 2011
1 commit
-
The rcu callback free_conf() just calls a kfree(),
so we use kfree_rcu() instead of the call_rcu(free_conf).Signed-off-by: Lai Jiangshan
Signed-off-by: Paul E. McKenney
Acked-by: NeilBrown
Reviewed-by: Josh Triplett
17 Mar, 2011
1 commit
-
MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.Signed-off-by: Martin K. Petersen
Reported-by: Zdenek Kabelac
Acked-by: Mike Snitzer
Signed-off-by: Jens Axboe
10 Mar, 2011
2 commits
-
Conflicts:
block/blk-core.c
block/blk-flush.c
drivers/md/raid1.c
drivers/md/raid10.c
drivers/md/raid5.c
fs/nilfs2/btnode.c
fs/nilfs2/mdt.cSigned-off-by: Jens Axboe
-
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().Signed-off-by: Jens Axboe
21 Feb, 2011
1 commit
-
blk_throtl_exit assumes that ->queue_lock still exists,
so make sure that it does.
To do this, we stop redirecting ->queue_lock to conf->device_lock
and leave it pointing where it is initialised - __queue_lock.As the blk_plug functions check the ->queue_lock is held, we now
take that spin_lock explicitly around the plug functions. We don't
need the locking, just the warning removal.This is needed for any kernel with the blk_throtl code, which is
which is 2.6.37 and later.Cc: stable@kernel.org
Signed-off-by: NeilBrown
10 Sep, 2010
1 commit
-
This patch converts md to support REQ_FLUSH/FUA instead of now
deprecated REQ_HARDBARRIER. In the core part (md.c), the following
changes are notable.* Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with
processing of other requests and thus there is no reason to mark the
queue congested while FLUSH/FUA is in progress.* REQ_FLUSH/FUA failures are final and its users don't need retry
logic. Retry logic is removed.* Preflush needs to be issued to all member devices but FUA writes can
be handled the same way as other writes - their processing can be
deferred to request_queue of member devices. md_barrier_request()
is renamed to md_flush_request() and simplified accordingly.For linear, raid0 and multipath, the core changes are enough. raid1,
5 and 10 need the following conversions.* raid1: Handling of FLUSH/FUA bio's can simply be deferred to
request_queues of member devices. Barrier related logic removed.* raid5: Queue draining logic dropped. FUA bit is propagated through
biodrain and stripe resconstruction such that all the updated parts
of the stripe are written out with FUA writes if any of the dirtying
writes was FUA. preread_active_stripes handling in make_request()
is updated as suggested by Neil Brown.* raid10: FUA bit needs to be propagated to write clones.
linear, raid0, 1, 5 and 10 tested.
Signed-off-by: Tejun Heo
Reviewed-by: Neil Brown
Signed-off-by: Jens Axboe
08 Aug, 2010
1 commit
-
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
22 May, 2010
1 commit
-
Conflicts:
drivers/md/md.c- Resolved conflict in md_update_sb
- Added extra 'NULL' arg to new instance of sysfs_get_dirent.Signed-off-by: NeilBrown
18 May, 2010
3 commits
-
md/linear:mdname:
Signed-off-by: NeilBrown
-
We used to pass the personality make_request function direct
to the block layer so the first argument had to be a queue.
But now we have the intermediary md_make_request so it makes
at lot more sense to pass a struct mddev_s.
It makes it possible to have an mddev without its own queue too.Signed-off-by: NeilBrown
-
While I generally prefer letting personalities do as much as possible,
given that we have a central md_make_request anyway we may as well use
it to simplify code.
Also this centralises knowledge of ->gendisk which will help later.Signed-off-by: NeilBrown
17 May, 2010
1 commit
-
Since commit ef286f6fa673cd7fb367e1b145069d8dbfcc6081
it has been important that each personality clears
->private in the ->stop() function, or sets it to a
attribute group to be removed.
linear.c doesn't. This can sometimes lead to an oops,
though it doesn't always.Suitable for 2.6.33-stable and 2.6.34.
Signed-off-by: NeilBrown
Cc: stable@kernel.org
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
16 Mar, 2010
1 commit
-
If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to. Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.Signed-off-by: NeilBrown
Cc: stable@kernel.org
26 Feb, 2010
1 commit
-
The block layer calling convention is blk_queue_.
blk_queue_max_sectors predates this practice, leading to some confusion.
Rename the function to appropriately reflect that its intended use is to
set max_hw_sectors.Also introduce a temporary wrapper for backwards compability. This can
be removed after the merge window is closed.Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe
14 Dec, 2009
2 commits
-
Suggested by Oren Held
Signed-off-by: NeilBrown
-
Previously barriers were only supported on RAID1. This is because
other levels requires synchronisation across all devices and so needed
a different approach.
Here is that approach.When a barrier arrives, we send a zero-length barrier to every active
device. When that completes - and if the original request was not
empty - we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero length barriers.The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.The reason for clearing the barrier flag is that a barrier request is
allowed to fail. If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail. That would be way too hard to deal with.
So if the first run of zero length barriers succeed, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet. So we flush the stripe cache before
proceeding with the barrier.Note that the second set of zero-length barriers are submitted
immediately after the original request is submitted. Thus when
a personality finds mddev->barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.That will be done in later patches.
Signed-off-by: NeilBrown
Reviewed-by: Andre Noll
23 Sep, 2009
1 commit
-
This should writeback from coming when the device is temporarily
suspended.Signed-off-by: NeilBrown
11 Sep, 2009
1 commit
-
Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.Signed-off-by: Jens Axboe
03 Aug, 2009
2 commits
-
As revalidate_disk calls check_disk_size_change, it will cause
any capacity change of a gendisk to be propagated to the blockdev
inode. So use that instead of mucking about with locks and
i_size_write.Also add a call to revalidate_disk in do_md_run and a few other places
where the gendisk capacity is changed.Signed-off-by: NeilBrown
-
This patch replaces md_integrity_check() by two new public functions:
md_integrity_register() and md_integrity_add_rdev() which are both
personality-independent.md_integrity_register() is called from the ->run and ->hot_remove
methods of all personalities that support data integrity. The
function iterates over the component devices of the array and
determines if all active devices are integrity capable and if their
profiles match. If this is the case, the common profile is registered
for the mddev via blk_integrity_register().The second new function, md_integrity_add_rdev() is called from the
->hot_add_disk methods, i.e. whenever a new device is being added
to a raid array. If the new device does not support data integrity,
or has a profile different from the one already registered, data
integrity for the mddev is disabled.For raid0 and linear, only the call to md_integrity_register() from
the ->run method is necessary.Signed-off-by: Andre Noll
Signed-off-by: NeilBrown