02 Aug, 2012

2 commits

  • Pull block driver changes from Jens Axboe:

    - Making the plugging support for drivers a bit more sane from Neil.
    This supersedes the plugging change from Shaohua as well.

    - The usual round of drbd updates.

    - Using a tail add instead of a head add in the request completion for
    nbd, making us find the most completed request more quickly.

    - A few floppy changes, getting rid of a duplicated flag and also
    running the floppy init async (since it takes forever in boot terms)
    from Andi.

    * 'for-3.6/drivers' of git://git.kernel.dk/linux-block:
    floppy: remove duplicated flag FD_RAW_NEED_DISK
    blk: pass from_schedule to non-request unplug functions.
    block: stack unplug
    blk: centralize non-request unplug handling.
    md: remove plug_cnt feature of plugging.
    block/nbd: micro-optimization in nbd request completion
    drbd: announce FLUSH/FUA capability to upper layers
    drbd: fix max_bio_size to be unsigned
    drbd: flush drbd work queue before invalidate/invalidate remote
    drbd: fix potential access after free
    drbd: call local-io-error handler early
    drbd: do not reset rs_pending_cnt too early
    drbd: reset congestion information before reporting it in /proc/drbd
    drbd: report congestion if we are waiting for some userland callback
    drbd: differentiate between normal and forced detach
    drbd: cleanup, remove two unused global flags
    floppy: Run floppy initialization asynchronous

    Linus Torvalds
     
  • Pull md updates from NeilBrown.

    * 'for-next' of git://neil.brown.name/md:
    DM RAID: Add support for MD RAID10
    md/RAID1: Add missing case for attempting to repair known bad blocks.
    md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE.
    md/raid1: don't abort a resync on the first badblock.
    md: remove duplicated test on ->openers when calling do_md_stop()
    raid5: Add R5_ReadNoMerge flag which prevent bio from merging at block layer
    md/raid1: prevent merging too large request
    md/raid1: read balance chooses idlest disk for SSD
    md/raid1: make sequential read detection per disk based
    MD RAID10: Export md_raid10_congested
    MD: Move macros from raid1*.h to raid1*.c
    MD RAID1: rename mirror_info structure
    MD RAID10: rename mirror_info structure
    MD RAID10: Fix compiler warning.
    raid5: add a per-stripe lock
    raid5: remove unnecessary bitmap write optimization
    raid5: lockless access raid5 overrided bi_phys_segments
    raid5: reduce chance release_stripe() taking device_lock

    Linus Torvalds
     

01 Aug, 2012

2 commits


31 Jul, 2012

18 commits

  • This will allow md/raid to know why the unplug was called,
    and to act accordingly - if !from_schedule it
    is safe to perform tasks which could themselves schedule.

    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
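    The pattern above can be sketched in userspace C. This is an
    illustrative sketch only, not the kernel API: the names
    md_unplug_sketch, do_pending_writes and queue_to_helper_thread are
    hypothetical stand-ins for the md unplug path.

```c
#include <assert.h>
#include <stdbool.h>

static int inline_runs;
static int deferred_runs;

/* Work that may block/schedule, so it is only safe outside the scheduler. */
static void do_pending_writes(void) { inline_runs++; }

/* Hand the work to a helper thread instead of running it inline. */
static void queue_to_helper_thread(void) { deferred_runs++; }

/* The unplug callback is told whether it was invoked from the
 * scheduler; if so, it must defer anything that could itself sleep. */
static void md_unplug_sketch(bool from_schedule)
{
    if (!from_schedule)
        do_pending_writes();      /* safe: may block/schedule */
    else
        queue_to_helper_thread(); /* inside the scheduler: defer */
}
```

    The point of passing from_schedule down is exactly this branch: the
    caller's context decides whether the driver may do blocking work
    inline.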
     
  • Both md and umem have similar code for getting notified on a
    blk_finish_plug event.
    Centralize this code in block/ and allow each driver to
    provide its distinctive difference.

    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
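    The centralization described above amounts to the block layer
    keeping a list of driver-registered callbacks on the plug and firing
    them when the plug is finished. A minimal userspace sketch of that
    idea follows; the struct and function names here are hypothetical,
    not the real block-layer API.

```c
#include <assert.h>
#include <stddef.h>

/* A plug holds a singly-linked list of driver callbacks. */
struct plug_cb {
    void (*callback)(struct plug_cb *cb);
    struct plug_cb *next;
};

struct plug {
    struct plug_cb *cb_list;
};

/* Each driver adds only its own callback; the list handling is shared. */
static void plug_add_cb(struct plug *plug, struct plug_cb *cb)
{
    cb->next = plug->cb_list;
    plug->cb_list = cb;
}

/* Finishing the plug fires every registered callback once. */
static void plug_finish(struct plug *plug)
{
    struct plug_cb *cb = plug->cb_list;

    plug->cb_list = NULL;
    while (cb) {
        struct plug_cb *next = cb->next;
        cb->callback(cb);   /* driver-specific work runs here */
        cb = next;
    }
}

/* Demo callback standing in for md's or umem's unplug work. */
static int fired;
static void count_cb(struct plug_cb *cb) { (void)cb; fired++; }
```

    With this split, md and umem no longer duplicate the list handling;
    each contributes just its callback body.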
     
  • This seemed like a good idea at the time, but after further thought I
    cannot see it making a difference other than very occasionally, and
    testing intended to exercise the case it is most likely to help did
    not show any performance difference from removing it.

    So remove the counting of active plugs and allow 'pending writes' to
    be activated at any time, not just when no plugs are active.

    This is only relevant when there is a write-intent bitmap, and the
    updating of the bitmap will likely introduce enough delay that
    the single-threading of bitmap updates will be enough to collect large
    numbers of updates together.

    Removing this will make it easier to centralise the unplug code, and
    will clear the way for other unplug enhancements which have a
    measurable effect.

    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
     
  • When doing resync or repair, attempt to correct bad blocks according
    to the WriteErrorSeen policy.

    Signed-off-by: Alex Lyakas
    Signed-off-by: NeilBrown

    Alexander Lyakas
     
  • Merge Andrew's first set of patches:
    "Non-MM patches:

    - lots of misc bits

    - tree-wide have_clk() cleanups

    - quite a lot of printk tweaks. I draw your attention to "printk:
    convert the format for KERN_ to a 2 byte pattern" which
    looks a bit scary. But afaict it's solid.

    - backlight updates

    - lib/ feature work (notably the addition and use of memweight())

    - checkpatch updates

    - rtc updates

    - nilfs updates

    - fatfs updates (partial, still waiting for acks)

    - kdump, proc, fork, IPC, sysctl, taskstats, pps, etc

    - new fault-injection feature work"

    * Merge emailed patches from Andrew Morton : (128 commits)
    drivers/misc/lkdtm.c: fix missing allocation failure check
    lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table()
    fault-injection: add tool to run command with failslab or fail_page_alloc
    fault-injection: add selftests for cpu and memory hotplug
    powerpc: pSeries reconfig notifier error injection module
    memory: memory notifier error injection module
    PM: PM notifier error injection module
    cpu: rewrite cpu-notifier-error-inject module
    fault-injection: notifier error injection
    c/r: fcntl: add F_GETOWNER_UIDS option
    resource: make sure requested range is included in the root range
    include/linux/aio.h: cpp->C conversions
    fs: cachefiles: add support for large files in filesystem caching
    pps: return PTR_ERR on error in device_create
    taskstats: check nla_reserve() return
    sysctl: suppress kmemleak messages
    ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION
    ipc: compat: use signed size_t types for msgsnd and msgrcv
    ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC
    ipc: add COMPAT_SHMLBA support
    ...

    Linus Torvalds
     
  • Use memweight() to count the total number of bits set in memory area.

    Signed-off-by: Akinobu Mita
    Cc: Alasdair Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
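    For reference, memweight() counts the total number of set bits in an
    arbitrary memory area. A minimal userspace sketch of its semantics
    (the kernel version in lib/memweight.c is optimized; this only
    illustrates the behaviour):

```c
#include <assert.h>
#include <stddef.h>

/* Count the set bits in a memory area, byte by byte. */
static size_t memweight_sketch(const void *ptr, size_t bytes)
{
    const unsigned char *p = ptr;
    size_t w = 0;

    while (bytes--) {
        unsigned char b = *p++;
        while (b) {
            b &= b - 1;  /* clear the lowest set bit */
            w++;
        }
    }
    return w;
}
```

    This is exactly the kind of open-coded bit counting that callers can
    now delete in favour of the shared helper.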
     
  • 'sync' writes set both REQ_SYNC and REQ_NOIDLE.
    O_DIRECT writes set REQ_SYNC but not REQ_NOIDLE.

    We currently assume that a REQ_SYNC request will not be followed by
    more requests and so set STRIPE_PREREAD_ACTIVE to expedite the
    request.
    This is appropriate for sync requests, but not for O_DIRECT requests.

    So make the setting of STRIPE_PREREAD_ACTIVE conditional on REQ_NOIDLE
    rather than REQ_SYNC. This is consistent with the documented meaning
    of REQ_NOIDLE:

    __REQ_NOIDLE, /* don't anticipate more IO after this one */

    Signed-off-by: Jianpeng Ma
    Signed-off-by: NeilBrown

    majianpeng
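    The change above boils down to which flag gates the expedite path. A
    userspace sketch, with illustrative flag values only (the real REQ_*
    bits live in include/linux/blk_types.h) and a hypothetical helper
    name:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values, not the kernel's actual bit positions. */
#define REQ_SYNC   (1u << 0)
#define REQ_NOIDLE (1u << 1)

/* Before: keyed off REQ_SYNC, which wrongly caught O_DIRECT writes
 * (REQ_SYNC set, REQ_NOIDLE clear).  After: key off REQ_NOIDLE, i.e.
 * expedite only when no further IO is anticipated. */
static bool set_preread_active(unsigned int rw_flags)
{
    return (rw_flags & REQ_NOIDLE) != 0;
}
```

    A 'sync' write (REQ_SYNC|REQ_NOIDLE) still gets
    STRIPE_PREREAD_ACTIVE; an O_DIRECT write (REQ_SYNC alone) no longer
    does.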
     
  • If a resync of a RAID1 array with 2 devices finds a known bad block
    on one device, it will neither read from, nor write to, that device
    for this block offset.
    So there will be one read target (the other device) and zero write
    targets.
    This condition causes md/raid1 to abort the resync, assuming that it
    has finished - without known bad blocks this would be true.

    When there are no write targets because of the presence of bad blocks
    we should only skip over the area covered by the bad block.
    RAID10 already gets this right, raid1 doesn't. Or didn't.

    As this can cause a 'sync' to abort early and appear to have succeeded
    it could lead to some data corruption, so it is suitable for -stable.

    Cc: stable@vger.kernel.org
    Reported-by: Alexander Lyakas
    Signed-off-by: NeilBrown

    NeilBrown
     
  • do_md_stop tests mddev->openers while holding ->open_mutex,
    and fails if this count is too high.
    So callers do not need to check mddev->openers, and doing so isn't
    very meaningful: they don't hold ->open_mutex, so the number could
    change.

    So remove the unnecessary tests on mddev->openers.
    These are not called often enough for there to be any gain in
    an early test on ->open_mutex to avoid the need for a slightly more
    costly mutex_lock call.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • Because bios are merged at the block layer, a bio error may be caused
    by another bio that was merged into the same request.
    Using this flag, we can find the exact error sector and avoid
    redundant operations like re-writes and re-reads.

    V0->V1: Use REQ_FLUSH instead of REQ_NOMERGE to avoid bio merging at
    the block layer.

    Signed-off-by: Jianpeng Ma
    Signed-off-by: NeilBrown

    majianpeng
     
  • For SSDs, if the request size exceeds a specific value (the optimal
    IO size), the request size isn't important for bandwidth. In such a
    condition, if making the request size bigger causes some disks to go
    idle, the total throughput will actually drop. A good example is
    doing a readahead in a two-disk raid1 setup.

    So when should we split big requests? We absolutely don't want to
    split big requests into very small requests. Even on an SSD, a big
    request transfer is more efficient. This patch only considers
    requests with a size above the optimal IO size.

    If all disks are busy, is it worth doing a split? Say the optimal IO
    size is 16k, with two 32k requests and two disks. We can let each
    disk run one 32k request, or split the requests into four 16k
    requests so each disk runs two. It's hard to say which case is
    better; it depends on the hardware.

    So only consider the case where there are idle disks. For readahead,
    a split is always better in this case. And in my test, the patch
    below can improve throughput by more than 30%. Hmm, not 100%,
    because the disk isn't 100% busy.

    Such a case can happen not just in readahead; for example, in direct
    IO. But I suppose direct IO usually has a bigger IO depth and makes
    all disks busy, so I ignored it.

    Note: if the raid uses any hard disk, we don't prevent merging. That
    would make performance worse.

    Signed-off-by: Shaohua Li
    Signed-off-by: NeilBrown

    Shaohua Li
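    The decision described above has three conditions. A hypothetical
    userspace sketch (the function name and parameters are stand-ins,
    not the patch's actual code):

```c
#include <assert.h>
#include <stdbool.h>

/* Split a large read only when: (a) no rotational disk is in the
 * array (merging still wins on hard disks), (b) the request exceeds
 * the optimal IO size (small requests merge better), and (c) at least
 * one disk is idle for the split to put to work. */
static bool should_split(bool any_rotational, unsigned int sectors,
                         unsigned int opt_io_sectors, int idle_disks)
{
    if (any_rotational)
        return false;
    if (sectors <= opt_io_sectors)
        return false;
    return idle_disks > 0;
}
```

    When all disks are already busy, the sketch declines to split,
    matching the commit's "it's hard to say which case is better"
    reasoning.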
     
  • An SSD has no spindle, so the distance between requests means
    nothing. And the original distance-based algorithm can sometimes
    cause severe performance issues for SSD raid.

    Consider two thread groups, one accessing file A, the other
    accessing file B. The first group will access one disk and the
    second will access the other disk, because requests are close
    together within a group and far apart between groups. In this case,
    read balance might keep one disk very busy while the other is
    relatively idle. For SSDs, we should try our best to distribute
    requests to as many disks as possible. There is no spindle-move
    penalty anyway.

    With the patch below, I can see more than 50% throughput improvement
    sometimes, depending on the workload.

    The only exception is small requests that can be merged into a big
    request, which typically drives higher throughput for SSDs too. Such
    small requests are sequential reads. Unlike on hard disks,
    sequential reads which can't be merged (for example direct IO, or
    reads without readahead) can be ignored for SSDs. Again, there is no
    spindle-move penalty. Readahead dispatches small requests, and such
    requests can be merged.

    The last patch helps detect sequential reads well, at least if the
    concurrent read count isn't greater than the raid disk count. In
    that case, the distance-based algorithm doesn't work well either.

    V2: For a mixed hard disk and SSD raid, don't use the distance-based
    algorithm for random IO either. This makes the algorithm generic for
    raids with SSDs.

    Signed-off-by: Shaohua Li
    Signed-off-by: NeilBrown

    Shaohua Li
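    The "idlest disk" policy above replaces head-distance with queue
    depth: pick the mirror with the fewest pending requests. A minimal
    sketch under that assumption (names are hypothetical):

```c
#include <assert.h>

/* Return the index of the mirror with the fewest pending requests,
 * spreading reads across all devices instead of clustering them on
 * the "closest" one. */
static int choose_idlest(const int *pending, int ndisks)
{
    int best = 0;

    for (int i = 1; i < ndisks; i++)
        if (pending[i] < pending[best])
            best = i;
    return best;
}
```

    In the two-thread-group scenario from the commit message, this keeps
    both disks loaded instead of leaving one relatively idle.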
     
  • Currently the sequential read detection is global. It's natural to
    make it per-disk, which can improve the detection for multiple
    concurrent sequential reads. And the next patch will make SSD read
    balance not use the distance-based algorithm, where this change
    helps detect truly sequential reads for SSDs.

    Signed-off-by: Shaohua Li
    Signed-off-by: NeilBrown

    Shaohua Li
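    Per-disk detection means each mirror remembers where its own last
    read ended. A hypothetical sketch of that bookkeeping (struct and
    field names are illustrative, not the patch's):

```c
#include <assert.h>
#include <stdbool.h>

/* Each mirror tracks the sector it expects a sequential read to
 * start at next. */
struct mirror_state {
    unsigned long long next_seq_sector;
};

/* A read is "sequential" for this disk if it starts exactly where the
 * disk's previous read ended; update the per-disk state either way. */
static bool is_sequential(struct mirror_state *m,
                          unsigned long long sector,
                          unsigned int sectors)
{
    bool seq = (sector == m->next_seq_sector);

    m->next_seq_sector = sector + sectors;
    return seq;
}
```

    With one such state per disk, two interleaved sequential streams on
    different mirrors no longer destroy each other's detection, which is
    what a single global tracker did.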
     
  • md/raid10: Export is_congested test.

    In similar fashion to commits
    11d8a6e3719519fbc0e2c9d61b6fa931b84bf813
    1ed7242e591af7e233234d483f12d33818b189d9
    we export the RAID10 congestion checking function so that dm-raid.c can
    make use of it and make use of the personality. The 'queue' and 'gendisk'
    structures will not be available to the MD code when device-mapper sets
    up the device, so we conditionalize access to these fields also.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     
  • MD RAID1/RAID10: Move some macros from .h file to .c file

    There are three macros (IO_BLOCKED, IO_MADE_GOOD, BIO_SPECIAL) which
    are defined in both raid1.h and raid10.h. They are only used in their
    respective .c files.
    However, if we wish to make RAID10 accessible to the device-mapper
    RAID target (dm-raid.c), then we need to move these macros into the
    .c files where they are used so that they do not conflict with each
    other.

    The macros from the two files are identical and could be moved into md.h, but
    I chose to leave the duplication and have them remain in the personality
    files.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     
  • MD RAID1: Rename the structure 'mirror_info' to 'raid1_info'

    The same structure name ('mirror_info') is used by raid10. Each of
    these structures is defined in its respective header file. If dm-raid
    is to support both RAID1 and RAID10, the header files will be
    included and the structure names must not collide. While only one of
    these structure names needs to change, this patch adds consistency
    to the naming of the structures.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     
  • MD RAID10: Rename the structure 'mirror_info' to 'raid10_info'

    The same structure name ('mirror_info') is used by raid1. Each of
    these structures is defined in its respective header file. If dm-raid
    is to support both RAID1 and RAID10, the header files will be
    included and the structure names must not collide.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     
  • MD RAID10: Fix compiler warning.

    Initialize variable to prevent compiler warning.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     

27 Jul, 2012

18 commits