02 Mar, 2011

4 commits

  • blk-flush decomposes a flush into a sequence of multiple requests. On
    completion of a request, the next one is queued; however, the block
    layer must not implicitly call into q->request_fn() directly from the
    completion path. This makes the queue behave unexpectedly from the
    drivers' point of view and violates the assumption that q->request_fn()
    is called with process context + queue_lock.

    This patch makes the following two changes to blk-flush so that
    q->request_fn() is not called directly from the request completion path.

    - blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always
    use kblockd instead of calling directly into q->request_fn().

    - queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of
    ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the
    request queue directly.

    Reported by Jan in the following threads.

    http://thread.gmane.org/gmane.linux.ide/48778
    http://thread.gmane.org/gmane.linux.ide/48786

    stable: applicable to v2.6.37.

    Signed-off-by: Tejun Heo
    Reported-by: Jan Beulich
    Cc: "David S. Miller"
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • __blk_run_queue() automatically either calls q->request_fn() directly
    or schedules kblockd depending on whether the function is being called
    recursively. The blk-flush implementation needs to be able to explicitly
    choose kblockd. Add @force_kblockd.

    All the current users are converted to specify %false for the
    parameter and this patch doesn't introduce any behavior change.

    stable: This is a prerequisite for fixing an ide oops caused by the new
    blk-flush implementation.

    Signed-off-by: Tejun Heo
    Cc: Jan Beulich
    Cc: James Bottomley
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
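    A minimal sketch of the interface described in this entry and its use by
    the blk-flush change above; the bodies and field names are illustrative,
    not verbatim mainline code:

        /* blk-core.c: let callers force the queue to be run from kblockd
         * instead of recursing into q->request_fn() on their own stack.  */
        void __blk_run_queue(struct request_queue *q, bool force_kblockd)
        {
                if (unlikely(blk_queue_stopped(q)))
                        return;

                if (!force_kblockd &&
                    !queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) {
                        q->request_fn(q);   /* not recursing: call directly */
                        queue_flag_clear(QUEUE_FLAG_REENTER, q);
                } else {
                        kblockd_schedule_work(q, &q->unplug_work);
                }
        }

    Existing callers keep the old behaviour by passing false, while the
    flush completion path (previous entry) always passes true so the queue
    is run from process context via kblockd:

        __blk_run_queue(q, false);   /* normal callers                  */
        __blk_run_queue(q, true);    /* blk_flush_complete_seq_end_io() */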
     
  • Signed-off-by: Ben Hutchings
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Ben Hutchings
     
  • o Dominik Klein reported a system hang issue while doing some blkio
    throttling testing.

    https://lkml.org/lkml/2011/2/24/173

    o Some tracing revealed that CFQ was not dispatching any more jobs
    because queue unplug was not happening. And queue unplug was not
    happening because the unplug work was not being called, as there was one
    throttling work item on the same cpu which had not finished yet. And the
    throttling work had not finished because it was trying to dispatch a bio
    to CFQ, but all the request descriptors were consumed, so it was put to
    sleep.

    o So basically it is a cyclic dependency between the CFQ unplug work and
    the throtl dispatch work. Tejun suggested using a separate workqueue for
    such cases.

    o This patch uses a separate workqueue for throttle related work and
    does not rely on kblockd workqueue anymore.

    Cc: stable@kernel.org
    Reported-by: Dominik Klein
    Signed-off-by: Vivek Goyal
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Vivek Goyal
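    A minimal sketch of the separate-workqueue approach described above
    (workqueue name and field names are illustrative):

        /* blk-throttle.c: give throttle dispatch its own workqueue so it
         * can neither block nor be blocked by kblockd's unplug work.     */
        static struct workqueue_struct *kthrotld_workqueue;

        static int __init throtl_init(void)
        {
                kthrotld_workqueue = alloc_workqueue("kthrotld",
                                                     WQ_MEM_RECLAIM, 0);
                if (!kthrotld_workqueue)
                        panic("Failed to create kthrotld\n");
                return 0;
        }

        /* ...and dispatch work is queued there instead of on kblockd: */
        queue_delayed_work(kthrotld_workqueue, &td->throtl_work, delay);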
     

26 Feb, 2011

1 commit

  • * 'for-linus' of git://neil.brown.name/md:
    md: Fix - again - partition detection when array becomes active
    Fix over-zealous flush_disk when changing device size.
    md: avoid spinlock problem in blk_throtl_exit
    md: correctly handle probe of an 'mdp' device.
    md: don't set_capacity before array is active.
    md: Fix raid1->raid0 takeover

    Linus Torvalds
     

25 Feb, 2011

1 commit

  • Adam Kovari and others reported that disconnecting a USB drive with
    an ntfs-3g filesystem would cause "kernel BUG at fs/inode.c:1421!" to
    be triggered.

    The BUG could be traced back to ioctl(BLKBSZSET), which would
    erroneously decrement the refcount on the bdev. This is because
    blkdev_get() expects the refcount to be already incremented and either
    returns success or decrements the refcount and returns an error.

    The bug was introduced by e525fd89 (block: make blkdev_get/put()
    handle exclusive access), which didn't take into account this behavior
    of blkdev_get().

    This fixes
    https://bugzilla.kernel.org/show_bug.cgi?id=29202
    (and likely 29792 too)

    Reported-by: Adam Kovari
    Acked-by: Tejun Heo
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
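    Since blkdev_get() consumes a reference whether it succeeds or fails,
    the fix is to give it a reference of its own before calling it from the
    ioctl path. A minimal sketch, assuming a bdgrab()-style helper that
    bumps the bdev refcount; the wrapper function is illustrative:

        /* block/ioctl.c, BLKBSZSET (sketch): the ioctl caller still holds
         * its own reference to bdev, so take an extra one for blkdev_get()
         * to consume on either outcome.                                   */
        static int blkbszset(struct block_device *bdev, fmode_t mode, int n)
        {
                int ret;

                bdgrab(bdev);                 /* extra ref for blkdev_get() */
                ret = blkdev_get(bdev, mode | FMODE_EXCL, &bdev);
                if (ret < 0)
                        return ret;           /* blkdev_get() dropped it */

                ret = set_blocksize(bdev, n);
                blkdev_put(bdev, mode | FMODE_EXCL);  /* drops the extra ref */
                return ret;
        }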
     

24 Feb, 2011

1 commit

  • There are two cases when we call flush_disk.
    In one, the device has disappeared (check_disk_change) so any
    data we hold becomes irrelevant.
    In the other, the device has changed size (check_disk_size_change)
    so data we hold may be irrelevant.

    In both cases it makes sense to discard any 'clean' buffers,
    so they will be read back from the device if needed.

    In the former case it makes sense to discard 'dirty' buffers
    as there will never be anywhere safe to write the data. In the
    second case it *does*not* make sense to discard dirty buffers
    as that will lead to file system corruption when you simply enlarge
    the containing device.

    flush_disk calls __invalidate_device.
    __invalidate_device calls both invalidate_inodes and invalidate_bdev.

    invalidate_inodes *does* discard I_DIRTY inodes and this does lead
    to fs corruption.

    invalidate_bdev *does*not* discard dirty pages, but I don't really care
    about that at present.

    So this patch adds a flag to __invalidate_device (calling it
    __invalidate_device2) to indicate whether dirty buffers should be
    killed, and this is passed to invalidate_inodes which can choose to
    skip dirty inodes.

    flush_disk then passes true from check_disk_change and false from
    check_disk_size_change.

    dm avoids tripping over this problem by calling i_size_write directly
    rather than using check_disk_size_change.

    md does use check_disk_size_change and so is affected.

    This regression was introduced by commit 608aeef17a which causes
    check_disk_size_change to call flush_disk, so it is suitable for any
    kernel since 2.6.27.

    Cc: stable@kernel.org
    Acked-by: Jeff Moyer
    Cc: Andrew Patterson
    Cc: Jens Axboe
    Signed-off-by: NeilBrown

    NeilBrown
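    A minimal sketch of the flag plumbing described above (the entry calls
    the new variant __invalidate_device2; the bodies are illustrative):

        /* fs/block_dev.c: callers now say whether dirty state may be
         * thrown away along with the clean buffers.                   */
        static void flush_disk(struct block_device *bdev, bool kill_dirty)
        {
                if (__invalidate_device2(bdev, kill_dirty))
                        printk(KERN_WARNING "VFS: busy inodes on changed "
                               "media or resized disk %s\n",
                               bdev->bd_disk ? bdev->bd_disk->disk_name : "");
                /* ... existing invalidation of cached partition state ... */
        }

        void check_disk_change(struct block_device *bdev)
        {
                /* device is gone: dirty data can never be written back */
                flush_disk(bdev, true);
                /* ... */
        }

        static void check_disk_size_change(struct gendisk *disk,
                                           struct block_device *bdev)
        {
                /* device only changed size: discarding dirty data would
                 * corrupt a filesystem that was simply grown             */
                flush_disk(bdev, false);
                /* ... */
        }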
     

10 Feb, 2011

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cdrom: support devices that have check_events but not media_changed
    cfq-iosched: Don't wait if queue already has requests.
    blkio-throttle: Avoid calling blkiocg_lookup_group() for root group
    cfq: rename a function to give it more appropriate name
    cciss: make cciss_revalidate not loop through CISS_MAX_LUNS volumes unnecessarily.
    drivers/block/aoe/Makefile: replace the use of -objs with -y
    loop: queue_lock NULL pointer derefence in blk_throtl_exit
    drivers/block/Makefile: replace the use of -objs with -y
    blktrace: Don't output messages if NOTIFY isn't set.

    Linus Torvalds
     

09 Feb, 2011

1 commit

  • Commit 7667aa0630407bc07dc38dcc79d29cc0a65553c1 added logic to wait for
    the last queue of the group to become busy (have at least one request),
    so that the group does not lose out for not being continuously
    backlogged. The commit did not check for the condition that the last
    queue already has some requests. As a result, if the queue already has
    requests, wait_busy is set. Later on, cfq_select_queue() checks the
    flag, and decides that since the queue has a request now and wait_busy
    is set, the queue is expired. This results in early expiration of the
    queue.

    This patch fixes the problem by adding a check to see if queue already
    has requests. If it does, wait_busy is not set. As a result, time slices
    do not expire early.

    The queues with more than one request are usually buffered writers.
    Testing shows improvement in isolation between buffered writers.

    Cc: stable@kernel.org
    Signed-off-by: Justin TerAvest
    Reviewed-by: Gui Jianfeng
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Justin TerAvest
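    A minimal sketch of the added check (the surrounding helper exists in
    cfq-iosched.c; the exact placement shown here is illustrative):

        /* cfq-iosched.c: only arm wait_busy for a queue that is about to
         * run dry; a queue that still has requests stays backlogged on
         * its own, and setting wait_busy would just make
         * cfq_select_queue() expire it early.                            */
        static bool cfq_should_wait_busy(struct cfq_data *cfqd,
                                         struct cfq_queue *cfqq)
        {
                if (!RB_EMPTY_ROOT(&cfqq->sort_list))
                        return false;       /* queue already has requests */

                /* ... existing "last queue in the group" checks ... */
                return true;
        }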
     

21 Jan, 2011

1 commit

  • The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
    is used to configure any non-standard kernel with a much larger scope than
    only small devices.

    This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
    references to the option throughout the kernel. A new CONFIG_EMBEDDED
    option is added that automatically selects CONFIG_EXPERT when enabled and
    can be used in the future to isolate options that should only be
    considered for embedded systems (RISC architectures, SLOB, etc).

    Calling the option "EXPERT" more accurately represents its intention: only
    expert users who understand the impact of the configuration changes they
    are making should enable it.

    Reviewed-by: Ingo Molnar
    Acked-by: David Woodhouse
    Signed-off-by: David Rientjes
    Cc: Greg KH
    Cc: "David S. Miller"
    Cc: Jens Axboe
    Cc: Arnd Bergmann
    Cc: Robin Holt
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

19 Jan, 2011

2 commits

  • o Jeff Moyer was doing some testing on a RAM backed disk and
    blkiocg_lookup_group() showed up as high overhead right after memcpy().
    Similarly, somebody else reported that blkiocg_lookup_group() is eating
    6% extra cpu. Looking at the code I can't tell why the overhead of
    this function is so high. One thing is that it is called with very high
    frequency (once for every IO).

    o For a lot of folks the blkio controller will be compiled in but they
    might not have actually created cgroups. Hence optimize the root cgroup
    case, where we can avoid calling blkiocg_lookup_group() if IO is
    happening in the root group (the common case).

    Reported-by: Jeff Moyer
    Signed-off-by: Vivek Goyal
    Acked-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Vivek Goyal
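    A minimal sketch of the common-case shortcut described above (helper
    names are illustrative):

        /* blk-throttle.c: IO from the root cgroup - the common case when
         * the controller is compiled in but no cgroups have been created -
         * can use the cached root group and skip the per-bio lookup.     */
        static struct throtl_grp *throtl_get_tg(struct throtl_data *td,
                                                struct blkio_cgroup *blkcg)
        {
                if (blkcg == &blkio_root_cgroup)
                        return td->root_tg;   /* no blkiocg_lookup_group() */

                return throtl_find_alloc_tg(td, blkcg);  /* slow path */
        }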
     
  • o Rename a function to give it a more appropriate name. We are calculating
    the cfq queue slice, but the function name gives the impression that the
    cfq group slice length is being calculated.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

14 Jan, 2011

3 commits

  • If a queue is preempted before it gets a slice assigned, the queue doesn't
    get compensation, which looks unfair. For such a queue, we compensate it
    for a whole slice.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • I got this:
    fio-874 [007] 2157.724514: 8,32 m N cfq874 preempt
    fio-874 [007] 2157.724519: 8,32 m N cfq830 slice expired t=1
    fio-874 [007] 2157.724520: 8,32 m N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
    fio-874 [007] 2157.724521: 8,32 m N cfq830 set_active wl_prio:0 wl_type:0
    fio-874 [007] 2157.724522: 8,32 m N cfq830 Not idling. st->count:1

    cfq830 is an async queue, and is preempted by a sync queue, cfq874. But
    since we have the cfqg->saved_workload_slice mechanism, the preempt is a
    nop. It looks like our preempt is currently totally broken if the two
    queues are not from the same workload type.
    The patch below fixes it. This might lead to async queue starvation, but
    that is what our old code did before cgroup support was added.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_OR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     

13 Jan, 2011

1 commit


07 Jan, 2011

3 commits


05 Jan, 2011

1 commit

  • /proc/diskstats would display a strange output as follows.

    $ cat /proc/diskstats |grep sda
    8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
    8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
    ~~~~~~~~~~
    8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
    8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92
    8 4 sda4 4 0 8 0 0 0 0 0 0 0 0
    8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137

    The reason is the wrong accounting of hd_struct->in_flight when a bio is
    merged into a request belonging to a different partition by
    ELEVATOR_FRONT_MERGE.

    The detailed root cause is as follows.

    Assuming that there are two partitions, sda1 and sda2.

    1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
    is 0 and sda2's one is 1.

    | hd_struct->in_flight
    ---------------------------
    sda1 | 0
    sda2 | 1
    ---------------------------

    2. A bio belonging to sda1 is issued and is merged into the request
    mentioned in step 1 by ELEVATOR_FRONT_MERGE. The first sector of the
    request is changed from the sda2 region to the sda1 region. However, the
    two partitions' hd_struct->in_flight are not changed.

    | hd_struct->in_flight
    ---------------------------
    sda1 | 0
    sda2 | 1
    ---------------------------

    3. The request is finished and blk_account_io_done() is called. In this
    case, sda1's hd_struct->in_flight, not sda2's, is decremented.

    | hd_struct->in_flight
    ---------------------------
    sda1 | -1
    sda2 | 1
    ---------------------------

    The patch fixes the problem by caching the partition lookup
    inside the request structure, hence making sure that the increment
    and decrement will always happen on the same partition struct. This
    also speeds up IO with accounting enabled, since it cuts down on
    the number of lookups we have to do.

    Also add a refcount to struct hd_struct to keep the partition in
    memory as long as users exist. We use kref_test_and_get() to ensure
    we don't add a reference to a partition which is going away.

    Signed-off-by: Jerome Marchand
    Signed-off-by: Yasuaki Ishimatsu
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Jerome Marchand
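    A minimal sketch of the caching described above (helper names follow the
    existing accounting code; the reference counting mentioned in the entry
    is omitted for brevity, and the bodies are illustrative). The partition
    is looked up once when accounting starts, and both the increment and the
    decrement use that same cached pointer:

        /* blk-core.c */
        static void blk_account_io_start(struct request *rq, struct bio *bio)
        {
                int rw = rq_data_dir(rq);
                struct hd_struct *part;

                part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
                rq->part = part;          /* cache the accounted partition */
                part_inc_in_flight(part, rw);
        }

        static void blk_account_io_done(struct request *rq)
        {
                /* decrement exactly the partition we incremented, even if a
                 * front merge later moved the request's start sector into a
                 * different partition                                       */
                part_dec_in_flight(rq->part, rq_data_dir(rq));
        }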
     

03 Jan, 2011

1 commit

  • kblockd is used for unplugging and may affect IO latency and
    throughput, and the max number of concurrent work items is bounded by
    the number of block devices. Make it a HIGHPRI workqueue w/ default max
    concurrency.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
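    A minimal sketch of the conversion described above (the flag combination
    is illustrative):

        /* blk-core.c: high priority, default (unbounded) max_active */
        kblockd_workqueue = alloc_workqueue("kblockd",
                                            WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);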
     

23 Dec, 2010

1 commit


21 Dec, 2010

2 commits

  • This patch fixes a spelling error in a source code comment and removes
    superfluous braces in the function exit_io_context().

    Signed-off-by: Bart Van Assche
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cciss: fix cciss_revalidate panic
    block: max hardware sectors limit wrapper
    block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
    blk-throttle: Correct the placement of smp_rmb()
    blk-throttle: Trim/adjust slice_end once a bio has been dispatched
    block: check for proper length of iov entries earlier in blk_rq_map_user_iov()
    drbd: fix for spin_lock_irqsave in endio callback
    drbd: don't recvmsg with zero length

    Linus Torvalds
     

17 Dec, 2010

8 commits

  • The major/minor device numbers are always defined and used as `unsigned'.

    Signed-off-by: Yang Zhang
    Signed-off-by: Jens Axboe

    Yang Zhang
     
  • Signed-off-by: Yang Zhang
    Signed-off-by: Jens Axboe

    Yang Zhang
     
  • When cfq_choose_cfqg() is called in select_queue(), there must be at least
    one backlogged CFQ queue waiting to be dispatched, hence there must be at
    least one backlogged CFQ group on the service tree. So we never call
    choose_service_tree() with cfqg == NULL.

    Signed-off-by: Gui Jianfeng
    Reviewed-by: Jeff Moyer
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Gui Jianfeng
     
  • Implement blk_limits_max_hw_sectors() and make
    blk_queue_max_hw_sectors() a wrapper around it.

    DM needs this to avoid setting queue_limits' max_hw_sectors and
    max_sectors directly. dm_set_device_limits() now leverages
    blk_limits_max_hw_sectors() logic to establish the appropriate
    max_hw_sectors minimum (PAGE_SIZE). Fixes issue where DM was
    incorrectly setting max_sectors rather than max_hw_sectors (which
    caused dm_merge_bvec()'s max_hw_sectors check to be ineffective).

    Signed-off-by: Mike Snitzer
    Cc: stable@kernel.org
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer
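    A minimal sketch of the wrapper split described above (the minimum
    enforcement mirrors the existing blk_queue_max_hw_sectors() logic;
    details are illustrative):

        /* block/blk-settings.c: operate on bare queue_limits so stacking
         * drivers such as DM can reuse the logic before a request_queue
         * exists.                                                        */
        void blk_limits_max_hw_sectors(struct queue_limits *limits,
                                       unsigned int max_hw_sectors)
        {
                if ((max_hw_sectors << 9) < PAGE_CACHE_SIZE) {
                        max_hw_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
                        printk(KERN_INFO "%s: set to minimum %d\n",
                               __func__, max_hw_sectors);
                }

                limits->max_hw_sectors = max_hw_sectors;
                limits->max_sectors = min_t(unsigned int, max_hw_sectors,
                                            BLK_DEF_MAX_SECTORS);
        }

        void blk_queue_max_hw_sectors(struct request_queue *q,
                                      unsigned int max_hw_sectors)
        {
                blk_limits_max_hw_sectors(&q->limits, max_hw_sectors);
        }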
     
  • When stacking devices, a request_queue is not always available. This
    forced us to have a no_cluster flag in the queue_limits that could be
    used as a carrier until the request_queue had been set up for a
    metadevice.

    There were several problems with that approach. First of all it was up
    to the stacking device to remember to set queue flag after stacking had
    completed. Also, the queue flag and the queue limits had to be kept in
    sync at all times. We got that wrong, which could lead to us issuing
    commands that went beyond the max scatterlist limit set by the driver.

    The proper fix is to avoid having two flags for tracking the same thing.
    We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
    block layer merging functions. The queue_limit 'no_cluster' is turned
    into 'cluster' to avoid double negatives and to ease stacking.
    Clustering defaults to being enabled as before. The queue flag logic is
    removed from the stacking function, and explicitly setting the cluster
    flag is no longer necessary in DM and MD.

    Reported-by: Ed Lin
    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Martin K. Petersen
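    A minimal sketch of reading the limit directly (the helper matches the
    description above; its exact home is illustrative):

        /* include/linux/blkdev.h: merge code asks the queue_limits instead
         * of testing the deprecated QUEUE_FLAG_CLUSTER.                   */
        static inline unsigned int blk_queue_cluster(struct request_queue *q)
        {
                return q->limits.cluster;
        }

    The merging functions in blk-merge.c then call blk_queue_cluster(q)
    where they previously tested the queue flag, and stacking only has to
    propagate the single 'cluster' limit.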
     
  • Currently, media presence polling for removable block devices is done
    from userland. There are several issues with this.

    * Polling is done by periodically opening the device. For SCSI
    devices, the command sequence generated by such an action involves a
    few different commands including TEST_UNIT_READY. This behavior,
    while perfectly legal, is different from Windows, which only issues a
    single command, GET_EVENT_STATUS_NOTIFICATION. Unfortunately, some
    ATAPI devices lock up after being periodically queried with such
    command sequences.

    * There is no reliable and unintrusive way for a userland program to
    tell whether the target device is safe for media presence polling.
    For example, polling for media presence during an on-going burning
    session can make it fail. The polling program can avoid this by
    opening the device with O_EXCL but then it risks making a valid
    exclusive user of the device fail w/ -EBUSY.

    * Userland polling is unnecessarily heavy and in-kernel implementation
    is lighter and better coordinated (workqueue, timer slack).

    This patch implements framework for in-kernel disk event handling,
    which includes media presence polling.

    * bdops->check_events() is added, which supersedes ->media_changed().
    It should check whether there's any pending event and return it if so.
    Currently, two events are defined - DISK_EVENT_MEDIA_CHANGE and
    DISK_EVENT_EJECT_REQUEST. ->check_events() is guaranteed not to be
    called in parallel.

    * gendisk->events and ->async_events are added. These should be
    initialized by block driver before passing the device to add_disk().
    The former contains the mask of all supported events and the latter
    the mask of all events which the device can report without polling.
    /sys/block/*/events[_async] export these to userland.

    * Kernel parameter block.events_dfl_poll_msecs controls the system
    polling interval (default is 0 which means disable) and
    /sys/block/*/events_poll_msecs control polling intervals for
    individual devices (default is -1 meaning use system setting). Note
    that if a device can report all supported events asynchronously and
    its polling interval isn't explicitly set, the device won't be
    polled regardless of the system polling interval.

    * If a device is opened exclusively with write access, event checking
    is automatically disabled until all write exclusive accesses are
    released.

    * There are event 'clearing' events. For example, both of currently
    defined events are cleared after the device has been successfully
    opened. This information is passed to ->check_events() callback
    using @clearing argument as a hint.

    * Event checking is always performed from system_nrt_wq and timer
    slack is set to 25% for polling.

    * Nothing changes for drivers which implement ->media_changed() but
    not ->check_events(). Going forward, all drivers will be converted
    to ->check_events() and ->media_changed() will be dropped.

    Signed-off-by: Tejun Heo
    Cc: Kay Sievers
    Cc: Jan Kara
    Signed-off-by: Jens Axboe

    Tejun Heo
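    A minimal sketch of what a driver-side conversion looks like under the
    framework described above (the driver, its helper and the probe-path
    assignments are hypothetical):

        /* A removable-media driver advertising media-change events. */
        static unsigned int foo_check_events(struct gendisk *disk,
                                             unsigned int clearing)
        {
                unsigned int events = 0;

                /* driver-specific, non-intrusive presence check */
                if (foo_media_changed(disk->private_data))
                        events |= DISK_EVENT_MEDIA_CHANGE;

                /* 'clearing' hints which pending events were consumed */
                return events;
        }

        static const struct block_device_operations foo_fops = {
                .owner        = THIS_MODULE,
                .check_events = foo_check_events,
        };

        /* in the probe path, before add_disk(): */
        disk->fops         = &foo_fops;
        disk->events       = DISK_EVENT_MEDIA_CHANGE;   /* supported        */
        disk->async_events = 0;       /* nothing reported without polling */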
     
  • There's no reason for register_disk() and del_gendisk() to be in
    fs/partitions/check.c. Move both to genhd.c. While at it, collapse
    unlink_gendisk(), which was artificially in a separate function due to
    genhd.c / check.c split, into del_gendisk().

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • There's no user of the facility. Kill it.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

13 Dec, 2010

1 commit


09 Dec, 2010

1 commit

  • This patch corrects an issue in bsg that results in a general protection
    fault if an LLD is removed while an application is using an open file
    handle to a bsg device, and the application issues an ioctl. The fault
    occurs because the class_dev is NULL, having been cleared in
    bsg_unregister_queue() when the driver was removed. With this
    patch, a check is made for the class_dev, and the application
    will receive ENXIO if the related object is gone.

    Signed-off-by: Carl Lajeunesse
    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James Smart
     

02 Dec, 2010

2 commits

  • o I was discussing which variables are being updated without a spin lock
    and why we need barriers, and Oleg pointed out that the smp_rmb() should
    be located between the read of td->limits_changed and the read of
    tg->limits_changed. This patch fixes it.

    o Following is one possible sequence of events. Say cpu0 is executing
    throtl_update_blkio_group_read_bps() and cpu1 is executing
    throtl_process_limit_change().

    cpu0                                    cpu1
    ----                                    ----
    tg->limits_changed = true;
    smp_mb__before_atomic_inc();
    atomic_inc(&td->limits_changed);

                                            if (!atomic_read(&td->limits_changed))
                                                    return;

                                            if (tg->limits_changed)
                                                    do_something;

    If cpu0 has updated tg->limits_changed and td->limits_changed, we want to
    make sure that if the update to td->limits_changed is visible on cpu1,
    then the update to tg->limits_changed is also visible.

    Oleg pointed out that to ensure this we need to insert an smp_rmb()
    between the td->limits_changed read and the tg->limits_changed read.

    o I had erroneously put smp_rmb() before atomic_read(&td->limits_changed).
    This patch fixes it.

    Reported-by: Oleg Nesterov
    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
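    A minimal sketch of the corrected reader side (the group iteration and
    per-group update helpers are illustrative):

        /* blk-throttle.c, throtl_process_limit_change() (sketch): the rmb
         * must sit between the two reads so that seeing the
         * td->limits_changed increment guarantees we also see the matching
         * tg->limits_changed update made before it.                       */
        if (!atomic_read(&td->limits_changed))
                return;

        smp_rmb();   /* pairs with smp_mb__before_atomic_inc() on the writer */

        throtl_for_each_tg(td, tg) {            /* illustrative iterator */
                if (tg->limits_changed)
                        throtl_update_tg_limits(td, tg);
        }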
     
  • o During some testing I did the following and noticed that throttling
    stops working.

    - Put a very low limit on a cgroup, say 1 byte per second.
    - Start some reads, this will set slice_end to a very high value.
    - Change the limit to higher value say 1MB/s
    - Now IO unthrottles and finishes as expected.
    - Try to do the read again but IO is not limited to 1MB/s as expected.

    o What is happening.
    - The initially low value of the limit sets slice_end to a very high value.
    - While updating the limit, slice_end is not truncated.
    - The very high value of slice_end keeps the existing slice valid for a
    very long time and a new slice does not start.
    - tg_may_dispatch() is called in blk_throtl_bio(), and trim_slice()
    is not called in this path. So slice_start is some old value and
    practically we are able to do a huge amount of IO.

    o There are many ways it can be fixed. I have fixed it by adjusting/cleaning
    up slice_end in trim_slice(). Generally we extend slices if a bio is big
    and can't be dispatched in one slice. After dispatching a bio, readjust
    slice_end to make sure we don't end up with huge values.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
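    A minimal sketch of trimming slice_end after a dispatch (field and
    constant names follow blk-throttle; the exact body is illustrative):

        /* blk-throttle.c, inside throtl_trim_slice() (sketch): a bio has
         * been dispatched, so pull an over-extended slice_end back. A huge,
         * stale slice_end set up under an old, very low limit would
         * otherwise keep a new slice from starting.                       */
        if (time_after(tg->slice_end[rw], jiffies + throtl_slice))
                tg->slice_end[rw] = jiffies + throtl_slice;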
     

01 Dec, 2010

2 commits


29 Nov, 2010

1 commit


28 Nov, 2010

1 commit