15 Jan, 2012

2 commits

  • Linux allows executing the SG_IO ioctl on a partition or LVM volume, and
    will pass the command to the underlying block device. This is
    well-known, but it is also a large security problem when (via Unix
    permissions, ACLs, SELinux or a combination thereof) a program or user
    needs to be granted access only to part of the disk.

    This patch lets partitions forward a small set of harmless ioctls;
    others are logged with printk so that we can see which ioctls are
    actually sent. In my tests only CDROM_GET_CAPABILITY actually occurred.
    Of course it was being sent to a (partition on a) hard disk, so it would
    have failed with ENOTTY and the patch isn't changing anything in
    practice. Still, I'm treating it specially to avoid spamming the logs.

    In principle, this restriction should include programs running with
    CAP_SYS_RAWIO. If for example I let a program access /dev/sda2 and
    /dev/sdb, it still should not be able to read/write outside the
    boundaries of /dev/sda2, independent of its capabilities. For now,
    however, programs with CAP_SYS_RAWIO will still be allowed to send the
    ioctls; their actions will still be logged.

    This patch does not affect the non-libata IDE driver. However, that
    driver already tests for bd != bd->bd_contains before issuing some
    ioctls; it could be restricted further to forbid these ioctls even for
    programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO. (A sketch of the
    whitelisting idea follows this entry.)

    Cc: linux-scsi@vger.kernel.org
    Cc: Jens Axboe
    Cc: James Bottomley
    Signed-off-by: Paolo Bonzini
    [ Make it also print the command name when warning - Linus ]
    Signed-off-by: Linus Torvalds

    Paolo Bonzini
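
    For illustration, a minimal sketch of the whitelisting idea described
    above. This is not the actual patch: sketch_verify_command() is a
    hypothetical name, and only the general shape (whole-device passthrough,
    a small whitelist, quiet failure for CDROM_GET_CAPABILITY, logging plus
    CAP_SYS_RAWIO passthrough for the rest) follows the description.

        #include <linux/blkdev.h>
        #include <linux/capability.h>
        #include <linux/cdrom.h>
        #include <linux/errno.h>
        #include <linux/printk.h>
        #include <scsi/scsi_ioctl.h>
        #include <scsi/sg.h>

        static int sketch_verify_command(struct block_device *bd, unsigned int cmd)
        {
                if (bd == bd->bd_contains)
                        return 0;               /* whole device: allow everything */

                switch (cmd) {
                case SG_GET_VERSION_NUM:
                case SCSI_IOCTL_GET_IDLUN:
                case SCSI_IOCTL_GET_BUS_NUMBER:
                        return 0;               /* harmless; forward to the device */
                case CDROM_GET_CAPABILITY:
                        /* common but pointless on disks; fail without log spam */
                        return -ENOIOCTLCMD;
                default:
                        printk_ratelimited(KERN_WARNING
                                "ioctl 0x%x not permitted on a partition\n", cmd);
                        if (capable(CAP_SYS_RAWIO))
                                return 0;       /* still allowed, but logged */
                        return -ENOIOCTLCMD;
                }
        }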
     
  • Introduce a wrapper around scsi_cmd_ioctl that takes a block device.

    The function will then be enhanced to detect partition block devices
    and, in that case, subject the ioctls to whitelisting (a sketch of the
    wrapper's shape follows this entry).

    Cc: linux-scsi@vger.kernel.org
    Cc: Jens Axboe
    Cc: James Bottomley
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Linus Torvalds

    Paolo Bonzini
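
    A sketch of the wrapper's likely shape, assuming it is named along the
    lines of scsi_cmd_blk_ioctl() and simply derives the queue and disk from
    the block device before delegating (the body is illustrative, not quoted
    from the patch):

        #include <linux/blkdev.h>

        int scsi_cmd_blk_ioctl(struct block_device *bd, fmode_t mode,
                               unsigned int cmd, void __user *arg)
        {
                /* later extended to whitelist ioctls when bd is a partition */
                return scsi_cmd_ioctl(bd->bd_disk->queue, bd->bd_disk,
                                      mode, cmd, arg);
        }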
     

23 Nov, 2011

1 commit

  • struct request_queue is allocated with __GFP_ZERO so its "node" field is
    zero before initialization. This causes an oops if node 0 is offline in
    the page allocator because its zonelists are not initialized. From Dave
    Young's dmesg:

    SRAT: Node 1 PXM 2 0-d0000000
    SRAT: Node 1 PXM 2 100000000-330000000
    SRAT: Node 0 PXM 1 330000000-630000000
    Initmem setup node 1 0000000000000000-000000000affb000
    ...
    Built 1 zonelists in Node order, mobility grouping on.
    ...
    BUG: unable to handle kernel paging request at 0000000000001c08
    IP: [] __alloc_pages_nodemask+0xb5/0x870

    and __alloc_pages_nodemask+0xb5 translates to a NULL pointer on
    zonelist->_zonerefs.

    The fix is to initialize q->node at the time of allocation so the correct
    node is passed to the slab allocator later.

    Since blk_init_allocated_queue_node() is no longer needed, merge it
    with blk_init_allocated_queue(). (A sketch of the allocation-time fix
    follows this entry.)

    [rientjes@google.com: changelog, initializing q->node]
    Cc: stable@vger.kernel.org [2.6.37+]
    Reported-by: Dave Young
    Signed-off-by: Mike Snitzer
    Signed-off-by: David Rientjes
    Tested-by: Dave Young
    Signed-off-by: Jens Axboe

    Mike Snitzer
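
    Illustratively, the fix amounts to recording the node at allocation
    time; an abbreviated sketch along the lines of blk_alloc_queue_node(),
    not the verbatim patch:

        struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
        {
                struct request_queue *q;

                q = kmem_cache_alloc_node(blk_requestq_cachep,
                                          gfp_mask | __GFP_ZERO, node_id);
                if (!q)
                        return NULL;

                /* don't leave q->node as node 0 from __GFP_ZERO */
                q->node = node_id;
                ...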
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of <string.h>
    net: inet_timewait_sock doesnt need <module.h>
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

01 Nov, 2011

1 commit

  • The <linux/module.h> header pretty much brings in the kitchen sink
    along with it, so it should be avoided wherever reasonably possible in
    terms of being included from other commonly used <linux/something.h>
    files, as it results in a measurable increase in compile times.

    The worst culprit was probably device.h since it is used everywhere.
    This file also had an implicit dependency/usage of mutex.h which was
    masked by module.h, and is also fixed here at the same time.

    There are over a dozen other headers that simply forward-declare the
    struct instead of pulling in the whole file, so follow their lead and
    simply make it a few more (the pattern is sketched after this entry).

    Most of the implicit dependencies on module.h being present by
    these headers pulling it in have been now weeded out, so we can
    finally make this change with hopefully minimal breakage.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
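
    The pattern in question, as a generic example: a header that only stores
    a pointer can forward-declare the struct instead of including
    <linux/module.h> (some_subsys.h and its contents are hypothetical):

        /* some_subsys.h */
        #ifndef _SOME_SUBSYS_H
        #define _SOME_SUBSYS_H

        struct module;                  /* forward declaration, no module.h */

        struct some_subsys_driver {
                struct module *owner;   /* a pointer needs no struct size */
        };

        #endif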
     

19 Oct, 2011

2 commits


21 Sep, 2011

1 commit

  • Thus spake Andrew Morton:

    "And I have the usual maintainability whine. If someone comes up to
    vmscan.c and sees it calling blk_start_plug(), how are they supposed to
    work out why that call is there? They go look at the blk_start_plug()
    definition and it is undocumented. I think we can do better than this?"

    Adapted from the LWN article (http://lwn.net/Articles/438256/) by Jens
    Axboe and from an earlier attempt by Shaohua Li to document blk-plug.
    (A usage sketch of the plugging API follows this entry.)

    [akpm@linux-foundation.org: grammatical and spelling tweaks]
    Signed-off-by: Suresh Jayaraman
    Cc: Shaohua Li
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Suresh Jayaraman
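
    For reference, a minimal sketch of how the documented API is used by a
    caller (submit_many() and its arguments are hypothetical; submit_bio()
    took an rw argument in this era):

        #include <linux/bio.h>
        #include <linux/blkdev.h>

        static void submit_many(struct bio **bios, int nr)
        {
                struct blk_plug plug;
                int i;

                blk_start_plug(&plug);  /* requests gather on the task's plug list */
                for (i = 0; i < nr; i++)
                        submit_bio(READ, bios[i]);
                blk_finish_plug(&plug); /* flush the batch to the queue */
        }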
     

12 Sep, 2011

3 commits


24 Aug, 2011

1 commit

  • Clean up the code a little. attempt_plug_merge() traverses the plug
    list anyway, so we can do the request counting there; that also reduces
    stack size a bit.
    The motivation is that I suspected we might need to count requests per
    queue (a task could be handling multiple disks in the meantime), but my
    tests don't show that it's worth doing. If somebody proves we should do
    it, the change below will make that easier (a sketch follows this
    entry).

    Signed-off-by: Shaohua Li
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
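
    A sketch of the idea: the counting rides along with the plug-list walk
    that attempt_plug_merge() performs anyway (illustrative shape only, not
    the verbatim patch):

        /* caller has already checked that current->plug is non-NULL */
        static bool attempt_plug_merge(struct request_queue *q, struct bio *bio,
                                       unsigned int *request_count)
        {
                struct blk_plug *plug = current->plug;
                struct request *rq;

                *request_count = 0;
                list_for_each_entry_reverse(rq, &plug->list, queuelist) {
                        (*request_count)++;     /* count while scanning */
                        if (rq->q == q && bio_attempt_back_merge(q, rq, bio))
                                return true;
                }
                return false;
        }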
     

16 Aug, 2011

1 commit

  • Commit ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae ("block: reimplement
    FLUSH/FUA to support merge") introduced a performance regression when
    running any sort of fsyncing workload using dm-multipath and certain
    storage (in our case, an HP EVA). The test I ran was fs_mark, and it
    dropped from ~800 files/sec on ext4 to ~100 files/sec. It turns out
    that dm-multipath always advertised flush+fua support and passed
    commands on down the stack, where those flags used to get stripped off.
    The above commit changed that behavior:

    static inline struct request *__elv_next_request(struct request_queue *q)
    {
            struct request *rq;

            while (1) {
    -               while (!list_empty(&q->queue_head)) {
    +               if (!list_empty(&q->queue_head)) {
                            rq = list_entry_rq(q->queue_head.next);
    -                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
    -                           (rq->cmd_flags & REQ_FLUSH_SEQ))
    -                               return rq;
    -                       rq = blk_do_flush(q, rq);
    -                       if (rq)
    -                               return rq;
    +                       return rq;
                    }

    Note that previously, a command would come in here, have
    REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:

    struct request *blk_do_flush(struct request_queue *q, struct request *rq)
    {
            unsigned int fflags = q->flush_flags; /* may change, cache it */
            bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
            bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
            bool do_postflush = has_flush && !has_fua &&
                                (rq->cmd_flags & REQ_FUA);
            unsigned skip = 0;
            ...
            if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                    rq->cmd_flags &= ~REQ_FLUSH;
                    if (!has_fua)
                            rq->cmd_flags &= ~REQ_FUA;
                    return rq;
            }

    So, the flush machinery was bypassed in such cases (q->flush_flags == 0
    && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).

    Now, however, we don't get into the flush machinery at all. Instead,
    __elv_next_request just hands a request with flush and fua bits set to
    the scsi_request_fn, even if the underlying request_queue does not
    support flush or fua.

    The agreed upon approach is to fix the flush machinery to allow
    stacking. While this isn't used in practice (since there is only one
    request-based dm target, and that target will now reflect the flush
    flags of the underlying device), it does future-proof the solution, and
    make it function as designed.

    In order to make this work, I had to add a field to the struct request,
    inside the flush structure (to store the original req->end_io). Shaohua
    had suggested overloading the union with rb_node and completion_data,
    but the completion data is used by device mapper and can also be used by
    other drivers. So, I didn't see a way around the additional field.

    I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
    the lost performance. Comments and other testers, as always, are
    appreciated.

    Cheers,
    Jeff

    Signed-off-by: Jeff Moyer
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

01 Aug, 2011

1 commit

  • This moves the FC class's bsg code to the block layer and makes it a
    library so that other classes like iSCSI and SAS can use it.

    It is helpful because working with the request queue, bios, creating
    scatterlists, etc. is a pain that the LLD does not have to worry about
    for normal IOs and should not have to worry about for bsg requests.

    Signed-off-by: Mike Christie
    Signed-off-by: Jens Axboe

    Mike Christie
     

24 Jul, 2011

1 commit

  • Some systems benefit from completions always being steered to the strict
    requester cpu rather than the looser "per-socket" steering that
    blk_cpu_to_group() attempts by default. This is because the first
    CPU in the group mask ends up completely overloaded with work, while
    the others (including the original submitter) have power left to
    spare.

    Allow the strict mode to be set by writing '2' to the sysfs control
    file. This is identical to the scheme used for the nomerges file,
    where '2' is a more aggressive setting than just being turned on.

    echo 2 > /sys/block/<dev>/queue/rq_affinity

    Cc: Christoph Hellwig
    Cc: Roland Dreier
    Tested-by: Dave Jiang
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Dan Williams
     

14 Jul, 2011

1 commit

  • Reorder request_queue to remove 16 bytes of alignment padding in 64-bit
    builds.

    On my config this shrinks the structure from 1608 to 1592 bytes, so it
    needs one fewer cacheline.

    Also, trivially move the open bracket { onto the same line as the
    structure name to make it easier to grep. (A generic illustration of
    the padding technique follows this entry.)

    Signed-off-by: Richard Kennedy
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Richard Kennedy
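
    The general technique, shown on a generic example rather than
    request_queue itself: on 64-bit, grouping the smaller members removes
    alignment holes.

        /* 24 bytes on x86-64: 4 bytes of padding after each int,
         * because the pointer must be 8-byte aligned. */
        struct padded {
                int a;
                void *p;
                int b;
        };

        /* 16 bytes: the two ints now share one 8-byte slot. */
        struct reordered {
                void *p;
                int a;
                int b;
        };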
     

08 Jul, 2011

2 commits

  • I was often confused about why preemption is not disabled when changing
    the blk_plug list. Add comments here in case others have similar
    concerns.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • When I tested an fio script with a big I/O depth, I found that total
    throughput drops compared to a relatively small I/O depth. The reason is
    that the thread accumulates big requests in its plug list, which causes
    some delays (surely this depends on CPU speed).
    We'd be better off with a threshold on the request count: once it is
    reached, there is no request merging, and queue lock contention isn't
    severe when pushing per-task requests to the queue, so the main
    advantages of block plugging no longer apply and we can force a flush of
    the plug list (a sketch follows this entry).
    With this, my test throughput actually increases and is almost equal to
    that of the small I/O depth. Another side effect is that irq-off time
    decreases in blk_flush_plug_list() for big I/O depth.
    BLK_MAX_REQUEST_COUNT is chosen arbitrarily; 16 is effective at reducing
    lock contention for me, but I'm open here, 32 is OK in my tests too.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
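
    A sketch of the resulting behavior in the plugged submission path (the
    constant matches the description; the surrounding fragment is
    illustrative):

        #define BLK_MAX_REQUEST_COUNT 16

                /* after linking the new request into the plug list: */
                list_add_tail(&req->queuelist, &plug->list);
                if (++request_count >= BLK_MAX_REQUEST_COUNT)
                        blk_flush_plug_list(plug, false);   /* cap the batch */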
     

01 Jul, 2011

1 commit


13 Jun, 2011

1 commit


31 May, 2011

1 commit

  • Since those macro-defined functions require an additional semicolon
    from the caller, they could cause syntax errors when used in if-else
    statements (a generic illustration follows this entry).

    Signed-off-by: Namhyung Kim
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Namhyung Kim
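
    The classic hazard, as a generic illustration (FOO_BROKEN/FOO_FIXED and
    the helpers are made-up names): a macro whose expansion already ends in
    a semicolon leaves an empty extra statement behind, detaching a
    following else.

        #define FOO_BROKEN(q)   do_foo(q);      /* trailing ';' in expansion */
        #define FOO_FIXED(q)    do { do_foo(q); } while (0)

        if (cond)
                FOO_FIXED(q);   /* ok: do-while swallows the caller's ';' */
        else                    /* "else without if" error with FOO_BROKEN */
                do_bar(q);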
     

21 May, 2011

1 commit

  • Since for-2.6.40/core was forked off the 2.6.39 devel tree, we've had
    churn in the core area that makes it difficult to handle patches for
    e.g. cfq or blk-throttle. Instead of requiring that they be based on
    older versions with bugs that have since been fixed in the -rc cycle,
    merge in 2.6.39 final.

    Also fixes up conflicts in the below files.

    Conflicts:
    drivers/block/paride/pcd.c
    drivers/cdrom/viocd.c
    drivers/ide/ide-cd.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     

18 May, 2011

1 commit

  • In some cases we would end up stacking discard_zeroes_data incorrectly.
    Fix this by enabling the feature by default for stacking drivers and
    clearing it for low-level drivers. Incorporating a device that does not
    support dzd will then cause the feature to be disabled in the stacking
    driver.

    Also ensure that the maximum discard value does not overflow when
    exported in sysfs, and return 0 in the alignment and dzd fields for
    devices that don't support discard. (A sketch of the stacking rule
    follows this entry.)

    Reported-by: Lukas Czerner
    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Martin K. Petersen
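
    A sketch of the stacking rule: the stacked driver starts with the
    feature enabled and each incorporated bottom device ANDs its own
    capability in, so a single non-dzd component disables it for the whole
    stack (sketch_stack_dzd() is hypothetical; this kind of combining lives
    in blk_stack_limits()):

        static void sketch_stack_dzd(struct queue_limits *t,        /* stacked */
                                     const struct queue_limits *b)  /* bottom */
        {
                t->discard_zeroes_data &= b->discard_zeroes_data;
        }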
     

07 May, 2011

2 commits

  • In some drives, flush requests are non-queueable: while a flush request
    is running, normal read/write requests can't run. If the block layer
    dispatches such a request, the driver can't handle it and requeues it.
    Tejun suggested we hold the queue while a flush is running, which avoids
    the unnecessary requeue and can also improve performance. For example,
    given the requests flush1, write1, flush2: flush1 is dispatched, then
    the queue is held, so write1 isn't inserted into the queue. After flush1
    finishes, flush2 is dispatched; since the disk cache is already clean,
    flush2 finishes very soon, so it looks as if flush2 were folded into
    flush1.

    In my test, the queue holding completely solves a regression introduced by
    commit 53d63e6b0dfb95882ec0219ba6bbd50cde423794:

    block: make the flush insertion use the tail of the dispatch list

    It's not a preempt type request, in fact we have to insert it
    behind requests that do specify INSERT_FRONT.

    which causes about 20% regression running a sysbench fileio
    workload.

    Stable: 2.6.39 only

    Cc: stable@kernel.org
    Signed-off-by: Shaohua Li
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    shaohua.li@intel.com
     
  • Flush requests aren't queueable in some drives. Add a flag to let the
    driver notify the block layer about this; we can optimize flush
    performance with that knowledge (a usage sketch follows this entry).

    Stable: 2.6.39 only

    Cc: stable@kernel.org
    Signed-off-by: Shaohua Li
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    shaohua.li@intel.com
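
    From a driver's point of view, the flag would be set with the helper
    this patch adds, assuming it follows the usual blk_queue_* pattern:

        /* in the driver's probe/init path, for a drive that cannot
         * queue a flush alongside other I/O: */
        blk_queue_flush_queueable(q, false);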
     

19 Apr, 2011

1 commit

  • We are currently using this flag to check whether it's safe
    to call into ->request_fn(). If it is set, we punt to kblockd.
    But we get a lot of false positives and excessive punts to
    kblockd, which hurts performance.

    The only real abuser of this infrastructure is SCSI. So export the
    async queue run and convert SCSI over to using it. There's room for
    improvement in that SCSI need not always use the async call, but this
    fixes our performance issue and they can fix that up in due time.
    (A usage sketch follows this entry.)

    Signed-off-by: Jens Axboe

    Jens Axboe
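
    A sketch of the exported helper's use: a path that must not call into
    ->request_fn() directly (for recursion or stack-depth reasons) punts the
    queue run to kblockd instead.

        /* e.g. from a completion or otherwise atomic context: */
        blk_run_queue_async(q);     /* kblockd runs the queue for us */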
     

18 Apr, 2011

3 commits


16 Apr, 2011

1 commit

  • Linus correctly observes that the most important dispatch cases
    are now done from kblockd, this isn't ideal for latency reasons.
    The original reason for switching dispatches out-of-line was to
    avoid too deep a stack, so by _only_ letting the "accidental"
    flush directly in schedule() be guarded by offload to kblockd,
    we should be able to get the best of both worlds.

    So add a blk_schedule_flush_plug() that offloads to kblockd, and only
    use that from the schedule() path (sketched after this entry).

    Signed-off-by: Jens Axboe

    Jens Axboe
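
    Roughly, the schedule() side then reduces to the following (a sketch of
    the caller in the scheduler):

        /* just before the task goes to sleep: */
        if (blk_needs_flush_plug(tsk))
                blk_schedule_flush_plug(tsk);   /* offload flush to kblockd */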
     

15 Apr, 2011

2 commits

  • For the explicit unplugging, we'd prefer to kick things off
    immediately and not pay the penalty of the latency to switch
    to kblockd. So let blk_finish_plug() do the run inline, while
    the implicit-on-schedule-out unplug will punt to kblockd.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's a bit of a mess currently. task->plug is being cleared
    and reset in __blk_finish_plug(), and blk_finish_plug() is
    testing for a NULL plug which cannot happen even from schedule()
    anymore since it uses blk_needs_flush_plug() to determine
    whether to call into this function at all.

    So get rid of some of the cruft.

    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

12 Apr, 2011

1 commit


06 Apr, 2011

1 commit

  • The current block integrity (DIF/DIX) support in DM is verifying that
    all devices' integrity profiles match during DM device resume (which
    is past the point of no return). To some degree that is unavoidable
    (stacked DM devices force this late checking). But for most DM
    devices (which aren't stacking on other DM devices) the ideal time to
    verify all integrity profiles match is during table load.

    Introduce the notion of an "initialized" integrity profile: a profile
    that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
    template. Add blk_integrity_is_initialized() to allow checking if a
    profile was initialized.

    Update DM integrity support to:
    - check that all devices with _initialized_ integrity profiles match
      during table load; uninitialized profiles (e.g. for the underlying DM
      device(s) of a stacked DM device) are ignored
    - disallow a table load that would result in an integrity profile that
      conflicts with a DM device's existing (in-use) integrity profile
    - avoid clearing an existing integrity profile
    - validate that all integrity profiles match during resume; if they
      don't, all we can do is report the mismatch (during resume we're past
      the point of no return)

    (A sketch of the table-load check follows this entry.)

    Signed-off-by: Mike Snitzer
    Cc: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer
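
    A sketch of the table-load check, assuming a DM-style iteration over the
    table's devices (dd and the surrounding loop are illustrative):

        /* during table load: ignore devices whose integrity profile was
         * never initialized, e.g. the underlying device of a DM stack */
        if (!blk_integrity_is_initialized(dd->dm_dev.bdev->bd_disk))
                continue;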
     

12 Mar, 2011

1 commit


10 Mar, 2011

4 commits

  • Conflicts:
    block/blk-core.c
    block/blk-flush.c
    drivers/md/raid1.c
    drivers/md/raid10.c
    drivers/md/raid5.c
    fs/nilfs2/btnode.c
    fs/nilfs2/mdt.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Code has been converted over to the new explicit on-stack plugging, and
    delay users have been converted to use the new API for that. So let's
    kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This patch adds support for creating a queuing context outside
    of the queue itself. This enables us to batch up pieces of IO
    before grabbing the block device queue lock and submitting them to
    the IO scheduler.

    The context is created on the stack of the process and assigned in
    the task structure, so that we can auto-unplug it if we hit a schedule
    event.

    The current queue plugging happens implicitly if IO is submitted to
    an empty device, yet callers have to remember to unplug that IO when
    they are going to wait for it. This is an ugly API and has caused bugs
    in the past. Additionally, it requires hacks in the vm (->sync_page()
    callback) to handle that logic. By switching to an explicit plugging
    scheme we make the API a lot nicer and can get rid of the ->sync_page()
    hack in the vm.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently we use plugging for that, but as plugging is going away,
    we need an alternative mechanism.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

05 Mar, 2011

1 commit

  • This merge creates two set of conflicts. One is simple context
    conflicts caused by removal of throtl_scheduled_delayed_work() in
    for-linus and removal of throtl_shutdown_timer_wq() in
    for-2.6.39/core.

    The other is caused by commit 255bb490c8 (block: blk-flush shouldn't
    call directly into q->request_fn() __blk_run_queue()) in for-linus
    clashing with the FLUSH reimplementation in for-2.6.39/core. The
    conflict isn't trivial, but the resolution is straightforward.

    * __blk_run_queue() calls in flush_end_io() and flush_data_end_io()
    should be called with @force_kblockd set to %true.

    * elv_insert() in blk_kick_flush() should use
    %ELEVATOR_INSERT_REQUEUE.

    Both changes are to avoid invoking ->request_fn() directly from
    request completion path and closely match the changes in the commit
    255bb490c8.

    Signed-off-by: Tejun Heo

    Tejun Heo