02 Aug, 2012

2 commits

  • Pull block driver changes from Jens Axboe:

    - Making the plugging support for drivers a bit more sane, from Neil.
    This supersedes the plugging change from Shaohua as well.

    - The usual round of drbd updates.

    - Using a tail add instead of a head add in the request completion for
    nbd, making us find the most completed request more quickly.

    - A few floppy changes, getting rid of a duplicated flag and also
    running the floppy init async (since it takes forever in boot terms),
    from Andi.

    * 'for-3.6/drivers' of git://git.kernel.dk/linux-block:
    floppy: remove duplicated flag FD_RAW_NEED_DISK
    blk: pass from_schedule to non-request unplug functions.
    block: stack unplug
    blk: centralize non-request unplug handling.
    md: remove plug_cnt feature of plugging.
    block/nbd: micro-optimization in nbd request completion
    drbd: announce FLUSH/FUA capability to upper layers
    drbd: fix max_bio_size to be unsigned
    drbd: flush drbd work queue before invalidate/invalidate remote
    drbd: fix potential access after free
    drbd: call local-io-error handler early
    drbd: do not reset rs_pending_cnt too early
    drbd: reset congestion information before reporting it in /proc/drbd
    drbd: report congestion if we are waiting for some userland callback
    drbd: differentiate between normal and forced detach
    drbd: cleanup, remove two unused global flags
    floppy: Run floppy initialization asynchronous

    Linus Torvalds
     
  • Pull core block IO bits from Jens Axboe:
    "The most complicated part if this is the request allocation rework by
    Tejun, which has been queued up for a long time and has been in
    for-next ditto as well.

    There are a few commits from yesterday and today, mostly trivial and
    obvious fixes. So I'm pretty confident that it is sound. It's also
    smaller than usual."

    * 'for-3.6/core' of git://git.kernel.dk/linux-block:
    block: remove dead func declaration
    block: add partition resize function to blkpg ioctl
    block: uninitialized ioc->nr_tasks triggers WARN_ON
    block: do not artificially constrain max_sectors for stacking drivers
    blkcg: implement per-blkg request allocation
    block: prepare for multiple request_lists
    block: add q->nr_rqs[] and move q->rq.elvpriv to q->nr_rqs_elvpriv
    blkcg: inline bio_blkcg() and friends
    block: allocate io_context upfront
    block: refactor get_request[_wait]()
    block: drop custom queue draining used by scsi_transport_{iscsi|fc}
    mempool: add @gfp_mask to mempool_create_node()
    blkcg: make root blkcg allocation use %GFP_KERNEL
    blkcg: __blkg_lookup_create() doesn't need radix preload

    Linus Torvalds
     

01 Aug, 2012

4 commits

  • The __generic_unplug_device() function was removed by commit
    7eaceaccab5f40bbfda044629a6298616aeaed50, which forgot to remove the
    declaration at the same time. Remove it here.

    Signed-off-by: Yuanhan Liu
    Signed-off-by: Jens Axboe

    Yuanhan Liu
     
  • Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
    allows altering the size of an existing partition, even if it is currently
    in use.

    This patch converts hd_struct->nr_sects into a sequence counter, because
    one might extend a partition while IO is happening to it, and the update
    of nr_sects can be non-atomic on 32-bit machines with a 64-bit sector_t.
    This can lead to issues like reading an inconsistent size for a partition.
    A sequence counter is used so that readers don't have to take the bdev
    mutex lock, as we call sector_in_part() very frequently.

    Now all access to hd_struct->nr_sects should happen using the sequence
    counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
    There is one exception though, set_capacity()/get_capacity(). I think
    theoretically a race should exist there too, but this patch does not
    modify set_capacity()/get_capacity() due to the sheer number of call sites,
    and I am afraid that change might break something. I have left that as a
    TODO item. We can handle it later if need be. This patch does not introduce
    any new races as such w.r.t. set_capacity()/get_capacity().

    v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Phillip Susi
    Signed-off-by: Jens Axboe

    Vivek Goyal
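
    As a rough illustration of the mechanism described in the entry above, a
    seqcount-protected size read/write pair could look like the sketch below
    (illustrative only; the struct layout and helper bodies are simplified,
    not the exact in-tree definitions):

    #include <linux/seqlock.h>
    #include <linux/types.h>

    struct part_example {
            sector_t        nr_sects;
            seqcount_t      nr_sects_seq;   /* guards nr_sects where the
                                             * update can't be atomic */
    };

    static inline sector_t part_nr_sects_read_sketch(struct part_example *part)
    {
            sector_t nr_sects;
            unsigned seq;

            do {
                    seq = read_seqcount_begin(&part->nr_sects_seq);
                    nr_sects = part->nr_sects;
            } while (read_seqcount_retry(&part->nr_sects_seq, seq));

            return nr_sects;        /* consistent even against a concurrent resize */
    }

    static inline void part_nr_sects_write_sketch(struct part_example *part,
                                                  sector_t size)
    {
            write_seqcount_begin(&part->nr_sects_seq);
            part->nr_sects = size;
            write_seqcount_end(&part->nr_sects_seq);
    }

    Readers only retry if a resize raced with them, so the hot
    sector_in_part() path stays lock-free.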
     
  • Hi,

    I'm using the old-fashioned 'dump' backup tool, and I noticed that it spews the
    below warning as of 3.5-rc1 and later (3.4 is fine):

    [ 10.886893] ------------[ cut here ]------------
    [ 10.886904] WARNING: at include/linux/iocontext.h:140 copy_process+0x1488/0x1560()
    [ 10.886905] Hardware name: Bochs
    [ 10.886906] Modules linked in:
    [ 10.886908] Pid: 2430, comm: dump Not tainted 3.5.0-rc7+ #27
    [ 10.886908] Call Trace:
    [ 10.886911] [] warn_slowpath_common+0x7a/0xb0
    [ 10.886912] [] warn_slowpath_null+0x15/0x20
    [ 10.886913] [] copy_process+0x1488/0x1560
    [ 10.886914] [] do_fork+0xb4/0x340
    [ 10.886918] [] ? recalc_sigpending+0x1a/0x50
    [ 10.886919] [] ? __set_task_blocked+0x32/0x80
    [ 10.886920] [] ? __set_current_blocked+0x3a/0x60
    [ 10.886923] [] sys_clone+0x23/0x30
    [ 10.886925] [] stub_clone+0x13/0x20
    [ 10.886927] [] ? system_call_fastpath+0x16/0x1b
    [ 10.886928] ---[ end trace 32a14af7ee6a590b ]---

    Reproducing is easy, I can hit it on a KVM system with a very basic
    config (x86_64 make defconfig + enable the drivers needed). To hit it,
    just install dump (on debian/ubuntu, not sure what the package might be
    called on Fedora), and:

    dump -o -f /tmp/foo /

    You'll see the warning in dmesg once it forks off the I/O process and
    starts dumping filesystem contents.

    I bisected it down to the following commit:

    commit f6e8d01bee036460e03bd4f6a79d014f98ba712e
    Author: Tejun Heo
    Date: Mon Mar 5 13:15:26 2012 -0800

    block: add io_context->active_ref

    Currently ioc->nr_tasks is used to decide two things - whether an ioc
    is done issuing IOs and whether it's shared by multiple tasks. This
    patch separates out the first into ioc->active_ref, which is acquired
    and released using {get|put}_io_context_active() respectively.

    This will be used to associate bio's with a given task. This patch
    doesn't introduce any visible behavior change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    It seems like the init of ioc->nr_tasks was removed in that patch,
    so it starts out at 0 instead of 1.

    Tejun, is the right thing here to add back the init, or should something else
    be done?

    The below patch removes the warning, but I haven't done any more extensive
    testing on it.

    Signed-off-by: Olof Johansson
    Acked-by: Tejun Heo
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Olof Johansson
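
    The shape of the fix being discussed - restoring the initial task
    reference when the io_context is created - would look roughly like the
    sketch below (an assumption on my part; the entry doesn't reproduce the
    actual patch, and the helper name here is made up):

    #include <linux/atomic.h>
    #include <linux/iocontext.h>

    /* Sketch: give the creating task an initial reference so the WARN_ON()
     * in copy_process() (include/linux/iocontext.h:140) no longer sees
     * nr_tasks == 0 on fork. */
    static void ioc_init_task_counts_sketch(struct io_context *ioc)
    {
            atomic_set(&ioc->active_ref, 1);   /* "still issuing IOs" ref */
            atomic_set(&ioc->nr_tasks, 1);     /* the init the report says went missing */
    }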
     
  • blk_set_stacking_limits is intended to allow stacking drivers to build
    up the limits of the stacked device based on the underlying devices'
    limits. But defaulting 'max_sectors' to BLK_DEF_MAX_SECTORS (1024)
    doesn't allow the stacking driver to inherit a max_sectors larger than
    1024 -- due to blk_stack_limits' use of min_not_zero.

    It is now clear that this artificial limit is getting in the way so
    change blk_set_stacking_limits's max_sectors to UINT_MAX (which allows
    stacking drivers like dm-multipath to inherit 'max_sectors' from the
    underlying paths).

    Reported-by: Vijay Chauhan
    Tested-by: Vijay Chauhan
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
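
    Why the 1024-sector default got in the way can be seen from the
    min_not_zero() behaviour the entry refers to; the snippet below is a
    self-contained sketch, not the in-tree blk_stack_limits() code:

    /* Sketch: how a non-zero stacking default always "wins" the minimum. */
    static unsigned int min_not_zero_sketch(unsigned int a, unsigned int b)
    {
            if (!a)
                    return b;
            if (!b)
                    return a;
            return a < b ? a : b;
    }

    /*
     * Stacking default 1024 (BLK_DEF_MAX_SECTORS), underlying limit 2048:
     *     min_not_zero_sketch(1024, 2048) == 1024     -> capped at 1024
     * Stacking default UINT_MAX:
     *     min_not_zero_sketch(UINT_MAX, 2048) == 2048 -> underlying limit inherited
     */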
     

31 Jul, 2012

3 commits

  • This will allow md/raid to know why the unplug was called, and it
    will be able to act accordingly - if !from_schedule it is safe to
    perform tasks which could themselves schedule.

    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
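
    A driver-side unplug callback acting on the new flag might look like the
    sketch below (illustrative; only the from_schedule parameter follows the
    series, the work item and helper are hypothetical):

    #include <linux/blkdev.h>
    #include <linux/workqueue.h>

    static struct work_struct my_deferred_work;     /* hypothetical, set up elsewhere */
    static void my_dispatch_pending_requests(void); /* hypothetical driver helper */

    static void my_unplug_sketch(struct blk_plug_cb *cb, bool from_schedule)
    {
            if (from_schedule) {
                    /* called from schedule(): must not block, punt to a worker */
                    schedule_work(&my_deferred_work);
                    return;
            }

            /* !from_schedule: safe to do work that may itself schedule */
            my_dispatch_pending_requests();
    }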
     
  • MD RAID1 prepares to dispatch requests in its unplug callback. If
    make_request in the low-level queue also uses an unplug callback to
    dispatch requests, the low-level queue's unplug callback will not be
    called. Rechecking the callback list helps in this case.

    Signed-off-by: Shaohua Li
    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • Both md and umem have similar code for getting notified on a
    blk_finish_plug event.
    Centralize this code in block/ and allow each driver to
    provide its distinctive difference.

    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
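
    In rough terms, a driver using the centralized helper this change
    introduces would do something like the sketch below (hedged: the
    blk_check_plugged() signature is per my reading of the series, and the
    driver-side names are made up):

    struct my_plug_cb_sketch {
            struct blk_plug_cb cb;          /* embedded block-core callback */
            struct my_dev *dev;             /* hypothetical driver state */
    };

    static void my_flush_queued_io(struct my_dev *dev, bool from_schedule);  /* hypothetical */

    static void my_cb_unplug_sketch(struct blk_plug_cb *cb, bool from_schedule)
    {
            struct my_plug_cb_sketch *mcb =
                    container_of(cb, struct my_plug_cb_sketch, cb);

            my_flush_queued_io(mcb->dev, from_schedule);
    }

    static void my_make_request_sketch(struct my_dev *dev)
    {
            struct blk_plug_cb *cb;

            /* find or create this driver's callback on the current plug */
            cb = blk_check_plugged(my_cb_unplug_sketch, dev,
                                   sizeof(struct my_plug_cb_sketch));
            if (cb) {
                    /* a plug is active; IO is kicked from my_cb_unplug_sketch() */
            }
    }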
     

20 Jul, 2012

1 commit

  • If the queue is dead, blk_execute_rq_nowait() doesn't invoke the done()
    callback function. That will result in blk_execute_rq() being stuck
    in wait_for_completion(). Avoid this by initializing rq->end_io to the
    done() callback before we check the queue state. Also, make sure the
    queue lock is held around the invocation of the done() callback. Found
    this through source code review.

    Signed-off-by: Muthukumar Ratty
    Signed-off-by: Bart Van Assche
    Reviewed-by: Tejun Heo
    Acked-by: Jens Axboe
    Signed-off-by: James Bottomley

    Muthukumar Ratty
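
    The ordering the entry describes, in heavily abridged sketch form (the
    real function carries more state; this only shows the two points being
    fixed):

    #include <linux/blkdev.h>
    #include <linux/elevator.h>

    static void execute_rq_nowait_sketch(struct request_queue *q,
                                         struct request *rq, int at_head,
                                         rq_end_io_fn *done)
    {
            rq->end_io = done;              /* 1) assign before looking at queue state */

            spin_lock_irq(q->queue_lock);
            if (unlikely(blk_queue_dead(q))) {
                    rq->errors = -ENXIO;
                    if (rq->end_io)
                            rq->end_io(rq, rq->errors);  /* 2) done() runs with lock held */
                    spin_unlock_irq(q->queue_lock);
                    return;
            }

            __elv_add_request(q, rq, at_head ? ELEVATOR_INSERT_FRONT
                                             : ELEVATOR_INSERT_BACK);
            __blk_run_queue(q);
            spin_unlock_irq(q->queue_lock);
    }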
     

27 Jun, 2012

1 commit

  • Currently, request_queue has one request_list to allocate requests
    from regardless of blkcg of the IO being issued. When the unified
    request pool is used up, cfq proportional IO limits become meaningless
    - whoever grabs the next request being freed wins the race regardless
    of the configured weights.

    This can be easily demonstrated by creating a blkio cgroup w/ very low
    weight, put a program which can issue a lot of random direct IOs there
    and running a sequential IO from a different cgroup. As soon as the
    request pool is used up, the sequential IO bandwidth crashes.

    This patch implements per-blkg request_list. Each blkg has its own
    request_list and any IO allocates its request from the matching blkg
    making blkcgs completely isolated in terms of request allocation.

    * Root blkcg uses the request_list embedded in each request_queue,
    which was renamed to @q->root_rl from @q->rq. While making blkcg rl
    handling a bit hairier, this enables avoiding most overhead for root
    blkcg.

    * Queue fullness is properly per request_list but bdi isn't blkcg
    aware yet, so congestion state currently just follows the root
    blkcg. As writeback isn't aware of blkcg yet, this works okay for
    async congestion but readahead may get the wrong signals. It's
    better than blkcg completely collapsing with shared request_list but
    needs to be improved with future changes.

    * After this change, each block cgroup gets a full request pool making
    resource consumption of each cgroup higher. This makes allowing
    non-root users to create cgroups less desirable; however, note that
    allowing non-root users to directly manage cgroups is already
    severely broken regardless of this patch - each block cgroup
    consumes kernel memory and skews IO weight (IO weights are not
    hierarchical).

    v2: queue-sysfs.txt updated and patch description updated as suggested
    by Vivek.

    v3: blk_get_rl() wasn't checking error return from
    blkg_lookup_create() and may cause oops on lookup failure. Fix it
    by falling back to root_rl on blkg lookup failures. This problem
    was spotted by Rakesh Iyer.

    v4: Updated to accommodate 458f27a982 "block: Avoid missed wakeup in
    request waitqueue". blk_drain_queue() now wakes up waiters on all
    blkg->rl on the target queue.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Cc: Wu Fengguang
    Signed-off-by: Jens Axboe

    Tejun Heo
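
    A simplified sketch of the lookup-with-fallback described in v3 above
    (RCU and reference counting omitted; helper names are those introduced
    by this series, but this is not the exact in-tree code):

    /* Sketch: pick the request_list matching the bio's blkcg, falling back
     * to the queue-embedded root_rl when the blkg can't be looked up or
     * created. */
    static struct request_list *get_rl_sketch(struct request_queue *q,
                                              struct bio *bio)
    {
            struct blkcg *blkcg = bio_blkcg(bio);
            struct blkcg_gq *blkg;

            if (blkcg == &blkcg_root)
                    return &q->root_rl;             /* root uses the embedded rl */

            blkg = blkg_lookup_create(blkcg, q);    /* can fail under memory pressure */
            if (IS_ERR(blkg))
                    return &q->root_rl;             /* v3: fall back instead of oopsing */

            return &blkg->rl;
    }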
     

25 Jun, 2012

9 commits

  • Request allocation is about to be made per-blkg meaning that there'll
    be multiple request lists.

    * Make queue full state per request_list. blk_*queue_full() functions
    are renamed to blk_*rl_full() and take @rl instead of @q.

    * Rename blk_init_free_list() to blk_init_rl() and make it take @rl
    instead of @q. Also add @gfp_mask parameter.

    * Add blk_exit_rl() instead of destroying rl directly from
    blk_release_queue().

    * Add request_list->q and make request alloc/free functions -
    blk_free_request(), [__]freed_request(), __get_request() - take @rl
    instead of @q.

    This patch doesn't introduce any functional difference.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Add q->nr_rqs[] which currently behaves the same as q->rq.count[] and
    move q->rq.elvpriv to q->nr_rqs_elvpriv. blk_drain_queue() is updated
    to use q->nr_rqs[] instead of q->rq.count[].

    These counters separate queue-wide request statistics from the
    request list and allow implementation of per-queue request allocation.

    While at it, properly indent fields of struct request_list.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Make bio_blkcg() and friends inline. They are all very simple and
    used only in a few places.

    This patch is to prepare for further updates to request allocation
    path.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • The block layer does very lazy allocation of the ioc. It waits until
    the moment the ioc is absolutely necessary; unfortunately, that time
    could be inside the queue lock, and __get_request() performs an
    unlock - try alloc - retry dance.

    Just allocate it up-front on entry to block layer. We're not saving
    the rain forest by deferring it to the last possible moment and
    complicating things unnecessarily.

    This patch is to prepare for further updates to request allocation
    path.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, there are two request allocation functions - get_request()
    and get_request_wait(). The former tries to allocate a request once;
    the latter wraps the former and keeps retrying until allocation
    succeeds.

    The combination of the two functions delivers fallible non-wait
    allocation, fallible wait allocation and unfailing wait allocation.
    However, given that forward progress is guaranteed, fallible wait
    allocation isn't all that useful and in fact nobody uses it.

    This patch simplifies the interface as follows.

    * get_request() is renamed to __get_request() and is only used by the
    wrapper function.

    * get_request_wait() is renamed to get_request(). It now takes
    @gfp_mask and retries iff it contains %__GFP_WAIT.

    This patch doesn't introduce any functional change and is to prepare
    for further updates to request allocation path.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
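
    The resulting control flow, sketched below (abridged: tracing, congestion
    marking and queue-lock handling are left out; __get_request() stands for
    the renamed single-attempt helper described above):

    static struct request *get_request_sketch(struct request_queue *q,
                                              int rw_flags, struct bio *bio,
                                              gfp_t gfp_mask)
    {
            const bool is_sync = rw_is_sync(rw_flags) != 0;
            DEFINE_WAIT(wait);
            struct request *rq;

    retry:
            rq = __get_request(q, rw_flags, bio, gfp_mask); /* single attempt */
            if (rq)
                    return rq;

            /* retry only if the caller can wait and the queue is still alive */
            if (!(gfp_mask & __GFP_WAIT) || unlikely(blk_queue_dead(q)))
                    return NULL;

            /* the real code holds the queue lock across the failed attempt
             * and prepare_to_wait to avoid a lost wakeup; omitted here */
            prepare_to_wait_exclusive(&q->rq.wait[is_sync], &wait,
                                      TASK_UNINTERRUPTIBLE);
            io_schedule();                  /* woken when a request is freed */
            finish_wait(&q->rq.wait[is_sync], &wait);
            goto retry;
    }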
     
  • iscsi_remove_host() uses bsg_remove_queue() which implements custom
    queue draining. fc_bsg_remove() open-codes mostly identical logic.

    The draining logic isn't correct in that blk_stop_queue() doesn't
    prevent new requests from being queued - it just stops processing, so
    nothing prevents new requests from being queued after the logic
    determines that the queue is drained.

    blk_cleanup_queue() now implements proper queue draining and these
    custom draining logics aren't necessary. Drop them and use
    bsg_unregister_queue() + blk_cleanup_queue() instead.

    Signed-off-by: Tejun Heo
    Reviewed-by: Mike Christie
    Acked-by: Vivek Goyal
    Cc: James Bottomley
    Cc: James Smart
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • mempool_create_node() currently assumes %GFP_KERNEL. Its only user,
    blk_init_free_list(), is about to be updated to use other allocation
    flags - add @gfp_mask argument to the function.

    Signed-off-by: Tejun Heo
    Cc: Andrew Morton
    Cc: Hugh Dickins
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, blkcg_activate_policy() depends on %GFP_ATOMIC allocation
    from __blkg_lookup_create() for root blkcg creation. This could make
    policy activation fail unnecessarily.

    Make blkg_alloc() take @gfp_mask, __blkg_lookup_create() take an
    optional @new_blkg for preallocated blkg, and blkcg_activate_policy()
    preload radix tree and preallocate blkg with %GFP_KERNEL before trying
    to create the root blkg.

    v2: __blkg_lookup_create() was returning %NULL on blkg alloc failure
    instead of ERR_PTR() value. Fixed.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
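
    The preload-then-insert pattern the allocation moves toward is the
    standard radix-tree idiom sketched below (a generic sketch, not the
    blkcg-specific code):

    #include <linux/radix-tree.h>

    static int insert_with_preload_sketch(struct radix_tree_root *root,
                                          unsigned long index, void *item)
    {
            int ret;

            /* may sleep: preallocate tree nodes with GFP_KERNEL up front */
            ret = radix_tree_preload(GFP_KERNEL);
            if (ret)
                    return ret;

            /* ... take the spinlock that protects the tree ... */
            ret = radix_tree_insert(root, index, item);  /* uses preloaded nodes */
            /* ... drop the spinlock ... */

            radix_tree_preload_end();       /* re-enables preemption */
            return ret;
    }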
     
  • There's no point in calling radix_tree_preload() if preloading doesn't
    use a more permissive GFP mask. Drop preloading from
    __blkg_lookup_create().

    While at it, drop sparse locking annotation which no longer applies.

    v2: Vivek pointed out the odd preload usage. Instead of updating,
    just drop it.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     

15 Jun, 2012

4 commits

  • Sometimes, warnings about ioctls to a partition happen often enough that
    they form the majority of warnings in the kernel log and users complain.
    In some cases the warnings are about ioctls such as SG_IO, so it's not
    good to get rid of the warnings completely, as they can ease debugging of
    userspace problems when an ioctl is refused.

    Since I have seen warnings from lots of commands, including some
    proprietary userspace applications, I don't think disallowing the ioctls
    for processes with CAP_SYS_RAWIO will happen in the near future, if ever.
    So let's just stop warning for processes with CAP_SYS_RAWIO, for which
    the ioctl is allowed.

    CC: Paolo Bonzini
    CC: James Bottomley
    CC: linux-scsi@vger.kernel.org
    Acked-by: Paolo Bonzini
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
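
    In sketch form, the check being described (abridged; the real partition
    ioctl verification also whitelists a set of safe ioctls for unprivileged
    callers, which is left out here):

    #include <linux/capability.h>
    #include <linux/sched.h>

    static int verify_part_ioctl_sketch(unsigned int cmd)
    {
            if (capable(CAP_SYS_RAWIO))
                    return 0;               /* allowed anyway - and no log noise */

            /* unprivileged callers still get the (rate-limited) hint */
            if (printk_ratelimit())
                    printk(KERN_WARNING "%s: sending ioctl %x to a partition!\n",
                           current->comm, cmd);

            return -ENOIOCTLCMD;
    }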
     
  • This function was only used by btrfs code in btrfs_abort_devices()
    (seemingly in a wrong way).

    It was removed in commit d07eb9117050c9ed3f78296ebcc06128b52693be,
    so let's remove the dead code to avoid any confusion.

    Changes in v2: update commit log; btrfs_abort_devices() was removed
    already.

    Cc: Jens Axboe
    Cc: linux-kernel@vger.kernel.org
    Cc: Chris Mason
    Cc: linux-btrfs@vger.kernel.org
    Cc: David Sterba
    Signed-off-by: Asias He
    Signed-off-by: Jens Axboe

    Asias He
     
  • Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects the externally
    supplied queue_lock before blk_drain_queue(). Switching the lock would
    introduce a lock imbalance because threads which have taken the external
    lock might unlock the internal lock during the queue drain. This
    patch mitigates this by disconnecting the lock after the queue draining,
    since queue draining makes a lot of request_queue users go away.

    However, please note, this patch only makes the problem less likely to
    happen. Anyone who still holds a ref might try to issue a new request on
    a dead queue after blk_cleanup_queue() finishes draining, and the lock
    imbalance might still happen in this case.

    =====================================
    [ BUG: bad unlock balance detected! ]
    3.4.0+ #288 Not tainted
    -------------------------------------
    fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at:
    [] blk_queue_bio+0x2a2/0x380
    but there are no more locks to release!

    other info that might help us debug this:
    1 lock held by fio/17706:
    #0: (&(&vblk->lock)->rlock){......}, at: []
    get_request_wait+0x19a/0x250

    stack backtrace:
    Pid: 17706, comm: fio Not tainted 3.4.0+ #288
    Call Trace:
    [] ? blk_queue_bio+0x2a2/0x380
    [] print_unlock_inbalance_bug+0xf9/0x100
    [] lock_release_non_nested+0x1df/0x330
    [] ? dio_bio_end_aio+0x34/0xc0
    [] ? bio_check_pages_dirty+0x85/0xe0
    [] ? dio_bio_end_aio+0xb1/0xc0
    [] ? blk_queue_bio+0x2a2/0x380
    [] ? blk_queue_bio+0x2a2/0x380
    [] lock_release+0xd9/0x250
    [] _raw_spin_unlock_irq+0x23/0x40
    [] blk_queue_bio+0x2a2/0x380
    [] generic_make_request+0xca/0x100
    [] submit_bio+0x76/0xf0
    [] ? set_page_dirty_lock+0x3c/0x60
    [] ? bio_set_pages_dirty+0x51/0x70
    [] do_blockdev_direct_IO+0xbf8/0xee0
    [] ? blkdev_get_block+0x80/0x80
    [] __blockdev_direct_IO+0x55/0x60
    [] ? blkdev_get_block+0x80/0x80
    [] blkdev_direct_IO+0x57/0x60
    [] ? blkdev_get_block+0x80/0x80
    [] generic_file_aio_read+0x70e/0x760
    [] ? __lock_acquire+0x215/0x5a0
    [] ? aio_run_iocb+0x54/0x1a0
    [] ? grab_cache_page_nowait+0xc0/0xc0
    [] aio_rw_vect_retry+0x7c/0x1e0
    [] ? aio_fsync+0x30/0x30
    [] aio_run_iocb+0x66/0x1a0
    [] do_io_submit+0x6f0/0xb80
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] sys_io_submit+0x10/0x20
    [] system_call_fastpath+0x16/0x1b

    Changes since v2: Update commit log to explain how the code is still
    broken even if we delay the lock switching after the drain.
    Changes since v1: Update commit log as Tejun suggested.

    Acked-by: Tejun Heo
    Signed-off-by: Asias He
    Signed-off-by: Jens Axboe

    Asias He
     
  • After hot-unplugging a stressed disk, I found that rl->wait[] is not empty
    while rl->count[] is empty and there are threads still sleeping on
    get_request after the queue cleanup. With simple debug code, I found
    there are exactly nr_sleep - nr_wakeup threads in D state. So there
    are missed wakeups.

    $ dmesg | grep nr_sleep
    [ 52.917115] ---> nr_sleep=1046, nr_wakeup=873, delta=173
    $ vmstat 1
    1 173 0 712640 24292 96172 0 0 0 0 419 757 0 0 0 100 0

    To quote Tejun:

    Ah, okay, freed_request() wakes up single waiter with the assumption
    that after the wakeup there will at least be one successful allocation
    which in turn will continue the wakeup chain until the wait list is
    empty - ie. waiter wakeup is dependent on successful request
    allocation happening after each wakeup. With queue marked dead, any
    woken up waiter fails the allocation path, so the wakeup chaining is
    lost and we're left with hung waiters. What we need is wake_up_all()
    after drain completion.

    This patch fixes the missed wakeups by waking up all the threads which
    are sleeping on the wait queue after the queue drain.

    Changes in v2: Drop waitqueue_active() optimization

    Acked-by: Tejun Heo
    Signed-off-by: Asias He

    Fixed a bug by me, where stacked devices would oops on calling
    blk_drain_queue() since ->rq.wait[] do not get initialized unless
    it's a full queue setup.

    Signed-off-by: Jens Axboe

    Asias He
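
    In sketch form, the wakeup Tejun describes above (illustrative; in the
    real code this runs at the end of queue draining, under the appropriate
    locking):

    #include <linux/blkdev.h>

    static void drain_done_wake_all_sketch(struct request_list *rl)
    {
            int i;

            /* A single wake_up() relies on a successful allocation to chain
             * the next wakeup; once the queue is dead that chain breaks, so
             * wake every sleeper and let each fail its allocation and exit. */
            for (i = 0; i < ARRAY_SIZE(rl->wait); i++)
                    wake_up_all(&rl->wait[i]);
    }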
     

06 Jun, 2012

1 commit

  • blkg_destroy() caches @blkg->q in the local variable @q. While there are
    two places which need @blkg->q, only lockdep_assert_held() used the
    local variable, leading to an unused local variable warning if lockdep is
    configured out. Drop the local variable and just use @blkg->q
    directly.

    Signed-off-by: Tejun Heo
    Reported-by: Rakesh Iyer
    Signed-off-by: Jens Axboe

    Tejun Heo
     

04 Jun, 2012

3 commits

  • When policy data allocation fails in the middle, blkg_alloc() invokes
    blkg_free() to destroy the half constructed blkg. This ends up
    calling pd_exit_fn() on policy datas which didn't go through
    pd_init_fn(). Fix it by making blkg_alloc() call pd_init_fn()
    immediately after each policy data allocation.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • cfq may be built w/ or w/o blkcg support depending on
    CONFIG_CFQ_GROUP_IOSCHED. If blkcg support is disabled, most of the
    related code is ifdef'd out but some parts are left dangling -
    blkcg_policy_cfq is left zero-filled and blkcg_policy_[un]register()
    calls are made on it.

    Feeding zero filled policy to blkcg_policy_register() is incorrect and
    triggers the following WARN_ON() if CONFIG_BLK_CGROUP &&
    !CONFIG_CFQ_GROUP_IOSCHED.

    ------------[ cut here ]------------
    WARNING: at block/blk-cgroup.c:867
    Modules linked in:
    Modules linked in:
    CPU: 3 Not tainted 3.4.0-09547-gfb21aff #1
    Process swapper/0 (pid: 1, task: 000000003ff80000, ksp: 000000003ff7f8b8)
    Krnl PSW : 0704100180000000 00000000003d76ca (blkcg_policy_register+0xca/0xe0)
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
    Krnl GPRS: 0000000000000000 00000000014b85ec 00000000014b85b0 0000000000000000
    000000000096fb60 0000000000000000 00000000009a8e78 0000000000000048
    000000000099c070 0000000000b6f000 0000000000000000 000000000099c0b8
    00000000014b85b0 0000000000667580 000000003ff7fd98 000000003ff7fd70
    Krnl Code: 00000000003d76be: a7280001 lhi %r2,1
    00000000003d76c2: a7f4ffdf brc 15,3d7680
    #00000000003d76c6: a7f40001 brc 15,3d76c8
    >00000000003d76ca: a7c8ffea lhi %r12,-22
    00000000003d76ce: a7f4ffce brc 15,3d766a
    00000000003d76d2: a7f40001 brc 15,3d76d4
    00000000003d76d6: a7c80000 lhi %r12,0
    00000000003d76da: a7f4ffc2 brc 15,3d765e
    Call Trace:
    ([] initcall_debug+0x0/0x4)
    [] cfq_init+0x62/0xd4
    [] do_one_initcall+0x3a/0x170
    [] kernel_init+0x214/0x2bc
    [] kernel_thread_starter+0x6/0xc
    [] kernel_thread_starter+0x0/0xc
    no locks held by swapper/0/1.
    Last Breaking-Event-Address:
    [] blkcg_policy_register+0xc6/0xe0
    ---[ end trace b8ef4903fcbf9dd3 ]---

    This patch fixes the problem by ensuring all blkcg support code is
    inside CONFIG_CFQ_GROUP_IOSCHED.

    * blkcg_policy_cfq declaration and blkg_to_cfqg() definition are moved
    inside the first CONFIG_CFQ_GROUP_IOSCHED block. __maybe_unused is
    dropped from blkcg_policy_cfq decl.

    * blkcg_deactivate_policy() invocation is moved inside ifdef. This
    also makes the activation logic match cfq_init_queue().

    * All blkcg_policy_[un]register() invocations are moved inside ifdef.

    Signed-off-by: Tejun Heo
    Reported-by: Heiko Carstens
    LKML-Reference:
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • cfq_init() would return zero after kmem cache creation failure. Fix
    so that it returns -ENOMEM.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

31 May, 2012

1 commit

  • Calling get_task_io_context() on an exiting task which isn't %current can
    loop forever. This triggers at boot time on my dev machine.

    BUG: soft lockup - CPU#3 stuck for 22s ! [mountall.1603]

    Fix this by making create_task_io_context() return -EBUSY in this case
    to break the loop.

    Signed-off-by: Eric Dumazet
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: Alan Cox
    Signed-off-by: Jens Axboe

    Eric Dumazet
     

30 May, 2012

1 commit

  • Merge block/IO core bits from Jens Axboe:
    "This is a bit bigger on the core side than usual, but that is purely
    because we decided to hold off on parts of Tejun's submission on 3.4
    to give it a bit more time to simmer. As a consequence, it's seen a
    long cycle in for-next.

    It contains:

    - Bug fix from Dan, wrong locking type.
    - Relax splice gifting restriction from Eric.
    - A ton of updates from Tejun, primarily for blkcg. This improves
    the code a lot, making the API nicer and cleaner, and also includes
    fixes for how we handle and tie policies and re-activate on
    switches. The changes also include generic bug fixes.
    - A simple fix from Vivek, along with a fix for doing proper delayed
    allocation of the blkcg stats."

    Fix up annoying conflict just due to different merge resolution in
    Documentation/feature-removal-schedule.txt

    * 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits)
    blkcg: tg_stats_alloc_lock is an irq lock
    vmsplice: relax alignement requirements for SPLICE_F_GIFT
    blkcg: use radix tree to index blkgs from blkcg
    blkcg: fix blkcg->css ref leak in __blkg_lookup_create()
    block: fix elvpriv allocation failure handling
    block: collapse blk_alloc_request() into get_request()
    blkcg: collapse blkcg_policy_ops into blkcg_policy
    blkcg: embed struct blkg_policy_data in policy specific data
    blkcg: mass rename of blkcg API
    blkcg: style cleanups for blk-cgroup.h
    blkcg: remove blkio_group->path[]
    blkcg: blkg_rwstat_read() was missing inline
    blkcg: shoot down blkgs if all policies are deactivated
    blkcg: drop stuff unused after per-queue policy activation update
    blkcg: implement per-queue policy activation
    blkcg: add request_queue->root_blkg
    blkcg: make request_queue bypassing on allocation
    blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
    blkcg: make blkg_conf_prep() take @pol and return with queue lock held
    blkcg: remove static policy ID enums
    ...

    Linus Torvalds
     

23 May, 2012

2 commits

  • tg_stats_alloc_lock nests inside the queue lock and should always be held
    with irqs disabled. throtl_pd_{init|exit}() were using non-irqsafe
    spinlock ops, which triggered an inverse-lock-ordering-via-irq warning
    when RCU freeing of a blkg invoked throtl_pd_exit() w/o disabling IRQs.

    Update both functions to use irq safe operations.

    Signed-off-by: Tejun Heo
    Reported-by: Sasha Levin
    LKML-Reference:
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Pull cgroup updates from Tejun Heo:
    "cgroup file type addition / removal is updated so that file types are
    added and removed instead of individual files so that dynamic file
    type addition / removal can be implemented by cgroup and used by
    controllers. blkio controller changes which will come through block
    tree are dependent on this. Other changes include res_counter cleanup
    and disallowing kthread / PF_THREAD_BOUND threads to be attached to
    non-root cgroups.

    There's a reported bug with the file type addition / removal handling
    which can lead to oops on cgroup umount. The issue is being looked
    into. It shouldn't cause problems for most setups and isn't a
    security concern."

    Fix up trivial conflict in Documentation/feature-removal-schedule.txt

    * 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
    res_counter: Account max_usage when calling res_counter_charge_nofail()
    res_counter: Merge res_counter_charge and res_counter_charge_nofail
    cgroups: disallow attaching kthreadd or PF_THREAD_BOUND threads
    cgroup: remove cgroup_subsys->populate()
    cgroup: get rid of populate for memcg
    cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg
    cgroup: make css->refcnt clearing on cgroup removal optional
    cgroup: use negative bias on css->refcnt to block css_tryget()
    cgroup: implement cgroup_rm_cftypes()
    cgroup: introduce struct cfent
    cgroup: relocate __d_cgrp() and __d_cft()
    cgroup: remove cgroup_add_file[s]()
    cgroup: convert memcg controller to the new cftype interface
    memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP
    cgroup: convert all non-memcg controllers to the new cftype interface
    cgroup: relocate cftype and cgroup_subsys definitions in controllers
    cgroup: merge cft_release_agent cftype array into the base files array
    cgroup: implement cgroup_add_cftypes() and friends
    cgroup: build list of all cgroups under a given cgroupfs_root
    cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir()
    ...

    Linus Torvalds
     

22 May, 2012

1 commit

  • Pull s390 updates from Martin Schwidefsky:
    "Just a random collection of bug-fixes and cleanups, nothing new in
    this merge request."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits)
    s390/ap: Fix wrong or missing comments
    s390/ap: move receive callback to message struct
    s390/dasd: re-prioritize partition detection message
    s390/qeth: reshuffle initialization
    s390/qeth: cleanup drv attr usage
    s390/claw: cleanup drv attr usage
    s390/lcs: cleanup drv attr usage
    s390/ctc: cleanup drv attr usage
    s390/ccwgroup: remove ccwgroup_create_from_string
    s390/qeth: stop using struct ccwgroup driver for discipline callbacks
    s390/qeth: switch to ccwgroup_create_dev
    s390/claw: switch to ccwgroup_create_dev
    s390/lcs: switch to ccwgroup_create_dev
    s390/ctcm: switch to ccwgroup_create_dev
    s390/ccwgroup: exploit ccwdev_by_dev_id
    s390/ccwgroup: introduce ccwgroup_create_dev
    s390: fix race on TIF_MCCK_PENDING
    s390/barrier: make use of fast-bcr facility
    s390/barrier: cleanup barrier functions
    s390/claw: remove "eieio" calls
    ...

    Linus Torvalds
     


15 May, 2012

1 commit

  • 6d1d8050b4bc8 "block, partition: add partition_meta_info to hd_struct"
    added part_unpack_uuid() which assumes that the passed in buffer has
    enough space for sprintfing "%pU" - 37 characters including '\0'.

    Unfortunately, b5af921ec0233 "init: add support for root devices
    specified by partition UUID" supplied 33 bytes buffer to the function
    leading to the following panic with stackprotector enabled.

    Kernel panic - not syncing: stack-protector: Kernel stack corrupted in: ffffffff81b14c7e

    [] panic+0xba/0x1c6
    [] ? printk_all_partitions+0x259/0x26b
    [] __stack_chk_fail+0x1b/0x20
    [] printk_all_partitions+0x259/0x26b
    [] mount_block_root+0x1bc/0x27f
    [] mount_root+0x57/0x5b
    [] prepare_namespace+0x13d/0x176
    [] ? release_tgcred.isra.4+0x330/0x30
    [] kernel_init+0x155/0x15a
    [] ? schedule_tail+0x27/0xb0
    [] kernel_thread_helper+0x5/0x10
    [] ? start_kernel+0x3c5/0x3c5
    [] ? gs_change+0x13/0x13

    Increase the buffer size, remove the dangerous part_unpack_uuid() and
    use snprintf() directly from printk_all_partitions().

    Signed-off-by: Tejun Heo
    Reported-by: Szymon Gruszczynski
    Cc: Will Drewry
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
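
    The safe pattern the fix moves to, sketched (illustrative; "%pU" expands
    to 36 characters, so the buffer needs 37 bytes including the NUL):

    static void format_part_uuid_sketch(char *buf, size_t buflen, const u8 *uuid)
    {
            /* snprintf() bounds the write and NUL-terminates within buflen,
             * so an undersized buffer can no longer smash the stack. */
            snprintf(buf, buflen, "%pU", uuid);
    }

    /* caller side: */
    char uuid_buf[37];              /* 32 hex digits + 4 dashes + '\0' */
    format_part_uuid_sketch(uuid_buf, sizeof(uuid_buf), part_uuid);  /* part_uuid: 16-byte UUID */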
     


20 Apr, 2012

4 commits

  • blkg lookup is currently performed by traversing a linked list anchored
    at blkcg->blkg_list. This is very unscalable and, with blk-throttle
    enabled and enough request queues on the system, this can get very
    ugly quickly (blk-throttle performs a lookup on every bio submission).

    This patch makes blkcg use radix tree to index blkgs combined with
    simple last-looked-up hint. This is mostly identical to how icqs are
    indexed from ioc.

    Note that because __blkg_lookup() may be invoked without holding queue
    lock, hint is only updated from __blkg_lookup_create(). Due to cfq's
    cfqq caching, this makes hint updates overly lazy. This will be
    improved with scheduled blkcg aware request allocation.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
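
    A sketch of the hinted lookup described above (simplified; RCU usage is
    reduced to a comment, and the field names follow my reading of the
    series rather than being quoted from it):

    /* caller must hold rcu_read_lock() */
    static struct blkcg_gq *blkg_lookup_sketch(struct blkcg *blkcg,
                                               struct request_queue *q)
    {
            struct blkcg_gq *blkg;

            /* fast path: the last blkg this blkcg looked up */
            blkg = rcu_dereference(blkcg->blkg_hint);
            if (blkg && blkg->q == q)
                    return blkg;

            /* slow path: radix tree indexed by the queue id */
            blkg = radix_tree_lookup(&blkcg->blkg_tree, q->id);
            if (blkg && blkg->q == q)
                    return blkg;

            return NULL;
    }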
     
  • __blkg_lookup_create() leaked blkcg->css ref if blkg allocation
    failed. Fix it.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Request allocation is mempool backed to guarantee forward progress
    under memory pressure; unfortunately, this property got broken while
    adding elvpriv data. Failures during elvpriv allocation, including
    ioc and icq creation failures, currently make get_request() fail as a
    whole. There's no forward progress guarantee for these allocations -
    they may fail indefinitely under memory pressure stalling IO and
    deadlocking the system.

    This patch updates get_request() such that elvpriv allocation failure
    doesn't make the whole function fail. If elvpriv allocation fails,
    the allocation is degraded into !ELVPRIV. This will force the request
    to ELEVATOR_INSERT_BACK disturbing scheduling but elvpriv alloc
    failures should be rare (nothing is per-request) and anything is
    better than deadlocking.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Allocation failure handling in get_request() is about to be updated.
    To ease the update, collapse blk_alloc_request() into get_request().

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo