Eric Lee / smarc-fsl-linux-kernel

30 Jun, 2011

6 commits

15b493d11 drbd: fix limit define, we support 1 PiByte now ... Browse Code »

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Lars Ellenberg
2011-06-30 15:23:45 +0800
cb6518cbe drbd: when receive times out on meta socket, also check last receive time on data socket ... Browse Code »

If we have an asymetrically congested network, we may send P_PING,
but due to congestion, the corresponding P_PING_ACK would time out,
and we would drop a (congested, but otherwise) healthy connection
("PingAck did not arrive in time.")

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Lars Ellenberg
2011-06-30 15:23:44 +0800
5a8b42427 drbd: account bitmap IO during resync as resync-(related-)-io ... Browse Code »

If we have a good resync rate, we will frequently update the on-disk
bitmap, which, if not accounted for as resync io, may let an otherwise
idle device appear to be "busy", and cause us to throttle resync.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Lars Ellenberg
2011-06-30 15:23:43 +0800
8ccee20e3 drbd: don't cond_resched_lock with IRQs disabled ... Browse Code »

The last commit, drbd: add missing spinlock to bitmap receive,
introduced a cond_resched_lock(), where the lock in question is taken
with irqs disabled.

As we must not schedule with IRQs disabled,
and cond_resched_lock_irq() does not exist, yet,
we re-aquire the spin_lock_irq() for each bitmap page processed in turn.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Lars Ellenberg
2011-06-30 15:23:42 +0800
829c60878 drbd: add missing spinlock to bitmap receive ... Browse Code »

During bitmap exchange, when using the RLE bitmap compression scheme,
we have a code path that can set the whole bitmap at once.

To avoid holding spin_lock_irq() for too long, we used to lock out other
bitmap modifications during bitmap exchange by other means, and then,
knowing we have exclusive access to the bitmap, modify it without
the spinlock, and with IRQs enabled.

Since we now allow local IO to continue, potentially setting additional
bits during the bitmap receive phase, this is no longer true, and we get
uncoordinated updates of bitmap members, causing bm_set to no longer
accurately reflect the total number of set bits.

To actually see this, you'd need to have a large bitmap, use RLE bitmap
compression, and have busy IO during sync handshake and bitmap exchange.

Fix this by taking the spin_lock_irq() in this code path as well, but
calling cond_resched_lock() after each page worth of bits processed.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Lars Ellenberg
2011-06-30 15:23:41 +0800
0cfdd247d drbd: Use the correct max_bio_size when creating resync requests ... Browse Code »

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Philipp Reisner
2011-06-30 15:23:40 +0800

24 May, 2011

16 commits

a1c15c59f loop: handle on-demand devices correctly ... Browse Code »

When finding or allocating a loop device, loop_probe() did not take
partition numbers into account so that it can result to a different
device. Consider following example:

$ sudo modprobe loop max_part=15
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
$ sudo mknod /dev/loop8 b 7 128
$ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
$ sudo losetup -a
/dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
brw-r--r-- 1 root root 7, 128 2011-05-24 22:17 /dev/loop8

After this patch, /dev/loop8 - instead of /dev/loop128 - was
accessed correctly.

In addition, 'range' passed to blk_register_region() should
include all range of dev_t that LOOP_MAJOR can address. It does
not need to be limited by partition numbers unless 'max_loop'
param was specified.

Signed-off-by: Namhyung Kim
Cc: Laurent Vivier
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Namhyung Kim
2011-05-24 22:48:55 +0800
78f4bb367 loop: limit 'max_part' module param to DISK_MAX_PARTS ... Browse Code »

The 'max_part' parameter controls the number of maximum partition
a loop block device can have. However if a user specifies very
large value it would exceed the limitation of device minor number
and can cause a kernel panic (or, at least, produce invalid
device nodes in some cases).

On my desktop system, following command kills the kernel. On qemu,
it triggers similar oops but the kernel was alive:

$ sudo modprobe loop max_part0000
------------[ cut here ]------------
kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
invalid opcode: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in: loop(+)

Pid: 43, comm: insmod Tainted: G W 2.6.39-qemu+ #155 Bochs Bochs
RIP: 0010:[] [] internal_create_group=
+0x2a/0x170
RSP: 0018:ffff880007b3fde8 EFLAGS: 00000246
RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4
RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878
RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000
R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50
R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800
FS: 0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:00000000000000=
00
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c=
0)
Stack:
ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
Call Trace:
[] ? device_add+0x4bc/0x5af
[] ? dev_set_name+0x3c/0x3e
[] sysfs_create_group+0xe/0x12
[] blk_trace_init_sysfs+0x14/0x16
[] blk_register_queue+0x47/0xf7
[] add_disk+0xdf/0x290
[] loop_init+0xeb/0x1b8 [loop]
[] ? 0xffffffffa0005fff
[] do_one_initcall+0x7a/0x12e
[] sys_init_module+0x9c/0x1e0
[] system_call_fastpath+0x16/0x1b
Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb=
48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 0b eb fe =
48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
RIP [] internal_create_group+0x2a/0x170
RSP
---[ end trace a123eb592043acad ]---

Signed-off-by: Namhyung Kim
Cc: Laurent Vivier
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Namhyung Kim
2011-05-24 22:48:54 +0800
0ddf72be4 drbd: fix warning ... Browse Code »

In file included from drivers/block/drbd/drbd_main.c:54: drivers/block/drbd/drbd_int.h:1190: warning: parameter has incomplete type

Forward declarations of enums do not work.

Fix it unpleasantly by moving the prototype.

Cc: Jens Axboe
Signed-off-by: Lars Ellenberg
Signed-off-by: Philipp Reisner
Signed-off-by: Andrew Morton

Andrew Morton
2011-05-24 16:38:33 +0800
9b2f61aec drbd: fix warning ... Browse Code »

Signed-off-by: Philipp Reisner

Philipp Reisner
2011-05-24 16:38:32 +0800
24c4830c8 drbd: Fix spelling ... Browse Code »

Found these with the help of ispell -l.

Signed-off-by: Bart Van Assche
Signed-off-by: Lars Ellenberg
Signed-off-by: Philipp Reisner

Bart Van Assche
2011-05-24 16:21:29 +0800
9a0d9d038 drbd: fix schedule in atomic ... Browse Code »

An administrative detach used to request a state change directly to D_DISKLESS,
first suspending IO to avoid the last put_ldev() occuring from an endio handler,
potentially in irq context.

This is not enough on the receiving side (typically secondary), we may miss
some peer_req on the way to local disk, which then may do the last put_ldev()
from their drbd_peer_request_endio().

This patch makes the detach always go through the intermediate D_FAILED state.
We may consider to rename it D_DETACHING.

Alternative approach would be to create yet an other work item to be scheduled
on the worker, do the destructor work from there, and get the timing right.

manually picked commit 564040f from the drbd 8.4 branch.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Lars Ellenberg
2011-05-24 16:14:32 +0800
99432fcc5 drbd: Take a more conservative approach when deciding max_bio_size ... Browse Code »

The old (optimistic) implementation could shrink the bio size
on an primary device.

Shrinking the bio size on a primary device is bad. Since there
we might get BIOs with the old (bigger) size shortly after
we published the new size.

The new implementation is more conservative, and eventually
increases the max_bio_size on a primary device (which is valid).
It does so, when it knows the local limit AND the remote limit.

We cache the last seen max_bio_size of the peer in the meta
data, and rely on that, to make the operation of single
nodes more efficient.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Philipp Reisner
2011-05-24 16:08:58 +0800
21423fa79 drbd: Fixed state transitions after async outdate-peer-handler returned ... Browse Code »

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Philipp Reisner
2011-05-24 16:08:11 +0800
fa7d93966 drbd: Disallow the peer_disk_state to be D_OUTDATED while connected ... Browse Code »

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Philipp Reisner
2011-05-24 16:07:50 +0800
a8e407925 drbd: Fix for the connection problems on high latency links ... Browse Code »

It seems that the real cause of all the issues where that
we did not noticed in drbd_try_connect() when the other
guy closes one socket if the round trip time gets higher
than 100ms. There were that 100ms hard coded!

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Philipp Reisner
2011-05-24 16:07:22 +0800
76727f684 drbd: fix potential activity log refcount imbalance in error path ... Browse Code »

It is no longer sufficient to trigger on local WRITE,
we need to check on (rq_state & RQ_IN_ACT_LOG)
before calling drbd_al_complete_io also in the error path.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Lars Ellenberg
2011-05-24 16:06:44 +0800
d2e17807e drbd: Only downgrade the disk state in case of disk failures ... Browse Code »

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Philipp Reisner
2011-05-24 16:05:48 +0800
f36af18c7 drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int ... Browse Code »

If there is no replication traffic within the idle timeout
(ping-int seconds), DRBD will send a P_PING,
and adjust the timeout to ping-timeout.

If there is no P_PING_ACK received within this ping-timeout,
DRBD finally drops the connection, and tries to re-establish it.

To decide which timeout was active, we compared the current timeout
with the ping-timeout, and dropped the connection, if that was the case.

By default, ping-int is 10 seconds, ping-timeout is 500 ms.

Unfortunately, if you configure ping-timeout to be the same as ping-int,
expiry of the idle-timeout had been mistaken for a missing ping ack,
and caused an immediate reconnection attempt.

Fix:
Allow both timeouts to be equal, use a local variable
to store which timeout is active.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Lars Ellenberg
2011-05-24 16:03:30 +0800
53ea43314 drbd: fix potential distributed deadlock ... Browse Code »

We limit ourselves to a configurable maximum number of pages used as
temporary bio pages.

If the configured "max_buffers" is not big enough to match the bandwidth
of the respective deployment, a distributed deadlock could be triggered
by e.g. fast online verify and heavy application IO.

TCP connections would block on congestion, because both receivers
would wait on pages to become available.

Fortunately the respective senders in this case would be able to give
back some pages already. So do that.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Lars Ellenberg
2011-05-24 16:02:41 +0800
600942e0f lru_cache.h: fix comments referring to ts_ instead of lc_ ... Browse Code »

For some time we contemplated calling the "struct lru_cache"
a "struct tracked_set", and some comments kept the ts_ prefix.

Fix those to match the member field names.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Lars Ellenberg
2011-05-24 16:01:37 +0800
738a84b25 drbd: Fix for application IO with the on-io-error=pass-on policy ... Browse Code »

In case a write failes on the local disk, go into D_INCONSISTENT
disk state. That causes future reads of that block to be shipped
to the peer.

Read retry remote was already in place.

Actually the documentation needs to get fixed now. Since the
application is still shielded from the error. (as long as we have
only a single disk failing) The difference to detach is that
we keep the disk. And therefore might keep all the other, still
working sectors up to date.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg

Philipp Reisner
2011-05-24 15:59:49 +0800

19 May, 2011

3 commits

779d53063 Merge branches 'for-jens/xen-backend-fixes' and 'for-jens/xen-blkback-v3.3' of g… ... Browse Code »

…it://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-2.6.40/drivers

Jens Axboe
2011-05-19 15:46:00 +0800
c9ce9e438 xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions. ... Browse Code »

If the backends, which use these two functions, are compiled as
a module we need these two functions to be exported.

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-19 04:00:01 +0800
d5431d527 xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. ... Browse Code »

We only supported the M2P (and P2M) override only for the
GNTMAP_contains_pte type mappings. Meaning that we grants
operations would "contain the machine address of the PTE to update"
If the flag is unset, then the grant operation is
"contains a host virtual address". The latter case means that
the Hypervisor takes care of updating our page table
(specifically the PTE entry) with the guest's MFN. As such we should
not try to do anything with the PTE. Previous to this patch
we would try to clear the PTE which resulted in Xen hypervisor
being upset with us:

(XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
(XEN) domain_crash called from mm.c:1067
(XEN) Domain 0 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.0-110228 x86_64 debug=y Not tainted ]----

and crashing us.

This patch allows us to inhibit the PTE clearing in the PV guest
if the GNTMAP_contains_pte is not set.

On the m2p_remove_override path we provide the same parameter.

Sadly in the grant-table driver we do not have a mechanism to
tell m2p_remove_override whether to clear the PTE or not. Since
the grant-table driver is used by user-space, we can safely assume
that it operates only on PTE's. Hence the implementation for
it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
we can implement the support for this. It will require some extra
accounting structure to keep track of the page[i], and the flag.

[v1: Added documentation details, made it return -EOPNOTSUPP instead
of trying to do a half-way implementation]
Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-19 04:00:00 +0800

18 May, 2011

1 commit

8ab521506 xen/blkback: don't fail empty barrier requests ... Browse Code »

The sector number on empty barrier requests may (will?) be -1, which,
given that it's being treated as unsigned 64-bit quantity, will almost
always exceed the actual (virtual) disk's size.

Inspired by Konrad's "When writting barriers set the sector number to
zero...".

While at it also add overflow checking to the math in vbd_translate().

Signed-off-by: Jan Beulich
Signed-off-by: Konrad Rzeszutek Wilk

Jan Beulich
2011-05-18 23:28:16 +0800

13 May, 2011

14 commits

496b318eb xen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_end() ... Browse Code »

vbd_resize() up_read()'s xs_state.suspend_mutex twice in a row via double
xenbus_transaction_end() calls. The next down_read() in
xenbus_transaction_start() (at eg. the next resize attempt) hangs.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=618317

Acked-by: Jan Beulich
Acked-by: Ian Campbell
Signed-off-by: Laszlo Ersek
Signed-off-by: Konrad Rzeszutek Wilk

Laszlo Ersek
2011-05-13 21:45:40 +0800
518543227 xen/blkback: Align the tabs on the structure. ... Browse Code »

The recent changes caused this field of the structure to be offset a bit.

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 06:02:28 +0800
cca537af7 xen/blkback: if log_stats is enabled print out the data. ... Browse Code »

And not depend on the driver being built with -DDEBUG flag.
Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 05:55:54 +0800
5a577e387 xen/blkback: Add the prefix XEN in the common.h. ... Browse Code »

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 05:55:53 +0800
3d814731b xen/blkback: Prefix 'vbd' with 'xen' in structs and functions. ... Browse Code »

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 05:55:52 +0800
30fd15020 xen/blkback: Change structure name blkif_st to xen_blkif. ... Browse Code »

No need for that '_st' and xen_blkif is more apt.

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 05:55:51 +0800
325a64860 xen/blkback: Remove the unused typedefs. ... Browse Code »

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 05:55:50 +0800
452a6b2bb xen/blkback: Move include/xen/blkif.h into drivers/block/xen-blkback/common.h ... Browse Code »

Not point of the blkif.h file. It is not used by the frontend.
Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 05:55:49 +0800
b0f801273 xen/blkback: Fixing some more of the cleanpatch.pl warnings. ... Browse Code »

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 05:55:48 +0800
03e0edf94 xen/blkback: Checkpatch.pl recommend against multiple assigments. ... Browse Code »

CHECK: multiple assignments should be avoided

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 05:55:47 +0800
41ca4d388 xen/blkback: Fix checkpatch.pl warnings about more than 80 lines. ... Browse Code »

Break up the macro usage.

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 05:55:46 +0800
a4c348580 xen/blkback: Flesh out the description in the Kconfig. ... Browse Code »

with more details.

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 05:55:40 +0800
b9fc02968 xen/blkback: Fix spelling mistakes. ... Browse Code »

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 04:43:21 +0800
68c88dd7d xen/blkback: Move blkif_get_x86_[32|64]_req to common.h in block/xen-blkback dir. ... Browse Code »

From the blkif.h header, which was exposed to the frontend.

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2011-05-13 04:43:20 +0800