Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

02 Apr, 2014

1 commit

e25115786 switch nbd to sockfd_lookup/sockfd_put ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2014-04-02 11:19:10 +0800

24 Nov, 2013

2 commits

4550dd6c6 block: Immutable bio vecs ... Browse Code »
5

This adds a mechanism by which we can advance a bio by an arbitrary
number of bytes without modifying the biovec: bio->bi_iter.bi_bvec_done
indicates the number of bytes completed in the current bvec.

Various driver code still needs to be updated to not refer to the bvec
directly before we can use this for interesting things, like efficient
bio splitting.

Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: Paul Clements
Cc: drbd-user@lists.linbit.com
Cc: nbd-general@lists.sourceforge.net

Kent Overstreet
2013-11-24 14:33:49 +0800
7988613b0 block: Convert bio_for_each_segment() to bvec_iter ... Browse Code »

More prep work for immutable biovecs - with immutable bvecs drivers
won't be able to use the biovec directly, they'll need to use helpers
that take into account bio->bi_iter.bi_bvec_done.

This updates callers for the new usage without changing the
implementation yet.

Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: Geert Uytterhoeven
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: "Ed L. Cashin"
Cc: Nick Piggin
Cc: Lars Ellenberg
Cc: Jiri Kosina
Cc: Paul Clements
Cc: Jim Paris
Cc: Geoff Levand
Cc: Yehuda Sadeh
Cc: Sage Weil
Cc: Alex Elder
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris
Cc: Philip Kelleher
Cc: Konrad Rzeszutek Wilk
Cc: Jeremy Fitzhardinge
Cc: Neil Brown
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: linux390@de.ibm.com
Cc: Nagalakshmi Nandigama
Cc: Sreekanth Reddy
Cc: support@lsi.com
Cc: "James E.J. Bottomley"
Cc: Greg Kroah-Hartman
Cc: Alexander Viro
Cc: Steven Whitehouse
Cc: Herton Ronaldo Krzesinski
Cc: Tejun Heo
Cc: Andrew Morton
Cc: Guo Chao
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Matthew Wilcox
Cc: Keith Busch
Cc: Stephen Hemminger
Cc: Quoc-Son Anh
Cc: Sebastian Ott
Cc: Nitin Gupta
Cc: Minchan Kim
Cc: Jerome Marchand
Cc: Seth Jennings
Cc: "Martin K. Petersen"
Cc: Mike Snitzer
Cc: Vivek Goyal
Cc: "Darrick J. Wong"
Cc: Chris Metcalf
Cc: Jan Kara
Cc: linux-m68k@lists.linux-m68k.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: drbd-user@lists.linbit.com
Cc: nbd-general@lists.sourceforge.net
Cc: cbe-oss-dev@lists.ozlabs.org
Cc: xen-devel@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: linux-raid@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: DL-MPTFusionLinux@lsi.com
Cc: linux-scsi@vger.kernel.org
Cc: devel@driverdev.osuosl.org
Cc: linux-fsdevel@vger.kernel.org
Cc: cluster-devel@redhat.com
Cc: linux-mm@kvack.org
Acked-by: Geoff Levand

Kent Overstreet
2013-11-24 14:33:49 +0800

04 Jul, 2013

3 commits

c378f70ad nbd: correct disconnect behavior ... Browse Code »

Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
ioctl) the return from NBD_DO_IT is undefined (it is usually one of
several error codes). This means that nbd-client does not know if a
manual disconnect was performed or whether a network error occurred.
Because of this, nbd-client's persist mode (which tries to reconnect after
error, but not after manual disconnect) does not always work correctly.

This change fixes this by causing NBD_DO_IT to always return 0 if a user
requests a disconnect. This means that nbd-client can correctly either
persist the connection (if an error occurred) or disconnect (if the user
requested it).

Signed-off-by: Paul Clements
Acked-by: Rob Landley
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Clements
2013-07-04 07:08:05 +0800
9532f149e nbd: remove bogus BUG_ON in NBD_CLEAR_QUE ... Browse Code »

The NBD_CLEAR_QUE ioctl has been deprecated for quite some time (its job
is now done by two other ioctls). We should stop trying to make bogus
assertions in it. Also, user-level code should remove calls to
NBD_CLEAR_QUE, ASAP.

Signed-off-by: Michal Belczyk
Signed-off-by: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Belczyk
2013-07-04 07:08:05 +0800
ffc8b3086 block: do not pass disk names as format strings ... Browse Code »

Disk names may contain arbitrary strings, so they must not be
interpreted as format strings. It seems that only md allows arbitrary
strings to be used for disk names, but this could allow for a local
memory corruption from uid 0 into ring 0.

CVE-2013-2851

Signed-off-by: Kees Cook
Cc: Jens Axboe
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2013-07-04 07:07:25 +0800

01 May, 2013

1 commit

078be02b8 nbd: increase default and max request sizes ... Browse Code »

Raise the default max request size for nbd to 128KB (from 127KB) to get it
4KB aligned. This patch also allows the max request size to be increased
(via /sys/block/nbd/queue/max_sectors_kb) to 32MB.

The patch makes nbd network traffic more efficient by:
- reducing request fragmentation (4KB alignment)
- reducing the number of requests (fewer round trips, less network overhead)

Especially in high latency networks, larger request size can make a dramatic

Signed-off-by: Paul Clements
Signed-off-by: Michal Belczyk
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Belczyk
2013-05-01 08:04:07 +0800

28 Feb, 2013

4 commits

398eb0855 nbd: fix sparse warning ... Browse Code »

I just fixed this in "drivers/block/rbd.c" and I noticed that
"drivers/block/nbd.c" has the same problem. Fix a warning issued by
sparse by adding some lockdep annotations to indicate the queue lock gets
dropped (because it's held when do_nbd_request() is called) and
re-acquired within the function.

Signed-off-by: Alex Elder
Cc: Paul Clements
Cc: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alex Elder
2013-02-28 11:10:22 +0800
a83e814b5 nbd: show read-only state in sysfs ... Browse Code »

Pass the read-only flag to set_device_ro, so that it will be visible to
the block layer and in sysfs.

Signed-off-by: Paolo Bonzini
Cc: Paul Clements
Cc: Alex Bligh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paolo Bonzini
2013-02-28 11:10:22 +0800
3a2d63f87 nbd: fsync and kill block device on shutdown ... Browse Code »

There are two problems with shutdown in the NBD driver.

1: Receiving the NBD_DISCONNECT ioctl does not sync the filesystem.

This patch adds the sync operation into __nbd_ioctl()'s
NBD_DISCONNECT handler. This is useful because BLKFLSBUF is restricted
to processes that have CAP_SYS_ADMIN, and the NBD client may not
possess it (fsync of the block device does not sync the filesystem,
either).

2: Once we clear the socket we have no guarantee that later reads will
come from the same backing storage.

The patch adds calls to kill_bdev() in __nbd_ioctl()'s socket
clearing code so the page cache is cleaned, lest reads that hit on the
page cache will return stale data from the previously-accessible disk.

Example:

# qemu-nbd -r -c/dev/nbd0 /dev/sr0
# file -s /dev/nbd0
/dev/stdin: # UDF filesystem data (version 1.5) etc.
# qemu-nbd -d /dev/nbd0
# qemu-nbd -r -c/dev/nbd0 /dev/sda
# file -s /dev/nbd0
/dev/stdin: # UDF filesystem data (version 1.5) etc.

While /dev/sda has:

# file -s /dev/sda
/dev/sda: x86 boot sector; etc.

Signed-off-by: Paolo Bonzini
Acked-by: Paul Clements
Cc: Alex Bligh
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paolo Bonzini
2013-02-28 11:10:22 +0800
75f187aba nbd: support FLUSH requests ... Browse Code »

Currently, the NBD device does not accept flush requests from the Linux
block layer. If the NBD server opened the target with neither O_SYNC nor
O_DSYNC, however, the device will be effectively backed by a writeback
cache. Without issuing flushes properly, operation of the NBD device will
not be safe against power losses.

The NBD protocol has support for both a cache flush command and a FUA
command flag; the server will also pass a flag to note its support for
these features. This patch adds support for the cache flush command and
flag. In the kernel, we receive the flags via the NBD_SET_FLAGS ioctl,
and map NBD_FLAG_SEND_FLUSH to the argument of blk_queue_flush. When the
flag is active the block layer will send REQ_FLUSH requests, which we
translate to NBD_CMD_FLUSH commands.

FUA support is not included in this patch because all free software
servers implement it with a full fdatasync; thus it has no advantage over
supporting flush only. Because I [Paolo] cannot really benchmark it in a
realistic scenario, I cannot tell if it is a good idea or not. It is also
not clear if it is valid for an NBD server to support FUA but not flush.
The Linux block layer gives a warning for this combination, the NBD
protocol documentation says nothing about it.

The patch also fixes a small problem in the handling of flags: nbd->flags
must be cleared at the end of NBD_DO_IT, but the driver was not doing
that. The bug manifests itself as follows. Suppose you two different
client/server pairs to start the NBD device. Suppose also that the first
client supports NBD_SET_FLAGS, and the first server sends
NBD_FLAG_SEND_FLUSH; the second pair instead does neither of these two
things. Before this patch, the second invocation of NBD_DO_IT will use a
stale value of nbd->flags, and the second server will issue an error every
time it receives an NBD_CMD_FLUSH command.

This bug is pre-existing, but it becomes much more important after this
patch; flush failures make the device pretty much unusable, unlike

Signed-off-by: Paolo Bonzini
Signed-off-by: Alex Bligh
Acked-by: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alex Bligh
2013-02-28 11:10:22 +0800

23 Feb, 2013

1 commit

496ad9aa8 new helper: file_inode(file) ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:31 +0800

06 Oct, 2012

2 commits

a336d2987 nbd: handle discard requests ... Browse Code »

Add discard support to nbd. If the nbd-server supports discard, it will
send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag
in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards
for the device (QUEUE_FLAG_DISCARD).

If discard support is enabled, then when the nbd client system receives a
discard request, this will be passed along to the nbd-server. When the
discard request is received by the nbd-server, it will perform:

fallocate(.. FALLOC_FL_PUNCH_HOLE ..)

To punch a hole in the backend storage, which is no longer needed.

Signed-off-by: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Clements
2012-10-06 02:05:24 +0800
2f0125088 nbd: add set flags ioctl ... Browse Code »

Add a set-flags ioctl, allowing various option flags to be set on an nbd
device. This allows the nbd-client to set the device flags (to enable
read-only mode, or enable discard support, etc.).

Flags are typically specified by the nbd-server. During the negotiation
phase of the nbd connection, the server sends its flags to the client.
The client then uses NBD_SET_FLAGS to inform the kernel of the options.

Also included is a one-line fix to debug output for the set-timeout ioctl.

Signed-off-by: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Clements
2012-10-06 02:05:23 +0800

18 Sep, 2012

1 commit

fded4e090 nbd: clear waiting_queue on shutdown ... Browse Code »

Fix a serious but uncommon bug in nbd which occurs when there is heavy
I/O going to the nbd device while, at the same time, a failure (server,
network) or manual disconnect of the nbd connection occurs.

There is a small window between the time that the nbd_thread is stopped
and the socket is shutdown where requests can continue to be queued to
nbd's internal waiting_queue. When this happens, those requests are
never completed or freed.

The fix is to clear the waiting_queue on shutdown of the nbd device, in
the same way that the nbd request queue (queue_head) is already being
cleared.

Signed-off-by: Paul Clements
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Clements
2012-09-18 06:00:37 +0800

02 Aug, 2012

1 commit

eff0d13f3 Merge branch 'for-3.6/drivers' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block driver changes from Jens Axboe:

- Making the plugging support for drivers a bit more sane from Neil.
This supersedes the plugging change from Shaohua as well.

- The usual round of drbd updates.

- Using a tail add instead of a head add in the request completion for
ndb, making us find the most completed request more quickly.

- A few floppy changes, getting rid of a duplicated flag and also
running the floppy init async (since it takes forever in boot terms)
from Andi.

* 'for-3.6/drivers' of git://git.kernel.dk/linux-block:
floppy: remove duplicated flag FD_RAW_NEED_DISK
blk: pass from_schedule to non-request unplug functions.
block: stack unplug
blk: centralize non-request unplug handling.
md: remove plug_cnt feature of plugging.
block/nbd: micro-optimization in nbd request completion
drbd: announce FLUSH/FUA capability to upper layers
drbd: fix max_bio_size to be unsigned
drbd: flush drbd work queue before invalidate/invalidate remote
drbd: fix potential access after free
drbd: call local-io-error handler early
drbd: do not reset rs_pending_cnt too early
drbd: reset congestion information before reporting it in /proc/drbd
drbd: report congestion if we are waiting for some userland callback
drbd: differentiate between normal and forced detach
drbd: cleanup, remove two unused global flags
floppy: Run floppy initialization asynchronous

Linus Torvalds
2012-08-02 00:06:47 +0800

01 Aug, 2012

1 commit

7f338fe45 nbd: set SOCK_MEMALLOC for access to PFMEMALLOC reserves ... Browse Code »

Set SOCK_MEMALLOC on the NBD socket to allow access to PFMEMALLOC reserves
so pages backed by NBD, particularly if swap related, can be cleaned to
prevent the machine being deadlocked. It is still possible that the
PFMEMALLOC reserves get depleted resulting in deadlock but this can be
resolved by the administrator by increasing min_free_kbytes.

Signed-off-by: Mel Gorman
Cc: David Miller
Cc: Neil Brown
Cc: Peter Zijlstra
Cc: Mike Christie
Cc: Eric B Munson
Cc: Eric Dumazet
Cc: Sebastian Andrzej Siewior
Cc: Mel Gorman
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2012-08-01 09:42:46 +0800

31 Jul, 2012

1 commit

01ff5dbc0 block/nbd: micro-optimization in nbd request completion ... Browse Code »

Add in-flight cmds to the tail. That way while searching
(during request completion),we will always get a hit on the
first element.

Signed-off-by: Chetan Loke
Acked-by: Paul.Clements@steeleye.com
Signed-off-by: Jens Axboe

Chetan Loke
2012-07-31 14:47:13 +0800

29 Mar, 2012

3 commits

532bfc851 Merge branch 'akpm' (Andrew's patch-bomb) ... Browse Code »

Merge third batch of patches from Andrew Morton:
- Some MM stragglers
- core SMP library cleanups (on_each_cpu_mask)
- Some IPI optimisations
- kexec
- kdump
- IPMI
- the radix-tree iterator work
- various other misc bits.

"That'll do for -rc1. I still have ~10 patches for 3.4, will send
those along when they've baked a little more."

* emailed from Andrew Morton : (35 commits)
backlight: fix typo in tosa_lcd.c
crc32: add help text for the algorithm select option
mm: move hugepage test examples to tools/testing/selftests/vm
mm: move slabinfo.c to tools/vm
mm: move page-types.c from Documentation to tools/vm
selftests/Makefile: make `run_tests' depend on `all'
selftests: launch individual selftests from the main Makefile
radix-tree: use iterators in find_get_pages* functions
radix-tree: rewrite gang lookup using iterator
radix-tree: introduce bit-optimized iterator
fs/proc/namespaces.c: prevent crash when ns_entries[] is empty
nbd: rename the nbd_device variable from lo to nbd
pidns: add reboot_pid_ns() to handle the reboot syscall
sysctl: use bitmap library functions
ipmi: use locks on watchdog timeout set on reboot
ipmi: simplify locking
ipmi: fix message handling during panics
ipmi: use a tasklet for handling received messages
ipmi: increase KCS timeouts
ipmi: decrease the IPMI message transaction time in interrupt mode
...

Linus Torvalds
2012-03-29 08:19:28 +0800
f4507164e nbd: rename the nbd_device variable from lo to nbd ... Browse Code »

rename the nbd_device variable from "lo" to "nbd", since "lo" is just a name
copied from loop.c.

Signed-off-by: Wanlong Gao
Cc: Paul Clements
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wanlong Gao
2012-03-29 08:14:37 +0800
9ffc93f20 Remove all #inclusions of asm/system.h ... Browse Code »

Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:

perl -p -i -e 's!^#\s*include\s*.*\n!!' `grep -Irl '^#\s*include\s*' *`

Signed-off-by: David Howells

David Howells
2012-03-29 01:30:03 +0800

19 Aug, 2011

6 commits

548ef6cc2 nbd-replace-some-printk-with-dev_warn-and-dev_info-checkpatch-fixes ... Browse Code »

ERROR: code indent should use tabs where possible
#30: FILE: drivers/block/nbd.c:578:
+^I dev_info(disk_to_dev(lo->disk), "NBD_DISCONNECT\n");$

total: 1 errors, 0 warnings, 35 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
scripts/cleanfile

./patches/nbd-replace-some-printk-with-dev_warn-and-dev_info.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Paul Clements
Cc: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

Andrew Morton
2011-08-19 20:48:28 +0800
5eedf5415 nbd: replace some printk with dev_warn() and dev_info() ... Browse Code »

Signed-off-by: WANG Cong
Cc: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

WANG Cong
2011-08-19 20:48:28 +0800
7742ce4ab nbd: lower the loglevel of an error message ... Browse Code »

This is only an error, no need to use KERN_CRIT log level.

Signed-off-by: WANG Cong
Cc: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

WANG Cong
2011-08-19 20:48:28 +0800
7f1b90f99 nbd: replace printk KERN_ERR with dev_err() ... Browse Code »

Signed-off-by: WANG Cong
Cc: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

WANG Cong
2011-08-19 20:48:22 +0800
1695b87f7 nbd: replace sysfs_create_file() with device_create_file() ... Browse Code »

Signed-off-by: WANG Cong
Cc: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

WANG Cong
2011-08-19 20:48:21 +0800
25ac0c2b9 nbd: use task_pid_nr() to get current pid ... Browse Code »

Signed-off-by: WANG Cong
Cc: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

WANG Cong
2011-08-19 20:48:17 +0800

28 May, 2011

3 commits

5988ce239 nbd: adjust 'max_part' according to part_shift ... Browse Code »

The 'max_part' parameter determines how many partitions are supported
on each nbd device. However the actual number can be changed to the
power of 2 minus 1 form during the module initialization as
alloc_disk() is called with (1 << part_shift) for some reason.

So adjust 'max_part' also at least for consistency with loop and brd.
It is exported via sysfs already, and a user should check this value
after module loading if [s]he wants to use that number correctly
(i.e. fdisk or something).

Signed-off-by: Namhyung Kim
Cc: Laurent Vivier
Cc: Paul Clements
Signed-off-by: Jens Axboe

Namhyung Kim
2011-05-28 20:44:46 +0800
3b2710824 nbd: limit module parameters to a sane value ... Browse Code »

The 'max_part' parameter controls the number of maximum partition
a nbd device can have. However if a user specifies very large
value it would exceed the limitation of device minor number and
can cause a kernel oops (or, at least, produce invalid device
nodes in some cases).

In addition, specifying large 'nbds_max' value causes same
problem for the same reason.

On my desktop, following command results to the kernel bug:

$ sudo modprobe nbd max_part=100000
kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/nbd4/range
CPU 1
Modules linked in: nbd(+) bridge stp llc kvm_intel kvm asus_atk0110 sg sr_mod cdrom

Pid: 2522, comm: modprobe Tainted: G W 2.6.39-leonard+ #159 System manufacturer System Product Name/P5G41TD-M PRO
RIP: 0010:[] [] internal_create_group+0x2f/0x166
RSP: 0018:ffff8801009f1de8 EFLAGS: 00010246
RAX: 00000000ffffffef RBX: ffff880103920478 RCX: 00000000000a7bd3
RDX: ffffffff81a2dbe0 RSI: 0000000000000000 RDI: ffff880103920478
RBP: ffff8801009f1e38 R08: ffff880103920468 R09: ffff880103920478
R10: ffff8801009f1de8 R11: ffff88011eccbb68 R12: ffffffff81a2dbe0
R13: ffff880103920468 R14: 0000000000000000 R15: ffff880103920400
FS: 00007f3c49de9700(0000) GS:ffff88011f800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f3b7fe7c000 CR3: 00000000cd58d000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 2522, threadinfo ffff8801009f0000, task ffff8801009a93a0)
Stack:
ffff8801009f1e58 ffffffff812e8f6e ffff8801009f1e58 ffffffff812e7a80
ffff880000000010 ffff880103920400 ffff8801002fd0c0 ffff880103920468
0000000000000011 ffff880103920400 ffff8801009f1e48 ffffffff8115ab6a
Call Trace:
[] ? device_add+0x4f1/0x5e4
[] ? dev_set_name+0x41/0x43
[] sysfs_create_group+0x13/0x15
[] blk_trace_init_sysfs+0x14/0x16
[] blk_register_queue+0x4c/0xfd
[] add_disk+0xe4/0x29c
[] nbd_init+0x2ab/0x30d [nbd]
[] ? 0xffffffffa007dfff
[] do_one_initcall+0x7f/0x13e
[] sys_init_module+0xa1/0x1e3
[] system_call_fastpath+0x16/0x1b
Code: 41 57 41 56 41 55 41 54 53 48 83 ec 28 0f 1f 44 00 00 48 89 fb 41 89 f6 49 89 d4 48 85 ff 74 0b 85 f6 75 0b 48 83
7f 30 00 75 14 0b eb fe b9 ea ff ff ff 48 83 7f 30 00 0f 84 09 01 00 00 49
RIP [] internal_create_group+0x2f/0x166
RSP
---[ end trace 753285ffbf72c57c ]---

Signed-off-by: Namhyung Kim
Cc: Laurent Vivier
Cc: Paul Clements
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Namhyung Kim
2011-05-28 20:44:46 +0800
35fbf5bcf nbd: pass MSG_* flags to kernel_recvmsg() ... Browse Code »

Unlike kernel_sendmsg(), kernel_recvmsg() requires passing flags explicitly
via last parameter instead of struct msghdr.msg_flags. Therefore calls to
sock_xmit(lo, 0, ..., MSG_WAITALL) have not been processed properly by tcp
layer wrt. the flag. Fix it.

Signed-off-by: Namhyung Kim
Cc: Paul Clements
Signed-off-by: Jens Axboe

Namhyung Kim
2011-05-28 20:44:46 +0800

12 Feb, 2011

1 commit

de1f016f8 nbd: remove module-level ioctl mutex ... Browse Code »

Commit 2a48fc0ab242417 ("block: autoconvert trivial BKL users to private
mutex") replaced uses of the BKL in the nbd driver with mutex
operations. Since then, I've been been seeing these lock ups:

INFO: task qemu-nbd:16115 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-nbd D 0000000000000001 0 16115 16114 0x00000004
ffff88007d775d98 0000000000000082 ffff88007d775fd8 ffff88007d774000
0000000000013a80 ffff8800020347e0 ffff88007d775fd8 0000000000013a80
ffff880133730000 ffff880002034440 ffffea0004333db8 ffffffffa071c020
Call Trace:
[] __mutex_lock_slowpath+0xf7/0x180
[] mutex_lock+0x2b/0x50
[] nbd_ioctl+0x6c/0x1c0 [nbd]
[] blkdev_ioctl+0x230/0x730
[] block_ioctl+0x41/0x50
[] do_vfs_ioctl+0x93/0x370
[] sys_ioctl+0x81/0xa0
[] system_call_fastpath+0x16/0x1b

Instrumenting the nbd module's ioctl handler with some extra logging
clearly shows the NBD_DO_IT ioctl being invoked which is a long-lived
ioctl in the sense that it doesn't return until another ioctl asks the
driver to disconnect. However, that other ioctl blocks, waiting for the
module-level mutex that replaced the BKL, and then we're stuck.

This patch removes the module-level mutex altogether. It's clearly
wrong, and as far as I can see, it's entirely unnecessary, since the nbd
driver maintains per-device mutexes, and I don't see anything that would
require a module-level (or kernel-level, for that matter) mutex.

Signed-off-by: Soren Hansen
Acked-by: Serge Hallyn
Acked-by: Paul Clements
Cc: Arnd Bergmann
Cc: Jens Axboe
Cc: [2.6.37.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Soren Hansen
2011-02-12 08:12:20 +0800

05 Oct, 2010

1 commit

2a48fc0ab block: autoconvert trivial BKL users to private mutex ... Browse Code »

The block device drivers have all gained new lock_kernel
calls from a recent pushdown, and some of the drivers
were already using the BKL before.

This turns the BKL into a set of per-driver mutexes.
Still need to check whether this is safe to do.

file=$1
name=$2
if grep -q lock_kernel ${file} ; then
if grep -q 'include.*linux.mutex.h' ${file} ; then
sed -i '/include.*/d' ${file}
else
sed -i 's/include.*.*$/include /g' ${file}
fi
sed -i ${file} \
-e "/^#include.*linux.mutex.h/,$ {
1,/^$static\|int\|long$/ {
/^$static\|int\|long$/istatic DEFINE_MUTEX(${name}_mutex);

} }" \
-e "s/$un$*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
-e '/[ ]*cycle_kernel_lock();/d'
else
sed -i -e '/include.*\/d' ${file} \
-e '/cycle_kernel_lock()/d'
fi

Signed-off-by: Arnd Bergmann

Arnd Bergmann
2010-10-05 21:01:10 +0800

11 Aug, 2010

1 commit

2f9e825d3 Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
xen-blkfront: fix missing out label
blkdev: fix blkdev_issue_zeroout return value
block: update request stacking methods to support discards
block: fix missing export of blk_types.h
writeback: fix bad _bh spinlock nesting
drbd: revert "delay probes", feature is being re-implemented differently
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
drbd: Disable delay probes for the upcomming release
writeback: cleanup bdi_register
writeback: add new tracepoints
writeback: remove unnecessary init_timer call
writeback: optimize periodic bdi thread wakeups
writeback: prevent unnecessary bdi threads wakeups
writeback: move bdi threads exiting logic to the forker thread
writeback: restructure bdi forker loop a little
writeback: move last_active to bdi
writeback: do not remove bdi from bdi_list
writeback: simplify bdi code a little
writeback: do not lose wake-ups in bdi threads
...

Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.

Linus Torvalds
2010-08-11 06:22:42 +0800

08 Aug, 2010

2 commits

8a6cfeb6d block: push down BKL into .locked_ioctl ... Browse Code »

As a preparation for the removal of the big kernel
lock in the block layer, this removes the BKL
from the common ioctl handling code, moving it
into every single driver still using it.

Signed-off-by: Arnd Bergmann
Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Arnd Bergmann
2010-08-08 00:25:00 +0800
33659ebba block: remove wrappers for request type/flags ... Browse Code »

Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests. This allows much easier grepping for different request
types instead of unwinding through macros.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-08-08 00:17:56 +0800

19 Jul, 2010

1 commit

a2531293d update email address ... Browse Code »

pavel@suse.cz no longer works, replace it with working address.

Signed-off-by: Pavel Machek
Signed-off-by: Jiri Kosina

Pavel Machek
2010-07-19 16:56:54 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

22 Sep, 2009

1 commit

83d5cde47 const: make block_device_operations const ... Browse Code »

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-09-22 22:17:25 +0800

11 May, 2009

2 commits

9934c8c04 block: implement and enforce request peek/start/fetch ... Browse Code »

Till now block layer allowed two separate modes of request execution.
A request is always acquired from the request queue via
elv_next_request(). After that, drivers are free to either dequeue it
or process it without dequeueing. Dequeue allows elv_next_request()
to return the next request so that multiple requests can be in flight.

Executing requests without dequeueing has its merits mostly in
allowing drivers for simpler devices which can't do sg to deal with
segments only without considering request boundary. However, the
benefit this brings is dubious and declining while the cost of the API
ambiguity is increasing. Segment based drivers are usually for very
old or limited devices and as converting to dequeueing model isn't
difficult, it doesn't justify the API overhead it puts on block layer
and its more modern users.

Previous patches converted all block low level drivers to dequeueing
model. This patch completes the API transition by...

* renaming elv_next_request() to blk_peek_request()

* renaming blkdev_dequeue_request() to blk_start_request()

* adding blk_fetch_request() which is combination of peek and start

* disallowing completion of queued (not started) requests

* applying new API to all LLDs

Renamings are for consistency and to break out of tree code so that
it's apparent that out of tree drivers need updating.

[ Impact: block request issue API cleanup, no functional change ]

Signed-off-by: Tejun Heo
Cc: Rusty Russell
Cc: James Bottomley
Cc: Mike Miller
Cc: unsik Kim
Cc: Paul Clements
Cc: Tim Waugh
Cc: Geert Uytterhoeven
Cc: David S. Miller
Cc: Laurent Vivier
Cc: Jeff Garzik
Cc: Jeremy Fitzhardinge
Cc: Grant Likely
Cc: Adrian McMenamin
Cc: Stephen Rothwell
Cc: Bartlomiej Zolnierkiewicz
Cc: Borislav Petkov
Cc: Sergei Shtylyov
Cc: Alex Dubov
Cc: Pierre Ossman
Cc: David Woodhouse
Cc: Markus Lidel
Cc: Stefan Weinhuber
Cc: Martin Schwidefsky
Cc: Pete Zaitcev
Cc: FUJITA Tomonori
Signed-off-by: Jens Axboe

Tejun Heo
2009-05-11 15:52:18 +0800
1011c1b9f block: blk_rq_[cur_]_{sectors|bytes}() usage cleanup ... Browse Code »

With the previous changes, the followings are now guaranteed for all
requests in any valid state.

* blk_rq_sectors() == blk_rq_bytes() >> 9
* blk_rq_cur_sectors() == blk_rq_cur_bytes() >> 9

Clean up accessor usages. Notable changes are

* nbd,i2o_block: end_all used instead of explicit byte count
* scsi_lib: unnecessary conditional on request type removed

[ Impact: cleanup ]

Signed-off-by: Tejun Heo
Cc: Paul Clements
Cc: Pete Zaitcev
Cc: Alex Dubov
Cc: Markus Lidel
Cc: David Woodhouse
Cc: James Bottomley
Cc: Boaz Harrosh
Signed-off-by: Jens Axboe

Tejun Heo
2009-05-11 15:50:55 +0800