Eric Lee / smarc-fsl-linux-kernel

30 Oct, 2012

12 commits

a1ecac3b0 loop: Make explicit loop device destruction lazy

xfstests has always had random failures of tests due to loop devices
failing to be torn down and hence leaving filesystems that cannot be
unmounted. This causes test runs to immediately stop.

Over the past 6 or 7 years we've added hacks like explicit umount
-d commands for loop mounts, losetup -d after umount -d fails, etc,
but still the problems persist. Recently, the frequency of loop
related failures increased again to the point that xfstests 259 will
reliably fail with a stray loop device that was not torn down.

That is despite the fact the test is about as simple as it gets -
loop 5 or 6 times running mkfs.xfs with different parameters:

lofile=$(losetup -f)
losetup $lofile "$testfile"
"$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null || echo "mkfs failed!"
sync
losetup -d $lofile

And losetup -d $lofile is failing with EBUSY on 1-3 of these loops
every time the test is run.

Turns out that blkid is running simultaneously with losetup -d, and
so the detach sees an elevated reference count and fails with EBUSY. But why
is blkid running? It's obvious, isn't it? udev has decided to try
and find out what is on the block device as a result of a creation
notification. And it is racing with mkfs, so might still be scanning
the device when mkfs finishes and we try to tear it down.

So, make losetup -d force autoremove behaviour. That is, when the
last reference goes away, tear down the device. xfstests wants it
*gone*, not causing random teardown failures when we know that all
the operations the tests have specifically run on the device have
completed and are no longer referencing the loop device.
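
The change in behaviour can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel's loop driver: `model_loop`, `model_clr_fd_strict`, `model_clr_fd_lazy` and `model_release` are hypothetical names, and the struct tracks nothing but a reference count and an autoclear flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of a loop device; not kernel code. */
struct model_loop {
    int  refcnt;     /* open references (losetup, blkid, ...) */
    bool bound;      /* true while a backing file is attached */
    bool autoclear;  /* tear down when the last reference goes away */
};

/* Old behaviour: refuse to detach while anyone else holds the device. */
static int model_clr_fd_strict(struct model_loop *lo)
{
    if (lo->refcnt > 1)
        return -EBUSY;
    lo->bound = false;
    return 0;
}

/* Lazy behaviour: if busy, mark the device for autoclear and succeed. */
static int model_clr_fd_lazy(struct model_loop *lo)
{
    if (lo->refcnt > 1) {
        lo->autoclear = true;
        return 0;
    }
    lo->bound = false;
    return 0;
}

/* Dropping a reference performs the deferred teardown if requested. */
static void model_release(struct model_loop *lo)
{
    if (--lo->refcnt == 0 && lo->autoclear)
        lo->bound = false;
}
```

In this model a concurrent blkid holding a second reference no longer makes the detach fail; the device simply disappears once that reference is released.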

Signed-off-by: Dave Chinner
Signed-off-by: Jens Axboe

Dave Chinner
2012-10-30 15:37:31 +0800
4453bc88f mtip32xx:Added appropriate timeout value for secure erase

Added appropriate timeout value for secure erase based on identify device data

Signed-off-by: Asai Thambi S P
Signed-off-by: Selvan Mani
Signed-off-by: Jens Axboe

Selvan Mani
2012-10-30 15:37:27 +0800
1f999572f xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool

Changing the type of bdev parameters to be unsigned int :1, rather than bool.
This is more consistent with the types of other features in the block drivers.
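
The difference between the two declarations can be sketched as follows. The struct names are hypothetical and only the two field names come from the commit title; this is not the driver's actual structure.

```c
#include <stdbool.h>

/* Illustrative only: the same two features declared both ways. */
struct model_vbd_bool {
    bool flush_support;
    bool discard_secure;
};

struct model_vbd_bits {
    unsigned int flush_support:1;
    unsigned int discard_secure:1;
};

/* A bool normalises any non-zero value to true. */
static bool model_store_bool(unsigned int value)
{
    struct model_vbd_bool v = { 0 };
    v.flush_support = value;
    return v.flush_support;
}

/* A one-bit unsigned field keeps only the low bit of what is stored. */
static unsigned int model_store_bit(unsigned int value)
{
    struct model_vbd_bits v = { 0 };
    v.flush_support = value;
    return v.flush_support;
}
```

One consequence worth keeping in mind: a value such as 2 stored into the one-bit field reads back as 0, so callers of the bit-field variant would normally store `!!value`.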

Signed-off-by: Oliver Chick <oliver.chick@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Oliver Chick
2012-10-30 15:37:20 +0800
b7010ede4 cciss: select CONFIG_CHECK_SIGNATURE

The patch cciss-use-check_signature.patch in -mm tree introduced
a build error:

drivers/built-in.o: In function `CISS_signature_present':
drivers/block/cciss.c:4270: undefined reference to `check_signature'

Add missing CONFIG_CHECK_SIGNATURE to fix this issue.

Reported-by: Fengguang Wu
Signed-off-by: Akinobu Mita
Cc: Fengguang Wu
Cc: Mike Miller
Cc: Jens Axboe
Acked-by: "Stephen M. Cameron"
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

Akinobu Mita
2012-10-30 15:37:00 +0800
2541aa799 cciss: remove unneeded memset()

The memory returned by kzalloc() or kmem_cache_zalloc() has already been set
to zero, so remove the useless memset(0).

spatch with a semantic match was used to find this problem.
(http://coccinelle.lip6.fr/)
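
A userspace analogue of the pattern, using calloc() as a stand-in for the zeroing kernel allocators. The struct and function names are hypothetical; the point is only that the second clear is redundant.

```c
#include <stdlib.h>
#include <string.h>

struct model_cmd {
    int  tag;
    char buf[32];
};

/* Redundant: calloc() already returns zeroed memory. */
static struct model_cmd *model_alloc_redundant(void)
{
    struct model_cmd *c = calloc(1, sizeof(*c));
    if (c)
        memset(c, 0, sizeof(*c));
    return c;
}

/* Equivalent and shorter. */
static struct model_cmd *model_alloc(void)
{
    return calloc(1, sizeof(struct model_cmd));
}

/* Returns 1 if every byte of the object is zero. */
static int model_is_zeroed(const struct model_cmd *c)
{
    static const struct model_cmd zero;
    return memcmp(c, &zero, sizeof(zero)) == 0;
}
```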

Signed-off-by: Wei Yongjun
Cc: Mike Miller
Cc: Jens Axboe
Cc: Stephen M. Cameron
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

Wei Yongjun
2012-10-30 15:36:58 +0800
654dbef21 xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset

Use kmem_cache_zalloc() instead of kmem_cache_alloc() and memset().

spatch with a semantic match was used to find this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun
Signed-off-by: Konrad Rzeszutek Wilk

Wei Yongjun
2012-10-30 15:36:27 +0800
1a4ae43e4 floppy: remove dr, reuse drive on do_floppy_init

This is a small cleanup that may also make the error handling of
uninitialized disks more readable. We don't need a separate variable to
track allocated disks; remove dr and reuse the drive variable instead.

Signed-off-by: Herton Ronaldo Krzesinski
Signed-off-by: Jiri Kosina
Signed-off-by: Jens Axboe

Herton Ronaldo Krzesinski
2012-10-30 15:36:07 +0800
8d3ab4ebf floppy: use common function to check if floppies can be registered

The same checks to see if a drive can be or is registered are
repeated throughout the code. Factor the checks out into a common function
and replace the repeated checks with it.

Signed-off-by: Herton Ronaldo Krzesinski
Signed-off-by: Jiri Kosina
Signed-off-by: Jens Axboe

Herton Ronaldo Krzesinski
2012-10-30 15:34:25 +0800
d60e7ec18 floppy: properly handle failure on add_disk loop

On floppy initialization, if something failed inside the loop where we call
add_disk, there was no cleanup of previous iterations in the error
handling.
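
The general shape of such a fix is an unwind path that undoes every iteration that already succeeded. The sketch below is a userspace illustration, not the floppy driver: `model_add` and `model_del` are hypothetical stand-ins for the registration and removal calls, and they only record state in an array.

```c
#define MODEL_N_DRIVES 4

/* 1 while the drive is "registered" in this model. */
static int model_registered[MODEL_N_DRIVES];

/* Stand-in for registration; fails for the drive given by fail_at. */
static int model_add(int drive, int fail_at)
{
    if (drive == fail_at)
        return -1;
    model_registered[drive] = 1;
    return 0;
}

/* Stand-in for removal. */
static void model_del(int drive)
{
    model_registered[drive] = 0;
}

/* Register every drive; on failure undo the iterations that succeeded. */
static int model_init(int fail_at)
{
    int drive;

    for (drive = 0; drive < MODEL_N_DRIVES; drive++) {
        if (model_add(drive, fail_at) < 0)
            goto unwind;
    }
    return 0;

unwind:
    /* drive indexes the failed iteration; walk back over the earlier ones. */
    while (--drive >= 0)
        model_del(drive);
    return -1;
}
```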

Cc: stable@vger.kernel.org
Signed-off-by: Herton Ronaldo Krzesinski
Signed-off-by: Jiri Kosina
Signed-off-by: Jens Axboe

Herton Ronaldo Krzesinski
2012-10-30 15:34:25 +0800
238ab7846 floppy: do put_disk on current dr if blk_init_queue fails

If blk_init_queue fails, we do not call put_disk on the current dr
(dr is decremented first in the error handling loop).

Cc: stable@vger.kernel.org
Reviewed-by: Ben Hutchings
Signed-off-by: Herton Ronaldo Krzesinski
Signed-off-by: Jiri Kosina
Signed-off-by: Jens Axboe

Herton Ronaldo Krzesinski
2012-10-30 15:34:25 +0800
b54e1f888 floppy: don't call alloc_ordered_workqueue inside the alloc_disk loop

Since commit 070ad7e ("floppy: convert to delayed work and single-thread
wq"), we end up calling alloc_ordered_workqueue multiple times inside
the loop, which is surely not intended. Besides the leak, another side
effect in the current code is that if blk_init_queue fails, we end up
calling unregister_blkdev even though register_blkdev has not been called yet.

Move the allocation of floppy_wq before the loop and adjust the
code accordingly.

Cc: stable@vger.kernel.org # 3.5+
Acked-by: Vivek Goyal
Reviewed-by: Ben Hutchings
Signed-off-by: Herton Ronaldo Krzesinski
Signed-off-by: Jiri Kosina
Signed-off-by: Jens Axboe

Herton Ronaldo Krzesinski
2012-10-30 15:34:24 +0800
2911758f1 xen/blkback: Fix compile warning

drivers/block/xen-blkback/xenbus.c:260:5: warning: symbol 'xenvbd_sysfs_addif' was not declared. Should it be static?
drivers/block/xen-blkback/xenbus.c:284:6: warning: symbol 'xenvbd_sysfs_delif' was not declared. Should it be static?

Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2012-10-30 15:32:43 +0800

24 Oct, 2012

1 commit

b8977285e drivers/block: remove CONFIG_EXPERIMENTAL

This config item has not carried much meaning for a while now and is
almost always enabled by default. As agreed during the Linux kernel
summit, remove it.

CC: Greg Kroah-Hartman
CC: Asai Thambi S P
CC: Pete Zaitcev
CC: Cong Wang
CC: Jens Axboe
Signed-off-by: Kees Cook
Signed-off-by: Jens Axboe

Kees Cook
2012-10-24 04:30:38 +0800

11 Oct, 2012

1 commit

ce40be7a8 Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block

Pull block IO update from Jens Axboe:
"Core block IO bits for 3.7. Not a huge round this time, it contains:

- First series from Kent cleaning up and generalizing bio allocation
and freeing.

- WRITE_SAME support from Martin.

- Mikulas patches to prevent O_DIRECT crashes when someone changes
the block size of a device.

- Make bio_split() work on data-less bios (like trim/discards).

- A few other minor fixups."

Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew
Morton. It is due to the VM no longer using a prio-tree (see commit
6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree").

So make set_blocksize() use mapping_mapped() instead of open-coding the
internal VM knowledge that has changed.

* 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits)
block: makes bio_split support bio without data
scatterlist: refactor the sg_nents
scatterlist: add sg_nents
fs: fix include/percpu-rwsem.h export error
percpu-rw-semaphore: fix documentation typos
fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared
blockdev: turn a rw semaphore into a percpu rw semaphore
Fix a crash when block device is read and block size is changed at the same time
block: fix request_queue->flags initialization
block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
block: ioctl to zero block ranges
block: Make blkdev_issue_zeroout use WRITE SAME
block: Implement support for WRITE SAME
block: Consolidate command flag and queue limit checks for merges
block: Clean up special command handling logic
block/blk-tag.c: Remove useless kfree
block: remove the duplicated setting for congestion_threshold
block: reject invalid queue attribute values
block: Add bio_clone_bioset(), bio_clone_kmalloc()
block: Consolidate bio_alloc_bioset(), bio_kmalloc()
...

Linus Torvalds
2012-10-11 08:04:23 +0800

08 Oct, 2012

1 commit

7035cdf36 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull ceph updates from Sage Weil:
"The bulk of this pull is a series from Alex that refactors and cleans
up the RBD code to lay the groundwork for supporting the new image
format and evolving feature set. There are also some cleanups in
libceph, and for ceph there's fixed validation of file striping
layouts and a bugfix in the code handling a shrinking MDS cluster."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
ceph: avoid 32-bit page index overflow
ceph: return EIO on invalid layout on GET_DATALOC ioctl
rbd: BUG on invalid layout
ceph: propagate layout error on osd request creation
libceph: check for invalid mapping
ceph: convert to use le32_add_cpu()
ceph: Fix oops when handling mdsmap that decreases max_mds
rbd: update remaining header fields for v2
rbd: get snapshot name for a v2 image
rbd: get the snapshot context for a v2 image
rbd: get image features for a v2 image
rbd: get the object prefix for a v2 rbd image
rbd: add code to get the size of a v2 rbd image
rbd: lay out header probe infrastructure
rbd: encapsulate code that gets snapshot info
rbd: add an rbd features field
rbd: don't use index in __rbd_add_snap_dev()
rbd: kill create_snap sysfs entry
rbd: define rbd_dev_image_id()
rbd: define some new format constants
...

Linus Torvalds
2012-10-08 05:38:18 +0800

07 Oct, 2012

2 commits

dc92b1f9a Merge branch 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio changes from Rusty Russell:
"New workflow: same git trees pulled by linux-next get sent straight to
Linus. Git is awkward at shuffling patches compared with quilt or mq,
but that doesn't happen often once things get into my -next branch."

* 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (24 commits)
lguest: fix occasional crash in example launcher.
virtio-blk: Disable callback in virtblk_done()
virtio_mmio: Don't attempt to create empty virtqueues
virtio_mmio: fix off by one error allocating queue
drivers/virtio/virtio_pci.c: fix error return code
virtio: don't crash when device is buggy
virtio: remove CONFIG_VIRTIO_RING
virtio: add help to CONFIG_VIRTIO option.
virtio: support reserved vqs
virtio: introduce an API to set affinity for a virtqueue
virtio-ring: move queue_index to vring_virtqueue
virtio_balloon: not EXPERIMENTAL any more.
virtio-balloon: dependency fix
virtio-blk: fix NULL checking in virtblk_alloc_req()
virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path
virtio-blk: Add bio-based IO path for virtio-blk
virtio: console: fix error handling in init() function
tools: Fix pthread flag for Makefile of trace-agent used by virtio-trace
tools: Add guest trace agent as a user tool
virtio/console: Allocate scatterlist according to the current pipe size
...

Linus Torvalds
2012-10-07 20:04:56 +0800
f1c6872e4 Merge tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull ARM Xen support from Konrad Rzeszutek Wilk:

Features:
* Allow a Linux guest to boot as initial domain and as normal guests
on Xen on ARM (specifically ARMv7 with virtualization extensions). PV
console, block and network frontend/backends are working.
Bug-fixes:
* Fix compile linux-next fallout.
* Fix PVHVM bootup crashing.

The Xen-unstable hypervisor (which will become 4.3 in about 6 months) supports
ARMv7 platforms.

The goal in implementing this architecture is to exploit the hardware
as much as possible. That means use as little as possible of PV
operations (so no PV MMU) - and use existing PV drivers for I/Os
(network, block, console, etc). This is similar to how PVHVM guests
operate on the x86 platform nowadays - except that on ARM there is no need
for QEMU. The end result is that we share a lot of the generic Xen
drivers and infrastructure.

Details on how to compile/boot/etc are available at this Wiki:

http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions

and this blog has links to a technical discussion/presentations on the
overall architecture:

http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/

* tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits)
xen/xen_initial_domain: check that xen_start_info is initialized
xen: mark xen_init_IRQ __init
xen/Makefile: fix dom-y build
arm: introduce a DTS for Xen unprivileged virtual machines
MAINTAINERS: add myself as Xen ARM maintainer
xen/arm: compile netback
xen/arm: compile blkfront and blkback
xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree
xen/arm: receive Xen events on ARM
xen/arm: initialize grant_table on ARM
xen/arm: get privilege status
xen/arm: introduce CONFIG_XEN on ARM
xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM
xen/arm: Introduce xen_ulong_t for unsigned long
xen/arm: Xen detection and shared_info page mapping
docs: Xen ARM DT bindings
xen/arm: empty implementation of grant_table arch specific functions
xen/arm: sync_bitops
xen/arm: page.h definitions
xen/arm: hypercalls
...

Linus Torvalds
2012-10-07 06:13:01 +0800

06 Oct, 2012

21 commits

322c9ec00 aoe: update aoe-internal version number to 50

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:30 +0800
1ac9e6026 aoe: remove unused code

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:30 +0800
08b606235 aoe: make dynamic block minor numbers the default

Because udev use is so widespread, making the old static mapping the
default is too conservative, given the severe limitations it places on
usable AoE addresses. Storage virtualization and larger shelves have made
the old limitations too confining.

These changes make the dynamic block device minor numbers the default,
removing the limitations on usable AoE addresses.

The static arrangement is still available with aoe_dyndevs=0, and the
aoe-stat tool from the userland aoetools package, the user space
counterpart to the aoe driver, recognizes the case where there is a
mismatch between the minor number in sysfs and the minor number in a
special device file.

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:29 +0800
7159e969d aoe: update and specify AoE address guards and error messages

In general, specific is better when it comes to messages about AoE usage
problems. Also, explicit checks for the AoE broadcast addresses are
added.

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:29 +0800
4bcce1a35 aoe: retain static block device numbers for backwards compatibility

The old mapping between AoE target shelf and slot addresses and the block
device minor number is retained as a backwards-compatible feature, with a
new "aoe_dyndevs" module parameter available for enabling dynamic block
device minor numbers.

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:29 +0800
0c9662145 aoe: support more AoE addresses with dynamic block device minor numbers

The ATA over Ethernet protocol uses a major (shelf) and minor (slot)
address to identify a particular storage target. These changes remove an
artificial limitation the aoe driver imposes on the use of AoE addresses.
For example, without these changes, the slot address has a maximum of 15,
but users commonly use slot numbers much greater than that.

The AoE shelf and slot address space is often used sparsely. Instead of
using a static mapping between AoE addresses and the block device minor
number, the block device minor numbers are now allocated on demand.
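
Allocation on demand can be pictured as handing out the lowest free number from a pool, independent of the AoE address. The sketch below is a userspace illustration with hypothetical names and an arbitrary pool size; it is not taken from the aoe driver.

```c
#define MODEL_N_MINORS 64   /* illustrative pool size, not the driver's */

/* 1 while the corresponding minor is in use. */
static unsigned char model_used[MODEL_N_MINORS];

/* Hand out the lowest free minor, or -1 if the pool is exhausted. */
static int model_minor_get(void)
{
    for (int i = 0; i < MODEL_N_MINORS; i++) {
        if (!model_used[i]) {
            model_used[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Return a minor to the pool so a later target can reuse it. */
static void model_minor_put(int minor)
{
    if (minor >= 0 && minor < MODEL_N_MINORS)
        model_used[minor] = 0;
}
```

Because the number no longer encodes shelf and slot, a sparse address space costs only as many minors as there are targets actually in use.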

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:28 +0800
fea05a26c aoe: update copyright year in touched files

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:28 +0800
7392fbe5a aoe: update internal version number to 49

The internal version number of the aoe driver appears in a console message
when the driver loads and is usually obtained by the user with the
userland aoe-version tool, part of the aoetools.[1]

Although this patchset includes bugfixes backported from higher-numbered
versions published on the coraid.com website, it is a form of version 49.

1. http://aoetools.sourceforge.net/

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:27 +0800
b21faa25c aoe: remove unused code and add cosmetic improvements

This change removes some unused code and attempts to increase code
consistency.

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:27 +0800
1b86fda9a aoe: increase net_device reference count while using it

This change eliminates the danger that the user could rmmod the driver for
a network interface that is being used for AoE by the aoe driver.

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:27 +0800
64a80f5ac aoe: associate frames with the AoE storage target

In the driver code, "target" and aoetgt refer to a particular remote
interface on the AoE storage target. The latter is identified by its AoE
major and minor addresses. Commands that are being sent to an AoE storage
target {major, minor} can be sent or retransmitted to any of the remote
MAC addresses associated with the AoE storage target.

That is, frames are naturally associated with not an aoetgt (AoE major,
AoE minor, remote MAC address) but an aoedev (AoE major, AoE minor).
Making the code reflect that reality simplifies the driver, especially
when the path to a remote MAC address becomes unusable.

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:27 +0800
6583303c5 aoe: disallow unsupported AoE minor addresses

A guard is inserted to prevent AoE minor addresses (slot addresses) higher
than 15 from being used, as they are not yet supported by the driver.

There is a change coming that will allow the aoe driver to overcome this
limit by using system device minor numbers dynamically, but until then,
this guard prevents unexpected targets from being used by the driver when
AoE targets with high minor numbers are on the AoE network.
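
Why such a guard matters can be seen with a static mapping of the form shelf * 16 + slot. That formula is an assumption made for illustration (the text above only states that slots above 15 are unsupported), and the names below are hypothetical.

```c
/* Assumption: the static scheme reserves 16 slots (0..15) per shelf. */
#define MODEL_SLOTS_PER_SHELF 16

/* Map (shelf, slot) to a minor number, or return -1 for unsupported slots. */
static long model_static_minor(unsigned int shelf, unsigned int slot)
{
    if (slot >= MODEL_SLOTS_PER_SHELF)
        return -1;   /* would otherwise collide with a slot on the next shelf */
    return (long)shelf * MODEL_SLOTS_PER_SHELF + slot;
}
```

Without the check, shelf 0 slot 16 and shelf 1 slot 0 would both map to 16 under this assumed formula, so the driver could end up talking to a target the user did not expect.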

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:26 +0800
25f4d75ea aoe: do revalidation steps in order

The discovery process begins with an optional AoE config query command and
an AoE config query response. Normally when an aoe device is already
open, the config query response does not trigger an ATA identify device
command to be sent out, since the response contains storage capacity
information that, if changed, could surprise the user of the device.

The userland "aoe-revalidate" tool uses a character device to trigger an
AoE config query for a particular AoE storage target and an ATA device
identify command, even when the device is open.

This change causes the config query to go out first, reflecting the normal
discovery sequence. The responses could come back in any order, so this
change is fairly cosmetic.

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:26 +0800
d54d35ac6 aoe: failover remote interface based on aoe_deadsecs parameter

The aoe_deadsecs module parameter allows the user to specify a hard limit
on the number of seconds an AoE command can be retransmitted before the
AoE block device is considered to have failed.

Using aoe_deadsecs to determine when we try a different remote
interface helps to ensure that the hard limit is not reached before we've
tried to recover by sending to a different remote port.

As a data storage target, the AoE target is unambiguously identified by
its {major, minor} AoE address tuple, and an AoE target can have multiple
MAC addresses. However, note that "target" in the driver code and
comments means a {major, minor, MAC address} tuple, as in "somewhere to
send packets".

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:26 +0800
3f0f01337 aoe: use packets that work with the smallest-MTU local interface

Users with several network interfaces dedicated to AoE generally do not
configure them to support different-sized AoE data payloads on purpose.

For a given AoE target, there will be a set of local network interfaces
that can reach it. Using only the payload that will fit in the
smallest-sized MTU of all those local interfaces greatly simplifies the
driver, especially in failure scenarios.
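
The sizing rule can be sketched as: take the minimum MTU over the usable interfaces, subtract the protocol headers, and round down to whole sectors. The function names are hypothetical and the header overhead is left as a parameter, since the exact byte count is not given here.

```c
#include <stddef.h>

#define MODEL_SECTOR 512u

/* Smallest MTU among the interfaces; 0 if the list is empty. */
static unsigned int model_min_mtu(const unsigned int *mtu, size_t n)
{
    unsigned int min = 0;

    for (size_t i = 0; i < n; i++)
        if (i == 0 || mtu[i] < min)
            min = mtu[i];
    return min;
}

/* Whole sectors that fit in one frame after `overhead` header bytes. */
static unsigned int model_payload_sectors(unsigned int mtu, unsigned int overhead)
{
    if (mtu <= overhead)
        return 0;
    return (mtu - overhead) / MODEL_SECTOR;
}
```

With MTUs of 9000, 1500 and 9000 and an assumed 36 bytes of headers, every interface would be limited to 2 sectors per frame, which is the price paid for never having to resize a request when failing over.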

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:25 +0800
eb086ec59 aoe: use a kernel thread for transmissions

The dev_queue_xmit function needs to have interrupts enabled, so the
simplest way to get the locking right but still fulfill that requirement is
to use a process that can call dev_queue_xmit serially over queued
transmissions.

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:25 +0800
69cf2d85d aoe: become I/O request queue handler for increased user control

To allow users to choose an elevator algorithm for their particular
workloads, change from a make_request-style driver to an
I/O-request-queue-handler-style driver.

We have to do a couple of things that might be surprising. We manipulate
the page _count directly on the assumption that we still have no guarantee
that users of the block layer are prohibited from submitting bios
containing pages with zero reference counts.[1] If such a prohibition now
exists, I can get rid of the _count manipulation.

Just as before this patch, we keep track of the sk_buffs that the
network layer has not finished with yet and cap the resources we use with
a "pool" of skbs.[2]

Now that the block layer maintains the disk stats, the aoe driver's
diskstats function can go away.

1. https://lkml.org/lkml/2007/3/1/374
2. https://lkml.org/lkml/2007/7/6/241

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:25 +0800
896831f59 aoe: kernel thread handles I/O completions for simple locking

Make the frames the aoe driver uses to track the relationship between bios
and packets more flexible and detached, so that they can be passed to an
"aoe_ktio" thread for completion of I/O.

The frames are handled much like skbs, with a capped amount of
preallocation so that real-world use cases are likely to run smoothly and
degrade gracefully even under memory pressure.

Decoupling I/O completion from the receive path and serializing it in a
process makes it easier to think about the correctness of the locking in
the driver, especially in the case of a remote MAC address becoming
unusable.

[dan.carpenter@oracle.com: cleanup an allocation a bit]
Signed-off-by: Ed Cashin
Signed-off-by: Dan Carpenter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:24 +0800
3d5b06051 aoe: for performance support larger packet payloads

Add the ability to work with large packets composed of a number of
segments, using the scatter gather feature of the block layer (biovecs)
and the network layer (skb frag array). The motivation is the performance
gained by using a packet data payload greater than a page size and by
using the network card's scatter gather feature.

Users of the out-of-tree aoe driver already had these changes, but since
early 2011, they have complained of increased memory utilization and
higher CPU utilization during heavy writes.[1] The commit below appears
related, as it disables scatter gather on non-IP protocols inside the
harmonize_features function, even when the NIC supports sg.

commit f01a5236bd4b140198fbcc550f085e8361fd73fa
Author: Jesse Gross
Date: Sun Jan 9 06:23:31 2011 +0000

net offloading: Generalize netif_get_vlan_features().

With that regression in place, transmits always linearize sg AoE packets,
but in-kernel users did not have this patch. Before 2.6.38, though, these
changes were working to allow sg to increase performance.

1. http://www.spinics.net/lists/linux-mm/msg15184.html

Signed-off-by: Ed Cashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ed Cashin
2012-10-06 02:05:24 +0800
a336d2987 nbd: handle discard requests

Add discard support to nbd. If the nbd-server supports discard, it will
send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag
in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards
for the device (QUEUE_FLAG_DISCARD).

If discard support is enabled, then when the nbd client system receives a
discard request, this will be passed along to the nbd-server. When the
discard request is received by the nbd-server, it will perform:

fallocate(.. FALLOC_FL_PUNCH_HOLE ..)

to punch a hole in the backend storage for the range that is no longer needed.
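
On the server side, the call elided above might look roughly like the following. This is a hedged sketch, not nbd-server's code: `model_punch_hole` is a hypothetical helper, and combining FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE follows the fallocate(2) requirement that the two flags be used together.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/*
 * Deallocate [offset, offset + len) in fd without changing the file size.
 * Returns 0 on success or a negative errno, for example -EOPNOTSUPP on a
 * filesystem that cannot punch holes.
 */
static int model_punch_hole(int fd, off_t offset, off_t len)
{
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) < 0)
        return -errno;
    return 0;
}
```

A caller should treat a negative return as "discard not honoured" rather than as data loss; the file contents and size are unchanged in that case.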

Signed-off-by: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Clements
2012-10-06 02:05:24 +0800
2f0125088 nbd: add set flags ioctl

Add a set-flags ioctl, allowing various option flags to be set on an nbd
device. This allows the nbd-client to set the device flags (to enable
read-only mode, or enable discard support, etc.).

Flags are typically specified by the nbd-server. During the negotiation
phase of the nbd connection, the server sends its flags to the client.
The client then uses NBD_SET_FLAGS to inform the kernel of the options.

Also included is a one-line fix to debug output for the set-timeout ioctl.

Signed-off-by: Paul Clements
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Clements
2012-10-06 02:05:23 +0800

03 Oct, 2012

2 commits

437589a74 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.

The strategy is to push kuid_t and kgid_t values as far down into
subsystems and filesystems as reasonable, leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network,
and letting type-incompatibility compile errors (present when user
namespaces are enabled) guide me to find the issues.
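
The idea of a distinct id type with conversions only at the boundary can be imitated in userspace C. Everything below is a hypothetical model (a toy namespace that maps one contiguous range), not the kernel's kuid_t, make_kuid or from_kuid; it only shows why wrapping the number in a struct turns accidental mixing with plain integers into a compile error.

```c
#include <stdbool.h>

/* Internal id: a struct, so it cannot be assigned from or to an integer. */
typedef struct { unsigned int val; } model_kuid_t;

#define MODEL_INVALID_UID ((unsigned int)-1)

/* Toy namespace: maps ids [0, count) onto [first, first + count). */
struct model_user_ns {
    unsigned int first;
    unsigned int count;
};

/* Boundary in: namespace-relative uid -> internal id. */
static model_kuid_t model_make_kuid(const struct model_user_ns *ns,
                                    unsigned int uid)
{
    model_kuid_t k = { MODEL_INVALID_UID };

    if (uid < ns->count)
        k.val = ns->first + uid;
    return k;
}

/* Boundary out: internal id -> namespace-relative uid. */
static unsigned int model_from_kuid(const struct model_user_ns *ns,
                                    model_kuid_t k)
{
    if (k.val >= ns->first && k.val - ns->first < ns->count)
        return k.val - ns->first;
    return MODEL_INVALID_UID;
}

/* Comparisons go through a helper instead of == on raw integers. */
static bool model_uid_eq(model_kuid_t a, model_kuid_t b)
{
    return a.val == b.val;
}
```

Code in the middle of a subsystem only ever passes `model_kuid_t` around; an attempt to write `unsigned int uid = k;` fails to compile, which is the kind of error the message describes using as a guide.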

The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.

Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.

Ultimately there will be work to relax privilege checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe, allowing
root in a user namespace to do those things that today we only forbid to
non-root users because it will confuse suid root applications.

While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.

Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.

After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."

Fix up some fairly trivial conflicts in netfilter uid/gid logging code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid where appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in its hmac computation
...

Linus Torvalds
2012-10-03 02:11:09 +0800
033d9959e Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue changes from Tejun Heo:
"These are the workqueue updates for v3.7-rc1. A lot of activity this
round, including considerable API and behavior cleanups.

* delayed_work combines a timer and a work item. The handling of the
timer part has always been a bit clunky leading to confusing
cancelation API with weird corner-case behaviors. delayed_work is
updated to use new IRQ safe timer and cancelation now works as
expected.

* Another deficiency of delayed_work was lack of the counterpart of
mod_timer() which led to cancel+queue combinations or open-coded
timer+work usages. mod_delayed_work[_on]() are added.

These two delayed_work changes make delayed_work provide an interface
and behave like a timer that is executed in process context.

* A work item could be executed concurrently on multiple CPUs, which
is rather unintuitive and made flush_work() behavior confusing and
half-broken under certain circumstances. This problem doesn't
exist for non-reentrant workqueues. While non-reentrancy check
isn't free, the overhead is incurred only when a work item bounces
across different CPUs and even in simulated pathological scenario
the overhead isn't too high.

All workqueues are made non-reentrant. This removes the
distinction between flush_[delayed_]work() and
flush_[delayed_]work_sync(). The former is now as strong as the
latter and the specified work item is guaranteed to have finished
execution of any previous queueing on return.

* In addition to the various bug fixes, Lai redid and simplified CPU
hotplug handling significantly.

* Joonsoo introduced system_highpri_wq and used it during CPU
hotplug.

There are two merge commits - one to pull in IRQ safe timer from
tip/timers/core and the other to pull in CPU hotplug fixes from
wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."

Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.

Tejun pointed out a few of them, I fixed a couple more.

* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
workqueue: remove @delayed from cwq_dec_nr_in_flight()
workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
workqueue: use __cpuinit instead of __devinit for cpu callbacks
workqueue: rename manager_mutex to assoc_mutex
workqueue: WORKER_REBIND is no longer necessary for idle rebinding
workqueue: WORKER_REBIND is no longer necessary for busy rebinding
workqueue: reimplement idle worker rebinding
workqueue: deprecate __cancel_delayed_work()
workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
workqueue: use mod_delayed_work() instead of __cancel + queue
workqueue: use irqsafe timer for delayed_work
workqueue: clean up delayed_work initializers and add missing one
workqueue: make deferrable delayed_work initializer names consistent
workqueue: cosmetic whitespace updates for macro definitions
workqueue: deprecate system_nrt[_freezable]_wq
workqueue: deprecate flush[_delayed]_work_sync()
...

Linus Torvalds
2012-10-03 00:54:49 +0800