17 Feb, 2014

1 commit

  • Pull btrfs fixes from Chris Mason:
    "We have a small collection of fixes in my for-linus branch.

    The big thing that stands out is a revert of a new ioctl. No users of
    it have shipped in btrfs-progs yet, and Dave Sterba found a better way
    to export the information"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: use right clone root offset for compressed extents
    btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
    Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
    Btrfs: fix max_inline mount option
    Btrfs: fix a lockdep warning when cleaning up aborted transaction
    Revert "btrfs: add ioctl to export size of global metadata reservation"

    Linus Torvalds
     

16 Feb, 2014

2 commits

  • For non compressed extents, iterate_extent_inodes() gives us offsets
    that take into account the data offset from the file extent items, while
    for compressed extents it doesn't. Therefore we have to adjust them before
    placing them in a send clone instruction. Not doing this adjustment leads to
    the receiving end requesting a wrong file range from the clone ioctl,
    which results in file content that differs from the content in the
    original send root.
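
    A minimal sketch of the adjustment (illustrative only, not the literal
    patch; leaf, fi and backref_offset stand in for the values available in
    the send path):

    /* For compressed extents the offsets coming back from the backref walk
     * lack the file extent item's data offset, so add it back before
     * building the clone instruction. */
    u64 clone_offset = backref_offset;

    if (btrfs_file_extent_compression(leaf, fi) != BTRFS_COMPRESS_NONE)
            clone_offset += btrfs_file_extent_offset(leaf, fi);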

    Issue reproducible with the following excerpt from the test I made for
    xfstests:

    _scratch_mkfs
    _scratch_mount "-o compress-force=lzo"

    $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
    $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo

    $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1

    $XFS_IO_PROG -c "pwrite -S 0x3e -b 80000 200000 80000" $SCRATCH_MNT/foo
    $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
    $XFS_IO_PROG -c "pwrite -S 0xdc -b 10000 250000 10000" $SCRATCH_MNT/foo
    $XFS_IO_PROG -c "pwrite -S 0xff -b 10000 300000 10000" $SCRATCH_MNT/foo

    # will be used for incremental send to be able to issue clone operations
    $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap

    $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2

    $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
    $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
    -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
    $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
    -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2

    $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
    $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
    $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
    -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap

    _scratch_unmount
    _scratch_mkfs
    _scratch_mount

    $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
    $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full

    $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
    $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full

    $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
    $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     
  • bdev is NULL when a disk has disappeared and the filesystem is mounted
    with the degraded option

    stack trace
    ---------
    btrfs_sysfs_add_one+0x105/0x1c0 [btrfs]
    open_ctree+0x15f3/0x1fe0 [btrfs]
    btrfs_mount+0x5db/0x790 [btrfs]
    ? alloc_pages_current+0xa4/0x160
    mount_fs+0x34/0x1b0
    vfs_kern_mount+0x62/0xf0
    do_mount+0x22e/0xa80
    ? __get_free_pages+0x9/0x40
    ? copy_mount_options+0x31/0x170
    SyS_mount+0x7e/0xc0
    system_call_fastpath+0x16/0x1b
    ---------

    reproducer:
    -------
    mkfs.btrfs -draid1 -mraid1 /dev/sdc /dev/sdd
    (detach a disk)
    devmgt detach /dev/sdc [1]
    mount -o degraded /dev/sdd /btrfs
    -------

    [1] github.com/anajain/devmgt.git
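
    Roughly, the fix amounts to skipping such devices while building the
    sysfs entries (hedged sketch, not the exact hunk; dev and fs_devices as
    in the btrfs device-list walk):

    list_for_each_entry(dev, &fs_devices->devices, dev_list) {
            if (!dev->bdev)
                    continue;       /* missing device on a degraded mount */

            /* ... create the sysfs entries for dev->bdev as before ... */
    }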

    Signed-off-by: Anand Jain
    Tested-by: Hidetoshi Seto
    Signed-off-by: Chris Mason

    Anand Jain
     

15 Feb, 2014

6 commits

  • A user was running into errors from an NFS export of a subvolume that had a
    default subvol set. When we mount a default subvol we will use d_obtain_alias()
    to find an existing dentry for the subvolume in the case that the root subvol
    has already been mounted, or a dummy one is allocated in the case that the root
    subvol has not already been mounted. This allows us to connect the dentry later
    on if we wander into the path. However if we don't ever wander into the path we
    will keep DCACHE_DISCONNECTED set for a long time, which angers NFS. It doesn't
    appear to cause any problems but it is annoying nonetheless, so simply unset
    DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() to
    use d_materialise_unique() instead, which will make everything play nicely
    together and reconnect things if we wander into the default subvol path via
    a different route. With this patch I'm no longer getting the NFS errors when
    exporting a volume that has been mounted with a default subvol set. Thanks,
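
    The lookup side then ends up looking roughly like this (hedged sketch,
    error handling abbreviated):

    static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
                                       unsigned int flags)
    {
            struct inode *inode = btrfs_lookup_dentry(dir, dentry);

            if (IS_ERR(inode)) {
                    if (PTR_ERR(inode) == -ENOENT)
                            inode = NULL;           /* negative lookup */
                    else
                            return ERR_CAST(inode);
            }

            /* reconnects an existing (possibly disconnected) alias */
            return d_materialise_unique(dentry, inode);  /* was d_splice_alias() */
    }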

    cc: bfields@fieldses.org
    cc: ebiederm@xmission.com
    Signed-off-by: Josef Bacik
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Currently, the only mount option for max_inline that has any effect is
    max_inline=0. Any other value that is supplied to max_inline will be
    adjusted to a minimum of 4k. Since max_inline has an effective maximum
    of ~3900 bytes due to page size limitations, the current behaviour
    only has meaning for max_inline=0.

    This patch will allow the max_inline mount option to accept non-zero
    values as indicated in the documentation.
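
    A hedged sketch of the option handling after the change, with the value
    parsing elided (capping at the sector size instead of raising small
    values up to it):

    case Opt_max_inline:
            /* ... parse the value into info->max_inline ... */
            if (info->max_inline) {
                    info->max_inline = min_t(u64, info->max_inline,
                                             root->sectorsize);
            }
            break;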

    Signed-off-by: Mitch Harder
    Signed-off-by: Chris Mason

    Mitch Harder
     
  • Now that we have two spinlocks for the management of delayed refs,
    CONFIG_DEBUG_SPINLOCK=y helped me find this:

    [ 4723.413809] BUG: spinlock wrong CPU on CPU#1, btrfs-transacti/2258
    [ 4723.414882] lock: 0xffff880048377670, .magic: dead4ead, .owner: btrfs-transacti/2258, .owner_cpu: 2
    [ 4723.417146] CPU: 1 PID: 2258 Comm: btrfs-transacti Tainted: G W O 3.12.0+ #4
    [ 4723.421321] Call Trace:
    [ 4723.421872] [] dump_stack+0x54/0x74
    [ 4723.422753] [] spin_dump+0x8c/0x91
    [ 4723.424979] [] spin_bug+0x21/0x26
    [ 4723.425846] [] do_raw_spin_unlock+0x66/0x90
    [ 4723.434424] [] _raw_spin_unlock+0x27/0x40
    [ 4723.438747] [] btrfs_cleanup_one_transaction+0x35e/0x710 [btrfs]
    [ 4723.443321] [] btrfs_cleanup_transaction+0x104/0x570 [btrfs]
    [ 4723.444692] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
    [ 4723.450336] [] ? trace_hardirqs_on+0xd/0x10
    [ 4723.451332] [] transaction_kthread+0x22e/0x270 [btrfs]
    [ 4723.452543] [] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
    [ 4723.457833] [] kthread+0xea/0xf0
    [ 4723.458990] [] ? kthread_create_on_node+0x140/0x140
    [ 4723.460133] [] ret_from_fork+0x7c/0xb0
    [ 4723.460865] [] ? kthread_create_on_node+0x140/0x140
    [ 4723.496521] ------------[ cut here ]------------

    ----------------------------------------------------------------------

    The reason is that we get to call cond_resched_lock(&head_ref->lock) while
    still holding @delayed_refs->lock.

    This is different from __btrfs_run_delayed_refs(), where we do a
    drop-and-reacquire dance before and after actually processing the delayed refs.

    Here we never drop delayed_refs->lock, so others cannot add new delayed refs
    to head_ref, and cond_resched_lock(&head_ref->lock) is not necessary here.
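
    For illustration only, this is the problematic pattern that gets removed;
    cond_resched_lock() may drop the given lock and schedule, which is never
    valid while another spinlock is still held:

    spin_lock(&delayed_refs->lock);
    spin_lock(&head->lock);
    /* ... process the ref head ... */
    cond_resched_lock(&head->lock);  /* wrong: delayed_refs->lock is held */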

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     
  • This reverts commit 01e219e8069516cdb98594d417b8bb8d906ed30d.

    David Sterba found a different way to provide these features without adding a new
    ioctl. We haven't released any progs with this ioctl yet, so I'm taking this out
    for now until we finalize things.

    Signed-off-by: Chris Mason
    Signed-off-by: David Sterba
    CC: Jeff Mahoney

    Chris Mason
     
  • Pull two nfsd bugfixes from Bruce Fields.

    * 'for-3.14' of git://linux-nfs.org/~bfields/linux:
    lockd: send correct lock when granting a delayed lock.
    nfsd4: fix acl buffer overrun

    Linus Torvalds
     
  • Pull block IO fixes from Jens Axboe:
    "Second round of updates and fixes for 3.14-rc2. Most of this stuff
    has been queued up for a while. The notable exception is the blk-mq
    changes, which are naturally a bit more in flux still.

    The pull request contains:

    - Two bug fixes for the new immutable vecs, causing crashes with raid
    or swap. From Kent.

    - Various blk-mq tweaks and fixes from Christoph. A fix for
    integrity bio's from Nic.

    - A few bcache fixes from Kent and Darrick Wong.

    - xen-blk{front,back} fixes from David Vrabel, Matt Rushton, Nicolas
    Swenson, and Roger Pau Monne.

    - Fix for a vec miscount with integrity vectors from Martin.

    - Minor annotations or fixes from Masanari Iida and Rashika Kheria.

    - Tweak to null_blk to do more normal FIFO processing of requests
    from Shlomo Pongratz.

    - Elevator switching bypass fix from Tejun.

    - Softlockup in blkdev_issue_discard() fix when !CONFIG_PREEMPT from
    me"

    * 'for-linus' of git://git.kernel.dk/linux-block: (31 commits)
    block: add cond_resched() to potentially long running ioctl discard loop
    xen-blkback: init persistent_purge_work work_struct
    blk-mq: pair blk_mq_start_request / blk_mq_requeue_request
    blk-mq: dont assume rq->errors is set when returning an error from ->queue_rq
    block: Fix cloning of discard/write same bios
    block: Fix type mismatch in ssize_t_blk_mq_tag_sysfs_show
    blk-mq: rework flush sequencing logic
    null_blk: use blk_complete_request and blk_mq_complete_request
    virtio_blk: use blk_mq_complete_request
    blk-mq: rework I/O completions
    fs: Add prototype declaration to appropriate header file include/linux/bio.h
    fs: Mark function as static in fs/bio-integrity.c
    block/null_blk: Fix completion processing from LIFO to FIFO
    block: Explicitly handle discard/write same segments
    block: Fix nr_vecs for inline integrity vectors
    blk-mq: Add bio_integrity setup to blk_mq_make_request
    blk-mq: initialize sg_reserved_size
    blk-mq: handle dma_drain_size
    blk-mq: divert __blk_put_request for MQ ops
    blk-mq: support at_head inserations for blk_execute_rq
    ...

    Linus Torvalds
     

14 Feb, 2014

1 commit

  • If an NFS client attempts to get a lock (using NLM) and the lock is
    not available, the server will remember the request and when the lock
    becomes available it will send a GRANT request to the client to
    provide the lock.

    If the client already held an adjacent lock, the GRANT callback will
    report the union of the existing and new locks, which can confuse the
    client.

    This happens because __posix_lock_file (called by vfs_lock_file)
    updates the passed-in file_lock structure when adjacent or
    over-lapping locks are found.

    To avoid this problem we take a copy of the two fields that can
    be changed (fl_start and fl_end) before the call and restore them
    afterwards.
    An alternative would be to allocate a 'struct file_lock', initialise it,
    use locks_copy_lock() to take a copy, then call locks_release_private()
    after the vfs_lock_file() call. But that is a lot more work.
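
    A hedged sketch of the save/restore described above (lock and file as in
    the NLM grant path):

    /* vfs_lock_file() can mangle fl_start and fl_end; keep them unchanged
     * for the GRANT callback. */
    loff_t fl_start = lock->fl.fl_start;
    loff_t fl_end = lock->fl.fl_end;
    int error;

    error = vfs_lock_file(file->f_file, F_SETLK, &lock->fl, NULL);

    lock->fl.fl_start = fl_start;
    lock->fl.fl_end = fl_end;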

    Reported-by: Olaf Kirch
    Signed-off-by: NeilBrown
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    --
    v1 had a couple of issues (large on-stack struct and didn't really work properly).
    This version is much better tested.
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

12 Feb, 2014

1 commit

  • Commit 4ac7249ea5a0ceef9f8269f63f33cc873c3fac61 ("nfsd: use get_acl and
    ->set_acl") forgets to set the size in the case where get_acl() succeeds, so
    _posix_to_nfsv4_one() can then write past the end of its allocation.
    The symptoms were slab corruption warnings.

    Also, some minor cleanup while we're here. (Among other things, note
    that the first few lines guarantee that pacl is non-NULL.)

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

11 Feb, 2014

10 commits

  • Immutable biovecs changed the way bio segments are treated in such a way that
    bio_for_each_segment() cannot now do what we want for discard/write same bios,
    since bi_size means something completely different for them.

    Fortunately discard and write same bios never have more than a single biovec, so
    bio_for_each_segment() is unnecessary and not terribly meaningful for them, but
    we still have to special-case them in a few places.
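
    The special casing in the segment-handling paths looks roughly like this
    (hedged sketch):

    if (bio->bi_rw & (REQ_DISCARD | REQ_WRITE_SAME))
            return 1;       /* at most a single biovec; don't walk segments */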

    Signed-off-by: Kent Overstreet
    Tested-by: Richard W.M. Jones
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Merge misc fixes from Andrew Morton:
    "A bunch of fixes"

    * emailed patches from Andrew Morton:
    ocfs2: check existence of old dentry in ocfs2_link()
    ocfs2: update inode size after zeroing the hole
    ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size
    mm/memory-failure.c: move refcount only in !MF_COUNT_INCREASED
    smp.h: fix x86+cpu.c sparse warnings about arch nonboot CPU calls
    mm: fix page leak at nfs_symlink()
    slub: do not assert not having lock in removing freed partial
    gitignore: add all.config
    ocfs2: fix ocfs2_sync_file() if filesystem is readonly
    drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero
    fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem
    xen: properly account for _PAGE_NUMA during xen pte translations
    mm/slub.c: list_lock may not be held in some circumstances
    drivers/md/bcache/extents.c: use %zi to format size_t
    vmcore: prevent PT_NOTE p_memsz overflow during header update
    drivers/message/i2o/i2o_config.c: fix deadlock in compat_ioctl(I2OGETIOPS)
    Documentation/: update 00-INDEX files
    checkpatch: fix detection of git repository
    get_maintainer: fix detection of git repository
    drivers/misc/sgi-gru/grukdump.c: unlocking should be conditional in gru_dump_context()

    Linus Torvalds
     
  • The linkat system call first calls user_path_at() to check the existence of
    the old dentry, and then calls vfs_link()->ocfs2_link() to do the actual
    work. A race can occur when node A creates a hard link to a file while
    node B removes it.

    Node A                                  Node B
    user_path_at()
     -> ocfs2_lookup(),
        find that the old dentry exists
                                            rm the file, adding its inode
                                            (say inodeA) to the orphan_dir

    call ocfs2_link(), creating a
    hard link for inodeA

                                            rm the link, adding inodeA to the
                                            orphan_dir again

    When the orphan_scan work starts, it calls ocfs2_queue_orphans() to do the
    main work. It first traverses the entries in the orphan_dir, linking all
    inodes in this orphan_dir into a list that looks like this:

    inodeA->inodeB->...->inodeA

    When traversing this list, it falls into an infinite loop, calling iput()
    again and again, and finally triggers BUG_ON(inode->i_state & I_CLEAR).
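
    A minimal sketch of the kind of re-check the fix adds once ocfs2_link()
    holds the cluster lock (illustrative; the real locking and error paths
    are omitted):

    /* Re-check that the source inode was not unlinked (and orphaned) by
     * another node while we raced with it. */
    if (inode->i_nlink == 0) {
            err = -ENOENT;
            goto out_unlock;
    }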

    Signed-off-by: joyce
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • fs-writeback releases, without the page lock, dirty pages whose offsets are
    beyond the inode size; the release happens in block_write_full_page_endio().
    If the inode size is not updated, dirty pages in file holes may be released
    before being flushed to disk, the file holes will then contain non-zero
    data, and this causes sparse file md5sum errors.

    To reproduce the bug, find a big sparse file with many holes, such as a VM
    image file; its actual size should be bigger than the available memory so
    that writeback runs more frequently. Tar it with the -S option, then
    repeatedly untar it and check its md5sum again and again until you get a
    wrong md5sum.

    Signed-off-by: Junxiao Bi
    Cc: Younger Liu
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • The issue scenario is as follows:

    - Create a small file and fallocate a large amount of disk space for it with
    the FALLOC_FL_KEEP_SIZE option.

    - ftruncate the file back to the original size again, but the free disk
    space is not given back. This is a real bug that is fixed in this
    patch.

    In order to solve the issue above, we modified ocfs2_setattr() so that, if
    attr->ia_size != i_size_read(inode), it calls ocfs2_truncate_file() and
    truncates the disk space to attr->ia_size.
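
    A minimal userspace reproducer of the scenario above (sketch; the mount
    point path is hypothetical and error checking is omitted):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("/mnt/ocfs2/testfile", O_CREAT | O_RDWR, 0644);

            write(fd, "x", 1);                               /* small file */
            fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 1 << 30);  /* large prealloc */
            ftruncate(fd, 1);    /* same i_size: the prealloc should be freed */
            close(fd);
            return 0;
    }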

    Signed-off-by: Younger Liu
    Reviewed-by: Jie Liu
    Tested-by: Jie Liu
    Cc: Joel Becker
    Reviewed-by: Mark Fasheh
    Cc: Sunil Mushran
    Reviewed-by: Jensen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Younger Liu
     
  • Changes in commit a0b8cab3b9b2 ("mm: remove lru parameter from
    __pagevec_lru_add and remove parts of pagevec API") have introduced a
    call to add_to_page_cache_lru() which causes a leak in nfs_symlink() as
    now the page gets an extra refcount that is not dropped.

    Jan Stancek observed and reported the leak effect while running test8
    from Connectathon Testsuite. After several iterations over the test
    case, which creates several symlinks on a NFS mountpoint, the test
    system was quickly getting into an out-of-memory scenario.

    This patch fixes the page leak by dropping that extra refcount
    add_to_page_cache_lru() is grabbing.
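
    A hedged sketch of the symlink path with the extra reference dropped
    (3.14-era page cache API):

    if (!add_to_page_cache_lru(page, dentry->d_inode->i_mapping, 0,
                               GFP_KERNEL)) {
            SetPageUptodate(page);
            unlock_page(page);
            /* add_to_page_cache_lru() took its own reference, so drop the
             * one we got when allocating the page */
            page_cache_release(page);
    } else
            __free_page(page);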

    Signed-off-by: Jan Stancek
    Signed-off-by: Rafael Aquini
    Acked-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Jeff Layton
    Cc: Trond Myklebust
    Cc: [3.11.x+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • If the filesystem is readonly, there is no need to flush the drive's caches
    or force out any uncommitted transactions.
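
    A hedged sketch of the early return (helper names as used elsewhere in
    ocfs2):

    if (ocfs2_is_hard_readonly(osb) || ocfs2_is_soft_readonly(osb))
            return -EROFS;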

    [akpm@linux-foundation.org: return -EROFS, not 0]
    Signed-off-by: Younger Liu
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Younger Liu
     
  • Recently, due to a spike in connections per second, memcached on 3
    separate boxes triggered the OOM killer from accept. At the time the
    OOM killer was triggered there was 4GB out of 36GB free in zone 1. The
    problem was that alloc_fdtable was making an order 3 allocation (32KiB) to
    hold a bitmap, and there was sufficient fragmentation that the largest
    page available was 8KiB.

    I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
    but I do agree that order 3 allocations are very likely to succeed.

    There are always pathologies where order > 0 allocations can fail when
    there are copious amounts of free memory available. Using the pigeon
    hole principle it is easy to show that it requires 1 page more than 50%
    of the pages being free to guarantee an order 1 (8KiB) allocation will
    succeed, 1 page more than 75% of the pages being free to guarantee an
    order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
    the pages being free to guarantee an order 3 allocate will succeed.

    A server that churns memory with a lot of small requests and replies, like
    memcached, is a common case, and if anything it will skew the odds against
    large pages being available.

    Therefore let's not give external applications a practical way to kill
    Linux server applications, and specify __GFP_NORETRY for the kmalloc in
    alloc_fdmem. Unless I am misreading the code, by the time we reach
    should_alloc_retry in __alloc_pages_slowpath (where __GFP_NORETRY becomes
    significant) we have already tried everything reasonable to allocate a
    page and the only thing left to do is wait. So not waiting and falling
    back to vmalloc immediately seems like the reasonable thing to do, even if
    there weren't a chance of triggering the OOM killer.
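
    The resulting helper looks roughly like this (hedged sketch of fs/file.c,
    not a verbatim copy):

    static void *alloc_fdmem(size_t size)
    {
            /*
             * Large allocations can stress page reclaim, so fall back to
             * vmalloc() if the request is bigger than the VM considers cheap.
             */
            if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
                    void *data = kmalloc(size,
                                    GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
                    if (data != NULL)
                            return data;
            }
            return vmalloc(size);
    }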

    Signed-off-by: "Eric W. Biederman"
    Cc: Eric Dumazet
    Acked-by: David Rientjes
    Cc: Cong Wang
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Currently, update_note_header_size_elf64() and
    update_note_header_size_elf32() will add the size of a PT_NOTE entry to
    real_sz even if that causes real_sz to exceeds max_sz. This patch
    corrects the while loop logic in those routines to ensure that does not
    happen and prints a warning if a PT_NOTE entry is dropped. If zero
    PT_NOTE entries are found or this condition is encountered because the
    only entry was dropped, a warning is printed and an error is returned.
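
    A hedged sketch of the corrected loop for the 64-bit case:

    while (nhdr_ptr->n_namesz != 0) {
            sz = sizeof(Elf64_Nhdr) +
                 ((nhdr_ptr->n_namesz + 3) & ~3) +
                 ((nhdr_ptr->n_descsz + 3) & ~3);
            if ((real_sz + sz) > max_sz) {
                    pr_warn("Exceeded p_memsz, dropping PT_NOTE entry n_namesz=0x%x, n_descsz=0x%x\n",
                            nhdr_ptr->n_namesz, nhdr_ptr->n_descsz);
                    break;
            }
            real_sz += sz;
            nhdr_ptr = (Elf64_Nhdr *)((char *)nhdr_ptr + sz);
    }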

    One possible negative side effect of exceeding the max_sz limit is an
    allocation failure in merge_note_headers_elf64() or
    merge_note_headers_elf32() which would produce console output such as
    the following while booting the crash kernel.

    vmalloc: allocation failure: 14076997632 bytes
    swapper/0: page allocation failure: order:0, mode:0x80d2
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-gbp1 #7
    Call Trace:
    dump_stack+0x19/0x1b
    warn_alloc_failed+0xf0/0x160
    __vmalloc_node_range+0x19e/0x250
    vmalloc_user+0x4c/0x70
    merge_note_headers_elf64.constprop.9+0x116/0x24a
    vmcore_init+0x2d4/0x76c
    do_one_initcall+0xe2/0x190
    kernel_init_freeable+0x17c/0x207
    kernel_init+0xe/0x180
    ret_from_fork+0x7c/0xb0

    Kdump: vmcore not initialized

    kdump: dump target is /dev/sda4
    kdump: saving to /sysroot//var/crash/127.0.0.1-2014.01.28-13:58:52/
    kdump: saving vmcore-dmesg.txt
    Cannot open /proc/vmcore: No such file or directory
    kdump: saving vmcore-dmesg.txt failed
    kdump: saving vmcore
    kdump: saving vmcore failed

    This type of failure has been seen on a four socket prototype system
    with certain memory configurations. Most PT_NOTE sections have a single
    entry similar to:

    n_namesz = 0x5
    n_descsz = 0x150
    n_type = 0x1

    Occasionally, a second entry is encountered with very large n_namesz and
    n_descsz sizes:

    n_namesz = 0x80000008
    n_descsz = 0x510ae163
    n_type = 0x80000008

    We are not yet sure of the source of these extra entries; they seem bogus,
    but they shouldn't cause the crash dump to fail.

    Signed-off-by: Greg Pearson
    Acked-by: Vivek Goyal
    Cc: HATAYAMA Daisuke
    Cc: Michael Holzheu
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Pearson
     
  • Pull CIFS fixes from Steve French:
    "Small fix from Jeff for writepages leak, and some fixes for ACLs and
    xattrs when SMB2 enabled.

    Am expecting another fix from Jeff and at least one more fix (for
    mounting SMB2 with cifsacl) in the next week"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    [CIFS] clean up page array when uncached write send fails
    cifs: use a flexarray in cifs_writedata
    retrieving CIFS ACLs when mounted with SMB2 fails dropping session
    Add protocol specific operation for CIFS xattrs

    Linus Torvalds
     

10 Feb, 2014

4 commits

  • Pull vfs fixes from Al Viro:
    "A couple of fixes, both -stable fodder. The O_SYNC bug is fairly
    old..."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fix a kmap leak in virtio_console
    fix O_SYNC|O_APPEND syncing the wrong range on write()

    Linus Torvalds
     
  • Mark bio_integrity_tag() as static in bio-integrity.c because it is not
    used outside this file.

    This eliminates the following warnings in bio-integrity.c:
    fs/bio-integrity.c:224:5: warning: no previous prototype for ‘bio_integrity_tag’ [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: Jens Axboe

    Rashika Kheria
     
  • It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support),
    when sync_page_range() was introduced; generic_file_write{,v}() correctly
    synced
        pos_after_write - written .. pos_after_write - 1
    but generic_file_aio_write() synced
        pos_before_write .. pos_before_write + written - 1
    instead, which is not the same thing with O_APPEND, obviously.
    A couple of years later the correct variant was killed off when
    everything switched to generic_file_aio_write().

    All users of generic_file_aio_write() are affected, and the same bug
    has been copied into other instances of ->aio_write().

    The fix is trivial; the only subtle point is that generic_write_sync()
    ought to be inlined to avoid calculations useless for the majority of
    calls.
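
    A hedged sketch of the corrected caller side (inside generic_file_aio_write(),
    after the write has advanced iocb->ki_pos):

    written = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
    if (written > 0) {
            /* sync the range actually written, measured back from the
             * post-write position; correct even for O_APPEND */
            ssize_t err = generic_write_sync(file, iocb->ki_pos - written,
                                             written);
            if (err < 0)
                    written = err;
    }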

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull btrfs fixes from Chris Mason:
    "This is a small collection of fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix data corruption when reading/updating compressed extents
    Btrfs: don't loop forever if we can't run because of the tree mod log
    btrfs: reserve no transaction units in btrfs_ioctl_set_features
    btrfs: commit transaction after setting label and features
    Btrfs: fix assert screwup for the pending move stuff

    Linus Torvalds
     

09 Feb, 2014

7 commits

  • When using a mix of compressed file extents and prealloc extents, it
    is possible to fill a page of a file with random, garbage data from
    some unrelated previous use of the page, instead of a sequence of zeroes.

    A simple sequence of steps to get into such case, taken from the test
    case I made for xfstests, is:

    _scratch_mkfs
    _scratch_mount "-o compress-force=lzo"
    $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
    $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
    $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
    $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

    This results in the following file items in the fs tree:

    item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
    inode generation 6 transid 6 size 542872 block group 0 mode 100600
    item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
    inode ref index 2 namelen 6 name: foobar
    item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
    extent data disk byte 0 nr 0 gen 6
    extent data offset 0 nr 24576 ram 266240
    extent compression 0
    item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
    prealloc data disk byte 12849152 nr 241664 gen 6
    prealloc data offset 0 nr 241664
    item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
    extent data disk byte 12845056 nr 4096 gen 6
    extent data offset 0 nr 20480 ram 20480
    extent compression 2
    item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
    prealloc data disk byte 13090816 nr 405504 gen 6
    prealloc data offset 0 nr 258048

    The on-disk extent at offset 266240 (which corresponds to a single disk block)
    contains 5 compressed chunks of file data. Each of the first 4 compresses 4096
    bytes of file data, while the last one compresses only 3024 bytes of file data.
    Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 =
    1072 bytes) should always return zeroes (our next extent is a prealloc one).

    The solution here is for the compression code path to zero the remaining
    (untouched) bytes of the last page it uncompressed data into, since the upper
    layer, fs/btrfs/extent_io.c:__do_readpage(), does not know how much space the
    file data consumes in that last page. __do_readpage() was already correctly
    zeroing the remainder of the page, but only when it is the last page of the
    inode and the inode's size is not a multiple of the page size.

    This caused not only random data to be returned on reads, but also random data
    to be stored permanently when updating parts of the region that should be
    zeroed. For the example above, updating a single byte in the region
    [285648 ; 286720[ would store that byte correctly but also store random data
    on disk.

    A test case for xfstests follows soon.
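
    A hedged sketch of the idea in the decompression path (dest_page, bytes
    and destlen are placeholder names for the values available there):

    /* Zero whatever tail of the last destination page the decompressed
     * data did not cover. */
    if (bytes < destlen) {
            char *kaddr = kmap_atomic(dest_page);

            memset(kaddr + bytes, 0, destlen - bytes);
            kunmap_atomic(kaddr);
    }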

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     
  • A user reported a 100% cpu hang with my new delayed ref code. Turns out I
    forgot to increase the count check when we can't run a delayed ref because of
    the tree mod log. If we can't run any delayed refs during this there is no
    point in continuing to look, and we need to break out. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • The superblock modifications added in the patch "btrfs: add ioctls to
    query/change feature bits online" do not need to reserve metadata blocks
    when starting a transaction.
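
    In other words (hedged sketch):

    /* a superblock-only change needs no metadata item reservation */
    trans = btrfs_start_transaction(root, 0);
    if (IS_ERR(trans))
            return PTR_ERR(trans);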

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     
  • The set_fslabel ioctl uses btrfs_end_transaction, which means it's
    possible that the change will be lost if the system crashes, same for
    the newly set features. Let's use btrfs_commit_transaction instead.
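
    That is, roughly (hedged sketch):

    /* make the new label / feature bits durable right away */
    ret = btrfs_commit_transaction(trans, root);  /* instead of btrfs_end_transaction() */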

    Signed-off-by: Jeff Mahoney
    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    Jeff Mahoney
     
  • Wang noticed that he was failing btrfs/030 even though Filipe and I couldn't
    reproduce it. Turns out this is because Wang didn't have CONFIG_BTRFS_ASSERT
    set, which meant that a key part of Filipe's original patch was not being
    built in. This appears to be a mix-up from merging Filipe's patch, as the
    problem does not exist in his original patch.
    asserts that it did not error and take the function out of the ifdef check.
    This makes btrfs/030 pass with the assert on or off. Thanks,

    Signed-off-by: Josef Bacik
    Reviewed-by: Filipe Manana
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Pull jfs fix from David Kleikamp:
    "Fix regression"

    * tag 'jfs-3.14-rc2' of git://github.com/kleikamp/linux-shaggy:
    jfs: fix generic posix ACL regression

    Linus Torvalds
     
  • I missed a couple of errors in reviewing the patches converting jfs
    to use the generic posix ACL functions. Setting ACLs currently
    fails with -EOPNOTSUPP.

    Signed-off-by: Dave Kleikamp
    Reported-by: Michael L. Semon
    Reviewed-by: Christoph Hellwig

    Dave Kleikamp
     

08 Feb, 2014

6 commits

  • In the event that a send fails in an uncached write, or we end up
    needing to reissue it (-EAGAIN case), we'll kfree the wdata but
    the pages currently leak.

    Fix this by adding a new kref release routine for uncached writedata
    that releases the pages, and have the uncached codepaths use that.
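
    A hedged sketch of such a release routine, used via kref_put() on the
    uncached write paths:

    static void cifs_uncached_writedata_release(struct kref *refcount)
    {
            int i;
            struct cifs_writedata *wdata = container_of(refcount,
                                            struct cifs_writedata, refcount);

            /* drop the page references the uncached write path took */
            for (i = 0; i < wdata->nr_pages; i++)
                    put_page(wdata->pages[i]);
            cifs_writedata_release(refcount);
    }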

    [original patch by Jeff modified to fix minor formatting problems]

    Signed-off-by: Jeff Layton
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Steve French
     
  • The cifs_writedata code uses a single element trailing array, which
    just adds unneeded complexity. Use a flexarray instead.

    Signed-off-by: Jeff Layton
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Jeff Layton
     
  • Pull driver core fix from Greg KH:
    "Here is a single kernfs fix to resolve a much-reported lockdep issue
    with the removal of entries in sysfs"

    * tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag

    Linus Torvalds
     
  • Commit 9f060e2231ca changed the way we handle allocations for the
    integrity vectors. When the vectors are inline there is no associated
    slab and consequently bvec_nr_vecs() returns 0. Ensure that we check
    against BIP_INLINE_VECS in that case.

    Reported-by: David Milburn
    Tested-by: David Milburn
    Cc: stable@vger.kernel.org # v3.10+
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • The get/set ACL xattr support for CIFS ACLs attempts to send old
    cifs dialect protocol requests even when mounted with SMB2 or later
    dialects. Sending cifs requests on an smb2 session causes problems -
    the server drops the session due to the illegal request.

    This patch makes CIFS ACL operations protocol specific to fix that.

    Attempting to query/set CIFS ACLs for SMB2 will now return
    EOPNOTSUPP (until we add worker routines for sending query
    ACL requests via SMB2) instead of sending invalid (cifs)
    requests.

    A separate follow-on patch will be needed to fix cifs_acl_to_fattr
    (which takes a cifs-specific u16 fid and so can't be abstracted
    to work with SMB2 until that is changed) and to fix mount
    problems when "cifsacl" is specified on mount
    with e.g. vers=2.1

    Signed-off-by: Steve French
    Reviewed-by: Shirish Pargaonkar
    CC: Stable

    Steve French
     
  • Changeset 666753c3ef8fc88b0ddd5be4865d0aa66428ac35 added protocol
    operations for get/setxattr to avoid calling cifs operations
    on smb2/smb3 mounts for xattr operations, and this changeset
    adds the calls to the cifs-specific protocol operations for xattrs
    (in order to re-enable cifs support for xattrs, which was
    temporarily disabled by the previous changeset). We do not
    have SMB2/SMB3 worker functions for setting xattrs yet, so
    this only enables it for cifs.

    CCing stable since without these two small changesets (the
    small corequisite 666753c3ef8fc88b0ddd5be4865d0aa66428ac35 is
    also needed) calling getfattr/setfattr on smb2/smb3 mounts
    causes problems.

    Signed-off-by: Steve French
    Reviewed-by: Shirish Pargaonkar
    CC: Stable

    Steve French
     

07 Feb, 2014

2 commits

  • Using spin_{un}lock_irq is dangerous if the caller has already disabled
    interrupts. During aio buffer migration, we have a possibility to see the
    following call stack.

    aio_migratepage [disable interrupt]
    migrate_page_copy
    clear_page_dirty_for_io
    set_page_dirty
    __set_page_dirty_buffers
    __set_page_dirty
    spin_lock_irq

    This means the current aio migration path can deadlock. spin_lock_irqsave
    is a safer alternative and we should use it.
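
    A hedged sketch of the safer locking in __set_page_dirty():

    static void __set_page_dirty(struct page *page, struct address_space *mapping,
                                 int warn)
    {
            unsigned long flags;

            /* irqsave/irqrestore: safe even if the caller, e.g.
             * aio_migratepage(), already has interrupts disabled */
            spin_lock_irqsave(&mapping->tree_lock, flags);
            /* ... mark the page dirty in the radix tree as before ... */
            spin_unlock_irqrestore(&mapping->tree_lock, flags);
    }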

    Signed-off-by: KOSAKI Motohiro
    Reported-by: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Even if using the same jbd2 handle, we cannot roll back a transaction.
    So once an error occurs after clusters have been successfully allocated,
    the allocated clusters will never be used, which means they are lost. For
    example, ocfs2_claim_clusters() can succeed when expanding a file, but
    ocfs2_insert_extent() may then fail. So we need to free the allocated
    clusters if they are indeed not used.

    Signed-off-by: Zongxun Wang
    Signed-off-by: Joseph Qi
    Acked-by: Joel Becker
    Cc: Mark Fasheh
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zongxun Wang