Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

20 Apr, 2014

1 commit

404ca80eb coredump: fix va_list corruption ... Browse Code »
5

A va_list needs to be copied in case it needs to be used twice.

Thanks to Hugh for debugging this issue, leading to various panics.

Tested:

lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern

'produce_core' is simply : main() { *(int *)0 = 1;}

lpq84:~# ./produce_core
Segmentation fault (core dumped)
lpq84:~# dmesg | tail -1
[ 614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed

Notice the last argument was replaced by a NULL (we were lucky enough to
not crash, but do not try this on your production machine !)

After fix :

lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
lpq83:~# ./produce_core
Segmentation fault
lpq83:~# dmesg | tail -1
[ 740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed

Fixes: 5fe9d8ca21cc ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
Signed-off-by: Eric Dumazet
Diagnosed-by: Hugh Dickins
Acked-by: Oleg Nesterov
Cc: Neil Horman
Cc: Andrew Morton
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Linus Torvalds

Eric Dumazet
2014-04-20 04:23:31 +0800

19 Apr, 2014

2 commits

6e66d5dab Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull cifs fixes from Steve French:
"A set of 5 small cifs fixes"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cif: fix dead code
cifs: fix error handling cifs_user_readv
fs: cifs: remove unused variable.
Return correct error on query of xattr on file with empty xattrs
cifs: Wait for writebacks to complete before attempting write.

Linus Torvalds
2014-04-19 08:52:39 +0800
60fbf2bda Merge tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull driver core fixes from Greg KH:
"Here are some driver core fixes for 3.15-rc2. Also in here are some
documentation updates, as well as an API removal that had to wait for
after -rc1 due to the cleanups coming into you from multiple developer
trees (this one and the PPC tree.)

All have been in linux next successfully"

* tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers/base/dd.c incorrect pr_debug() parameters
Documentation: Update stable address in Chinese and Japanese translations
topology: Fix compilation warning when not in SMP
Chinese: add translation of io_ordering.txt
stable_kernel_rules: spelling/word usage
sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()
kernfs: protect lazy kernfs_iattrs allocation with mutex
fs: Don't return 0 from get_anon_bdev

Linus Torvalds
2014-04-19 07:59:52 +0800

17 Apr, 2014

14 commits

1f80c0cc3 cif: fix dead code ... Browse Code »

This issue was found by Coverity (CID 1202536)

This proposes a fix for a statement that creates dead code.
The "rc < 0" statement is within code that is run
with "rc > 0".

It seems like "err < 0" was meant to be used here.
This way, the error code is returned by the function.

Signed-off-by: Michael Opdenacker
Acked-by: Al Viro
Signed-off-by: Steve French

Michael Opdenacker
2014-04-17 12:08:57 +0800
bae9f746a cifs: fix error handling cifs_user_readv ... Browse Code »

Coverity says:

*** CID 1202537: Dereference after null check (FORWARD_NULL)
/fs/cifs/file.c: 2873 in cifs_user_readv()
2867 cur_len = min_t(const size_t, len - total_read, cifs_sb->rsize);
2868 npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
2869
2870 /* allocate a readdata struct */
2871 rdata = cifs_readdata_alloc(npages,
2872 cifs_uncached_readv_complete);
>>> CID 1202537: Dereference after null check (FORWARD_NULL)
>>> Comparing "rdata" to null implies that "rdata" might be null.
2873 if (!rdata) {
2874 rc = -ENOMEM;
2875 goto error;
2876 }
2877
2878 rc = cifs_read_allocate_pages(rdata, npages);

...when we "goto error", rc will be non-zero, and then we end up trying
to do a kref_put on the rdata (which is NULL). Fix this by replacing
the "goto error" with a "break".

Reported-by:
Signed-off-by: Jeff Layton
Signed-off-by: Steve French

Jeff Layton
2014-04-17 11:54:30 +0800
330033d69 xfs: fix tmpfile/selinux deadlock and initialize security ... Browse Code »
13

xfstests generic/004 reproduces an ilock deadlock using the tmpfile
interface when selinux is enabled. This occurs because
xfs_create_tmpfile() takes the ilock and then calls d_tmpfile(). The
latter eventually calls into xfs_xattr_get() which attempts to get the
lock again. E.g.:

xfs_io D ffffffff81c134c0 4096 3561 3560 0x00000080
ffff8801176a1a68 0000000000000046 ffff8800b401b540 ffff8801176a1fd8
00000000001d5800 00000000001d5800 ffff8800b401b540 ffff8800b401b540
ffff8800b73a6bd0 fffffffeffffffff ffff8800b73a6bd8 ffff8800b5ddb480
Call Trace:
[] schedule+0x29/0x70
[] rwsem_down_read_failed+0xc5/0x120
[] ? xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
[] call_rwsem_down_read_failed+0x14/0x30
[] ? down_read_nested+0x89/0xa0
[] ? xfs_ilock+0x122/0x250 [xfs]
[] xfs_ilock+0x122/0x250 [xfs]
[] xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
[] xfs_attr_get+0x90/0xe0 [xfs]
[] xfs_xattr_get+0x37/0x50 [xfs]
[] generic_getxattr+0x4f/0x70
[] inode_doinit_with_dentry+0x1ae/0x650
[] selinux_d_instantiate+0x1c/0x20
[] security_d_instantiate+0x1b/0x30
[] d_instantiate+0x50/0x70
[] d_tmpfile+0xb5/0xc0
[] xfs_create_tmpfile+0x362/0x410 [xfs]
[] xfs_vn_tmpfile+0x18/0x20 [xfs]
[] path_openat+0x228/0x6a0
[] ? sched_clock+0x9/0x10
[] ? kvm_clock_read+0x27/0x40
[] ? __alloc_fd+0xaf/0x1f0
[] do_filp_open+0x3a/0x90
[] ? _raw_spin_unlock+0x27/0x40
[] ? __alloc_fd+0xaf/0x1f0
[] do_sys_open+0x12e/0x210
[] SyS_open+0x1e/0x20
[] system_call_fastpath+0x16/0x1b

xfs_vn_tmpfile() also fails to initialize security on the newly created
inode.

Pull the d_tmpfile() call up into xfs_vn_tmpfile() after the transaction
has been committed and the inode unlocked. Also, initialize security on
the inode based on the parent directory provided via the tmpfile call.

Signed-off-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Brian Foster
2014-04-17 06:15:30 +0800
8d6c12101 xfs: fix buffer use after free on IO error ... Browse Code »

When testing exhaustion of dm snapshots, the following appeared
with CONFIG_DEBUG_OBJECTS_FREE enabled:

ODEBUG: free active (active state 0) object type: work_struct hint: xfs_buf_iodone_work+0x0/0x1d0 [xfs]

indicating that we'd freed a buffer which still had a pending reference,
down this path:

[ 190.867975] [] debug_check_no_obj_freed+0x22b/0x270
[ 190.880820] [] kmem_cache_free+0xd0/0x370
[ 190.892615] [] xfs_buf_free+0xe4/0x210 [xfs]
[ 190.905629] [] xfs_buf_rele+0xe7/0x270 [xfs]
[ 190.911770] [] xfs_trans_read_buf_map+0x7b6/0xac0 [xfs]

At issue is the fact that if IO fails in xfs_buf_iorequest,
we'll queue completion unconditionally, and then call
xfs_buf_rele; but if IO failed, there are no IOs remaining,
and xfs_buf_rele will free the bp while work is still queued.

Fix this by not scheduling completion if the buffer has
an error on it; run it immediately. The rest is only comment
changes.

Thanks to dchinner for spotting the root cause.

Signed-off-by: Eric Sandeen
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Eric Sandeen
2014-04-17 06:15:28 +0800
07d5035a2 xfs: wrong error sign conversion during failed DIO writes ... Browse Code »

We negate the error value being returned from a generic function
incorrectly. The code path that it is running in returned negative
errors, so there is no need to negate it to get the correct error
signs here.

This was uncovered by generic/019.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2014-04-17 06:15:27 +0800
9c23eccc1 xfs: unmount does not wait for shutdown during unmount ... Browse Code »

And interesting situation can occur if a log IO error occurs during
the unmount of a filesystem. The cases reported have the same
signature - the update of the superblock counters fails due to a log
write IO error:

XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1
XFS (dm-16): Log I/O Error Detected. Shutting down filesystem
XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
XFS (dm-16): xfs_log_force: error 5 returned.
XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)

It can be seen that the last line of output contains a corrupt
device name - this is because the log and xfs_mount structures have
already been freed by the time this message is printed. A kernel
oops closely follows.

The issue is that the shutdown is occurring in a separate IO
completion thread to the unmount. Once the shutdown processing has
started and all the iclogs are marked with XLOG_STATE_IOERROR, the
log shutdown code wakes anyone waiting on a log force so they can
process the shutdown error. This wakes up the unmount code that
is doing a synchronous transaction to update the superblock
counters.

The unmount path now sees all the iclogs are marked with
XLOG_STATE_IOERROR and so never waits on them again, knowing that if
it does, there will not be a wakeup trigger for it and we will hang
the unmount if we do. Hence the unmount runs through all the
remaining code and frees all the filesystem structures while the
xlog_iodone() is still processing the shutdown. When the log
shutdown processing completes, xfs_do_force_shutdown() emits the
"Please umount the filesystem and rectify the problem(s)" message,
and xlog_iodone() then aborts all the objects attached to the iclog.
An iclog that has already been freed....

The real issue here is that there is no serialisation point between
the log IO and the unmount. We have serialisations points for log
writes, log forces, reservations, etc, but we don't actually have
any code that wakes for log IO to fully complete. We do that for all
other types of object, so why not iclogbufs?

Well, it turns out that we can easily do this. We've got xfs_buf
handles, and that's what everyone else uses for IO serialisation.
i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
release the lock in xlog_iodone() when we are finished with the
buffer. That way before we tear down the iclog, we can lock and
unlock the buffer to ensure IO completion has finished completely
before we tear it down.

Signed-off-by: Dave Chinner
Tested-by: Mike Snitzer
Tested-by: Bob Mastors
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Dave Chinner
2014-04-17 06:15:26 +0800
d39a2ced0 xfs: collapse range is delalloc challenged ... Browse Code »

FSX has been detecting data corruption after to collapse range
calls. The key observation is that the offset of the last extent in
the file was not being shifted, and hence when the file size was
adjusted it was truncating away data because the extents handled
been correctly shifted.

Tracing indicated that before the collapse, the extent list looked
like:

....
ino 0x5788 state idx 6 offset 26 block 195904 count 10 flag 0
ino 0x5788 state idx 7 offset 39 block 195917 count 35 flag 0
ino 0x5788 state idx 8 offset 86 block 195964 count 32 flag 0

and after the shift of 2 blocks:

ino 0x5788 state idx 6 offset 24 block 195904 count 10 flag 0
ino 0x5788 state idx 7 offset 37 block 195917 count 35 flag 0
ino 0x5788 state idx 8 offset 86 block 195964 count 32 flag 0

Note that the last extent did not change offset. After the changing
of the file size:

ino 0x5788 state idx 6 offset 24 block 195904 count 10 flag 0
ino 0x5788 state idx 7 offset 37 block 195917 count 35 flag 0
ino 0x5788 state idx 8 offset 86 block 195964 count 30 flag 0

You can see that the last extent had it's length truncated,
indicating that we've lost data.

The reason for this is that the xfs_bmap_shift_extents() loop uses
XFS_IFORK_NEXTENTS() to determine how many extents are in the inode.
This, unfortunately, doesn't take into account delayed allocation
extents - it's a count of physically allocated extents - and hence
when the file being collapsed has a delalloc extent like this one
does prior to the range being collapsed:

....
ino 0x5788 state idx 4 offset 11 block 4503599627239429 count 1 flag 0
....

it gets the count wrong and terminates the shift loop early.

Fix it by using the in-memory extent array size that includes
delayed allocation extents to determine the number of extents on the
inode.

Signed-off-by: Dave Chinner
Tested-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2014-04-17 06:15:25 +0800
0e1f789d0 xfs: don't map ranges that span EOF for direct IO ... Browse Code »

Al Viro tracked down the problem that has caused generic/263 to fail
on XFS since the test was introduced. If is caused by
xfs_get_blocks() mapping a single extent that spans EOF without
marking it as buffer-new() so that the direct IO code does not zero
the tail of the block at the new EOF. This is a long standing bug
that has been around for many, many years.

Because xfs_get_blocks() starts the map before EOF, it can't set
buffer_new(), because that causes he direct IO code to also zero
unaligned sectors at the head of the IO. This would overwrite valid
data with zeros, and hence we cannot validly return a single extent
that spans EOF to direct IO.

Fix this by detecting a mapping that spans EOF and truncate it down
to EOF. This results in the the direct IO code doing the right thing
for unaligned data blocks before EOF, and then returning to get
another mapping for the region beyond EOF which XFS treats correctly
by setting buffer_new() on it. This makes direct Io behave correctly
w.r.t. tail block zeroing beyond EOF, and fsx is happy about that.

Again, thanks to Al Viro for finding what I couldn't.

[ dchinner: Fix for __divdi3 build error:

Reported-by: Paul Gortmaker
Tested-by: Paul Gortmaker
Signed-off-by: Mark Tinguely
Reviewed-by: Eric Sandeen
]

Signed-off-by: Dave Chinner
Tested-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2014-04-17 06:15:19 +0800
33ac1257f sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner() ... Browse Code »

All device_schedule_callback_owner() users are converted to use
device_remove_file_self(). Remove now unused
{sysfs|device}_schedule_callback_owner().

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-04-17 02:56:33 +0800
4afddd60a kernfs: protect lazy kernfs_iattrs allocation with mutex ... Browse Code »
5

kernfs_iattrs is allocated lazily when operations which require it
take place; unfortunately, the lazy allocation and returning weren't
properly synchronized and when there are multiple concurrent
operations, it might end up returning kernfs_iattrs which hasn't
finished initialization yet or different copies to different callers.

Fix it by synchronizing with a mutex. This can be smarter with memory
barriers but let's go there if it actually turns out to be necessary.

Signed-off-by: Tejun Heo
Link: http://lkml.kernel.org/g/533ABA32.9080602@oracle.com
Reported-by: Sasha Levin
Cc: stable@vger.kernel.org # 3.14
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-04-17 02:54:40 +0800
a2a4dc494 fs: Don't return 0 from get_anon_bdev ... Browse Code »
5

Commit 9e30cc9595303b27b48 removed an internal mount. This
has the side-effect that rootfs now has FSID 0. Many
userspace utilities assume that st_dev in struct stat
is never 0, so this change breaks a number of tools in
early userspace.

Since we don't know how many userspace programs are affected,
make sure that FSID is at least 1.

References: http://article.gmane.org/gmane.linux.kernel/1666905
References: http://permalink.gmane.org/gmane.linux.utilities.util-linux-ng/8557
Cc: 3.14
Signed-off-by: Thomas Bächler
Acked-by: Tejun Heo
Acked-by: H. Peter Anvin
Tested-by: Alexandre Demers
Signed-off-by: Greg Kroah-Hartman

Thomas Bächler
2014-04-17 02:53:08 +0800
8e3ecc876 fs: cifs: remove unused variable. ... Browse Code »

In SMB2_set_compression(), the "res_key" variable is only initialized to NULL
and later kfreed. It is therefore useless and should be removed.

Found with the following semantic patch:

@@
identifier foo;
identifier f;
type T;
@@
* f(...) {
...
* T *foo = NULL;
... when forall
when != foo
* kfree(foo);
...
}

Signed-off-by: Cyril Roelandt
Signed-off-by: Steve French

Cyril Roelandt
2014-04-17 02:51:46 +0800
60977fcc8 Return correct error on query of xattr on file with empty xattrs ... Browse Code »

xfstest 020 detected a problem with cifs xattr handling. When a file
had an empty xattr list, we returned success (with an empty xattr value)
on query of particular xattrs rather than returning ENODATA.
This patch fixes it so that query of an xattr returns ENODATA when the
xattr list is empty for the file.

Signed-off-by: Steve French
Reviewed-by: Jeff Layton

Steve French
2014-04-17 02:51:46 +0800
c11f1df50 cifs: Wait for writebacks to complete before attempting write. ... Browse Code »
21

Problem reported in Red Hat bz 1040329 for strict writes where we cache
only when we hold oplock and write direct to the server when we don't.

When we receive an oplock break, we first change the oplock value for
the inode in cifsInodeInfo->oplock to indicate that we no longer hold
the oplock before we enqueue a task to flush changes to the backing
device. Once we have completed flushing the changes, we return the
oplock to the server.

There are 2 ways here where we can have data corruption
1) While we flush changes to the backing device as part of the oplock
break, we can have processes write to the file. These writes check for
the oplock, find none and attempt to write directly to the server.
These direct writes made while we are flushing from cache could be
overwritten by data being flushed from the cache causing data
corruption.
2) While a thread runs in cifs_strict_writev, the machine could receive
and process an oplock break after the thread has checked the oplock and
found that it allows us to cache and before we have made changes to the
cache. In that case, we end up with a dirty page in cache when we
shouldn't have any. This will be flushed later and will overwrite all
subsequent writes to the part of the file represented by this page.

Before making any writes to the server, we need to confirm that we are
not in the process of flushing data to the server and if we are, we
should wait until the process is complete before we attempt the write.
We should also wait for existing writes to complete before we process
an oplock break request which changes oplock values.

We add a version specific downgrade_oplock() operation to allow for
differences in the oplock values set for the different smb versions.

Cc: stable@vger.kernel.org
Signed-off-by: Sachin Prabhu
Reviewed-by: Jeff Layton
Reviewed-by: Pavel Shilovsky
Signed-off-by: Steve French

Sachin Prabhu
2014-04-17 02:51:46 +0800

14 Apr, 2014

5 commits

897b73b6a xfs: zeroing space needs to punch delalloc blocks ... Browse Code »

When we are zeroing space andit is covered by a delalloc range, we
need to punch the delalloc range out before we truncate the page
cache. Failing to do so leaves and inconsistency between the page
cache and the extent tree, which we later trip over when doing
direct IO over the same range.

Signed-off-by: Dave Chinner
Tested-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2014-04-14 16:15:11 +0800
aad3f3755 xfs: xfs_vm_write_end truncates too much on failure ... Browse Code »

Similar to the write_begin problem, xfs-vm_write_end will truncate
back to the old EOF, potentially removing page cache from over the
top of delalloc blocks with valid data in them. Fix this by
truncating back to just the start of the failed write.

Signed-off-by: Dave Chinner
Tested-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2014-04-14 16:14:11 +0800
72ab70a19 xfs: write failure beyond EOF truncates too much data ... Browse Code »

If we fail a write beyond EOF and have to handle it in
xfs_vm_write_begin(), we truncate the inode back to the current inode
size. This doesn't take into account the fact that we may have
already made successful writes to the same page (in the case of block
size < page size) and hence we can truncate the page cache away from
blocks with valid data in them. If these blocks are delayed
allocation blocks, we now have a mismatch between the page cache and
the extent tree, and this will trigger - at minimum - a delayed
block count mismatch assert when the inode is evicted from the cache.
We can also trip over it when block mapping for direct IO - this is
the most common symptom seen from fsx and fsstress when run from
xfstests.

Fix it by only truncating away the exact range we are updating state
for in this write_begin call.

Signed-off-by: Dave Chinner
Tested-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2014-04-14 16:13:29 +0800
4ab9ed578 xfs: kill buffers over failed write ranges properly ... Browse Code »

When a write fails, if we don't clear the delalloc flags from the
buffers over the failed range, they can persist beyond EOF and cause
problems. writeback will see the pages in the page cache, see they
are dirty and continually retry the write, assuming that the page
beyond EOF is just racing with a truncate. The page will eventually
be released due to some other operation (e.g. direct IO), and it
will not pass through invalidation because it is dirty. Hence it
will be released with buffer_delay set on it, and trigger warnings
in xfs_vm_releasepage() and assert fail in xfs_file_aio_write_direct
because invalidation failed and we didn't write the corect amount.

This causes failures on block size < page size filesystems in fsx
and fsstress workloads run by xfstests.

Fix it by completely trashing any state on the buffer that could be
used to imply that it contains valid data when the delalloc range
over the buffer is punched out during the failed write handling.

Signed-off-by: Dave Chinner
Tested-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2014-04-14 16:11:58 +0800
e686bd8dc cifs: Use min_t() when comparing "size_t" and "unsigned long" ... Browse Code »

On 32 bit, size_t is "unsigned int", not "unsigned long", causing the
following warning when comparing with PAGE_SIZE, which is always "unsigned
long":

fs/cifs/file.c: In function ‘cifs_readdata_to_iov’:
fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast

Introduced by commit 7f25bba819a3 ("cifs_iovec_read: keep iov_iter
between the calls of cifs_readdata_to_iov()"), which changed the
signedness of "remaining" and the code from min_t() to min().

Signed-off-by: Geert Uytterhoeven
Signed-off-by: Linus Torvalds

Geert Uytterhoeven
2014-04-14 05:10:26 +0800

13 Apr, 2014

4 commits

454fd351f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull yet more networking updates from David Miller:

1) Various fixes to the new Redpine Signals wireless driver, from
Fariya Fatima.

2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
Dmitry Petukhov.

3) UFO and TSO packets differ in whether they include the protocol
header in gso_size, account for that in skb_gso_transport_seglen().
From Florian Westphal.

4) If VLAN untagging fails, we double free the SKB in the bridging
output path. From Toshiaki Makita.

5) Several call sites of sk->sk_data_ready() were referencing an SKB
just added to the socket receive queue in order to calculate the
second argument via skb->len. This is dangerous because the moment
the skb is added to the receive queue it can be consumed in another
context and freed up.

It turns out also that none of the sk->sk_data_ready()
implementations even care about this second argument.

So just kill it off and thus fix all these use-after-free bugs as a
side effect.

6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.

7) pktgen needs to do locking properly for LLTX devices, from Daniel
Borkmann.

8) xen-netfront driver initializes TX array entries in RX loop :-) From
Vincenzo Maffione.

9) After refactoring, some tunnel drivers allow a tunnel to be
configured on top itself. Fix from Nicolas Dichtel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
vti: don't allow to add the same tunnel twice
gre: don't allow to add the same tunnel twice
drivers: net: xen-netfront: fix array initialization bug
pktgen: be friendly to LLTX devices
r8152: check RTL8152_UNPLUG
net: sun4i-emac: add promiscuous support
net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
net: ipv6: Fix oif in TCP SYN+ACK route lookup.
drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
drivers: net: cpsw: discard all packets received when interface is down
net: Fix use after free by removing length arg from sk_data_ready callbacks.
Drivers: net: hyperv: Address UDP checksum issues
Drivers: net: hyperv: Negotiate suitable ndis version for offload support
Drivers: net: hyperv: Allocate memory for all possible per-pecket information
bridge: Fix double free and memory leak around br_allowed_ingress
bonding: Remove debug_fs files when module init fails
i40evf: program RSS LUT correctly
i40evf: remove open-coded skb_cow_head
ixgb: remove open-coded skb_cow_head
igbvf: remove open-coded skb_cow_head
...

Linus Torvalds
2014-04-13 08:31:22 +0800
96c57ade7 ceph: fix pr_fmt() redefinition ... Browse Code »

The vfs merge caused a latent bug to show up:

In file included from fs/ceph/super.h:4:0,
from fs/ceph/ioctl.c:3:
include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
^
In file included from include/linux/kernel.h:13:0,
from include/linux/uio.h:12,
from include/linux/socket.h:7,
from include/uapi/linux/in.h:22,
from include/linux/in.h:23,
from fs/ceph/ioctl.c:1:
include/linux/printk.h:214:0: note: this is the location of the previous definition
#define pr_fmt(fmt) fmt
^

where the reason is that is included much too late
for the "pr_fmt()" define.

The include of needs to be the first include in the
file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't
noticeable until some unrelated header file changes brought in an
indirect earlier include of .

Signed-off-by: Linus Torvalds

Linus Torvalds
2014-04-13 06:39:53 +0800
5166701b3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs updates from Al Viro:
"The first vfs pile, with deep apologies for being very late in this
window.

Assorted cleanups and fixes, plus a large preparatory part of iov_iter
work. There's a lot more of that, but it'll probably go into the next
merge window - it *does* shape up nicely, removes a lot of
boilerplate, gets rid of locking inconsistencie between aio_write and
splice_write and I hope to get Kent's direct-io rewrite merged into
the same queue, but some of the stuff after this point is having
(mostly trivial) conflicts with the things already merged into
mainline and with some I want more testing.

This one passes LTP and xfstests without regressions, in addition to
usual beating. BTW, readahead02 in ltp syscalls testsuite has started
giving failures since "mm/readahead.c: fix readahead failure for
memoryless NUMA nodes and limit readahead pages" - might be a false
positive, might be a real regression..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
missing bits of "splice: fix racy pipe->buffers uses"
cifs: fix the race in cifs_writev()
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
kill generic_file_buffered_write()
ocfs2_file_aio_write(): switch to generic_perform_write()
ceph_aio_write(): switch to generic_perform_write()
xfs_file_buffered_aio_write(): switch to generic_perform_write()
export generic_perform_write(), start getting rid of generic_file_buffer_write()
generic_file_direct_write(): get rid of ppos argument
btrfs_file_aio_write(): get rid of ppos
kill the 5th argument of generic_file_buffered_write()
kill the 4th argument of __generic_file_aio_write()
lustre: don't open-code kernel_recvmsg()
ocfs2: don't open-code kernel_recvmsg()
drbd: don't open-code kernel_recvmsg()
constify blk_rq_map_user_iov() and friends
lustre: switch to kernel_sendmsg()
ocfs2: don't open-code kernel_sendmsg()
take iov_iter stuff to mm/iov_iter.c
process_vm_access: tidy up a bit
...

Linus Torvalds
2014-04-13 05:49:50 +0800
0b747172d Merge git://git.infradead.org/users/eparis/audit ... Browse Code »

Pull audit updates from Eric Paris.

* git://git.infradead.org/users/eparis/audit: (28 commits)
AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
audit: do not cast audit_rule_data pointers pointlesly
AUDIT: Allow login in non-init namespaces
audit: define audit_is_compat in kernel internal header
kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
sched: declare pid_alive as inline
audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
syscall_get_arch: remove useless function arguments
audit: remove stray newline from audit_log_execve_info() audit_panic() call
audit: remove stray newlines from audit_log_lost messages
audit: include subject in login records
audit: remove superfluous new- prefix in AUDIT_LOGIN messages
audit: allow user processes to log from another PID namespace
audit: anchor all pid references in the initial pid namespace
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t ppid of task in init_pid_ns
audit: rename the misleading audit_get_context() to audit_take_context()
audit: Add generic compat syscall support
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
...

Linus Torvalds
2014-04-13 03:38:53 +0800

12 Apr, 2014

5 commits

19dfc1f5f cifs: fix the race in cifs_writev() ... Browse Code »

O_APPEND handling there hadn't been completely fixed by Pavel's
patch; it checks the right value, but it's racy - we can't really
do that until i_mutex has been taken.

Fix by switching to __generic_file_aio_write() (open-coding
generic_file_aio_write(), actually) and pulling mutex_lock() above
inode_size_read().

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro

Al Viro
2014-04-12 18:52:48 +0800
eab87235c ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure ... Browse Code »

ceph_osdc_put_request(ERR_PTR(-error)) oopses. What we want there
is break, not goto out.

Signed-off-by: Al Viro

Al Viro
2014-04-12 18:51:51 +0800
a63b747b4 Merge git://git.kvack.org/~bcrl/aio-next ... Browse Code »

Pull aio ctx->ring_pages migration serialization fix from Ben LaHaise.

* git://git.kvack.org/~bcrl/aio-next:
aio: v4 ensure access to ctx->ring_pages is correctly serialised for migration

Linus Torvalds
2014-04-12 07:36:50 +0800
3123bca71 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull second set of btrfs updates from Chris Mason:
"The most important changes here are from Josef, fixing a btrfs
regression in 3.14 that can cause corruptions in the extent allocation
tree when snapshots are in use.

Josef also fixed some deadlocks in send/recv and other assorted races
when balance is running"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (23 commits)
Btrfs: fix compile warnings on on avr32 platform
btrfs: allow mounting btrfs subvolumes with different ro/rw options
btrfs: export global block reserve size as space_info
btrfs: fix crash in remount(thread_pool=) case
Btrfs: abort the transaction when we don't find our extent ref
Btrfs: fix EINVAL checks in btrfs_clone
Btrfs: fix unlock in __start_delalloc_inodes()
Btrfs: scrub raid56 stripes in the right way
Btrfs: don't compress for a small write
Btrfs: more efficient io tree navigation on wait_extent_bit
Btrfs: send, build path string only once in send_hole
btrfs: filter invalid arg for btrfs resize
Btrfs: send, fix data corruption due to incorrect hole detection
Btrfs: kmalloc() doesn't return an ERR_PTR
Btrfs: fix snapshot vs nocow writting
btrfs: Change the expanding write sequence to fix snapshot related bug.
btrfs: make device scan less noisy
btrfs: fix lockdep warning with reclaim lock inversion
Btrfs: hold the commit_root_sem when getting the commit root during send
Btrfs: remove transaction from send
...

Linus Torvalds
2014-04-12 05:16:53 +0800
676d23690 net: Fix use after free by removing length arg from sk_data_ready callbacks. ... Browse Code »
13

Several spots in the kernel perform a sequence like:

skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller

David S. Miller
2014-04-12 04:15:36 +0800

11 Apr, 2014

4 commits

e4fbaee29 Btrfs: fix compile warnings on on avr32 platform ... Browse Code »

fs/btrfs/scrub.c: In function 'get_raid56_logic_offset':
fs/btrfs/scrub.c:2269: warning: comparison of distinct pointer types lacks a cast
fs/btrfs/scrub.c:2269: warning: right shift count >= width of type
fs/btrfs/scrub.c:2269: warning: passing argument 1 of '__div64_32' from incompatible pointer type

Since @rot is an int type, we should not use do_div(), fix it.

Reported-by: kbuild test robot
Signed-off-by: Wang Shilong
Signed-off-by: Chris Mason

Wang Shilong
2014-04-11 21:35:50 +0800
9e897e13b Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd ... Browse Code »

Pull exofs updates from Boaz Harrosh:
"Trivial updates to exofs for 3.15-rc1

Just a few fixes sent by people"

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
MAINTAINERS: Update email address for bhalevy
fs: Mark functions as static in exofs/ore_raid.c
fs: Mark function as static in exofs/super.c

Linus Torvalds
2014-04-11 05:33:02 +0800
0723a0473 btrfs: allow mounting btrfs subvolumes with different ro/rw options ... Browse Code »
10

Given the following /etc/fstab entries:

/dev/sda3 /mnt/foo btrfs subvol=foo,ro 0 0
/dev/sda3 /mnt/bar btrfs subvol=bar,rw 0 0

you can't issue:

$ mount /mnt/foo
$ mount /mnt/bar

You would have to do:

$ mount /mnt/foo
$ mount -o remount,rw /mnt/foo
$ mount --bind -o remount,ro /mnt/foo
$ mount /mnt/bar

or

$ mount /mnt/bar
$ mount --rw /mnt/foo
$ mount --bind -o remount,ro /mnt/foo

With this patch you can do

$ mount /mnt/foo
$ mount /mnt/bar

$ cat /proc/self/mountinfo
49 33 0:41 /foo /mnt/foo ro,relatime shared:36 - btrfs /dev/sda3 rw,ssd,space_cache
87 33 0:41 /bar /mnt/bar rw,relatime shared:74 - btrfs /dev/sda3 rw,ssd,space_cache

Signed-off-by: Chris Mason

Harald Hoyer
2014-04-11 04:32:50 +0800
dd76a786a Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block layer fixes from Jens Axboe:
"A small collection of fixes that should go in before -rc1. The pull
request contains:

- A two patch fix for a regression with block enabled tagging caused
by a commit in the initial pull request. One patch is from Martin
and ensures that SCSI doesn't truncate 64-bit block flags, the
other one is from me and prevents us from double using struct
request queuelist for both completion and busy tags. This caused
anything from a boot crash for some, to crashes under load.

- A blk-mq fix for a potential soft stall when hot unplugging CPUs
with busy IO.

- percpu_counter fix is listed in here, that caused a suspend issue
with virtio-blk due to percpu counters having an inconsistent state
during CPU removal. Andrew sent this in separately a few days ago,
but it's here. JFYI.

- A few fixes for block integrity from Martin.

- A ratelimit fix for loop from Mike Galbraith, to avoid spewing too
much in error cases"

* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix regression with block enabled tagging
scsi: Make sure cmd_flags are 64-bit
block: Ensure we only enable integrity metadata for reads and writes
block: Fix integrity verification
block: Fix for_each_bvec()
drivers/block/loop.c: ratelimit error messages
blk-mq: fix potential stall during CPU unplug with IO pending
percpu_counter: fix bad counter state during suspend

Linus Torvalds
2014-04-11 00:26:55 +0800

09 Apr, 2014

5 commits

e69f18f06 block: Ensure we only enable integrity metadata for reads and writes ... Browse Code »

We'd occasionally attempt to generate protection information for flushes
and other requests with a zero payload. Make sure we only attempt to
enable integrity for reads and writes.

Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe

Martin K. Petersen
2014-04-09 22:00:06 +0800
0bc699730 block: Fix integrity verification ... Browse Code »

Commit bf36f9cfa6d3d caused a regression by effectively reverting Nic's
fix from 5837c80e870b that ensures we traverse the full bio_vec list
upon completion.

Signed-off-by: Martin K. Petersen
Cc: Nicholas Bellinger
Cc: Gu Zheng
Signed-off-by: Jens Axboe

Martin K. Petersen
2014-04-09 22:00:04 +0800
75ff24fa5 Merge branch 'for-3.15' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd updates from Bruce Fields:
"Highlights:
- server-side nfs/rdma fixes from Jeff Layton and Tom Tucker
- xdr fixes (a larger xdr rewrite has been posted but I decided it
would be better to queue it up for 3.16).
- miscellaneous fixes and cleanup from all over (thanks especially to
Kinglong Mee)"

* 'for-3.15' of git://linux-nfs.org/~bfields/linux: (36 commits)
nfsd4: don't create unnecessary mask acl
nfsd: revert v2 half of "nfsd: don't return high mode bits"
nfsd4: fix memory leak in nfsd4_encode_fattr()
nfsd: check passed socket's net matches NFSd superblock's one
SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed
NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp
SUNRPC: New helper for creating client with rpc_xprt
NFSD: Free backchannel xprt in bc_destroy
NFSD: Clear wcc data between compound ops
nfsd: Don't return NFS4ERR_STALE_STATEID for NFSv4.1+
nfsd4: fix nfs4err_resource in 4.1 case
nfsd4: fix setclientid encode size
nfsd4: remove redundant check from nfsd4_check_resp_size
nfsd4: use more generous NFS4_ACL_MAX
nfsd4: minor nfsd4_replay_cache_entry cleanup
nfsd4: nfsd4_replay_cache_entry should be static
nfsd4: update comments with obsolete function name
rpc: Allow xdr_buf_subsegment to operate in-place
NFSD: Using free_conn free connection
SUNRPC: fix memory leak of peer addresses in XPRT
...

Linus Torvalds
2014-04-09 09:28:14 +0800
ffddc5fd1 fs/ncpfs/dir.c: fix indenting in ncp_lookup() ... Browse Code »

My static checker suggests adding curly braces here. Probably that was
the intent, but actually the code works the same either way. I've just
changed the indenting and left the code as-is.

Signed-off-by: Dan Carpenter
Cc: Petr Vandrovec
Acked-by: Dave Chiluk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Carpenter
2014-04-09 07:48:53 +0800
15a03ac6f ncpfs/inode.c: fix mismatch printk formats and arguments ... Browse Code »

Conversions to ncp_dbg showed some format/argument mismatches so fix
them.

Signed-off-by: Joe Perches
Cc: Petr Vandrovec
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2014-04-09 07:48:53 +0800