10 Nov, 2015

5 commits

  • commit 0f89abf56abbd0e1c6e3cef9813e6d9f05383c1e upstream.

    Commit 8eb934591f8b ("btrfs: check unsupported filters in balance
    arguments") adds a jump to exit label out_bargs in case the argument
    check fails. At this point in addition to the bargs memory, the
    memory for struct btrfs_balance_control has already been allocated.
    Ownership of bctl is passed to btrfs_balance() only in the good case,
    so when the argument check fails and we jump straight to out_bargs,
    bctl is leaked. Make sure the memory is freed on that error path as
    well. Detected by Coverity CID 1328378.
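
    A minimal sketch of the kind of fix this implies (label names and the
    bctl = NULL step are assumptions, not the verbatim upstream diff):

        /* argument-check failure path */
        ret = -EINVAL;
        goto out_bctl;          /* was "goto out_bargs", which leaked bctl */
        ...
        ret = btrfs_balance(bctl, bargs);
        bctl = NULL;            /* ownership has passed to btrfs_balance() */
        ...
    out_bctl:
        kfree(bctl);            /* no-op on the good path, frees it otherwise */
    out_bargs:
        kfree(bargs);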

    Signed-off-by: Christian Engelmayer
    Reviewed-by: David Sterba
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Christian Engelmayer
     
  • commit ab79efab0a0ba01a74df782eb7fa44b044dae8b5 upstream.

    In ovl_copy_up_locked(), newdentry is leaked if the function exits through
    out_cleanup, as this just jumps to out after calling ovl_cleanup() - which
    doesn't actually release the ref on newdentry.

    The out_cleanup segment should instead exit through out2 as certainly
    newdentry leaks - and possibly upper does also, though this isn't caught
    given the catch of newdentry.
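
    Purely as an illustration of that leak pattern (labels simplified, not
    the verbatim ovl_copy_up_locked() source):

        ...
    out2:
        dput(newdentry);                /* drops the ref taken at lookup time */
    out:
        return err;

    out_cleanup:
        ovl_cleanup(wdir, newdentry);   /* removes the temp copy-up file ... */
        goto out;                       /* ... but skips dput(); fix: goto out2 */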

    Without this fix, something like the following is seen:

    BUG: Dentry ffff880023e9eb20{i=f861,n=#ffff880023e82d90} still in use (1) [unmount of tmpfs tmpfs]
    BUG: Dentry ffff880023ece640{i=0,n=bigfile} still in use (1) [unmount of tmpfs tmpfs]

    when unmounting the upper layer after an error occurred in copyup.

    An error can be induced by creating a big file in a lower layer with
    something like:

    dd if=/dev/zero of=/lower/a/bigfile bs=65536 count=1 seek=$((0xf000))

    to create a large file (4.1G). Overlay an upper layer that is too small
    (on tmpfs might do) and then induce a copy up by opening it writably.

    Reported-by: Ulrich Obergfell
    Signed-off-by: David Howells
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit 0480334fa60488d12ae101a02d7d9e1a3d03d7dd upstream.

    Open the lower file with O_LARGEFILE in ovl_copy_up().

    Pass O_LARGEFILE unconditionally in ovl_copy_up_data() as it's purely for
    catching 32-bit userspace dealing with a file large enough that it'll be
    mishandled if the application isn't aware that there might be an integer
    overflow. Inside the kernel, there shouldn't be any problems.
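
    A minimal sketch of the intent (variable names assumed, ovl_path_open()
    is the existing overlayfs helper; not the verbatim diff):

        /* open both ends of the copy-up with O_LARGEFILE so a >2GiB lower
         * file can always be copied up */
        old_file = ovl_path_open(lowerpath, O_LARGEFILE | O_RDONLY);
        ...
        new_file = ovl_path_open(upperpath, O_LARGEFILE | O_WRONLY);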

    Reported-by: Ulrich Obergfell
    Signed-off-by: David Howells
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit 5ffdbe8bf1e485026e1c7e4714d2841553cf0b40 upstream.

    This fixes memory leak after umount.

    Kmemleak report:

    unreferenced object 0xffff8800ba791010 (size 8):
    comm "mount", pid 2394, jiffies 4294996294 (age 53.920s)
    hex dump (first 8 bytes):
    20 1c 13 02 00 88 ff ff .......
    backtrace:
    [] create_object+0x124/0x2c0
    [] kmemleak_alloc+0x7b/0xc0
    [] __kmalloc+0x106/0x340
    [] ovl_fill_super+0x55c/0x9b0 [overlay]
    [] mount_nodev+0x54/0xa0
    [] ovl_mount+0x18/0x20 [overlay]
    [] mount_fs+0x43/0x170
    [] vfs_kern_mount+0x74/0x170
    [] do_mount+0x22d/0xdf0
    [] SyS_mount+0x7b/0xc0
    [] entry_SYSCALL_64_fastpath+0x12/0x76
    [] 0xffffffffffffffff

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Miklos Szeredi
    Fixes: dd662667e6d3 ("ovl: add mutli-layer infrastructure")
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
     
  • commit 0f95502ad84874b3c05fc7cdd9d4d9d5cddf7859 upstream.

    This fixes small memory leak after mount.

    Kmemleak report:

    unreferenced object 0xffff88003683fe00 (size 16):
    comm "mount", pid 2029, jiffies 4294909563 (age 33.380s)
    hex dump (first 16 bytes):
    20 27 1f bb 00 88 ff ff 40 4b 0f 36 02 88 ff ff '......@K.6....
    backtrace:
    [] create_object+0x124/0x2c0
    [] kmemleak_alloc+0x7b/0xc0
    [] __kmalloc+0x106/0x340
    [] ovl_fill_super+0x389/0x9a0 [overlay]
    [] mount_nodev+0x54/0xa0
    [] ovl_mount+0x18/0x20 [overlay]
    [] mount_fs+0x43/0x170
    [] vfs_kern_mount+0x74/0x170
    [] do_mount+0x22d/0xdf0
    [] SyS_mount+0x7b/0xc0
    [] entry_SYSCALL_64_fastpath+0x12/0x76
    [] 0xffffffffffffffff

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Miklos Szeredi
    Fixes: a78d9f0d5d5c ("ovl: support multiple lower layers")
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
     

27 Oct, 2015

7 commits

  • commit 83bfff23e9ed19f37c4ef0bba84e75bd88e5cf21 upstream.

    Now that we have file locking helpers that can deal with an inode
    instead of a filp, we can change the NFSv4 locking code to use that
    instead.

    This should fix the case where we have a filp that is closed while flock
    or OFD locks are set on it, and the task is signaled so that it doesn't
    wait for the LOCKU reply to come in before the filp is freed. At that
    point we can end up with a use-after-free with the current code, which
    relies on dereferencing the fl_file in the lock request.

    Signed-off-by: Jeff Layton
    Reviewed-by: "J. Bruce Fields"
    Tested-by: "J. Bruce Fields"
    Cc: William Dauchy
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit ee296d7c5709440f8abd36b5b65c6b3e388538d9 upstream.

    They just call file_inode and then the corresponding *_inode_file_wait
    function. Just make them static inlines instead.
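
    A sketch of what the resulting wrappers look like, assuming the
    *_inode_wait() helpers introduced earlier in this series:

    static inline int posix_lock_file_wait(struct file *filp, struct file_lock *fl)
    {
        return posix_lock_inode_wait(file_inode(filp), fl);
    }

    static inline int flock_lock_file_wait(struct file *filp, struct file_lock *fl)
    {
        return flock_lock_inode_wait(file_inode(filp), fl);
    }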

    Signed-off-by: Jeff Layton
    Cc: William Dauchy
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit 29d01b22eaa18d8b46091d3c98c6001c49f78e4a upstream.

    Allow callers to pass in an inode instead of a filp.

    Signed-off-by: Jeff Layton
    Reviewed-by: "J. Bruce Fields"
    Tested-by: "J. Bruce Fields"
    Cc: William Dauchy
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit bcd7f78d078ff6197715c1ed070c92aca57ec12c upstream.

    ...and rename it to better describe how it works.

    In order to fix a use-after-free in NFS, we need to be able to remove
    locks from an inode after the filp associated with them may already have
    been freed. flock_lock_file only dereferences the filp to get to the
    inode, so just push that dereference out to the callers.

    All of the callers already pass in a lock request that has fl_file set
    properly, so we don't need to pass the filp in separately.
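
    In sketch form (signature inferred from the description), the helper now
    takes an inode and callers that only have a filp dereference it themselves:

    int flock_lock_inode(struct inode *inode, struct file_lock *fl);

        /* caller side: the lock request already carries fl->fl_file */
        error = flock_lock_inode(file_inode(filp), fl);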

    Signed-off-by: Jeff Layton
    Reviewed-by: "J. Bruce Fields"
    Tested-by: "J. Bruce Fields"
    Cc: William Dauchy
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit 8c3ad9cb7343dc5f61b8cf3cdbe1016c5e7c2c8b upstream.

    Recent Linux clients have started to send GETLAYOUT requests with
    minlength less than blocksize.

    Servers aren't really allowed to impose this kind of restriction on
    layouts; see RFC 5661 section 18.43.3 for details.

    This has been observed to cause indefinite hangs on fsx runs on some
    clients.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Christoph Hellwig
     
  • commit dc6c5fb3b514221f2e9d21ee626a9d95d3418dff upstream.

    The code for btrfs inode-resolve has never worked properly for
    files with enough hard links to trigger extrefs. It was trying to
    get the leaf out of a path after freeing the path:

    btrfs_release_path(path);
    leaf = path->nodes[0];
    item_size = btrfs_item_size_nr(leaf, slot);

    The fix here is to use the extent buffer we cloned just a little higher
    up to avoid deadlocks caused by using the leaf in the path.
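
    A hedged sketch of the corrected pattern described above (not the exact
    upstream hunk): clone the leaf before releasing the path and read the
    item size from the clone.

        eb = btrfs_clone_extent_buffer(path->nodes[0]); /* taken before release */
        btrfs_release_path(path);

        item_size = btrfs_item_size_nr(eb, slot);       /* not path->nodes[0] */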

    Signed-off-by: Chris Mason
    cc: Mark Fasheh
    Reviewed-by: Filipe Manana
    Reviewed-by: Mark Fasheh
    Signed-off-by: Greg Kroah-Hartman

    Chris Mason
     
  • commit 8eb934591f8bf584969454a658f629cd06e59f3a upstream.

    We don't verify that all the balance filter arguments supplemented by
    the flags are actually known to the kernel. Thus we let it silently pass
    and do nothing.

    At the moment this means only the 'limit' filter, but we're going to add
    a few more soon, so it's better to have this fixed. Also fix it in older
    stable kernels so that they work with newer userspace tools.

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    David Sterba
     

23 Oct, 2015

20 commits

  • commit daf3761c9fcde0f4ca64321cbed6c1c86d304193 upstream.

    Leandro Awa writes:
    "After switching to version 4.1.6, our parallelized and distributed
    workflows now fail consistently with errors of the form:

    T34: ./regex.c:39:22: error: config.h: No such file or directory

    From our 'git bisect' testing, the following commit appears to be the
    possible cause of the behavior we've been seeing: commit 766c4cbfacd8"

    Al Viro says:
    "What happens is that 766c4cbfacd8 got the things subtly wrong.

    We used to treat d_is_negative() after lookup_fast() as "fall with
    ENOENT". That was wrong - checking ->d_flags outside of ->d_seq
    protection is unreliable and failing with hard error on what should've
    fallen back to non-RCU pathname resolution is a bug.

    Unfortunately, we'd pulled the test too far up and ran afoul of
    another kind of staleness. The dentry might have been absolutely
    stable from the RCU point of view (and we might be on UP, etc), but
    stale from the remote fs point of view. If ->d_revalidate() returns
    "it's actually stale", dentry gets thrown away and the original code
    wouldn't even have looked at its ->d_flags.

    What we need is to check ->d_flags where 766c4cbfacd8 does (prior to
    ->d_seq validation) but only use the result in cases where we do not
    discard this dentry outright"

    Reported-by: Leandro Awa
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=104911
    Fixes: 766c4cbfacd8 ("namei: d_is_negative() should be checked...")
    Tested-by: Leandro Awa
    Signed-off-by: Trond Myklebust
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 3ec0c97959abff33a42db9081c22132bcff5b4f2 upstream.

    If filelayout_decode_layout() fails, _filelayout_free_lseg() will cause
    a double free of fh_array.

    [ 1179.279800] BUG: unable to handle kernel NULL pointer dereference at (null)
    [ 1179.280198] IP: [] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
    [ 1179.281010] PGD 0
    [ 1179.281443] Oops: 0000 [#1]
    [ 1179.281831] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c coretemp nfsd crct10dif_pclmul ppdev crc32_pclmul crc32c_intel auth_rpcgss ghash_clmulni_intel nfs_acl lockd vmw_balloon grace sunrpc parport_pc vmw_vmci parport shpchp i2c_piix4 vmwgfx drm_kms_helper ttm drm serio_raw mptspi scsi_transport_spi mptscsih e1000 mptbase ata_generic pata_acpi [last unloaded: fscache]
    [ 1179.283891] CPU: 0 PID: 13336 Comm: cat Tainted: G OE 4.3.0-rc1-pnfs+ #244
    [ 1179.284323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
    [ 1179.285206] task: ffff8800501d48c0 ti: ffff88003e3c4000 task.ti: ffff88003e3c4000
    [ 1179.285668] RIP: 0010:[] [] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
    [ 1179.286612] RSP: 0018:ffff88003e3c77f8 EFLAGS: 00010202
    [ 1179.287092] RAX: 0000000000000000 RBX: ffff88001fe78900 RCX: 0000000000000000
    [ 1179.287731] RDX: ffffea0000f40760 RSI: ffff88001fe789c8 RDI: ffff88001fe789c0
    [ 1179.288383] RBP: ffff88003e3c7810 R08: ffffea0000f40760 R09: 0000000000000000
    [ 1179.289170] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001fe789c8
    [ 1179.289959] R13: ffff88001fe789c0 R14: ffff88004ec05a80 R15: ffff88004f935b88
    [ 1179.290791] FS: 00007f4e66bb5700(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000
    [ 1179.291580] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 1179.292209] CR2: 0000000000000000 CR3: 00000000203f8000 CR4: 00000000001406f0
    [ 1179.292731] Stack:
    [ 1179.293195] ffff88001fe78900 00000000000000d0 ffff88001fe78178 ffff88003e3c7868
    [ 1179.293676] ffffffffa0272737 0000000000000001 0000000000000001 ffff88001fe78800
    [ 1179.294151] 00000000614fffce ffffffff81727671 ffff88001fe78100 ffff88001fe78100
    [ 1179.294623] Call Trace:
    [ 1179.295092] [] filelayout_alloc_lseg+0xa7/0x2d0 [nfs_layout_nfsv41_files]
    [ 1179.295625] [] ? out_of_line_wait_on_bit+0x81/0xb0
    [ 1179.296133] [] pnfs_layout_process+0xae/0x320 [nfsv4]
    [ 1179.296632] [] nfs4_proc_layoutget+0x2b1/0x360 [nfsv4]
    [ 1179.297134] [] pnfs_update_layout+0x853/0xb30 [nfsv4]
    [ 1179.297632] [] ? nfs_get_lock_context+0x74/0x170 [nfs]
    [ 1179.298158] [] filelayout_pg_init_read+0x37/0x50 [nfs_layout_nfsv41_files]
    [ 1179.298834] [] __nfs_pageio_add_request+0x119/0x460 [nfs]
    [ 1179.299385] [] ? nfs_create_request.part.9+0x37/0x2e0 [nfs]
    [ 1179.299872] [] nfs_pageio_add_request+0xa3/0x1b0 [nfs]
    [ 1179.300362] [] readpage_async_filler+0x85/0x260 [nfs]
    [ 1179.300907] [] read_cache_pages+0x91/0xd0
    [ 1179.301391] [] ? nfs_read_completion+0x220/0x220 [nfs]
    [ 1179.301867] [] nfs_readpages+0x128/0x200 [nfs]
    [ 1179.302330] [] __do_page_cache_readahead+0x203/0x280
    [ 1179.302784] [] ? __do_page_cache_readahead+0xd8/0x280
    [ 1179.303413] [] ondemand_readahead+0x1a6/0x2f0
    [ 1179.303855] [] page_cache_sync_readahead+0x31/0x50
    [ 1179.304286] [] generic_file_read_iter+0x4a6/0x5c0
    [ 1179.304711] [] ? __nfs_revalidate_mapping+0x1f6/0x240 [nfs]
    [ 1179.305132] [] nfs_file_read+0x52/0xa0 [nfs]
    [ 1179.305540] [] __vfs_read+0xcc/0x100
    [ 1179.305936] [] vfs_read+0x85/0x130
    [ 1179.306326] [] SyS_read+0x58/0xd0
    [ 1179.306708] [] entry_SYSCALL_64_fastpath+0x12/0x76
    [ 1179.307094] Code: c4 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 8b 07 49 89 f4 85 c0 74 47 48 8b 06 49 89 fd 8b 38 48 85 ff 74 22 31 db eb 0c 48 63 d3 48 8b 3c d0 48 85
    [ 1179.308357] RIP [] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
    [ 1179.309177] RSP
    [ 1179.309582] CR2: 0000000000000000

    Signed-off-by: Kinglong Mee
    Signed-off-by: Trond Myklebust
    Cc: William Dauchy
    Signed-off-by: Greg Kroah-Hartman

    Kinglong Mee
     
  • commit 9391dd00d13c853ab4f2a85435288ae2202e0e43 upstream.

    When opening a directory we want the overlayfs inode, not one from
    the topmost layer.

    Reported-By: Andrey Jr. Melnikov
    Tested-By: Andrey Jr. Melnikov
    Signed-off-by: Al Viro
    Cc: "Kamata, Munehisa"
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 4bacc9c9234c7c8eec44f5ed4e960d9f96fa0f01 upstream.

    Make file->f_path always point to the overlay dentry so that the path in
    /proc/pid/fd is correct and to ensure that label-based LSMs have access to the
    overlay as well as the underlay (path-based LSMs probably don't need it).

    Using my union testsuite to set things up, before the patch I see:

    [root@andromeda union-testsuite]# bash 5</a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...

    After the patch:

    [root@andromeda union-testsuite]# bash 5</mnt/a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...

    Note the change in where /proc/$$/fd/5 points to in the ls command. It was
    pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
    (which is correct).

    The inode accessed, however, is the lower layer. The union layer is on device
    25h/37d and the upper layer on 24h/36d.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Cc: "Kamata, Munehisa"
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit f25801ee4680ef1db21e15c112e6e5fe3ffe8da5 upstream.

    Call ovl_drop_write() earlier in ovl_dentry_open() before we call vfs_open()
    as we've done the copy up for which we needed the freeze-write lock by that
    point.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Cc: "Kamata, Munehisa"
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit 397d425dc26da728396e66d392d5dcb8dac30c37 upstream.

    In rare cases a directory can be renamed out from under a bind mount.
    In those cases without special handling it becomes possible to walk up
    the directory tree to the root dentry of the filesystem and down
    from the root dentry to every other file or directory on the filesystem.

    Like division by zero .. from an unconnected path can not be given
    a useful semantic as there is no predicting at which path component
    the code will realize it is unconnected. We certainly can not match
    the current behavior as the current behavior is a security hole.

    Therefore, when encountering .. while following an unconnected path,
    return -ENOENT.

    - Add a function path_connected to verify path->dentry is reachable
    from path->mnt.mnt_root. AKA to validate that rename did not do
    something nasty to the bind mount.

    To avoid races, path_connected must be called after following a path
    component to its next path component.
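
    A sketch of what such a check can look like (close to, but not guaranteed
    to be, the upstream helper):

    static bool path_connected(const struct path *path)
    {
        struct vfsmount *mnt = path->mnt;

        /* Only bind mounts can have disconnected paths */
        if (mnt->mnt_root == mnt->mnt_sb->s_root)
            return true;

        return is_subdir(path->dentry, mnt->mnt_root);
    }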

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit cde93be45a8a90d8c264c776fab63487b5038a65 upstream.

    A rename can result in a dentry that by walking up d_parent
    will never reach its mnt_root. For lack of a better term
    I call this an escaped path.

    prepend_path is called by four different functions __d_path,
    d_absolute_path, d_path, and getcwd.

    __d_path only wants to see paths that are connected to the root it passes
    in. So __d_path needs prepend_path to return an error.

    d_absolute_path similarly wants to see paths that are connected to
    some root. Escaped paths are not connected to any mnt_root so
    d_absolute_path needs prepend_path to return an error greater
    than 1. So escaped paths will be treated like paths on lazily
    unmounted mounts.

    getcwd needs to prepend "(unreachable)" so getcwd also needs
    prepend_path to return an error.

    d_path is the interesting hold out. d_path just wants to print
    something, and does not care about the weird cases. Which raises
    the question what should be printed?

    Given that <escaped_path>/<anything> should result in -ENOENT, I
    believe it is desirable for escaped paths to be printed as empty
    paths, as there are not really any meaningful path components when
    considered from the perspective of a mount tree.

    So tweak prepend_path to return an empty path with a new error
    code of 3 when it encounters an escaped path.
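
    A hedged sketch of the kind of check this adds inside prepend_path()'s
    walk up d_parent (variable names assumed):

        if (IS_ROOT(dentry) && dentry != vfsmnt->mnt_root) {
            /* escaped from its bind mount: report an empty path */
            bptr = *buffer;
            blen = *buflen;
            error = 3;
            break;
        }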

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit cf6f54e3f133229f02a90c04fe0ff9dd9d3264b4 upstream.

    Fixes the following lockdep splat:
    [ 1.244527] =============================================
    [ 1.245193] [ INFO: possible recursive locking detected ]
    [ 1.245193] 4.2.0-rc1+ #37 Not tainted
    [ 1.245193] ---------------------------------------------
    [ 1.245193] cp/742 is trying to acquire lock:
    [ 1.245193] (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [] ubifs_init_security+0x29/0xb0
    [ 1.245193]
    [ 1.245193] but task is already holding lock:
    [ 1.245193] (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [] path_openat+0x3af/0x1280
    [ 1.245193]
    [ 1.245193] other info that might help us debug this:
    [ 1.245193] Possible unsafe locking scenario:
    [ 1.245193]
    [ 1.245193] CPU0
    [ 1.245193] ----
    [ 1.245193] lock(&sb->s_type->i_mutex_key#9);
    [ 1.245193] lock(&sb->s_type->i_mutex_key#9);
    [ 1.245193]
    [ 1.245193] *** DEADLOCK ***
    [ 1.245193]
    [ 1.245193] May be due to missing lock nesting notation
    [ 1.245193]
    [ 1.245193] 2 locks held by cp/742:
    [ 1.245193] #0: (sb_writers#5){.+.+.+}, at: [] mnt_want_write+0x1f/0x50
    [ 1.245193] #1: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [] path_openat+0x3af/0x1280
    [ 1.245193]
    [ 1.245193] stack backtrace:
    [ 1.245193] CPU: 2 PID: 742 Comm: cp Not tainted 4.2.0-rc1+ #37
    [ 1.245193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140816_022509-build35 04/01/2014
    [ 1.245193] ffffffff8252d530 ffff88007b023a38 ffffffff814f6f49 ffffffff810b56c5
    [ 1.245193] ffff88007c30cc80 ffff88007b023af8 ffffffff810a150d ffff88007b023a68
    [ 1.245193] 000000008101302a ffff880000000000 00000008f447e23f ffffffff8252d500
    [ 1.245193] Call Trace:
    [ 1.245193] [] dump_stack+0x4c/0x65
    [ 1.245193] [] ? console_unlock+0x1c5/0x510
    [ 1.245193] [] __lock_acquire+0x1a6d/0x1ea0
    [ 1.245193] [] ? __lock_is_held+0x58/0x80
    [ 1.245193] [] lock_acquire+0xd3/0x270
    [ 1.245193] [] ? ubifs_init_security+0x29/0xb0
    [ 1.245193] [] mutex_lock_nested+0x6b/0x3a0
    [ 1.245193] [] ? ubifs_init_security+0x29/0xb0
    [ 1.245193] [] ? ubifs_init_security+0x29/0xb0
    [ 1.245193] [] ubifs_init_security+0x29/0xb0
    [ 1.245193] [] ubifs_create+0xa6/0x1f0
    [ 1.245193] [] ? path_openat+0x3af/0x1280
    [ 1.245193] [] vfs_create+0x95/0xc0
    [ 1.245193] [] path_openat+0x7cc/0x1280
    [ 1.245193] [] ? __lock_acquire+0x543/0x1ea0
    [ 1.245193] [] ? sched_clock_cpu+0x90/0xc0
    [ 1.245193] [] ? calc_global_load_tick+0x60/0x90
    [ 1.245193] [] ? sched_clock_cpu+0x90/0xc0
    [ 1.245193] [] ? __alloc_fd+0xaf/0x180
    [ 1.245193] [] do_filp_open+0x75/0xd0
    [ 1.245193] [] ? _raw_spin_unlock+0x26/0x40
    [ 1.245193] [] ? __alloc_fd+0xaf/0x180
    [ 1.245193] [] do_sys_open+0x129/0x200
    [ 1.245193] [] SyS_open+0x19/0x20
    [ 1.245193] [] entry_SYSCALL_64_fastpath+0x12/0x6f

    While the lockdep splat is a false positive, because path_openat holds i_mutex
    of the parent directory and ubifs_init_security() tries to acquire i_mutex
    of a new inode, it reveals that taking i_mutex in ubifs_init_security() is
    in vain because it is only being called in the inode allocation path
    and therefore nobody else can see the inode yet.

    Reported-and-tested-by: Boris Brezillon
    Reviewed-and-tested-by: Dongsheng Yang
    Signed-off-by: Richard Weinberger
    Signed-off-by: dedekind1@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Richard Weinberger
     
  • commit 98ce94c8df762d413b3ecb849e2b966b21606d04 upstream.

    Linux cifs mount with ntlmssp against a Mac OS X (Yosemite
    10.10.5) share fails if the clocks differ by more than +/-2h:

    digest-service: digest-request: od failed with 2 proto=ntlmv2
    digest-service: digest-request: kdc failed with -1561745592 proto=ntlmv2

    Fix this by (re-)using the given server timestamp for the
    ntlmv2 authentication (as Windows 7 does).

    A related problem was also reported earlier by Namjae Jeon (see below):

    Windows machine has extended security feature which refuse to allow
    authentication when there is time difference between server time and
    client time when ntlmv2 negotiation is used. This problem is prevalent
    in embedded environments where the system time is set to the default of 1970.

    Modern servers send the server timestamp in the TargetInfo AV_PAIR
    structure in the challenge message [see MS-NLMP 2.2.2.1]. In
    [MS-NLMP 3.1.5.1.2] it is explicitly mentioned that the client must
    use the server-provided timestamp if present, OR the current time if
    it is not.
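
    A hedged sketch of the idea when building the NTLMv2 blob (the
    find_timestamp() helper name is an assumption; cifs_UnixTimeToNT() is the
    existing conversion helper):

        __le64 rsp_timestamp = find_timestamp(ses); /* from TargetInfo AV_PAIR */

        /* prefer the server's clock; fall back to ours only if none was sent */
        ntlmv2->time = rsp_timestamp ? rsp_timestamp :
                cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME));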

    Reported-by: Namjae Jeon
    Signed-off-by: Peter Seiderer
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Peter Seiderer
     
  • commit 646200a041203f440fb6fcf9cacd9efeda9de74c upstream.

    The error paths in set_file_size for cifs and smb3 are incorrect.

    In the unlikely event that a server did not support set file info
    of the file size, the code incorrectly falls back to trying SMBWriteX
    (note that only the original core SMB Write, used for example by DOS,
    can set the file size this way - this actually does not work for the more
    recent SMBWriteX). The idea was that, since the old DOS SMB Write can set
    the file size if you write zero bytes at that offset, it could be used as a
    fallback when the server rejects the normal set file info call.

    Fortunately the SMBWriteX will never be sent on the wire (except when
    file size is zero) since the length and offset fields were reversed
    in the two places in this function that call SMBWriteX causing
    the fall back path to return an error. It is also important to never call
    an SMB request from an SMB2/SMB3 session (which theoretically would
    be possible, and can cause a brief session drop, although the client
    recovers) so this should be fixed. In practice this path does not happen
    with modern servers but the error fall back to SMBWriteX is clearly wrong.

    Remove the calls to SMBWriteX in the error paths in cifs_set_file_size().

    Pointed out by PaX/grsecurity team

    Signed-off-by: Steve French
    Reported-by: PaX Team
    CC: Emese Revfy
    CC: Brad Spengler
    Signed-off-by: Greg Kroah-Hartman

    Steve French
     
  • commit e0ddde9d44e37fbc21ce893553094ecf1a633ab5 upstream.

    Leases (oplocks) were always requested for SMB2/SMB3 even when oplocks
    were disabled in the cifs.ko module.

    Signed-off-by: Steve French
    Reviewed-by: Chandrika Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Steve French
     
  • commit ceb1b0b9b4d1089e9f2731a314689ae17784c861 upstream.

    Kerberos, which is very important for security, was only enabled for
    CIFS, not SMB2/SMB3, mounts (e.g. vers=3.0).

    Patch based on the information detailed in
    http://thread.gmane.org/gmane.linux.kernel.cifs/10081/focus=10307
    to enable Kerberized SMB2/SMB3

    a) SMB2_negotiate: enable/use decode_negTokenInit in SMB2_negotiate
    b) SMB2_sess_setup: handle Kerberos sectype and replicate Kerberos
    SMB1 processing done in sess_auth_kerberos

    Signed-off-by: Noel Power
    Signed-off-by: Jim McDonough
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Steve French
     
  • commit 8fa4592a14ebb3c22a21d846d1e4f65dab7d1a7c upstream.

    If all other conditions in nfs_can_extend_write() are met, and there
    are no locks, then we should be able to assume close-to-open semantics
    and the ability to extend our write to cover the whole page.

    With this patch, the xfstests generic/074 test completes in 242s instead
    of >1400s on my test rig.
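
    A sketch of the relaxed check in nfs_can_extend_write(), assuming the
    file_lock_context fields introduced by bd61e0a9c852:

        struct file_lock_context *flctx = file_inode(file)->i_flctx;

        /* no locks at all: close-to-open semantics let us extend the write */
        if (!flctx || (list_empty_careful(&flctx->flc_flock) &&
                       list_empty_careful(&flctx->flc_posix)))
            return 1;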

    Fixes: bd61e0a9c852 ("locks: convert posix locks to file_lock_context")
    Cc: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 048883e0b934d9a5103d40e209cb14b7f33d2933 upstream.

    We really want sizeof(struct page *) instead of sizeof(struct page).
    Otherwise we limit the maximum IO size to 64 pages rather than 512 pages
    on a 64-bit system (a 4096-byte page divided by the 64-byte struct page
    instead of by an 8-byte pointer).

    Fixes 2e11f829(nfs: cap request size to fit a kmalloced page array).

    Cc: Christoph Hellwig
    Signed-off-by: Peng Tao
    Fixes: 2e11f8296d22 ("nfs: cap request size to fit a kmalloced page array")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Peng Tao
     
  • commit 6f29b9bba7b08c6b1d6f2cc4cf750b342fc1946c upstream.

    There is a reference leak of layout segment after resetting
    pageio read/write to mds.

    Signed-off-by: Kinglong Mee
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Kinglong Mee
     
  • commit 808f80b46790f27e145c72112189d6a3be2bc884 upstream.

    My previous fix in commit 005efedf2c7d ("Btrfs: fix read corruption of
    compressed and shared extents") was effective only if the compressed
    extents cover a file range with a length that is not a multiple of 16
    pages. That's because the detection of when we reached a different range
    of the file that shares the same compressed extent as the previously
    processed range was done at extent_io.c:__do_contiguous_readpages(),
    which covers subranges with a length up to 16 pages, because
    extent_readpages() groups the pages in clusters no larger than 16 pages.
    So fix this by tracking the start of the previously processed file
    range's extent map at extent_readpages().

    The following test case for fstests reproduces the issue:

    seq=`basename $0`
    seqres=$RESULT_DIR/$seq
    echo "QA output created by $seq"
    tmp=/tmp/$$
    status=1 # failure is the default!
    trap "_cleanup; exit \$status" 0 1 2 3 15

    _cleanup()
    {
    rm -f $tmp.*
    }

    # get standard environment, filters and checks
    . ./common/rc
    . ./common/filter

    # real QA test starts here
    _need_to_be_root
    _supported_fs btrfs
    _supported_os Linux
    _require_scratch
    _require_cloner

    rm -f $seqres.full

    test_clone_and_read_compressed_extent()
    {
    local mount_opts=$1

    _scratch_mkfs >>$seqres.full 2>&1
    _scratch_mount $mount_opts

    # Create our test file with a single extent of 64Kb that is going to
    # be compressed no matter which compression algo is used (zlib/lzo).
    $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \
    $SCRATCH_MNT/foo | _filter_xfs_io

    # Now clone the compressed extent into an adjacent file offset.
    $CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \
    $SCRATCH_MNT/foo $SCRATCH_MNT/foo

    echo "File digest before unmount:"
    md5sum $SCRATCH_MNT/foo | _filter_scratch

    # Remount the fs or clear the page cache to trigger the bug in
    # btrfs. Because the extent has an uncompressed length that is a
    # multiple of 16 pages, all the pages belonging to the second range
    # of the file (64K to 128K), which points to the same extent as the
    # first range (0K to 64K), had their contents full of zeroes instead
    # of the byte 0xaa. This was a bug exclusively in the read path of
    # compressed extents, the correct data was stored on disk, btrfs
    # just failed to fill in the pages correctly.
    _scratch_remount

    echo "File digest after remount:"
    # Must match the digest we got before.
    md5sum $SCRATCH_MNT/foo | _filter_scratch
    }

    echo -e "\nTesting with zlib compression..."
    test_clone_and_read_compressed_extent "-o compress=zlib"

    _scratch_unmount

    echo -e "\nTesting with lzo compression..."
    test_clone_and_read_compressed_extent "-o compress=lzo"

    status=0
    exit

    Signed-off-by: Filipe Manana
    Tested-by: Timofey Titovets
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit 005efedf2c7d0a270ffbe28d8997b03844f3e3e7 upstream.

    If a file has a range pointing to a compressed extent, followed by
    another range that points to the same compressed extent and a read
    operation attempts to read both ranges (either completely or part of
    them), the pages that correspond to the second range are incorrectly
    filled with zeroes.

    Consider the following example:

    File layout
    [0 - 8K]                       [8K - 24K]
        |                              |
        |                              |
     points to extent X,            points to extent X,
     offset 4K, length of 8K        offset 0, length 16K

    [extent X, compressed length = 4K, uncompressed length = 16K]

    If a readpages() call spans the 2 ranges, a single bio to read the extent
    is submitted - extent_io.c:submit_extent_page() would only create a new
    bio to cover the second range pointing to the extent if the extent it
    points to had a different logical address than the extent associated with
    the first range. The consequence is that the compressed read end io
    handler (compression.c:end_compressed_bio_read()) finishes once the extent
    is decompressed into the pages covering the first range, leaving the
    remaining pages (belonging to the second range) filled with zeroes (done
    by compression.c:btrfs_clear_biovec_end()).

    So fix this by submitting the current bio whenever we find a range
    pointing to a compressed extent that was preceded by a range with a
    different extent map. This is the simplest solution for this corner
    case. Making the end io callback populate both ranges (or more, if we
    have multiple pointing to the same extent) is a much more complex
    solution since each bio is tightly coupled with a single extent map and
    the extent maps associated to the ranges pointing to the shared extent
    can have different offsets and lengths.
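
    A hedged sketch of the submit decision in the readpage path (names such
    as prev_em_start and force_bio_submit assumed from the description):

        if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags) &&
            prev_em_start && *prev_em_start != (u64)-1 &&
            *prev_em_start != em->orig_start) {
                /* a new file range, but the same compressed extent */
                force_bio_submit = true;
        }

        if (prev_em_start)
                *prev_em_start = em->orig_start;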

    The following test case for fstests triggers the issue:

    seq=`basename $0`
    seqres=$RESULT_DIR/$seq
    echo "QA output created by $seq"
    tmp=/tmp/$$
    status=1 # failure is the default!
    trap "_cleanup; exit \$status" 0 1 2 3 15

    _cleanup()
    {
    rm -f $tmp.*
    }

    # get standard environment, filters and checks
    . ./common/rc
    . ./common/filter

    # real QA test starts here
    _need_to_be_root
    _supported_fs btrfs
    _supported_os Linux
    _require_scratch
    _require_cloner

    rm -f $seqres.full

    test_clone_and_read_compressed_extent()
    {
    local mount_opts=$1

    _scratch_mkfs >>$seqres.full 2>&1
    _scratch_mount $mount_opts

    # Create a test file with a single extent that is compressed (the
    # data we write into it is highly compressible no matter which
    # compression algorithm is used, zlib or lzo).
    $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K" \
    -c "pwrite -S 0xbb 4K 8K" \
    -c "pwrite -S 0xcc 12K 4K" \
    $SCRATCH_MNT/foo | _filter_xfs_io

    # Now clone our extent into an adjacent offset.
    $CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \
    $SCRATCH_MNT/foo $SCRATCH_MNT/foo

    # Same as before but for this file we clone the extent into a lower
    # file offset.
    $XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K" \
    -c "pwrite -S 0xbb 12K 8K" \
    -c "pwrite -S 0xcc 20K 4K" \
    $SCRATCH_MNT/bar | _filter_xfs_io

    $CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \
    $SCRATCH_MNT/bar $SCRATCH_MNT/bar

    echo "File digests before unmounting filesystem:"
    md5sum $SCRATCH_MNT/foo | _filter_scratch
    md5sum $SCRATCH_MNT/bar | _filter_scratch

    # Evicting the inode or clearing the page cache before reading
    # again the file would also trigger the bug - reads were returning
    # all bytes in the range corresponding to the second reference to
    # the extent with a value of 0, but the correct data was persisted
    # (it was a bug exclusively in the read path). The issue happened
    # only if the same readpages() call targeted pages belonging to the
    # first and second ranges that point to the same compressed extent.
    _scratch_remount

    echo "File digests after mounting filesystem again:"
    # Must match the same digests we got before.
    md5sum $SCRATCH_MNT/foo | _filter_scratch
    md5sum $SCRATCH_MNT/bar | _filter_scratch
    }

    echo -e "\nTesting with zlib compression..."
    test_clone_and_read_compressed_extent "-o compress=zlib"

    _scratch_unmount

    echo -e "\nTesting with lzo compression..."
    test_clone_and_read_compressed_extent "-o compress=lzo"

    status=0
    exit

    Signed-off-by: Filipe Manana
    Reviewed-by: Qu Wenruo
    Reviewed-by: Liu Bo
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit a30e577c96f59b1e1678ea5462432b09bf7d5cbc upstream.

    In btrfs_evict_inode, we properly truncate the page cache for evicted
    inodes but then we call btrfs_wait_ordered_range for every inode as well.
    It's the right thing to do for regular files but results in incorrect
    behavior for device inodes for block devices.

    filemap_fdatawrite_range gets called with inode->i_mapping which gets
    resolved to the block device inode before getting passed to
    wbc_attach_fdatawrite_inode and ultimately to inode_to_bdi. What happens
    next depends on whether there's an open file handle associated with the
    inode. If there is, we write to the block device, which is unexpected
    behavior. If there isn't, we fall through normally and inode->i_data is used.
    We can also end up racing against open/close which can result in crashes
    when i_mapping points to a block device inode that has been closed.

    Since there can't be any page cache associated with special file inodes,
    it's safe to skip the btrfs_wait_ordered_range call entirely and avoid
    the problem.
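
    A sketch of the likely shape of the fix in btrfs_evict_inode() (hedged;
    special_file() is true for block/char devices, fifos and sockets):

        if (!special_file(inode->i_mode))
            btrfs_wait_ordered_range(inode, 0, (u64)-1);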

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100911
    Tested-by: Christoph Biedl
    Signed-off-by: Jeff Mahoney
    Reviewed-by: Filipe Manana
    Signed-off-by: Greg Kroah-Hartman

    Jeff Mahoney
     
  • commit 012572d4fc2e4ddd5c8ec8614d51414ec6cae02a upstream.

    The order of the following three spinlocks should be:
    dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock

    But dlm_dispatch_assert_master() is called while holding
    dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls
    dlm_grab() which will take dlm_domain_lock.

    Once another thread (for example, dlm_query_join_handler) has already
    taken dlm_domain_lock and then tries to take dlm_ctxt->spinlock, a
    deadlock happens.

    Signed-off-by: Joseph Qi
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc: "Junxiao Bi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Joseph Qi
     
  • commit f0b2e563bc419df7c1b3d2f494574c25125f6aed upstream.

    The dax code doesn't currently support misaligned partitions,
    so disable O_DIRECT via dax until such time as that support
    materializes.

    Suggested-by: Boaz Harrosh
    Signed-off-by: Jeff Moyer
    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

    Jeff Moyer
     

30 Sep, 2015

8 commits

  • commit 841df7df196237ea63233f0f9eaa41db53afd70f upstream.

    Commit 6f6a6fda2945 "jbd2: fix ocfs2 corrupt when updating journal
    superblock fails" changed jbd2_cleanup_journal_tail() to return EIO
    when the journal is aborted. That makes the logic in
    jbd2_log_do_checkpoint() bail out, which is fine, except that
    jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make
    progress in cleaning the journal. Without that, jbd2_journal_destroy()
    just spins in an infinite loop.

    Fix jbd2_journal_destroy() to clean up the journal checkpoint lists if
    jbd2_log_do_checkpoint() fails with an error.
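
    A hedged sketch of the destroy loop with such a bail-out
    (jbd2_journal_destroy_checkpoint() is the existing helper that throws
    away the checkpoint lists):

        while (journal->j_checkpoint_transactions != NULL) {
            spin_unlock(&journal->j_list_lock);
            mutex_lock(&journal->j_checkpoint_mutex);
            err = jbd2_log_do_checkpoint(journal);
            mutex_unlock(&journal->j_checkpoint_mutex);
            /* checkpointing failed on an aborted journal: free the
             * buffers instead of looping forever */
            if (err) {
                jbd2_journal_destroy_checkpoint(journal);
                spin_lock(&journal->j_list_lock);
                break;
            }
            spin_lock(&journal->j_list_lock);
        }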

    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Fixes: 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 7cb74be6fd827e314f81df3c5889b87e4c87c569 upstream.

    Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
    hfs_bnode_find() for finding or creating pages corresponding to an inode)
    are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
    and should not be page_cache_release()'ed until hfs_bnode_free().

    This patch fixes a problem I first saw in July 2012: merely running "du"
    on a large hfsplus-mounted directory a few times on a reasonably loaded
    system would get the hfsplus driver all confused and complaining about
    B-tree inconsistencies, and generate a "BUG: Bad page state". Most
    recently, I can generate this problem on up-to-date Fedora 22 with shipped
    kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
    mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
    lightly-used QEMU VM image of the full Mac OS X 10.9:

    $ df -i / /home /mnt
    Filesystem                  Inodes   IUsed      IFree IUse% Mounted on
    /dev/mapper/fedora-root    3276800  551665    2725135   17% /
    /dev/mapper/fedora-home   52879360  716221   52163139    2% /home
    /dev/nbd0p2             4294967295 1387818 4293579477    1% /mnt

    After applying the patch, I was able to run "du /" (60+ times) and "du
    /mnt" (150+ times) continuously and simultaneously for 6+ hours.

    There are many reports of the hfsplus driver getting confused under load
    and generating "BUG: Bad page state" or other similar issues over the
    years. [1]

    The unpatched code [2] has always been wrong since it entered the kernel
    tree. The only reason why it gets away with it is that the
    kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
    the kernel has not had a chance to reuse the memory for something else,
    most of the time.

    The current RW driver appears to have followed the design and development
    of the earlier read-only hfsplus driver [3], whereby version 0.1 (Dec
    2001) had a B-tree node-centric approach to
    read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
    migrating towards version 0.2 (June 2002) of caching and releasing pages
    per inode extents. When the current RW code first entered the kernel [2]
    in 2005, there was an REF_PAGES conditional (and "//" commented out code)
    to switch between B-node centric paging to inode-centric paging. There
    was a mistake with the direction of one of the REF_PAGES conditionals in
    __hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the
    read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
    removed, but a page_cache_release() was mistakenly left in (propagating
    the "REF_PAGES !REF_PAGE" mistake), and the commented-out
    page_cache_release() in bnode_release() (which should be spanned by
    !REF_PAGES) was never enabled.

    References:
    [1]:
    Michael Fox, Apr 2013
    http://www.spinics.net/lists/linux-fsdevel/msg63807.html
    ("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")

    Sasha Levin, Feb 2015
    http://lkml.org/lkml/2015/2/20/85 ("use after free")

    https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
    https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
    https://bugzilla.kernel.org/show_bug.cgi?id=42342
    https://bugzilla.kernel.org/show_bug.cgi?id=63841
    https://bugzilla.kernel.org/show_bug.cgi?id=78761

    [2]:
    http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
    fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
    commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
    Author: Andrew Morton
    Date: Wed Feb 25 16:17:36 2004 -0800

    [PATCH] HFS rewrite

    http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
    fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd

    commit 91556682e0bf004d98a529bf829d339abb98bbbd
    Author: Andrew Morton
    Date: Wed Feb 25 16:17:48 2004 -0800

    [PATCH] HFS+ support

    [3]:
    http://sourceforge.net/projects/linux-hfsplus/

    http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
    http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/

    http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
    fs/hfsplus/bnode.c?r1=1.4&r2=1.5

    Date: Thu Jun 6 09:45:14 2002 +0000
    Use buffer cache instead of page cache in bnode.c. Cache inode extents.

    [4]:
    http://git.kernel.org/cgit/linux/kernel/git/\
    stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d

    commit a5e3985fa014029eb6795664c704953720cc7f7d
    Author: Roman Zippel
    Date: Tue Sep 6 15:18:47 2005 -0700

    [PATCH] hfs: remove debug code

    Signed-off-by: Hin-Tak Leung
    Signed-off-by: Sergei Antonov
    Reviewed-by: Anton Altaparmakov
    Reported-by: Sasha Levin
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Vyacheslav Dubeyko
    Cc: Sougata Santra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Hin-Tak Leung
     
  • commit b4cc0efea4f0bfa2477c56af406cfcf3d3e58680 upstream.

    Fix B-tree corruption when a new record is inserted at position 0 in the
    node in hfs_brec_insert().

    This makes the same change to the corresponding hfs B-tree code as Sergei
    Antonov's "hfsplus: fix B-tree corruption after insertion at position 0",
    to keep similar code paths in the hfs and hfsplus drivers in sync, where
    appropriate.

    Signed-off-by: Hin-Tak Leung
    Cc: Sergei Antonov
    Cc: Joe Perches
    Reviewed-by: Vyacheslav Dubeyko
    Cc: Anton Altaparmakov
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Hin-Tak Leung
     
  • commit 5556e7e6d30e8e9b5ee51b0e5edd526ee80e5e36 upstream.

    Consider eCryptfs dcache entries to be stale when the corresponding
    lower inode's i_nlink count is zero. This solves a problem caused by the
    lower inode being directly modified, without going through the eCryptfs
    mount, leaving stale eCryptfs dentries cached and the eCryptfs inode's
    i_nlink count not being cleared.
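
    A hedged sketch of the extra check in ecryptfs_d_revalidate() (helpers are
    the existing eCryptfs/VFS ones, exact placement assumed):

        if (d_really_is_positive(dentry)) {
            struct inode *lower_inode =
                ecryptfs_inode_to_lower(d_inode(dentry));

            /* lower file was unlinked behind our back: dentry is stale */
            if (!lower_inode->i_nlink)
                rc = 0;
        }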

    Signed-off-by: Tyler Hicks
    Reported-by: Richard Weinberger
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Tyler Hicks
     
  • commit 40f705a736eac10e7dca7ab5dd5ed675a6df031d upstream.

    On a filesystem like vfat, all files are created with the same owner
    and mode independent of who created the file. When a vfat filesystem
    is mounted with root as owner of all files and read access for everyone,
    root's processes left world-readable coredumps on it (but other
    users' processes only left empty corefiles when given write access
    because of the uid mismatch).

    Given that the old behavior was inconsistent and insecure, I don't see
    a problem with changing it. Now, all processes refuse to dump core unless
    the resulting corefile will only be readable by their owner.

    Signed-off-by: Jann Horn
    Acked-by: Kees Cook
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit fbb1816942c04429e85dbf4c1a080accc534299e upstream.

    It was possible for an attacking user to trick root (or another user) into
    writing his coredumps into an attacker-readable, pre-existing file using
    rename() or link(), causing the disclosure of secret data from the victim
    process' virtual memory. Depending on the configuration, it was also
    possible to trick root into overwriting system files with coredumps. Fix
    that issue by never writing coredumps into existing files.

    Requirements for the attack:
    - The attack only applies if the victim's process has a nonzero
    RLIMIT_CORE and is dumpable.
    - The attacker can trick the victim into coredumping into an
    attacker-writable directory D, either because the core_pattern is
    relative and the victim's cwd is attacker-writable or because an
    absolute core_pattern pointing to a world-writable directory is used.
    - The attacker has one of these:
    A: on a system with protected_hardlinks=0:
    execute access to a folder containing a victim-owned,
    attacker-readable file on the same partition as D, and the
    victim-owned file will be deleted before the main part of the attack
    takes place. (In practice, there are lots of files that fulfill
    this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
    This does not apply to most Linux systems because most distros set
    protected_hardlinks=1.
    B: on a system with protected_hardlinks=1:
    execute access to a folder containing a victim-owned,
    attacker-readable and attacker-writable file on the same partition
    as D, and the victim-owned file will be deleted before the main part
    of the attack takes place.
    (This seems to be uncommon.)
    C: on any system, independent of protected_hardlinks:
    write access to a non-sticky folder containing a victim-owned,
    attacker-readable file on the same partition as D
    (This seems to be uncommon.)

    The basic idea is that the attacker moves the victim-owned file to where
    he expects the victim process to dump its core. The victim process dumps
    its core into the existing file, and the attacker reads the coredump from
    it.

    If the attacker can't move the file because he does not have write access
    to the containing directory, he can instead link the file to a directory
    he controls, then wait for the original link to the file to be deleted
    (because the kernel checks that the link count of the corefile is 1).

    A less reliable variant that requires D to be non-sticky works with link()
    and does not require deletion of the original link: link() the file into
    D, but then unlink() it directly before the kernel performs the link count
    check.

    On systems with protected_hardlinks=0, this variant allows an attacker to
    not only gain information from coredumps, but also clobber existing,
    victim-writable files with coredumps. (This could theoretically lead to a
    privilege escalation.)
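
    A hedged sketch of the corresponding change in do_coredump() (the exact
    flag set is assumed; the key point is O_EXCL so an existing file is never
    reused):

        int open_flags = O_CREAT | O_RDWR | O_NOFOLLOW |
                         O_LARGEFILE | O_EXCL;

        cprm.file = filp_open(cn.corename, open_flags, 0600);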

    Signed-off-by: Jann Horn
    Cc: Kees Cook
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit 36319608e28701c07cad80ae3be8b0fdfb1ab40f upstream.

    This reverts commit 4e379d36c050b0117b5d10048be63a44f5036115.

    This commit opens up a race between the recovery code and the open code.

    Reported-by: Olga Kornievskaia
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 4a1e2feb9d246775dee0f78ed5b18826bae2b1c5 upstream.

    According to RFC5661 Section 18.2.4, CLOSE is supposed to return
    the zero stateid. This means that nfs_clear_open_stateid_locked()
    cannot assume that the result stateid will always match the 'other'
    field of the existing open stateid when trying to determine a race
    with a parallel OPEN.

    Instead, we look at the argument, and check for matches.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust