Eric Lee / smarc-fsl-linux-kernel

30 Dec, 2015

3 commits

cc28d6d80 ocfs2/dlm: clear migration_pending when migration target goes down ... Browse Code »

We have found a BUG on res->migration_pending when migrating lock
resources. The situation is as follows.

dlm_mark_lockres_migration
res->migration_pending = 1;
__dlm_lockres_reserve_ast
dlm_lockres_release_ast returns with res->migration_pending remains
because other threads reserve asts
wait dlm_migration_can_proceed returns 1
>>>>>>> o2hb found that target goes down and remove target
from domain_map
dlm_migration_can_proceed returns 1
dlm_mark_lockres_migrating returns -ESHOTDOWN with
res->migration_pending still remains.

When reentering dlm_mark_lockres_migrating(), it will trigger the BUG_ON
with res->migration_pending. So clear migration_pending when target is
down.

Signed-off-by: Jiufei Xue
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

xuejiufei
2015-12-30 09:45:49 +0800
b5a8bc338 ocfs2: fix flock panic issue ... Browse Code »

Commit 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()")
move flock/posix lock indentify code to locks_lock_inode_wait(), but
missed to set fl_flags to FL_FLOCK which caused the following kernel
panic on 4.4.0_rc5.

kernel BUG at fs/locks.c:1895!
invalid opcode: 0000 [#1] SMP
Modules linked in: ocfs2(O) ocfs2_dlmfs(O) ocfs2_stack_o2cb(O) ocfs2_dlm(O) ocfs2_nodemanager(O) ocfs2_stackglue(O) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xen_kbdfront xen_netfront xen_fbfront xen_blkfront
CPU: 0 PID: 20268 Comm: flock_unit_test Tainted: G O 4.4.0-rc5-next-20151217 #1
Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
task: ffff88007b3672c0 ti: ffff880028b58000 task.ti: ffff880028b58000
RIP: locks_lock_inode_wait+0x2e/0x160
Call Trace:
ocfs2_do_flock+0x91/0x160 [ocfs2]
ocfs2_flock+0x76/0xd0 [ocfs2]
SyS_flock+0x10f/0x1a0
entry_SYSCALL_64_fastpath+0x12/0x71
Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 89 f3 48 81 ec 88 00 00 00 8b 46 40 83 e0 03 83 f8 01 0f 84 ad 00 00 00 83 f8 02 74 04 0b eb fe 4c 8d ad 60 ff ff ff 4c 8d 7b 58 e8 0e 8e 73 00 4d
RIP locks_lock_inode_wait+0x2e/0x160
RSP
---[ end trace dfca74ec9b5b274c ]---

Fixes: 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()")
Signed-off-by: Junxiao Bi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Junxiao Bi
2015-12-30 09:45:49 +0800
5c9ee4cbf ocfs2: fix BUG when calculate new backup super ... Browse Code »

When resizing, it firstly extends the last gd. Once it should backup
super in the gd, it calculates new backup super and update the
corresponding value.

But it currently doesn't consider the situation that the backup super is
already done. And in this case, it still sets the bit in gd bitmap and
then decrease from bg_free_bits_count, which leads to a corrupted gd and
trigger the BUG in ocfs2_block_group_set_bits:

BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);

So check whether the backup super is done and then do the updates.

Signed-off-by: Joseph Qi
Reviewed-by: Jiufei Xue
Reviewed-by: Yiwen Jiang
Cc: Mark Fasheh
Cc: Joel Becker
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-12-30 09:45:49 +0800

23 Dec, 2015

1 commit

0bee6ec80 Merge tag 'nfsd-4.4-1' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd fix from Bruce Fields:
"Just one fix for a NFSv4 callback bug introduced in 4.4"

* tag 'nfsd-4.4-1' of git://linux-nfs.org/~bfields/linux:
nfsd: don't hold ls_mutex across a layout recall

Linus Torvalds
2015-12-23 07:52:32 +0800

19 Dec, 2015

2 commits

fc315e3e5 Merge branch 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs fixes from Chris Mason:
"A couple of small fixes"

* 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: check prepare_uptodate_page() error code earlier
Btrfs: check for empty bitmap list in setup_cluster_bitmaps
btrfs: fix misleading warning when space cache failed to load
Btrfs: fix transaction handle leak in balance
Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list

Linus Torvalds
2015-12-19 07:35:08 +0800
41a0c249c proc: fix -ESRCH error when writing to /proc/$pid/coredump_filter ... Browse Code »

Writing to /proc/$pid/coredump_filter always returns -ESRCH because commit
774636e19ed51 ("proc: convert to kstrto*()/kstrto*_from_user()") removed
the setting of ret after the get_proc_task call and incorrectly left it as
-ESRCH. Instead, return 0 when successful.

Example breakage:

echo 0 > /proc/self/coredump_filter
bash: echo: write error: No such process

Fixes: 774636e19ed51 ("proc: convert to kstrto*()/kstrto*_from_user()")
Signed-off-by: Colin Ian King
Acked-by: Kees Cook
Cc: [4.3+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Colin Ian King
2015-12-19 06:25:40 +0800

17 Dec, 2015

1 commit

be20aa00c nfsd: don't hold ls_mutex across a layout recall ... Browse Code »

We do need to serialize layout stateid morphing operations, but we
currently hold the ls_mutex across a layout recall which is pretty
ugly. It's also unnecessary -- once we've bumped the seqid and
copied it, we don't need to serialize the rest of the CB_LAYOUTRECALL
vs. anything else. Just drop the mutex once the copy is done.

This was causing a "workqueue leaked lock or atomic" warning and an
occasional deadlock.

There's more work to be done here but this fixes the immediate
regression.

Fixes: cc8a55320b5f "nfsd: serialize layout stateid morphing operations"
Cc: stable@vger.kernel.org
Reported-by: Kinglong Mee
Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Jeff Layton
2015-12-17 00:49:58 +0800

16 Dec, 2015

3 commits

1d3a5a82f Merge branch 'for-chris-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/fd… ... Browse Code »

…manana/linux into for-linus-4.4

Chris Mason
2015-12-16 01:09:59 +0800
bb1591b4e Btrfs: check prepare_uptodate_page() error code earlier ... Browse Code »

prepare_pages() may end up calling prepare_uptodate_page() twice if our
write only spans a single page. But if the first call returns an error,
our page will be unlocked and its not safe to call it again.

This bug goes all the way back to 2011, and it's not something commonly
hit.

While we're here, add a more explicit check for the page being truncated
away. The bare lock_page() alone is protected only by good thoughts and
i_mutex, which we're sure to regret eventually.

Reported-by: Dave Jones
Signed-off-by: Chris Mason

Chris Mason
2015-12-16 01:09:38 +0800
1b9b922a3 Btrfs: check for empty bitmap list in setup_cluster_bitmaps ... Browse Code »

Dave Jones found a warning from kasan in setup_cluster_bitmaps()

==================================================================
BUG: KASAN: stack-out-of-bounds in setup_cluster_bitmap+0xc4/0x5a0 at
addr ffff88039bef6828
Read of size 8 by task nfsd/1009
page:ffffea000e6fbd80 count:0 mapcount:0 mapping: (null)
index:0x0
flags: 0x8000000000000000()
page dumped because: kasan: bad access detected
CPU: 1 PID: 1009 Comm: nfsd Tainted: G W
4.4.0-rc3-backup-debug+ #1
ffff880065647b50 000000006bb712c2 ffff88039bef6640 ffffffffa680a43e
0000004559c00000 ffff88039bef66c8 ffffffffa62638d1 ffffffffa61121c0
ffff8803a5769de8 0000000000000296 ffff8803a5769df0 0000000000046280
Call Trace:
[] dump_stack+0x4b/0x6d
[] kasan_report_error+0x501/0x520
[] ? debug_show_all_locks+0x1e0/0x1e0
[] kasan_report+0x58/0x60
[] ? rb_last+0x10/0x40
[] ? setup_cluster_bitmap+0xc4/0x5a0
[] __asan_load8+0x5d/0x70
[] setup_cluster_bitmap+0xc4/0x5a0
[] ? setup_cluster_no_bitmap+0x6a/0x400
[] btrfs_find_space_cluster+0x4b6/0x640
[] ? btrfs_alloc_from_cluster+0x4e0/0x4e0
[] ? btrfs_return_cluster_to_free_space+0x9e/0xb0
[] ? _raw_spin_unlock+0x27/0x40
[] find_free_extent+0xba1/0x1520

Andrey noticed this was because we were doing list_first_entry on a list
that might be empty. Rework the tests a bit so we don't do that.

Signed-off-by: Chris Mason
Reprorted-by: Andrey Ryabinin
Reported-by: Dave Jones

Chris Mason
2015-12-16 01:09:33 +0800

14 Dec, 2015

2 commits

dfd01f026 sched/wait: Fix the signal handling fix ... Browse Code »

Jan Stancek reported that I wrecked things for him by fixing things for
Vladimir :/

His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
should not be possible, however my previous patch made this possible by
unconditionally checking signal_pending().

We cannot use current->state as was done previously, because the
instruction after the store to that variable it can be changed. We must
instead pass the initial state along and use that.

Fixes: 68985633bccb ("sched/wait: Fix signal handling in bit wait helpers")
Reported-by: Jan Stancek
Reported-by: Chris Mason
Tested-by: Jan Stancek
Tested-by: Vladimir Murzin
Tested-by: Chris Mason
Reviewed-by: Paul Turner
Cc: Ingo Molnar
Cc: tglx@linutronix.de
Cc: Oleg Nesterov
Cc: hpa@zytor.com
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Linus Torvalds

Peter Zijlstra
2015-12-14 06:30:59 +0800
fc8918283 Merge tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfix from Trond Myklebust:
"SUNRPC: Fix a NFSv4.1 callback channel regression"

* tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Fix callback channel

Linus Torvalds
2015-12-14 04:46:04 +0800

13 Dec, 2015

4 commits

800f1ac47 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge misc fixes from Andrew Morton:
"17 fixes"

* emailed patches from Andrew Morton :
MIPS: fix DMA contiguous allocation
sh64: fix __NR_fgetxattr
ocfs2: fix SGID not inherited issue
mm/oom_kill.c: avoid attempting to kill init sharing same memory
drivers/base/memory.c: prohibit offlining of memory blocks with missing sections
tmpfs: fix shmem_evict_inode() warnings on i_blocks
mm/hugetlb.c: fix resv map memory leak for placeholder entries
mm: hugetlb: call huge_pte_alloc() only if ptep is null
kernel: remove stop_machine() Kconfig dependency
mm: kmemleak: mark kmemleak_init prototype as __init
mm: fix kerneldoc on mem_cgroup_replace_page
osd fs: __r4w_get_page rely on PageUptodate for uptodate
MAINTAINERS: make Vladimir co-maintainer of the memory controller
mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo
memcg: fix memory.high target
mm: hugetlb: fix hugepage memory leak caused by wrong reserve count

Linus Torvalds
2015-12-13 02:44:49 +0800
780756318 Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block layer fixes from Jens Axboe:
"A set of fixes for the current series. This contains:

- A bunch of fixes for lightnvm, should be the last round for this
series. From Matias and Wenwei.

- A writeback detach inode fix from Ilya, also marked for stable.

- A block (though it says SCSI) fix for an OOPS in SCSI runtime power
management.

- Module init error path fixes for null_blk from Minfei"

* 'for-linus' of git://git.kernel.dk/linux-block:
null_blk: Fix error path in module initialization
lightnvm: do not compile in debugging by default
lightnvm: prevent gennvm module unload on use
lightnvm: fix media mgr registration
lightnvm: replace req queue with nvmdev for lld
lightnvm: comments on constants
lightnvm: check mm before use
lightnvm: refactor spin_unlock in gennvm_get_blk
lightnvm: put blks when luns configure failed
lightnvm: use flags in rrpc_get_blk
block: detach bdev inode from its wb in __blkdev_put()
SCSI: Fix NULL pointer dereference in runtime PM

Linus Torvalds
2015-12-13 02:24:00 +0800
854ee2e94 ocfs2: fix SGID not inherited issue ... Browse Code »

Commit 8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an
issue, SGID of sub dir was not inherited from its parents dir. It is
because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but
is overwritten by "mode" which don't have SGID set later.

Fixes: 8f1eb48758aa ("ocfs2: fix umask ignored issue")
Signed-off-by: Junxiao Bi
Cc: Mark Fasheh
Cc: Joel Becker
Acked-by: Srinivas Eeda
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Junxiao Bi
2015-12-13 02:15:34 +0800
3066a9670 osd fs: __r4w_get_page rely on PageUptodate for uptodate ... Browse Code »

Commit 42cb14b110a5 ("mm: migrate dirty page without
clear_page_dirty_for_io etc") simplified the migration of a PageDirty
pagecache page: one stat needs moving from zone to zone and that's about
all.

It's convenient and safest for it to shift the PageDirty bit from old
page to new, just before updating the zone stats: before copying data
and marking the new PageUptodate. This is all done while both pages are
isolated and locked, just as before; and just as before, there's a
moment when the new page is visible in the radix_tree, but not yet
PageUptodate. What's new is that it may now be briefly visible as
PageDirty before it is PageUptodate.

When I scoured the tree to see if this could cause a problem anywhere,
the only places I found were in two similar functions __r4w_get_page():
which look up a page with find_get_page() (not using page lock), then
claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.

I'm not sure whether that was right before, but now it might be wrong
(on rare occasions): only claim the page is uptodate if PageUptodate.
Or perhaps the page in question could never be migratable anyway?

Signed-off-by: Hugh Dickins
Tested-by: Boaz Harrosh
Cc: Benny Halevy
Cc: Trond Myklebust
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2015-12-13 02:15:34 +0800

12 Dec, 2015

1 commit

732c4a9e1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse ... Browse Code »

Pull fuse fixes from Miklos Szeredi:
"Two bugfixes, both bound for -stable"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: break infinite loop in fuse_fill_write_pages()
cuse: fix memory leak

Linus Torvalds
2015-12-12 02:56:41 +0800

10 Dec, 2015

4 commits

94356889c btrfs: fix misleading warning when space cache failed to load ... Browse Code »

When an inconsistent space cache is detected during loading we log a
warning that users frequently mistake as instruction to invalidate the
cache manually, even though this is not required. Fix the message to
indicate that the cache will be rebuilt automatically.

Signed-off-by: Holger Hoffstätte
Acked-by: Filipe Manana

Holger Hoffstätte
2015-12-10 19:38:08 +0800
8a7d656f3 Btrfs: fix transaction handle leak in balance ... Browse Code »

If we fail to allocate a new data chunk, we were jumping to the error path
without release the transaction handle we got before. Fix this by always
releasing it before doing the jump.

Fixes: 2c9fe8355258 ("btrfs: Fix lost-data-profile caused by balance bg")
Signed-off-by: Filipe Manana

Filipe Manana
2015-12-10 19:23:24 +0800
348a0013d Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list ... Browse Code »

As of my previous change titled "Btrfs: fix scrub preventing unused block
groups from being deleted", the following warning at
extent-tree.c:btrfs_delete_unused_bgs() can be hit when we mount the a
filesysten with "-o discard":

10263 void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
10264 {
(...)
10405 if (trimming) {
10406 WARN_ON(!list_empty(&block_group->bg_list));
10407 spin_lock(&trans->transaction->deleted_bgs_lock);
10408 list_move(&block_group->bg_list,
10409 &trans->transaction->deleted_bgs);
10410 spin_unlock(&trans->transaction->deleted_bgs_lock);
10411 btrfs_get_block_group(block_group);
10412 }
(...)

This happens because scrub can now add back the block group to the list of
unused block groups (fs_info->unused_bgs). This is dangerous because we
are moving the block group from the unused block groups list to the list
of deleted block groups without holding the lock that protects the source
list (fs_info->unused_bgs_lock).

The following diagram illustrates how this happens:

CPU 1 CPU 2

cleaner_kthread()
btrfs_delete_unused_bgs()

sees bg X in list
fs_info->unused_bgs

deletes bg X from list
fs_info->unused_bgs

scrub_enumerate_chunks()

searches device tree using
its commit root

finds device extent for
block group X

gets block group X from the tree
fs_info->block_group_cache_tree
(via btrfs_lookup_block_group())

sets bg X to RO (again)

scrub_chunk(bg X)

sets bg X back to RW mode

adds bg X to the list
fs_info->unused_bgs again,
since it's still unused and
currently not in that list

sets bg X to RO mode

btrfs_remove_chunk(bg X)

--> discard is enabled and bg X
is in the fs_info->unused_bgs
list again so the warning is
triggered
--> we move it from that list into
the transaction's delete_bgs
list, but we can have another
task currently manipulating
the first list (fs_info->unused_bgs)

Fix this by using the same lock (fs_info->unused_bgs_lock) to protect both
the list of unused block groups and the list of deleted block groups. This
makes it safe and there's not much worry for more lock contention, as this
lock is seldom used and only the cleaner kthread adds elements to the list
of deleted block groups. The warning goes away too, as this was previously
an impossible case (and would have been better a BUG_ON/ASSERT) but it's
not impossible anymore.
Reproduced with fstest btrfs/073 (using MOUNT_OPTIONS="-o discard").

Signed-off-by: Filipe Manana

Filipe Manana
2015-12-10 19:22:38 +0800
626d114f4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:
"A couple of fixes, both -stable fodder (9p one all way back to 2.6.32,
dio - to all branches where "Fix negative return from dio read beyond
eof" will end up it; it's a fixup to commit marked for -stable)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix the regression from "direct-io: Fix negative return from dio read beyond eof"
9p: ->evict_inode() should kick out ->i_data, not ->i_mapping

Linus Torvalds
2015-12-10 01:34:26 +0800

09 Dec, 2015

2 commits

2d4594acb fix the regression from "direct-io: Fix negative return from dio read beyond eof" ... Browse Code »

Sure, it's better to bail out of past-the-eof read and return 0 than return
a bogus negative value on such. Only we'd better make sure we are bailing out
with 0 and not -ENOMEM...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro

Al Viro
2015-12-09 04:02:42 +0800
4ad786284 9p: ->evict_inode() should kick out ->i_data, not ->i_mapping ... Browse Code »

For block devices the pagecache is associated with the inode
on bdevfs, not with the aliasing ones on the mountable filesystems.
The latter have its own ->i_data empty and ->i_mapping pointing
to the (unique per major/minor) bdevfs inode. That guarantees
cache coherence between all block device inodes with the same
device number.

Eviction of an alias inode has no business trying to evict the
pages belonging to bdevfs one; moreover, ->i_mapping is only
safe to access when the thing is opened. At the time of
->evict_inode() the victim is definitely *not* opened. We are
about to kill the address space embedded into struct inode
(inode->i_data) and that's what we need to empty of any pages.

9p instance tries to empty inode->i_mapping instead, which is
both unsafe and bogus - if we have several device nodes with
the same device number in different places, closing one of them
should not try to empty the (shared) page cache.

Fortunately, other instances in the tree are OK; they are
evicting from &inode->i_data instead, as 9p one should.

Cc: stable@vger.kernel.org # v2.6.32+, ones prior to 2.6.36 need only half of that
Reported-by: "Suzuki K. Poulose"
Tested-by: "Suzuki K. Poulose"
Signed-off-by: Al Viro

Al Viro
2015-12-09 03:51:16 +0800

08 Dec, 2015

2 commits

756b9b37c SUNRPC: Fix callback channel ... Browse Code »

The NFSv4.1 callback channel is currently broken because the receive
message will keep shrinking because the backchannel receive buffer size
never gets reset.
The easiest solution to this problem is instead of changing the receive
buffer, to rather adjust the copied request.

Fixes: 38b7631fbe42 ("nfs4: limit callback decoding to received bytes")
Cc: Benjamin Coddington
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust

Trond Myklebust
2015-12-08 05:04:59 +0800
f41683a20 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 fixes from Ted Ts'o:
"Ext4 bug fixes for v4.4, including fixes for post-2038 time encodings,
some endian conversion problems with ext4 encryption, potential memory
leaks after truncate in data=journal mode, and an ocfs2 regression
caused by a jbd2 performance improvement"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: fix null committed data return in undo_access
ext4: add "static" to ext4_seq_##name##_fops struct
ext4: fix an endianness bug in ext4_encrypted_follow_link()
ext4: fix an endianness bug in ext4_encrypted_zeroout()
jbd2: Fix unreclaimed pages after truncate in data=journal mode
ext4: Fix handling of extended tv_sec

Linus Torvalds
2015-12-08 02:25:00 +0800

07 Dec, 2015

4 commits

d8cd93ea6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:
"A couple of fixes (-stable fodder) + dead code removal after the
overlayfs fix.

I agree that it's better to separate from the fix part to make
backporting easier, but IMO it's not worth delaying said dead code
removal until the next window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Don't reset ->total_link_count on nested calls of vfs_path_lookup()
ovl: get rid of the dead code left from broken (and disabled) optimizations
ovl: fix permission checking for setattr

Linus Torvalds
2015-12-07 05:51:49 +0800
2788cc47f Don't reset ->total_link_count on nested calls of vfs_path_lookup() ... Browse Code »

we already zero it on outermost set_nameidata(), so initialization in
path_init() is pointless and wrong. The same DoS exists on pre-4.2
kernels, but there a slightly different fix will be needed.

Cc: stable@vger.kernel.org # v4.2
Signed-off-by: Al Viro

Al Viro
2015-12-07 01:33:02 +0800
0f7ff2dab ovl: get rid of the dead code left from broken (and disabled) optimizations ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2015-12-07 01:31:07 +0800
acff81ec2 ovl: fix permission checking for setattr ... Browse Code »

[Al Viro] The bug is in being too enthusiastic about optimizing ->setattr()
away - instead of "copy verbatim with metadata" + "chmod/chown/utimes"
(with the former being always safe and the latter failing in case of
insufficient permissions) it tries to combine these two. Note that copyup
itself will have to do ->setattr() anyway; _that_ is where the elevated
capabilities are right. Having these two ->setattr() (one to set verbatim
copy of metadata, another to do what overlayfs ->setattr() had been asked
to do in the first place) combined is where it breaks.

Signed-off-by: Miklos Szeredi
Cc:
Signed-off-by: Al Viro

Miklos Szeredi
2015-12-07 01:28:23 +0800

05 Dec, 2015

2 commits

43d1c0eb7 block: detach bdev inode from its wb in __blkdev_put() ... Browse Code »

Since 52ebea749aae ("writeback: make backing_dev_info host
cgroup-specific bdi_writebacks") inode, at some point in its lifetime,
gets attached to a wb (struct bdi_writeback). Detaching happens on
evict, in inode_detach_wb() called from __destroy_inode(), and involves
updating wb.

However, detaching an internal bdev inode from its wb in
__destroy_inode() is too late. Its bdi and by extension root wb are
embedded into struct request_queue, which has different lifetime rules
and can be freed long before the final bdput() is called (can be from
__fput() of a corresponding /dev inode, through dput() - evict() -
bd_forget(). bdevs hold onto the underlying disk/queue pair only while
opened; as soon as bdev is closed all bets are off. In fact,
disk/queue can be gone before __blkdev_put() even returns:

1499 static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
1500 {
...
1518 if (bdev->bd_contains == bdev) {
1519 if (disk->fops->release)
1520 disk->fops->release(disk, mode);

[ Driver puts its references to disk/queue ]

1521 }
1522 if (!bdev->bd_openers) {
1523 struct module *owner = disk->fops->owner;
1524
1525 disk_put_part(bdev->bd_part);
1526 bdev->bd_part = NULL;
1527 bdev->bd_disk = NULL;
1528 if (bdev != bdev->bd_contains)
1529 victim = bdev->bd_contains;
1530 bdev->bd_contains = NULL;
1531
1532 put_disk(disk);

[ We put ours, the queue is gone
The last bdput() would result in a write to invalid memory ]

1533 module_put(owner);
...
1539 }

Since bdev inodes are special anyway, detach them in __blkdev_put()
after clearing inode's dirty bits, turning the problematic
inode_detach_wb() in __destroy_inode() into a noop.

add_disk() grabs its disk->queue since 523e1d399ce0 ("block: make
gendisk hold a reference to its queue"), so the old ->release comment
is removed in favor of the new inode_detach_wb() comment.

Cc: stable@vger.kernel.org # 4.2+, needs backporting
Signed-off-by: Ilya Dryomov
Acked-by: Tejun Heo
Tested-by: Raghavendra K T
Signed-off-by: Jens Axboe

Ilya Dryomov
2015-12-05 02:02:17 +0800
087ffd4ea jbd2: fix null committed data return in undo_access ... Browse Code »

introduced jbd2_write_access_granted() to improve write|undo_access
speed, but missed to check the status of b_committed_data which caused
a kernel panic on ocfs2.

[ 6538.405938] ------------[ cut here ]------------
[ 6538.406686] kernel BUG at fs/ocfs2/suballoc.c:2400!
[ 6538.406686] invalid opcode: 0000 [#1] SMP
[ 6538.406686] Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront xen_fbfront parport_pc parport pcspkr i2c_piix4 acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix cirrus ttm drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect i2c_core syscopyarea dm_mirror dm_region_hash dm_log dm_mod
[ 6538.406686] CPU: 1 PID: 16265 Comm: mmap_truncate Not tainted 4.3.0 #1
[ 6538.406686] Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
[ 6538.406686] task: ffff88007c2bab00 ti: ffff880075b78000 task.ti: ffff880075b78000
[ 6538.406686] RIP: 0010:[] [] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2]
[ 6538.406686] RSP: 0018:ffff880075b7b7f8 EFLAGS: 00010246
[ 6538.406686] RAX: ffff8800760c5b40 RBX: ffff88006c06a000 RCX: ffffffffa06e6df0
[ 6538.406686] RDX: 0000000000000000 RSI: ffff88007a6f6ea0 RDI: ffff88007a760430
[ 6538.406686] RBP: ffff880075b7b878 R08: 0000000000000002 R09: 0000000000000001
[ 6538.406686] R10: ffffffffa06769be R11: 0000000000000000 R12: 0000000000000001
[ 6538.406686] R13: ffffffffa06a1750 R14: 0000000000000001 R15: ffff88007a6f6ea0
[ 6538.406686] FS: 00007f17fde30720(0000) GS:ffff88007f040000(0000) knlGS:0000000000000000
[ 6538.406686] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6538.406686] CR2: 0000000000601730 CR3: 000000007aea0000 CR4: 00000000000406e0
[ 6538.406686] Stack:
[ 6538.406686] ffff88007c2bb5b0 ffff880075b7b8e0 ffff88007a7604b0 ffff88006c640800
[ 6538.406686] ffff88007a7604b0 ffff880075d77390 0000000075b7b878 ffffffffa06a309d
[ 6538.406686] ffff880075d752d8 ffff880075b7b990 ffff880075b7b898 0000000000000000
[ 6538.406686] Call Trace:
[ 6538.406686] [] ? ocfs2_read_group_descriptor+0x6d/0xa0 [ocfs2]
[ 6538.406686] [] _ocfs2_free_suballoc_bits+0xe4/0x320 [ocfs2]
[ 6538.406686] [] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
[ 6538.406686] [] _ocfs2_free_clusters+0xee/0x210 [ocfs2]
[ 6538.406686] [] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
[ 6538.406686] [] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
[ 6538.406686] [] ? ocfs2_extend_trans+0x50/0x1a0 [ocfs2]
[ 6538.406686] [] ocfs2_free_clusters+0x15/0x20 [ocfs2]
[ 6538.406686] [] ocfs2_replay_truncate_records+0xfc/0x290 [ocfs2]
[ 6538.406686] [] ? ocfs2_start_trans+0xec/0x1d0 [ocfs2]
[ 6538.406686] [] __ocfs2_flush_truncate_log+0x140/0x2d0 [ocfs2]
[ 6538.406686] [] ? ocfs2_reserve_blocks_for_rec_trunc.clone.0+0x44/0x170 [ocfs2]
[ 6538.406686] [] ocfs2_remove_btree_range+0x374/0x630 [ocfs2]
[ 6538.406686] [] ? jbd2_journal_stop+0x25b/0x470 [jbd2]
[ 6538.406686] [] ocfs2_commit_truncate+0x305/0x670 [ocfs2]
[ 6538.406686] [] ? ocfs2_journal_access_eb+0x20/0x20 [ocfs2]
[ 6538.406686] [] ocfs2_truncate_file+0x297/0x380 [ocfs2]
[ 6538.406686] [] ? jbd2_journal_begin_ordered_truncate+0x64/0xc0 [jbd2]
[ 6538.406686] [] ocfs2_setattr+0x572/0x860 [ocfs2]
[ 6538.406686] [] ? current_fs_time+0x3f/0x50
[ 6538.406686] [] notify_change+0x1d7/0x340
[ 6538.406686] [] ? generic_getxattr+0x79/0x80
[ 6538.406686] [] do_truncate+0x66/0x90
[ 6538.406686] [] ? __audit_syscall_entry+0xb0/0x110
[ 6538.406686] [] do_sys_ftruncate.clone.0+0xf3/0x120
[ 6538.406686] [] SyS_ftruncate+0xe/0x10
[ 6538.406686] [] entry_SYSCALL_64_fastpath+0x12/0x71
[ 6538.406686] Code: 28 48 81 ee b0 04 00 00 48 8b 92 50 fb ff ff 48 8b 80 b0 03 00 00 48 39 90 88 00 00 00 0f 84 30 fe ff ff 0f 0b eb fe 0f 0b eb fe 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
[ 6538.406686] RIP [] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2]
[ 6538.406686] RSP
[ 6538.691128] ---[ end trace 31cd7011d6770d7e ]---
[ 6538.694492] Kernel panic - not syncing: Fatal exception
[ 6538.695484] Kernel Offset: disabled

Fixes: de92c8caf16c("jbd2: speedup jbd2_journal_get_[write|undo]_access()")
Cc:
Signed-off-by: Junxiao Bi
Signed-off-by: Theodore Ts'o

Junxiao Bi
2015-12-05 01:29:28 +0800

04 Dec, 2015

2 commits

071f5d105 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"A lot of Thanksgiving turkey leftovers accumulated, here goes:

1) Fix bluetooth l2cap_chan object leak, from Johan Hedberg.

2) IDs for some new iwlwifi chips, from Oren Givon.

3) Fix rtlwifi lockups on boot, from Larry Finger.

4) Fix memory leak in fm10k, from Stephen Hemminger.

5) We have a route leak in the ipv6 tunnel infrastructure, fix from
Paolo Abeni.

6) Fix buffer pointer handling in arm64 bpf JIT,f rom Zi Shen Lim.

7) Wrong lockdep annotations in tcp md5 support, fix from Eric
Dumazet.

8) Work around some middle boxes which prevent proper handling of TCP
Fast Open, from Yuchung Cheng.

9) TCP repair can do huge kmalloc() requests, build paged SKBs
instead. From Eric Dumazet.

10) Fix msg_controllen overflow in scm_detach_fds, from Daniel
Borkmann.

11) Fix device leaks on ipmr table destruction in ipv4 and ipv6, from
Nikolay Aleksandrov.

12) Fix use after free in epoll with AF_UNIX sockets, from Rainer
Weikusat.

13) Fix double free in VRF code, from Nikolay Aleksandrov.

14) Fix skb leaks on socket receive queue in tipc, from Ying Xue.

15) Fix ifup/ifdown crach in xgene driver, from Iyappan Subramanian.

16) Fix clearing of persistent array maps in bpf, from Daniel
Borkmann.

17) In TCP, for the cross-SYN case, we don't initialize tp->copied_seq
early enough. From Eric Dumazet.

18) Fix out of bounds accesses in bpf array implementation when
updating elements, from Daniel Borkmann.

19) Fill gaps in RCU protection of np->opt in ipv6 stack, from Eric
Dumazet.

20) When dumping proxy neigh entries, we have to accomodate NULL
device pointers properly, from Konstantin Khlebnikov.

21) SCTP doesn't release all ipv6 socket resources properly, fix from
Eric Dumazet.

22) Prevent underflows of sch->q.qlen for multiqueue packet
schedulers, also from Eric Dumazet.

23) Fix MAC and unicast list handling in bnxt_en driver, from Jeffrey
Huang and Michael Chan.

24) Don't actively scan radar channels, from Antonio Quartulli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (110 commits)
net: phy: reset only targeted phy
bnxt_en: Setup uc_list mac filters after resetting the chip.
bnxt_en: enforce proper storing of MAC address
bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
net: lpc_eth: remove irq > NR_IRQS check from probe()
net_sched: fix qdisc_tree_decrease_qlen() races
openvswitch: fix hangup on vxlan/gre/geneve device deletion
ipv4: igmp: Allow removing groups from a removed interface
ipv6: sctp: implement sctp_v6_destroy_sock()
arm64: bpf: add 'store immediate' instruction
ipv6: kill sk_dst_lock
ipv6: sctp: add rcu protection around np->opt
net/neighbour: fix crash at dumping device-agnostic proxy entries
sctp: use GFP_USER for user-controlled kmalloc
sctp: convert sack_needed and sack_generation to bits
ipv6: add complete rcu protection around np->opt
bpf: fix allocation warnings in bpf maps and integer overflow
mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0
net: mvneta: enable setting custom TX IP checksum limit
net: mvneta: fix error path for building skb
...

Linus Torvalds
2015-12-04 08:02:46 +0800
2873d32ff Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block fixes from Jens Axboe:
"A collection of fixes from this series. The most important here is a
regression fix for an issue that some folks would hit in blk-merge.c,
and the NVMe queue depth limit for the screwed up Apple "nvme"
controller.

In more detail, this pull request contains:

- a set of fixes for null_blk, including a fix for a few corner cases
where we could hang the device. From Arianna and Paolo.

- lightnvm:
- A build improvement from Keith.
- Update the qemu pci id detection from Matias.
- Error handling fixes for leaks and other little fixes from
Sudip and Wenwei.

- fix from Eric where BLKRRPART would not return EBUSY for whole
device mounts, only when partitions were mounted.

- fix from Jan Kara, where EOF O_DIRECT reads would return
negatively.

- remove check for rq_mergeable() when checking limits for cloned
requests. The check doesn't make any sense. It's assuming that
since NOMERGE is set on the request that we don't have to
recalculate limits since the request didn't change, but that's not
true if the request has been redirected. From Hannes.

- correctly get the bio front segment value set for single segment
bio's, fixing a BUG() in blk-merge. From Ming"

* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: temporary fix for Apple controller reset
null_blk: change type of completion_nsec to unsigned long
null_blk: guarantee device restart in all irq modes
null_blk: set a separate timer for each command
blk-merge: fix computing bio->bi_seg_front_size in case of single segment
direct-io: Fix negative return from dio read beyond eof
block: Always check queue limits for cloned requests
lightnvm: missing nvm_lock acquire
lightnvm: unconverted ppa returned in get_bb_tbl
lightnvm: refactor and change vendor id for qemu
lightnvm: do device max sectors boundary check first
lightnvm: fix ioctl memory leaks
lightnvm: free memory when gennvm register fails
lightnvm: Simplify config when disabled
Return EBUSY from BLKRRPART for mounted whole-dev fs

Linus Torvalds
2015-12-04 07:45:16 +0800

02 Dec, 2015

1 commit

9cd3e072b net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA ... Browse Code »

This patch is a cleanup to make following patch easier to
review.

Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()

To ease backports, we rename both constants.

Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int net, struct sock *sk) are added so that
following patch can change their implementation.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-12-02 04:45:05 +0800

01 Dec, 2015

1 commit

74cedf9b6 direct-io: Fix negative return from dio read beyond eof ... Browse Code »

Assume a filesystem with 4KB blocks. When a file has size 1000 bytes and
we issue direct IO read at offset 1024, blockdev_direct_IO() reads the
tail of the last block and the logic for handling short DIO reads in
dio_complete() results in a return value -24 (1000 - 1024) which
obviously confuses userspace.

Fix the problem by bailing out early once we sample i_size and can
reliably check that direct IO read starts beyond i_size.

Reported-by: Avi Kivity
Fixes: 9fe55eea7e4b444bafc42fa0000cc2d1d2847275
CC: stable@vger.kernel.org
CC: Steven Whitehouse
Signed-off-by: Jan Kara
Signed-off-by: Jens Axboe

Jan Kara
2015-12-01 01:15:42 +0800

28 Nov, 2015

2 commits

8003a5735 Merge tag 'nfs-for-4.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:

Stable patches:
- Fix a NFSv4 callback identifier leak that was also causing client
crashes
- Fix NFSv4 callback decoding issues when incoming requests are
truncated
- Don't declare the attribute cache valid when we call
nfs_update_inode with an empty attribute structure.
- Resend LAYOUTGET when there is a race that changes the seqid

Bugfixes:
- Fix a number of issues with the NFSv4.2 CLONE ioctl()
- Properly set NFS v4.2 NFSDBG_FACILITY
- NFSv4 referrals are broken; Cleanup FATTR4_WORD0_FS_LOCATIONS after
decoding success
- Use sliding delay when LAYOUTGET gets NFS4ERR_DELAY
- Ensure that attrcache is revalidated after a SETATTR"

* tag 'nfs-for-4.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs4: resend LAYOUTGET when there is a race that changes the seqid
nfs: if we have no valid attrs, then don't declare the attribute cache valid
nfs: ensure that attrcache is revalidated after a SETATTR
nfs4: limit callback decoding to received bytes
nfs4: start callback_ident at idr 1
nfs: use sliding delay when LAYOUTGET gets NFS4ERR_DELAY
NFS4: Cleanup FATTR4_WORD0_FS_LOCATIONS after decoding success
NFS: Properly set NFS v4.2 NFSDBG_FACILITY
nfs: reduce the amount of ifdefs for v4.2 in nfs4file.c
nfs: use btrfs ioctl defintions for clone
nfs: allow intra-file CLONE
nfs: offer native ioctls even if CONFIG_COMPAT is set
nfs: pass on count for CLONE operations

Linus Torvalds
2015-11-28 09:22:47 +0800
80e0c505b Merge branch 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs fixes from Chris Mason:
"This has Mark Fasheh's patches to fix quota accounting during subvol
deletion, which we've been working on for a while now. The patch is
pretty small but it's a key fix.

Otherwise it's a random assortment"

* 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix balance range usage filters in 4.4-rc
btrfs: qgroup: account shared subtree during snapshot delete
Btrfs: use btrfs_get_fs_root in resolve_indirect_ref
btrfs: qgroup: fix quota disable during rescan
Btrfs: fix race between cleaner kthread and space cache writeout
Btrfs: fix scrub preventing unused block groups from being deleted
Btrfs: fix race between scrub and block group deletion
btrfs: fix rcu warning during device replace
btrfs: Continue replace when set_block_ro failed
btrfs: fix clashing number of the enhanced balance usage filter
Btrfs: fix the number of transaction units needed to remove a block group
Btrfs: use global reserve when deleting unused block group after ENOSPC
Btrfs: tests: checking for NULL instead of IS_ERR()
btrfs: fix signed overflows in btrfs_sync_file

Linus Torvalds
2015-11-28 07:45:45 +0800

27 Nov, 2015

3 commits

681c46b16 ext4: add "static" to ext4_seq_##name##_fops struct ... Browse Code »

to fix sparse warning, add static to ext4_seq_##name##_fops struct.

Signed-off-by: Theodore Ts'o

Xu Cang
2015-11-27 04:52:24 +0800
5a1c7f47d ext4: fix an endianness bug in ext4_encrypted_follow_link() ... Browse Code »

applying le32_to_cpu() to 16bit value is a bad idea...

Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Al Viro
Signed-off-by: Theodore Ts'o

Al Viro
2015-11-27 04:20:50 +0800
e2c9e0b28 ext4: fix an endianness bug in ext4_encrypted_zeroout() ... Browse Code »

ex->ee_block is not host-endian (note that accesses of other fields
of *ex right next to that line go through the helpers that do proper
conversion from little-endian to host-endian; it might make sense
to add similar for ->ee_block to avoid reintroducing that kind of
bugs...)

Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Al Viro
Signed-off-by: Theodore Ts'o

Al Viro
2015-11-27 04:20:19 +0800