Eric Lee / smarc-fsl-linux-kernel

26 Jan, 2018

2 commits

6793f1c45 orangefs: fix deadlock; do not write i_size in read_iter ... Browse Code »

After do_readv_writev, the inode cache is invalidated anyway, so i_size
will never be read. It will be fetched from the server which will also
know about updates from other machines.

Fixes deadlock on 32-bit SMP.

See https://marc.info/?l=linux-fsdevel&m=151268557427760&w=2

Signed-off-by: Martin Brandenburg
Cc: Al Viro
Cc: Mike Marshall
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds

Martin Brandenburg
2018-01-26 09:26:24 +0800
525273fb2 Merge tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull btrfs fix from David Sterba:
"It's been reported recently that readdir can list stale entries under
some conditions. Fix it."

* tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix stale entries in readdir

Linus Torvalds
2018-01-26 01:03:10 +0800

25 Jan, 2018

1 commit

e4fd493c0 Btrfs: fix stale entries in readdir ... Browse Code »

In fixing the readdir+pagefault deadlock I accidentally introduced a
stale entry regression in readdir. If we get close to full for the
temporary buffer, and then skip a few delayed deletions, and then try to
add another entry that won't fit, we will emit the entries we found and
retry. Unfortunately we delete entries from our del_list as we find
them, assuming we won't need them. However our pos will be with
whatever our last entry was, which could be before the delayed deletions
we skipped, so the next search will add the deleted entries back into
our readdir buffer. So instead don't delete entries we find in our
del_list so we can make sure we always find our delayed deletions. This
is a slight perf hit for readdir with lots of pending deletions, but
hopefully this isn't a common occurrence. If it is we can revist this
and optimize it.

cc: stable@vger.kernel.org
Fixes: 23b5ec74943f ("btrfs: fix readdir deadlock with pagefault")
Signed-off-by: Josef Bacik
Signed-off-by: David Sterba

Josef Bacik
2018-01-25 03:27:48 +0800

23 Jan, 2018

3 commits

199526672 nfsd: auth: Fix gid sorting when rootsquash enabled ... Browse Code »

Commit bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility
group_info allocators") appears to break nfsd rootsquash in a pretty
major way.

It adds a call to groups_sort() inside the loop that copies/squashes
gids, which means the valid gids are sorted along with the following
garbage. The net result is that the highest numbered valid gids are
replaced with any lower-valued garbage gids, possibly including 0.

We should sort only once, after filling in all the gids.

Fixes: bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility ...")
Signed-off-by: Ben Hutchings
Acked-by: J. Bruce Fields
Signed-off-by: Linus Torvalds

Ben Hutchings
2018-01-23 12:13:07 +0800
a0ec1ded2 orangefs: initialize op on loop restart in orangefs_devreq_read ... Browse Code »

In orangefs_devreq_read, there is a loop which picks an op off the list
of pending ops. If the loop fails to find an op, there is nothing to
read, and it returns EAGAIN. If the op has been given up on, the loop
is restarted via a goto. The bug is that the variable which the found
op is written to is not reinitialized, so if there are no more eligible
ops on the list, the code runs again on the already handled op.

This is triggered by interrupting a process while the op is being copied
to the client-core. It's a fairly small window, but it's there.

Signed-off-by: Martin Brandenburg
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds

Martin Brandenburg
2018-01-23 05:51:14 +0800
0afc0decf orangefs: use list_for_each_entry_safe in purge_waiting_ops ... Browse Code »

set_op_state_purged can delete the op.

Signed-off-by: Martin Brandenburg
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds

Martin Brandenburg
2018-01-23 05:51:14 +0800

20 Jan, 2018

1 commit

8bb2ee192 proc: fix coredump vs read /proc/*/stat race ... Browse Code »

do_task_stat() accesses IP and SP of a task without bumping reference
count of a stack (which became an entity with independent lifetime at
some point).

Steps to reproduce:

#include
#include
#include
#include
#include
#include
#include
#include

int main(void)
{
setrlimit(RLIMIT_CORE, &(struct rlimit){});

while (1) {
char buf[64];
char buf2[4096];
pid_t pid;
int fd;

pid = fork();
if (pid == 0) {
*(volatile int *)0 = 0;
}

snprintf(buf, sizeof(buf), "/proc/%u/stat", pid);
fd = open(buf, O_RDONLY);
read(fd, buf2, sizeof(buf2));
close(fd);

waitpid(pid, NULL, 0);
}
return 0;
}

BUG: unable to handle kernel paging request at 0000000000003fd8
IP: do_task_stat+0x8b4/0xaf0
PGD 800000003d73e067 P4D 800000003d73e067 PUD 3d558067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1417 Comm: a.out Not tainted 4.15.0-rc8-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
RIP: 0010:do_task_stat+0x8b4/0xaf0
Call Trace:
proc_single_show+0x43/0x70
seq_read+0xe6/0x3b0
__vfs_read+0x1e/0x120
vfs_read+0x84/0x110
SyS_read+0x3d/0xa0
entry_SYSCALL_64_fastpath+0x13/0x6c
RIP: 0033:0x7f4d7928cba0
RSP: 002b:00007ffddb245158 EFLAGS: 00000246
Code: 03 b7 a0 01 00 00 4c 8b 4c 24 70 4c 8b 44 24 78 4c 89 74 24 18 e9 91 f9 ff ff f6 45 4d 02 0f 84 fd f7 ff ff 48 8b 45 40 48 89 ef 8b 80 d8 3f 00 00 48 89 44 24 20 e8 9b 97 eb ff 48 89 44 24
RIP: do_task_stat+0x8b4/0xaf0 RSP: ffffc90000607cc8
CR2: 0000000000003fd8

John Ogness said: for my tests I added an else case to verify that the
race is hit and correctly mitigated.

Link: http://lkml.kernel.org/r/20180116175054.GA11513@avx2
Signed-off-by: Alexey Dobriyan
Reported-by: "Kohli, Gaurav"
Tested-by: John Ogness
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2018-01-20 02:09:41 +0800

07 Jan, 2018

1 commit

75d4276e8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:

- untangle sys_close() abuses in xt_bpf

- deal with register_shrinker() failures in sget()

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'"
sget(): handle failures of register_shrinker()
mm,vmscan: Make unregister_shrinker() no-op if register_shrinker() failed.

Linus Torvalds
2018-01-07 09:13:21 +0800

06 Jan, 2018

2 commits

89876f275 Merge tag 'for-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull btrfs fixes from David Sterba:
"We have two more fixes for 4.15, both aimed for stable.

The leak fix is obvious, the second patch fixes a bug revealed by the
refcount API, when it behaves differently than previous atomic_t and
reports refs going from 0 to 1 in one case"

* tag 'for-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
btrfs: Fix flush bio leak

Linus Torvalds
2018-01-06 05:02:46 +0800
12e971b65 Merge tag 'xfs-4.15-fixes-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull XFS fixes from Darrick Wong:
"I have just a few fixes for bugs and resource cleanup problems this
week:

- Fix resource cleanup of failed quota initialization

- Fix integer overflow problems wrt s_maxbytes"

* tag 'xfs-4.15-fixes-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix s_maxbytes overflow problems
xfs: quota: check result of register_shrinker()
xfs: quota: fix missed destroy of qi_tree_lock

Linus Torvalds
2018-01-06 04:59:32 +0800

05 Jan, 2018

1 commit

0cbb4b4f4 userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails ... Browse Code »

The previous fix in commit 384632e67e08 ("userfaultfd: non-cooperative:
fix fork use after free") corrected the refcounting in case of
UFFD_EVENT_FORK failure for the fork userfault paths.

That still didn't clear the vma->vm_userfaultfd_ctx of the vmas that
were set to point to the aborted new uffd ctx earlier in
dup_userfaultfd.

Link: http://lkml.kernel.org/r/20171223002505.593-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli
Reported-by: syzbot
Reviewed-by: Mike Rapoport
Cc: Eric Biggers
Cc: Dmitry Vyukov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2018-01-05 08:45:09 +0800

04 Jan, 2018

2 commits

50d0f78f5 Merge branch 'afs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs ... Browse Code »

Pull afs/fscache fixes from David Howells:

- Fix the default return of fscache_maybe_release_page() when a cache
isn't in use - it prevents a filesystem from releasing pages. This
can cause a system to OOM.

- Fix a potential uninitialised variable in AFS.

- Fix AFS unlink's handling of the nlink count. It needs to use the
nlink manipulation functions so that inode structs of deleted inodes
actually get scheduled for destruction.

- Fix error handling in afs_write_end() so that the page gets unlocked
and put if we can't fill the unwritten portion.

* 'afs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix missing error handling in afs_write_end()
afs: Fix unlink
afs: Potential uninitialized variable in afs_extract_data()
fscache: Fix the default for fscache_maybe_release_page()

Linus Torvalds
2018-01-04 02:58:56 +0800
e816c201a exec: Weaken dumpability for secureexec ... Browse Code »

This is a logical revert of commit e37fdb785a5f ("exec: Use secureexec
for setting dumpability")

This weakens dumpability back to checking only for uid/gid changes in
current (which is useless), but userspace depends on dumpability not
being tied to secureexec.

https://bugzilla.redhat.com/show_bug.cgi?id=1528633

Reported-by: Tom Horsley
Fixes: e37fdb785a5f ("exec: Use secureexec for setting dumpability")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook
Signed-off-by: Linus Torvalds

Kees Cook
2018-01-04 02:13:36 +0800

03 Jan, 2018

5 commits

b4d8ad7fd xfs: fix s_maxbytes overflow problems ... Browse Code »

Fix some integer overflow problems if offset + count happen to be large
enough to cause an integer overflow.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2018-01-03 02:16:32 +0800
3a3882ff2 xfs: quota: check result of register_shrinker() ... Browse Code »

xfs_qm_init_quotainfo() does not check result of register_shrinker()
which was tagged as __must_check recently, reported by sparse.

Signed-off-by: Aliaksei Karaliou
[darrick: move xfs_qm_destroy_quotainos nearer xfs_qm_init_quotainos]
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Aliaksei Karaliou
2018-01-03 02:16:32 +0800
219688156 xfs: quota: fix missed destroy of qi_tree_lock ... Browse Code »

xfs_qm_destroy_quotainfo() does not destroy quotainfo->qi_tree_lock
while destroys quotainfo->qi_quotaofflock.

Signed-off-by: Aliaksei Karaliou
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Aliaksei Karaliou
2018-01-03 02:16:32 +0800
ec35e48b2 btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes ... Browse Code »

refcounts have a generic implementation and an asm optimized one. The
generic version has extra debugging to make sure that once a refcount
goes to zero, refcount_inc won't increase it.

The btrfs delayed inode code wasn't expecting this, and we're tripping
over the warnings when the generic refcounts are used. We ended up with
this race:

Process A Process B
btrfs_get_delayed_node()
spin_lock(root->inode_lock)
radix_tree_lookup()
__btrfs_release_delayed_node()
refcount_dec_and_test(&delayed_node->refs)
our refcount is now zero
refcount_add(2) inode_lock)
radix_tree_delete()

With the generic refcounts, we actually warn again when process B above
tries to release his refcount because refcount_add() turned into a
no-op.

We saw this in production on older kernels without the asm optimized
refcounts.

The fix used here is to use refcount_inc_not_zero() to detect when the
object is in the middle of being freed and return NULL. This is almost
always the right answer anyway, since we usually end up pitching the
delayed_node if it didn't have fresh data in it.

This also changes __btrfs_release_delayed_node() to remove the extra
check for zero refcounts before radix tree deletion.
btrfs_get_delayed_node() was the only path that was allowing refcounts
to go from zero to one.

Fixes: 6de5f18e7b0da ("btrfs: fix refcount_t usage when deleting btrfs_delayed_node")
CC: # 4.12+
Signed-off-by: Chris Mason
Reviewed-by: Liu Bo
Signed-off-by: David Sterba

Chris Mason
2018-01-03 01:00:14 +0800
beed9263f btrfs: Fix flush bio leak ... Browse Code »

Commit e0ae99941423 ("btrfs: preallocate device flush bio") reworked
the way the flush bio is allocated and used. Concretely it allocates
the bio in __alloc_device and then re-uses it multiple times with a
very simple endio routine that just calls complete() without consuming
a reference. Allocated bios by default come with a ref count of 1,
which is then consumed by the endio routine (or not, in which case they
should be bio_put by the caller). The way the impleementation works now
is that the flush bio has a refcount of 2 and we only ever bio_put it
once, leaving it to hang indefinitely. Fix this by removing the extra
bio_get in __alloc_device.

Fixes: e0ae99941423 ("btrfs: preallocate device flush bio")
Signed-off-by: Nikolay Borisov
Reviewed-by: Liu Bo
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Nikolay Borisov
2018-01-03 01:00:13 +0800

02 Jan, 2018

3 commits

afae457d8 afs: Fix missing error handling in afs_write_end() ... Browse Code »

afs_write_end() is missing page unlock and put if afs_fill_page() fails.

Reported-by: Al Viro
Signed-off-by: David Howells

David Howells
2018-01-02 18:02:19 +0800
440fbc3a8 afs: Fix unlink ... Browse Code »

Repeating creation and deletion of a file on an afs mount will run the box
out of memory, e.g.:

dd if=/dev/zero of=/afs/scratch/m0 bs=$((1024*1024)) count=512
rm /afs/scratch/m0

The problem seems to be that it's not properly decrementing the nlink count
so that the inode can be scrapped.

Note that this doesn't fix local creation followed by remote deletion.
That's harder to handle and will require a separate patch as we're not told
that the file has been deleted - only that the directory has changed.

Reported-by: Marc Dionne
Signed-off-by: David Howells

David Howells
2018-01-02 18:02:19 +0800
7888da958 afs: Potential uninitialized variable in afs_extract_data() ... Browse Code »

Smatch warns that:

fs/afs/rxrpc.c:922 afs_extract_data()
error: uninitialized symbol 'remote_abort'.

Smatch is right that "remote_abort" might be uninitialized when we pass
it to afs_set_call_complete(). I don't know if that function uses the
uninitialized variable. Anyway, the comment for rxrpc_kernel_recv_data(),
says that "*_abort should also be initialised to 0." and this patch does
that.

Signed-off-by: Dan Carpenter
Signed-off-by: David Howells

Dan Carpenter
2018-01-02 18:02:19 +0800

23 Dec, 2017

1 commit

fca0e39b2 Merge tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull xfs fixes from Darrick Wong:
"Here are some XFS fixes for 4.15-rc5. Apologies for the unusually
large number of patches this late, but I wanted to make sure the
corruption fixes were really ready to go.

Changes since last update:

- Fix a locking problem during xattr block conversion that could lead
to the log checkpointing thread to try to write an incomplete
buffer to disk, which leads to a corruption shutdown

- Fix a null pointer dereference when removing delayed allocation
extents

- Remove post-eof speculative allocations when reflinking a block
past current inode size so that we don't just leave them there and
assert on inode reclaim

- Relax an assert which didn't accurately reflect the way locking
works and would trigger under heavy io load

- Avoid infinite loop when cancelling copy on write extents after a
writeback failure

- Try to avoid copy on write transaction reservation overflows when
remapping after a successful write

- Fix various problems with the copy-on-write reservation automatic
garbage collection not being cleaned up properly during a ro
remount

- Fix problems with rmap log items being processed in the wrong
order, leading to corruption shutdowns

- Fix problems with EFI recovery wherein the "remove any rmapping if
present" mechanism wasn't actually doing anything, which would lead
to corruption problems later when the extent is reallocated,
leading to multiple rmaps for the same extent"

* tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: only skip rmap owner checks for unknown-owner rmap removal
xfs: always honor OWN_UNKNOWN rmap removal requests
xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order
xfs: set cowblocks tag for direct cow writes too
xfs: remove leftover CoW reservations when remounting ro
xfs: don't be so eager to clear the cowblocks tag on truncate
xfs: track cowblocks separately in i_flags
xfs: allow CoW remap transactions to use reserve blocks
xfs: avoid infinite loop when cancelling CoW blocks after writeback failure
xfs: relax is_reflink_inode assert in xfs_reflink_find_cow_mapping
xfs: remove dest file's post-eof preallocations before reflinking
xfs: move xfs_iext_insert tracepoint to report useful information
xfs: account for null transactions in bunmapi
xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of an attribute
xfs: add the ability to join a held buffer to a defer_ops

Linus Torvalds
2017-12-23 04:27:27 +0800

22 Dec, 2017

6 commits

68c58e9b9 xfs: only skip rmap owner checks for unknown-owner rmap removal ... Browse Code »

For rmap removal, refactor the rmap owner checks into a separate
function, then skip the checks if we are performing an unknown-owner
removal.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-12-22 00:48:38 +0800
33df3a9cf xfs: always honor OWN_UNKNOWN rmap removal requests ... Browse Code »

Calling xfs_rmap_free with an unknown owner is supposed to remove any
rmaps covering that range regardless of owner. This is used by the EFI
recovery code to say "we're freeing this, it mustn't be owned by
anything anymore", but for whatever reason xfs_free_ag_extent filters
them out.

Therefore, remove the filter and make xfs_rmap_unmap actually treat it
as a wildcard owner -- free anything that's already there, and if
there's no owner at all then that's fine too.

There are two existing callers of bmap_add_free that take care the rmap
deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
cleanup; convert these to use OWN_NULL (via helpers), and now we really
require that an RUI (if any) gets added to the defer ops before any EFI.

Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
growfs will have to consult directly with the rmap to ensure that there
aren't any rmaps in the grown region.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-12-22 00:48:38 +0800
0525e952d xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order ... Browse Code »

Under the deferred rmap operation scheme, there's a certain order in
which the rmap deferred ops have to be queued to maintain integrity
during log replay. For alloc/map operations that order is cui -> rui;
for free/unmap operations that order is cui -> rui -> efi. However, the
initial refcount code got the ordering wrong in the free side of things
because it queued refcount free op and an EFI and the refcount free op
queued a rmap free op, resulting in the order cui -> efi -> rui.

If we fail before the efd finishes, the efi recovery will try to do a
wildcard rmap removal and the subsequent rui will fail to find the rmap
and blow up. This didn't ever happen due to other screws up in handling
unknown owner rmap removals, but those other screw ups broke recovery in
other ways, so fix the ordering to follow the intended rules.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-12-22 00:48:38 +0800
86d692bfa xfs: set cowblocks tag for direct cow writes too ... Browse Code »

If a user performs a direct CoW write, we end up loading the CoW fork
with preallocated extents. Therefore, we must set the cowblocks tag so
that they can be cleared out if we run low on space.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-12-22 00:47:37 +0800
10ddf64e4 xfs: remove leftover CoW reservations when remounting ro ... Browse Code »

When we're remounting the filesystem readonly, remove all CoW
preallocations prior to going ro. If the fs goes down after the ro
remount, we never clean up the staging extents, which means xfs_check
will trip over them on a subsequent run. Practically speaking, the next
mount will clean them up too, so this is unlikely to be seen. Since we
shut down the cowblocks cleaner on remount-ro, we also have to make sure
we start it back up if/when we remount-rw.

Found by adding clonerange to fsstress and running xfs/017.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-12-22 00:47:32 +0800
363e59baa xfs: don't be so eager to clear the cowblocks tag on truncate ... Browse Code »

Currently, xfs_itruncate_extents clears the cowblocks tag if i_cnextents
is zero. This is wrong, since i_cnextents only tracks real extents in
the CoW fork, which means that we could have some delayed CoW
reservations still in there that will now never get cleaned.

Fix a further bug where we /don't/ clear the reflink iflag if there are
any attribute blocks -- really, it's only safe to clear the reflink flag
if there are no data fork extents and no cow fork extents.

Found by adding clonerange to fsstress in xfs/017.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-12-22 00:47:28 +0800

21 Dec, 2017

1 commit

91aae6be4 xfs: track cowblocks separately in i_flags ... Browse Code »

The EOFBLOCKS/COWBLOCKS tags are totally separate things, so track them
with separate i_flags. Right now we're abusing IEOFBLOCKS for both,
which is totally bogus because we won't tag the inode with COWBLOCKS if
IEOFBLOCKS was set by a previous tagging of the inode with EOFBLOCKS.
Found by wiring up clonerange to fsstress in xfs/017.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-12-21 09:11:48 +0800

19 Dec, 2017

1 commit

9ee332d99 sget(): handle failures of register_shrinker() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2017-12-19 04:05:07 +0800

18 Dec, 2017

4 commits

779f4e1c6 Revert "exec: avoid RLIMIT_STACK races with prlimit()" ... Browse Code »

This reverts commit 04e35f4495dd560db30c25efca4eecae8ec8c375.

SELinux runs with secureexec for all non-"noatsecure" domain transitions,
which means lots of processes end up hitting the stack hard-limit change
that was introduced in order to fix a race with prlimit(). That race fix
will need to be redesigned.

Reported-by: Laura Abbott
Reported-by: Tomáš Trnka
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook
Signed-off-by: Linus Torvalds

Kees Cook
2017-12-18 06:26:25 +0800
b9f5fb180 cramfs: fix MTD dependency ... Browse Code »

With CONFIG_MTD=m and CONFIG_CRAMFS=y, we now get a link failure:

fs/cramfs/inode.o: In function `cramfs_mount': inode.c:(.text+0x220): undefined reference to `mount_mtd'
fs/cramfs/inode.o: In function `cramfs_mtd_fill_super':
inode.c:(.text+0x6d8): undefined reference to `mtd_point'
inode.c:(.text+0xae4): undefined reference to `mtd_unpoint'

This adds a more specific Kconfig dependency to avoid the broken
configuration.

Alternatively we could make CRAMFS itself depend on "MTD || !MTD" with a
similar result.

Fixes: 99c18ce580c6 ("cramfs: direct memory access support")
Signed-off-by: Arnd Bergmann
Signed-off-by: Nicolas Pitre
Signed-off-by: Linus Torvalds

Arnd Bergmann
2017-12-18 04:20:58 +0800
73d080d37 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:
"The alloc_super() one is a regression in this merge window, lazytime
thing is older..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Handle lazytime in do_mount()
alloc_super(): do ->s_umount initialization earlier

Linus Torvalds
2017-12-18 04:18:35 +0800
1c6b942d7 Merge tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 fixes from Ted Ts'o:
"Fix a regression which caused us to fail to interpret symlinks in very
ancient ext3 file system images.

Also fix two xfstests failures, one of which could cause an OOPS, plus
an additional bug fix caught by fuzz testing"

* tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix crash when a directory's i_size is too small
ext4: add missing error check in __ext4_new_inode()
ext4: fix fdatasync(2) after fallocate(2) operation
ext4: support fast symlinks from ext3 file systems

Linus Torvalds
2017-12-18 04:14:33 +0800

17 Dec, 2017

1 commit

d025fbf1a Merge tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs ... Browse Code »

Pull NFS client fixes from Anna Schumaker:
"This has two stable bugfixes, one to fix a BUG_ON() when
nfs_commit_inode() is called with no outstanding commit requests and
another to fix a race in the SUNRPC receive codepath.

Additionally, there are also fixes for an NFS client deadlock and an
xprtrdma performance regression.

Summary:

Stable bugfixes:
- NFS: Avoid a BUG_ON() in nfs_commit_inode() by not waiting for a
commit in the case that there were no commit requests.
- SUNRPC: Fix a race in the receive code path

Other fixes:
- NFS: Fix a deadlock in nfs client initialization
- xprtrdma: Fix a performance regression for small IOs"

* tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Fix a race in the receive code path
nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests
xprtrdma: Spread reply processing over more CPUs
nfs: fix a deadlock in nfs client initialization

Linus Torvalds
2017-12-17 05:12:53 +0800

16 Dec, 2017

5 commits

f6f373216 Revert "mm: replace p??_write with pte_access_permitted in fault + gup paths" ... Browse Code »

This reverts commits 5c9d2d5c269c, c7da82b894e9, and e7fe7b5cae90.

We'll probably need to revisit this, but basically we should not
complicate the get_user_pages_fast() case, and checking the actual page
table protection key bits will require more care anyway, since the
protection keys depend on the exact state of the VM in question.

Particularly when doing a "remote" page lookup (ie in somebody elses VM,
not your own), you need to be much more careful than this was. Dave
Hansen says:

"So, the underlying bug here is that we now a get_user_pages_remote()
and then go ahead and do the p*_access_permitted() checks against the
current PKRU. This was introduced recently with the addition of the
new p??_access_permitted() calls.

We have checks in the VMA path for the "remote" gups and we avoid
consulting PKRU for them. This got missed in the pkeys selftests
because I did a ptrace read, but not a *write*. I also didn't
explicitly test it against something where a COW needed to be done"

It's also not entirely clear that it makes sense to check the protection
key bits at this level at all. But one possible eventual solution is to
make the get_user_pages_fast() case just abort if it sees protection key
bits set, which makes us fall back to the regular get_user_pages() case,
which then has a vma and can do the check there if we want to.

We'll see.

Somewhat related to this all: what we _do_ want to do some day is to
check the PAGE_USER bit - it should obviously always be set for user
pages, but it would be a good check to have back. Because we have no
generic way to test for it, we lost it as part of moving over from the
architecture-specific x86 GUP implementation to the generic one in
commit e585513b76f7 ("x86/mm/gup: Switch GUP to the generic
get_user_page_fast() implementation").

Cc: Peter Zijlstra
Cc: Dan Williams
Cc: Dave Hansen
Cc: Kirill A. Shutemov
Cc: "Jérôme Glisse"
Cc: Andrew Morton
Cc: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2017-12-16 10:53:22 +0800
dd3d66b83 Merge tag 'ceph-for-4.15-rc4' of git://github.com/ceph/ceph-client ... Browse Code »

Pull ceph fix from Ilya Dryomov:
"CephFS inode trimming fix from Zheng, marked for stable"

* tag 'ceph-for-4.15-rc4' of git://github.com/ceph/ceph-client:
ceph: drop negative child dentries before try pruning inode's alias

Linus Torvalds
2017-12-16 04:48:27 +0800
227701e0e Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs ... Browse Code »

Pull overlayfs fixes from Miklos Szeredi:

- fix incomplete syncing of filesystem

- fix regression in readdir on ovl over 9p

- only follow redirects when needed

- misc fixes and cleanups

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix overlay: warning prefix
ovl: Use PTR_ERR_OR_ZERO()
ovl: Sync upper dirty data when syncing overlayfs
ovl: update ctx->pos on impure dir iteration
ovl: Pass ovl_get_nlink() parameters in right order
ovl: don't follow redirects if redirect_dir=off

Linus Torvalds
2017-12-16 04:46:48 +0800
dc4fd9ab0 nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests ... Browse Code »

If there were no commit requests, then nfs_commit_inode() should not
wait on the commit or mark the inode dirty, otherwise the following
BUG_ON can be triggered:

[ 1917.130762] kernel BUG at fs/inode.c:578!
[ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries
[ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G ------------ T 3.10.0-768.el7.ppc64 #1
[ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000
[ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80
[ 1917.130816] REGS: c00000004cea3720 TRAP: 0700 Tainted: G ------------ T (3.10.0-768.el7.ppc64)
[ 1917.130820] MSR: 8000000100029032 CR: 22002822 XER: 20000000
[ 1917.130828] CFAR: c00000000011f594 SOFTE: 1
GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750
GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03
GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8
GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0
GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8
[ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350
[ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350
[ 1917.130873] Call Trace:
[ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable)
[ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270
[ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0
[ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs]
[ 1917.130900] [c00000004cea3c00] [c000000000328a20] .deactivate_locked_super+0xa0/0x1b0
[ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180
[ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150
[ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100
[ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60
[ 1917.130919] Instruction dump:
[ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0
[ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 892d02a4 2f890000 40de0134

Signed-off-by: Scott Mayhew
Cc: stable@vger.kernel.org # 4.5+
Signed-off-by: Anna Schumaker

Scott Mayhew
2017-12-16 03:31:50 +0800
c156618e1 nfs: fix a deadlock in nfs client initialization ... Browse Code »

The following deadlock can occur between a process waiting for a client
to initialize in while walking the client list during nfsv4 server trunking
detection and another process waiting for the nfs_clid_init_mutex so it
can initialize that client:

Process 1 Process 2
--------- ---------
spin_lock(&nn->nfs_client_lock);
list_add_tail(&CLIENTA->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
spin_lock(&nn->nfs_client_lock);
list_add_tail(&CLIENTB->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
mutex_lock(&nfs_clid_init_mutex);
nfs41_walk_client_list(clp, result, cred);
nfs_wait_client_init_complete(CLIENTA);
(waiting for nfs_clid_init_mutex)

Make sure nfs_match_client() only evaluates clients that have completed
initialization in order to prevent that deadlock.

This patch also fixes v4.0 trunking behavior by not marking the client
NFS_CS_READY until the clientid has been confirmed.

Signed-off-by: Scott Mayhew
Signed-off-by: Anna Schumaker

Scott Mayhew
2017-12-16 03:31:49 +0800