Eric Lee / smarc-fsl-linux-kernel

05 May, 2018

4 commits

eb4f959b2 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma ... Browse Code »

Pull rdma fixes from Doug Ledford:
"This is our first pull request of the rc cycle. It's not that it's
been overly quiet, we were just waiting on a few things before sending
this off.

For instance, the 6 patch series from Intel for the hfi1 driver had
actually been pulled in on Tuesday for a Wednesday pull request, only
to have Jason notice something I missed, so we held off for some
testing, and then on Thursday had to respin the series because the
very first patch needed a minor fix (unnecessary cast is all).

There is a sizable hns patch series in here, as well as a reasonably
largish hfi1 patch series, then all of the lines of uapi updates are
just the change to the new official Linux-OpenIB SPDX tag (a bunch of
our files had what amounts to a BSD-2-Clause + MIT Warranty statement
as their license as a result of the initial code submission years ago,
and the SPDX folks decided it was unique enough to warrant a unique
tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4
and core/cache/cma updates to round out the bunch.

None of it was overly large by itself, but in the 2 1/2 weeks we've
been collecting patches, it has added up :-/.

As best I can tell, it's been through 0day (I got a notice about my
last for-next push, but not for my for-rc push, but Jason seems to
think that failure messages are prioritized and success messages not
so much). It's also been through linux-next. And yes, we did notice in
the context portion of the CMA query gid fix patch that there is a
dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage
and remove it anywhere we can.

Summary:

- Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)

- SPDX license tag cleanups (new tag Linux-OpenIB)

- RoCE GID fixes related to default GIDs

- Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
mlx4, mlx5, and hfi1 (medium batch)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits)
RDMA/cma: Do not query GID during QP state transition to RTR
IB/mlx4: Fix integer overflow when calculating optimal MTT size
IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
IB/hfi1: Fix loss of BECN with AHG
IB/hfi1 Use correct type for num_user_context
IB/hfi1: Fix handling of FECN marked multicast packet
IB/core: Make ib_mad_client_id atomic
iw_cxgb4: Atomically flush per QP HW CQEs
IB/uverbs: Fix kernel crash during MR deregistration flow
IB/uverbs: Prevent reregistration of DM_MR to regular MR
RDMA/mlx4: Add missed RSS hash inner header flag
RDMA/hns: Fix a couple misspellings
RDMA/hns: Submit bad wr
RDMA/hns: Update assignment method for owner field of send wqe
RDMA/hns: Adjust the order of cleanup hem table
RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
RDMA/hns: Remove some unnecessary attr_mask judgement
RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
...

Linus Torvalds
2018-05-05 14:51:10 +0800
2f50037a1 Merge tag 'for-linus-20180504' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block fixes from Jens Axboe:
"A collection of fixes that should to into this release. This contains:

- Set of bcache fixes from Coly, fixing regression in patches that
went into this series.

- Set of NVMe fixes by way of Keith.

- Set of bdi related fixes, one from Jan and two from Tetsuo Handa,
fixing various issues around device addition/removal.

- Two block inflight fixes from Omar, fixing issues around the
transition to using tags for blk-mq inflight accounting that we
did a few releases ago"

* tag 'for-linus-20180504' of git://git.kernel.dk/linux-block:
bdi: Fix oops in wb_workfn()
nvmet: switch loopback target state to connecting when resetting
nvme/multipath: Fix multipath disabled naming collisions
nvme/multipath: Disable runtime writable enabling parameter
nvme: Set integrity flag for user passthrough commands
nvme: fix potential memory leak in option parsing
bdi: Fix use after free bug in debugfs_remove()
bdi: wake up concurrent wb_shutdown() callers.
bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set
bcache: set dc->io_disable to true in conditional_stop_bcache_device()
bcache: add wait_for_kthread_stop() in bch_allocator_thread()
bcache: count backing device I/O error for writeback I/O
bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()
bcache: store disk name in struct cache and struct cached_dev
blk-mq: fix sysfs inflight counter
blk-mq: count allocated but not started requests in iostats inflight

Linus Torvalds
2018-05-05 14:41:44 +0800
2e171ffcd Merge tag 'xfs-4.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull xfs fixes from Darrick Wong:
"I've got one more bug fix for xfs for 4.17-rc4, which caps the amount
of data we try to handle in one dedupe request so that userspace can't
livelock the kernel.

This series has been run through a full xfstests run during the week
and through a quick xfstests run against this morning's master, with
no ajor failures reported.

Summary:

- Cap the maximum length of a deduplication request at MAX_RW_COUNT/2
to avoid kernel livelock due to excessively large IO requests"

* tag 'xfs-4.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: cap the length of deduplication requests

Linus Torvalds
2018-05-05 14:36:50 +0800
4148d3884 Merge tag 'for-4.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull btrfs fixes from David Sterba:
"Two regression fixes and one fix for stable"

* tag 'for-4.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: send, fix missing truncate for inode with prealloc extent past eof
btrfs: Take trans lock before access running trans in check_delayed_ref
btrfs: Fix wrong first_key parameter in replace_path

Linus Torvalds
2018-05-05 14:32:18 +0800

04 May, 2018

1 commit

b8b784958 bdi: Fix oops in wb_workfn() ... Browse Code »

Syzbot has reported that it can hit a NULL pointer dereference in
wb_workfn() due to wb->bdi->dev being NULL. This indicates that
wb_workfn() was called for an already unregistered bdi which should not
happen as wb_shutdown() called from bdi_unregister() should make sure
all pending writeback works are completed before bdi is unregistered.
Except that wb_workfn() itself can requeue the work with:

mod_delayed_work(bdi_wq, &wb->dwork, 0);

and if this happens while wb_shutdown() is waiting in:

flush_delayed_work(&wb->dwork);

the dwork can get executed after wb_shutdown() has finished and
bdi_unregister() has cleared wb->bdi->dev.

Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
the necessary precautions against racing with bdi unregistration.

CC: Tetsuo Handa
CC: Tejun Heo
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
Reported-by: syzbot
Reviewed-by: Dave Chinner
Signed-off-by: Jan Kara
Signed-off-by: Jens Axboe

Jan Kara
2018-05-04 06:11:37 +0800

03 May, 2018

1 commit

021ba8e98 xfs: cap the length of deduplication requests ... Browse Code »

Since deduplication potentially has to read in all the pages in both
files in order to compare the contents, cap the deduplication request
length at MAX_RW_COUNT/2 (roughly 1GB) so that we have /some/ upper bound
on the request length and can't just lock up the kernel forever. Found
by running generic/304 after commit 1ddae54555b62 ("common/rc: add
missing 'local' keywords").

Reported-by: matorola@gmail.com
Signed-off-by: Darrick J. Wong
Reviewed-by: Carlos Maiolino

Darrick J. Wong
2018-05-03 00:21:33 +0800

02 May, 2018

3 commits

a6aa10c70 Btrfs: send, fix missing truncate for inode with prealloc extent past eof ... Browse Code »

An incremental send operation can miss a truncate operation when an inode
has an increased size in the send snapshot and a prealloc extent beyond
its size.

Consider the following scenario where a necessary truncate operation is
missing in the incremental send stream:

1) In the parent snapshot an inode has a size of 1282957 bytes and it has
no prealloc extents beyond its size;

2) In the the send snapshot it has a size of 5738496 bytes and has a new
extent at offsets 1884160 (length of 106496 bytes) and a prealloc
extent beyond eof at offset 6729728 (and a length of 339968 bytes);

3) When processing the prealloc extent, at offset 6729728, we end up at
send.c:send_write_or_clone() and set the @len variable to a value of
18446744073708560384 because @offset plus the original @len value is
larger then the inode's size (6729728 + 339968 > 5738496). We then
call send_extent_data(), with that @offset and @len, which in turn
calls send_write(), and then the later calls fill_read_buf(). Because
the offset passed to fill_read_buf() is greater then inode's i_size,
this function returns 0 immediately, which makes send_write() and
send_extent_data() do nothing and return immediately as well. When
we get back to send.c:send_write_or_clone() we adjust the value
of sctx->cur_inode_next_write_offset to @offset plus @len, which
corresponds to 6729728 + 18446744073708560384 = 5738496, which is
precisely the the size of the inode in the send snapshot;

4) Later when at send.c:finish_inode_if_needed() we determine that
we don't need to issue a truncate operation because the value of
sctx->cur_inode_next_write_offset corresponds to the inode's new
size, 5738496 bytes. This is wrong because the last write operation
that was issued started at offset 1884160 with a length of 106496
bytes, so the correct value for sctx->cur_inode_next_write_offset
should be 1990656 (1884160 + 106496), so that a truncate operation
with a value of 5738496 bytes would have been sent to insert a
trailing hole at the destination.

So fix the issue by making send.c:send_write_or_clone() not attempt
to send write or clone operations for extents that start beyond the
inode's size, since such attempts do nothing but waste time by
calling helper functions and allocating path structures, and send
currently has no fallocate command in order to create prealloc extents
at the destination (either beyond a file's eof or not).

The issue was found running the test btrfs/007 from fstests using a seed
value of 1524346151 for fsstress.

Reported-by: Gu, Jinxiang
Fixes: ffa7c4296e93 ("Btrfs: send, do not issue unnecessary truncate operations")
Signed-off-by: Filipe Manana
Signed-off-by: David Sterba

Filipe Manana
2018-05-02 17:55:29 +0800
998ac6d21 btrfs: Take trans lock before access running trans in check_delayed_ref ... Browse Code »

In preivous patch:
Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_exist
We avoid starting btrfs transaction and get this information from
fs_info->running_transaction directly.

When accessing running_transaction in check_delayed_ref, there's a
chance that current transaction will be freed by commit transaction
after the NULL pointer check of running_transaction is passed.

After looking all the other places using fs_info->running_transaction,
they are either protected by trans_lock or holding the transactions.

Fix this by using trans_lock and increasing the use_count.

Fixes: e4c3b2dcd144 ("Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_exist")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: ethanwu
Signed-off-by: David Sterba

ethanwu
2018-05-02 17:54:58 +0800
f2125992e Merge tag 'xfs-4.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull xfs fixes from Darrick Wong:
"Here are a few more bug fixes for xfs for 4.17-rc4. Most of them are
fixes for bad behavior.

This series has been run through a full xfstests run during LSF and
through a quick xfstests run against this morning's master, with no
major failures reported.

Summary:

- Enhance inode fork verifiers to prevent loading of corrupted
metadata.

- Fix a crash when we try to convert extents format inodes to btree
format, we run out of space, but forget to revert the in-core state
changes.

- Fix file size checks when doing INSERT_RANGE that could cause files
to end up negative size if there previously was an extent mapped at
s_maxbytes.

- Fix a bug when doing a remove-then-add ATTR_REPLACE xattr update
where we forget to clear ATTR_REPLACE after the remove, which
causes the attr to be lost and the fs to shut down due to (what it
thinks is) inconsistent in-core state"

* tag 'xfs-4.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't fail when converting shortform attr to long form during ATTR_REPLACE
xfs: prevent creating negative-sized file via INSERT_RANGE
xfs: set format back to extents if xfs_bmap_extents_to_btree
xfs: enhance dinode verifier

Linus Torvalds
2018-05-02 00:11:45 +0800

29 Apr, 2018

2 commits

cdface520 Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 fixes from Ted Ts'o:
"Fix misc bugs and a regression for ext4"

* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs
ext4: fix bitmap position validation
ext4: set h_journal if there is a failure starting a reserved handle
ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS

Linus Torvalds
2018-04-29 11:07:21 +0800
cac264288 Merge tag '4.17-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull cifs fixes from Steve French:
"A few security related fixes for SMB3, most importantly for SMB3.11
encryption"

* tag '4.17-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: smbd: Avoid allocating iov on the stack
cifs: smbd: Don't use RDMA read/write when signing is used
SMB311: Fix reconnect
SMB3: Fix 3.11 encryption to Windows and handle encrypted smb3 tcon
CIFS: set *resp_buf_type to NO_BUFFER on error

Linus Torvalds
2018-04-29 00:51:56 +0800

27 Apr, 2018

1 commit

3c6b03d18 cifs: smbd: depend on INFINIBAND_ADDR_TRANS ... Browse Code »

CIFS_SMB_DIRECT code depends on INFINIBAND_ADDR_TRANS provided symbols.
So declare the kconfig dependency. This is necessary to allow for
enabling INFINIBAND without INFINIBAND_ADDR_TRANS.

Signed-off-by: Greg Thelen
Cc: Tarick Bedeir
Reviewed-by: Long Li
Signed-off-by: Doug Ledford

Greg Thelen
2018-04-27 23:15:44 +0800

26 Apr, 2018

5 commits

17515f1b7 btrfs: Fix wrong first_key parameter in replace_path ... Browse Code »

Commit 581c1760415c ("btrfs: Validate child tree block's level and first
key") introduced new @first_key parameter for read_tree_block(), however
caller in replace_path() is parasing wrong key to read_tree_block().

It should use parameter @first_key other than @key.

Normally it won't expose problem as @key is normally initialzied to the
same value of @first_key we expect.
However in relocation recovery case, @key can be set to (0, 0, 0), and
since no valid key in relocation tree can be (0, 0, 0), it will cause
read_tree_block() to return -EUCLEAN and interrupt relocation recovery.

Fix it by setting @first_key correctly.

Fixes: 581c1760415c ("btrfs: Validate child tree block's level and first key")
Signed-off-by: Qu Wenruo
Signed-off-by: David Sterba

Qu Wenruo
2018-04-26 19:21:04 +0800
7ef79ad52 ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs ... Browse Code »

Fixes: a45403b51582 ("ext4: always initialize the crc32c checksum driver")
Reported-by: François Valenduc
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org

Theodore Ts'o
2018-04-26 12:44:46 +0800
8bcda1d2a cifs: smbd: Avoid allocating iov on the stack ... Browse Code »

It's not necessary to allocate another iov when going through the buffers
in smbd_send() through RDMA send.

Remove it to reduce stack size.

Thanks to Matt for spotting a printk typo in the earlier version of this.

CC: Matt Redfearn
Signed-off-by: Long Li
Acked-by: Ronnie Sahlberg
Cc: stable@vger.kernel.org
Signed-off-by: Steve French

Long Li
2018-04-26 00:15:58 +0800
bb4c04194 cifs: smbd: Don't use RDMA read/write when signing is used ... Browse Code »

SMB server will not sign data transferred through RDMA read/write. When
signing is used, it's a good idea to have all the data signed.

In this case, use RDMA send/recv for all data transfers. This will degrade
performance as this is not generally configured in RDMA environemnt. So
warn the user on signing and RDMA send/recv.

Signed-off-by: Long Li
Acked-by: Ronnie Sahlberg
Cc: stable@vger.kernel.org
Signed-off-by: Steve French

Long Li
2018-04-26 00:15:53 +0800
0d5ec281c SMB311: Fix reconnect ... Browse Code »

The preauth hash was not being recalculated properly on reconnect
of SMB3.11 dialect mounts (which caused access denied repeatedly
on auto-reconnect).

Fixes: 8bd68c6e47ab ("CIFS: implement v3.11 preauth integrity")

Signed-off-by: Steve French
CC: Stable
Reviewed-by: Ronnie Sahlberg

Steve French
2018-04-26 00:15:20 +0800

24 Apr, 2018

3 commits

22be37acc ext4: fix bitmap position validation ... Browse Code »

Currently in ext4_valid_block_bitmap() we expect the bitmap to be
positioned anywhere between 0 and s_blocksize clusters, but that's
wrong because the bitmap can be placed anywhere in the block group. This
causes false positives when validating bitmaps on perfectly valid file
system layouts. Fix it by checking whether the bitmap is within the group
boundary.

The problem can be reproduced using the following

mkfs -t ext3 -E stride=256 /dev/vdb1
mount /dev/vdb1 /mnt/test
cd /mnt/test
wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz
tar xf linux-4.16.3.tar.xz

This will result in the warnings in the logs

EXT4-fs error (device vdb1): ext4_validate_block_bitmap:399: comm tar: bg 84: block 2774529: invalid block bitmap

[ Changed slightly for clarity and to not drop a overflow test -- TYT ]

Signed-off-by: Lukas Czerner
Signed-off-by: Theodore Ts'o
Reported-by: Ilya Dryomov
Fixes: 7dac4a1726a9 ("ext4: add validity checks for bitmap block numbers")
Cc: stable@vger.kernel.org

Lukas Czerner
2018-04-24 23:31:44 +0800
23657ad73 SMB3: Fix 3.11 encryption to Windows and handle encrypted smb3 tcon ... Browse Code »

Temporarily disable AES-GCM, as AES-CCM is only currently
enabled mechanism on client side. This fixes SMB3.11
encrypted mounts to Windows.

Also the tree connect request itself should be encrypted if
requested encryption ("seal" on mount), in addition we should be
enabling encryption in 3.11 based on whether we got any valid
encryption ciphers back in negprot (the corresponding session flag is
not set as it is in 3.0 and 3.02)

Signed-off-by: Steve French
Reviewed-by: Pavel Shilovsky
Reviewed-by: Ronnie Sahlberg
CC: Stable

Steve French
2018-04-24 23:07:14 +0800
117e3b7fe CIFS: set *resp_buf_type to NO_BUFFER on error ... Browse Code »

Dan Carpenter had pointed this out a while ago, but the code around
this had changed so wasn't causing any problems since that field
was not used in this error path.

Still, it is cleaner to always initialize this field, so changing
the error path to set it.

Reviewed-by: Ronnie Sahlberg
CC: Dan Carpenter
Signed-off-by: Steve French

Steve French
2018-04-24 23:06:28 +0800

23 Apr, 2018

3 commits

f19198268 ceph: check if mds create snaprealm when setting quota ... Browse Code »

If mds does not, return -EOPNOTSUPP.

Link: http://tracker.ceph.com/issues/23491
Signed-off-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Yan, Zheng
2018-04-23 23:35:19 +0800
5ec83b22a Merge tag '4.17-rc1-SMB3-CIFS' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull cifs fixes from Steve French:
"Various SMB3/CIFS fixes.

There are three more security related fixes in progress that are not
included in this set but they are still being tested and reviewed, so
sending this unrelated set of smaller fixes now"

* tag '4.17-rc1-SMB3-CIFS' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: fix typo in cifs_dbg
cifs: do not allow creating sockets except with SMB1 posix exensions
cifs: smbd: Dump SMB packet when configured
cifs: smbd: Check for iov length on sending the last iov
fs: cifs: Adding new return type vm_fault_t
cifs: smb2ops: Fix NULL check in smb2_query_symlink

Linus Torvalds
2018-04-23 03:13:04 +0800
d54b5c131 Merge tag 'for-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull btrfs fixes from David Sterba:
"This contains a few fixups to the qgroup patches that were merged this
dev cycle, unaligned access fix, blockgroup removal corner case fix
and a small debugging output tweak"

* tag 'for-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: print-tree: debugging output enhancement
btrfs: Fix race condition between delayed refs and blockgroup removal
btrfs: fix unaligned access in readdir
btrfs: Fix wrong btrfs_delalloc_release_extents parameter
btrfs: delayed-inode: Remove wrong qgroup meta reservation calls
btrfs: qgroup: Use independent and accurate per inode qgroup rsv
btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT

Linus Torvalds
2018-04-23 03:09:27 +0800

21 Apr, 2018

16 commits

d23a61ee9 fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error ... Browse Code »

Commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") is
printing spurious messages under memory pressure due to map_addr == -ENOMEM.

9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already

Complain only if -EEXIST, and use %px for printing the address.

Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp
Fixes: 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map") is
Signed-off-by: Tetsuo Handa
Acked-by: Michal Hocko
Cc: Andrei Vagin
Cc: Khalid Aziz
Cc: Michael Ellerman
Cc: Kees Cook
Cc: Abdul Haleem
Cc: Joel Stanley
Cc: Anshuman Khandual
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tetsuo Handa
2018-04-21 08:18:36 +0800
9a1015b32 proc: fix /proc/loadavg regression ... Browse Code »

Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR
API") changed last field of /proc/loadavg (last pid allocated) to be off
by one:

# unshare -p -f --mount-proc cat /proc/loadavg
0.00 0.00 0.00 1/60 2
Cc: "Eric W. Biederman"
Cc: Gargi Sharma
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2018-04-21 08:18:36 +0800
2e0ad552f proc: revalidate kernel thread inodes to root:root ... Browse Code »

task_dump_owner() has the following code:

mm = task->mm;
if (mm) {
if (get_dumpable(mm) != SUID_DUMP_USER) {
uid = ...
}
}

Check for ->mm is buggy -- kernel thread might be borrowing mm
and inode will go to some random uid:gid pair.

Link: http://lkml.kernel.org/r/20180412220109.GA20978@avx2
Signed-off-by: Alexey Dobriyan
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2018-04-21 08:18:35 +0800
1e6306652 autofs: mount point create should honour passed in mode ... Browse Code »

The autofs file system mkdir inode operation blindly sets the created
directory mode to S_IFDIR | 0555, ingoring the passed in mode, which can
cause selinux dac_override denials.

But the function also checks if the caller is the daemon (as no-one else
should be able to do anything here) so there's no point in not honouring
the passed in mode, allowing the daemon to set appropriate mode when
required.

Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net
Signed-off-by: Ian Kent
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ian Kent
2018-04-21 08:18:35 +0800
2e898e4c0 writeback: safer lock nesting ... Browse Code »

lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
the page's memcg is undergoing move accounting, which occurs when a
process leaves its memcg for a new one that has
memory.move_charge_at_immigrate set.

unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
the given inode is switching writeback domains. Switches occur when
enough writes are issued from a new domain.

This existing pattern is thus suspicious:
lock_page_memcg(page);
unlocked_inode_to_wb_begin(inode, &locked);
...
unlocked_inode_to_wb_end(inode, locked);
unlock_page_memcg(page);

If both inode switch and process memcg migration are both in-flight then
unlocked_inode_to_wb_end() will unconditionally enable interrupts while
still holding the lock_page_memcg() irq spinlock. This suggests the
possibility of deadlock if an interrupt occurs before unlock_page_memcg().

truncate
__cancel_dirty_page
lock_page_memcg
unlocked_inode_to_wb_begin
unlocked_inode_to_wb_end

end_page_writeback
test_clear_page_writeback
lock_page_memcg

unlock_page_memcg

Due to configuration limitations this deadlock is not currently possible
because we don't mix cgroup writeback (a cgroupv2 feature) and
memory.move_charge_at_immigrate (a cgroupv1 feature).

If the kernel is hacked to always claim inode switching and memcg
moving_account, then this script triggers lockup in less than a minute:

cd /mnt/cgroup/memory
mkdir a b
echo 1 > a/memory.move_charge_at_immigrate
echo 1 > b/memory.move_charge_at_immigrate
(
echo $BASHPID > a/cgroup.procs
while true; do
dd if=/dev/zero of=/mnt/big bs=1M count=256
done
) &
while true; do
sync
done &
sleep 1h &
SLEEP=$!
while true; do
echo $SLEEP > a/cgroup.procs
echo $SLEEP > b/cgroup.procs
done

The deadlock does not seem possible, so it's debatable if there's any
reason to modify the kernel. I suggest we should to prevent future
surprises. And Wang Long said "this deadlock occurs three times in our
environment", so there's more reason to apply this, even to stable.
Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch
see "[PATCH for-4.4] writeback: safer lock nesting"
https://lkml.org/lkml/2018/4/11/146

Wang Long said "this deadlock occurs three times in our environment"

[gthelen@google.com: v4]
Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
[akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Signed-off-by: Greg Thelen
Reported-by: Wang Long
Acked-by: Wang Long
Acked-by: Michal Hocko
Reviewed-by: Andrew Morton
Cc: Johannes Weiner
Cc: Tejun Heo
Cc: Nicholas Piggin
Cc: [v4.2+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Thelen
2018-04-21 08:18:35 +0800
88c28f246 mm, pagemap: fix swap offset value for PMD migration entry ... Browse Code »

The swap offset reported by /proc//pagemap may be not correct for
PMD migration entries. If addr passed into pagemap_pmd_range() isn't
aligned with PMD start address, the swap offset reported doesn't
reflect this. And in the loop to report information of each sub-page,
the swap offset isn't increased accordingly as that for PFN.

This may happen after opening /proc//pagemap and seeking to a page
whose address doesn't align with a PMD start address. I have verified
this with a simple test program.

BTW: migration swap entries have PFN information, do we need to restrict
whether to show them?

[akpm@linux-foundation.org: fix typo, per Huang, Ying]
Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying"
Cc: Michal Hocko
Cc: "Kirill A. Shutemov"
Cc: Andrei Vagin
Cc: Dan Williams
Cc: "Jerome Glisse"
Cc: Daniel Colascione
Cc: Zi Yan
Cc: Naoya Horiguchi
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2018-04-21 08:18:35 +0800
596632de0 CIFS: fix typo in cifs_dbg ... Browse Code »

Signed-off-by: Aurelien Aptel
Signed-off-by: Steve French
Reported-by: Long Li

Aurelien Aptel
2018-04-21 02:39:10 +0800
1d0cffa67 cifs: do not allow creating sockets except with SMB1 posix exensions ... Browse Code »

RHBZ: 1453123

Since at least the 3.10 kernel and likely a lot earlier we have
not been able to create unix domain sockets in a cifs share
when mounted using the SFU mount option (except when mounted
with the cifs unix extensions to Samba e.g.)
Trying to create a socket, for example using the af_unix command from
xfstests will cause :
BUG: unable to handle kernel NULL pointer dereference at 00000000
00000040

Since no one uses or depends on being able to create unix domains sockets
on a cifs share the easiest fix to stop this vulnerability is to simply
not allow creation of any other special files than char or block devices
when sfu is used.

Added update to Ronnie's patch to handle a tcon link leak, and
to address a buf leak noticed by Gustavo and Colin.

Acked-by: Gustavo A. R. Silva
CC: Colin Ian King
Reviewed-by: Pavel Shilovsky
Reported-by: Eryu Guan
Signed-off-by: Ronnie Sahlberg
Signed-off-by: Steve French
Cc: stable@vger.kernel.org

Steve French
2018-04-21 02:31:32 +0800
ff30b89e0 cifs: smbd: Dump SMB packet when configured ... Browse Code »

When sending through SMB Direct, also dump the packet in SMB send path.

Also fixed a typo in debug message.

Signed-off-by: Long Li
Cc: stable@vger.kernel.org
Signed-off-by: Steve French
Reviewed-by: Ronnie Sahlberg

Long Li
2018-04-21 01:18:28 +0800
c08723237 btrfs: print-tree: debugging output enhancement ... Browse Code »

This patch enhances the following things:

- tree block header
* add generation and owner output for node and leaf
- node pointer generation output
- allow btrfs_print_tree() to not follow nodes
* just like btrfs-progs

Please note that, although function btrfs_print_tree() is not called by
anyone right now, it's still a pretty useful function to debug kernel.
So that function is still kept for later use.

Signed-off-by: Qu Wenruo
Reviewed-by: Lu Fengqi
Signed-off-by: David Sterba

Qu Wenruo
2018-04-21 01:18:16 +0800
5e388e958 btrfs: Fix race condition between delayed refs and blockgroup removal ... Browse Code »

When the delayed refs for a head are all run, eventually
cleanup_ref_head is called which (in case of deletion) obtains a
reference for the relevant btrfs_space_info struct by querying the bg
for the range. This is problematic because when the last extent of a
bg is deleted a race window emerges between removal of that bg and the
subsequent invocation of cleanup_ref_head. This can result in cache being null
and either a null pointer dereference or assertion failure.

task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
FS: 00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
btrfs_run_delayed_refs+0x68/0x250 [btrfs]
btrfs_should_end_transaction+0x42/0x60 [btrfs]
btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
evict+0xc6/0x190
do_unlinkat+0x19c/0x300
do_syscall_64+0x74/0x140
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fbf589c57a7

To fix this, introduce a new flag "is_system" to head_ref structs,
which is populated at insertion time. This allows to decouple the
querying for the spaceinfo from querying the possibly deleted bg.

Fixes: d7eae3403f46 ("Btrfs: rework delayed ref total_bytes_pinned accounting")
CC: stable@vger.kernel.org # 4.14+
Suggested-by: Omar Sandoval
Signed-off-by: Nikolay Borisov
Reviewed-by: Omar Sandoval
Signed-off-by: David Sterba

Nikolay Borisov
2018-04-21 01:17:25 +0800
a9e5b7328 vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion ... Browse Code »

In do_mount() when the MS_* flags are being converted to MNT_* flags,
MS_RDONLY got accidentally convered to SB_RDONLY.

Undo this change.

Fixes: e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2018-04-21 00:59:33 +0800
660625922 afs: Fix server record deletion ... Browse Code »

AFS server records get removed from the net->fs_servers tree when
they're deleted, but not from the net->fs_addresses{4,6} lists, which
can lead to an oops in afs_find_server() when a server record has been
removed, for instance during rmmod.

Fix this by deleting the record from the by-address lists before posting
it for RCU destruction.

The reason this hasn't been noticed before is that the fileserver keeps
probing the local cache manager, thereby keeping the service record
alive, so the oops would only happen when a fileserver eventually gets
bored and stops pinging or if the module gets rmmod'd and a call comes
in from the fileserver during the window between the server records
being destroyed and the socket being closed.

The oops looks something like:

BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
...
Workqueue: kafsd afs_process_async_call [kafs]
RIP: 0010:afs_find_server+0x271/0x36f [kafs]
...
Call Trace:
afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
afs_deliver_to_call+0x1ee/0x5e8 [kafs]
afs_process_async_call+0x5b/0xd0 [kafs]
process_one_work+0x2c2/0x504
worker_thread+0x1d4/0x2ac
kthread+0x11f/0x127
ret_from_fork+0x24/0x30

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2018-04-21 00:59:33 +0800
b9abdcfd1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:
"Assorted fixes.

Some of that is only a matter with fault injection (broken handling of
small allocation failure in various mount-related places), but the
last one is a root-triggerable stack overflow, and combined with
userns it gets really nasty ;-/"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Don't leak MNT_INTERNAL away from internal mounts
mm,vmscan: Allow preallocating memory for register_shrinker().
rpc_pipefs: fix double-dput()
orangefs_kill_sb(): deal with allocation failures
jffs2_kill_sb(): deal with failed allocations
hypfs_kill_super(): deal with failed allocations

Linus Torvalds
2018-04-21 00:15:14 +0800
43f70c960 Merge tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel… ... Browse Code »

…/git/tyhicks/ecryptfs

Pull eCryptfs fixes from Tyler Hicks:
"Minor cleanups and a bug fix to completely ignore unencrypted
filenames in the lower filesystem when filename encryption is enabled
at the eCryptfs layer"

* tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: don't pass up plaintext names when using filename encryption
ecryptfs: fix spelling mistake: "cadidate" -> "candidate"
ecryptfs: lookup: Don't check if mount_crypt_stat is NULL

Linus Torvalds
2018-04-21 00:08:37 +0800
0d9cf33b4 Merge tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

- isofs memory leak fix

- two fsnotify fixes of event mask handling

- udf fix of UTF-16 handling

- couple other smaller cleanups

* tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix leak of UTF-16 surrogates into encoded strings
fs: ext2: Adding new return type vm_fault_t
isofs: fix potential memory leak in mount option parsing
MAINTAINERS: add an entry for FSNOTIFY infrastructure
fsnotify: fix typo in a comment about mark->g_list
fsnotify: fix ignore mask logic in send_to_group()
isofs compress: Remove VLA usage
fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init
fanotify: fix logic of events on child

Linus Torvalds
2018-04-21 00:01:26 +0800

20 Apr, 2018

1 commit

16a34adb9 Don't leak MNT_INTERNAL away from internal mounts ... Browse Code »

We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies. As it is, creating a deep stack of bindings of /proc/*/ns/*
somewhere in a new namespace and exiting yields a stack overflow.

Cc: stable@kernel.org
Reported-by: Alexander Aring
Bisected-by: Kirill Tkhai
Tested-by: Kirill Tkhai
Tested-by: Alexander Aring
Signed-off-by: Al Viro

Al Viro
2018-04-20 11:52:15 +0800