Eric Lee / smarc-fsl-linux-kernel

07 Jul, 2017

11 commits

481f001ff ceph: update ceph_dentry_info::lease_session when necessary

The current code does not update ceph_dentry_info::lease_session once
it is set. If the auth mds of the corresponding dentry changes, the dentry
lease stays in an invalid state.

Signed-off-by: "Yan, Zheng"
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Yan, Zheng
2017-07-07 23:25:14 +0800
1d8f83604 ceph: new mount option that specifies fscache uniquifier

Currently ceph uses the FSID as the primary index key of fscache data. This
allows ceph to retain cached data across remounts. But it causes
problems (kernel oops, fscache does not support sharing data) when
a filesystem gets mounted several times (with fscache enabled, with
different mount options).

The fix is adding a new mount option, which specifies uniquifier
for fscache.

Signed-off-by: "Yan, Zheng"
Acked-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Yan, Zheng
2017-07-07 23:25:14 +0800
4b9f2042f ceph: avoid accessing freeing inode in ceph_check_delayed_caps()

Signed-off-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Yan, Zheng
2017-07-07 23:25:13 +0800
62a65f36d ceph: avoid invalid memory dereference in the middle of umount

extra_mon_dispatch() and debugfs' foo_show functions dereference
fsc->mdsc. We should clean up fsc->client->extra_mon_dispatch
and debugfs before destroying fsc->mdsc.

Signed-off-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Yan, Zheng
2017-07-07 23:25:13 +0800
1684dd03e ceph: getattr before read on ceph.* xattrs

Previously we were returning values for quota, layout
xattrs without any kind of update -- the user just got
whatever happened to be in our cache.

Clearly this extra round trip has a cost, but reads of
these xattrs are fairly rare, happening on admin
intervention rather than in normal operation.

Link: http://tracker.ceph.com/issues/17939
Signed-off-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Yan, Zheng
2017-07-07 23:25:13 +0800
92e57e628 ceph: don't re-send interrupted flock request

Don't re-send interrupted flock request in cases of mds failover
and receiving request forward. Because corresponding 'lock intr'
request may have been finished, it won't get re-sent.

Link: http://tracker.ceph.com/issues/20170
Signed-off-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Yan, Zheng
2017-07-07 23:25:13 +0800
439868812 ceph: cleanup writepage_nounlock()

Signed-off-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Yan, Zheng
2017-07-07 23:25:13 +0800
fa71fefb3 ceph: redirty page when writepage_nounlock() skips unwritable page

Ceph needs to flush dirty pages in the order of the snap contexts
they belong to. Dirty pages that belong to an older snap context
should be flushed earlier. If writepage_nounlock() cannot flush a
page, it should redirty the page.

Reported-by: Dan Carpenter
Signed-off-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Yan, Zheng
2017-07-07 23:25:13 +0800
f2b0c45f0 ceph: remove useless page->mapping check in writepage_nounlock()

Callers of writepage_nounlock() have already ensured non-null
page->mapping.

Reported-by: Dan Carpenter
Signed-off-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Yan, Zheng
2017-07-07 23:25:13 +0800
efb0ca765 ceph: update the 'approaching max_size' code

The old 'approaching max_size' code expects the MDS to set max_size to
'2 * reported_size'. This is no longer true. The new code reports the
file size when half of the previous max_size increment has been used.
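
The "half of the previous increment" rule can be sketched as a small predicate. This is a Python model under stated assumptions, not the kernel code: the function and parameter names (should_report_size, size, max_size, reported_size) are hypothetical, and "increment" is taken to mean max_size minus the last reported size.

```python
def should_report_size(size, max_size, reported_size):
    """Return True when the client should report its file size to the MDS.

    Assumption: the increment is max_size - reported_size, so "half of the
    increment used" means size has reached the midpoint between the two.
    """
    if size >= max_size:
        # Already at (or past) the limit granted by the MDS.
        return True
    # Midpoint test written without division: 2*size >= max_size + reported_size.
    return max_size > reported_size and 2 * size >= max_size + reported_size


# With reported_size=4 and max_size=8 the midpoint is 6.
print(should_report_size(5, 8, 4))  # False
print(should_report_size(6, 8, 4))  # True
```

Unlike a test against '2 * reported_size', this check does not depend on how large an increment the MDS chooses to grant.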

Signed-off-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Yan, Zheng
2017-07-07 23:25:12 +0800
84eea8c79 ceph: re-request max size after importing caps

The 'wanted max size' could be sent to inode's old auth mds, re-send
it to inode's new auth mds if necessary. Otherwise write syscall may
hang.

Signed-off-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Yan, Zheng
2017-07-07 23:25:12 +0800

01 Jul, 2017

1 commit

86c3e00af Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
"Fix two bugs in copy-up code. One introduced in 4.11 and one in
4.12-rc"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: don't set origin on broken lower hardlink
ovl: copy-up: don't unlock between lookup and link

Linus Torvalds
2017-07-01 01:22:59 +0800

30 Jun, 2017

1 commit

374bf8831 Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Two fixes that should go into this release.

One is an nvme regression fix from Keith, fixing a missing queue
freeze if the controller is being reset. This causes the reset to
hang.

The other is a fix for a leak of the bio protection info, if smaller
sized O_DIRECT is used. This fix should be more involved as we have
other problematic paths in the kernel, but given as this isn't a
regression in this series, we'll tackle those for 4.13"

* 'for-linus' of git://git.kernel.dk/linux-block:
block: provide bio_uninit() free freeing integrity/task associations
nvme/pci: Fix stuck nvme reset

Linus Torvalds
2017-06-30 05:10:37 +0800

29 Jun, 2017

2 commits

9ae3b3f52 block: provide bio_uninit() free freeing integrity/task associations

Wen reports significant memory leaks with DIF and O_DIRECT:

"With nvme device + T10 enabled, On a system it has 256GB and started
logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
leaking.

/proc/meminfo | grep SUnreclaim...

SUnreclaim: 6752128 kB
SUnreclaim: 6874880 kB
SUnreclaim: 7238080 kB
....
SUnreclaim: 22307264 kB
SUnreclaim: 22485888 kB
SUnreclaim: 22720256 kB

When testcases with T10 enabled call into __blkdev_direct_IO_simple,
code doesn't free memory allocated by bio_integrity_alloc. The patch
fixes the issue. HTX has been run with +60 hours without failure."
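
The leak rate quoted in the report can be checked from the first and last SUnreclaim samples shown above. The snippet below is only an arithmetic check; it assumes the samples span the one hour mentioned in the report and that kB in /proc/meminfo means KiB.

```python
# First and last SUnreclaim samples from the report, in kB.
FIRST_KB = 6752128
LAST_KB = 22720256
MINUTES = 60  # assumption: the samples cover the one hour mentioned


def leak_summary(first_kb, last_kb, minutes):
    """Return (leaked_kb, leaked_gib, mib_per_minute)."""
    leaked_kb = last_kb - first_kb
    leaked_gib = leaked_kb / (1024 * 1024)
    mib_per_minute = leaked_kb / 1024 / minutes
    return leaked_kb, leaked_gib, mib_per_minute


leaked_kb, leaked_gib, rate = leak_summary(FIRST_KB, LAST_KB, MINUTES)
print(leaked_kb)             # 15968128
print(round(leaked_gib, 2))  # 15.23
print(round(rate, 1))        # 259.9
```

The result agrees with the "~15+GB" and "approximately 256 MB / minute" figures in the report.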

Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
doesn't go through the regular bio free. This means that any ancillary
data allocated with the bio through the stack is not freed. Hence, we
can leak the integrity data associated with the bio, if the device is
using DIF/DIX.

Fix this by providing a bio_uninit() and export it, so that we can use
it to free this data. Note that this is a minimal fix for this issue.
Any current user of bio's that are allocated outside of
bio_alloc_bioset() suffers from this issue, most notably some drivers.
We will fix those in a more comprehensive patch for 4.13. This also
means that the commit marked as being fixed by this isn't the real
culprit, it's just the most obvious one out there.

Fixes: 542ff7bf18c6 ("block: new direct I/O implementation")
Reported-by: Wen Xiong
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Jens Axboe
2017-06-29 05:30:13 +0800
e547204f1 Merge tag 'nfs-for-4.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
"Bugfixes include:

- stable fix for exclusive create if the server supports the umask
attribute

- trunking detection should handle ERESTARTSYS/EINTR

- stable fix for a race in the LAYOUTGET function

- stable fix to revert "nfs_rename() handle -ERESTARTSYS dentry left
behind"

- nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()"

* tag 'nfs-for-4.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()
Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind"
NFSv4.1: Fix a race in nfs4_proc_layoutget
NFS: Trunking detection should handle ERESTARTSYS/EINTR
NFSv4.2: Don't send mode again in post-EXCLUSIVE4_1 SETATTR with umask

Linus Torvalds
2017-06-29 04:27:15 +0800

28 Jun, 2017

6 commits

fbaf94ee3 ovl: don't set origin on broken lower hardlink

When copying up a file that has multiple hard links we need to break any
association with the origin file. This makes copy-up be essentially an
atomic replace.

The new file has nothing to do with the old one (except having the same
data and metadata initially), so don't set the overlay.origin attribute.

We can relax this in the future when we are able to index upper object by
origin.

Signed-off-by: Miklos Szeredi
Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")

Miklos Szeredi
2017-06-28 19:41:22 +0800
e85f82ff9 ovl: copy-up: don't unlock between lookup and link

Nothing prevents mischief on upper layer while we are busy copying up the
data.

Move the lookup right before the looked up dentry is actually used.

Signed-off-by: Miklos Szeredi
Fixes: 01ad3eb8a073 ("ovl: concurrent copy up of regular files")
Cc: # v4.11

Miklos Szeredi
2017-06-28 19:41:22 +0800
2e31b4cb8 NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()

The current code works only for the case where we have exactly one slot,
which is no longer true.
nfs4_free_slot() will automatically declare the callback channel to be
drained when all slots have been returned.

Signed-off-by: Trond Myklebust

Trond Myklebust
2017-06-28 10:26:23 +0800
d9f295000 Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind"

This reverts commit 920b4530fb80430ff30ef83efe21ba1fa5623731 which could
call d_move() without holding the directory's i_mutex, and reverts commit
d4ea7e3c5c0e341c15b073016dbf3ab6c65f12f3 "NFS: Fix old dentry rehash after
move", which was a follow-up fix.

Signed-off-by: Benjamin Coddington
Fixes: 920b4530fb80 ("NFS: nfs_rename() handle -ERESTARTSYS dentry left behind")
Cc: stable@vger.kernel.org # v4.10+
Reviewed-by: Jeff Layton
Signed-off-by: Trond Myklebust

Benjamin Coddington
2017-06-28 09:58:14 +0800
bd171930e NFSv4.1: Fix a race in nfs4_proc_layoutget

If the task calling layoutget is signalled, then it is possible for the
calls to nfs4_sequence_free_slot() and nfs4_layoutget_prepare() to race,
in which case we leak a slot.
The fix is to move the call to nfs4_sequence_free_slot() into the
nfs4_layoutget_release() so that it gets called at task teardown time.

Fixes: 2e80dbe7ac51 ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Trond Myklebust

Trond Myklebust
2017-06-28 09:44:58 +0800
898fc11bb NFS: Trunking detection should handle ERESTARTSYS/EINTR

Currently, it will return EIO in those cases.

Signed-off-by: Trond Myklebust

Trond Myklebust
2017-06-28 09:44:58 +0800

24 Jun, 2017

6 commits

337c6ba2d Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"8 fixes"

* emailed patches from Andrew Morton:
fs/exec.c: account for argv/envp pointers
ocfs2: fix deadlock caused by recursive locking in xattr
slub: make sysfs file removal asynchronous
lib/cmdline.c: fix get_options() overflow while parsing ranges
fs/dax.c: fix inefficiency in dax_writeback_mapping_range()
autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL
mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings
mm, thp: remove cond_resched from __collapse_huge_page_copy

Linus Torvalds
2017-06-24 07:30:52 +0800
98da7d088 fs/exec.c: account for argv/envp pointers

When limiting the argv/envp strings during exec to 1/4 of the stack limit,
the storage of the pointers to the strings was not included. This means
that an exec with huge numbers of tiny strings could eat 1/4 of the stack
limit in strings and then additional space would be later used by the
pointers to the strings.

For example, on 32-bit with an 8MB stack rlimit, an exec with 1677721
single-byte strings would consume less than 2MB of stack, the max (8MB /
4) amount allowed, but the pointers to the strings would consume the
remaining additional stack space (1677721 * 4 == 6710884).

The result (1677721 + 6710884 == 8388605) would exhaust stack space
entirely. Controlling this stack exhaustion could result in
pathological behavior in setuid binaries (CVE-2017-1000365).
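
The numbers in this example can be reproduced directly. The snippet below is a Python check of the arithmetic only, not kernel code; within_limit() is a hypothetical helper that shows the effect of charging the pointers against the 1/4 limit.

```python
# Reproduces the arithmetic from the commit message; this is not kernel code.
STACK_RLIMIT = 8 * 1024 * 1024   # 8MB stack rlimit from the example
POINTER_SIZE = 4                 # bytes per pointer on 32-bit
NUM_STRINGS = 1677721            # single-byte strings from the example


def stack_usage(num_strings, bytes_per_string=1, pointer_size=POINTER_SIZE):
    """Return (string_bytes, pointer_bytes, total_bytes) for an exec."""
    string_bytes = num_strings * bytes_per_string
    pointer_bytes = num_strings * pointer_size
    return string_bytes, pointer_bytes, string_bytes + pointer_bytes


def within_limit(num_strings, count_pointers, limit=STACK_RLIMIT // 4):
    """Hypothetical check: optionally charge the pointers against the 1/4 limit."""
    string_bytes, pointer_bytes, _ = stack_usage(num_strings)
    charged = string_bytes + (pointer_bytes if count_pointers else 0)
    return charged <= limit


strings, pointers, total = stack_usage(NUM_STRINGS)
print(strings, pointers, total)                         # 1677721 6710884 8388605
print(within_limit(NUM_STRINGS, count_pointers=False))  # True: strings alone fit
print(within_limit(NUM_STRINGS, count_pointers=True))   # False: pointers push it over
```

The total is three bytes short of the 8388608-byte rlimit, which is why counting only the strings lets the exec through while the stack is in fact almost fully consumed.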

[akpm@linux-foundation.org: additional commenting from Kees]
Fixes: b6a2fea39318 ("mm: variable length argument support")
Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast
Signed-off-by: Kees Cook
Acked-by: Rik van Riel
Acked-by: Michal Hocko
Cc: Alexander Viro
Cc: Qualys Security Advisory
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2017-06-24 07:15:56 +0800
8818efaaa ocfs2: fix deadlock caused by recursive locking in xattr

Another deadlock path caused by recursive locking is reported. This
kind of issue was introduced since commit 743b5f1434f5 ("ocfs2: take
inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths have been
fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when taking
inode lock at vfs entry points"). Yes, we intend to fix this kind of
case in incremental way, because it's hard to find out all possible
paths at once.

This one can be reproduced like this. On node1, cp a large file from
home directory to ocfs2 mountpoint. While on node2, run
setfacl/getfacl. Both nodes will hang up there. The backtraces:

On node1:
__ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
ocfs2_write_begin+0x43/0x1a0 [ocfs2]
generic_perform_write+0xa9/0x180
__generic_file_write_iter+0x1aa/0x1d0
ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
__vfs_write+0xc3/0x130
vfs_write+0xb1/0x1a0
SyS_write+0x46/0xa0

On node2:
__ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
ocfs2_set_acl+0x22d/0x260 [ocfs2]
ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
set_posix_acl+0x75/0xb0
posix_acl_xattr_set+0x49/0xa0
__vfs_setxattr+0x69/0x80
__vfs_setxattr_noperm+0x72/0x1a0
vfs_setxattr+0xa7/0xb0
setxattr+0x12d/0x190
path_setxattr+0x9f/0xb0
SyS_setxattr+0x14/0x20

Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is
exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic
to avoid recursive cluster lock").

Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Eric Ren
Reported-by: Thomas Voegtle
Tested-by: Thomas Voegtle
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Ren
2017-06-24 07:15:55 +0800
1eb643d02 fs/dax.c: fix inefficiency in dax_writeback_mapping_range()

dax_writeback_mapping_range() fails to update iteration index when
searching radix tree for entries needing cache flushing. Thus each
pagevec worth of entries is searched starting from the start which is
inefficient and prone to livelocks. Update index properly.
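
The batched-scan pattern described here can be modelled in a few lines. This is a Python sketch, not the kernel loop: lookup() stands in for the tagged radix tree lookup, and every name in it is hypothetical. The marked line is the index update whose absence makes each batch restart from the beginning of the range.

```python
import bisect


def lookup(sorted_indices, start, end, batch):
    """Return up to `batch` entries with start <= index <= end (model of a batched lookup)."""
    lo = bisect.bisect_left(sorted_indices, start)
    hi = bisect.bisect_right(sorted_indices, end)
    return sorted_indices[lo:hi][:batch]


def writeback_range(entries, start, end, batch=4):
    """Visit every entry in [start, end] exactly once, one batch at a time."""
    flushed = []
    index = start
    while index <= end:
        found = lookup(entries, index, end, batch)
        if not found:
            break
        flushed.extend(found)
        # Resume after the last entry seen; without this the same
        # batch is found again on every pass.
        index = found[-1] + 1
    return flushed


print(writeback_range([1, 3, 5, 7, 9, 11], start=0, end=10, batch=2))  # [1, 3, 5, 7, 9]
```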

Link: http://lkml.kernel.org/r/20170619124531.21491-1-jack@suse.cz
Fixes: 9973c98ecfda3 ("dax: add support for fsync/sync")
Signed-off-by: Jan Kara
Reviewed-by: Ross Zwisler
Cc: Dan Williams
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2017-06-24 07:15:55 +0800
9fa4eb8e4 autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL

If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl,
autofs4_d_automount() will return

ERR_PTR(status)

with that status to follow_automount(), which will then dereference an
invalid pointer.

So treat a positive status the same as zero, and map to ENOENT.
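
Why a positive status is dangerous can be seen from how error pointers are encoded. The following is a Python model, assuming a 64-bit unsigned long and the kernel's MAX_ERRNO of 4095; err_ptr(), is_err() and sanitize_status() are illustrative stand-ins, not the kernel macros or the actual patch.

```python
MAX_ERRNO = 4095
ULONG_MASK = (1 << 64) - 1  # assumption: 64-bit unsigned long
ENOENT = 2


def err_ptr(error):
    """Model of ERR_PTR(): reinterpret a signed error code as a pointer-sized value."""
    return error & ULONG_MASK


def is_err(ptr):
    """Model of IS_ERR(): only the top MAX_ERRNO values count as errors."""
    return ptr >= ((-MAX_ERRNO) & ULONG_MASK)


def sanitize_status(status):
    """Map zero or positive status to -ENOENT, as the commit message describes."""
    return status if status < 0 else -ENOENT


print(is_err(err_ptr(-ENOENT)))             # True: recognised as an error
print(is_err(err_ptr(5)))                   # False: looks like a valid pointer (address 5)
print(is_err(err_ptr(sanitize_status(5))))  # True after sanitising
```

A caller that sees is_err() return False will treat the small positive value as a real pointer and dereference it, which is the failure described above.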

See comment in systemd src/core/automount.c::automount_send_ready().

Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.name
Signed-off-by: NeilBrown
Cc: Ian Kent
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2017-06-24 07:15:55 +0800
7b249bdc3 Merge tag 'xfs-4.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
"I have one more bugfix for you for 4.12-rc7 to fix a disk corruption
problem:

- don't allow swapon on files on the realtime device, because the
swap code will swap pages out to blocks on the data device, thereby
corrupting the filesystem"

* tag 'xfs-4.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't allow bmap on rt files

Linus Torvalds
2017-06-24 03:23:06 +0800

23 Jun, 2017

1 commit

a38371cba Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"Various small fixes for stable"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix some return values in case of error in 'crypt_message'
cifs: remove redundant return in cifs_creation_time_get
CIFS: Improve readdir verbosity
CIFS: check if pages is null rather than bv for a failed allocation
CIFS: Set ->should_dirty in cifs_user_readv()

Linus Torvalds
2017-06-23 02:16:55 +0800

22 Jun, 2017

2 commits

eb5e248d5 xfs: don't allow bmap on rt files

bmap returns a dumb LBA address but not the block device that goes with
that LBA. Swapfiles don't care about this and will blindly assume that
the data volume is the correct blockdev, which is totally bogus for
files on the rt subvolume. This results in the swap code doing IOs to
arbitrary locations on the data device(!) if the passed in mapping is a
realtime file, so just turn off bmap for rt files.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-06-22 11:27:35 +0800
021f60198 Merge branch 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull more ufs fixes from Al Viro:
"More UFS fixes, unfortunately including build regression fix for the
64-bit s_dsize commit. Fixed in this pile:

- trivial bug in signedness of 32bit timestamps on ufs1

- ESTALE instead of ufs_error() when doing open-by-fhandle on
something deleted

- build regression on 32bit in ufs_new_fragments() - calculating that
many percents of u64 pulls libgcc stuff on some of those. Mea
culpa.

- fix hysteresis loop broken by typo in 2.4.14.7 (right next to the
location of previous bug).

- fix the insane limits of said hysteresis loop on filesystems with
very low percentage of reserved blocks. If it's 5% or less, just
use the OPTSPACE policy.

- calculate those limits once, at mount time.

This tree does pass xfstests clean (both ufs1 and ufs2) and it _does_
survive cross-builds.

Again, my apologies for missing that, especially since I have noticed
a related percentage-of-64bit issue in earlier patches (when dealing
with amount of reserved blocks). Self-LART applied..."

* 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ufs: fix the logics for tail relocation
ufs_iget(): fail with -ESTALE on deleted inode
fix signedness of timestamps on ufs1

Linus Torvalds
2017-06-22 02:30:52 +0800

21 Jun, 2017

5 commits

517a6e43c CIFS: Fix some return values in case of error in 'crypt_message'

'rc' is known to be 0 at this point. So if 'init_sg' or 'kzalloc' fails, we
should return -ENOMEM instead.

Also remove a useless 'rc' in a debug message as it is meaningless here.

Fixes: 026e93dc0a3ee ("CIFS: Encrypt SMB3 requests before sending")
Signed-off-by: Christophe JAILLET
Reviewed-by: Pavel Shilovsky
Reviewed-by: Aurelien Aptel
Signed-off-by: Steve French
CC: Stable

Christophe Jaillet
2017-06-21 13:09:28 +0800
e125f5284 cifs: remove redundant return in cifs_creation_time_get

There is a redundant return in function cifs_creation_time_get
that appears to be old vestigial code that can be removed. So
remove it.

Detected by CoverityScan, CID#1361924 ("Structurally dead code")

Signed-off-by: Colin Ian King
Signed-off-by: Steve French

Colin Ian King
2017-06-21 08:14:40 +0800
dcd87838c CIFS: Improve readdir verbosity

Downgrade the loglevel for SMB2 to prevent filling the log
with messages if e.g. readdir was interrupted. Also make SMB2
and SMB1 codepaths do the same logging during readdir.

Signed-off-by: Pavel Shilovsky
Signed-off-by: Steve French
CC: Stable

Pavel Shilovsky
2017-06-21 08:13:47 +0800
ecf3411a1 CIFS: check if pages is null rather than bv for a failed allocation

pages is being allocated however a null check on bv is being used
to see if the allocation failed. Fix this by checking if pages is
null.

Detected by CoverityScan, CID#1432974 ("Logically dead code")

Fixes: ccf7f4088af2dd ("CIFS: Add asynchronous context to support kernel AIO")
Signed-off-by: Colin Ian King
Reviewed-by: Pavel Shilovsky
Signed-off-by: Steve French

Colin Ian King
2017-06-21 08:11:35 +0800
8a7b0d8e8 CIFS: Set ->should_dirty in cifs_user_readv()

The current code causes a static checker warning because ITER_IOVEC is
zero so the condition is never true.
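
The problem with testing a zero-valued flag can be shown in isolation. In this Python sketch only ITER_IOVEC = 0 comes from the commit message; the other constants are illustrative values, and is_iovec() is a hypothetical replacement check, not necessarily what the patch does.

```python
ITER_IOVEC = 0  # zero, as stated in the commit message
ITER_KVEC = 2   # illustrative values for the other iterator types
ITER_BVEC = 4
ITER_PIPE = 8


def buggy_is_iovec(iter_type):
    """Bitwise AND with zero is always zero, so this is never true."""
    return bool(iter_type & ITER_IOVEC)


def is_iovec(iter_type):
    """Hypothetical fix: an iovec iterator is one with no other type bit set."""
    return not (iter_type & (ITER_KVEC | ITER_BVEC | ITER_PIPE))


print(buggy_is_iovec(ITER_IOVEC))  # False, even for an iovec iterator
print(is_iovec(ITER_IOVEC))        # True
print(is_iovec(ITER_KVEC))         # False
```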

Fixes: 6685c5e2d1ac ("CIFS: Add asynchronous read support through kernel AIO")
Signed-off-by: Dan Carpenter
Signed-off-by: Steve French

Dan Carpenter
2017-06-21 06:57:27 +0800

19 Jun, 2017

1 commit

1be7107fb mm: larger stack guard gap, between vmas

Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs like gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunately.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and the vma tree's subtree_gap support for that.
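
The gap sizing and the "gap between vmas" idea can be sketched as follows. This is a Python model, not the kernel implementation: the default of 256 pages is derived from the 1MB figure on 4k pages, and the function names only mirror the ones mentioned above.

```python
PAGE_SHIFT = 12          # 4k pages
DEFAULT_GAP_PAGES = 256  # derived: 1MB / 4kB


def stack_guard_gap(pages=DEFAULT_GAP_PAGES, page_shift=PAGE_SHIFT):
    """Gap size in bytes; it is given in page units, so it scales with the page size."""
    return pages << page_shift


def vm_start_gap(vm_start, grows_down, gap):
    """Model: the lowest address other mappings must stay clear of."""
    if not grows_down:
        return vm_start
    # Clamp at zero instead of wrapping below address 0.
    return max(vm_start - gap, 0)


gap = stack_guard_gap()
print(gap)                                     # 1048576
print(hex(vm_start_gap(0x7FFD0000, True, gap)))   # 0x7fed0000
print(hex(vm_start_gap(0x7FFD0000, False, gap)))  # 0x7ffd0000
```

Because the gap is computed on demand rather than stored inside the stack vma, it is not counted against limits such as RLIMIT_AS.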

Original-patch-by: Oleg Nesterov
Original-patch-by: Michal Hocko
Signed-off-by: Hugh Dickins
Acked-by: Michal Hocko
Tested-by: Helge Deller # parisc
Signed-off-by: Linus Torvalds

Hugh Dickins
2017-06-19 21:50:20 +0800

18 Jun, 2017

4 commits

6e2035065 Merge tag 'ceph-for-4.12-rc6' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
"A fix for an old ceph ->fh_to_* bug from Luis and two timestamp fixups
from Zheng, prompted by the ongoing y2038 work"

* tag 'ceph-for-4.12-rc6' of git://github.com/ceph/ceph-client:
ceph: unify inode i_ctime update
ceph: use current_kernel_time() to get request time stamp
ceph: check i_nlink while converting a file handle to dentry

Linus Torvalds
2017-06-18 07:23:02 +0800
77e9ce327 ufs: fix the logics for tail relocation

* original hysteresis loop got broken by typo back in 2002; now
it never switches out of OPTTIME state. Fixed.
* critical levels for switching from OPTTIME to OPTSPACE and back
ought to be calculated once, at mount time.
* we should use mul_u64_u32_div() for those calculations, now that
->s_dsize is 64bit.
* to quote Kirk McKusick (in 1995 FreeBSD commit message):
The threshold for switching from time-space and space-time is too small
when minfree is 5%...so make it stay at space in this case.

Signed-off-by: Al Viro

Al Viro
2017-06-18 05:22:42 +0800
c0ef65d29 ufs_iget(): fail with -ESTALE on deleted inode

Signed-off-by: Al Viro

Al Viro
2017-06-18 00:25:58 +0800
23ac7cba7 fix signedness of timestamps on ufs1

Signed-off-by: Al Viro

Al Viro
2017-06-18 00:25:13 +0800