Eric Lee / smarc-fsl-linux-kernel

21 Feb, 2016

1 commit

0389075ec Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 fixes from Ingo Molnar:
"This is unusually large, partly due to the EFI fixes that prevent
accidental deletion of EFI variables through efivarfs that may brick
machines. These fixes are somewhat involved to maintain compatibility
with existing install methods and other usage modes, while trying to
turn off the 'rm -rf' bricking vector.

Other fixes are for large page ioremap()s and for non-temporal
user-memcpy()s"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix vmalloc_fault() to handle large pages properly
hpet: Drop stale URLs
x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
lib/ucs2_string: Correct ucs2 -> utf8 conversion
efi: Add pstore variables to the deletion whitelist
efi: Make efivarfs entries immutable by default
efi: Make our variable validation list include the guid
efi: Do variable name validation tests in utf8
efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
lib/ucs2_string: Add ucs2 -> utf8 helper functions

Linus Torvalds
2016-02-21 01:32:40 +0800

20 Feb, 2016

3 commits

020ecbba0 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 bugfixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes for v4.5"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix crashes in dioread_nolock mode
ext4: fix bh->b_state corruption
ext4: fix memleak in ext4_readdir()
ext4: remove unused parameter "newblock" in convert_initialized_extent()
ext4: don't read blocks from disk after extents being swapped
ext4: fix potential integer overflow
ext4: add a line break for proc mb_groups display
ext4: ioctl: fix erroneous return value
ext4: fix scheduling in atomic on group checksum failure
ext4 crypto: move context consistency check to ext4_file_open()
ext4 crypto: revalidate dentry after adding or removing the key

Linus Torvalds
2016-02-20 05:44:12 +0800
ce6b71432 Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs fix from Chris Mason:
"My for-linus-4.5 branch has a btrfs DIO error passing fix.

I know how much you love DIO, so I'm going to suggest against reading
it. We'll follow up with a patch to drop the error arg from
dio_end_io in the next merge window."

* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix direct IO requests not reporting IO error to user space

Linus Torvalds
2016-02-20 05:40:42 +0800
87d9ac712 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge fixes from Andrew Morton:
"10 fixes"

* emailed patches from Andrew Morton :
mm: slab: free kmem_cache_node after destroy sysfs file
ipc/shm: handle removed segments gracefully in shm_mmap()
MAINTAINERS: update Kselftest Framework mailing list
devm_memremap_release(): fix memremap'd addr handling
mm/hugetlb.c: fix incorrect proc nr_hugepages value
mm, x86: fix pte_page() crash in gup_pte_range()
fsnotify: turn fsnotify reaper thread into a workqueue job
Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
mm: fix regression in remap_file_pages() emulation
thp, dax: do not try to withdraw pgtable from non-anon VMA

Linus Torvalds
2016-02-20 05:36:00 +0800

19 Feb, 2016

4 commits

74dae4278 ext4: fix crashes in dioread_nolock mode ... Browse Code »

Competing overwrite DIO in dioread_nolock mode will just overwrite
pointer to io_end in the inode. This may result in data corruption or
extent conversion happening from IO completion interrupt because we
don't properly set buffer_defer_completion() when unlocked DIO races
with locked DIO to unwritten extent.

Since unlocked DIO doesn't need io_end for anything, just avoid
allocating it and corrupting pointer from inode for locked DIO.
A cleaner fix would be to avoid these games with io_end pointer from the
inode but that requires more intrusive changes so we leave that for
later.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-02-19 13:33:21 +0800
ed8ad8380 ext4: fix bh->b_state corruption ... Browse Code »

ext4 can update bh->b_state non-atomically in _ext4_get_block() and
ext4_da_get_block_prep(). Usually this is fine since bh is just a
temporary storage for mapping information on stack but in some cases it
can be fully living bh attached to a page. In such case non-atomic
update of bh->b_state can race with an atomic update which then gets
lost. Usually when we are mapping bh and thus updating bh->b_state
non-atomically, nobody else touches the bh and so things work out fine
but there is one case to especially worry about: ext4_finish_bio() uses
BH_Uptodate_Lock on the first bh in the page to synchronize handling of
PageWriteback state. So when blocksize < pagesize, we can be atomically
modifying bh->b_state of a buffer that actually isn't under IO and thus
can race e.g. with delalloc trying to map that buffer. The result is
that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.

Fix the problem by always updating bh->b_state bits atomically.

CC: stable@vger.kernel.org
Reported-by: Nikolay Borisov
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-02-19 13:18:25 +0800
0918f1c30 fsnotify: turn fsnotify reaper thread into a workqueue job ... Browse Code »

We don't require a dedicated thread for fsnotify cleanup. Switch it
over to a workqueue job instead that runs on the system_unbound_wq.

In the interest of not thrashing the queued job too often when there are
a lot of marks being removed, we delay the reaper job slightly when
queueing it, to allow several to gather on the list.

Signed-off-by: Jeff Layton
Tested-by: Eryu Guan
Reviewed-by: Jan Kara
Cc: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Layton
2016-02-19 08:23:24 +0800
13d34ac6e Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread" ... Browse Code »

This reverts commit c510eff6beba ("fsnotify: destroy marks with
call_srcu instead of dedicated thread").

Eryu reported that he was seeing some OOM kills kick in when running a
testcase that adds and removes inotify marks on a file in a tight loop.

The above commit changed the code to use call_srcu to clean up the
marks. While that does (in principle) work, the srcu callback job is
limited to cleaning up entries in small batches and only once per jiffy.
It's easily possible to overwhelm that machinery with too many call_srcu
callbacks, and Eryu's reproduer did just that.

There's also another potential problem with using call_srcu here. While
you can obviously sleep while holding the srcu_read_lock, the callbacks
run under local_bh_disable, so you can't sleep there.

It's possible when putting the last reference to the fsnotify_mark that
we'll end up putting a chain of references including the fsnotify_group,
uid, and associated keys. While I don't see any obvious ways that that
could occurs, it's probably still best to avoid using call_srcu here
after all.

This patch reverts the above patch. A later patch will take a different
approach to eliminated the dedicated thread here.

Signed-off-by: Jeff Layton
Reported-by: Eryu Guan
Tested-by: Eryu Guan
Cc: Jan Kara
Cc: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Layton
2016-02-19 08:23:24 +0800

18 Feb, 2016

1 commit

285071357 Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block fixes from Jens Axboe:
"A collection of fixes from the past few weeks that should go into 4.5.
This contains:

- Overflow fix for sysfs discard show function from Alan.

- A stacking limit init fix for max_dev_sectors, so we don't end up
artificially capping some use cases. From Keith.

- Have blk-mq proper end unstarted requests on a dying queue, instead
of pushing that to the driver. From Keith.

- NVMe:
- Update to Kconfig description for NVME_SCSI, since it was
vague and having it on is important for some SUSE distros.
From Christoph.
- Set of fixes from Keith, around surprise removal. Also kills
the no-merge flag, so it supports merging.

- Set of fixes for lightnvm from Matias, Javier, and Wenwei.

- Fix null_blk oops when asked for lightnvm, but not available. From
Matias.

- Copy-to-user EINTR fix from Hannes, fixing a case where SG_IO fails
if interrupted by a signal.

- Two floppy fixes from Jiri, fixing signal handling and blocking
open.

- A use-after-free fix for O_DIRECT, from Mike Krinkin.

- A block module ref count fix from Roman Pen.

- An fs IO wait accounting fix for O_DSYNC from Stephane Gasparini.

- Smaller reallo fix for xen-blkfront from Bob Liu.

- Removal of an unused struct member in the deadline IO scheduler,
from Tahsin.

- Also from Tahsin, properly initialize inode struct members
associated with cgroup writeback, if enabled.

- From Tejun, ensure that we keep the superblock pinned during cgroup
writeback"

* 'for-linus' of git://git.kernel.dk/linux-block: (25 commits)
blk: fix overflow in queue_discard_max_hw_show
writeback: initialize inode members that track writeback history
writeback: keep superblock pinned during cgroup writeback association switches
bio: return EINTR if copying to user space got interrupted
NVMe: Rate limit nvme IO warnings
NVMe: Poll device while still active during remove
NVMe: Requeue requests on suspended queues
NVMe: Allow request merges
NVMe: Fix io incapable return values
blk-mq: End unstarted requests on dying queue
block: Initialize max_dev_sectors to 0
null_blk: oops when initializing without lightnvm
block: fix module reference leak on put_disk() call for cgroups throttle
nvme: fix Kconfig description for BLK_DEV_NVME_SCSI
kernel/fs: fix I/O wait not accounted for RW O_DSYNC
floppy: refactor open() flags handling
lightnvm: allow to force mm initialization
lightnvm: check overflow and correct mlc pairs
lightnvm: fix request intersection locking in rrpc
lightnvm: warn if irqs are disabled in lock laddr
...

Linus Torvalds
2016-02-18 03:59:23 +0800

17 Feb, 2016

3 commits

3d65ae463 writeback: initialize inode members that track writeback history ... Browse Code »

inode struct members that track cgroup writeback information
should be reinitialized when inode gets allocated from
kmem_cache. Otherwise, their values remain and get used by the
new inode.

Signed-off-by: Tahsin Erdogan
Acked-by: Tejun Heo
Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching")
Signed-off-by: Jens Axboe

Tahsin Erdogan
2016-02-17 05:57:21 +0800
65c23c65b Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull cifs fixes from Steve French:
"A small set of cifs fixes.

I am still reviewing some more, recently submitted SMB3 fixes, but
these three are small and safe and ready now"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix erroneous return value
cifs: fix potential overflow in cifs_compose_mount_options
cifs: remove redundant check for null string pointer

Linus Torvalds
2016-02-17 02:52:59 +0800
5ff8eaac1 writeback: keep superblock pinned during cgroup writeback association switches ... Browse Code »

If cgroup writeback is in use, an inode is associated with a cgroup
for writeback. If the inode's main dirtier changes to another cgroup,
the association gets updated asynchronously. Nothing was pinning the
superblock while such switches are in progress and superblock could go
away while async switching is pending or in progress leading to
crashes like the following.

kernel BUG at fs/jbd2/transaction.c:319!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU: 1 PID: 29158 Comm: kworker/1:10 Not tainted 4.5.0-rc3 #51
Hardware name: Google Google, BIOS Google 01/01/2011
Workqueue: events inode_switch_wbs_work_fn
task: ffff880213dbbd40 ti: ffff880209264000 task.ti: ffff880209264000
RIP: 0010:[] [] start_this_handle+0x382/0x3e0
RSP: 0018:ffff880209267c30 EFLAGS: 00010202
...
Call Trace:
[] jbd2__journal_start+0xf4/0x190
[] __ext4_journal_start_sb+0x4e/0x70
[] ext4_evict_inode+0x12c/0x3d0
[] evict+0xbb/0x190
[] iput+0x130/0x190
[] inode_switch_wbs_work_fn+0x343/0x4c0
[] process_one_work+0x129/0x300
[] worker_thread+0x126/0x480
[] kthread+0xc4/0xe0
[] ret_from_fork+0x3f/0x70

Fix it by bumping s_active while cgroup association switching is in
flight.

Signed-off-by: Tejun Heo
Reported-and-tested-by: Tahsin Erdogan
Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching")
Cc: stable@vger.kernel.org #v4.5+
Signed-off-by: Jens Axboe

Tejun Heo
2016-02-17 02:34:07 +0800

16 Feb, 2016

3 commits

4682c211a Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent ... Browse Code »

Pull EFI fixes from Matt Fleming:

* Prevent accidental deletion of EFI variables through efivarfs that
may brick machines. We use a whitelist of known-safe variables to
allow things like installing distributions to work out of the box, and
instead restrict vendor-specific variable deletion by making
non-whitelist variables immutable (Peter Jones)

Signed-off-by: Ingo Molnar

Ingo Molnar
2016-02-16 20:14:57 +0800
c906f38e8 ext4: fix memleak in ext4_readdir() ... Browse Code »

When ext4_bread() fails, fname_crypto_str remains
allocated after return. Fix that.

Signed-off-by: Kirill Tkhai
Signed-off-by: Theodore Ts'o
CC: Dmitry Monakhov

Kirill Tkhai
2016-02-16 13:20:19 +0800
1636d1d77 Btrfs: fix direct IO requests not reporting IO error to user space ... Browse Code »

If a bio for a direct IO request fails, we were not setting the error in
the parent bio (the main DIO bio), making us not return the error to
user space in btrfs_direct_IO(), that is, it made __blockdev_direct_IO()
return the number of bytes issued for IO and not the error a bio created
and submitted by btrfs_submit_direct() got from the block layer.
This essentially happens because when we call:

dio_end_io(dio_bio, bio->bi_error);

It does not set dio_bio->bi_error to the value of the second argument.
So just add this missing assignment in endio callbacks, just as we do in
the error path at btrfs_submit_direct() when we fail to clone the dio bio
or allocate its private object. This follows the convention of what is
done with other similar APIs such as bio_endio() where the caller is
responsible for setting the bi_error field in the bio it passes as an
argument to bio_endio().

This was detected by the new generic test cases in xfstests: 271, 272,
276 and 278. Which essentially setup a dm error target, then load the
error table, do a direct IO write and unload the error table. They
expect the write to fail with -EIO, which was not getting reported
when testing against btrfs.

Cc: stable@vger.kernel.org # 4.3+
Fixes: 4246a0b63bd8 ("block: add a bi_error field to struct bio")
Signed-off-by: Filipe Manana

Filipe Manana
2016-02-16 11:41:26 +0800

15 Feb, 2016

1 commit

779ee19da Merge tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty ... Browse Code »

Pull tty/serial fixes from Greg KH:
"Here are a number of small tty and serial driver fixes for 4.5-rc4
that resolve some reported issues.

One of them got reverted as it wasn't correct based on testing, and
all have been in linux-next for a while"

* tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "8250: uniphier: allow modular build with 8250 console"
pty: make sure super_block is still valid in final /dev/tty close
pty: fix possible use after free of tty->driver_data
tty: Add support for PCIe WCH382 2S multi-IO card
serial/omap: mark wait_for_xmitr as __maybe_unused
serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485)
8250: uniphier: allow modular build with 8250 console
tty: Drop krefs for interrupted tty lock

Linus Torvalds
2016-02-15 04:29:59 +0800

13 Feb, 2016

2 commits

27c9d772e Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs fixes from Chris Mason:
"This has a few fixes from Filipe, along with a readdir fix from Dave
that we've been testing for some time"

* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: properly set the termination value of ctx->pos in readdir
Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
Btrfs: remove no longer used function extent_read_full_page_nolock()
Btrfs: fix page reading in extent_same ioctl leading to csum errors
Btrfs: fix invalid page accesses in extent_same (dedup) ioctl

Linus Torvalds
2016-02-13 01:21:28 +0800
dfc852864 Merge tag 'xfs-fixes-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs ... Browse Code »

Pull xfs fix from Dve Chinner:
"This contains a fix for an endian conversion issue in new CRC
validation in log recovery that was discovered on a ppc64 platform"

* tag 'xfs-fixes-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: fix endianness error when checking log block crc on big endian platforms

Linus Torvalds
2016-02-13 01:17:03 +0800

12 Feb, 2016

6 commits

56263b4ce ext4: remove unused parameter "newblock" in convert_initialized_extent() ... Browse Code »

The "newblock" parameter is not used in convert_initialized_extent(),
remove it.

Signed-off-by: Eryu Guan
Signed-off-by: Theodore Ts'o

Eryu Guan
2016-02-12 14:23:00 +0800
bcff24887 ext4: don't read blocks from disk after extents being swapped ... Browse Code »

I notice ext4/307 fails occasionally on ppc64 host, reporting md5
checksum mismatch after moving data from original file to donor file.

The reason is that move_extent_per_page() calls __block_write_begin()
and block_commit_write() to write saved data from original inode blocks
to donor inode blocks, but __block_write_begin() not only maps buffer
heads but also reads block content from disk if the size is not block
size aligned. At this time the physical block number in mapped buffer
head is pointing to the donor file not the original file, and that
results in reading wrong data to page, which get written to disk in
following block_commit_write call.

This also can be reproduced by the following script on 1k block size ext4
on x86_64 host:

mnt=/mnt/ext4
donorfile=$mnt/donor
testfile=$mnt/testfile
e4compact=~/xfstests/src/e4compact

rm -f $donorfile $testfile

# reserve space for donor file, written by 0xaa and sync to disk to
# avoid EBUSY on EXT4_IOC_MOVE_EXT
xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile

# create test file written by 0xbb
xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile

# compute initial md5sum
md5sum $testfile | tee md5sum.txt
# drop cache, force e4compact to read data from disk
echo 3 > /proc/sys/vm/drop_caches

# test defrag
echo "$testfile" | $e4compact -i -v -f $donorfile
# check md5sum
md5sum -c md5sum.txt

Fix it by creating & mapping buffer heads only but not reading blocks
from disk, because all the data in page is guaranteed to be up-to-date
in mext_page_mkuptodate().

Cc: stable@vger.kernel.org
Signed-off-by: Eryu Guan
Signed-off-by: Theodore Ts'o

Eryu Guan
2016-02-12 14:20:43 +0800
46901760b ext4: fix potential integer overflow ... Browse Code »

Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data),
integer overflow could be happened.
Therefore, need to fix integer overflow sanitization.

Cc: stable@vger.kernel.org
Signed-off-by: Insu Yun
Signed-off-by: Theodore Ts'o

Insu Yun
2016-02-12 14:15:59 +0800
802cf1f9f ext4: add a line break for proc mb_groups display ... Browse Code »

This patch adds a line break for proc mb_groups display.

Signed-off-by: Huaitong Han
Signed-off-by: Theodore Ts'o
Reviewed-by: Andreas Dilger

Huaitong Han
2016-02-12 13:17:16 +0800
fdde368e7 ext4: ioctl: fix erroneous return value ... Browse Code »

The ext4_ioctl_setflags() function which is used in the ioctls
EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR may return the positive value
EPERM instead of -EPERM in case of error. This bug was introduced by a
recent commit 9b7365fc.

The following program can be used to illustrate the wrong behavior:

#include
#include
#include
#include
#include

#define FS_IOC_GETFLAGS _IOR('f', 1, long)
#define FS_IOC_SETFLAGS _IOW('f', 2, long)
#define FS_IMMUTABLE_FL 0x00000010

int main(void)
{
int fd;
long flags;

fd = open("file", O_RDWR|O_CREAT, 0600);
if (fd < 0)
err(1, "open");

if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
err(1, "ioctl: FS_IOC_GETFLAGS");

flags |= FS_IMMUTABLE_FL;

if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
err(1, "ioctl: FS_IOC_SETFLAGS");

warnx("ioctl returned no error");

return 0;
}

Running it gives the following result:

$ strace -e ioctl ./test
ioctl(3, FS_IOC_GETFLAGS, 0x7ffdbd8bfd38) = 0
ioctl(3, FS_IOC_SETFLAGS, 0x7ffdbd8bfd38) = 1
test: ioctl returned no error
+++ exited with 0 +++

Running the program on a kernel with the bug fixed gives the proper result:

$ strace -e ioctl ./test
ioctl(3, FS_IOC_GETFLAGS, 0x7ffdd2768258) = 0
ioctl(3, FS_IOC_SETFLAGS, 0x7ffdd2768258) = -1 EPERM (Operation not permitted)
test: ioctl: FS_IOC_SETFLAGS: Operation not permitted
+++ exited with 1 +++

Signed-off-by: Anton Protopopov
Signed-off-by: Theodore Ts'o

Anton Protopopov
2016-02-12 12:57:21 +0800
05145bd79 ext4: fix scheduling in atomic on group checksum failure ... Browse Code »

When block group checksum is wrong, we call ext4_error() while holding
group spinlock from ext4_init_block_bitmap() or
ext4_init_inode_bitmap() which results in scheduling while in atomic.
Fix the issue by calling ext4_error() later after dropping the spinlock.

CC: stable@vger.kernel.org
Reported-by: Dmitry Vyukov
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o
Reviewed-by: Darrick J. Wong

Jan Kara
2016-02-12 12:15:12 +0800

11 Feb, 2016

5 commits

bc4ef7592 btrfs: properly set the termination value of ctx->pos in readdir ... Browse Code »

The value of ctx->pos in the last readdir call is supposed to be set to
INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a
larger value, then it's LLONG_MAX.

There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++"
overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a
64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before
the increment.

We can get to that situation like that:

* emit all regular readdir entries
* still in the same call to readdir, bump the last pos to INT_MAX
* next call to readdir will not emit any entries, but will reach the
bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX

Normally this is not a problem, but if we call readdir again, we'll find
'pos' set to LLONG_MAX and the unconditional increment will overflow.

The report from Victor at
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging
print shows that pattern:

Overflow: e
Overflow: 7fffffff
Overflow: 7fffffffffffffff
PAX: size overflow detected in function btrfs_real_readdir
fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0;
context: dir_context;
CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1
Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015
ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48
ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78
ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8
Call Trace:
[] dump_stack+0x4c/0x7f
[] report_size_overflow+0x36/0x40
[] btrfs_real_readdir+0x69c/0x6d0
[] iterate_dir+0xa8/0x150
[] ? __fget_light+0x2d/0x70
[] SyS_getdents+0xba/0x1c0
Overflow: 1a
[] ? iterate_dir+0x150/0x150
[] entry_SYSCALL_64_fastpath+0x12/0x83

The jump from 7fffffff to 7fffffffffffffff happens when new dir entries
are not yet synced and are processed from the delayed list. Then the code
could go to the bump section again even though it might not emit any new
dir entries from the delayed list.

The fix avoids entering the "bump" section again once we've finished
emitting the entries, both for synced and delayed entries.

References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
Reported-by: Victor
CC: stable@vger.kernel.org
Signed-off-by: David Sterba
Tested-by: Holger Hoffstätte
Signed-off-by: Chris Mason

David Sterba
2016-02-11 23:01:59 +0800
4b550af51 cifs: fix erroneous return value ... Browse Code »

The setup_ntlmv2_rsp() function may return positive value ENOMEM instead
of -ENOMEM in case of kmalloc failure.

Signed-off-by: Anton Protopopov
CC: Stable
Signed-off-by: Steve French

Anton Protopopov
2016-02-11 08:23:31 +0800
f34d69c3e cifs: fix potential overflow in cifs_compose_mount_options ... Browse Code »

In worst case, "ip=" + sb_mountdata + ipv6 can be copied into mountdata.
Therefore, for safe, it is better to add more size when allocating memory.

Signed-off-by: Insu Yun
Signed-off-by: Steve French

Insu Yun
2016-02-11 08:04:56 +0800
997152f62 cifs: remove redundant check for null string pointer ... Browse Code »

server_RFC1001_name is declared as a RFC1001_NAME_LEN_WITH_NULL sized
char array in struct TCP_Server_Info so the null pointer check on
server_RFC1001_name is redundant and can be removed. Detected with
smatch:

fs/cifs/connect.c:2982 ip_rfc1001_connect() warn: this array is probably
non-NULL. 'server->server_RFC1001_name'

Signed-off-by: Colin Ian King
Signed-off-by: Steve French

Colin Ian King
2016-02-11 08:04:53 +0800
ed8b0de5a efi: Make efivarfs entries immutable by default ... Browse Code »

"rm -rf" is bricking some peoples' laptops because of variables being
used to store non-reinitializable firmware driver data that's required
to POST the hardware.

These are 100% bugs, and they need to be fixed, but in the mean time it
shouldn't be easy to *accidentally* brick machines.

We have to have delete working, and picking which variables do and don't
work for deletion is quite intractable, so instead make everything
immutable by default (except for a whitelist), and make tools that
aren't quite so broad-spectrum unset the immutable flag.

Signed-off-by: Peter Jones
Tested-by: Lee, Chun-Yi
Acked-by: Matthew Garrett
Signed-off-by: Matt Fleming

Peter Jones
2016-02-11 00:25:52 +0800

10 Feb, 2016

1 commit

e0d64e6a8 efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version ... Browse Code »

Translate EFI's UCS-2 variable names to UTF-8 instead of just assuming
all variable names fit in ASCII.

Signed-off-by: Peter Jones
Acked-by: Matthew Garrett
Tested-by: Lee, Chun-Yi
Signed-off-by: Matt Fleming

Peter Jones
2016-02-10 21:19:14 +0800

08 Feb, 2016

3 commits

ff978b09f ext4 crypto: move context consistency check to ext4_file_open() ... Browse Code »

In the case where the per-file key for the directory is cached, but
root does not have access to the key needed to derive the per-file key
for the files in the directory, we allow the lookup to succeed, so
that lstat(2) and unlink(2) can suceed. However, if a program tries
to open the file, it will get an ENOKEY error.

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2016-02-08 13:54:26 +0800
28b4c2639 ext4 crypto: revalidate dentry after adding or removing the key ... Browse Code »

Add a validation check for dentries for encrypted directory to make
sure we're not caching stale data after a key has been added or removed.

Also check to make sure that status of the encryption key is updated
when readdir(2) is executed.

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2016-02-08 08:35:05 +0800
8e0bd4925 xfs: fix endianness error when checking log block crc on big endian platforms ... Browse Code »

Since the checksum function and the field are both __le32, don't
perform endian conversion when comparing the two. This fixes mount
failures on ppc64.

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Darrick J. Wong
2016-02-08 08:03:58 +0800

07 Feb, 2016

1 commit

1f55c718c pty: make sure super_block is still valid in final /dev/tty close ... Browse Code »

Considering current pty code and multiple devpts instances, it's possible
to umount a devpts file system while a program still has /dev/tty opened
pointing to a previosuly closed pty pair in that instance. In the case all
ptmx and pts/N files are closed, umount can be done. If the program closes
/dev/tty after umount is done, devpts_kill_index will use now an invalid
super_block, which was already destroyed in the umount operation after
running ->kill_sb. This is another "use after free" type of issue, but now
related to the allocated super_block instance.

To avoid the problem (warning at ida_remove and potential crashes) for
this specific case, I added two functions in devpts which grabs additional
references to the super_block, which pty code now uses so it makes sure
the super block structure is still valid until pty shutdown is done.
I also moved the additional inode references to the same functions, which
also covered similar case with inode being freed before /dev/tty final
close/shutdown.

Signed-off-by: Herton R. Krzesinski
Cc: stable@vger.kernel.org # 2.6.29+
Reviewed-by: Peter Hurley
Signed-off-by: Greg Kroah-Hartman

Herton R. Krzesinski
2016-02-07 15:45:46 +0800

06 Feb, 2016

6 commits

5af9c2e19 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge fixes from Andrew Morton:
"22 fixes"

* emailed patches from Andrew Morton : (22 commits)
epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT
radix-tree: fix oops after radix_tree_iter_retry
MAINTAINERS: trim the file triggers for ABI/API
dax: dirty inode only if required
thp: make deferred_split_scan() work again
mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
um: asm/page.h: remove the pte_high member from struct pte_t
mm, hugetlb: don't require CMA for runtime gigantic pages
mm/hugetlb: fix gigantic page initialization/allocation
mm: downgrade VM_BUG in isolate_lru_page() to warning
mempolicy: do not try to queue pages from !vma_migratable()
mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
vmstat: make vmstat_update deferrable
mm, vmstat: make quiet_vmstat lighter
mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT
memblock: don't mark memblock_phys_mem_size() as __init
dump_stack: avoid potential deadlocks
mm: validate_mm browse_rb SMP race condition
m32r: fix build failure due to SMP and MMU
...

Linus Torvalds
2016-02-06 12:20:07 +0800
5d6a6a75e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client ... Browse Code »

Pull Ceph fixes from Sage Weil:
"We have a few wire protocol compatibility fixes, ports of a few recent
CRUSH mapping changes, and a couple error path fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: MOSDOpReply v7 encoding
libceph: advertise support for TUNABLES5
crush: decode and initialize chooseleaf_stable
crush: add chooseleaf_stable tunable
crush: ensure take bucket value is valid
crush: ensure bucket id is valid before indexing buckets array
ceph: fix snap context leak in error path
ceph: checking for IS_ERR instead of NULL

Linus Torvalds
2016-02-06 11:52:57 +0800
b6a515c8a epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT ... Browse Code »

In the current implementation of the EPOLLEXCLUSIVE flag (added for
4.5-rc1), if epoll waiters create different POLL* sets and register them
as exclusive against the same target fd, the current implementation will
stop waking any further waiters once it finds the first idle waiter.
This means that waiters could miss wakeups in certain cases.

For example, when we wake up a pipe for reading we do:
wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM); So if
one epoll set or epfd is added to pipe p with POLLIN and a second set
epfd2 is added to pipe p with POLLRDNORM, only epfd may receive the
wakeup since the current implementation will stop after it finds any
intersection of events with a waiter that is blocked in epoll_wait().

We could potentially address this by requiring all epoll waiters that
are added to p be required to pass the same set of POLL* events. IE the
first EPOLL_CTL_ADD that passes EPOLLEXCLUSIVE establishes the set POLL*
flags to be used by any other epfds that are added as EPOLLEXCLUSIVE.
However, I think it might be somewhat confusing interface as we would
have to reference count the number of users for that set, and so
userspace would have to keep track of that count, or we would need a
more involved interface. It also adds some shared state that we'd have
store somewhere. I don't think anybody will want to bloat
__wait_queue_head for this.

I think what we could do instead, is to simply restrict EPOLLEXCLUSIVE
such that it can only be specified with EPOLLIN and/or EPOLLOUT. So
that way if the wakeup includes 'POLLIN' and not 'POLLOUT', we can stop
once we hit the first idle waiter that specifies the EPOLLIN bit, since
any remaining waiters that only have 'POLLOUT' set wouldn't need to be
woken. Likewise, we can do the same thing if 'POLLOUT' is in the wakeup
bit set and not 'POLLIN'. If both 'POLLOUT' and 'POLLIN' are set in the
wake bit set (there is at least one example of this I saw in fs/pipe.c),
then we just wake the entire exclusive list. Having both 'POLLOUT' and
'POLLIN' both set should not be on any performance critical path, so I
think that's ok (in fs/pipe.c its in pipe_release()). We also continue
to include EPOLLERR and EPOLLHUP by default in any exclusive set. Thus,
the user can specify EPOLLERR and/or EPOLLHUP but is not required to do
so.

Since epoll waiters may be interested in other events as well besides
EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP, these can still be added by
doing a 'dup' call on the target fd and adding that as one normally
would with EPOLL_CTL_ADD. Since I think that the POLLIN and POLLOUT
events are what we are interest in balancing, I think that the 'dup'
thing could perhaps be added to only one of the waiter threads.
However, I think that EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP should be
sufficient for the majority of use-cases.

Since EPOLLEXCLUSIVE is intended to be used with a target fd shared
among multiple epfds, where between 1 and n of the epfds may receive an
event, it does not satisfy the semantics of EPOLLONESHOT where only 1
epfd would get an event. Thus, it is not allowed to be specified in
conjunction with EPOLLEXCLUSIVE.

EPOLL_CTL_MOD is also not allowed if the fd was previously added as
EPOLLEXCLUSIVE. It seems with the limited number of flags to not be as
interesting, but this could be relaxed at some further point.

Signed-off-by: Jason Baron
Tested-by: Madars Vitolins
Cc: Michael Kerrisk
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Al Viro
Cc: Eric Wong
Cc: Jonathan Corbet
Cc: Andy Lutomirski
Cc: Hagen Paul Pfeifer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jason Baron
2016-02-06 10:10:40 +0800
d2b2a28e6 dax: dirty inode only if required ... Browse Code »

Signed-off-by: Dmitry Monakhov
Reviewed-by: Jan Kara
Reviewed-by: Ross Zwisler
Cc: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dmitry Monakhov
2016-02-06 10:10:40 +0800
c95a51807 ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup ... Browse Code »

When recovery master down, dlm_do_local_recovery_cleanup() only remove
the $RECOVERY lock owned by dead node, but do not clear the refmap bit.
Which will make umount thread falling in dead loop migrating $RECOVERY
to the dead node.

Signed-off-by: xuejiufei
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

xuejiufei
2016-02-06 10:10:40 +0800
9c5a05bc3 block: fix pfn_mkwrite() DAX fault handler ... Browse Code »

Previously the pfn_mkwrite() fault handler for raw block devices called
bldev_dax_fault() -> __dax_fault() to do a full DAX page fault.

Really what the pfn_mkwrite() fault handler needs to do is call
dax_pfn_mkwrite() to make sure that the radix tree entry for the given
PTE is marked as dirty so that a follow-up fsync or msync call will
flush it durably to media.

Fixes: 5a023cdba50c ("block: enable dax for raw block devices")
Signed-off-by: Ross Zwisler
Cc: Alexander Viro
Cc: Dan Williams
Cc: Dave Chinner
Reviewed-by: Jan Kara
Cc: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ross Zwisler
2016-02-06 10:10:40 +0800