Eric Lee / smarc-fsl-linux-kernel

03 May, 2016

1 commit

33656a1f2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull UDF fix from Jan Kara:
"A fix of a regression in UDF that got introduced in 4.6-rc1 by one of
the charset encoding fixes"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix conversion of 'dstring' fields to UTF8

Linus Torvalds
2016-05-03 00:59:57 +0800

30 Apr, 2016

1 commit

1d003af2e Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge fixes from Andrew Morton:
"20 fixes"

* emailed patches from Andrew Morton :
Documentation/sysctl/vm.txt: update numa_zonelist_order description
lib/stackdepot.c: allow the stack trace hash to be zero
rapidio: fix potential NULL pointer dereference
mm/memory-failure: fix race with compound page split/merge
ocfs2/dlm: return zero if deref_done message is successfully handled
Ananth has moved
kcov: don't profile branches in kcov
kcov: don't trace the code coverage code
mm: wake kcompactd before kswapd's short sleep
.mailmap: add Frank Rowand
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: call swap_slot_free_notify() with page lock held
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc//numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
mailmap: fix Krzysztof Kozlowski's misspelled name
thp: keep huge zero page pinned until tlb flush
mm: exclude HugeTLB pages from THP page_mapped() logic
kexec: export OFFSET(page.compound_head) to find out compound tail page
kexec: update VMCOREINFO for compound_order/dtor

Linus Torvalds
2016-04-30 02:21:22 +0800

29 Apr, 2016

3 commits

b73413647 ocfs2/dlm: return zero if deref_done message is successfully handled ... Browse Code »

dlm_deref_lockres_done_handler() should return zero if the message is
successfully handled.

Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message").
Signed-off-by: xuejiufei
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

xuejiufei
2016-04-29 10:34:04 +0800
28093f9f3 numa: fix /proc/<pid>/numa_maps for THP ... Browse Code »

In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture. On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.

On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available. In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped. On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.

This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.

Signed-off-by: Gerald Schaefer
Cc: Naoya Horiguchi
Cc: "Kirill A . Shutemov"
Cc: Konstantin Khlebnikov
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Jerome Marchand
Cc: Johannes Weiner
Cc: Dave Hansen
Cc: Mel Gorman
Cc: Dan Williams
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Michael Holzheu
Cc: [4.3+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gerald Schaefer
2016-04-29 10:34:04 +0800
6fa9bffbc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client ... Browse Code »

Pull Ceph fixes from Sage Weil:
"There is a lifecycle fix in the auth code, a fix for a narrow race
condition on map, and a helpful message in the log when there is a
feature mismatch (which happens frequently now that the default
server-side options have changed)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: report unsupported features to syslog
rbd: fix rbd map vs notify races
libceph: make authorizer destruction independent of ceph_auth_client

Linus Torvalds
2016-04-29 09:59:24 +0800

27 Apr, 2016

1 commit

8ead9dd54 devpts: more pty driver interface cleanups ... Browse Code »

This is more prep-work for the upcoming pty changes. Still just code
cleanup with no actual semantic changes.

This removes a bunch pointless complexity by just having the slave pty
side remember the dentry associated with the devpts slave rather than
the inode. That allows us to remove all the "look up the dentry" code
for when we want to remove it again.

Together with moving the tty pointer from "inode->i_private" to
"dentry->d_fsdata" and getting rid of pointless inode locking, this
removes about 30 lines of code. Not only is the end result smaller,
it's simpler and easier to understand.

The old code, for example, depended on the d_find_alias() to not just
find the dentry, but also to check that it is still hashed, which in
turn validated the tty pointer in the inode.

That is a _very_ roundabout way to say "invalidate the cached tty
pointer when the dentry is removed".

The new code just does

dentry->d_fsdata = NULL;

in devpts_pty_kill() instead, invalidating the tty pointer rather more
directly and obviously. Don't do something complex and subtle when the
obvious straightforward approach will do.

The rest of the patch (ie apart from code deletion and the above tty
pointer clearing) is just switching the calling convention to pass the
dentry or file pointer around instead of the inode.

Cc: Eric Biederman
Cc: Peter Anvin
Cc: Andy Lutomirski
Cc: Al Viro
Cc: Peter Hurley
Cc: Serge Hallyn
Cc: Willy Tarreau
Cc: Aurelien Jarno
Cc: Alan Cox
Cc: Jann Horn
Cc: Greg KH
Cc: Jiri Slaby
Cc: Florian Weimer
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-04-27 06:47:32 +0800

26 Apr, 2016

1 commit

6c1ea260f libceph: make authorizer destruction independent of ceph_auth_client ... Browse Code »

Starting the kernel client with cephx disabled and then enabling cephx
and restarting userspace daemons can result in a crash:

[262671.478162] BUG: unable to handle kernel paging request at ffffebe000000000
[262671.531460] IP: [] kfree+0x5a/0x130
[262671.584334] PGD 0
[262671.635847] Oops: 0000 [#1] SMP
[262672.055841] CPU: 22 PID: 2961272 Comm: kworker/22:2 Not tainted 4.2.0-34-generic #39~14.04.1-Ubuntu
[262672.162338] Hardware name: Dell Inc. PowerEdge R720/068CDY, BIOS 2.4.3 07/09/2014
[262672.268937] Workqueue: ceph-msgr con_work [libceph]
[262672.322290] task: ffff88081c2d0dc0 ti: ffff880149ae8000 task.ti: ffff880149ae8000
[262672.428330] RIP: 0010:[] [] kfree+0x5a/0x130
[262672.535880] RSP: 0018:ffff880149aeba58 EFLAGS: 00010286
[262672.589486] RAX: 000001e000000000 RBX: 0000000000000012 RCX: ffff8807e7461018
[262672.695980] RDX: 000077ff80000000 RSI: ffff88081af2be04 RDI: 0000000000000012
[262672.803668] RBP: ffff880149aeba78 R08: 0000000000000000 R09: 0000000000000000
[262672.912299] R10: ffffebe000000000 R11: ffff880819a60e78 R12: ffff8800aec8df40
[262673.021769] R13: ffffffffc035f70f R14: ffff8807e5b138e0 R15: ffff880da9785840
[262673.131722] FS: 0000000000000000(0000) GS:ffff88081fac0000(0000) knlGS:0000000000000000
[262673.245377] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[262673.303281] CR2: ffffebe000000000 CR3: 0000000001c0d000 CR4: 00000000001406e0
[262673.417556] Stack:
[262673.472943] ffff880149aeba88 ffff88081af2be04 ffff8800aec8df40 ffff88081af2be04
[262673.583767] ffff880149aeba98 ffffffffc035f70f ffff880149aebac8 ffff8800aec8df00
[262673.694546] ffff880149aebac8 ffffffffc035c89e ffff8807e5b138e0 ffff8805b047f800
[262673.805230] Call Trace:
[262673.859116] [] ceph_x_destroy_authorizer+0x1f/0x50 [libceph]
[262673.968705] [] ceph_auth_destroy_authorizer+0x3e/0x60 [libceph]
[262674.078852] [] put_osd+0x45/0x80 [libceph]
[262674.134249] [] remove_osd+0xae/0x140 [libceph]
[262674.189124] [] __reset_osd+0x103/0x150 [libceph]
[262674.243749] [] kick_requests+0x223/0x460 [libceph]
[262674.297485] [] ceph_osdc_handle_map+0x282/0x5e0 [libceph]
[262674.350813] [] dispatch+0x4e/0x720 [libceph]
[262674.403312] [] try_read+0x3d1/0x1090 [libceph]
[262674.454712] [] ? dequeue_entity+0x152/0x690
[262674.505096] [] con_work+0xcb/0x1300 [libceph]
[262674.555104] [] process_one_work+0x14e/0x3d0
[262674.604072] [] worker_thread+0x11a/0x470
[262674.652187] [] ? rescuer_thread+0x310/0x310
[262674.699022] [] kthread+0xd2/0xf0
[262674.744494] [] ? kthread_create_on_node+0x1c0/0x1c0
[262674.789543] [] ret_from_fork+0x3f/0x70
[262674.834094] [] ? kthread_create_on_node+0x1c0/0x1c0

What happens is the following:

(1) new MON session is established
(2) old "none" ac is destroyed
(3) new "cephx" ac is constructed
...
(4) old OSD session (w/ "none" authorizer) is put
ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer)

osd->o_auth.authorizer in the "none" case is just a bare pointer into
ac, which contains a single static copy for all services. By the time
we get to (4), "none" ac, freed in (2), is long gone. On top of that,
a new vtable installed in (3) points us at ceph_x_destroy_authorizer(),
so we end up trying to destroy a "none" authorizer with a "cephx"
destructor operating on invalid memory!

To fix this, decouple authorizer destruction from ac and do away with
a single static "none" authorizer by making a copy for each OSD or MDS
session. Authorizers themselves are independent of ac and so there is
no reason for destroy_authorizer() to be an ac op. Make it an op on
the authorizer itself by turning ceph_authorizer into a real struct.

Fixes: http://tracker.ceph.com/issues/15447

Reported-by: Alan Zhang
Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2016-04-26 02:54:13 +0800

25 Apr, 2016

1 commit

c26f6c615 udf: Fix conversion of 'dstring' fields to UTF8 ... Browse Code »

Commit 9293fcfbc1812a22ad5ce1b542eb90c1bbe01be1
("udf: Remove struct ustr as non-needed intermediate storage"),
while getting rid of 'struct ustr', does not take any special care
of 'dstring' fields and effectively use fixed field length instead
of actual string length, encoded in the last byte of the field.

Also, commit 484a10f49387e4386bf2708532e75bf78ffea2cb
("udf: Merge linux specific translation into CS0 conversion function")
introduced checking of the length of the string being converted,
requiring proper alignment to number of bytes constituing each
character.

The UDF volume identifier is represented as a 32-bytes 'dstring',
and needs to be converted from CS0 to UTF8, while mounting UDF
filesystem. The changes in mentioned commits can in some cases
lead to incorrect handling of volume identifier:
- if the actual string in 'dstring' is of maximal length and
does not have zero bytes separating it from dstring encoded
length in last byte, that last byte may be included in conversion,
thus making incorrect resulting string;
- if the identifier is encoded with 2-bytes characters (compression
code is 16), the length of 31 bytes (32 bytes of field length minus
1 byte of compression code), taken as the string length, is reported
as an incorrect (unaligned) length, and the conversion fails, which
in its turn leads to volume mounting failure.

This patch introduces handling of 'dstring' encoded length field
in udf_CS0toUTF8 function, that is used in all and only cases
when 'dstring' fields are converted. Currently these cases are
processing of Volume Identifier and Volume Set Identifier fields.
The function is also renamed to udf_dstrCS0toUTF8 to distinctly
indicate that it handles 'dstring' input.

Signed-off-by: Andrew Gabbasov
Signed-off-by: Jan Kara

Andrew Gabbasov
2016-04-25 21:18:50 +0800

20 Apr, 2016

1 commit

9a0e3eea2 Merge branch 'ptmx-cleanup' ... Browse Code »

Merge the ptmx internal interface cleanup branch.

This doesn't change semantics, but it should be a sane basis for
eventually getting the multi-instance devpts code into some sane shape
where we can get rid of the kernel config option. Which we can
hopefully get done next merge window..

* ptmx-cleanup:
devpts: clean up interface to pty drivers

Linus Torvalds
2016-04-20 07:36:18 +0800

19 Apr, 2016

1 commit

67245ff33 devpts: clean up interface to pty drivers ... Browse Code »

This gets rid of the horrible notion of having that

struct inode *ptmx_inode

be the linchpin of the interface between the pty code and devpts.

By de-emphasizing the ptmx inode, a lot of things actually get cleaner,
and we will have a much saner way forward. In particular, this will
allow us to associate with any particular devpts instance at open-time,
and not be artificially tied to one particular ptmx inode.

The patch itself is actually fairly straightforward, and apart from some
locking and return path cleanups it's pretty mechanical:

- the interfaces that devpts exposes all take "struct pts_fs_info *"
instead of "struct inode *ptmx_inode" now.

NOTE! The "struct pts_fs_info" thing is a completely opaque structure
as far as the pty driver is concerned: it's still declared entirely
internally to devpts. So the pty code can't actually access it in any
way, just pass it as a "cookie" to the devpts code.

- the "look up the pts fs info" is now a single clear operation, that
also does the reference count increment on the pts superblock.

So "devpts_add/del_ref()" is gone, and replaced by a "lookup and get
ref" operation (devpts_get_ref(inode)), along with a "put ref" op
(devpts_put_ref()).

- the pty master "tty->driver_data" field now contains the pts_fs_info,
not the ptmx inode.

- because we don't care about the ptmx inode any more as some kind of
base index, the ref counting can now drop the inode games - it just
gets the ref on the superblock.

- the pts_fs_info now has a back-pointer to the super_block. That's so
that we can easily look up the information we actually need. Although
quite often, the pts fs info was actually all we wanted, and not having
to look it up based on some magical inode makes things more
straightforward.

In particular, now that "devpts_get_ref(inode)" operation should really
be the *only* place we need to look up what devpts instance we're
associated with, and we do it exactly once, at ptmx_open() time.

The other side of this is that one ptmx node could now be associated
with multiple different devpts instances - you could have a single
/dev/ptmx node, and then have multiple mount namespaces with their own
instances of devpts mounted on /dev/pts/. And that's all perfectly sane
in a model where we just look up the pts instance at open time.

This will eventually allow us to get rid of our odd single-vs-multiple
pts instance model, but this patch in itself changes no semantics, only
an internal binding model.

Cc: Eric Biederman
Cc: Peter Anvin
Cc: Andy Lutomirski
Cc: Al Viro
Cc: Peter Hurley
Cc: Serge Hallyn
Cc: Willy Tarreau
Cc: Aurelien Jarno
Cc: Alan Cox
Cc: Jann Horn
Cc: Greg KH
Cc: Jiri Slaby
Cc: Florian Weimer
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-04-19 04:43:02 +0800

17 Apr, 2016

1 commit

e1e22b27e Merge tag 'driver-core-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull misc fixes from Greg KH:
"Here are three small fixes for 4.6-rc4.

Two fix up some lz4 issues with big endian systems, and the remaining
one resolves a minor debugfs issue that was reported.

All have been in linux-next with no reported issues"

* tag 'driver-core-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
lib: lz4: cleanup unaligned access efficiency detection
lib: lz4: fixed zram with lz4 on big endian machines
debugfs: Make automount point inodes permanently empty

Linus Torvalds
2016-04-17 11:53:50 +0800

15 Apr, 2016

2 commits

dfe70581c Merge tag 'for-linus-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs ... Browse Code »

Pull f2fs/fscrypto fixes from Jaegeuk Kim:
"In addition to f2fs/fscrypto fixes, I've added one patch which
prevents RCU mode lookup in d_revalidate, as Al mentioned.

These patches fix f2fs and fscrypto based on -rc3 bug fixes in ext4
crypto, which have not yet been fully propagated as follows.

- use of dget_parent and file_dentry to avoid crashes
- disallow RCU-mode lookup in d_invalidate
- disallow -ENOMEM in the core data encryption path"

* tag 'for-linus-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
ext4/fscrypto: avoid RCU lookup in d_revalidate
fscrypto: don't let data integrity writebacks fail with ENOMEM
f2fs: use dget_parent and file_dentry in f2fs_file_open
fscrypto: use dget_parent() in fscrypt_d_revalidate()

Linus Torvalds
2016-04-15 09:22:42 +0800
34dbbcdbf Make file credentials available to the seqfile interfaces ... Browse Code »

A lot of seqfile users seem to be using things like %pK that uses the
credentials of the current process, but that is actually completely
wrong for filesystem interfaces.

The unix semantics for permission checking files is to check permissions
at _open_ time, not at read or write time, and that is not just a small
detail: passing off stdin/stdout/stderr to a suid application and making
the actual IO happen in privileged context is a classic exploit
technique.

So if we want to be able to look at permissions at read time, we need to
use the file open credentials, not the current ones. Normal file
accesses can just use "f_cred" (or any of the helper functions that do
that, like file_ns_capable()), but the seqfile interfaces do not have
any such options.

It turns out that seq_file _does_ save away the user_ns information of
the file, though. Since user_ns is just part of the full credential
information, replace that special case with saving off the cred pointer
instead, and suddenly seq_file has all the permission information it
needs.

Signed-off-by: Linus Torvalds

Linus Torvalds
2016-04-15 03:56:09 +0800

13 Apr, 2016

5 commits

03a8bb0e5 ext4/fscrypto: avoid RCU lookup in d_revalidate ... Browse Code »

As Al pointed, d_revalidate should return RCU lookup before using d_inode.
This was originally introduced by:
commit 34286d666230 ("fs: rcu-walk aware d_revalidate method").

Reported-by: Al Viro
Signed-off-by: Jaegeuk Kim
Cc: Theodore Ts'o
Cc: stable

Jaegeuk Kim
2016-04-13 11:01:35 +0800
87243deb8 debugfs: Make automount point inodes permanently empty ... Browse Code »

Starting with 4.1 the tracing subsystem has its own filesystem
which is automounted in the tracing subdirectory of debugfs.
Prior to this debugfs could be bind mounted in a cloned mount
namespace, but if tracefs has been mounted under debugfs this
now fails because there is a locked child mount. This creates
a regression for container software which bind mounts debugfs
to satisfy the assumption of some userspace software.

In other pseudo filesystems such as proc and sysfs we're already
creating mountpoints like this in such a way that no dirents can
be created in the directories, allowing them to be exceptions to
some MNT_LOCKED tests. In fact we're already do this for the
tracefs mountpoint in sysfs.

Do the same in debugfs_create_automount(), since the intention
here is clearly to create a mountpoint. This fixes the regression,
as locked child mounts on permanently empty directories do not
cause a bind mount to fail.

Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Seth Forshee
Acked-by: Serge Hallyn
Signed-off-by: Greg Kroah-Hartman

Seth Forshee
2016-04-13 06:01:53 +0800
b32e4482a fscrypto: don't let data integrity writebacks fail with ENOMEM ... Browse Code »

This patch fixes the issue introduced by the ext4 crypto fix in a same manner.
For F2FS, however, we flush the pending IOs and wait for a while to acquire free
memory.

Fixes: c9af28fdd4492 ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM")
Cc: Theodore Ts'o
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-04-13 01:25:30 +0800
33b139512 f2fs: use dget_parent and file_dentry in f2fs_file_open ... Browse Code »

This patch synced with the below two ext4 crypto fixes together.

In 4.6-rc1, f2fs newly introduced accessing f_path.dentry which crashes
overlayfs. To fix, now we need to use file_dentry() to access that field.

Fixes: c0a37d487884 ("ext4: use file_dentry()")
Fixes: 9dd78d8c9a7b ("ext4: use dget_parent() in ext4_file_open()")
Cc: Miklos Szeredi
Cc: Theodore Ts'o
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-04-13 01:24:22 +0800
d7d753528 fscrypto: use dget_parent() in fscrypt_d_revalidate() ... Browse Code »

This patch updates fscrypto along with the below ext4 crypto change.

Fixes: 3d43bcfef5f0 ("ext4 crypto: use dget_parent() in ext4_d_revalidate()")
Cc: Theodore Ts'o
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-04-13 01:24:04 +0800

11 Apr, 2016

1 commit

9f2394c9b Revert "ext4: allow readdir()'s of large empty directories to be interrupted" ... Browse Code »

This reverts commit 1028b55bafb7611dda1d8fed2aeca16a436b7dff.

It's broken: it makes ext4 return an error at an invalid point, causing
the readdir wrappers to write the the position of the last successful
directory entry into the position field, which means that the next
readdir will now return that last successful entry _again_.

You can only return fatal errors (that terminate the readdir directory
walk) from within the filesystem readdir functions, the "normal" errors
(that happen when the readdir buffer fills up, for example) happen in
the iterorator where we know the position of the actual failing entry.

I do have a very different patch that does the "signal_pending()"
handling inside the iterator function where it is allowable, but while
that one passes all the sanity checks, I screwed up something like four
times while emailing it out, so I'm not going to commit it today.

So my track record is not good enough, and the stars will have to align
better before that one gets committed. And it would be good to get some
review too, of course, since celestial alignments are always an iffy
debugging model.

IOW, let's just revert the commit that caused the problem for now.

Reported-by: Greg Thelen
Cc: Theodore Ts'o
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-04-11 07:52:24 +0800

10 Apr, 2016

2 commits

839a3f765 Merge branch 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs fixes from Chris Mason:
"These are bug fixes, including a really old fsync bug, and a few trace
points to help us track down problems in the quota code"

* 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix file/data loss caused by fsync after rename and new inode
btrfs: Reset IO error counters before start of device replacing
btrfs: Add qgroup tracing
Btrfs: don't use src fd for printk
btrfs: fallback to vmalloc in btrfs_compare_tree
btrfs: handle non-fatal errors in btrfs_qgroup_inherit()
btrfs: Output more info for enospc_debug mount option
Btrfs: fix invalid reference in replace_path
Btrfs: Improve FL_KEEP_SIZE handling in fallocate

Linus Torvalds
2016-04-10 01:41:34 +0800
675921264 Merge tag 'for-linus-4.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux ... Browse Code »

Pull orangefs fixes from Mike Marshall:
"Orangefs cleanups and a strncpy vulnerability fix.

Cleanups:
- remove an unused variable from orangefs_readdir.
- clean up printk wrapper used for ofs "gossip" debugging.
- clean up truncate ctime and mtime setting in inode.c
- remove a useless null check found by coccinelle.
- optimize some memcpy/memset boilerplate code.
- remove some useless sanity checks from xattr.c

Fix:
- fix a potential strncpy vulnerability"

* tag 'for-linus-4.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: remove unused variable
orangefs: Add KERN_ to gossip_ macros
orangefs: strncpy -> strscpy
orangefs: clean up truncate ctime and mtime setting
Orangefs: fix ifnullfree.cocci warnings
Orangefs: optimize boilerplate code.
Orangefs: xattr.c cleanup

Linus Torvalds
2016-04-10 01:33:58 +0800

09 Apr, 2016

7 commits

e56f49814 orangefs: remove unused variable ... Browse Code »

Signed-off-by: Martin Brandenburg
Signed-off-by: Mike Marshall

Martin Brandenburg
2016-04-09 03:50:44 +0800
1917a6932 orangefs: Add KERN_<LEVEL> to gossip_<level> macros ... Browse Code »

Emit the logging messages at the appropriate levels.

Miscellanea:

o Change format to fmt
o Use the more common ##__VA_ARGS__

Signed-off-by: Joe Perches
Signed-off-by: Mike Marshall

Joe Perches
2016-04-09 02:10:45 +0800
2eacea74c orangefs: strncpy -> strscpy ... Browse Code »

It would have been possible for a rogue client-core to send in a symlink
target which is not NUL terminated. This returns EIO if the client-core
gives us corrupt data.

Leave debugfs and superblock code as is for now.

Other dcache.c and namei.c strncpy instances are safe because
ORANGEFS_NAME_MAX = NAME_MAX + 1; there is always enough space for a
name plus a NUL byte.

Signed-off-by: Martin Brandenburg
Signed-off-by: Mike Marshall

Martin Brandenburg
2016-04-09 02:10:34 +0800
f83140c14 orangefs: clean up truncate ctime and mtime setting ... Browse Code »

The ctime and mtime are always updated on a successful ftruncate and
only updated on a successful truncate where the size changed.

We handle the ``if the size changed'' bit.

This matches FUSE's behavior.

Signed-off-by: Martin Brandenburg
Signed-off-by: Mike Marshall

Martin Brandenburg
2016-04-09 02:10:31 +0800
2fa37fd71 Orangefs: fix ifnullfree.cocci warnings ... Browse Code »

fs/orangefs/orangefs-debugfs.c:130:2-26: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.

NULL check before some freeing functions is not needed.

Based on checkpatch warning
"kfree(NULL) is safe this check is probably not required"
and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

Signed-off-by: Fengguang Wu
Signed-off-by: Mike Marshall

kbuild test robot
2016-04-09 02:08:38 +0800
a9bb3ba81 Orangefs: optimize boilerplate code. ... Browse Code »

Suggested by David Binderman
The former can potentially be a performance win over the latter.

memcpy(d, s, len);
memset(d+len, c, size-len);

memset(d, c, size);
memcpy(d, s, len);

Signed-off-by: Mike Marshall

Mike Marshall
2016-04-09 02:08:27 +0800
2d09a2ca6 Orangefs: xattr.c cleanup ... Browse Code »

1. It is nonsense to test for negative size_t, suggested by
David Binderman

2. By the time Orangefs gets called, the vfs has ensured that
name != NULL, and that buffer and size are sane.

Signed-off-by: Mike Marshall

Mike Marshall
2016-04-09 02:08:27 +0800

08 Apr, 2016

1 commit

93061f390 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 bugfixes from Ted Ts'o:
"These changes contains a fix for overlayfs interacting with some
(badly behaved) dentry code in various file systems. These have been
reviewed by Al and the respective file system mtinainers and are going
through the ext4 tree for convenience.

This also has a few ext4 encryption bug fixes that were discovered in
Android testing (yes, we will need to get these sync'ed up with the
fs/crypto code; I'll take care of that). It also has some bug fixes
and a change to ignore the legacy quota options to allow for xfstests
regression testing of ext4's internal quota feature and to be more
consistent with how xfs handles this case"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: ignore quota mount options if the quota feature is enabled
ext4 crypto: fix some error handling
ext4: avoid calling dquot_get_next_id() if quota is not enabled
ext4: retry block allocation for failed DIO and DAX writes
ext4: add lockdep annotations for i_data_sem
ext4: allow readdir()'s of large empty directories to be interrupted
btrfs: fix crash/invalid memory access on fsync when using overlayfs
ext4 crypto: use dget_parent() in ext4_d_revalidate()
ext4: use file_dentry()
ext4: use dget_parent() in ext4_file_open()
nfs: use file_dentry()
fs: add file_dentry()
ext4 crypto: don't let data integrity writebacks fail with ENOMEM
ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()

Linus Torvalds
2016-04-08 08:22:20 +0800

07 Apr, 2016

1 commit

56f23fdbb Btrfs: fix file/data loss caused by fsync after rename and new inode ... Browse Code »

If we rename an inode A (be it a file or a directory), create a new
inode B with the old name of inode A and under the same parent directory,
fsync inode B and then power fail, at log tree replay time we end up
removing inode A completely. If inode A is a directory then all its files
are gone too.

Example scenarios where this happens:
This is reproducible with the following steps, taken from a couple of
test cases written for fstests which are going to be submitted upstream
soon:

# Scenario 1

mkfs.btrfs -f /dev/sdc
mount /dev/sdc /mnt
mkdir -p /mnt/a/x
echo "hello" > /mnt/a/x/foo
echo "world" > /mnt/a/x/bar
sync
mv /mnt/a/x /mnt/a/y
mkdir /mnt/a/x
xfs_io -c fsync /mnt/a/x

The next time the fs is mounted, log tree replay happens and
the directory "y" does not exist nor do the files "foo" and
"bar" exist anywhere (neither in "y" nor in "x", nor the root
nor anywhere).

# Scenario 2

mkfs.btrfs -f /dev/sdc
mount /dev/sdc /mnt
mkdir /mnt/a
echo "hello" > /mnt/a/foo
sync
mv /mnt/a/foo /mnt/a/bar
echo "world" > /mnt/a/foo
xfs_io -c fsync /mnt/a/foo

The next time the fs is mounted, log tree replay happens and the
file "bar" does not exists anymore. A file with the name "foo"
exists and it matches the second file we created.

Another related problem that does not involve file/data loss is when a
new inode is created with the name of a deleted snapshot and we fsync it:

mkfs.btrfs -f /dev/sdc
mount /dev/sdc /mnt
mkdir /mnt/testdir
btrfs subvolume snapshot /mnt /mnt/testdir/snap
btrfs subvolume delete /mnt/testdir/snap
rmdir /mnt/testdir
mkdir /mnt/testdir
xfs_io -c fsync /mnt/testdir # or fsync some file inside /mnt/testdir

The next time the fs is mounted the log replay procedure fails because
it attempts to delete the snapshot entry (which has dir item key type
of BTRFS_ROOT_ITEM_KEY) as if it were a regular (non-root) entry,
resulting in the following error that causes mount to fail:

[52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257
[52174.512570] ------------[ cut here ]------------
[52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x178/0x351 [btrfs]()
[52174.514681] BTRFS: Transaction aborted (error -2)
[52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev i2c_piix4 proc
[52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: G W 4.5.0-rc6-btrfs-next-27+ #1
[52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[52174.524053] 0000000000000000 ffff8801df2a7710 ffffffff81264e93 ffff8801df2a7758
[52174.524053] 0000000000000009 ffff8801df2a7748 ffffffff81051618 ffffffffa03591cd
[52174.524053] 00000000fffffffe ffff88015e6e5000 ffff88016dbc3c88 ffff88016dbc3c88
[52174.524053] Call Trace:
[52174.524053] [] dump_stack+0x67/0x90
[52174.524053] [] warn_slowpath_common+0x99/0xb2
[52174.524053] [] ? __btrfs_unlink_inode+0x178/0x351 [btrfs]
[52174.524053] [] warn_slowpath_fmt+0x48/0x50
[52174.524053] [] __btrfs_unlink_inode+0x178/0x351 [btrfs]
[52174.524053] [] ? iput+0xb0/0x284
[52174.524053] [] btrfs_unlink_inode+0x1c/0x3d [btrfs]
[52174.524053] [] check_item_in_log+0x1fe/0x29b [btrfs]
[52174.524053] [] replay_dir_deletes+0x167/0x1cf [btrfs]
[52174.524053] [] fixup_inode_link_count+0x289/0x2aa [btrfs]
[52174.524053] [] fixup_inode_link_counts+0xcb/0x105 [btrfs]
[52174.524053] [] btrfs_recover_log_trees+0x258/0x32c [btrfs]
[52174.524053] [] ? replay_one_extent+0x511/0x511 [btrfs]
[52174.524053] [] open_ctree+0x1dd4/0x21b9 [btrfs]
[52174.524053] [] btrfs_mount+0x97e/0xaed [btrfs]
[52174.524053] [] ? trace_hardirqs_on+0xd/0xf
[52174.524053] [] mount_fs+0x67/0x131
[52174.524053] [] vfs_kern_mount+0x6c/0xde
[52174.524053] [] btrfs_mount+0x1ac/0xaed [btrfs]
[52174.524053] [] ? trace_hardirqs_on+0xd/0xf
[52174.524053] [] ? lockdep_init_map+0xb9/0x1b3
[52174.524053] [] mount_fs+0x67/0x131
[52174.524053] [] vfs_kern_mount+0x6c/0xde
[52174.524053] [] do_mount+0x8a6/0x9e8
[52174.524053] [] ? strndup_user+0x3f/0x59
[52174.524053] [] SyS_mount+0x77/0x9f
[52174.524053] [] entry_SYSCALL_64_fastpath+0x12/0x6b
[52174.561288] ---[ end trace 6b53049efb1a3ea6 ]---

Fix this by forcing a transaction commit when such cases happen.
This means we check in the commit root of the subvolume tree if there
was any other inode with the same reference when the inode we are
fsync'ing is a new inode (created in the current transaction).

Test cases for fstests, covering all the scenarios given above, were
submitted upstream for fstests:

* fstests: generic test for fsync after renaming directory
https://patchwork.kernel.org/patch/8694281/

* fstests: generic test for fsync after renaming file
https://patchwork.kernel.org/patch/8694301/

* fstests: add btrfs test for fsync after snapshot deletion
https://patchwork.kernel.org/patch/8670671/

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana
Signed-off-by: Chris Mason

Filipe Manana
2016-04-07 08:01:44 +0800

05 Apr, 2016

5 commits

e865f4965 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull quota fixes from Jan Kara:
"Fixes for oopses when the new quotactl gets used with quotas disabled"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas
quota: Handle Q_GETNEXTQUOTA when quota is disabled

Linus Torvalds
2016-04-05 04:18:27 +0800
c7e82c648 Merge tag 'f2fs-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs ... Browse Code »

Pull f2fs fixes from Jaegeuk Kim.

* tag 'f2fs-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: retrieve IO write stat from the right place
f2fs crypto: fix corrupted symlink in encrypted case
f2fs: cover large section in sanity check of super

Linus Torvalds
2016-04-05 04:00:39 +0800
4a2d057e4 Merge branch 'PAGE_CACHE_SIZE-removal' ... Browse Code »

Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov:
"PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

Let's stop pretending that pages in page cache are special. They are
not.

The first patch with most changes has been done with coccinelle. The
second is manual fixups on top.

The third patch removes macros definition"

[ I was planning to apply this just before rc2, but then I spaced out,
so here it is right _after_ rc2 instead.

As Kirill suggested as a possibility, I could have decided to only
merge the first two patches, and leave the old interfaces for
compatibility, but I'd rather get it all done and any out-of-tree
modules and patches can trivially do the converstion while still also
working with older kernels, so there is little reason to try to
maintain the redundant legacy model. - Linus ]

* PAGE_CACHE_SIZE-removal:
mm: drop PAGE_CACHE_* and page_cache_{get,release} definition
mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

Linus Torvalds
2016-04-05 01:50:24 +0800
ea1754a08 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage ... Browse Code »

Mostly direct substitution with occasional adjustment or removing
outdated comments.

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800
09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros ... Browse Code »

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

04 Apr, 2016

5 commits

7ccefb98c btrfs: Reset IO error counters before start of device replacing ... Browse Code »

If device replace entry was found on disk at mounting and its num_write_errors
stats counter has non-NULL value, then replace operation will never be
finished and -EIO error will be reported by btrfs_scrub_dev() because
this counter is never reset.

# mount -o degraded /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
# btrfs replace status /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
Started on 25.Mar 07:28:00, canceled on 25.Mar 07:28:01 at 0.0%, 40 write errs, 0 uncorr. read errs
# btrfs replace start -B 4 /dev/sdg /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
ERROR: ioctl(DEV_REPLACE_START) failed on "/media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/": Input/output error, no error

Reset num_write_errors and num_uncorrectable_read_errors counters in the
dev_replace structure before start of replacing.

Signed-off-by: Yauhen Kharuzhy
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Yauhen Kharuzhy
2016-04-04 22:29:22 +0800
0f5dcf8de btrfs: Add qgroup tracing ... Browse Code »

This patch adds tracepoints to the qgroup code on both the reporting side
(insert_dirty_extents) and the accounting side. Taken together it allows us
to see what qgroup operations have happened, and what their result was.

Signed-off-by: Mark Fasheh
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Mark Fasheh
2016-04-04 22:29:22 +0800
c79b47133 Btrfs: don't use src fd for printk ... Browse Code »

The fd we pass in may not be on a btrfs file system, so don't try to do
BTRFS_I() on it. Thanks,

Signed-off-by: Josef Bacik
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Josef Bacik
2016-04-04 22:29:22 +0800
8f282f71e btrfs: fallback to vmalloc in btrfs_compare_tree ... Browse Code »

The allocation of node could fail if the memory is too fragmented for a
given node size, practically observed with 64k.

http://article.gmane.org/gmane.comp.file-systems.btrfs/54689

Reported-and-tested-by: Jean-Denis Girard
Signed-off-by: David Sterba

David Sterba
2016-04-04 22:29:22 +0800
918c2ee10 btrfs: handle non-fatal errors in btrfs_qgroup_inherit() ... Browse Code »

create_pending_snapshot() will go readonly on _any_ error return from
btrfs_qgroup_inherit(). If qgroups are enabled, a user can crash their fs by
just making a snapshot and asking it to inherit from an invalid qgroup. For
example:

$ btrfs sub snap -i 1/10 /btrfs/ /btrfs/foo

Will cause a transaction abort.

Fix this by only throwing errors in btrfs_qgroup_inherit() when we know
going readonly is acceptable.

The following xfstests test case reproduces this bug:

seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"

here=`pwd`
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15

_cleanup()
{
cd /
rm -f $tmp.*
}

# get standard environment, filters and checks
. ./common/rc
. ./common/filter

# remove previous $seqres.full before test
rm -f $seqres.full

# real QA test starts here
_supported_fs btrfs
_supported_os Linux
_require_scratch

rm -f $seqres.full

_scratch_mkfs
_scratch_mount
_run_btrfs_util_prog quota enable $SCRATCH_MNT
# The qgroup '1/10' does not exist and should be silently ignored
_run_btrfs_util_prog subvolume snapshot -i 1/10 $SCRATCH_MNT $SCRATCH_MNT/snap1

_scratch_unmount

echo "Silence is golden"

status=0
exit

Signed-off-by: Mark Fasheh
Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

Mark Fasheh
2016-04-04 22:29:22 +0800