Eric Lee / smarc-fsl-linux-kernel

11 Sep, 2016

2 commits

98ac9a608 Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
"nvdimm fixes for v4.8, two of them are tagged for -stable:

- Fix devm_memremap_pages() to use track_pfn_insert(). Otherwise,
DAX pmd mappings end up with an uncached pgprot, and unusable
performance for the device-dax interface. The device-dax interface
appeared in 4.7 so this is tagged for -stable.

- Fix a couple VM_BUG_ON() checks in the show_smaps() path to
understand DAX pmd entries. This fix is tagged for -stable.

- Fix a mis-merge of the nfit machine-check handler to flip the
polarity of an if() to match the final version of the patch that
Vishal sent for 4.8-rc1. Without this the nfit machine check
handler never detects / inserts new 'badblocks' entries which
applications use to identify lost portions of files.

- For test purposes, fix the nvdimm_clear_poison() path to operate on
legacy / simulated nvdimm memory ranges. Without this fix a test
can set badblocks, but never clear them on these ranges.

- Fix the range checking done by dax_dev_pmd_fault(). This is not
tagged for -stable since this problem is mitigated by specifying
aligned resources at device-dax setup time.

These patches have appeared in a next release over the past week. The
recent rebase you can see in the timestamps was to drop an invalid fix
as identified by the updated device-dax unit tests [1]. The -mm
touches have an ack from Andrew"

[1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm: allow legacy (e820) pmem region to clear bad blocks
nfit, mce: Fix SPA matching logic in MCE handler
mm: fix cache mode of dax pmd mappings
mm: fix show_smap() for zone_device-pmd ranges
dax: fix mapping size check

Linus Torvalds
2016-09-11 00:58:52 +0800
6905732c8 Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull fscrypto fixes from Ted Ts'o:
"Fix some brown-paper-bag bugs for fscrypto, including one which
allows a malicious user to set an encryption policy on an empty
directory which they do not own"

* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
fscrypto: require write access to mount to set encryption policy
fscrypto: only allow setting encryption policy on directories
fscrypto: add authorization check for setting encryption policy

Linus Torvalds
2016-09-11 00:18:33 +0800

10 Sep, 2016

7 commits

ba63f23d6 fscrypto: require write access to mount to set encryption policy

Since setting an encryption policy requires writing metadata to the
filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
Otherwise, a user could cause a write to a frozen or readonly
filesystem. This was handled correctly by f2fs but not by ext4. Make
fscrypt_process_policy() handle it rather than relying on the filesystem
to get it right.
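
As a rough user-space illustration of the guard described above, the sketch below models a mount with a read-only flag and brackets the metadata write with a want/drop pair. The `model_*` names and the struct are hypothetical stand-ins, not the kernel's `mnt_want_write()`/`mnt_drop_write()` implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mount; not the kernel's struct vfsmount. */
struct model_mount {
    bool readonly;   /* mounted read-only */
    int writers;     /* active write references */
};

/* Models mnt_want_write(): refuse write access on a read-only mount. */
static int model_want_write(struct model_mount *m)
{
    if (m->readonly)
        return -EROFS;
    m->writers++;
    return 0;
}

/* Models mnt_drop_write(): release the write reference. */
static void model_drop_write(struct model_mount *m)
{
    m->writers--;
}

/* The policy update runs only between a successful want/drop pair. */
static int model_set_policy(struct model_mount *m, bool *policy_written)
{
    int ret = model_want_write(m);

    if (ret)
        return ret;
    *policy_written = true;   /* stands in for the metadata write */
    model_drop_write(m);
    return 0;
}
```

Putting the pair in the shared helper means no caller can forget it, which is the design point the commit makes.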

Signed-off-by: Eric Biggers
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o
Acked-by: Jaegeuk Kim

Eric Biggers
2016-09-10 13:18:57 +0800
002ced4be fscrypto: only allow setting encryption policy on directories

The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
policy on nondirectory files. This was unintentional, and in the case
of nonempty regular files did not behave as expected because existing
data was not actually encrypted by the ioctl.

In the case of ext4, the user could also trigger filesystem errors in
->empty_dir(), e.g. due to mismatched "directory" checksums when the
kernel incorrectly tried to interpret a regular file as a directory.

This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
kernels v4.6 and later. It appears that older kernels only permitted
directories and that the check was accidentally lost during the
refactoring to share the file encryption code between ext4 and f2fs.

This patch restores the !S_ISDIR() check that was present in older
kernels.

Signed-off-by: Eric Biggers
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o

Eric Biggers
2016-09-10 11:38:12 +0800
163ae1c6a fscrypto: add authorization check for setting encryption policy

On an ext4 or f2fs filesystem with file encryption supported, a user
could set an encryption policy on any empty directory(*) to which they
had readonly access. This is obviously problematic, since such a
directory might be owned by another user and the new encryption policy
would prevent that other user from creating files in their own directory
(for example).

Fix this by requiring inode_owner_or_capable() permission to set an
encryption policy. This means that either the caller must own the file,
or the caller must have the capability CAP_FOWNER.

(*) Or also on any regular file, for f2fs v4.6 and later and ext4
v4.8-rc1 and later; a separate bug fix is coming for that.
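
The owner-or-capability rule can be sketched in user space as follows. The struct, the function name and the `-EACCES` return value are hypothetical; this only models the decision that `inode_owner_or_capable()` is described as making.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical description of the calling process. */
struct model_caller {
    uid_t fsuid;       /* filesystem uid of the caller */
    bool cap_fowner;   /* true if the caller holds CAP_FOWNER */
};

/* Allow the operation only for the owner or a CAP_FOWNER holder. */
static int model_may_set_policy(const struct model_caller *c, uid_t inode_uid)
{
    if (c->fsuid == inode_uid || c->cap_fowner)
        return 0;
    return -EACCES;   /* assumed error code */
}
```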

Signed-off-by: Eric Biggers
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o

Eric Biggers
2016-09-10 11:37:14 +0800
ca120cf68 mm: fix show_smap() for zone_device-pmd ranges

Attempting to dump /proc/<pid>/smaps for a process with pmd dax mappings
currently results in the following VM_BUG_ONs:

kernel BUG at mm/huge_memory.c:1105!
task: ffff88045f16b140 task.stack: ffff88045be14000
RIP: 0010:[] [] follow_trans_huge_pmd+0x2cb/0x340
[..]
Call Trace:
[] smaps_pte_range+0xa0/0x4b0
[] ? vsnprintf+0x255/0x4c0
[] __walk_page_range+0x1fe/0x4d0
[] walk_page_vma+0x62/0x80
[] show_smap+0xa6/0x2b0

kernel BUG at fs/proc/task_mmu.c:585!
RIP: 0010:[] [] smaps_pte_range+0x499/0x4b0
Call Trace:
[] ? vsnprintf+0x255/0x4c0
[] __walk_page_range+0x1fe/0x4d0
[] walk_page_vma+0x62/0x80
[] show_smap+0xa6/0x2b0

These locations are sanity checking page flags that must be set for an
anonymous transparent huge page, but are not set for the zone_device
pages associated with dax mappings.

Cc: Ross Zwisler
Cc: Kirill A. Shutemov
Acked-by: Andrew Morton
Signed-off-by: Dan Williams

Dan Williams
2016-09-10 08:34:45 +0800
6dc728ccd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fix from Miklos Szeredi:
"This fixes a deadlock when fuse, direct I/O and loop device are
combined"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: direct-io: don't dirty ITER_BVEC pages

Linus Torvalds
2016-09-10 04:00:41 +0800
5c44ad6a3 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fix from Miklos Szeredi:
"This fixes a regression caused by the last pull request"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix workdir creation

Linus Torvalds
2016-09-10 03:56:28 +0800
f4a9c169c Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"I'm not proud of how long it took me to track down that one liner in
btrfs_sync_log(), but the good news is the patches I was trying to
blame for these problems were actually fine (sorry Filipe)"

* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress
btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns
btrfs: do not decrease bytes_may_use when replaying extents

Linus Torvalds
2016-09-10 03:52:31 +0800

08 Sep, 2016

1 commit

b7f3c7d34 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.8

Chris Mason
2016-09-08 03:55:36 +0800

06 Sep, 2016

2 commits

ce129655c btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress

In btrfs_async_reclaim_metadata_space(), we use ticket's address to
determine whether asynchronous metadata reclaim work is making progress.

ticket = list_first_entry(&space_info->tickets,
struct reserve_ticket, list);
if (last_ticket == ticket) {
flush_state++;
} else {
last_ticket = ticket;
flush_state = FLUSH_DELAYED_ITEMS_NR;
if (commit_cycles)
commit_cycles--;
}

But this is wrong: we should not rely on a local variable's address for
this check, because addresses may repeat. In my test environment, I dd
a 168MB file into a 256MB fs and found that, for this file, the local
variable ticket has the same address every time wait_reserve_ticket() is
called.

For the code above, assume a previous ticket's address is addrA, so
last_ticket is addrA. btrfs_async_reclaim_metadata_space() finishes this
ticket and wakes it up, then another ticket is added with the same address
addrA. Now last_ticket equals the current ticket, so the current ticket's
flush work starts from the current flush_state rather than the initial
FLUSH_DELAYED_ITEMS_NR, which may result in ENOSPC issues (I have seen this
on my test machine).
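
The commit title names a tickets_id counter as the replacement for the address comparison. The sketch below shows one way such a counter can detect progress; it is a user-space model under the assumption that the counter is bumped whenever a ticket completes, and the `model_*` names are hypothetical rather than the btrfs code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the space_info bookkeeping. */
struct model_space_info {
    uint64_t tickets_id;   /* bumped every time a ticket completes */
};

/* Called when a ticket is satisfied and removed from the list. */
static void model_ticket_done(struct model_space_info *si)
{
    si->tickets_id++;
}

/*
 * Returns true when no ticket completed since the previous call.
 * Unlike comparing ticket addresses, a reused address cannot be
 * mistaken for "the same ticket is still waiting".
 */
static bool model_no_progress(const struct model_space_info *si,
                              uint64_t *last_tickets_id)
{
    if (*last_tickets_id == si->tickets_id)
        return true;
    *last_tickets_id = si->tickets_id;
    return false;
}
```

In this model the flusher would advance flush_state only when `model_no_progress()` returns true and reset it otherwise.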

Signed-off-by: Wang Xiaoguang
Reviewed-by: Josef Bacik
Signed-off-by: David Sterba

Wang Xiaoguang
2016-09-06 22:31:43 +0800
cbd60aa7c Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns

We use a btrfs_log_ctx structure to pass information into the
tree log commit, and get error values out. It gets added to a per
log-transaction list which we walk when things go bad.

Commit d1433debe added an optimization to skip waiting for the log
commit, but didn't take root_log_ctx out of the list. This
patch makes sure we remove things before exiting.

Signed-off-by: Chris Mason
Fixes: d1433debe7f4346cf9fc0dafc71c3137d2a97bc4
cc: stable@vger.kernel.org # 3.15+

Chris Mason
2016-09-06 20:57:25 +0800

05 Sep, 2016

3 commits

ed7a69483 btrfs: do not decrease bytes_may_use when replaying extents

When replaying extents, there is no need to update bytes_may_use
in btrfs_alloc_logged_file_extent(), otherwise it'll trigger a
WARN_ON about bytes_may_use.

Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely")
Signed-off-by: Wang Xiaoguang
Reviewed-by: Josef Bacik
Signed-off-by: David Sterba

Wang Xiaoguang
2016-09-05 23:40:41 +0800
0f5aa88a7 ceph: do not modify fi->frag in need_reset_readdir()

Commit f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
comparison operator with an assignment one.

This looks like a typo which is reported by clang when building the
kernel with some warning flags:

fs/ceph/dir.c:600:22: error: using the result of an assignment as a
condition without parentheses [-Werror,-Wparentheses]
} else if (fi->frag |= fpos_frag(new_pos)) {
~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
fs/ceph/dir.c:600:22: note: place parentheses around the assignment
to silence this warning
} else if (fi->frag |= fpos_frag(new_pos)) {
^
( )
fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
assignment into an inequality comparison
} else if (fi->frag |= fpos_frag(new_pos)) {
^~
!=
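
To make the difference between the two operators concrete, here is a small stand-alone sketch. The function names are hypothetical and the values are arbitrary; it only contrasts a comparison with a compound OR-assignment used as a condition.

```c
#include <stdbool.h>

/* Intended check: has the fragment changed? Leaves both values untouched. */
static bool model_frag_changed(unsigned int current_frag, unsigned int new_frag)
{
    return new_frag != current_frag;
}

/*
 * Shape of the typo: "|=" ORs new_frag into *frag as a side effect and
 * evaluates to true whenever the combined value is non-zero.
 */
static bool model_frag_changed_typo(unsigned int *frag, unsigned int new_frag)
{
    return (*frag |= new_frag) != 0;
}
```

With equal non-zero fragments the comparison reports no change, while the typo version reports a change and may also alter the stored fragment.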

Fixes: f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
Signed-off-by: Nicolas Iooss
Signed-off-by: Ilya Dryomov

Nicolas Iooss
2016-09-05 20:30:35 +0800
e1ff3dd1a ovl: fix workdir creation

Workdir creation fails in latest kernel.

Fix by allowing EOPNOTSUPP as a valid return value from
vfs_removexattr(XATTR_NAME_POSIX_ACL_*). Upper filesystem may not support
ACL and still be perfectly able to support overlayfs.
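
A minimal sketch of the "treat EOPNOTSUPP as success" idea, written as a user-space helper. The helper name is hypothetical, and the real fix may accept further error values not mentioned in this message.

```c
#include <errno.h>

/*
 * Hypothetical helper: map "operation not supported" from an xattr
 * removal to success, and pass every other result through unchanged.
 */
static int model_ignore_unsupported(int err)
{
    return err == -EOPNOTSUPP ? 0 : err;
}
```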

Reported-by: Martin Ziegler
Signed-off-by: Miklos Szeredi
Fixes: c11b9fdd6a61 ("ovl: remove posix_acl_default from workdir")
Cc:

Miklos Szeredi
2016-09-05 19:55:20 +0800

04 Sep, 2016

3 commits

4b30b6d12 Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"I'm still prepping a set of fixes for btrfs fsync, just nailing down a
hard to trigger memory corruption. For now, these are tested and ready."

* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket()
Btrfs: fix endless loop in balancing block groups
Btrfs: kill invalid ASSERT() in process_all_refs()

Linus Torvalds
2016-09-04 03:40:45 +0800
41488202f Merge tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here are three small fixes for 4.8-rc5.

One for sysfs, one for kernfs, and one documentation fix, all for
reported issues. All of these have been in linux-next for a while"

* tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: correctly handle read offset on PREALLOC attrs
documentation: drivers/core/of: fix name of of_node symlink
kernfs: don't depend on d_find_any_alias() when generating notifications

Linus Torvalds
2016-09-04 02:36:55 +0800
3e423945e devpts: return NULL pts 'priv' entry for non-devpts nodes

In commit 8ead9dd54716 ("devpts: more pty driver interface cleanups") I
made devpts_get_priv() just return the dentry->fs_data directly. And
because I thought it wouldn't happen, I added a warning if you ever saw
a pts node that wasn't on devpts.

And no, that warning never triggered under any actual real use, but you
can trigger it by creating nonsensical pts nodes by hand.

So just revert the warning, and make devpts_get_priv() return NULL for
that case like it used to.
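
The behaviour described above can be modelled in user space roughly as follows. The struct and names are hypothetical, and checking a superblock magic number is an assumption about how "not on devpts" is detected; the value 0x1cd1 is used here as the devpts magic.

```c
#include <stddef.h>

/* Assumed devpts superblock magic, used only for this model. */
#define MODEL_DEVPTS_MAGIC 0x1cd1UL

/* Hypothetical stand-in for a dentry and its superblock. */
struct model_dentry {
    unsigned long sb_magic;   /* magic of the filesystem holding the node */
    void *fs_data;            /* per-node private data */
};

/* Return the private data only for nodes that live on devpts. */
static void *model_get_priv(const struct model_dentry *d)
{
    if (d->sb_magic != MODEL_DEVPTS_MAGIC)
        return NULL;
    return d->fs_data;
}
```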

Reported-by: Dmitry Vyukov
Cc: stable@vger.kernel.org # 4.6+
Cc: Eric W Biederman
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-09-04 02:02:50 +0800

03 Sep, 2016

1 commit

f28929ba3 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
"Most of this is regression fixes for posix acl behavior introduced in
4.8-rc1 (these were caught by the pjd-fstest suite). There are also
miscellaneous fixes marked as stable material and cleanups.

Other than overlayfs code, it touches to add a constant
with which to disable posix acl caching. No changes needed to the
actual caching code, it automatically does the right thing, although
later we may want to optimize this case.

I'm now testing overlayfs with the following test suites to catch
regressions:

- unionmount-testsuite
- xfstests
- pjd-fstest"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: update doc
ovl: listxattr: use strnlen()
ovl: Switch to generic_getxattr
ovl: copyattr after setting POSIX ACL
ovl: Switch to generic_removexattr
ovl: Get rid of ovl_xattr_noacl_handlers array
ovl: Fix OVL_XATTR_PREFIX
ovl: fix spelling mistake: "directries" -> "directories"
ovl: don't cache acl on overlay layer
ovl: use cached acl on underlying layer
ovl: proper cleanup of workdir
ovl: remove posix_acl_default from workdir
ovl: handle umask and posix_acl_default correctly on creation
ovl: don't copy up opaqueness

Linus Torvalds
2016-09-03 00:32:15 +0800

02 Sep, 2016

2 commits

511a8cdb6 Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit

Pull audit fixes from Paul Moore:
"Two small patches to fix some bugs with the audit-by-executable
functionality we introduced back in v4.3 (both patches are marked
for the stable folks)"

* 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
audit: fix exe_file access in audit_exe_compare
mm: introduce get_task_exe_file

Linus Torvalds
2016-09-02 06:55:56 +0800
7d1ce606a Merge tag 'xfs-iomap-for-linus-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs and iomap fixes from Dave Chinner:
"Most of these changes are small regression fixes that address problems
introduced in the 4.8-rc1 window. The two fixes that aren't (IO
completion fix and superblock inprogress check) are fixes for problems
introduced some time ago and need to be pushed back to stable kernels.

Changes in this update:
- iomap FIEMAP_EXTENT_MERGED usage fix
- additional mount-time feature restrictions
- rmap btree query fixes
- freeze/unmount io completion workqueue fix
- memory corruption fix for deferred operations handling"

* tag 'xfs-iomap-for-linus-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: track log done items directly in the deferred pending work item
iomap: don't set FIEMAP_EXTENT_MERGED for extent based filesystems
xfs: prevent dropping ioend completions during buftarg wait
xfs: fix superblock inprogress check
xfs: simple btree query range should look right if LE lookup fails
xfs: fix some key handling problems in _btree_simple_query_range
xfs: don't log the entire end of the AGF
xfs: disallow mounting of realtime + rmap filesystems
xfs: don't perform lookups on zero-height btrees

Linus Torvalds
2016-09-02 06:33:16 +0800

01 Sep, 2016

17 commits

e0af24849 btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket()

If can_overcommit() in btrfs_calc_reclaim_metadata_size() returns true,
btrfs_async_reclaim_metadata_space() will not reclaim metadata space. It
returns directly and forgets to wake up the processes that are waiting for
their tickets, so these processes wait endlessly.

Fstests case generic/172 with mount option "-o compress=lzo" revealed
this bug on my test machine. If we have tickets to handle, we must
handle them first.

Signed-off-by: Wang Xiaoguang
Reviewed-by: Josef Bacik
Signed-off-by: David Sterba

Wang Xiaoguang
2016-09-01 23:23:24 +0800
a9b1fc851 Btrfs: fix endless loop in balancing block groups

Qgroup function may overwrite the saved error 'err' with 0
in case quota is not enabled, and this ends up with an
endless loop in balance because we keep going back to balance
the same block group.

It really should use 'ret' instead.
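
The err/ret mix-up can be illustrated with a small user-space model. All names are hypothetical; the helper simply returns 0 the way a quota function might when quota is disabled.

```c
#include <errno.h>

/* Hypothetical helper: returns 0 when quotas are disabled. */
static int model_qgroup_fixup(void)
{
    return 0;
}

/* Buggy shape: the helper's 0 overwrites the saved error (e.g. -EAGAIN). */
static int model_relocate_buggy(int saved_err)
{
    int err = saved_err;

    err = model_qgroup_fixup();
    return err;
}

/* Fixed shape: keep the helper's result in 'ret' and preserve 'err'. */
static int model_relocate_fixed(int saved_err)
{
    int err = saved_err;
    int ret = model_qgroup_fixup();

    if (ret && !err)
        err = ret;
    return err;
}
```

A caller that retries while the result is 0 would loop forever on the buggy variant, because the original failure never reaches it.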

Signed-off-by: Liu Bo
Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

Liu Bo
2016-09-01 23:16:47 +0800
3dc09ec89 Btrfs: kill invalid ASSERT() in process_all_refs()

Suppose you have the following tree in snap1 on a file system mounted with -o
inode_cache so that inode numbers are recycled

└── [ 258] a
└── [ 257] b

and then you remove b, rename a to c, and then re-create b in c so you have the
following tree

└── [ 258] c
└── [ 257] b

and then you try to do an incremental send you will hit

ASSERT(pending_move == 0);

in process_all_refs(). This is because we assume that any recycling of inodes
will not have a pending change in our path, which isn't the case. This is the
case for the DELETE side, since we want to remove the old file using the old
path, but on the create side we could have a pending move and need to do the
normal pending rename dance. So remove this ASSERT() and put a comment about
why we ignore pending_move. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: David Sterba

Josef Bacik
2016-09-01 23:16:47 +0800
7cb35119d ovl: listxattr: use strnlen()

Be defensive about what the underlying fs provides us in the returned xattr
list buffer. If it's not properly null terminated, bail out with a warning
instead of BUG.
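
A user-space sketch of the defensive walk: each name's length is measured with `strnlen()` bounded by the bytes that remain, so a missing terminator is detected instead of being read past. The function name and the `-EIO` return value are assumptions for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for strnlen() */
#endif
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the NUL-terminated names packed back to back in list[0..len).
 * Returns -EIO if the last name is not terminated inside the buffer.
 */
static int model_count_xattr_names(const char *list, size_t len)
{
    int count = 0;

    while (len > 0) {
        size_t slen = strnlen(list, len);

        if (slen == len)          /* no NUL within the remaining bytes */
            return -EIO;
        list += slen + 1;         /* skip the name and its terminator */
        len -= slen + 1;
        count++;
    }
    return count;
}
```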

Signed-off-by: Miklos Szeredi
Cc:

Miklos Szeredi
2016-09-01 17:12:00 +0800
0eb45fc3b ovl: Switch to generic_getxattr

Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use
those same handlers for iop->getxattr as well.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Miklos Szeredi

Andreas Gruenbacher
2016-09-01 17:12:00 +0800
ce31513a9 ovl: copyattr after setting POSIX ACL

Setting a POSIX ACL may also modify the file mode, so we need to copy that
up to the overlay inode.

Reported-by: Eryu Guan
Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-09-01 17:12:00 +0800
0e585ccc1 ovl: Switch to generic_removexattr

Commit d837a49bd57f ("ovl: fix POSIX ACL setting") switches
iop->setxattr from ovl_setxattr to generic_setxattr, so switch from
ovl_removexattr to generic_removexattr as well. As far as permission
checking goes, the same rules should apply in either case.

While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that
this is not an iop->setxattr implementation and remove the unused inode
argument.

Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the
order of handlers in ovl_xattr_handlers.

Signed-off-by: Andreas Gruenbacher
Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi

Andreas Gruenbacher
2016-09-01 17:12:00 +0800
0c97be22f ovl: Get rid of ovl_xattr_noacl_handlers array

Use an ordinary #ifdef to conditionally include the POSIX ACL handlers
in ovl_xattr_handlers, like the other filesystems do. Flag the code
that is now only used conditionally with __maybe_unused.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Miklos Szeredi

Andreas Gruenbacher
2016-09-01 17:11:59 +0800
fe2b75952 ovl: Fix OVL_XATTR_PREFIX

Make sure ovl_own_xattr_handler only matches attribute names starting
with "overlay.", not "overlayXXX".
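
The effect of the trailing dot can be shown with a plain prefix comparison. The helper below is a hypothetical user-space sketch, not the kernel's xattr handler matching code.

```c
#include <stdbool.h>
#include <string.h>

/* True if 'name' begins with 'prefix'. */
static bool model_has_prefix(const char *name, const char *prefix)
{
    return strncmp(name, prefix, strlen(prefix)) == 0;
}
```

With the prefix "overlay" the name "overlayXXX" matches; with "overlay." it does not, while "overlay.opaque" still does.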

Signed-off-by: Andreas Gruenbacher
Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi

Andreas Gruenbacher
2016-09-01 17:11:59 +0800
fd36570a8 ovl: fix spelling mistake: "directries" -> "directories"

Trivial fix to spelling mistake in pr_err message.

Signed-off-by: Colin Ian King
Signed-off-by: Miklos Szeredi

Colin Ian King
2016-09-01 17:11:59 +0800
2a3a2a3f3 ovl: don't cache acl on overlay layer

Some operations (setxattr/chmod) can make the cached acl stale. We either
need to clear overlay's acl cache for the affected inode or prevent acl
caching on the overlay altogether. Preventing caching has the following
advantages:

- no double caching, less memory used

- overlay cache doesn't go stale when fs clears its own cache

Possible disadvantage is performance loss. If that becomes a problem
get_acl() can be optimized for overlayfs.

This patch disables caching by presetting i_*acl to a value that

- has bit 0 set, so is_uncached_acl() will return true

- is not equal to ACL_NOT_CACHED, so get_acl() will not overwrite it

The constant -3 was chosen for this purpose.
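
The two properties listed above can be checked with a small user-space model. The `MODEL_*` macros and the function are hypothetical stand-ins; the value -1 for the "not cached" marker and the bit-0 test are assumptions about how the kernel helpers behave.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel values, modelled as odd pointer constants. */
#define MODEL_ACL_NOT_CACHED ((void *)(intptr_t)-1)
#define MODEL_ACL_DONT_CACHE ((void *)(intptr_t)-3)

/* Models is_uncached_acl(): a set bit 0 marks a sentinel, not a real ACL. */
static bool model_is_uncached_acl(const void *acl)
{
    return ((intptr_t)acl & 1) != 0;
}
```

Real ACL objects are aligned, so their addresses have bit 0 clear, which is why an odd constant distinct from the "not cached" marker can serve as a "never cache" marker.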

Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes")
Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-09-01 17:11:59 +0800
5201dc449 ovl: use cached acl on underlying layer

Instead of calling ->get_acl() directly, use get_acl() to get the cached
value.

We will have the acl cached on the underlying inode anyway, because we do
permission checking on both the overlay and the underlying fs.

So, since we already have double caching, this improves performance without
any cost.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-09-01 17:11:59 +0800
eea2fb485 ovl: proper cleanup of workdir

When mounting overlayfs it needs a clean "work" directory under the
supplied workdir.

Previously the mount code removed this directory if it already existed and
created a new one. If the removal failed (e.g. directory was not empty)
then it fell back to a read-only mount not using the workdir.

While this has never been reported, it is possible to get a non-empty
"work" dir from a previous mount of overlayfs in case of crash in the
middle of an operation using the work directory.

In this case the left over state should be discarded and the overlay
filesystem will be consistent, guaranteed by the atomicity of operations on
moving to/from the workdir to the upper layer.

This patch implements cleaning out any files left in workdir. It is
implemented using real recursion for simplicity, but the depth is limited
to 2, because the worst case is that of a directory containing whiteouts
under "work".

Signed-off-by: Miklos Szeredi
Cc:

Miklos Szeredi
2016-09-01 17:11:59 +0800
c11b9fdd6 ovl: remove posix_acl_default from workdir

Clear out posix acl xattrs on workdir and also reset the mode after
creation so that an inherited sgid bit is cleared.

Signed-off-by: Miklos Szeredi
Cc:

Miklos Szeredi
2016-09-01 17:11:59 +0800
38b256973 ovl: handle umask and posix_acl_default correctly on creation

Setting MS_POSIXACL in sb->s_flags has the side effect of passing mode to
create functions without masking against umask.

Another problem when creating over a whiteout is that the default posix acl
is not inherited from the parent dir (because the real parent dir at the
time of creation is the work directory).

Fix these problems by:

a) If upper fs does not have MS_POSIXACL, then mask mode with umask.

b) If creating over a whiteout, call posix_acl_create() to get the
inherited acls. After creation (but before moving to the final
destination) set these acls on the created file. posix_acl_create() also
updates the file creation mode as appropriate.
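
Step a) amounts to a single bit mask. The sketch below models it in user space; the function name and the boolean parameter are hypothetical, and step b) is not modelled.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Apply the umask to the requested creation mode only when the upper
 * filesystem does not handle POSIX ACLs itself.
 */
static mode_t model_create_mode(mode_t mode, mode_t mask, bool upper_has_posixacl)
{
    if (!upper_has_posixacl)
        mode &= ~mask;
    return mode;
}
```

For example, a requested mode of 0666 with a umask of 022 becomes 0644 when the upper filesystem lacks MS_POSIXACL and stays 0666 otherwise.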

Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes")
Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-09-01 17:11:59 +0800
cd81a9170 mm: introduce get_task_exe_file

For more convenient access if one has a pointer to the task.

As a minor nit take advantage of the fact that only task lock + rcu are
needed to safely grab ->exe_file. This saves mm refcount dance.

Use the helper in proc_exe_link.

Signed-off-by: Mateusz Guzik
Acked-by: Konstantin Khlebnikov
Acked-by: Richard Guy Briggs
Cc: # 4.3.x
Signed-off-by: Paul Moore

Mateusz Guzik
2016-09-01 04:11:20 +0800
9f834ec18 binfmt_elf: switch to new creds when switching to new mm

We used to delay switching to the new credentials until after we had
mapped the executable (and possible elf interpreter). That was kind of
odd to begin with, since the new executable will actually then _run_
with the new creds, but whatever.

The bigger problem was that we also want to make sure that we turn off
prof events and tracing before we start mapping the new executable
state. So while this is a cleanup, it's also a fix for a possible
information leak.

Reported-by: Robert Święcki
Tested-by: Peter Zijlstra
Acked-by: David Howells
Acked-by: Oleg Nesterov
Acked-by: Andy Lutomirski
Acked-by: Eric W. Biederman
Cc: Willy Tarreau
Cc: Kees Cook
Cc: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-09-01 00:13:56 +0800

31 Aug, 2016

2 commits

17d0774f8 sysfs: correctly handle read offset on PREALLOC attrs

Attributes declared with __ATTR_PREALLOC use sysfs_kf_read(), which returns
zero bytes for a non-zero offset. This breaks the checkarray script of the
mdadm tool in Debian, where /bin/sh is 'dash', because its builtin 'read'
reads only one byte at a time. The script gets 'i' instead of 'idle' when it
reads the current action from /sys/block/$dev/md/sync_action and as a result
does nothing.

This patch adds a trivial implementation of partial read: generate the whole
string and move the required part to the head of the buffer.
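
The "generate everything, then shift" approach can be sketched in user space as below. The function name and signature are hypothetical; the buffer is assumed to already hold the fully generated attribute text.

```c
#include <stddef.h>
#include <string.h>

/*
 * buf holds the complete attribute text of length len. Move the part
 * starting at offset pos to the head of buf and return how many bytes
 * (at most count) the reader receives.
 */
static size_t model_partial_read(char *buf, size_t len, size_t pos, size_t count)
{
    if (pos >= len)
        return 0;                 /* reading at or past the end */
    len -= pos;
    if (len > count)
        len = count;
    memmove(buf, buf + pos, len); /* regions may overlap, so memmove */
    return len;
}
```

A one-byte-at-a-time reader, like the shell builtin described above, then sees successive bytes of the text as the offset advances rather than only the first byte.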

Signed-off-by: Konstantin Khlebnikov
Fixes: 4ef67a8c95f3 ("sysfs/kernfs: make read requests on pre-alloc files use the buffer.")
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950
Cc: Stable # v3.19+
Acked-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Konstantin Khlebnikov
2016-08-31 21:14:44 +0800
df6a58c5c kernfs: don't depend on d_find_any_alias() when generating notifications

kernfs_notify_workfn() sends out file modified events for the
scheduled kernfs_nodes. Because the modifications aren't from
userland, it doesn't have the matching file struct at hand and can't
use fsnotify_modify(). Instead, it looked up the inode and then used
d_find_any_alias() to find the dentry and used fsnotify_parent() and
fsnotify() directly to generate notifications.

The assumption was that the relevant dentries would have been pinned
if there are listeners, which isn't true as inotify doesn't pin
dentries at all and watching the parent doesn't pin the child dentries
even for dnotify. This led to, for example, inotify watchers not
getting notifications if the system is under memory pressure and the
matching dentries got reclaimed. It can also be triggered through
/proc/sys/vm/drop_caches or a remount attempt which involves shrinking
dcache.

fsnotify_parent() only uses the dentry to access the parent inode,
which kernfs can do easily. Update kernfs_notify_workfn() so that it
uses fsnotify() directly for both the parent and target inodes without
going through d_find_any_alias(). While at it, supply the target file
name to fsnotify() from kernfs_node->name.

Signed-off-by: Tejun Heo
Reported-by: Evgeny Vereshchagin
Fixes: d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events too")
Cc: John McCutchan
Cc: Robert Love
Cc: Eric Paris
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2016-08-31 20:48:52 +0800