Eric Lee / smarc-fsl-linux-kernel

27 Mar, 2016

1 commit

d101a1259 fs: add file_dentry() ... Browse Code »

This series fixes bugs in nfs and ext4 due to 4bacc9c9234c ("overlayfs:
Make f_path always point to the overlay and f_inode to the underlay").

Regular files opened on overlayfs will result in the file being opened on
the underlying filesystem, while f_path points to the overlayfs
mount/dentry.

This confuses filesystems which get the dentry from struct file and assume
it's theirs.

Add a new helper, file_dentry() [*], to get the filesystem's own dentry
from the file. This checks file->f_path.dentry->d_flags against
DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not
set (this is the common, non-overlayfs case).

In the uncommon case it will call into overlayfs's ->d_real() to get the
underlying dentry, matching file_inode(file).

The reason we need to check against the inode is that if the file is copied
up while being open, d_real() would return the upper dentry, while the open
file comes from the lower dentry.

[*] If possible, it's better simply to use file_inode() instead.

Signed-off-by: Miklos Szeredi
Signed-off-by: Theodore Ts'o
Tested-by: Goldwyn Rodrigues
Reviewed-by: Trond Myklebust
Cc: # v4.2
Cc: David Howells
Cc: Al Viro
Cc: Daniel Axtens

Miklos Szeredi
2016-03-27 04:14:37 +0800

22 Mar, 2016

7 commits

6986c012f ovl: cleanup unused var in rename2 ... Browse Code »

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-03-22 00:31:46 +0800
56656e960 ovl: rename is_merge to is_lowest ... Browse Code »

The 'is_merge' is an historical naming from when only a single lower layer
could exist. With the introduction of multiple lower layers the meaning of
this flag was changed to mean only the "lowest layer" (while all lower
layers were being merged).

So now 'is_merge' is inaccurate and hence renaming to 'is_lowest'

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-03-22 00:31:46 +0800
f134f2446 ovl: fixed coding style warning ... Browse Code »

This patch fixes a newline warning found by the checkpatch.pl tool

Signed-off-by: Sohom-Bhattacharjee
Signed-off-by: Miklos Szeredi

Sohom Bhattacharjee
2016-03-22 00:31:45 +0800
45aebeaf4 ovl: Ensure upper filesystem supports d_type ... Browse Code »

In some instances xfs has been created with ftype=0 and there if a file
on lower fs is removed, overlay leaves a whiteout in upper fs but that
whiteout does not get filtered out and is visible to overlayfs users.

And reason it does not get filtered out because upper filesystem does
not report file type of whiteout as DT_CHR during iterate_dir().

So it seems to be a requirement that upper filesystem support d_type for
overlayfs to work properly. Do this check during mount and fail if d_type
is not supported.

Suggested-by: Dave Chinner
Signed-off-by: Vivek Goyal
Signed-off-by: Miklos Szeredi

Vivek Goyal
2016-03-22 00:31:45 +0800
fb5bb2c3b ovl: Warn on copy up if a process has a R/O fd open to the lower file ... Browse Code »

Print a warning when overlayfs copies up a file if the process that
triggered the copy up has a R/O fd open to the lower file being copied up.

This can help catch applications that do things like the following:

fd1 = open("foo", O_RDONLY);
fd2 = open("foo", O_RDWR);

where they expect fd1 and fd2 to refer to the same file - which will no
longer be the case post-copy up.

With this patch, the following commands:

bash 5<>/mnt/a/foo128

assuming /mnt/a/foo128 to be an un-copied up file on an overlay will
produce the following warning in the kernel log:

overlayfs: Copying up foo129, but open R/O on fd 5 which will cease
to be coherent [pid=3818 bash]

This is enabled by setting:

/sys/module/overlay/parameters/check_copy_up

to 1.

The warnings are ratelimited and are also limited to one warning per file -
assuming the copy up completes in each case.

Signed-off-by: David Howells
Signed-off-by: Miklos Szeredi

David Howells
2016-03-22 00:31:45 +0800
07f2af7bf ovl: honor flag MS_SILENT at mount ... Browse Code »

This patch hides error about missing lowerdir if MS_SILENT is set.

We use mount(NULL, "/", "overlay", MS_SILENT, NULL) for testing support of
overlayfs: syscall returns -ENODEV if it's not supported. Otherwise kernel
automatically loads module and returns -EINVAL because lowerdir is missing.

Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Miklos Szeredi

Konstantin Khlebnikov
2016-03-22 00:31:45 +0800
11f371041 ovl: verify upper dentry before unlink and rename ... Browse Code »

Unlink and rename in overlayfs checked the upper dentry for staleness by
verifying upper->d_parent against upperdir. However the dentry can go
stale also by being unhashed, for example.

Expand the verification to actually look up the name again (under parent
lock) and check if it matches the upper dentry. This matches what the VFS
does before passing the dentry to filesytem's unlink/rename methods, which
excludes any inconsistency caused by overlayfs.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-03-22 00:31:44 +0800

04 Mar, 2016

4 commits

b81de061f ovl: copy new uid/gid into overlayfs runtime inode ... Browse Code »

Overlayfs must update uid/gid after chown, otherwise functions
like inode_owner_or_capable() will check user against stale uid.
Catched by xfstests generic/087, it chowns file and calls utimes.

Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Miklos Szeredi
Cc:

Konstantin Khlebnikov
2016-03-04 00:17:46 +0800
45d117389 ovl: ignore lower entries when checking purity of non-directory entries ... Browse Code »

After rename file dentry still holds reference to lower dentry from
previous location. This doesn't matter for data access because data comes
from upper dentry. But this stale lower dentry taints dentry at new
location and turns it into non-pure upper. Such file leaves visible
whiteout entry after remove in directory which shouldn't have whiteouts at
all.

Overlayfs already tracks pureness of file location in oe->opaque. This
patch just uses that for detecting actual path type.

Comment from Vivek Goyal's patch:

Here are the details of the problem. Do following.

$ mkdir upper lower work merged upper/dir/
$ touch lower/test
$ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=
work merged
$ mv merged/test merged/dir/
$ rm merged/dir/test
$ ls -l merged/dir/
/usr/bin/ls: cannot access merged/dir/test: No such file or directory
total 0
c????????? ? ? ? ? ? test

Basic problem seems to be that once a file has been unlinked, a whiteout
has been left behind which was not needed and hence it becomes visible.

Whiteout is visible because parent dir is of not type MERGE, hence
od->is_real is set during ovl_dir_open(). And that means ovl_iterate()
passes on iterate handling directly to underlying fs. Underlying fs does
not know/filter whiteouts so it becomes visible to user.

Why did we leave a whiteout to begin with when we should not have.
ovl_do_remove() checks for OVL_TYPE_PURE_UPPER() and does not leave
whiteout if file is pure upper. In this case file is not found to be pure
upper hence whiteout is left.

So why file was not PURE_UPPER in this case? I think because dentry is
still carrying some leftover state which was valid before rename. For
example, od->numlower was set to 1 as it was a lower file. After rename,
this state is not valid anymore as there is no such file in lower.

Signed-off-by: Konstantin Khlebnikov
Reported-by: Viktor Stanchev
Suggested-by: Vivek Goyal
Link: https://bugzilla.kernel.org/show_bug.cgi?id=109611
Acked-by: Vivek Goyal
Signed-off-by: Miklos Szeredi
Cc:

Konstantin Khlebnikov
2016-03-04 00:17:45 +0800
ce9113bbc ovl: fix getcwd() failure after unsuccessful rmdir ... Browse Code »

ovl_remove_upper() should do d_drop() only after it successfully
removes the dir, otherwise a subsequent getcwd() system call will
fail, breaking userspace programs.

This is to fix: https://bugzilla.kernel.org/show_bug.cgi?id=110491

Signed-off-by: Rui Wang
Reviewed-by: Konstantin Khlebnikov
Signed-off-by: Miklos Szeredi
Cc:

Rui Wang
2016-03-04 00:17:45 +0800
b5891cfab ovl: fix working on distributed fs as lower layer ... Browse Code »

This adds missing .d_select_inode into alternative dentry_operations.

Signed-off-by: Konstantin Khlebnikov
Fixes: 7c03b5d45b8e ("ovl: allow distributed fs as lower layer")
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Reviewed-by: Nikolay Borisov
Tested-by: Nikolay Borisov
Signed-off-by: Miklos Szeredi
Cc: # 4.2+

Konstantin Khlebnikov
2016-03-04 00:17:45 +0800

23 Jan, 2016

1 commit

5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
2016-01-23 07:04:28 +0800

22 Jan, 2016

2 commits

eae21770b Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge third patch-bomb from Andrew Morton:
"I'm pretty much done for -rc1 now:

- the rest of MM, basically

- lib/ updates

- checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit

- cpu_mask simplifications

- kexec, rapidio, MAINTAINERS etc, etc.

- more dma-mapping cleanups/simplifications from hch"

* emailed patches from Andrew Morton : (109 commits)
MAINTAINERS: add/fix git URLs for various subsystems
mm: memcontrol: add "sock" to cgroup2 memory.stat
mm: memcontrol: basic memory statistics in cgroup2 memory controller
mm: memcontrol: do not uncharge old page in page cache replacement
Documentation: cgroup: add memory.swap.{current,max} description
mm: free swap cache aggressively if memcg swap is full
mm: vmscan: do not scan anon pages if memcg swap limit is hit
swap.h: move memcg related stuff to the end of the file
mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online
mm: vmscan: pass memcg to get_scan_count()
mm: memcontrol: charge swap to cgroup2
mm: memcontrol: clean up alloc, online, offline, free functions
mm: memcontrol: flatten struct cg_proto
mm: memcontrol: rein in the CONFIG space madness
net: drop tcp_memcontrol.c
mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM
mm: memcontrol: allow to disable kmem accounting for cgroup2
mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
mm: memcontrol: move kmem accounting code to CONFIG_MEMCG
mm: memcontrol: separate kmem code from legacy tcp accounting code
...

Linus Torvalds
2016-01-22 04:32:08 +0800
e9f57ebcb Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs ... Browse Code »

Pull overlayfs updates from Miklos Szeredi:
"This contains several bug fixes and a new mount option
'default_permissions' that allows read-only exported NFS
filesystems to be used as lower layer"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: check dentry positiveness in ovl_cleanup_whiteouts()
ovl: setattr: check permissions before copy-up
ovl: root: copy attr
ovl: move super block magic number to magic.h
ovl: use a minimal buffer in ovl_copy_xattr
ovl: allow zero size xattr
ovl: default permissions

Linus Torvalds
2016-01-22 04:20:46 +0800

21 Jan, 2016

1 commit

e458bcd16 fs/overlayfs/super.c needs pagemap.h ... Browse Code »

i386 allmodconfig:

In file included from fs/overlayfs/super.c:10:0:
fs/overlayfs/super.c: In function 'ovl_fill_super':
include/linux/fs.h:898:36: error: 'PAGE_CACHE_SIZE' undeclared (first use in this function)
#define MAX_LFS_FILESIZE (((loff_t)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
^
fs/overlayfs/super.c:939:19: note: in expansion of macro 'MAX_LFS_FILESIZE'
sb->s_maxbytes = MAX_LFS_FILESIZE;
^
include/linux/fs.h:898:36: note: each undeclared identifier is reported only once for each function it appears in
#define MAX_LFS_FILESIZE (((loff_t)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
^
fs/overlayfs/super.c:939:19: note: in expansion of macro 'MAX_LFS_FILESIZE'
sb->s_maxbytes = MAX_LFS_FILESIZE;
^

Cc: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2016-01-21 09:09:18 +0800

31 Dec, 2015

1 commit

fceef393a switch ->get_link() to delayed_call, kill ->put_link() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2015-12-31 02:01:03 +0800

11 Dec, 2015

2 commits

84889d493 ovl: check dentry positiveness in ovl_cleanup_whiteouts() ... Browse Code »

This patch fixes kernel crash at removing directory which contains
whiteouts from lower layers.

Cache of directory content passed as "list" contains entries from all
layers, including whiteouts from lower layers. So, lookup in upper dir
(moved into work at this stage) will return negative entry. Plus this
cache is filled long before and we can race with external removal.

Example:
mkdir -p lower0/dir lower1/dir upper work overlay
touch lower0/dir/a lower0/dir/b
mknod lower1/dir/a c 0 0
mount -t overlay none overlay -o lowerdir=lower1:lower0,upperdir=upper,workdir=work
rm -fr overlay/dir

Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Miklos Szeredi
Cc: # 3.18+

Konstantin Khlebnikov
2015-12-11 23:30:49 +0800
cf9a6784f ovl: setattr: check permissions before copy-up ... Browse Code »

Without this copy-up of a file can be forced, even without actually being
allowed to do anything on the file.

[Arnd Bergmann] include for PAGE_CACHE_SIZE (used by
MAX_LFS_FILESIZE definition).

Signed-off-by: Miklos Szeredi
Cc:

Miklos Szeredi
2015-12-11 23:30:49 +0800

09 Dec, 2015

2 commits

ed06e0697 ovl: root: copy attr ... Browse Code »

We copy i_uid and i_gid of underlying inode into overlayfs inode. Except
for the root inode.

Fix this omission.

Signed-off-by: Miklos Szeredi
Cc:

Miklos Szeredi
2015-12-09 23:11:59 +0800
6b2553918 replace ->follow_link() with new method that could stay in RCU mode ... Browse Code »

new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.

Signed-off-by: Al Viro

Al Viro
2015-12-09 11:41:54 +0800

07 Dec, 2015

2 commits

0f7ff2dab ovl: get rid of the dead code left from broken (and disabled) optimizations ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2015-12-07 01:31:07 +0800
acff81ec2 ovl: fix permission checking for setattr ... Browse Code »

[Al Viro] The bug is in being too enthusiastic about optimizing ->setattr()
away - instead of "copy verbatim with metadata" + "chmod/chown/utimes"
(with the former being always safe and the latter failing in case of
insufficient permissions) it tries to combine these two. Note that copyup
itself will have to do ->setattr() anyway; _that_ is where the elevated
capabilities are right. Having these two ->setattr() (one to set verbatim
copy of metadata, another to do what overlayfs ->setattr() had been asked
to do in the first place) combined is where it breaks.

Signed-off-by: Miklos Szeredi
Cc:
Signed-off-by: Al Viro

Miklos Szeredi
2015-12-07 01:28:23 +0800

11 Nov, 2015

3 commits

257f87199 ovl: move super block magic number to magic.h ... Browse Code »

The overlayfs file system is not recognized by programs
like tail because the magic number is not in standard header location.

Move it so that the value will propagate on for the GNU library
and utilities. Needs to go in the fstatfs manual page as well.

Signed-off-by: Stephen Hemminger
Signed-off-by: Miklos Szeredi

Stephen Hemminger
2015-11-11 00:13:35 +0800
e4ad29fa0 ovl: use a minimal buffer in ovl_copy_xattr ... Browse Code »

Rather than always allocating the high-order XATTR_SIZE_MAX buffer
which is costly and prone to failure, only allocate what is needed and
realloc if necessary.

Fixes https://github.com/coreos/bugs/issues/489

Signed-off-by: Miklos Szeredi
Cc:

Vito Caputo
2015-11-11 00:08:42 +0800
97daf8b97 ovl: allow zero size xattr ... Browse Code »

When ovl_copy_xattr() encountered a zero size xattr no more xattrs were
copied and the function returned success. This is clearly not the desired
behavior.

Signed-off-by: Miklos Szeredi
Cc:

Miklos Szeredi
2015-11-11 00:08:41 +0800

01 Nov, 2015

1 commit

4bb0fb57f Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs ... Browse Code »

Pull overlayfs bug fixes from Miklos Szeredi:
"This contains fixes for bugs that appeared in earlier kernels (all are
marked for -stable)"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: free lower_mnt array in ovl_put_super
ovl: free stack of paths in ovl_fill_super
ovl: fix open in stacked overlay
ovl: fix dentry reference leak
ovl: use O_LARGEFILE in ovl_copy_up()

Linus Torvalds
2015-11-01 05:49:19 +0800

12 Oct, 2015

6 commits

8d3095f4a ovl: default permissions ... Browse Code »

Add mount option "default_permissions" to alter the way permissions are
calculated.

Without this option and prior to this patch permissions were calculated by
underlying lower or upper filesystem.

With this option the permissions are calculated by overlayfs based on the
file owner, group and mode bits.

This has significance for example when a read-only exported NFS filesystem
is used as a lower layer. In this case the underlying NFS filesystem will
reply with EROFS, in which case all we know is that the filesystem is
read-only. But that's not what we are interested in, we are interested in
whether the access would be allowed if the filesystem wasn't read-only; the
server doesn't tell us that, and would need updating at various levels,
which doesn't seem practicable.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2015-10-12 23:11:44 +0800
5ffdbe8bf ovl: free lower_mnt array in ovl_put_super ... Browse Code »

This fixes memory leak after umount.

Kmemleak report:

unreferenced object 0xffff8800ba791010 (size 8):
comm "mount", pid 2394, jiffies 4294996294 (age 53.920s)
hex dump (first 8 bytes):
20 1c 13 02 00 88 ff ff .......
backtrace:
[] create_object+0x124/0x2c0
[] kmemleak_alloc+0x7b/0xc0
[] __kmalloc+0x106/0x340
[] ovl_fill_super+0x55c/0x9b0 [overlay]
[] mount_nodev+0x54/0xa0
[] ovl_mount+0x18/0x20 [overlay]
[] mount_fs+0x43/0x170
[] vfs_kern_mount+0x74/0x170
[] do_mount+0x22d/0xdf0
[] SyS_mount+0x7b/0xc0
[] entry_SYSCALL_64_fastpath+0x12/0x76
[] 0xffffffffffffffff

Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Miklos Szeredi
Fixes: dd662667e6d3 ("ovl: add mutli-layer infrastructure")
Cc: # v4.0+

Konstantin Khlebnikov
2015-10-12 23:11:44 +0800
0f95502ad ovl: free stack of paths in ovl_fill_super ... Browse Code »

This fixes small memory leak after mount.

Kmemleak report:

unreferenced object 0xffff88003683fe00 (size 16):
comm "mount", pid 2029, jiffies 4294909563 (age 33.380s)
hex dump (first 16 bytes):
20 27 1f bb 00 88 ff ff 40 4b 0f 36 02 88 ff ff '......@K.6....
backtrace:
[] create_object+0x124/0x2c0
[] kmemleak_alloc+0x7b/0xc0
[] __kmalloc+0x106/0x340
[] ovl_fill_super+0x389/0x9a0 [overlay]
[] mount_nodev+0x54/0xa0
[] ovl_mount+0x18/0x20 [overlay]
[] mount_fs+0x43/0x170
[] vfs_kern_mount+0x74/0x170
[] do_mount+0x22d/0xdf0
[] SyS_mount+0x7b/0xc0
[] entry_SYSCALL_64_fastpath+0x12/0x76
[] 0xffffffffffffffff

Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Miklos Szeredi
Fixes: a78d9f0d5d5c ("ovl: support multiple lower layers")
Cc: # v4.0+

Konstantin Khlebnikov
2015-10-12 23:11:43 +0800
1c8a47df3 ovl: fix open in stacked overlay ... Browse Code »

If two overlayfs filesystems are stacked on top of each other, then we need
recursion in ovl_d_select_inode().

I guess d_backing_inode() is supposed to do that. But currently it doesn't
and that functionality is open coded in vfs_open(). This is now copied
into ovl_d_select_inode() to fix this regression.

Reported-by: Alban Crequy
Signed-off-by: Miklos Szeredi
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay...")
Cc: David Howells
Cc: # v4.2+

Miklos Szeredi
2015-10-12 21:56:20 +0800
ab79efab0 ovl: fix dentry reference leak ... Browse Code »

In ovl_copy_up_locked(), newdentry is leaked if the function exits through
out_cleanup as this just to out after calling ovl_cleanup() - which doesn't
actually release the ref on newdentry.

The out_cleanup segment should instead exit through out2 as certainly
newdentry leaks - and possibly upper does also, though this isn't caught
given the catch of newdentry.

Without this fix, something like the following is seen:

BUG: Dentry ffff880023e9eb20{i=f861,n=#ffff880023e82d90} still in use (1) [unmount of tmpfs tmpfs]
BUG: Dentry ffff880023ece640{i=0,n=bigfile} still in use (1) [unmount of tmpfs tmpfs]

when unmounting the upper layer after an error occurred in copyup.

An error can be induced by creating a big file in a lower layer with
something like:

dd if=/dev/zero of=/lower/a/bigfile bs=65536 count=1 seek=$((0xf000))

to create a large file (4.1G). Overlay an upper layer that is too small
(on tmpfs might do) and then induce a copy up by opening it writably.

Reported-by: Ulrich Obergfell
Signed-off-by: David Howells
Signed-off-by: Miklos Szeredi
Cc: # v3.18+

David Howells
2015-10-12 21:56:20 +0800
0480334fa ovl: use O_LARGEFILE in ovl_copy_up() ... Browse Code »

Open the lower file with O_LARGEFILE in ovl_copy_up().

Pass O_LARGEFILE unconditionally in ovl_copy_up_data() as it's purely for
catching 32-bit userspace dealing with a file large enough that it'll be
mishandled if the application isn't aware that there might be an integer
overflow. Inside the kernel, there shouldn't be any problems.

Reported-by: Ulrich Obergfell
Signed-off-by: David Howells
Signed-off-by: Miklos Szeredi
Cc: # v3.18+

David Howells
2015-10-12 21:56:20 +0800

05 Sep, 2015

1 commit

a068acf2e fs: create and use seq_show_option for escaping ... Browse Code »

Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g. new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else. This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
of "sudo" is something more sneaky:

$ BASE="ovl"
$ MNT="$BASE/mnt"
$ LOW="$BASE/lower"
$ UP="$BASE/upper"
$ WORK="$BASE/work/ 0 0
none /proc fuse.pwn user_id=1000"
$ mkdir -p "$LOW" "$UP" "$WORK"
$ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
$ cat /proc/mounts
none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
none /proc fuse.pwn user_id=1000 0 0
$ fusermount -u /proc
$ cat /proc/mounts
cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed. Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook
Acked-by: Serge Hallyn
Acked-by: Jan Kara
Acked-by: Paul Moore
Cc: J. R. Okajima
Signed-off-by: Kees Cook
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2015-09-05 07:54:41 +0800

12 Jul, 2015

1 commit

9391dd00d fix a braino in ovl_d_select_inode() ... Browse Code »

when opening a directory we want the overlayfs inode, not one from
the topmost layer.

Reported-By: Andrey Jr. Melnikov
Tested-By: Andrey Jr. Melnikov
Signed-off-by: Al Viro

Al Viro
2015-07-12 23:22:05 +0800

05 Jul, 2015

1 commit

1dc51b828 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...

Linus Torvalds
2015-07-05 10:36:06 +0800

03 Jul, 2015

1 commit

320cd413f Merge branch 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs ... Browse Code »

Pull overlayfs updates from Miklos Szeredi:
"This relaxes the requirements on the lower layer filesystem: now ones
that implement .d_revalidate, such as NFS, can be used.

Upper layer filesystems still has the "no .d_revalidate" requirement.

Also a bad interaction with jffs2 locking has been fixed"

* 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: lookup whiteouts outside iterate_dir()
ovl: allow distributed fs as lower layer
ovl: don't traverse automount points

Linus Torvalds
2015-07-03 02:23:00 +0800

23 Jun, 2015

1 commit

052b398a4 Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs updates from Al Viro:
"In this pile: pathname resolution rewrite.

- recursion in link_path_walk() is gone.

- nesting limits on symlinks are gone (the only limit remaining is
that the total amount of symlinks is no more than 40, no matter how
nested).

- "fast" (inline) symlinks are handled without leaving rcuwalk mode.

- stack footprint (independent of the nesting) is below kilobyte now,
about on par with what it used to be with one level of nested
symlinks and ~2.8 times lower than it used to be in the worst case.

- struct nameidata is entirely private to fs/namei.c now (not even
opaque pointers are being passed around).

- ->follow_link() and ->put_link() calling conventions had been
changed; all in-tree filesystems converted, out-of-tree should be
able to follow reasonably easily.

For out-of-tree conversions, see Documentation/filesystems/porting
for details (and in-tree filesystems for examples of conversion).

That has sat in -next since mid-May, seems to survive all testing
without regressions and merges clean with v4.1"

* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (131 commits)
turn user_{path_at,path,lpath,path_dir}() into static inlines
namei: move saved_nd pointer into struct nameidata
inline user_path_create()
inline user_path_parent()
namei: trim do_last() arguments
namei: stash dfd and name into nameidata
namei: fold path_cleanup() into terminate_walk()
namei: saner calling conventions for filename_parentat()
namei: saner calling conventions for filename_create()
namei: shift nameidata down into filename_parentat()
namei: make filename_lookup() reject ERR_PTR() passed as name
namei: shift nameidata inside filename_lookup()
namei: move putname() call into filename_lookup()
namei: pass the struct path to store the result down into path_lookupat()
namei: uninline set_root{,_rcu}()
namei: be careful with mountpoint crossings in follow_dotdot_rcu()
Documentation: remove outdated information from automount-support.txt
get rid of assorted nameidata-related debris
lustre: kill unused helper
lustre: kill unused macro (LOOKUP_CONTINUE)
...

Linus Torvalds
2015-06-23 03:51:21 +0800

22 Jun, 2015

2 commits

cdb672795 ovl: lookup whiteouts outside iterate_dir() ... Browse Code »

If jffs2 can deadlock on overlayfs readdir because it takes the same lock
on ->iterate() as in ->lookup().

Fix by moving whiteout checking outside iterate_dir(). Optimized by
collecting potential whiteouts (DT_CHR) in a temporary list and if
non-empty iterating throug these and checking for a 0/0 chardev.

Signed-off-by: Miklos Szeredi
Fixes: 49c21e1cacd7 ("ovl: check whiteout while reading directory")
Reported-by: Roman Yeryomin

Miklos Szeredi
2015-06-22 19:53:48 +0800
7c03b5d45 ovl: allow distributed fs as lower layer ... Browse Code »

Allow filesystems with .d_revalidate as lower layer(s), but not as upper
layer.

For local filesystems the rule was that modifications on the layers
directly while being part of the overlay results in undefined behavior.

This can easily be extended to distributed filesystems: we assume the tree
used as lower layer is static, which means ->d_revalidate() should always
return "1". If that is not the case, return -ESTALE, don't try to work
around the modification.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2015-06-22 19:53:48 +0800