05 Sep, 2015
1 commit
-
Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g. new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else. This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
of "sudo" is something more sneaky:$ BASE="ovl"
$ MNT="$BASE/mnt"
$ LOW="$BASE/lower"
$ UP="$BASE/upper"
$ WORK="$BASE/work/ 0 0
none /proc fuse.pwn user_id=1000"
$ mkdir -p "$LOW" "$UP" "$WORK"
$ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
$ cat /proc/mounts
none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
none /proc fuse.pwn user_id=1000 0 0
$ fusermount -u /proc
$ cat /proc/mounts
cat: /proc/mounts: No such file or directoryThis fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed. Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook
Acked-by: Serge Hallyn
Acked-by: Jan Kara
Acked-by: Paul Moore
Cc: J. R. Okajima
Signed-off-by: Kees Cook
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Jul, 2015
1 commit
-
when opening a directory we want the overlayfs inode, not one from
the topmost layer.Reported-By: Andrey Jr. Melnikov
Tested-By: Andrey Jr. Melnikov
Signed-off-by: Al Viro
05 Jul, 2015
1 commit
-
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
03 Jul, 2015
1 commit
-
Pull overlayfs updates from Miklos Szeredi:
"This relaxes the requirements on the lower layer filesystem: now ones
that implement .d_revalidate, such as NFS, can be used.Upper layer filesystems still has the "no .d_revalidate" requirement.
Also a bad interaction with jffs2 locking has been fixed"
* 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: lookup whiteouts outside iterate_dir()
ovl: allow distributed fs as lower layer
ovl: don't traverse automount points
23 Jun, 2015
1 commit
-
Pull vfs updates from Al Viro:
"In this pile: pathname resolution rewrite.- recursion in link_path_walk() is gone.
- nesting limits on symlinks are gone (the only limit remaining is
that the total amount of symlinks is no more than 40, no matter how
nested).- "fast" (inline) symlinks are handled without leaving rcuwalk mode.
- stack footprint (independent of the nesting) is below kilobyte now,
about on par with what it used to be with one level of nested
symlinks and ~2.8 times lower than it used to be in the worst case.- struct nameidata is entirely private to fs/namei.c now (not even
opaque pointers are being passed around).- ->follow_link() and ->put_link() calling conventions had been
changed; all in-tree filesystems converted, out-of-tree should be
able to follow reasonably easily.For out-of-tree conversions, see Documentation/filesystems/porting
for details (and in-tree filesystems for examples of conversion).That has sat in -next since mid-May, seems to survive all testing
without regressions and merges clean with v4.1"* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (131 commits)
turn user_{path_at,path,lpath,path_dir}() into static inlines
namei: move saved_nd pointer into struct nameidata
inline user_path_create()
inline user_path_parent()
namei: trim do_last() arguments
namei: stash dfd and name into nameidata
namei: fold path_cleanup() into terminate_walk()
namei: saner calling conventions for filename_parentat()
namei: saner calling conventions for filename_create()
namei: shift nameidata down into filename_parentat()
namei: make filename_lookup() reject ERR_PTR() passed as name
namei: shift nameidata inside filename_lookup()
namei: move putname() call into filename_lookup()
namei: pass the struct path to store the result down into path_lookupat()
namei: uninline set_root{,_rcu}()
namei: be careful with mountpoint crossings in follow_dotdot_rcu()
Documentation: remove outdated information from automount-support.txt
get rid of assorted nameidata-related debris
lustre: kill unused helper
lustre: kill unused macro (LOOKUP_CONTINUE)
...
22 Jun, 2015
3 commits
-
If jffs2 can deadlock on overlayfs readdir because it takes the same lock
on ->iterate() as in ->lookup().Fix by moving whiteout checking outside iterate_dir(). Optimized by
collecting potential whiteouts (DT_CHR) in a temporary list and if
non-empty iterating throug these and checking for a 0/0 chardev.Signed-off-by: Miklos Szeredi
Fixes: 49c21e1cacd7 ("ovl: check whiteout while reading directory")
Reported-by: Roman Yeryomin -
Allow filesystems with .d_revalidate as lower layer(s), but not as upper
layer.For local filesystems the rule was that modifications on the layers
directly while being part of the overlay results in undefined behavior.This can easily be extended to distributed filesystems: we assume the tree
used as lower layer is static, which means ->d_revalidate() should always
return "1". If that is not the case, return -ESTALE, don't try to work
around the modification.Signed-off-by: Miklos Szeredi
-
NFS and other distributed filesystems may place automount points in the
tree. Previoulsy overlayfs refused to mount such filesystems types (based
on the existence of the .d_automount callback), even if the actual export
didn't have any automount points.It cannot be determined in advance whether the filesystem has automount
points or not. The solution is to allow fs with .d_automount but refuse to
traverse any automount points encountered.Signed-off-by: Miklos Szeredi
19 Jun, 2015
2 commits
-
Make file->f_path always point to the overlay dentry so that the path in
/proc/pid/fd is correct and to ensure that label-based LSMs have access to the
overlay as well as the underlay (path-based LSMs probably don't need it).Using my union testsuite to set things up, before the patch I see:
[root@andromeda union-testsuite]# bash 5 /a/foo107
[root@andromeda union-testsuite]# stat /mnt/a/foo107
...
Device: 23h/35d Inode: 13381 Links: 1
...
[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
...
Device: 23h/35d Inode: 13381 Links: 1
...After the patch:
[root@andromeda union-testsuite]# bash 5 /mnt/a/foo107
[root@andromeda union-testsuite]# stat /mnt/a/foo107
...
Device: 23h/35d Inode: 40346 Links: 1
...
[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
...
Device: 23h/35d Inode: 40346 Links: 1
...Note the change in where /proc/$$/fd/5 points to in the ls command. It was
pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
(which is correct).The inode accessed, however, is the lower layer. The union layer is on device
25h/37d and the upper layer on 24h/36d.Signed-off-by: David Howells
Signed-off-by: Al Viro -
Call ovl_drop_write() earlier in ovl_dentry_open() before we call vfs_open()
as we've done the copy up for which we needed the freeze-write lock by that
point.Signed-off-by: David Howells
Signed-off-by: Al Viro
19 May, 2015
1 commit
-
OpenWRT folks reported that overlayfs fails to mount if upper fs is full,
because workdir can't be created. Wordir creation can fail for various
other reasons too.There's no reason that the mount itself should fail, overlayfs can work
fine without a workdir, as long as the overlay isn't modified.So mount it read-only and don't allow remounting read-write.
Add a couple of WARN_ON()s for the impossible case of workdir being used
despite being read-only.Reported-by: Bastian Bittorf
Signed-off-by: Miklos Szeredi
Cc: # v3.18+
14 May, 2015
1 commit
-
When removing an opaque directory we can't just call rmdir() to check for
emptiness, because the directory will need to be replaced with a whiteout.
The replacement is done with RENAME_EXCHANGE, which doesn't check
emptiness.Solution is just to check emptiness by reading the directory. In the
future we could add a new rename flag to check for emptiness even for
RENAME_EXCHANGE to optimize this case.Reported-by: Vincent Batts
Signed-off-by: Miklos Szeredi
Tested-by: Jordi Pujol Palomer
Fixes: 263b4a0fee43 ("ovl: dont replace opaque dir")
Cc: # v4.0+
11 May, 2015
4 commits
-
only one instance looks at that argument at all; that sole
exception wants inode rather than dentry.Signed-off-by: Al Viro
-
its only use is getting passed to nd_jump_link(), which can obtain
it from current->nameidataSigned-off-by: Al Viro
-
a) instead of storing the symlink body (via nd_set_link()) and returning
an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
that opaque pointer (into void * passed by address by caller) and returns
the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic
symlinks) and pointer to symlink body for normal symlinks. Stored pointer
is ignored in all cases except the last one.Storing NULL for opaque pointer (or not storing it at all) means no call
of ->put_link().b) the body used to be passed to ->put_link() implicitly (via nameidata).
Now only the opaque pointer is. In the cases when we used the symlink body
to free stuff, ->follow_link() now should store it as opaque pointer in addition
to returning it.Signed-off-by: Al Viro
-
ovl_follow_link current calls ->put_link on an error path.
However ->put_link is about to change in a way that it will be
impossible to call it from ovl_follow_link.So rearrange the code to avoid the need for that error path.
Specifically: move the kmalloc() call before the ->follow_link()
call to the subordinate filesystem.Signed-off-by: NeilBrown
Signed-off-by: Al Viro
18 Mar, 2015
3 commits
-
After importing multi-lower layer support, users could mount a r/o
partition as the left most lowerdir instead of using it as upperdir.
And a r/o upperdir may cause an error likeoverlayfs: failed to create directory ./workdir/work
during mount.
This patch check the *s_flags* of upper fs and return an error if
it is a r/o partition. The checking of *upper_mnt->mnt_sb->s_flags*
can be removed now.This patch also remove
/* FIXME: workdir is not needed for a R/O mount */
from ovl_fill_super() because:
1) for upper fs r/o case
Setting a r/o partition as upper is prevented, no need to care about
workdir in this case.2) for "mount overlay -o ro" with a r/w upper fs case
Users could remount overlayfs to r/w in this case, so workdir should
not be omitted.Signed-off-by: hujianyang
Signed-off-by: Miklos Szeredi -
Recently multi-lower layer mount support allow upperdir and workdir
to be omitted, then cause overlayfs can be mount with only one
lowerdir directory. This action make no sense and have potential risk.This patch check the total number of lower directories to prevent
mounting overlayfs with only one directory.Also, an error message is added to indicate lower directories exceed
OVL_MAX_STACK limit.Signed-off-by: hujianyang
Signed-off-by: Miklos Szeredi -
Overlayfs should print an error message if an incorrect mount option
is caught like other filesystems.After this patch, improper option input could be clearly known.
Reported-by: Fabian Sturm
Signed-off-by: hujianyang
Signed-off-by: Miklos Szeredi
23 Feb, 2015
1 commit
-
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.The following perl+coccinelle script was used:
use strict;
my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = ;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}[AV: overlayfs parts skipped]
Signed-off-by: David Howells
Signed-off-by: Al Viro
20 Feb, 2015
1 commit
-
…szeredi/vfs into for-next
09 Jan, 2015
1 commit
-
Since the ovl_dir_cache is stable during a directory reading, the cursor
of struct ovl_dir_file don't need to be an independent entry in the list
of a merged directory.This patch changes *cursor* to a pointer which points to the entry in the
ovl_dir_cache. After this, we don't need to check *is_cursor* either.Signed-off-by: hujianyang
Signed-off-by: Miklos Szeredi
08 Jan, 2015
3 commits
-
Overlayfs should be mounted read-only when upper-fs is read-only or nonexistent.
But now it can be remounted read-write and this can cause kernel panic.
So we should prevent read-write remount when the above situation happens.Signed-off-by: Seunghun Lee
Signed-off-by: Miklos Szeredi -
Current multi-layer support overlayfs has a regression in
.lookup(). If there is a directory in upperdir and a regular
file has same name in lowerdir in a merged directory, lower
file is hidden and upper directory is set to opaque in former
case. But it is changed in present code.In lowerdir lookup path, if a found inode is not directory,
the type checking of previous inode is missing. This inode
will be copied to the lowerstack of ovl_entry directly.That will lead to several wrong conditions, for example,
the reading of the directory in upperdir may return an error
like:ls: reading directory .: Not a directory
This patch makes the lowerdir lookup path check the opaque
for non-directory file too.Signed-off-by: hujianyang
Signed-off-by: Miklos Szeredi -
The function ovl_fill_super() in recently multi-layer support
version will incorrectly return 0 at error handling path and
then cause kernel panic.This failure can be reproduced by mounting a overlayfs with
upperdir and workdir in different mounts.And also, If the memory allocation of *lower_mnt* fail, this
function may return an zero either.This patch fix this problem by setting *err* to proper error
number before jumping to error handling path.Signed-off-by: hujianyang
Signed-off-by: Miklos Szeredi
13 Dec, 2014
15 commits
-
This patch adds two macros:
OVL_XATTR_PRE_NAME and OVL_XATTR_PRE_LEN
to present ovl_xattr name prefix and its length. Also, a
new macro OVL_XATTR_OPAQUE is introduced to replace old
*ovl_opaque_xattr*.Fix the length of "trusted.overlay." to *16*.
Signed-off-by: hujianyang
Signed-off-by: Miklos Szeredi -
This patch removes redundant blanks lines in overlayfs.
Signed-off-by: hujianyang
Signed-off-by: Miklos Szeredi -
Allow "lowerdir=" option to contain multiple lower directories separated by
a colon (e.g. "lowerdir=/bin:/usr/bin"). Colon characters in filenames can
be escaped with a backslash.Signed-off-by: Miklos Szeredi
-
Make "upperdir=" mount option optional. If "upperdir=" is not given, then
the "workdir=" option is also optional (and ignored if given).Signed-off-by: Miklos Szeredi
-
Move common checks into ovl_mount_dir() helper.
Create helper for looking up lower directories.
Signed-off-by: Miklos Szeredi
-
Move allocation of root entry above to where it's needed.
Move initializations related to upperdir and workdir near each other.
Signed-off-by: Miklos Szeredi
-
Handle "no upper layer" case in statfs.
Signed-off-by: Miklos Szeredi
-
"Suppose you have in one of the lower layers a filesystem with
->lookup()-enforced upper limit on name length. Pretty much every local fs
has one, but... they are not all equal. 255 characters is the common upper
limit, but e.g. jffs2 stops at 254, minixfs upper limit is somewhere from
14 to 60, depending upon version, etc. You are doing a lookup for
something that is present in upper layer, but happens to be too long for
one of the lower layers. Too bad - ENAMETOOLONG for you..."Reported-by: Al Viro
Signed-off-by: Miklos Szeredi -
Not checking whiteouts on lowest layer was an optimization (there's nothing
to white out there), but it could result in inconsitent behavior when a
layer previously used as upper/middle is later used as lowest.Signed-off-by: Miklos Szeredi
-
Look up dentry in all relevant layers.
Signed-off-by: Miklos Szeredi
-
If multiple lower layers exist, merge them as well in readdir according to
the same rules as merging upper with lower. I.e. take whiteouts and opaque
directories into account on all but the lowers layer.Signed-off-by: Miklos Szeredi
-
Add helper to iterate through all the layers, starting from the upper layer
(if exists) and continuing down through the lower layers.Signed-off-by: Miklos Szeredi
-
Add multiple lower layers to 'struct ovl_fs' and 'struct ovl_entry'.
ovl_entry will have an array of paths, instead of just the dentry. This
allows a compact array containing just the layers which exist at current
point in the tree (which is expected to be a small number for the majority
of dentries).The number of layers is not limited by this infrastructure.
Signed-off-by: Miklos Szeredi
-
When removing an empty opaque directory, then it makes no sense to replace
it with an exact replica of itself before removal.Signed-off-by: Miklos Szeredi
-
OVL_PATH_PURE_UPPER -> __OVL_PATH_UPPER | __OVL_PATH_PURE
OVL_PATH_UPPER -> __OVL_PATH_UPPER
OVL_PATH_MERGE -> __OVL_PATH_UPPER | __OVL_PATH_MERGE
OVL_PATH_LOWER -> 0Multiple R/O layers will allow __OVL_PATH_MERGE without __OVL_PATH_UPPER.
Signed-off-by: Miklos Szeredi