05 Sep, 2015

1 commit

  • Many file systems that implement the show_options hook fail to correctly
    escape their output which could lead to unescaped characters (e.g. new
    lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
    could lead to confusion, spoofed entries (resulting in things like
    systemd issuing false d-bus "mount" notifications), and who knows what
    else. This looks like it would only be the root user stepping on
    themselves, but it's possible weird things could happen in containers or
    in other situations with delegated mount privileges.

    Here's an example using overlay with setuid fusermount trusting the
    contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
    of "sudo" is something more sneaky:

    $ BASE="ovl"
    $ MNT="$BASE/mnt"
    $ LOW="$BASE/lower"
    $ UP="$BASE/upper"
    $ WORK="$BASE/work/ 0 0
    none /proc fuse.pwn user_id=1000"
    $ mkdir -p "$LOW" "$UP" "$WORK"
    $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
    $ cat /proc/mounts
    none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
    none /proc fuse.pwn user_id=1000 0 0
    $ fusermount -u /proc
    $ cat /proc/mounts
    cat: /proc/mounts: No such file or directory

    This fixes the problem by adding new seq_show_option and
    seq_show_option_n helpers, and updating the vulnerable show_option
    handlers to use them as needed. Some, like SELinux, need to be open
    coded due to unusual existing escape mechanisms.

    [akpm@linux-foundation.org: add lost chunk, per Kees]
    [keescook@chromium.org: seq_show_option should be using const parameters]
    Signed-off-by: Kees Cook
    Acked-by: Serge Hallyn
    Acked-by: Jan Kara
    Acked-by: Paul Moore
    Cc: J. R. Okajima
    Signed-off-by: Kees Cook
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
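
The escaping idea behind the fix above can be sketched in user space. This is not the kernel's seq_show_option(); escape_mount_option() is a hypothetical helper, and the escape set (comma, equals sign, space, tab, newline, backslash) is an assumption about which characters would break a /proc/mounts line. Each such character is written as a backslash followed by three octal digits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed escape set: characters that would break the one-line-per-mount,
 * space- and comma-separated layout of /proc/mounts if emitted raw. */
#define OPT_ESCAPE_CHARS ",= \t\n\\"

/* Hypothetical user-space helper (not a kernel API): copy src into dst,
 * replacing each special character with a backslash and three octal digits.
 * Returns the escaped length, or -1 if dst (of size dstsz) is too small. */
int escape_mount_option(const char *src, char *dst, size_t dstsz)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (strchr(OPT_ESCAPE_CHARS, c) != NULL) {
            if (used + 4 >= dstsz)      /* 4 output bytes plus the NUL */
                return -1;
            dst[used++] = '\\';
            dst[used++] = (char)('0' + ((c >> 6) & 7));
            dst[used++] = (char)('0' + ((c >> 3) & 7));
            dst[used++] = (char)('0' + (c & 7));
        } else {
            if (used + 1 >= dstsz)      /* 1 output byte plus the NUL */
                return -1;
            dst[used++] = (char)c;
        }
    }
    if (used >= dstsz)
        return -1;
    dst[used] = '\0';
    return (int)used;
}
```

With escaping of this kind, the newline smuggled into $WORK in the example would be emitted as a literal backslash sequence, so the forged "none /proc fuse.pwn" text could no longer start a line of its own.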
     

12 Jul, 2015

1 commit


05 Jul, 2015

1 commit

  • Pull more vfs updates from Al Viro:
    "Assorted VFS fixes and related cleanups (IMO the most interesting in
    that part are f_path-related things and Eric's descriptor-related
    stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
    fs-cache series, DAX patches, Jan's file_remove_suid() work"

    [ I'd say this is much more than "fixes and related cleanups". The
    file_table locking rule change by Eric Dumazet is a rather big and
    fundamental update even if the patch isn't huge. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
    9p: cope with bogus responses from server in p9_client_{read,write}
    p9_client_write(): avoid double p9_free_req()
    9p: forgetting to cancel request on interrupted zero-copy RPC
    dax: bdev_direct_access() may sleep
    block: Add support for DAX reads/writes to block devices
    dax: Use copy_from_iter_nocache
    dax: Add block size note to documentation
    fs/file.c: __fget() and dup2() atomicity rules
    fs/file.c: don't acquire files->file_lock in fd_install()
    fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
    vfs: avoid creation of inode number 0 in get_next_ino
    namei: make set_root_rcu() return void
    make simple_positive() public
    ufs: use dir_pages instead of ufs_dir_pages()
    pagemap.h: move dir_pages() over there
    remove the pointless include of lglock.h
    fs: cleanup slight list_entry abuse
    xfs: Correctly lock inode when removing suid and file capabilities
    fs: Call security_ops->inode_killpriv on truncate
    fs: Provide function telling whether file_remove_privs() will do anything
    ...

    Linus Torvalds
     

03 Jul, 2015

1 commit

  • Pull overlayfs updates from Miklos Szeredi:
    "This relaxes the requirements on the lower layer filesystem: now ones
    that implement .d_revalidate, such as NFS, can be used.

    Upper layer filesystems still have the "no .d_revalidate" requirement.

    Also a bad interaction with jffs2 locking has been fixed"

    * 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: lookup whiteouts outside iterate_dir()
    ovl: allow distributed fs as lower layer
    ovl: don't traverse automount points

    Linus Torvalds
     

23 Jun, 2015

1 commit

  • Pull vfs updates from Al Viro:
    "In this pile: pathname resolution rewrite.

    - recursion in link_path_walk() is gone.

    - nesting limits on symlinks are gone (the only limit remaining is
    that the total amount of symlinks is no more than 40, no matter how
    nested).

    - "fast" (inline) symlinks are handled without leaving rcuwalk mode.

    - stack footprint (independent of the nesting) is below kilobyte now,
    about on par with what it used to be with one level of nested
    symlinks and ~2.8 times lower than it used to be in the worst case.

    - struct nameidata is entirely private to fs/namei.c now (not even
    opaque pointers are being passed around).

    - ->follow_link() and ->put_link() calling conventions had been
    changed; all in-tree filesystems converted, out-of-tree should be
    able to follow reasonably easily.

    For out-of-tree conversions, see Documentation/filesystems/porting
    for details (and in-tree filesystems for examples of conversion).

    That has sat in -next since mid-May, seems to survive all testing
    without regressions and merges clean with v4.1"

    * 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (131 commits)
    turn user_{path_at,path,lpath,path_dir}() into static inlines
    namei: move saved_nd pointer into struct nameidata
    inline user_path_create()
    inline user_path_parent()
    namei: trim do_last() arguments
    namei: stash dfd and name into nameidata
    namei: fold path_cleanup() into terminate_walk()
    namei: saner calling conventions for filename_parentat()
    namei: saner calling conventions for filename_create()
    namei: shift nameidata down into filename_parentat()
    namei: make filename_lookup() reject ERR_PTR() passed as name
    namei: shift nameidata inside filename_lookup()
    namei: move putname() call into filename_lookup()
    namei: pass the struct path to store the result down into path_lookupat()
    namei: uninline set_root{,_rcu}()
    namei: be careful with mountpoint crossings in follow_dotdot_rcu()
    Documentation: remove outdated information from automount-support.txt
    get rid of assorted nameidata-related debris
    lustre: kill unused helper
    lustre: kill unused macro (LOOKUP_CONTINUE)
    ...

    Linus Torvalds
     

22 Jun, 2015

3 commits

  • jffs2 can deadlock on overlayfs readdir because it takes the same lock
    on ->iterate() as in ->lookup().

    Fix by moving whiteout checking outside iterate_dir(). Optimized by
    collecting potential whiteouts (DT_CHR) in a temporary list and, if it
    is non-empty, iterating through these and checking for a 0/0 chardev.

    Signed-off-by: Miklos Szeredi
    Fixes: 49c21e1cacd7 ("ovl: check whiteout while reading directory")
    Reported-by: Roman Yeryomin

    Miklos Szeredi
     
  • Allow filesystems with .d_revalidate as lower layer(s), but not as upper
    layer.

    For local filesystems the rule was that modifications on the layers
    directly while being part of the overlay results in undefined behavior.

    This can easily be extended to distributed filesystems: we assume the tree
    used as lower layer is static, which means ->d_revalidate() should always
    return "1". If that is not the case, return -ESTALE, don't try to work
    around the modification.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
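
The rule for lower layers stated above can be modelled as a small mapping of return values. This is a hypothetical user-space model, not the overlayfs code: lower_result stands for whatever the lower filesystem's ->d_revalidate() returned, and passing negative errors through unchanged is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: lower_result is 1 when the lower dentry is still
 * valid, 0 when it is stale, and negative on error. */
int ovl_lower_revalidate_result(int lower_result)
{
    if (lower_result < 0)
        return lower_result;    /* assumed: real errors pass through */
    if (lower_result == 0)
        return -ESTALE;         /* lower tree changed: do not work around it */
    return 1;                   /* lower tree is static, as assumed */
}
```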
     
  • NFS and other distributed filesystems may place automount points in the
    tree. Previously overlayfs refused to mount such filesystem types (based
    on the existence of the .d_automount callback), even if the actual export
    didn't have any automount points.

    It cannot be determined in advance whether the filesystem has automount
    points or not. The solution is to allow fs with .d_automount but refuse to
    traverse any automount points encountered.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

19 Jun, 2015

2 commits

  • Make file->f_path always point to the overlay dentry so that the path in
    /proc/pid/fd is correct and to ensure that label-based LSMs have access to the
    overlay as well as the underlay (path-based LSMs probably don't need it).

    Using my union testsuite to set things up, before the patch I see:

    [root@andromeda union-testsuite]# bash 5</mnt/a/foo107
    [root@andromeda union-testsuite]# ls -l /proc/$$/fd/
    ...
    ... 5 -> /a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...

    After the patch:

    [root@andromeda union-testsuite]# bash 5</mnt/a/foo107
    [root@andromeda union-testsuite]# ls -l /proc/$$/fd/
    ...
    ... 5 -> /mnt/a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...

    Note the change in where /proc/$$/fd/5 points to in the ls command. It was
    pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
    (which is correct).

    The inode accessed, however, is the lower layer. The union layer is on device
    25h/37d and the upper layer on 24h/36d.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Call ovl_drop_write() earlier in ovl_dentry_open() before we call vfs_open()
    as we've done the copy up for which we needed the freeze-write lock by that
    point.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

19 May, 2015

1 commit

  • OpenWRT folks reported that overlayfs fails to mount if upper fs is full,
    because workdir can't be created. Workdir creation can fail for various
    other reasons too.

    There's no reason that the mount itself should fail: overlayfs can work
    fine without a workdir, as long as the overlay isn't modified.

    So mount it read-only and don't allow remounting read-write.

    Add a couple of WARN_ON()s for the impossible case of workdir being used
    despite being read-only.

    Reported-by: Bastian Bittorf
    Signed-off-by: Miklos Szeredi
    Cc: # v3.18+

    Miklos Szeredi
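
The policy in the commit above (mount read-only when there is no workdir, and refuse a later read-write remount) can be sketched as two small checks. The struct and function names are hypothetical stand-ins for the overlayfs superblock state, and the -EROFS error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/mount.h>   /* MS_RDONLY */

/* Hypothetical stand-in for per-mount overlay state. */
struct ovl_state {
    bool has_workdir;    /* false if workdir creation failed at mount time */
};

/* Flags the initial mount ends up with: forced read-only without a workdir. */
unsigned long ovl_initial_flags(const struct ovl_state *s, unsigned long requested)
{
    if (!s->has_workdir)
        return requested | MS_RDONLY;
    return requested;
}

/* Remount check: a read-write remount is refused without a workdir. */
int ovl_remount_check(const struct ovl_state *s, unsigned long flags)
{
    bool want_rw = !(flags & MS_RDONLY);

    if (want_rw && !s->has_workdir)
        return -EROFS;   /* assumed error code */
    return 0;
}
```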
     

14 May, 2015

1 commit

  • When removing an opaque directory we can't just call rmdir() to check for
    emptiness, because the directory will need to be replaced with a whiteout.
    The replacement is done with RENAME_EXCHANGE, which doesn't check
    emptiness.

    The solution is simply to check emptiness by reading the directory. In the
    future we could add a new rename flag to check for emptiness even for
    RENAME_EXCHANGE to optimize this case.

    Reported-by: Vincent Batts
    Signed-off-by: Miklos Szeredi
    Tested-by: Jordi Pujol Palomer
    Fixes: 263b4a0fee43 ("ovl: dont replace opaque dir")
    Cc: # v4.0+

    Miklos Szeredi
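
Checking emptiness by reading the directory, as the commit above describes, looks roughly like this in user space. dir_is_empty() is a hypothetical helper; it ignores only "." and "..", whereas an overlay-aware check would presumably also have to account for whiteout entries.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the directory holds nothing besides
 * "." and "..", 0 if it holds anything else, -1 if it cannot be opened. */
int dir_is_empty(const char *path)
{
    struct dirent *de;
    int empty = 1;
    DIR *dir;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        empty = 0;       /* the first real entry settles the question */
        break;
    }
    closedir(dir);
    return empty;
}
```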
     

11 May, 2015

4 commits

  • only one instance looks at that argument at all; that sole
    exception wants inode rather than dentry.

    Signed-off-by: Al Viro

    Al Viro
     
  • its only use is getting passed to nd_jump_link(), which can obtain
    it from current->nameidata

    Signed-off-by: Al Viro

    Al Viro
     
  • a) instead of storing the symlink body (via nd_set_link()) and returning
    an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
    that opaque pointer (into void * passed by address by caller) and returns
    the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic
    symlinks) and pointer to symlink body for normal symlinks. Stored pointer
    is ignored in all cases except the last one.

    Storing NULL for opaque pointer (or not storing it at all) means no call
    of ->put_link().

    b) the body used to be passed to ->put_link() implicitly (via nameidata).
    Now only the opaque pointer is. In the cases when we used the symlink body
    to free stuff, ->follow_link() now should store it as opaque pointer in addition
    to returning it.

    Signed-off-by: Al Viro

    Al Viro
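
The new ->follow_link()/->put_link() convention from the commit above can be modelled in user space as follows. All names here (fake_inode, fake_follow_link, fake_put_link, resolve_len) are hypothetical, and a static sentinel stands in for ERR_PTR(). The sketch shows the case from point (b): the body itself is stored as the opaque pointer so that the put side can free it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an inode holding a symlink target. */
struct fake_inode {
    const char *target;
};

/* Sentinel used instead of ERR_PTR() in this user-space model. */
static const char fake_link_error[] = "";

/* New-style follow: returns the symlink body and stores an opaque cookie
 * through *cookie. On failure returns the error sentinel and stores nothing. */
const char *fake_follow_link(const struct fake_inode *inode, void **cookie)
{
    size_t len = strlen(inode->target) + 1;
    char *body = malloc(len);

    if (body == NULL)
        return fake_link_error;
    memcpy(body, inode->target, len);
    *cookie = body;      /* body doubles as the cookie so put can free it */
    return body;
}

/* New-style put: receives only the opaque cookie, never the body. */
void fake_put_link(void *cookie)
{
    free(cookie);
}

/* Caller side: a NULL cookie means the put step is skipped entirely. */
size_t resolve_len(const struct fake_inode *inode)
{
    void *cookie = NULL;
    const char *body = fake_follow_link(inode, &cookie);
    size_t len = (body != fake_link_error) ? strlen(body) : 0;

    if (cookie != NULL)
        fake_put_link(cookie);
    return len;
}
```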
     
  • ovl_follow_link currently calls ->put_link on an error path.
    However ->put_link is about to change in a way that it will be
    impossible to call it from ovl_follow_link.

    So rearrange the code to avoid the need for that error path.
    Specifically: move the kmalloc() call before the ->follow_link()
    call to the subordinate filesystem.

    Signed-off-by: NeilBrown
    Signed-off-by: Al Viro

    NeilBrown
     

18 Mar, 2015

3 commits

  • After importing multi-lower layer support, users could mount a r/o
    partition as the leftmost lowerdir instead of using it as upperdir.
    A r/o upperdir may cause an error like

    overlayfs: failed to create directory ./workdir/work

    during mount.

    This patch checks the *s_flags* of the upper fs and returns an error if
    it is a r/o partition. The checking of *upper_mnt->mnt_sb->s_flags*
    can be removed now.

    This patch also removes

    /* FIXME: workdir is not needed for a R/O mount */

    from ovl_fill_super() because:

    1) for upper fs r/o case
    Setting a r/o partition as upper is prevented, no need to care about
    workdir in this case.

    2) for "mount overlay -o ro" with a r/w upper fs case
    Users could remount overlayfs to r/w in this case, so workdir should
    not be omitted.

    Signed-off-by: hujianyang
    Signed-off-by: Miklos Szeredi

    hujianyang
     
  • The recent multi-lower layer mount support allows upperdir and workdir
    to be omitted, which means overlayfs can be mounted with only one
    lowerdir directory. This makes no sense and is potentially risky.

    This patch checks the total number of lower directories to prevent
    mounting overlayfs with only one directory.

    Also, an error message is added to indicate that the lower directories
    exceed the OVL_MAX_STACK limit.

    Signed-off-by: hujianyang
    Signed-off-by: Miklos Szeredi

    hujianyang
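
The layer-count check from the commit above can be sketched as follows. This is a hypothetical user-space model: the value of OVL_MAX_STACK, the backslash-escape handling when splitting the lowerdir string, and the -EINVAL return code are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define OVL_MAX_STACK 500   /* limit named in the commit; value assumed here */

/* Count colon-separated entries in a lowerdir= value. A backslash escapes
 * the next character, so "\:" does not split (assumed behaviour). */
static size_t count_lowerdirs(const char *lowerdir)
{
    size_t count = 1;
    const char *p;

    if (lowerdir == NULL || *lowerdir == '\0')
        return 0;
    for (p = lowerdir; *p != '\0'; p++) {
        if (*p == '\\' && p[1] != '\0')
            p++;                 /* skip the escaped character */
        else if (*p == ':')
            count++;
    }
    return count;
}

/* Returns 0 if the layer setup is acceptable, -EINVAL otherwise. */
int ovl_check_layers(const char *lowerdir, bool have_upper)
{
    size_t n = count_lowerdirs(lowerdir);

    if (n == 0 || n > OVL_MAX_STACK)
        return -EINVAL;          /* no lower layer, or too many */
    if (!have_upper && n == 1)
        return -EINVAL;          /* a single directory is not an overlay */
    return 0;
}
```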
     
  • Overlayfs should print an error message when it encounters an
    incorrect mount option, as other filesystems do.

    After this patch, an improper option is clearly reported.

    Reported-by: Fabian Sturm
    Signed-off-by: hujianyang
    Signed-off-by: Miklos Szeredi

    hujianyang
     

23 Feb, 2015

1 commit

  • Convert the following where appropriate:

    (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

    (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

    (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
    complicated than it appears as some calls should be converted to
    d_can_lookup() instead. The difference is whether the directory in
    question is a real dir with a ->lookup op or whether it's a fake dir with
    a ->d_automount op.

    In some circumstances, we can subsume checks for dentry->d_inode not being
    NULL into this, provided the code isn't in a filesystem that expects
    d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
    use d_inode() rather than d_backing_inode() to get the inode pointer).

    Note that the dentry type field may be set to something other than
    DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
    manages the fall-through from a negative dentry to a lower layer. In such a
    case, the dentry type of the negative union dentry is set to the same as the
    type of the lower dentry.

    However, if you know d_inode is not NULL at the call site, then you can use
    the d_is_xxx() functions even in a filesystem.

    There is one further complication: a 0,0 chardev dentry may be labelled
    DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
    intended for special directory entry types that don't have attached inodes.

    The following perl+coccinelle script was used:

    use strict;

    my @callers;
    open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
        die "Can't grep for S_ISDIR and co. callers";
    @callers = <$fd>;
    close($fd);
    unless (@callers) {
    print "No matches\n";
    exit(0);
    }

    my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

    my $coccifile = "tmp.sp.cocci";
    open($fd, ">$coccifile") || die $coccifile;
    print($fd "$_\n") || die $coccifile foreach (@cocci);
    close($fd);

    foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
    die "spatch failed";
    }

    [AV: overlayfs parts skipped]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
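
The distinction drawn in point (3) of the commit above, between a real directory and a fake automount directory, can be illustrated with a toy model. The enum values and function names below are hypothetical and only mirror the description in the message; they are not the kernel's dentry flags or helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dentry type tags, following the message's description. */
enum fake_dentry_type {
    FAKE_MISS,        /* negative dentry */
    FAKE_WHITEOUT,    /* special entry without an attached inode */
    FAKE_DIRECTORY,   /* real dir with a ->lookup op */
    FAKE_AUTODIR,     /* fake dir with a ->d_automount op */
    FAKE_REGULAR,
    FAKE_SYMLINK,
    FAKE_SPECIAL
};

struct fake_dentry {
    enum fake_dentry_type type;
};

/* True only for directories that can actually be looked up in. */
bool fake_d_can_lookup(const struct fake_dentry *d)
{
    return d->type == FAKE_DIRECTORY;
}

/* True for anything directory-like, including automount points. */
bool fake_d_is_dir(const struct fake_dentry *d)
{
    return d->type == FAKE_DIRECTORY || d->type == FAKE_AUTODIR;
}
```

A call site that goes on to walk into the directory wants the stricter check, while a call site that only asks "is this a directory?" wants the broader one.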
     

20 Feb, 2015

1 commit


09 Jan, 2015

1 commit

  • Since the ovl_dir_cache is stable while a directory is being read, the
    cursor of struct ovl_dir_file doesn't need to be an independent entry in
    the list of a merged directory.

    This patch changes *cursor* to a pointer which points to the entry in the
    ovl_dir_cache. After this, we don't need to check *is_cursor* either.

    Signed-off-by: hujianyang
    Signed-off-by: Miklos Szeredi

    hujianyang
     

08 Jan, 2015

3 commits

  • Overlayfs should be mounted read-only when upper-fs is read-only or nonexistent.
    But currently it can be remounted read-write, which can cause a kernel panic.
    So we should prevent a read-write remount in those situations.

    Signed-off-by: Seunghun Lee
    Signed-off-by: Miklos Szeredi

    Seunghun Lee
     
  • The current multi-layer overlayfs support has a regression in
    .lookup(). If a merged directory contains a directory in upperdir
    and a regular file of the same name in lowerdir, the old code hid
    the lower file and treated the upper directory as opaque. The
    present code no longer does so.

    In the lowerdir lookup path, if the inode found is not a directory,
    the type check against the previous inode is missing, so this inode
    is copied directly to the lowerstack of the ovl_entry.

    That will lead to several wrong conditions, for example,
    the reading of the directory in upperdir may return an error
    like:

    ls: reading directory .: Not a directory

    This patch makes the lowerdir lookup path check the opaque flag
    for non-directory files too.

    Signed-off-by: hujianyang
    Signed-off-by: Miklos Szeredi

    hujianyang
     
  • The function ovl_fill_super() in the recent multi-layer support
    version incorrectly returns 0 on an error handling path, which
    then causes a kernel panic.

    This failure can be reproduced by mounting an overlayfs with
    upperdir and workdir in different mounts.

    Also, if the memory allocation of *lower_mnt* fails, this
    function may return zero as well.

    This patch fixes the problem by setting *err* to the proper error
    number before jumping to the error handling path.

    Signed-off-by: hujianyang
    Signed-off-by: Miklos Szeredi

    hujianyang
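
The bug class fixed in the commit above is a goto-based error path that is reached while the error variable still holds 0. A minimal sketch of the corrected pattern, assuming hypothetical names and error codes (fake_fill_super, -EINVAL, -ENOMEM) rather than the actual ovl_fill_super() code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: same_mount says whether upperdir and workdir share
 * a mount, alloc_ok simulates whether the allocation succeeds. */
int fake_fill_super(bool same_mount, bool alloc_ok)
{
    void *lower = NULL;
    int err;

    err = -EINVAL;           /* set before the check, so "goto out" is safe */
    if (!same_mount)
        goto out;

    err = -ENOMEM;           /* likewise before the allocation check */
    lower = alloc_ok ? malloc(16) : NULL;
    if (lower == NULL)
        goto out;

    err = 0;                 /* only now is success reported */
out:
    free(lower);             /* simplified: the model keeps nothing alive */
    return err;
}
```

Assigning the error code immediately before each check means that no jump to the cleanup label can report success by accident.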
     

13 Dec, 2014

15 commits