29 Nov, 2016

1 commit

  • Handling of recursion in d_real() is completely broken. Recursion is only
    done in the 'inode != NULL' case. But when opening the file we have
    'inode == NULL' hence d_real() will return an overlay dentry. This won't
    work since overlayfs doesn't define its own file operations, so all file
    ops will fail.

    Fix by doing the recursion first and the check against the inode second.

    Bash script to reproduce the issue written by Quentin:

    - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
    tmpdir=$(mktemp -d)
    pushd ${tmpdir}

    mkdir -p {upper,lower,work}
    echo -n 'rocks' > lower/ksplice
    mount -t overlay level_zero upper -o lowerdir=lower,upperdir=upper,workdir=work
    cat upper/ksplice

    tmpdir2=$(mktemp -d)
    pushd ${tmpdir2}

    mkdir -p {upper,work}
    mount -t overlay level_one upper -o lowerdir=${tmpdir}/upper,upperdir=upper,workdir=work
    ls -l upper/ksplice
    cat upper/ksplice
    - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -

    Reported-by: Quentin Casasnovas
    Signed-off-by: Miklos Szeredi
    Fixes: 2d902671ce1c ("vfs: merge .d_select_inode() into .d_real()")
    Cc: # v4.8+

    Miklos Szeredi
     

31 Oct, 2016

1 commit

  • This change fixes xfstest generic/375, which failed to clear the
    setgid bit in the following test case on overlayfs:

    touch $testfile
    chown 100:100 $testfile
    chmod 2755 $testfile
    _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile

    Reported-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Tested-by: Amir Goldstein
    Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
    Cc: # v4.8

    Miklos Szeredi
     

15 Oct, 2016

3 commits

  • Pull more misc uaccess and vfs updates from Al Viro:
    "The rest of the stuff from -next (more uaccess work) + assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    score: traps: Add missing include file to fix build error
    fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths
    fs/super.c: fix race between freeze_super() and thaw_super()
    overlayfs: Fix setting IOP_XATTR flag
    iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector()
    blackfin: no access_ok() for __copy_{to,from}_user()
    arm64: don't zero in __copy_from_user{,_inatomic}
    arm: don't zero in __copy_from_user_inatomic()/__copy_from_user()
    arc: don't leak bits of kernel stack into coredump
    alpha: get rid of tail-zeroing in __copy_user()

    Linus Torvalds
     
  • Pull overlayfs updates from Miklos Szeredi:
    "This update contains fixes to the "use mounter's permission to access
    underlying layers" area, and miscellaneous other fixes and cleanups.

    No new features this time"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: use vfs_get_link()
    vfs: add vfs_get_link() helper
    ovl: use generic_readlink
    ovl: explain error values when removing acl from workdir
    ovl: Fix info leak in ovl_lookup_temp()
    ovl: during copy up, switch to mounter's creds early
    ovl: lookup: do getxattr with mounter's permission
    ovl: copy_up_xattr(): use strnlen

    Linus Torvalds
     
  • ovl_fill_super calls ovl_new_inode to create a root inode for the new
    superblock before initializing sb->s_xattr. This wrongly causes
    IOP_XATTR to be cleared in i_opflags of the new inode, causing SELinux
    to log the following message:

    SELinux: (dev overlay, type overlay) has no xattr support

    Fix this by initializing sb->s_xattr and similar fields before calling
    ovl_new_inode.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Vivek Goyal
     

14 Oct, 2016

1 commit


11 Oct, 2016

1 commit

  • Pull vfs xattr updates from Al Viro:
    "xattr stuff from Andreas

    This completes the switch to xattr_handler ->get()/->set() from
    ->getxattr/->setxattr/->removexattr"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Remove {get,set,remove}xattr inode operations
    xattr: Stop calling {get,set,remove}xattr inode operations
    vfs: Check for the IOP_XATTR flag in listxattr
    xattr: Add __vfs_{get,set,remove}xattr helpers
    libfs: Use IOP_XATTR flag for empty directory handling
    vfs: Use IOP_XATTR flag for bad-inode handling
    vfs: Add IOP_XATTR inode operations flag
    vfs: Move xattr_resolve_name to the front of fs/xattr.c
    ecryptfs: Switch to generic xattr handlers
    sockfs: Get rid of getxattr iop
    sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
    kernfs: Switch to generic xattr handlers
    hfs: Switch to generic xattr handlers
    jffs2: Remove jffs2_{get,set,remove}xattr macros
    xattr: Remove unnecessary NULL attribute name check

    Linus Torvalds
     

08 Oct, 2016

1 commit

  • Right now, various places in the kernel check for the existence of
    getxattr, setxattr, and removexattr inode operations and directly call
    those operations. Switch to helper functions and test for the IOP_XATTR
    flag instead.

    Signed-off-by: Andreas Gruenbacher
    Acked-by: James Morris
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

16 Sep, 2016

2 commits

  • The getxattr() in ovl_is_opaquedir() was missed when converting all
    operations on underlying fs to be done under mounter's permission.

    This patch fixes this by moving the ovl_override_creds()/revert_creds() out
    from ovl_lookup_real() to ovl_lookup().

    Also convert to using vfs_getxattr() instead of directly calling
    i_op->getxattr().

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • This patch allows flock, posix locks, ofd locks and leases to work
    correctly on overlayfs.

    Instead of using the underlying inode for storing lock context use the
    overlay inode. This allows locks to be persistent across copy-up.

    This is done by introducing locks_inode() helper and using it instead of
    file_inode() to get the inode in locking code. For non-overlayfs the two
    are equivalent, except for an extra pointer dereference in locks_inode().

    Since lock operations are in "struct file_operations" we must also make
    sure not to call underlying filesystem's lock operations. Introcude a
    super block flag MS_NOREMOTELOCK to this effect.

    Signed-off-by: Miklos Szeredi
    Acked-by: Jeff Layton
    Cc: "J. Bruce Fields"

    Miklos Szeredi
     

05 Sep, 2016

1 commit

  • Workdir creation fails in latest kernel.

    Fix by allowing EOPNOTSUPP as a valid return value from
    vfs_removexattr(XATTR_NAME_POSIX_ACL_*). Upper filesystem may not support
    ACL and still be perfectly able to support overlayfs.

    Reported-by: Martin Ziegler
    Signed-off-by: Miklos Szeredi
    Fixes: c11b9fdd6a61 ("ovl: remove posix_acl_default from workdir")
    Cc:

    Miklos Szeredi
     

01 Sep, 2016

7 commits

  • Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use
    those same handlers for iop->getxattr as well.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Miklos Szeredi

    Andreas Gruenbacher
     
  • Setting POSIX acl may also modify the file mode, so need to copy that up to
    the overlay inode.

    Reported-by: Eryu Guan
    Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Commit d837a49bd57f ("ovl: fix POSIX ACL setting") switches from
    iop->setxattr from ovl_setxattr to generic_setxattr, so switch from
    ovl_removexattr to generic_removexattr as well. As far as permission
    checking goes, the same rules should apply in either case.

    While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that
    this is not an iop->setxattr implementation and remove the unused inode
    argument.

    Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the
    order of handlers in ovl_xattr_handlers.

    Signed-off-by: Andreas Gruenbacher
    Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
    Signed-off-by: Miklos Szeredi

    Andreas Gruenbacher
     
  • Use an ordinary #ifdef to conditionally include the POSIX ACL handlers
    in ovl_xattr_handlers, like the other filesystems do. Flag the code
    that is now only used conditionally with __maybe_unused.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Miklos Szeredi

    Andreas Gruenbacher
     
  • Trivial fix to spelling mistake in pr_err message.

    Signed-off-by: Colin Ian King
    Signed-off-by: Miklos Szeredi

    Colin Ian King
     
  • When mounting overlayfs it needs a clean "work" directory under the
    supplied workdir.

    Previously the mount code removed this directory if it already existed and
    created a new one. If the removal failed (e.g. directory was not empty)
    then it fell back to a read-only mount not using the workdir.

    While this has never been reported, it is possible to get a non-empty
    "work" dir from a previous mount of overlayfs in case of crash in the
    middle of an operation using the work directory.

    In this case the left over state should be discarded and the overlay
    filesystem will be consistent, guaranteed by the atomicity of operations on
    moving to/from the workdir to the upper layer.

    This patch implements cleaning out any files left in workdir. It is
    implemented using real recursion for simplicity, but the depth is limited
    to 2, because the worst case is that of a directory containing whiteouts
    under "work".

    Signed-off-by: Miklos Szeredi
    Cc:

    Miklos Szeredi
     
  • Clear out posix acl xattrs on workdir and also reset the mode after
    creation so that an inherited sgid bit is cleared.

    Signed-off-by: Miklos Szeredi
    Cc:

    Miklos Szeredi
     

29 Jul, 2016

12 commits

  • Signed-off-by: Al Viro
    Signed-off-by: Miklos Szeredi

    Al Viro
     
  • This does not work and does not make sense. So instead of fixing it
    (probably not hard) just disallow.

    Reported-by: Andrei Vagin
    Signed-off-by: Miklos Szeredi
    Cc:

    Miklos Szeredi
     
  • There's a superfluous newline in the warning message in ovl_d_real().

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Remove duplicated include.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Miklos Szeredi

    Wei Yongjun
     
  • Setting POSIX ACL needs special handling:

    1) Some permission checks are done by ->setxattr() which now uses mounter's
    creds ("ovl: do operations on underlying file system in mounter's
    context"). These permission checks need to be done with current cred as
    well.

    2) Setting ACL can fail for various reasons. We do not need to copy up in
    these cases.

    In the mean time switch to using generic_setxattr.

    [Arnd Bergmann] Fix link error without POSIX ACL. posix_acl_from_xattr()
    doesn't have a 'static inline' implementation when CONFIG_FS_POSIX_ACL is
    disabled, and I could not come up with an obvious way to do it.

    This instead avoids the link error by defining two sets of ACL operations
    and letting the compiler drop one of the two at compile time depending
    on CONFIG_FS_POSIX_ACL. This avoids all references to the ACL code,
    also leading to smaller code.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Inode attributes are copied up to overlay inode (uid, gid, mode, atime,
    mtime, ctime) so generic code using these fields works correcty. If a hard
    link is created in overlayfs separate inodes are allocated for each link.
    If chmod/chown/etc. is performed on one of the links then the inode
    belonging to the other ones won't be updated.

    This patch attempts to fix this by sharing inodes for hard links.

    Use inode hash (with real inode pointer as a key) to make sure overlay
    inodes are shared for hard links on upper. Hard links on lower are still
    split (which is not user observable until the copy-up happens, see
    Documentation/filesystems/overlayfs.txt under "Non-standard behavior").

    The inode is only inserted in the hash if it is non-directoy and upper.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • To get from overlay inode to real inode we currently use 'struct
    ovl_entry', which has lifetime connected to overlay dentry. This is okay,
    since each overlay dentry had a new overlay inode allocated.

    Following patch will break that assumption, so need to leave out ovl_entry.
    This patch stores the real inode directly in i_private, with the lowest bit
    used to indicate whether the inode is upper or lower.

    Lifetime rules remain, using ovl_inode_real() must only be done while
    caller holds ref on overlay dentry (and hence on real dentry), or within
    RCU protected regions.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Fix atime update logic in overlayfs.

    This patch adds an i_op->update_time() handler to overlayfs inodes. This
    forwards atime updates to the upper layer only. No atime updates are done
    on lower layers.

    Remove implicit atime updates to underlying files and directories with
    O_NOATIME. Remove explicit atime update in ovl_readlink().

    Clear atime related mnt flags from cloned upper mount. This means atime
    updates are controlled purely by overlayfs mount options.

    Reported-by: Konstantin Khlebnikov
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • The fact that we always do permission checking on the overlay inode and
    clear MAY_WRITE for checking access to the lower inode allows cruft to be
    removed from ovl_permission().

    1) "default_permissions" option effectively did generic_permission() on the
    overlay inode with i_mode, i_uid and i_gid updated from underlying
    filesystem. This is what we do by default now. It did the update using
    vfs_getattr() but that's only needed if the underlying filesystem can
    change (which is not allowed). We may later introduce a "paranoia_mode"
    that verifies that mode/uid/gid are not changed.

    2) splitting out the IS_RDONLY() check from inode_permission() also becomes
    unnecessary once we remove the MAY_WRITE from the lower inode check.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Now we are planning to do DAC permission checks on overlay inode
    itself. And to make it work, we will need to make sure we can get acls from
    underlying inode. So define ->get_acl() for overlay inodes and this in turn
    calls into underlying filesystem to get acls, if any.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • No point in keeping overlay inodes around since they will never be reused.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • The hash salting changes meant that we can no longer reuse the hash in the
    overlay dentry to look up the underlying dentry.

    Instead of lookup_hash(), use lookup_one_len_unlocked() and swith to
    mounter's creds (like we do for all other operations later in the series).

    Now the lookup_hash() export introduced in 4.6 by 3c9fe8cdff1b ("vfs: add
    lookup_hash() helper") is unused and can possibly be removed; its
    usefulness negated by the hash salting and the idea that mounter's creds
    should be used on operations on underlying filesystems.

    Signed-off-by: Miklos Szeredi
    Fixes: 8387ff2577eb ("vfs: make the string hashes salt the hash")

    Miklos Szeredi
     

27 Jul, 2016

1 commit


03 Jul, 2016

1 commit

  • overlay needs underlying fs to support d_type. Recently I put in a
    patch in to detect this condition and started failing mount if
    underlying fs did not support d_type.

    But this breaks existing configurations over kernel upgrade. Those who
    are running docker (partially broken configuration) with xfs not
    supporting d_type, are surprised that after kernel upgrade docker does
    not run anymore.

    https://github.com/docker/docker/issues/22937#issuecomment-229881315

    So instead of erroring out, detect broken configuration and warn
    about it. This should allow existing docker setups to continue
    working after kernel upgrade.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi
    Fixes: 45aebeaf4f67 ("ovl: Ensure upper filesystem supports d_type")
    Cc: 4.6

    Vivek Goyal
     

30 Jun, 2016

1 commit

  • The two methods essentially do the same: find the real dentry/inode
    belonging to an overlay dentry. The difference is in the usage:

    vfs_open() uses ->d_select_inode() and expects the function to perform
    copy-up if necessary based on the open flags argument.

    file_dentry() uses ->d_real() passing in the overlay dentry as well as the
    underlying inode.

    vfs_rename() uses ->d_select_inode() but passes zero flags. ->d_real()
    with a zero inode would have worked just as well here.

    This patch merges the functionality of ->d_select_inode() into ->d_real()
    by adding an 'open_flags' argument to the latter.

    [Al Viro] Make the signature of d_real() match that of ->d_real() again.
    And constify the inode argument, while we are at it.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

28 May, 2016

1 commit

  • Pull overlayfs update from Miklos Szeredi:
    "The meat of this is a change to use the mounter's credentials for
    operations that require elevated privileges (such as whiteout
    creation). This fixes behavior under user namespaces as well as being
    a nice cleanup"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: Do d_type check only if work dir creation was successful
    ovl: update documentation
    ovl: override creds with the ones from the superblock mounter

    Linus Torvalds
     

27 May, 2016

2 commits

  • d_type check requires successful creation of workdir as iterates
    through work dir and expects work dir to be present in it. If that's
    not the case, this check will always return d_type not supported even
    if underlying filesystem might be supporting it.

    So don't do this check if work dir creation failed in previous step.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • In user namespace the whiteout creation fails with -EPERM because the
    current process isn't capable(CAP_SYS_ADMIN) when setting xattr.

    A simple reproducer:

    $ mkdir upper lower work merged lower/dir
    $ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
    $ unshare -m -p -f -U -r bash

    Now as root in the user namespace:

    \# touch merged/dir/{1,2,3} # this will force a copy up of lower/dir
    \# rm -fR merged/*

    This ends up failing with -EPERM after the files in dir has been
    correctly deleted:

    unlinkat(4, "2", 0) = 0
    unlinkat(4, "1", 0) = 0
    unlinkat(4, "3", 0) = 0
    close(4) = 0
    unlinkat(AT_FDCWD, "merged/dir", AT_REMOVEDIR) = -1 EPERM (Operation not
    permitted)

    Interestingly, if you don't place files in merged/dir you can remove it,
    meaning if upper/dir does not exist, creating the char device file works
    properly in that same location.

    This patch uses ovl_sb_creator_cred() to get the cred struct from the
    superblock mounter and override the old cred with these new ones so that
    the whiteout creation is possible because overlay is wrong in assuming that
    the creds it will get with prepare_creds will be in the initial user
    namespace. The old cap_raise game is removed in favor of just overriding
    the old cred struct.

    This patch also drops from ovl_copy_up_one() the following two lines:

    override_cred->fsuid = stat->uid;
    override_cred->fsgid = stat->gid;

    This is because the correct uid and gid are taken directly with the stat
    struct and correctly set with ovl_set_attr().

    Signed-off-by: Antonio Murdaca
    Signed-off-by: Miklos Szeredi

    Antonio Murdaca
     

17 May, 2016

1 commit


11 May, 2016

1 commit

  • Generally permission checking is not necessary when overlayfs looks up a
    dentry on one of the underlying layers, since search permission on base
    directory was already checked in ovl_permission().

    More specifically using lookup_one_len() causes a problem when the lower
    directory lacks search permission for a specific user while the upper
    directory does have search permission. Since lookups are cached, this
    causes inconsistency in behavior: success depends on who did the first
    lookup.

    So instead use lookup_hash() which doesn't do the permission check.

    Reported-by: Ignacy Gawędzki
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

03 May, 2016

2 commits