27 Sep, 2016

1 commit


01 Sep, 2016

4 commits

  • Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use
    those same handlers for iop->getxattr as well.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Miklos Szeredi

    Andreas Gruenbacher
     
  • Commit d837a49bd57f ("ovl: fix POSIX ACL setting") switches
    iop->setxattr from ovl_setxattr to generic_setxattr, so switch from
    ovl_removexattr to generic_removexattr as well. As far as permission
    checking goes, the same rules should apply in either case.

    While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that
    this is not an iop->setxattr implementation and remove the unused inode
    argument.

    Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the
    order of handlers in ovl_xattr_handlers.

    Signed-off-by: Andreas Gruenbacher
    Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
    Signed-off-by: Miklos Szeredi

    Andreas Gruenbacher
     
  • Make sure ovl_own_xattr_handler only matches attribute names starting
    with "overlay.", not "overlayXXX".

    Signed-off-by: Andreas Gruenbacher
    Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
    Signed-off-by: Miklos Szeredi

    Andreas Gruenbacher
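
A minimal userspace sketch of the prefix check described above, assuming the fix amounts to comparing against the prefix with its trailing dot included. The names `DEMO_OWN_PREFIX` and `demo_is_own_xattr` are hypothetical and are not taken from the kernel source.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the handler prefix; note the trailing dot. */
#define DEMO_OWN_PREFIX "overlay."

/* True only if `name` starts with the full "overlay." prefix. */
static bool demo_is_own_xattr(const char *name)
{
    /* sizeof() counts the terminating NUL, so subtract one. */
    return strncmp(name, DEMO_OWN_PREFIX, sizeof(DEMO_OWN_PREFIX) - 1) == 0;
}
```

With this check, "overlay.opaque" matches while "overlayXXX" and a bare "overlay" do not.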
     
  • When mounting overlayfs it needs a clean "work" directory under the
    supplied workdir.

    Previously the mount code removed this directory if it already existed and
    created a new one. If the removal failed (e.g. directory was not empty)
    then it fell back to a read-only mount not using the workdir.

    While this has never been reported, it is possible to get a non-empty
    "work" dir from a previous mount of overlayfs in case of crash in the
    middle of an operation using the work directory.

    In this case the left over state should be discarded and the overlay
    filesystem will be consistent, guaranteed by the atomicity of operations on
    moving to/from the workdir to the upper layer.

    This patch implements cleaning out any files left in workdir. It is
    implemented using real recursion for simplicity, but the depth is limited
    to 2, because the worst case is that of a directory containing whiteouts
    under "work".

    Signed-off-by: Miklos Szeredi
    Cc:

    Miklos Szeredi
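
The depth-limited recursion can be illustrated in userspace roughly as follows. This is only a sketch under stated assumptions: it uses POSIX directory calls rather than kernel VFS helpers, the names `demo_cleanup` and `DEMO_MAX_DEPTH` are hypothetical, and it removes the starting directory itself once it is empty.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Assumed limit: "work" itself (level 1) plus one directory level below. */
#define DEMO_MAX_DEPTH 2

/* Remove everything under `path`, then `path` itself; 0 on success. */
static int demo_cleanup(const char *path, int level)
{
    char child[4096];
    struct dirent *de;
    struct stat st;
    DIR *dir;
    int err = 0;

    if (level > DEMO_MAX_DEPTH) {
        errno = EINVAL;         /* refuse to recurse any deeper */
        return -1;
    }
    dir = opendir(path);
    if (!dir)
        return -1;

    while (!err && (de = readdir(dir)) != NULL) {
        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        if (snprintf(child, sizeof(child), "%s/%s", path, de->d_name)
            >= (int)sizeof(child)) {
            errno = ENAMETOOLONG;
            err = -1;
        } else if (lstat(child, &st) != 0) {
            err = -1;
        } else if (S_ISDIR(st.st_mode)) {
            err = demo_cleanup(child, level + 1);   /* real recursion */
        } else {
            err = unlink(child);
        }
    }
    closedir(dir);
    return err ? -1 : rmdir(path);
}
```

Calling `demo_cleanup(workdir, 1)` handles a directory of leftover files one level down and fails on anything nested deeper, which mirrors the worst case named in the commit message.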
     

08 Aug, 2016

1 commit

  • When a copy up of a directory occurs which has the opaque xattr set, the
    xattr remains in the upper directory. The immediate behavior with overlayfs
    is that the upper directory is not treated as opaque, however after a
    remount the opaque flag is used and upper directory is treated as opaque.
    This causes files created in the lower layer to be hidden when using
    multiple lower directories.

    Fix by not copying up the opaque flag.

    To reproduce:

    ----8
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=151291
    Signed-off-by: Miklos Szeredi
    Cc:

    Miklos Szeredi
     

29 Jul, 2016

6 commits

  • Setting POSIX ACL needs special handling:

    1) Some permission checks are done by ->setxattr() which now uses mounter's
    creds ("ovl: do operations on underlying file system in mounter's
    context"). These permission checks need to be done with current cred as
    well.

    2) Setting ACL can fail for various reasons. We do not need to copy up in
    these cases.

    In the meantime switch to using generic_setxattr.

    [Arnd Bergmann] Fix link error without POSIX ACL. posix_acl_from_xattr()
    doesn't have a 'static inline' implementation when CONFIG_FS_POSIX_ACL is
    disabled, and I could not come up with an obvious way to do it.

    This instead avoids the link error by defining two sets of ACL operations
    and letting the compiler drop one of the two at compile time depending
    on CONFIG_FS_POSIX_ACL. This avoids all references to the ACL code,
    also leading to smaller code.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Inode attributes are copied up to overlay inode (uid, gid, mode, atime,
    mtime, ctime) so generic code using these fields works correctly. If a hard
    link is created in overlayfs separate inodes are allocated for each link.
    If chmod/chown/etc. is performed on one of the links then the inode
    belonging to the other ones won't be updated.

    This patch attempts to fix this by sharing inodes for hard links.

    Use inode hash (with real inode pointer as a key) to make sure overlay
    inodes are shared for hard links on upper. Hard links on lower are still
    split (which is not user observable until the copy-up happens, see
    Documentation/filesystems/overlayfs.txt under "Non-standard behavior").

    The inode is only inserted in the hash if it is non-directory and upper.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • To get from overlay inode to real inode we currently use 'struct
    ovl_entry', which has lifetime connected to overlay dentry. This is okay,
    since each overlay dentry had a new overlay inode allocated.

    The following patch will break that assumption, so ovl_entry needs to be
    left out.
    This patch stores the real inode directly in i_private, with the lowest bit
    used to indicate whether the inode is upper or lower.

    Lifetime rules remain: ovl_inode_real() must only be used while the
    caller holds a ref on the overlay dentry (and hence on the real dentry), or
    within RCU protected regions.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
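
The pointer-tagging idea (real inode pointer plus an upper/lower flag in the lowest bit) can be sketched in plain C as below. All names here (`struct demo_inode`, `demo_pack`, `demo_real`, `demo_is_upper`) are hypothetical, and the sketch assumes the pointed-to object is aligned to at least two bytes so that bit 0 is free.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a real (underlying) inode. */
struct demo_inode {
    unsigned long ino;
};

#define DEMO_UPPER_BIT ((uintptr_t)1)

/* Pack the pointer and the upper/lower flag into one private word. */
static void *demo_pack(struct demo_inode *real, bool is_upper)
{
    return (void *)((uintptr_t)real | (is_upper ? DEMO_UPPER_BIT : 0));
}

/* Recover the real inode pointer by masking off the flag bit. */
static struct demo_inode *demo_real(void *priv)
{
    return (struct demo_inode *)((uintptr_t)priv & ~DEMO_UPPER_BIT);
}

/* Report whether the packed word refers to an upper inode. */
static bool demo_is_upper(void *priv)
{
    return ((uintptr_t)priv & DEMO_UPPER_BIT) != 0;
}
```

The packed word carries no lifetime information of its own, which is why the commit message spells out when the real inode may be dereferenced.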
     
  • Fix atime update logic in overlayfs.

    This patch adds an i_op->update_time() handler to overlayfs inodes. This
    forwards atime updates to the upper layer only. No atime updates are done
    on lower layers.

    Remove implicit atime updates to underlying files and directories with
    O_NOATIME. Remove explicit atime update in ovl_readlink().

    Clear atime related mnt flags from cloned upper mount. This means atime
    updates are controlled purely by overlayfs mount options.

    Reported-by: Konstantin Khlebnikov
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • The fact that we always do permission checking on the overlay inode and
    clear MAY_WRITE for checking access to the lower inode allows cruft to be
    removed from ovl_permission().

    1) "default_permissions" option effectively did generic_permission() on the
    overlay inode with i_mode, i_uid and i_gid updated from underlying
    filesystem. This is what we do by default now. It did the update using
    vfs_getattr() but that's only needed if the underlying filesystem can
    change (which is not allowed). We may later introduce a "paranoia_mode"
    that verifies that mode/uid/gid are not changed.

    2) splitting out the IS_RDONLY() check from inode_permission() also becomes
    unnecessary once we remove the MAY_WRITE from the lower inode check.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • We are planning to do DAC permission checks on the overlay inode itself.
    To make that work, we need to be able to get ACLs from the underlying
    inode. So define ->get_acl() for overlay inodes, which in turn calls into
    the underlying filesystem to get ACLs, if any.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     

27 Jul, 2016

1 commit


04 Jul, 2016

1 commit

  • Right now when a new overlay inode is created, we initialize overlay
    inode's ->i_mode from underlying inode ->i_mode but we retain only
    file type bits (S_IFMT) and discard permission bits.

    This patch changes it and retains permission bits too. This should allow
    overlay to do permission checks on overlay inode itself in task context.

    [SzM] It also fixes clearing suid/sgid bits on write.

    Signed-off-by: Vivek Goyal
    Reported-by: Eryu Guan
    Signed-off-by: Miklos Szeredi
    Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
    Cc:

    Vivek Goyal
     

30 Jun, 2016

1 commit

  • The two methods essentially do the same: find the real dentry/inode
    belonging to an overlay dentry. The difference is in the usage:

    vfs_open() uses ->d_select_inode() and expects the function to perform
    copy-up if necessary based on the open flags argument.

    file_dentry() uses ->d_real() passing in the overlay dentry as well as the
    underlying inode.

    vfs_rename() uses ->d_select_inode() but passes zero flags. ->d_real()
    with a zero inode would have worked just as well here.

    This patch merges the functionality of ->d_select_inode() into ->d_real()
    by adding an 'open_flags' argument to the latter.

    [Al Viro] Make the signature of d_real() match that of ->d_real() again.
    And constify the inode argument, while we are at it.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

28 May, 2016

3 commits

  • Pull vfs fixes from Al Viro:
    "Followups to the parallel lookup work:

    - update docs

    - restore killability of the places that used to take ->i_mutex
    killably now that we have down_write_killable() merged

    - Additionally, it turns out that I missed a prerequisite for
    security_d_instantiate() stuff - ->getxattr() wasn't the only thing
    that could be called before dentry is attached to inode; with smack
    we needed the same treatment applied to ->setxattr() as well"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    switch ->setxattr() to passing dentry and inode separately
    switch xattr_handler->set() to passing dentry and inode separately
    restore killability of old mutex_lock_killable(&inode->i_mutex) users
    add down_write_killable_nested()
    update D/f/directory-locking

    Linus Torvalds
     
  • smack ->d_instantiate() uses ->setxattr(), so to be able to call it before
    we'd hashed the new dentry and attached it to inode, we need ->setxattr()
    instances getting the inode as an explicit argument rather than obtaining
    it from dentry.

    Similar change for ->getxattr() had been done in commit ce23e64. Unlike
    ->getxattr() (which is used by both selinux and smack instances of
    ->d_instantiate()) ->setxattr() is used only by smack one and unfortunately
    it got missed back then.

    Reported-by: Seung-Woo Kim
    Tested-by: Casey Schaufler
    Signed-off-by: Al Viro

    Al Viro
     
  • Pull overlayfs update from Miklos Szeredi:
    "The meat of this is a change to use the mounter's credentials for
    operations that require elevated privileges (such as whiteout
    creation). This fixes behavior under user namespaces as well as being
    a nice cleanup"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: Do d_type check only if work dir creation was successful
    ovl: update documentation
    ovl: override creds with the ones from the superblock mounter

    Linus Torvalds
     

27 May, 2016

1 commit

  • In user namespace the whiteout creation fails with -EPERM because the
    current process isn't capable(CAP_SYS_ADMIN) when setting xattr.

    A simple reproducer:

    $ mkdir upper lower work merged lower/dir
    $ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
    $ unshare -m -p -f -U -r bash

    Now as root in the user namespace:

    # touch merged/dir/{1,2,3} # this will force a copy up of lower/dir
    # rm -fR merged/*

    This ends up failing with -EPERM after the files in dir have been
    correctly deleted:

    unlinkat(4, "2", 0) = 0
    unlinkat(4, "1", 0) = 0
    unlinkat(4, "3", 0) = 0
    close(4) = 0
    unlinkat(AT_FDCWD, "merged/dir", AT_REMOVEDIR) = -1 EPERM (Operation not
    permitted)

    Interestingly, if you don't place files in merged/dir you can remove it,
    meaning if upper/dir does not exist, creating the char device file works
    properly in that same location.

    This patch uses ovl_sb_creator_cred() to get the cred struct from the
    superblock mounter and override the old cred with these new ones so that
    the whiteout creation is possible because overlay is wrong in assuming that
    the creds it will get with prepare_creds will be in the initial user
    namespace. The old cap_raise game is removed in favor of just overriding
    the old cred struct.

    This patch also drops from ovl_copy_up_one() the following two lines:

    override_cred->fsuid = stat->uid;
    override_cred->fsgid = stat->gid;

    This is because the correct uid and gid are taken directly with the stat
    struct and correctly set with ovl_set_attr().

    Signed-off-by: Antonio Murdaca
    Signed-off-by: Miklos Szeredi

    Antonio Murdaca
     

11 Apr, 2016

1 commit


22 Mar, 2016

1 commit

  • In some instances xfs has been created with ftype=0. In that case, if a
    file on the lower fs is removed, overlay leaves a whiteout in the upper fs,
    but that whiteout does not get filtered out and is visible to overlayfs
    users.

    The reason it does not get filtered out is that the upper filesystem does
    not report the file type of the whiteout as DT_CHR during iterate_dir().

    So it seems to be a requirement that upper filesystem support d_type for
    overlayfs to work properly. Do this check during mount and fail if d_type
    is not supported.

    Suggested-by: Dave Chinner
    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     

22 Jan, 2016

1 commit

  • Pull overlayfs updates from Miklos Szeredi:
    "This contains several bug fixes and a new mount option
    'default_permissions' that allows read-only exported NFS
    filesystems to be used as lower layer"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: check dentry positiveness in ovl_cleanup_whiteouts()
    ovl: setattr: check permissions before copy-up
    ovl: root: copy attr
    ovl: move super block magic number to magic.h
    ovl: use a minimal buffer in ovl_copy_xattr
    ovl: allow zero size xattr
    ovl: default permissions

    Linus Torvalds
     

07 Dec, 2015

1 commit


12 Oct, 2015

1 commit

  • Add mount option "default_permissions" to alter the way permissions are
    calculated.

    Without this option and prior to this patch permissions were calculated by
    underlying lower or upper filesystem.

    With this option the permissions are calculated by overlayfs based on the
    file owner, group and mode bits.

    This has significance for example when a read-only exported NFS filesystem
    is used as a lower layer. In this case the underlying NFS filesystem will
    reply with EROFS, in which case all we know is that the filesystem is
    read-only. But that's not what we are interested in, we are interested in
    whether the access would be allowed if the filesystem wasn't read-only; the
    server doesn't tell us that, and would need updating at various levels,
    which doesn't seem practicable.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

19 Jun, 2015

1 commit

  • Make file->f_path always point to the overlay dentry so that the path in
    /proc/pid/fd is correct and to ensure that label-based LSMs have access to the
    overlay as well as the underlay (path-based LSMs probably don't need it).

    Using my union testsuite to set things up, before the patch I see:

    [root@andromeda union-testsuite]# bash 5 /a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...

    After the patch:

    [root@andromeda union-testsuite]# bash 5 /mnt/a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...

    Note the change in where /proc/$$/fd/5 points to in the ls command. It was
    pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
    (which is correct).

    The inode accessed, however, is the lower layer. The union layer is on device
    25h/37d and the upper layer on 24h/36d.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

13 Dec, 2014

3 commits

  • This patch adds two macros:

    OVL_XATTR_PRE_NAME and OVL_XATTR_PRE_LEN

    to represent the ovl_xattr name prefix and its length. Also, a
    new macro OVL_XATTR_OPAQUE is introduced to replace old
    *ovl_opaque_xattr*.

    Fix the length of "trusted.overlay." to *16*.

    Signed-off-by: hujianyang
    Signed-off-by: Miklos Szeredi

    hujianyang
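
The relationship between the prefix macro and its length can be checked at compile time, as in the following sketch. The macro names come from the commit message; the value given to OVL_XATTR_OPAQUE and the helper `demo_is_ovl_xattr` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define OVL_XATTR_PRE_NAME "trusted.overlay."
#define OVL_XATTR_PRE_LEN  16
/* Assumed definition: the prefix followed by "opaque". */
#define OVL_XATTR_OPAQUE   OVL_XATTR_PRE_NAME "opaque"

/* Fails to compile if the hard-coded length drifts from the string. */
_Static_assert(sizeof(OVL_XATTR_PRE_NAME) - 1 == OVL_XATTR_PRE_LEN,
               "prefix length mismatch");

/* True if `name` lives in the "trusted.overlay." namespace. */
static bool demo_is_ovl_xattr(const char *name)
{
    return strncmp(name, OVL_XATTR_PRE_NAME, OVL_XATTR_PRE_LEN) == 0;
}
```

Deriving or asserting the length from the string literal guards against the kind of off-by-some length error this commit fixes.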
     
  • Add helper to iterate through all the layers, starting from the upper layer
    (if exists) and continuing down through the lower layers.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
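
One way such a layer iterator could look is sketched here in userspace C. The structure `demo_layers`, the function `demo_layer_next`, and the index convention (0 to start, -1 when finished) are all hypothetical and chosen only to illustrate "upper first if it exists, then the lower layers in order".

```c
#include <stddef.h>

/* Hypothetical description of a layer stack, identified by path strings. */
struct demo_layers {
    const char *upper;           /* NULL if there is no upper layer */
    const char *const *lower;    /* lower layers, topmost first */
    size_t numlower;
};

/*
 * Store the layer for position `idx` in *out and return the next index,
 * or -1 once the last layer has been handed out. Start with idx == 0.
 * *out is set to NULL if there are no layers at all.
 */
static int demo_layer_next(int idx, const struct demo_layers *l,
                           const char **out)
{
    if (idx == 0) {
        if (l->upper) {
            *out = l->upper;
            return l->numlower ? 1 : -1;
        }
        idx = 1;                 /* no upper: start with the first lower */
    }
    if ((size_t)idx > l->numlower) {
        *out = NULL;
        return -1;
    }
    *out = l->lower[idx - 1];
    return (size_t)idx < l->numlower ? idx + 1 : -1;
}
```

A caller would loop with `for (idx = 0; idx != -1; )`, calling the helper once per iteration and using `*out` before the next call.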
     
  • OVL_PATH_PURE_UPPER -> __OVL_PATH_UPPER | __OVL_PATH_PURE
    OVL_PATH_UPPER -> __OVL_PATH_UPPER
    OVL_PATH_MERGE -> __OVL_PATH_UPPER | __OVL_PATH_MERGE
    OVL_PATH_LOWER -> 0

    Multiple R/O layers will allow __OVL_PATH_MERGE without __OVL_PATH_UPPER.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
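
The mapping above can be written out as bit flags, as in this sketch. The DEMO_ names and the specific bit positions are assumptions made for illustration; only the combinations themselves come from the commit message.

```c
/* Hypothetical bit assignments for the three independent properties. */
enum demo_path_type {
    DEMO_PATH_PURE  = 1 << 0,
    DEMO_PATH_UPPER = 1 << 1,
    DEMO_PATH_MERGE = 1 << 2,
};

/* The four old enum values expressed as flag combinations. */
#define DEMO_OLD_PURE_UPPER (DEMO_PATH_UPPER | DEMO_PATH_PURE)
#define DEMO_OLD_UPPER      (DEMO_PATH_UPPER)
#define DEMO_OLD_MERGE      (DEMO_PATH_UPPER | DEMO_PATH_MERGE)
#define DEMO_OLD_LOWER      (0)

/* Test macros for individual properties. */
#define DEMO_TYPE_UPPER(t) (((t) & DEMO_PATH_UPPER) != 0)
#define DEMO_TYPE_MERGE(t) (((t) & DEMO_PATH_MERGE) != 0)
#define DEMO_TYPE_PURE(t)  (((t) & DEMO_PATH_PURE) != 0)
```

Because the properties are separate bits, a value such as `DEMO_PATH_MERGE` alone (merged but without an upper layer) becomes representable, which is what multiple read-only layers require.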
     

24 Oct, 2014

1 commit

  • Overlayfs allows one, usually read-write, directory tree to be
    overlaid onto another, read-only directory tree. All modifications
    go to the upper, writable layer.

    This type of mechanism is most often used for live CDs but there's a
    wide variety of other uses.

    The implementation differs from other "union filesystem"
    implementations in that after a file is opened all operations go
    directly to the underlying, lower or upper, filesystems. This
    simplifies the implementation and allows native performance in these
    cases.

    The dentry tree is duplicated from the underlying filesystems; this
    enables fast cached lookups without adding special support into the
    VFS. This uses slightly more memory than union mounts, but dentries
    are relatively small.

    Currently inodes are duplicated as well, but it is a possible
    optimization to share inodes for non-directories.

    Opening non-directories results in the open being forwarded to the
    underlying filesystem. This makes the behavior very similar to union
    mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
    descriptors).

    Usage:

    mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work /overlay

    The following contributions have been folded into this patch:

    Neil Brown :
    - minimal remount support
    - use correct seek function for directories
    - initialise is_real before use
    - rename ovl_fill_cache to ovl_dir_read

    Felix Fietkau :
    - fix a deadlock in ovl_dir_read_merged
    - fix a deadlock in ovl_remove_whiteouts

    Erez Zadok
    - fix cleanup after WARN_ON

    Sedat Dilek
    - fix up permission to conform to new API

    Robin Dong
    - fix possible leak in ovl_new_inode
    - create new inode in ovl_link

    Andy Whitcroft
    - switch to __inode_permission()
    - copy up i_uid/i_gid from the underlying inode

    AV:
    - ovl_copy_up_locked() - dput(ERR_PTR(...)) on two failure exits
    - ovl_clear_empty() - one failure exit forgetting to do unlock_rename(),
    lack of check for udir being the parent of upper, dropping and regaining
    the lock on udir (which would require _another_ check for parent being
    right).
    - bogus d_drop() in copyup and rename [fix from your mail]
    - copyup/remove and copyup/rename races [fix from your mail]
    - ovl_dir_fsync() leaving ERR_PTR() in ->realfile
    - ovl_entry_free() is pointless - it's just a kfree_rcu()
    - fold ovl_do_lookup() into ovl_lookup()
    - manually assigning ->d_op is wrong. Just use ->s_d_op.
    [patches picked from Miklos]:
    * copyup/remove and copyup/rename races
    * bogus d_drop() in copyup and rename

    Also thanks to the following people for testing and reporting bugs:

    Jordi Pujol
    Andy Whitcroft
    Michal Suchanek
    Felix Fietkau
    Erez Zadok
    Randy Dunlap

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi