03 Aug, 2018

1 commit

  • commit e8d4bfe3a71537284a90561f77c85dea6c154369 upstream.

    When executing filesystem sync or umount on overlayfs,
    dirty data does not get synced as expected on the upper filesystem.
    This patch fixes the sync filesystem method to keep data consistent
    for overlayfs.
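A minimal userspace sketch of how this path gets exercised: syncfs(2) on any file descriptor inside a mount asks the kernel to sync the whole filesystem, which goes through that filesystem's sync_fs method. The helper name `sync_fs_of` is hypothetical; only `open`, `syncfs` and `close` are real calls.

```c
#define _GNU_SOURCE             /* syncfs(2) is a GNU/Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` and flush the whole filesystem that
 * contains it. On an overlay mount this reaches the overlay's sync_fs
 * method, which is where the upper filesystem must be synced as well.
 * Returns 0 on success, -1 on failure (errno is set). */
int sync_fs_of(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int ret = syncfs(fd);
    close(fd);
    return ret;
}
```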

    Signed-off-by: Chengguang Xu
    Fixes: e593b2bf513d ("ovl: properly implement sync_filesystem()")
    Cc: #4.11
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Chengguang Xu
     

22 Feb, 2018

1 commit

  • commit 31747eda41ef3c30c09c5c096b380bf54013746a upstream.

    fsnotify pins a watched directory inode in cache, but if the directory
    dentry is released, a new lookup will allocate a new dentry and a new
    inode. Directory events will be notified on the new inode, while the
    fsnotify listener is watching the old pinned inode.

    Hash all directory inodes to reuse the pinned inode on lookup. Pure upper
    dirs are hashed by the real upper inode; merge and lower dirs are hashed
    by the real lower inode.

    The reference to lower inode was being held by the lower dentry object
    in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry
    may drop lower inode refcount to zero. Add a refcount on behalf of the
    overlay inode to prevent that.

    As a by-product, hashing directory inodes also detects multiple
    redirected dirs pointing to the same lower dir, as well as an uncovered
    redirected dir target, and returns -ESTALE on lookup.
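The lookup-reuse idea can be modelled in userspace as a hash table keyed by the real inode number. This is a hypothetical sketch, not the overlayfs code: `struct ovl_inode_model`, `lookup_dir_inode` and `NBUCKETS` are made-up names, and the refcount only stands in for a pin such as an fsnotify watch.

```c
#include <assert.h>
#include <stdlib.h>

#define NBUCKETS 64   /* hypothetical table size */

/* Hypothetical model: an overlay dir inode hashed by its "real" inode
 * (upper ino for pure upper dirs, lower ino for merge/lower dirs). */
struct ovl_inode_model {
    unsigned long real_ino;          /* hash key */
    int refcount;                    /* e.g. a pin held by a watch */
    struct ovl_inode_model *next;    /* hash chain */
};

static struct ovl_inode_model *table[NBUCKETS];

/* Return the existing inode for real_ino if there is one, so a repeat
 * lookup finds the pinned inode instead of allocating a fresh one. */
struct ovl_inode_model *lookup_dir_inode(unsigned long real_ino)
{
    struct ovl_inode_model **bucket = &table[real_ino % NBUCKETS];

    for (struct ovl_inode_model *oi = *bucket; oi; oi = oi->next) {
        if (oi->real_ino == real_ino) {
            oi->refcount++;          /* reuse */
            return oi;
        }
    }

    struct ovl_inode_model *new_oi = calloc(1, sizeof(*new_oi));
    if (!new_oi)
        return NULL;
    new_oi->real_ino = real_ino;
    new_oi->refcount = 1;
    new_oi->next = *bucket;          /* insert at head of chain */
    *bucket = new_oi;
    return new_oi;
}
```

Two lookups of the same real inode yield the same object, which is the property the fsnotify listener depends on.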

    The reported issue dates back to the initial version of overlayfs, but
    this patch depends on ovl_inode code that was introduced in kernel v4.13.

    Cc: #v4.13
    Reported-by: Niklas Cassel
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Tested-by: Niklas Cassel
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     

19 Oct, 2017

1 commit


05 Oct, 2017

1 commit

  • Enforcing exclusive ownership on upper/work dirs caused a docker
    regression: https://github.com/moby/moby/issues/34672.

    Euan spotted the regression and pointed to the offending commit.
    Vivek has brought the regression to my attention and provided this
    reproducer:

    Terminal 1:

    mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/

    Terminal 2:

    unshare -m

    Terminal 1:

    umount merged
    mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/
    mount: /root/overlay-testing/merged: none already mounted or mount point busy

    To fix the regression, I replaced the error with an alarming warning.
    With the index feature enabled, mount does fail, but logs a suggestion to
    override exclusive dir protection by disabling index.
    Note that an index=off mount does take the inuse locks, so a concurrent
    index=off mount will issue the warning and a concurrent index=on mount
    will fail.
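The resulting policy can be sketched as a small decision function. This is a hypothetical model: `check_inuse` and its log messages are invented for illustration and are not the kernel's wording. -EBUSY matches the "already mounted or mount point busy" error in the reproducer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the mount-time policy: if the upper/work dir
 * "inuse" lock is already taken, an index=on mount fails, while an
 * index=off mount only warns and continues. */
int check_inuse(bool got_inuse_lock, bool index_enabled)
{
    if (got_inuse_lock)
        return 0;                    /* exclusive ownership acquired */

    if (index_enabled) {
        /* illustrative message, not the kernel's text */
        fprintf(stderr, "error: upper/work dir in use (consider index=off)\n");
        return -EBUSY;
    }

    /* illustrative message, not the kernel's text */
    fprintf(stderr, "warning: upper/work dir in use by another mount\n");
    return 0;
}
```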

    Documentation was updated to reflect this change.

    Fixes: 2cac0c00a6cd ("ovl: get exclusive ownership on upper/work dirs")
    Cc: # v4.13
    Reported-by: Euan Kemp
    Reported-by: Vivek Goyal
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

15 Sep, 2017

1 commit

  • Pull mount flag updates from Al Viro:
    "Another chunk of fmount preparations from dhowells; only trivial
    conflicts for that part. It separates MS_... bits (very grotty
    mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
    only a small subset of MS_... stuff).

    This does *not* convert the filesystems to new constants; only the
    infrastructure is done here. The next step in that series is where the
    conflicts would be; that's the conversion of filesystems. It's purely
    mechanical and it's better done after the merge, so if you could run
    something like

    list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

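    sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
    -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
    -e 's/\<MS_NODEV\>/SB_NODEV/g' \
    -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
    -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
    -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
    -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
    -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
    -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
    -e 's/\<MS_SILENT\>/SB_SILENT/g' \
    -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
    -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
    -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
    -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \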
    $list

    and commit it with something along the lines of 'convert filesystems
    away from use of MS_... constants' as commit message, it would save
    quite a bit of headache next cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    VFS: Differentiate mount flags (MS_*) from internal superblock flags
    VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
    vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

    Linus Torvalds
     

05 Sep, 2017

2 commits


04 Sep, 2017

1 commit


28 Jul, 2017

1 commit

  • Impure directories are ones which contain objects with origins (i.e. those
    that have been copied up). These are relevant to readdir operation only
    because of the d_ino field, no other transformation is necessary. Also a
    directory can become impure between two getdents(2) calls.

    This patch creates a cache for impure directories. Unlike the cache for
    merged directories, this one only contains entries with origin and is not
    refcounted; instead, its lifetime is tied to that of the dentry.

    Similarly to the merged cache, the impure cache is invalidated based on a
    version number. This version number is incremented when an entry with
    origin is added or removed from the directory.

    If the cache is empty, then the impure xattr is removed from the directory.

    This patch also fixes up handling of d_ino for the ".." entry if the parent
    directory is merged.
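The version-based invalidation described above can be modelled as follows. This is a hypothetical sketch: the struct and function names are invented, and the real cache also stores the entries themselves. It shows only the "trust the cache only if versions match" rule and the "empty cache means the impure mark can go" rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a directory's version counter. */
struct dir_model {
    unsigned long version;   /* bumped when an entry with origin changes */
};

/* Hypothetical model of the impure-dir cache. */
struct impure_cache_model {
    unsigned long version;   /* dir version when the cache was filled */
    int nr_origin_entries;   /* entries with origin seen at that time */
    bool valid;
};

/* Called when an entry with origin is added to or removed from dir. */
void dir_origin_entry_changed(struct dir_model *dir)
{
    dir->version++;
}

void cache_fill(const struct dir_model *dir,
                struct impure_cache_model *cache, int nr_origin_entries)
{
    cache->version = dir->version;
    cache->nr_origin_entries = nr_origin_entries;
    cache->valid = true;
}

/* A cached listing is only trusted if built at the current version. */
bool cache_usable(const struct dir_model *dir,
                  const struct impure_cache_model *cache)
{
    return cache->valid && cache->version == dir->version;
}

/* An empty cache means no entry has an origin any more. */
bool impure_mark_removable(const struct impure_cache_model *cache)
{
    return cache->valid && cache->nr_origin_entries == 0;
}
```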

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

20 Jul, 2017

1 commit

    inode_doinit_with_dentry() in SELinux wants to read the upper inode's xattr
    to get the security label, and ovl_xattr_get() calls ovl_dentry_real(),
    which depends on dentry->d_inode; but d_inode is NULL and not yet
    initialized at this point, resulting in an Oops.

    Fix by getting the upperdentry info from the inode directly in this case.

    Reported-by: Eryu Guan
    Fixes: 09d8b586731b ("ovl: move __upperdentry to ovl_inode")
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

17 Jul, 2017

1 commit

  • Firstly by applying the following with coccinelle's spatch:

    @@ expression SB; @@
    -SB->s_flags & MS_RDONLY
    +sb_rdonly(SB)

    to effect the conversion to sb_rdonly(sb), then by applying:

    @@ expression A, SB; @@
    (
    -(!sb_rdonly(SB)) && A
    +!sb_rdonly(SB) && A
    |
    -A != (sb_rdonly(SB))
    +A != sb_rdonly(SB)
    |
    -A == (sb_rdonly(SB))
    +A == sb_rdonly(SB)
    |
    -!(sb_rdonly(SB))
    +!sb_rdonly(SB)
    |
    -A && (sb_rdonly(SB))
    +A && sb_rdonly(SB)
    |
    -A || (sb_rdonly(SB))
    +A || sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) != A
    +sb_rdonly(SB) != A
    |
    -(sb_rdonly(SB)) == A
    +sb_rdonly(SB) == A
    |
    -(sb_rdonly(SB)) && A
    +sb_rdonly(SB) && A
    |
    -(sb_rdonly(SB)) || A
    +sb_rdonly(SB) || A
    )

    @@ expression A, B, SB; @@
    (
    -(sb_rdonly(SB)) ? 1 : 0
    +sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) ? A : B
    +sb_rdonly(SB) ? A : B
    )

    to remove left over excess bracketage and finally by applying:

    @@ expression A, SB; @@
    (
    -(A & MS_RDONLY) != sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) != sb_rdonly(SB)
    |
    -(A & MS_RDONLY) == sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) == sb_rdonly(SB)
    )

    to make comparisons against the result of sb_rdonly() (which is a bool)
    work correctly.
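The last rule exists because comparing a masked integer against a `bool` only works if both sides are normalised to 0/1. The following illustration is not kernel code. `HYPOTHETICAL_FLAG` is a made-up bit with a value other than 1, chosen to make the pitfall visible.

```c
#include <assert.h>
#include <stdbool.h>

#define HYPOTHETICAL_FLAG 0x40   /* illustrative flag bit, not a real MS_ value */

static bool flag_is_set(unsigned long flags)
{
    return flags & HYPOTHETICAL_FLAG;   /* converted to true/false */
}

/* Uncast: compares 0x40 with 1 when both flags are set, so it is false. */
bool naive_equal(unsigned long a, unsigned long b)
{
    return (a & HYPOTHETICAL_FLAG) == flag_is_set(b);
}

/* Cast to bool first, as in the semantic patch above. */
bool fixed_equal(unsigned long a, unsigned long b)
{
    return (bool)(a & HYPOTHETICAL_FLAG) == flag_is_set(b);
}
```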

    Signed-off-by: David Howells

    David Howells
     

14 Jul, 2017

2 commits

  • ovl_workdir_create() returns a valid index dentry or NULL.

    Reported-by: Dan Carpenter
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
    On failure to prepare_creds(), mount fails with a random
    return value, as err was last set to an integer cast of
    a valid lower mnt pointer, or set to 0 if the inodes index
    feature is enabled.
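The bug pattern is a stale `err` on a `goto` error path. Below is a hypothetical model. `fill_super_model`, `struct cred_model` and the two `prepare_*` stubs stand in for the real mount code and prepare_creds(). Setting `err` right before the step that may fail makes a failure report -ENOMEM instead of a leftover value such as 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct cred_model { int dummy; };   /* hypothetical stand-in for struct cred */

static struct cred_model a_cred;
struct cred_model *prepare_ok(void)   { return &a_cred; }
struct cred_model *prepare_fail(void) { return NULL; }  /* simulated failure */

/* Hypothetical shape of the error path. Without the `err = -ENOMEM`
 * line, a failed prepare step would return the stale 0 (success). */
int fill_super_model(struct cred_model *(*prepare)(void))
{
    int err = 0;               /* value left over from an earlier step */
    struct cred_model *cred;

    err = -ENOMEM;             /* the fix: set err before the call that may fail */
    cred = prepare();
    if (!cred)
        goto out_err;
    return 0;

out_err:
    return err;
}
```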

    Reported-by: Dan Carpenter
    Fixes: 3fe6e52f0626 ("ovl: override creds with the ones from ...")
    Cc: # v4.7
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

05 Jul, 2017

16 commits

    An index entry should live only as long as there are upper or lower
    hardlinks.

    Clean up orphan index entries on mount and when dropping the last
    overlay inode nlink.

    When about to clean up or link up to an orphan index whose index inode
    has nlink > 1, admit that something went wrong and adjust the overlay
    nlink to index inode nlink - 1 to prevent it from dropping below zero.
    This could happen when adding lower hardlinks underneath a mounted
    overlay and then trying to unlink them.
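The nlink adjustment can be sketched as a tiny function. This is a hypothetical model. It assumes that an orphan index normally has nlink == 1, meaning only the index entry itself remains, which is why one link is subtracted. `orphan_index_nlink_fixup` is an invented name.

```c
#include <assert.h>

/* Hypothetical model: given the nlink of an orphan index inode, return
 * the overlay nlink to use. nlink == 1 (assumed: only the index entry)
 * means a genuine orphan that can be cleaned up. A larger value means
 * links exist that the overlay did not account for, so adopt them. */
unsigned int orphan_index_nlink_fixup(unsigned int index_nlink)
{
    if (index_nlink > 1)
        return index_nlink - 1;  /* something went wrong: adopt the links */
    return 0;                    /* genuine orphan */
}
```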

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • For rename, we need to ensure that an upper alias exists for hard links
    before attempting the operation. Introduce a flag in ovl_entry to track
    the state of the upper alias.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Bad index entries are entries whose name does not match the
    origin file handle stored in trusted.overlay.origin xattr.
    Bad index entries could be a result of a system power off in
    the middle of copy up.

    Stale index entries are entries whose origin file handle is
    stale. Stale index entries could be a result of copying layers
    or removing lower entries while the overlay is not mounted.
    The case of copying layers should be detected earlier by the
    verification of upper root dir origin and index dir origin.

    Both bad and stale index entries are detected and removed
    on mount.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
    An index dir contains persistent hardlinks to files in upper dir.
    Therefore, we must never mount an existing index dir with a different
    upper dir.

    Store the upper root dir file handle in index dir inode when index
    dir is created and verify the file handle before using an existing
    index dir on mount.

    Add an 'is_upper' flag to the overlay file handle encoding and set it
    when encoding the upper root file handle. This is not critical for index
    dir verification, but it is good practice towards a standard overlayfs
    file handle format for NFS export.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • When inodes index feature is enabled, verify that the file handle stored
    in upper root dir matches the lower root dir or fail to mount.

    If upper root dir has no stored file handle, encode and store the lower
    root dir file handle in overlay.origin xattr.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • Create the index dir on mount. The index dir will contain hardlinks to
    upper inodes, named after the hex representation of their origin lower
    inodes.
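Naming an entry after a hex representation amounts to hex-encoding an opaque byte string that identifies the origin. The sketch below makes assumptions: the input is treated as arbitrary bytes, its real layout (an encoded file handle in overlayfs) is not modelled, and `index_name_from_origin` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: hex-encode `len` bytes of an opaque origin
 * identifier into `out`, which must hold 2 * len + 1 bytes. */
void index_name_from_origin(const unsigned char *origin, size_t len, char *out)
{
    static const char hex[] = "0123456789abcdef";

    assert(out != NULL);
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = hex[origin[i] >> 4];    /* high nibble */
        out[2 * i + 1] = hex[origin[i] & 0x0f];  /* low nibble */
    }
    out[2 * len] = '\0';
}
```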

    The index dir is going to be used to prevent breaking lower hardlinks
    on copy up and to implement overlayfs NFS export.

    Because the feature is not fully backward compatible, enabling the
    feature is opt-in by config/module/mount option.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
    Pass in the subdir name to create and specify whether the subdir is
    persistent or should be cleaned up on every mount.

    Move the fallback to a read-only mount on failure to create the dir,
    and the printing of the error message, into the helper.

    This function is going to be used for creating the persistent 'index' dir
    under workbasedir.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • Bad things can happen if several concurrent overlay mounts try to
    use the same upperdir/workdir path.

    Try to get the 'inuse' advisory lock on upperdir and workdir.
    Fail mount if another overlay mount instance or another user
    holds the 'inuse' lock on these directories.

    Note that this provides no protection for concurrent overlay
    mounts that use overlapping (i.e. descendant) upper/work dirs.
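The "try to get the inuse lock, else fail" step behaves like a non-blocking test-and-set on a per-directory bit. Below is a hypothetical userspace model using C11 `atomic_flag`. The kernel keeps this state on the inode, and `struct dir_lock_model` and the helpers are invented names. As noted above, nothing here covers descendant directories.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-directory advisory "inuse" bit. */
struct dir_lock_model {
    atomic_flag inuse;
};

void init_inuse(struct dir_lock_model *d)
{
    atomic_flag_clear(&d->inuse);
}

/* Non-blocking: returns true if this caller now owns the directory. */
bool trylock_inuse(struct dir_lock_model *d)
{
    /* test_and_set returns the previous value; false means it was free */
    return !atomic_flag_test_and_set(&d->inuse);
}

void unlock_inuse(struct dir_lock_model *d)
{
    atomic_flag_clear(&d->inuse);
}
```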

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
    Use the new ovl_inode mutex to synchronize concurrent copy up
    instead of the super block copy up workqueue.

    Moving the synchronization object from the overlay dentry to
    the overlay inode is needed for synchronizing concurrent copy up
    of lower hardlinks to the same upper inode.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • We need some more space to store overlay inode data in memory,
    so allocate overlay inodes from a slab of struct ovl_inode.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

29 May, 2017

1 commit

    An upper dir is marked "impure" to let ovl_iterate() know that this
    directory may contain non-pure upper entries whose d_ino may need to be
    read from the origin inode.

    We already mark a non-merge dir "impure" when moving a non-pure child
    entry inside it, to let ovl_iterate() know not to iterate the non-merge
    dir directly.

    Mark also a merge dir "impure" when moving a non-pure child entry inside
    it and when copying up a child entry inside it.

    This can be used to optimize ovl_iterate() to perform a "pure merge" of
    upper and lower directories, merging the content of the directories,
    without having to read d_ino from origin inodes.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

18 May, 2017

1 commit


05 May, 2017

1 commit

    Some features can only work when all layers are on the same fs. Test this
    condition at mount time, so features can check it later.

    Add helper ovl_same_sb() to return the common super block in case all
    layers are on the same fs.
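The check behind such a helper fits in a few lines. This is a hypothetical model: `struct sb_model` stands in for `struct super_block` and `same_sb` for the real helper. It returns the shared super block when every layer is on it, and NULL otherwise.

```c
#include <assert.h>
#include <stddef.h>

struct sb_model { int id; };   /* hypothetical stand-in for struct super_block */

/* Return the super block shared by all `n` layers, or NULL if any layer
 * lives on a different filesystem (or there are no layers). */
const struct sb_model *same_sb(const struct sb_model *const layers[], size_t n)
{
    if (n == 0)
        return NULL;
    for (size_t i = 1; i < n; i++)
        if (layers[i] != layers[0])
            return NULL;
    return layers[0];
}
```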

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

20 Apr, 2017

2 commits

  • For overlay file open, check IS_APPEND() on the real upper inode
    inside d_real(), because the overlay inode does not have the
    S_APPEND flag and IS_APPEND() can only be checked at open time.

    Note that because overlayfs does not copy up the chattr inode flags
    (i.e. S_APPEND, S_IMMUTABLE), the IS_APPEND() check is only relevant
    for upper inodes that were set with chattr +a and not to lower
    inodes that had chattr +a before copy up.
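The open-time rule for append-only inodes is simple enough to sketch. This is a hypothetical model: `check_append_only` is an invented name, and its boolean argument represents IS_APPEND() evaluated on the real upper inode, as the fix above requires. A write open must use O_APPEND and must not truncate.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical model of the append-only open check. */
int check_append_only(bool upper_is_append, int open_flags)
{
    if (!upper_is_append)
        return 0;
    /* writers must open with O_APPEND */
    if ((open_flags & O_ACCMODE) != O_RDONLY && !(open_flags & O_APPEND))
        return -EPERM;
    /* truncation is never allowed */
    if (open_flags & O_TRUNC)
        return -EPERM;
    return 0;
}
```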

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • Prepare to mark sensitive kernel structures for randomization by making
    sure they're using designated initializers. These were identified during
    allyesconfig builds of x86, arm, and arm64, with most initializer fixes
    extracted from grsecurity.

    For these cases, use { }, which will be zero-filled, instead of
    undesignated NULLs.
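A small illustration of why designated initializers matter once member order can change, for example under structure layout randomization. The ops table below is made up. Omitted members are zero-filled. `{ 0 }` is the portable all-zero spelling, while the `{ }` form relies on GNU C (it is also standard as of C23).

```c
#include <assert.h>
#include <stddef.h>

struct example_ops {               /* made-up ops table */
    const char *name;
    int (*open)(void);
    int (*release)(void);
};

static int example_open(void) { return 0; }

/* Positional form, which silently misassigns fields if the member
 * order changes:
 *     { "example", example_open, NULL }
 */

/* Designated form: each value is tied to a member name; the omitted
 * .release member is zero-filled (NULL). */
static const struct example_ops example_ops_designated = {
    .name = "example",
    .open = example_open,
};

/* All-zero object, portable spelling. */
static const struct example_ops example_ops_zero = { 0 };
```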

    Signed-off-by: Kees Cook
    Signed-off-by: Miklos Szeredi

    Kees Cook
     

04 Mar, 2017

1 commit

  • Pull overlayfs updates from Miklos Szeredi:
    "Because copy up can take a long time, serialized copy ups could be a
    big performance bottleneck. This update allows concurrent copy up of
    regular files, eliminating this potential problem.

    There are also minor fixes"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials
    ovl: properly implement sync_filesystem()
    ovl: concurrent copy up of regular files
    ovl: introduce copy up waitqueue
    ovl: copy up regular file using O_TMPFILE
    ovl: rearrange code in ovl_copy_up_locked()
    ovl: check if upperdir fs supports O_TMPFILE

    Linus Torvalds
     

02 Mar, 2017

1 commit


07 Feb, 2017

4 commits

    If overlay was mounted by root, then quota set for the upper layer does
    not work, because overlay now always uses the mounter's credentials for
    operations. Also, overlay might deplete reserved space and inodes in ext4.

    This patch drops capability SYS_RESOURCE from saved credentials.
    This affects creation of new files, whiteouts, and copy-up operations.
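Dropping a single capability amounts to clearing one bit in the effective set. This is a hypothetical model: the capability set is a plain 64-bit mask, and `capset_model` and `drop_sys_resource` are invented names. The bit number 24 is CAP_SYS_RESOURCE's value in `<linux/capability.h>`.

```c
#include <assert.h>
#include <stdint.h>

#define CAP_SYS_RESOURCE 24          /* value from <linux/capability.h> */

typedef uint64_t capset_model;       /* hypothetical capability set */

/* Clear CAP_SYS_RESOURCE so that quota and reserved-space limits apply
 * to operations done with these saved credentials. */
capset_model drop_sys_resource(capset_model effective)
{
    return effective & ~((capset_model)1 << CAP_SYS_RESOURCE);
}
```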

    Signed-off-by: Konstantin Khlebnikov
    Fixes: 1175b6b8d963 ("ovl: do operations on underlying file system in mounter's context")
    Cc: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Konstantin Khlebnikov
     
  • overlayfs syncs all inode pages on sync_filesystem(), but it also
    needs to call s_op->sync_fs() of upper fs for metadata sync.

    This fixes correctness of syncfs(2) as demonstrated by following
    xfs specific test:

    xfs_sync_stats()
    {
        echo $1
        echo -n "xfs_log_force = "
        grep log /proc/fs/xfs/stat | awk '{ print $5 }'
    }

    xfs_sync_stats "before touch"
    touch x
    xfs_sync_stats "after touch"
    xfs_io -c syncfs .
    xfs_sync_stats "after syncfs"
    xfs_io -c fsync x
    xfs_sync_stats "after fsync"
    xfs_io -c fsync x
    xfs_sync_stats "after fsync #2"

    When this test is run in an overlay mount over xfs, the log force
    count does not increase with the syncfs command.

    Signed-off-by: Amir Goldstein
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • The overlay sb 'copyup_wq' and overlay inode 'copying' condition
    variable are about to replace the upper sb rename_lock, as finer
    grained synchronization objects for concurrent copy up.

    Suggested-by: Miklos Szeredi
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • This is needed for choosing between concurrent copyup
    using O_TMPFILE and legacy copyup using workdir+rename.
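Whether a directory's filesystem supports O_TMPFILE can be probed from userspace by trying to create an unnamed file there. This is a hypothetical userspace analogue, not the overlayfs check, and `dir_supports_tmpfile` is an invented name. Per open(2), unsupported filesystems fail with EOPNOTSUPP, and kernels without O_TMPFILE report EISDIR.

```c
#define _GNU_SOURCE            /* for O_TMPFILE */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical probe: try to create an unnamed temporary file in `dir`. */
bool dir_supports_tmpfile(const char *dir)
{
    assert(dir != NULL);
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd < 0)
        return false;          /* e.g. EOPNOTSUPP, EISDIR, ENOENT */
    close(fd);                 /* unnamed file is freed on last close */
    return true;
}
```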

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein