21 Nov, 2018

3 commits

  • commit 5e1275808630ea3b2c97c776f40e475017535f72 upstream.

    Kaixuxia reports that it's possible to crash overlayfs by removing the
    whiteout on the upper layer before creating a directory over it. This is a
    reproducer:

    mkdir lower upper work merge
    touch lower/file
    mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merge
    rm merge/file
    ls -al merge/file
    rm upper/file
    ls -al merge/
    mkdir merge/file

    Before commencing with a vfs_rename(..., RENAME_EXCHANGE) verify that the
    lookup of "upper" is positive and is a whiteout, and return ESTALE
    otherwise.
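
    The same precondition can be observed from userspace. A minimal sketch,
    assuming the documented overlayfs convention that a whiteout is a
    character device with device number 0/0; the function names are
    hypothetical and are not the kernel helpers touched by this commit:

```python
import errno
import os
import stat


def is_whiteout_stat(st_mode, st_rdev):
    """Return True if the mode/rdev pair describes an overlayfs whiteout.

    Assumption: a whiteout is a character device with device number 0/0.
    """
    return (
        stat.S_ISCHR(st_mode)
        and os.major(st_rdev) == 0
        and os.minor(st_rdev) == 0
    )


def is_whiteout(path):
    """Check a path on the upper layer without following symlinks."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        # A negative lookup is not a whiteout.
        return False
    return is_whiteout_stat(st.st_mode, st.st_rdev)


def check_whiteout_or_estale(path):
    """Mimic the described check: 0 if positive and a whiteout, else ESTALE."""
    return 0 if is_whiteout(path) else errno.ESTALE
```

    In the reproducer, check_whiteout_or_estale("upper/file") would be
    expected to return ESTALE once "rm upper/file" has removed the whiteout.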

    Reported-by: kaixuxia
    Signed-off-by: Miklos Szeredi
    Fixes: e9be9d5e76e3 ("overlay filesystem")
    Cc: # v3.18
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit 6cd078702f2f33cb6b19a682de3e9184112f1a46 upstream.

    Linking a non-copied-up file into a non-copied-up parent results in a
    nested call to mutex_lock_interruptible(&oi->lock). Fix this by copying up
    target parent before ovl_nlink_start(), same as done in ovl_rename().

    ~/unionmount-testsuite$ ./run --ov -s
    ~/unionmount-testsuite$ ln /mnt/a/foo100 /mnt/a/dir100/

    WARNING: possible recursive locking detected
    --------------------------------------------
    ln/1545 is trying to acquire lock:
    00000000bcce7c4c (&ovl_i_lock_key[depth]){+.+.}, at:
    ovl_copy_up_start+0x28/0x7d
    but task is already holding lock:
    0000000026d73d5b (&ovl_i_lock_key[depth]){+.+.}, at:
    ovl_nlink_start+0x3c/0xc1

    [SzM: this seems to be a false positive, but doing the copy-up first is
    harmless and removes the lockdep splat]

    Reported-by: syzbot+3ef5c0d1a5cb0b21e6be@syzkaller.appspotmail.com
    Fixes: 5f8415d6b87e ("ovl: persistent overlay inode nlink for...")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    [amir: backport to v4.18]
    Signed-off-by: Amir Goldstein
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     
  • commit babf4770be0adc69e6d2de150f4040f175e24beb upstream.

    We hit a BUG on kfree of an ERR_PTR()...

    Reported-by: syzbot+ff03fe05c717b82502d0@syzkaller.appspotmail.com
    Fixes: 8b88a2e64036 ("ovl: verify upper root dir matches lower root dir")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     

10 Nov, 2018

1 commit

  • commit a725356b6659469d182d662f22d770d83d3bc7b5 upstream.

    Commit 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze
    protection") created a wrapper do_clone_file_range() around
    vfs_clone_file_range(), moving the freeze protection to the former so
    that overlayfs could call the latter.

    The more common vfs practice is to call do_xxx helpers from vfs_xxx
    helpers, where freeze protection is taken in the vfs_xxx helper, so
    this anomaly could be a source of confusion.

    It seems that commit 8ede205541ff ("ovl: add reflink/copyfile/dedup
    support") may have fallen victim to this confusion:
    ovl_clone_file_range() calls the vfs_clone_file_range() helper in the
    hope of getting freeze protection on upper fs, but in fact this lets
    overlayfs bypass upper fs freeze protection.

    Swap the names of the two helpers to conform to common vfs practice
    and call the correct helpers from overlayfs and nfsd.
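
    A toy model of the naming convention, not kernel code: SuperBlock,
    freeze_protection(), vfs_xxx() and do_xxx() are illustrative names, and
    the counters only stand in for the real freeze accounting. It shows why
    calling the do_xxx helper directly skips the protection that the vfs_xxx
    helper takes.

```python
from contextlib import contextmanager


class SuperBlock:
    """Toy superblock that counts writers holding freeze protection."""

    def __init__(self):
        self.protected_writers = 0
        self.unprotected_calls = 0


@contextmanager
def freeze_protection(sb):
    """Stand-in for taking and dropping freeze protection around a write."""
    sb.protected_writers += 1
    try:
        yield
    finally:
        sb.protected_writers -= 1


def do_xxx(sb):
    """Low-level helper: expects the caller to already hold protection."""
    if sb.protected_writers == 0:
        # In this model, such a write could race with a filesystem freeze.
        sb.unprotected_calls += 1
    return "written"


def vfs_xxx(sb):
    """Conventional entry point: take freeze protection, then call do_xxx."""
    with freeze_protection(sb):
        return do_xxx(sb)
```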

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Fixes: 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze...")
    Signed-off-by: Amir Goldstein
    Signed-off-by: Sasha Levin

    Amir Goldstein
     

10 Oct, 2018

3 commits

  • commit 1a8f8d2a443ef9ad9a3065ba8c8119df714240fa upstream.

    Format has a typo: it was meant to be "%.*s", not "%*s". But at some point
    callers grew nonprintable values as well, so use "%*pE" instead with a
    maximized length.
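
    The difference between the two conversions can be seen with any
    printf-style formatter. The sketch below uses Python's % operator, which
    follows the same width/precision rules. It only illustrates the
    formatting difference; the out-of-bounds read itself is specific to C,
    where "%*s" keeps reading until a NUL terminator. The helper names are
    hypothetical.

```python
def format_min_width(name, length):
    """'%*s': length is a MINIMUM field width; the whole string is printed."""
    return "%*s" % (length, name)


def format_max_len(name, length):
    """'%.*s': length is a precision, i.e. a MAXIMUM number of characters."""
    return "%.*s" % (length, name)
```

    With name "index-entry" and length 5, the first form still emits the
    full string, while the second stops after "index".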

    Reported-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
    Cc: # v4.12
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit 63e132528032ce937126aba591a7b37ec593a6bb upstream.

    The memory leak was detected by kmemleak when running xfstests
    overlay/051,053

    Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     
  • commit 601350ff58d5415a001769532f6b8333820e5786 upstream.

    KASAN detected slab-out-of-bounds access in printk from overlayfs,
    because string format used %*s instead of %.*s.

    > BUG: KASAN: slab-out-of-bounds in string+0x298/0x2d0 lib/vsprintf.c:604
    > Read of size 1 at addr ffff8801c36c66ba by task syz-executor2/27811
    >
    > CPU: 0 PID: 27811 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #36
    ...
    > printk+0xa7/0xcf kernel/printk/printk.c:1996
    > ovl_lookup_index.cold.15+0xe8/0x1f8 fs/overlayfs/namei.c:689

    Reported-by: syzbot+376cea2b0ef340db3dd4@syzkaller.appspotmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin")
    Cc: # v4.13
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     

04 Oct, 2018

1 commit

  • commit 764baba80168ad3adafb521d2ab483ccbc49e344 upstream.

    Commit 31747eda41ef ("ovl: hash directory inodes for fsnotify")
    fixed an issue of inotify watch on directory that stops getting
    events after dropping dentry caches.

    A similar issue exists for non-dir non-upper files, for example:

    $ mkdir -p lower upper work merged
    $ touch lower/foo
    $ mount -t overlay -o lowerdir=lower,workdir=work,upperdir=upper none merged
    $ inotifywait merged/foo &
    $ echo 2 > /proc/sys/vm/drop_caches
    $ cat merged/foo

    inotifywait doesn't get the OPEN event, because ovl_lookup() called
    from 'cat' allocates a new overlay inode and does not reuse the
    watched inode.

    Fix this by hashing non-dir overlay inodes by lower real inode in
    the following cases that were not hashed before this change:
    - A non-upper overlay mount
    - A lower non-hardlink when index=off

    A helper ovl_hash_bylower() was added to put all the logic and
    documentation about which real inode an overlay inode is hashed by
    into one place.
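
    Why hashing matters for watches can be shown with a toy model. This is
    not the kernel's inode cache: OverlayInode and InodeCache are
    hypothetical classes, and a dictionary keyed by the real inode number
    stands in for the inode hash.

```python
class OverlayInode:
    """Toy stand-in for an overlay inode that may carry fsnotify watches."""

    def __init__(self, real_ino):
        self.real_ino = real_ino
        self.watchers = []


class InodeCache:
    """Toy inode cache keyed by the real inode number."""

    def __init__(self):
        self._hashed = {}

    def lookup(self, real_ino, hash_inode):
        # A hashed inode is found again by a later lookup, so a watch placed
        # on it is still attached to the inode that the lookup returns.
        if hash_inode and real_ino in self._hashed:
            return self._hashed[real_ino]
        # Unhashed inodes are allocated afresh on every lookup.
        inode = OverlayInode(real_ino)
        if hash_inode:
            self._hashed[real_ino] = inode
        return inode
```

    In this model, a second lookup with hash_inode=False returns a new
    object with no watchers, which mirrors inotifywait missing the OPEN
    event after the dentry caches were dropped.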

    The issue dates back to initial version of overlayfs, but this
    patch depends on ovl_inode code that was introduced in kernel v4.13.

    Cc: #v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Mark Salyzyn #4.14
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     

10 Sep, 2018

1 commit

  • commit 67810693077afc1ebf9e1646af300436cb8103c2 upstream.

    Only upper dir can be impure, but if we are in the middle of
    iterating a lower real dir, dir could be copied up and marked
    impure. We only want the impure cache if we started iterating
    a real upper dir to begin with.

    Aditya Kali reported that the following reproducer hits the
    WARN_ON(!cache->refcount) in ovl_get_cache():

    docker run --rm drupal:8.5.4-fpm-alpine \
    sh -c 'cd /var/www/html/vendor/symfony && \
    chown -R www-data:www-data . && ls -l .'

    Reported-by: Aditya Kali
    Tested-by: Aditya Kali
    Fixes: 4edb83bb1041 ('ovl: constant d_ino for non-merge dirs')
    Cc: # v4.14
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     

03 Aug, 2018

1 commit

  • commit e8d4bfe3a71537284a90561f77c85dea6c154369 upstream.

    When executing filesystem sync or umount on overlayfs,
    dirty data does not get synced as expected on upper filesystem.
    This patch fixes sync filesystem method to keep data consistency
    for overlayfs.

    Signed-off-by: Chengguang Xu
    Fixes: e593b2bf513d ("ovl: properly implement sync_filesystem()")
    Cc: #4.11
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Chengguang Xu
     

19 Apr, 2018

1 commit

  • commit 3ec9b3fafcaf441cc4d46b9742cd6ec0c79f8df0 upstream.

    As of now if we encounter an opaque dir while looking for a dentry, we set
    d->last=true. This means that there is no need to look further in any of
    the lower layers. This works fine as long as there are no redirects, or
    only relative redirects. But what if there is an absolute redirect on a
    child dentry of the opaque directory? We still need to continue to look
    into the next lower layer. This patch fixes it.

    Here is an example to demonstrate the issue. Say you have following setup.

    upper: /redirect (redirect=/a/b/c)
    lower1: /a/[b]/c ([b] is opaque) (c has absolute redirect=/a/b/d/)
    lower0: /a/b/d/foo

    Now "redirect" dir should merge with lower1:/a/b/c/ and lower0:/a/b/d.
    Note, despite the fact lower1:/a/[b] is opaque, we need to continue to look
    into lower0 because the child c has an absolute redirect.

    Following is a reproducer.

    Watch me make foo disappear:

    $ mkdir lower middle upper work work2 merged
    $ mkdir lower/origin
    $ touch lower/origin/foo
    $ mount -t overlay none merged/ \
    -olowerdir=lower,upperdir=middle,workdir=work2
    $ mkdir merged/pure
    $ mv merged/origin merged/pure/redirect
    $ umount merged
    $ mount -t overlay none merged/ \
    -olowerdir=middle:lower,upperdir=upper,workdir=work
    $ mv merged/pure/redirect merged/redirect

    Now you see foo inside a twice redirected merged dir:

    $ ls merged/redirect
    foo
    $ umount merged
    $ mount -t overlay none merged/ \
    -olowerdir=middle:lower,upperdir=upper,workdir=work

    After mount cycle you don't see foo inside the same dir:

    $ ls merged/redirect

    During middle layer lookup, the opaqueness of middle/pure is left in
    the lookup state and then middle/pure/redirect is wrongly treated as
    opaque.

    Fixes: 02b69b284cd7 ("ovl: lookup redirects")
    Cc: #v4.10
    Signed-off-by: Amir Goldstein
    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     

22 Feb, 2018

1 commit

  • commit 31747eda41ef3c30c09c5c096b380bf54013746a upstream.

    fsnotify pins a watched directory inode in cache, but if directory dentry
    is released, new lookup will allocate a new dentry and a new inode.
    Directory events will be notified on the new inode, while fsnotify listener
    is watching the old pinned inode.

    Hash all directory inodes to reuse the pinned inode on lookup. Pure upper
    dirs are hashed by real upper inode, merge and lower dirs are hashed by
    real lower inode.

    The reference to lower inode was being held by the lower dentry object
    in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry
    may drop lower inode refcount to zero. Add a refcount on behalf of the
    overlay inode to prevent that.

    As a by-product, hashing directory inodes also detects multiple
    redirected dirs pointing to the same lower dir, as well as an uncovered
    redirected dir target, and returns -ESTALE on lookup.

    The reported issue dates back to initial version of overlayfs, but this
    patch depends on ovl_inode code that was introduced in kernel v4.13.

    Cc: #v4.13
    Reported-by: Niklas Cassel
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Tested-by: Niklas Cassel
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     

17 Feb, 2018

2 commits

  • commit a5a927a7c82e28ea76599dee4019c41e372c911f upstream.

    The optimization in ovl_cache_get_impure() that tries to remove an
    unneeded "impure" xattr needs to take mnt_want_write() on upper fs.

    Fixes: 4edb83bb1041 ("ovl: constant d_ino for non-merge dirs")
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     
  • commit d796e77f1dd541fe34481af2eee6454688d13982 upstream.

    As a writable mount, it is not expected for overlayfs to return
    EINVAL/EROFS for fsync, even if dir/file is not changed.

    This commit fixes the case of fsync of directory, which is easier to
    address, because overlayfs already implements fsync file operation for
    directories.

    The problem reported by Raphael is that new PostgreSQL 10.0 with a
    database in overlayfs, where the lower layer is squashfs, fails to start.
    The failure is due to an fsync error: PostgreSQL does fsync on all
    existing db directories on startup, and a specific directory exists in
    the lower layer with no changes.
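
    The failing call pattern, as a minimal sketch: open a directory
    read-only and fsync it, much as a database might do on startup.
    fsync_directory() is a hypothetical helper; it returns 0 on success or
    the errno value, which on the affected mounts described above would be
    EINVAL or EROFS.

```python
import os


def fsync_directory(path):
    """Open path as a directory and fsync it; return 0 or an errno value."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    except OSError as exc:
        return exc.errno
    try:
        os.fsync(fd)
    except OSError as exc:
        # This is where an unchanged lower directory reported an error.
        return exc.errno
    finally:
        os.close(fd)
    return 0
```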

    Reported-by: Raphael Hertzog
    Signed-off-by: Amir Goldstein
    Tested-by: Raphaël Hertzog
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     

25 Dec, 2017

1 commit

  • commit 3382290ed2d5e275429cef510ab21889d3ccd164 upstream.

    [ Note, this is a Git cherry-pick of the following commit:

    506458efaf15 ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")

    ... for easier x86 PTI code testing and back-porting. ]

    READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
    can be used instead of lockless_dereference() without any change in
    semantics.

    Signed-off-by: Will Deacon
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     

20 Dec, 2017

2 commits

  • commit b02a16e6413a2f782e542ef60bad9ff6bf212f8a upstream.

    This fixes a regression with readdir of impure dir in overlayfs
    that is shared to VM via 9p fs.

    Reported-by: Miguel Bernal Marin
    Fixes: 4edb83bb1041 ("ovl: constant d_ino for non-merge dirs")
    Signed-off-by: Amir Goldstein
    Tested-by: Miguel Bernal Marin
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     
  • commit 08d8f8a5b094b66b29936e8751b4a818b8db1207 upstream.

    Right now we seem to be passing index as "lowerdentry" and origin.dentry
    as "upperdentry". IIUC, we should pass these parameters in reversed order
    and this looks like a bug.

    Signed-off-by: Vivek Goyal
    Acked-by: Amir Goldstein
    Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries")
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Vivek Goyal
     

30 Nov, 2017

1 commit

  • commit 5455f92b54e516995a9ca45bbf790d3629c27a93 upstream.

    If ovl_check_origin() fails, we should put upperdentry. We have a reference
    on it by now. So goto out_put_upper instead of out.

    Fixes: a9d019573e88 ("ovl: lookup non-dir copy-up-origin by file handle")
    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Vivek Goyal
     

24 Oct, 2017

3 commits

  • With index=on, ovl_indexdir_cleanup() tries to clean up invalid index
    entries (e.g. bad index name). This behavior could result in cleaning of
    entries created by newer kernels and is therefore undesirable.
    Instead, abort mount if such entries are encountered. We still clean up
    'stale' entries and 'orphan' entries; both of those cases can be a
    result of offline changes to lower and upper dirs.

    When encountering an index entry of type directory or whiteout, the
    kernel was supposed to fall back to a read-only mount, but the
    fill_super() operation returns EROFS in this case instead of returning
    success with the read-only mount flag, so mount fails when encountering
    directory or whiteout index entries. Bless this behavior by returning
    -EINVAL on directory and whiteout index entries as we do for all
    unsupported index entries.

    Fixes: 61b674710cd9 ("ovl: do not cleanup directory and whiteout index..")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein

    Amir Goldstein
     
  • Treat ENOENT from index entry lookup the same way as treating a returned
    negative dentry. Apparently, either could be returned if file is not
    found, depending on the underlying file system.

    Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein

    Amir Goldstein
     
  • Commit fbaf94ee3cd5 ("ovl: don't set origin on broken lower hardlink")
    attempts to avoid the condition of a non-indexed upper inode with a lower
    hardlink as origin. If this condition is found, lookup returns EIO.

    The protection of the commit mentioned above does not cover the case of a
    lower that is not a hardlink when it is copied up (with either
    index=off/on) and is then hardlinked while the overlay is offline.

    Changes to lower layer while overlayfs is offline should not result in
    unexpected behavior, so a permanent EIO error after creating a link in
    lower layer should not be considered as correct behavior.

    This fix replaces EIO error with success in cases where upper has origin
    but no index is found, or index is found that does not match upper
    inode. In those cases, lookup will not fail and the returned overlay inode
    will be hashed by upper inode instead of by lower origin inode.

    Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

19 Oct, 2017

2 commits


05 Oct, 2017

5 commits

  • Enforcing exclusive ownership on upper/work dirs caused a docker
    regression: https://github.com/moby/moby/issues/34672.

    Euan spotted the regression and pointed to the offending commit.
    Vivek has brought the regression to my attention and provided this
    reproducer:

    Terminal 1:

    mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/

    Terminal 2:

    unshare -m

    Terminal 1:

    umount merged
    mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/
    mount: /root/overlay-testing/merged: none already mounted or mount point busy

    To fix the regression, I replaced the error with an alarming warning.
    With index feature enabled, mount does fail, but logs a suggestion to
    override exclusive dir protection by disabling index.
    Note that index=off mount does take the inuse locks, so a concurrent
    index=off will issue the warning and a concurrent index=on mount will fail.
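
    The resulting policy can be summarized as a small decision function.
    This is a sketch of the behavior described above, not the kernel code:
    claim_dir() and its string results are hypothetical, and a plain set
    stands in for the inuse locks.

```python
def claim_dir(in_use_dirs, path, index_enabled):
    """Return "ok", "warn" or "fail" for a mount trying to claim path.

    in_use_dirs is a set of upper/work dirs already claimed by a mount.
    """
    if path in in_use_dirs:
        # A dir that is already in use only fails the mount when the index
        # feature is enabled; otherwise the mount proceeds with a warning.
        return "fail" if index_enabled else "warn"
    # Both index=on and index=off mounts take the inuse lock.
    in_use_dirs.add(path)
    return "ok"
```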

    Documentation was updated to reflect this change.

    Fixes: 2cac0c00a6cd ("ovl: get exclusive ownership on upper/work dirs")
    Cc: # v4.13
    Reported-by: Euan Kemp
    Reported-by: Vivek Goyal
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • Use the ovl_lock_rename_workdir() helper which requires
    unlock_rename() only on lock success.

    Fixes: fd210b7d67ee ("ovl: move copy up lock out")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • index dentry was not released when breaking out of the loop
    due to index verification error.

    Fixes: 415543d5c64f ("ovl: cleanup bad and stale index entries on mount")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

15 Sep, 2017

1 commit

  • Pull mount flag updates from Al Viro:
    "Another chunk of fmount preparations from dhowells; only trivial
    conflicts for that part. It separates MS_... bits (very grotty
    mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
    only a small subset of MS_... stuff).

    This does *not* convert the filesystems to new constants; only the
    infrastructure is done here. The next step in that series is where the
    conflicts would be; that's the conversion of filesystems. It's purely
    mechanical and it's better done after the merge, so if you could run
    something like

    list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

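    sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
    -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
    -e 's/\<MS_NODEV\>/SB_NODEV/g' \
    -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
    -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
    -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
    -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
    -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
    -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
    -e 's/\<MS_SILENT\>/SB_SILENT/g' \
    -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
    -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
    -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
    -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
    $list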

    and commit it with something along the lines of 'convert filesystems
    away from use of MS_... constants' as commit message, it would save
    quite a bit of headache next cycle"
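
    The mechanical conversion can be illustrated on a single string. The
    sketch assumes each substitution is a whole-word rename of an MS_ name
    from the quoted shell loop to its SB_ counterpart; convert_ms_to_sb() is
    a hypothetical helper, and a Python regular expression with \b word
    boundaries stands in for the sed expressions.

```python
import re

# The flag names taken from the quoted "for i in MS_..." loop.
FLAGS = [
    "RDONLY", "NOSUID", "NODEV", "NOEXEC", "SYNCHRONOUS", "MANDLOCK",
    "DIRSYNC", "NOATIME", "NODIRATIME", "SILENT", "POSIXACL", "KERNMOUNT",
    "I_VERSION", "LAZYTIME",
]

# Match MS_<FLAG> only as a whole word, so longer identifiers are untouched.
_PATTERN = re.compile(r"\bMS_(%s)\b" % "|".join(FLAGS))


def convert_ms_to_sb(source):
    """Rename the listed MS_ constants to SB_ in one piece of source text."""
    return _PATTERN.sub(r"SB_\1", source)
```

    Constants outside the list, such as MS_BIND, are left alone, which
    matches the stated goal of converting only the superblock flag subset.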

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    VFS: Differentiate mount flags (MS_*) from internal superblock flags
    VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
    vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

    Linus Torvalds
     

14 Sep, 2017

2 commits

  • GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived
    and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. Its
    primary motivation was to allow users to tell that an allocation is
    short lived and so the allocator can try to place such allocations close
    together and prevent long term fragmentation. As much as this sounds
    like a reasonable semantic, it becomes much less clear when to use the
    high-level GFP_TEMPORARY allocation flag. How long is temporary? Can the
    context holding that memory sleep? Can it take locks? It seems there is
    no good answer for those questions.

    The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
    __GFP_RECLAIMABLE, which in itself is tricky because basically none of
    the existing callers provides a way to reclaim the allocated memory. So
    this is rather misleading and hard to evaluate for any benefits.

    I have checked some random users and none of them has added the flag
    with a specific justification. I suspect most of them just copied from
    other existing users and others just thought it might be a good idea to
    use without any measuring. This suggests that GFP_TEMPORARY just
    motivates for cargo cult usage without any reasoning.

    I believe that our gfp flags are quite complex already and especially
    those with high-level semantics should be clearly defined to prevent
    confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
    converting all existing users to simply use GFP_KERNEL. Please note that
    SLAB users with shrinkers will still get the __GFP_RECLAIMABLE heuristic
    and so they will be placed properly for memory fragmentation prevention.

    I can see reasons we might want some gfp flag to reflect short-term
    allocations but I propose starting from a clear semantic definition and
    only then adding users with proper justification.

    This was brought up before LSF this year by Matthew [1] and it
    turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
    seems to be a heuristic without any measured advantage for most (if not
    all) its current users. The follow up discussion has revealed that
    opinions on what might be temporary allocation differ a lot between
    developers. So rather than trying to tweak existing users into a
    semantic which they haven't expected I propose to simply remove the flag
    and start from scratch if we really need a semantic for short term
    allocations.

    [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

    [akpm@linux-foundation.org: fix typo]
    [akpm@linux-foundation.org: coding-style fixes]
    [sfr@canb.auug.org.au: drm/i915: fix up]
    Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Stephen Rothwell
    Acked-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Matthew Wilcox
    Cc: Neil Brown
    Cc: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Pull overlayfs updates from Miklos Szeredi:
    "This fixes d_ino correctness in readdir, which brings overlayfs on par
    with normal filesystems regarding inode number semantics, as long as
    all layers are on the same filesystem.

    There are also some bug fixes, one in particular (random ioctl's
    shouldn't be able to modify lower layers) that touches some vfs code,
    but of course no-op for non-overlay fs"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: fix false positive ESTALE on lookup
    ovl: don't allow writing ioctl on lower layer
    ovl: fix relatime for directories
    vfs: add flags to d_real()
    ovl: cleanup d_real for negative
    ovl: constant d_ino for non-merge dirs
    ovl: constant d_ino across copy up
    ovl: fix readdir error value
    ovl: check snprintf return

    Linus Torvalds
     

12 Sep, 2017

1 commit

  • Commit b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up origin")
    verifies that the origin lower inode stored in the overlayfs inode matched
    the inode of a copy up origin dentry found by lookup.

    There is a false positive result in that check when lower fs does not
    support file handles and copy up origin cannot be followed by file handle
    at lookup time.

    The false positive happens when finding an overlay inode in cache on a
    copied up overlay dentry lookup. The overlay inode still 'remembers' the
    copy up origin inode, but the copy up origin dentry is not available for
    verification.

    Relax the check in case copy up origin dentry is not available.

    Fixes: b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up...")
    Cc: # v4.13
    Reported-by: Jordi Pujol
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

05 Sep, 2017

2 commits


04 Sep, 2017

1 commit


10 Aug, 2017

1 commit

  • While we could replace the smp_mb__before_spinlock() with the new
    smp_mb__after_spinlock(), the normal pattern is to use
    smp_store_release() to publish an object that is used for
    lockless_dereference() -- and mirrors the regular rcu_assign_pointer()
    / rcu_dereference() patterns.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Al Viro
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

28 Jul, 2017

4 commits

  • Impure directories are ones which contain objects with origins (i.e. those
    that have been copied up). These are relevant to readdir operation only
    because of the d_ino field, no other transformation is necessary. Also a
    directory can become impure between two getdents(2) calls.

    This patch creates a cache for impure directories. Unlike the cache for
    merged directories, this one only contains entries with origin and is not
    refcounted but has its lifetime tied to that of the dentry.

    Similarly to the merged cache, the impure cache is invalidated based on a
    version number. This version number is incremented when an entry with
    origin is added or removed from the directory.

    If the cache is empty, then the impure xattr is removed from the directory.
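
    The invalidation scheme can be sketched as a toy model. Directory and
    ImpureCache are hypothetical classes, a boolean attribute stands in for
    the impure xattr, and nothing here reflects the kernel's actual data
    structures.

```python
class Directory:
    """Toy directory: maps name -> has_origin, plus a version counter."""

    def __init__(self):
        self.entries = {}
        self.version = 0
        self.impure = False  # stands in for the "impure" xattr

    def add(self, name, has_origin=False):
        self.entries[name] = has_origin
        if has_origin:
            self.impure = True
            self.version += 1  # only entries with origin bump the version

    def remove(self, name):
        if self.entries.pop(name):
            self.version += 1


class ImpureCache:
    """Caches only the names of entries with origin for one directory."""

    def __init__(self):
        self._names = None
        self._version = -1

    def get(self, directory):
        # Rebuild when nothing is cached yet or the version has moved on.
        if self._names is None or self._version != directory.version:
            self._names = {
                name for name, origin in directory.entries.items() if origin
            }
            self._version = directory.version
            if not self._names:
                # An empty cache means the directory is no longer impure.
                directory.impure = False
        return self._names
```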

    This patch also fixes up handling of d_ino for the ".." entry if the parent
    directory is merged.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • When all layers are on the same fs, and iterating a directory which may
    contain copy up entries, call vfs_getattr() on the overlay entries to make
    sure that d_ino will be consistent with st_ino from stat(2).

    There is an overhead of lookup per upper entry in readdir.

    The overhead is minimal if the iterated entries are already in dcache. It
    is also quite useful for the common case of 'ls -l' that readdir()
    pre-populates the dcache with the listed entries, making the following
    stat() calls faster.
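
    The consistency in question can be checked from userspace. A minimal
    sketch with hypothetical helper names: on Unix, os.scandir() reports the
    inode number carried in the directory entry (d_ino), which is compared
    here with st_ino from lstat().

```python
import os


def compare_d_ino_with_st_ino(directory):
    """Map each entry name to (d_ino from readdir, st_ino from lstat)."""
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            d_ino = entry.inode()  # taken from the directory entry on Unix
            st_ino = os.lstat(entry.path).st_ino
            result[entry.name] = (d_ino, st_ino)
    return result


def inconsistent_entries(directory):
    """Return the sorted names whose d_ino differs from their st_ino."""
    pairs = compare_d_ino_with_st_ino(directory)
    return sorted(name for name, (d_ino, st_ino) in pairs.items()
                  if d_ino != st_ino)
```

    Running inconsistent_entries() on a merged directory that contains
    copied up entries is one way to see whether d_ino and st_ino agree.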

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • actor's return value is taken as a bool (filled/not filled) so we need to
    return the error in the context.
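
    The pattern can be sketched as follows. This is a toy model with
    hypothetical names (ReaddirContext, actor, iterate_dir) and a made-up
    failure condition; it only shows an error travelling through the
    context while the callback itself returns a plain bool.

```python
import errno


class ReaddirContext:
    """Carries both the collected names and any error hit by the actor."""

    def __init__(self):
        self.names = []
        self.err = 0


def actor(ctx, name):
    """Return True to keep iterating, False to stop.

    The bool cannot carry an error code, so the error is stored in ctx.
    """
    if len(name) > 255:  # hypothetical failure, for illustration only
        ctx.err = -errno.ENAMETOOLONG
        return False
    ctx.names.append(name)
    return True


def iterate_dir(entries, ctx):
    """Feed entries to the actor and report the error kept in the context."""
    for name in entries:
        if not actor(ctx, name):
            break
    return ctx.err
```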

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Signed-off-by: Miklos Szeredi

    Miklos Szeredi