07 Apr, 2018

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted stuff, including Christoph's I_DIRTY patches"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: move I_DIRTY_INODE to fs.h
    ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
    ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
    gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls
    fs: fold open_check_o_direct into do_dentry_open
    vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents
    vfs: make sure struct filename->iname is word-aligned
    get rid of pointless includes of fs_struct.h
    [poll] annotate SAA6588_CMD_POLL users

    Linus Torvalds
     

03 Apr, 2018

9 commits

  • Using the ksys_ftruncate() wrapper allows us to get rid of in-kernel
    calls to the sys_ftruncate() syscall. The ksys_ prefix denotes that this
    function is meant as a drop-in replacement for the syscall. In
    particular, it uses the same calling convention as sys_ftruncate().
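
The wrapper pattern described above can be sketched in plain userspace C. This is only an illustration, not the kernel code: every `demo_` name, the fd table, and the error choices are assumptions made up for this sketch. The point it shows is that the syscall entry point becomes a thin shim, while other in-kernel callers use the `ksys_`-style function with the same arguments and return convention.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an open file; not the kernel's struct file. */
struct demo_file {
    long size;
    int writable;
};

#define DEMO_MAX_FDS 4
struct demo_file *demo_fd_table[DEMO_MAX_FDS];

/* Internal helper that does the real work (assumed name, for illustration). */
static long demo_do_ftruncate(unsigned int fd, long length)
{
    if (fd >= DEMO_MAX_FDS || demo_fd_table[fd] == NULL)
        return -EBADF;
    if (length < 0 || !demo_fd_table[fd]->writable)
        return -EINVAL;
    demo_fd_table[fd]->size = length;
    return 0;
}

/* In-kernel entry point: same arguments and return convention as the syscall. */
long demo_ksys_ftruncate(unsigned int fd, unsigned long length)
{
    return demo_do_ftruncate(fd, (long)length);
}

/* Syscall entry point: a thin shim, so no other code needs to call it. */
long demo_sys_ftruncate(unsigned int fd, unsigned long length)
{
    return demo_ksys_ftruncate(fd, length);
}
```

A caller that used to invoke the syscall function directly would switch to `demo_ksys_ftruncate(fd, length)` without changing its arguments or its error handling.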

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the fs-internal do_fchownat() wrapper allows us to get rid of
    fs-internal calls to the sys_fchownat() syscall.

    Introducing the ksys_fchown() helper and the ksys_{,}chown() wrappers
    allows us to avoid the in-kernel calls to the sys_{,l,f}chown() syscalls.
    The ksys_ prefix denotes that these functions are meant as a drop-in
    replacement for the syscalls. In particular, they use the same calling
    convention as sys_{,l,f}chown().
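
As a rough illustration of how the path-based wrappers can share one `*at`-style helper, here is a userspace sketch. The `demo_` functions and the request-recording struct are hypothetical; the two constants mirror the Linux values of `AT_FDCWD` and `AT_SYMLINK_NOFOLLOW`, but the exact in-kernel call chain is an assumption here, not something the commit message spells out.

```c
#include <string.h>

#define DEMO_AT_FDCWD            (-100)  /* mirrors AT_FDCWD */
#define DEMO_AT_SYMLINK_NOFOLLOW 0x100   /* mirrors AT_SYMLINK_NOFOLLOW */

/* Records the last request so the argument mapping can be inspected. */
struct demo_chown_req {
    int dfd;
    char path[64];
    unsigned int uid, gid;
    int flags;
};
struct demo_chown_req demo_last_req;

/* Stand-in for the fs-internal helper: one implementation for all variants. */
static int demo_do_fchownat(int dfd, const char *path,
                            unsigned int uid, unsigned int gid, int flags)
{
    demo_last_req.dfd = dfd;
    strncpy(demo_last_req.path, path, sizeof(demo_last_req.path) - 1);
    demo_last_req.path[sizeof(demo_last_req.path) - 1] = '\0';
    demo_last_req.uid = uid;
    demo_last_req.gid = gid;
    demo_last_req.flags = flags;
    return 0;
}

/* chown(): resolve relative to the cwd and follow symlinks. */
int demo_ksys_chown(const char *path, unsigned int uid, unsigned int gid)
{
    return demo_do_fchownat(DEMO_AT_FDCWD, path, uid, gid, 0);
}

/* lchown(): same, but operate on the symlink itself. */
int demo_ksys_lchown(const char *path, unsigned int uid, unsigned int gid)
{
    return demo_do_fchownat(DEMO_AT_FDCWD, path, uid, gid,
                            DEMO_AT_SYMLINK_NOFOLLOW);
}
```

The only difference between the two wrappers is the flag word, which is why a single helper is enough.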

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the fs-internal do_faccessat() helper allows us to get rid of
    fs-internal calls to the sys_faccessat() syscall.

    Introducing the ksys_access() wrapper allows us to avoid the in-kernel
    calls to the sys_access() syscall. The ksys_ prefix denotes that this
    function is meant as a drop-in replacement for the syscall. In
    particular, it uses the same calling convention as sys_access().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • … in-kernel calls to syscall

    Using the fs-internal do_fchmodat() helper allows us to get rid of
    fs-internal calls to the sys_fchmodat() syscall.

    Introducing the ksys_fchmod() helper and the ksys_chmod() wrapper allows
    us to avoid the in-kernel calls to the sys_fchmod() and sys_chmod()
    syscalls. The ksys_ prefix denotes that these functions are meant as a
    drop-in replacement for the syscalls. In particular, they use the same
    calling convention as sys_fchmod() and sys_chmod().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

    Dominik Brodowski
     
  • Using the fs-internal do_linkat() helper allows us to get rid of
    fs-internal calls to the sys_linkat() syscall.

    Introducing the ksys_link() wrapper allows us to avoid the in-kernel
    calls to the sys_link() syscall. The ksys_ prefix denotes that this function
    is meant as a drop-in replacement for the syscall. In particular, it uses
    the same calling convention as sys_link().

    In the near future, the only fs-external user of ksys_link() should be
    converted to use vfs_link() instead.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the fs-internal do_mknodat() helper allows us to get rid of
    fs-internal calls to the sys_mknodat() syscall.

    Introducing the ksys_mknod() wrapper allows us to avoid the in-kernel
    calls to the sys_mknod() syscall. The ksys_ prefix denotes that this function
    is meant as a drop-in replacement for the syscall. In particular, it uses
    the same calling convention as sys_mknod().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the fs-internal do_symlinkat() helper allows us to get rid of
    fs-internal calls to the sys_symlinkat() syscall.

    Introducing the ksys_symlink() wrapper allows us to avoid the in-kernel
    calls to the sys_symlink() syscall. The ksys_ prefix denotes that this
    function is meant as a drop-in replacement for the syscall. In particular,
    it uses the same calling convention as sys_symlink().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the fs-internal do_mkdirat() helper allows us to get rid of
    fs-internal calls to the sys_mkdirat() syscall.

    Introducing the ksys_mkdir() wrapper allows us to avoid the in-kernel calls
    to the sys_mkdir() syscall. The ksys_ prefix denotes that this function is
    meant as a drop-in replacement for the syscall. In particular, it uses the
    same calling convention as sys_mkdir().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using this wrapper allows us to avoid the in-kernel calls to the
    sys_rmdir() syscall. The ksys_ prefix denotes that this function is meant
    as a drop-in replacement for the syscall. In particular, it uses the same
    calling convention as sys_rmdir().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     

28 Mar, 2018

1 commit


10 Nov, 2017

1 commit


14 Sep, 2017

1 commit

  • Pull overlayfs updates from Miklos Szeredi:
    "This fixes d_ino correctness in readdir, which brings overlayfs on par
    with normal filesystems regarding inode number semantics, as long as
    all layers are on the same filesystem.

    There are also some bug fixes, one in particular (random ioctl's
    shouldn't be able to modify lower layers) that touches some vfs code,
    but of course no-op for non-overlay fs"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: fix false positive ESTALE on lookup
    ovl: don't allow writing ioctl on lower layer
    ovl: fix relatime for directories
    vfs: add flags to d_real()
    ovl: cleanup d_real for negative
    ovl: constant d_ino for non-merge dirs
    ovl: constant d_ino across copy up
    ovl: fix readdir error value
    ovl: check snprintf return

    Linus Torvalds
     

05 Sep, 2017

1 commit

  • Problem with ioctl() is that it's a file operation, yet often used as an
    inode operation (i.e. modify the inode despite the file being opened for
    read-only).

    mnt_want_write_file() is used by filesystems in such cases to get write
    access on an arbitrary open file.

    Since overlayfs lets filesystems do all file operations, including ioctl,
    this can lead to mnt_want_write_file() returning OK for a lower file and
    modification of that lower file.

    This patch prevents modification by checking if the file is from an
    overlayfs lower layer and returning EPERM in that case.

    Need to introduce a mnt_want_write_file_path() variant that still does the
    old thing for inode operations that can do the copy up + modification
    correctly in such cases (fchown, fsetxattr, fremovexattr).
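
The split between the two entry points can be pictured with a small userspace model. Everything here is hypothetical (the struct, its fields, and the `demo_` function names); it only mirrors the behaviour the message describes: the plain variant refuses files that come from a lower layer with EPERM, while the `_path` variant keeps the old check for callers that copy up first.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of an open file; not the kernel's struct file. */
struct demo_ovl_file {
    bool on_overlay;    /* opened through an overlay mount */
    bool from_lower;    /* the real file lives on a lower layer */
    bool mnt_writable;  /* the mount itself allows writes */
};

/* Old behaviour, kept for operations that copy up before modifying. */
int demo_want_write_file_path(const struct demo_ovl_file *f)
{
    return f->mnt_writable ? 0 : -EROFS;
}

/* New behaviour: never hand out write access to a lower-layer file. */
int demo_want_write_file(const struct demo_ovl_file *f)
{
    if (f->on_overlay && f->from_lower)
        return -EPERM;
    return demo_want_write_file_path(f);
}
```

For a file that is not on an overlay, both functions return the same result, which matches the statement that non-overlayfs filesystems are unaffected.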

    This does not address the correctness of such ioctls on overlayfs (the
    correct way would be to copy up and attempt to perform ioctl on upper
    file).

    In theory this could be a regression. We very much hope that nobody is
    relying on such a hack in any sane setup.

    While this patch meddles in VFS code, it has no effect on non-overlayfs
    filesystems.

    Reported-by: "zhangyi (F)"
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

02 Sep, 2017

1 commit

  • When we introduced the bmap redo log items, we set MS_ACTIVE on the
    mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
    from being truncated prematurely during log recovery. This also had the
    effect of putting linked inodes on the lru instead of evicting them.

    Unfortunately, we neglected to find all those unreferenced lru inodes
    and evict them after finishing log recovery, which means that we leak
    them if anything goes wrong in the rest of xfs_mountfs, because the lru
    is only cleaned out on unmount.

    Therefore, evict unreferenced inodes in the lru list immediately
    after clearing MS_ACTIVE.

    Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
    Signed-off-by: Darrick J. Wong
    Cc: viro@ZenIV.linux.org.uk
    Reviewed-by: Brian Foster

    Darrick J. Wong
     

10 May, 2017

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted bits and pieces from various people. No common topic in this
    pile, sorry"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs/affs: add rename exchange
    fs/affs: add rename2 to prepare multiple methods
    Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()
    fs: don't set *REFERENCED on single use objects
    fs: compat: Remove warning from COMPATIBLE_IOCTL
    remove pointless extern of atime_need_update_rcu()
    fs: completely ignore unknown open flags
    fs: add a VALID_OPEN_FLAGS
    fs: remove _submit_bh()
    fs: constify tree_descr arrays passed to simple_fill_super()
    fs: drop duplicate header percpu-rwsem.h
    fs/affs: bugfix: Write files greater than page size on OFS
    fs/affs: bugfix: enable writes on OFS disks
    fs/affs: remove node generation check
    fs/affs: import amigaffs.h
    fs/affs: bugfix: make symbolic links work again

    Linus Torvalds
     

30 Apr, 2017

1 commit


18 Apr, 2017

1 commit


31 Jan, 2017

1 commit


18 Dec, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    "In this pile:

    - autofs-namespace series
    - dedupe stuff
    - more struct path constification"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
    ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features
    ocfs2: charge quota for reflinked blocks
    ocfs2: fix bad pointer cast
    ocfs2: always unlock when completing dio writes
    ocfs2: don't eat io errors during _dio_end_io_write
    ocfs2: budget for extent tree splits when adding refcount flag
    ocfs2: prohibit refcounted swapfiles
    ocfs2: add newlines to some error messages
    ocfs2: convert inode refcount test to a helper
    simple_write_end(): don't zero in short copy into uptodate
    exofs: don't mess with simple_write_{begin,end}
    9p: saner ->write_end() on failing copy into non-uptodate page
    fix gfs2_stuffed_write_end() on short copies
    fix ceph_write_end()
    nfs_write_end(): fix handling of short copies
    vfs: refactor clone/dedupe_file_range common functions
    fs: try to clone files first in vfs_copy_file_range
    vfs: misc struct path constification
    namespace.c: constify struct path passed to a bunch of primitives
    quota: constify struct path in quota_on
    ...

    Linus Torvalds
     

06 Dec, 2016

1 commit


30 Nov, 2016

1 commit


11 Oct, 2016

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
     

08 Oct, 2016

1 commit


28 Sep, 2016

1 commit


19 Sep, 2016

1 commit


16 Sep, 2016

1 commit

  • On overlayfs relatime_need_update() needs inode times to be correct on
    overlay inode. But i_mtime and i_ctime are updated by filesystem code on
    underlying inode only, so they will be out-of-date on the overlay inode.

    This patch copies the times from the underlying inode if needed. This
    can't be done if called from RCU lookup (link following) but link m/ctime
    are not updated by fs, so this is all right.
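
A simplified userspace model of the idea follows. The struct layout and `demo_` names are invented for this sketch, and the relatime rule is reduced to three checks (mtime not older than atime, ctime not older than atime, atime at least a day old); the real kernel code may differ in detail. It shows why stale m/ctime on the overlay inode would give the wrong answer, and that the copy is skipped in the RCU case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical inode model: 'real' is NULL unless this is an overlay inode. */
struct demo_inode {
    time_t atime, mtime, ctime;
    const struct demo_inode *real;
};

/* Refresh m/ctime from the underlying inode; not possible under RCU walk. */
static void demo_update_ovl_times(struct demo_inode *inode, bool rcu)
{
    if (rcu || inode->real == NULL)
        return;
    inode->mtime = inode->real->mtime;
    inode->ctime = inode->real->ctime;
}

/* Simplified relatime rule, evaluated after the times have been refreshed. */
bool demo_relatime_need_update(struct demo_inode *inode, time_t now, bool rcu)
{
    demo_update_ovl_times(inode, rcu);

    if (inode->mtime >= inode->atime)
        return true;               /* modified since last access */
    if (inode->ctime >= inode->atime)
        return true;               /* changed since last access */
    if (now - inode->atime >= 24 * 60 * 60)
        return true;               /* atime is at least a day old */
    return false;
}
```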

    This patch doesn't change functionality for anything but overlayfs.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

07 Aug, 2016

1 commit

  • Pull binfmt_misc update from James Bottomley:
    "This update is to allow architecture emulation containers to function
    such that the emulation binary can be housed outside the container
    itself. The container and fs parts both have acks from relevant
    experts.

    To use the new feature you have to add an F option to your binfmt_misc
    configuration"

    From the docs:
    "The usual behaviour of binfmt_misc is to spawn the binary lazily when
    the misc format file is invoked. However, this doesn't work very well
    in the face of mount namespaces and changeroots, so the F mode opens
    the binary as soon as the emulation is installed and uses the opened
    image to spawn the emulator, meaning it is always available once
    installed, regardless of how the environment changes"

    * tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc:
    binfmt_misc: add F option description to documentation
    binfmt_misc: add persistent opened binary handler for containers
    fs: add filp_clone_open API

    Linus Torvalds
     

04 Aug, 2016

1 commit


03 Aug, 2016

1 commit


28 Jul, 2016

1 commit

  • Pull xfs updates from Dave Chinner:
    "The major addition is the new iomap based block mapping
    infrastructure. We've been kicking this about locally for years, but
    there are other filesystems that want to use it too (e.g. gfs2). Now it
    is fully working, reviewed and ready to be merged and used by other
    filesystems.

    There are a lot of other fixes and cleanups in the tree, but those are
    XFS internal things and none are of the scale or visibility of the
    iomap changes. See below for details.

    I am likely to send another pull request next week - we're just about
    ready to merge some new functionality (on disk block->owner reverse
    mapping infrastructure), but that's a huge chunk of code (74 files
    changed, 7283 insertions(+), 1114 deletions(-)) so I'm keeping that
    separate to all the "normal" pull request changes so they don't get
    lost in the noise.

    Summary of changes in this update:
    - generic iomap based IO path infrastructure
    - generic iomap based fiemap implementation
    - xfs iomap based IO path implementation
    - buffer error handling fixes
    - tracking of in flight buffer IO for unmount serialisation
    - direct IO and DAX io path separation and simplification
    - shortform directory format definition changes for wider platform
    compatibility
    - various buffer cache fixes
    - cleanups in preparation for rmap merge
    - error injection cleanups and fixes
    - log item format buffer memory allocation restructuring to prevent
    rare OOM reclaim deadlocks
    - sparse inode chunks are now fully supported"

    * tag 'xfs-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (53 commits)
    xfs: remove EXPERIMENTAL tag from sparse inode feature
    xfs: bufferhead chains are invalid after end_page_writeback
    xfs: allocate log vector buffers outside CIL context lock
    libxfs: directory node splitting does not have an extra block
    xfs: remove dax code from object file when disabled
    xfs: skip dirty pages in ->releasepage()
    xfs: remove __arch_pack
    xfs: kill xfs_dir2_inou_t
    xfs: kill xfs_dir2_sf_off_t
    xfs: split direct I/O and DAX path
    xfs: direct calls in the direct I/O path
    xfs: stop using generic_file_read_iter for direct I/O
    xfs: split xfs_file_read_iter into buffered and direct I/O helpers
    xfs: remove s_maxbytes enforcement in xfs_file_read_iter
    xfs: kill ioflags
    xfs: don't pass ioflags around in the ioctl path
    xfs: track and serialize in-flight async buffers against unmount
    xfs: exclude never-released buffers from buftarg I/O accounting
    xfs: don't reset b_retries to 0 on every failure
    xfs: remove extraneous buffer flag changes
    ...

    Linus Torvalds
     

21 Jun, 2016

1 commit

  • Add infrastructure for multipage buffered writes. This is implemented
    using a main iterator that applies an actor function to a range that
    can be written.

    This infrastructure is used to implement a buffered write helper, one
    to zero file ranges and one to implement the ->page_mkwrite VM
    operations. All of them borrow a fair amount of code from fs/buffer.c
    for now by using an internal version of __block_write_begin that
    gets passed an iomap and builds the corresponding buffer head.

    The file system gets a set of paired ->iomap_begin and ->iomap_end
    calls which allow it to map/reserve a range and get a notification
    once the write code is finished with it.
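
The iterator/actor split can be sketched in userspace C as follows. All types and `demo_` names are hypothetical, and the toy "file system" that maps fixed 4096-byte extents exists only to make the example self-contained. The loop maps one extent with the begin callback, lets the actor process as much of the range as the mapping covers, and then reports back through the end callback.

```c
#include <errno.h>
#include <stddef.h>

struct demo_iomap { long offset; long length; };

struct demo_iomap_ops {
    int (*iomap_begin)(long pos, long length, struct demo_iomap *map);
    int (*iomap_end)(long pos, long length, long written,
                     struct demo_iomap *map);
};

typedef long (*demo_actor_t)(long pos, long length, void *data,
                             const struct demo_iomap *map);

/* Map one extent, run the actor on it, then notify the file system. */
static long demo_iomap_apply(long pos, long length,
                             const struct demo_iomap_ops *ops,
                             void *data, demo_actor_t actor)
{
    struct demo_iomap map = { 0, 0 };
    long avail, done;
    int ret;

    ret = ops->iomap_begin(pos, length, &map);
    if (ret)
        return ret;

    /* Never let the actor run past the end of the mapped extent. */
    avail = map.offset + map.length - pos;
    if (avail <= 0) {
        ops->iomap_end(pos, 0, 0, &map);
        return -EIO;
    }
    if (length > avail)
        length = avail;

    done = actor(pos, length, data, &map);
    ret = ops->iomap_end(pos, length, done > 0 ? done : 0, &map);
    return done ? done : ret;
}

/* Main iterator: keep applying the actor until the range is consumed. */
long demo_iomap_write_range(long pos, long length,
                            const struct demo_iomap_ops *ops,
                            void *data, demo_actor_t actor)
{
    long total = 0;

    while (length > 0) {
        long done = demo_iomap_apply(pos, length, ops, data, actor);
        if (done <= 0)
            return total ? total : done;
        pos += done;
        length -= done;
        total += done;
    }
    return total;
}

/* Toy file system: fixed-size extents, counts begin/end calls. */
#define DEMO_EXTENT 4096
int demo_begin_calls, demo_end_calls;

static int demo_fs_begin(long pos, long length, struct demo_iomap *map)
{
    (void)length;
    map->offset = pos - pos % DEMO_EXTENT;   /* assumes pos >= 0 */
    map->length = DEMO_EXTENT;
    demo_begin_calls++;
    return 0;
}

static int demo_fs_end(long pos, long length, long written,
                       struct demo_iomap *map)
{
    (void)pos; (void)length; (void)written; (void)map;
    demo_end_calls++;
    return 0;
}

const struct demo_iomap_ops demo_fs_ops = { demo_fs_begin, demo_fs_end };

/* Toy actor: "writes" by adding the byte count to a counter. */
long demo_count_actor(long pos, long length, void *data,
                      const struct demo_iomap *map)
{
    (void)pos; (void)map;
    *(long *)data += length;
    return length;
}
```

In this sketch every begin call is matched by exactly one end call, which is the pairing the file system relies on to release whatever it reserved.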

    Based on earlier code from Dave Chinner.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bob Peterson
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

10 Jun, 2016

1 commit

  • d_walk() relies upon the tree not getting rearranged under it without
    rename_lock being touched. And we do grab rename_lock around the
    places that change the tree topology. Unfortunately, branch reordering
    is just as bad from d_walk() POV and we have two places that do it
    without touching rename_lock - one in handling of cursors (for ramfs-style
    directories) and another in autofs. autofs one is a separate story; this
    commit deals with the cursors.
    * mark cursor dentries explicitly at allocation time
    * make __dentry_kill() leave ->d_child.next pointing to the next
    non-cursor sibling, making sure that it won't be moved around unnoticed
    before the parent is relocked on ascend-to-parent path in d_walk().
    * make d_walk() skip cursors explicitly; strictly speaking it's
    not necessary (all callbacks we pass to d_walk() are no-ops on cursors),
    but it makes analysis easier.
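
To make the "skip cursors explicitly" point concrete, here is a toy tree walk in userspace C. The struct, the `is_cursor` flag, and the recursive traversal are assumptions for illustration only; the real d_walk() is iterative and deals with locking and rename detection, none of which is modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dentry model; the cursor flag is set once, at allocation. */
struct demo_dentry {
    const char *name;
    bool is_cursor;
    struct demo_dentry *next;   /* next sibling */
    struct demo_dentry *child;  /* first child */
};

/* Return the first non-cursor entry at or after d, or NULL. */
static const struct demo_dentry *demo_skip_cursors(const struct demo_dentry *d)
{
    while (d != NULL && d->is_cursor)
        d = d->next;
    return d;
}

/* Count all real (non-cursor) descendants of parent. */
int demo_walk_count(const struct demo_dentry *parent)
{
    const struct demo_dentry *d;
    int visited = 0;

    for (d = demo_skip_cursors(parent->child); d != NULL;
         d = demo_skip_cursors(d->next))
        visited += 1 + demo_walk_count(d);

    return visited;
}
```

Because cursors are filtered out in one place, the traversal never has to reason about what a callback would do when handed a cursor.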

    Signed-off-by: Al Viro

    Al Viro
     

31 Mar, 2016

1 commit

  • I need an API that allows me to obtain a clone of the current file
    pointer to pass in to an exec handler. I've labelled this as an
    internal API because I can't see how it would be useful outside of the
    fs subsystem. The use case will be a persistent binfmt_misc handler.

    Signed-off-by: James Bottomley
    Acked-by: Serge Hallyn
    Acked-by: Jan Kara

    James Bottomley
     

09 Jan, 2016

2 commits


04 Jan, 2016

1 commit


18 Aug, 2015

2 commits

  • There's a small consistency problem between the inode and writeback
    naming. Writeback calls the "for IO" inode queues b_io and
    b_more_io, but the inode calls these the "writeback list" or
    i_wb_list. This makes it hard to add a new "under writeback" list to
    the inode, or call it an "under IO" list on the bdi because either
    way we'll have writeback on IO and IO on writeback and it'll just be
    confusing. I'm getting confused just writing this!

    So, rename the inode "for IO" list variable to i_io_list so we can
    add a new "writeback list" in a subsequent patch.

    Signed-off-by: Dave Chinner
    Signed-off-by: Josef Bacik
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Tested-by: Dave Chinner

    Dave Chinner
     
  • The process of reducing contention on per-superblock inode lists
    starts with moving the locking to match the per-superblock inode
    list. This takes the global lock out of the picture and reduces the
    contention problems to within a single filesystem. This doesn't get
    rid of contention as the locks still have global CPU scope, but it
    does isolate operations on different superblocks from each other.

    Signed-off-by: Dave Chinner
    Signed-off-by: Josef Bacik
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Tested-by: Dave Chinner

    Dave Chinner
     

19 Jun, 2015

1 commit

  • Make file->f_path always point to the overlay dentry so that the path in
    /proc/pid/fd is correct and to ensure that label-based LSMs have access to the
    overlay as well as the underlay (path-based LSMs probably don't need it).

    Using my union testsuite to set things up, before the patch I see:

    [root@andromeda union-testsuite]# bash 5 /a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...

    After the patch:

    [root@andromeda union-testsuite]# bash 5 /mnt/a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...

    Note the change in where /proc/$$/fd/5 points to in the ls command. It was
    pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
    (which is correct).

    The inode accessed, however, is the lower layer. The union layer is on device
    25h/37d and the upper layer on 24h/36d.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells