03 Apr, 2017

1 commit


03 Mar, 2017

1 commit

  • Add a system call to make extended file information available, including
    file creation and some attribute flags where available through the
    underlying filesystem.

    The getattr inode operation is altered to take two additional arguments: a
    u32 request_mask and an unsigned int flags that indicate the
    synchronisation mode. This change is propagated to the vfs_getattr*()
    function.

    Functions like vfs_stat() are now inline wrappers around new functions
    vfs_statx() and vfs_statx_fd() to reduce stack usage.

    ========
    OVERVIEW
    ========

    The idea was initially proposed as a set of xattrs that could be retrieved
    with getxattr(), but the general preference proved to be for a new syscall
    with an extended stat structure.

    A number of requests were gathered for features to be included. The
    following have been included:

    (1) Make the fields a consistent size on all arches and make them large.

    (2) Spare space, request flags and information flags are provided for
    future expansion.

    (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
    __s64).

    (4) Creation time: The SMB protocol carries the creation time, which could
    be exported by Samba, which will in turn help CIFS make use of
    FS-Cache as that can be used for coherency data (stx_btime).

    This is also specified in NFSv4 as a recommended attribute and could
    be exported by NFSD [Steve French].

    (5) Lightweight stat: Ask for just those details of interest, and allow a
    netfs (such as NFS) to approximate anything not of interest, possibly
    without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
    Dilger] (AT_STATX_DONT_SYNC).

    (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
    its cached attributes are up to date [Trond Myklebust]
    (AT_STATX_FORCE_SYNC).

    And the following have been left out for future extension:

    (7) Data version number: Could be used by userspace NFS servers [Aneesh
    Kumar].

    Can also be used to modify fill_post_wcc() in NFSD which retrieves
    i_version directly, but has just called vfs_getattr(). It could get
    it from the kstat struct if it used vfs_xgetattr() instead.

    (There's disagreement on the exact semantics of a single field, since
    not all filesystems do this the same way).

    (8) BSD stat compatibility: Including more fields from the BSD stat such
    as creation time (st_btime) and inode generation number (st_gen)
    [Jeremy Allison, Bernd Schubert].

    (9) Inode generation number: Useful for FUSE and userspace NFS servers
    [Bernd Schubert].

    (This was asked for but later deemed unnecessary with the
    open-by-handle capability available and caused disagreement as to
    whether it's a security hole or not).

    (10) Extra coherency data may be useful in making backups [Andreas Dilger].

    (No particular data were offered, but things like last backup
    timestamp, the data version number and the DOS archive bit would come
    into this category).

    (11) Allow the filesystem to indicate what it can/cannot provide: A
    filesystem can now say it doesn't support a standard stat feature if
    that isn't available, so if, for instance, inode numbers or UIDs don't
    exist or are fabricated locally...

    (This requires a separate system call - I have an fsinfo() call idea
    for this).

    (12) Store a 16-byte volume ID in the superblock that can be returned in
    struct xstat [Steve French].

    (Deferred to fsinfo).

    (13) Include granularity fields in the time data to indicate the
    granularity of each of the times (NFSv4 time_delta) [Steve French].

    (Deferred to fsinfo).

    (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags.
    Note that the Linux IOC flags are a mess and filesystems such as Ext4
    define flags that aren't in linux/fs.h, so translation in the kernel
    may be a necessity (or, possibly, we provide the filesystem type too).

    (Some attributes are made available in stx_attributes, but the general
    feeling was that the IOC flags were too ext[234]-specific and shouldn't
    be exposed through statx this way).

    (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
    Michael Kerrisk].

    (Deferred, probably to fsinfo. Finding out if there's an ACL or
    seclabel might require extra filesystem operations).

    (16) Femtosecond-resolution timestamps [Dave Chinner].

    (A __reserved field has been left in the statx_timestamp struct for
    this - if there proves to be a need).

    (17) A set multiple attributes syscall to go with this.

    ===============
    NEW SYSTEM CALL
    ===============

    The new system call is:

    int ret = statx(int dfd,
                    const char *filename,
                    unsigned int flags,
                    unsigned int mask,
                    struct statx *buffer);

    The dfd, filename and flags parameters indicate the file to query, in a
    similar way to fstatat(). There is no equivalent of lstat() as that can be
    emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is
    also no equivalent of fstat() as that can be emulated by passing a NULL
    filename to statx() with the fd of interest in dfd.
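    A minimal userspace sketch of the stat(), lstat() and fstat() cases,
    assuming glibc 2.28 or later (which provides a statx() wrapper and
    struct statx in <sys/stat.h>); the my_* helper names are hypothetical.
    For the fstat() case it passes an empty pathname with AT_EMPTY_PATH,
    the form documented in the statx(2) man page, rather than a NULL
    filename.

```c
#define _GNU_SOURCE
#include <fcntl.h>     /* AT_FDCWD, AT_SYMLINK_NOFOLLOW, AT_EMPTY_PATH, open() */
#include <sys/stat.h>  /* statx(), struct statx, STATX_BASIC_STATS */

/* stat()-style: resolve `path` relative to the cwd, following symlinks.
 * A flags value of 0 is AT_STATX_SYNC_AS_STAT. */
int my_stat(const char *path, struct statx *stx)
{
    return statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, stx);
}

/* lstat()-style: report on a trailing symlink itself, not its target. */
int my_lstat(const char *path, struct statx *stx)
{
    return statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW, STATX_BASIC_STATS, stx);
}

/* fstat()-style: describe the already-open file `fd` itself. */
int my_fstat(int fd, struct statx *stx)
{
    return statx(fd, "", AT_EMPTY_PATH, STATX_BASIC_STATS, stx);
}
```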

    Whether or not statx() synchronises the attributes with the backing store
    can be controlled by OR'ing a value into the flags argument (this typically
    only affects network filesystems):

    (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
    respect.

    (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
    its attributes with the server - which might require data writeback to
    occur to get the timestamps correct.

    (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
    network filesystem. The resulting values should be considered
    approximate.
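    The three values occupy a shared AT_STATX_SYNC_TYPE field within flags,
    so they are alternatives rather than independent bits, and the kernel
    rejects FORCE_SYNC and DONT_SYNC set together with EINVAL. A sketch of a
    hypothetical helper (statx_with_sync) that swaps in one mode, assuming
    glibc 2.28 or later for statx() and the AT_STATX_* constants:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>     /* AT_FDCWD, AT_STATX_SYNC_TYPE and the three modes */
#include <sys/stat.h>  /* statx(), struct statx */

/* Replace whatever sync mode `flags` carries with `sync_mode`, which must
 * be exactly one of AT_STATX_SYNC_AS_STAT, AT_STATX_FORCE_SYNC or
 * AT_STATX_DONT_SYNC. */
int statx_with_sync(const char *path, int flags, int sync_mode,
                    unsigned int mask, struct statx *stx)
{
    if (sync_mode != AT_STATX_SYNC_AS_STAT &&
        sync_mode != AT_STATX_FORCE_SYNC &&
        sync_mode != AT_STATX_DONT_SYNC) {
        errno = EINVAL;            /* also catches FORCE_SYNC | DONT_SYNC */
        return -1;
    }
    flags = (flags & ~AT_STATX_SYNC_TYPE) | sync_mode;
    return statx(AT_FDCWD, path, flags, mask, stx);
}
```

    On a local filesystem all three modes normally give the same answer;
    the difference only shows up on network filesystems such as NFS.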

    mask is a bitmask indicating the fields in struct statx that are of
    interest to the caller. The user should set this to STATX_BASIC_STATS to
    get the basic set returned by stat(). It should be noted that asking for
    more information may entail extra I/O operations.

    buffer points to the destination for the data. This must be 256 bytes in
    size.

    ======================
    MAIN ATTRIBUTES RECORD
    ======================

    The following structures are defined in which to return the main attribute
    set:

    struct statx_timestamp {
            __s64   tv_sec;
            __s32   tv_nsec;
            __s32   __reserved;
    };

    struct statx {
            __u32   stx_mask;
            __u32   stx_blksize;
            __u64   stx_attributes;
            __u32   stx_nlink;
            __u32   stx_uid;
            __u32   stx_gid;
            __u16   stx_mode;
            __u16   __spare0[1];
            __u64   stx_ino;
            __u64   stx_size;
            __u64   stx_blocks;
            __u64   __spare1[1];
            struct statx_timestamp  stx_atime;
            struct statx_timestamp  stx_btime;
            struct statx_timestamp  stx_ctime;
            struct statx_timestamp  stx_mtime;
            __u32   stx_rdev_major;
            __u32   stx_rdev_minor;
            __u32   stx_dev_major;
            __u32   stx_dev_minor;
            __u64   __spare2[14];
    };
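    Every field in this layout is naturally aligned, so the compiler inserts
    no hidden padding and the structure comes to exactly the 256 bytes the
    buffer must hold. The check below uses a hypothetical userspace mirror
    (statx_mirror) with <stdint.h> types standing in for __u32, __s64 and
    so on; it is not the kernel's own header.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct statx_ts_mirror {
    int64_t tv_sec;
    int32_t tv_nsec;
    int32_t reserved;
};

struct statx_mirror {
    uint32_t stx_mask;
    uint32_t stx_blksize;
    uint64_t stx_attributes;
    uint32_t stx_nlink;
    uint32_t stx_uid;
    uint32_t stx_gid;
    uint16_t stx_mode;
    uint16_t spare0[1];
    uint64_t stx_ino;                 /* offset 32 */
    uint64_t stx_size;
    uint64_t stx_blocks;
    uint64_t spare1[1];
    struct statx_ts_mirror stx_atime; /* offset 64 */
    struct statx_ts_mirror stx_btime;
    struct statx_ts_mirror stx_ctime;
    struct statx_ts_mirror stx_mtime;
    uint32_t stx_rdev_major;          /* offset 128 */
    uint32_t stx_rdev_minor;
    uint32_t stx_dev_major;
    uint32_t stx_dev_minor;
    uint64_t spare2[14];              /* offset 144, 112 bytes of room */
};

/* Compile-time checks: natural alignment leaves no implicit holes. */
_Static_assert(offsetof(struct statx_mirror, stx_ino) == 32, "stx_ino offset");
_Static_assert(offsetof(struct statx_mirror, stx_atime) == 64, "timestamp offset");
_Static_assert(sizeof(struct statx_mirror) == 256, "struct statx is 256 bytes");
```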

    The defined bits in request_mask and stx_mask are:

    STATX_TYPE          Want/got stx_mode & S_IFMT
    STATX_MODE          Want/got stx_mode & ~S_IFMT
    STATX_NLINK         Want/got stx_nlink
    STATX_UID           Want/got stx_uid
    STATX_GID           Want/got stx_gid
    STATX_ATIME         Want/got stx_atime
    STATX_MTIME         Want/got stx_mtime
    STATX_CTIME         Want/got stx_ctime
    STATX_INO           Want/got stx_ino
    STATX_SIZE          Want/got stx_size
    STATX_BLOCKS        Want/got stx_blocks
    STATX_BASIC_STATS   [The stuff in the normal stat struct]
    STATX_BTIME         Want/got stx_btime
    STATX_ALL           [All currently available stuff]
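    Because stx_mask reports what was actually filled in, a caller can
    compare it with what it requested. The eleven basic bits run from
    STATX_TYPE (0x1) to STATX_BLOCKS (0x400), so STATX_BASIC_STATS is 0x7ff,
    which matches the "results=7ff" line in the sample output under TESTING.
    A sketch assuming glibc 2.28 or later for the constants; statx_missing
    and print_missing are hypothetical names.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/stat.h>  /* STATX_* mask constants */

/* Bits the caller asked for that the filesystem did not report. */
unsigned int statx_missing(unsigned int requested, unsigned int stx_mask)
{
    return requested & ~stx_mask;
}

/* Names of the eleven basic bits in bit order (0x1 ... 0x400). */
static const char *const basic_names[] = {
    "type", "mode", "nlink", "uid", "gid", "atime",
    "mtime", "ctime", "ino", "size", "blocks",
};

/* Print each requested-but-missing field, e.g. after asking for
 * STATX_BASIC_STATS | STATX_BTIME on a filesystem with no creation time. */
void print_missing(unsigned int requested, unsigned int stx_mask)
{
    unsigned int missing = statx_missing(requested, stx_mask);

    for (unsigned int i = 0; i < sizeof basic_names / sizeof basic_names[0]; i++)
        if (missing & (1U << i))
            printf("missing: %s\n", basic_names[i]);
    if (missing & STATX_BTIME)
        printf("missing: btime\n");
}
```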

    stx_btime is the file creation time, stx_mask is a bitmask indicating the
    data provided and __spares*[] are where as-yet undefined fields can be
    placed.

    Time fields are structures with separate seconds and nanoseconds fields
    plus a reserved field in case we want to add even finer resolution. Note
    that times will be negative if before 1970; in such a case, the nanosecond
    fields will also be negative if not zero.
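    Under the convention described here, where the nanosecond field takes
    the same sign as the seconds field, a timestamp converts to a single
    signed nanosecond count by a plain weighted sum, and C's truncating
    division gives the inverse directly. A sketch using a hypothetical
    struct ts that mirrors that signed layout; it is worth checking the
    header you actually build against before relying on the sign of
    tv_nsec, since it may be declared differently there.

```c
#include <stdint.h>

/* Hypothetical mirror of the signed seconds/nanoseconds pair above. */
struct ts {
    int64_t tv_sec;
    int32_t tv_nsec;   /* same sign as tv_sec when non-zero */
};

/* Both fields share a sign, so the weighted sum is correct on either
 * side of the 1970 epoch. */
int64_t ts_to_ns(struct ts t)
{
    return t.tv_sec * INT64_C(1000000000) + t.tv_nsec;
}

/* C99 division truncates toward zero, so the remainder takes the sign of
 * the dividend: exactly the sign-matched convention described above. */
struct ts ns_to_ts(int64_t ns)
{
    struct ts t;
    t.tv_sec = ns / INT64_C(1000000000);
    t.tv_nsec = (int32_t)(ns % INT64_C(1000000000));
    return t;
}
```

    For example, 1.5 seconds before the epoch is stored as tv_sec = -1 and
    tv_nsec = -500000000.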

    The bits defined in the stx_attributes field convey information about a
    file, how it is accessed, where it is and what it does. The following
    attributes map to FS_*_FL flags and are the same numerical value:

    STATX_ATTR_COMPRESSED   File is compressed by the fs
    STATX_ATTR_IMMUTABLE    File is marked immutable
    STATX_ATTR_APPEND       File is append-only
    STATX_ATTR_NODUMP       File is not to be dumped
    STATX_ATTR_ENCRYPTED    File requires key to decrypt in fs

    Within the kernel, the supported flags are listed by:

    KSTAT_ATTR_FS_IOC_FLAGS

    [Are any other IOC flags of sufficient general interest to be exposed
    through this interface?]

    New flags include:

    STATX_ATTR_AUTOMOUNT    Object is an automount trigger

    These are for the use of GUI tools that might want to mark files specially,
    depending on what they are.
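    A GUI tool or a listing program might render stx_attributes as a short
    flag string. A sketch assuming glibc 2.28 or later for the
    STATX_ATTR_* constants; attr_string is a hypothetical name, and the
    letters are arbitrary choices except 'm' for automount, which mirrors
    the sample output under TESTING (Attributes: 0000000000001000).

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/stat.h>  /* STATX_ATTR_* constants */

/* Fill `out` with one letter per attribute listed above, '-' when clear. */
void attr_string(uint64_t attrs, char out[7])
{
    static const struct { uint64_t bit; char c; } map[] = {
        { STATX_ATTR_COMPRESSED, 'c' },
        { STATX_ATTR_IMMUTABLE,  'i' },
        { STATX_ATTR_APPEND,     'a' },
        { STATX_ATTR_NODUMP,     'd' },
        { STATX_ATTR_ENCRYPTED,  'e' },
        { STATX_ATTR_AUTOMOUNT,  'm' },
    };

    for (unsigned int i = 0; i < sizeof map / sizeof map[0]; i++)
        out[i] = (attrs & map[i].bit) ? map[i].c : '-';
    out[6] = '\0';
}
```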

    Fields in struct statx come in a number of classes:

    (0) stx_dev_*, stx_blksize.

    These are local system information and are always available.

    (1) stx_mode, stx_nlink, stx_uid, stx_gid, stx_[amc]time, stx_ino,
    stx_size, stx_blocks.

    These will be returned whether the caller asks for them or not. The
    corresponding bits in stx_mask will be set to indicate whether they
    actually have valid values.

    If the caller didn't ask for them, then they may be approximated. For
    example, NFS won't waste any time updating them from the server,
    unless as a byproduct of updating something requested.

    If the values don't actually exist for the underlying object (such as
    UID or GID on a DOS file), then the bit won't be set in the stx_mask,
    even if the caller asked for the value. In such a case, the returned
    value will be a fabrication.

    Note that there are instances where the type might not be valid, for
    instance Windows reparse points.

    (2) stx_rdev_*.

    This will be set only if stx_mode indicates we're looking at a
    blockdev or a chardev, otherwise will be 0.

    (3) stx_btime.

    Similar to (1), except this will be set to 0 if it doesn't exist.
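    Since a zero stx_btime on its own cannot distinguish "no creation time"
    from a file genuinely created at the epoch, a caller should consult
    stx_mask before trusting the value. A sketch assuming glibc 2.28 or
    later; get_btime is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD */
#include <stdbool.h>
#include <sys/stat.h>   /* statx(), struct statx, struct statx_timestamp */

/* Returns true and fills *out only if the filesystem reported STATX_BTIME. */
bool get_btime(const char *path, struct statx_timestamp *out)
{
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS | STATX_BTIME, &stx) != 0)
        return false;               /* lookup failed */
    if (!(stx.stx_mask & STATX_BTIME))
        return false;               /* value is absent, ignore stx_btime */
    *out = stx.stx_btime;
    return true;
}
```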

    =======
    TESTING
    =======

    The following test program can be used to test the statx system call:

    samples/statx/test-statx.c

    Just compile and run, passing it paths to the files you want to examine.
    The file is built automatically if CONFIG_SAMPLES is enabled.

    Here's some example output. Firstly, an NFS directory that crosses to
    another FSID. Note that the AUTOMOUNT attribute is set because transiting
    this directory will cause d_automount to be invoked by the VFS.

    [root@andromeda ~]# /tmp/test-statx -A /warthog/data
    statx(/warthog/data) = 0
    results=7ff
    Size: 4096 Blocks: 8 IO Block: 1048576 directory
    Device: 00:26 Inode: 1703937 Links: 125
    Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
    Access: 2016-11-24 09:02:12.219699527+0000
    Modify: 2016-11-17 10:44:36.225653653+0000
    Change: 2016-11-17 10:44:36.225653653+0000
    Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)

    Secondly, the result of automounting on that directory.

    [root@andromeda ~]# /tmp/test-statx /warthog/data
    statx(/warthog/data) = 0
    results=7ff
    Size: 4096 Blocks: 8 IO Block: 1048576 directory
    Device: 00:27 Inode: 2 Links: 125
    Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
    Access: 2016-11-24 09:02:12.219699527+0000
    Modify: 2016-11-17 10:44:36.225653653+0000
    Change: 2016-11-17 10:44:36.225653653+0000

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

18 Jan, 2017

2 commits

  • The helper xfs_dentry_to_name() is used by two different
    classes of callers: those that pass a zero mode and don't care
    about the returned name.type field, and those that pass a
    non-zero mode and do care about the name.type field.

    Change xfs_dentry_to_name() to not take the mode argument and
    change the call sites of the first class to not pass the mode
    argument.

    Create a new helper xfs_dentry_mode_to_name() which does pass
    the mode argument and returns -EFSCORRUPTED if mode is invalid.
    Callers that translate a non-zero mode to an on-disk file type now
    check the return value and will return the error to the user instead
    of staging an invalid file type to be written to the directory entry.

    Signed-off-by: Amir Goldstein
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Amir Goldstein
     
  • The size of the xfs_mode_to_ftype[] conversion table
    was too small to handle an invalid value of mode=S_IFMT.

    Instead of fixing the table size, replace the conversion table
    with a conversion helper that uses a switch statement.
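    Both changes can be sketched in userspace C: a switch has no array
    bounds to overrun, so any value of mode & S_IFMT, including the invalid
    S_IFMT itself, falls through to "unknown", and the validating wrapper
    turns "unknown" into an error. The FT_* enum is a hypothetical stand-in
    for XFS's on-disk directory file-type codes, and EUCLEAN stands in for
    -EFSCORRUPTED, which XFS defines as EUCLEAN.

```c
#define _GNU_SOURCE
#include <errno.h>      /* EUCLEAN */
#include <stdint.h>
#include <sys/stat.h>   /* S_IF* constants, mode_t */

enum ftype { FT_UNKNOWN, FT_REG_FILE, FT_DIR, FT_CHRDEV, FT_BLKDEV,
             FT_FIFO, FT_SOCK, FT_SYMLINK };

/* Table-free conversion: every possible input hits a case or default. */
static uint8_t mode_to_ftype(mode_t mode)
{
    switch (mode & S_IFMT) {
    case S_IFREG:  return FT_REG_FILE;
    case S_IFDIR:  return FT_DIR;
    case S_IFCHR:  return FT_CHRDEV;
    case S_IFBLK:  return FT_BLKDEV;
    case S_IFIFO:  return FT_FIFO;
    case S_IFSOCK: return FT_SOCK;
    case S_IFLNK:  return FT_SYMLINK;
    default:       return FT_UNKNOWN;
    }
}

/* Validating variant for callers that will write the type to disk:
 * an unrecognised mode is reported instead of being staged. */
int dentry_mode_to_ftype(mode_t mode, uint8_t *type)
{
    uint8_t t = mode_to_ftype(mode);

    if (t == FT_UNKNOWN)
        return -EUCLEAN;
    *type = t;
    return 0;
}
```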

    Suggested-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Amir Goldstein
    Signed-off-by: Darrick J. Wong

    Amir Goldstein
     

18 Dec, 2016

1 commit

  • …/linux/kernel/git/mszeredi/vfs

    Pull partial readlink cleanups from Miklos Szeredi.

    This is the uncontroversial part of the readlink cleanup patch-set that
    simplifies the default readlink handling.

    Miklos and Al are still discussing the rest of the series.

    * git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    vfs: make generic_readlink() static
    vfs: remove ".readlink = generic_readlink" assignments
    vfs: default to generic_readlink()
    vfs: replace calling i_op->readlink with vfs_readlink()
    proc/self: use generic_readlink
    ecryptfs: use vfs_get_link()
    bad_inode: add missing i_op initializers

    Linus Torvalds
     

09 Dec, 2016

2 commits

  • Since .readlink == NULL now implies generic_readlink(), remove the
    explicit ".readlink = generic_readlink" assignments.

    Generated by:

    to_del="\.readlink.*=.*generic_readlink"
    for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done
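    The underlying pattern is "a NULL operation pointer means use the
    generic implementation". A heavily simplified, hypothetical sketch in
    userspace C (node, node_ops and generic_readlink_sketch are
    illustrative names, not kernel structures):

```c
#include <stddef.h>
#include <string.h>

struct node;

struct node_ops {
    int (*readlink)(struct node *n, char *buf, size_t len);
};

struct node {
    const struct node_ops *ops;
    const char *target;            /* symlink body */
};

/* Generic fallback: copy the target without a NUL terminator,
 * truncating silently, the way readlink(2) behaves. */
static int generic_readlink_sketch(struct node *n, char *buf, size_t len)
{
    size_t tlen = strlen(n->target);

    if (tlen > len)
        tlen = len;
    memcpy(buf, n->target, tlen);
    return (int)tlen;
}

/* Dispatcher: only a filesystem that needs special behaviour sets the
 * pointer; everyone else inherits the default. */
int node_readlink(struct node *n, char *buf, size_t len)
{
    if (n->ops && n->ops->readlink)
        return n->ops->readlink(n, buf, len);
    return generic_readlink_sketch(n, buf, len);
}
```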

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Commit 6552321831dc ("xfs: remove i_iolock and use i_rwsem in the
    VFS inode instead") introduced a regression: truncate(2) no longer
    checks the new size, so it succeeds even if the new size exceeds the
    current resource limit. This happened because xfs_setattr_size() was
    called instead of xfs_vn_setattr_size(); the latter first calls
    xfs_vn_change_ok() to sanity-check permissions and the new size.

    This was found by the truncate03 test from LTP; the following is a
    simplified reproducer:

    #!/bin/bash
    dev=/dev/sda5
    mnt=/mnt/xfs

    mkfs -t xfs -f $dev
    mount $dev $mnt

    # set max file size to 16k
    ulimit -f 16
    truncate -s $((16 * 1024 + 1)) /mnt/xfs/testfile
    [ $? -eq 0 ] && echo "FAIL: truncate exceeded max file size"
    ulimit -f unlimited
    umount $mnt
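    The same check can be probed from C without a dedicated device,
    assuming a writable /tmp on a filesystem that enforces RLIMIT_FSIZE on
    truncate. The function name is hypothetical. Exceeding the limit also
    raises SIGXFSZ, whose default action terminates the process, so the
    sketch ignores that signal while it probes.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdlib.h>        /* mkstemp() */
#include <sys/resource.h>  /* getrlimit(), setrlimit(), RLIMIT_FSIZE */
#include <unistd.h>        /* ftruncate(), unlink(), close() */

#define LIMIT_BYTES (16 * 1024)   /* same cap as `ulimit -f 16` in bash */

/* Returns the errno from an ftruncate() one byte past the limit (EFBIG
 * when the limit is enforced), 0 if it wrongly succeeds, or -1 if the
 * probe could not be set up. */
int truncate_past_fsize_limit(void)
{
    char path[] = "/tmp/fsize-XXXXXX";
    struct rlimit old, lim;
    int fd, result;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* file vanishes on close */

    if (getrlimit(RLIMIT_FSIZE, &old) != 0) {
        close(fd);
        return -1;
    }
    lim = old;
    lim.rlim_cur = LIMIT_BYTES;

    void (*old_handler)(int) = signal(SIGXFSZ, SIG_IGN);

    if (setrlimit(RLIMIT_FSIZE, &lim) != 0) {
        result = -1;
    } else {
        result = ftruncate(fd, LIMIT_BYTES + 1) == 0 ? 0 : errno;
        setrlimit(RLIMIT_FSIZE, &old);     /* restore the caller's limit */
    }

    signal(SIGXFSZ, old_handler);
    close(fd);
    return result;
}
```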

    Signed-off-by: Eryu Guan
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Eryu Guan
     

30 Nov, 2016

1 commit

  • This patch drops the XFS-own i_iolock and uses the VFS i_rwsem which
    recently replaced i_mutex instead. This means we only have to take
    one lock instead of two in many fast path operations, and we can
    also shrink the xfs_inode structure. Thanks to the xfs_ilock family
    there is very little churn; the only thing of note is that we need
    to switch to the lock_two_directory helper for taking the i_rwsem
    on two inodes in a few places to make sure our lock order matches
    the one used in the VFS.

    Signed-off-by: Christoph Hellwig
    Tested-by: Jens Axboe
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

14 Oct, 2016

1 commit

  • …kernel/git/dgc/linux-xfs

    < XFS has gained super CoW powers! >
     ----------------------------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

    Pull XFS support for shared data extents from Dave Chinner:
    "This is the second part of the XFS updates for this merge cycle. This
    pullreq contains the new shared data extents feature for XFS.

    Given the complexity and size of this change I am expecting - like the
    addition of reverse mapping last cycle - that there will be some
    follow-up bug fixes and cleanups around the -rc3 stage for issues that
    I'm sure will show up once the code hits a wider userbase.

    What it is:

    At the most basic level we are simply adding shared data extents to
    XFS - i.e. a single extent on disk can now have multiple owners. To do
    this we have to add new on-disk features to both track the shared
    extents and the number of times they've been shared. This is done by
    the new "refcount" btree that sits in every allocation group. When we
    share or unshare an extent, this tree gets updated.

    Along with this new tree, the reverse mapping tree needs to be updated
    to track each owner of a shared extent. This also needs to be updated
    on every share/unshare operation. These interactions at extent allocation
    and freeing time have complex ordering and recovery constraints, so
    there's a significant amount of new intent-based transaction code to
    ensure that operations are performed atomically from both the runtime
    and integrity/crash recovery perspectives.

    We also need to break sharing when writes hit a shared extent - this
    is where the new copy-on-write implementation comes in. We allocate
    new storage and copy the original data along with the overwrite data
    into the new location. We only do this for data as we don't share
    metadata at all - each inode has its own metadata that tracks the
    shared data extents, the extents undergoing CoW and its own private
    extents.

    Of course, being XFS, nothing is simple - we use delayed allocation
    for CoW similar to how we use it for normal writes. ENOSPC is a
    significant issue here - we build on the reservation code added in
    4.8-rc1 with the reverse mapping feature to ensure we don't get
    spurious ENOSPC issues part way through a CoW operation. These
    mechanisms also help minimise fragmentation due to repeated CoW
    operations. To further reduce fragmentation overhead, we've also
    introduced a CoW extent size hint, which indicates how large a region
    we should allocate when we execute a CoW operation.

    With all this functionality in place, we can hook up .copy_file_range,
    .clone_file_range and .dedupe_file_range and we gain all the
    capabilities of reflink and other VFS-provided functionality that
    enables manipulation of shared extents. We also added a fallocate mode
    that explicitly unshares a range of a file, which we implemented as an
    explicit CoW of all the shared extents in a file.

    As such, it's a huge chunk of new functionality with new on-disk
    format features and internal infrastructure. It warns at mount time as
    an experimental feature and that it may eat data (as we do with all
    new on-disk features until they stabilise). We have not released
    userspace support for it yet - userspace support currently requires
    downloading Darrick's xfsprogs repo and building from source, so
    access to this feature is really developer/tester only at this point.
    Initial userspace support will be released at the same time the kernel
    with this code in it is released.

    The new code causes 5-6 new failures with xfstests - these aren't
    serious functional failures but things like the output of tests changing
    slightly due to perturbations in layouts, space usage, etc. OTOH,
    we've added 150+ new tests to xfstests that specifically exercise this
    new functionality so it's got far better test coverage than any
    functionality we've previously added to XFS.

    Darrick has done a pretty amazing job getting us to this stage, and
    special mention also needs to go to Christoph (review, testing,
    improvements and bug fixes) and Brian (caught several intricate bugs
    during review) for the effort they've also put in.

    Summary:

    - unshare range (FALLOC_FL_UNSHARE) support for fallocate

    - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
    interface

    - shared extent support for XFS

    - copy-on-write support for shared extents

    - copy_file_range support

    - clone_file_range support (implements reflink)

    - dedupe_file_range support

    - defrag support for reverse mapping enabled filesystems"
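    From userspace, the .copy_file_range hook is reached through
    copy_file_range(2). A minimal sketch, assuming glibc 2.27 or later for
    the wrapper; copy_all, copy_rw and copy_path are hypothetical names.
    On a filesystem with shared-extent support the kernel may be able to
    satisfy the request without duplicating data, but the sketch does not
    depend on that, and it falls back to read()/write() when the call is
    unavailable.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>    /* open(), O_* flags */
#include <unistd.h>   /* copy_file_range(), read(), write(), close() */

/* Plain read()/write() loop from the current offsets. */
static int copy_rw(int src_fd, int dst_fd)
{
    char buf[65536];
    ssize_t n;

    while ((n = read(src_fd, buf, sizeof buf)) > 0) {
        for (char *p = buf; n > 0; ) {
            ssize_t w = write(dst_fd, p, (size_t)n);
            if (w < 0)
                return -1;
            p += w;
            n -= w;
        }
    }
    return n < 0 ? -1 : 0;
}

/* NULL offsets make both file offsets advance; the kernel may copy less
 * than asked, so loop until it reports EOF on the source (returns 0). */
int copy_all(int src_fd, int dst_fd)
{
    for (;;) {
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL, 1 << 30, 0);
        if (n == 0)
            return 0;
        if (n < 0) {
            if (errno == ENOSYS || errno == EXDEV ||
                errno == EOPNOTSUPP || errno == EINVAL)
                return copy_rw(src_fd, dst_fd);
            return -1;
        }
    }
}

/* Copy file `src` into a new or truncated file `dst`. */
int copy_path(const char *src, const char *dst)
{
    int s = open(src, O_RDONLY);
    if (s < 0)
        return -1;
    int d = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (d < 0) {
        close(s);
        return -1;
    }
    int ret = copy_all(s, d);
    close(s);
    if (close(d) != 0)
        ret = -1;
    return ret;
}
```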

    * tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
    xfs: convert COW blocks to real blocks before unwritten extent conversion
    xfs: rework refcount cow recovery error handling
    xfs: clear reflink flag if setting realtime flag
    xfs: fix error initialization
    xfs: fix label inaccuracies
    xfs: remove isize check from unshare operation
    xfs: reduce stack usage of _reflink_clear_inode_flag
    xfs: check inode reflink flag before calling reflink functions
    xfs: implement swapext for rmap filesystems
    xfs: refactor swapext code
    xfs: various swapext cleanups
    xfs: recognize the reflink feature bit
    xfs: simulate per-AG reservations being critically low
    xfs: don't mix reflink and DAX mode for now
    xfs: check for invalid inode reflink flags
    xfs: set a default CoW extent size of 32 blocks
    xfs: convert unwritten status of reverse mappings for shared files
    xfs: use interval query for rmap alloc operations on shared files
    xfs: add shared rmap map/unmap/convert log item types
    xfs: increase log reservations for reflink
    ...

    Linus Torvalds
     

11 Oct, 2016

3 commits

  • Pull more vfs updates from Al Viro:
    ">rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning
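    RENAME_NOREPLACE is reached from userspace through renameat2(2). It
    lets the kernel perform the "target must not exist" check and the
    rename as one atomic step, instead of a racy stat()-then-rename()
    sequence, and reports an existing target as EEXIST. A sketch assuming
    glibc 2.28 or later, which declares renameat2() and RENAME_NOREPLACE in
    <stdio.h>; rename_noreplace is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>   /* AT_FDCWD */
#include <stdio.h>   /* renameat2(), RENAME_NOREPLACE */

/* Move `from` to `to` only if `to` does not exist yet. */
int rename_noreplace(const char *from, const char *to)
{
    return renameat2(AT_FDCWD, from, AT_FDCWD, to, RENAME_NOREPLACE);
}
```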

    Linus Torvalds
     
  • Al Viro
     
  • Pull vfs xattr updates from Al Viro:
    "xattr stuff from Andreas

    This completes the switch to xattr_handler ->get()/->set() from
    ->getxattr/->setxattr/->removexattr"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Remove {get,set,remove}xattr inode operations
    xattr: Stop calling {get,set,remove}xattr inode operations
    vfs: Check for the IOP_XATTR flag in listxattr
    xattr: Add __vfs_{get,set,remove}xattr helpers
    libfs: Use IOP_XATTR flag for empty directory handling
    vfs: Use IOP_XATTR flag for bad-inode handling
    vfs: Add IOP_XATTR inode operations flag
    vfs: Move xattr_resolve_name to the front of fs/xattr.c
    ecryptfs: Switch to generic xattr handlers
    sockfs: Get rid of getxattr iop
    sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
    kernfs: Switch to generic xattr handlers
    hfs: Switch to generic xattr handlers
    jffs2: Remove jffs2_{get,set,remove}xattr macros
    xattr: Remove unnecessary NULL attribute name check
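    The handler model dispatches on the attribute's namespace prefix and
    hands the handler the remainder of the name. A simplified, hypothetical
    sketch of that lookup (xhandler and resolve_xattr are illustrative
    names; the kernel's own resolution logic has more cases than this):

```c
#include <stddef.h>
#include <string.h>

struct xhandler {
    const char *prefix;   /* e.g. "user.", "trusted.", "security." */
    int (*get)(const char *suffix, void *buf, size_t size);
};

/* Return the first handler whose prefix matches `name` and store the
 * rest of the name in *suffix; NULL means an unsupported namespace,
 * which a caller would typically turn into -EOPNOTSUPP. */
const struct xhandler *resolve_xattr(const struct xhandler *tbl, size_t n,
                                     const char *name, const char **suffix)
{
    for (size_t i = 0; i < n; i++) {
        size_t plen = strlen(tbl[i].prefix);
        if (strncmp(name, tbl[i].prefix, plen) == 0) {
            *suffix = name + plen;
            return &tbl[i];
        }
    }
    return NULL;
}
```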

    Linus Torvalds
     

08 Oct, 2016

2 commits


06 Oct, 2016

1 commit


28 Sep, 2016

1 commit

  • current_fs_time() uses struct super_block* as an argument.
    As per Linus's suggestion, this is changed to take struct
    inode* as a parameter instead. This is because the function
    is primarily meant for vfs inode timestamps.
    Also the function was renamed as per Arnd's suggestion.

    Change all calls to current_fs_time() to use the new
    current_time() function instead. current_fs_time() will be
    deleted.
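    Taking an inode gives the helper access to filesystem-specific
    properties such as timestamp granularity. As we understand it, the
    kernel truncates the current time to the superblock's granularity; the
    userspace sketch below shows that rounding with the granularity as a
    plain argument in nanoseconds (ts_trunc and now_for_fs are hypothetical
    names).

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Round a timestamp down to `gran` ns: 1 keeps full resolution,
 * 1000000000 keeps whole seconds, anything between drops the remainder. */
struct timespec ts_trunc(struct timespec t, long gran)
{
    if (gran <= 1)
        return t;
    if (gran >= 1000000000L)
        t.tv_nsec = 0;
    else
        t.tv_nsec -= t.tv_nsec % gran;
    return t;
}

/* "Now", as a filesystem with granularity `gran` would record it. */
struct timespec now_for_fs(long gran)
{
    struct timespec t;
    clock_gettime(CLOCK_REALTIME, &t);
    return ts_trunc(t, gran);
}
```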

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Al Viro

    Deepa Dinamani
     

27 Sep, 2016

1 commit


22 Sep, 2016

2 commits

    inode_change_ok() will be responsible for clearing capabilities and IMA
    extended attributes and as such will need dentry. Give it as an argument
    to inode_change_ok() instead of an inode. Also rename inode_change_ok()
    to setattr_prepare() to better reflect that it also makes some
    modifications in addition to checks.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     
  • To avoid clearing of capabilities or security related extended
    attributes too early, inode_change_ok() will need to take dentry instead
    of inode. Propagate dentry down to functions calling inode_change_ok().
    This is rather straightforward except for xfs_set_mode() function which
    does not have dentry easily available. Luckily that function does not
    call inode_change_ok() anyway so we just have to do a little dance with
    function prototypes.

    Acked-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     

17 Aug, 2016

1 commit


21 Jun, 2016

4 commits

  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • Note that this removes support for the untested FIEMAP_FLAG_XATTR. It
    could be added relatively easily with iomap ops for the attr fork, but
    without test coverage I don't feel safe doing this.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • Convert XFS to use the new iomap based multipage write path. This involves
    implementing the ->iomap_begin and ->iomap_end methods, and switching the
    buffered file write, page_mkwrite and xfs_iozero paths to the new iomap
    helpers.

    With this change __xfs_get_blocks will never be used for buffered writes,
    and the code handling them can be removed.

    Based on earlier code from Dave Chinner.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bob Peterson
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • Currently zeroing out blocks and waiting for writeout is a bit of a mess in
    truncate. This patch gives it a clear order in preparation for the iomap
    path:

    (1) we first wait for any direct I/O to complete to prevent any races
    for it
    (2) we then perform the actual zeroing, and only use the truncate_page
    helpers for truncating down. The truncate up case already is
    handled by the separate call to xfs_zero_eof.
    (3) only then we write back dirty data, as zeroing blocks may dirty
    pages when using either xfs_zero_eof or the new iomap
    infrastructure.
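    For step (2), when truncating down to a size that is not
    block-aligned, the part of the new last block beyond EOF has to be
    zeroed so stale data cannot reappear if the file is later extended.
    The arithmetic can be sketched as follows, assuming a power-of-two
    block size (eof_zero_range is a hypothetical name):

```c
#include <stdint.h>

/* Compute the byte range [*off, *off + *len) to zero in the block that
 * contains the new EOF. Returns 0 when new_size is block-aligned and
 * nothing needs zeroing. `blksz` must be a power of two. */
int eof_zero_range(uint64_t new_size, uint32_t blksz,
                   uint64_t *off, uint64_t *len)
{
    uint64_t in_block = new_size & (uint64_t)(blksz - 1);

    if (in_block == 0)
        return 0;
    *off = new_size;
    *len = blksz - in_block;
    return 1;
}
```

    For example, truncating to 5000 bytes with 4096-byte blocks zeroes the
    3192 bytes from offset 5000 to the end of the second block.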

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bob Peterson
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

20 May, 2016

1 commit


06 Apr, 2016

3 commits

  • Merge xfs_trans_reserve and xfs_trans_alloc into a single function call
    that returns a transaction with all the required log and block reservations,
    and which allows passing transaction flags directly to avoid the cumbersome
    _xfs_trans_alloc interface.

    While we're at it we also get rid of the transaction type argument that has
    been superfluous since we stopped supporting the non-CIL logging mode. The
    guts of it will be removed in another patch.

    [dchinner: fixed transaction leak in error path in xfs_setattr_nonsize]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • By overallocating the in-core inode fork data buffer and zero
    terminating the link target in xfs_init_local_fork we can avoid
    the memory allocation in ->follow_link.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • In the next patch we'll set up different inode operations for inline vs
    out of line symlinks, for that we need to make sure the flags are already
    set up properly.

    [dchinner: added xfs_setup_iops() call to xfs_rename_alloc_whiteout()]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

07 Mar, 2016

1 commit


01 Mar, 2016

2 commits

  • If the block size of a filesystem is not at least PAGE_SIZEd, then
    at this point in time DAX cannot be used due to the fact we can't
    guarantee extents are page sized or aligned without further work.
    Hence disallow setting the DAX flag on an inode if the block size is
    too small. Also, be defensive and check the block size when reading
    an inode off disk.

    In future, we want to allow DAX to work on any filesystem, so this
    is temporary while we sort out the correct combination of extent size
    hints and allocation alignment configurations needed to guarantee
    page sized and aligned extent allocation for DAX enabled files.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Tested-by: Ross Zwisler
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • Only regular files can use DAX for data operations, so we should
    restrict setting it on the VFS inode to regular files. Setting it on
    metadata inodes may cause the VFS to do the wrong thing for such
    inodes, so avoid potential problems by restricting the scope of the
    flag to what we know is supported.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Tested-by: Ross Zwisler
    Signed-off-by: Dave Chinner

    Dave Chinner
     

09 Feb, 2016

4 commits

    Move the di_mode value from the xfs_icdinode to the VFS inode, reducing
    the size of the xfs_icdinode by another 2 bytes and collapsing another
    2-byte hole in the structure.
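    Holes like this come from alignment padding: a 16-bit field followed
    by a 32-bit one leaves 2 unused bytes. An illustration with two
    hypothetical structs holding the same members (not the real
    xfs_icdinode layout), assuming the usual 4-byte alignment of uint32_t:

```c
#include <stdint.h>

/* 2 + [2 pad] + 4 + 2 + [2 tail pad] = 12 bytes. */
struct holey   { uint16_t mode; uint32_t nlink; uint16_t flags; };

/* Grouping the 16-bit fields removes both holes: 4 + 2 + 2 = 8 bytes. */
struct compact { uint32_t nlink; uint16_t mode; uint16_t flags; };
```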

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • Pull another 4 bytes out of the xfs_icdinode.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • The VFS tracks the inode nlink just like the xfs_icdinode. We can
    remove the variable from the icdinode and use the VFS inode variable
    everywhere, reducing the size of the xfs_icdinode by a further 4
    bytes.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • The struct xfs_inode has two copies of the current timestamps in it,
    one in the vfs inode and one in the struct xfs_icdinode. Now that we
    no longer log the struct xfs_icdinode directly, we don't need to
    keep the timestamps in this structure. Instead we can copy them
    straight out of the VFS inode when formatting the inode log item or
    the on-disk inode.

    This reduces the struct xfs_inode in size by 24 bytes.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     

23 Jan, 2016

1 commit

  • Pull more xfs updates from Dave Chinner:
    "This is the second update for XFS that I mentioned in the original
    pull request last week.

    It contains a revert for a suspend regression in 4.4 and a fix for a
    long standing log recovery issue that has been further exposed by all
    the log recovery changes made in the original 4.5 merge.

    There is one more thing in this pull request - one that I forgot to
    merge into the origin. That is, pulling the XFS_IOC_FS[GS]ETXATTR
    ioctl up to the VFS level so that other filesystems can also use it
    for modifying project quota IDs.

    Summary:

    - promotion of XFS_IOC_FS[GS]ETXATTR ioctl to the vfs level so that
    it can be shared with other filesystems. The ext4 project quota
    functionality is the first target for this. The commits in this
    series have not been updated with review or final SOB tags because
    the branch they were originally published in was needed by ext4.
    Those tags are:

    Reviewed-by: Theodore Ts'o
    Signed-off-by: Dave Chinner

    - Revert a change that is causing suspend failures.

    - Fix a use-after-free that can occur on log mount failures. Been
    around forever, but now exposed by other changes to log recovery
    made in the first 4.5 merge"

    * tag 'xfs-for-linus-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
    xfs: log mount failures don't wait for buffers to be released
    Revert "xfs: clear PF_NOFREEZE for xfsaild kthread"
    xfs: introduce per-inode DAX enablement
    xfs: use FS_XFLAG definitions directly
    fs: XFS_IOC_FS[SG]SETXATTR to FS_IOC_FS[SG]ETXATTR promotion
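    With the ioctl promoted to the VFS, userspace reads these attributes
    through <linux/fs.h> on any filesystem that implements it. A minimal
    sketch (get_fsxattr is a hypothetical name); filesystems without
    support fail the ioctl with an error such as ENOTTY.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>    /* struct fsxattr, FS_IOC_FSGETXATTR */
#include <sys/ioctl.h>
#include <unistd.h>

/* Fill *fa (fsx_xflags, fsx_extsize, fsx_projid, ...) for `path`. */
int get_fsxattr(const char *path, struct fsxattr *fa)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int ret = ioctl(fd, FS_IOC_FSGETXATTR, fa);
    int saved = errno;          /* close() must not clobber the ioctl error */
    close(fd);
    errno = saved;
    return ret;
}
```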

    Linus Torvalds
     

04 Jan, 2016

1 commit

  • Rather than just being able to turn DAX on and off via a mount
    option, some applications may only want to enable DAX for certain
    performance critical files in a filesystem.

    This patch introduces a new inode flag to enable DAX in the v3 inode
    di_flags2 field. It adds support for setting and clearing flags in
    the di_flags2 field via the XFS_IOC_FSSETXATTR ioctl, and sets the
    S_DAX inode flag appropriately when it is seen.

    When this flag is set on a directory, it acts as an "inherit flag".
    That is, inodes created in the directory will automatically inherit
    the on-disk inode DAX flag, enabling administrators to set up
    directory hierarchies that automatically use DAX. Setting this flag
    on an empty root directory will make the entire filesystem use DAX
    by default.
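    A sketch of the inherit rule plus the read-modify-write an
    administrator's tool would use to tag a directory, assuming the
    FS_XFLAG_DAX and FS_IOC_FS[GS]ETXATTR definitions in <linux/fs.h>;
    inherit_dax and set_dax_inherit are hypothetical names.

```c
#include <stdint.h>
#include <linux/fs.h>    /* FS_XFLAG_DAX, struct fsxattr, FS_IOC_FS[GS]ETXATTR */
#include <sys/ioctl.h>

/* Flags a newly created inode ends up with: DAX is copied from the parent
 * directory when the parent carries it. */
uint32_t inherit_dax(uint32_t parent_xflags, uint32_t child_xflags)
{
    if (parent_xflags & FS_XFLAG_DAX)
        child_xflags |= FS_XFLAG_DAX;
    return child_xflags;
}

/* Set the flag on an open directory, preserving its other attributes. */
int set_dax_inherit(int dir_fd)
{
    struct fsxattr fa;

    if (ioctl(dir_fd, FS_IOC_FSGETXATTR, &fa) != 0)
        return -1;
    fa.fsx_xflags |= FS_XFLAG_DAX;
    return ioctl(dir_fd, FS_IOC_FSSETXATTR, &fa);
}
```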

    Signed-off-by: Dave Chinner

    Dave Chinner
     

31 Dec, 2015

1 commit


09 Dec, 2015

1 commit

  • new method: ->get_link(); replacement of ->follow_link(). The differences
    are:
    * inode and dentry are passed separately
    * might be called both in RCU and non-RCU mode;
    the former is indicated by passing it a NULL dentry.
    * when called that way it isn't allowed to block
    and should return ERR_PTR(-ECHILD) if it needs to be called
    in non-RCU mode.
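    The calling convention amounts to "try the non-blocking fast path, and
    retry in blocking mode on -ECHILD". A hypothetical userspace sketch of
    that protocol (link, get_link_sketch and resolve_link are illustrative
    names, and the "blocking I/O" is simulated):

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct link {
    const char *cached;    /* target already in memory, or NULL */
    const char *on_disk;   /* target that would need blocking I/O */
};

/* With may_block == false (the analogue of the NULL-dentry RCU call)
 * only the cached copy may be used; otherwise ask for a retry. */
const char *get_link_sketch(struct link *l, bool may_block, int *err)
{
    if (l->cached) {
        *err = 0;
        return l->cached;
    }
    if (!may_block) {
        *err = -ECHILD;
        return NULL;
    }
    l->cached = l->on_disk;    /* stands in for reading the target */
    *err = 0;
    return l->cached;
}

/* Caller side: fast path first, blocking retry on -ECHILD. */
const char *resolve_link(struct link *l)
{
    int err;
    const char *t = get_link_sketch(l, false, &err);

    if (err == -ECHILD)
        t = get_link_sketch(l, true, &err);
    return err ? NULL : t;
}
```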

    It's a flagday change - the old method is gone, all in-tree instances
    converted. Conversion isn't hard; that said, so far very few instances
    do not immediately bail out when called in RCU mode. That'll change
    in the next commits.

    Signed-off-by: Al Viro

    Al Viro
     

12 Oct, 2015

1 commit

  • This patch modifies the stats counting macros and the callers
    to those macros to properly increment, decrement, and add-to
    the xfs stats counts. The counts for global and per-fs stats
    are correctly advanced, and cleared by writing a "1" to the
    corresponding clear file.

    global counts: /sys/fs/xfs/stats/stats
    per-fs counts: /sys/fs/xfs/sda*/stats/stats

    global clear: /sys/fs/xfs/stats/stats_clear
    per-fs clear: /sys/fs/xfs/sda*/stats/stats_clear

    [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
    stats structures and macros. ]

    Signed-off-by: Bill O'Donnell
    Reviewed-by: Eric Sandeen
    Signed-off-by: Dave Chinner

    Bill O'Donnell