Eric Lee / smarc-fsl-linux-kernel

03 Apr, 2017

1 commit

5f955f26f xfs: report crtime and attribute flags to statx

statx has the ability to report inode creation times and inode flags, so
hook up di_crtime and di_flags to that functionality.

Signed-off-by: Darrick J. Wong
Signed-off-by: David Howells
Signed-off-by: Al Viro

Darrick J. Wong
2017-04-03 13:05:59 +0800

03 Mar, 2017

1 commit

a528d35e8 statx: Add a system call to make enhanced file info available

Add a system call to make extended file information available, including
file creation and some attribute flags where available through the
underlying filesystem.

The getattr inode operation is altered to take two additional arguments: a
u32 request_mask and an unsigned int flags that indicate the
synchronisation mode. This change is propagated to the vfs_getattr*()
functions.

Functions like vfs_stat() are now inline wrappers around new functions
vfs_statx() and vfs_statx_fd() to reduce stack usage.

========
OVERVIEW
========

The idea was initially proposed as a set of xattrs that could be retrieved
with getxattr(), but the general preference proved to be for a new syscall
with an extended stat structure.

A number of requests were gathered for features to be included. The
following have been included:

(1) Make the fields a consistent size on all arches and make them large.

(2) Spare space, request flags and information flags are provided for
future expansion.

(3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
__s64).

(4) Creation time: The SMB protocol carries the creation time, which could
be exported by Samba, which will in turn help CIFS make use of
FS-Cache as that can be used for coherency data (stx_btime).

This is also specified in NFSv4 as a recommended attribute and could
be exported by NFSD [Steve French].

(5) Lightweight stat: Ask for just those details of interest, and allow a
netfs (such as NFS) to approximate anything not of interest, possibly
without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
Dilger] (AT_STATX_DONT_SYNC).

(6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
its cached attributes are up to date [Trond Myklebust]
(AT_STATX_FORCE_SYNC).

And the following have been left out for future extension:

(7) Data version number: Could be used by userspace NFS servers [Aneesh
Kumar].

Can also be used to modify fill_post_wcc() in NFSD which retrieves
i_version directly, but has just called vfs_getattr(). It could get
it from the kstat struct if it used vfs_xgetattr() instead.

(There's disagreement on the exact semantics of a single field, since
not all filesystems do this the same way).

(8) BSD stat compatibility: Including more fields from the BSD stat such
as creation time (st_btime) and inode generation number (st_gen)
[Jeremy Allison, Bernd Schubert].

(9) Inode generation number: Useful for FUSE and userspace NFS servers
[Bernd Schubert].

(This was asked for but later deemed unnecessary with the
open-by-handle capability available and caused disagreement as to
whether it's a security hole or not).

(10) Extra coherency data may be useful in making backups [Andreas Dilger].

(No particular data were offered, but things like last backup
timestamp, the data version number and the DOS archive bit would come
into this category).

(11) Allow the filesystem to indicate what it can/cannot provide: A
filesystem can now say it doesn't support a standard stat feature if
that isn't available, so if, for instance, inode numbers or UIDs don't
exist or are fabricated locally...

(This requires a separate system call - I have an fsinfo() call idea
for this).

(12) Store a 16-byte volume ID in the superblock that can be returned in
struct xstat [Steve French].

(Deferred to fsinfo).

(13) Include granularity fields in the time data to indicate the
granularity of each of the times (NFSv4 time_delta) [Steve French].

(Deferred to fsinfo).

(14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags.
Note that the Linux IOC flags are a mess and filesystems such as Ext4
define flags that aren't in linux/fs.h, so translation in the kernel
may be a necessity (or, possibly, we provide the filesystem type too).

(Some attributes are made available in stx_attributes, but the general
feeling was that the IOC flags were too ext[234]-specific and shouldn't
be exposed through statx this way).

(15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
Michael Kerrisk].

(Deferred, probably to fsinfo. Finding out if there's an ACL or
seclabel might require extra filesystem operations).

(16) Femtosecond-resolution timestamps [Dave Chinner].

(A __reserved field has been left in the statx_timestamp struct for
this - if there proves to be a need).

(17) A set multiple attributes syscall to go with this.

===============
NEW SYSTEM CALL
===============

The new system call is:

	int ret = statx(int dfd,
			const char *filename,
			unsigned int flags,
			unsigned int mask,
			struct statx *buffer);

The dfd, filename and flags parameters indicate the file to query, in a
similar way to fstatat(). There is no equivalent of lstat() as that can be
emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is
also no equivalent of fstat() as that can be emulated by passing a NULL
filename to statx() with the fd of interest in dfd.

Whether or not statx() synchronises the attributes with the backing store
can be controlled by OR'ing a value into the flags argument (this typically
only affects network filesystems):

(1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
respect.

(2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
its attributes with the server - which might require data writeback to
occur to get the timestamps correct.

(3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
network filesystem. The resulting values should be considered
approximate.

mask is a bitmask indicating the fields in struct statx that are of
interest to the caller. The user should set this to STATX_BASIC_STATS to
get the basic set returned by stat(). It should be noted that asking for
more information may entail extra I/O operations.

buffer points to the destination for the data. This must be 256 bytes in
size.

======================
MAIN ATTRIBUTES RECORD
======================

The following structures are defined in which to return the main attribute
set:

	struct statx_timestamp {
		__s64	tv_sec;
		__s32	tv_nsec;
		__s32	__reserved;
	};

	struct statx {
		__u32	stx_mask;
		__u32	stx_blksize;
		__u64	stx_attributes;
		__u32	stx_nlink;
		__u32	stx_uid;
		__u32	stx_gid;
		__u16	stx_mode;
		__u16	__spare0[1];
		__u64	stx_ino;
		__u64	stx_size;
		__u64	stx_blocks;
		__u64	__spare1[1];
		struct statx_timestamp	stx_atime;
		struct statx_timestamp	stx_btime;
		struct statx_timestamp	stx_ctime;
		struct statx_timestamp	stx_mtime;
		__u32	stx_rdev_major;
		__u32	stx_rdev_minor;
		__u32	stx_dev_major;
		__u32	stx_dev_minor;
		__u64	__spare2[14];
	};

The defined bits in request_mask and stx_mask are:

	STATX_TYPE		Want/got stx_mode & S_IFMT
	STATX_MODE		Want/got stx_mode & ~S_IFMT
	STATX_NLINK		Want/got stx_nlink
	STATX_UID		Want/got stx_uid
	STATX_GID		Want/got stx_gid
	STATX_ATIME		Want/got stx_atime{,_ns}
	STATX_MTIME		Want/got stx_mtime{,_ns}
	STATX_CTIME		Want/got stx_ctime{,_ns}
	STATX_INO		Want/got stx_ino
	STATX_SIZE		Want/got stx_size
	STATX_BLOCKS		Want/got stx_blocks
	STATX_BASIC_STATS	[The stuff in the normal stat struct]
	STATX_BTIME		Want/got stx_btime{,_ns}
	STATX_ALL		[All currently available stuff]

stx_btime is the file creation time, stx_mask is a bitmask indicating the
data provided and __spare*[] are where as-yet undefined fields can be
placed.

Time fields are structures with separate seconds and nanoseconds fields
plus a reserved field in case we want to add even finer resolution. Note
that times will be negative if before 1970; in such a case, the nanosecond
fields will also be negative if not zero.
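Since stx_mask reports which fields were actually filled in, a caller would normally test the relevant bit before using a field such as stx_btime. The following is a minimal sketch of that check; the helper name statx_btime is hypothetical.

```c
#include <stddef.h>       /* NULL */
#include <linux/stat.h>   /* struct statx, STATX_BTIME */

/*
 * Hypothetical helper: hand back the creation time only when the
 * filesystem supplied it, i.e. when STATX_BTIME is set in stx_mask.
 * Returns NULL when the field holds no valid value.
 */
static const struct statx_timestamp *statx_btime(const struct statx *stx)
{
	if (!(stx->stx_mask & STATX_BTIME))
		return NULL;
	return &stx->stx_btime;
}
```

The same pattern applies to any other field whose bit may be missing from stx_mask.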

The bits defined in the stx_attributes field convey information about a
file, how it is accessed, where it is and what it does. The following
attributes map to FS_*_FL flags and are the same numerical value:

	STATX_ATTR_COMPRESSED	File is compressed by the fs
	STATX_ATTR_IMMUTABLE	File is marked immutable
	STATX_ATTR_APPEND	File is append-only
	STATX_ATTR_NODUMP	File is not to be dumped
	STATX_ATTR_ENCRYPTED	File requires key to decrypt in fs

Within the kernel, the supported flags are listed by:

KSTAT_ATTR_FS_IOC_FLAGS

[Are any other IOC flags of sufficient general interest to be exposed
through this interface?]

New flags include:

STATX_ATTR_AUTOMOUNT Object is an automount trigger

These are for the use of GUI tools that might want to mark files specially,
depending on what they are.
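A tool that wants to present these attribute bits could decode stx_attributes with a small lookup table. This is only a sketch: the helper name statx_attr_names and the lower-case labels are made up for illustration, while the STATX_ATTR_* constants are the ones listed above.

```c
#include <stdio.h>        /* snprintf, size_t */
#include <linux/stat.h>   /* STATX_ATTR_* */

/*
 * Hypothetical helper: write a space-separated list of the attribute
 * names set in attrs into buf and return how many names were written.
 * Output stops at the first name that does not fit; buf is always
 * NUL-terminated when len > 0.
 */
static int statx_attr_names(unsigned long long attrs, char *buf, size_t len)
{
	static const struct {
		unsigned long long bit;
		const char *name;	/* illustrative label */
	} table[] = {
		{ STATX_ATTR_COMPRESSED, "compressed" },
		{ STATX_ATTR_IMMUTABLE,  "immutable"  },
		{ STATX_ATTR_APPEND,     "append"     },
		{ STATX_ATTR_NODUMP,     "nodump"     },
		{ STATX_ATTR_ENCRYPTED,  "encrypted"  },
		{ STATX_ATTR_AUTOMOUNT,  "automount"  },
	};
	size_t used = 0;
	size_t i;
	int count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		int n;

		if (!(attrs & table[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", table[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
		count++;
	}
	return count;
}
```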

Fields in struct statx come in a number of classes:

(0) stx_dev_*, stx_blksize.

These are local system information and are always available.

(1) stx_mode, stx_nlink, stx_uid, stx_gid, stx_[amc]time, stx_ino,
stx_size, stx_blocks.

These will be returned whether the caller asks for them or not. The
corresponding bits in stx_mask will be set to indicate whether they
actually have valid values.

If the caller didn't ask for them, then they may be approximated. For
example, NFS won't waste any time updating them from the server,
unless as a byproduct of updating something requested.

If the values don't actually exist for the underlying object (such as
UID or GID on a DOS file), then the bit won't be set in the stx_mask,
even if the caller asked for the value. In such a case, the returned
value will be a fabrication.

Note that there are instances where the type might not be valid, for
instance Windows reparse points.

(2) stx_rdev_*.

This will be set only if stx_mode indicates we're looking at a
blockdev or a chardev, otherwise will be 0.

(3) stx_btime.

Similar to (1), except this will be set to 0 if it doesn't exist.

=======
TESTING
=======

The following test program can be used to test the statx system call:

samples/statx/test-statx.c

Just compile and run, passing it paths to the files you want to examine.
The file is built automatically if CONFIG_SAMPLES is enabled.

Here's some example output. Firstly, an NFS directory that crosses to
another FSID. Note that the AUTOMOUNT attribute is set because transiting
this directory will cause d_automount to be invoked by the VFS.

[root@andromeda ~]# /tmp/test-statx -A /warthog/data
statx(/warthog/data) = 0
results=7ff
Size: 4096 Blocks: 8 IO Block: 1048576 directory
Device: 00:26 Inode: 1703937 Links: 125
Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
Access: 2016-11-24 09:02:12.219699527+0000
Modify: 2016-11-17 10:44:36.225653653+0000
Change: 2016-11-17 10:44:36.225653653+0000
Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)

Secondly, the result of automounting on that directory.

[root@andromeda ~]# /tmp/test-statx /warthog/data
statx(/warthog/data) = 0
results=7ff
Size: 4096 Blocks: 8 IO Block: 1048576 directory
Device: 00:27 Inode: 2 Links: 125
Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
Access: 2016-11-24 09:02:12.219699527+0000
Modify: 2016-11-17 10:44:36.225653653+0000
Change: 2016-11-17 10:44:36.225653653+0000

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2017-03-03 09:51:15 +0800

18 Jan, 2017

2 commits

fab8eef86 xfs: sanity check inode mode when creating new dentry

The helper xfs_dentry_to_name() is used by 2 different
classes of callers: Callers that pass zero mode and don't care
about the returned name.type field and Callers that pass
non zero mode and do care about the name.type field.

Change xfs_dentry_to_name() to not take the mode argument and
change the call sites of the first class to not pass the mode
argument.

Create a new helper xfs_dentry_mode_to_name() which does pass
the mode argument and returns -EFSCORRUPTED if mode is invalid.
Callers that translate non zero mode to on-disk file type now
check the return value and will export the error to user instead
of staging an invalid file type to be written to directory entry.

Signed-off-by: Amir Goldstein
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Amir Goldstein
2017-01-18 03:42:22 +0800

1fc4d33fe xfs: replace xfs_mode_to_ftype table with switch statement

The size of the xfs_mode_to_ftype[] conversion table
was too small to handle an invalid value of mode=S_IFMT.

Instead of fixing the table size, replace the conversion table
with a conversion helper that uses a switch statement.

Suggested-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Amir Goldstein
Signed-off-by: Darrick J. Wong

Amir Goldstein
2017-01-18 03:41:43 +0800

18 Dec, 2016

1 commit

231753ef7 Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull partial readlink cleanups from Miklos Szeredi.

This is the uncontroversial part of the readlink cleanup patch-set that
simplifies the default readlink handling.

Miklos and Al are still discussing the rest of the series.

* git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
vfs: make generic_readlink() static
vfs: remove ".readlink = generic_readlink" assignments
vfs: default to generic_readlink()
vfs: replace calling i_op->readlink with vfs_readlink()
proc/self: use generic_readlink
ecryptfs: use vfs_get_link()
bad_inode: add missing i_op initializers

Linus Torvalds
2016-12-18 11:16:12 +0800

09 Dec, 2016

2 commits

dfeef6886 vfs: remove ".readlink = generic_readlink" assignments

A NULL .readlink now implies generic_readlink(), so the explicit
assignments are redundant.

Generated by:

to_del="\.readlink.*=.*generic_readlink"
for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-12-09 23:45:04 +0800

0c187dc50 xfs: use xfs_vn_setattr_size to check on new size

Commit 6552321831dc ("xfs: remove i_iolock and use i_rwsem in the
VFS inode instead") introduced a regression where truncate(2) doesn't
check the new size, so it succeeds even if the new size exceeds the
current resource limit. This happened because xfs_setattr_size() was
used instead of xfs_vn_setattr_size(); the latter calls
xfs_vn_change_ok() first to sanity-check permissions and the new size.

This was found by the truncate03 test from LTP, and the following is a
simplified reproducer:

#!/bin/bash
dev=/dev/sda5
mnt=/mnt/xfs

mkfs -t xfs -f $dev
mount $dev $mnt

# set max file size to 16k
ulimit -f 16
truncate -s $((16 * 1024 + 1)) /mnt/xfs/testfile
[ $? -eq 0 ] && echo "FAIL: truncate exceeded max file size"
ulimit -f unlimited
umount $mnt
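The same check can be sketched in C for use outside a shell. The helper name truncate_honours_fsize_limit is hypothetical, and the sketch assumes the caller supplies a path on the filesystem under test. Note that exceeding RLIMIT_FSIZE also raises SIGXFSZ, which is fatal by default, so the sketch ignores it for the duration of the call.

```c
#include <errno.h>
#include <fcntl.h>         /* open, O_* */
#include <signal.h>        /* signal, SIGXFSZ */
#include <sys/resource.h>  /* getrlimit, setrlimit, RLIMIT_FSIZE */
#include <unistd.h>        /* ftruncate, close */

/*
 * Hypothetical check: lower RLIMIT_FSIZE to `limit` bytes, then try to
 * grow the file at `path` to limit + 1 bytes.
 * Returns 1 if the kernel refused with EFBIG (the expected behaviour),
 * 0 if the truncate succeeded or failed for another reason, and -1 if
 * the test could not be set up.
 */
static int truncate_honours_fsize_limit(const char *path, rlim_t limit)
{
	struct rlimit old, lowered;
	void (*old_handler)(int);
	int fd, rc, saved_errno;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (getrlimit(RLIMIT_FSIZE, &old) != 0) {
		close(fd);
		return -1;
	}
	lowered = old;
	lowered.rlim_cur = limit;
	if (setrlimit(RLIMIT_FSIZE, &lowered) != 0) {
		close(fd);
		return -1;
	}
	/* Keep SIGXFSZ from killing the process when the limit is hit. */
	old_handler = signal(SIGXFSZ, SIG_IGN);
	if (old_handler == SIG_ERR) {
		setrlimit(RLIMIT_FSIZE, &old);
		close(fd);
		return -1;
	}

	rc = ftruncate(fd, (off_t)limit + 1);
	saved_errno = errno;

	/* Restore the previous signal disposition and soft limit. */
	signal(SIGXFSZ, old_handler);
	setrlimit(RLIMIT_FSIZE, &old);
	close(fd);

	return rc == -1 && saved_errno == EFBIG;
}
```

Calling it with a path under /mnt/xfs and a limit of 16 * 1024 mirrors the shell reproducer; a return value of 0 would indicate the regression.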

Signed-off-by: Eryu Guan
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Eryu Guan
2016-12-09 13:49:54 +0800

30 Nov, 2016

1 commit

655232183 xfs: remove i_iolock and use i_rwsem in the VFS inode instead

This patch drops the XFS-own i_iolock and uses the VFS i_rwsem which
recently replaced i_mutex instead. This means we only have to take
one lock instead of two in many fast path operations, and we can
also shrink the xfs_inode structure. Thanks to the xfs_ilock family
there is very little churn, the only thing of note is that we need
to switch to use the lock_two_directory helper for taking the i_rwsem
on two inodes in a few places to make sure our lock order matches
the one used in the VFS.

Signed-off-by: Christoph Hellwig
Tested-by: Jens Axboe
Reviewed-by: Darrick J. Wong
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-11-30 11:33:25 +0800

14 Oct, 2016

1 commit

35a891be9 Merge tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

 < XFS has gained super CoW powers! >
 ----------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Pull XFS support for shared data extents from Dave Chinner:
"This is the second part of the XFS updates for this merge cycle. This
pullreq contains the new shared data extents feature for XFS.

Given the complexity and size of this change I am expecting - like the
addition of reverse mapping last cycle - that there will be some
follow-up bug fixes and cleanups around the -rc3 stage for issues that
I'm sure will show up once the code hits a wider userbase.

What it is:

At the most basic level we are simply adding shared data extents to
XFS - i.e. a single extent on disk can now have multiple owners. To do
this we have to add new on-disk features to both track the shared
extents and the number of times they've been shared. This is done by
the new "refcount" btree that sits in every allocation group. When we
share or unshare an extent, this tree gets updated.

Along with this new tree, the reverse mapping tree needs to be updated
to track each owner of a shared extent. This also needs to be updated
on every share/unshare operation. These interactions at extent allocation
and freeing time have complex ordering and recovery constraints, so
there's a significant amount of new intent-based transaction code to
ensure that operations are performed atomically from both the runtime
and integrity/crash recovery perspectives.

We also need to break sharing when writes hit a shared extent - this
is where the new copy-on-write implementation comes in. We allocate
new storage and copy the original data along with the overwrite data
into the new location. We only do this for data as we don't share
metadata at all - each inode has its own metadata that tracks the
shared data extents, the extents undergoing CoW and its own private
extents.

Of course, being XFS, nothing is simple - we use delayed allocation
for CoW similar to how we use it for normal writes. ENOSPC is a
significant issue here - we build on the reservation code added in
4.8-rc1 with the reverse mapping feature to ensure we don't get
spurious ENOSPC issues part way through a CoW operation. These
mechanisms also help minimise fragmentation due to repeated CoW
operations. To further reduce fragmentation overhead, we've also
introduced a CoW extent size hint, which indicates how large a region
we should allocate when we execute a CoW operation.

With all this functionality in place, we can hook up .copy_file_range,
.clone_file_range and .dedupe_file_range and we gain all the
capabilities of reflink and other vfs provided functionality that
enable manipulation to shared extents. We also added a fallocate mode
that explicitly unshares a range of a file, which we implemented as an
explicit CoW of all the shared extents in a file.

As such, it's a huge chunk of new functionality with new on-disk
format features and internal infrastructure. It warns at mount time as
an experimental feature and that it may eat data (as we do with all
new on-disk features until they stabilise). We have not released
userspace support for it yet - userspace support currently requires
download from Darrick's xfsprogs repo and build from source, so the
access to this feature is really developer/tester only at this point.
Initial userspace support will be released at the same time the kernel
with this code in it is released.

The new code causes 5-6 new failures with xfstests - these aren't
serious functional failures but things like the output of tests changing
slightly due to perturbations in layouts, space usage, etc. OTOH,
we've added 150+ new tests to xfstests that specifically exercise this
new functionality so it's got far better test coverage than any
functionality we've previously added to XFS.

Darrick has done a pretty amazing job getting us to this stage, and
special mention also needs to go to Christoph (review, testing,
improvements and bug fixes) and Brian (caught several intricate bugs
during review) for the effort they've also put in.

Summary:

- unshare range (FALLOC_FL_UNSHARE) support for fallocate

- copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
interface

- shared extent support for XFS

- copy-on-write support for shared extents

- copy_file_range support

- clone_file_range support (implements reflink)

- dedupe_file_range support

- defrag support for reverse mapping enabled filesystems"
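From userspace, the clone_file_range support listed above is typically reached through the FICLONE ioctl. The sketch below shows the shape of such a call; the helper name reflink_whole_file is hypothetical, and whether the call succeeds depends on the filesystem and on both files living on the same mount.

```c
#include <sys/ioctl.h>   /* ioctl */
#include <linux/fs.h>    /* FICLONE */

/*
 * Hypothetical helper: ask the filesystem to make dst_fd share all of
 * src_fd's data extents (a whole-file reflink).
 * Returns 0 on success, -1 with errno set on failure, for example when
 * the filesystem does not support shared extents.
 */
static int reflink_whole_file(int src_fd, int dst_fd)
{
	return ioctl(dst_fd, FICLONE, src_fd);
}
```

Callers generally fall back to an ordinary data copy when the ioctl fails.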

* tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
xfs: convert COW blocks to real blocks before unwritten extent conversion
xfs: rework refcount cow recovery error handling
xfs: clear reflink flag if setting realtime flag
xfs: fix error initialization
xfs: fix label inaccuracies
xfs: remove isize check from unshare operation
xfs: reduce stack usage of _reflink_clear_inode_flag
xfs: check inode reflink flag before calling reflink functions
xfs: implement swapext for rmap filesystems
xfs: refactor swapext code
xfs: various swapext cleanups
xfs: recognize the reflink feature bit
xfs: simulate per-AG reservations being critically low
xfs: don't mix reflink and DAX mode for now
xfs: check for invalid inode reflink flags
xfs: set a default CoW extent size of 32 blocks
xfs: convert unwritten status of reverse mappings for shared files
xfs: use interval query for rmap alloc operations on shared files
xfs: add shared rmap map/unmap/convert log item types
xfs: increase log reservations for reflink
...

Linus Torvalds
2016-10-14 11:28:22 +0800

11 Oct, 2016

3 commits

101105b17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull more vfs updates from Al Viro:
"->rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning

Linus Torvalds
2016-10-11 11:16:43 +0800

3873691e5 Merge remote-tracking branch 'ovl/rename2' into for-linus

Al Viro
2016-10-11 11:02:51 +0800

97d211670 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs xattr updates from Al Viro:
"xattr stuff from Andreas

This completes the switch to xattr_handler ->get()/->set() from
->getxattr/->setxattr/->removexattr"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Remove {get,set,remove}xattr inode operations
xattr: Stop calling {get,set,remove}xattr inode operations
vfs: Check for the IOP_XATTR flag in listxattr
xattr: Add __vfs_{get,set,remove}xattr helpers
libfs: Use IOP_XATTR flag for empty directory handling
vfs: Use IOP_XATTR flag for bad-inode handling
vfs: Add IOP_XATTR inode operations flag
vfs: Move xattr_resolve_name to the front of fs/xattr.c
ecryptfs: Switch to generic xattr handlers
sockfs: Get rid of getxattr iop
sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
kernfs: Switch to generic xattr handlers
hfs: Switch to generic xattr handlers
jffs2: Remove jffs2_{get,set,remove}xattr macros
xattr: Remove unnecessary NULL attribute name check

Linus Torvalds
2016-10-11 08:11:50 +0800

08 Oct, 2016

2 commits

e55f1d1d1 Merge remote-tracking branch 'jk/vfs' into work.misc

Al Viro
2016-10-08 23:06:08 +0800

fd50ecadd vfs: Remove {get,set,remove}xattr inode operations

These inode operations are no longer used; remove them.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro

Andreas Gruenbacher
2016-10-08 09:48:36 +0800

06 Oct, 2016

1 commit

4f435ebe7 xfs: don't mix reflink and DAX mode for now

Since we don't have a strategy for handling both DAX and reflink,
for now we'll just prohibit both being set at the same time.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2016-10-06 07:26:31 +0800

28 Sep, 2016

1 commit

c2050a454 fs: Replace current_fs_time() with current_time()

current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.

Signed-off-by: Deepa Dinamani
Signed-off-by: Al Viro

Deepa Dinamani
2016-09-28 09:06:22 +0800

27 Sep, 2016

1 commit

2773bf00a fs: rename "rename2" i_op to "rename"

Generated patch:

sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-09-27 17:03:58 +0800

22 Sep, 2016

2 commits

31051c85b fs: Give dentry to inode_change_ok() instead of inode

inode_change_ok() will be responsible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better reflect that it also does some
modifications in addition to checks.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jan Kara

Jan Kara
2016-09-22 16:56:19 +0800

69bca8074 xfs: Propagate dentry down to inode_change_ok()

To avoid clearing of capabilities or security related extended
attributes too early, inode_change_ok() will need to take dentry instead
of inode. Propagate dentry down to functions calling inode_change_ok().
This is rather straightforward except for xfs_set_mode() function which
does not have dentry easily available. Luckily that function does not
call inode_change_ok() anyway so we just have to do a little dance with
function prototypes.

Acked-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Jan Kara

Jan Kara
2016-09-22 16:56:19 +0800

17 Aug, 2016

1 commit

1d4795e7b xfs: (re-)implement FIEMAP_FLAG_XATTR

Use a special read-only iomap_ops implementation to support fiemap on
the attr fork.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-08-17 06:45:30 +0800

21 Jun, 2016

4 commits

459f0fbc2 xfs: use iomap infrastructure for DAX zeroing

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-06-21 07:55:18 +0800

d2bb140e9 xfs: use iomap fiemap implementation

Note that this removes support for the untested FIEMAP_FLAG_XATTR. It
could be added relatively easily with iomap ops for the attr fork, but
without test coverage I don't feel safe doing this.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-06-21 07:54:53 +0800

68a9f5e70 xfs: implement iomap based buffered write path

Convert XFS to use the new iomap based multipage write path. This involves
implementing the ->iomap_begin and ->iomap_end methods, and switching the
buffered file write, page_mkwrite and xfs_iozero paths to the new iomap
helpers.

With this change __xfs_get_blocks will never be used for buffered writes,
and the code handling them can be removed.

Based on earlier code from Dave Chinner.

Signed-off-by: Christoph Hellwig
Reviewed-by: Bob Peterson
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-06-21 07:53:44 +0800

f0c6bcba7 xfs: reorder zeroing and flushing sequence in truncate

Currently zeroing out blocks and waiting for writeout is a bit of a mess in
truncate. This patch gives it a clear order in preparation for the iomap
path:

(1) we first wait for any direct I/O to complete to prevent any races
for it
(2) we then perform the actual zeroing, and only use the truncate_page
helpers for truncating down. The truncate up case already is
handled by the separate call to xfs_zero_eof.
(3) only then we write back dirty data, as zeroing block may cause
dirty pages when using either xfs_zero_eof or the new iomap
infrastructure.

Signed-off-by: Christoph Hellwig
Reviewed-by: Bob Peterson
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-06-21 07:52:47 +0800

20 May, 2016

1 commit

5b9113547 Merge branch 'xfs-4.7-optimise-inline-symlinks' into for-next

Dave Chinner
2016-05-20 08:32:10 +0800

06 Apr, 2016

3 commits

253f4911f xfs: better xfs_trans_alloc interface

Merge xfs_trans_reserve and xfs_trans_alloc into a single function call
that returns a transaction with all the required log and block reservations,
and which allows passing transaction flags directly to avoid the cumbersome
_xfs_trans_alloc interface.

While we're at it we also get rid of the transaction type argument that has
been superfluous since we stopped supporting the non-CIL logging mode. The
guts of it will be removed in another patch.

[dchinner: fixed transaction leak in error path in xfs_setattr_nonsize]

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-04-06 07:19:55 +0800

30ee052e1 xfs: optimize inline symlinks

By overallocating the in-core inode fork data buffer and zero
terminating the link target in xfs_init_local_fork we can avoid
the memory allocation in ->follow_link.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-04-06 05:53:29 +0800

2b3d1d41b xfs: set up inode operation vectors later

In the next patch we'll set up different inode operations for inline vs
out of line symlinks, for that we need to make sure the flags are already
set up properly.

[dchinner: added xfs_setup_iops() call to xfs_rename_alloc_whiteout()]

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-04-06 05:48:27 +0800

07 Mar, 2016

1 commit

85a9f38d3 Merge branch 'xfs-dax-fixes-4.6' into for-next

Dave Chinner
2016-03-07 06:34:31 +0800

01 Mar, 2016

2 commits

644854373 xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE

If the block size of a filesystem is not at least PAGE_SIZEd, then
at this point in time DAX cannot be used due to the fact we can't
guarantee extents are page sized or aligned without further work.
Hence disallow setting the DAX flag on an inode if the block size is
too small. Also, be defensive and check the block size when reading
an inode in off disk.

In future, we want to allow DAX to work on any filesystem, so this
is temporary while we sort out the correct combination of extent size
hints and allocation alignment configurations needed to guarantee
page sized and aligned extent allocation for DAX enabled files.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Tested-by: Ross Zwisler
Signed-off-by: Dave Chinner

Dave Chinner
2016-03-01 06:41:33 +0800

db10c697b xfs: S_DAX is only for regular files

Only regular files can use DAX for data operations, so we should
restrict setting it on the VFS inode to regular files. Setting it on
metadata inodes may cause the VFS to do the wrong thing for such
inodes, so avoid potential problems by restricting the scope of the
flag to what we know is supported.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Tested-by: Ross Zwisler
Signed-off-by: Dave Chinner

Dave Chinner
2016-03-01 06:41:33 +0800

09 Feb, 2016

4 commits

c19b3b05a xfs: mode di_mode to vfs inode

Move the di_mode value from the xfs_icdinode to the VFS inode, reducing
the xfs_icdinode by another 2 bytes and collapsing another 2 byte hole
in the structure.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800

9e9a2674e xfs: move inode generation count to VFS inode ... Browse Code »

Pull another 4 bytes out of the xfs_icdinode.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800

54d7b5c1d xfs: use vfs inode nlink field everywhere ... Browse Code »

The VFS tracks the inode nlink just like the xfs_icdinode. We can
remove the variable from the icdinode and use the VFS inode variable
everywhere, reducing the size of the xfs_icdinode by a further 4
bytes.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800

3987848c7 xfs: remove timestamps from incore inode ... Browse Code »

The struct xfs_inode has two copies of the current timestamps in it,
one in the vfs inode and one in the struct xfs_icdinode. Now that we
no longer log the struct xfs_icdinode directly, we don't need to
keep the timestamps in this structure. Instead we can copy them
straight out of the VFS inode when formatting the inode log item or
the on-disk inode.

This reduces the struct xfs_inode in size by 24 bytes.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800

23 Jan, 2016

1 commit

d5ffdf8b4 Merge tag 'xfs-for-linus-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs ... Browse Code »

Pull more xfs updates from Dave Chinner:
"This is the second update for XFS that I mentioned in the original
pull request last week.

It contains a revert for a suspend regression in 4.4 and a fix for a
long standing log recovery issue that has been further exposed by all
the log recovery changes made in the original 4.5 merge.

There is one more thing in this pull request - one that I forgot to
merge into the origin. That is, pulling the XFS_IOC_FS[GS]ETXATTR
ioctl up to the VFS level so that other filesystems can also use it
for modifying project quota IDs.

Summary:

- promotion of XFS_IOC_FS[GS]ETXATTR ioctl to the vfs level so that
it can be shared with other filesystems. The ext4 project quota
functionality is the first target for this. The commits in this
series have not been updated with review or final SOB tags because
the branch they were originally published in was needed by ext4.
Those tags are:

Reviewed-by: Theodore Ts'o
Signed-off-by: Dave Chinner

- Revert a change that is causing suspend failures.

- Fix a use-after-free that can occur on log mount failures. Been
around forever, but now exposed by other changes to log recovery
made in the first 4.5 merge"

* tag 'xfs-for-linus-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: log mount failures don't wait for buffers to be released
Revert "xfs: clear PF_NOFREEZE for xfsaild kthread"
xfs: introduce per-inode DAX enablement
xfs: use FS_XFLAG definitions directly
fs: XFS_IOC_FS[SG]SETXATTR to FS_IOC_FS[SG]ETXATTR promotion

Linus Torvalds
2016-01-23 02:54:13 +0800

04 Jan, 2016

1 commit

58f88ca2d xfs: introduce per-inode DAX enablement ... Browse Code »

Rather than just being able to turn DAX on and off via a mount
option, some applications may only want to enable DAX for certain
performance critical files in a filesystem.

This patch introduces a new inode flag to enable DAX in the v3 inode
di_flags2 field. It adds support for setting and clearing flags in
the di_flags2 field via the XFS_IOC_FSSETXATTR ioctl, and sets the
S_DAX inode flag appropriately when it is seen.

When this flag is set on a directory, it acts as an "inherit flag".
That is, inodes created in the directory will automatically inherit
the on-disk inode DAX flag, enabling administrators to set up
directory hierarchies that automatically use DAX. Setting this flag
on an empty root directory will make the entire filesystem use DAX
by default.
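
From userspace, toggling the flag might look like the sketch below. It assumes the VFS-level names FS_IOC_FSGETXATTR, FS_IOC_FSSETXATTR and FS_XFLAG_DAX from <linux/fs.h> (the promoted equivalents of the XFS_IOC_* ioctl named above) and headers recent enough to define them. The function name set_dax_flag() is made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* struct fsxattr, FS_IOC_FS[GS]ETXATTR, FS_XFLAG_DAX */

/*
 * Hypothetical helper: set or clear the DAX inode flag on a file or
 * directory. Returns 0 on success, -1 on failure with errno set.
 */
static int set_dax_flag(const char *path, int enable)
{
    struct fsxattr fsx;
    int saved_errno;
    int ret = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* Read the current flags first so the other bits are preserved. */
    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) == 0) {
        if (enable)
            fsx.fsx_xflags |= FS_XFLAG_DAX;
        else
            fsx.fsx_xflags &= ~FS_XFLAG_DAX;
        ret = ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
    }

    /* close() may overwrite errno, so keep the ioctl's value. */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ret;
}
```

Calling set_dax_flag() on a directory would correspond to the "inherit flag" case described above. Whether the call succeeds depends on the filesystem and kernel supporting the flag.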

Signed-off-by: Dave Chinner

Dave Chinner
2016-01-04 13:44:15 +0800

31 Dec, 2015

1 commit

fceef393a switch ->get_link() to delayed_call, kill ->put_link() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2015-12-31 02:01:03 +0800

09 Dec, 2015

1 commit

6b2553918 replace ->follow_link() with new method that could stay in RCU mode ... Browse Code »

new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; that said, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.
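
The calling convention described above can be modelled in userspace roughly as follows. Everything prefixed sketch_ is a hypothetical stand-in: the structures are not the kernel's, the error-pointer helpers are simplified imitations of ERR_PTR()/PTR_ERR()/IS_ERR(), and the real method takes additional arguments omitted here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace imitations of the kernel's error-pointer helpers. */
static inline void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-4095;
}

/* Hypothetical inode: cached_target is NULL until the link body is read. */
struct sketch_inode { const char *cached_target; };
struct sketch_dentry { struct sketch_inode *inode; };

/* Stands in for an operation that may sleep (e.g. reading from disk). */
static const char *sketch_read_target_blocking(struct sketch_inode *inode)
{
    inode->cached_target = "/some/target";
    return inode->cached_target;
}

/*
 * Model of a ->get_link() instance: inode and dentry are passed
 * separately, and a NULL dentry signals RCU mode, in which the method
 * must not block.
 */
static const char *sketch_get_link(struct sketch_dentry *dentry,
                                   struct sketch_inode *inode)
{
    if (inode->cached_target)
        return inode->cached_target;      /* no blocking needed */
    if (!dentry)
        return sketch_err_ptr(-ECHILD);   /* RCU mode: ask for a retry */
    return sketch_read_target_blocking(inode);
}
```

In this model the caller that receives -ECHILD is expected to leave RCU mode and call the method again with a real dentry.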

Signed-off-by: Al Viro

Al Viro
2015-12-09 11:41:54 +0800

12 Oct, 2015

1 commit

ff6d6af23 xfs: per-filesystem stats counter implementation ... Browse Code »

This patch modifies the stats counting macros and the callers
to those macros to properly increment, decrement, and add-to
the xfs stats counts. The counts for global and per-fs stats
are correctly advanced, and cleared by writing a "1" to the
corresponding clear file.

global counts: /sys/fs/xfs/stats/stats
per-fs counts: /sys/fs/xfs/sda*/stats/stats

global clear: /sys/fs/xfs/stats/stats_clear
per-fs clear: /sys/fs/xfs/sda*/stats/stats_clear
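
The dual accounting described above can be illustrated with a small userspace model. The names below (sketch_stats, SKETCH_STATS_INC and so on) are invented for this sketch, the counter field is a placeholder, and the per-CPU storage a kernel implementation would normally use is left out. It also assumes that clearing one set of counters leaves the other untouched, which the message implies but does not state outright.

```c
#include <stdint.h>

/* Hypothetical counter block; xs_example is a placeholder field. */
struct sketch_stats { uint64_t xs_example; };

/* One global block plus one block per mounted filesystem. */
static struct sketch_stats sketch_global_stats;
struct sketch_mount { struct sketch_stats stats; };

/* Each macro updates the global count and the per-fs count together. */
#define SKETCH_STATS_INC(mp, field) \
    do { sketch_global_stats.field++; (mp)->stats.field++; } while (0)
#define SKETCH_STATS_DEC(mp, field) \
    do { sketch_global_stats.field--; (mp)->stats.field--; } while (0)
#define SKETCH_STATS_ADD(mp, field, n) \
    do { sketch_global_stats.field += (n); (mp)->stats.field += (n); } while (0)

/* Models writing a value to a stats_clear file: only 1 clears. */
static void sketch_stats_clear(struct sketch_stats *s, int written)
{
    if (written == 1)
        *s = (struct sketch_stats){ 0 };
}
```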

[dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
stats structures and macros. ]

Signed-off-by: Bill O'Donnell
Reviewed-by: Eric Sandeen
Signed-off-by: Dave Chinner

Bill O'Donnell
2015-10-12 15:21:22 +0800