Eric Lee / smarc-fsl-linux-kernel

28 Sep, 2016

1 commit

c2050a454 fs: Replace current_fs_time() with current_time()

current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.

Signed-off-by: Deepa Dinamani
Signed-off-by: Al Viro

Deepa Dinamani
2016-09-28 09:06:22 +0800

09 Feb, 2016

2 commits

83e06f21b xfs: move di_changecount to VFS inode

We can store the di_changecount in the i_version field of the VFS
inode and remove another 8 bytes from the xfs_icdinode.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800

3987848c7 xfs: remove timestamps from incore inode

The struct xfs_inode has two copies of the current timestamps in it,
one in the vfs inode and one in the struct xfs_icdinode. Now that we
no longer log the struct xfs_icdinode directly, we don't need to
keep the timestamps in this structure. Instead we can copy them
straight out of the VFS inode when formatting the inode log item or
the on-disk inode.

This reduces the struct xfs_inode in size by 24 bytes.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800

03 Nov, 2015

1 commit

fc0561cef xfs: optimise away log forces on timestamp updates for fdatasync

xfs: timestamp updates cause excessive fdatasync log traffic

Sage Weil reported that a ceph test workload was writing to the
log on every fdatasync during an overwrite workload. Event tracing
showed that the only metadata modification being made was the
timestamp updates during the write(2) syscall, but fdatasync(2)
is supposed to ignore them. The key observation was that the
transactions in the log all looked like this:

INODE: #regs: 4 ino: 0x8b flags: 0x45 dsize: 32

And contained a flags field of 0x45 or 0x85, and had data and
attribute forks following the inode core. This means that the
timestamp updates were triggering dirty relogging of previously
logged parts of the inode that hadn't yet been flushed back to
disk.

There are two parts to this problem. The first is that XFS relogs
dirty regions in subsequent transactions, so it carries around the
fields that have been dirtied since the last time the inode was
written back to disk, not since the last time the inode was forced
into the log.

The second part is that on v5 filesystems, the inode change count
update during inode dirtying also sets the XFS_ILOG_CORE flag, so
on v5 filesystems this makes a timestamp update dirty the entire
inode.

As a result when fdatasync is run, it looks at the dirty fields in
the inode, and sees more than just the timestamp flag, even though
the only metadata change since the last fdatasync was just the
timestamps. Hence we force the log on every subsequent fdatasync
even though it is not needed.

To fix this, add a new field to the inode log item that tracks
changes since the last time fsync/fdatasync forced the log to flush
the changes to the journal. This flag is updated when we dirty the
inode, but we do it before updating the change count so it does not
carry the "core dirty" flag from timestamp updates. The fields are
zeroed when the inode is marked clean (due to writeback/freeing) or
when an fsync/fdatasync forces the log. Hence if we only dirty the
timestamps on the inode between fsync/fdatasync calls, the fdatasync
will not trigger another log force.
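The tracking scheme described above can be sketched in userspace C. The flag values, the struct, and the function names below are hypothetical and chosen for illustration; they are not the real XFS definitions. The sketch assumes one field accumulating changes since the last writeback and a second field accumulating changes since the last fsync/fdatasync log force.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real XFS_ILOG_* constants differ. */
#define SK_ILOG_CORE		0x1u	/* inode core dirtied */
#define SK_ILOG_TIMESTAMP	0x2u	/* only timestamps dirtied */

struct inode_log_item_sketch {
	unsigned int fields;		/* dirty since last writeback to disk */
	unsigned int fsync_fields;	/* dirty since last fsync/fdatasync log force */
};

/* Dirty the inode in a transaction. */
static void log_inode_sketch(struct inode_log_item_sketch *iip,
			     unsigned int flags, bool v5)
{
	/* Record the caller's flags before the change count is accounted for. */
	iip->fsync_fields |= flags;

	/* On v5 the change count bump also marks the inode core dirty. */
	if (v5)
		flags |= SK_ILOG_CORE;
	iip->fields |= flags;
}

/* Writeback or freeing: the inode is clean, so both fields are zeroed. */
static void mark_clean_sketch(struct inode_log_item_sketch *iip)
{
	iip->fields = 0;
	iip->fsync_fields = 0;
}

/* Returns true if fdatasync has to force the log. */
static bool fdatasync_sketch(struct inode_log_item_sketch *iip)
{
	bool need_force = (iip->fsync_fields & ~SK_ILOG_TIMESTAMP) != 0;

	/* The tracking field is only reset when the log is actually forced. */
	if (need_force)
		iip->fsync_fields = 0;
	return need_force;
}
```

With this split, a timestamp-only update on a v5 filesystem still sets the core flag in `fields`, but `fsync_fields` sees only the timestamp flag, so the next fdatasync can skip the log force.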

Over 100 runs of the test program:

Ext4 baseline:
runtime: 1.63s +/- 0.24s
avg lat: 1.59ms +/- 0.24ms
iops: ~2000

XFS, vanilla kernel:
runtime: 2.45s +/- 0.18s
avg lat: 2.39ms +/- 0.18ms
log forces: ~400/s
iops: ~1000

XFS, patched kernel:
runtime: 1.49s +/- 0.26s
avg lat: 1.46ms +/- 0.25ms
log forces: ~30/s
iops: ~1500

Reported-by: Sage Weil
Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Dave Chinner
2015-11-03 10:14:59 +0800

28 Nov, 2014

2 commits

bb58e6188 xfs: move most of xfs_sb.h to xfs_format.h

More on-disk format consolidation.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2014-11-28 11:27:09 +0800

4fb6e8ade xfs: merge xfs_ag.h into xfs_format.h

More on-disk format consolidation. A few declarations that weren't on-disk
format related move into better suitable spots.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2014-11-28 11:25:04 +0800

02 Oct, 2014

1 commit

e076b0f3a xfs: kill time.h

The typedef for timespecs and nanotime() are completely unnecessary,
and delay() can be moved to fs/xfs/linux.h, which means this file
can go away.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2014-10-02 07:18:13 +0800

18 Nov, 2013

1 commit

2fe8c1c08 xfs: open code inc_inode_iversion when logging an inode

Michael L Semon reported that generic/069 runtime increased on v5
superblocks by 100% compared to v4 superblocks. His perf-based
analysis pointed directly at the timestamp updates being done by the
write path in this workload. The append writers are doing 4-byte
writes, so there are lots of timestamp updates occurring.

The thing is, they aren't being triggered by timestamp changes -
they are being triggered by the inode change counter needing to be
updated. That is, every write(2) system call needs to bump the inode
version count, and it does that through the timestamp update
mechanism. Hence test generic/069 is running 3
orders of magnitude more timestamp update transactions on v5
filesystems due to the fact it does a huge number of *4 byte*
write(2) calls.

This isn't a real world scenario we really need to address - anyone
doing such sequential IO should be using fwrite(3), not write(2).
i.e. fwrite(3) buffers the writes in userspace to minimise the
number of write(2) syscalls, and the problem goes away.
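As a rough illustration of the buffering argument, the following sketch appends 4-byte records through stdio. The function name is hypothetical; the only assumption is standard C stdio behaviour, where records accumulate in the `FILE` buffer and are handed to the kernel in larger chunks rather than one write(2) per record.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Append `count` 4-byte records through stdio and flush at the end.
 * Returns the number of records accepted by fwrite(3).
 */
static size_t append_records_buffered(FILE *fp, size_t count)
{
	size_t done = 0;
	uint32_t record;

	for (record = 0; record < count; record++) {
		/* Each call normally just copies 4 bytes into the stdio buffer. */
		if (fwrite(&record, sizeof(record), 1, fp) != 1)
			break;
		done++;
	}
	/* Push whatever is still buffered down to the file. */
	fflush(fp);
	return done;
}
```

The exact number of write(2) calls depends on the stdio buffer size, so the sketch makes no claim about it; running the program under a syscall tracer is one way to compare it with a loop of raw 4-byte write(2) calls.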

However, there is a small change we can make to improve the
situation - removing the expensive lock operation on the change
counter update. All inode version counter changes in XFS occur
under the ip->i_ilock during a transaction, and therefore we
don't actually need the spin lock that provides exclusive access to
it through inc_inode_iversion().

Hence avoid the lock and just open code the increment ourselves when
logging the inode.
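The difference between the locked helper and the open-coded increment can be sketched as follows. This is a userspace approximation with hypothetical names: a C11 `atomic_flag` stands in for the per-inode spin lock, and the open-coded variant assumes the caller already holds an exclusive lock that covers the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the VFS inode. */
struct vinode_sketch {
	atomic_flag i_lock;	/* stand-in for the per-inode spin lock */
	uint64_t i_version;	/* change counter */
};

/* Generic helper: takes the spin lock around the increment. */
static void inc_iversion_locked_sketch(struct vinode_sketch *inode)
{
	while (atomic_flag_test_and_set_explicit(&inode->i_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
	inode->i_version++;
	atomic_flag_clear_explicit(&inode->i_lock, memory_order_release);
}

/*
 * Open-coded variant: valid only if every caller already serialises
 * access to i_version with another exclusive lock.
 */
static void log_inode_open_coded_sketch(struct vinode_sketch *inode)
{
	inode->i_version++;
}
```

Both paths produce the same counter value; the second simply skips the atomic lock and unlock operations on every update.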

Reported-by: Michael L. Semon
Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Ben Myers

Dave Chinner
2013-11-18 23:42:08 +0800

24 Oct, 2013

3 commits

a4fbe6ab1 xfs: decouple inode and bmap btree header files

Currently the xfs_inode.h header has a dependency on the definition
of the BMAP btree records as the inode fork includes an array of
xfs_bmbt_rec_host_t objects in its definition.

Move all the btree format definitions from xfs_btree.h,
xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
xfs_format.h to continue the process of centralising the on-disk
format definitions. With this done, the xfs inode definitions are no
longer dependent on btree header files.

This enables a massive culling of unnecessary includes, with close to
200 #include directives removed from the XFS kernel code base.

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

Dave Chinner
2013-10-24 05:28:49 +0800

239880ef6 xfs: decouple log and transaction headers

xfs_trans.h has a dependency on xfs_log.h for a couple of
structures. Most code that does transactions doesn't need to know
anything about the log, but this dependency means that they have to
include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
files and clean up the includes to be in dependency order.

In doing this, remove the direct include of xfs_trans_reserve.h from
xfs_trans.h so that we remove the dependency between xfs_trans.h and
xfs_mount.h. Hence the xfs_trans.h include can be moved to
indicate the actual dependencies other header files have on it.

Note that these are kernel only header files, so this does not
translate to any userspace changes at all.

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

Dave Chinner
2013-10-24 05:17:44 +0800

70a9883c5 xfs: create a shared header file for format-related information

All of the buffer operations structures are needed to be exported
for xfs_db, so move them all to a common location rather than
spreading them all over the place. They are verifying the on-disk
format, so while xfs_format.h might be a good place, it is not part
of the on disk format.

Hence we need to create a new header file in which we centralise these
related definitions. Start by moving the buffer operations
structures, and then also move all the other definitions that have
crept into xfs_log_format.h and xfs_format.h as there was no other
shared header file to put them in.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Ben Myers

Dave Chinner
2013-10-24 03:11:30 +0800

29 Jun, 2013

1 commit

dc037ad7d xfs: implement inode change count

For CRC enabled filesystems, add support for the monotonic inode
version change counter that is needed by protocols like NFSv4 for
determining if the inode has changed in any way at all between two
unrelated operations on the inode.

This bumps the change count the first time an inode is dirtied in a
transaction. Since all modifications to the inode are logged, this
will catch all changes that are made to the inode, including
timestamp updates that occur during data writes.
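The "bump once per transaction" rule can be sketched like this. The struct and function names are hypothetical, and the per-transaction dirty flag is an assumption made for the illustration rather than a description of the actual XFS data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory inode; field names are illustrative only. */
struct cc_inode_sketch {
	uint64_t changecount;	/* monotonic change counter */
	bool dirty_in_trans;	/* already dirtied in the current transaction? */
};

/* Log the inode in the current transaction. */
static void cc_log_inode_sketch(struct cc_inode_sketch *ip)
{
	if (!ip->dirty_in_trans) {
		/* First dirtying in this transaction: bump the counter once. */
		ip->changecount++;
		ip->dirty_in_trans = true;
	}
}

/* Transaction commit: the next transaction starts with a clean slate. */
static void cc_commit_sketch(struct cc_inode_sketch *ip)
{
	ip->dirty_in_trans = false;
}
```

Logging the same inode several times in one transaction therefore advances the counter by one, while each later transaction that touches the inode advances it again.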

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Reviewed-by: Chandra Seetharaman
Signed-off-by: Ben Myers

Dave Chinner
2013-06-29 02:00:05 +0800

18 Dec, 2012

1 commit

ec47eb6b0 xfs remove the XFS_TRANS_DEBUG routines

Remove the XFS_TRANS_DEBUG routines. They are no longer appropriate
and have not been used in years.

Signed-off-by: Mark Tinguely
Reviewed-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Mark Tinguely
2012-12-18 06:29:00 +0800

15 May, 2012

2 commits

ad1e95c54 xfs: clean up xfs_bit.h includes

With the removal of xfs_rw.h and other changes over time, xfs_bit.h
is being included in many files that don't actually need it. Clean
up the includes as necessary.

Also move the only-used-once xfs_ialloc_find_free() static inline
function out of a header file that is widely included to reduce
the number of needless dependencies on xfs_bit.h.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:21:00 +0800

60a34607b xfs: move xfs_agino_t to xfs_types.h

Untangle the header file includes a bit by moving the definition of
xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
xfs_ag.h.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:20:54 +0800

14 Mar, 2012

2 commits

f5d8d5c4b xfs: split in-core and on-disk inode log item fields

Add a new ili_fields member to the inode log item to isolate the in-memory
flags from the ones that actually go to the log. This will allow tracking
timestamp-only updates for fdatasync and O_DSYNC in the next patch and
prepares for divorcing the on-disk log format from the in-memory log item
a little further down the road.

Reviewed-by: Dave Chinner
Signed-off-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Christoph Hellwig
2012-03-14 06:08:17 +0800

8a9c9980f xfs: log timestamp updates

Timestamps on regular files are the last metadata that XFS does not update
transactionally. Now that we use the delaylog mode exclusively and made
the log code scale extremely well there is no need to bypass that code for
timestamp updates. Logging all updates allows us to drop a lot of code, and
will allow for further performance improvements later on.

Note that this patch drops optimized handling of fdatasync - it will be
added back in a separate commit.

Reviewed-by: Dave Chinner
Signed-off-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Christoph Hellwig
2012-03-14 06:01:15 +0800

12 Oct, 2011

1 commit

ddc3415ab xfs: simplify xfs_trans_ijoin* again

There is no reason to keep a reference to the inode even if we unlock
it during transaction commit because we never drop a reference between
the ijoin and commit. Also use this fact to merge xfs_trans_ijoin_ref
back into xfs_trans_ijoin - the third argument decides if an unlock
is needed now.

I'm actually starting to wonder if allowing inodes to be unlocked
at transaction commit really is worth the effort. The only real
benefit is that they can be unlocked earlier when committing a
synchronous transaction, but that could be solved by doing the
log force manually after the unlock, too.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2011-10-12 10:15:08 +0800

08 Jul, 2011

1 commit

f3ca87389 xfs: remove i_transp

Remove the transaction pointer in the inode. It's only used to avoid
passing down an argument in the bmap code, and for a few asserts in
the transaction code right now.

Also use the local variable ip in a few more places in xfs_inode_item_unlock,
so that it isn't only used for debug builds after the above change.

Signed-off-by: Christoph Hellwig
Reviewed-by: Alex Elder
Reviewed-by: Dave Chinner

Christoph Hellwig
2011-07-08 20:34:47 +0800

31 Mar, 2011

1 commit

25985edce Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800

23 Feb, 2011

1 commit

ec3ba85f4 xfs: more sensible inode refcounting for ialloc

Currently we return inodes from xfs_ialloc with just a single reference held.
But we need two references, as one is dropped during transaction commit and
the second needs to be transferred to the VFS. Change xfs_ialloc to use
xfs_iget plus xfs_trans_ijoin_ref to grab two references to the inode,
and remove the now superfluous IHOLD calls from all callers. This also
greatly simplifies the error handling in xfs_create and also allows us to remove
xfs_trans_iget as no other callers are left.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

Christoph Hellwig
2011-02-23 10:32:28 +0800

19 Oct, 2010

1 commit

dcd79a142 xfs: don't use vfs writeback for pure metadata modifications

Under heavy multi-way parallel create workloads, the VFS struggles
to write back all the inodes that have been changed in age order.
The bdi flusher thread becomes CPU bound, spending 85% of its time
in the VFS code, mostly traversing the superblock dirty inode list
to separate dirty inodes old enough to flush.

We already keep an index of all metadata changes in age order - in
the AIL - and continued log pressure will do age ordered writeback
without any extra overhead at all. If there is no pressure on the
log, the xfssyncd will periodically write back metadata in ascending
disk address offset order so will be very efficient.

Hence we can stop marking VFS inodes dirty during transaction commit
or when changing timestamps during transactions. This will keep the
inodes in the superblock dirty list to those containing data or
unlogged metadata changes.

However, the timestamp changes are slightly more complex than this -
there are a couple of places that do unlogged updates of the
timestamps, and the VFS needs to be informed of these. Hence add a
new function xfs_trans_ichgtime() for transactional changes,
and leave xfs_ichgtime() for the non-transactional changes.

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-10-19 04:07:45 +0800

27 Jul, 2010

4 commits

898621d5a xfs: simplify inode to transaction joining

Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
joining it to a transaction via xfs_trans_ijoin.

This patch instead makes xfs_trans_ijoin usable on its own by doing
an implicit xfs_trans_ihold, which also allows us to drop the third
argument. For the case where we want to hold a reference on the inode
a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks
the inode for needing an xfs_iput. In addition to the cleaner interface
to the caller this also simplifies the implementation.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:36 +0800

e98c414f9 xfs: simplify log item descriptor tracking

Currently we track log item descriptors belonging to a transaction using a
complex opencoded chunk allocator. This code has been there since day one
and seems to work around the lack of an efficient slab allocator.

This patch replaces it with dynamically allocated log item descriptors
from a dedicated slab pool, linked to the transaction by a linked list.

This allows us to greatly simplify the log item descriptor tracking to the
point where it's just a couple hundred lines in xfs_trans.c instead of
a separate file. The external API has also been simplified while we're
at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/
delete items from a transaction have been simplified to the bare minimum,
and the xfs_trans_find_item function is replaced with a direct dereference
of the li_desc field. All debug code walking the list of log items in
a transaction is down to a simple list_for_each_entry.
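The descriptor tracking described above can be approximated in userspace with a circular doubly linked list. All names below are hypothetical, `malloc`/`free` stand in for the dedicated slab pool, and the hand-rolled links stand in for the kernel's list.h helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor: one per log item joined to a transaction. */
struct lid_sketch {
	struct lid_sketch *prev, *next;	/* links in the transaction's list */
	void *item;			/* the log item this descriptor tracks */
};

/* Hypothetical transaction holding a circular list with a sentinel head. */
struct trans_sketch {
	struct lid_sketch head;
};

static void trans_init_sketch(struct trans_sketch *tp)
{
	tp->head.prev = tp->head.next = &tp->head;
	tp->head.item = NULL;
}

/* Allocate a descriptor and append it; returns NULL on allocation failure. */
static struct lid_sketch *trans_add_item_sketch(struct trans_sketch *tp,
						void *item)
{
	struct lid_sketch *lid = malloc(sizeof(*lid));

	if (!lid)
		return NULL;
	lid->item = item;
	lid->prev = tp->head.prev;
	lid->next = &tp->head;
	tp->head.prev->next = lid;
	tp->head.prev = lid;
	return lid;
}

/* Unlink and free; no search is needed because the caller has the pointer. */
static void trans_del_item_sketch(struct lid_sketch *lid)
{
	lid->prev->next = lid->next;
	lid->next->prev = lid->prev;
	free(lid);
}

/* Walk the whole list, as debug code might. */
static size_t trans_count_items_sketch(const struct trans_sketch *tp)
{
	const struct lid_sketch *lid;
	size_t n = 0;

	for (lid = tp->head.next; lid != &tp->head; lid = lid->next)
		n++;
	return n;
}
```

Because each log item can keep a direct pointer to its descriptor, deletion is a constant-time unlink, which is the role the commit message assigns to the li_desc field.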

Note that we could easily use a singly linked list here instead of the
double linked list from list.h as the fastpath only does deletion from
sequential traversal. But given that we don't have one available as
a library function yet I use the list.h functions for simplicity.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:34 +0800

3400777ff xfs: remove unneeded #include statements

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:33 +0800

288699fec xfs: drop dmapi hooks

Dmapi support was never merged upstream, but we still have a lot of hooks
bloating XFS for it, all over the fast paths of the filesystem.

This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM
support in mainline at least the namespace events can be done much saner
in the VFS instead of the individual filesystem, so it's not like this
is much help for future work.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:33 +0800

24 Jun, 2010

1 commit

7b6259e7a xfs: remove block number from inode lookup code

The block number comes from bulkstat based inode lookups to shortcut
the mapping calculations. We are not able to trust anything from
bulkstat, so drop the block number as well so that the correct
lookups and mappings are always done.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-06-24 09:35:17 +0800

02 Sep, 2009

1 commit

aa72a5cf0 xfs: simplify xfs_trans_iget

xfs_trans_iget is a wrapper for xfs_iget that adds the inode to the
transaction after it is read. Except when the inode already is in the
inode cache, in which case it returns the existing locked inode with
incremented lock recursion counts.

Now, no one in the tree ever decrements these lock recursion counts,
so any user of this gets a potential double unlock when both the original
owner of the inode and the xfs_trans_iget caller unlock it. When looking
back in a git bisect in the historic XFS tree there was only one place
that decremented these counts, xfs_trans_iput. Introduced in commit
ca25df7a840f426eb566d52667b6950b92bb84b5 by Adam Sweeney in 1993,
and removed in commit 19f899a3ab155ff6a49c0c79b06f2f61059afaf3 by
Steve Lord in 2003. And unless it slipped through git bisect's
cracks, it was never actually used in that time frame.

A quick audit of the callers of xfs_trans_iget shows that no caller
really relies on this behaviour fortunately - xfs_ialloc allocates this
inode from disk so it must not be there before, and all the RT allocator
routines only ever add each RT bitmap inode once.

In addition to removing lots of code and reducing the size of the inode
item this patch also avoids the double inode cache lookup in each
create/mkdir/mknod transaction.

Signed-off-by: Christoph Hellwig
Reviewed-by: Alex Elder
Signed-off-by: Felix Blyakher

Christoph Hellwig
2009-09-02 01:46:16 +0800

04 Dec, 2008

1 commit

070c4616e use xfs_trans_ijoin in xfs_trans_iget

Use xfs_trans_ijoin in xfs_trans_iget in case we need to join an inode into
a transaction instead of opencoding it. Based on a discussion with and an
incomplete patch from Niv Sardi.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Niv Sardi

Christoph Hellwig
2008-12-04 12:39:23 +0800

28 Jul, 2008

1 commit

f0e2d93c2 [XFS] Remove unused arg from kmem_free()

The kmem_free() function takes (ptr, size) arguments but doesn't actually use
the second one.

This patch removes size argument from all callsites.

SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31050a

Signed-off-by: Denys Vlasenko
Signed-off-by: David Chinner
Signed-off-by: Lachlan McIlroy

Denys Vlasenko
2008-07-28 14:58:07 +0800

29 Apr, 2008

1 commit

579aa9caf [XFS] shrink mrlock_t

The writer field is not needed for non-DEBUG builds so remove it. While
we're at it also clean up the interface for is locked asserts to go through
an xfs_iget.c helper with an interface like the xfs_ilock routines to
isolate the XFS codebase from mrlock internals. That way we can kill
mrlock_t entirely once rw_semaphores grow an islocked facility. Also
remove unused flags to the ilock family of functions.

SGI-PV: 976035
SGI-Modid: xfs-linux-melb:xfs-kern:30902a

Signed-off-by: Christoph Hellwig
Signed-off-by: Lachlan McIlroy

Christoph Hellwig
2008-04-29 13:54:02 +0800

20 Jun, 2006

1 commit

f6c2d1fa6 [XFS] Remove version 1 directory code. Never functioned on Linux, just pure bloat.

SGI-PV: 952969
SGI-Modid: xfs-linux-melb:xfs-kern:26251a

Signed-off-by: Nathan Scott

Nathan Scott
2006-06-20 11:04:51 +0800

29 Mar, 2006

1 commit

c41564b5a [XFS] We really suck at spulling. Thanks to Chris Pascoe for fixing all these typos.

SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:25539a

Signed-off-by: Nathan Scott

Nathan Scott
2006-03-29 06:55:14 +0800

02 Nov, 2005

2 commits

7b7187698 [XFS] Update license/copyright notices to match the preferred SGI boilerplate.

SGI-PV: 913862
SGI-Modid: xfs-linux:xfs-kern:23903a

Signed-off-by: Nathan Scott

Nathan Scott
2005-11-02 11:58:39 +0800

a844f4510 [XFS] Remove xfs_macros.c, xfs_macros.h, rework headers a whole lot.

SGI-PV: 943122
SGI-Modid: xfs-linux:xfs-kern:23901a

Signed-off-by: Nathan Scott

Nathan Scott
2005-11-02 11:38:42 +0800

21 Jun, 2005

1 commit

4372d6e10 [XFS] Remove dead code. Patch from Adrian Bunk

SGI-PV: 936255
SGI-Modid: xfs-linux:xfs-kern:192759a

Signed-off-by: Christoph Hellwig
Signed-off-by: Nathan Scott

Christoph Hellwig
2005-06-21 13:36:00 +0800

17 Apr, 2005

1 commit

1da177e4c Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800