Doug / smarc-fsl-linux-kernel | Embedian Git Server

12 Oct, 2011

4 commits

a9add83e5 xfs: remove XFS_bflush

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

Christoph Hellwig
2011-10-12 10:15:11 +0800
901796afc xfs: clean up xfs_ioerror_alert

Instead of passing the block number and mount structure explicitly,
get them off the bp and make the argument order more natural.

Also move it to xfs_buf.c and stop printing the device name, given
that we already get the fs name as part of xfs_alert and we know
what device it operates on because of the caller that gets printed.
Finally, rename it to xfs_buf_ioerror_alert and pass __func__ as
argument where it makes sense.
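
A minimal userspace sketch of the shape this cleanup describes: the helper takes the buffer and the caller's name, and pulls everything else off the buffer. The struct layout, field names, and message format are assumptions for illustration; only the idea of passing __func__ and dropping the device name comes from the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the buffer: it already carries the block
 * number, error code and filesystem name, so callers need not pass them. */
struct sketch_buf {
    const char        *fs_name;  /* assumed: name of the owning filesystem */
    unsigned long long blkno;    /* block number the buffer maps */
    int                error;    /* I/O error recorded on the buffer */
};

/* Format the alert into 'out'. The caller name (typically __func__)
 * identifies the code path, so no device name is printed. */
static int sketch_buf_ioerror_alert(const struct sketch_buf *bp,
                                    const char *func,
                                    char *out, size_t outlen)
{
    return snprintf(out, outlen,
                    "XFS (%s): I/O error in %s: block 0x%llx, error %d",
                    bp->fs_name, func, bp->blkno, bp->error);
}
```

A caller would invoke it as sketch_buf_ioerror_alert(bp, __func__, msg, sizeof(msg)).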

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

Christoph Hellwig
2011-10-12 10:15:10 +0800
87c7bec7f xfs: fix buffer flushing during unmount

The code to flush buffers in the umount code is a bit iffy: we first
flush all delwri buffers out, but then might be able to queue up a
new one when logging the sb counts. On a normal shutdown that one
would get flushed out when doing the synchronous superblock write in
xfs_unmountfs_writesb, but we skip that one if the filesystem has
been shut down.

Fix this by moving the delwri list flushing until just before unmounting
the log, and while we're at it also remove the superfluous delwri list
and buffer lru flushing for the rt and log device that can never have
cached or delwri buffers.

Signed-off-by: Christoph Hellwig
Reported-by: Amit Sahrawat
Tested-by: Amit Sahrawat
Signed-off-by: Alex Elder

Christoph Hellwig
2011-10-12 10:15:09 +0800
61551f1ee xfs: call xfs_buf_delwri_queue directly

Unify the ways we add buffers to the delwri queue by always calling
xfs_buf_delwri_queue directly. The xfs_bdwrite function is removed and
opencoded in its callers, and the two places setting XBF_DELWRI while a
buffer is locked and expecting xfs_buf_unlock to pick it up are converted
to call xfs_buf_delwri_queue directly, too. Also replace the
XFS_BUF_UNDELAYWRITE macro with direct calls to xfs_buf_delwri_dequeue
to make the explicit queuing/dequeuing more obvious.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

Christoph Hellwig
2011-10-12 10:14:59 +0800

08 Aug, 2011

1 commit

2ddb4e940 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux

Alex Elder
2011-08-08 20:06:24 +0800

27 Jul, 2011

1 commit

abbede1b3 xfs: get rid of open-coded S_ISREG(), etc.

Signed-off-by: Al Viro

Al Viro
2011-07-27 03:05:16 +0800

26 Jul, 2011

2 commits

49074c069 xfs: Remove the macro XFS_BUF_TARGET

Remove the definition and usages of the macro XFS_BUF_TARGET

Signed-off-by: Chandra Seetharaman
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Chandra Seetharaman
2011-07-26 04:03:31 +0800
72790aa11 xfs: Remove macro XFS_BUF_HOLD

Remove the definition and usage of the macro XFS_BUF_HOLD

Signed-off-by: Chandra Seetharaman
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Chandra Seetharaman
2011-07-26 04:03:06 +0800

21 Jul, 2011

1 commit

adab0f67d xfs: Remove the second parameter to xfs_sb_count()

Remove the second parameter to xfs_sb_count() since all callers of
the function set it.

Also, fix the header comment regarding it being called periodically.

Signed-off-by: Chandra Seetharaman
Signed-off-by: Alex Elder

Chandra Seetharaman
2011-07-21 07:35:03 +0800

13 Jul, 2011

1 commit

ea15ab3cd xfs: remove the dead QUOTADEBUG code

Remove the dead hash table test rig which has been rotting away under
QUOTADEBUG, including some code that was compiled for normal debug
builds, but not actually called without QUOTADEBUG, and enable a few
cheap debug checks that were hidden under QUOTADEBUG for normal
debug builds.

Signed-off-by: Christoph Hellwig
Reviewed-by: Alex Elder
Reviewed-by: Dave Chinner

Christoph Hellwig
2011-07-13 19:43:50 +0800

11 Jul, 2011

1 commit

b2ce39740 Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc"

This reverts commit 7a249cf83da1813cfa71cfe1e265b40045eceb47.

That commit created a situation that could lead to a filesystem
hang. As Dave Chinner pointed out, xfs_trans_alloc() could hold a
reference to m_active_trans (i.e., keep it non-zero) and then wait
for SB_FREEZE_TRANS to complete. Meanwhile a filesystem freeze
request could set SB_FREEZE_TRANS and then wait for m_active_trans
to drop to zero. Nobody benefits from this sequence of events...

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Alex Elder
2011-07-11 23:21:03 +0800

09 Jul, 2011

1 commit

c0e090ced xfs: consolidate & clarify mount sanity checks

Pavol pointed out that there is one silent error case in the mount
path, and that others are rather uninformative.

I've taken Pavol's suggested patch and extended it a bit to also:

* fix a message which says "turned off" but actually errors out
* consolidate the vaguely differentiated "SB sanity check [12]"
messages, and hexdump the superblock for analysis

Original-patch-by: Pavol Gono
Signed-off-by: Eric Sandeen
Signed-off-by: Alex Elder

Eric Sandeen
2011-07-09 00:32:51 +0800

08 Jul, 2011

2 commits

0c842ad46 xfs: clean up buffer locking helpers

Rename xfs_buf_cond_lock and reverse its return value to fit most other
trylock operations in the kernel and XFS (with the exception of down_trylock,
after which xfs_buf_cond_lock was modelled), and replace xfs_buf_lock_val
with an xfs_buf_islocked for use in asserts, or an opencoded variant in
tracing. Remove the XFS_BUF_* wrappers for all the locking helpers.
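
The two return conventions can be contrasted in a small userspace sketch. The names sketch_cond_lock and sketch_trylock are hypothetical, and a C11 atomic_flag stands in for the buffer semaphore; the only thing taken from the commit message is that the old helper followed down_trylock (0 on success) while most trylock operations return nonzero on success.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_lock {
    atomic_flag held;   /* set while the lock is owned */
};

static void sketch_lock_init(struct sketch_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* Old convention, modelled on down_trylock(): 0 means the lock was taken. */
static int sketch_cond_lock(struct sketch_lock *l)
{
    return atomic_flag_test_and_set(&l->held) ? 1 : 0;
}

/* New convention, like most trylock operations: true means the lock was taken. */
static bool sketch_trylock(struct sketch_lock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

static void sketch_unlock(struct sketch_lock *l)
{
    atomic_flag_clear(&l->held);
}
```

With the reversed return value, call sites read naturally as "if (!sketch_trylock(l)) goto busy;" instead of testing for a nonzero failure code.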

Signed-off-by: Christoph Hellwig
Reviewed-by: Alex Elder
Reviewed-by: Dave Chinner

Christoph Hellwig
2011-07-08 20:36:19 +0800
7a249cf83 xfs: fix filesystsem freeze race in xfs_trans_alloc

As pointed out by Jan xfs_trans_alloc can race with a concurrent filesystem
freeze when it sleeps during the memory allocation. Fix this by moving the
wait_for_freeze call after the memory allocation. This means moving the
freeze into the low-level _xfs_trans_alloc helper, which thus grows a new
argument. Also fix up some comments in that area while at it.

Signed-off-by: Christoph Hellwig
Reviewed-by: Alex Elder
Reviewed-by: Dave Chinner

Christoph Hellwig
2011-07-08 20:34:42 +0800

29 Apr, 2011

1 commit

45c51b999 xfs: cleanup duplicate initializations

Follow these guidelines:
- leave initialization in the declaration block if it fits the line
- move to the code where it's more suitable ('for' init block)

The last chunk was modified from David's original to be a correct
fix for what appeared to be a duplicate initialization.
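
As an illustration of the two guidelines (not code from the patch itself), a hypothetical helper might look like this: one variable is initialised in its declaration because it fits the line, the other in the 'for' init block where it is used.

```c
#include <assert.h>

/* Sum the first n entries of v. */
static int sketch_sum(const int *v, int n)
{
    int total = 0;  /* fits the line: initialise in the declaration */
    int i;          /* no initialiser here: the 'for' init block sets it */

    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```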

Signed-off-by: David Sterba
Signed-off-by: Alex Elder
Reviewed-by: Dave Chinner

David Sterba
2011-04-29 02:25:29 +0800

07 Mar, 2011

3 commits

0b932cccb xfs: Convert remaining cmn_err() callers to new API

Once converted, kill the remainder of the cmn_err() interface.

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder
Reviewed-by: Christoph Hellwig

Dave Chinner
2011-03-07 07:08:35 +0800
534877869 xfs: convert xfs_fs_cmn_err to new error logging API

Continue to clean up the error logging code by converting all the
callers of xfs_fs_cmn_err() to the new API. Once done, remove the
unused old API function.

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder
Reviewed-by: Christoph Hellwig

Dave Chinner
2011-03-07 07:05:35 +0800
af34e09da xfs: kill xfs_fs_mount_cmn_err() macro

The xfs_fs_mount_cmn_err() hides a simple check as to whether the
mount path should output an error or not. Remove the macro and open
code the check.

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder
Reviewed-by: Christoph Hellwig

Dave Chinner
2011-03-07 07:04:35 +0800

04 Jan, 2011

1 commit

055388a31 xfs: dynamic speculative EOF preallocation

Currently the size of the speculative preallocation during delayed
allocation is fixed by either the allocsize mount option or a
default size. We are seeing a lot of cases where we need to
recommend using the allocsize mount option to prevent fragmentation
when buffered writes land in the same AG.

Rather than using a fixed preallocation size by default (up to 64k),
make it dynamic by basing it on the current inode size. That way the
EOF preallocation will increase as the file size increases. Hence
for streaming writes we are much more likely to get large
preallocations exactly when we need them to reduce fragmentation.

For default settings, the size of the initial extents is determined
by the number of parallel writers and the amount of memory in the
machine. For 4GB RAM and 4 concurrent 32GB file writes:

 EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
   0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)      1048576
   1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)      1048576
   2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)    2097152
   3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)    4194304
   4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)    8388608
   5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791) 16777208
   6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143) 16777088
   7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495) 16777088

and for 16 concurrent 16GB file writes:

 EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
   0: [0..262143]:          2490472..2752615      0 (2490472..2752615)       262144
   1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)       262144
   2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)     524288
   3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)    1048576
   4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)    2097152
   5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)  4194304
   6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)  8388608
   7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055) 16777208

Because it is hard to take back speculative preallocation, cases
where there are large slow growing log files on a nearly full
filesystem may cause premature ENOSPC. Hence as the filesystem nears
full, the maximum dynamic prealloc size is reduced according to this
table (based on 4k block size):

freespace       max prealloc size
  >5%             full extent (8GB)
  4-5%              2GB (8GB >> 2)
  3-4%              1GB (8GB >> 3)
  2-3%            512MB (8GB >> 4)
  1-2%            256MB (8GB >> 5)
  <1%             128MB (8GB >> 6)

This should reduce the amount of space held in speculative
preallocation for such cases.
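
The throttling table can be expressed as a small function. This is a userspace sketch, not the kernel implementation: the function names are hypothetical, the 8GB maximum is the 4k-block figure from the table, and the lowest band (below 1% free, 8GB >> 6) is inferred from the pattern of the other rows. Behaviour exactly on a band boundary is an arbitrary choice here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_PREALLOC (8ULL << 30)  /* 8GB full extent, 4k blocks */

/* Return the right shift applied to the maximum preallocation size. */
static unsigned int sketch_prealloc_shift(uint64_t free_blocks,
                                          uint64_t total_blocks)
{
    unsigned int shift = 0;

    /* Compare free*100 with total*N so no floating point is needed. */
    if (free_blocks * 100 < total_blocks * 5)
        shift = 2;      /* 4-5% free: 8GB >> 2 */
    if (free_blocks * 100 < total_blocks * 4)
        shift++;        /* 3-4% free: 8GB >> 3 */
    if (free_blocks * 100 < total_blocks * 3)
        shift++;        /* 2-3% free: 8GB >> 4 */
    if (free_blocks * 100 < total_blocks * 2)
        shift++;        /* 1-2% free: 8GB >> 5 */
    if (free_blocks * 100 < total_blocks * 1)
        shift++;        /* below 1% free: 8GB >> 6 (inferred row) */
    return shift;
}

/* Maximum dynamic preallocation in bytes for the given free space. */
static uint64_t sketch_max_prealloc(uint64_t free_blocks,
                                    uint64_t total_blocks)
{
    return SKETCH_MAX_PREALLOC >>
           sketch_prealloc_shift(free_blocks, total_blocks);
}
```

For example, a filesystem with 4.5% free space gets a 2GB cap, and one with 1.5% free space gets 256MB.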

The allocsize mount option turns off the dynamic behaviour and fixes
the prealloc size to whatever the mount option specifies. i.e. the
behaviour is unchanged.

Signed-off-by: Dave Chinner

Dave Chinner
2011-01-04 08:35:03 +0800

16 Dec, 2010

1 commit

1a427ab0c xfs: convert pag_ici_lock to a spin lock

Now that we are using RCU protection for the inode cache lookups,
the lock is only needed on the modification side. Hence it is not
necessary for the lock to be a rwlock as there are no read side
holders anymore. Convert it to a spin lock to reflect its exclusive
nature.

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-12-16 14:08:41 +0800

11 Nov, 2010

1 commit

f83282a8e xfs: fix per-ag reference counting in inode reclaim tree walking

The walk fails to decrement the per-ag reference count when the
non-blocking walk fails to obtain the per-ag reclaim lock, leading
to an assert failure on debug kernels when unmounting a filesystem.
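
The pattern behind this fix can be sketched in userspace as follows. All names are hypothetical and a C11 atomic_flag stands in for the per-ag reclaim lock; the point is only that the reference taken before the trylock must be dropped on the failure path too, so the count is back to zero by unmount time.

```c
#include <assert.h>
#include <stdatomic.h>

struct sketch_perag {
    int         ref;           /* references held by walkers */
    atomic_flag reclaim_lock;  /* stand-in for the per-ag reclaim lock */
};

static void sketch_perag_get(struct sketch_perag *pag) { pag->ref++; }
static void sketch_perag_put(struct sketch_perag *pag) { pag->ref--; }

/* Non-blocking walk over n AGs; returns how many AGs were processed. */
static int sketch_reclaim_walk(struct sketch_perag *ags, int n)
{
    int done = 0;

    for (int i = 0; i < n; i++) {
        struct sketch_perag *pag = &ags[i];

        sketch_perag_get(pag);
        if (atomic_flag_test_and_set(&pag->reclaim_lock)) {
            /* Lock busy: drop the reference before skipping this AG.
             * Leaving out this put is the kind of leak described above. */
            sketch_perag_put(pag);
            continue;
        }
        done++;                                 /* reclaim work goes here */
        atomic_flag_clear(&pag->reclaim_lock);
        sketch_perag_put(pag);
    }
    return done;
}
```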

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2010-11-11 02:00:48 +0800

19 Oct, 2010

11 commits

1a1a3e97b xfs: remove xfs_buf wrappers

Stop having two different names for many buffer functions and use
the more descriptive xfs_buf_* names directly.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2010-10-19 04:08:07 +0800
1b0407125 xfs: do not use xfs_mod_incore_sb_batch for per-cpu counters

Update the per-cpu counters manually in xfs_trans_unreserve_and_mod_sb
and remove support for per-cpu counters from xfs_mod_incore_sb_batch
to simplify it. An added benefit is that we don't have to take
m_sb_lock for transactions that only modify per-cpu counters.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2010-10-19 04:08:00 +0800
96540c785 xfs: do not use xfs_mod_incore_sb for per-cpu counters

Export xfs_icsb_modify_counters and always use it for modifying
the per-cpu counters. Remove support for per-cpu counters from
xfs_mod_incore_sb to simplify it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2010-10-19 04:07:59 +0800
61ba35dea xfs: remove XFS_MOUNT_NO_PERCPU_SB

Fail the mount if we can't allocate memory for the per-CPU counters.
This is consistent with how we handle everything else in the mount
path and makes the superblock counter modification a lot simpler.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2010-10-19 04:07:58 +0800
74f75a0cb xfs: convert buffer cache hash to rbtree

The buffer cache hash is showing typical hash scalability problems.
In large scale testing the number of cached items grows far larger
than the hash can efficiently handle. Hence we need to move to a
self-scaling cache indexing mechanism.

I have selected rbtrees for indexing because they can have O(log n)
search scalability, and insert and remove cost is not excessive,
even on large trees. Hence we should be able to cache large numbers
of buffers without incurring the excessive cache miss search
penalties that the hash is imposing on us.

To ensure we still have parallel access to the cache, we need
multiple trees. Rather than hashing the buffers by disk address to
select a tree, it seems more sensible to separate trees by typical
access patterns. Most operations use buffers from within a single AG
at a time, so rather than searching lots of different lists,
separate the buffer indexes out into per-AG rbtrees. This means that
searches during metadata operation have a much higher chance of
hitting cache resident nodes, and that updates of the tree are less
likely to disturb trees being accessed on other CPUs doing
independent operations.
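
A rough userspace sketch of the indexing scheme: one search tree per AG, selected by which AG the block falls in, rather than one global hash. Everything here is an assumption for illustration: the names, the fixed AG geometry, and in particular the plain unbalanced binary search tree, which stands in for the kernel rbtree and does not give its O(log n) guarantee. Per-tree locking is also omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_AG_COUNT  4        /* assumed number of allocation groups */
#define SKETCH_AG_BLOCKS 1000ULL  /* assumed blocks per allocation group */

struct sketch_buf {
    uint64_t           blkno;
    struct sketch_buf *left, *right;
};

/* One tree root per AG instead of a single global index. */
static struct sketch_buf *sketch_ag_tree[SKETCH_AG_COUNT];

/* Pick the tree for the AG that contains blkno, or NULL if out of range. */
static struct sketch_buf **sketch_tree_for(uint64_t blkno)
{
    uint64_t agno = blkno / SKETCH_AG_BLOCKS;

    return agno < SKETCH_AG_COUNT ? &sketch_ag_tree[agno] : NULL;
}

/* Find the cached buffer for blkno, or NULL if it is not cached. */
static struct sketch_buf *sketch_buf_find(uint64_t blkno)
{
    struct sketch_buf **root = sketch_tree_for(blkno);
    struct sketch_buf *bp = root ? *root : NULL;

    while (bp && bp->blkno != blkno)
        bp = blkno < bp->blkno ? bp->left : bp->right;
    return bp;
}

/* Insert a buffer for blkno if absent; returns the cached buffer. */
static struct sketch_buf *sketch_buf_insert(uint64_t blkno)
{
    struct sketch_buf **link = sketch_tree_for(blkno);

    if (!link)
        return NULL;
    while (*link) {
        if ((*link)->blkno == blkno)
            return *link;
        link = blkno < (*link)->blkno ? &(*link)->left : &(*link)->right;
    }
    *link = calloc(1, sizeof(**link));
    if (*link)
        (*link)->blkno = blkno;
    return *link;
}
```

Because lookups for blocks in different AGs touch different roots, an insert in one AG does not modify the tree another CPU is searching in a different AG.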

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2010-10-19 04:07:56 +0800
69b491c21 xfs: serialise inode reclaim within an AG

Memory reclaim via shrinkers has a terrible habit of having N+M
concurrent shrinker executions (N = num CPUs, M = num kswapds) all
trying to shrink the same cache. When the cache they are all working
on is protected by a single spinlock, massive contention and
slowdowns occur.

Wrap the per-ag inode caches with a reclaim mutex to serialise
reclaim access to the AG. This will block concurrent reclaim in each
AG but still allow reclaim to scan multiple AGs concurrently. Allow
shrinkers to move on to the next AG if they can't get the lock, and if
we can't get any AG, then start blocking on locks.

To prevent reclaimers from continually scanning the same inodes in
each AG, add a cursor that tracks where the last reclaim got up to
and start from that point on the next reclaim. This should avoid
only ever scanning a small number of inodes at the start of each AG
and not making progress. If we have a non-shrinker based reclaim
pass, ignore the cursor and reset it to zero once we are done.
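
The cursor and trylock behaviour can be sketched in userspace like this. The names, the fixed inode count per AG, and the wrap-around at the end of the AG are assumptions; an atomic_flag stands in for the reclaim mutex, and the fallback to blocking when no AG lock can be taken is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_INODES_PER_AG 8  /* assumed, for illustration only */

struct sketch_ag {
    atomic_flag reclaim_lock;  /* stand-in for the per-AG reclaim mutex */
    int         cursor;        /* where the last shrinker pass stopped */
};

/* Scan up to 'batch' inodes of one AG.
 * Returns the index of the first inode scanned, or -1 if the AG was busy. */
static int sketch_reclaim_ag(struct sketch_ag *ag, int batch,
                             bool from_shrinker)
{
    if (atomic_flag_test_and_set(&ag->reclaim_lock))
        return -1;  /* busy: the caller moves on to the next AG */

    /* Shrinker passes resume at the cursor; other passes start at zero. */
    int first = from_shrinker ? ag->cursor : 0;
    int next = first + batch;

    /* Non-shrinker passes reset the cursor; so does reaching the end. */
    if (!from_shrinker || next >= SKETCH_INODES_PER_AG)
        ag->cursor = 0;
    else
        ag->cursor = next;

    atomic_flag_clear(&ag->reclaim_lock);
    return first;
}
```

Successive shrinker calls with a batch of 3 start at inodes 0, 3 and 6, so repeated passes do not rescan only the first few inodes of the AG.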

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder

Dave Chinner
2010-10-19 04:07:55 +0800
65d0f2053 xfs: split inode AG walking into separate code for reclaim

The reclaim walk requires different locking and has a slightly
different walk algorithm, so separate it out so that it can be
optimised separately.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2010-10-19 04:07:52 +0800
1922c949c xfs: use unhashed buffers for size checks

When we are checking that we can access the last block of each device, we
do not need to use cached buffers as they will be tossed away
immediately. Use uncached buffers for size checks so that all IO
prior to full in-memory structure initialisation does not use the
buffer cache.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2010-10-19 04:07:50 +0800
26af65523 xfs: kill XBF_FS_MANAGED buffers

Filesystem level managed buffers are buffers that have their
lifecycle controlled by the filesystem layer, not the buffer cache.
We currently cache these buffers, which makes cleanup and cache
walking somewhat troublesome. Convert the fs managed buffers to
uncached buffers obtained via xfs_buf_get_uncached(), and remove
the XBF_FS_MANAGED special cases from the buffer cache.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2010-10-19 04:07:49 +0800
e176579e7 xfs: lockless per-ag lookups

When we start taking a reference to the per-ag for every cached
buffer in the system, kernel lockstat profiling on an 8-way create
workload shows the mp->m_perag_lock has higher acquisition rates
than the inode lock and has significantly more contention. That is,
it becomes the highest contended lock in the system.

The perag lookup is trivial to convert to lock-less RCU lookups
because perag structures never go away. Hence the only thing we need
to protect against is tree structure changes during a grow. This can
be done simply by replacing the locking in xfs_perag_get() with RCU
read locking. This removes the mp->m_perag_lock completely from this
path.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2010-10-19 04:07:44 +0800
bd32d25a7 xfs: remove debug assert for per-ag reference counting

When we start taking references per cached buffer to the perag
it is cached on, it will blow the current debug maximum reference
count assert out of the water. The assert has never caught a bug,
and we have tracing to track changes if there ever is a problem,
so just remove it.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2010-10-19 04:07:43 +0800

27 Jul, 2010

2 commits

3400777ff xfs: remove unneeded #include statements

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:33 +0800
288699fec xfs: drop dmapi hooks

Dmapi support was never merged upstream, but we still have a lot of hooks
bloating XFS for it, all over the fast paths of the filesystem.

This patch drops over 700 lines of dmapi overhead. If we ever get HSM
support in mainline at least the namespace events can be done much saner
in the VFS instead of the individual filesystem, so it's not like this
is much help for future work.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:33 +0800

24 Jun, 2010

1 commit

7b6259e7a xfs: remove block number from inode lookup code

The block number comes from bulkstat based inode lookups to shortcut
the mapping calculations. We are not able to trust anything from
bulkstat, so drop the block number as well so that the correct
lookups and mappings are always done.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-06-24 09:35:17 +0800

29 May, 2010

3 commits

fb3b504ad xfs: fix access to upper inodes without inode64

If a filesystem is mounted without the inode64 mount option we
should still be able to access inodes not fitting into 32 bits, just
not create new ones. For this to work we need to make sure the
inode cache radix tree is initialized for all allocation groups, not
just those we plan to allocate inodes from. This patch makes sure
we initialize the inode cache radix tree for all allocation groups,
and also cleans xfs_initialize_perag up a bit to separate the
inode32 logic from the general perag structure setup.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2010-05-29 04:19:56 +0800
9b98b6f3e xfs: fix might_sleep() warning when initialising per-ag tree

The use of radix_tree_preload() only works if the radix tree was
initialised without the __GFP_WAIT flag. The per-ag tree uses
GFP_NOFS, so does not trigger allocation of new tree nodes from the
preloaded array. Hence it enters the allocator with a spinlock held
and triggers the might_sleep() warnings.

Reported-by: Chris Mason
Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2010-05-29 04:19:50 +0800
657a4cffd xfs: replace E2BIG with EFBIG where appropriate

Many places in the xfs code return E2BIG when they really mean
EFBIG; trying to grow past 16T on a 32 bit machine, for example,
says "Argument list too long" rather than "File too large" which is
not particularly helpful.

Some of these don't make perfect sense as EFBIG either, but still
better than E2BIG IMHO.
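
The difference is easy to see from userspace. In this sketch the size check and its limit parameter are hypothetical; E2BIG, EFBIG and strerror() are standard C library facilities, and the exact message wording depends on the C library and locale.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical size check: reject sizes above 'limit' with EFBIG
 * rather than E2BIG, so the user sees a message about file size. */
static int sketch_check_size(uint64_t bytes, uint64_t limit)
{
    if (bytes > limit)
        return -EFBIG;
    return 0;
}

/* Print both messages so the difference in wording is visible. */
static void sketch_print_messages(void)
{
    printf("E2BIG: %s\n", strerror(E2BIG));
    printf("EFBIG: %s\n", strerror(EFBIG));
}
```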

Signed-off-by: Eric Sandeen
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Eric Sandeen
2010-05-29 03:58:16 +0800

19 May, 2010

1 commit

1414a6046 xfs: remove dead XFS_LOUD_RECOVERY code

This can't be enabled through the build system and has been dead for
ages. Note that the CRC patches add back log checksumming, but the
code is quite different from the version removed here anyway.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-05-19 22:58:15 +0800

06 Mar, 2010

1 commit

8babd8a2e xfs: Increase the default size of the reserved blocks pool

The current default size of the reserved blocks pool is easy to deplete
with certain workloads, in particular workloads that do lots of concurrent
delayed allocation extent conversions. If enough transactions are running
in parallel and the entire pool is consumed then subsequent calls to
xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
warning so we know if this starts happening again.
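
The failure mode can be sketched in userspace as follows. The struct, function name, and pool sizes are hypothetical, and the rate limit here is a simple failure counter chosen for illustration, which is an assumption rather than how the kernel warning is limited.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_WARN_INTERVAL 100  /* assumed: one warning per 100 failures */

struct sketch_pool {
    uint64_t resv_avail;  /* blocks left in the reserved pool */
    uint64_t failures;    /* reservations refused so far */
    uint64_t warnings;    /* warnings actually printed */
};

/* Take 'blocks' from the pool, or fail with -ENOSPC once it is depleted. */
static int sketch_pool_reserve(struct sketch_pool *p, uint64_t blocks)
{
    if (blocks <= p->resv_avail) {
        p->resv_avail -= blocks;
        return 0;
    }
    /* Pool depleted: warn on the first failure and every Nth after it. */
    if (p->failures++ % SKETCH_WARN_INTERVAL == 0) {
        p->warnings++;
        fprintf(stderr, "reserve pool depleted, %llu blocks wanted\n",
                (unsigned long long)blocks);
    }
    return -ENOSPC;
}
```

A larger default pool simply makes the first branch succeed for more concurrent reservations before the ENOSPC path is reached.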

This is an updated version of an old patch from Lachlan McIlroy.

Signed-off-by: Dave Chinner
Signed-off-by: Alex Elder

Dave Chinner
2010-03-06 01:01:59 +0800