Doug / smarc-fsl-linux-kernel | Embedian Git Server

12 Oct, 2011

2 commits

b522950f0 xfs: Check the return value of xfs_buf_get()

Check the return value of xfs_buf_get() and fail appropriately.

Signed-off-by: Chandra Seetharaman
Signed-off-by: Alex Elder

Chandra Seetharaman
2011-10-12 10:15:01 +0800
c2b006c1d xfs: let xfs_bwrite callers handle the xfs_buf_relse

Remove the xfs_buf_relse from xfs_bwrite and let the caller handle it to
mirror the delwri and read paths.

Also remove the mount pointer passed to xfs_bwrite, which is superfluous now
that we have a mount pointer in the buftarg.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

Christoph Hellwig
2011-10-12 10:15:00 +0800

07 Mar, 2011

1 commit

534877869 xfs: convert xfs_fs_cmn_err to new error logging API

Continue to clean up the error logging code by converting all the
callers of xfs_fs_cmn_err() to the new API. Once done, remove the
unused old API function.

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder
Reviewed-by: Christoph Hellwig

Dave Chinner
2011-03-07 07:05:35 +0800

22 Feb, 2011

1 commit

c4d0c3b09 xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1

The FSGEOMETRY_V1 ioctl (and its compat equivalent) calls out to
xfs_fs_geometry() with a version number of 3. This code path does not
fill in the logsunit member of the passed xfs_fsop_geom_t, leading to
the leaking of four bytes of uninitialized stack data to potentially
unprivileged callers.

v2 switches to memset() to avoid future issues if structure members
change, on suggestion of Dave Chinner.
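
The leak can be modelled outside the kernel. The Python sketch below is
only an illustration: the four-field layout, the fill_geometry() helper,
the version threshold and the stale byte pattern are all assumptions made
for the example, not the real xfs_fsop_geom_t or xfs_fs_geometry().

```python
import struct

# Hypothetical, shortened geometry layout: blocksize, agcount, logsunit, version.
GEOM = struct.Struct("<4I")


def fill_geometry(buf, version, zero_first):
    """Fill a reused buffer the way a version-gated helper might."""
    if zero_first:
        buf[:] = bytes(len(buf))          # the memset()-style fix: clear everything first
    struct.pack_into("<I", buf, 0, 4096)  # blocksize
    struct.pack_into("<I", buf, 4, 52)    # agcount
    if version >= 4:                      # assumed: logsunit is only set for newer versions
        struct.pack_into("<I", buf, 8, 0)
    struct.pack_into("<I", buf, 12, version)
    return GEOM.unpack(bytes(buf))


# Stale bytes stand in for whatever was previously on the stack.
stale = bytearray(b"\xde\xad\xbe\xef" * 4)
leaky = fill_geometry(bytearray(stale), version=3, zero_first=False)
fixed = fill_geometry(bytearray(stale), version=3, zero_first=True)
```

In this model leaky[2] still carries the four stale bytes, while fixed[2]
is zero no matter which members the helper forgets to set.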

Signed-off-by: Dan Rosenberg
Reviewed-by: Eugene Teo
Signed-off-by: Alex Elder

Dan Rosenberg
2011-02-22 09:55:47 +0800

12 Jan, 2011

1 commit

c58efdb44 xfs: ensure log covering transactions are synchronous

To ensure the log is covered and the filesystem idles correctly, we
need to ensure that dummy transactions hit the disk and do not stay
pinned in memory. If the superblock is pinned in memory, it can't
be flushed so the log covering cannot make progress. The result is
dependent on timing - more often than not we continue to issue a
log covering transaction every 36s rather than idling after ~90s.

Fix this by making the log covering transaction synchronous. To
avoid additional log force from xfssyncd, make the log covering
transaction take the place of the existing log force in the xfssyncd
background sync process.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2011-01-12 10:28:17 +0800

04 Jan, 2011

1 commit

055388a31 xfs: dynamic speculative EOF preallocation

Currently the size of the speculative preallocation during delayed
allocation is fixed by either the allocsize mount option or a
default size. We are seeing a lot of cases where we need to
recommend using the allocsize mount option to prevent fragmentation
when buffered writes land in the same AG.

Rather than using a fixed preallocation size by default (up to 64k),
make it dynamic by basing it on the current inode size. That way the
EOF preallocation will increase as the file size increases. Hence
for streaming writes we are much more likely to get large
preallocations exactly when we need them to reduce fragmentation.

For default settings, the size of the initial extents is determined
by the number of parallel writers and the amount of memory in the
machine. For 4GB RAM and 4 concurrent 32GB file writes:

EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..1048575]: 1048672..2097247 0 (1048672..2097247) 1048576
1: [1048576..2097151]: 5242976..6291551 0 (5242976..6291551) 1048576
2: [2097152..4194303]: 12583008..14680159 0 (12583008..14680159) 2097152
3: [4194304..8388607]: 25165920..29360223 0 (25165920..29360223) 4194304
4: [8388608..16777215]: 58720352..67108959 0 (58720352..67108959) 8388608
5: [16777216..33554423]: 117440584..134217791 0 (117440584..134217791) 16777208
6: [33554424..50331511]: 184549056..201326143 0 (184549056..201326143) 16777088
7: [50331512..67108599]: 251657408..268434495 0 (251657408..268434495) 16777088

and for 16 concurrent 16GB file writes:

EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..262143]: 2490472..2752615 0 (2490472..2752615) 262144
1: [262144..524287]: 6291560..6553703 0 (6291560..6553703) 262144
2: [524288..1048575]: 13631592..14155879 0 (13631592..14155879) 524288
3: [1048576..2097151]: 30408808..31457383 0 (30408808..31457383) 1048576
4: [2097152..4194303]: 52428904..54526055 0 (52428904..54526055) 2097152
5: [4194304..8388607]: 104857704..109052007 0 (104857704..109052007) 4194304
6: [8388608..16777215]: 209715304..218103911 0 (209715304..218103911) 8388608
7: [16777216..33554423]: 452984848..469762055 0 (452984848..469762055) 16777208

Because it is hard to take back speculative preallocation, cases
where there are large slow growing log files on a nearly full
filesystem may cause premature ENOSPC. Hence as the filesystem nears
full, the maximum dynamic prealloc size is reduced according to this
table (based on 4k block size):

freespace       max prealloc size
  >5%           full extent (8GB)
  4-5%          2GB (8GB >> 2)
  3-4%          1GB (8GB >> 3)
  2-3%          512MB (8GB >> 4)
  1-2%          256MB (8GB >> 5)
  <1%           128MB (8GB >> 6)

This should reduce the amount of space held in speculative
preallocation for such cases.
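
As a rough illustration of the throttling table, the following Python
sketch maps a free space percentage to the maximum preallocation size.
The function name and the handling of exact boundary values (5% falling
into the 4-5% row, and so on) are assumptions; the last bucket follows the
shift pattern of the other rows (8GB >> 6 = 128MB).

```python
FULL_EXTENT = 8 << 30  # 8GB full extent, based on 4k block size


def max_prealloc_bytes(free_pct):
    """Return the maximum dynamic preallocation for a free space percentage."""
    if free_pct > 5:
        return FULL_EXTENT
    if free_pct >= 4:
        shift = 2
    elif free_pct >= 3:
        shift = 3
    elif free_pct >= 2:
        shift = 4
    elif free_pct >= 1:
        shift = 5
    else:
        shift = 6
    return FULL_EXTENT >> shift
```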

The allocsize mount option turns off the dynamic behaviour and fixes
the prealloc size to whatever the mount option specifies. i.e. the
behaviour is unchanged.

Signed-off-by: Dave Chinner

Dave Chinner
2011-01-04 08:35:03 +0800

19 Oct, 2010

2 commits

96540c785 xfs: do not use xfs_mod_incore_sb for per-cpu counters

Export xfs_icsb_modify_counters and always use it for modifying
the per-cpu counters. Remove support for per-cpu counters from
xfs_mod_incore_sb to simplify it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2010-10-19 04:07:59 +0800
1922c949c xfs: use unhashed buffers for size checks

When we are checking we can access the last block of each device, we
do not need to use cached buffers as they will be tossed away
immediately. Use uncached buffers for size checks so that all IO
prior to full in-memory structure initialisation does not use the
buffer cache.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2010-10-19 04:07:50 +0800

24 Aug, 2010

1 commit

1a387d3be xfs: dummy transactions should not dirty VFS state

When we need to cover the log, we issue dummy transactions to ensure
the current log tail is on disk. Unfortunately we currently use the
root inode in the dummy transaction, and the act of committing the
transaction dirties the inode at the VFS level.

As a result, the VFS writeback of the dirty inode will prevent the
filesystem from idling long enough for the log covering state
machine to complete. The state machine gets stuck in a loop issuing
new dummy transactions to cover the log and never makes progress.

To avoid this problem, the dummy transactions should not cause
externally visible state changes. To ensure this occurs, make sure
that dummy transactions log an unchanging field in the superblock as
its state is never propagated outside the filesystem. This allows
the log covering state machine to complete successfully and the
filesystem now correctly enters a fully idle state about 90s after
the last modification was made.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-08-24 09:46:31 +0800

27 Jul, 2010

3 commits

898621d5a xfs: simplify inode to transaction joining

Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
joining it to a transaction via xfs_trans_ijoin.

This patch instead makes xfs_trans_ijoin usable on its own by doing
an implicit xfs_trans_ihold, which also allows us to drop the third
argument. For the case where we want to hold a reference on the inode
a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks
the inode for needing an xfs_iput. In addition to the cleaner interface
to the caller this also simplifies the implementation.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:36 +0800
3400777ff xfs: remove unneeded #include statements

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:33 +0800
288699fec xfs: drop dmapi hooks

Dmapi support was never merged upstream, but we still have a lot of hooks
bloating XFS for it, all over the fast paths of the filesystem.

This patch drops over 700 lines of dmapi overhead. If we ever get HSM
support in mainline at least the namespace events can be done much saner
in the VFS instead of the individual filesystem, so it's not like this
is much help for future work.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:33 +0800

16 Jan, 2010

1 commit

1c1c6ebcf xfs: Replace per-ag array with a radix tree

The use of an array for the per-ag structures requires reallocation
of the array when growing the filesystem. This requires locking
access to the array to avoid use after free situations, and the
locking is difficult to get right. To avoid needing to reallocate an
array, change the per-ag structures to an allocated object per ag
and index them using a tree structure.

The AGs are always densely indexed (hence the use of an array), but
the number supported is 2^32 and lookups tend to be random and hence
indexing needs to scale. A simple choice is a radix tree - it works
well with this sort of index. This change also removes another
large contiguous allocation from the mount/growfs path in XFS.

The growing process now needs to change to only initialise the new
AGs required for the extra space, and as such only needs to
exclusively lock the tree for inserts. The rest of the code only
needs to lock the tree while doing lookups, and hence this will
remove all the deadlocks that currently occur on the m_perag_lock as
it is now an innermost lock. The lock is also changed to a spinlock
from a read/write lock as the hold time is now extremely short.

To complete the picture, the per-ag structures will need to be
reference counted to ensure that we don't free/modify them while
they are still in use. This will be done in a subsequent patch.
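
To illustrate why a tree avoids the reallocation problem, here is a small
Python model of a fixed-depth radix tree keyed by a 32 bit AG number. It
is a sketch only: the class, the dict-based nodes and the 8 bits per level
are assumptions for the example and not the kernel radix tree API.

```python
class RadixTree:
    """Fixed-depth radix tree for 32 bit keys, BITS bits consumed per level."""

    BITS = 8
    LEVELS = 4  # 4 levels * 8 bits = 32 bit key space

    def __init__(self):
        self.root = {}

    def _slots(self, key):
        # Split the key into per-level slot numbers, most significant first.
        mask = (1 << self.BITS) - 1
        return [(key >> (self.BITS * level)) & mask
                for level in reversed(range(self.LEVELS))]

    def insert(self, key, value):
        node = self.root
        *inner, leaf = self._slots(key)
        for slot in inner:
            node = node.setdefault(slot, {})  # allocate interior nodes on demand
        node[leaf] = value

    def lookup(self, key):
        node = self.root
        *inner, leaf = self._slots(key)
        for slot in inner:
            node = node.get(slot)
            if node is None:
                return None
        return node.get(leaf)


perag = RadixTree()
for agno in range(4):
    perag.insert(agno, {"agno": agno})

# Growing only inserts the new AG; existing per-ag objects are never moved.
first_ag = perag.lookup(0)
perag.insert(4, {"agno": 4})
```

Because existing objects stay where they are, a reference obtained before
the grow (first_ag here) remains valid afterwards, which is the property
the array reallocation could not provide.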

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2010-01-16 05:33:52 +0800

15 Dec, 2009

1 commit

0b1b213fc xfs: event tracing support

Convert the old xfs tracing support that could only be used with the
out of tree kdb and xfsidbg patches to use the generic event tracer.

To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
all xfs trace channels by:

echo 1 > /sys/kernel/debug/tracing/events/xfs/enable

or alternatively enable single events by just doing the same in one
event subdirectory, e.g.

echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable

or set more complex filters, etc. In Documentation/trace/events.txt
all this is described in more detail. To read the events do a

cat /sys/kernel/debug/tracing/trace

Compared to the last posting this patch converts the tracing mostly to
the one tracepoint per callsite model that other users of the new
tracing facility also employ. This allows a very fine-grained control
of the tracing, a cleaner output of the traces and also enables the
perf tool to use each tracepoint as a virtual performance counter,
allowing us to e.g. count how often certain workloads hit various
spots in XFS. Take a look at

http://lwn.net/Articles/346470/

for some examples.

Also the btree tracing isn't included at all yet, as it will require
additional core tracing features not in mainline yet, I plan to
deliver it later.

And the really nice thing about this patch is that it actually removes
many lines of code while adding this nice functionality:

fs/xfs/Makefile | 8
fs/xfs/linux-2.6/xfs_acl.c | 1
fs/xfs/linux-2.6/xfs_aops.c | 52 -
fs/xfs/linux-2.6/xfs_aops.h | 2
fs/xfs/linux-2.6/xfs_buf.c | 117 +--
fs/xfs/linux-2.6/xfs_buf.h | 33
fs/xfs/linux-2.6/xfs_fs_subr.c | 3
fs/xfs/linux-2.6/xfs_ioctl.c | 1
fs/xfs/linux-2.6/xfs_ioctl32.c | 1
fs/xfs/linux-2.6/xfs_iops.c | 1
fs/xfs/linux-2.6/xfs_linux.h | 1
fs/xfs/linux-2.6/xfs_lrw.c | 87 --
fs/xfs/linux-2.6/xfs_lrw.h | 45 -
fs/xfs/linux-2.6/xfs_super.c | 104 ---
fs/xfs/linux-2.6/xfs_super.h | 7
fs/xfs/linux-2.6/xfs_sync.c | 1
fs/xfs/linux-2.6/xfs_trace.c | 75 ++
fs/xfs/linux-2.6/xfs_trace.h | 1369 +++++++++++++++++++++++++++++++++++++++++
fs/xfs/linux-2.6/xfs_vnode.h | 4
fs/xfs/quota/xfs_dquot.c | 110 ---
fs/xfs/quota/xfs_dquot.h | 21
fs/xfs/quota/xfs_qm.c | 40 -
fs/xfs/quota/xfs_qm_syscalls.c | 4
fs/xfs/support/ktrace.c | 323 ---------
fs/xfs/support/ktrace.h | 85 --
fs/xfs/xfs.h | 16
fs/xfs/xfs_ag.h | 14
fs/xfs/xfs_alloc.c | 230 +-----
fs/xfs/xfs_alloc.h | 27
fs/xfs/xfs_alloc_btree.c | 1
fs/xfs/xfs_attr.c | 107 ---
fs/xfs/xfs_attr.h | 10
fs/xfs/xfs_attr_leaf.c | 14
fs/xfs/xfs_attr_sf.h | 40 -
fs/xfs/xfs_bmap.c | 507 +++------------
fs/xfs/xfs_bmap.h | 49 -
fs/xfs/xfs_bmap_btree.c | 6
fs/xfs/xfs_btree.c | 5
fs/xfs/xfs_btree_trace.h | 17
fs/xfs/xfs_buf_item.c | 87 --
fs/xfs/xfs_buf_item.h | 20
fs/xfs/xfs_da_btree.c | 3
fs/xfs/xfs_da_btree.h | 7
fs/xfs/xfs_dfrag.c | 2
fs/xfs/xfs_dir2.c | 8
fs/xfs/xfs_dir2_block.c | 20
fs/xfs/xfs_dir2_leaf.c | 21
fs/xfs/xfs_dir2_node.c | 27
fs/xfs/xfs_dir2_sf.c | 26
fs/xfs/xfs_dir2_trace.c | 216 ------
fs/xfs/xfs_dir2_trace.h | 72 --
fs/xfs/xfs_filestream.c | 8
fs/xfs/xfs_fsops.c | 2
fs/xfs/xfs_iget.c | 111 ---
fs/xfs/xfs_inode.c | 67 --
fs/xfs/xfs_inode.h | 76 --
fs/xfs/xfs_inode_item.c | 5
fs/xfs/xfs_iomap.c | 85 --
fs/xfs/xfs_iomap.h | 8
fs/xfs/xfs_log.c | 181 +----
fs/xfs/xfs_log_priv.h | 20
fs/xfs/xfs_log_recover.c | 1
fs/xfs/xfs_mount.c | 2
fs/xfs/xfs_quota.h | 8
fs/xfs/xfs_rename.c | 1
fs/xfs/xfs_rtalloc.c | 1
fs/xfs/xfs_rw.c | 3
fs/xfs/xfs_trans.h | 47 +
fs/xfs/xfs_trans_buf.c | 62 -
fs/xfs/xfs_vnodeops.c | 8
70 files changed, 2151 insertions(+), 2592 deletions(-)

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2009-12-15 13:08:16 +0800

12 Dec, 2009

2 commits

6ad112bfb xfs: simplify xfs_buf_get / xfs_buf_read interfaces

Currently the low-level buffer cache interfaces are highly confusing
as we have a _flags variant of each that does actually respect the
flags, and one without _flags which has a flags argument that gets
ignored and overridden with a default set. Given that very few places
use the default arguments get rid of the duplication and convert all
callers to pass the flags explicitly. Also remove the now confusing
_flags postfix.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

Christoph Hellwig
2009-12-12 05:11:21 +0800
80641dc66 xfs: I/O completion handlers must use NOFS allocations

When completing I/O requests we must not allow the memory allocator to
recurse into the filesystem, as we might deadlock on waiting for the
I/O completion otherwise. The only thing currently allocating normal
GFP_KERNEL memory is the allocation of the transaction structure for
the unwritten extent conversion. Add a memflags argument to
_xfs_trans_alloc to allow controlling the allocator behaviour.

Signed-off-by: Christoph Hellwig
Reported-by: Thomas Neumann
Tested-by: Thomas Neumann
Reviewed-by: Alex Elder
Signed-off-by: Alex Elder

Christoph Hellwig
2009-12-12 05:11:20 +0800

12 Aug, 2009

1 commit

0cc6eee13 xfs: avoid memory allocation under m_peraglock in growfs code

Allocate the memory for the larger m_perag array before taking the
per-AG lock as the per-AG lock can be taken under the i_lock which
can be taken from reclaim context.

Reported by the new reclaim context tracing in lockdep.

Signed-off-by: Christoph Hellwig
Reviewed-by: Felix Blyakher
Signed-off-by: Felix Blyakher

Christoph Hellwig
2009-08-12 14:06:51 +0800

02 Jun, 2009

1 commit

e6da7c9fe xfs: fix overflow in xfs_growfs_data_private

In the case where growing a filesystem would leave the last AG
too small, the fixup code has an overflow in the calculation
of the new size with one fewer ag, because "nagcount" is a 32
bit number. If the new filesystem has > 2^32 blocks in it
this causes a problem resulting in an EINVAL return from growfs:

# xfs_io -f -c "truncate 19998630180864" fsfile
# mkfs.xfs -f -bsize=4096 -dagsize=76288719b,size=3905982455b fsfile
# mount -o loop fsfile /mnt
# xfs_growfs /mnt

meta-data=/dev/loop0   isize=256    agcount=52, agsize=76288719 blks
         =             sectsz=512   attr=2
data     =             bsize=4096   blocks=3905982455, imaxpct=5
         =             sunit=0      swidth=0 blks
naming   =version 2    bsize=4096   ascii-ci=0
log      =internal     bsize=4096   blocks=32768, version=2
         =             sectsz=512   sunit=0 blks, lazy-count=0
realtime =none         extsz=4096   blocks=0, rtextents=0
xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Invalid argument
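
The arithmetic behind the failure can be reproduced with the numbers from
the commands above. This Python sketch is a model, not the kernel code:
mul_u32() is a hypothetical helper that imitates a 32 bit unsigned
multiply, and it is assumed that the short tail AG is dropped and the new
size recomputed from the remaining whole AGs.

```python
AGSIZE = 76288719                        # agsize from the mkfs.xfs command
DEVICE_BLOCKS = 19998630180864 // 4096   # truncate size in 4k blocks
CURRENT_BLOCKS = 3905982455              # filesystem size before growing


def mul_u32(a, b):
    """Multiply as a 32 bit unsigned expression would, discarding high bits."""
    return (a * b) & 0xFFFFFFFF


# Whole AGs that fit on the device, plus a tail too small to be its own AG.
full_ags, leftover = divmod(DEVICE_BLOCKS, AGSIZE)

truncated = mul_u32(full_ags, AGSIZE)  # new size computed in 32 bits: wraps
widened = full_ags * AGSIZE            # new size computed in 64 bits
```

Here full_ags is 64 with 55 blocks left over. The wrapped product comes
out smaller than the current filesystem size, so the grow request looks
like a shrink and is rejected, whereas the widened product gives the
intended size.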

Reported-by: richard.ems@cape-horn-eng.com
Signed-off-by: Eric Sandeen
Reviewed-by: Christoph Hellwig
Reviewed-by: Felix Blyakher
Signed-off-by: Felix Blyakher

Eric Sandeen
2009-06-02 11:59:38 +0800

29 Mar, 2009

1 commit

9da096fd1 xfs: fix various typos

Signed-off-by: Malcolm Parsons
Reviewed-by: Christoph Hellwig

Malcolm Parsons
2009-03-29 15:55:42 +0800

10 Jan, 2009

1 commit

c4be0c1dc filesystem freeze: add error handling of write_super_lockfs/unlockfs

Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests. So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.

In many cases, a commercial filesystem (e.g. VxFS) has the freeze feature
and it would be used to get the consistent backup.

If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.

So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
or the snapshot.

This patch:

VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Renamed write_super_lockfs and unlockfs of the super block operations to
freeze_fs and unfreeze_fs to avoid confusion.

ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.

reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep the current behavior.

Signed-off-by: Takashi Sato
Signed-off-by: Masayuki Hamaguchi
Cc:
Cc:
Cc: Christoph Hellwig
Cc: Dave Kleikamp
Cc: Dave Chinner
Cc: Alasdair G Kergon
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Takashi Sato
2009-01-10 08:54:42 +0800

02 Dec, 2008

1 commit

743bb4650 [XFS] Move copy_from_user calls out of ioctl helpers into ioctl switch.

Moving the copy_from_user out of some of the ioctl helpers will
make it easier for the compat ioctl switch to copy in the right
struct, then just pass to the underlying helper.

Also, move common access checks into the helpers themselves,
and out of the native ioctl switch code, to reduce code
duplication between native & compat ioctl callers.

Signed-off-by: Eric Sandeen
Reviewed-by: Christoph Hellwig
Signed-off-by: Lachlan McIlroy

sandeen@sandeen.net
2008-12-02 14:08:01 +0800

30 Oct, 2008

2 commits

7cc95a821 [XFS] Always use struct xfs_btree_block instead of short / longform structures.

Always use the generic xfs_btree_block type instead of the short / long
structures. Add XFS_BTREE_SBLOCK_LEN / XFS_BTREE_LBLOCK_LEN defines for
the length of a short / long form block. The rationale for this is that we
will grow more btree block header variants to support CRCs and other RAS
information, and always accessing them through the same datatype with
unions for the short / long form pointers makes implementing this much
easier.

SGI-PV: 988146

SGI-Modid: xfs-linux-melb:xfs-kern:32300a

Signed-off-by: Christoph Hellwig
Signed-off-by: Donald Douwsma
Signed-off-by: David Chinner
Signed-off-by: Lachlan McIlroy

Christoph Hellwig
2008-10-30 14:14:34 +0800
136341b41 [XFS] cleanup btree record / key / ptr addressing macros.

Replace the generic record / key / ptr addressing macros that use cpp
token pasting with simpler macros that do the job for just one given btree
type. The new macros lose the cur argument and thus can be used outside
the core btree code, but also gain an xfs_mount * argument to allow for
checking the CRC flag in the near future. Note that many of these macros
aren't actually used in the kernel code, but only in userspace (mostly in
xfs_repair).

SGI-PV: 988146

SGI-Modid: xfs-linux-melb:xfs-kern:32295a

Signed-off-by: Christoph Hellwig
Signed-off-by: Donald Douwsma
Signed-off-by: David Chinner
Signed-off-by: Lachlan McIlroy

Christoph Hellwig
2008-10-30 14:11:40 +0800

28 Jul, 2008

1 commit

189f4bf22 [XFS] XFS: ASCII case-insensitive support

Implement ASCII case-insensitive support. Its primary purpose is for
supporting existing filesystems that already use this case-insensitive
mode migrated from IRIX. But, if you only need ASCII-only case-insensitive
support (ie. English only) and will never use another language, then this
mode is perfectly adequate.

ASCII-CI is implemented by generating hashes based on lower-case letters
and doing lower-case compares. It implements a new xfs_nameops vector for
doing the hashes and comparisons for all filename operations.
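
A minimal Python sketch of the idea follows. The rotate-and-xor hash is
an illustrative choice and is not claimed to match the kernel hash
function; ascii_lower(), ci_hash() and ci_equal() are hypothetical names.
The point is only that both the hash and the compare fold A-Z to a-z and
leave every other byte alone.

```python
def ascii_lower(byte):
    """Lower-case A-Z only; every other byte value is returned unchanged."""
    return byte + 32 if 0x41 <= byte <= 0x5A else byte


def rol32(value, shift):
    """Rotate a 32 bit value left by shift bits."""
    return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF


def ci_hash(name):
    """Hash a bytes name on its lower-cased form."""
    h = 0
    for byte in name:
        h = ascii_lower(byte) ^ rol32(h, 7)
    return h


def ci_equal(a, b):
    """Compare two bytes names, ignoring ASCII case."""
    return len(a) == len(b) and all(
        ascii_lower(x) == ascii_lower(y) for x, y in zip(a, b))
```

With this, b"README" and b"readme" hash and compare as the same name,
while non-ASCII bytes such as 0xC4 and 0xE4 stay distinct, which is why
the mode is only adequate for ASCII-only names.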

To create a filesystem with this CI mode, use: # mkfs.xfs -n version=ci

SGI-PV: 981516
SGI-Modid: xfs-linux-melb:xfs-kern:31209a

Signed-off-by: Barry Naujok
Signed-off-by: Christoph Hellwig

Barry Naujok
2008-07-28 14:58:42 +0800

29 Apr, 2008

2 commits

d349404ff [XFS] Don't double count reserved block changes on UP.

On uniprocessor machines, the incore superblock is used for all in memory
accounting of free blocks. In this situation, changes to the reserved
block count are accounted twice; once directly and once via
xfs_mod_incore_sb(). Seeing as the modification on SMP is done via
xfs_mod_incore_sb(), make this the only update mechanism that UP uses as
well.

SGI-PV: 980654
SGI-Modid: xfs-linux-melb:xfs-kern:30997a

Signed-off-by: David Chinner
Signed-off-by: Lachlan McIlroy

David Chinner
2008-04-29 13:58:27 +0800
d4d90b577 [XFS] Add xfs_icsb_sync_counters_locked for when m_sb_lock already held

Add a new xfs_icsb_sync_counters_locked for the case where m_sb_lock
is already taken and add a flags argument to xfs_icsb_sync_counters so
that xfs_icsb_sync_counters_flags is not needed.

SGI-PV: 976035
SGI-Modid: xfs-linux-melb:xfs-kern:30917a

Signed-off-by: Christoph Hellwig
Signed-off-by: David Chinner
Signed-off-by: Lachlan McIlroy

Christoph Hellwig
2008-04-29 13:57:11 +0800

10 Apr, 2008

1 commit

621187099 [XFS] remove shouting-indirection macros from xfs_sb.h

Remove macro-to-small-function indirection from xfs_sb.h, and remove some
which are completely unused.

SGI-PV: 976035
SGI-Modid: xfs-linux-melb:xfs-kern:30528a

Signed-off-by: Eric Sandeen
Signed-off-by: Donald Douwsma
Signed-off-by: Lachlan McIlroy

Eric Sandeen
2008-04-10 14:24:45 +0800

14 Feb, 2008

1 commit

413d57c99 xfs: convert beX_add to beX_add_cpu (new common API)

remove beX_add functions and replace all uses with beX_add_cpu

Signed-off-by: Marcin Slusarz
Cc: Mark Fasheh
Reviewed-by: Dave Chinner
Cc: Timothy Shimmin
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Marcin Slusarz
2008-02-14 08:21:19 +0800

07 Feb, 2008

1 commit

3685c2a1d [XFS] Unwrap XFS_SB_LOCK.

Un-obfuscate XFS_SB_LOCK, remove XFS_SB_LOCK->mutex_lock->spin_lock
macros, call spin_lock directly, remove extraneous cookie holdover from
old xfs code, and change lock type to spinlock_t.

SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29746a

Signed-off-by: Eric Sandeen
Signed-off-by: Donald Douwsma
Signed-off-by: Tim Shimmin

Eric Sandeen
2008-02-07 13:47:15 +0800

16 Oct, 2007

2 commits

cc92e7ac8 [XFS] growlock should be a mutex

m_growlock only needs plain binary mutex semantics, so use a struct mutex
instead of a semaphore for it.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29512a

Signed-off-by: Christoph Hellwig
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin

Christoph Hellwig
2007-10-16 10:18:09 +0800
b267ce995 [XFS] kill struct bhv_vfs

Now that struct bhv_vfs doesn't have any members left we can kill it and
go directly from the super_block to the xfs_mount everywhere.

SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29509a

Signed-off-by: Christoph Hellwig
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin

Christoph Hellwig
2007-10-16 10:17:27 +0800

15 Oct, 2007

1 commit

2bdf7cd0b [XFS] superblock endianess annotations

Creates a new xfs_dsb_t that is __be annotated and keeps xfs_sb_t for the
incore one. xfs_xlatesb is renamed to xfs_sb_to_disk and only handles the
incore -> disk conversion. A new helper xfs_sb_from_disk handles the other
direction and doesn't need the slightly hacky table-driven approach
because we only ever read the full sb from disk.
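
The split between the incore and on-disk forms can be sketched in Python
with the struct module. Only the first three superblock fields are
modelled here, and the dict keys and function names are assumptions for
the example rather than the kernel helpers themselves.

```python
import struct

# First three fields only: magic (u32), block size (u32), data blocks (u64).
# ">" selects big endian, matching the __be annotated on-disk layout.
DISK_SB = struct.Struct(">IIQ")
XFS_SB_MAGIC = 0x58465342  # ASCII "XFSB"


def sb_to_disk(sb):
    """Convert an incore superblock (native integers) to on-disk bytes."""
    return DISK_SB.pack(sb["magicnum"], sb["blocksize"], sb["dblocks"])


def sb_from_disk(raw):
    """Convert on-disk bytes back to an incore superblock in one pass."""
    magicnum, blocksize, dblocks = DISK_SB.unpack(raw[:DISK_SB.size])
    return {"magicnum": magicnum, "blocksize": blocksize, "dblocks": dblocks}


incore = {"magicnum": XFS_SB_MAGIC, "blocksize": 4096, "dblocks": 3905982455}
ondisk = sb_to_disk(incore)
```

Because the byte order is fixed in the format string, the same bytes come
out on little and big endian hosts, and reading always converts the whole
structure at once.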

The handling of shared r/o filesystems has been buggy on little endian
systems and fixing this required shuffling around of some code in that
area.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29477a

Signed-off-by: Christoph Hellwig
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin

Christoph Hellwig
2007-10-15 14:49:09 +0800

14 Jul, 2007

5 commits

2a82b8be8 [XFS] Concurrent Multi-File Data Streams

In media spaces, video is often stored in a frame-per-file format. When
dealing with uncompressed realtime HD video streams in this format, it is
crucial that files do not get fragmented and that multiple files are placed
contiguously on disk.

When multiple streams are being ingested and played out at the same time,
it is critical that the filesystem does not cross the streams and
interleave them together as this creates seek and readahead cache miss
latency and prevents both ingest and playout from meeting frame rate
targets.

This patch set creates a "stream of files" concept into the allocator to
place all the data from a single stream contiguously on disk so that RAID
array readahead can be used effectively. Each additional stream gets
placed in different allocation groups within the filesystem, thereby
ensuring that we don't cross any streams. When an AG fills up, we select a
new AG for the stream that is not in use.

The core of the functionality is the stream tracking - each inode that we
create in a directory needs to be associated with the directory's stream.
Hence every time we create a file, we look up the directory's stream
object and associate the new file with that object.

Once we have a stream object for a file, we use the AG that the stream
object points to for allocations. If we can't allocate in that AG (e.g. it
is full) we move the entire stream to another AG. Other inodes in the same
stream are moved to the new AG on their next allocation (i.e. lazy
update).
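
The stream-to-AG association described above can be sketched as a small
Python model. Everything here (the StreamAllocator class, tracking free
space as a per-AG block count, picking the first unused AG with room) is
an assumption for illustration and not the filestreams implementation.

```python
class StreamAllocator:
    """Toy model: one stream per directory, each stream pinned to one AG."""

    def __init__(self, ag_free):
        self.ag_free = list(ag_free)  # free blocks per AG
        self.stream_ag = {}           # directory -> AG used by its stream

    def _pick_unused_ag(self, needed):
        in_use = set(self.stream_ag.values())
        for agno, free in enumerate(self.ag_free):
            if agno not in in_use and free >= needed:
                return agno
        raise OSError("no unused AG with enough space")

    def allocate(self, directory, blocks):
        """Allocate blocks for a file created in directory; return the AG."""
        agno = self.stream_ag.get(directory)
        if agno is None or self.ag_free[agno] < blocks:
            # New stream, or the stream's AG is full: move the whole stream.
            self.stream_ag.pop(directory, None)
            agno = self._pick_unused_ag(blocks)
            self.stream_ag[directory] = agno
        self.ag_free[agno] -= blocks
        return agno
```

Files look up their directory's stream at allocation time, so once a
stream has moved, later allocations from the same directory follow it to
the new AG without any eager update (the lazy update mentioned above).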

Stream objects are kept in a cache and hold a reference on the inode.
Hence the inode cannot be reclaimed while there is an outstanding stream
reference. This means that on unlink we need to remove the stream
association and we also need to flush all the associations on certain
events that want to reclaim all unreferenced inodes (e.g. filesystem
freeze).

SGI-PV: 964469
SGI-Modid: xfs-linux-melb:xfs-kern:29096a

Signed-off-by: David Chinner
Signed-off-by: Barry Naujok
Signed-off-by: Donald Douwsma
Signed-off-by: Christoph Hellwig
Signed-off-by: Tim Shimmin
Signed-off-by: Vlad Apostolov

David Chinner
2007-07-14 13:40:53 +0800
84e1e99f1 [XFS] Prevent ENOSPC from aborting transactions that need to succeed

During delayed allocation extent conversion or unwritten extent
conversion, we need to reserve some blocks for transaction reservations.
We need to reserve these blocks in case a btree split occurs and we need
to allocate some blocks.

Unfortunately, we've only ever reserved the number of data blocks we are
allocating, so in both the unwritten and delalloc case we can get ENOSPC
from the transaction reservation. This is bad because in both cases we
cannot report the failure to the writing application.

The fix is two-fold:

1 - leverage the reserved block infrastructure XFS already
has to reserve a small pool of blocks by default to allow
specially marked transactions to dip into when we are at
ENOSPC.
Default setting is min(5%, 1024 blocks).

2 - convert critical transaction reservations to be allowed
to dip into this pool. Spots changed are delalloc
conversion, unwritten extent conversion and growing a
filesystem at ENOSPC.
This also allows growing the filesystem to succeed at ENOSPC.
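
The two parts of the fix can be sketched in Python as follows. This is a
model under stated assumptions: the 5% is taken to be 5% of the total
filesystem blocks, and both function names are hypothetical.

```python
DEFAULT_RESERVE_CAP = 1024  # blocks


def default_reserved_blocks(total_blocks):
    """Default reserve pool size: min(5% of the filesystem, 1024 blocks)."""
    return min(total_blocks * 5 // 100, DEFAULT_RESERVE_CAP)


def can_reserve(free_blocks, reserved_blocks, want, may_use_reserve):
    """Check a transaction reservation; only marked transactions may dip
    into the reserve pool once normal free space has run out."""
    available = free_blocks
    if may_use_reserve:
        available += reserved_blocks
    return want <= available
```

For example, with no free space left and the default pool in place, an
ordinary reservation of 8 blocks fails while a specially marked one
(delalloc conversion, unwritten extent conversion, growfs) succeeds.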

SGI-PV: 964468
SGI-Modid: xfs-linux-melb:xfs-kern:28865a

Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin

David Chinner
2007-07-14 13:35:19 +0800
0164af51c [XFS] Log the agf_length change in xfs_growfs_data_private().

SGI-PV: 963528
SGI-Modid: xfs-linux-melb:xfs-kern:28856a

Signed-off-by: Tim Shimmin
Signed-off-by: David Chinner
Signed-off-by: Christoph Hellwig

Tim Shimmin
2007-07-14 13:32:59 +0800
92821e2ba [XFS] Lazy Superblock Counters

When we have a couple of hundred transactions on the fly at once, they all
typically modify the on disk superblock in some way.
create/unlink/mkdir/rmdir modify inode counts, allocation/freeing modify
free block counts.

When these counts are modified in a transaction, they must eventually lock
the superblock buffer and apply the mods. The buffer then remains locked
until the transaction is committed into the incore log buffer. The result
of this is that with enough transactions on the fly the incore superblock
buffer becomes a bottleneck.

The result of contention on the incore superblock buffer is that
transaction rates fall - the more pressure that is put on the superblock
buffer, the slower things go.

The key to removing the contention is to not require the superblock fields
in question to be locked. We do that by not marking the superblock dirty
in the transaction. IOWs, we modify the incore superblock but do not
modify the cached superblock buffer. In short, we do not log superblock
modifications to critical fields in the superblock on every transaction.
In fact we only do it just before we write the superblock to disk every
sync period or just before unmount.

This creates an interesting problem - if we don't log or write out the
fields in every transaction, then how do the values get recovered after a
crash? The answer is simple - we keep enough duplicate, logged information
in other structures that we can reconstruct the correct count after log
recovery has been performed.

It is the AGF and AGI structures that contain the duplicate information;
after recovery, we walk every AGI and AGF and sum their individual
counters to get the correct value, and we do a transaction into the log to
correct them. An optimisation of this is that if we have a clean unmount
record, we know the value in the superblock is correct, so we can avoid
the summation walk under normal conditions and so mount/recovery times do
not change under normal operation.

One wrinkle that was discovered during development was that the blocks
used in the freespace btrees are never accounted for in the AGF counters.
This was once a valid optimisation to make; when the filesystem is full,
the free space btrees are empty and consume no space. Hence when it
matters, the "accounting" is correct. But that means that when we do the
AGF summations, we would not have a correct count and xfs_check would
complain. Hence a new counter was added to track the number of blocks used
by the free space btrees. This is an *on-disk format change*.
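
The summation walk can be sketched in Python like this. The dict keys are
illustrative stand-ins for the AGI/AGF counters, and the real structures
may carry further counters that are left out of this model.

```python
def rebuild_sb_counters(agfs, agis):
    """Sum per-AG counters into superblock totals after log recovery."""
    return {
        "icount": sum(agi["count"] for agi in agis),
        "ifree": sum(agi["freecount"] for agi in agis),
        # Blocks held by the free space btrees are not part of the AGF free
        # block count, so the new per-AG counter has to be added back in.
        "fdblocks": sum(agf["freeblks"] + agf["btreeblks"] for agf in agfs),
    }


agfs = [{"freeblks": 100, "btreeblks": 2}, {"freeblks": 50, "btreeblks": 1}]
agis = [{"count": 64, "freecount": 10}, {"count": 128, "freecount": 3}]
totals = rebuild_sb_counters(agfs, agis)
```

Without the btreeblks term the free block total in this example would be
150 instead of 153, which is the kind of mismatch xfs_check would report.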

As a result of this, lazy superblock counters are a mkfs option and at the
moment on linux there is no way to convert an old filesystem. This is
possible - xfs_db can be used to twiddle the right bits and then
xfs_repair will do the format conversion for you. Similarly, you can
convert backwards as well. At some point we'll add functionality to
xfs_admin to do the bit twiddling easily....

SGI-PV: 964999
SGI-Modid: xfs-linux-melb:xfs-kern:28652a

Signed-off-by: David Chinner
Signed-off-by: Christoph Hellwig
Signed-off-by: Tim Shimmin

David Chinner
2007-07-14 13:28:50 +0800
4cc929ee3 [XFS] Don't grow filesystems past the size they can index.

When growing a filesystem we don't check to see if the new size overflows
the page cache index range, so we can do silly things like grow a
filesystem past 16TB on a 32-bit system. Check new filesystem sizes
against the limits the kernel can support.
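
The 16TB figure follows from the page cache index width. A small Python
sketch of the check, assuming a 4k page size and treating the index width
as a parameter (both function names are hypothetical):

```python
def max_fs_bytes(index_bits, page_size=4096):
    """Largest size addressable with an index of index_bits bits per page."""
    return (1 << index_bits) * page_size


def grow_allowed(new_bytes, index_bits, page_size=4096):
    """Reject a grow request that the page cache index could not cover."""
    return new_bytes <= max_fs_bytes(index_bits, page_size)
```

With a 32 bit index and 4k pages the limit is 2^32 * 4096 bytes = 16TB,
so a request to grow to 17TB would be refused on such a system but
accepted with a 64 bit index.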

SGI-PV: 957886
SGI-Modid: xfs-linux-melb:xfs-kern:28563a

Signed-off-by: Nathan Scott
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin

Nathan Scott
2007-07-14 13:21:29 +0800

08 May, 2007

1 commit

1c72bf900 [XFS] The last argument "lsn" of xfs_trans_commit() is always called with NULL.

Patch provided by Eric Sandeen.

SGI-PV: 961693
SGI-Modid: xfs-linux-melb:xfs-kern:28199a

Signed-off-by: Eric Sandeen
Signed-off-by: Lachlan McIlroy
Signed-off-by: Tim Shimmin

Eric Sandeen
2007-05-08 11:48:42 +0800

10 Feb, 2007

2 commits

2c36ddeda [XFS] Remove unused arguments from the XFS_BTREE_*_ADDR macros.

It makes it incrementally clearer to read the code when the top of a macro
spaghetti-pile only receives the 3 arguments it uses, rather than 2 extra
ones which are not used. Also when you start pulling this thread out of
the sweater (i.e. remove unused args from XFS_BTREE_*_ADDR), a couple
other third arms etc fall off too. If they're not used in the macro, then
they sometimes don't need to be passed to the function calling the macro
either, etc....

Patch provided by Eric Sandeen (sandeen@sandeen.net).

SGI-PV: 960197
SGI-Modid: xfs-linux-melb:xfs-kern:28037a

Signed-off-by: Eric Sandeen
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin

Eric Sandeen
2007-02-10 15:37:33 +0800
dbcabad19 [XFS] Fix block reservation mechanism.

The block reservation mechanism has been broken since the per-cpu
superblock counters were introduced. Make the block reservation code work
with the per-cpu counters by syncing the counters, snapshotting the amount
of available space and then doing a modification of the counter state
according to the result. Continue in a loop until we either have no space
available or we reserve some space.
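
The retry loop can be sketched in Python. This is a single-threaded model
only: FreeCounter and its interference list are hypothetical and merely
simulate other writers consuming space between the snapshot and the
update; the per-cpu counter sync is reduced to reading one value.

```python
class FreeCounter:
    """Toy free space counter with scripted concurrent interference."""

    def __init__(self, free, interference=()):
        self.free = free
        self._interference = list(interference)

    def snapshot(self):
        """Stand-in for syncing the counters and reading the free space."""
        return self.free

    def try_take(self, amount):
        # Simulate another writer grabbing blocks after our snapshot.
        if self._interference:
            self.free = max(0, self.free - self._interference.pop(0))
        if amount > self.free:
            return False
        self.free -= amount
        return True


def reserve_blocks(counter, request):
    """Reserve up to request blocks; return how many were reserved."""
    while True:
        free = counter.snapshot()
        if free == 0:
            return 0                 # no space available at all
        want = min(request, free)
        if counter.try_take(want):
            return want              # some space was reserved
```

If the free space shrinks between the snapshot and the modification, the
update fails and the loop simply takes a fresh snapshot, ending either
with a (possibly smaller) reservation or with nothing left to reserve.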

SGI-PV: 956323
SGI-Modid: xfs-linux-melb:xfs-kern:27895a

Signed-off-by: David Chinner
Signed-off-by: Christoph Hellwig
Signed-off-by: Tim Shimmin

David Chinner
2007-02-10 15:36:17 +0800