27 Feb, 2010

2 commits

  • * 'for-linus' of git://oss.sgi.com/xfs/xfs: (52 commits)
    fs/xfs: Correct NULL test
    xfs: optimize log flushing in xfs_fsync
    xfs: only clear the suid bit once in xfs_write
    xfs: kill xfs_bawrite
    xfs: log changed inodes instead of writing them synchronously
    xfs: remove invalid barrier optimization from xfs_fsync
    xfs: kill the unused XFS_QMOPT_* flush flags V2
    xfs: Use delay write promotion for dquot flushing
    xfs: Sort delayed write buffers before dispatch
    xfs: Don't issue buffer IO direct from AIL push V2
    xfs: Use delayed write for inodes rather than async V2
    xfs: Make inode reclaim states explicit
    xfs: more reserved blocks fixups
    xfs: turn off sign warnings
    xfs: don't hold onto reserved blocks on remount,ro
    xfs: quota limit statvfs available blocks
    xfs: replace KM_LARGE with explicit vmalloc use
    xfs: cleanup up xfs_log_force calling conventions
    xfs: kill XLOG_VEC_SET_TYPE
    xfs: remove duplicate buffer flags
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/xfs-vipt:
    xfs: fix xfs to work with Virtually Indexed architectures
    sh: add mm API for DMA to vmalloc/vmap areas
    arm: add mm API for DMA to vmalloc/vmap areas
    parisc: add mm API for DMA to vmalloc/vmap areas
    mm: add coherence API for DMA to vmalloc/vmap areas

    Linus Torvalds
     

13 Feb, 2010

1 commit

  • file_remove_suid already calls into ->setattr to clear the suid and
    sgid bits if needed, no need to start a second transaction to do it
    ourselves.

    Note that xfs_write_clear_setuid issues a sync transaction while the
    path through ->setattr doesn't, but that is consistent with the
    other filesystems.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

09 Feb, 2010

2 commits

  • When an inode has already been flushed delayed write,
    xfs_inode_clean() returns true and hence xfs_fs_write_inode() can
    return on a synchronous inode write without having written the
    inode. Currently these synchronous writes only come from sync(1),
    unmount, a synchronous NFS export and cachefiles, so they should be
    relatively rare and out of common performance paths.

    Realistically, a synchronous inode write is not necessary here; we
    can avoid writing the inode by logging any non-transactional changes
    that are pending. This needs to be done with synchronous
    transactions, but it avoids seeking between the log and inode
    clusters as we do now. We don't force the log if the inode is
    pinned, though, so this differs from the fsync case. For normal
    sys_sync and unmount behaviour this is fine because we do a
    synchronous log force in xfs_sync_data which is called from the
    ->sync_fs code.

    It does however break the NFS synchronous export guarantees for now,
    but work is under way to fix this at a higher level or for the
    higher level to provide an additional flag in the writeback control
    to tell us that a log force is needed.

    Portions of this patch are based on work from Dave Chinner.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Alex Elder

    Christoph Hellwig
     
  • This mangles the reserved blocks counts a little more.

    1) add a helper function for the default reserved count
    2) add helper functions to save/restore counts on ro/rw
    3) save/restore reserved blocks on freeze/thaw
    4) disallow changing reserved count while readonly

    V2: changed field name to match Dave's changes

    Signed-off-by: Eric Sandeen
    Signed-off-by: Alex Elder

    Eric Sandeen
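The save/restore helpers above can be sketched in plain C. This is a minimal userspace model of the idea, not the real code: the struct layout, field names, and helper names are illustrative assumptions, not the actual xfs_mount interface.

```c
#include <assert.h>

/* Illustrative stand-in for the mount structure's counters. */
struct mount_sketch {
    long long free_blocks;    /* blocks reported free */
    long long resblks;        /* blocks currently held in reserve */
    long long saved_resblks;  /* reservation stashed across ro periods */
    int readonly;
};

/* remount,ro / freeze: return the reservation to the free pool and
 * remember it, so the on-disk counters stay consistent. */
static void save_resvblks(struct mount_sketch *mp)
{
    mp->saved_resblks = mp->resblks;
    mp->free_blocks += mp->resblks;
    mp->resblks = 0;
    mp->readonly = 1;
}

/* remount,rw / thaw: take the remembered reservation back. */
static void restore_resvblks(struct mount_sketch *mp)
{
    mp->resblks = mp->saved_resblks;
    mp->free_blocks -= mp->resblks;
    mp->readonly = 0;
}

/* Changing the reserved count is disallowed while readonly. */
static int set_resvblks(struct mount_sketch *mp, long long n)
{
    if (mp->readonly)
        return -1;   /* stands in for an -EROFS style error */
    mp->free_blocks += mp->resblks - n;
    mp->resblks = n;
    return 0;
}
```

The invariant the patch cares about is visible here: after a save/restore round trip the counters return to their original values, and no change is possible in between.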
     

06 Feb, 2010

3 commits

  • xfs_buf.c includes what is essentially a hand rolled version of
    blk_rq_map_kern(). In order to work properly with the vmalloc buffers
    that xfs uses, this hand rolled routine must also implement the flushing
    API for vmap/vmalloc areas.

    [style updates from hch@lst.de]
    Acked-by: Christoph Hellwig
    Signed-off-by: James Bottomley

    James Bottomley
     
  • We currently do background inode flush asynchronously, resulting in
    inodes being written in whatever order the background writeback
    issues them. Not only that, there are also blocking and non-blocking
    asynchronous inode flushes, depending on where the flush comes from.

    This patch completely removes asynchronous inode writeback. It
    removes all the strange writeback modes and replaces them with
    either a synchronous flush or a non-blocking delayed write flush.
    That is, inode flushes will only issue IO directly if they are
    synchronous, and background flushing may do nothing if the operation
    would block (e.g. on a pinned inode or buffer lock).

    Delayed write flushes will now result in the inode buffer sitting in
    the delwri queue of the buffer cache to be flushed by either an AIL
    push or by the xfsbufd timing out the buffer. This will allow
    accumulation of dirty inode buffers in memory and allow optimisation
    of inode cluster writeback at the xfsbufd level where we have much
    greater queue depths than the block layer elevators. We will also
    get adjacent inode cluster buffer IO merging for free when a later
    patch in the series allows sorting of the delayed write buffers
    before dispatch.

    This effectively means that any inode that is written back by
    background writeback will be seen as flush locked during AIL
    pushing, and will result in the buffers being pushed from there.
    This writeback path is currently non-optimal, but the next patch
    in the series will fix that problem.

    A side effect of this delayed write mechanism is that background
    inode reclaim will no longer directly flush inodes, nor can it wait
    on the flush lock. The result is that inode reclaim must leave the
    inode in the reclaimable state until it is clean. Hence attempts to
    reclaim a dirty inode in the background will simply skip the inode
    until it is clean and this allows other mechanisms (i.e. xfsbufd) to
    do more optimal writeback of the dirty buffers. As a result, the
    inode reclaim code has been rewritten so that it no longer relies on
    the ambiguous return values of xfs_iflush() to determine whether it
    is safe to reclaim an inode.

    Portions of this patch are derived from patches by Christoph
    Hellwig.

    Version 2:
    - cleanup reclaim code as suggested by Christoph
    - log background reclaim inode flush errors
    - just pass sync flags to xfs_iflush

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • A.K.A.: don't rely on xfs_iflush() return value in reclaim

    We have gradually been moving checks out of the reclaim code because
    they are duplicated in xfs_iflush(). We've had a history of problems
    in this area, and many of them stem from the overloading of the
    return values from xfs_iflush() and interaction with inode flush
    locking to determine if the inode is safe to reclaim.

    With the desire to move to delayed write flushing of inodes and
    non-blocking inode tree reclaim walks, the overloading of the
    return value of xfs_iflush makes it very difficult to determine
    the correct thing to do next.

    This patch explicitly re-adds the checks to the inode reclaim code,
    removing the reliance on the return value of xfs_iflush() to
    determine what to do next. It also means that we can clearly
    document all the inode states that reclaim must handle and hence
    we can easily see that we handled all the necessary cases.

    This also removes the need for the xfs_inode_clean() check in
    xfs_iflush() as all callers now check this first (safely).

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
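A toy decision table gives the flavour of making the reclaim states explicit. The state names and the policy shown are assumptions for illustration only, not the actual xfs_reclaim_inode logic:

```c
#include <assert.h>

enum reclaim_action { RECLAIM_NOW, RECLAIM_SKIP, RECLAIM_FLUSH };

/* Illustrative inode state bits; the real code derives these from the
 * inode flags, the flush lock and the log pin count. */
struct inode_state {
    int clean;        /* no dirty metadata pending */
    int flush_locked; /* writeback IO still in progress */
    int pinned;       /* dirty in the in-core log, not yet forced */
};

/* Each state is handled explicitly, instead of inferring safety from
 * the overloaded return value of a flush call. */
static enum reclaim_action reclaim_decide(const struct inode_state *ip,
                                          int sync_mode)
{
    if (ip->flush_locked)      /* under IO: only a sync walk waits */
        return sync_mode ? RECLAIM_FLUSH : RECLAIM_SKIP;
    if (ip->clean)
        return RECLAIM_NOW;    /* safe to reclaim immediately */
    if (ip->pinned && !sync_mode)
        return RECLAIM_SKIP;   /* background: revisit when unpinned */
    return RECLAIM_FLUSH;      /* flush (or force the log), then retry */
}
```

Because every case is enumerated, it is easy to audit that no inode state falls through unhandled, which is the point the commit message makes.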
     

04 Feb, 2010

1 commit

  • There are no more users of this function left in the XFS code
    now that we've switched everything to delayed write flushing.
    Remove it.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

02 Feb, 2010

1 commit

  • All buffers logged into the AIL are marked as delayed write.
    When the AIL needs to push the buffer out, it issues an async write of the
    buffer. This means that IO patterns are dependent on the order of
    buffers in the AIL.

    Instead of flushing the buffer, promote the buffer in the delayed
    write list so that the next time the xfsbufd is run the buffer will
    be flushed by the xfsbufd. Return the state to the xfsaild that the
    buffer was promoted so that the xfsaild knows that it needs to cause
    the xfsbufd to run to flush the buffers that were promoted.

    Using the xfsbufd for issuing the IO allows us to dispatch all
    buffer IO from the one queue. This means that we can make much more
    enlightened decisions on what order to flush buffers to disk as
    we don't have multiple places issuing IO. Optimisations to xfsbufd
    will be in a future patch.

    Version 2
    - kill XFS_ITEM_FLUSHING as it is now unused.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

26 Jan, 2010

2 commits

  • Currently when the xfsbufd writes delayed write buffers, it pushes
    them to disk in the order they come off the delayed write list. If
    there are lots of buffers spread widely over the disk, this results
    in overwhelming the elevator sort queues in the block layer and we
    end up losing the possibility of merging adjacent buffers to minimise
    the number of IOs.

    Use the new generic list_sort function to sort the delwri dispatch
    queue before issue to ensure that the buffers are pushed in the most
    friendly order possible to the lower layers.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
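The dispatch-order fix can be sketched in userspace C. The kernel patch sorts a linked list in place with the new generic list_sort(); an array plus qsort() shows the same idea. The struct and field names here are illustrative stand-ins, not the real xfs_buf:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a delayed-write buffer: only the starting disk block
 * matters for dispatch ordering. */
struct delwri_buf {
    long long bn;   /* starting disk block number */
};

static int delwri_cmp(const void *a, const void *b)
{
    const struct delwri_buf *ba = a, *bb = b;
    return (ba->bn > bb->bn) - (ba->bn < bb->bn);
}

/* Sort the dispatch queue by ascending block number so the elevator
 * sees a mostly-sequential stream and can merge adjacent buffers. */
static void delwri_sort(struct delwri_buf *q, size_t n)
{
    qsort(q, n, sizeof(*q), delwri_cmp);
}
```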
     
  • If we hold onto reserved blocks when doing a remount,ro we end
    up writing the blocks used count to disk that includes the reserved
    blocks. Reserved blocks are not actually used, so this results in
    the values in the superblock being incorrect.

    Hence if we run xfs_check or xfs_repair -n while the filesystem is
    mounted remount,ro we end up with an inconsistent filesystem being
    reported. Also, running xfs_copy on the remount,ro filesystem will
    result in an inconsistent image being generated.

    To fix this, unreserve the blocks when doing the remount,ro, and
    reserved them again on remount,rw. This way a remount,ro filesystem
    will appear consistent on disk to all utilities.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

22 Jan, 2010

3 commits

  • We use the KM_LARGE flag to make kmem_alloc and friends use vmalloc
    if necessary. As we only need this for a few boot/mount time
    allocations just switch to explicit vmalloc calls there.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Remove the XFS_LOG_FORCE argument which was always set, and the
    XFS_LOG_URGE define, which was never used.

    Split xfs_log_force into two helpers - xfs_log_force which forces
    the whole log, and xfs_log_force_lsn which forces up to the
    specified LSN. The underlying implementations already were entirely
    separate, as were the users.

    Also re-indent the new _xfs_log_force/_xfs_log_force which
    previously had a weird coding style.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Currently we define aliases for the buffer flags in various
    namespaces, which only adds confusion. Remove all but the XBF_
    flags to clean this up a bit.

    Note that we still abuse XFS_B_ASYNC/XBF_ASYNC for some non-buffer
    uses, but I'll clean that up later.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

20 Jan, 2010

2 commits

  • To be consistent with the directory code, the attr code should use
    unsigned names. Convert the names from the vfs at the highest level
    to unsigned, and ensure they are consistently used as unsigned down
    to disk.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • xfs_buf_iomove() uses xfs_caddr_t as its parameter types, but it doesn't
    care about the signedness of the variables as it is just copying the
    data. Change the prototype to use void * so that we don't get sign
    warnings at call sites.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

16 Jan, 2010

13 commits

  • Move xfsbdstrat and xfs_bdstrat_cb from xfs_lrw.c and xfs_bioerror
    and xfs_bioerror_relse from xfs_rw.c into xfs_buf.c. This also
    means xfs_bioerror and xfs_bioerror_relse can be marked static now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Fold XFS_bwrite into its only caller, xfs_bwrite, and move it into
    xfs_buf.c instead of leaving it as a fairly large inline function.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Don't bother using XFS_bwrite as it doesn't provide much code for
    our use case. Instead opencode it and fold xlog_bdstrat_cb into the
    new xlog_bdstrat helper.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • The filestreams cache flush is not needed in the sync code as it
    does not affect data writeback, and it is now not used by the growfs
    code, either, so kill it.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Uninline xfs_perag_{get,put} so that tracepoints can be inserted
    into them to speed debugging of reference count problems.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • xfs_get_perag is really getting the perag that an inode belongs to
    based on its inode number. Convert the use of this function to just
    get the perag from a provided ag number. Use this new function to
    obtain the per-ag structure when traversing the per AG inode trees
    for sync and reclaim.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • The xfsbufd wakes every xfsbufd_centisecs (once per second by
    default) for each filesystem even when the filesystem is idle. If
    the xfsbufd has nothing to do, put it into a long term sleep and
    only wake it up when there is work pending (i.e. dirty buffers to
    flush soon). This will make laptop power misers happy.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Now that the AIL push algorithm is traversal safe, we don't need a
    watchdog function in the xfsaild to catch pushes that fail to make
    progress. Remove the watchdog timeout and make pushes purely driven
    by demand. This will remove the once-per-second wakeup that is seen
    when the filesystem is idle and make laptop power misers happy.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Just minor housekeeping, a lot more functions can be trivially made
    static; others could if we reordered things a bit...

    Signed-off-by: Eric Sandeen
    Signed-off-by: Alex Elder

    Eric Sandeen
     
  • To be able to diagnose whether the swap extents function is
    detecting compatible inode data fork configurations for swapping
    extents, add tracing points to the code to allow us to see the
    format of the inode forks before and after the swap.

    Signed-off-by: Dave Chinner
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • We cannot do direct inode reclaim without taking the flush lock to
    ensure that we do not reclaim an inode under IO. We check the inode
    is clean before doing direct reclaim, but this is not good enough
    because the inode flush code marks the inode clean once it has
    copied the in-core dirty state to the backing buffer.

    It is the flush lock that determines whether the inode is still
    under IO, even though it is marked clean, and the inode is still
    required at IO completion so we can't reclaim it even though it is
    clean in core. Hence the requirement that we need to take the flush
    lock even on clean inodes because this guarantees that the inode
    writeback IO has completed and it is safe to reclaim the inode.

    With delayed write inode flushing, we could end up waiting a long
    time on the flush lock even for a clean inode. The background
    reclaim already handles this efficiently, so avoid all the problems
    by killing the direct reclaim path altogether.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • The reclaim code will handle flushing of dirty inodes before reclaim
    occurs, so avoid them when determining whether an inode is a
    candidate for flushing to disk when walking the radix trees. This
    is based on a test patch from Christoph Hellwig.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Make the inode tree reclaim walk exclusive to avoid races with
    concurrent sync walkers and lookups. This is a version of a patch
    posted by Christoph Hellwig that avoids all the code duplication.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     

11 Jan, 2010

3 commits

  • When we search for and find a busy extent during allocation we
    force the log out to ensure the extent free transaction is on
    disk before the allocation transaction. The current implementation
    has a subtle bug: it does not handle multiple overlapping
    ranges.

    That is, if we free lots of little extents into a single
    contiguous extent, then allocate the contiguous extent, the busy
    search code stops searching at the first extent it finds that
    overlaps the allocated range. It then uses the commit LSN of the
    transaction to force the log out to.

    Unfortunately, the other busy ranges might have more recent
    commit LSNs than the first busy extent that is found, and this
    results in xfs_alloc_search_busy() returning before all the
    extent free transactions are on disk for the range being
    allocated. This can lead to potential metadata corruption or
    stale data exposure after a crash because log replay won't replay
    all the extent free transactions that cover the allocation range.

    Modified-by: Alex Elder

    (Dropped the "found" argument from the xfs_alloc_busysearch trace
    event.)

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
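The spirit of the fix can be sketched as: scan every busy extent overlapping the allocation range and use the highest commit LSN, instead of stopping at the first overlap. This is a hedged userspace sketch; the struct, field, and function names are illustrative, not the real xfs_alloc_search_busy interface:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical busy-extent record: a freed disk range plus the commit
 * LSN of the transaction that freed it. */
struct busy_extent {
    long long start, len;
    long long commit_lsn;
};

static int overlaps(const struct busy_extent *be,
                    long long start, long long len)
{
    return be->start < start + len && start < be->start + be->len;
}

/* Return the highest commit LSN among ALL busy extents overlapping the
 * allocation range; the caller then forces the log out to that LSN.
 * Stopping at the first overlap (the old behaviour) could pick too low
 * an LSN and leave later extent-free transactions off disk. */
static long long busy_search_max_lsn(const struct busy_extent *tbl,
                                     size_t n,
                                     long long start, long long len)
{
    long long max_lsn = 0;
    for (size_t i = 0; i < n; i++)
        if (overlaps(&tbl[i], start, len) && tbl[i].commit_lsn > max_lsn)
            max_lsn = tbl[i].commit_lsn;
    return max_lsn;
}
```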
     
  • We currently have some rather odd code in xfs_setattr for
    updating the a/c/mtime timestamps:

    - first we do a non-transaction update if all three are updated
    together
    - second we implicitly update the ctime for various changes
    instead of relying on the ATTR_CTIME flag
    - third we set the timestamps to the current time instead of the
    arguments in the iattr structure in many cases.

    This patch makes sure we update it in a consistent way:

    - always transactional
    - ctime is only updated if ATTR_CTIME is set or we do a size
    update, which is a special case
    - always to the times passed in from the caller instead of the
    current time

    The only non-size caller of xfs_setattr that doesn't come from
    the VFS is updated to set ATTR_CTIME and pass in a valid ctime
    value.

    Reported-by: Eric Blake
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
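The consistent-update rule can be sketched in a few lines of C. This is a toy model: the flag values and struct names are illustrative stand-ins for the VFS iattr valid mask, not the real definitions.

```c
#include <assert.h>

/* Valid-mask bits modeled on the VFS iattr flags; values illustrative. */
enum { ATTR_ATIME = 1, ATTR_MTIME = 2, ATTR_CTIME = 4 };

struct iattr_sketch {
    unsigned valid;
    long long atime, mtime, ctime;
};

struct inode_times { long long atime, mtime, ctime; };

/* Apply exactly the times the caller passed in: never substitute the
 * current time, and touch ctime only when ATTR_CTIME is set. */
static void apply_times(struct inode_times *ip,
                        const struct iattr_sketch *ia)
{
    if (ia->valid & ATTR_ATIME)
        ip->atime = ia->atime;
    if (ia->valid & ATTR_MTIME)
        ip->mtime = ia->mtime;
    if (ia->valid & ATTR_CTIME)
        ip->ctime = ia->ctime;
}
```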
     
  • Using DECLARE_EVENT_CLASS allows us to share trace event code
    instead of duplicating it in the binary. This was not available
    before 2.6.33 so it had to be done as a separate step once the
    prerequisite was merged.

    This only requires changes to xfs_trace.h and the results are
    rather impressive:

    hch@brick:~/work/linux-2.6/obj-kvm$ size fs/xfs/xfs.o*
    text data bss dec hex filename
    607732 41884 3616 653232 9f7b0 fs/xfs/xfs.o
    1026732 41884 3808 1072424 105d28 fs/xfs/xfs.o.old

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
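A userspace sketch shows why the binary shrinks: the shared "class" body exists exactly once, and each event becomes a one-line wrapper, so adding events no longer duplicates the formatting code. Macro and function names here are illustrative, not the real tracepoint macros:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The class body: emitted once, shared by every event in the class. */
static int trace_buf_class(const char *ev, long long bno, char *out, int n)
{
    return snprintf(out, n, "%s: bno 0x%llx", ev, bno);
}

/* Each event is just a thin wrapper around the shared class body,
 * analogous to DEFINE_EVENT against a DECLARE_EVENT_CLASS. */
#define DEFINE_BUF_EVENT(ev)                                      \
    static int trace_##ev(long long bno, char *out, int n)        \
    {                                                             \
        return trace_buf_class(#ev, bno, out, n);                 \
    }

DEFINE_BUF_EVENT(xfs_buf_read)
DEFINE_BUF_EVENT(xfs_buf_write)
```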
     

09 Jan, 2010

1 commit


18 Dec, 2009

1 commit

  • After I_SYNC was split from I_LOCK the leftover is always used together with
    I_NEW and thus superfluous.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

17 Dec, 2009

5 commits

  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    XFS: Free buffer pages array unconditionally
    xfs: kill xfs_bmbt_rec_32/64 types
    xfs: improve metadata I/O merging in the elevator
    xfs: check for not fully initialized inodes in xfs_ireclaim

    Linus Torvalds
     
  • The code in xfs_buf_free() only attempts to free the b_pages array if the
    buffer is a page cache backed or page allocated buffer. The extra log buffer
    that is used when the log wraps uses pages that are allocated to a different
    log buffer, but it still has a b_pages array allocated when those pages
    are associated with the extra buffer in xfs_buf_associate_memory.

    Hence we need to always attempt to free the b_pages array when tearing
    down a buffer, not just on buffers that are explicitly marked as page bearing
    buffers. This fixes a leak detected by the kernel memory leak code.

    Signed-off-by: Dave Chinner
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Change all async metadata buffers to use [READ|WRITE]_META I/O types
    so that the I/O doesn't get issued immediately. This allows merging of
    adjacent metadata requests but still prioritises them over bulk data.
    This shows a 10-15% improvement in sequential create speed of small
    files.

    Don't include the log buffers in this classification - leave them as
    sync types so they are issued immediately.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Currently the locking in blockdev_direct_IO is a mess, we have three different
    locking types and very confusing checks for some of them. The most
    complicated one is DIO_OWN_LOCKING for reads, which happens to not actually be
    used.

    This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read case
    is unused anyway, and the write side is almost identical to DIO_NO_LOCKING.
    The difference is that DIO_NO_LOCKING always sets the create argument for
    the get_blocks callback to zero, but we can easily move that to the actual
    get_blocks callbacks. There are four users of the DIO_NO_LOCKING mode:
    gfs already ignores the create argument and thus is fine with the new
    version, ocfs2 only errors out if create were ever set, and we can remove
    this dead code now, the block device code only ever uses create for an
    error message if we are fully beyond the device which can never happen,
    and last but not least XFS will need the new behaviour for writes.

    Now we can replace the lock_type variable with a flags one, where no flag
    means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first
    flag. Separate out the check for not allowing to fill holes into a separate
    flag, although for now both flags always get set at the same time.

    Also revamp the documentation of the locking scheme to actually make sense.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
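The flags scheme described above can be sketched as follows. DIO_LOCKING and DIO_SKIP_HOLES are the names the patch introduces; the bit values and helper below are illustrative assumptions:

```c
#include <assert.h>

/* Flags replacing the old lock_type variable: no flag set means the
 * former DIO_NO_LOCKING behaviour. */
enum {
    DIO_LOCKING    = 1 << 0,  /* take i_mutex around the IO */
    DIO_SKIP_HOLES = 1 << 1,  /* don't allow writes to fill holes */
};

/* For now both flags always travel together for locking filesystems,
 * as the commit message notes; hypothetical helper for illustration. */
static unsigned dio_flags(int wants_locking)
{
    return wants_locking ? (DIO_LOCKING | DIO_SKIP_HOLES) : 0;
}
```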
     
  • Add a flags argument to struct xattr_handler and pass it to all xattr
    handler methods. This allows using the same methods for multiple
    handlers, e.g. for the ACL methods which perform exactly the same action
    for the access and default ACLs, just using a different underlying
    attribute. With a little more groundwork it'll also allow sharing the
    methods for the regular user/trusted/secure handlers in extN, ocfs2 and
    jffs2 like it's already done for xfs in this patch.

    Also change the inode argument to the handlers to a dentry to allow
    using the handlers mechanism for filesystems that require it later,
    e.g. cifs.

    [with GFS2 bits updated by Steven Whitehouse]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
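The ACL sharing described above can be sketched like this. The handler struct, attribute names, and type constants below are illustrative stand-ins for the real xattr_handler interface, showing only how one method can serve both ACL handlers via the flags argument:

```c
#include <assert.h>
#include <string.h>

enum { ACL_TYPE_ACCESS = 0, ACL_TYPE_DEFAULT = 1 };

/* Sketch of a handler table entry carrying the new flags field, which
 * is passed to every method. */
struct xattr_handler_sketch {
    const char *prefix;
    int flags;
    const char *(*attr_name)(int flags);
};

/* One shared method: the flags value selects the underlying attribute,
 * so access and default ACLs no longer need duplicate code. */
static const char *acl_attr_name(int flags)
{
    return flags == ACL_TYPE_DEFAULT ? "posix_acl_default"
                                     : "posix_acl_access";
}

static const struct xattr_handler_sketch acl_access_handler = {
    "system.posix_acl_access", ACL_TYPE_ACCESS, acl_attr_name,
};
static const struct xattr_handler_sketch acl_default_handler = {
    "system.posix_acl_default", ACL_TYPE_DEFAULT, acl_attr_name,
};
```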