Doug / smarc-fsl-linux-kernel | Embedian Git Server

10 Oct, 2007

2 commits

7a9f53b3c [GFS2] Alternate gfs2_iget to avoid looking up inodes being freed ... Browse Code »

There is a possible deadlock between two processes on the same node, where one
process is deleting an inode, and another process is looking for allocated but
unused inodes to delete in order to create more space.

process A does an iput() on inode X, and it's i_count drops to 0. This causes
iput_final() to be called, which puts an inode into state I_FREEING at
generic_delete_inode(). There no point between when iput_final() is called, and
when I_FREEING is set where GFS2 could acquire any glocks. Once I_FREEING is
set, no other process on that node can successfully look up that inode until
the delete finishes.

process B locks the the resource group for the same inode in get_local_rgrp(),
which is called by gfs2_inplace_reserve_i()

process A tries to lock the resource group for the inode in
gfs2_dinode_dealloc(), but it's already locked by process B

process B waits in find_inode for the inode to have the I_FREEING state cleared.

Deadlock.

This patch solves the problem by adding an alternative to gfs2_iget(),
gfs2_iget_skip(), that simply skips any inodes that are in the I_FREEING
state.o The alternate test function is just like the original one, except that
it fails if the inode is being freed, and sets a skipped flag. The alternate
set function is just like the original, except that it fails if the skipped
flag is set. Only try_rgrp_unlink() calls gfs2_iget_skip() instead of
gfs2_iget().

Signed-off-by: Benjamin E. Marzinski
Signed-off-by: Steven Whitehouse

Benjamin Marzinski
2007-10-10 15:56:29 +0800
e9bd2b3ba [GFS2] fix inode meta data corruption ... Browse Code »

Fix a nasty inode meta data corruption issue by keeping the buffer head in
icache array. This buffer needs to stay in memory until journal flush occurs
Otherwise, gfs2_meta_inode_buffer could do a disk read before the inode hits
disk. It ends up with meta data corruptions. The buffer will be released as
part of the existing journal flush logic.

Signed-off-by: S. Wendy Cheng
Signed-off-by: Steven Whitehouse

Wendy Cheng
2007-10-10 15:55:51 +0800

09 Jul, 2007

9 commits

35dcc52e3 [GFS2] Remove i_mode passing from NFS File Handle ... Browse Code »

GFS2 has been passing i_mode within NFS File Handle. Other than the
wrong assumption that there is always room for this extra 16 bit value,
the current gfs2_get_dentry doesn't really need the i_mode to work
correctly. Note that GFS2 NFS code does go thru the same lookup code
path as direct file access route (where the mode is obtained from name
lookup) but gfs2_get_dentry() is coded for different purpose. It is not
used during lookup time. It is part of the file access procedure call.
When the call is invoked, if on-disk inode is not in-memory, it has to
be read-in. This makes i_mode passing a useless overhead.

Signed-off-by: S. Wendy Cheng
Signed-off-by: Steven Whitehouse

Wendy Cheng
2007-07-09 15:24:11 +0800
bb9bcf061 [GFS2] Obtaining no_formal_ino from directory entry ... Browse Code »

GFS2 lookup code doesn't ask for inode shared glock. This implies during
in-memory inode creation for existing file, GFS2 will not disk-read in
the inode contents. This leaves no_formal_ino un-initialized during
lookup time. The un-initialized no_formal_ino is subsequently encoded
into file handle. Clients will get ESTALE error whenever it tries to
access these files.

Signed-off-by: S. Wendy Cheng
Signed-off-by: Steven Whitehouse

Wendy Cheng
2007-07-09 15:24:08 +0800
d93cfa988 [GFS2] Fix deallocation issues ... Browse Code »

There were two issues during deallocation of unlinked inodes. The
first was relating to the use of a "try" lock which in the case of
the inode lock wasn't trying hard enough to deallocate in all
circumstances (now changed to a normal glock) and in the case of
the iopen lock didn't wait for the demotion of the shared lock before
attempting to get the exclusive lock, and thereby sometimes (timing dependent)
not completing the deallocation when it should have done.

The second issue related to the lack of a way to invalidate dcache entries
on remote nodes (now fixed by this patch) which meant that unlinks were
taking a long time to return disk space to the fs. By adding some code to
invalidate the dcache entries across the cluster for unlinked inodes, that
is now fixed.

This patch was written jointly by Abhijith Das and Steven Whitehouse.

Signed-off-by: Abhijith Das
Signed-off-by: Steven Whitehouse

Abhijith Das
2007-07-09 15:23:36 +0800
037bcbb75 [GFS2] gfs2_lookupi() uninitialised var fix ... Browse Code »

fs/gfs2/inode.c: In function 'gfs2_lookupi':
fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function

Looks like a real bug to me.

Cc: Steven Whitehouse
Signed-off-by: Andrew Morton
Signed-off-by: Steven Whitehouse

akpm@linux-foundation.org
2007-07-09 15:23:29 +0800
c8cdf4793 [GFS2] Recovery for lost unlinked inodes ... Browse Code »

Under certain circumstances its possible (though rather unlikely) that
inodes which were unlinked by one node while still open on another might
get "lost" in the sense that they don't get deallocated if the node
which held the inode open crashed before it was unlinked.

This patch adds the recovery code which allows automatic deallocation of
the inode if its found during block allocation (the sensible time to
look for such inodes since we are scanning the rgrp's bitmaps anyway at
this time, so it adds no overhead to do this).

Since the inode will have had its i_nlink set to zero, all we need to
trigger recovery is a lookup and an iput(), and the normal deallocation
code takes care of the rest.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-07-09 15:23:26 +0800
e1cc86037 [GFS2] Fix bug in error path of inode ... Browse Code »

This fixes a bug in the ordering of operations in the error path of
createi. Its not valid to do an iput() when holding the inode's glock
since the iput() will (in this case) result in delete_inode() being
called which needs to grab the lock itself. This was causing the
recursive lock checking code to trigger.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-07-09 15:23:22 +0800
4bd91ba18 [GFS2] Add nanosecond timestamp feature ... Browse Code »

This adds a nanosecond timestamp feature to the GFS2 filesystem. Due
to the way that the on-disk format works, older filesystems will just
appear to have this field set to zero. When mounted by an older version
of GFS2, the filesystem will simply ignore the extra fields so that
it will again appear to have whole second resolution, so that its
trivially backward compatible.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-07-09 15:23:12 +0800
bb8d8a6f5 [GFS2] Fix sign problem in quota/statfs and cleanup _host structures ... Browse Code »

This patch fixes some sign issues which were accidentally introduced
into the quota & statfs code during the endianess annotation process.
Also included is a general clean up which moves all of the _host
structures out of gfs2_ondisk.h (where they should not have been to
start with) and into the places where they are actually used (often only
one place). Also those _host structures which are not required any more
are removed entirely (which is the eventual plan for all of them).

The conversion routines from ondisk.c are also moved into the places
where they are actually used, which for almost every one, was just one
single place, so all those are now static functions. This also cleans up
the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

The net result is a reduction of about 100 lines of code, many functions
now marked static plus the bug fixes as mentioned above. For good
measure I ran the code through sparse after making these changes to
check that there are no warnings generated.

This fixes Red Hat bz #239686

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-07-09 15:23:10 +0800
dbb7cae2a [GFS2] Clean up inode number handling ... Browse Code »

This patch cleans up the inode number handling code. The main difference
is that instead of looking up the inodes using a struct gfs2_inum_host
we now use just the no_addr member of this structure. The tests relating
to no_formal_ino can then be done by the calling code. This has
advantages in that we want to do different things in different code
paths if the no_formal_ino doesn't match. In the NFS patch we want to
return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
no_formal_ino doesn't match and thus we can withdraw in this case.

In order to later fix bz #201012, we need to be able to look up an inode
without knowing no_formal_ino, as the only information that is known to
us is the on-disk location of the inode in question.

This patch will also help us to fix bz #236099 at a later date by
cleaning up a lot of the code in that area.

There are no user visible changes as a result of this patch and there
are no changes to the on-disk format either.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-07-09 15:22:24 +0800

08 Mar, 2007

2 commits

1be386795 [GFS2] Fix bz 229831, lookup returns wrong inode ... Browse Code »

The following patch fixes Red Hat bz 229831. Without this patch its
possible for the wrong inode to be returned in certain cases. It is a
pretty unusual event, so that its taken some time to track down. Thanks
and due to Josef Whiter who did a lot of the testing required to thrack
this down and fix it.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-03-08 03:01:53 +0800
fb0d3bce8 [GFS2] pass formal ino in do_filldir_main ... Browse Code »

ok, the following is the minimum changes to get NFSD going before we
settle down this issue .. would appreciate this in the tree so other NFS
related works can get done in parallel.

Signed-off-by: S. Wendy Cheng
Signed-off-by: Steven Whitehouse

Wendy Cheng
2007-03-08 02:58:45 +0800

06 Feb, 2007

7 commits

ddee76089 [GFS2] Fix unlink deadlocks ... Browse Code »

Move the glock acquisition to outside of the transactions.

Lock odering must be preserved in order to prevent ABBA
deadlocks. The current gfs2_change_nlink code would tries
to grab the glock after having started a transaction and thus is holding
the log lock. This is inconsistent with other code paths in
gfs that grab the resource group glock prior to staring
a tranactions.

One problem with this fix is that the resource group
lock is always grabbed now even if the inode still has
ref count and can not be marked for unlink.

Signed-off-by: Russell Cattelan
Signed-off-by: Steven Whitehouse

Russell Cattelan
2007-02-06 02:38:17 +0800
d7c103d0b [GFS2] Fix recursive locking attempt with NFS ... Browse Code »

In certain cases, its possible for NFS to call the lookup code while
holding the glock (when doing a readdirplus operation) so we need to
check for that and not try and lock the glock twice. This also fixes a
typo in a previous NFS related GFS2 patch.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-02-06 02:37:53 +0800
ddfe06278 [GFS2] use CURRENT_TIME_SEC instead of get_seconds in gfs2 ... Browse Code »

I was looking something else up and came across this...

I don't honestly have a good reason to change it other than to make it
like every other Linux filesystem in this regard. ;-) It doesn't
functionally change anything, but makes some lines shorter. :)

I'm also curious; why does gfs2 have 64-bits of on-disk timestamps, but
not in timespec_t format, and only stores second resolutions? Seems like
you're halfway to sub-second resolutions already.

I suppose if that gets implemented then all of the below should
instead be CURRENT_TIME not CURRENT_TIME_SEC.

Signed-off-by: Eric Sandeen
Signed-off-by: Steven Whitehouse

Eric Sandeen
2007-02-06 02:37:38 +0800
03dc6a538 [GFS2] make gfs2_change_nlink_i() static ... Browse Code »

On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.20-rc3-mm1:
>...
> git-gfs2-nmw.patch
>...
> git trees
>...

This patch makes the needlessly globlal gfs2_change_nlink_i() static.

Signed-off-by: Adrian Bunk
Signed-off-by: Steven Whitehouse

Adrian Bunk
2007-02-06 02:36:49 +0800
87d21e07f [GFS2] Fix gfs2_rename deadlock ... Browse Code »

Second round of gfs2_rename lock re-ordering to allow Anaconda adding
root partition on top of gfs2. Previous to this patch the recursive
lock detector in glock.c can be triggered due to attempting to lock
the rgrp twice. This fixes it by checking to see whether the rgrp
is already locked.

This fixes Red Hat bugzilla #221237

Signed-off-by: S. Wendy Cheng
Signed-off-by: Steven Whitehouse

S. Wendy Cheng
2007-02-06 02:36:31 +0800
6c93fd1e5 [GFS2] BZ 217008 fsfuzzer fix. ... Browse Code »

Update the quilt header comments to match the
code changes.

Change gfs2_lookup_simple to return an error in the case
of a NULL inode.
The callers of gfs2_lookup_simple do not check for NULL
in the no entry case and such would end up dereferencing a NULL ptr.

This fixes:
http://projects.info-pull.com/mokb/MOKB-15-11-2006.html

Signed-off-by: Russell Cattelan
Signed-off-by: Steven Whitehouse

Russell Cattelan
2007-02-06 02:36:28 +0800
5509826f1 [GFS2] Fix change nlink deadlock ... Browse Code »

Bugzilla 215088

Fix deadlock in gfs2_change_nlink() while installing RHEL5 into GFS2
partition. The gfs2_rename() apparently needs block allocation for the
new name (into the directory) where it requires rg locks. At the same
time, while updating the nlink count for the replaced file,
gfs2_change_nlink() tries to return the inode meta-data back to resource
group where it needs rg locks too. Our logic doesn't allow process to
acquire these locks recursively by the same process (RHEL installer)
that results a BUG call. This only happens within rename code path and
only if the destination file exists before the rename operation.

Signed-off-by: S. Wendy Cheng
Signed-off-by: Steven Whitehouse

S. Wendy Cheng
2007-02-06 02:36:15 +0800

30 Nov, 2006

20 commits

28626e207 [GFS2] Fix glock ordering on inode creation ... Browse Code »

The lock order here should be parent -> child rather than
numeric order.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:36:33 +0800
dcd247995 [GFS2] Remove unused function from inode.c ... Browse Code »

The gfs2_glock_nq_m_atime function is unused in so far as its only
ever called with num_gh = 1, and this falls through to the
gfs2_glock_nq_atime function, so we might as well call that directly.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:35:57 +0800
9e2dbdac3 [GFS2] Remove gfs2_inode_attr_in ... Browse Code »

This function wasn't really doing the right thing. There was no need
to update the inode size at this point and the updating of the
i_blocks field has now been moved to the places where di_blocks is
updated. A result of this patch and some those preceeding it is that
unlocking a glock is now a much more efficient process, since there
is no longer any requirement to copy data from the gfs2 inode into
the vfs inode at this point.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:52 +0800
e7c698d74 [GFS2] Inode number is constant ... Browse Code »

Since the inode number is constant, we don't need to keep updating
it everytime we refresh the other inode fields.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:48 +0800
6b124d8db [GFS2] Only set inode flags when required ... Browse Code »

We were setting the inode flags from GFS2's flags far too often, even when they
couldn't possibly have changed. This patch reduces the amount of flag
setting going on so that we do it only when the inode is read in or
when the flags have changed. The create case is covered by the "when
the inode is read in" case.

This also fixes a bug where we didn't set S_SYNC correctly.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:45 +0800
294caaa3b [GFS2] Tidy up 0 initialisations in inode.c ... Browse Code »

We don't need to use endian conversions for 0 initialisations
when creating a new on-disk inode.

Cc: Christoph Hellwig
Cc: Alexey Dobriyan
Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:33 +0800
bfded27ba [GFS2] Shrink gfs2_inode (8) - i_vn ... Browse Code »

This shrinks the size of the gfs2_inode by 8 bytes by
replacing the version counter with a one bit valid/invalid
flag.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:30 +0800
a9583c798 [GFS2] Shrink gfs2_inode (7) - di_payload_format ... Browse Code »

This is almost never used. Its there for backward
compatibility with GFS1. It doesn't need its own
field since it can always be calculated from the
inode mode & flags. This saves a bit more space
in the gfs2_inode.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:26 +0800
1a7b1eed5 [GFS2] Shrink gfs2_inode (6) - di_atime/di_mtime/di_ctime ... Browse Code »

Remove the di_[amc]time fields and use inode->i_[amc]time
fields instead. This saves 24 bytes from the gfs2_inode.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:23 +0800
4f56110a0 [GFS2] Shrink gfs2_inode (5) - di_nlink ... Browse Code »

Remove the di_nlink field in favour of inode->i_nlink and
update the nlink handling to use the proper macros. This
saves 4 bytes.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:20 +0800
2933f9254 [GFS2] Shrink gfs2_inode (4) - di_uid/di_gid ... Browse Code »

Remove duplicate di_uid/di_gid fields in favour of using
inode->i_uid/inode->i_gid instead. This saves 8 bytes.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:17 +0800
b60623c23 [GFS2] Shrink gfs2_inode (3) - di_mode ... Browse Code »

This removes the duplicate di_mode field in favour of using the
inode->i_mode field. This saves 4 bytes.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:14 +0800
e7f14f4d0 [GFS2] Shrink gfs2_inode (2) - di_major/di_minor ... Browse Code »

This removes the device numbers from this structure by using
inode->i_rdev instead. It also cleans up the code in gfs2_mknod.
It results in shrinking the gfs2_inode by 8 bytes.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:11 +0800
af339c024 [GFS2] Shrink gfs2_inode (1) - di_header/di_num ... Browse Code »

The metadata header doesn't need to be stored in the incore
struct gfs2_inode since its constant, and this patch removes it.
Also, there is already a field for the inode's number in the
struct gfs2_inode, so we don't need one in struct gfs2_dinode_host
as well.

This saves 28 bytes of space in the struct gfs2_inode.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:07 +0800
4cc14f0b8 [GFS2] Change argument to gfs2_dinode_print ... Browse Code »

Change argument for gfs2_dinode_print in order to prepare
for removal of duplicate fields between struct inode and
struct gfs2_dinode_host.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:03 +0800
ea744d01c [GFS2] Move gfs2_dinode_in to inode.c ... Browse Code »

gfs2_dinode_in() is only ever called from one place, so move it
to that place (in inode.c) and make it static.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:00 +0800
891ea1471 [GFS2] Change argument to gfs2_dinode_in ... Browse Code »

This is a preliminary patch to enable the removal of fields
in gfs2_dinode_host which are duplicated in struct inode.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:33:57 +0800
539e5d6b7 [GFS2] Change argument of gfs2_dinode_out ... Browse Code »

Everywhere this was called, a struct gfs2_inode was available,
but despite that, it was always called with a struct gfs2_dinode
as an argument. By making this change it paves the way to start
eliminating fields duplicated between the kernel's struct inode
and the struct gfs2_dinode.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:33:54 +0800
b44b84d76 [GFS2] gfs2 misc endianness annotations ... Browse Code »

Signed-off-by: Al Viro
Signed-off-by: Steven Whitehouse

Al Viro
2006-11-30 23:33:46 +0800
629a21e7e [GFS2] split and annotate gfs2_inum ... Browse Code »

Signed-off-by: Al Viro
Signed-off-by: Steven Whitehouse

Al Viro
2006-11-30 23:33:32 +0800