Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

05 Jan, 2009

7 commits

97cc1025b GFS2: Kill two daemons with one patch

This patch removes the two daemons, gfs2_scand and gfs2_glockd
and replaces them with a shrinker which is called from the VM.

The net result is that GFS2 responds better when there is memory
pressure, since it shrinks the glock cache at the same rate
as the VFS shrinks the dcache and icache. There are no longer
any time-based criteria for shrinking glocks; they are kept
until such time as the VM asks for more memory, and then we
demote just as many glocks as required.
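
A rough userspace sketch of the shrinker idea follows. It is not the patch's code: every `demo_` name is hypothetical, and the convention that a scan count of zero only reports the cache size is an assumption modelled on the shrinker callbacks of kernels from that period. It shows entries being demoted oldest first, and only as many as the caller asks for.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cached glock; not the real struct gfs2_glock. */
struct demo_glock {
    struct demo_glock *next;
    int demoted;
};

/* LRU list: head is the least recently used entry. */
struct demo_lru {
    struct demo_glock *head;
    struct demo_glock *tail;
    size_t count;
};

/* Append a glock at the most recently used end of the list. */
static void demo_lru_add(struct demo_lru *lru, struct demo_glock *gl)
{
    gl->next = NULL;
    gl->demoted = 0;
    if (lru->tail)
        lru->tail->next = gl;
    else
        lru->head = gl;
    lru->tail = gl;
    lru->count++;
}

/*
 * Shrinker-style callback: demote at most nr_to_scan entries, oldest
 * first, and return how many remain cached. A value of zero for
 * nr_to_scan only queries the current cache size.
 */
static size_t demo_shrink(struct demo_lru *lru, size_t nr_to_scan)
{
    while (nr_to_scan-- > 0 && lru->head) {
        struct demo_glock *gl = lru->head;

        lru->head = gl->next;
        if (!lru->head)
            lru->tail = NULL;
        gl->next = NULL;
        gl->demoted = 1;
        lru->count--;
    }
    return lru->count;
}
```

There is no timer anywhere in the sketch: nothing is demoted until the caller (standing in for the VM) asks, which is the behaviour the message describes.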

There are potential future changes to this code, including the
possibility of sorting the glocks which are to be written back
into inode number order, to get a better I/O ordering. It would
be very useful to have an elevator based workqueue implementation
for this, as that would automatically deal with the read I/O cases
at the same time.

This patch is my answer to Andrew Morton's remark, made during
the initial review of GFS2, asking why GFS2 needs so many kernel
threads, the answer being that it doesn't :-) This patch is a
net loss of about 200 lines of code.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2009-01-05 15:39:09 +0800
383f01fbf GFS2: Banish struct gfs2_dinode_host

The final field in gfs2_dinode_host was the i_flags field. That's
renamed to i_diskflags in order to avoid confusion with the existing
inode flags, and moved into the inode proper at a suitable location
to avoid creating a "hole".

At that point struct gfs2_dinode_host is no longer needed and as
promised (quite some time ago!) it can now be removed completely.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2009-01-05 15:38:59 +0800
c9e988867 GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize

This patch moves the i_size field out of gfs2_dinode_host and,
following the ext3 convention, renames it i_disksize.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2009-01-05 15:38:58 +0800
3767ac21f GFS2: Move di_eattr into "proper" inode

This moves the di_eattr field out of gfs2_dinode_host and
into the inode proper.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2009-01-05 15:38:57 +0800
ad6203f2b GFS2: Move "entries" into "proper" inode

This moves the directory entry count into the proper inode.
Potentially we could get this to share the space used by
something else in the future, but this is one more step
on the way to removing the gfs2_dinode_host structure.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2009-01-05 15:38:56 +0800
bcf0b5b34 GFS2: Move generation number into "proper" part of inode

This moves the generation number from the gfs2_dinode_host
into the gfs2_inode structure. Eventually the plan is to get
rid of the gfs2_dinode_host structure completely.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2009-01-05 15:38:55 +0800
b27605837 GFS2: Rationalise header files

Move the contents of some headers which contained very
little into more sensible places, and remove the original
header files. This should make it easier to find things.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2009-01-05 15:38:48 +0800

14 Nov, 2008

1 commit

3de7be335 CRED: Wrap task credential accesses in the GFS2 filesystem

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: Steven Whitehouse
Cc: cluster-devel@redhat.com
Signed-off-by: James Morris

David Howells
2008-11-14 07:38:53 +0800

18 Sep, 2008

1 commit

719ee3446 GFS2: high time to take some time over atime

Until now, we've used the same scheme as GFS1 for atime. This has failed
since atime is a per vfsmnt flag, not a per fs flag and as such the
"noatime" flag was not getting passed down to the filesystems. This
patch removes all the "special casing" around atime updates and we
simply use the VFS's atime code.

The net result is that GFS2 will now support all the same atime related
mount options of any other filesystem on a per-vfsmnt basis. We do lose
the "lazy atime" updates, but we gain "relatime". We could add lazy
atime to the VFS at a later date, if there is a requirement for that
variant still - I suspect relatime will be enough.

Also we lose about 100 lines of code after this patch has been applied,
and I have a suspicion that it will speed things up a bit, even when
atime is "on". So it seems like a nice clean up as well.

From a user perspective, everything stays the same except the loss of
the per-fs atime quantum tweakable (ought to be per-vfsmnt at the very
least, and to be honest I don't think anybody ever used it) and that a
number of options which were ignored before now work correctly.

Please let me know if you've got any comments. I'm pushing this out
early so that you can all see what my plans are.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2008-09-18 20:53:59 +0800

05 Sep, 2008

1 commit

bd1eb8818 GFS2: Use an IS_ERR test rather than a NULL test

In case of error, the function gfs2_inode_lookup returns an
ERR pointer, but never returns a NULL pointer. So a NULL test that
necessarily comes after an IS_ERR test should be deleted, and a NULL
test that may come after a call to this function should be
strengthened by an IS_ERR test.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

//
@match_bad_null_test@
expression x, E;
statement S1,S2;
@@
x = gfs2_inode_lookup(...)
... when != x = E
* if (x != NULL)
S1 else S2
//
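
For readers unfamiliar with the ERR pointer convention, here is a minimal userspace sketch of it. The three helpers imitate the kernel's `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` but are reimplemented here for illustration only; `demo_inode_lookup()` and the other `demo_` names are hypothetical and merely stand in for a function that, like gfs2_inode_lookup, reports failure through an ERR pointer and never through NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel helpers, for illustration only. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error codes live in the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct demo_inode { int ino; };

static struct demo_inode demo_table[1] = { { 42 } };

/* Hypothetical lookup: returns a valid pointer or ERR_PTR(), never NULL. */
static struct demo_inode *demo_inode_lookup(int ino)
{
    if (ino == demo_table[0].ino)
        return &demo_table[0];
    return ERR_PTR(-ENOENT);
}

/* Returns 0 on success or a negative errno value. */
static int demo_use_inode(int ino)
{
    struct demo_inode *inode = demo_inode_lookup(ino);

    /* A NULL test here would treat the error pointer as a valid inode. */
    if (IS_ERR(inode))
        return (int)PTR_ERR(inode);
    return 0;
}
```

A failed lookup in this sketch yields a non-NULL pointer, which is why a bare `if (x != NULL)` check after such a call is either redundant or wrong.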

Signed-off-by: Julien Brunel
Signed-off-by: Julia Lawall
Signed-off-by: Steven Whitehouse

Julien Brunel
2008-09-05 21:19:44 +0800

27 Aug, 2008

1 commit

0188d6c58 GFS2: Fix & clean up GFS2 rename

This patch fixes a locking issue in the rename code by ensuring that we hold
the per sb rename lock over both directory and "other" renames which involve
different parent directories.

At the same time, this moves the function gfs2_ok_to_move (which is only
called from one place) into the file that it's called from, so we can mark it
static. This should make the code a bit easier to follow.

Signed-off-by: Steven Whitehouse
Cc: Peter Staubach

Steven Whitehouse
2008-08-27 20:33:10 +0800

27 Jul, 2008

1 commit

a569c711f [PATCH] don't pass nameidata to gfs2_lookupi()

Signed-off-by: Al Viro

Al Viro
2008-07-27 08:53:36 +0800

10 Jul, 2008

1 commit

c9f6a6bbc [GFS2] Remove support for unused and pointless flag

The ability to mark files for direct i/o access when opened
normally is both unused and pointless, so this patch removes
support for that feature.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2008-07-10 23:09:29 +0800

03 Jul, 2008

1 commit

f58ba8891 [GFS2] don't call permission()

GFS2 calls permission() to verify permissions after locks on the files
have been taken.

For this it's sufficient to call gfs2_permission() instead. This
results in the following changes:

- IS_RDONLY() check is not performed
- IS_IMMUTABLE() check is not performed
- devcgroup_inode_permission() is not called
- security_inode_permission() is not called

IS_RDONLY() should be unnecessary anyway, as the per-mount read-only
flag should provide protection against read-only remounts during
operations. do_gfs2_set_flags() has been fixed to perform
mnt_want_write()/mnt_drop_write() to protect against remounting
read-only.

An IS_IMMUTABLE() check has been added to gfs2_permission().

Repeating the security checks seems to be pointless, as they don't
normally change, and if they do, it's independent of the filesystem
state.

Signed-off-by: Miklos Szeredi
Signed-off-by: Steven Whitehouse

Miklos Szeredi
2008-07-03 17:22:01 +0800

12 May, 2008

1 commit

091806edd [GFS2] filesystem consistency error from do_strip

This patch fixes a GFS2 filesystem consistency error reported from
function do_strip. The problem was caused by a timing window
that allowed two vfs inodes to be created in memory that point
to the same file. The problem is fixed by making the vfs's
iget_test, iget_set mechanism check and set a new bit in the
in-core gfs2_inode structure while the vfs inode spin_lock is held.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Bob Peterson
2008-05-12 15:54:53 +0800

10 Apr, 2008

1 commit

16c5f06f1 [GFS2] fix GFP_KERNEL misuses

There are several places where GFP_KERNEL allocations happen under a glock,
which will result in hangs if we're under memory pressure and go to re-enter the
fs in order to flush stuff out. This patch changes the culprits to GFP_NOFS to
keep this problem from happening.
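
The hazard can be modelled with a small userspace toy. Nothing below is kernel code: the `DEMO_GFP_*` flags, `demo_alloc()` and the reclaim hook are hypothetical stand-ins, and the sketch counts a re-entry into the filesystem while the lock is held instead of actually hanging.

```c
#include <stdlib.h>

/* Hypothetical allocation flags mirroring GFP_KERNEL and GFP_NOFS. */
enum demo_gfp { DEMO_GFP_KERNEL, DEMO_GFP_NOFS };

struct demo_fs {
    int glock_held;            /* nonzero while the toy "glock" is held */
    int reentered_under_glock; /* times reclaim re-entered the fs under it */
};

/* Stand-in for memory reclaim calling back into the filesystem. */
static void demo_reclaim_into_fs(struct demo_fs *fs)
{
    if (fs->glock_held)
        fs->reentered_under_glock++; /* the real system could hang here */
}

/* Toy allocator: only a KERNEL-style allocation may recurse into the fs. */
static void *demo_alloc(struct demo_fs *fs, size_t size,
                        enum demo_gfp gfp, int memory_pressure)
{
    if (memory_pressure && gfp == DEMO_GFP_KERNEL)
        demo_reclaim_into_fs(fs);
    return malloc(size);
}

/* Allocate under the lock and report how often reclaim re-entered the fs. */
static int demo_alloc_under_glock(struct demo_fs *fs, enum demo_gfp gfp)
{
    void *p;

    fs->glock_held = 1;
    p = demo_alloc(fs, 64, gfp, 1 /* simulate memory pressure */);
    fs->glock_held = 0;
    free(p);
    return fs->reentered_under_glock;
}
```

With the KERNEL-style flag the toy records one re-entry under the lock; with the NOFS-style flag it records none.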

Signed-off-by: Josef Bacik
Signed-off-by: Steven Whitehouse

Josef Bacik
2008-04-10 16:55:26 +0800

31 Mar, 2008

9 commits

182fe5abd [GFS2] possible null pointer dereference fixup

gfs2_alloc_get may fail so we have to check it to prevent
NULL pointer dereference.

Signed-off-by: Cyrill Gorcunov
Signed-off-by: Steven Whitehouse

Cyrill Gorcunov
2008-03-31 17:41:28 +0800
43a33c53c [GFS2] re-support special inode

A previous commit removed the call to init_special_inode from inode
lookup, which causes problems such as:

# mknod /mnt/gfs2/dev/null c 1 3
# cat /mnt/gfs2/dev/null
cat: /mnt/gfs2/dev/null: Invalid argument

Without special inode support, GFS2 cannot handle character device files,
block device files, FIFOs, or socket files, and so loses many important
features expected of an ordinary file system.

This one-line patch re-adds special inode support.

Signed-off-by: Denis Cheng
Signed-off-by: Steven Whitehouse

Denis Cheng
2008-03-31 17:41:22 +0800
d83225d45 [GFS2] remove gfs2_dev_iops

struct inode_operations gfs2_dev_iops has been identical to gfs2_file_iops
since January 2006, when GFS2 was merged into the mainline kernel.

So one of them can be removed.

Signed-off-by: Denis Cheng
Signed-off-by: Steven Whitehouse

Denis Cheng
2008-03-31 17:41:20 +0800
7afd88d91 [GFS2] Fix a page lock / glock deadlock

We've previously been using a "try lock" in readpage on the basis that
it would prevent deadlocks due to the inverted lock ordering (our normal
lock ordering is glock first and then page lock). Unfortunately tests
have shown that this isn't enough. If the glock has a demote request
queued such that run_queue() in the glock code tries to do a demote when
it's called under readpage, then it will try to write out all the dirty
pages, which requires locking them. This then deadlocks with the page
locked by readpage.

The solution is to always require two calls into readpage. The first
unlocks the page, gets the glock and returns AOP_TRUNCATED_PAGE, the
second does the actual readpage and unlocks the glock & page as
required.
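
The two-call protocol can be sketched in userspace as follows. This is a simplified model, not the patch itself: all `demo_` names and the `DEMO_AOP_TRUNCATED_PAGE` value are hypothetical, locks are plain flags, and the way the second call is recognised (the glock flag already being set) is an assumption, since the message does not say how the real code tells the two calls apart.

```c
/* Stand-in for the kernel's AOP_TRUNCATED_PAGE return code. */
#define DEMO_AOP_TRUNCATED_PAGE 1

struct demo_page { int locked; int uptodate; };
struct demo_rp_glock { int held; };

/* Called with the page locked, as the VFS does for readpage. */
static int demo_readpage(struct demo_rp_glock *gl, struct demo_page *page)
{
    if (!gl->held) {
        /*
         * First call: drop the page lock before taking the glock, so
         * the normal order (glock first, then page lock) is restored,
         * and ask the caller to retry.
         */
        page->locked = 0;
        gl->held = 1;
        return DEMO_AOP_TRUNCATED_PAGE;
    }

    /* Second call: glock already held, so do the actual read. */
    page->uptodate = 1;
    gl->held = 0;
    page->locked = 0;
    return 0;
}

/* Caller loop: relock the page and retry while asked to. Returns call count. */
static int demo_read(struct demo_rp_glock *gl, struct demo_page *page)
{
    int ret;
    int calls = 0;

    do {
        page->locked = 1;
        ret = demo_readpage(gl, page);
        calls++;
    } while (ret == DEMO_AOP_TRUNCATED_PAGE);
    return calls;
}
```

In this model the page lock is never held while the glock is being acquired, which is the inversion the try lock failed to avoid.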

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2008-03-31 17:41:12 +0800
77658aad2 [GFS2] Eliminate (almost) duplicate field from gfs2_inode

The blocks counter is almost a duplicate of the i_blocks
field in the VFS inode. The only difference is that i_blocks
can be only 32bits long for 32bit arch without large single file
support. Since GFS2 doesn't handle the non-large single file
case (for 32 bit anyway) this adds a new config dependency on
64BIT || LSF. This has always been the case, however we've never
explicitly said so before.

Even if we do add support for the non-LSF case, we will still
not require this field to be duplicated since we will not be
able to access oversized files anyway.

So the net result of all this is that we shave 8 bytes from a gfs2_inode
and get our config deps correct.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2008-03-31 17:40:55 +0800
ce276b06e [GFS2] Reduce inode size by merging fields

There were three fields being used to keep track of the location
of the most recently allocated block for each inode. These have
been merged into a single field in order to better keep the
data and metadata for an inode close on disk, and also to reduce
the space required for storage.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2008-03-31 17:40:37 +0800
9a0045088 [GFS2] Shrink & rename di_depth

This patch forms a pair with the previous patch which shrunk
di_height. Like that patch di_depth is renamed i_depth and moved
into struct gfs2_inode directly. Also the field goes from 16 bits
to 8 bits since it is also limited to a max value which is rather
small (17 in this case). In addition we also now validate the field
against this maximum value when it's read in.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2008-03-31 17:40:31 +0800
ca390601a [GFS2] Fix debug inode printing

I noticed that the latest change to i_height got rid of the
value from the inode dump. This patch adds it back.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Bob Peterson
2008-03-31 17:39:52 +0800
ecc30c791 [GFS2] Streamline indirect pointer tree height calculation

This patch improves the calculation of the tree height in order to reduce
the number of operations which are carried out on each call to gfs2_block_map.
In the common case, we now make a single comparison, rather than calculating
the required tree height from scratch each time. Also in the case that the
tree does need some extra height, we start from the current height rather than
from zero when we work out what the new height ought to be.
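
A simplified sketch of that calculation, under stated assumptions: capacities are counted in logical blocks (one block at height 0, `diptrs` blocks at height 1, then a factor of `inptrs` per extra level), and all `demo_` names are hypothetical. The real code works from per-superblock tables whose exact layout is not given here.

```c
#define DEMO_MAX_HEIGHT 10

/* Precomputed capacity, in logical blocks, of a tree of each height. */
struct demo_sb {
    unsigned long long capacity[DEMO_MAX_HEIGHT + 1];
};

/* Fill the table once (at mount time in a real filesystem). */
static void demo_sb_init(struct demo_sb *sb, unsigned diptrs, unsigned inptrs)
{
    int h;

    sb->capacity[0] = 1;      /* assumption: height 0 covers one block */
    sb->capacity[1] = diptrs; /* pointers held directly in the dinode */
    for (h = 2; h <= DEMO_MAX_HEIGHT; h++)
        sb->capacity[h] = sb->capacity[h - 1] * inptrs;
}

/* Height needed to map lblock, given the tree's current height. */
static unsigned demo_height_for(const struct demo_sb *sb,
                                unsigned cur_height,
                                unsigned long long lblock)
{
    unsigned h = cur_height;

    /* Common case: a single comparison against the current capacity. */
    if (lblock < sb->capacity[h])
        return h;

    /* Otherwise grow from the current height, not from zero. */
    while (h < DEMO_MAX_HEIGHT && lblock >= sb->capacity[h])
        h++;
    return h;
}
```

For example, with 4 dinode pointers and 8 pointers per indirect block, a tree of height 1 maps blocks 0 to 3, so asking for block 4 grows it to height 2.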

In addition the di_height field is moved into the inode proper and reduced
in size to a u8 since the value must be between 0 and GFS2_MAX_META_HEIGHT (10).

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2008-03-31 17:39:46 +0800

08 Feb, 2008

1 commit

69840b0d0 iget: use iget_failed() in GFS2

Use iget_failed() in GFS2 to kill a failed inode.

Signed-off-by: David Howells
Cc: Steven Whitehouse
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2008-02-08 00:42:27 +0800

25 Jan, 2008

7 commits

1b8177ec1 [GFS2] Lockup on error

I spotted this bug while I was digging around. Looks like it could cause
a lockup in some rare error condition.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Bob Peterson
2008-01-25 16:21:04 +0800
6dbd82248 [GFS2] Reduce inode size by moving i_alloc out of line

It is possible to reduce the size of GFS2 inodes by taking the i_alloc
structure out of the gfs2_inode. This patch allocates the i_alloc
structure whenever it's needed, and frees it afterward. This decreases
the amount of low memory we use at the expense of requiring a memory
allocation for each page or partial page that we write. A quick test
with postmark shows that the overhead is not measurable and I also note
that OCFS2 uses the same approach.

In the future I'd like to solve the problem by shrinking down the size
of the members of the i_alloc structure, but for now, this reduces the
immediate problem of using too much low-memory on x86 and doesn't add
too much overhead.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2008-01-25 16:18:25 +0800
c97bfe435 [GFS2] Remove lock methods for lock_nolock protocol

GFS2 supports two modes of locking: lock_nolock for single-node filesystems
and lock_dlm for cluster-mode locking. This patch removes the GFS2 lock methods
from the file operations table for the lock_nolock protocol. This allows the VFS
to handle POSIX lock and flock logic just as it does for other in-tree
filesystems, without duplication.

Signed-off-by: S. Wendy Cheng
Signed-off-by: Steven Whitehouse

Wendy Cheng
2008-01-25 16:08:15 +0800
2bcd610d2 [GFS2] Don't add glocks to the journal

The only reason for adding glocks to the journal was to keep track
of which locks required a log flush prior to release. We add a
flag to the glock to allow this check to be made in a simpler way.

This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
and means that we can avoid extra work during the journal flush.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2008-01-25 16:07:52 +0800
5561093e2 [GFS2] Introduce gfs2_set_aops()

Just like ext3 we now have three sets of address space operations
to cover the cases of writeback, ordered and journalled data
writes. This means that the individual operations can now become
less complicated as we are able to remove some of the tests for
file data mode from the code.
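
The idea of picking one operations table per data mode, once, can be sketched like this. It is an illustration only: the `demo_` types, the three tables and the mode enum are hypothetical, and the real address space operations have many more methods than the single function pointer used here.

```c
#include <stddef.h>

/* Hypothetical data modes, mirroring writeback, ordered and journalled. */
enum demo_data_mode { DEMO_WRITEBACK, DEMO_ORDERED, DEMO_JDATA };

/* Cut-down operations table with a single method. */
struct demo_aops {
    int (*writepage)(void);
};

/* Each variant does only its own work, with no mode tests inside. */
static int demo_writeback_writepage(void) { return 1; }
static int demo_ordered_writepage(void)   { return 2; }
static int demo_jdata_writepage(void)     { return 3; }

static const struct demo_aops demo_writeback_aops = { demo_writeback_writepage };
static const struct demo_aops demo_ordered_aops   = { demo_ordered_writepage };
static const struct demo_aops demo_jdata_aops     = { demo_jdata_writepage };

struct demo_inode_aops {
    const struct demo_aops *aops;
};

/* Choose the table once, when the inode's data mode is known. */
static void demo_set_aops(struct demo_inode_aops *inode, enum demo_data_mode mode)
{
    switch (mode) {
    case DEMO_WRITEBACK:
        inode->aops = &demo_writeback_aops;
        break;
    case DEMO_ORDERED:
        inode->aops = &demo_ordered_aops;
        break;
    case DEMO_JDATA:
        inode->aops = &demo_jdata_aops;
        break;
    }
}
```

The mode test happens in one place, so the individual operations no longer need to branch on it.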

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2008-01-25 16:07:23 +0800
f91a0d3e2 [GFS2] Remove useless i_cache from inodes

The i_cache was designed to keep references to the indirect blocks
used during block mapping so that they didn't have to be looked
up continually. The idea failed because there are too many places
where the i_cache needs to be freed, and this has in the past been
the cause of many bugs.

In addition there was no performance benefit being gained since the
disk blocks in question were cached anyway. So this patch removes
it in order to simplify the code to prepare for other changes which
would otherwise have had to add further support for this feature.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2008-01-25 16:07:16 +0800
51ff87bdd [GFS2] Clean up internal read function

As requested by Christoph, this patch cleans up GFS2's internal
read function so that it no longer uses the do_generic_mapping_read
function. This function is obsolete and GFS2 is the last user of it.

As a side effect the internal read code gets smaller and easier
to read and gfs2_readpage is split into two. One function has the locking
and the other function has the rest of the logic.

Signed-off-by: Steven Whitehouse
Cc: Christoph Hellwig

Steven Whitehouse
2008-01-25 16:07:11 +0800

10 Oct, 2007

2 commits

7a9f53b3c [GFS2] Alternate gfs2_iget to avoid looking up inodes being freed

There is a possible deadlock between two processes on the same node, where one
process is deleting an inode, and another process is looking for allocated but
unused inodes to delete in order to create more space.

Process A does an iput() on inode X, and its i_count drops to 0. This causes
iput_final() to be called, which puts the inode into state I_FREEING at
generic_delete_inode(). There is no point between when iput_final() is called and
when I_FREEING is set where GFS2 could acquire any glocks. Once I_FREEING is
set, no other process on that node can successfully look up that inode until
the delete finishes.

Process B locks the resource group for the same inode in get_local_rgrp(),
which is called by gfs2_inplace_reserve_i().

Process A tries to lock the resource group for the inode in
gfs2_dinode_dealloc(), but it's already locked by process B.

Process B waits in find_inode for the inode to have the I_FREEING state cleared.

Deadlock.

This patch solves the problem by adding an alternative to gfs2_iget(),
gfs2_iget_skip(), that simply skips any inodes that are in the I_FREEING
state. The alternate test function is just like the original one, except that
it fails if the inode is being freed, and sets a skipped flag. The alternate
set function is just like the original, except that it fails if the skipped
flag is set. Only try_rgrp_unlink() calls gfs2_iget_skip() instead of
gfs2_iget().
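
A userspace sketch of the alternate test and set callbacks, assuming a tiny array in place of the kernel's inode hash. All `demo_` names are hypothetical and the structures are cut down to the two fields the logic needs.

```c
#include <stddef.h>

struct demo_inode {
    unsigned long ino;
    int freeing; /* stands in for the I_FREEING state */
};

/* Lookup key plus the flag shared between the test and set callbacks. */
struct demo_skip_data {
    unsigned long ino;
    int skipped;
};

/* Alternate test: never match an inode that is being freed. */
static int demo_iget_test_skip(const struct demo_inode *inode,
                               struct demo_skip_data *data)
{
    if (inode->ino != data->ino)
        return 0;
    if (inode->freeing) {
        data->skipped = 1; /* remember that we passed over it */
        return 0;
    }
    return 1;
}

/* Alternate set: refuse to set up a new inode if one was skipped. */
static int demo_iget_set_skip(struct demo_inode *inode,
                              const struct demo_skip_data *data)
{
    if (data->skipped)
        return -1; /* nonzero means failure */
    inode->ino = data->ino;
    inode->freeing = 0;
    return 0;
}

/*
 * Lookup over a small cache. Returns a matching live inode, the newly
 * initialised spare, or NULL when the inode is being freed.
 */
static struct demo_inode *demo_iget_skip(struct demo_inode *cache, int n,
                                         struct demo_inode *spare,
                                         unsigned long ino)
{
    struct demo_skip_data data = { ino, 0 };
    int i;

    for (i = 0; i < n; i++)
        if (demo_iget_test_skip(&cache[i], &data))
            return &cache[i];
    if (demo_iget_set_skip(spare, &data))
        return NULL;
    return spare;
}
```

The caller gets NULL back at once for an inode that is being freed, instead of waiting for the freeing state to clear while holding the resource group lock.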

Signed-off-by: Benjamin E. Marzinski
Signed-off-by: Steven Whitehouse

Benjamin Marzinski
2007-10-10 15:56:29 +0800
e9bd2b3ba [GFS2] fix inode meta data corruption

Fix a nasty inode metadata corruption issue by keeping the buffer head in
the icache array. This buffer needs to stay in memory until the journal flush
occurs. Otherwise, gfs2_meta_inode_buffer could do a disk read before the inode
hits disk, which results in metadata corruption. The buffer will be released as
part of the existing journal flush logic.

Signed-off-by: S. Wendy Cheng
Signed-off-by: Steven Whitehouse

Wendy Cheng
2007-10-10 15:55:51 +0800

09 Jul, 2007

5 commits

35dcc52e3 [GFS2] Remove i_mode passing from NFS File Handle

GFS2 has been passing i_mode within the NFS file handle. Apart from the
wrong assumption that there is always room for this extra 16-bit value,
the current gfs2_get_dentry doesn't really need the i_mode to work
correctly. Note that the GFS2 NFS code does go through the same lookup code
path as the direct file access route (where the mode is obtained from name
lookup), but gfs2_get_dentry() is coded for a different purpose. It is not
used at lookup time; it is part of the file access procedure call.
When the call is invoked, if the on-disk inode is not in memory, it has to
be read in. This makes i_mode passing a useless overhead.

Signed-off-by: S. Wendy Cheng
Signed-off-by: Steven Whitehouse

Wendy Cheng
2007-07-09 15:24:11 +0800
bb9bcf061 [GFS2] Obtaining no_formal_ino from directory entry

GFS2 lookup code doesn't ask for the inode's shared glock. This implies that
during in-memory inode creation for an existing file, GFS2 will not read the
inode contents from disk. This leaves no_formal_ino uninitialized at
lookup time. The uninitialized no_formal_ino is subsequently encoded
into the file handle. Clients will get an ESTALE error whenever they try to
access these files.

Signed-off-by: S. Wendy Cheng
Signed-off-by: Steven Whitehouse

Wendy Cheng
2007-07-09 15:24:08 +0800
d93cfa988 [GFS2] Fix deallocation issues

There were two issues during deallocation of unlinked inodes. The
first was relating to the use of a "try" lock which in the case of
the inode lock wasn't trying hard enough to deallocate in all
circumstances (now changed to a normal glock) and in the case of
the iopen lock didn't wait for the demotion of the shared lock before
attempting to get the exclusive lock, and thereby sometimes (timing dependent)
not completing the deallocation when it should have done.

The second issue related to the lack of a way to invalidate dcache entries
on remote nodes (now fixed by this patch) which meant that unlinks were
taking a long time to return disk space to the fs. By adding some code to
invalidate the dcache entries across the cluster for unlinked inodes, that
is now fixed.

This patch was written jointly by Abhijith Das and Steven Whitehouse.

Signed-off-by: Abhijith Das
Signed-off-by: Steven Whitehouse

Abhijith Das
2007-07-09 15:23:36 +0800
037bcbb75 [GFS2] gfs2_lookupi() uninitialised var fix

fs/gfs2/inode.c: In function 'gfs2_lookupi':
fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function

Looks like a real bug to me.

Cc: Steven Whitehouse
Signed-off-by: Andrew Morton
Signed-off-by: Steven Whitehouse

akpm@linux-foundation.org
2007-07-09 15:23:29 +0800
c8cdf4793 [GFS2] Recovery for lost unlinked inodes

Under certain circumstances it's possible (though rather unlikely) that
inodes which were unlinked by one node while still open on another might
get "lost" in the sense that they don't get deallocated if the node
which held the inode open crashed before it was unlinked.

This patch adds the recovery code which allows automatic deallocation of
the inode if it's found during block allocation (the sensible time to
look for such inodes since we are scanning the rgrp's bitmaps anyway at
this time, so it adds no overhead to do this).

Since the inode will have had its i_nlink set to zero, all we need to
trigger recovery is a lookup and an iput(), and the normal deallocation
code takes care of the rest.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-07-09 15:23:26 +0800