Eric Lee / smarc-fsl-linux-kernel

10 Oct, 2007

10 commits

49e61f2ef [GFS2] Move inode deletion out of blocking_cb ... Browse Code »

Move inode deletion code out of blocking_cb handle_callback route to
avoid racy conditions that end up blocking lock_dlm1 thread. Fix
bugzilla 286821.

Signed-off-by: Wendy Cheng
Signed-off-by: Steven Whitehouse

Wendy Cheng
2007-10-10 15:56:17 +0800
b4c20166d [GFS2] flocks from same process trip kernel BUG at fs/gfs2/glock.c:1118! ... Browse Code »

This patch adds a new flag to the gfs2_holder structure GL_FLOCK.
It is set on holders of glocks representing flocks. This flag is
checked in add_to_queue() and a process is permitted to queue more
than one holder onto a glock if it is set. This solves the issue
of a process not being able to do multiple flocks on the same file.
Through a single descriptor, a process can now promote and demote
flocks. Through multiple descriptors a process can now queue
multiple flocks on the same file. There's still the problem of
a process deadlocking itself (because gfs2 blocking locks are not
interruptible) by queueing incompatible deadlock.

Signed-off-by: Abhijith Das
Signed-off-by: Steven Whitehouse

Abhijith Das
2007-10-10 15:56:14 +0800
c4f68a130 [GFS2] delay glock demote for a minimum hold time ... Browse Code »

When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in
a cluster, it will deadlock. The reason is that do_no_page() will repeatedly
call gfs2_sharewrite_nopage(), because each node keeps giving up the glock
too early, and is forced to call unmap_mapping_range(). This bumps the
mapping->truncate_count sequence count, forcing do_no_page() to retry. This
patch institutes a minimum glock hold time a tenth a second. This insures
that even in heavy contention cases, the node has enough time to get some
useful work done before it gives up the glock.

A second issue is that when gfs2_glock_dq() is called from within a page fault
to demote a lock, and the associated page needs to be written out, it will
try to acqire a lock on it, but it has already been locked at a higher level.
This patch puts makes gfs2_glock_dq() use the work queue as well, to avoid this
issue. This is the same patch as Steve Whitehouse originally proposed to fix
this issue, execpt that gfs2_glock_dq() now grabs a reference to the glock
before it queues up the work on it.

Signed-off-by: Benjamin E. Marzinski
Signed-off-by: Steven Whitehouse

Benjamin Marzinski
2007-10-10 15:55:48 +0800
a947e0335 [GFS2] Wendy's dump lockname in hex & fix glock dump ... Browse Code »

With this patch, gfs2 glockdump through the debugfs filesystem will only
dump glocks for the specified filesystem instead of all glocks. Also, to
aid debugging, the glock number is dumped in hex instead of decimal.

Signed-off-by: Steven Whitehouse
Signed-off-by: S. Wendy Cheng
Signed-off-by: Abhijith Das

Abhijith Das
2007-10-10 15:55:41 +0800
8fbbfd214 [GFS2] Reduce number of gfs2_scand processes to one ... Browse Code »

We only need a single gfs2_scand process rather than the one
per filesystem which we had previously. As a result the parameter
determining the frequency of gfs2_scand runs becomes a module
parameter rather than a mount parameter as it was before.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-10-10 15:55:08 +0800
4ef290025 [GFS2] mark struct *_operations const ... Browse Code »

these struct *_operations are all method tables, thus should be const.

Signed-off-by: Denis Cheng
Signed-off-by: Steven Whitehouse

Denis Cheng
2007-10-10 15:55:03 +0800
7b08fc620 [GFS2] Fix an oops in glock dumping ... Browse Code »

This fixes an oops which was occurring during glock dumping due to the
seq file code not taking a reference to the glock. Also this fixes a
memory leak which occurred in certain cases, in turn preventing the
filesystem from unmounting.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-10-10 15:54:49 +0800
aa0481e58 [GFS2] Clean up duplicate includes in fs/gfs2/ ... Browse Code »

This patch cleans up duplicate includes in
fs/gfs2/

Signed-off-by: Jesper Juhl
Signed-off-by: Steven Whitehouse

Jesper Juhl
2007-10-10 15:54:44 +0800
26caee5bc [GFS2] Fix calculation of demote state ... Browse Code »

If a glock is in the exclusive state and a request for demote to
deferred has been received, then further requests for demote to
shared are being ignored. This patch fixes that by ensuring that
we demote to unlocked in that case.

Signed-off-by: Josef Whiter
Signed-off-by: Steven Whitehouse

Josef Whiter
2007-10-10 15:54:42 +0800
87124e581 [GFS2] Fix two races relating to glock callbacks ... Browse Code »

One of the races relates to referencing a variable while not holding
its protecting spinlock. The patch simply moves the test inside the
spin lock. The other races occurs when a demote to unlocked request
occurs during the time a demote to shared request is already running.
This of course only happens in the case that the lock was in the
exclusive mode to start with. The patch adds a check to see if another
demote request has occurred in the mean time and if it has, then it
performs a second demote.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-10-10 15:54:39 +0800

09 Jul, 2007

4 commits

eaf5bd3ca [GFS2] Simplify multiple glock aquisition ... Browse Code »

There is a bug in the code which acquires multiple glocks where if the
initial out-of-order attempt fails part way though we can land up trying
to acquire the wrong number of glocks. This is part of the fix for red
hat bz #239737. The other part of the bz doesn't apply to upstream
kernels since it was fixed by:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d3717bdf8f08a0e1039158c8bab2c24d20f492b6

Since the out-of-order code doesn't appear to add anything to the
performance of GFS2, this patch just removed it rather than trying to
fix it. It should be much easier to see whats going on here now. In
addition, we don't allocate any memory unless we are using a lot of
glocks (which is a relatively uncommon case).

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-07-09 15:23:50 +0800
d93cfa988 [GFS2] Fix deallocation issues ... Browse Code »

There were two issues during deallocation of unlinked inodes. The
first was relating to the use of a "try" lock which in the case of
the inode lock wasn't trying hard enough to deallocate in all
circumstances (now changed to a normal glock) and in the case of
the iopen lock didn't wait for the demotion of the shared lock before
attempting to get the exclusive lock, and thereby sometimes (timing dependent)
not completing the deallocation when it should have done.

The second issue related to the lack of a way to invalidate dcache entries
on remote nodes (now fixed by this patch) which meant that unlinks were
taking a long time to return disk space to the fs. By adding some code to
invalidate the dcache entries across the cluster for unlinked inodes, that
is now fixed.

This patch was written jointly by Abhijith Das and Steven Whitehouse.

Signed-off-by: Abhijith Das
Signed-off-by: Steven Whitehouse

Abhijith Das
2007-07-09 15:23:36 +0800
dbb7cae2a [GFS2] Clean up inode number handling ... Browse Code »

This patch cleans up the inode number handling code. The main difference
is that instead of looking up the inodes using a struct gfs2_inum_host
we now use just the no_addr member of this structure. The tests relating
to no_formal_ino can then be done by the calling code. This has
advantages in that we want to do different things in different code
paths if the no_formal_ino doesn't match. In the NFS patch we want to
return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
no_formal_ino doesn't match and thus we can withdraw in this case.

In order to later fix bz #201012, we need to be able to look up an inode
without knowing no_formal_ino, as the only information that is known to
us is the on-disk location of the inode in question.

This patch will also help us to fix bz #236099 at a later date by
cleaning up a lot of the code in that area.

There are no user visible changes as a result of this patch and there
are no changes to the on-disk format either.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-07-09 15:22:24 +0800
cd81a4bac [GFS2] Addendum patch 2 for gfs2_grow ... Browse Code »

This addendum patch 2 corrects three things:

1. It fixes a stupid mistake in the previous addendum that broke gfs2.
Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html
2. It fixes a problem that Dave Teigland pointed out regarding the
external declarations in ops_address.h being in the wrong place.
3. It recasts a couple more %llu printks to (unsigned long long)
as requested by Steve Whitehouse.

I would have loved to put this all in one revised patch, but there was
a rush to get some patches for RHEL5. Therefore, the previous patches
were applied to the git tree "as is" and therefore, I'm posting another
addendum. Sorry.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Robert Peterson
2007-07-09 15:22:19 +0800

01 May, 2007

8 commits

37fde8ca6 [GFS2] Uncomment sprintf_symbol calling code ... Browse Code »

Now that the patch from -mm has gone upstream, we can uncomment the code
in GFS2 which uses sprintf_symbol.

Signed-off-by: Steven Whitehouse
Cc: Robert Peterson

Steven Whitehouse
2007-05-01 16:51:39 +0800
5f8820960 [GFS2] lockdump improvements ... Browse Code »

The patch below consists of the following changes (in code order):

1. I fixed a minor compiler warning regarding the printing of
a kernel symbol address.
2. I implemented a suggestion from Dave Teigland that moves
the debugfs information for gfs2 into a subdirectory so
we can easily expand our use of debugfs in the future.
The current code keeps the glock information in:
/debug/gfs2/
With the patch, the new code keeps the glock information in:
/debug/gfs2//glock
That will allow us to create more debugfs files in the future.
3. This fixes a bug whereby a failed mount attempt causes the
debugfs file to not be deleted. Failed mount attempts should
always clean up after themselves, including deleting the
debugfs file and/or directory.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Robert Peterson
2007-05-01 16:11:33 +0800
7a0079d9e [GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) ... Browse Code »

This is for Bugzilla Bug 236008: Kernel gpf doing cat /debugfs/gfs2/xxx
(lock dump) seen at the "gfs2 summit". This also fixes the bug that caused
garbage to be printed by the "initialized at" field. I apologize for the
kludge, but that code will all be ripped out anyway when the official
sprint_symbol function becomes available in the Linux kernel. I also
changed some formatting so that spaces are replaced by proper tabs.

Signed-off-by: Robert Peterson
Signed-off-by: Steven Whitehouse

Robert Peterson
2007-05-01 16:11:28 +0800
04b933f27 [GFS2] Red Hat bz 228540: owner references ... Browse Code »

In Testing the previously posted and accepted patch for
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540
I uncovered some gfs2 badness. It turns out that the current
gfs2 code saves off a process pointer when glocks is taken
in both the glock and glock holder structures. Those
structures will persist in memory long after the process has
ended; pointers to poisoned memory.

This problem isn't caused by the 228540 fix; the new capability
introduced by the fix just uncovered the problem.

I wrote this patch that avoids saving process pointers
and instead saves off the process pid. Rather than
referencing the bad pointers, it now does process lookups.
There is special code that makes the output nicer for
printing holder information for processes that have ended.

This patch also adds a stub for the new "sprint_symbol"
function that exists in Andrew Morton's -mm patch set, but
won't go into the base kernel until 2.6.22, since it adds
functionality but doesn't fix a bug.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Robert Peterson
2007-05-01 16:10:55 +0800
420d2a102 [GFS2] Fix a bug on i386 due to evaluation order ... Browse Code »

Since gcc didn't evaluate the last two terms of the expression in
glock.c:1881 as a constant expression, it resulted in an error on
i386 due to the lack of a 64bit divide instruction. This adds some
brackets to fix the problem.

This was reported by Andrew Morton.

Signed-off-by: Steven Whitehouse
Cc: Andrew Morton

Steven Whitehouse
2007-05-01 16:10:42 +0800
3b8249f61 [GFS2] Fix bz 224480 and cleanup glock demotion code ... Browse Code »

This patch prevents the printing of a warning message in cases where
the fs is functioning normally by handing off responsibility for
unlinked, but still open inodes, to another node for eventual deallocation.
Also, there is now an improved system for ensuring that such requests
to other nodes do not get lost. The callback on the iopen lock is
only ever called when i_nlink == 0 and when a node is unable to deallocate
it due to it still being in use on another node. When a node receives
the callback therefore, it knows that i_nlink must be zero, so we mark
it as such (in gfs2_drop_inode) in order that it will then attempt
deallocation of the inode itself.

As an additional benefit, queuing a demote request no longer requires
a memory allocation. This simplifies the code for dealing with gfs2_holders
as it removes one special case.

There are two new fields in struct gfs2_glock. gl_demote_state is the
state which the remote node has requested and gl_demote_time is the
time when the request came in. Both fields are only valid when the
GLF_DEMOTE flag is set in gl_flags.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-05-01 16:10:39 +0800
5c7342d89 [GFS2] fix bz 231369, gfs2 will oops if you specify an invalid mount option ... Browse Code »

If you specify an invalid mount option when trying to mount a gfs2 filesystem,
gfs2 will oops. The attached patch resolves this problem.

Signed-off-by: Josef Whiter
Signed-off-by: Steven Whitehouse

Josef Whiter
2007-05-01 16:10:32 +0800
7c52b166c [GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540) ... Browse Code »

The attached patch resolves bz 228540. This adds the capability
for gfs2 to dump gfs2 locks through the debugfs file system.
This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from
gfs2 because all the ioctls were stripped out. Please see the bugzilla
for more history about the fix. This patch is also attached to the bugzilla
record.

The patch is against Steve Whitehouse's latest nmw git tree kernel
(2.6.21-rc1) and has been tested on system trin-10.

Signed-off-by: Robert Peterson
Signed-off-by: Steven Whitehouse

Robert Peterson
2007-05-01 16:10:29 +0800

08 Mar, 2007

2 commits

95d97b7dd [GFS2] build fix ... Browse Code »

fs/gfs2/glock.c:2198: error: 'THIS_MODULE' undeclared here (not in a function)

Cc: Steven Whitehouse
Signed-off-by: Andrew Morton

akpm@linux-foundation.org
2007-03-08 03:03:25 +0800
631c42e17 [GFS2] go_drop_bh is never used, so remove it ... Browse Code »

The ->go_drop_bh function is never used, so this removes it and the single
caller,

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-03-08 03:02:53 +0800

06 Feb, 2007

10 commits

61be084ef [GFS2] Put back semaphore to avoid umount problem ... Browse Code »

Dave Teigland fixed this bug a while back, but I managed to mistakenly
remove the semaphore during later development. It is required to avoid
the list of inodes changing during an invalidate_inodes call. I have
made it an rwsem since the read side will be taken frequently during
normal filesystem operation. The write site will only happen during
umount of the file system.

Also the bug only triggers when using the DLM lock manager and only then
under certain conditions as its timing related.

Signed-off-by: Steven Whitehouse
Cc: David Teigland

Steven Whitehouse
2007-02-06 02:38:14 +0800
d043e1900 [GFS2] Fix typo in glock.c ... Browse Code »

This is a one letter typo fix in glock.c, spotted by Rob Kenna.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-02-06 02:37:41 +0800
90101c318 [GFS2] Compile fix for glock.c ... Browse Code »

This one liner got missed from the previous patch.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-02-06 02:37:35 +0800
12132933c [GFS2] Remove queue_empty() function ... Browse Code »

This function is not longer required since we do not do recursive
locking in the glock layer. As a result all its callers can be
replaceed with list_empty() calls.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-02-06 02:37:32 +0800
b5d32bead [GFS2] Tidy up glops calls ... Browse Code »

This patch doesn't make any changes to the ordering of the various
operations related to glocking, but it does tidy up the calls to the
glops.c functions to make the structure more obvious.

The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be
made static within glock.c since they are called by every set of glock
operations. The xmote_th and drop_th glock operations are then made
conditional upon those two routines existing and called from the
previously mentioned functions in glock.c respectively.

Also it can be seen that the go_sync operation isn't needed since it can
easily be replaced by calls to xmote_bh and drop_bh respectively. This
results in no longer (confusingly) calling back into routines in glock.c
from glops.c and also reducing the glock operations by one member.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-02-06 02:37:26 +0800
1c0f4872d [GFS2] Remove local exclusive glock mode ... Browse Code »

Here is a patch for GFS2 to remove the local exclusive flag. In
the places it was used, mutex's are always held earlier in the
call path, so it appears redundant in the LM_ST_SHARED case.

Also, the GFS2 holders were setting local exclusive in any case where
the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock
code where the flag was tested have been replaced with tests for the
lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the
same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well
as globally exclusive).

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-02-06 02:37:20 +0800
6bd9c8c2f [GFS2] Remove unused go_callback operation ... Browse Code »

This is never used, so we might as well remove it.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-02-06 02:37:17 +0800
e5dab552c [GFS2] Remove the "greedy" function from glock.[ch] ... Browse Code »

The "greedy" code was an attempt to retain glocks for a minimum length
of time when they relate to mmap()ed files. The current implementation
of this feature is not, however, ideal in that it required allocating
memory in order to do this and its overly complicated.

It also misses the mark by ignoring the other I/O operations which are
just as likely to suffer from the same problem. So the plan is to remove
this now and then add the functionality back as part of the glock state
machine at a later date (and thus take into account all the possible
users of this feature)

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-02-06 02:37:14 +0800
fee852e37 [GFS2] Shrink gfs2_inode memory by half ... Browse Code »

Here is something I spotted (while looking for something entirely
different) the other day.

Rather than using a completion in each and every struct gfs2_holder,
this removes it in favour of hashed wait queues, thus saving a
considerable amount of memory both on the stack (where a number of
gfs2_holder structures are allocated) and in particular in the
gfs2_inode which has 8 gfs2_holder structures embedded within it.

As a result on x86_64 the gfs2_inode shrinks from 2488 bytes to
1912 bytes, a saving of 576 bytes per inode (no thats not a typo!).
In actual practice we get a much better result than that since
now that a gfs2_inode is under the 2048 byte barrier, we get two
per 4k slab page effectively halving the amount of memory required
to store gfs2_inodes.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-02-06 02:37:11 +0800
3699e3a44 [GFS2] Clean up/speed up readdir ... Browse Code »

This removes the extra filldir callback which gfs2 was using to
enclose an attempt at readahead for inodes during readdir. The
code was too complicated and also hurts performance badly in the
case that the getdents64/readdir call isn't being followed by
stat() and it wasn't even getting it right all the time when it
was.

As a result, on my test box an "ls" of a directory containing 250000
files fell from about 7mins (freshly mounted, so nothing cached) to
between about 15 to 25 seconds. When the directory content was cached,
the time taken fell from about 3mins to about 4 or 5 seconds.

Interestingly in the cached case, running "ls -l" once reduced the time
taken for subsequent runs of "ls" to about 6 secs even without this
patch. Now it turns out that there was a special case of glocks being
used for prefetching the metadata, but because of the timeouts for these
locks (set to 10 secs) the metadata was being timed out before it was
being used and this the prefetch code was constantly trying to prefetch
the same data over and over.

Calling "ls -l" meant that the inodes were brought into memory and once
the inodes are cached, the glocks are not disposed of until the inodes
are pushed out of the cache, thus extending the lifetime of the glocks,
and thus bringing down the time for subsequent runs of "ls"
considerably.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2007-02-06 02:37:04 +0800

08 Dec, 2006

1 commit

1c1afa3c0 Merge master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw ... Browse Code »

* master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (73 commits)
[DLM] Clean up lowcomms
[GFS2] Change gfs2_fsync() to use write_inode_now()
[GFS2] Fix indent in recovery.c
[GFS2] Don't flush everything on fdatasync
[GFS2] Add a comment about reading the super block
[GFS2] Mount problem with the GFS2 code
[GFS2] Remove gfs2_check_acl()
[DLM] fix format warnings in rcom.c and recoverd.c
[GFS2] lock function parameter
[DLM] don't accept replies to old recovery messages
[DLM] fix size of STATUS_REPLY message
[GFS2] fs/gfs2/log.c:log_bmap() fix printk format warning
[DLM] fix add_requestqueue checking nodes list
[GFS2] Fix recursive locking in gfs2_getattr
[GFS2] Fix recursive locking in gfs2_permission
[GFS2] Reduce number of arguments to meta_io.c:getbuf()
[GFS2] Move gfs2_meta_syncfs() into log.c
[GFS2] Fix journal flush problem
[GFS2] mark_inode_dirty after write to stuffed file
[GFS2] Fix glock ordering on inode creation
...

Linus Torvalds
2006-12-08 01:13:20 +0800

30 Nov, 2006

5 commits

0ac230699 [GFS2] lock function parameter ... Browse Code »

Fix function parameter typing:
fs/gfs2/glock.c:100: warning: function declaration isn't a prototype

Signed-off-by: Randy Dunlap
Signed-off-by: Steven Whitehouse

Randy Dunlap
2006-11-30 23:37:18 +0800
b004157ab [GFS2] Fix journal flush problem ... Browse Code »

This fixes a bug which resulted in poor performance due to flushing
the journal too often. The code path in question was via the inode_go_sync()
function in glops.c. The solution is not to flush the journal immediately
when inodes are ejected from memory, but batch up the work for glockd to
deal with later on. This means that glocks may now live on beyond the end of
the lifetime of their inodes (but not very much longer in the normal case).

Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in
calculation of the number of free journal blocks.

The gfs2_logd process has been altered to be more responsive to the journal
filling up. We now wake it up when the number of uncommitted journal blocks
has reached the threshold level rather than trying to flush directly at the
end of each transaction. This again means doing fewer, but larger, log
flushes in general.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:36:42 +0800
1a14d3a68 [GFS2] Simplify glops functions ... Browse Code »

The go_sync callback took two flags, but one of them was set on every
call, so this patch removes once of the flags and makes the previously
conditional operations (on this flag), unconditional.

The go_inval callback took three flags, each of which was set on every
call to it. This patch removes the flags and makes the operations
unconditional, which makes the logic rather more obvious.

Two now unused flags are also removed from incore.h.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:36:30 +0800
ab923031c [GFS2] Fix memory allocation in glock.c ... Browse Code »

Change from GFP_KERNEL to GFP_NOFS as this was causing a
slow down when trying to push inodes from cache.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:35:46 +0800
c594d8866 [GFS2] Remove unused GL_DUMP flag ... Browse Code »

There is no way to set the GL_DUMP flag, and in any case the
same thing can be done with systemtap if required for debugging,
so this removes it.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2006-11-30 23:34:40 +0800