Eric Lee / linux-smarc-t335x-v3.2

08 Jun, 2011

1 commit

d205df995 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
GFS2: Processes waiting on inode glock that no processes are holding

Linus Torvalds
2011-06-08 09:44:10 +0800

27 May, 2011

1 commit

b7c2f0362 Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6

* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
gfs2: Drop __TIME__ usage
isdn/diva: Drop __TIME__ usage
atm: Drop __TIME__ usage
dlm: Drop __TIME__ usage
wan/pc300: Drop __TIME__ usage
parport: Drop __TIME__ usage
hdlcdrv: Drop __TIME__ usage
baycom: Drop __TIME__ usage
pmcraid: Drop __DATE__ usage
edac: Drop __DATE__ usage
rio: Drop __DATE__ usage
scsi/wd33c93: Drop __TIME__ usage
scsi/in2000: Drop __TIME__ usage
aacraid: Drop __TIME__ usage
media/cx231xx: Drop __TIME__ usage
media/radio-maxiradio: Drop __TIME__ usage
nozomi: Drop __TIME__ usage
cyclades: Drop __TIME__ usage

Linus Torvalds
2011-05-27 04:19:00 +0800

26 May, 2011

1 commit

8d2c50e3b gfs2: Drop __TIME__ usage

The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.

Cc: Steven Whitehouse
Cc: cluster-devel@redhat.com
Signed-off-by: Michal Marek

Michal Marek
2011-05-26 16:54:37 +0800

25 May, 2011

2 commits

1495f230f vmscan: change shrinker API by passing shrink_control struct

Change each shrinker's API by consolidating the existing parameters into
a shrink_control struct. This will make it simpler to add further features
without touching every shrinker's file.
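
The consolidation described above can be pictured with a small userspace model. This is only a sketch: the struct and function names below are hypothetical stand-ins written for illustration, and the two fields are assumed from the description of "the existing parameters" rather than copied from the kernel headers.

```c
/* Userspace model of the shrinker callback change; not kernel code. */
struct shrink_control_model {
	unsigned int gfp_mask;		/* allocation context (a gfp_t in the kernel) */
	unsigned long nr_to_scan;	/* objects to scan; 0 means "just count" */
};

struct cache_model {
	unsigned long nr_objects;	/* objects currently held in the cache */
};

/* Old style: each parameter is a separate argument. */
static int shrink_old(struct cache_model *c, int nr_to_scan,
		      unsigned int gfp_mask)
{
	unsigned long n = (unsigned long)nr_to_scan;

	(void)gfp_mask;			/* unused in this model */
	if (n > c->nr_objects)
		n = c->nr_objects;
	c->nr_objects -= n;
	return (int)c->nr_objects;	/* objects remaining */
}

/* New style: the parameters travel together in one struct. */
static int shrink_new(struct cache_model *c,
		      const struct shrink_control_model *sc)
{
	return shrink_old(c, (int)sc->nr_to_scan, sc->gfp_mask);
}
```

With this shape, a later field can be added to the struct without changing the signature of every callback, which is the benefit the commit message points at.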

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han
Cc: KOSAKI Motohiro
Cc: Minchan Kim
Acked-by: Pavel Emelyanov
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Acked-by: Rik van Riel
Cc: Johannes Weiner
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Steven Whitehouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ying Han
2011-05-25 23:39:26 +0800
f90e5b5b1 GFS2: Processes waiting on inode glock that no processes are holding

This patch fixes a race in the GFS2 glock state machine that may
result in lockups. The symptom is that all nodes but one will
hang, waiting for a particular glock. All the holder records
will have the "W" (Waiting) bit set. The other node will
typically have the glock stuck in Exclusive mode (EX) with no
holder records, but the dinode will be cached. In other words,
an entry with "I:" will appear in the glock dump for that glock,
but nothing else.

The race has to do with the glock "Pending Demote" bit, which
can be set, then immediately reset, thus losing the fact that
another node needs the glock. The sequence of events is:

1. Something schedules the glock workqueue (e.g. glock request from fs)
2. The glock workqueue gets to the point between the test of the reply pending
bit and the spin lock:

        if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
                finish_xmote(gl, gl->gl_reply);
                drop_ref = 1;
        }
        down_read(&gfs2_umount_flush_sem);    <---- i.e. here
        spin_lock(&gl->gl_spin);

3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
(b) the demote request which sets GLF_PENDING_DEMOTE

4. The following test is executed:

        if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
            gl->gl_state != LM_ST_UNLOCKED &&
            gl->gl_demote_state != LM_ST_EXCLUSIVE) {

This clears the pending demote flag, and gl->gl_demote_state is not equal to
exclusive. However, because the reply from the DLM arrived after we checked
the GLF_REPLY_PENDING flag, gl->gl_state is still unlocked. So although we
cleared the GLF_PENDING_DEMOTE flag, we neither set the GLF_DEMOTE flag nor
reinstated the GLF_PENDING_DEMOTE flag.

The patch closes the timing window by only transitioning the
"Pending demote" bit to the "demote" flag once we know the
other conditions (not unlocked and not exclusive) are met.
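
The difference between the two orderings can be sketched in a single-threaded userspace model. Everything here is a simplified assumption for illustration: the struct, the helper names and the enum values are hypothetical, plain bit operations stand in for the kernel's atomic bitops, and the real code also has to consider timing and locking that this sketch ignores.

```c
#include <stdbool.h>

/* Simplified stand-ins for the glock flags and lock states. */
enum { GLF_PENDING_DEMOTE = 1 << 0, GLF_DEMOTE = 1 << 1 };
enum lm_state { LM_ST_UNLOCKED, LM_ST_EXCLUSIVE, LM_ST_SHARED };

struct glock_model {
	unsigned int flags;
	enum lm_state state;		/* current lock state */
	enum lm_state demote_state;	/* state the other node asked for */
};

/* The "other conditions": not unlocked and not demoting to exclusive. */
static bool demote_allowed(const struct glock_model *gl)
{
	return gl->state != LM_ST_UNLOCKED &&
	       gl->demote_state != LM_ST_EXCLUSIVE;
}

/* Old ordering: the pending bit is consumed before the conditions are known. */
static void promote_demote_racy(struct glock_model *gl)
{
	bool was_pending = (gl->flags & GLF_PENDING_DEMOTE) != 0;

	gl->flags &= ~(unsigned int)GLF_PENDING_DEMOTE;	/* test-and-clear */
	if (was_pending && demote_allowed(gl))
		gl->flags |= GLF_DEMOTE;
}

/* Fixed ordering: clear the pending bit only when it becomes a demote. */
static void promote_demote_fixed(struct glock_model *gl)
{
	if ((gl->flags & GLF_PENDING_DEMOTE) && demote_allowed(gl)) {
		gl->flags &= ~(unsigned int)GLF_PENDING_DEMOTE;
		gl->flags |= GLF_DEMOTE;
	}
}
```

If the model glock is still unlocked when the check runs, the racy variant leaves no flag set at all, while the fixed variant keeps the pending bit so a later pass can act on it.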

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Bob Peterson
2011-05-25 17:37:11 +0800

22 May, 2011

1 commit

26b06a695 GFS2: Wait properly when flushing the ail list

The ail flush code has always relied upon log flushing to prevent
it from spinning needlessly. This fixes it to wait on the last
I/O request submitted (we don't need to wait for all of it)
instead of either spinning with io_schedule or sleeping.

As a result cpu usage of gfs2_logd is much reduced with certain
workloads.

Reported-by: Abhijith Das
Tested-by: Abhijith Das
Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-22 02:21:07 +0800

21 May, 2011

2 commits

6d3117b41 GFS2: Wipe directory hash table metadata when deallocating a directory

The deallocation code for directories in GFS2 is largely divided into
two parts. The first part deallocates any directory leaf blocks and
marks the directory as being a regular file when that is complete. The
second stage was identical to deallocating regular files.

Regular files have their data blocks in a different
address space to directories, and thus what would have been normal data
blocks in a regular file (the hash table in a GFS2 directory) were
deallocated correctly. However, a reference to these blocks was left in the
journal (assuming of course that some previous activity had resulted in
those blocks being in the journal or ail list).

This patch uses the i_depth as a test of whether the inode is an
exhash directory (we cannot test the inode type as that has already
been changed to a regular file at this stage in deallocation).

The original issue was reported by Chris Hertel, who encountered it while
running bonnie++.

Reported-by: Christopher R. Hertel
Cc: Abhijith Das
Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-21 21:05:58 +0800
6c1b8d94b Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (32 commits)
GFS2: Move all locking inside the inode creation function
GFS2: Clean up symlink creation
GFS2: Clean up mkdir
GFS2: Use UUID field in generic superblock
GFS2: Rename ops_inode.c to inode.c
GFS2: Inode.c is empty now, remove it
GFS2: Move final part of inode.c into super.c
GFS2: Move most of the remaining inode.c into ops_inode.c
GFS2: Move gfs2_refresh_inode() and friends into glops.c
GFS2: Remove gfs2_dinode_print() function
GFS2: When adding a new dir entry, inc link count if it is a subdir
GFS2: Make gfs2_dir_del update link count when required
GFS2: Don't use gfs2_change_nlink in link syscall
GFS2: Don't use a try lock when promoting to a higher mode
GFS2: Double check link count under glock
GFS2: Improve bug trap code in ->releasepage()
GFS2: Fix ail list traversal
GFS2: make sure fallocate bytes is a multiple of blksize
GFS2: Add an AIL writeback tracepoint
GFS2: Make writeback more responsive to system conditions
...

Linus Torvalds
2011-05-21 04:28:45 +0800

13 May, 2011

3 commits

f2741d989 GFS2: Move all locking inside the inode creation function

Now that there are no longer any exceptions to the normal inode
creation code path, we can move the parts of the locking code
which were duplicated in mkdir/mknod/create/symlink into the
inode create function.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-13 19:11:17 +0800
160b4026d GFS2: Clean up symlink creation

This moves the symlink specific parts of inode creation
into the function where we initialise the rest of the
dinode. As a result we have one less place where we need
to look up the inode's buffer.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-13 17:34:59 +0800
e2d0a13bb GFS2: Clean up mkdir

This moves the initialisation of the directory into the inode
creation functions to avoid having to duplicate the lookup
of the inode's buffer.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-13 16:55:55 +0800

10 May, 2011

3 commits

32e471ef1 GFS2: Use UUID field in generic superblock

The VFS superblock structure now has a UUID field, so we can use that
in preference to the UUID field in the GFS2 superblock now.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-10 22:01:59 +0800
2ab9cd1c6 GFS2: Rename ops_inode.c to inode.c

This is the final part of the ops_inode.c/inode.c reordering. We
are left with a single file called inode.c which now contains
all the inode operations, as expected.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-10 20:12:49 +0800
64ea54025 GFS2: Inode.c is empty now, remove it

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-10 20:09:53 +0800

09 May, 2011

7 commits

9eed04cd9 GFS2: Move final part of inode.c into super.c

Now inode.c is empty.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-09 23:45:38 +0800
194c011fc GFS2: Move most of the remaining inode.c into ops_inode.c

This is in preparation to remove inode.c and rename ops_inode.c
to inode.c. Also most of the functions which were left in inode.c
relate to the creation and lookup of inodes. I'm intending to work
on consolidating some of that code, and it's easier when it's all in
one place.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-09 23:45:14 +0800
d4b2cf1b0 GFS2: Move gfs2_refresh_inode() and friends into glops.c

Eventually there will only be a single caller of this code, so let's
move it where it can be made static at some future date.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-09 23:44:49 +0800
94fb763b1 GFS2: Remove gfs2_dinode_print() function

This function was intended for debugging purposes, but it is not very
useful. If we want to know what is on disk then all we need is a
block number and gfs2_edit can give us much better information about
what is there. Otherwise, if we are interested in what is stored in
the in-core inode, it doesn't help us out there either.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-09 23:44:29 +0800
3d6ecb7d1 GFS2: When adding a new dir entry, inc link count if it is a subdir

This adds an increment of the link count when we add a new directory
entry, if that entry is itself a directory. This means that we no
longer need separate code to perform this operation.

Since both adding and removing directory entries now automatically
update the parent directory's link count if required, the code is
shorter and simpler than before.
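
A toy model of the idea, with the link count adjusted inside the add and delete helpers rather than by their callers, might look like the following. The types and function names are hypothetical and exist only for this sketch; they are not the GFS2 directory code.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ENTRIES 8

struct dirent_model {
	char name[16];
	bool is_dir;
	bool used;
};

struct dir_model {
	unsigned int nlink;		/* link count of this directory */
	struct dirent_model ents[MAX_ENTRIES];
};

/* Add an entry; a subdirectory's ".." adds one link to the parent. */
static int dir_add(struct dir_model *d, const char *name, bool is_dir)
{
	for (int i = 0; i < MAX_ENTRIES; i++) {
		struct dirent_model *e = &d->ents[i];

		if (e->used)
			continue;
		strncpy(e->name, name, sizeof(e->name) - 1);
		e->name[sizeof(e->name) - 1] = '\0';
		e->is_dir = is_dir;
		e->used = true;
		if (is_dir)
			d->nlink++;
		return 0;
	}
	return -1;			/* directory full */
}

/* Remove an entry; knowing its type lets us drop the parent's link here. */
static int dir_del(struct dir_model *d, const char *name)
{
	for (int i = 0; i < MAX_ENTRIES; i++) {
		struct dirent_model *e = &d->ents[i];

		if (!e->used || strcmp(e->name, name) != 0)
			continue;
		e->used = false;
		if (e->is_dir)
			d->nlink--;
		return 0;
	}
	return -1;			/* no such entry */
}
```

Because both helpers know the entry type, callers in this model never touch the parent's link count themselves.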

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-09 23:43:53 +0800
855d23ce2 GFS2: Make gfs2_dir_del update link count when required

When we remove an entry from a directory, we can save ourselves
some trouble if we know the type of the entry in question, since
if it is itself a directory, we can update the link count of the
parent at the same time as removing the directory entry.

In addition this patch also merges the rmdir and unlink code which
was almost identical anyway. This eliminates the calls to remove
the . and .. directory entries on each rmdir (not needed since the
directory will be deallocated, anyway) which was the only thing preventing
passing the dentry to gfs2_dir_del(). The passing of the dentry
rather than just the name allows us to figure out the type of the entry
which is being removed, and thus adjust the link count when required.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-09 23:42:37 +0800
2baee03fb GFS2: Don't use gfs2_change_nlink in link syscall

There are three users of gfs2_change_nlink which add to the link
count. Two of these are about to be removed in later patches, so
this means that there will be no callers once that happens, allowing
removal of that function, also in a later patch.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-09 23:35:25 +0800

05 May, 2011

2 commits

588da3b3b GFS2: Don't use a try lock when promoting to a higher mode

Previously we marked all locks being promoted to a higher mode
with the try flag to avoid any potential deadlock issues. The
DLM is able to detect these and report them in a way that GFS2 can
deal with them correctly. So we can just request the required mode
and wait for a response without needing to perform this check.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-05 19:36:38 +0800
d192a8e5c GFS2: Double check link count under glock

To avoid any possible races relating to the link count, we need to
recheck it under the inode's glock in all cases where it matters.
To ensure we never get any nasty surprises, this patch also
ensures that once the link count has hit zero it can never be
elevated by rereading in data from disk.
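
The "once zero, always zero" rule can be sketched as a small guard around the refresh path. The struct and function below are hypothetical names for a userspace illustration, not the GFS2 implementation, and the sketch leaves out the glock that would have to be held around the check.

```c
#include <stdint.h>

/* Minimal stand-in for the in-core inode. */
struct inode_model {
	uint32_t nlink;
};

/*
 * Apply a link count that was just re-read from disk. If the in-core
 * count has already reached zero, the inode is treated as unlinked and
 * the on-disk value is ignored, so the count can never be raised again.
 */
static void set_nlink_from_disk(struct inode_model *inode, uint32_t disk_nlink)
{
	if (inode->nlink == 0)
		return;
	inode->nlink = disk_nlink;
}
```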

The only place we cannot provide a proper solution is in rename
in the case where we are removing a target inode and we discover
that the target inode has been already unlinked on another node.
The race window is very small, and we return EAGAIN in this case
to indicate what has happened. The proper solution would be to move
the lookup parts of rename from the vfs into library calls which
the fs could call directly, but that is potentially a very big job
and this fix should cover most cases for now.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-05 19:35:40 +0800

03 May, 2011

3 commits

8f065d365 GFS2: Improve bug trap code in ->releasepage()

If the buffer is dirty or pinned, then as well as printing a
warning, we should also refuse to release the page in
question.

Currently this can occur if there is a race between mmap()ed
writers and O_DIRECT on the same file. With the addition of
->launder_page() in the future, we should be able to close
this gap.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-03 18:49:19 +0800
4f1de0182 GFS2: Fix ail list traversal

In the recent patches to update the AIL list code, I managed to
forget that the ail list lock got dropped, even though I
added a comment specifically to remind myself :(

Reported-by: Barry Marson
Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-05-03 18:48:07 +0800
6905d9e4d GFS2: make sure fallocate bytes is a multiple of blksize

The GFS2 fallocate code chooses a target size for allocating chunks of
space. Whenever it can't find any resource groups with enough space free, it
halves its target. Since this target is in bytes, eventually it will no longer
be a multiple of blksize. As long as there is more space available in the
resource group than the target, this isn't a problem, since gfs2 will use the
actual space available, which is always a multiple of blksize. However,
when GFS2 couldn't fallocate a bigger chunk than the target, it was using the
non-blksize aligned number. This caused a BUG in later code that required
blksize aligned offsets. GFS2 now ensures that bytes is always a multiple of
blksize.
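
The arithmetic behind this can be sketched as follows, assuming the block size is a power of two. The helper names are hypothetical, and the floor of one block when the rounded value reaches zero is an assumption of this sketch rather than something the commit message states.

```c
#include <stdint.h>

/*
 * Round a byte count down to a multiple of blksize (assumed to be a
 * power of two), but never below one block.
 */
static uint64_t align_bytes(uint64_t bytes, uint64_t blksize)
{
	uint64_t mask = ~(blksize - 1);

	bytes &= mask;
	if (bytes == 0)
		bytes = blksize;
	return bytes;
}

/* Halve the target, as described above, then restore the alignment. */
static uint64_t next_target(uint64_t bytes, uint64_t blksize)
{
	return align_bytes(bytes >> 1, blksize);
}
```

For example, with 4096-byte blocks a 12288-byte target halves to 6144, which is not block aligned; masking brings it back down to 4096.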

Signed-off-by: Benjamin Marzinski
Signed-off-by: Steven Whitehouse

Benjamin Marzinski
2011-05-03 18:47:42 +0800

26 Apr, 2011

1 commit

1879fd6a2 add hlist_bl_lock/unlock helpers

Now that the whole dcache_hash_bucket crap is gone, go all the way and
also remove the weird locking layering violations for locking the hash
buckets. Add hlist_bl_lock/unlock helpers to move the locking into the
list abstraction instead of requiring each caller to open code it.
After all allowing for the bit locks is the whole point of these helpers
over the plain hlist variant.
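
The idea of a bit lock living inside the list head can be illustrated in userspace with C11 atomics. This is a sketch under assumptions: the names are hypothetical, the lock is taken to be the lowest bit of the head pointer, and a busy-wait loop stands in for the kernel's bit spinlock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* List head whose lowest pointer bit doubles as a lock bit. */
struct bl_head_model {
	_Atomic uintptr_t first;
};

#define BL_LOCK_BIT ((uintptr_t)1)

/* Spin until we are the caller that flips the lock bit from 0 to 1. */
static void bl_lock(struct bl_head_model *h)
{
	while (atomic_fetch_or_explicit(&h->first, BL_LOCK_BIT,
					memory_order_acquire) & BL_LOCK_BIT)
		;	/* another holder has the bit; keep spinning */
}

/* Clear the lock bit, leaving the pointer bits untouched. */
static void bl_unlock(struct bl_head_model *h)
{
	atomic_fetch_and_explicit(&h->first, ~BL_LOCK_BIT,
				  memory_order_release);
}

/* Return the first node with the lock bit masked off. */
static void *bl_first(struct bl_head_model *h)
{
	return (void *)(atomic_load(&h->first) & ~BL_LOCK_BIT);
}
```

Keeping the lock and unlock steps behind helpers like these means callers never have to know which bit of the head is used, which is the layering point the commit message makes.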

Signed-off-by: Christoph Hellwig
Signed-off-by: Linus Torvalds

Christoph Hellwig
2011-04-26 09:14:10 +0800

20 Apr, 2011

13 commits

c83ae9cad GFS2: Add an AIL writeback tracepoint

Add a tracepoint for monitoring writeback of the AIL.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-04-20 16:01:58 +0800
4667a0ec3 GFS2: Make writeback more responsive to system conditions

This patch adds writeback_control to writing back the AIL
list. This means that we can then take advantage of the
information we get in ->write_inode() in order to set off
some pre-emptive writeback.

In addition, the AIL code is cleaned up a bit to make it
a bit simpler to understand.

There is still more which can usefully be done in this area,
but this is a good start at least.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-04-20 16:01:37 +0800
f42ab0852 GFS2: Optimise glock lru and end of life inodes

The GLF_LRU flag introduced in the previous patch can be
used to check if a glock is on the lru list when a new
holder is queued and if so remove it, without having first
to get the lru_lock.

The main purpose of this patch however is to optimise the
glocks left over when an inode at end of life is being
evicted. Previously such glocks were left with the GLF_LFLUSH
flag set, so that when reclaimed, each one required a log flush.
This patch resets the GLF_LFLUSH flag when there is nothing
left to flush thus preventing later log flushes as glocks are
reused or demoted.

In order to do this, we need to keep track of the number of
revokes which are outstanding, and also to clear the GLF_LFLUSH
bit after a log commit when only revokes have been processed.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-04-20 16:01:17 +0800
627c10b7e GFS2: Improve tracing support (adds two flags)

This adds support for two new flags. One keeps track of whether
the glock is on the LRU list or not. The other isn't really a
flag as such, but an indication of whether the glock has an
attached object or not. This indication is reported without
any locking, which is ok since we do not dereference the object
pointer but merely report whether it is NULL or not.

Also, this fixes one place where a tracepoint was missing, which
was at the point we remove deallocated blocks from the journal.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-04-20 16:00:59 +0800
dba898b02 GFS2: Clean up fsync()

This patch is designed to clean up GFS2's fsync
implementation and ensure that it really does get everything on
disk. Since ->write_inode() has been updated, we can call that
via the vfs library function sync_inode_metadata() and the only
remaining thing that has to be done is to ensure that we get
any revoke records in the log after the inode has been written back.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-04-20 16:00:41 +0800
efc1a9c2a GFS2: Remove unused macro

The buffer_in_io() macro has been unused for some time,
so remove it.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-04-20 16:00:24 +0800
29687a2ac GFS2: Alter point of entry to glock lru list for glocks with an address_space

Rather than allowing the glocks to be scheduled for possible
reclaim as soon as they have exited the journal, this patch
delays their entry to the list until the glocks in question
are no longer in use.

This means that we will rely on the vm for writeback of all
dirty data and metadata from now on. When glocks are added
to the lru list they should be freeable much faster since all
the I/O required to free them should have already been completed.

This should lead to much better I/O patterns under low memory
conditions.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-04-20 15:59:48 +0800
5ac048bb7 GFS2: Use filemap_fdatawrite() to write back the AIL

In order to ensure that the mapping stats (and thus the bdi) are correctly
updated, this patch changes the AIL writeback to use the filemap_fdatawrite
function. This helps prevent stalls in balance_dirty_pages() due to
large amounts of dirty metadata when there is little or no dirty data
around.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-04-20 15:59:25 +0800
1027efaa2 GFS2: Make ->write_inode() really write

The GFS2 ->write_inode function should be more aggressive at writing
back to the filesystem. This adopts the XFS system of returning
-EAGAIN when the writeback has not been completely done. Also, we
now kick off in-place writeback when called with WB_SYNC_NONE,
but we only wait for it and flush the log when WB_SYNC_ALL is
requested.

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2011-04-20 15:55:07 +0800
556bb1799 GFS2: move function foreach_leaf to gfs2_dir_exhash_dealloc

The previous patches made function gfs2_dir_exhash_dealloc do nothing
but call function foreach_leaf. This patch simplifies the code by
moving the entire function foreach_leaf into gfs2_dir_exhash_dealloc.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Bob Peterson
2011-04-20 15:54:44 +0800
ec038c826 GFS2: pass leaf_bh into leaf_dealloc

Function foreach_leaf used to look up the leaf block address and get
a buffer_head. Then it would call leaf_dealloc which did the same
lookup. This patch combines the two operations by making foreach_leaf
pass the leaf bh to leaf_dealloc.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Bob Peterson
2011-04-20 15:54:26 +0800
d24a7a439 GFS2: Combine transaction from gfs2_dir_exhash_dealloc

At the end of function gfs2_dir_exhash_dealloc, it was setting the dinode
type to "file" to prevent directory corruption in case of a crash.
It was doing so in its own journal transaction. This patch makes the
change occur when the last call is made to leaf_dealloc, since it needs
to rewrite the directory dinode at that time anyway.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Bob Peterson
2011-04-20 15:53:56 +0800
0d95326d9 GFS2: remove *leaf_call_t and simplify leaf_dealloc

Since foreach_leaf is only ever called with leaf_dealloc as its call
function, we can make it call leaf_dealloc directly. This simplifies
the code and eliminates the need for leaf_call_t, the generic call
method. This is a first small step in simplifying the directory leaf
deallocation code.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Bob Peterson
2011-04-20 15:53:35 +0800