08 Jun, 2011

1 commit


27 May, 2011

1 commit

  • * 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
    gfs2: Drop __TIME__ usage
    isdn/diva: Drop __TIME__ usage
    atm: Drop __TIME__ usage
    dlm: Drop __TIME__ usage
    wan/pc300: Drop __TIME__ usage
    parport: Drop __TIME__ usage
    hdlcdrv: Drop __TIME__ usage
    baycom: Drop __TIME__ usage
    pmcraid: Drop __DATE__ usage
    edac: Drop __DATE__ usage
    rio: Drop __DATE__ usage
    scsi/wd33c93: Drop __TIME__ usage
    scsi/in2000: Drop __TIME__ usage
    aacraid: Drop __TIME__ usage
    media/cx231xx: Drop __TIME__ usage
    media/radio-maxiradio: Drop __TIME__ usage
    nozomi: Drop __TIME__ usage
    cyclades: Drop __TIME__ usage

    Linus Torvalds
     

26 May, 2011

1 commit

  • The kernel already prints its build timestamp during boot, no need to
    repeat it in random drivers and produce different object files each
    time.

    Cc: Steven Whitehouse
    Cc: cluster-devel@redhat.com
    Signed-off-by: Michal Marek

    Michal Marek
     

25 May, 2011

2 commits

  • Change each shrinker's API by consolidating the existing parameters into
    a shrink_control struct. This will simplify adding further features
    without touching every shrinker's file.
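
    As a rough illustration of the consolidation described above, the
    user-space sketch below bundles a callback's arguments into one struct.
    All names prefixed with model_ are hypothetical, and the two fields (an
    allocation mask and a scan count) are an assumption about which
    parameters are being consolidated; this is not the kernel's definition.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's allocation-flags type. */
typedef unsigned int model_gfp_t;

/* One struct carries every argument, so adding a field later does not
 * change the callback signature. Field choice is an assumption. */
struct model_shrink_control {
	model_gfp_t gfp_mask;
	unsigned long nr_to_scan;
};

/* Hypothetical cache that a shrinker would trim. */
struct model_cache {
	unsigned long nr_objects;
};

/* Frees up to sc->nr_to_scan objects and returns how many remain.
 * In this model a scan count of 0 acts as a pure query. */
static unsigned long model_shrink(struct model_cache *c,
				  const struct model_shrink_control *sc)
{
	unsigned long n;

	assert(c && sc);
	n = sc->nr_to_scan < c->nr_objects ? sc->nr_to_scan : c->nr_objects;
	c->nr_objects -= n;
	return c->nr_objects;
}
```

    A caller fills in a model_shrink_control once and passes its address;
    a new field added later would leave existing callbacks untouched.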

    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: fix warning]
    [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
    [akpm@linux-foundation.org: fix xfs warning]
    [akpm@linux-foundation.org: update gfs2]
    Signed-off-by: Ying Han
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Acked-by: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han
     
  • This patch fixes a race in the GFS2 glock state machine that may
    result in lockups. The symptom is that all nodes but one will
    hang, waiting for a particular glock. All the holder records
    will have the "W" (Waiting) bit set. The other node will
    typically have the glock stuck in Exclusive mode (EX) with no
    holder records, but the dinode will be cached. In other words,
    an entry with "I:" will appear in the glock dump for that glock,
    but nothing else.

    The race has to do with the glock "Pending Demote" bit, which
    can be set, then immediately reset, thus losing the fact that
    another node needs the glock. The sequence of events is:

    1. Something schedules the glock workqueue (e.g. glock request from fs)
    2. The glock workqueue gets to the point between the test of the reply pending
    bit and the spin lock:

    if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
            finish_xmote(gl, gl->gl_reply);
            drop_ref = 1;
    }
    down_read(&gfs2_umount_flush_sem);
    spin_lock(&gl->gl_spin);

    3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
    (b) the demote request which sets GLF_PENDING_DEMOTE

    4. The following test is executed:

    if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
        gl->gl_state != LM_ST_UNLOCKED &&
        gl->gl_demote_state != LM_ST_EXCLUSIVE) {

    This resets the pending demote flag, and gl->gl_demote_state is not equal to
    exclusive. However, because the reply from the dlm arrived after we checked for
    the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked, so
    although we reset the GLF_PENDING_DEMOTE flag, we didn't then set the
    GLF_DEMOTE flag or reinstate the GLF_PENDING_DEMOTE flag.

    The patch closes the timing window by only transitioning the
    "Pending demote" bit to the "demote" flag once we know the
    other conditions (not unlocked and not exclusive) are met.
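
    The difference in ordering can be sketched with a small user-space
    model. Everything here is hypothetical (plain bit masks instead of the
    kernel's atomic bit operations, invented model_ and F_ names) and is
    only meant to show why testing first and clearing second keeps the
    request alive; it is not the code from the patch.

```c
#include <assert.h>

/* Hypothetical lock states standing in for the LM_ST_* values. */
enum model_state { ST_UNLOCKED, ST_SHARED, ST_EXCLUSIVE };

/* Hypothetical flag bits standing in for the GLF_* flags. */
enum { F_PENDING_DEMOTE = 1u << 0, F_DEMOTE = 1u << 1 };

struct model_glock {
	unsigned int flags;
	enum model_state state;
	enum model_state demote_state;
};

/* Racy ordering: the pending bit is cleared before the other
 * conditions are known, so a failed test loses the request. */
static void model_demote_racy(struct model_glock *gl)
{
	int was_pending;

	assert(gl);
	was_pending = (gl->flags & F_PENDING_DEMOTE) != 0;
	gl->flags &= ~F_PENDING_DEMOTE;
	if (was_pending &&
	    gl->state != ST_UNLOCKED &&
	    gl->demote_state != ST_EXCLUSIVE)
		gl->flags |= F_DEMOTE;
}

/* Fixed ordering: only move pending -> demote once every
 * condition holds; otherwise the pending bit stays set. */
static void model_demote_fixed(struct model_glock *gl)
{
	assert(gl);
	if ((gl->flags & F_PENDING_DEMOTE) &&
	    gl->state != ST_UNLOCKED &&
	    gl->demote_state != ST_EXCLUSIVE) {
		gl->flags &= ~F_PENDING_DEMOTE;
		gl->flags |= F_DEMOTE;
	}
}
```

    With the state still unlocked, the racy variant ends with no flags set
    at all, while the fixed variant keeps the pending bit so a later pass
    can act on it once the state has changed.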

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

22 May, 2011

1 commit

  • The ail flush code has always relied upon log flushing to prevent
    it from spinning needlessly. This fixes it to wait on the last
    I/O request submitted (we don't need to wait for all of it)
    instead of either spinning with io_schedule or sleeping.

    As a result cpu usage of gfs2_logd is much reduced with certain
    workloads.

    Reported-by: Abhijith Das
    Tested-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

21 May, 2011

2 commits

  • The deallocation code for directories in GFS2 is largely divided into
    two parts. The first part deallocates any directory leaf blocks and
    marks the directory as being a regular file when that is complete. The
    second stage was identical to deallocating regular files.

    Regular files have their data blocks in a different
    address space to directories, and thus what would have been normal data
    blocks in a regular file (the hash table in a GFS2 directory) were
    deallocated correctly. However, a reference to these blocks was left in the
    journal (assuming of course that some previous activity had resulted in
    those blocks being in the journal or ail list).

    This patch uses the i_depth as a test of whether the inode is an
    exhash directory (we cannot test the inode type as that has already
    been changed to a regular file at this stage in deallocation).

    The original issue was reported by Chris Hertel, who encountered it
    while running bonnie++.

    Reported-by: Christopher R. Hertel
    Cc: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (32 commits)
    GFS2: Move all locking inside the inode creation function
    GFS2: Clean up symlink creation
    GFS2: Clean up mkdir
    GFS2: Use UUID field in generic superblock
    GFS2: Rename ops_inode.c to inode.c
    GFS2: Inode.c is empty now, remove it
    GFS2: Move final part of inode.c into super.c
    GFS2: Move most of the remaining inode.c into ops_inode.c
    GFS2: Move gfs2_refresh_inode() and friends into glops.c
    GFS2: Remove gfs2_dinode_print() function
    GFS2: When adding a new dir entry, inc link count if it is a subdir
    GFS2: Make gfs2_dir_del update link count when required
    GFS2: Don't use gfs2_change_nlink in link syscall
    GFS2: Don't use a try lock when promoting to a higher mode
    GFS2: Double check link count under glock
    GFS2: Improve bug trap code in ->releasepage()
    GFS2: Fix ail list traversal
    GFS2: make sure fallocate bytes is a multiple of blksize
    GFS2: Add an AIL writeback tracepoint
    GFS2: Make writeback more responsive to system conditions
    ...

    Linus Torvalds
     

13 May, 2011

3 commits


10 May, 2011

3 commits


09 May, 2011

7 commits

  • Now inode.c is empty.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This is in preparation to remove inode.c and rename ops_inode.c
    to inode.c. Also, most of the functions which were left in inode.c
    relate to the creation and lookup of inodes. I'm intending to work
    on consolidating some of that code, and it's easier when it's all in
    one place.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Eventually there will only be a single caller of this code, so let's
    move it where it can be made static at some future date.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This function was intended for debugging purposes, but it is not very
    useful. If we want to know what is on disk then all we need is a
    block number and gfs2_edit can give us much better information about
    what is there. Otherwise, if we are interested in what is stored in
    the in-core inode, it doesn't help us out there either.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This adds an increment of the link count when we add a new directory
    entry, if that entry is itself a directory. This means that we no
    longer need separate code to perform this operation.

    Now that both adding and removing directory entries automatically
    update the parent directory's link count if required, that makes
    the code shorter and simpler than before.
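
    The bookkeeping described above can be pictured with a tiny model in
    which the add and remove paths adjust the parent's link count
    themselves. The struct and function names are hypothetical and the
    model ignores locking and on-disk layout entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-memory view of a parent directory. */
struct model_dir {
	unsigned int nlink;    /* link count of the directory itself */
	unsigned int entries;  /* number of entries it contains */
};

/* Adding a subdirectory also bumps the parent's link count,
 * because the child's ".." entry refers back to the parent. */
static void model_dir_add(struct model_dir *parent, bool entry_is_dir)
{
	assert(parent);
	parent->entries++;
	if (entry_is_dir)
		parent->nlink++;
}

/* Removing a subdirectory drops that extra link again. */
static void model_dir_del(struct model_dir *parent, bool entry_is_dir)
{
	assert(parent && parent->entries > 0);
	parent->entries--;
	if (entry_is_dir) {
		assert(parent->nlink > 0);
		parent->nlink--;
	}
}
```

    Because the entry type is known at the point of the call, callers such
    as a mkdir or rmdir path would not need a separate link count update.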

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • When we remove an entry from a directory, we can save ourselves
    some trouble if we know the type of the entry in question, since
    if it is itself a directory, we can update the link count of the
    parent at the same time as removing the directory entry.

    In addition this patch also merges the rmdir and unlink code which
    was almost identical anyway. This eliminates the calls to remove
    the . and .. directory entries on each rmdir (not needed since the
    directory will be deallocated, anyway) which was the only thing preventing
    passing the dentry to gfs2_dir_del(). The passing of the dentry
    rather than just the name allows us to figure out the type of the entry
    which is being removed, and thus adjust the link count when required.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • There are three users of gfs2_change_nlink which add to the link
    count. Two of these are about to be removed in later patches, so
    once that happens there will be no callers left, allowing the
    function itself to be removed, also in a later patch.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

05 May, 2011

2 commits

  • Previously we marked all locks being promoted to a higher mode
    with the try flag to avoid any potential deadlock issues. The
    DLM is able to detect these and report them in a way that GFS2 can
    deal with correctly. So we can just request the required mode
    and wait for a response without needing to perform this check.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • To avoid any possible races relating to the link count, we need to
    recheck it under the inode's glock in all cases where it matters.
    Also, to avoid any nasty surprises, this patch ensures that once
    the link count has hit zero it can never be elevated by rereading
    data from disk.
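
    The "never raise from zero" rule can be sketched as a guard on the
    path that copies the on-disk count into memory. The names below are
    hypothetical, and silently ignoring the stale value is an assumption
    made for the sketch rather than a statement of what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-core inode that has already been read in once. */
struct model_inode {
	unsigned int nlink;
};

/* Applies the on-disk link count unless doing so would raise a
 * count that has already reached zero. Returns true if applied. */
static bool model_refresh_nlink(struct model_inode *ip,
				unsigned int disk_nlink)
{
	assert(ip);
	if (ip->nlink == 0 && disk_nlink != 0)
		return false;  /* stale data: keep the inode unlinked */
	ip->nlink = disk_nlink;
	return true;
}
```

    A real implementation would also need to distinguish the very first
    read of an inode, where the in-core count starts out unset; the model
    leaves that case out.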

    The only place we cannot provide a proper solution is in rename
    in the case where we are removing a target inode and we discover
    that the target inode has been already unlinked on another node.
    The race window is very small, and we return EAGAIN in this case
    to indicate what has happened. The proper solution would be to move
    the lookup parts of rename from the vfs into library calls which
    the fs could call directly, but that is potentially a very big job
    and this fix should cover most cases for now.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 May, 2011

3 commits

  • If the buffer is dirty or pinned, then as well as printing a
    warning, we should also refuse to release the page in
    question.

    Currently this can occur if there is a race between mmap()ed
    writers and O_DIRECT on the same file. With the addition of
    ->launder_page() in the future, we should be able to close
    this gap.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • In the recent patches to update the AIL list code, I managed to
    forget that the ail list lock got dropped, even though I
    added a comment specifically to remind myself :(

    Reported-by: Barry Marson
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The GFS2 fallocate code chooses a target size for allocating chunks of
    space. Whenever it can't find any resource groups with enough space free, it
    halves its target. Since this target is in bytes, eventually it will no longer
    be a multiple of blksize. As long as there is more space available in the
    resource group than the target, this isn't a problem, since gfs2 will use the
    actual space available, which is always a multiple of blksize. However,
    when gfs2 couldn't fallocate a bigger chunk than the target, it was using the
    non-blksize aligned number. This caused a BUG in later code that required
    blksize aligned offsets. GFS2 now ensures that bytes is always a multiple of
    blksize.
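
    The alignment problem is easy to reproduce with plain arithmetic. The
    helpers below are a hypothetical sketch, assuming a power-of-two block
    size and a floor of one block for the target (the floor is a choice
    made for this sketch); they are not the functions used in GFS2.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds bytes down to a multiple of blksize.
 * blksize must be a non-zero power of two. */
static uint64_t model_round_down(uint64_t bytes, uint64_t blksize)
{
	assert(blksize != 0 && (blksize & (blksize - 1)) == 0);
	return bytes & ~(blksize - 1);
}

/* Halves the target and realigns it, so repeated halving can
 * never leave a value that is not a whole number of blocks.
 * The result is clamped to at least one block. */
static uint64_t model_halve_target(uint64_t bytes, uint64_t blksize)
{
	uint64_t next = model_round_down(bytes / 2, blksize);

	return next ? next : blksize;
}
```

    For example, with 4096-byte blocks a three-block target of 12288 bytes
    halves to 6144, which is not block aligned; rounding down gives 4096.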

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

26 Apr, 2011

1 commit

  • Now that the whole dcache_hash_bucket crap is gone, go all the way and
    also remove the weird locking layering violations for locking the hash
    buckets. Add hlist_bl_lock/unlock helpers to move the locking into the
    list abstraction instead of requiring each caller to open code it.
    After all, allowing for the bit locks is the whole point of these helpers
    over the plain hlist variant.
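
    To illustrate what keeping the lock inside the list abstraction can
    look like, here is a user-space sketch using C11 atomics in which bit 0
    of the head word doubles as a spin lock. The model_ names are
    hypothetical and the kernel's helpers are implemented differently; the
    sketch only shows the idea of a lock/unlock pair that hides the bit
    manipulation from callers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical list head: the low bit of the word is the lock,
 * the remaining bits would hold the pointer to the first node. */
struct model_bl_head {
	_Atomic uintptr_t first;
};

/* Spins until this caller is the one that set bit 0. */
static void model_bl_lock(struct model_bl_head *h)
{
	assert(h);
	while (atomic_fetch_or_explicit(&h->first, (uintptr_t)1,
					memory_order_acquire) & 1)
		; /* another holder has the bit; try again */
}

/* Clears bit 0, leaving the pointer bits untouched. */
static void model_bl_unlock(struct model_bl_head *h)
{
	assert(h);
	atomic_fetch_and_explicit(&h->first, ~(uintptr_t)1,
				  memory_order_release);
}

/* Reports whether the lock bit is currently set. */
static int model_bl_is_locked(struct model_bl_head *h)
{
	assert(h);
	return (int)(atomic_load(&h->first) & 1);
}
```

    Callers then write lock, traverse, unlock without knowing which bit is
    used, which is the layering the commit message argues for.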

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

20 Apr, 2011

13 commits

  • Add a tracepoint for monitoring writeback of the AIL.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch adds writeback_control to writing back the AIL
    list. This means that we can then take advantage of the
    information we get in ->write_inode() in order to set off
    some pre-emptive writeback.

    In addition, the AIL code is cleaned up a bit to make it
    a bit simpler to understand.

    There is still more which can usefully be done in this area,
    but this is a good start at least.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The GLF_LRU flag introduced in the previous patch can be
    used to check if a glock is on the lru list when a new
    holder is queued and if so remove it, without having first
    to get the lru_lock.

    The main purpose of this patch however is to optimise the
    glocks left over when an inode at end of life is being
    evicted. Previously such glocks were left with the GLF_LFLUSH
    flag set, so that when reclaimed, each one required a log flush.
    This patch resets the GLF_LFLUSH flag when there is nothing
    left to flush thus preventing later log flushes as glocks are
    reused or demoted.

    In order to do this, we need to keep track of the number of
    revokes which are outstanding, and also to clear the GLF_LFLUSH
    bit after a log commit when only revokes have been processed.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This adds support for two new flags. One keeps track of whether
    the glock is on the LRU list or not. The other isn't really a
    flag as such, but an indication of whether the glock has an
    attached object or not. This indication is reported without
    any locking, which is ok since we do not dereference the object
    pointer but merely report whether it is NULL or not.

    Also, this fixes one place where a tracepoint was missing, which
    was at the point we remove deallocated blocks from the journal.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch is designed to clean up GFS2's fsync
    implementation and ensure that it really does get everything on
    disk. Since ->write_inode() has been updated, we can call that
    via the vfs library function sync_inode_metadata() and the only
    remaining thing that has to be done is to ensure that we get
    any revoke records in the log after the inode has been written back.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The buffer_in_io() macro has been unused for some time,
    so remove it.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Rather than allowing the glocks to be scheduled for possible
    reclaim as soon as they have exited the journal, this patch
    delays their entry to the list until the glocks in question
    are no longer in use.

    This means that we will rely on the vm for writeback of all
    dirty data and metadata from now on. When glocks are added
    to the lru list they should be freeable much faster since all
    the I/O required to free them should have already been completed.

    This should lead to much better I/O patterns under low memory
    conditions.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • In order to ensure that the mapping stats (and thus the bdi) are correctly
    updated, this patch changes the AIL writeback to use the filemap_fdatawrite
    function. This helps prevent stalls in balance_dirty_pages() due to
    large amounts of dirty metadata when there is little or no dirty data
    around.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The GFS2 ->write_inode function should be more aggressive at writing
    back to the filesystem. This adopts the XFS system of returning
    -EAGAIN when the writeback has not been completely done. Also, we
    now kick off in-place writeback when called with WB_SYNC_NONE,
    but we only wait for it and flush the log when WB_SYNC_ALL is
    requested.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The previous patches made function gfs2_dir_exhash_dealloc do nothing
    but call function foreach_leaf. This patch simplifies the code by
    moving the entire function foreach_leaf into gfs2_dir_exhash_dealloc.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • Function foreach_leaf used to look up the leaf block address and get
    a buffer_head. Then it would call leaf_dealloc which did the same
    lookup. This patch combines the two operations by making foreach_leaf
    pass the leaf bh to leaf_dealloc.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • At the end of function gfs2_dir_exhash_dealloc, it was setting the dinode
    type to "file" to prevent directory corruption in case of a crash.
    It was doing so in its own journal transaction. This patch makes the
    change occur when the last call is made to leaf_dealloc, since it needs
    to rewrite the directory dinode at that time anyway.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • Since foreach_leaf is only ever called with leaf_dealloc as its
    callback, we can simplify the code by making it call leaf_dealloc
    directly. This eliminates the need for leaf_call_t, the generic call
    method. This is a first small step in simplifying the directory leaf
    deallocation code.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson