09 May, 2007

3 commits


23 Dec, 2006

1 commit

  • In the current jbd code, if a buffer on the BJ_SyncData list is dirty and not
    locked, the buffer is refiled to the BJ_Locked list, submitted for IO, and
    waited on until the IO completes.

    But the fsstress test showed a case where a buffer that had already been
    submitted for IO just before the buffer_dirty(bh) check was not waited on
    for IO completion.

    The following patch solves this problem. If a buffer is presumed to have been
    submitted for IO before the buffer_dirty(bh) check and is still being
    written to disk, it is refiled to the BJ_Locked list.

    Signed-off-by: Hisashi Hifumi
    Cc: Jan Kara
    Cc: "Stephen C. Tweedie"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
     

11 Dec, 2006

1 commit

  • This patch introduces a user of the round_jiffies() function: the "5 second"
    ext3/jbd wakeup.

    While "every 5 seconds" doesn't sound like a problem, there can be many of these
    (and these timers do add up over all the kernel). The "5 second" wakeup isn't
    really timing sensitive; in addition, even with rounding it'll still happen
    every 5 seconds (with the exception of the very first time, which is likely to
    be rounded up to somewhere closer to 6 seconds).

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
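
The rounding described in the commit above can be modelled in user space. This is a minimal sketch, not the kernel's round_jiffies(): the HZ value, the quarter-second threshold, and the name round_to_second() are assumptions made for illustration, and any per-CPU handling is ignored.

```c
#include <assert.h>

#define HZ 1000UL  /* assumed tick rate for this sketch */

/*
 * Simplified model of rounding a timer expiry to a whole second.
 * "now" and "expiry" are tick counts; the result is the tick at
 * which the timer would actually fire.
 */
static unsigned long round_to_second(unsigned long now, unsigned long expiry)
{
    unsigned long rem = expiry % HZ;
    unsigned long rounded;

    if (rem < HZ / 4)
        rounded = expiry - rem;        /* just past a boundary: round down */
    else
        rounded = expiry - rem + HZ;   /* otherwise round up */

    /* Never hand back a time that is not in the future. */
    return rounded > now ? rounded : expiry;
}
```

With these assumptions, a first wakeup armed at tick 10300 for "now + 5 seconds" lands on tick 16000 (5.7 seconds later), and every later one, armed from a whole-second boundary, stays exactly 5 seconds apart.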
     

08 Dec, 2006

3 commits


29 Oct, 2006

1 commit

  • When running several fsx's and other filesystem stress tests, we found
    cases where an unmapped buffer was still being sent to submit_bh by the
    ext3 dirty data journaling code.

    I saw this happen in two ways, both related to another thread doing a
    truncate which would unmap the buffer in question.

    Either we would get into journal_dirty_data with a bh which was already
    unmapped (although journal_dirty_data_fn had checked for this earlier, the
    state was not locked at that point), or it would get unmapped in the middle
    of journal_dirty_data when we dropped locks to call sync_dirty_buffer.

    By re-checking for mapped state after we've acquired the bh state lock, we
    should avoid these races. If we find a buffer which is no longer mapped,
    we essentially ignore it, because journal_unmap_buffer has already decided
    that this buffer can go away.

    I've also added tracepoints in these two cases, and made a couple other
    tracepoint changes that I found useful in debugging this.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     

21 Oct, 2006

1 commit

  • A disk generated an I/O error; after that, I hit
    J_ASSERT(transaction->t_updates > 0) in journal_stop().

    From the stack trace, it seems to have happened on the ext3_truncate() path.
    The following case may then trigger J_ASSERT(transaction->t_updates > 0).

    ext3_truncate()
      -> ext3_free_branches()
        -> ext3_journal_test_restart()
          -> ext3_journal_restart()
            -> journal_restart()
               transaction->t_updates--;
               /* another process aborted journal */
              -> start_this_handle()
                 returns -EROFS without transaction->t_updates++;

      -> ext3_journal_stop()
        -> journal_stop()
           J_ASSERT(transaction->t_updates > 0)

    If the journal is aborted in the middle of journal_restart(), ext3_truncate()
    may trigger J_ASSERT().

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
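
The counter imbalance in the call chain above can be reproduced with a small user-space model. This is only a sketch of the accounting: struct txn_model and the model_*() functions are hypothetical stand-ins for the jbd structures and for start_this_handle(), journal_restart() and journal_stop(), and the sketch does not show how the patch itself resolves the problem.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the transaction and journal state. */
struct txn_model {
    int t_updates;          /* number of handles attached to the transaction */
    bool journal_aborted;   /* set when another process aborts the journal */
};

/* Models start_this_handle(): fails without touching t_updates once aborted. */
static int model_start(struct txn_model *t)
{
    if (t->journal_aborted)
        return -EROFS;
    t->t_updates++;
    return 0;
}

/* Models journal_restart(): drop our update count, then try to start again. */
static int model_restart(struct txn_model *t)
{
    t->t_updates--;
    return model_start(t);
}

/* True if a later journal_stop() would trip J_ASSERT(t_updates > 0). */
static bool model_stop_would_assert(const struct txn_model *t)
{
    return !(t->t_updates > 0);
}
```

Starting one handle, marking the journal aborted, and then restarting leaves t_updates at 0, so the final stop would hit the assertion, which matches the sequence reported in the commit message.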
     

12 Oct, 2006

1 commit


04 Oct, 2006

1 commit


30 Sep, 2006

2 commits


27 Sep, 2006

6 commits

  • Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • More white space cleanups in preparation of cloning ext4 from ext3.
    Removing spaces that precede a tab.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • These are a few places I've found in jbd that look like they may not be
    16T-safe, or consistent with the use of unsigned longs for block
    containers. Problems here would be somewhat hard to hit, would require
    journal blocks past the 8T boundary, which would not be terribly common.
    Still, should fix.

    (some of these have come from the ext4 work on jbd as well).

    I think there's one more possibility that the wrap() function may not be
    safe IF your last block in the journal butts right up against the 2^32 block
    boundary, but that seems like a VERY remote possibility, and I'm not
    worrying about it at this point.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Signed-off-by: Alexey Dobriyan
    Acked-by: Stephen Tweedie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Remove whitespace from ext3 and jbd, before we clone ext4.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • jbd_sync_bh releases journal->j_list_lock. Add a lock annotation to this
    function so that sparse can check callers for lock pairing, and so that
    sparse will not complain about this function since it intentionally uses
    the lock in this manner.

    Signed-off-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
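
For the 16T-safety commit in the group above, the 8T and 16T figures follow from simple arithmetic on 32-bit block numbers. The sketch below assumes 4 KiB blocks (the commit message does not state a block size); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE_BYTES 4096ULL  /* assumed 4 KiB blocks */

/* Byte offset of a block, computed in 64 bits so the product cannot wrap. */
static uint64_t block_to_bytes(uint64_t blocknr)
{
    return blocknr * BLOCK_SIZE_BYTES;
}

/* Can this block number be held in a signed 32-bit variable? */
static bool fits_signed_32(uint64_t blocknr)
{
    return blocknr <= INT32_MAX;
}

/* Can this block number be held in an unsigned 32-bit variable? */
static bool fits_unsigned_32(uint64_t blocknr)
{
    return blocknr <= UINT32_MAX;
}
```

Under that assumption, block 2^31 sits at the 8 TiB mark and is the first one a signed 32-bit variable cannot represent, while block 2^32 sits at 16 TiB and no longer fits an unsigned 32-bit variable either.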
     

26 Sep, 2006

1 commit

  • The original commit code assumes that when a buffer on the BJ_SyncData list is
    locked, it is being written to disk. But this is not true, and hence it can
    lead to potential data loss on a crash. The code also didn't account for
    the fact that journal_dirty_data() can steal buffers from the committing
    transaction, and hence it could write buffers that no longer belong to the
    committing transaction. Finally, it was possible that we tried
    writing out one buffer several times.

    The patch below tries to solve these problems by a complete rewrite of the
    data commit code. We go through buffers on t_sync_datalist, lock buffers
    needing write out and store them in an array. Buffers are also immediately
    refiled to BJ_Locked list or unfiled (if the write out is completed). When
    the array is full or we have to block on buffer lock, we submit all
    accumulated buffers for IO.

    [suitable for 2.6.18.x around the 2.6.19-rc2 timeframe]

    Signed-off-by: Jan Kara
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
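
The "collect in an array, submit when full" pattern from the rewrite described above can be sketched as follows. This is a user-space illustration only: BATCH_MAX, struct batch and the function names are hypothetical, "submitting" just bumps counters, and the second trigger named in the commit (having to block on a buffer lock) is left out.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 4  /* hypothetical batch size for this sketch */

struct batch {
    int bufs[BATCH_MAX];   /* buffers queued for the next submit */
    size_t count;          /* how many slots of bufs[] are in use */
    size_t submitted;      /* total buffers handed to "IO" so far */
    size_t flushes;        /* number of submit calls made */
};

/* Submit everything accumulated so far; a no-op on an empty batch. */
static void batch_flush(struct batch *b)
{
    if (b->count == 0)
        return;
    b->submitted += b->count;
    b->flushes++;
    b->count = 0;
}

/* Queue one buffer; submit the whole batch once the array is full. */
static void batch_add(struct batch *b, int buf)
{
    b->bufs[b->count++] = buf;
    if (b->count == BATCH_MAX)
        batch_flush(b);
}

/* Walk a list of buffers, queueing only the ones that need writing out. */
static void write_out(struct batch *b, const int *dirty, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (dirty[i])
            batch_add(b, (int)i);
    batch_flush(b);        /* submit whatever is left over */
}
```

With eight buffers of which six are dirty, the walk issues two submits: one when the array fills with four buffers, and one for the remaining two at the end.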
     

02 Sep, 2006

1 commit


28 Aug, 2006

1 commit

  • JBD currently allocates commit and frozen buffers from slabs. With
    CONFIG_SLAB_DEBUG, it's possible for an allocation to cross the page
    boundary, causing IO problems.

    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=200127

    So, instead of allocating these from regular slabs - manage allocation from
    its own slabs and disable slab debug for these slabs.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
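
Whether an allocation crosses a page boundary is a matter of address arithmetic. The helper below is a hypothetical user-space check that assumes 4 KiB pages; it is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed page size */

/* True if [addr, addr + len) touches more than one page; len must be > 0. */
static bool crosses_page(uintptr_t addr, uint32_t len)
{
    uintptr_t first = addr / PAGE_SIZE_BYTES;
    uintptr_t last = (addr + len - 1) / PAGE_SIZE_BYTES;

    return first != last;
}
```

A 4096-byte buffer that starts on a page boundary stays within one page. If the same buffer is shifted by a few bytes (the 64-byte offset in the test below is an arbitrary illustration), it straddles two pages.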
     

28 Jun, 2006

1 commit

  • Localize poison values into one header file for better documentation and
    easier/quicker debugging and so that the same values won't be used for
    multiple purposes.

    Use these constants in core arch., mm, driver, and fs code.

    Signed-off-by: Randy Dunlap
    Acked-by: Matt Mackall
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: "David S. Miller"
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

26 Jun, 2006

1 commit


23 Jun, 2006

3 commits

  • Split the checkpoint list of the transaction into two lists. In the first
    list we keep the buffers that need to be submitted for IO. In the second
    list we keep buffers that were already submitted, for which we just have to
    wait for the IO to complete. This should simplify the handling of checkpoint
    lists a bit and may eventually also yield a performance gain.

    Signed-off-by: Jan Kara
    Cc: Mark Fasheh
    Cc: "Stephen C. Tweedie"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • There are a couple of places where JBD has to check to see whether an unneeded
    memory allocation was performed. Usually it _was_ needed, so we end up
    calling kfree(NULL). We can micro-optimise that by checking the pointer
    before calling kfree().

    Thanks to Steven Rostedt for identifying this.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Fix possible assertion failure in journal_commit_transaction() on
    jh->b_next_transaction == NULL (when we are processing BJ_Forget list and
    buffer is not jbddirty).

    !jbddirty buffers can be placed on BJ_Forget list for example by
    journal_forget() or by __dispose_buffer() - generally such buffer means
    that it has been freed by this transaction.

    Freed buffers should not be reallocated until the transaction has committed
    (that's why we have the assertion there) but they *can* be reallocated when
    the transaction has already been committed to disk and we are just
    processing the BJ_Forget list (as soon as we remove b_committed_data from
    the bitmap bh, ext3 will be able to reallocate buffers freed by the
    committing transaction). So we also have to account for the case where the
    buffer has been reallocated and b_next_transaction has already been set.

    And one more subtle point: it can happen that we manage to reallocate the
    buffer and also mark it jbddirty. Then we also add the freed buffer to the
    checkpoint list of the committing transaction. But that should do no harm.

    Non-jbddirty buffers should be filed to BJ_Reserved and not BJ_Metadata
    list. It can actually happen that we refile such buffers during the commit
    phase when we reallocate in the running transaction blocks deleted in
    committing transaction (and that can happen if the committing transaction
    already wrote all the data and is just cleaning up BJ_Forget list).

    Signed-off-by: Jan Kara
    Acked-by: "Stephen C. Tweedie"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
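
The kfree(NULL) micro-optimisation from the second commit in the group above amounts to testing the pointer before making the call. A user-space analogue with free() is sketched here; counted_free() and free_if_set() are hypothetical helpers, and the counter exists only to make the skipped call visible.

```c
#include <assert.h>
#include <stdlib.h>

static unsigned long free_calls;  /* counts calls that reach the allocator */

/* Wrapper that records every call before passing it on to free(). */
static void counted_free(void *p)
{
    free_calls++;
    free(p);   /* free(NULL) is a no-op, but the call is still made */
}

/* The micro-optimisation: skip the call when there is nothing to free. */
static void free_if_set(void *p)
{
    if (p)
        counted_free(p);
}
```

The saving is only the cost of a function call on the common path where the pointer is NULL.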
     

27 Mar, 2006

1 commit

  • The return value of this function is never used, so let's be honest and
    declare it as void.

    Some places where invalidatepage returned 0, I have inserted comments
    suggesting a BUG_ON.

    [akpm@osdl.org: JBD BUG fix]
    [akpm@osdl.org: rework for git-nfs]
    [akpm@osdl.org: don't go BUG in block_invalidate_page()]
    Signed-off-by: Neil Brown
    Acked-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

26 Mar, 2006

2 commits


23 Mar, 2006

1 commit


15 Feb, 2006

1 commit

  • This patch reverts commit f93ea411b73594f7d144855fd34278bcf34a9afc:
    [PATCH] jbd: split checkpoint lists

    This broke journal_flush() for OCFS2, which is its method of being sure
    that metadata is sent to disk for another node.

    And two related commits 8d3c7fce2d20ecc3264c8d8c91ae3beacdeaed1b and
    43c3e6f5abdf6acac9b90c86bf03f995bf7d3d92 with the subjects:
    [PATCH] jbd: log_do_checkpoint fix
    [PATCH] jbd: remove_transaction fix

    These seem to be incremental bugfixes on the original patch and as such are
    no longer needed.

    Signed-off-by: Mark Fasheh
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
     

06 Feb, 2006

1 commit

  • Ben points out that:

    When writing files out using O_SYNC, jbd's 1 jiffy delay results in a
    significant drop in throughput as the disk sits idle. The patch below
    results in a 4-5x performance improvement (from 6.5MB/s to ~24-30MB/s on my
    IDE test box) when writing out files using O_SYNC.

    So optimise the batching code by omitting it entirely if the process which is
    doing a sync write is the same as the one which did the most recent sync
    write. If that's true, we're unlikely to get any other processes joining the
    transaction.

    (Has been in -mm for ages - it took me a long time to get on to performance
    testing it)

    Numbers, on write-cache-disabled IDE:

    /usr/bin/time -p synctest -n 10 -uf -t 1 -p 1 dir-name

    Unpatched:
    40 seconds
    Patched:
    35 seconds
    Batching disabled:
    35 seconds

    This is the problematic single-process-doing-fsync case. With multiple
    fsyncing processes the numbers are AFAICT unaltered by the patch.

    Aside: performance testing and instrumentation shows that the transaction
    batching almost doesn't help (testing with synctest -n 1 -uf -t 100 -p 10
    dir-name on non-writeback-caching IDE). This is because by the time one
    process is running a synchronous commit, a bunch of other processes already
    have a transaction handle open, so they're all going to batch into the same
    transaction anyway.

    The batching seems to offer maybe 5-10% speedup with this workload, but I'm
    pretty sure it was more important than that when it was first developed 4-odd
    years ago...

    Cc: "Stephen C. Tweedie"
    Cc: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
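
The decision described in the commit above, skipping the batching delay when the same process issued the previous sync write, can be sketched as a small predicate. The structure and function names are hypothetical, and the actual one-jiffy sleep is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the journal field that remembers the writer. */
struct batching_model {
    int last_sync_writer;   /* pid of the most recent sync writer */
};

/*
 * Decide whether a sync writer should pause so that other processes can
 * join the transaction. The pause is skipped when the same process also
 * made the previous sync write, since nobody else is likely to join.
 */
static bool should_wait_for_batch(struct batching_model *j, int pid)
{
    bool wait = (pid != j->last_sync_writer);

    j->last_sync_writer = pid;
    return wait;
}
```

A single process doing back-to-back O_SYNC writes pays the delay at most once; the delay returns as soon as a different process performs a sync write.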
     

19 Jan, 2006

2 commits

  • We have to check that also the second checkpoint list is non-empty before
    dropping the transaction.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • While checkpointing we have to check that our transaction still is in the
    checkpoint list *and* (not or) that it's not just a different transaction
    with the same address.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
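
The "*and* (not or)" condition in the second commit above guards against a transaction structure being freed and its address reused. A minimal sketch of such a check, using hypothetical structures rather than the jbd ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ckpt_txn {
    unsigned int tid;                  /* sequence number of the transaction */
};

struct ckpt_journal {
    struct ckpt_txn *checkpoint_head;  /* transaction currently checkpointed */
};

/*
 * True only if the journal still points at the same structure AND that
 * structure still carries the tid we remembered. The tid is read only
 * after the pointer comparison has succeeded.
 */
static bool still_same_transaction(const struct ckpt_journal *j,
                                   const struct ckpt_txn *t,
                                   unsigned int remembered_tid)
{
    return j->checkpoint_head == t && t->tid == remembered_tid;
}
```

Comparing the pointer alone would accept a new transaction that happens to live at the old address, which is why both conditions have to hold.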
     

07 Jan, 2006

1 commit

  • Split the checkpoint list of the transaction into two lists. In the first
    list we keep the buffers that need to be submitted for IO. In the second
    list we keep buffers that were already submitted, for which we just have to
    wait for the IO to complete. This should simplify the handling of checkpoint
    lists a bit and may eventually also yield a performance gain.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

07 Nov, 2005

2 commits

  • This is the fs/ part of the big kfree cleanup patch.

    Remove pointless checks for NULL prior to calling kfree() in fs/.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • Add structure fields kernel-doc for 2 fields in struct journal_s.

    Warning(/var/linsrc/linux-2614-rc4//include/linux/jbd.h:808): No description found for parameter 'j_wbuf'
    Warning(/var/linsrc/linux-2614-rc4//include/linux/jbd.h:808): No description found for parameter 'j_wbufsize'

    Convert fs/jbd/recovery.c non-static functions to kernel-doc format.

    fs/jbd/recovery.c doesn't export any symbols, so it should use
    !I instead of !E to eliminate this warning message:

    Warning(/var/linsrc/linux-2614-rc4//fs/jbd/recovery.c): no structured comments found

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

28 Oct, 2005

1 commit

  • - ->releasepage() annotated (s/int/gfp_t), instances updated
    - missing gfp_t in fs/* added
    - fixed misannotation from the original sweep caught by bitwise checks:
      XFS used __nocast both for gfp_t and for flags used by XFS allocator.
      The latter left with unsigned int __nocast; we might want to add a
      different type for those but for now let's leave them alone. That,
      BTW, is a case when __nocast use had been actively confusing - it had
      been used in the same code for two different and similar types, with
      no way to catch misuses. Switch of gfp_t to bitwise had caught that
      immediately...

    One tricky bit is left alone to be dealt with later - mapping->flags is
    a mix of gfp_t and error indications. Left alone for now.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro