06 Dec, 2011

1 commit


28 Jun, 2011

1 commit

  • During journal checkpoint, we write out buffers and wait for them to
    finish. But in CFQ the async queue has a very low priority, and in our
    test, if there are too many sync queues and every queue is filled up
    with requests, the checkpointing process hangs waiting for log space.

    So this patch uses WRITE_SYNC in __flush_batch so that the requests are
    moved into a sync queue and handled by CFQ in a timely manner. We also
    use the new plugging API so that all the WRITE_SYNC requests can be
    submitted as a whole when we unplug.

    Reported-by: Robin Dong
    Signed-off-by: Tao Ma
    Signed-off-by: Jan Kara

    Tao Ma
     

27 Jun, 2011

1 commit

  • journal_remove_journal_head() can oops when trying to access journal_head
    returned by bh2jh(). This is caused for example by the following race:

    TASK1                                    TASK2
    journal_commit_transaction()
      ...
      processing t_forget list
        __journal_refile_buffer(jh);
        if (!jh->b_transaction) {
          jbd_unlock_bh_state(bh);
                                             journal_try_to_free_buffers()
                                               journal_grab_journal_head(bh)
                                               jbd_lock_bh_state(bh)
                                               __journal_try_to_free_buffer()
                                               journal_put_journal_head(jh)
          journal_remove_journal_head(bh);

    journal_put_journal_head() in TASK2 sees that b_jcount == 0 and that the
    buffer is not part of any transaction, and thus frees the journal_head
    before TASK1 gets to doing so. Note that even the buffer_head can be
    released by try_to_free_buffers() after journal_put_journal_head(),
    which adds an even larger opportunity for an oops (but I didn't see this
    happen in reality).

    Fix the problem by making transactions hold their own journal_head
    reference (in b_jcount). That way we don't have to remove the
    journal_head explicitly via journal_remove_journal_head() and instead
    just remove it when b_jcount drops to zero. As a result,
    [__]journal_refile_buffer(), [__]journal_unfile_buffer(), and
    __journal_remove_checkpoint() can free the journal_head, which requires
    modifying a few callers. We also have to be careful because once the
    journal_head is removed, the buffer_head might be freed as well, so we
    have to take our own buffer_head reference where it matters.
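    The new lifetime rule can be modeled in a few lines of plain C. This is
    a toy model, not kernel code: the names jh_grab/jh_put and the struct
    are ours, standing in for journal_grab_journal_head() /
    journal_put_journal_head() and struct journal_head.

    ```c
    #include <stdlib.h>

    /* Toy model of the fixed scheme: every holder of a journal_head,
     * including the transaction lists, owns one b_jcount reference, and
     * the head is freed only when the last reference is dropped. */
    struct journal_head_model {
    	int b_jcount;
    };

    static struct journal_head_model *jh_alloc(void)
    {
    	return calloc(1, sizeof(struct journal_head_model));
    }

    static void jh_grab(struct journal_head_model *jh)
    {
    	jh->b_jcount++;
    }

    /* Returns 1 when this put freed the head, 0 otherwise. */
    static int jh_put(struct journal_head_model *jh)
    {
    	if (--jh->b_jcount == 0) {
    		free(jh);	/* freed exactly once, by the last holder */
    		return 1;
    	}
    	return 0;
    }
    ```

    In the race above, TASK1's transaction holds one reference and TASK2's
    journal_grab_journal_head() a second one, so whichever side does the
    final put frees the head exactly once.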

    Signed-off-by: Jan Kara

    Jan Kara
     

25 Jun, 2011

1 commit

  • This commit adds fixed tracepoints for jbd, based on the fixed
    tracepoints for jbd2. The ones for collecting statistics are missing,
    since they would require a more intrusive patch and should get their
    own commit if someone decides they are needed. There are also new
    tracepoints in __journal_drop_transaction() and
    journal_update_superblock().

    The list of jbd tracepoints:

    jbd_checkpoint
    jbd_start_commit
    jbd_commit_locking
    jbd_commit_flushing
    jbd_commit_logging
    jbd_drop_transaction
    jbd_end_commit
    jbd_do_submit_data
    jbd_cleanup_journal_tail
    jbd_update_superblock_end

    Signed-off-by: Lukas Czerner
    Cc: Jan Kara
    Signed-off-by: Jan Kara

    Lukas Czerner
     

28 Oct, 2010

1 commit


18 Aug, 2010

1 commit

  • These flags aren't real I/O types; they tell ll_rw_block to always
    lock the buffer instead of giving up on a failed trylock.

    Instead, add a new write_dirty_buffer helper that implements these
    semantics and use it from the existing SWRITE* callers. Note that the
    ll_rw_block code had a bug where it didn't promote WRITE_SYNC_PLUG
    properly, which this patch fixes.

    In the ufs code, clean up the helper that used to call ll_rw_block so
    that it mirrors sync_dirty_buffer, which is the function it implements
    for compound buffers.
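    The semantics the new helper implements ("always lock, write only if
    dirty") can be sketched as a userspace model. The struct and counters
    below are ours, not the kernel's; the real helper operates on
    struct buffer_head and submits actual I/O.

    ```c
    /* Userspace model of the write_dirty_buffer semantics: always lock
     * the buffer (no trylock-and-give-up), then write it only if it was
     * dirty, clearing the dirty bit. */
    struct buffer_model {
    	int locked;
    	int dirty;
    	int writes_submitted;
    };

    static void buffer_lock(struct buffer_model *b)
    {
    	/* the real code sleeps here until the lock is free */
    	b->locked = 1;
    }

    static void buffer_unlock(struct buffer_model *b)
    {
    	b->locked = 0;
    }

    static void write_dirty_buffer_model(struct buffer_model *b)
    {
    	buffer_lock(b);
    	if (!b->dirty) {
    		/* clean buffer: nothing to write */
    		buffer_unlock(b);
    		return;
    	}
    	b->dirty = 0;
    	b->writes_submitted++;
    	/* in the kernel, I/O completion unlocks the buffer */
    	buffer_unlock(b);
    }
    ```

    A second call on the now-clean buffer submits nothing, which is
    exactly the behavior the SWRITE* callers relied on.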

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

16 Sep, 2009

1 commit


07 Nov, 2008

1 commit

  • Commit be07c4ed introduced a regression because it assumed that if
    there were no transactions ready to be checkpointed, no progress
    could be made on making space available in the journal, and so the
    journal should be aborted. This assumption is false: it could be
    that simply calling cleanup_journal_tail() will recover the
    necessary space, or, for small journals, the currently committing
    transaction could be responsible for chewing up the required space in
    the log, so we need to wait for the currently committing transaction
    to finish before trying to force a checkpoint operation.

    This patch fixes the bug reported by Meelis Roos at:
    http://bugzilla.kernel.org/show_bug.cgi?id=11937
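    The refined logic can be sketched as a toy decision function. The
    names, parameters, and enum below are ours; the real logic lives in
    __log_wait_for_space() and operates on journal state, not booleans.

    ```c
    enum action { DO_CHECKPOINT, CLEANUP_TAIL, WAIT_FOR_COMMIT, ABORT_JOURNAL };

    /* Toy decision model: aborting the journal is justified only when
     * there is nothing to checkpoint, cleanup_journal_tail() cannot
     * recover any space, and no transaction is still committing. */
    static enum action next_action(int have_checkpoint_txns,
    			       int tail_cleanup_would_free_space,
    			       int have_committing_txn)
    {
    	if (have_checkpoint_txns)
    		return DO_CHECKPOINT;
    	if (tail_cleanup_would_free_space)
    		return CLEANUP_TAIL;
    	if (have_committing_txn)
    		return WAIT_FOR_COMMIT;
    	return ABORT_JOURNAL;
    }
    ```

    The regression amounted to taking the ABORT_JOURNAL branch whenever
    the first condition was false, skipping the two middle cases.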

    Signed-off-by: "Theodore Ts'o"
    Cc: Duane Griffin
    Cc: Toshiyuki Okajima

    Theodore Ts'o
     

23 Oct, 2008

3 commits

  • The __log_wait_for_space function sits in a loop checkpointing
    transactions until there is sufficient free space in the journal.
    However, if there are no transactions to be processed (e.g. because the
    free space calculation is wrong due to a corrupted filesystem), it will
    never make progress.

    Check for space being required when no transactions are outstanding,
    and abort the journal instead of looping endlessly.

    This patch fixes the bug reported by Sami Liedes at:
    http://bugzilla.kernel.org/show_bug.cgi?id=10976
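    A minimal model of the fixed loop, assuming each checkpoint pass frees
    some space. The counters stand in for the real free-space accounting;
    the function name is ours.

    ```c
    /* Toy model: keep checkpointing while space is short, but if space
     * is still required and there is nothing left to checkpoint, abort
     * instead of spinning forever.  Returns 0 on success, -1 when the
     * journal had to be aborted (journal_abort() in the kernel). */
    static int wait_for_space_model(int space_needed, int txns_to_checkpoint)
    {
    	while (space_needed > 0) {
    		if (txns_to_checkpoint == 0)
    			return -1;	/* nothing to do: abort, don't loop */
    		txns_to_checkpoint--;
    		space_needed--;		/* each checkpoint frees some space */
    	}
    	return 0;
    }
    ```

    Before the fix, the "nothing to checkpoint" case simply iterated
    again, so a corrupted free-space calculation made the loop spin
    forever.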

    Signed-off-by: Duane Griffin
    Tested-by: Sami Liedes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Duane Griffin
     
  • __try_to_free_cp_buf(), __process_buffer(), and __wait_cp_io() test the
    BH_Uptodate flag to detect write I/O errors on metadata buffers. But
    since commit 95450f5a7e53d5752ce1a0d0b8282e10fe745ae0 "ext3: don't read
    inode block if the buffer has a write error", the BH_Uptodate flag can
    be set on inode buffers that carry BH_Write_EIO, in order to avoid
    reading old inode data. So now we have to test the BH_Write_EIO flag of
    checkpointing inode buffers instead of BH_Uptodate. This patch does
    that.
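    The flag change can be illustrated with a small model. The bit values
    and function names below are ours; the kernel uses the bh state bits
    and the buffer_uptodate()/buffer_write_io_error() test helpers.

    ```c
    /* Buffer state bits, modeled as a flag word (values ours). */
    #define BH_UPTODATE  (1u << 0)
    #define BH_WRITE_EIO (1u << 1)

    /* Old error test: a buffer that failed a write used to lose
     * BH_Uptodate, so "not uptodate" meant "write error". */
    static int old_error_check(unsigned int flags)
    {
    	return !(flags & BH_UPTODATE);
    }

    /* New error test: since ext3 now keeps such buffers uptodate (to
     * avoid re-reading stale inode data), the write error must be
     * detected via BH_Write_EIO directly. */
    static int new_error_check(unsigned int flags)
    {
    	return !!(flags & BH_WRITE_EIO);
    }
    ```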

    Signed-off-by: Hidehiro Kawai
    Acked-by: Jan Kara
    Acked-by: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
     
  • When a checkpointing IO fails, the current JBD code doesn't check the
    error and continues journaling. This means the latest metadata can be
    lost from both the journal and the filesystem.

    This patch leaves the failed metadata blocks in the journal space and
    aborts journaling in log_do_checkpoint(). To achieve this, we need to:

    1. not remove the failed buffer from the checkpoint list in the case
       of __try_to_free_cp_buf(), because it may be released or
       overwritten by a later transaction
    2. in log_do_checkpoint(), which is the last chance, remove the
       failed buffer from the checkpoint list and abort the journal
    3. when checkpointing fails, not update the journal super block, to
       prevent the journaled contents from being cleaned; for safety,
       don't update j_tail and j_tail_sequence either
    4. when checkpointing fails, notify the ext3 layer of the error so
       that ext3 doesn't clear the needs_recovery flag, otherwise the
       journaled contents are ignored and cleaned in the recovery phase
    5. if the recovery fails, keep the needs_recovery flag
    6. prevent cleanup_journal_tail() from being called between
       __journal_drop_transaction() and journal_abort() (a race between
       journal_flush() and __log_wait_for_space())
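    Steps 2 and 3 can be sketched as a toy model: on an I/O error the
    failed buffer is dropped as a last resort, the journal is marked
    aborted, and the superblock (journal tail) update is skipped so the
    journaled copies survive for recovery. All names and fields are ours.

    ```c
    struct journal_model {
    	int aborted;
    	int cp_buffers;		/* buffers left on the checkpoint list */
    	int tail_updates;	/* times the super block tail was advanced */
    };

    static void checkpoint_model(struct journal_model *j, int io_error)
    {
    	if (io_error) {
    		j->cp_buffers--;	/* last chance: drop the failed buffer */
    		j->aborted = 1;		/* journal_abort() */
    		return;			/* step 3: no superblock update */
    	}
    	j->cp_buffers--;
    	j->tail_updates++;		/* safe to advance j_tail */
    }
    ```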

    Signed-off-by: Hidehiro Kawai
    Acked-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
     

30 Jan, 2008

1 commit

  • The break_lock data structure and code for spinlocks is quite nasty.
    Not only does it double the size of a spinlock but it changes locking to
    a potentially less optimal trylock.

    Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a
    __raw_spin_is_contended that uses the lock data itself to determine whether
    there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is
    not set.

    Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to
    decouple it from the spinlock implementation, and make it typesafe (rwlocks
    do not have any need_lockbreak sites -- why do they even get bloated up
    with that break_lock then?).

    Signed-off-by: Nick Piggin
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Nick Piggin
     

06 Dec, 2007

1 commit

  • Before we start committing a transaction, we call
    __journal_clean_checkpoint_list() to clean up the transaction's
    written-back buffers.

    If this call happens to remove all of them (and there were already some
    buffers), __journal_remove_checkpoint() will decide to free the
    transaction because it isn't (yet) a committing transaction, and soon
    we fail some assertion - the transaction really isn't ready to be
    freed :).

    We change the check in __journal_remove_checkpoint() to free only a
    transaction in the T_FINISHED state. The locking there is subtle though
    (as everywhere in JBD ;(). We use j_list_lock to protect the check and
    the subsequent call to __journal_drop_transaction(), and do the same at
    the end of journal_commit_transaction(), which is the only place where
    a transaction can get to the T_FINISHED state.

    Probably I'm too paranoid here and such locking is not really
    necessary - checkpoint lists are processed only from
    log_do_checkpoint(), where a transaction must already be committed to
    be processed, or from __journal_clean_checkpoint_list(), which
    kjournald itself calls, so the transaction cannot change state there
    either. Better to be safe in case something changes in the future...
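    The changed check can be modeled as two predicates. This is a sketch
    of the rule, not the kernel code: the enum is reduced to the states
    that matter here and both function names are ours.

    ```c
    /* Transaction states, reduced to the ones that matter here. */
    enum t_state { T_RUNNING, T_COMMIT, T_FINISHED };

    /* Old rule (modeled): free any transaction with empty checkpoint
     * lists that is not currently committing -- which wrongly matched a
     * still-running transaction whose written-back buffers had just
     * been cleaned. */
    static int old_may_free(enum t_state s, int lists_empty)
    {
    	return lists_empty && s != T_COMMIT;
    }

    /* New rule: only a fully finished transaction may be freed. */
    static int new_may_free(enum t_state s, int lists_empty)
    {
    	return lists_empty && s == T_FINISHED;
    }
    ```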

    Signed-off-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

09 May, 2007

1 commit


27 Sep, 2006

3 commits


23 Jun, 2006

1 commit

  • Split the checkpoint list of the transaction into two lists. In the
    first list we keep the buffers that need to be submitted for IO. In the
    second list we keep the buffers that were already submitted, for which
    we just have to wait for the IO to complete. This should simplify the
    handling of checkpoint lists a bit and can eventually also be a
    performance gain.
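    The two-list flow can be sketched with plain counters standing in for
    the real lists (t_checkpoint_list / t_checkpoint_io_list in the
    kernel); the struct and function names below are ours.

    ```c
    /* Toy model of the split: buffers move from the "to submit" list to
     * the "waiting for I/O" list once submitted. */
    struct cp_lists {
    	int to_submit;	/* buffers still needing submission */
    	int io_pending;	/* buffers submitted, awaiting completion */
    };

    static void submit_one(struct cp_lists *cp)
    {
    	if (cp->to_submit > 0) {
    		cp->to_submit--;
    		cp->io_pending++;
    	}
    }

    static void io_complete_one(struct cp_lists *cp)
    {
    	if (cp->io_pending > 0)
    		cp->io_pending--;	/* buffer leaves checkpointing */
    }
    ```

    Keeping the two phases on separate lists means the submit pass never
    has to re-scan buffers that are merely waiting for completion.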

    Signed-off-by: Jan Kara
    Cc: Mark Fasheh
    Cc: "Stephen C. Tweedie"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

23 Mar, 2006

1 commit


15 Feb, 2006

1 commit

  • This patch reverts commit f93ea411b73594f7d144855fd34278bcf34a9afc:
    [PATCH] jbd: split checkpoint lists

    This broke journal_flush() for OCFS2, which is its method of being sure
    that metadata is sent to disk for another node.

    And two related commits 8d3c7fce2d20ecc3264c8d8c91ae3beacdeaed1b and
    43c3e6f5abdf6acac9b90c86bf03f995bf7d3d92 with the subjects:
    [PATCH] jbd: log_do_checkpoint fix
    [PATCH] jbd: remove_transaction fix

    These seem to be incremental bugfixes on the original patch and as such are
    no longer needed.

    Signed-off-by: Mark Fasheh
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
     

19 Jan, 2006

1 commit

  • While checkpointing we have to check that our transaction is still in
    the checkpoint list *and* (not or) that it's not just a different
    transaction with the same address.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

07 Jan, 2006

1 commit

  • Split the checkpoint list of the transaction into two lists. In the
    first list we keep the buffers that need to be submitted for IO. In the
    second list we keep the buffers that were already submitted, for which
    we just have to wait for the IO to complete. This should simplify the
    handling of checkpoint lists a bit and can eventually also be a
    performance gain.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

08 Sep, 2005

1 commit


03 Jun, 2005

2 commits

  • Fix a bug in list scanning that can cause us to skip the last buffer
    on the checkpoint list (and hence fail to make any progress under some
    rather unfavorable conditions).

    The problem is that we first do jh = next_jh and only then test

    } while (jh != last_jh);

    Hence we skip the last buffer on the list (if it was not the only
    buffer on the list). As we already do jh = next_jh; at the beginning of
    the loop, we are safe to just remove the assignment at the end. It can
    happen that 'jh' has been freed at the point we test jh != last_jh, but
    that does not matter as we never *dereference* the pointer.
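    The off-by-one is easy to reproduce on a toy circular list. The struct
    and function names below are ours; the real loop walks journal_heads
    via b_cpnext in fs/jbd/checkpoint.c.

    ```c
    /* Minimal circular list mimicking the checkpoint list linkage. */
    struct jh_model {
    	struct jh_model *next;
    	int visited;
    };

    /* Buggy scan: the extra jh = next_jh advance at the end of the body
     * makes the while (jh != last) test fire before the last element is
     * processed (unless it is the only element). */
    static int scan_buggy(struct jh_model *first, struct jh_model *last)
    {
    	int n = 0;
    	struct jh_model *jh, *next_jh = first;
    	do {
    		jh = next_jh;
    		next_jh = jh->next;
    		jh->visited = 1;
    		n++;
    		jh = next_jh;		/* the assignment the fix removes */
    	} while (jh != last);
    	return n;
    }

    /* Fixed scan: advance only at the top, so the exit test still sees
     * the element just processed and the last buffer is not skipped. */
    static int scan_fixed(struct jh_model *first, struct jh_model *last)
    {
    	int n = 0;
    	struct jh_model *jh, *next_jh = first;
    	do {
    		jh = next_jh;
    		next_jh = jh->next;
    		jh->visited = 1;
    		n++;
    	} while (jh != last);
    	return n;
    }
    ```

    On a three-element list the buggy scan visits only two buffers and
    never touches the last one; the fixed scan visits all three.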

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Fix a possible false assertion failure in log_do_checkpoint(). We
    might fail to detect that we actually made progress when cleaning up
    the checkpoint lists if we don't retry after writing something to disk.
    The patch was confirmed to fix the observed assertion failures for
    several users.

    When we have flushed some buffers, we need to retry scanning the list;
    otherwise we can fail to detect our progress.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds