18 Oct, 2015

1 commit

  • Contrary to its comments and the expectations of its callers,
    journal_clean_one_cp_list() returned 1 not only if it freed the
    transaction but also if it freed some buffers in the transaction.
    That could make __jbd2_journal_clean_checkpoint_list() skip processing
    t_checkpoint_io_list and continue with the next transaction. This is
    mostly a cosmetic issue, since the only result is that we can sometimes
    free less memory than we could, but it is still worth fixing. Fix
    journal_clean_one_cp_list() to return 1 only if the transaction was
    really freed.

    Fixes: 50849db32a9f529235a84bcc84a6b8e631b1d0ec
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Jan Kara
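
    The return-value contract described above can be illustrated with a
    small user-space model. This is only a sketch: the toy_* types and
    functions are hypothetical stand-ins, not the kernel's structures, and
    locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a checkpoint buffer: only tracks whether it is busy. */
struct toy_buf {
    bool busy;              /* true: still dirty or locked, cannot be released */
    struct toy_buf *next;   /* singly linked, NULL-terminated */
};

/* Toy stand-in for a transaction with its two checkpoint lists. */
struct toy_txn {
    struct toy_buf *cp_list;     /* models t_checkpoint_list */
    struct toy_buf *cp_io_list;  /* models t_checkpoint_io_list */
};

static bool toy_txn_empty(const struct toy_txn *t)
{
    return t->cp_list == NULL && t->cp_io_list == NULL;
}

/*
 * Release buffers from the head of *list until a busy one is found.
 * Returns 1 only if the whole transaction became empty ("was freed"),
 * and 0 otherwise, even when some buffers were released.
 */
static int toy_clean_one_cp_list(struct toy_txn *t, struct toy_buf **list)
{
    while (*list != NULL && !(*list)->busy)
        *list = (*list)->next;
    return toy_txn_empty(t) ? 1 : 0;
}

/*
 * Caller: the second list is skipped only when the first call reports
 * that the transaction itself is gone. Returns 1 if the transaction
 * was freed.
 */
static int toy_clean_txn(struct toy_txn *t)
{
    if (toy_clean_one_cp_list(t, &t->cp_list))
        return 1;
    return toy_clean_one_cp_list(t, &t->cp_io_list);
}
```

    With the pre-fix behaviour (returning 1 whenever any buffer was
    released), the caller in this model would skip cp_io_list even though
    the transaction still exists.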
     

29 Jul, 2015

1 commit

  • Commit 6f6a6fda2945 "jbd2: fix ocfs2 corrupt when updating journal
    superblock fails" changed jbd2_cleanup_journal_tail() to return EIO
    when the journal is aborted. That makes the logic in
    jbd2_log_do_checkpoint() bail out, which is fine, except that
    jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make
    progress in cleaning the journal. Without it, jbd2_journal_destroy()
    just loops forever.

    Fix jbd2_journal_destroy() to clean up the journal checkpoint lists if
    jbd2_log_do_checkpoint() fails with an error.

    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Fixes: 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
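
    A minimal user-space sketch of the loop shape this fix implies. The
    toy_* names are hypothetical, and the fallback that discards the
    remaining checkpoint state is an assumption about how "clean up the
    checkpoint lists" could look, not a copy of the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_journal {
    int checkpoint_txns;   /* transactions still on the checkpoint list */
    bool aborted;          /* models an aborted journal */
};

/* Checkpoints one transaction, or fails with -EIO on an aborted journal. */
static int toy_log_do_checkpoint(struct toy_journal *j)
{
    if (j->aborted)
        return -EIO;
    if (j->checkpoint_txns > 0)
        j->checkpoint_txns--;
    return 0;
}

/* Hypothetical fallback: drop all remaining checkpoint state without I/O. */
static void toy_destroy_checkpoint(struct toy_journal *j)
{
    j->checkpoint_txns = 0;
}

/*
 * Teardown loop. Without the error branch, an aborted journal would never
 * make progress and the loop would spin forever. Returns the number of
 * loop iterations so the behaviour is observable.
 */
static int toy_journal_destroy(struct toy_journal *j)
{
    int iterations = 0;

    while (j->checkpoint_txns > 0) {
        iterations++;
        if (toy_log_do_checkpoint(j) != 0) {
            toy_destroy_checkpoint(j);
            break;
        }
    }
    return iterations;
}
```

    In this model a healthy journal with three pending transactions takes
    three iterations, while an aborted one exits after the first.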
     

16 Jun, 2015

1 commit

  • If updating the journal superblock fails after journal data has been
    flushed, the error is dropped and the caller is misled into treating
    this as a normal case. In ocfs2, the checkpoint will be treated as
    successful and the other node can get the lock to update. Since
    sb_start still points to the old log block, the other node will
    rewrite the journal data during journal recovery. Thus the new updates
    will be overwritten and ocfs2 is corrupted. So in the above case we
    have to return the error; ocfs2_commit_cache will handle the error and
    prevent the other node from updating first. Only after recovering the
    journal can it do the new updates.

    The issue discussion mail can be found at:
    https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.html
    http://comments.gmane.org/gmane.comp.file-systems.ext4/48841

    [ Fixed bug in patch which allowed a non-negative error return from
    jbd2_cleanup_journal_tail() to leak out of jbd2_journal_flush(); this
    was causing xfstests ext4/306 to fail. -- Ted ]

    Reported-by: Yiwen Jiang
    Signed-off-by: Joseph Qi
    Signed-off-by: Theodore Ts'o
    Tested-by: Yiwen Jiang
    Cc: Junxiao Bi
    Cc: stable@vger.kernel.org

    Joseph Qi
     

15 Jun, 2015

1 commit

  • jbd2_cleanup_journal_tail() can be invoked by jbd2__journal_start(),
    so allocations should be done with GFP_NOFS.

    [Full stack trace snipped from 3.10-rh7]
    [] dump_stack+0x19/0x1b
    [] warn_slowpath_common+0x61/0x80
    [] warn_slowpath_null+0x1a/0x20
    [] slab_pre_alloc_hook.isra.31.part.32+0x15/0x17
    [] kmem_cache_alloc+0x55/0x210
    [] ? mempool_alloc_slab+0x15/0x20
    [] mempool_alloc_slab+0x15/0x20
    [] mempool_alloc+0x69/0x170
    [] ? _raw_spin_unlock_irq+0xe/0x20
    [] ? finish_task_switch+0x5d/0x150
    [] bio_alloc_bioset+0x1be/0x2e0
    [] blkdev_issue_flush+0x99/0x120
    [] jbd2_cleanup_journal_tail+0x93/0xa0 [jbd2] -->GFP_KERNEL
    [] jbd2_log_do_checkpoint+0x221/0x4a0 [jbd2]
    [] __jbd2_log_wait_for_space+0xa7/0x1e0 [jbd2]
    [] start_this_handle+0x2d8/0x550 [jbd2]
    [] ? __memcg_kmem_put_cache+0x29/0x30
    [] ? kmem_cache_alloc+0x130/0x210
    [] jbd2__journal_start+0xba/0x190 [jbd2]
    [] ? lru_cache_add+0xe/0x10
    [] ? ext4_da_write_begin+0xf9/0x330 [ext4]
    [] __ext4_journal_start_sb+0x77/0x160 [ext4]
    [] ext4_da_write_begin+0xf9/0x330 [ext4]
    [] generic_file_buffered_write_iter+0x10c/0x270
    [] __generic_file_write_iter+0x178/0x390
    [] __generic_file_aio_write+0x8b/0xb0
    [] generic_file_aio_write+0x5d/0xc0
    [] ext4_file_write+0xa9/0x450 [ext4]
    [] ? pipe_read+0x379/0x4f0
    [] do_sync_write+0x90/0xe0
    [] vfs_write+0xbd/0x1e0
    [] SyS_write+0x58/0xb0
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Dmitry Monakhov
     

18 Sep, 2014

2 commits

  • __jbd2_journal_clean_checkpoint_list() returns the number of buffers it
    freed, but no one was using the value, so just stop doing that. This
    also allows for simplifying the calling convention of
    journal_clean_one_cp_list().

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Yuanhan has reported that when he runs an fsync(2)-heavy workload
    creating new files over a ramdisk, a significant amount of time is spent
    in __jbd2_journal_clean_checkpoint_list() trying to clean old
    transactions (which cannot be cleaned up because the flusher hasn't yet
    checkpointed those buffers). The workload can be generated by:
    fs_mark -d /fs/ram0/1 -D 2 -N 2560 -n 1000000 -L 1 -S 1 -s 4096

    Reduce the amount of scanning by stopping the scan of the transaction
    list once we find a transaction that cannot be checkpointed. Note that
    this way of cleaning is still enough to keep freeing space in the
    journal after fully checkpointed transactions.

    Reported-and-tested-by: Yuanhan Liu
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
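
    The early-exit scan can be sketched as follows. This is an
    illustrative user-space model with hypothetical toy_* names; the real
    list is a linked list protected by j_list_lock, which is not modelled
    here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one transaction on the checkpoint list. */
struct toy_txn {
    bool fully_checkpointed;  /* all of its buffers have been written back */
    bool released;            /* set once the scan has dropped it */
};

/*
 * Walk the array from the oldest transaction and release every fully
 * checkpointed one, but stop at the first transaction that still has
 * unwritten buffers. Returns the number of transactions examined, so the
 * amount of scanning is visible to the caller.
 */
static size_t toy_clean_checkpoint_list(struct toy_txn *txns, size_t n)
{
    size_t examined = 0;

    for (size_t i = 0; i < n; i++) {
        examined++;
        if (!txns[i].fully_checkpointed)
            break;              /* later transactions are not scanned */
        txns[i].released = true;
    }
    return examined;
}
```

    Transactions behind the first blocked one are left alone even if they
    are clean. That costs some memory, but the oldest transactions, which
    are the ones that free journal space, are still released.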
     

17 Sep, 2014

1 commit

  • If EIO happens after we have dropped j_state_lock, we won't notice
    that the journal has been aborted. So it is reasonable to move this
    check to after we have grabbed j_checkpoint_mutex and re-grabbed
    j_state_lock. This patch helps to prevent a false-positive complaint
    after EIO.

    #DMESG:
    __jbd2_log_wait_for_space: needed 8448 blocks and only had 8386 space available
    __jbd2_log_wait_for_space: no way to get more journal space in ram1-8
    ------------[ cut here ]------------
    WARNING: CPU: 15 PID: 6739 at fs/jbd2/checkpoint.c:168 __jbd2_log_wait_for_space+0x188/0x200()
    Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
    CPU: 15 PID: 6739 Comm: fsstress Tainted: G W 3.17.0-rc2-00429-g684de57 #139
    Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
    00000000000000a8 ffff88077aaab878 ffffffff815c1a8c 00000000000000a8
    0000000000000000 ffff88077aaab8b8 ffffffff8106ce8c ffff88077aaab898
    ffff8807c57e6000 ffff8807c57e6028 0000000000002100 ffff8807c57e62f0
    Call Trace:
    [] dump_stack+0x51/0x6d
    [] warn_slowpath_common+0x8c/0xc0
    [] warn_slowpath_null+0x1a/0x20
    [] __jbd2_log_wait_for_space+0x188/0x200
    [] start_this_handle+0x4da/0x7b0
    [] ? local_clock+0x25/0x30
    [] ? lockdep_init_map+0xe7/0x180
    [] jbd2__journal_start+0xdc/0x1d0
    [] ? __ext4_new_inode+0x7f4/0x1330
    [] __ext4_journal_start_sb+0xf8/0x110
    [] __ext4_new_inode+0x7f4/0x1330
    [] ? lock_release_holdtime+0x29/0x190
    [] ext4_create+0x8b/0x150
    [] vfs_create+0x7b/0xb0
    [] do_last+0x7db/0xcf0
    [] ? inode_permission+0x4d/0x50
    [] path_openat+0x242/0x590
    [] ? __alloc_fd+0x36/0x140
    [] do_filp_open+0x4a/0xb0
    [] ? __alloc_fd+0x121/0x140
    [] do_sys_open+0x170/0x220
    [] SyS_open+0x1e/0x20
    [] SyS_creat+0x16/0x20
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace cd71c831f82059db ]---

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Theodore Ts'o

    Dmitry Monakhov
     

05 Sep, 2014

2 commits

  • When we discover a written-out buffer in a transaction's checkpoint
    list, we don't have to recheck the validity of the transaction. Either
    this is the last buffer in the transaction - and then we are done - or
    it isn't, and then we can just take another buffer from the checkpoint
    list without dropping j_list_lock.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • The __jbd2_journal_remove_checkpoint() doesn't require an elevated
    b_count; indeed, until the jh structure gets released by the call to
    jbd2_journal_put_journal_head(), the bh's b_count is elevated by
    virtue of the existence of the jh structure.

    Suggested-by: Jan Kara
    Reviewed-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

02 Sep, 2014

2 commits

  • __wait_cp_io() is only called by jbd2_log_do_checkpoint(). Fold it in
    to make it a bit easier to understand.

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     
  • __process_buffer() is only called by jbd2_log_do_checkpoint(), and it
    had a very complex locking protocol where it would be called with the
    j_list_lock, and sometimes exit with the lock held (if the return code
    was 0), or release the lock.

    This was confusing both to humans and to smatch (which erroneously
    complained that the lock was taken twice).

    Folding __process_buffer() to the caller allows us to simplify the
    control flow, making the resulting function easier to read and reason
    about, and dropping the compiled size of fs/jbd2/checkpoint.c by 150
    bytes (over 4% of the text size).

    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Theodore Ts'o
     

13 Jun, 2013

1 commit

  • While trying to debug an issue under extreme I/O loading
    on preempt-rt kernels, the following backtrace was observed
    via SysRQ output:

    rm D ffff8802203afbc0 4600 4878 4748 0x00000000
    ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
    ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
    ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
    Call Trace:
    [] schedule+0x24/0x70
    [] jbd2_log_wait_commit+0xbd/0x140
    [] ? __init_waitqueue_head+0x50/0x50
    [] jbd2_log_do_checkpoint+0xf5/0x520
    [] __jbd2_log_wait_for_space+0xa9/0x1f0
    [] start_this_handle.isra.10+0x2e0/0x530
    [] ? __init_waitqueue_head+0x50/0x50
    [] jbd2__journal_start+0xc3/0x110
    [] ? ext4_rmdir+0x6e/0x230
    [] jbd2_journal_start+0xe/0x10
    [] ext4_journal_start_sb+0x5b/0x160
    [] ext4_rmdir+0x6e/0x230
    [] vfs_rmdir+0xd5/0x140
    [] do_rmdir+0xdf/0x120
    [] ? task_work_run+0x44/0x80
    [] ? do_notify_resume+0x89/0x100
    [] ? int_signal+0x12/0x17
    [] sys_unlinkat+0x25/0x40
    [] system_call_fastpath+0x16/0x1b

    What is interesting here, is that we call log_wait_commit, from
    within wait_for_space, but we are still holding the checkpoint_mutex
    as it surrounds mostly the whole of wait_for_space. And then, as we
    are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
    bit is set, then we will also try to take the same checkpoint_mutex.

    It seems that we need to drop the checkpoint_mutex while sitting in
    jbd2_log_wait_commit, if we want to guarantee that progress can be made
    by jbd2_journal_commit_transaction(). There does not seem to be
    anything preempt-rt specific about this, other than perhaps increasing
    the odds of it happening.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: "Theodore Ts'o"

    Paul Gortmaker
     

05 Jun, 2013

4 commits

  • j_wait_logspace and j_wait_checkpoint are unused. Remove them.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • __jbd2_log_space_left() and jbd_space_needed() were kind of odd.
    jbd_space_needed() accounted also credits needed for currently
    committing transaction while it didn't account for credits needed for
    control blocks. __jbd2_log_space_left() then accounted for control
    blocks as a fraction of free space. Since results of these two
    functions are always only compared against each other, this works
    correctly but is somewhat strange. Move the estimates so that
    jbd_space_needed() returns number of blocks needed for a transaction
    including control blocks and __jbd2_log_space_left() returns free
    space in the journal (with the committing transaction already
    subtracted). Rename functions to jbd2_log_space_left() and
    jbd2_space_needed() while we are changing them.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
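
    The reworked accounting can be sketched in user space as below. The
    toy_* names are hypothetical, and the two constants (a 1/32 allowance
    for control blocks and a 32-block safety margin) are assumptions chosen
    for illustration rather than values stated in the commit message.

```c
#include <assert.h>

#define TOY_CONTROL_BLOCKS_SHIFT 5   /* assumption: 1/32 overhead for control blocks */
#define TOY_ROUNDING_SLACK 32UL      /* assumption: safety margin, in blocks */

struct toy_journal {
    unsigned long free_blocks;         /* free blocks in the log */
    unsigned long max_txn_buffers;     /* largest allowed transaction */
    unsigned long committing_credits;  /* 0 if nothing is committing */
};

/* Blocks needed for one maximal transaction, including control blocks. */
static unsigned long toy_space_needed(const struct toy_journal *j)
{
    unsigned long n = j->max_txn_buffers;

    return n + (n >> TOY_CONTROL_BLOCKS_SHIFT);
}

/*
 * Free space in the log with the committing transaction (and its control
 * blocks) already subtracted. Clamped at zero to avoid unsigned wrap.
 */
static unsigned long toy_log_space_left(const struct toy_journal *j)
{
    unsigned long c = j->committing_credits;
    unsigned long used = c + (c >> TOY_CONTROL_BLOCKS_SHIFT) + TOY_ROUNDING_SLACK;

    return j->free_blocks > used ? j->free_blocks - used : 0;
}
```

    Both functions now return plain block counts, so a caller can compare
    toy_log_space_left() against toy_space_needed() directly.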
     
  • As with metadata buffers, log descriptor buffers don't really need
    the journal head. So strip it and remove the BJ_LogCtl list.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • When writing metadata to the journal, we create temporary buffer heads
    for that task. We also attach journal heads to these buffer heads but
    the only purpose of the journal heads is to keep buffers linked in
    transaction's BJ_IO list. We remove the need for journal heads by
    reusing buffer_head's b_assoc_buffers list for that purpose. Also
    since BJ_IO list is just a temporary list for transaction commit, we
    use a private list in jbd2_journal_commit_transaction() for that thus
    removing BJ_IO list from transaction completely.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

14 Mar, 2012

4 commits

  • All accesses to checkpointing entries in journal_head are protected
    by j_list_lock. Thus __jbd2_journal_remove_checkpoint() doesn't really
    need bh_state lock.

    Also the only part of journal head that the rest of checkpointing code
    needs to check is jh->b_transaction which is safe to read under
    j_list_lock.

    So we can safely remove bh_state lock from all of checkpointing code which
    makes it considerably prettier.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • BH_JWrite bit should be set when buffer is written to the journal. So
    checkpointing shouldn't set this bit when writing out buffer. This didn't
    cause any observable bug since BH_JWrite bit is used only for debugging
    purposes but it's good to have this consistent.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
    checkpointed buffers are on stable storage - especially if buffers were
    written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's
    caches. Thus when we update journal superblock effectively removing old
    transaction from journal, this write of superblock can get to stable storage
    before those checkpointed buffers which can result in filesystem corruption
    after a crash. Thus we must unconditionally issue a cache flush before we
    update journal superblock in these cases.

    A similar problem can also occur if journal superblock is written only in
    disk's caches, other transaction starts reusing space of the transaction
    cleaned from the log and power failure happens. Subsequent journal replay would
    still try to replay the old transaction but some of its blocks may be already
    overwritten by the new transaction. For this reason we must use WRITE_FUA when
    updating log tail and we must first write new log tail to disk and update
    in-memory information only after that.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
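
    The ordering described above (flush the cache, write the superblock
    with FUA, and only then update in-memory state) can be modelled with a
    small trace recorder. Everything named toy_* is hypothetical, and no
    real I/O is performed; the sketch only shows the sequence and the
    failure path.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_step { TOY_FLUSH_CACHE = 1, TOY_WRITE_SB_FUA, TOY_UPDATE_MEMORY };

struct toy_journal {
    unsigned long tail_on_disk;
    unsigned long tail_in_memory;
    bool sb_write_fails;        /* knob to simulate a failed superblock write */
    enum toy_step trace[4];     /* steps in the order they happened */
    int nsteps;
};

static void toy_record(struct toy_journal *j, enum toy_step s)
{
    assert(j->nsteps < 4);
    j->trace[j->nsteps++] = s;
}

/* Returns 0 on success, -1 if the superblock write failed. */
static int toy_update_log_tail(struct toy_journal *j, unsigned long new_tail)
{
    /* 1. Checkpointed buffers must be durable before the tail moves. */
    toy_record(j, TOY_FLUSH_CACHE);

    /* 2. The new tail must itself be durable before the space is reused. */
    toy_record(j, TOY_WRITE_SB_FUA);
    if (j->sb_write_fails)
        return -1;              /* in-memory tail stays untouched */
    j->tail_on_disk = new_tail;

    /* 3. Only now may the in-memory view advance. */
    toy_record(j, TOY_UPDATE_MEMORY);
    j->tail_in_memory = new_tail;
    return 0;
}
```

    On a failed superblock write the in-memory tail is never advanced, so
    the model cannot start reusing log space that replay might still need.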
     
  • There are three cases of updating journal superblock. In the first case, we want
    to mark journal as empty (setting s_sequence to 0), in the second case we want
    to update log tail, in the third case we want to update s_errno. Split these
    cases into separate functions. It makes the code slightly more straightforward
    and later patches will make the distinction even more important.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

21 Feb, 2012

2 commits


06 Dec, 2011

1 commit


28 Jun, 2011

1 commit

  • In journal checkpoint, we write the buffer and wait for it to finish.
    But in cfq, the async queue has a very low priority, and in our test,
    if there are too many sync queues and every queue is filled up with
    requests, the write request will be delayed for quite a long time and
    all the tasks which are waiting for journal space will end with errors like:

    INFO: task attr_set:3816 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    attr_set D ffff880028393480 0 3816 1 0x00000000
    ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
    ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
    ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
    Call Trace:
    [] ? __dequeue_entity+0x33/0x38
    [] ? need_resched+0x23/0x2d
    [] ? thread_return+0xa2/0xbc
    [] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
    [] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
    [] __mutex_lock_common+0x14e/0x1a9
    [] ? brelse+0x13/0x15 [ext4]
    [] __mutex_lock_slowpath+0x19/0x1b
    [] mutex_lock+0x1b/0x32
    [] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
    [] start_this_handle+0x438/0x527 [jbd2]
    [] ? autoremove_wake_function+0x0/0x3e
    [] jbd2_journal_start+0xa1/0xcc [jbd2]
    [] ext4_journal_start_sb+0x57/0x81 [ext4]
    [] ext4_xattr_set+0x6c/0xe3 [ext4]
    [] ext4_xattr_user_set+0x42/0x4b [ext4]
    [] generic_setxattr+0x6b/0x76
    [] __vfs_setxattr_noperm+0x47/0xc0
    [] vfs_setxattr+0x7f/0x9a
    [] setxattr+0xb5/0xe8
    [] ? do_filp_open+0x571/0xa6e
    [] sys_fsetxattr+0x6b/0x91
    [] system_call_fastpath+0x16/0x1b

    So this patch uses WRITE_SYNC in __flush_batch so that the request will
    be moved into the sync queue and handled by cfq in a timely manner. We
    also use the new plug, so that all the WRITE_SYNC requests can be
    submitted as a whole when we unplug it.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"
    Cc: Jan Kara
    Reported-by: Robin Dong

    Tao Ma
     

14 Jun, 2011

1 commit

  • jbd2_journal_remove_journal_head() can oops when trying to access
    journal_head returned by bh2jh(). This is caused for example by the
    following race:

    TASK1                                   TASK2
    jbd2_journal_commit_transaction()
      ...
      processing t_forget list
        __jbd2_journal_refile_buffer(jh);
        if (!jh->b_transaction) {
          jbd_unlock_bh_state(bh);
                                            jbd2_journal_try_to_free_buffers()
                                              jbd2_journal_grab_journal_head(bh)
                                              jbd_lock_bh_state(bh)
                                              __journal_try_to_free_buffer()
                                              jbd2_journal_put_journal_head(jh)
          jbd2_journal_remove_journal_head(bh);

    jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
    buffer is not part of any transaction and thus frees journal_head
    before TASK1 gets to doing so. Note that even buffer_head can be
    released by try_to_free_buffers() after
    jbd2_journal_put_journal_head() which adds even larger opportunity for
    oops (but I didn't see this happen in reality).

    Fix the problem by making transactions hold their own journal_head
    reference (in b_jcount). That way we don't have to remove journal_head
    explicitly via jbd2_journal_remove_journal_head() and instead just
    remove journal_head when b_jcount drops to zero. The result of this is
    that [__]jbd2_journal_refile_buffer(),
    [__]jbd2_journal_unfile_buffer(), and
    __jbd2_journal_remove_checkpoint() can free journal_head which needs
    modification of a few callers. Also we have to be careful because once
    journal_head is removed, buffer_head might be freed as well. So we
    have to get our own buffer_head reference where it matters.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
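
    The reference-counting idea behind this fix can be shown with a
    minimal user-space sketch. The toy_* names are hypothetical, the
    counter only models b_jcount, and the sketch is single-threaded: it
    shows the ownership rule, not the locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy journal head: freed automatically when the last reference goes away. */
struct toy_jh {
    int jcount;         /* models b_jcount */
    bool *freed_flag;   /* set on release, so the demo can observe it */
};

static struct toy_jh *toy_jh_new(bool *freed_flag)
{
    struct toy_jh *jh = malloc(sizeof(*jh));

    if (jh != NULL) {
        jh->jcount = 0;
        jh->freed_flag = freed_flag;
    }
    return jh;
}

static void toy_jh_get(struct toy_jh *jh)
{
    jh->jcount++;
}

/* Drop one reference; the structure is released when the count hits zero. */
static void toy_jh_put(struct toy_jh *jh)
{
    assert(jh->jcount > 0);
    if (--jh->jcount == 0) {
        *jh->freed_flag = true;
        free(jh);
    }
}
```

    If the transaction holds its own reference for as long as the buffer
    is filed on it, a concurrent grab-and-put by another task (TASK2 in
    the race above) can no longer drop the count to zero behind the
    transaction's back.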
     

28 Oct, 2010

2 commits


17 Sep, 2010

1 commit

  • All the blkdev_issue_* helpers can only sanely be used for synchronous
    callers. To issue cache flushes or barriers asynchronously the caller needs
    to set up a bio by itself with a completion callback to move the asynchronous
    state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always
    specified when calling blkdev_issue_* and also remove the now unused flags
    argument to blkdev_issue_flush and blkdev_issue_zeroout. For
    blkdev_issue_discard we need to keep it for the secure discard flag, which
    gains a more descriptive name and loses the bitops vs flag confusion.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

18 Aug, 2010

1 commit

  • These flags aren't real I/O types, but tell ll_rw_block to always
    lock the buffer instead of giving up on a failed trylock.

    Instead add a new write_dirty_buffer helper that implements this semantic
    and use it from the existing SWRITE* callers. Note that the ll_rw_block
    code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
    this patch fixes.

    In the ufs code clean up the helper that used to call ll_rw_block
    to mirror sync_dirty_buffer, which is the function it implements for
    compound buffers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

04 Aug, 2010

1 commit


02 Aug, 2010

1 commit


29 Apr, 2010

1 commit


23 Dec, 2009

2 commits

  • Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • This is a bit complicated because we are trying to optimize when we
    send barriers to the fs data disk. We could just throw in an extra
    barrier to the data disk whenever we send a barrier to the journal
    disk, but that's not always strictly necessary.

    We only need to send a barrier during a commit when there are data
    blocks which must be written out due to an inode written in
    ordered mode, or if fsync() depends on the commit to force data blocks
    to disk. Finally, before we drop transactions from the beginning of
    the journal during a checkpoint operation, we need to guarantee that
    any blocks that were flushed out to the data disk are firmly on the
    rust platter before we drop the transaction from the journal.

    Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

30 Sep, 2009

1 commit

  • The /proc/fs/jbd2//history was maintained manually; by using
    tracepoints, we can get all of the existing functionality of the /proc
    file plus extra capabilities thanks to the ftrace infrastructure. We
    save memory as a bonus.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

17 Jun, 2009

1 commit


07 Nov, 2008

2 commits

  • Avoid freeing the transaction in __jbd2_journal_drop_transaction() so
    the journal commit callback can run without holding j_list_lock, to
    avoid lock contention on this spinlock.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • Commit 23f8b79e introduced a regression because it assumed that if
    there were no transactions ready to be checkpointed, that no progress
    could be made on making space available in the journal, and so the
    journal should be aborted. This assumption is false; it could be the
    case that simply calling jbd2_cleanup_journal_tail() will recover the
    necessary space, or, for small journals, the currently committing
    transaction could be responsible for chewing up the required space in
    the log, so we need to wait for the currently committing transaction
    to finish before trying to force a checkpoint operation.

    This patch fixes a bug reported by Mihai Harpau at:
    https://bugzilla.redhat.com/show_bug.cgi?id=469582

    This patch fixes a bug reported by François Valenduc at:
    http://bugzilla.kernel.org/show_bug.cgi?id=11840

    Signed-off-by: "Theodore Ts'o"
    Cc: Duane Griffin
    Cc: Toshiyuki Okajima

    Theodore Ts'o
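
    The fallback order this message argues for can be written down as a
    small decision function. This is a sketch of my reading of the commit
    message with hypothetical toy_* names; the exact order of the checks in
    the real patch is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_action {
    TOY_DO_CHECKPOINT,    /* there are transactions to checkpoint */
    TOY_TAIL_CLEANED,     /* cleaning up the journal tail freed space */
    TOY_WAIT_FOR_COMMIT,  /* the committing transaction holds the space */
    TOY_ABORT             /* truly no way to get more journal space */
};

struct toy_state {
    bool has_checkpoint_txns;
    bool tail_cleanup_frees_space;
    bool has_committing_txn;
};

/* Abort is the last resort, not the first reaction to an empty checkpoint list. */
static enum toy_action toy_make_space(const struct toy_state *s)
{
    if (s->has_checkpoint_txns)
        return TOY_DO_CHECKPOINT;
    if (s->tail_cleanup_frees_space)
        return TOY_TAIL_CLEANED;
    if (s->has_committing_txn)
        return TOY_WAIT_FOR_COMMIT;
    return TOY_ABORT;
}
```

    The regression corresponds to jumping straight from "no checkpoint
    transactions" to the abort case, skipping the two middle branches.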
     

05 Nov, 2008

1 commit


11 Oct, 2008

1 commit

  • When a checkpointing IO fails, the current JBD2 code doesn't check the
    error and continues journaling. This means the latest metadata can be
    lost from both the journal and the filesystem.

    This patch leaves the failed metadata blocks in the journal space
    and aborts journaling in the case of jbd2_log_do_checkpoint().
    To achieve this, we need to do:

    1. don't remove the failed buffer from the checkpoint list in the
       case of __try_to_free_cp_buf() because it may be released or
       overwritten by a later transaction
    2. jbd2_log_do_checkpoint() is the last chance, remove the failed
       buffer from the checkpoint list and abort the journal
    3. when checkpointing fails, don't update the journal super block to
       prevent the journaled contents from being cleaned. For safety,
       don't update j_tail and j_tail_sequence either
    4. when checkpointing fails, notify this error to the ext4 layer so
       that ext4 doesn't clear the needs_recovery flag, otherwise the
       journaled contents are ignored and cleaned in the recovery phase
    5. if the recovery fails, keep the needs_recovery flag
    6. prevent jbd2_cleanup_journal_tail() from being called between
       __jbd2_journal_drop_transaction() and jbd2_journal_abort()
       (a possible race issue between jbd2_log_do_checkpoint()s called by
       jbd2_journal_flush() and __jbd2_log_wait_for_space())

    Signed-off-by: Hidehiro Kawai
    Signed-off-by: Theodore Ts'o

    Hidehiro Kawai