22 Nov, 2011

1 commit

  • There is no reason to export two functions for entering the
    refrigerator. Calling refrigerator() instead of try_to_freeze()
    doesn't save anything noticeable or remove any race condition.

    * Rename refrigerator() to __refrigerator() and make it return bool
    indicating whether it scheduled out for freezing.

    * Update try_to_freeze() to return bool and relay the return value of
    __refrigerator() if freezing().

    * Convert all refrigerator() users to try_to_freeze().

    * Update documentation accordingly.

    * While at it, add might_sleep() to try_to_freeze().

    Signed-off-by: Tejun Heo
    Cc: Samuel Ortiz
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Cc: Andrew Morton
    Cc: Jan Kara
    Cc: KONISHI Ryusuke
    Cc: Christoph Hellwig

    Tejun Heo
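
    As an illustration of the single entry point described above, here is a
    minimal user-space sketch. It is not the kernel code: freeze_requested,
    times_frozen, model_refrigerator() and model_try_to_freeze() are
    hypothetical stand-ins for freezing(), the scheduler, __refrigerator()
    and try_to_freeze().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel freezer state (not real kernel symbols). */
bool freeze_requested;  /* models freezing(current) */
int  times_frozen;      /* counts how often we "scheduled out" */

/* Models __refrigerator(): returns true if it scheduled out for freezing. */
bool model_refrigerator(void)
{
    bool was_frozen = false;

    /* Callers are expected to have checked the freeze request first. */
    assert(freeze_requested);

    while (freeze_requested) {
        was_frozen = true;
        times_frozen++;
        freeze_requested = false;   /* pretend we slept and were thawed */
    }
    return was_frozen;
}

/* Models try_to_freeze(): the only function callers need. */
bool model_try_to_freeze(void)
{
    if (!freeze_requested)
        return false;
    return model_refrigerator();    /* relay the result to the caller */
}
```

    A caller can use the bool result to decide whether to re-check its own
    state after having been frozen.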
     

02 Nov, 2011

2 commits

  • Some jbd2 code prints kernel messages with a "JBD2: " prefix, while
    other jbd2 code prints with a "JBD: " prefix. Unify the prefix
    to "JBD2: ".

    Signed-off-by: Eryu Guan
    Signed-off-by: "Theodore Ts'o"

    Eryu Guan
     
  • I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
    mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3
    image has s_first = 0 in the journal superblock. The 0 is passed to
    journal->j_head in journal_reset(), then to blocknr in
    cleanup_journal_tail(), where the J_ASSERT fails.

    So validate s_first after reading the journal superblock from disk in
    journal_get_superblock() to ensure s_first is valid.

    The following script could reproduce it:

    fstype=ext3
    blocksize=1024
    img=$fstype.img
    offset=0
    found=0
    magic="c0 3b 39 98"

    dd if=/dev/zero of=$img bs=1M count=8
    mkfs -t $fstype -b $blocksize -F $img
    filesize=`stat -c %s $img`
    while [ $offset -lt $filesize ]
    do
        if od -j $offset -N 4 -t x1 $img | grep -i "$magic"; then
            echo "Found journal: $offset"
            found=1
            break
        fi
        offset=`echo "$offset+$blocksize" | bc`
    done

    if [ $found -ne 1 ]; then
        echo "Magic \"$magic\" not found"
        exit 1
    fi

    dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1

    mkdir -p ./mnt
    mount -o loop $img ./mnt

    Cc: Jan Kara
    Signed-off-by: Eryu Guan
    Signed-off-by: "Theodore Ts'o"

    Eryu Guan
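
    The kind of check described above could look roughly like the sketch
    below. The struct and function names are hypothetical, the fields are
    assumed to be already converted to host byte order, and the upper bound
    against the journal length is an assumption (the commit message only
    mentions the s_first = 0 case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of the journal superblock. */
struct sketch_journal_sb {
    uint32_t s_first;   /* first block of log information */
    uint32_t s_maxlen;  /* total number of blocks in the journal */
};

/*
 * Reject a start block of 0 (the case from the fuzzed image) and,
 * as an assumed extra sanity check, one beyond the journal's end.
 */
bool sketch_first_block_valid(const struct sketch_journal_sb *sb)
{
    assert(sb != NULL);
    return sb->s_first != 0 && sb->s_first < sb->s_maxlen;
}
```

    A superblock loader would call this right after reading the block from
    disk and refuse to continue when it returns false, instead of tripping
    an assertion later.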
     

11 Jul, 2011

1 commit


14 Jun, 2011

1 commit

  • jbd2_journal_remove_journal_head() can oops when trying to access
    journal_head returned by bh2jh(). This is caused for example by the
    following race:

    TASK1                                   TASK2
    jbd2_journal_commit_transaction()
      ...
      processing t_forget list
        __jbd2_journal_refile_buffer(jh);
        if (!jh->b_transaction) {
          jbd_unlock_bh_state(bh);
                                            jbd2_journal_try_to_free_buffers()
                                              jbd2_journal_grab_journal_head(bh)
                                              jbd_lock_bh_state(bh)
                                              __journal_try_to_free_buffer()
                                              jbd2_journal_put_journal_head(jh)
          jbd2_journal_remove_journal_head(bh);

    jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
    buffer is not part of any transaction and thus frees journal_head
    before TASK1 gets to doing so. Note that even buffer_head can be
    released by try_to_free_buffers() after
    jbd2_journal_put_journal_head() which adds even larger opportunity for
    oops (but I didn't see this happen in reality).

    Fix the problem by making transactions hold their own journal_head
    reference (in b_jcount). That way we don't have to remove journal_head
    explicitly via jbd2_journal_remove_journal_head() and instead just
    remove journal_head when b_jcount drops to zero. The result of this is
    that [__]jbd2_journal_refile_buffer(),
    [__]jbd2_journal_unfile_buffer(), and
    __jbd2_journal_remove_checkpoint() can free journal_head which needs
    modification of a few callers. Also we have to be careful because once
    journal_head is removed, buffer_head might be freed as well. So we
    have to get our own buffer_head reference where it matters.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
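
    The reference-counting idea above can be modelled in user space as
    follows. This is only a sketch: struct sketch_jh and the sketch_*()
    helpers are hypothetical, locking is omitted, and "freeing" is recorded
    in a flag so that the outcome can be inspected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a journal_head; not the kernel structure. */
struct sketch_jh {
    int  b_jcount;        /* reference count */
    bool in_transaction;  /* filed on some transaction list */
    bool freed;           /* set where the real code would free the object */
};

void sketch_get(struct sketch_jh *jh)
{
    jh->b_jcount++;
}

/* Dropping the last reference is the only place the object goes away. */
void sketch_put(struct sketch_jh *jh)
{
    assert(jh->b_jcount > 0);
    if (--jh->b_jcount == 0)
        jh->freed = true;
}

/* Filing on a transaction takes the transaction's own reference. */
void sketch_file(struct sketch_jh *jh)
{
    sketch_get(jh);
    jh->in_transaction = true;
}

/* Unfiling drops that reference, so it may free the object. */
void sketch_unfile(struct sketch_jh *jh)
{
    jh->in_transaction = false;
    sketch_put(jh);
}
```

    With this scheme a short-lived get/put pair from another task cannot
    free the object while a transaction still holds its reference, and
    callers of the unfile path must not touch the object afterwards.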
     

24 May, 2011

1 commit


09 May, 2011

1 commit


02 May, 2011

1 commit

  • If an application program does not make any changes to the indirect
    blocks or extent tree, i_datasync_tid will not get updated. If there
    are enough commits (i.e., 2**31) such that tid_geq()'s calculations
    wrap, and there isn't a currently active transaction at the time of
    the fdatasync() call, this can end up triggering a BUG_ON in
    fs/jbd2/commit.c:

    J_ASSERT(journal->j_running_transaction != NULL);

    It's pretty rare that this can happen, since it requires the use of
    fdatasync() plus *very* frequent and excessive use of fsync(). But
    with the right workload, it can.

    We fix this by replacing the use of tid_geq() with an equality test,
    since there's only one transaction id that is valid for us to wait on
    until it is committed: namely, the currently running transaction
    (if it exists).

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
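
    To see why a wrap-aware ordering test misbehaves with a very stale
    transaction id, consider the sketch below. It assumes tid_geq() is a
    signed 32-bit difference test, written here in an equivalent unsigned
    form; sketch_should_commit_old() and sketch_should_commit_new() are
    hypothetical names that paraphrase the before and after logic from the
    description, not the kernel source.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sketch_tid_t;

/* Wrap-aware "x >= y": true while x is fewer than 2**31 ids ahead of y. */
bool sketch_tid_geq(sketch_tid_t x, sketch_tid_t y)
{
    return (sketch_tid_t)(x - y) < UINT32_C(0x80000000);
}

/* Before: start a commit whenever the target looks "not yet requested". */
bool sketch_should_commit_old(sketch_tid_t commit_request, sketch_tid_t target)
{
    return !sketch_tid_geq(commit_request, target);
}

/* After: only the currently running transaction is worth waiting for. */
bool sketch_should_commit_new(bool have_running, sketch_tid_t running_tid,
                              sketch_tid_t target)
{
    return have_running && running_tid == target;
}
```

    For example, with a stale target of 5 and a commit request 2**31 ids
    later, assert(sketch_should_commit_old(5u + 0x80000000u, 5u)) holds, so
    the old test would ask for a commit even though no transaction is
    running, while sketch_should_commit_new(false, 0u, 5u) is false.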
     

12 Apr, 2011

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix data corruption regression by reverting commit 6de9843dab3f
    ext4: Allow indirect-block file to grow the file size to max file size
    ext4: allow an active handle to be started when freezing
    ext4: sync the directory inode in ext4_sync_parent()
    ext4: init timer earlier to avoid a kernel panic in __save_error_info
    jbd2: fix potential memory leak on transaction commit
    ext4: fix a double free in ext4_register_li_request
    ext4: fix credits computing for indirect mapped files
    ext4: remove unnecessary [cm]time update of quota file
    jbd2: move bdget out of critical section

    Linus Torvalds
     

05 Apr, 2011

1 commit


31 Mar, 2011

1 commit


01 Mar, 2011

1 commit


12 Feb, 2011

1 commit

  • On an SMP ARM system running ext4, I've received a report that the
    first J_ASSERT in jbd2_journal_commit_transaction has been triggering:

    J_ASSERT(journal->j_running_transaction != NULL);

    While investigating possible causes for this problem, I noticed that
    __jbd2_log_start_commit() is getting called with j_state_lock only
    read-locked, in spite of the fact that it might modify
    j_commit_request. Fix this by grabbing the necessary information so
    we can test to see if we need to start a new transaction before
    dropping the read lock, and then calling jbd2_log_start_commit() which
    will grab the write lock.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

11 Jan, 2011

1 commit


19 Dec, 2010

1 commit


17 Dec, 2010

1 commit


18 Nov, 2010

1 commit


30 Oct, 2010

1 commit


28 Oct, 2010

5 commits

  • * 'upstream-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
    ext4,jbd2: convert tracepoints to use major/minor numbers
    ext4: optimize orphan_list handling for ext4_setattr
    ext4: fix unbalanced mutex unlock in error path of ext4_li_request_new
    ext4: fix compile error in ext4_fallocate()
    ext4: move ext4_mb_{get,put}_buddy_cache_lock and make them static
    ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end()
    ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static
    ext4: rename {ext,idx}_pblock and inline small extent functions
    ext4: make various ext4 functions be static
    ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()
    ext4: fix kernel oops if the journal superblock has a non-zero j_errno
    ext4: update writeback_index based on last page scanned
    ext4: implement writeback livelock avoidance using page tagging
    ext4: tidy up a void argument in inode.c
    ext4: add batched_discard into ext4 feature list
    ext4: Add batched discard support for ext4
    fs: Add FITRIM ioctl
    ext4: Use return value from sb_issue_discard()
    ext4: Check return value of sb_getblk() and friends
    ext4: use bio layer instead of buffer layer in mpage_da_submit_io
    ...

    Linus Torvalds
     
  • Conflicts:
    fs/ext4/inode.c
    fs/ext4/mballoc.c
    include/trace/events/ext4.h

    Theodore Ts'o
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (24 commits)
    quota: Fix possible oops in __dquot_initialize()
    ext3: Update kernel-doc comments
    jbd/2: fixed typos
    ext2: fixed typo.
    ext3: Fix debug messages in ext3_group_extend()
    jbd: Convert atomic_inc() to get_bh()
    ext3: Remove misplaced BUFFER_TRACE() in ext3_truncate()
    jbd: Fix debug message in do_get_write_access()
    jbd: Check return value of __getblk()
    ext3: Use DIV_ROUND_UP() on group desc block counting
    ext3: Return proper error code on ext3_fill_super()
    ext3: Remove unnecessary casts on bh->b_data
    ext3: Cleanup ext3_setup_super()
    quota: Fix issuing of warnings from dquot_transfer
    quota: fix dquot_disable vs dquot_transfer race v2
    jbd: Convert bitops to buffer fns
    ext3/jbd: Avoid WARN() messages when failing to write the superblock
    jbd: Use offset_in_page() instead of manual calculation
    jbd: Remove unnecessary goto statement
    jbd: Use printk_ratelimited() in journal_alloc_journal_head()
    ...

    Linus Torvalds
     
  • This fixes a hang seen in jbd2_journal_release_jbd_inode
    on a lot of Power 6 systems running with ext4. When we get
    in the hung state, all I/O to the disk in question gets blocked
    where we stay indefinitely. Looking at the task list, I can see
    we are stuck in jbd2_journal_release_jbd_inode waiting on a
    wake up. I added some debug code to detect this scenario and
    dump additional data if we were stuck in jbd2_journal_release_jbd_inode
    for longer than 30 minutes. When it hit, I was able to see that
    i_flags was 0, suggesting we missed the wake up.

    This patch changes i_flags to be an unsigned long, uses bit operators
    to access it, and adds barriers around the accesses. Prior to applying
    this patch, we were regularly hitting this hang on numerous systems
    in our test environment. After applying the patch, the hangs no longer
    occur.

    Signed-off-by: Brian King
    Signed-off-by: "Theodore Ts'o"

    Brian King
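
    A user-space analogue of the "flags word accessed with atomic bit
    operations" approach is sketched below using C11 atomics. The flag name
    and helpers are hypothetical, and the default sequentially consistent
    ordering of the C11 operations stands in for the explicit barriers
    mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_COMMIT_RUNNING_BIT 0   /* hypothetical bit number */

/* Hypothetical model of the per-inode journal state. */
struct sketch_jbd_inode {
    atomic_ulong i_flags;
};

/* Atomically set the bit; a plain "flags |= x" could lose updates. */
void sketch_set_commit_running(struct sketch_jbd_inode *ji)
{
    assert(ji != NULL);
    atomic_fetch_or(&ji->i_flags, 1UL << SKETCH_COMMIT_RUNNING_BIT);
}

/* Atomically clear the bit; a waker would notify waiters after this. */
void sketch_clear_commit_running(struct sketch_jbd_inode *ji)
{
    assert(ji != NULL);
    atomic_fetch_and(&ji->i_flags, ~(1UL << SKETCH_COMMIT_RUNNING_BIT));
}

/* Waiters re-test the bit with an atomic load each time they wake. */
bool sketch_commit_running(struct sketch_jbd_inode *ji)
{
    assert(ji != NULL);
    return (atomic_load(&ji->i_flags) >> SKETCH_COMMIT_RUNNING_BIT) & 1UL;
}
```

    The point of the design is that the store that clears the bit is
    ordered before the wake-up, and the waiter's load is ordered after it
    goes onto the wait queue, so a wake-up cannot be missed.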
     
  • Fix the "wakup" typo.

    Signed-off-by: Andrea Gelmini
    Signed-off-by: Jan Kara

    Andrea Gelmini
     

10 Sep, 2010

1 commit

  • Before we start accessing a huge (> 16 TiB) OCFS2 volume, we need to
    confirm that its journal supports 64-bit offsets. In particular, we
    need to check the journal's feature bits before recovering the journal.

    This is not possible with JBD2 at present, because the journal
    superblock (where the feature bits reside) is not loaded from disk until
    the journal is recovered.

    This patch loads the journal superblock in
    jbd2_journal_check_used_features() if it has not already been loaded,
    allowing us to check the feature bits before journal recovery.

    Signed-off-by: Patrick LoPresti
    Cc: linux-ext4@vger.kernel.org
    Acked-by: "Theodore Ts'o"
    Signed-off-by: Joel Becker

    Patrick J. LoPresti
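
    The load-on-demand behaviour described above can be sketched like this.
    Everything here is a hypothetical model: struct sketch_journal, its
    fields and both functions are illustrative, and the "disk" is just
    another field in the struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical journal model; on_disk_incompat stands in for the disk. */
struct sketch_journal {
    bool     sb_loaded;         /* has the superblock been read yet? */
    uint32_t feature_incompat;  /* only meaningful once sb_loaded is set */
    uint32_t on_disk_incompat;  /* feature bits as stored on disk */
    int      loads;             /* how many times we "read the disk" */
};

/* Pretend to read the superblock; returns 0 on success. */
int sketch_load_superblock(struct sketch_journal *j)
{
    j->feature_incompat = j->on_disk_incompat;
    j->sb_loaded = true;
    j->loads++;
    return 0;
}

/* Check feature bits, loading the superblock first if nobody has yet. */
bool sketch_check_used_features(struct sketch_journal *j, uint32_t incompat)
{
    assert(j != NULL);
    if (incompat == 0)
        return true;
    if (!j->sb_loaded && sketch_load_superblock(j) != 0)
        return false;
    return (j->feature_incompat & incompat) == incompat;
}
```

    A filesystem could then query the bits it cares about before starting
    recovery, without an explicit load step of its own.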
     

18 Aug, 2010

1 commit

  • These flags aren't real I/O types, but tell ll_rw_block to always
    lock the buffer instead of giving up on a failed trylock.

    Instead add a new write_dirty_buffer helper that implements this semantic
    and use it from the existing SWRITE* callers. Note that the ll_rw_block
    code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
    this patch fixes.

    In the ufs code clean up the helper that used to call ll_rw_block
    to mirror sync_dirty_buffer, which is the function it implements for
    compound buffers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

08 Aug, 2010

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
    ext4: Adding error check after calling ext4_mb_regular_allocator()
    ext4: Fix dirtying of journalled buffers in data=journal mode
    ext4: re-inline ext4_rec_len_(to|from)_disk functions
    jbd2: Remove t_handle_lock from start_this_handle()
    jbd2: Change j_state_lock to be a rwlock_t
    jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop
    ext4: Add mount options in superblock
    ext4: force block allocation on quota_off
    ext4: fix freeze deadlock under IO
    ext4: drop inode from orphan list if ext4_delete_inode() fails
    ext4: check to make make sure bd_dev is set before dereferencing it
    jbd2: Make barrier messages less scary
    ext4: don't print scary messages for allocation failures post-abort
    ext4: fix EFBIG edge case when writing to large non-extent file
    ext4: fix ext4_get_blocks references
    ext4: Always journal quota file modifications
    ext4: Fix potential memory leak in ext4_fill_super
    ext4: Don't error out the fs if the user tries to make a file too big
    ext4: allocate stripe-multiple IOs on stripe boundaries
    ext4: move aio completion after unwritten extent conversion
    ...

    Fix up conflicts in fs/ext4/inode.c as per Ted.

    Fix up xfs conflicts as per earlier xfs merge.

    Linus Torvalds
     

04 Aug, 2010

1 commit


27 Jul, 2010

1 commit

  • __GFP_NOFAIL is going away, so add our own retry loop. Also add
    jbd2__journal_start() and jbd2__journal_restart() which take a gfp
    mask, so that file systems can optionally (re)start transaction
    handles using GFP_KERNEL. If they do this, then they need to be
    prepared to handle receiving an ERR_PTR(-ENOMEM) error, and be ready
    to reflect that error up to userspace.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
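
    The two behaviours described above (retry until the allocation
    succeeds, or report -ENOMEM to a caller that opted in) can be sketched
    in user space as follows. sketch_alloc(), sketch_failures_left and
    sketch_start_handle() are hypothetical; the may_fail flag stands in for
    passing a mask such as GFP_KERNEL, and the 64-byte size is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical knob: how many upcoming allocations should fail. */
int sketch_failures_left;

/* Allocator wrapper that can simulate transient failures. */
void *sketch_alloc(size_t size)
{
    if (sketch_failures_left > 0) {
        sketch_failures_left--;
        return NULL;
    }
    return malloc(size);
}

/*
 * may_fail == false: loop until the allocation succeeds.
 * may_fail == true:  give up on the first failure and return -ENOMEM.
 */
int sketch_start_handle(bool may_fail, void **out)
{
    assert(out != NULL);
    for (;;) {
        void *handle = sketch_alloc(64);

        if (handle != NULL) {
            *out = handle;
            return 0;
        }
        if (may_fail) {
            *out = NULL;
            return -ENOMEM;
        }
        /* A real implementation would back off briefly before retrying. */
    }
}
```

    Callers that choose the failing variant take on the duty of passing the
    error up to their own caller, as the commit message notes.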
     

16 Jul, 2010

1 commit

  • OCFS2 uses t_commit trigger to compute and store checksum of the just
    committed blocks. When a buffer has b_frozen_data, checksum is computed
    for it instead of b_data but this can result in an old checksum being
    written to the filesystem in the following scenario:

    1) transaction1 is opened
    2) handle1 is opened
    3) journal_access(handle1, bh)
    - This sets jh->b_transaction to transaction1
    4) modify(bh)
    5) journal_dirty(handle1, bh)
    6) handle1 is closed
    7) start committing transaction1, opening transaction2
    8) handle2 is opened
    9) journal_access(handle2, bh)
    - This copies off b_frozen_data to make it safe for transaction1 to commit.
    jh->b_next_transaction is set to transaction2.
    10) jbd2_journal_write_metadata() checksums b_frozen_data
    11) the journal correctly writes b_frozen_data to the disk journal
    12) handle2 is closed
    - There was no dirty call for the bh on handle2, so it is never queued for
    any more journal operation
    13) Checkpointing finally happens, and it just spools the bh via normal buffer
    writeback. This will write b_data, which was never triggered on and thus
    contains a wrong (old) checksum.

    This patch fixes the problem by calling the trigger at the moment data is
    frozen for journal commit - i.e., either when b_frozen_data is created by
    do_get_write_access or just before we write a buffer to the log if
    b_frozen_data does not exist. We also rename the trigger to t_frozen as
    that better describes when it is called.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Jan Kara
     

15 Jun, 2010

1 commit


11 May, 2010

1 commit


23 Dec, 2009

2 commits


10 Dec, 2009

1 commit


07 Dec, 2009

1 commit

  • Now that the SLUB seems to be fixed so that it respects the requested
    alignment, use kmem_cache_alloc() to allocate if the block size of
    the buffer heads to be allocated is less than the page size.
    Previously, we were using a 16k page on a Power system for each buffer,
    even when the file system was using 1k or 4k block size.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
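
    The size-based choice described above, and the memory it saves, can be
    illustrated with a small sketch. The enum and both functions are
    hypothetical; SKETCH_SLAB stands for an object-cache allocation such as
    kmem_cache_alloc() and SKETCH_PAGES for a whole-page allocation.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_alloc_path { SKETCH_SLAB, SKETCH_PAGES };

/* Blocks smaller than a page come from the object cache. */
enum sketch_alloc_path sketch_pick_allocator(size_t block_size,
                                             size_t page_size)
{
    assert(page_size > 0);
    return block_size < page_size ? SKETCH_SLAB : SKETCH_PAGES;
}

/* Bytes consumed per buffer under this policy. */
size_t sketch_bytes_used(size_t block_size, size_t page_size)
{
    if (sketch_pick_allocator(block_size, page_size) == SKETCH_SLAB)
        return block_size;
    /* Whole pages: round the block size up to a page multiple. */
    return (block_size + page_size - 1) / page_size * page_size;
}
```

    With a 16k page and a 1k block size, the per-buffer cost in this model
    drops from one 16k page to 1k.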
     

01 Dec, 2009

1 commit


16 Nov, 2009

1 commit

  • If there is a failed journal checksum, don't reset the journal. This
    allows for userspace programs to decide how to recover from this
    situation. It may be that ignoring the journal checksum failure might
    be a better way of recovering the file system. Once we add per-block
    checksums, we can definitely do better. Until then, a system
    administrator can try backing up the file system image (or taking a
    snapshot) and try to determine experimentally whether ignoring
    the checksum failure or aborting the journal replay results in less
    data loss.

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Theodore Ts'o
     

11 Nov, 2009

1 commit


02 Oct, 2009

1 commit


30 Sep, 2009

1 commit

  • The /proc/fs/jbd2//history was maintained manually; by using
    tracepoints, we can get all of the existing functionality of the /proc
    file plus extra capabilities thanks to the ftrace infrastructure. We
    save memory as a bonus.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o