11 Jul, 2011

1 commit


28 Jun, 2011

1 commit

  • During a journal checkpoint, we write the buffer and wait for the write to finish.
    But in cfq the async queue has a very low priority, and in our test,
    if there are too many sync queues and every queue is filled with
    requests, the write request is delayed for a long time and
    all the tasks waiting for journal space end up with hung-task warnings like:

    INFO: task attr_set:3816 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    attr_set D ffff880028393480 0 3816 1 0x00000000
    ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
    ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
    ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
    Call Trace:
    [] ? __dequeue_entity+0x33/0x38
    [] ? need_resched+0x23/0x2d
    [] ? thread_return+0xa2/0xbc
    [] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
    [] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
    [] __mutex_lock_common+0x14e/0x1a9
    [] ? brelse+0x13/0x15 [ext4]
    [] __mutex_lock_slowpath+0x19/0x1b
    [] mutex_lock+0x1b/0x32
    [] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
    [] start_this_handle+0x438/0x527 [jbd2]
    [] ? autoremove_wake_function+0x0/0x3e
    [] jbd2_journal_start+0xa1/0xcc [jbd2]
    [] ext4_journal_start_sb+0x57/0x81 [ext4]
    [] ext4_xattr_set+0x6c/0xe3 [ext4]
    [] ext4_xattr_user_set+0x42/0x4b [ext4]
    [] generic_setxattr+0x6b/0x76
    [] __vfs_setxattr_noperm+0x47/0xc0
    [] vfs_setxattr+0x7f/0x9a
    [] setxattr+0xb5/0xe8
    [] ? do_filp_open+0x571/0xa6e
    [] sys_fsetxattr+0x6b/0x91
    [] system_call_fastpath+0x16/0x1b

    So this patch uses WRITE_SYNC in __flush_batch so that the request is
    moved into the sync queue and handled by cfq in a timely manner. We also use the
    new plug, so that all the WRITE_SYNC requests are submitted as a whole when we unplug it.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"
    Cc: Jan Kara
    Reported-by: Robin Dong

    Tao Ma
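    The batching described in this commit can be modeled in userspace. This is a
    hypothetical sketch: struct plug, struct device, plug_start(), plug_submit(),
    plug_flush(), and plug_finish() are invented names, not the block-layer API (the
    kernel's explicit plugging calls are blk_start_plug() and blk_finish_plug()). It
    only shows that writes queued while plugged reach the device as one batch at
    unplug time; it does not model cfq's sync and async queues.

```c
#include <assert.h>
#include <stddef.h>

#define PLUG_MAX 16  /* hypothetical capacity of the plug list */

/* Hypothetical per-task plug: writes are collected here instead of
 * being dispatched to the device one at a time. */
struct plug {
	int pending[PLUG_MAX];   /* block numbers waiting for dispatch */
	size_t npending;
};

/* Hypothetical device queue that only counts what it receives. */
struct device {
	size_t dispatched;       /* total requests received */
	size_t batches;          /* number of separate dispatches */
};

static void plug_start(struct plug *p)
{
	p->npending = 0;
}

/* Hand every pending request to the device in one dispatch. */
static void plug_flush(struct plug *p, struct device *dev)
{
	if (p->npending == 0)
		return;
	dev->dispatched += p->npending;
	dev->batches++;
	p->npending = 0;
}

/* Queue one write; dispatch early only if the plug list is full. */
static void plug_submit(struct plug *p, struct device *dev, int block)
{
	if (p->npending == PLUG_MAX)
		plug_flush(p, dev);
	assert(p->npending < PLUG_MAX);
	p->pending[p->npending++] = block;
}

/* Unplug: everything still queued goes out together. */
static void plug_finish(struct plug *p, struct device *dev)
{
	plug_flush(p, dev);
}
```

    Submitting five writes between plug_start() and plug_finish() produces a single
    dispatch of five requests rather than five separate ones.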
     

14 Jun, 2011

1 commit

  • jbd2_journal_remove_journal_head() can oops when trying to access
    journal_head returned by bh2jh(). This is caused for example by the
    following race:

    TASK1                                   TASK2
    jbd2_journal_commit_transaction()
      ...
      processing t_forget list
        __jbd2_journal_refile_buffer(jh);
        if (!jh->b_transaction) {
          jbd_unlock_bh_state(bh);
                                            jbd2_journal_try_to_free_buffers()
                                              jbd2_journal_grab_journal_head(bh)
                                              jbd_lock_bh_state(bh)
                                              __journal_try_to_free_buffer()
                                              jbd2_journal_put_journal_head(jh)
          jbd2_journal_remove_journal_head(bh);

    jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
    buffer is not part of any transaction and thus frees journal_head
    before TASK1 gets to do so. Note that even the buffer_head can be
    released by try_to_free_buffers() after
    jbd2_journal_put_journal_head(), which creates an even larger window
    for an oops (but I didn't see this happen in reality).

    Fix the problem by making transactions hold their own journal_head
    reference (in b_jcount). That way we don't have to remove journal_head
    explicitly via jbd2_journal_remove_journal_head() and instead just
    remove journal_head when b_jcount drops to zero. The result of this is
    that [__]jbd2_journal_refile_buffer(),
    [__]jbd2_journal_unfile_buffer(), and
    __jbd2_journal_remove_checkpoint() can free journal_head, which requires
    modifying a few callers. Also we have to be careful because once
    journal_head is removed, buffer_head might be freed as well. So we
    have to get our own buffer_head reference where it matters.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
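    The ownership rule in this fix (the transaction holds its own b_jcount
    reference, and whoever drops the last reference frees the journal_head) can be
    sketched as a single-threaded userspace model. struct jhead and its helpers are
    hypothetical simplifications, not the jbd2 API, and the bh_state locking that
    the real code relies on is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified journal_head: only the reference
 * count and whether a transaction currently owns it. */
struct jhead {
	int b_jcount;            /* references held, including the transaction's */
	bool on_transaction;
};

static int jheads_freed;     /* counts frees so the model can be checked */

static struct jhead *jhead_alloc(void)
{
	struct jhead *jh = calloc(1, sizeof(*jh));
	if (jh)
		jh->b_jcount = 1;    /* the caller's reference */
	return jh;
}

static void jhead_get(struct jhead *jh)
{
	jh->b_jcount++;
}

/* Drop one reference; the last put frees the object, whoever does it. */
static void jhead_put(struct jhead *jh)
{
	assert(jh->b_jcount > 0);
	if (--jh->b_jcount == 0) {
		free(jh);
		jheads_freed++;
	}
}

/* Filing onto a transaction takes the transaction's own reference... */
static void file_on_transaction(struct jhead *jh)
{
	jh->on_transaction = true;
	jhead_get(jh);
}

/* ...and unfiling drops it, which may free the journal_head. */
static void unfile_from_transaction(struct jhead *jh)
{
	jh->on_transaction = false;
	jhead_put(jh);
}
```

    Replaying the race: TASK2's put no longer frees the object while the
    transaction still holds its reference; the free happens only when the
    transaction unfiles the buffer.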
     

13 Jun, 2011

1 commit


27 May, 2011

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (61 commits)
    jbd2: Add MAINTAINERS entry
    jbd2: fix a potential leak of a journal_head on an error path
    ext4: teach ext4_ext_split to calculate extents efficiently
    ext4: Convert ext4 to new truncate calling convention
    ext4: do not normalize block requests from fallocate()
    ext4: enable "punch hole" functionality
    ext4: add "punch hole" flag to ext4_map_blocks()
    ext4: punch out extents
    ext4: add new function ext4_block_zero_page_range()
    ext4: add flag to ext4_has_free_blocks
    ext4: reserve inodes and feature code for 'quota' feature
    ext4: add support for multiple mount protection
    ext4: ensure f_bfree returned by ext4_statfs() is non-negative
    ext4: protect bb_first_free in ext4_trim_all_free() with group lock
    ext4: only load buddy bitmap in ext4_trim_fs() when it is needed
    jbd2: Fix comment to match the code in jbd2__journal_start()
    ext4: fix waiting and sending of a barrier in ext4_sync_file()
    jbd2: Add function jbd2_trans_will_send_data_barrier()
    jbd2: fix sending of data flush on journal commit
    ext4: fix ext4_ext_fiemap_cb() to handle blocks before request range correctly
    ...

    Linus Torvalds
     

26 May, 2011

1 commit


25 May, 2011

1 commit


24 May, 2011

2 commits

  • Provide a function which returns whether a transaction with a given tid
    will send a flush to the filesystem device. The function will be used
    by ext4 to detect whether fsync needs to send a separate flush or not.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • In data=ordered mode, it's theoretically possible (however rare) that
    an inode is filed on the transaction's t_inode_list and a flusher thread
    writes all the data and the inode is reclaimed before the transaction
    starts to commit. In such a case, we could erroneously omit sending a
    flush to the file system device when it is different from the journal
    device (because the data may still be only in the disk cache).

    Fix the problem by setting a flag in a transaction when some inode is added
    to it and then send disk flush in the commit code when the flag is set.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
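    The fix relies on a flag that stays set once any inode has been filed on the
    transaction, so the commit decision no longer depends on whether that inode is
    still on the list. A hypothetical single-threaded model (struct txn, struct
    journal_model, and the txn_* helpers are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified transaction: the flag records that some inode
 * with ordered data was filed on it, even if that inode is later
 * written back and reclaimed before the commit starts. */
struct txn {
	bool need_data_flush;
	int ninodes;                 /* inodes currently on the inode list */
};

struct journal_model {
	bool fs_dev_is_journal_dev;  /* data and journal share one device */
	int fs_flushes;              /* flushes sent to the data device */
};

static void txn_add_inode(struct txn *t)
{
	t->ninodes++;
	t->need_data_flush = true;   /* sticky: survives the inode leaving */
}

static void txn_remove_inode(struct txn *t)
{
	t->ninodes--;                /* e.g. inode written back and reclaimed */
}

static void txn_commit(struct journal_model *j, const struct txn *t)
{
	/* Deciding from ninodes alone would skip the flush in the race the
	 * commit message describes; the flag does not forget. */
	if (t->need_data_flush && !j->fs_dev_is_journal_dev)
		j->fs_flushes++;
}
```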
     

23 May, 2011

1 commit

  • t_max_wait was added in commit 8e85fb3f to record how long we
    waited for a new transaction to start. In commit 6d0bf005,
    the calculation was moved to a separate function, update_t_max_wait(),
    to avoid a build warning. The problem is that 'ts' was originally
    initialized at the start of start_this_handle(), which let us
    calculate t_max_wait correctly. After this change, ts is initialized
    inside the new function, so t_max_wait can never be calculated correctly.

    This patch moves the initialization of ts back to the beginning
    of start_this_handle() and passes it to update_t_max_wait() so
    that t_max_wait is calculated correctly and the build warning is still avoided.

    Cc: Jan Kara
    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Eric Sandeen

    Tao Ma
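    A minimal sketch of the corrected pattern: capture the start time once on entry
    and pass it to the helper that updates the maximum. The names (struct txn_stats,
    update_max_wait(), start_handle_model()) are hypothetical; this userspace model
    uses clock_gettime(CLOCK_MONOTONIC) and a short nanosleep() to stand in for
    waiting on a new transaction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the transaction statistic. */
struct txn_stats {
	long long t_max_wait_ns;     /* longest wait observed so far */
};

static long long ts_to_ns(const struct timespec *ts)
{
	return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* The start time is taken by the caller on entry and passed in, so the
 * measured interval covers the whole wait, not just this function. */
static void update_max_wait(struct txn_stats *st, const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	long long waited = ts_to_ns(&now) - ts_to_ns(start);
	if (waited > st->t_max_wait_ns)
		st->t_max_wait_ns = waited;
}

static void start_handle_model(struct txn_stats *st)
{
	struct timespec start;

	clock_gettime(CLOCK_MONOTONIC, &start);  /* taken at the very beginning */
	/* Simulated wait for a new transaction to start: 2 ms. */
	struct timespec pause = { 0, 2000000 };
	nanosleep(&pause, NULL);
	update_max_wait(st, &start);
}
```

    Had update_max_wait() read the clock for both ends itself, the recorded wait
    would be close to zero regardless of how long the caller actually waited.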
     

17 May, 2011

1 commit


09 May, 2011

2 commits


02 May, 2011

1 commit

  • If an application program does not make any changes to the indirect
    blocks or extent tree, i_datasync_tid will not get updated. If there
    are enough commits (i.e., 2**31) such that tid_geq()'s calculations
    wrap, and there isn't a currently active transaction at the time of
    the fdatasync() call, this can end up triggering a BUG_ON in
    fs/jbd2/commit.c:

    J_ASSERT(journal->j_running_transaction != NULL);

    It's pretty rare that this can happen, since it requires the use of
    fdatasync() plus *very* frequent and excessive use of fsync(). But
    with the right workload, it can.

    We fix this by replacing the use of tid_geq() with an equality test,
    since there is only one transaction ID that it is valid for us to
    wait on until it is committed: namely, the currently running transaction
    (if it exists).

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
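    The wraparound can be reproduced with a comparison in the style of jbd2's
    tid_geq(), which treats the unsigned difference of two transaction IDs as a
    signed int. This sketch is modeled on that helper rather than copied from it;
    old_should_commit() and new_should_commit() are hypothetical reductions of the
    before and after tests. Converting an out-of-range unsigned value to int is
    implementation-defined in C (two's-complement wraparound on GCC and Clang).

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int tid_t;

/* Wraparound-aware comparison in the style of tid_geq(): correct only
 * while the two IDs are less than 2**31 apart. */
static bool tid_geq(tid_t x, tid_t y)
{
	int difference = (int)(x - y);
	return difference >= 0;
}

/* Old test: start a commit unless the requested ID is already covered.
 * A stale target 2**31 or more behind looks like a future ID. */
static bool old_should_commit(tid_t commit_request, tid_t target)
{
	return !tid_geq(commit_request, target);
}

/* Fixed test: only the running transaction's ID is a valid target. */
static bool new_should_commit(bool have_running, tid_t running_tid, tid_t target)
{
	return have_running && running_tid == target;
}
```

    With a stale i_datasync_tid and no running transaction, the old test requests a
    commit (the path that reached the J_ASSERT), while the equality test does not.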
     

12 Apr, 2011

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix data corruption regression by reverting commit 6de9843dab3f
    ext4: Allow indirect-block file to grow the file size to max file size
    ext4: allow an active handle to be started when freezing
    ext4: sync the directory inode in ext4_sync_parent()
    ext4: init timer earlier to avoid a kernel panic in __save_error_info
    jbd2: fix potential memory leak on transaction commit
    ext4: fix a double free in ext4_register_li_request
    ext4: fix credits computing for indirect mapped files
    ext4: remove unnecessary [cm]time update of quota file
    jbd2: move bdget out of critical section

    Linus Torvalds
     

06 Apr, 2011

1 commit

  • There is a potential memory leak of a journal head in
    jbd2_journal_commit_transaction(). The problem is that JBD2 does not
    reclaim the journal head of the commit record if an error occurs or the
    journal is aborted.

    I used the following script to reproduce this issue on a RHEL6
    system. I found it very easy to reproduce with async commit enabled.

    mount /dev/sdb /mnt -o journal_checksum,journal_async_commit
    touch /mnt/xxx
    echo offline > /sys/block/sdb/device/state
    sync
    umount /mnt
    rmmod ext4
    rmmod jbd2

    Removing the jbd2 module makes the slab allocator complain that
    "cache `jbd2_journal_head': can't free all objects".

    Signed-off-by: Zhang Huan
    Signed-off-by: "Theodore Ts'o"

    Zhang Huan
     

05 Apr, 2011

1 commit


31 Mar, 2011

1 commit


25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

17 Mar, 2011

1 commit


10 Mar, 2011

1 commit

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

01 Mar, 2011

1 commit


12 Feb, 2011

1 commit

  • On an SMP ARM system running ext4, I've received a report that the
    first J_ASSERT in jbd2_journal_commit_transaction has been triggering:

    J_ASSERT(journal->j_running_transaction != NULL);

    While investigating possible causes for this problem, I noticed that
    __jbd2_log_start_commit() is getting called with j_state_lock only
    read-locked, even though it may modify j_commit_request. Fix this by
    grabbing the information we need to decide whether to start a new
    transaction before dropping the read lock, and then calling
    jbd2_log_start_commit(), which will grab the write lock.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

14 Jan, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds
     

11 Jan, 2011

1 commit


23 Dec, 2010

1 commit


19 Dec, 2010

5 commits


17 Dec, 2010

1 commit


10 Dec, 2010

1 commit


18 Nov, 2010

1 commit


30 Oct, 2010

1 commit


28 Oct, 2010

5 commits

  • * 'upstream-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
    ext4,jbd2: convert tracepoints to use major/minor numbers
    ext4: optimize orphan_list handling for ext4_setattr
    ext4: fix unbalanced mutex unlock in error path of ext4_li_request_new
    ext4: fix compile error in ext4_fallocate()
    ext4: move ext4_mb_{get,put}_buddy_cache_lock and make them static
    ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end()
    ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static
    ext4: rename {ext,idx}_pblock and inline small extent functions
    ext4: make various ext4 functions be static
    ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()
    ext4: fix kernel oops if the journal superblock has a non-zero j_errno
    ext4: update writeback_index based on last page scanned
    ext4: implement writeback livelock avoidance using page tagging
    ext4: tidy up a void argument in inode.c
    ext4: add batched_discard into ext4 feature list
    ext4: Add batched discard support for ext4
    fs: Add FITRIM ioctl
    ext4: Use return value from sb_issue_discard()
    ext4: Check return value of sb_getblk() and friends
    ext4: use bio layer instead of buffer layer in mpage_da_submit_io
    ...

    Linus Torvalds
     
  • Conflicts:
    fs/ext4/inode.c
    fs/ext4/mballoc.c
    include/trace/events/ext4.h

    Theodore Ts'o
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (24 commits)
    quota: Fix possible oops in __dquot_initialize()
    ext3: Update kernel-doc comments
    jbd/2: fixed typos
    ext2: fixed typo.
    ext3: Fix debug messages in ext3_group_extend()
    jbd: Convert atomic_inc() to get_bh()
    ext3: Remove misplaced BUFFER_TRACE() in ext3_truncate()
    jbd: Fix debug message in do_get_write_access()
    jbd: Check return value of __getblk()
    ext3: Use DIV_ROUND_UP() on group desc block counting
    ext3: Return proper error code on ext3_fill_super()
    ext3: Remove unnecessary casts on bh->b_data
    ext3: Cleanup ext3_setup_super()
    quota: Fix issuing of warnings from dquot_transfer
    quota: fix dquot_disable vs dquot_transfer race v2
    jbd: Convert bitops to buffer fns
    ext3/jbd: Avoid WARN() messages when failing to write the superblock
    jbd: Use offset_in_page() instead of manual calculation
    jbd: Remove unnecessary goto statement
    jbd: Use printk_ratelimited() in journal_alloc_journal_head()
    ...

    Linus Torvalds
     
  • An attempt to modify the file system during the call to
    jbd2_destroy_journal() can lead to a system lockup. So add some
    checking to make it much more obvious when this happens and to
    determine where the offending code is located.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • This fixes a hang seen in jbd2_journal_release_jbd_inode
    on a lot of Power 6 systems running with ext4. When we get
    into the hung state, all I/O to the disk in question blocks
    indefinitely. Looking at the task list, I can see
    we are stuck in jbd2_journal_release_jbd_inode waiting on a
    wake up. I added some debug code to detect this scenario and
    dump additional data if we were stuck in jbd2_journal_release_jbd_inode
    for longer than 30 minutes. When it hit, I was able to see that
    i_flags was 0, suggesting we missed the wake up.

    This patch changes i_flags to be an unsigned long, uses bit operators
    to access it, and adds barriers around the accesses. Prior to applying
    this patch, we were regularly hitting this hang on numerous systems
    in our test environment. After applying the patch, the hangs no longer
    occur.

    Signed-off-by: Brian King
    Signed-off-by: "Theodore Ts'o"

    Brian King
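    The first half of this fix, keeping the flags in an unsigned long and updating
    them with atomic bit operations and explicit ordering, can be sketched with C11
    <stdatomic.h>. The bit name and helpers are hypothetical stand-ins for the
    kernel's bit operations and barriers, and the sleep/wake-up side is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number for a flags word modeled on jbd2_inode's i_flags. */
enum { JI_COMMIT_RUNNING_BIT = 0 };

/* Whole-word atomic bit operations: setting or clearing one bit cannot
 * lose a concurrent update to another bit, unlike a plain
 * read-modify-write on an int. */
static void set_flag(atomic_ulong *flags, int bit)
{
	atomic_fetch_or_explicit(flags, 1UL << bit, memory_order_release);
}

static void clear_flag(atomic_ulong *flags, int bit)
{
	/* Release ordering: writes made before the clear are visible to a
	 * waiter that observes the bit as clear with an acquire load. */
	atomic_fetch_and_explicit(flags, ~(1UL << bit), memory_order_release);
}

static bool test_flag(atomic_ulong *flags, int bit)
{
	return (atomic_load_explicit(flags, memory_order_acquire) >> bit) & 1UL;
}
```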