01 Jul, 2013

3 commits

  • If jbd2_journal_restart() fails, the handle will have been disconnected
    from the current transaction. In this situation, the handle must not
    be used for any jbd2 function other than jbd2_journal_stop().
    Enforce this by treating a handle which has a NULL transaction
    pointer as an aborted handle, and issue a kernel warning if
    jbd2_journal_extend(), jbd2_journal_get_write_access(),
    jbd2_journal_dirty_metadata(), etc. are called with an invalid handle.

    This commit also fixes a bug where jbd2_journal_stop() would trip over
    a kernel jbd2 assertion check when trying to free an invalid handle.

    Also move the responsibility of setting current->journal_info to
    start_this_handle(), simplifying the three users of this function.

    Signed-off-by: "Theodore Ts'o"
    Reported-by: Younger Liu
    Cc: Jan Kara

    Theodore Ts'o
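The rule described in this commit can be illustrated with a small userspace sketch. It is not the kernel code: the `sketch_*` types and functions are hypothetical stand-ins, and returning `-EROFS` for an unusable handle is an assumption made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the jbd2 types. */
struct sketch_transaction {
    unsigned int t_tid;
};

struct sketch_handle {
    struct sketch_transaction *h_transaction; /* NULL after a failed restart */
    bool h_aborted;
    int h_buffer_credits;
};

/* A handle with no transaction is treated exactly like an aborted one. */
static bool sketch_is_handle_aborted(const struct sketch_handle *handle)
{
    return handle->h_aborted || handle->h_transaction == NULL;
}

/* Every journalling operation checks the handle before touching it. */
static int sketch_get_write_access(struct sketch_handle *handle)
{
    if (sketch_is_handle_aborted(handle))
        return -EROFS;          /* refuse; the kernel would also warn here */
    handle->h_buffer_credits--; /* placeholder for the real work */
    return 0;
}

/* Stopping is the one operation that stays legal on a detached handle. */
static int sketch_stop(struct sketch_handle *handle)
{
    assert(handle != NULL);
    handle->h_transaction = NULL;
    return 0;
}
```

A caller whose restart failed would see every operation except the stop call rejected, which is the behaviour the commit message asks for.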
     
  • Once we decrement transaction->t_updates, if this is the last handle
    holding the transaction from closing, and once we release the
    t_handle_lock spinlock, it's possible for the transaction to commit
    and be released. In practice with normal kernels, this probably won't
    happen, since the commit happens in a separate kernel thread and it's
    unlikely this could all happen within the space of a few CPU cycles.

    On the other hand, with a real-time kernel, this could potentially
    happen, so save the tid found in transaction->t_tid before we release
    t_handle_lock. It would require an insane configuration, such as one
    where the jbd2 thread was set to a very high real-time priority,
    perhaps because a high priority real-time thread is trying to read or
    write to a file system. But some people who use real-time kernels
    have been known to do insane things, including controlling
    laser-wielding industrial robots. :-)

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Theodore Ts'o
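The ordering fix can be sketched in userspace as follows. This is only an illustration under stated assumptions: the `sketch_*` names are hypothetical, and dropping the lock is simulated by a function that frees the transaction as soon as no updates remain, standing in for a commit thread that wins the race.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the running transaction. */
struct sketch_txn {
    unsigned int t_tid;
    int t_updates;
};

/*
 * Simulates releasing the lock: once no updates remain, the commit
 * thread is free to finish and release the transaction immediately.
 */
static void sketch_unlock(struct sketch_txn **running)
{
    if (*running != NULL && (*running)->t_updates == 0) {
        free(*running);
        *running = NULL;
    }
}

/* Drops one update and returns the tid of the transaction it belonged to. */
static unsigned int sketch_stop_handle(struct sketch_txn **running)
{
    struct sketch_txn *txn = *running;
    unsigned int tid;

    assert(txn != NULL && txn->t_updates > 0);
    tid = txn->t_tid;       /* copy while the transaction is still pinned */
    txn->t_updates--;
    sketch_unlock(running);
    /* txn may already be freed here, so only the saved copy is used */
    return tid;
}
```

Reading `txn->t_tid` after the simulated unlock would be a use-after-free in this sketch, which mirrors the window the commit closes.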
     
  • Some of the functions which modify the jbd2 superblock were not
    updating the checksum before calling jbd2_write_superblock(). Move
    the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so
    that the checksum is calculated consistently.

    Signed-off-by: "Theodore Ts'o"
    Cc: Darrick J. Wong
    Cc: stable@vger.kernel.org

    Theodore Ts'o
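The design choice here, computing the checksum in the single write path rather than in each caller, can be sketched as below. The structure layout and all `sketch_*` names are hypothetical, and the checksum is an Adler-32 style toy rather than the checksum jbd2 actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced superblock. */
struct sketch_superblock {
    uint32_t s_sequence;
    uint32_t s_start;
    uint32_t s_checksum;
};

/* Toy checksum over every byte that precedes the checksum field. */
static uint32_t sketch_csum(const struct sketch_superblock *sb)
{
    const unsigned char *p = (const unsigned char *)sb;
    size_t n = offsetof(struct sketch_superblock, s_checksum);
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* The only write path: stamp the checksum, then copy to the "disk". */
static void sketch_write_superblock(struct sketch_superblock *sb,
                                    struct sketch_superblock *disk)
{
    assert(sb != NULL && disk != NULL);
    sb->s_checksum = sketch_csum(sb);
    memcpy(disk, sb, sizeof(*disk));
}

/* A caller that modifies fields no longer has to remember the checksum. */
static void sketch_update_tail(struct sketch_superblock *sb,
                               struct sketch_superblock *disk,
                               uint32_t sequence, uint32_t start)
{
    sb->s_sequence = sequence;
    sb->s_start = start;
    sketch_write_superblock(sb, disk);
}

/* Returns nonzero when the stored checksum matches the contents. */
static int sketch_verify(const struct sketch_superblock *disk)
{
    return disk->s_checksum == sketch_csum(disk);
}
```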
     

13 Jun, 2013

6 commits

  • Commit b6e96d0067d8 ("jbd2: use module parameters instead of debugfs
    for jbd_debug") removed any need for a dependency on DEBUG_FS. It
    also moved the /sys variables out from underneath the typical debugfs
    mount point. Delete the dependency and update the /sys path to where
    the debug settings are currently.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: "Theodore Ts'o"

    Paul Gortmaker
     
  • Since jbd_debug() is implemented with two separate printk()
    calls, it can lead to corrupted and misleading debug output like
    the following (see lines marked with "*"):

    [ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
    [ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
    [ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
    [* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
    [* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
    [* 290.339382] JBD2: starting commit of transaction 42104
    [ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
    [ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079

    i.e. the debug output from log_wait_commit and journal_commit_transaction
    has become interleaved. The output should have been:

    (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
    (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104

    It is expected that this is not easy to replicate -- I was only able
    to cause it on preempt-rt kernels, and even then only under heavy
    I/O load.

    Reported-by: Paul Gortmaker
    Suggested-by: "Theodore Ts'o"
    Signed-off-by: Paul Gortmaker
    Signed-off-by: "Theodore Ts'o"

    Paul Gortmaker
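One way to avoid this kind of interleaving is to build the prefix and the message into a single buffer and emit it with one call. The following is a userspace sketch of that idea only; `sketch_format_debug` is a hypothetical helper, and the kernel may well achieve the same result differently.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats "(file, line): func: message" into one buffer so that the
 * caller can hand the complete line to a single output call.
 * Returns the length of the resulting string.
 */
static size_t sketch_format_debug(char *out, size_t len, const char *file,
                                  int line, const char *func,
                                  const char *fmt, ...)
{
    va_list args;
    int n;

    assert(out != NULL && len > 0);
    out[0] = '\0';
    n = snprintf(out, len, "(%s, %d): %s: ", file, line, func);
    if (n > 0 && (size_t)n < len) {
        /* Append the caller's message directly after the prefix. */
        va_start(args, fmt);
        vsnprintf(out + n, len - (size_t)n, fmt, args);
        va_end(args);
    }
    return strlen(out);
}
```

A caller would then write the buffer out once, for example with `fputs(buf, stderr)`, so that no other message can land between the prefix and the body.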
     
  • Currently we see this output:

    $ git grep phase fs/jbd2
    fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 1\n");
    fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
    fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
    fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 3\n");
    fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 4\n");
    [...]

    There is clearly a duplicate label for phase 2, and both are
    active (i.e. not in an #if ... #else block). Rename them to
    "2a" and "2b" so the debug output is unambiguous.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: "Theodore Ts'o"

    Paul Gortmaker
     
  • While trying to debug an issue under extreme I/O loading
    on preempt-rt kernels, the following backtrace was observed
    via SysRQ output:

    rm D ffff8802203afbc0 4600 4878 4748 0x00000000
    ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
    ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
    ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
    Call Trace:
    [] schedule+0x24/0x70
    [] jbd2_log_wait_commit+0xbd/0x140
    [] ? __init_waitqueue_head+0x50/0x50
    [] jbd2_log_do_checkpoint+0xf5/0x520
    [] __jbd2_log_wait_for_space+0xa9/0x1f0
    [] start_this_handle.isra.10+0x2e0/0x530
    [] ? __init_waitqueue_head+0x50/0x50
    [] jbd2__journal_start+0xc3/0x110
    [] ? ext4_rmdir+0x6e/0x230
    [] jbd2_journal_start+0xe/0x10
    [] ext4_journal_start_sb+0x5b/0x160
    [] ext4_rmdir+0x6e/0x230
    [] vfs_rmdir+0xd5/0x140
    [] do_rmdir+0xdf/0x120
    [] ? task_work_run+0x44/0x80
    [] ? do_notify_resume+0x89/0x100
    [] ? int_signal+0x12/0x17
    [] sys_unlinkat+0x25/0x40
    [] system_call_fastpath+0x16/0x1b

    What is interesting here is that we call log_wait_commit from
    within wait_for_space, but we are still holding the checkpoint_mutex,
    as it surrounds almost the whole of wait_for_space. Then, while we
    are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
    bit is set, it will also try to take the same checkpoint_mutex.

    It seems that we need to drop the checkpoint_mutex while sitting in
    jbd2_log_wait_commit, if we want to guarantee that progress can be made
    by jbd2_journal_commit_transaction(). There does not seem to be
    anything preempt-rt specific about this, other than perhaps increasing
    the odds of it happening.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: "Theodore Ts'o"

    Paul Gortmaker
     
  • The state lock is taken after we are doing an assert on the state
    value, not before. So we might in fact be doing an assert on a
    transient value. Ensure the state check is within the scope of
    the state lock being taken.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: "Theodore Ts'o"

    Paul Gortmaker
     
  • The current implementation of jbd2_journal_force_commit() is suboptimal
    because it results in empty and useless commits, whereas callers just want
    to force and wait for any unfinished commits. We already have
    jbd2_journal_force_commit_nested(), which does exactly what we want, except
    that we are guaranteed not to hold a journal transaction open.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     

05 Jun, 2013

8 commits

  • In some cases we cannot start a transaction because of locking
    constraints, and passing a started transaction into those places is not
    handy either because we could block transaction commit for too long.
    Transaction reservation is designed to solve these issues. It
    reserves a handle with a given number of credits in the journal, and the
    handle can later be attached to the running transaction without
    blocking on commit or checkpointing. Reserved handles do not block
    transaction commit in any way; they only reduce the maximum size of the
    running transaction (because we always have to be prepared to
    accommodate a request to attach a reserved handle).

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • j_wait_logspace and j_wait_checkpoint are unused. Remove them.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • jbd2_journal_extend() first checked whether the transaction could accept
    extending the handle with more credits and then added the credits to
    t_outstanding_credits. This can race with start_this_handle() adding
    another handle to the transaction, thus overbooking the transaction.
    Make jbd2_journal_extend() use atomic_add_return() to close the race.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
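The difference between "check, then add" and "add, then check" can be sketched with C11 atomics. This is a userspace illustration with hypothetical names; note that `atomic_fetch_add()` returns the old value, so the new total is computed by adding `nblocks` again, whereas the kernel's `atomic_add_return()` returns the new value directly. The return convention (0 on success, 1 when the extension does not fit) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Tries to book nblocks more credits against a shared counter.
 * The add and the read of the new total happen as one atomic step, so a
 * concurrent booking cannot slip in between a check and the update.
 */
static int sketch_extend(atomic_int *outstanding, int max_credits,
                         int nblocks)
{
    int wanted;

    assert(nblocks >= 0);
    wanted = atomic_fetch_add(outstanding, nblocks) + nblocks;
    if (wanted > max_credits) {
        /* Overbooked: give the credits back and report failure. */
        atomic_fetch_sub(outstanding, nblocks);
        return 1;
    }
    return 0;
}
```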
     
  • __jbd2_log_space_left() and jbd_space_needed() were kind of odd.
    jbd_space_needed() also accounted for credits needed for the currently
    committing transaction while it didn't account for credits needed for
    control blocks. __jbd2_log_space_left() then accounted for control
    blocks as a fraction of free space. Since the results of these two
    functions are always only compared against each other, this works
    correctly but is somewhat strange. Move the estimates so that
    jbd_space_needed() returns number of blocks needed for a transaction
    including control blocks and __jbd2_log_space_left() returns free
    space in the journal (with the committing transaction already
    subtracted). Rename functions to jbd2_log_space_left() and
    jbd2_space_needed() while we are changing them.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • The comment about credit estimates isn't true anymore. We do what the
    comment describes now.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • Currently when we add a buffer to a transaction, we wait until the
    buffer is removed from the BJ_Shadow list (so that we prevent any changes
    to the buffer that is just being written to the journal). This can take
    unnecessarily long, as a lot happens between the time the buffer is
    submitted to the journal and the time when we remove the buffer from
    the BJ_Shadow list (e.g. we wait for all data buffers in the
    transaction, we issue a cache flush, etc.). This also creates a
    dependency of do_get_write_access() on transaction commit (namely
    waiting for data IO to complete), which we want to avoid when
    implementing transaction reservation.

    So we modify the commit code to set a new BH_Shadow flag when the temporary
    shadowing buffer is created, and we clear that flag once IO on that
    buffer is complete. This allows do_get_write_access() to wait only
    for the BH_Shadow bit and thus removes the dependency on data IO
    completion.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • As with metadata buffers, log descriptor buffers don't really need
    the journal head either. So strip it and remove the BJ_LogCtl list.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • When writing metadata to the journal, we create temporary buffer heads
    for that task. We also attach journal heads to these buffer heads but
    the only purpose of the journal heads is to keep buffers linked in
    transaction's BJ_IO list. We remove the need for journal heads by
    reusing buffer_head's b_assoc_buffers list for that purpose. Also
    since BJ_IO list is just a temporary list for transaction commit, we
    use a private list in jbd2_journal_commit_transaction() for that thus
    removing BJ_IO list from transaction completely.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

28 May, 2013

2 commits


22 May, 2013

1 commit

  • invalidatepage now accepts a range to invalidate, and two file
    systems using jbd2 also implement the punch hole feature, which can benefit
    from this. We need to implement the same thing in the jbd2 layer in order to
    allow those file systems to benefit from this functionality.

    This commit adds a length argument to jbd2_journal_invalidatepage()
    and updates all instances in ext4 and ocfs2.

    Signed-off-by: Lukas Czerner
    Reviewed-by: Jan Kara

    Lukas Czerner
     

02 May, 2013

1 commit

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

01 May, 2013

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Mostly performance and bug fixes, plus some cleanups. The one new
    feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
    allows installation of a hidden inode designed for boot loaders."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
    ext4: fix type-widening bug in inode table readahead code
    ext4: add check for inodes_count overflow in new resize ioctl
    ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG
    ext4: fix online resizing for ext3-compat file systems
    jbd2: trace when lock_buffer in do_get_write_access takes a long time
    ext4: mark metadata blocks using bh flags
    buffer: add BH_Prio and BH_Meta flags
    ext4: mark all metadata I/O with REQ_META
    ext4: fix readdir error in case inline_data+^dir_index.
    ext4: fix readdir error in the case of inline_data+dir_index
    jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
    ext4: mext_insert_extents should update extent block checksum
    ext4: move quota initialization out of inode allocation transaction
    ext4: reserve xattr index for Rich ACL support
    jbd2: reduce journal_head size
    ext4: clear buffer_uninit flag when submitting IO
    ext4: use io_end for multiple bios
    ext4: make ext4_bio_write_page() use BH_Async_Write flags
    ext4: Use kstrtoul() instead of parse_strtoul()
    ext4: defragmentation code cleanup
    ...

    Linus Torvalds
     

30 Apr, 2013

1 commit


22 Apr, 2013

1 commit

  • While investigating interactivity problems it was clear that processes
    sometimes stall for long periods of time if an attempt is made to
    lock a buffer which is undergoing writeback. It would stall in
    a trace looking something like

    [] __lock_buffer+0x2e/0x30
    [] do_get_write_access+0x43f/0x4b0
    [] jbd2_journal_get_write_access+0x2b/0x50
    [] __ext4_journal_get_write_access+0x39/0x80
    [] ext4_reserve_inode_write+0x78/0xa0
    [] ext4_mark_inode_dirty+0x49/0x220
    [] ext4_dirty_inode+0x41/0x60
    [] __mark_inode_dirty+0x4e/0x2d0
    [] update_time+0x79/0xc0
    [] file_update_time+0x98/0x100
    [] __generic_file_aio_write+0x17c/0x3b0
    [] generic_file_aio_write+0x7a/0xf0
    [] ext4_file_write+0x83/0xd0
    [] do_sync_write+0xa3/0xe0
    [] vfs_write+0xae/0x180
    [] sys_write+0x4d/0x90
    [] system_call_fastpath+0x1a/0x1f
    [] 0xffffffffffffffff

    Signed-off-by: Mel Gorman
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

20 Apr, 2013

1 commit


10 Apr, 2013

1 commit

  • The only part of proc_dir_entry the code outside of fs/proc
    really cares about is PDE(inode)->data. Provide a helper
    for that; static inline for now, eventually will be moved
    to fs/proc, along with the knowledge of struct proc_dir_entry
    layout.

    Signed-off-by: Al Viro

    Al Viro
     

04 Apr, 2013

2 commits

  • The following race is possible:

    [kjournald2]                                other_task
    jbd2_journal_commit_transaction()
      j_state = T_FINISHED;
      spin_unlock(&journal->j_list_lock);
                                                ->jbd2_journal_remove_checkpoint()
                                                  ->jbd2_journal_free_transaction();
                                                    ->kmem_cache_free(transaction)
      ->j_commit_callback(journal, transaction);
        -> USE_AFTER_FREE

    WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
    Hardware name:
    list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
    Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
    Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107
    Call Trace:
    [] warn_slowpath_common+0xad/0xf0
    [] warn_slowpath_fmt+0x46/0x50
    [] ? ext4_journal_commit_callback+0x99/0xc0
    [] __list_del_entry+0x1c0/0x250
    [] ext4_journal_commit_callback+0x6f/0xc0
    [] jbd2_journal_commit_transaction+0x23a6/0x2570
    [] ? try_to_del_timer_sync+0x82/0xa0
    [] ? del_timer_sync+0x91/0x1e0
    [] kjournald2+0x19f/0x6a0
    [] ? wake_up_bit+0x40/0x40
    [] ? bit_spin_lock+0x80/0x80
    [] kthread+0x10e/0x120
    [] ? __init_kthread_worker+0x70/0x70
    [] ret_from_fork+0x7c/0xb0
    [] ? __init_kthread_worker+0x70/0x70

    In order to demonstrate this issue, one should mount ext4 with the
    mount -o discard option on an SSD. This makes the callback take longer
    and widens the race window.

    In order to fix this, we should mark the transaction as finished only
    after the callbacks have completed.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Dmitry Monakhov
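The ordering described in this fix can be sketched in userspace as below. All `sketch_*` names and the two-value state enum are hypothetical simplifications: the transaction only becomes eligible for freeing once it is marked finished, and that mark is set after the callback has returned.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_state { SKETCH_T_COMMIT, SKETCH_T_FINISHED };

/* Hypothetical stand-in for a committing transaction. */
struct sketch_txn {
    enum sketch_state t_state;
    int t_callbacks_run;
    enum sketch_state t_state_seen_by_callback;
};

typedef void (*sketch_callback_t)(struct sketch_txn *);

/* Another task is only allowed to free a finished transaction. */
static int sketch_may_free(const struct sketch_txn *txn)
{
    return txn->t_state == SKETCH_T_FINISHED;
}

/* Commit tail: run the callback first, publish "finished" last. */
static void sketch_commit(struct sketch_txn *txn, sketch_callback_t callback)
{
    assert(txn->t_state == SKETCH_T_COMMIT);
    if (callback != NULL)
        callback(txn);                 /* txn cannot be freed yet */
    txn->t_state = SKETCH_T_FINISHED;  /* only now may others free it */
}

/* Example callback that records what state it observed. */
static void sketch_record_callback(struct sketch_txn *txn)
{
    txn->t_callbacks_run++;
    txn->t_state_seen_by_callback = txn->t_state;
}
```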
     
  • In the case where an inode has a very stale transaction id (tid) in
    i_datasync_tid or i_sync_tid, it's possible that after a very large
    (2**31) number of transactions, the tid number space might wrap,
    causing tid_geq()'s calculations to fail.

    Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
    by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
    attempted to fix this problem, but it only avoided kjournald spinning
    forever by fixing the logic in jbd2_log_start_commit().

    Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
    that might call jbd2_log_start_commit() with a stale tid, those
    functions will subsequently call jbd2_log_wait_commit() with the same
    stale tid, and then wait for a very long time. To fix this, we
    replace the calls to jbd2_log_start_commit() and
    jbd2_log_wait_commit() with a call to a new function,
    jbd2_complete_transaction(), which will correctly handle stale tid's.

    As a bonus, jbd2_complete_transaction() will avoid locking
    j_state_lock for writing unless a commit needs to be started. This
    should have a small (but probably not measurable) improvement for
    ext4's scalability.

    Signed-off-by: "Theodore Ts'o"
    Reported-by: Ben Hutchings
    Reported-by: George Barnett
    Cc: stable@vger.kernel.org

    Theodore Ts'o
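The wraparound problem comes from comparing sequence numbers through a signed difference. The sketch below is a userspace illustration with hypothetical names (`sketch_tid_t`, `sketch_tid_geq`), assuming 32-bit tids: the comparison gives the right answer across a wrap as long as the two tids are less than 2**31 apart, and gives the wrong answer for a sufficiently stale tid.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sketch_tid_t;

static_assert(sizeof(sketch_tid_t) == 4, "tid arithmetic assumes 32 bits");

/*
 * Returns nonzero when x is at or after y in sequence order.
 * The unsigned subtraction wraps, and the signed view of the result
 * tells us on which side of y the value x lies.
 */
static int sketch_tid_geq(sketch_tid_t x, sketch_tid_t y)
{
    int32_t difference = (int32_t)(x - y);

    return difference >= 0;
}
```

For example, `sketch_tid_geq(5, 0xFFFFFFF0)` is true because 5 is only 21 steps after the wrap, while `sketch_tid_geq(0x80000001, 0)` is false even though more than 2**31 transactions have passed, which is the stale-tid case this commit has to handle.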
     

12 Mar, 2013

1 commit

  • jbd2_journal_dirty_metadata() didn't get a reference to the journal_head it
    was working with. This is OK in most cases since the journal head
    should be attached to a transaction, but on rare occasions when we are
    journalling data, __ext4_journalled_writepage() can race with
    jbd2_journal_invalidatepage() stripping buffers from a page, and thus the
    journal head can be freed while jbd2_journal_dirty_metadata() is still
    using it.

    Fix the problem by taking our own journal head reference in
    jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers(),
    which can possibly have the same issue).

    Reported-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Jan Kara
     

03 Mar, 2013

1 commit

  • If start_this_handle() fails, handle will be set to an
    ERR_PTR() value and must not be dereferenced.

    paging request at fffffffffffffff6
    IP: [] jbd2__journal_start+0x18f/0x290
    PGD 200e067 PUD 200f067 PMD 0
    Oops: 0000 [#1] SMP
    Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
    CPU 0 journal commit I/O error

    Pid: 2694, comm: fio Not tainted 3.8.0-rc3+ #79 /DQ67SW
    RIP: 0010:[] [] jbd2__journal_start+0x18f/0x290
    RSP: 0018:ffff880233b8ba58 EFLAGS: 00010292
    RAX: 00000000ffffffe2 RBX: ffffffffffffffe2 RCX: 0000000000000006
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82128f48
    RBP: ffff880233b8ba98 R08: 0000000000000000 R09: ffff88021440a6e0

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
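The error-pointer convention behind this bug can be re-created in userspace for illustration. The helpers below are hypothetical `sketch_*` re-implementations of the idea, not the kernel's own definitions: a small negative errno is encoded in the pointer value, so the result has to be tested before any field access.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encodes a negative errno value as a pointer. */
static void *sketch_err_ptr(intptr_t error)
{
    return (void *)error;
}

/* True when the pointer value lies in the range reserved for errors. */
static int sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Decodes the errno value from an error pointer. */
static intptr_t sketch_ptr_err(const void *ptr)
{
    assert(sketch_is_err(ptr));
    return (intptr_t)ptr;
}

/* Hypothetical handle with a field filled in after a successful start. */
struct sketch_handle {
    int h_line_no;
};

static struct sketch_handle sketch_the_handle;

/* Stand-in for a start function that can fail. */
static struct sketch_handle *sketch_start(int journal_is_readonly)
{
    if (journal_is_readonly)
        return sketch_err_ptr(-EROFS);
    return &sketch_the_handle;
}

/* Tags the handle with caller information only when it is a real handle. */
static int sketch_start_and_tag(int journal_is_readonly, int line_no)
{
    struct sketch_handle *handle = sketch_start(journal_is_readonly);

    if (sketch_is_err(handle))
        return (int)sketch_ptr_err(handle); /* never dereference it */
    handle->h_line_no = line_no;
    return 0;
}
```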
     

10 Feb, 2013

1 commit

  • There are multiple reasons to move away from debugfs. First of all,
    we are only using it for a single parameter, it is much more
    complicated to set up (some 30 lines of code compared to 3), and it is
    one more thing that might fail while loading the jbd2 module.

    Secondly, as a module parameter it can be specified as a boot option if
    jbd2 is built into the kernel, or as a parameter when the module is
    loaded, and it can also be manipulated dynamically under
    /sys/module/jbd2/parameters/jbd2_debug. So it is more flexible.

    Ultimately we want to move away from using jbd_debug() towards
    tracepoints, but for now this is still a useful simplification of the
    code base.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

09 Feb, 2013

1 commit


07 Feb, 2013

1 commit

  • Track the delay between when we first request that the commit begin
    and when it actually begins, so we can see how much of a gap exists.
    In theory, this should just be the remaining scheduling quantum of
    the thread which requested the commit (assuming it was not a
    synchronous operation which triggered the commit request) plus
    scheduling overhead; however, it's possible that real-time processes
    might prevent the kjournald thread from executing.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

30 Jan, 2013

1 commit

  • Don't send an extra wakeup to kjournald in the case where we
    already have the proper target in j_commit_request, i.e. that
    transaction has already been requested for commit.

    Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug" changed
    the logic leading to a wakeup, but it caused some extra wakeups
    which were found to lead to a measurable performance regression.

    Signed-off-by: Eric Sandeen
    [tytso@mit.edu: reworked check to make it clearer]
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
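The check described above can be sketched as follows. This is a simplified userspace model with hypothetical names and fields; the exact condition used in the kernel may differ. The point is that a wakeup is only sent when the target has not already been recorded as the commit request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced journal state. */
struct sketch_journal {
    unsigned int j_commit_request; /* tid most recently requested */
    bool has_running;              /* is there a running transaction? */
    unsigned int running_tid;      /* its tid, if so */
    int wakeups;                   /* counts simulated wakeups */
};

/* Returns 1 when a commit was newly requested, 0 when nothing was done. */
static int sketch_log_start_commit(struct sketch_journal *journal,
                                   unsigned int target)
{
    assert(journal != NULL);
    if (journal->j_commit_request != target &&
        journal->has_running && journal->running_tid == target) {
        journal->j_commit_request = target;
        journal->wakeups++;        /* stands in for waking the thread */
        return 1;
    }
    return 0;                      /* already requested: no extra wakeup */
}
```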
     

03 Jan, 2013

1 commit

  • Pull ext4 bug fixes from Ted Ts'o:
    "Various bug fixes for ext4. Perhaps the most serious bug fixed is one
    which could cause file system corruptions when performing file punch
    operations."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: avoid hang when mounting non-journal filesystems with orphan list
    ext4: lock i_mutex when truncating orphan inodes
    ext4: do not try to write superblock on ro remount w/o journal
    ext4: include journal blocks in df overhead calcs
    ext4: remove unaligned AIO warning printk
    ext4: fix an incorrect comment about i_mutex
    ext4: fix deadlock in journal_unmap_buffer()
    ext4: split off ext4_journalled_invalidatepage()
    jbd2: fix assertion failure in jbd2_journal_flush()
    ext4: check dioread_nolock on remount
    ext4: fix extent tree corruption caused by hole punch

    Linus Torvalds
     

26 Dec, 2012

1 commit

  • We cannot wait for transaction commit in journal_unmap_buffer()
    because we hold the page lock, which ranks below transaction start. We
    solve the issue by bailing out of journal_unmap_buffer() and
    jbd2_journal_invalidatepage() with -EBUSY. The caller is then responsible
    for waiting for the transaction commit to finish and trying invalidation
    again. Since the issue can happen only for a page straddling i_size, it
    is simple enough to manually call jbd2_journal_invalidatepage() for
    such a page from ext4_setattr(), check the return value and wait if
    necessary.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
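The bail-out-and-retry pattern can be sketched like this. Everything here is a hypothetical userspace model: the `sketch_*` functions stand in for the invalidation call and for waiting on the commit, and the wait is simulated by simply clearing a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical journal state for the illustration. */
struct sketch_journal {
    int committing; /* nonzero while a commit still uses the buffer */
    int waits;      /* how many times the caller had to wait */
};

/* Refuses with -EBUSY instead of sleeping while the page is locked. */
static int sketch_invalidatepage(const struct sketch_journal *journal)
{
    if (journal->committing)
        return -EBUSY;
    return 0;
}

/* Simulated wait for the commit; called without the page lock held. */
static void sketch_wait_commit(struct sketch_journal *journal)
{
    journal->committing = 0;
    journal->waits++;
}

/* Caller side: retry the invalidation until it no longer reports -EBUSY. */
static int sketch_truncate_partial_page(struct sketch_journal *journal)
{
    int ret;

    assert(journal != NULL);
    for (;;) {
        ret = sketch_invalidatepage(journal);
        if (ret != -EBUSY)
            return ret;
        sketch_wait_commit(journal);
    }
}
```

Moving the wait into the caller keeps the lock ordering intact, because the caller can drop the page lock before it sleeps.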
     

21 Dec, 2012

1 commit

  • The following race is possible between start_this_handle() and someone
    calling jbd2_journal_flush().

    Process A                                    Process B
    start_this_handle().
      if (journal->j_barrier_count) # false
      if (!journal->j_running_transaction) { #true
        read_unlock(&journal->j_state_lock);
                                                 jbd2_journal_lock_updates()
                                                 jbd2_journal_flush()
                                                   write_lock(&journal->j_state_lock);
                                                   if (journal->j_running_transaction) {
                                                     # false
                                                   ... wait for committing trans ...
                                                   write_unlock(&journal->j_state_lock);
        ...
        write_lock(&journal->j_state_lock);
        if (!journal->j_running_transaction) { # true
          jbd2_get_transaction(journal, new_transaction);
        write_unlock(&journal->j_state_lock);
        goto repeat; # eventually blocks on j_barrier_count > 0
                                                   ...
                                                   J_ASSERT(!journal->j_running_transaction);
                                                     # fails

    We fix the race by rechecking j_barrier_count after reacquiring j_state_lock
    in exclusive mode.

    Reported-by: yjwsignal@empal.com
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Jan Kara
     

17 Dec, 2012

1 commit

  • Pull ext4 update from Ted Ts'o:
    "There are two major features for this merge window. The first is
    inline data, which allows small files or directories to be stored in
    the in-inode extended attribute area. (This requires that the file
    system use inodes which are at least 256 bytes or larger; 128 byte
    inodes do not have any room for in-inode xattrs.)

    The second new feature is SEEK_HOLE/SEEK_DATA support. This is
    enabled by the extent status tree patches, and this infrastructure
    will be used to further optimize ext4 in the future.

    Beyond that, we have the usual collection of code cleanups and bug
    fixes."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (63 commits)
    ext4: zero out inline data using memset() instead of empty_zero_page
    ext4: ensure Inode flags consistency are checked at build time
    ext4: Remove CONFIG_EXT4_FS_XATTR
    ext4: remove unused variable from ext4_ext_in_cache()
    ext4: remove redundant initialization in ext4_fill_super()
    ext4: remove redundant code in ext4_alloc_inode()
    ext4: use sync_inode_metadata() when syncing inode metadata
    ext4: enable ext4 inline support
    ext4: let fallocate handle inline data correctly
    ext4: let ext4_truncate handle inline data correctly
    ext4: evict inline data out if we need to strore xattr in inode
    ext4: let fiemap work with inline data
    ext4: let ext4_rename handle inline dir
    ext4: let empty_dir handle inline dir
    ext4: let ext4_delete_entry() handle inline data
    ext4: make ext4_delete_entry generic
    ext4: let ext4_find_entry handle inline data
    ext4: create a new function search_dir
    ext4: let ext4_readdir handle inline data
    ext4: let add_dir_entry handle inline data properly
    ...

    Linus Torvalds
     

19 Nov, 2012

1 commit


09 Nov, 2012

1 commit

  • ext4_handle_release_buffer() was intended to remove journal
    write access from a buffer, but it doesn't actually do anything
    at all other than add a BUFFER_TRACE point, and it's not reliably
    used for that either. Remove all the associated dead code.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Carlos Maiolino

    Eric Sandeen