13 Aug, 2011

1 commit

  • ext4_should_writeback_data() had an incorrect sequence of
    tests to determine if it should return 0 or 1: in
    particular, even in no-journal mode, 0 was being returned
    for a non-regular-file inode.

    This meant that, in non-journal mode, we would use
    ext4_journalled_aops for directories, symlinks, and other
    non-regular files. However, calling journalled aops
    callbacks when there is no valid handle can cause problems.

    This would cause a kernel crash with Jan Kara's commit
    2d859db3e4 ("ext4: fix data corruption in inodes with
    journalled data"), because we now dereference 'handle' in
    ext4_journalled_write_end().

    I also added BUG_ONs to check for a valid handle in the
    obviously journal-only aops callbacks.

    I tested this running xfstests with a scratch device in
    these modes:

    - no-journal
    - data=ordered
    - data=writeback
    - data=journal

    All modes work fine; the data=journal run has many failures and a
    crash in xfstests test 074, but the same happens with a
    vanilla kernel.

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Curt Wohlgemuth
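
    The corrected test order can be sketched as a user-space model. This
    is not the kernel function: struct model_inode and its fields are
    hypothetical stand-ins for the journal pointer, S_ISREG(i_mode), the
    per-inode journal-data flag and the data= mount option.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state the real predicate consults. */
enum data_mode { DATA_ORDERED, DATA_WRITEBACK, DATA_JOURNAL };

struct model_inode {
	bool has_journal;     /* false in no-journal mode */
	bool is_regular;      /* S_ISREG(i_mode) in the kernel */
	bool journal_data;    /* per-inode "journal data" flag */
	enum data_mode mode;  /* data= mount option */
};

/* Returns 1 if the inode should use writeback-style aops, else 0.
 * The journal test comes first: with no journal there is no valid
 * handle, so the journalled aops must never be selected. */
static int should_writeback_data(const struct model_inode *inode)
{
	if (!inode->has_journal)
		return 1;
	if (!inode->is_regular)
		return 0;
	if (inode->journal_data)
		return 0;
	return inode->mode == DATA_WRITEBACK;
}
```

    With this ordering a directory in no-journal mode gets 1, whereas the
    buggy order checked the file type first and returned 0.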
     

09 May, 2011

1 commit

  • The block allocation code used to use jbd2_journal_get_undo_access as
    a way to make changes that wouldn't show up until the commit took
    place. The new multi-block allocation code has its own way of
    preventing newly freed blocks from getting reused until the commit
    takes place (it avoids updating the buddy bitmaps until the commit is
    done), so we don't need to use jbd2_journal_get_undo_access(), which
    has extra overhead compared to jbd2_journal_get_write_access().

    There was one last vestigial use of ext4_journal_get_undo_access() in
    ext4_add_groupblocks(); change it to use ext4_journal_get_write_access()
    and then remove the ext4_journal_get_undo_access() support.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
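
    The idea of not reusing newly freed blocks until the commit can be
    illustrated with a toy allocator. This is a hypothetical sketch: the
    toy_* names are invented, and a single flag array stands in for the
    buddy bitmaps that the real multi-block allocator defers updating.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 16

/* 'allocatable' stands in for what the allocator may hand out;
 * 'pending_free' holds blocks freed in the running transaction. */
struct toy_alloc {
	bool allocatable[NBLOCKS];
	bool pending_free[NBLOCKS];
};

/* Start with every block in use. */
static void toy_init(struct toy_alloc *a)
{
	memset(a, 0, sizeof(*a));
}

/* Freeing only records the block; the allocator cannot see it yet. */
static void toy_free(struct toy_alloc *a, int blk)
{
	assert(blk >= 0 && blk < NBLOCKS);
	a->pending_free[blk] = true;
}

/* Hand out the lowest allocatable block, or -1 if there is none. */
static int toy_alloc_block(struct toy_alloc *a)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (a->allocatable[i]) {
			a->allocatable[i] = false;
			return i;
		}
	}
	return -1;
}

/* After the transaction commits, the frees become visible. */
static void toy_commit(struct toy_alloc *a)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (a->pending_free[i]) {
			a->allocatable[i] = true;
			a->pending_free[i] = false;
		}
	}
}
```

    Because a freed block is invisible to the allocator until commit, no
    undo copy of the bitmap is needed to protect it from early reuse.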
     

05 Apr, 2011

1 commit

  • It is not necessary to update the [cm]time of the quota file on each
    quota file write, and doing so wastes journal space and I/O throughput
    on inode writes. So just remove the updating from ext4_quota_write()
    and only update the times when quotas are being turned off. Userspace
    cannot get anything reliable from quota files while they are used by
    the kernel anyway.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

21 Mar, 2011

1 commit

  • There are two wrapper functions which do exactly the same thing:
    ext4_journal_release_buffer(), and ext4_handle_release_buffer(). In
    addition, ext4_xattr_block_set() calls jbd2_journal_release_buffer()
    directly.

    Unify all of the code to use ext4_handle_release_buffer(), and get rid
    of ext4_journal_release_buffer().

    Signed-off-by: Amir Goldstein
    Signed-off-by: "Theodore Ts'o"

    Amir Goldstein
     

11 Jan, 2011

1 commit


27 Jul, 2010

1 commit


30 Jun, 2010

1 commit


15 Jun, 2010

1 commit


12 Jun, 2010

1 commit

  • We don't need to set s_dirt in most of the ext4 code when journaling
    is enabled. In ext3/4, some of the summary statistics for the number
    of free inodes, blocks, and directories are calculated from the
    per-block-group statistics when the file system is mounted or
    unmounted. As a result, the superblock doesn't have to be updated,
    either via the journal or by setting s_dirt. There are a few
    exceptions, most notably when resizing the file system, where the
    superblock needs to be modified; in that case it should be done as a
    journalled operation if possible, and s_dirt should be set only in
    no-journal mode.

    This patch will optimize out some unneeded disk writes when using ext4
    with a journal.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
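
    Recomputing the summary totals from per-group counters, and the
    "s_dirt only without a journal" rule, can be sketched as follows. The
    struct and function names are hypothetical; they only loosely mirror
    the free counts kept in ext4 group descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block-group counters. */
struct group_stats {
	unsigned long free_blocks;
	unsigned long free_inodes;
	unsigned long used_dirs;
};

/* Sum the per-group counters into filesystem-wide totals, as is done
 * at mount time, so the running totals need not be written to the
 * superblock on every allocation or free. */
static struct group_stats sum_groups(const struct group_stats *g,
				     size_t ngroups)
{
	struct group_stats total = { 0, 0, 0 };

	for (size_t i = 0; i < ngroups; i++) {
		total.free_blocks += g[i].free_blocks;
		total.free_inodes += g[i].free_inodes;
		total.used_dirs += g[i].used_dirs;
	}
	return total;
}

/* When the superblock itself must change (e.g. on resize), it is
 * journalled if possible; the dirty flag is only needed without a
 * journal. */
static bool should_set_s_dirt(bool has_journal)
{
	return !has_journal;
}
```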
     

17 May, 2010

1 commit


05 Mar, 2010

1 commit

  • Allocate an uninitialized extent before an ext4 buffered write and
    convert the extent to initialized after the I/O completes.
    The purpose is to make sure an extent can only be marked
    initialized after it has been written with new data so
    we can safely drop the i_mutex lock in ext4 DIO read without
    exposing stale data. This helps to improve multi-thread DIO
    read performance on high-speed disks.

    Skip the nobh and data=journal mount cases to make things simple for now.

    Signed-off-by: Jiaying Zhang
    Signed-off-by: "Theodore Ts'o"

    Jiaying Zhang
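
    The uninitialized-extent lifecycle can be modeled in a few lines. This
    is a hypothetical sketch, not ext4 code: toy_extent and its helpers
    are invented, and an uninitialized extent is modeled as reading back
    as zeros until I/O completion flips its state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define EXT_LEN 4  /* tiny illustrative extent size, in bytes */

/* 'initialized' mirrors the extent state; 'disk' is the on-disk data. */
struct toy_extent {
	bool initialized;
	char disk[EXT_LEN];
};

/* Newly allocated space may still hold stale data, so the extent
 * starts out uninitialized. */
static void toy_allocate(struct toy_extent *e, const char *stale)
{
	memcpy(e->disk, stale, EXT_LEN);
	e->initialized = false;
}

/* Reads of an uninitialized extent return zeros, never disk contents. */
static void toy_read(const struct toy_extent *e, char *out)
{
	if (e->initialized)
		memcpy(out, e->disk, EXT_LEN);
	else
		memset(out, 0, EXT_LEN);
}

/* The write reaches the disk, but the extent stays uninitialized... */
static void toy_submit_write(struct toy_extent *e, const char *data)
{
	memcpy(e->disk, data, EXT_LEN);
}

/* ...until I/O completion converts it to initialized. */
static void toy_end_io(struct toy_extent *e)
{
	e->initialized = true;
}
```

    A concurrent reader therefore sees either zeros or the new data, never
    the stale contents, which is what makes dropping i_mutex on the read
    side safe.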
     

09 Dec, 2009

2 commits

  • We cannot rely on buffer dirty bits during fsync because pdflush can run
    before fsync is called and clear the dirty bits without forcing a
    transaction commit. Instead, we track which transaction last changed
    the inode and which transaction last changed its block allocation, and
    force the relevant transaction to disk on fsync.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
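
    The transaction-tracking scheme can be sketched as a small model. All
    names here are hypothetical; the model ignores transaction-ID
    wraparound and the committing-but-not-yet-done state that a real
    journal has to handle.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int tid_t;  /* transaction ID (wraparound ignored) */

struct toy_journal {
	tid_t running;    /* ID of the currently running transaction */
	tid_t committed;  /* highest ID known to be on disk */
	int commits;      /* number of forced commits, for the checks */
};

struct toy_inode {
	tid_t sync_tid;      /* last transaction that changed the inode */
	tid_t datasync_tid;  /* last transaction that changed allocation */
};

/* Record the running transaction whenever the inode is modified. */
static void toy_touch_inode(struct toy_inode *ino, struct toy_journal *j,
			    bool alloc_changed)
{
	ino->sync_tid = j->running;
	if (alloc_changed)
		ino->datasync_tid = j->running;
}

/* Commit the running transaction unless 'tid' is already on disk. */
static void toy_force_commit(struct toy_journal *j, tid_t tid)
{
	if (tid <= j->committed)
		return;
	j->committed = j->running;
	j->running++;
	j->commits++;
}

/* fsync needs the inode's last transaction; fdatasync only needs the
 * last one that changed block allocation. No dirty bits are consulted. */
static void toy_fsync(struct toy_inode *ino, struct toy_journal *j,
		      bool datasync)
{
	toy_force_commit(j, datasync ? ino->datasync_tid : ino->sync_tid);
}
```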
     
  • Currently all quota block reservation macros contain a hard-coded "2",
    i.e. the MAXQUOTAS value. This is not good, because in some places it
    is not obvious what this digit represents. Let's introduce a new macro
    with a self-descriptive name.

    Signed-off-by: Dmitry Monakhov
    Acked-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
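
    The before/after of replacing the bare "2" can be shown with a small
    sketch. MAXQUOTAS is the real quota-type count named above; the other
    macro and function names and the per-type cost of 3 are hypothetical
    placeholders for illustration.

```c
#include <assert.h>

/* Number of quota types (user and group). */
#define MAXQUOTAS 2

/* Hypothetical per-quota-type credit cost (constant for illustration). */
#define QUOTA_TRANS_BLOCKS 3

/* Before: "2 * QUOTA_TRANS_BLOCKS", with an unexplained "2".
 * After: the intent is carried by the (illustrative) macro name. */
#define MAXQUOTAS_TRANS_BLOCKS (MAXQUOTAS * QUOTA_TRANS_BLOCKS)

/* Hypothetical credit estimate for an operation touching 'n' blocks. */
static int toy_write_credits(int n)
{
	return n + MAXQUOTAS_TRANS_BLOCKS;
}
```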
     

25 Nov, 2009

1 commit


23 Nov, 2009

2 commits

  • Convert the last two callers of ext4_journal_forget() to use
    ext4_forget() instead, and then fold ext4_journal_forget() into
    ext4_forget(). This reduces our code complexity and shortens our call
    stack.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • The ext4_forget() function better belongs in ext4_jbd2.c. This will
    allow us to do some cleanup of the ext4_journal_revoke() and
    ext4_journal_forget() functions, as well as giving us better error
    reporting since we can report the caller of ext4_forget() when things
    go wrong.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
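
    Reporting the caller of a helper is commonly done with a wrapper macro
    that passes __func__. The sketch below shows that pattern; the
    function and macro names are hypothetical and are not the ext4
    definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Remembers which function last called the helper (for the checks). */
static const char *last_caller;

/* Helper that receives the caller's name explicitly. */
static int toy_forget_at(const char *where, long blocknr)
{
	last_caller = where;
	if (blocknr < 0) {
		fprintf(stderr, "%s: invalid block %ld\n", where, blocknr);
		return -1;
	}
	return 0;
}

/* Wrapper macro: each call site passes its own __func__. */
#define toy_forget(blocknr) toy_forget_at(__func__, (blocknr))

/* Example caller; an error would be reported as coming from
 * "free_some_blocks", not from the helper itself. */
static int free_some_blocks(void)
{
	return toy_forget(42);
}
```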
     

29 Sep, 2009

1 commit


13 Jul, 2009

1 commit

  • We found a problem with buffer head reference leaks when using an ext4
    partition without a journal. In particular, calls to ext4_forget() would
    not do a brelse() on the input buffer head, which would leave the pages
    they belong to unreclaimable.

    Further investigation showed that all places where ext4_journal_forget() and
    ext4_journal_revoke() are called are subject to the same problem. This
    patch changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit
    release of the buffer head when the journal handle isn't valid.

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"

    Curt Wohlgemuth
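
    The reference-counting rule behind this fix can be sketched with toy
    types. toy_bh and toy_handle are hypothetical stand-ins for struct
    buffer_head and a journal handle, and the journal path is simplified
    to "the journal consumes the caller's reference".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a buffer head and a journal handle. */
struct toy_bh { int refcount; };
struct toy_handle { bool valid; int journalled; };

static void toy_brelse(struct toy_bh *bh)
{
	assert(bh->refcount > 0);
	bh->refcount--;
}

/* Both paths must drop the caller's reference. Before the fix, the
 * no-journal path returned without releasing and leaked one
 * reference per call, pinning the page. */
static void toy_forget(struct toy_handle *handle, struct toy_bh *bh)
{
	if (handle->valid) {
		handle->journalled++;  /* stand-in for the journal forget path */
		toy_brelse(bh);
		return;
	}
	toy_brelse(bh);  /* the fix: explicit release without a journal */
}
```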
     

09 Jul, 2009

1 commit

  • If there is no journal, ext4_should_writeback_data() should return
    TRUE. This will fix ext4_set_aops() to set ext4_da_aops in the case of
    delayed allocation; otherwise ext4_journalled_aops gets used by
    default, which doesn't handle delayed allocation properly.

    The advantage of using ext4_should_writeback_data() approach is that
    it should handle nobh better as well.

    Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh
    Kumar for suggesting this approach.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
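
    Why the journalled operations became the default can be seen in a
    simplified model of the selection. The enum labels and choose_aops()
    are hypothetical; the model only assumes that the journalled table is
    the fall-through case when neither the ordered nor the writeback
    predicate is true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the address_space_operations tables. */
enum aops { AOPS_ORDERED, AOPS_WRITEBACK, AOPS_DELALLOC, AOPS_JOURNALLED };

/* Simplified selection: journalled is the fall-through default. */
static enum aops choose_aops(bool order_data, bool writeback_data,
			     bool delalloc)
{
	if (order_data)
		return delalloc ? AOPS_DELALLOC : AOPS_ORDERED;
	if (writeback_data)
		return delalloc ? AOPS_DELALLOC : AOPS_WRITEBACK;
	return AOPS_JOURNALLED;
}
```

    In no-journal mode the ordered predicate is false; if the writeback
    predicate is also false, delayed allocation is silently bypassed.
    Making it return true without a journal selects the delalloc table.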
     

07 Jan, 2009

2 commits

  • The extents mount option is largely superfluous, and in fact the way it
    was implemented was buggy; if a filesystem which did not have the
    extents feature flag was mounted -o extents, the filesystem would
    attempt to create and use extent-based files even though the extents
    feature flag was not enabled. The simplest thing to do is to nuke the
    mount option entirely. It's not all that useful to force the
    non-creation of new extent-based files if the filesystem can support it.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • A few weeks ago I posted a patch for discussion that allowed ext4 to run
    without a journal. Since that time I've integrated the excellent
    comments from Andreas and fixed several serious bugs. We're currently
    running with this patch and generating some performance numbers against
    both ext2 (with backported reservations code) and ext4 with and without
    a journal. It just so happens that running without a journal is
    slightly faster for almost everything.

    We ran

    iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2

    which creates 4 threads, each of which creates a 2G file and does reads
    and writes on it, with a buffer size of 256K, using O_DIRECT for all
    file opens to bypass the page cache. Results:

    Throughput (MB/s)   ext2    ext4, default   ext4, no journal
    initial writes      13.0    15.4            15.7
    rewrites            13.1    15.6            15.9
    reads               15.2    16.9            17.2
    re-reads            15.3    16.9            17.2
    random readers       5.6     5.6             5.7
    random writers       5.1     5.3             5.4

    So it seems that, so far, this was a useful exercise.

    Signed-off-by: Frank Mayhar
    Signed-off-by: "Theodore Ts'o"

    Frank Mayhar
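
    The relative gains implied by the iozone table can be computed
    directly. The numbers below are transcribed from that table; the
    struct and function names are only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in MB/s, transcribed from the iozone results. */
struct iozone_row {
	const char *test;
	double ext2, ext4_default, ext4_nojournal;
};

static const struct iozone_row results[] = {
	{ "initial writes", 13.0, 15.4, 15.7 },
	{ "rewrites",       13.1, 15.6, 15.9 },
	{ "reads",          15.2, 16.9, 17.2 },
	{ "re-reads",       15.3, 16.9, 17.2 },
	{ "random readers",  5.6,  5.6,  5.7 },
	{ "random writers",  5.1,  5.3,  5.4 },
};

#define NROWS (sizeof(results) / sizeof(results[0]))

/* Relative gain of 'value' over 'baseline', in percent. */
static double pct_gain(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}
```

    For initial writes, for example, no-journal ext4 is about 20.8% faster
    than ext2 and about 1.9% faster than default ext4.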
     

20 Aug, 2008

1 commit

  • When considering how many journal credits are needed for modifying a
    chunk of data, we need to account for the super block, inode block,
    quota blocks and xattr block, indirect/index blocks, and also the group
    bitmap and group descriptor blocks for new allocation (including data
    and indirect/index blocks). There are many places in ext4 that do the
    calculation on their own and often miss one or two meta blocks; they
    also often assume single-block allocation and do not consider the
    multiple-chunk allocation case.

    This patch tries to clean up the current journal credit code, and
    provides some common helper functions to calculate the journal credits,
    to be used for writepage, writepages, DIO, fallocate, migration, defrag,
    and for both nonextent and extent files.

    This patch modifies the writepage/write_begin credit calculation for
    nonextent files to use the new helper function. It also fixes the
    problem that writepage on nonextent files did not consider the case
    blocksize
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
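
    The accounting described above can be sketched as a single helper.
    This is a hypothetical simplification, not the ext4 helpers: the
    parameter names are invented, and it assumes a worst case in which
    every newly allocated block may land in a different group, bounded by
    the number of groups and group-descriptor blocks.

```c
#include <assert.h>

/* Hypothetical inputs to the credit estimate. */
struct credit_params {
	int quota_blocks;  /* blocks touched to update quota files */
	int index_blocks;  /* indirect/extent-index blocks touched */
	int alloc_blocks;  /* data + index blocks newly allocated */
	int ngroups;       /* block groups in the filesystem */
	int ngdt_blocks;   /* group-descriptor blocks in the filesystem */
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Worst-case credits: fixed metadata, index blocks, one bitmap per
 * group touched, and one descriptor block per group touched, each
 * capped by what the filesystem actually has. */
static int trans_credits(const struct credit_params *p)
{
	int groups = min_int(p->alloc_blocks, p->ngroups);
	int gdt = min_int(groups, p->ngdt_blocks);
	int meta = 1 /* superblock */ + 1 /* inode block */ +
		   p->quota_blocks + 1 /* xattr block */;

	return meta + p->index_blocks + groups + gdt;
}
```

    Centralizing this in one helper means the multi-chunk case is handled
    once, instead of each caller assuming a single-block allocation.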
     

14 Jul, 2008

1 commit


12 Jul, 2008

1 commit

  • This patch makes ext4 use inode-based implementation of data=ordered mode
    in JBD2. It allows us to unify some data=ordered and data=writeback paths
    (especially writepage since we don't have to start a transaction anymore)
    and remove some buffer walking.

    Updated fix from Aneesh Kumar K.V
    to fix file system hang due to corrupt jinode values.

    Signed-off-by: Jan Kara
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
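
    The inode-based approach to data=ordered can be sketched as a
    transaction that records inodes rather than individual data buffers.
    This is a hypothetical model: the toy_* names are invented, and a page
    count stands in for the inode's dirty data that must reach disk before
    the transaction's metadata is committed.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INODES 8

struct toy_inode {
	int ino;
	int dirty_pages;
	bool on_list;  /* already attached to the transaction? */
};

struct toy_transaction {
	struct toy_inode *inodes[MAX_INODES];  /* inodes with ordered data */
	int ninodes;
};

/* Record the inode once, instead of walking and tracking each buffer. */
static void toy_add_inode(struct toy_transaction *t, struct toy_inode *in)
{
	if (in->on_list)
		return;
	assert(t->ninodes < MAX_INODES);
	t->inodes[t->ninodes++] = in;
	in->on_list = true;
}

/* At commit, flush each listed inode's data before the metadata.
 * Returns the number of pages written. */
static int toy_commit(struct toy_transaction *t)
{
	int written = 0;

	for (int i = 0; i < t->ninodes; i++) {
		written += t->inodes[i]->dirty_pages;
		t->inodes[i]->dirty_pages = 0;
		t->inodes[i]->on_list = false;
	}
	t->ninodes = 0;
	/* Metadata blocks would be written to the journal here. */
	return written;
}
```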
     

30 Apr, 2008

1 commit