09 Nov, 2012

1 commit

  • ext4_handle_release_buffer() was intended to remove journal
    write access from a buffer, but it doesn't actually do anything
    at all other than add a BUFFER_TRACE point, and it's not reliably
    used for that either. Remove all the associated dead code.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Carlos Maiolino

    Eric Sandeen
     

23 Jul, 2012

2 commits

  • The '__ext4_handle_dirty_metadata()' function does not need the 'now'
    argument anymore, so we can kill it.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Jan Kara

    Artem Bityutskiy
     
  • This patch adds support for quotas as a first-class feature in ext4,
    which is to say, the quota files are stored in hidden inodes as file
    system metadata, instead of as separate files visible in the file system
    directory hierarchy.

    It is based on the proposal at:
    https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4

    This patch introduces a new feature - EXT4_FEATURE_RO_COMPAT_QUOTA
    which, when turned on, enables quota accounting at mount time
    itself. Also, the quota inodes are stored in two additional superblock
    fields. Some changes introduced by this patch that should be pointed
    out are:

    1) Two new ext4-superblock fields - s_usr_quota_inum and
    s_grp_quota_inum for storing the quota inodes in use.
    2) The default quota inodes are inode #3 for tracking user quota and inode #4
    for tracking group quota. The superblock fields can be set to use
    other inodes as well.
    3) If the QUOTA feature and the corresponding quota inodes are set in the
    superblock, quota usage tracking is turned on at mount time. The
    'quotaon' ioctl turns on quota limits enforcement; in this case
    'quotaoff' turns off only the limits enforcement.
    4) When the QUOTA feature is in use, the quota mount options 'quota',
    'usrquota', 'grpquota' are ignored by the kernel.
    5) mke2fs or tune2fs can be used to set the QUOTA feature and initialize
    quota inodes. The default reserved inodes will not be visible to users
    as regular files.
    6) The quota-tools will need to be modified to support hidden quota
    files on ext4. E2fsprogs will also include support for creating and
    fixing quota files.
    7) Support is only for the new V2 quota file format.
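    The mount-time behaviour in points 2) to 4) can be modelled as a small state machine. The following is a minimal userspace C sketch, not kernel code: the struct and function names are hypothetical, and only the feature name, the two superblock field names and the default inode numbers 3 and 4 come from the text above (0x0100 is the value the kernel headers use for EXT4_FEATURE_RO_COMPAT_QUOTA). The legacy path with visible quota files is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature name from the commit; 0x0100 is its value in the kernel headers. */
#define EXT4_FEATURE_RO_COMPAT_QUOTA 0x0100

/* Default quota inodes, per point 2); macro names are hypothetical. */
#define MODEL_USR_QUOTA_INO 3
#define MODEL_GRP_QUOTA_INO 4

/* Hypothetical, simplified view of the relevant superblock fields. */
struct sb_quota_view {
	uint32_t s_feature_ro_compat;
	uint32_t s_usr_quota_inum;   /* 0 means "not set" */
	uint32_t s_grp_quota_inum;
};

/* Hypothetical per-mount quota state. */
struct quota_state {
	bool usage_tracking;   /* usage accounting active */
	bool enforcement;      /* limits enforced */
};

/* What mke2fs/tune2fs would record when enabling the feature. */
void model_enable_quota_feature(struct sb_quota_view *sb)
{
	sb->s_feature_ro_compat |= EXT4_FEATURE_RO_COMPAT_QUOTA;
	sb->s_usr_quota_inum = MODEL_USR_QUOTA_INO;
	sb->s_grp_quota_inum = MODEL_GRP_QUOTA_INO;
}

/* Mount: with the feature and inodes set, tracking starts immediately.
 * Quota mount options are never consulted; the legacy path is not modelled. */
void model_quota_mount(const struct sb_quota_view *sb, struct quota_state *qs)
{
	bool feature = (sb->s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_QUOTA) != 0;

	qs->usage_tracking = feature &&
		(sb->s_usr_quota_inum != 0 || sb->s_grp_quota_inum != 0);
	qs->enforcement = false;
}

/* quotaon: turn on limits enforcement on top of existing tracking. */
void model_quotaon(struct quota_state *qs)
{
	if (qs->usage_tracking)
		qs->enforcement = true;
}

/* quotaoff: only enforcement stops; usage tracking continues. */
void model_quotaoff(struct quota_state *qs)
{
	qs->enforcement = false;
}
```

Separating "tracking" from "enforcement" is the key design point: usage stays accurate across quotaon/quotaoff cycles, so no quotacheck-style rescan is needed to turn limits back on.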

    Tested-by: Jan Kara
    Reviewed-by: Jan Kara
    Reviewed-by: Johann Lombardi
    Signed-off-by: Aditya Kali
    Signed-off-by: "Theodore Ts'o"

    Aditya Kali
     

30 Apr, 2012

1 commit

  • Calculate and verify the superblock checksum. Since the UUID and
    block group number are embedded in each copy of the superblock, we
    need only checksum the entire block. Refactor some of the code to
    eliminate open-coding of the checksum update call.
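    As a rough illustration of checksumming the whole block, the sketch below computes a bitwise CRC32C (Castagnoli) over everything except a 4-byte checksum field at the end of the block and stores the result little-endian. This is a simplified model: the function names are hypothetical, and the seed and finalisation ext4 actually uses may differ from the standard CRC32C shown here. No extra UUID or group-number seed is mixed in because, as the commit notes, those values already live inside the block.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78). */
uint32_t sketch_crc32c(const unsigned char *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= p[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(unsigned char *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/* The checksum covers every byte before the trailing 4-byte field. */
void sb_csum_set(unsigned char *blk, size_t len)
{
	put_le32(blk + len - 4, sketch_crc32c(blk, len - 4));
}

bool sb_csum_verify(const unsigned char *blk, size_t len)
{
	return get_le32(blk + len - 4) == sketch_crc32c(blk, len - 4);
}
```

Having one sb_csum_set() that every writer calls is the "no open-coding" point of the refactor: the checksum is recomputed in a single place just before the block goes out.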

    Signed-off-by: Darrick J. Wong
    Signed-off-by: "Theodore Ts'o"

    Darrick J. Wong
     

21 Feb, 2012

2 commits

  • The per-commit callback was used by mballoc code to manage free space
    bitmaps after deleted blocks have been released. This patch expands
    it to support multiple different callbacks, to allow other things to
    be done after the commit has been completed.
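    One way to support several post-commit callbacks is to keep a list of entries on the transaction and run them all once the commit has finished. The following userspace C sketch uses hypothetical names (struct commit_cb_entry, txn_add_callback() and so on); each entry is embedded in a caller-owned structure and recovered with a container_of() macro, a common kernel idiom that is defined locally here.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical callback entry embedded in a caller-owned struct. */
struct commit_cb_entry {
	struct commit_cb_entry *next;
	void (*func)(struct commit_cb_entry *entry, int error);
};

/* Hypothetical transaction holding callbacks in registration order. */
struct model_txn {
	struct commit_cb_entry *head;
	struct commit_cb_entry **tail;
};

void txn_init(struct model_txn *t)
{
	t->head = NULL;
	t->tail = &t->head;
}

void txn_add_callback(struct model_txn *t, struct commit_cb_entry *e,
		      void (*func)(struct commit_cb_entry *, int))
{
	e->func = func;
	e->next = NULL;
	*t->tail = e;
	t->tail = &e->next;
}

/* Run every callback once the commit is complete, then reset the list. */
void txn_commit_done(struct model_txn *t, int error)
{
	struct commit_cb_entry *e = t->head;

	while (e) {
		struct commit_cb_entry *next = e->next; /* callback may free e */
		e->func(e, error);
		e = next;
	}
	txn_init(t);
}

/* Example user: a freed extent handed back to the allocator after commit. */
struct freed_extent {
	struct commit_cb_entry cb;
	unsigned long start, count;
	unsigned long *released_total;
};

void release_freed_extent(struct commit_cb_entry *e, int error)
{
	struct freed_extent *fe = container_of(e, struct freed_extent, cb);

	if (!error)
		*fe->released_total += fe->count;
}
```

Embedding the list node in the caller's structure means registering a callback needs no extra allocation, and different subsystems can attach their own payloads to the same transaction.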

    Signed-off-by: Bobi Jam
    Signed-off-by: Andreas Dilger
    Signed-off-by: "Theodore Ts'o"

    Bobi Jam
     
  • Ext4 does not support data journalling with delayed allocation enabled.
    We do not even allow the file system to be mounted with both delayed
    allocation and data journalling enabled. However, the flag can be set via
    FS_IOC_SETFLAGS, so we can hit an inode with EXT4_INODE_JOURNAL_DATA set
    even on a file system mounted with delayed allocation (the default), and
    that's where the problem arises. The easiest way to reproduce this problem
    is with the following set of commands:

    mkfs.ext4 /dev/sdd
    mount /dev/sdd /mnt/test1
    dd if=/dev/zero of=/mnt/test1/file bs=1M count=4
    chattr +j /mnt/test1/file
    dd if=/dev/zero of=/mnt/test1/file bs=1M count=4 conv=notrunc
    chattr -j /mnt/test1/file

    Additionally it can be reproduced quite reliably with xfstests 272 and
    269. In fact the above reproducer is a part of test 272.

    To fix this we should ignore the EXT4_INODE_JOURNAL_DATA inode flag if
    the file system is mounted with delayed allocation. This can easily be
    done by fixing the ext4_should_*_data() functions to ignore the data
    journal flag when delalloc is set (suggested by Ted). We also have to set
    the appropriate address space operations for the inode (again, ignoring
    the data journal flag if delalloc is enabled).

    Additionally, this commit introduces the ext4_inode_journal_mode()
    function, because the ext4_should_*_data() functions already shared a lot
    of common code; this change puts it all into one function so it is easier
    to read.
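    The decision that the commit folds into ext4_inode_journal_mode() can be sketched as a single function. This is a simplified userspace model with hypothetical inputs, not the kernel function: it flattens the mount options and inode state into booleans, and the should_*_data()-style helpers become thin comparisons against its one result.

```c
#include <stdbool.h>

enum data_mode { MODE_JOURNAL, MODE_ORDERED, MODE_WRITEBACK };

/* Hypothetical flattened view of the mount options and inode state. */
struct mode_inputs {
	bool has_journal;
	bool is_regular_file;
	bool mount_data_journal;   /* data=journal */
	bool mount_data_ordered;   /* data=ordered (else data=writeback) */
	bool inode_journal_flag;   /* EXT4_INODE_JOURNAL_DATA, e.g. chattr +j */
	bool delalloc;             /* delayed allocation enabled */
};

enum data_mode model_inode_journal_mode(const struct mode_inputs *in)
{
	if (!in->has_journal)
		return MODE_WRITEBACK;
	if (!in->is_regular_file || in->mount_data_journal)
		return MODE_JOURNAL;
	/* The fix: honour the per-inode flag only without delalloc. */
	if (in->inode_journal_flag && !in->delalloc)
		return MODE_JOURNAL;
	return in->mount_data_ordered ? MODE_ORDERED : MODE_WRITEBACK;
}

/* The should_*() helpers reduce to comparisons with one result. */
bool model_should_journal_data(const struct mode_inputs *in)
{
	return model_inode_journal_mode(in) == MODE_JOURNAL;
}

bool model_should_order_data(const struct mode_inputs *in)
{
	return model_inode_journal_mode(in) == MODE_ORDERED;
}

bool model_should_writeback_data(const struct mode_inputs *in)
{
	return model_inode_journal_mode(in) == MODE_WRITEBACK;
}
```

Because all three predicates derive from one function, they can never disagree, which is exactly what went wrong when each helper repeated its own copy of the tests.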

    Successfully tested with xfstests in the following configurations:

    delalloc + data=ordered
    delalloc + data=writeback
    data=journal
    nodelalloc + data=ordered
    nodelalloc + data=writeback
    nodelalloc + data=journal

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Lukas Czerner
     

13 Aug, 2011

1 commit

  • ext4_should_writeback_data() had an incorrect sequence of
    tests to determine if it should return 0 or 1: in
    particular, even in no-journal mode, 0 was being returned
    for a non-regular-file inode.

    This meant that, in non-journal mode, we would use
    ext4_journalled_aops for directories, symlinks, and other
    non-regular files. However, calling journalled aop
    callbacks when there is no valid handle can cause problems.

    This would cause a kernel crash with Jan Kara's commit
    2d859db3e4 ("ext4: fix data corruption in inodes with
    journalled data"), because we now dereference 'handle' in
    ext4_journalled_write_end().

    I also added BUG_ONs to check for a valid handle in the
    obviously journal-only aops callbacks.
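    The ordering bug and the added handle check can be shown side by side. In this hypothetical userspace sketch, should_writeback_buggy() performs the file-type test first, as described above, while should_writeback_fixed() checks for the journal first; assert() stands in for the kernel's BUG_ON(), and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs for the writeback decision. */
struct wb_inputs {
	bool has_journal;
	bool is_regular_file;
	bool data_writeback;   /* data=writeback mount option */
};

/* Buggy order: the file-type test fires before the no-journal test. */
bool should_writeback_buggy(const struct wb_inputs *in)
{
	if (!in->is_regular_file)
		return false;
	if (!in->has_journal)
		return true;
	return in->data_writeback;
}

/* Fixed order: no journal means writeback for every inode type. */
bool should_writeback_fixed(const struct wb_inputs *in)
{
	if (!in->has_journal)
		return true;
	if (!in->is_regular_file)
		return false;
	return in->data_writeback;
}

/* Hypothetical journal handle; NULL means no valid handle. */
struct model_handle { int credits; };

/* A journal-only callback checks its handle up front, like the added
 * BUG_ON()s, instead of crashing on a NULL dereference later.
 * (assert() is compiled out under NDEBUG, unlike BUG_ON().) */
int journalled_write_end(struct model_handle *handle, int copied)
{
	assert(handle != NULL);
	handle->credits--;
	return copied;
}
```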

    I tested this running xfstests with a scratch device in
    these modes:

    - no-journal
    - data=ordered
    - data=writeback
    - data=journal

    All work fine; the data=journal run has many failures and a
    crash in xfstests 074, but this is no different from a
    vanilla kernel.

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Curt Wohlgemuth
     

09 May, 2011

1 commit

  • The block allocation code used to use jbd2_journal_get_undo_access as
    a way to make changes that wouldn't show up until the commit took
    place. The new multi-block allocation code has its own way of
    preventing newly freed blocks from getting reused until the commit
    takes place (it avoids updating the buddy bitmaps until the commit is
    done), so we don't need to use jbd2_journal_get_undo_access(), which
    has extra overhead compared to jbd2_journal_get_write_access().

    There was one last vestigial use of ext4_journal_get_undo_access() in
    ext4_add_groupblocks(); change it to use ext4_journal_get_write_access()
    and then remove the ext4_journal_get_undo_access() support.
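    The idea of keeping newly freed blocks out of circulation until the commit, without undo access, can be illustrated with a toy allocator. This is a hypothetical sketch, not mballoc: freed blocks go onto a pending set and become allocatable again only when model_commit_done() runs, which plays the role of the deferred buddy-bitmap update.

```c
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 64

/* Hypothetical allocator: free[] is what allocation may hand out,
 * pending[] holds blocks freed by the still-running transaction. */
struct model_alloc {
	bool free[NBLOCKS];
	bool pending[NBLOCKS];
};

void model_alloc_init(struct model_alloc *a)
{
	memset(a->pending, 0, sizeof a->pending);
	for (int i = 0; i < NBLOCKS; i++)
		a->free[i] = true;
}

/* First-fit allocation over the blocks currently allowed to be reused. */
int model_alloc_block(struct model_alloc *a)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (a->free[i]) {
			a->free[i] = false;
			return i;
		}
	}
	return -1;
}

/* Freeing only records the block; it is not reusable until commit. */
void model_free_block(struct model_alloc *a, int blk)
{
	a->pending[blk] = true;
}

/* After the commit: publish the pending frees to the allocatable map. */
void model_commit_done(struct model_alloc *a)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (a->pending[i]) {
			a->pending[i] = false;
			a->free[i] = true;
		}
	}
}
```

Since the allocator itself never hands out a pending block, the journal no longer needs to keep a committed copy of the bitmap for comparison, which is the overhead jbd2_journal_get_undo_access() carried.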

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

05 Apr, 2011

1 commit

  • It is not necessary to update [cm]time of quota file on each quota
    file write and it wastes journal space and IO throughput with inode
    writes. So just remove the updating from ext4_quota_write() and only
    update times when quotas are being turned off. Userspace cannot get
    anything reliable from quota files while they are used by the kernel
    anyway.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

21 Mar, 2011

1 commit

  • There are two wrapper functions which do exactly the same thing:
    ext4_journal_release_buffer(), and ext4_handle_release_buffer(). In
    addition, ext4_xattr_block_set() calls jbd2_journal_release_buffer()
    directly.

    Unify all of the code to use ext4_handle_release_buffer(), and get rid
    of ext4_journal_release_buffer().

    Signed-off-by: Amir Goldstein
    Signed-off-by: "Theodore Ts'o"

    Amir Goldstein
     

11 Jan, 2011

1 commit


27 Jul, 2010

1 commit


30 Jun, 2010

1 commit


15 Jun, 2010

1 commit


12 Jun, 2010

1 commit

  • We don't need to set s_dirt in most of the ext4 code when journaling
    is enabled. In ext3/4 some of the summary statistics for the number of
    free inodes, blocks, and directories are calculated from the per-block-group
    statistics when the file system is mounted or unmounted. As a
    result the superblock doesn't have to be updated, either via the
    journal or by setting s_dirt. There are a few exceptions, most
    notably when resizing the file system, where the superblock needs to
    be modified --- and in that case it should be done as a journalled
    operation if possible, and s_dirt set only in no-journal mode.

    This patch will optimize out some unneeded disk writes when using ext4
    with a journal.
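    Deriving the summary counters from the per-group statistics amounts to a sum over the group descriptors at mount or unmount time. A minimal sketch, assuming a hypothetical descriptor struct with just three counters (the real on-disk descriptor has more fields):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group descriptor counters. */
struct model_group_desc {
	uint32_t free_blocks;
	uint32_t free_inodes;
	uint32_t used_dirs;
};

/* Hypothetical filesystem-wide summary. */
struct model_summary {
	uint64_t free_blocks;
	uint64_t free_inodes;
	uint64_t dirs;
};

/* Recompute the totals from the groups, so the superblock copies never
 * need to be journalled or marked dirty on every allocation. */
struct model_summary summarize_groups(const struct model_group_desc *gd,
				      size_t ngroups)
{
	struct model_summary s = { 0, 0, 0 };

	for (size_t i = 0; i < ngroups; i++) {
		s.free_blocks += gd[i].free_blocks;
		s.free_inodes += gd[i].free_inodes;
		s.dirs += gd[i].used_dirs;
	}
	return s;
}
```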

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

17 May, 2010

1 commit


05 Mar, 2010

1 commit

  • Allocate an uninitialized extent before the ext4 buffer write and
    convert the extent to initialized after I/O completes.
    The purpose is to make sure an extent can only be marked
    initialized after it has been written with new data so
    we can safely drop the i_mutex lock in ext4 DIO read without
    exposing stale data. This helps to improve multi-thread DIO
    read performance on high-speed disks.

    Skip the nobh and data=journal mount cases to make things simple for now.
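    The invariant (an extent is marked initialized only after its new data has been written) can be modelled with a single-block extent. In this hypothetical sketch, readers see zeros while the extent is uninitialized, so they are exposed to neither stale bytes nor an in-flight write; the block size and function names are illustrative only.

```c
#include <stdbool.h>
#include <string.h>

#define BLK 8   /* hypothetical tiny block size */

/* Hypothetical one-block extent; disk[] may still hold a previous
 * owner's data until the new write lands. */
struct model_extent {
	bool uninit;
	unsigned char disk[BLK];
};

/* Allocation for the write: the extent starts out uninitialized. */
void dio_alloc(struct model_extent *e, const unsigned char *leftover)
{
	memcpy(e->disk, leftover, BLK);
	e->uninit = true;
}

/* Readers (not holding i_mutex) get zeros for an uninitialized extent. */
void dio_read(const struct model_extent *e, unsigned char *out)
{
	if (e->uninit)
		memset(out, 0, BLK);
	else
		memcpy(out, e->disk, BLK);
}

/* The write is submitted; the extent stays uninitialized meanwhile. */
void dio_write_submit(struct model_extent *e, const unsigned char *data)
{
	memcpy(e->disk, data, BLK);
}

/* I/O completion: only now is the extent converted to initialized. */
void dio_io_complete(struct model_extent *e)
{
	e->uninit = false;
}
```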

    Signed-off-by: Jiaying Zhang
    Signed-off-by: "Theodore Ts'o"

    Jiaying Zhang
     

09 Dec, 2009

2 commits

  • We cannot rely on buffer dirty bits during fsync because pdflush can come
    before fsync is called and clear dirty bits without forcing a transaction
    commit. Instead, we track which transaction last changed the inode and
    which transaction last changed its allocation, and force that transaction
    to disk on fsync.
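    Tracking the last relevant transaction per inode lets fsync decide what to commit without looking at buffer dirty bits. This userspace sketch assumes hypothetical fields sync_tid and datasync_tid, treats the allocation transaction as the one fdatasync needs, and compares transaction IDs with a plain <=, ignoring the wraparound handling a real journal requires.

```c
#include <stdbool.h>

typedef unsigned int model_tid_t;

/* Hypothetical journal: transactions get increasing IDs. */
struct model_journal {
	model_tid_t running;     /* currently running transaction */
	model_tid_t committed;   /* highest ID known to be on disk */
	int commits;             /* number of forced commits */
};

/* Hypothetical per-inode record of the last relevant transactions. */
struct model_inode {
	model_tid_t sync_tid;      /* last change of any kind */
	model_tid_t datasync_tid;  /* last change to the allocation */
};

/* Record a change made in the running transaction. */
void inode_changed(const struct model_journal *j, struct model_inode *i,
		   bool changes_allocation)
{
	i->sync_tid = j->running;
	if (changes_allocation)
		i->datasync_tid = j->running;
}

/* Commit the running transaction unless tid is already on disk. */
static void commit_up_to(struct model_journal *j, model_tid_t tid)
{
	if (tid <= j->committed)
		return;
	j->committed = j->running;
	j->running++;
	j->commits++;
}

/* fsync never looks at buffer dirty bits (writeback may have cleaned
 * them without a commit); it forces the recorded transaction instead. */
void model_fsync(struct model_journal *j, const struct model_inode *i,
		 bool datasync)
{
	commit_up_to(j, datasync ? i->datasync_tid : i->sync_tid);
}
```

A second fsync with no intervening change costs nothing, and an fdatasync after a timestamp-only change also skips the commit.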

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • Currently all quota block reservation macros contain a hard-coded "2"
    (aka the MAXQUOTAS value). This is no good because in some places it is
    not obvious what this digit represents. Let's introduce a new macro with
    a self-descriptive name.

    Signed-off-by: Dmitry Monakhov
    Acked-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     

25 Nov, 2009

1 commit


23 Nov, 2009

2 commits

  • Convert the last two callers of ext4_journal_forget() to use
    ext4_forget() instead, and then fold ext4_journal_forget() into
    ext4_forget(). This reduces our code complexity and shortens our call
    stack.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • The ext4_forget() function better belongs in ext4_jbd2.c. This will
    allow us to do some cleanup of the ext4_journal_revoke() and
    ext4_journal_forget() functions, as well as giving us better error
    reporting since we can report the caller of ext4_forget() when things
    go wrong.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

29 Sep, 2009

1 commit


13 Jul, 2009

1 commit

  • We found a problem with buffer head reference leaks when using an ext4
    partition without a journal. In particular, calls to ext4_forget() would
    not do a brelse() on the input buffer head, which prevents the pages the
    buffers belong to from being reclaimed.

    Further investigation showed that all places where ext4_journal_forget() and
    ext4_journal_revoke() are called are subject to the same problem. This patch
    changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit
    release of the buffer head when the journal handle isn't valid.
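    The reference-count leak and its fix can be sketched as follows. This is a hypothetical model, not the ext4 code: a NULL handle stands for "no valid journal handle", and the journal path is assumed, as a simplification, to consume the caller's buffer reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer head reduced to its reference count. */
struct model_bh { int b_count; };

/* Hypothetical handle; NULL means "no valid journal handle". */
struct model_handle { int credits; };

void model_brelse(struct model_bh *bh)
{
	assert(bh->b_count > 0);
	bh->b_count--;
}

/* Stand-in for the journal's forget, assumed to drop the caller's ref. */
static void journal_forget(struct model_handle *h, struct model_bh *bh)
{
	(void)h;
	model_brelse(bh);
}

/* Before the fix: without a journal the reference was never dropped. */
void forget_buggy(struct model_handle *h, struct model_bh *bh)
{
	if (h)
		journal_forget(h, bh);
}

/* After the fix: release the buffer explicitly when there is no handle. */
void forget_fixed(struct model_handle *h, struct model_bh *bh)
{
	if (h)
		journal_forget(h, bh);
	else
		model_brelse(bh);
}
```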

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"

    Curt Wohlgemuth
     

09 Jul, 2009

1 commit

  • If there is no journal, ext4_should_writeback_data() should return
    TRUE. This will fix ext4_set_aops() to set ext4_da_ops in the case of
    delayed allocation; otherwise ext4_journaled_aops gets used by
    default, which doesn't handle delayed allocation properly.

    The advantage of using ext4_should_writeback_data() approach is that
    it should handle nobh better as well.

    Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh
    Kumar for suggesting this approach.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

07 Jan, 2009

2 commits

  • This mount option is largely superfluous, and in fact the way it was
    implemented was buggy; if a filesystem which did not have the extents
    feature flag was mounted -o extents, the filesystem would attempt to
    create and use extent-based files even though the extents feature flag
    was not enabled. The simplest thing to do is to nuke the mount option
    entirely. It's not all that useful to force the non-creation of new
    extent-based files if the filesystem can support it.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • A few weeks ago I posted a patch for discussion that allowed ext4 to run
    without a journal. Since that time I've integrated the excellent
    comments from Andreas and fixed several serious bugs. We're currently
    running with this patch and generating some performance numbers against
    both ext2 (with backported reservations code) and ext4 with and without
    a journal. It just so happens that running without a journal is
    slightly faster for most everything.

    We ran
    iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2

    which creates 4 threads, each of which creates a 2G file and does reads
    and writes on it with a buffer size of 256K, using O_DIRECT for all file
    opens to bypass the page cache. Results:

                      ext2        ext4, default   ext4, no journal
    initial writes    13.0 MB/s   15.4 MB/s       15.7 MB/s
    rewrites          13.1 MB/s   15.6 MB/s       15.9 MB/s
    reads             15.2 MB/s   16.9 MB/s       17.2 MB/s
    re-reads          15.3 MB/s   16.9 MB/s       17.2 MB/s
    random readers     5.6 MB/s    5.6 MB/s        5.7 MB/s
    random writers     5.1 MB/s    5.3 MB/s        5.4 MB/s

    So it seems that, so far, this was a useful exercise.

    Signed-off-by: Frank Mayhar
    Signed-off-by: "Theodore Ts'o"

    Frank Mayhar
     

20 Aug, 2008

1 commit

  • When considering how many journal credits are needed for modifying a
    chunk of data, we need to account for the super block, inode block,
    quota blocks and xattr block, indirect/index blocks, also, group bitmap
    and group descriptor blocks for new allocation (including data and
    indirect/index blocks). There are many places in ext4 that do the
    calculation on their own and often miss one or two metadata blocks; they
    also often assume single-block allocation and do not consider the case of
    allocating multiple chunks.

    This patch tries to clean up the current journal credit code and provides
    some common helper functions to calculate the journal credits, to be used
    for writepage, writepages, DIO, fallocate, migration, defrag, and for
    both nonextent and extent files.
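    A common helper of the kind described might look like the sketch below. It is a hypothetical simplification, not the ext4 helper: given the number of data blocks, the number of index/indirect blocks they need, and whether the allocation is one contiguous chunk, it counts bitmap and group-descriptor blocks (capped by the filesystem geometry) plus a fixed allowance for the superblock, inode, xattr and quota blocks.

```c
/* Hypothetical geometry; values in the usage example are illustrative. */
struct credit_geom {
	int ngroups;      /* block groups in the filesystem */
	int gdb_count;    /* blocks holding group descriptors */
	int fixed_meta;   /* superblock + inode + xattr + quota blocks */
};

/*
 * Credits for writing nrblocks data blocks that need idxblocks
 * index/indirect blocks. With chunk != 0 the data is assumed to be one
 * contiguous allocation (one group); otherwise every data block may
 * land in a different group.
 */
int model_meta_trans_blocks(const struct credit_geom *g, int nrblocks,
			    int idxblocks, int chunk)
{
	int groups = idxblocks + (chunk ? 1 : nrblocks); /* bitmaps touched */
	int gdpblocks = groups;                          /* descriptors touched */

	if (groups > g->ngroups)
		groups = g->ngroups;
	if (gdpblocks > g->gdb_count)
		gdpblocks = g->gdb_count;
	return idxblocks + groups + gdpblocks + g->fixed_meta;
}
```

The caps encode that a transaction cannot dirty more bitmap blocks than there are groups, or more descriptor blocks than the filesystem has; the chunk flag is what distinguishes a contiguous multi-block allocation from the scattered worst case.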

    This patch modifies the writepage/write_begin credit calculation for
    nonextent files to use the new helper function. It also fixed the
    problem that writepage on nonextent files did not consider the case
    blocksize
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     

14 Jul, 2008

1 commit


12 Jul, 2008

1 commit

  • This patch makes ext4 use inode-based implementation of data=ordered mode
    in JBD2. It allows us to unify some data=ordered and data=writeback paths
    (especially writepage since we don't have to start a transaction anymore)
    and remove some buffer walking.
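    With the inode-based approach, the transaction remembers which inodes have ordered data and, at commit, writes out their dirty data before the metadata, so no per-buffer list needs to be walked and writepage need not start a transaction. The sketch below is a hypothetical userspace model of that bookkeeping; the structure names and the fixed-size inode array are illustrative assumptions.

```c
#include <stdbool.h>

#define MAX_INODES 8

/* Hypothetical inode carrying only what the ordering logic needs. */
struct model_inode {
	int ino;
	int dirty_pages;
	bool on_txn;   /* already tracked by the running transaction */
};

/* Hypothetical transaction tracking inodes rather than data buffers. */
struct model_txn {
	struct model_inode *inodes[MAX_INODES];
	int n;
	int pages_flushed;
};

/* Called when a write to the inode becomes part of this transaction. */
bool txn_add_inode(struct model_txn *t, struct model_inode *i)
{
	if (i->on_txn)
		return true;
	if (t->n == MAX_INODES)
		return false;
	t->inodes[t->n++] = i;
	i->on_txn = true;
	return true;
}

/* At commit: flush each tracked inode's dirty data; writing the
 * metadata blocks afterwards is not modelled here. */
void txn_commit(struct model_txn *t)
{
	for (int k = 0; k < t->n; k++) {
		t->pages_flushed += t->inodes[k]->dirty_pages;
		t->inodes[k]->dirty_pages = 0;
		t->inodes[k]->on_txn = false;
	}
	t->n = 0;
}
```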

    Updated fix from Aneesh Kumar K.V
    to fix file system hang due to corrupt jinode values.

    Signed-off-by: Jan Kara
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

30 Apr, 2008

1 commit