28 Jul, 2014

1 commit


13 May, 2014

1 commit


29 Aug, 2013

1 commit


17 Aug, 2013

1 commit

  • When we read in an extent tree leaf block from disk, arrange to have
    all of its entries cached. In nearly all cases the in-memory
    representation will be more compact than the on-disk representation in
    the buffer cache, and it allows us to get the information without
    having to traverse the extent tree for successive extents.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu

    Theodore Ts'o
     

11 Apr, 2013

2 commits

  • With bigalloc feature enabled we do not support indirect addressing at all
    so we have to prevent extent addressing to indirect addressing
    conversion in this case. The problem has been introduced with the commit
    "ext4: support simple conversion of extent-mapped inodes to use i_blocks"

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
     
  • Move ext4_ind_migrate() into migrate.c file since it makes much more
    sense and ext4_ext_migrate() is there as well.

    Also fix tiny style problem - add spaces around "=" in "i=0".

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
     

10 Feb, 2013

1 commit


09 Feb, 2013

1 commit

  • So we can better understand what bits of ext4 are responsible for
    long-running jbd2 handles, use jbd2__journal_start() so we can pass
    context information for logging purposes.

    The recommended way for finding the longer-running handles is:

    T=/sys/kernel/debug/tracing
    EVENT=$T/events/jbd2/jbd2_handle_stats
    echo "interval > 5" > $EVENT/filter
    echo 1 > $EVENT/enable

    ./run-my-fs-benchmark

    cat $T/trace > /tmp/problem-handles

    This will list handles that were active for longer than 20ms. Having
    longer-running handles is bad, because a commit started at the wrong
    time could stall for those 20+ milliseconds, which could delay an
    fsync() or an O_SYNC operation. Here is an example line from the
    trace file describing a handle which lived on for 311 jiffies, or over
    1.2 seconds:

    postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
    tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
    dirtied_blocks 0

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

29 Nov, 2012

1 commit

  • Previously, ext4_extents.h was being included at the end of ext4.h,
    which was bad for a number of reasons: (a) it was not being included
    in the expected place, and (b) it caused the header to be included
    multiple times. There were #ifdef's to prevent this from causing any
    problems, but it still was unnecessary.

    By moving the function declarations that were in ext4_extents.h to
    ext4.h, which is standard practice for where the function declarations
    for the rest of ext4.h can be found, we can remove ext4_extents.h from
    being included in ext4.h at all, and then we can only include
    ext4_extents.h where it is needed in ext4's source files.

    It should be possible to move a few more things into ext4.h, and
    further reduce the number of source files that need to #include
    ext4_extents.h, but that's a cleanup for another day.

    Reported-by: Sachin Kamat
    Reported-by: Wei Yongjun
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

16 May, 2012

1 commit


21 Feb, 2012

1 commit


09 Jan, 2012

1 commit

  • Delete any instances of include module.h that were not strictly
    required. In the case of ext2, the declaration of MODULE_LICENSE
    etc. were in inode.c but the module_init/exit were in super.c, so
    relocate the MODULE_LICENCE/AUTHOR block to super.c which makes it
    consistent with ext3 and ext4 at the same time.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Jan Kara

    Paul Gortmaker
     

03 Nov, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue:
    vfs: add d_prune dentry operation
    vfs: protect i_nlink
    filesystems: add set_nlink()
    filesystems: add missing nlink wrappers
    logfs: remove unnecessary nlink setting
    ocfs2: remove unnecessary nlink setting
    jfs: remove unnecessary nlink setting
    hypfs: remove unnecessary nlink setting
    vfs: ignore error on forced remount
    readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
    vfs: fix dentry leak in simple_fill_super()

    Linus Torvalds
     

02 Nov, 2011

1 commit


29 Oct, 2011

2 commits

  • The tmp_inode should have same uid/gid as the original inode.
    Otherwise new metadata blocks will be accounted to wrong quota-id,
    which will result in a quota leak after the inode migration is
    completed.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • This patch cleanup code a bit, actual logic not changed
    - Move current block pointer to migrate_structure, let's all
    walk info will be in one structure.
    - Get rid of usless null ind-block ptr checks, caller already
    does that check.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     

10 Sep, 2011

1 commit


03 May, 2011

1 commit


31 Mar, 2011

1 commit


22 Feb, 2011

1 commit


11 Jan, 2011

1 commit


28 Oct, 2010

1 commit


14 Jun, 2010

1 commit


17 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

02 Mar, 2010

1 commit

  • Set i_nlink to zero for temporary inode from very beginning.
    otherwise we may fail to start new journal handle and this
    inode will be unreferenced but with i_nlink == 1
    Since we hold inode reference it can not be pruned.

    Also add missed journal_start retval check.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     

25 Jan, 2010

1 commit

  • At several places we modify EXT4_I(inode)->i_state without holding
    i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
    ext4_do_update_inode, ...). These modifications are racy and we can
    lose updates to i_state. So convert handling of i_state to use bitops
    which are atomic.

    Cc: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

09 Dec, 2009

1 commit

  • Currently all quota block reservation macros contains hard-coded "2"
    aka MAXQUOTAS value. This is no good because in some places it is not
    obvious to understand what does this digit represent. Let's introduce
    new macro with self descriptive name.

    Signed-off-by: Dmitry Monakhov
    Acked-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     

23 Nov, 2009

1 commit

  • Add the facility for ext4_forget() to be called from
    ext4_free_blocks(). This simplifies the code in a large number of
    places, and centralizes most of the work of calling ext4_forget() into
    a single place.

    Also fix a bug in the extents migration code; it wasn't calling
    ext4_forget() when releasing the indirect blocks during the
    conversion. As a result, if the system cashed during or shortly after
    the extents migration, and the released indirect blocks get reused as
    data blocks, the journal replay would corrupt the data blocks. With
    this new patch, fixing this bug was as simple as adding the
    EXT4_FREE_BLOCKS_FORGET flags to the call to ext4_free_blocks().

    Signed-off-by: "Theodore Ts'o"
    Cc: "Aneesh Kumar K.V"

    Theodore Ts'o
     

29 Sep, 2009

1 commit

  • When writing into an unitialized extent via direct I/O, and the direct
    I/O doesn't exactly cover the unitialized extent, split the extent
    into uninitialized and initialized extents before submitting the I/O.
    This avoids needing to deal with an ENOSPC error in the end_io
    callback that gets used for direct I/O.

    When the IO is complete, the written extent will be marked as initialized.

    Singed-Off-By: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     

17 Sep, 2009

1 commit

  • EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag,
    and the hex value assigned to it collides with FS_DIRECTIO_FL (which
    is also stored in i_flags). There's no reason for the
    EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use
    i_state instead.

    Cc: "Aneesh Kumar K.V"
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

26 Aug, 2009

1 commit

  • We need to unlock the new inode before iput. This patch fixes the
    following warning when calling chattr +e to migrate a file to use
    extents. It also fixes problems in when e4defrag attempts to
    defragment an inode.

    [ 470.400044] ------------[ cut here ]------------
    [ 470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a()
    [ 470.400072] Hardware name: N/A
    .....
    ...
    [ 470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4
    [ 470.400359] Call Trace:
    [ 470.400372] [] warn_slowpath_common+0x77/0x8f
    [ 470.400385] [] warn_slowpath_null+0xf/0x11
    [ 470.400395] [] generic_delete_inode+0x65/0x16a
    [ 470.400405] [] generic_drop_inode+0x17/0x1bd
    [ 470.400413] [] iput+0x61/0x65
    [ 470.400455] [] ext4_ext_migrate+0x5eb/0x66a [ext4]
    [ 470.400492] [] ext4_ioctl+0x340/0x756 [ext4]
    [ 470.400507] [] vfs_ioctl+0x1d/0x82
    [ 470.400517] [] do_vfs_ioctl+0x483/0x4c9
    [ 470.400527] [] ? trace_hardirqs_on+0xd/0xf
    [ 470.400537] [] sys_ioctl+0x51/0x74
    [ 470.400549] [] system_call_fastpath+0x16/0x1b
    [ 470.400557] ---[ end trace ab85723542352dac ]---

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     

13 Jun, 2009

2 commits

  • Enhance the inode allocator to take a goal inode number as a
    paremeter; if it is specified, it takes precedence over Orlov or
    parent directory inode allocation algorithms.

    The extents migration function uses the goal inode number so that the
    extent trees allocated the migration function use the correct flex_bg.
    In the future, the goal inode functionality will also be used to
    allocate an adjacent inode for the extended attributes.

    Also, for testing purposes the goal inode number can be specified via
    /sys/fs/{dev}/inode_goal. This can be useful for testing inode
    allocation beyond 2^32 blocks on very large filesystems.

    Signed-off-by: Andreas Dilger
    Signed-off-by: "Theodore Ts'o"

    Andreas Dilger
     
  • Instead of using a random number to determine the goal parent grop for
    the Orlov top directories, use a hash of the directory name. This
    allows for repeatable results when trying to benchmark filesystem
    layout algorithms.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

16 Feb, 2009

1 commit


07 Jan, 2009

2 commits

  • This mount option is largely superfluous, and in fact the way it was
    implemented was buggy; if a filesystem which did not have the extents
    feature flag was mounted -o extents, the filesystem would attempt to
    create and use extents-based file even though the extents feature flag
    was not eabled. The simplest thing to do is to nuke the mount option
    entirely. It's not all that useful to force the non-creation of new
    extent-based files if the filesystem can support it.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • A few weeks ago I posted a patch for discussion that allowed ext4 to run
    without a journal. Since that time I've integrated the excellent
    comments from Andreas and fixed several serious bugs. We're currently
    running with this patch and generating some performance numbers against
    both ext2 (with backported reservations code) and ext4 with and without
    a journal. It just so happens that running without a journal is
    slightly faster for most everything.

    We did
    iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2

    which creates 4 threads, each of which create and do reads and writes on
    a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
    to bypass the page cache. Results:

    ext2 ext4, default ext4, no journal
    initial writes 13.0 MB/s 15.4 MB/s 15.7 MB/s
    rewrites 13.1 MB/s 15.6 MB/s 15.9 MB/s
    reads 15.2 MB/s 16.9 MB/s 17.2 MB/s
    re-reads 15.3 MB/s 16.9 MB/s 17.2 MB/s
    random readers 5.6 MB/s 5.6 MB/s 5.7 MB/s
    random writers 5.1 MB/s 5.3 MB/s 5.4 MB/s

    So it seems that, so far, this was a useful exercise.

    Signed-off-by: Frank Mayhar
    Signed-off-by: "Theodore Ts'o"

    Frank Mayhar
     

14 Sep, 2008

1 commit


20 Aug, 2008

1 commit

  • This patch modified the writepage/write_begin credit calculation for
    extent files, to use the credits caculation helper function.

    The current calculation of how many index/leaf blocks should be
    accounted is too conservetive, it always considered the worse case,
    where the tree level is 5, and in the case of multiple chunk
    allocations, it always assumed no blocks were dirtied in common across
    the allocations. This path uses the accurate depth of the inode with
    some extras to calculate the index blocks, and also less conservative in
    the case of multiple allocation accounting.

    Signed-off-by: Mingming Cao
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     

30 Apr, 2008

1 commit