03 Nov, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue:
    vfs: add d_prune dentry operation
    vfs: protect i_nlink
    filesystems: add set_nlink()
    filesystems: add missing nlink wrappers
    logfs: remove unnecessary nlink setting
    ocfs2: remove unnecessary nlink setting
    jfs: remove unnecessary nlink setting
    hypfs: remove unnecessary nlink setting
    vfs: ignore error on forced remount
    readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
    vfs: fix dentry leak in simple_fill_super()

    Linus Torvalds
     

02 Nov, 2011

1 commit


29 Oct, 2011

2 commits

  • The tmp_inode should have same uid/gid as the original inode.
    Otherwise new metadata blocks will be accounted to wrong quota-id,
    which will result in a quota leak after the inode migration is
    completed.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • This patch cleans up the code a bit; the actual logic is not changed.
    - Move the current block pointer to migrate_structure, so that all
    walk info is kept in one structure.
    - Get rid of useless null ind-block pointer checks; the caller already
    does that check.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     

10 Sep, 2011

1 commit


03 May, 2011

1 commit


31 Mar, 2011

1 commit


22 Feb, 2011

1 commit


11 Jan, 2011

1 commit


28 Oct, 2010

1 commit


14 Jun, 2010

1 commit


17 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
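
The first rule of the script described above (gfp.h if only gfp is used, slab.h if slab is used) can be sketched in C as follows. This is only an illustration, not the slabh-sweep.py script itself: the token lists, the `demo_` names, and the plain substring matching are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Which header a source file needs, following the rule in the commit text. */
enum demo_need { DEMO_NEED_NONE, DEMO_NEED_GFP, DEMO_NEED_SLAB };

/* Return 1 if any of the n tokens occurs anywhere in src. */
static inline int demo_contains_any(const char *src,
                                    const char *const *tokens, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strstr(src, tokens[i]) != NULL)
            return 1;
    return 0;
}

/*
 * Assumed token lists: a few well-known slab and gfp identifiers.
 * The real script may match a different, larger set.
 */
static inline enum demo_need demo_needed_include(const char *src)
{
    static const char *const slab_tokens[] = { "kmalloc(", "kzalloc(", "kfree(" };
    static const char *const gfp_tokens[] = { "GFP_KERNEL", "GFP_ATOMIC" };

    assert(src != NULL);

    /* Slab usage wins: slab.h is the header to add when slab is used. */
    if (demo_contains_any(src, slab_tokens,
                          sizeof slab_tokens / sizeof slab_tokens[0]))
        return DEMO_NEED_SLAB;

    /* Only gfp facilities are used: gfp.h is enough. */
    if (demo_contains_any(src, gfp_tokens,
                          sizeof gfp_tokens / sizeof gfp_tokens[0]))
        return DEMO_NEED_GFP;

    return DEMO_NEED_NONE;
}
```

A file that calls `kmalloc()` with `GFP_KERNEL` is classified as needing slab.h, while a file that only passes `GFP_KERNEL` to some other allocator is classified as needing gfp.h.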
     

02 Mar, 2010

1 commit

  • Set i_nlink to zero for the temporary inode from the very beginning.
    Otherwise we may fail to start a new journal handle and this
    inode will be unreferenced but with i_nlink == 1.
    Since we hold an inode reference, it cannot be pruned.

    Also add missed journal_start retval check.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     

25 Jan, 2010

1 commit

  • At several places we modify EXT4_I(inode)->i_state without holding
    i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
    ext4_do_update_inode, ...). These modifications are racy and we can
    lose updates to i_state. So convert handling of i_state to use bitops
    which are atomic.

    Cc: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
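
The race described above comes from plain read-modify-write updates of a shared word. A minimal userspace sketch of the atomic alternative, using C11 atomics rather than the kernel's bitops, might look like this. The bit names, the struct, and the `demo_` helpers are hypothetical and are not the ext4 definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state bit numbers, for illustration only. */
enum { DEMO_STATE_NEW = 0, DEMO_STATE_XATTR = 1, DEMO_STATE_JDATA = 2 };

struct demo_bitops_inode {
    atomic_ulong state;   /* stands in for the i_state word */
};

/* Atomically set one bit; concurrent updates to other bits are not lost. */
static inline void demo_set_state(struct demo_bitops_inode *in, int bit)
{
    assert(bit >= 0 && bit < 32);
    atomic_fetch_or(&in->state, 1UL << bit);
}

/* Atomically clear one bit. */
static inline void demo_clear_state(struct demo_bitops_inode *in, int bit)
{
    assert(bit >= 0 && bit < 32);
    atomic_fetch_and(&in->state, ~(1UL << bit));
}

/* Read the current value of one bit. */
static inline bool demo_test_state(struct demo_bitops_inode *in, int bit)
{
    assert(bit >= 0 && bit < 32);
    return ((atomic_load(&in->state) >> bit) & 1UL) != 0;
}
```

With a non-atomic `state |= flag`, two threads can both read the old word and one update overwrites the other; the atomic fetch-or and fetch-and operations make each bit update indivisible, which is the property the commit relies on.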
     

09 Dec, 2009

1 commit

  • Currently all quota block reservation macros contain a hard-coded "2",
    aka the MAXQUOTAS value. This is no good because in some places it is
    not obvious what this digit represents. Let's introduce a new macro
    with a self-descriptive name.

    Signed-off-by: Dmitry Monakhov
    Acked-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
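
The kind of change described above can be illustrated with a small before-and-after sketch. The macro names and the per-quota block count below are hypothetical and chosen only for this example; only the value 2 for the number of quota types comes from the commit text.

```c
#include <assert.h>

/* Number of quota types; the commit text gives this value as 2. */
#define DEMO_MAXQUOTAS 2

/* Hypothetical per-quota-type cost in blocks, for illustration only. */
#define DEMO_BLOCKS_PER_QUOTA 3

/* Before: the bare 2 gives no hint that it counts quota types. */
#define DEMO_QUOTA_TRANS_BLOCKS_OLD (2 * DEMO_BLOCKS_PER_QUOTA)

/* After: the same value, but the factor is named. */
#define DEMO_QUOTA_TRANS_BLOCKS_NEW (DEMO_MAXQUOTAS * DEMO_BLOCKS_PER_QUOTA)

/* The rename must not change the computed reservation. */
static_assert(DEMO_QUOTA_TRANS_BLOCKS_OLD == DEMO_QUOTA_TRANS_BLOCKS_NEW,
              "named macro must evaluate to the same reservation");
```

The computed value is unchanged; the benefit is that a reader no longer has to guess what the factor means.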
     

23 Nov, 2009

1 commit

  • Add the facility for ext4_forget() to be called from
    ext4_free_blocks(). This simplifies the code in a large number of
    places, and centralizes most of the work of calling ext4_forget() into
    a single place.

    Also fix a bug in the extents migration code; it wasn't calling
    ext4_forget() when releasing the indirect blocks during the
    conversion. As a result, if the system crashed during or shortly after
    the extents migration, and the released indirect blocks get reused as
    data blocks, the journal replay would corrupt the data blocks. With
    this new patch, fixing this bug was as simple as adding the
    EXT4_FREE_BLOCKS_FORGET flags to the call to ext4_free_blocks().

    Signed-off-by: "Theodore Ts'o"
    Cc: "Aneesh Kumar K.V"

    Theodore Ts'o
     

29 Sep, 2009

1 commit

  • When writing into an uninitialized extent via direct I/O, and the direct
    I/O doesn't exactly cover the uninitialized extent, split the extent
    into uninitialized and initialized extents before submitting the I/O.
    This avoids needing to deal with an ENOSPC error in the end_io
    callback that gets used for direct I/O.

    When the IO is complete, the written extent will be marked as initialized.

    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     

17 Sep, 2009

1 commit

  • EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag,
    and the hex value assigned to it collides with FS_DIRECTIO_FL (which
    is also stored in i_flags). There's no reason for the
    EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use
    i_state instead.

    Cc: "Aneesh Kumar K.V"
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
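
The collision described above can be reproduced in a small sketch. The hex values, the struct, and the `demo_` names are hypothetical and are not the real ext4 constants; they are chosen so that two flag names share one bit of the same word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values: two different meanings assigned to the same bit. */
#define DEMO_FL_ONDISK       0x00100000u  /* flag that is stored on disk */
#define DEMO_FL_MIGRATE_OLD  0x00100000u  /* in-memory flag, same bit */

/* After the fix, the in-memory flag lives in a separate state word. */
#define DEMO_STATE_MIGRATE   0x00000001u

static_assert((DEMO_FL_ONDISK & DEMO_FL_MIGRATE_OLD) != 0,
              "the two flag values in this sketch are meant to collide");

struct demo_flag_inode {
    uint32_t flags;  /* persistent flags */
    uint32_t state;  /* in-memory-only state */
};

/* Old scheme: marks migration in the shared flags word. */
static inline void demo_mark_migrate_old(struct demo_flag_inode *in)
{
    in->flags |= DEMO_FL_MIGRATE_OLD;
}

/* New scheme: marks migration in the separate state word. */
static inline void demo_mark_migrate_new(struct demo_flag_inode *in)
{
    in->state |= DEMO_STATE_MIGRATE;
}

/* Does the inode appear to have the on-disk flag set? */
static inline int demo_has_ondisk_flag(const struct demo_flag_inode *in)
{
    return (in->flags & DEMO_FL_ONDISK) != 0;
}
```

Under the old scheme, marking an inode as migrating makes it look as if the unrelated on-disk flag were set; with the separate state word the persistent flags are untouched.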
     

26 Aug, 2009

1 commit

  • We need to unlock the new inode before iput. This patch fixes the
    following warning when calling chattr +e to migrate a file to use
    extents. It also fixes problems when e4defrag attempts to
    defragment an inode.

    [ 470.400044] ------------[ cut here ]------------
    [ 470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a()
    [ 470.400072] Hardware name: N/A
    .....
    ...
    [ 470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4
    [ 470.400359] Call Trace:
    [ 470.400372] warn_slowpath_common+0x77/0x8f
    [ 470.400385] warn_slowpath_null+0xf/0x11
    [ 470.400395] generic_delete_inode+0x65/0x16a
    [ 470.400405] generic_drop_inode+0x17/0x1bd
    [ 470.400413] iput+0x61/0x65
    [ 470.400455] ext4_ext_migrate+0x5eb/0x66a [ext4]
    [ 470.400492] ext4_ioctl+0x340/0x756 [ext4]
    [ 470.400507] vfs_ioctl+0x1d/0x82
    [ 470.400517] do_vfs_ioctl+0x483/0x4c9
    [ 470.400527] ? trace_hardirqs_on+0xd/0xf
    [ 470.400537] sys_ioctl+0x51/0x74
    [ 470.400549] system_call_fastpath+0x16/0x1b
    [ 470.400557] ---[ end trace ab85723542352dac ]---

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     

13 Jun, 2009

2 commits

  • Enhance the inode allocator to take a goal inode number as a
    parameter; if it is specified, it takes precedence over Orlov or
    parent directory inode allocation algorithms.

    The extents migration function uses the goal inode number so that the
    extent trees allocated by the migration function use the correct flex_bg.
    In the future, the goal inode functionality will also be used to
    allocate an adjacent inode for the extended attributes.

    Also, for testing purposes the goal inode number can be specified via
    /sys/fs/{dev}/inode_goal. This can be useful for testing inode
    allocation beyond 2^32 blocks on very large filesystems.

    Signed-off-by: Andreas Dilger
    Signed-off-by: "Theodore Ts'o"

    Andreas Dilger
     
  • Instead of using a random number to determine the goal parent group for
    the Orlov top directories, use a hash of the directory name. This
    allows for repeatable results when trying to benchmark filesystem
    layout algorithms.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
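
The idea of replacing a random choice with a hash of the directory name can be sketched as follows. This is not the hash used by ext4: FNV-1a is swapped in here purely for illustration, and the function names and the modulo mapping to a group number are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over the name bytes (stand-in for the real hash). */
static inline uint32_t demo_name_hash(const char *name, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)name[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/*
 * Map a directory name to a goal group in [0, ngroups).
 * The same name always yields the same group, unlike a random pick.
 */
static inline uint32_t demo_goal_group(const char *name, size_t len,
                                       uint32_t ngroups)
{
    assert(name != NULL && ngroups > 0);
    return demo_name_hash(name, len) % ngroups;
}
```

Because the result depends only on the name and the group count, repeated benchmark runs that create the same top-level directories start from the same goal groups.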
     

16 Feb, 2009

1 commit


07 Jan, 2009

2 commits

  • This mount option is largely superfluous, and in fact the way it was
    implemented was buggy; if a filesystem which did not have the extents
    feature flag was mounted -o extents, the filesystem would attempt to
    create and use extents-based files even though the extents feature flag
    was not enabled. The simplest thing to do is to nuke the mount option
    entirely. It's not all that useful to force the non-creation of new
    extent-based files if the filesystem can support it.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • A few weeks ago I posted a patch for discussion that allowed ext4 to run
    without a journal. Since that time I've integrated the excellent
    comments from Andreas and fixed several serious bugs. We're currently
    running with this patch and generating some performance numbers against
    both ext2 (with backported reservations code) and ext4 with and without
    a journal. It just so happens that running without a journal is
    slightly faster for most everything.

    We did
    iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2

    which creates 4 threads, each of which creates and does reads and writes on
    a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
    to bypass the page cache. Results:

                      ext2         ext4, default   ext4, no journal
    initial writes    13.0 MB/s    15.4 MB/s       15.7 MB/s
    rewrites          13.1 MB/s    15.6 MB/s       15.9 MB/s
    reads             15.2 MB/s    16.9 MB/s       17.2 MB/s
    re-reads          15.3 MB/s    16.9 MB/s       17.2 MB/s
    random readers     5.6 MB/s     5.6 MB/s        5.7 MB/s
    random writers     5.1 MB/s     5.3 MB/s        5.4 MB/s

    So it seems that, so far, this was a useful exercise.

    Signed-off-by: Frank Mayhar
    Signed-off-by: "Theodore Ts'o"

    Frank Mayhar
     

14 Sep, 2008

1 commit


20 Aug, 2008

1 commit

  • This patch modifies the writepage/write_begin credit calculation for
    extent files to use the credits calculation helper function.

    The current calculation of how many index/leaf blocks should be
    accounted is too conservative: it always considers the worst case,
    where the tree level is 5, and in the case of multiple chunk
    allocations, it always assumes no blocks are dirtied in common across
    the allocations. This patch uses the accurate depth of the inode with
    some extras to calculate the index blocks, and is also less
    conservative in the case of multiple allocation accounting.

    Signed-off-by: Mingming Cao
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     

30 Apr, 2008

1 commit


29 Apr, 2008

1 commit

  • Fail migrate if we allocated new blocks via mmap write.

    If we write to holes in the file via mmap, we end up allocating
    new blocks. This block allocation happens without taking inode->i_mutex.
    Since migrate is protected by i_mutex and migrate expects that no
    new blocks get allocated during migrate, fail migrate if new blocks
    get allocated.

    We can't take inode->i_mutex in the mmap write path because that
    would result in a locking order violation between i_mutex and mmap_sem.
    Also, adding a separate rw_semaphore for protection is really high overhead
    for a rare operation such as migrate.

    Signed-off-by: Aneesh Kumar K.V
    Acked-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     

26 Feb, 2008

1 commit


10 Feb, 2008

1 commit

  • In order to prevent a circular locking dependency when an unlink
    operation is racing with an ext4 migration, we delay taking i_data_sem
    until just before switching the inode format, and use i_mutex to prevent
    writes and truncates during the first part of the migration operation.

    Acked-by: Jan Kara
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     

05 Feb, 2008

1 commit

  • For fast symbolic links, the file content is stored in the i_block[]
    array, which is not compatible with the new file extents format.
    e2fsck reports an error on such files because EXTENTS_FL is set.
    Don't set the EXTENTS_FL flag when creating fast symlinks.

    In the case of file migration, skip fast symbolic links.

    Signed-off-by: Valerie Clement
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Valerie Clement
     

29 Jan, 2008

2 commits