13 Oct, 2013

1 commit

  • If we take the 2nd retry path in ext4_expand_extra_isize_ea, we
    potentially return from the function without having freed these
    allocations. If we don't do the return, we overwrite the previous
    allocation pointers, so we leak either way.

    Spotted with Coverity.

    [ Fixed by tytso to set is and bs to NULL after freeing these
    pointers, in case in the retry loop we later end up triggering an
    error causing a jump to cleanup, at which point we could have a double
    free bug. -- Ted ]
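    A minimal userspace sketch of the pattern described above, assuming a
    retry loop shaped roughly like the one in ext4_expand_extra_isize_ea().
    The names search_state, expand_with_retry() and the allocation counter
    are hypothetical. Freeing at the bottom of each iteration fixes the leak,
    and resetting the pointers to NULL means a later jump to cleanup cannot
    free them twice, because free(NULL) is a no-op.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-attempt search state (is / bs). */
struct search_state {
	int dummy;
};

static int allocations;   /* live allocations, tracked for illustration */

static struct search_state *alloc_state(void)
{
	struct search_state *s = calloc(1, sizeof(*s));
	if (s)
		allocations++;
	return s;
}

static void free_state(struct search_state *s)
{
	if (s)
		allocations--;
	free(s);   /* free(NULL) is a no-op */
}

/*
 * Retries up to max_tries times; fail_on_try simulates an error that
 * jumps to cleanup on that attempt (-1 means no error).
 */
static int expand_with_retry(int max_tries, int fail_on_try)
{
	struct search_state *is = NULL, *bs = NULL;
	int error = 0;

	for (int tries = 0; tries < max_tries; tries++) {
		if (tries == fail_on_try) {
			error = -1;
			goto cleanup;   /* is/bs are NULL here, never stale */
		}
		is = alloc_state();
		bs = alloc_state();
		if (!is || !bs) {
			error = -1;
			goto cleanup;
		}
		/* ... try to make room; assume we are asked to retry ... */

		/* Free before retrying, then reset so cleanup can't double free. */
		free_state(is);
		free_state(bs);
		is = NULL;
		bs = NULL;
	}
cleanup:
	free_state(is);
	free_state(bs);
	return error;
}
```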

    Signed-off-by: Dave Jones
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Eric Sandeen
    Cc: stable@vger.kernel.org

    Dave Jones
     

10 Apr, 2013

1 commit


19 Feb, 2013

1 commit

  • Currently, when a new xattr block is created or released, we call
    dquot_alloc_block() or dquot_free_block() respectively, among other
    things incrementing or decrementing the number of blocks assigned to
    the inode by one block.

    This however does not work for a bigalloc file system, because we
    always allocate/free the whole cluster, so we have to account for that
    in dquot_alloc_block() and dquot_free_block() as well.

    Use the clusters-to-blocks conversion EXT4_C2B() when passing number of
    blocks to the dquot_alloc/free functions to fix the problem.
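    A sketch of the accounting fix, assuming a simplified superblock that
    carries only s_cluster_bits (log2 of blocks per cluster), which is what
    the shift in EXT4_C2B() is based on. C2B(), quota_blocks and the two
    helpers are hypothetical stand-ins for the kernel macro and the
    dquot_alloc/free calls. With 16 blocks per cluster, one xattr block
    really consumes a whole 16-block cluster, so quota is charged 16 blocks.

```c
#include <stdint.h>

/* Simplified stand-in for struct ext4_sb_info. */
struct sb_info {
	unsigned int s_cluster_bits;   /* log2(blocks per cluster); 0 without bigalloc */
};

/* Clusters-to-blocks conversion, modeled on EXT4_C2B(). */
#define C2B(sbi, cluster) ((uint64_t)(cluster) << (sbi)->s_cluster_bits)

/* Hypothetical quota counter standing in for dquot_alloc/free_block(). */
static uint64_t quota_blocks;

static void charge_xattr_block(const struct sb_info *sbi)
{
	/* The xattr block occupies a whole cluster, so charge all of it. */
	quota_blocks += C2B(sbi, 1);
}

static void release_xattr_block(const struct sb_info *sbi)
{
	quota_blocks -= C2B(sbi, 1);
}
```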

    The problem has been revealed by xfstests #117 (and possibly others).

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Eric Sandeen
    Cc: stable@vger.kernel.org

    Lukas Czerner
     

10 Feb, 2013

1 commit

  • Operations which modify extended attributes may need extra journal
    credits if inline data is used, since there is a chance that some
    extended attributes may need to get pushed to an external attribute
    block.

    Changes to reflect this were made in xattr.c, but they were missed in
    fs/ext4/acl.c. To fix this, abstract the calculation of the number of
    credits needed for xattr operations into an inline function defined in
    ext4_jbd2.h, and use it in acl.c and xattr.c.
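    A rough sketch of the shape of such a shared inline helper. The constants
    and the inode_stub type are hypothetical placeholders for the real
    transaction-size macros and ext4_has_inline_data(); only the structure
    (a base xattr cost, plus extra credits when inline data may have to be
    pushed out to a block) follows the description above.

```c
#include <stdbool.h>

/* Hypothetical inode stand-in. */
struct inode_stub {
	bool has_inline_data;
};

/* Placeholder credit counts; real values depend on the file system. */
#define XATTR_BASE_CREDITS     8   /* hypothetical base cost of an xattr update */
#define PUSH_OUT_DATA_CREDITS  5   /* hypothetical cost of moving inline data out */

/* Used by both the ACL and xattr paths so they reserve the same credits. */
static inline int credits_xattr(const struct inode_stub *inode)
{
	int credits = XATTR_BASE_CREDITS;

	/*
	 * With inline data, setting an xattr may push the data out to an
	 * external block, so reserve credits for that write as well.
	 */
	if (inode->has_inline_data)
		credits += PUSH_OUT_DATA_CREDITS;
	return credits;
}
```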

    Also move the function declarations used in inline.c from xattr.h
    (where they are non-obviously hidden, and caused problems since
    ext4_jbd2.h needs to use the function ext4_has_inline_data) to
    ext4.h.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Tao Ma
    Reviewed-by: Jan Kara

    Theodore Ts'o
     

09 Feb, 2013

1 commit

  • So we can better understand what bits of ext4 are responsible for
    long-running jbd2 handles, use jbd2__journal_start() so we can pass
    context information for logging purposes.

    The recommended way for finding the longer-running handles is:

    T=/sys/kernel/debug/tracing
    EVENT=$T/events/jbd2/jbd2_handle_stats
    echo "interval > 5" > $EVENT/filter
    echo 1 > $EVENT/enable

    ./run-my-fs-benchmark

    cat $T/trace > /tmp/problem-handles

    This will list handles that were active for longer than 20ms (the
    interval is measured in jiffies; this assumes HZ=250, where one jiffy
    is 4ms). Having longer-running handles is bad, because a commit started
    at the wrong time could stall for those 20+ milliseconds, which could
    delay an fsync() or an O_SYNC operation. Here is an example line from
    the trace file describing a handle which lived on for 311 jiffies, or
    over 1.2 seconds:

    postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
    tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
    dirtied_blocks 0
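    A small helper for post-processing the trace file, assuming HZ=250 (which
    matches 311 jiffies being a little over 1.2 seconds above; check
    CONFIG_HZ on your own kernel). It pulls the interval field out of a
    jbd2_handle_stats line and converts jiffies to milliseconds.
    handle_interval_ms() is a hypothetical name, not part of any tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns the handle lifetime in milliseconds parsed from a
 * jbd2_handle_stats trace line, or -1 if no "interval" field is found.
 */
static long handle_interval_ms(const char *line, long hz)
{
	const char *p = strstr(line, " interval ");
	long jiffies;

	if (!p || sscanf(p, " interval %ld", &jiffies) != 1)
		return -1;
	return jiffies * 1000 / hz;
}
```

    A handle right at the filter threshold, interval 5, comes out as 20ms
    under the same HZ=250 assumption.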

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

13 Jan, 2013

2 commits


11 Dec, 2012

2 commits

  • Not all architectures (in particular, sparc64) have empty_zero_page.
    So instead of copying from empty_zero_page, use memset to clear the
    inline data by signalling to ext4_xattr_set_entry() via a magic
    pointer value, EXT4_ZERO_ATTR_VALUE, which is defined by casting -1 to
    a pointer.

    This fixes a build failure on sparc64, and the memset() should be more
    efficient than using memcpy() anyway.
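    A sketch of the sentinel-pointer technique, assuming a hypothetical
    store_value() helper and a local ZERO_ATTR_VALUE that mirrors
    EXT4_ZERO_ATTR_VALUE (-1 cast to a pointer). The callee checks for the
    sentinel and clears the destination with memset() instead of copying
    from a shared zero page.

```c
#include <stddef.h>
#include <string.h>

/* Magic "all zeroes" value: -1 cast to a pointer, never a real buffer. */
#define ZERO_ATTR_VALUE ((const void *)-1)

/* Hypothetical helper: store an xattr value, or clear it for the sentinel. */
static void store_value(void *dst, const void *value, size_t size)
{
	if (value == ZERO_ATTR_VALUE)
		memset(dst, 0, size);   /* no empty_zero_page needed */
	else
		memcpy(dst, value, size);
}
```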

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • Now that we store data in the inode, in case we need to store some
    xattrs and the inode doesn't have enough space, Andreas suggested that
    we should keep the xattr (metadata) in the inode and push the data out.
    This patch does that work.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     

05 Dec, 2012

1 commit


09 Nov, 2012

1 commit

  • ext4_handle_release_buffer() was intended to remove journal
    write access from a buffer, but it doesn't actually do anything
    at all other than add a BUFFER_TRACE point, and it's not reliably
    used for that either. Remove all the associated dead code.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Carlos Maiolino

    Eric Sandeen
     

10 Jul, 2012

1 commit

  • In xattr block operations, we use h_refcount to indicate whether the
    xattr block is shared among many inodes, and the xattr block csum uses
    s_csum_seed if the block is shared and i_csum_seed if it belongs to
    one inode. But this has a problem. Consider a block that is first
    shared between inodes A and B, and then B updates some xattr and CoWs
    the xattr block. When it updates the *old* xattr block (because
    of the h_refcount change) and calls ext4_xattr_release_block, we
    have no idea that inode A is the real owner of the *old* xattr
    block, so we can't use the i_csum_seed of inode A in the xattr
    block csum calculation. And I don't think we have an easy way to
    find inode A.

    So this patch just removes the tricky i_csum_seed, and we now use
    s_csum_seed every time for the xattr block csum. The corresponding
    patch for e2fsprogs will be sent separately.
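    A sketch of why a file-system-wide seed sidesteps the ownership problem:
    if the checksum covers only the seed, the block number and the block
    contents, every inode sharing the block computes the same value. The
    CRC32C routine is a plain bitwise implementation, and the seeding order
    (seed, then little-endian block number, then contents) is an
    illustrative assumption, not a copy of the kernel's checksum code.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/*
 * Checksum of a shared xattr block: depends on the fs-wide seed, the
 * block number and the contents, never on which inode is asking.
 */
static uint32_t xattr_block_csum(uint32_t s_csum_seed, uint64_t blocknr,
				 const void *block, size_t size)
{
	uint8_t le[8];

	for (int i = 0; i < 8; i++)
		le[i] = (uint8_t)(blocknr >> (8 * i));   /* little-endian encoding */
	uint32_t crc = crc32c_update(s_csum_seed, le, sizeof(le));
	return crc32c_update(crc, block, size);
}
```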

    This was spotted by xfstests 117.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"
    Acked-by: Darrick J. Wong

    Tao Ma
     

30 Apr, 2012

1 commit


20 Mar, 2012

1 commit


21 Feb, 2012

2 commits

  • Processes hang forever on a sync-mounted ext2 file system that
    is mounted with the ext4 module (default in Fedora 16).

    I can reproduce this reliably by mounting an ext2 partition with
    "-o sync" and opening a new file on that partition with vim. vim
    will hang in "D" state forever. The same happens on ext4 without
    a journal.

    I am attaching a small patch here that solves this issue for me.
    In the sync mounted case without a journal,
    ext4_handle_dirty_metadata() may call sync_dirty_buffer(), which
    can't be called with buffer lock held.
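    A userspace analogy of the lock-ordering fix, assuming a non-recursive
    lock: a sync routine that takes the buffer lock itself must only be
    called after the caller has dropped that lock. The buf type and both
    helpers are hypothetical; pthread_mutex_trylock() is used so the sketch
    reports the would-be deadlock instead of hanging.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical buffer with its own lock, loosely like a buffer_head. */
struct buf {
	pthread_mutex_t lock;
	int refcount;
	bool dirty;
	int writes;   /* completed synchronous writes */
};

/* Writes the buffer, taking the lock itself; false if the lock is held. */
static bool sync_buf(struct buf *b)
{
	if (pthread_mutex_trylock(&b->lock) != 0)
		return false;   /* a blocking lock would deadlock here */
	if (b->dirty) {
		b->writes++;
		b->dirty = false;
	}
	pthread_mutex_unlock(&b->lock);
	return true;
}

/* Drops one reference; in sync mode, writes the buffer after unlocking. */
static bool release_block(struct buf *b, bool sync_mode)
{
	pthread_mutex_lock(&b->lock);
	b->refcount--;
	b->dirty = true;
	pthread_mutex_unlock(&b->lock);   /* unlock first ... */

	return sync_mode ? sync_buf(b) : true;   /* ... then sync safely */
}
```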

    Also move mb_cache_entry_release inside the lock to avoid the race
    fixed previously by 8a2bfdcb ("ext[34]: EA block reference count
    racing fix"). Note too that ext2 fixed this same problem in 2006 with
    b2f49033 ("[PATCH] fix deadlock in ext2").

    Signed-off-by: Martin.Wilck@ts.fujitsu.com
    [sandeen@redhat.com: move mb_cache_entry_release before unlock, edit commit msg]
    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • We can return directly from ext4_xattr_check_block(), so we
    don't need to define an 'error' variable.

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"

    Zheng Liu
     

29 Oct, 2011

1 commit

  • Ceph users reported that when using Ceph on ext4, the filesystem
    would often become corrupted, containing inodes with incorrect
    i_blocks counters.

    I managed to reproduce this with a very hacked-up "streamtest"
    binary from the Ceph tree.

    Ceph is doing a lot of xattr writes to out-of-inode blocks.
    There is also another thread which does sync_file_range and close,
    of the same files. The problem appears to happen due to this race:

    sync/flush thread                   xattr-set thread
    -----------------                   ----------------

    do_writepages                       ext4_xattr_set
    ext4_da_writepages                  ext4_xattr_set_handle
    mpage_da_map_blocks                 ext4_xattr_block_set
      set DELALLOC_RESERVE
                                        ext4_new_meta_blocks
                                          ext4_mb_new_blocks
                                            if (!i_delalloc_reserved_flag)
                                              vfs_dq_alloc_block
    ext4_get_blocks
      down_write(i_data_sem)
      set i_delalloc_reserved_flag
      ...
      up_write(i_data_sem)
                                          if (i_delalloc_reserved_flag)
                                            vfs_dq_alloc_block_nofail

    In other words, the sync/flush thread pops in and sets
    i_delalloc_reserved_flag on the inode, which makes the xattr thread
    think that it's in a delalloc path in ext4_new_meta_blocks(),
    and add the block for a second time, after already having added
    it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks.

    The real problem is that we shouldn't be using the DELALLOC_RESERVED
    state flag; instead we should be passing
    EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks(). For now
    we fix this by using i_data_sem to prevent the race, but this is
    really not the right way to fix things.
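    A minimal pthreads model of the interim fix, assuming the xattr path
    holds the same lock (standing in for i_data_sem) across both checks of
    the flag. Because the flush path only sets and clears the flag while
    holding that lock for writing, the xattr path sees one consistent value
    and charges quota exactly once. All names here are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

static pthread_rwlock_t data_sem = PTHREAD_RWLOCK_INITIALIZER;  /* i_data_sem stand-in */
static bool delalloc_reserved;   /* i_delalloc_reserved_flag stand-in */
static int quota_charges;

/* Flush path: sets and clears the flag only under the write lock. */
static void *flush_thread(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&data_sem);
	delalloc_reserved = true;
	/* ... map delayed-allocation blocks ... */
	delalloc_reserved = false;
	pthread_rwlock_unlock(&data_sem);
	return NULL;
}

/* Xattr path: both checks of the flag happen under the same lock. */
static void *xattr_thread(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&data_sem);
	if (!delalloc_reserved)
		quota_charges++;   /* like the check in ext4_mb_new_blocks() */
	/* ... allocate the xattr block ... */
	if (delalloc_reserved)
		quota_charges++;   /* like the later check in ext4_new_meta_blocks() */
	pthread_rwlock_unlock(&data_sem);
	return NULL;
}

/* Runs both paths concurrently; returns how many times quota was charged. */
static int run_once(void)
{
	pthread_t a, b;

	quota_charges = 0;
	pthread_create(&a, NULL, flush_thread, NULL);
	pthread_create(&b, NULL, xattr_thread, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return quota_charges;
}
```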

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Eric Sandeen
     

26 Oct, 2011

1 commit

  • ext4_mark_iloc_dirty() says:

    * The caller must have previously called ext4_reserve_inode_write().
    * Give this, we know that the caller already has write access to iloc->bh.

    ext4_xattr_set_handle, however, just open-codes it. May as well use
    the helper function for consistency.

    No bug here, just tidiness.

    (Note: on cleanup path, ext4_reserve_inode_write sets
    the bh to NULL if it returns an error, and brelse() of
    a null bh is handled gracefully).
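    A sketch of the calling convention this note relies on, with
    hypothetical stand-ins for the iloc, the reserve step and brelse(): the
    reserve step leaves bh NULL on failure, so a shared cleanup path can
    release it unconditionally.

```c
#include <stdbool.h>
#include <stdlib.h>

struct bh { int data; };            /* buffer_head stand-in */
struct iloc { struct bh *bh; };     /* ext4_iloc stand-in */

static int live_bh;                 /* outstanding buffers, for checking */

/* Reserve step: gets write access, or leaves iloc->bh NULL on error. */
static int reserve_inode_write(struct iloc *iloc, bool fail)
{
	iloc->bh = NULL;
	if (fail)
		return -5;   /* EIO-like error */
	iloc->bh = malloc(sizeof(*iloc->bh));
	if (!iloc->bh)
		return -12;  /* ENOMEM-like error */
	live_bh++;
	return 0;
}

/* Like brelse(): releasing a NULL buffer is harmless. */
static void release(struct bh *bh)
{
	if (!bh)
		return;
	live_bh--;
	free(bh);
}

/* Caller pattern: reserve, modify, then clean up on every path. */
static int set_attr(bool fail)
{
	struct iloc iloc;
	int error = reserve_inode_write(&iloc, fail);

	if (error)
		goto cleanup;
	/* ... modify the in-inode xattr area ... */
cleanup:
	release(iloc.bh);   /* safe even if the reserve step failed */
	return error;
}
```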

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     

25 May, 2011

1 commit

  • This patch adds an allocation request flag to the ext4_has_free_blocks
    function which enables the use of reserved blocks. This will allow a
    punch hole to proceed even if the disk is full. Punching a hole may
    require additional blocks to first split the extents.

    Because ext4_has_free_blocks is a low level function, the flag needs
    to be passed down through several functions listed below:

    ext4_ext_insert_extent
    ext4_ext_create_new_leaf
    ext4_ext_grow_indepth
    ext4_ext_split
    ext4_ext_new_meta_block
    ext4_mb_new_blocks
    ext4_claim_free_blocks
    ext4_has_free_blocks
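    A simplified model of the check at the bottom of that chain, assuming a
    hypothetical ALLOC_USE_RESERVED request flag: ordinary allocations must
    leave the reserved pool untouched, while a flagged request (such as the
    extent split needed by punch hole) may dip into it.

```c
#include <stdbool.h>
#include <stdint.h>

#define ALLOC_USE_RESERVED 0x1u   /* hypothetical request flag */

/* Returns true if nblocks can be allocated under the given flags. */
static bool has_free_blocks(uint64_t free_blocks, uint64_t reserved_blocks,
			    uint64_t nblocks, unsigned int flags)
{
	if (flags & ALLOC_USE_RESERVED)
		return free_blocks >= nblocks;   /* whole pool is usable */
	/* Otherwise the reserved blocks are off limits. */
	return free_blocks >= reserved_blocks + nblocks;
}
```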

    [ext4 punch hole patch series 1/5 v7]

    Signed-off-by: Allison Henderson
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Mingming Cao

    Allison Henderson
     

21 Mar, 2011

1 commit

  • There are two wrapper functions which do exactly the same thing:
    ext4_journal_release_buffer(), and ext4_handle_release_buffer(). In
    addition, ext4_xattr_block_set() calls jbd2_journal_release_buffer()
    directly.

    Unify all of the code to use ext4_handle_release_buffer(), and get rid
    of ext4_journal_release_buffer().

    Signed-off-by: Amir Goldstein
    Signed-off-by: "Theodore Ts'o"

    Amir Goldstein
     

22 Feb, 2011

1 commit


11 Jan, 2011

2 commits


28 Oct, 2010

1 commit


11 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
    no need for list_for_each_entry_safe()/resetting with superblock list
    Fix sget() race with failing mount
    vfs: don't hold s_umount over close_bdev_exclusive() call
    sysv: do not mark superblock dirty on remount
    sysv: do not mark superblock dirty on mount
    btrfs: remove junk sb_dirt change
    BFS: clean up the superblock usage
    AFFS: wait for sb synchronization when needed
    AFFS: clean up dirty flag usage
    cifs: truncate fallout
    mbcache: fix shrinker function return value
    mbcache: Remove unused features
    add f_flags to struct statfs(64)
    pass a struct path to vfs_statfs
    update VFS documentation for method changes.
    All filesystems that need invalidate_inode_buffers() are doing that explicitly
    convert remaining ->clear_inode() to ->evict_inode()
    Make ->drop_inode() just return whether inode needs to be dropped
    fs/inode.c:clear_inode() is gone
    fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
    ...

    Fix up trivial conflicts in fs/nilfs2/super.c

    Linus Torvalds
     

10 Aug, 2010

1 commit

  • The mbcache code was written to support a variable number of indexes,
    but all the existing users use exactly one index. Simplify the code to
    support only that case.

    There are also no users of the cache entry free operation, and none of
    the users keep extra data in cache entries. Remove those features as
    well.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

12 Jun, 2010

1 commit

  • We don't need to set s_dirt in most of the ext4 code when journaling
    is enabled. In ext3/4 some of the summary statistics for # of free
    inodes, blocks, and directories are calculated from the per-block
    group statistics when the file system is mounted or unmounted. As a
    result the superblock doesn't have to be updated, either via the
    journal or by setting s_dirt. There are a few exceptions, most
    notably when resizing the file system, where the superblock needs to
    be modified --- and in that case it should be done as a journalled
    operation if possible, and s_dirt set only in no-journal mode.

    This patch will optimize out some unneeded disk writes when using ext4
    with a journal.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

28 May, 2010

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
    ext4: Make fsync sync new parent directories in no-journal mode
    ext4: Drop whitespace at end of lines
    ext4: Fix compat EXT4_IOC_ADD_GROUP
    ext4: Conditionally define compat ioctl numbers
    tracing: Convert more ext4 events to DEFINE_EVENT
    ext4: Add new tracepoints to track mballoc's buddy bitmap loads
    ext4: Add a missing trace hook
    ext4: restart ext4_ext_remove_space() after transaction restart
    ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted
    ext4: Avoid crashing on NULL ptr dereference on a filesystem error
    ext4: Use bitops to read/modify i_flags in struct ext4_inode_info
    ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE()
    ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks()
    ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks()
    ext4: Use our own write_cache_pages()
    ext4: Show journal_checksum option
    ext4: Fix for ext4_mb_collect_stats()
    ext4: check for a good block group before loading buddy pages
    ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate
    ext4: Remove extraneous newlines in ext4_msg() calls
    ...

    Fixed up trivial conflict in fs/ext4/fsync.c

    Linus Torvalds
     

22 May, 2010

1 commit


17 May, 2010

2 commits


06 Mar, 2010

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
    quota: stop using QUOTA_OK / NO_QUOTA
    dquot: cleanup dquot initialize routine
    dquot: move dquot initialization responsibility into the filesystem
    dquot: cleanup dquot drop routine
    dquot: move dquot drop responsibility into the filesystem
    dquot: cleanup dquot transfer routine
    dquot: move dquot transfer responsibility into the filesystem
    dquot: cleanup inode allocation / freeing routines
    dquot: cleanup space allocation / freeing routines
    ext3: add writepage sanity checks
    ext3: Truncate allocated blocks if direct IO write fails to update i_size
    quota: Properly invalidate caches even for filesystems with blocksize < pagesize
    quota: generalize quota transfer interface
    quota: sb_quota state flags cleanup
    jbd: Delay discarding buffers in journal_unmap_buffer
    ext3: quota_write cross block boundary behaviour
    quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
    quota: split out compat_sys_quotactl support from quota.c
    quota: split out netlink notification support from quota.c
    quota: remove invalid optimization from quota_sync_all
    ...

    Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

    Linus Torvalds
     

05 Mar, 2010

1 commit

  • Get rid of the alloc_space, free_space, reserve_space, claim_space and
    release_rsv dquot operations - they are always called from the filesystem,
    and if a filesystem really needs its own (which none currently does)
    it can just call into its own routine directly.

    Move shared logic into the common __dquot_alloc_space,
    dquot_claim_space_nodirty and __dquot_free_space low-level methods,
    and rationalize the wrappers around them to move as much code as
    possible into the common block for CONFIG_QUOTA vs not. Also rename
    all these helpers to be named dquot_* instead of vfs_dq_*.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

16 Feb, 2010

2 commits


25 Jan, 2010

1 commit

  • At several places we modify EXT4_I(inode)->i_state without holding
    i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
    ext4_do_update_inode, ...). These modifications are racy and we can
    lose updates to i_state. So convert handling of i_state to use bitops
    which are atomic.
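    A userspace illustration with C11 atomics, standing in for the kernel's
    set_bit()/clear_bit()/test_bit(): an atomic fetch-or or fetch-and updates
    one state bit without losing a concurrent update to another bit, which a
    plain read-modify-write of the whole word can do. The bit names and
    helpers are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { STATE_JDATA = 0, STATE_NEW = 1 };   /* hypothetical state bits */

static atomic_ulong i_state;

static void state_set(int bit)   { atomic_fetch_or(&i_state, 1UL << bit); }
static void state_clear(int bit) { atomic_fetch_and(&i_state, ~(1UL << bit)); }
static bool state_test(int bit)  { return atomic_load(&i_state) & (1UL << bit); }

/* Each thread repeatedly toggles only its own bit and finishes with it set. */
static void *toggle(void *arg)
{
	int bit = *(int *)arg;

	for (int i = 0; i < 100000; i++) {
		state_set(bit);
		state_clear(bit);
	}
	state_set(bit);
	return NULL;
}
```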

    Cc: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

23 Dec, 2009

1 commit

  • b_entry_name and buffer are initially NULL, are initialized within a loop
    to the result of calling kmalloc, and are freed at the bottom of this loop.
    The loop contains gotos to cleanup, which also frees b_entry_name and
    buffer. Some of these gotos are before the reinitializations of
    b_entry_name and buffer. To maintain the invariant that b_entry_name and
    buffer are NULL at the top of the loop, and thus acceptable arguments to
    kfree, these variables are now set to NULL after the kfrees.

    This seems to be the simplest solution. A more complicated solution
    would be to introduce more labels in the error handling code at the end of
    the function.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    //
    @r@
    identifier E;
    expression E1;
    iterator I;
    statement S;
    @@

    *kfree(E);
    ... when != E = E1
    when != I(E,...) S
    when != &E
    *kfree(E);
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: "Theodore Ts'o"

    Julia Lawall
     

17 Dec, 2009

1 commit

  • Add a flags argument to struct xattr_handler and pass it to all xattr
    handler methods. This allows using the same methods for multiple
    handlers, e.g. for the ACL methods which perform exactly the same action
    for the access and default ACLs, just using a different underlying
    attribute. With a little more groundwork it'll also allow sharing the
    methods for the regular user/trusted/secure handlers in extN, ocfs2 and
    jffs2 like it's already done for xfs in this patch.

    Also change the inode argument to the handlers to a dentry to allow
    using the handlers mechanism for filesystems that require it later,
    e.g. cifs.
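    A sketch of the sharing this enables, using a simplified handler struct
    whose field, flag and function names are hypothetical (not the kernel's
    struct xattr_handler), with a path string standing in for the dentry:
    one get method serves both the access and default ACL handlers, and the
    flags value tells it which underlying attribute to read.

```c
#include <stdio.h>
#include <string.h>

enum { ACL_ACCESS_FLAG = 1, ACL_DEFAULT_FLAG = 2 };   /* hypothetical */

struct handler {
	const char *name;
	int flags;   /* passed through to every method */
	int (*get)(const char *path, char *buf, size_t size, int flags);
};

/* One method for both handlers; flags selects the attribute. */
static int acl_get(const char *path, char *buf, size_t size, int flags)
{
	const char *which = (flags == ACL_DEFAULT_FLAG) ? "default" : "access";

	return snprintf(buf, size, "%s ACL of %s", which, path);
}

static const struct handler access_handler = {
	"system.posix_acl_access", ACL_ACCESS_FLAG, acl_get };
static const struct handler default_handler = {
	"system.posix_acl_default", ACL_DEFAULT_FLAG, acl_get };

/* Dispatch: the caller hands the handler's own flags to the method. */
static int handler_get(const struct handler *h, const char *path,
		       char *buf, size_t size)
{
	return h->get(path, buf, size, h->flags);
}
```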

    [with GFS2 bits updated by Steven Whitehouse ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
     

23 Nov, 2009

1 commit

  • Add the facility for ext4_forget() to be called from
    ext4_free_blocks(). This simplifies the code in a large number of
    places, and centralizes most of the work of calling ext4_forget() into
    a single place.

    Also fix a bug in the extents migration code; it wasn't calling
    ext4_forget() when releasing the indirect blocks during the
    conversion. As a result, if the system crashed during or shortly after
    the extents migration, and the released indirect blocks got reused as
    data blocks, the journal replay would corrupt the data blocks. With
    this new patch, fixing this bug was as simple as adding the
    EXT4_FREE_BLOCKS_FORGET flag to the call to ext4_free_blocks().
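    A sketch of the centralization, with hypothetical names modeled on
    ext4_free_blocks() and the EXT4_FREE_BLOCKS_FORGET flag: callers pass a
    flag instead of calling the forget step themselves, so a caller that
    needs it (like the migration path) only has to add the flag. The forget
    stand-in just counts blocks; it does not model the journal.

```c
#include <stdint.h>

#define FREE_BLOCKS_FORGET 0x1u   /* hypothetical flag */

static int forgotten;   /* blocks passed to the forget step */
static int freed;       /* blocks returned to the allocator */

/* Stand-in for ext4_forget(): here it only records the call. */
static void forget_block(uint64_t block)
{
	(void)block;
	forgotten++;
}

/* Frees count blocks starting at block; forgets them first if asked. */
static void free_blocks(uint64_t block, unsigned long count, unsigned int flags)
{
	for (unsigned long i = 0; i < count; i++) {
		if (flags & FREE_BLOCKS_FORGET)
			forget_block(block + i);   /* done once, here, for every caller */
		freed++;
	}
}
```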

    Signed-off-by: "Theodore Ts'o"
    Cc: "Aneesh Kumar K.V"

    Theodore Ts'o
     

16 Nov, 2009

1 commit

  • ext4_xattr_set_handle() was zeroing out an inode outside
    of journaling constraints; this is one of the accesses that
    was causing the CRC errors in journal replay seen in
    kernel.org bugzilla #14354.

    Reviewed-by: Andreas Dilger
    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Eric Sandeen