22 Nov, 2011

2 commits


08 Nov, 2011

1 commit


07 Nov, 2011

3 commits

  • The BKL is gone, so these annotations are useless.

    Signed-off-by: Richard Weinberger
    Signed-off-by: "Theodore Ts'o"

    Richard Weinberger
     
  • This avoids a confusing failure in the init scripts when the
    /etc/fstab has data=writeback or data=journal but the file system does
    not have a journal. So check for this case explicitly, and warn the
    user that we are ignoring the (pointless, since they have no journal)
    data=* mount option.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
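
The check described above can be sketched in user-space C. This is a minimal model, not the kernel code: `fs_state`, `data_mode`, and `resolve_data_option()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the relevant superblock state. */
struct fs_state {
    bool has_journal;
};

enum data_mode { DATA_DEFAULT, DATA_JOURNAL, DATA_ORDERED, DATA_WRITEBACK };

/*
 * Resolve a single mount option string. If a data=* mode is requested
 * on a file system without a journal, warn, set *ignored, and fall
 * back to the default instead of failing the mount.
 */
static enum data_mode resolve_data_option(const struct fs_state *fs,
                                          const char *opt, bool *ignored)
{
    enum data_mode requested = DATA_DEFAULT;

    assert(fs != NULL && opt != NULL && ignored != NULL);
    *ignored = false;

    if (strcmp(opt, "data=journal") == 0)
        requested = DATA_JOURNAL;
    else if (strcmp(opt, "data=ordered") == 0)
        requested = DATA_ORDERED;
    else if (strcmp(opt, "data=writeback") == 0)
        requested = DATA_WRITEBACK;

    if (requested != DATA_DEFAULT && !fs->has_journal) {
        fprintf(stderr, "warning: ignoring %s, no journal present\n", opt);
        *ignored = true;
        return DATA_DEFAULT;
    }
    return requested;
}
```

The design point is that the option is downgraded with a warning rather than rejected, so an /etc/fstab written for a journaled file system still mounts.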
     
  • * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Add a 'reason' to wb_writeback_work
    writeback: send work item to queue_io, move_expired_inodes
    writeback: trace event balance_dirty_pages
    writeback: trace event bdi_dirty_ratelimit
    writeback: fix ppc compile warnings on do_div(long long, unsigned long)
    writeback: per-bdi background threshold
    writeback: dirty position control - bdi reserve area
    writeback: control dirty pause time
    writeback: limit max dirty pause time
    writeback: IO-less balance_dirty_pages()
    writeback: per task dirty rate limit
    writeback: stabilize bdi->dirty_ratelimit
    writeback: dirty rate control
    writeback: add bg_threshold parameter to __bdi_update_bandwidth()
    writeback: dirty position control
    writeback: account per-bdi accumulated dirtied pages

    Linus Torvalds
     

03 Nov, 2011

2 commits

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue:
    vfs: add d_prune dentry operation
    vfs: protect i_nlink
    filesystems: add set_nlink()
    filesystems: add missing nlink wrappers
    logfs: remove unnecessary nlink setting
    ocfs2: remove unnecessary nlink setting
    jfs: remove unnecessary nlink setting
    hypfs: remove unnecessary nlink setting
    vfs: ignore error on forced remount
    readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
    vfs: fix dentry leak in simple_fill_super()

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits)
    jbd2: Unify log messages in jbd2 code
    jbd/jbd2: validate sb->s_first in journal_get_superblock()
    ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined
    ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled
    ext4: fix a typo in struct ext4_allocation_context
    ext4: Don't normalize an falloc request if it can fit in 1 extent.
    ext4: remove comments about extent mount option in ext4_new_inode()
    ext4: let ext4_discard_partial_buffers handle unaligned range correctly
    ext4: return ENOMEM if find_or_create_pages fails
    ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()
    ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten
    ext4: optimize locking for end_io extent conversion
    ext4: remove unnecessary call to waitqueue_active()
    ext4: Use correct locking for ext4_end_io_nolock()
    ext4: fix race in xattr block allocation path
    ext4: trace punch_hole correctly in ext4_ext_map_blocks
    ext4: clean up AGGRESSIVE_TEST code
    ext4: move variables to their scope
    ext4: fix quota accounting during migration
    ext4: migrate cleanup
    ...

    Linus Torvalds
     

02 Nov, 2011

4 commits


01 Nov, 2011

9 commits


31 Oct, 2011

4 commits

  • Now that we are doing the locking correctly, we need to grab the
    i_completed_io_lock twice per end_io. We can clean this up by
    removing the structure from the i_completed_io_list, and using this
    removal as the locking mechanism to prevent ext4_flush_completed_IO()
    racing against ext4_end_io_work(), instead of clearing the
    EXT4_IO_END_UNWRITTEN in io->flag.

    In addition, if ext4_convert_unwritten_extents() returns an error,
    we no longer keep the end_io structure on the linked list. Keeping
    it there doesn't help, because it tends to lock up the file system
    and wedge the system. That's one way to call attention to the
    problem, but it doesn't help the overall robustness of the system.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
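
The idea of using list removal as the claim on an end_io can be sketched as follows. This is a user-space model under stated assumptions: a pthread mutex stands in for i_completed_io_lock, and `list_node`, `inode_model`, `io_end_model`, and the helper functions are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly-linked list; a node linked to itself is unlinked. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_unlinked(const struct list_node *n) { return n->next == n; }

static void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

struct inode_model {
    pthread_mutex_t completed_io_lock;   /* stand-in for i_completed_io_lock */
    struct list_node completed_io_list;  /* stand-in for i_completed_io_list */
};

struct io_end_model {
    struct list_node list;
    int conversions;                     /* how many times the work ran */
};

static void inode_model_init(struct inode_model *inode)
{
    pthread_mutex_init(&inode->completed_io_lock, NULL);
    list_init(&inode->completed_io_list);
}

static void queue_io_end(struct inode_model *inode, struct io_end_model *io)
{
    io->conversions = 0;
    list_init(&io->list);
    pthread_mutex_lock(&inode->completed_io_lock);
    list_add_tail(&io->list, &inode->completed_io_list);
    pthread_mutex_unlock(&inode->completed_io_lock);
}

/* Whoever unlinks the entry under the lock owns it; later callers back off. */
static bool claim_io_end(struct inode_model *inode, struct io_end_model *io)
{
    bool claimed = false;

    pthread_mutex_lock(&inode->completed_io_lock);
    if (!list_is_unlinked(&io->list)) {
        list_del_init(&io->list);
        claimed = true;
    }
    pthread_mutex_unlock(&inode->completed_io_lock);
    return claimed;
}

/* Both the flush path and the work item would funnel through this. */
static void process_io_end(struct inode_model *inode, struct io_end_model *io)
{
    assert(inode != NULL && io != NULL);
    if (!claim_io_end(inode, io))
        return;                          /* the other path already handled it */
    io->conversions++;                   /* stand-in for the extent conversion */
}
```

In this model the lock is taken once per end_io, and a second caller sees the unlinked node and returns without repeating the work.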
     
  • The usage of waitqueue_active() is not necessary, and introduces (I
    believe) a hard-to-hit race.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • We must hold i_completed_io_lock when manipulating anything on the
    i_completed_io_list linked list. This includes io->lock, which we
    were checking in ext4_end_io_nolock().

    So move this check to ext4_end_io_work(). This also has the bonus of
    avoiding extra work if it is already done without needing to take the
    mutex.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • This creates a new 'reason' field in a wb_writeback_work
    structure, which unambiguously identifies who initiates
    writeback activity. A 'wb_reason' enumeration has been
    added to writeback.h, to enumerate the possible reasons.

    The 'writeback_work_class' tracepoint event class and the
    'writeback_queue_io' tracepoints are updated to include the
    symbolic 'reason' in all trace events.

    The 'writeback_inodes_sbXXX' family of routines has also had
    a 'reason' parameter (of type wb_reason) added, so callers can
    specify why writeback is being started.

    Acked-by: Jan Kara
    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: Wu Fengguang

    Curt Wohlgemuth
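
A rough sketch of how a reason enumeration can travel with a work item and appear symbolically in trace output. The type and function names here (`wb_reason_model`, `wb_work_model`, `start_writeback()`, `format_trace()`) and the particular enumerators are assumptions made for illustration; they are not copied from writeback.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of reasons; the real enumeration lives in writeback.h. */
enum wb_reason_model {
    REASON_BACKGROUND,
    REASON_SYNC,
    REASON_PERIODIC,
    REASON_MAX
};

static const char *const reason_names[REASON_MAX] = {
    "background",
    "sync",
    "periodic",
};

/* Work item carrying the reason alongside the request itself. */
struct wb_work_model {
    long nr_pages;
    enum wb_reason_model reason;
};

static const char *reason_name(enum wb_reason_model reason)
{
    assert(reason >= 0 && reason < REASON_MAX);
    return reason_names[reason];
}

/* Callers state why they start writeback; the reason is stored in the work. */
static struct wb_work_model start_writeback(long nr_pages,
                                            enum wb_reason_model reason)
{
    struct wb_work_model work = { nr_pages, reason };
    return work;
}

/* Trace formatting prints the symbolic name rather than the raw number. */
static int format_trace(char *buf, size_t len, const struct wb_work_model *work)
{
    return snprintf(buf, len, "nr_pages=%ld reason=%s",
                    work->nr_pages, reason_name(work->reason));
}
```

Keeping the name table next to the enum means a new reason needs only one extra enumerator and one extra string.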
     

29 Oct, 2011

7 commits

  • Ceph users reported that when using Ceph on ext4, the filesystem
    would often become corrupted, containing inodes with incorrect
    i_blocks counters.

    I managed to reproduce this with a very hacked-up "streamtest"
    binary from the Ceph tree.

    Ceph is doing a lot of xattr writes, to out-of-inode blocks.
    There is also another thread which does sync_file_range and close,
    of the same files. The problem appears to happen due to this race:

    sync/flush thread                 xattr-set thread
    -----------------                 ----------------

    do_writepages                     ext4_xattr_set
    ext4_da_writepages                ext4_xattr_set_handle
    mpage_da_map_blocks               ext4_xattr_block_set
      set DELALLOC_RESERVE
                                      ext4_new_meta_blocks
                                        ext4_mb_new_blocks
                                          if (!i_delalloc_reserved_flag)
                                            vfs_dq_alloc_block
    ext4_get_blocks
      down_write(i_data_sem)
      set i_delalloc_reserved_flag
      ...
      up_write(i_data_sem)
                                        if (i_delalloc_reserved_flag)
                                          vfs_dq_alloc_block_nofail

    In other words, the sync/flush thread pops in and sets
    i_delalloc_reserved_flag on the inode, which makes the xattr thread
    think that it's in a delalloc path in ext4_new_meta_blocks(),
    and add the block for a second time, after already having added
    it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks().

    The real problem is that we shouldn't be using the DELALLOC_RESERVED
    inode state flag at all; we should be passing
    EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks() instead.
    We'll fix this for now by using i_data_sem to prevent this race, but
    this is really not the right way to fix things.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Eric Sandeen
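
The double accounting can be reproduced in a deterministic, single-threaded model. This is only a sketch: `inode_model`, `alloc_meta_block_racy()`, and `alloc_meta_block_serialized()` are hypothetical, the other thread is modeled as a callback run at a chosen point, and the serialized variant simply defers that callback to mimic a thread blocked on a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct inode_model {
    bool delalloc_reserved;   /* stand-in for i_delalloc_reserved_flag */
    long quota_blocks;        /* blocks charged to quota */
};

/* One step of the "other thread", injected at a chosen point. */
typedef void (*other_thread_step)(struct inode_model *);

/* Models the sync/flush thread setting the flag. */
static void flush_sets_flag(struct inode_model *inode)
{
    inode->delalloc_reserved = true;
}

/* The flag is read twice, and the other thread may run in between. */
static void alloc_meta_block_racy(struct inode_model *inode,
                                  other_thread_step step)
{
    assert(inode != NULL);
    if (!inode->delalloc_reserved)
        inode->quota_blocks++;        /* charged in the allocator */
    if (step)
        step(inode);                  /* flag flips between the two checks */
    if (inode->delalloc_reserved)
        inode->quota_blocks++;        /* charged again by the caller */
}

/* With both checks inside one critical section, the flag cannot change. */
static void alloc_meta_block_serialized(struct inode_model *inode,
                                        other_thread_step step)
{
    assert(inode != NULL);
    /* lock held from here ... */
    if (!inode->delalloc_reserved)
        inode->quota_blocks++;
    if (inode->delalloc_reserved)
        inode->quota_blocks++;
    /* ... to here; the other thread only proceeds afterwards */
    if (step)
        step(inode);
}
```

In the racy variant a single block is charged twice; in the serialized variant exactly one of the two branches fires.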
     
  • When ext4_ext_map_blocks() is called by punch_hole, the trace point
    should record the blocks punched out.

    Signed-off-by: Yongqiang Yang
    Signed-off-by: "Theodore Ts'o"

    Yongqiang Yang
     
  • Signed-off-by: Yongqiang Yang
    Signed-off-by: "Theodore Ts'o"

    Yongqiang Yang
     
  • Signed-off-by: Yongqiang Yang
    Signed-off-by: "Theodore Ts'o"

    Yongqiang Yang
     
  • The tmp_inode should have the same uid/gid as the original inode.
    Otherwise new metadata blocks will be accounted to the wrong quota ID,
    which will result in a quota leak after the inode migration is
    completed.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • This patch cleans up the code a bit; the actual logic is unchanged.
    - Move the current block pointer into migrate_structure, so that all
    walk info is kept in one structure.
    - Get rid of useless null ind-block ptr checks; the caller already
    does that check.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: (21 commits)
    leases: fix write-open/read-lease race
    nfs: drop unnecessary locking in llseek
    ext4: replace cut'n'pasted llseek code with generic_file_llseek_size
    vfs: add generic_file_llseek_size
    vfs: do (nearly) lockless generic_file_llseek
    direct-io: merge direct_io_walker into __blockdev_direct_IO
    direct-io: inline the complete submission path
    direct-io: separate map_bh from dio
    direct-io: use a slab cache for struct dio
    direct-io: rearrange fields in dio/dio_submit to avoid holes
    direct-io: fix a wrong comment
    direct-io: separate fields only used in the submission path from struct dio
    vfs: fix spinning prevention in prune_icache_sb
    vfs: add a comment to inode_permission()
    vfs: pass all mask flags check_acl and posix_acl_permission
    vfs: add hex format for MAY_* flag values
    vfs: indicate that the permission functions take all the MAY_* flags
    compat: sync compat_stats with statfs.
    vfs: add "device" tag to /proc/self/mountstats
    cleanup: vfs: small comment fix for block_invalidatepage
    ...

    Fix up trivial conflict in fs/gfs2/file.c (llseek changes)

    Linus Torvalds
     

28 Oct, 2011

1 commit


27 Oct, 2011

2 commits

  • ext4_ext_insert_extent() (respectively ext4_ext_insert_index())
    was using EXT_MAX_EXTENT() (resp. EXT_MAX_INDEX()) to determine
    how many entries needed to be moved beyond the insertion point.
    In practice this means that (320 - I) * 24 bytes were memmove()'d
    when I is the insertion point, rather than (#entries - I) * 24 bytes.

    This patch uses EXT_LAST_EXTENT() (resp. EXT_LAST_INDEX()) instead
    to only move existing entries. The code flow is also simplified
    slightly to highlight similarities and reduce code duplication in
    the insertion logic.

    This patch reduces system CPU consumption by over 25% on a 4kB
    synchronous append DIO write workload when used with the
    pre-2.6.39 x86_64 memmove() implementation. With the much faster
    2.6.39 memmove() implementation we still see a decrease in
    system CPU usage between 2% and 7%.

    Note that the ext_debug() output changes with this patch, splitting
    some log information between entries. Users of the ext_debug() output
    should note that the "move %d" units changed from reporting the number
    of bytes moved to reporting the number of entries moved.

    Signed-off-by: Eric Gouriou
    Signed-off-by: "Theodore Ts'o"

    Eric Gouriou
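
The difference between moving up to the node's capacity and moving only the existing entries can be sketched with a small sorted array. The names (`entry_model`, `node_model`, `insert_entry()`, `capacity_based_count()`) and the capacity of 8 are assumptions for illustration, not the ext4 structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NODE_CAPACITY 8   /* hypothetical; real nodes hold far more entries */

struct entry_model {
    unsigned int block;
    unsigned int len;
};

struct node_model {
    size_t count;                               /* entries currently in use */
    struct entry_model entries[NODE_CAPACITY];
};

/* Entry count implied by a capacity-based bound, per the (max - I) formula. */
static size_t capacity_based_count(size_t pos)
{
    return NODE_CAPACITY - pos;
}

/*
 * Insert at pos, shifting only the entries that exist beyond it.
 * Returns the number of entries moved, i.e. (count - pos).
 */
static size_t insert_entry(struct node_model *node, size_t pos,
                           struct entry_model e)
{
    size_t to_move;

    assert(node != NULL);
    assert(node->count < NODE_CAPACITY && pos <= node->count);

    to_move = node->count - pos;
    if (to_move > 0)
        memmove(&node->entries[pos + 1], &node->entries[pos],
                to_move * sizeof(node->entries[0]));
    node->entries[pos] = e;
    node->count++;
    return to_move;
}
```

For a node holding three entries, inserting at position 2 moves one entry here, while the capacity-based count for the same position is six.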
     
  • This patch introduces a fast path in ext4_ext_convert_to_initialized()
    for the case when the conversion can be performed by transferring
    the newly initialized blocks from the uninitialized extent into
    an adjacent initialized extent. Doing so removes the expensive
    invocations of memmove() which occur during extent insertion and
    the subsequent merge.

    In practice this should be the common case for clients performing
    append writes into files pre-allocated via
    fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via
    direct IO and when using a suboptimal implementation of memmove()
    (x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU
    consumption by 32%.

    Two new trace points are added to ext4_ext_convert_to_initialized()
    to offer visibility into its operations. No exit trace point has
    been added due to the multiplicity of return points. This can be
    revisited once the upstream cleanup is backported.

    Signed-off-by: Eric Gouriou
    Signed-off-by: "Theodore Ts'o"

    Eric Gouriou
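
The fast path can be illustrated with a simplified extent model. This is a sketch under assumptions: `extent_model` and `try_fast_convert()` are hypothetical, only a write at the start of the uninitialized extent is handled, and limits such as a maximum extent length are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct extent_model {
    unsigned int lblk;          /* first logical block */
    unsigned long long pblk;    /* first physical block */
    unsigned int len;           /* length in blocks */
    bool initialized;
};

/*
 * If the write covers the head of an uninitialized extent and the
 * previous extent is initialized and contiguous with it (logically and
 * physically), move the written blocks into the previous extent.
 * No entry is inserted, so nothing has to be shifted with memmove().
 */
static bool try_fast_convert(struct extent_model *prev,
                             struct extent_model *cur,
                             unsigned int write_lblk,
                             unsigned int write_len)
{
    assert(prev != NULL && cur != NULL);

    if (!prev->initialized || cur->initialized)
        return false;
    if (write_lblk != cur->lblk || write_len == 0 || write_len >= cur->len)
        return false;
    if (prev->lblk + prev->len != cur->lblk)
        return false;                       /* not logically adjacent */
    if (prev->pblk + prev->len != cur->pblk)
        return false;                       /* not physically adjacent */

    prev->len += write_len;                 /* grow the initialized extent */
    cur->lblk += write_len;                 /* shrink the uninitialized one */
    cur->pblk += write_len;
    cur->len -= write_len;
    return true;
}
```

When the function returns false, a caller would fall back to the general split-and-insert path, which this sketch does not model.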
     

26 Oct, 2011

5 commits