11 Jan, 2012

1 commit


10 Jan, 2012

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    ext2/3/4: delete unneeded includes of module.h
    ext{3,4}: Fix potential race when setversion ioctl updates inode
    udf: Mark LVID buffer as uptodate before marking it dirty
    ext3: Don't warn from writepage when readonly inode is spotted after error
    jbd: Remove j_barrier mutex
    reiserfs: Force inode evictions before umount to avoid crash
    reiserfs: Fix quota mount option parsing
    udf: Treat symlink component of type 2 as /
    udf: Fix deadlock when converting file from in-ICB one to normal one
    udf: Cleanup calling convention of inode_getblk()
    ext2: Fix error handling on inode bitmap corruption
    ext3: Fix error handling on inode bitmap corruption
    ext3: replace ll_rw_block with other functions
    ext3: NULL dereference in ext3_evict_inode()
    jbd: clear revoked flag on buffers before a new transaction started
    ext3: call ext3_mark_recovery_complete() when recovery is really needed

    Linus Torvalds
     

09 Jan, 2012

2 commits

  • Delete any instances of include module.h that were not strictly
    required. In the case of ext2, the MODULE_LICENSE etc. declarations
    were in inode.c but module_init/exit were in super.c, so
    relocate the MODULE_LICENSE/AUTHOR block to super.c, which also
    makes ext2 consistent with ext3 and ext4.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Jan Kara

    Paul Gortmaker
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
    Kconfig: acpi: Fix typo in comment.
    misc latin1 to utf8 conversions
    devres: Fix a typo in devm_kfree comment
    btrfs: free-space-cache.c: remove extra semicolon.
    fat: Spelling s/obsolate/obsolete/g
    SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
    tools/power turbostat: update fields in manpage
    mac80211: drop spelling fix
    types.h: fix comment spelling for 'architectures'
    typo fixes: aera -> area, exntension -> extension
    devices.txt: Fix typo of 'VMware'.
    sis900: Fix enum typo 'sis900_rx_bufer_status'
    decompress_bunzip2: remove invalid vi modeline
    treewide: Fix comment and string typo 'bufer'
    hyper-v: Update MAINTAINERS
    treewide: Fix typos in various parts of the kernel, and fix some comments.
    clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
    gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
    leds: Kconfig: Fix typo 'D2NET_V2'
    sound: Kconfig: drop unknown symbol ARCH_CLPS7500
    ...

    Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
    kconfig additions, close to removed commented-out old ones)

    Linus Torvalds
     

05 Jan, 2012

1 commit


29 Dec, 2011

4 commits


14 Dec, 2011

3 commits

  • If a page has been read into memory and never been written, it has no
    buffers, but truncate and punch hole must still handle such a page.

    The VFS write path already handles holes correctly, so this
    patch removes the hole-handling code from the write operations.

    Signed-off-by: Yongqiang Yang
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Yongqiang Yang
     
  • If there is an unwritten but clean buffer in a page and there is a
    dirty buffer after the buffer, then mpage_submit_io does not write the
    dirty buffer out. As a result, da_writepages loops forever.

    This patch fixes the problem by checking the dirty flag.

    Signed-off-by: Yongqiang Yang
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Yongqiang Yang
     
  • If the pte mapping in generic_perform_write() is unmapped between
    iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
    "copied" parameter to ->write_end can be zero. ext4 could not cope with
    that when delayed allocation is enabled. This patch skips the i_disksize
    enlargement logic if copied is zero and no new data was appended to
    the inode.

    gdb> bt
    #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
    08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
    #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
    xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
    #2 0xffffffff810d97f1 in generic_perform_write (iocb=, iov=, nr_segs=, pos=0x108000, ppos=0xffff88001e26be40, count=, written=0x0) at mm/filemap.c:2440
    #3 generic_file_buffered_write (iocb=, iov=, nr_segs=, p\
    os=0x108000, ppos=0xffff88001e26be40, count=, written=0x0) at mm/filemap.c:2482
    #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
    xffff88001e26be40) at mm/filemap.c:2600
    #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=, pos=) at mm/filemap.c:2632
    #6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
    t fs/ext4/file.c:136
    #7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=, len=, \
    ppos=0xffff88001e26bf48) at fs/read_write.c:406
    #8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960, count=0x4\
    000, pos=0xffff88001e26bf48) at fs/read_write.c:435
    #9 0xffffffff8113816c in sys_write (fd=, buf=0x1ec2960, count=0x\
    4000) at fs/read_write.c:487
    #10
    #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
    #12 0x0000000000000000 in ?? ()
    gdb> print offset
    $22 = 0xffffffffffffffff
    gdb> print idx
    $23 = 0xffffffff
    gdb> print inode->i_blkbits
    $24 = 0xc
    gdb> up
    #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
    xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
    2512 if (ext4_da_should_update_i_disksize(page, end)) {
    gdb> print start
    $25 = 0x0
    gdb> print end
    $26 = 0xffffffffffffffff
    gdb> print pos
    $27 = 0x108000
    gdb> print new_i_size
    $28 = 0x108000
    gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
    $29 = 0xd9000
    gdb> down
    2467 for (i = 0; i < idx; i++)
    gdb> print i
    $30 = 0xd44acbee

    This is 100% reproducible with some autonuma development code tuned in
    a very aggressive manner (not the normal way, even for knumad) which makes
    "exotic" changes to the ptes. It wouldn't normally trigger, but I don't
    see why it can't happen normally if the page is added to swap cache in
    between the two faults, leading to "copied" being zero (which then
    hangs in ext4). So it should be fixed. It is especially possible with lumpy
    reclaim (albeit disabled if compaction is enabled), as that would
    ignore the young bits in the ptes.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Andrea Arcangeli
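
The gdb values above (start = 0x0, copied = 0x0, end = 0xffffffffffffffff, idx = 0xffffffff, i_blkbits = 0xc) are consistent with an unsigned wrap-around when the end offset is computed as start + copied - 1. The following is a minimal userspace sketch of that arithmetic, not the kernel code: the function names and the 4 KiB page size are assumptions made for illustration, and the guard is only a hypothetical rendering of "skip the i_disksize enlargement logic if copied is zero".

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB page, matching len=0x1000 in the trace. */
#define EX_PAGE_SIZE 4096u

/* Offset of the last byte written within the page, computed as
 * start + copied - 1 in unsigned 64-bit arithmetic. */
uint64_t ex_last_byte_in_page(uint64_t pos, uint64_t copied)
{
    uint64_t start = pos & (EX_PAGE_SIZE - 1);

    /* Wraps to UINT64_MAX when start == 0 and copied == 0. */
    return start + copied - 1;
}

/* Index of the buffer within the page that contains the given offset.
 * The result is truncated to unsigned int, as idx appears to be in the
 * trace. */
unsigned int ex_buffer_index(uint64_t offset, unsigned int blkbits)
{
    assert(blkbits < 64);
    return (unsigned int)(offset >> blkbits);
}

/* Hypothetical guard: only consider enlarging the on-disk size when
 * some data was actually copied and the new size exceeds it. */
int ex_should_consider_update(uint64_t copied, uint64_t new_size,
                              uint64_t disksize)
{
    return copied != 0 && new_size > disksize;
}
```

With pos = 0x108000 and copied = 0, the sketch reproduces the end offset and buffer index printed by gdb, which is why a loop of the form `for (i = 0; i < idx; i++)` runs for an enormous number of iterations. With the same inputs the guard returns 0, so the walk is never started.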
     

12 Dec, 2011

1 commit

  • We need to make sure iocb->private is cleared *before* we put the
    io_end structure on i_completed_io_list. Otherwise fsync() could
    potentially run on another CPU and free the iocb structure out from
    under us.

    Reported-by: Kent Overstreet
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Theodore Ts'o
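
The ordering rule in this commit (detach from the iocb first, publish the io_end second) can be modeled in a few lines of plain C. This is a single-threaded sketch under stated assumptions: the struct and function names are hypothetical stand-ins, and the locking that protects the real list is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named in the text. */
struct ex_io_end {
    struct ex_io_end *next;
};

struct ex_iocb {
    void *private;
};

struct ex_completed_list {
    struct ex_io_end *head;
};

/* Clear iocb->private before the io_end becomes visible on the
 * completed list. Once the io_end is published, another CPU may
 * process it and the iocb may be freed, so the iocb must not be
 * touched after that point. */
void ex_finish_dio(struct ex_iocb *iocb, struct ex_io_end *io_end,
                   struct ex_completed_list *list)
{
    assert(iocb != NULL && io_end != NULL && list != NULL);

    iocb->private = NULL;        /* last access to the iocb */

    io_end->next = list->head;   /* publish: push onto the list */
    list->head = io_end;
}
```

The point of the sketch is only the statement order inside ex_finish_dio(): reversing the two steps would reintroduce a window in which the iocb is written after it may already have been released.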
     

06 Dec, 2011

1 commit


02 Dec, 2011

1 commit


25 Nov, 2011

1 commit

  • ext4_end_io_dio() queues io_end->work and then clears iocb->private;
    however, io_end->work calls aio_complete(), which frees the iocb
    object. If that slab object gets reallocated, then ext4_end_io_dio()
    can end up clearing someone else's iocb->private. This use-after-free
    can cause a leak of an ext4_io_end_t structure.

    Detected and tested with slab poisoning.

    [ Note: Can also reproduce using 12 fio's against 12 file systems with the
    following configuration file:

    [global]
    direct=1
    ioengine=libaio
    iodepth=1
    bs=4k
    ba=4k
    size=128m

    [create]
    filename=${TESTDIR}
    rw=write

    -- tytso ]

    Google-Bug-Id: 5354697
    Signed-off-by: Tejun Heo
    Signed-off-by: "Theodore Ts'o"
    Reported-by: Kent Overstreet
    Tested-by: Kent Overstreet
    Cc: stable@kernel.org

    Tejun Heo
     

22 Nov, 2011

1 commit


08 Nov, 2011

1 commit


07 Nov, 2011

1 commit

  • * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Add a 'reason' to wb_writeback_work
    writeback: send work item to queue_io, move_expired_inodes
    writeback: trace event balance_dirty_pages
    writeback: trace event bdi_dirty_ratelimit
    writeback: fix ppc compile warnings on do_div(long long, unsigned long)
    writeback: per-bdi background threshold
    writeback: dirty position control - bdi reserve area
    writeback: control dirty pause time
    writeback: limit max dirty pause time
    writeback: IO-less balance_dirty_pages()
    writeback: per task dirty rate limit
    writeback: stabilize bdi->dirty_ratelimit
    writeback: dirty rate control
    writeback: add bg_threshold parameter to __bdi_update_bandwidth()
    writeback: dirty position control
    writeback: account per-bdi accumulated dirtied pages

    Linus Torvalds
     

03 Nov, 2011

2 commits

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue:
    vfs: add d_prune dentry operation
    vfs: protect i_nlink
    filesystems: add set_nlink()
    filesystems: add missing nlink wrappers
    logfs: remove unnecessary nlink setting
    ocfs2: remove unnecessary nlink setting
    jfs: remove unnecessary nlink setting
    hypfs: remove unnecessary nlink setting
    vfs: ignore error on forced remount
    readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
    vfs: fix dentry leak in simple_fill_super()

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits)
    jbd2: Unify log messages in jbd2 code
    jbd/jbd2: validate sb->s_first in journal_get_superblock()
    ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined
    ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled
    ext4: fix a typo in struct ext4_allocation_context
    ext4: Don't normalize an falloc request if it can fit in 1 extent.
    ext4: remove comments about extent mount option in ext4_new_inode()
    ext4: let ext4_discard_partial_buffers handle unaligned range correctly
    ext4: return ENOMEM if find_or_create_pages fails
    ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()
    ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten
    ext4: optimize locking for end_io extent conversion
    ext4: remove unnecessary call to waitqueue_active()
    ext4: Use correct locking for ext4_end_io_nolock()
    ext4: fix race in xattr block allocation path
    ext4: trace punch_hole correctly in ext4_ext_map_blocks
    ext4: clean up AGGRESSIVE_TEST code
    ext4: move variables to their scope
    ext4: fix quota accounting during migration
    ext4: migrate cleanup
    ...

    Linus Torvalds
     

02 Nov, 2011

1 commit


01 Nov, 2011

5 commits


31 Oct, 2011

1 commit

  • This creates a new 'reason' field in a wb_writeback_work
    structure, which unambiguously identifies who initiates
    writeback activity. A 'wb_reason' enumeration has been
    added to writeback.h, to enumerate the possible reasons.

    The 'writeback_work_class' tracepoint event class and the
    'writeback_queue_io' tracepoint are updated to include the
    symbolic 'reason' in all trace events.

    The 'writeback_inodes_sbXXX' family of routines has also had
    a wb_reason parameter added, so callers can specify
    why writeback is being started.

    Acked-by: Jan Kara
    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: Wu Fengguang

    Curt Wohlgemuth
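
As a rough illustration of the pattern this commit describes (an enumeration of reasons, a 'reason' field in the work item, and a symbolic name for tracing), here is a small sketch. All identifiers are hypothetical and prefixed with ex_; the enumerators shown are examples, not the list defined in writeback.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reasons for starting writeback. */
enum ex_wb_reason {
    EX_WB_REASON_BACKGROUND,
    EX_WB_REASON_SYNC,
    EX_WB_REASON_PERIODIC,
    EX_WB_REASON_MAX
};

/* Symbolic names, indexed by reason, for use in trace output. */
static const char *const ex_wb_reason_name[EX_WB_REASON_MAX] = {
    [EX_WB_REASON_BACKGROUND] = "background",
    [EX_WB_REASON_SYNC]       = "sync",
    [EX_WB_REASON_PERIODIC]   = "periodic",
};

/* Hypothetical work item carrying the reason it was queued. */
struct ex_writeback_work {
    long nr_pages;
    enum ex_wb_reason reason;
};

/* Return the symbolic name for a work item's reason, or "unknown"
 * if the value is outside the table. */
const char *ex_work_reason_name(const struct ex_writeback_work *work)
{
    assert(work != NULL);
    if ((unsigned int)work->reason >= EX_WB_REASON_MAX)
        return "unknown";
    return ex_wb_reason_name[work->reason];
}
```

Carrying the reason in the work item, rather than inferring it at trace time, is what lets every trace event name its initiator unambiguously.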
     

26 Oct, 2011

1 commit


25 Oct, 2011

1 commit

  • EOFBLOCKS_FL should be updated if fallocate is called without
    FALLOC_FL_KEEP_SIZE. Currently that happens only if a new extent was
    allocated.

    TESTCASE:
    fallocate test_file -n -l4096
    fallocate test_file -l4096
    The last fallocate command updates the size but keeps EOFBLOCKS_FL
    set, and fsck will complain about that.

    Also remove the ping-pong in ext4_fallocate() in the case of new extents,
    where ext4_ext_map_blocks() clears the EOFBLOCKS bit and
    ext4_falloc_update_inode() later restores it.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
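
The two-command test case above can also be driven from C through fallocate(2). The sketch below is an assumption-laden equivalent of the shell commands, with a hypothetical function name; it reports only the file size after each call. The EOFBLOCKS flag itself is not visible through stat, so checking it would still need fsck as the commit message describes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Preallocate 4096 bytes twice on the same file: first keeping the
 * file size, then letting the size grow. Returns 0 on success and -1
 * if any step fails (for example, if the filesystem does not support
 * fallocate). */
int ex_run_fallocate_testcase(const char *path, off_t *size_keep,
                              off_t *size_plain)
{
    struct stat st;
    int rc = -1;
    int fd;

    assert(path != NULL && size_keep != NULL && size_plain != NULL);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Equivalent of: fallocate test_file -n -l4096 */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 4096) != 0)
        goto out;
    if (fstat(fd, &st) != 0)
        goto out;
    *size_keep = st.st_size;

    /* Equivalent of: fallocate test_file -l4096 */
    if (fallocate(fd, 0, 0, 4096) != 0)
        goto out;
    if (fstat(fd, &st) != 0)
        goto out;
    *size_plain = st.st_size;

    rc = 0;
out:
    close(fd);
    return rc;
}
```

On a filesystem that supports both modes, the size is expected to stay at 0 after the first call and become 4096 after the second; no new extent is needed for the second call, which is the situation the commit addresses.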
     

21 Oct, 2011

2 commits


18 Oct, 2011

1 commit

  • Add a block plug for ext4 .writepages. Though ext4 .writepages
    already merges requests very well, the block plug is still
    helpful to reduce block lock contention.

    Signed-off-by: Shaohua Li
    Signed-off-by: "Theodore Ts'o"

    Shaohua Li
     

09 Oct, 2011

1 commit


22 Sep, 2011

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-block:
    floppy: use del_timer_sync() in init cleanup
    blk-cgroup: be able to remove the record of unplugged device
    block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request
    mm: Add comment explaining task state setting in bdi_forker_thread()
    mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread()
    block: simplify force plug flush code a little bit
    block: change force plug flush call order
    block: Fix queue_flag update when rq_affinity goes from 2 to 1
    block: separate priority boosting from REQ_META
    block: remove READ_META and WRITE_META
    xen-blkback: fixed indentation and comments
    xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.

    Linus Torvalds
     

10 Sep, 2011

5 commits

  • Currently, there is a race between delayed-allocation writes and
    writeback when the bigalloc feature is in use. The race arose because we
    determined which blocks in a cluster are under delayed allocation
    by checking buffer_delay(bh), but the writeback code path clears
    this bit without any synchronization, which resulted in a race and
    an ext4 warning similar to:

    EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0
    reserved data blocks

    The race existed in two places:
    (1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from
    the writeback code path;
    (2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where
    buffer_delay(bh) is set).

    To fix (1), this patch introduces a new buffer_head state bit,
    BH_Da_Mapped. This bit is set under the protection of
    EXT4_I(inode)->i_data_sem when we have actually mapped the delayed
    allocated blocks during writeout. We can now reliably check
    for this bit inside ext4_find_delalloc_range() to determine whether
    the reservation for the blocks has already been claimed.

    To fix (2), it was necessary to set buffer_delay(bh) under the
    protection of i_data_sem. So, I extracted the very beginning of
    ext4_map_blocks() into a new function, ext4_da_map_blocks(), and
    performed the required setting of the BH_Delay bit and the quota
    reservation under the protection of i_data_sem. These two fixes make
    the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent,
    thus removing the race.

    Tested: I was able to reproduce the problem by running 'dd' and
    'fsync' in parallel. Also, xfstests sometimes used to reproduce this
    race. After the fix both my test and xfstests were successful and no
    race (warning message) was observed.

    Google-Bug-Id: 4997027

    Signed-off-by: Aditya Kali
    Signed-off-by: "Theodore Ts'o"

    Aditya Kali
     
  • This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in
    ext4/inode.c.

    Tested: Built and ran the kernel and verified that these tracepoints work.
    Also ran xfstests.

    Signed-off-by: Aditya Kali
    Signed-off-by: "Theodore Ts'o"

    Aditya Kali
     
  • Rename the function so it is more clear what is going on. Also rename
    the various variables so it's clearer what's happening.

    Also fix a missing blocks to cluster conversion when reading the
    number of reserved blocks for root.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • This function really claims a number of free clusters, not blocks, so
    rename it so it's clearer what's going on.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • This function really counts the free clusters reported in the block
    group descriptors, so rename it to reduce confusion.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o