15 May, 2013

1 commit

  • Pull ext4 update from Ted Ts'o:
    "Fixed regressions (two stability regressions and a performance
    regression) introduced during the 3.10-rc1 merge window.

    Also included is a bug fix relating to allocating blocks after
    resizing an ext3 file system when using the ext4 file system driver"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    jbd,jbd2: fix oops in jbd2_journal_put_journal_head()
    ext4: revert "ext4: use io_end for multiple bios"
    ext4: limit group search loop for non-extent files
    ext4: fix fio regression

    Linus Torvalds
     

12 May, 2013

1 commit


08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

12 Apr, 2013

3 commits

  • Currently no one clears the buffer_uninit flag. As a result, writeback
    needlessly marks io_end as needing extent conversion and scans the extent
    tree for extents to convert. So clear the buffer_uninit flag once the
    buffer is submitted for IO and the flag has been transformed into the
    EXT4_IO_END_UNWRITTEN flag.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu

    Jan Kara
     
  • Change writeback path to create just one io_end structure for the
    extent to which we submit IO and share it among bios writing that
    extent. This prevents needless splitting and joining of unwritten
    extents when they cannot be submitted as a single bio.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Dmitry Monakhov
    Reviewed-by: Zheng Liu

    Jan Kara
     
  • So far ext4_bio_write_page() attached all the pages to ext4_io_end
    structure. This makes that structure pretty heavy (1 KB for pointers
    + 16 bytes per page attached to the bio). Also later we would like to
    share ext4_io_end structure among several bios in case IO to a single
    extent needs to be split among several bios and pointing to pages from
    ext4_io_end makes this complex.

    We remove page pointers from ext4_io_end and use pointers from bio
    itself instead. This isn't as easy when blocksize < pagesize because
    then we can have several bios in flight for a single page and we have
    to be careful when to call end_page_writeback(). However this is a
    known problem already solved by block_write_full_page() /
    end_buffer_async_write() so we mimic its behavior here. We mark
    buffers going to disk with BH_Async_Write flag and in
    ext4_bio_end_io() we check whether there are any buffers with
    BH_Async_Write flag left. If there are not, we can call
    end_page_writeback().

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Dmitry Monakhov
    Reviewed-by: Zheng Liu

    Jan Kara
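
The completion rule in the commit above (call end_page_writeback() only once no buffer on the page is still marked BH_Async_Write) can be modeled in user space. This is a minimal sketch, not the kernel code: `page_model`, `submit_buffer`, `complete_buffer` and the value of `BUFFERS_PER_PAGE` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define BUFFERS_PER_PAGE 4 /* assumed: e.g. 1 KiB blocks in a 4 KiB page */

/* Toy stand-in for a page and its buffer heads; not the kernel structures. */
struct page_model {
    bool async_write[BUFFERS_PER_PAGE]; /* models BH_Async_Write per buffer */
    bool writeback;                     /* models PageWriteback */
};

/* Mark one buffer as going to disk; the page is now under writeback. */
void submit_buffer(struct page_model *pg, int idx)
{
    assert(idx >= 0 && idx < BUFFERS_PER_PAGE);
    pg->async_write[idx] = true;
    pg->writeback = true;
}

/*
 * Completion for one buffer, possibly from a different bio than its
 * neighbours. Returns true if this call ended writeback on the page.
 */
bool complete_buffer(struct page_model *pg, int idx)
{
    assert(idx >= 0 && idx < BUFFERS_PER_PAGE);
    pg->async_write[idx] = false;
    for (int i = 0; i < BUFFERS_PER_PAGE; i++)
        if (pg->async_write[i])
            return false;  /* another bio still has a buffer in flight */
    pg->writeback = false; /* models end_page_writeback() */
    return true;
}
```

With two buffers of one page submitted in separate bios, only the second completion ends writeback, which is the behavior the commit borrows from block_write_full_page() / end_buffer_async_write().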
     

20 Mar, 2013

1 commit

  • Commit 84c17543ab56 (ext4: move work from io_end to inode) triggered a
    regression when running xfstest #270 when the file system is mounted
    with dioread_nolock.

    The problem is that after ext4_evict_inode() calls ext4_ioend_wait(),
    this guarantees that last io_end structure has been freed, but it does
    not guarantee that the workqueue structure, which was moved into the
    inode by commit 84c17543ab56, is actually finished. Once
    ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this
    will allow ext4_ioend_wait() to return on CPU #2, at which point the
    evict_inode() codepath can race against the workqueue code on CPU #1
    accessing EXT4_I(inode)->i_unwritten_work to find the next item of
    work to do.

    Fix this by calling cancel_work_sync() in ext4_ioend_wait(), which
    will be renamed ext4_ioend_shutdown(), since it is only used by
    ext4_evict_inode(). Also, move the call to ext4_ioend_shutdown()
    until after truncate_inode_pages() and filemap_write_and_wait() are
    called, to make sure all dirty pages have been written back and
    flushed from the page cache first.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] cwq_activate_delayed_work+0x3b/0x7e
    *pdpt = 0000000030bc3001 *pde = 0000000000000000
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    Modules linked in:
    Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c1754-dirty #91 Bochs Bochs
    EIP: 0060:[] EFLAGS: 00010046 CPU: 0
    EIP is at cwq_activate_delayed_work+0x3b/0x7e
    EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000
    ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff0ff0 DR7: 00000400
    Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000)
    Stack:
    f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d
    f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0
    f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000
    Call Trace:
    [] cwq_dec_nr_in_flight+0x71/0xfb
    [] process_one_work+0x5d8/0x637
    [] ? ext4_end_bio+0x300/0x300
    [] worker_thread+0x249/0x3ef
    [] kthread+0xd8/0xeb
    [] ? manage_workers+0x4bb/0x4bb
    [] ? trace_hardirqs_on+0x27/0x37
    [] ret_from_kernel_thread+0x1b/0x28
    [] ? __init_kthread_worker+0x71/0x71
    Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff 13 89 f0 83 05 b8 ff 6c c1
    6c c1 00 31 c9 83
    EIP: [] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84
    CR2: 0000000000000000
    ---[ end trace a1923229da53d8a4 ]---

    Signed-off-by: "Theodore Ts'o"
    Cc: Jan Kara

    Theodore Ts'o
     

30 Jan, 2013

1 commit

  • A running AIO pins the inode in memory through a file reference. Once the
    AIO is completed using aio_complete(), the file reference is put and the
    inode can be freed from memory. So we have to be sure that calling
    aio_complete() is the last thing we do with the inode.

    CC: stable@vger.kernel.org
    Reviewed-by: Carlos Maiolino
    Acked-by: Jeff Moyer
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

29 Jan, 2013

2 commits

  • Remove unused variable flags from dump_completed_IO(). The code is
    only exercised when EXT4FS_DEBUG is defined.

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu

    Lukas Czerner
     
  • So far ext4_bio_writepage() unconditionally cleared dirty bit on all
    buffers underlying the page. That implicitly assumes we can write all
    buffers. So far that is true because callers of
    ext4_bio_writepage() make sure all buffers in the page are mapped but:

    a) it's a data corruption bug waiting to happen
    b) in data=ordered mode when blocksize < pagesize we do need to write
    pages that may have only some of dirty buffers mapped.

    So change ext4_bio_writepage() to skip buffers that cannot be written without
    clearing their dirty bit.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
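
The change described above (skip buffers that cannot be written, and leave their dirty bit set) can be sketched as a loop over a page's buffers. This is an illustrative user-space model only; `bh_model`, `write_page_buffers` and `NR_BUFFERS` are hypothetical and do not correspond to kernel definitions.

```c
#include <stdbool.h>

#define NR_BUFFERS 4 /* assumed number of buffers per page */

/* Toy buffer head: only the two bits the commit message talks about. */
struct bh_model {
    bool dirty;
    bool mapped;
};

/*
 * Walk the buffers of one page and "submit" those that can be written.
 * Returns the number submitted. Unmapped buffers keep their dirty bit,
 * so a later writeback pass can still pick them up.
 */
int write_page_buffers(struct bh_model bufs[NR_BUFFERS])
{
    int submitted = 0;

    for (int i = 0; i < NR_BUFFERS; i++) {
        if (!bufs[i].dirty || !bufs[i].mapped)
            continue;          /* cannot write it now: leave dirty bit alone */
        bufs[i].dirty = false; /* cleared only for buffers going to disk */
        submitted++;
    }
    return submitted;
}
```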
     

28 Jan, 2013

4 commits

  • The function splices i_completed_io_list to its private list
    first. From that moment on we don't need any lock for working with
    io_end structures because all io_end structures on the list are
    ours alone. So we can remove the other two lists in the function and free
    io_end immediately after we are done with it.

    CC: Dmitry Monakhov
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
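
The splice-then-process pattern from the commit above can be sketched as follows. This is a single-threaded toy model under stated assumptions: a counter stands in for the spinlock, and `io_end_model`, `inode_model`, `splice_completed` and `flush_completed` are hypothetical names, not the kernel's.

```c
#include <stddef.h>

/* Toy io_end record on a singly linked list; not the kernel's ext4_io_end_t. */
struct io_end_model {
    struct io_end_model *next;
    int done;
};

/* Toy inode: a shared list plus a counter standing in for the spinlock. */
struct inode_model {
    struct io_end_model *completed; /* models i_completed_io_list */
    int lock_acquisitions;          /* how often the "lock" was taken */
};

/* Detach the whole shared list in one locked step and return it. */
struct io_end_model *splice_completed(struct inode_model *inode)
{
    struct io_end_model *private_list;

    inode->lock_acquisitions++; /* lock */
    private_list = inode->completed;
    inode->completed = NULL;
    /* unlock */
    return private_list;
}

/* Process every spliced entry without touching the lock again. */
int flush_completed(struct inode_model *inode)
{
    struct io_end_model *io = splice_completed(inode);
    int processed = 0;

    while (io) {
        struct io_end_model *next = io->next;

        io->done = 1;   /* stands in for conversion followed by freeing */
        io->next = NULL;
        io = next;
        processed++;
    }
    return processed;
}
```

However many entries are queued, the lock is taken once per flush, because after the splice the entries are reachable only through the private list.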
     
  • It does not make much sense to have struct work in ext4_io_end_t
    because we always use it for only one ext4_io_end_t per inode (the
    first one in the i_completed_io list). So just move the structure to
    inode itself. This also allows for a small simplification in
    processing io_end structures.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • When we cannot write a page we should use redirty_page_for_writepage()
    instead of plain set_page_dirty(). That tells writeback code we have
    problems, redirties only the page (redirtying buffers is not needed),
    and updates mm accounting of failed page writes.

    Also move clearing of buffer dirty flag after io_submit_add_bh(). At that
    moment we are sure buffer will be going to disk.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • Currently we sometimes use block_write_full_page() and sometimes
    ext4_bio_write_page() for writeback (depending on mount options and call
    path). Let's always use ext4_bio_write_page() to simplify things a bit.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

29 Nov, 2012

1 commit

  • Previously, ext4_extents.h was being included at the end of ext4.h,
    which was bad for a number of reasons: (a) it was not being included
    in the expected place, and (b) it caused the header to be included
    multiple times. There were #ifdef's to prevent this from causing any
    problems, but it still was unnecessary.

    By moving the function declarations that were in ext4_extents.h to
    ext4.h, which is standard practice for where the function declarations
    for the rest of ext4.h can be found, we can remove ext4_extents.h from
    being included in ext4.h at all, and then we can only include
    ext4_extents.h where it is needed in ext4's source files.

    It should be possible to move a few more things into ext4.h, and
    further reduce the number of source files that need to #include
    ext4_extents.h, but that's a cleanup for another day.

    Reported-by: Sachin Kamat
    Reported-by: Wei Yongjun
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

09 Nov, 2012

1 commit


05 Oct, 2012

1 commit

  • BUG #1) All places where we call ext4_flush_completed_IO are broken
    because buffered io and DIO/AIO go through three stages:
    1) submitted io,
    2) completed io (in i_completed_io_list), conversion pending,
    3) finished io (conversion done).
    By calling ext4_flush_completed_IO we flush only
    requests that were in stage (2), which is wrong because:
    1) punch_hole and truncate _must_ wait for all outstanding unwritten io
    regardless of its state.
    2) fsync and nolock_dio_read should also wait because there is
    a time window between end_page_writeback() and ext4_add_complete_io().
    As a result, integrity fsync is broken in case of a buffered write
    to a fallocated region:

    fsync                                   blkdev_completion
    ->filemap_write_and_wait_range
                                            ->ext4_end_bio
                                            ->end_page_writeback
    ext4_flush_completed_IO
      sees empty i_completed_io_list but
      pended conversion still exist
                                            ->ext4_add_complete_io

    BUG #2) Race window becomes wider due to the 'ext4: completed_io
    locking cleanup V4' patch series

    This patch makes the following changes:
    1) ext4_flush_completed_io() now first tries to flush completed io and then
    waits for any outstanding unwritten io via ext4_unwritten_wait()
    2) Rename the function to a more appropriate name.
    3) Assert that all callers of ext4_flush_unwritten_io hold i_mutex to
    prevent endless wait

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Jan Kara

    Dmitry Monakhov
     

29 Sep, 2012

3 commits

  • The current unwritten extent conversion state machine is very fuzzy.
    - For an unknown reason it performs conversion under i_mutex. What for?
    My diagnosis:
    We already protect the extent tree with i_data_sem, and truncate and punch_hole
    should wait for DIO, so the only data we have to protect is the end_io->flags
    modification. But only flush_completed_IO and end_io_work modify these
    flags, and we can serialize them via i_completed_io_lock.

    Currently all these games with mutex_trylock result in the following deadlock
    truncate:                        kworker:
    ext4_setattr                     ext4_end_io_work
    mutex_lock(i_mutex)
    inode_dio_wait(inode) ->BLOCK
    DEADLOCK

    flags modification is protected by ei->ext4_complete_io_lock

    Full list of changes:
    - Move all completion end_io related routines to page-io.c in order to improve
    logic locality
    - Move open coded logic from various xx_end_xx routines to ext4_add_complete_io()
    - remove EXT4_IO_END_FSYNC
    - Improve SMP scalability by removing useless i_mutex which does not
    protect io->flags anymore.
    - Reduce lock contention on i_completed_io_lock by optimizing list walk.
    - Rename ext4_end_io_nolock to ext4_end_io and make it static
    - Check flush completion status in ext4_ext_punch_hole(), because it is
    not a good idea to punch blocks from a corrupted inode.

    Changes since V3 (in request to Jan's comments):
    Fall back to active flush_completed_IO() approach in order to prevent
    performance issues with nolocked DIO reads.
    Changes since V2:
    Fix use-after-free caused by race truncate vs end_io_work

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • ext4_set_io_unwritten_flag() will increment the i_unwritten counter, so
    once we mark end_io with EXT4_IO_END_UNWRITTEN we have to revert it
    on the error path.

    - add missing error checks to prevent counter leakage
    - ext4_end_io_nolock() will clear the EXT4_IO_END_UNWRITTEN flag to signal
    that conversion has finished.
    - add BUG_ON to ext4_free_io_end() to prevent similar leakage in future.

    The visible effect of this bug is that unaligned aio_stress may deadlock

    Reviewed-by: Jan Kara
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • The AIO/DIO prefix is wrong because it accounts for unwritten extents,
    which may also be scheduled from buffered write endio

    Reviewed-by: Jan Kara
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     

30 Mar, 2012

1 commit

  • This reverts commit b43d17f319f2c502b17139d1cf70731b2b62c644.

    Dave Jones reports that it causes lockups on his laptop, and his debug
    output showed a lot of processes hung waiting for page_writeback (or
    more commonly - processes hung waiting for a lock that was held during
    that writeback wait).

    The page_writeback hint made Ted suggest that Dave look at this commit,
    and Dave verified that reverting it makes his problems go away.

    Ted says:
    "That commit fixes a race which is seen when you write into fallocated
    (and hence uninitialized) disk blocks under *very* heavy memory
    pressure. Furthermore, although theoretically it could trigger under
    normal direct I/O writes, it only seems to trigger if you are issuing
    a huge number of AIO writes, such that a just-written page can get
    evicted from memory, and then read back into memory, before the
    workqueue has a chance to update the extent tree.

    This race has been around for a little over a year, and no one noticed
    until two months ago; it only happens under fairly exotic conditions,
    and in fact even after trying very hard to create a simple repro under
    lab conditions, we could only reproduce the problem and confirm the
    fix on production servers running MySQL on very fast PCIe-attached
    flash devices.

    Given that Dave was able to hit this problem pretty quickly, if we
    confirm that this commit is at fault, the only reasonable thing to do
    is to revert it IMO."

    Reported-and-tested-by: Dave Jones
    Acked-by: Theodore Ts'o
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

05 Mar, 2012

2 commits

  • We can clear PageWriteback on each page when the IO
    completes, but we can't release the references on the page
    until we convert any uninitialized extents.

    Without this patch, the use of the dioread_nolock mount
    option can break buffered writes, because extents may
    not be converted by the time a subsequent buffered read
    comes in; if the page is not in the page cache, a read
    will return zeros if the extent is still uninitialized.

    I tested this with a (temporary) patch that adds a call
    to msleep(1000) at the start of ext4_end_io_work(), to delay
    processing of each DIO-unwritten work queue item. With this
    msleep(), a simple workload of

    fallocate
    write
    fadvise
    read

    fails without this patch and succeeds with it.

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"

    Curt Wohlgemuth
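
The four-step workload named in the commit above (fallocate, write, fadvise, read) can be written as a small user-space program. This is a sketch of the call sequence only, using the POSIX wrappers posix_fallocate() and posix_fadvise(); the function name `run_workload`, the 4096-byte length and the fill byte are assumptions. On its own it is not expected to reproduce the failure, which according to the commit message needed dioread_nolock and an artificial delay in ext4_end_io_work().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define WORKLOAD_LEN 4096 /* assumed size of the test region */

/*
 * Run fallocate, write, fadvise, read against `path` (a caller-chosen
 * scratch file, removed afterwards). Returns 0 if the data read back
 * matches what was written, -1 otherwise.
 */
int run_workload(const char *path)
{
    char out[WORKLOAD_LEN], in[WORKLOAD_LEN];
    int ret = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(out, 0xab, sizeof(out));
    memset(in, 0, sizeof(in));

    /* 1. fallocate: reserve blocks (uninitialized extents on ext4) */
    if (posix_fallocate(fd, 0, WORKLOAD_LEN) != 0)
        goto done;
    /* 2. buffered write into the preallocated range */
    if (pwrite(fd, out, sizeof(out), 0) != (ssize_t)sizeof(out))
        goto done;
    /* 3. fadvise: ask the kernel to drop the cached pages (advisory only) */
    (void)posix_fadvise(fd, 0, WORKLOAD_LEN, POSIX_FADV_DONTNEED);
    /* 4. read back; zeros here would indicate the bug described above */
    if (pread(fd, in, sizeof(in), 0) != (ssize_t)sizeof(in))
        goto done;
    if (memcmp(in, out, sizeof(out)) == 0)
        ret = 0;
done:
    close(fd);
    unlink(path);
    return ret;
}
```

A caller would pass a path on the file system under test, for example a scratch file below the mount point.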
     
  • The following command line will leave the aio-stress process unkillable
    on an ext4 file system (in my case, mounted on /mnt/test):

    aio-stress -t 20 -s 10 -O -S -o 2 -I 1000 /mnt/test/aiostress.3561.4 /mnt/test/aiostress.3561.4.20 /mnt/test/aiostress.3561.4.19 /mnt/test/aiostress.3561.4.18 /mnt/test/aiostress.3561.4.17 /mnt/test/aiostress.3561.4.16 /mnt/test/aiostress.3561.4.15 /mnt/test/aiostress.3561.4.14 /mnt/test/aiostress.3561.4.13 /mnt/test/aiostress.3561.4.12 /mnt/test/aiostress.3561.4.11 /mnt/test/aiostress.3561.4.10 /mnt/test/aiostress.3561.4.9 /mnt/test/aiostress.3561.4.8 /mnt/test/aiostress.3561.4.7 /mnt/test/aiostress.3561.4.6 /mnt/test/aiostress.3561.4.5 /mnt/test/aiostress.3561.4.4 /mnt/test/aiostress.3561.4.3 /mnt/test/aiostress.3561.4.2

    This is using the aio-stress program from the xfstests test suite.
    That particular command line tells aio-stress to do random writes to
    20 files from 20 threads (one thread per file). The files are NOT
    preallocated, so you will get writes to random offsets within the
    file, thus creating holes and extending i_size. It also opens the
    file with O_DIRECT and O_SYNC.

    On to the problem. When an I/O requires unwritten extent conversion,
    it is queued onto the completed_io_list for the ext4 inode. Two code
    paths will pull work items from this list. The first is the
    ext4_end_io_work routine, and the second is ext4_flush_completed_IO,
    which is called via the fsync path (and O_SYNC handling, as well).
    There are two issues I've found in these code paths. First, if the
    fsync path beats the work routine to a particular I/O, the work
    routine will free the io_end structure! It does not take into account
    the fact that the io_end may still be in use by the fsync path. I've
    fixed this issue by adding yet another IO_END flag, indicating that
    the io_end is being processed by the fsync path.

    The second problem is that the work routine will make an assignment to
    io->flag outside of the lock. I have witnessed this result in a hang
    at umount. Moving the flag setting inside the lock resolved that
    problem.

    The problem was introduced by commit b82e384c7b ("ext4: optimize
    locking for end_io extent conversion"), which first appeared in 3.2.
    As such, the fix should be backported to that release (probably along
    with the unwritten extent conversion race fix).

    Signed-off-by: Jeff Moyer
    Signed-off-by: "Theodore Ts'o"
    CC: stable@kernel.org

    Jeff Moyer
     

21 Feb, 2012

1 commit

  • The following comment in ext4_end_io_dio caught my attention:

    /* XXX: probably should move into the real I/O completion handler */
    inode_dio_done(inode);

    The truncate code takes i_mutex, then calls inode_dio_wait. Because the
    ext4 code path above will end up dropping the mutex before it is
    reacquired by the worker thread that does the extent conversion, it
    seems to me that the truncate can happen out of order. Jan Kara
    mentioned that this might result in error messages in the system logs,
    but that should be the extent of the "damage."

    The fix is pretty straight-forward: don't call inode_dio_done until the
    extent conversion is complete.

    Reviewed-by: Jan Kara
    Signed-off-by: Jeff Moyer
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Jeff Moyer
     

10 Jan, 2012

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    ext2/3/4: delete unneeded includes of module.h
    ext{3,4}: Fix potential race when setversion ioctl updates inode
    udf: Mark LVID buffer as uptodate before marking it dirty
    ext3: Don't warn from writepage when readonly inode is spotted after error
    jbd: Remove j_barrier mutex
    reiserfs: Force inode evictions before umount to avoid crash
    reiserfs: Fix quota mount option parsing
    udf: Treat symlink component of type 2 as /
    udf: Fix deadlock when converting file from in-ICB one to normal one
    udf: Cleanup calling convention of inode_getblk()
    ext2: Fix error handling on inode bitmap corruption
    ext3: Fix error handling on inode bitmap corruption
    ext3: replace ll_rw_block with other functions
    ext3: NULL dereference in ext3_evict_inode()
    jbd: clear revoked flag on buffers before a new transaction started
    ext3: call ext3_mark_recovery_complete() when recovery is really needed

    Linus Torvalds
     

09 Jan, 2012

1 commit

  • Delete any instances of include module.h that were not strictly
    required. In the case of ext2, the declaration of MODULE_LICENSE
    etc. were in inode.c but the module_init/exit were in super.c, so
    relocate the MODULE_LICENSE/AUTHOR block to super.c which makes it
    consistent with ext3 and ext4 at the same time.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Jan Kara

    Paul Gortmaker
     

14 Dec, 2011

1 commit


01 Nov, 2011

1 commit

  • Setting the EXT4_IO_END_UNWRITTEN flag and increasing i_aiodio_unwritten
    should be done together, since ext4_end_io_nolock always clears
    the flag and decreases the counter at the same time.

    We have found some bugs where the flag is set while
    i_aiodio_unwritten is left unchanged (commit 32c80b32c053d). So this patch
    creates a helper function that wraps both steps to avoid any future bug.
    The idea is inspired by Eric.

    Cc: Eric Sandeen
    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
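
The idea of the helper in the commit above, keeping the flag and the counter in step, can be sketched in user space. This is an illustrative model, not the kernel helper: `set_unwritten`, `clear_unwritten` and both structs are hypothetical, and a plain int replaces the kernel's atomic counter.

```c
#include <assert.h>

#define IO_END_UNWRITTEN 0x1u /* models EXT4_IO_END_UNWRITTEN */

/* Toy inode holding only the counter (models i_aiodio_unwritten). */
struct inode_counter_model {
    int unwritten;
};

/* Toy io_end holding only the flag word and a back pointer. */
struct io_end_flag_model {
    struct inode_counter_model *inode;
    unsigned int flags;
};

/* Set the flag and bump the counter together, at most once per io_end. */
void set_unwritten(struct io_end_flag_model *io)
{
    if (!(io->flags & IO_END_UNWRITTEN)) {
        io->flags |= IO_END_UNWRITTEN;
        io->inode->unwritten++;
    }
}

/* Mirror image: clear the flag and drop the counter together. */
void clear_unwritten(struct io_end_flag_model *io)
{
    if (io->flags & IO_END_UNWRITTEN) {
        io->flags &= ~IO_END_UNWRITTEN;
        io->inode->unwritten--;
        assert(io->inode->unwritten >= 0); /* catches an unpaired update */
    }
}
```

Because both helpers test the flag first, calling either one twice leaves the counter unchanged the second time, so the counter cannot drift away from the number of flagged io_end structures.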
     

31 Oct, 2011

3 commits

  • Now that we are doing the locking correctly, we need to grab the
    i_completed_io_lock() twice per end_io. We can clean this up by
    removing the structure from the i_completed_io_list, and use this as
    the locking mechanism to prevent ext4_flush_completed_IO() racing
    against ext4_end_io_work(), instead of clearing the
    EXT4_IO_END_UNWRITTEN in io->flag.

    In addition, if the ext4_convert_unwritten_extents() returns an error,
    we no longer keep the end_io structure on the linked list. This
    doesn't help, because it tends to lock up the file system and wedges
    the system. That's one way to call attention to the problem, but it
    doesn't help the overall robustness of the system.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • The usage of waitqueue_active() is not necessary, and introduces (I
    believe) a hard-to-hit race.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • We must hold i_completed_io_lock when manipulating anything on the
    i_completed_io_list linked list. This includes io->lock, which we
    were checking in ext4_end_io_nolock().

    So move this check to ext4_end_io_work(). This also has the bonus of
    avoiding extra work if it is already done without needing to take the
    mutex.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     

31 Aug, 2011

1 commit

  • The i_mutex lock and flush_completed_IO() added by commit 2581fdc810
    in ext4_evict_inode() cause lockdep to complain about potential
    deadlock in several places. In most/all of these LOCKDEP complaints
    it looks like it's a false positive, since many of the potential
    circular locking cases can't take place by the time the
    ext4_evict_inode() is called; but since at the very least it may mask
    real problems, we need to address this.

    This change removes the flush_completed_IO() and i_mutex lock in
    ext4_evict_inode(). Instead, we take a different approach to resolve
    the software lockup that commit 2581fdc810 intends to fix. Rather
    than having the ext4-dio-unwritten thread wait to grab the i_mutex
    lock of an inode, we use mutex_trylock() instead, and simply requeue
    the work item if we fail to grab the inode's i_mutex lock.

    This should speed up work queue processing in general and also
    prevents the following deadlock scenario: During page fault,
    shrink_icache_memory is called that in turn evicts another inode B.
    Inode B has some pending io_end work so it calls ext4_ioend_wait()
    that waits for inode B's i_ioend_count to become zero. However, inode
    B's ioend work was queued behind some of inode A's ioend work on the
    same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
    thread on that cpu is processing inode A's ioend work, it tries to
    grab inode A's i_mutex lock. Since the i_mutex lock of inode A was taken
    before the page fault happened and is still held, we enter a deadlock.

    Signed-off-by: Jiaying Zhang
    Signed-off-by: "Theodore Ts'o"

    Jiaying Zhang
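
The trylock-and-requeue approach described above can be sketched with a toy work queue. This is a single-threaded user-space model under stated assumptions: a boolean stands in for i_mutex, a small ring buffer stands in for the ext4-dio-unwritten workqueue, and all names (`inode_lock_model`, `work_queue_model`, `queue_work_model`, `trylock_model`, `run_one_work`) are hypothetical.

```c
#include <stdbool.h>

#define QUEUE_CAP 8 /* assumed capacity of the toy queue */

/* Toy inode: a flag standing in for i_mutex plus a conversion counter. */
struct inode_lock_model {
    bool mutex_held;
    int converted;
};

/* Toy FIFO work queue implemented as a ring buffer of inode pointers. */
struct work_queue_model {
    struct inode_lock_model *items[QUEUE_CAP];
    int head, count;
};

/* Append a work item; returns false if the queue is full. */
bool queue_work_model(struct work_queue_model *q, struct inode_lock_model *inode)
{
    if (q->count == QUEUE_CAP)
        return false;
    q->items[(q->head + q->count) % QUEUE_CAP] = inode;
    q->count++;
    return true;
}

/* Models mutex_trylock(): never blocks, reports whether the lock was taken. */
bool trylock_model(struct inode_lock_model *inode)
{
    if (inode->mutex_held)
        return false;
    inode->mutex_held = true;
    return true;
}

/*
 * Run the item at the head of the queue. Returns true if its conversion
 * ran, false if the queue was empty or the item had to be requeued.
 */
bool run_one_work(struct work_queue_model *q)
{
    struct inode_lock_model *inode;

    if (q->count == 0)
        return false;
    inode = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;

    if (!trylock_model(inode)) {
        queue_work_model(q, inode); /* cannot fail: a slot was just freed */
        return false;
    }
    inode->converted++;       /* stands in for the extent conversion */
    inode->mutex_held = false;
    return true;
}
```

With inode A's lock held elsewhere and work for A queued ahead of work for B, the worker requeues A and goes on to finish B, which is the ordering that avoids the deadlock scenario in the commit message.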
     

14 Aug, 2011

1 commit

  • Setting the EXT4_IO_END_UNWRITTEN flag and increasing i_aiodio_unwritten
    should be done together, since ext4_end_io_nolock always clears
    the flag and decreases the counter at the same time.

    We don't increase i_aiodio_unwritten when setting
    EXT4_IO_END_UNWRITTEN, so it will go negative and cause some processes
    to wait forever.

    Part of the patch came from Eric in his e-mail, but it doesn't fix the
    problem met by Michael actually.

    http://marc.info/?l=linux-ext4&m=131316851417460&w=2

    Reported-and-Tested-by: Michael Tokarev
    Signed-off-by: Eric Sandeen
    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Tao Ma
     

30 Jun, 2011

1 commit


01 May, 2011

1 commit

  • In the bio completion routine, we should not be setting
    PageUptodate at all -- it's set at sys_write() time, and is
    unaffected by success/failure of the write to disk.

    This can cause a page corruption bug when the file system's
    block size is less than the architecture's VM page size.

    If we have only written a single block, we might end up
    setting the page's PageUptodate flag, indicating that the page
    is completely read into memory, which may not be true.
    This could cause subsequent reads to get bad data.

    This commit also takes the opportunity to clean up error
    handling in ext4_end_bio(), and remove some extraneous code:

    - fixes ext4_end_bio() to set AS_EIO in the
    page->mapping->flags on error, which was left out by
    mistake. This is needed so that fsync() will
    return an error if there was an I/O error.
    - remove the clear_buffer_dirty() call on unmapped
    buffers for each page.
    - consolidate page/buffer error handling in a single
    section.

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"
    Reported-by: Jim Meyering
    Reported-by: Hugh Dickins
    Cc: Mingming Cao

    Curt Wohlgemuth
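
The two rules from the commit above (write completion never touches the uptodate state, and an I/O error is recorded on the mapping so a later fsync() can report it) can be modeled as below. This is a hedged user-space sketch: `mapping_model`, `wpage_model`, `end_write_model` and `fsync_model` are hypothetical names, and the report-once behavior of `fsync_model` is an assumption of this model.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy address space: only the error bit (models AS_EIO in mapping->flags). */
struct mapping_model {
    bool as_eio;
};

/* Toy page with the three bits relevant to write completion. */
struct wpage_model {
    bool uptodate;  /* models PageUptodate */
    bool writeback; /* models PageWriteback */
    bool error;     /* models PageError */
    struct mapping_model *mapping;
};

/* Write completion: record errors, end writeback, never touch uptodate. */
void end_write_model(struct wpage_model *pg, int io_error)
{
    if (io_error) {
        pg->error = true;
        pg->mapping->as_eio = true;
    }
    /* uptodate is deliberately left as it was set at write() time */
    pg->writeback = false;
}

/* Report a recorded error once, then clear it (assumption of this model). */
int fsync_model(struct mapping_model *m)
{
    if (m->as_eio) {
        m->as_eio = false;
        return -EIO;
    }
    return 0;
}
```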
     

26 Mar, 2011

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits)
    ext4: fix a BUG in mb_mark_used during trim.
    ext4: unused variables cleanup in fs/ext4/extents.c
    ext4: remove redundant set_buffer_mapped() in ext4_da_get_block_prep()
    ext4: add more tracepoints and use dev_t in the trace buffer
    ext4: don't kfree uninitialized s_group_info members
    ext4: add missing space in printk's in __ext4_grp_locked_error()
    ext4: add FITRIM to compat_ioctl.
    ext4: handle errors in ext4_clear_blocks()
    ext4: unify the ext4_handle_release_buffer() api
    ext4: handle errors in ext4_rename
    jbd2: add COW fields to struct jbd2_journal_handle
    jbd2: add the b_cow_tid field to journal_head struct
    ext4: Initialize fsync transaction ids in ext4_new_inode()
    ext4: Use single thread to perform DIO unwritten convertion
    ext4: optimize ext4_bio_write_page() when no extent conversion is needed
    ext4: skip orphan cleanup if fs has unknown ROCOMPAT features
    ext4: use the nblocks arg to ext4_truncate_restart_trans()
    ext4: fix missing iput of root inode for some mount error paths
    ext4: make FIEMAP and delayed allocation play well together
    ext4: suppress verbose debugging information if malloc-debug is off
    ...

    Fix up conflicts in fs/ext4/super.c due to workqueue changes

    Linus Torvalds
     

10 Mar, 2011

1 commit

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

01 Mar, 2011

1 commit

  • If no extent conversion is required, wake up any processes waiting for
    the page's writeback to be complete and free the ext4_io_end structure
    directly in ext4_end_bio() instead of dropping it on the linked list
    (which requires taking a spinlock to queue and dequeue the io_end
    structure), and waiting for the workqueue to do this work.

    This removes an extra scheduling delay before a process waiting for an
    fsync() to complete gets woken up, and it also reduces the CPU
    overhead for a random write workload.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

28 Feb, 2011

1 commit

  • In ext4_bio_write_page(), if the memory allocation for the struct
    ext4_io_page fails, it returns with the page's PageWriteback flag set.
    This will end up causing the page not to skip writeback in
    WB_SYNC_NONE mode, and in WB_SYNC_ALL mode (i.e., on a sync, fsync, or
    umount) the writeback daemon will get stuck forever on the
    wait_on_page_writeback() function in write_cache_pages_da().

    Or, if journalling is enabled and the file gets deleted, the
    journal thread can get stuck in journal_finish_inode_data_buffers()'s
    call to filemap_fdatawait().

    Another place where things can get hung up is in
    truncate_inode_pages(), called out of ext4_evict_inode().

    Fix this by not setting PageWriteback until after we have successfully
    allocated the struct ext4_io_page.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
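
The ordering fix above (allocate first, set PageWriteback only after the allocation succeeded) can be sketched in user space. This is an illustrative model, not ext4 code: `wb_page_model`, `io_page_model`, `begin_write_model`, the `alloc_fn` hook and `failing_alloc` are hypothetical, and the allocator hook exists only so the failure path can be exercised.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy page: only the writeback bit (models PageWriteback). */
struct wb_page_model {
    bool writeback;
};

/* Toy per-page tracking structure (stands in for struct ext4_io_page). */
struct io_page_model {
    struct wb_page_model *page;
};

/* Allocator hook so a caller can force an allocation failure. */
typedef void *(*alloc_fn)(size_t);

/*
 * Start writing a page. On allocation failure the page has not been put
 * under writeback, so nothing is left waiting on it forever.
 */
struct io_page_model *begin_write_model(struct wb_page_model *pg, alloc_fn alloc)
{
    struct io_page_model *io_page = alloc(sizeof(*io_page));

    if (!io_page)
        return NULL;      /* page never entered writeback: nothing to undo */
    io_page->page = pg;
    pg->writeback = true; /* set only once the allocation has succeeded */
    return io_page;
}

/* Allocator that always fails, for exercising the error path. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

Passing `malloc` as the allocator exercises the success path; passing `failing_alloc` shows that the writeback bit stays clear when the allocation fails.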
     

22 Feb, 2011

1 commit