Doug / smarc-fsl-linux-kernel | Embedian Git Server

15 May, 2013

1 commit

b973425cb Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 update from Ted Ts'o:
"Fixed regressions (two stability regressions and a performance
regression) introduced during the 3.10-rc1 merge window.

Also included is a bug fix relating to allocating blocks after
resizing an ext3 file system when using the ext4 file system driver"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd,jbd2: fix oops in jbd2_journal_put_journal_head()
ext4: revert "ext4: use io_end for multiple bios"
ext4: limit group search loop for non-extent files
ext4: fix fio regression

Linus Torvalds
2013-05-15 00:30:54 +0800

12 May, 2013

1 commit

a549984b8 ext4: revert "ext4: use io_end for multiple bios" ... Browse Code »

This reverts commit 4eec708d263f0ee10861d69251708a225b64cac7.

Multiple users have reported crashes which is apparently caused by
this commit. Thanks to Dmitry Monakhov for bisecting it.

Signed-off-by: "Theodore Ts'o"
Cc: Dmitry Monakhov
Cc: Jan Kara

Theodore Ts'o
2013-05-12 07:07:42 +0800

08 May, 2013

1 commit

a27bb332c aio: don't include aio.h in sched.h ... Browse Code »

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 11:16:25 +0800

12 Apr, 2013

3 commits

7b001d6a0 ext4: clear buffer_uninit flag when submitting IO ... Browse Code »

Currently noone cleared buffer_uninit flag. This results in writeback
needlessly marking io_end as needing extent conversion scanning extent
tree for extents to convert. So clear the buffer_uninit flag once the
buffer is submitted for IO and the flag is transformed into
EXT4_IO_END_UNWRITTEN flag.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Zheng Liu

Jan Kara
2013-04-12 12:03:19 +0800
4eec708d2 ext4: use io_end for multiple bios ... Browse Code »

Change writeback path to create just one io_end structure for the
extent to which we submit IO and share it among bios writing that
extent. This prevents needless splitting and joining of unwritten
extents when they cannot be submitted as a single bio.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Dmitry Monakhov
Reviewed-by: Zheng Liu

Jan Kara
2013-04-12 11:56:53 +0800
0058f9658 ext4: make ext4_bio_write_page() use BH_Async_Write flags ... Browse Code »

So far ext4_bio_write_page() attached all the pages to ext4_io_end
structure. This makes that structure pretty heavy (1 KB for pointers
+ 16 bytes per page attached to the bio). Also later we would like to
share ext4_io_end structure among several bios in case IO to a single
extent needs to be split among several bios and pointing to pages from
ext4_io_end makes this complex.

We remove page pointers from ext4_io_end and use pointers from bio
itself instead. This isn't as easy when blocksize < pagesize because
then we can have several bios in flight for a single page and we have
to be careful when to call end_page_writeback(). However this is a
known problem already solved by block_write_full_page() /
end_buffer_async_write() so we mimic its behavior here. We mark
buffers going to disk with BH_Async_Write flag and in
ext4_bio_end_io() we check whether there are any buffers with
BH_Async_Write flag left. If there are not, we can call
end_page_writeback().

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Dmitry Monakhov
Reviewed-by: Zheng Liu

Jan Kara
2013-04-12 11:48:32 +0800

20 Mar, 2013

1 commit

1ada47d94 ext4: fix ext4_evict_inode() racing against workqueue processing code ... Browse Code »

Commit 84c17543ab56 (ext4: move work from io_end to inode) triggered a
regression when running xfstest #270 when the file system is mounted
with dioread_nolock.

The problem is that after ext4_evict_inode() calls ext4_ioend_wait(),
this guarantees that last io_end structure has been freed, but it does
not guarantee that the workqueue structure, which was moved into the
inode by commit 84c17543ab56, is actually finished. Once
ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this
will allow ext4_ioend_wait() to return on CPU #2, at which point the
evict_inode() codepath can race against the workqueue code on CPU #1
accessing EXT4_I(inode)->i_unwritten_work to find the next item of
work to do.

Fix this by calling cancel_work_sync() in ext4_ioend_wait(), which
will be renamed ext4_ioend_shutdown(), since it is only used by
ext4_evict_inode(). Also, move the call to ext4_ioend_shutdown()
until after truncate_inode_pages() and filemap_write_and_wait() are
called, to make sure all dirty pages have been written back and
flushed from the page cache first.

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] cwq_activate_delayed_work+0x3b/0x7e
*pdpt = 0000000030bc3001 *pde = 0000000000000000
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in:
Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c1754-dirty #91 Bochs Bochs
EIP: 0060:[] EFLAGS: 00010046 CPU: 0
EIP is at cwq_activate_delayed_work+0x3b/0x7e
EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000
ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000)
Stack:
f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d
f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0
f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000
Call Trace:
[] cwq_dec_nr_in_flight+0x71/0xfb
[] process_one_work+0x5d8/0x637
[] ? ext4_end_bio+0x300/0x300
[] worker_thread+0x249/0x3ef
[] kthread+0xd8/0xeb
[] ? manage_workers+0x4bb/0x4bb
[] ? trace_hardirqs_on+0x27/0x37
[] ret_from_kernel_thread+0x1b/0x28
[] ? __init_kthread_worker+0x71/0x71
Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff 13 89 f0 83 05 b8 ff 6c c1
6c c1 00 31 c9 83
EIP: [] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84
CR2: 0000000000000000
---[ end trace a1923229da53d8a4 ]---

Signed-off-by: "Theodore Ts'o"
Cc: Jan Kara

Theodore Ts'o
2013-03-20 21:39:42 +0800

30 Jan, 2013

1 commit

091e26dfc ext4: fix possible use-after-free with AIO ... Browse Code »

Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.

CC: stable@vger.kernel.org
Reviewed-by: Carlos Maiolino
Acked-by: Jeff Moyer
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-30 11:48:17 +0800

29 Jan, 2013

2 commits

cfa727548 ext4: remove unused variable flags ... Browse Code »

Remove unused variable flags from dump_completed_IO(). The code is
only exercised when EXT4FS_DEBUG is defined.

Signed-off-by: Lukas Czerner
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Zheng Liu

Lukas Czerner
2013-01-29 10:14:11 +0800
8a850c3fb ext4: Make ext4_bio_writepage() handle unprepared buffers ... Browse Code »

So far ext4_bio_writepage() unconditionally cleared dirty bit on all
buffers underlying the page. That implicitely assumes we can write all
buffers. So far that is true because callers call into
ext4_bio_writepage() make sure all buffers in the page are mapped but:

a) it's a data corruption bug waiting to happen
b) in data=ordered mode when blocksize < pagesize we do need to write
pages that may have only some of dirty buffers mapped.

So change ext4_bio_writepage() to skip buffers that cannot be written without
clearing their dirty bit.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-29 09:53:28 +0800

28 Jan, 2013

4 commits

002bd7fa3 ext4: simplify list handling in ext4_do_flush_completed_IO() ... Browse Code »

The function splices i_completed_io_list to its private list
first. From that moment on we don't need any lock for working with
io_end structures because all io_end structure on the list are only
our own. So we can remove the other two lists in the function and free
io_end immediately after we are done with it.

CC: Dmitry Monakhov
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-28 22:49:15 +0800
84c17543a ext4: move work from io_end to inode ... Browse Code »

It does not make much sense to have struct work in ext4_io_end_t
because we always use it for only one ext4_io_end_t per inode (the
first one in the i_completed_io list). So just move the structure to
inode itself. This also allows for a small simplification in
processing io_end structures.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-28 22:43:46 +0800
1ae48a635 ext4: use redirty_page_for_writepage() in ext4_bio_write_page() ... Browse Code »

When we cannot write a page we should use redirty_page_for_writepage()
instead of plain set_page_dirty(). That tells writeback code we have
problems, redirties only the page (redirtying buffers is not needed),
and updates mm accounting of failed page writes.

Also move clearing of buffer dirty flag after io_submit_add_bh(). At that
moment we are sure buffer will be going to disk.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-28 22:32:54 +0800
36ade451a ext4: Always use ext4_bio_write_page() for writeout ... Browse Code »

Currently we sometimes used block_write_full_page() and sometimes
ext4_bio_write_page() for writeback (depending on mount options and call
path). Let's always use ext4_bio_write_page() to simplify things a bit.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-28 22:30:52 +0800

29 Nov, 2012

1 commit

4a092d737 ext4: rationalize ext4_extents.h inclusion ... Browse Code »

Previously, ext4_extents.h was being included at the end of ext4.h,
which was bad for a number of reasons: (a) it was not being included
in the expected place, and (b) it caused the header to be included
multiple times. There were #ifdef's to prevent this from causing any
problems, but it still was unnecessary.

By moving the function declarations that were in ext4_extents.h to
ext4.h, which is standard practice for where the function declarations
for the rest of ext4.h can be found, we can remove ext4_extents.h from
being included in ext4.h at all, and then we can only include
ext4_extents.h where it is needed in ext4's source files.

It should be possible to move a few more things into ext4.h, and
further reduce the number of source files that need to #include
ext4_extents.h, but that's a cleanup for another day.

Reported-by: Sachin Kamat
Reported-by: Wei Yongjun
Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2012-11-29 02:03:30 +0800

09 Nov, 2012

1 commit

8d8c18257 ext4: use 'inode' variable that is already dereferenced ... Browse Code »

Tested: xfs tests

Reviewed-by: Zheng Liu
Signed-off-by: Anatol Pomozov
Signed-off-by: "Theodore Ts'o"

Anatol Pomozov
2012-11-09 03:53:35 +0800

05 Oct, 2012

1 commit

c278531d3 ext4: fix ext4_flush_completed_IO wait semantics ... Browse Code »

BUG #1) All places where we call ext4_flush_completed_IO are broken
because buffered io and DIO/AIO goes through three stages
1) submitted io,
2) completed io (in i_completed_io_list) conversion pended
3) finished io (conversion done)
And by calling ext4_flush_completed_IO we will flush only
requests which were in (2) stage, which is wrong because:
1) punch_hole and truncate _must_ wait for all outstanding unwritten io
regardless to it's state.
2) fsync and nolock_dio_read should also wait because there is
a time window between end_page_writeback() and ext4_add_complete_io()
As result integrity fsync is broken in case of buffered write
to fallocated region:
fsync blkdev_completion
->filemap_write_and_wait_range
->ext4_end_bio
->end_page_writeback
ext4_flush_completed_IO
sees empty i_completed_io_list but pended
conversion still exist
->ext4_add_complete_io

BUG #2) Race window becomes wider due to the 'ext4: completed_io
locking cleanup V4' patch series

This patch make following changes:
1) ext4_flush_completed_io() now first try to flush completed io and when
wait for any outstanding unwritten io via ext4_unwritten_wait()
2) Rename function to more appropriate name.
3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to
prevent endless wait

Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Jan Kara

Dmitry Monakhov
2012-10-05 23:31:55 +0800

29 Sep, 2012

3 commits

28a535f9a ext4: completed_io locking cleanup ... Browse Code »

Current unwritten extent conversion state-machine is very fuzzy.
- For unknown reason it performs conversion under i_mutex. What for?
My diagnosis:
We already protect extent tree with i_data_sem, truncate and punch_hole
should wait for DIO, so the only data we have to protect is end_io->flags
modification, but only flush_completed_IO and end_io_work modified this
flags and we can serialize them via i_completed_io_lock.

Currently all these games with mutex_trylock result in the following deadlock
truncate: kworker:
ext4_setattr ext4_end_io_work
mutex_lock(i_mutex)
inode_dio_wait(inode) ->BLOCK
DEADLOCKflags modification
is protected by ei->ext4_complete_io_lock

Full list of changes:
- Move all completion end_io related routines to page-io.c in order to improve
logic locality
- Move open coded logic from various xx_end_xx routines to ext4_add_complete_io()
- remove EXT4_IO_END_FSYNC
- Improve SMP scalability by removing useless i_mutex which does not
protect io->flags anymore.
- Reduce lock contention on i_completed_io_lock by optimizing list walk.
- Rename ext4_end_io_nolock to end4_end_io and make it static
- Check flush completion status to ext4_ext_punch_hole(). Because it is
not good idea to punch blocks from corrupted inode.

Changes since V3 (in request to Jan's comments):
Fall back to active flush_completed_IO() approach in order to prevent
performance issues with nolocked DIO reads.
Changes since V2:
Fix use-after-free caused by race truncate vs end_io_work

Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"

Dmitry Monakhov
2012-09-29 12:14:55 +0800
82e542291 ext4: fix unwritten counter leakage ... Browse Code »

ext4_set_io_unwritten_flag() will increment i_unwritten counter, so
once we mark end_io with EXT4_END_IO_UNWRITTEN we have to revert it back
on error path.

- add missed error checks to prevent counter leakage
- ext4_end_io_nolock() will clear EXT4_END_IO_UNWRITTEN flag to signal
that conversion finished.
- add BUG_ON to ext4_free_end_io() to prevent similar leakage in future.

Visible effect of this bug is that unaligned aio_stress may deadlock

Reviewed-by: Jan Kara
Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"

Dmitry Monakhov
2012-09-29 11:36:25 +0800
e27f41e1b ext4: give i_aiodio_unwritten a more appropriate name ... Browse Code »

AIO/DIO prefix is wrong because it account unwritten extents which
also may be scheduled from buffered write endio

Reviewed-by: Jan Kara
Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"

Dmitry Monakhov
2012-09-29 11:24:52 +0800

30 Mar, 2012

1 commit

6268b325c Revert "ext4: don't release page refs in ext4_end_bio()" ... Browse Code »

This reverts commit b43d17f319f2c502b17139d1cf70731b2b62c644.

Dave Jones reports that it causes lockups on his laptop, and his debug
output showed a lot of processes hung waiting for page_writeback (or
more commonly - processes hung waiting for a lock that was held during
that writeback wait).

The page_writeback hint made Ted suggest that Dave look at this commit,
and Dave verified that reverting it makes his problems go away.

Ted says:
"That commit fixes a race which is seen when you write into fallocated
(and hence uninitialized) disk blocks under *very* heavy memory
pressure. Furthermore, although theoretically it could trigger under
normal direct I/O writes, it only seems to trigger if you are issuing
a huge number of AIO writes, such that a just-written page can get
evicted from memory, and then read back into memory, before the
workqueue has a chance to update the extent tree.

This race has been around for a little over a year, and no one noticed
until two months ago; it only happens under fairly exotic conditions,
and in fact even after trying very hard to create a simple repro under
lab conditions, we could only reproduce the problem and confirm the
fix on production servers running MySQL on very fast PCIe-attached
flash devices.

Given that Dave was able to hit this problem pretty quickly, if we
confirm that this commit is at fault, the only reasonable thing to do
is to revert it IMO."

Reported-and-tested-by: Dave Jones
Acked-by: Theodore Ts'o
Signed-off-by: Linus Torvalds

Linus Torvalds
2012-03-30 08:00:56 +0800

05 Mar, 2012

2 commits

b43d17f31 ext4: don't release page refs in ext4_end_bio() ... Browse Code »

We can clear PageWriteback on each page when the IO
completes, but we can't release the references on the page
until we convert any uninitialized extents.

Without this patch, the use of the dioread_nolock mount
option can break buffered writes, because extents may
not be converted by the time a subsequent buffered read
comes in; if the page is not in the page cache, a read
will return zeros if the extent is still uninitialized.

I tested this with a (temporary) patch that adds a call
to msleep(1000) at the start of ext4_end_io_work(), to delay
processing of each DIO-unwritten work queue item. With this
msleep(), a simple workload of

fallocate
write
fadvise
read

will fail without this patch, succeeds with it.

Signed-off-by: Curt Wohlgemuth
Signed-off-by: "Theodore Ts'o"

Curt Wohlgemuth
2012-03-05 23:40:15 +0800
491caa436 ext4: fix race between sync and completed io work ... Browse Code »

The following command line will leave the aio-stress process unkillable
on an ext4 file system (in my case, mounted on /mnt/test):

aio-stress -t 20 -s 10 -O -S -o 2 -I 1000 /mnt/test/aiostress.3561.4 /mnt/test/aiostress.3561.4.20 /mnt/test/aiostress.3561.4.19 /mnt/test/aiostress.3561.4.18 /mnt/test/aiostress.3561.4.17 /mnt/test/aiostress.3561.4.16 /mnt/test/aiostress.3561.4.15 /mnt/test/aiostress.3561.4.14 /mnt/test/aiostress.3561.4.13 /mnt/test/aiostress.3561.4.12 /mnt/test/aiostress.3561.4.11 /mnt/test/aiostress.3561.4.10 /mnt/test/aiostress.3561.4.9 /mnt/test/aiostress.3561.4.8 /mnt/test/aiostress.3561.4.7 /mnt/test/aiostress.3561.4.6 /mnt/test/aiostress.3561.4.5 /mnt/test/aiostress.3561.4.4 /mnt/test/aiostress.3561.4.3 /mnt/test/aiostress.3561.4.2

This is using the aio-stress program from the xfstests test suite.
That particular command line tells aio-stress to do random writes to
20 files from 20 threads (one thread per file). The files are NOT
preallocated, so you will get writes to random offsets within the
file, thus creating holes and extending i_size. It also opens the
file with O_DIRECT and O_SYNC.

On to the problem. When an I/O requires unwritten extent conversion,
it is queued onto the completed_io_list for the ext4 inode. Two code
paths will pull work items from this list. The first is the
ext4_end_io_work routine, and the second is ext4_flush_completed_IO,
which is called via the fsync path (and O_SYNC handling, as well).
There are two issues I've found in these code paths. First, if the
fsync path beats the work routine to a particular I/O, the work
routine will free the io_end structure! It does not take into account
the fact that the io_end may still be in use by the fsync path. I've
fixed this issue by adding yet another IO_END flag, indicating that
the io_end is being processed by the fsync path.

The second problem is that the work routine will make an assignment to
io->flag outside of the lock. I have witnessed this result in a hang
at umount. Moving the flag setting inside the lock resolved that
problem.

The problem was introduced by commit b82e384c7b ("ext4: optimize
locking for end_io extent conversion"), which first appeared in 3.2.
As such, the fix should be backported to that release (probably along
with the unwritten extent conversion race fix).

Signed-off-by: Jeff Moyer
Signed-off-by: "Theodore Ts'o"
CC: stable@kernel.org

Jeff Moyer
2012-03-05 23:29:52 +0800

21 Feb, 2012

1 commit

266991b13 ext4: fix race between unwritten extent conversion and truncate ... Browse Code »

The following comment in ext4_end_io_dio caught my attention:

/* XXX: probably should move into the real I/O completion handler */
inode_dio_done(inode);

The truncate code takes i_mutex, then calls inode_dio_wait. Because the
ext4 code path above will end up dropping the mutex before it is
reacquired by the worker thread that does the extent conversion, it
seems to me that the truncate can happen out of order. Jan Kara
mentioned that this might result in error messages in the system logs,
but that should be the extent of the "damage."

The fix is pretty straight-forward: don't call inode_dio_done until the
extent conversion is complete.

Reviewed-by: Jan Kara
Signed-off-by: Jeff Moyer
Signed-off-by: "Theodore Ts'o"
Cc: stable@vger.kernel.org

Jeff Moyer
2012-02-21 06:59:24 +0800

10 Jan, 2012

1 commit

ac69e0928 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2/3/4: delete unneeded includes of module.h
ext{3,4}: Fix potential race when setversion ioctl updates inode
udf: Mark LVID buffer as uptodate before marking it dirty
ext3: Don't warn from writepage when readonly inode is spotted after error
jbd: Remove j_barrier mutex
reiserfs: Force inode evictions before umount to avoid crash
reiserfs: Fix quota mount option parsing
udf: Treat symlink component of type 2 as /
udf: Fix deadlock when converting file from in-ICB one to normal one
udf: Cleanup calling convention of inode_getblk()
ext2: Fix error handling on inode bitmap corruption
ext3: Fix error handling on inode bitmap corruption
ext3: replace ll_rw_block with other functions
ext3: NULL dereference in ext3_evict_inode()
jbd: clear revoked flag on buffers before a new transaction started
ext3: call ext3_mark_recovery_complete() when recovery is really needed

Linus Torvalds
2012-01-10 04:51:21 +0800

09 Jan, 2012

1 commit

302bf2f32 ext2/3/4: delete unneeded includes of module.h ... Browse Code »

Delete any instances of include module.h that were not strictly
required. In the case of ext2, the declaration of MODULE_LICENSE
etc. were in inode.c but the module_init/exit were in super.c, so
relocate the MODULE_LICENCE/AUTHOR block to super.c which makes it
consistent with ext3 and ext4 at the same time.

Signed-off-by: Paul Gortmaker
Signed-off-by: Jan Kara

Paul Gortmaker
2012-01-09 20:52:10 +0800

14 Dec, 2011

1 commit

5a0dc7365 ext4: handle EOF correctly in ext4_bio_write_page() ... Browse Code »

We need to zero out part of a page which beyond EOF before setting uptodate,
otherwise, mapread or write will see non-zero data beyond EOF.

Signed-off-by: Yongqiang Yang
Signed-off-by: "Theodore Ts'o"
Cc: stable@kernel.org

Yongqiang Yang
2011-12-14 11:29:12 +0800

01 Nov, 2011

1 commit

0edeb71dc ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten ... Browse Code »

EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten
should be done simultaneously since ext4_end_io_nolock always clear
the flag and decrease the counter in the same time.

We have found some bugs that the flag is set while leaving
i_aiodio_unwritten unchanged(commit 32c80b32c053d). So this patch just tries
to create a helper function to wrap them to avoid any future bug.
The idea is inspired by Eric.

Cc: Eric Sandeen
Signed-off-by: Tao Ma
Signed-off-by: "Theodore Ts'o"

Tao Ma
2011-11-01 05:30:44 +0800

31 Oct, 2011

3 commits

b82e384c7 ext4: optimize locking for end_io extent conversion ... Browse Code »

Now that we are doing the locking correctly, we need to grab the
i_completed_io_lock() twice per end_io. We can clean this up by
removing the structure from the i_complted_io_list, and use this as
the locking mechanism to prevent ext4_flush_completed_IO() racing
against ext4_end_io_work(), instead of clearing the
EXT4_IO_END_UNWRITTEN in io->flag.

In addition, if the ext4_convert_unwritten_extents() returns an error,
we no longer keep the end_io structure on the linked list. This
doesn't help, because it tends to lock up the file system and wedges
the system. That's one way to call attention to the problem, but it
doesn't help the overall robustness of the system.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2011-10-31 22:56:32 +0800
4e2980212 ext4: remove unnecessary call to waitqueue_active() ... Browse Code »

The usage of waitqueue_active() is not necessary, and introduces (I
believe) a hard-to-hit race.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2011-10-31 06:41:19 +0800
d73d5046a ext4: Use correct locking for ext4_end_io_nolock() ... Browse Code »

We must hold i_completed_io_lock when manipulating anything on the
i_completed_io_list linked list. This includes io->lock, which we
were checking in ext4_end_io_nolock().

So move this check to ext4_end_io_work(). This also has the bonus of
avoiding extra work if it is already done without needing to take the
mutex.

Signed-off-by: Tao Ma
Signed-off-by: "Theodore Ts'o"

Tao Ma
2011-10-31 06:26:08 +0800

31 Aug, 2011

1 commit

8c0bec215 ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining ... Browse Code »

The i_mutex lock and flush_completed_IO() added by commit 2581fdc810
in ext4_evict_inode() causes lockdep complaining about potential
deadlock in several places. In most/all of these LOCKDEP complaints
it looks like it's a false positive, since many of the potential
circular locking cases can't take place by the time the
ext4_evict_inode() is called; but since at the very least it may mask
real problems, we need to address this.

This change removes the flush_completed_IO() and i_mutex lock in
ext4_evict_inode(). Instead, we take a different approach to resolve
the software lockup that commit 2581fdc810 intends to fix. Rather
than having ext4-dio-unwritten thread wait for grabing the i_mutex
lock of an inode, we use mutex_trylock() instead, and simply requeue
the work item if we fail to grab the inode's i_mutex lock.

This should speed up work queue processing in general and also
prevents the following deadlock scenario: During page fault,
shrink_icache_memory is called that in turn evicts another inode B.
Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock. Since the i_mutex lock of inode A is
still hold before the page fault happened, we enter a deadlock.

Signed-off-by: Jiaying Zhang
Signed-off-by: "Theodore Ts'o"

Jiaying Zhang
2011-08-31 23:50:51 +0800

14 Aug, 2011

1 commit

32c80b32c ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. ... Browse Code »

EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten
should be done simultaneously since ext4_end_io_nolock always clear
the flag and decrease the counter in the same time.

We don't increase i_aiodio_unwritten when setting
EXT4_IO_END_UNWRITTEN so it will go nagative and causes some process
to wait forever.

Part of the patch came from Eric in his e-mail, but it doesn't fix the
problem met by Michael actually.

http://marc.info/?l=linux-ext4&m=131316851417460&w=2

Reported-and-Tested-by: Michael Tokarev
Signed-off-by: Eric Sandeen
Signed-off-by: Tao Ma
Signed-off-by: "Theodore Ts'o"
Cc: stable@kernel.org

Tao Ma
2011-08-14 00:30:59 +0800

30 Jun, 2011

1 commit

275d3ba6b ext4: remove loop around bio_alloc() ... Browse Code »

These days, bio_alloc() is guaranteed to never fail (as long as nvecs
is less than BIO_MAX_PAGES), so we don't need the loop around the
struct bio allocation.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2011-06-30 09:44:45 +0800

01 May, 2011

1 commit

39db00f1c ext4: don't set PageUptodate in ext4_end_bio() ... Browse Code »

In the bio completion routine, we should not be setting
PageUptodate at all -- it's set at sys_write() time, and is
unaffected by success/failure of the write to disk.

This can cause a page corruption bug when the file system's
block size is less than the architecture's VM page size.

if we have only written a single block -- we might end up
setting the page's PageUptodate flag, indicating that page
is completely read into memory, which may not be true.
This could cause subsequent reads to get bad data.

This commit also takes the opportunity to clean up error
handling in ext4_end_bio(), and remove some extraneous code:

- fixes ext4_end_bio() to set AS_EIO in the
page->mapping->flags on error, which was left out by
mistake. This is needed so that fsync() will
return an error if there was an I/O error.
- remove the clear_buffer_dirty() call on unmapped
buffers for each page.
- consolidate page/buffer error handling in a single
section.

Signed-off-by: Curt Wohlgemuth
Signed-off-by: "Theodore Ts'o"
Reported-by: Jim Meyering
Reported-by: Hugh Dickins
Cc: Mingming Cao

Curt Wohlgemuth
2011-05-01 01:26:26 +0800

26 Mar, 2011

1 commit

ae005cbed Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits)
ext4: fix a BUG in mb_mark_used during trim.
ext4: unused variables cleanup in fs/ext4/extents.c
ext4: remove redundant set_buffer_mapped() in ext4_da_get_block_prep()
ext4: add more tracepoints and use dev_t in the trace buffer
ext4: don't kfree uninitialized s_group_info members
ext4: add missing space in printk's in __ext4_grp_locked_error()
ext4: add FITRIM to compat_ioctl.
ext4: handle errors in ext4_clear_blocks()
ext4: unify the ext4_handle_release_buffer() api
ext4: handle errors in ext4_rename
jbd2: add COW fields to struct jbd2_journal_handle
jbd2: add the b_cow_tid field to journal_head struct
ext4: Initialize fsync transaction ids in ext4_new_inode()
ext4: Use single thread to perform DIO unwritten convertion
ext4: optimize ext4_bio_write_page() when no extent conversion is needed
ext4: skip orphan cleanup if fs has unknown ROCOMPAT features
ext4: use the nblocks arg to ext4_truncate_restart_trans()
ext4: fix missing iput of root inode for some mount error paths
ext4: make FIEMAP and delayed allocation play well together
ext4: suppress verbose debugging information if malloc-debug is off
...

Fi up conflicts in fs/ext4/super.c due to workqueue changes

Linus Torvalds
2011-03-26 00:57:41 +0800

10 Mar, 2011

1 commit

721a9602e block: kill off REQ_UNPLUG ... Browse Code »

With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.

Signed-off-by: Jens Axboe

Jens Axboe
2011-03-10 15:52:27 +0800

01 Mar, 2011

1 commit

b61684431 ext4: optimize ext4_bio_write_page() when no extent conversion is needed ... Browse Code »

If no extent conversion is required, wake up any processes waiting for
the page's writeback to be complete and free the ext4_io_end structure
directly in ext4_end_bio() instead of dropping it on the linked list
(which requires taking a spinlock to queue and dequeue the io_end
structure), and waiting for the workqueue to do this work.

This removes an extra scheduling delay before process waiting for an
fsync() to complete gets woken up, and it also reduces the CPU
overhead for a random write workload.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2011-03-01 02:12:38 +0800

28 Feb, 2011

1 commit

a54aa7610 ext4: don't leave PageWriteback set after memory failure ... Browse Code »

In ext4_bio_write_page(), if the memory allocation for the struct
ext4_io_page fails, it returns with the page's PageWriteback flag set.
This will end up causing the page not to skip writeback in
WB_SYNC_NONE mode, and in WB_SYNC_ALL mode (i.e., on a sync, fsync, or
umount) the writeback daemon will get stuck forever on the
wait_on_page_writeback() function in write_cache_pages_da().

Or, if journalling is enabled and the file gets deleted, it the
journal thread can get stuck in journal_finish_inode_data_buffers()
call to filemap_fdatawait().

Another place where things can get hung up is in
truncate_inode_pages(), called out of ext4_evict_inode().

Fix this by not setting PageWriteback until after we have successfully
allocated the struct ext4_io_page.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2011-02-28 05:43:24 +0800

22 Feb, 2011

1 commit

7dc576158 ext4: Fix sparse warning: Using plain integer as NULL pointer ... Browse Code »

This patch fixes the warning "Using plain integer as NULL pointer",
generated by sparse, by replacing the offending 0s with NULL.

Signed-off-by: Peter Huewe
Signed-off-by: "Theodore Ts'o"

Peter Huewe
2011-02-22 10:01:42 +0800