Eric Lee / smarc-fsl-linux-kernel

09 Jun, 2014

1 commit

f8409abdc Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 updates from Ted Ts'o:
"Clean ups and miscellaneous bug fixes, in particular for the new
collapse_range and zero_range fallocate functions. In addition,
improve the scalability of adding and remove inodes from the orphan
list"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
ext4: handle symlink properly with inline_data
ext4: fix wrong assert in ext4_mb_normalize_request()
ext4: fix zeroing of page during writeback
ext4: remove unused local variable "stored" from ext4_readdir(...)
ext4: fix ZERO_RANGE test failure in data journalling
ext4: reduce contention on s_orphan_lock
ext4: use sbi in ext4_orphan_{add|del}()
ext4: use EXT_MAX_BLOCKS in ext4_es_can_be_merged()
ext4: add missing BUFFER_TRACE before ext4_journal_get_write_access
ext4: remove unnecessary double parentheses
ext4: do not destroy ext4_groupinfo_caches if ext4_mb_init() fails
ext4: make local functions static
ext4: fix block bitmap validation when bigalloc, ^flex_bg
ext4: fix block bitmap initialization under sparse_super2
ext4: find the group descriptors on a 1k-block bigalloc,meta_bg filesystem
ext4: avoid unneeded lookup when xattr name is invalid
ext4: fix data integrity sync in ordered mode
ext4: remove obsoleted check
ext4: add a new spinlock i_raw_lock to protect the ext4's raw inode
ext4: fix locking for O_APPEND writes
...

Linus Torvalds
2014-06-09 04:03:35 +0800

05 Jun, 2014

1 commit

1b938c082 fs/buffer.c: remove block_write_full_page_endio() ... Browse Code »

The last in-tree caller of block_write_full_page_endio() was removed in
January 2013. It's time to remove the EXPORT_SYMBOL, which leaves
block_write_full_page() as the only caller of
block_write_full_page_endio(), so inline block_write_full_page_endio()
into block_write_full_page().

Signed-off-by: Matthew Wilcox
Cc: Hugh Dickins
Cc: Dave Chinner
Cc: Dheeraj Reddy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2014-06-05 07:54:02 +0800

28 May, 2014

1 commit

eeece469d ext4: fix zeroing of page during writeback ... Browse Code »

Tail of a page straddling inode size must be zeroed when being written
out due to POSIX requirement that modifications of mmaped page beyond
inode size must not be written to the file. ext4_bio_write_page() did
this only for blocks fully beyond inode size but didn't properly zero
blocks partially beyond inode size. Fix this.

The problem has been uncovered by mmap_11-4 test in openposix test suite
(part of LTP).

Reported-by: Xiaoguang Wang
Fixes: 5a0dc7365c240
Fixes: bd2d0210cf22f
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2014-05-28 00:48:55 +0800

12 May, 2014

1 commit

1c8349a17 ext4: fix data integrity sync in ordered mode ... Browse Code »

When we perform a data integrity sync we tag all the dirty pages with
PAGECACHE_TAG_TOWRITE at start of ext4_da_writepages. Later we check
for this tag in write_cache_pages_da and creates a struct
mpage_da_data containing contiguously indexed pages tagged with this
tag and sync these pages with a call to mpage_da_map_and_submit. This
process is done in while loop until all the PAGECACHE_TAG_TOWRITE
pages are synced. We also do journal start and stop in each iteration.
journal_stop could initiate journal commit which would call
ext4_writepage which in turn will call ext4_bio_write_page even for
delayed OR unwritten buffers. When ext4_bio_write_page is called for
such buffers, even though it does not sync them but it clears the
PAGECACHE_TAG_TOWRITE of the corresponding page and hence these pages
are also not synced by the currently running data integrity sync. We
will end up with dirty pages although sync is completed.

This could cause a potential data loss when the sync call is followed
by a truncate_pagecache call, which is exactly the case in
collapse_range. (It will cause generic/127 failure in xfstests)

To avoid this issue, we can use set_page_writeback_keepwrite instead of
set_page_writeback, which doesn't clear TOWRITE tag.

Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon
Signed-off-by: Ashish Sangwan
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Jan Kara

Namjae Jeon
2014-05-12 20:12:25 +0800

07 Apr, 2014

1 commit

9503c67c9 ext4: note the error in ext4_end_bio() ... Browse Code »

ext4_end_bio() currently throws away the error that it receives. Chances
are this is part of a spate of errors, one of which will end up getting
the error returned to userspace somehow, but we shouldn't take that risk.
Also print out the errno to aid in debug.

Signed-off-by: Matthew Wilcox
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Jan Kara
Cc: stable@vger.kernel.org

Matthew Wilcox
2014-04-07 22:54:20 +0800

24 Nov, 2013

2 commits

4f024f379 block: Abstract out bvec iterator ... Browse Code »

Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: Geert Uytterhoeven
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: "Ed L. Cashin"
Cc: Nick Piggin
Cc: Lars Ellenberg
Cc: Jiri Kosina
Cc: Matthew Wilcox
Cc: Geoff Levand
Cc: Yehuda Sadeh
Cc: Sage Weil
Cc: Alex Elder
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris
Cc: Philip Kelleher
Cc: Rusty Russell
Cc: "Michael S. Tsirkin"
Cc: Konrad Rzeszutek Wilk
Cc: Jeremy Fitzhardinge
Cc: Neil Brown
Cc: Alasdair Kergon
Cc: Mike Snitzer
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh
Cc: Benny Halevy
Cc: "James E.J. Bottomley"
Cc: Greg Kroah-Hartman
Cc: "Nicholas A. Bellinger"
Cc: Alexander Viro
Cc: Chris Mason
Cc: "Theodore Ts'o"
Cc: Andreas Dilger
Cc: Jaegeuk Kim
Cc: Steven Whitehouse
Cc: Dave Kleikamp
Cc: Joern Engel
Cc: Prasad Joshi
Cc: Trond Myklebust
Cc: KONISHI Ryusuke
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Ben Myers
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Len Brown
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: Herton Ronaldo Krzesinski
Cc: Ben Hutchings
Cc: Andrew Morton
Cc: Guo Chao
Cc: Tejun Heo
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Wei Yongjun
Cc: "Roger Pau Monné"
Cc: Jan Beulich
Cc: Stefano Stabellini
Cc: Ian Campbell
Cc: Sebastian Ott
Cc: Christian Borntraeger
Cc: Minchan Kim
Cc: Jiang Liu
Cc: Nitin Gupta
Cc: Jerome Marchand
Cc: Joe Perches
Cc: Peng Tao
Cc: Andy Adamson
Cc: fanchaoting
Cc: Jie Liu
Cc: Sunil Mushran
Cc: "Martin K. Petersen"
Cc: Namjae Jeon
Cc: Pankaj Kumar
Cc: Dan Magenheimer
Cc: Mel Gorman 6

Kent Overstreet
2013-11-24 14:33:47 +0800
2c30c71bd block: Convert various code to bio_for_each_segment() ... Browse Code »

With immutable biovecs we don't want code accessing bi_io_vec directly -
the uses this patch changes weren't incorrect since they all own the
bio, but it makes the code harder to audit for no good reason - also,
this will help with multipage bvecs later.

Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: Alexander Viro
Cc: Chris Mason
Cc: Jaegeuk Kim
Cc: Joern Engel
Cc: Prasad Joshi
Cc: Trond Myklebust

Kent Overstreet
2013-11-24 14:33:46 +0800

16 Oct, 2013

1 commit

78371a45d ext4: fix assertion in ext4_add_complete_io() ... Browse Code »

It doesn't make sense to require io_end->handle when we are in
nojournal mode. So update the assertion accordingly to avoid false
warnings from ext4_add_complete_io().

Reported-by: Eric Whitney
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-10-16 20:25:11 +0800

04 Sep, 2013

1 commit

7b7a8665e direct-io: Implement generic deferred AIO completions ... Browse Code »

Add support to the core direct-io code to defer AIO completions to user
context using a workqueue. This replaces opencoded and less efficient
code in XFS and ext4 (we save a memory allocation for each direct IO)
and will be needed to properly support O_(D)SYNC for AIO.

The communication between the filesystem and the direct I/O code requires
a new buffer head flag, which is a bit ugly but not avoidable until the
direct I/O code stops abusing the buffer_head structure for communicating
with the filesystems.

Currently this creates a per-superblock unbound workqueue for these
completions, which is taken from an earlier patch by Jan Kara. I'm
not really convinced about this use and would prefer a "normal" global
workqueue with a high concurrency limit, but this needs further discussion.

JK: Fixed ext4 part, dynamic allocation of the workqueue.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Christoph Hellwig
2013-09-04 21:23:46 +0800

12 Jul, 2013

1 commit

e8974c393 ext4: rate limit printk in buffer_io_error() ... Browse Code »

If there are a lot of outstanding buffered IOs when a device is
taken offline (due to hardware errors etc), ext4_end_bio prints
out a message for each failed logical block. While this is desirable,
we see thousands of such lines being printed out before the
serial console gets overwhelmed, causing ext4_end_bio() wait for
the printk to complete.

This in itself isn't a disaster, except for the detail that this
function is being called with the queue lock held.
This causes any other function in the block layer
to spin on its spin_lock_irqsave while the serial console is
draining. If NMI watchdog is enabled on this machine then it
eventually comes along and shoots the machine in the head.

The end result is that losing any one disk causes the machine to
go down. This patch rate limits the printk to bandaid around the
problem.

Tested: xfstests
Change-Id: I8ab5690dcf4f3a67e78be147d45e489fdf4a88d8
Signed-off-by: Anatol Pomozov
Signed-off-by: "Theodore Ts'o"

Anatol Pomozov
2013-07-12 10:42:42 +0800

11 Jul, 2013

1 commit

822dbba33 ext4: fix warning in ext4_evict_inode() ... Browse Code »

The following race can lead to ext4_evict_inode() seeing i_ioend_count
> 0 and thus triggering a sanity check warning:

CPU1 CPU2
ext4_end_bio() ext4_evict_inode()
ext4_finish_bio()
end_page_writeback();
truncate_inode_pages()
evict page
WARN_ON(i_ioend_count > 0);
ext4_put_io_end_defer()
ext4_release_io_end()
dec i_ioend_count

This is possible use-after-free bug since we decrement i_ioend_count in
possibly released inode.

Since i_ioend_count is used only for sanity checks one possible solution
would be to just remove it but for now I'd like to keep those sanity
checks to help debugging the new ext4 writeback code.

This patch changes ext4_end_bio() to call ext4_put_io_end_defer() before
ext4_finish_bio() in the shortcut case when unwritten extent conversion
isn't needed. In that case we don't need the io_end so we are safe to
drop it early.

Reported-by: Guenter Roeck
Tested-by: Guenter Roeck
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-07-11 09:31:04 +0800

06 Jun, 2013

1 commit

a1d8d9a75 ext4: add check to io_submit_init_bio ... Browse Code »

The bio_alloc() function can return NULL if the memory allocation
fails. So we need to check for this.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2013-06-06 22:18:22 +0800

05 Jun, 2013

8 commits

5dc23bdd5 ext4: remove ext4_ioend_wait() ... Browse Code »

Now that we clear PageWriteback after extent conversion, there's no
need to wait for io_end processing in ext4_evict_inode(). Running
AIO/DIO keeps file reference until aio_complete() is called so
ext4_evict_inode() cannot be called. For io_end structures resulting
from buffered IO waiting is happening because we wait for
PageWriteback in truncate_inode_pages().

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 02:46:12 +0800
c724585b6 ext4: don't wait for extent conversion in ext4_punch_hole() ... Browse Code »

We don't have to wait for extent conversion in ext4_punch_hole() as
buffered IO for the punched range has been flushed and waited upon
(thus all extent conversions for that range have completed). Also we
wait for all DIO to finish using inode_dio_wait() so there cannot be
any extent conversions pending due to direct IO.

Also remove ext4_flush_unwritten_io() since it's unused now.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 02:44:36 +0800
a115f749c ext4: remove wait for unwritten extent conversion from ext4_truncate() ... Browse Code »

Since PageWriteback bit is now cleared after extents are converted
from unwritten to written ones, we have full exclusion of writeback
path from truncate (truncate_inode_pages() waits for PageWriteback
bits to get cleared on all invalidated pages). Exclusion from DIO
path is achieved by inode_dio_wait() call in ext4_setattr(). So
there's no need to wait for extent convertion in ext4_truncate()
anymore.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 02:30:00 +0800
b0857d309 ext4: defer clearing of PageWriteback after extent conversion ... Browse Code »

Currently PageWriteback bit gets cleared from put_io_page() called
from ext4_end_bio(). This is somewhat inconvenient as extent tree is
not fully updated at that time (unwritten extents are not marked as
written) so we cannot read the data back yet. This design was
dictated by lock ordering as we cannot start a transaction while
PageWriteback bit is set (we could easily deadlock with
ext4_da_writepages()). But now that we use transaction reservation
for extent conversion, locking issues are solved and we can move
PageWriteback bit clearing after extent conversion is done. As a
result we can remove wait for unwritten extent conversion from
ext4_sync_file() because it already implicitely happens through
wait_on_page_writeback().

We implement deferring of PageWriteback clearing by queueing completed
bios to appropriate io_end and processing all the pages when io_end is
going to be freed instead of at the moment ext4_io_end() is called.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 02:23:41 +0800
2e8fa54e3 ext4: split extent conversion lists to reserved & unreserved parts ... Browse Code »

Now that we have extent conversions with reserved transaction, we have
to prevent extent conversions without reserved transaction (from DIO
code) to block these (as that would effectively void any transaction
reservation we did). So split lists, work items, and work queues to
reserved and unreserved parts.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 02:21:02 +0800
6b523df4f ext4: use transaction reservation for extent conversion in ext4_end_io ... Browse Code »

Later we would like to clear PageWriteback bit only after extent
conversion from unwritten to written extents is performed. However it
is not possible to start a transaction after PageWriteback is set
because that violates lock ordering (and is easy to deadlock). So we
have to reserve a transaction before locking pages and sending them
for IO and later we use the transaction for extent conversion from
ext4_end_io().

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 01:21:11 +0800
3613d2280 ext4: remove buffer_uninit handling ... Browse Code »

There isn't any need for setting BH_Uninit on buffers anymore. It was
only used to signal we need to mark io_end as needing extent
conversion in add_bh_to_extent() but now we can mark the io_end
directly when mapping extent.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 01:19:34 +0800
4e7ea81db ext4: restructure writeback path ... Browse Code »

There are two issues with current writeback path in ext4. For one we
don't necessarily map complete pages when blocksize < pagesize and
thus needn't do any writeback in one iteration. We always map some
blocks though so we will eventually finish mapping the page. Just if
writeback races with other operations on the file, forward progress is
not really guaranteed. The second problem is that current code
structure makes it hard to associate all the bios to some range of
pages with one io_end structure so that unwritten extents can be
converted after all the bios are finished. This will be especially
difficult later when io_end will be associated with reserved
transaction handle.

We restructure the writeback path to a relatively simple loop which
first prepares extent of pages, then maps one or more extents so that
no page is partially mapped, and once page is fully mapped it is
submitted for IO. We keep all the mapping and IO submission
information in mpage_da_data structure to somewhat reduce stack usage.
Resulting code is somewhat shorter than the old one and hopefully also
easier to read.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 01:17:40 +0800

04 Jun, 2013

1 commit

97a851ed7 ext4: use io_end for multiple bios ... Browse Code »

Change writeback path to create just one io_end structure for the
extent to which we submit IO and share it among bios writing that
extent. This prevents needless splitting and joining of unwritten
extents when they cannot be submitted as a single bio.

Bugs in ENOMEM handling found by Linux File System Verification project
(linuxtesting.org) and fixed by Alexey Khoroshilov
.

CC: Alexey Khoroshilov
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-04 23:58:58 +0800

15 May, 2013

1 commit

b973425cb Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 update from Ted Ts'o:
"Fixed regressions (two stability regressions and a performance
regression) introduced during the 3.10-rc1 merge window.

Also included is a bug fix relating to allocating blocks after
resizing an ext3 file system when using the ext4 file system driver"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd,jbd2: fix oops in jbd2_journal_put_journal_head()
ext4: revert "ext4: use io_end for multiple bios"
ext4: limit group search loop for non-extent files
ext4: fix fio regression

Linus Torvalds
2013-05-15 00:30:54 +0800

12 May, 2013

1 commit

a549984b8 ext4: revert "ext4: use io_end for multiple bios" ... Browse Code »

This reverts commit 4eec708d263f0ee10861d69251708a225b64cac7.

Multiple users have reported crashes which is apparently caused by
this commit. Thanks to Dmitry Monakhov for bisecting it.

Signed-off-by: "Theodore Ts'o"
Cc: Dmitry Monakhov
Cc: Jan Kara

Theodore Ts'o
2013-05-12 07:07:42 +0800

08 May, 2013

1 commit

a27bb332c aio: don't include aio.h in sched.h ... Browse Code »

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 11:16:25 +0800

12 Apr, 2013

3 commits

7b001d6a0 ext4: clear buffer_uninit flag when submitting IO ... Browse Code »

Currently noone cleared buffer_uninit flag. This results in writeback
needlessly marking io_end as needing extent conversion scanning extent
tree for extents to convert. So clear the buffer_uninit flag once the
buffer is submitted for IO and the flag is transformed into
EXT4_IO_END_UNWRITTEN flag.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Zheng Liu

Jan Kara
2013-04-12 12:03:19 +0800
4eec708d2 ext4: use io_end for multiple bios ... Browse Code »

Change writeback path to create just one io_end structure for the
extent to which we submit IO and share it among bios writing that
extent. This prevents needless splitting and joining of unwritten
extents when they cannot be submitted as a single bio.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Dmitry Monakhov
Reviewed-by: Zheng Liu

Jan Kara
2013-04-12 11:56:53 +0800
0058f9658 ext4: make ext4_bio_write_page() use BH_Async_Write flags ... Browse Code »

So far ext4_bio_write_page() attached all the pages to ext4_io_end
structure. This makes that structure pretty heavy (1 KB for pointers
+ 16 bytes per page attached to the bio). Also later we would like to
share ext4_io_end structure among several bios in case IO to a single
extent needs to be split among several bios and pointing to pages from
ext4_io_end makes this complex.

We remove page pointers from ext4_io_end and use pointers from bio
itself instead. This isn't as easy when blocksize < pagesize because
then we can have several bios in flight for a single page and we have
to be careful when to call end_page_writeback(). However this is a
known problem already solved by block_write_full_page() /
end_buffer_async_write() so we mimic its behavior here. We mark
buffers going to disk with BH_Async_Write flag and in
ext4_bio_end_io() we check whether there are any buffers with
BH_Async_Write flag left. If there are not, we can call
end_page_writeback().

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Dmitry Monakhov
Reviewed-by: Zheng Liu

Jan Kara
2013-04-12 11:48:32 +0800

20 Mar, 2013

1 commit

1ada47d94 ext4: fix ext4_evict_inode() racing against workqueue processing code ... Browse Code »

Commit 84c17543ab56 (ext4: move work from io_end to inode) triggered a
regression when running xfstest #270 when the file system is mounted
with dioread_nolock.

The problem is that after ext4_evict_inode() calls ext4_ioend_wait(),
this guarantees that last io_end structure has been freed, but it does
not guarantee that the workqueue structure, which was moved into the
inode by commit 84c17543ab56, is actually finished. Once
ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this
will allow ext4_ioend_wait() to return on CPU #2, at which point the
evict_inode() codepath can race against the workqueue code on CPU #1
accessing EXT4_I(inode)->i_unwritten_work to find the next item of
work to do.

Fix this by calling cancel_work_sync() in ext4_ioend_wait(), which
will be renamed ext4_ioend_shutdown(), since it is only used by
ext4_evict_inode(). Also, move the call to ext4_ioend_shutdown()
until after truncate_inode_pages() and filemap_write_and_wait() are
called, to make sure all dirty pages have been written back and
flushed from the page cache first.

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] cwq_activate_delayed_work+0x3b/0x7e
*pdpt = 0000000030bc3001 *pde = 0000000000000000
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in:
Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c1754-dirty #91 Bochs Bochs
EIP: 0060:[] EFLAGS: 00010046 CPU: 0
EIP is at cwq_activate_delayed_work+0x3b/0x7e
EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000
ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000)
Stack:
f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d
f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0
f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000
Call Trace:
[] cwq_dec_nr_in_flight+0x71/0xfb
[] process_one_work+0x5d8/0x637
[] ? ext4_end_bio+0x300/0x300
[] worker_thread+0x249/0x3ef
[] kthread+0xd8/0xeb
[] ? manage_workers+0x4bb/0x4bb
[] ? trace_hardirqs_on+0x27/0x37
[] ret_from_kernel_thread+0x1b/0x28
[] ? __init_kthread_worker+0x71/0x71
Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff 13 89 f0 83 05 b8 ff 6c c1
6c c1 00 31 c9 83
EIP: [] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84
CR2: 0000000000000000
---[ end trace a1923229da53d8a4 ]---

Signed-off-by: "Theodore Ts'o"
Cc: Jan Kara

Theodore Ts'o
2013-03-20 21:39:42 +0800

30 Jan, 2013

1 commit

091e26dfc ext4: fix possible use-after-free with AIO ... Browse Code »

Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.

CC: stable@vger.kernel.org
Reviewed-by: Carlos Maiolino
Acked-by: Jeff Moyer
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-30 11:48:17 +0800

29 Jan, 2013

2 commits

cfa727548 ext4: remove unused variable flags ... Browse Code »

Remove unused variable flags from dump_completed_IO(). The code is
only exercised when EXT4FS_DEBUG is defined.

Signed-off-by: Lukas Czerner
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Zheng Liu

Lukas Czerner
2013-01-29 10:14:11 +0800
8a850c3fb ext4: Make ext4_bio_writepage() handle unprepared buffers ... Browse Code »

So far ext4_bio_writepage() unconditionally cleared dirty bit on all
buffers underlying the page. That implicitely assumes we can write all
buffers. So far that is true because callers call into
ext4_bio_writepage() make sure all buffers in the page are mapped but:

a) it's a data corruption bug waiting to happen
b) in data=ordered mode when blocksize < pagesize we do need to write
pages that may have only some of dirty buffers mapped.

So change ext4_bio_writepage() to skip buffers that cannot be written without
clearing their dirty bit.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-29 09:53:28 +0800

28 Jan, 2013

4 commits

002bd7fa3 ext4: simplify list handling in ext4_do_flush_completed_IO() ... Browse Code »

The function splices i_completed_io_list to its private list
first. From that moment on we don't need any lock for working with
io_end structures because all io_end structure on the list are only
our own. So we can remove the other two lists in the function and free
io_end immediately after we are done with it.

CC: Dmitry Monakhov
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-28 22:49:15 +0800
84c17543a ext4: move work from io_end to inode ... Browse Code »

It does not make much sense to have struct work in ext4_io_end_t
because we always use it for only one ext4_io_end_t per inode (the
first one in the i_completed_io list). So just move the structure to
inode itself. This also allows for a small simplification in
processing io_end structures.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-28 22:43:46 +0800
1ae48a635 ext4: use redirty_page_for_writepage() in ext4_bio_write_page() ... Browse Code »

When we cannot write a page we should use redirty_page_for_writepage()
instead of plain set_page_dirty(). That tells writeback code we have
problems, redirties only the page (redirtying buffers is not needed),
and updates mm accounting of failed page writes.

Also move clearing of buffer dirty flag after io_submit_add_bh(). At that
moment we are sure buffer will be going to disk.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-28 22:32:54 +0800
36ade451a ext4: Always use ext4_bio_write_page() for writeout ... Browse Code »

Currently we sometimes used block_write_full_page() and sometimes
ext4_bio_write_page() for writeback (depending on mount options and call
path). Let's always use ext4_bio_write_page() to simplify things a bit.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-01-28 22:30:52 +0800

29 Nov, 2012

1 commit

4a092d737 ext4: rationalize ext4_extents.h inclusion ... Browse Code »

Previously, ext4_extents.h was being included at the end of ext4.h,
which was bad for a number of reasons: (a) it was not being included
in the expected place, and (b) it caused the header to be included
multiple times. There were #ifdef's to prevent this from causing any
problems, but it still was unnecessary.

By moving the function declarations that were in ext4_extents.h to
ext4.h, which is standard practice for where the function declarations
for the rest of ext4.h can be found, we can remove ext4_extents.h from
being included in ext4.h at all, and then we can only include
ext4_extents.h where it is needed in ext4's source files.

It should be possible to move a few more things into ext4.h, and
further reduce the number of source files that need to #include
ext4_extents.h, but that's a cleanup for another day.

Reported-by: Sachin Kamat
Reported-by: Wei Yongjun
Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2012-11-29 02:03:30 +0800

09 Nov, 2012

1 commit

8d8c18257 ext4: use 'inode' variable that is already dereferenced ... Browse Code »

Tested: xfs tests

Reviewed-by: Zheng Liu
Signed-off-by: Anatol Pomozov
Signed-off-by: "Theodore Ts'o"

Anatol Pomozov
2012-11-09 03:53:35 +0800

05 Oct, 2012

1 commit

c278531d3 ext4: fix ext4_flush_completed_IO wait semantics ... Browse Code »

BUG #1) All places where we call ext4_flush_completed_IO are broken
because buffered io and DIO/AIO goes through three stages
1) submitted io,
2) completed io (in i_completed_io_list) conversion pended
3) finished io (conversion done)
And by calling ext4_flush_completed_IO we will flush only
requests which were in (2) stage, which is wrong because:
1) punch_hole and truncate _must_ wait for all outstanding unwritten io
regardless to it's state.
2) fsync and nolock_dio_read should also wait because there is
a time window between end_page_writeback() and ext4_add_complete_io()
As result integrity fsync is broken in case of buffered write
to fallocated region:
fsync blkdev_completion
->filemap_write_and_wait_range
->ext4_end_bio
->end_page_writeback
ext4_flush_completed_IO
sees empty i_completed_io_list but pended
conversion still exist
->ext4_add_complete_io

BUG #2) Race window becomes wider due to the 'ext4: completed_io
locking cleanup V4' patch series

This patch make following changes:
1) ext4_flush_completed_io() now first try to flush completed io and when
wait for any outstanding unwritten io via ext4_unwritten_wait()
2) Rename function to more appropriate name.
3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to
prevent endless wait

Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Jan Kara

Dmitry Monakhov
2012-10-05 23:31:55 +0800

29 Sep, 2012

2 commits

28a535f9a ext4: completed_io locking cleanup ... Browse Code »

Current unwritten extent conversion state-machine is very fuzzy.
- For unknown reason it performs conversion under i_mutex. What for?
My diagnosis:
We already protect extent tree with i_data_sem, truncate and punch_hole
should wait for DIO, so the only data we have to protect is end_io->flags
modification, but only flush_completed_IO and end_io_work modified this
flags and we can serialize them via i_completed_io_lock.

Currently all these games with mutex_trylock result in the following deadlock
truncate: kworker:
ext4_setattr ext4_end_io_work
mutex_lock(i_mutex)
inode_dio_wait(inode) ->BLOCK
DEADLOCKflags modification
is protected by ei->ext4_complete_io_lock

Full list of changes:
- Move all completion end_io related routines to page-io.c in order to improve
logic locality
- Move open coded logic from various xx_end_xx routines to ext4_add_complete_io()
- remove EXT4_IO_END_FSYNC
- Improve SMP scalability by removing useless i_mutex which does not
protect io->flags anymore.
- Reduce lock contention on i_completed_io_lock by optimizing list walk.
- Rename ext4_end_io_nolock to end4_end_io and make it static
- Check flush completion status to ext4_ext_punch_hole(). Because it is
not good idea to punch blocks from corrupted inode.

Changes since V3 (in request to Jan's comments):
Fall back to active flush_completed_IO() approach in order to prevent
performance issues with nolocked DIO reads.
Changes since V2:
Fix use-after-free caused by race truncate vs end_io_work

Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"

Dmitry Monakhov
2012-09-29 12:14:55 +0800
82e542291 ext4: fix unwritten counter leakage ... Browse Code »

ext4_set_io_unwritten_flag() will increment i_unwritten counter, so
once we mark end_io with EXT4_END_IO_UNWRITTEN we have to revert it back
on error path.

- add missed error checks to prevent counter leakage
- ext4_end_io_nolock() will clear EXT4_END_IO_UNWRITTEN flag to signal
that conversion finished.
- add BUG_ON to ext4_free_end_io() to prevent similar leakage in future.

Visible effect of this bug is that unaligned aio_stress may deadlock

Reviewed-by: Jan Kara
Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"

Dmitry Monakhov
2012-09-29 11:36:25 +0800