Eric Lee / smarc-fsl-linux-kernel

16 Sep, 2009

12 commits

1ef7d9aa3 writeback: fix possible bdi writeback refcounting problem ... Browse Code »

wb_clear_pending AFAIKS should not be called after the item has been
put on the list, except by the worker threads. It could lead to the
situation where the refcount is decremented below 0 and cause lots of
problems.

Presumably the !wb_has_dirty_io case is not a common one, so it can
be discovered when the thread wakes up to check?

Also add a comment in bdi_work_clear.

Signed-off-by: Nick Piggin
Signed-off-by: Jens Axboe

Nick Piggin
2009-09-16 21:18:53 +0800
77b9d059c writeback: Fix bdi use after free in wb_work_complete() ... Browse Code »

By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it
can already have been deallocated and used for something else in the
!on stack case, giving a false positive in this test and causing
corruption.

Signed-off-by: Nick Piggin
Signed-off-by: Jens Axboe

Nick Piggin
2009-09-16 21:18:52 +0800
77fad5e62 writeback: improve scalability of bdi writeback work queues ... Browse Code »

If you're going to do an atomic RMW on each list entry, there's not much
point in all the RCU complexities of the list walking. This is only going
to help the multi-thread case I guess, but it doesn't hurt to do now.

Signed-off-by: Nick Piggin
Signed-off-by: Jens Axboe

Nick Piggin
2009-09-16 21:18:52 +0800
deed62edf writeback: remove smp_mb(), it's not needed with list_add_tail_rcu() ... Browse Code »

list_add_tail_rcu contains required barriers.

Signed-off-by: Nick Piggin
Signed-off-by: Jens Axboe

Nick Piggin
2009-09-16 21:18:52 +0800
49db04143 writeback: use schedule_timeout_interruptible() ... Browse Code »

Gets rid of a manual set_current_state().

Signed-off-by: Jens Axboe

Jens Axboe
2009-09-16 21:18:52 +0800
8010c3b63 writeback: add comments to bdi_work structure ... Browse Code »

And document its retriever, get_next_work_item().

Acked-by: Jan Kara
Signed-off-by: Jens Axboe

Jens Axboe
2009-09-16 21:18:52 +0800
b6e51316d writeback: separate starting of sync vs opportunistic writeback ... Browse Code »

bdi_start_writeback() is currently split into two paths, one for
WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback()
for WB_SYNC_ALL writeback and let bdi_start_writeback() handle
only WB_SYNC_NONE.

Push down the writeback_control allocation and only accept the
parameters that make sense for each function. This cleans up
the API considerably.

Signed-off-by: Jens Axboe

Jens Axboe
2009-09-16 21:18:52 +0800
bcddc3f01 writeback: inline allocation failure handling in bdi_alloc_queue_work() ... Browse Code »

This gets rid of work == NULL in bdi_queue_work() and puts the
OOM handling where it belongs.

Acked-by: Jan Kara
Signed-off-by: Jens Axboe

Jens Axboe
2009-09-16 21:18:52 +0800
cfc4ba536 writeback: use RCU to protect bdi_list ... Browse Code »

Now that bdi_writeback_all() no longer handles integrity writeback,
it doesn't have to block anymore. This means that we can switch
bdi_list reader side protection to RCU.

Signed-off-by: Jens Axboe

Jens Axboe
2009-09-16 21:18:51 +0800
f11fcae84 writeback: only use bdi_writeback_all() for WB_SYNC_NONE writeout ... Browse Code »

Data integrity writeback must use bdi_start_writeback() and ensure
that wbc->sb and wbc->bdi are set.

Acked-by: Jan Kara
Signed-off-by: Jens Axboe

Jens Axboe
2009-09-16 21:18:51 +0800
c4a77a6c7 writeback: make wb_writeback() take an argument structure ... Browse Code »

We need to be able to pass in range_cyclic as well, so instead
of growing yet another argument, split the arguments into a
struct wb_writeback_args structure that we can use internally.
Also makes it easier to just copy all members to an on-stack
struct, since we can't access work after clearing the pending
bit.

Signed-off-by: Jens Axboe

Jens Axboe
2009-09-16 21:18:25 +0800
f0fad8a53 writeback: merely wakeup flusher thread if work allocation fails for WB_SYNC_NONE ... Browse Code »

Since it's an opportunistic writeback and not a data integrity action,
don't punt to blocking writeback. Just wakeup the thread and it will
flush old data.

Acked-by: Jan Kara
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2009-09-16 21:16:18 +0800

14 Sep, 2009

1 commit

18f2ee705 vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() ... Browse Code »

Remove these three functions since nobody uses them anymore.

Signed-off-by: Jan Kara

Jan Kara
2009-09-14 23:08:17 +0800

11 Sep, 2009

5 commits

500b067c5 writeback: check for registered bdi in flusher add and inode dirty ... Browse Code »

Also a debugging aid. We want to catch dirty inodes being added to
backing devices that don't do writeback.

Signed-off-by: Jens Axboe

Jens Axboe
2009-09-11 15:20:26 +0800
d0bceac74 writeback: get rid of pdflush completely ... Browse Code »

It is now unused, so kill it off.

Signed-off-by: Jens Axboe

Jens Axboe
2009-09-11 15:20:25 +0800
03ba3782e writeback: switch to per-bdi threads for flushing data ... Browse Code »

This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:

r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42
0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44
1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37
0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58
0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34
0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37
0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44
0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38
0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41
0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45

where vanilla tends to fluctuate a lot in the creation phase:

r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36
1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51
0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40
0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37
1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41
0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49
0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36
1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43
0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39
1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45
1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34
0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54

A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as does random mmap'ed
writes.

A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.

Signed-off-by: Jens Axboe

Jens Axboe
2009-09-11 15:20:25 +0800
66f3b8e2e writeback: move dirty inodes from super_block to backing_dev_info ... Browse Code »

This is a first step at introducing per-bdi flusher threads. We should
have no change in behaviour, although sb_has_dirty_inodes() is now
ridiculously expensive, as there's no easy way to answer that question.
Not a huge problem, since it'll be deleted in subsequent patches.

Signed-off-by: Jens Axboe

Jens Axboe
2009-09-11 15:20:25 +0800
d8a8559cd writeback: get rid of generic_sync_sb_inodes() export ... Browse Code »

This adds two new exported functions:

- writeback_inodes_sb(), which only attempts to writeback dirty inodes on
this super_block, for WB_SYNC_NONE writeout.
- sync_inodes_sb(), which writes out all dirty inodes on this super_block
and also waits for the IO to complete.

Acked-by: Jan Kara
Signed-off-by: Jens Axboe

Jens Axboe
2009-09-11 15:20:25 +0800

24 Jun, 2009

1 commit

01c031945 cleanup __writeback_single_inode ... Browse Code »

There is no reason to for the split between __writeback_single_inode and
__sync_single_inode, the former just does a couple of checks before
tail-calling the latter. So merge the two, and while we're at it split
out the I_SYNC waiting case for data integrity writers, as it's
logically separate function. Finally rename __writeback_single_inode to
writeback_single_inode.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-24 20:15:26 +0800

17 Jun, 2009

1 commit

84a892456 writeback: skip new or to-be-freed inodes ... Browse Code »

1) I_FREEING tests should be coupled with I_CLEAR

The two I_FREEING tests are racy because clear_inode() can set i_state to
I_CLEAR between the clear of I_SYNC and the test of I_FREEING.

2) skip I_WILL_FREE inodes in generic_sync_sb_inodes() to avoid possible
races with generic_forget_inode()

generic_forget_inode() sets I_WILL_FREE call writeback on its own, so
generic_sync_sb_inodes() shall not try to step in and create possible races:

generic_forget_inode
inode->i_state |= I_WILL_FREE;
spin_unlock(&inode_lock);
generic_sync_sb_inodes()
spin_lock(&inode_lock);
__iget(inode);
__writeback_single_inode
// see non zero i_count
may WARN here ==> WARN_ON(inode->i_state & I_WILL_FREE);
spin_unlock(&inode_lock);
may call generic_forget_inode again ==> iput(inode);

The above race and warning didn't turn up because writeback_inodes() holds
the s_umount lock, so generic_forget_inode() finds MS_ACTIVE and returns
early. But we are not sure the UBIFS calls and future callers will
guarantee that. So skip I_WILL_FREE inodes for the sake of safety.

Cc: Eric Sandeen
Acked-by: Jeff Layton
Cc: Masayoshi MIZUMA
Signed-off-by: Wu Fengguang
Cc: Artem Bityutskiy
Cc: Christoph Hellwig
Acked-by: Jan Kara
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-06-17 10:47:45 +0800

12 Jun, 2009

3 commits

4195f73d1 fs: block_dump missing dentry locking ... Browse Code »

I think the block_dump output in __mark_inode_dirty is missing dentry locking.
Surely the i_dentry list can change any time, so we may not even *get* a
dentry there. If we do get one by chance, then it would appear to be able to
go away or get renamed at any time...

Signed-off-by: Al Viro

Nick Piggin
2009-06-12 09:36:10 +0800
545b9fd3d fs: remove incorrect I_NEW warnings ... Browse Code »

Some filesystems can call in to sync an inode that is still in the
I_NEW state (eg. ext family, when mounted with -osync). This is OK
because the filesystem has sole access to the new inode, so it can
modify i_state without races (because no other thread should be
modifying it, by definition of I_NEW). Ie. a false positive, so
remove the warnings.

The races are described here 7ef0d7377cb287e08f3ae94cebc919448e1f5dff,
which is also where the warnings were introduced.

Reported-by: Stephen Hemminger
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

Nick Piggin
2009-06-12 09:36:10 +0800
5cee5815d vfs: Make sys_sync() use fsync_super() (version 4) ... Browse Code »

It is unnecessarily fragile to have two places (fsync_super() and do_sync())
doing data integrity sync of the filesystem. Alter __fsync_super() to
accommodate needs of both callers and use it. So after this patch
__fsync_super() is the only place where we gather all the calls needed to
properly send all data on a filesystem to disk.

Nice bonus is that we get a complete livelock avoidance and write_supers()
is now only used for periodic writeback of superblocks.

sync_blockdevs() introduced a couple of patches ago is gone now.

[build fixes folded]

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2009-06-12 09:36:03 +0800

04 Apr, 2009

1 commit

811158b14 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
trivial: Update my email address
trivial: NULL noise: drivers/mtd/tests/mtd_*test.c
trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h
trivial: Fix misspelling of "Celsius".
trivial: remove unused variable 'path' in alloc_file()
trivial: fix a pdlfush -> pdflush typo in comment
trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL
trivial: wusb: Storage class should be before const qualifier
trivial: drivers/char/bsr.c: Storage class should be before const qualifier
trivial: h8300: Storage class should be before const qualifier
trivial: fix where cgroup documentation is not correctly referred to
trivial: Give the right path in Documentation example
trivial: MTD: remove EOL from MODULE_DESCRIPTION
trivial: Fix typo in bio_split()'s documentation
trivial: PWM: fix of #endif comment
trivial: fix typos/grammar errors in Kconfig texts
trivial: Fix misspelling of firmware
trivial: cgroups: documentation typo and spelling corrections
trivial: Update contact info for Jochen Hein
trivial: fix typo "resgister" -> "register"
...

Linus Torvalds
2009-04-04 06:24:35 +0800

03 Apr, 2009

2 commits

d2caa3c54 writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3) ... Browse Code »

The dirtied_when value on an inode is supposed to represent the first time
that an inode has one of its pages dirtied. This value is in units of
jiffies. It's used in several places in the writeback code to determine
when to write out an inode.

The problem is that these checks assume that dirtied_when is updated
periodically. If an inode is continuously being used for I/O it can be
persistently marked as dirty and will continue to age. Once the time
compared to is greater than or equal to half the maximum of the jiffies
type, the logic of the time_*() macros inverts and the opposite of what is
needed is returned. On 32-bit architectures that's just under 25 days
(assuming HZ == 1000).

As the least-recently dirtied inode, it'll end up being the first one that
pdflush will try to write out. sync_sb_inodes does this check:

/* Was this inode dirtied after sync_sb_inodes was called? */
if (time_after(inode->dirtied_when, start))
break;

...but now dirtied_when appears to be in the future. sync_sb_inodes bails
out without attempting to write any dirty inodes. When this occurs,
pdflush will stop writing out inodes for this superblock. Nothing can
unwedge it until jiffies moves out of the problematic window.

This patch fixes this problem by changing the checks against dirtied_when
to also check whether it appears to be in the future. If it does, then we
consider the value to be far in the past.

This should shrink the problematic window of time to such a small period
(30s) as not to matter.

Signed-off-by: Jeff Layton
Signed-off-by: Wu Fengguang
Acked-by: Ian Kent
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Layton
2009-04-03 10:04:48 +0800
b6fac63cc vfs: skip I_CLEAR state inodes ... Browse Code »

clear_inode() will switch inode state from I_FREEING to I_CLEAR, and do so
_outside_ of inode_lock. So any I_FREEING testing is incomplete without a
coupled testing of I_CLEAR.

So add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and
add_dquot_ref().

Masayoshi MIZUMA discovered the bug in drop_pagecache_sb() and Jan Kara
reminds fixing the other two cases.

Masayoshi MIZUMA has a nice panic flow:

=====================================================================
[process A] | [process B]
| |
| prune_icache() | drop_pagecache()
| spin_lock(&inode_lock) | drop_pagecache_sb()
| inode->i_state |= I_FREEING; | |
| spin_unlock(&inode_lock) | V
| | | spin_lock(&inode_lock)
| V | |
| dispose_list() | |
| list_del() | |
| clear_inode() | |
| inode->i_state = I_CLEAR | |
| | | V
| | | if (inode->i_state & (I_FREEING|I_WILL_FREE))
| | | continue;
Reviewed-by: Jan Kara
Signed-off-by: Wu Fengguang
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-04-03 10:04:48 +0800

30 Mar, 2009

1 commit

3e3cb64f6 trivial: fix a pdlfush -> pdflush typo in comment ... Browse Code »

Signed-off-by: Masatake YAMATO
Signed-off-by: Jiri Kosina

Masatake YAMATO
2009-03-30 21:22:03 +0800

13 Mar, 2009

1 commit

7ef0d7377 fs: new inode i_state corruption fix ... Browse Code »

There was a report of a data corruption
http://lkml.org/lkml/2008/11/14/121. There is a script included to
reproduce the problem.

During testing, I encountered a number of strange things with ext3, so I
tried ext2 to attempt to reduce complexity of the problem. I found that
fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
cleared, even though instrumentation showed that unlock_new_inode had
already been called for that inode. This points to memory scribble, or
synchronisation problme.

i_state of I_NEW inodes is not protected by inode_lock because other
processes are not supposed to touch them until I_LOCK (and I_NEW) is
cleared. Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
i_state revealed that generic_sync_sb_inodes is picking up new inodes from
the inode lists and passing them to __writeback_single_inode without
waiting for I_NEW. Subsequently modifying i_state causes corruption. In
my case it would look like this:

CPU0 CPU1
unlock_new_inode() __sync_single_inode()
reg i_state
reg -> reg & ~(I_LOCK|I_NEW) reg i_state
reg -> inode->i_state reg -> reg | I_SYNC
reg -> inode->i_state

Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.

Fix for this is rather than wait for I_NEW inodes, just skip over them:
inodes concurrently being created are not subject to data integrity
operations, and should not significantly contribute to dirty memory
either.

After this change, I'm unable to reproduce any of the added warnings or
hangs after ~1hour of running. Previously, the new warnings would start
immediately and hang would happen in under 5 minutes.

I'm also testing on ext3 now, and so far no problems there either. I
don't know whether this fixes the problem reported above, but it fixes a
real problem for me.

Cc: "Jorge Boncompte [DTI2]"
Reported-by: Adrian Hunter
Cc: Jan Kara
Cc:
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2009-03-13 07:20:24 +0800

07 Jan, 2009

3 commits

856bf4d71 fs: sys_sync fix ... Browse Code »

s_syncing livelock avoidance was breaking data integrity guarantee of
sys_sync, by allowing sys_sync to skip writing or waiting for superblocks
if there is a concurrent sys_sync happening.

This livelock avoidance is much less important now that we don't have the
get_super_to_sync() call after every sb that we sync. This was replaced
by __put_super_and_need_restart.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2009-01-07 07:59:09 +0800
38f219776 fs: sync_sb_inodes fix ... Browse Code »

Fix data integrity semantics required by sys_sync, by iterating over all
inodes and waiting for any writeback pages after the initial writeout.
Comments explain the exact problem.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2009-01-07 07:59:09 +0800
4f5a99d64 fs: remove WB_SYNC_HOLD ... Browse Code »

Remove WB_SYNC_HOLD. The primary motiviation is the design of my
anti-starvation code for fsync. It requires taking an inode lock over the
sync operation, so we could run into lock ordering problems with multiple
inodes. It is possible to take a single global lock to solve the ordering
problem, but then that would prevent a future nice implementation of "sync
multiple inodes" based on lock order via inode address.

Seems like a backward step to remove this, but actually it is busted
anyway: we can't use the inode lists for data integrity wait: an inode can
be taken off the dirty lists but still be under writeback. In order to
satisfy data integrity semantics, we should wait for it to finish
writeback, but if we only search the dirty lists, we'll miss it.

It would be possible to have a "writeback" list, for sys_sync, I suppose.
But why complicate things by prematurely optimise? For unmounting, we
could avoid the "livelock avoidance" code, which would be easier, but
again premature IMO.

Fixing the existing data integrity problem will come next.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2009-01-07 07:59:09 +0800

17 Oct, 2008

1 commit

e1f8e8744 Remove Andrew Morton's old email accounts ... Browse Code »

People can use the real name an an index into MAINTAINERS to find the
current email address.

Signed-off-by: Francois Cami
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Francois Cami
2008-10-17 02:21:32 +0800

15 Jul, 2008

2 commits

4ee6afd34 VFS: export sync_sb_inodes ... Browse Code »

This patch exports the 'sync_sb_inodes()' which is needed for
UBIFS because it has to force write-back from time to time.
Namely, the UBIFS budgeting subsystem forces write-back when
its pessimistic callculations show that there is no free
space on the media.

Signed-off-by: Artem Bityutskiy

Artem Bityutskiy
2008-07-15 00:10:52 +0800
ae8547b0a VFS: move inode_lock into sync_sb_inodes ... Browse Code »

This patch makes 'sync_sb_inodes()' lock 'inode_lock', rather
than expect that the caller will do this.

This change was previously done by Hans Reiser
and sat in the -mm tree.

Signed-off-by: Artem Bityutskiy

Hans Reiser
2008-07-15 00:10:52 +0800

29 Apr, 2008

1 commit

f11b00f3b fs/fs-writeback.c: make 2 functions static ... Browse Code »

Make the following needlessly global functions static:

- writeback_acquire()
- writeback_release()

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-04-29 23:06:00 +0800

20 Mar, 2008

1 commit

a6b91919e fs: fix kernel-doc notation warnings ... Browse Code »

Fix kernel-doc notation warnings in fs/.

Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
* mark_files_ro
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
* lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
* lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
* lookup_one_len: filesystem helper to lookup single pathname component
Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
* bh_uptodate_or_lock: Test whether the buffer is uptodate
Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
* bh_submit_read: Submit a locked buffer for reading
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
* writeback_acquire: attempt to get exclusive writeback access to a device
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
* writeback_in_progress: determine whether there is writeback in progress
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
* writeback_release: relinquish exclusive writeback access against a device.
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
* void journal_invalidatepage()

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2008-03-20 09:53:36 +0800

09 Feb, 2008

1 commit

18914b188 write_inode_now(): avoid unnecessary synchronous write ... Browse Code »

We shouldn't use WB_SYNC_ALL if the caller is asking for asynchronous
treatment.

Signed-off-by: Mike Galbraith
Acked-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Galbraith
2008-02-09 01:22:34 +0800

07 Feb, 2008

1 commit

797074e44 fs: use list_for_each_entry_reverse and kill sb_entry ... Browse Code »

Use list_for_each_entry_reverse for super_blocks list and remove
unused sb_entry macro.

Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2008-02-07 02:41:05 +0800

06 Feb, 2008

2 commits

8bc3be275 writeback: speed up writeback of big dirty files ... Browse Code »

After making dirty a 100M file, the normal behavior is to start the
writeback for all data after 30s delays. But sometimes the following
happens instead:

- after 30s: ~4M
- after 5s: ~4M
- after 5s: all remaining 92M

Some analyze shows that the internal io dispatch queues goes like this:

s_io s_more_io
-------------------------
1) 100M,1K 0
2) 1K 96M
3) 0 96M
1) initial state with a 100M file and a 1K file

2) 4M written, nr_to_write 0, no more writes(BUG)

nr_to_write > 0 in (3) fools the upper layer to think that data have all
been written out. The big dirty file is actually still sitting in
s_more_io. We cannot simply splice s_more_io back to s_io as soon as s_io
becomes empty, and let the loop in generic_sync_sb_inodes() continue: this
may starve newly expired inodes in s_dirty. It is also not an option to
draw inodes from both s_more_io and s_dirty, an let the loop go on: this
might lead to live locks, and might also starve other superblocks in sync
time(well kupdate may still starve some superblocks, that's another bug).

We have to return when a full scan of s_io completes. So nr_to_write > 0
does not necessarily mean that "all data are written". This patch
introduces a flag writeback_control.more_io to indicate that more io should
be done. With it the big dirty file no longer has to wait for the next
kupdate invokation 5s later.

In sync_sb_inodes() we only set more_io on super_blocks we actually
visited. This avoids the interaction between two pdflush deamons.

Also in __sync_single_inode() we don't blindly keep requeuing the io if the
filesystem cannot progress. Failing to do so may lead to 100% iowait.

Tested-by: Mike Snitzer
Signed-off-by: Fengguang Wu
Cc: Michael Rubin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fengguang Wu
2008-02-06 01:44:19 +0800
2d544564f skip writing data pages when inode is under I_SYNC ... Browse Code »

Since I_SYNC was split out from I_LOCK, the concern in commit
4b89eed93e0fa40a63e3d7b1796ec1337ea7a3aa ("Write back inode data pages
even when the inode itself is locked") is not longer valid.

We should revert to the original behavior: in __writeback_single_inode(),
when we find an I_SYNC-ed inode and we're not doing a data-integrity sync,
skip writing entirely. Otherwise, we are double calling do_writepages()

Signed-off-by: Qi Yong
Cc: Peter Zijlstra
Cc: Hugh Dickins
Cc: Joern Engel
Cc: WU Fengguang
Cc: Michael Rubin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Qi Yong
2008-02-06 01:44:18 +0800