10 Aug, 2010
1 commit
-
We try to avoid livelocks of writeback when someone steadily creates dirty
pages in a mapping we are writing out. For memory-cleaning writeback,
using nr_to_write works reasonably well but we cannot really use it for
data integrity writeback. This patch tries to solve the problem.

The idea is simple: tag all pages that should be written back with a
special tag (TOWRITE) in the radix tree. This can be done rather quickly,
and thus livelocks should not happen in practice. Then we start doing the
hard work of locking pages and sending them to disk only for those pages
that have the TOWRITE tag set.

Note: adding the new radix tree tag grows the radix tree node from 288 to 296
bytes for 32-bit archs and from 552 to 560 bytes for 64-bit archs.
However, the number of slab/slub items per page remains the same (13 and 7
respectively).

Signed-off-by: Jan Kara
Cc: Dave Chinner
Cc: Nick Piggin
Cc: Chris Mason
Cc: Theodore Ts'o
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
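As a sketch of the two-pass idea (condensed and illustrative, not the actual write_cache_pages(); wbc accounting, error handling and the range-end check are omitted, and the function name is made up):

    int write_tagged_range(struct address_space *mapping, pgoff_t start,
                           pgoff_t end, struct writeback_control *wbc)
    {
        struct pagevec pvec;
        int i;

        /* Pass 1: quickly tag every page that is dirty right now.
         * Pages dirtied after this point never get the tag, so a
         * steady dirtier can no longer livelock the loop below. */
        tag_pages_for_writeback(mapping, start, end);

        pagevec_init(&pvec, 0);
        /* Pass 2: the slow work of locking and writing is done only
         * for pages carrying the TOWRITE tag. */
        while (pagevec_lookup_tag(&pvec, mapping, &start,
                                  PAGECACHE_TAG_TOWRITE, PAGEVEC_SIZE)) {
            for (i = 0; i < pagevec_count(&pvec); i++) {
                struct page *page = pvec.pages[i];

                lock_page(page);
                if (clear_page_dirty_for_io(page))
                    mapping->a_ops->writepage(page, wbc); /* unlocks page */
                else
                    unlock_page(page);
            }
            pagevec_release(&pvec);
        }
        return 0;
    }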
06 Jul, 2010
1 commit
-
This was just an odd wrapper around writeback_inodes_wb. Removing this
also allows us to get rid of the bdi member of struct writeback_control,
which was rather out of place there.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
11 Jun, 2010
1 commit
-
bdi_start_writeback now never gets a superblock passed, so we can just remove
that case. And to further untangle the code and flatten the call stack,
split it into two trivial helpers for its two callers.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
09 Jun, 2010
2 commits
-
sync can currently take a really long time if a concurrent writer is
extending a file. The problem is that the dirty pages on the address
space grow in the same direction as write_cache_pages scans, so if
the writer keeps ahead of writeback, the writeback will not
terminate until the writer stops adding dirty pages.

For a data integrity sync, we only need to write the pages dirty at
the time we start the writeback, so we can stop scanning once we get
to the page that was at the end of the file at the time the scan
started.

This will stop operations like copying a large file from preventing
sync from completing, as it will not write back pages that were
dirtied after the sync was started. This does not impact the
existing integrity guarantees, as any dirty page (old or new)
within the EOF range at the start of the scan will still be
captured.

This patch will not prevent sync from blocking on large writes into
holes. That requires more complex intervention, while this patch only
addresses the common append case of this sync holdoff.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Linus Torvalds
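The idea can be sketched as follows (illustrative only; the actual patch implements the bound inside write_cache_pages() rather than in a wrapper, and the function name here is made up):

    /* Bound a data integrity scan by the file size sampled when the
     * scan starts, so a concurrent appender cannot extend it forever. */
    static int sync_mapping_bounded(struct address_space *mapping,
                                    struct writeback_control *wbc)
    {
        /* Pages beyond this offset were dirtied after the sync
         * started and need not be written for integrity. */
        loff_t size_at_start = i_size_read(mapping->host);

        wbc->range_start = 0;
        wbc->range_end = size_at_start - 1;
        return generic_writepages(mapping, wbc);
    }

-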
If a filesystem writes more than one page in ->writepage, write_cache_pages
fails to notice this and continues to attempt writeback when wbc->nr_to_write
has gone negative - this trace was captured from XFS:

wbc_writeback_start: towrt=1024
wbc_writepage: towrt=1024
wbc_writepage: towrt=0
wbc_writepage: towrt=-1
wbc_writepage: towrt=-5
wbc_writepage: towrt=-21
wbc_writepage: towrt=-85

This has adverse effects on filesystem writeback behaviour. write_cache_pages()
needs to terminate after a certain number of pages are written, not after a
certain number of calls to ->writepage are made. This is a regression
introduced by 17bc6c30cf6bfffd816bdc53682dd46fc34a2cf4 ("vfs: Add
no_nrwrite_index_update writeback control flag"), but cannot be reverted
directly due to subsequent bug fixes that have gone in on top of it.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Linus Torvalds
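Conceptually, the fix moves the loop's exit test onto the page budget itself, roughly as below (a condensed sketch; next_dirty_page() is a hypothetical stand-in for the pagevec lookup in write_cache_pages()):

    extern struct page *next_dirty_page(struct address_space *mapping);

    static void writeback_until_budget_spent(struct address_space *mapping,
                                             struct writeback_control *wbc)
    {
        struct page *page;
        int done = 0;

        while (!done && (page = next_dirty_page(mapping)) != NULL) {
            /* ->writepage may write several pages and decrement
             * wbc->nr_to_write once for each of them. */
            if (mapping->a_ops->writepage(page, wbc))
                done = 1;
            /* Check the budget itself, not a call counter: it can
             * drop by more than one per iteration. */
            if (wbc->nr_to_write <= 0)
                done = 1;
        }
    }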
01 Jun, 2010
1 commit
-
This reverts commit e913fc825dc685a444cb4c1d0f9d32f372f59861.
We are investigating a hang associated with the WB_SYNC_NONE changes,
so revert them for now.

Conflicts:
fs/fs-writeback.c
mm/page-writeback.c

Signed-off-by: Jens Axboe
22 May, 2010
3 commits
-
The laptop mode timer had the nr_pages and sb_locked arguments
mixed up.

Signed-off-by: Jens Axboe
-
When CONFIG_BLOCK isn't enabled:

mm/page-writeback.c: In function 'laptop_mode_timer_fn':
mm/page-writeback.c:708: error: dereferencing pointer to incomplete type
mm/page-writeback.c:709: error: dereferencing pointer to incomplete type

Fix this by essentially eliminating the laptop sync handlers when
CONFIG_BLOCK isn't set, as most are only used from the block layer code.
The exception is laptop_sync_completion(), which is used from sys_sync();
make that an empty declaration in that case.

Reported-by: Randy Dunlap
Signed-off-by: Jens Axboe
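The resulting arrangement is roughly this (a header-side sketch; the exact prototypes are from memory and may differ):

    #ifdef CONFIG_BLOCK
    void laptop_io_completion(struct backing_dev_info *info);
    void laptop_sync_completion(void);
    void laptop_mode_timer_fn(unsigned long data);
    #else
    /* sys_sync() still calls this one, so it gets an empty stub
     * rather than disappearing entirely. */
    static inline void laptop_sync_completion(void) { }
    #endif

-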
Commit 69b62d01 fixed up most of the places where we would enter
busy schedule() spins when disabling the periodic background
writeback. This fixes up the sb timer so that it doesn't get
hammered on with the delay disabled, and ensures that it gets
rearmed if needed when /proc/sys/vm/dirty_writeback_centisecs
gets modified.

bdi_forker_task() also needs to check for !dirty_writeback_centisecs
and use schedule() appropriately; fix that up too.

Signed-off-by: Jens Axboe
17 May, 2010
1 commit
-
When umount calls sync_filesystem(), we first do a WB_SYNC_NONE
writeback to kick off writeback of pending dirty inodes, then follow
that up with a WB_SYNC_ALL to wait for it. Since umount already holds
the sb s_umount mutex, WB_SYNC_NONE ends up doing nothing and all
writeback happens as WB_SYNC_ALL. This can greatly slow down umount,
since WB_SYNC_ALL writeback is a data integrity operation and thus
a bigger hammer than simple WB_SYNC_NONE. For barrier aware file systems
it's a lot slower.

Signed-off-by: Jens Axboe
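The two passes described above have roughly this shape (a simplified sketch; the real logic lives in fs/sync.c):

    static void sync_fs_two_pass(struct super_block *sb)
    {
        /* Pass 1, WB_SYNC_NONE: kick off async writeback so the disk
         * gets busy early. With s_umount already held by umount, this
         * pass used to do nothing, leaving all the work to pass 2. */
        writeback_inodes_sb(sb);

        /* Pass 2, WB_SYNC_ALL: wait for everything. A data integrity
         * operation, and much slower on barrier aware filesystems. */
        sync_inodes_sb(sb);
    }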
06 Apr, 2010
1 commit
-
One of the features of laptop-mode is that it forces a writeout of dirty
pages if something else triggers a physical read or write from a device.
The current implementation flushes pages on all devices, rather than only
the one that triggered the flush. This patch alters the behaviour so that
only the recently accessed block device is flushed, preventing other
disks being spun up for no terribly good reason.

Signed-off-by: Matthew Garrett
Signed-off-by: Jens Axboe
03 Dec, 2009
1 commit
-
- no one is calling wb_writeback and write_cache_pages with
wbc.nonblocking=1 any more
- lumpy pageout will want to do nonblocking writeback without the
congestion wait

So remove the congestion checks as suggested by Chris.

Signed-off-by: Wu Fengguang
Cc: Chris Mason
Cc: Jens Axboe
Cc: Trond Myklebust
Cc: Christoph Hellwig
Cc: Dave Chinner
Cc: Evgeniy Polyakov
Cc: Alex Elder
Signed-off-by: Jens Axboe
09 Oct, 2009
1 commit
-
It makes sense to do IOWAIT when someone is blocked
due to IO throttle, as suggested by Kame and Peter.

There is an old comment for not doing IOWAIT on throttle,
however it has been mismatching the code for a long time.

If we stop accounting IOWAIT for 2.6.32, it could be an
undesirable behavior change. So restore the io_schedule.

CC: KAMEZAWA Hiroyuki
CC: Peter Zijlstra
Signed-off-by: Wu Fengguang
Signed-off-by: Jens Axboe
26 Sep, 2009
5 commits
-
Sometimes we only want to write pages from a specific super_block,
so allow that to be passed in.

This fixes a problem with commit 56a131dcf7ed36c3c6e36bea448b674ea85ed5bb
causing writeback on all super_blocks on a bdi, where we only really
want to sync a specific sb from writeback_inodes_sb().

Signed-off-by: Jens Axboe
-
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
writeback: writeback_inodes_sb() should use bdi_start_writeback()
writeback: don't delay inodes redirtied by a fast dirtier
writeback: make the super_block pinning more efficient
writeback: don't resort for a single super_block in move_expired_inodes()
writeback: move inodes from one super_block together
writeback: get rid to incorrect references to pdflush in comments
writeback: improve readability of the wb_writeback() continue/break logic
writeback: cleanup writeback_single_inode()
writeback: kupdate writeback shall not stop when more io is possible
writeback: stop background writeback when below background threshold
writeback: balance_dirty_pages() shall write more than dirtied pages
fs: Fix busyloop in wb_writeback()
-
Signed-off-by: Jens Axboe
-
Treat bdi_start_writeback(0) as a special request to do background write,
and stop such work when we are below the background dirty threshold.

Also simplify the (nr_pages <= 0) checks.

CC: Jan Kara
Acked-by: Peter Zijlstra
Signed-off-by: Wu Fengguang
Signed-off-by: Jens Axboe -
Some filesystem may choose to write much more than ratelimit_pages
before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
determine the number to write based on the real number of dirtied pages.

Otherwise it is possible that

    loop {
        btrfs_file_write(): dirty 1024 pages
        balance_dirty_pages(): write up to 48 pages (= ratelimit_pages * 1.5)
    }

in which the writeback rate cannot keep up with the dirty rate, and the
dirty pages go all the way beyond dirty_thresh.

The increased write_chunk may make the dirtier more bumpy.
So filesystems shall take care not to dirty too much at
a time (eg. > 4MB) without checking the ratelimit.

Signed-off-by: Wu Fengguang
Acked-by: Peter Zijlstra
Signed-off-by: Jens Axboe
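From the filesystem side, the safer calling pattern is simply to report the real count (a sketch; the wrapper name is made up):

    /* Tell the throttle how many pages were actually dirtied, instead
     * of letting it assume one page per call. */
    static void account_buffered_write(struct address_space *mapping,
                                       unsigned long pages_dirtied)
    {
        /* e.g. pages_dirtied = 1024 after one btrfs_file_write() pass */
        balance_dirty_pages_ratelimited_nr(mapping, pages_dirtied);
    }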
24 Sep, 2009
2 commits
-
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...
-
It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan
Cc: David Howells
Cc: "Eric W. Biederman"
Cc: Al Viro
Cc: Ralf Baechle
Cc: Martin Schwidefsky
Cc: Ingo Molnar
Cc: "David S. Miller"
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Sep, 2009
1 commit
-
global_lru_pages() / zone_lru_pages() can be used in two ways:
- to estimate max reclaimable pages in determine_dirtyable_memory()
- to calculate the slab scan ratio

When swap is full or not present, the anon lru lists are not reclaimable
and also won't be scanned. So the anon pages shall not be counted in both
usage scenarios. Also rename to _reclaimable_pages: now they are counting
the possibly reclaimable lru pages.

It can greatly (and correctly) increase the slab scan rate under high
memory pressure (when most file pages have been reclaimed and swap is
full/absent), thus reducing false OOM kills.

Acked-by: Peter Zijlstra
Reviewed-by: Rik van Riel
Reviewed-by: Christoph Lameter
Reviewed-by: Minchan Kim
Cc: KOSAKI Motohiro
Signed-off-by: Wu Fengguang
Acked-by: Johannes Weiner
Reviewed-by: Minchan Kim
Reviewed-by: Jesse Barnes
Cc: David Howells
Cc: "Li, Ming Chun"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
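The counting rule can be sketched like this (simplified; the real helpers are global_reclaimable_pages()/zone_reclaimable_pages() in mm/vmscan.c):

    /* Count anon LRU pages as reclaimable only when swap space is
     * actually available to reclaim them into. */
    static unsigned long reclaimable_pages_sketch(void)
    {
        unsigned long n = global_page_state(NR_ACTIVE_FILE) +
                          global_page_state(NR_INACTIVE_FILE);

        if (nr_swap_pages > 0)
            n += global_page_state(NR_ACTIVE_ANON) +
                 global_page_state(NR_INACTIVE_ANON);

        return n;
    }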
21 Sep, 2009
2 commits
-
Currently it just sleeps for a very short time, just 1 jiffy. If
we keep looping in there, continually delay for a little longer,
up to 100msec in total. That was the old limit for congestion
wait.

Signed-off-by: Jens Axboe
-
-
Just use schedule_timeout_interruptible(), saves a call to
set_current_state().

Signed-off-by: Jens Axboe
16 Sep, 2009
5 commits
-
bdi_start_writeback() is currently split into two paths, one for
WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback()
for WB_SYNC_ALL writeback and let bdi_start_writeback() handle
only WB_SYNC_NONE.

Push down the writeback_control allocation and only accept the
parameters that make sense for each function. This cleans up
the API considerably.

Signed-off-by: Jens Axboe
-
Now that bdi_writeback_all() no longer handles integrity writeback,
it doesn't have to block anymore. This means that we can switch
bdi_list reader side protection to RCU.

Signed-off-by: Jens Axboe
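The reader side then follows the standard RCU list pattern, roughly (a generic sketch, not the patch's exact code):

    /* Walk the global bdi_list without taking bdi_lock; writers still
     * serialize among themselves, readers only need rcu_read_lock(). */
    static void for_each_bdi_sketch(void (*fn)(struct backing_dev_info *))
    {
        struct backing_dev_info *bdi;

        rcu_read_lock();
        list_for_each_entry_rcu(bdi, &bdi_list, bdi_list)
            fn(bdi);  /* must not sleep in the RCU read-side section */
        rcu_read_unlock();
    }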
-
It's only set, it's never checked. Kill it.
Acked-by: Jan Kara
Signed-off-by: Jens Axboe
-
The dirtying of page and set_page_dirty() can be moved into the page lock.

- In shmem_write_end(), the page was dirtied while the page lock was held,
but it's being marked dirty just after dropping the page lock.
- In shmem_symlink(), both dirtying and marking can be moved into the page lock.

It's valuable for the hwpoison code to know whether one bad page can be dropped
without losing data. It mainly judges by testing the PG_dirty bit after taking
the page lock. So it becomes important that the dirtying of the page and the
marking of dirtiness are both done inside the page lock. This is a common
practice, but sadly not a rule.

The noticeable exceptions are
- mapped pages
- pages with buffer_heads

The above pages could go dirty at any time. Fortunately the hwpoison code will
unmap the page and release the buffer_heads beforehand anyway.

Many other types of pages (eg. metadata pages) can also be dirtied at will by
their owners; the hwpoison code cannot do meaningful things to them anyway.
Only the dirtiness of pagecache pages owned by regular files is of interest.

v2: AK: Add comment about set_page_dirty rules (suggested by Peter Zijlstra)

Acked-by: Hugh Dickins
Reviewed-by: WANG Cong
Signed-off-by: Wu Fengguang
Signed-off-by: Andi Kleen
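The rule reduces to this ordering (a schematic sketch; the copy step is only illustrative):

    /* Modify the page and mark it dirty under one page lock, so
     * hwpoison can trust PG_dirty after it takes the lock itself. */
    static void fill_and_dirty(struct page *page, const void *buf,
                               size_t len)
    {
        void *kaddr;

        lock_page(page);
        kaddr = kmap_atomic(page, KM_USER0);
        memcpy(kaddr, buf, len);
        kunmap_atomic(kaddr, KM_USER0);
        set_page_dirty(page);   /* still holding the page lock */
        unlock_page(page);
    }

-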
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
powerpc64: convert to dynamic percpu allocator
sparc64: use embedding percpu first chunk allocator
percpu: kill lpage first chunk allocator
x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
percpu: update embedding first chunk allocator to handle sparse units
percpu: use group information to allocate vmap areas sparsely
vmalloc: implement pcpu_get_vm_areas()
vmalloc: separate out insert_vmalloc_vm()
percpu: add chunk->base_addr
percpu: add pcpu_unit_offsets[]
percpu: introduce pcpu_alloc_info and pcpu_group_info
percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
percpu: add @align to pcpu_fc_alloc_fn_t
percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
percpu: drop @static_size from first chunk allocators
percpu: generalize first chunk allocator selection
percpu: build first chunk allocators selectively
percpu: rename 4k first chunk allocator to page
percpu: improve boot messages
percpu: fix pcpu_reclaim() locking
...

Fix trivial conflict as by Tejun Heo in kernel/sched.c
11 Sep, 2009
2 commits
-
This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:

r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42
0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44
1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37
0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58
0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34
0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37
0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44
0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38
0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41
0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45

where vanilla tends to fluctuate a lot in the creation phase:
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36
1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51
0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40
0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37
1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41
0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49
0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36
1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43
0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39
1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45
1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34
0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54

A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as do random mmap'ed
writes.

A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.

Signed-off-by: Jens Axboe
-
This is a first step at introducing per-bdi flusher threads. We should
have no change in behaviour, although sb_has_dirty_inodes() is now
ridiculously expensive, as there's no easy way to answer that question.
Not a huge problem, since it'll be deleted in subsequent patches.

Signed-off-by: Jens Axboe
14 Aug, 2009
1 commit
-
Conflicts:
arch/sparc/kernel/smp_64.c
arch/x86/kernel/cpu/perf_counter.c
arch/x86/kernel/setup_percpu.c
drivers/cpufreq/cpufreq_ondemand.c
mm/percpu.c

Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.

Signed-off-by: Tejun Heo
11 Jul, 2009
1 commit
-
Commit 1faa16d22877f4839bd433547d770c676d1d964c accidentally broke
the bdi congestion wait queue logic, causing us to wait on congestion
for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead.

Signed-off-by: Jens Axboe
04 Jul, 2009
1 commit
-
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
changes. As alpha in percpu tree uses 'weak' attribute instead of
inline assembly, there's no need for __used attribute.

Conflicts:
arch/alpha/include/asm/percpu.h
arch/mn10300/kernel/vmlinux.lds.S
include/linux/percpu-defs.h
01 Jul, 2009
1 commit
-
balance_dirty_pages can overreact and move all of the dirty pages to
writeback unnecessarily.

balance_dirty_pages makes its decision to throttle based on the number of
dirty plus writeback pages that are over the calculated limit, so it will
continue to move pages even when there are plenty of pages in writeback
and less than the threshold still dirty.

This allows it to overshoot its limits and move all the dirty pages to
writeback while waiting for the drives to catch up and empty the writeback
list.

A simple fio test easily demonstrates this problem.

fio --name=f1 --directory=/disk1 --size=2G -rw=write --name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10

This is the simplest fix I could find, but I'm not entirely sure that it
alone will be enough for all cases. But it certainly is an improvement on
my desktop machine writing to 2 disks.

Do we need something more for machines with large arrays where
bdi_threshold * number_of_drives is greater than the dirty_ratio?

Signed-off-by: Richard Kennedy
Acked-by: Peter Zijlstra
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
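The fix amounts to queueing more writeback only while the still-dirty pages alone exceed the threshold, roughly as below (a condensed sketch; start_more_writeback() and throttle_wait() are hypothetical stand-ins for the kernel's actual helpers):

    extern void start_more_writeback(void);
    extern void throttle_wait(void);

    static void balance_sketch(unsigned long thresh)
    {
        for (;;) {
            unsigned long dirty = global_page_state(NR_FILE_DIRTY);
            unsigned long writeback = global_page_state(NR_WRITEBACK);

            if (dirty + writeback <= thresh)
                break;              /* under the limit: done */

            /* Only move more pages to writeback when the merely-dirty
             * pages alone are over the threshold; otherwise just wait
             * for the drives to drain what is already in flight. */
            if (dirty > thresh)
                start_more_writeback();
            throttle_wait();
        }
    }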
24 Jun, 2009
1 commit
-
Percpu variable definition is about to be updated such that all percpu
symbols including the static ones must be unique. Update percpu
variable definitions accordingly.

* as,cfq: rename ioc_count uniquely
* cpufreq: rename cpu_dbs_info uniquely
* xen: move nesting_count out of xen_evtchn_do_upcall() and rename it
* mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
rename it
* ipv4,6: rename cookie_scratch uniquely
* x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
pmc_irq_entry and nmi_entry to pmc_nmi_entry
* perf_counter: rename disable_count to perf_disable_count
* ftrace: rename test_event_disable to ftrace_test_event_disable
* kmemleak: rename test_pointer to kmemleak_test_pointer
* mce: rename next_interval to mce_next_interval
[ Impact: percpu usage cleanups, no duplicate static percpu var names ]
Signed-off-by: Tejun Heo
Reviewed-by: Christoph Lameter
Cc: Ivan Kokshaysky
Cc: Jens Axboe
Cc: Dave Jones
Cc: Jeremy Fitzhardinge
Cc: linux-mm
Cc: David S. Miller
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Li Zefan
Cc: Catalin Marinas
Cc: Andi Kleen
17 Jun, 2009
1 commit
-
get_dirty_limits() calls clip_bdi_dirty_limit() and task_dirty_limit()
with variable pbdi_dirty as one of the arguments. This variable is an
unsigned long * but both functions expect it to be a long *. This causes
the following sparse warnings:

warning: incorrect type in argument 3 (different signedness)
expected long *pbdi_dirty
got unsigned long *pbdi_dirty
warning: incorrect type in argument 2 (different signedness)
expected long *pdirty
got unsigned long *pbdi_dirty

Fix the warnings by changing the long * to unsigned long * in both
functions.

Signed-off-by: H Hartley Sweeten
Cc: Johannes Weiner
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
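The change is just a signedness fix in the prototypes, along these lines (parameter lists abbreviated from memory):

    /* Before (sparse: "different signedness" on the pointer argument):
     *   void clip_bdi_dirty_limit(struct backing_dev_info *bdi,
     *                             unsigned long dirty, long *pbdi_dirty);
     */
    /* After: the pointer type matches the unsigned long the callers pass. */
    void clip_bdi_dirty_limit(struct backing_dev_info *bdi,
                              unsigned long dirty,
                              unsigned long *pbdi_dirty);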
18 May, 2009
1 commit
-
The wb_kupdate() function has a bug on linux-2.6.30-rc5. This bug causes
generic_sync_sb_inodes() to start to write inodes back much earlier than
we expect because it miscalculates oldest_jif in wb_kupdate().

This bug was introduced in 704503d836042d4a4c7685b7036e7de0418fbc0f
('mm: fix proc_dointvec_userhz_jiffies "breakage"').

Signed-off-by: Toshiyuki Okajima
Cc: Alexey Dobriyan
Cc: Peter Zijlstra
Cc: Nick Piggin
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Apr, 2009
2 commits
-
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9838

On i386, HZ=1000, jiffies_to_clock_t() converts time in a somewhat strange
way from the user's point of view:

# echo 500 >/proc/sys/vm/dirty_writeback_centisecs
# cat /proc/sys/vm/dirty_writeback_centisecs
499

So, we have 5000 jiffies converted to only 499 clock ticks and reported
back.

TICK_NSEC = 999848
ACTHZ = 256039

Keeping the in-kernel variable in units passed from userspace will fix the
issue of course, but this probably won't be right for every sysctl.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan
Cc: Peter Zijlstra
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
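The round-down can be reproduced by hand from the values quoted above (a worked example, assuming USER_HZ = 100 on i386):

    5000 jiffies * 999848 ns/jiffy = 4999240000 ns = 4.99924 s
    4.99924 s * 100 ticks/s = 499.924 -> truncated to 499 clock ticks

-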
Add a helper function account_page_dirtied(). Use that from two
callsites. reiser4 adds a function which adds a third callsite.

Signed-off-by: Edward Shishkin
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Mar, 2009
1 commit
-
Enlarge default dirty ratios from 5/10 to 10/20. This fixes [Bug
#12809] iozone regression with 2.6.29-rc6.

The iozone benchmarks are performed on a 1200M file, with 8GB ram.

iozone -i 0 -i 1 -i 2 -i 3 -i 4 -r 4k -s 64k -s 512m -s 1200m -b tmp.xls
iozone -B -r 4k -s 64k -s 512m -s 1200m -b tmp.xls

The performance regression is triggered by commit 1cf6e7d83bf3 (mm: task
dirty accounting fix), which makes more correct/thorough dirty
accounting.

The default 5/10 dirty ratios were picked (a) with the old dirty logic
and (b) largely at random and (c) designed to be aggressive. In
particular, that (a) means that having fixed some of the dirty
accounting, maybe the real bug is now that it was always too aggressive,
just hidden by an accounting issue.

The enlarged 10/20 dirty ratios are just about enough to fix the regression.

[ We will have to look at how this affects the old fsync() latency issue,
but that probably will need independent work. - Linus ]

Cc: Nick Piggin
Cc: Peter Zijlstra
Reported-by: "Lin, Ming M"
Tested-by: "Lin, Ming M"
Signed-off-by: Wu Fengguang
Signed-off-by: Linus Torvalds