04 Jul, 2013
1 commit
-
Page reclaim keeps track of dirty and under-writeback pages and uses that
to determine if wait_iff_congested() should stall or if kswapd should
begin writing back pages. This fails to account for buffer pages that
can be under writeback but not PageWriteback, which is the case for
filesystems like ext3 in ordered mode. Furthermore, PageDirty buffer pages
can have all the buffers clean, in which case writepage does no IO, so
such pages should not be accounted as congested.

This patch adds an address_space operation that filesystems may
optionally use to check if a page is really dirty or really under
writeback. An implementation for buffer_heads is added
and used for block operations and ext3 in ordered mode. By default the
page flags are obeyed.

Credit goes to Jan Kara for identifying that the page flags alone are
not sufficient for ext3 and sanity checking a number of ideas on how the
problem could be addressed.

Signed-off-by: Mel Gorman
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Rik van Riel
Cc: KAMEZAWA Hiroyuki
Cc: Jiri Slaby
Cc: Valdis Kletnieks
Cc: Zlatko Calusic
Cc: dormando
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
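The idea in the commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the struct and function names below (page_model, buffer_model, buffer_check_model, page_check_dirty_writeback) are hypothetical stand-ins, not the kernel's definitions, and the buffer list is NULL-terminated for simplicity. It shows an optional hook overriding the page flags when the buffers tell a different story.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space stand-ins; NOT the kernel's definitions. */
struct buffer_model {
    bool dirty;                 /* this buffer still holds unwritten data */
    bool under_io;              /* this buffer is currently being written */
    struct buffer_model *next;  /* NULL-terminated list (an assumption) */
};

struct page_model {
    bool flag_dirty;               /* models the PageDirty flag */
    bool flag_writeback;           /* models the PageWriteback flag */
    struct buffer_model *buffers;  /* NULL when the page has no buffers */
};

/* Optional per-filesystem hook; NULL means "obey the page flags". */
typedef void (*dirty_writeback_fn)(const struct page_model *page,
                                   bool *dirty, bool *writeback);

/* Buffer-based check: report what the buffers say, not the page flags. */
static void buffer_check_model(const struct page_model *page,
                               bool *dirty, bool *writeback)
{
    *dirty = false;
    *writeback = page->flag_writeback;
    for (const struct buffer_model *bh = page->buffers; bh; bh = bh->next) {
        if (bh->under_io)
            *writeback = true;
        if (bh->dirty)
            *dirty = true;
    }
}

/* What a reclaim-like caller does: start from the flags, then ask the hook. */
static void page_check_dirty_writeback(const struct page_model *page,
                                       dirty_writeback_fn hook,
                                       bool *dirty, bool *writeback)
{
    assert(page && dirty && writeback);
    *dirty = page->flag_dirty;
    *writeback = page->flag_writeback;
    if (hook)
        hook(page, dirty, writeback);
}
```

With no hook the caller sees exactly the page flags; with the buffer-based hook, a page flagged dirty whose buffers are all clean is reported as clean, and a page with a buffer under IO is reported as under writeback even when the page flag is not set.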
22 May, 2013
2 commits
-
->invalidatepage() aop now accepts range to invalidate so we can make
use of it in journal_invalidatepage() and all the users in ext3 file
system. Also update the ext3 tracepoint to print out the length argument.

Signed-off-by: Lukas Czerner
Reviewed-by: Jan Kara
-
Currently there is no way to truncate a partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the existing functionality was enough for the file system
truncate operation to work properly. However, more file systems now
support the punch hole feature, and it can benefit from mm supporting
truncating a page just up to a certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating a partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept the range to be invalidated and updates all the
instances of it.

We also change block_invalidatepage() in the same way and actually
make use of the new length argument, implementing range invalidation.

Actual file system implementations will follow, except for the file systems
where the changes are really simple and should not change the behaviour
in any way. An implementation of truncate_page_range(), which will be able
to accept page-unaligned ranges, will follow as well.

Signed-off-by: Lukas Czerner
Cc: Andrew Morton
Cc: Hugh Dickins
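As a rough illustration of range invalidation over the buffers of one page, here is a minimal user-space sketch. It assumes fixed-size buffers laid out back to back in a 4096-byte page; the function name invalidate_range_model and the MODEL_PAGE_SIZE constant are hypothetical and this is not the kernel's block_invalidatepage(). Only buffers lying entirely inside the half-open range [offset, offset + length) are discarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this model */

/*
 * Mark as discarded every buffer that lies entirely inside
 * [offset, offset + length). Returns the number of buffers discarded.
 * Buffers only partially covered by the range are left alone.
 */
static size_t invalidate_range_model(bool *discarded, size_t nbuf,
                                     unsigned int buf_size,
                                     unsigned int offset,
                                     unsigned int length)
{
    unsigned int stop = offset + length;
    unsigned int curr = 0;
    size_t count = 0;

    /* The range must not wrap around or extend past the page. */
    assert(stop >= offset && stop <= MODEL_PAGE_SIZE);

    for (size_t i = 0; i < nbuf; i++) {
        unsigned int next = curr + buf_size;

        if (next > stop)        /* this buffer ends beyond the range */
            break;
        if (offset <= curr) {   /* this buffer starts inside the range */
            discarded[i] = true;
            count++;
        }
        curr = next;
    }
    return count;
}
```

Passing offset 0 and a length equal to the page size reproduces the old whole-page behaviour, while a range such as offset 1024, length 2048 with 1024-byte buffers discards only the two middle buffers.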
08 May, 2013
1 commit
-
Faster kernel compiles by way of fewer unnecessary includes.
[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Mar, 2013
1 commit
-
In data=journal mode, if we unmount the file system before a
transaction has a chance to complete, when the journal inode is being
evicted, we can end up calling into log_wait_commit() for the
last transaction, after the journalling machinery has been shut down.
That triggers the WARN_ONCE in __log_start_commit().

Arguably we should adjust ext3_should_journal_data() to return FALSE
for the journal inode, but the only place it matters is
ext3_evict_inode(). So, to save a bit of CPU time and to make
the patch much more obviously correct by inspection(tm), we'll fix it
by explicitly not trying to wait for a journal commit when we are
evicting the journal inode, since it's guaranteed to never succeed in
this case.

This can be easily replicated via:

    mount -t ext3 -o data=journal /dev/vdb /vdb ; umount /vdb

This is a port of the ext4 fix from Ted Ts'o.

Signed-off-by: Jan Kara
21 Jan, 2013
3 commits
-
It will be better to use ENOMEM rather than EIO, because the only
reason that sb_getblk() fails is that allocation fails.

Signed-off-by: Wang Shilong
Signed-off-by: Jan Kara
-
Because sb_getblk() seldom fails and returns NULL, it is better to
wrap the NULL check in unlikely().

Signed-off-by: Wang Shilong
Signed-off-by: Jan Kara
-
An I/O error may happen when sb_getblk() is called, so add the
necessary check for it.

The patch also fixes a coding style problem.

Signed-off-by: Wang Shilong
Signed-off-by: Jan Kara
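Taken together, the three commits above converge on one pattern: check the result of sb_getblk(), hint that failure is rare, and report -ENOMEM. The following is a user-space sketch of that pattern only. getblk_stub(), read_block_model() and unlikely_model() are hypothetical stand-ins (the last one assumes a GCC-compatible compiler and falls back to a plain expression otherwise); none of this is ext3 code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's unlikely() branch hint. */
#if defined(__GNUC__)
#define unlikely_model(x) __builtin_expect(!!(x), 0)
#else
#define unlikely_model(x) (x)
#endif

struct buffer_stub {
    unsigned long blocknr;
};

/*
 * Hypothetical stand-in for sb_getblk(): it returns NULL only when the
 * allocation fails (or when the caller asks to simulate that).
 */
static struct buffer_stub *getblk_stub(unsigned long blocknr, int simulate_oom)
{
    struct buffer_stub *bh;

    if (simulate_oom)
        return NULL;
    bh = malloc(sizeof(*bh));
    if (bh)
        bh->blocknr = blocknr;
    return bh;
}

/* Caller pattern: rare NULL result, reported as -ENOMEM rather than -EIO. */
static int read_block_model(unsigned long blocknr, int simulate_oom)
{
    struct buffer_stub *bh = getblk_stub(blocknr, simulate_oom);

    if (unlikely_model(!bh))
        return -ENOMEM;

    assert(bh->blocknr == blocknr);  /* the stub hands back what we asked for */
    free(bh);
    return 0;
}
```

The branch hint does not change behaviour; it only tells the compiler which path to favour when laying out the code.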
13 Dec, 2012
1 commit
-
Just use WARN_ON rather than an if containing only WARN_ON(1).
A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e;
@@
- if (e) WARN_ON(1);
+ WARN_ON(e);
// </smpl>

Signed-off-by: Julia Lawall
Signed-off-by: Jan Kara
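To show what the semantic patch above does to a call site, here is a small user-space sketch. WARN_ON_MODEL and warn_on_model() are hypothetical stand-ins for the kernel macro, written only so the two forms can be compared outside the kernel; the assumption carried over is that the macro evaluates to the truth value of its condition.

```c
#include <assert.h>
#include <stdio.h>

static int warn_count;  /* number of warnings emitted so far */

/* Stand-in helper: warn when cond is true and return its truth value. */
static int warn_on_model(int cond, const char *expr)
{
    assert(expr != NULL);
    if (cond) {
        warn_count++;
        fprintf(stderr, "WARNING: %s\n", expr);
    }
    return cond;
}
#define WARN_ON_MODEL(cond) warn_on_model(!!(cond), #cond)

/* Before the transformation: an if containing only WARN_ON(1). */
static void check_before(int e)
{
    if (e)
        WARN_ON_MODEL(1);
}

/* After the transformation: the condition goes straight into WARN_ON. */
static void check_after(int e)
{
    WARN_ON_MODEL(e);
}
```

Both forms warn under the same condition. In this model the second form has the side benefit that the stringified expression in the message is the real condition rather than a literal 1.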
02 Oct, 2012
1 commit
-
Pull the trivial tree from Jiri Kosina:
"Tiny usual fixes all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
doc: fix old config name of kprobetrace
fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc
btrfs: fix the commment for the action flags in delayed-ref.h
btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID
vfs: fix kerneldoc for generic_fh_to_parent()
treewide: fix comment/printk/variable typos
ipr: fix small coding style issues
doc: fix broken utf8 encoding
nfs: comment fix
platform/x86: fix asus_laptop.wled_type module parameter
mfd: printk/comment fixes
doc: getdelays.c: remember to close() socket on error in create_nl_socket()
doc: aliasing-test: close fd on write error
mmc: fix comment typos
dma: fix comments
spi: fix comment/printk typos in spi
Coccinelle: fix typo in memdup_user.cocci
tmiofb: missing NULL pointer checks
tools: perf: Fix typo in tools/perf
tools/testing: fix comment / output typos
...
04 Sep, 2012
1 commit
-
The code tracking when a transaction needs to be committed on fdatasync(2)
forgets to handle the situation when only the inode's i_size is changed.
Thus in such situations fdatasync(2) doesn't force the transaction with the
new i_size to disk, and that can result in a wrong i_size after a crash.

Fix the issue by updating the inode's i_datasync_tid whenever its size is
updated.

CC: # >= 2.6.32
Reported-by: Kristian Nielsen
Signed-off-by: Jan Kara
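The bookkeeping behind the fix above can be modelled in a few lines. This is a simplified sketch, not ext3 code: inode_model, its two transaction-ID fields and the helper names are assumptions made for illustration, and transaction-ID wraparound is deliberately ignored. The point is that a size update must advance the ID that fdatasync(2) waits for, while a timestamp-only update need not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int tid_model_t;  /* transaction ID; wraparound ignored here */

struct inode_model {
    long long size;
    tid_model_t sync_tid;      /* last transaction that changed anything */
    tid_model_t datasync_tid;  /* last transaction fdatasync must wait for */
};

/* A timestamp-only update does not matter for fdatasync. */
static void update_timestamps(struct inode_model *inode, tid_model_t running)
{
    assert(inode != NULL);
    inode->sync_tid = running;
}

/* A size change does matter: record the running transaction for fdatasync. */
static void update_size(struct inode_model *inode, long long new_size,
                        tid_model_t running)
{
    assert(inode != NULL);
    inode->size = new_size;
    inode->sync_tid = running;
    inode->datasync_tid = running;  /* the step the fix adds */
}

/* fdatasync has to force a commit if its transaction is not yet on disk. */
static bool fdatasync_needs_commit(const struct inode_model *inode,
                                   tid_model_t last_committed)
{
    assert(inode != NULL);
    return inode->datasync_tid > last_committed;
}
```

Without the marked assignment in update_size(), a file that was only extended would report that no commit is needed, which is the crash window the commit message describes.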
02 Sep, 2012
1 commit
-
Signed-off-by: Anatol Pomozov
Signed-off-by: Jiri Kosina
04 Aug, 2012
1 commit
-
The '->write_super' superblock method is gone, and this patch removes all the
references to 'write_super' from ext3.

Cc: Jan Kara
Cc: Andrew Morton
Cc: Andreas Dilger
Signed-off-by: Artem Bityutskiy
Signed-off-by: Al Viro
29 May, 2012
1 commit
-
Pull writeback tree from Wu Fengguang:
"Mainly from Jan Kara to avoid iput() in the flusher threads."

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Avoid iput() from flusher thread
vfs: Rename end_writeback() to clear_inode()
vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
writeback: Refactor writeback_single_inode()
writeback: Remove wb->list_lock from writeback_single_inode()
writeback: Separate inode requeueing after writeback
writeback: Move I_DIRTY_PAGES handling
writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
writeback: Move clearing of I_SYNC into inode_sync_complete()
writeback: initialize global_dirty_limit
fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
mm: page-writeback.c: local functions should not be exposed globally
16 May, 2012
1 commit
-
Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman
06 May, 2012
1 commit
-
After we moved inode_sync_wait() out of end_writeback(), it doesn't make
sense to call the function end_writeback() anymore. Rename it to
clear_inode(), which describes well what the function really does: set the
I_CLEAR flag.

Signed-off-by: Jan Kara
Signed-off-by: Fengguang Wu
01 Apr, 2012
1 commit
-
Signed-off-by: Al Viro
01 Mar, 2012
1 commit
-
Currently ext3 updates ctime in ext3_splice_branch() which is called whenever
we allocate one block. But it is wasteful because ext3 doesn't support
nanosecond timestamps. This leads to a performance loss.

Signed-off-by: Kazuya Mio
Signed-off-by: Jan Kara
10 Jan, 2012
1 commit
-
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2/3/4: delete unneeded includes of module.h
ext{3,4}: Fix potential race when setversion ioctl updates inode
udf: Mark LVID buffer as uptodate before marking it dirty
ext3: Don't warn from writepage when readonly inode is spotted after error
jbd: Remove j_barrier mutex
reiserfs: Force inode evictions before umount to avoid crash
reiserfs: Fix quota mount option parsing
udf: Treat symlink component of type 2 as /
udf: Fix deadlock when converting file from in-ICB one to normal one
udf: Cleanup calling convention of inode_getblk()
ext2: Fix error handling on inode bitmap corruption
ext3: Fix error handling on inode bitmap corruption
ext3: replace ll_rw_block with other functions
ext3: NULL dereference in ext3_evict_inode()
jbd: clear revoked flag on buffers before a new transaction started
ext3: call ext3_mark_recovery_complete() when recovery is really needed
09 Jan, 2012
3 commits
-
Delete any instances of include module.h that were not strictly
required. In the case of ext2, the declarations of MODULE_LICENSE
etc. were in inode.c but the module_init/exit were in super.c, so
relocate the MODULE_LICENSE/AUTHOR block to super.c, which makes it
consistent with ext3 and ext4 at the same time.

Signed-off-by: Paul Gortmaker
Signed-off-by: Jan Kara
-
WARN_ON_ONCE(IS_RDONLY(inode)) tends to trip when the filesystem hits an
error and is remounted read-only. This unnecessarily scares users (well,
they should be scared because of the filesystem error, but the stack trace
distracts them from the right source of their fear ;-). We could as well
just remove the WARN_ON, but it's not hard to fix it so that it does not trip
on a filesystem with errors and does not use more cycles in the common case,
so that's what we do.

CC: stable@kernel.org
Signed-off-by: Jan Kara
-
ll_rw_block() is deprecated. Thus we replace it with other functions.
CC: Jan Kara
Signed-off-by: Zheng Liu
Signed-off-by: Jan Kara
02 Dec, 2011
1 commit
-
The below patch fixes some typos in various parts of the kernel, as well as fixes some comments.
Please let me know if I missed anything, and I will try to get it changed and resent.

Signed-off-by: Justin P. Mattock
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina
22 Nov, 2011
1 commit
-
This is an fsfuzzer bug. ->s_journal is set at the end of
ext3_load_journal() but we try to use it in the error handling from
ext3_get_journal() while it's still NULL.

[ 337.039041] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
[ 337.040380] IP: [] _raw_spin_lock+0x9/0x30
[ 337.041687] PGD 0
[ 337.043118] Oops: 0002 [#1] SMP
[ 337.044483] CPU 3
[ 337.044495] Modules linked in: ecb md4 cifs fuse kvm_intel kvm brcmsmac brcmutil crc8 cordic r8169 [last unloaded: scsi_wait_scan]
[ 337.047633]
[ 337.049259] Pid: 8308, comm: mount Not tainted 3.2.0-rc2-next-20111121+ #24 SAMSUNG ELECTRONICS CO., LTD. RV411/RV511/E3511/S3511 /RV411/RV511/E3511/S3511
[ 337.051064] RIP: 0010:[] [] _raw_spin_lock+0x9/0x30
[ 337.052879] RSP: 0018:ffff8800b1d11ae8 EFLAGS: 00010282
[ 337.054668] RAX: 0000000000000100 RBX: 0000000000000000 RCX: ffff8800b77c2000
[ 337.056400] RDX: ffff8800a97b5c00 RSI: 0000000000000000 RDI: 0000000000000024
[ 337.058099] RBP: ffff8800b1d11ae8 R08: 6000000000000000 R09: e018000000000000
[ 337.059841] R10: ff67366cc2607c03 R11: 00000000110688e6 R12: 0000000000000000
[ 337.061607] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800a78f06e8
[ 337.063385] FS: 00007f9d95652800(0000) GS:ffff8800b7180000(0000) knlGS:0000000000000000
[ 337.065110] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 337.066801] CR2: 0000000000000024 CR3: 00000000aef2c000 CR4: 00000000000006e0
[ 337.068581] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 337.070321] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 337.072105] Process mount (pid: 8308, threadinfo ffff8800b1d10000, task ffff8800b1d02be0)
[ 337.073800] Stack:
[ 337.075487] ffff8800b1d11b08 ffffffff811f48cf ffff88007ac9b158 0000000000000000
[ 337.077255] ffff8800b1d11b38 ffffffff8119405d ffff88007ac9b158 ffff88007ac9b250
[ 337.078851] ffffffff8181bda0 ffffffff8181bda0 ffff8800b1d11b68 ffffffff81131e31
[ 337.080284] Call Trace:
[ 337.081706] [] log_start_commit+0x1f/0x40
[ 337.083107] [] ext3_evict_inode+0x1fd/0x2a0
[ 337.084490] [] evict+0xa1/0x1a0
[ 337.085857] [] iput+0x101/0x210
[ 337.087220] [] iget_failed+0x21/0x30
[ 337.088581] [] ext3_iget+0x15c/0x450
[ 337.089936] [] ? ext3_rsv_window_add+0x81/0x100
[ 337.091284] [] ext3_get_journal+0x15/0xde
[ 337.092641] [] ext3_fill_super+0xf2b/0x1c30
[ 337.093991] [] ? register_shrinker+0x4d/0x60
[ 337.095332] [] mount_bdev+0x1a2/0x1e0
[ 337.096680] [] ? ext3_setup_super+0x210/0x210
[ 337.098026] [] ext3_mount+0x10/0x20
[ 337.099362] [] mount_fs+0x3e/0x1b0
[ 337.100759] [] ? __alloc_percpu+0xb/0x10
[ 337.102330] [] vfs_kern_mount+0x65/0xc0
[ 337.103889] [] do_kern_mount+0x4f/0x100
[ 337.105442] [] do_mount+0x19c/0x890
[ 337.106989] [] ? memdup_user+0x46/0x90
[ 337.108572] [] ? strndup_user+0x53/0x70
[ 337.110114] [] sys_mount+0x8b/0xe0
[ 337.111617] [] system_call_fastpath+0x16/0x1b
[ 337.113133] Code: 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b 5d c3 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 66 0f c1 07 0f b6 d4 38 c2 74 0c 0f 1f 00 f3 90 0f b6 07 38
[ 337.116588] RIP [] _raw_spin_lock+0x9/0x30
[ 337.118260] RSP
[ 337.119998] CR2: 0000000000000024
[ 337.188701] ---[ end trace c36d790becac1615 ]---

Signed-off-by: Dan Carpenter
Signed-off-by: Jan Kara
02 Nov, 2011
1 commit
-
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.

Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Christoph Hellwig
23 Aug, 2011
2 commits
-
Add a new REQ_PRIO to let requests preempt others in the cfq I/O scheduler,
and leave REQ_META purely for marking requests as metadata in blktrace.

All existing callers of REQ_META except for XFS are updated to also
set REQ_PRIO for now.

Signed-off-by: Christoph Hellwig
Reviewed-by: Namhyung Kim
Signed-off-by: Jens Axboe
-
Replace all occurrences of the undocumented READ_META with READ | REQ_META
and remove the unused WRITE_META define.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
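The split between "mark as metadata" and "allow preemption" described in the two commits above can be sketched with two independent flag bits. The names and bit values below (MODEL_REQ_META, MODEL_REQ_PRIO and the helper functions) are made up for this illustration and do not match the kernel's definitions; the sketch only shows that the scheduling decision looks at one bit and tracing at the other.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; not the kernel's values. */
#define MODEL_REQ_META (1u << 0)  /* metadata marker, used for tracing only */
#define MODEL_REQ_PRIO (1u << 1)  /* request may preempt other requests */
#define MODEL_REQ_MASK (MODEL_REQ_META | MODEL_REQ_PRIO)

/* Scheduler side: only the priority bit influences preemption. */
static bool model_may_preempt(unsigned int flags)
{
    assert((flags & ~MODEL_REQ_MASK) == 0);  /* only known bits allowed */
    return (flags & MODEL_REQ_PRIO) != 0;
}

/* Tracing side: only the metadata bit influences how a request is labelled. */
static bool model_traced_as_metadata(unsigned int flags)
{
    assert((flags & ~MODEL_REQ_MASK) == 0);
    return (flags & MODEL_REQ_META) != 0;
}

/* Flags a converted caller passes: metadata that also keeps its priority. */
static unsigned int model_converted_caller_flags(void)
{
    return MODEL_REQ_META | MODEL_REQ_PRIO;
}
```

A caller that sets only the metadata bit, as the commit says XFS does, is still labelled as metadata in traces but no longer preempts other requests in this model.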
27 Jul, 2011
1 commit
-
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
jbd: change the field "b_cow_tid" of struct journal_head from type unsigned to tid_t
ext3.txt: update the links in the section "useful links" to the latest ones
ext3: Fix data corruption in inodes with journalled data
ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get
ext3: Fix compilation with -DDX_DEBUG
quota: Remove unused declaration
jbd: Use WRITE_SYNC in journal checkpoint.
jbd: Fix oops in journal_remove_journal_head()
ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs()
ext3/ioctl.c: silence sparse warnings about different address spaces
ext3/ext4 Documentation: remove bh/nobh since it has been deprecated
ext3: Improve truncate error handling
ext3: use proper little-endian bitops
ext2: include fs.h into ext2_fs.h
ext3: Fix oops in ext3_try_to_allocate_with_rsv()
jbd: fix a bug of leaking jh->b_jcount
jbd: remove dependency on __GFP_NOFAIL
ext3: Convert ext3 to new truncate calling convention
jbd: Add fixed tracepoints
ext3: Add fixed tracepoints

Resolve conflicts in fs/ext3/fsync.c due to fsync locking push-down and
new fixed tracepoints.
23 Jul, 2011
1 commit
-
When journalling data for an inode (either because it is a symlink or
because the filesystem is mounted in data=journal mode), ext3_evict_inode()
can discard unwritten data by calling truncate_inode_pages(). This is
because we don't mark the buffer / page dirty when journalling data but only
add the buffer to the running transaction, and thus mm does not know there
is still unwritten data.

Fix the problem by carefully tracking the transaction containing the inode's
data, committing this transaction, and writing uncheckpointed buffers when
the inode should be reaped.

Signed-off-by: Jan Kara
21 Jul, 2011
2 commits
-
Simple filesystems always pass inode->i_sb->s_bdev as the block device
argument, and never need an end_io handler. Let's simplify things for
them and for my grepping activity by dropping these arguments. The
only thing not falling into that scheme is ext4, which passes an
end_io handler without needing special flags (yet), but given how
messy the direct I/O code there is, use of __blockdev_direct_IO
in one instead of two out of three cases isn't going to make a large
difference anyway.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
Let filesystems handle waiting for direct I/O requests themselves instead
of doing it beforehand. This means filesystem-specific locks to prevent
new dio references from appearing can be held. This is important to allow
generalizing i_dio_count to non-DIO_LOCKING filesystems.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
25 Jun, 2011
3 commits
-
The new truncate calling convention allows us to handle errors from
ext3_block_truncate_page(). So reorganize the code so that
ext3_block_truncate_page() is called before we change the inode size.

This also removes unnecessary block zeroing from error recovery after failed
buffered writes (zeroing isn't needed because we could never have written
non-zero data to disk). We have to be careful and keep zeroing in direct IO
write error recovery because there we might have already overwritten the end
of the last file block.

Signed-off-by: Jan Kara
-
Mostly trivial conversion. As we change the code, we fix a bug where
IS_IMMUTABLE and IS_APPEND files could not be truncated during failed writes.
In fact the test is not needed at all because both IS_IMMUTABLE and IS_APPEND
are tested in upper layers in do_sys_[f]truncate(), may_write(), etc.

Signed-off-by: Jan Kara
-
This commit adds fixed tracepoints to the ext3 code. It is based on ext4
tracepoints, however due to the differences of both file systems, there
are some tracepoints missing (those for delaloc and for multi-block
allocator) and there are some ext3 specific as well (for reservation
windows).

Here is a list:
ext3_free_inode
ext3_request_inode
ext3_allocate_inode
ext3_evict_inode
ext3_drop_inode
ext3_mark_inode_dirty
ext3_write_begin
ext3_ordered_write_end
ext3_writeback_write_end
ext3_journalled_write_end
ext3_ordered_writepage
ext3_writeback_writepage
ext3_journalled_writepage
ext3_readpage
ext3_releasepage
ext3_invalidatepage
ext3_discard_blocks
ext3_request_blocks
ext3_allocate_blocks
ext3_free_blocks
ext3_sync_file_enter
ext3_sync_file_exit
ext3_sync_fs
ext3_rsv_window_add
ext3_discard_reservation
ext3_alloc_new_reservation
ext3_reserved
ext3_forget
ext3_read_block_bitmap
ext3_direct_IO_enter
ext3_direct_IO_exit
ext3_unlink_enter
ext3_unlink_exit
ext3_truncate_enter
ext3_truncate_exit
ext3_get_blocks_enter
ext3_get_blocks_exit
ext3_load_inode

Signed-off-by: Lukas Czerner
Cc: Jan Kara
Signed-off-by: Jan Kara
27 May, 2011
1 commit
-
Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
anything else, so that the filesystem can track internally if it
needs to push out a transaction for fdatasync or not.

This is just the prototype change with no user for it yet. I plan
to push large XFS changes for the next merge window, and getting
this trivial infrastructure in this window would help a lot to avoid
tree interdependencies.

Also remove incorrect comments that ->dirty_inode can't block. That
has been changed a long time ago, and many implementations rely on it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
08 Apr, 2011
1 commit
-
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
quota: Don't write quota info in dquot_commit()
ext3: Fix writepage credits computation for ordered mode
31 Mar, 2011
1 commit
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
24 Mar, 2011
1 commit
-
The original computation forgets to count writes of the indirect blocks
themselves in ordered mode (it only counts the blocks necessary for their
allocation).

Acked-by: Amir Goldstein
Signed-off-by: Yongqiang Yang
Signed-off-by: Jan Kara
10 Mar, 2011
1 commit
-
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So let's kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe
11 Jan, 2011
1 commit
-
Check the return values of ext3_journal_get_write_access() and
ext3_journal_dirty_metadata().

Signed-off-by: Namhyung Kim
Signed-off-by: Jan Kara