Eric Lee / smarc-fsl-linux-kernel

22 May, 2010

1 commit

12755627b quota: unify quota init condition in setattr ... Browse Code »

Quota must being initialized if size or uid/git changes requested.
But initialization performed in two different places:
in case of i_size file system is responsible for dquot init
, but in case of uid/gid init will be called internally in
dquot_transfer().
This ambiguity makes code harder to understand.
Let's move this logic to one common helper function.

Signed-off-by: Dmitry Monakhov
Signed-off-by: Jan Kara

Dmitry Monakhov
2010-05-22 01:30:45 +0800

30 Mar, 2010

1 commit

de329820e ext3: fix broken handling of EXT3_STATE_NEW ... Browse Code »

In commit 9df93939b735 ("ext3: Use bitops to read/modify
EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to
use bitops for its state handling. However, unline the same ext4
change, it didn't actually change the name of the field when it changed
the semantics of it.

As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that
initialized the field to EXT3_STATE_NEW. And that does not work
_at_all_ when we're now working with individually named bits rather than
values that get masked. So the code tried to mark the state to be new,
but in actual fact set the field to EXT3_STATE_JDATA. Which makes no
sense at all, and screws up all the code that checks whether the inode
was newly allocated.

In particular, it made the xattr code unhappy, and caused various random
behavior, like apparently

https://bugzilla.redhat.com/show_bug.cgi?id=577911

So fix the initialization, and rename the field to match ext4 so that we
don't have this happen again.

Cc: James Morris
Cc: Stephen Smalley
Cc: Daniel J Walsh
Cc: Eric Paris
Cc: Jan Kara
Signed-off-by: Linus Torvalds

Linus Torvalds
2010-03-30 05:30:19 +0800

06 Mar, 2010

2 commits

e213e26ab Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 ... Browse Code »

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
quota: stop using QUOTA_OK / NO_QUOTA
dquot: cleanup dquot initialize routine
dquot: move dquot initialization responsibility into the filesystem
dquot: cleanup dquot drop routine
dquot: move dquot drop responsibility into the filesystem
dquot: cleanup dquot transfer routine
dquot: move dquot transfer responsibility into the filesystem
dquot: cleanup inode allocation / freeing routines
dquot: cleanup space allocation / freeing routines
ext3: add writepage sanity checks
ext3: Truncate allocated blocks if direct IO write fails to update i_size
quota: Properly invalidate caches even for filesystems with blocksize < pagesize
quota: generalize quota transfer interface
quota: sb_quota state flags cleanup
jbd: Delay discarding buffers in journal_unmap_buffer
ext3: quota_write cross block boundary behaviour
quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
quota: split out compat_sys_quotactl support from quota.c
quota: split out netlink notification support from quota.c
quota: remove invalid optimization from quota_sync_all
...

Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

Linus Torvalds
2010-03-06 05:20:53 +0800
a9185b41a pass writeback_control to ->write_inode ... Browse Code »

This gives the filesystem more information about the writeback that
is happening. Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-03-06 02:25:52 +0800

05 Mar, 2010

7 commits

871a29315 dquot: cleanup dquot initialize routine ... Browse Code »

Get rid of the initialize dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_initialize helper to __dquot_initialize
and vfs_dq_init to dquot_initialize to have a consistent namespace.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara

Christoph Hellwig
2010-03-05 07:20:30 +0800
907f4554e dquot: move dquot initialization responsibility into the filesystem ... Browse Code »

Currently various places in the VFS call vfs_dq_init directly. This means
we tie the quota code into the VFS. Get rid of that and make the
filesystem responsible for the initialization. For most metadata operations
this is a straight forward move into the methods, but for truncate and
open it's a bit more complicated.

For truncate we currently only call vfs_dq_init for the sys_truncate case
because open already takes care of it for ftruncate and open(O_TRUNC) - the
new code causes an additional vfs_dq_init for those which is harmless.

For open the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.

Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in ->open.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara

Christoph Hellwig
2010-03-05 07:20:30 +0800
b43fa8284 dquot: cleanup dquot transfer routine ... Browse Code »

Get rid of the transfer dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_transfer helper to __dquot_transfer
and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
and make the new dquot_transfer return a normal negative errno value
which all callers expect.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara

Christoph Hellwig
2010-03-05 07:20:29 +0800
5dd4056db dquot: cleanup space allocation / freeing routines ... Browse Code »

Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs their own (which none currently does)
it can just call into it's own routine directly.

Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around it to move as much as possible
code into the common block for CONFIG_QUOTA vs not. Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara

Christoph Hellwig
2010-03-05 07:20:28 +0800
49792c806 ext3: add writepage sanity checks ... Browse Code »

- There is theoretical possibility to perform writepage on
RO superblock. Add explicit check for what case.
- Page must being locked before writepage.

Signed-off-by: Dmitry Monakhov
Signed-off-by: Jan Kara

Dmitry Monakhov
2010-03-05 07:20:27 +0800
7eb4969e0 ext3: Truncate allocated blocks if direct IO write fails to update i_size ... Browse Code »

We have to truncate blocks allocated to file during direct IO when we
fail to update i_size properly.

Signed-off-by: Jan Kara

Jan Kara
2010-03-05 07:20:27 +0800
9df93939b ext3: Use bitops to read/modify EXT3_I(inode)->i_state ... Browse Code »

At several places we modify EXT3_I(inode)->i_state without holding i_mutex
(ext3_release_file, ext3_bmap, ext3_journalled_writepage, ext3_do_update_inode,
...). These modifications are racy and we can lose updates to i_state. So
convert handling of i_state to use bitops which are atomic.

Signed-off-by: Jan Kara

Jan Kara
2010-03-05 07:20:20 +0800

23 Dec, 2009

1 commit

c459001fa ext3: quota macros cleanup [V2] ... Browse Code »

Currently all quota block reservation macros contains hardcoded "2"
aka MAXQUOTAS value. This is no good because in some places it is not
obvious to understand what does this digit represent. Let's introduce
new macro with self descriptive name.

Signed-off-by: Dmitry Monakhov
Signed-off-by: Jan Kara

Dmitry Monakhov
2009-12-23 20:33:54 +0800

10 Dec, 2009

1 commit

68eb3db08 ext3: Fix data / filesystem corruption when write fails to copy data ... Browse Code »

When ext3_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks already
instantiated beyond i_size. Although these blocks were never inside i_size, we
have to truncate pagecache of these blocks so that corresponding buffers get
unmapped. Otherwise subsequent __block_prepare_write (called because we are
retrying the write) will find the buffers mapped, not call ->get_block, and
thus the page will be backed by already freed blocks leading to filesystem and
data corruption.

Reported-by: James Y Knight
Signed-off-by: Jan Kara

Jan Kara
2009-12-10 22:02:55 +0800

08 Dec, 2009

1 commit

d014d0438 Merge branch 'for-next' into for-linus ... Browse Code »

Conflicts:

kernel/irq/chip.c

Jiri Kosina
2009-12-08 01:36:35 +0800

04 Dec, 2009

1 commit

bf48aabb8 tree-wide: fix typos "offest" -> "offset" ... Browse Code »

This patch was generated by

git grep -E -i -l 'offest' | xargs -r perl -p -i -e 's/offest/offset/'

Signed-off-by: Uwe Kleine-König
Signed-off-by: Jiri Kosina

Uwe Kleine-König
2009-12-04 22:39:50 +0800

11 Nov, 2009

2 commits

fe8bc91c4 ext3: Wait for proper transaction commit on fsync ... Browse Code »

We cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction
commit. What we do is that we track which transaction has last changed
the inode and which transaction last changed allocation and force it to
disk on fsync.

Signed-off-by: Jan Kara
Reviewed-by: Aneesh Kumar K.V

Jan Kara
2009-11-11 22:22:49 +0800
ea0174a71 ext3: retry failed direct IO allocations ... Browse Code »

On a 256M 4k block filesystem, doing this in a loop:

dd if=/dev/zero of=test oflag=direct bs=1M count=64
rm -f test

eventually leads to spurious ENOSPC:

dd: writing `test': No space left on device

As with other block allocation callers, it looks like we need to
potentially retry the allocations on the initial ENOSPC.

A similar patch went into ext4 (commit
fbbf69456619de5d251cb9f1df609069178c62d5)

Signed-off-by: Eric Sandeen
Signed-off-by: Jan Kara

Eric Sandeen
2009-11-11 22:22:49 +0800

24 Sep, 2009

1 commit

db1682636 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 ... Browse Code »

* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...

Linus Torvalds
2009-09-24 22:53:22 +0800

16 Sep, 2009

3 commits

4f003fd32 ext3: Add locking to ext3_do_update_inode ... Browse Code »

I've been struggling with this off and on while I've been testing the
data=guarded work. The symptom is corrupted orphan lists and inodes
with the wrong i_size stored on disk. I was convinced the
data=guarded code was just missing a call to ext3_mark_inode_dirty, but
tracing showed the i_disksize I was sending to ext3_mark_inode_dirty
wasn't actually making it to the drive.

ext3_mark_inode_dirty can be called without locks held (atime updates
and a few others), so the data=guarded code uses locks while updating
the in-memory inode, and then calls ext3_mark_inode_dirty
without any locks held.

But, ext3_mark_inode_dirty has no internal locking to make sure that
only one CPU is updating the buffer head at a time. Generally this
works out ok because everyone that changes the inode then calls
ext3_mark_inode_dirty themselves. Even though it races, eventually
someone updates the buffer heads and things move on.

But there is still a risk of the wrong values getting in, and the
data=guarded code seems to hit the race very often.

Since everyone that changes the inode also logs it, it should be
possible to fix this with some memory barriers. I'll leave that as an
exercise to the reader and lock the buffer head instead.

It it probably a good idea to have a different patch series for lockless
bit flipping on the ext3 i_state field. ext3_do_update_inode &= clears
EXT3_STATE_NEW without any locks held.

Signed-off-by: Chris Mason
Signed-off-by: Jan Kara

Chris Mason
2009-09-16 23:44:11 +0800
00171d3c7 ext3: Fix possible deadlock between ext3_truncate() and ext3_get_blocks() ... Browse Code »

During truncate we are sometimes forced to start a new transaction as the
amount of blocks to be journaled is both quite large and hard to predict. So
far we restarted a transaction while holding truncate_mutex and that violates
lock ordering because truncate_mutex ranks below transaction start (and it
can lead to a real deadlock with ext3_get_blocks() allocating new blocks
from ext3_writepage()).

Luckily, the problem is easy to fix: We just drop the truncate_mutex before
restarting the transaction and acquire it afterwards. We are safe to do this as
by the time ext3_truncate() is called, all the page cache for the truncated
part of the file is dropped and so writepage() cannot come and allocate new
blocks in the part of the file we are truncating. The rest of writers is
stopped by us holding i_mutex.

Signed-off-by: Jan Kara

Jan Kara
2009-09-16 23:44:11 +0800
aa261f549 HWPOISON: Enable .remove_error_page for migration aware file systems ... Browse Code »

Enable removing of corrupted pages through truncation
for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
These should cover most server needs.

I chose the set of migration aware file systems for this
for now, assuming they have been especially audited.
But in general it should be safe for all file systems
on the data area that support read/write and truncate.

Caveat: the hardware error handler does not take i_mutex
for now before calling the truncate function. Is that ok?

Cc: tytso@mit.edu
Cc: hch@infradead.org
Cc: mfasheh@suse.com
Cc: aia21@cantab.net
Cc: hugh.dickins@tiscali.co.uk
Cc: swhiteho@redhat.com
Signed-off-by: Andi Kleen

Andi Kleen
2009-09-16 17:50:16 +0800

16 Jul, 2009

2 commits

43237b549 ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() ... Browse Code »

Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to
be a relict from some old days and setting disksize in this function does not
make much sence. Currently it was set only by ext3_getblk(). Since the
parameter has some effect only if create == 1, it is easy to check that the
three callers which end up calling ext3_getblk() with create == 1 (ext3_append,
ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves.

Signed-off-by: Jan Kara

Jan Kara
2009-07-16 03:30:46 +0800
9eaaa2d57 ext3: Fix truncation of symlinks after failed write ... Browse Code »

Contents of long symlinks is written via standard write methods. So when the
write fails, we add inode to orphan list. But symlinks don't have .truncate
method defined so nobody properly removes them from the orphan list (both on
disk and in memory).

Fix this by calling ext3_truncate() directly instead of calling vmtruncate()
(which is saner anyway since we don't need anything vmtruncate() does except
from calling .truncate in these paths). We also add inode to orphan list only
if ext3_can_truncate() is true (currently, it can be false for symlinks when
there are no blocks allocated) - otherwise orphan list processing will complain
and ext3_truncate() will not remove inode from on-disk orphan list.

Signed-off-by: Jan Kara

Jan Kara
2009-07-16 03:28:07 +0800

24 Jun, 2009

1 commit

6582a0e6f switch ext3 to inode->i_acl ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2009-06-24 20:17:04 +0800

19 Jun, 2009

2 commits

ef43618a4 ext3: make sure inode is deleted from orphan list after truncate ... Browse Code »

As Ted pointed out, it can happen that ext3_truncate() returns without
removing inode from orphan list. This way we could in some rare cases
(like when we get ENOMEM from an allocation in ext3_truncate called
because of failed ext3_write_begin) leave the inode on orphan list and
that triggers assertion failure on umount.

So make ext3_truncate() always remove inode from in-memory orphan list.

Cc: Theodore Ts'o
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2009-06-19 04:03:45 +0800
e8ef7aaea ext3: fix chain verification in ext3_get_blocks() ... Browse Code »

Chain verification in ext3_get_blocks() has been hosed since it called
verify_chain(chain, NULL) which always returns success. As a result
readers could in theory race with truncate. On the other hand the race
probably cannot happen with the current locking scheme, since by the
time ext3_truncate() is called all the pages are already removed and
hence get_block() shouldn't be called on such pages...

Signed-off-by: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2009-06-19 04:03:45 +0800

12 Jun, 2009

1 commit

ca41f7b91 ext3: remove ->write_super and stop maintaining ->s_dirt ... Browse Code »

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:05 +0800

09 Apr, 2009

1 commit

430db323f ext3: Try to avoid starting a transaction in writepage for data=writepage ... Browse Code »

This does the same as commit 9e80d407736161d9b8b0c5a0d44f786e44c322ea
(avoid starting a transaction when no block allocation is needed)
but for data=writeback mode of ext3. We also cleanup the data=ordered
case a bit to stick to coding style...

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2009-04-09 01:15:10 +0800

04 Apr, 2009

1 commit

20bec8ab1 Merge branch 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

* 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext3: Add replace-on-rename hueristics for data=writeback mode
ext3: Add replace-on-truncate hueristics for data=writeback mode
ext3: Use WRITE_SYNC for commits which are caused by fsync()
block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks

Linus Torvalds
2009-04-04 02:10:33 +0800

03 Apr, 2009

2 commits

f7ab34ea7 ext3: Add replace-on-truncate hueristics for data=writeback mode ... Browse Code »

In data=writeback mode, start an asynchronous flush when closing a
file which had been previously truncated down to zero. This lowers
the probability of data loss in the case of applications that attempt
to replace a file using truncate.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2009-04-03 13:34:35 +0800
695f6ae0d ext3: avoid false EIO errors ... Browse Code »

Sometimes block_write_begin() can map buffers in a page but later we
fail to copy data into those buffers (because the source page has been
paged out in the mean time). We then end up with !uptodate mapped
buffers. To add a bit more to the confusion, block_write_end() does
not commit any data (and thus does not any mark buffers as uptodate) if
we didn't succeed with copying all the data.

Commit f4fc66a894546bdc88a775d0e83ad20a65210bcb (ext3: convert to new
aops) missed these cases and thus we were inserting non-uptodate
buffers to transaction's list which confuses JBD code and it reports IO
errors, aborts a transaction and generally makes users afraid about
their data ;-P.

This patch fixes the problem by reorganizing ext3_..._write_end() code
to first call block_write_end() to mark buffers with valid data
uptodate and after that we file only uptodate buffers to transaction's
lists.

We also fix a problem where we could leave blocks allocated beyond i_size
(i_disksize in fact) because of failed write. We now add inode to orphan
list when write fails (to be safe in case we crash) and then truncate blocks
beyond i_size in a separate transaction.

Signed-off-by: Jan Kara
Reviewed-by: Aneesh Kumar K.V
Cc: Nick Piggin
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2009-04-03 10:04:52 +0800

28 Mar, 2009

1 commit

2c9e15a01 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6 ... Browse Code »

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: (27 commits)
ext2: Zero our b_size in ext2_quota_read()
trivial: fix typos/grammar errors in fs/Kconfig
quota: Coding style fixes
quota: Remove superfluous inlines
quota: Remove uppercase aliases for quota functions.
nfsd: Use lowercase names of quota functions
jfs: Use lowercase names of quota functions
udf: Use lowercase names of quota functions
ufs: Use lowercase names of quota functions
reiserfs: Use lowercase names of quota functions
ext4: Use lowercase names of quota functions
ext3: Use lowercase names of quota functions
ext2: Use lowercase names of quota functions
ramfs: Remove quota call
vfs: Use lowercase names of quota functions
quota: Remove dqbuf_t and other cleanups
quota: Remove NODQUOT macro
quota: Make global quota locks cacheline aligned
quota: Move quota files into separate directory
ext4: quota reservation for delayed allocation
...

Linus Torvalds
2009-03-28 05:48:34 +0800

27 Mar, 2009

1 commit

9e80d4077 ext3: Avoid starting a transaction in writepage when not necessary ... Browse Code »

We don't have to start a transaction in writepage() when all the blocks
are a properly allocated. Even in ordered mode either the data has been
written via write() and they are thus already added to transaction's list
or the data was written via mmap and then it's random in which transaction
they get written anyway.

This should help VM to pageout dirty memory without blocking on transaction
commits.

Signed-off-by: Jan Kara
Signed-off-by: Linus Torvalds

Jan Kara
2009-03-27 06:44:59 +0800

26 Mar, 2009

1 commit

81a052273 ext3: Use lowercase names of quota functions ... Browse Code »

Use lowercase names of quota functions instead of old uppercase ones.

Signed-off-by: Jan Kara
CC: linux-ext4@vger.kernel.org

Jan Kara
2009-03-26 09:18:36 +0800

05 Jan, 2009

1 commit

54566b2c1 fs: symlink write_begin allocation context fix ... Browse Code »

With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened. They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim. This bug could
cause filesystem deadlocks.

The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock. The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.

Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
this flag in their write_begin function. Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).

This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
random example).

[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin
Reviewed-by: KOSAKI Motohiro
Cc: [2.6.28.x]
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
[ Cleaned up the calling convention: just pass in the AOP flags
untouched to the grab_cache_page_write_begin() function. That
just simplifies everybody, and may even allow future expansion of the
logic. - Linus ]
Signed-off-by: Linus Torvalds

Nick Piggin
2009-01-05 05:33:20 +0800

01 Jan, 2009

1 commit

b5ed3112b ext3: ensure fast symlinks are NUL-terminated ... Browse Code »

Ensure fast symlink targets are NUL-terminated, even if corrupted
on-disk.

Cc: Andrew Morton
Cc: Stephen Tweedie
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Duane Griffin
Signed-off-by: Al Viro

Duane Griffin
2009-01-01 07:07:39 +0800

20 Oct, 2008

1 commit

5ec8b75e3 ext3: truncate block allocated on a failed ext3_write_begin ... Browse Code »

For blocksize < pagesize we need to remove blocks that got allocated in
block_write_begin() if we fail with ENOSPC for later blocks.
block_write_begin() internally does this if it allocated page locally.
This makes sure we don't have blocks outside inode.i_size during ENOSPC.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Aneesh Kumar K.V
2008-10-20 23:52:38 +0800

04 Oct, 2008

1 commit

68c9d702b generic block based fiemap implementation ... Browse Code »

Any block based fs (this patch includes ext3) just has to declare its own
fiemap() function and then call this generic function with its own
get_block_t. This works well for block based filesystems that will map
multiple contiguous blocks at one time, but will work for filesystems that
only map one block at a time, you will just end up with an "extent" for each
block. One gotcha is this will not play nicely where there is hole+data
after the EOF. This function will assume its hit the end of the data as soon
as it hits a hole after the EOF, so if there is any data past that it will
not pick that up. AFAIK no block based fs does this anyway, but its in the
comments of the function anyway just in case.

Signed-off-by: Josef Bacik
Signed-off-by: Mark Fasheh
Signed-off-by: "Theodore Ts'o"
Cc: linux-fsdevel@vger.kernel.org

Josef Bacik
2008-10-04 05:32:43 +0800

29 Jul, 2008

1 commit

8ab22b9ab vfs: pagecache usage optimization for pagesize!=blocksize ... Browse Code »

When we read some part of a file through pagecache, if there is a
pagecache of corresponding index but this page is not uptodate, read IO
is issued and this page will be uptodate.

I think this is good for pagesize == blocksize environment but there is
room for improvement on pagesize != blocksize environment. Because in
this case a page can have multiple buffers and even if a page is not
uptodate, some buffers can be uptodate.

So I suggest that when all buffers which correspond to a part of a file
that we want to read are uptodate, use this pagecache and copy data from
this pagecache to user buffer even if a page is not uptodate. This can
reduce read IO and improve system throughput.

I wrote a benchmark program and got result number with this program.

This benchmark do:

1: mount and open a test file.

2: create a 512MB file.

3: close a file and umount.

4: mount and again open a test file.

5: pwrite randomly 300000 times on a test file. offset is aligned
by IO size(1024bytes).

6: measure time of preading randomly 100000 times on a test file.

The result was:
2.6.26
330 sec

2.6.26-patched
226 sec

Arch:i386
Filesystem:ext3
Blocksize:1024 bytes
Memory: 1GB

On ext3/4, a file is written through buffer/block. So random read/write
mixed workloads or random read after random write workloads are optimized
with this patch under pagesize != blocksize environment. This test result
showed this.

The benchmark program is as follows:

#include
#include
#include
#include
#include
#include
#include
#include
#include

#define LEN 1024
#define LOOP 1024*512 /* 512MB */

main(void)
{
unsigned long i, offset, filesize;
int fd;
char buf[LEN];
time_t t1, t2;

if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
perror("cannot mount\n");
exit(1);
}
memset(buf, 0, LEN);
fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC);
if (fd < 0) {
perror("cannot open file\n");
exit(1);
}
for (i = 0; i < LOOP; i++)
write(fd, buf, LEN);
close(fd);
if (umount("/root/test1/") < 0) {
perror("cannot umount\n");
exit(1);
}
if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
perror("cannot mount\n");
exit(1);
}
fd = open("/root/test1/testfile", O_RDWR);
if (fd < 0) {
perror("cannot open file\n");
exit(1);
}

filesize = LEN * LOOP;
for (i = 0; i < 300000; i++){
offset = (random() % filesize) & (~(LEN - 1));
pwrite(fd, buf, LEN, offset);
}
printf("start test\n");
time(&t1);
for (i = 0; i < 100000; i++){
offset = (random() % filesize) & (~(LEN - 1));
pread(fd, buf, LEN, offset);
}
time(&t2);
printf("%ld sec\n", t2-t1);
close(fd);
if (umount("/root/test1/") < 0) {
perror("cannot umount\n");
exit(1);
}
}

Signed-off-by: Hisashi Hifumi
Cc: Nick Piggin
Cc: Christoph Hellwig
Cc: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hisashi Hifumi
2008-07-29 07:30:21 +0800

26 Jul, 2008

1 commit

3ccc3167b ext3: handle deleting corrupted indirect blocks ... Browse Code »

While freeing indirect blocks we attach a journal head to the parent
buffer head, free the blocks, then journal the parent. If the indirect
block list is corrupted and points to the parent the journal head will be
detached when the block is cleared, causing an OOPS.

Check for that explicitly and handle it gracefully.

This patch fixes the third case (image hdb.20000057.nullderef.gz)
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.

Immediately above the change, in the ext3_free_data function, we call
ext3_clear_blocks to clear the indirect blocks in this parent block. If
one of those blocks happens to actually be the parent block it will clear
b_private / BH_JBD.

I did the check at the end rather than earlier as it seemed more elegant.
I don't think there should be much practical difference, although it is
possible the FS may not be quite so badly corrupted if we did it the other
way (and didn't clear the block at all). To be honest, I'm not convinced
there aren't other similar failure modes lurking in this code, although I
couldn't find any with a quick review.

[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: Duane Griffin
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Duane Griffin
2008-07-26 01:53:32 +0800