Eric Lee / smarc-fsl-linux-kernel

17 Sep, 2006

2 commits

20acaa18d [PATCH] ext3 sequential read regression fix

The ext3-get-blocks support caused a ~20% degradation in sequential read
performance (tiobench). The problem is with marking the buffer boundary
so that IO can be submitted right away. Here is the patch to fix it.

2.6.18-rc6:
-----------
# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 75.2726 seconds, 57.1 MB/s

real 1m15.285s
user 0m0.276s
sys 0m3.884s

2.6.18-rc6 + fix:
-----------------
[root@elm3a241 ~]# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 62.9356 seconds, 68.2 MB/s

The boundary block check in ext3_get_blocks_handle needs to be adjusted
against the count of blocks mapped in this call, now that it can map
more than one block.
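
A minimal sketch of the adjusted check, assuming `blocks_to_boundary` counts the blocks that follow the first mapped block before the boundary is reached. The helper name and its standalone form are illustrative only and are not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not the kernel function.
 * count:              number of blocks mapped by this call
 * blocks_to_boundary: assumed to be the number of blocks that follow the
 *                     first mapped block before the boundary is reached
 */
static bool sketch_mapping_reaches_boundary(unsigned long count,
                                            unsigned long blocks_to_boundary)
{
    assert(count > 0); /* at least one block was mapped */

    /*
     * A single-block check only needs blocks_to_boundary == 0. Once a call
     * can map several blocks, the boundary is reached whenever the mapped
     * run extends past the remaining distance.
     */
    return count > blocks_to_boundary;
}
```

With `count == 1` this reduces to the single-block test, which is why the older check was sufficient before multi-block mapping existed.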

Signed-off-by: Suparna Bhattacharya
Tested-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Suparna Bhattacharya
2006-09-17 03:54:32 +0800
fdb36673a [PATCH] knfsd: Make ext3 reject filehandles referring to invalid inode number

Inodes earlier than the 'first' inode (e.g. journal, resize) should be
rejected early - except the root inode. Also inode numbers that are too
big should be rejected early.
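
The rule described above could be sketched as follows. The function name and parameters are hypothetical; the root inode number 2 is the usual ext2/ext3 value, while `first_ino` and `inodes_count` are assumed to come from the superblock:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ROOT_INO 2u /* root directory inode in ext2/ext3 */

/*
 * Hypothetical filter for an inode number taken from a file handle.
 * first_ino:    first non-reserved inode (assumed read from the superblock)
 * inodes_count: total number of inodes in the filesystem
 */
static bool sketch_fh_ino_acceptable(uint32_t ino, uint32_t first_ino,
                                     uint32_t inodes_count)
{
    assert(first_ino > SKETCH_ROOT_INO);

    if (ino == SKETCH_ROOT_INO)
        return true;  /* the root inode is the one allowed exception */
    if (ino < first_ino)
        return false; /* reserved inodes such as journal or resize */
    if (ino > inodes_count)
        return false; /* larger than any inode the filesystem has */
    return true;
}
```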

[akpm@osdl.org: cleanup]
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-09-17 03:54:31 +0800

09 Sep, 2006

1 commit

3665d0e58 [PATCH] ext3_getblk() should handle HOLE correctly

It has been reported that ext3_getblk() is not doing the right thing and
is triggering the following WARN():

BUG: warning at fs/ext3/inode.c:1016/ext3_getblk()
ext3_getblk+0x98/0x2a6 md_wakeup_thread+0x26/0x2a
ext3_bread+0x1f/0x88 ext3_quota_read+0x136/0x1ae
v1_read_dqblk+0x61/0xac dquot_acquire+0xf6/0x107
ext3_acquire_dquot+0x46/0x68 dqget+0x155/0x1e7
dquot_transfer+0x3e0/0x3e9 dput+0x23/0x13e
ext3_setattr+0xc3/0x240 current_fs_time+0x52/0x6a
notify_change+0x2bd/0x30d chown_common+0x9c/0xc5
strncpy_from_user+0x3b/0x68 do_path_lookup+0xdf/0x266
__user_walk_fd+0x44/0x5a sys_chown+0x4a/0x55
vfs_write+0xe7/0x13c sys_mkdir+0x1f/0x23
syscall_call+0x7/0xb

Looking at the code, it looks like it does not handle HOLE correctly. It ends
up returning -EIO. Here is the patch to fix it.

If we really want to be paranoid, we can allow return values 0 (HOLE), 1
(we asked for one block) and return -EIO for more than 1 block. But I
really don't see a reason for doing it - all we need is the block# here.
(doesn't matter how many blocks are mapped).

ext3_get_blocks_handle() returns the number of blocks it mapped. It returns 0
in case of HOLE. ext3_getblk() should handle HOLE properly (currently it is
dumping a warning stack and returning -EIO).
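
A small sketch of the return-value convention described above: negative means error, 0 means HOLE, and any positive count means the block is mapped. The enum and function names are made up for illustration and do not appear in the kernel:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum sketch_lookup { SKETCH_ERROR, SKETCH_HOLE, SKETCH_MAPPED };

/*
 * Hypothetical classifier for the value returned by a multi-block mapping
 * routine. A caller that only needs one block number does not care how
 * many blocks were mapped, so every positive count is treated the same.
 */
static enum sketch_lookup sketch_classify(int ret, int *err)
{
    assert(err != NULL);

    *err = 0;
    if (ret < 0) {
        *err = ret;          /* pass the error code through unchanged */
        return SKETCH_ERROR;
    }
    if (ret == 0)
        return SKETCH_HOLE;  /* nothing mapped: a hole, not an I/O error */
    return SKETCH_MAPPED;    /* one or more blocks mapped */
}
```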

Signed-off-by: Badari Pulavarty
Acked-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-09-09 01:22:50 +0800

28 Aug, 2006

1 commit

08fb306fe [PATCH] ext3 filesystem bogus ENOSPC with reservation fix

To handle the earlier bogus ENOSPC error caused by a filesystem full of block
reservations, the current code falls back to allocation without reservation
and starts to allocate block(s) from the goal allocation block group as if
there is no block reservation.

The current code needs to re-load the corresponding block group descriptor for
the initial goal block group in this case. The patch fixes this.

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-08-28 02:01:30 +0800

01 Aug, 2006

2 commits

0e31f51d8 [PATCH] ext3 -nobh option causes oops

For files other than IFREG, the nobh option doesn't make sense. Modifications
to them are journalled and need buffer heads to do that. Without this
patch, we get a kernel oops in page_buffers().

Signed-off-by: Badari Pulavarty
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-08-01 04:28:44 +0800
2ccb48ebb [PATCH] ext3: avoid triggering ext3_error on bad NFS file handle

The inode number out of an NFS file handle gets passed eventually to
ext3_get_inode_block() without any checking. If ext3_get_inode_block()
allows it to trigger an error, then bad filehandles can have an unpleasant
effect - ext3_error() will usually cause a forced read-only remount, or a
panic if `errors=panic' was used.

So remove the call to ext3_error there and put a matching check in
ext3/namei.c where inode numbers are read off storage.

[akpm@osdl.org: fix off-by-one error]
Signed-off-by: Neil Brown
Signed-off-by: Jan Kara
Cc: Marcel Holtmann
Cc:
Cc: "Stephen C. Tweedie"
Cc: Eric Sandeen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Neil Brown
2006-08-01 04:28:36 +0800

11 Jul, 2006

1 commit

36cf96f5e [PATCH] Remove leftover ext3 acl declarations

These functions no longer exist; remove their declarations.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andreas Gruenbacher
2006-07-11 04:24:26 +0800

04 Jul, 2006

1 commit

5c81a4197 [PATCH] lockdep: annotate the quota code

The quota code plays interesting games with the lock ordering; to quote Jan:

| i_mutex of inode containing quota file is acquired after all other
| quota locks. i_mutex of all other inodes is acquired before quota
| locks. Quota code makes sure (by resetting inode operations and
| setting special flag on inode) that noone tries to enter quota code
| while holding i_mutex on a quota file...

The good news is that all of this special case i_mutex grabbing happens in the
(per filesystem) low level quota write function. For this special case we
need a new I_MUTEX_* nesting level, since this is just entirely outside any of
the regular VFS locking rules for i_mutex. I trust Jan on his blue eyes that
this is not ever going to deadlock; and based on that the patch below is what
it takes to inform lockdep of these very interesting new locking rules.

The new locking rule for the I_MUTEX_QUOTA nesting level is that this is the
deepest possible level of nesting for i_mutex, and that this only should be
used in quota write (and possibly read) function of filesystems. This makes
the lock ordering of the I_MUTEX_* levels:

I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_NORMAL -> I_MUTEX_QUOTA

Has no effect on non-lockdep kernels.
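
As an illustration of the ordering above, the nesting levels can be modelled as an ordered enum where a lock may only be taken at a deeper level than the one already held. The names and numeric values below are assumptions of this sketch, not the kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative levels, listed from outermost to innermost. */
enum sketch_i_mutex_level {
    SKETCH_PARENT = 0,
    SKETCH_CHILD,
    SKETCH_NORMAL,
    SKETCH_QUOTA /* deepest level: quota file writes (and possibly reads) */
};

/*
 * Hypothetical ordering check: taking `wanted` while holding `held` is
 * only consistent with the ordering if `wanted` is strictly deeper.
 */
static bool sketch_may_nest(enum sketch_i_mutex_level held,
                            enum sketch_i_mutex_level wanted)
{
    assert(held <= SKETCH_QUOTA && wanted <= SKETCH_QUOTA);
    return wanted > held;
}
```

Because the quota level is the deepest, nothing can be nested inside it in this model, which matches the rule that it is only used in the low level quota functions.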

Signed-off-by: Arjan van de Ven
Acked-by: Ingo Molnar
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2006-07-04 06:27:08 +0800

01 Jul, 2006

1 commit

6ab3d5624 Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk

Jörn Engel
2006-07-01 01:25:36 +0800

29 Jun, 2006

1 commit

f5e54d6e5 [PATCH] mark address_space_operations const

Same as we already do with the file operations: keep them in .rodata and
prevent people from doing runtime patching.

Signed-off-by: Christoph Hellwig
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2006-06-29 05:59:04 +0800

27 Jun, 2006

1 commit

ade1a29e1 [PATCH] ext3: Add "-o bh" option

This patch adds a "-o bh" option to force use of buffer_heads. This option
is needed once "nobh" becomes the default, as a fallback if we run into problems.

Signed-off-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-06-27 00:58:20 +0800

26 Jun, 2006

5 commits

92eeccd8b [PATCH] ext3: cleanup dead code in ext3_add_entry()

The variables nlen and rlen are defined/initialized but not used in
ext3_add_entry().

Signed-off-by: Johann Lombardi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johann Lombardi
2006-06-26 01:01:15 +0800
43d23f903 [PATCH] ext3_fsblk_t: the rest of in-kernel filesystem blocks conversion

Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t. Convert the
rest of the unsigned long type in-kernel filesystem blocks to ext3_fsblk_t,
and replace the printk format strings correspondingly.

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-06-26 01:01:10 +0800
1c2bf374a [PATCH] ext3_fsblk_t: filesystem, group blocks and bug fixes

Some of the in-kernel ext3 block variables are treated as signed 4-byte
int types, which limits an ext3 filesystem to 8TB (based on a 4k block size).
While trying to fix them, it became clear that the ext3 code is quite confusing:
some blocks are filesystem-wide blocks, while others are group-relative offsets
that need to be signed values (as -1 has special meaning). So it seems saner to
define two types of physical blocks: one for filesystem-wide blocks, another
for group-relative blocks. The following patches clarify these two types of
blocks in the ext3 code, and fix the type bugs which limit the current 32 bit
ext3 filesystem to 8TB.

With this series of patches and the percpu counter data type changes in the mm
tree, we are able to extend the ext3 filesystem limit to 16TB.
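
The 8TB and 16TB figures follow from simple arithmetic on the number of usable block-number bits and the block size. The helper below is a hypothetical illustration of that calculation, not code from the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the largest filesystem size, in bytes, that can be
 * addressed with `block_number_bits` usable bits of block number and a
 * given block size.
 */
static uint64_t sketch_max_fs_bytes(unsigned block_number_bits,
                                    uint64_t block_size)
{
    /* Bounds chosen so that the product cannot overflow 64 bits. */
    assert(block_number_bits > 0 && block_number_bits <= 32);
    assert(block_size > 0 && block_size <= 65536);

    return ((uint64_t)1 << block_number_bits) * block_size;
}
```

A signed 32 bit variable leaves 31 usable bits, so with 4k blocks the limit is 2^31 * 2^12 = 2^43 bytes (8TB). Using all 32 bits doubles that to 2^44 bytes (16TB).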

This work is also a prerequisite for the recent >32 bit ext3 work, and makes
it a lot easier for the kernel to address 48 bit ext3 blocks: simply redefine
ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
ext3 filesystem blocks correspondingly.

Two RFCs with a series of patches have been posted to the ext2-devel list and
have been reviewed and discussed:
http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2

http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2

The patches are tested on both 32 bit and 64 bit machines with an 8TB ext3
filesystem (with the latest, to-be-released e2fsprogs-1.39). Tests include
overnight fsx, tiobench, dbench and fsstress.

This patch:

Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
filesystem wide blocks.

This patch classifies all block group relative blocks and ext3_fsblk_t blocks
that occur in the same function, which used to be confusing. It also includes
kernel bug fixes for filesystem-wide in-kernel block variables. Some
filesystem-wide blocks are currently treated as int/unsigned int types in the
kernel, especially in the ext3 block allocation and reservation code. This
patch fixes those bugs by converting those variables to the ext3_fsblk_t
(unsigned long) type.

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-06-26 01:01:10 +0800
d2e5b13c4 [PATCH] ext3: remove inconsistent space before exclamation point in mount code

This was reported as Debian bug #336604.

Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Theodore Ts'o
2006-06-26 01:01:07 +0800
fcd5df358 [PATCH] Avoid disk sector_t overflow for >2TB ext3 filesystem

If an ext3 filesystem is larger than 2TB, and sector_t is a u32 (i.e.
CONFIG_LBD is not defined in the kernel), the calculation of the disk sector
will overflow. Add checks in ext3_fill_super() and ext3_group_extend() to
prevent mount/remount/resize of a >2TB ext3 filesystem if sector_t is 4
bytes in size.
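
A rough sketch of the overflow condition, assuming 512 byte sectors and a 32 bit block count. The function is hypothetical and is not the check that the patch adds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical test: does the sector count of a filesystem with
 * `blocks_count` blocks of `block_size` bytes fit in a 32 bit sector_t?
 */
static bool sketch_sectors_fit_u32(uint64_t blocks_count, uint32_t block_size)
{
    assert(block_size >= 512 && block_size % 512 == 0);
    assert(blocks_count <= UINT32_MAX); /* ext3 block counts are 32 bit */

    /* Do the multiplication in 64 bits so the sketch itself cannot wrap. */
    uint64_t sectors = blocks_count * (block_size / 512);

    return sectors <= UINT32_MAX;
}
```

With 4k blocks, a 1TB filesystem needs 2^31 sectors and fits, while a 10TB filesystem needs far more than 2^32 sectors and does not.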

Verified this patch on a 32 bit platform without CONFIG_LBD defined
(sector_t is 32 bits long): mount refuses to mount a 10TB ext3 filesystem.

Signed-off-by: Mingming Cao
Acked-by: Andreas Dilger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-06-26 01:01:07 +0800

23 Jun, 2006

4 commits

0216bfcff [PATCH] percpu counter data type changes to suppport more than 2**31 ext3 free blocks counter

The percpu counter data types are changed in this set of patches to support
more users like ext3 that need more than 32 bits to store the free blocks
total in the filesystem.

- Generic percpu counter data type changes. The sizes of the global counter
and local counter are explicitly specified using s64 and s32. The global
counter is changed from long to s64, while the local counter is changed from
long to s32, so we can avoid doing 64 bit updates in most cases.

- Users of the percpu counters are updated to make use of the new
percpu_counter_init() routine, which now takes an additional parameter to
allow users to pass the initial value of the global counter.
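
The split between a 64 bit global count and 32 bit local deltas can be modelled in a single-threaded sketch like the one below. All names, the CPU count and the batch threshold are assumptions made for illustration; the real implementation uses per-CPU storage and locking that are omitted here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4 /* assumed CPU count for the model */
#define SKETCH_BATCH 32  /* assumed fold threshold */

struct sketch_percpu_counter {
    int64_t count;                 /* global counter, 64 bit */
    int32_t local[SKETCH_NR_CPUS]; /* per-CPU deltas, 32 bit */
};

/* The initial value of the global counter is passed at init time. */
static void sketch_counter_init(struct sketch_percpu_counter *c,
                                int64_t initial)
{
    assert(c != NULL);
    c->count = initial;
    for (int i = 0; i < SKETCH_NR_CPUS; i++)
        c->local[i] = 0;
}

/* Most updates touch only the 32 bit local delta. */
static void sketch_counter_mod(struct sketch_percpu_counter *c, int cpu,
                               int32_t amount)
{
    assert(c != NULL && cpu >= 0 && cpu < SKETCH_NR_CPUS);

    int64_t sum = (int64_t)c->local[cpu] + amount;

    if (sum >= SKETCH_BATCH || sum <= -SKETCH_BATCH) {
        c->count += sum; /* rare 64 bit update: fold the delta in */
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = (int32_t)sum;
    }
}

/* Exact value: the global count plus every outstanding local delta. */
static int64_t sketch_counter_sum(const struct sketch_percpu_counter *c)
{
    assert(c != NULL);

    int64_t total = c->count;

    for (int i = 0; i < SKETCH_NR_CPUS; i++)
        total += c->local[i];
    return total;
}
```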

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-06-23 22:43:06 +0800
e6022603b [PATCH] ext3_clear_inode(): avoid kfree(NULL)

Steven Rostedt points out that `rsv' here is usually
NULL, so we should avoid calling kfree().

Also, fix up some nearby whitespace damage.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-06-23 22:43:05 +0800
726c33422 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry

Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800
454e2398b [PATCH] VFS: Permit filesystem to override root dentry on mount

Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
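
The default behaviour described for simple_set_mnt() can be mocked up in user space as shown below. The `mock_` types and function are stand-ins invented for this sketch; the real kernel structures carry many more fields, and reference counting is left out entirely:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only. */
struct mock_dentry { int id; };
struct mock_super_block { struct mock_dentry *s_root; };
struct mock_vfsmount {
    struct mock_super_block *mnt_sb;
    struct mock_dentry *mnt_root;
};

/*
 * Mock of the default case: record the superblock in the mount and root
 * the mount at the superblock's s_root. It always returns 0, so a
 * get_sb()-style function returning int could tail-call it.
 */
static int mock_simple_set_mnt(struct mock_vfsmount *mnt,
                               struct mock_super_block *sb)
{
    assert(mnt != NULL && sb != NULL);

    mnt->mnt_sb = sb;
    mnt->mnt_root = sb->s_root;
    return 0;
}
```

A filesystem that shares one superblock among several mount points would skip this helper and fill in the two fields itself, pointing the root at a different dentry for each mount.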

The patch also makes the following changes:

(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.

(*) If one of the convenience functions is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().

(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().

This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.

However, with the way NFS superblock sharing is currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.

[*] Anonymous until discovered from another tree.

(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Cc: Roland Dreier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800

21 Jun, 2006

1 commit

2edc322d4 Merge git://git.infradead.org/~dwmw2/rbtree-2.6

* git://git.infradead.org/~dwmw2/rbtree-2.6:
[RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency
Update UML kernel/physmem.c to use rb_parent() accessor macro
[RBTREE] Update hrtimers to use rb_parent() accessor macro.
[RBTREE] Add explicit alignment to sizeof(long) for struct rb_node.
[RBTREE] Merge colour and parent fields of struct rb_node.
[RBTREE] Remove dead code in rb_erase()
[RBTREE] Update JFFS2 to use rb_parent() accessor macro.
[RBTREE] Update eventpoll.c to use rb_parent() accessor macro.
[RBTREE] Update key.c to use rb_parent() accessor macro.
[RBTREE] Update ext3 to use rb_parent() accessor macro.
[RBTREE] Change rbtree off-tree marking in I/O schedulers.
[RBTREE] Add accessor macros for colour and parent fields of rb_node
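
Several entries in this list concern merging the colour and parent fields of a node and reading them through accessors. A minimal sketch of that idea follows, assuming nodes are aligned so that the low bit of a node pointer is always zero. The type and function names are hypothetical, and the alignment attribute is a GCC/Clang extension:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical node layout: the parent pointer and a one-bit colour share
 * a single word. The explicit alignment guarantees the low bit of any
 * node address is free to hold the colour.
 */
struct sketch_rb_node {
    uintptr_t parent_color;
    struct sketch_rb_node *left;
    struct sketch_rb_node *right;
} __attribute__((aligned(sizeof(long))));

/* Accessor: mask off the colour bit to recover the parent pointer. */
static struct sketch_rb_node *sketch_rb_parent(const struct sketch_rb_node *n)
{
    assert(n != NULL);
    return (struct sketch_rb_node *)(n->parent_color & ~(uintptr_t)1);
}

/* Accessor: the colour lives in bit 0. */
static int sketch_rb_color(const struct sketch_rb_node *n)
{
    assert(n != NULL);
    return (int)(n->parent_color & 1);
}

static void sketch_rb_set(struct sketch_rb_node *n,
                          struct sketch_rb_node *parent, int color)
{
    assert(n != NULL);
    assert(((uintptr_t)parent & 1) == 0); /* relies on the alignment */
    assert(color == 0 || color == 1);

    n->parent_color = (uintptr_t)parent | (uintptr_t)color;
}
```

Once callers go through accessors like these, the field layout can change without touching every user, which is why the per-subsystem conversions to rb_parent() are listed alongside the layout change.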

Linus Torvalds
2006-06-21 05:51:22 +0800

01 Jun, 2006

1 commit

6855a3a6c [PATCH] ext3 resize: fix double unlock_super()

From: Andrew Morton

Spotted by Jan Capek

Cc: "Stephen C. Tweedie"
Cc: Andreas Dilger
Cc: Jan Capek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-06-01 07:27:10 +0800

04 May, 2006

1 commit

5dea5176e [PATCH] ext3: multile block allocate little endian fixes

Some places in the ext3 multiple block allocation code (in 2.6.17-rc3) don't
handle little-endian values well. This resulted in *wrong* block numbers
being assigned to in-memory block variables and then eventually stored on
disk. The following patch has been verified to fix an ext3
filesystem failure when running the ltp test on a 64 bit machine.

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-05-04 11:05:41 +0800

26 Apr, 2006

2 commits

a090d9132 [PATCH] protect ext3 ioctl modifying append_only, immutable, etc. with i_mutex

All modifications of ->i_flags in inodes that might be visible to
somebody else must be under ->i_mutex. That patch fixes ext3 ioctl()
setting S_APPEND and friends.

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2006-04-26 22:52:21 +0800
de0bb97af [PATCH] forgotten ->b_data in memcpy() call in ext3/resize.c (oopsable)

sbi->s_group_desc is an array of pointers to buffer_head. memcpy() of
buffer size from address of buffer_head is a bad idea - it will generate
junk in any case, may oops if buffer_head is close to the end of slab
page and next page is not mapped and isn't what was intended there.
IOW, ->b_data is missing in that call. Fortunately, result doesn't go
into the primary on-disk data structures, so only backup ones get crap
written to them; that had allowed this bug to remain unnoticed until
now.
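
The class of bug described here can be shown with a stand-in structure. The struct below is a simplified, hypothetical model with only the two fields needed for the example; it is not the kernel's buffer_head:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in: a descriptor that points at the real block data. */
struct sketch_buffer_head {
    char *b_data;  /* the block contents */
    size_t b_size; /* size of the block in bytes */
};

/*
 * Copy one block out of a table of descriptor pointers.
 *
 * The buggy form would be memcpy(dst, bh, bh->b_size): that copies the
 * descriptor itself and then reads past its end, producing junk and
 * possibly faulting. The data lives behind ->b_data.
 */
static void sketch_copy_block(char *dst,
                              struct sketch_buffer_head *const *table,
                              size_t index)
{
    assert(dst != NULL && table != NULL && table[index] != NULL);

    const struct sketch_buffer_head *bh = table[index];

    memcpy(dst, bh->b_data, bh->b_size);
}
```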

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2006-04-26 22:52:21 +0800

21 Apr, 2006

1 commit

52b5108ca [RBTREE] Update ext3 to use rb_parent() accessor macro.

Signed-off-by: David Woodhouse

David Woodhouse
2006-04-21 20:15:57 +0800

18 Apr, 2006

1 commit

75616cf98 [PATCH] ext3: Fix missed mutex unlock

The missed unlock_super() call is added in the error condition code path.

Signed-off-by: Leonid Ananiev
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

Ananiev, Leonid I
2006-04-18 05:24:57 +0800

11 Apr, 2006

1 commit

389ed39b9 [PATCH] ext3: Fix missed mutex unlock

The missed unlock_super() call is added in the error condition code path.

Signed-off-by: Leonid Ananiev
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ananiev, Leonid I
2006-04-11 21:18:46 +0800

31 Mar, 2006

1 commit

5274f052e [PATCH] Introduce sys_splice() system call

This adds support for the sys_splice system call. Using a pipe as a
transport, it can connect to files or sockets (latter as output only).

From the splice.c comments:

"splice": joining two ropes together by interweaving their strands.

This is the "extended pipe" functionality, where a pipe is used as
an arbitrary in-memory buffer. Think of a pipe as a small kernel
buffer that you can use to transfer data from one end to the other.

The traditional unix read/write is extended with a "splice()" operation
that transfers data buffers to or from a pipe buffer.

Named by Larry McVoy, original implementation from Linus, extended by
Jens to support splicing to files and fixing the initial implementation
bugs.

Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds

Jens Axboe
2006-03-31 04:28:18 +0800

29 Mar, 2006

1 commit

4b6f5d20b [PATCH] Make most file operations structs in fs/ const

This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups.

The goal is both to increase correctness (harder to accidentally write to
shared data structures) and to reduce the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read-only and thus
cache clean).
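
As a small illustration of the pattern, here is a constant table of function pointers in plain C. The struct and function names are invented for this sketch and only loosely resemble a file_operations table:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations table with a single entry. */
struct sketch_file_ops {
    long (*read)(char *buf, size_t len);
};

/* Example operation: fill the buffer with zero bytes. */
static long sketch_zero_read(char *buf, size_t len)
{
    assert(buf != NULL);

    for (size_t i = 0; i < len; i++)
        buf[i] = 0;
    return (long)len;
}

/*
 * Declared const, so the compiler can place the table in read-only data.
 * An accidental assignment such as `sketch_zero_ops.read = other;` would
 * be rejected at compile time.
 */
static const struct sketch_file_ops sketch_zero_ops = {
    .read = sketch_zero_read,
};
```

Callers still invoke the operation through the table exactly as before, for example `sketch_zero_ops.read(buf, len)`; only writes to the table are ruled out.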

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2006-03-29 01:16:06 +0800

27 Mar, 2006

10 commits

a0e928523 [PATCH] ext3: "nobh" writeback support for filesystems blocksize < pagesize

There is no valid reason why we can't support "nobh" option for filesystems
with blocksize != PAGESIZE.

This patch lets them use "nobh" option for writeback mode for blocksize <
pagesize.

Signed-off-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-03-27 00:57:02 +0800
f91a2ad2e [PATCH] ext3: multi-block get_block()

Mingming Cao recently added multi-block allocation support for ext3,
currently used only by DIO. I added support to map multiple blocks for
mpage_readpages(). This patch adds support for ext3_get_block() to deal
with multi-block mapping. Basically it renames ext3_direct_io_get_blocks()
as ext3_get_block().

Signed-off-by: Badari Pulavarty
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-03-27 00:57:02 +0800
d6859bfca [PATCH] ext3: cleanups and WARN_ON()

- Clean up a few little layout things and comments.

- Add a WARN_ON to a case which I was wondering about.

- Tune up some inlines.

Cc: Mingming Cao
Cc: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-03-27 00:57:02 +0800
1d8fa7a2b [PATCH] remove ->get_blocks() support

Now that get_block() can handle mapping multiple disk blocks, there is no need
to have ->get_blocks(). This patch removes the fs specific ->get_blocks() added
for DIO and makes its users use get_block() instead.

Signed-off-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-03-27 00:57:01 +0800
d48589bfa [PATCH] ext3_get_blocks: Adjust reservation window size for mblocks

Optimize the block reservation and the multiple block allocation: with the
knowledge of the total number of blocks ahead, set or adjust the reservation
window size properly (based on the number of blocks needed) before block
allocation happens. If there isn't any reservation yet, make sure the
reservation window is equal to or greater than the number of blocks needed
before creating a reservation window; if a reservation window already
exists, try to extend the window size to match the number of blocks to
allocate. This could increase the possibility of completing a multiple block
allocation in a single request, as blocks are only allocated in the range of
the inode's reservation window.
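
The sizing rule can be sketched as a simple "grow to at least what is needed" function. This is a hypothetical model that ignores any upper limit on the window size and whether the extension actually succeeds:

```c
#include <assert.h>

/*
 * Hypothetical sizing rule. `current` is the existing window size in
 * blocks (0 when there is no reservation yet) and `needed` is the number
 * of blocks about to be allocated. The window is never shrunk here.
 */
static unsigned long sketch_window_size(unsigned long current,
                                        unsigned long needed)
{
    assert(needed > 0);
    return current < needed ? needed : current;
}
```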

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-03-27 00:57:01 +0800
faa569763 [PATCH] ext3_get_blocks: Adjust accounting info in ext3_new_blocks()

Update accounting information (quota, boundary checks, free blocks number etc)
in ext3_new_blocks().

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-03-27 00:57:01 +0800
b54e41ec1 [PATCH] ext3_get_blocks: support multiple blocks allocation in ext3_new_block()

Change ext3_try_to_allocate() (called via ext3_new_blocks()) to try to
allocate the requested number of blocks on a best effort basis: after
allocating the first block, it will always attempt to allocate the next few (up
to the requested size and not beyond the reservation window) adjacent blocks
at the same time.
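
A minimal sketch of best-effort adjacent allocation over an in-use map, assuming the reservation window is given as an exclusive end index. The function and its parameters are illustrative and do not mirror the kernel's bitmap code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical best-effort allocator.
 * in_use:     one flag per block, true when the block is already taken
 * window_end: exclusive end of the reservation window
 * start:      first block to try
 * want:       number of blocks requested
 *
 * Returns how many adjacent blocks were taken starting at `start`. The
 * run stops at the first busy block, at the window end, or at `want`.
 */
static size_t sketch_alloc_run(bool *in_use, size_t window_end, size_t start,
                               size_t want)
{
    assert(in_use != NULL && start < window_end && want > 0);

    size_t got = 0;

    while (got < want && start + got < window_end && !in_use[start + got]) {
        in_use[start + got] = true;
        got++;
    }
    return got;
}
```

The caller gets back fewer blocks than requested when the run is cut short, and can issue another request for the remainder.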

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-03-27 00:57:01 +0800
b47b24781 [PATCH] ext3_get_blocks: multiple block allocation

Add support for multiple block allocation in ext3-get-blocks().

Look up the disk block mapping and count the total number of blocks to
allocate, then pass it to ext3_new_block(), where the real block allocation is
performed. Once multiple blocks are allocated, prepare the branch with the
info of the just-allocated blocks and finally splice the whole branch into the
block mapping tree.

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-03-27 00:57:01 +0800
89747d369 [PATCH] ext3_get_blocks: Mapping multiple blocks at a once

Currently ext3_get_block() only maps or allocates one block at a time. This
is quite inefficient for sequential IO workloads.

I have posted an early implementation of simple multiple block mapping and
allocation with the current ext3. The basic idea is allocating the 1st block in
the existing way, and attempting to allocate the next adjacent blocks on a best
effort basis. More description of the implementation can be found here:
http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2

The following is the latest version of the patch: it breaks the original patch
into 5 patches, reworks some logic, and fixes some bugs. The break-ups are:

[patch 1] Adding map multiple blocks at a time in ext3_get_blocks()
[patch 2] Extend ext3_get_blocks() to support multiple block allocation
[patch 3] Implement multiple block allocation in ext3-try-to-allocate
          (called via ext3_new_block()).
[patch 4] Proper accounting updates in ext3_new_blocks()
[patch 5] Adjust reservation window size properly (by the given number
          of blocks to allocate) before block allocation to increase the
          possibility of allocating multiple blocks in a single call.

Tests done so far include fsx, tiobench and dbench. The following numbers,
collected from Direct IO tests (1G file creation/read), show that the system
time has been greatly reduced (more than 50% on my 8 cpu system) with the
patches.

1G file DIO write:
        2.6.15      2.6.15+patches
real    0m31.275s   0m31.161s
user    0m0.000s    0m0.000s
sys     0m3.384s    0m0.564s

1G file DIO read:
        2.6.15      2.6.15+patches
real    0m30.733s   0m30.624s
user    0m0.000s    0m0.004s
sys     0m0.748s    0m0.380s

Some previous tests we did on buffered IO using multiple block allocation
and delayed allocation show noticeable improvements in throughput and system
time.

This patch:

Add support for mapping multiple blocks in one call.

This is useful for DIO reads and re-writes (where blocks are already
allocated), and is also in line with Christoph's proposal of using getblocks()
in mpage_readpage() or mpage_readpages().
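
Mapping several blocks in one call amounts to finding how long a physically contiguous run starts at a given logical block. The sketch below assumes a flat lookup array standing in for the real block mapping tree, with 0 meaning "hole"; all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical multi-block lookup.
 * phys[i] holds the physical block of logical block i, or 0 for a hole.
 * Starting at logical block `first`, count how many consecutive logical
 * blocks are also physically consecutive, up to `max_blocks`.
 *
 * Returns the number of blocks mapped (0 for a hole) and stores the
 * physical block of the first one in *first_phys.
 */
static size_t sketch_map_run(const uint32_t *phys, size_t nblocks,
                             size_t first, size_t max_blocks,
                             uint32_t *first_phys)
{
    assert(phys != NULL && first_phys != NULL);
    assert(first < nblocks && max_blocks > 0);

    *first_phys = phys[first];
    if (phys[first] == 0)
        return 0; /* hole: nothing is mapped */

    size_t count = 1;

    while (count < max_blocks && first + count < nblocks &&
           phys[first + count] == (uint32_t)(phys[first] + count))
        count++;
    return count;
}
```

A caller reading already-allocated blocks can then issue one request for the whole run instead of one lookup per block.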

Signed-off-by: Mingming Cao
Cc: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-03-27 00:57:00 +0800
2ff28e22b [PATCH] Make address_space_operations->invalidatepage return void

The return value of this function is never used, so let's be honest and
declare it as void.

In some places where invalidatepage returned 0, I have inserted comments
suggesting a BUG_ON.

[akpm@osdl.org: JBD BUG fix]
[akpm@osdl.org: rework for git-nfs]
[akpm@osdl.org: don't go BUG in block_invalidate_page()]
Signed-off-by: Neil Brown
Acked-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-27 00:56:55 +0800