Eric Lee / smarc-fsl-linux-kernel

09 Jan, 2009

5 commits

2150edc6c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
ext4: Remove "extents" mount option
block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
ext4: Make printk's consistently prefixed with "EXT4-fs: "
ext4: Add sanity checks for the superblock before mounting the filesystem
ext4: Add mount option to set kjournald's I/O priority
jbd2: Submit writes to the journal using WRITE_SYNC
jbd2: Add pid and journal device name to the "kjournald2 starting" message
ext4: Add markers for better debuggability
ext4: Remove code to create the journal inode
ext4: provide function to release metadata pages under memory pressure
ext3: provide function to release metadata pages under memory pressure
add releasepage hooks to block devices which can be used by file systems
ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
ext4: Init the complete page while building buddy cache
ext4: Don't allow new groups to be added during block allocation
ext4: mark the blocks/inode bitmap beyond end of group as used
ext4: Use new buffer_head flag to check uninit group bitmaps initialization
ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
ext4: code cleanup
...

Linus Torvalds
2009-01-09 09:14:59 +0800
be857df1d generic swap(): ext3: remove local swap() macro ... Browse Code »

Use the new generic implementation.

Signed-off-by: Wu Fengguang
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-01-09 00:31:15 +0800
04143e2fb ext3: tighten restrictions on inode flags ... Browse Code »

At the moment there are few restrictions on which flags may be set on
which inodes. Specifically DIRSYNC may only be set on directories and
IMMUTABLE and APPEND may not be set on links. Tighten that to disallow
TOPDIR being set on non-directories and only NODUMP and NOATIME to be set
on non-regular file, non-directories.

Introduces a flags masking function which masks flags based on mode and
use it during inode creation and when flags are set via the ioctl to
facilitate future consistency.

Signed-off-by: Duane Griffin
Acked-by: Andreas Dilger
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Duane Griffin
2009-01-09 00:31:01 +0800
2e8671cb5 ext3: don't inherit inappropriate inode flags from parent ... Browse Code »

At present INDEX is the only flag that new ext3 inodes do NOT inherit from
their parent. In addition prevent the flags DIRTY, ECOMPR, IMAGIC and
TOPDIR from being inherited. List inheritable flags explicitly to prevent
future flags from accidentally being inherited.

This fixes the TOPDIR flag inheritance bug reported at
http://bugzilla.kernel.org/show_bug.cgi?id=9866.

Signed-off-by: Duane Griffin
Acked-by: Andreas Dilger
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Duane Griffin
2009-01-09 00:31:01 +0800
5df096d67 ext3: allocate ->s_blockgroup_lock separately ... Browse Code »

As spotted by kmemtrace, struct ext3_sb_info is 17152 bytes on 64-bit
which makes it a very bad fit for SLAB allocators. The culprit of the
wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when
NR_CPUS >= 32.

To fix that, allocate ->s_blockgroup_lock, which fits nicely in a order 2
page in the worst case, separately. This shinks down struct ext3_sb_info
enough to fit a 1 KB slab cache so now we allocate 16 KB + 1 KB instead of
32 KB saving 15 KB of memory.

Acked-by: Andreas Dilger
Signed-off-by: Pekka Enberg
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pekka Enberg
2009-01-09 00:31:00 +0800

06 Jan, 2009

3 commits

157091a2c ext3: Add default allocation routines for quota structures ... Browse Code »

Signed-off-by: Jan Kara
Signed-off-by: Mark Fasheh

Jan Kara
2009-01-06 00:40:25 +0800
ee0d5ffe0 ext3: Use sb_any_quota_loaded() instead of sb_any_quota_enabled() ... Browse Code »

Signed-off-by: Jan Kara
Signed-off-by: Mark Fasheh

Jan Kara
2009-01-06 00:36:56 +0800
6b082b531 ext3: provide function to release metadata pages under memory pressure ... Browse Code »

Pages in the page cache belonging to ext3 data files are released via
the ext3_releasepage() function specified in the ext3 inode's
address_space_ops. However, metadata blocks (such as indirect blocks,
directory blocks, etc) are managed via the block device
address_space_ops, and they can not be released by
try_to_free_buffers() if they have a journal head attached to them.

To address this, we supply a try_to_free_pages() function which calls
journal_try_to_free_buffers() function to free the metadata, and which
is called by the block device's blkdev_releasepage() function.

Signed-off-by: Toshiyuki Okajima
Signed-off-by: "Theodore Ts'o"
Cc: linux-fsdevel@vger.kernel.org

Toshiyuki Okajima
2009-01-06 11:38:14 +0800

05 Jan, 2009

1 commit

54566b2c1 fs: symlink write_begin allocation context fix ... Browse Code »

With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened. They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim. This bug could
cause filesystem deadlocks.

The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock. The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.

Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
this flag in their write_begin function. Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).

This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
random example).

[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin
Reviewed-by: KOSAKI Motohiro
Cc: [2.6.28.x]
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
[ Cleaned up the calling convention: just pass in the AOP flags
untouched to the grab_cache_page_write_begin() function. That
just simplifies everybody, and may even allow future expansion of the
logic. - Linus ]
Signed-off-by: Linus Torvalds

Nick Piggin
2009-01-05 05:33:20 +0800

01 Jan, 2009

2 commits

c38012daa nfsd race fixes: ext3 ... Browse Code »

ext3 analog of the previous patch

Signed-off-by: Al Viro

Al Viro
2009-01-01 07:07:44 +0800
b5ed3112b ext3: ensure fast symlinks are NUL-terminated ... Browse Code »

Ensure fast symlink targets are NUL-terminated, even if corrupted
on-disk.

Cc: Andrew Morton
Cc: Stephen Tweedie
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Duane Griffin
Signed-off-by: Al Viro

Duane Griffin
2009-01-01 07:07:39 +0800

07 Dec, 2008

1 commit

59e315b4c ext3/4: Fix loop index in do_split() so it is signed ... Browse Code »

This fixes a gcc warning but it doesn't appear able to result in a
failure, since the primary way the loop is exited is the first
conditional in the for loop, and at least for a consistent filesystem,
the signed/unsigned should in practice never be exposed.

Signed-off-by: Roel Kluin
Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2008-12-07 05:58:39 +0800

14 Nov, 2008

2 commits

2b8289256 Merge branch 'master' into next ... Browse Code »

Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris

James Morris
2008-11-14 08:29:12 +0800
6a2f90e9f CRED: Wrap task credential accesses in the Ext3 filesystem ... Browse Code »

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: Stephen Tweedie
Cc: Andrew Morton
Cc: adilger@sun.com
Cc: linux-ext4@vger.kernel.org
Signed-off-by: James Morris

David Howells
2008-11-14 07:38:51 +0800

13 Nov, 2008

1 commit

6cdfcc275 ext3: Clean up outdated and incorrect comment for ext3_write_super() ... Browse Code »

Signed-off-by: "Theodore Ts'o"
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Theodore Tso
2008-11-13 09:17:17 +0800

07 Nov, 2008

1 commit

c87591b71 ext3: wait on all pending commits in ext3_sync_fs ... Browse Code »

In ext3_sync_fs, we only wait for a commit to finish if we started it, but
there may be one already in progress which will not be synced.

In the case of a data=ordered umount with pending long symlinks which are
delayed due to a long list of other I/O on the backing block device, this
causes the buffer associated with the long symlinks to not be moved to the
inode dirty list in the second phase of fsync_super. Then, before they
can be dirtied again, kjournald exits, seeing the UMOUNT flag and the
dirty pages are never written to the backing block device, causing long
symlink corruption and exposing new or previously freed block data to
userspace.

This can be reproduced with a script created
by Eric Sandeen :

#!/bin/bash

umount /mnt/test2
mount /dev/sdb4 /mnt/test2
rm -f /mnt/test2/*
dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
touch
/mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
ln -s
/mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
/mnt/test2/link
umount /mnt/test2
mount /dev/sdb4 /mnt/test2
ls /mnt/test2/
umount /mnt/test2

To ensure all commits are synced, we flush all journal commits now when
sync_fs'ing ext3.

Signed-off-by: Arthur Jones
Cc: Eric Sandeen
Cc: Theodore Ts'o
Cc:
Cc: [2.6.everything]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arthur Jones
2008-11-07 07:41:19 +0800

29 Oct, 2008

1 commit

5e1f8c9e2 ext3: Add support for non-native signed/unsigned htree hash algorithms ... Browse Code »

The original ext3 hash algorithms assumed that variables of type char
were signed, as God and K&R intended. Unfortunately, this assumption
is not true on some architectures. Userspace support for marking
filesystems with non-native signed/unsigned chars was added two years
ago, but the kernel-side support was never added (until now).

Signed-off-by: "Theodore Ts'o"
Cc: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org

Theodore Ts'o
2008-10-29 01:21:55 +0800

28 Oct, 2008

1 commit

44d6f7875 ext3: fix a bug accessing freed memory in ext3_abort ... Browse Code »

Vegard Nossum reported a bug which accesses freed memory (found via
kmemcheck). When journal has been aborted, ext3_put_super() calls
ext3_abort() after freeing the journal_t object, and then ext3_abort()
accesses it. This patch fix it.

Signed-off-by: Hidehiro Kawai
Acked-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Hidehiro Kawai
2008-10-28 10:51:46 +0800

26 Oct, 2008

1 commit

8c9fa93d5 ext3: Fix duplicate entries returned from getdents() system call ... Browse Code »

Fix a regression caused by commit 6a897cf4, "ext3: fix ext3_dx_readdir
hash collision handling", where deleting files in a large directory
(requiring more than one getdents system call), results in some
filenames being returned twice. This was caused by a failure to
update info->curr_hash and info->curr_minor_hash, so that if the
directory had gotten modified since the last getdents() system call
(as would be the case if the user is running "rm -r" or "git clean"),
a directory entry would get returned twice to the userspace.

This patch fixes the bug reported by Markus Trippelsdorf at:
http://bugzilla.kernel.org/show_bug.cgi?id=11844

Signed-off-by: "Theodore Ts'o"
Tested-by: Markus Trippelsdorf

Theodore Ts'o
2008-10-26 10:37:44 +0800

24 Oct, 2008

3 commits

12e1ec9ff ext3 quota support: fix compile failure ... Browse Code »

This one was due to a merge error: we added a use of nd.path in commit
2d7c820e56ce83b23daee9eb5343730fb309418e ("ext3: add checks for errors
from jbd"), and concurrently we got rid of 'nd' and used a naked 'path'
in commit 8264613def2e5c4f12bc3167713090fd172e6055 ("[PATCH] switch
quota_on-related stuff to kern_path()").

That all merged cleanly, but it didn't actually _work_. This should fix
it.

Signed-off-by: Linus Torvalds

Linus Torvalds
2008-10-24 02:48:56 +0800
224848564 Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
[PATCH] kill the rest of struct file propagation in block ioctls
[PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
[PATCH] get rid of blkdev_locked_ioctl()
[PATCH] get rid of blkdev_driver_ioctl()
[PATCH] sanitize blkdev_get() and friends
[PATCH] remember mode of reiserfs journal
[PATCH] propagate mode through swsusp_close()
[PATCH] propagate mode through open_bdev_excl/close_bdev_excl
[PATCH] pass fmode_t to blkdev_put()
[PATCH] kill the unused bsize on the send side of /dev/loop
[PATCH] trim file propagation in block/compat_ioctl.c
[PATCH] end of methods switch: remove the old ones
[PATCH] switch sr
[PATCH] switch sd
[PATCH] switch ide-scsi
[PATCH] switch tape_block
[PATCH] switch dcssblk
[PATCH] switch dasd
[PATCH] switch mtd_blkdevs
[PATCH] switch mmc
...

Linus Torvalds
2008-10-24 01:23:07 +0800
5ed487bc2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (46 commits)
[PATCH] fs: add a sanity check in d_free
[PATCH] i_version: remount support
[patch] vfs: make security_inode_setattr() calling consistent
[patch 1/3] FS_MBCACHE: don't needlessly make it built-in
[PATCH] move executable checking into ->permission()
[PATCH] fs/dcache.c: update comment of d_validate()
[RFC PATCH] touch_mnt_namespace when the mount flags change
[PATCH] reiserfs: add missing llseek method
[PATCH] fix ->llseek for more directories
[PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent
[PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup
[PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate()
[PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helper
[PATCH vfs-2.6 2/6] vfs: add d_ancestor()
[PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()
[PATCH] get rid of on-stack dentry in udf
[PATCH 2/2] anondev: switch to IDA
[PATCH 1/2] anondev: init IDR statically
[JFFS2] Use d_splice_alias() not d_add() in jffs2_lookup()
[PATCH] Optimise NFS readdir hack slightly.
...

Linus Torvalds
2008-10-24 01:22:40 +0800

23 Oct, 2008

4 commits

2d7c820e5 ext3: add checks for errors from jbd ... Browse Code »

If the journal has aborted due to a checkpointing failure, we have to
keep the contents of the journal space. Otherwise, the filesystem will
lose uncheckpointed metadata completely and become inconsistent. To
avoid this, we need to keep needs_recovery flag if checkpoint has
failed.

With this patch, ext3_put_super() detects a checkpointing failure from
the return value of journal_destroy(), then it invokes ext3_abort() to
make the filesystem read only and keep needs_recovery flag. Errors
from journal_flush() are also handled by this patch in some places.

Signed-off-by: Hidehiro Kawai
Cc: Jan Kara
Cc: Stephen Rothwell
Cc: Al Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hidehiro Kawai
2008-10-23 23:55:01 +0800
734711aba [PATCH] get rid of on-stack fake dentry in ext3_get_parent() ... Browse Code »

Better pass parent and qstr to ext3_find_entry() explicitly than
use such kludges, especially since the stack footprint is nasty
enough and we have every chance to be deep in call chain.

Signed-off-by: Al Viro

Al Viro
2008-10-23 17:13:08 +0800
440037287 [PATCH] switch all filesystems over to d_obtain_alias ... Browse Code »

Switch all users of d_alloc_anon to d_obtain_alias.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2008-10-23 17:13:01 +0800
8264613de [PATCH] switch quota_on-related stuff to kern_path() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2008-10-23 17:12:44 +0800

21 Oct, 2008

2 commits

9a1c35427 [PATCH] pass fmode_t to blkdev_put() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2008-10-21 19:48:58 +0800
6da0b38f4 fs/Kconfig: move ext2, ext3, ext4, JBD, JBD2 out ... Browse Code »

Use fs/*/Kconfig more, which is good because everything related to one
filesystem is in one place and fs/Kconfig is quite fat.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2008-10-21 02:43:59 +0800

20 Oct, 2008

6 commits

cdbf6dba2 ext3: avoid printk floods in the face of directory corruption ... Browse Code »

A very large directory with many read failures (either due to storage
problems, or due to invalid size & blocks from corruption) will generate a
printk storm as the filesystem continues to try to read all the blocks.
This flood of messages can tie up the box until it is complete - which may
be a very long time, especially for very large corrupted values.

This is fixed by only reporting the corruption once each time we try to
read the directory.

Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"
Cc: Eugene Teo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Sandeen
2008-10-20 23:52:38 +0800
5ec8b75e3 ext3: truncate block allocated on a failed ext3_write_begin ... Browse Code »

For blocksize < pagesize we need to remove blocks that got allocated in
block_write_begin() if we fail with ENOSPC for later blocks.
block_write_begin() internally does this if it allocated page locally.
This makes sure we don't have blocks outside inode.i_size during ENOSPC.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Aneesh Kumar K.V
2008-10-20 23:52:38 +0800
6a897cf44 ext3: fix ext3_dx_readdir hash collision handling ... Browse Code »

This fixes a bug where readdir() would return a directory entry twice
if there was a hash collision in an hash tree indexed directory.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eugene Dashevsky
Signed-off-by: Mike Snitzer
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eugene Dashevsky
2008-10-20 23:52:38 +0800
0e4fb5e28 ext3: add an option to control error handling on file data ... Browse Code »

If the journal doesn't abort when it gets an IO error in file data blocks,
the file data corruption will spread silently. Because most of
applications and commands do buffered writes without fsync(), they don't
notice the IO error. It's scary for mission critical systems. On the
other hand, if the journal aborts whenever it gets an IO error in file
data blocks, the system will easily become inoperable. So this patch
introduces a filesystem option to determine whether it aborts the journal
or just call printk() when it gets an IO error in file data.

If you mount a ext3 fs with data_err=abort option, it aborts on file data
write error. If you mount it with data_err=ignore, it doesn't abort, just
call printk(). data_err=ignore is the default.

Signed-off-by: Hidehiro Kawai
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hidehiro Kawai
2008-10-20 23:52:37 +0800
46d01a225 ext3: fix ext3 block reservation early ENOSPC issue ... Browse Code »

We could run into ENOSPC error on ext3, even when there is free blocks on
the filesystem.

The problem is triggered in the case the goal block group has 0 free
blocks , and the rest block groups are skipped due to the check of
"free_blocks < windowsz/2". Current code could fall back to non
reservation allocation to prevent early ENOSPC after examing all the block
groups with reservation on , but this code was bypassed if the reservation
window is turned off already, which is true in this case.

This patch fixed two issues:
1) We don't need to turn off block reservation if the goal block group has
0 free blocks left and continue search for the rest of block groups.

Current code the intention is to turn off the block reservation if the
goal allocation group has a few (some) free blocks left (not enough for
make the desired reservation window),to try to allocation in the goal
block group, to get better locality. But if the goal blocks have 0 free
blocks, it should leave the block reservation on, and continues search for
the next block groups,rather than turn off block reservation completely.

2) we don't need to check the window size if the block reservation is off.

The problem was originally found and fixed in ext4.

Signed-off-by: Mingming Cao
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2008-10-20 23:52:37 +0800
972fbf779 ext3: don't try to resize if there are no reserved gdt blocks left ... Browse Code »

When trying to resize a ext3 fs and you run out of reserved gdt blocks,
you get an error that doesn't actually tell you what went wrong, it just
says that the gdb it picked is not correct, which is the case since you
don't have any reserved gdt blocks left. This patch adds a check to make
sure you have reserved gdt blocks to use, and if not prints out a more
relevant error.

Signed-off-by: Josef Bacik
Cc:
Cc: Andreas Dilger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef Bacik
2008-10-20 23:52:37 +0800

14 Oct, 2008

1 commit

a447c0932 vfs: Use const for kernel parser table ... Browse Code »

This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.

This was posted for review some time ago and I believe its been in -mm
since then.

Signed-off-by: Steven Whitehouse
Cc: Alexander Viro
Signed-off-by: Linus Torvalds

Steven Whitehouse
2008-10-14 01:10:37 +0800

04 Oct, 2008

1 commit

68c9d702b generic block based fiemap implementation ... Browse Code »

Any block based fs (this patch includes ext3) just has to declare its own
fiemap() function and then call this generic function with its own
get_block_t. This works well for block based filesystems that will map
multiple contiguous blocks at one time, but will work for filesystems that
only map one block at a time, you will just end up with an "extent" for each
block. One gotcha is this will not play nicely where there is hole+data
after the EOF. This function will assume its hit the end of the data as soon
as it hits a hole after the EOF, so if there is any data past that it will
not pick that up. AFAIK no block based fs does this anyway, but its in the
comments of the function anyway just in case.

Signed-off-by: Josef Bacik
Signed-off-by: Mark Fasheh
Signed-off-by: "Theodore Ts'o"
Cc: linux-fsdevel@vger.kernel.org

Josef Bacik
2008-10-04 05:32:43 +0800

01 Aug, 2008

1 commit

77e69dac3 [PATCH] fix races and leaks in vfs_quota_on() users ... Browse Code »

* new helper: vfs_quota_on_path(); equivalent of vfs_quota_on() sans the
pathname resolution.
* callers of vfs_quota_on() that do their own pathname resolution and
checks based on it are switched to vfs_quota_on_path(); that way we
avoid the races.
* reiserfs leaked dentry/vfsmount references on several failure exits.

Signed-off-by: Al Viro

Al Viro
2008-08-01 23:25:25 +0800

29 Jul, 2008

1 commit

8ab22b9ab vfs: pagecache usage optimization for pagesize!=blocksize ... Browse Code »

When we read some part of a file through pagecache, if there is a
pagecache of corresponding index but this page is not uptodate, read IO
is issued and this page will be uptodate.

I think this is good for pagesize == blocksize environment but there is
room for improvement on pagesize != blocksize environment. Because in
this case a page can have multiple buffers and even if a page is not
uptodate, some buffers can be uptodate.

So I suggest that when all buffers which correspond to a part of a file
that we want to read are uptodate, use this pagecache and copy data from
this pagecache to user buffer even if a page is not uptodate. This can
reduce read IO and improve system throughput.

I wrote a benchmark program and got result number with this program.

This benchmark do:

1: mount and open a test file.

2: create a 512MB file.

3: close a file and umount.

4: mount and again open a test file.

5: pwrite randomly 300000 times on a test file. offset is aligned
by IO size(1024bytes).

6: measure time of preading randomly 100000 times on a test file.

The result was:
2.6.26
330 sec

2.6.26-patched
226 sec

Arch:i386
Filesystem:ext3
Blocksize:1024 bytes
Memory: 1GB

On ext3/4, a file is written through buffer/block. So random read/write
mixed workloads or random read after random write workloads are optimized
with this patch under pagesize != blocksize environment. This test result
showed this.

The benchmark program is as follows:

#include
#include
#include
#include
#include
#include
#include
#include
#include

#define LEN 1024
#define LOOP 1024*512 /* 512MB */

main(void)
{
unsigned long i, offset, filesize;
int fd;
char buf[LEN];
time_t t1, t2;

if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
perror("cannot mount\n");
exit(1);
}
memset(buf, 0, LEN);
fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC);
if (fd < 0) {
perror("cannot open file\n");
exit(1);
}
for (i = 0; i < LOOP; i++)
write(fd, buf, LEN);
close(fd);
if (umount("/root/test1/") < 0) {
perror("cannot umount\n");
exit(1);
}
if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
perror("cannot mount\n");
exit(1);
}
fd = open("/root/test1/testfile", O_RDWR);
if (fd < 0) {
perror("cannot open file\n");
exit(1);
}

filesize = LEN * LOOP;
for (i = 0; i < 300000; i++){
offset = (random() % filesize) & (~(LEN - 1));
pwrite(fd, buf, LEN, offset);
}
printf("start test\n");
time(&t1);
for (i = 0; i < 100000; i++){
offset = (random() % filesize) & (~(LEN - 1));
pread(fd, buf, LEN, offset);
}
time(&t2);
printf("%ld sec\n", t2-t1);
close(fd);
if (umount("/root/test1/") < 0) {
perror("cannot umount\n");
exit(1);
}
}

Signed-off-by: Hisashi Hifumi
Cc: Nick Piggin
Cc: Christoph Hellwig
Cc: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hisashi Hifumi
2008-07-29 07:30:21 +0800

27 Jul, 2008

2 commits

e6305c43e [PATCH] sanitize ->permission() prototype ... Browse Code »

* kill nameidata * argument; map the 3 bits in ->flags anybody cares
about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
MAY_... found in mask.

The obvious next target in that direction is permission(9)

folded fix for nfs_permission() breakage from Miklos Szeredi

Signed-off-by: Al Viro

Al Viro
2008-07-27 08:53:14 +0800
51cc50685 SL*B: drop kmem cache argument from constructor ... Browse Code »

Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan
Acked-by: Pekka Enberg
Acked-by: Christoph Lameter
Cc: Jon Tollefson
Cc: Nick Piggin
Cc: Matt Mackall
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2008-07-27 03:00:07 +0800