04 Oct, 2008
1 commit
-
Any block based fs (this patch includes ext3) just has to declare its own
fiemap() function and then call this generic function with its own
get_block_t. This works well for block based filesystems that will map
multiple contiguous blocks at one time, but will work for filesystems that
only map one block at a time, you will just end up with an "extent" for each
block. One gotcha is this will not play nicely where there is hole+data
after the EOF. This function will assume its hit the end of the data as soon
as it hits a hole after the EOF, so if there is any data past that it will
not pick that up. AFAIK no block based fs does this anyway, but its in the
comments of the function anyway just in case.Signed-off-by: Josef Bacik
Signed-off-by: Mark Fasheh
Signed-off-by: "Theodore Ts'o"
Cc: linux-fsdevel@vger.kernel.org
29 Jul, 2008
1 commit
-
When we read some part of a file through pagecache, if there is a
pagecache of corresponding index but this page is not uptodate, read IO
is issued and this page will be uptodate.I think this is good for pagesize == blocksize environment but there is
room for improvement on pagesize != blocksize environment. Because in
this case a page can have multiple buffers and even if a page is not
uptodate, some buffers can be uptodate.So I suggest that when all buffers which correspond to a part of a file
that we want to read are uptodate, use this pagecache and copy data from
this pagecache to user buffer even if a page is not uptodate. This can
reduce read IO and improve system throughput.I wrote a benchmark program and got result number with this program.
This benchmark do:
1: mount and open a test file.
2: create a 512MB file.
3: close a file and umount.
4: mount and again open a test file.
5: pwrite randomly 300000 times on a test file. offset is aligned
by IO size(1024bytes).6: measure time of preading randomly 100000 times on a test file.
The result was:
2.6.26
330 sec2.6.26-patched
226 secArch:i386
Filesystem:ext3
Blocksize:1024 bytes
Memory: 1GBOn ext3/4, a file is written through buffer/block. So random read/write
mixed workloads or random read after random write workloads are optimized
with this patch under pagesize != blocksize environment. This test result
showed this.The benchmark program is as follows:
#include
#include
#include
#include
#include
#include
#include
#include
#include#define LEN 1024
#define LOOP 1024*512 /* 512MB */main(void)
{
unsigned long i, offset, filesize;
int fd;
char buf[LEN];
time_t t1, t2;if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
perror("cannot mount\n");
exit(1);
}
memset(buf, 0, LEN);
fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC);
if (fd < 0) {
perror("cannot open file\n");
exit(1);
}
for (i = 0; i < LOOP; i++)
write(fd, buf, LEN);
close(fd);
if (umount("/root/test1/") < 0) {
perror("cannot umount\n");
exit(1);
}
if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
perror("cannot mount\n");
exit(1);
}
fd = open("/root/test1/testfile", O_RDWR);
if (fd < 0) {
perror("cannot open file\n");
exit(1);
}filesize = LEN * LOOP;
for (i = 0; i < 300000; i++){
offset = (random() % filesize) & (~(LEN - 1));
pwrite(fd, buf, LEN, offset);
}
printf("start test\n");
time(&t1);
for (i = 0; i < 100000; i++){
offset = (random() % filesize) & (~(LEN - 1));
pread(fd, buf, LEN, offset);
}
time(&t2);
printf("%ld sec\n", t2-t1);
close(fd);
if (umount("/root/test1/") < 0) {
perror("cannot umount\n");
exit(1);
}
}Signed-off-by: Hisashi Hifumi
Cc: Nick Piggin
Cc: Christoph Hellwig
Cc: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Jul, 2008
2 commits
-
* kill nameidata * argument; map the 3 bits in ->flags anybody cares
about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
MAY_... found in mask.The obvious next target in that direction is permission(9)
folded fix for nfs_permission() breakage from Miklos Szeredi
Signed-off-by: Al Viro
-
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.cThis is flag day, yes.
Signed-off-by: Alexey Dobriyan
Acked-by: Pekka Enberg
Acked-by: Christoph Lameter
Cc: Jon Tollefson
Cc: Nick Piggin
Cc: Matt Mackall
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Jul, 2008
2 commits
-
Move declarations of some macros, which should be in fact functions to
quotaops.h. This way they can be later converted to inline functions
because we can now use declarations from quota.h. Also add necessary
includes of quotaops.h to a few files.[akpm@linux-foundation.org: fix JFS build]
[akpm@linux-foundation.org: fix UFS build]
[vegard.nossum@gmail.com: fix QUOTA=n build]
Signed-off-by: Jan Kara
Cc: Vegard Nossum
Cc: Arjen Pool
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
remove the definitions of macros:
XATTR_TRUSTED_PREFIX
XATTR_USER_PREFIX
since they are defined in linux/xattr.hSigned-off-by: Shen Feng
Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Apr, 2008
10 commits
-
If the block allocator gets blocks out of system zone ext2 calls ext2_error.
But if the file system is mounted with errors=continue retry block allocation.
We need to mark the system zone blocks as in use to make sure retry don't
pick them againSystem zone is the block range mapping block bitmap, inode bitmap and inode
table.[akpm@linux-foundation.org: fix typo in comment]
Signed-off-by: Aneesh Kumar K.V
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)//
@ disable unlikely @ expression E,f; @@(
if () { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)@@ expression E,f; @@
(
if () { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
//Signed-off-by: Julia Lawall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use ext2_fsblk_t type for filesystem-wide blocks number
Signed-off-by: Akinobu Mita
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use ext2_group_first_block_no() and assign the return values to
ext2_fsblk_t variables.Signed-off-by: Akinobu Mita
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Improve ext2_readdir() return value for ext2_get_page() failure by using the
actual result of ext2_get_page().Signed-off-by: Akinobu Mita
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Convert byte order of constant instead of variable which can be done at
compile time (vs run time).Signed-off-by: Marcin Slusarz
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
expression_in_cpu_byteorder);
with:
leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patchSigned-off-by: Marcin Slusarz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for
the user mappings.This requires the get_xip_page API to be changed to an address based one.
Improve the API layering a little bit too, while we're here.This is required in order to support XIP filesystems on memory that isn't
backed with struct page (but memory with struct page is still supported too).Signed-off-by: Nick Piggin
Acked-by: Carsten Otte
Cc: Jared Hulbert
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Alter the block device ->direct_access() API to work with the new
get_xip_mem() API (that requires both kaddr and pfn are returned).Some architectures will not do the right thing in their virt_to_page() for use
by XIP (to translate from the kernel virtual address returned by
direct_access(), to a user mappable pfn in XIP's page fault handler.However, we can't switch it to just return the pfn and not the kaddr, because
we have no good way to get a kva from a pfn, and XIP requires the kva for its
read(2) and write(2) handlers. So we have to return both.Signed-off-by: Jared Hulbert
Signed-off-by: Nick Piggin
Cc: Carsten Otte
Cc: Heiko Carstens
Cc: linux-mm@kvack.org
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Apr, 2008
2 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial: (24 commits)
DOC: A couple corrections and clarifications in USB doc.
Generate a slightly more informative error msg for bad HZ
fix typo "is" -> "if" in Makefile
ext*: spelling fix prefered -> preferred
DOCUMENTATION: Use newer DEFINE_SPINLOCK macro in docs.
KEYS: Fix the comment to match the file name in rxrpc-type.h.
RAID: remove trailing space from printk line
DMA engine: typo fixes
Remove unused MAX_NODES_SHIFT
MAINTAINERS: Clarify access to OCFS2 development mailing list.
V4L: Storage class should be before const qualifier (sn9c102)
V4L: Storage class should be before const qualifier
sonypi: Storage class should be before const qualifier
intel_menlow: Storage class should be before const qualifier
DVB: Storage class should be before const qualifier
arm: Storage class should be before const qualifier
ALSA: Storage class should be before const qualifier
acpi: Storage class should be before const qualifier
firmware_sample_driver.c: fix coding style
MAINTAINERS: Add ati_remote2 driver
...Fixed up trivial conflicts in firmware_sample_driver.c
-
Spelling fix: prefered -> preferred
Signed-off-by: Benoit Boissinot
Signed-off-by: Jesper Juhl
19 Apr, 2008
1 commit
-
Some ioctl()s can cause writes to the filesystem. Take these, and make them
use mnt_want/drop_write() instead.[AV: updated]
Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro
16 Apr, 2008
1 commit
-
mb_cache_entry_alloc() was allocating cache entries with GFP_KERNEL. But
filesystems are calling this function while holding xattr_sem so possible
recursion into the fs violates locking ordering of xattr_sem and transaction
start / i_mutex for ext2-4. Change mb_cache_entry_alloc() so that filesystems
can specify desired gfp mask and use GFP_NOFS from all of them.Signed-off-by: Jan Kara
Reported-by: Dave Jones
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Feb, 2008
2 commits
-
Add noreservation option to /proc/mounts for ext2 filesystems.
Signed-off-by: Miklos Szeredi
Acked-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Feb, 2008
1 commit
-
Stop the EXT2 filesystem from using iget() and read_inode(). Replace
ext2_read_inode() with ext2_iget(), and call that instead of iget().
ext2_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.ext2_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Howells
Acked-by: "Theodore Ts'o"
Cc:
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Feb, 2008
10 commits
-
Use ext[234]_bg_has_super() to remove duplicate code.
Signed-off-by: Akinobu Mita
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The argument chain for ext[234]_find_goal() is not used. This patch removes
it and fixes comment as well.Cc:
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use ext[234]_get_group_desc() to get group descriptor from group number.
Cc:
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The comment in ext[234]_new_blocks() describes about "i". But there is no
local variable called "i" in that scope. I guess it has been renamed to
group_no.Cc:
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
No BKL used anywhere, so don't mention it.
Signed-off-by: Andi Kleen
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I checked ext2_ioctl and could not find anything in there that would need the
BKL. So convert it over to use unlocked_ioctlSigned-off-by: Andi Kleen
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When a new block bitmap is read from disk in read_block_bitmap() there are a
few bits that should ALWAYS be set. In particular, the blocks given
corresponding to block bitmap, inode bitmap and inode tables. Validate the
block bitmap against these blocks.Signed-off-by: Aneesh Kumar K.V
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ext2 should not worry about checking sb->s_blocksize for XIP before the
sb's blocksize actually gets set.Signed-off-by: Nick Piggin
Acked-by: Carsten Otte
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ext2 file system was by default ignoring errors and continuing. This is
not a good default as continuing on error could lead to file system
corruption. Change the default to mark the file system readonly. Debian
and ubuntu already does this as the default in their fstab.Signed-off-by: Aneesh Kumar K.V
Cc:
Cc: Eric Sandeen
Cc: Jan Kara
Cc: Dave Jones
Cc: Chuck Ebbert
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This fixes some instances where we were continuing after calling
ext2_error. ext2_error call panic only if errors=panic mount option is
set. So we need to make sure we return correctly after ext2_error call.Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Jan, 2008
1 commit
-
The max file size for ext2 file system is now calculated
with hardcoded 4K block size. The patch fixes it to be
calculated with the right block size.Signed-off-by: Aneesh Kumar K.V
30 Nov, 2007
1 commit
-
In commit a686cd898bd999fd026a51e90fb0a3410d258ddb:
"Val's cross-port of the ext3 reservations code into ext2."
include/linux/ext2_fs.h got a new function whose return value is only
defined if __KERNEL__ is defined. Putting #ifdef __KERNEL__ around the
function seems to help, patch below.Signed-off-by: Eric Sandeen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Nov, 2007
1 commit
-
Forbid user from changing file flags on quota files. User has no bussiness
in playing with these flags when quota is on. Furthermore there is a
remote possibility of deadlock due to a lock inversion between quota file's
i_mutex and transaction's start (i_mutex for quota file is locked only when
trasaction is started in quota operations) in ext3 and ext4.Signed-off-by: Jan Kara
Cc: LIOU Payphone
Cc:
Acked-by: Dave Kleikamp
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Nov, 2007
1 commit
-
This reverts commit 7c9e69faa28027913ee059c285a5ea8382e24b5d, fixing up
conflicts in fs/ext4/balloc.c manually.The cost of doing the bitmap validation on each lookup - even when the
bitmap is cached - is absolutely prohibitive. We could, and probably
should, do it only when adding the bitmap to the buffer cache. However,
right now we are better off just reverting it.Peter Zijlstra measured the cost of this extra validation as a 85%
decrease in cached iozone, and while I had a patch that took it down to
just 17% by not being _quite_ so stupid in the validation, it was still
a big slowdown that could have been avoided by just doing it right.Cc: Peter Zijlstra
Cc: Andrew Morton
Cc: Aneesh Kumar
Cc: Andreas Dilger
Cc: Mingming Cao
Signed-off-by: Linus Torvalds
22 Oct, 2007
3 commits
-
Now that nfsd has stopped writing to the find_exported_dentry member we an
mark the export_operations constSigned-off-by: Christoph Hellwig
Cc: Neil Brown
Cc: "J. Bruce Fields"
Cc:
Cc: Dave Kleikamp
Cc: Anton Altaparmakov
Cc: David Chinner
Cc: Timothy Shimmin
Cc: OGAWA Hirofumi
Cc: Hugh Dickins
Cc: Chris Mason
Cc: Jeff Mahoney
Cc: "Vladimir V. Saveliev"
Cc: Steven Whitehouse
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Trivial switch over to the new generic helpers.
Signed-off-by: Christoph Hellwig
Cc: Neil Brown
Cc: "J. Bruce Fields"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With 64KB blocksize, a directory entry can have size 64KB which does not
fit into 16 bits we have for entry length. So we store 0xffff instead and
convert the value when read from / written to disk.[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds