12 Oct, 2006
1 commit
-
Current error behaviour for ext2 and ext3 filesystems does not fully
correspond to the documentation and should be fixed.According to man 8 mount, ext2 and ext3 file systems allow to set one of 3
different on-errors behaviours:---- start of quote man 8 mount ----
errors=continue / errors=remount-ro / errors=panic
Define the behaviour when an error is encountered. (Either ignore
errors and just mark the file system erroneous and continue, or remount
the file system read-only, or panic and halt the system.) The default is
set in the filesystem superblock, and can be changed using tune2fs(8).---- end of quote ----
However EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
ERRORS_CONT is not saved on the sbi->s_mount_opt. It leads to the incorrect
handle of errors on ext3.Then we've checked corresponding code in ext2 and discovered that it is buggy
as well:- EXT2_ERRORS_CONTINUE is not read from the superblock (the same);
- parse_option() does not clean the alternative values and thus something
like (ERRORS_CONT|ERRORS_RO) can be set;- if options are omitted, parse_option() does not set any of these options.
Therefore it is possible to set any combination of these options on the ext2:
- none of them may be set: EXT2_ERRORS_CONTINUE on superblock / empty mount
options;- any of them may be set using mount options;
- 2 any options may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on the
superblock and other value in mount options;- and finally all three options may be set by adding third option in remount.
Currently ext2 uses these values only in ext2_error() and it is not leading to
any noticeable troubles. However somebody may be discouraged when he will try
to workaround EXT2_ERRORS_PANIC on the superblock by using errors=continue in
mount options.This patch:
EXT3_ERRORS_CONTINUE should be taken from the superblock as default value for
error behaviour.Signed-off-by: Dmitry Mishin
Acked-by: Vasily Averin
Acked-by: Kirill Korotaev
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Oct, 2006
7 commits
-
Some filesystems, instead of simply decrementing i_nlink, simply zero it
during an unlink operation. We need to catch these in addition to the
decrement operations.Signed-off-by: Dave Hansen
Acked-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is mostly included for parity with dec_nlink(), where we will have some
more hooks. This one should stay pretty darn straightforward for now.Signed-off-by: Dave Hansen
Acked-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When a filesystem decrements i_nlink to zero, it means that a write must be
performed in order to drop the inode from the filesystem.We're shortly going to have keep filesystems from being remounted r/o between
the time that this i_nlink decrement and that write occurs.So, add a little helper function to do the decrements. We'll tie into it in a
bit to note when i_nlink hits zero.Signed-off-by: Dave Hansen
Acked-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Cc: Michael Holzheu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move the Ext3 device ioctl compat stuff from fs/compat_ioctl.c to the Ext3
driver so that the Ext3 header file doesn't need to be included.Signed-Off-By: David Howells
Signed-off-by: Jens Axboe -
Signed-off-by: Jens Axboe
27 Sep, 2006
13 commits
-
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:(void)kmem_cache_destroy(cache);
* Those who check it printk something, however, slab_error already printed
the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
low-level filesystem driver should make. Converted to ignore.Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
* Removing useless casts
* Removing useless wrapper
* Conversion from kmalloc+memset to kzallocSigned-off-by: Panagiotis Issaris
Acked-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Conversions from kmalloc+memset to kzalloc.
Signed-off-by: Panagiotis Issaris
Jffs2-bit-acked-by: David Woodhouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Some of the changes in balloc.c are just cosmetic, as Andreas pointed out -
if they overflow they'll then underflow and things are fine.5th hunk actually fixes an overflow problem.
Also check for potential overflows in inode & block counts when resizing.
Signed-off-by: Eric Sandeen
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.
Signed-off-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
More white space cleanups in preparation of cloning ext4 from ext3.
Removing spaces that precede a tab.Signed-off-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error
behavior was broken in linux kernels since 2.5.x versions by the following
patch:2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
Default mount options from superblock for ext2/3 filesystems
http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQIn case ext3 file system is mounted with errors=continue
(EXT3_ERRORS_CONTINUE) errors should be ignored when possible. However at
present in case of any error kernel aborts journal and remounts filesystem
to read-only. Such behavior was hit number of times and noted to differ
from that of 2.4.x kernels.This patch fixes this:
- do nothing in case of EXT3_ERRORS_CONTINUE,
- set EXT3_MOUNT_ABORT and call journal_abort() in all other cases
- panic() should be called after ext3_commit_super() to save
sb marked as EXT3_ERROR_FSSigned-off-by: Vasily Averin
Acked-by: Kirill Korotaev
Cc: Theodore Ts'o
Cc: "Stephen C. Tweedie"
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Mingming Cao
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In the past there were a few kernel panics related to block reservation
tree operations failure (insert/remove etc). It would be very useful to
get the block allocation reservation map info when such error happens.Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is primarily format string fixes, with changes to ialloc.c where large
inode counts could overflow, and also pass around journal_inum as an
unsigned long, just to be pedantic about it....Signed-off-by: Eric Sandeen
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I need to do some actual IO testing now, but this gets things mounting for
a 16T ext3 filesystem. (patched up e2fsprogs is needed too, I'll send that
off the kernel list)This patch fixes these issues in the kernel:
o sbi->s_groups_count overflows in ext3_fill_super()
sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) -
le32_to_cpu(es->s_first_data_block) +
EXT3_BLOCKS_PER_GROUP(sb) - 1) /
EXT3_BLOCKS_PER_GROUP(sb);at 16T, s_blocks_count is already maxed out; adding
EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to 0.
Not really what we want, and causes a failed mount.Feel free to check my math (actually, please do!), but changing it this
way should work & avoid the overflow:(A + B - 1)/B changed to: ((A - 1)/B) + 1
o ext3_check_descriptors() overflows range checks
ext3_check_descriptors() iterates over all block groups making sure
that various bits are within the right block ranges... on the last pass
through, it is checking the error case[item] >= block + EXT3_BLOCKS_PER_GROUP(sb)
where "block" is the first block in the last block group. The last
block in this group (and the last one that will fit in 32 bits) is block
+ EXT3_BLOCKS_PER_GROUP(sb)- 1. block + EXT3_BLOCKS_PER_GROUP(sb) wraps
back around to 0.so, make things clearer with "first_block" and "last_block" where those
are first and last, inclusive, and use rather than =.Finally, the last block group may be smaller than the rest, so account
for this on the last pass through: last_block = sb->s_blocks_count - 1;(a similar patch could be done for ext2; does anyone in their right mind
use ext2 at 16T? I'll send an ext2 patch doing the same thing if that's
warranted)Signed-off-by: Eric Sandeen
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove whitespace from ext3 and jbd, before we clone ext4.
Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Sep, 2006
2 commits
-
ext3-get-blocks support caused ~20% degrade in Sequential read
performance (tiobench). Problem is with marking the buffer boundary
so IO can be submitted right away. Here is the patch to fix it.2.6.18-rc6:
-----------
# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 75.2726 seconds, 57.1 MB/sreal 1m15.285s
user 0m0.276s
sys 0m3.884s2.6.18-rc6 + fix:
-----------------
[root@elm3a241 ~]# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 62.9356 seconds, 68.2 MB/sThe boundary block check in ext3_get_blocks_handle needs to be adjusted
against the count of blocks mapped in this call, now that it can map
more than one block.Signed-off-by: Suparna Bhattacharya
Tested-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Inodes earlier than the 'first' inode (e.g. journal, resize) should be
rejected early - except the root inode. Also inode numbers that are too
big should be rejected early.[akpm@osdl.org: cleanup]
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Sep, 2006
1 commit
-
It has been reported that ext3_getblk() is not doing the right thing and
triggering following WARN():BUG: warning at fs/ext3/inode.c:1016/ext3_getblk()
ext3_getblk+0x98/0x2a6 md_wakeup_thread+0x26/0x2a
ext3_bread+0x1f/0x88 ext3_quota_read+0x136/0x1ae
v1_read_dqblk+0x61/0xac dquot_acquire+0xf6/0x107
ext3_acquire_dquot+0x46/0x68 dqget+0x155/0x1e7
dquot_transfer+0x3e0/0x3e9 dput+0x23/0x13e
ext3_setattr+0xc3/0x240 current_fs_time+0x52/0x6a
notify_change+0x2bd/0x30d chown_common+0x9c/0xc5
strncpy_from_user+0x3b/0x68 do_path_lookup+0xdf/0x266
__user_walk_fd+0x44/0x5a sys_chown+0x4a/0x55
vfs_write+0xe7/0x13c sys_mkdir+0x1f/0x23
syscall_call+0x7/0xbLooking at the code, it looks like it's not handle HOLE correctly. It ends
up returning -EIO. Here is the patch to fix it.If we really want to be paranoid, we can allow return values 0 (HOLE), 1
(we asked for one block) and return -EIO for more than 1 block. But I
really don't see a reason for doing it - all we need is the block# here.
(doesn't matter how many blocks are mapped).ext3_get_blocks_handle() returns number of blocks it mapped. It returns 0
in case of HOLE. ext3_getblk() should handle HOLE properly (currently its
dumping warning stack and returning -EIO).Signed-off-by: Badari Pulavarty
Acked-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Aug, 2006
1 commit
-
To handle the earlier bogus ENOSPC error caused by filesystem full of block
reservation, current code falls back to non block reservation, starts to
allocate block(s) from the goal allocation block group as if there is no
block reservation.Current code needs to re-load the corresponding block group descriptor for
the initial goal block group in this case. The patch fixes this.Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Aug, 2006
2 commits
-
For files other than IFREG, nobh option doesn't make sense. Modifications
to them are journalled and needs buffer heads to do that. Without this
patch, we get kernel oops in page_buffers().Signed-off-by: Badari Pulavarty
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The inode number out of an NFS file handle gets passed eventually to
ext3_get_inode_block() without any checking. If ext3_get_inode_block()
allows it to trigger an error, then bad filehandles can have unpleasant
effect - ext3_error() will usually cause a forced read-only remount, or a
panic if `errors=panic' was used.So remove the call to ext3_error there and put a matching check in
ext3/namei.c where inode numbers are read off storage.[akpm@osdl.org: fix off-by-one error]
Signed-off-by: Neil Brown
Signed-off-by: Jan Kara
Cc: Marcel Holtmann
Cc:
Cc: "Stephen C. Tweedie"
Cc: Eric Sandeen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Jul, 2006
1 commit
-
These functions no longer exist; remove their declarations.
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Jul, 2006
1 commit
-
The quota code plays interesting games with the lock ordering; to quote Jan:
| i_mutex of inode containing quota file is acquired after all other
| quota locks. i_mutex of all other inodes is acquired before quota
| locks. Quota code makes sure (by resetting inode operations and
| setting special flag on inode) that noone tries to enter quota code
| while holding i_mutex on a quota file...The good news is that all of this special case i_mutex grabbing happens in the
(per filesystem) low level quota write function. For this special case we
need a new I_MUTEX_* nesting level, since this just entirely outside any of
the regular VFS locking rules for i_mutex. I trust Jan on his blue eyes that
this is not ever going to deadlock; and based on that the patch below is what
it takes to inform lockdep of these very interesting new locking rules.The new locking rule for the I_MUTEX_QUOTA nesting level is that this is the
deepest possible level of nesting for i_mutex, and that this only should be
used in quota write (and possibly read) function of filesystems. This makes
the lock ordering of the I_MUTEX_* levels:I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_NORMAL -> I_MUTEX_QUOTA
Has no effect on non-lockdep kernels.
Signed-off-by: Arjan van de Ven
Acked-by: Ingo Molnar
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jul, 2006
1 commit
-
Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk
29 Jun, 2006
1 commit
-
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.Signed-off-by: Christoph Hellwig
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Jun, 2006
1 commit
-
This patch adds "-o bh" option to force use of buffer_heads. This option
is needed when we make "nobh" as default - and if we run into problems.Signed-off-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Jun, 2006
5 commits
-
The variables nlen and rlen are defined/initialized but not used in
ext3_add_entry().Signed-off-by: Johann Lombardi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t. Convert the
rest of all unsigned long type in-kernel filesystem blocks to ext3_fsblk_t,
and replace the printk format string respondingly.Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Some of the in-kernel ext3 block variable type are treated as signed 4 bytes
int type, thus limited ext3 filesystem to 8TB (4kblock size based). While
trying to fix them, it seems quite confusing in the ext3 code where some
blocks are filesystem-wide blocks, some are group relative offsets that need
to be signed value (as -1 has special meaning). So it seem saner to define
two types of physical blocks: one is filesystem wide blocks, another is
group-relative blocks. The following patches clarify these two types of
blocks in the ext3 code, and fix the type bugs which limit current 32 bit ext3
filesystem limit to 8TB.With this series of patches and the percpu counter data type changes in the mm
tree, we are able to extend exts filesystem limit to 16TB.This work is also a pre-request for the recent >32 bit ext3 work, and makes
the kernel to able to address 48 bit ext3 block a lot easier: Simply redefine
ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
ext3 filesystem block corresponding.Two RFC with a series patches have been posted to ext2-devel list and have
been reviewed and discussed:
http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2
Patches are tested on both 32 bit machine and 64 bit machine, 8TB ext3 filesystem(with the latest to be released e2fsprogs-1.39). Tests
includes overnight fsx, tiobench, dbench and fsstress.This patch:
Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
filesystem wide blocks.This patch classifies all block group relative blocks, and ext3_fsblk_t blocks
occurs in the same function where used to be confusing before. Also include
kernel bug fixes for filesystem wide in-kernel block variables. There are
some fileystem wide blocks are treated as int/unsigned int type in the kernel
currently, especially in ext3 block allocation and reservation code. This
patch fixed those bugs by converting those variables to ext3_fsblk_t(unsigned
long) type.Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This was reported as Debian bug #336604.
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If ext3 filesystem is larger than 2TB, and sector_t is a u32 (i.e.
CONFIG_LBD not defined in the kernel), the calculation of the disk sector
will overflow. Add check at ext3_fill_super() and ext3_group_extend() to
prevent mount/remount/resize >2TB ext3 filesystem if sector_t size is 4
bytes.Verified this patch on a 32 bit platform without CONFIG_LBD defined
(sector_t is 32 bits long), mount refuse to mount a 10TB ext3.Signed-off-by: Mingming Cao
Acked-by: Andreas Dilger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Jun, 2006
3 commits
-
The percpu counter data type are changed in this set of patches to support
more users like ext3 who need more than 32 bit to store the free blocks
total in the filesystem.- Generic perpcu counters data type changes. The size of the global counter
and local counter were explictly specified using s64 and s32. The global
counter is changed from long to s64, while the local counter is changed from
long to s32, so we could avoid doing 64 bit update in most cases.- Users of the percpu counters are updated to make use of the new
percpu_counter_init() routine now taking an additional parameter to allow
users to pass the initial value of the global counter.Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Steven Rostedt points out that `rsv' here is usually
NULL, so we should avoid calling kfree().Also, fix up some nearby whitespace damage.
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.linux/mount.h has been added where necessary to make allyesconfig build
successfully.Interest has also been expressed for use with the FUSE and XFS filesystems.
Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds