09 Dec, 2006
2 commits
-
This facility provides three entry points:
ilog2() Log base 2 of unsigned long
ilog2_u32() Log base 2 of u32
ilog2_u64() Log base 2 of u64These facilities can either be used inside functions on dynamic data:
int do_something(long q)
{
...;
y = ilog2(x)
...;
}Or can be used to statically initialise global variables with constant values:
unsigned n = ilog2(27);
When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant. They treat negative numbers as
unsigned.When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Herbert Xu
Cc: David Howells
Cc: Wojtek Kaniewski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the ext3
filesystem.Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Dec, 2006
15 commits
-
Port fix to the off-by-one in find_next_usable_block's memscan from ext2 to
ext3; but it didn't cause a serious problem for ext3 because the additional
ext3_test_allocatable check rescued it from the error.Signed-off-by: Mingming Cao
Signed-off-by: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ext3_new_blocks has a nice io_error label for setting -EIO, so goto that in
the one place that doesn't already use it.Signed-off-by: Mingming Cao
Signed-off-by: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The reservations tree is an rb_tree not a list, so it's less confusing to use
rb_entry() than list_entry() - though they're both just container_of().Signed-off-by: Mingming Cao
Signed-off-by: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
rsv_end is the last block within the reservation, so alloc_new_reservation
should accept start_block == rsv_end as success.Signed-off-by: Mingming Cao
Signed-off-by: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
grp_goal 0 is a genuine goal (unlike -1), so ext3_try_to_allocate_with_rsv
should treat it as such.Signed-off-by: Mingming Cao
Signed-off-by: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ext3_new_blocks should reset the reservation window size to 0 when squeezing
the last blocks out of an almost full filesystem, so the retry doesn't skip
any groups with less than half that free, reporting ENOSPC too soon.Signed-off-by: Mingming Cao
Signed-off-by: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If you do something like:
# touch foo
# tail -f foo &
# rm foo
#
#you'll panic, because ext3/4 tries to do orphan list processing on the
readonly snapshot device, and:kernel: journal commit I/O error
kernel: Assertion failure in journal_flush_Rsmp_e2f189ce() at journal.c:1356: "!journal->j_checkpoint_transactions"
kernel: Kernel panic: Fatal exceptionfor a truly readonly underlying device, it's reasonable and necessary
to just skip orphan list processing.Signed-off-by: Eric Sandeen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Hugh Dickins wrote:
> Not found anything relevant, but I keep noticing these lines
> in ext2_try_to_allocate_with_rsv(), ext3 and ext4 similar:
>
> } else if (grp_goal > 0 &&
> (my_rsv->rsv_end - grp_goal + 1) < *count)
> try_to_extend_reservation(my_rsv, sb,
> *count-my_rsv->rsv_end + grp_goal - 1);
>
> They're wrong, a no-op in most groups, aren't they? rsv_end is an
> absolute block number, whereas grp_goal is group-relative, so the
> calculation ought to bring in group_first_block? Or I'm confused.
>Signed-off-by: Mingming Cao
Cc: "linux-ext4@vger.kernel.org"
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In journal=ordered or journal=data mode retry in ext3_prepare_write()
breaks the requirements of journaling of data with respect to metadata.
The fix is to call commit_write to commit allocated zero blocks before
retry.Signed-off-by: Kirill Korotaev
Cc: Ingo Molnar
Cc: Ken Chen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Saves nearly 4kbytes on x86.
Cc: Arnaldo Carvalho de Melo
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I've been using Steve Grubb's purely evil "fsfuzzer" tool, at
http://people.redhat.com/sgrubb/files/fsfuzzer-0.4.tar.gzBasically it makes a filesystem, splats some random bits over it, then
tries to mount it and do some simple filesystem actions.At best, the filesystem catches the corruption gracefully. At worst,
things spin out of control.As you might guess, we found a couple places in ext3 where things spin out
of control :)First, we had a corrupted directory that was never checked for
consistency... it was corrupt, and pointed to another bad "entry" of
length 0. The for() loop looped forever, since the length of
ext3_next_entry(de) was 0, and we kept looking at the same pointer over and
over and over and over... I modeled this check and subsequent action on
what is done for other directory types in ext3_readdir...(adding this check adds some computational expense; I am testing a followup
patch to reduce the number of times we check and re-check these directory
entries, in all cases. Thanks for the idea, Andreas).Next we had a root directory inode which had a corrupted size, claimed to
be > 200M on a 4M filesystem. There was only really 1 block in the
directory, but because the size was so large, readdir kept coming back for
more, spewing thousands of printk's along the way.Per Andreas' suggestion, if we're in this read error condition and we're
trying to read an offset which is greater than i_blocks worth of bytes,
stop trying, and break out of the loop.With these two changes fsfuzz test survives quite well on ext3.
Signed-off-by: Eric Sandeen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
lock_super() is unnecessary for setting super-block feature flags. Use the
provided *_SET_COMPAT_FEATURE() macros as well.Signed-off-by: Andreas Gruenbacher
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Update ext3_statfs to return an FSID that is a 64 bit XOR of the 128 bit
filesystem UUID as suggested by Andreas Dilger. See the following Bugzilla
entry for details:http://bugzilla.kernel.org/show_bug.cgi?id=136
Cc: Andreas Dilger
Cc: Stephen Tweedie
Signed-off-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
doneThe script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
SLAB_NOFS is an alias of GFP_NOFS.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Oct, 2006
1 commit
-
Current error behaviour for ext2 and ext3 filesystems does not fully
correspond to the documentation and should be fixed.According to man 8 mount, ext2 and ext3 file systems allow to set one of 3
different on-errors behaviours:---- start of quote man 8 mount ----
errors=continue / errors=remount-ro / errors=panic
Define the behaviour when an error is encountered. (Either ignore
errors and just mark the file system erroneous and continue, or remount
the file system read-only, or panic and halt the system.) The default is
set in the filesystem superblock, and can be changed using tune2fs(8).---- end of quote ----
However EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
ERRORS_CONT is not saved on the sbi->s_mount_opt. It leads to the incorrect
handle of errors on ext3.Then we've checked corresponding code in ext2 and discovered that it is buggy
as well:- EXT2_ERRORS_CONTINUE is not read from the superblock (the same);
- parse_option() does not clean the alternative values and thus something
like (ERRORS_CONT|ERRORS_RO) can be set;- if options are omitted, parse_option() does not set any of these options.
Therefore it is possible to set any combination of these options on the ext2:
- none of them may be set: EXT2_ERRORS_CONTINUE on superblock / empty mount
options;- any of them may be set using mount options;
- 2 any options may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on the
superblock and other value in mount options;- and finally all three options may be set by adding third option in remount.
Currently ext2 uses these values only in ext2_error() and it is not leading to
any noticeable troubles. However somebody may be discouraged when he will try
to workaround EXT2_ERRORS_PANIC on the superblock by using errors=continue in
mount options.This patch:
EXT3_ERRORS_CONTINUE should be taken from the superblock as default value for
error behaviour.Signed-off-by: Dmitry Mishin
Acked-by: Vasily Averin
Acked-by: Kirill Korotaev
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Oct, 2006
7 commits
-
Some filesystems, instead of simply decrementing i_nlink, simply zero it
during an unlink operation. We need to catch these in addition to the
decrement operations.Signed-off-by: Dave Hansen
Acked-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is mostly included for parity with dec_nlink(), where we will have some
more hooks. This one should stay pretty darn straightforward for now.Signed-off-by: Dave Hansen
Acked-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When a filesystem decrements i_nlink to zero, it means that a write must be
performed in order to drop the inode from the filesystem.We're shortly going to have keep filesystems from being remounted r/o between
the time that this i_nlink decrement and that write occurs.So, add a little helper function to do the decrements. We'll tie into it in a
bit to note when i_nlink hits zero.Signed-off-by: Dave Hansen
Acked-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Cc: Michael Holzheu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move the Ext3 device ioctl compat stuff from fs/compat_ioctl.c to the Ext3
driver so that the Ext3 header file doesn't need to be included.Signed-Off-By: David Howells
Signed-off-by: Jens Axboe -
Signed-off-by: Jens Axboe
27 Sep, 2006
13 commits
-
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:(void)kmem_cache_destroy(cache);
* Those who check it printk something, however, slab_error already printed
the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
low-level filesystem driver should make. Converted to ignore.Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
* Removing useless casts
* Removing useless wrapper
* Conversion from kmalloc+memset to kzallocSigned-off-by: Panagiotis Issaris
Acked-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Conversions from kmalloc+memset to kzalloc.
Signed-off-by: Panagiotis Issaris
Jffs2-bit-acked-by: David Woodhouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Some of the changes in balloc.c are just cosmetic, as Andreas pointed out -
if they overflow they'll then underflow and things are fine.5th hunk actually fixes an overflow problem.
Also check for potential overflows in inode & block counts when resizing.
Signed-off-by: Eric Sandeen
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.
Signed-off-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
More white space cleanups in preparation of cloning ext4 from ext3.
Removing spaces that precede a tab.Signed-off-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error
behavior was broken in linux kernels since 2.5.x versions by the following
patch:2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
Default mount options from superblock for ext2/3 filesystems
http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQIn case ext3 file system is mounted with errors=continue
(EXT3_ERRORS_CONTINUE) errors should be ignored when possible. However at
present in case of any error kernel aborts journal and remounts filesystem
to read-only. Such behavior was hit number of times and noted to differ
from that of 2.4.x kernels.This patch fixes this:
- do nothing in case of EXT3_ERRORS_CONTINUE,
- set EXT3_MOUNT_ABORT and call journal_abort() in all other cases
- panic() should be called after ext3_commit_super() to save
sb marked as EXT3_ERROR_FSSigned-off-by: Vasily Averin
Acked-by: Kirill Korotaev
Cc: Theodore Ts'o
Cc: "Stephen C. Tweedie"
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Mingming Cao
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In the past there were a few kernel panics related to block reservation
tree operations failure (insert/remove etc). It would be very useful to
get the block allocation reservation map info when such error happens.Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is primarily format string fixes, with changes to ialloc.c where large
inode counts could overflow, and also pass around journal_inum as an
unsigned long, just to be pedantic about it....Signed-off-by: Eric Sandeen
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I need to do some actual IO testing now, but this gets things mounting for
a 16T ext3 filesystem. (patched up e2fsprogs is needed too, I'll send that
off the kernel list)This patch fixes these issues in the kernel:
o sbi->s_groups_count overflows in ext3_fill_super()
sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) -
le32_to_cpu(es->s_first_data_block) +
EXT3_BLOCKS_PER_GROUP(sb) - 1) /
EXT3_BLOCKS_PER_GROUP(sb);at 16T, s_blocks_count is already maxed out; adding
EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to 0.
Not really what we want, and causes a failed mount.Feel free to check my math (actually, please do!), but changing it this
way should work & avoid the overflow:(A + B - 1)/B changed to: ((A - 1)/B) + 1
o ext3_check_descriptors() overflows range checks
ext3_check_descriptors() iterates over all block groups making sure
that various bits are within the right block ranges... on the last pass
through, it is checking the error case[item] >= block + EXT3_BLOCKS_PER_GROUP(sb)
where "block" is the first block in the last block group. The last
block in this group (and the last one that will fit in 32 bits) is block
+ EXT3_BLOCKS_PER_GROUP(sb)- 1. block + EXT3_BLOCKS_PER_GROUP(sb) wraps
back around to 0.so, make things clearer with "first_block" and "last_block" where those
are first and last, inclusive, and use rather than =.Finally, the last block group may be smaller than the rest, so account
for this on the last pass through: last_block = sb->s_blocks_count - 1;(a similar patch could be done for ext2; does anyone in their right mind
use ext2 at 16T? I'll send an ext2 patch doing the same thing if that's
warranted)Signed-off-by: Eric Sandeen
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove whitespace from ext3 and jbd, before we clone ext4.
Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Sep, 2006
2 commits
-
ext3-get-blocks support caused ~20% degrade in Sequential read
performance (tiobench). Problem is with marking the buffer boundary
so IO can be submitted right away. Here is the patch to fix it.2.6.18-rc6:
-----------
# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 75.2726 seconds, 57.1 MB/sreal 1m15.285s
user 0m0.276s
sys 0m3.884s2.6.18-rc6 + fix:
-----------------
[root@elm3a241 ~]# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 62.9356 seconds, 68.2 MB/sThe boundary block check in ext3_get_blocks_handle needs to be adjusted
against the count of blocks mapped in this call, now that it can map
more than one block.Signed-off-by: Suparna Bhattacharya
Tested-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Inodes earlier than the 'first' inode (e.g. journal, resize) should be
rejected early - except the root inode. Also inode numbers that are too
big should be rejected early.[akpm@osdl.org: cleanup]
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds