18 Dec, 2007
1 commit
-
As it turns out, the kernel divides by EXT3_INODES_PER_GROUP(s) when
mounting an ext3 filesystem. If that number is zero, a crash follows.
Below a patch.This crash was reported by Joeri de Ruiter, Carst Tankink and Pim Vullers.
Cc:
Acked-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Oct, 2007
2 commits
-
Now that nfsd has stopped writing to the find_exported_dentry member we an
mark the export_operations constSigned-off-by: Christoph Hellwig
Cc: Neil Brown
Cc: "J. Bruce Fields"
Cc:
Cc: Dave Kleikamp
Cc: Anton Altaparmakov
Cc: David Chinner
Cc: Timothy Shimmin
Cc: OGAWA Hirofumi
Cc: Hugh Dickins
Cc: Chris Mason
Cc: Jeff Mahoney
Cc: "Vladimir V. Saveliev"
Cc: Steven Whitehouse
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Trivial switch over to the new generic helpers.
Signed-off-by: Christoph Hellwig
Cc: Neil Brown
Cc: "J. Bruce Fields"
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Oct, 2007
8 commits
-
Convert s_r_blocks_count and s_free_blocks_count to
s_r_blocks_count_lo and s_free_blocks_count_loThis helps in finding BUGs due to direct partial access of
these split 64 bit valuesAlso fix direct partial access in ext4 code
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o" -
Convert s_blocks_count to s_blocks_count_lo
This helps in finding BUGs due to direct partial access of
these split 64 bit valuesAlso fix direct partial access in ext4 code
Signed-off-by: Aneesh Kumar K.V
-
Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo
and bg_inode_table_lo. This helps in finding BUGs due to
direct partial access of these split 64 bit valuesAlso fix one direct partial access
Signed-off-by: Aneesh Kumar K.V
-
Convert bg_block_bitmap to bg_block_bitmap_lo
This helps in catching some BUGS due to direct
partial access of these split fields.Signed-off-by: Aneesh Kumar K.V
-
This feature relaxes check restrictions on where each block groups meta
data is located within the storage media. This allows for the allocation
of bitmaps or inode tables outside the block group boundaries in cases
where bad blocks forces us to look for new blocks which the owning block
group can not satisfy. This will also allow for new meta-data allocation
schemes to improve performance and scalability.Signed-off-by: Jose R. Santos
Cc:
Signed-off-by: Andrew Morton -
In pass1 of e2fsck, every inode table in the fileystem is scanned and checked,
regardless of whether it is in use. This is this the most time consuming part
of the filesystem check. The unintialized block group feature can greatly
reduce e2fsck time by eliminating checking of uninitialized inodes.With this feature, there is a a high water mark of used inodes for each block
group. Block and inode bitmaps can be uninitialized on disk via a flag in the
group descriptor to avoid reading or scanning them at e2fsck time. A checksum
of each group descriptor is used to ensure that corruption in the group
descriptor's bit flags does not cause incorrect operation.The feature is enabled through a mkfs option
mke2fs /dev/ -O uninit_groups
A patch adding support for uninitialized block groups to e2fsprogs tools has
been posted to the linux-ext4 mailing list.The patches have been stress tested with fsstress and fsx. In performance
tests testing e2fsck time, we have seen that e2fsck time on ext3 grows
linearly with the total number of inodes in the filesytem. In ext4 with the
uninitialized block groups feature, the e2fsck time is constant, based
solely on the number of used inodes rather than the total inode count.
Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
greatly reduce e2fsck time for users. With performance improvement of 2-20
times, depending on how full the filesystem is.The attached graph shows the major improvements in e2fsck times in filesystems
with a large total inode count, but few inodes in use.In each group descriptor if we have
EXT4_BG_INODE_UNINIT set in bg_flags:
Inode table is not initialized/used in this group. So we can skip
the consistency check during fsck.
EXT4_BG_BLOCK_UNINIT set in bg_flags:
No block in the group is used. So we can skip the block bitmap
verification for this group.We also add two new fields to group descriptor as a part of
uninitialized group patch.__le16 bg_itable_unused; /* Unused inodes count */
__le16 bg_checksum; /* crc16(sb_uuid+group+desc) */bg_itable_unused:
If we have EXT4_BG_INODE_UNINIT not set in bg_flags
then bg_itable_unused will give the offset within
the inode table till the inodes are used. This can be
used by fsck to skip list of inodes that are marked unused.bg_checksum:
Now that we depend on bg_flags and bg_itable_unused to determine
the block and inode usage, we need to make sure group descriptor
is not corrupt. We add checksum to group descriptor to
detect corruption. If the descriptor is found to be corrupt, we
mark all the blocks and inodes in the group used.Signed-off-by: Avantika Mathur
Signed-off-by: Andreas Dilger
Signed-off-by: Mingming Cao
Signed-off-by: Aneesh Kumar K.V -
Fragment support in ext2/3/4 was never implemented, and it probably will
never be implemented. So remove it from ext4.Signed-off-by: Coly Li
Acked-by: Andreas Dilger
Signed-off-by: Andrew Morton
Signed-off-by: "Theodore Ts'o" -
change JBD_XXX macros to JBD2_XXX in JBD2/Ext4
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"
17 Oct, 2007
6 commits
-
Replace n & (n - 1) with is_power_of_2(n)
Signed-off-by: vignesh babu
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Miklos Szeredi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It's *wrong* to have
#define log2(n) ffz(~(n))
It should be *reversed*:
#define log2(n) flz(~(n))
or
#define log2(n) fls(n)
or just use
ilog2(n) defined in linux/log2.h.This patch follows the last solution, recommended by Andrew Morton.
Cc:
Cc: Mingming Cao
Cc: Bjorn Helgaas
Cc: Chris Ahna
Cc: David Mosberger-Tang
Cc: Kyle McMartin
Cc: Dave Airlie
Cc: Dave Jones
Signed-off-by: Fengguang Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.Convert
ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
ctor(struct kmem_cache *s, void *object)
throughout the kernel
[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
alloc_percpu can fail, propagate that error.
Signed-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
s/percpu_counter_sum/&_positive/
Because its consitent with percpu_counter_read*
Signed-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Sep, 2007
1 commit
-
If we fail to start a transaction when releasing dquot, we have to call
dquot_release() anyway to mark dquot structure as inactive. Otherwise we
end in an infinite loop inside dqput().Signed-off-by: Jan Kara
Cc: xb
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Jul, 2007
1 commit
-
ext[234]_check_descriptors sanity checks block group descriptor geometry at
mount time, testing whether the block bitmap, inode bitmap, and inode table
reside wholly within the blockgroup. However, the inode table test is off
by one so that if the last block in the inode table resides on the last
block of the block group, the test incorrectly fails. This is because it
tests the last block as (start + length) rather than (start + length - 1).This can be seen by trying to mount a filesystem made such as:
mkfs.ext2 -F -b 1024 -m 0 -g 256 -N 3744 fsfile 1024
which yields:
EXT2-fs error (device loop0): ext2_check_descriptors: Inode table for group 0 not in group (block 101)!
EXT2-fs: group descriptors corrupted!There is a similar bug in e2fsprogs, patch already sent for that.
(I wonder if inside(), outside(), and/or in_range() should someday be
used in this and other tests throughout the ext filesystems...)Signed-off-by: Eric Sandeen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Jul, 2007
1 commit
-
Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).Signed-off-by: Paul Mundt
18 Jul, 2007
5 commits
-
Replace (n & (n-1)) in the context of power of 2 checks with
is_power_of_2()Signed-off-by: Vignesh Babu
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Dave Kleikamp
Signed-off-by: "Theodore Ts'o" -
This patch adds nanosecond timestamps for ext4. This involves adding
*time_extra fields to the ext4_inode to extend the timestamps to
64-bits. Creation time is also added by this patch.These extended fields will fit into an inode if the filesystem was
formatted with large inodes (-I 256 or larger) and there are currently
no EAs consuming all of the available space. For new inodes we always
reserve enough space for the kernel's known extended fields, but for
inodes created with an old kernel this might not have been the case. So
this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature
flag(ro-compat so that older kernels can't create inodes with a smaller
extra_isize). which indicates if the fields fitting inside
s_min_extra_isize are available or not. If the expansion of inodes if
unsuccessful then this feature will be disabled. This feature is only
enabled if requested by the sysadmin.None of the extended inode fields is critical for correct filesystem
operation.Signed-off-by: Andreas Dilger
Signed-off-by: Kalpak Shah
Signed-off-by: Eric Sandeen
Signed-off-by: Dave Kleikamp
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o" -
Set the journals JBD2_FEATURE_INCOMPAT_64BIT on devices with more
than 32bit block sizes during mount time. This ensure proper record
lenth when writing to the journal.Signed-off-by: Jose R. Santos
Signed-off-by: Andreas Dilger
Signed-off-by: Mingming Cao
Signed-off-by: Laurent Vivier
Signed-off-by: "Theodore Ts'o" -
Turn on extents feature by default in ext4 filesystem, to get wider
testing of extents feature in ext4dev. This can be disabled using
-o noextents.Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o" -
currently the export_operation structure and helpers related to it are in
fs.h. fs.h is already far too large and there are very few places needing the
export bits, so split them off into a separate header.[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: Christoph Hellwig
Signed-off-by: Neil Brown
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jul, 2007
4 commits
-
This is a patch that speeds up statfs. It is very simple - the "overhead"
calculation, which takes a huge amount of time for large filesystems, never
changes unless the size of the filesystem itself changes. That means we can
store it in memory and only recalculate if the filesystem has been resized
(almost never).It also fixes a minor problem that we never update the on-disk superblock free
blocks/inodes counts until the filesystem is unmounted. While not fatal, we
may as well update that on disk when we have the information, and it makes
things like debugfs and dumpe2fs report a bit more accurate info.Signed-off-by: Badari Pulavarty
Signed-off-by: Andreas Dilger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix error handling in ext4_create_journal according to kernel conventions.
Signed-off-by: Borislav Petkov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ext4_orphan_add() and ext4_orphan_del() functions lock sb->s_lock with a
transaction started with ext4_mark_recovery_complete() waits for a transaction
holding sb->s_lock, thus leading to a possible deadlock. At the moment we
call ext4_mark_recovery_complete() from ext4_remount() we have done all the
work needed for remounting and thus we are safe to drop sb->s_lock before we
wait for transactions to commit. Note that at this moment we are still
guarded by s_umount lock against other remounts/umounts.Signed-off-by: Jan Kara
Cc: Eric Sandeen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Customers claims to ext3-related errors, investigation showed that ext3
orphan list has been corrupted and have the reference to non-ext3 inode.
The following debug helps to understand the reasons of this issue.[akpm@linux-foundation.org: update for print_hex_dump() changes]
Signed-off-by: Vasily Averin
Cc: "Randy.Dunlap"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jun, 2007
1 commit
-
Replace a lot of spaces with tabs
Signed-off-by: Dave Kleikamp
Signed-off-by: "Theodore Ts'o"
17 May, 2007
1 commit
-
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter
Cc: David Howells
Cc: Jens Axboe
Cc: Steven French
Cc: Michael Halcrow
Cc: OGAWA Hirofumi
Cc: Miklos Szeredi
Cc: Steven Whitehouse
Cc: Roman Zippel
Cc: David Woodhouse
Cc: Dave Kleikamp
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Anton Altaparmakov
Cc: Mark Fasheh
Cc: Paul Mackerras
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: David Chinner
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 May, 2007
2 commits
-
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
been used in 6 years (so akpm says).find * -name \*.[ch] | xargs grep -l invalidate_bdev |
while read file; do
quilt add $file;
sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
doneSigned-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Feb, 2007
1 commit
-
This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".Compile tested with gcc & sparse.
Signed-off-by: Josef 'Jeff' Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Feb, 2007
2 commits
-
Fix insecure default behaviour reported by Tigran Aivazian: if an ext2 or
ext3 or ext4 filesystem is tuned to mount with "acl", but mounted by a
kernel built without ACL support, then umask was ignored when creating
inodes - though root or user has umask 022, touch creates files as 0666,
and mkdir creates directories as 0777.This appears to have worked right until 2.6.11, when a fix to the default
mode on symlinks (always 0777) assumed VFS applies umask: which it does,
unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in
s_flags according to s_mount_opt set according to def_mount_opts.We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test);
but other filesystems only set MS_POSIXACL when ACLs are configured. We
could fix this at another level; but it seems most robust to avoid setting
the s_mount_opt flag in the first place (at the expense of more ifdefs).Likewise don't set the XATTR_USER flag when built without XATTR support.
Signed-off-by: Hugh Dickins
Cc: Tigran Aivazian
Cc:
Cc: Andreas Gruenbacher
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In the rare case where we have skipped orphan inode processing due to a
readonly block device, and the block device subsequently changes back to
read-write, disallow a remount,rw transition of the filesystem when we have an
unprocessed orphan inodes as this would corrupt the list.Ideally we should process the orphan inode list during the remount, but that's
trickier, and this plugs the hole for now.Signed-off-by: Eric Sandeen
Cc: "Stephen C. Tweedie"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Dec, 2006
4 commits
-
If you do something like:
# touch foo
# tail -f foo &
# rm foo
#
#you'll panic, because ext3/4 tries to do orphan list processing on the
readonly snapshot device, and:kernel: journal commit I/O error
kernel: Assertion failure in journal_flush_Rsmp_e2f189ce() at journal.c:1356: "!journal->j_checkpoint_transactions"
kernel: Kernel panic: Fatal exceptionfor a truly readonly underlying device, it's reasonable and necessary
to just skip orphan list processing.Signed-off-by: Eric Sandeen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Update ext4_statfs to return an FSID that is a 64 bit XOR of the 128 bit
filesystem UUID as suggested by Andreas Dilger. See the following Bugzilla
entry for details:http://bugzilla.kernel.org/show_bug.cgi?id=136
Cc: Andreas Dilger
Cc: Stephen Tweedie
Signed-off-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
doneThe script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
SLAB_NOFS is an alias of GFP_NOFS.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds