20 Jul, 2007
4 commits
-
Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).Signed-off-by: Paul Mundt
-
Looking at the current linus-git tree jbd_debug() define in
include/linux/jbd2.hextern u8 journal_enable_debug;
#define jbd_debug(n, f, a...) \
do { \
if ((n) fs/ext4/inode.c: In function âext4_write_inodeâ:
> fs/ext4/inode.c:2906: warning: comparison is always true due to limited
> range of data type
>
> fs/jbd2/recovery.c: In function âjbd2_journal_recoverâ:
> fs/jbd2/recovery.c:254: warning: comparison is always true due to
> limited range of data type
> fs/jbd2/recovery.c:257: warning: comparison is always true due to
> limited range of data type
>
> fs/jbd2/recovery.c: In function âjbd2_journal_skip_recoveryâ:
> fs/jbd2/recovery.c:301: warning: comparison is always true due to
> limited range of data type
>
Noticed all warnings are occurs when the debug level is 0. Then found
the "jbd2: Move jbd2-debug file to debugfs" patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990bchanged the jbd2_journal_enable_debug from int type to u8, makes the
jbd_debug comparision is always true when the debugging level is 0. Thus
the compile warning occurs.Thought about changing the jbd2_journal_enable_debug data type back to
int, but can't, because the jbd2-debug is moved to debug fs, where
calling debugfs_create_u8() to create the debugfs entry needs the value
to be u8 type.Even if we changed the data type back to int, the code is still buggy,
kernel should not print jbd2 debug message if the
jbd2_journal_enable_debug is set to 0. But this is not the case.The fix is change the level of debugging to 1. The same should fixed in
ext3/JBD, but currently ext3 jbd-debug via /proc fs is broken, so we
probably should fix it all together.Signed-off-by: Mingming Cao
Cc: Jeff Garzik
Cc: Theodore Tso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Split ondemand readahead interface into two functions. I think this makes it
a little clearer for non-readahead experts (like Rusty).Internally they both call ondemand_readahead(), but the page argument is
changed to an obvious boolean flag.Signed-off-by: Rusty Russell
Signed-off-by: Fengguang Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Convert ext3/ext4 dir reads to use on-demand readahead.
Readahead for dirs operates _not_ on file level, but on blockdev level. This
makes a difference when the data blocks are not continuous. And the read
routine is somehow opaque: there's no handy info about the status of current
page. So a simplified call scheme is employed: to call into readahead
whenever the current page falls out of readahead windows.Signed-off-by: Fengguang Wu
Cc: Steven Pratt
Cc: Ram Pai
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Jul, 2007
18 commits
-
Use the EXT_LAST_INDEX macro; that's what it's there for.
Clean up ext4_ext_ext_grow_indepth() so the correct EXT_FIRST_INDEX or
EXT_FIRST_MACRO is used as necessary. The two macros are equivalent, so
the C will collapse the if statement out, but it makes the code much
more readable.Signed-off-by: Dmitry Monakhov
Acked-by: Alex Tomas
Signed-off-by: Dave Kleikamp
Singed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o" -
Signed-off-by: Dmitry Monakhov
Acked-by: Alex Tomas
Signed-off-by: Dave Kleikamp
Signed-off-by: "Theodore Ts'o" -
ext4_change_inode_journal_flag() is only called from one location:
ext4_ioctl(EXT3_IOC_SETFLAGS). That ioctl case already has a IS_RDONLY()
call in it so this one is superfluous.Signed-off-by: Dave Hansen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Dave Kleikamp
Signed-off-by: "Theodore Ts'o" -
Replace (n & (n-1)) in the context of power of 2 checks with
is_power_of_2()Signed-off-by: Vignesh Babu
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Dave Kleikamp
Signed-off-by: "Theodore Ts'o" -
Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o" -
This patch adds support to ext4 for allowing more than 65000
subdirectories. Currently the maximum number of subdirectories is capped
at 32000.If we exceed 65000 subdirectories in an htree directory it sets the
inode link count to 1 and no longer counts subdirectories. The
directory link count is not actually used when determining if a
directory is empty, as that only counts subdirectories and not regular
files that might be in there.A EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if
the subdir count for any directory crosses 65000. A later fsck will clear
EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there are no longer any directory
with >65000 subdirs.Signed-off-by: Andreas Dilger
Signed-off-by: Kalpak Shah
Signed-off-by: "Theodore Ts'o" -
We need to make sure that existing ext3 filesystems can also avail the
new fields that have been added to the ext4 inode. We use
s_want_extra_isize and s_min_extra_isize to decide by how much we should
expand the inode. If EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set
then we expand the inode by max(s_want_extra_isize, s_min_extra_isize ,
sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is
still an open question about whether users should be able to set
s_*_extra_isize smaller than the known fields or not.This patch also adds the functionality to expand inodes to include the
newly added fields. We start by trying to expand by s_want_extra_isize
bytes and if its fails we try to expand by s_min_extra_isize bytes. This
is done by changing the i_extra_isize if enough space is available in
the inode and no EAs are present. If EAs are present and there is enough
space in the inode then the EAs in the inode are shifted to make space.
If enough space is not available in the inode due to the EAs then 1 or
more EAs are shifted to the external EA block. In the worst case when
even the external EA block does not have enough space we inform the user
that some EA would need to be deleted or s_min_extra_isize would have to
be reduced.Signed-off-by: Andreas Dilger
Signed-off-by: Kalpak Shah
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o" -
This patch adds nanosecond timestamps for ext4. This involves adding
*time_extra fields to the ext4_inode to extend the timestamps to
64-bits. Creation time is also added by this patch.These extended fields will fit into an inode if the filesystem was
formatted with large inodes (-I 256 or larger) and there are currently
no EAs consuming all of the available space. For new inodes we always
reserve enough space for the kernel's known extended fields, but for
inodes created with an old kernel this might not have been the case. So
this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature
flag(ro-compat so that older kernels can't create inodes with a smaller
extra_isize). which indicates if the fields fitting inside
s_min_extra_isize are available or not. If the expansion of inodes if
unsuccessful then this feature will be disabled. This feature is only
enabled if requested by the sysadmin.None of the extended inode fields is critical for correct filesystem
operation.Signed-off-by: Andreas Dilger
Signed-off-by: Kalpak Shah
Signed-off-by: Eric Sandeen
Signed-off-by: Dave Kleikamp
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o" -
When the JBD code was forked to create the new JBD2 code base, the
references to CONFIG_JBD_DEBUG where never changed to
CONFIG_JBD2_DEBUG. This patch fixes that.Signed-off-by: Jose R. Santos
Signed-off-by: "Theodore Ts'o" -
Set the journals JBD2_FEATURE_INCOMPAT_64BIT on devices with more
than 32bit block sizes during mount time. This ensure proper record
lenth when writing to the journal.Signed-off-by: Jose R. Santos
Signed-off-by: Andreas Dilger
Signed-off-by: Mingming Cao
Signed-off-by: Laurent Vivier
Signed-off-by: "Theodore Ts'o" -
Add more run-time checking of extent header fields and remove BUG_ON
checks so we don't panic the kernel just because the on-disk filesystem
is corrupted.Signed-off-by: Alex Tomas
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o" -
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ext4-specific i_flags. Quota code changes these flags on quota files
(to make it harder for sysadmin to screw himself) and these changes were
not correctly propagated into the filesystem.(This is a forward port patch from ext3)
Signed-off-by: Jan Kara
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o" -
Turn on extents feature by default in ext4 filesystem, to get wider
testing of extents feature in ext4dev. This can be disabled using
-o noextents.Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o" -
This change was suggested by Andreas Dilger.
This patch changes the EXT_MAX_LEN value and extent code which marks/checks
uninitialized extents. With this change it will be possible to have
initialized extents with 2^15 blocks (earlier the max blocks we could have
was 2^15 - 1). This way we can have better extent-to-block alignment.
Now, maximum number of blocks we can have in an initialized extent is 2^15
and in an uninitialized extent is 2^15 - 1.Signed-off-by: Amit Arora
-
This patch adds write support to the uninitialized extents that get
created when a preallocation is done using fallocate(). It takes care of
splitting the extents into multiple (upto three) extents and merging the
new split extents with neighbouring ones, if possible.Signed-off-by: Amit Arora
-
This patch implements ->fallocate() inode operation in ext4. With this
patch users of ext4 file systems will be able to use fallocate() system
call for persistent preallocation. Current implementation only supports
preallocation for regular files (directories not supported as of date)
with extent maps. This patch does not support block-mapped files currently.
Only FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes are being supported as of
now.Signed-off-by: Amit Arora
-
Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
users to it. This is done because we want to avoid bugs in the future
where we check for only effective fsuid of the current task against a
file's owning uid, without simultaneously checking for CAP_FOWNER as
well, thus violating its semantics.
[ XFS uses special macros and structures, and in general looked ...
untouchable, so we leave it alone -- but it has been looked over. ]The (current->fsuid != inode->i_uid) check in generic_permission() and
exec_permission_lite() is left alone, because those operations are
covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.Signed-off-by: Satyam Sharma
Cc: Al Viro
Acked-by: Serge E. Hallyn
Signed-off-by: Linus Torvalds -
currently the export_operation structure and helpers related to it are in
fs.h. fs.h is already far too large and there are very few places needing the
export bits, so split them off into a separate header.[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: Christoph Hellwig
Signed-off-by: Neil Brown
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jul, 2007
6 commits
-
This is a patch that speeds up statfs. It is very simple - the "overhead"
calculation, which takes a huge amount of time for large filesystems, never
changes unless the size of the filesystem itself changes. That means we can
store it in memory and only recalculate if the filesystem has been resized
(almost never).It also fixes a minor problem that we never update the on-disk superblock free
blocks/inodes counts until the filesystem is unmounted. While not fatal, we
may as well update that on disk when we have the information, and it makes
things like debugfs and dumpe2fs report a bit more accurate info.Signed-off-by: Badari Pulavarty
Signed-off-by: Andreas Dilger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix error handling in ext4_create_journal according to kernel conventions.
Signed-off-by: Borislav Petkov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In ext4_new_blocks(), one of two ext4_block_bitmap() calls should be
ext4_inode_bitmap() call. It is not harmful in normal processing, but it
should be fixed.Signed-off-by: Toshiyuki Okajima
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ext4_orphan_add() and ext4_orphan_del() functions lock sb->s_lock with a
transaction started with ext4_mark_recovery_complete() waits for a transaction
holding sb->s_lock, thus leading to a possible deadlock. At the moment we
call ext4_mark_recovery_complete() from ext4_remount() we have done all the
work needed for remounting and thus we are safe to drop sb->s_lock before we
wait for transactions to commit. Note that at this moment we are still
guarded by s_umount lock against other remounts/umounts.Signed-off-by: Jan Kara
Cc: Eric Sandeen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
After ext3 orphan list check has been added into ext3_destroy_inode()
(please see my previous patch) the following situation has been detected:EXT3-fs warning (device sda6): ext3_unlink: Deleting nonexistent file (37901290), 0
Inode 00000101a15b7840: orphan list check failed!
00000773 6f665f00 74616d72 00000573 65725f00 06737270 66000000 616d726f
...
Call Trace: [] ext3_destroy_inode+0x79/0x90
[] sys_unlink+0x126/0x1a0
[] error_exit+0x0/0x81
[] system_call+0x7e/0x83First messages said that unlinked inode has i_nlink=0, then ext3_unlink()
adds this inode into orphan list.Second message means that this inode has not been removed from orphan list.
Inode dump has showed that i_fop = &bad_file_ops and it can be set in
make_bad_inode() only. Then I've found that ext3_read_inode() can call
make_bad_inode() without any error/warning messages, for example in the
following case:...
if (inode->i_nlink == 0) {
if (inode->i_mode == 0 ||
!(EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ORPHAN_FS)) {
/* this inode is deleted */
brelse (bh);
goto bad_inode;
...Bad inode can live some time, ext3_unlink can add it to orphan list, but
ext3_delete_inode() do not deleted this inode from orphan list. As result
we can have orphan list corruption detected in ext3_destroy_inode().However it is not clear for me how to fix this issue correctly.
As far as i see is_bad_inode() is called after iget() in all places
excluding ext3_lookup() and ext3_get_parent(). I believe it makes sense to
add bad inode check to these functions too and call iput if bad inode
detected.Signed-off-by: Vasily Averin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Customers claims to ext3-related errors, investigation showed that ext3
orphan list has been corrupted and have the reference to non-ext3 inode.
The following debug helps to understand the reasons of this issue.[akpm@linux-foundation.org: update for print_hex_dump() changes]
Signed-off-by: Vasily Averin
Cc: "Randy.Dunlap"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Jul, 2007
1 commit
-
They can use generic_file_splice_read() instead. Since sys_sendfile() now
prefers that, there should be no change in behaviour.Signed-off-by: Jens Axboe
24 Jun, 2007
1 commit
-
One of error path in ext4_read_inode() leaks bh since brelse is forgoten.
Signed-off-by: Kirill Korotaev
Acked-by: Vasily Averin
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jun, 2007
4 commits
-
we should free just the allocated blocks.
Signed-off-by: Alex Tomas
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o" -
This patch adds a check for overlap of extents and cuts short the
new extent to be inserted, if there is a chance of overlap.Signed-off-by: Amit Arora
Signed-off-by: "Theodore Ts'o" -
Signed-Off-By: Mingming Cao
Signed-off-by: "Theodore Ts'o" -
Replace a lot of spaces with tabs
Signed-off-by: Dave Kleikamp
Signed-off-by: "Theodore Ts'o"
17 May, 2007
1 commit
-
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter
Cc: David Howells
Cc: Jens Axboe
Cc: Steven French
Cc: Michael Halcrow
Cc: OGAWA Hirofumi
Cc: Miklos Szeredi
Cc: Steven Whitehouse
Cc: Roman Zippel
Cc: David Woodhouse
Cc: Dave Kleikamp
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Anton Altaparmakov
Cc: Mark Fasheh
Cc: Paul Mackerras
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: David Chinner
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 May, 2007
3 commits
-
Remove includes of where it is not used/needed.
Suggested by Al Viro.Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
- ext3_dx_find_entry() exit with out setting proper error pointer
- do_split() exit with out setting proper error pointer
it is realy painful because many callers contain folowing code:de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
if (!(de))
return retval;
<<< WOW retval wasn't changed by do_split(), so caller failed
<<< but return SUCCESS :)- Rearrange do_split() error path. Current error path is realy ugly, all
this up and down jump stuff doesn't make code easy to understand.[dmonakhov@sw.ru: fix annoying fake error messages]
Signed-off-by: Monakhov Dmitriy
Cc: Andreas Dilger
Cc: Theodore Ts'o
Signed-off-by: Monakhov Dmitriy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Taken from http://bugzilla.kernel.org/show_bug.cgi?id=5079
signed long ranges from -2.147.483.648 to 2.147.483.647 on x86 32bit
10000011110110100100111110111101 .. -2,082,844,739
10000011110110100100111110111101 .. 2,212,122,557
Cc:Andreas says:
This patch is now treating timestamps with the high bit set as negative
times (before Jan 1, 1970). This means we lose 1/2 of the possible range
of timestamps (lopping off 68 years before unix timestamp overflow -
now only 30 years away :-) to handle the extremely rare case of setting
timestamps into the distant past.If we are only interested in fixing the underflow case, we could just
limit the values to 0 instead of storing negative values. At worst this
will skew the timestamp by a few hours for timezones in the far east
(files would still show Jan 1, 1970 in "ls -l" output).That said, it seems 32-bit systems (mine at least) allow files to be set
into the past (01/01/1907 works fine) so it seems this patch is bringing
the x86_64 behaviour into sync with other kernels.On the plus side, we have a patch that is ready to add nanosecond timestamps
to ext3 and as an added bonus adds 2 high bits to the on-disk timestamp so
this extends the maximum date to 2242.Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 May, 2007
2 commits
-
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
been used in 6 years (so akpm says).find * -name \*.[ch] | xargs grep -l invalidate_bdev |
while read file; do
quilt add $file;
sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
doneSigned-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds