07 Nov, 2008
3 commits
-
When initializing an uninitialized block group in ext4_new_inode(),
its block group checksum must be re-calculated. This fixes a race
when several threads try to allocate a new inode in an UNINIT'd group.

There is some question whether we need to be initializing the block
bitmap in ext4_new_inode() at all, but for now, if we are going to
init the block group, let's eliminate the race.

Signed-off-by: Frederic Bohe
Signed-off-by: "Theodore Ts'o"
-
We need to make sure we mark the buffer_heads as dirty and uptodate
so that block_write_full_page() writes them correctly.

This fixes mmap corruptions that can occur in low memory situations.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o"
-
This fixes a 2.6.27 regression which was introduced in commit a02908f1.
We weren't passing the chunk parameter down to the two subfunctions,
ext4_indirect_trans_blocks() and ext4_ext_index_trans_blocks(), with
the result that we massively overestimated the amount of credits needed
by ext4_da_writepages, especially in the non-extents case. This causes
failures especially on /boot partitions, which tend to be small and
non-extent using since GRUB doesn't handle extents.

This patch fixes the bug reported by Joseph Fannin at:
http://bugzilla.kernel.org/show_bug.cgi?id=11964

Signed-off-by: "Theodore Ts'o"
04 Nov, 2008
3 commits
-
In ext4_sync_fs, we only wait for a commit to finish if we started it,
but there may be one already in progress which will not be synced.

In the case of a data=ordered umount with pending long symlinks which
are delayed due to a long list of other I/O on the backing block
device, this causes the buffer associated with the long symlinks to
not be moved to the inode dirty list in the second phase of
fsync_super. Then, before they can be dirtied again, kjournald exits,
seeing the UMOUNT flag and the dirty pages are never written to the
backing block device, causing long symlink corruption and exposing new
or previously freed block data to userspace.

To ensure all commits are synced, we flush all journal commits now
when sync_fs'ing ext4.

Signed-off-by: Arthur Jones
Signed-off-by: Andrew Morton
Signed-off-by: "Theodore Ts'o"
Cc: Eric Sandeen
Cc:
-
Use le16_to_cpu to read the s_reserved_gdt_blocks values
from super block.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o"
-
If we try to free a block which is already freed, the code was
returning without first unlocking the group.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o"
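
The fix described above follows the usual single-exit pattern for error
paths. A minimal userspace sketch, assuming a toy lock counter in place of
the real group lock; struct group, group_lock(), group_unlock() and
free_block() are hypothetical names for illustration, not the ext4 code:

```c
#include <assert.h>

/* Hypothetical stand-in for a block group: a toy lock and a 64-block bitmap. */
struct group {
    int lock_depth;            /* 0 = unlocked, 1 = locked */
    unsigned char bitmap[8];   /* one bit per block, 1 = in use */
};

static void group_lock(struct group *g)
{
    assert(g->lock_depth == 0);   /* taking it twice would deadlock */
    g->lock_depth = 1;
}

static void group_unlock(struct group *g)
{
    assert(g->lock_depth == 1);
    g->lock_depth = 0;
}

/* Free one block. Returns 0 on success, -1 if the block was already free. */
static int free_block(struct group *g, unsigned int bit)
{
    unsigned char mask = (unsigned char)(1u << (bit % 8));
    int ret = 0;

    assert(bit < 64);
    group_lock(g);

    if (!(g->bitmap[bit / 8] & mask)) {
        ret = -1;            /* double free: report the error ... */
        goto out_unlock;     /* ... but still drop the lock */
    }
    g->bitmap[bit / 8] &= (unsigned char)~mask;

out_unlock:
    group_unlock(g);
    return ret;
}
```

Routing every return through out_unlock means a later caller can always
take the lock again, even after a double free was detected.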
28 Oct, 2008
3 commits
-
As reported by Eric Paris, the capable() check in ext4_has_free_blocks()
sometimes causes SELinux denials.

We can rearrange the logic so that we only try to use the root-reserved
blocks when necessary, and even then we can move the capable() test
to last, to avoid the check most of the time.

Signed-off-by: Eric Sandeen
Reviewed-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"
-
Mingming pointed out that ext4_claim_free_blocks & ext4_has_free_blocks
are largely cut & pasted; they can be collapsed/merged as follows.

Signed-off-by: Eric Sandeen
Reviewed-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"
-
Vegard Nossum reported a bug which accesses freed memory (found via
kmemcheck). When the journal has been aborted, ext4_put_super() calls
ext4_abort() after freeing the journal_t object, and then ext4_abort()
accesses it. This patch fixes it.

Signed-off-by: Hidehiro Kawai
Acked-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
26 Oct, 2008
1 commit
-
Fix a regression caused by commit d0156417, "ext4: fix ext4_dx_readdir
hash collision handling", where deleting files in a large directory
(requiring more than one getdents system call) results in some
filenames being returned twice. This was caused by a failure to
update info->curr_hash and info->curr_minor_hash, so that if the
directory had been modified since the last getdents() system call
(as would be the case if the user is running "rm -r" or "git clean"),
a directory entry would get returned twice to userspace.

Signed-off-by: "Theodore Ts'o"

This patch fixes the bug reported by Markus Trippelsdorf at:
http://bugzilla.kernel.org/show_bug.cgi?id=11844

Signed-off-by: "Theodore Ts'o"
Tested-by: Markus Trippelsdorf
24 Oct, 2008
2 commits
-
Signed-off-by: Christoph Hellwig
[ All users removed in "switch all filesystems over to d_obtain_alias",
aka commit 440037287c5ebb07033ab927ca16bb68c291d309 ]
Signed-off-by: Linus Torvalds
-
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
[PATCH] kill the rest of struct file propagation in block ioctls
[PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
[PATCH] get rid of blkdev_locked_ioctl()
[PATCH] get rid of blkdev_driver_ioctl()
[PATCH] sanitize blkdev_get() and friends
[PATCH] remember mode of reiserfs journal
[PATCH] propagate mode through swsusp_close()
[PATCH] propagate mode through open_bdev_excl/close_bdev_excl
[PATCH] pass fmode_t to blkdev_put()
[PATCH] kill the unused bsize on the send side of /dev/loop
[PATCH] trim file propagation in block/compat_ioctl.c
[PATCH] end of methods switch: remove the old ones
[PATCH] switch sr
[PATCH] switch sd
[PATCH] switch ide-scsi
[PATCH] switch tape_block
[PATCH] switch dcssblk
[PATCH] switch dasd
[PATCH] switch mtd_blkdevs
[PATCH] switch mmc
...
23 Oct, 2008
2 commits
-
Switch all users of d_alloc_anon to d_obtain_alias.
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
21 Oct, 2008
2 commits
-
Signed-off-by: Al Viro
-
Use fs/*/Kconfig more, which is good because everything related to one
filesystem is in one place and fs/Kconfig is quite fat.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds
18 Oct, 2008
1 commit
-
Signed-off-by: Manish Katiyar
Signed-off-by: "Theodore Ts'o"
17 Oct, 2008
4 commits
-
If the HUGE_FILE feature flag is not set, don't allow the creation of
large files, instead of automatically enabling the feature flag.
Recent versions of mke2fs will set the HUGE_FILE flag automatically
anyway for ext4 filesystems.

Signed-off-by: "Theodore Ts'o"
-
The multiblock allocator needs to be able to release blocks (and issue
a blkdev discard request) when the transaction which freed those
blocks is committed. Previously this was done via a polling mechanism
when blocks are allocated or freed. A much better way of doing things
is to create a jbd2 callback function and attach the list of blocks
to be freed directly to the transaction structure.

Signed-off-by: "Theodore Ts'o"
-
These mount options don't actually do anything any more, so remove
them.

Signed-off-by: "Theodore Ts'o"
-
There are some newlines missing in ext4_check_descriptors, which
cause the printk level to be printed out when the next printk call
is made:

[ 778.847265] EXT4-fs: ext4_check_descriptors: Block bitmap for group 0
not in group (block 1509949442)!EXT4-fs: group descriptors corrupted!
[ 802.646630] EXT4-fs: ext4_check_descriptors: Inode bitmap for group 0
not in group (block 9043971)!EXT4-fs: group descriptors corrupted!

Signed-off-by: Eric Sesterhenn
Signed-off-by: "Theodore Ts'o"
16 Oct, 2008
3 commits
-
The range_cyclic writeback mode uses the address_space writeback_index
as the start index for writeback. With delayed allocation we were
updating writeback_index wrongly, resulting in highly fragmented files.
This patch reduces the number of extents from 4000 to 27 for a
3GB file.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Theodore Ts'o
-
Let the block device know when unused blocks can be discarded, using
the new sb_issue_discard() interface.

Signed-off-by: "Theodore Ts'o"
-
With this patch we track the blocks freed during a transaction using a
red-black tree. We also make sure contiguous freed blocks are collected
in one node in the tree.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Theodore Ts'o
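
The coalescing of contiguous freed blocks can be illustrated in userspace.
This is only a sketch: a small sorted array stands in for the red-black
tree the patch uses, and struct free_extent, struct free_list and
record_free() are hypothetical names, not the mballoc data structures:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EXTENTS 16

/* One run of freed blocks: [start, start + count). */
struct free_extent {
    unsigned long start;
    unsigned long count;
};

/* Sorted by start, non-overlapping. A sorted array replaces the rbtree here. */
struct free_list {
    struct free_extent ext[MAX_EXTENTS];
    size_t n;
};

/* Record a freed run. Returns 0 on success, -1 if the table is full. */
static int record_free(struct free_list *fl, unsigned long start,
                       unsigned long count)
{
    size_t i = 0, j;

    assert(count > 0);

    /* Find the first extent that starts at or after the new run. */
    while (i < fl->n && fl->ext[i].start < start)
        i++;

    /* Left neighbour ends exactly where the new run begins: grow it. */
    if (i > 0 && fl->ext[i - 1].start + fl->ext[i - 1].count == start) {
        fl->ext[i - 1].count += count;
        /* The grown extent may now touch the right neighbour as well. */
        if (i < fl->n &&
            fl->ext[i - 1].start + fl->ext[i - 1].count == fl->ext[i].start) {
            fl->ext[i - 1].count += fl->ext[i].count;
            for (j = i; j + 1 < fl->n; j++)
                fl->ext[j] = fl->ext[j + 1];
            fl->n--;
        }
        return 0;
    }

    /* New run ends exactly where the right neighbour begins: extend it down. */
    if (i < fl->n && start + count == fl->ext[i].start) {
        fl->ext[i].start = start;
        fl->ext[i].count += count;
        return 0;
    }

    /* No neighbour to merge with: insert a new entry at position i. */
    if (fl->n == MAX_EXTENTS)
        return -1;
    for (j = fl->n; j > i; j--)
        fl->ext[j] = fl->ext[j - 1];
    fl->ext[i].start = start;
    fl->ext[i].count = count;
    fl->n++;
    return 0;
}
```

Freeing blocks 10-11, then 20-24, then 12-19 leaves a single entry covering
blocks 10-24, which is the "one node" behaviour the commit message describes.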
14 Oct, 2008
3 commits
-
This enables us to drop the range_cont writeback mode
use from ext4_da_writepages.

Signed-off-by: Aneesh Kumar K.V
-
We should use kmem_cache_free to free memory allocated
via kmem_cache_alloc.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Theodore Ts'o
-
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where it's required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.

This was posted for review some time ago and I believe it's been in -mm
since then.

Signed-off-by: Steven Whitehouse
Cc: Alexander Viro
Signed-off-by: Linus Torvalds
13 Oct, 2008
1 commit
-
fs/ext4/super.c: In function 'ext4_fill_super':
fs/ext4/super.c:2226: error: 'ext4_ui_proc_fops' undeclared (first use
in this function)
fs/ext4/super.c:2226: error: (Each undeclared identifier is reported
only once
fs/ext4/super.c:2226: error: for each function it appears in.)

Signed-off-by: Alexander Beregalov
Signed-off-by: Theodore Ts'o
11 Oct, 2008
5 commits
-
We need to make sure we don't reuse the data blocks released
during the transaction until the transaction commits. We force
this mode only for ordered and journalled mode. Writeback mode
already doesn't provide data consistency.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Theodore Ts'o
-
During filesystem recovery we may be doing a truncate
which expects some of the mballoc data structures to
be initialized. So do ext4_mb_init before recovery.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Theodore Ts'o
-
If the journal doesn't abort when it gets an IO error in file data
blocks, the file data corruption will spread silently. Because
most applications and commands do buffered writes without fsync(),
they don't notice the IO error. It's scary for mission critical
systems. On the other hand, if the journal aborts whenever it gets
an IO error in file data blocks, the system will easily become
inoperable. So this patch introduces a filesystem option to
determine whether it aborts the journal or just calls printk() when
it gets an IO error in file data.

If you mount an ext4 fs with the data_err=abort option, it aborts on file
data write error. If you mount it with data_err=ignore, it doesn't
abort, it just calls printk(). data_err=ignore is the default.

Here is the corresponding patch of the ext3 version:
http://kerneltrap.org/mailarchive/linux-kernel/2008/9/9/3239374

Signed-off-by: Hidehiro Kawai
Signed-off-by: Theodore Ts'o
-
If the journal has aborted due to a checkpointing failure, we
have to keep the contents of the journal space. Otherwise, the
filesystem will lose uncheckpointed metadata completely and
become inconsistent. To avoid this, we need to keep the needs_recovery
flag if the checkpoint has failed.

With this patch, ext4_put_super() detects a checkpointing failure
from the return value of journal_destroy(), then it invokes
ext4_abort() to make the filesystem read only and keep the
needs_recovery flag. Errors from jbd2_journal_flush() are also
handled by this patch in some places.

Signed-off-by: Hidehiro Kawai
Signed-off-by: Theodore Ts'o
-
The ext4 filesystem is getting stable enough that it's time to drop
the "dev" prefix. Also remove the requirement for the TEST_FILESYS
flag.

Signed-off-by: "Theodore Ts'o"
10 Oct, 2008
3 commits
-
This fixes a bug which caused on-line resizing of filesystems with a
1k blocksize to fail. The root cause of this bug was the fact that if
an uninitialized bitmap block gets read in by userspace (which
e2fsprogs does try to avoid, but can happen when the blocksize is less
than the pagesize and an adjacent block is read into memory)
ext4_read_block_bitmap() was erroneously depending on the buffer
uptodate flag to decide whether it needed to initialize the bitmap
block in memory --- i.e., to set the standard set of blocks in use by
a block group (superblock, bitmaps, inode table, etc.). Essentially,
ext4_read_block_bitmap() assumed it was the only routine that might
try to read a block containing a block bitmap, which is simply not
true.

To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap()
must always initialize uninitialized bitmap blocks. Once a block or
inode is allocated out of that bitmap, it will be marked as
initialized in the block group descriptor, so in general this won't
result in any extra unnecessary work.

Signed-off-by: Frederic Bohe
Signed-off-by: "Theodore Ts'o"
-
Signed-off-by: "Theodore Ts'o"
-
With modern hard drives, reading 64k takes roughly the same time as
reading a 4k block. So request readahead for adjacent inode table
blocks to reduce the time it takes when iterating over directories
(especially when doing this in htree sort order) in a cold cache case.
With this patch, the time it takes to run "git status" on a kernel
tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches"
is reduced by 21%.

Signed-off-by: "Theodore Ts'o"
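
One way to pick the adjacent inode table blocks to read ahead is to round
the wanted block down to an aligned window and clamp the window to the
table. The sketch below assumes a power-of-two window (16 blocks would give
64k with 4k blocks); struct ra_window and inode_table_readahead() are
hypothetical names, and the commit message does not spell out the exact
policy used by the patch:

```c
#include <assert.h>

/* Inclusive range of block numbers to submit for readahead. */
struct ra_window {
    unsigned long first;
    unsigned long last;
};

/*
 * table_start/table_blocks describe the inode table of one block group,
 * block is the table block we actually need, ra_blocks is the window size.
 */
static struct ra_window inode_table_readahead(unsigned long table_start,
                                              unsigned long table_blocks,
                                              unsigned long block,
                                              unsigned long ra_blocks)
{
    struct ra_window w;
    unsigned long table_end;

    assert(table_blocks > 0);
    table_end = table_start + table_blocks - 1;

    /* The mask trick below only works for a power-of-two window. */
    assert(ra_blocks != 0 && (ra_blocks & (ra_blocks - 1)) == 0);
    assert(block >= table_start && block <= table_end);

    w.first = block & ~(ra_blocks - 1);   /* round down to window boundary */
    w.last = w.first + ra_blocks - 1;

    /* Never read outside this group's inode table. */
    if (w.first < table_start)
        w.first = table_start;
    if (w.last > table_end)
        w.last = table_end;
    return w;
}
```

For a table covering blocks 1000-1099 and a 16-block window, a request for
block 1037 yields the window 1024-1039, while a request for block 1002 is
clamped to 1000-1007.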
09 Oct, 2008
1 commit
-
ext4_xattr_set_handle() eventually ends up calling
ext4_mark_inode_dirty() which tries to expand the inode by shifting
the EAs. This leads to the xattr_sem being downed again, causing
a deadlock.

This patch makes sure that if ext4_xattr_set_handle() is in the
call-chain, ext4_mark_inode_dirty() will not expand the inode.

Signed-off-by: Kalpak Shah
Signed-off-by: "Theodore Ts'o"
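
The re-entry problem and the guard against it can be modelled in userspace.
This is a toy sketch under stated assumptions: a depth counter stands in
for xattr_sem, and struct inode_state, the no_expand flag, xattr_set() and
mark_inode_dirty() are hypothetical names rather than the kernel's own:

```c
#include <assert.h>
#include <stdbool.h>

struct inode_state {
    int  sem_depth;    /* toy xattr semaphore; 2 would mean a deadlock */
    bool no_expand;    /* hypothetical flag: expansion is forbidden */
    int  expansions;   /* how many times the inode was expanded */
};

/* Expanding the inode shifts the EAs, so it needs the xattr semaphore. */
static void mark_inode_dirty(struct inode_state *st)
{
    if (st->no_expand)
        return;                    /* caller already holds the semaphore */

    st->sem_depth++;
    assert(st->sem_depth == 1);    /* a second acquisition would deadlock */
    st->expansions++;
    st->sem_depth--;
}

/* Setting an xattr holds the semaphore and then dirties the inode. */
static void xattr_set(struct inode_state *st)
{
    bool saved = st->no_expand;

    st->sem_depth++;
    assert(st->sem_depth == 1);
    st->no_expand = true;          /* suppress expansion in the call-chain */
    mark_inode_dirty(st);
    st->no_expand = saved;         /* restore the caller's setting */
    st->sem_depth--;
}
```

Without the flag, mark_inode_dirty() would try to take the semaphore while
xattr_set() still holds it, which is the deadlock described above.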
07 Oct, 2008
2 commits
-
While reading code I noticed that ext4_put_super() dirties the
superblock bh twice. It is always done in ext4_commit_super()
too. Remove the redundant dirty operation.
Should be a nop semantically.

Signed-off-by: Andi Kleen
-
ext4_ext_walk_space() was reinstated to be used for iterating over file
extents with a callback; it is used by the ext4 fiemap implementation.

Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"
Cc: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
06 Oct, 2008
1 commit
-
These debugging markers are designed to debug problems such as the
random filesystem latency problems reported by Arjan.

Signed-off-by: "Theodore Ts'o"