12 Mar, 2013
1 commit
-
A user with an 8TB+ file system and a very large flexbg
size (> 65536) could cause the atomic_t used in the struct flex_groups
to overflow. This was detected by the PaX security patchset:
http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551

This bug was introduced in commit 9f24e4208f7e, so it's been around
since 2.6.30. :-(

Fix this by using an atomic64_t for struct orlov_stats's
free_clusters.

Signed-off-by: "Theodore Ts'o"
Reviewed-by: Lukas Czerner
Cc: stable@vger.kernel.org
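
The arithmetic behind this overflow can be sketched in userspace C. This is only an illustration, not kernel code: the helper names are hypothetical, and it assumes 32768 blocks per group (the usual value with 4 KiB blocks) and one cluster per block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB blocks, hence 32768 blocks per group. */
#define BLOCKS_PER_GROUP 32768ULL

/* Upper bound on the free clusters one flex group can hold
 * (hypothetical helper; assumes one cluster per block). */
static uint64_t flexbg_max_free_clusters(uint64_t groups_per_flexbg)
{
    return groups_per_flexbg * BLOCKS_PER_GROUP;
}

/* True if the count still fits in a signed 32-bit counter such as atomic_t. */
static bool fits_in_s32(uint64_t count)
{
    return count <= (uint64_t)INT32_MAX;
}
```

Under these assumptions, a flex group spanning 65536 block groups can hold up to 2^31 free clusters (8 TiB of 4 KiB blocks), which is one more than a signed 32-bit counter can represent; a 64-bit counter has ample headroom.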
15 Feb, 2013
1 commit
-
Some messages related to a WARN_ON(1) were printed using
KERN_NOTICE. Use KERN_WARNING or ext4_warning() instead so that
context related to the WARN_ON() is printed at the same printk warning
level (and log files, etc.)

Signed-off-by: "Theodore Ts'o"
10 Feb, 2013
1 commit
-
In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle
until the inode has been successfully allocated. In order to do this,
we need to start the handle in ext4_new_inode(). So create a new
variant of this function, ext4_new_inode_start_handle(), so the handle
can be created at the last possible minute, before we need to modify
the inode allocation bitmap block.

Signed-off-by: "Theodore Ts'o"
09 Feb, 2013
1 commit
-
So we can better understand what bits of ext4 are responsible for
long-running jbd2 handles, use jbd2__journal_start() so we can pass
context information for logging purposes.

The recommended way for finding the longer-running handles is:

    T=/sys/kernel/debug/tracing
    EVENT=$T/events/jbd2/jbd2_handle_stats
    echo "interval > 5" > $EVENT/filter
    echo 1 > $EVENT/enable
    ./run-my-fs-benchmark
    cat $T/trace > /tmp/problem-handles

This will list handles that were active for longer than 20ms. Having
longer-running handles is bad, because a commit started at the wrong
time could stall for those 20+ milliseconds, which could delay an
fsync() or an O_SYNC operation. Here is an example line from the
trace file describing a handle which lived on for 311 jiffies, or over
1.2 seconds:

    postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
    tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
    dirtied_blocks 0

Signed-off-by: "Theodore Ts'o"
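
The filter value of 5 and the durations quoted in this message agree with each other if the kernel ticks at 250 Hz. That tick rate is an assumption, since the message does not state HZ. A minimal sketch of the conversion, using a hypothetical helper name:

```c
#include <stdint.h>

/* Convert a tick (jiffy) count to milliseconds for a given tick rate.
 * hz is the number of ticks per second and must be non-zero. */
static uint64_t ticks_to_ms(uint64_t ticks, uint64_t hz)
{
    return ticks * 1000ULL / hz;
}
```

With hz = 250, 5 ticks correspond to 20 ms and 311 ticks to 1244 ms, matching the "20ms" and "over 1.2 seconds" figures. On a kernel built with a different HZ, the filter threshold would need to be scaled accordingly.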
11 Dec, 2012
1 commit
-
Signed-off-by: Tao Ma
Signed-off-by: "Theodore Ts'o"
30 Nov, 2012
1 commit
-
Commit fa77dcfafeaa introduces block bitmap checksum calculation into
ext4_new_inode() in the case that block group was uninitialized.
However we brelse() the bitmap buffer before we attempt to checksum it
so we have no guarantee that the buffer is still there.

Fix this by releasing the buffer after the possible checksum
computation.

Signed-off-by: Lukas Czerner
Signed-off-by: "Theodore Ts'o"
Acked-by: Darrick J. Wong
Cc: stable@vger.kernel.org
29 Oct, 2012
1 commit
-
Commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 changed
ext4_new_inode() such that the inode bitmap was being modified
outside a transaction, which could lead to corruption, and was
discovered when journal_checksum found a bad checksum in the
journal during log replay.

Nix ran into this when using the journal_async_commit mount
option, which enables journal checksumming. The ensuing
journal replay failures due to the bad checksums led to
filesystem corruption reported as the now infamous
"Apparent serious progressive ext4 data corruption bug"

[ Changed by tytso to call ext4_journal_get_write_access() only
when we're fairly certain that we're going to allocate the inode. ]

I've tested this by mounting with journal_checksum and
running fsstress then dropping power; I've also tested by
hacking DM to create snapshots w/o first quiescing, which
allows me to test journal replay repeatedly w/o actually
power-cycling the box. Without the patch I hit a journal
checksum error every time. With this fix it survives
many iterations.

Reported-by: Nix
Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"
Cc: stable@vger.kernel.org
22 Oct, 2012
1 commit
-
In mke2fs, we checksum only the whole bitmap block, which is correct.
In the kernel, however, we use EXT4_BLOCKS_PER_GROUP to indicate the
size of the checksummed bitmap, which is wrong when bigalloc is enabled.
The right size is EXT4_CLUSTERS_PER_GROUP, and this patch fixes
it.

Also, as every caller of ext4_block_bitmap_csum_set and
ext4_block_bitmap_csum_verify passes in EXT4_BLOCKS_PER_GROUP(sb)/8,
we'd better remove this parameter and set it in the function itself.

Signed-off-by: Tao Ma
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Lukas Czerner
Cc: stable@vger.kernel.org
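
The size mismatch can be illustrated with a small userspace sketch. The helper names are hypothetical, and the sketch assumes that the block bitmap holds one bit per cluster and that blocks per group equals clusters per group multiplied by the cluster ratio.

```c
#include <stdint.h>

/* Bytes covered when the checksum length is derived from blocks per group
 * (the behaviour described as wrong above). Hypothetical helper. */
static uint32_t csum_bytes_by_blocks(uint32_t clusters_per_group,
                                     uint32_t cluster_ratio)
{
    return clusters_per_group * cluster_ratio / 8;
}

/* Bytes covered when the length is derived from clusters per group:
 * one bit per cluster, eight bits per byte. Hypothetical helper. */
static uint32_t csum_bytes_by_clusters(uint32_t clusters_per_group)
{
    return clusters_per_group / 8;
}
```

With 32768 clusters per group and an illustrative cluster ratio of 16, sizing by clusters gives 4096 bytes (one 4 KiB bitmap block), while sizing by blocks gives 65536 bytes, far beyond the bitmap. With a ratio of 1 (no bigalloc) both agree.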
24 Sep, 2012
1 commit
-
Recently, I encountered some corrupted filesystems in which some
groups' free inode counts were 65535; it seemed that the free inode
count had overflowed. This patch teaches ext4 to check the free inode
count before allocating an inode.

Signed-off-by: Yongqiang Yang
Signed-off-by: "Theodore Ts'o"
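
One plausible way to end up with a count of 65535 is decrementing a 16-bit counter that is already zero. That explanation is an assumption, not something the message states. The sketch below contrasts an unchecked decrement with a checked one; both helpers are hypothetical and are not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unchecked decrement of a 16-bit count: 0 wraps around to 65535. */
static uint16_t dec_unchecked(uint16_t free_inodes)
{
    return (uint16_t)(free_inodes - 1);
}

/* Checked variant: refuse to allocate when no inode is free,
 * leaving the count untouched. Returns true on success. */
static bool dec_checked(uint16_t *free_inodes)
{
    if (*free_inodes == 0)
        return false;
    (*free_inodes)--;
    return true;
}
```

Checking before the decrement keeps a zero count at zero, so an inconsistency elsewhere cannot turn into a huge bogus free count.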
23 Jul, 2012
1 commit
-
Commit a0375156 properly notes that the superblock doesn't need to be marked
as dirty when only the number of free inodes / blocks / number of directories
changes, since that is recomputed on each mount anyway. However that commit
leaves some unnecessary markings as dirty in place. Remove these.

Artem: tested using xfstests for both journalled and non-journalled ext4.
Signed-off-by: Jan Kara
Signed-off-by: Artem Bityutskiy
Signed-off-by: "Theodore Ts'o"
Tested-by: Artem Bityutskiy
01 Jul, 2012
1 commit
-
Make it possible for ext4_count_free to operate on buffers and not
just data in buffer_heads.

Signed-off-by: "Theodore Ts'o"
Cc: stable@kernel.org
02 Jun, 2012
1 commit
-
Pull Ext4 updates from Theodore Ts'o:
"The major new feature added in this update is Darrick J Wong's
metadata checksum feature, which adds crc32 checksums to ext4's
metadata fields.

There is also the usual set of cleanups and bug fixes."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
ext4: hole-punch use truncate_pagecache_range
jbd2: use kmem_cache_zalloc wrapper instead of flag
ext4: remove mb_groups before tearing down the buddy_cache
ext4: add ext4_mb_unload_buddy in the error path
ext4: don't trash state flags in EXT4_IOC_SETFLAGS
ext4: let getattr report the right blocks in delalloc+bigalloc
ext4: add missing save_error_info() to ext4_error()
ext4: add debugging trigger for ext4_error()
ext4: protect group inode free counting with group lock
ext4: use consistent ssize_t type in ext4_file_write()
ext4: fix format flag in ext4_ext_binsearch_idx()
ext4: cleanup in ext4_discard_allocated_blocks()
ext4: return ENOMEM when mounts fail due to lack of memory
ext4: remove redundundant "(char *) bh->b_data" casts
ext4: disallow hard-linked directory in ext4_lookup
ext4: fix potential integer overflow in alloc_flex_gd()
ext4: remove needs_recovery in ext4_mb_init()
ext4: force ro mount if ext4_setup_super() fails
ext4: fix potential NULL dereference in ext4_free_inodes_counts()
ext4/jbd2: add metadata checksumming to the list of supported features
...
29 May, 2012
2 commits
-
Currently, when we set the group inode free count, we don't hold a proper
group lock, so multiple threads may decrease the inode free
count at the same time. e2fsck will then complain with something like:

Free inodes count wrong for group #1 (1, counted=0).
Fix? no

Free inodes count wrong for group #2 (3, counted=0).
Fix? no

Directories count wrong for group #2 (780, counted=779).
Fix? no

Free inodes count wrong for group #3 (2272, counted=2273).
Fix? no

So this patch tries to protect it with ext4_lock_group.
By the way, this was found by xfstests test case 269 on a volume
created by mkfs with the parameter
"-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr".
I have run it 100 times and the error in e2fsck doesn't
show up again.

Signed-off-by: Tao Ma
Signed-off-by: "Theodore Ts'o"
-
The ext4_get_group_desc() function returns NULL on error, and
ext4_free_inodes_count() function dereferences it without checking.
There is a check on the next line, but it's too late.

Reviewed-by: Jan Kara
Signed-off-by: Dan Carpenter
Signed-off-by: "Theodore Ts'o"
Cc: stable@kernel.org
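
The ordering problem described here is generic: a pointer that may be NULL has to be checked before the first read through it. A minimal userspace sketch with hypothetical stand-in types and names (not the ext4 structures or functions):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a group descriptor. */
struct group_desc {
    uint32_t free_inodes;
};

/* Lookup that returns NULL on error, mirroring the contract described above. */
static struct group_desc *get_group_desc(struct group_desc *table,
                                         size_t ngroups, size_t group)
{
    return group < ngroups ? &table[group] : NULL;
}

/* Check the pointer before reading through it; -1 signals failure. */
static int64_t free_inodes_in_group(struct group_desc *table,
                                    size_t ngroups, size_t group)
{
    struct group_desc *gdp = get_group_desc(table, ngroups, group);

    if (gdp == NULL)
        return -1;
    return (int64_t)gdp->free_inodes;
}
```

Moving the check ahead of the dereference is the whole fix in this sketch; a check placed one line later would never be reached on the failing path.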
16 May, 2012
1 commit
-
Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman
30 Apr, 2012
4 commits
-
metadata_csum supersedes uninit_bg. Convert the ROCOMPAT uninit_bg
flag check to a helper function that covers both, and make the
checksum calculation algorithm use either crc16 or the metadata_csum
chosen algorithm depending on which flag is set. Print a warning if
we try to mount a filesystem with both feature flags set.

Signed-off-by: Darrick J. Wong
Signed-off-by: "Theodore Ts'o"
-
Compute and verify the checksum of the block bitmap; this checksum is
stored in the block group descriptor.

Signed-off-by: Darrick J. Wong
Signed-off-by: "Theodore Ts'o"
-
Compute and verify the checksum of the inode bitmap; the checksum is
stored in the block group descriptor.

Signed-off-by: Darrick J. Wong
Signed-off-by: "Theodore Ts'o"
-
This patch introduces to ext4 the ability to calculate and verify
inode checksums. This requires the use of a new ro compatibility flag
and some accompanying e2fsprogs patches to provide the relevant
features in tune2fs and e2fsck. The inode generation changes have
been integrated into this patch.

Signed-off-by: Darrick J. Wong
Signed-off-by: "Theodore Ts'o"
20 Mar, 2012
2 commits
-
Signed-off-by: "Theodore Ts'o"
-
The functions ext4_msg() and ext4_error() already tack on a trailing
newline, so remove the unnecessary extra newline.

Signed-off-by: "Theodore Ts'o"
21 Feb, 2012
1 commit
-
In ext4_read_{inode,block}_bitmap() we were setting bitmap_uptodate()
before submitting the buffer for read. This is bad, since we check
bitmap_uptodate() without locking the buffer, and so if another
process is racing with us, it's possible that they will think the
bitmap is uptodate even though the read has not completed yet,
resulting in inodes and blocks potentially getting allocated more than
once if we get really unlucky.

Addresses-Google-Bug: 2828254
Signed-off-by: "Theodore Ts'o"
07 Feb, 2012
1 commit
-
The function ext4_claim_inode() is only called by one function,
ext4_new_inode(), and by folding the functionality into
ext4_new_inode(), we can remove almost 50 lines of code, and put all
of the logic of allocating a new inode into a single place.

Signed-off-by: "Theodore Ts'o"
11 Jan, 2012
1 commit
-
Conflicts:
fs/ext4/ioctl.c
04 Jan, 2012
1 commit
-
Signed-off-by: Al Viro
29 Dec, 2011
2 commits
-
ext4_{set,clear}_bit() is defined as __test_and_{set,clear}_bit_le() for
ext4. Only two ext4_{set,clear}_bit() calls check the return value. The
rest of the calls ignore the return value and can be replaced with
__{set,clear}_bit_le().

This changes ext4_{set,clear}_bit() from __test_and_{set,clear}_bit_le()
to __{set,clear}_bit_le() and introduces ext4_test_and_{set,clear}_bit()
for the two places where the old bit needs to be returned.

This ext4_{set,clear}_bit() change is considered safe, because if someone
uses these macros without noticing the change, the new ext4_{set,clear}_bit()
have no return value and cause compiler errors wherever the return value
is used.

This also removes unused ext4_find_first_zero_bit().
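
The difference between the two flavours can be sketched in userspace C. These are simplified, non-atomic stand-ins with hypothetical names, using little-endian bit numbering within a byte array. They are not the kernel's bitops.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set bit nr and return its previous value (the "test and set" flavour). */
static bool test_and_set_bit_sketch(unsigned int nr, uint8_t *map)
{
    uint8_t mask = (uint8_t)(1u << (nr & 7));
    bool old = (map[nr >> 3] & mask) != 0;

    map[nr >> 3] |= mask;
    return old;
}

/* Set bit nr without reporting the old value. Because this returns void,
 * any caller that tries to use a result fails to compile. */
static void set_bit_sketch(unsigned int nr, uint8_t *map)
{
    map[nr >> 3] |= (uint8_t)(1u << (nr & 7));
}
```

The void return type is what makes the change safe in the sense described: a caller that still relies on the old value gets a compile error instead of silently wrong behaviour.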
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: "Theodore Ts'o"
-
Signed-off-by: "Theodore Ts'o"
19 Dec, 2011
1 commit
-
When insert_inode_locked() fails in ext4_new_inode() it most likely means the inode
bitmap got corrupted and we again allocated an inode which is already in use. Also
doing unlock_new_inode() during error recovery is wrong since the inode does
not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:)
which declares a filesystem error and does not call unlock_new_inode().

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
03 Nov, 2011
1 commit
-
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue:
vfs: add d_prune dentry operation
vfs: protect i_nlink
filesystems: add set_nlink()
filesystems: add missing nlink wrappers
logfs: remove unnecessary nlink setting
ocfs2: remove unnecessary nlink setting
jfs: remove unnecessary nlink setting
hypfs: remove unnecessary nlink setting
vfs: ignore error on forced remount
readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
vfs: fix dentry leak in simple_fill_super()
02 Nov, 2011
1 commit
-
Replace direct i_nlink updates with the respective updater function
(inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count).

Signed-off-by: Miklos Szeredi
01 Nov, 2011
1 commit
-
Remove comments about the 'extent' mount option in ext4_new_inode(), since
it no longer exists.

Signed-off-by: Eryu Guan
Signed-off-by: "Theodore Ts'o"
29 Oct, 2011
1 commit
-
The tmp_inode should have the same uid/gid as the original inode.
Otherwise new metadata blocks will be accounted to the wrong quota-id,
which will result in a quota leak after the inode migration is
completed.

Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"
18 Oct, 2011
1 commit
-
The function declarations in ext4.h are already marked extern, so it's
not necessary to do so in the .c files.

This quiets the sparse noise:

warning: function 'ext4_flush_completed_IO' with external linkage has definition
warning: function 'ext4_init_inode_table' with external linkage has definition

Signed-off-by: H Hartley Sweeten
Signed-off-by: "Theodore Ts'o"
09 Oct, 2011
1 commit
-
For a long time now orlov has been the default block allocator in
ext4. It performs better than the old one and no one seems to claim
otherwise, so we can safely drop the old one and deprecate the oldalloc
and orlov mount options.

This is part of the effort to reduce the number of ext4 options and hence the
test matrix.

Signed-off-by: Lukas Czerner
Signed-off-by: "Theodore Ts'o"
10 Sep, 2011
5 commits
-
This function really returns the number of clusters after
an uninitialized block bitmap has been initialized.

Signed-off-by: "Theodore Ts'o"
-
The field bg_free_blocks_count_{lo,high} in the block group
descriptor has been repurposed to hold the number of free clusters for
bigalloc functions. So rename the functions to make it easier to
read and audit the block allocation and block freeing code.

Note: at this point in bigalloc development we don't support
online resize, so this also makes it really obvious all of the places
we need to fix up to add support for online resize.

Signed-off-by: "Theodore Ts'o"
-
Convert the free_blocks to be free_clusters to make the final revised
bigalloc changes easier to read/understand.

Signed-off-by: "Theodore Ts'o"
-
Convert the percpu counters s_dirtyblocks_counter and
s_freeblocks_counter in struct ext4_super_info to be
s_dirtyclusters_counter and s_freeclusters_counter.

Signed-off-by: "Theodore Ts'o"
-
The function ext4_free_blocks_after_init() used to be a #define of
ext4_init_block_bitmap(). This actually made it difficult to
understand how the function worked, and made it hard to make changes to
support clusters. So as an initial cleanup, I've separated out the
functionality of initializing the block bitmap from calculating the number
of free blocks in the new block group.

Signed-off-by: "Theodore Ts'o"
01 Aug, 2011
1 commit
-
This patch makes ext4_init_inode_table() handle errors correctly.
On the error path, ext4_init_inode_table() should up_write() the alloc_sem
it has down_write()ed and stop the journal handle it started.

Signed-off-by: Yongqiang Yang
Signed-off-by: "Theodore Ts'o"