Eric Lee / smarc-fsl-linux-kernel

04 Aug, 2008

1 commit

8f616cd52 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: remove write-only variables from ext4_ordered_write_end
ext4: unexport jbd2_journal_update_superblock
ext4: Cleanup whitespace and other miscellaneous style issues
ext4: improve ext4_fill_flex_info() a bit
ext4: Cleanup the block reservation code path
ext4: don't assume extents can't cross block groups when truncating
ext4: Fix lack of credits BUG() when deleting a badly fragmented inode
ext4: Fix ext4_ext_journal_restart()
ext4: fix ext4_da_write_begin error path
jbd2: don't abort if flushing file data failed
ext4: don't read inode block if the buffer has a write error
ext4: Don't allow lg prealloc list to be grow large.
ext4: Convert the usage of NR_CPUS to nr_cpu_ids.
ext4: Improve error handling in mballoc
ext4: lock block groups when initializing
ext4: sync up block and inode bitmap reading functions
ext4: Allow read/only mounts with corrupted block group checksums
ext4: Fix data corruption when writing to prealloc area

Linus Torvalds
2008-08-04 01:50:44 +0800

03 Aug, 2008

6 commits

7d55992d6 ext4: remove write-only variables from ext4_ordered_write_end

The variables 'from' and 'to' are not used anywhere.

Signed-off-by: Eric Sandeen
Acked-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Eric Sandeen
2008-08-03 09:22:18 +0800
bc965ab3f ext4: Fix lack of credits BUG() when deleting a badly fragmented inode

The extents codepath for ext4_truncate() requests journal transaction
credits in very small chunks, requesting only what is needed. This
means there may not be enough credits left on the transaction handle
after ext4_truncate() returns and then when ext4_delete_inode() tries
to finish up its work, it may not have enough transaction credits,
causing a BUG() oops in the jbd2 core.

Also, reserve an extra 2 blocks when starting an ext4_delete_inode()
since we need to update the inode bitmap, as well as update the
orphaned inode linked list.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2008-08-03 09:10:38 +0800
d5a0d4f73 ext4: fix ext4_da_write_begin error path

ext4_da_write_begin needs to call journal_stop before returning,
if the page allocation fails.

Signed-off-by: Eric Sandeen
Acked-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Eric Sandeen
2008-08-03 06:51:06 +0800
b5f10eed8 ext4: lock block groups when initializing

I noticed when filling a 1T filesystem with 4 threads using the
fs_mark benchmark:

fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0

that I occasionally got checksum mismatch errors:

EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 6935

etc. I'd reliably get 4-5 of them during the run.

It appears that the problem is likely a race to init the bg's
when the uninit_bg feature is enabled.

With the patch below, which adds sb_bgl_locking around initialization,
I was able to complete several runs with no errors or warnings.

Signed-off-by: Eric Sandeen
Signed-off-by: Theodore Ts'o

Eric Sandeen
2008-08-03 09:21:08 +0800
e29d1cde6 ext4: sync up block and inode bitmap reading functions

ext4_read_block_bitmap and read_inode_bitmap do essentially
the same thing, and yet they are structured quite differently.
I came across this difference while looking at doing bg locking
during bg initialization.

This patch:

* removes unnecessary casts in the error messages
* renames read_inode_bitmap to ext4_read_inode_bitmap
* and more substantially, restructures the inode bitmap
reading function to be more like the block bitmap counterpart.

The change to the inode bitmap reader simplifies the locking
to be applied in the next patch.

Signed-off-by: Eric Sandeen
Signed-off-by: Theodore Ts'o

Eric Sandeen
2008-08-03 09:21:02 +0800
d03856bd5 ext4: Fix data corruption when writing to prealloc area

Inserting an extent can cause a new entry in the already existing index
block. That doesn't increase the depth of the tree. Instead it adds a
new leaf block. Now with the new leaf block the path information
corresponding to the logical block should be fetched from the new block.
The old path will be pointing to the old leaf block.

We need to recalculate the path information on extent insert
even if depth doesn't change. Without this change, the extent merge
after converting an unwritten extent to initialized extent takes the wrong
extent and causes data corruption.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Aneesh Kumar K.V
2008-08-03 06:51:32 +0800

02 Aug, 2008

2 commits

34071da71 ext4: don't assume extents can't cross block groups when truncating

With the FLEX_BG layout, there is no reason why extents can't cross
block groups, so make the truncate code reserve enough credits so we
don't BUG if we come across such an extent.
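As a rough illustration of why the credit estimate changes, the number of block groups an extent touches can be computed from its start and length. This is a user-space sketch only: the helper name model_groups_spanned() is hypothetical, it ignores the first data block offset, and it is not the kernel's credit calculation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of block groups touched by the extent
 * [start, start + len). Assumes group 0 starts at block 0.
 */
uint32_t model_groups_spanned(uint64_t start, uint32_t len,
                              uint32_t blocks_per_group)
{
    assert(len > 0 && blocks_per_group > 0);

    uint64_t first_group = start / blocks_per_group;
    uint64_t last_group = (start + len - 1) / blocks_per_group;

    return (uint32_t)(last_group - first_group + 1);
}
```

An extent that straddles a group boundary yields 2 here, so a truncate that budgets for only one block bitmap and one group descriptor per extent would be short on credits.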

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2008-08-02 09:59:19 +0800
0123c9399 ext4: Fix ext4_ext_journal_restart()

The ext4_ext_journal_restart() is a convenience function which checks
to see if the requested number of credits is present, and if so it
closes the current transaction and attaches the current handle to the
new transaction. Unfortunately, it wasn't properly checking the
return value from ext4_journal_extend(), so it was starting a new
transaction when one was not necessary, and returning an error when
all that was necessary was to restart the handle with a new
transaction.
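The intended control flow can be sketched with a small user-space model. Everything here is hypothetical (struct model_handle and the model_* functions are not jbd2 APIs); the assumption is that the extend step returns 0 on success, a negative value on a real error, and a positive value when the running transaction simply has no room left.

```c
#include <assert.h>

/* Hypothetical stand-in for a journal handle; not the jbd2 structure. */
struct model_handle {
    int credits;   /* credits currently held by the handle */
    int txn_free;  /* credits the running transaction can still hand out */
    int restarts;  /* number of times a new transaction was started */
    int aborted;   /* non-zero simulates a failed journal */
};

/* 0: extended in place, <0: genuine error, >0: transaction is full. */
int model_extend(struct model_handle *h, int needed)
{
    if (h->aborted)
        return -1;
    if (needed > h->txn_free)
        return 1;
    h->txn_free -= needed;
    h->credits += needed;
    return 0;
}

/* Attach the handle to a fresh transaction with 'needed' credits. */
int model_restart(struct model_handle *h, int needed)
{
    h->restarts++;
    h->credits = needed;
    return 0;
}

int model_journal_restart(struct model_handle *h, int needed)
{
    int err;

    if (h->credits > needed)
        return 0;             /* enough credits already */
    err = model_extend(h, needed);
    if (err <= 0)
        return err;           /* extended, or a real error */
    return model_restart(h, needed);  /* only restart when full */
}
```

The key point is the three-way check on the extend result: success and error both return immediately, and only the "transaction full" case falls through to a restart.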

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2008-08-02 08:57:54 +0800

01 Aug, 2008

1 commit

77e69dac3 [PATCH] fix races and leaks in vfs_quota_on() users

* new helper: vfs_quota_on_path(); equivalent of vfs_quota_on() sans the
pathname resolution.
* callers of vfs_quota_on() that do their own pathname resolution and
checks based on it are switched to vfs_quota_on_path(); that way we
avoid the races.
* reiserfs leaked dentry/vfsmount references on several failure exits.

Signed-off-by: Al Viro

Al Viro
2008-08-01 23:25:25 +0800

29 Jul, 2008

1 commit

8ab22b9ab vfs: pagecache usage optimization for pagesize!=blocksize

When we read some part of a file through pagecache, if there is a
pagecache of corresponding index but this page is not uptodate, read IO
is issued and this page will be uptodate.

I think this is good for pagesize == blocksize environment but there is
room for improvement on pagesize != blocksize environment. Because in
this case a page can have multiple buffers and even if a page is not
uptodate, some buffers can be uptodate.

So I suggest that when all buffers which correspond to a part of a file
that we want to read are uptodate, use this pagecache and copy data from
this pagecache to user buffer even if a page is not uptodate. This can
reduce read IO and improve system throughput.
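The check described above can be modelled in user space as follows. This is a sketch under stated assumptions: struct model_page, MODEL_PAGE_SIZE, and model_range_uptodate() are hypothetical names, and the per-buffer state is reduced to a bitmask rather than real buffer heads.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u  /* assumption: 4k pages */

/* Hypothetical model of a page split into fixed-size buffers. */
struct model_page {
    unsigned int blocksize;      /* bytes per buffer, e.g. 1024 */
    unsigned int uptodate_mask;  /* bit i set => buffer i holds valid data */
    bool page_uptodate;          /* whole-page uptodate flag */
};

/* True when bytes [from, from + count) can be copied from the cache. */
bool model_range_uptodate(const struct model_page *pg,
                          unsigned int from, unsigned int count)
{
    assert(pg->blocksize > 0 && MODEL_PAGE_SIZE % pg->blocksize == 0);
    assert(MODEL_PAGE_SIZE / pg->blocksize <= 32);

    if (count == 0 || from >= MODEL_PAGE_SIZE ||
        count > MODEL_PAGE_SIZE - from)
        return false;
    if (pg->page_uptodate)
        return true;

    unsigned int first = from / pg->blocksize;
    unsigned int last = (from + count - 1) / pg->blocksize;

    /* Every buffer overlapping the range must be uptodate. */
    for (unsigned int i = first; i <= last; i++) {
        if (!(pg->uptodate_mask & (1u << i)))
            return false;
    }
    return true;
}
```

With a 1024-byte block size, a page whose second and third buffers are valid can serve a read of bytes 1024..3071 without any read IO, even though the page as a whole is not uptodate.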

I wrote a benchmark program and measured the results with it.

This benchmark does:

1: mount and open a test file.

2: create a 512MB file.

3: close a file and umount.

4: mount and again open a test file.

5: pwrite randomly 300000 times on a test file. offset is aligned
by IO size (1024 bytes).

6: measure time of preading randomly 100000 times on a test file.

The result was:
2.6.26
330 sec

2.6.26-patched
226 sec

Arch:i386
Filesystem:ext3
Blocksize:1024 bytes
Memory: 1GB

On ext3/4, a file is written through buffer/block. So random read/write
mixed workloads or random read after random write workloads are optimized
with this patch under pagesize != blocksize environment. This test result
showed this.

The benchmark program is as follows:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>

#define LEN 1024
#define LOOP (1024*512) /* 512MB */

int main(void)
{
	unsigned long i, offset, filesize;
	int fd;
	char buf[LEN];
	time_t t1, t2;

	/* Steps 1-3: mount, create a 512MB file, then umount to drop the cache. */
	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount");
		exit(1);
	}
	memset(buf, 0, LEN);
	/* O_CREAT requires a mode argument. */
	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
	if (fd < 0) {
		perror("cannot open file");
		exit(1);
	}
	for (i = 0; i < LOOP; i++)
		write(fd, buf, LEN);
	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount");
		exit(1);
	}

	/* Step 4: mount again and reopen the test file. */
	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount");
		exit(1);
	}
	fd = open("/root/test1/testfile", O_RDWR);
	if (fd < 0) {
		perror("cannot open file");
		exit(1);
	}

	/* Step 5: random writes at offsets aligned to LEN. */
	filesize = LEN * LOOP;
	for (i = 0; i < 300000; i++) {
		offset = (random() % filesize) & (~(LEN - 1));
		pwrite(fd, buf, LEN, offset);
	}

	/* Step 6: time the random reads. */
	printf("start test\n");
	time(&t1);
	for (i = 0; i < 100000; i++) {
		offset = (random() % filesize) & (~(LEN - 1));
		pread(fd, buf, LEN, offset);
	}
	time(&t2);
	printf("%ld sec\n", (long)(t2 - t1));
	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount");
		exit(1);
	}
	return 0;
}

Signed-off-by: Hisashi Hifumi
Cc: Nick Piggin
Cc: Christoph Hellwig
Cc: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hisashi Hifumi
2008-07-29 07:30:21 +0800

27 Jul, 2008

5 commits

e6305c43e [PATCH] sanitize ->permission() prototype

* kill nameidata * argument; map the 3 bits in ->flags anybody cares
about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
MAY_... found in mask.

The obvious next target in that direction is permission(9)

folded fix for nfs_permission() breakage from Miklos Szeredi

Signed-off-by: Al Viro

Al Viro
2008-07-27 08:53:14 +0800
2b2d6d019 ext4: Cleanup whitespace and other miscellaneous style issues

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2008-07-27 04:15:44 +0800
51cc50685 SL*B: drop kmem cache argument from constructor

Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexers. Nobody uses this "feature", nor does anybody use
the passed kmem cache in a non-trivial way, so pass only a pointer to the object.

Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c

This is a flag day, yes.

Signed-off-by: Alexey Dobriyan
Acked-by: Pekka Enberg
Acked-by: Christoph Lameter
Cc: Jon Tollefson
Cc: Nick Piggin
Cc: Matt Mackall
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2008-07-27 03:00:07 +0800
9c83a923c ext4: don't read inode block if the buffer has a write error

A transient I/O error can corrupt inode data. Here is the scenario:

(1) update inode_A at the block_B
(2) pdflush writes out new inode_A to the filesystem, but it results
in a write I/O error; at this point, the BH_Uptodate flag of the buffer
for block_B is cleared and BH_Write_EIO is set
(3) create new inode_C which is located at block_B, and
__ext4_get_inode_loc() tries to read on-disk block_B because the
buffer is not uptodate
(4) if it can read on-disk block_B successfully, inode_A is
overwritten by old data

This patch makes __ext4_get_inode_loc() not read the inode block if the
buffer has the BH_Write_EIO flag set. In this case, the buffer should have
the latest information, so set the uptodate flag on the buffer (this
avoids WARN_ON_ONCE() in mark_buffer_dirty().)

According to this change, we would need to test BH_Write_EIO flag for the
error checking. Currently nobody checks write I/O errors on metadata
buffers, but it will be done in other patches I'm working on.
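The scenario and the fix can be replayed with a toy model. This is only a sketch: struct model_bh and model_get_inode_loc() are hypothetical, the block contents are reduced to a single int, and the real function deals with buffer heads and block IO.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a buffer caching one inode block. */
struct model_bh {
    bool uptodate;   /* models BH_Uptodate */
    bool write_eio;  /* models BH_Write_EIO */
    int data;        /* in-memory contents of the block */
};

/* Returns true when the on-disk block had to be read into the buffer. */
bool model_get_inode_loc(struct model_bh *bh, int on_disk_data)
{
    /*
     * A failed write cleared the uptodate flag, but the buffer still
     * holds the newest contents, so trust it instead of re-reading.
     */
    if (bh->write_eio)
        bh->uptodate = true;

    if (bh->uptodate)
        return false;

    /* Cold buffer: reading the on-disk copy is correct here. */
    bh->data = on_disk_data;
    bh->uptodate = true;
    return true;
}
```

In the failing-write case the buffer keeps the newer inode_A contents; without the write_eio check, the stale on-disk value would replace them.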

Signed-off-by: Hidehiro Kawai
Cc: sugita
Cc: Satoshi OSHIMA
Cc: Nick Piggin
Cc: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Theodore Ts'o

Hidehiro Kawai
2008-07-27 04:39:26 +0800
8a266467b ext4: Allow read/only mounts with corrupted block group checksums

If the block group checksums are corrupted, still allow the mount to
succeed, so e2fsck can have a chance to try to fix things up. Add
code in the remount r/w path to make sure the block group checksums
are valid before allowing the filesystem to be remounted read/write.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2008-07-27 02:34:21 +0800

25 Jul, 2008

1 commit

ec05e868a ext4: improve ext4_fill_flex_info() a bit

- use kzalloc() instead of kmalloc() + memset()
- improve a printk info

Signed-off-by: Li Zefan
Signed-off-by: Theodore Ts'o

Li Zefan
2008-07-25 00:49:59 +0800

24 Jul, 2008

3 commits

6be2ded1d ext4: Don't allow lg prealloc list to be grow large.

Currently, the locality group prealloc list is freed only when there
is a block allocation failure. This can result in a large number of
entries in the preallocation list, making ext4_mb_use_preallocated()
expensive.

To fix this, we convert the locality group prealloc list to a hash
list. The hash index is the order of the number of blocks in the prealloc
space, with a max order of 9. When adding prealloc space to the list we
make sure the total entries for each order do not exceed 8. If it is
more than 8 we discard a few entries and make sure that we have only
Signed-off-by: Theodore Ts'o

Aneesh Kumar K.V
2008-07-24 02:14:05 +0800
1320cbcf7 ext4: Convert the usage of NR_CPUS to nr_cpu_ids.

NR_CPUS can be really large. We should be using nr_cpu_ids instead.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Theodore Ts'o

Aneesh Kumar K.V
2008-07-24 02:09:26 +0800
ce89f46cb ext4: Improve error handling in mballoc

Don't call BUG_ON on file system failures. Instead use ext4_error and
also handle the continue case properly.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Theodore Ts'o

Aneesh Kumar K.V
2008-07-24 02:09:29 +0800

18 Jul, 2008

1 commit

12219aea6 ext4: Cleanup the block reservation code path

The truncate path should not use the i_allocated_meta_blocks
value. So add separate functions to be used in the truncate
and alloc path. We also need to release the meta-data block
that we reserved for the blocks that we are truncating.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Theodore Ts'o

Aneesh Kumar K.V
2008-07-18 04:12:08 +0800

15 Jul, 2008

1 commit

d2a176379 ext4: delayed allocation ENOSPC handling

This patch does block reservation for delayed
allocation, to avoid ENOSPC later at page flush time.

Blocks (data and metadata) are reserved at da_write_begin()
time, the freeblocks counter is updated by then, and the number of
reserved blocks is stored in a per-inode counter.

At the writepage time, the unused reserved meta blocks are returned
back. At unlink/truncate time, reserved blocks are properly released.
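The reservation life cycle described above can be sketched as simple counter bookkeeping. This is a user-space model under assumptions: struct model_sb, struct model_inode, and the model_da_* functions are hypothetical, locking is omitted, and the whole reservation is assumed to be written out in one step.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct model_sb { uint64_t free_blocks; };       /* filesystem-wide counter */
struct model_inode {
    uint64_t reserved_data;                      /* per-inode reservations */
    uint64_t reserved_meta;
};

/* Write-begin time: fail early with ENOSPC instead of at page flush. */
int model_da_reserve(struct model_sb *sb, struct model_inode *in,
                     uint64_t data, uint64_t meta)
{
    uint64_t total = data + meta;

    if (total > sb->free_blocks)
        return -ENOSPC;
    sb->free_blocks -= total;
    in->reserved_data += data;
    in->reserved_meta += meta;
    return 0;
}

/* Writepage time: data is allocated, unused metadata blocks go back. */
void model_da_writepage(struct model_sb *sb, struct model_inode *in,
                        uint64_t meta_used)
{
    assert(meta_used <= in->reserved_meta);
    sb->free_blocks += in->reserved_meta - meta_used;
    in->reserved_data = 0;
    in->reserved_meta = 0;
}

/* Unlink/truncate time: everything still reserved is released. */
void model_da_release(struct model_sb *sb, struct model_inode *in)
{
    sb->free_blocks += in->reserved_data + in->reserved_meta;
    in->reserved_data = 0;
    in->reserved_meta = 0;
}
```

The metadata reservation is a worst-case estimate, which is why part of it is normally handed back once the real allocation is known.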

Updated fix from Aneesh Kumar K.V
to fix the oldallocator block reservation accounting with delalloc, added
lock to guard the counters and also fix the reservation for meta blocks.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Mingming Cao
Signed-off-by: Theodore Ts'o

Mingming Cao
2008-07-15 05:52:37 +0800

12 Jul, 2008

18 commits

e4079a11f ext4: do not set extents feature from the kernel

We've talked for a while about getting rid of any feature-
setting from the kernel; this gets rid of the code which would
set the INCOMPAT_EXTENTS flag on the first file write when mounted
as ext4[dev].

With this patch, if the extents feature is not already set on disk,
then mounting as ext4 will fall back to noextents with a warning,
and if -o extents is explicitly requested, the mount will fail,
also with warning.

Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"

Eric Sandeen
2008-07-12 07:27:31 +0800
c07651b55 ext4: Don't allow nonextenst mount option for large filesystem

The block mapped inode format can address only blocks within 2**32. This
causes a number of issues, the biggest of which is that the block
allocator needs to be taught that certain inodes can not utilize block
numbers > 2**32. So until this is fixed, it is simplest to fail
mounting of file systems with more than 2**32 blocks if the -o noextents
option is given.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Aneesh Kumar K.V
2008-07-12 07:27:31 +0800
dd919b982 ext4: Enable delalloc by default.

Enable delalloc by default to ensure it gets sufficient testing and
because it makes the filesystem much more efficient. Add a nodelalloc
option to disable delayed allocation, and update ext4_show_options to
show delayed allocation off if it is disabled.

If the data=journal mount option is used, disable delayed allocation
since the delalloc code doesn't support data=journal yet.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Mingming Cao

Aneesh Kumar K.V
2008-07-12 07:27:31 +0800
3e3398a08 ext4: delayed allocation i_blocks fix for stat

Right now i_blocks is not getting updated until the blocks are actually
allocated on disk. This means with delayed allocation, right after files
are copied, "ls -sF" shows the file as taking 0 blocks on disk. "du"
also shows the files taking zero space, which is highly confusing to the
user.

Since delayed allocation already keeps track of the per-inode total
number of blocks that are subject to delayed allocation, this patch fixes
this by using that to adjust the value returned by stat(2). When real
block allocation is done, the i_blocks will get updated. Since the
reserved blocks for delayed allocation will be decreased, this will
keep value returned by stat(2) consistent.
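The adjustment amounts to a unit conversion. The sketch below assumes that stat(2) reports blocks in 512-byte units, as on Linux, and that reservations are counted in filesystem blocks; model_stat_blocks() and its parameters are hypothetical names, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: blocks reported by stat(2) in 512-byte units.
 * i_blocks is already in 512-byte units; reserved_fs_blocks counts
 * delayed-allocation blocks of size (1 << blocksize_bits) bytes.
 */
uint64_t model_stat_blocks(uint64_t i_blocks, uint64_t reserved_fs_blocks,
                           unsigned int blocksize_bits)
{
    assert(blocksize_bits >= 9 && blocksize_bits < 32);
    return i_blocks + (reserved_fs_blocks << (blocksize_bits - 9));
}
```

For three 4k blocks the reported value is 24 whether they are still reserved (0 allocated, 3 reserved) or already on disk (24 in i_blocks, 0 reserved), which is the consistency the message describes.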

Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Mingming Cao
2008-07-12 07:27:31 +0800
632eaeab1 ext4: fix delalloc i_disksize early update issue

Ext4_da_write_end() used walk_page_buffers() with a callback function of
ext4_bh_unmapped_or_delay() to check if it extended the file size
without allocating any blocks (since in this case i_disksize needs to be
updated). However, this didn't work properly because the buffer head
has not been marked dirty yet --- this is done later in
block_commit_write() --- which caused ext4_bh_unmapped_or_delay() to
always return false.

In addition, walk_page_buffers() checks all of the buffer heads covering
the page, and the only buffer_head that should be checked is the one
covering the end of the write. Otherwise, given a 1k blocksize
filesystem and a 4k page size, the buffer head covering the first 1k
stripe of the file could be unmapped (because it was a sparse file), and
the second or third buffer_head covering that page could be mapped, and
using walk_page_buffers() would fail in this case since it would stop at
the first unmapped buffer_head and return true.

The core problem is that walk_page_buffers() was intended to do work in
a callback function, and a non-zero return value indicated a failure,
which terminated the walk of the buffer heads covering the page. It was
not intended to be used with a boolean function, such as
ext4_bh_unmapped_or_delay().
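The mismatch between an "early exit on error" walker and a boolean predicate can be reproduced in a few lines. This is a simplified user-space sketch: the model_* names are hypothetical, and the walker here visits every buffer in the array rather than taking a byte range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bh { bool mapped; bool delay; };

typedef int (*model_bh_fn)(const struct model_bh *bh);

/* Walk buffers in order; stop at the first non-zero return and report it. */
int model_walk_buffers(const struct model_bh *bhs, size_t n, model_bh_fn fn)
{
    for (size_t i = 0; i < n; i++) {
        int ret = fn(&bhs[i]);
        if (ret)
            return ret;
    }
    return 0;
}

/* Boolean predicate: 1 when the buffer is unmapped or delayed. */
int model_unmapped_or_delay(const struct model_bh *bh)
{
    return !bh->mapped || bh->delay;
}

/* Check only the buffer that covers the last byte written. */
int model_check_write_end(const struct model_bh *bhs, size_t n,
                          size_t blocksize, size_t end_offset)
{
    assert(end_offset > 0 && blocksize > 0);
    size_t idx = (end_offset - 1) / blocksize;
    assert(idx < n);
    return model_unmapped_or_delay(&bhs[idx]);
}
```

For a 4k page with 1k buffers where only the first buffer is an unmapped hole, the walker reports 1 as soon as it sees that hole, while the end-of-write check for a write ending at byte 3072 correctly reports 0.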

Add an additional fix from Aneesh to protect the i_disksize update against a race with truncate.

Signed-off-by: Mingming Cao
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o"

Mingming Cao
2008-07-12 07:27:31 +0800
f0e6c9859 ext4: Handle page without buffers in ext4_*_writepage()

It can happen that buffers are removed from the page before it gets
marked dirty and then is passed to writepage(). In writepage() we just
initialize the buffers and check whether they are mapped and non
delay. If they are mapped and non delay we write the page. Otherwise we
mark them dirty. With this change we don't do block allocation at all
in ext4_*_write_page.

writepage() can get called under many conditions, and with a locking order
of journal_start -> lock_page, we should not try to allocate blocks in
writepage(), which gets called after taking the page lock. writepage() can
get called via shrink_page_list even with a journal handle which was
created for doing an inode update. For example, when doing
ext4_da_write_begin we create a journal handle with credit 1, expecting an
i_disksize update for the inode. But ext4_da_write_begin can cause
shrink_page_list via _grab_page_cache. So having a valid handle via
ext4_journal_current_handle is not a guarantee that we can use the
handle for block allocation in writepage, since we shouldn't be using
credits that had been reserved for other updates. That could result
in us running out of credits when we update inodes.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Aneesh Kumar K.V
2008-07-12 07:27:31 +0800
cd1aac329 ext4: Add ordered mode support for delalloc

This provides a new ordered mode implementation which gets rid of using
buffer heads to enforce the ordering between a metadata change and the
related data change. Instead, in the new ordering mode, it keeps track
of all of the inodes touched by each transaction on a list, and when
that transaction is committed, it flushes all of the dirty pages for
those inodes. In addition, the new ordered mode reverses the lock
ordering of the page lock and transaction lock, which provides easier
support for delayed allocation.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Aneesh Kumar K.V
2008-07-12 07:27:31 +0800
61628a3f3 ext4: Invert lock ordering of page_lock and transaction start in delalloc

With the reverse locking, we need to start a transaction before taking
the page lock, so in ext4_da_writepages() we need to break the write-out
into chunks, and restart the journal for each chunk to ensure the
write-out fits in a single transaction.

Updated patch from Aneesh Kumar K.V
which fixes delalloc sync hang with journal lock inversion, and address
the performance regression issue.

Signed-off-by: Mingming Cao
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Mingming Cao
2008-07-12 07:27:31 +0800
e8ced39d5 percpu_counter: new function percpu_counter_sum_and_set

Delayed allocation needs to check free blocks at every write time.
percpu_counter_read_positive() is not quite accurate. Delayed
allocation needs more accurate accounting, but calling
percpu_counter_sum_positive() frequently is quite expensive.

This patch adds a new function that updates the central counter while
summing the per-cpu counters, which makes the next
percpu_counter_read() more accurate and requires fewer calls to the expensive
percpu_counter_sum().
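The idea can be illustrated with a single-threaded model of a per-CPU counter. This is a sketch under assumptions: the struct layout, MODEL_NR_CPUS, and the model_* functions are hypothetical, and the locking that a real implementation needs around the fold is left out.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4  /* assumption: fixed CPU count for the model */

struct model_percpu_counter {
    long count;                 /* central value, cheap to read */
    long local[MODEL_NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/* Cheap, possibly stale read: ignores the per-CPU deltas. */
long model_read(const struct model_percpu_counter *c)
{
    return c->count;
}

/* Accurate but expensive: walks every CPU, leaves the counter as is. */
long model_sum(const struct model_percpu_counter *c)
{
    long sum = c->count;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}

/* Accurate sum that also folds the deltas into the central count. */
long model_sum_and_set(struct model_percpu_counter *c)
{
    long sum = c->count;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        sum += c->local[cpu];
        c->local[cpu] = 0;
    }
    c->count = sum;
    return sum;
}
```

After one call to model_sum_and_set(), the cheap model_read() returns the exact value until new per-CPU deltas accumulate, so the expensive walk is needed less often.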

Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Mingming Cao
2008-07-12 07:27:31 +0800
64769240b ext4: Add delayed allocation support in data=writeback mode

Updated with fixes from Mingming Cao to unlock and
release the page from page cache if the delalloc write_begin failed, and
properly handle preallocated blocks. Also added a fix to clear
buffer_delay in block_write_full_page() after allocating a delayed
buffer.

Updated with fixes from Aneesh Kumar K.V
to update i_disksize properly and to add bmap support for delayed
allocation.

Updated with a fix from Valerie Clement to
avoid filesystem corruption when the filesystem is mounted with the
delalloc option and blocksize < pagesize.

Signed-off-by: Alex Tomas
Signed-off-by: Mingming Cao
Signed-off-by: Dave Kleikamp
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Aneesh Kumar K.V

Alex Tomas
2008-07-12 07:27:31 +0800
678aaf481 ext4: Use new framework for data=ordered mode in JBD2

This patch makes ext4 use inode-based implementation of data=ordered mode
in JBD2. It allows us to unify some data=ordered and data=writeback paths
(especially writepage since we don't have to start a transaction anymore)
and remove some buffer walking.

Updated fix from Aneesh Kumar K.V
to fix file system hang due to corrupt jinode values.

Signed-off-by: Jan Kara
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Jan Kara
2008-07-12 07:27:31 +0800
9ddfc3dc7 ext4: Fix lock inversion in ext4_ext_truncate()

We cannot call ext4_orphan_add() from under i_data_sem because that
causes a lock ordering violation between i_data_sem and the
superblock lock.

Updated with Aneesh's locking order fix

Signed-off-by: Jan Kara
Signed-off-by: Mingming Cao
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o"

Jan Kara
2008-07-12 07:27:31 +0800
cf108bca4 ext4: Invert the locking order of page_lock and transaction start

These changes are needed to support data=ordered mode handling via
inodes. This enables us to get rid of the journal heads and buffer
heads for data buffers in the ordered mode. With the changes, during
transaction commit we write out the inode pages using
writepages()/writepage(). That implies we take the page lock during
transaction commit. This can cause a deadlock with the locking order
page_lock -> jbd2_journal_start, since jbd2_journal_start can wait
for the journal_commit to happen and the journal_commit now needs to
take the page lock. To avoid this deadlock, reverse the locking order.

Signed-off-by: Jan Kara
Signed-off-by: Mingming Cao
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o"

Jan Kara
2008-07-12 07:27:31 +0800
2e9ee8503 ext4: Use page_mkwrite vma_operations to get mmap write notification.

We would like to get notified when we are doing a write on an mmap section.
This is needed with respect to the preallocated area. We split the preallocated
area into an initialized extent and an uninitialized extent in the callback. This
lets us handle ENOSPC better. Otherwise we get ENOSPC in writepage and
that would result in data loss. The changes are also needed to handle ENOSPC
when writing to an mmap section of files with holes.

Acked-by: Jan Kara
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Aneesh Kumar K.V
2008-07-12 07:27:31 +0800
5f21b0e64 ext4: fix online resize with mballoc

Update group infos when updating a group's descriptor.
Add group infos when adding a group's descriptor.
Refresh cache pages used by mb_alloc when changes occur.
This will probably need modifications when META_BG resizing will be allowed.

Signed-off-by: Frederic Bohe
Signed-off-by: Mingming Cao

Frederic Bohe
2008-07-12 07:27:31 +0800
953e622b6 ext4: use atomic functions to set bh_state

Use the BUFFER_FNS functions (set_buffer_foo) to set buffer
head state atomically instead of nonatomic __set_bit().
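The difference between the two styles can be shown with C11 atomics as a user-space analogy. This is not the kernel's bitops implementation: struct model_bh, the bit numbers, and the model_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers, for illustration only. */
enum { MODEL_BH_UPTODATE = 0, MODEL_BH_DIRTY = 1, MODEL_BH_MAPPED = 5 };

struct model_bh { atomic_ulong b_state; };

/*
 * Non-atomic read-modify-write: if another thread updates a different
 * bit of the same word between the load and the store, that update
 * can be lost.
 */
void model_set_bit_nonatomic(unsigned long *word, int bit)
{
    *word |= 1UL << bit;
}

/* Atomic read-modify-write: concurrent updates to other bits survive. */
void model_set_buffer_flag(struct model_bh *bh, int bit)
{
    atomic_fetch_or(&bh->b_state, 1UL << bit);
}

bool model_buffer_flag(struct model_bh *bh, int bit)
{
    return (atomic_load(&bh->b_state) >> bit) & 1UL;
}
```

Both helpers give the same result when only one thread touches the word; the atomic form matters because the state word is shared and other flags may be set or cleared concurrently.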

Signed-off-by: Eric Sandeen
Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Eric Sandeen
2008-07-12 07:27:31 +0800
47b4a50be ext4: Set journal pointer to NULL when journal is released

Set sbi->s_journal to NULL after we call journal_destroy(). This
will be later needed because after journal_destroy() is called,
ext4_clear_inode() can still be called for some inodes (e.g. root
inode) and we'll need to detect there that journal doesn't exist
anymore.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2008-07-12 07:27:31 +0800
070314310 ext4: mballoc avoid use root reserved blocks for non root allocation

mballoc allocation missed check for blocks reserved for root users. Add
ext4_has_free_blocks() check before allocation. Also modified
ext4_has_free_blocks() to support multiple block allocation request.
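A check of this kind might look like the sketch below. It is a user-space model under assumptions: model_has_free_blocks() and its parameters are hypothetical, and returning the number of blocks that can be granted (possibly fewer than requested) is one plausible reading of "multiple block allocation request", not a statement of the actual return convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: how many of the 'nblocks' requested blocks may
 * be allocated. Unprivileged callers cannot dip into the root reserve.
 */
uint64_t model_has_free_blocks(uint64_t free_blocks, uint64_t root_reserved,
                               bool privileged, uint64_t nblocks)
{
    uint64_t reserve = privileged ? 0 : root_reserved;

    if (free_blocks <= reserve)
        return 0;

    uint64_t avail = free_blocks - reserve;
    return avail < nblocks ? avail : nblocks;
}
```

With 100 free blocks and 20 reserved for root, an unprivileged request for 90 blocks is trimmed to 80, while the same request from a privileged caller is granted in full.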

Signed-off-by: Mingming Cao
Signed-off-by: "Theodore Ts'o"

Mingming Cao
2008-07-12 07:27:31 +0800