Doug / smarc-fsl-linux-kernel | Embedian Git Server

24 Sep, 2009

1 commit

db1682636 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6

* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitrary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...

Linus Torvalds
2009-09-24 22:53:22 +0800

23 Sep, 2009

1 commit

a4255e4c1 ext2: fix format string compile warning (ino_t)

Unlike on most other architectures, ino_t is an unsigned int on s390. So
add an explicit cast to avoid this compile warning:

fs/ext2/namei.c: In function 'ext2_lookup':
fs/ext2/namei.c:73: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
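
The cast can be illustrated in userspace. This is a minimal sketch, not the kernel change itself: `format_ino` is a hypothetical helper introduced only for illustration, and the cast to `unsigned long` is assumed to be the one the message refers to, since the format is '%lu'.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper, not part of the patch: formats an inode number.
 * ino_t may be unsigned int or unsigned long depending on the
 * architecture, so it is cast explicitly to match "%lu". */
static int format_ino(char *buf, size_t len, ino_t ino)
{
	return snprintf(buf, len, "%lu", (unsigned long)ino);
}
```

With the cast in place, the same format string compiles without the warning whichever width ino_t has.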

Signed-off-by: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2009-09-23 22:39:58 +0800

22 Sep, 2009

1 commit

83d5cde47 const: make block_device_operations const

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-09-22 22:17:25 +0800

16 Sep, 2009

1 commit

aa261f549 HWPOISON: Enable .remove_error_page for migration aware file systems

Enable removing of corrupted pages through truncation
for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
These should cover most server needs.

I chose the set of migration aware file systems for this
for now, assuming they have been especially audited.
But in general it should be safe for all file systems
on the data area that support read/write and truncate.

Caveat: the hardware error handler does not take i_mutex
for now before calling the truncate function. Is that ok?

Cc: tytso@mit.edu
Cc: hch@infradead.org
Cc: mfasheh@suse.com
Cc: aia21@cantab.net
Cc: hugh.dickins@tiscali.co.uk
Cc: swhiteho@redhat.com
Signed-off-by: Andi Kleen

Andi Kleen
2009-09-16 17:50:16 +0800

14 Sep, 2009

1 commit

a2a735ad6 ext2: Update comment about generic_osync_inode

We rely on generic_write_sync() now.

CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara

Jan Kara
2009-09-14 23:08:16 +0800

09 Sep, 2009

1 commit

1d5ccd1c4 ext[234]: move over to 'check_acl' permission model

Don't implement per-filesystem 'extX_permission()' functions that have
to be called for every path component operation, and instead just expose
the actual ACL checking so that the VFS layer can now do it for us.

Reviewed-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: Linus Torvalds

Linus Torvalds
2009-09-09 02:09:04 +0800

06 Sep, 2009

1 commit

9de6886ec ext2: fix unbalanced kmap()/kunmap()

In ext2_rename(), dir_page is acquired through ext2_dotdot(). It is
then released through ext2_set_link(), but only if old_dir != new_dir.
Failing that, the pkmap reference count is never decremented and the
page remains pinned forever. Repeat that a couple of times with highmem
pages and all pkmap slots get exhausted, and every further kmap() call
ends up stalling on the pkmap_map_wait queue, at which point the whole
system comes to a halt.
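
The leak pattern can be modelled with a toy slot pool. This is only a sketch under stated assumptions: the pool size, `toy_map`, `toy_unmap` and `toy_rename` are all hypothetical stand-ins, and where the real kmap() would sleep on exhaustion the model returns false instead.

```c
#include <stdbool.h>

#define POOL_SLOTS 4	/* assumed tiny pool so exhaustion is easy to see */

static int slots_in_use;

/* Toy stand-ins for kmap()/kunmap(): take and return one pool slot. */
static bool toy_map(void)
{
	if (slots_in_use == POOL_SLOTS)
		return false;	/* the real kmap() would sleep here instead */
	slots_in_use++;
	return true;
}

static void toy_unmap(void)
{
	slots_in_use--;
}

/* Models the rename path: the mapping is always taken, but before the
 * fix it was only released when the directories differed. */
static bool toy_rename(bool same_dir, bool with_fix)
{
	if (!toy_map())
		return false;
	if (!same_dir)
		toy_unmap();	/* released on the cross-directory path */
	else if (with_fix)
		toy_unmap();	/* the release the same-directory path lacked */
	return true;
}
```

Calling `toy_rename(true, false)` more than POOL_SLOTS times fails in the model, while `toy_rename(true, true)` can be repeated indefinitely.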

Signed-off-by: Nicolas Pitre
Acked-by: Theodore Ts'o
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Nicolas Pitre
2009-09-06 04:41:08 +0800

13 Jul, 2009

1 commit

405f55712 headers: smp_lock.h redux

* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-07-13 03:22:34 +0800

01 Jul, 2009

1 commit

4d6c13f87 ext2: return -EIO not -ESTALE on directory traversal through deleted inode

ext2_iget() returns -ESTALE if invoked on a deleted inode, in order to
report errors to NFS properly. However, in ext[234]_lookup(), this
-ESTALE can be propagated to userspace if the filesystem is corrupted such
that a directory entry references a deleted inode. This leads to a
misleading error message - "Stale NFS file handle" - and confusion on the
part of the admin.

The bug can be easily reproduced by creating a new filesystem, making a
link to an unused inode using debugfs, then mounting and attempting to ls
-l said link.

This patch thus changes ext2_lookup to return -EIO if it receives -ESTALE
from ext2_iget(), as ext2 does for other filesystem metadata corruption;
and also invokes the appropriate ext*_error functions when this case is
detected.
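
The error translation amounts to a small mapping. This is a hedged sketch: `lookup_error` is a hypothetical helper, and the real patch also reports the corruption through the filesystem's error function, which is not modelled here.

```c
#include <errno.h>

/* Hypothetical helper: translate the error a lookup path got from an
 * iget-style call. -ESTALE is meaningful to NFS, but from a directory
 * lookup it means an entry points at a deleted inode, i.e. corruption. */
static int lookup_error(int iget_err)
{
	if (iget_err == -ESTALE)
		return -EIO;
	return iget_err;
}
```

All other error codes pass through unchanged, so only the misleading "Stale NFS file handle" case is affected.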

Signed-off-by: Bryan Donlan
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bryan Donlan
2009-07-01 09:56:00 +0800

24 Jun, 2009

2 commits

073aaa1b1 helpers for acl caching + switch to those

helpers: get_cached_acl(inode, type), set_cached_acl(inode, type, acl),
forget_cached_acl(inode, type).

ubifs/xattr.c needed includes reordered, the rest is a plain switchover.

Signed-off-by: Al Viro

Al Viro
2009-06-24 20:17:07 +0800
5e78b4356 switch ext2 to inode->i_acl

Signed-off-by: Al Viro

Al Viro
2009-06-24 20:15:28 +0800

19 Jun, 2009

1 commit

39fe7557b ext2: Do not update mtime of a moved directory

One of our users is complaining that his backup tool is upset on ext2
(while it's happy on ext3, xfs, ...) because of the mtime change.

The problem is:

mkdir foo
mkdir bar
mkdir foo/a

Now under ext2:
mv foo/a foo/b

changes mtime of 'foo/a' (foo/b after the move). That does not really
make sense and it does not happen under any other filesystem I've seen.

More complicated is:
mv foo/a bar/a

This changes mtime of foo/a (bar/a after the move) and it makes some
sense since we had to update parent directory pointer of foo/a. But
again, no other filesystem does this. So after some thoughts I'd vote
for consistency and change ext2 to behave the same as other filesystems.

Do not update mtime of a moved directory. Specs don't say anything
about it (neither that it should, nor that it should not be updated) and
other common filesystems (ext3, ext4, xfs, reiserfs, fat, ...) don't do
it. So let's become more consistent.

Spotted by ronny.pretzsch@dfs.de, initial fix by Jörn Engel.

Reported-by:
Cc:
Cc: Jörn Engel
Signed-off-by: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2009-06-19 04:03:44 +0800

13 Jun, 2009

1 commit

88164ff4f trivial: ext2: fix a typo in comment in ext2.h

Signed-off-by: Ali Gholami Rudi
Signed-off-by: Jiri Kosina

Ali Gholami Rudi
2009-06-13 00:01:44 +0800

12 Jun, 2009

5 commits

40f31dd47 ext2: add ->sync_fs

Add a ->sync_fs method for data integrity syncs, and reimplement
->write_super on top of it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:15 +0800
e1740a462 switch ext2 to simple_fsync()

kill ext2_sync_file() (along with ext2/fsync.c), get rid of
ext2_update_inode() - it's an alias of ext2_write_inode().

Signed-off-by: Al Viro

Al Viro
2009-06-12 09:36:12 +0800
337eb00a2 Push BKL down into ->remount_fs()

[xfs, btrfs, capifs, shmem don't need BKL, exempt]

Signed-off-by: Alessio Igor Bogani
Signed-off-by: Al Viro

Alessio Igor Bogani
2009-06-12 09:36:11 +0800
6cfd01484 push BKL down into ->put_super

Move BKL into ->put_super from the only caller. A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.

[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:07 +0800
8c85e1251 remove ->write_super call in generic_shutdown_super

We just did a full fs writeout using sync_filesystem before, and if
that's not enough for the filesystem it can perform its own writeout
in ->put_super, which many filesystems already do.

Move a call to foofs_write_super into every foofs_put_super for now to
guarantee identical behaviour until it's cleaned up by the individual
filesystem maintainers.

Exceptions:

- affs already has identical copy & pasted code at the beginning of
affs_put_super so no need to do it twice.
- xfs does the right thing without it and I have changes pending for
the xfs tree touching this area, so I don't really need conflicts
here.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:06 +0800

18 May, 2009

1 commit

0f7ee7c17 ext2: Fix memory leak in ext2_fill_super() in case of a failed mount

Signed-off-by: Manish Katiyar
Signed-off-by: "Theodore Ts'o"

Manish Katiyar
2009-05-18 11:52:51 +0800

27 Apr, 2009

1 commit

a069e9cee ext2: missing unlock in ext2_quota_write()

The inode->i_mutex should be unlocked.

Found by smatch (http://repo.or.cz/w/smatch.git). Compile tested.

Signed-off-by: Dan Carpenter
Signed-off-by: Jan Kara

Dan Carpenter
2009-04-27 22:49:52 +0800

14 Apr, 2009

1 commit

316cb4ef3 ext2: fix data corruption for racing writes

If two writers allocating blocks to file race with each other (e.g.
because writepages races with ordinary write or two writepages race with
each other), ext2_getblock() can be called on the same inode in parallel.
Before we are going to allocate new blocks, we have to recheck the block
chain we have obtained so far without holding truncate_mutex. Otherwise
we could overwrite the indirect block pointer set by the other writer
leading to data loss.

The test program below, by Ying, is able to reproduce the data loss with
ext2 on BRD in a few minutes if the machine is under memory pressure:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

long kMemSize = 50 << 20;
int kPageSize = 4096;

int main(int argc, char **argv) {
	int status;
	int count = 0;
	int i;
	char *fname = "/mnt/test.mmap";
	char *mem;
	unlink(fname);
	int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600);
	status = ftruncate(fd, kMemSize);
	mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	// Fill the memory with 1s.
	memset(mem, 1, kMemSize);
	sleep(2);
	// Count pages whose first byte reads back as 0 (lost data).
	for (i = 0; i < kMemSize; i++) {
		int byte_good = mem[i] != 0;
		if (!byte_good && ((i % kPageSize) == 0)) {
			//printf("%d ", i / kPageSize);
			count++;
		}
	}
	munmap(mem, kMemSize);
	close(fd);
	unlink(fname);

	if (count > 0) {
		printf("Running %d bad page\n", count);
		return 1;
	}
	return 0;
}
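
Separately from the reproducer, the recheck that the commit message describes can be sketched as a small validation pass. This is a toy model under stated assumptions: `struct toy_step` and `toy_chain_valid` are hypothetical names, and the locking itself (truncate_mutex in the message) is not modelled; the check is simply assumed to run with the lock held.

```c
#include <stdbool.h>
#include <stddef.h>

/* One step of a toy block chain: where a pointer lives and the value
 * that was read from it while walking without the lock. */
struct toy_step {
	const unsigned int *slot;
	unsigned int seen;
};

/* Returns true if every pointer still holds the value seen earlier.
 * In the scenario above this check would run with the lock held,
 * just before allocating; a false result means another writer got
 * there first and the walk has to be repeated. */
static bool toy_chain_valid(const struct toy_step *chain, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (*chain[i].slot != chain[i].seen)
			return false;
	return true;
}
```

If the check fails, the caller would re-read the chain instead of overwriting the pointer the other writer has just set.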

Cc: Ying Han
Cc: Nick Piggin
Signed-off-by: Jan Kara
Cc: Mingming Cao
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2009-04-14 06:04:33 +0800

01 Apr, 2009

1 commit

ce3b0f8d5 New helper - current_umask()

current->fs->umask is what most of fs_struct users are doing.
Put that into a helper function.

Signed-off-by: Al Viro

Al Viro
2009-04-01 11:00:26 +0800

26 Mar, 2009

2 commits

c16831b4c ext2: Zero our b_size in ext2_quota_read()

ext2_quota_read() doesn't initialize tmp_bh.b_size before calling
ext2_get_block() where we access it. Since it is a local variable it
might contain some garbage. Make sure it is filled with a reasonable
value before passing.
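
The pattern is the usual one for on-stack structures handed to a callee that reads them. A minimal sketch, assuming hypothetical names (`struct toy_bh`, `toy_get_block`, `toy_quota_read`) and assuming the block size is the "reasonable value" meant above:

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the on-stack buffer head; only the field the
 * commit mentions is modelled. */
struct toy_bh {
	size_t b_size;
};

/* Toy callee that, like the get_block call described above, reads
 * b_size from the caller's structure. */
static size_t toy_get_block(const struct toy_bh *bh)
{
	return bh->b_size;
}

static size_t toy_quota_read(size_t blocksize)
{
	struct toy_bh tmp_bh;

	memset(&tmp_bh, 0, sizeof(tmp_bh));	/* no stack garbage left */
	tmp_bh.b_size = blocksize;		/* set before the call */
	return toy_get_block(&tmp_bh);
}
```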

Signed-off-by: Manish Katiyar
Signed-off-by: Jan Kara

Manish Katiyar
2009-03-26 09:18:38 +0800
6f90bee50 ext2: Use lowercase names of quota functions

Use lowercase names of quota functions instead of old uppercase ones.

Signed-off-by: Jan Kara
CC: linux-ext4@vger.kernel.org

Jan Kara
2009-03-26 09:18:36 +0800

12 Feb, 2009

1 commit

0e4a9b592 ext2/xip: refuse to change xip flag during remount with busy inodes

For a reason that I was unable to understand in three months of debugging,
mount ext2 -o remount stopped working properly when remounting from
regular operation to xip, or the other way around. According to a git
bisect search, the problem was introduced with the VM_MIXEDMAP/PTE_SPECIAL
rework in the vm:

commit 70688e4dd1647f0ceb502bbd5964fa344c5eb411
Author: Nick Piggin
Date: Mon Apr 28 02:13:02 2008 -0700

xip: support non-struct page backed memory

In the failing scenario, the filesystem is mounted read only via root=
kernel parameter on s390x. During remount (in rc.sysinit), the inodes of
the bash binary and its libraries are busy and cannot be invalidated (the
bash which is running rc.sysinit resides on subject filesystem).
Afterwards, another bash process (running ifup-eth) recurses into a
subshell, runs dup_mm (via fork). Some of the mappings in this bash
process were created from inodes that could not be invalidated during
remount.

Both parent and child process crash some time later due to inconsistencies
in their address spaces. The issue seems to be timing sensitive, various
attempts to recreate it have failed.

This patch refuses to change the xip flag during remount in case some
inodes cannot be invalidated. This patch keeps users from running into
that issue.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Carsten Otte
Cc: Nick Piggin
Cc: Jared Hulbert
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Carsten Otte
2009-02-12 06:25:36 +0800

16 Jan, 2009

1 commit

6b7021ef7 ext2: also update the inode on disk when dir is IS_DIRSYNC

We used to just write changed page for IS_DIRSYNC inodes. But we also
have to update the directory inode itself just for the case that we've
allocated a new block and changed i_size.

[akpm@linux-foundation.org: still sync the data page]
Signed-off-by: Jan Kara
Tested-by: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2009-01-16 08:39:42 +0800

09 Jan, 2009

4 commits

ef8b64618 ext2: tighten restrictions on inode flags

At the moment there are few restrictions on which flags may be set on
which inodes. Specifically, DIRSYNC may only be set on directories, and
IMMUTABLE and APPEND may not be set on links. Tighten that to disallow
TOPDIR being set on non-directories, and to allow only NODUMP and NOATIME
to be set on inodes that are neither regular files nor directories.

Introduce a flags masking function which masks flags based on mode, and
use it during inode creation and when flags are set via the ioctl to
facilitate future consistency.
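
A masking function of this kind can be sketched as follows. The flag values, the `enum toy_kind` simplification of the inode mode, and all `TOY_`/`toy_` names are assumptions made for illustration, not the patch's definitions.

```c
#include <stdint.h>

/* Flag values are assumptions made for this sketch. */
#define TOY_NODUMP_FL	0x00000040u
#define TOY_NOATIME_FL	0x00000080u
#define TOY_DIRSYNC_FL	0x00010000u
#define TOY_TOPDIR_FL	0x00020000u

/* Regular files: everything except the directory-only flags. */
#define TOY_REG_FLMASK	 (~(TOY_DIRSYNC_FL | TOY_TOPDIR_FL))
/* Other non-directories: only NODUMP and NOATIME. */
#define TOY_OTHER_FLMASK (TOY_NODUMP_FL | TOY_NOATIME_FL)

/* Simplified stand-in for the file type encoded in an inode mode. */
enum toy_kind { TOY_DIR, TOY_REG, TOY_OTHER };

static uint32_t toy_mask_flags(enum toy_kind kind, uint32_t flags)
{
	if (kind == TOY_DIR)
		return flags;			/* directories keep everything */
	if (kind == TOY_REG)
		return flags & TOY_REG_FLMASK;
	return flags & TOY_OTHER_FLMASK;
}
```

Routing both inode creation and the ioctl path through one function like this keeps the two in agreement about which flags are legal for which file type.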

Signed-off-by: Duane Griffin
Acked-by: Andreas Dilger
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Duane Griffin
2009-01-09 00:31:00 +0800
0e090f1e0 ext2: don't inherit inappropriate inode flags from parent

At present BTREE/INDEX is the only flag that new ext2 inodes do NOT
inherit from their parent. In addition prevent the flags DIRTY, ECOMPR,
INDEX, IMAGIC and TOPDIR from being inherited. List inheritable flags
explicitly to prevent future flags from accidentally being inherited.

This fixes the TOPDIR flag inheritance bug reported at
http://bugzilla.kernel.org/show_bug.cgi?id=9866.
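
Listing inheritable flags explicitly is an allow-list. A minimal sketch follows; the flag values and the membership of `TOY_FL_INHERITED` are assumptions for illustration only and are not taken from the patch.

```c
#include <stdint.h>

/* Flag values and the exact membership of the mask are assumptions. */
#define TOY_SYNC_FL	0x00000008u
#define TOY_NODUMP_FL	0x00000040u
#define TOY_NOATIME_FL	0x00000080u
#define TOY_INDEX_FL	0x00001000u
#define TOY_TOPDIR_FL	0x00020000u

/* Allow-list: a flag not named here is never inherited, including any
 * flag added in the future. */
#define TOY_FL_INHERITED (TOY_SYNC_FL | TOY_NODUMP_FL | TOY_NOATIME_FL)

static uint32_t toy_inherit_flags(uint32_t parent_flags)
{
	return parent_flags & TOY_FL_INHERITED;
}
```

Compared with masking out a deny-list, this fails safe: a newly defined flag stays non-inheritable until someone adds it to the mask deliberately.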

Signed-off-by: Duane Griffin
Acked-by: Andreas Dilger
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Duane Griffin
2009-01-09 00:31:00 +0800
18a82eb9f ext2: allocate ->s_blockgroup_lock separately

As spotted by kmemtrace, struct ext2_sb_info is 17024 bytes on 64-bit
which makes it a very bad fit for SLAB allocators. The culprit of the
wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when
NR_CPUS >= 32.

To fix that, allocate ->s_blockgroup_lock separately; it fits nicely in an
order-2 page allocation in the worst case. This shrinks struct ext2_sb_info
enough to fit a 1 KB slab cache, so now we allocate 16 KB + 1 KB instead of
32 KB, saving 15 KB of memory.
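
The layout change can be sketched in userspace with two toy structures. The member sizes and every `toy_` name here are assumptions; only the idea of replacing a large embedded member with a separately allocated one comes from the message.

```c
#include <stdlib.h>

#define TOY_LOCK_BYTES (16 * 1024)	/* stand-in for the large lock array */

/* Before: the large member is embedded, so the whole struct is > 16 KB. */
struct toy_sb_embedded {
	char small_fields[512];
	char blockgroup_lock[TOY_LOCK_BYTES];
};

/* After: only a pointer is embedded; the array is allocated on its own. */
struct toy_sb_split {
	char small_fields[512];
	char *blockgroup_lock;
};

static struct toy_sb_split *toy_sb_alloc(void)
{
	struct toy_sb_split *sb = calloc(1, sizeof(*sb));

	if (!sb)
		return NULL;
	sb->blockgroup_lock = calloc(1, TOY_LOCK_BYTES);
	if (!sb->blockgroup_lock) {
		free(sb);	/* undo the first allocation on failure */
		return NULL;
	}
	return sb;
}

static void toy_sb_free(struct toy_sb_split *sb)
{
	if (!sb)
		return;
	free(sb->blockgroup_lock);
	free(sb);
}
```

The saving quoted in the message comes from size-class rounding: a single object slightly over 16 KB lands in a 32 KB class, while a 16 KB object plus a sub-1 KB object need only 17 KB between them.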

Acked-by: Andreas Dilger
Signed-off-by: Pekka Enberg
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pekka J Enberg
2009-01-09 00:31:00 +0800
22d613d13 ext2: fix ext2_splice_branch() comments

There is no argument named @chain in ext2_splice_branch, remove references
to it.

Signed-off-by: Qinghuang Feng
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Qinghuang Feng
2009-01-09 00:31:00 +0800

01 Jan, 2009

2 commits

41080b5a2 nfsd race fixes: ext2

* make ext2_new_inode() put the inode into icache in locked state
* do not unlock until the inode is fully set up; otherwise nfsd
might pick it in half-baked state.
* make sure that ext2_new_inode() does *not* lead to two inodes with the
same inumber hashed at the same time; otherwise a bogus fhandle coming
from nfsd might race with inode creation:

nfsd: iget_locked() creates inode
nfsd: try to read from disk, block on that.
ext2_new_inode(): allocate inode with that inumber
ext2_new_inode(): insert it into icache, set it up and dirty
ext2_write_inode(): get the relevant part of inode table in cache,
set the entry for our inode (and start writing to disk)
nfsd: get CPU again, look into inode table, see nice and sane on-disk
inode, set the in-core inode from it

oops - we have two in-core inodes with the same inumber live in icache,
both used for IO. Welcome to fs corruption...

Signed-off-by: Al Viro

Al Viro
2009-01-01 07:07:43 +0800
8d6d0c4da ext2: ensure fast symlinks are NUL-terminated

Ensure fast symlink targets are NUL-terminated, even if corrupted
on-disk.
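
Forcing a terminator is a one-line clamp. A minimal sketch, assuming a hypothetical helper `toy_terminate_link` and a caller that passes the last valid index of the buffer as `maxlen`:

```c
#include <stddef.h>

/* Hypothetical helper: force a terminator into an in-inode symlink
 * buffer. len is the (possibly corrupted) stored length and maxlen is
 * the last valid index of the buffer. */
static void toy_terminate_link(char *name, size_t len, size_t maxlen)
{
	name[len < maxlen ? len : maxlen] = '\0';
}
```

A sane length leaves the string as it was, while an oversized length from a corrupted inode still produces a terminated string that stays inside the buffer.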

Cc: Andrew Morton
Signed-off-by: Duane Griffin
Signed-off-by: Al Viro

Duane Griffin
2009-01-01 07:07:39 +0800

14 Nov, 2008

1 commit

a8dd4d67b CRED: Wrap task credential accesses in the Ext2 filesystem

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: linux-ext4@vger.kernel.org
Signed-off-by: James Morris

David Howells
2008-11-14 07:38:50 +0800

24 Oct, 2008

1 commit

224848564 Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev

* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
[PATCH] kill the rest of struct file propagation in block ioctls
[PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
[PATCH] get rid of blkdev_locked_ioctl()
[PATCH] get rid of blkdev_driver_ioctl()
[PATCH] sanitize blkdev_get() and friends
[PATCH] remember mode of reiserfs journal
[PATCH] propagate mode through swsusp_close()
[PATCH] propagate mode through open_bdev_excl/close_bdev_excl
[PATCH] pass fmode_t to blkdev_put()
[PATCH] kill the unused bsize on the send side of /dev/loop
[PATCH] trim file propagation in block/compat_ioctl.c
[PATCH] end of methods switch: remove the old ones
[PATCH] switch sr
[PATCH] switch sd
[PATCH] switch ide-scsi
[PATCH] switch tape_block
[PATCH] switch dcssblk
[PATCH] switch dasd
[PATCH] switch mtd_blkdevs
[PATCH] switch mmc
...

Linus Torvalds
2008-10-24 01:23:07 +0800

23 Oct, 2008

2 commits

a9885444f [PATCH] get rid of on-stack dentry in ext2_get_parent()

Signed-off-by: Al Viro

Al Viro
2008-10-23 17:13:09 +0800
440037287 [PATCH] switch all filesystems over to d_obtain_alias

Switch all users of d_alloc_anon to d_obtain_alias.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2008-10-23 17:13:01 +0800

21 Oct, 2008

2 commits

08f858512 [PATCH] move block_device_operations to blkdev.h

Signed-off-by: Al Viro

Al Viro
2008-10-21 19:47:20 +0800
6da0b38f4 fs/Kconfig: move ext2, ext3, ext4, JBD, JBD2 out

Use fs/*/Kconfig more, which is good because everything related to one
filesystem is in one place and fs/Kconfig is quite fat.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2008-10-21 02:43:59 +0800

17 Oct, 2008

2 commits

bd39597cb ext2: avoid printk floods in the face of directory corruption

A very large directory with many read failures (either due to storage
problems, or due to invalid size & blocks from corruption) will generate a
printk storm as the filesystem continues to try to read all the blocks.
This flood of messages can tie up the box until it is complete - which may
be a very long time, especially for very large corrupted values.

This is fixed by only reporting the corruption once each time we try to
read the directory.
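
Reporting once per pass can be sketched with a flag that is reset on each directory read. This is a toy model: `toy_read_dir` and its boolean block map are hypothetical, and the real code reports through the filesystem's error function and not through stderr.

```c
#include <stdbool.h>
#include <stdio.h>

/* Walks n_blocks of a toy directory; read_ok[i] says whether block i
 * could be read. Returns how many error messages were printed. */
static int toy_read_dir(const bool *read_ok, int n_blocks)
{
	bool reported = false;	/* reset on every directory read */
	int messages = 0;

	for (int i = 0; i < n_blocks; i++) {
		if (read_ok[i])
			continue;
		if (!reported) {
			fprintf(stderr, "toy: bad block %d in directory\n", i);
			reported = true;
			messages++;
		}
		/* later failures in this pass are skipped silently */
	}
	return messages;
}
```

The message count is bounded by the number of read attempts and no longer by the possibly corrupted block count.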

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"
Cc: Eugene Teo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Sandeen
2008-10-17 02:21:46 +0800
d707d31c9 ext2: fix ext2 block reservation early ENOSPC issue

We could run into an ENOSPC error on ext2 even when there are free blocks
on the filesystem.

The problem is triggered when the goal block group has 0 free blocks and
the remaining block groups are skipped due to the check of
"free_blocks < windowsz/2". The current code could fall back to
non-reservation allocation to prevent early ENOSPC after examining all the
block groups with reservation on, but this code was bypassed if the
reservation window was already turned off, which is true in this case.

This patch fixes two issues:
1) We don't need to turn off block reservation if the goal block group has
0 free blocks left; we continue to search the rest of the block groups.

The intention of the current code is to turn off block reservation if the
goal allocation group has a few (some) free blocks left (not enough to
make the desired reservation window), in order to try to allocate in the
goal block group and get better locality. But if the goal block group has
0 free blocks, it should leave block reservation on and continue searching
the next block groups, rather than turning off block reservation completely.

2) We don't need to check the window size if block reservation is off.

The problem was originally found and fixed in ext4.
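
The two conditions can be written as small predicates. This is a sketch of the logic as the message describes it, not the patch itself: both function names are hypothetical, and the comparison operators follow the wording above ("free_blocks < windowsz/2"), which may differ in detail from the actual code.

```c
#include <stdbool.h>

/* Goal group: drop the reservation only when the group has some free
 * blocks, but too few for a full window. With zero free blocks the
 * reservation stays on and the search moves to other groups. */
static bool toy_drop_reservation(unsigned int free_blocks, unsigned int windowsz)
{
	return free_blocks > 0 && free_blocks < windowsz;
}

/* Other groups: the window-size test only applies while a reservation
 * is in use. */
static bool toy_skip_group(bool reservation_on, unsigned int free_blocks,
			   unsigned int windowsz)
{
	if (free_blocks == 0)
		return true;	/* nothing to allocate here at all */
	return reservation_on && free_blocks < windowsz / 2;
}
```

With the reservation off, a group holding even a handful of free blocks is no longer skipped, which is what avoids the early ENOSPC.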

Signed-off-by: Mingming Cao
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2008-10-17 02:21:45 +0800