Eric Lee / smarc-fsl-linux-kernel

11 Sep, 2010

1 commit

fbc148701 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: log IO completion workqueue is a high priority queue
xfs: prevent reading uninitialized stack memory

Linus Torvalds
2010-09-11 09:19:26 +0800

10 Sep, 2010

12 commits

51749e47e xfs: log IO completion workqueue is a high priority queue

The workqueue implementation in 2.6.36-rcX has changed, resulting
in the workqueues no longer having dedicated threads for work
processing. This has caused severe livelocks under heavy parallel
create workloads because the log IO completions have been getting
held up behind metadata IO completions. Hence log commits would
stall, memory allocation would stall because pages could not be
cleaned, and lock contention on the AIL during inode IO completion
processing was being seen to slow everything down even further.

By making the log IO completion workqueue a high priority workqueue,
log IO completions are queued ahead of all data/metadata IO completions
and processed before them. Hence the log never
gets stalled, and operations needed to clean memory can continue as
quickly as possible. This avoids the livelock conditions and allows
the system to keep running under heavy load as per normal.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2010-09-10 23:16:54 +0800
9aea5a65a execve: make responsive to SIGKILL with large arguments

An execve with a very large total of argument/environment strings
can take a really long time in the execve system call. It runs
uninterruptibly to count and copy all the strings. This change
makes it abort the exec quickly if sent a SIGKILL.

Note that this is the conservative change, to interrupt only for
SIGKILL, by using fatal_signal_pending(). It would be perfectly
correct semantics to let any signal interrupt the string-copying in
execve, i.e. use signal_pending() instead of fatal_signal_pending().
We'll save that change for later, since it could have user-visible
consequences, such as having a timer set too quickly make it so that
an execve can never complete, though it always happened to work before.
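
As an illustration only, the idea of aborting a long string-copying loop on a fatal signal can be sketched in userspace C. Everything here is hypothetical: `demo_fatal_pending` stands in for the kernel's fatal_signal_pending() check, `demo_copy_strings` is not the kernel's copy_strings(), and the `-EINTR` return value is an assumption chosen for the sketch rather than the code the patch actually returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "a SIGKILL is pending for this task". */
static int demo_fatal_pending = 0;

/*
 * Copy n NUL-terminated strings back to back into dst (capacity cap).
 * The fatal-signal flag is checked once per string, so a huge argument
 * list can be abandoned quickly instead of being copied to completion.
 * Returns the number of bytes copied, or a negative errno-style value.
 */
static long demo_copy_strings(char *dst, size_t cap,
                              const char *const *src, size_t n)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        size_t len;

        if (demo_fatal_pending)
            return -EINTR;          /* assumed error code for the sketch */

        len = strlen(src[i]) + 1;   /* include the terminating NUL */
        if (len > cap - used)
            return -E2BIG;          /* arguments do not fit */

        memcpy(dst + used, src[i], len);
        used += len;
    }
    return (long)used;
}
```

Checking only a "fatal" flag mirrors the conservative choice described above: ordinary signals never interrupt the loop, so existing behaviour is unchanged unless the process is being killed.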

Signed-off-by: Roland McGrath
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Linus Torvalds

Roland McGrath
2010-09-10 23:10:26 +0800
7993bc1f4 execve: improve interactivity with large arguments

This adds a preemption point during the copying of the argument and
environment strings for execve, in copy_strings(). There is already
a preemption point in the count() loop, so this doesn't add any new
points in the abstract sense.

When the total argument+environment strings are very large, the time
spent copying them can be much more than a normal user time slice.
So this change improves the interactivity of the rest of the system
when one process is doing an execve with very large arguments.

Signed-off-by: Roland McGrath
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Linus Torvalds

Roland McGrath
2010-09-10 23:10:26 +0800
1b528181b setup_arg_pages: diagnose excessive argument size

The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not
check the size of the argument/environment area on the stack.
When it is unworkably large, shift_arg_pages() hits its BUG_ON.
This is exploitable with a very large RLIMIT_STACK limit, to
create a crash pretty easily.

Check that the initial stack is not too large to make it possible
to map in any executable. We're not checking that the actual
executable (or interpreter, for binfmt_elf) will fit. So those
mappings might clobber part of the initial stack mapping. But
that is just userland lossage that userland made happen, not a
kernel problem.

Signed-off-by: Roland McGrath
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Linus Torvalds

Roland McGrath
2010-09-10 23:10:26 +0800
ff3cb3fec Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: Range check cpu in blk_cpu_to_group
scatterlist: prevent invalid free when alloc fails
writeback: Fix lost wake-up shutting down writeback thread
writeback: do not lose wakeup events when forking bdi threads
cciss: fix reporting of max queue depth since init
block: switch s390 tape_block and mg_disk to elevator_change()
block: add function call to switch the IO scheduler from a driver
fs/bio-integrity.c: return -ENOMEM on kmalloc failure
bio-integrity.c: remove dependency on __GFP_NOFAIL
BLOCK: fix bio.bi_rw handling
block: put dev->kobj in blk_register_queue fail path
cciss: handle allocation failure
cfq-iosched: Documentation help for new tunables
cfq-iosched: blktrace print per slice sector stats
cfq-iosched: Implement tunable group_idle
cfq-iosched: Do group share accounting in IOPS when slice_idle=0
cfq-iosched: Do not idle if slice_idle=0
cciss: disable doorbell reset on reset_devices
blkio: Fix return code for mkdir calls

Linus Torvalds
2010-09-10 22:26:27 +0800
a122eb2fd xfs: prevent reading uninitialized stack memory

The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12
bytes of uninitialized stack memory, because the fsxattr struct
declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero)
the 12-byte fsx_pad member before copying it back to the user. This
patch takes care of it.
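
A minimal userspace sketch of this class of bug and one common remedy follows. The struct is only loosely modelled on fsxattr (the `demo_` names and the field layout are assumptions for illustration), and zeroing the whole struct with memset() is shown as one way to avoid leaking stale stack bytes, not necessarily the exact change made by the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: a few real fields followed by 12 bytes of padding. */
struct demo_fsxattr {
    uint32_t xflags;
    uint32_t extsize;
    uint32_t nextents;
    uint32_t projid;
    unsigned char pad[12];
};

/*
 * Fill the struct that would later be copied to the caller.
 * Zeroing everything first guarantees that fields the function never
 * assigns (here: nextents, projid and pad) do not carry old stack data.
 */
static void demo_fill(struct demo_fsxattr *fa, uint32_t xflags,
                      uint32_t extsize)
{
    memset(fa, 0, sizeof(*fa));
    fa->xflags = xflags;
    fa->extsize = extsize;
}
```

Without the memset(), the padding bytes would hold whatever happened to be on the stack, and copying the full struct out would disclose them.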

Signed-off-by: Dan Rosenberg
Reviewed-by: Eric Sandeen
Signed-off-by: Alex Elder

Dan Rosenberg
2010-09-10 20:39:28 +0800
eee743fd7 minix: fix regression in minix_mkdir()

Commit 9eed1fb721c ("minix: replace inode uid,gid,mode init with helper")
broke directory creation on minix filesystems.

Fix it by passing the needed mode flag to the inode init helper.

Signed-off-by: Jorge Boncompte [DTI2]
Cc: Dmitry Monakhov
Cc: Al Viro
Cc: [2.6.35.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jorge Boncompte [DTI2]
2010-09-10 09:57:25 +0800
3ab04d5cf vfs: take O_NONBLOCK out of the O_* uniqueness test

O_NONBLOCK on parisc has a dual value:

#define O_NONBLOCK 000200004 /* HPUX has separate NDELAY & NONBLOCK */

It is caught by the O_* bits uniqueness check and leads to a parisc
compile error. The fix would be to take O_NONBLOCK out.
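
To see why a two-bit value trips a uniqueness check, here is a small sketch in userspace C. The helper names are hypothetical and the check is a simplified runtime version of the idea (compare the number of bits set in the OR of all flags with the number of flags), not the kernel's compile-time test.

```c
#include <assert.h>

/* Count the set bits in v (portable, no compiler builtins). */
static int demo_popcount(unsigned long v)
{
    int n = 0;

    while (v) {
        v &= v - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Return 1 when every flag occupies exactly one bit of its own:
 * the OR of nflags distinct single-bit flags has exactly nflags bits set.
 */
static int demo_flags_unique(const unsigned long *flags, int nflags)
{
    unsigned long all = 0;

    for (int i = 0; i < nflags; i++)
        all |= flags[i];
    return demo_popcount(all) == nflags;
}
```

The parisc value 000200004 (octal) has two bits set, so any list containing it reports more bits than flags and fails the check even though no two flags actually collide.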

Signed-off-by: Wu Fengguang
Signed-off-by: James Bottomley
Cc: Jamie Lokier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

James Bottomley
2010-09-10 09:57:25 +0800
ee3aebdd8 binfmt_misc: fix binfmt_misc priority

Commit 74641f584da ("alpha: binfmt_aout fix") (May 2009) introduced a
regression - binfmt_misc is now consulted after binfmt_elf, which will
unfortunately break ia32el. ia32 ELF binaries on ia64 used to be matched
using binfmt_misc and executed using a wrapper. As 32bit binaries are now
matched by binfmt_elf before binfmt_misc kicks in, the wrapper is ignored.

The fix increases precedence of binfmt_misc to the original state.

Signed-off-by: Jan Sembera
Cc: Ivan Kokshaysky
Cc: Al Viro
Cc: Richard Henderson [2.6.everything.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Sembera
2010-09-10 09:57:24 +0800
ed430fec7 proc: export uncached bit properly in /proc/kpageflags

Fix the left-over old ifdef for PG_uncached in /proc/kpageflags. Now it's
used by x86, too.

Signed-off-by: Takashi Iwai
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Takashi Iwai
2010-09-10 09:57:23 +0800
7a801ac6f O_DIRECT: fix the splitting up of contiguous I/O

commit c2c6ca4 (direct-io: do not merge logically non-contiguous requests)
introduced a bug whereby all O_DIRECT I/Os were submitted a page at a time
to the block layer. The problem is that the code expected
dio->block_in_file to correspond to the current page in the dio. In fact,
it corresponds to the previous page submitted via submit_page_section.
This was purely an oversight, as the dio->cur_page_fs_offset field was
introduced for just this purpose. This patch simply uses the correct
variable when calculating whether there is a mismatch between contiguous
logical blocks and contiguous physical blocks (as described in the
comments).

I also switched the if conditional following this check to an else if, to
ensure that we never call dio_bio_submit twice for the same dio (in
theory, this should not happen, anyway).

I've tested this by running blktrace and verifying that a 64KB I/O was
submitted as a single I/O. I also ran the patched kernel through
xfstests' aio tests using xfs, ext4 (with 1k and 4k block sizes) and btrfs
and verified that there were no regressions as compared to an unpatched
kernel.

Signed-off-by: Jeff Moyer
Acked-by: Josef Bacik
Cc: Christoph Hellwig
Cc: Chris Mason
Cc: [2.6.35.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Moyer
2010-09-10 09:57:22 +0800
39aa3cb3e mm: Move vma_stack_continue into mm.h

So it can be used by all that need to check for that.

Signed-off-by: Stefan Bader
Signed-off-by: Linus Torvalds

Stefan Bader
2010-09-10 00:05:06 +0800

09 Sep, 2010

2 commits

cad46744a Merge branch 'fixes' of git://oss.oracle.com/git/tma/linux-2.6

* 'fixes' of git://oss.oracle.com/git/tma/linux-2.6:
ocfs2: Fix orphan add in ocfs2_create_inode_in_orphan
ocfs2: split out ocfs2_prepare_orphan_dir() into locking and prep functions
ocfs2: allow return of new inode block location before allocation of the inode
ocfs2: use ocfs2_alloc_dinode_update_counts() instead of open coding
ocfs2: split out inode alloc code from ocfs2_mknod_locked
Ocfs2: Fix a regression bug from mainline commit(6b933c8e6f1a2f3118082c455eef25f9b1ac7b45).
ocfs2: Fix deadlock when allocating page
ocfs2: properly set and use inode group alloc hint
ocfs2: Use the right group in nfs sync check.
ocfs2: Flush drive's caches on fdatasync
ocfs2: make __ocfs2_page_mkwrite handle file end properly.
ocfs2: Fix incorrect checksum validation error
ocfs2: Fix metaecc error messages

Linus Torvalds
2010-09-09 23:57:02 +0800
c8c727db4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix lock annotations
fuse: flush background queue on connection close

Linus Torvalds
2010-09-09 02:12:59 +0800

08 Sep, 2010

19 commits

97b8f4a9d ocfs2: Fix orphan add in ocfs2_create_inode_in_orphan

ocfs2_create_inode_in_orphan() is used by reflink to create the newly
reflinked inode simultaneously in the orphan dir. This allows us to easily
handle partially-reflinked files during recovery cleanup.

We have a problem though - the orphan dir stringifies inode # to determine
a unique name under which the orphan entry dirent can be created. Since
ocfs2_create_inode_in_orphan() needs the space allocated in the orphan dir
before it can allocate the inode, we currently call into the orphan code:

/*
* We give the orphan dir the root blkno to fake an orphan name,
* and allocate enough space for our insertion.
*/
status = ocfs2_prepare_orphan_dir(osb, &orphan_dir,
osb->root_blkno,
orphan_name, &orphan_insert);

Using osb->root_blkno might work fine on unindexed directories, but the
orphan dir can have an index. When it has that index, the above code fails
to allocate the proper index entry. Later, when we try to remove the file
from the orphan dir (using the actual inode #), the reflink operation will
fail.

To fix this, I created a function ocfs2_alloc_orphaned_file() which uses the
newly split out orphan and inode alloc code to figure out what the inode
block number will be (once allocated) and then prepare the orphan dir from
that data.

Signed-off-by: Mark Fasheh
Signed-off-by: Tao Ma

Mark Fasheh
2010-09-08 14:26:00 +0800
dd43bcde2 ocfs2: split out ocfs2_prepare_orphan_dir() into locking and prep functions

We do this because ocfs2_create_inode_in_orphan() wants to order locking of
the orphan dir with respect to locking of the inode allocator *before*
making any changes to the directory.

Signed-off-by: Mark Fasheh
Signed-off-by: Tao Ma

Mark Fasheh
2010-09-08 14:26:00 +0800
e49e27674 ocfs2: allow return of new inode block location before allocation of the inode

This allows code which needs to know the eventual block number of an inode
but can't allocate it yet due to transaction or lock ordering. For example,
ocfs2_create_inode_in_orphan() currently gives a junk blkno for preparation
of the orphan dir because it can't yet know where the actual inode is placed
- that code is actually in ocfs2_mknod_locked. This is a problem when the
orphan dirs are indexed as the junk inode number will create an index entry
which goes unused (and fails the later removal from the orphan dir). Now
with these interfaces, ocfs2_create_inode_in_orphan() can run the block
group search (and get back the inode block number) *before* any actual
allocation occurs.

Signed-off-by: Mark Fasheh
Signed-off-by: Tao Ma

Mark Fasheh
2010-09-08 14:25:59 +0800
d51349829 ocfs2: use ocfs2_alloc_dinode_update_counts() instead of open coding

ocfs2_search_chain() makes the same updates as
ocfs2_alloc_dinode_update_counts to the alloc inode. Instead of open coding
the bitmap update, use our helper function.

Signed-off-by: Mark Fasheh
Signed-off-by: Tao Ma

Mark Fasheh
2010-09-08 14:25:58 +0800
021960cab ocfs2: split out inode alloc code from ocfs2_mknod_locked

Do this by splitting the bulk of the function away from the inode allocation
code at the very top of ocfs2_mknod_locked(). Existing callers don't need to
change and won't see any difference. The new function created,
__ocfs2_mknod_locked() will be used shortly.

Signed-off-by: Mark Fasheh
Signed-off-by: Tao Ma

Mark Fasheh
2010-09-08 14:25:58 +0800
81c8c82b5 Ocfs2: Fix a regression bug from mainline commit(6b933c8e6f1a2f3118082c455eef25f9b1ac7b45 ).

This patch fixes the regression introduced by commit 6b933c8... ('ocfs2:
Avoid direct write if we fall back to buffered I/O'):

http://oss.oracle.com/bugzilla/show_bug.cgi?id=1285

Commit 6b933c8e6f1a2f3118082c455eef25f9b1ac7b45 changed __generic_file_aio_write
to generic_file_buffered_write, which does not call filemap_{write,wait}_range to flush
the page cache when an O_DIRECT write falls back to a buffered one. This broke
O_DIRECT semantics for extending O_DIRECT writes.

This patch tries to guarantee that O_DIRECT writes which fall back to buffered I/O are
flushed correctly.

Signed-off-by: Tristan Ye
Signed-off-by: Tao Ma

Tristan Ye
2010-09-08 14:25:57 +0800
9b4c0ff32 ocfs2: Fix deadlock when allocating page

We cannot call grab_cache_page() when holding filesystem locks or with
a transaction started as grab_cache_page() calls page allocation with
GFP_KERNEL flag and thus page reclaim can recurse back into the filesystem
causing deadlocks or various assertion failures. We have to use
find_or_create_page() instead and pass it GFP_NOFS as we do with other
allocations.

Acked-by: Mark Fasheh
Signed-off-by: Jan Kara
Signed-off-by: Tao Ma

Jan Kara
2010-09-08 14:25:57 +0800
b2b6ebf5f ocfs2: properly set and use inode group alloc hint

We were setting ac->ac_last_group in ocfs2_claim_suballoc_bits from
res->sr_bg_blkno. Unfortunately, res->sr_bg_blkno is going to be zero under
normal (non-fragmented) circumstances. The discontig block group patches
effectively turned off that feature. Fix this by correctly calculating what
the next group hint should be.

Acked-by: Tao Ma
Signed-off-by: Mark Fasheh
Tested-by: Goldwyn Rodrigues
Signed-off-by: Tao Ma

Mark Fasheh
2010-09-08 14:25:56 +0800
889f004a8 ocfs2: Use the right group in nfs sync check.

Now that discontig block groups have been added, an inode
can be allocated in a discontig block group. So get
the group in ocfs2_get_suballoc_slot_bit.

The old ocfs2_test_suballoc_bit gets the group block number
from the allocation inode, which is wrong. Fix it by
passing the right group.

Acked-by: Mark Fasheh
Signed-off-by: Tao Ma

Tao Ma
2010-09-08 14:25:56 +0800
04eda1a18 ocfs2: Flush drive's caches on fdatasync

When 'barrier' mount option is specified, we have to issue a cache flush
during fdatasync(2). We have to do this even if inode doesn't have
I_DIRTY_DATASYNC set because we still have to get written *data* to disk so
that they are not lost in case of crash.

Acked-by: Tao Ma
Signed-off-by: Jan Kara
Signed-off-by: Tao Ma

Jan Kara
2010-09-08 14:25:55 +0800
f63afdb2c ocfs2: make __ocfs2_page_mkwrite handle file end properly.

__ocfs2_page_mkwrite is currently broken in its handling of the file end.
1. The last page should be the page that contains byte i_size - 1.
2. The length within the last page is also calculated incorrectly.
So change them accordingly.
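
The two points above come down to simple arithmetic, sketched here in userspace C. The `demo_` names and the fixed 4 KiB page size are assumptions for illustration; the kernel uses its own page-size macros.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT 12                      /* assumed 4 KiB pages */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)

/* Index of the page that holds the last byte of the file (i_size - 1). */
static unsigned long demo_last_index(unsigned long long i_size)
{
    assert(i_size > 0);     /* an empty file has no last page */
    return (unsigned long)((i_size - 1) >> DEMO_PAGE_SHIFT);
}

/* Number of valid bytes in that last page, in the range 1..PAGE_SIZE. */
static unsigned long demo_len_in_last_page(unsigned long long i_size)
{
    assert(i_size > 0);
    return (unsigned long)(((i_size - 1) & (DEMO_PAGE_SIZE - 1)) + 1);
}
```

Using i_size - 1 rather than i_size matters exactly when the file ends on a page boundary: a 4096-byte file ends in page 0 with 4096 valid bytes, not in page 1 with 0 bytes.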

Acked-by: Mark Fasheh
Signed-off-by: Tao Ma

Tao Ma
2010-09-08 14:25:55 +0800
f5ce5a08a ocfs2: Fix incorrect checksum validation error

For local mounts, ocfs2_read_locked_inode() calls ocfs2_read_blocks_sync() to
read the inode off the disk. The latter first checks to see if that block is
cached in the journal, and, if so, returns that block. That is ok.

But ocfs2_read_locked_inode() goes wrong when it tries to validate the checksum
of such blocks. Blocks that are cached in the journal may not have had their
checksum computed as yet. We should not validate the checksums of such blocks.

Fixes ossbz#1282
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1282

Signed-off-by: Sunil Mushran
Cc: stable@kernel.org
Signed-off-by: Tao Ma

Sunil Mushran
2010-09-08 14:25:54 +0800
dc696aced ocfs2: Fix metaecc error messages

Like the tools, the checksum validation function now prints the values in hex.

Signed-off-by: Sunil Mushran
Signed-off-by: Tao Ma

Sunil Mushran
2010-09-08 14:25:53 +0800
4f63e3c5b Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux:
nfsd4: mask out non-access bits in nfs4_access_to_omode

Linus Torvalds
2010-09-08 10:21:02 +0800
fa2925cf9 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: Make fiemap work with sparse files
xfs: prevent 32bit overflow in space reservation
xfs: Disallow 32bit project quota id
xfs: improve buffer cache hash scalability

Linus Torvalds
2010-09-08 06:44:28 +0800
3c5dff7b5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: potential ERR_PTR() dereference

Linus Torvalds
2010-09-08 05:38:21 +0800
d3de0eb16 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
sysfs: checking for NULL instead of ERR_PTR

Linus Torvalds
2010-09-08 05:04:59 +0800
4848d7156 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix leak of shadow dat inode in error path of load_nilfs

Linus Torvalds
2010-09-08 05:01:50 +0800
7a2e8a8fa VFS: Sanity check mount flags passed to change_mnt_propagation()

Sanity check the flags passed to change_mnt_propagation(). Exactly
one flag should be set. Return EINVAL otherwise.

Userspace can pass in arbitrary combinations of MS_* flags to mount().
do_change_type() is called if any of MS_SHARED, MS_PRIVATE, MS_SLAVE,
or MS_UNBINDABLE is set. do_change_type() clears MS_REC and then
calls change_mnt_propagation() with the rest of the user-supplied
flags. change_mnt_propagation() clearly assumes only one flag is set
but do_change_type() does not check that this is true. For example,
mount() with flags MS_SHARED | MS_RDONLY does not actually make the
mount shared or read-only but does clear MNT_UNBINDABLE.
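
The "exactly one flag" rule can be sketched as a power-of-two test in userspace C. The `DEMO_` constants and helper names are hypothetical, defined locally so the example does not depend on system headers, and the function returns 0 for an invalid request (which a caller would translate into EINVAL) as an assumption of this sketch.

```c
#include <assert.h>

/* Locally defined flag bits for the sketch; names are hypothetical. */
#define DEMO_MS_RDONLY      (1UL << 0)
#define DEMO_MS_REC         (1UL << 14)
#define DEMO_MS_UNBINDABLE  (1UL << 17)
#define DEMO_MS_PRIVATE     (1UL << 18)
#define DEMO_MS_SLAVE       (1UL << 19)
#define DEMO_MS_SHARED      (1UL << 20)

#define DEMO_PROPAGATION_MASK \
    (DEMO_MS_UNBINDABLE | DEMO_MS_PRIVATE | DEMO_MS_SLAVE | DEMO_MS_SHARED)

/* True when exactly one bit is set. */
static int demo_is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Return the single propagation flag requested, or 0 when the request
 * is invalid: no flag, several flags, or unrelated flags mixed in.
 */
static unsigned long demo_propagation_type(unsigned long flags)
{
    unsigned long type = flags & ~DEMO_MS_REC;   /* recursion bit is allowed */

    if (type & ~DEMO_PROPAGATION_MASK)
        return 0;                                /* e.g. SHARED | RDONLY */
    if (!demo_is_power_of_2(type))
        return 0;                                /* zero or more than one */
    return type;
}
```

With this check, the MS_SHARED | MS_RDONLY example from the message is rejected up front instead of silently clearing an unrelated bit.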

Signed-off-by: Valerie Aurora
Signed-off-by: Linus Torvalds

Valerie Aurora
2010-09-08 04:46:20 +0800

07 Sep, 2010

2 commits

b9ca67b2d fuse: fix lock annotations

Sparse doesn't understand lock annotations of the form
__releases(&foo->lock). Change them to __releases(foo->lock). Same
for __acquires().

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2010-09-07 19:42:41 +0800
595afaf9e fuse: flush background queue on connection close

David Bartley reported that fuse can hang in fuse_get_req_nofail() when
the connection to the filesystem server is no longer active.

If bg_queue is not empty then flush_bg_queue() called from
request_end() can put more requests on to the pending queue. If this
happens while ending requests on the processing queue then those
background requests will be queued to the pending list and never
ended.

Another problem is that fuse_dev_release() didn't wake up processes
sleeping on blocked_waitq.

Solve this by:

a) flushing the background queue before calling end_requests() on the
pending and processing queues

b) setting blocked = 0 and waking up processes waiting on
blocked_waitq

Thanks to David for an excellent bug report.

Reported-by: David Bartley
Signed-off-by: Miklos Szeredi
CC: stable@kernel.org

Miklos Szeredi
2010-09-07 19:42:41 +0800

04 Sep, 2010

1 commit

57f9bdac2 sysfs: checking for NULL instead of ERR_PTR

d_path() returns an ERR_PTR and it doesn't return NULL.
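
For readers unfamiliar with the convention, the following userspace sketch mimics error-encoded pointers. All `demo_` names are hypothetical re-implementations for illustration, and `demo_d_path` is a toy function, not the kernel's d_path().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True when p lies in the top DEMO_MAX_ERRNO bytes of the address space. */
static int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Toy path lookup: fails with an error pointer, never with NULL. */
static char *demo_d_path(char *buf, size_t len)
{
    if (len < 2)
        return demo_err_ptr(-ENAMETOOLONG);
    buf[0] = '/';
    buf[1] = '\0';
    return buf;
}
```

A caller that tests only for NULL treats the failure value as a valid string, which is the mistake this commit corrects; the caller has to use the is-error test instead.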

Signed-off-by: Dan Carpenter
Cc: stable
Reviewed-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman

Dan Carpenter
2010-09-04 08:26:28 +0800

03 Sep, 2010

3 commits

cb7a93412 Merge branch '2.6.36-xfs-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev

Alex Elder
2010-09-03 22:02:32 +0800
9af254650 xfs: Make fiemap work with sparse files

In xfs_vn_fiemap, we set bmv_count to fi_extent_max + 1 and want
to return fi_extent_max extents, but this does not work for
a sparse file. The reason is that xfs_getbmap also records
holes in 'out', while 'out' is allocated with only
bmv_count (fi_extent_max + 1) entries, which does not account for
holes. So in the worst case, if the 'out' vector looks like
[hole, extent, hole, extent, hole, ... hole, extent, hole],
we will only return half of fi_extent_max extents.

This patch adds a new flag, BMV_IF_NO_HOLES, for bmv_iflags.
With this flag set, we don't use an 'out' entry in xfs_getbmap for
a hole. The solution is a bit ugly: it simply does not increase the
index of the 'out' vector. I felt that it is not easy to skip holes
at the very beginning since we have the complicated check and
some functions like xfs_getbmapx_fix_eof_hole to adjust 'out'.
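
The "don't advance the index for a hole" approach described above can be sketched in userspace C as follows. The struct and function are hypothetical simplifications, and the `no_holes` parameter merely plays the role of BMV_IF_NO_HOLES; this is not the xfs_getbmap code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified mapping record. */
struct demo_extent {
    long start;
    long len;
    int  is_hole;
};

/*
 * Copy up to max records from map into out and return how many slots
 * were consumed. With no_holes set, a hole is written to the current
 * slot but the index is not advanced, so the next record overwrites it
 * and holes never use up output capacity.
 */
static size_t demo_collect(const struct demo_extent *map, size_t nmap,
                           struct demo_extent *out, size_t max,
                           int no_holes)
{
    size_t cur = 0;

    for (size_t i = 0; i < nmap && cur < max; i++) {
        out[cur] = map[i];
        if (no_holes && map[i].is_hole)
            continue;       /* leave cur unchanged: slot gets reused */
        cur++;
    }
    return cur;
}
```

For the alternating hole/extent pattern from the message, three output slots hold only one real extent without the flag, and three real extents with it.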

Cc: Dave Chinner
Signed-off-by: Tao Ma
Signed-off-by: Alex Elder

Tao Ma
2010-09-03 22:02:11 +0800
72656c46f xfs: prevent 32bit overflow in space reservation

If we attempt to preallocate more than 2^32 blocks of space in a
single syscall, the transaction block reservation will overflow
leading to hangs in the superblock block accounting code. This
is trivially reproduced with xfs_io. Fix the problem by capping the
allocation reservation to the maximum number of blocks a single
xfs_bmapi() call can allocate (2^21 blocks).
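
The overflow and the capping fix can be illustrated with a few lines of userspace C. The names are hypothetical, and the 32-bit reservation type is an assumption made to reproduce the truncation described above; only the 2^21 cap is taken from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Cap taken from the commit message: 2^21 blocks per allocation call. */
#define DEMO_MAX_BLOCKS_PER_CALL (1ULL << 21)

/* Naive version: a 64-bit block count is truncated to 32 bits. */
static uint32_t demo_naive_reservation(uint64_t blocks)
{
    return (uint32_t)blocks;
}

/* Capped version: clamp first, so the value always fits in 32 bits. */
static uint32_t demo_capped_reservation(uint64_t blocks)
{
    if (blocks > DEMO_MAX_BLOCKS_PER_CALL)
        blocks = DEMO_MAX_BLOCKS_PER_CALL;
    return (uint32_t)blocks;
}
```

A request for exactly 2^32 blocks truncates to a reservation of zero in the naive version, while the capped version reserves 2^21 blocks and leaves the remainder to later calls.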

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-09-03 10:19:33 +0800