11 Dec, 2006
1 commit
-
The only time it is safe to call aio_complete() is when the ->ki_retry
function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has
historically done this by relying on its caller to translate positive return
codes into -EIOCBQUEUED for the aio case. It did this by trying to keep
conditionals in sync. direct_io_worker() knew when finished_one_bio() was
going to call aio_complete(). It would reverse the test and wait and free the
dio in the cases it thought that finished_one_bio() wasn't going to.

Not surprisingly, it ended up getting it wrong. 'ret' could be a negative
errno from the submission path but it failed to communicate this to
finished_one_bio(). direct_io_worker() would return < 0, its callers
wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the
future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
would be called for a second time which can manifest as an oops.

The previous cleanups have whittled the sync and async completion paths down
to the point where we can collapse them and clearly reassert the invariant
that we must only call aio_complete() after returning -EIOCBQUEUED.
direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
drop the dio refcount and the aio bio completion path will only call
aio_complete() when it is the last to drop the dio refcount.
direct_io_worker() can ensure that it is the last to drop the reference count
by waiting for bios to drain. It does this for sync ops, of course, and for
partial dio writes that must fall back to buffered and for aio ops that saw
errors during submission.

This means that operations that end up waiting, even if they were issued as
aio ops, will not call aio_complete() from dio. Instead we return the return
code of the operation and let the aio core call aio_complete(). This is
purposely done to fix a bug where AIO DIO file extensions would call
aio_complete() before their callers have a chance to update i_size.

Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
no longer have to translate for it. XFS needs to be careful not to free
resources that will be used during AIO completion if -EIOCBQUEUED is returned.
We maintain the previous behaviour of trying to write fs metadata for O_SYNC
aio+dio writes.

Signed-off-by: Zach Brown
Cc: Badari Pulavarty
Cc: Suparna Bhattacharya
Acked-by: Jeff Moyer
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
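The refcount invariant described in the direct_io_worker() commit above can be sketched in userspace roughly as follows. This is a simplified model, not the kernel code: struct dio_model, dio_model_put(), dio_model_bio_end() and dio_model_finish_submit() are hypothetical names invented for illustration, "waiting for bios to drain" is modelled by running the completions inline, and EIOCBQUEUED is defined locally because it is a kernel-internal errno.

```c
#include <stdbool.h>

/* Kernel-internal errno, defined locally for this sketch only. */
#ifndef EIOCBQUEUED
#define EIOCBQUEUED 529
#endif

/* Hypothetical stand-in for the kernel's struct dio. */
struct dio_model {
    int refcount;            /* one ref for the submitter, one per in-flight bio */
    bool is_async;           /* issued as an aio op */
    long result;             /* byte count or negative errno of the operation */
    int aio_complete_calls;  /* how often the model "called aio_complete()" */
};

/* Drop one reference; returns true only for whoever drops the last one. */
static bool dio_model_put(struct dio_model *dio)
{
    return --dio->refcount == 0;
}

/*
 * Bio completion path: it completes the iocb only when it is the last
 * to drop the refcount on an async dio.
 */
static void dio_model_bio_end(struct dio_model *dio)
{
    if (dio_model_put(dio) && dio->is_async)
        dio->aio_complete_calls++;
}

/*
 * End of the submission path. If the submitter must wait (sync op,
 * submission error, partial write), the in-flight bios are drained
 * first, which guarantees the submitter drops the last reference.
 */
static long dio_model_finish_submit(struct dio_model *dio, bool must_wait,
                                    int in_flight)
{
    if (must_wait) {
        while (in_flight-- > 0)
            dio_model_bio_end(dio);
    }

    if (dio_model_put(dio))
        return dio->result;   /* last ref: the caller (aio core) completes */

    return -EIOCBQUEUED;      /* bios in flight: completion path completes */
}
```

In this model the two outcomes are mutually exclusive: either the submitter returns -EIOCBQUEUED and the final bio completion accounts for the single aio_complete() call, or the submitter returns the real result and the completion path never calls it.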
09 Dec, 2006
1 commit
-
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the xfs
filesystem.

Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Dec, 2006
2 commits
-
Make the workqueues used by XFS freezeable, so their worker threads don't
submit any I/O after the suspend image has been created.

Signed-off-by: Rafael J. Wysocki
Acked-by: Pavel Machek
Cc: Nigel Cunningham
Cc: David Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.

[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham
Cc: "Rafael J. Wysocki"
Cc: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Nov, 2006
1 commit
-
Fix up for make allyesconfig.
Signed-Off-By: David Howells
21 Nov, 2006
2 commits
-
SGI-PV: 958376
SGI-Modid: xfs-linux-melb:xfs-kern:27503a

Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin
-
xfs_bmap_add_extent_delay_real()
SGI-PV: 957008
SGI-Modid: xfs-linux-melb:xfs-kern:27457a

Signed-off-by: Lachlan McIlroy
Signed-off-by: Shailendra Tripathi
Signed-off-by: Tim Shimmin
11 Nov, 2006
7 commits
-
SGI-PV: 957005
SGI-Modid: xfs-linux-melb:xfs-kern:27398a

Signed-off-by: David Chinner
Signed-off-by: Michal Piotrowski
Signed-off-by: Tim Shimmin
-
The previous fixes for the use after free in xfs_iunpin left a nasty log
deadlock when xfslogd unpinned the inode and dropped the last reference to
the inode. The ->clear_inode() method can issue transactions, and if the
log was full, the transaction could push on the log and get stuck trying
to push the inode it was currently unpinning.

To fix this, we provide xfs_iunpin a guarantee that it will always have a
valid xfs_inode to linux inode link or a particular flag will be set on
the inode. We then use log forces during lookup to ensure transactions are
completed before we recycle the inode. This ensures that xfs_iunpin will
never use the linux inode after it has been freed, and any lookup on an
inode on the reclaim list will wait until it is safe to attach a new linux
inode to the xfs inode.

SGI-PV: 956832
SGI-Modid: xfs-linux-melb:xfs-kern:27359a

Signed-off-by: David Chinner
Signed-off-by: Shailendra Tripathi
Signed-off-by: Takenori Nagano
Signed-off-by: Tim Shimmin
-
SGI-PV: 956832
SGI-Modid: xfs-linux-melb:xfs-kern:27358a

Signed-off-by: David Chinner
Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
SGI-PV: 956664
SGI-Modid: xfs-linux-melb:xfs-kern:27315a

Signed-off-by: Vlad Apostolov
Signed-off-by: Sam Vaughan
Signed-off-by: Tim Shimmin
-
SGI-PV: 957004
SGI-Modid: xfs-linux-melb:xfs-kern:27231a

Signed-off-by: Vlad Apostolov
Signed-off-by: Tim Shimmin
-
SGI-PV: 956964
SGI-Modid: xfs-linux-melb:xfs-kern:27200a

Signed-off-by: Tim Shimmin
Signed-off-by: David Chinner
Signed-off-by: Eric Sandeen
-
CONFIG_XFS_TRACE is on
SGI-PV: 956618
SGI-Modid: xfs-linux-melb:xfs-kern:27196a

Signed-off-by: Vlad Apostolov
Signed-off-by: Tim Shimmin
21 Oct, 2006
1 commit
-
Separate out the concept of "queue congestion" from "backing-dev congestion".
Congestion is a backing-dev concept, not a queue concept.

The blk_* congestion functions are retained, as wrappers around the core
backing-dev congestion functions.

This proper layering is needed so that NFS can cleanly use the congestion
functions, and so that CONFIG_BLOCK=n actually links.

Cc: "Thomas Maier"
Cc: "Jens Axboe"
Cc: Trond Myklebust
Cc: David Howells
Cc: Peter Osterlund
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
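The layering in the congestion commit above, where queue-level helpers become thin wrappers around backing-dev helpers, might look roughly like this in a userspace model. All type and function names here (bdi_model, queue_model, bdi_model_*, blk_model_*) are hypothetical stand-ins chosen for the sketch, and the bit layout is an assumption.

```c
#include <stdbool.h>

enum { MODEL_READ = 0, MODEL_WRITE = 1 };

/* Hypothetical stand-in for a backing device: owns the congestion state. */
struct bdi_model {
    unsigned long state;
};

/* Hypothetical stand-in for a request queue: embeds a backing device. */
struct queue_model {
    struct bdi_model backing_dev_info;
};

/* One state bit per direction (layout is an assumption of this sketch). */
static unsigned long congestion_bit(int rw)
{
    return 1UL << (rw == MODEL_WRITE ? 1 : 0);
}

/* Core functions: operate on the backing device only. */
static void bdi_model_set_congested(struct bdi_model *bdi, int rw)
{
    bdi->state |= congestion_bit(rw);
}

static void bdi_model_clear_congested(struct bdi_model *bdi, int rw)
{
    bdi->state &= ~congestion_bit(rw);
}

static bool bdi_model_congested(const struct bdi_model *bdi, int rw)
{
    return (bdi->state & congestion_bit(rw)) != 0;
}

/* Queue-level helpers: retained, but only forward to the core functions. */
static void blk_model_set_queue_congested(struct queue_model *q, int rw)
{
    bdi_model_set_congested(&q->backing_dev_info, rw);
}

static void blk_model_clear_queue_congested(struct queue_model *q, int rw)
{
    bdi_model_clear_congested(&q->backing_dev_info, rw);
}
```

With this split, a user that has a backing device but no request queue can call the bdi_model_* functions directly and never touch the queue type.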
04 Oct, 2006
1 commit
-
This patch converts two if () BUG(); constructs to BUG_ON();
which occupies less space, uses unlikely and is safer when
BUG() is disabled.

Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk
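For readers unfamiliar with the idiom in the BUG_ON() commit above, the two forms can be compared in a small userspace sketch. The MODEL_* macros below are illustrative stand-ins, not the kernel definitions: MODEL_BUG() only counts hits instead of halting, and the branch hint relies on the GCC/Clang __builtin_expect builtin when it is available.

```c
/* Counts "BUG" events instead of halting, so the sketch can be tested. */
static int bug_hits;

/* Branch hint: falls back to a plain truth test on other compilers. */
#if defined(__GNUC__)
#define MODEL_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define MODEL_UNLIKELY(x) (!!(x))
#endif

#define MODEL_BUG() do { bug_hits++; } while (0)

/* The condition and the hint are folded into a single macro. */
#define MODEL_BUG_ON(cond) \
    do { if (MODEL_UNLIKELY(cond)) MODEL_BUG(); } while (0)

/* Old style: open-coded test, no branch hint. */
static void check_old_style(int refcount)
{
    if (refcount < 0)
        MODEL_BUG();
}

/* New style: same check expressed with the combined macro. */
static void check_new_style(int refcount)
{
    MODEL_BUG_ON(refcount < 0);
}
```

Both functions trigger on the same inputs; the second form simply keeps the test, the hint and the action in one place.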
01 Oct, 2006
4 commits
-
This patch cleans up generic_file_*_read/write() interfaces. Christoph
Hellwig gave me the idea for these cleanups.

In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
do_sync_read()/do_sync_write() as their .read/.write methods. This allows us
to clean up all variants of generic_file_* routines.

Final available interfaces:

generic_file_aio_read() - read handler
generic_file_aio_write() - write handler
generic_file_aio_write_nolock() - no lock write handler

__generic_file_aio_write_nolock() - internal worker routine

Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.

Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().

Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Cc: Michael Holzheu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.

This patch does the following:

 (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
     support.

 (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
     an item that uses the block layer. This includes:

     (*) Block I/O tracing.

     (*) Disk partition code.

     (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

     (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
         block layer to do scheduling. Some drivers that use SCSI facilities -
         such as USB storage - end up disabled indirectly from this.

     (*) Various block-based device drivers, such as IDE and the old CDROM
         drivers.

     (*) MTD blockdev handling and FTL.

     (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
         taking a leaf out of JFFS2's book.

 (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
     linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
     however, still used in places, and so is still available.

 (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
     parts of linux/fs.h.

 (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

 (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

 (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
     is not enabled.

 (*) fs/no-block.c is created to hold out-of-line stubs and things that are
     required when CONFIG_BLOCK is not set:

     (*) Default blockdev file operations (to give error ENODEV on opening).

 (*) Makes some /proc changes:

     (*) /proc/devices does not list any blockdevs.

     (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

 (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

 (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
     given command other than Q_SYNC or if a special device is specified.

 (*) In init/do_mounts.c, no reference is made to the blockdev routines if
     CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

 (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
     error ENOSYS by way of cond_syscall if so).

 (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
     CONFIG_BLOCK is not set, since they can't then happen.

Signed-Off-By: David Howells
Signed-off-by: Jens Axboe
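The stub approach mentioned in the CONFIG_BLOCK commit above (callers stay unchanged, and a stub returns an error when the feature is compiled out) can be illustrated with a small sketch. MODEL_CONFIG_BLOCK, blkdev_model_open() and model_mount_root() are hypothetical names for this example only; the kernel's own stubs live in fs/no-block.c and the headers listed above and are not reproduced here.

```c
#include <errno.h>

/*
 * Build with -DMODEL_CONFIG_BLOCK to select the "real" variant.
 * Without it, the stub below is compiled in instead.
 */
#ifdef MODEL_CONFIG_BLOCK
/* Placeholder for a real open routine; accepts any non-negative minor. */
static inline int blkdev_model_open(int minor)
{
    return minor >= 0 ? 0 : -EINVAL;
}
#else
/* Stub used when the block layer is configured out. */
static inline int blkdev_model_open(int minor)
{
    (void)minor;
    return -ENODEV;
}
#endif

/* The caller is written once and needs no #ifdef of its own. */
static int model_mount_root(int minor)
{
    return blkdev_model_open(minor);
}
```

Keeping the conditional in one place means call sites compile identically in both configurations and only the returned error differs.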
29 Sep, 2006
1 commit
-
Signed-off-by: Tim Shimmin
28 Sep, 2006
19 commits
-
SGI-PV: 955947
SGI-Modid: xfs-linux-melb:xfs-kern:26986a

Signed-off-by: Vlad Apostolov
Signed-off-by: Tim Shimmin
-
consistent in bulkstat
SGI-PV: 956241
SGI-Modid: xfs-linux-melb:xfs-kern:26984a

Signed-off-by: Vlad Apostolov
Signed-off-by: Tim Shimmin
-
kmem_zalloc_greedy()
SGI-PV: 956240
SGI-Modid: xfs-linux-melb:xfs-kern:26983a

Signed-off-by: Vlad Apostolov
Signed-off-by: Tim Shimmin
-
The previous attempts to fix the linux inode use-after-free in xfs_iunpin
simply made the problem harder to hit. We actually need complete exclusion
between xfs_reclaim and xfs_iunpin, as well as ensuring that the i_flags
are consistent during both of these functions. Introduce a new spinlock
for exclusion and the i_flags, and fix up xfs_iunpin to use igrab before
marking the inode dirty.

SGI-PV: 952967
SGI-Modid: xfs-linux-melb:xfs-kern:26964a

Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin
-
SGI-PV: 907752
SGI-Modid: xfs-linux-melb:xfs-kern:26925a

Signed-off-by: Eric Sandeen
Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
One sema to rule them all, one sema to find them...
SGI-PV: 907752
SGI-Modid: xfs-linux-melb:xfs-kern:26911a

Signed-off-by: Eric Sandeen
Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
ialloc_btree.
SGI-PV: 955302
SGI-Modid: xfs-linux-melb:xfs-kern:26910a

Signed-off-by: Eric Sandeen
Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
SGI-PV: 955696
SGI-Modid: xfs-linux-melb:xfs-kern:26908a

Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
SGI-PV: 955302
SGI-Modid: xfs-linux-melb:xfs-kern:26907a

Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
SGI-PV: 955157
SGI-Modid: xfs-linux-melb:xfs-kern:26869a

Signed-off-by: Vlad Apostolov
Signed-off-by: Tim Shimmin
-
SGI-PV: 955157
SGI-Modid: xfs-linux-melb:xfs-kern:26866a

Signed-off-by: Vlad Apostolov
Signed-off-by: Tim Shimmin
-
space for the unmount record - which becomes a problem in the freeze/thaw
scenario.

SGI-PV: 942533
SGI-Modid: xfs-linux-melb:xfs-kern:26815a

Signed-off-by: Tim Shimmin
-
xfs_trans_delete_ail
xfs_trans_update_ail and xfs_trans_delete_ail get called with the AIL lock
held, and release it. Add lock annotations to these two functions so that
sparse can check callers for lock pairing, and so that sparse will not
complain about these functions since they intentionally use locks in this
manner.

SGI-PV: 954580
SGI-Modid: xfs-linux-melb:xfs-kern:26807a

Signed-off-by: Josh Triplett
Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
SGI-PV: 955515
SGI-Modid: xfs-linux-melb:xfs-kern:26806a

Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
positives.
SGI-PV: 955502
SGI-Modid: xfs-linux-melb:xfs-kern:26805a

Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
handling.
SGI-PV: 955302
SGI-Modid: xfs-linux-melb:xfs-kern:26804a

Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
range.
SGI-PV: 955302
SGI-Modid: xfs-linux-melb:xfs-kern:26803a

Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
SGI-PV: 955302
SGI-Modid: xfs-linux-melb:xfs-kern:26802a

Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
-
buffers.
SGI-PV: 955302
SGI-Modid: xfs-linux-melb:xfs-kern:26801a

Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin