17 May, 2007
1 commit
-
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter
Cc: David Howells
Cc: Jens Axboe
Cc: Steven French
Cc: Michael Halcrow
Cc: OGAWA Hirofumi
Cc: Miklos Szeredi
Cc: Steven Whitehouse
Cc: Roman Zippel
Cc: David Woodhouse
Cc: Dave Kleikamp
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Anton Altaparmakov
Cc: Mark Fasheh
Cc: Paul Mackerras
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: David Chinner
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 May, 2007
2 commits
-
* git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] Add lockdep support for XFS
[XFS] Fix race in xfs_write() b/w dmapi callout and direct I/O checks.
[XFS] Get rid of redundant "required" in msg.
[XFS] Export via a function xfs_buftarg_list for use by kdb/xfsidbg.
[XFS] Remove unused ilen variable and references.
[XFS] Fix to prevent the notorious 'NULL files' problem after a crash.
[XFS] Fix race condition in xfs_write().
[XFS] Fix uquota and oquota enforcement problems.
[XFS] propogate return codes from flush routines
[XFS] Fix quotaon syscall failures for group enforcement requests.
[XFS] Invalidate quotacheck when mounting without a quota type.
[XFS] reducing the number of random number functions.
[XFS] remove more misc. unused args
[XFS] the "aendp" arg to xfs_dir2_data_freescan is always NULL, remove it.
[XFS] The last argument "lsn" of xfs_trans_commit() is always called with -
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Monakhov Dmitriy
Cc: Christoph Hellwig
Acked-by: Anton Altaparmakov
Acked-by: David Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 May, 2007
7 commits
-
SGI-PV: 963965
SGI-Modid: xfs-linux-melb:xfs-kern:28485aSigned-off-by: Lachlan McIlroy
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin -
In xfs_write() the iolock is dropped and reacquired in XFS_SEND_DATA()
which means that the file could change from not-cached to cached and we
need to redo the direct I/O checks. We should also redo the direct I/O
checks when the file size changes regardless if O_APPEND is set or not.SGI-PV: 963483
SGI-Modid: xfs-linux-melb:xfs-kern:28440aSigned-off-by: Lachlan McIlroy
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin -
SGI-PV: 963465
SGI-Modid: xfs-linux-melb:xfs-kern:28414aSigned-off-by: Tim Shimmin
Signed-off-by: Lachlan McIlroy -
The problem that has been addressed is that of synchronising updates of
the file size with writes that extend a file. Without the fix the update
of a file's size, as a result of a write beyond eof, is independent of
when the cached data is flushed to disk. Often the file size update would
be written to the filesystem log before the data is flushed to disk. When
a system crashes between these two events and the filesystem log is
replayed on mount the file's size will be set but since the contents never
made it to disk the file is full of holes. If some of the cached data was
flushed to disk then it may just be a section of the file at the end that
has holes.There are existing fixes to help alleviate this problem, particularly in
the case where a file has been truncated, that force cached data to be
flushed to disk when the file is closed. If the system crashes while the
file(s) are still open then this flushing will never occur.The fix that we have implemented is to introduce a second file size,
called the in-memory file size, that represents the current file size as
viewed by the user. The existing file size, called the on-disk file size,
is the one that get's written to the filesystem log and we only update it
when it is safe to do so. When we write to a file beyond eof we only
update the in- memory file size in the write operation. Later when the I/O
operation, that flushes the cached data to disk completes, an I/O
completion routine will update the on-disk file size. The on-disk file
size will be updated to the maximum offset of the I/O or to the value of
the in-memory file size if the I/O includes eof.SGI-PV: 958522
SGI-Modid: xfs-linux-melb:xfs-kern:28322aSigned-off-by: Lachlan McIlroy
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin -
This change addresses a race in xfs_write() where, for direct I/O, the
flags need_i_mutex and need_flush are setup before the iolock is acquired.
The logic used to setup the flags may change between setting the flags and
acquiring the iolock resulting in these flags having incorrect values. For
example, if a file is not currently cached then need_i_mutex is set to
zero and then if the file is cached before the iolock is acquired we will
fail to do the flushinval before the direct write.The flush (and also the call to xfs_zero_eof()) need to be done with the
iolock held exclusive so we need to acquire the iolock before checking for
cached data (or if the write begins after eof) to prevent this state from
changing. For direct I/O I've chosen to always acquire the iolock in
shared mode initially and if there is a need to promote it then drop it
and reacquire it.There's also some other tidy-ups including removing the O_APPEND offset
adjustment since that work is done in generic_write_checks() (and we don't
use offset as an input parameter anywhere).SGI-PV: 962170
SGI-Modid: xfs-linux-melb:xfs-kern:28319aSigned-off-by: Lachlan McIlroy
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin -
This patch handles error return values in fs_flush_pages and
fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we
can propogate the errors and handle them at higher layers. I also modified
xfs_itruncate_start so that it could propogate the error further.SGI-PV: 961990
SGI-Modid: xfs-linux-melb:xfs-kern:28231aSigned-off-by: Lachlan McIlroy
Signed-off-by: Stewart Smith
Signed-off-by: Tim Shimmin -
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Mar, 2007
1 commit
-
Since freezable workqueues are broken in 2.6.21-rc
(cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
it's better to change the only user of them, which is XFS, to use "normal"
nonfreezable workqueues.Signed-off-by: Rafael J. Wysocki
Cc: Pavel Machek
Cc: David Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
21 Feb, 2007
1 commit
-
fs/xfs/linux-2.6/xfs_super.c:903: warning: 'noinline' attribute ignored
Cc: David Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Feb, 2007
2 commits
-
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.Signed-off-by: Eric W. Biederman
Acked-by: Ralf Baechle
Acked-by: Martin Schwidefsky
Cc: Russell King
Cc: David Howells
Cc: "Luck, Tony"
Cc: Ralf Baechle
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Andi Kleen
Cc: Jens Axboe
Cc: Corey Minyard
Cc: Neil Brown
Cc: "John W. Linville"
Cc: James Bottomley
Cc: Jan Kara
Cc: Trond Myklebust
Cc: Mark Fasheh
Cc: David Chinner
Cc: "David S. Miller"
Cc: Patrick McHardy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).Signed-off-by: Tim Schmielau
Acked-by: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Feb, 2007
3 commits
-
Many struct inode_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Don't hide buffer_unwritten behind buffer_delay() and remove the hack that
clears unexpected buffer_unwritten() states now that it can't happen.Signed-off-by: Dave Chinner
Acked-by: Christoph Hellwig
Cc: Timothy Shimmin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a
bufferhead. Recently, I found the long standing mmap/unwritten extent
conversion bug, and it was to do with partial page invalidation not clearing
the unwritten flag from bufferheads attached to the page but beyond EOF. See
here for a full explaination:http://oss.sgi.com/archives/xfs/2006-12/msg00196.html
The solution I have checked into the XFS dev tree involves duplicating code
from block_invalidatepage to clear the unwritten flag from the bufferhead(s),
and then calling block_invalidatepage() to do the rest.Christoph suggested that this would be better solved by pushing the unwritten
flag into the common buffer head flags and just adding the call to
discard_buffer():http://oss.sgi.com/archives/xfs/2006-12/msg00239.html
The following patch makes BH_Unwritten a first class citizen.
Signed-off-by: Dave Chinner
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Feb, 2007
15 commits
-
kmap() is inefficient and does not scale well. kmap_atomic() is a better
choice. Use the generic wrapper function instead of open coding the
kmap-memset-dcache flush-kunmap stuff.SGI-PV: 960904
SGI-Modid: xfs-linux-melb:xfs-kern:28041aSigned-off-by: David Chinner
Signed-off-by: Christoph Hellwig
Signed-off-by: Tim Shimmin -
xfs_mac.h and xfs_cap.h provide definitions and macros that aren't used
anywhere in XFS at all. They are left-overs from "to be implement at some
point in the future" functionality that Irix XFS has. If this
functionality ever goes into Linux, it will be provided at a different
layer, most likely through the security hooks in the kernel so we will
never need this functionality in XFS.Patch provided by Eric Sandeen (sandeen@sandeen.net).
SGI-PV: 960895
SGI-Modid: xfs-linux-melb:xfs-kern:28036aSigned-off-by: Eric Sandeen
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin -
Fixes a few small issues (mostly cosmetic) that were picked up during the
review cycle for the last set of freeze path changes.SGI-PV: 959267
SGI-Modid: xfs-linux-melb:xfs-kern:28035aSigned-off-by: David Chinner
Signed-off-by: Christoph Hellwig
Signed-off-by: Tim Shimmin -
Use the the generic VFS attr flags where appropriate instead of open
coding them to the same values.Patch provided by Eric Sandeen.
SGI-PV: 960868
SGI-Modid: xfs-linux-melb:xfs-kern:28033aSigned-off-by: Eric Sandeen
Signed-off-by: David Chinner
Signed-off-by: Christoph Hellwig
Signed-off-by: Tim Shimmin -
wake_up's implementation does an implicit memory barrier so the explicit
memory barrier is not needed in vfs_sync_worker.Patch provided by Ralf Baechle.
SGI-PV: 960867
SGI-Modid: xfs-linux-melb:xfs-kern:28032aSigned-off-by: Ralf Baechle
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin -
Removes unneeded sysctl insert at head behaviour. Cleans up sysctl
definitions to use C99 initialisers. Patch provided by Eric W. Biederman.SGI-PV: 960192
SGI-Modid: xfs-linux-melb:xfs-kern:28031aSigned-off-by: Eric W. Biederman
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin -
The problem is the two callers of xfs_iozero() are rounding out the range
to be zeroed to the end of a fsb and in some cases this extends past the
new eof. The call to commit_write() in xfs_iozero() will cause the Linux
inode's file size to be set too high.SGI-PV: 960788
SGI-Modid: xfs-linux-melb:xfs-kern:28013aSigned-off-by: Lachlan McIlroy
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin -
record.
The current Linux XFS freeze code is a mess. We flush the metadata buffers
out while we are still allowing new transactions to start and then fail to
flush the dirty buffers back out before writing the unmount and dummy
records to the log.This leads to problems when the frozen filesystem is used for snapshots -
we do log recovery on a readonly image and often it appears that the log
image in the snapshot is not correct. Hence we end up with hangs, oops and
mount failures when trying to mount a snapshot image that has been created
when the filesystem has not been correctly frozen.To fix this, we need to move th metadata flush to after we wait for all
current transactions to complete in teh second stage of the freeze. This
means that when we write the final log records, the log should be clean
and recovery should never occur on a snapshot image created from a frozen
filesystem.SGI-PV: 959267
SGI-Modid: xfs-linux-melb:xfs-kern:28010aSigned-off-by: David Chinner
Signed-off-by: Donald Douwsma
Signed-off-by: Tim Shimmin -
When writing less than a filesystem block of data into an unwritten extent
via buffered I/O, __xfs_get_blocks fails to set the buffer new flag. As a
result, the generic code will not zero either edge of the block resulting
in garbage being written to disk either side of the real data. Set the
buffer new state on bufferd writes to unwritten extents to ensure that
zeroing occurs.SGI-PV: 960328
SGI-Modid: xfs-linux-melb:xfs-kern:28000aSigned-off-by: David Chinner
Signed-off-by: Lachlan McIlroy
Signed-off-by: Tim Shimmin -
SGI-PV: 959140
SGI-Modid: xfs-linux-melb:xfs-kern:27712aSigned-off-by: Lachlan McIlroy
Signed-off-by: Eric Sandeen
Signed-off-by: Tim Shimmin -
functions, but they
a) ignore the flags parameter completely, and b) are never called
directly, only via the flag-less defines anywaySo, drop the #define indirection, and rename mraccessf to mraccess, etc.
SGI-PV: 959138
SGI-Modid: xfs-linux-melb:xfs-kern:27711aSigned-off-by: Lachlan McIlroy
Signed-off-by: Eric Sandeen
Signed-off-by: Tim Shimmin -
SGI-PV: 954580
SGI-Modid: xfs-linux-melb:xfs-kern:27701aSigned-off-by: Lachlan McIlroy
Signed-off-by: Christoph Hellwig
Signed-off-by: Tim Shimmin -
gcc-4.1 and more recent aggressively inline static functions which
increases XFS stack usage by ~15% in critical paths. Prevent this from
occurring by adding noinline to the STATIC definition.Also uninline some functions that are too large to be inlined and were
causing problems with CONFIG_FORCED_INLINING=y.Finally, clean up all the different users of inline, __inline and
__inline__ and put them under one STATIC_INLINE macro. For debug kernels
the STATIC_INLINE macro uninlines those functions.SGI-PV: 957159
SGI-Modid: xfs-linux-melb:xfs-kern:27585aSigned-off-by: David Chinner
Signed-off-by: David Chatterton
Signed-off-by: Tim Shimmin -
The {test,set,clear}_bit() operations take a bit index for the bit to
operate on. The XBT_* flags are defined as bit fields which is incorrect,
not to mention the way the bit fields are enumerated is broken too. This
was only working by chance.Fix the definitions of the flags and make the code using them use the
{test,set,clear}_bit() operations correctly.SGI-PV: 958639
SGI-Modid: xfs-linux-melb:xfs-kern:27565aSigned-off-by: David Chinner
Signed-off-by: Tim Shimmin -
At the last stage of a freeze, we flush the buftarg synchronously over and
over again until it succeeds twice without skipping any buffers.The delwri list flush skips pinned buffers, but tries to flush all others.
It removes the buffers from the delwri list, then tries to lock them one
at a time as it traverses the list to issue the I/O. It holds them locked
until we issue all of the I/O and then unlocks them once we've waited for
it to complete.The problem is that during a freeze, the filesystem may still be doing
stuff - like flushing delalloc data buffers - in the background and hence
we can be trying to lock buffers that were on the delwri list at the same
time. Hence we can get ABBA deadlocks between threads doing allocation and
the buftarg flush (freeze) thread.Fix it by skipping locked (and pinned) buffers as we traverse the delwri
buffer list.SGI-PV: 957195
SGI-Modid: xfs-linux-melb:xfs-kern:27535aSigned-off-by: David Chinner
Signed-off-by: Tim Shimmin
22 Dec, 2006
1 commit
-
XFS appears to call clear_page_dirty to get the mapping tree dirty tag
set correctly at the same time the page dirty flag is cleared. I note
that this can be done by set_page_writeback() if we clear the dirty flag
on the page first when we are writing back the entire page.Hence it seems to me that the XFS call to clear_page_dirty() could
easily be substituted by clear_page_dirty_for_io() followed by a call to
set_page_writeback() to get the mapping tree tags set correctly after
the page has been marked clean.Signed-off-by: Linus Torvalds
11 Dec, 2006
1 commit
-
The only time it is safe to call aio_complete() is when the ->ki_retry
function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has
historically done this by relying on its caller to translate positive return
codes into -EIOCBQUEUED for the aio case. It did this by trying to keep
conditionals in sync. direct_io_worker() knew when finished_one_bio() was
going to call aio_complete(). It would reverse the test and wait and free the
dio in the cases it thought that finished_one_bio() wasn't going to.Not surprisingly, it ended up getting it wrong. 'ret' could be a negative
errno from the submission path but it failed to communicate this to
finished_one_bio(). direct_io_worker() would return < 0, it's callers
wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the
future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
would be called for a second time which can manifest as an oops.The previous cleanups have whittled the sync and async completion paths down
to the point where we can collapse them and clearly reassert the invariant
that we must only call aio_complete() after returning -EIOCBQUEUED.
direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
drop the dio refcount and the aio bio completion path will only call
aio_complete() when it is the last to drop the dio refcount.
direct_io_worker() can ensure that it is the last to drop the reference count
by waiting for bios to drain. It does this for sync ops, of course, and for
partial dio writes that must fall back to buffered and for aio ops that saw
errors during submission.This means that operations that end up waiting, even if they were issued as
aio ops, will not call aio_complete() from dio. Instead we return the return
code of the operation and let the aio core call aio_complete(). This is
purposely done to fix a bug where AIO DIO file extensions would call
aio_complete() before their callers have a chance to update i_size.Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
no longer have to translate for it. XFS needs to be careful not to free
resources that will be used during AIO completion if -EIOCBQUEUED is returned.
We maintain the previous behaviour of trying to write fs metadata for O_SYNC
aio+dio writes.Signed-off-by: Zach Brown
Cc: Badari Pulavarty
Cc: Suparna Bhattacharya
Acked-by: Jeff Moyer
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Dec, 2006
1 commit
-
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the xfs
filesystem.Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Dec, 2006
2 commits
-
Make the workqueues used by XFS freezeable, so their worker threads don't
submit any I/O after the suspend image has been created.Signed-off-by: Rafael J. Wysocki
Acked-by: Pavel Machek
Cc: Nigel Cunningham
Cc: David Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham
Cc: "Rafael J. Wysocki"
Cc: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Nov, 2006
1 commit
-
Fix up for make allyesconfig.
Signed-Off-By: David Howells
11 Nov, 2006
2 commits
-
SGI-PV: 957005
SGI-Modid: xfs-linux-melb:xfs-kern:27398aSigned-off-by: David Chinner
Signed-off-by: Michal Piotrowski
Signed-off-by: Tim Shimmin -
SGI-PV: 956832
SGI-Modid: xfs-linux-melb:xfs-kern:27358aSigned-off-by: David Chinner
Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin