17 May, 2007

1 commit

  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 May, 2007

2 commits

  • * git://oss.sgi.com:8090/xfs/xfs-2.6:
    [XFS] Add lockdep support for XFS
    [XFS] Fix race in xfs_write() b/w dmapi callout and direct I/O checks.
    [XFS] Get rid of redundant "required" in msg.
    [XFS] Export via a function xfs_buftarg_list for use by kdb/xfsidbg.
    [XFS] Remove unused ilen variable and references.
    [XFS] Fix to prevent the notorious 'NULL files' problem after a crash.
    [XFS] Fix race condition in xfs_write().
    [XFS] Fix uquota and oquota enforcement problems.
    [XFS] propogate return codes from flush routines
    [XFS] Fix quotaon syscall failures for group enforcement requests.
    [XFS] Invalidate quotacheck when mounting without a quota type.
    [XFS] reducing the number of random number functions.
    [XFS] remove more misc. unused args
    [XFS] the "aendp" arg to xfs_dir2_data_freescan is always NULL, remove it.
    [XFS] The last argument "lsn" of xfs_trans_commit() is always called with

    Linus Torvalds
     
  • [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Monakhov Dmitriy
    Cc: Christoph Hellwig
    Acked-by: Anton Altaparmakov
    Acked-by: David Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitriy Monakhov
     

08 May, 2007

7 commits

  • SGI-PV: 963965
    SGI-Modid: xfs-linux-melb:xfs-kern:28485a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Lachlan McIlroy
     
  • In xfs_write() the iolock is dropped and reacquired in XFS_SEND_DATA()
    which means that the file could change from not-cached to cached and we
    need to redo the direct I/O checks. We should also redo the direct I/O
    checks when the file size changes regardless of whether O_APPEND is set.

    SGI-PV: 963483
    SGI-Modid: xfs-linux-melb:xfs-kern:28440a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Lachlan McIlroy
     
  • SGI-PV: 963465
    SGI-Modid: xfs-linux-melb:xfs-kern:28414a

    Signed-off-by: Tim Shimmin
    Signed-off-by: Lachlan McIlroy

    Tim Shimmin
     
  • The problem that has been addressed is that of synchronising updates of
    the file size with writes that extend a file. Without the fix the update
    of a file's size, as a result of a write beyond eof, is independent of
    when the cached data is flushed to disk. Often the file size update would
    be written to the filesystem log before the data is flushed to disk. When
    a system crashes between these two events and the filesystem log is
    replayed on mount the file's size will be set but since the contents never
    made it to disk the file is full of holes. If some of the cached data was
    flushed to disk then it may just be a section of the file at the end that
    has holes.

    There are existing fixes to help alleviate this problem, particularly in
    the case where a file has been truncated, that force cached data to be
    flushed to disk when the file is closed. If the system crashes while the
    file(s) are still open then this flushing will never occur.

    The fix that we have implemented is to introduce a second file size,
    called the in-memory file size, that represents the current file size as
    viewed by the user. The existing file size, called the on-disk file size,
    is the one that gets written to the filesystem log and we only update it
    when it is safe to do so. When we write to a file beyond eof we only
    update the in-memory file size in the write operation. Later, when the I/O
    operation that flushes the cached data to disk completes, an I/O
    completion routine will update the on-disk file size. The on-disk file
    size will be updated to the maximum offset of the I/O or to the value of
    the in-memory file size if the I/O includes eof.

    SGI-PV: 958522
    SGI-Modid: xfs-linux-melb:xfs-kern:28322a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Lachlan McIlroy
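
The two-size scheme described in this commit message can be sketched in plain C. This is a minimal userspace model, not the XFS implementation: `struct toy_inode`, `toy_write()` and `toy_io_done()` are hypothetical names, and locking, logging and out-of-order completion are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an inode that carries two file sizes. */
struct toy_inode {
    uint64_t mem_size;   /* in-memory size: what the user sees */
    uint64_t disk_size;  /* on-disk size: the value that would be logged */
};

/* A buffered write beyond EOF moves only the in-memory size. */
static void toy_write(struct toy_inode *ip, uint64_t offset, uint64_t len)
{
    uint64_t end = offset + len;

    if (end > ip->mem_size)
        ip->mem_size = end;
}

/*
 * I/O completion for the range [offset, offset + len). The on-disk size
 * advances to the end of the I/O, or to the in-memory size when the I/O
 * covers EOF. It never moves backwards in this sketch.
 */
static void toy_io_done(struct toy_inode *ip, uint64_t offset, uint64_t len)
{
    uint64_t end = offset + len;

    assert(ip->disk_size <= ip->mem_size);
    if (end > ip->mem_size)
        end = ip->mem_size;      /* the I/O includes EOF */
    if (end > ip->disk_size)
        ip->disk_size = end;
}
```

In this model a crash between `toy_write()` and `toy_io_done()` leaves `disk_size` at its old value, so a replayed log would not expose a range whose data never reached disk.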
     
  • This change addresses a race in xfs_write() where, for direct I/O, the
    flags need_i_mutex and need_flush are set up before the iolock is acquired.
    The conditions used to set up the flags may change between setting the flags and
    acquiring the iolock resulting in these flags having incorrect values. For
    example, if a file is not currently cached then need_i_mutex is set to
    zero and then if the file is cached before the iolock is acquired we will
    fail to do the flushinval before the direct write.

    The flush (and also the call to xfs_zero_eof()) need to be done with the
    iolock held exclusive so we need to acquire the iolock before checking for
    cached data (or if the write begins after eof) to prevent this state from
    changing. For direct I/O I've chosen to always acquire the iolock in
    shared mode initially and if there is a need to promote it then drop it
    and reacquire it.

    There's also some other tidy-ups including removing the O_APPEND offset
    adjustment since that work is done in generic_write_checks() (and we don't
    use offset as an input parameter anywhere).

    SGI-PV: 962170
    SGI-Modid: xfs-linux-melb:xfs-kern:28319a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Lachlan McIlroy
     
  • This patch handles error return values in fs_flush_pages and
    fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we
    can propagate the errors and handle them at higher layers. I also modified
    xfs_itruncate_start so that it could propagate the error further.

    SGI-PV: 961990
    SGI-Modid: xfs-linux-melb:xfs-kern:28231a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Stewart Smith
    Signed-off-by: Tim Shimmin

    Lachlan McIlroy
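
The prototype change described above follows a common pattern: a helper that used to discard its failure now returns it, and each caller passes it upward. The sketch below is a hedged illustration only; `toy_flushinval_pages()`, `toy_itruncate_start()` and `TOY_EIO` are hypothetical stand-ins and do not reproduce the real XFS signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error code used only by this sketch. */
#define TOY_EIO 5

/*
 * Stand-in for a flush routine. Written as a void function it would have
 * to swallow the failure; returning int lets the caller see it.
 */
static int toy_flushinval_pages(int fail)
{
    return fail ? TOY_EIO : 0;
}

/*
 * Stand-in for a higher layer. It stops at the first error and hands it
 * to its own caller instead of continuing with the truncate.
 */
static int toy_itruncate_start(int fail, int *truncated)
{
    int error;

    assert(truncated != NULL);
    error = toy_flushinval_pages(fail);
    if (error)
        return error;
    *truncated = 1;   /* only reached when the flush succeeded */
    return 0;
}
```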
     
  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code that
    manipulates the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

23 Mar, 2007

1 commit

  • Since freezable workqueues are broken in 2.6.21-rc
    (cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
    http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
    it's better to change the only user of them, which is XFS, to use "normal"
    nonfreezable workqueues.

    Signed-off-by: Rafael J. Wysocki
    Cc: Pavel Machek
    Cc: David Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

21 Feb, 2007

1 commit


15 Feb, 2007

2 commits

    The semantic effect of insert_at_head is that it would allow newly registered
    sysctl entries to override existing sysctl entries of the same name, which is
    a pain for caching and which the proc interface never implemented.

    I have done an audit and discovered that none of the current users of
    register_sysctl care as (except for directories) they do not register
    duplicate sysctl entries.

    So this patch simply removes the support for overriding existing entries in
    the sys_sysctl interface since no one uses it or cares and it makes future
    enhancements harder.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ralf Baechle
    Acked-by: Martin Schwidefsky
    Cc: Russell King
    Cc: David Howells
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Andi Kleen
    Cc: Jens Axboe
    Cc: Corey Minyard
    Cc: Neil Brown
    Cc: "John W. Linville"
    Cc: James Bottomley
    Cc: Jan Kara
    Cc: Trond Myklebust
    Cc: Mark Fasheh
    Cc: David Chinner
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

13 Feb, 2007

3 commits

  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
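
A small userspace sketch of the const operations-table idea follows. `struct toy_inode_operations` is a hypothetical, much-reduced stand-in for the kernel structure; the placement in `.rodata` is what typical ELF toolchains do with const objects and is assumed here rather than verified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for struct inode_operations. */
struct toy_inode_operations {
    int (*getattr)(int ino);
    int (*setattr)(int ino, int value);
};

static int toy_getattr(int ino)
{
    return ino * 2;
}

/*
 * const allows the toolchain to place the table in read-only data.
 * Members that are not named in the initialiser (setattr) are NULL.
 */
static const struct toy_inode_operations toy_file_iops = {
    .getattr = toy_getattr,
};

/* Callers only ever need a pointer to const. */
static int toy_call_getattr(const struct toy_inode_operations *ops, int ino)
{
    assert(ops != NULL && ops->getattr != NULL);
    return ops->getattr(ino);
}
```

An assignment such as `toy_file_iops.getattr = NULL;` is rejected by the compiler, which is the compile-time protection the commit message refers to.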
     
  • Don't hide buffer_unwritten behind buffer_delay() and remove the hack that
    clears unexpected buffer_unwritten() states now that it can't happen.

    Signed-off-by: Dave Chinner
    Acked-by: Christoph Hellwig
    Cc: Timothy Shimmin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Chinner
     
  • Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a
    bufferhead. Recently, I found the long standing mmap/unwritten extent
    conversion bug, and it was to do with partial page invalidation not clearing
    the unwritten flag from bufferheads attached to the page but beyond EOF. See
    here for a full explanation:

    http://oss.sgi.com/archives/xfs/2006-12/msg00196.html

    The solution I have checked into the XFS dev tree involves duplicating code
    from block_invalidatepage to clear the unwritten flag from the bufferhead(s),
    and then calling block_invalidatepage() to do the rest.

    Christoph suggested that this would be better solved by pushing the unwritten
    flag into the common buffer head flags and just adding the call to
    discard_buffer():

    http://oss.sgi.com/archives/xfs/2006-12/msg00239.html

    The following patch makes BH_Unwritten a first class citizen.

    Signed-off-by: Dave Chinner
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Chinner
     

10 Feb, 2007

15 commits

  • kmap() is inefficient and does not scale well. kmap_atomic() is a better
    choice. Use the generic wrapper function instead of open coding the
    kmap-memset-dcache flush-kunmap stuff.

    SGI-PV: 960904
    SGI-Modid: xfs-linux-melb:xfs-kern:28041a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • xfs_mac.h and xfs_cap.h provide definitions and macros that aren't used
    anywhere in XFS at all. They are left-overs from "to be implemented at some
    point in the future" functionality that Irix XFS has. If this
    functionality ever goes into Linux, it will be provided at a different
    layer, most likely through the security hooks in the kernel so we will
    never need this functionality in XFS.

    Patch provided by Eric Sandeen (sandeen@sandeen.net).

    SGI-PV: 960895
    SGI-Modid: xfs-linux-melb:xfs-kern:28036a

    Signed-off-by: Eric Sandeen
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Eric Sandeen
     
  • Fixes a few small issues (mostly cosmetic) that were picked up during the
    review cycle for the last set of freeze path changes.

    SGI-PV: 959267
    SGI-Modid: xfs-linux-melb:xfs-kern:28035a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • Use the generic VFS attr flags where appropriate instead of open
    coding them to the same values.

    Patch provided by Eric Sandeen.

    SGI-PV: 960868
    SGI-Modid: xfs-linux-melb:xfs-kern:28033a

    Signed-off-by: Eric Sandeen
    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    Eric Sandeen
     
  • wake_up's implementation does an implicit memory barrier so the explicit
    memory barrier is not needed in vfs_sync_worker.

    Patch provided by Ralf Baechle.

    SGI-PV: 960867
    SGI-Modid: xfs-linux-melb:xfs-kern:28032a

    Signed-off-by: Ralf Baechle
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Ralf Baechle
     
  • Removes unneeded sysctl insert at head behaviour. Cleans up sysctl
    definitions to use C99 initialisers. Patch provided by Eric W. Biederman.

    SGI-PV: 960192
    SGI-Modid: xfs-linux-melb:xfs-kern:28031a

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Eric W. Biederman
     
  • The problem is the two callers of xfs_iozero() are rounding out the range
    to be zeroed to the end of a fsb and in some cases this extends past the
    new eof. The call to commit_write() in xfs_iozero() will cause the Linux
    inode's file size to be set too high.

    SGI-PV: 960788
    SGI-Modid: xfs-linux-melb:xfs-kern:28013a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Lachlan McIlroy
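
The rounding problem described above comes down to one clamp. The sketch below shows one way to bound a zeroing range; it is an assumption about the shape of the fix, not the actual XFS change, and `toy_zero_end()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the end of the range to zero, starting at 'start'.
 * The range is rounded out to the end of the filesystem block that
 * contains 'start', but it is never allowed to extend past 'new_eof'.
 */
static uint64_t toy_zero_end(uint64_t start, uint64_t new_eof,
                             uint64_t blocksize)
{
    uint64_t block_end;

    assert(blocksize != 0);
    assert(start <= new_eof);

    block_end = (start / blocksize + 1) * blocksize;
    return block_end < new_eof ? block_end : new_eof;
}
```

Without the final clamp, a caller that zeroes up to `block_end` could push the inode's size to the block boundary even though the file is meant to end at `new_eof`.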
     
  • record.

    The current Linux XFS freeze code is a mess. We flush the metadata buffers
    out while we are still allowing new transactions to start and then fail to
    flush the dirty buffers back out before writing the unmount and dummy
    records to the log.

    This leads to problems when the frozen filesystem is used for snapshots -
    we do log recovery on a readonly image and often it appears that the log
    image in the snapshot is not correct. Hence we end up with hangs, oops and
    mount failures when trying to mount a snapshot image that has been created
    when the filesystem has not been correctly frozen.

    To fix this, we need to move the metadata flush to after we wait for all
    current transactions to complete in the second stage of the freeze. This
    means that when we write the final log records, the log should be clean
    and recovery should never occur on a snapshot image created from a frozen
    filesystem.

    SGI-PV: 959267
    SGI-Modid: xfs-linux-melb:xfs-kern:28010a

    Signed-off-by: David Chinner
    Signed-off-by: Donald Douwsma
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • When writing less than a filesystem block of data into an unwritten extent
    via buffered I/O, __xfs_get_blocks fails to set the buffer new flag. As a
    result, the generic code will not zero either edge of the block resulting
    in garbage being written to disk either side of the real data. Set the
    buffer new state on buffered writes to unwritten extents to ensure that
    zeroing occurs.

    SGI-PV: 960328
    SGI-Modid: xfs-linux-melb:xfs-kern:28000a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • SGI-PV: 959140
    SGI-Modid: xfs-linux-melb:xfs-kern:27712a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Eric Sandeen
    Signed-off-by: Tim Shimmin

    Lachlan McIlroy
     
  • functions, but they

    a) ignore the flags parameter completely, and b) are never called
    directly, only via the flag-less defines anyway

    So, drop the #define indirection, and rename mraccessf to mraccess, etc.

    SGI-PV: 959138
    SGI-Modid: xfs-linux-melb:xfs-kern:27711a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Eric Sandeen
    Signed-off-by: Tim Shimmin

    Lachlan McIlroy
     
  • SGI-PV: 954580
    SGI-Modid: xfs-linux-melb:xfs-kern:27701a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    Lachlan McIlroy
     
  • gcc-4.1 and more recent aggressively inline static functions which
    increases XFS stack usage by ~15% in critical paths. Prevent this from
    occurring by adding noinline to the STATIC definition.

    Also uninline some functions that are too large to be inlined and were
    causing problems with CONFIG_FORCED_INLINING=y.

    Finally, clean up all the different users of inline, __inline and
    __inline__ and put them under one STATIC_INLINE macro. For debug kernels
    the STATIC_INLINE macro uninlines those functions.

    SGI-PV: 957159
    SGI-Modid: xfs-linux-melb:xfs-kern:27585a

    Signed-off-by: David Chinner
    Signed-off-by: David Chatterton
    Signed-off-by: Tim Shimmin

    David Chinner
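
The macro arrangement described above might look roughly like the following. This is a sketch under stated assumptions: the `TOY_` names are hypothetical, `TOY_DEBUG` stands in for whatever debug switch the real code uses, and the noinline attribute is the GCC/Clang spelling.

```c
#include <assert.h>

/* GCC and Clang accept this attribute; other compilers get a no-op. */
#if defined(__GNUC__)
#define TOY_NOINLINE __attribute__((noinline))
#else
#define TOY_NOINLINE
#endif

/* Ordinary file-local functions are never inlined, to bound stack use. */
#define TOY_STATIC static TOY_NOINLINE

/* Small helpers are inlined, except in debug builds. */
#ifdef TOY_DEBUG
#define TOY_STATIC_INLINE static TOY_NOINLINE
#else
#define TOY_STATIC_INLINE static inline
#endif

TOY_STATIC int toy_sum_to(int n)
{
    int total = 0;

    assert(n >= 0);
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

TOY_STATIC_INLINE int toy_increment(int x)
{
    return x + 1;
}
```

Keeping both decisions in two macros means the inlining policy can be changed in one place rather than at every function definition.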
     
  • The {test,set,clear}_bit() operations take a bit index for the bit to
    operate on. The XBT_* flags are defined as bit fields which is incorrect,
    not to mention the way the bit fields are enumerated is broken too. This
    was only working by chance.

    Fix the definitions of the flags and make the code using them use the
    {test,set,clear}_bit() operations correctly.

    SGI-PV: 958639
    SGI-Modid: xfs-linux-melb:xfs-kern:27565a

    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    David Chinner
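
The index-versus-mask confusion described above is easy to reproduce in userspace. The helpers below are non-atomic stand-ins for the kernel's {test,set,clear}_bit() and the `TOY_FLAG_*` names are hypothetical; only the calling convention (a bit index, not a mask) is taken from the commit message.

```c
#include <assert.h>
#include <limits.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-ins: 'nr' is a bit INDEX, not a mask. */
static void toy_set_bit(unsigned int nr, unsigned long *word)
{
    assert(nr < TOY_BITS_PER_LONG);
    *word |= 1UL << nr;
}

static void toy_clear_bit(unsigned int nr, unsigned long *word)
{
    assert(nr < TOY_BITS_PER_LONG);
    *word &= ~(1UL << nr);
}

static int toy_test_bit(unsigned int nr, const unsigned long *word)
{
    assert(nr < TOY_BITS_PER_LONG);
    return (int)((*word >> nr) & 1UL);
}

/* Correct: flags enumerated as bit indexes. */
enum { TOY_FLAG_A_BIT = 0, TOY_FLAG_B_BIT = 1, TOY_FLAG_C_BIT = 2 };

/* A mask derived from an index; passing this to toy_set_bit() is the bug. */
#define TOY_FLAG_C_MASK (1UL << TOY_FLAG_C_BIT)
```

Calling `toy_set_bit(TOY_FLAG_C_MASK, &word)` passes the value 4 and therefore sets bit 4 rather than bit 2. Code written that way can appear to work as long as every user makes the same mistake consistently.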
     
  • At the last stage of a freeze, we flush the buftarg synchronously over and
    over again until it succeeds twice without skipping any buffers.

    The delwri list flush skips pinned buffers, but tries to flush all others.
    It removes the buffers from the delwri list, then tries to lock them one
    at a time as it traverses the list to issue the I/O. It holds them locked
    until we issue all of the I/O and then unlocks them once we've waited for
    it to complete.

    The problem is that during a freeze, the filesystem may still be doing
    stuff - like flushing delalloc data buffers - in the background and hence
    we can be trying to lock buffers that were on the delwri list at the same
    time. Hence we can get ABBA deadlocks between threads doing allocation and
    the buftarg flush (freeze) thread.

    Fix it by skipping locked (and pinned) buffers as we traverse the delwri
    buffer list.

    SGI-PV: 957195
    SGI-Modid: xfs-linux-melb:xfs-kern:27535a

    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    David Chinner
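
The "skip what you cannot lock" approach from this commit message can be modelled with a trylock. The sketch below is single-threaded and purely illustrative: `struct toy_buf` and the `toy_` functions are hypothetical, and a boolean stands in for a real lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a delayed-write buffer. */
struct toy_buf {
    bool locked;   /* held by someone, possibly another thread */
    bool pinned;   /* cannot be written yet */
    bool flushed;  /* set once this sketch has "written" the buffer */
};

/* Take the lock only if it is free; never wait for it. */
static bool toy_trylock(struct toy_buf *bp)
{
    if (bp->locked)
        return false;
    bp->locked = true;
    return true;
}

static void toy_unlock(struct toy_buf *bp)
{
    assert(bp->locked);
    bp->locked = false;
}

/*
 * Walk the list and flush what can be flushed. Pinned buffers and
 * buffers that are already locked are skipped instead of waited on,
 * so this walker cannot become one half of an ABBA deadlock.
 * Returns the number of buffers skipped.
 */
static size_t toy_flush_delwri(struct toy_buf *bufs, size_t n)
{
    size_t skipped = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_buf *bp = &bufs[i];

        if (bp->pinned || !toy_trylock(bp)) {
            skipped++;
            continue;
        }
        bp->flushed = true;   /* stands in for issuing and waiting on I/O */
        toy_unlock(bp);
    }
    return skipped;
}
```

A caller that needs everything flushed, such as the freeze path described above, can repeat the walk until it reports no skipped buffers.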
     

22 Dec, 2006

1 commit

  • XFS appears to call clear_page_dirty to get the mapping tree dirty tag
    set correctly at the same time the page dirty flag is cleared. I note
    that this can be done by set_page_writeback() if we clear the dirty flag
    on the page first when we are writing back the entire page.

    Hence it seems to me that the XFS call to clear_page_dirty() could
    easily be substituted by clear_page_dirty_for_io() followed by a call to
    set_page_writeback() to get the mapping tree tags set correctly after
    the page has been marked clean.

    Signed-off-by: Linus Torvalds

    David Chinner
     

11 Dec, 2006

1 commit

  • The only time it is safe to call aio_complete() is when the ->ki_retry
    function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has
    historically done this by relying on its caller to translate positive return
    codes into -EIOCBQUEUED for the aio case. It did this by trying to keep
    conditionals in sync. direct_io_worker() knew when finished_one_bio() was
    going to call aio_complete(). It would reverse the test and wait and free the
    dio in the cases it thought that finished_one_bio() wasn't going to.

    Not surprisingly, it ended up getting it wrong. 'ret' could be a negative
    errno from the submission path but it failed to communicate this to
    finished_one_bio(). direct_io_worker() would return < 0, its callers
    wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the
    future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
    would be called for a second time which can manifest as an oops.

    The previous cleanups have whittled the sync and async completion paths down
    to the point where we can collapse them and clearly reassert the invariant
    that we must only call aio_complete() after returning -EIOCBQUEUED.
    direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
    drop the dio refcount and the aio bio completion path will only call
    aio_complete() when it is the last to drop the dio refcount.
    direct_io_worker() can ensure that it is the last to drop the reference count
    by waiting for bios to drain. It does this for sync ops, of course, and for
    partial dio writes that must fall back to buffered and for aio ops that saw
    errors during submission.

    This means that operations that end up waiting, even if they were issued as
    aio ops, will not call aio_complete() from dio. Instead we return the return
    code of the operation and let the aio core call aio_complete(). This is
    purposely done to fix a bug where AIO DIO file extensions would call
    aio_complete() before their callers have a chance to update i_size.

    Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
    no longer have to translate for it. XFS needs to be careful not to free
    resources that will be used during AIO completion if -EIOCBQUEUED is returned.
    We maintain the previous behaviour of trying to write fs metadata for O_SYNC
    aio+dio writes.

    Signed-off-by: Zach Brown
    Cc: Badari Pulavarty
    Cc: Suparna Bhattacharya
    Acked-by: Jeff Moyer
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
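
The invariant in this commit message (aio_complete() may only be called after -EIOCBQUEUED has been returned) can be modelled with a reference count in which whoever drops the last reference decides how the operation finishes. The sketch below is a simplified, non-atomic model: `struct toy_dio`, the `toy_` functions and the value of `TOY_EIOCBQUEUED` are assumptions made for illustration and do not reproduce the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder value for this sketch; the kernel defines its own code. */
#define TOY_EIOCBQUEUED 529

/* Hypothetical model of a direct I/O request. */
struct toy_dio {
    int refcount;        /* one for the submitter, one per bio in flight */
    int result;          /* bytes transferred or a negative error */
    bool aio_completed;  /* set where aio_complete() would be called */
};

/* Drop one reference; true means the caller dropped the last one. */
static bool toy_dio_put(struct toy_dio *dio)
{
    assert(dio->refcount > 0);
    return --dio->refcount == 0;
}

/* Bio completion path: completes the aio only if it is last to drop. */
static void toy_bio_end_io(struct toy_dio *dio)
{
    if (toy_dio_put(dio))
        dio->aio_completed = true;
}

/*
 * Submission path. If it is last to drop, it returns the result and the
 * aio core completes the request. Otherwise it returns -EIOCBQUEUED and
 * leaves completion to the bio path.
 */
static int toy_direct_io_worker(struct toy_dio *dio)
{
    if (toy_dio_put(dio))
        return dio->result;
    return -TOY_EIOCBQUEUED;
}
```

In both orderings exactly one party finishes the request, and the completion flag is only ever set after the worker has returned `-TOY_EIOCBQUEUED`.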
     

09 Dec, 2006

1 commit


08 Dec, 2006

2 commits


22 Nov, 2006

1 commit


11 Nov, 2006

2 commits