17 May, 2007

1 commit

  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

11 May, 2007

1 commit

  • Fix gcc warning and Oops that it causes:

    fs/ocfs2/cluster/masklog.c:161: warning: assignment from incompatible pointer type
    [ 2776.204120] OCFS2 Node Manager 1.3.3
    [ 2776.211729] BUG: spinlock bad magic on CPU#0, modprobe/4424
    [ 2776.214269] lock: ffff810021c8fe18, .magic: ffffffff, .owner: /6394416, .owner_cpu: 0
    [ 2776.217864] [ 2776.217865] Call Trace:
    [ 2776.219662] [] spin_bug+0x9e/0xe9
    [ 2776.221921] [] _raw_spin_lock+0x23/0xf9
    [ 2776.224417] [] _spin_lock+0x9/0xb
    [ 2776.226676] [] kobject_shadow_add+0x98/0x1ac
    [ 2776.229367] [] kobject_add+0xb/0xd
    [ 2776.231665] [] kset_add+0xd/0xf
    [ 2776.233845] [] kset_register+0x23/0x28
    [ 2776.236309] [] :ocfs2_nodemanager:mlog_sys_init+0x68/0x6d
    [ 2776.239518] [] :ocfs2_nodemanager:o2cb_sys_init+0x32/0x4a
    [ 2776.242726] [] :ocfs2_nodemanager:init_o2nm+0xa6/0xd5
    [ 2776.245772] [] sys_init_module+0x1471/0x15d2
    [ 2776.248465] [] simple_strtoull+0x0/0xdc
    [ 2776.250959] [] system_call+0x7e/0x83

    Signed-off-by: Randy Dunlap
    Acked-by: Mark Fasheh
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

09 May, 2007

1 commit


08 May, 2007

2 commits

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback when an object is freed, to
    verify that its state is the constructor state again. The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code that manipulates
    the object.

    Also, the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there were code in a constructor
    handling SLAB_DEBUG_INITIAL, it would have to be conditional on
    SLAB_DEBUG; otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of and difficult to understand, and there are easier ways to accomplish
    the same effect (i.e. add debug code before kfree).

    There is a related flag, SLAB_CTOR_VERIFY, that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would be
    pointless even without the removal of SLAB_DEBUG_INITIAL) from the fs
    constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Ensure pages are uptodate after returning from read_cache_page, which allows
    us to cut out most of the filesystem-internal PageUptodate calls.

    I didn't take a close look down the call chains, but this appears to fix 7
    possible use-before-uptodate bugs in hfs, 2 in hfsplus, 1 in jfs, a few in
    ecryptfs and 1 in jffs2, plus a possible case in block2mtd where cleared
    data could be overwritten by readpage. All of these depend on whether the
    filler is async and/or can return with a !uptodate page.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
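The SLAB_DEBUG_INITIAL entry in the 08 May group suggests checking an object's state by hand right before freeing it, instead of relying on a debug callback. The following is a minimal userspace sketch of that pattern, not kernel code: struct toy_obj, its constructor and the state check are hypothetical, and malloc()/free() stand in for the slab allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cached object; not a real kernel structure. */
struct toy_obj {
    int refcount;
    unsigned char name[16];
};

/* Constructor: puts an object into its "constructed" state. */
static void toy_obj_ctor(struct toy_obj *obj)
{
    obj->refcount = 0;
    for (size_t i = 0; i < sizeof(obj->name); i++)
        obj->name[i] = 0;
}

/* True if obj has been returned to the constructor state. */
static bool toy_obj_in_ctor_state(const struct toy_obj *obj)
{
    if (obj->refcount != 0)
        return false;
    for (size_t i = 0; i < sizeof(obj->name); i++)
        if (obj->name[i] != 0)
            return false;
    return true;
}

/* Free path: the state check sits right next to the code that manipulates
 * the object, and only costs anything in debug builds, because assert()
 * compiles away when NDEBUG is defined. */
static void toy_obj_free(struct toy_obj *obj)
{
    assert(toy_obj_in_ctor_state(obj));
    free(obj);
}
```

Any code that forgets to restore the object (for example, leaving refcount at 1) trips the assertion at the point of the free, which is where the entry argues the check belongs.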
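The read_cache_page entry in the same 08 May group moves the uptodate check into read_cache_page itself, so callers can drop their own PageUptodate calls. A toy userspace model of that contract might look like this; struct toy_page and every toy_* function are hypothetical, the filler completes synchronously, and the status is returned as an int (0 or -EIO) for simplicity rather than as a page pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the bits that matter here. */
struct toy_page {
    bool locked;    /* set while the filler's read is in flight */
    bool uptodate;  /* set once the page holds valid data */
};

/* Filler callback: reads data into the page, may fail, and unlocks the
 * page when the read completes. io_ok simulates success or an I/O error. */
typedef void (*toy_filler_t)(struct toy_page *page, bool io_ok);

static void toy_filler(struct toy_page *page, bool io_ok)
{
    page->uptodate = io_ok;
    page->locked = false;  /* read completion unlocks the page */
}

/* In the kernel this would sleep until the read completes; the toy filler
 * finishes synchronously, so the page must already be unlocked here. */
static void toy_wait_on_page_locked(const struct toy_page *page)
{
    assert(!page->locked);
}

/* Old-style behaviour: start the read and return without waiting, so the
 * caller might see a page that is not yet (or never becomes) uptodate. */
static struct toy_page *toy_read_cache_page_nowait(struct toy_page *page,
                                                   toy_filler_t filler,
                                                   bool io_ok)
{
    if (!page->uptodate) {
        page->locked = true;
        filler(page, io_ok);
    }
    return page;
}

/* New-style behaviour: wait for the read and report -EIO unless the page
 * is uptodate, so callers never have to test uptodate themselves. */
static int toy_read_cache_page(struct toy_page *page, toy_filler_t filler,
                               bool io_ok)
{
    toy_read_cache_page_nowait(page, filler, io_ok);
    toy_wait_on_page_locked(page);
    return page->uptodate ? 0 : -EIO;
}
```

A caller of toy_read_cache_page() only checks the return value; a failed read surfaces as -EIO instead of as a silently !uptodate page.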
     

05 May, 2007

1 commit

  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
    ocfs2: Force use of GFP_NOFS in ocfs2_write()
    ocfs2: fix sparse warnings in fs/ocfs2/cluster
    ocfs2: fix sparse warnings in fs/ocfs2/dlm
    ocfs2: fix sparse warnings in fs/ocfs2
    [PATCH] Copy i_flags to ocfs2 inode flags on write
    [PATCH] ocfs2: use __set_current_state()
    ocfs2: Wrap access of directory allocations with ip_alloc_sem.
    [PATCH] fs/ocfs2/: make 3 functions static
    ocfs2: Implement compat_ioctl()

    Linus Torvalds
     

03 May, 2007

10 commits


27 Apr, 2007

24 commits

  • The extent map code was ripped out earlier because of an inability to deal
    with holes. This patch adds back a simpler caching scheme requiring far less
    code.

    Our old extent map caching was designed back when metadata block caching in
    Ocfs2 didn't work very well, resulting in many disk reads. These days our
    metadata caching is much better, resulting in no unnecessary disk reads. As
    a result, extent caching doesn't have to be as fancy, nor does it have to
    cache as many extents. Keeping the last 3 extents seen should be sufficient
    to give us a small performance boost on some streaming workloads.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Cluster locking might have been redone because a direct write won't
    complete, so this needs to be reflected in the iocb.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Older file systems which didn't support holes did a dumb calculation of
    i_blocks based on i_size. This is no longer accurate, so fix things up to
    take actual allocation into account.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Initially, we had wired things to always report a hole size of 1. Cook up a
    small amount of code to find the next extent and calculate the number of
    clusters between the virtual offset and the next allocated extent.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Return an optional extent flags field from our lookup functions and wire up
    callers to treat unwritten regions as holes for the purpose of returning
    zeros to the user.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Due to the size of our group bitmaps, we'll never have a leaf node extent
    record with more than 16 bits worth of clusters. Split e_clusters up so that
    leaf nodes can get a flags field where we can mark unwritten extents.
    Interior nodes, whose length covers all the child nodes beneath them, can't
    split their e_clusters field, so we use a union to preserve sizing there.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • We need to fill holes during a splice write. Provide our own splice write
    actor which can call ocfs2_file_buffered_write() with a splice-specific
    callback.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Do this instead of filemap_fdatawrite() - this way we sync only the
    range between i_size and the cluster boundary.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Since we don't zero on extend anymore, truncate needs to be fixed up to zero
    the part of a file between i_size and the end of its cluster. Otherwise a
    subsequent extend could expose bad data.

    This introduces a new helper, which can be used in ocfs2_write().

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • ocfs2_get_block() didn't understand sparse files, fix that. Also remove some
    code that isn't really useful anymore. We can fix up
    ocfs2_direct_IO_get_blocks() at the same time.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • These are no longer used, and can't handle file systems with sparse file
    allocation.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock()
    because allocating writes will require zeroing of pages adjacent to the I/O
    for cluster sizes greater than page size.

    Implement a custom file write here, which can order page locks for zeroing.
    This also has the advantage that cluster locks can easily be ordered outside
    of the page locks.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • This will be turned back on once we can do allocation in ->page_mkwrite().

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Right now, file allocation for ocfs2 is done within ocfs2_extend_file(),
    which is either called from ->setattr() (for an i_size change), or at the
    top of ocfs2_file_aio_write().

    Inodes on file systems with sparse file support will want to do their
    allocation during the actual write call.

    In either case the cluster locking decisions are the same. We abstract out
    that code into a new function, ocfs2_lock_allocators() which will be used by
    a later patch to enable writing to sparse files.

    This also provides a nice cleanup of ocfs2_extend_allocation().

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no
    longer exists since i_size is not tied to i_clusters. In
    ocfs2_extend_file(), we skip the allocation / page zeroing code for file
    systems which understand sparse files.

    The core truncate code is changed to do a bottom-up tree traversal. This
    gets abstracted out into its own function. To make things more readable,
    most of the special case handling for in-inode extents from
    ocfs2_do_truncate() is also removed.

    Though write support for sparse files comes in a later patch, we at least
    update ocfs2_prepare_inode_for_write() to skip allocation for sparse files.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • The code in extent_map.c is not prepared to deal with a subtree being
    rotated between lookups. This can happen when filling holes in sparse files.
    Instead of a lengthy patch to update the code (which would likely lose the
    benefit of caching subtree roots), we remove most of the algorithms and
    implement a simple path based lookup. A less ambitious extent caching scheme
    will be added in a later patch.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Introduce tree rotations into the b-tree code. This will allow ocfs2 to
    support sparse files. Much of the added code is designed to be generic (in
    the ocfs2 sense) so that it can later be re-used to implement large
    extended attributes.

    This patch only adds the rotation code and does minimal updates to callers
    of the extent api.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • There are two checks in there (one for inode newness, one for other mounted
    nodes) which are unnecessary, so remove them. The DLM will allow the trylock
    in either case without any messaging overhead.

    Removing these makes ocfs2_request_delete() a one-liner function, so just
    move the trylock out one level into ocfs2_query_inode_wipe().

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Remove node messaging code that becomes unused with the delete inode vote
    removal.

    [Removed even more cruft which I spotted during review --Mark]

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
     
  • Ocfs2 currently does cluster-wide node messaging to check the open state of
    an inode during delete. This patch removes that mechanism in favor of an
    inode cluster lock which is taken at shared read when an inode is first read
    and dropped in clear_inode(). This allows a deleting node to test the
    liveness of an inode by attempting to take an exclusive lock.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
     
  • We don't want to print anything at all in ocfs2_lookup() when getting an
    error from ocfs2_iget() - it could be something as innocuous as a signal
    being detected in the dlm.

    ocfs2_permission() should filter on -ENOENT, which ocfs2_meta_lock() can
    return if the inode was deleted on another node.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • We have noticed panic() hanging, leading to a situation in which
    the node, while otherwise dead, is still disk heartbeating. This
    leads to a hung cluster as the other nodes are waiting for this
    node to stop disk heartbeating. This situation is only resolved
    by power resetting the box.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     
  • Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     
  • We don't want the extent map and uptodate cache destruction in
    ocfs2_meta_lock_update() on a local mount, so skip that.

    This fixes several bugs with uptodate being cleared on buffers and extent
    maps being corrupted.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
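The first 27 Apr entry replaces the old extent map with a cache that keeps the last 3 extents seen. One way to picture such a cache is a tiny most-recently-used array. This is a hypothetical sketch (the toy_* types and functions are not ocfs2's), and it ignores overlapping inserts and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_EM_CACHE_SIZE 3  /* keep the last 3 extents seen */

struct toy_extent {
    uint32_t cpos;      /* first logical cluster */
    uint32_t clusters;  /* length in clusters */
    uint64_t phys;      /* first physical cluster */
};

struct toy_extent_cache {
    struct toy_extent ent[TOY_EM_CACHE_SIZE];  /* ent[0] is the most recent */
    int count;                                 /* valid entries in ent[] */
};

/* On a hit, copy the extent to *out and move it to the front. */
static bool toy_em_lookup(struct toy_extent_cache *c, uint32_t cpos,
                          struct toy_extent *out)
{
    for (int i = 0; i < c->count; i++) {
        struct toy_extent e = c->ent[i];
        if (cpos >= e.cpos && cpos - e.cpos < e.clusters) {
            memmove(&c->ent[1], &c->ent[0], (size_t)i * sizeof(c->ent[0]));
            c->ent[0] = e;
            *out = e;
            return true;
        }
    }
    return false;
}

/* Insert at the front; when full, the least recently used entry falls off. */
static void toy_em_insert(struct toy_extent_cache *c,
                          const struct toy_extent *e)
{
    assert(c->count >= 0 && c->count <= TOY_EM_CACHE_SIZE);
    int keep = c->count < TOY_EM_CACHE_SIZE ? c->count : TOY_EM_CACHE_SIZE - 1;
    memmove(&c->ent[1], &c->ent[0], (size_t)keep * sizeof(c->ent[0]));
    c->ent[0] = *e;
    if (c->count < TOY_EM_CACHE_SIZE)
        c->count++;
}
```

With three extents cached, a lookup hit moves that extent to the front, and the next insert evicts whichever entry was used least recently. A miss simply falls back to the path-based tree lookup described in the later entry.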
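The 27 Apr i_blocks entry switches from a calculation based on i_size to one based on actual allocation. Since i_blocks counts 512-byte units, the allocation-based figure is a shift of the cluster count. Both helpers are hypothetical; in particular, the size-based one is only a stand-in for the old "dumb" calculation, whose exact form the entry does not give.

```c
#include <assert.h>
#include <stdint.h>

/* Size-based estimate: 512-byte blocks needed to hold i_size bytes.
 * It overcounts for sparse files because holes occupy no disk space. */
static uint64_t toy_blocks_from_size(uint64_t i_size)
{
    return (i_size + 511) >> 9;
}

/* Allocation-based: each allocated cluster of 2^cluster_bits bytes
 * contributes 2^(cluster_bits - 9) 512-byte blocks. */
static uint64_t toy_blocks_from_clusters(uint32_t i_clusters,
                                         unsigned cluster_bits)
{
    assert(cluster_bits >= 9);
    return (uint64_t)i_clusters << (cluster_bits - 9);
}
```

For a sparse 1 GiB file with a single 4 KiB cluster allocated, the size-based estimate is 2097152 blocks, while the allocation-based figure is 8.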
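Another 27 Apr entry computes a hole's size as the number of clusters between the virtual offset and the next allocated extent, instead of always reporting 1. Assuming a sorted, non-overlapping array of hypothetical extent records, the calculation reduces to finding the first extent that starts after the offset. What to return when no extent follows is an assumed convention here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory extent record covering [cpos, cpos + clusters). */
struct toy_rec {
    uint32_t cpos;
    uint32_t clusters;
};

/* Number of clusters from v_cluster (which must lie in a hole) up to the
 * next allocated extent. recs must be sorted by cpos and non-overlapping.
 * If no extent follows, returns the distance to UINT32_MAX (an assumed
 * convention for "the hole runs to the end of the addressable range"). */
static uint32_t toy_hole_clusters(const struct toy_rec *recs, size_t n,
                                  uint32_t v_cluster)
{
    for (size_t i = 0; i < n; i++) {
        /* precondition: v_cluster is not inside this extent */
        assert(!(v_cluster >= recs[i].cpos &&
                 v_cluster - recs[i].cpos < recs[i].clusters));
        if (recs[i].cpos > v_cluster)
            return recs[i].cpos - v_cluster;
    }
    return UINT32_MAX - v_cluster;
}
```

With extents at clusters [0, 4), [10, 12) and [20, 25), the hole starting at cluster 4 is 6 clusters long and the one starting at cluster 12 is 8 clusters long.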
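The 27 Apr entry that splits e_clusters can be pictured as a C union inside the extent record. This sketch is not the on-disk definition: the field names, the TOY_EXT_UNWRITTEN value, and the use of host-endian uint*_t types (rather than on-disk little-endian types) are assumptions. It also reflects the related unwritten-extents entry, where flagged extents read back as zeros like holes.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_EXT_UNWRITTEN 0x01  /* hypothetical flag value */

/* Interior records need a full 32-bit length covering all children; leaf
 * records need at most 16 bits, freeing a byte for flags. The union keeps
 * the record the same size either way. */
struct toy_extent_rec {
    uint32_t e_cpos;                   /* first logical cluster */
    union {
        uint32_t e_int_clusters;       /* interior nodes */
        struct {
            uint16_t e_leaf_clusters;  /* leaf nodes */
            uint8_t  e_reserved1;
            uint8_t  e_flags;          /* e.g. TOY_EXT_UNWRITTEN */
        };
    };
    uint64_t e_blkno;                  /* first physical block */
};

static_assert(sizeof(struct toy_extent_rec) == 16,
              "splitting the length must not change the record size");

/* Unwritten extents are allocated but must read back as zeros. */
static int toy_rec_reads_as_zero(const struct toy_extent_rec *rec)
{
    return (rec->e_flags & TOY_EXT_UNWRITTEN) != 0;
}
```

The split is only sound because, as the entry notes, a leaf record never needs more than 16 bits of clusters; interior records keep using the full 32-bit view of the same bytes.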
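The 27 Apr truncate entry zeroes the part of the file between i_size and the end of its cluster, and a later entry syncs only that same range. For a power-of-two cluster size the range is plain round-up arithmetic. This helper is hypothetical and only computes the byte range; it does not touch any pages.

```c
#include <assert.h>
#include <stdint.h>

/* Byte range [*start, *end) between i_size and the end of its cluster that
 * must be zeroed so a later extend cannot expose stale data. If i_size is
 * already cluster-aligned, start == end and nothing needs zeroing.
 * cluster_bytes must be a nonzero power of two. */
static void toy_tail_zero_range(uint64_t i_size, uint64_t cluster_bytes,
                                uint64_t *start, uint64_t *end)
{
    assert(cluster_bytes != 0 && (cluster_bytes & (cluster_bytes - 1)) == 0);
    *start = i_size;
    *end = (i_size + cluster_bytes - 1) & ~(cluster_bytes - 1);
}
```

For example, with 4 KiB clusters and an i_size of 5000, the range is [5000, 8192); an i_size of exactly 8192 yields an empty range. With 1 MiB clusters, a 1-byte file needs almost the whole cluster zeroed, which is why the work cannot be skipped for cluster sizes larger than a page.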