15 Oct, 2008

3 commits

  • More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
    Only six pass a different flag set. Rather than have every caller care,
    let's make ocfs2_read_block() take no flags and always do a cached read.
    The remaining six places can call ocfs2_read_blocks() directly.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
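
The wrapper shape this commit describes can be sketched as follows. This is a simplified model, not the ocfs2 code: `BH_CACHED`, `struct block`, `read_blocks()` and `read_block()` are hypothetical stand-ins for the real flag, buffer type and functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag and buffer type; the real ocfs2 definitions differ. */
#define BH_CACHED 0x1

struct block {
    unsigned long long blkno;
    int flags_used;              /* records which flags the read was issued with */
};

/* General multi-block read: the few special callers pick their own flags. */
static int read_blocks(unsigned long long blkno, int nr,
                       struct block *out, int flags)
{
    assert(out != NULL && nr > 0);
    for (int i = 0; i < nr; i++) {
        out[i].blkno = blkno + (unsigned long long)i;
        out[i].flags_used = flags;
    }
    return 0;
}

/* Single-block helper: takes no flags and always performs a cached read. */
static int read_block(unsigned long long blkno, struct block *out)
{
    return read_blocks(blkno, 1, out, BH_CACHED);
}
```

The common callers lose an argument, and the unusual flag sets stay visible at the few call sites that use `read_blocks()` directly.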
     
  • Now that synchronous readers are using ocfs2_read_blocks_sync(), all
    callers of ocfs2_read_blocks() are passing an inode. Use it
    unconditionally. Since it's there, we don't need to pass the
    ocfs2_super either.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The ocfs2_read_blocks() function currently handles sync reads, cached
    reads, and sometimes cached reads. We're going to add some
    functionality to it, so first we should simplify it. The uncached,
    synchronous reads are much easier to handle as a separate function, so we
    introduce ocfs2_read_blocks_sync().

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     

14 Oct, 2008

37 commits

  • According to Christoph Hellwig's advice, we really don't need
    a ->list to handle one xattr's list. Just a map from index to
    xattr prefix is enough. I also refactored the old list method,
    using fs/xfs/linux-2.6/xfs_xattr.c and the xattr list method in
    btrfs as references.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
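
A minimal sketch of the "map from index to xattr prefix" idea, assuming made-up index values. The enum constants, `xattr_prefix()` and `xattr_list_one()` are hypothetical names for illustration, not the identifiers used in fs/ocfs2/xattr.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name indexes; the on-disk values in ocfs2 may differ. */
enum { XATTR_INDEX_USER = 1, XATTR_INDEX_TRUSTED = 2, XATTR_INDEX_MAX };

/* Map from name index to the prefix placed in front of the stored suffix. */
static const char *const xattr_prefixes[XATTR_INDEX_MAX] = {
    [XATTR_INDEX_USER]    = "user.",
    [XATTR_INDEX_TRUSTED] = "trusted.",
};

static const char *xattr_prefix(int name_index)
{
    if (name_index <= 0 || name_index >= XATTR_INDEX_MAX)
        return NULL;
    return xattr_prefixes[name_index];
}

/*
 * Write "prefix" + "suffix" + NUL into buf when it fits.
 * Returns the number of bytes needed, or 0 for an unknown index.
 */
static size_t xattr_list_one(char *buf, size_t size,
                             int name_index, const char *suffix)
{
    assert(suffix != NULL);
    const char *prefix = xattr_prefix(name_index);
    if (prefix == NULL)
        return 0;

    size_t plen = strlen(prefix);
    size_t slen = strlen(suffix);
    size_t total = plen + slen + 1;
    if (buf != NULL && total <= size) {
        memcpy(buf, prefix, plen);
        memcpy(buf + plen, suffix, slen + 1);   /* copies the NUL too */
    }
    return total;
}
```

With a table like this, listing needs no per-handler callback: the lister looks up the prefix and copies the name.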
     
  • According to Christoph Hellwig's advice, the hash value of an EA
    is calculated only from its suffix.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
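
To illustrate hashing by suffix only, here is a sketch that uses 32-bit FNV-1a as a stand-in hash. The hash ocfs2 actually uses is not shown in this log, and `xattr_suffix_hash()` and `xattr_name_hash()` are hypothetical helpers. It also assumes the prefix ends at the first '.'.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32-bit FNV-1a over the suffix; a stand-in for the real ocfs2 hash. */
static uint32_t xattr_suffix_hash(const char *suffix)
{
    assert(suffix != NULL);
    uint32_t hash = 2166136261u;            /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)suffix; *p; p++) {
        hash ^= *p;
        hash *= 16777619u;                  /* FNV prime */
    }
    return hash;
}

/*
 * Hash a full name such as "user.foo" by skipping everything up to and
 * including the first '.', so the prefix never contributes to the hash.
 * A name without a '.' is treated as a bare suffix.
 */
static uint32_t xattr_name_hash(const char *full_name)
{
    assert(full_name != NULL);
    const char *dot = strchr(full_name, '.');
    return xattr_suffix_hash(dot ? dot + 1 : full_name);
}
```

Two names that differ only in prefix hash identically here, so the name index has to be compared separately when entries are matched.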
     
  • Per Christoph Hellwig's suggestion - don't split these up. It's not like we
    gained much by having the two tiny files around.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • This is too big to be inlined.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • This is pointless as brelse() already does the check.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • i and b_len don't really need to be u64's. Xattr extent lengths should be
    limited by the VFS, and then the size of our on-disk length field.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • It can also be moved into ocfs2_la_debug_read().

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • ocfs2_stack_supports_plocks() doesn't need this to properly return a zero or
    one value.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • As Mark mentioned, it may be time-consuming to remove an
    empty xattr bucket, so this patch tries to let empty buckets exist
    in xattr operations. The modifications include:
    1. Remove the function of bucket and extent record deletion during
       xattr delete.
    2. In xattr set:
       1) Don't clean the last entry so that if the bucket is empty,
          the hash value of the bucket is the hash value of the entry
          which was deleted last.
       2) During insert, if we meet an empty bucket, just use the
          1st entry.
    3. In the binary search of xattr buckets, use the bucket hash value (which
       is stored in the 1st xattr entry) to find the right place.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • During the process of xattr insertion, we use binary search
    to find the right place and "low" is set to it. But when
    there is one xattr which has the same name hash as the inserted
    one, low is the wrong value. So set it to the right position.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
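
The class of bug described here can be shown on a plain sorted array of hashes. This is a sketch under that simplification, with a hypothetical `find_insert_pos()`; the real code searches xattr entries, not a bare `uint32_t` array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index at which an entry with name_hash should be inserted
 * into the sorted array hashes[0..count).
 */
static size_t find_insert_pos(const uint32_t *hashes, size_t count,
                              uint32_t name_hash)
{
    assert(hashes != NULL || count == 0);
    size_t low = 0, high = count;           /* search window is [low, high) */

    while (low < high) {
        size_t mid = low + (high - low) / 2;

        if (hashes[mid] < name_hash) {
            low = mid + 1;
        } else if (hashes[mid] > name_hash) {
            high = mid;
        } else {
            /*
             * Equal hash found. Without this assignment, "low" would keep
             * whatever value it had when the loop exited, which need not be
             * next to the matching entry.
             */
            low = mid;
            break;
        }
    }
    return low;
}
```

For hashes {10, 20, 30} and an inserted hash of 20, the first probe hits index 1 directly. Breaking out without updating `low` would report position 0 and put the new entry in front of the entry with hash 10.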
     
  • Patch adds check for [no]user_xattr in ocfs2_show_options() that completes
    the list of all mount options.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     
  • ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
    limiting our maximum filesystem size.

    It's a pretty trivial change. Most functions are just renamed. The
    only functional change is moving to Jan's inode-based ordered data mode.
    It's better, too.

    Because JBD2 reads and writes JBD journals, this is compatible with any
    existing filesystem. It can even interact with JBD-based ocfs2 as long
    as the journal is formatted for JBD.

    We provide a compatibility option so that paranoid people can still use
    JBD for the time being. This will go away shortly.

    [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
    ocfs2_truncate_for_delete(). --Mark ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Now that ocfs2 limits inode numbers to 32bits, add a mount option to
    disable the limit. This parallels XFS. 64bit systems can handle the
    larger inode numbers.

    [ Added description of inode64 mount option in ocfs2.txt. --Mark ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • ocfs2 inode numbers are block numbers. For any filesystem with less
    than 2^32 blocks, this is not a problem. However, when ocfs2 starts
    using JBD2, it will be able to support filesystems with more than 2^32
    blocks. This would result in inode numbers higher than 2^32.

    The problem is that stat(2) can't handle those numbers on 32bit
    machines. The simple solution is to have ocfs2 allocate all inodes
    below that boundary.

    The suballoc code is changed to honor an optional block limit. Only the
    inode suballocator sets that limit - all other allocations stay unlimited.

    The biggest trick is to grow the inode suballocator beneath that limit.
    There's no point in allocating block groups that are above the limit,
    then rejecting their elements later on. We want to prevent the inode
    allocator from ever having block groups above the limit. This involves
    a little gyration with the local alloc code. If the local alloc window
    is above the limit, it signals the caller to try the global bitmap but
    does not disable the local alloc file (which can be used for other
    allocations).

    [ Minor cleanup - removed an ML_NOTICE comment. --Mark ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
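
A sketch of an optional block limit, assuming a limit of 0 means "unlimited" and that inode allocations are capped at the largest 32-bit value unless a 64-bit inode option is set. All three helpers are hypothetical names; the real suballocator and local alloc code are considerably more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Limit for inode allocations: 0 (unlimited) when 64-bit inodes are allowed. */
static uint64_t inode_block_limit(int inode64)
{
    return inode64 ? 0 : UINT32_MAX;
}

/* A single block honors the limit if there is none or it is not above it. */
static int block_within_limit(uint64_t blkno, uint64_t max_block)
{
    return max_block == 0 || blkno <= max_block;
}

/*
 * A window [start, start + len) is usable only if its last block is within
 * the limit. Otherwise the caller would fall back to another allocator
 * rather than take a block group it must reject later.
 */
static int window_within_limit(uint64_t start, uint64_t len, uint64_t max_block)
{
    assert(len > 0);
    return max_block == 0 || start + len - 1 <= max_block;
}
```

Only the inode path passes a non-zero limit, so every other allocation behaves exactly as before.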
     
  • In ocfs2_xattr_free_block, we take a cluster lock on xb_alloc_inode while we
    have a transaction open. This will deadlock the downconvert thread, so fix
    it.

    We can clean up how xattr blocks are removed while here - this patch also
    moves the mechanism of releasing xattr block (including both value, xattr
    tree and xattr block) into this function.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • In ocfs2_extend_trans, when we can't extend the current
    transaction, it will commit current transaction and restart
    a new one. So if the previous credits we have allocated aren't
    used (the block isn't dirtied before our extend), we will not
    have enough credits for any future operation (it will cause jbd
    to complain and bug out). So check this and re-extend it.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • The original get/put_extent_tree() functions held a reference on
    et_root_bh. However, every single caller already has a safe reference,
    making the get/put cycle irrelevant.

    We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree(). It
    no longer gets a reference on et_root_bh. ocfs2_put_extent_tree() is
    removed. Callers now have a simpler init+use pattern.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
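
The init+use pattern can be sketched with stand-in types. `struct buffer` and `struct dinode` below are simplified placeholders for the kernel structures, and `eo_free_records` is a hypothetical operation chosen only to give the tree something to do.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for struct buffer_head and struct ocfs2_dinode. */
struct buffer { void *b_data; };
struct dinode { unsigned int count; unsigned int next_free; };

struct extent_tree;

struct extent_tree_ops {
    unsigned int (*eo_free_records)(const struct extent_tree *et);
};

struct extent_tree {
    const struct extent_tree_ops *et_ops;
    struct buffer *et_root_bh;   /* borrowed: the caller already holds a ref */
    void *et_object;             /* the object that contains the tree root */
};

static unsigned int dinode_free_records(const struct extent_tree *et)
{
    const struct dinode *di = et->et_object;
    return di->count - di->next_free;
}

static const struct extent_tree_ops dinode_et_ops = {
    .eo_free_records = dinode_free_records,
};

/* Fill a caller-provided tree (typically on the stack); no put is needed. */
static void init_dinode_extent_tree(struct extent_tree *et,
                                    struct buffer *root_bh)
{
    assert(et != NULL && root_bh != NULL);
    et->et_ops = &dinode_et_ops;
    et->et_root_bh = root_bh;
    et->et_object = root_bh->b_data;
}
```

Because the initializer takes no reference and allocates nothing, it cannot fail, and the caller has no cleanup path to forget.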
     
  • struct ocfs2_extent_tree_operations provides methods for the different
    on-disk btrees in ocfs2. Describing what those methods do is probably a
    good idea.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • We now have three different kinds of extent trees in ocfs2: inode data
    (dinode), extended attributes (xattr_tree), and extended attribute
    values (xattr_value). There is a nice abstraction for them,
    ocfs2_extent_tree, but it is hidden in alloc.c. All the calling
    functions have to pick amongst a varied API and pass in type bits and
    often extraneous pointers.

    A better way is to make ocfs2_extent_tree a first-class object.
    Everyone converts their object to an ocfs2_extent_tree() via the
    ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all
    tree calls to alloc.c.

    This simplifies a lot of callers, improving readability. It also
    provides an easy way to add additional extent tree types, as they only
    need to be defined in alloc.c with an ocfs2_get_*_extent_tree()
    function.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • A couple places check an extent_tree for a valid inode. We move that
    out to add an eo_insert_check() operation. It can be called from
    ocfs2_insert_extent() and elsewhere.

    We also have the wrapper calls ocfs2_et_insert_check() and
    ocfs2_et_sanity_check() ignore NULL ops. That way we don't have to
    provide useless operations for xattr types.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • A caller knows what kind of extent tree they have. There's no reason
    they have to call ocfs2_get_extent_tree() with a NULL when they could
    just as easily call a specific function to their type of extent tree.

    Introduce ocfs2_dinode_get_extent_tree(),
    ocfs2_xattr_tree_get_extent_tree(), and
    ocfs2_xattr_value_get_extent_tree(). They only take the necessary
    arguments, calling into the underlying __ocfs2_get_extent_tree() to do
    the real work.

    __ocfs2_get_extent_tree() is the old ocfs2_get_extent_tree(), but
    without needing any switch-by-type logic.

    ocfs2_get_extent_tree() is now a wrapper around the specific calls. It
    exists because a couple alloc.c functions can take et_type. This will
    go later.

    Another benefit is that ocfs2_xattr_value_get_extent_tree() can take a
    struct ocfs2_xattr_value_root* instead of void*. This gives us
    typechecking where we didn't have it before.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Provide an optional extent_tree_operation to specify the
    max_leaf_clusters of an ocfs2_extent_tree. If not provided, the value
    is 0 (unlimited).

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
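
An optional operation with a default can be sketched as a wrapper that checks for a NULL function pointer. The structures are reduced to the one field that matters, and the cap of 1 cluster returned by the xattr tree is an arbitrary value for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct extent_tree;

struct extent_tree_ops {
    /* Optional: leave NULL for tree types without a leaf size cap. */
    unsigned int (*eo_get_max_leaf_clusters)(const struct extent_tree *et);
};

struct extent_tree {
    const struct extent_tree_ops *et_ops;
};

/* Wrapper used by generic code: a missing operation means 0 (unlimited). */
static unsigned int et_max_leaf_clusters(const struct extent_tree *et)
{
    assert(et != NULL && et->et_ops != NULL);
    if (et->et_ops->eo_get_max_leaf_clusters == NULL)
        return 0;
    return et->et_ops->eo_get_max_leaf_clusters(et);
}

/* Hypothetical cap for the xattr tree; the real value may differ. */
static unsigned int xattr_tree_max_leaf_clusters(const struct extent_tree *et)
{
    (void)et;
    return 1;
}

static const struct extent_tree_ops dinode_et_ops = {
    .eo_get_max_leaf_clusters = NULL,
};

static const struct extent_tree_ops xattr_tree_et_ops = {
    .eo_get_max_leaf_clusters = xattr_tree_max_leaf_clusters,
};
```

Tree types that do not care simply leave the pointer unset, so they need no stub functions.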
     
  • ocfs2_num_free_extents() re-implements the logic of
    ocfs2_get_extent_tree(). Now that ocfs2_get_extent_tree() does not
    allocate, let's use it in ocfs2_num_free_extents() to simplify the code.

    The inode validation code in ocfs2_num_free_extents() is not needed.
    All callers are passing in pre-validated inodes.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The root_el of an ocfs2_extent_tree needs to be calculated from
    et->et_object. Make it an operation on et->et_ops.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The 'private' pointer was a way to store off xattr values, which don't
    live at a set place in the bh. But the concept of "the object
    containing the extent tree" is much more generic. For an inode it's the
    struct ocfs2_dinode, for an xattr value it's the value. Let's save off
    the 'object' at all times. If NULL is passed to
    ocfs2_get_extent_tree(), 'object' is set to bh->b_data.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Rather than allocating a struct ocfs2_extent_tree, just put it on the
    stack. Fill it with ocfs2_get_extent_tree() and drop it with
    ocfs2_put_extent_tree(). Now the callers don't have to ENOMEM, yet
    still safely ref the root_bh.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The members of the ocfs2_extent_tree structure gain a prefix of 'et_'.
    All users are updated.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The ocfs2_extent_tree_operations structure gains a field prefix on its
    members. The ->eo_sanity_check() operation gains a wrapper function for
    completeness. All of the extent tree operation wrappers gain a
    consistent name (ocfs2_et_*()).

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • This patch fixes the following build warnings:

    fs/ocfs2/xattr.c: In function 'ocfs2_half_xattr_bucket':
    fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
    fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
    fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
    fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
    fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
    fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
    fs/ocfs2/xattr.c: In function 'ocfs2_xattr_set_entry_in_bucket':
    fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
    fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
    fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'

    Signed-off-by: Mark Fasheh

    Mark Fasheh
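
Warnings of this kind go away once the conversion specifier matches the argument type. The userspace sketch below uses `%ld` for `long` and `%zu` for `size_t`. Whether the actual patch changed the specifiers or cast the arguments is not stated in this log, and `format_sizes()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format a long and a size_t with specifiers that match their types. */
static int format_sizes(char *buf, size_t size, long delta, size_t len)
{
    assert(buf != NULL && size > 0);
    /* %ld takes a long; %zu takes a size_t. Using %d for either is a bug. */
    return snprintf(buf, size, "delta=%ld len=%zu", delta, len);
}
```

On targets where `int`, `long` and `size_t` differ in width, a mismatched `%d` is undefined behavior and not only a warning.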
     
  • This patch adds the s_incompat flag for extended attribute support. This
    helps us ensure that older versions of Ocfs2 or ocfs2-tools will not be able
    to mount a volume with xattr support.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
     
  • In inode removal, we need to iterate all the buckets, remove any
    externally-stored EA values and delete the xattr buckets.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • Where the previous patches added the ability of list/get xattr in buckets
    for ocfs2, this patch enables ocfs2 to store large numbers of EAs.

    The original design doc was written by Mark Fasheh, and it can be found at
    http://oss.oracle.com/osswiki/OCFS2/DesignDocs/IndexedEATrees. I only had to
    make small modifications to it.

    First, because the bucket size is 4K, a new field named xh_free_start is added
    in ocfs2_xattr_header to indicate the next valid name/value offset in a bucket.
    It is used when we store new EA name/value. With this field, we can find the
    place more quickly and what's more, we don't need to sort the name/value every
    time to let the last entry indicate the next unused space. This makes the
    insert operation more efficient for blocksizes smaller than 4k.

    Because of the new xh_free_start, another field named as xh_name_value_len is
    also added in ocfs2_xattr_header. It records the total length of all the
    name/values in the bucket. We need this so that we can check it and defragment
    the bucket if there is not enough contiguous free space.

    An xattr insertion looks like this:
    1. xattr_index_block_find: find the right bucket by the name_hash, say bucketA.
    2. check whether there is enough space in bucketA. If yes, insert it directly
    and modify xh_free_start and xh_name_value_len accordingly. If not, check
    xh_name_value_len to see whether we can store this by defragmenting the bucket.
    If yes, defragment it and continue with the insertion.
    3. If defragmentation doesn't work, check whether there is a new empty bucket in
    the clusters within this extent record. If yes, init the new bucket and move
    all the buckets after bucketA one by one to the next bucket. Move half of the
    entries in bucketA to the next bucket and continue with the insertion.
    4. If there is no new bucket, grow the extent tree.

    As for xattr deletion, we will delete an xattr bucket when all its xattrs
    are removed and move all the buckets after it to the previous one. When all
    the xattr buckets in an extent record are freed, free this extent record
    from ocfs2_xattr_tree.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
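
Step 2 of the insertion procedure can be sketched as a pure decision function. The layout is an assumption made for illustration: the header and entry array grow upward from offset 0 and end at `entries_end`, name/value data is packed downward from the end of the bucket, and `free_start` is the lowest offset used by name/value data. `struct bucket_hdr` and `plan_insert()` are hypothetical; only the roles of xh_free_start and xh_name_value_len come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_SIZE 4096u

/* Hypothetical summary of a bucket header. */
struct bucket_hdr {
    uint16_t entries_end;     /* first byte after the header and entry array */
    uint16_t free_start;      /* lowest offset used by name/value data */
    uint16_t name_value_len;  /* total bytes of live name/value data */
};

enum insert_plan { INSERT_DIRECT, INSERT_AFTER_DEFRAG, INSERT_NEEDS_SPLIT };

static enum insert_plan plan_insert(const struct bucket_hdr *h,
                                    uint16_t entry_size, uint16_t nv_size)
{
    assert(h->entries_end <= h->free_start && h->free_start <= BUCKET_SIZE);
    assert(h->name_value_len <= BUCKET_SIZE - h->free_start);

    uint32_t need = (uint32_t)entry_size + nv_size;
    /* Free bytes between the entry array and the name/value area. */
    uint32_t contiguous = (uint32_t)h->free_start - h->entries_end;
    /* Free bytes overall, counting holes left by deleted name/values. */
    uint32_t total_free = BUCKET_SIZE - h->entries_end - h->name_value_len;

    if (need <= contiguous)
        return INSERT_DIRECT;
    if (need <= total_free)
        return INSERT_AFTER_DEFRAG;   /* compacting the holes makes room */
    return INSERT_NEEDS_SPLIT;        /* steps 3 and 4: new bucket or grow */
}
```

Keeping both counters lets the common case skip any sorting or compaction; the bucket is defragmented only when the holes would actually make room.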
     
  • In xattr bucket, we want to limit the maximum size of a btree leaf,
    otherwise we'll lose the benefits of hashing because we'll have to search
    large leaves.

    So add a new field in ocfs2_extent_tree which indicates the maximum leaf cluster
    size we want so that we can prevent ocfs2_insert_extent() from merging the leaf
    record even if it is contiguous with an adjacent record.

    Other btree types are not affected by this change.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
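
The effect of a maximum leaf cluster size on merging can be sketched as below. The model checks only logical contiguity, whereas the real contiguity test in alloc.c also considers physical placement. `struct extent_rec` and `can_merge()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified extent record: logical start and length, both in clusters. */
struct extent_rec {
    uint32_t cpos;
    uint32_t clusters;
};

/*
 * Decide whether right may be merged into left.
 * max_leaf_clusters == 0 means the tree type has no cap.
 */
static int can_merge(const struct extent_rec *left,
                     const struct extent_rec *right,
                     uint32_t max_leaf_clusters)
{
    assert(left != NULL && right != NULL);

    /* Records must be logically adjacent to be merge candidates at all. */
    if ((uint64_t)left->cpos + left->clusters != right->cpos)
        return 0;

    /* Even when adjacent, refuse a merge that would exceed the cap. */
    if (max_leaf_clusters != 0 &&
        (uint64_t)left->clusters + right->clusters > max_leaf_clusters)
        return 0;

    return 1;
}
```

A cap of 0 leaves the merge decision unchanged, which matches the statement that other btree types are not affected.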
     
  • Add code to lookup a given extended attribute in the xattr btree. Lookup
    follows this general scheme:

    1. Use ocfs2_xattr_get_rec to find the xattr extent record

    2. Find the xattr bucket within the extent which may contain this xattr

    3. Iterate the bucket to find the xattr. In ocfs2_xattr_block_get(), we need
    to recalculate the block offset and name offset for the right position of
    name/value.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
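
Steps 2 and 3 of the lookup scheme can be sketched over in-memory arrays. Step 1 (finding the extent record) is omitted, buckets are assumed sorted by the hash of their first entry, and entries with equal hashes are assumed not to spill into a following bucket. All types and functions here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct entry {
    uint32_t hash;
    const char *name;
    const char *value;
};

struct bucket {
    uint32_t first_hash;          /* hash stored in the bucket's 1st entry */
    size_t count;
    const struct entry *entries;
};

/* Step 2: binary search for the last bucket whose first_hash <= hash. */
static const struct bucket *find_bucket(const struct bucket *b, size_t n,
                                        uint32_t hash)
{
    const struct bucket *found = NULL;
    size_t low = 0, high = n;

    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (b[mid].first_hash <= hash) {
            found = &b[mid];
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    return found;
}

/* Step 3: scan the bucket, comparing the hash first and then the name. */
static const char *lookup(const struct bucket *b, size_t n,
                          uint32_t hash, const char *name)
{
    assert(name != NULL);
    const struct bucket *bk = find_bucket(b, n, hash);
    if (bk == NULL)
        return NULL;
    for (size_t i = 0; i < bk->count; i++) {
        if (bk->entries[i].hash == hash &&
            strcmp(bk->entries[i].name, name) == 0)
            return bk->entries[i].value;
    }
    return NULL;
}
```

Comparing the hash before the name keeps the scan cheap, and the name comparison resolves hash collisions inside a bucket.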
     
  • Ocfs2 breaks up xattr index tree leaves into 4k regions, called buckets.
    Attributes are stored within a given bucket, depending on hash value.

    After a discussion with Mark, we decided that the per-bucket index
    (xe_entry[]) would only exist in the 1st block of a bucket. Likewise,
    name/value pairs will not straddle more than one block. This allows the
    majority of operations to work directly on the buffer heads in a leaf block.

    This patch adds code to iterate the buckets in an EA. A new abstraction,
    ocfs2_xattr_bucket, is added. It records the bhs in this bucket and
    ocfs2_xattr_header. This keeps the code neat, improving readability.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
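
The bucket geometry described here reduces to a little arithmetic. The sketch assumes a 4k bucket and a block size that divides it evenly; the helper names are hypothetical.

```c
#include <assert.h>

#define XATTR_BUCKET_SIZE 4096u

/* Number of filesystem blocks that make up one bucket. */
static unsigned int blocks_per_bucket(unsigned int blocksize)
{
    assert(blocksize != 0 && XATTR_BUCKET_SIZE % blocksize == 0);
    return XATTR_BUCKET_SIZE / blocksize;
}

/* Which block of the bucket holds the byte at the given bucket offset. */
static unsigned int block_of_offset(unsigned int offset, unsigned int blocksize)
{
    assert(blocksize != 0 && offset < XATTR_BUCKET_SIZE);
    return offset / blocksize;
}

/*
 * True if a name/value pair of len bytes placed at offset would cross a
 * block boundary, which the on-disk layout is meant to avoid.
 */
static int straddles_block(unsigned int offset, unsigned int len,
                           unsigned int blocksize)
{
    assert(blocksize != 0);
    return len > 0 && offset / blocksize != (offset + len - 1) / blocksize;
}
```

When no pair straddles a block, any single name/value can be read or modified through exactly one buffer head.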
     
  • When necessary, an ocfs2_xattr_block will embed an ocfs2_extent_list to
    store large numbers of EAs. This patch adds a new type in
    ocfs2_extent_tree_type and adds the implementation so that we can re-use the
    b-tree code to handle the storage of many EAs.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma