06 Jan, 2009

7 commits

  • The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
    commit triggers and allow us to compute metadata ecc right before the
    buffers are written out. This commit provides ecc for inodes, extent
    blocks, group descriptors, and quota blocks. It is not safe to use
    extended attributes and metaecc at the same time yet.

    The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
    the type of block at their root. Before, it didn't matter, but now the
    root block must use the appropriate ocfs2_journal_access_*() function.
    To keep this abstract, the structures now have a pointer to the matching
    journal_access function and a wrapper call to call it.

    A few places use naked ocfs2_write_block() calls instead of adding the
    blocks to the journal. We make sure to calculate their checksum and ecc
    before the write.

    Since we pass around the journal_access functions, let's typedef them
    in ocfs2.h.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
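
    The function-pointer arrangement described above can be sketched as
    follows. This is a minimal userspace model, not the ocfs2 code: the
    struct names, the two access routines, and the wrapper name
    tree_root_journal_access() are all hypothetical stand-ins.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct handle { int dirtied; };
struct buffer { int kind; int accessed_as; };

enum { BLOCK_INODE = 1, BLOCK_EXTENT = 2 };

/* One function type shared by every per-metadata-type access routine. */
typedef int (*journal_access_func)(struct handle *h, struct buffer *bh);

static int access_inode(struct handle *h, struct buffer *bh)
{
    h->dirtied++;
    bh->accessed_as = BLOCK_INODE;
    return 0;
}

static int access_extent_block(struct handle *h, struct buffer *bh)
{
    h->dirtied++;
    bh->accessed_as = BLOCK_EXTENT;
    return 0;
}

/* The tree hides the type of its root block but remembers how to journal it. */
struct extent_tree {
    struct buffer *root_bh;
    journal_access_func root_access;
};

/* Wrapper: callers never name the concrete access function. */
static int tree_root_journal_access(struct handle *h, struct extent_tree *et)
{
    if (et->root_access == NULL)
        return -1;
    return et->root_access(h, et->root_bh);
}
```

    A caller sets root_access once when the tree is initialised; code that
    walks the tree then calls only the wrapper.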
     
  • Add block check calls to the read_block validate functions. This is
    almost all of the read-side checking of metaecc. xattr buckets are not checked
    yet. Writes are also unchecked, and so a read-write mount will quickly fail.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Add quota calls for allocation and freeing of inodes and space, also update
    estimates on number of needed credits for a transaction. Move out inode
    allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called
    outside of a transaction.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Mark system files as not subject to quota accounting. This prevents
    possible recursions into quota code and thus deadlocks.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Add an optional validation hook to ocfs2_read_blocks(). Now the
    validation function is only called when a block was actually read off of
    disk. It is not called when the buffer was in cache.

    We add a buffer state bit BH_NeedsValidate to flag these buffers. It
    must always be one higher than the last JBD2 buffer state bit.

    The dinode, dirblock, extent_block, and xattr_block validators are
    lifted to this scheme directly. The group_descriptor validator needs to
    be split into two pieces. The first part only needs the gd buffer and
    is passed to ocfs2_read_block(). The second part requires the dinode as
    well, and is called every time. It's only 3 compares, so it's tiny.
    This also allows us to clean up the non-fatal gd check used by resize.c.
    It now has no magic argument.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
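
    A rough model of the "validate only on a real disk read" behaviour
    described above. The buffer struct, the needs_validate field (standing
    in for the BH_NeedsValidate bit), the magic value, and read_block() are
    all assumptions made for illustration; the real code works on
    buffer_heads and asynchronous I/O.

```c
#include <stddef.h>

/* Hypothetical buffer: 'uptodate' stands in for "already in cache". */
struct buffer {
    int uptodate;
    int needs_validate;   /* models the BH_NeedsValidate state bit */
    unsigned int magic;
};

typedef int (*validate_func)(struct buffer *bh);

static int validate_calls;

/* Example validator; the magic value is arbitrary. */
static int check_magic(struct buffer *bh)
{
    validate_calls++;
    return bh->magic == 0x4f434653u ? 0 : -1;
}

static int read_block(struct buffer *bh, validate_func validate)
{
    if (!bh->uptodate) {
        /* Pretend the disk read happens here. */
        bh->uptodate = 1;
        bh->needs_validate = (validate != NULL);
    }
    if (bh->needs_validate) {
        bh->needs_validate = 0;
        if (validate(bh) != 0) {
            bh->uptodate = 0;   /* do not leave a bad block cached */
            return -1;
        }
    }
    return 0;
}
```

    A second read_block() on the same buffer finds it up to date and skips
    the validator entirely.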
     
  • The ocfs2 code currently reads inodes off disk with a simple
    ocfs2_read_block() call. Each place that does this has a different set
    of sanity checks it performs. Some check only the signature. A couple
    validate the block number (the block read vs di->i_blkno). A couple
    others check for VALID_FL. Only one place validates i_fs_generation. A
    couple check nothing. Even when an error is found, they don't all do
    the same thing.

    We wrap inode reading into ocfs2_read_inode_block(). This will validate
    all the above fields, going readonly if they are invalid (they never
    should be). ocfs2_read_inode_block_full() is provided for the places
    that want to pass read_block flags. Every caller is passing a struct
    inode with a valid ip_blkno, so we don't need a separate blkno argument
    either.

    We will remove the validation checks from the rest of the code in a
    later commit, as they are no longer necessary.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
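
    The checks listed above (signature, block number, VALID_FL,
    i_fs_generation) could be gathered into one validator along these
    lines. The struct layout, the flag value, the signature string, and the
    function name are assumptions for this sketch; it is not the on-disk
    ocfs2_dinode or the real ocfs2_read_inode_block().

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_VALID_FL 0x1u          /* hypothetical flag value */
#define SKETCH_SIGNATURE "INODE01"    /* placeholder signature string */

/* Simplified inode header; field names follow the commit text. */
struct sketch_dinode {
    char     signature[8];
    uint64_t i_blkno;
    uint32_t i_flags;
    uint32_t i_fs_generation;
};

/* Returns 0 if the block looks like the inode we asked for, -1 otherwise. */
static int sketch_validate_inode(const struct sketch_dinode *di,
                                 uint64_t expected_blkno,
                                 uint32_t fs_generation)
{
    if (memcmp(di->signature, SKETCH_SIGNATURE, sizeof(di->signature)) != 0)
        return -1;                    /* wrong signature */
    if (di->i_blkno != expected_blkno)
        return -1;                    /* block read vs. di->i_blkno */
    if (!(di->i_flags & SKETCH_VALID_FL))
        return -1;                    /* inode not marked valid */
    if (di->i_fs_generation != fs_generation)
        return -1;                    /* stale filesystem generation */
    return 0;
}
```

    In the commit, a failure here takes the filesystem read-only; the sketch
    only reports the error to the caller.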
     

11 Nov, 2008

1 commit

  • Patch sets the journal descriptor to NULL after the journal is shut down.
    This ensures that jbd2_journal_release_jbd_inode(), which removes the
    jbd2 inode from txn lists, can be called safely from ocfs2_clear_inode()
    even after the journal has been shut down.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Sunil Mushran
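
    The idea can be modelled in a few lines: once shutdown clears the
    pointer, a later release call sees NULL and does nothing instead of
    touching freed state. Everything here (struct names, the live_inodes
    counter, where the NULL check sits) is an assumption for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the journal and superblock info. */
struct journal { int live_inodes; };
struct super { struct journal *journal; };

static void journal_shutdown(struct super *sb)
{
    /* ... flush and tear down the journal ... */
    sb->journal = NULL;   /* later callers can test for this */
}

/* Returns 1 if an inode was released from the journal, 0 if there was
 * no journal left to release it from. */
static int clear_inode(struct super *sb)
{
    if (sb->journal == NULL)
        return 0;
    sb->journal->live_inodes--;
    return 1;
}
```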
     

15 Oct, 2008

5 commits

  • ocfs2_read_blocks() currently requires the CACHED flag for cached I/O.
    However, that's the common case. Let's flip it around and provide an
    IGNORE_CACHE flag for the special users. This has the added benefit of
    cleaning up the code some (ignore_cache takes on its special meaning
    earlier in the loop).

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
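
    A small sketch of the flipped flag: cached reads become the default and
    only special callers opt out. The flag name and value, the buffer
    struct, and the disk_reads counter are hypothetical.

```c
#define SKETCH_BH_IGNORE_CACHE 0x1   /* hypothetical flag value */

struct buffer { int cached; int disk_reads; };

static void read_blocks(struct buffer *bh, int flags)
{
    /* Decide once, up front, whether the cache may satisfy this read. */
    int ignore_cache = (flags & SKETCH_BH_IGNORE_CACHE) != 0;

    if (bh->cached && !ignore_cache)
        return;            /* common case: served from cache */

    bh->disk_reads++;      /* pretend to go to disk */
    bh->cached = 1;
}
```

    Passing 0 for flags now gives the common cached behaviour that callers
    previously had to request explicitly.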
     
  • dir.c is the only place using ocfs2_bread(), so let's make it static to
    that file.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
    Only six pass a different flag set. Rather than have every caller care,
    let's make ocfs2_read_block() take no flags and always do a cached read.
    The remaining six places can call ocfs2_read_blocks() directly.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Now that synchronous readers are using ocfs2_read_blocks_sync(), all
    callers of ocfs2_read_blocks() are passing an inode. Use it
    unconditionally. Since it's there, we don't need to pass the
    ocfs2_super either.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The ocfs2_read_blocks() function currently handles sync reads, cached
    reads, and sometimes cached reads. We're going to add some
    functionality to it, so first we should simplify it. The uncached,
    synchronous reads are much easier to handle as a separate function, so we
    introduce ocfs2_read_blocks_sync().

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     

14 Oct, 2008

4 commits

  • This is pointless as brelse() already does the check.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
    limiting our maximum filesystem size.

    It's a pretty trivial change. Most functions are just renamed. The
    only functional change is moving to Jan's inode-based ordered data mode.
    It's better, too.

    Because JBD2 reads and writes JBD journals, this is compatible with any
    existing filesystem. It can even interact with JBD-based ocfs2 as long
    as the journal is formatted for JBD.

    We provide a compatibility option so that paranoid people can still use
    JBD for the time being. This will go away shortly.

    [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
    ocfs2_truncate_for_delete(). --Mark ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • This patch implements storing extended attributes either in the inode or in
    a single external block. We only store EAs in-inode when blocksize > 512 or
    that inode block has free space for it. When an EA's value is larger than 80
    bytes, we store the value in a b-tree outside the inode or block.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
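
    The 80-byte rule for EA values can be written as a one-line decision.
    The macro, enum, and function names below are made up for this sketch;
    only the threshold comes from the commit text.

```c
#include <stddef.h>

#define SKETCH_XATTR_INLINE_MAX 80   /* threshold from the commit text */

enum value_location {
    VALUE_LOCAL,   /* stored next to the entry, in the inode or block */
    VALUE_BTREE    /* stored outside, reached through a b-tree */
};

static enum value_location place_value(size_t value_len)
{
    return value_len > SKETCH_XATTR_INLINE_MAX ? VALUE_BTREE : VALUE_LOCAL;
}
```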
     
  • This is actually pretty easy since fs/dlm already handles the bulk of the
    work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the
    underlying lock manager, so I only had to add the right calls.

    Cluster-aware POSIX locks ("plocks") can be turned off by the same means as
    UNIX locks - mount with 'noflocks', or create a local-only Ocfs2 volume.
    Internally, the file system uses two sets of file_operations, depending on
    whether cluster-aware plocks are required. This turns out to be easier than
    implementing local-only versions of ->lock.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
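
    One way to picture the "two sets of file_operations" choice: build two
    tables that differ only in the POSIX lock hook and pick one at inode
    setup time. The struct, the table names, and pick_fops() are
    hypothetical, and which hooks each real table carries is an assumption.

```c
#include <stddef.h>

struct file;   /* opaque in this sketch */

/* Cut-down operations table; the real struct file_operations is larger. */
struct sketch_file_ops {
    int (*flock)(struct file *f);
    int (*lock)(struct file *f);   /* POSIX lock hook; NULL means local-only */
};

static int cluster_flock(struct file *f) { (void)f; return 0; }
static int cluster_plock(struct file *f) { (void)f; return 0; }

static const struct sketch_file_ops fops_cluster   = { cluster_flock, cluster_plock };
static const struct sketch_file_ops fops_no_plocks = { cluster_flock, NULL };

/* Choose a table once instead of branching inside a ->lock implementation. */
static const struct sketch_file_ops *pick_fops(int local_only, int noflocks)
{
    if (local_only || noflocks)
        return &fops_no_plocks;
    return &fops_cluster;
}
```

    Leaving the hook NULL lets the generic layer fall back to its local
    lock handling, so no local-only ->lock has to be written.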
     

26 Jan, 2008

5 commits

  • Convert the byte order of the constant instead of the variable, so the
    conversion is done at compile time rather than run time. Remove unused
    le32_and_cpu.

    Signed-off-by: Marcin Slusarz
    Signed-off-by: Mark Fasheh

    Marcin Slusarz
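
    The two forms of the test are equivalent, which the sketch below checks
    with portable userspace helpers. The sketch_* functions and SKETCH_FLAG
    are stand-ins written for this example; in the kernel the conversion of
    a constant is folded by the compiler, which plain C functions like these
    do not guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Portable stand-in: host-order value to little-endian storage. */
static uint32_t sketch_cpu_to_le32(uint32_t v)
{
    uint8_t b[4] = { (uint8_t)v, (uint8_t)(v >> 8),
                     (uint8_t)(v >> 16), (uint8_t)(v >> 24) };
    uint32_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* Portable stand-in: little-endian storage to host-order value. */
static uint32_t sketch_le32_to_cpu(uint32_t le)
{
    uint8_t b[4];

    memcpy(b, &le, sizeof(b));
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

#define SKETCH_FLAG 0x00000004u   /* hypothetical flag bit */

/* Before: converts the variable on every call. */
static int flag_set_before(uint32_t disk_flags)
{
    return (sketch_le32_to_cpu(disk_flags) & SKETCH_FLAG) != 0;
}

/* After: converts the constant and leaves the variable untouched. */
static int flag_set_after(uint32_t disk_flags)
{
    return (disk_flags & sketch_cpu_to_le32(SKETCH_FLAG)) != 0;
}
```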
     
  • Create separate lockdep lock classes for system files' i_mutexes. They are
    used to guard allocations and similar things and thus rank differently
    than i_mutex of a regular file or directory.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Call this the "inode_lock" now, since it covers both data and meta data.
    This patch makes no functional changes.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • The meta lock now covers both meta data and data, so this just removes the
    now-redundant data lock.

    Combining locks saves us a round of lock mastery per inode and leaves one
    less lock to ping between nodes during read/write.

    We don't lose much - since meta locks were always held before a data lock
    (and at the same level) ordered writeout mode (the default) ensured that
    flushing for the meta data lock also pushed out data anyways.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • The node maps that are set/unset by these votes are no longer relevant, thus
    we can remove the mount and umount votes. Since those are the last two
    remaining votes, we can also remove the entire vote infrastructure.

    The vote thread has been renamed to the downconvert thread, and the small
    amount of functionality related to managing it has been moved into
    fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

28 Nov, 2007

2 commits


13 Oct, 2007

2 commits

  • This fixes up write, truncate, mmap, and RESVSP/UNRESVSP to understand inline
    inode data.

    For the most part, the changes to the core write code can be relied on to do
    the heavy lifting. Any code calling ocfs2_write_begin (including shared
    writeable mmap) can count on it doing the right thing with respect to
    growing inline data to an extent tree.

    Size-reducing truncates, including UNRESVSP, can simply zero that portion of
    the inode block being removed. Size-increasing truncates, including RESVSP,
    have to be a little bit smarter and grow the inode to an extent tree if
    necessary.

    Signed-off-by: Mark Fasheh
    Reviewed-by: Joel Becker

    Mark Fasheh
     
  • Add the disk, network and memory structures needed to support data in inode.

    Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing
    inline data.

    A new inode field, i_dyn_features, is added to facilitate tracking of
    dynamic inode state. Since it will be used often, we want to mirror it on
    ocfs2_inode_info, and transfer it via the meta data lvb.

    Signed-off-by: Mark Fasheh
    Reviewed-by: Joel Becker

    Mark Fasheh
     

09 May, 2007

1 commit


03 May, 2007

3 commits


27 Apr, 2007

9 commits

  • The extent map code was ripped out earlier because of an inability to deal
    with holes. This patch adds back a simpler caching scheme requiring far less
    code.

    Our old extent map caching was designed back when meta data block caching in
    Ocfs2 didn't work very well, resulting in many disk reads. These days our
    metadata caching is much better, resulting in no unnecessary disk reads. As
    a result, extent caching doesn't have to be as fancy, nor does it have to
    cache as many extents. Keeping the last 3 extents seen should be sufficient
    to give us a small performance boost on some streaming workloads.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
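
    A cache of "the last 3 extents seen" needs very little code, which is
    the point the commit makes. The sketch below keeps the most recent
    extent first and drops the oldest on overflow; the struct layout,
    function names, and eviction order are assumptions, not the ocfs2
    implementation.

```c
#include <stdint.h>

#define SKETCH_MAX_CACHED 3   /* "the last 3 extents seen" */

/* One mapping: logical cluster offset, length, physical start. */
struct cached_extent { uint32_t cpos; uint32_t clusters; uint64_t phys; };

struct extent_cache {
    int count;
    struct cached_extent items[SKETCH_MAX_CACHED];   /* most recent first */
};

static void cache_insert(struct extent_cache *c, struct cached_extent e)
{
    int i;
    int n = c->count < SKETCH_MAX_CACHED ? c->count : SKETCH_MAX_CACHED - 1;

    /* Shift older entries down; the oldest falls off when full. */
    for (i = n; i > 0; i--)
        c->items[i] = c->items[i - 1];
    c->items[0] = e;
    if (c->count < SKETCH_MAX_CACHED)
        c->count++;
}

/* Returns 1 and fills *phys on a hit, 0 on a miss. */
static int cache_lookup(const struct extent_cache *c, uint32_t cpos,
                        uint64_t *phys)
{
    int i;

    for (i = 0; i < c->count; i++) {
        const struct cached_extent *e = &c->items[i];

        if (cpos >= e->cpos && cpos - e->cpos < e->clusters) {
            *phys = e->phys + (cpos - e->cpos);
            return 1;
        }
    }
    return 0;
}
```

    A miss simply falls through to the full tree lookup, so correctness
    never depends on the cache.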
     
  • Older file systems which didn't support holes did a dumb calculation of
    i_blocks based on i_size. This is no longer accurate, so fix things up to
    take actual allocation into account.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
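
    To see why a size-based i_blocks goes wrong on sparse files, compare it
    with a count derived from allocated clusters. Both helpers are
    illustrative guesses at the two calculations (512-byte sectors, a
    cluster_bits parameter of at least 9); neither is copied from the ocfs2
    source.

```c
#include <stdint.h>

/* Size-based estimate: round i_size up to 512-byte sectors. */
static uint64_t blocks_from_size(uint64_t i_size)
{
    return (i_size + 511) >> 9;
}

/* Allocation-based count: allocated clusters expressed in 512-byte
 * sectors. Assumes cluster_bits >= 9. */
static uint64_t blocks_from_clusters(uint32_t clusters, unsigned int cluster_bits)
{
    return (uint64_t)clusters << (cluster_bits - 9);
}
```

    For a 1 GiB file with a single 4 KiB cluster allocated, the first
    helper reports 2097152 sectors and the second reports 8.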
     
  • Return an optional extent flags field from our lookup functions and wire up
    callers to treat unwritten regions as holes for the purpose of returning
    zeros to the user.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
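
    On the read side, "treat unwritten regions as holes" reduces to one
    extra test on the flags returned by the lookup. The flag value, the
    result struct, and the function name here are hypothetical.

```c
#include <stdint.h>

#define SKETCH_EXT_UNWRITTEN 0x1u   /* hypothetical flag value */

/* What a lookup hands back: physical block (0 for a hole) plus flags. */
struct lookup_result {
    uint64_t     phys_blkno;
    unsigned int ext_flags;
};

/* Returns 1 if the reader should get zeros instead of disk contents. */
static int read_should_zero_fill(const struct lookup_result *r)
{
    if (r->phys_blkno == 0)
        return 1;   /* a hole: nothing allocated */
    return (r->ext_flags & SKETCH_EXT_UNWRITTEN) != 0;
}
```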
     
  • Since we don't zero on extend anymore, truncate needs to be fixed up to zero
    the part of a file between i_size and the end of its cluster. Otherwise a
    subsequent extend could expose bad data.

    This introduces a new helper, which can be used in ocfs2_write().

    Signed-off-by: Mark Fasheh

    Mark Fasheh
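
    The range that truncate has to zero is simple arithmetic once the
    cluster size is known to be a power of two. The helper name and the
    cluster_bits parameter are assumptions for this sketch.

```c
#include <stdint.h>

/* Compute the byte range [*start, *end) to zero after truncating to
 * new_i_size: from the new size up to the end of its cluster. */
static void zero_range_after_truncate(uint64_t new_i_size,
                                      unsigned int cluster_bits,
                                      uint64_t *start, uint64_t *end)
{
    uint64_t csize = (uint64_t)1 << cluster_bits;

    *start = new_i_size;
    /* Round up to the next cluster boundary. */
    *end = (new_i_size + csize - 1) & ~(csize - 1);
}
```

    When the new size already sits on a cluster boundary, start equals end
    and nothing needs zeroing.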
     
  • For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no
    longer exists since i_size is not tied to i_clusters. In
    ocfs2_extend_file(), we skip the allocation / page zeroing code for file
    systems which understand sparse files.

    The core truncate code is changed to do a bottom up tree traversal. This
    gets abstracted out into its own function. To make things more readable,
    most of the special case handling for in-inode extents from
    ocfs2_do_truncate() is also removed.

    Though write support for sparse files comes in a later patch, we at least
    update ocfs2_prepare_inode_for_write() to skip allocation for sparse files.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • The code in extent_map.c is not prepared to deal with a subtree being
    rotated between lookups. This can happen when filling holes in sparse files.
    Instead of a lengthy patch to update the code (which would likely lose the
    benefit of caching subtree roots), we remove most of the algorithms and
    implement a simple path based lookup. A less ambitious extent caching scheme
    will be added in a later patch.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • There are two checks in there (one for inode newness, one for other mounted
    nodes) which are unnecessary, so remove them. The DLM will allow the trylock
    in either case without any messaging overhead.

    Removing these makes ocfs2_request_delete() a one-line function, so just
    move the trylock out one level into ocfs2_query_inode_wipe().

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Remove node messaging code that becomes unused with the delete inode vote
    removal.

    [Removed even more cruft which I spotted during review --Mark]

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
     
  • Ocfs2 currently does cluster-wide node messaging to check the open state of
    an inode during delete. This patch removes that mechanism in favor of an
    inode cluster lock which is taken at shared read when an inode is first read
    and dropped in clear_inode(). This allows a deleting node to test the
    liveness of an inode by attempting to take an exclusive lock.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
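
    The liveness test can be modelled with a counter: each node that has
    the inode in memory holds a shared lock, and a deleter's non-blocking
    exclusive request succeeds only when no such holder remains. This is a
    single-process toy with hypothetical names, not the DLM interface; the
    counter tracks holders other than the deleting node.

```c
/* Toy model of the per-inode open lock. */
struct open_lock {
    int shared_holders;   /* other nodes with the inode in memory */
    int exclusive;        /* set once a deleter has won the lock */
};

/* Taken when an inode is first read. */
static void open_lock_shared(struct open_lock *l)
{
    l->shared_holders++;
}

/* Dropped in clear_inode(). */
static void open_lock_drop(struct open_lock *l)
{
    l->shared_holders--;
}

/* Non-blocking attempt: 1 means nobody else has the inode open. */
static int open_lock_try_exclusive(struct open_lock *l)
{
    if (l->shared_holders > 0 || l->exclusive)
        return 0;
    l->exclusive = 1;
    return 1;
}
```

    A failed trylock tells the deleting node that the inode is still in use
    elsewhere, with no explicit messaging between nodes.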
     

22 Jan, 2007

1 commit