28 Jul, 2011

1 commit

  • Fix a corruption that can happen when we have (two or more) outstanding
    aios to an overlapping unaligned region. Ext4
    (e9e3bcecf44c04b9e6b505fd8e2eb9cea58fb94d) and xfs recently had to fix
    similar issues.

    In our case what happens is that we can have an outstanding aio on a region
    and if a write comes in with some bytes overlapping the original aio we may
    decide to read that region into a page before continuing (typically because
    of buffered-io fallback). Since we have no ordering guarantees with the
    aio, we can read stale or bad data into the page and then write it back out.

    If the i/o is page and block aligned, then we avoid this issue as there
    won't be any need to read data from disk.

    I took the same approach as Eric did in the ext4 patch and introduced
    some serialization of unaligned async direct i/o; the alignment test
    behind it is sketched after this entry. I don't expect this to have an
    effect on the most common cases of AIO. Unaligned aio will be slower,
    but that's far more acceptable than data corruption.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Mark Fasheh
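
    A minimal sketch of the alignment test behind that serialization,
    assuming hypothetical names (this is not the code from the patch; in
    particular the per-inode mutex is only implied here):

        #include <stdbool.h>
        #include <stdint.h>

        /*
         * An async direct write that is not both page- and block-aligned
         * may force a read-modify-write of surrounding data, so it must be
         * serialized against other aio to the same inode (e.g. by taking a
         * hypothetical per-inode "unaligned aio" mutex before submitting).
         */
        static bool dio_needs_serialization(uint64_t pos, uint64_t count,
                                            uint32_t blocksize,
                                            uint32_t pagesize)
        {
                uint64_t end = pos + count;
                bool page_aligned = (pos % pagesize) == 0 &&
                                    (end % pagesize) == 0;
                bool block_aligned = (pos % blocksize) == 0 &&
                                     (end % blocksize) == 0;

                /* Fully aligned i/o never reads stale data back in. */
                return !(page_aligned && block_aligned);
        }

        int main(void)
        {
                /* 512-byte write at offset 512 with 4k blocks: serialize. */
                return dio_needs_serialization(512, 512, 4096, 4096) ? 0 : 1;
        }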
     

31 Mar, 2011

1 commit


10 Dec, 2010

1 commit

  • With the newly-introduced 'coherency=full' mode, O_DIRECT writes also
    take the EX rw_lock the way buffered writes do (rw_level == 1). This
    ends up confusing the use of 'level' in ocfs2_dio_end_io(), which
    caused i_alloc_sem to not get up_read'd correctly.

    This patch teaches ocfs2_dio_end_io() about all of the locking state by
    explicitly introducing a new bit for i_alloc_sem in the iocb's private
    data, just like what we already do for rw_lock; a toy model of that
    bookkeeping follows this entry.

    Signed-off-by: Tristan Ye
    Signed-off-by: Joel Becker

    Tristan Ye
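
    A toy user-space model of that bookkeeping, with made-up flag names
    (the real patch stores its bit in the kernel iocb's private data;
    nothing below is the actual ocfs2 interface):

        #include <assert.h>

        /* Hypothetical flag bits stashed in an iocb-like private word. */
        #define IOCB_RW_LOCKED_BIT   0  /* rw_lock taken */
        #define IOCB_SEM_LOCKED_BIT  1  /* i_alloc_sem taken for reading */

        struct toy_iocb {
                unsigned long private;
        };

        static void set_sem_locked(struct toy_iocb *iocb)
        {
                iocb->private |= 1UL << IOCB_SEM_LOCKED_BIT;
        }

        static int is_sem_locked(const struct toy_iocb *iocb)
        {
                return !!(iocb->private & (1UL << IOCB_SEM_LOCKED_BIT));
        }

        static void clear_sem_locked(struct toy_iocb *iocb)
        {
                iocb->private &= ~(1UL << IOCB_SEM_LOCKED_BIT);
        }

        int main(void)
        {
                struct toy_iocb iocb = { 0 };

                /* Submission path: remember that i_alloc_sem was taken. */
                set_sem_locked(&iocb);

                /*
                 * End-io path: drop i_alloc_sem if and only if the bit says
                 * it is held, independent of which rw_lock level the write
                 * used.  (In the kernel this is where up_read() would go.)
                 */
                if (is_sem_locked(&iocb))
                        clear_sem_locked(&iocb);

                assert(!is_sem_locked(&iocb));
                return 0;
        }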
     

26 Oct, 2010

1 commit

  • __block_write_begin and block_prepare_write are identical except for
    slightly different calling conventions. Convert all callers to the
    __block_write_begin calling convention and drop block_prepare_write
    (the argument mapping is illustrated after this entry).

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
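
    A hedged illustration of the mechanical argument mapping involved in
    the conversion: block_prepare_write() callers passed offsets within the
    page, while __block_write_begin() callers pass the file position and
    length. The numbers below are arbitrary.

        #include <stdio.h>
        #include <stdint.h>

        #define PAGE_SIZE_BYTES 4096ULL

        int main(void)
        {
                /* An arbitrary 100-byte write at file offset 8200. */
                uint64_t pos = 8200, len = 100;

                /* Old-style block_prepare_write() arguments: offsets within
                 * the page containing 'pos'. */
                unsigned from = (unsigned)(pos & (PAGE_SIZE_BYTES - 1));
                unsigned to = from + (unsigned)len;

                /* New-style __block_write_begin() arguments are simply the
                 * file position and length, so converting a caller means
                 * replacing (page, from, to, get_block) with
                 * (page, pos, len, get_block). */
                printf("old: from=%u to=%u   new: pos=%llu len=%llu\n",
                       from, to, (unsigned long long)pos,
                       (unsigned long long)len);
                return 0;
        }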
     

12 Aug, 2010

1 commit


23 Sep, 2009

1 commit

  • This patch adds CoW support for a refcounted record.

    The whole process is:
    1. Calculate how many clusters we need to CoW and where we start.
       Extents that are not completely encompassed by the write will be
       broken on 1MB boundaries (the boundary math is sketched after this
       entry).
    2. Do CoW for those clusters with the help of the page cache.
    3. Update the b-tree structure with the newly allocated clusters.

    Signed-off-by: Tao Ma

    Tao Ma
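
    A small, self-contained sketch of the step-1 boundary math only,
    assuming byte offsets and a fixed 1MB CoW granularity (the names and
    the clamping policy here are illustrative, not the actual ocfs2 code):

        #include <stdio.h>
        #include <stdint.h>

        #define COW_CHUNK (1024 * 1024ULL)      /* 1MB CoW granularity */

        /*
         * Widen the written region to 1MB boundaries, clamped to the
         * refcounted extent.  Steps 2 and 3 of the commit (copying via the
         * page cache and updating the b-tree) are not modeled here.
         */
        static void cow_range(uint64_t ext_start, uint64_t ext_end,
                              uint64_t write_start, uint64_t write_end,
                              uint64_t *cow_start, uint64_t *cow_end)
        {
                *cow_start = write_start / COW_CHUNK * COW_CHUNK;
                *cow_end = (write_end + COW_CHUNK - 1) / COW_CHUNK * COW_CHUNK;

                if (*cow_start < ext_start)
                        *cow_start = ext_start;
                if (*cow_end > ext_end)
                        *cow_end = ext_end;
        }

        int main(void)
        {
                uint64_t s, e;

                /* A 4k write at 3.5MB inside an 8MB refcounted extent. */
                cow_range(0, 8 * COW_CHUNK,
                          3 * COW_CHUNK + 512 * 1024,
                          3 * COW_CHUNK + 512 * 1024 + 4096, &s, &e);
                printf("CoW bytes [%llu, %llu)\n",
                       (unsigned long long)s, (unsigned long long)e);
                return 0;
        }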
     

17 Oct, 2007

1 commit

  • Plug ocfs2 into the ->write_begin and ->write_end aops (the resulting
    begin/copy/end call pattern is sketched after this entry).

    A bunch of custom code is now gone - the iovec iteration code during
    write and the ocfs2 splice write actor.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
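
    To illustrate what plugging into this interface buys: the generic write
    path becomes "prepare the page, let the caller copy, then commit". The
    stand-ins below are made up for illustration only and are not the real
    VFS types or the ocfs2 implementation:

        #include <stdio.h>
        #include <string.h>

        /* Toy stand-in; the real hooks take file/address_space/page. */
        struct toy_page { char data[4096]; };

        static int toy_write_begin(struct toy_page *pg, unsigned off,
                                   unsigned len)
        {
                /* A real ->write_begin locks the page and maps/allocates
                 * blocks so the copy below cannot fail halfway. */
                (void)pg; (void)off; (void)len;
                return 0;
        }

        static void toy_write_end(struct toy_page *pg, unsigned off,
                                  unsigned copied)
        {
                /* A real ->write_end marks the page dirty/uptodate and
                 * updates i_size. */
                (void)pg; (void)off; (void)copied;
        }

        int main(void)
        {
                struct toy_page page = { { 0 } };
                const char buf[] = "hello";
                unsigned off = 100, len = sizeof(buf) - 1;

                /* The loop the generic code runs per page: begin -> copy
                 * the caller's data -> end.  This replaces ocfs2's old
                 * private iovec-walking code. */
                if (toy_write_begin(&page, off, len) == 0) {
                        memcpy(page.data + off, buf, len);
                        toy_write_end(&page, off, len);
                }

                printf("%.*s\n", (int)len, page.data + off);
                return 0;
        }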
     

13 Oct, 2007

2 commits

  • This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand
    inline inode data.

    For the most part, the changes to the core write code can be relied on
    to do the heavy lifting. Any code calling ocfs2_write_begin (including
    shared writeable mmap) can count on it doing the right thing with
    respect to growing inline data to an extent tree.

    Size-reducing truncates, including UNRESVP, can simply zero the portion
    of the inode block being removed. Size-increasing truncates, including
    RESVP, have to be a little bit smarter and grow the inode to an extent
    tree if necessary; a toy sketch of that decision follows this entry.

    Signed-off-by: Mark Fasheh
    Reviewed-by: Joel Becker

    Mark Fasheh
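
    A toy sketch of the size-change decision only, with a made-up inline
    capacity (the real limit depends on the inode block size and is not the
    constant used here):

        #include <stdbool.h>
        #include <stdio.h>
        #include <stdint.h>

        /* Made-up capacity; really "whatever fits in the inode block". */
        #define INLINE_DATA_MAX 2048ULL

        /*
         * Size-reducing truncates (and UNRESVP) can just zero the tail of
         * the inline region; size-increasing truncates (and RESVP) or
         * writes past the limit must first push the data out to an extent
         * tree.
         */
        static bool needs_extent_conversion(uint64_t new_size)
        {
                return new_size > INLINE_DATA_MAX;
        }

        int main(void)
        {
                printf("shrink to 1k: convert=%d\n",
                       needs_extent_conversion(1024));
                printf("grow to 1MB:  convert=%d\n",
                       needs_extent_conversion(1024 * 1024));
                return 0;
        }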
     
  • We'll want to reuse most of this when pushing inline data back out to an
    extent. Keeping this part as a separate patch helps to keep the upcoming
    changes for write support uncluttered.

    The core portion of ocfs2_zero_cluster_pages() responsible for making
    sure a page is mapped and properly dirtied is abstracted out into its
    own function, ocfs2_map_and_dirty_page(). Actual functionality doesn't
    change, though zeroing becomes optional.

    We also turn part of ocfs2_free_write_ctxt() into a common function for
    unlocking and freeing a page array (sketched after this entry). This
    operation is very common (and uniform) for ocfs2 cluster sizes greater
    than page size, so it makes sense to keep the code in one place.

    Signed-off-by: Mark Fasheh
    Reviewed-by: Joel Becker

    Mark Fasheh
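
    The unlock-and-free helper amounts to a guarded loop over the write
    context's page array; a kernel-flavored sketch only (the function name
    is made up and this is not the ocfs2 helper itself, nor buildable
    outside a kernel tree of that era):

        #include <linux/pagemap.h>

        /* Sketch only: release every page a write context collected. */
        static void sketch_unlock_and_free_pages(struct page **pages,
                                                 int num_pages)
        {
                int i;

                for (i = 0; i < num_pages; i++) {
                        if (!pages[i])
                                continue;
                        unlock_page(pages[i]);
                        page_cache_release(pages[i]);
                        pages[i] = NULL;
                }
        }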
     

11 Jul, 2007

2 commits

  • Implement cluster consistent shared writeable mappings using the
    ->page_mkwrite() callback.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Use some ideas from the new-aops patch series and turn
    ocfs2_buffered_write_cluster() into a two-stage operation with the
    caller copying data in between. The code now understands writes that
    span multiple clusters, which happens when a full page has to be
    written on systems with pages larger than 4k (the cluster arithmetic
    is sketched after this entry).

    This sets us up to easily call into the write path during
    ->page_mkwrite().

    Signed-off-by: Mark Fasheh

    Mark Fasheh
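
    The multi-cluster case is just arithmetic: when the page size exceeds
    the cluster size, one full-page write spans several clusters. A
    self-contained sketch with example sizes (the helper name is made up):

        #include <stdio.h>
        #include <stdint.h>

        /* How many clusters does a full-page write at 'page_index' touch,
         * and which is the first one?  Assumes cluster size <= page size
         * and that both are powers of two. */
        static void page_to_clusters(uint64_t page_index, uint32_t page_size,
                                     uint32_t cluster_size,
                                     uint64_t *first_cluster,
                                     uint32_t *num_clusters)
        {
                *num_clusters = page_size / cluster_size;
                *first_cluster = page_index * *num_clusters;
        }

        int main(void)
        {
                uint64_t first;
                uint32_t n;

                /* e.g. 64k pages over 4k clusters: 16 clusters per page. */
                page_to_clusters(3, 65536, 4096, &first, &n);
                printf("page 3 covers clusters %llu..%llu\n",
                       (unsigned long long)first,
                       (unsigned long long)(first + n - 1));
                return 0;
        }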
     

27 Apr, 2007

4 commits


02 Dec, 2006

1 commit


18 May, 2006

1 commit

  • We need to take a data lock around extends to protect the pages that
    ocfs2_zero_extend is going to be pulling into the page cache. Otherwise an
    extend on one node might populate the page cache with data pages that have
    no lock coverage.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

04 Jan, 2006

1 commit