20 Jul, 2007

4 commits

  • Split ondemand readahead interface into two functions. I think this makes it
    a little clearer for non-readahead experts (like Rusty).

    Internally they both call ondemand_readahead(), but the page argument is
    changed to an obvious boolean flag.

    Signed-off-by: Rusty Russell
    Signed-off-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Pass real splice size to page_cache_readahead_ondemand().

    The splice code works in chunks of 16 pages internally. The readahead code
    should be told of the overall splice size, instead of the internal chunk size.
    Otherwise bad things may happen. Imagine some 17-page random splice reads.
    The code before this patch will result in two readahead calls: readahead(16);
    readahead(1); That leads to one 16-page I/O and one 32-page I/O: one extra I/O
    and 31 readahead miss pages.

    Signed-off-by: Fengguang Wu
    Cc: Jens Axboe
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
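
The 17-page case above can be sketched in plain C. This is only an illustration of which sizes reach the readahead code, not the kernel implementation: `ra_log`, `ra_report`, `report_per_chunk` and `report_overall` are hypothetical names, and the sketch makes no attempt to model how readahead sizes its I/O.

```c
#include <stddef.h>

#define CHUNK_PAGES 16 /* internal splice chunk size named in the message */
#define MAX_CALLS 8    /* arbitrary cap for this sketch */

/* Hypothetical log of the sizes handed to the readahead code. */
struct ra_log {
    size_t sizes[MAX_CALLS];
    size_t count;
};

static void ra_report(struct ra_log *log, size_t pages)
{
    if (log->count < MAX_CALLS)
        log->sizes[log->count++] = pages;
}

/* Behaviour described as "before this patch": each chunk reports itself. */
static void report_per_chunk(struct ra_log *log, size_t total_pages)
{
    while (total_pages > 0) {
        size_t n = total_pages < CHUNK_PAGES ? total_pages : CHUNK_PAGES;

        ra_report(log, n);
        total_pages -= n;
    }
}

/* Intended behaviour: readahead is told the overall request size. */
static void report_overall(struct ra_log *log, size_t total_pages)
{
    if (total_pages > 0)
        ra_report(log, total_pages);
}
```

For a 17-page request, `report_per_chunk` records the sizes 16 and 1, while `report_overall` records a single 17.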
     
  • Move synchronous page_cache_readahead_ondemand() call out of splice loop.

    This avoids one pointless page allocation/insertion in case of non-zero
    ra_pages, or many pointless readahead calls in case of zero ra_pages.

    Note that a user who sets ra_pages to less than PIPE_BUFFERS=16 pages will
    not get the expected readahead behavior anyway. The splice code works in
    batches of 16 pages, which can be taken as another form of synchronous
    readahead.

    Signed-off-by: Fengguang Wu
    Cc: Jens Axboe
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     
  • Convert splice reads to use on-demand readahead.

    Signed-off-by: Fengguang Wu
    Cc: Steven Pratt
    Cc: Ram Pai
    Cc: Jens Axboe
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

16 Jul, 2007

1 commit

  • OGAWA Hirofumi reported that he's noticed
    nfsd read corruption in recent kernels, and did the hard work of
    discovering that it's due to splice updating the file position twice.
    This means that the next operation would start further ahead than it
    should.

    nfsd_vfs_read()
        splice_direct_to_actor()
            while(len) {
                do_splice_to()                      [update sd->pos]
                    -> generic_file_splice_read()   [read from sd->pos]
                nfsd_direct_splice_actor()
                    -> __splice_from_pipe()         [update sd->pos]

    There's nothing wrong with the core splice code, but direct splicing is
    an add-on that calls both the input and the output paths, so it has to
    cache the offset locally to keep it correct.

    Signed-off-by: Jens Axboe

    Jens Axboe
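
The double update can be reproduced in a small userspace model. This is a sketch under stated assumptions, not the kernel fix: `splice_state`, `input_step`, `output_step` and both loop functions are hypothetical stand-ins for the calls in the trace above.

```c
/* Hypothetical stand-in for the splice descriptor holding the file position. */
struct splice_state {
    long pos;
};

/* Input side: consumes len bytes and advances whatever position it is given. */
static long input_step(long *pos, long len)
{
    *pos += len;
    return len;
}

/* Output side: writes len bytes and advances the shared position. */
static long output_step(struct splice_state *sd, long len)
{
    sd->pos += len;
    return len;
}

/* Buggy loop: both sides advance sd->pos, so it moves twice per chunk. */
static void direct_splice_buggy(struct splice_state *sd, long len, long chunk)
{
    while (len > 0) {
        long n = len < chunk ? len : chunk;

        input_step(&sd->pos, n);
        output_step(sd, n);
        len -= n;
    }
}

/* Fixed loop: the input side works on a locally cached copy of the offset. */
static void direct_splice_fixed(struct splice_state *sd, long len, long chunk)
{
    while (len > 0) {
        long n = len < chunk ? len : chunk;
        long pos = sd->pos; /* local cache; sd->pos is advanced only once */

        input_step(&pos, n);
        output_step(sd, n);
        len -= n;
    }
}
```

Starting from position 0 and transferring 100 bytes, the buggy loop leaves the position at 200, while the fixed loop leaves it at 100.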
     

13 Jul, 2007

2 commits


10 Jul, 2007

7 commits


15 Jun, 2007

3 commits


08 Jun, 2007

5 commits


08 May, 2007

2 commits

  • Don't try to guess what the read-ahead logic will do, allow it
    to make its own decisions.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Eric Dumazet, thank you for disclosing this bug.

    Readahead logic somehow fails to populate the page range with data.
    It can be because

    1) the readahead routine is not always called in the following lines of
    fs/splice.c:

        if (!loff || nr_pages > 1)
            page_cache_readahead(mapping, &in->f_ra, in, index, nr_pages);

    2) even when called, page_cache_readahead() won't guarantee the pages are
    there. It won't submit readahead I/O for pages already in the radix tree,
    or when (ra_pages == 0), or after 256 cache hits.

    In your case, it should be because of the retried reads, which lead to
    excessive cache hits and eventually disable readahead.

    And that _one_ failure of readahead blocks the whole read process.
    The application receives EAGAIN and retries the read, but
    __generic_file_splice_read() refuses to make progress:

    - in the previous invocation, it has allocated a blank page and inserted it
    into the radix tree, but never has the chance to start I/O for it: the test
    of SPLICE_F_NONBLOCK goes before that.

    - in the retried invocation, the readahead code will neither get out of the
    cache hit mode, nor will it submit I/O for an already existing page.

    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Fengguang Wu
     

29 Mar, 2007

1 commit


27 Mar, 2007

3 commits

  • Ocfs2 wants to implement its own splice write actor so that it can better
    manage cluster / page locks. This lets us re-use the rest of splice write
    while only providing our own code where it's actually important.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Jens Axboe

    Mark Fasheh
     
  • Splice does not need to readpage to bring the page uptodate before writing
    to it, because prepare_write will take care of that for us.

    Splice is also wrong to SetPageUptodate before the page is actually uptodate.
    This results in the old uninitialised memory leak. This gets fixed as a
    matter of course when removing the readpage logic.

    Signed-off-by: Nick Piggin
    Signed-off-by: Jens Axboe

    Nick Piggin
     
  • Stealing pages with splice is problematic because we cannot just insert
    an uptodate page into the pagecache and hope the filesystem can take care
    of it later.

    We also cannot just ClearPageUptodate, then hope prepare_write does not
    write anything into the page, because I don't think prepare_write gives
    that guarantee.

    Remove support for SPLICE_F_MOVE for now. If we really want to bring it
    back, we might be able to do so with the new filesystem buffered write
    aops APIs I'm working on. If we really don't want to bring it back, then
    we should decide that sooner rather than later, and remove the flag and
    all the stealing infrastructure before anybody starts using it.

    Signed-off-by: Nick Piggin
    Signed-off-by: Jens Axboe

    Nick Piggin
     

14 Dec, 2006

1 commit

  • - pipe/splice should use const pipe_buf_operations and file_operations

    - struct pipe_inode_info has an unused field "start" : get rid of it.

    Signed-off-by: Eric Dumazet
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

09 Dec, 2006

1 commit

  • This patch changes struct file to use struct path instead of having
    independent pointers to struct dentry and struct vfsmount, and converts all
    users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

    Additionally, it adds two #define's to make the transition easier for users of
    the f_dentry and f_vfsmnt.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
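
A minimal userspace sketch of the layout change follows. The struct bodies are simplified stand-ins rather than the kernel definitions, `file_name` is a hypothetical helper, and the exact form of the two transitional #defines is an assumption, since the message does not spell them out.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the shape matters here. */
struct dentry {
    const char *name;
};

struct vfsmount {
    int id;
};

/* The dentry/vfsmount pair is grouped into one struct. */
struct path {
    struct vfsmount *mnt;
    struct dentry *dentry;
};

struct file {
    struct path f_path;
};

/* Assumed form of the transitional aliases for old-style users. */
#define f_dentry f_path.dentry
#define f_vfsmnt f_path.mnt

/* Old-style code keeps compiling: f->f_dentry expands to f->f_path.dentry. */
static const char *file_name(const struct file *f)
{
    return f->f_dentry ? f->f_dentry->name : NULL;
}
```

With aliases of this shape, existing callers can be converted to `f_path.dentry` and `f_path.mnt` gradually instead of all at once.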
     

05 Nov, 2006

1 commit


29 Oct, 2006

1 commit

  • - Consolidate page_cache_alloc

    - Fix splice: only the pagecache pages and filesystem data need to use
    mapping_gfp_mask.

    - Fix grab_cache_page_nowait: same as splice, also honour NUMA placement.

    Signed-off-by: Nick Piggin
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

20 Oct, 2006

3 commits

  • Originally from Mark Fasheh

    generic_file_splice_write() does not remove S_ISUID or S_ISGID. This is
    inconsistent with the way we generally write to files.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Jens Axboe

    Jens Axboe
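
As a rough illustration of what dropping those bits on write means, here is a userspace sketch. `strip_setid_bits` is a hypothetical helper, not a kernel function, and the rule that setgid is cleared only when group-execute is also set is an assumption modelled on common Unix practice rather than something stated in the message.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return the mode a file should have after a write
 * by an unprivileged writer.
 */
static mode_t strip_setid_bits(mode_t mode)
{
    mode_t kill = S_ISUID; /* setuid is always dropped */

    /* Assumption: setgid counts as a privilege bit only with group-exec. */
    if ((mode & S_ISGID) && (mode & S_IXGRP))
        kill |= S_ISGID;

    return mode & ~kill;
}
```

Under these assumptions, mode 04755 becomes 0755 after a write, while 02644 is left unchanged.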
     
  • This allows file systems to manage their own i_mutex locking while
    still re-using the generic_file_splice_write() logic.

    OCFS2 in particular wants this so that it can order cluster locks within
    i_mutex.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Jens Axboe

    Mark Fasheh
     
  • The splice_actor may be calling ->prepare_write() and ->commit_write(). We
    want i_mutex on the inode being written to before calling those so that we
    don't race i_size changes.

    The double locking behavior is done elsewhere in splice.c, and if we
    eventually want _nolock variants of generic_file_splice_write(), fs modules
    might have to replicate the nasty locking code. We introduce
    inode_double_lock() and inode_double_unlock() to consolidate the locking
    rules into one set of functions.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Jens Axboe

    Mark Fasheh
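
The idea of consolidating double-locking rules into one pair of functions can be sketched in userspace with pthread mutexes. `double_lock` and `double_unlock` are hypothetical names, and ordering by address is an assumption chosen for this sketch; the message does not describe the ordering rule that inode_double_lock() uses.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Lock two mutexes in one fixed order (lowest address first) so that two
 * callers passing the same pair in opposite order cannot deadlock.
 * Returns the mutex that was locked first, or NULL if both are NULL.
 */
static pthread_mutex_t *double_lock(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a == NULL || b == NULL || a == b) {
        pthread_mutex_t *only = a ? a : b;

        if (only)
            pthread_mutex_lock(only);
        return only;
    }

    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;

        a = b;
        b = tmp;
    }

    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
    return a;
}

/* Release whatever double_lock() took; unlock order does not matter. */
static void double_unlock(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a)
        pthread_mutex_unlock(a);
    if (b && b != a)
        pthread_mutex_unlock(b);
}
```

Keeping the ordering rule in one place means callers never have to reproduce it, which is the motivation the message gives for the new helpers.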
     

12 Oct, 2006

1 commit


01 Oct, 2006

1 commit


10 Jul, 2006

1 commit

  • Several issues noticed/fixed:

    - We cannot reliably block in link_pipe() while holding both input and output
    mutexes. So do preparatory checks before locking down both mutexes and doing
    the link.

    - The ipipe->nrbufs vs i check was bad, because we could have dropped the
    ipipe lock in-between. This causes us to potentially look at unknown
    buffers if we were racing with someone else reading this pipe.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

23 Jun, 2006

1 commit


04 May, 2006

1 commit