08 May, 2007

2 commits

  • Don't try to guess what the read-ahead logic will do, allow it
    to make its own decisions.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Eric Dumazet, thank you for disclosing this bug.

    Readahead logic somehow fails to populate the page range with data.
    It can be because

    1) the readahead routine is not always called in the following lines of
    fs/splice.c:

        if (!loff || nr_pages > 1)
                page_cache_readahead(mapping, &in->f_ra, in, index, nr_pages);

    2) even when called, page_cache_readahead() won't guarantee the pages are
    there. It won't submit readahead I/O for pages already in the radix tree,
    when (ra_pages == 0), or after 256 cache hits.

    In your case, it is probably because of the retried reads, which lead to
    excessive cache hits and disable readahead at some point.

    And that _one_ failure of readahead blocks the whole read process.
    The application receives EAGAIN and retries the read, but
    __generic_file_splice_read() refuses to make progress:

    - in the previous invocation, it allocated a blank page and inserted it
    into the radix tree, but never had the chance to start I/O for it: the test
    of SPLICE_F_NONBLOCK comes before that.

    - in the retried invocation, the readahead code will neither get out of the
    cache hit mode, nor will it submit I/O for an already existing page.

    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Fengguang Wu
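
The three conditions listed in the message above can be summarized as a small predicate. This is only a toy userspace model, not the kernel's readahead code: `toy_ra_state`, `toy_readahead_submits` and `TOY_MAX_CACHE_HITS` are hypothetical names, and the exact threshold comparison is an assumption based on the "after 256 cache hits" wording.

```c
#include <stdbool.h>

/* Assumed threshold, taken from the "256 cache hits" figure in the message. */
#define TOY_MAX_CACHE_HITS 256

/* Hypothetical stand-in for the per-file readahead state. */
struct toy_ra_state {
    unsigned long ra_pages;    /* readahead window size; 0 means disabled */
    unsigned long cache_hits;  /* consecutive page cache hits seen so far */
};

/* Returns true if this toy model would submit readahead I/O for a page. */
static bool toy_readahead_submits(const struct toy_ra_state *ra,
                                  bool page_in_tree)
{
    if (page_in_tree)
        return false;          /* page already present in the radix tree */
    if (ra->ra_pages == 0)
        return false;          /* readahead is turned off for this file */
    if (ra->cache_hits >= TOY_MAX_CACHE_HITS)
        return false;          /* too many cache hits: cache hit mode */
    return true;
}
```

In this model, a blank page that was inserted but never read makes `page_in_tree` true on the retry, so no I/O is submitted for it, which matches the stall described in the message.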
     

29 Mar, 2007

1 commit


27 Mar, 2007

3 commits

  • Ocfs2 wants to implement its own splice write actor so that it can better
    manage cluster / page locks. This lets us re-use the rest of splice write
    while only providing our own code where it's actually important.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Jens Axboe

    Mark Fasheh
     
  • Splice does not need to readpage to bring the page uptodate before writing
    to it, because prepare_write will take care of that for us.

    Splice is also wrong to SetPageUptodate before the page is actually uptodate.
    This results in the old uninitialised memory leak. This gets fixed as a
    matter of course when removing the readpage logic.

    Signed-off-by: Nick Piggin
    Signed-off-by: Jens Axboe

    Nick Piggin
     
  • Stealing pages with splice is problematic because we cannot just insert
    an uptodate page into the pagecache and hope the filesystem can take care
    of it later.

    We also cannot just ClearPageUptodate, then hope prepare_write does not
    write anything into the page, because I don't think prepare_write gives
    that guarantee.

    Remove support for SPLICE_F_MOVE for now. If we really want to bring it
    back, we might be able to do so with the new filesystem buffered write
    aops APIs I'm working on. If we really don't want to bring it back, then
    we should decide that sooner rather than later, and remove the flag and
    all the stealing infrastructure before anybody starts using it.

    Signed-off-by: Nick Piggin
    Signed-off-by: Jens Axboe

    Nick Piggin
     

14 Dec, 2006

1 commit

  • - pipe/splice should use const pipe_buf_operations and file_operations

    - struct pipe_inode_info has an unused field "start" : get rid of it.

    Signed-off-by: Eric Dumazet
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

09 Dec, 2006

1 commit

  • This patch changes struct file to use struct path instead of having
    independent pointers to struct dentry and struct vfsmount, and converts all
    users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

    Additionally, it adds two #defines to make the transition easier for users
    of f_dentry and f_vfsmnt.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
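
A minimal sketch of how compatibility #defines of this kind can keep old field names working after the pointers move into an embedded struct. The `toy_*` types are hypothetical stand-ins, and the exact macro spelling is an assumption inferred from the names in the message, not copied from the patch.

```c
/* Hypothetical stand-ins for the kernel structures named in the message. */
struct toy_dentry   { const char *name; };
struct toy_vfsmount { int id; };

/* The two pointers now live together in one struct. */
struct toy_path {
    struct toy_vfsmount *mnt;
    struct toy_dentry   *dentry;
};

struct toy_file {
    struct toy_path f_path;
};

/* Assumed form of the transition macros: old names map to new members. */
#define f_dentry f_path.dentry
#define f_vfsmnt f_path.mnt

/* Old-style code keeps compiling: f->f_dentry expands to f->f_path.dentry. */
static const char *toy_file_name(const struct toy_file *f)
{
    return f->f_dentry->name;
}
```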
     

05 Nov, 2006

1 commit


29 Oct, 2006

1 commit

  • - Consolidate page_cache_alloc

    - Fix splice: only the pagecache pages and filesystem data need to use
    mapping_gfp_mask.

    - Fix grab_cache_page_nowait: same as splice, also honour NUMA placement.

    Signed-off-by: Nick Piggin
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

20 Oct, 2006

3 commits

  • Originally from Mark Fasheh

    generic_file_splice_write() does not remove S_ISUID or S_ISGID. This is
    inconsistent with the way we generally write to files.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Jens Axboe

    Jens Axboe
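
As a userspace illustration of what removing S_ISUID and S_ISGID on write means for a file mode, here is a small sketch. `strip_setid` is a hypothetical helper, and the rule that the setgid bit is cleared only when group-execute is also set is an assumption based on the usual convention, not something stated in the message.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return the mode a file would have after a write clears its set-id bits. */
static mode_t strip_setid(mode_t mode)
{
    /* The setuid bit is always dropped. */
    mode_t out = mode & ~(mode_t)S_ISUID;

    /* Assumption: setgid is dropped only together with group-execute;
     * setgid without group-execute is left alone. */
    if ((mode & S_ISGID) && (mode & S_IXGRP))
        out &= ~(mode_t)S_ISGID;

    return out;
}
```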
     
  • This allows file systems to manage their own i_mutex locking while
    still re-using the generic_file_splice_write() logic.

    OCFS2 in particular wants this so that it can order cluster locks within
    i_mutex.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Jens Axboe

    Mark Fasheh
     
  • The splice_actor may be calling ->prepare_write() and ->commit_write(). We
    want i_mutex on the inode being written to before calling those so that we
    don't race i_size changes.

    The double locking behavior is done elsewhere in splice.c, and if we
    eventually want _nolock variants of generic_file_splice_write(), fs modules
    might have to replicate the nasty locking code. We introduce
    inode_double_lock() and inode_double_unlock() to consolidate the locking
    rules into one set of functions.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Jens Axboe

    Mark Fasheh
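
One common way to make double locking safe is to always take the two locks in a fixed order, for example by address. The sketch below shows that idea with toy userspace types; it is not the kernel's inode_double_lock(), the address ordering is an assumption, and every `toy_*` name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode with a lock; 'order' records when
 * the lock was taken so the ordering can be observed. */
struct toy_inode {
    int locked;
    int order;
};

static int toy_seq;  /* increases with every lock acquisition */

static void toy_lock(struct toy_inode *i)   { i->locked = 1; i->order = ++toy_seq; }
static void toy_unlock(struct toy_inode *i) { i->locked = 0; }

/* Lock one or two inodes; two distinct inodes are locked lowest address first. */
static void toy_double_lock(struct toy_inode *a, struct toy_inode *b)
{
    if (a == NULL || b == NULL || a == b) {
        struct toy_inode *only = a ? a : b;
        if (only)
            toy_lock(only);
        return;
    }
    if ((uintptr_t)a < (uintptr_t)b) {
        toy_lock(a);
        toy_lock(b);
    } else {
        toy_lock(b);
        toy_lock(a);
    }
}

static void toy_double_unlock(struct toy_inode *a, struct toy_inode *b)
{
    if (a)
        toy_unlock(a);
    if (b && b != a)
        toy_unlock(b);
}
```

Because both callers of such a helper agree on the order regardless of argument order, two tasks locking the same pair cannot deadlock against each other.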
     

12 Oct, 2006

1 commit


01 Oct, 2006

1 commit


10 Jul, 2006

1 commit

  • Several issues noticed/fixed:

    - We cannot reliably block in link_pipe() while holding both input and output
    mutexes. So do preparatory checks before locking down both mutexes and doing
    the link.

    - The ipipe->nrbufs vs i check was bad, because we could have dropped the
    ipipe lock in-between. This causes us to potentially look at unknown
    buffers if we were racing with someone else reading this pipe.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

23 Jun, 2006

1 commit


04 May, 2006

4 commits


02 May, 2006

8 commits

  • Apply the same rules as for the anon pipe pages: only allow stealing
    if no one else is using the page.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently we rely on the PIPE_BUF_FLAG_LRU flag being set correctly
    to know whether we need to fiddle with page LRU state after stealing it.
    However, for some origins we just don't know if the page is on the LRU
    list or not.

    So remove PIPE_BUF_FLAG_LRU and do this check/add manually in pipe_to_file()
    instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We need to use the minimum of {len, PAGE_SIZE-off}, not {len, PAGE_SIZE}-off.
    The latter doesn't make any sense, and could cause us to attempt negative
    length transfers...

    Signed-off-by: Jens Axboe

    Jens Axboe
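
The difference between the two expressions in the message above is easy to see with concrete numbers. This is a standalone sketch, assuming a 4096-byte page and an offset smaller than the page size; `SKETCH_PAGE_SIZE`, `chunk_len` and `chunk_len_wrong` are hypothetical names, not kernel identifiers.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* Correct form: min(len, PAGE_SIZE - off).
 * Requires off < SKETCH_PAGE_SIZE. */
static size_t chunk_len(size_t len, size_t off)
{
    size_t room = SKETCH_PAGE_SIZE - off;  /* bytes left in the page */
    return len < room ? len : room;
}

/* The form the message calls wrong: min(len, PAGE_SIZE) - off.
 * Signed arithmetic is used here so the negative result is visible. */
static long chunk_len_wrong(long len, long off)
{
    long page = (long)SKETCH_PAGE_SIZE;
    long capped = len < page ? len : page;
    return capped - off;                   /* negative whenever off > capped */
}
```

With len = 100 and off = 4000, the correct form yields 96 bytes (what still fits in the page), while the wrong form yields 100 - 4000 = -3900, the negative length transfer the message warns about.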
     
  • If SPLICE_F_GIFT is set, the user is basically giving these pages away to
    the kernel. That means we can steal them for e.g. page cache uses instead
    of copying them.

    The data must be properly page aligned and also a multiple of the page size
    in length.

    Signed-off-by: Jens Axboe

    Jens Axboe
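
The alignment requirement in the message above can be checked from userspace before gifting a buffer. The sketch below assumes a 4096-byte page; a real program would query the page size at run time. `gift_eligible` and `SKETCH_PAGE_SIZE` are hypothetical names, and this is not the kernel's own check.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* Return 1 if the buffer starts on a page boundary and its length is a
 * multiple of the page size, 0 otherwise. */
static int gift_eligible(const void *base, size_t len)
{
    uintptr_t addr = (uintptr_t)base;

    if (addr % SKETCH_PAGE_SIZE != 0)
        return 0;   /* start is not page aligned */
    if (len % SKETCH_PAGE_SIZE != 0)
        return 0;   /* length is not a whole number of pages */
    return 1;
}
```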
     
  • The pipe ->map() method uses kmap() to virtually map the pages, which
    is both slow and has known scalability issues on SMP. This patch enables
    atomic copying of pipe pages, by pre-faulting data and using kmap_atomic()
    instead.

    lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing. Here
    are results from that on a UP machine with highmem (1.5GiB of RAM), running
    first a UP kernel, then an SMP kernel, and then a patched SMP kernel.

    Vanilla-UP:
    Pipe bandwidth: 1622.28 MB/sec
    Pipe bandwidth: 1610.59 MB/sec
    Pipe bandwidth: 1608.30 MB/sec
    Pipe latency: 7.3275 microseconds
    Pipe latency: 7.2995 microseconds
    Pipe latency: 7.3097 microseconds

    Vanilla-SMP:
    Pipe bandwidth: 1382.19 MB/sec
    Pipe bandwidth: 1317.27 MB/sec
    Pipe bandwidth: 1355.61 MB/sec
    Pipe latency: 9.6402 microseconds
    Pipe latency: 9.6696 microseconds
    Pipe latency: 9.6153 microseconds

    Patched-SMP:
    Pipe bandwidth: 1578.70 MB/sec
    Pipe bandwidth: 1579.95 MB/sec
    Pipe bandwidth: 1578.63 MB/sec
    Pipe latency: 9.1654 microseconds
    Pipe latency: 9.2266 microseconds
    Pipe latency: 9.1527 microseconds

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Notify the readahead logic of the missing page. Suggested by
    Oleg Nesterov.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The ->map() function is really expensive on highmem machines right now,
    since it has to use the slower kmap() instead of kmap_atomic(). Splice
    rarely needs to access the virtual address of a page, so it's a waste
    of time doing it.

    Introduce ->pin() to take over the responsibility of making sure the
    page data is valid. ->map() is then reduced to just kmap(). That way we
    can also share most of the pipe buffer ops between pipe.c and splice.c.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Found by Oleg Nesterov, fixed by me.

    - Only allow full pages to go to the page cache.
    - Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
    - Remember to clear 'stolen' if add_to_page_cache() fails.

    And as a cleanup on that:

    - Make the bottom fall-through logic a little less convoluted. Also make
    the steal path hold an extra reference to the page, so we don't have
    to differentiate between stolen and non-stolen at the end.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

30 Apr, 2006

1 commit


27 Apr, 2006

2 commits


26 Apr, 2006

4 commits


20 Apr, 2006

1 commit


19 Apr, 2006

3 commits