26 Sep, 2014

2 commits

  • commit 2457aec63745e235bcafb7ef312b182d8682f0fc upstream.

    aops->write_begin may allocate a new page and make it visible only to have
    mark_page_accessed called almost immediately after. Once the page is
    visible, atomic operations become necessary, which is noticeable overhead
    when writing to an in-memory filesystem like tmpfs but should also be
    noticeable with fast storage. The objective of the patch is to initialise
    the accessed information with non-atomic operations before the page is
    visible.

    The bulk of filesystems directly or indirectly use
    grab_cache_page_write_begin or find_or_create_page for the initial
    allocation of a page cache page. This patch adds an init_page_accessed()
    helper which behaves like the first call to mark_page_accessed() but may
    be called before the page is visible and can be done non-atomically.
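
    A minimal sketch of what such a helper can look like (the non-atomic
    __SetPageReferenced() is only safe because nobody else can see the page
    yet):

        void init_page_accessed(struct page *page)
        {
                if (!PageReferenced(page))
                        __SetPageReferenced(page);
        }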

    The primary APIs of concern in this case are the following and are used
    by most filesystems.

    find_get_page
    find_lock_page
    find_or_create_page
    grab_cache_page_nowait
    grab_cache_page_write_begin

    All of them are very similar in detail, so the patch creates a core helper,
    pagecache_get_page(), which takes a flags parameter that affects its
    behavior, such as whether the page should be marked accessed or not. The
    old API is preserved but is basically a thin wrapper around this core
    function.
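
    For illustration, find_or_create_page() can then be reduced to a thin
    wrapper along these lines (flag names as introduced by the patch):

        static inline struct page *find_or_create_page(struct address_space *mapping,
                                                       pgoff_t offset, gfp_t gfp_mask)
        {
                return pagecache_get_page(mapping, offset,
                                          FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
                                          gfp_mask);
        }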

    Each of the filesystems is then updated to avoid calling
    mark_page_accessed when it is known that the VM interfaces have already
    done the job. There is a slight snag in that the timing of the
    mark_page_accessed() call has now changed, so in rare cases it's possible
    a page gets to the end of the LRU as PageReferenced whereas previously it
    might have been repromoted. This is expected to be rare but it's worth the
    filesystem people thinking about it in case they see a problem with the
    timing change. It is also the case that some filesystems may now be
    marking pages accessed that previously were not, but it makes sense for
    filesystems to have consistent behaviour in this regard.

    The test case used to evaluate this is a simple dd of a large file done
    multiple times with the file deleted on each iteration. The size of the
    file is 1/10th of physical memory to avoid dirty page balancing. In the
    async case it is possible that the workload completes without even
    hitting the disk and will have variable results, but it highlights the
    impact of mark_page_accessed for async IO. The sync results are expected
    to be more stable. The exception is tmpfs where the normal case is for
    the "IO" to not hit the disk.

    The test machine was single socket and UMA to avoid any scheduling or NUMA
    artifacts. Throughput and wall times are presented for sync IO; only wall
    times are shown for async as the granularity reported by dd and the
    variability are unsuitable for comparison. As async results were variable
    due to writeback timings, I'm only reporting the maximum figures. The sync
    results were stable enough to make the mean and stddev uninteresting.

    The performance results are reported based on a run with no profiling.
    Profile data is based on a separate run with oprofile running.

    async dd                      3.15.0-rc3           3.15.0-rc3
                                     vanilla          accessed-v2
    ext3  Max elapsed       13.9900 ( 0.00%)     11.5900 ( 17.16%)
    tmpfs Max elapsed        0.5100 ( 0.00%)      0.4900 (  3.92%)
    btrfs Max elapsed       12.8100 ( 0.00%)     12.7800 (  0.23%)
    ext4  Max elapsed       18.6000 ( 0.00%)     13.3400 ( 28.28%)
    xfs   Max elapsed       12.5600 ( 0.00%)      2.0900 ( 83.36%)

    The XFS figure is a bit strange as it managed to avoid a worst case by
    sheer luck but the average figures looked reasonable.

            samples  percentage  kernel                             function
    ext3      86107      0.9783  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    ext3      23833      0.2710  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    ext3       5036      0.0573  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    ext4      64566      0.8961  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    ext4       5322      0.0713  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    ext4       2869      0.0384  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    xfs       62126      1.7675  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    xfs        1904      0.0554  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    xfs         103      0.0030  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    btrfs     10655      0.1338  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    btrfs      2020      0.0273  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    btrfs       587      0.0079  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    tmpfs     59562      3.2628  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    tmpfs      1210      0.0696  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    tmpfs        94      0.0054  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed

    [akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
    Signed-off-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Theodore Ts'o
    Cc: "Paul E. McKenney"
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Tested-by: Prabhakar Lad
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Jiri Slaby

    Mel Gorman
     
  • commit 9e8c2af96e0d2d5fe298dd796fb6bc16e888a48d upstream.

    ... it does that itself (via kmap_atomic())

    Signed-off-by: Al Viro
    Signed-off-by: Mel Gorman
    Signed-off-by: Jiri Slaby

    Al Viro
     

18 Sep, 2013

2 commits

    An earlier patch introducing the FUSE_I_SIZE_UNSTABLE flag provided a
    detailed description of races between ftruncate and anyone who can extend
    i_size:

    > 1. As in the previous scenario fuse_dentry_revalidate() discovered that i_size
    > changed (due to our own fuse_do_setattr()) and is going to call
    > truncate_pagecache() for some 'new_size' it believes valid right now. But by
    > the time that particular truncate_pagecache() is called ...
    > 2. fuse_do_setattr() returns (either having called truncate_pagecache() or
    > not -- it doesn't matter).
    > 3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
    > 4. mmap-ed write makes a page in the extended region dirty.

    This patch adds the necessary bits to fuse_file_fallocate() to protect
    against that race.
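
    A sketch of those bits: the flag is set around any FUSE_FALLOCATE that
    can change i_size, so that racing attribute updates leave i_size and the
    page cache alone:

        struct fuse_inode *fi = get_fuse_inode(inode);

        if (!(mode & FALLOC_FL_KEEP_SIZE))
                set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);

        /* ... send FUSE_FALLOCATE and update the local i_size ... */

        if (!(mode & FALLOC_FL_KEEP_SIZE))
                clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);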

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Cc: stable@vger.kernel.org

    Maxim Patlasov
     
  • The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):

    1) A user makes a page dirty via an mmap-ed write.
    2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
    and covering the page.
    3) Before the truncate_pagecache_range call from fuse_file_fallocate,
    the page goes to write-back. The page is fully processed by fuse_writepage
    (including end_page_writeback on the page), but fuse_flush_writepages did
    nothing because fi->writectr < 0.
    4) truncate_pagecache_range is called and fuse_file_fallocate finishes
    by calling fuse_release_nowrite. The latter triggers processing of the
    queued write-back request, which will soon write stale data to the hole.

    Changed in v2 (thanks to Brian for suggestion):
    - Do not truncate the page cache until FUSE_FALLOCATE has succeeded.
    Otherwise, we could end up returning -ENOTSUPP while user data has already
    been punched from the page cache. Use filemap_write_and_wait_range()
    instead.
    Changed in v3 (thanks to Miklos for suggestion):
    - fuse_wait_on_writeback() is prone to livelocks; use fuse_set_nowrite()
    instead. Since we only need a dirty-page barrier, fuse_sync_writes()
    should be enough.
    - rebased to for-linus branch of fuse.git
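
    A sketch of the resulting ordering in fuse_file_fallocate() for the
    hole-punch case (combining the i_mutex protection added earlier with the
    new dirty-page barrier):

        if (mode & FALLOC_FL_PUNCH_HOLE) {
                loff_t endbyte = offset + length - 1;
                int err;

                /* flush dirty pages in the range ... */
                err = filemap_write_and_wait_range(inode->i_mapping,
                                                   offset, endbyte);
                if (err)
                        goto out;

                /* ... and act as a barrier for in-flight writeback */
                fuse_sync_writes(inode);
        }

        /* only after FUSE_FALLOCATE succeeded: */
        truncate_pagecache_range(inode, offset, offset + length - 1);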

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Cc: stable@vger.kernel.org

    Maxim Patlasov
     

03 Sep, 2013

2 commits

    The way fuse calls truncate_pagecache() from fuse_change_attributes()
    is completely wrong. Because, without i_mutex held, we can never be sure
    whether 'oldsize' and 'attr->size' are still valid by the time
    truncate_pagecache(inode, oldsize, attr->size) executes. In fact, as soon
    as we release fc->lock in the middle of fuse_change_attributes(), we
    completely lose control of the actions which may happen with the given
    inode until we reach truncate_pagecache. The list of potentially dangerous
    actions includes mmap-ed reads and writes, ftruncate(2) and write(2)
    extending the file size.

    The typical outcome of doing truncate_pagecache() with outdated arguments
    is data corruption from the user's point of view. This is (in some sense)
    acceptable in cases when the issue is triggered by a change of the file on
    the server (i.e. externally wrt the fuse operation), but it is absolutely
    intolerable in scenarios when a single fuse client modifies a file without
    any external intervention. A real-life case I discovered with fsx-linux
    looked like this:

    1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends
    FUSE_SETATTR to the server synchronously, but before getting fc->lock ...
    2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP
    to the server synchronously, then calls fuse_change_attributes(). The
    latter updates i_size, releases fc->lock, but before comparing oldsize vs
    attr->size..
    3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and
    updating attributes and i_size, but now oldsize is equal to
    outarg.attr.size because i_size has just been updated (step 2). Hence,
    fuse_do_setattr() returns w/o calling truncate_pagecache().
    4. As soon as ftruncate(2) completes, the user extends the file size by
    write(2), making a hole in the middle of the file, then reads data from
    the hole either by read(2) or an mmap-ed read. The user expects to get
    zero data from the hole, but gets stale data because truncate_pagecache()
    has not been executed yet.

    The scenario above illustrates one side of the problem: not truncating
    the page cache even though we should. The other side corresponds to
    truncating the page cache too late, when the state of the inode has
    changed significantly. Theoretically, the following is possible:

    1. As in the previous scenario fuse_dentry_revalidate() discovered that
    i_size changed (due to our own fuse_do_setattr()) and is going to call
    truncate_pagecache() for some 'new_size' it believes valid right now. But
    by the time that particular truncate_pagecache() is called ...
    2. fuse_do_setattr() returns (either having called truncate_pagecache() or
    not -- it doesn't matter).
    3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
    4. mmap-ed write makes a page in the extended region dirty.

    The result will be the loss of the data the user wrote in the fourth step.

    The patch is a hotfix resolving the issue in a simplistic way: let's skip
    the dangerous i_size update and truncate_pagecache if an operation changing
    the file size is in progress. This simplistic approach looks correct for
    the cases without external changes; handling those properly would require
    more sophisticated and intrusive techniques (e.g. an NFS-like one). I'd
    like to postpone that until the issue is well discussed on the mailing
    list(s).
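
    A sketch of the hotfix: fuse_do_setattr() marks the inode while a
    truncate is in flight, and fuse_change_attributes() backs off when it
    sees the mark:

        /* in fuse_do_setattr(), around the synchronous FUSE_SETATTR */
        if (is_truncate)
                set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
        /* ... send the request and update attributes ... */
        if (is_truncate)
                clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);

        /* in fuse_change_attributes(): skip the dangerous update */
        if (!test_bit(FUSE_I_SIZE_UNSTABLE, &fi->state)) {
                i_size_write(inode, attr->size);
                /* ... and only then truncate_pagecache() if needed ... */
        }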

    Changed in v2:
    - improved patch description to cover both sides of the issue.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Cc: stable@vger.kernel.org

    Maxim Patlasov
     
  • The patch fixes a race between ftruncate(2), mmap-ed write and write(2):

    1) A user makes a page dirty via an mmap-ed write.
    2) The user performs shrinking truncate(2) intended to purge the page.
    3) Before fuse_do_setattr calls truncate_pagecache, the page goes to
    writeback. fuse_writepage_locked fills FUSE_WRITE request and releases
    the original page by end_page_writeback.
    4) fuse_do_setattr() completes and successfully returns. Since now, i_mutex
    is free.
    5) An ordinary write(2) extends i_size back to cover the page. Note that
    fuse_send_write_pages does wait for fuse writeback, but for another
    page->index.
    6) fuse_writepage_locked proceeds by queueing FUSE_WRITE request.
    fuse_send_writepage is supposed to crop inarg->size of the request,
    but it doesn't because i_size has already been extended back.

    Moving end_page_writeback to the end of fuse_writepage_locked fixes the
    race because now the fact that truncate_pagecache has returned successfully
    implies that fuse_writepage_locked has already called end_page_writeback.
    And this, in turn, implies that fuse_flush_writepages has already called
    fuse_send_writepage, and the latter used the valid (shrunk) i_size.
    write(2) could not extend it because of the i_mutex held by ftruncate(2).
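
    A sketch of the resulting tail of fuse_writepage_locked() (the request is
    fully queued before the page is allowed to leave writeback):

        copy_highpage(tmp_page, page);
        /* ... set up the FUSE_WRITE request against tmp_page ... */

        spin_lock(&fc->lock);
        list_add(&req->writepages_entry, &fi->writepages);
        list_add_tail(&req->list, &fi->queued_writes);
        fuse_flush_writepages(inode);
        spin_unlock(&fc->lock);

        end_page_writeback(page);       /* moved to the very end */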

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Cc: stable@vger.kernel.org

    Maxim Patlasov
     

04 Jul, 2013

1 commit

  • Pull second set of VFS changes from Al Viro:
    "Assorted f_pos race fixes, making do_splice_direct() safe to call with
    i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
    ->d_hash/->d_compare calling conventions changes from Linus, misc
    stuff all over the place."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    Document ->tmpfile()
    ext4: ->tmpfile() support
    vfs: export lseek_execute() to modules
    lseek_execute() doesn't need an inode passed to it
    block_dev: switch to fixed_size_llseek()
    cpqphp_sysfs: switch to fixed_size_llseek()
    tile-srom: switch to fixed_size_llseek()
    proc_powerpc: switch to fixed_size_llseek()
    ubi/cdev: switch to fixed_size_llseek()
    pci/proc: switch to fixed_size_llseek()
    isapnp: switch to fixed_size_llseek()
    lpfc: switch to fixed_size_llseek()
    locks: give the blocked_hash its own spinlock
    locks: add a new "lm_owner_key" lock operation
    locks: turn the blocked_list into a hashtable
    locks: convert fl_link to a hlist_node
    locks: avoid taking global lock if possible when waking up blocked waiters
    locks: protect most of the file_lock handling with i_lock
    locks: encapsulate the fl_link list handling
    locks: make "added" in __posix_lock_file a bool
    ...

    Linus Torvalds
     

29 Jun, 2013

1 commit


18 Jun, 2013

1 commit

    Changing the size of a file on the server and updating it locally
    (fuse_write_update_size) should always be protected by inode->i_mutex.
    Otherwise a race like this is possible:

    1. Process 'A' calls fallocate(2) to extend file (~FALLOC_FL_KEEP_SIZE).
    fuse_file_fallocate() sends FUSE_FALLOCATE request to the server.
    2. Process 'B' calls ftruncate(2) shrinking the file. fuse_do_setattr()
    sends shrinking FUSE_SETATTR request to the server and updates local i_size
    by i_size_write(inode, outarg.attr.size).
    3. Process 'A' resumes execution of fuse_file_fallocate() and calls
    fuse_write_update_size(inode, offset + length). But 'offset + length' was
    obsoleted by the ftruncate from the previous step.
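
    A sketch of the fix in fuse_file_fallocate() (fuse_set_nowrite() nests
    under i_mutex, as the v2 note below emphasizes):

        bool lock_inode = !(mode & FALLOC_FL_KEEP_SIZE);

        if (lock_inode) {
                mutex_lock(&inode->i_mutex);
                fuse_set_nowrite(inode);
        }

        /* ... send FUSE_FALLOCATE to the server; on success: ... */
        if (!err && !(mode & FALLOC_FL_KEEP_SIZE))
                fuse_write_update_size(inode, offset + length);

        if (lock_inode) {
                fuse_release_nowrite(inode);
                mutex_unlock(&inode->i_mutex);
        }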

    Changed in v2 (thanks Brian and Anand for suggestions):
    - made the relation between mutex_lock() and fuse_set_nowrite(inode) more
    explicit and clear.
    - updated patch description to use ftruncate(2) in example

    Signed-off-by: Maxim V. Patlasov
    Reviewed-by: Brian Foster
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     

03 Jun, 2013

2 commits

    The bug was introduced with the async_dio feature: trying to optimize short
    reads, we cut the number of bytes to read at the i_size boundary. Hence the
    following example:

    truncate --size=300 /mnt/file
    dd if=/mnt/file of=/dev/null iflag=direct

    led to a FUSE_READ request of 300 bytes. This turned out to be a problem
    for userspace fuse implementations that rely on the assumption that kernel
    fuse does not change the alignment of requests from the client FS.

    The patch turns off the optimization if async_dio is disabled. And, if it's
    enabled, the patch fixes adjustment of number-of-bytes-to-read to preserve
    alignment.

    Note that we cannot throw out the short read optimization entirely because
    otherwise a direct read of a huge size issued on a tiny file would generate
    a huge amount of fuse requests and most of them would be ACKed by userspace
    with zero bytes read.
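
    A sketch of the fixed adjustment (rounding the clamped size up to the
    maximum request payload keeps the alignment that client filesystems
    expect):

        static inline loff_t fuse_round_up(loff_t off)
        {
                return round_up(off, FUSE_MAX_PAGES_PER_REQ << PAGE_SHIFT);
        }

        /* in fuse_direct_IO(), only when async_dio is enabled */
        if (async_dio && rw == READ && offset + count > i_size) {
                if (offset >= i_size)
                        return 0;       /* nothing to read beyond EOF */
                count = min_t(loff_t, count, fuse_round_up(i_size - offset));
        }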

    Signed-off-by: Maxim Patlasov
    Reviewed-by: Brian Foster
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    If request submission fails for an async request (i.e.,
    get_user_pages() returns -ERESTARTSYS), we currently skip the
    -EIOCBQUEUED return and drop into wait_on_sync_kiocb() forever.

    Avoid this by always returning -EIOCBQUEUED for async requests. If
    an error occurs, the error is passed into fuse_aio_complete(),
    returned via aio_complete() and thus propagated to userspace via
    io_getevents().
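
    A sketch of the fixed return path in fuse_direct_IO() (the old code only
    returned -EIOCBQUEUED when ret > 0):

        if (io->async) {
                /* hand any submission error to the AIO completion path */
                fuse_aio_complete(io, ret < 0 ? ret : 0, -1);

                /* async request: always report completion via aio_complete() */
                if (!is_sync_kiocb(iocb))
                        return -EIOCBQUEUED;

                ret = wait_on_sync_kiocb(iocb);
        }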

    Signed-off-by: Brian Foster
    Reviewed-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Brian Foster
     

20 May, 2013

2 commits

    A fallocate request without FALLOC_FL_KEEP_SIZE set can extend the
    size of a file. Update the inode size after a successful fallocate.

    Also invalidate the inode attributes after a successful fallocate
    to ensure we pick up the latest attribute values (i.e., i_blocks).
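
    A sketch of the post-request handling in fuse_file_fallocate():

        /* after a successful FUSE_FALLOCATE request */
        if (!(mode & FALLOC_FL_KEEP_SIZE))
                fuse_write_update_size(inode, offset + length);

        fuse_invalidate_attr(inode);    /* refetch i_blocks etc. on next use */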

    Signed-off-by: Brian Foster
    Signed-off-by: Miklos Szeredi

    Brian Foster
     
  • fuse supports hole punch via the fallocate() FALLOC_FL_PUNCH_HOLE
    interface. When a hole punch is passed through, the page cache
    is not cleared and thus allows reading stale data from the cache.

    This is easily demonstrable (using FOPEN_KEEP_CACHE) by reading a
    smallish random data file into cache, punching a hole and creating
    a copy of the file. Drop caches or remount and observe that the
    original file no longer matches the file copied after the hole
    punch. The original file contains a zeroed range and the latter
    file contains stale data.

    Protect against writepage requests in progress and punch out the
    associated page cache range after a successful client fs hole
    punch.
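
    A sketch of the protection and the cache punch (helper names as used
    elsewhere in fuse; the exact hunks may differ):

        if (mode & FALLOC_FL_PUNCH_HOLE)
                fuse_set_nowrite(inode);        /* block writepage requests */

        /* ... send FUSE_FALLOCATE; only if the client fs succeeded: ... */
        if (!err && (mode & FALLOC_FL_PUNCH_HOLE))
                truncate_pagecache_range(inode, offset, offset + length - 1);

        if (mode & FALLOC_FL_PUNCH_HOLE)
                fuse_release_nowrite(inode);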

    Signed-off-by: Brian Foster
    Signed-off-by: Miklos Szeredi

    Brian Foster
     

15 May, 2013

1 commit

    Commit 8b41e671 introduced explicit background checking for fuse_req
    structures, with BUG_ON() checks for the appropriate type of request in
    the associated send functions. Commit bcba24cc introduced the ability
    to send dio requests as background requests but did not update the
    request allocation based on the type of I/O request. As a result, a
    BUG_ON() triggers in the fuse_request_send_background() path if
    an async I/O is sent.

    Allocate a request based on the async state of the fuse_io_priv to avoid
    the BUG.
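
    A sketch of the fix in fuse_direct_io() (npages stands for the page count
    the caller computed for this request):

        struct fuse_req *req;

        if (io->async)
                req = fuse_get_req_for_background(fc, npages);
        else
                req = fuse_get_req(fc, npages);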

    Signed-off-by: Brian Foster
    Signed-off-by: Miklos Szeredi

    Brian Foster
     

08 May, 2013

3 commits

  • Merge more incoming from Andrew Morton:

    - Various fixes which were stalled or which I picked up recently

    - A large rotorooting of the AIO code. Allegedly to improve
    performance but I don't really have good performance numbers (I might
    have lost the email) and I can't raise Kent today. I held this out
    of 3.9 and we could give it another cycle if it's all too late/scary.

    I ended up taking only the first two thirds of the AIO rotorooting. I
    left the percpu parts and the batch completion for later. - Linus

    * emailed patches from Andrew Morton: (33 commits)
    aio: don't include aio.h in sched.h
    aio: kill ki_retry
    aio: kill ki_key
    aio: give shared kioctx fields their own cachelines
    aio: kill struct aio_ring_info
    aio: kill batch allocation
    aio: change reqs_active to include unreaped completions
    aio: use cancellation list lazily
    aio: use flush_dcache_page()
    aio: make aio_read_evt() more efficient, convert to hrtimers
    wait: add wait_event_hrtimeout()
    aio: refcounting cleanup
    aio: make aio_put_req() lockless
    aio: do fget() after aio_get_req()
    aio: dprintk() -> pr_debug()
    aio: move private stuff out of aio.h
    aio: add kiocb_cancel()
    aio: kill return value of aio_complete()
    char: add aio_{read,write} to /dev/{null,zero}
    aio: remove retry-based AIO
    ...

    Linus Torvalds
     
  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Pull fuse updates from Miklos Szeredi:
    "This contains two patchsets from Maxim Patlasov.

    The first reworks the request throttling so that only async requests
    are throttled. Wakeup of waiting async requests is also optimized.

    The second series adds support for async processing of direct IO which
    optimizes direct IO and enables the use of the AIO userspace
    interface."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: add flag to turn on async direct IO
    fuse: truncate file if async dio failed
    fuse: optimize short direct reads
    fuse: enable asynchronous processing direct IO
    fuse: make fuse_direct_io() aware about AIO
    fuse: add support of async IO
    fuse: move fuse_release_user_pages() up
    fuse: optimize wake_up
    fuse: implement exclusive wakeup for blocked_waitq
    fuse: skip blocking on allocations of synchronous requests
    fuse: add flag fc->initialized
    fuse: make request allocations for background processing explicit

    Linus Torvalds
     

01 May, 2013

1 commit


18 Apr, 2013

6 commits

    The patch improves error handling in fuse_direct_IO(): if we successfully
    submitted several fuse requests on behalf of a synchronous direct write
    extending the file and some of them failed, let's try to do our best to
    clean up.

    Changed in v2: reuse fuse_do_setattr(). Thanks to Brian for the suggestion.
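
    A sketch of the clean-up helper built on fuse_do_setattr() (shrinking the
    file back to the size actually reached):

        static void fuse_do_truncate(struct file *file)
        {
                struct inode *inode = file->f_mapping->host;
                struct iattr attr;

                attr.ia_valid = ATTR_SIZE | ATTR_FILE;
                attr.ia_size = i_size_read(inode);
                attr.ia_file = file;

                fuse_do_setattr(inode, &attr, file);
        }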

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    If a user requested a direct read beyond EOF, we can skip sending fuse
    requests for positions beyond EOF because userspace would ACK them with
    zero bytes read anyway. We can trust i_size in fuse_direct_IO for such
    cases because it's called from fuse_file_aio_read() and the latter updates
    the fuse attributes, including i_size.
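
    Roughly, the optimization amounts to the following in fuse_direct_IO()
    (a sketch, not the exact hunks):

        loff_t i_size = i_size_read(inode);

        /* a read entirely beyond EOF needs no requests at all */
        if (rw == READ && offset >= i_size)
                return 0;

        /* clamp so no requests are issued for the region past EOF */
        if (rw == READ && offset + count > i_size)
                count = i_size - offset;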

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • In case of synchronous DIO request (i.e. read(2) or write(2) for a file
    opened with O_DIRECT), the patch submits fuse requests asynchronously, but
    waits for their completions before return from fuse_direct_IO().

    In case of an asynchronous DIO request (i.e. libaio io_submit() on a file
    opened with O_DIRECT), the patch submits fuse requests asynchronously and
    returns -EIOCBQUEUED immediately.

    The only special case is an async DIO extending the file. Here the patch
    falls back to the old behaviour because we can't return -EIOCBQUEUED and
    update i_size later without holding i_mutex. And we have no method to wait
    on real async I/O requests.

    The patch also cleans up __fuse_direct_write(): it's better to update
    i_size in its callers. Thanks Brian for the suggestion.
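
    A sketch of the submission decision in fuse_direct_IO(), including the
    extending-write special case described above:

        io->async = ff->fc->async_dio;
        io->iocb = iocb;

        /*
         * We cannot return -EIOCBQUEUED and update i_size later without
         * holding i_mutex, so an async write extending the file is
         * submitted synchronously instead.
         */
        if (io->async && !is_sync_kiocb(iocb) &&
            rw == WRITE && offset + count > i_size)
                io->async = false;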

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    The patch implements passing "struct fuse_io_priv *io" down the stack to
    fuse_send_read/write, where it is used to submit requests asynchronously.
    io->async == 0 designates synchronous processing.

    The non-trivial part of the patch is the changes in fuse_direct_io():
    resources like fuse requests and user pages cannot be released immediately
    in the async case.
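
    A sketch of the dispatch point, using fuse_send_read() as the example
    (fuse_send_write() is symmetrical):

        static size_t fuse_send_read(struct fuse_req *req, struct fuse_io_priv *io,
                                     loff_t pos, size_t count, fl_owner_t owner)
        {
                struct file *file = io->file;
                struct fuse_conn *fc = get_fuse_conn(file->f_mapping->host);

                fuse_read_fill(req, file, pos, count, FUSE_READ);
                /* (lock-owner handling elided) */

                if (io->async)
                        return fuse_async_req_send(fc, req, count, io);

                fuse_request_send(fc, req);
                return req->out.args[0].size;
        }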

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    The patch implements a framework to process an IO request asynchronously.
    The idea is to associate several fuse requests with a single kiocb by means
    of the fuse_io_priv structure. The structure plays the same role for FUSE
    as 'struct dio' does for direct-io.c.

    The framework is supposed to be used like this:
    - someone (who wants to process an IO asynchronously) allocates a
    fuse_io_priv and initializes it, setting the 'async' field to a non-zero
    value.
    - as soon as a fuse request is filled, it can be submitted (in a
    non-blocking way) by fuse_async_req_send()
    - when all submitted requests are ACKed by userspace, io->reqs drops to
    zero, triggering aio_complete()

    In case of IO initiated by libaio, aio_complete() will finish processing
    the same way as in the case of dio_complete() calling aio_complete(). But
    the framework may also be used for internal FUSE purposes when the initial
    IO request was synchronous (from the user's perspective) but it's
    beneficial to process it asynchronously. Then the caller should wait on
    the kiocb explicitly and aio_complete() will wake the caller up.
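
    A sketch of the structure (field layout approximated from the description
    above):

        struct fuse_io_priv {
                int             async;  /* non-zero: process asynchronously */
                spinlock_t      lock;
                unsigned        reqs;   /* fuse requests still in flight */
                ssize_t         bytes;  /* bytes transferred so far */
                size_t          size;   /* total size of the IO */
                __u64           offset;
                bool            write;
                int             err;    /* first error, if any */
                struct kiocb    *iocb;  /* completed when reqs drops to 0 */
                struct file     *file;
        };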

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • fuse_release_user_pages() will be indirectly used by fuse_send_read/write
    in future patches.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     

17 Apr, 2013

1 commit

    There are two types of request processing in FUSE: synchronous (via
    fuse_request_send()) and asynchronous (via adding to fc->bg_queue).

    Fortunately, the type of processing is always known in advance, at the
    time of request allocation. This preparatory patch utilizes that fact,
    making fuse_get_req() aware of the type. The next patches will use it.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     

10 Apr, 2013

1 commit


28 Feb, 2013

1 commit


04 Feb, 2013

1 commit


01 Feb, 2013

1 commit

    Commit c69e8d9c0 added an RCU lock to fuse/dir.c. It assumed that 'task'
    could be some other process, but in fact this parameter always equals
    'current'. Inline this parameter to make the code more readable, and
    remove the RCU lock as it is not needed when accessing current process
    credentials.
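
    A sketch of the result (a best-effort reconstruction; field names from
    the era's struct fuse_conn):

        int fuse_allow_current_process(struct fuse_conn *fc)
        {
                const struct cred *cred;

                if (fc->flags & FUSE_ALLOW_OTHER)
                        return 1;

                /* current_cred() needs no RCU protection */
                cred = current_cred();
                if (uid_eq(cred->euid, fc->user_id) &&
                    uid_eq(cred->suid, fc->user_id) &&
                    uid_eq(cred->uid,  fc->user_id) &&
                    gid_eq(cred->egid, fc->group_id) &&
                    gid_eq(cred->sgid, fc->group_id) &&
                    gid_eq(cred->gid,  fc->group_id))
                        return 1;

                return 0;
        }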

    Signed-off-by: Anatol Pomozov
    Signed-off-by: Miklos Szeredi

    Anatol Pomozov
     

24 Jan, 2013

11 commits

  • Fix the following sparse warnings:

    fs/fuse/file.c:1216:43: warning: cast removes address space of expression
    fs/fuse/file.c:1216:43: warning: incorrect type in initializer (different address spaces)
    fs/fuse/file.c:1216:43: expected void [noderef] *iov_base
    fs/fuse/file.c:1216:43: got void *
    fs/fuse/file.c:1241:43: warning: cast removes address space of expression
    fs/fuse/file.c:1241:43: warning: incorrect type in initializer (different address spaces)
    fs/fuse/file.c:1241:43: expected void [noderef] *iov_base
    fs/fuse/file.c:1241:43: got void *
    fs/fuse/file.c:1267:43: warning: cast removes address space of expression
    fs/fuse/file.c:1267:43: warning: incorrect type in initializer (different address spaces)
    fs/fuse/file.c:1267:43: expected void [noderef] *iov_base
    fs/fuse/file.c:1267:43: got void *
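
    The usual fix for this class of warning (an illustrative sketch, not the
    exact hunks) is to keep the __user annotation through the cast instead of
    dropping it:

        /* before: sparse warns that the cast removes the address space */
        iov->iov_base = (void *)arg;

        /* after: the user-space annotation is preserved */
        iov->iov_base = (void __user *)(unsigned long)arg;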

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
    __fuse_direct_io() allocates fuse requests by calling fuse_get_req(fc, n).
    The patch calculates 'n' based on the iov[] array. This is useful because
    allocating FUSE_MAX_PAGES_PER_REQ page pointers and descriptors for each
    fuse request would be a waste of memory for iov-s of smaller size.
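
    A hypothetical helper showing the idea (the name and exact arithmetic are
    illustrative, not the upstream hunks): count the pages iov[] can touch
    and cap the result at FUSE_MAX_PAGES_PER_REQ:

        static unsigned fuse_iov_npages(const struct iovec *iov,
                                        unsigned long nr_segs)
        {
                unsigned npages = 0;
                unsigned long i;

                for (i = 0; i < nr_segs; i++) {
                        unsigned long base = (unsigned long)iov[i].iov_base;
                        unsigned long len = iov[i].iov_len;

                        if (!len)
                                continue;
                        /* partial first/last pages still need a page slot */
                        npages += (base + len + PAGE_SIZE - 1) / PAGE_SIZE -
                                  base / PAGE_SIZE;
                }
                return min(npages, (unsigned)FUSE_MAX_PAGES_PER_REQ);
        }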

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    Let fuse_get_user_pages() pack as many iov-s into a single fuse_req as
    possible. This is very beneficial in the case of an iov[] consisting of
    many iov-s of relatively small sizes (e.g. PAGE_SIZE).

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    The patch does preliminary work for the next patch optimizing
    scatter-gather direct IO. The idea is to allow fuse_get_user_pages() to
    pack as many iov-s into each fuse request as possible. So, here we only
    rework all related call-paths to carry iov[] from fuse_direct_IO() to
    fuse_get_user_pages().

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    Previously, anyone who set the 'argpages' flag only filled req->pages[]
    and set the per-request page_offset. This patch reworks all cases where
    argpages=1 to fill req->page_descs[] properly.

    Having req->page_descs[] filled properly makes it possible to rework
    fuse_copy_pages() to copy the page fragments described by
    req->page_descs[]. This will be useful for the next patches optimizing
    direct IO.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    The ability to save page pointers along with lengths and offsets in
    fuse_req will be useful to cover several iovec-s with a single fuse_req.

    The per-request page_offset is removed because anyone who needs it can use
    req->page_descs[0].offset instead.
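
    Roughly, the request-side data structures become:

        struct fuse_page_desc {
                unsigned int length;
                unsigned int offset;
        };

        struct fuse_req {
                /* ... */
                struct page **pages;                    /* pages involved */
                struct fuse_page_desc *page_descs;      /* one per page */
                /* ... */
        };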

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    fuse_do_ioctl() already calculates the number of pages it's going to use.
    It is stored in the 'num_pages' variable. So the patch simply uses it for
    allocating the fuse_req.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    The patch allocates as many page pointers in fuse_req as needed to cover
    the interval [pos .. pos+len-1]. The inline helper fuse_wr_pages() is
    introduced to hide the cumbersome arithmetic.
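
    A sketch of the helper (last page index minus first page index plus one,
    capped at the per-request maximum):

        static inline unsigned fuse_wr_pages(loff_t pos, size_t len)
        {
                return min_t(unsigned, (pos + len - 1) / PAGE_SIZE -
                                        pos / PAGE_SIZE + 1,
                             FUSE_MAX_PAGES_PER_REQ);
        }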

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    The patch uses the 'nr_pages' argument of fuse_readpages() as a heuristic
    for the number of page pointers to allocate.

    This could be improved further by taking into consideration fc->max_read
    and the gaps between page indices, but it's not clear whether that would
    be worthwhile.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    The patch categorizes all fuse_get_req() invocations into two categories:
    - fuse_get_req_nopages(fc) - when the caller doesn't care about req->pages
    - fuse_get_req(fc, n) - when the caller needs n page pointers (n > 0)

    Adding fuse_get_req_nopages() helps to avoid the numerous
    fuse_get_req(fc, 0) calls scattered over the code. Now it's clear at first
    glance when a caller needs a fuse_req with page pointers.

    The patch doesn't make any logic changes. In the multi-page case, it
    simply allocates an array of FUSE_MAX_PAGES_PER_REQ page pointers. This
    will be amended by future patches.
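
    Presumably the no-pages variant is just the n == 0 case, along the lines
    of:

        static inline struct fuse_req *fuse_get_req_nopages(struct fuse_conn *fc)
        {
                return fuse_get_req(fc, 0);
        }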

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
    The patch removes the inline array of FUSE_MAX_PAGES_PER_REQ page pointers
    from fuse_req. Instead, req->pages may now point either to a small inline
    array or to a dynamically allocated array.

    This essentially means that all callers of fuse_request_alloc[_nofs]
    should pass the number of pages needed explicitly.

    The patch doesn't make any logic changes.
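
    A sketch of the resulting allocation path (FUSE_REQ_INLINE_PAGES and the
    helper names follow the description above; error handling elided):

        struct fuse_req *fuse_request_alloc(unsigned npages)
        {
                struct fuse_req *req = kmem_cache_alloc(fuse_req_cachep,
                                                        GFP_KERNEL);
                if (req) {
                        struct page **pages;
                        struct fuse_page_desc *page_descs;

                        if (npages <= FUSE_REQ_INLINE_PAGES) {
                                /* small requests use the inline arrays */
                                pages = req->inline_pages;
                                page_descs = req->inline_page_descs;
                        } else {
                                pages = kmalloc(npages * sizeof(struct page *),
                                                GFP_KERNEL);
                                page_descs = kmalloc(npages *
                                                sizeof(struct fuse_page_desc),
                                                GFP_KERNEL);
                                /* (allocation-failure handling elided) */
                        }
                        fuse_request_init(req, pages, page_descs, npages);
                }
                return req;
        }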

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov