08 Apr, 2014

1 commit

  • Commit f9acc8c7b35a ("readahead: sanify file_ra_state names") left
    ra_submit with a single function call.

    Move ra_submit to internal.h and inline it to save some stack. Thanks
    to Andrew Morton for his comments on the different versions.
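
    A minimal sketch of the resulting inline helper, assuming the
    __do_page_cache_readahead() signature of that era:

        /* mm/internal.h (sketch) */
        static inline unsigned long ra_submit(struct file_ra_state *ra,
                struct address_space *mapping, struct file *filp)
        {
                return __do_page_cache_readahead(mapping, filp,
                                ra->start, ra->size, ra->async_size);
        }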

    Signed-off-by: Fabian Frederick
    Suggested-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

04 Apr, 2014

3 commits

  • Currently max_sane_readahead() returns zero on a CPU whose NUMA node
    has no local memory, which leads to readahead failure. Fix this
    readahead failure by returning the minimum of (requested pages, 512).
    Users running applications that need readahead, such as streaming
    applications, on a memoryless CPU see a considerable boost in
    performance.

    Result:

    fadvise experiment with FADV_WILLNEED on a PPC machine having memoryless
    CPU with 1GB testfile (12 iterations) yielded around 46.66% improvement.

    fadvise experiment with FADV_WILLNEED on an x240 machine (32GB* 4G RAM
    NUMA machine) with a 1GB testfile (12 iterations) showed no impact on
    the normal NUMA cases w/ the patch.

    Kernel    Avg     Stddev
    base      7.4975  3.92%
    patched   7.4174  3.26%
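
    A sketch of the fix (constant name from the patch; the exact expression
    is assumed), with the limit expressed independently of PAGE_CACHE_SIZE
    as per the note below:

        /* 512 4KB-sized pages worth, whatever the PAGE_CACHE_SIZE */
        #define MAX_READAHEAD   ((512 * 4096) / PAGE_CACHE_SIZE)

        /*
         * Given a desired number of PAGE_CACHE_SIZE readahead pages,
         * return a sensible upper limit -- never zero just because the
         * local node has no memory.
         */
        unsigned long max_sane_readahead(unsigned long nr)
        {
                return min(nr, MAX_READAHEAD);
        }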

    [Andrew: making return value PAGE_SIZE independent]
    Suggested-by: Linus Torvalds
    Signed-off-by: Raghavendra K T
    Acked-by: Jan Kara
    Cc: Wu Fengguang
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Raghavendra K T
     
  • shmem mappings already contain exceptional entries where swap slot
    information is remembered.

    To be able to store eviction information for regular page cache, prepare
    every site dealing with the radix trees directly to handle entries other
    than pages.

    The common lookup functions will filter out non-page entries and return
    NULL for page cache holes, just as before. But provide a raw version of
    the API which returns non-page entries as well, and switch shmem over to
    use it.
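
    A sketch of the filtering scheme (the find_get_entry()/find_get_page()
    split follows the description above; treat the details as
    illustrative):

        /* Common lookup: filter out non-page entries, NULL for holes */
        struct page *find_get_page(struct address_space *mapping,
                                   pgoff_t offset)
        {
                /* raw lookup: may return a swap/shadow entry */
                struct page *page = find_get_entry(mapping, offset);

                if (radix_tree_exceptional_entry(page))
                        page = NULL;    /* hide exceptional entries */
                return page;
        }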

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The radix tree hole searching code is only used for page cache, for
    example by the readahead code trying to get a picture of the area
    surrounding a fault.

    It sufficed to rely on the radix tree definition of holes, which is
    "empty tree slot". But this is about to change: shadow page
    descriptors will be stored in the page cache after the actual pages
    get evicted from memory.

    Move the functions over to mm/filemap.c and make them native page cache
    operations, where they can later be adapted to handle the new definition
    of "page cache hole".

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

30 Jan, 2014

1 commit

  • Commit 63d0f0a3c7e1 ("mm/readahead.c:do_readhead(): don't check for
    ->readpage") unintentionally made do_readahead return 0 for all valid
    files regardless of whether readahead was supported, rather than the
    expected -EINVAL. This gets forwarded on to userspace, and results in
    sys_readahead appearing to succeed in cases that don't make sense (e.g.
    when called on pipes or sockets). This issue is detected by the LTP
    readahead01 testcase.

    As the exact return value of force_page_cache_readahead is currently
    never used, we can simplify it to return only 0 or -EINVAL (when
    readpage or readpages is missing). With that in place we can simply
    forward on the return value of force_page_cache_readahead in
    do_readahead.

    This patch performs said change, restoring the expected semantics.
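
    A sketch of the simplified convention (a_ops checks per the
    description; surrounding details elided):

        int force_page_cache_readahead(struct address_space *mapping,
                                       struct file *filp, pgoff_t offset,
                                       unsigned long nr_to_read)
        {
                if (unlikely(!mapping->a_ops->readpage &&
                             !mapping->a_ops->readpages))
                        return -EINVAL;

                /* ... chunked __do_page_cache_readahead() as before ... */
                return 0;
        }

    do_readahead() can then simply forward this value to sys_readahead().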

    Signed-off-by: Mark Rutland
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Rutland
     

13 Nov, 2013

2 commits

  • The kernel's readahead algorithm sometimes interprets random read
    accesses as sequential and triggers unnecessary data prefetching from
    the storage device (impacting random read average latency).

    In order to identify sequential cache read misses, the readahead
    algorithm intends to check whether offset - previous offset == 1
    (trivial sequential reads) or offset - previous offset == 0 (sequential
    reads not aligned on page boundary):

        if (offset - (ra->prev_pos >> PAGE_CACHE_SHIFT) <= 1UL)

    When the previous offset is greater than the current offset (which
    happens on random patterns), the if condition is true and the access is
    wrongly interpreted as sequential. Unnecessary data prefetching is
    triggered, impacting the average random read latency.

    Storing the previous offset value in a "pgoff_t" variable (unsigned
    long) fixes the sequential read detection logic.
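
    The fix, sketched from the description (the pgoff_t truncation makes
    the subtraction unsigned, so a larger previous offset can no longer
    compare as sequential):

        /* ondemand_readahead() sketch: compare page indexes as pgoff_t */
        pgoff_t prev_offset = (unsigned long)(ra->prev_pos >>
                                              PAGE_CACHE_SHIFT);

        if (offset - prev_offset <= 1UL)
                goto initial_readahead;         /* genuinely sequential */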

    Signed-off-by: Damien Ramonda
    Reviewed-by: Fengguang Wu
    Acked-by: Pierre Tardy
    Acked-by: David Cohen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Damien Ramonda
     
  • The callee force_page_cache_readahead() already does this and unlike
    do_readahead(), force_page_cache_readahead() remembers to check for
    ->readpages() as well.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

12 Sep, 2013

1 commit

  • This helps performance on moderately dense random reads on SSD.

    Transaction-Per-Second numbers provided by Taobao:

                QPS   case
                -------------------------------------------------------
                7536  disable context readahead totally
    w/ patch:   7129  slower size rampup and start RA on the 3rd read
                6717  slower size rampup
    w/o patch:  5581  unmodified context readahead

    Before, readahead would be started whenever page N+1 was read and page
    N happened to have been read recently. After the patch, readahead only
    starts when *three* random reads happen to access pages N, N+1, N+2.
    The probability of this happening is extremely low for pure random
    reads, unless they are very dense, which actually deserves some
    readahead.

    Also start with a smaller readahead window. The impact on interleaved
    sequential reads should be small, because for a long-running stream the
    small readahead window rampup phase is negligible.

    The context readahead actually benefits clustered random reads on HDD
    whose seek cost is pretty high. However as SSD is increasingly used for
    random read workloads it's better for the context readahead to concentrate
    on interleaved sequential reads.
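
    A hypothetical sketch of the third-read rule (helper name invented for
    illustration): context readahead only fires once the two preceding
    pages are already cached, i.e. the current read is the third of a
    dense cluster.

        static bool third_dense_read(struct address_space *mapping,
                                     pgoff_t offset)
        {
                bool dense;

                rcu_read_lock();
                dense = offset >= 2 &&
                        radix_tree_lookup(&mapping->page_tree, offset - 1) &&
                        radix_tree_lookup(&mapping->page_tree, offset - 2);
                rcu_read_unlock();

                return dense;
        }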

    Another SSD rand read test from Miao

    # file size: 2GB
    # read IO amount: 625MB
    sysbench --test=fileio \
    --max-requests=10000 \
    --num-threads=1 \
    --file-num=1 \
    --file-block-size=64K \
    --file-test-mode=rndrd \
    --file-fsync-freq=0 \
    --file-fsync-end=off run

    shows the performance of btrfs grows up from 69MB/s to 121MB/s, ext4 from
    104MB/s to 121MB/s.

    Signed-off-by: Wu Fengguang
    Tested-by: Tao Ma
    Tested-by: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

22 May, 2013

1 commit

  • Currently there is no way to truncate a partial page where the end
    truncation point is not at the end of the page. This is because it was
    not needed and the existing functionality was enough for file system
    truncate operations to work properly. However, more file systems now
    support the punch hole feature, and they can benefit from mm supporting
    truncation of a page just up to a certain point.

    Specifically, with this functionality truncate_inode_pages_range() can
    be changed so it supports truncating partial page at the end of the
    range (currently it will BUG_ON() if 'end' is not at the end of the
    page).

    This commit changes the invalidatepage() address space operation
    prototype to accept range to be invalidated and update all the instances
    for it.

    We also change the block_invalidatepage() in the same way and actually
    make a use of the new length argument implementing range invalidation.

    Actual file system implementations will follow, except for the file
    systems where the changes are really simple and should not change the
    behaviour in any way. An implementation of truncate_page_range() which
    will be able to accept page-unaligned ranges will follow as well.
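
    The prototype change, roughly (argument types as assumed from the
    series description):

        /* include/linux/fs.h (sketch) */
        struct address_space_operations {
                /* ... */
                /* before: only an offset; invalidation always ran to the
                 * end of the page */
                /* void (*invalidatepage)(struct page *, unsigned long); */

                /* after: offset + length describe the exact subrange */
                void (*invalidatepage)(struct page *, unsigned int offset,
                                       unsigned int length);
                /* ... */
        };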

    Signed-off-by: Lukas Czerner
    Cc: Andrew Morton
    Cc: Hugh Dickins

    Lukas Czerner
     

25 May, 2011

1 commit

  • Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.

    readahead page allocations are completely optional. They are OK to fail
    and in particular shall not trigger OOM on themselves.
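
    A sketch of the allocation-site change (the wrapper-macro form mirrors
    how this era's pagemap.h wrapped __page_cache_alloc(); treat it as
    illustrative):

        /* best-effort allocation: may fail, must not OOM or warn */
        #define page_cache_alloc_readahead(x) \
                __page_cache_alloc(mapping_gfp_mask(x) | \
                                   __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN)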

    Reported-by: Dave Young
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Wu Fengguang
    Reviewed-by: Minchan Kim
    Reviewed-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

07 Apr, 2010

1 commit

  • btrfs relocate_file_extent_cluster() calls us with NULL filp:

    [ 4005.426805] BUG: unable to handle kernel NULL pointer dereference at 00000021
    [ 4005.426818] IP: [] page_cache_sync_readahead+0x18/0x3e

    Signed-off-by: Wu Fengguang
    Cc: Yan Zheng
    Reported-by: Kirill A. Shutemov
    Tested-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

30 Mar, 2010

1 commit

  • Update gfp.h and slab.h includes to prepare for breaking implicit
    slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, while for others adding it to an
    implementation .h or embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them, as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

07 Mar, 2010

1 commit

  • This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

    POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
    a 16K read will be carried out in 4 _sync_ 1-page reads.

    In other places, ra_pages==0 means
    - it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
    - some IO error happened
    where multi-page read IO won't help or should be avoided.

    POSIX_FADV_RANDOM actually wants different semantics: to disable the
    *heuristic* readahead algorithm, and to use a dumb one which faithfully
    submits read IO for whatever the application requests.

    So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.
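
    A sketch of both halves per the description (the f_lock locking detail
    is an assumption):

        /* mm/fadvise.c (sketch): record the hint on the struct file */
        case POSIX_FADV_RANDOM:
                spin_lock(&file->f_lock);
                file->f_mode |= FMODE_RANDOM;
                spin_unlock(&file->f_lock);
                break;

        /* mm/readahead.c (sketch): dumb, faithful readahead when set */
        if (filp && (filp->f_mode & FMODE_RANDOM)) {
                force_page_cache_readahead(mapping, filp, offset, req_size);
                return;
        }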

    Note that the random hint is not likely to help random read
    performance noticeably. And it may be too permissive on huge request
    sizes (its IO size is not limited by read_ahead_kb).

    In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
    (NFS read) performance of the application increased by 313%!

    Tested-by: Quentin Barnes
    Signed-off-by: Wu Fengguang
    Cc: Nick Piggin
    Cc: Andi Kleen
    Cc: Steven Whitehouse
    Cc: David Howells
    Cc: Jonathan Corbet
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Trond Myklebust
    Cc: Chuck Lever
    Cc: [2.6.33.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

18 Dec, 2009

1 commit

  • I added blk_run_backing_dev on page_cache_async_readahead so readahead
    I/O is unplugged to improve throughput, especially on RAID
    environments.

    The normal case is, if page N becomes uptodate at time T(N), then
    T(N) <= T(N+1) holds.

    Acked-by: Wu Fengguang
    Cc: Jens Axboe
    Cc: KOSAKI Motohiro
    Tested-by: Ronald
    Cc: Bart Van Assche
    Cc: Vladislav Bolkhovitin
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
     

17 Jun, 2009

8 commits

  • Introduce page cache context based readahead algorithm.
    This is to better support concurrent read streams in general.

    RATIONALE
    ---------
    The current readahead algorithm detects interleaved reads in a
    _passive_ way. Given a sequence of interleaved streams
    1,1001,2,1002,3,4,1003,5,1004,1005,6,..., by checking for
    (offset == prev_offset + 1) it will discover the sequentialness between
    3,4 and between 1004,1005, and start doing sequential readahead for the
    individual streams starting from page 4 and page 1005.

    The context readahead algorithm guarantees to discover the
    sequentialness no matter how the streams are interleaved. For the above
    example, it will start sequential readahead from pages 2 and 1002.

    The trick is to poke for page @offset-1 in the page cache when it has
    no other clues on the sequentialness of request @offset: if the current
    request belongs to a sequential stream, that stream must have accessed
    page @offset-1 recently, and the page will still be cached now. So if
    page @offset-1 is there, we can take request @offset as a sequential
    access.
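
    The probe itself is cheap; a sketch (helper name invented):

        /* was the predecessor page accessed recently enough to be cached? */
        static bool probe_prev_page(struct address_space *mapping,
                                    pgoff_t offset)
        {
                struct page *page;

                rcu_read_lock();
                page = radix_tree_lookup(&mapping->page_tree, offset - 1);
                rcu_read_unlock();

                return page != NULL;
        }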

    BENEFICIARIES
    -------------
    - strictly interleaved reads i.e. 1,1001,2,1002,3,1003,...
    the current readahead will take them as silly random reads;
    the context readahead will take them as two sequential streams.

    - cooperative IO processes i.e. NFS and SCST
    They create a thread pool, farming off (sequential) IO requests to different
    threads which will be performing interleaved IO.

    It was not easy (or possible) to reliably tell from file->f_ra all
    those cooperative processes working on the same sequential stream,
    since they will have different file->f_ra instances. And NFSD's
    file->f_ra is particularly unusable, since their file objects are
    dynamically created for each request. The nfsd does have code trying
    to restore the f_ra bits, but it is not satisfactory.

    The new scheme is to detect the sequential pattern via looking up the page
    cache, which provides one single and consistent view of the pages recently
    accessed. That makes sequential detection for cooperative processes possible.

    USER REPORT
    -----------
    Vladislav recommends the addition of context readahead as a result of his SCST
    benchmarks. It leads to 6%~40% performance gains in various cases and achieves
    equal performance in others. http://lkml.org/lkml/2009/3/19/239

    OVERHEADS
    ---------
    In theory, it introduces one extra page cache lookup per random read.
    However the benchmark below shows context readahead to be slightly
    faster, surprisingly.

    Randomly reading 200MB amount of data on a sparse file, repeat 20 times for
    each block size. The average throughputs are:

                        original ra    context ra     gain
     4K random reads:   65.561MB/s     65.648MB/s    +0.1%
    16K random reads:   124.767MB/s    124.951MB/s   +0.1%
    64K random reads:   162.123MB/s    162.278MB/s   +0.1%

    Cc: Jens Axboe
    Cc: Jeff Moyer
    Tested-by: Vladislav Bolkhovitin
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Split all readahead cases, and move the random one to bottom.

    No behavior changes.

    This is to prepare for the introduction of context readahead, and make it
    easy for inserting accounting/tracing points for each case.

    Signed-off-by: Wu Fengguang
    Cc: Vladislav Bolkhovitin
    Cc: Jens Axboe
    Cc: Jeff Moyer
    Cc: Nick Piggin
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Mmap read-around now shares the same code style and data structure with
    readahead code.

    This also removes do_page_cache_readahead(). Its last user, mmap
    read-around, has been changed to call ra_submit().

    The no-readahead-if-congested logic is dumped by the way. Users will
    be pretty sensitive about the slow loading of executables, so it's
    unfavorable to disable mmap read-around on a congested queue.
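
    A sketch of read-around expressed in readahead terms (window placement
    per the description; the field values are assumptions):

        /* filemap fault path (sketch): centre a window on the fault */
        ra_pages = max_sane_readahead(ra->ra_pages);
        if (ra_pages) {
                ra->start = max_t(long, 0, offset - ra_pages / 2);
                ra->size = ra_pages;
                ra->async_size = 0;     /* plain read-around, no trigger */
                ra_submit(ra, mapping, file);
        }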

    [akpm@linux-foundation.org: coding-style fixes]
    Cc: Nick Piggin
    Signed-off-by: Fengguang Wu
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • The readahead call scheme is error-prone in that it expects the call
    sites to check for async readahead after doing a sync one. I.e.

        if (!page)
            page_cache_sync_readahead();
        page = find_get_page();
        if (page && PageReadahead(page))
            page_cache_async_readahead();

    This is because PG_readahead could be set by a sync readahead for the
    _current_ newly faulted-in page, and the readahead code simply expects
    one more callback on the same page to start the async readahead. If
    the caller fails to do so, it will miss the PG_readahead bits and will
    never be able to start an async readahead.

    Eliminate this insane constraint by piggy-backing the async part into the
    current readahead window.

    Now if an async readahead should be started immediately after a sync
    one, the readahead logic itself will do it. So the following code
    becomes valid (the 'else' in particular):

        if (!page)
            page_cache_sync_readahead();
        else if (PageReadahead(page))
            page_cache_async_readahead();

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Make sure the interleaved readahead size is larger than the request
    size. This also makes the readahead window grow more quickly.
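
    Sketched against the interleaved (hit_readahead_marker) path, with the
    added line marked (field names per file_ra_state; placement assumed):

        ra->start = start;
        ra->size = start - offset;      /* old async_size */
        ra->size += req_size;           /* new: cover the request itself */
        ra->size = get_next_ra_size(ra, max);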

    Reported-by: Xu Chenfeng
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • (hit_readahead_marker != 0) means the page at @offset is present, so
    we can search for a non-present page starting from @offset+1.
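
    The corresponding lookup in ondemand_readahead(), sketched:

        pgoff_t start;

        /* page @offset is known present: find where the cached run ends */
        rcu_read_lock();
        start = radix_tree_next_hole(&mapping->page_tree, offset + 1, max);
        rcu_read_unlock();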

    Reported-by: Xu Chenfeng
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Just in case someone aggressively sets a huge readahead size.
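
    A one-line sketch of where the clamp would land in
    ondemand_readahead():

        /* never let a user-configured window exceed the sane limit */
        unsigned long max = max_sane_readahead(ra->ra_pages);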

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Impact: code simplification.

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

03 Apr, 2009

2 commits

  • Recruit a page flag to aid in cache management. The following extra flag is
    defined:

    (1) PG_fscache (PG_private_2)

    The marked page is backed by a local cache and is pinning resources in the
    cache driver.

    If PG_fscache is set, then things that checked for PG_private will now
    also check for that. This includes things like truncation and page
    invalidation. The function page_has_private() has been added to check
    for both PG_private and PG_private_2 at the same time.
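
    A sketch of the combined check (flag arithmetic assumed from the
    description):

        #define PAGE_FLAGS_PRIVATE \
                (1 << PG_private | 1 << PG_private_2)

        /* true if the page holds private data or pins a cache resource */
        static inline int page_has_private(struct page *page)
        {
                return !!(page->flags & PAGE_FLAGS_PRIVATE);
        }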

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Rik van Riel
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     
  • The attached patch causes read_cache_pages() to release page-private data on a
    page for which add_to_page_cache() fails. If the filler function fails, then
    the problematic page is left attached to the pagecache (with appropriate flags
    set, one presumes) and the remaining to-be-attached pages are invalidated and
    discarded. This permits pages with caching references associated with them to
    be cleaned up.

    The invalidatepage() address space op is called (indirectly) to do the honours.
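
    A sketch of the cleanup helper the description implies (naming and
    locking details assumed):

        /* drop a page that will not make it into the page cache */
        static void read_cache_pages_invalidate_page(
                struct address_space *mapping, struct page *page)
        {
                if (page_has_private(page)) {
                        if (!trylock_page(page))
                                BUG();
                        page->mapping = mapping;    /* satisfy the aop */
                        do_invalidatepage(page, 0); /* drop private data */
                        page->mapping = NULL;
                        unlock_page(page);
                }
                page_cache_release(page);
        }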

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Rik van Riel
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     

20 Oct, 2008

1 commit

  • Split the LRU lists in two, one set for pages that are backed by real file
    systems ("file") and one for pages that are backed by memory and swap
    ("anon"). The latter includes tmpfs.

    The advantage of doing this is that the VM will not have to scan over lots
    of anonymous pages (which we generally do not want to swap out), just to
    find the page cache pages that it should evict.

    This patch has the infrastructure and a basic policy to balance how much
    we scan the anon lists and how much we scan the file lists. The big
    policy changes are in separate patches.

    [lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
    [kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
    [kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
    [hugh@veritas.com: memcg swapbacked pages active]
    [hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
    [akpm@linux-foundation.org: fix /proc/vmstat units]
    [nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
    [kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
    [kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
    Signed-off-by: Rik van Riel
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Hugh Dickins
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     

27 Jul, 2008

1 commit

  • radix_tree_next_hole() is implemented as a series of radix_tree_lookup()s.
    So it can be called locklessly, under rcu_read_lock().

    Signed-off-by: Nick Piggin
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Hugh Dickins
    Cc: "Paul E. McKenney"
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

30 Apr, 2008

1 commit

  • Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object.
    This allows us to see and set the various BDI specific variables.

    In particular this properly exposes the read-ahead window for all
    relevant users, and /sys/block/<device>/queue/read_ahead_kb should be
    deprecated.

    With patient help from Kay Sievers and Greg KH

    [mszeredi@suse.cz]

    - split off NFS and FUSE changes into separate patches
    - document new sysfs attributes under Documentation/ABI
    - do bdi_class_init as a core_initcall, otherwise the "default" BDI
    won't be initialized
    - remove bdi_init_fmt macro, it's not used very much

    [akpm@linux-foundation.org: fix ia64 warning]
    Signed-off-by: Peter Zijlstra
    Cc: Kay Sievers
    Acked-by: Greg KH
    Cc: Trond Myklebust
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

20 Mar, 2008

1 commit

  • Fix kernel-doc notation in mm/readahead.c.

    Change ":" to ";" so that it doesn't get treated as a doc section heading.
    Move the comment block ending "*/" to a line by itself so that the text on
    that last line is not lost (dropped).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

17 Oct, 2007

2 commits

  • provide BDI constructor/destructor hooks

    [akpm@linux-foundation.org: compile fix]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Quite a bit of code is used in maintaining these "cached pages" that
    are probably pretty unlikely to get used. It would require a narrow
    race in which the page is inserted concurrently while this process is
    allocating a page in order to create the spare page, followed by a
    multi-page write into an uncached part of the file to make use of it.
    Next, the buffered write path (and others) uses its own LRU pagevec when it
    should be just using the per-CPU LRU pagevec (which will cut down on both data
    and code size cacheline footprint). Also, these private LRU pagevecs are
    emptied after just a very short time, in contrast with the per-CPU pagevecs
    that are persistent. Net result: 7.3 times fewer lru_lock acquisitions required
    to add the pages to pagecache for a bulk write (in 4K chunks).

    [this gets rid of some cond_resched() calls in readahead.c and mpage.c due
    to clashes in -mm. What put them there, and why? ]

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin