27 Jul, 2008

1 commit

  • Use get_user_pages_fast in the common/generic block and fs direct IO paths.

    Signed-off-by: Nick Piggin
    Cc: Dave Kleikamp
    Cc: Andy Whitcroft
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Andi Kleen
    Cc: Badari Pulavarty
    Cc: Zach Brown
    Cc: Jens Axboe
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

06 Feb, 2008

1 commit

  • Simplify page cache zeroing of segments of pages through 3 functions

    zero_user_segments(page, start1, end1, start2, end2)

    Zeros two segments of the page. It takes the position where to
    start and end the zeroing which avoids length calculations and
    makes code clearer.

    zero_user_segment(page, start, end)

    Same for a single segment.

    zero_user(page, start, length)

    Length variant for the case where we know the length.

    We remove the zero_user_page macro. Issues:

    1. It's a macro. Inline functions are preferable.

    2. The KM_USER0 macro is only defined for HIGHMEM.

    Having to treat this special case everywhere makes the
    code needlessly complex. The parameter for zeroing is always
    KM_USER0 except in one single case that we open code.

    Avoiding KM_USER0 means that much of the code no longer has to deal
    with the special casing for HIGHMEM. Dealing with
    kmap is only necessary for HIGHMEM configurations. In those
    configurations we use KM_USER0 like we do for a series of other
    functions defined in highmem.h.

    Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
    could not be an inline function and had to be a macro. The
    zero_user_* functions introduced here can be inline because that
    constant is not used when these functions are called.

    Also extract the flushing of the caches to be outside of the kmap.

    [akpm@linux-foundation.org: fix nfs and ntfs build]
    [akpm@linux-foundation.org: fix ntfs build some more]
    Signed-off-by: Christoph Lameter
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: Steven Whitehouse
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: David Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
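
The three helpers described in this commit can be modelled in userspace as follows. This is only a sketch under stated assumptions: the sketch_* names and SKETCH_PAGE_SIZE are hypothetical, a plain byte buffer stands in for struct page, and the kmap and cache flushing that the kernel versions perform are left out.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Zero [start1, end1) and [start2, end2) of a page-sized buffer. */
static inline void sketch_zero_segments(unsigned char *page,
                                        unsigned start1, unsigned end1,
                                        unsigned start2, unsigned end2)
{
    assert(end1 <= SKETCH_PAGE_SIZE && end2 <= SKETCH_PAGE_SIZE);
    if (end1 > start1)
        memset(page + start1, 0, end1 - start1);
    if (end2 > start2)
        memset(page + start2, 0, end2 - start2);
}

/* Same for a single segment, given as start and end positions. */
static inline void sketch_zero_segment(unsigned char *page,
                                       unsigned start, unsigned end)
{
    sketch_zero_segments(page, start, end, 0, 0);
}

/* Length variant for callers that already know the length. */
static inline void sketch_zero(unsigned char *page,
                               unsigned start, unsigned size)
{
    sketch_zero_segments(page, start, start + size, 0, 0);
}
```

Passing start and end positions lets a caller zero the head and tail of a page around valid data in one call, without computing either length by hand.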
     

17 Oct, 2007

1 commit

  • The commit b5810039a54e5babf428e9a1e89fc1940fabff11 contains the note

    A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
    (and thus mapcounted and count towards shared rss). These writes to
    the struct page could cause excessive cacheline bouncing on big
    systems. There are a number of ways this could be addressed if it is
    an issue.

    And indeed this cacheline bouncing has shown up on large SGI systems.
    There was a situation where an Altix system was essentially livelocked
    tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
    This situation can be avoided in userspace, but it does highlight the
    potential scalability problem with refcounting ZERO_PAGE, and corner
    cases where it can really hurt (we don't want the system to livelock!).

    There are several broad ways to fix this problem:
    1. add back some special casing to avoid refcounting ZERO_PAGE
    2. per-node or per-cpu ZERO_PAGES
    3. remove the ZERO_PAGE completely

    I will argue for 3. The others should also fix the problem, but they
    result in more complex code than does 3, with little or no real benefit
    that I can see.

    Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
    false optimisation: if an application is performance critical, it would
    not be doing many read faults of new memory, or at least it could be
    expected to write to that memory soon afterwards. If cache or memory use
    is critical, it should not be working with a significant number of
    ZERO_PAGEs anyway (a more compact representation of zeroes should be
    used).

    As a sanity check -- measuring on my desktop system, there are never many
    mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
    increase much without it.

    When running a make -j4 kernel compile on my dual core system, there are
    about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
    ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
    is torn down without being COWed). So removing ZERO_PAGE will save 1,000
    page faults per second when running kbuild, while keeping it only saves
    less than 1 page clearing operation per second. 1 page clear is cheaper
    than a thousand faults, presumably, so there isn't an obvious loss.

    Neither the logical argument nor these basic tests give a guarantee of no
    regressions. However, this is a reasonable opportunity to try to remove
    the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
    we can reintroduce it and just avoid refcounting it.

    The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see
    much use to them except on benchmarks. All other users of ZERO_PAGE are
    converted just to use ZERO_PAGE(0) for simplicity. We can look at
    replacing them all and maybe ripping out ZERO_PAGE completely when we are
    more satisfied with this solution.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus "snif" Torvalds

    Nick Piggin
     

10 Oct, 2007

1 commit

  • As bi_end_io is only called once when the request is complete,
    the 'size' argument is now redundant. Remove it.

    Now there is no need for bio_endio to subtract the size completed
    from bi_size. So don't do that either.

    While we are at it, change bi_end_io to return void.

    Signed-off-by: Neil Brown
    Signed-off-by: Jens Axboe

    NeilBrown
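
The shape of the callback after this change might look like the sketch below. The sketch_* names and the reduced struct are hypothetical stand-ins, not the kernel's definitions; the point is only that the completion hook takes no size argument, returns void, and leaves the size field alone.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_bio;

/* Completion hook: no 'size' argument, no return value. */
typedef void (sketch_end_io_t)(struct sketch_bio *bio, int error);

struct sketch_bio {
    unsigned int size;        /* bytes in the request; not adjusted on completion */
    sketch_end_io_t *end_io;  /* called exactly once, when the request is complete */
    void *private_data;       /* owner's context */
};

/* End the I/O: report the error (0 on success) to the owner once. */
static void sketch_bio_endio(struct sketch_bio *bio, int error)
{
    assert(bio != NULL);
    if (bio->end_io)
        bio->end_io(bio, error);
}

/* Example owner callback: counts how many times it was invoked. */
static void sketch_count_completion(struct sketch_bio *bio, int error)
{
    int *calls = bio->private_data;

    (void)error;
    (*calls)++;
}
```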
     

21 Aug, 2007

1 commit

  • This patch uses kzalloc to zero all of struct dio rather than manually
    trying to track which fields we rely on being zero. It passed aio+dio
    stress testing and some bug regression testing on ext3.

    This patch was introduced by Linus in the conversation that led up to
    Badari's minimal fix to manually zero .map_bh.b_state in commit:

    6a648fa72161d1f6468dabd96c5d3c0db04f598a

    It makes the code a bit smaller. Maybe a couple fewer cachelines to
    load, if we're lucky:

       text    data     bss     dec     hex filename
    3285925  568506 1304616 5159047  4eb887 vmlinux
    3285797  568506 1304616 5158919  4eb807 vmlinux.patched

    I was unable to measure a stable difference in the number of cpu cycles
    spent in blockdev_direct_IO() when pushing aio+dio 256K reads at
    ~340MB/s.

    So the resulting intent of the patch isn't a performance gain but to
    avoid exposing ourselves to the risk of finding another field like
    .map_bh.b_state where we rely on zeroing but don't enforce it in the
    code.

    Signed-off-by: Zach Brown
    Signed-off-by: Linus Torvalds

    Zach Brown
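
A userspace analogue of the idea, assuming calloc() as a stand-in for kzalloc() and a hypothetical, heavily reduced struct: allocating zeroed memory means no field such as map_bh.b_state can be forgotten, whereas per-field initialisation has to be kept in sync with the struct by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct sketch_buffer_head {
    unsigned long b_state;
};

struct sketch_dio {
    int rw;
    long result;
    struct sketch_buffer_head map_bh;
};

/* Allocate a dio with every field zeroed, as kzalloc() would in the kernel. */
static struct sketch_dio *sketch_dio_alloc(void)
{
    return calloc(1, sizeof(struct sketch_dio));
}

/* The invariant the earlier minimal fix had to enforce by hand. */
static void sketch_dio_check(const struct sketch_dio *dio)
{
    assert(dio->map_bh.b_state == 0);
}
```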
     

12 Aug, 2007

1 commit

  • Need to initialize map_bh.b_state to zero. Otherwise, in case of a faulty
    user-buffer it's possible to go into dio_zero_block() and submit a page by
    mistake - since it checks for buffer_new().

    http://marc.info/?l=linux-kernel&m=118551339032528&w=2

    akpm: Linus had a (better) patch to just do a kzalloc() in there, but it got
    lost. Probably this version is better for -stable anyway.

    Signed-off-by: Badari Pulavarty
    Acked-by: Joe Jin
    Acked-by: Zach Brown
    Cc: gurudas pai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

04 Jul, 2007

1 commit

  • Badari Pulavarty reported a case of this BUG_ON triggering during
    testing. It's completely bogus and should be removed.

    It's trying to notice if we left references to the dio hanging around in
    the sync case. They should have been dropped as IO completed while this
    path was in dio_await_completion(). This condition will also be
    checked, via some twisty logic, by the BUG_ON(ret != -EIOCBQUEUED) a few
    lines lower. So to start this BUG_ON() is redundant.

    More fatally, it's dereferencing dio-> after having dropped its
    reference. It's only safe to dereference the dio after releasing the
    lock if the final reference was just dropped. Another CPU might free
    the dio in bio completion and reuse the memory after this path drops the
    dio lock but before the BUG_ON() is evaluated.

    This patch passed aio+dio regression unit tests and aio-stress on ext3.

    Signed-off-by: Zach Brown
    Cc: Badari Pulavarty
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

10 May, 2007

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
    sound: convert "sound" subdirectory to UTF-8
    MAINTAINERS: Add cxacru website/mailing list
    include files: convert "include" subdirectory to UTF-8
    general: convert "kernel" subdirectory to UTF-8
    documentation: convert the Documentation directory to UTF-8
    Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
    remove broken URLs from net drivers' output
    Magic number prefix consistency change to Documentation/magic-number.txt
    trivial: s/i_sem /i_mutex/
    fix file specification in comments
    drivers/base/platform.c: fix small typo in doc
    misc doc and kconfig typos
    Remove obsolete fat_cvf help text
    Fix occurrences of "the the "
    Fix minor typoes in kernel/module.c
    Kconfig: Remove reference to external mqueue library
    Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
    Correct comments in genrtc.c to refer to correct /proc file.
    Fix more "deprecated" spellos.
    Fix "deprecated" typoes.
    ...

    Fix trivial comment conflict in kernel/relay.c.

    Linus Torvalds
     
  • It's very common for file systems to need to zero part or all of a page;
    the simplest way is just to use kmap_atomic() and memset(). There's
    actually a library function in include/linux/highmem.h that does exactly
    that, but it's confusingly named memclear_highpage_flush(), which is
    descriptive of *how* it does the work rather than what the *purpose* is.
    So this patchset renames the function to zero_user_page(), and calls it
    from the various places that currently open code it.

    This first patch introduces the new function call, and converts all the
    core kernel callsites, both the open-coded ones and the old
    memclear_highpage_flush() ones. Following this patch is a series of
    conversions for each file system individually, per AKPM, and finally a
    patch deprecating the old call. The diffstat below shows the entire
    patchset.

    [akpm@linux-foundation.org: fix a few things]
    Signed-off-by: Nate Diller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nate Diller
     

09 May, 2007

1 commit


11 Dec, 2006

7 commits

  • The wait_for_more_bios() function name was poorly chosen. While looking to
    clean it up I noticed that the dio struct refcounting between the bio
    completion and dio submission paths was racy.

    The bio submission path was simply freeing the dio struct if
    atomic_dec_and_test() indicated that it dropped the final reference.

    The aio bio completion path was dereferencing its dio struct pointer *after
    dropping its reference* based on the remaining number of references.

    These two paths could race and result in the aio bio completion path
    dereferencing a freed dio, though this was not observed in the wild.

    This moves the refcount under the bio lock so that bio completion can drop
    its reference and decide to wake all in one atomic step.

    Once testing and waking is locked dio_await_one() can test its sleeping
    condition and mark itself uninterruptible under the lock. It gets simpler
    and wait_for_more_bios() disappears.

    The addition of the interrupt masking spin lock acquisition in dio_bio_submit()
    looks alarming. This lock acquisition existed in that path before the recent
    dio completion patch set. We shouldn't expect significant performance
    regression from returning to the behaviour that existed before the
    completion clean up work.

    This passed 4k block ext3 O_DIRECT fsx and aio-stress on an SMP machine.

    Signed-off-by: Zach Brown
    Cc: Badari Pulavarty
    Cc: Suparna Bhattacharya
    Cc: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • The only time it is safe to call aio_complete() is when the ->ki_retry
    function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has
    historically done this by relying on its caller to translate positive return
    codes into -EIOCBQUEUED for the aio case. It did this by trying to keep
    conditionals in sync. direct_io_worker() knew when finished_one_bio() was
    going to call aio_complete(). It would reverse the test and wait and free the
    dio in the cases it thought that finished_one_bio() wasn't going to.

    Not surprisingly, it ended up getting it wrong. 'ret' could be a negative
    errno from the submission path but it failed to communicate this to
    finished_one_bio(). direct_io_worker() would return < 0, its callers
    wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the
    future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
    would be called for a second time which can manifest as an oops.

    The previous cleanups have whittled the sync and async completion paths down
    to the point where we can collapse them and clearly reassert the invariant
    that we must only call aio_complete() after returning -EIOCBQUEUED.
    direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
    drop the dio refcount and the aio bio completion path will only call
    aio_complete() when it is the last to drop the dio refcount.
    direct_io_worker() can ensure that it is the last to drop the reference count
    by waiting for bios to drain. It does this for sync ops, of course, and for
    partial dio writes that must fall back to buffered and for aio ops that saw
    errors during submission.

    This means that operations that end up waiting, even if they were issued as
    aio ops, will not call aio_complete() from dio. Instead we return the return
    code of the operation and let the aio core call aio_complete(). This is
    purposely done to fix a bug where AIO DIO file extensions would call
    aio_complete() before their callers have a chance to update i_size.

    Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
    no longer have to translate for it. XFS needs to be careful not to free
    resources that will be used during AIO completion if -EIOCBQUEUED is returned.
    We maintain the previous behaviour of trying to write fs metadata for O_SYNC
    aio+dio writes.

    Signed-off-by: Zach Brown
    Cc: Badari Pulavarty
    Cc: Suparna Bhattacharya
    Acked-by: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • Now that we have a single refcount and waiting path we can reuse it in the
    async 'should_wait' path. It continues to rely on the fragile link between
    the conditional in dio_complete_aio() which decides to complete the AIO and
    the conditional in direct_io_worker() which decides to wait and free.

    By waiting before dropping the reference we stop dio_bio_end_aio() from
    calling dio_complete_aio() which used to wake up the waiter after seeing the
    reference count drop to 0. We hoist this wake up into dio_bio_end_aio() which
    now notices when it's left a single remaining reference that is held by the
    waiter.

    Signed-off-by: Zach Brown
    Cc: Badari Pulavarty
    Cc: Suparna Bhattacharya
    Acked-by: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • Previously we had two confusing counts of bio progress. 'bio_count' was
    decremented as bios were processed and freed by the dio core. It was used to
    indicate final completion of the dio operation. 'bios_in_flight' reflected
    how many bios were between submit_bio() and bio->end_io. It was used by the
    sync path to decide when to wake up and finish completing bios and was ignored
    by the async path.

    This patch collapses the two notions into one notion of a dio reference count.
    bios hold a dio reference when they're between submit_bio and bio->end_io.

    Since bios_in_flight was only used in the sync path it is now equivalent to
    dio->refcount - 1 which accounts for direct_io_worker() holding a reference
    for the duration of the operation.

    dio_bio_complete() -> finished_one_bio() was called from the sync path after
    finding bios on the list that the bio->end_io function had deposited.
    finished_one_bio() can not drop the dio reference on behalf of these bios now
    because bio->end_io already has. The is_async test in finished_one_bio()
    meant that it never actually did anything other than drop the bio_count for
    sync callers. So we remove its refcount decrement, don't call it from
    dio_bio_complete(), and hoist its call up into the async dio_bio_complete()
    caller after an explicit refcount decrement. It is renamed dio_complete_aio()
    to reflect the remaining work it actually does.

    Signed-off-by: Zach Brown
    Cc: Badari Pulavarty
    Cc: Suparna Bhattacharya
    Acked-by: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • We only need to call blk_run_address_space() once after all the bios for the
    direct IO op have been submitted. This removes the chance of calling
    blk_run_address_space() after spurious wake ups as the sync path waits for
    bios to drain. It's also one less difference between the sync and async paths.

    In the process we remove a redundant dio_bio_submit() that its caller had
    already performed.

    Signed-off-by: Zach Brown
    Cc: Badari Pulavarty
    Cc: Suparna Bhattacharya
    Acked-by: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • There have been a lot of bugs recently due to the way direct_io_worker() tries
    to decide how to finish direct IO operations. In the worst examples it has
    failed to call aio_complete() at all (hang) or called it too many times
    (oops).

    This set of patches cleans up the completion phase with the goal of removing
    the complexity that led to these bugs. We end up with one path that
    calculates the result of the operation after all of the bios have completed.
    We decide when to generate a result of the operation using that path based on
    the final release of a refcount on the dio structure.

    I tried to progress towards the final state in steps that were relatively easy
    to understand. Each step should compile but I only tested the final result of
    having all the patches applied.

    I've tested these on low end PC drives with aio-stress, the direct IO tests I
    could manage to get running in LTP, orasim, and some home-brew functional
    tests.

    In http://lkml.org/lkml/2006/9/21/103 IBM reports success with ext2 and ext3
    running DIO LTP tests. They found that XFS bug which has since been addressed
    in the patch series.

    This patch:

    The mechanics which decide the result of a direct IO operation were duplicated
    in the sync and async paths.

    The async path didn't check page_errors which can manifest as silently
    returning success when the final pointer in an operation faults and its
    matching file region is filled with zeros.

    The sync path and async path differed in whether they passed errors to the
    caller's dio->end_io operation. The async path was passing errors to it which
    trips an assertion in XFS, though it is apparently harmless.

    This centralizes the completion phase of dio ops in one place. AIO will now
    return EFAULT consistently and all paths fall back to the previously sync
    behaviour of passing the number of bytes 'transferred' to the dio->end_io
    callback, regardless of errors.

    dio_await_completion() doesn't have to propagate EIO from non-uptodate bios
    now that it's being propagated through dio_complete() via dio->io_error. This
    lets it return void which simplifies its sole caller.

    Signed-off-by: Zach Brown
    Cc: Badari Pulavarty
    Cc: Suparna Bhattacharya
    Acked-by: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • Account for direct-io.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
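
The refcounting rules that the Zach Brown commits in this group converge on can be sketched in a single-threaded model. Everything here is an assumption made for illustration: the sketch_* names are hypothetical, SKETCH_EIOCBQUEUED is a placeholder for the kernel-internal errno, and the lock that the commit messages call the bio lock is only marked in comments rather than taken.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_EIOCBQUEUED 529  /* placeholder for the kernel-internal errno */

struct sketch_dio {
    int refcount;  /* one per in-flight bio, plus one for the submitter */
    bool waiter;   /* submitter sleeps until the bios drain */
};

/*
 * Drop one reference. In the kernel the decrement and the wake-up decision
 * happen together under the bio lock; the lock is elided in this model.
 * Returns the number of references left.
 */
static int sketch_dio_put(struct sketch_dio *dio, bool *wake)
{
    assert(dio->refcount > 0);
    int remaining = --dio->refcount;

    /* Only the waiter's own reference is left, so it can stop sleeping. */
    *wake = dio->waiter && remaining == 1;

    /* After this point use only 'remaining'; another path may free the dio. */
    return remaining;
}

/* The submitter waits for sync ops, failed submissions and partial writes. */
static bool sketch_must_wait(bool is_async, long submit_ret, bool partial_write)
{
    return !is_async || submit_ret < 0 || partial_write;
}

/* Return value of the submission path once it has dropped its reference. */
static long sketch_worker_return(int remaining, long result)
{
    if (remaining > 0)
        return -SKETCH_EIOCBQUEUED;  /* a bio completion will finish the AIO */
    return result;                   /* we were last: the caller completes it */
}
```

Deciding everything from the value captured at the decrement is what keeps both paths from touching a dio that the other may already have freed.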
     

04 Jul, 2006

1 commit

  • Teach special (rwsem-in-irq) locking code to the lock validator. Has no
    effect on non-lockdep kernels.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

23 Jun, 2006

1 commit

  • A process flag to indicate whether we are doing sync io is incredibly
    ugly. It also causes performance problems when one does a lot of async
    io and then proceeds to sync it. Part of the io will go out as async,
    and the other part as sync. This causes a disconnect between the
    previously submitted io and the synced io. For io schedulers such as CFQ,
    this will cause us lost merges and suboptimal behaviour in scheduling.

    Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
    the O_DIRECT path just directly indicate that the writes are sync
    by using WRITE_SYNC instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

01 Apr, 2006

1 commit


29 Mar, 2006

1 commit


27 Mar, 2006

1 commit

  • Now that get_block() can handle mapping multiple disk blocks, there is no need to have
    ->get_blocks(). This patch removes the fs specific ->get_blocks() added for DIO
    and makes its users use get_block() instead.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

26 Mar, 2006

1 commit

  • There is a bug in direct-io in propagating write errors up to the higher I/O
    layer. When performing an async O_DIRECT write to a block device, if a
    device error occurs (such as a media error or the disk being pulled), the error code
    is only propagated from the device driver to the DIO layer. The error code
    stops at finished_one_bio(). The async write, however, is supposed to have
    a corresponding AIO event with the appropriate return code (in this case -EIO).
    An application that waits on the async write event will hang forever, since
    that AIO event is lost (unless the application used the timeout option
    in its io_getevents call; regardless, the AIO event is lost).

    The discovery of the above bug led to the discovery of a potential race
    window with dio->result. The fundamental problem is that dio->result is
    overloaded with dual use: an indicator of the fall back path for a partial dio
    write, and an error indicator used in the I/O completion path. In the
    event of a device error, setting dio->result to -EIO clashes with
    the value used to track a partial write that activates the fall back path.

    It was also pointed out that it is impossible to use dio->result to track
    a partial write and at the same time track an error returned from the device
    driver, because direct_io_worker can only determine whether it is a partial
    write at the end of io submission, and a return code could come back
    from the driver in the middle of that submission, messing up all
    the subsequent logic.

    The proposed fix is to separate the error code returned by the IO completion
    path from partial IO submit tracking. A new variable is added to the dio
    structure specifically to track the io error returned in the completion path.

    Signed-off-by: Ken Chen
    Acked-by: Zach Brown
    Acked-by: Suparna Bhattacharya
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
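
The separation this commit describes could be sketched as below. The struct is a hypothetical reduction, the sketch_* names are invented for illustration, and the way the final value is chosen (a recorded error takes precedence over the byte count) is an assumption rather than a quote of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_dio {
    long result;   /* submission-side tracking, e.g. for partial writes */
    int io_error;  /* first error seen in the completion path, else 0 */
};

/* Completion path: record a device error without touching 'result'. */
static void sketch_bio_done(struct sketch_dio *dio, bool uptodate)
{
    assert(dio != NULL);
    if (!uptodate && dio->io_error == 0)
        dio->io_error = -EIO;
}

/* Final value for the operation: a recorded error wins over the byte count. */
static long sketch_dio_final(const struct sketch_dio *dio)
{
    return dio->io_error ? dio->io_error : dio->result;
}
```

Because the completion path writes only io_error, an error arriving in the middle of submission can no longer overwrite the value used to detect a partial write.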
     

15 Mar, 2006

1 commit

  • Affects only XFS (i.e. DIO_OWN_LOCKING case) - currently it is
    not possible to get i_mutex locking correct when using DIO_OWN
    direct I/O locking in a filesystem due to indeterminism in the
    possible return code/lock/unlock combinations. This can cause
    a direct read to attempt a double i_mutex unlock inside XFS.

    We're now ensuring __blockdev_direct_IO always exits with the
    inode i_mutex (still) held for a direct reader.

    Tested with the three different locking modes (via direct block
    device access, ext3 and XFS) - both reading and writing; cannot
    find any regressions resulting from this change, and it clearly
    fixes the mutex_unlock warning originally reported here:
    http://marc.theaimsgroup.com/?l=linux-kernel&m=114189068126253&w=2

    Signed-off-by: Nathan Scott
    Acked-by: Christoph Hellwig

    Nathan Scott
     

04 Feb, 2006

1 commit

  • Currently, if you open a file O_DIRECT, truncate it to a size that is not a
    multiple of the disk block size, and then try to read the last block in the
    file, the read will return 0. The problem is in do_direct_IO, here:

    /* Handle holes */
    if (!buffer_mapped(map_bh)) {
            char *kaddr;

            ...

            if (dio->block_in_file >=
                            i_size_read(dio->inode)>>blkbits) {
                    /* We hit eof */
                    page_cache_release(page);
                    goto out;
            }

    We shift off any remaining bytes in the final block of the I/O, resulting
    in a 0-sized read. I've attached a patch that fixes this. I'm not happy
    about how ugly the math is getting, so suggestions are more than welcome.

    I've tested this with a simple program that performs the steps outlined for
    reproducing the problem above. Without the patch, we get a 0-sized result
    from read. With the patch, we get the correct return value from the short
    read.

    Signed-off-by: Jeff Moyer
    Cc: Badari Pulavarty
    Cc: Suparna Bhattacharya
    Cc: Mingming Cao
    Cc: Joel Becker
    Cc: "Chen, Kenneth W"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
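
The arithmetic problem can be reproduced outside the kernel. In the sketch below the first function mirrors the comparison quoted in the commit message; the second rounds the file size up to whole blocks, which is one possible correction and not necessarily the arithmetic the patch uses. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Comparison quoted in the commit: the shift drops the partial final block. */
static bool sketch_past_eof_truncating(unsigned long long block,
                                       unsigned long long i_size,
                                       unsigned blkbits)
{
    assert(blkbits < 32);
    return block >= (i_size >> blkbits);
}

/* Assumed alternative: count a partial final block as a block. */
static bool sketch_past_eof_rounded(unsigned long long block,
                                    unsigned long long i_size,
                                    unsigned blkbits)
{
    assert(blkbits < 32);
    unsigned long long blocks = (i_size + (1ULL << blkbits) - 1) >> blkbits;

    return block >= blocks;
}
```

With 4096-byte blocks (blkbits of 12) and a file size of 4196, block 1 holds the last 100 bytes. The truncating comparison computes 4196 >> 12 = 1 and so treats block 1 as past EOF, while the rounded version counts 2 blocks and lets the short read proceed.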
     

10 Jan, 2006

1 commit


30 Oct, 2005

1 commit

  • Remove PageReserved() calls from core code by tightening VM_RESERVED
    handling in mm/ to cover PageReserved functionality.

    PageReserved special casing is removed from get_page and put_page.

    All setting and clearing of PageReserved is retained, and it is now flagged
    in the page_alloc checks to help ensure we don't introduce any refcount
    based freeing of Reserved pages.

    MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
    deprecated. We never completely handled it correctly anyway, and it can be
    reintroduced in future if required (Hugh has a proof of concept).

    Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
    arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
    be trivially removed.

    Last real user of PageReserved is swsusp, which uses PageReserved to
    determine whether a struct page points to valid memory or not. This still
    needs to be addressed (a generic page_is_ram() should work).

    A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
    thus mapcounted and count towards shared rss). These writes to the struct
    page could cause excessive cacheline bouncing on big systems. There are a
    number of ways this could be addressed if it is an issue.

    Signed-off-by: Nick Piggin

    Refcount bug fix for filemap_xip.c

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

24 Jun, 2005

1 commit


17 Apr, 2005

2 commits

  • The direct I/O code maps the read request to the file system block. If
    the file size was not on a block boundary, the result would show the read
    reading past EOF. This was only happening for the AIO case. The non-AIO case
    truncates the result to match file size (in direct_io_worker). This patch
    does the same thing for the AIO case: it truncates the result to match the
    file size if the read reads past EOF.

    When I/O completes the result can be truncated to match the file size
    without using i_size_read(), thus the aio result now matches the number of
    bytes read to the end of file.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel McNeil
     
  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds
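
For the AIO read fix by Daniel McNeil in this group, the truncation of the result to the file size can be sketched as a simple clamp. The function name and the early return for an offset at or beyond EOF are assumptions made for this illustration.

```c
#include <assert.h>

/*
 * Clamp the byte count of a completed read so that it never extends past
 * the file size recorded for the operation.
 */
static long long sketch_clamp_read(long long offset, long long transferred,
                                   long long i_size)
{
    assert(offset >= 0 && transferred >= 0 && i_size >= 0);
    if (offset >= i_size)
        return 0;                        /* assumed: nothing to read at or past EOF */
    if (offset + transferred > i_size)
        transferred = i_size - offset;   /* trim the part beyond EOF */
    return transferred;
}
```

For a 4196-byte file, a 4096-byte block read starting at offset 4096 would be reported as 100 bytes rather than 4096.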