07 Jan, 2006

12 commits

  • cleanup

    Signed-off-by: David Gibson
    Signed-off-by: Adam Litke
    Cc: William Lee Irwin III
    Cc: "Seth, Rohit"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • Here is the patch to implement madvise(MADV_REMOVE) - which frees up a
    given range of pages & its associated backing store. Current
    implementation supports only shmfs/tmpfs and other filesystems return
    -ENOSYS.

    "Some app allocates large tmpfs files, then when some task quits and some
    client disconnect, some memory can be released. However the only way to
    release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli

    Databases want to use this feature to drop a section of their bufferpool
    (shared memory segments) - without writing back to disk/swap space.

    This feature is also useful for supporting hot-plug memory on UML.

    Concerns raised by Andrew Morton:

    - "We have no plan for holepunching! If we _do_ have such a plan (or
    might in the future) then what would the API look like? I think
    sys_holepunch(fd, start, len), so we should start out with that."

    - Using madvise is very weird, because people will ask "why do I need to
    mmap my file before I can stick a hole in it?"

    - None of the other madvise operations call into the filesystem in this
    manner. A broad question is: is this capability an MM operation or a
    filesystem operation? truncate, for example, is a filesystem operation
    which sometimes has MM side-effects. madvise is an mm operation and with
    this patch, it gains FS side-effects, only they're really, really
    significant ones."

    Comments:

    - Andrea suggested the fs operation too but then it's more efficient to
    have it as a mm operation with fs side effects, because they don't
    immediately know fd and physical offset of the range. It's possible to
    fixup in userland and to use the fs operation but it's more expensive,
    the vmas are already in the kernel and we can use them.

    Short term plan & Future Direction:

    - We seem to need this interface only for shmfs/tmpfs files in the short
    term. We have to add hooks into the filesystem for correctness and
    completeness. This is what this patch does.

    - In the future, the plan is to support both the fs and mmap APIs. This
    also requires (other) filesystem-specific functions to be implemented.

    - Current patch doesn't support VM_NONLINEAR - which can be addressed in
    the future.
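
    As a rough userspace illustration of the interface described above, the
    sketch below maps a tmpfs-backed file, dirties it, and punches out a page
    range with madvise(MADV_REMOVE). The helper name punch_hole_demo and the
    use of memfd_create (which appeared in later kernels) are assumptions made
    for the example; they are not part of this patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): dirty `npages` pages of a
 * tmpfs-backed file, then free pages [first, first + count) together with
 * their backing store. Returns 0 if the punched range reads back as zeroes
 * and page 0 still holds its data, -1 on any failure.
 */
static int punch_hole_demo(size_t npages, size_t first, size_t count)
{
    size_t psz, len, i;
    unsigned char *p;
    int fd, rc = -1;

    /* Page 0 must stay outside the hole so it can serve as a control. */
    assert(first > 0 && first + count <= npages);

    psz = (size_t)sysconf(_SC_PAGESIZE);
    len = npages * psz;

    fd = memfd_create("madv-remove-demo", 0); /* assumption: tmpfs-backed fd */
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0)
        goto out_close;

    /* MADV_REMOVE operates on a shared, writable mapping of the file. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out_close;
    memset(p, 0xAB, len);

    /* Drop the pages and the backing store; nothing is written back. */
    if (madvise(p + first * psz, count * psz, MADV_REMOVE) != 0)
        goto out_unmap;

    rc = 0;
    for (i = first * psz; i < (first + count) * psz; i++)
        if (p[i] != 0)
            rc = -1; /* the hole should read back as zeroes */
    if (p[0] != 0xAB)
        rc = -1;     /* data outside the hole should be untouched */

out_unmap:
    munmap(p, len);
out_close:
    close(fd);
    return rc;
}
```

    A call such as punch_hole_demo(8, 2, 3) would free pages 2 through 4 of an
    eight-page mapping. Note that no file descriptor or file offset is passed
    to madvise; the kernel derives both from the vma, which is the efficiency
    argument made in the comments above.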

    Signed-off-by: Badari Pulavarty
    Cc: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: Michael Kerrisk
    Cc: Ulrich Drepper
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • This patch derives truncate_inode_pages_range from truncate_inode_pages.
    truncate_inode_pages becomes a one-line call to truncate_inode_pages_range.

    Reiser4 needs truncate_inode_pages_range because it tries to keep the
    metadata that points to data pages in correspondence with the pages
    themselves. So, when the metadata for a certain part of a file is removed
    from the filesystem tree, only the pages in the corresponding range are to
    be truncated.

    (Needed by the madvise(MADV_REMOVE) patch)
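
    The relationship between the two functions can be sketched with a toy
    userspace model: truncating "from an offset onward" is the range
    operation with the end set to the largest possible index. All names
    below (toy_mapping, toy_truncate_range, and so on) are hypothetical and
    only illustrate the shape of the refactoring, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 16

/* Hypothetical stand-in for a page cache: one flag per cached page. */
struct toy_mapping {
    bool cached[TOY_NPAGES];
};

/* Mark every page as present in the toy cache. */
static void toy_fill(struct toy_mapping *m)
{
    for (size_t i = 0; i < TOY_NPAGES; i++)
        m->cached[i] = true;
}

/* Count the pages still present. */
static size_t toy_count(const struct toy_mapping *m)
{
    size_t n = 0;

    for (size_t i = 0; i < TOY_NPAGES; i++)
        n += m->cached[i] ? 1 : 0;
    return n;
}

/* Range variant: drop pages first..last (inclusive), clamped to the cache. */
static void toy_truncate_range(struct toy_mapping *m, size_t first, size_t last)
{
    assert(first <= last);
    for (size_t i = first; i < TOY_NPAGES && i <= last; i++)
        m->cached[i] = false;
}

/* The original operation reduces to a one-line call to the range variant. */
static void toy_truncate(struct toy_mapping *m, size_t first)
{
    toy_truncate_range(m, first, (size_t)-1);
}
```

    With the range variant available, a caller that removes only the middle
    of a file can leave the pages before and after that range in place.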

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hans Reiser
     
  • register_memory is global and declared so in linux/memory.h. Update the
    HOTPLUG specific definition to match. This fixes a compile warning when
    HOTPLUG is enabled.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Both register_memory_notifier and unregister_memory_notifier are global and
    declared so in linux/memory.h. Update the HOTPLUG specific definitions to
    match. This fixes a compile warning when HOTPLUG is enabled.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • __add_section defines an unused pointer to the zone's pgdat. Remove this
    definition. This fixes a compile warning.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Two changes to the setting of the ALLOC_CPUSET flag in
    mm/page_alloc.c:__alloc_pages()

    - A bug fix - the "ignoring mins" case should not be honoring ALLOC_CPUSET.
    This case of all cases, since it is handling a request that will free up
    more memory than is asked for (exiting tasks, e.g.) should be allowed to
    escape cpuset constraints when memory is tight.

    - A logic change to make it simpler. Honor cpusets even on GFP_ATOMIC
    (!wait) requests. With this, cpuset confinement applies to all requests
    except ALLOC_NO_WATERMARKS, so that in a subsequent cleanup patch, I can
    remove the ALLOC_CPUSET flag entirely. Since I don't know any real reason
    this logic has to be either way, I am choosing the path of the simplest
    code.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • - This function returns -EINVAL all the time. Fix.

    - Decruftify it a bit too.

    - Writing to it doesn't seem to do what it's supposed to do.

    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Cc: Ivan Kokshaysky
    Cc: Richard Henderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The hash.h hash_long function, when used on a 64 bit machine, ignores many
    of the middle-order bits. (The prime chosen is too bit-sparse).

    IP addresses for clients of an NFS server are very likely to differ only in
    the low-order bits. As addresses are stored in network-byte-order, these
    bits become middle-order bits in a little-endian 64bit 'long', and so do
    not contribute to the hash. Thus you can have the situation where all
    clients appear on one hash chain.

    So, until hash_long is fixed (or maybe forever), use a hash function that
    works well on IP addresses - xor the bytes together.

    Thanks to "Iozone" for identifying this problem.
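
    The effect can be reproduced in userspace with a small model. The sketch
    below assumes that hash_long on a 64-bit machine amounts to multiplying
    by the constant 0x9e37fffffffc0001 and keeping the top bits, and that
    the replacement folds the four address bytes together with XOR; the
    function names are hypothetical and the code is an illustration, not the
    patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64-bit multiplier used by hash_long in kernels of this era. */
#define MODEL_GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001ULL

/* Model of hash_long on a 64-bit machine: multiply, keep the top `bits`. */
static unsigned int model_hash_long(uint64_t val, unsigned int bits)
{
    assert(bits >= 1 && bits <= 32);
    return (unsigned int)((val * MODEL_GOLDEN_RATIO_PRIME_64) >> (64 - bits));
}

/* XOR the four bytes of an IPv4 address together; the result is 0..255. */
static unsigned int xor_hash_ip(uint32_t ip)
{
    uint32_t h = ip ^ (ip >> 16);

    return (h ^ (h >> 8)) & 0xff;
}

/*
 * Hash 192.168.1.0 .. 192.168.1.255 into 256 buckets and count how many
 * distinct buckets are hit. The addresses are built as a little-endian
 * machine sees network byte order: the first octet is the lowest byte,
 * so the varying host octet lands in bits 24..31.
 */
static unsigned int count_buckets(int use_xor)
{
    unsigned char seen[256] = { 0 };
    unsigned int distinct = 0;

    for (uint32_t host = 0; host < 256; host++) {
        uint32_t ip = 192u | (168u << 8) | (1u << 16) | (host << 24);
        unsigned int b = use_xor ? xor_hash_ip(ip) : model_hash_long(ip, 8);

        if (!seen[b]) {
            seen[b] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

    Under these assumptions the XOR fold gives each of the 256 clients its
    own bucket, while the multiplicative model places them in at most two,
    which matches the "all clients on one hash chain" symptom.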

    Cc: "Iozone"

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Janos Haar of First NetCenter Bt. reported numerous crashes involving the
    NBD driver. With his help, this was tracked down to bogus bio vectors,
    which in turn were the result of a race condition between the
    receive/transmit routines in the NBD driver.

    The bug manifests itself like this:

    CPU0                            CPU1
    do_nbd_request
      add req to queuelist
      nbd_send_request
        send req head
        for each bio
          kmap
          send
                                    nbd_read_stat
                                      nbd_find_request
                                    nbd_end_request
          kunmap

    When CPU1 finishes nbd_end_request, the request and all its associated
    bio's are freed. So when CPU0 calls kunmap whose argument is derived from
    the last bio, it may crash.

    Under normal circumstances, the race occurs only on the last bio. However,
    if an error is encountered on the remote NBD server (such as an incorrect
    magic number in the request), or if there were a bug in the server, it is
    possible for the nbd_end_request to occur any time after the request's
    addition to the queuelist.

    The following patch fixes this problem by making sure that requests are not
    added to the queuelist until after their transmission has completed.

    In order for the receiving side to be ready for responses involving
    requests still being transmitted, the patch introduces the concept of the
    active request.

    When a response matches the current active request, its processing is
    delayed until after the transmission has come to a stop.

    This has been tested by Janos and it has been successful in curing this
    race condition.
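
    The active-request idea can be modelled in userspace with a mutex and a
    condition variable. The sketch below is a simplified illustration under
    stated assumptions: a one-slot queue, pthreads instead of kernel wait
    queues, and hypothetical names (toy_dev, toy_send_begin, and so on). It
    is not the driver code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_req {
    bool tx_done;        /* set once the last bio has been sent and unmapped */
    bool ended_after_tx; /* recorded by the receiver when it ends the request */
};

struct toy_dev {
    pthread_mutex_t lock;
    pthread_cond_t active_wq;   /* signalled whenever active_req changes */
    struct toy_req *active_req; /* request currently being transmitted */
    struct toy_req *queued;     /* one-slot stand-in for the queuelist */
};

struct toy_reply {
    struct toy_dev *dev;
    struct toy_req *handle;     /* request the reply refers to */
};

/* Sender, step 1: publish the request as active before sending anything. */
static void toy_send_begin(struct toy_dev *d, struct toy_req *r)
{
    pthread_mutex_lock(&d->lock);
    d->active_req = r;
    pthread_mutex_unlock(&d->lock);
}

/* Sender, step 2: only after transmission does the request join the queue. */
static void toy_send_finish(struct toy_dev *d, struct toy_req *r)
{
    pthread_mutex_lock(&d->lock);
    assert(d->active_req == r);
    r->tx_done = true;
    d->queued = r;
    d->active_req = NULL;
    pthread_cond_broadcast(&d->active_wq);
    pthread_mutex_unlock(&d->lock);
}

/* Receiver: a reply matching the active request waits for transmission. */
static void *toy_receive(void *arg)
{
    struct toy_reply *rep = arg;
    struct toy_dev *d = rep->dev;

    pthread_mutex_lock(&d->lock);
    while (d->active_req == rep->handle)
        pthread_cond_wait(&d->active_wq, &d->lock);
    if (d->queued == rep->handle) {
        d->queued = NULL;                        /* find the request */
        rep->handle->ended_after_tx = rep->handle->tx_done; /* end it */
    }
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Let a reply arrive mid-transmission; report whether it was held back. */
static bool toy_race_demo(void)
{
    struct toy_dev d = { .active_req = NULL, .queued = NULL };
    struct toy_req r = { false, false };
    struct toy_reply rep = { &d, &r };
    pthread_t rx;

    pthread_mutex_init(&d.lock, NULL);
    pthread_cond_init(&d.active_wq, NULL);
    toy_send_begin(&d, &r);
    if (pthread_create(&rx, NULL, toy_receive, &rep) != 0)
        return false;
    toy_send_finish(&d, &r);
    pthread_join(rx, NULL);
    pthread_cond_destroy(&d.active_wq);
    pthread_mutex_destroy(&d.lock);
    return r.ended_after_tx;
}
```

    In this model the receiver can never end a request whose transmission is
    still in progress, so the sender's final unmap always happens before the
    request is released.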

    From: Herbert Xu

    Here is an updated patch which removes the active_req wait in
    nbd_clear_queue and the associated memory barrier.

    I've also clarified this in the comment.

    Signed-off-by: Herbert Xu
    Cc:
    Cc: Paul Clements
    Signed-off-by: Herbert Xu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Herbert Xu
     
  • nls_utf8 is available, and the check in hfsplus_fill_super checks the wrong
    pointer for NULLness (it checks the saved nls, not the new one that it
    needs to use.)

    Signed-off-by: Joshua Kwan
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joshua Kwan
     

06 Jan, 2006

28 commits