19 May, 2007

1 commit


15 May, 2007

1 commit


11 May, 2007

1 commit


09 May, 2007

2 commits


06 Feb, 2007

1 commit


02 Oct, 2006

1 commit

  • Some architectures provide an execve function that does not set errno, but
    instead returns the result code directly. Rename these to kernel_execve to
    get the right semantics there. Moreover, there is no reasone for any of these
    architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
    remove these right away.

    [akpm@osdl.org: build fix]
    [bunk@stusta.de: build fix]
    Signed-off-by: Arnd Bergmann
    Cc: Andi Kleen
    Acked-by: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Russell King
    Cc: Ian Molton
    Cc: Mikael Starvik
    Cc: David Howells
    Cc: Yoshinori Sato
    Cc: Hirokazu Takata
    Cc: Ralf Baechle
    Cc: Kyle McMartin
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Paul Mundt
    Cc: Kazumoto Kojima
    Cc: Richard Curnow
    Cc: William Lee Irwin III
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Cc: Miles Bader
    Cc: Chris Zankel
    Cc: "Luck, Tony"
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Signed-off-by: Adrian Bunk
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

27 Sep, 2006

1 commit


09 Sep, 2006

1 commit


23 Jun, 2006

1 commit

  • move_pages() is used to move individual pages of a process. The function can
    be used to determine the location of pages and to move them onto the desired
    node. move_pages() returns status information for each page.

    long move_pages(pid, number_of_pages_to_move,
    addresses_of_pages[],
    nodes[] or NULL,
    status[],
    flags);

    The addresses of pages is an array of void * pointing to the
    pages to be moved.

    The nodes array contains the node numbers that the pages should be moved
    to. If a NULL is passed instead of an array then no pages are moved but
    the status array is updated. The status request may be used to determine
    the page state before issuing another move_pages() to move pages.

    The status array will contain the state of all individual page migration
    attempts when the function terminates. The status array is only valid if
    move_pages() completed successfullly.

    Possible page states in status[]:

    0..MAX_NUMNODES The page is now on the indicated node.

    -ENOENT Page is not present

    -EACCES Page is mapped by multiple processes and can only
    be moved if MPOL_MF_MOVE_ALL is specified.

    -EPERM The page has been mlocked by a process/driver and
    cannot be moved.

    -EBUSY Page is busy and cannot be moved. Try again later.

    -EFAULT Invalid address (no VMA or zero page).

    -ENOMEM Unable to allocate memory on target node.

    -EIO Unable to write back page. The page must be written
    back in order to move it since the page is dirty and the
    filesystem does not provide a migration function that
    would allow the moving of dirty pages.

    -EINVAL A dirty page cannot be moved. The filesystem does not provide
    a migration function and has no ability to write back pages.

    The flags parameter indicates what types of pages to move:

    MPOL_MF_MOVE Move pages that are only mapped by the process.

    MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.
    Requires sufficient capabilities.

    Possible return codes from move_pages()

    -ENOENT No pages found that would require moving. All pages
    are either already on the target node, not present, had an
    invalid address or could not be moved because they were
    mapped by multiple processes.

    -EINVAL Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt
    to migrate pages in a kernel thread.

    -EPERM MPOL_MF_MOVE_ALL specified without sufficient priviledges.
    or an attempt to move a process belonging to another user.

    -EACCES One of the target nodes is not allowed by the current cpuset.

    -ENODEV One of the target nodes is not online.

    -ESRCH Process does not exist.

    -E2BIG Too many pages to move.

    -ENOMEM Not enough memory to allocate control array.

    -EFAULT Parameters could not be accessed.

    A test program for move_pages() may be found with the patches
    on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3

    From: Christoph Lameter

    Detailed results for sys_move_pages()

    Pass a pointer to an integer to get_new_page() that may be used to
    indicate where the completion status of a migration operation should be
    placed. This allows sys_move_pags() to report back exactly what happened to
    each page.

    Wish there would be a better way to do this. Looks a bit hacky.

    Signed-off-by: Christoph Lameter
    Cc: Hugh Dickins
    Cc: Jes Sorensen
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Andi Kleen
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

29 Apr, 2006

1 commit


26 Apr, 2006

2 commits


11 Apr, 2006

1 commit

  • Basically an in-kernel implementation of tee, which uses splice and the
    pipe buffers as an intelligent way to pass data around by reference.

    Where the user space tee consumes the input and produces a stdout and
    file output, this syscall merely duplicates the data inside a pipe to
    another pipe. No data is copied, the output just grabs a reference to the
    input pipe data.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

07 Apr, 2006

1 commit


05 Apr, 2006

1 commit


31 Mar, 2006

1 commit

  • This adds support for the sys_splice system call. Using a pipe as a
    transport, it can connect to files or sockets (latter as output only).

    From the splice.c comments:

    "splice": joining two ropes together by interweaving their strands.

    This is the "extended pipe" functionality, where a pipe is used as
    an arbitrary in-memory buffer. Think of a pipe as a small kernel
    buffer that you can use to transfer data from one end to the other.

    The traditional unix read/write is extended with a "splice()" operation
    that transfers data buffers to or from a pipe buffer.

    Named by Larry McVoy, original implementation from Linus, extended by
    Jens to support splicing to files and fixing the initial implementation
    bugs.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

09 Feb, 2006

1 commit


07 Feb, 2006

1 commit


09 Jan, 2006

1 commit

  • sys_migrate_pages implementation using swap based page migration

    This is the original API proposed by Ray Bryant in his posts during the first
    half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org.

    The intent of sys_migrate is to migrate memory of a process. A process may
    have migrated to another node. Memory was allocated optimally for the prior
    context. sys_migrate_pages allows to shift the memory to the new node.

    sys_migrate_pages is also useful if the processes available memory nodes have
    changed through cpuset operations to manually move the processes memory. Paul
    Jackson is working on an automated mechanism that will allow an automatic
    migration if the cpuset of a process is changed. However, a user may decide
    to manually control the migration.

    This implementation is put into the policy layer since it uses concepts and
    functions that are also needed for mbind and friends. The patch also provides
    a do_migrate_pages function that may be useful for cpusets to automatically
    move memory. sys_migrate_pages does not modify policies in contrast to Ray's
    implementation.

    The current code here is based on the swap based page migration capability and
    thus is not able to preserve the physical layout relative to it containing
    nodeset (which may be a cpuset). When direct page migration becomes available
    then the implementation needs to be changed to do a isomorphic move of pages
    between different nodesets. The current implementation simply evicts all
    pages in source nodeset that are not in the target nodeset.

    Patch supports ia64, i386 and x86_64.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

07 Jan, 2006

1 commit


31 Oct, 2005

1 commit


28 Jul, 2005

1 commit


28 Jun, 2005

1 commit

  • This updates the CFQ io scheduler to the new time sliced design (cfq
    v3). It provides full process fairness, while giving excellent
    aggregate system throughput even for many competing processes. It
    supports io priorities, either inherited from the cpu nice value or set
    directly with the ioprio_get/set syscalls. The latter closely mimic
    set/getpriority.

    This import is based on my latest from -mm.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

22 Jun, 2005

1 commit

  • This is the core of the (much simplified) early reclaim. The goal of this
    patch is to reclaim some easily-freed pages from a zone before falling back
    onto another zone.

    One of the major uses of this is NUMA machines. With the default allocator
    behavior the allocator would look for memory in another zone, which might be
    off-node, before trying to reclaim from the current zone.

    This adds a zone tuneable to enable early zone reclaim. It is selected on a
    per-zone basis and is turned on/off via syscall.

    Adding some extra throttling on the reclaim was also required (patch
    4/4). Without the machine would grind to a crawl when doing a "make -j"
    kernel build. Even with this patch the System Time is higher on
    average, but it seems tolerable. Here are some numbers for kernbench
    runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:

    wall user sys %cpu ctx sw. sleeps
    ---- ---- --- ---- ------ ------
    No patch 1009 1384 847 258 298170 504402
    w/patch, no reclaim 880 1376 667 288 254064 396745
    w/patch & reclaim 1079 1385 926 252 291625 548873

    These numbers are the average of 2 runs of 3 "make -j" runs done right
    after system boot. Run-to-run variability for "make -j" is huge, so
    these numbers aren't terribly useful except to seee that with reclaim
    the benchmark still finishes in a reasonable amount of time.

    I also looked at the NUMA hit/miss stats for the "make -j" runs and the
    reclaim doesn't make any difference when the machine is thrashing away.

    Doing a "make -j8" on a single node that is filled with page cache pages
    takes 700 seconds with reclaim turned on and 735 seconds without reclaim
    (due to remote memory accesses).

    The simple zone_reclaim syscall program is at
    http://www.bork.org/~mort/sgi/zone_reclaim.c

    Signed-off-by: Martin Hicks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Hicks
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds