16 Apr, 2008

1 commit

  • As shown by Gurudas Pai recently, we can put hugepages into the surplus
    state (by echo 0 > /proc/sys/vm/nr_hugepages), even when
    /proc/sys/vm/nr_overcommit_hugepages is 0. This is actually correct, to
    allow the original goal (shrink the static pool to 0) to succeed (we are
    converting hugepages to surplus because they are in use). However, the
    documentation does not accurately reflect this case. Update it.

    Signed-off-by: Nishanth Aravamudan
    Acked-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     

07 Mar, 2008

1 commit


22 Feb, 2008

1 commit

  • I keep running upstream and mm kernels and the location of the slab
    directory is different since upstream still uses /sys/slab. This patch
    makes slabinfo check /sys/slab if /sys/kernel/slab is not there. Makes
    slabinfo work on any kernel.
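
The fallback described above can be sketched as follows. This is a minimal illustration, not the patch itself: the helper names first_existing_dir() and find_slab_dir() are hypothetical, and the real slabinfo code may be organized differently.

```c
#include <stddef.h>
#include <sys/stat.h>

/*
 * Return the first path in `candidates` that exists and is a directory,
 * or NULL if none does. Hypothetical helper, for illustration only.
 */
const char *first_existing_dir(const char *const *candidates, size_t n)
{
	struct stat st;

	for (size_t i = 0; i < n; i++) {
		if (stat(candidates[i], &st) == 0 && S_ISDIR(st.st_mode))
			return candidates[i];
	}
	return NULL;
}

/* Try the newer location first, then the older upstream location. */
const char *find_slab_dir(void)
{
	static const char *const dirs[] = { "/sys/kernel/slab", "/sys/slab" };

	return first_existing_dir(dirs, sizeof(dirs) / sizeof(dirs[0]));
}
```

A caller would use the returned path as the base directory and report an error if find_slab_dir() returns NULL.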

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Christoph Lameter
     

08 Feb, 2008

1 commit

  • The statistics provided here allow the monitoring of allocator behavior but
    at the cost of some (minimal) loss of performance. Counters are placed in
    SLUB's per cpu data structure. The per cpu structure may be extended by the
    statistics to grow larger than one cacheline which will increase the cache
    footprint of SLUB.

    There is a compile option to enable/disable the inclusion of the runtime
    statistics, and it is off by default.

    The slabinfo tool is enhanced to support these statistics via the following
    options:

    -D Switches the line of information displayed for a slab from size
    mode to activity mode.

    -A Sorts the slabs displayed by activity. This allows the display of
    the slabs most important to the performance of a certain load.

    -r Report option will report detailed statistics on

    Example (tbench load):

    slabinfo -AD ->Shows the most active slabs

    Name                   Objects      Alloc       Free   %Fast
    skbuff_fclone_cache         33  111953835  111953835   99 99
    :0000192                  2666    5283688    5281047   99 99
    :0001024                   849    5247230    5246389   83 83
    vm_area_struct            1349     119642     118355   91 22
    :0004096                    15      66753      66751   98 98
    :0000064                  2067      25297      23383   98 78
    dentry                   10259      28635      18464   91 45
    :0000080                 11004      18950       8089   98 98
    :0000096                  1703      12358      10784   99 98
    :0000128                   762      10582       9875   94 18
    :0000512                   184       9807       9647   95 81
    :0002048                   479       9669       9195   83 65
    anon_vma                   777       9461       9002   99 71
    kmalloc-8                 6492       9981       5624   99 97
    :0000768                   258       7174       6931   58 15

    So the skbuff_fclone_cache is of highest importance for the tbench load.
    Pretty high load on the 192 sized slab. Look for the aliases

    slabinfo -a | grep 000192
    :0000192 -r option implied if cache name is mentioned

    .... Usual output ...

    Slab Perf Counter          Alloc      Free %Al %Fr
    --------------------------------------------------
    Fastpath               111953360 111946981  99  99
    Slowpath                    1044      7423   0   0
    Page Alloc                   272       264   0   0
    Add partial                   25       325   0   0
    Remove partial                86       264   0   0
    RemoteObj/SlabFrozen         350      4832   0   0
    Total                  111954404 111954404

    Flushes 49 Refill 0
    Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)

    Looks good because the fastpath is overwhelmingly taken.
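
The percentage columns above are consistent with a plain integer ratio that is truncated rather than rounded (for example, 325 of 350 deactivations is shown as 92%). A minimal sketch of that arithmetic, assuming truncation; percent_of() is a hypothetical helper and not taken from slabinfo:

```c
#include <stdint.h>

/*
 * Integer percentage of `part` in `total`, truncated toward zero.
 * Returns 0 when total is 0 to avoid dividing by zero.
 */
unsigned int percent_of(uint64_t part, uint64_t total)
{
	if (total == 0)
		return 0;
	return (unsigned int)(part * 100 / total);
}
```

With the counters above, percent_of(111953360, 111954404) gives 99 for the allocation fastpath, and percent_of(1044, 111954404) gives 0 for the slowpath.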

    skbuff_head_cache:

    Slab Perf Counter          Alloc      Free %Al %Fr
    --------------------------------------------------
    Fastpath                 5297262   5259882  99  99
    Slowpath                    4477     39586   0   0
    Page Alloc                   937       824   0   0
    Add partial                    0      2515   0   0
    Remove partial              1691       824   0   0
    RemoteObj/SlabFrozen        2621      9684   0   0
    Total                    5301739   5299468

    Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)

    Descriptions of the output:

    Total: The total number of allocations and frees that occurred for a
    slab

    Fastpath: The number of allocations/frees that used the fastpath.

    Slowpath: Other allocations

    Page Alloc: Number of calls to the page allocator as a result of slowpath
    processing

    Add Partial: Number of slabs added to the partial list through free or
    alloc (occurs during cpuslab flushes)

    Remove Partial: Number of slabs removed from the partial list as a result of
    allocations retrieving a partial slab or by a free freeing
    the last object of a slab.

    RemoteObj/Froz: How many times remotely freed objects were encountered when
    a slab was about to be deactivated. Frozen: How many times
    free was able to skip list processing because the slab was in
    use as the cpuslab of another processor.

    Flushes: Number of times the cpuslab was flushed on request
    (kmem_cache_shrink, may result from races in __slab_alloc)

    Refill: Number of times we were able to refill the cpuslab from
    remotely freed objects for the same slab.

    Deactivate: Statistics on how slabs were deactivated. Shows how they were
    put onto the partial list.

    In general, a high fastpath share is very good. Slowpath without partial list
    processing is also desirable. Any access to the partial list takes
    node-specific locks, which may cause list lock contention.

    Signed-off-by: Christoph Lameter

    Christoph Lameter
     

25 Jan, 2008

1 commit


18 Dec, 2007

1 commit

  • The hugetlb documentation has gotten a bit out of sync with the current code.
    Updated the sysctl file to refer to Documentation/vm/hugetlbpage.txt. Update
    that file to contain the current state of affairs (with the newer named sysctl
    in place).

    Signed-off-by: Nishanth Aravamudan
    Acked-by: Adam Litke
    Cc: William Lee Irwin III
    Cc: Dave Hansen
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     

17 Oct, 2007

3 commits

  • This patch does the following cleanups for Documentation/vm/slabinfo.c:

    - Fix two memory leaks;
    - Constify some char pointers;
    - Use snprintf instead of sprintf to guard against buffer overflow;
    - Fix some indentations;
    - Other little improvements.
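
As an illustration of the sprintf-to-snprintf item above, here is a minimal sketch of bounded path formatting with truncation detection. build_path() is a hypothetical helper and is not taken from slabinfo.c.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Build "<dir>/<name>" into buf. Returns 0 on success, or -1 if the
 * result did not fit or snprintf reported an error. On truncation,
 * buf holds a shortened, NUL-terminated string whenever size > 0.
 */
int build_path(char *buf, size_t size, const char *dir, const char *name)
{
	int n = snprintf(buf, size, "%s/%s", dir, name);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

Unlike sprintf, snprintf never writes more than size bytes, and its return value tells the caller how long the full string would have been, so truncation can be detected instead of silently overrunning the buffer.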

    Acked-by: Christoph Lameter
    Signed-off-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     
  • Looks like the 00-INDEX file lost its parent directory in -rc6-mm1.

    Signed-off-by: David Rientjes
    Cc: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Allow an application to query the memories allowed by its context.

    Updated numa_memory_policy.txt to mention that applications can use this to
    obtain allowed memories for constructing valid policies.

    TODO: update out-of-tree libnuma wrapper[s], or maybe add a new
    wrapper--e.g., numa_get_mems_allowed() ?

    Also, update numa syscall man pages.

    Tested with memtoy V>=0.13.

    Signed-off-by: Lee Schermerhorn
    Acked-by: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

23 Aug, 2007

1 commit

  • I couldn't find any memory policy documentation in the Documentation
    directory, so here is my attempt to document it.

    There's lots more that could be written about the internal design--including
    data structures, functions, etc. However, if you agree that this is better
    than the nothing that exists now, perhaps it could be merged. This will
    provide a baseline for updates to document the many policy patches that are
    currently being worked on.

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Cc: Michael Kerrisk
    Acked-by: Rob Landley
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

10 Aug, 2007

1 commit


18 Jul, 2007

1 commit

  • Changes the error reporting format to loosely follow lockdep.

    If data corruption is detected then we generate the following lines:

    ============================================
    BUG :
    --------------------------------------------

    INFO: [possibly multiple times]

    FIX :

    This also adds some more intelligence to the data corruption detection. It is
    now capable of figuring out the start and end of the corrupted area.

    Add a comment on how to configure SLUB so that a production system may
    continue to operate even though occasional slab corruption occurs due to
    a misbehaving kernel component. See "Emergency operations" in
    Documentation/vm/slub.txt.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 Jul, 2007

2 commits

  • Use lib/parser.c to parse hugetlbfs mount options. Correct docs in
    hugetlbpage.txt.

    old size of hugetlbfs_fill_super: 675 bytes
    new size of hugetlbfs_fill_super: 686 bytes
    (hugetlbfs_parse_options() is inlined)

    Signed-off-by: Randy Dunlap
    Cc: Hugh Dickins
    Cc: David Gibson
    Cc: Adam Litke
    Acked-by: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Add a new configuration variable

    CONFIG_SLUB_DEBUG_ON

    If set then the kernel will be booted by default with slab debugging
    switched on. Similar to CONFIG_SLAB_DEBUG. By default slab debugging
    is available but must be enabled by specifying "slub_debug" as a
    kernel parameter.

    Also add support to switch off slab debugging for a kernel that was
    built with CONFIG_SLUB_DEBUG_ON. This works by specifying

    slub_debug=-

    as a kernel parameter.

    Dave Jones wanted this feature.
    http://marc.info/?l=linux-kernel&m=118072189913045&w=2

    [akpm@linux-foundation.org: clean up switch statement]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

31 May, 2007

1 commit

  • Update documentation to describe how to read a SLUB error report.
    Add slub parameters to Documentation/kernel-parameters.

    Signed-off-by: Christoph Lameter
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

18 May, 2007

1 commit

  • The slab manipulation functions should not be triggered by slabs that
    are unresolvable in the subset of slabs selected on the command line.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 May, 2007

1 commit

  • Align the output of % with K/M/G of sizes.

    Check for empty NUMA information to avoid segfault on !NUMA.

    -r should work directly, not only when a single slab is matched
    without additional options.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

10 May, 2007

1 commit

  • -e Show empty slabs
    -d Modification of slab debug options at runtime
    -o Operations. Display of ctor / dtor etc.
    -r Report: Display all available information about a slabcache.

    Cleanup tracking display and make it work right.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

08 May, 2007

2 commits


04 Oct, 2006

1 commit


23 Jun, 2006

2 commits

  • move_pages() is used to move individual pages of a process. The function can
    be used to determine the location of pages and to move them onto the desired
    node. move_pages() returns status information for each page.

    long move_pages(pid, number_of_pages_to_move,
                    addresses_of_pages[],
                    nodes[] or NULL,
                    status[],
                    flags);

    The addresses_of_pages argument is an array of void * pointing to the
    pages to be moved.

    The nodes array contains the node numbers that the pages should be moved
    to. If a NULL is passed instead of an array then no pages are moved but
    the status array is updated. The status request may be used to determine
    the page state before issuing another move_pages() to move pages.

    The status array will contain the state of all individual page migration
    attempts when the function terminates. The status array is only valid if
    move_pages() completed successfully.

    Possible page states in status[]:

    0..MAX_NUMNODES The page is now on the indicated node.

    -ENOENT Page is not present

    -EACCES Page is mapped by multiple processes and can only
    be moved if MPOL_MF_MOVE_ALL is specified.

    -EPERM The page has been mlocked by a process/driver and
    cannot be moved.

    -EBUSY Page is busy and cannot be moved. Try again later.

    -EFAULT Invalid address (no VMA or zero page).

    -ENOMEM Unable to allocate memory on target node.

    -EIO Unable to write back page. The page must be written
    back in order to move it since the page is dirty and the
    filesystem does not provide a migration function that
    would allow the moving of dirty pages.

    -EINVAL A dirty page cannot be moved. The filesystem does not provide
    a migration function and has no ability to write back pages.
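
A caller typically walks the status array after move_pages() returns and interprets each entry. The sketch below maps one entry to a short description following the list above; describe_page_status() is a hypothetical helper, and the wording of the strings is illustrative only.

```c
#include <errno.h>

/*
 * Map one status[] entry from move_pages() to a short description.
 * Non-negative values are node numbers; negative values are -errno
 * codes from the list above. Hypothetical helper, for illustration.
 */
const char *describe_page_status(int status)
{
	if (status >= 0)
		return "page is on the node given by the status value";

	switch (-status) {
	case ENOENT: return "page is not present";
	case EACCES: return "page is mapped by multiple processes";
	case EPERM:  return "page is mlocked and cannot be moved";
	case EBUSY:  return "page is busy; try again later";
	case EFAULT: return "invalid address (no VMA or zero page)";
	case ENOMEM: return "unable to allocate memory on target node";
	case EIO:    return "dirty page could not be written back";
	case EINVAL: return "dirty page cannot be moved or written back";
	default:     return "unrecognized status";
	}
}
```

Passing NULL for the nodes array, as described earlier, fills the same status array without moving anything, so the helper applies equally to a pure status query.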

    The flags parameter indicates what types of pages to move:

    MPOL_MF_MOVE Move pages that are only mapped by the process.

    MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.
    Requires sufficient capabilities.

    Possible return codes from move_pages()

    -ENOENT No pages found that would require moving. All pages
    are either already on the target node, not present, had an
    invalid address or could not be moved because they were
    mapped by multiple processes.

    -EINVAL Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt
    to migrate pages in a kernel thread.

    -EPERM MPOL_MF_MOVE_ALL specified without sufficient privileges,
    or an attempt to move a process belonging to another user.

    -EACCES One of the target nodes is not allowed by the current cpuset.

    -ENODEV One of the target nodes is not online.

    -ESRCH Process does not exist.

    -E2BIG Too many pages to move.

    -ENOMEM Not enough memory to allocate control array.

    -EFAULT Parameters could not be accessed.

    A test program for move_pages() may be found with the patches
    on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3

    From: Christoph Lameter

    Detailed results for sys_move_pages()

    Pass a pointer to an integer to get_new_page() that may be used to
    indicate where the completion status of a migration operation should be
    placed. This allows sys_move_pages() to report back exactly what happened to
    each page.

    Wish there would be a better way to do this. Looks a bit hacky.

    Signed-off-by: Christoph Lameter
    Cc: Hugh Dickins
    Cc: Jes Sorensen
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Andi Kleen
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

20 Apr, 2006

1 commit

  • Add new line of /proc/meminfo output.

    Explain the HugePages_ lines in /proc/meminfo (from Bill Irwin).

    Change KB to kB since the latter is what is used in the kernel.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

11 Apr, 2006

1 commit


15 Mar, 2006

1 commit

  • Update the documentation for page migration.

    - Fix up bits and pieces in cpusets.txt

    - Rework text in vm/page-migration to be clearer and reflect the final
    version of page migration in 2.6.16. Mention Andi Kleen's numactl
    package that contains user space tools for page migration via
    libnuma. Add reference to numa_maps and to the manpage in numactl.

    - Add todo list for outstanding issues

    Signed-off-by: Christoph Lameter
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

02 Feb, 2006

1 commit

  • Add direct migration support with fall back to swap.

    Direct migration support on top of the swap based page migration facility.

    This allows the direct migration of anonymous pages and the migration of file
    backed pages by dropping the associated buffers (requires writeout).

    Fall back to swap out if necessary.

    The patch is based on lots of patches from the hotplug project but the code
    was restructured, documented and simplified as much as possible.

    Note that an additional patch that defines the migrate_page() method for
    filesystems is necessary in order to avoid writeback for anonymous and file
    backed pages.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Mike Kravetz
    Signed-off-by: Christoph Lameter
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

07 Nov, 2005

1 commit


05 Sep, 2005

1 commit

  • The idea of a swap_device_lock per device, and a swap_list_lock over them all,
    is appealing; but in practice almost every holder of swap_device_lock must
    already hold swap_list_lock, which defeats the purpose of the split.

    The only exceptions have been swap_duplicate, valid_swaphandles and an
    untrodden path in try_to_unuse (plus a few places added in this series).
    valid_swaphandles doesn't show up high in profiles, but swap_duplicate does
    demand attention. However, with the hold time in get_swap_pages so much
    reduced, I've not yet found a load and set of swap device priorities to show
    even swap_duplicate benefitting from the split. Certainly the split is mere
    overhead in the common case of a single swap device.

    So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock
    (generally we seem to prefer an _ in the name, and not hide in a macro).

    If someone can show a regression in swap_duplicate, then probably we should
    add a hashlock for the swap_map entries alone (shorts not being atomic), so as
    to help the case of the single swap device too.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds